Wednesday, April 12, 2006

storage disks... what is RAID??

Hi there, no one doubts that data and storage are core to any GIS implementation... especially in a government project that collects tons of data. In GIS projects you will sometimes hear hardware or database vendors talking about disks and RAID configurations. It can get quite confusing... and you need to clearly understand the technical concepts before you agree to any configuration.

Storage is expensive, so the decisions you make must be clear and precise. Changing the configuration later would be too late and way too expensive.

So let me explain a bit about RAID technology (I get a lot of support calls on this) and some of the common configurations used:
  • RAID 0 (also known as Disk Striping) - This one requires a minimum of 2 drives. Data written to the array is broken down into blocks, and each block is written to a separate disk. The advantage is that data access becomes very fast... you are reading and writing data across multiple disks, so the I/O is distributed. Just like walking into Public Bank when there are so many counters open... transactions are faster. The disadvantage of RAID 0 is that there is no redundancy... when even a single drive fails... that's it... all the data in the array is lost and you are screwed.
  • RAID 1 (Disk Mirroring) - This one requires a minimum of 2 drives (added in multiples of 2). All data written to the storage system is replicated to 2 physical disks, providing good redundancy. This is a reliable configuration as long as no more than one disk per pair fails... but write performance can suffer a little because every write has to go to both drives. Actually, it depends on the kind of application you are using... if it's going to be doing heavy, concurrent updates on the data all the time... then another RAID level might be a better choice. I like RAID 1, actually.
  • RAID 3 (Parallel transfer disks with parity) - This one requires a minimum of 3 drives. Data is split at the byte level and written evenly across the data disks, with a separate disk handling parity. It tolerates the loss of a single drive. Sequential read performance is good and sequential write performance is reasonable, so RAID 3 is generally considered very efficient for large sequential transfers. Random read performance is only fair and random write performance is poor. It really needs hardware RAID to be viable. However, it's rarely used, so troubleshooting information could be sparse.
  • RAID 4 (Independent data disks with shared parity blocks) - Minimum drives required is 3. A file is broken down into blocks and the blocks are written across multiple disks, but not necessarily evenly. Like RAID 3, RAID 4 uses a separate physical disk to handle parity. It's an excellent choice for environments where read rate is critical under heavy transaction volume. Pros: very good read rate, and it tolerates the loss of a single drive. Cons: write performance is poor, because every write also has to update the single parity disk. Block read performance is okay.
  • RAID 5 (Independent access array with rotating parity) - Minimum drives required is 3. Blocks of data are written across the set of disks… but the parity (redundancy info)… is spread across all the disks alongside the data instead of sitting on a dedicated parity disk. This is the most popular RAID level and it tolerates the loss of a single drive. Every time data is written, the parity info has to be recalculated and written too… which sometimes can cause slowness on writes… but it's bearable.
  • RAID 6 (Independent data disks with two independent distributed parity schemes) - Blocks of data are written across the entire set of disks… it could be uneven too. Minimum drives required is 4, since two drives' worth of space goes to parity. This config can tolerate the loss of up to 2 drives. It's mainly used for high-end applications where storage is really mission critical. Very rare in GIS, unless it's for military applications.
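
To put rough numbers on the levels above, here is a small Python sketch. It is only an illustration, not a vendor sizing tool: it assumes every drive in the array is the same size, and the names RAID_RULES and raid_summary are made up for this example. The "failures survived" figure is the guaranteed worst case (RAID 1 can survive more if the failed drives land in different pairs), and RAID 6 is given a 4-drive minimum because it spends two drives' worth of space on parity.

```python
# Hypothetical helper for comparing RAID levels on identical drives.
# All names here are illustrative; nothing comes from a real RAID tool.

RAID_RULES = {
    # level: (minimum drives,
    #         drives' worth of space spent on redundancy,
    #         drive failures survived in the worst case)
    0: (2, lambda n: 0, 0),       # striping only, no redundancy
    1: (2, lambda n: n // 2, 1),  # mirrored pairs, half the space
    3: (3, lambda n: 1, 1),       # dedicated parity disk
    4: (3, lambda n: 1, 1),       # dedicated parity disk
    5: (3, lambda n: 1, 1),       # one drive's worth of rotating parity
    6: (4, lambda n: 2, 2),       # two drives' worth of parity
}


def raid_summary(level, drives, drive_size_gb):
    """Return (usable_gb, failures_survived) for the given array."""
    min_drives, overhead, survived = RAID_RULES[level]
    if drives < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    if level == 1 and drives % 2 != 0:
        raise ValueError("RAID 1 as described here uses drives in pairs")
    usable_gb = (drives - overhead(drives)) * drive_size_gb
    return usable_gb, survived


if __name__ == "__main__":
    # Example: four 500 GB drives under each level.
    for level in sorted(RAID_RULES):
        usable_gb, survived = raid_summary(level, 4, 500)
        print(f"RAID {level}: {usable_gb} GB usable, "
              f"survives {survived} drive failure(s)")
```

With four 500 GB drives, this gives 2000 GB usable for RAID 0, 1500 GB for RAID 3, 4 and 5, and 1000 GB for RAID 1 and RAID 6... so the extra safety is paid for in disk space.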

Parity - In certain RAID levels, redundancy is achieved by the use of parity blocks. If a single drive in the array fails, data blocks and a parity block from the working drives can be combined to reconstruct the missing data.
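
Here is a minimal Python sketch of that reconstruction idea, assuming the parity block is a simple byte-by-byte XOR of the data blocks (the usual scheme for single-parity levels; real controllers do this in hardware and at a much larger scale). The helper name xor_blocks and the block contents are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(
        reduce(lambda a, b: a ^ b, column)  # XOR one byte position across all blocks
        for column in zip(*blocks)
    )


# Three data blocks, one per data drive (made-up contents, equal length).
data = [b"GIS1", b"MAP2", b"DB_3"]

# The parity block is what the array would store alongside the data.
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

The same trick works no matter which single drive is lost. Lose two drives, though, and one parity block is no longer enough... which is why RAID 6 keeps two independent sets of parity.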
