Joern,
> The failure of each individual disk is a statistically independent
> event. There is absolutely no synchronized self-destruct of disks
> belonging to the same batch.
It is not independent, because the environment has an impact on the
failure rate (vibration, temperature swings, etc.). So it is more likely
that another drive fails once one has already failed.
> There is, however, the tendency of new disks to either die pretty fast
> or last quite a while, and a very instructive long-term study by our
> friends at Google about aging disks:
> http://labs.google.com/papers/disk_failures.pdf
> So if you're really careful with your data, you don't wait for an old
> disk to fail but retire it after a defined time. Plus, you should watch
> your SMART data for early warning signs. Doing this makes large arrays
> quite manageable.
Performing checksum verification passes on a regular basis is mandatory
to detect scan errors. On terabyte-scale RAIDs with many discs, they
might go undetected for months. I run a full verification at least once
per month to catch scan errors early.
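The routine described above (SMART monitoring plus regular verification
passes) can be sketched as two config fragments, assuming Linux md and
smartmontools; device names and the mail address are placeholders:

```
# /etc/smartd.conf -- monitor all disks, run short/long self-tests on a
# schedule, and mail on health or attribute trouble (address is assumed).
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.org

# /etc/cron.d/raid-scrub -- kick off a full verification pass of the
# hypothetical array /dev/md0 on the first of every month at 04:00.
0 4 1 * * root echo check > /sys/block/md0/md/sync_action
```

The kernel reports any inconsistencies it finds in
/sys/block/md0/md/mismatch_cnt.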
> The problem these days is that disk sizes are getting ever larger.
> That means more data is at risk, and the time to resync a degraded
> array is also increasing. Hence, RAID5 might not be safe anymore.
RAID5 has never been safe. Once one disc has failed, you're caught with
your pants down.
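To put a number on that: during a RAID5 resync every surviving disk must
be read end to end, and a single unrecoverable read error (URE) is
enough to abort the rebuild. A back-of-envelope sketch, with the array
size and URE rate as assumptions:

```shell
# Hypothetical degraded array: 4 x 2 TB disks, one failed, so the three
# survivors are read in full; consumer drives are typically specified at
# one unrecoverable read error per 1e14 bits read.
awk 'BEGIN {
    bits = 3 * 2e12 * 8                  # bits read during the resync
    ure  = 1e-14                         # unrecoverable errors per bit read
    p    = 1 - exp(bits * log(1 - ure))  # P(at least one URE)
    printf "P(rebuild hits a URE) = %.2f\n", p
}'
# prints: P(rebuild hits a URE) = 0.38
```

So on paper more than a third of such rebuilds fail. RAID6 survives
exactly this case, because the second parity can reconstruct the
unreadable sector.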
> RAID6 is designed to not expose your data to a window of vulnerability
> at all - one disk can always be swapped while the array remains fully
> redundant.
Run RAID6 with at least one hot spare. Discs tend to fail when you're
not around.
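Setting that up is a one-liner with Linux md; a sketch with assumed
device names (sdb through sdh, adjust to your system):

```
# 7 disks total: 6 active in RAID6 plus one hot spare that md pulls in
# automatically when a member fails.
mdadm --create /dev/md0 --level=6 --raid-devices=6 \
      --spare-devices=1 /dev/sd[b-h]
```

Running mdadm --monitor with a mail address then tells you when the
spare has been pulled in, even if you're not around.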
The *real* problem is how to handle and archive the huge amounts of data
we work with today. Audio is a joke in comparison to video data. The
cheapest solution so far for mid-term storage is still using more RAIDs,
unfortunately.
Flo
--
Machines can do the work, so people have time to think.
public key DA43FEF4
x-hkp://wwwkeys.eu.pgp.net