Paul Winkler wrote:
On Thu, Jul 06, 2006 at 01:43:40PM +0200, Pieter Palmers wrote:
Modern hard disks use a lot of write caching on the controller to
achieve decent performance. So if power goes down while there is data
in the write cache, that data is lost. The file system, however,
'thinks' the data has been written correctly. The result is file system
corruption.
FS corruption is no fun (I once spent two days recovering data
after bad RAM corrupted an ext2 fs... I ended up with every file I had
in lost+found). But the particular failure I mentioned was drive
hardware, no doubt about it. Lots of low-level IDE errors in
/var/log/messages. Couldn't fsck it, couldn't get any raw
data out of it with "dd if=/dev/hdb", nothing.
I didn't have any warning, either... no funny noises, no
problems or errors the last time I mounted it. *shrug*
Of course I'm not suggesting that there are no hardware failures, and
I'm really not questioning your judgment.
Just sharing my personal experience (own hardware and that of others),
which is that once you prune out the hard disk failures caused by power
outages, there is not much left. And in (almost) all of the failures
caused by a power outage you can re-use the hard disk perfectly after a
low-level format. And that is not due to bad sector relocation, but
simply because the CRC errors are cleared.
The OS cannot distinguish problems caused by an interrupted write
operation on the disk from 'real' hardware failures. The hard disk's
firmware treats the sector where the interruption occurred as a bad
sector, and reports it as such to the OS. So you also get a lot of
messages in /var/log/messages. This phenomenon occurs on the hard disk
itself, below the operating system, file system and IDE controller.
However, you should be able to get some raw data out of it using dd (or
ddrescue), as not all sectors are marked as 'bad'. That makes it
slightly less bad than a total hard disk crash.
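
To illustrate what I mean by getting the raw data out: below is a rough
Python sketch of the idea behind "dd conv=noerror,sync". It reads block
by block, and where a read fails it writes zeros and carries on, so the
readable sectors still end up at the right offsets in the image. The
names rescue_copy and device_reader are made up for this example, and
the 512-byte block size is an assumption. It is only a sketch; ddrescue
does this far more carefully (retries, smarter ordering, a log file).

```python
import os


def rescue_copy(read_block, total_size, out, block_size=512):
    """Copy total_size bytes to out, zero-filling blocks that fail to read.

    read_block(offset, size) is a hypothetical callable that returns bytes
    or raises OSError for an unreadable block. Returns the list of byte
    offsets of the blocks that could not be read.
    """
    bad_offsets = []
    offset = 0
    while offset < total_size:
        size = min(block_size, total_size - offset)
        try:
            data = read_block(offset, size)
        except OSError:
            # Unreadable block: remember where it was and keep going.
            data = b""
            bad_offsets.append(offset)
        # Pad failed or short reads with zeros so later blocks stay aligned.
        out.write(data.ljust(size, b"\0"))
        offset += size
    return bad_offsets


def device_reader(path):
    """Return (read_block, fd) for a device or file, using os.pread (Unix)."""
    fd = os.open(path, os.O_RDONLY)

    def read_block(offset, size):
        return os.pread(fd, size, offset)

    return read_block, fd
```

You would get a reader with device_reader("/dev/hdb"), pass it to
rescue_copy together with the device size and an output file opened in
"wb" mode, and close the fd afterwards. The returned offsets tell you
which blocks were zero-filled.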
Pieter
PS: another tip: the sysadmins at work once told me that, in their
experience, hard disks tend to fail when they are shut down. What
they dread the most is having to power down a server, even cleanly.