backup

The dumbfuckery of storage drive data integrity

With all the complicated standards surrounding data integrity, it is outrageous how useless it all still is.

I got a 4 TB WD Black harddisk that is probably 99.99999999% just fine, but it is telling me it will die in less than 24 hours and I better throw it in the trash.

If your harddisk suffers from a bad sector, the file occupying that sector is ruined. But you usually won’t even know that until you try to access the file manually, directly, intentionally, and that read fails.

Generally you might get a warning of some kind, so you know there’s a defect.
Now, to find out where it is, you need a special tool like GSmartControl that can show you the SMART error log, and then the 24 or so latest entries, all spammed with the same error, hopefully point to only one bad sector.
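In case you prefer the command line: GSmartControl is only a front-end for smartmontools, so something along these lines shows the same information (a hedged sketch; /dev/sdc is just what my drive happens to be called under Linux, the Windows build names devices differently):

  sudo smartctl -l error /dev/sdc   # ATA error log; the failing LBA is listed in each error entry
  sudo smartctl -a /dev/sdc         # full report, attribute table plus logs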

Then you would have to use the Windows command-line tool nfi.exe to find out which file, if any, is using that sector. Once you have that info, you know which file is damaged and thus lost.
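From memory, the call looks roughly like this; I am not certain about the exact argument format (whether it wants the sector number relative to the volume or to the whole disk), so check the tool’s own help output:

  nfi.exe d: 6960709   # d: being the affected volume; prints which file occupies that sector, if any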

But now the real insanity starts: unless a repair attempt on the sector succeeds, no number of failed and time-wasting read attempts can convince the firmware to reallocate the sector, i.e. to replace it with a spare or simply block access to it. I haven’t seen it happen, so I cannot even confirm it, but AFAIU reallocation only occurs the next time you WRITE to that sector. BUT… things like a secure erase of the file won’t do, because the sector also needs to be unused by the filesystem before you attempt to write to it.
AND apparently even secure erase still leaves a filesystem remnant, basically a nameless, undeletable garbage file, and it seems this is still considered part of the filesystem for God knows how long. Because I verified that the bad sector is not used by any file, and yet chkdsk gets stuck for all eternity on the file check at the start, as if that sector were still part of the filesystem contents. (And if you try chkdsk /b, it might take far too long to get to the damaged spot, and even then it is not guaranteed to succeed.)
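For reference, the two chkdsk variants in play here (run from an elevated Windows prompt, with D: standing in for the affected volume):

  chkdsk D: /f   # standard repair run; this is the one that freezes on the file check for me
  chkdsk D: /b   # NTFS only: clears the bad-cluster list and rescans every cluster (implies /r), hence the endless runtime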

So to clarify: failing to read a single damn spot on the disk for half an hour, which produces tons of SMART log spam about the exact location, does not convince the drive to replace that bit.

Also, DiskGenius can do targeted scans of the disk surface to identify damaged regions, but its repair attempt fails much like chkdsk does: an endless freeze. Sadly it doesn’t offer to replace the sector without that futile attempt. (It did manage to repair a weak sector, but since SMART does not report a new reallocation event, I have to assume it merely rewrote it in place, so it could start making trouble again soon.)

HDDScan offers such a surface test, too, but its output seems to keep reporting bad sectors long past the one that actually is bad.

But the most frustrating thing about all this pretend-SMARTness that is F.U.C.K.I.N.G.D.U.M.B. is that while I am being informed there is “Current Pending Sector Count = 1”, it does not tell me which sector is the pending one I haven’t identified yet, even though the drive would have to know its position simply from knowing it is there.
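To be fair, that attribute really is just a counter; the closest SMART itself gets to naming a location is the LBA in the error log or in a failed self-test entry. In smartmontools terms (again a hedged sketch with my device name):

  sudo smartctl -A /dev/sdc            # attribute table: 197 Current_Pending_Sector, raw value 1, no address given
  sudo smartctl -t short /dev/sdc      # queue a short self-test
  sudo smartctl -l selftest /dev/sdc   # a failed test shows an "LBA_of_first_error" column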

The more or less realistic options you have:

  • Go to Linux and use command-line hackery with smartctl and hdparm to manually, surgically overwrite the bad sector (a sketch of that follows this list).
  • Alter the disk’s partitioning to exclude the area containing the defective sector. (DiskGenius can tell you which megabyte region of the disk the sector falls into.)
  • Slow-format or secure-erase the whole disk. (But AFAIR I have had issues with this in the past: the system pretended the problems had been repaired, but they recurred after I copied all the data back onto the disk. Not during those writes, apparently, but afterwards.)
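For the first option, the whole dance boils down to something like this (a sketch with my sector number and device name; --write-sector destroys whatever is stored in that sector, so be sure you have the right one):

  sudo hdparm --read-sector 6960709 /dev/sdc    # should fail or hang on a genuinely pending sector
  sudo hdparm --write-sector 6960709 --yes-i-know-what-i-am-doing /dev/sdc   # overwrite with zeroes; the firmware either fixes or reallocates it
  sudo hdparm --read-sector 6960709 /dev/sdc    # should now return a block of zeroes
  sudo smartctl -A /dev/sdc                     # check whether the pending/reallocated counters changed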

So, since I have an installation of that relatively shitty Linux, I went there and after some basic hassle this command did the trick:
"sudo hdparm --write-sector 6960709 --yes-i-know-what-i-am-doing /dev/sdc"
It is described as an alias for "--repair-sector", and I thought ‘damn, not that again’. But it went so quickly, with no delay whatsoever, and reading the sector afterwards only yielded zeroes, so I was wondering whether it had even hit the correct sector; yet after checking back in Windows, the obstacle was gone!
So apparently it can be easy to accomplish such simple and intelligent things!
(But as I keep pointing out, Linux has its own infuriating shortcomings.)

I am still not over my skepticism, though. The reallocated sector count is still the same, so I have to wonder whether under Windows the repair attempt just failed for some reason or whether under Linux that method simply doesn’t trigger the counter.
Also, Offline Uncorrectable is still 1, and I read it is supposed to go back to 0 when everything is fine, rather than being a history-type statistic, although that info may be unreliable.
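For the record, these are the counters I keep checking after each attempt (attribute names as smartmontools reports them):

  sudo smartctl -A /dev/sdc | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'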


Now, what SHOULD happen in this data safety system, as an integral part, is that the OS warns you and tells you which file is affected by a defective sector and thus no longer in its original state. The drive should also reallocate sectors that are merely weak, because in my view there is no such thing as a repaired sector, as my own experience with this drive has shown. If a sector is weak, it cannot be trusted anymore, and there should be plenty of spares. It is no drama to lose even a couple of megabytes, although it will rarely come to that, and if it does, the harddisk is probably finished. But to turn a single sector into such an obstacle, that is nuts. A late surprise of data corruption and a huge drama. Very unprofessional and sloppy, kinda defying the purpose of accurately tracking what is happening with the data.

The existing system looks like it was designed not primarily to protect your data, but to convince you to buy a new product.

The silliness of consumers purchasing M-DISCs

Clever business move to offer archival-grade optical media (https://en.wikipedia.org/wiki/M-DISC) to the consumer market. But I have to chuckle/facepalm when I read people mentioning how their 10-year-old burned CDs aren’t readable anymore, so now, in the year 2015, when optical media are close to suffering the same fate the floppy disc once did, people are paying insane prices for optical discs that will reliably last hundreds of years. What a comedy. For much less they could have transferred their data to fresh media, if it’s so important to them not to just store it on a flash drive or harddisk. And who knows? Maybe their old discs would still be alright if they had bought quality ones back then.

BTW this isn’t even relevant to people who still play their music from CDs, because M-DISCs are available for DVD and BD only. And pressed CDs have good longevity anyway, so you’ll have your originals as an archive for a long time.

Even in the area of professional archival storage there are good technological competitors to optical M-DISC media. This is a niche technology, very useful for a very small number of applications. And it’s no surprise that such things appear towards the end of a technology’s life. Because it’s business: focused on profit, not on usefulness. It shows how well it works to sell people stuff they don’t need.

And not that it would be relevant given what I just pointed out, but their marketing is deceptive, too. (As marketing so often is.) They advertise more than 1000 years of data storage, but if you dig a little for details, you learn that this doesn’t mean error-free. The time span for which it is virtually guaranteed that all data is still safely readable is significantly shorter. … Yeah, the horrible truth is that 100 years from now you might have to go through the hassle of transferring all your M-DISC data to quantum crystal storage, haha. What a drag. … Oh wait, in 100 years you’re dead! … And in 1000 years your distant offspring won’t even remember there ever was such a thing as M-DISCs, because even the few organisations that once made use of them will have switched to a different technology long ago. If those organisations still exist.