ZFS – Great Idea, if you have backups…

So this article is about the Zettabyte File System (ZFS), a relatively new file system and one that is supposedly very resilient to errors. It has some great features to prevent data loss and is touted as the be-all and end-all of filesystems. Well, my experience is varied, both good and bad, but the reality for me is that if you want to take advantage of some of the features, ensure you have backups… of everything, because ultimately your data is not safe.

So a little history. Many years ago I decided I needed a storage device to keep all my important stuff (and less important stuff), and I was drawn to ZFS with its so often touted ‘self healing’ features. I built a new server with 16 x 3TB drives and configured it using RAIDZ2 (the sorta equivalent of RAID6) with 15 disks and a hot spare. I started moving my data from the myriad of external drives to the storage array. I then suffered my first issue… a drive died, silently at first. When I eventually spotted it I immediately issued a replace using the hot spare, which turned out to be not so ‘hot’ as you had to switch it in manually, and the array was recovered following a week-long resilver process.
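To put rough numbers on that layout, here is a minimal sketch of the arithmetic. It assumes nominal 3TB disks and ignores metadata, padding and TB/TiB differences, so the real usable figure would be somewhat lower. The class and method names are made up for illustration and are not part of any ZFS tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidzLayout:
    """Hypothetical helper describing a single RAIDZ vdev."""
    disks: int      # disks in the vdev (the hot spare is not counted)
    parity: int     # parity disks: 2 for RAIDZ2
    disk_tb: float  # nominal size of each disk in TB

    def usable_tb(self) -> float:
        # Nominal capacity left once parity is taken out.
        return (self.disks - self.parity) * self.disk_tb

    def survives(self, failed_disks: int) -> bool:
        # The vdev stays readable only while failures do not exceed parity.
        return failed_disks <= self.parity


# 16 drives in the chassis: 15 in the RAIDZ2 vdev plus 1 hot spare.
layout = RaidzLayout(disks=15, parity=2, disk_tb=3.0)

print(layout.usable_tb())   # nominal usable capacity in TB
print(layout.survives(2))   # two failed drives
print(layout.survives(3))   # three failed drives
```

The last two lines are the important ones: RAIDZ2 rides out two failed drives in the vdev, and a third takes it beyond what parity alone can rebuild.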

Several years went past without incident. Big 6kVA UPSes kept power outages from causing problems, and regular scrubs seemed to work. A minor issue happened on another drive failure, where the metadata was reported corrupt and to fix it I had to blow away the directory and its contents, which were not important as they could easily be recovered/rebuilt. This should have been the warning to me that all was not as safe as it should be, but I had bought into the hype and carried on blindly. Then came the undersea cable fire in Malta (well, the fire wasn’t undersea, but it was on the end of the undersea cable, directly under 4 substations which supplied my area)… out went the power for some 12+ hours, the generator ran out of fuel, the UPSes all ran out of battery and the host died, hot and hard. It was not pretty, and on return of power things were not happy. The reboot was fine for the RAID set, but ZFS was not happy: it wouldn’t mount, in fact it wouldn’t do anything, especially import. I called one of the FreeBSD devs, who suggested I try “zpool import -FfX storage”, and after a week of watching the drive activity I got the entire pool back without any visible errors.

I was sold. That was it, this was the file system that was unbreakable. I threw caution to the wind and, instead of keeping an identical second server, I started using the more useful features of ZFS. I created ZVOLs and built VMs, I created ‘timemachine’ drives for my Macs, and started to reap the short-term benefits of ZFS.

Then I moved across the world from Malta to Australia, back home, and shipped the servers through an international removalist. The servers, despite being prepared correctly and packaged properly, arrived damaged. Three (yes, three) drives were damaged, and as RAID6 (and RAIDZ2) can only cope with two failed drives I decided to try something… I byte copied all three corrupt drives to new drives, put the new drives back into the server/array and ran the import as before. Several weeks of ‘rebuild’ later the array came back online, again without data loss. RESULT!

That’s not the end of it though. After all this you’d think I would be raving about its resilience. Well, each time the issues had happened, the critical data structures on disk had not been affected, so recovery had been possible. Fast forward to March 2019: a drive died and I replaced it, then whilst resilvering on March 9, 2019 (just before midnight) a transformer blew up down the road, taking out all power. The UPSes did not kick in, and the generator did no good at all because the servers had already lost power the moment the mains went down. Power was restored a short time later and the zpool was reporting it could not be imported without rolling back 5 seconds of data… this of course was not an issue as I had been minimising writes from the moment the rebuild started.

I went to bed leaving it to rebuild…

At 6am on March 10, 2019, some idiot, drunk or on drugs, took out the power pole down the road from me, and the 11kV lines contacting the 240V lines ensured that the UPSes wouldn’t save a thing. Power went down, and on return of power the problems of ZFS became very, very apparent.

Here’s the issue (in sorta non-technical terms)… The file system is a big database, one with lots of redundancy and checks, but it has a fundamental flaw built in. You can see this flaw in a multitude of posts where the devs state, with resounding coherence to the party line, “The data on disk is always correct.” Well, it is and it isn’t… the data is all there, and it reports correct, but if one of the critical structures is corrupt (e.g. a spacemap, in my case), the metadata (the stuff that makes sure that your data is right and stays right) is deemed corrupt, and so ZFS in its wisdom pronounces the whole drive corrupt.
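The failure mode can be shown with a toy model. This is only a sketch under heavy assumptions: it is not ZFS’s on-disk format, the single metadata block merely stands in for a structure like a spacemap, and every name in it (ToyPool, strict_import and so on) is invented for the example.

```python
import hashlib


def checksum(payload: bytes) -> str:
    """SHA-256 hex digest, standing in for a block checksum."""
    return hashlib.sha256(payload).hexdigest()


class ToyPool:
    """Toy pool: checksummed file blocks plus one checksummed metadata block."""

    def __init__(self, files: dict):
        self.data_blocks = dict(files)
        self.data_sums = {name: checksum(body) for name, body in files.items()}
        # The metadata block just lists the file names; the root keeps its checksum.
        self.metadata = ",".join(sorted(files)).encode()
        self.root_sum = checksum(self.metadata)

    def metadata_ok(self) -> bool:
        return checksum(self.metadata) == self.root_sum

    def intact_files(self) -> list:
        # Files whose own data still matches the checksum recorded for it.
        return sorted(
            name for name, body in self.data_blocks.items()
            if checksum(body) == self.data_sums[name]
        )

    def strict_import(self) -> list:
        # A strict reader refuses everything once the metadata fails to verify.
        if not self.metadata_ok():
            raise OSError("metadata checksum mismatch: refusing to import pool")
        return self.intact_files()


pool = ToyPool({"a.txt": b"alpha", "b.txt": b"beta"})
pool.metadata = pool.metadata[:-3]  # simulate a partial write during a power cut

print(pool.intact_files())  # every data block still verifies
try:
    pool.strict_import()
except OSError as err:
    print(err)              # yet the pool as a whole is refused
```

In the toy, every file block still matches its checksum, but the strict import path gives up because the one block describing the pool is damaged. A more forgiving reader would have to skip that check and walk the data directly.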

So let’s reiterate that…

A small part of the data in the pool (drive)… just the right (or wrong, depending on your point of view) part got partly written because of a power outage, and now the entire pool (drive) cannot be mounted. All 36,964,899 files, some 21.2 terabytes of data. In fact, according to “zpool status” there are just 3 errors in total, and on examination with “zdb” it appears they are all checksum errors of “metaslab 122” because of the spacemap corruption. So many weeks later I’m still trying to recover the data. I’ve just got myself another 36TB of drive space after trying in-place recovery, but still no luck.

I don’t have backups, as I was moving stuff around and, as previously stated, had already thrown caution to the wind. The next step for me is to modify the code to ignore the checksum errors and see if I can ‘walk the dataset’ for all the files.

I’ll let you know how I get on, but with all the ZFS devs posting that there is no need for a “fsck” in ZFS as the disk is always right, I can only suggest that anyone thinking of deploying ZFS does so only if:

1/ You can make full backups, or

2/ You can afford to lose all the data

(And it should be noted, FreeBSD devs are advocating that the root file system should be on ZFS, and it is now actually the default when installing… good luck, laptop owners on the road…!)

Edit/Update: Added links to the news articles for the large power issues, and changed the date from 19th March to 10th March as my initial post gave the wrong date.

Update [2]: I posted this blog link to the FreeBSD mailing lists here: https://lists.freebsd.org/pipermail/freebsd-stable/2019-April/090988.html and unfortunately many chose to follow the same line of ‘ZFS is always right’ that I see elsewhere (e.g. the ZFS on Linux mailing lists)… which is part of the problem. To their credit though, a couple of the FreeBSD devs contacted me on-list with helpful suggestions (you can see these in the links) and others contacted me off-list with really helpful information. One even (off-list) pointed me at Klennet Storage Software ZFS-Recovery, which I have not had a chance to test yet (I need to set up a Windows 7 image on an external USB drive first), but if it delivers what it promises it is the missing link that ZFS needs (in my opinion).

Dive gear – The Do’s And Don’ts

Some of you will know I’ve been a diver for many years; the more astute of you will know of my love of underwater photography.

So a little about my policy on gear… I tend to choose a manufacturer after doing a bit of research and stick with it, for everything. It’s called brand loyalty…

For photographic equipment I went with Nikon, and have gear worth in excess of €25,000. Underwater housings: Sea and Sea, worth a not insignificant amount. Dive gear: Oceanic. Even my computer gear is all Apple (and no, I’m not a “fan boy”). I have just found that if you stick to a brand everything “just works”.

Well, unfortunately it seems I was wrong: brand loyalty is not a great thing with some brands, as they have no customer loyalty.

This, therefore, is the story of Oceanic. Regulators, BCD, computers (three of them), masks, fins, even wetsuits: I have all of them. Despite certain items being better from other manufacturers, I was sucked in by the “lifetime warranty” initially, and the deal was sealed when their “medium large” wetsuit size fit me perfectly.

Oceanic – Australia

In Australia I took my gear back, without fail, to Nautilus SCUBA of Brisbane, an authorized service center/dealer for Oceanic, and all was fine. I then moved from Brisbane to Canberra and found myself visiting Norm Green from Indepth SCUBA (a good friend, and a great dive shop), though this is where my problems started. They serviced my regulators one year, and some mixup resulted in the warranty being voided because I had supposedly not serviced the regulators that year… Of course I balked at this and persisted in chasing Norm over the issue, and after I showed receipts, and after numerous emails from him to Oceanic, the warranty was reinstated because I had kept to the service schedule over the years (it turns out a late submission of paperwork caused the problem).

Oceanic – Malta

Then in 2009 I moved to Malta, and searched out a local Oceanic dealer… world wide warranty? Pfft! From day one they told me there was no world wide warranty and I would have to pay in full for all servicing and parts, so I did, even when I had to stop diving because of a bout of cancer… Every year the regs, computer and BCD were serviced.

Eight years later I returned to Australia and went to Dive Jervis Bay to get my gear serviced, especially after getting wet and finding my regs had started free flowing. After waiting months for servicing and repair I was informed that the regulators were missing 2 parts, one of which was a critical O-ring, and, in the words of Dive Jervis Bay, I was lucky to be alive as the regs could have failed at any time.

The battery died on my Oceanic OC1 (not the first time), so I took it to Dive Jervis Bay and asked them to replace, test and service it. A couple of weeks and a few hundred dollars later it was returned to me and I booked a dive.

Thirty seconds into the dive I found the computer going into “calibrate compass” mode and buttons failing, then came the dreaded water droplets. I aborted the dive and waited the first dive out; for the second dive I went with a backup. On return to shore I gave the computer back to the shop and asked them to look at it. They said they sent it back to Oceanic.

Weeks later (6-8 weeks) I was informed the computer was out of warranty and was a write-off, as it was an obsolete model, and $1000+ would need to be paid for a replacement. I suggested they should reconsider, and several weeks later received the reply that no, that was that: a new computer at $1000, or I should go with another manufacturer. In the shop I was asked to consider the Suunto range.

Well, the upshot of all this: after months of asking for the return of my now dead computer it was returned to me, and finally tonight I got around to opening it up. To my astonishment I found the computer very obviously had not even been opened, as it was still full of water, and the reason for the flood was that the seal on the battery cover was damaged and had debris on it.

So the do’s and don’ts …

Don’t trust a world wide warranty, particularly by Oceanic. It’s not world wide, and it will be cancelled at the drop of a hat, even if it is not your (the consumer’s) fault.

Don’t trust authorized service agents (particularly in Europe) to actually safely service your gear, let alone honor service agreements.

Don’t trust the manufacturer or their authorized service agents to care about you respecting brand loyalty (they don’t give a crap, it’s all money to them.)

Do research what you’re buying.

Do research “authorized service centers” to see if they have mandatory training.

Do learn how to service your own gear so you can at least check the work done by the agent.

Don’t assume because you are paying top dollar for gear you’re getting top quality.

Don’t bother with brand loyalty. It used to be worth something, but nowadays it’s worth nothing; the only thing brands care about is the number of greenbacks you can give up.

Footnote

So as I don’t expect to hear anything from Oceanic or any other dive gear manufacturer, I’m now ridding myself of Oceanic stuff and going with whatever suits the purpose, from whichever manufacturer I feel is offering the best deal/value for money… starting with a new air-integrated computer.