ZFS – Great Idea, if you have backups…

So this article is about the Zettabyte File System (ZFS), a relatively new file system and one that is supposedly very resilient to errors. It has some great features to prevent data loss and is touted as the be-all and end-all of file systems. Well, my experience has been varied, both good and bad, but the reality for me is that if you want to take advantage of some of its features, ensure you have backups… of everything, because ultimately your data is not safe.

So, a little history: many years ago I decided I needed a storage device to keep all my important stuff (and less important stuff), and I was drawn to ZFS by its oft-touted ‘self-healing’ features, so I built a new server with 16x3T drives and configured it using RAIDZ2 (the sort-of equivalent of RAID6) with 15 disks and a hot spare. I started moving my data from the myriad of external drives to the storage array, and then I suffered my first issue… a drive died… silently at first. When I eventually spotted it I immediately issued a replace using the hot spare – which turned out to be not so ‘hot’, as you had to switch it in manually – and the array recovered following a week-long resilver process.

Several years went past without incident: big 6kVA UPSes kept power outages from causing problems, and regular scrubs seemed to work. A minor issue happened on another drive failure, where the metadata was reported corrupt; to fix it I had to blow away the affected directory and its contents, which were not important as they could easily be recovered/rebuilt. This should have been the warning to me that all was not as safe as it should be, but I had bought into the hype and carried on blindly. Then came the undersea cable fire in Malta (well, the fire wasn’t undersea, but it was at the end of the undersea cable, directly under 4 substations which supplied my area)… out went the power… for some 12+ hours. The generator ran out of fuel, the UPSes all ran out of battery and the host died, hot and hard. It was not pretty, and on return of power things were not happy. The reboot was fine for the RAID set, but ZFS was not: the pool wouldn’t mount, in fact it wouldn’t do anything, especially import. Calling one of the FreeBSD devs, it was suggested I try “zpool import -FfX storage”, and after a week of watching the drive activity I got the entire pool back without any visible errors.

I was sold; that was it, this was the file system that was unbreakable. I threw caution to the wind, and instead of keeping an identical backup server I started using the more useful features of ZFS. I created ZVOLs and built VMs, I created ‘Time Machine’ drives for my Macs, and started to reap the short-term benefits of ZFS.

Then I moved across the world from Malta back home to Australia, and shipped the servers through an international removalist. The servers, despite being prepared correctly and packaged properly, arrived damaged. Three (yes, three) drives were damaged, and as RAID6 (and RAIDZ2) can only cope with two failed drives I decided to try something… I byte-copied all three corrupt drives to new drives, put the new drives back into the server/array and ran the import as before. Several weeks of ‘rebuild’ and the array came back online – again without data loss… RESULT!

That’s not the end of it though; after all this you’d think I would be raving about its resilience. Well, each time these issues happened the critical data structures on disk were not affected, so recovery was possible. Fast forward to March 2019: a drive died and I replaced it, then whilst resilvering on March 9, 2019 (just before midnight) a transformer blew up down the road, taking out all power. The UPSes did not kick in, and the generator did no good at all as power was lost the moment the mains went down. Power was restored a short time later and the zpool was reporting it could not be imported without rolling back 5 seconds of data… this of course was not an issue, as I had been minimising writes from the moment the rebuild started.

I went to bed leaving it to rebuild…

6am, March 10, 2019: some idiot, drunk or on drugs, took out the power pole down the road from me, and the 11kV lines contacting the 240V lines ensured that the UPSes wouldn’t save a thing. Power went down, and on its return the problems of ZFS became very, very apparent.

Here’s the issue (in sorta non-technical terms)… The file system is a big database, one with lots of redundancy and checks, but it has a fundamental flaw built in. You can see this flaw in a multitude of posts where the devs state, with resounding coherence to the party line, “The data on disk is always correct.” Well, it is and it isn’t… the data is all there, and it reports correct, but if one of the critical structures is corrupt (e.g. a spacemap, in my case) the metadata (the stuff that makes sure your data is right and stays right) is deemed corrupt, and so ZFS in its wisdom pronounces the whole pool corrupt.
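To illustrate the principle (this is a conceptual sketch only, not ZFS code): ZFS stores a checksum for every metadata block, and a block whose contents no longer match its checksum is treated as corrupt, no matter how much of the pool is otherwise intact. A single changed byte is enough:

```shell
# Conceptual sketch: a stand-in "spacemap" block and its checksum.
# One flipped byte fails verification, just as one corrupt critical
# structure causes ZFS to declare the whole pool corrupt.
good=$(printf 'spacemap-data' | sha256sum | cut -d' ' -f1)
bad=$(printf 'Xpacemap-data' | sha256sum | cut -d' ' -f1)
if [ "$good" != "$bad" ]; then
    echo "checksum mismatch: metadata deemed corrupt"
fi
```

The pool-wide consequence is the ZFS design decision the devs defend: rather than mount with metadata it cannot verify, it refuses the import entirely.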

So let’s reiterate that…

A small part of the data in the pool (drive)… just the right (or wrong, depending on your point of view) part, got partially written because of a power outage, and now the entire pool (drive) cannot be mounted. All 36,964,899 files, some 21.2 terabytes of data… In fact, according to “zpool status” there are just 3 errors in total, and examination with “zdb” suggests they are all checksum errors of “metaslab 122” because of the spacemap corruption. So, many weeks later, I’m still trying to recover the data – I’ve just got myself another 36T of drive space after trying in-place recovery, but still no luck.

I don’t have backups, as I was moving stuff around and, as previously stated, had already thrown caution to the wind. The next step for me is to modify the code to ignore the checksum errors and see if I can ‘walk the dataset’ for all the files.
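For reference, these are the import knobs I have leaned on in the stories above, in roughly escalating order of desperation (pool name “storage” as used throughout; run these only on a pool you can afford to damage further, and ideally against byte-copies of the drives):

```shell
zpool import -o readonly=on -f storage  # try read-only first: no new writes
zpool import -Ff storage                # recovery mode: discard the last few transactions
zpool import -FfX storage               # extreme rewind - what rescued me after the fire
zdb -e -m storage                       # inspect metaslabs/spacemaps of the exported pool
```

These require a real (exported/unimportable) pool, so they are shown for illustration rather than as something you can dry-run.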

I’ll let you know how I get on, but with all the ZFS devs posting that there is no need for a “fsck” in ZFS because the disk is always right, I can only suggest that anyone thinking of deploying ZFS do so only if:

1/ You can make full backups, or

2/ You can afford to lose all the data

(And it should be noted that the FreeBSD devs are advocating the root file system be on ZFS, and it is now actually the default when installing… good luck, laptop owners on the road…!)

Edit/Update: Added links to the news articles for the large power issues; changed the date from 19th March -> 10th March as my initial post detailed it incorrectly.

Update [2]: I posted this blog link to the FreeBSD mailing lists here: https://lists.freebsd.org/pipermail/freebsd-stable/2019-April/090988.html and unfortunately many chose to follow the same ‘ZFS is always right’ line that I see elsewhere (e.g. the ZFS on Linux mailing lists)… which is part of the problem. To their credit, though, a couple of the FreeBSD devs contacted me on-list with helpful suggestions (you can see these in the links) and others contacted me off-list with really helpful information. One even (off-list) pointed me at Klennet Storage Software’s ZFS-Recovery, which I have not had a chance to test yet (given that I need to set up a Windows 7 image on an external USB drive) – but if it delivers what it promises, it is the missing link that ZFS needs (in my opinion).

Dive gear – The Do’s And Don’ts

Some of you will know I’ve been a diver for many years, the more astute of you will know of my love of underwater photography.

So, a little about my policy on gear… I tend to choose a manufacturer after doing a bit of research and stick with it, for everything. It’s called brand loyalty…

Photographic equipment: I went with Nikon, and have gear worth in excess of €25,000, plus Sea and Sea underwater housings worth a not-insignificant amount. Dive gear: Oceanic. Even my computer gear is all Apple (and no, I’m not a “fan boy”). I have just found that if you stick to a brand, everything “just works”.

Well, unfortunately it seems I was wrong: brand loyalty is not a great thing to trust with some brands, as they have no customer loyalty.

This, therefore, is the story of Oceanic. Regulators, BCD, computers (three of them), masks, fins, even wetsuits – I have all of it, despite certain items being better from other manufacturers. I was sucked in by the “lifetime warranty” initially, and the deal was sealed when their “medium large” wetsuit size fit me perfectly.

Oceanic – Australia

Without fail, in Australia I took my gear back to Nautilus SCUBA of Brisbane, an authorized service center/dealer for Oceanic, and all was fine. I then moved from Brisbane to Canberra and found myself visiting Norm Green of Indepth SCUBA, who is both a good friend and runs a great dive shop – though this is where my problems started. They serviced my regulators one year, and some mixup resulted in the warranty being voided because I had supposedly not serviced the regulators that year… this of course I balked at, and I persisted in chasing Norm over the issue. After showing receipts and numerous emails from him to Oceanic, the warranty was reinstated because I had kept the service records over the years (it turns out a late submission of paperwork caused the problem).

Oceanic – Malta

Then in 2009 I moved to Malta and searched out a local Oceanic dealer… worldwide warranty? Pfft! From day one they told me there was no worldwide warranty and I would have to pay in full for all servicing and parts, so I did – even when I had to stop diving because of a bout of cancer… Every year the regs, computer and BCD were serviced.

8 years later I returned to Australia and went to Dive Jervis Bay to get my gear serviced… especially after getting wet and finding my regs had started free-flowing. After waiting months for servicing and repair, I was informed that the regulators were missing two parts, one of which was a critical O-ring, and, in the words of Dive Jervis Bay, I was lucky to be alive as the regs could have failed at any time.

The battery died on my Oceanic OC1 (not the first time), so I took it to Dive Jervis Bay and asked them to replace, test and service it. A couple of weeks and a few hundred dollars later it was returned to me and I booked a dive.

30 seconds into the dive I found the computer going into “calibrate compass” mode and buttons failing, then the dreaded water droplets. Dive aborted; I sat the first dive out, and on the second dive I went with a backup. On return to shore I gave the computer back to the shop and asked them to look at it; they said they sent it back to Oceanic.

Weeks later (6-8 weeks) I was informed the computer was out of warranty and was a write-off, as it was an obsolete model, and $1000+ would need to be paid for a replacement. I suggested they should reconsider, and several weeks later received the reply that no, that was that: a new computer at $1000, or I should go with another manufacturer. In the shop I was asked to consider the Suunto range.

Well, the upshot of all this: after months of asking for the return of my now-dead computer it was returned to me, and finally tonight I got around to opening it up. To my astonishment I found the computer very obviously had not even been opened, as it was still full of water, and the reason for the flood was that the seal on the battery cover was both damaged and had debris on it.

So the do’s and don’ts …

Don’t trust a worldwide warranty, particularly Oceanic’s – it isn’t one, and it will be cancelled at the drop of a hat, even if it is not your (the consumer’s) fault.

Don’t trust authorized service agents (particularly in Europe) to actually safely service your gear, let alone honor service agreements.

Don’t trust the manufacturer or their authorized service agents to care about you respecting brand loyalty (they don’t give a crap, it’s all money to them.)

Do research what you’re buying.

Do research “authorized service centers” to see if they have mandatory training.

Do learn how to service your own gear so you can at least check the work done by the agent.

Don’t assume because you are paying top dollar for gear you’re getting top quality.

Don’t bother with brand loyalty; it used to be worth something, but nowadays it’s worth nothing – the only thing brands care about is the number of greenbacks you can give up.

Footnote

So, as I don’t expect to hear anything from Oceanic or any other dive gear manufacturer, I’m now ridding myself of Oceanic stuff and going with whatever suits the purpose, by whichever manufacturer I feel is offering the best deal/value for money… starting with a new air-integrated computer.

The IoT should really be IoSI (Internet of Security Issues)

The Internet of Things

So here I am, seeing issues, reading about issues and trying to stop issues in the Internet of Things… Every day someone seems to be publishing articles on the issues, and people are getting more aware (you’d think!), but there seems to be no real movement.

Some of my readers will know what I do for my day job; for those that don’t, I wrote the SORBS anti-spam system… not quite the most hated, but some who should know better have said they just want me dead, then SORBS dead, then me killed again just to be sure I’m actually dead.  Several years ago I spent Christmas sitting in front of my computers rewriting part of the system, particularly the part that finds “bad stuff” and reports it (e.g. open-relay servers), and whilst scanning hosts that were actively trying to send spam and/or viruses to me I came across the web page of a fridge.  The page half loaded before it became completely unresponsive, and tracing it I found it on an IP address that appeared to be in Rome (Italy)…  When I reported my finding of a ‘fridge spamming’ to my boss all hell broke loose: blog articles were written, front pages were held, and suddenly the world knew about ‘Fridges Spamming‘.  Shortly thereafter we were debunked by our main competitor of the time, who asserted it wasn’t possible; the article however sparked off massive research into, and watching of, the technology from a security stance.

In July of the same year a bunch of researchers at a university found that the premise of the ‘debunking’ was actually false, and that with a specific sequence of commands it was possible to get the fridge concerned into a system ‘admin/debug’ mode that allowed a remote attacker to use the device as a simple proxy server and install other “apps”.  This largely went unnoticed in the IoT industry with respect to the original report; I never understood why… perhaps someone can explain that to me? 🙂

3 years later…

One would think we would have learned something. We certainly have seen more of these types of attacks – not always for spam, but just as a device to get into a network, to provide the doorway.  Indeed the attackers have pretty much made an art out of it, using combinations of direct hacks, social engineering to gain access or persuade users to install things, and even stealing devices…  The lists and lengths seem endless, especially when you consider who is doing this sort of thing and even who is paying whom…   We’ve all heard about Trump and Russia and the controversy; well, there are teams of hackers in Russia whose sole income is to break into systems and steal secrets.  It’s not a stretch to imagine that the two are not unconnected…  Personally I don’t go in for the conspiracy theories, but I can tell you there are companies and persons of interest that do pay for the services of such teams, and not just Russian ones – there are European teams, Chinese teams, American teams, etc.

The result is a lot more tech out there, all with security issues and all trying to keep market share, by innovating or by destroying the competition.

So why are we helping these people along?  Why are we allowing companies to circumvent privacy laws?  Why are they even trying?  Why are there more and more companies dealing with security remediation rather than companies dealing with the actual problem…?

All questions for you the reader (and hopefully some people that can effect change.)

So what is this blog post about? Why did you write it?

Well, quite simply, I chase down security patches for my services…  You see, I still manage SORBS, and recently we moved some of the servers to a new datacenter; as a consequence I changed a lot of security settings to make the systems more secure.  The fallout of this was that I completely re-wired my home office network, and the only thing on my network that was not ‘secured’ (i.e. may have issues) was my wireless network.

Originally I had an OpenVPN connection for every service over the wireless that was an ‘authorised machine’ and a straight session login for controlling access.  I deliberately set the whole network to ‘Open’ (ie unencrypted) to remind people using it that everything can be watched so if it’s important, use HTTPS (or use the OpenVPN) etc.

I decided to switch the network to WPA2-Enterprise for authorised users, and to use a Juniper NAC to provide a captive portal and control the logins etc…  I didn’t account for the ridiculous cost of the licenses for the Juniper NAC, so even though I picked up a brand new IC4500 for less than €70 I couldn’t use it, because the most basic license (to allow 25 devices to log in) is over €1200, and using the captive portal aspect (which is what I actually wanted) was going to cost over €4500…   So I pulled it apart and found that the IC4500 is just a dual-core, 1-RU server with a couple of gigs of RAM, an 80G hard drive and 2 Gigabit Ethernet ports.  After changing the drive to something larger and a bit of fiddling, I put the OS I have been developing on it (BSD Server UNIX – BSDSUX for short) and now I have a captive portal of my own making…  The last thing was to get the access points to do both open security and WPA2-Enterprise at the same time, so that when logged in you get forced off the open wireless and allowed onto the secure wireless.

So finally to the point…

The Internet of Security Issues

Not so long ago a number of security vulnerabilities were hitting the headlines, in particular ‘ShellShock’, and since I run Amped Wireless AP20000G‘s around my home, which I happen to know run Linux, I was a little concerned.  I had the latest firmware on the devices, and it was dated a few years earlier (13 Dec 2012), so I emailed Amped Wireless about the issue and wasn’t actually told anything except that they’d review the bug.  Time went by, more and more issues came up, and still no firmware… the latest one is CVE-2017-6074, which was introduced to the Linux kernel way back in 2006 – in fact the vulnerability description states this:

The oldest version that was checked is 2.6.18 (Sep 2006), which is
vulnerable. However, the bug was introduced before that, probably in the first release with DCCP support (2.6.14, Oct 2005).

Now the clueful of you would know that this is a local privilege escalation issue and when it comes to routers, APs etc you’d actually have to get on the device to exploit it.  The same clueful will know that’s not as difficult as it might sound.

So, figuring that I’m never going to get the firmware update I need/want, I might as well go about hacking the router myself and building my own firmware that can indeed work with the IC4500, and finally finish securing my network to the level I want.

(And for those fed up with reading… if you haven’t worked it out: it’s 2017, the access point is classed as part of the ‘Internet of Things’, it is vulnerable to hacking on multiple fronts, and 5 years later I can’t get an update to the firmware – even though they are still selling these devices in shops!!!! … The gory horror for the techs is coming, so keep reading if you want…)

First things first when going down this path… research the hardware and see what’s available.  The website ‘WikiDevi‘ is great for this and provides the following details:

CPU1: Realtek RTL8198 (620 MHz)
FLA1: 8 MiB (Macronix MX25L6406EM2I-12G)
RAM1: 64 MiB (Hynix H5PS5162GFR-S6C)

WI1 chip1: Realtek RTL8192DR
WI1 802dot11 protocols: an
WI1 MIMO config: 2×2:2
WI1 antenna connector: RP-SMA
WI2 chip1: Realtek RTL8192CE
WI2 802dot11 protocols: bgn
WI2 MIMO config: 2×2:2
WI2 antenna connector: RP-SMA

ETH chip1: Realtek RTL8198
Switch: Realtek RTL8198
LAN speed: 10/100/1000
LAN ports: 4
WAN speed: 10/100/1000
WAN ports: 1

This also tells me that normal OpenWrt support is not available (they mostly don’t support Realtek devices)… but more looking (and the WikiDevi page now says it) shows there is Realtek support by some authors.  Looking up the chips I also found there is JTAG support and a serial console for debugging, so I got to work with my screwdriver and soldering iron, and this was the result…

Applying power then produced the following in a minicom session:

Booting...?
========== SPI =============
SDRAM CLOCK:181MHZ
 ------------------------- Force into Single IO Mode ------------------------ 
|No chipID  Sft chipSize blkSize secSize pageSize sdCk opCk      chipName    |
| 0 c22017h  0h  800000h  10000h   1000h     100h   86   30   MX6405D/05E/45E|
 ---------------------------------------------------------------------------- 
Reboot Result from Watchdog Timeout!

---RealTek(RTL8198)at 2012.04.12-16:11+0800 version v1.2 [16bit](620MHz)
no sys signature at 00010000!
no sys signature at 00020000!
no sys signature at 00030000!
no sys signature at 00140000!
no rootfs signature at 000E0000!
no rootfs signature at 000F0000!
no rootfs signature at 00130000!
no rootfs signature at 00240000!
Jump to image start=0x80500000...
decompressing kernel:
Uncompressing Linux... done, booting the kernel.
done decompressing kernel.
start address: 0x80003640
RTL8192C/RTL8188C driver version 1.6 (2011-07-18)



Probing RTL8186 10/100 NIC-kenel stack size order[3]...
chip name: 8196C, chip revid: 0
NOT YET
eth0 added. vid=9 Member port 0x1...
eth1 added. vid=8 Member port 0x10...
eth2 added. vid=9 Member port 0x2...
eth3 added. vid=9 Member port 0x4...
eth4 added. vid=9 Member port 0x8...
[peth0] added, mapping to [eth1]...
init started: BusyBox v1.13.4 (2012-12-13 11:08:29 CST)
Init Start...
Init bridge interface...
killall: smbd: no process killed
killall: nmbd: no process killed
basename(1)
basename(2 /sys/block/sda)
basename(2 /block/sda)
basename(2 /sda)
basename(3 sda)
basename(1)
basename(2 /sys/block/sda)
basename(2 /block/sda)
basename(2 /sda)
basename(3 sda)
basename(1)
basename(2 /sys/block/sda/sda1)
basename(2 /block/sda/sda1)
basename(2 /sda/sda1)
basename(2 /sda1)
basename(3 sda1)
basename(1)
basename(2 /sys/block/sda/sda1)
basename(2 /block/sda/sda1)
basename(2 /sda/sda1)
basename(2 /sda1)
basename(3 sda1)
try_mount(1) sda1, /var/tmp/usb/sda1
CMD: /bin/ntfs-3g /dev/sda1 /var/tmp/usb/sda1 -o force

Error opening '/dev/sda1': No such device or address
Failed to mount '/dev/sda1': No such device or address
Either the device is missing or it's powered down, or you have
SoftRAID hardware and must use an activated, different device under
/dev/mapper/, (e.g. /dev/mapper/nvidia_eahaabcc1) to mount NTFS.
Please see the 'dmraid' documentation for help.
Init Wlan application...

WiFi Simple Config v2.3 (2011.11.08-13:04+0000).

Register to wlan0
Register to wlan1
route: SIOCDELRT: No such process
iwcontrol RegisterPID to (wlan0)
iwcontrol RegisterPID to (wlan1)
$$$ eth1 & eth0 up $$$
IEEE 802.11f (IAPP) using interface br0 (v1.7)
#

As one can see, we’re straight in at a root prompt (no login – but hey, you need to physically connect to it with a soldering iron…), and we can see it’s running BusyBox (which means it’s running ash, not bash, so not vulnerable to Shellshock – nice of the company to tell me!?!?!)…  But confirmed….

# x='() { :;}; echo VULNERABLE' ash -c : 
#

So what about the latest bug that goes back to 2006… well…

# cat /proc/version   
Linux version 2.6.30.9 (kevinlin@localhost.localdomain) (gcc version 3.4.6-1.3.6) #603 Thu Dec 13 15:14:20 CST 2012

That would be a yes, then…  In fact we can see that this OS was made with an old version of the Realtek SDK:

# cat /etc/version
RTL8198 v1.0 --  Thu Dec 13 15:13:43 CST 2012
The SDK version is: Realtek SDK v2.5-r7984
Ethernet driver version is: 7953-7929
Wireless driver version is: 7977-7977
Fastpath source version is: 7873-6572
Feature support version is: 7927-7480

So my next trick is to work out which GPIO pins I need to manipulate to get the power output control of the Skyworks (SiGe) SE5004L / 5004L power amplifiers under my control, but that’s digressing from the topic of this post.  Poking around looking for those details, I found something else rather interesting…

# ps -ax
  PID USER       VSZ STAT COMMAND
    1 root      1576 S    init      
    2 root         0 SW<  [kthreadd]
    3 root         0 SW<  [ksoftirqd/0]
    4 root         0 SW<  [events/0]
    5 root         0 SW<  [khelper]
    8 root         0 SW<  [async/mgr]
   61 root         0 SW<  [kblockd/0]
   71 root         0 SW<  [khubd]
   88 root         0 SW   [pdflush]
   89 root         0 SW<  [kswapd0]
  649 root         0 SW<  [mtdblockd]
  870 root     13760 S    /bin/smbd -D -s /var/smb.conf 
  878 root     13808 S    /bin/smbd -D -s /var/smb.conf 
  882 root      6508 S    /bin/nmbd -D -s /var/smb.conf 
  902 root       960 S    iapp br0 wlan0 wlan1 
  913 root      1260 S    wscd -start -c /var/wsc-wlan1.conf -w wlan1 -fi /var/
  917 root       984 S    iwcontrol wlan0 wlan1 
  942 root      1008 S    dnrd --cache=off -s 168.95.1.1 
  951 root       956 S    reload -k /var/wlsch.conf 
  984 root      2168 S    webs 
  985 root      1584 S    -/bin/sh 
 1021 root      1576 R    ps -ax 
#

… That little thing that says “dnrd --cache=off -s 168.95.1.1”… This program is a DNS relay server, i.e. something to help you resolve the names we know and are used to, like “www.microsoft.com”, into the four-octet numbers that computers deal with, called ‘IP addresses’.  Now, the reason I’m pointing it out is that 168.95.1.1 is not something I have configured, and it is not something on my network, so it piqued my curiosity.  It turns out it belongs to a Taiwanese company, “Chunghwa Telecom Co., Ltd”:

$ host 168.95.1.1
1.1.95.168.in-addr.arpa domain name pointer dns.hinet.net.
$ whois hinet.net

.
.
.

   Server Name: HINET.NET.TW
   Registrar: MELBOURNE IT, LTD. D/B/A INTERNET NAMES WORLDWIDE
   Whois Server: whois.melbourneit.com
   Referral URL: http://www.melbourneit.com.au


   Domain Name: HINET.NET
   Registrar: NETWORK SOLUTIONS, LLC.
   Sponsoring Registrar IANA ID: 2
   Whois Server: whois.networksolutions.com
   Referral URL: http://networksolutions.com
   Name Server: ANS1.HINET.NET
   Name Server: ANS2.HINET.NET
   Status: ok https://icann.org/epp#ok
   Updated Date: 02-feb-2017
   Creation Date: 19-mar-1994
   Expiration Date: 20-mar-2018

.
.
.

Domain Name: HINET.NET
Registry Domain ID: 2854475_DOMAIN_NET-VRSN
Registrar WHOIS Server: whois.networksolutions.com
Registrar URL: http://networksolutions.com
Updated Date: 2017-03-05T15:11:26Z
Creation Date: 1994-03-19T05:00:00Z
Registrar Registration Expiration Date: 2018-03-20T04:00:00Z
Registrar: NETWORK SOLUTIONS, LLC.
Registrar IANA ID: 2
Registrar Abuse Contact Email: abuse@web.com
Registrar Abuse Contact Phone: +1.8003337680
Reseller: 
Domain Status: ok https://icann.org/epp#ok
Registry Registrant ID: 
Registrant Name: Internet Dept., DCBG, Chunghwa Telecom Co., Ltd.
Registrant Organization: Internet Dept., DCBG, Chunghwa Telecom Co., Ltd.
Registrant Street: Data-Bldg, No. 21 Sec.1, Hsin-Yi Rd.
Registrant City: Taipei
Registrant State/Province: Taiwan
Registrant Postal Code: 100
Registrant Country: TW
Registrant Phone: +886.223444720
Registrant Phone Ext: 
Registrant Fax: +886.223960399
Registrant Fax Ext: 
Registrant Email: vnsadm@hinet.net
Registry Admin ID: 
Admin Name: Internet Dept., DCBG, Chunghwa Telecom Co., Ltd.
Admin Organization: Internet Dept., DCBG, Chunghwa Telecom Co., Ltd.
Admin Street: Data-Bldg, No. 21 Sec.1, Hsin-Yi Rd.
Admin City: Taipei
Admin State/Province: Taiwan
Admin Postal Code: 100
Admin Country: TW
Admin Phone: +886.223444720
Admin Phone Ext: 
Admin Fax: +886.223960399
Admin Fax Ext: 
Admin Email: vnsadm@hinet.net

So not only is this access point vulnerable to hacking, it’s also sending details of every site I visit back to a server in Taiwan…  Well, not quite, because unlike most home users I am using my own DNS servers and have specifically blocked the access points from talking to the Internet – but I am not your average home user.  That leads me to the following conclusion, which some will find scary…

The Conclusion…

The biggest current threat to our networks, our privacy, and our electronic identities (including funds) is the mass of Internet of Things devices that have not been patched.

This threat is massive, as even the clueful people out there often can’t patch, because the companies selling the devices are not providing security fixes – their profit comes from getting new devices out there, not fixing old ones.

It’s even bigger because most of the world are not techs; they don’t know how to update the firmware, or where it would even be available if they did.

…Yet we’re all connecting up to the Internet, we’re all buying these boxes from household temperature controls available on your phone to Smart TVs and Fridges… even ‘Smart Bulbs‘!

All of which have the ability to run code, all of which have potential security issues, and all of which can provide the unethical people out there ‘doorways into your home’.

 

Converting h.265 (HEVC) to h.264 (AVC)

A quick techie entry for anyone using the newer h.265 codec but unable to use it in their players (e.g. torrenting h.265-encoded files then trying to play them via Plex and Roku).

Roku and other media players don’t support h.265, and as such any attempt to play h.265-encoded files will result in an ‘Unable to play file’ error, so you might want to convert the files to another format such as h.264.  To do this you need ffmpeg; however, ffmpeg can be a little difficult to work with, especially as it has so many options, so I wrote a little Perl script to ‘mass convert’ all files in the current directory from h.265 to h.264 encoding.  It is published here for those on UNIX systems (or those who know how to install Perl on Windows) to make life a little easier:

#!/usr/bin/perl
#
# Mass-convert h.265 (HEVC) files in the current directory to h.264 (AVC).
# Originals are moved into ./h265 before conversion so a failed run can be
# undone.  Field positions in the ffprobe CSV output (codec name in column 3,
# codec type in column 6) are as produced by my ffmpeg build.

use strict;
use warnings;

open DIR, "ls -1 |" or die("Unable to list directory! ($!)");
while (<DIR>)
{
        chomp;
        next if ( -d "$_"); # skip directories
        next unless ( -r "$_"); # if it's not readable skip it!
        my $file = $_;
        open PROBE, "ffprobe -show_streams -of csv '$file' 2>/dev/null|" or die ("Unable to launch ffprobe for $file! ($!)");
        my ($v, $a, $s, @c) = (0,0,0);
        while (<PROBE>)
        {
                chomp;
                my @streaminfo = split(/,/, $_);
                next unless (defined $streaminfo[5]); # too short to be a stream line
                push(@c, $streaminfo[2]) if ($streaminfo[5] eq "video");
                $a++ if ($streaminfo[5] eq "audio");
                $s++ if ($streaminfo[5] eq "subtitle");
        }
        close PROBE;
        $v = scalar @c;
        if ($v == 1 and $c[0] eq "ansi")
        {
                warn("Text file detected, skipping...\n");
                next;
        }
        warn("$file: Video Streams: $v, Audio Streams: $a, Subtitle Streams: $s, Video Codec(s): " . join (", ", @c) . "\n");
        if ($v > 1)
        {
                warn("$file has more than one video stream, bailing!\n");
                next;
        }
        if ($v and $c[0] eq "hevc")
        {
                warn("HEVC detected for $file ...converting to AVC...\n");
                system("mkdir -p h265");
                my @params = ("-hide_banner", "-threads 2");
                push(@params, "-map 0") if ($a > 1 or $s > 1 or $v > 1);
                push(@params, "-c:a copy") if ($a);
                push(@params, "-c:s copy") if ($s);
                push(@params, "-c:v libx264 -pix_fmt yuv420p") if ($v);
                if (system("mv '$file' 'h265/$file'"))
                {
                        warn("Error moving $file -> h265/$file\n");
                        next;
                }
                if (system("ffmpeg -xerror -i 'h265/$file' " . join(" ", @params) . " '$file' 2>/dev/null"))
                {
                        warn("FFMPEG ERROR.  Cannot convert $file restoring original...\n");
                        system("mv 'h265/$file' '$file'");
                        next;
                }
        } else {
                warn("$file doesn't appear to need converting... Skipping...\n");
        }
}
close DIR;


Enjoy!

A history of my experience with FreeBSD and a warning to users….

So, a rant about how a great project can go bad, how it’s still doing things that should never be done, and why I’ve decided to ‘fix it myself’ or move away from it completely.

Back in 2003 Paul Vixie forced me into using FreeBSD on one of my servers.  It was not a welcome change; I was an avid Linux user until that point… and it didn’t go well.  I started on 4.x, found there was no threading support, so “upgraded” the system to 5.x… which went badly, very badly, and every upgrade through 5.x was as bad: partly because of what I did, partly because of my lack of knowledge, and partly because of system limitations.

Out came 6.0 and I started working with it, and soon I had a whole slew of machines on 6.0; with 6.1 and 6.2 things only got better.  I had build servers, I had package servers, and I could boot one of the servers off the network and have it completely re-installed with the latest OS, patches and packages within 23 minutes (bare metal to built, configured and in production in 23 minutes!)

Then came 7.0 and my ‘burnout’: personal and professional life clashed, I burnt out, and my technical issues took a back burner.  Before you knew it 7.3 and 7.4 were out and I had sold my company… and I was back working on getting things patched and upgraded.  However, some major changes had happened and the ‘ports’ tree no longer worked on 6.x hosts, so those systems were frozen: no new security patches, no upgrades.  Along with the sale of the company came new hope, new hardware, and an opportunity to upgrade by replacing the hardware.  The new hardware was installed with 7.3 (as this was all that was available on Softlayer), and then my attention was diverted to getting my software upgraded to a new major revision; with it my priorities changed from sys-admin work to developer, and the older systems remained.

Not long later the company that ‘bought me’ ‘sold me’ to another (my current) employer, Proofpoint Inc, and new priorities were given… along with more new servers.  The result was 8.x systems being installed, and with FreeBSD’s move to ‘bmake’ more stuff changed in the ports tree, again making it non-working on pre-7.4 systems.  More things got changed/patched on my servers and I ended up with new hardware again, this time running 9.0 and 9.1.  At this point in time (2013) I had the following versions of FreeBSD in production:

  • FreeBSD 6.0
  • FreeBSD 6.1
  • FreeBSD 6.2
  • FreeBSD 7.2
  • FreeBSD 7.3
  • FreeBSD 8.1
  • FreeBSD 8.2
  • FreeBSD 8.3
  • FreeBSD 8.4
  • FreeBSD 9.0
  • FreeBSD 9.1

As any sys-admin can guess, that was a nightmare.  Further, Proofpoint has policy and Puppet: policies about how things are managed, and Puppet to manage everything.  It was suggested that my systems should be managed by Puppet, so after Oct 2013, when the databases were finally migrated to the new hardware and I could work on moving everything off old hardware and onto new OSes and patches, I set up a Puppet server, a number of build servers and a test suite, all of my own creation and similar to what I had done in 2005, to take back control.  I also ended up with FreeBSD 9.2 on some servers, so I decided I would standardise on:

  • FreeBSD 8.4
  • FreeBSD 9.0
  • FreeBSD 9.1
  • FreeBSD 9.2

…at least until I could spend the time getting everything to a single OS level.  FreeBSD 10.0 came out, and later FreeBSD 9.3, but by that time I had the basic systems working, so adding these to the build and test suite was just a matter of adding new build and test hosts… which took only a few hours.

As part of this build change I learned new tools:

  • Jenkins
  • Poudriere
  • Puppet
  • VirtualBox

I learned how to create my own ports, how to patch my own ports privately, and how to submit bugs back to FreeBSD ports maintainers.  I became a FreeBSD port maintainer myself.  I noted that as of 1st September 2014 the old pkg_* tools that had been around since day dot were to be End-Of-Life’d in favour of the new ‘PKGNG‘ system.  I read the linked blog entry and decided it was something I would have to look at, but later, because EOL (as most sys-admins know) just means no new patches, and something may start breaking that wouldn’t be supported by the developers.  At the end of July 2014 I spoke with the main protagonist of the change and was informed, bluntly and to the point, that they already had a patch built and waiting to be applied: not to EOL the tools, but to actually and deliberately break them, thereby forcing people onto the new system.

Needless to say, with less than 5 weeks before this was due to occur there was no chance of me converting all 57 servers, so I suggested that they shouldn’t do it.  I was told it’s going to happen regardless, and that I should know that EOL means the product will no longer work, not that it will just no longer be supported.  I guess all those years I had worked for the likes of Netscape, Oracle etc. meant they all got it wrong… even Microsoft got it wrong; I mean, Windows XP was EOL’d a while back, and all those Windows XP machines around the world just stopped working the same day… NOT!

So I continued with my build system and tried to get a stable patched repository of packages so I could at least continue my plan to get the servers to the standardised OS levels.  During testing of the packages I noted bugs, reported them to the developers, then pushed the maintainers (with mixed levels of success) to implement the fixes before the deadline (more appropriately named than ‘EOL’)… I failed: several patches were not put into the ports tree until 7 days after the deadline (and that may have been deliberate on the developers’ part, though I will never know).  The ports tree was patched on time, it rendered the old tools dead, and my entire build, test and development system was broken.

I set about repairing it; for a while just copying pre-deadline files for building seemed to work with some local changes, so I continued to build out my systems to cope, and finally at the beginning of Dec 2014 I got a stable and complete repository.

Over Christmas 2014 I set myself the task of upgrading all servers to one of the standardised OSes and at the same time patching all the existing OSes to one of those versions.  Of 57 servers, 31 became unusable in some way during the patch update process (freebsd-update).  Some became unbootable, some couldn’t access the network, and some (even going from 9.3-RELEASE to 9.3-P5) broke packages such as ‘sudo’, leaving me unable to gain increased privileges to finish the patch process.  After over 160 hours of work, stopping only Christmas Day and New Year’s Day, all systems were patched to 9.2 or 9.3 with all the security patches (as they had to be, because of the NTPd remote root exploit), with only 2 of the systems having to be reinstalled from scratch as they were unrecoverable.

In early January 2015 the build system failed again when trying to patch new security issues.  I found it was related to more changes by the same culprit, so, after seeing similar rants by other long-standing advocates, I decided to ask for some help and got a working set of Mk/* files with the intention of fixing it again.  The files I got wouldn’t work, so I merged the tree by hand (27,900+ lines) only to find the system not quite working; a week later I had a working build system for most of the ports.  I set it going, got a working repository, decided to re-run the build because of a failed patch, and it all broke again…

So for the warning to all FreeBSD Users:

IF YOU RUN PRODUCTION SERVERS THAT REQUIRE TESTING AND STABILITY BEFORE MAJOR CHANGES, YOU PROBABLY ARE STILL ON PKG_* TOOLS, DON’T UPGRADE, DON’T PATCH AND LOOK AT OTHER SOLUTIONS! Here’s why:

  • running ‘freebsd-update‘ with the extra “delete old” pass will delete all the pkg_* tools (even if you haven’t converted to pkgng)
  • updating the ports tree and then updating anything will automatically convert the system to pkgng (whether tested and working or not)
  • if you build your own packages using poudriere 3.1 or above, it will also “upgrade” your system without confirmation or warning.
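Before touching a host, it’s worth checking which packaging system it is actually on; a quick sketch using commands I believe exist on FreeBSD of that era (pkg’s -N flag is its activation-status check):

```shell
# pkg -N exits 0 only if pkgng has been activated on this host
if pkg -N >/dev/null 2>&1; then
        echo "pkgng in use - the conversion has already happened"
else
        echo "legacy pkg_* tools (or no packages) in use"
fi
# the legacy database is a directory tree under /var/db/pkg;
# pkgng replaces it with an SQLite database (local.sqlite)
ls /var/db/pkg
```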

Basically, whether tested or not, whether working or not, the FreeBSD developers (not the kernel devs, as far as I know) will change your production systems to configurations that will probably render your automated systems completely ineffective, without warning and without notification.

What am I doing about it?  At the moment I have created a ports tree ( svn co http://svn.sorbs.net/repos/ports/head ) on http://svn.sorbs.net/repos/ports that can be put into poudriere (as SVN_HOST=svn.sorbs.net/repos ) and it will, in theory, build most packages for the pkg_* tools.  It’s not complete and is currently being changed on a daily basis as new changes go in, and with the latest “HEADSUP” announced on the FreeBSD Ports mailing list detailing another change in syntax that is not backward compatible with existing systems (even pkgng ones), I expect it won’t work for long.  My advice, as the culprit seems hell-bent on changing systems to the way Linux has been for years and ignoring all input from FreeBSD users who do not agree with his vision: find an alternative.
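For anyone wanting to try that tree, a minimal sketch of wiring it into poudriere follows; the jail name, ports tree name and package-list path are illustrative placeholders, and poudriere’s svn checkout behaviour may differ between versions:

```shell
# /usr/local/etc/poudriere.conf (fragment):
#   SVN_HOST=svn.sorbs.net/repos

# create a ports tree from that SVN host, create a 9.2 build jail, then build
poudriere ports -c -m svn -p sorbs
poudriere jail -c -j 92amd64 -v 9.2-RELEASE -a amd64
poudriere bulk -j 92amd64 -p sorbs -f /usr/local/etc/poudriere.d/pkglist
```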

After 12 years of promoting FreeBSD, I am not any more.  I’m not going to stop my employer moving everything to Linux, and I’m *NOT* going to upgrade anything to 10.x (and as 9.4 will probably not have the pkg_* tools available, I won’t be going there either).

Sadly, thinking about the whole issue, with a little work it could have been avoided: ensuring all variables in the ports are backwards compatible and having separate Mk/* repositories (even unmaintained/EOL’d ones) would have made the whole process less painful and allowed the developers to continue their path, whether right or wrong, to completion, while allowing us insignificant users to continue without pain.  In fact, had someone had the foresight, I think even pre-bmake systems would still be patchable and working, right back to the 6.x tree… well, at least until the new changes in the plist files, most of which can be back-ported despite the claim that progress is impossible with the old pkg_* tools.