So a rant about how a great project can go bad, and how it’s still doing stuff that should never be done and why I’ve decided to ‘fix it myself’ or move away from it completely.
Back in 2003 Paul Vixie forced me into using FreeBSD on one of my servers, it was not a welcome change for me, I was an avid Linux user until this point…. and it didn’t go well.. I started on 4.x, found there was no threading support, so “upgraded” the system to 5.x… which went badly…very badly… and every upgrade through 5.x was as bad…. Partly because of what I did, partly because of my lack of knowledge and partly because of system limitations.
Out came 6.0 and I started working with it and soon I had a whole slew of machines that were on 6.0 and with 6.1 and 6.2 things only got better. I had build servers, I had package servers, I could boot one of the servers off the network and have it completely re-install the server with the latest OS, Patches and Packages within 23 minutes (bare metal to built, configured and in production in 23 minutes…!)
Then came 7.0 and my ‘burnout’ – personal, profession life clashed, I ‘burnt out’ and my technical issues took a back burner, then before you knew it 7.3 and 7.4 were out and I had sold my company… and I was back working on getting things patched and upgraded… however some major changes had happened and the ‘ports’ tree no longer worked on 6.x hosts… so the entire system was frozen…. no new security patches, no upgrades, however along with the sale of the company came new hope… new hardware… and an opportunity to upgrade by replacing the hardware… New hardware was installed to 7.3 (as this was all that was available on Softlayer) and then my attention was diverted to getting my software upgraded to a new major revision and with it my attention and priorities changed from Sys-Admin work to developer and the older systems remained. Not long later the company that ‘bought me’, ‘sold me’ to another (my current) employer, Proofpoint Inc and new priorities were given… along with more new servers.. the result was 8.x systems being installed and with the advent of FreeBSD upgrading ‘bmake’ more stuff got changed in the ports tree, again making them non-working on pre 7.4 systems… more things got changed/patched on my servers and I ended up with new hardware again, this time running 9.0 and 9.1… at this point in time (2013) I had the following versions of FreeBSD in production:
- FreeBSD 6.0
- FreeBSD 6.1
- FreeBSD 6.2
- FreeBSD 7.2
- FreeBSD 7.3
- FreeBSD 8.1
- FreeBSD 8.2
- FreeBSD 8.3
- FreeBSD 8.4
- FreeBSD 9.0
- FreeBSD 9.1
Which for any sys-admin you can guess would be a nightmare. Further Proofpoint has policy and puppet, policies about how things are managed and puppet to manage everything. It was suggested that my systems should be managed by puppet… so after Oct 2013 when the databases were finally migrated to the new hardware and then I could work on upgrading everything off old hardware and onto new OS’s and patches I setup a puppet server, a number of build servers and a test suite, all of my own creation and similar to what I had done in 2005… to take back control… I also ended up with FreeBSD 9.2 on some servers, so I decided i would standardise on:
- FreeBSD 8.4
- FreeBSD 9.0
- FreeBSD 9.1
- FreeBSD 9.2
…at least until I could spend the time getting everything to a single OS level… FreeBSD 10.0 came out, and later FreeBSD 9.3, but by that time I had the basic systems working and so adding these to the build and test suite was a matter of adding new build and test hosts… which just took a few hours.
As part of this build change I learned new tools:
- Jenkins
- Poudriere
- Puppet
- VirtualBox
I learned how to create my own ports, I learned how to patch my own ports privately. I learned how to submit bugs back to FreeBSD ports maintainers. I became a FreeBSD port maintainer myself. I noted that as of 1st September 2014 the old pkg_* tools that had been around since day dot were about to be End-Of-Life’d in favour of a new ‘PKGNG‘ system. I read the linked blog entry and decided that it was something I would have to look at, but later, because the EOL (as most sys-admins know) just means no new patches and something may start breaking that wouldn’t be supported by the developers. At the end of July 2014 I spoke with the main protagonist of the change and was informed bluntly and to the point that they had already got a patch built and waiting to be applied, not to EOL the tools but to actually and deliberately break the existing tools thereby forcing people to use the new system.
Needless to say with less than 5 weeks of time before this was due to occur there was no chance of me converting all 57 servers, so I suggested that they shouldn’t I was told, its going to happen regardless… and that I should know that EOL means the product would no longer work, not that it would just not be supported anymore. I guess all those years I had worked for the likes of Netscape, Oracle etc meant they all got it wrong… even Microsoft got it wrong, I mean Windows XP was ‘EOL’d a while back and well all those Window XP machines around the world just stopped working the same day… NOT!
So I continued with my build system and tried to get a stable patched repository of packages so I could at least continue my plan to get the servers to the standardised OS levels… During testing of the packages I noted bugs, reported them to the developers, then pushed the maintainers (with mixed levels of success) to implement the fixes before the dead line (more appropriately named rather than EOL)… I failed.. several patches were not put into the ports tree until 7 days after the dead line (and that may have been deliberate on the developers aspect – though will never know.) So the ports tree was patched on time, it rendered the old tools dead and my entire build, test and development system was broken.
I set about repairing it, for a while just copying pre-DeadLine files for building seemed to work with some local changes, so I continued to build out my systems to cope with this, and finally at the beginning of Dec 2014 I got a stable and complete repository.
Over Christmas 2014 I set myself the task of upgrading all servers to one of the standardised OS’s and at the same time patching all the existing OS’s on one of those versions. Of 57 servers, 31 became un-usable in some way during the patch update process (freebsd-update) Some became un-bootable, some couldn’t access the network, some (even going from 9.3-RELEASE to 9.3-P5) broke packages such as ‘sudo’ leaving me unable to gain increased privileges to finish the patch process…. after over 160 hours of work, only stopping Christmas day and New Years day, all systems were patched to 9.2 or 9.3 with all the security patches…as they had to be because of the NTPd remote root exploit…. only having to reinstall 2 of the systems from scratch as they were un-recoverable.
Early January 2015 the build system failed again when trying to patch new security issues and I found it was related to more changes by the same culprit so decided after seeing similar rants by other long standing advocates to ask for some help and got a working set of Mk/* files with the intention of fixing it again. The files I got wouldn’t work so I merged the tree by hand (27900+ lines) only to find the system not quite working… a week later and I have a working build system for most of the ports. I set it going and get a working repository and decide to re-run the build because of a failed patch, and it all broke again…
So for the warning to all FreeBSD Users:
IF YOU RUN PRODUCTION SERVERS THAT REQUIRE TESTING AND STABILITY BEFORE MAJOR CHANGES, YOU PROBABLY ARE STILL ON PKG_* TOOLS, DON’T UPGRADE, DON’T PATCH AND LOOK AT OTHER SOLUTIONS! Here’s why:
- running ‘freebsd-update‘ the extra pass to “delete old” will delete all pkg_* tools (even if you haven’t converted to pkgng)
- updating the ports tree and updating something will automatically convert the system to use pkgng (whether tested and working or not)
- if you build your own packages using poudriere 3.1 or above it will also “upgrade” your system without confirmation or warning.
Basically whether tested or not, whether working or not, the FreeBSD developers (not the kernel devs as far as I know) will change your production systems to configurations that will probably render your automated systems completely ineffective, without warning and without notification.
What am I doing about it, well at the moment I have created a ports tree ( svn co http://svn.sorbs.net/repos/ports/head ) on http://svn.sorbs.net/repos/ports that can be put into poudriere (as SVN_HOST=svn.sorbs.net/repos ) and it will in theory build most packages for pkg_* tools – it’s not complete and is being changed on a daily basis currently as new changes go in, and with the latest “HEADSUP” announced on the FreeBSD Ports mailing list detailing another change in syntax that is not backward compatible with existing systems (even pkgng ones) I expect it won’t work for long…. My advice as the culprit seems hell bent on changing systems to the way Linux has been for years and ignoring all input from users of FreeBSD that does not agree with his vision, find an alternative.
After 12 years of promoting FreeBSD I am not any more, I’m not going to stop my employer moving everything to Linux, and I’m *NOT* going to upgrade anything to 10.x (and as 9.4 will probably not have pkg_* tools available, I won’t be going there either.)
Sadly, thinking about the whole issue, with a little work it could have been avoided, ensuring all variables in the ports are backwards compatible and having separate Mk/* repositories (even unmaintained/EOLd) would have made the whole process less painful an allowed the developers to continue their path, whether right or wrong, to completion, and allow us insignificant users to continue without pain. In fact had someone had the for-sight I think even pre-bmake systems would still be patchable and working, even back to the 6.x tree! .. well at least until the new changes in the plist files… which most can be back-ported despite the claim that progress is impossible with the old pkg_* tools.