Ubuntu 11.10 Server Upgrade
A couple of days ago the new Ubuntu was released: version 11.10, named Oneiric Ocelot. Of course, I jumped on it and installed it on my workstation and server.
Well, first I diligently tested on a VM, and found that the installation went smoothly, as expected. Then I goofed and started the upgrade on both machines at the same time, which seemed to slow the downloads to a crawl. I turned in, expecting to be able to finish the upgrades in the morning. In the morning I found both machines paused at a prompt, and upon agreeing, the installations quickly finished.
The desktop finished first, as it had much less software, of course. The machine rebooted and the new interface (which I'm sure is a lot more different than it looks) popped right up and the machine seemed to be working correctly. The server finished, and I double-checked that I could SSH to the box, so I could continue checking things later.
I left the home office for the day-job office and promptly got involved in work for pay. Part way through my day I found lots of messages about web server failures. A brief check and it appeared that the user responsible for running the web server had been deleted as part of the upgrade (how rude), so I replaced that. I also found another file that was probably not removed by the upgrade, but that was no longer in place. I restored that from the last archive in which I found it (and also noted it had been removed a couple weeks ago...). The web server fired up and all seemed well there.
I also double-checked the Sendmail server, as previous upgrades have tried to replace that with exim or another MTA. Sendmail was neatly in place, but the configuration file had been replaced. I restored that and rebuilt it using the usual tools, and suddenly Sendmail was running again.
Sendmail complained that the SpamAssassin daemon wasn't running. This indicated to me that Perl had been updated, so I updated everything again and reinstalled SpamAssassin, and all of that was working well, too.
That was Friday.
Saturday, it seems, something else was horribly wrong. I'd received a message that the web-based e-mail access wasn't working. I've run into that, too, when Perl gets updated; the suid program isn't added by default for some reason. When I tried to check, however, I was instead greeted with "drive is full" messages. The drive in question was only about half full, but all of its inodes were in use.
Learning that took a little Google exploration. I know about inodes, of course, and like them. I guess it never occurred to me that you could run out of them before you run out of drive space. It seems that you can. The suggestion was to investigate the failing partition and try to find the folder full of files. The partition in question was /home, so I had a pretty good idea where the many files were.
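For anyone hunting the same problem, here is a minimal sketch of that investigation in Python. It is not what I actually ran; the function names are made up for illustration, and it assumes a Unix-like system where os.statvfs is available. The first function reports inode usage for a filesystem, and the second counts entries under each top-level folder so the one hoarding inodes stands out.

```python
import os
from collections import Counter


def inode_usage(path):
    """Return (used, total) inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_files - st.f_ffree, st.f_files


def entries_per_top_dir(root):
    """Count files and directories under each immediate subdirectory of root.

    This is only an approximation of inode use: hard-linked files are
    counted once per name, and the walk does not stop at mount points.
    """
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == ".":
            # Each top-level directory is charged one entry for itself.
            for name in dirnames:
                counts[name] += 1
            counts["."] += len(filenames)
        else:
            top = rel.split(os.sep)[0]
            counts[top] += len(dirnames) + len(filenames)
    return counts


if __name__ == "__main__":
    used, total = inode_usage("/home")
    print("inodes used:", used, "of", total)
    for name, count in entries_per_top_dir("/home").most_common(10):
        print(count, name)
```

Running it against /home (or whichever partition is complaining) prints the ten busiest top-level folders, which should be enough to point at the culprit.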
I've been working on a baseball-related hobby application, so I've been collecting the data from the MLB website (as is allowed by their usage documents), and storing it locally, so I don't access their site all of the time. There are tons of little XML files, so I set a task to archive the year folders and delete them. I've already got them on another drive, too (back-up, back-up, back-up...), but this is where they belong... The first folders (from the 1920s to the early 2000s) didn't amount to much, as the data in them is actually not there; just the structure. When the process started cleaning about 2004, the inodes finally started coming back. By the time I'd cleaned through 2010, inode usage had dropped from 100% to only 30%.
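The archive-then-delete step can be sketched like this. Again, this is an illustration rather than the task I actually ran: the function name and the example paths are hypothetical, and the entry-count check assumes the folder holds only regular files and directories. The point is that one compressed archive costs a single inode, where the thousands of little XML files each cost one.

```python
import os
import shutil
import tarfile


def archive_and_remove(folder, archive_dir):
    """Pack folder into archive_dir/<name>.tar.gz, then delete the folder."""
    folder = os.path.abspath(folder)
    name = os.path.basename(folder)
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, name + ".tar.gz")

    with tarfile.open(target, "w:gz") as tar:
        tar.add(folder, arcname=name)

    # Read the archive back and compare entry counts before deleting anything.
    with tarfile.open(target, "r:gz") as tar:
        archived = len(tar.getnames())
    expected = 1 + sum(len(d) + len(f) for _, d, f in os.walk(folder))
    if archived != expected:
        raise RuntimeError("archive is incomplete, leaving %s alone" % folder)

    shutil.rmtree(folder)
    return target


if __name__ == "__main__":
    # Hypothetical layout: one folder per season under a data directory.
    for year in range(2004, 2011):
        src = os.path.join("/home/me/baseball", str(year))
        if os.path.isdir(src):
            print(archive_and_remove(src, "/home/me/baseball/archive"))
```

Deleting only after the archive reads back cleanly is the cautious choice here; with a second copy on another drive it matters less, but it costs almost nothing.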
One interesting bit about all of this: on the other drive to which I copy this data every night, in a "time machine" style daily archive with symbolic links between previous days with the same file, the same partition is copied but that drive is only using 2% of its inodes. The partition on the server is about 500GB, while the archive drive is 2TB, but I'd still expect the two numbers to be much closer.
With enough inodes freed, I restarted a couple of the servers that seemed to be failing. When the server's done with the rest of the clean-up, I'll restart it properly and allow all of the services to restart, too, and maybe trigger some journal clean-up as part of it.
That done, I turned back to my task of making the web-based e-mail work. I had hoped that this was due either to the missing suid program I mentioned earlier, or perhaps even to the inodes. It turns out that not only was suid-perl not installed, but it's been removed from the new version of Perl. Instead, Perl just does that when the permissions of the script are set appropriately, and the right flag is set when Perl is built.
It seems that flag was not set in the distro's version of Perl. I'm again faced with installing another version of Perl alongside the distro's. I've done it before, but only when the distribution was behind the curve. I guess technically this distro is behind, as the current version of Perl is 5.14.2, and the installed version is 5.12.4.
I'll decide and tackle that tomorrow.