Network Hiccup Causes Brief Outage
If you can read this, the outage is mitigated or over.
The plan, hatched about nine months ago, was to drop our Comcast Internet service and replace it with Century Link. Century Link is bringing gigabit fiber to homes nearby, and the promise (sales-guy grain of salt here) was that they'd get to us soon, probably sooner for existing customers. Either way, the price was pledged to be lower and the speed higher. Well, the price is lower for the first year and then pretty comparable if you can't negotiate another deal, but the speed is still higher.
I got the connection working fast enough, got some static IPs. Got a separate router. Reconfigured the LAN so that the DHCP clients (phones, tablets, TV devices, and so on) would use the Century Link connection, and only the server and my workstation would use the Comcast connection. Technically, the server and workstation use both, but let's not get picky.
I faltered a little because Century Link doesn't do IPv6 so well. It seems they run 6rd (IPv6 Rapid Deployment) on their network, which evidently works sometimes, but it terminates at their router and the devices connected directly to it; I can't pass addresses through to the other router that firewalls the LAN. On the Comcast connection I use an IPv6 6to4 tunnel to do exactly that. Everyone got an IPv6 address, tunneled and firewalled at the LAN WiFi router, not the CableCo router.
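For anyone who hasn't set one up: 6to4 derives a whole IPv6 /48 from a single public IPv4 address by putting 2002: in front of the IPv4 address written in hex. The sketch below shows that mapping with Python's standard `ipaddress` module. It assumes a plain 6to4 setup (the post doesn't give more detail), the function name `sixtofour_prefix` is made up for this example, and 192.0.2.1 is a documentation placeholder, not the server's real address.

```python
import ipaddress


def sixtofour_prefix(public_v4: str) -> ipaddress.IPv6Network:
    """Return the 6to4 /48 for an IPv4 address: 2002:<IPv4 in hex>::/48.

    Real 6to4 only works with a public, globally routable IPv4 address; this
    sketch does not check that, so a documentation address can be used below.
    """
    v4 = ipaddress.IPv4Address(public_v4)
    # 0x2002 fills the top 16 bits, the 32-bit IPv4 address the next 32 bits,
    # and the remaining 80 bits are left for the site's own subnets and hosts.
    return ipaddress.IPv6Network(((0x2002 << 112) | (int(v4) << 80), 48))


prefix = sixtofour_prefix("192.0.2.1")     # placeholder, not a real server IP
print(prefix)                              # 2002:c000:201::/48
lan = next(prefix.subnets(new_prefix=64))  # first /64, e.g. for the LAN
print(lan)                                 # 2002:c000:201::/64
host = ipaddress.IPv6Address("2002:c000:201::1")
print(host in prefix, host.sixtofour)      # True 192.0.2.1
```

Because the prefix is computed from the IPv4 address, the tunnel's IPv6 addresses are tied to whichever connection holds that public IPv4 address.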
Although IPv6 isn't critical to my network, and only about 1% of my traffic arrives over IPv6 (almost all of it e-mail), I didn't want to lose that, or backtrack too much on the configuration and knowledge I've garnered over the last few years playing with it.
The server, to get back to the point, had been mostly configured to work with both networks. Really, the only real hold-up keeping it off Century Link was DNS. I've got a couple dozen domains running on the server, and I needed to update each one's DNS records with its new IP. Lazy is really the only reason that hadn't happened.
Well, lazy and the distraction of the new server. The current server does everything, including web, JDK apps, databases, e-mail, log analysis, file services, firewall...everything. I've got a new 8x8-core server with 64GB of RAM waiting to distribute those apps among VMs in a "network in a box." I haven't done it yet, though. It's a lot of work, and, well, it seems I'm lazy.
Today's outage kicked me into higher gear. I ripped through 20-some DNS zone edits (some more than once), changing main IPs and TTLs and removing IPv6 entries. I double-checked and rearranged the Apache and other server configurations, and all seemed well. Soon, test pages on little-used domains started popping up in browsers.
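Zone edits like these (new A-record IPs, new TTLs, AAAA records dropped) are mechanical enough to script. Here's a rough Python sketch, assuming simplified zone files where every record sits on one line in full `name TTL IN type rdata` form; anything else (blank lines, comments, `$` directives, SOA continuation lines, records with an implied owner or TTL) passes through untouched. The function name and the documentation-range addresses are hypothetical, and this is an illustration rather than the process used for these edits.

```python
def rewrite_zone(text: str, old_ip: str, new_ip: str, new_ttl: int) -> str:
    """Point A records at a new IP, reset explicit TTLs, and drop AAAA records.

    Assumes each record is one line of: name TTL IN type rdata...
    Any other line is copied through unchanged.
    """
    out = []
    for line in text.splitlines():
        fields = line.split()
        is_record = (
            len(fields) >= 5
            and not line[0].isspace()              # owner name not inherited
            and not fields[0].startswith((";", "$"))
            and fields[1].isdigit()                # explicit TTL
            and fields[2] == "IN"                  # explicit class
        )
        if not is_record:
            out.append(line)
            continue
        name, _old_ttl, klass, rtype, *rdata = fields
        if rtype == "AAAA":
            continue  # drop the IPv6 address records
        if rtype == "A" and rdata[0] == old_ip:
            rdata[0] = new_ip
        out.append("\t".join([name, str(new_ttl), klass, rtype, *rdata]))
    return "\n".join(out) + "\n"


zone = """$TTL 3600
@     604800 IN A    203.0.113.10
www   604800 IN A    203.0.113.10
@     604800 IN AAAA 2001:db8::10
@     604800 IN MX   10 mail.example.com.
"""
print(rewrite_zone(zone, "203.0.113.10", "198.51.100.20", 3600))
```

A real edit also has to bump the SOA serial if the zone has secondary servers, so they pick up the change; this sketch doesn't do that.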
Propagation is said to take days, but it usually takes no more than minutes, or occasionally hours. There were a few week-long TTLs and a couple of minutes-long ones, but most were set to an hour, so, give or take, that's as long as the old records should live in any cache. It's unlikely that a cache was close to expiring its own copy just as it passed an hour-long (or longer) TTL down the chain, and even less likely that it happened often. I suspect that by morning all will be back to normal, except now the traffic will be going through the other wires.
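To make the caching argument concrete, here's a small Python model of a chain of caches holding the old record. It assumes well-behaved caches hand down only the remaining TTL, which is how standard caching resolvers behave, and compares that with a hypothetical misbehaving cache that restarts the clock with the full TTL. The function name and the timings are made up for illustration.

```python
from typing import Sequence

HOUR = 3600  # seconds


def old_record_gone_at(fetch_times: Sequence[int], ttl: int, resets_ttl: bool) -> int:
    """Return when (seconds after the DNS edit at t=0) the last cache in a
    chain finally drops the old record.

    fetch_times[0] is when the first cache pulled the old record from the
    authoritative server (must be <= 0, i.e. before the edit); each later
    entry is when the next cache down the chain pulled it from the one above.
    """
    if not fetch_times or fetch_times[0] > 0:
        raise ValueError("the first cache must fetch the old record before the edit")
    expires = fetch_times[0] + ttl  # first cache got the full TTL
    previous = fetch_times[0]
    for t in fetch_times[1:]:
        if t < previous or t >= expires:
            raise ValueError("each fetch must follow the previous one and precede its expiry")
        if resets_ttl:
            # Hypothetical misbehaving cache: restarts the clock with the full TTL.
            expires = t + ttl
        else:
            # Well-behaved cache: receives only the remaining TTL, so the
            # expiry time does not move.
            remaining = expires - t
            expires = t + remaining
        previous = t
    return expires


# First cache fetched 10 minutes before the edit; a downstream cache
# fetched from it 30 minutes after the edit.
print(old_record_gone_at([-600, 1800], HOUR, resets_ttl=False))  # 3000
print(old_record_gone_at([-600, 1800], HOUR, resets_ttl=True))   # 5400
```

With well-behaved caches the old address is gone at most one TTL after the edit, however long the chain; only a cache that resets the clock can stretch that, and each reset adds less than one more TTL.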
With this, much to the wife's happiness, there's little or no reason to maintain the Comcast connection. Over the next week or so I'll rearrange the LAN so that both of the firewall/routers and all of the direct workstation connections go through Century Link.
Then I can focus on splitting the services apart and getting everything onto the other system. But that's another story, and a whole lot of other hours to spend.