Network Routing Steps Forward and Back
I was so excited to get things working yesterday, and then it seems a power flicker messed everything up.
I had managed to get the new networks working. Quick recap: the directly connected nodes worked, but the WiFi router didn't until I changed its address. Then it worked fine.
I was setting out to get my IPv6 tunnel working. I did get it working on the directly connected subnet nodes. I let that run DHCP on the WAN router, so the directly connected nodes get random addresses in the routed /48 I have, and I set up a static IP for the server node. All of that worked great!
I poked at a couple IPv6 options on the WiFi router. First I tried "pass-through," which seemed like the hackiest way, bridging the IPv6 from the WAN router into the WiFi router's LAN. It worked, but felt goofy. I then switched to a static IP, giving that node a WAN IP from the /48, and a LAN IP in a /64 network from that /48. I crossed my fingers as I don't see any IPv6 routing bits in the WAN router interface, but it should kind of work itself out...and it did! Poof, WiFi nodes had IPs in the /64, and could hit IPv6 test sites on the Internet!
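For anyone wanting to picture the addressing, here is a minimal sketch of carving /64 networks out of a routed /48 with Python's standard `ipaddress` module. The prefix `2001:db8:1234::/48` comes from the documentation range and stands in for my real one. The helper `nth_subnet` and the choice of subnet 0 for the wired side and subnet 1 for the WiFi LAN are assumptions for illustration, not what the routers actually picked.

```python
import ipaddress

# Hypothetical routed prefix from the documentation range, not my real /48.
ROUTED = ipaddress.IPv6Network("2001:db8:1234::/48")


def nth_subnet(prefix, new_prefixlen, index):
    """Return the index-th subnet of length new_prefixlen inside prefix."""
    count = 2 ** (new_prefixlen - prefix.prefixlen)  # subnets available
    if not 0 <= index < count:
        raise ValueError(f"index must be between 0 and {count - 1}")
    size = 2 ** (128 - new_prefixlen)  # addresses per subnet
    base = int(prefix.network_address) + index * size
    return ipaddress.IPv6Network((base, new_prefixlen))


# Assumed layout: subnet 0 for the wired side, subnet 1 for the WiFi LAN.
wired_lan = nth_subnet(ROUTED, 64, 0)
wifi_lan = nth_subnet(ROUTED, 64, 1)

print(wired_lan)  # 2001:db8:1234::/64
print(wifi_lan)   # 2001:db8:1234:1::/64
```

A /48 holds 65,536 such /64 networks, so there is plenty of room to give the WiFi LAN its own.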
I did a little chair dance and took a deserved swig of coffee.
Then it stopped working. Ugh!
For reasons that didn't make sense, the WAN router suddenly lost its connection to the ISP router. Worse, the WiFi router started to reboot repeatedly, about every 300 seconds, seemingly unable to connect to either the new or old LAN.
It was difficult to trek through the WiFi router settings to try to see what was awry. I turned off IPv6, with no impact. I checked the addressing, NAT, and firewall, and all seemed fine. Eventually I stumbled on a change I'd made to the WAN switching decisions, where I'd put my WAN router's far side as the node to check. Since the WAN router couldn't see it, of course anything connected to its LAN couldn't either. Further, it wasn't responding to Internet pings, which may be a choice by my ISP that I realized I hadn't checked before I changed the setting. I changed it to a known and busy Internet IP, and the WiFi router stopped rebooting. Since the IPv6 is related to the WAN router working, I left it turned off on the WiFi router.
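The failure mode is easier to see as a rough sketch of how a WAN check like this might behave. I don't know the WiFi router's actual logic. The `WanMonitor` class, the three-failure threshold, and the TCP probe (standing in for the router's ping, since ICMP needs raw sockets) are all assumptions for illustration.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class WanMonitor:
    """Declare the WAN down after max_failures consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], max_failures: int = 3):
        self.probe = probe
        self.max_failures = max_failures
        self.failures = 0

    def check(self) -> bool:
        """Run one probe and return True while the WAN still counts as up."""
        if self.probe():
            self.failures = 0  # any success resets the counter
        else:
            self.failures += 1
        return self.failures < self.max_failures


# A target that never answers looks exactly like a dead WAN.
silent_target = WanMonitor(probe=lambda: False)
print([silent_target.check() for _ in range(4)])  # [True, True, False, False]
```

With a check target that never answers, a monitor like this reports the WAN as down no matter how healthy the link is, which fits what I saw. A real check would pass something like `lambda: tcp_probe(host)` with a host known to answer.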
I also turned off IPv6 on the WAN router. I wasn't sure it would make an impact, but it doesn't work without the WAN anyway. It didn't bring the connection back.
I poked at the WAN router quite a bit, but the crux of the issue was that the WAN Ethernet couldn't be detected. I went to the ISP router. Its LEDs were flickering on both its connected ports, fiber and Ethernet. No other weird indicators. As we've all been trained, when it doesn't work, turn it off and on again. I pulled the power plug, plugged it back in, waited for all the LEDs to return. Nothing weird occurred, and it looked like before.
Returning to my workstation, I reconnected to the WAN router directly. It now saw the Ethernet, but still couldn't connect to the Internet. Small victory!
I poked at more things, but nothing mattered, so I dropped a note to my ISP's support. An hour later they replied that the router had a power outage for about an hour earlier, starting about the same time I noticed a problem, and a cold restart about the time I pulled the plug. They pushed the router configuration again, and said all should be well.
The equipment in my "data center" is all protected by UPSs. I have four 1500VA UPSs powering the servers, routers, and switches. Only one of each server's PSUs is attached to a UPS, so there's a small risk of outage if that PSU has failed and the power goes out, but that's not my main concern; mostly I don't want all four of the PSUs on the big server sucking battery in a brief outage. The new router is on the other side, though, far away from the UPSs, and is left to the whims of the local power infrastructure.
There is massive construction at the end of our block, where the city is rebuilding the major road. There have been plenty of power flickers and some drops lasting seconds or minutes. It's conceivable one of these hit during that congratulatory coffee swig.
I sent support a note that the problem wasn't resolved, and asked if this was going to be volatile moving forward. Even if I add a UPS, it'll only last a short while, so the router could still lose power if an outage lasts long enough. I don't want to have to ask them to reset the configuration each time that happens (though, to be fair, outages like that are normally months, if not many, many months, apart in our area).
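To put a rough number on "a short while," here is a back-of-the-envelope runtime estimate. Every figure in it is an assumption: I haven't measured the ISP router's draw, and the battery capacity and inverter efficiency are placeholder values, not specs for any particular UPS.

```python
def ups_runtime_minutes(battery_wh, load_w, inverter_efficiency=0.85):
    """Estimate runtime in minutes from usable battery energy and load.

    Rough model only: ignores battery age, temperature, and the way
    capacity drops at higher discharge rates.
    """
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    return battery_wh * inverter_efficiency / load_w * 60


# Assumed numbers: a small UPS with about 100 Wh of battery and a
# router drawing about 20 W.
minutes = ups_runtime_minutes(battery_wh=100, load_w=20)
print(f"roughly {minutes:.0f} minutes")
```

Under those assumptions a small UPS would ride through flickers and short drops, but not an outage that lasts most of a day.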
The connection is still not working, but I have hopes it'll be resolved soon.