Broken Network Repaired
Of course, if it was still broken, this wouldn't be here.
I'm changing ISPs again. It seems to happen every few years. This time it's because my Quest Fiber doesn't actually offer static IPs or support IPv6, although both are supported by CenturyLink (the network under Quest Fiber) and available on my DSL (which I still maintain because the fiber doesn't do it). I've added US Internet Fiber to try to bridge that gap, because they offer static IP and indirectly (via 6-in-4 tunnels) IPv6.
The fiber service has been installed for a couple weeks now. It took a stretch of days because they buried the fiber from the street to the house. A few days after the service was running, they allocated my static IP, and a few days after I got that notice, I made the switch in my router to use it. Once I got the router using the static IP, it was time to start making the link to the web server (and other things).
Unfortunately, USI only offers single IP addresses. Technically, they allocate a /30, which contains four addresses but only two usable ones, since one is reserved as the network address and another for broadcast. One of the usable addresses is mine; the other is the "other side" (their router). I asked the fella helping me enable it if it was possible to get a /29, which holds eight addresses and would leave me five once the network address, broadcast, and router are taken out. He verbally shrugged and suggested I reach out to sales. I moved on, because I can do what I need to do with one IP, especially with the addition of IPv6.
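The address counting above is easy to double-check with Python's standard ipaddress module. This is only a sketch: the 203.0.113.0 prefix is a documentation range standing in for whatever USI actually assigns, and usable_for_me is a made-up helper name.

```python
import ipaddress

def usable_for_me(cidr: str) -> int:
    """Addresses left for my own hosts in a block handed out by the ISP."""
    net = ipaddress.ip_network(cidr)
    hosts = list(net.hosts())   # hosts() already skips the network and broadcast addresses
    return len(hosts) - 1       # one more goes to the ISP's router on the other side

# 203.0.113.0 is a documentation prefix, not a real allocation.
for cidr in ("203.0.113.0/30", "203.0.113.0/29"):
    net = ipaddress.ip_network(cidr)
    print(cidr, "total:", net.num_addresses, "left for me:", usable_for_me(cidr))
```

The /30 comes out to four addresses with one left for me, and the /29 to eight with five left.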
I added the network definitions to the right page in my router's configuration, bonked the button, and let it reboot. It immediately connected and I could reach the Internet from my router and desktop.
Getting the IPv6 tunnel established was my next task. My router only supports some features on one Internet connection at a time, even though it can bond, load-balance, or fail over between two. Bonding doesn't work for me, because that's a way for the router to reach a higher-speed service (which USI offers) by connecting two of its 1Gb ports to the upstream to deliver 2Gb service. I'm connecting my router to a 100Mb DSL and a 1Gb fiber, and the way it load-balances is mostly round-robin, with some limited session binding and controlled service routing. I don't want 10% of our streaming traffic stealing from the 100Mb service used by the servers, so load-balancing isn't right either. I do use the fail-over feature to switch from the 1Gb connection to the 100Mb one when the 1Gb fails.
The CenturyLink Fiber is overhead, from a pole in the alley behind the house, and while a new line, it wavers occasionally, as does the DSL, which is also overhead from the same pole. I blame squirrels who like to run on the lines. The outages are infrequent, but occasional. I've set up a Slack hook on the router to alert me when the switch (or switch back) happens. It hasn't happened since switching to USI, except the one time I unplugged the fiber media converter to move it to a different outlet (so I could trip the GFI power to the garage...but that's a different story).
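For anyone wondering what a hook like that amounts to, here is a rough sketch of a Slack incoming-webhook call in Python. The router's own hook mechanism isn't shown here, so this is an assumption about the shape of the request rather than what the router does; the webhook URL is a placeholder and build_failover_alert is a made-up name.

```python
import json
import urllib.request

def build_failover_alert(webhook_url: str, active_wan: str) -> urllib.request.Request:
    """Build (but do not send) a Slack incoming-webhook POST announcing a WAN change."""
    payload = {"text": f"Router WAN changed: now using {active_wan}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending is one more call once a real webhook URL is filled in (placeholder shown):
# url = "https://hooks.slack.com/services/..."
# urllib.request.urlopen(build_failover_alert(url, "DSL"), timeout=10)
```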
I've digressed.
I mentioned the fail-over and other things because it means I had to abandon the IPv6 from the DSL (which is direct from CenturyLink, via 6rd) to use IPv6 via the new connection (and tunnelbroker.net). For whatever reason, it didn't work when I tried configuring the tunnel in my router with the tunnel's routed /64, but adding a /48 allowed the router to choose a /64 to route. I scratched my head, but moved on, as there are more than enough IPv6 addresses to shrug off this apparent waste. I think it has to do with the way DHCPv6 works on the router, but it lets me allocate /64s out of that /48, and they still end up going through the tunnel.
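To put the "apparent waste" in numbers, here is a small sketch of carving /64s out of a /48 with Python's ipaddress module. The 2001:db8:1234::/48 prefix is from the IPv6 documentation range, not my tunnel's real allocation, and first_lan_prefixes is a made-up helper.

```python
import ipaddress
from itertools import islice

# Documentation prefix standing in for the routed /48 from the tunnel broker.
routed_48 = ipaddress.ip_network("2001:db8:1234::/48")

def first_lan_prefixes(block, count):
    """Return the first `count` /64 networks inside `block`."""
    return list(islice(block.subnets(new_prefix=64), count))

print("number of /64s in a /48:", 2 ** (64 - 48))
for prefix in first_lan_prefixes(routed_48, 3):
    print(prefix)
```

A /48 holds 65,536 separate /64 networks, so letting the router pick one (or several) barely dents it.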
With very little more than renewing my DHCP lease on my desktop, I got a new routed IPv6 address, confirmed by test-ipv6.com and ping6!
Next was trying to configure port forwarding to get web services, like this site, to go through the new IP. Later I'll address SMTP and the other things. Yes, I still host an SMTP server because I have a number of domains, and I haven't found an external service that both accepts mail for multiple domains into the same mailbox and allows me to relay from any of those domains.
Another digression.
The router configuration for the virtual ports is pretty straightforward but, like IPv6, is affected by the way the external networks work. With the two public connections (which I do hope to terminate), there's just one configuration for the port forwarding, and it isn't associated with a connection. That is to say that "port 80" is open on whatever the active connection is, not on either specific one; so it'll use the static IP from USI while that's active, but switch to the static IP from CenturyLink when it fails over. Not a big deal, because if something happens to the connection, it'll be gone anyway. Where it gave me trouble was actually on the web servers...but I'm getting ahead of myself.
I set up the port forwarding for ports 80 and 443 to hit one of my web servers. I have a few servers, which I'll have to put behind a different edge reverse proxy to serve from one IP...but that's getting ahead, too. After setting up the port forwarding, I expected to be able to hit the new IP and see the page served by the server. If I hit the IP directly, the server responds with a simple "server root" text response, regardless of the path or parameters; hitting with a host name delivers whatever that configuration has, like this domain's page. When I hit the new IP, though, I got a connection failure instead. No log hits showed the request on the web server, so something was awry.
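A quick way to separate "the port forward isn't reaching anything" from "the web server is misbehaving" is a bare TCP connection test, which never gets as far as HTTP. A minimal sketch in Python follows; port_is_open is a made-up helper, and the address in the usage comment is a documentation placeholder, not my static IP.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts alike.
        return False

# Example use from outside the network (placeholder address):
# for port in (80, 443):
#     print(port, port_is_open("203.0.113.10", port))
```

If this returns False while the server's own logs show nothing, the problem sits somewhere between the router and the server's return path rather than in the web server configuration.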
I spent the better part of the last two days tinkering with the IP addressing and routing on the server. I went through the effort of removing the NetworkManager configuration and leveraging netplan instead. I had previously worked through the nuances of source routing on the servers, and it worked pretty well. Since the forwarded port arrives through the LAN interface, I figured the replies weren't returning via the LAN interface: the only gateway configured is out the static IP, so the traffic was probably trying to leave through the other interface.
Here's an example of a previous configuration (written in /etc/network/interfaces syntax) for an IP that isn't mine any more:
# iface eth3 inet static
#     address 75.146.174.77
#     netmask 255.255.255.248
#     broadcast 75.146.174.79
#     network 75.146.174.72
#     gateway 75.146.174.78
#     metric 200
#     up ip route add 75.146.174.72/29 dev eth3 src 75.146.174.77 table static75_77
#     up ip route add default via 75.146.174.78 dev eth3 table static75_77
#     up ip rule add from 75.146.174.77 table static75_77
#     down ip route del 75.146.174.72/29 dev eth3 src 75.146.174.77 table static75_77
#     down ip route del default via 75.146.174.78 dev eth3 table static75_77
#     down ip rule del from 75.146.174.77 table static75_77
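The three "up" lines are the whole source-routing trick: a per-address table holding the connected network and a default route, plus a rule that sends anything sourced from that address to the table. As a sketch of the pattern, this Python snippet only builds the command strings (it runs nothing); source_route_commands is a made-up helper name.

```python
import ipaddress

def source_route_commands(address: str, prefix_len: int, gateway: str,
                          dev: str, table: str) -> list[str]:
    """Build the ip(8) commands that make traffic sourced from `address` leave via `gateway`."""
    net = ipaddress.ip_interface(f"{address}/{prefix_len}").network
    return [
        # Connected network, reachable directly on the device, in the private table.
        f"ip route add {net} dev {dev} src {address} table {table}",
        # Default route for that table only.
        f"ip route add default via {gateway} dev {dev} table {table}",
        # Policy rule: packets sourced from this address consult the private table.
        f"ip rule add from {address} table {table}",
    ]

for cmd in source_route_commands("75.146.174.77", 29, "75.146.174.78", "eth3", "static75_77"):
    print(cmd)
```

The output matches the "up" lines of the old configuration. A named table such as static75_77 also needs an entry in /etc/iproute2/rt_tables; a numeric table ID avoids that step.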
I dug up what details I could to get an equivalent netplan configuration that lets each interface return its own traffic. I went through a few iterations, some of which broke my SSH session and required me to go to the server directly to revert. I did learn to love netplan try, which applies a configuration and rolls it back if it isn't confirmed, but sometimes a failure didn't revert as expected.
I could get the server to respond to HTTP requests to the static IP, but I couldn't reach it through the LAN port. I couldn't get the servers to talk to each other, either. This causes problems because of the distributed servers and logging. I could always use IPv6 to reach the server, but I can't use that for all the services. In particular, I wasn't able to maintain HTTPS or SSH connections.
I was able to get many things, like SSH and HTTPS, to work on IPv6 instead, but there are a few things that don't want to work on IPv6. I was able to do some things via the CenturyLink IPs, which worked for the servers that are connected directly to the Internet, but it didn't solve the problem for the NAS and DB servers that are only on the LAN.
Most frustrating was that the Docker containers on one server wouldn't start. They're all configured to feed Splunk directly with their access and system logs via a Splunk HTTP listener, and the two machines on the same LAN couldn't establish HTTPS connections with each other. I couldn't find a way to make the remote logging optional, and I wasn't prepared to remove it just to get the containers to start, although I was considering it. Since this blog runs in a container, it was missing for a few hours. I went briefly down a rabbit hole of possibly trying to get Docker to work with IPv6, which I'll probably do in the future, but I needed something that still works with IPv4.
Ultimately, in a frustrated attempt, I found that if I took off the routing for the LAN, everything worked fine. The WAN needs its routing, and the default, to allow the SMTP server (it keeps coming up) to use the right ports. It's almost the same configuration as before, but it works now.
Really all I needed for the LAN was to define the addresses. All the traffic technically stays on the LAN, and the router handles pushing the traffic.
network:
  version: 2
  renderer: networkd
  ethernets:
    ens0:
      dhcp6: false
      addresses:
        - 10.10.10.115/29
        - 10.10.10.117/29
      nameservers:
        addresses:
          - 1.1.1.1
          - 8.8.8.8
      routes:
        - to: default
          via: 10.10.10.118
        - to: 10.10.10.112/29
          via: 10.10.10.118
          table: 112
      routing-policy:
        - from: 10.10.10.112/29
          table: 112
    ens1:
      dhcp6: false
      addresses:
        - 192.168.1.77/24
        - AAAA:BBBB:CCCC::77/64
      nameservers:
        addresses:
          - 192.168.1.1
Those aren't the real addresses, of course. I'm posting this both so I can find it in the future and remember, and so maybe someone else can solve it for their network similarly.
The change I made that broke things, and that I later removed, was this bit of the ens1 definition:
      routes:
        - to: default
          via: 192.168.1.1
          table: 192
        - to: 192.168.1.1/24
          via: 192.168.1.1
          table: 192
      routing-policy:
        - from: 192.168.1.1/29
          table: 192
That was supposed to keep LAN traffic on the LAN, and let traffic originating from a service bound to the LAN address use the LAN gateway. Instead, it seems to have caused trouble for LAN traffic staying on the LAN. I noticed this because log entries for tests from this server to the other always seemed to show the gateway's address rather than the host's. That's what made me rip that bit out in a bit of a rage, and then I finally had success.
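One plausible reading of the gateway showing up in those logs is that the extra table told the server to reach its own subnet via the router instead of directly. The toy model below illustrates the idea only; it is not how the kernel implements lookups. It assumes packets that already carry a source address (such as replies), uses tidied-up sample prefixes rather than the real ones, and next_hop is a made-up function.

```python
import ipaddress

def next_hop(src, dst, rules, tables):
    """Toy policy-routing lookup: tables chosen by source rules first, then main,
    with the longest matching prefix winning inside each table."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    order = [table for prefix, table in rules if src_ip in prefix] + ["main"]
    for table in order:
        matches = [(net, gw) for net, gw in tables.get(table, []) if dst_ip in net]
        if matches:
            net, gw = max(matches, key=lambda route: route[0].prefixlen)
            return gw if gw is not None else "on-link"
    return "unreachable"

lan = ipaddress.ip_network("192.168.1.0/24")
default = ipaddress.ip_network("0.0.0.0/0")

# Main table only: the LAN is a connected (on-link) network.
main_only = {"main": [(lan, None), (default, "10.10.10.118")]}
# With the extra table: the LAN prefix is reached *via* the router.
with_policy = {**main_only, "192": [(default, "192.168.1.1"), (lan, "192.168.1.1")]}
rules = [(lan, "192")]

print(next_hop("192.168.1.77", "192.168.1.50", [], main_only))        # on-link
print(next_hop("192.168.1.77", "192.168.1.50", rules, with_policy))   # 192.168.1.1
```

In the second case a neighbor on the same subnet is reached through the router, which would fit with the other server logging the gateway's address instead of this host's.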
This particular server has 2 IP addresses served by the same NIC. It has spare NIC ports, so I could plug another Ethernet cable into the router and address it separately, but why bother? The ens0 interface is connected to the router handling the CenturyLink /29 network (not really in the 10.10.10 space). The ens1 interface is connected to the router handling the USI connection, which has to use NAT and the virtual ports because of the /30 network. All Internet traffic originating on the server is routed through the ens0 interface, defaulting to the 115 address unless bound (like SMTP is) to the 117 address. All of the incoming traffic returns the way it arrived.
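"Bound to the 117 address" just means the service picks its local address before connecting, instead of letting the kernel choose the default. A minimal Python sketch of that idea follows; connect_from is a made-up helper, and the host name in the usage comment is a placeholder.

```python
import socket

def connect_from(source_ip: str, dest: tuple[str, int], timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection whose local address is pinned to source_ip (port 0 = any free port)."""
    return socket.create_connection(dest, timeout=timeout, source_address=(source_ip, 0))

# Hypothetical use on the server, sending mail from the second address:
# conn = connect_from("10.10.10.117", ("mail.example.org", 25))
# print(conn.getsockname())   # local side should show 10.10.10.117
# conn.close()
```

With the source address fixed like this, a "from" routing-policy rule can match the packet and send it out through the matching gateway.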
Eventually I'll remove all the CenturyLink static IPs, after I get the services all moved to the new static IP. I will need to make some decisions about the SMTP server. One is that I need to change from residential to business service, as USI blocks email ports on residential service, except through their relay. This won't work for my many domains and few users, though. If I can get a bigger address pool I'll rearrange it, too, but for now this works.
It is hard to find a multi-domain multi-user mail service, even for a fee. Google used to offer its workspace services for free for small groups, but working with multiple domains was hard; essentially each domain was a new workspace, or everyone was grouped together. I'll find something...for now I'll get off the SMTP grumble train.