Broke and Fixed Edge Router
I'm calling it the "edge" because it's the router between my network and my ISP's, even though it's not a proper edge.
I bought a different router for the job, only to learn it's always a NAT (because why wouldn't you want to always NAT?), and couldn't be tricked into doing other things. So I repurposed a previous gen WiFi router that I had been using as an access point in the house, which was superfluous since I have two dedicated APs. With a little tweak in its UI and a couple firewall rules to allow straight-up IP forwarding, it started doing the job.
I planned to research and replace it with a different system, probably a dedicated firewall server or hardware. I've been intrigued by the firewalla.com routers for a bit, as they're well liked online and seem to have all the whistles and most of the bells I might want. But they cost more than a new computer! I only have 1Gb/s fiber now, but can have 10Gb/s (or more, maybe) in the future, so I wanted to find something that fit my budget and my fantasy future network.
Over the last few days, though, the network has gotten wonky, as I noted the other day. There have been drops and pauses, and downright failures. Last night while watching television, the network just plain stopped wanting to deliver anything to the television. A bunch of other stuff didn't work right, either, although the servers still behaved.
It bothered me a lot, because this edge router couldn't do simple things like DNS look-ups and pinging sites, but the servers and the other router could. Some things still worked on tablets and laptops because they weren't using the DNS from the routers (and therefore from our ISP), but DNS-over-HTTPS (DoH) instead. Some of the things that didn't work weren't using DoH, which pointed me to the same DNS failures. The servers use the ISP DNS directly, but they weren't failing (as far as I could tell). So the whole problem seemed to be related to the edge router's inability to do DNS or ping.
I poked until 1AM last night, and picked it up again this morning. I couldn't get anything to behave on the edge, although the rest of the network seemed fine. The WiFi router and Internet servers are all connected to the edge router, and the rest of the devices use NAT from the WiFi router. I got the WiFi router to be the LAN's DNS, and it uses DoH to Cloudflare (or Google), with the same straight DNS servers. That seemed to stop any LAN hiccups. And there haven't been any server hiccups, except during the blips when I did something that restarted the edge router.
Since I couldn't find a cause for the stoppage of ping and DNS, I thought it must be something in the firewall on the router. So I figured I'd give chance a chance and turned it off. It didn't help. So then I turned it back on, and for whatever reason I couldn't connect to it directly again. It still passed traffic, but I couldn't SSH or HTTPS to the router from anywhere on or off my network!
I couldn't leave it in that state, so I popped the factory reset button and tried to set about getting everything back up and running quickly.
I tossed in the static IP from my ISP, but couldn't remember the subnet mask for its CIDR /30 network (it's 255.255.255.252), so I put that in wrong just so it would start. I also didn't have my phone in hand, so I couldn't get to the e-mail where I'd shared the MAC address for the router's interface (which was really the third router in that spot, so it isn't the router's real MAC address), so I knew the ISP wouldn't accept it until I put it in right anyway.
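The mask falls straight out of the prefix length: set the top N bits of a 32-bit number and print it as four octets. A minimal sketch in plain shell arithmetic, where `prefix_to_mask` is a made-up helper name for illustration and not something the router provides:

```shell
# Convert a CIDR prefix length (0-32) into a dotted-quad subnet mask.
# prefix_to_mask is a hypothetical helper, not a standard tool.
prefix_to_mask() {
  prefix=$1
  # Shift 32 one-bits left so only the top $prefix bits stay set,
  # then trim the result back to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
  printf '%d.%d.%d.%d\n' \
    $(( (mask >> 24) & 255 )) \
    $(( (mask >> 16) & 255 )) \
    $(( (mask >> 8) & 255 )) \
    $(( mask & 255 ))
}

prefix_to_mask 30   # prints 255.255.255.252
prefix_to_mask 29   # prints 255.255.255.248
```

The same arithmetic covers any prefix length, so there's no need to memorize the table.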
I tossed in the static IPs for my LAN from my ISP, and did recall the subnet mask for a /29 network (it's 255.255.255.248, so I should have been able to math the other...). Changing that from the 192.168.1.0 network it starts with caused a little havoc as my device had the wrong network, so I grumbled through that reconfiguration, too. I got the NAT turned off, but realized I didn't remember the iptables rules to get the forwarding working. It's pretty simple, so it didn't take long to get it working with some Internet sleuthing.
The edge nodes could all reach the Internet, pinging and curling and so on, but nothing could come in. Internet responses came, of course, because of the default firewall rules allowing outward traffic and established connection responses, so once a node reached out, the response could come back. I found the simple rules that allowed all the traffic coming in on one interface bound for another to be accepted, but took a moment longer than before to make them specific to the two nodes that I wanted to expose. Instead of allowing the whole world access to all the ports on the nodes (and then trusting the nodes' firewalls), I just exposed the ports needed on the couple nodes I wanted.
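For anyone in the same spot, plain routed (non-NAT) forwarding on a Linux-based router generally comes down to enabling IP forwarding in the kernel and accepting traffic in the FORWARD chain. The sketch below is a guess at the general shape, not the exact rules used on this router. It only prints the commands so they can be reviewed before running them as root. `print_forward_rules` is a hypothetical helper, and `eth1` is an assumed name for the inside interface (only `eth0`, the ISP-facing side, appears in the real rules in this post).

```shell
# Print (not run) a minimal set of commands for routed forwarding.
# print_forward_rules is a hypothetical helper; interface names are assumptions.
print_forward_rules() {
  wan_if=$1   # interface facing the ISP
  lan_if=$2   # interface facing the routed hosts
  # Turn on kernel IP forwarding.
  echo "sysctl -w net.ipv4.ip_forward=1"
  # Let replies to already-established connections pass in either direction.
  echo "iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
  # Let hosts on the inside open new connections outward.
  echo "iptables -A FORWARD -i $lan_if -o $wan_if -j ACCEPT"
}

print_forward_rules eth0 eth1
```

Rules like these only cover outbound connections and their replies. New inbound connections still need their own ACCEPT rules.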
iptables -I FORWARD 2 -d webserver_ip/32 -i eth0 -p tcp -m tcp -m multiport --dports 80,443 -j ACCEPT
iptables -I FORWARD 3 -d web_and_mail_ip/32 -i eth0 -p tcp -m tcp -m multiport --dports 80,443,25,465,587,110,143,993,995 -j ACCEPT
Obfuscated a little, as I used the real IPs in there. I pondered adding port 22, as I'm wont to use it sometimes, but I'll save that for later. This works now. As soon as I added the rules, a bevy of waiting connections poured in.
As an added benefit, the CPU use on the edge router is much less than it was before I did this. There must have been something screwy in there before the reset.
The network isn't any faster, though. Despite having 1Gb/s fiber, which delivered 1Gb/s with a different router in place, it still peaks around 600Mb/s. Node-to-node on my WiFi I can get faster than 1Gb/s, and hitting something that's wired in caps at the LAN switches' and NICs' 1Gb/s speeds.
Frustrated, in the middle of all of this, I broke down and ordered a faster real router. It'll be here tomorrow, when I'll be waiting to do everything I just did on this thing all over again!