Things go wrong.

Takeaways

  • Rate limiting logins works, but it still allows 180 attempts per computer per hour.
  • Default server logging was not suitable for diagnosis
  • No notifications of high login attempts
  • No notification of software failure.

The first two posts analysing 320 days of an email server show a suspicious spike in late May 2023 from IPs in the UK.

Zooming in, we find only two computers, but each made roughly 100 requests. That should never be able to happen.

To recap, the email server has:

  • rate limiting on the number of logins
  • short term bans
  • long term bans
  • geographic bans

The rate limit is set to 3 logins a minute.

Between 08:12 and 08:51, 2 computers attempted 206 logins. 39 minutes at 3 logins a minute for 2 machines equals 234 attempts, so the observed total sits just under that ceiling. On the positive side, the rate limiting is working. Once again, no notifications were sent out.

This indicated a failure in the short term ban software: each computer should have been blocked after 5 failed logins.
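As a quick check on that arithmetic, the ceiling implied by the rate limit can be worked out directly. This is only a sketch using the figures quoted in this post; the variable names are my own.

```python
# Ceiling implied by the rate limit for the observed window.
RATE_PER_MINUTE = 3    # rate limit quoted in the post
MACHINES = 2           # attacking computers
WINDOW_MINUTES = 39    # 08:12 to 08:51
OBSERVED = 206         # login attempts actually recorded

ceiling = RATE_PER_MINUTE * WINDOW_MINUTES * MACHINES
share = OBSERVED / ceiling

print(f"ceiling:  {ceiling} attempts")
print(f"observed: {OBSERVED} attempts ({share:.0%} of the ceiling)")
```

The two machines were running at close to the maximum the rate limiter would allow, which is consistent with the limiter working and the banning layer not.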

Diagnosis

On top of the missing notifications and failure alerts, when the logs were collected in November 2023 it was discovered that the default server installation only kept most logs for 12 weeks. This made it difficult to investigate why the software had failed.

Log collating software mails out a summary each day, and this provided the only clue to what happened. It seems that Python libraries were updated around 6am on the 25th. The short term banning software is written in Python and in all likelihood should have been restarted. Instead, a server restart was scheduled for that night, 18 hours later. As it happened, a user restarted the server manually, out of schedule, just before 9am. If this had not happened, and the banning software had not caught up with the logs, those two UK IP addresses could have amassed 6,480 login attempts (180 an hour, for 2 machines, over 18 hours).

The email server relies on the short and long term banning quite heavily. Iran was already geographically blocked, but with 180 logins possible per computer per hour, the previous post shows how reliant the server is on banning to stop the squads of computers attacking.

If the China based botnet adopted a scouting pattern like Iran's and then sent all 900+ computers from inside its borders, it could test over 162,000 user and password combinations an hour.

At that kind of volume, the server would have complained through its existing mechanisms for server load and network bandwidth. Even so, it highlights how rapid this event was with only 2 machines attacking, and that it was pure luck that someone manually rebooted the server hours before the scheduled restart.
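The worst-case figures above all come from the same per-machine ceiling of 3 logins a minute. A minimal sketch of that projection follows; max_attempts is a hypothetical helper, not part of any tool on the server.

```python
PER_MACHINE_PER_HOUR = 3 * 60  # 3 logins a minute is 180 an hour

def max_attempts(machines: int, hours: int) -> int:
    """Upper bound on login attempts when only the rate limit is working."""
    return PER_MACHINE_PER_HOUR * machines * hours

# Two UK machines left alone until the scheduled restart, 18 hours later.
print(max_attempts(machines=2, hours=18))

# A 900-machine botnet attacking for a single hour.
print(max_attempts(machines=900, hours=1))
```

These are ceilings, not predictions: they assume every machine sustains the full rate and nothing else intervenes.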

Mitigations

  • Notifications of an increase in login attempts.
  • Notification of software failure.
  • Increasing the server logging time from default values.
  • Adding a note to increase logging on all newly commissioned servers.
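One possible shape for the first mitigation is sketched below. It assumes failed logins have already been parsed out of the mail log into (timestamp, IP) pairs; the parsing step depends on the log format and is not shown. The function name and the sample data are invented for illustration. The idea is that any address exceeding the 5 failed logins that should trigger a short term ban is itself a sign the banning software is not doing its job.

```python
from collections import Counter
from datetime import datetime, timedelta

BAN_THRESHOLD = 5  # failed logins after which the short term ban should apply

def ips_over_threshold(events, threshold=BAN_THRESHOLD):
    """Count failed logins per IP per clock hour.

    events: iterable of (datetime, ip) pairs.
    Returns {(hour_start, ip): count} for every count above the threshold.
    """
    counts = Counter()
    for when, ip in events:
        hour = when.replace(minute=0, second=0, microsecond=0)
        counts[(hour, ip)] += 1
    return {key: n for key, n in counts.items() if n > threshold}

# Sample data only, not taken from the real logs (documentation IP range).
start = datetime(2023, 5, 25, 8, 12)
events = [(start + timedelta(seconds=20 * i), "192.0.2.10") for i in range(12)]
events += [(start + timedelta(minutes=i), "192.0.2.20") for i in range(3)]

for (hour, ip), n in ips_over_threshold(events).items():
    print(f"{hour:%Y-%m-%d %H}:00 {ip}: {n} failed logins")
```

Bucketing by clock hour keeps the sketch short, but a burst that straddles the hour boundary is split across two buckets; a sliding window would catch that at the cost of more code.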

Notes on notifications: https://www.baeldung.com/linux/systemd-service-fail-notification gives a script to notify Slack and email, using SMTP to connect to an external server. However, as this is an email server, the scripts were rewritten to use the local services, which has the advantage of not storing plain text SMTP configuration on the machine.
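The rewritten scripts are not reproduced here. As a rough illustration of the idea only, the sketch below builds a failure email and hands it to the mail service on the same machine without authentication, so no credentials need to be stored. It assumes the local MTA accepts unauthenticated mail on localhost port 25 and that a wrapper (for example a unit triggered through systemd's OnFailure= setting) passes in the name of the failed unit. The function names, addresses and unit name are all placeholders.

```python
import smtplib
from email.message import EmailMessage

def build_failure_message(unit, sender, recipient, details=""):
    """Build a plain-text notification for a failed service."""
    msg = EmailMessage()
    msg["Subject"] = f"Service failed: {unit}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"systemd reported a failure for {unit}.\n\n{details}")
    return msg

def send_via_local_mta(msg, host="localhost", port=25):
    """Hand the message to the local mail service; no login, no stored password."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)

# Placeholder values; building the message does not touch the network.
msg = build_failure_message(
    unit="example-ban.service",
    sender="alerts@mail.example",
    recipient="admin@mail.example",
)
print(msg["Subject"])
```

Calling send_via_local_mta(msg) would deliver it. One caveat with this design: if the mail service is the thing that has failed, the alert will not arrive, so a second channel may still be worth having.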

It should be highlighted that graphics can be misleading even if informative.

