Looks like I picked the wrong week to quit sniffing glue

This week reminds me a bit of the running gag in Airplane!, “looks like I picked the wrong week to quit […]”, which starts off with smoking and then escalates from there.
I am not sure which came out first, but on the same night that I stayed up late reading the Cyber Safety Review Board’s report on its investigation of the Microsoft Exchange Online intrusion, I stumbled on the open source xz backdoor attack while trying to find out more about the Microsoft incident.

So, given that neither event is really too serious (the Microsoft report covers a hack that happened back in June, and the xz thing is actually a positive in that it was a foiled attempt to plant a backdoor), why did this make for an Airplane! “wrong week to quit” moment? Good question. It comes down to the fact that, since staying up late reading the report and then reading about xz, I have not stopped looping back to both events and questioning the bigger implications. Kinda like a movie or book that you spend the next few days looking at from multiple angles, seeing new things from each one.

So let’s first give a high level overview of both for those who have not heard about them, and then enumerate those new angles and ponderings in an attempt to organize my thoughts!

Cyber Safety Review Board’s report on the Microsoft Exchange Online incident from Summer 2023

The report is a very interesting summary of the investigation, outlining the facts, findings, and recommendations. Of particular note, the board did not mince words or hold back in blaming Microsoft for the intrusion, and its recommendations are very far reaching.

What happened?

On June 15th the US State Department’s Security Operations Center (SOC) received multiple alerts from a custom “Big Yellow Taxi” event correlation rule (Politico published a bit more information about this in an article in September 2023, but I have not been able to find many more technical details about the rule). The rule had previously given some false positives, but this time it hit pay dirt and alerted the SOC to a real intrusion that had gone completely undetected by Microsoft and by the other victims that were later identified.

This led to what I can imagine were a rough few days for the State Department security guys and Microsoft while they investigated what the heck was happening: a threat actor was using a 2016 MSA key (MSA being Microsoft’s consumer account system) to forge authentication tokens that allowed access to victims’ Exchange Online accounts.

This was strange and puzzling for 3 main reasons:

  1. How was someone forging tokens?
  2. Why was a key from 2016 still working in 2023?
  3. Why was a token for Microsoft’s consumer authentication system working with Microsoft’s Enterprise Exchange Online system?

All are good questions, and all but the first have been answered: it is still unclear how someone other than Microsoft came into possession of an MSA master key from 2016. There are theories that it came from some previous intrusion into Microsoft’s systems, but the smoking gun has not yet been found.

The old key was still working because the MSA identity infrastructure was created more than 20 years ago and had no automated signing key rotation, so rotation had to be done manually. In 2021 one of these manual interventions caused a major outage, and after that Microsoft stopped rotating the keys, which left the one from 2016 able to sign authentication tokens that were still accepted as valid.

And for the third: Microsoft created authentication APIs that were usable for both its enterprise and consumer systems, and had not bothered to make sure that an authentication token from each system was valid only on that system and not the other.
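To make the second and third answers a bit more concrete, here is a minimal sketch of the two checks that were missing: reject tokens signed by a key past its expiry date, and reject tokens signed by a key that belongs to a different system. Everything in it is an assumption for illustration only: the names SigningKey, sign, validate and the "consumer"/"enterprise" labels are hypothetical, the expiry dates are made up, and HMAC stands in for the asymmetric signatures a real identity system would use. It is not how Microsoft’s system works.

```python
import base64
import hashlib
import hmac
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SigningKey:
    key_id: str          # must not contain "." (used as the token separator)
    secret: bytes        # HMAC secret, a stand-in for a real asymmetric key
    scope: str           # hypothetical label: "consumer" or "enterprise"
    not_after: datetime  # tokens signed by this key are rejected after this


def sign(payload: dict, key: SigningKey) -> str:
    """Build a toy token of the form key_id.body.mac."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    mac = hmac.new(key.secret, body, hashlib.sha256).hexdigest()
    return f"{key.key_id}.{body.decode()}.{mac}"


def validate(token: str, keys: dict, service_scope: str, now: datetime) -> dict:
    """Return the payload, or raise ValueError if any check fails."""
    key_id, body, mac = token.split(".")
    key = keys.get(key_id)
    if key is None:
        raise ValueError("unknown signing key")
    expected = hmac.new(key.secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("bad signature")
    # Check 1: an old key must stop working, even if its signature verifies.
    if now > key.not_after:
        raise ValueError("signing key expired")
    # Check 2: a key from one system must not be accepted by the other.
    if key.scope != service_scope:
        raise ValueError("key not valid for this service")
    return json.loads(base64.urlsafe_b64decode(body))


# Illustrative data only: the dates and secrets below are invented.
UTC = timezone.utc
NOW = datetime(2023, 6, 15, tzinfo=UTC)
KEYS = {
    "consumer-2016": SigningKey(
        "consumer-2016", b"stolen", "consumer", datetime(2017, 1, 1, tzinfo=UTC)),
    "enterprise-2023": SigningKey(
        "enterprise-2023", b"secret", "enterprise", datetime(2024, 1, 1, tzinfo=UTC)),
}
```

In this toy model, a token signed with the "consumer-2016" key and presented to an "enterprise" service fails on both checks, while in the incident described above neither check was effectively in place.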

What keeps bouncing around my head about the Microsoft Exchange Online incident?

  1. Why did multi-factor authentication (MFA) not stop this?
    We have spent the last few years, at considerable effort and cost, ensuring all systems have MFA, and for what, if someone can just completely bypass that whole part of the process!

  2. How many accounts were compromised?
    Luckily the State Department had some really smart people (or probably one really smart person) who implemented the Big Yellow Taxi rule and brought this all to light, but how long had this been in use before it was tried against the State Department?
    Well, we don’t know, because the logs needed to detect it were only held for 30 days, so once it was clear what was going on and the logs were analyzed, they only had data going back a month. In that month of data it was determined that accounts at 22 enterprise organizations had been compromised (I have not seen data on the number of individual email accounts compromised).

xz backdoor

This incident is more recent and still ongoing. It all kicked off on Friday March 29th, when Andres FREUND published a post to an Openwall open source security mailing list reporting that, while investigating some performance issues with SSH on a bleeding edge version of Debian, he had discovered a backdoor. His investigation is really detailed and led him down a long and twisty rabbit hole, eventually arriving at:

  1. It was not SSH that was causing higher than usual CPU utilization but a package called XZ Utils that had a backdoor that had been planted in the latest release.
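  2. SSH does not use this library directly. On the affected distributions sshd is patched to link against libsystemd, and libsystemd in turn depends on liblzma, the LZMA compression library that ships with XZ Utils.
  3. The attacker specifically targeted Debian and Red Hat style package builds because both pull system level libraries into sshd via systemd in this way, allowing the backdoor to be placed in an obscure dependency subjected to far less scrutiny than SSH itself.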
  4. The actual backdoor was not sitting in the readable source code on GitHub. It was spread across build scripts (including the configure script) shipped in the release tarballs and binary files in the test area that carried parts of the backdoor. The workings of Linux releases and how all this is compiled and packaged are beyond my grasp, but the important thing here is that this was not just a backdoor sitting in the source code, easy to see, but one obfuscated in multiple layers.
  5. The backdoor is invoked by connecting via SSH with a specially crafted SSH key. The injected code in the xz library hooks into sshd’s key verification and, when it recognizes the attacker’s key, allows the attacker to run a payload.
  6. The attacker actually started planting the seeds for this in 2021, so this was a long game adversary with lots of patience. The final touches of the backdoor were introduced into the code repo on February 23 and had only made it into beta versions of Debian and Red Hat’s Fedora before being detected by Andres FREUND on March 28th.
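Point 4 in the list above suggests one check that is simple to describe: compare what is in the release tarball with what is in the source repository. The following is a minimal sketch of that idea, assuming both have already been unpacked into two directories (the names repo_dir and tarball_dir, and the functions file_hashes and compare_trees, are hypothetical). It also assumes the repository directory is a clean export without the .git folder. Release tarballs legitimately contain generated files that are not in the repository, so a difference is something to look at, not proof of anything.

```python
import hashlib
from pathlib import Path


def file_hashes(root: Path) -> dict:
    """Map each file path (relative to root) to its SHA-256 digest."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def compare_trees(repo_dir: Path, tarball_dir: Path) -> dict:
    """List files that exist only in the tarball or differ from the repo."""
    repo = file_hashes(repo_dir)
    tarball = file_hashes(tarball_dir)
    shared = set(repo) & set(tarball)
    return {
        "only_in_tarball": sorted(set(tarball) - set(repo)),
        "changed": sorted(p for p in shared if repo[p] != tarball[p]),
    }
```

Calling compare_trees(Path("xz-repo-export"), Path("xz-release-unpacked")) with two such directories (both paths are placeholders) returns the files a reviewer would want to read first. It would not by itself explain what a suspicious file does, which is where the multiple layers of obfuscation come in.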

What keeps bouncing around my head about the xz backdoor?

  1. A near miss, but what else is out there that has not been detected?
    Thanks to Andres FREUND this crazy backdoor was detected after being in the wild for only about a month, before it had made it into any mainstream releases, and the clean up has been easy due to the small footprint of impacted systems. But what other backdoors have been planted and not detected?
  2. Is open source really such a good thing?
    I understand this question is very polemic, but it keeps coming back to me on three fronts: 1. It allows bad actors to find security holes and exploit them without reporting them. 2. It allows bad actors to try to plant exploits, as demonstrated in this case. 3. It allows countries whose policies we might not necessarily agree with to appropriate the knowledge and technology (I am looking at you, North Korea). I am sure all of these points have been hashed over by people much smarter than me, but these are the things that keep bouncing around my mind the last few days.
  3. How do you protect against supply chain attacks?
    It is hard enough just keeping a clear inventory of all OSes and applications and keeping them patched and secure, but this attack on an underlying library makes it clear that even all that effort might not be enough. So how do you even go about handling this additional complexity?
  4. Is it a good idea to just keep patching?
    Had this been undetected, at some point in the not too distant future I would have been doing my normal good hygiene patching of my Debian systems and instead of making them more secure, I would have planted a big fat SSH backdoor on them. That is kinda the opposite effect I was hoping for spending my time patching!
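One small check that can sit alongside routine patching is to compare an installed version against a list of known bad releases. The sketch below does that for xz. Note the assumptions: the version numbers 5.6.0 and 5.6.1 are the releases publicly reported as backdoored under CVE-2024-3094 and do not come from this post, the function names are hypothetical, and the check only looks at the upstream version string, so distribution specific version suffixes are not handled.

```python
import re

# Releases publicly reported as carrying the backdoor (CVE-2024-3094).
# These numbers are an addition for illustration, not from the post itself.
BACKDOORED_XZ_VERSIONS = {"5.6.0", "5.6.1"}


def parse_xz_version(version_output: str):
    """Return the first X.Y.Z version found in the text, or None."""
    match = re.search(r"\b(\d+\.\d+\.\d+)", version_output)
    return match.group(1) if match else None


def is_backdoored(version_output: str) -> bool:
    """True if the reported version is on the known bad list."""
    return parse_xz_version(version_output) in BACKDOORED_XZ_VERSIONS
```

The text to pass in would be whatever the command xz --version prints on the system being checked. A check like this only helps after a backdoor is publicly known, which is exactly the uncomfortable part of the question above.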

Learning takeaways

So after getting over my “wrong week to have quit sniffing glue” shock, I need to see these events for what they are: learning events that need to be taken advantage of in our loop of constant improvement. So in that vein, here is what I have identified so far:
(1) Security continues to be like an onion and is achieved through layer after layer.
(2) The bigger your footprint, the more possible vulnerabilities you have. Get rid of anything not strictly needed (services, applications, browser plugins, etcetera).
(3) I hope this will encourage more funding for code reviews of open source packages that have obscure, unknown names but make up and enable what we know as the internet.
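
The second takeaway can be turned into a simple recurring check: compare what is installed with what is actually needed. The sketch below is only an illustration under stated assumptions; the function name footprint_report and the example package names are hypothetical, and building the two lists for a real system is the hard part it does not cover.

```python
def footprint_report(installed, required):
    """Split an inventory into removal candidates and missing essentials."""
    installed, required = set(installed), set(required)
    return {
        # Installed but not on the needed list: candidates for removal.
        "remove_candidates": sorted(installed - required),
        # Needed but not installed: worth knowing about as well.
        "missing": sorted(required - installed),
    }


# Hypothetical example inventory for a single server.
report = footprint_report(
    installed=["openssh-server", "nginx", "telnetd", "cups"],
    required=["openssh-server", "nginx", "fail2ban"],
)
```

Here report["remove_candidates"] lists the two services nobody asked for, which is the footprint that takeaway (2) says to get rid of.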

Thanks for reading and feel free to give feedback or comments via email (andrew@jupiterstation.net).