CrowdStrike Crashes the Microsoft Windows World
A file update gets pushed and blue screens Windows users around the world.
Diversification and Modularization would have saved us today. Those strategies are common software solutions to very solvable computer problems in 2024, but often come with more investment cost and upkeep.
But let's back up a second and look at how we got here. CrowdStrike - cybersecurity company extraordinaire - released a file update to its platform very early in the morning on the US East Coast. By all appearances, it was not adequately quality assured before being rolled out to a large portion of the Windows PCs running the company's software agent, which helps protect against malicious attacks. When the agent triggered Windows to restart, Windows booted into a blue screen of death. And that's where a lot of PCs sat this morning when IT professionals in all industries around the world went to work. Many were left trying to manually roll back the update in Windows Safe Mode.
Affected industries included 911 call centers, TV networks, UPS and FedEx, payment processors, airlines, and many more. Airlines seemed to be among the hardest hit, as most of their public- and private-facing systems went down, leading to countless cancelled and delayed flights. However, a bright spot: Southwest Airlines was reportedly partially saved by its use of Windows 3.1 and Windows 95 (I'm not condoning the use of 30+ year old operating systems, but more on this later).
Let's break down the problems at play in this situation:
Many cybersecurity products run processes with access to the Operating System (OS) kernel, the lowest layer of code, which orchestrates communication among all other code layers of the OS. Exploits often target this layer because it grants the maximum amount of permissions. For the same reason, a fault in code running at this level can take down the whole machine.
Windows is the Operating System used almost ubiquitously in PC desktops - with Mac typecast as only for creatives and Linux as only for coders.
Quality Assurance! Especially for publicly traded companies! My conjecture as a professional software developer: someone at CrowdStrike made a small change to the release after testing the new release all day long, and thought "surely my tiny change doesn't need testing after I've tested so much today already." They cut the corner on their own testing and possibly that of any other team members or QA people.
So, solutions?
Diversifying isn't a new concept, but it's often overlooked as it adds cost and time. One could argue, though, that it is insurance against the possibility of an issue like today's CrowdStrike one. Here are some of the technologies that could be modified by any of these industries in order to mitigate the risk:
Know your infrastructure architecture and all of the single (or even double) points of failure. What would happen if any or all of these go down?
Diversify your hardware. Buy desktop PCs and networking devices from multiple vendors and manufacturers. Use Apple, use Dell, use HP, use Cisco, use Netgear. Think of multiple vendors as a way to add redundancy.
Diversify your Operating Systems. Windows, macOS, Linux. Give them all a shot. And any systems that are segregated from the Internet could even use old OS versions that no longer receive updates (in a redundancy capacity, as Southwest Airlines reportedly did).
Diversify your software. If any of these companies were using only CrowdStrike as their security platform, then they had lost the game already. Use 3 or 4 or even 5 security platforms (not all running on the same device - mind you - only one per device) and mix up the usage amongst Operating Systems.
Assure your Quality. It shouldn't be a trite saying. It should prompt a follow-up question and action: "Where are my blind spots, and what are the steps to fix them?" List your blind spots. List your steps. Then, do the steps. Rinse and repeat.
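To make "list your blind spots" a little more concrete, here is a minimal Python sketch of how one might audit a device inventory for over-concentration in a single hardware vendor, Operating System, or security platform. Everything in it is an assumption for illustration: the Device fields, the fleet entries, the placeholder agent names (AgentA, AgentB, AgentC), and the 50% threshold are all hypothetical, not a recommendation or anyone's real setup.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    # Hypothetical inventory record; real inventories will have more fields.
    name: str
    hardware_vendor: str
    os: str
    security_agent: str


def concentration(devices, attribute):
    """Return {value: share of the fleet} for one attribute, e.g. 'os'."""
    counts = Counter(getattr(d, attribute) for d in devices)
    total = len(devices)
    return {value: count / total for value, count in counts.items()}


def blind_spots(devices, max_share=0.5):
    """List (attribute, value, share) where one choice covers more than max_share."""
    findings = []
    for attribute in ("hardware_vendor", "os", "security_agent"):
        for value, share in concentration(devices, attribute).items():
            if share > max_share:
                findings.append((attribute, value, share))
    return findings


# Made-up fleet: three of the five devices share an OS and a security agent.
fleet = [
    Device("front-desk-1", "Dell", "Windows", "AgentA"),
    Device("front-desk-2", "HP", "Windows", "AgentA"),
    Device("dispatch-1", "Dell", "Windows", "AgentA"),
    Device("design-1", "Apple", "macOS", "AgentB"),
    Device("build-1", "Dell", "Linux", "AgentC"),
]

for attribute, value, share in blind_spots(fleet):
    print(f"{attribute}: {value} covers {share:.0%} of devices")
```

With this made-up fleet, the report flags Dell, Windows, and AgentA, since each covers three of the five devices. The right threshold is a judgment call that depends on how much downtime you can tolerate. The point is only that the question "what happens if this one thing goes down?" can be asked of an inventory automatically and repeatedly.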
As far as I can tell, I'm one of the lucky ones. Nothing I've worked with or used today was visibly affected by the outage. But, millions of people were and are having to cope with a tech company's laziness. For that reason, we can and should do better.