Faulty CrowdStrike Update Leads to Global Microsoft Outage
A faulty update by CrowdStrike created a massive worldwide outage across Microsoft Windows systems. Thousands of people and businesses witnessed system crashes and the Blue Screen of Death (BSOD). Learn more about this global outage, its cause, and a possible workaround.
- People and businesses using Windows systems worldwide witnessed a global outage.
- Many people experienced system crashes, the Blue Screen of Death (BSOD), and boot loops.
- The global outage is attributed to a faulty update by CrowdStrike.
A faulty update by CrowdStrike has created a massive worldwide outage across Microsoft Windows systems. Thousands of people and businesses witnessed system crashes and the dreaded Blue Screen of Death (BSOD). Many Windows-based computers also entered a boot loop, in which the computer repeatedly restarts without finishing startup. The glitch has affected Windows servers and workstations and has even taken entire companies offline. Microsoft Azure and Microsoft 365 services are also said to be experiencing disruptions.
The Cause of the Outage
CrowdStrike is a major player in cybersecurity, and many businesses worldwide depend on the company’s software to protect their Windows servers and PCs from cyber threats.
The outage across Windows systems is linked to a software update by the cybersecurity services provider. For the Azure disruptions, Microsoft said the preliminary root cause was a “configuration change” in some Azure backend workloads, which caused interruptions between compute resources and storage, resulting in connectivity failures. These failures affected downstream Microsoft 365 services that depend on those connections.
According to George Kurtz, CEO of CrowdStrike, the outage was caused by a “defect” in a content update for Windows hosts. Kurtz also ruled out cyberattacks and said that Linux and Mac hosts were unaffected.
A post on CrowdStrike’s support forums acknowledged the issue, saying the company received crash reports related to a content update for Falcon Sensor, its cloud-based security service.
In an X post on July 19, 2024, Kurtz (@George_Kurtz) said that the company was rolling out a fix. He said, “CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted.”
“This is not a security incident or cyberattack. The issue has been identified, isolated, and a fix has been deployed,” Kurtz further said.
Who Has Been Affected?
Thousands of businesses across industries and sectors have been affected by the outage, including:
- Emergency services in Canada and many major American cities, including the 911 emergency services in New York, Arizona, and Alaska.
- The health hotline in Catalonia, Spain, is reported to be impacted.
- Hospitals in the Netherlands, the US, and Spain.
- The clinical computer system of the UK’s National Health Service (NHS).
- Dutch broadcasting organization NOS reported that the glitch disrupted Schiphol Airport.
- Airports and airlines in Australia, Germany, Scotland, Spain, India, the Netherlands and the UK.
- Multiple news outlets and television stations, such as ABC and Sky News, suffered disruptions.
- The London Stock Exchange in the UK reported disruptions. Several banks in Australia and New Zealand also reported being affected by the outage.
CrowdStrike Suggests a Workaround
While the company fixes the issue, it has provided a few workaround steps for people who have been affected.
- Boot Windows into Safe Mode or the Windows Recovery Environment
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
- Locate the file matching “C-00000291*.sys” and delete it.
- Boot the host normally.
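The file-deletion step above can be sketched in Python for illustration. This is a minimal sketch, not a CrowdStrike tool: it assumes a Python interpreter is available where the affected disk is accessible, which is normally not the case in Safe Mode or the Windows Recovery Environment. The function name `remove_matching_files` and the `dry_run` flag are hypothetical. The directory and the file pattern are taken from the workaround steps.

```python
from pathlib import Path

# Directory and file pattern as given in the workaround steps.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_matching_files(directory=DEFAULT_DIR, dry_run=True):
    """Return the files in `directory` that match PATTERN.

    The files are deleted only when dry_run is False. The function name
    and the dry_run flag are illustrative, not part of any vendor tool.
    """
    directory = Path(directory)
    if not directory.is_dir():
        # Nothing to do if the directory does not exist.
        return []
    matches = sorted(p for p in directory.glob(PATTERN) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: list what would be deleted without touching anything.
    for path in remove_matching_files(dry_run=True):
        print(f"would delete: {path}")
```

The dry run is the default so that the matching files can be reviewed before anything is removed. Passing `dry_run=False` performs the deletion, after which the host can be booted normally as described in the last step.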
Executives and Industry Experts Share Their Views
As soon as the global outage made headlines, several industry experts shared their views. Here are a few.
1. Jake Moore, global security advisor at ESET
“These outages are increasing in volume due to the sheer increase in online users and traffic. After witnessing the blue screen of death (BSOD), many people are quick to suspect a cyberattack or find similarities to Netflix’s Leave The World Behind, but this can often add to the confusion. It highlights the importance of these services and the millions of people they serve.
Businesses must test their infrastructure and have multiple fail safes in place, regardless of company size. This is typically referred to as a cyber-resilience plan. However, as is often the case, it is simply impossible to simulate the size and magnitude of the issue in a safe environment without testing the actual network.
The inconvenience caused by the loss of access to services for thousands of people serves as a reminder of our dependence on Big Tech, such as Microsoft, in running our daily lives and businesses. Upgrades and maintenance to systems and networks can unintentionally include small errors, which can have wide-reaching consequences, as experienced today by CrowdStrike’s customers.
Another aspect of this incident relates to “diversity” in using large-scale IT infrastructure. This applies to critical systems like operating systems (OSes), cybersecurity products, and other globally deployed (scaled) applications. Where diversity is low, a single technical incident, not to mention a security issue, can lead to global-scale outages with subsequent knock-on effects.”
2. Rob Reeves, principal cyber security engineer at Immersive Labs
“It is still too early to judge how such an error occurred and whether a code fault with the driver or an unanticipated and undocumented change in the Windows Operating System, which CrowdStrike was unable to predict, is responsible. It is clear, however, that the heavy reliance on Falcon has become a double-edged sword and is causing untold disruption to business operations worldwide.
The severity of this incident serves as a stark wake-up call, highlighting the critical need for rigorous and dependable testing of EDR and ELAM drivers in cybersecurity systems. Now more than ever, it is crucial to reassess and overhaul current testing procedures, swiftly identifying and addressing any issues that arise.
This prompts reflection on whether security product updates should be automatically applied universally for up-to-date protection or if customers should maintain control over the update process, ensuring thorough testing before implementation.”
3. Aleksandr Yampolskiy, CEO, SecurityScorecard
“When I used to work at Goldman Sachs, the policy was to get tools from multiple vendors. This way, if one firewall goes down by one vendor, you have another vendor who may be more resilient.
Today’s global outage reminds us of the fragility and systemic “nth-party” concentration risk of the technology that runs everyday life: airlines, banks, telecoms, stock exchanges, and more. SecurityScorecard, in collaboration with McKinsey, produced research showing that 62% of the global external attack surface is concentrated in the products and services of just 15 companies.
An outage is just another form of a security incident. Antifragility in these situations comes from not putting all your eggs in one basket. You need to have diverse systems, know where your single points of failure are, and proactively stress-test through tabletop exercises and simulations of outages. Consider the “chaos monkey” concept, where you deliberately break your systems—e.g., shut down your database or make your firewall malfunction to see how your computers react.
Whether caused by a malicious DDoS attack or a faulty patch update, an outage has the same end result: Users are denied access to critical systems.
This disruption creates a fertile ground for exploitation, as attackers prey on the vulnerability of users seeking solutions. The timing of this event and how public it is happens to be exactly what attackers look for to craft targeted attacks. Threat actors may use social engineering tactics to disguise malware as legitimate restoration tools to gain unauthorized access to systems. Vigilance is paramount, as organizations must not only address the outage but also fortify defenses against opportunistic attacks that exploit the chaos.”
4. Carlos Aguilar Melchor, chief scientist, cybersecurity at SandboxAQ
“It is essential to have visibility on your software supply chain, especially around critical practices such as cybersecurity, cryptography management, and, of course, testing and updates practices. With this historical outage, along with other recent software supply-chain catastrophic events, such as SolarWinds and Log4j, we cannot accept with blind trust software updates nor blindly trust cybersecurity or cryptography practices. Every company should implement observability in their software systems right away to monitor these high-impact platforms and prevent these catastrophes.”
5. Graham Steel, head of cybersecurity product at SandboxAQ
“This major outage was caused by a bug that CrowdStrike didn’t catch before rolling out an update to thousands of companies globally. This new outage should spur all companies to implement systems that analyze every update before it is allowed into their company. Recent consolidation in the cybersecurity market has increased the risk of this recurring – businesses rely on just a few vendors.”
For the latest service updates from the vendors, visit the status page for Microsoft Azure and CrowdStrike’s Statement on Falcon Content Update for Windows Hosts.
The article will be updated as we receive more updates on the issue.