Last week, many airplanes stopped taking off from airports, some TV channels in the UK stopped broadcasting, cash registers at Japan’s McDonald’s outlets stopped working, emergency phone services in Alaska stopped taking calls, many banks around the world stopped giving out any money and some police departments stopped work altogether. All of this was caused by one big computer crash: many businesses saw their Windows computers freeze up. Their screens turned completely blue (this is called the ‘Blue Screen of Death’ - BSOD). It was a huge and unusual event! It took at least four days for things to be fixed and life to get back to normal. Some experts have called it ‘the greatest IT failure in human history’.
Every week on Lighter Side, I write one detailed news story - keeping in mind readers who are as young as 8 and as old as 100. This week’s story covers how a single line of code in a computer brought the entire world to a halt. If this post was forwarded to you and you liked it, consider subscribing. It’s free.
What happened exactly? 🖥️
On the morning of the 19th of July, 2024 (Friday), ~8.5 million computers worldwide crashed (aka stopped working). All of them were computers that ran on Windows.
People waiting to board their flights in airports were stranded. Their computers stopped printing boarding passes (some resourceful airlines in India and the US moved to tearing sheets of paper and writing the boarding passes by hand, to get passengers going). The rest canceled their flights - more than 10,000 flights were canceled worldwide.
Public trains, buses and underground metros stopped working in cities in the US (Chicago, New York, Washington DC).
Hospitals could not accept new appointments or admissions or schedule surgeries. Patients had to wait for hours to be able to see a doctor at a neighbourhood clinic. Emergency calling services in some states in the US stopped accepting calls (Alaska, Indiana, New Hampshire etc.).
In some banks, online banking systems stopped working. People could not transfer money, some employees of companies did not get their salaries remitted into their accounts on time and many financial transactions globally had to be halted.
Many TV broadcast channels (e.g. Sky News in the UK, ESPN in the US) stopped working.
During a Formula One race, the computer of one of the racing companies (Mercedes) crashed. The driver and the car were waiting for the computer to be fixed.
The crash was sudden. This outage or crash was global - computers in all continents (barring Antarctica) were hit. Nobody knew what was going on.
Within 79 minutes, the problem was identified.
A company called CrowdStrike clarified on Twitter, saying something to the effect of:
Hey guys, all those of you with your computers frozen, don’t worry! This is not a cyberattack or an alien attack! Our company wrote one faulty line of code that went into your computers. Don’t worry, we’ve figured out how to fix it. We’re putting out a detailed guideline on how to fix it on the internet. Check it out.
What was this faulty line of code? 🛑
CrowdStrike is a company that helps other companies by installing software on their computers to protect them from cyberattacks.
Every day, CrowdStrike sends new lines of code to computers worldwide, to strengthen them against cyberattacks (each day, they discover something new that might pose a threat). On the 18th of July, 2024, one of their software engineers wrote a line of code which, in simple English, was supposed to do this:
Go into the computer’s operating system.
Check the different processes that are talking to each other
If you find any process suspicious, shut it down.
Here’s what the software actually did:
Go into the computer’s operating system.
I don’t like many of the processes inside this computer. Hmmm…most of them look suspicious to me!
Let me shut them down
Result: The entire computer shut down
If you are a bit of a geek and would like a bit more information on the software code (beyond simple English), here’s what the faulty line of code did.
CrowdStrike’s code flagged legitimate (perfectly good and well-behaved) processes as malicious. This was because the line of code that was written was too broad (roughly: ‘check if there is ANYTHING malicious’). The code ended up labeling most processes as a threat.
This led to the inadvertent termination of each of these processes. This in turn caused widespread system disruptions and failures.
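For readers who code, here is a toy Python sketch of the simplified explanation above. It is not CrowdStrike’s real code: the process list, the field names and both rules are made up purely to show how a rule that is too broad ends up flagging well-behaved processes too.

```python
from typing import Callable, Dict, List

# Made-up list of processes, for illustration only.
PROCESSES: List[Dict[str, object]] = [
    {"name": "file_explorer", "talks_to_others": True, "known_bad": False},
    {"name": "printer_service", "talks_to_others": True, "known_bad": False},
    {"name": "password_stealer", "talks_to_others": True, "known_bad": True},
]


def too_broad_rule(process: Dict[str, object]) -> bool:
    # Hypothetical over-broad check: flags anything that talks to other
    # processes, which is nearly every process on a computer.
    return bool(process["talks_to_others"])


def narrow_rule(process: Dict[str, object]) -> bool:
    # Hypothetical narrower check: flags only processes known to be bad.
    return bool(process["known_bad"])


def processes_to_shut_down(rule: Callable[[Dict[str, object]], bool]) -> List[str]:
    # Return the names of all processes that the given rule marks as a threat.
    return [str(p["name"]) for p in PROCESSES if rule(p)]


print(processes_to_shut_down(too_broad_rule))
print(processes_to_shut_down(narrow_rule))
```

With the broad rule, every process in the list is marked for shutdown. With the narrow rule, only the one bad process is.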
CrowdStrike quickly identified the issue and released a subsequent update to correct the erroneous detection logic.
BUT, the fixes had to be done manually.
The only way to fix this faulty line of code was for people to reboot each computer or laptop by hand (just pressing the start button on the keyboard would not have worked). Technical staff went from one computer to another and rebooted each one manually (imagine doing this for 8.5 million computers). It took more than 4 days for a large chunk of the computers to be up and running.
This picture below was shared by someone who had to reset 2000 laptops in his company. These are just 120 of the laptops.
People have suffered in the millions. Who should we blame for this?
When most companies roll out a new line of code, they first test it on a small group of computers and check for errors. For example, CrowdStrike could have run this as a test on:
1% of the computers in San Francisco city (check for errors).
If all is well, run on 10% of computers in California state (check for errors).
If all is well, run on 100% of computers in the USA (check for errors).
If all is well, run on 20% of the rest of the world (check for errors).
If all is well, run on 100% of the world.
CrowdStrike chose a different method:
Run the code on 100% of the world.
Which, we can all agree, was lazy, sloppy and enormously irresponsible. Large companies (like CrowdStrike) should have well-designed processes to test each line of code thoroughly. Any software engineer writing code can have a bad day and may come up with something erroneous. The processes in these organisations should be robust enough to test every line written by every engineer before it is deployed anywhere (much less in the entire world in one go). That is also why CrowdStrike’s investors are suing it in court, saying the company’s testing processes were not good enough and that they (the investors) should not be asked to bear any losses.
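For readers who code, the step-by-step rollout described above can be sketched in a few lines of Python. This is only an illustration: the stages come from the example in this story, and `deploy_and_check` is a made-up function standing in for ‘install the update on this share of computers and report whether anything went wrong’. It does not describe any company’s real release process.

```python
from typing import Callable, List, Tuple

# Rollout stages from the example in the text: (region, percent of computers).
STAGES: List[Tuple[str, int]] = [
    ("San Francisco", 1),
    ("California", 10),
    ("USA", 100),
    ("rest of the world", 20),
    ("whole world", 100),
]


def staged_rollout(deploy_and_check: Callable[[str, int], bool]) -> List[str]:
    """Roll out one stage at a time and stop at the first stage with errors.

    deploy_and_check(region, percent) is a hypothetical function that
    returns True if the update caused no errors at that stage.
    Returns the list of stages that completed successfully.
    """
    completed: List[str] = []
    for region, percent in STAGES:
        if not deploy_and_check(region, percent):
            print(f"Errors at {percent}% of {region}: rollout stopped.")
            break
        completed.append(f"{percent}% of {region}")
    return completed


def faulty_update(region: str, percent: int) -> bool:
    # Stand-in for an update that crashes every computer it reaches.
    return False


def good_update(region: str, percent: int) -> bool:
    # Stand-in for an update that works everywhere.
    return True


print(staged_rollout(faulty_update))
print(staged_rollout(good_update))
```

In this sketch, the faulty update is stopped at the very first stage (1% of San Francisco) and never reaches the rest of the world, while the good update passes all five stages.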
Ahem! That brings us to an important point - after every exam you’ve written - check your answers :)
Writing Course for Kids: If you enjoy writing or spinning stories in your head, you could become the next published author! Assuming you are between the ages of 8 and 15 :) We conduct a creative writing course that is designed to help young children bring their colourful ideas into one consistent story, show them how to use appropriate vocabulary to edit their work and help them publish the book. It’ll be fun! Check it out! The first class is available as a FREE TRIAL. Drop me a note at hello@wsnt.in if you would like to sign up for a free trial.
Update story - the PM of Bangladesh gets on a military aircraft and flees!
If you have been reading Lighter Side updates regularly, you may know that lots of students have been protesting in Bangladesh. Pakistan has been watching this chaos with glee (over the past 15 years, when Bangladesh’s income grew rapidly and Pakistan’s stagnated, they were not very happy).
This Monday - the Prime Minister of Bangladesh (Sheikh Hasina) had her morning cup of coffee and called the police and armed forces heads for a meeting. Large crowds had gathered on the roads and were marching towards her home. They wanted her to resign. It is suspected that many of these were not the students who had begun the protest to simply ask for the job quotas to go away. It is alleged that many of them were elements instigated by Pakistan’s forces to create chaos inside Bangladesh and topple the government.
The army chiefs felt that the PM’s life was in danger if she stayed on. Sheikh Hasina had exactly 45 minutes to resign and flee the country. She did just that. She got into a military aircraft and sent a message to the Indian government requesting asylum. Within a few minutes, the Indian government radioed approval for her aircraft to enter Indian airspace.
As of writing this newsletter, she is in Delhi, hoping that the UK will give her asylum. The chaos in Bangladesh has a very interesting back-story. Here’s the whole history (including this week’s updates).
Podcast this week
China is the world’s largest producer of electric cars. The US desperately wants cleaner air and thus more electric cars. But it has simply not managed to get electric cars that are affordable and in great working condition onto its roads. China has managed to do this. Despite the US being (supposedly) keen on clean air, it has said that if Americans want to get these cars from China, they’ll have to pay a big, fat tax. Are they nuts? Why would they do that? We look into this (seemingly weird) story in this week’s podcast. It’s an 11-minute listen. Click right here.