The CrowdStrike glitch, and the next global IT breakdown already in the making

CrowdStrike CEO on global outage: The goal now is to ensure every customer is up and running again

When computer screens all over the world went blue on Friday, flights were canceled, hotel check-ins became impossible, and cargo deliveries came to a standstill. Companies resorted to pen and paper. And the first suspicion fell on some sort of cyberterror attack. The reality, however, was far more banal: a botched software update from the cybersecurity company CrowdStrike.

“In this case, it was a content update,” said Nick Hyatt, director of threat intelligence at security firm Blackpoint Cyber.

And because CrowdStrike has such a broad customer base, the content update was felt all over the world.

“One mistake had catastrophic consequences. This is a great example of how closely connected our modern society is to IT – from cafes to hospitals to airports, a mistake like this has massive implications,” said Hyatt.

In this case, the content update was tied to CrowdStrike's Falcon monitoring software. Falcon, Hyatt said, has deep hooks that let it monitor endpoints – in this case, laptops, desktops and servers – for malware and other malicious behavior. Falcon automatically updates itself to account for new threats.

“The automatic update feature pushed out bad code and here we are,” Hyatt said. Automatic updates are standard in many software applications, not only CrowdStrike's. “It's just that the consequences are catastrophic because of what CrowdStrike is doing,” Hyatt added.

Although CrowdStrike quickly identified the issue and many systems were back up and running within hours, the global cascade of damage isn't easy to reverse for companies with complex systems.

“We expect it will take three to five days to resolve the situation,” said Eric O'Neill, a former FBI counterterrorism and counterintelligence agent and cybersecurity expert. “That means a lot of downtime for companies.”

The fact that the outage occurred on a Friday in the summer, when many offices were empty and IT resources to resolve the issue were scarce, didn't help, O'Neill said.

Software updates should be rolled out gradually

One lesson from the global IT outage is that CrowdStrike's update should have been rolled out gradually, O'Neill said.

“CrowdStrike pushed out its updates to everyone at once. That's not the best idea. Send it to a group and test it. There are different levels of quality control it should go through,” O'Neill said.

“It should have been tested in sandboxes in many environments before release,” said Peter Avery, vice president of security and compliance at Visual Edge IT.
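The staged approach the experts describe is commonly implemented as a "canary" rollout: push the update to a small slice of the fleet, check health, and only then widen the blast radius. A minimal sketch of the idea (the stage fractions and the `apply_update`/`health_check` callables are hypothetical, not CrowdStrike's actual process):

```python
def staged_rollout(hosts, apply_update, health_check,
                   stages=(0.01, 0.10, 0.50, 1.0)):
    """Push an update to progressively larger fractions of the fleet,
    halting as soon as any updated host fails its health check."""
    updated = []
    for fraction in stages:
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(updated):target]:
            apply_update(host)
            updated.append(host)
            if not health_check(host):
                # Bad update caught in an early stage: stop here
                # instead of reaching the whole fleet.
                return updated, False
    return updated, True
```

With stages like these, an update that blue-screens its targets is stopped after touching roughly 1% of machines, rather than all of them at once.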

He believes more safeguards are needed to prevent future incidents in which errors of this kind are repeated.

“Organizations need the right controls in place. It could have been a single person who decided to roll out this update, or someone chose the wrong file to run,” Avery said.

In the IT industry, this is known as a single point of failure – a failure in one part of a system that triggers a technical catastrophe across industries, functions, and interconnected communications networks; a massive domino effect.

Calls to build redundancy into IT systems

We need to make these systems “much more resilient,” says Cohesity CEO on global technology outages

Friday's events could prompt companies and individuals to increase their cyber preparedness.

“The bigger picture is how fragile the world is; it's not just a cyber or technical problem. There are a lot of different phenomena that can cause an outage, like solar flares that can take down our communications and electronics,” Avery said.

Ultimately, Friday's crash isn't an indictment of CrowdStrike or Microsoft, but of the way companies view cybersecurity, said Javad Abed, assistant professor of information systems at Johns Hopkins Carey Business School. “Business owners need to stop viewing cybersecurity services as just a cost, but as an essential investment in the future of their business,” Abed said.

Companies should do this by building redundancy into their systems.

“A single point of failure shouldn't be able to bring a business to a standstill, and that's exactly what happened,” Abed said. “You can't rely on just one cybersecurity tool – that's cybersecurity 101.”

While building redundancy into enterprise systems is expensive, what happened on Friday was far more costly.
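The redundancy Abed describes can be as simple as a failover wrapper: when the primary tool or service fails, a secondary one takes over automatically. A minimal sketch of the pattern (the providers here are hypothetical stand-ins, not specific products):

```python
def with_failover(providers):
    """Return a callable that tries each provider in order,
    falling back to the next whenever one raises an exception."""
    def call(*args, **kwargs):
        last_error = None
        for provider in providers:
            try:
                return provider(*args, **kwargs)
            except Exception as err:
                last_error = err  # provider failed: try the next one
        raise RuntimeError("all providers failed") from last_error
    return call
```

The trade-off is paying for a second provider that may rarely be used – exactly the cost Abed argues is cheaper than a full outage.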

“I hope this is a wake-up call and that it causes a mindset shift among business owners and organizations to revise their cybersecurity strategies,” Abed said.

What to do with kernel-level code?

At the macro level, there's certainly systemic blame to be placed on corporate IT, since cybersecurity, data security, and the engineering supply chain are often viewed as “nice-to-haves” rather than essentials. And companies lack overall cybersecurity leadership, says Nicholas Reese, a former Department of Homeland Security official and lecturer at New York University's SPS Center for Global Affairs.

At a micro level, Reese said, the code that caused this disruption was kernel-level code that affected every aspect of computer hardware and software communication. “Kernel-level code should be subject to the highest level of scrutiny,” Reese said, with approval and implementation being entirely separate processes with accountability.

This problem will persist across the ecosystem, which is full of third-party products that all have vulnerabilities.

“How do we look at the third-party ecosystem and see where the next vulnerability is going to be? It's almost impossible, but we have to try,” Reese said. “It's not a maybe, it's a certainty until we get to grips with the number of potential vulnerabilities. We need to focus and invest in backup and redundancy, but companies say they can't afford to pay for things that may never happen. That's hard to justify,” he said.

Image credit: www.cnbc.com