CrowdStrike outage and the single-point-of-failure problem of global software

Technical failures show how important it is to protect critical systems: Chertoff Group CEO

Large-scale attacks on corporate IT are becoming more frequent. This is neither unusual nor unexpected, as firms invest heavily in cyber defense in an asymmetric fight against hackers who can wreak havoc with a few lines of code.

But Friday's IT outage, the largest on record, was caused not by a malicious attack but by a CrowdStrike software bug that became embedded in Microsoft's operating systems. It highlights a type of tech threat that is growing alongside hacker attacks but gets less attention: the single-point failure, a flaw in one part of a system that triggers a technical catastrophe across industries, functions and interconnected communications networks in a massive domino effect.

Earlier this year, AT&T experienced a nationwide outage resulting from a technical update, and last year the FAA experienced an outage that occurred after a single person replaced a critical file during a route update (the FAA now has a backup system to prevent this from happening again).

“It's happening more frequently, even if it's just routine patches and updates,” Chad Sweet, co-founder and CEO of the Chertoff Group and former chief of staff at the Department of Homeland Security, told CNBC on Friday.

Managing the risk of single points of failure is an issue that firms have to plan for and protect against. No software is ever released that doesn't need to be patched or updated later, and there are security best practices that apply well after a production release and cover ongoing software maintenance, Sweet said.

Companies the Chertoff Group works with are closely reviewing their standards for software development and updates following the CrowdStrike outage. Sweet pointed to a set of protocols the federal government already provides, the SSDF (Secure Software Development Framework), that might give the market an idea of what to anticipate when Congress begins to look more closely at the issue. That is likely after the recent string of incidents, from AT&T to the FAA to CrowdStrike, as this sort of tech failure is now proven to affect citizens' lives and critical infrastructure operations on a large scale.

“Prepare on the corporate side,” Sweet said.

Aneesh Chopra, Arcadia chief strategist and former White House technology chief, told CNBC on Friday that critical sectors such as energy, banking, healthcare and airlines have separate risk regulations, and that the actions taken may differ in the most heavily regulated sectors. But for any business leader, the question now is: “Suppose the systems fail, what's plan B? We're going to see a lot more scenario planning, and if that's not job No. 1, it's job No. 2 or 3 to map out those scenarios,” he said.

Former White House CTO Aneesh Chopra on major global tech outages: “This is a wake-up call”

Unlike many other issues in Washington, Chopra said, there is bipartisan engagement on critical infrastructure and systemic risk issues, and technical standards are a “hallmark” of the U.S. system. There may now be efforts that he said are aimed at “improving competition” to increase accountability.

“If there is a mechanism for a more open and competitive update, there could be pressure to ensure that this is done with all the necessary detail,” Chopra said.

Sweet said this will inevitably raise concerns in the business community about the danger of over-regulation. While there is currently no way to say for sure whether CrowdStrike could have used a more open process that would have allowed the single-point flaw to be identified, he said it is a fair question.

The best way to avoid over-regulation, Sweet said, is to rely on market-strengthening mechanisms such as the insurance industry. “The short answer is, 'Let's leave it to the free market, such as the insurance industry, which rewards good players with lower premiums,'” he said.

Sweet also said more firms should embrace the concept of “antifragile” organizations, a term coined by risk analyst Nassim Nicholas Taleb, as he does with his clients. “Not just an organization that is resilient after a disruption, but one that thrives and innovates and outpaces the competition,” he said. In his view, any single law or regulation would struggle to keep up with both malicious attacks and technical updates that are pushed through with unintended consequences.

“This is definitely a wake-up call,” Chopra said.

Image credit: www.cnbc.com