Is Software a Mature Discipline?

What sparked this whole line of thinking?

In July 2024, a single faulty software update from CrowdStrike brought down airlines, hospitals, banks, and more. Just one update, one mistake, crippled systems around the world. It made me ask: how did we end up with a global digital setup that fragile?

Wasn’t that just a rare incident? A freak error?

That’s what I thought at first. But then the question nagged at me: why should a single bug in one company’s update have the power to break the world?
Maybe this wasn’t a freak accident. Maybe it revealed something much deeper: the immature, overly centralized way we’ve built our digital systems.

So… is software still an infant discipline?

Can’t we make software as reliable as bridges or hospitals?

We can try. But here’s the catch: bridges have been around for thousands of years. Medicine evolved over centuries of painful trial and error. Software? It’s a few decades old. We’re still figuring out how to even define “best practices” in many areas. Those older fields also have a discipline that comes built in: mistakes can cause accidents and deaths, and that fear keeps both practitioners and users careful. Most software work has no equivalent pressure.

But doesn’t software evolve faster than those older fields?

It does. And that’s part of the problem. Software can be changed, duplicated, and deployed instantly and globally. The same power that allows fast innovation also allows fast failure. What spreads fast breaks fast.

Why is software so prone to breaking down? Aren’t we testing everything before release?

We try, but software lives in an unpredictable world. Even if the code is solid, external conditions change. A system built for 100 users suddenly faces 10,000. A harmless-looking file turns out to be a giant edge case. A dependency updates and breaks your code. Code is written and tested against certain assumptions; when those assumptions break, the code breaks too.
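
One way to make such assumptions visible is to state them in code, so that a violation fails loudly instead of silently. Here is a minimal sketch in Python. `MAX_BATCH_SIZE`, `AssumptionViolated`, and `summarize_batch` are hypothetical names chosen for illustration, and the limit of 100 simply mirrors the example above:

```python
# Hypothetical example: the limit and all names here are illustrative assumptions.
MAX_BATCH_SIZE = 100  # the load this code was designed and tested for


class AssumptionViolated(Exception):
    """Raised when an input falls outside what the code was tested against."""


def summarize_batch(values):
    """Return the mean of a batch, refusing inputs outside the tested limits."""
    if not values:
        raise AssumptionViolated("expected at least one value")
    if len(values) > MAX_BATCH_SIZE:
        raise AssumptionViolated(
            f"tested for up to {MAX_BATCH_SIZE} values, got {len(values)}"
        )
    return sum(values) / len(values)
```

Calling `summarize_batch([1, 2, 3])` returns `2.0`. A batch of 10,000 values raises `AssumptionViolated` with a message naming the broken assumption, rather than degrading in some harder-to-diagnose way.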

Is the real problem complexity?

Yes—but not just technical complexity. Social complexity, too. Software development involves thousands of decisions: languages, tools, naming, structure. Every developer does it differently.
Add to that: the flood of suggestions from non-technical managers, customers, investors. Everyone has an opinion. Every change introduces risk.

Can’t we just simulate everything and catch issues early? Can modern tools handle this level of variation?

Not really. Tools can test common scenarios. But they can’t simulate all the weird ways real users will interact with your system, or all the ways the environment might change. That’s how bugs creep in without anyone changing a single line of code.

Are more global disasters like CrowdStrike waiting to happen? Is this going to get worse?

Unless we change our approach, yes. We’re digitizing everything—from water supply to national defense—without fully grasping how vulnerable these systems are. One bug, one outage, one misstep can ripple through industries and continents.

What would a better approach look like?

We might need less flash and more fundamentals:

Not sexy. But safer.

What warning signs should we start paying attention to? What practices are clearly going in the wrong direction?

Some patterns should make us pause:

Okay, but what do we actually do about this?

Is there one right answer?

No. But there are some enduring principles worth living by:

What does it mean to “stick to first principles”? Is that just a fancy way of saying “think clearly”?

Kind of. But more precisely, it means:

If you’re building a tool that requires internet, ask: What happens when the internet fails? Do we warn the user? Can they continue working offline? Do we cache anything? This kind of questioning often reveals weak spots before reality does.
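
Those questions can be turned into a small sketch. The Python below assumes a hypothetical `fetch` callable that raises `ConnectionError` or `TimeoutError` when the network is down, and it uses a plain dict as the cache. `load_with_fallback` and `LoadResult` are illustrative names, not part of any library:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class LoadResult:
    value: Optional[str]    # the data, or None if nothing is available
    from_cache: bool        # True when the value may be stale
    warning: Optional[str]  # message to show the user, if any


def load_with_fallback(
    key: str,
    fetch: Callable[[str], str],  # hypothetical network call, may raise
    cache: Dict[str, str],
) -> LoadResult:
    """Try the network first; fall back to the last cached copy on failure."""
    try:
        value = fetch(key)
    except (ConnectionError, TimeoutError) as exc:
        if key in cache:
            return LoadResult(cache[key], True, f"Offline ({exc}); showing saved copy.")
        return LoadResult(None, False, f"Offline ({exc}); no saved copy available.")
    cache[key] = value  # remember the fresh copy for the next outage
    return LoadResult(value, False, None)
```

The three outcomes map onto the questions above: a fresh value when the network works, a cached value plus a warning when it does not, and an explicit "nothing available" result when there is no saved copy either.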

What does it mean to “design for failure”?

Shouldn’t we try to prevent failure in the first place?

Yes—but assume failure is inevitable. So instead of designing for “success only,” ask:

It’s not about pessimism. It’s about resilience.

But even with better design, people still matter, right?

Why not just automate everything and avoid human error?

Because judgment still matters. No algorithm knows when a business goal has changed, or when an outlier is meaningful.
So we need human checks, thoughtfully placed. And those checks need:

If no one’s ever digging deeper, the system is probably too trusted. That’s risky.

How do we keep systems healthy over time?

Should we be spending more on maintenance than on new tools?

In many cases, yes. But maintenance gets ignored because it’s invisible.
Big capital-expenditure projects get attention: new platforms, flashy dashboards. But real failure creeps in quietly:

Maintenance is where true quality assurance happens. And mature systems treat it with the respect it deserves.

What’s the big picture here? What if our real job isn’t building systems—but keeping them honest?

That’s what it feels like. Maybe we don’t need more code. We need:

We need the wisdom of the few, made usable by the many.
