Is Software a Mature Discipline?
What sparked this whole line of thinking?
In July 2024, a single faulty software update from CrowdStrike brought down airlines, hospitals, banks, and more. Just one update, one mistake, crippled systems around the world. It made me ask: how did we end up with a global digital setup this fragile?
Wasn’t that just a rare incident? A freak error?
That’s what I thought at first. But then the question nagged at me: why should a single bug in one company’s update have the power to break the world?
Maybe this wasn’t a freak accident. Maybe it revealed something much deeper: the immature, overly centralized way we’ve built our digital systems.
So… is software still an infant discipline?
Can’t we make software as reliable as bridges or hospitals?
We can try. But here’s the catch: bridges have been around for thousands of years. Medicine evolved over centuries of painful trial and error. Software? It’s a few decades old. We’re still figuring out how to even define “best practices” in many areas. Those older fields also carry a built-in discipline: a mistake can mean an accident or a death, and that fear keeps both practitioners and users careful. Most software has no such immediate, visible consequence.
But doesn’t software evolve faster than those older fields?
It does. And that’s part of the problem. Software can be changed, duplicated, and deployed instantly and globally. The same power that allows fast innovation also allows fast failure. What spreads fast breaks fast.
Why is software so prone to breaking down? Aren’t we testing everything before release?
We try, but software lives in an unpredictable world. Even if the code is solid, external conditions change. A system built for 100 users suddenly faces 10,000. A harmless file becomes a giant edge case. A dependency updates and breaks your code. Code is written and tested against certain assumptions; when those assumptions break, the code breaks with them.
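One way to make this concrete is to write the assumptions down in the code itself, so a broken assumption fails loudly instead of quietly. This is only a minimal sketch: the limits, the `DesignLimits` class, and the `check_assumptions` function are hypothetical names invented for illustration, not part of any real system.

```python
from dataclasses import dataclass


class AssumptionViolated(Exception):
    """Raised when a condition the code was designed around no longer holds."""


@dataclass(frozen=True)
class DesignLimits:
    # Hypothetical limits the system was designed and tested against.
    max_users: int = 100
    max_file_bytes: int = 10 * 1024 * 1024  # 10 MiB


def check_assumptions(active_users: int, file_bytes: int,
                      limits: DesignLimits = DesignLimits()) -> None:
    """Fail loudly as soon as the inputs leave the tested range."""
    if active_users > limits.max_users:
        raise AssumptionViolated(
            f"designed for {limits.max_users} users, got {active_users}")
    if file_bytes > limits.max_file_bytes:
        raise AssumptionViolated(
            f"designed for files up to {limits.max_file_bytes} bytes, "
            f"got {file_bytes}")
```

Calling `check_assumptions(10_000, 1024)` raises `AssumptionViolated` right away. The check does not make the system handle 10,000 users; it only turns a silent assumption into a visible one.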
Is the real problem complexity?
Yes—but not just technical complexity. Social complexity, too. Software development involves thousands of decisions: languages, tools, naming, structure. Every developer does it differently.
Add to that: the flood of suggestions from non-technical managers, customers, investors. Everyone has an opinion. Every change introduces risk.
Can’t we just simulate everything and catch issues early? Can modern tools handle this level of variation?
Not really. Tools can test common scenarios. But they can’t simulate all the weird ways real users will interact with your system, or all the ways the environment might change. That’s how bugs creep in without anyone changing a single line of code.
Are more global disasters like CrowdStrike waiting to happen? Is this going to get worse?
Unless we change our approach, yes. We’re digitizing everything—from water supply to national defense—without fully grasping how vulnerable these systems are. One bug, one outage, one misstep can ripple through industries and continents.
What would a better approach look like?
We might need less flash and more fundamentals:
- Open systems over locked-in platforms
- Decentralization over single points of failure
- Interoperability over vendor lock-in
- Systems that degrade gracefully, not collapse catastrophically
Not sexy. But safer.
What warning signs should we start paying attention to? What practices are clearly going in the wrong direction?
Some patterns should make us pause:
- Over-centralized platforms with root access
- Forcing people onto digital platforms with no alternatives
- Collecting excessive personal data for “personalized experiences”
- Pushing tech as a checkbox for compliance—not as thoughtful design
- Letting people who don’t understand the system make critical decisions about it
Okay, but what do we actually do about this?
Is there one right answer?
No. But there are some enduring principles worth living by:
- Stick to first principles. Don’t assume. Break things down. Question your defaults.
- Design for failure. Assume the system will fail. How will it fail? Who will catch it? What will it affect?
What does it mean to “stick to first principles”? Is that just a fancy way of saying ‘think clearly’?
Kind of. But more precisely, it means:
- Ask: What is absolutely essential for this system to work?
- What assumptions am I making about the user, the environment, the data?
- What happens if any of those assumptions break?
If you’re building a tool that requires internet, ask: What happens when the internet fails? Do we warn the user? Can they continue working offline? Do we cache anything? This kind of questioning often reveals weak spots before reality does.
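The internet question can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended design: `OfflineCache` and `fetch_remote` are hypothetical names, the cache lives only in memory, and a network failure is assumed to surface as an `OSError` (Python's `ConnectionError` and `TimeoutError` are both subclasses of it).

```python
from typing import Callable, Dict, Optional, Tuple


class OfflineCache:
    """Keeps the last good copy of each item so work can continue offline."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def fetch(self, key: str,
              fetch_remote: Callable[[str], str]) -> Tuple[Optional[str], str]:
        """Return (value, status); status is 'live', 'cached', or 'unavailable'."""
        try:
            value = fetch_remote(key)
        except OSError:
            # Network is down: fall back to the last good copy, if any.
            if key in self._store:
                return self._store[key], "cached"
            return None, "unavailable"
        # Network is up: remember this copy for the next outage.
        self._store[key] = value
        return value, "live"
```

The status string is the part that answers "do we warn the user?": the caller can show a "working from a saved copy" notice on `cached` and a clear message on `unavailable`, rather than failing without explanation.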
What does it mean to “design for failure”?
Shouldn’t we try to prevent failure in the first place?
Yes—but assume failure is inevitable. So instead of designing for “success only,” ask:
- What if the payment gateway fails? Can the rest of the site still work?
- What if a file upload breaks? Does the user get helpful feedback—or just an error dump?
- What if one server crashes? Does traffic route elsewhere?
It’s not about pessimism. It’s about resilience.
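Here is a rough sketch of the payment gateway question, assuming a page made of independent sections. `render_page` and the section names are hypothetical; a real site would also log the error and alert someone, which this sketch leaves out.

```python
from typing import Callable, Dict


def render_page(sections: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Render each section on its own so one failure cannot sink the page."""
    rendered: Dict[str, str] = {}
    for name, render in sections.items():
        try:
            rendered[name] = render()
        except Exception:
            # Degrade: give the user a helpful message, not an error dump.
            rendered[name] = (
                f"{name} is temporarily unavailable. Please try again later.")
    return rendered
```

If the checkout section raises because the gateway is down, the catalog section still renders normally, and the user sees a plain message where checkout would have been.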
But even with better design, people still matter, right?
Why not just automate everything and avoid human error?
Because judgment still matters. No algorithm knows when a business goal has changed, or when an outlier is meaningful.
So we need human checks, thoughtfully placed. And those checks need:
- Dashboards that summarize the system
- The ability to drill down into raw data when something feels off
- Shared responsibility across levels, so insights don’t get stuck in silos
If no one’s ever digging deeper, the system is probably too trusted. That’s risky.
How do we keep systems healthy over time?
Should we be spending more on maintenance than on new tools?
In many cases, yes. But maintenance gets ignored because it’s invisible.
Big capital-expenditure (CapEx) projects get attention: new platforms, flashy dashboards. But real failure creeps in quietly:
- Logs that stop recording
- Test cases that no longer cover edge scenarios
- A bug that gets passed over, until it causes a meltdown
Maintenance is where true quality assurance happens. And mature systems treat it with the respect it deserves.
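Quiet failures like a log that stops recording are cheap to check for. A minimal sketch, assuming the log is a file on disk and that "stopped recording" means "not modified within some time window"; `log_is_stale` and its parameters are hypothetical names for illustration.

```python
import os
import time
from typing import Optional


def log_is_stale(path: str, max_age_seconds: float,
                 now: Optional[float] = None) -> bool:
    """Return True if the log is missing or has not been written to recently."""
    now = time.time() if now is None else now
    try:
        last_write = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable log counts as a silent failure too.
        return True
    return (now - last_write) > max_age_seconds
```

Run on a schedule, a check like this turns "nobody noticed the logs stopped" into an alert a human can act on. The right time window depends on how often the system normally writes, which is itself an assumption worth writing down.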
What’s the big picture here? What if our real job isn’t building systems—but keeping them honest?
That’s what it feels like. Maybe we don’t need more code. We need:
- More humility in design
- More visibility into assumptions
- More disciplined maintenance
- More checks by actual humans
- And more wisdom extracted from the few deep thinkers and translated into repeatable practices
We need the wisdom of the few, made usable by the many.