Falling Into the Pit of Platform Success
Tuesday, 7 April 2026 by Joe Fitzgerald
Falling Into the Pit of Success: What Cloud Foundry’s Achilles Heel Teaches Us About Platform Design
Jeff Atwood wrote something on his blog Coding Horror that has stuck with me for years:
“Make it easy to do the right thing and hard, but not impossible, to do the wrong thing.”
This comes from his post Falling Into the Pit of Success, and the central thesis is simple: if you make it easy to do the right thing, people will naturally do it. Most people aren’t trying to do the wrong thing. They’re short on time, heading in a general direction, and will happily take the path of least resistance - if it’s also the right path.
But the second half of that quote is where it gets really interesting: hard, but not impossible, to do the wrong thing.
If you make it impossible to do the wrong thing, you remove the flexibility that people need to solve legitimate problems. And that’s where platforms go to die.
Two Ways Inflexibility Kills Your Platform
When a platform makes it impossible to color outside the lines, the damage shows up in two ways.
1. The Uncontemplated Requirement
Someone has a legitimate business need that you didn’t anticipate when you designed your opinionated platform. Your system could do it in theory, but you haven’t provided a way. You’ve essentially locked them out.
What happens next? They leave. They go to something more raw - like pure Kubernetes - and build their own platform with their own abstractions. Word spreads: “Yeah, that platform doesn’t work for this kind of use case.” You’ve lost a user, missed critical feedback, and potentially seeded a competitor.
It’s fine to decide you don’t want to support a particular workload. But that has to be intentional, not accidental.
And the cost to that team is brutal. They came in expecting maybe four hours of infrastructure thinking. Instead, they hit a brick wall and suddenly face tens, hundreds, or thousands of hours building a platform from scratch. Completely unplanned.
2. The Blocked Power User
Someone loves your platform. They’ve mastered your abstractions and they’re trying to take it further - essentially building the next feature you should have, for free. But there’s no power user mode. No way to extend the platform beyond its built-in boundaries.
When you shut down your power users, you slow your own rate of innovation. You’re saying only the anointed few can evolve the platform, throwing away the leverage of your entire organization.
The irony is devastating. Most platforms are created because the old way was too slow, too bureaucratic, too disconnected from what teams actually needed. When you block power users from extending and experimenting, you become the very thing you set out to replace. You become the impediment instead of the enabler.
The Cloud Foundry Lesson
Cloud Foundry - and particularly Pivotal Cloud Foundry - is the perfect case study.
The benefit was extraordinary. Thousands of application developers could use a platform operated by a handful of platform engineers. The operator-to-developer ratio was insane. If you were building a Java, .NET, Python, Node, or Go application, and you needed a URL, deployments, scaling, and access to a database, cache, or message broker - it was phenomenal. For 80% of the applications in a typical enterprise portfolio, it was a no-brainer.
But as soon as you hit the limits of what it would let you do? You were stuck. There was no power user mode. No way to lift the hatch, go into the subfloor, and change the way things worked.
So people went to Kubernetes. It gave them all the configurability they were missing.
Fast forward to today: many of those organizations struggle because they didn’t build opinions and abstractions on top of Kubernetes. They got the flexibility they needed, but not the leverage. The operator-to-developer ratio in Kubernetes land is not great - you end up with tens or hundreds of people thinking deeply about Kubernetes, with a wide diversity of approaches to solving the same problem in the same organization. Consistency evaporates. Maintenance becomes a nightmare.
And yet - flexibility won. Kubernetes is ascendant. Cloud Foundry has declining market share.
Why? Because at the end of the day, you have to solve business problems. It doesn’t matter what the underlying platform is - the business problem still has to get solved. Teams will take the inefficiency hit if it means they can actually deliver. And once you must have Kubernetes for some workloads, the economic argument for paying extra for a rigid abstraction layer falls apart.
Cloud Foundry’s Achilles heel: it made it easy to do the right thing, but it made it impossible to do the wrong thing. That lack of flexibility drove people to Kubernetes and stalled Cloud Foundry’s own market growth.
So What Does a Well-Designed Platform Look Like?
The answer is a platform built on Kubernetes that makes the right thing easy - using custom resources and operators to lower the cognitive burden for developers, while creating a clear contract between the platform team and its customers (application teams, data teams, etc.).
That contract serves as a point of mediation. Without it, you can find yourself held hostage by a single team that refuses to let you maintain the underlying system because it might impact their use case.
We know what the 80% solution looks like. Cloud Foundry already proved it out:
- Ingress - a URL for my app, subdomain or context path, custom domains, SSL termination
- Application runtime - run, auto-scale, and do zero-downtime upgrades
- Network dependencies - connect to other services, external databases, etc.
- Data services - app-specific databases, Redis, RabbitMQ, and the like
- Environment-specific configuration - managed across the environments an app traverses from dev to production
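To make the list above concrete, here is a minimal sketch of what such a contract might look like as a data structure. Everything in it is an assumption for illustration: `AppContract`, `render`, `deep_merge`, and every field name are hypothetical, and the output is a simplified dictionary rather than real Kubernetes manifests. On an actual platform this would more likely be a custom resource reconciled by an operator. The `overrides` field is one possible take on "hard, but not impossible": a power user can change a platform default, but only by stating a reason.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AppContract:
    """Hypothetical developer-facing contract covering the five areas above."""
    name: str
    image: str
    hostname: str                                            # ingress
    min_replicas: int = 2                                    # application runtime
    max_replicas: int = 5
    egress_hosts: list[str] = field(default_factory=list)    # network dependencies
    data_services: list[str] = field(default_factory=list)   # e.g. ["postgres"]
    env_config: dict[str, dict[str, str]] = field(default_factory=dict)
    # Escape hatch: raw patches merged over the platform defaults.
    overrides: dict[str, Any] = field(default_factory=dict)
    override_reason: str = ""


def deep_merge(base: dict, patch: dict) -> dict:
    """Return a new dict with patch recursively merged over base."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def render(contract: AppContract, environment: str) -> dict:
    """Expand a contract into a simplified, illustrative workload description."""
    # Hard, but not impossible: overrides are allowed only with a stated reason.
    if contract.overrides and not contract.override_reason:
        raise ValueError("overrides require an override_reason")
    defaults = {
        "name": contract.name,
        "image": contract.image,
        "ingress": {"host": contract.hostname, "tls": True},
        "scaling": {"min": contract.min_replicas, "max": contract.max_replicas},
        "egress": list(contract.egress_hosts),
        "services": list(contract.data_services),
        "env": dict(contract.env_config.get(environment, {})),
        "security": {"run_as_non_root": True},
    }
    return deep_merge(defaults, contract.overrides)


# Example: the easy path needs only a handful of fields.
app = AppContract(
    name="orders",
    image="registry.example.com/orders:1.4.2",
    hostname="orders.example.com",
    data_services=["postgres"],
    env_config={"dev": {"LOG_LEVEL": "debug"}, "prod": {"LOG_LEVEL": "info"}},
)
manifest = render(app, "prod")
```

The design choice worth noting is that the platform team owns `render` and its defaults, so it can change how the contract is fulfilled without touching any team's `AppContract`. The recorded `override_reason` doubles as feedback about which requirements the platform did not anticipate.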
The question is: why hasn’t anyone built consensus around this for Kubernetes? Lots of people have talked about it, but there’s no standard answer.
An Invitation
I suspect there’s prior art out there that I haven’t read - in fact, I’d say it’s almost certain. If there’s existing literature that addresses this, I’d love to see it.
If the consensus answer already exists and it’s good, then the question becomes: why isn’t everyone doing it this way? What’s standing in the way? And that thread is worth pulling.
If there isn’t a consensus answer, then it’s time to build one.
What am I missing? Where should I be looking? I’d love to hear from you.