How Clean Code Ruined PHP's Best Feature

I've done PHP programming professionally for a decade. I started with just FTP'ing files to web servers, transitioned into version control, then the whole blue/green CI/CD pipeline. I've written my own autoloaders, written PHP extensions, written Apache rules, trained teams on several different PSR standards, given trainings on how the PHP runtime runs programs, and more. I've been through the PHP 5 to PHP 7 migration several times at different companies, and I've been through the PHP 7 to PHP 8 migration once. The only thing I haven't done is use Laravel, mostly since I've been working with custom PHP frameworks built in-house back in the late 2000s and early 2010s. At this point I'm pretty confident in my PHP skills, and I've seen how PHP has tried to catch up. I've also seen one of the greatest features of PHP of all time fall apart and crumble to the point it's useless. That feature is PHP's hot reload.
Hot Reloading
As far as I'm aware, PHP doesn't call its hot reloading feature "hot reloading." As far as I'm aware, there's no official name for it. The runtime simply loads the PHP files from disk on every request, and performance-oriented extensions like opcache essentially keep an in-memory cache of compiled scripts that gets invalidated when a file's modification timestamp changes (controlled by the opcache.validate_timestamps and opcache.revalidate_freq settings). In PHP land, we just change our files and the server loads the changes naturally. It's expected, it's normal, and we don't name it because it's just how things work.
Also, from my experience this appears to be atomic at the file level from the operating system, meaning that the OS won't let PHP get a file handle to the new file contents until the file write is complete. So, when uploading a single file, PHP won't read that new file until it's done uploading. Nice.
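Whether a single upload really is atomic depends on the upload tool and the filesystem. Where it isn't guaranteed, one common technique is to write the new contents to a temporary file in the same directory and then rename() it over the target, since a rename within one filesystem is atomic on POSIX systems. The sketch below is an illustration of that technique, not what my old setup did; atomic_put() and the demo file name are hypothetical.

```php
<?php
// Hypothetical helper: replace $target so readers see either the old
// file or the complete new file, never a half-written one.
function atomic_put(string $target, string $contents): bool
{
    // The temp file must live in the same directory (same filesystem)
    // as the target for rename() to be an atomic swap.
    $tmp = tempnam(dirname($target), '.deploy_');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $contents) === false) {
        @unlink($tmp);
        return false;
    }
    // tempnam() creates the file with restrictive permissions;
    // loosen them so a web server user could read the result.
    chmod($tmp, 0644);
    return rename($tmp, $target);
}

// Usage: swap a "v1" script for a "v2" script in the temp directory.
$target = sys_get_temp_dir() . '/atomic_demo_' . uniqid() . '.php';
file_put_contents($target, "<?php return 'v1';\n");
$ok = atomic_put($target, "<?php return 'v2';\n");
$version = include $target;
echo "swap ok: " . var_export($ok, true) . ", now serving: $version\n";
unlink($target);
```

This only protects one file at a time, which is exactly why the number of files per change matters so much later on.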
Anywhere else (except maybe Erlang-world), that concept is magical and terrifying. It's abnormal. It's a hard-to-make feature deserving of a name. So, there are terms like "hot reloading" and "live patching" to designate that this isn't normal.
So, while I'm using terminology that PHP developers may not be familiar with, I am using terminology that non-PHP developers will understand quickly. The idea of a program automatically loading file changes as soon as they're made on disk is so common to PHP developers that we have to work around that fact, while in other languages the idea is so tantalizing we spend years of engineering effort to get something half-baked.
My Early Days
PHP was what I wrote to pay my way through college. It was back in the early 2010's, well before the cloud was a big thing. AWS did exist back then. And I worked at a startup which was a very early adopter of AWS.
Most people were still discovering what the cloud was, and most of how people use the cloud today wasn't even possible back then. To give an idea of when this was, I started at this company years before AWS Lambda or AWS Aurora were released. Docker hadn't even been released yet, so we were years away from Kubernetes being a thing. We were years away from "blue/green" deploys even being feasible, let alone common nomenclature.
The company I worked for had been around for years prior to me joining, so they were very "cutting edge" when it came to cloud services. We also had almost everything running on EC2 instances since that was the primary AWS feature when they started.
During this time, most websites would have scheduled maintenance windows where they would upgrade the software. Our website didn't have those windows. There were quite a few reasons for it, but the biggest one was we were using PHP, and we had architected our system in such a way that most code changes would only involve one or two files. This meant we only had to upload those one or two files, and PHP would reload our code automatically. No downtime.
We were used to doing deploys constantly. It was so normal, that we didn't even call them deploys. It was just uploading files to the server.
Do I miss those days? Sometimes. There are aspects I miss, like how fast and easy deploys were, but there have been a lot of improvements to how things are done since those early days. Git has improved my life a lot, and having better test environments also helps. The tooling has improved a lot, and ways of developing software that seemed ludicrous back then are now possible thanks to those improvements. However, not everything was improved. The parts that suffer the most have to do with how the code itself is written.
Upgrades and Rewrites
Eventually our system was upgraded, and the server count grew. We adopted a formal (albeit manual) deploy process that still resulted in zero downtime. It involved taking servers out of the load balancer, uploading the code, and then putting them back in.
By this point, our code changes were sometimes big, half a dozen files or more. That didn't cause too many issues, but it did mean that we needed more time to ensure we got all the files uploaded correctly.
But then came the rewrite. Everything was rewritten with the latest PHP "standards" and "best practices." And with these best practices, our company was no longer able to keep up zero downtime deploys. This change came from one devilish idea that broke PHP's reload system, and simultaneously brought PHP performance to its knees. Clean Code.
The rewrite was "clean". It was so clean we actually followed every tenet of SOLID at some point in the codebase, including the L. It was awful. We lost our zero downtime deploys, had scheduled maintenance that took entire nights, our code was slow, the bug count was high, and adding a feature was awful.
I saw the same thing happen at another company. We went from a not-clean codebase to a clean codebase, and everything started falling apart. Deploys would break when uploading files, so we had to do workarounds and even scrap hot reloading altogether. Performance fell with specific clean code principles, so I had to benchmark and bring things back to acceptable levels. Everything that went wrong previously went wrong again.
The Problem with Clean Code
To understand why clean code is a problem, we need to first understand the context it emerged in. To do that, let's cover how code was previously written, and how clean code proposed to write code.
Prior to clean code, we would have files with many different ideas, including authorization, authentication, database calls, data rendering, business logic, and more, all in the same file and combined with many different ways of doing things. It'd look something like the following:
<?php
require_once('../../core/db_connect.php');
$db = connect_to_db();
require_once('../my_autoloader.php');
$session = StdSession::get_session();
if (!$session->valid()) {
require('../redirects/logout.php');
}
require_once('../auth_helpers.php');
if (!StdAuth::is_admin($session) && !auth_checker($session->legacy, 'can_view')) {
$view = new \Views\Access\Denied();
$view->display();
return;
}
$notes = load_notes($db);
require('xss_protect.php');
?>
<h1>Your Notes</h1>
<ul>
<?php foreach ($notes as $note) { ?>
<li><?php echo xss_prevent($note['name']); ?></li>
<?php } ?>
</ul>
We have a lot going on here. Some of our code is using the old require pattern, other parts are using our custom autoloader. Some parts are using classes, others are using plain functions, even on the same line. It's all over the place.
This was chaos. It was hard for developers to follow. It was even harder for tools to follow. Tooling didn't have a way to do autocomplete with the above file, at least not effectively. Sure, it could look at the required files, but what about that autoloader? That's arbitrary PHP code that could be getting run, and there's no standard definition for what an autoloader needs to do. For all the IDEs (or developers) know, it's doing a network request to get a file template out of some sort of database, running it through a transformer to generate the file, and then putting it in a tmp folder so it won't even be reused on the next request. For some code bases, it may have made more sense to actually do that.
The big idea with clean code was to standardize the chaos, that way even if humans couldn't always understand it, the tooling could. And if the tooling could, then developers would.
Keep in mind that clean code was coming from a Java world, where every class got a file and every file had a directory matching the package structure. The above file would be something like this:
// auto loader is loaded by the framework, no need to specify it
// I also left off the use declarations for brevity
/** File App/Auth/IAction.php */
interface IAction {
public function can_do(): bool;
}
/** File App/Auth/ViewAction.php */
class ViewAction implements IAction {
#[Inject]
private IRoleChecker $roleChecker;
const VIEW_ROLE = 'can_view';
public function can_do(): bool {
return $this->roleChecker->has_role(static::VIEW_ROLE);
}
}
/** File App/Auth/ISession.php */
interface ISession {
public function current_user(): User;
public function is_valid(): bool;
}
/** File App/Auth/IRoleChecker.php */
interface IRoleChecker {
public function has_role(string $roleId): bool;
}
/** File App/Auth/RoleChecker.php */
class RoleChecker implements IRoleChecker {
#[Inject]
private ISession $session;
public function has_role(string $roleId): bool {
return $this->session->is_valid() && in_array($roleId, $this->session->current_user()->get_roles());
}
}
/** File App/DataModels/Note.php */
class Note extends Model {}
/** File App/Pages/NotesController.php */
class NotesController extends Controller {
#[Inject]
public ViewAction $viewAction;
public function show(): View {
if (!$this->viewAction->can_do()) {
throw new UnauthorizedException();
}
return view('user.notes', [
'notes' => Note::all()
]);
}
}
/** File resources/views/user/notes.blade.php */
<h1>Your Notes</h1>
<ul>
@foreach ($notes as $note)
<li>{{ $note->name }}</li>
@endforeach
</ul>
We have clean code now. It's not easy to deploy. Or change. Or read. Or understand. But it's "clean." Just as clean as the room of a kid who shoves everything in their closet and just barely manages to get the doors to close. Yep. It's clean.
The separate files for everything pattern is really hard for developers to follow, especially when classes are small and inheritance trees are tall. However, it is something tooling can follow since tools can simply look at fully-qualified class names, and then they'll know exactly where on the file system that file is. Developers can then just keep hitting "Go To Definition" and the tool will show them. Easy Peasy, right?
Well, no. Clean code was designed around only a few tools and constraints in Java land: the Java Go To Definition tool (which is now commonly provided by LSPs that work with any editor), the Java XML build systems and config files of the day (whose usefulness is still debated, and which are losing ground), editors with limited memory buffers (much less of an issue today), the lack of LSPs, and runtime Dependency Injection. All of these are developer experience tools that have been largely upgraded, replaced, reinvented, or made completely obsolete.
The rest of the tools that actually still exist in roughly the same form weren't accounted for. In fact, these are the most important tools that exist for any programming language. They impact developers and users. I'm talking about the compilers and interpreters, including the quirks of their runtime and upgrade models. None of those tools were accounted for in Clean Code. None of them were optimized for. In fact, we deoptimized those tools in pursuit of "Clean Code" and ended up making a mess. And PHP was brutalized.
The Clean Lie
Clean code, like the best lies, has portions of truth behind it. However, that truth is not inherent to the rest of the lie; it is only mixed in to make people believe that the rest must also be true.
For instance, clean code says we should have meaningful names, follow standards, reduce complexity, and have useful comments. So does every other best practices list. There's nothing about those statements that makes clean code better than any other way of writing code.
Where it falls apart is with the core ideas primarily unique to clean code, and the fact that those ideas are so ingrained in the way Java operates that developers who do Java programming are trained to believe them to be true.
The biggest issue is SOLID (Sell Objective Lies In Development). If you can accurately tell someone what each letter stands for, then you already know more about SOLID than any developer who slings it around. If you can then state what each letter means, and why it was suggested, then you'll know more about SOLID than the entire slop of pro-SOLID doctrine out there. Until then, my explanation is just as good as theirs, and it's way more memorable. Plus, you probably won't actually do the research, so we'll just roll with it.
SOLID has five core "pillars" which destroy code organization. I'm not going to go through all of them here, mostly since not only do most not apply, but many are outright lies in just the names alone. Go read Solid is not solid by David Copeland if you want a good deep dive into SOLID and its flaws. Not only does he do the best job I've ever seen at explaining SOLID, he also does the best job at explaining why it's all useless too.
The points I'm going to focus on are the practical applications of SOLID, which can be summed up as follows:
- Lots of tiny, small classes and interfaces
- Each class and interface getting its own file
- Runtime dependency injection and reflection
Classes, Interfaces, Everywhere
The dumbest thing I've ever had to repeatedly do is argue with people on what "Single Responsibility" means. Single Responsibility is the actual worst thing ever.
Every developer I've met who bangs the SOLID drum without knowing what it is will say "It means a class should only do one thing." No, no it doesn't.
Bob Martin stated that a responsibility is a "reason to change," as in a reason to change the code. In other words, if you add a new feature to the user's page, you should only have to change one class (or file, or unit of code, or whatever).
This is what I did in my early days. The code was segmented by what needed changing, not by what things did. It meant we had our todo list data model in the same file as our todo list auth checking code, next to our todo list viewing code. It meant our files were long. It meant our features were self-contained in a single code unit.
When things got a little too big, we did split things up, but we kept them in the same directory. We did things by what needed changing, not by what things did. And it worked.
The problem is that nobody else believes that's the way to do it. They all believe that it's small classes, and small interfaces, and that everything gets its own file.
Do you know what that means for PHP land? Instead of uploading one file when you change something, you upload ten. And if you make two changes, you upload 20. And guess what? File systems and networks are slow, but CPUs are fast. In practice, this means that it takes a lot more time to upload files to a server than it does to receive and process requests. So, while you're uploading 20 files, your server has processed 2,000 requests, with each request getting files in a different state.
This is a huge problem. What happens if you are uploading a change to AuthHandler but it depends on a new class ZookeperConfig? Well, until all 20 files are uploaded, your auth class will throw an error. That's right, 500's until you're done uploading. Or corrupted data. Or things just hang. I've had all of those when updating a live PHP production server with clean code. It's not fun to fix.
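The failure mode is easy to reproduce in miniature. The sketch below simulates a half-finished upload in a temp directory: AuthHandler.php has arrived, ZookeperConfig.php has not. The autoloader, the class bodies, and handle_request() are all made up for illustration; only the class names come from the example above.

```php
<?php
// Simulated web root in a throwaway temp directory.
$dir = sys_get_temp_dir() . '/deploy_demo_' . uniqid();
mkdir($dir);

// Minimal one-class-per-file autoloader (a stand-in for PSR-4).
spl_autoload_register(function (string $class) use ($dir): void {
    $file = $dir . '/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
});

// File 1 of 2 has been uploaded: the new AuthHandler needs ZookeperConfig.
file_put_contents($dir . '/AuthHandler.php', <<<'CODE'
<?php
class AuthHandler {
    public function check(): bool {
        return (new ZookeperConfig())->enabled();
    }
}
CODE);

// Stand-in for a request hitting the server at some point in time.
function handle_request(): string
{
    try {
        return (new AuthHandler())->check() ? '200' : '403';
    } catch (Error $e) {
        // "Class not found" surfaces as an Error, i.e. a 500 in production.
        return '500';
    }
}

$during = handle_request(); // request arrives mid-upload

// File 2 of 2 finally lands.
file_put_contents($dir . '/ZookeperConfig.php', <<<'CODE'
<?php
class ZookeperConfig {
    public function enabled(): bool { return true; }
}
CODE);

$after = handle_request(); // request arrives after the upload completes
echo "during upload: $during, after upload: $after\n";

// Clean up the demo files.
array_map('unlink', glob($dir . '/*.php'));
rmdir($dir);
```

Every request that lands between the two writes gets the error, and on a busy server that window contains a lot of requests.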
All of this could be avoided by organizing code by how it's changed, like we did in my early days. If I only have to upload a single file, then I don't have these race conditions between the state of my code. The OS keeps things tidy, and PHP just loads one single code unit.
The issue is no one ever listens to the creator of a catchy term. Sorry Bob, at least Alan Kay knows your pain. This is one of the few points I actually feel for Mr. Martin. If done the way he originally recommended, it actually works quite well (at least in PHP).
That said, Bob had a bunch of other conflicting ideas in SOLID and Clean Code, which in turn reinforce people's misconception of what Single Responsibility means, so there's that to address too.
Separate files for everything
Remember what I said about Clean Code coming from Java-land? Well, Java has this terrible concept that everything (and I mean every little thing) should get its own file on disk. Everything gets a new folder, and a new file, and that's better. It's so much better that's how PHP does it now thanks to PSR-4.
The problem is, that's not how disk I/O works. Files aren't stored next to each other on disk. They're stored in a practically random and fragmented way (it's why disk defragmentation was such a big thing on spinning drives). Whenever you open a file, and then open a second file, those two file opens are random accesses. Whenever you read a line from a file, and then read the next line from the same file, those two reads are sequential (or as I say, "contiguous reads"). Is there a big difference? Yes.
Sequential reads are way better. From CPU cache lines, to RAM prefetching, to fewer I/O instructions to swap memory regions, to less waiting as RAM switches regions to read, to fewer I/O ops to disk, to better disk prefetching, to less time for disk circuitry to switch, sequential reads are far faster. On my Linux mini PC with an NVMe SSD, sequential reads ran at 2.8 GB/s while random reads ran at 1 GB/s on the same 1MB file. That's almost three times faster, without a bunch of extra syscalls to get new file handles. Add in the syscalls by splitting that 1MB file up, and we get an even bigger difference.
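If you want to see the per-file overhead for yourself, here is a rough sketch that reads 1 MiB once as a single file and once as 1,024 files of 1 KiB each. It is not the benchmark behind the numbers above: the files will almost certainly be served from the OS page cache, so it mostly measures open/read/close overhead rather than disk behavior, and the sizes and file names are arbitrary. Timings will vary by machine, so I'm not quoting any.

```php
<?php
// Throwaway working directory for the comparison.
$dir = sys_get_temp_dir() . '/read_demo_' . uniqid();
mkdir($dir);

$chunk = str_repeat('x', 1024); // 1 KiB of filler
$parts = 1024;                  // 1024 x 1 KiB = 1 MiB total

// Same bytes, two layouts: one big file versus many small files.
file_put_contents("$dir/big.bin", str_repeat($chunk, $parts));
for ($i = 0; $i < $parts; $i++) {
    file_put_contents("$dir/part_$i.bin", $chunk);
}

// Layout 1: a single open, sequential read, single close.
$start = hrtime(true);
$oneFileBytes = strlen(file_get_contents("$dir/big.bin"));
$oneFileNs = hrtime(true) - $start;

// Layout 2: an open/read/close cycle for every small file.
$start = hrtime(true);
$manyFilesBytes = 0;
for ($i = 0; $i < $parts; $i++) {
    $manyFilesBytes += strlen(file_get_contents("$dir/part_$i.bin"));
}
$manyFilesNs = hrtime(true) - $start;

printf("one file:   %d bytes in %.3f ms\n", $oneFileBytes, $oneFileNs / 1e6);
printf("many files: %d bytes in %.3f ms\n", $manyFilesBytes, $manyFilesNs / 1e6);

// Clean up.
array_map('unlink', glob("$dir/*.bin"));
rmdir($dir);
```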
What does this mean in practical terms? Well, for starters, having more files means your compilations will take longer. Git pulls, file copies, etc. also take longer (especially on Windows). Linting, analyzing, indexing all take longer. Code changes require more files to be touched. And this is true of any programming language.
For PHP, well, it also means all the above plus slower runtime performance. PHP has to load the code files it uses on every request. The more files you make, the more the runtime needs to load.
While we can (and do) use in-memory caches to speed things up, we still get a performance penalty by having a bunch of small files in a bunch of random places in memory. Almost always we're going to have some sort of tree or map in the in-memory cache. In PHP's opcache, we have maps and pointers to maps, we have file caches and class caches, etc. Sure, it helps. But reducing the number of cache entries and increasing spatial locality in that cache also helps. Clean code just so happens to do the exact opposite: it decreases spatial locality and increases how many cache entries are needed. It's a double penalty if the number of cache entries grows to the point we have cache evictions.
This especially becomes problematic in PHP 8 since the opcache is partially used by the JIT now. But, if everything is small, spatially isolated, and constantly getting evicted, that makes the JIT's job harder. Sure, the JIT can make code faster, but if the JIT optimizations are lost every few requests it's not going to help that much.
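One way to check whether a codebase is actually putting pressure on the cache is to look at what opcache reports about itself. The sketch below reads a few counters from opcache_get_status(); the restart counters are the closest signal I know of for the cache filling up. The opcache_pressure() helper is hypothetical, and on the CLI opcache is usually disabled, so expect the "not active" branch there.

```php
<?php
// Hypothetical helper: summarize how full the opcache is.
// Returns null when opcache is missing, disabled, or its API is restricted.
function opcache_pressure(): ?array
{
    if (!function_exists('opcache_get_status')) {
        return null;
    }
    // false = don't include the (potentially huge) per-script listing.
    $status = @opcache_get_status(false);
    if (!is_array($status) || empty($status['opcache_enabled'])) {
        return null;
    }
    $stats = $status['opcache_statistics'] ?? [];
    return [
        'cached_scripts'  => $stats['num_cached_scripts'] ?? null,
        'cached_keys'     => $stats['num_cached_keys'] ?? null,
        'max_cached_keys' => $stats['max_cached_keys'] ?? null,
        // Restarts triggered by running out of memory or hash slots.
        'oom_restarts'    => $stats['oom_restarts'] ?? null,
        'hash_restarts'   => $stats['hash_restarts'] ?? null,
    ];
}

$report = opcache_pressure();
echo $report === null
    ? "opcache not active in this SAPI\n"
    : json_encode($report, JSON_PRETTY_PRINT) . "\n";
```

Run it from a web request rather than the CLI to see the numbers for the server that is actually handling traffic.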
Runtime Dependency Injection
This one totally kills performance. I'm going to start with pre-PHP 8 land, since that's what I'm most familiar with.
In pre-PHP 8, dependency injection was done either by annotations in comments, or in typed parameters (for PHP 7).
<?php
class MyClass {
/**
* Inject based on comment annotation
* @Inject
* @var MyType description
*/
public $someProp;
public $otherProp;
/**
* Inject based on type
* Annotation here isn't always needed depending on DI configuration
*/
function __construct(MyOtherType $otherProp) {
$this->otherProp = $otherProp;
}
}
The problem is that PHP doesn't parse the annotations in comments. Instead, the dependency injection engine, written in PHP and running on each request, must parse each comment for any injection point to see if there's an inject annotation. And if there is an injection annotation, it either needs to continue parsing the comment to get the type, or it needs to do reflection (which is historically very slow in PHP) to get the type information. Then it can do injection.
In addition to parsing every comment, the DI engine also generally does a lot of reflection on constructors to inject types as well.
This entire process is really slow, and it was a major source of a slowdown in production.
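To make that cost concrete, here is a toy version of what a comment-driven injector has to do for every object it builds. It is a minimal sketch, not how any particular DI library is implemented; inject_by_docblock() is a made-up name, and the empty MyType class is a stand-in so the example runs on its own.

```php
<?php
class MyType {}

class MyClass {
    /**
     * Inject based on comment annotation
     * @Inject
     * @var MyType description
     */
    public $someProp;

    /** @var string no inject annotation, so this is left alone */
    public $otherProp;
}

// Hypothetical toy injector: reflect over every property, fetch its
// doc comment, scan it for @Inject, then parse @var to find the type.
function inject_by_docblock(object $target): void
{
    $class = new ReflectionClass($target);
    foreach ($class->getProperties() as $prop) {
        $doc = $prop->getDocComment();
        if ($doc === false || strpos($doc, '@Inject') === false) {
            continue; // no comment, or not an injection point
        }
        if (preg_match('/@var\s+(\S+)/', $doc, $match) !== 1) {
            continue; // annotated, but no type to build
        }
        $type = $match[1];
        // A real container would resolve $type's own dependencies here,
        // which means repeating this whole process recursively.
        $prop->setValue($target, new $type());
    }
}

$obj = new MyClass();
inject_by_docblock($obj);
echo get_class($obj->someProp), "\n";
```

All of that string scanning and reflection happens in userland PHP, per object, per request, unless something caches the result.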
It's also one of the key pillars of practical clean code. Sure, Bob may say that he doesn't say "Dependency Injection" and only says "Dependency Inversion". But, when he's out there saying every interface should only be a few methods, and everyone's writing small classes, and everything depends on everything else, and now we have to pass in all dependencies, well, dependency injection is the only way to keep developer sanity. So, yeah, it's part of clean code since it's the only practical way of trying to get it somewhat working.
Fortunately, PHP 8 fixed the comment parsing aspect, so now we have attributes and the PHP DI systems use those instead. So now we have the following:
<?php
class MyClass {
#[Inject]
public MyType $someProp;
public $otherProp;
#[Inject]
function __construct(MyOtherType $otherProp) {
$this->otherProp = $otherProp;
}
}
But we still have the runtime reflection issue. Java gets around it by doing it on server startup, which is why Spring Boot takes so long to get going.
PHP still does it on every request. Sure, there are ways to sort of get around things if you have tools analyze every PHP file and create massive maps of what's needed. But once you do that level of preprocessing you've fully accepted full build and deploy pipelines, you're generally making wide sweeping changes that break PHP's reloading mechanism, and you may as well use a different language.
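Attributes remove the comment parsing, but the reflection is still there. The sketch below is the attribute-based equivalent of a toy injector, assuming PHP 8.0 or later. The Inject attribute class and inject_by_attribute() are defined here purely for illustration and are not taken from any specific DI library.

```php
<?php
// Illustrative marker attribute; real DI libraries ship their own.
#[Attribute(Attribute::TARGET_PROPERTY)]
final class Inject {}

class MyType {}

class MyClass {
    #[Inject]
    public MyType $someProp;

    public $otherProp;
}

// Hypothetical toy injector: no comment parsing any more, but it still
// reflects over every property and its type on each call.
function inject_by_attribute(object $target): void
{
    $class = new ReflectionObject($target);
    foreach ($class->getProperties() as $prop) {
        if ($prop->getAttributes(Inject::class) === []) {
            continue; // not marked for injection
        }
        $type = $prop->getType();
        if (!$type instanceof ReflectionNamedType || $type->isBuiltin()) {
            continue; // nothing we can instantiate
        }
        $typeName = $type->getName();
        $prop->setValue($target, new $typeName());
    }
}

$obj = new MyClass();
inject_by_attribute($obj);
echo get_class($obj->someProp), "\n";
```

The per-property work is cheaper than regex over comments, but it is still work done at runtime rather than once at build or startup.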
Using a different language
PHP really falls apart with clean code. Anyone who wants to do "clean code" really should use a different language. Clean code was made for Java, and even there it sucks. It doesn't apply to how computers work, it doesn't apply to how humans work, and the only good parts are either completely ignored by its followers, or they apply to every other set of programming best practices.
PHP really needs a different way of usage in order to get the best parts out of the language. Code reloading doesn't have to be for development only. It can be for production, I've done it. It worked surprisingly well. PHP can also be a lot faster than it is now if code is differently structured. But, no one writes code that way anymore.
The PHP landscape has changed to resemble every other programming language with classes. And it was never meant to. It was meant to be single files for each page, with a few tidbits of shared code across them. It was meant to be something that you update a page and see the changes immediately, not a class hierarchy that takes a huge CI/CD pipeline.
In the pursuit of "clean" code, we've ended up not only creating atrocities, but forgetting what our tools and languages were meant to do. We've lost sight of ideas that changed the web. There are reasons that WordPress, the non-Object Oriented PHP framework, beats out Drupal, the Object Oriented PHP framework. It's less about whether Object Oriented is "good" or "bad" and more about whether it reflects how the tool was designed to be used.
PHP is not meant for clean code, and forcing clean code down its throat is ruining it. PHP should have its own definition of "clean" code that embraces PHP's unique strengths, not try to subjugate it to the mediocrity of every other language.
Erlang managed to avoid the clean code trap. Erlang has its own style and best practices of what's "good" and what's "bad" which uniquely fit Erlang's strengths. The OTP platform, structured releases, and supervisor trees are all accounted for in their way of doing things. PHP missed an opportunity to do something similar, and now it's going to be almost impossible to get there.
The community left behind PHP's strengths. Maybe they'll come back, maybe they won't. With every release they seem to be straying further and further, trying to optimize PHP to "bring it to the future" without remembering its legacy or identity. Hopefully other languages will stop, take a look at themselves, identify their strengths, and start creating their own style for best practices that don't reflect the "Java way" or "Bob's Clean Code," but instead reflect their own way, their own strengths.
I miss PHP back when it was acceptable to write code tailored to how PHP works.