Software Developer vs Software Engineer

13 January 2025

Many people have opined on this subject. I guess it’s time for me to do so as well.

If you read management books from the 80s, the good ones[^1] define improving quality as removing waste from a system; because of that, improving quality also tends to reduce costs.

In these systems, usually manufacturing, there are two types of employees: front-line workers and management.

Front-line workers can only repair defects: one-off problems whose solutions get us back to baseline metrics. These repairs don’t improve the quality of the system.

Management, on the other hand, gives front-line workers the tools to spot defects in their output and the training to resolve them. Management can also make fundamental changes to the system that improve quality.

In software, we work as part of two different, but intertwined, systems: development and production. The development system produces the software and the infrastructure configuration that run in the production system.

Software developers are the equivalent of front-line workers: they use the metrics defined for them to identify defects in their work and resolve them, getting things back to baseline.

Software engineers can make management decisions about these two systems in order to improve quality.

At present, I’m working on a fundamental change to the software I work on in order to give it new capabilities. This change affects a huge surface area of the software and is being shipped as many discrete pull requests over a period of months, likely 100+ by the time we are done.

The automated tests that run against our backend are composed of nearly 300 test suites, and these must pass on a given pull request before it can be merged to main.

Over the 4+ years we’ve been working on this piece of software, the time we have to wait for these tests to pass on a given PR has slowly gone up.

In theory, we only run the tests affected by a given change, and since most changes are small, most of the test suites shouldn’t need to run at all.

In practice, a given change causes nearly 250 of the 300 test suites to run, and this introduces 15-20 minutes of lag between when a PR is finalized and when it can be merged.
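To see why that happens, it helps to picture how affected-test selection typically works: build a module dependency graph, then select every test file that transitively imports a changed file. Here’s a rough sketch of that idea (this is not our actual tooling, just an illustration):

```typescript
// Illustrative only: a simplified, dependency-graph-based test selector.
// `graph` maps each module to the modules it imports.
type DepGraph = Map<string, string[]>;

// Invert the graph so we can walk from a changed module to everything
// that (transitively) imports it.
function reverseGraph(graph: DepGraph): DepGraph {
  const reversed: DepGraph = new Map();
  for (const [mod, imports] of graph) {
    for (const dep of imports) {
      if (!reversed.has(dep)) reversed.set(dep, []);
      reversed.get(dep)!.push(mod);
    }
  }
  return reversed;
}

// A test suite is "affected" when its transitive imports include a changed file.
function affectedTests(
  graph: DepGraph,
  changedFiles: string[],
  testFiles: Set<string>
): Set<string> {
  const dependents = reverseGraph(graph);
  const affected = new Set<string>();
  const queue = [...changedFiles];
  while (queue.length > 0) {
    const current = queue.pop()!;
    for (const dependent of dependents.get(current) ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return new Set([...affected].filter((file) => testFiles.has(file)));
}
```

The consequence of this scheme: any module with a wide reverse-dependency fan-out drags most of the test suites into the affected set on nearly every change.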

After getting fed up waiting on these tests to pass, I decided to investigate why so many test suites are almost always getting selected.

It turns out that the utilities we use to set up a given test scenario depend on much of the rest of the code base, not to mention each other. So I re-architected the smallest subset of these utilities that could prove my theory, splitting them into files that each contain a single function usable in many places.
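To make that concrete, here’s a sketch of the before-and-after shape of those utilities. The file names, helpers, and domains (users, orders, billing) are invented for illustration; only the structure matters. Before, everything lived in one catch-all module:

```typescript
// test-utils/index.ts (illustrative "before")
// Every suite importing any helper from here inherits all of these
// dependencies, so a change to users, orders, or billing marks nearly
// every suite as affected.
import { createUser } from "../users";
import { createOrder } from "../orders";
import { createInvoice } from "../billing";

export function setupUserScenario() {
  return createUser();
}

export function setupOrderScenario() {
  return createOrder(createUser());
}

export function setupBillingScenario() {
  return createInvoice(createOrder(createUser()));
}
```

After the split, each file holds a single function and imports only what that scenario actually needs:

```typescript
// test-utils/setupOrderScenario.ts (illustrative "after")
// A billing change now selects only the suites that import the billing
// helper, not every suite that touches test-utils.
import { createUser } from "../users";
import { createOrder } from "../orders";

export function setupOrderScenario() {
  return createOrder(createUser());
}
```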

The change isn’t complete yet, but my experiment showed that a change which previously triggered 250 of these test suites now triggers 25: an order of magnitude improvement.

This removes a ton of lag time, along with the context switching needed to stay busy while waiting on all those tests to run. It is an improvement to the quality of the development system.

This is an example of the kind of quality improvement a software engineer should be able to make, while a developer likely won’t even realize how much waste is going on.

These opportunities don’t pop up every day. I’ve been waiting for a good example for a while now to be able to write this post. But when they do pop up and you have the insight to solve them, the impact is hard to overstate. There are only so many 20-minute periods in an 8-hour work day.

Alright, back to this 100+ PR change, now with many fewer test runs.

[^1]: I recommend Out of the Crisis by Deming.