I have been working with a group on whether they should have a single organizational backlog or whether they should separate their backlog into a product backlog and a technical improvements backlog. Like many places, they want to make sure that infrastructure and technical debt are prioritized properly.
Any practitioner will tell you that technical improvements generally get ignored in favour of product improvements. Technical debt pay down normally gets deprioritized when there is a chance to improve the product from the customer’s perspective. At one organisation I worked with, the CEO said that thirty percent of every team’s capacity should be focused on improving the quality of the product. Every quarter the CEO would review where the teams were focusing and discover that product improvements dominated, normally taking one hundred percent of the effort.
In theory a separate backlog for technical improvements makes sense. We assign a certain percentage of each team’s capacity to technical improvements and then work through the technical improvements in backlog order. However, in practice, we know that the teams will work almost exclusively against the product backlog. When you have two backlogs, you have the problem of deciding how the priority of each item in the technical improvement backlog relates to the priority of items in the product backlog.
The only solution is to have one backlog containing both Product and Technical Improvements. The investor group that prioritises the backlog considers the relative importance of each item regardless of whether it’s a product improvement (i.e. customer / business benefit) or a technical improvement. That way it is clear to teams when a technical improvement is more important than the new feature that the client wants.
So where does the confusion in Agile circles come from? It turns out that SAFe advocates the use of two organizational backlogs, a product backlog and an architectural backlog. The authors of the SAFe framework must know that two backlogs create this problem, which leads to excessive technical debt, so why do they advocate this approach? My theory is quite simple. SAFe advocates the use of a particularly bad implementation of Weighted Shortest Job First (WSJF) to prioritise the product backlog. WSJF contains a bias that favours backlog items where the outcome is known. In Cynefin terms, WSJF has a bias towards “Obvious” and “Complicated” items, and against “Complex” and “Chaos” items. Technical backlog items often (though not always) fall into the “Complex” and “Chaos” domains. My theory is that the authors of SAFe have seen this problem in early implementations of SAFe and so separated out the Product and Technical Backlog.
Practice has shown that one backlog is needed if technical items are going to be given the appropriate level of priority. So how do we do this in practice? The real problem is that technical improvements are often expressed in terms of cost and the “What” / “How”. Technical improvements are rarely expressed in terms of the benefit they will deliver. To express that benefit, two additional organisational level metrics are required: one for customer perceived quality (functional bugs, performance bugs, UX bugs, availability etc), and one for the lead time to deliver value (e.g. weighted lead time for investments and lead time from detection to fix for bugs). Technical improvements can be expressed in terms of these metrics (e.g. paying down this technical debt will reduce the uncertainty of lead time for this change to the component, or this item will reduce the probability of bugs). The outcome is often unknowable, which means these items are in the “Complex” or “Chaos” domain; however, the investor group can understand the intent. Once the intent is known, it is possible to construct a narrative that explains the value of the technical item in the context of other product investments. It is then possible to construct a backlog where WSJF may assist the prioritization discussion but does not dominate it.
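As a rough illustration of WSJF as one input to a single mixed backlog, here is a minimal Python sketch. It assumes the commonly described form of WSJF, where cost of delay is the sum of three relative scores (business value, time criticality, risk reduction / opportunity enablement) divided by job size. The item names and scores are invented for the example, and `BacklogItem` and `rank` are hypothetical helpers, not part of any framework.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    kind: str              # "product" or "technical"; both live in ONE backlog
    business_value: int    # relative score (invented for this example)
    time_criticality: int  # relative score (invented for this example)
    risk_reduction: int    # risk reduction / opportunity enablement score
    job_size: int          # relative size, must be greater than zero

    @property
    def cost_of_delay(self) -> int:
        # Assumed formula: cost of delay is the sum of the three scores.
        return self.business_value + self.time_criticality + self.risk_reduction

    @property
    def wsjf(self) -> float:
        # Weighted Shortest Job First: cost of delay divided by job size.
        return self.cost_of_delay / self.job_size


def rank(backlog):
    """Return items ordered by WSJF, highest first, ignoring item kind."""
    return sorted(backlog, key=lambda item: item.wsjf, reverse=True)


backlog = [
    BacklogItem("New client report", "product", 8, 5, 1, 5),
    BacklogItem("Pay down debt in billing component", "technical", 3, 2, 13, 8),
    BacklogItem("Checkout redesign", "product", 13, 3, 2, 13),
]

for item in rank(backlog):
    print(f"{item.wsjf:5.2f}  {item.kind:9}  {item.name}")
```

The scores are only as good as the conversation behind them. For a technical item whose outcome is unknowable, the risk reduction score is a statement of intent rather than a measurement, which is why the ranking should inform the discussion rather than replace it.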
Another couple of related points.
- Paying down technical debt is a great way to train new people on a code base and reduce key man dependency (a form of technical debt). If the organisation knows it will be making major changes to a particular component, investing in the pay down of technical debt is a great way to prepare for that future development and build additional capacity so that it can be done quicker.
- A more important point is that having to put technical debt into the organizational backlog is a transitional state. It is necessary because the teams allow the build up of technical debt and see the need to pay down debt in large chunks. In mature Agile teams, the teams will gradually improve the quality of the code base as part of every piece of work that they do. For example, a few years ago I visited Nat Pryce on a project. He showed a graph that his team had created. The graph showed the number of lines of code in their application. When they started it contained one million lines of code. After six months it consisted of one hundred thousand lines of code. They had not taken time out to pay down technical debt; rather, it was a continual process alongside developing new features. On a mature team you are less likely to see backlog items to clean up technical debt because it is a continual process that is part of normal development. In other words, a mature Agile culture will clean up as it goes along whereas an immature Agile culture will have teams that see a choice between delivering features and creating technical debt. It would appear that SAFe intends to embed this immature practice in its process by having a separate technical backlog.
In conclusion, a separate technical backlog is a failure state. A technical backlog institutionalizes immature practices and creates a separation between product and technical concerns when there should be no split. A second technical backlog hides technical concerns from the product organisation when they should be a primary concern. Instead, create Quality and Lead Time based metrics that allow engineers to communicate the importance of the work they need to do.
One backlog to rule them all, not two or three or more!
October 4th, 2015 at 8:01 am
Backlog items don’t fall into the Chaos domain. Chaos is the domain of immediate action: an outage, a DDoS attack, impending bankruptcy, where whether you act or not there will be an outcome. It could be death and resurrection or it could be game over. In a deliberate dive into chaos you hope to employ safe-to-fail experiments and knowledge of how you will pull up from the dive, but it’s chaos so you will be managing the situation continually.
October 4th, 2015 at 8:12 am
There are many differing descriptions of ‘Technical debt’. The less frequently covered ones are:
* technical debt linked to a company culture where factual knowledge dwarfs design appreciation
* technical debt linked to behavioural reciprocity https://youtu.be/npOGOmkxuio
October 4th, 2015 at 9:20 am
A very interesting question. I do believe the answer is what should have been done in the recent scandal regarding VW and the emissions solution. They avoided the real issue!
October 4th, 2015 at 2:05 pm
I’m surprised you didn’t mention risk management. Your metaphor of unhedged options is a better model for technical weaknesses. People can drive without paying for insurance, but it’s a bad idea.
The other misconception is that technical debt is about bad code. Ward’s original proposal was that we only have high-quality code, but not everything is complete, which I think is a safer way to go. Unfortunately, that got lost in the general code’n’fix version we have now.
October 4th, 2015 at 3:50 pm
“The other misconception is that technical debt is about bad code.”
The problem with labelling bad code bad code is that people respond by claiming you’re dogmatic https://youtu.be/Wi9y5-Vo61w
October 9th, 2015 at 5:51 pm
[…] This is why I believe that successfully managing technical debt can’t happen without successfully communicating to the customer when it’s being taken on, what the costs involved are (or may be) and how it’s affecting the evolution of the product. […]
February 14th, 2016 at 4:55 pm
I too have always believed in ONE prioritized backlog. I’ve got a couple of teams at a client who insist on having an “initiative” backlog and a “BAU” backlog where BAU work always comes first. Their reasoning is that each backlog is fed into the same team from two different POs who individually prioritize each backlog. The root of the evil is funding streams. Two different streams of money allocated from above (of course, when you trace it all the way back up to the division it’s out of the same ocean). By pushing BOTH “sources” of backlog into ONE backlog they are constantly making choices on what is most important. This lets them see that not all BAU items are more important than new initiative work.
On the SAFe topic, you are way off the mark. Version 3 of the SAFe Framework only had separation of Business and Architectural epics in the refinement process. Much like my example above, they start from two different points of view, but make their way quickly into ONE prioritized “Portfolio Backlog” and don’t get separated again into parallel backlogs as you suggest. In Version 4 there isn’t even a suggestion of separate backlogs.
Regarding WSJF, I personally share your skepticism. Like any ranking method the numbers can easily be gamed. It may not be perfect (when is anything?) but it at least gives transparency. I’m curious to know how your “theory” statement needs to be rethought in light of SAFe not actually splitting backlogs as you incorrectly asserted.
Instead of WSJF, I prefer the time-proven model of pair-wise comparison. It fits well into the teachings of relative sizing, but from a prioritization point of view. Everything must be ranked in an “order”. Using that model there will be no ties and no gaming of the WSJF calculation just to mathematically move an item higher in priority. The person making the pair-wise comparison will have to justify the position and transparency will be attained.
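The pair-wise comparison described above could be sketched in Python roughly as follows. This is only an illustration: `rank_pairwise` and `prefer` are hypothetical names, and `prefer` stands in for the human judgment call on each pair, here faked with a pre-agreed order so the example runs. Every decision is recorded so that each placement can be justified afterwards.

```python
from functools import cmp_to_key


def rank_pairwise(items, prefer):
    """Order items using only pair-wise judgments.

    prefer(a, b) must return True when a should rank above b.
    Returns the ranked list and a log of every comparison made.
    """
    decisions = []

    def compare(a, b):
        a_first = prefer(a, b)
        # Log the pair and the winner so the ranking can be explained later.
        decisions.append((a, b, a if a_first else b))
        return -1 if a_first else 1

    ranked = sorted(items, key=cmp_to_key(compare))
    return ranked, decisions


# Stand-in for the person doing the comparing (invented for this example).
agreed = ["Fix outage alerting", "New initiative feature", "BAU report tweak"]


def prefer(a, b):
    return agreed.index(a) < agreed.index(b)


items = ["BAU report tweak", "Fix outage alerting", "New initiative feature"]
ranked, decisions = rank_pairwise(items, prefer)

for position, name in enumerate(ranked, start=1):
    print(position, name)
```

Because the comparison never returns a tie, the result is a strict order, and the decision log shows which pairs were actually compared.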