Many agile techniques are effective risk management tools: they make problems and risks transparent and leave the solution to be determined by the context. It is common for people to use an agile technique to reveal a problem or risk and then ignore what they find. They reveal a serious issue in a retrospective and then simply put an item in the improvement backlog, or they turn off a failing test rather than fix the bug that it reveals. It is so common that agile practitioners have a term for it: “Agile Kabuki”, an elaborate process in which people make extreme gestures that have no fundamental impact other than to entertain and give the appearance of action.
In a risk averse failureship culture, the most common and most damaging example of Agile Kabuki is the “Go/No Go” meeting. The “Go/No Go” meeting goes by many different names, but if it looks like a duck and it quacks like a duck… Whether it is called a “Design Review Board”, an “Architecture Review Board” or a “Change Advisory Board”, it’s a duck.
All “Go/No Go” meetings follow the same pattern:
- The stated purpose and justification of the meeting is to protect the production environment.
- Rather than do anything practical and useful, it simply introduces random delay in the development release process. These random delays add to the risk of investments and make it harder for the business investors to plan.
- The “Go/No Go” meetings occur at a regular, arbitrary cadence and duration based on the convenience and availability of the decision makers. They are not held on demand in order to minimise delay.
- Decision makers do not need to have specific knowledge relevant to the system or release being discussed.
- Decisions made in the “Go/No Go” are final, requiring escalation to overturn.
Although the “Go/No Go” has the above stated purpose, the reality is as follows:
- The primary purpose of the “Go/No Go” meeting is to establish the power and importance of the decision makers that attend the meeting. This is multiplied when the decision makers are so important that they can send a delegate to represent them who also has no specific knowledge about the system or the change.
- A “Go/No Go” meeting is “successful” if a decision maker can justify the existence of the meeting, and thus reinforce their importance, by imposing a constraint that forces a delay of the release of an investment.
- “Go/No Go” meetings are vitally important in a risk averse failureship culture as they move the responsibility for a production failure to the process and away from the team making the change. Any failure is blamed on the process rather than the teams responsible for the system. A “Go/No Go” allows a team to focus on passing the “Go/No Go” meeting rather than focusing on the risk of damaging the existing business model. Given the information asymmetry (the team know more than the “Go/No Go” decision makers), it is entirely inappropriate for a distant team with generic knowledge to decide on the safety or otherwise of a release.
- “Go/No Go” meetings are an opportunity for people to increase their importance at the expense of the wider organisation. They are a final opportunity for people to raise issues that should have been mentioned much earlier in the process of developing an investment. An issue raised at the start of a development process is much easier to accommodate and so attracts little attention, whereas the same issue raised just prior to deployment can generate much praise and status for the individual who “saved” the organisation from a disaster. This late reveal is often justified because the individual was far too important and busy to engage at an earlier point. I once worked on an agile flagship project that took eighteen months to deliver the first business value. Nine months into the project, during the User Acceptance Test and just prior to go live, a key user looked at the solution for the first time and declared it would not work and would cause a disaster. They detailed the changes that were needed. Everyone revered them for it… A home for Dragon Slayers was established.
- “Go/No Go” meetings often occur as a mechanism to demonstrate to industry regulators that the organisation is taking the risk associated with production stability seriously. If an organisation needs an industry regulator to tell it to take its production stability seriously, the shareholders of the organisation should change the management, and the customers should change their service provider to one that does take it seriously.
- “Go/No Go” meetings are an appallingly bad approach to managing the risk of a production incident, and an even worse approach to minimising the impact of a production incident.
- Resolving a production incident will often require releasing a fix into production, something that is delayed and made more risky by a “Go/No Go” process.
A Risk Management culture’s alternative to the “Go/No Go” Meeting
In an organisation with a risk managed culture, the focus is on preventing a production incident and, in the event of a production incident, returning the organisation and its customers to safety as soon as possible.
Instead of a “Go/No Go” meeting, a risk managed culture would do the following:
- Clearly state the controls and tests to be performed before an application or change can be released to production. Normally these are not a solution but a test that must be passed.
- Automate the controls and tests, normally by building them into the platform. See Jez Humble’s inspiring talk about the US Government from almost a decade ago.
- Automate testing.
- Specify the tests before the code so that the code is built in a way that is testable, and the tests do not break when the code is changed. Moving the creation of tests before the code means that the responsibility for creating tested code shifts from the tester to the developer. It means that when the developer changes code and it fails the test, they get immediate feedback and can fix the issue.
- Architect the system so that it consists of smaller independent parts that can be quickly and more easily tested (For example, a micro-services architecture).
- Minimise the number and duration of feature flags and other latent code patterns in production.
- Lock down the build chain and test environments to prevent anyone from tampering with the system and potentially causing a discrepancy between what is tested and what is deployed.
- Use containers to ensure that the test environment is identical to the production environment.
- Use Green/Blue deployments to enable rapid roll back to the previous system in the event of an issue.
- Remove technical debt from the parts of a system that are being changed, to reduce the possibility of a production incident in the event that a fast fix is needed.
- Perform a scenario analysis to identify as many things as possible that might go wrong, and ensure that there is an option to address them.
- Identify fast routes to safety and establish authority to execute them.
- Build automated systems to detect anticipated and unanticipated issues.
- If individuals are given the ability to block the route to production, they should be instantly available to address any blockage, dropping all other work until the business value is unblocked.
- If a “Go/No Go” meeting is required, run a short one regularly, once or twice a day, with the option to extend its duration. Releases of business value should not wait for meetings at times convenient to the attendees.
- Prioritise the business value being delivered rather than the status of the “Go/No Go” attendees.
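Several of the practices above (stating controls as tests that must be passed, automating them, and using Green/Blue deployments for fast roll back) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a description of any particular platform: the names `Control`, `evaluate_gate`, `BlueGreenRouter`, `release`, the example `CONTROLS` and the release metadata keys such as `tests_skipped` and `tested_digest` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Control:
    """One release control: a named pass/fail test, not a prescribed solution."""
    name: str
    check: Callable[[Dict[str, object]], bool]


@dataclass
class GateResult:
    approved: bool
    failures: List[str]


def evaluate_gate(controls: List[Control], metadata: Dict[str, object]) -> GateResult:
    """Run every control against the release metadata and collect all failures."""
    failures = [c.name for c in controls if not c.check(metadata)]
    return GateResult(approved=not failures, failures=failures)


class BlueGreenRouter:
    """Tracks which of two identical environments receives live traffic."""

    def __init__(self, live: str = "blue") -> None:
        self.live = live
        self.previous: Optional[str] = None

    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def switch(self) -> None:
        # Remember the old environment so there is a fast route back to safety.
        self.previous, self.live = self.live, self.idle()

    def roll_back(self) -> None:
        if self.previous is not None:
            self.live, self.previous = self.previous, None


# Hypothetical controls; a real organisation would state its own.
CONTROLS = [
    Control("all automated tests passed", lambda m: m.get("tests_failed") == 0),
    Control("no tests disabled", lambda m: m.get("tests_skipped") == 0),
    Control(
        "deployed artefact is the tested artefact",
        lambda m: m.get("tested_digest") is not None
        and m.get("tested_digest") == m.get("deployed_digest"),
    ),
]


def release(
    controls: List[Control],
    metadata: Dict[str, object],
    router: BlueGreenRouter,
    healthy: Callable[[str], bool],
) -> str:
    """Release on demand: no meeting, only the stated controls and a health check."""
    result = evaluate_gate(controls, metadata)
    if not result.approved:
        return "blocked: " + ", ".join(result.failures)
    router.switch()
    if not healthy(router.live):
        router.roll_back()
        return "rolled back"
    return "released"
```

In this sketch a release is decided the moment it is requested, by the people who stated the controls in advance, rather than at a time convenient to the meeting attendees. A call such as `release(CONTROLS, metadata, BlueGreenRouter(), healthy)` takes a hypothetical `healthy` function standing in for whatever automated detection the organisation has built, and a failing health check returns traffic to the previous environment without waiting for anyone’s approval.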
The “Go/No Go” is just one example of the Risk Management Theatre or “Agile Kabuki” that occurs in organisations with a risk averse failureship culture. Other examples include:
- Risk registers that have a “Probability” or “Chance of it happening” column.
- Risk registers that only contain one type of risk (for example, “Delivery risk” but no mention of “business case risk” or “damage to the existing model risk”).
- PMO groups that obsess about cost but do not measure outcomes (Metrics).
- Outcomes that are not measured even though significant investments are made to improve them.
- Executives that introduce governance that is ignored.
- Retrospectives that generate improvement backlog items that are never executed.
- Automated tests that are switched off when they fail.
- Organisations that ignore complaints about toxic behaviour or whistle blowers.
- People who are promoted for failing.
- Leaders who are allowed to ignore change that is being introduced.
Leaders should understand whether the risk management practices in their organisation are effective or simply risk management theatre.