Software Support Lifecycle
Revision as of 14:57, 24 January 2015
Problem Recap
A software firm was contracted to develop a new customer-facing solution for a major banking institution. As part of the negotiation process, an SLA needs to be agreed. The banking institution provided the required issue resolution times and asked the software firm to price the contract appropriately and to provide reasoning for the pricing.
The software firm decided to create a simulation of a typical month of the support cycle as a basis for approximating the resources needed to provide the support.
Approach
The model consists of incidents of various severities, represented as entities, and various development resources, represented as resources in SIMPROCESS. The model aims to represent a reasonably simplified version of the real development process.
The model needs to represent developer shifts, “emergency holding” (where a developer does not work, but is available to start solving incidents within a reasonable amount of time) and overtime billing.
Model Structure
Entities
Incidents
There are several incident severities, represented as different types of entities. Apart from having different SLA requirements, incidents of different severity differ in their flow through the process and are generated using different rules. The SLA terms of different incidents can be found here.
Incident Type | Severity (lower is less severe) | Probability of Occurrence (per hour) |
---|---|---|
Standard | 1 | Nor(0.4, 0.25, 1) |
Severe | 2 | Nor(0.2, 0.25, 1) |
Critical | 3 | Nor(0.075, 0.25, 1) |
It is important to note that higher severity incidents can preempt lower severity incidents, which is desirable because higher severity incidents have stricter SLA terms.
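The preemption rule can be sketched in a few lines of Python. This is only an illustration and not the SIMPROCESS model itself: the `IncidentQueue` class, the `on_arrival` function and the single-developer setup are hypothetical names and simplifications introduced here.

```python
import heapq


class IncidentQueue:
    """Waiting incidents, ordered by severity (highest first), then by arrival."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so equal severities stay in insertion order

    def push(self, severity, name):
        # heapq is a min-heap, so the severity is negated to pop the highest first.
        heapq.heappush(self._heap, (-severity, self._counter, name))
        self._counter += 1

    def pop(self):
        neg_severity, _, name = heapq.heappop(self._heap)
        return -neg_severity, name


def on_arrival(current, incoming, queue):
    """Return the incident the developer works on after `incoming` arrives.

    `current` and `incoming` are (severity, name) tuples; `current` may be None.
    A strictly higher severity preempts the current incident, which goes back
    to the queue. Equal or lower severity waits in the queue.
    """
    if current is None:
        return incoming
    if incoming[0] > current[0]:
        queue.push(*current)
        return incoming
    queue.push(*incoming)
    return current
```

In this sketch a preempted incident re-enters the queue behind waiting incidents of the same severity; whether the real model resumes it first is not stated in the text.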
The normal distribution is sometimes considered problematic for generating entities, because many real distributions are not symmetrical and are instead “right-leaning”. I nevertheless believe that it is sufficient for this scenario. A beta distribution seemed to be a somewhat more realistic alternative shape, but given its relatively small impact on the results, I chose the normal distribution, since it is far more accessible and requires less expertise to understand.
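To make the symmetry concern concrete, the following Python sketch draws from the distributions in the table. It assumes that `Nor(a, b, c)` means mean `a`, standard deviation `b` and random stream `c`; that reading, the clamping to the range 0 to 1, and all function names are assumptions made for illustration, not taken from the model.

```python
import random

# Assumed reading of the table: Nor(mean, standard deviation, stream).
# The stream number only selects a random-number stream and is omitted here.
INCIDENT_PARAMS = {
    "Standard": (0.4, 0.25),
    "Severe": (0.2, 0.25),
    "Critical": (0.075, 0.25),
}


def sample_hourly_value(mean, stdev, rng):
    """Draw one normal sample and clamp it to [0, 1] (clamping is an assumption)."""
    return min(1.0, max(0.0, rng.gauss(mean, stdev)))


def negative_fraction(mean, stdev, rng, draws=10_000):
    """Estimate the share of raw (unclamped) draws that fall below zero."""
    below = sum(1 for _ in range(draws) if rng.gauss(mean, stdev) < 0.0)
    return below / draws


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for repeatability
    for name, (mean, stdev) in INCIDENT_PARAMS.items():
        share = negative_fraction(mean, stdev, rng)
        print(f"{name}: share of negative raw draws = {share:.3f}")
```

Under this reading, the mean for Critical incidents lies only 0.3 standard deviations above zero, so roughly 38% of raw draws are negative and have to be clamped or rejected. This is the kind of effect that makes a bounded, asymmetric shape such as the beta distribution look more realistic.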
Technical Entities
Another type of entity in the system is a Release Trigger. The Release Trigger is responsible for triggering an automated software build every 24 hours.
Resources (Developers)
Developers are grouped into three tiers: standard, junior and senior. Each developer tier has different pricing (here) and might not be able to participate in all parts of the process. The developers get paid a fixed wage, regardless of their utilization. The developers work in the 8x5 mode. This is problematic, however, when dealing with high-severity incidents, which have strict SLA terms.
Therefore, a new tier has been added: “Developer – Standard – Overtime”. The role of this tier is to hold “emergency” in the non-working hours (17:00 – 9:00 on work days + whole weekends). Holding emergency means that the developer is ready to immediately start resolving critical bugs from his home office. For this, the developer is compensated in the following way: the developer gets paid 10% of his standard hourly wage for every hour he holds emergency, regardless of the number of incidents (fixed cost). Apart from that, the developer gets paid for every hour he spends resolving incidents in the emergency hours (variable cost).
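The compensation scheme can be written down as a short calculation. The sketch below is illustrative only: the function names, the example wage and the `resolving_rate` multiplier are assumptions, since the text does not state the rate paid for hours spent resolving incidents.

```python
HOURS_PER_WEEK = 7 * 24        # 168
WORKING_HOURS_PER_WEEK = 8 * 5  # 8x5 mode, 9:00 to 17:00 on work days


def weekly_emergency_hours():
    """Hours per week outside the 8x5 shift (work-day nights plus weekends)."""
    return HOURS_PER_WEEK - WORKING_HOURS_PER_WEEK


def emergency_pay(hourly_wage, standby_hours, resolving_hours,
                  standby_share=0.10, resolving_rate=1.0):
    """Total pay for emergency holding.

    fixed    = 10% of the standard hourly wage for every hour on standby
    variable = pay for every hour spent resolving incidents; `resolving_rate`
               is an assumed multiplier of the standard wage (not in the text)
    """
    fixed = standby_share * hourly_wage * standby_hours
    variable = resolving_rate * hourly_wage * resolving_hours
    return fixed + variable


if __name__ == "__main__":
    hours = weekly_emergency_hours()
    # Hypothetical wage of 20 per hour and 6 hours of incident work in the week.
    print(hours, emergency_pay(20.0, hours, 6.0))
```

The emergency window covers 128 hours per week (16 hours on each of the five work days plus 48 weekend hours), so the fixed part dominates the cost whenever few incidents occur outside working hours.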
Support Process
The incident resolution process is as follows:
Things to note about the process:
- Standard severity incidents are not eligible for hotfixing
- Since junior developers do not have full knowledge of the system, they are excluded from the hotfix development and incident resolution activities
- Hotfix development is a high-risk activity (the hotfix is deployed directly to production without proper testing), so standard developers need to pair up when developing a hotfix
- Fixes for critical incidents are released “out-of-band”, meaning they do not wait for the next release and are released individually
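The rules listed above can be summarized as a few predicates. This Python sketch is a reading of the list, not the model's implementation: the tier strings and function names are hypothetical, and the text does not say whether a senior developer may develop a hotfix alone, so the sketch assumes that this is allowed.

```python
from enum import IntEnum


class Severity(IntEnum):
    STANDARD = 1
    SEVERE = 2
    CRITICAL = 3


def hotfix_eligible(severity):
    """Standard severity incidents are not eligible for hotfixing."""
    return severity > Severity.STANDARD


def released_out_of_band(severity):
    """Only critical incidents skip the regular release."""
    return severity == Severity.CRITICAL


def can_resolve_incidents(tier):
    """Junior developers are excluded from hotfix development and incident resolution."""
    return tier != "junior"


def hotfix_team_valid(tiers):
    """Check a proposed hotfix team, given as a list of tier names.

    No juniors; a team consisting only of standard developers needs at least two.
    ASSUMPTION: a team that includes a senior developer is valid at any size.
    """
    if not tiers or "junior" in tiers:
        return False
    if "senior" in tiers:
        return True
    return len(tiers) >= 2
```

For example, `hotfix_team_valid(["standard"])` is false while `hotfix_team_valid(["standard", "standard"])` is true, which reflects the pairing requirement.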
Result
The conflicting goals of the task are obvious. On the one side, we wish to minimize the time of incident resolution. Unless we consider rearchitecting the support process, this is done mainly by deploying additional resources. On the other side, we wish to minimize the amount of deployed resources to optimize support costs and create room for margin generation.
The model shows that a good compromise between these two goals can be reached with the following resource deployment:
Resource Type | Number of Resources |
---|---|
Junior Developer | 1 |
Senior Developer | 2 |
Standard Developer | 4 |
Standard Developer - Overtime | 2 |
The rest of this chapter aims to provide supporting evidence for this conclusion. I will refer to the above configuration as the 1-2-4-2 configuration.
Utilization and Diminishing Returns on Additional Resources
The utilization of the above configuration is as follows:
Resource Type | Utilization (%) |
---|---|
Junior Developer | 49.93 |
Senior Developer | 65.35 |
Standard Developer | 79.1 |
Standard Developer - Overtime | 83.09 |
Granted, the utilization above might seem low. Consider the case where we remove one Senior Developer from the team. Then the utilization will be as follows (1-1-4-2):
Resource Type | Utilization (%) |
---|---|
Junior Developer | 87.77 |
Senior Developer | 94.92 |
Standard Developer | 95.7 |
Standard Developer - Overtime | 96.45 |
This looks much better. It is important to realize, however, that while near 100% utilization is good for product development teams, the situation is different for support teams. There, utilization near 100% leaves very little headroom for situations where more incidents than expected occur. To illustrate this, let’s compare average incident resolution times between 1-2-4-2 and 1-1-4-2.