Quantifying change effort and opportunity in closing a performance gap

Dario Boriani

In this post, I want to talk a little about change and opportunity. Change is an overused word, and it is often mentioned without a clear understanding of what is required for it to occur. Opportunity can come from change and how organizations go about managing it. Whether an organization rejects or embraces change, opportunity is there for the competition as well.

Suddenly in life, or perhaps gradually over time, you come to the realization you need to change. Maybe you are overweight, or your job feels like a grind and a dead end. What do you do? For one, it pays to know where you are and where you want to go, so figuring those out should be your first steps. But how far do you have to go? Are you equipped to do this? Or will you give up halfway? You need to assess how far apart the two endpoints are (this is known as gap analysis), so you can gauge the correct approach to follow for the change to be successful and sustainable over time. Many fail to think this through, which is why so many new-year initiatives often come to an abrupt halt.

Where business and industry are concerned, things aren't necessarily better. Improvement initiatives are usually embarked upon with some goal, yet all too often no commitment to accurately baselining the current situation exists. This is quite basic, yet few spend any time on it. Remember, this is the very first thing to do -- establish where one is starting from. In an enterprise, someone is usually in a rush to 'get it done' by a deadline. Deadlines, as they say, are enthusiastically imposed by those not doing the work, who also lack a grasp of the effort required. Many good ideas are trampled and good advice ignored as people stumble blindly in the dust storm raised by this ill-advised stampede of haphazard activity. It's all go, go, go. As a result, it becomes impossible to assess if a change of the type and magnitude desired is feasible or sustainable, or to gauge actual improvements and return-on-investment (ROI) once it's all over and done with. You may have gotten somewhere, but is it anywhere near where you thought, and do you know how much better off you are now for doing so? The answer is, probably not. It is very much like going on a sea voyage without charts or knowing the basics of route planning.

[Image: To chart a course.]

When change is needed

By way of illustrative example, consider what Ripple's CEO, Brad Garlinghouse, had to say about SWIFT (Society for Worldwide Interbank Financial Telecommunication.)

Garlinghouse emphasized the prowess of Ripple XCurrent money transfer solution when compared to the widely used SWIFT system. “SWIFT’s published error rate is six percent,” said Garlinghouse. “Imagine if six percent of your emails didn’t go through without additional human intervention.”1
-- Brad Garlinghouse, Ripple (XRP) CEO: A New Payments System for the Digital Age, Cryptorecorder, 3/20/2018

Six percent? What does that mean? How bad is that number? And what is an error rate? For that matter, what is an error? In an earlier post on transfer of value, I distinguished between errors, defects, and failures.2 Summarizing:

...an error is a mistake that can be traced to a human, a bug is what a tester calls an error he/she has detected and documented, a defect is the consequence of an error as realized in the product or service provided, and a failure is an overall inability of a defective product or service to meet the customer's requirements.

For SWIFT, we will take the quoted figure at face value and assume that 6% of their wire transfers fail to occur properly, with the sender and receiver not satisfied that the funds sent arrived at their destination in the expected amount, or even arrived at all. Something happened between the two endpoints, and this failure has left customers unhappy and without access, or with severely delayed access, to the needed funds. Perhaps the transactions required human intervention more than once, or at more than one location, before they eventually completed.

What is being left unsaid?

Quite a bit, actually. The 6% error rate does not in itself mean that the other 94% of transactions occurred flawlessly. They could have completed at a slower pace than expected, or required the customer to go through dervish-like gyrations to try to track the money as it bounces back and forth sight unseen between correspondent banks and other intermediaries. Excessive fees may have been incurred in a way that is unpredictable and that the customer cannot plan for. These are all failures of another type, in that they also cause the transaction or service not to meet customer expectations. Therefore, irrespective of its value, the stated error rate is in all likelihood but a lower bound on the number of ways in which things can and do go wrong. Who really knows about the potentially subpar performance of transactions that do complete but have little else to recommend them? Again -- and this cannot be overstated -- in Philip Crosby's words, the definition of the performance standard for quality is zero defects, not acceptable quality levels.2

The real question then becomes: how hard is it to minimize an error rate and drive it to zero or near zero?

Even without being insiders, and with the most meager of information at our disposal, we can still reason about the situation and attempt to draw some conclusions. For this, we need to consider a process quality framework.

Quantifying the improvement effort via Six Sigma

As to a 6% error rate, one could be forgiven for thinking 'not bad, if that were a grade in school it would be an A or A-.' Unfortunately, in industry and business, scoring 94 out of 100 and doing it repeatedly over time labels you as irredeemably mediocre and, if you persist in that level of performance in a competitive field, you may well be on the way out.

A six percent error rate translates to 60,000 failures out of one million attempts. When you think in terms of millions of transactions of this sort taking place in short time frames and the billions of dollars involved, the picture is perhaps a little less rosy than a score of '94 out of 100' may have led you to believe at first.

Quality measures

In process improvement, you hear about first pass yield, percent defective, defects per unit, and similar quantifiers. The number of defects 'per million opportunities' (or DPMO) is one such quality yardstick. An opportunity is an instance of service delivery where a defect could manifest itself. Now, there are different ways for a process to fail, and those should be taken into account when setting up an improvement initiative, but here we concern ourselves with the basics. Fortunately, we can translate DPMO into a 'Level' rating within Six Sigma (see Table 1, three leftmost columns only.) 4

A brief detour into meal delivery

If you were interested in improving a process of hot meal deliveries, say, you would first need to define what constitutes a defect. Given a promised time window for delivery, lateness qualifies as a defect. A hot meal delivered with the wrong side is another defect. Keeping it simple, two defect opportunities (or ways to fail, as defined) exist per meal delivery. If 120,000 meals out of 500,000 were delivered late, and if another 35,000 were delivered with the wrong side, we would have:

DPMO = [(# of defects) / ((# of defect opportunities per meal) x (# of meals looked at))] x 1,000,000
= [(120,000 + 35,000) / (2 x 500,000)] x 1,000,000
= (155,000 / 1,000,000) x 1,000,000
= 155,000
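The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the DPMO definition used here; the function and variable names are made up for the example.

```python
def dpmo(defects: int, opportunities_per_unit: int, units: int) -> float:
    """Defects per million opportunities."""
    total_opportunities = opportunities_per_unit * units
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return defects * 1_000_000 / total_opportunities


# Figures from the meal delivery example.
late_meals = 120_000
wrong_side_meals = 35_000
meals_looked_at = 500_000
ways_to_fail_per_meal = 2  # late delivery, wrong side

meal_dpmo = dpmo(late_meals + wrong_side_meals, ways_to_fail_per_meal, meals_looked_at)
print(f"DPMO = {meal_dpmo:,.0f}")
```

Note that the numerator counts defects, not defective meals: a meal that is both late and has the wrong side contributes two defects.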

Consulting Table 1, and going down the DPMO column, we see 158655 on the row for Six Sigma level 2.5, which is close enough to 155,000 for our purposes. Reading across that row, this corresponds to a long-term (LT) process yield of about 84%.

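**Table 1. Process yield vs DPMO and Six Sigma Level**

| Long-term yield (%) | Six Sigma level | DPMO | +1 level | +2 levels | +3 levels |
|---|---|---|---|---|---|
| 99.99966 | 6 | 3.4 | -- | -- | -- |
| 99.98 | 5 | 233 | 68.53 | -- | -- |
| 99.4 | 4 | 6210 | 26.65 | 1826.5 | -- |
| 93.3 | 3 | 66807 | 10.75 | 286.7 | 19649 |
| 84.1 | 2.5 | 158655 | 2.37 | -- | -- |
| 69.1 | 2 | 308538 | 1.94 | -- | -- |
| 50.0 | 1.5 | 500000 | 1.62 | -- | -- |
| 46.0 | 1.4 | 539828 | 1.08 | -- | -- |
| 42.1 | 1.3 | 579260 | 1.07 | -- | -- |
| 38.2 | 1.2 | 617911 | 1.07 | -- | -- |
| 34.5 | 1.1 | 655422 | 1.06 | -- | -- |
| 30.9 | 1 | 691462 | 1.05 | -- | -- |
| 15.9 | 0.5 | 841345 | 1.22 | -- | -- |
| 6.7 | 0 | 933193 | 1.11 | -- | -- |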
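For readers who prefer to compute rather than look up, the yield and DPMO columns of Table 1 can be approximated from the standard normal distribution. The sketch below assumes the conventional 1.5 sigma long-term shift, which is consistent with the table's values but is not stated in the post; the function names are illustrative only.

```python
from statistics import NormalDist

SHIFT = 1.5  # assumed long-term shift, the usual Six Sigma convention
_std_normal = NormalDist()  # mean 0, standard deviation 1


def dpmo_from_level(level: float) -> float:
    """Long-term DPMO for a given Six Sigma level (one-sided tail)."""
    return (1.0 - _std_normal.cdf(level - SHIFT)) * 1_000_000


def level_from_dpmo(dpmo: float) -> float:
    """Six Sigma level for a given long-term DPMO."""
    return _std_normal.inv_cdf(1.0 - dpmo / 1_000_000) + SHIFT


for level in (6, 5, 4, 3, 2.5, 2, 1.5, 1, 0.5, 0):
    d = dpmo_from_level(level)
    print(f"level {level:>3}: yield {100 - d / 10_000:9.5f}%  DPMO {d:10.1f}")

# The meal delivery example, without rounding to the nearest table row.
print(f"155,000 DPMO is roughly level {level_from_dpmo(155_000):.2f}")
```

Small differences from the table are rounding effects.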

Failed messaging: what about volume?

In May 2018, SWIFT recorded an average of 32.24 million FIN messages per day. Traffic grew by 11.5 % versus May 2017 which brings the year-to-date growth to +12.0%. (from SWIFT.com)3

At a minimum, there are 60,000 unhappy customers per million transactions, or, more probably, 120,000 if you count two customers, one at each endpoint, per failed transaction. On the average daily basis given, 6% of 32 million daily transactions is 1.9 million transactions. Are there really twice that, or almost 4 million newly disappointed customers, on average, every day? If these numbers are anywhere near the ballpark, they should give everyone pause. Note also that an average error rate tells you nothing about the spread, or deviation, of the numbers involved. Only with both in hand can you have an intelligent discussion about process capability.
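As a back-of-the-envelope check, the daily figures can be reproduced as follows. The sketch keeps this post's own simplifications: every FIN message is treated as one transaction, the 6% rate is applied uniformly, and two customers are counted per failed transaction. The variable names are illustrative.

```python
daily_messages = 32_240_000   # average FIN messages per day, May 2018
error_rate = 0.06             # quoted error rate, taken at face value
customers_per_transaction = 2 # assumption: one sender and one receiver

failed_per_day = daily_messages * error_rate
affected_customers_per_day = failed_per_day * customers_per_transaction

print(f"failed transactions per day: {failed_per_day:,.0f}")
print(f"affected customers per day:  {affected_customers_per_day:,.0f}")
```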

Remittances are often done across borders, from workers in a country to dependent families in another. The impact of a failed transaction is, for now, not easily quantifiable due to the complexity or urgency of the individual situations and the consequences a failed or severely delayed transfer may have in someone's daily life.

[Image: Days gone by.]

From defects to PPM, DPMO, and Six Sigma levels

Now, having 6 failures out of 100 equates to 60,000 out of 1 million. That is 60,000 PPM (parts per million). This is not the same as DPMO unless there is exactly one opportunity to fail per message sent. But we don't really know the process. For the sake of illustration, let's assume 4 ways exist for every transaction to fail. These could include wrong customer info, wrong amount, wrong routing or destination, and wrong transaction priority. Then, looking at 32M transactions in one day, with the reported error rate giving 1.92M failures spread over 4 x 32M = 128M opportunities, DPMO = 15,000 and the process Six Sigma level = 3.67. If there were only 3, 2, or 1 way for a transaction to fail, the process would be at a Six Sigma level of 3.55, 3.38, or 3.05 respectively. These are all levels of performance considered below average or poor, and uncompetitive long term. That they have not doomed anyone to extinction until now is likely due more to a quasi-monopolistic market share and lack of accountability to customers than to anything else. Until now.
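The four scenarios can be worked through in Python. This is a sketch under two assumptions that go beyond the post: each failed transaction counts as a single defect, and the Six Sigma level is obtained from the standard normal distribution with the conventional 1.5 sigma long-term shift. The helper name is hypothetical.

```python
from statistics import NormalDist


def sigma_level(dpmo: float, shift: float = 1.5) -> float:
    """Six Sigma level for a long-term DPMO, assuming a 1.5 sigma shift."""
    return NormalDist().inv_cdf(1.0 - dpmo / 1_000_000) + shift


failures_ppm = 60_000  # 6% error rate, i.e. 60,000 failures per million messages

for ways_to_fail in (4, 3, 2, 1):
    # One defect per failed transaction, spread over more opportunities.
    dpmo = failures_ppm / ways_to_fail
    print(f"{ways_to_fail} way(s) to fail: DPMO = {dpmo:>8,.0f}, "
          f"level = {sigma_level(dpmo):.2f}")
```

The more opportunities one assumes per transaction, the lower the DPMO and the higher the apparent Six Sigma level for the same 6% failure rate, which is why the opportunity count matters when comparing processes.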

Enterprises want as low a failure rate as possible. The 'ideal' is to be at Level 6, or 3.4 DPMO (again, see row 1, column 3 of Table 1) and a process yield of 99.99966%. We also see that, if a process were already at 99.98% (or Level 5) to begin with, its DPMO would have to go from 233 to 3.4, a factor of about 68x (68 times), to get to Level 6. This is a fairly ambitious endeavor.

Again, reading horizontally across Table 1, a Level 4 process with a 99.4% success rate would need its DPMO to improve almost 27x to be rated at Level 5 and 1827x to make it to Level 6. Almost 2000x? This is beginning to sound overwhelming. Notice the deceptive 'closeness' of two numbers such as 99.4% (Level 4) and 99.98% (Level 5), which in reality are quite far apart in terms of the effort needed to bridge the gap. It means going from 'two nines' (99.4%) to 'three nines' (99.98%), and in fact almost to the 'four nines' (99.99%) level.
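The improvement factors quoted here are simply ratios of DPMO values. A minimal sketch, using the rounded DPMO figures from Table 1 (the dictionary and function names are illustrative):

```python
# DPMO by Six Sigma level, as listed in Table 1.
TABLE_DPMO = {6: 3.4, 5: 233, 4: 6210, 3: 66807}


def improvement_factor(from_level: int, to_level: int) -> float:
    """Factor by which DPMO must shrink to move between two levels."""
    return TABLE_DPMO[from_level] / TABLE_DPMO[to_level]


for start, target in ((5, 6), (4, 5), (4, 6), (3, 6)):
    print(f"Level {start} -> Level {target}: "
          f"{improvement_factor(start, target):,.1f}x fewer defects")
```

Because the factors multiply, each extra level compounds the effort: 26.65 x 68.53 gives the roughly 1827x needed to go from Level 4 to Level 6.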

Five Nines: what's in a digit?

Why 99.99966%? What does this mean? This figure is also known as the 'five nines' and is often used in discussing availability. Check out the summary below, which applies to IT networks or power infrastructures:

**Table 2. Availability and Downtime**

| Availability % | Downtime per year |
|---|---|
| 99.95% ("three and a half nines") | 4 hours 23 minutes |
| 99.99% ("four nines") | 52 minutes 34 seconds |
| 99.995% ("four and a half nines") | 26 minutes 17 seconds |
| 99.999% ("five nines") | 5 minutes 16 seconds |

Being at Six Sigma Level 6 (99.99966%) puts a process in the 'five nines' class of performance, which at the 99.999% threshold means about 5 minutes of downtime per year. A year is 525,600 minutes. Think about that level of performance. Over half a million minutes of service, out of which you're down for just over 5 minutes! Being at 99.99% (four nines) means about fifty minutes of downtime out of over half a million minutes, and is roughly comparable to the LT yield of a process at Six Sigma Level 5 (99.98%).
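The downtime figures follow directly from the availability percentage. The sketch below assumes a 365-day year (525,600 minutes, as above), so its results may differ by a second or so from a table built on a slightly different year length. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


def as_hours_minutes_seconds(minutes: float) -> str:
    """Format a duration in minutes as hours, minutes, and seconds."""
    total_seconds = round(minutes * 60)
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours}h {mins:02d}m {secs:02d}s"


for availability in (99.95, 99.99, 99.995, 99.999):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability}% available -> {as_hours_minutes_seconds(minutes)} down per year")
```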

Sadly, 94% 'success' -- and anything at 90.0% or better but below 99.0% -- is at the 'one nine' level.
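One way to make the 'nines' classification mechanical is to count the whole powers of ten in the failure fraction. This is a sketch of that idea, not an industry-standard formula; the function name is hypothetical, and half-nines such as 99.95% are simply rounded down.

```python
import math


def count_nines(success_percent: float) -> int:
    """Number of whole 'nines' in a success or availability percentage."""
    if not 0 <= success_percent < 100:
        raise ValueError("success_percent must be in the range [0, 100)")
    failure_fraction = 1.0 - success_percent / 100.0
    # Rounding guards against floating-point noise at exact boundaries
    # such as 99.999%, which should count as five nines.
    return math.floor(round(-math.log10(failure_fraction), 9))


for pct in (94.0, 99.4, 99.98, 99.99, 99.99966):
    print(f"{pct}% -> {count_nines(pct)} nine(s)")
```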

Improvement vs. redesign

One of the truths buried deep inside any improvement framework is that not everything can be improved to a given target if something else is not done in parallel. In other words, improving by one level, or even two, within the Six Sigma framework is difficult but perhaps not impossible. Improving by three levels, when using the same or slightly tweaked legacy process structure as many organizations are wont to do, is a non-starter.

Numbers can be tweaked, and it is possible to manipulate the presentation of data to support a talking point. The truth remains below the surface, however. Improvement is a word that applies to a specific range of situations, given a starting point and a goal, which identify the gap to bridge. By baselining the current situation first, one immediately puts an upper bound on how much improvement is realistically achievable if nothing else changes, structurally speaking. To maintain that one need only introduce new tech and tweak other things a bit to somehow miraculously get to unheard-of levels of performance is a fallacy. New tech slapped on old processes which are unreliable and riddled with inefficiencies will only automate those inefficiencies and most likely up the error rate!

So, what modifications are 'they' doing?

Are things changing for the better due to GPI and other messaging modifications by SWIFT?

Swift GPI is like putting a Ferrari shell on a Model-T engine. It’s a cosmetic upgrade on old infrastructure: messaging is still not tied to settlement, it’s unidirectional and can’t solve for liquidity.5
-- Brad Garlinghouse, CEO Ripple, on Twitter, June 6, 2018

We would have to know what errors the 6% encompasses as reported, and which ones it does not include. We would also have to know whether this figure is based on a short-term (ST) data collection effort or a long term (LT) one -- one shouldn't extrapolate to DPMOs and PPMs from ST data or one might get an over-optimistic picture. Clearly, there is a lot we don't know.

As to whether things are getting any better, again this is impossible to know as an outsider. What is at least as important, though, is whether any attempted improvement is sustainable if the changes being made are not structural. A specific error rate figure, beyond its magnitude, is useful for framing an understanding of the improvement effort required. The other key point, per Ripple's CEO, is that settlement, bidirectionality, and liquidity are not addressed by SWIFT's current improvement efforts.

Conclusion

A stated willingness to change may not be enough unless it encompasses aspects that go beyond what many perceive as mere tweaks to the existing process structure. Whenever goals get too ambitious within an improvement framework, it is time for a complete redesign. Strategically speaking, however, many organizations do not bother to truly analyze their current situation -- data collection can be expensive, everyone is in a rush, and facing the fact that current data are poor in quality and nearly useless for decision-making is a bitter pill to swallow. Therefore, they fail to uncover existing ills and never quite understand if what they're aiming to achieve is even remotely possible. This leads them to embark upon initiatives that are bound to cost time, waste money, and lead to great frustration when expected outcomes fail to materialize and achievement horizons continue to recede. What naturally follows is the abandonment of improvement ideas, attempts to discredit the approach or methodology, and a further iteration of the all too familiar blame game.

That said, can underperforming organizations incrementally improve to where they need to be without essentially redesigning their processes whole, or without partnering with someone who doesn't suffer from decades of antiquated structures fit for an earlier purpose and is 'already there', so to speak? What is it these organizations are trying to do in practice with their announced remediation initiatives? If changes need to be more fundamental than their current efforts, how long might it take them to get there? Can a horse-drawn buggy compete with a car even if a tired horse is replaced by a great one or two? Can you go from being a 'one nine' performer to being a 'five nines' one if you only have a teetering edifice to work with?

[Image: Competition.]

Importantly, what does this mean for the competition and Ripple? One has to go beyond the error rate and assess carefully the effort and cost involved in bridging the performance gap implied by the error rate. Putting some bounds around this is a first step in estimating the likelihood of success of any change initiative and in orienting one's own efforts as a competitor. It becomes a part of business intelligence if used strategically. Key questions need to be asked and can shine a light on what to do.

What is the time window of opportunity and what strategies can be developed in response? Should one wait the incumbent out and eventually partner up and leverage a significant customer base? Should one be relentlessly aggressive in blazing one's own path and accept having to sign up new customers much more gradually and at a higher marginal cost as the tradeoff? Should one focus on fine-tuning one's own processes to extend the existing lead in correct transaction completions? Is one's advantage strictly technology-based, or is it rooted in superior process understanding, which will make future, inevitable changes adoptable at low cost -- the alternative being to eventually become as trapped in legacy structures as the current market share leader? Something else? All of the above? A keen grasp of processes is essential for any strategic assessment to amount to more than hand-waving.

Time waits for no one, regardless of the perceived advantage in market share on one side and in great technology on the other. As gaps close, some outcomes are going to be likelier than others, depending on the path followed by the various players. As to whether one of them will prevail over another, or rivalries will end up morphing into collaboration out of sheer necessity and self-interest, I will let you be the judge.

References

  1. https://cryptorecorder.com/2018/03/20/brad-garlinghouse-ripple-xrp-ceo-a-new-payments-system-for-the-digital-age/?utm_sq=fpjr7g8k4d
  2. https://xrpcommunity.blog/a-proposal-for-a-quality-specification-for-the-transfer-of-value/
  3. https://www.swift.com/about-us/swift-fin-traffic-figures
  4. Eckes, George, Six Sigma for Everyone, Wiley, 2003.
  5. https://twitter.com/bgarlinghouse/status/1004451192154439680

All images © 2000-2018 Dario Boriani.

