Measurement in Context
In “A Piece of Sky” from the “Yentl” soundtrack, Barbra Streisand holds the final vocal note for 19 seconds. (Yes, I’m THAT geek, too.)
Try it now. Sing a note at full volume for 19 seconds. No one will care. Go ahead, I’ll wait…
I know, right? Tough to do! (for me, anyway).
For context, the Guinness Record for Longest Vocal Note in a Pop Song (according to a random internet post I wasn’t able to verify…but still) is Freddy Curci holding a note for 30 seconds in the song “When I’m With You” by Sheriff (remember them?) in 1989.
Suddenly Babs isn’t so impressive.
It’s all about context.
~Geek~
Read “The Goal”
Reasons to read The Goal (1986) by Eli Goldratt:
- Nostalgic ’80s references like the secretary who drives a Chevette and a company called “Douglas” that makes DC-3s and DC-10s.
- A match-and-bowl “game” that demonstrates statistical fluctuation and process dependency brilliantly (can’t wait to try the game).
- Main character refers to a group of Boy Scouts he’s leading on a hike as “little bastards”.
- A physicist is given rock-star status bordering on top-of-the-mountain guru status.
- Relationship between the main character and his wife is modeled (apparently) off of Ricky and Lucy Ricardo.
- Festive character names: boss is “Peach” (but isn’t); Controller “Ethan Frost” could “double as the Grim Reaper” (not my words); diner waitress named Maxine.
- The plot is gauzy-thin, disguising a brilliantly logical approach to work.
I’m on page 120 (of 320) and can’t wait to finish so I can re-read key parts (like the match-and-bowl game). Read this book.
~Geek~
Your New Training Measurement Strategy
Learning measurement has become mired in its own best intentions and is sinking. Every one of us agrees that there should be some significant measurement of the learning function. Why? To get a seat at “the table”! To protect our budgets! To calculate the ROI for the training dollars we DO spend!
So we set about studying the methodologies: Don Kirkpatrick, Jack & Patti Phillips, Robert Brinkerhoff and more. What does Josh Bersin have to say? Surely Elliott Masie has an opinion! What about Judy Hale? Surely there’s a consultant who can help me unravel this! And there are. There are consultants with opinions sprouting like spring wildflowers, ready to tell us the best way to measure learning effectiveness. I should know; I’m one of them.
But really it’s this: the effectiveness of learning is measured the way anything is measured:
- The measure of my coffee cup’s effectiveness is its ability to hold coffee.
- The measure of my pen’s effectiveness is its ability to deliver ink to paper for as long as my hand can hold out without leaving ink splots on the page.
- The measure of my car’s effectiveness is its ability to transport me to work AND its ability to provide a quiet space to listen to music.
So here’s your new Learning Measurement Strategy: How well does it do what it’s supposed to do? This is a simple question, but it’s best understood if we break it down into its component parts. Remember sentence diagramming? (do they still teach that?)
“How well…”
“How well” tells us that we’re being asked to evaluate something. The words measurement, evaluation and assessment sometimes get used interchangeably and that’s not always a bad thing, but let’s come to a common definition. Measuring is counting and counting is easy.
- I have $.75 in my pocket.
- We trained 162 people.
- 37 employees quit within the first 30 days of hire.
But evaluation is more than counting; evaluation asks us to compare the measurement to a target. Anytime we measure how well something does, we’re comparing the measure to a target. So evaluation includes a pass/fail component when the measurement is compared to the target.
- With $.75 in my pocket, I don’t have enough for the bus, which is $1.25. Fail.
- We trained 162 people, which is the entire target audience. Pass.
- 37 employees quit within 30 days of hire, but last quarter that number was 97. Pass.
You see that targets can be subjective. 37 people quitting within 30 days of hire isn’t likely a crowning achievement, but out of context we really don’t know how to define success. So in our measurement strategy so far we know we’re going to be asked to develop a performance target at some point.
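For the code-inclined, the counting-versus-evaluating distinction can be sketched in a few lines of Python. This is just an illustration using the three examples above; the `Evaluation` class and its field names are my own invention, not any standard tool.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """A measurement paired with the target it is judged against (hypothetical helper)."""
    label: str
    measured: float
    target: float
    higher_is_better: bool = True

    def passed(self) -> bool:
        # Measuring is counting; evaluating compares the count to a target.
        if self.higher_is_better:
            return self.measured >= self.target
        return self.measured <= self.target


evaluations = [
    Evaluation("Pocket change vs. bus fare ($)", 0.75, 1.25),
    Evaluation("People trained vs. target audience", 162, 162),
    # The target here is simply last quarter's number; fewer quits is better.
    Evaluation("Quits within 30 days vs. last quarter", 37, 97, higher_is_better=False),
]

for e in evaluations:
    print(f"{e.label}: {'Pass' if e.passed() else 'Fail'}")
```

Notice that the code can't tell you whether 97 was a sensible target in the first place. That part is still on us.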
“…does it do…”
This is a two-parter. In the middle is the easy one: “it” is training, or “it” could be mentoring, or executive coaching, or the implementation of a new resume tracking system. “It” is simply the intervention that you put in place to create change in your organization.
The cookie sandwiching the creamy center of “it” makes up our handy action phrase, implying that our intervention is more than theoretical, and it is. We’re not just imagining training, we’re going to actually develop training and then we’re going to deliver it somehow. In short we expect it to DO something.
“…what it’s supposed to…”
This is the big one. Encased in “supposed to” is the business process that our training supports, how that business process is measured, and the definition of success (i.e. the target) for that measurement. “…supposed to” could also be “…designed to.” Design cannot be separated from measurement; they’re like bookends: the stuff in the middle won’t hold up if they’re not both in place. “Supposed to” could include:
- Conclusions drawn from the audience analysis
- The items on your boss’s performance evaluation
- Key performance indicators on the company’s annual report to shareholders
And by the way, the most important of all those is “make more money than we spend”.
So our measurement strategy is to determine how well we have designed and delivered an intervention to support a specific business process. Imagine that this is like digging a tunnel from two sides of a mountain with plans to meet in the middle. If we’ve done the analysis and design correctly, those two tunnels should seamlessly meet in the middle of the mountain.
So then, our measurement strategy is “Does it do what it’s supposed to do?”. More specifically: “Does the data we collect when people take this training indicate that the training is driving the correct business process target?” Even more specifically across various examples:
- Did the sales training increase sales of our new product?
- Did the safety training reduce lost-time injuries?
- Did the leadership training in basic financial skill reduce monthly budget inconsistencies?
- Did the customer service training decrease the complaints regarding rude representatives?
By the way, this is not a new idea. Dr. Deming, in teaching our friends in Japan about zero-defect quality, called this the PDCA model. Six Sigma calls it the DMAIC model. Learning calls it the ADDIE model. But no matter what you call it, the process is the same. When you design a solution to meet a need, the effective measurement is the degree to which the solution meets the need.
~Geek~
The Pepper Grinder Debacle
OK, so it wasn’t so much a debacle as a snafu, or maybe just a glitch. But the experience got branded on my psyche because it was so emblematic of what I believed about business. Intrigued?
I’m reading The Goal (finally) by Eli Goldratt (thanks for the recommendation, Chris at Ceptara) and it made me think of the Pepper Grinder Debacle so of course, I want to share.
I was a waiter for 13 years in establishments ranging from neighborhood taverns in South Miami Heights to fancy-ish hotel restaurants in Seattle. One of these was McGuffey’s Restaurants in Asheville, NC. If you Google McGuffey’s you’ll see old references to various outlets, including one in Branson, MO as well as the three or four that were once in Asheville. At its height, McGuffey’s had 14 restaurants across North and South Carolina and Missouri. The story of their rise and fall is one for another post, but I can say with confidence that when I worked there, it was one of the best restaurants in every market it served; particularly in Asheville.
At one point, in addition to my waiter duties, I was providing some quality consulting so they could apply for (what was then called) the North Carolina Quality Leadership Award (a stepping stone to the Malcolm Baldrige National Quality Award). So I had the honor of mucking about in some of the inner workings of the corporation in order to guide the award application process. The owner had decided that the Baldrige was well within his company’s abilities, but I had talked him down to the NC version as a warm-up.
In one conversation with the kitchen manager, a great guy whose name escapes me – let’s call him Earnest – I learned about the Pepper Grinder Debacle.
It was the late ’80s and McGuffey’s was introducing many of the trends in contemporary restaurants at the time, including pepper grinders. This era also included the introduction of sun-dried tomatoes (Waiter, is this a tomato or a raisin?), albacore tuna steaks served rare (Waiter, this fish is RAW!), and blackened food outside of New Orleans (Waiter, this chicken is burned!) but those also are posts for another day.
We had matching salt shaker / pepper grinder sets on each and every table, which was rare even for high-end restaurants. At least twice a week I had to teach a guest how to make pepper come out the bottom, as they frantically shook it up and down trying to get pepper out of the little silver knob at the top.
The set matched: about 6 inches tall made out of some oak-ish looking wood. The salt shaker was simply a hollow piece of wood with holes poked in the top and a rubber stopper in the bottom, while the pepper grinder was (obviously) somewhat more complicated with an adjustment screw at the top, shaft running through the middle, and a grinding mechanism at the bottom.
The superior taste of fresh-ground pepper was not lost on our guests and it was common for pepper grinders to disappear from tables.
This is where Earnest comes in. Looking over a new shipment of various supplies with Earnest, I saw a case of salt and pepper sets in the loading dock. Earnest opened each box, took out the pepper grinder, and put the now half-empty (half-full?) box containing the salt shaker into storage. Looking over his shoulder, I saw a LOT of boxes that I assumed contained just a salt shaker.
“Ummm…Earnest, why do you buy the whole set when all you need is the grinder?”
You know where this is going. Earnest was the KITCHEN MANAGER and I was just a waiter. He explained patiently to me that they came as a SET. Duh.
I had some time on my hands, so I looked on the box, found the name of the company and called them. It turned out that McGuffey’s paid $20 for a salt and pepper set. I asked what it would cost to just get pepper grinders. $17 for just a pepper grinder. This seemed easy!
Earnest however, wasn’t convinced. “What? That’s a rip-off! The grinder is half of the set, it should cost $10!”
I’ll summarize, because this is painful even in memory. Short version:
- Earnest thought I was crazy or at least meddling in things I didn’t understand.
- He continued to order sets and continued to fill up the storage room with salt shakers. Maybe he eventually burned them for heat in his home. I don’t know.
- McGuffey’s won neither the NC Quality Leadership Award, nor the Baldrige (you’re shocked, I know). They continued to delight customers for a couple more years, however.
Why Earnest couldn’t calculate the real value of not spending money on something he didn’t need and didn’t have to subsequently store, was beyond me then and remains beyond me now. I told you it was branded on my psyche.
I think of this because I’m delighted at the characters in The Goal as they newly discover that the actual purpose of their manufacturing plant is to make money and that every single thing they do is related to whether they make money or not.
The fix seems like common sense, which is why the Pepper Grinder Debacle was not so much a debacle as an example of doing what had always been done because it was easier than thinking critically about the relationship between something as innocuous as a salt shaker and the success (or eventual failure) of a company.
Makes me want to ask clients to show me the storage rooms they never go into to see what skeletons lurk there.
~Geek~
Live life in the intersection
I’ve been thinking a lot about Venn Diagrams lately. Or, more accurately, I’ve been giddily appreciating Venn Diagrams lately.
If you’re unfamiliar, they’re well outlined here.
As a way of ordering the universe I’m thinking (at least lately) that Venn Diagrams can’t be beat. The basic premise (in my musings) is that if a circle encloses all the members of a certain category like all blonde-headed people, or all gardeners, or all circus clowns; and then another circle encloses all the members of a completely different category, like all lovers of books more than 1000 pages, or all people who like asparagus, or all bloggers that meander from weird topic to weird topic; well then what exciting possibilities exist for understanding our universe in the intersection of those populations?!
- What do we think about blonde-headed people who like asparagus?
- Is there a market to sell things to gardeners who love books longer than 1000 pages?
- Gracious, what conclusions can we draw about meandering bloggers who are also circus clowns?
As a lens through which to view the world (literally…a two-population Venn Diagram looks kind of like glasses, no?) Venn Diagrams seem to offer infinite possibilities.
If we are (as some of us are) measuring training:
- What conclusions can we draw about managers who’ve been in their role for more than 10 years AND engage in athletic hobbies?
- What about new hires who have experience in food-service (I always favored these folks as a hiring manager)?
- What about training held on a Monday compared with the same training held on a Wednesday?
In the same vein, what if we take the population of managers in a company for whom their team turnover is higher than the company average and investigate to see what else is true for them? What’s the size of the intersection for managers with high turnover and managers who work first shift? Second shift?
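If your populations live in a spreadsheet or a database, the intersection is one operator away. Here’s a minimal sketch using Python’s built-in sets; the manager names and shift assignments are entirely made up, and `overlap` is just a helper I’m inventing for the illustration.

```python
# Hypothetical populations; none of this is real data.
high_turnover = {"ana", "ben", "cho", "dev"}   # managers above the company average
first_shift = {"ana", "ben", "cho", "eli"}
second_shift = {"dev", "fay", "gus"}


def overlap(a, b):
    """Return the intersection of a and b, plus the share of `a` that sits inside it."""
    both = a & b  # set intersection: the middle of the Venn Diagram
    share = len(both) / len(a) if a else 0.0
    return both, share


both_first, share_first = overlap(high_turnover, first_shift)
both_second, share_second = overlap(high_turnover, second_shift)

print("High turnover AND first shift:", sorted(both_first), share_first)
print("High turnover AND second shift:", sorted(both_second), share_second)
```

In this invented data, three of the four high-turnover managers work first shift. That doesn’t prove anything by itself, but it tells you where to go looking for the story.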
Measurement isn’t just about collecting numbers and crunching them around to see what they tell us. Sometimes the most useful measurement activity is thinking about all the possible populations we’re faced with and imagining how they intersect in order to imagine the story behind it.
- What if 1st shift managers in a production environment have higher team turnover? What hypothesis do we form?
- What if leaders with MBAs have shorter overall tenure than leaders without MBAs? What hypothesis?
- Do male new hires score higher on 90-day evaluations than female? What’s that hypothesis?
Venn Diagrams. Bringing order and clarity. Try it…what are the population intersections you’re most interested in?
~Geek~
Everything IS Measurable
Sent by my friend Mark:
Demonstrating what I definitely think is true, namely that what you choose to measure has a LOT to do with what action you take. Success by one definition isn’t necessarily success by another.
Choosing the Right Analytics in Afghanistan.
~Geek~
Training Doesn’t Save Money…
…or make money; business processes do.
Not a ton more to say about this other than to ensure that we differentiate the two. Admittedly, I currently make my living measuring the “business impact” of training, but in truth what I do is help clients measure the business processes that are supported by training.
Training measurement books are as ubiquitous as diet books, but like the folks buying diet books without losing weight, buying (and reading) training measurement books doesn’t seem to be getting folks any closer.
The key is alignment with the business units for whom the training department is developing training. Actually, alignment doesn’t seem to quite get there…I’m talking about collaboration and partnership here. Business units need to collaborate with the training department to drive their key performance indicators.
I’ve said it before (and heaven knows I will again): The right success measure for training is the success measure for the business process it supports.
~Geek~
Use Your Measurement Powers for Evil instead of Good
When Good Measurement Goes Bad:
We all know that our world is governed by the numbers. Yes, there is the occasional maverick who ignores the numbers and goes full speed ahead based on a whim while the numbers, figures and statistics sit quietly in the back seat waiting for the inevitable crash and crumble so they can politely point out what they knew all along (you see my bias here).
We also all know that sometimes technology outpaces morality (genetic cloning…good or bad?) so that morality is running frantically alongside the bus of technological advancement pounding on the side of the bus in hopes it will slow down. It never does.
All that is to say that even good measurement practices can lead to bad outcomes:
In recent news, American Express has come under fire for lowering the credit limits of cardholders who shop at discount stores (like Wal-Mart). This is done through “behavioral scoring,” in which credit issuers model risk based in part on the repayment behavior of other card users that shop at the same establishments that you do. In other words, you’re being judged not just on your own merits, but on those of your fellow shoppers.
The logic is impeccable, really…insurance companies have been doing this for years. If you’ve ever been (or aspire to be) a male under 25 years old in the United States, you know how unfair your car insurance premiums are simply because males under 25 have more accidents. So Amex compared the population of folks who don’t pay on time and found that a large percentage of those folks also shopped at Wal-Mart. Ergo, people who shop at Wal-Mart don’t pay their bills on time! Yes, there is a tragic flaw there, but it doesn’t prevent it from happening. Here are some others:
The timing of traffic lights – when they go from yellow to red – is (logically) a function of the speed limit on the road. However, the increase of remote traffic cameras (that take a picture of the license tag and mail traffic tickets to offenders) has gotten some municipalities seeing green. These gray-hat folks are shortening the yellow light timing and letting the cameras send tickets to all the new red-light runners. Shame, shame.
Of course, the classic: In 1971, Ford Motor Company put the Pinto on the market to compete in the newly emergent small car segment in the US. Reports differ and I’m sure there are folks who would violently disagree, but many folks believe that Ford knew of the fuel tank flaw that could (and did on at least one occasion) result in the car exploding from a rear-end collision. The story goes that they calculated the likelihood of a collision and the cost of legal action (paying victims’ families) and compared that to the cost of recalling and fixing all the cars and made the (incredibly unethical) decision not to recall the car because it was cheaper to pay the families.
So I give you this admonition: go forth with your data to do good, not evil; be guided by your conscience as much as the data. You’ve been warned.
~Geek~
Fake, but cool point about process variability
OK it’s fake, but I didn’t know that at first because I’m just gullible that way. I was forwarded this video of a machine apparently created in collaboration by the Robert M. Trammel Music Conservatory and the Sharon Wick School of Engineering at the University of Iowa.
Yes, yes, yes it’s fake. But here’s the cool thing, and the thing that made me decide to post here. While watching the video (I hope you saw it before you knew it was fake, it’s pretty neato. Neato as a fake, too.) I kept thinking about (you guessed it!) process variability! Yay!
This machine spits balls out of various holes, and the balls bounce against strings and drum heads with perfectly choreographed timing in order to make music. The balls shoot out of distinctly non-precision holes (should have been my first indication of the animation) and fly through the air to exactly the right spot.
So (geek alert) I found myself wondering how the “output holes” could have possibly been engineered to shoot the balls exactly to the spot on the drum, vibraphone, or string in the right order! Then (HUGE geek alert) I started thinking about all the variables that would have to be managed:
- Ball size
- Ball weight
- “Instrument” position
- “waterfall” impact of tiny timing variability on the music
- Air temperature
- Wind/moving air
- output/input/output (it appears the balls cycle through the machine)
- Wear and tear on the balls
This is all I came up with but I’m sure engineers could come up with a truly scary list.
OK, here’s the Metrics Geek part. I’m assuming that the balls shoot out of the output hole with some variability (minute perhaps, but still variable) and that the data set of the trajectory and landing point of each ball would display as a normal distribution. In order for each ball to actually hit its target in the right spot and on time, the data set would have to be ridiculously closely clustered around the mean (very little variability) to a point where realistically it’s probably not possible.
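To put some (completely hypothetical) numbers on “ridiculously closely clustered,” here’s a rough sketch. It assumes the landing error is one-dimensional and normally distributed, that every ball is independent of the others, and that a song needs 1,000 balls to land within one unit of the target. None of those figures come from the video; they’re placeholders.

```python
import math


def hit_probability(tolerance, sigma):
    """P(|error| <= tolerance) for a normally distributed error with std dev sigma."""
    return math.erf(tolerance / (sigma * math.sqrt(2)))


def all_hit_probability(tolerance, sigma, n_balls):
    """Chance that n independent balls ALL land within tolerance."""
    return hit_probability(tolerance, sigma) ** n_balls


TOLERANCE = 1.0   # hypothetical: how far off a ball can land and still sound right
N_BALLS = 1000    # hypothetical: notes in the song

for sigma in (0.5, 0.25, 0.2):
    per_ball = hit_probability(TOLERANCE, sigma)
    whole_song = all_hit_probability(TOLERANCE, sigma, N_BALLS)
    print(f"sigma={sigma}: per ball {per_ball:.6f}, whole song {whole_song:.6f}")
```

With the tolerance at two standard deviations, each ball lands about 95% of the time, which sounds great until you multiply it across a thousand balls and the chance of a clean performance is effectively zero. You need the tolerance out at four or five standard deviations before the whole song has a decent shot.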
OK enough. It’s just a cool video and it’s nice to imagine that we can engineer music that way.
~Geek~
How Do Dashboards Drive Behavior?
*DigitalSherpa comment: Would love to see some examples of what you consider to be “good dashboards.” So many that I’ve come across are poorly designed or don’t fully capture the information needed. It would be interesting as well to hear your thoughts on how dashboards drive behavior. At the executive level? The Developers? The Customer?
Examples of dashboards…good idea! I’ll see what I can find to post. In the meantime I think the question about how dashboards drive behavior is a good one and is tightly linked to design. I think it’s most useful to back into the answer, however. In other words, rather than designing a dashboard that will drive the right behavior, the right behavior needs to be defined in order to design the dashboard. The easiest way is to ask:
“What decisions will be made from this data?”
Decisions made from data are usually some version of ‘course correction’ or ‘go/no go’ decisions. So a dashboard that drives behavior is one that provides exactly the right data to inform those decisions.
The other necessary moving part is knowing what warning thresholds to set so the data is properly displayed. If the possible scale of responses on a fictitious measure is 1 – 10 (where ten is perfect, and one is complete failure) the dashboard design needs to accommodate the warning threshold. If the target for that fictitious measure is 7 (of a possible perfect 10), then there may need to be a warning when the score drops to 8. Thresholds are highly subjective and need to be tailored to the dashboard user. In that example, a warning at 8 may be sufficient, or it may be appropriate to set the threshold at 9. In all cases, it comes back to the ‘what decisions will be made?’ question and more specifically asking what action will be taken on the threshold warning. So if 7 is the target, what action will be taken when the score is 9? If none, I would recommend trimming the warning threshold closer to the target (8).
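Here’s roughly what that threshold logic looks like when it’s written down, using the fictitious 1 – 10 measure with a target of 7 and a warning at 8. The function name and the status labels are mine, and treating a score that sits exactly on the target as still in the warning band is a judgment call you’d want to settle with the dashboard’s user.

```python
def status(score, target=7.0, warning=8.0):
    """Classify a score on a 1-10 scale where 10 is perfect (hypothetical helper)."""
    if warning < target:
        raise ValueError("the warning threshold should sit at or above the target")
    if score < target:
        return "below target"   # the target was missed: a decision is overdue
    if score <= warning:
        return "warning"        # still on target, but drifting toward it
    return "ok"


for score in (9.5, 8.0, 7.2, 6.4):
    print(score, status(score))

# Same score, different user: with the warning set at 9, a 9 now raises a flag.
print(status(9.0, warning=9.0))
```

The interesting design question isn’t the code, it’s the default values. If nobody will do anything differently when the light turns yellow at 9, move it to 8.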
The obvious comparison/metaphor is that of an actual stop light. The timing of a yellow light has some relationship to the speed limit on that road (I assume). On a road with a 25 mph speed limit, a four second yellow light is probably sufficient. On a road with a 60 mph speed limit, a four second yellow light probably doesn’t allow sufficient time for a course correction (slow down and stop).
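To check my own assumption, here’s a back-of-the-napkin sketch using a kinematic rule of thumb that shows up in traffic engineering references: yellow time equals the driver’s reaction time plus the time needed to shed half the approach speed at a comfortable braking rate. The one-second reaction time and 10 ft/s² deceleration are commonly quoted values that I’m treating as assumptions, and the formula ignores road grade. Real signal timing follows local standards, not my napkin.

```python
def yellow_seconds(speed_mph, reaction_s=1.0, decel_ftps2=10.0):
    """Rough yellow-light duration on a level road (assumed parameter values)."""
    speed_ftps = speed_mph * 5280 / 3600   # miles per hour -> feet per second
    return reaction_s + speed_ftps / (2 * decel_ftps2)


for mph in (25, 45, 60):
    print(f"{mph} mph: about {yellow_seconds(mph):.1f} seconds of yellow")
```

Under those assumptions, four seconds is comfortable at 25 mph and too short at 60 mph. The threshold has to fit the speed of whatever you’re steering.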
*Side note* In an incredibly festive example of using your data power for evil instead of good, here are some stories of cities that deliberately shortened yellow light times and then installed traffic cameras to ticket red-light-runners as a way to increase revenue! Brilliant! (although unethical)
So those are my convoluted thoughts on dashboards that drive behavior. In short:
- Define what action will be taken (or is expected) from data and model the data display to that action.
- Clearly define the warning thresholds
- Avoid (at all costs) data for its own sake.
~Geek~