Level 3 Evaluation
For the learning measurement wonks (and you know who you are), it seems important to take a jab at the venerable Dr. Kirkpatrick today.
For those who aren’t immersed in the learning measurement world, here’s the gist: Dr. Donald Kirkpatrick, way back in 1959, published his 4 Levels of learning measurement, building on his earlier doctoral dissertation. His work has remained the juggernaut of measurement thinking and execution ever since. If you’re expecting me to flay Dr. Kirkpatrick’s work here…sorry to disappoint, but it’s not going to happen. His work elegantly lays out a progressive measurement approach that makes perfect theoretical sense (yes, there’s a “but” in there). I will say that the way his work is implemented often differs from the way it was written (specifically Level 4), but I digress.
Level 3 in his model is Evaluating Behavior, particularly behavior on the job. Level 1 measures the learner’s reaction, Level 2 measures whether they learned what they were supposed to learn, Level 3 measures whether they’re using that new knowledge or skill on the job, and Level 4 measures what business results came from all that training. As I said, a great model.
In the training world, Level 4 sometimes gets replaced by Jack Phillips’ ROI model, which I don’t like very much for a completely different set of reasons. But today, it’s Level 3 upon which I need to geek out.
For me, Level 3 evaluation (on-the-job performance) conjures images of supervisors crouching behind file cabinets with clipboards in their hands. Truthfully, I don’t even know whether file cabinets and clipboards are part of the 21st-century office anymore, but that’s the image.
Awright, enough rambling. I believe Level 3, as commonly practiced, is unsustainable. Hard words, I know, but I stand by them.
Up to the point when Level 3 is prescribed, measures of the training (Levels 1 and 2, as well as the others I think are critical earlier in the process) have been of, by, and for the training folks themselves. Once you get to Level 3, however, you’re out in the business…in the productive work flow, if you will…and once you introduce a process to the work flow that doesn’t actually add value to it (like asking supervisors to fill out an observation “survey” about their recently trained employees), you brand it for extinction. Well-meaning training folks will create brilliant Level 3 documents and thrust them like sticks into the rushing flow of the business stream, where they will be snatched from their hands and whirled away. Gracious, that’s a cheesy metaphor. I should write romance novels.
Most companies have some sort of performance evaluation program (if they don’t, shame on them). Some of these are yearly, some are quarterly. Nearly all of them ask that supervisors evaluate employee performance in a number of different categories.
Well heck, if this training is so important to the business (and it is, or we wouldn’t be doing it…right?), then evidence of its successful expression on the job should already be baked into the performance evaluation process.
That said, it’s possible that the training results are a subset of the larger category that actually shows up on the performance evaluation.
(For example, training might teach you how to put a key in the ignition, while the performance evaluation simply rates how well you drive the car.)
In cases like these, a temporary Level 3 survey or observation could be useful in the early stages of the training in order to ensure the link between the training and performance.
(If we find out in the Level 3 observation that, even after training, folks aren’t putting the key in the ignition correctly, then their driving rating on the performance evaluation will suffer, and that Level 3 observation becomes appropriately predictive.)
But as the work flow rushes by, the ideal situation is to have access to the data produced by the (hopefully…please let it be so) electronic performance evaluation process, so you can sort out the category related to the training and get regular data on how well employees are performing in that particular category. This provides a consistent flow of data that directly reflects what I believe Dr. Kirkpatrick wanted us to learn in Level 3…namely, whether people are using what they learned in their regular jobs.
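Here's a minimal sketch of "sorting out the one category" from a performance evaluation export. It assumes the evaluation system can hand you flat records with an employee, a period, a category, and a numeric rating. Those field names, the "Driving" category, and the ratings are all hypothetical, so don't treat this as any real HR system's format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical export from an electronic performance evaluation system.
# The field names and the "Driving" category are illustrative assumptions.
evaluations = [
    {"employee": "A", "period": "2009-Q1", "category": "Driving", "rating": 3},
    {"employee": "B", "period": "2009-Q1", "category": "Driving", "rating": 4},
    {"employee": "A", "period": "2009-Q1", "category": "Teamwork", "rating": 5},
    {"employee": "A", "period": "2009-Q2", "category": "Driving", "rating": 4},
    {"employee": "B", "period": "2009-Q2", "category": "Driving", "rating": 5},
]


def category_trend(records, category):
    """Average rating per evaluation period for the one category tied to the training."""
    by_period = defaultdict(list)
    for record in records:
        if record["category"] == category:
            by_period[record["period"]].append(record["rating"])
    # Period labels like "2009-Q1" sort chronologically as plain strings.
    return {period: mean(ratings) for period, ratings in sorted(by_period.items())}


print(category_trend(evaluations, "Driving"))  # {'2009-Q1': 3.5, '2009-Q2': 4.5}
```

The point is that nobody crouches behind a file cabinet. The data already exists in the work flow, and all that's left is filtering it down to the category the training was meant to move.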
I do believe Dr. Kirkpatrick’s work holds up well after all these years, but I think implementing any learning measurement program requires a bit of real-world scrutiny to see whether we’re creating a measurement process just to check it off a list, or one that actually allows action to be taken for continuous improvement.
Go forth and measure…
~Geek~
Demographer Magazine
It’s The Onion, so there’s much to appreciate, but I admit this made milk come out my nose.
“Demography Today Magazine targets the demographer demographic.”
~Geek~
Lead the Horse to Water
When you, as I do, find yourself talking about training measurement over and over, themes emerge. Some of these are so important that they get a certain rhythm in the telling…they are almost poetic. OK yes, I’m a geek for this stuff. I actually like talking about measurement in organizations, but still. The poetry makes me smile.
One of the most common of these is that I have found most organizations to be “data rich and action poor.” Has a nice ring to it, no?
But poetry and rhythm aside, I believe this is true. As a random corporate drone, I have sat in my cubicle and looked at reports for which I was one of a zillion people on a distribution list. These reports were often beautiful. Colorful graphs lined up like soldiers standing at attention in crisp, bright uniforms.
And silent. The soldiers were also silent. Which is to say, the charts were presented as graphic representations of data, but entirely lacking interpretation.
Now, one could make the argument (and many have) that it’s the report recipients’ responsibility (say THAT five times fast) to draw conclusions about the data. Besides, the report creator has usually spent so much time with the data that the conclusions seem obvious: the equivalent of the uniformed soldier smacking the reader on the noggin with the butt of a rifle.
But I’m going to go ahead and say that the emperor has no clothes. I’m not scared. The vast majority of the reports that arrived in my cube were little more than an ocean of numbers in which I paddled around looking for meaning for about 3 minutes (if I had extra time on my hands). More often than not, if the report didn’t come from my boss accompanied by a little red exclamation point, it got filed for future reference (which is to say “ignored”).
My point, and I do have one, is that the purpose of a report is to communicate. This is not art, and we are not Leonardo da Vinci painting a cryptic smile on a woman that can be interpreted any number of ways. We are busy folks communicating to busy folks, and if we don’t lead those horses to water, they’ll likely wander around the pasture bonking into fences.
(If you have been skimming this posting so far, here is where you should slow down and read.) To communicate clearly, it is the report creator’s responsibility to interpret every report, every chart and graph, and to include a clear statement about what the report represents.
- “Product sales are generally on the rise, but have leveled off this month.”
- “Employee utilization is low this month, but still on target for the quarter.”
- “Customer satisfaction is down in three of five divisions and flat in the other two.”
As report creators, we will often feel that these are painfully obvious statements. But I consistently find that what seems obvious to the person who’s just spent a full day structuring a data set is often not at all obvious to the person who has (literally) less than a minute to comprehend a report.
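If your reports are generated automatically, even a crude rule can put a first-draft summary sentence on the page. Below is a minimal sketch, assuming a monthly series of positive numbers. The function name, the wording, and the 2% "flat" tolerance are all my own hypothetical choices, not a standard, and a human should still read the sentence before it goes out.

```python
def summarize_trend(label, values, flat_tolerance=0.02):
    """Turn a monthly series into a one-line interpretation.

    The long-run trend compares the first month with last month; the
    latest change compares last month with this month. Relative changes
    within flat_tolerance (an arbitrary 2% by default) count as "flat".
    Values are assumed to be positive.
    """
    if len(values) < 3:
        raise ValueError("need at least three months of data")

    def direction(old, new):
        if old <= 0:
            raise ValueError("values must be positive")
        change = (new - old) / old
        if abs(change) <= flat_tolerance:
            return "flat"
        return "up" if change > 0 else "down"

    overall = {"up": "generally on the rise", "down": "generally declining",
               "flat": "generally flat"}[direction(values[0], values[-2])]
    latest = {"up": "still rising this month", "down": "down this month",
              "flat": "leveled off this month"}[direction(values[-2], values[-1])]
    return f"{label}: {overall}, {latest}."


print(summarize_trend("Product sales", [100, 110, 120, 130, 131]))
# Product sales: generally on the rise, leveled off this month.
```

A rule this simple will sometimes be wrong, and that's fine. A wrong sentence invites exactly the disagreement described next, which beats a silent chart.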
The best part is when someone looks at the report, reads my summary statement and disagrees with it! While this may not seem at first blush to be the best part, it means I’ve accomplished my goal: namely, to have report recipients engage with the data enough to form an informed opinion. Yay!
Not that this means one should deliberately draw inaccurate conclusions on reports…folks will believe that you’re obtuse and no one wants that.
OK, much rambling. Here’s the gist (sorry skimmers): Vow today to produce no report without also producing a conclusion about what story the data tells. Similarly, when you receive a report, vow to email the sender if there is no interpretation. “Marv, thanks so much for sending this. Before I review the report in detail (yeah, right), what conclusion do YOU draw from this?”
Because of course, coming back to “data rich and action poor”, the purpose of all this data and all this reporting is to take action. So the implied question is “What should I do with this data?” Unless someone looks at the report and takes some sort of action, it may as well be a weather report: interesting to know what sort of storm is coming (or, more likely in organizational reports, what storm has just passed), but no ability to do anything about it.
~Geek~
Alignment with organizational goals
*Warning: this is a relatively unformed series of ideas (not uninformed…just unformed), so may appear to be rambling.*
There is a constant effort to align training with organizational goals. Alignment with organizational goals seems to sit on a continuum: on the far left, training is developed completely in the absence of what the organization is trying to accomplish; somewhere in the middle, training grows out of regular conversation with training requestors. Yes, the middle.
Under ideal circumstances in the organizations I’ve worked with, the training leaders are in close communication with business unit leaders, some going as far as “embedding” a training professional within the business unit. While this reduces response time and ideally gives the training pro a head start on responding to requests, it remains a transactional relationship. The shoddy image below illustrates what I’m talking about here.

So then, what is the ultimate “alignment” for training? It’s the condition where training actually drives change in the organization. In this ideal condition, the training team maintains an up-to-date dashboard of the key training indicators that are known drivers of organizational measures.
In other words, imagine that the training team knows (through previous measurement) that product release training has to get to sales people within 30 days prior to product release. If, on the day of product release, only half the target audience has received the release training, the training team has the opportunity to go directly to the product team (or whoever) with a data-driven prediction of a gap in expected sales.
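As one way to picture a single dashboard indicator, here is a minimal sketch of the release-training check. It assumes you have the target audience as a set of names and a record of who completed the training on what date. It reads "within 30 days prior to product release" as "completed during the 30 days before release, up to release day", and it uses a 90% coverage threshold. That window reading, the threshold, and every name here are hypothetical. Turning the coverage gap into a predicted sales gap would need the team's own historical data, so that part isn't modeled.

```python
from datetime import date, timedelta


def release_readiness(target_audience, completions, release_day,
                      window_days=30, threshold=0.9):
    """Report release-training coverage on release day.

    target_audience: set of people who need the release training.
    completions: dict mapping person -> date they completed the training.
    Only completions in the window_days before (or on) release_day count.
    Returns (coverage, meets_threshold, sorted list of people not covered).
    """
    window_start = release_day - timedelta(days=window_days)
    covered = {
        person for person, completed_on in completions.items()
        if person in target_audience and window_start <= completed_on <= release_day
    }
    coverage = len(covered) / len(target_audience)
    return coverage, coverage >= threshold, sorted(target_audience - covered)


sales_team = {"ana", "ben", "cy", "dee"}
completed = {"ana": date(2009, 5, 20), "ben": date(2009, 5, 28)}
print(release_readiness(sales_team, completed, release_day=date(2009, 6, 1)))
# (0.5, False, ['cy', 'dee'])
```

Half the audience covered on release day is exactly the scenario above. The list of uncovered names is what makes the conversation with the product team specific instead of hand-wavy.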
Yes, somewhat unformed…but I’m working on it. This is what comes from questioning closely-held assumptions (like the one that “alignment” is the ideal).
~Geek~

ADDIE & DMAIC: Two Sides of the Same Coin
Many of you are familiar with the ADDIE model of instructional design, yes?
- Analyze the audience, any current training or documentation, and the organizational need.
- Design a training intervention…turn process documentation and tribal knowledge into useful training.
- Develop the intervention you designed…build web-based training, a job aid, instructor-led training, etc.
- Implement the intervention…bring it to the people.
- Evaluate the results…by comparing the original needs of the audience with the experience after training.
When followed, these steps should ensure that training is designed with the learner and the organization objectives in mind. If you imagine that these steps boil down to:
- Determine the current state (Analyze)
- Determine the ideal state (Design)
- Build a bridge between those two states (Develop, Implement)
- Check your work (Evaluate)
Then you’ll start to see the ADDIE model as one facet of what is a pretty common problem solving model.
It seems important first to say that this reductionist model of problem solving isn’t the only way to approach improvement or change. I won’t pretend to be sufficiently informed about chaos theory and quantum physics, but I can say that there always seems to be more to the world than meets the eye. So if you’re an aficionado of chaos theory as a problem-solving methodology, please share it with us here.
For the rest of us, the process of reducing something to its constituent parts for analysis is common. This method can be boiled down even further to:
- “Don’t like what you’re getting? Do something new.”
Did you know that Six Sigma, driver of organizational change and spawner of management books, boils down to the same basic methodology? By understanding the basic steps in the ADDIE model you already have a basic knowledge of Six Sigma!
Six Sigma uses the DMAIC (pronounced “Duh-May’-Ick”) model, which stands for:
- Define the customer and their requirements.
- Measure the current business process performance compared to those customer requirements.
- Analyze the difference between customer requirements and current performance.
- Improve the current process through changes that align to customer requirements.
- Control the achieved benefits by ensuring that the process doesn’t revert to old patterns.
These steps provide a framework for driving efficiency (reducing defects) in business processes. They also map easily to our previous 4 steps:
- Determine the current state (Define, Measure)
- Determine the ideal state (Also Define, since customer requirements drive the desired state)
- Build a bridge between those two states (Analyze, Improve)
- Check your work (Control)
These and other models like them are simply descriptions of a logical process for making sure something (training, an airplane manufacturing process, a selling strategy) does what it was designed to do…again, we see the relationship between design and measurement.
One of the foundations of Six Sigma is the work done by Dr. W. Edwards Deming in the middle of the 20th century, and one of Dr. Deming’s simple contributions to this idea of reductionist problem solving was his PDCA model, or:
- Plan for what you want to do.
- Do it.
- Check the results against your plan.
- Act on what the check tells you.
Dr. Deming’s work was a cycle…the “Act” portion fed back into the “Plan” portion, and that’s one of the ways he illustrated the idea of continuous improvement. Six Sigma includes “Control” as a way to ensure that the results of improvements are embedded into regular processes and lead to continuous improvement.
Similarly, although I think this is rare, one of the primary consumers of the “Evaluate” information in the ADDIE model should be the Instructional Designer(s), who can use it to update the course and to keep improving their ability to design for stakeholder needs.
Just for fun, this model isn’t at all limited to business processes and training! In 1965, William Glasser wrote a book called Reality Therapy, laying out the therapeutic approach he later developed into Choice Theory (described in greater detail through the Glasser Institute). The short version of Glasser’s approach is that if you’re not getting what you want out of your life (relationship with your mom, success in school, whatever), then design some new behavior that will result in a different outcome. The point here, of course, is that this reductionist method of problem solving is pervasive.
~Geek~
Statistics in action
It’s coming, Campers…the 2010 US Census. And the brouhaha is broiling around the place of statistical sampling in the official numbers. Apparently Dems favor sampling the population and drawing conclusions based on samples, and Repubs are opposed to sampling.
As we all know, congressional representation boundaries, education funding and a zillion other critical decisions are made based on the census. So what do YOU think? Sampling or not?
~Geek~
Decimate the competition
Where measurement and etymology meet. While getting some refresher info on mutually exclusive random sampling, I came across a definition of “decimate”.
Decimation (Latin: decimatio; decem = “ten”) was a form of military discipline used by officers in the Roman Army to punish mutinous or cowardly soldiers. The word decimation is derived from Latin meaning “removal of a tenth.”
OMG, really? I see the connection now…“dec”, like “decimal” in a base-ten system.
So apparently (according to Wikipedia), a form of punishment for Roman Soldiers was to decimate them…to literally choose one soldier in 10 by lottery and have the other nine soldiers kill that one by stoning or clubbing (and not the kind of clubbing practiced in trendy downtown neighborhoods by festive hipsters, either).
The most interesting thing here (to me) is that some professor, whose statistics class notes are randomly available online, felt compelled to explain this little tidbit of trivia as an example of appropriate random population sampling. Dark, yes?
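For the (less dark) statistics angle, here is a minimal sketch of the lottery described above. It splits a list into consecutive groups of ten and draws one member of each group uniformly at random, so every group contributes exactly one member to the sample. The function name and the seeded generator are my own hypothetical choices, and I make no claim that this is how the professor's notes framed it.

```python
import random


def one_in_ten_lottery(population, seed=None):
    """Split the population into consecutive groups of ten and draw one
    member of each group by lottery (uniformly at random).

    A final group with fewer than ten members still contributes one draw.
    """
    rng = random.Random(seed)  # seeded so a draw can be reproduced
    return [rng.choice(population[start:start + 10])
            for start in range(0, len(population), 10)]


cohort = [f"soldier-{n}" for n in range(1, 51)]
print(one_in_ten_lottery(cohort, seed=7))  # five names, one from each group of ten
```

Because each draw happens inside its own group, no group can be skipped or double-counted. That's what separates this from simply pulling five names out of a hat for the whole cohort.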
Bring this up in conversation today, I dare you.
~Geek~
Measuring the Assumptions
If we assume (and we do) that training has a positive impact on an organization’s goals, then the goal of a training measurement plan is to measure our assumptions about that impact wherever possible.
We know that:
- Training enables knowledge and skill
- Knowledge and skill enable job performance
- Job performance enables business process success
- Business process success enables organizational / financial success
In each of these items, the purpose of the training measurement program is, as much as possible, to reduce the distance between “enables” and “causes”. Because causal relationships are rare outside of experimental conditions, our primary goal is to balance the cost of measurement with the minimum information needed to make informed decisions and take action.
Existing training measurement maps into these assumptions as well:
- Training enables knowledge and skill – Kirkpatrick Level 2 (knowledge transfer)
- Knowledge and skill enable job performance – Kirkpatrick Level 3 (behavior on the job)
- Job performance enables business process success – Kirkpatrick Level 4 (business results)
- Business process success enables organizational / financial success – Kirkpatrick Level 4 & Phillips ROI.
Number 4 maps to both Kirkpatrick and Phillips, although in application I believe that both those approaches are lacking at this level. Kirkpatrick’s original work definitely called out business metrics and financial success at Level 4, but he didn’t break out business process-level measures from organizational measures (the difference between increasing sales and increasing profit). Phillips’ ROI model produces a percent ROI calculation that I think is misleading, although the path to that calculation is incredibly useful for understanding the interplay between training and other organizational activities.
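For reference, the percent figure in the Phillips model is usually published as ROI (%) = net program benefits / program costs × 100, where net benefits are the program's monetary benefits minus its costs. Below is a minimal sketch of just that final arithmetic, using made-up numbers. The hard part (isolating training's effect and converting it to dollars) isn't modeled at all.

```python
def phillips_roi_percent(program_benefits, program_costs):
    """ROI (%) = (net program benefits / program costs) x 100."""
    if program_costs <= 0:
        raise ValueError("program costs must be positive")
    net_benefits = program_benefits - program_costs
    return net_benefits / program_costs * 100


# Two hypothetical programs at very different scales.
print(phillips_roi_percent(150_000, 100_000))  # 50.0
print(phillips_roi_percent(15_000, 10_000))    # 50.0
```

The sketch does surface one property, whether or not it's the heart of why the number feels misleading: programs of very different sizes can report an identical ROI percentage, so the percentage alone says nothing about scale.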
~Geek~
Speak With Facts
“Speak With Facts” was one of several slogans tossed around while I learned about and helped facilitate the Quality Improvement Program at Florida Power & Light Company back in the dim ages (lighting pun intended). FPL won the Deming Prize in 1989, in no small part because of that slogan and others. (They then promptly put them all in a deep file drawer in a hidden file room so they could get back to the business of generating electric power, but that’s another story.) Either way, that slogan stuck with me.
Of course we all agree that measurement is important. Measurement of everything, not just learning, not just business measures, not just our grocery spend for the week…everything. But we also make decisions every day in the complete absence of measurable data.
You’re freaking out right now from the very thought that a decision could be made without data and I am SO with you on that. But we must persevere.
How, then, do we balance our appreciation (our desire) for data to back up our decisions with the need to listen to that internal voice that tells us whether a decision just feels right? I think there is a zen state of internalizing data and information: of keeping, if you will, a mental spreadsheet with high-level data, or at least high-level themes, that informs your decision making.
Some could (and likely should) note that this is a little crazy and not likely to make me popular at cocktail parties, but to heck with them.
~Geek~
It’s ALL about the assumptions
I have to admit that this seems so basic to me, and perhaps that comes from my time working with children with ADHD, when, as a caregiver, you take NOTHING for granted in your communication. I found these kids to be brilliant at finding loopholes (perhaps this is true of all kids).
For those kids and for other folks I deal with regularly, if we don’t agree on our terms – have the same assumptions – then our communication fails. In business, this is also often called setting expectations. Same dealio.
Assumptions are the map through which explanation paths are laid (I love this line…it came to me just before bed last night).
In order to agree, for example, that Interstate 5 treks north through California, Oregon and Washington, we have to agree…have to assume…what the boundaries are for those states and even have to agree that there ARE arbitrary political state lines across the landscape. Generally we take state boundaries for granted, so map makers don’t have to convince us (at least for US maps).
Frankly, even “north” is a social construct, so while we can verify factually that there is a highway laid across the landscape, it’s only the context that allows us to say that it runs “north / south” through “California”, “Oregon” and “Washington”. We could as easily say (in Washington and some of Oregon) that it runs just “west” of the Cascade mountain range.
Why, you’re wondering right now, does this make a damned bit of difference? Well, I’m glad you asked…it’s because we regularly try to communicate…to demonstrate a point…to someone else with the expectation that they will come to the same conclusion we do, and without agreement on that context (agreement on those expectations), failure is common. Examples:
- “It will take me 1 hour to finish this report.” – “One hour” needs clarification (some folks believe that 15 minutes past the hour is still on time), and so does “finish” (is that the first draft, or the final approved draft?).
- “I will clean the kitchen.” – I could take the rest of the day clarifying different interpretations of “clean” (mopping? doing dishes or just loading the dishwasher?), but even “kitchen” may need clarification (where, exactly, does the kitchen end and the next room begin?).
- And a personal favorite: “I can create Level 2 web-based training for $20K per finished hour.” This dogged my professional world for better than a year. What’s the industry definition of “Level 2”? What’s the client’s definition of Level 2? Does a “finished hour” include time for participants to take assessments? Does this mean that a half hour is $10K?
I get accused of being unnecessarily literal, but I gotta tell you…it takes only a little effort, and given the choice between being literal and being vague, I’ll take the former.
~Geek~