Sales Intelligence

Conversation Intelligence for Sales Enablement: Stop Measuring Deal Signals, Start Measuring Skill Lift

Published May 15, 2026 · 8 min read · Alex Handsaker

Most enablement teams that buy conversation intelligence do so with a specific goal: to demonstrate that coaching programmes work. The logic is sound. If you can score every rep on every call, you should be able to show whether scores changed after training. What teams find after a year, though, is that the scoring is there but the measurement is not, and the ROI case remains as difficult to make as it was before they had the tool.

The problem is almost always the schema. Most conversation intelligence platforms are built around deal signals: whether the buyer confirmed a timeline, whether a competitor was mentioned, whether a next step was agreed. These signals are useful for pipeline management and forecast accuracy, but they are not useful for measuring whether a rep developed a capability. Skill lift requires a different set of questions, measured over a different window, against a different benchmark, and most teams never change the schema because they do not realise the one they inherited was built for a different job.

Figure: Deal signals feed deal state; skill signals feed rep capability. The two answer different questions, so they need different schemas.

What deal signals actually measure

Deal signals answer the question: is this deal progressing? A boolean for "buyer confirmed timeline" tells you whether the pipeline record should be updated. "Competitor mentioned" tells you whether the deal is at risk. "Next step agreed" tells you whether the deal is likely to stay active. These are all useful questions, and the case for extracting them automatically from calls is strong. But they describe the state of a deal, not the capability of the rep.

When enablement programmes are evaluated against deal signals, the measurement produces a correlation at best. A rep with a strong quarter will have better deal signal scores, and a rep with a weak patch will have worse ones. Neither tells you whether the coaching programme changed anything. A rep can have a buyer confirm a timeline on a call where the discovery was shallow, the pain was never quantified, and the next step was agreed because the buyer was polite. The deal signal resolves as positive. The skill you were coaching did not improve.

The reason enablement teams end up measuring deal signals is inheritance: they get access to a CI platform that already has a schema, and that schema was built for the sales leader, not the enablement team. The fields that exist are the fields that get measured, and the insight that comes back looks like enablement data but answers a different question. As most CI tools are not built for enablement analytics, the schema mismatch is structural, not accidental.

What skill lift actually means as a measurement target

Skill lift is a specific, measurable thing: a quantifiable change in a defined capability, observed consistently across a cohort, over a window that spans a coaching intervention. That definition has four parts, and each one has implications for how you set up the measurement.

Figure: The skill lift measurement contract: a quantified score, a defined capability, the same scale across the cohort, and a before-and-after window. Skill lift only becomes measurable when all four parts hold at the same time.

Quantifiable means the skill is expressed as a number or a category, not as a manager's impression. If the capability you are developing is discovery depth, you need a score for discovery depth, not a note about whether the call was good. Defined means the capability has a specific, written description of what it looks like when a rep demonstrates it, precise enough that the same evaluation criteria produce the same score on the same call regardless of who runs the evaluation. Consistent across a cohort means the schema does not change between reps or between weeks: the score for one rep in week one and the score for a different rep in week ten are on the same scale, measuring the same thing. And spanning a coaching intervention means you have scores from before the programme started and scores from after it ran.

Most enablement measurement fails on at least two of these. Scores exist but are not quantified consistently. The capability is described broadly enough that the evaluation drifts between reviewers. The schema changes partway through because someone thought it needed improvement. Or the before-period data does not exist because scoring was not running before the programme launched. Any one of these breaks the measurement. All four need to hold simultaneously for the before-and-after comparison to mean anything.
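The four conditions can be checked mechanically before anyone reads a before-and-after chart. The sketch below is a minimal Python illustration, assuming each scored call is stored as a record with a rep ID, a capability name, a numeric score, the version of the schema that produced it, and whether it falls before or after the intervention. `ScoredCall` and `contract_violations` are hypothetical names chosen for illustration, not part of any CI platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCall:
    # Hypothetical record (an assumption, not a platform format):
    # one capability score for one call.
    rep_id: str
    capability: str       # the defined capability, e.g. "discovery_depth"
    score: float          # a quantified score, not a manager's impression
    schema_version: str   # version of the rubric that produced the score
    window: str           # "before" or "after" the coaching intervention


def contract_violations(calls: list[ScoredCall], defined_capabilities: set[str]) -> list[str]:
    """Return the measurement-contract conditions that fail for a set of scored calls."""
    problems = []
    # Defined: every score must refer to a capability with a written definition on file.
    undefined = {c.capability for c in calls} - defined_capabilities
    if undefined:
        problems.append(f"undefined capabilities: {sorted(undefined)}")
    # Quantifiable: every score must be a number.
    if any(not isinstance(c.score, (int, float)) for c in calls):
        problems.append("non-numeric scores present")
    # Consistent: one schema version across the whole cohort and both windows.
    versions = {c.schema_version for c in calls}
    if len(versions) > 1:
        problems.append(f"multiple schema versions: {sorted(versions)}")
    # Spanning the intervention: scores must exist on both sides of it.
    windows = {c.window for c in calls}
    for required in ("before", "after"):
        if required not in windows:
            problems.append(f"no {required}-period scores")
    return problems
```

An empty list means all four conditions hold; any entry names the condition that would make the before-and-after comparison misleading. Code can only partly check "defined": it can confirm that a definition exists, not that it is precise enough to produce the same score on the same call.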

The difference between a deal signal and a skill signal

The distinction is clearest in concrete examples. "Buyer confirmed a specific timeline" is a deal signal: it tells you something about pipeline health. "Rep successfully drew out a specific, quantified timeline from a buyer who initially gave a vague answer" is a skill signal: it tells you something about whether the rep applied a coaching technique. The first is extractable from the buyer's words. The second requires evaluating the rep's approach and its effect on the conversation.

"Buyer articulated a specific pain" is a deal signal, not a discovery depth skill signal. The skill signal is "rep asked follow-up questions that moved the buyer from a surface-level problem statement to a quantified, specific pain." The buyer's articulation is evidence that the rep's skill worked, but the evaluation should be looking at what the rep produced, not just what the buyer said. A rep can get a buyer to articulate pain on a call where the buyer happened to arrive well prepared. The skill signal asks whether the rep's technique was the cause, not whether the outcome appeared.

This distinction requires a different approach to Brick design, where a Brick is a single evaluation criterion in the schema. A deal signal Brick evaluates a buyer statement, but a skill signal Brick evaluates a rep technique and its observable effect. The criteria for a skill signal Brick need to be specific about both the action (what the rep did) and the evidence that it worked (what the buyer produced as a result). That is harder to write, but it is the only formulation that produces data you can use to evaluate a coaching programme.

Figure: A skill signal Brick pipeline: rep technique, then buyer response, then a typed skill score with an evidence quote. The Brick scores the rep action and the buyer evidence it produced.
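One way to make the action-plus-evidence requirement concrete is to give a skill signal Brick two separate criteria and award the skill only when both are observed. The sketch below is a hypothetical Python data model, not Semarize's Brick format; it assumes an upstream evaluator has already decided, for a given call, whether each criterion was met and which transcript quote supports that judgement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DealSignalBrick:
    # Hypothetical model: evaluates a buyer statement only.
    name: str
    buyer_criterion: str


@dataclass(frozen=True)
class SkillSignalBrick:
    # Hypothetical model: evaluates a rep technique AND the buyer evidence that it landed.
    name: str
    action_criterion: str    # what the rep must do
    evidence_criterion: str  # what the buyer must produce as a result


@dataclass(frozen=True)
class CriterionResult:
    # Assumed output of an upstream evaluator for one criterion on one call.
    met: bool
    quote: Optional[str] = None  # transcript excerpt supporting the judgement


def score_skill(brick: SkillSignalBrick, action: CriterionResult, evidence: CriterionResult) -> dict:
    """Credit the skill only when the rep action and the buyer evidence are both observed."""
    return {
        "brick": brick.name,
        "demonstrated": action.met and evidence.met,
        # Keep the supporting quotes so a reviewer can audit the score.
        "evidence": [q for q in (action.quote, evidence.quote) if q],
    }


confirmed_timeline = DealSignalBrick(
    name="timeline_confirmed",
    buyer_criterion="Buyer states a specific timeline",
)
drew_out_timeline = SkillSignalBrick(
    name="timeline_specificity",
    action_criterion="Rep follows up on a vague timeline with a clarifying question",
    evidence_criterion="Buyer then states a specific date or quarter",
)

# The buyer volunteered a specific timeline, but the rep did nothing to draw it out.
result = score_skill(
    drew_out_timeline,
    action=CriterionResult(met=False),
    evidence=CriterionResult(met=True, quote="We need this live by March."),
)
```

Keeping the two criteria separate is the design choice that matters: the call above would pass a deal signal Brick like `confirmed_timeline`, but it earns no skill credit because the rep's technique was not the cause.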

Designing the schema for a specific capability

An enablement schema built for skill lift starts from the capability, not from what is easy to extract. The question is: what does demonstrating this skill look like in a transcript, and how would you know the rep produced it rather than the buyer volunteering it unprompted?

For a discovery depth programme, the schema might cover three dimensions. First, whether the rep moved a buyer from a stated surface problem to a quantified, specific pain, with the criterion requiring both a follow-up sequence from the rep and a measurably more specific buyer response by the end of it. Second, whether the rep connected the buyer's stated pain to a business impact, with the criterion requiring the buyer to name the impact rather than the rep naming it on their behalf. Third, whether the rep secured buyer confirmation of the priority level of the pain, with the criterion requiring an explicit buyer statement about urgency rather than an implicit one inferred from tone. Each of these is a rep skill with observable evidence in the transcript, and each maps directly to the thing the programme is trying to develop.
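Written down as configuration, the three discovery depth dimensions might look like the sketch below. The field names, the version string, and the boolean output type are illustrative assumptions rather than any platform's schema format; the small check at the end enforces the rule that every Brick states both a rep action and the buyer evidence that it worked.

```python
# Hypothetical schema for a discovery depth programme. Field names, the version
# string and the boolean output type are illustrative assumptions.
DISCOVERY_DEPTH_SCHEMA = {
    "name": "discovery_depth",
    "version": "1.0.0",
    "bricks": [
        {
            "id": "surface_to_quantified_pain",
            "action": "Rep asks a sequence of follow-up questions after a surface-level problem statement",
            "evidence": "Buyer's final description of the pain is more specific and quantified",
            "output": "boolean",
        },
        {
            "id": "pain_to_business_impact",
            "action": "Rep asks how the stated pain affects the business",
            "evidence": "Buyer names the business impact themselves; the rep naming it does not count",
            "output": "boolean",
        },
        {
            "id": "priority_confirmed",
            "action": "Rep asks the buyer how urgent the pain is",
            "evidence": "Buyer makes an explicit statement about urgency; tone alone does not count",
            "output": "boolean",
        },
    ],
}


def missing_criteria(schema: dict) -> list[str]:
    """List Bricks that lack either an action criterion or an evidence criterion."""
    return [
        brick["id"]
        for brick in schema["bricks"]
        if not brick.get("action", "").strip() or not brick.get("evidence", "").strip()
    ]
```

A Brick that only describes what the buyer said would fail this check, which is the point: without an action criterion it is a deal signal, whatever it is called.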

The same logic applies to other coaching programmes. For objection handling, the schema looks for whether the rep acknowledged the objection, reframed it rather than deflecting it, and recovered buyer momentum. For mutual action plan adherence, it looks for whether the rep introduced the plan, confirmed each step with the buyer, and got explicit commitment rather than polite agreement. In each case the Brick criteria describe both the technique and the evidence that it landed, so the score reflects capability rather than outcome.

The measurement window and what it requires

Scoring for the before-period needs to start before the programme is announced. If reps know scoring is happening, their behaviour changes, so if scoring starts the week before training, the before-period captures a modified baseline. The most reliable approach is to have scoring running continuously, so any window of calls is a valid before-period without any change to rep awareness or behaviour.

The after-period needs to be long enough to distinguish programme effect from natural variation. A rep who has five calls in the first two weeks after training may have improved on three of them for reasons unrelated to the programme. At twenty calls per rep, cohort-level patterns become visible. At fifty calls per rep across the cohort, programme-level effects separate reliably from individual variation. The minimum viable measurement is one cohort, one programme, and a schema that does not change between the before and after windows. Any schema change between the two windows creates a confound that invalidates the comparison.
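The call-count guidance above can be applied as a gate before computing any cohort-level lift. A minimal Python sketch, assuming per-rep lists of numeric scores for each window; the twenty-call minimum comes from the text, while the function name and the choice of the mean of per-rep changes as the cohort statistic are assumptions.

```python
from statistics import mean
from typing import Optional

# Minimum calls per rep per window, taken from the guidance above:
# below this, individual call variation dominates the aggregate.
MIN_CALLS_PER_REP = 20


def cohort_lift(
    before: dict[str, list[float]],
    after: dict[str, list[float]],
    min_calls: int = MIN_CALLS_PER_REP,
) -> tuple[Optional[float], list[str]]:
    """Mean per-rep score change, using only reps with enough calls in BOTH windows.

    Returns (lift, excluded_reps); lift is None when no rep qualifies.
    """
    eligible, excluded = [], []
    for rep in sorted(set(before) | set(after)):
        if len(before.get(rep, [])) >= min_calls and len(after.get(rep, [])) >= min_calls:
            eligible.append(rep)
        else:
            excluded.append(rep)
    if not eligible:
        return None, excluded
    # Average each rep's own change so a high-volume rep cannot dominate the cohort figure.
    deltas = [mean(after[rep]) - mean(before[rep]) for rep in eligible]
    return mean(deltas), excluded
```

Returning the excluded reps alongside the lift makes it visible when the after-period simply has not run long enough yet, rather than silently reporting a number built from a handful of calls.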

This is why a versioned evaluation schema is not optional for enablement measurement. Kits lock the evaluation criteria at a point in time, so the score from a call in week one of the before-period and the score from a call in week eight of the after-period are on the same scale, produced by the same rubric, comparable without qualification. Changing a Brick definition mid-measurement is equivalent to changing the test between a student's mock exam and their final: the scores cannot be compared.
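A simple way to enforce "same rubric, same scale" is to fingerprint the schema itself and refuse to compare windows whose scores came from different fingerprints. The sketch below hashes a canonical JSON serialisation of the schema with Python's standard library; it illustrates the idea under that assumption and is not a description of how Kits are implemented.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """Stable hash of an evaluation schema: any change to any criterion changes it.

    sort_keys makes the hash independent of dict key order; list order
    (for example, the order of Bricks) still counts as part of the schema.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_comparable(before_fingerprints: set[str], after_fingerprints: set[str]) -> None:
    """Raise ValueError unless both windows were scored by one identical schema."""
    all_fingerprints = before_fingerprints | after_fingerprints
    if len(all_fingerprints) != 1:
        raise ValueError(
            f"scores came from {len(all_fingerprints)} schema versions; "
            "the before-and-after comparison is invalid"
        )
```

Storing the fingerprint with every score means a mid-programme edit to a Brick is caught at comparison time instead of surfacing later as an unexplained jump in the chart.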

Closing the feedback loop

The measurement data is most useful when it feeds back into the programme design, not just the ROI case. If discovery depth scores improved on the first dimension (moving from surface to specific pain) but not on the second (connecting pain to business impact), the programme worked on one skill and not the other. That is a precise finding: the next iteration should spend more time on business impact framing and less on follow-up questioning technique.

Figure: Before-and-after skill scores by Brick. Follow-up depth improves, business impact stays flat, and priority confirmation improves slightly after training. Reporting by Brick makes the coaching result visible, rather than leaving it as a programme-level claim.
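Reading the result by Brick rather than as one programme score is a small aggregation step. A minimal Python sketch, assuming boolean Brick outcomes per call so that each Brick's score is the share of calls in which the skill was demonstrated. The 5-percentage-point "moved" threshold is an arbitrary illustrative assumption, not a significance test, and the example data is made up.

```python
def demonstration_rate(outcomes: list[bool]) -> float:
    """Share of calls in which the Brick's skill was demonstrated (assumes a non-empty list)."""
    return sum(outcomes) / len(outcomes)


def brick_changes(
    before: dict[str, list[bool]],
    after: dict[str, list[bool]],
    threshold: float = 0.05,
) -> dict[str, tuple[float, str]]:
    """Per-Brick change in demonstration rate, labelled "moved" or "flat".

    The threshold is an illustrative cut-off, not a significance test.
    """
    report = {}
    for brick in sorted(before):
        # Round so floating-point noise cannot decide a borderline label.
        delta = round(demonstration_rate(after[brick]) - demonstration_rate(before[brick]), 3)
        report[brick] = (delta, "moved" if abs(delta) >= threshold else "flat")
    return report


# Made-up example data for illustration only.
before = {
    "follow_up_depth": [True] * 2 + [False] * 8,
    "business_impact": [True] * 3 + [False] * 7,
}
after = {
    "follow_up_depth": [True] * 6 + [False] * 4,
    "business_impact": [True] * 3 + [False] * 7,
}
report = brick_changes(before, after)
```

With this data, `follow_up_depth` is labelled "moved" and `business_impact` "flat", which is the shape of finding that points the next iteration at business impact framing.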

That kind of iteration is not possible with deal signal data. A deal signal that improves after training tells you that deals got healthier; it does not tell you which specific skill changed, which element of the programme produced it, or where the next investment should go. Skill signal data does all three, and it compounds across programmes: each iteration builds on the last because the measurement is specific enough to show what worked and what did not.

The enablement teams that build durable measurement infrastructure are the ones that design the schema first, before the programme launches, and treat the evaluation criteria as a commitment rather than a first draft. The ROI case follows from the data, and the data follows from the schema. Getting the schema right is the investment that makes everything else in the measurement stack worth building.

Semarize produces versioned, queryable skill signal scores from every call: the same schema, the same scale, the before and after data you need to measure what coaching actually changes.


Common questions

What is the difference between a deal signal and a skill signal in a call evaluation schema?

A deal signal evaluates the state of a deal: buyer confirmed a timeline, competitor was mentioned, next step was agreed. It answers whether the deal is progressing. A skill signal evaluates a rep capability: whether the rep applied a specific technique and whether it produced the expected buyer response. Deal signals are useful for pipeline management, and skill signals are useful for coaching measurement. The two require different schema designs, and using deal signals to evaluate a coaching programme produces correlations with deal health rather than evidence of skill development.

How do you design a Brick that measures rep skill rather than deal outcome?

The criteria need to specify both the rep action and the evidence that it worked. A criterion that evaluates whether the buyer articulated a specific pain is a deal signal: it scores based on what the buyer said. A criterion that evaluates whether the rep used follow-up questioning to move the buyer from a surface statement to a specific, quantified pain is a skill signal: it scores based on what the rep did and whether it produced a measurable change in the buyer's response. The difference is whether the evaluation is looking at the rep's technique or the buyer's state.

How many calls do you need to measure skill lift reliably?

For cohort-level trends, twenty calls per rep per measurement window is a workable minimum, with fifty producing stable results. Below twenty, individual call variation dominates the aggregate and makes programme-level effects hard to separate from noise. For the before-period specifically, the calls should come from a window where rep behaviour was unaffected by knowledge of the programme or the scoring. Continuous scoring rather than programme-triggered scoring produces the most reliable baselines.

What happens to the measurement if the schema changes mid-programme?

The before-and-after comparison breaks. A schema change partway through a measurement window means the before-period scores and the after-period scores were produced by different criteria, and any difference between them could be an artefact of the schema change rather than a programme effect. The evaluation schema should be locked for the duration of a measurement window and version-bumped only at the start of a new period. If you discover the schema needs improvement during a measurement window, document it and apply it at the next natural break rather than mid-programme.

How do you use skill lift data to improve the next programme?

Look at which Bricks moved and which did not. If a discovery depth programme improved scores on follow-up questioning but not on connecting pain to business impact, the next iteration should weight business impact framing more heavily. If objection handling scores improved on acknowledgement but not on reframing, the coaching content on reframing needs more depth. Skill signal data at the individual Brick level gives you the specific finding that deal signal data cannot: which element of the programme worked, which did not, and where the next investment in content or coaching time should go.
