Trust Before Advice

Setting the rules for AI judgement about how someone led

The problem. People mistrust AI when they can't see how it got to its conclusions. How might we surface AI judgement about how someone led that a manager will trust enough to learn from?

What I did. I designed how Wendi's AI surfaces its coaching insights about how someone led, with the manager kept as the authority on their own relationships.

Outcome. Two design partners converted from free trial to paying annually. PostHog showed both checking the per-meeting insight after every recording. That return behaviour was the signal we followed to expand leadership insights across the product.

Leadership insights, at three scales.
One meeting

How did I do in this meeting?

People Dwight Schrute Career check-in
🎙 Take Notes Ask Wendi
Career check-in
with Dwight Schrute ›
⊕ Share Remove from timeline ▾
Meeting Summary
Leadership Insights
Overview

The auth flow reassignment came up as the trigger but didn't get acknowledged directly. A career conversation got booked without that acknowledgement, which can read as deferral.

WHAT WORKED
  • Held space when Dwight mentioned the external coffee chats. Did not pivot to retention talk.
  • Asked what was behind the frustration before offering solutions.
  • Reframed the question from "are you leaving?" to "what work do you actually want?"
TIGHTEN
  • The auth flow reassignment came up but didn't get acknowledged directly in the meeting.
  • A 45-minute Tuesday slot got booked, but no specific change was named between now and then.
  • The "visuals person" framing was named twice without being addressed.
POSSIBLE ISSUES
  • Dwight has been quieter in standups for two weeks before this came up.
  • External coffee chats often precede active interviewing.
  • The auth flow is the third strategic call given to a peer in three months.
NEXT COACHING MOVE
Try this: Name the decision before the next step
Use when: a report points to a specific decision that landed badly
Script
  1. "Looking back, reassigning the auth flow without talking to you first was a call I should have made differently. What would have made that decision feel right?"
  2. "What's the gap between the work you're being given and the work you'd want?"
Why it helps: Names the specific decision before offering a plan. Without it, "let's talk about your career" can read as deflection.
Dimensions Expand all
Clarity & Direction
A career conversation got booked but the substance stayed vague.
WHAT WENT WELL
  • Took Dwight's signal seriously by booking 45 minutes, not the usual 30.
WHAT TO IMPROVE NEXT
  • Tuesday got booked but no specific commitment was named on what would be different.
  • "We'll plan your career" can read as deferral. Dwight may leave Tuesday with the same uncertainty.
Supportive Coaching Move
When to use: a career conversation gets scheduled but no immediate change is named
Do: name something concrete that changes between now and the next conversation
Script

"Between now and Tuesday, here's what I'll do. Here's what changes about how strategic work gets distributed."

Listening & Understanding
Heard what triggered the frustration, but didn't acknowledge it.
Empathy & Respect
Stayed steady when Dwight mentioned external coffee chats.
Coaching & Development
Surfaced the structural question without resolving it.
Multiple reports, over time

How am I doing with my reports?

Leadership Insights
🎙 Take Notes Ask Wendi
Leadership
What Wendi noticed in your recent 1:1s.
Based on 12 meetings across 3 people
Insights
Trends
Meetings
Reflection
Across your recent 1:1s, you create space when people push back on decisions or name something they want to own. The area to watch is what happens after: career conversations get booked but the work distribution often doesn't shift.
Next focus
Name what changes between now and the next conversation, not just when it'll happen.
"I've been having coffee chats outside the company. Not interviewing yet."
You stayed even and asked what was behind the frustration. The auth flow reassignment didn't get acknowledged in the meeting.
"Can we talk about what promotion readiness looks like?"
You committed to following up but no specific change to the promotion process was named.
Creating space for pushback Seen in 4 meetings
When people raise concerns about decisions you've made, you ask what's behind the concern before defending the call.
From your meetings
"I actually think we should re-evaluate the API design before committing to the new schema." — Dwight Schrute, 6 Mar
"I need to be honest, I don't think the timeline is realistic." — Pam Beesly, 12 Apr
Letting rationale come from the person Seen in 3 meetings
When someone explains their reasoning, you engage with the rationale rather than overriding it.
Closing on ownership before the meeting ends Seen in 3 meetings
Career conversations and ownership questions get booked for later, but the structural change between now and the next conversation isn't always named.
Acknowledging decisions that land badly Seen in 2 meetings
Specific decisions people raise as having affected them tend to surface but not get named back to them in the meeting.
All managers

Which of my managers need a check-in?

Admin Leadership
🎙 Take Notes Ask Wendi
Admin

3 people 12 meetings

How your managers show up in their 1:1s.

Settings
Dashboard

What Wendi flagged

Across 29 meetings this month, follow-through is the area that needs the most attention. Open action items resurfaced in 5 meetings without resolution.

URGENT

Flight risk signal detected in Dwight's team

A report expressed dissatisfaction with growth opportunities and mentioned exploring other options.

Flight risk · 24 Mar · Dwight Schrute

Workload concerns raised across Jim's team

Two reports independently flagged unsustainable hours and unrealistic sprint commitments.

Workload · 20 Mar, 17 Mar · Jim Halpert

WATCH

Interpersonal tension flagged in Pam's team

Conflict · 19 Mar · Pam Beesly

NOTED

Peer performance concern raised in Jim's team

Performance · 14 Mar · Jim Halpert

Your managers

Dwight Schrute On track
9 meetings · 4 reports

Meetings are consistent and well-structured. His team is particularly open. They bring issues proactively rather than waiting to be asked.

View leadership breakdown
Jim Halpert Gap detected
12 meetings · 3 reports

Action items are resurfacing unresolved. Two reports mentioned waiting on follow-ups. Shorter meetings sometimes end without clear next steps.

Hide leadership breakdown
Openness
Clarity
Follow-through
Growth
Ryan Howard hasn't had a 1:1 in 12 days.
Pam Beesly On track
8 meetings · 3 reports

Clearest meeting structure of the three. Having the most growth-focused conversations, though reports tend to be more guarded. They wait to be asked.

View leadership breakdown
Context

The unscalable part of management.

Coaching is the bit every manager needs and few get.

Small HR teams can’t be in every room, coaching managers through every conversation. Wendi was built to extend what HR already does, not replace it, and to give managers support at a scale one person can’t deliver alone.

The first version of Wendi (a ChatGPT-style tool) didn’t get much traction, and its positioning was unclear: was Wendi a productivity tool that also coached, or a coaching tool powered by productivity data? PostHog showed both design partners opening the leadership page regularly even before we’d designed it properly, so that’s where we doubled down.

The first prototype our engineering team created was built around eight leadership dimensions adapted from Google’s Project Oxygen, scored per meeting. How do you show AI judgement about someone’s leadership in a way they’d trust?

The constraints.
Backend pipeline already built. Eight dimensions scored per meeting. The scoring wasn't up for debate.
Two paying users. Enough to watch behaviour in PostHog. Not enough to run qualitative tests on the leadership surface specifically.
Manager user, HR lead buyer. Two audiences to serve from one surface.

The Problem

Building trust in AI judgement.

How do you surface AI judgement about someone's leadership without triggering the appraisal response?

People mistrust AI because they can’t see how it arrived at what it generated. In coaching, where the AI is making claims about how a person handled a difficult conversation, the design has to earn trust first. The first version of leadership insights looked more like a report card with scores per dimension and prescriptive advice.

The report-card version we redesigned away from.
People Dwight Leadership Insights
🎙 Take Notes Ask Wendi

Overview

You scored 5.5/10 on average across eight leadership dimensions in this meeting. Two dimensions are below threshold and may need attention before the next 1:1.

WHAT WENT WELL

  • Asked two open questions before offering an opinion.
  • Paused to hear concern about shared inbox.

WHAT TO IMPROVE

  • Did not commit to a follow-up on the tech lead growth path.
  • Action items without clear owners.
  • No reference to company strategy.

POSSIBLE ISSUES

  • Dwell time on tactical topics increasing by 18% over 3 meetings.

Risk Signals

urgent flight-risk

Dwight raised a promotion question that went unresolved.

2nd mention of career growth in 3 weeks without committed follow-up.

watching workload

Overload mentioned for the 2nd time.

Next Coaching Move

Try this: Close a career question with a timeboxed commitment

Script
1.

“Let's block time next Tuesday to sketch what the tech lead path looks like. Bring three things you'd want to own.”

Dimensions

Inclusive Environment 7/10
Dwight raised a concern about the shared inbox and you paused to hear it.
Coaching Moments
Before

“Sounds like we're aligned then.”

After

“Is there anything that isn't sitting right?”

Communication 6/10
Weekly check-in had three topic shifts before landing on the growth question.
Coaching 8/10
You asked two open questions before offering an opinion.
Career Development 4/10
Dwight mentioned wanting a tech lead role. You said you'd think about it.
Coaching Moments
Before

“I'll think about it.”

After

“Let's block time next Tuesday to sketch that out.”

Results Oriented 5/10
Three action items emerged but only one had an owner.
Empowerment 6/10
You gave clear direction on the inbox issue.
Vision & Strategy 3/10
No reference to company or team direction in the 24-minute meeting.
Decision Making 5/10
Two decisions were implied but not stated.

A score invites the question: 7.2 out of what? What if we displayed the data as an observation instead?

SCORE

"Inclusive Environment: 7.2/10"

OBSERVATION

You go deeper on growth conversations with Dwight than with the rest of your team.

Which one would you come back to?


The Meeting Summary

Every claim, traceable.

The first trust mechanism: tie every AI claim to a source.

Every claim the AI makes in a meeting summary is tied to a transcript line with a timestamp. A claim you can see the source of is one you might decide to trust.

The meeting summary is the manager’s first read after a recording. From interviews with HR leaders, I knew managers were short on time and had a lot competing for their attention. The question was: how could the summary make a manager’s life easier?

Design principle

A manager with two minutes should walk away with their next move.

A chronological transcript dump won’t get read. I reorganised the summary around outcomes: an overview of topics, decisions and agreements, and action items with owner chips.

Old summary

The meeting began at 10:03. You joined and asked how Dwight's week had been going. Dwight said it had been fine but that he was feeling slightly overloaded because of the onboarding work. You asked if there was anything specific blocking him and Dwight said that the shared inbox was still unclear to him. They discussed the shared inbox for several minutes. Dwight explained that the handover process from the previous team member hadn't fully transferred knowledge about which messages needed same-day responses. You suggested he look at the documentation and said you would check in with the team lead about it. Dwight said that would be helpful and that he would follow up by end of week.

You then moved on to the Q3 targets. Dwight said he felt the targets were achievable but that the timeline was tight given the current headcount. You said the timeline was fixed and suggested he prioritise accordingly. Dwight asked about whether there was any flexibility on scope and you said you would look into it. Dwight said that would be helpful. You both agreed to revisit scope in the next planning session.

You then asked about Dwight's growth plan. Dwight mentioned he had been thinking about moving into a tech lead role within the next year and that he wasn't sure if that was something you supported. You said you would think about it. Dwight said he appreciated that. You asked if there was anything else.

Dwight mentioned that the team retro from last week had surfaced some concerns about the sprint planning process. You asked what specifically. He said some team members felt there wasn't enough time to discuss blockers before being assigned tickets. You said you would raise it in the next planning meeting. Dwight said that would help. You agreed to revisit this in the next 1:1.

You asked about the product launch preparation. Dwight said the engineering work was on track but he was worried about the QA timeline. You asked him to estimate how much runway they needed and he said ideally two more weeks. You said you would talk to the PM. Dwight said thanks.

The meeting ended at 10:27.

Redesigned summary
Meeting summary with one bullet linked to its transcript source line
Quick, actionable items replaced the chronological summary. Sources are included to build trust and support Wendi's claims.

Every claim has a source the manager can check. If the AI got something wrong, they can see exactly where.
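One way to picture the traceability rule is as a data shape in which a claim carries the transcript lines it rests on, and anything without a source is held back. The sketch below is a minimal illustration in Python, not Wendi's implementation: the class names, the fields, the second claim, and the timestamp are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TranscriptLine:
    """One line of a meeting transcript (hypothetical shape)."""
    timestamp: str  # clock time within the recording, e.g. "10:19"
    speaker: str
    text: str


@dataclass(frozen=True)
class Claim:
    """A statement in the AI summary plus the transcript lines behind it."""
    text: str
    sources: Tuple[TranscriptLine, ...]


def unsourced(claims: List[Claim]) -> List[Claim]:
    """Return claims with no transcript source, so they can be held back."""
    return [claim for claim in claims if not claim.sources]


# Illustrative data only: the timestamp and the second claim are invented.
growth_line = TranscriptLine("10:19", "Manager", "I'll think about it.")
claims = [
    Claim("You committed to think about the growth question.", (growth_line,)),
    Claim("Dwight seemed disengaged.", ()),  # nothing in the transcript to point at
]

for claim in unsourced(claims):
    print(f"Held back (no source): {claim.text}")
```

Under this shape, a summary bullet the manager disagrees with can always be checked against the line it came from, which is the behaviour the design relies on.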

Per-meeting insight.

How well did I do in this meeting? Is there any pattern I might have missed in the moment?

(Per-meeting Leadership Insights view for the career check-in with Dwight Schrute, as shown under "One meeting" above.)
2.2 The per-meeting insight.

Leadership Insights

Observations > scores.

How am I doing across all my meetings, and what do I know about each person that might shape how I manage them?

(Leadership Insights view based on 12 meetings across 3 people, as shown under "Multiple reports, over time" above.)
  A. Reflection + Next focus. Plain prose names what's working and what to watch. No score. The Next focus is one concrete move, not a script.
  B. Named source moments. Moments are tagged with the person and date. Provenance at the person level, not a flat score.
  C. Patterns labelled by frequency. Each pattern shows how often it was seen, not a verdict. Soft framing that earns confidence as data accumulates.
  D. Evidence drawer. Each pattern expands to transcript quotes with a leaf-source icon. Every claim links back to a real moment.

On scoring. Scoring qualitative data is hard to do well. I knew we would still score in some way on the backend, but the user didn’t need to see it. One wrong score could break trust in the whole tool. Besides, the point of the tool was not to gamify management but to help managers choose a next step with their reports.

That settled how to present the dimensions. The other question was how many to show at all.

On the eight dimensions. The dimensions were adapted from Google’s Project Oxygen. Eight dimensions on one page ask the user to weigh eight separate signals before they can act. I cut them from eight to four, but I didn’t get to validate this with users.

Before: 8 Oxygen dimensions. Scored 1–10, per dimension, per meeting.
After: 4 categories. Observational, unscored, scannable.

Inclusive Environment, Communication → Openness: Are people comfortable disagreeing?
Results Oriented, Decision Making → Clarity: Did the meeting leave with a clear decision?
Results Oriented, Empowerment → Follow-through: Do commitments get closed?
Career Development, Coaching, Vision & Strategy → Growth focus: Is long-term development getting space?
3.1 How eight backend dimensions became four user-facing categories.
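The grouping above can be written down as a small lookup table. The following is a sketch in Python that only restates the mapping from the figure; the names `CATEGORY_SOURCES`, `categories_for`, and `backend_dimensions` are hypothetical, and how the backend actually combines dimension data into a category is not described in this case study.

```python
from typing import Dict, List

# Four user-facing categories and the backend dimensions each one draws on,
# as listed in the figure. "Results Oriented" feeds two categories.
CATEGORY_SOURCES: Dict[str, List[str]] = {
    "Openness": ["Inclusive Environment", "Communication"],
    "Clarity": ["Results Oriented", "Decision Making"],
    "Follow-through": ["Results Oriented", "Empowerment"],
    "Growth focus": ["Career Development", "Coaching", "Vision & Strategy"],
}


def categories_for(dimension: str) -> List[str]:
    """Return every user-facing category that draws on a backend dimension."""
    return [
        category
        for category, dimensions in CATEGORY_SOURCES.items()
        if dimension in dimensions
    ]


def backend_dimensions() -> List[str]:
    """Return the distinct backend dimensions, in first-seen order."""
    seen: List[str] = []
    for dimensions in CATEGORY_SOURCES.values():
        for dimension in dimensions:
            if dimension not in seen:
                seen.append(dimension)
    return seen


print(categories_for("Results Oriented"))  # ['Clarity', 'Follow-through']
print(len(backend_dimensions()))           # 8
```

Writing it out this way makes the one overlap visible: all eight dimensions are still covered, but the mapping is not a clean partition.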

On voice. The first AI responses were lengthy and lacked action items a manager could pick up from skimming. They were also too direct, with criticism shaped as instruction. The reaction: ‘I don’t want it to tell me what to do.’

Strengths address the manager directly: “You asked strong follow-up questions.” For sensitive observations, the AI describes situations, not people. This matters for users who aren’t confrontational, culturally or personally.

INSTRUCTION

"You lost focus."

SITUATION

"The conversation shifted away from the agenda."

Sensitive observations describe what happened, not what the manager did.

Strengths: address the manager directly.

Rejected draft

“Demonstrated strong rapport-building skills.”

Shipped

“You asked strong follow-up questions.”

Sensitive observations: shift to what happened.

Rejected draft

“You lost focus in the middle of the meeting.”

Shipped

“The conversation shifted away from the agenda.”

Coaching moves: name the move, don't script the words.

Rejected draft

“Say this: 'Who owns this?'”

Shipped

“Try this: pick the must-win, name the trade. Use when priorities compete.”

3.2 The three voice rules.

On confidence. Insights depend on how much data we’re able to pull from meeting transcripts, notes and chats. The page can’t speak with confidence after one meeting, but early users still need something useful from it. So I designed it to hedge its language until enough meetings accumulated: Empty before any data, Early during accumulation, and Established at six meetings. Six was a judgment call I didn’t get to validate.

Empty 0 meetings
Silent. Doesn't pretend.
Leadership
Leadership

What Wendi noticed in your recent 1:1s.

Based on 0 meetings.

No 1:1s on record yet.

Wendi will start surfacing patterns once you've recorded a few meetings.

Start a recording
Early 1–5 meetings
Hedged. Names the limit.
Leadership
Leadership

Based on 2 meetings with Dwight.

Reflection

From two meetings, you've created room for Dwight to push back on design decisions. Too early to call it a pattern.

Next focus

Watch whether the rework concern keeps showing up.

Recent moment

"Should we re-evaluate the onboarding IA before committing?"

Dwight Schrute · 6 Mar · Openness

Patterns and trends appear after about 6 meetings.

Established 6+ meetings
Direct. Still links back.
Leadership
Leadership

Based on 12 meetings across 3 designers.

Reflection

You consistently let design rationale come from your reports. The area to watch is follow-through: design directions get discussed but don't always land an owner.

Next focus

Try ending design reviews with a clear owner and a date.

What's going well
  • Letting design rationale come from the report 4 meetings
  • Naming trade-offs without taking sides 3 meetings
Worth watching
  • Closing on design ownership before the meeting ends 3 meetings
3.3 The three states.
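The three states reduce to a threshold check on the number of recorded meetings. Here is a minimal Python sketch of that rule, assuming the boundaries shown in the mockups (0, 1–5, 6+); the function and enum names are hypothetical, and the hedging copy is lifted from the mockup text rather than from any real template.

```python
from enum import Enum

# Six is the judgment call described above; it was not validated with users.
ESTABLISHED_AT = 6


class PageState(Enum):
    EMPTY = "empty"              # 0 meetings: silent, doesn't pretend
    EARLY = "early"              # 1-5 meetings: hedged, names the limit
    ESTABLISHED = "established"  # 6+ meetings: direct, still links back


def page_state(meeting_count: int) -> PageState:
    """Pick the page state from the number of recorded meetings."""
    if meeting_count < 0:
        raise ValueError("meeting_count cannot be negative")
    if meeting_count == 0:
        return PageState.EMPTY
    if meeting_count < ESTABLISHED_AT:
        return PageState.EARLY
    return PageState.ESTABLISHED


def frame(observation: str, meeting_count: int) -> str:
    """Wrap an observation in language that matches the page state."""
    state = page_state(meeting_count)
    if state is PageState.EMPTY:
        return "No 1:1s on record yet."
    if state is PageState.EARLY:
        return f"{observation} Too early to call it a pattern."
    return observation


print(frame("You've created room for Dwight to push back.", 2))
```

Keeping the threshold in one named constant means the unvalidated "six" can be changed in a single place once there is data to test it against.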

Imagine the page as margin notes in a notebook. Would you scribble this down to remember? “Wants the tech lead role next.” Yes. “Strong communicator in the meeting.” No.


Information Provenance

Every claim sourced. Every record kept.

The principle that ties the surfaces together.

AI-generated content in Wendi shows where it came from and stays as the AI wrote it. The user can add their own notes alongside, in a separate editable layer. That arrangement keeps the AI’s record auditable.

AI-generated

Meeting summary

You and Dwight covered the shared inbox (unresolved), the onboarding load, and a mention of a tech lead growth path. You committed to think about the growth question. No owner was named for the inbox fix.

Why locked: summaries cite specific moments. We kept this uneditable to maintain truth.

Your notes

What you want to remember

Tech lead path — bring it up first thing next week. He's been in this role for 18 months. Bring two concrete stretches.

Notes are always editable by the user because they were written by the user.
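The two-layer arrangement can be sketched as a record whose AI summary is read-only while the user's notes stay freely editable. This is an illustration in Python under assumed names (`AISummary`, `MeetingRecord`, `add_note`), not a description of how Wendi stores its data.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import List


@dataclass(frozen=True)
class AISummary:
    """AI-generated text. Frozen so it stays as the AI wrote it."""
    text: str


class MeetingRecord:
    """Pairs a locked AI summary with an editable layer of user notes."""

    def __init__(self, summary: AISummary) -> None:
        self._summary = summary
        self.notes: List[str] = []

    @property
    def summary(self) -> AISummary:
        # Read-only property: there is no setter, so the summary cannot be swapped.
        return self._summary

    def add_note(self, note: str) -> None:
        """User notes live alongside the summary and can change at any time."""
        self.notes.append(note)


record = MeetingRecord(AISummary("No owner was named for the inbox fix."))
record.add_note("Tech lead path: bring it up first thing next week.")

try:
    record.summary.text = "An owner was named."  # attempt to edit the AI layer
except FrozenInstanceError:
    print("AI summary is locked")
```

Separating the layers this way keeps the user's corrections and context next to the AI's account without ever overwriting it, which is what makes the record auditable.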


Return behaviour

Trust in the criticism.

PostHog showed both design partners returning to the per-meeting insight after every recording. We didn’t get to interview them about why, but return behaviour like that suggested the surface had earned their trust.


Admin

Designing for the HR Lead

The manager opened Leadership Insights before a 1:1, scrolling the whole surface and switching between two reports mid-session. The HR lead opened it after recordings, clicking through to a person’s name, dwelling on insights for three minutes, then opening Ask Wendi.

The manager (primary user)
When: In-session
How: Opened Leadership Insights before a 1:1 as a briefing. Cross-referenced across employees.
Evidence: PostHog showed scrolling the whole surface and switching between two people's pages mid-session.

The HR lead (buyer)
When: Post-session
How: Opened Leadership Insights after recordings. Clicked through to a person, then into Ask Wendi.
Evidence: PostHog showed clicking into named people, opening chat, and dwelling on insights for 3+ minutes.

HR leads, on the other hand, have different needs. They’re looking for early signals of risk: anything that could escalate to a grievance or a tribunal. Their chat queries were about the whole org, and the admin dashboard came from that gap.

But what about privacy and trust? What would a flag look like? A flag should not prompt panic; it should encourage the HR lead to check in with a manager or team.

Signals on the admin surface are abstracted from specifics. They point the HR lead toward a person to check in with.

(Admin dashboard, as shown under "All managers" above.)

What I'd take forward.

Research HR leads before designing for the manager. PostHog revealed the HR lead’s use case after launch. I designed leadership insights for the manager and retrofitted admin once the persona split surfaced. Earlier research with HR leads would have surfaced their org-wide need before the manager-centric design shipped.

Question the score. When anyone hands me a scored dataset now, I ask three things. Why did you choose to score this? Based on what? Do users know what that “what” is, or care? If the answer to the third question is no, the score doesn’t belong in the interface.

AI becomes useful when the user can see its sources. I was wary at the start about whether AI could do something helpful with meeting data. The mystique goes away when the user can see where the claims came from. Structuring the timeline as the source of truth was the move that made the rest of it work.

Name the behaviour, then design. Each design move in this case study was tied to a specific behaviour: returning to a critical surface, picking up a next move from a two-minute skim, checking in with a manager when a flag appears. Naming the behaviour first made every decision testable against something concrete.
