Trust Before Advice

Setting the rules for AI judgement about how someone led

The problem. People mistrust AI when they can't see how it reached its conclusions. How might we surface AI judgement about how someone led in a way a manager will trust enough to learn from?

What I did. I designed how Wendi's AI surfaces its coaching insights about how someone led, with the manager kept as the authority on their own relationships.

Outcome. Two design partners converted from free trial to paying annually. PostHog showed both checking the per-meeting insight after every recording. That return behaviour was the signal we followed to expand leadership insights across the product.

Leadership insights, at three scales.

One meeting

How did I do in this meeting?


Career check-in

with Dwight Schrute ›


Meeting Summary

Leadership Insights

Overview

The auth flow reassignment came up as the trigger but didn't get acknowledged directly. A career conversation got booked without that acknowledgement, which can read as deferral.

› WHAT WORKED

Held space when Dwight mentioned the external coffee chats. Did not pivot to retention talk.
Asked what was behind the frustration before offering solutions.
Reframed the question from "are you leaving?" to "what work do you actually want?"

› TIGHTEN

The auth flow reassignment came up but didn't get acknowledged directly in the meeting.
A 45-minute Tuesday slot got booked, but no specific change was named between now and then.
The "visuals person" framing was named twice without being addressed.

› POSSIBLE ISSUES

Dwight has been quieter in standups for two weeks before this came up.
External coffee chats often precede active interviewing.
The auth flow is the third strategic call given to a peer in three months.

NEXT COACHING MOVE

Try this: Name the decision before the next step

Use when: a report points to a specific decision that landed badly

Script

"Looking back, reassigning the auth flow without talking to you first was a call I should have made differently. What would have made that decision feel right?"
"What's the gap between the work you're being given and the work you'd want?"

Why it helps: Names the specific decision before offering a plan. Without it, "let's talk about your career" can read as deflection.

Dimensions

Clarity & Direction

A career conversation got booked but the substance stayed vague.


WHAT WENT WELL

Took Dwight's signal seriously by booking 45 minutes, not the usual 30.


WHAT TO IMPROVE NEXT

Tuesday got booked but no specific commitment was named on what would be different.
"We'll plan your career" can read as deferral. Dwight may leave Tuesday with the same uncertainty.

Supportive Coaching Move

When to use: a career conversation gets scheduled but no immediate change is named

Do: name something concrete that changes between now and the next conversation

Script

"Between now and Tuesday, here's what I'll do. Here's what changes about how strategic work gets distributed."

Listening & Understanding

Heard what triggered the frustration, but didn't acknowledge it.

Empathy & Respect

Stayed steady when Dwight mentioned external coffee chats.

Coaching & Development

Surfaced the structural question without resolving it.

Multiple reports, over time

How am I doing with my reports?


Insights

Trends

Meetings

Reflection

Across your recent 1:1s, you create space when people push back on decisions or name something they want to own. The area to watch is what happens after: career conversations get booked but the work distribution often doesn't shift.

Next focus

Name what changes between now and the next conversation, not just when it'll happen.

Recent moments

"I've been having coffee chats outside the company. Not interviewing yet."

You stayed even and asked what was behind the frustration. The auth flow reassignment didn't get acknowledged in the meeting.

"Can we talk about what promotion readiness looks like?"

You committed to following up but no specific change to the promotion process was named.

What's going well

Creating space for pushback Seen in 4 meetings

When people raise concerns about decisions you've made, you ask what's behind the concern before defending the call.

From your meetings

○ "I actually think we should re-evaluate the API design before committing to the new schema." — Dwight Schrute, 6 Mar

○ "I need to be honest, I don't think the timeline is realistic." — Pam Beesly, 12 Apr

Letting rationale come from the person Seen in 3 meetings

When someone explains their reasoning, you engage with the rationale rather than overriding it.

Worth watching

Closing on ownership before the meeting ends Seen in 3 meetings

Career conversations and ownership questions get booked for later, but the structural change between now and the next conversation isn't always named.

Acknowledging decisions that land badly Seen in 2 meetings

Specific decisions people raise as having affected them tend to surface but not get named back to them in the meeting.

All managers

Which of my managers need a check-in?


What Wendi flagged

Across 29 meetings this month, follow-through is the area that needs the most attention. Open action items resurfaced in 5 meetings without resolution.

URGENT

Flight risk signal detected in Dwight's team

A report expressed dissatisfaction with growth opportunities and mentioned exploring other options.

Flight risk · 24 Mar · Dwight Schrute

Workload concerns raised across Jim's team

Two reports independently flagged unsustainable hours and unrealistic sprint commitments.

Workload · 20 Mar, 17 Mar · Jim Halpert

WATCH

Interpersonal tension flagged in Pam's team

Conflict · 19 Mar · Pam Beesly

NOTED

Peer performance concern raised in Jim's team

Performance · 14 Mar · Jim Halpert

Your managers

Dwight Schrute On track

9 meetings · 4 reports

Meetings are consistent and well-structured. His team is particularly open. They bring issues proactively rather than waiting to be asked.


Jim Halpert Gap detected

12 meetings · 3 reports

Action items are resurfacing unresolved. Two reports mentioned waiting on follow-ups. Shorter meetings sometimes end without clear next steps.

Leadership breakdown: Openness · Clarity · Follow-through · Growth

Ryan Howard hasn't had a 1:1 in 12 days.

Pam Beesly On track

8 meetings · 3 reports

Clearest meeting structure of the three. Having the most growth-focused conversations, though reports tend to be more guarded. They wait to be asked.


Context

The unscalable part of management.

Coaching is the bit every manager needs and few get.

Small HR teams can’t be in every room, coaching managers through every conversation they have. Wendi was built to extend what HR already does, not replace it: to give managers support at a scale one person can’t deliver alone.

The first version of Wendi (a ChatGPT-style tool) didn’t get much traction, and it left a positioning question open: was Wendi a productivity tool that also coached, or a coaching tool powered by productivity data? PostHog showed both design partners opening the leadership page regularly, even before we’d designed it properly, so that’s where we doubled down.

The first prototype our engineering team created was built around eight leadership dimensions adapted from Google’s Project Oxygen, scored per meeting. How do you show AI judgement about someone’s leadership in a way they’d trust?

The constraints.

Backend pipeline already built. Eight dimensions scored per meeting. The scoring wasn't up for debate.

Two paying users. Enough to watch behaviour in PostHog. Not enough to run qualitative tests on the leadership surface specifically.

Manager user, HR lead buyer. Two audiences to serve from one surface.

The Problem

Building trust in AI judgement.

How do you surface AI judgement about someone's leadership without triggering the appraisal response?

People mistrust AI because they can’t see how it arrived at what it generated. In coaching, where the AI is making claims about how a person handled a difficult conversation, the design has to earn trust first. The first version of leadership insights looked more like a report card with scores per dimension and prescriptive advice.

The report-card version we redesigned away from.


Overview

You scored 5.5/10 on average across eight leadership dimensions in this meeting. Two dimensions are below threshold and may need attention before the next 1:1.

›WHAT WENT WELL

Asked two open questions before offering an opinion.
Paused to hear concern about shared inbox.

›WHAT TO IMPROVE

Did not commit to a follow-up on the tech lead growth path.
Action items without clear owners.
No reference to company strategy.

›POSSIBLE ISSUES

Dwell time on tactical topics increasing by 18% over 3 meetings.

Risk Signals

urgent flight-risk

Dwight raised a promotion question that went unresolved.

2nd mention of career growth in 3 weeks without committed follow-up.

watching workload

Overload mentioned for the 2nd time.

Next Coaching Move

Try this: Close a career question with a timeboxed commitment

Script

“Let's block time next Tuesday to sketch what the tech lead path looks like. Bring three things you'd want to own.”

Dimensions

Inclusive Environment 7/10

Dwight raised a concern about the shared inbox and you paused to hear it.

Coaching Moments

Before

“Sounds like we're aligned then.”

After

“Is there anything that isn't sitting right?”

Communication 6/10

Weekly check-in had three topic shifts before landing on the growth question.

Coaching 8/10

You asked two open questions before offering an opinion.

Career Development 4/10

Dwight mentioned wanting a tech lead role. You said you'd think about it.

Coaching Moments

Before

“I'll think about it.”

After

“Let's block time next Tuesday to sketch that out.”

Results Oriented 5/10

Three action items emerged but only one had an owner.

Empowerment 6/10

You gave clear direction on the inbox issue.

Vision & Strategy 3/10

No reference to company or team direction in the 24-minute meeting.

Decision Making 5/10

Two decisions were implied but not stated.

A score asks: 7.2 out of what? What if we tried to display the data as an observation instead?

SCORE

"Inclusive Environment: 7.2/10"

OBSERVATION

You go deeper on growth conversations with Dwight than with the rest of your team.

Which one would you come back to?

The Meeting Summary

Every claim, traceable.

The first trust mechanism: tie every AI claim to a source.

Every claim the AI makes in a meeting summary is tied to a transcript line with a timestamp. A claim you can see the source of is one you might decide to trust.

The meeting summary is the manager’s first read after a recording. From interviews with HR leaders, I knew managers were short on time and had a lot competing for their attention. The question was: how could the summary make a manager’s life easier?

Design principle

A manager with two minutes should walk away with their next move.

A chronological transcript-dump won’t get read. I reorganised the summary around outcomes: overview of topics, decisions and agreements, action items with owner chips.

Meeting summary with one bullet linked to its transcript source line. Quick, actionable items replaced the chronological summary, and sources are included to build trust and support Wendi's claims.

Every claim has a source the manager can check. If the AI got something wrong, they can see exactly where.
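To make the traceability rule concrete, here is a minimal sketch, assuming each summary bullet stores the ID of the transcript line it was drawn from and each transcript line carries a timestamp in seconds. The names `TranscriptLine`, `Claim` and `link_claims` are hypothetical, not Wendi's actual code, and dropping unsourced claims is an assumed policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptLine:
    line_id: int
    timestamp_s: int  # seconds from the start of the recording (assumed unit)
    speaker: str
    text: str


@dataclass(frozen=True)
class Claim:
    text: str
    source_line_id: int  # the transcript line this bullet was drawn from


def link_claims(claims, transcript):
    """Pair each claim with an "mm:ss" timestamp and speaker from its source line.

    Claims whose source line can't be found are dropped rather than shown
    unsourced (an assumed policy for this sketch).
    """
    by_id = {line.line_id: line for line in transcript}
    linked = []
    for claim in claims:
        source = by_id.get(claim.source_line_id)
        if source is None:
            continue
        minutes, seconds = divmod(source.timestamp_s, 60)
        linked.append((claim.text, f"{minutes:02d}:{seconds:02d}", source.speaker))
    return linked


if __name__ == "__main__":
    transcript = [TranscriptLine(41, 754, "Dwight Schrute", "I'd like to talk about the tech lead path.")]
    claims = [
        Claim("Dwight raised the tech lead growth path.", 41),
        Claim("A claim with no matching source line.", 99),
    ]
    for text, stamp, speaker in link_claims(claims, transcript):
        print(f"{text} [{stamp}, {speaker}]")
```

The design choice the sketch illustrates is that a claim with no resolvable source never reaches the manager, so every bullet on screen has a timestamp to check against.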

Per-meeting insight.

How well did I do in this meeting? Is there any pattern I might have missed in the moment?


2.2 The per-meeting insight.

Leadership Insights

Observations > scores.

How am I doing across all my meetings, and what do I know about each person that might shape how I manage them?


A. Reflection + Next focus. Plain prose names what's working and what to watch, with no score. The Next focus is one concrete move, not a script.
B. Named source moments. Moments are tagged with the person and date: provenance at the person level, not a flat score.
C. Patterns labelled by frequency. Each pattern shows how often it was seen, not a verdict. The soft framing earns confidence as data accumulates.
D. Evidence drawer. Each pattern expands to transcript quotes with a leaf-source icon, so every claim links back to a real moment.
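The frequency labels in C imply a counting rule. A minimal sketch, assuming the pipeline emits one (pattern, meeting) pair each time it spots a pattern (a hypothetical input shape) and that a label counts distinct meetings rather than raw mentions. `frequency_labels` is an illustrative name.

```python
from collections import defaultdict


def frequency_labels(sightings):
    """Turn (pattern, meeting_id) sightings into "Seen in N meetings" labels.

    A pattern spotted several times in one meeting counts once, so the label
    reflects how many meetings it showed up in, not how often it was mentioned.
    """
    meetings_per_pattern = defaultdict(set)
    for pattern, meeting_id in sightings:
        meetings_per_pattern[pattern].add(meeting_id)
    return {
        pattern: f"Seen in {len(ids)} meeting{'s' if len(ids) != 1 else ''}"
        for pattern, ids in meetings_per_pattern.items()
    }


if __name__ == "__main__":
    sightings = [
        ("Creating space for pushback", "meeting-1"),
        ("Creating space for pushback", "meeting-1"),
        ("Creating space for pushback", "meeting-2"),
    ]
    print(frequency_labels(sightings))
```

Counting meetings rather than mentions keeps one long, repetitive conversation from inflating a pattern into something that looks established.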

On scoring. Qualitative data is hard to score well. I knew we would still score in some way on the backend, but the user didn’t need to see it: one wrong score could break trust in the whole tool. Besides, the point of the tool was not to gamify management but to help managers choose a next step with their reports.

That settled how to present the dimensions. The other question was how many to show at all.

On the eight dimensions. The dimensions were adapted from Google’s Project Oxygen. Showing eight dimensions on one page asks the user to weigh eight separate signals before they can act. I cut them from eight to four, though I didn’t get to validate the change with users.

Before: 8 Oxygen dimensions, scored 1–10 per dimension, per meeting.

After: 4 categories, observational, unscored and scannable.

Inclusive Environment + Communication → Openness: Are people comfortable disagreeing?
Results Oriented + Decision Making → Clarity: Did the meeting leave with a clear decision?
Results Oriented + Empowerment → Follow-through: Do commitments get closed?
Career Development + Coaching + Vision & Strategy → Growth focus: Is long-term development getting space?

3.1 How eight backend dimensions became four user-facing categories.
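A minimal sketch of that roll-up, assuming each backend observation carries its dimension, a score and an observational sentence (`BackendObservation`, `CATEGORY_SOURCES` and `group_for_display` are hypothetical names). Scores stay on the object but never reach the grouped output, matching the decision to keep scoring on the backend.

```python
from dataclasses import dataclass

# Backend dimension -> user-facing category, following the mapping in 3.1.
# "Results Oriented" feeds two categories (Clarity and Follow-through).
CATEGORY_SOURCES = {
    "Openness": {"Inclusive Environment", "Communication"},
    "Clarity": {"Results Oriented", "Decision Making"},
    "Follow-through": {"Results Oriented", "Empowerment"},
    "Growth focus": {"Career Development", "Coaching", "Vision & Strategy"},
}


@dataclass(frozen=True)
class BackendObservation:
    dimension: str  # one of the eight Oxygen-derived dimensions
    score: float    # kept on the backend; never rendered
    text: str       # the observational sentence a manager reads


def group_for_display(observations):
    """Group backend observations into the four categories, dropping scores."""
    grouped = {category: [] for category in CATEGORY_SOURCES}
    for obs in observations:
        for category, sources in CATEGORY_SOURCES.items():
            if obs.dimension in sources:
                grouped[category].append(obs.text)
    return grouped
```

Because "Results Oriented" appears in two source sets, one backend observation can surface under both Clarity and Follow-through; whether that double listing is desirable is a product decision the sketch doesn't settle.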

On voice. The first AI responses were lengthy and lacked action items a manager could pick up from skimming. They were also too direct, with criticism shaped as instruction. The reaction: ‘I don’t want it to tell me what to do.’

Strengths address the manager directly: “You asked strong follow-up questions.” Sensitive observations describe the situation, not the person. This matters for users who avoid confrontation, whether culturally or personally.

INSTRUCTION

"You lost focus."

SITUATION

"The conversation shifted away from the agenda."

Sensitive observations describe what happened, not what the manager did.

Strengths: address the manager directly.

Rejected draft

“Demonstrated strong rapport-building skills.”

Shipped

“You asked strong follow-up questions.”

Sensitive observations: shift to what happened.

Rejected draft

“You lost focus in the middle of the meeting.”

Shipped

“The conversation shifted away from the agenda.”

Coaching moves: name the move, don't script the words.

Rejected draft

“Say this: 'Who owns this?'”

Shipped

“Try this: pick the must-win, name the trade. Use when priorities compete.”

3.2 The three voice rules.

On confidence. Insights depend on how much data we can pull from meeting transcripts, notes and chats. The page can’t speak with confidence after one meeting, but early users still need something useful from it. So I designed it to hedge its language until enough meetings accumulated: Empty before any data, Early from one to five meetings, and Established from six. Six was a judgment call I didn’t get to validate.

Empty 0 meetings

Silent. Doesn't pretend.

Leadership

What Wendi noticed in your recent 1:1s.

Based on 0 meetings.

No 1:1s on record yet.

Wendi will start surfacing patterns once you've recorded a few meetings.

Start a recording

Early 1–5 meetings

Hedged. Names the limit.

Leadership

Based on 2 meetings with Dwight.

Reflection

From two meetings, you've created room for Dwight to push back on design decisions. Too early to call it a pattern.

Next focus

Watch whether the rework concern keeps showing up.

Recent moment

"Should we re-evaluate the onboarding IA before committing?"

Dwight Schrute · 6 Mar · Openness

Patterns and trends appear after about 6 meetings.

Established 6+ meetings

Direct. Still links back.

Leadership

Based on 12 meetings across 3 designers.

Reflection

You consistently let design rationale come from your reports. The area to watch is follow-through: design directions get discussed but don't always land an owner.

Next focus

Try ending design reviews with a clear owner and a date.

What's going well

Letting design rationale come from the report 4 meetings
Naming trade-offs without taking sides 3 meetings

Worth watching

Closing on design ownership before the meeting ends 3 meetings

3.3 The three states.
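The three states reduce to a threshold on meeting count. A minimal sketch, assuming the page only needs the count to pick its state; `ESTABLISHED_AT`, `InsightState`, `insight_state` and `early_footer` are hypothetical names, and the threshold of six is the unvalidated judgment call described above.

```python
from enum import Enum

# Judgment-call threshold from the case study; not validated with users.
ESTABLISHED_AT = 6


class InsightState(Enum):
    EMPTY = "Silent. Doesn't pretend."
    EARLY = "Hedged. Names the limit."
    ESTABLISHED = "Direct. Still links back."


def insight_state(meeting_count: int) -> InsightState:
    """Pick how confidently the Leadership page may speak."""
    if meeting_count < 0:
        raise ValueError("meeting_count can't be negative")
    if meeting_count == 0:
        return InsightState.EMPTY
    if meeting_count < ESTABLISHED_AT:
        return InsightState.EARLY
    return InsightState.ESTABLISHED


def early_footer(meeting_count: int):
    """The limit-naming footer, shown only while insights are still hedged."""
    if insight_state(meeting_count) is InsightState.EARLY:
        return f"Patterns and trends appear after about {ESTABLISHED_AT} meetings."
    return None


if __name__ == "__main__":
    for n in (0, 2, 12):
        print(n, insight_state(n).name, early_footer(n))
```

Keeping the threshold in one named constant makes it easy to revisit once there is data to validate it.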

Imagine the page as margin notes in a notebook. Would you scribble this down to remember? “Wants the tech lead role next.” Yes. “Strong communicator in the meeting.” No.

Information Provenance

Every claim sourced. Every record kept.

The principle that ties the surfaces together.

AI-generated content in Wendi shows where it came from and stays as the AI wrote it. The user can add their own notes alongside, in a separate editable layer. That arrangement keeps the AI’s record auditable.

AI-generated

Meeting summary

You and Dwight covered the shared inbox (unresolved), the onboarding load, and a mention of a tech lead growth path. You committed to think about the growth question. No owner was named for the inbox fix.

Why locked: summaries cite specific moments, so we kept them uneditable to keep the record true to what was said.

Your notes

What you want to remember

Tech lead path — bring it up first thing next week. He's been in this role for 18 months. Bring two concrete stretches.

Notes are always editable, because the user wrote them.
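A minimal sketch of the two-layer arrangement, assuming a summary object that holds its text plus references to transcript lines (`SourceRef`, `AISummary` and `MeetingRecord` are hypothetical names). The AI layer is immutable; only the notes layer accepts writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    transcript_line_id: int
    timestamp: str  # e.g. "12:34"; the format is an assumption


@dataclass(frozen=True)
class AISummary:
    text: str
    sources: tuple  # a tuple of SourceRef, so the citations can't be edited either


class MeetingRecord:
    """Two layers: a locked AI summary and the manager's own editable notes."""

    def __init__(self, summary: AISummary):
        self._summary = summary
        self.notes = []  # user-owned layer: free to add to

    @property
    def summary(self) -> AISummary:
        # No setter, so `record.summary = ...` raises AttributeError.
        return self._summary

    def add_note(self, note: str) -> None:
        self.notes.append(note)
```

Freezing the summary dataclass means even `record.summary.text = ...` raises `dataclasses.FrozenInstanceError`, a subclass of `AttributeError`, so the auditable record can only be read, while the notes list grows freely.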

Return behaviour

Trust in the criticism.

PostHog showed both design partners returning to the per-meeting insight after every recording. We didn’t get to interview them about why, but we read that return behaviour as a sign the surface had earned their trust.

Admin

Designing for the HR Lead

The manager opened Leadership Insights before a 1:1, scrolling the whole surface and switching between two reports mid-session. The HR lead opened it after recordings, clicking through to a person’s name, dwelling on insights for three minutes or more, then opening Ask Wendi.

The manager (primary user)

When: In-session
How: Opened Leadership Insights before a 1:1 as a briefing. Cross-referenced across employees.
Evidence: PostHog: scrolled the whole surface, switched between two people's pages mid-session.

The HR lead (buyer)

When: Post-session
How: Opened Leadership Insights after recordings. Clicked through to a person, then into Ask Wendi.
Evidence: PostHog: clicked into named people, opened chat, dwelled on insights for 3+ minutes.

HR leads, on the other hand, have different needs. They’re looking for early signals of risk: anything that could escalate to a grievance or a tribunal. Their chat queries were about the whole org, and the admin dashboard grew out of that gap.

But what about privacy and trust? What would a flag look like? A flag shouldn’t trigger panic; it should encourage the HR lead to check in with a manager or team.

Signals on the admin surface are abstracted from specifics. They point the HR lead toward a person to check in with.
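Abstraction here means the admin view never receives who said what. A minimal sketch, assuming a hypothetical `RawSignal` shape coming out of the pipeline: the HR-facing `AdminFlag` keeps only the category, tier, date and manager, producing labels like "Flight risk · 24 Mar · Dwight Schrute", and orders flags URGENT, then WATCH, then NOTED as in the admin mockup. All names here are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


class Severity(IntEnum):
    URGENT = 1
    WATCH = 2
    NOTED = 3


@dataclass(frozen=True)
class RawSignal:
    category: str       # e.g. "Flight risk"
    severity: Severity
    meeting_date: date
    manager: str        # the person the HR lead should check in with
    report: str         # identifies the report: never sent to the admin surface
    quote: str          # verbatim words: never sent to the admin surface


@dataclass(frozen=True)
class AdminFlag:
    category: str
    severity: Severity
    meeting_date: date
    manager: str

    def label(self) -> str:
        d = self.meeting_date
        return f"{self.category} · {d.day} {MONTHS[d.month - 1]} · {self.manager}"


def admin_feed(signals):
    """Strip report-level detail; order by tier, newest first within a tier."""
    flags = [AdminFlag(s.category, s.severity, s.meeting_date, s.manager) for s in signals]
    return sorted(flags, key=lambda f: (f.severity, -f.meeting_date.toordinal()))
```

Dropping the report's name and quote at the type level, rather than hiding them in the UI, means the admin surface can't leak them by accident.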


What I'd take forward.

Research HR leads before designing for the manager. PostHog revealed the HR lead’s use case after launch. I designed leadership insights for the manager and retrofitted admin once the persona split surfaced. Earlier research with HR leads would have surfaced their org-wide need before the manager-centric design shipped.

Question the score. When anyone hands me a scored dataset now, I ask three things. Why did you choose to score this? Based on what? Do users know what that “what” is, or care? If the answer to the third question is no, the score doesn’t belong in the interface.

AI becomes useful when the user can see its sources. I was wary at the start about whether AI could do something helpful with meeting data. The mysticism goes away when the user can see where the claims came from. Structuring the timeline as the source of truth was the move that made the rest of it work.

Name the behaviour, then design. Each design move in this case study was tied to a specific behaviour: returning to a critical surface, picking up a next move from a two-minute skim, checking in with a manager when a flag appears. Naming the behaviour first made every decision testable against something concrete.
