Trust Before Advice
Setting the rules for AI judgement about how someone led
The problem. People mistrust AI when they can't see how it got to its conclusions. How might we surface AI judgement about how someone led in a way a manager will trust enough to learn from?
What I did. I designed how Wendi's AI surfaces its coaching insights about how someone led, with the manager kept as the authority on their own relationships.
Outcome. Two design partners converted from free trial to paying annually. PostHog showed both checking the per-meeting insight after every recording. That return behaviour was the signal we followed to expand leadership insights across the product.
How did I do in this meeting?
How am I doing with my reports?
Which of my managers need a check-in?
The unscalable part of management.
Coaching is the bit every manager needs and few get. Small HR teams can’t be in every room coaching managers through every conversation they have. Wendi was built to extend what HR already does, not replace it, and to give managers support at a scale one person can’t personally deliver.
The first version of Wendi (a ChatGPT-style tool) didn’t get much traction. Was Wendi a productivity tool that also coached, or a coaching tool powered by productivity data? PostHog showed both design partners opening the leadership page regularly even before we’d designed it properly, so that’s where we doubled down.
The first prototype our engineering team created was built around eight leadership dimensions adapted from Google’s Project Oxygen, scored per meeting. How do you show AI judgement about someone’s leadership in a way they’d trust?
Building trust in AI judgement.
How do you surface AI judgement about someone's leadership without triggering the appraisal response? People mistrust AI because they can’t see how it arrived at what it generated. In coaching, where the AI is making claims about how a person handled a difficult conversation, the design has to earn trust first. The first version of leadership insights looked more like a report card with scores per dimension and prescriptive advice.
Overview
You scored 5.5/10 on average across eight leadership dimensions in this meeting. Two dimensions are below threshold and may need attention before the next 1:1.
WHAT WENT WELL
- Asked two open questions before offering an opinion.
- Paused to hear concern about shared inbox.
WHAT TO IMPROVE
- Did not commit to a follow-up on the tech lead growth path.
- Action items without clear owners.
- No reference to company strategy.
POSSIBLE ISSUES
- Dwell time on tactical topics increasing by 18% over 3 meetings.
Risk Signals
Dwight raised a promotion question that went unresolved.
2nd mention of career growth in 3 weeks without committed follow-up.
Overload mentioned for the 2nd time.
Try this: Close a career question with a timeboxed commitment
“Let's block time next Tuesday to sketch what the tech lead path looks like. Bring three things you'd want to own.”
Dimensions
“Sounds like we're aligned then.”
“Is there anything that isn't sitting right?”
“I'll think about it.”
“Let's block time next Tuesday to sketch that out.”
A score invites the question: 7.2 out of what? What if we tried to display the data as an observation instead?
"Inclusive Environment: 7.2/10"
You go deeper on growth conversations with Dwight than with the rest of your team.
Which one would you come back to?
Every claim, traceable.
The first trust mechanism: tie every AI claim to a source. Every claim the AI makes in a meeting summary is tied to a transcript line with a timestamp. A claim you can see the source of is one you might decide to trust.
The meeting summary is the manager’s first read after a recording. From interviews with HR leaders, I knew managers were short on time and had a lot competing for their attention. The question was: how could the summary make a manager’s life easier?
A manager with two minutes should walk away with their next move.
A chronological transcript-dump won’t get read. I reorganised the summary around outcomes: overview of topics, decisions and agreements, action items with owner chips.
The meeting began at 10:03. You joined and asked how Dwight's week had been going. Dwight said it had been fine but that he was feeling slightly overloaded because of the onboarding work. You asked if there was anything specific blocking him and Dwight said that the shared inbox was still unclear to him. They discussed the shared inbox for several minutes. Dwight explained that the handover process from the previous team member hadn't fully transferred knowledge about which messages needed same-day responses. You suggested he look at the documentation and said you would check in with the team lead about it. Dwight said that would be helpful and that he would follow up by end of week.
You then moved on to the Q3 targets. Dwight said he felt the targets were achievable but that the timeline was tight given the current headcount. You said the timeline was fixed and suggested he prioritise accordingly. Dwight asked about whether there was any flexibility on scope and you said you would look into it. Dwight said that would be helpful. You both agreed to revisit scope in the next planning session.
You then asked about Dwight's growth plan. Dwight mentioned he had been thinking about moving into a tech lead role within the next year and that he wasn't sure if that was something you supported. You said you would think about it. Dwight said he appreciated that. You asked if there was anything else.
Dwight mentioned that the team retro from last week had surfaced some concerns about the sprint planning process. You asked what specifically. He said some team members felt there wasn't enough time to discuss blockers before being assigned tickets. You said you would raise it in the next planning meeting. Dwight said that would help. You agreed to revisit this in the next 1:1.
You asked about the product launch preparation. Dwight said the engineering work was on track but he was worried about the QA timeline. You asked him to estimate how much runway they needed and he said ideally two more weeks. You said you would talk to the PM. Dwight said thanks.
The meeting ended at 10:27.
Every claim has a source the manager can check. If the AI got something wrong, they can see exactly where.
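One way to picture the traceability rule is as a data shape: a claim only reaches the summary if it carries at least one transcript line. The sketch below is a minimal illustration, not Wendi's actual data model. The class names, the `traceable` helper, the "10:21" timestamp and the second, unsourced claim are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TranscriptLine:
    timestamp: str  # hypothetical format, e.g. "10:21"
    speaker: str
    text: str


@dataclass(frozen=True)
class Claim:
    text: str
    sources: Tuple[TranscriptLine, ...]  # transcript lines that back the claim


def traceable(claims: List[Claim]) -> List[Claim]:
    """Keep only claims backed by at least one transcript line."""
    return [claim for claim in claims if claim.sources]


# Illustrative data: the timestamp and the unsourced claim are made up.
line = TranscriptLine("10:21", "Manager", "I'll think about it.")
claims = [
    Claim("You committed to think about the growth question.", (line,)),
    Claim("Dwight seemed disengaged.", ()),  # no source, so it is dropped
]
shown = traceable(claims)
```

Under this assumption, an unsourced claim never reaches the manager, and a sourced one always has a place to check.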
Per-meeting insight.
How well did I do in this meeting? Is there any pattern I might have missed in the moment?
The auth flow reassignment came up as the trigger but didn't get acknowledged directly. A career conversation got booked without that acknowledgement, which can read as deferral.
- Held space when Dwight mentioned the external coffee chats. Did not pivot to retention talk.
- Asked what was behind the frustration before offering solutions.
- Reframed the question from "are you leaving?" to "what work do you actually want?"
- The auth flow reassignment came up but didn't get acknowledged directly in the meeting.
- A 45-minute Tuesday slot got booked, but no specific change was named between now and then.
- The "visuals person" framing was named twice without being addressed.
- Dwight has been quieter in standups for two weeks before this came up.
- External coffee chats often precede active interviewing.
- The auth flow is the third strategic call given to a peer in three months.
- "Looking back, reassigning the auth flow without talking to you first was a call I should have made differently. What would have made that decision feel right?"
- "What's the gap between the work you're being given and the work you'd want?"
WHAT WENT WELL
- Took Dwight's signal seriously by booking 45 minutes, not the usual 30.
WHAT TO IMPROVE NEXT
- Tuesday got booked but no specific commitment was named on what would be different.
- "We'll plan your career" can read as deferral. Dwight may leave Tuesday with the same uncertainty.
"Between now and Tuesday, here's what I'll do. Here's what changes about how strategic work gets distributed."
Observations > scores.
How am I doing across all my meetings, and what do I know about each person that might shape how I manage them?
- A. Reflection + Next focus: Plain prose names what's working and what to watch. No score. The Next focus is one concrete move, not a script.
- B. Named source moments: Moments are tagged with the person and date. Provenance at the person level, not a flat score.
- C. Patterns labelled by frequency: Each pattern shows how often it was seen, not a verdict. Soft framing that earns confidence as data accumulates.
- D. Evidence drawer: Each pattern expands to transcript quotes with a leaf-source icon. Every claim links back to a real moment.
On scoring. Scoring qualitative data is difficult. I knew we were still going to score in some way on the backend, but the user didn’t need to see it. One wrong score could break trust in the whole tool. Besides, the point of the tool was not to gamify management, but to help managers choose a next step with their reports.
That settled how to present the dimensions. The other question was how many to show at all.
On the eight dimensions. The dimensions were adapted from Google’s Project Oxygen. Showing eight dimensions on one page asks the user to weigh eight separate signals before they can act. I cut them from eight to four, but I didn’t get to validate this with users.
On voice. The first AI responses were lengthy and lacked action items a manager could pick up from skimming. They were also too direct, with criticism shaped as instruction. The reaction: ‘I don’t want it to tell me what to do.’
Strengths address the manager directly: “You asked strong follow-up questions.” For sensitive observations, the AI describes situations, not people. This matters for users who aren’t confrontational, culturally or personally.
"You lost focus."
"The conversation shifted away from the agenda."
Sensitive observations describe what happened, not what the manager did.
Strengths: address the manager directly.
“Demonstrated strong rapport-building skills.”
“You asked strong follow-up questions.”
Sensitive observations: shift to what happened.
“You lost focus in the middle of the meeting.”
“The conversation shifted away from the agenda.”
Coaching moves: name the move, don't script the words.
“Say this: 'Who owns this?'”
“Try this: pick the must-win, name the trade. Use when priorities compete.”
On confidence. Insights depend on how much data we’re able to pull from meeting transcripts, notes and chats. The page can’t speak with confidence after one meeting, but early users still need something useful from it. So I designed it to hedge its language until enough meetings accumulated: Empty before any data, Early during accumulation, and Established at six meetings. Six was a judgment call I didn’t get to validate.
What Wendi noticed in your recent 1:1s.
No 1:1s on record yet.
Start a recording
From two meetings, you've created room for Dwight to push back on design decisions. Too early to call it a pattern.
Watch whether the rework concern keeps showing up.
"Should we re-evaluate the onboarding IA before committing?"
Patterns and trends appear after about 6 meetings.
You consistently let design rationale come from your reports. The area to watch is follow-through: design directions get discussed but don't always land an owner.
Try ending design reviews with a clear owner and a date.
- Letting design rationale come from the report 4 meetings
- Naming trade-offs without taking sides 3 meetings
- Closing on design ownership before the meeting ends 3 meetings
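The three confidence states described above (Empty, Early, Established) can be sketched as a small function of meeting count. This is an illustration under stated assumptions, not the shipped logic. The function names and the exact hedged wording are hypothetical, and the threshold of six is the unvalidated judgment call from the text.

```python
ESTABLISHED_AT = 6  # threshold named in the case study; not validated with users


def confidence_stage(meeting_count: int) -> str:
    """Map the number of recorded meetings to a confidence stage."""
    if meeting_count <= 0:
        return "empty"
    if meeting_count < ESTABLISHED_AT:
        return "early"
    return "established"


def frame_insight(observation: str, meeting_count: int) -> str:
    """Hedge an observation until enough meetings have accumulated."""
    stage = confidence_stage(meeting_count)
    if stage == "empty":
        return "No 1:1s on record yet."
    if stage == "early":
        noun = "meeting" if meeting_count == 1 else "meetings"
        return (
            f"From {meeting_count} {noun}: {observation} "
            "Too early to call it a pattern."
        )
    # Established: the observation stands without a hedge.
    return observation


early = frame_insight("You've created room for Dwight to push back.", 2)
settled = frame_insight("You consistently let design rationale come from your reports.", 6)
```

Keeping the threshold in one named constant would make it cheap to change if later research showed six was the wrong number.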
Imagine the page as margin notes in a notebook. Would you scribble this down to remember? “Wants the tech lead role next.” Yes. “Strong communicator in the meeting.” No.
Every claim sourced. Every record kept.
The principle that ties the surfaces together. AI-generated content in Wendi shows where it came from and stays as the AI wrote it. The user can add their own notes alongside, in a separate editable layer. That arrangement keeps the AI’s record auditable.
Meeting summary
You and Dwight covered the shared inbox (unresolved), the onboarding load, and a mention of a tech lead growth path. You committed to think about the growth question. No owner was named for the inbox fix.
Why locked: summaries cite specific moments. We kept this uneditable to maintain truth.
What you want to remember
Tech lead path — bring it up first thing next week. He's been in this role for 18 months. Bring two concrete stretches.
Notes are always editable by the user because they were written by the user.
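The split between a locked AI summary and an editable notes layer can be sketched as a record whose summary field cannot be reassigned while the notes list stays open. This is a hypothetical model for illustration only. `MeetingRecord`, its field names and `add_note` are assumptions, not Wendi's implementation.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import List


@dataclass(frozen=True)
class MeetingRecord:
    # frozen=True blocks reassigning fields, so the AI text stays as written.
    ai_summary: str
    # The list object itself remains mutable: this is the user's own layer.
    user_notes: List[str] = field(default_factory=list)

    def add_note(self, note: str) -> None:
        """Append a user note alongside the AI summary."""
        self.user_notes.append(note)


record = MeetingRecord(ai_summary="No owner was named for the inbox fix.")
record.add_note("Tech lead path: bring it up first thing next week.")

try:
    record.ai_summary = "An owner was named."  # attempt to edit the AI record
    summary_locked = False
except FrozenInstanceError:
    summary_locked = True
```

The design choice this mirrors is that the two layers never mix: the manager's notes sit next to the AI's record without being able to rewrite it.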
Trust in the criticism.
PostHog showed both design partners returning to the per-meeting insight after every recording. We didn’t get to interview them about why, but return behaviour like that suggested the surface had earned their trust.
Designing for the HR Lead
PostHog showed the two design partners using the page differently. The manager opened Leadership Insights before a 1:1, scrolling the whole surface and switching between two reports mid-session. The HR lead opened it after recordings, clicking through to a person’s name, dwelling on insights for three minutes, then opening Ask Wendi.
Manager
- When: In-session
- How: Opened Leadership Insights before a 1:1 as a briefing. Cross-referenced across employees.
- Evidence: PostHog showed scrolling the whole surface and switching between two people's pages mid-session.
HR lead
- When: Post-session
- How: Opened Leadership Insights after recordings. Clicked through to a person, then into Ask Wendi.
- Evidence: PostHog showed clicks into named people, opening chat, and dwelling on insights for 3+ minutes.
HR leads, on the other hand, have different needs. They’re looking for early signals of risk: anything that could escalate to a grievance or a tribunal. The chat queries were about the whole org, and the admin dashboard came from that gap.
But what about privacy and trust? What would a flag look like? A flag should prompt a check-in with a manager or team, not panic.
Signals on the admin surface are abstracted from specifics. They point the HR lead toward a person to check in with.
What Wendi flagged
Across 29 meetings this month, follow-through is the area that needs the most attention. Open action items resurfaced in 5 meetings without resolution.
Flight risk signal detected in Dwight's team
A report expressed dissatisfaction with growth opportunities and mentioned exploring other options.
Workload concerns raised across Jim's team
Two reports independently flagged unsustainable hours and unrealistic sprint commitments.
Interpersonal tension flagged in Pam's team
Peer performance concern raised in Jim's team
Your managers
Meetings are consistent and well-structured. His team is particularly open. They bring issues proactively rather than waiting to be asked.
Action items are resurfacing unresolved. Two reports mentioned waiting on follow-ups. Shorter meetings sometimes end without clear next steps.
Clearest meeting structure of the three. Having the most growth-focused conversations, though reports tend to be more guarded. They wait to be asked.
What I'd take forward.
Research HR leads before designing for the manager. PostHog revealed the HR lead’s use case after launch. I designed leadership insights for the manager and retrofitted admin once the persona split surfaced. Earlier research with HR leads would have surfaced their org-wide need before the manager-centric design shipped.
Question the score. When anyone hands me a scored dataset now, I ask three things. Why did you choose to score this? Based on what? Do users know what that “what” is, or care? If the answer to the third question is no, the score doesn’t belong in the interface.
AI becomes useful when the user can see its sources. I was wary at the start about whether AI could do something helpful with meeting data. The mystery goes away when the user can see where the claims came from. Structuring the timeline as the source of truth was the move that made the rest of it work.
Name the behaviour, then design. Each design move in this case study was tied to a specific behaviour: returning to a critical surface, picking up a next move from a two-minute skim, checking in with a manager when a flag appears. Naming the behaviour first made every decision testable against something concrete.