Today’s post is about working with AI when the answers sound right but aren’t.
It’s about what happens when we trust fluency too easily, skip reflection, and design in clean rooms instead of the real world.
In short, we’ll explore three things:
The mask that makes AI too convincing.
The mirrors that help us see more clearly.
And the muck, where real-world design actually happens.
If you work in the design space and have similar concerns, you’re in the right place.
PART I: Masks → What AI Hides Behind Fluency
Let me tell you how this usually starts.
You’re sitting in a design room—maybe it’s a workshop, maybe it’s just Tuesday afternoon. The team’s been talking about building a health support app for older adults. Medication reminders, maybe some family check-ins, maybe a gentle way to track mood.
At some point someone says, “Let’s ask ChatGPT what features we should include.”
So you type:
“Give me five ways to make a mobile app easier for older adults.”
And boom! Two seconds later, here comes the answer:
Use large fonts
Simplify navigation
Avoid jargon
Add voice input
Provide help tips
Everyone nods. “Nice. Let’s put these into the spec doc.”
But right there—that’s the moment the mask slips over your eyes.
What just happened?
You confused clarity with accuracy.
You trusted the AI because it sounded calm, confident, and bullet-pointed.
You stopped asking whether the suggestions were good. You just believed them because they came cleanly packaged.
That’s the mask.
AI wears it well.
It sounds like an expert. It writes like a strategist. But here’s the thing: it doesn’t know your context. It doesn’t know that your users are older adults in rural areas who mostly use Android phones with tiny screens and bad data plans. It doesn’t know that voice input breaks down in loud kitchens or that “help tips” are often ignored unless they blink or buzz.
It doesn’t know the constraints.
It just knows what sounds reasonable.
And worse—if it doesn’t know, it won’t say so. It will fill in the gaps anyway.
The other day, a friend of mine—she runs a research lab out at Hong Kong Science and Technology Park—shared a story that really stuck with me. Her team was running interviews with older adults who use community support apps. Later, they pasted the transcripts into ChatGPT and asked it to summarize key themes.
What came back sounded polished—like a textbook summary.
But here’s the twist: two of the “themes” the AI pulled out? They weren’t actually present in any of the transcripts. At all. Not once.
No ill intent. Just prediction. The model filled in what should have been said, based on how these stories usually go.
Because it’s trained to complete patterns, not to tell the truth.
So here’s the first lesson when designing with AI:
Don’t believe the mask.
Don’t mistake fluency for knowledge.
Don’t assume confidence equals correctness.
Don’t treat neat output as real insight.
Because fluency is the costume. The AI wears it well. But beneath it, there’s just probability.
That’s why we need something stronger. Something that reflects, not performs.
That’s where the mirrors come in.
PART II: Mirrors → How to Reflect, Challenge, and Clarify
If the mask is what makes AI sound right when it might be wrong, then mirrors are what help us see the truth in our thinking, especially the parts we usually skip past.
A mirror doesn’t just show you what’s there. A good mirror shows you what you’re not looking at.
There are three kinds of mirrors I use when designing with AI. Each helps me shift my perspective and ask better questions. Each one interrupts certainty at just the right time.
Let’s take them one by one.
Mirror One: Inversion (Thinking Backwards)
This is the simplest mirror, and maybe the most uncomfortable.
It works like this: instead of asking “How do we succeed?” you ask “How do we fail?”
You take the goal and you flip it.
Let’s go back to that older adults app—say it’s focused on helping users track medication and stay connected to their caregiver. You want to make it “easy to use.”
Fine.
But instead of asking:
“What would make this easier?”
Ask this instead:
“What would make this app really frustrating for a 72-year-old who lives alone and wears reading glasses?”
That question changes everything.
You prompt the AI:
“Give me 5 ways to make a medication-tracking app hard for older adults to use.”
And here’s what you’ll get:
Use small text
Require log-in every time
Hide the ‘remind me later’ button
Force location permissions before use
Use swipe gestures with no visual labels
And now you’re seeing it.
These aren’t just bad features.
They are mistakes waiting to happen—real-world, design-breaking mistakes that might slip in precisely because they’re common.
Now you flip each one:
Make text adjustable
Enable persistent log-in
Offer clear and skippable reminders
Delay permission requests until necessary
Use large, labeled buttons instead of swipes
See what happened?
By looking at what would make the app fail, you surfaced unspoken assumptions.
Then, you turned those into safeguards.
This is what I love about Inversion: it forces you into honesty and promotes critical thinking.
It reveals things you thought you already understood.
And AI is great at this. It doesn’t hesitate to offer bad ideas when you ask for them. It doesn’t flinch at “how to ruin this.” So use that.
Get the failures out early—then flip them into insights.
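If your team runs these prompts from a script rather than a chat window, the Inversion move is easy to capture as a small helper. Here's a minimal sketch in plain Python, assuming nothing beyond the standard library; the function names, product, and audience strings are hypothetical placeholders, and the actual model call is left to whatever client you already use.

```python
# A sketch of the Inversion mirror as a reusable prompt builder.
# The model call itself is deliberately left out: swap in whatever
# client your team already uses. The structure of the ask is the point.

def build_inversion_prompt(product: str, audience: str, n: int = 5) -> str:
    """Flip the goal: ask for failure modes instead of best practices."""
    return (
        f"Give me {n} ways to make {product} hard for {audience} to use. "
        "Be specific: name the exact design decision that causes each problem."
    )

def build_flip_prompt(failure_modes: list[str]) -> str:
    """Turn each failure mode back into a safeguard the team can act on."""
    numbered = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(failure_modes))
    return (
        "For each failure mode below, state the design safeguard that prevents it "
        "and the assumption it exposes:\n" + numbered
    )

# Hypothetical usage, echoing the medication-tracking scenario above.
print(build_inversion_prompt(
    "a medication-tracking app",
    "older adults who live alone and wear reading glasses",
))
```

The useful habit isn't the code itself. It's that the inverted prompt and the flip prompt get asked every time, not only when someone remembers to be skeptical.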
Mirror Two: Expert Pairing (Introducing Useful Conflict)
Here’s the second mirror. This one’s about contradiction—on purpose.
When you're designing anything halfway meaningful, you're not just meeting a "user need." You're balancing:
What’s usable
What’s legal
What’s ethical
What’s technically doable
What makes sense to people under stress, fatigue, or fear
And no one person holds all of that.
That’s where Expert Pairing comes in.
Let’s take our older adults app again. You’re thinking of adding a video check-in feature, so family members can see how their loved ones are doing.
Sounds great on the surface.
But now you prompt the AI like this:
“Play the role of a UX designer, a privacy officer, and an 80-year-old user. Each gives their view on adding daily video check-ins to a caregiving app.”
And now you get this:
Designer: “It adds connection and accountability. A quick ‘face-to-face’ check improves trust.”
Privacy Officer: “We need explicit consent. Recording opens up risk. What if the user feels pressured to accept?”
Older User: “I don’t want to be seen every day. Sometimes I’m not well, or I’m still in pajamas.”
What you’ve done is create constructive disagreement.
Each voice brings a different layer of truth:
The designer sees opportunity.
The officer sees risk.
The user sees dignity.
Now you're not just designing a feature. You're negotiating a boundary.
And you haven’t even left the AI prompt window.
This is what I mean by Expert Pairing. You simulate competing lenses. You get contradiction before you get consensus. And in design, that’s a gift.
Because if everyone’s nodding, you’re probably missing something.
Here’s another version. Let’s say your team is trying to simplify the app’s language. You prompt:
“Have a health literacy researcher and a caregiver debate whether to use the term ‘dose reminder’ or ‘pill alert.’”
They’ll argue. That’s good.
You’ll hear nuances about clarity, tone, even stigma.
The point isn’t to “win” the argument.
It’s to hear what your blind spots sound like when they speak.
AI can’t replace real people. But it can simulate the tensions they bring into the room, and that makes it a powerful reflection tool.
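If you want this to be a habit rather than a one-off prompt, the pairing can be templated too. A minimal sketch in Python, under the assumption that the personas and the feature under debate are just strings you swap in; the names below are illustrative, not a prescribed set.

```python
# A sketch of the Expert Pairing mirror: one prompt, several deliberately
# conflicting personas. Personas and the feature are illustrative placeholders.

def build_expert_pairing_prompt(feature: str, personas: list[str]) -> str:
    """Ask one model to argue with itself from several competing positions."""
    roles = ", ".join(personas)
    return (
        f"Play the role of {roles}. "
        f"Each gives their own view, in their own voice, on: {feature}. "
        "Do not resolve the disagreement; keep the tensions visible."
    )

print(build_expert_pairing_prompt(
    feature="adding daily video check-ins to a caregiving app",
    personas=["a UX designer", "a privacy officer", "an 80-year-old user"],
))
```

The design choice worth keeping is the last instruction: telling the model not to resolve the disagreement. Consensus is exactly what you don't want yet.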
Mirror Three: Reality Filtering (Preventing Beautiful Nonsense)
The last mirror is less about perspective and more about precision.
It’s called Reality Filtering.
AI can make things up. We know this. The problem is that it doesn’t look like fiction. It looks helpful. Reasonable. Finished.
So you need a way to inject skepticism into the process.
Let’s say you’ve collected open-ended survey responses from 20 older adults using your app. You ask the AI:
“What are the key concerns users raised?”
And it responds:
“Users are confused by notifications.”
“They find the interface overwhelming.”
“They want more control over privacy settings.”
Fine. But are these things actually in the data?
You don’t know—unless you check.
So now you say:
“For each insight, label whether it is [Verified], [Unverified], or [Speculative], based only on the input data.”
This is Reality Filtering.
It tells the AI:
Don’t summarize. Interrogate.
Only say what’s there. Flag what isn’t. Show your uncertainty.
Now your insights look more like:
“Notifications caused confusion for 5 respondents. [Verified]”
“Interface overwhelm mentioned once. [Unverified as trend]”
“Privacy control not mentioned. [Speculative]”
That’s clarity. That’s the difference between trusting your design and just liking the sound of it.
You can even push it further:
“What ideas in your previous answer require validation through usability testing?”
And the AI will start to backpedal, in a good way. It’ll say:
“We assumed the term ‘pill alert’ was unclear—needs testing.”
“We suggested default video check-ins—requires user opt-in testing.”
You’re teaching the machine to second-guess itself.
And in turn, you’re learning to second-guess your own comfort.
That’s what Reality Filtering gives you:
A way to resist beautiful nonsense.
To sort the polished from the proven.
To know where your confidence is coming from—and where it’s hiding uncertainty.
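And because the labels are explicit, they're easy to check mechanically. Here's a minimal sketch in Python, assuming the same three labels used above; the sample response and the tally helper are illustrative, and the pattern match only counts tags, it doesn't verify them for you.

```python
import re

# A sketch of the Reality Filtering mirror: ask for evidence labels, then count
# them so unverified claims can't hide inside a polished paragraph.
# The labels match the prompt above; the sample response is a stand-in.

FILTER_INSTRUCTION = (
    "For each insight, label whether it is [Verified], [Unverified], or "
    "[Speculative], based only on the input data. Quote the supporting line "
    "for anything marked [Verified]."
)

def tally_labels(response: str) -> dict[str, int]:
    """Count how many insights carry each evidence label."""
    counts = {"Verified": 0, "Unverified": 0, "Speculative": 0}
    for tag in re.findall(r"\[(Verified|Unverified|Speculative)\]", response):
        counts[tag] += 1
    return counts

sample = (
    "Notifications caused confusion for 5 respondents. [Verified]\n"
    "Interface overwhelm mentioned once. [Unverified]\n"
    "Privacy control not mentioned. [Speculative]\n"
)
print(tally_labels(sample))  # -> {'Verified': 1, 'Unverified': 1, 'Speculative': 1}
```

A tally like that is a conversation starter, not a verdict: anything not marked Verified goes back to the transcripts, or forward into usability testing.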
PART III: Muck → The Place Where Design Actually Happens
So now we’ve peeled off the mask—we’ve seen how AI’s smooth language can blind us.
We’ve used the mirrors—we’ve inverted the obvious, surfaced conflicts, and filtered the noise.
And now we’re here.
Where every real project eventually lands.
The muck.
Let me bring this down to earth.
Last year, I was speaking with a group of UX researchers in Hong Kong—colleagues working on a community health project focused on exergames for older adults. It wasn’t about fancy wearables or VR. Just low-cost tablet-based games to encourage light physical movement—leg lifts, arm rotations, stretching.
Simple on paper.
They’d used generative AI early on to help brainstorm features:
Visual progress tracking
Gentle auditory cues
Motivational feedback
Group leaderboards
All plausible. All clean.
But once they started testing with real older adults in Kowloon and Shatin, everything got… sticky.
Here’s what they found:
Auditory cues—great in theory—were ignored because many older adults muted their tablets by default to avoid “bothering the neighbors.”
Leaderboards, which sounded motivating in the prompt, created shame. Some participants said they didn’t want to play if they saw their name near the bottom.
The progress charts, full of color and medals, were read by caregivers—but not by the older users themselves. Many said, “I just want to know if I’m doing okay.”
That’s the muck.
The part no prompt prepares you for.
Where insights contradict each other.
Where “best practices” fall apart on contact with real lives.
And here’s the truth: no amount of ideation or simulation—or even validation—can fully eliminate this mess.
Because real people aren’t consistent. They don’t fit into templates.
They feel pride and embarrassment. They turn off sound because they worry about disturbing someone. They skip instructions, or misread them, or decide the whole thing feels childish.
And that’s not failure.
That is the design terrain.
What matters is how we show up to it.
If you’ve used the mirrors—if you’ve flipped the assumptions, heard multiple sides, and filtered for what’s real—then when you step into the muck, you’re ready to listen.
Not react. Not rush. Just stay there long enough to see what the design actually needs to become.
That’s what my colleagues in Hong Kong did.
They dropped the leaderboard.
They added a “quiet mode” with haptic cues instead of sound.
They changed the visual feedback from medals to simple green checkmarks that said: “You’ve done your movement today.”
And you know what?
Participation went up.
Not because the AI was wrong. But because the real wisdom emerged in the contradiction.
The muck isn’t what you clean up after design.
The muck is where design happens.
So here’s where we end:
Use the mask to notice what looks too smooth.
Use the mirrors to break your own certainty.
Then step into the muck—and stay there.
Because that’s where your work becomes worthy of the people you’re building for.
🚨 If you’d like to take these ideas further, we’re hosting a masterclass on “preventing AI harm in the real world” on August 28 from 10 AM to 1 PM EDT.
It’s hands-on, grounded in real-world insights, and designed for people building and working with AI systems.
Seats are limited, so we hope you’ll consider joining us.