Not in the Job Description But Necessary

Emerging UX practices in responsible AI despite the lack of authority, time, or recognition

Reza H. Mogavi, Ph.D.

and

Lennart Nacke, PhD

Jun 28, 2025

Last Tuesday, in our live webinar, we talked about Responsible AI.

Responsible AI is a framework for ensuring that AI is developed and deployed in a safe, trustworthy, and ethical manner.

It is not a secret anymore that AI systems are shaping decisions at scale. They write our text, recommend our actions, even mediate hiring and healthcare. And yet, despite enormous investment in algorithmic fairness and responsible AI tooling, socially harmful outcomes persist. Why?

One major reason: responsible AI efforts have focused on models, not products. On math, not context. While engineers optimize fairness scores, the social consequences of AI are often determined elsewhere—at the interface, in the workflow, during the design of interaction itself.

That’s where UX professionals come in. Designers, researchers, content strategists—they’re close to the user, close to the product, and increasingly pulled into the ethical core of AI development. Not by mandate, but by necessity.

Recent research maps this terrain. It reveals a set of emerging UX practices that respond to the real-world demands of building and using AI responsibly. These are not abstract ideals. They’re lived methods—often improvised, sometimes invisible, but deeply consequential.

In this regard, research identifies three key domains: cultivating a responsible AI lens, adapting prototyping practices, and rethinking evaluation strategies. And behind all of them is a structural reality: UX practitioners are shouldering ethical labor without clear authority, time, or institutional support.

1) Cultivating a Responsible AI Lens

Let’s start where many teams begin: reframing what matters.

What we see depends on how we frame. Responsible AI begins not with answers, but with the courage to ask better questions.

Responsible design doesn't start with choosing colors. It starts with choosing which questions are even asked. Rather than starting from user goals alone, teams shift focus toward who could be harmed, silenced, misrepresented, or excluded.

This isn’t about adding a risk slide or a makeshift feature at the end. It’s about embedding ethical reflection into early design rituals. In kickoff meetings, storyboarding, product briefs. It's about expanding the problem space to include unintended use, abuse, and social asymmetries.

A small but powerful shift: changing how prompts are worded in design workshops. Instead of “What’s a great experience?”, teams ask “What’s a harm scenario we haven’t accounted for?” Instead of designing for success alone, they design to prevent harm.

Tools are being adapted accordingly. Harm mapping templates, value tension worksheets, ethical design canvases. Some teams even redesign internal sprint templates to inject ethical cues, placing harm analysis next to feature scoping.

Language itself becomes infrastructure. Words like “users,” “customers,” or “personas” are being interrogated. Who's not represented? Whose pain isn't being tested? Teams are learning to speak explicitly about marginalized users, not hypothetical edge cases.

And at a meta-level, UX roles are evolving: no longer just architects of usability, but stewards of societal framing. Responsible AI begins with what designers allow themselves to see—and say—early in the process.

2) Prototyping with the Model, Not Just the UI

With AI systems, prototyping no longer means wireframing alone. The model is the interface. The backend behavior shapes the user experience as much as layout or tone.

Prototyping means working directly with the model, testing its behavior, and designing for how it might fail.

That’s especially true with generative systems. Language models don’t behave like static engines. They improvise. They guess. They hallucinate. Which means prototyping becomes part adversarial testing, part psychological modeling.

One key shift: instead of mocking up screens and hoping the model behaves, teams prototype directly with the live model. Prompt engineering is treated as UX design. Teams create “prompt variants” the way traditional designers create button variants.

This means design sessions often involve real-time prompt probing—trying out edge cases, ambiguous queries, emotionally loaded inputs. The goal isn't just to see what works, but to expose what fails gracefully, and what fails dangerously.

In practice, prototyping becomes a form of ethical red-teaming. Designers deliberately test limits: Can the AI be coaxed into giving bad advice? Can it be tricked? Does it subtly reflect societal stereotypes? Can it be abused with minimal effort?

But the prototyping doesn’t stop at model behavior. Teams move to shaping user perception: how do users understand what the model is doing? How do we signal when it’s uncertain? Should we make the system seem smart or visibly fallible?

Designers are building affordances to regulate trust. Constraining inputs. Disabling ambiguous queries. Adding labels like “AI-generated” or “draft only.” One tactic: introduce intentional latency or hesitation to signal the model is deliberating—even when it's not. Why? Because users believe in fluency. And fluency creates overtrust.

There's also a growing concern about over-polishing. A seamless UI wrapped around an unreliable model can create a façade of credibility. This isn't just bad UX—it’s a social liability. Prototyping responsibly means refusing to aestheticize unreliability.

3) Evaluation as Harm Discovery

Traditional usability testing is ill-equipped for AI systems. It tests satisfaction, not safety. It tracks friction, not fallout. That’s a problem when your product can be used to generate hate speech, reinforce bias, or quietly deceive users.

Testing for usability shows what works. Testing for harm reveals who it doesn’t work for.

So evaluation practices are being rebuilt. Not to test tasks, but to simulate harms.

Methods are evolving. Think fake news generation tasks, role-played abuse scenarios, and ethical storyboarding. Teams test for misuse, not just use. They ask users to try breaking the system. To imagine how it could be used maliciously. And they listen carefully to how users interpret AI output—even when it's wrong.

A key tactic: “misuse rehearsals.” These involve asking, “If someone wanted to trick or weaponize this system, how would they do it?” This doesn't just reveal design flaws. It helps prioritize defenses.

Testing for harms also means expanding who gets tested. Too often, AI systems are evaluated by and on populations that are most convenient—not the ones most at risk. RAI evaluation includes hard questions about representation: Who’s testing this? Whose safety is being measured?

And time matters. Responsible evaluation recognizes that some harms are slow. They emerge after repetition. After reliance. After users stop paying attention. That’s why longitudinal observation, real-world simulation, and user feedback loops are all being explored as next-generation evaluation methods.

Evaluation becomes a moral practice—not just a research method.

Organizational Realities

Now, none of these practices happen in a vacuum.

UX teams doing responsible AI work often face invisible barriers. Not because anyone tells them to stop, but because nothing is structured to support them when they start.

Time is a constant constraint. Fast product cycles don’t leave room for ethical exploration. Reflection gets deprioritized, reframed as “not on the critical path.” So teams squeeze RAI work between tasks—renaming it as “risk reduction,” “user confusion,” or “support deflection” just to get it on the roadmap.

Power is another issue. Many UX professionals involved in RAI don’t hold decision-making authority. They can flag harms, but can’t stop launches. They can suggest mitigations, but can't require them. This creates an ethical dependency on relationships—how persuasive the designer is, how open the PM is.

And then there’s the emotional weight. When you see harm coming, but can’t redirect the product. When you raise a concern, but it vanishes into a Slack thread. Responsible AI labor is often lonely, unacknowledged, and emotionally taxing.

Which leads to the deepest challenge: visibility. RAI work done by UX is often undocumented, uncategorized, and unrewarded. It doesn’t show up in sprint reports, OKRs, or performance reviews. And yet—it is often the only thing preventing harm.

What Now?

This is the part where we zoom out. We’ve seen how responsibility in AI is being shaped on the ground—quietly, creatively, structurally. So where do we go from here?

First: Stop separating ethics from design.

RAI is not a checklist. It’s not a standalone role. It’s a property of the whole design and development process. And that means we need to treat it as everybody’s job, not just compliance teams or fairness analysts.

What if every UX project brief included a “societal harm hypothesis”? What if product retros asked, “Who did we overlook?” What if feature acceptance criteria included thresholds for user misunderstanding?

Second: Build design tools that think beyond functionality.

Today’s UX tools are built for flow, polish, and speed. But responsible AI work needs tools for friction, ambiguity, harm anticipation, dissent. Tools that support slow thinking. That give teams permission to explore what shouldn’t be built.

Imagine a Figma plugin that nudges you to imagine marginalized personas. Or a prompt tester that flags when your input assumptions skew too normative. Tooling can be a form of pedagogy.

Third: Legitimize ethical labor.

This work must be recognized. That means formal roles, clear ownership, and organizational memory. RAI should not depend on who happens to be on the team that quarter.

It also means leadership needs to reward—not punish—ethical escalation. Practitioners should be able to say “this might harm someone” without needing to translate that into OKRs.

And finally:

Fourth: Treat care as infrastructure.

Responsible AI is not just about technical robustness. It’s about caring who gets hurt. Caring enough to change course. To slow down. To ask better questions.

Let’s stop acting like that’s extra!!! That’s the work!!!

Thank you for staying with us until the end of this deep dive. If you are interested in continuing the conversation live, we're running our famous masterclass again on Monday, June 30, 2025, at 7 PM EDT. Only a handful of seats remain. Last call to reserve yours if you haven’t already.

References

[1] Stanford Institute for Human-Centered Artificial Intelligence. (2025). Artificial Intelligence Index Report 2025: Chapter 3 — Responsible AI.

[2] Qiaosi Wang, Michael Madaio, Shaun Kane, Shivani Kapania, Michael Terry, and Lauren Wilcox. 2023. Designing Responsible AI: Adaptations of UX Practice to Meet Responsible AI Challenges. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 249, 1–16. https://doi.org/10.1145/3544548.3581278

[3] Christian Djeffal. 2025. Reflexive Prompt Engineering: A Framework for Responsible Prompt Engineering and AI Interaction Design. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT '25). Association for Computing Machinery, New York, NY, USA, 1757–1768. https://doi.org/10.1145/3715275.3732118