Arize Observe Agentic AI Conference: Hot Takes from a UX Hacker

Notes from the top AI industry open-source vendor conference

Key Take-Aways

Agents, agents everywhere – and lots for you and me! Every single major player (AWS, Microsoft, Google, etc.) presented their new Agentic development frameworks featuring hot-off-the-press integrations with Arize. AWS claimed their new Agentic framework allows “producing production-level agents in just hours.”

The rapid agentic development methodology is to “break forward with a method” and “stabilize with an eval.” In other words, spike first, then use the built-in Arize QA automation to rapidly improve quality.
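To make “spike, then stabilize with an eval” concrete, here is a minimal sketch in plain Python. It does not use Arize's tooling: the eval cases, the `spike_agent` stand-in, and the substring check are all assumptions made up for illustration. The point is only that the spike gets a number attached to it, so every later change can be judged against that number.

```python
from typing import Callable, List, Tuple

# Hypothetical eval cases: (question, text the answer must contain).
EVAL_CASES: List[Tuple[str, str]] = [
    ("What is the refund window?", "30 days"),
    ("Which plan includes SSO?", "Enterprise"),
]


def spike_agent(question: str) -> str:
    """Stand-in for the quick first-pass agent (the 'spike')."""
    canned = {
        "What is the refund window?": "Refunds are accepted within 30 days.",
        "Which plan includes SSO?": "SSO is available on the Pro plan.",
    }
    return canned.get(question, "I don't know.")


def run_eval(agent: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of cases whose answer contains the expected text."""
    passed = sum(1 for question, expected in cases if expected in agent(question))
    return passed / len(cases)


score = run_eval(spike_agent, EVAL_CASES)
print(f"pass rate: {score:.0%}")
```

Rerun the same cases after every prompt or RAG change; if the pass rate drops, you have a regression before your users find it.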

Agent eval is the moat – it is the Rubicon between “cool demo” and “production.” Not surprisingly, since Arize is the eval solution, agent eval was the focus of the conference. However, I was surprised by how every speaker mentioned the importance of repeatable, consistent output, usually achieved by using RAG files, and how quantifying the eval is the inflection point for going to production.

Generic metrics are not a quantifier of quality – a “20% hallucination” metric is almost useless as-is. Instead of relying on generic metrics as a measure of quality, use them to continuously prioritize and troubleshoot the RAG content pipelines and improve accuracy.
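One way to turn a generic number into a to-do list is to break it down by RAG source. The sketch below assumes each eval record is tagged with the source file the answer drew on and a hallucination flag; the record shape and file names are hypothetical, not an Arize export format.

```python
from collections import Counter

# Hypothetical eval records: the RAG source behind each answer, plus whether
# a reviewer or judge flagged the answer as a hallucination.
records = (
    [{"source": "pricing.md", "hallucinated": True}] * 2
    + [{"source": "pricing.md", "hallucinated": False}]
    + [{"source": "onboarding.md", "hallucinated": False}] * 4
    + [{"source": "faq.md", "hallucinated": False}] * 3
)


def hallucination_rate_by_source(rows):
    """Return (source, rate) pairs, worst source first."""
    totals, flagged = Counter(), Counter()
    for row in rows:
        totals[row["source"]] += 1
        flagged[row["source"]] += int(row["hallucinated"])
    rates = {source: flagged[source] / totals[source] for source in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


overall = sum(r["hallucinated"] for r in records) / len(records)
ranked = hallucination_rate_by_source(records)
print(f"overall: {overall:.0%}")
for source, rate in ranked:
    print(f"{source}: {rate:.0%}")
```

In this made-up data the overall rate is the same “20%,” but the breakdown says all of it comes from one file, which tells the team exactly which content to fix first.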

Humans need a chance to correct the agent's output — opting for human-in-the-loop agentic actions rather than complete AI-driven automation use cases is the key to early success and scale. It decreases the cost of false positives/negatives, while multiplying the impact because of higher adoption due to lower risk, and provides almost the same ROI as full automation. Full automation may not (yet) be ready for mission-critical workflows. 
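A human-in-the-loop step can be as small as one gate between “the agent proposes” and “the system executes.” This is a sketch under stated assumptions: `ProposedAction`, the reviewer callback, and the refund example are hypothetical names, and a real product would put a UI where the `review` function is.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class ProposedAction:
    description: str
    amount: float


def run_with_human_review(
    proposed: ProposedAction,
    review: Callable[[ProposedAction], Optional[ProposedAction]],
    execute: Callable[[ProposedAction], str],
) -> str:
    """Execute only what the human approved; None from the reviewer means reject."""
    approved = review(proposed)
    if approved is None:
        return "rejected: nothing executed"
    return execute(approved)


def reviewer_fixes_amount(action: ProposedAction) -> ProposedAction:
    """Stand-in for a person correcting the agent's output before approving it."""
    return replace(action, amount=40.0)


def issue_refund(action: ProposedAction) -> str:
    return f"executed: {action.description} for ${action.amount:.2f}"


result = run_with_human_review(
    ProposedAction("refund order 1234", 400.0), reviewer_fixes_amount, issue_refund
)
print(result)
```

The reviewer can approve as-is, edit, or reject, so a wrong guess by the agent costs a few seconds of attention instead of a wrong refund.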

The team should use a multi-agent breakdown to role-play the user flow interaction. Simply doing this with an LLM first will give your team a single-thread execution path plan and allow your team to sharpen their thinking about prompts at each agent hand-off.
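If you want a starting point for that role-play, the sketch below just assembles one prompt that asks a single LLM thread to play each agent in turn and to spell out every hand-off. The three roles and the wording are assumptions for illustration; paste the result into whatever LLM you already use.

```python
# Hypothetical agent roles for one user flow; swap in your own.
ROLES = [
    ("Intake Agent", "Restate the user's request and list what is missing."),
    ("Planner Agent", "Turn the request into a numbered execution plan."),
    ("Reviewer Agent", "Check the plan for risk and flag steps needing human approval."),
]


def build_roleplay_prompt(user_request: str, roles=ROLES) -> str:
    """Compose one prompt that asks a single LLM thread to play every agent in order."""
    lines = [
        "Role-play the following agents one at a time, in order.",
        "Label each turn with the agent's name and end it with 'HAND-OFF:' "
        "followed by exactly what the next agent receives.",
        "",
        f"User request: {user_request}",
        "",
    ]
    for step, (name, job) in enumerate(roles, start=1):
        lines.append(f"{step}. {name}: {job}")
    return "\n".join(lines)


prompt = build_roleplay_prompt("Summarize last week's outage for the exec team.")
print(prompt)
```

Reading the hand-off lines in the transcript is where the team usually spots which agent needs a better prompt.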

Memories > Models. Increasingly, the value of agents is the memory. All of the bad guesses that were corrected, all of the experience in solving problems, etc. – memory is the key. Models are increasingly becoming a commodity. LLMs will soon be replaceable, so you will be able to plug the next model (think ChatGPT 10, Claude 10, Gemini 10, etc.) into the same “perpetual agent” while memories are preserved. Perpetual agents will likely hit prime time next year. It is like upgrading your computer while continuing to work with the same data.
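Here is a minimal sketch of what “same agent, new model, same memories” could look like. Everything in it is an assumption for illustration: `PerpetualAgent` is a made-up class, the two models are stub functions rather than real LLM calls, and the memory is a plain list instead of a real store.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Model = Callable[[str], str]  # any function that maps a prompt to a reply


@dataclass
class PerpetualAgent:
    model: Model
    memories: List[str] = field(default_factory=list)

    def remember(self, correction: str) -> None:
        """Store a lesson learned, e.g. a human correction of a bad guess."""
        self.memories.append(correction)

    def ask(self, question: str) -> str:
        notes = "\n".join(f"- {m}" for m in self.memories)
        return self.model(f"Lessons learned:\n{notes}\n\nQuestion: {question}")


def count_lessons(prompt: str) -> int:
    return sum(line.startswith("- ") for line in prompt.splitlines())


def model_v1(prompt: str) -> str:  # stub standing in for today's model
    return f"v1 used {count_lessons(prompt)} lesson(s)"


def model_v2(prompt: str) -> str:  # stub standing in for next year's model
    return f"v2 used {count_lessons(prompt)} lesson(s)"


agent = PerpetualAgent(model=model_v1)
agent.remember("Customer IDs start with 'C-', not 'CUST-'.")
agent.remember("Always quote prices in EUR for EU accounts.")
before = agent.ask("Draft the renewal email.")

agent.model = model_v2  # swap the model; the memories stay put
after = agent.ask("Draft the renewal email.")
print(before)
print(after)
```

The design choice that matters is that memories live outside the model, so the swap is one assignment and nothing learned is lost.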

Hot takes

  • “Agentic” is a sliding scale, a spectrum, from simply running a prompt to fully autonomous. 

  • The primary benefit of agents might be “to remove prompting pressure from users.”  

  • Most software will adapt for use by both users and agents.

  • The agent model will write its own tools to interact with the software to do what you want. You have to give it some good prompts, though.

  • Prompts are code.

  • Vibe coding = coding. There is no distinction between the two.

  • LLM as a Judge is a fundamentally different mode – analysis vs generation. Even for the same model. So yes, you can do it in the same prompt. Most of the time. Experiment.

  • Arize will help you write an automated “LLM as a Judge” prompt.
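For readers who have not seen “LLM as a Judge” in practice, here is a rough sketch of the analysis mode. It is not the prompt Arize generates and not its API: the prompt wording, the JSON verdict shape, and `fake_judge_model` (a stub standing in for a real model call) are all assumptions. Whether the judge runs as a separate call, as sketched here, or inside the same prompt is something to experiment with.

```python
import json
from typing import Callable

JUDGE_PROMPT = """You are grading an answer, not writing one.
Question: {question}
Reference context: {context}
Answer to grade: {answer}
Reply with JSON only: {{"grounded": true or false, "reason": "<one sentence>"}}"""


def judge(question: str, context: str, answer: str, call_model: Callable[[str], str]) -> dict:
    """Ask a model to grade an answer against the context; fail closed on bad output."""
    raw = call_model(JUDGE_PROMPT.format(question=question, context=context, answer=answer))
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        verdict = None
    if not isinstance(verdict, dict):
        return {"grounded": False, "reason": "judge output was not valid JSON"}
    return {
        "grounded": verdict.get("grounded") is True,
        "reason": str(verdict.get("reason", "")),
    }


def fake_judge_model(prompt: str) -> str:
    """Stub judge: approves only answers that mention the 30-day window."""
    graded = next(l for l in prompt.splitlines() if l.startswith("Answer to grade:"))
    ok = "30 days" in graded
    reason = "matches context" if ok else "not supported by context"
    return json.dumps({"grounded": ok, "reason": reason})


context = "Refunds are accepted within 30 days."
good = judge("What is the refund window?", context, "You have 30 days.", fake_judge_model)
bad = judge("What is the refund window?", context, "You have 90 days.", fake_judge_model)
print(good, bad)
```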

What this means for UX

First off, “UX” may be becoming a skill instead of a profession:

Modern teams will be organized by skills, not roles – instead of a traditional role breakdown (UX, PM, Dev, Data Science, etc.), people will self-organize by skills (I want to talk to users, I can vibe code, etc.).

Which means UXers need to get involved in all aspects of agentic AI:

  • Identify low-hanging fruit workflows for AI agents (The entire Part 1 of our book is devoted to this)

  • Assist the team in identifying use cases and constructing RAG files that address them (more on exactly how to do this in the upcoming articles right here)

  • After the RAGs are added to the agent path, help the team evaluate the execution quality: How long/deep is the summary, and what is included? Does the agent create the right execution plan? etc. (The book has a set of 3 chapters addressing exactly how to do that and what to evaluate)

  • UXers are best equipped to evaluate the agentic output – that means you need to use the AI-Containing, User-Centered RITE process to drive continuous improvement while collaborating with the customers and the rest of your team. (We show you exactly how to do that in the book in Part 3)

  • There is very little UI in most agents at the moment — that will evolve, but it’s clear UI is not the focus as we wrote in the viral “Titanic” post: https://www.linkedin.com/feed/update/urn:li:activity:7310349394065666050/

  • Start working in Arize instead of only in Figma. (Yes, Figma is a Titanic, we are going to continue to give you the tools you need to rescue yourself… If you like.)

Finally, brave UXers can role-play a multi-agent breakdown in a single LLM thread. Then vibe code (or just plain “code”) this type of single-thread workflow to create a vision prototype video recording of how the execution should work when the agent runs the flow autonomously or with a human in the loop. After that, you can work with devs to fit your optimized RAGs into one of the agentic frameworks for production.

… And while we write up all that advanced goodness for your delectation, you should really get the book, if you haven’t already:

📘 #1 New Release in Data Modeling and Design on Amazon: https://amzn.to/4l2ShyL 

This book is your hands-on companion to everything you love about our blog and our famous “How to Become Indispensable in 2025” guide — and so much more.

You won’t just read about UX for AI techniques: Storyboarding, Digital Twin, Value Matrix, AI-first RITE…

  • You’ll build them.

  • You’ll practice them.

  • You’ll master them.

💥 Built on 35+ real-world AI case studies — what worked, what failed, and why (extreme snark warning!)

🛠️ A full set of battle-tested UX tools with step-by-step exercises and practical examples

🚀 The exact frameworks UX leaders are using today to shape UX for LLMs, copilots, and AI Agents

💡 Learn how to lead cross-functional teams to pick the right use cases, shape the model, define the data, and deliver actual impact.

This book is designed for people who are ready to lead.
Not theory. Not fluff. Just the tools and thinking you need to become indispensable in the new AI normal.

📦 #1 New Release in Data Modeling and Design on Amazon: https://amzn.to/4l2ShyL 

Peace,

Greg
