“Do you think this potted plant would look better somewhere else?”
“Oh? What’s on your mind? I thought you liked it where it was.”
“It’s not that I don’t… I just feel like nothing looks right lately. I guess I’m just looking for a change of scenery.”
…
When we talk things over with friends, partners, or family, we rarely expect an immediate, clear-cut answer. The conversation often begins with a vague impulse or a half-formed idea.
They might build on your thought: “How about by the window? The sunlight might help it thrive.” Or they might probe deeper, sensing the motive behind the question: “Have you been feeling a bit drained lately? It sounds like you want to move more than just the plant — maybe you’re looking to bring something new into your life.”
Human conversation is a dynamic, exploratory journey. It’s not about simply transferring information. It’s about two people taking a fuzzy idea and, through a back-and-forth exchange, co-discovering, refining, and even shaping it into something entirely new — uncharted territory neither had imagined at the start. This is a process of Intent Co-construction.
As our relationship with AI evolves from “tool” to “partner,” we find ourselves sharing more of these ambiguous intentions. To meet this changing need, how can we learn from our human relationships to design interactions that foster deep connection and co-construct intent with our AI counterparts?
Reading between the lines with multimodality
Picture a perfect sunny weekend. You’re driving with the windows down, your favorite album playing, on your way to that new park you’ve been wanting to visit.
You tell your voice assistant your destination. It instantly displays three routes, color-coded by time and traffic, and helpfully highlights the one its algorithm deems fastest.
You take its advice without a second thought, but halfway there, something feels wrong.
While it may be the shortest route in terms of distance, it involves constant lane changes on streets barely wide enough for one car. You’re flanked by parked cars whose doors could swing open at any moment and kids who might dart into the road. Your nerves are frayed, your palms are sweating on the wheel, and you find yourself muttering about the cramped, crowded conditions as you nearly rear-end an e-bike.
Through it all, the navigation remains indifferent, stubbornly sticking to its original recommendation.
Yes, multimodal inputs allow us to give clearer commands. But when our initial command is incomplete, we still end up with a generic solution. A true partner would think:
“They seem stressed by this complex route. Should I suggest a longer but easier alternative?”
“I’m detecting swearing and frequent hard braking. Is this road too difficult for them to handle?”
…
The real breakthrough isn’t just understanding what users say, but how they say it — combining their words with environmental cues and situational context. Do they type fluently or constantly backspace? Do they circle a data point with confidence or hesitation? These subconscious signals often reveal our true state of mind.
The AI we need isn’t just one that can process text, voice, images, and gestures simultaneously. We need a partner that, while respecting our privacy, can keenly and continuously read between the lines, detecting the unspoken truth in the dissonance between these multimodal signals.
“To design the best UX, pay attention to what users do, not what they say. Self-reported claims are unreliable, as are user speculations about future behavior. Users do not know what they want.”
— Jakob Nielsen
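To make this concrete, here is a minimal sketch of what reading between the lines could look like in the navigation scenario above. Everything in it is an assumption for illustration: the signal names, the weights, and the threshold are invented, and a real system would learn them on-device from far richer data while keeping it private. The point is only that the mismatch between channels, a quiet explicit request alongside hard braking and muttered complaints, can itself be treated as a signal.

```python
from dataclasses import dataclass

@dataclass
class DrivingSignals:
    """Hypothetical per-minute signals from voice, cabin, and vehicle sensors."""
    stated_destination_changed: bool    # explicit channel: what the driver actually asks for
    hard_brakes_per_min: float          # implicit channel: how the drive is actually going
    negative_utterances_per_min: float  # muttering, sighs, swearing detected on-device
    grip_pressure_ratio: float          # steering-wheel grip relative to this driver's baseline

def detect_dissonance(s: DrivingSignals) -> bool:
    """Flag a mismatch between the explicit and implicit channels.

    The driver has not asked for anything new, yet several implicit channels
    suggest stress. That gap, not any single reading, triggers a check-in.
    """
    stress_score = (
        0.4 * min(s.hard_brakes_per_min / 3.0, 1.0)
        + 0.4 * min(s.negative_utterances_per_min / 2.0, 1.0)
        + 0.2 * min(s.grip_pressure_ratio / 1.5, 1.0)
    )
    return (not s.stated_destination_changed) and stress_score > 0.7

def copilot_response(s: DrivingSignals) -> str:
    """Offer, never impose: the driver stays in control of the decision."""
    if detect_dissonance(s):
        return "This street looks stressful. Want a slightly longer route on wider roads?"
    return ""

# Usage: complaints plus hard braking, but no new request from the driver.
signals = DrivingSignals(
    stated_destination_changed=False,
    hard_brakes_per_min=4.0,
    negative_utterances_per_min=3.0,
    grip_pressure_ratio=1.6,
)
print(copilot_response(signals))  # offers the easier route as a question
```

The design choice worth noting is that detecting an unspoken state earns the assistant the right to ask a question, not to reroute on its own; the driver still decides.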
Now, let’s take this one step further. Imagine an AI that, through multimodal sensing, has perfectly understood our true intent. If it simply serves up a flawless answer like a data report, is that really the best way for us to learn and grow?
Information as a flowing process
Let’s rewind and take that drive to the park again. This time, instead of an AI, your co-pilot is a living, breathing friend.
When you reach that same algorithm-approved turnoff, you tense up at the sight of the narrow lane. Your friend notices immediately and guides you through the challenge:
“This road looks rough. Let me guide you to a better one.”
“Turn right just after that coffee shop up ahead.”
“We’re almost there. See the people with picnic blankets?”
…
The journey is seamless. You realize your friend didn’t necessarily give you more information than the AI, but they delivered the right information at the right time, in a way that made sense in the moment.
Similarly, AI-generated information can be delivered through diverse media; text is by no means the only way. Think about a recent conversation that stuck with you. Was it memorable for its dictionary-like volume of facts? More likely, you were captivated by how the story was told, in a way that helped you visualize it. This power of visualization is rooted in metaphor.
“…we often think we use metaphors to explain ideas, but I believe good metaphors don’t explain but rather transform how our minds engage with ideas, opening entirely new ways of thinking.”
— The Secret of Good Metaphors
Files that look like paper, directories that look like folders, icons for calculators, notepads, and clocks — back in the earliest days of personal computing, designers used graphical metaphors based on familiar physical objects to make strange and complex command lines feel intuitive and accessible.
Metaphors work by tapping into our past experiences and connecting them to something new, bridging the gap to understanding. So, how does this apply to AI output?
Think about how we typically use an AI to explore a complex topic. We might ask it a direct question, have it synthesize industry reports, or feed it a pile of research to summarize. Even with the AI’s best efforts, clicking open a result to find a wall of text can feel overwhelming.
We can’t see its thought process. We don’t know if it considered all the angles we did. We don’t know where to begin. What we truly need isn’t just a final answer, but to feel like a friend is walking us through their thinking — transforming information delivery from a static report into a guided process of shared discovery.
But what if, even after seeing the process, the answer is still too abstract?
We naturally understand information through different forms: charts for trends, diagrams for processes, and stories told through sound and images. Any good communication orchestrates different dimensions of information into a presentation that conveys meaning more effectively.
NotebookLM (Google): Can autonomously transform source materials into various accessible formats like illustrated videos, podcasts, or mind maps, turning passive learning into active co-creation.
However, there’s a risk. When an AI uses carefully crafted metaphors to present an output that is clear, beautiful, and logically flawless, it can feel like an unchallengeable final answer.
Is that how our conversations with human partners work?
When a friend shares an idea, we don’t just agree. Our responses are filled with questions, doubts, and counter-arguments. Sometimes, a single insightful comment can change the direction of an entire project. A meaningful dialogue is less about the period at the end of a sentence and more about the comma or the question mark that keeps the conversation going.
Progressive construction through dialogue and memory
“Let’s go hiking this weekend. I want to challenge myself.”
“Sounds good! But remember last time? You said your knee was bothering you halfway up. Are you sure? We could find an easier trail.”
“I’m fine, my knee’s all better.”
“Don’t push yourself…”
…
A true partner remembers your past knee injury. They remember you’re directionally challenged and that you’re not a fan of reading long texts. This long-term memory allows your interactions to build on a shared history, moving beyond simple Q&A into a state of mutual understanding where you can anticipate each other’s needs without lengthy explanations.
For an AI to co-construct intent like a partner, persistent memory is not just a feature — it’s essential.
Agent failures aren’t only model failures; they are context failures.
— The New Skill in AI is Not Prompting, It’s Context Engineering
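As a rough illustration of what context engineering might mean for a partner-like AI, the sketch below keeps a small store of long-term memories and assembles the relevant ones into the working context before any reply is generated. The names (MemoryEntry, PartnerMemory, build_context), the keyword-overlap retrieval, and the prompt wording are all assumptions made for this example; a production system would use embeddings, summaries, and a real model call, but the shape of the idea is the same.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    topic: str      # e.g. "knee injury", "prefers short summaries"
    note: str       # what the partner remembers
    weight: float   # how strongly this should shape future conversations

@dataclass
class PartnerMemory:
    entries: list[MemoryEntry] = field(default_factory=list)

    def remember(self, topic: str, note: str, weight: float = 1.0) -> None:
        self.entries.append(MemoryEntry(topic, note, weight))

    def recall(self, message: str, top_k: int = 3) -> list[MemoryEntry]:
        """Naive relevance: keyword overlap scaled by importance.
        A real system would use embeddings, but the principle is the same."""
        words = set(message.lower().split())
        scored = [
            (e.weight * sum(w in e.note.lower() or w in e.topic.lower() for w in words), e)
            for e in self.entries
        ]
        return [e for score, e in sorted(scored, key=lambda p: -p[0]) if score > 0][:top_k]

def build_context(memory: PartnerMemory, message: str) -> str:
    """Context engineering in miniature: the model sees not just the message,
    but the shared history that should color its reply."""
    recalled = memory.recall(message)
    history = "\n".join(f"- {e.topic}: {e.note}" for e in recalled) or "- (nothing relevant)"
    return (
        "Shared history that may matter here:\n"
        f"{history}\n\n"
        f"User just said: {message}\n"
        "Before answering, check whether the history changes what a good partner would ask."
    )

# Usage: the hiking conversation from above.
memory = PartnerMemory()
memory.remember("knee injury", "Knee pain halfway up the last hiking trip", weight=2.0)
memory.remember("reading habits", "Prefers short summaries over long reports")
print(build_context(memory, "Let's go hiking this weekend, I want to challenge myself"))
```

Nothing about this is sophisticated, and that is the point: the partner’s “Are you sure? Remember your knee?” can only happen if something like build_context runs before the model speaks, so the shared history is in the room when the reply is formed.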
But memory alone isn’t enough; we need to use it to foster deeper exploration. As we said at the start, the goal isn’t to get an instant answer, but to refine our intentions and formulate better, more insightful questions.
When a vague idea or question surfaces, we want an AI that is more than an answer machine. We want a true thinking partner: one that can reach beyond the immediate context, draw on our shared history to initiate meaningful dialogue, and guide us as we peel back the layers of our own thoughts. In this progressive, co-constructive process, it helps us finally articulate what we truly intend.