Finally, an interface for caring
After 30 years of the self-service web, it's time for a UI that enables one user to configure a voice-AI agent to serve someone else.
Since the advent of the web, our on-ramps to connected apps and services have evolved from web browsers to mobile apps on smartphones to voice on smart speakers. Each new interface, however, has used the same model: one user, on one device, starts and directs every session.
Somehow, the idea that someone might configure a service to benefit themselves and another person (e.g., a daughter in Boston creating a voice greeting to brighten her mother’s day in Tampa) has barely been explored, let alone developed. And yet it’s been hiding in plain sight.
Thanks to LLMs, voice and conversational AI have reached the point where the proactive half of voice, in which the app speaks first, can be configured by one person to help another.
The half that already works better than ever is “reactive voice”. You speak, the agent responds. Siri, Alexa, and even today’s latest LLMs like ChatGPT and Gemini all use this prompt-response model. Ask a question, get an answer. Set a timer or play a song just by talking naturally. At long last, voice-AI agents are truly useful, even powerful — but still using only half of how humans talk to and with one another.
The missing half is “proactive voice”, where the voice-AI agent starts a dialog without being summoned by a user. Proactive voice mode can be used if and when something matters. Just like a human, an AI agent can be trained to “know” when the time is right, or not, to start a conversation with someone in a far-away room.
Unlike touchscreens, voice is not device-bound. It’s ambient, present and listening in a space, like another person would be. Voice is also unique in its ability to move from the periphery of our attention to the center when something matters, and when we request it or give it permission to do so.[1]
When a screen’s in your pocket or across the room, it’s invisible and uninvolved. And all reactive services, even traditional voice-AI agents, respond only after a user logs in, starts an app or says something.
Whereas a screen can’t reach out and touch someone, proactive voice can be present and engage 24x7 — when it’s configured to by a host or a user.
But who’s at the helm?
Enabling a voice-AI agent to speak proactively, however, raises a question: Who decides what proactive voice agents say and when?
We see three options. The first is the tech platforms. Amazon and Google sometimes push audio of their own making to their subscribers, but being providers of consumables and search, they generally push ads and upsells out of the blue. That is, they use proactive voice just as they’ve always used proactive texts and emails — to serve their commercial interests.
The second is configuring an AI agent to speak proactively on its own. Several products designed for older adults take this route: after setup, the agent infers moods and needs and makes educated guesses about when and what to say to engage with an aging woman proactively. More personal than an ad, and potentially much more helpful.
The third is to let loving caregivers tell the AI agent what to say and when rather than rely on population-trained algorithms that may not reflect her unique blend of needs and relationships. The people who know and love the person on the other end — a daughter, a spouse, a close friend, even a paid companion — can tell a voice-AI agent what to say, when and how. Sometimes as a verbatim script; sometimes as guidance that the agent follows on their behalf.
To care for someone is to be proactive
Human caregivers don’t wait to be summoned. They reach out, check in, and follow up on their own. They often ask questions like: Have you eaten yet? Did you sleep well? And, without being asked, they remind their loved ones of things they might forget: Your doctor’s appointment is at ten today. The person being cared for often can’t remember these things themselves; it’s one of the many reasons they may need help.
Though AI agents can now generate natural language in real time, we believe that loving caregivers, with their aging care recipients, should define the context, and establish clear “guardrails,” for what voice-AI agents will say and do. A caregiving team can tell the AI agent who the care recipient is, what matters to her, what tone feels right and, critically, what the agent should do when something goes wrong. The agent reminds her to leave a light on at night to prevent falls and, if she falls, knows whom to call or text, and in what order.
For the person on the receiving end — the 85-year-old with arthritic hands or the parent who’s alone and whose memory is slipping — two-way, proactive and reactive voice is the first interface that doesn’t get in the way. No screens. No apps. No menus.
Just a familiar voice in the room with them, checking in on behalf of family and friends, and conversing with them naturally whenever they want to engage. The upbeat greeting is for the lonely mom. The text message to her family that she answered it, and sounded like herself, is for them.
Two-way voice isn’t only a new on-ramp to the same old web. It’s a new architecture. One that enables a person to configure a 24x7 service for themselves and for others.
The first and only interface humans ever had, voice, finally put to use for the most basic of human needs — watching over and caring for someone you love.
[1] Mark Weiser and John Seely Brown coined "calm technology" in two papers at Xerox PARC (1995, 1996). Their central claim: well-designed technology lives in the periphery of attention and moves to the center when it matters, then recedes again. Proactive voice is the first UI/architecture that can do this based on the system's initiative rather than the user's.
