Building agents that live in your messages

Tags: agents, ai, messaging, architecture

I'm a big fan of agents that live in the messaging apps people already use — iMessage, Telegram, WhatsApp, Slack. No new app to download, no new tab, no "open the dashboard." You just text it, the way you text a friend. My team has been building these surfaces, and making an agent feel conversational — feel like a person and not a form with a chat skin — turns out to be a genuinely interesting product and engineering problem.

It's actually two problems wearing one coat. In a one-on-one channel like iMessage, the hard part is the channel itself: it's almost featureless, so you have to be clever about how you show information. In a shared channel like Slack or Discord, the channel is rich but the context is a minefield — one bot serving many people across many orgs, each with their own permissions. Let's do both.

The case for texting

The pitch is simple and a little subversive: people don't want another app, they want to text. Poke, the assistant from The Interaction Company, has built its whole identity around this — co-founder Marvin von Hagen's line is that users "want to text an AI the same way they text their partner, friends, parents, and colleagues." It runs in iMessage, SMS, Telegram and WhatsApp, and in June 2026 it became the first AI agent Apple approved for its Messages for Business platform. The bet is that the chat thread is the interface.

The catch is that not all chat threads are created equal.

The richness ladder

Here's what an agent can actually send on each platform. It's a steep ladder:

Figure: input richness across channels. iMessage is the thinnest surface, and the most popular one. It can send plain text, images, link previews, and tapback reactions, and that's the whole toolbox. It gives up buttons, menus, modals, callbacks, slash commands, and markdown, so every 'choice' comes back as free text you have to parse.

Slack's Block Kit is basically a GUI toolkit — buttons, every kind of select menu, date pickers, checkboxes, modals, a persistent Home tab. Telegram is nearly as rich and even lets you embed a whole web app. WhatsApp is deliberately spare — three reply buttons, one ten-row list — and wraps even that in a 24-hour reply window and template approvals.

And then there's iMessage, sitting at the bottom of the ladder and, of course, being the one everyone actually wants. There is no public API. The only sanctioned path, Apple Messages for Business, is gated behind an approved provider and an experience review, and it sends gray business bubbles, not the blue ones. For everyone else you get plain text, images, link previews, and tapback reactions. That's the entire toolbox.

Faking a UI in a text bubble

So you fake it. Every rich element has a low-tech stand-in, and the trick is doing it without latency or jank:

No buttons, no cards, no menus — so you fake them. A numbered list is a menu, block characters are a chart, a link preview is a tappable card, and a tapback is a yes. Long answers arrive as rapid-fire bubbles to feel like a person texting.

A numbered list is a menu — you present "1 / 2 / 3" and parse whatever the user texts back. Block characters (████░░) are a progress bar or a chart. A link preview is a tappable card: send a URL whose Open Graph image and title render a little hero, and you've got a button that looks like a button. A tapback ❤️ on a confirmation message is a yes — Poke's leaked prompt maps positive reactions to "yes" and negative ones to "no." For anything genuinely visual — a receipt, a table, a chart — you render a PNG server-side and send it as an image. It's the only reliable way to lay out structure.
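
A minimal sketch of the menu, progress-bar, and tapback stand-ins described above. Every name here (`render_menu`, `parse_choice`, `progress_bar`, `tapback_to_answer`, the sample options, and the reaction labels) is hypothetical and chosen for illustration. It is not Poke's implementation, and a real parser would likely hand ambiguous replies back to the model.

```python
import re

OPTIONS = ["Reschedule", "Cancel", "Keep as is"]  # hypothetical menu


def render_menu(options):
    """Render options as a numbered plain-text menu."""
    return "\n".join(f"{i}. {label}" for i, label in enumerate(options, start=1))


def parse_choice(reply, options):
    """Map a free-text reply to one option, or None if it is ambiguous."""
    text = reply.strip().lower()
    number = re.search(r"\b(\d+)\b", text)
    if number:
        index = int(number.group(1))
        return options[index - 1] if 1 <= index <= len(options) else None
    # Fall back to a loose label match; accept only an unambiguous hit.
    matches = [o for o in options if o.lower() in text or (text and text in o.lower())]
    return matches[0] if len(matches) == 1 else None


def progress_bar(fraction, width=10):
    """Draw a progress bar out of block characters."""
    fraction = max(0.0, min(1.0, fraction))
    filled = round(fraction * width)
    return "█" * filled + "░" * (width - filled)


# Assumed reaction labels; the transport would supply its own names.
POSITIVE = {"love", "like"}
NEGATIVE = {"dislike"}


def tapback_to_answer(reaction):
    """Treat a positive tapback as yes, a negative one as no, anything else as unknown."""
    if reaction in POSITIVE:
        return True
    if reaction in NEGATIVE:
        return False
    return None
```

Returning `None` for anything ambiguous is the design choice that matters here: the agent can re-ask in plain language instead of guessing.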

The other half of "feels human" is cadence. Real people don't send one wall-of-text paragraph; they fire off three quick bubbles. So you chunk long replies into rapid succession. You show a typing indicator. And — my favorite detail — Poke has a "wait" tool that lets it silently discard a reply it was about to send, so it doesn't clutter the thread when staying quiet is the more human move. The bar isn't "correct," it's "didn't feel like a bot."
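
The chunking step can be sketched as a sentence-aware splitter. This is one possible approach, not a description of how any particular product does it: `chunk_into_bubbles` and the `max_chars` budget are assumptions, and sending each bubble with a typing indicator and a short pause is left to whatever transport you use.

```python
import re


def chunk_into_bubbles(text, max_chars=160):
    """Split a long reply into short bubbles, breaking only at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    bubbles, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The next sentence would overflow this bubble, so start a new one.
            bubbles.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        bubbles.append(current)
    return bubbles
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-thought, which seems closer to how a person would text.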

Architecture: split the personality from the work

Once you're past the surface, the interesting design question is how the agent is organized internally. The cleanest pattern I've seen — again, reverse-engineered from Poke — is to separate the voice from the labor:

Figure: Poke's shape (reverse-engineered from its leaked prompt): split the personality from the execution. One warm front agent; cold, persistent task agents behind it.

  you: one chat thread
    Interaction Agent: personality · context · the only voice you hear
      Execution Agents: persistent · zero personality
        Email agent: own history + tools
        Calendar agent: own history + tools
        Travel agent: own history + tools
  memory: conversation (summarized) · per-agent action log · external (your inbox as context)
  proactivity: a worker checks triggers + new email every minute, then wakes the agent that owns them

One Interaction Agent owns the personality and is the only thing the user ever hears. Behind it sit Execution Agents — persistent, specialized, and deliberately personality-free. The email agent has its own history and tools; the calendar agent has its own. The front agent delegates ("book this"), they do the work in parallel, and it synthesizes the results back into one warm reply. Memory is layered: the conversation gets summarized as it grows, each execution agent keeps its own action log, and your connected inbox acts as ambient long-term context. Proactivity is a background worker that checks for due reminders and new email every minute, then wakes whichever agent owns the trigger.
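
The delegate, run in parallel, then synthesize flow can be sketched with `asyncio`. This is a toy under stated assumptions: the class names mirror the terms above, but the methods, the `asyncio.sleep(0)` stand-in for real tool and model calls, and the canned reply are all hypothetical. It is not the reverse-engineered Poke code.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ExecutionAgent:
    """A task agent with its own action log and no personality."""
    name: str
    action_log: list = field(default_factory=list)

    async def run(self, task):
        await asyncio.sleep(0)  # stand-in for tool calls and model calls
        self.action_log.append(task)
        return f"{self.name}: done ({task})"


@dataclass
class InteractionAgent:
    """The only voice the user hears; it delegates and then synthesizes."""
    workers: dict

    async def handle(self, tasks):
        # Fan the tasks out to the owning agents and wait for all of them.
        results = await asyncio.gather(
            *(self.workers[name].run(task) for name, task in tasks.items())
        )
        return self.synthesize(results)

    def synthesize(self, results):
        # In a real system this is where tone and brevity would be applied.
        return "All set. " + "; ".join(results)


workers = {"email": ExecutionAgent("email"), "calendar": ExecutionAgent("calendar")}
front = InteractionAgent(workers)
reply = asyncio.run(front.handle({"email": "draft reply", "calendar": "book 3pm"}))
```

Because each execution agent is a plain object with one `run` method, it can be swapped or tested without touching the front agent.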

It's a good shape because it keeps the thing that's hard to get right — tone, brevity, knowing when to shut up — in one place, and makes the task agents swappable and testable.

The shared-channel problem is a different shape

Now flip to Slack or Discord, where the agent is supposed to behave like a teammate. Suddenly the channel is rich but the context is treacherous. One deployed bot serves thousands of users across many orgs, all at once, in the same process. The single most important rule is: never key state by user alone. The reliable key is a composite — enterprise · team · channel · thread · user — because a raw Slack user ID is only unique inside its own workspace. Get this wrong and you get context bleed: one person's history surfacing in another person's reply. That's not a bug, that's an incident.
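
A small sketch of the composite key, assuming an in-memory dict as the state store. `ContextKey`, `remember`, and `history` are hypothetical names, and the IDs in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextKey:
    """Composite state key: a user ID alone is not unique across workspaces."""
    enterprise: Optional[str]
    team: str
    channel: str
    thread: Optional[str]
    user: str


# Hypothetical in-memory store; a real deployment would use a shared database.
store = {}


def remember(key, message):
    store.setdefault(key, []).append(message)


def history(key):
    return list(store.get(key, []))
```

Two people who happen to share the raw ID `U123` in different workspaces get separate histories, because the frozen dataclass hashes on all five fields:

```python
alice = ContextKey(None, "T_ACME", "C1", None, "U123")
bob = ContextKey(None, "T_GLOBEX", "C1", None, "U123")
remember(alice, "my salary is ...")
assert history(bob) == []
```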

The scarier failure is about permissions. Your bot holds a broad token. A user asks it to do something. If the agent acts with the bot's powers instead of the user's, you've built a confused deputy — anyone can borrow the bot's authority just by asking nicely.

Figure: two overlapping sets, "user can" and "agent can." The overlap is what the agent may do.

The safe rule. Effective authority is the strict intersection of what the user can do and what the agent can do. Never the union. The whole security model of a teammate-bot is one set operation.

The rule is a single set operation: the agent's effective authority is the intersection of what the user can do and what the agent can do. Never the union. Swap one for the other and the whole security model flips from "fine" to "anyone can delete your channels."
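
In code, the rule is literally one operator. The sketch below assumes permissions are modeled as sets of action strings; the action names and the `authorize` helper are hypothetical.

```python
def effective_authority(user_can, agent_can):
    """The agent may do only what BOTH the user and the agent are allowed to do."""
    return frozenset(user_can) & frozenset(agent_can)


def authorize(action, user_can, agent_can):
    """Raise if the requested action falls outside the intersection."""
    if action not in effective_authority(user_can, agent_can):
        raise PermissionError(f"not allowed: {action}")


# Hypothetical permission sets for illustration.
user_can = {"read_channel", "post_message"}
agent_can = {"read_channel", "post_message", "delete_channel"}

authorize("post_message", user_can, agent_can)  # passes silently
```

Replacing `&` with `|` is the confused deputy: `delete_channel` would become available to any user who asks, because the bot's token allows it.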

The gotchas that bite everyone

These are the ones that show up in production, on both platforms, usually at 2am:

The 3-second clock. Slack and Discord drop the interaction if you don't ack in 3s. Fix: ack instantly, enqueue the real work, then edit the reply. Discord: defer (type 5) for a 15-min window.

Duplicate replies. A slow first ack triggers Slack's 3 retries, and the user gets the same answer 3×. Fix: dedupe on event_id with a short TTL. Process once, ignore the retries.

Bot talking to itself. The bot's own message re-fires its handler. Infinite loop. Fix: drop events where bot_id / is_bot is set before you do anything else.

PII in a public channel. Every reply is visible to the whole channel. Fix: send sensitive answers ephemerally (postEphemeral / Discord flag 64). Set it on the first response.

Context bleed. Key state by user alone and one person's history leaks into another's reply. Fix: composite key: enterprise · team · channel · thread · user. Per-user memory stays per-user.

The @everyone bomb. A model that writes <!channel> can ping a whole org by accident. Fix: treat mass-mention as a privileged action. Strip broadcast tokens from generated text.
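
Stripping broadcast tokens from generated text can be sketched with two regular expressions. This is a minimal filter under assumptions: `strip_broadcasts` is a hypothetical helper, and it covers only Slack's `<!channel>`, `<!here>`, and `<!everyone>` forms plus Discord's `@everyone` and `@here`. On Discord, restricting mentions through the message's `allowed_mentions` field is probably the sturdier control, with a text filter like this as a second layer.

```python
import re

# Slack special mentions, with or without a "|label" suffix.
SLACK_BROADCAST = re.compile(r"<!(?:channel|here|everyone)(?:\|[^>]*)?>")
# Discord mass mentions, skipping things like email addresses.
DISCORD_BROADCAST = re.compile(r"(?<!\w)@(?:everyone|here)\b")


def strip_broadcasts(text):
    """Remove mass-mention tokens from model output before it is posted."""
    text = SLACK_BROADCAST.sub("", text)
    text = DISCORD_BROADCAST.sub("", text)
    # Tidy up the gaps left behind by removed tokens.
    return re.sub(r"\s{2,}", " ", text).strip()
```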

The 3-second ack (Slack and Discord both enforce it) is the one that breaks naive implementations first: do your LLM work inline and you'll blow the window, the platform retries, and now the user has three copies of the same answer. Ack instantly, enqueue, then edit the message in place. Dedupe on the event ID. Drop your own bot's messages before you process them or you'll talk to yourself forever. None of these are hard — they're just invisible until they aren't.
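
The ack-first pattern, the dedupe, and the self-message guard fit in one small handler. The sketch below assumes a Slack-style Events API payload with a top-level `event_id` and a `bot_id` on the inner event when a bot sent the message. `SeenEvents`, `handle_event`, and the in-process queue are hypothetical names standing in for your web framework and job system.

```python
import queue
import time


class SeenEvents:
    """Remember event IDs for a short TTL so retries are processed only once."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}

    def first_time(self, event_id):
        now = self._clock()
        # Forget IDs older than the TTL so the table does not grow forever.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self._ttl}
        if event_id in self._seen:
            return False
        self._seen[event_id] = now
        return True


work_queue = queue.Queue()  # a worker drains this and edits the reply later
seen = SeenEvents()


def handle_event(payload):
    """Return the HTTP status to send right away; no model work happens here."""
    event = payload.get("event", {})
    if event.get("bot_id"):
        return 200  # a bot's message (possibly our own): ack and drop
    if not seen.first_time(payload["event_id"]):
        return 200  # a retry of something already queued: ack and drop
    work_queue.put(payload)
    return 200
```

The handler always returns immediately, so the platform's clock is never waiting on the model. In a multi-process deployment the seen-set would need to live in shared storage rather than in memory.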

So: assistant or teammate?

That's the real split. A one-on-one agent in iMessage is an assistant — the work is making a featureless channel feel warm and immediate. A shared-channel agent is a teammate — the work is juggling identity, context, and permissions so it's helpful without being dangerous. Different problems, but the same north star underneath: the agent should feel like a person who happens to be very good at their job, not software you're operating. Get the plumbing right and that's exactly what it feels like.


Sources: Poke / TechCrunch and the OpenPoke reverse-engineering for architecture; Slack and Telegram / WhatsApp docs for platform capabilities; Arcade on agent authorization. Poke architecture details are reverse-engineered from a leaked prompt, not officially confirmed — treat them as credible, not gospel.