Most people's experience with AI assistants is a chat box in a browser tab. You type a question, read the answer, close the tab. It works, but you are using it like a search engine — not an assistant. An assistant should know what you care about, do things on your behalf, and occasionally tell you something you did not know to ask.
A wave of open-source projects, among them OpenClaw (docs.openclaw.ai), QwenPaw (github.com/agentscope-ai/QwenPaw), and Hermes Agent (hermes-agent.nousresearch.com), is trying to build exactly this. This post tackles two questions: what is a personal assistant, really, and how should we build one? I will argue that the usual "a chatbot that helps you operate your PC" framing is too narrow. A personal assistant is better understood as an agent system that can interact directly with three kinds of callers: a human (typing into a console, or messaging it through channels like Slack); another agent (a research agent asking it to rank ten new papers, or a project-management agent asking it to draft a client email); or an application (an IDE embedding it for code reasoning, a CRM generating outreach copy, or a site like Redfin supporting it so the agent can work with listings, comps, and tours on the user's behalf). It also runs scheduled background jobs on its own, pushing results to you (or to any of those callers) rather than waiting to be asked. The core is the same across all of these; only the interface changes. I will also share a demo I built on top of QwenPaw: Copilot Digest.
Personal assistants vs. chatbots
When I say "personal assistant," I mean something specific: a program that runs persistently on your machine (or a server you control), has access to local files and tools, connects to the messaging channels you already use, and can take actions without you being in the loop. OpenClaw, QwenPaw, and Hermes Agent all support such channels: Slack, Discord, Telegram, WhatsApp, iMessage, email, and more (OpenClaw, 2025; QwenPaw, 2025; Nous Research, 2025). Increasingly they can also be reached over standard protocols (MCP for LLM-host applications like Claude.ai and IDEs, ACP for other agents), so non-human callers can plug in without any bespoke integration.
That alone is a nice quality-of-life improvement, but the more interesting part is what happens when you are not talking to it.
Heartbeat and cron
These projects typically offer some form of scheduled, proactive behavior. They call it a heartbeat: you write questions into a markdown file, set an interval — say every two hours between 8 AM and 10 PM — and the agent answers on schedule, pushing replies to whatever channel you last chatted on. You wake up to a message: "Three new preprints on retrieval-augmented generation were posted overnight. Here is a ranked summary." You did not ask. It just knew to check.
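To make the schedule concrete, here is a minimal sketch of how the tick times for a heartbeat could be computed from an interval and an active window. The function name `heartbeat_ticks` and the choice to treat the end of the window as exclusive are assumptions for illustration, not any project's actual API.

```python
from datetime import date, datetime, time, timedelta

def heartbeat_ticks(day, start, end, interval):
    """Return tick times on `day`, from `start` (inclusive) to `end` (exclusive)."""
    tick = datetime.combine(day, start)
    stop = datetime.combine(day, end)
    ticks = []
    while tick < stop:
        ticks.append(tick)
        tick += interval
    return ticks

# Every two hours between 8 AM and 10 PM, as in the example above.
ticks = heartbeat_ticks(date(2025, 1, 6), time(8, 0), time(22, 0), timedelta(hours=2))
print([t.strftime("%H:%M") for t in ticks])
# ['08:00', '10:00', '12:00', '14:00', '16:00', '18:00', '20:00']
```

A real runtime would sleep until the next tick and then run the questions from the markdown file through the model; that part is omitted here.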
Separate cron systems let you schedule independent jobs, each with its own timing and delivery target. Morning digest at 8 AM. Compliance check on Fridays. PR review reminder before standup. Together these features turn the agent from something you pull from into something that pushes to you.
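As a rough sketch of what "each with its own timing and delivery target" might look like, the snippet below models the three example jobs as data and checks which are due at a given minute. The `Job` fields, the target strings, and the specific times for the Friday check and the standup reminder are all hypothetical; the real projects have their own job formats.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Job:
    name: str
    hour: int
    minute: int
    weekdays: frozenset  # datetime.weekday() values, Monday == 0
    target: str          # where the result is delivered (hypothetical format)

EVERY_DAY = frozenset(range(7))
WORKDAYS = frozenset(range(5))

JOBS = [
    Job("morning-digest", 8, 0, EVERY_DAY, "slack:#digest"),
    Job("compliance-check", 16, 0, frozenset({4}), "email:me"),    # Fridays; time assumed
    Job("pr-review-reminder", 9, 45, WORKDAYS, "slack:#team"),     # before an assumed 10:00 standup
]

def due_jobs(now, jobs=JOBS):
    """Return the jobs whose weekday, hour, and minute match `now`."""
    return [
        job for job in jobs
        if now.weekday() in job.weekdays and (now.hour, now.minute) == (job.hour, job.minute)
    ]

friday_afternoon = datetime(2025, 1, 10, 16, 0)
print([job.name for job in due_jobs(friday_afternoon)])  # ['compliance-check']
```

Keeping the delivery target on the job itself is what lets one agent push a digest to one channel and a reminder to another.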
This is a cool concept — genuinely useful when it works. But living with it day to day reveals friction that is easy to underestimate from the outside.
Cost and safety
Token consumption
Every heartbeat tick is a full LLM inference call. Every cron job is a conversation turn. A heartbeat firing every 30 minutes across a 14-hour active window tops out at 28 calls per day — before you have asked a single question yourself. If you are using a cloud model (and most people are, because for long-horizon agentic tasks cloud models are still more reliable than anything you can run on consumer hardware), the cost accumulates fast. Depending on the model and context length, a single always-on agent can easily cost tens of dollars per month in API fees for scheduled activity alone.
You can mitigate this with shorter context, cheaper models, or longer intervals, but there is a fundamental tension: the more proactive and context-aware you want the agent to be, the more tokens it burns. There is no free lunch here.
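The arithmetic is easy to check for your own setup. The sketch below reproduces the 28-calls-per-day figure and turns it into a monthly estimate. The token counts and per-million-token prices are placeholder assumptions, not any provider's actual rates; substitute your own.

```python
def heartbeat_calls_per_day(active_hours, interval_minutes):
    """Number of heartbeat ticks that fit in the active window."""
    return int(active_hours * 60 // interval_minutes)

def monthly_cost_usd(calls_per_day, input_tokens, output_tokens,
                     usd_per_m_input, usd_per_m_output, days=30):
    """Estimated monthly API cost for scheduled calls only."""
    per_call = (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000
    return calls_per_day * days * per_call

calls = heartbeat_calls_per_day(active_hours=14, interval_minutes=30)

# Assumed: 20k tokens of context in, 500 tokens out, $3 / $15 per million tokens.
cost = monthly_cost_usd(calls, input_tokens=20_000, output_tokens=500,
                        usd_per_m_input=3.0, usd_per_m_output=15.0)
print(f"{calls} calls/day, about ${cost:.2f}/month")
# 28 calls/day, about $56.70/month
```

The three mitigations map directly onto the arguments: shorter context lowers `input_tokens`, a cheaper model lowers the two prices, and a longer interval lowers `calls_per_day`.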
Safety
These agents have real tools — file read/write, shell execution, web browsing. Each project takes a different approach to containment: QwenPaw layers pattern-based tool guards, file-path restrictions, and skill security scanning (QwenPaw security docs); OpenClaw uses DM pairing, allowlists, and optional Docker sandboxing (OpenClaw security docs); Hermes Agent offers six execution backends (local, Docker, SSH, Daytona, Singularity, Modal) with container hardening and isolated subagents (Nous Research, 2025).
These are meaningful protections. They are also not bulletproof. Pattern-based detection has blind spots. Prompt injection — malicious input that tricks the agent into unintended actions — remains an open problem (Greshake et al., 2023). Running one of these agents in production means accepting some operational overhead: monitoring logs, reviewing tool calls, keeping rules up to date.
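To see why pattern-based detection has blind spots, consider a toy guard like the one below. The patterns are invented for illustration and are not QwenPaw's actual rule set; the point is only that a rule written for one spelling of a dangerous command can miss an equivalent spelling.

```python
import re

# Hypothetical deny-list of shell command patterns.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),                # recursive force delete from root
    re.compile(r"\bcurl\b.*\|\s*(sh|bash)\b"),    # pipe a download straight into a shell
]

def is_blocked(command):
    """Return True if any deny-list pattern matches the command string."""
    return any(pattern.search(command) for pattern in BLOCKED_PATTERNS)

print(is_blocked("rm -rf /"))      # True: caught by the first pattern
print(is_blocked("ls -la"))        # False: harmless, allowed
print(is_blocked("rm -r -f /"))    # False: same effect as rm -rf /, but not matched
```

The last call is the blind spot: splitting the flags is enough to slip past the rule. This is why guards like this are usually layered with path restrictions and sandboxing rather than relied on alone.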
Where are we heading?
The ecosystem is extending the same core in two directions — one driven by users, one driven by the projects themselves.
On the user side, the main extension mechanism is skills: domain-specific capability packs (a research skill, a code-review skill, a customer-outreach skill) that plug into the agent runtime and teach it new tricks. Because skills live above the runtime, the assistant gets smarter at the edge without the underlying project having to ship a new release — anyone can extend their own assistant by writing or installing one.
On the provider side, the work is happening along two complementary axes. The first is interface support: more messaging channels for humans, and increasingly MCP and ACP landing across the ecosystem so agents and LLM-host applications can plug in without bespoke integration. The second is better agent harnesses: the runtime itself is getting more capable. Hermes Agent, for example, is self-improving — it distills completed complex tasks into reusable skills, iterates on those skills in use, and periodically persists what it has learned into MEMORY.md (Nous Research, 2025). Better tool-use loops, better memory, better safety rails — the floor keeps rising even as the ceiling keeps moving up.
A demo: Copilot Digest
To put this into practice, I built an assistant called Copilot Digest (source) on top of QwenPaw — think of it as a personalized knowledge podcaster that helps you digest what matters and stay up to date during dead time like commutes, walks, or chores. It ingests papers, articles, blog posts, and news you send to it but do not have time to read, then organizes, ranks, and summarizes them into a local knowledge base. You can browse a reading list, get ranked briefings ("what is new this week?"), read full article summaries, discuss specific papers in depth, capture notes and action items, and export compiled reports. Everything is stored as files on your machine — a workspace directory with an index, article summaries, work outputs, and exports.
With a cron job pointed at your RSS feeds or saved links, the knowledge base grows while you sleep. The agent does the reading, summarizing, and filing; you show up and ask what is new. This is the kind of task personal agents are built for — persistent, background work that a chatbot simply cannot do.
The first thing I wanted from Copilot Digest was to use it during the times I am not at a screen — commutes, walks, chores. Voice input alone does not cut it for that; I want a full hands-free, eyes-free conversation — speak, listen to the reply, keep going, all without touching the phone. That is exactly what Claude.ai's voice mode already does, and does well. Building a comparable voice-conversation layer on top of these agent projects (whose chat interfaces handle voice input at best, not a two-way spoken conversation) would be a project in itself.
I did not have to. If I could get Claude.ai to drive my Copilot Digest agent, the whole voice-mode experience would come along with it. MCP is the bridge: I expose Copilot Digest as an MCP server and register it in Claude.ai as a custom connector, and an LLM host application becomes the voice-mode front-end for my local agent. The agent itself does not have to change. Same core, a different consumer, and voice conversation suddenly works.
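To give a feel for what "exposing the agent as an MCP server" involves, here is a heavily simplified, standard-library-only sketch. MCP is built on JSON-RPC 2.0, and the message shapes below are modeled on its `tools/list` and `tools/call` methods. Everything else is an assumption for illustration: the `ranked_briefing` tool, the in-memory `ARTICLES` list, and the `handle` function are hypothetical, and this is not the actual Copilot Digest code. A real server would use an MCP SDK and also handle initialization, transport, and authentication.

```python
import json

# Hypothetical stand-in for the agent's local knowledge base.
ARTICLES = [
    {"title": "Paper A", "score": 0.91},
    {"title": "Blog post B", "score": 0.47},
    {"title": "News item C", "score": 0.78},
]

def ranked_briefing(limit=3):
    """Return the top `limit` items by score as a numbered list."""
    top = sorted(ARTICLES, key=lambda a: a["score"], reverse=True)[:limit]
    return "\n".join(f"{i}. {a['title']}" for i, a in enumerate(top, 1))

TOOLS = {
    "ranked_briefing": {
        "handler": ranked_briefing,
        "description": "Return the top-ranked items in the reading list.",
        "inputSchema": {"type": "object", "properties": {"limit": {"type": "integer"}}},
    }
}

def handle(request_json):
    """Answer one JSON-RPC request with a JSON-RPC response string."""
    req = json.loads(request_json)
    req_id, method, params = req.get("id"), req.get("method"), req.get("params", {})

    def reply(key, value):
        return json.dumps({"jsonrpc": "2.0", "id": req_id, key: value})

    if method == "tools/list":
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        return reply("result", {"tools": tools})
    if method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return reply("error", {"code": -32602, "message": "Unknown tool"})
        text = tool["handler"](**params.get("arguments", {}))
        return reply("result", {"content": [{"type": "text", "text": text}]})
    return reply("error", {"code": -32601, "message": "Method not found"})

request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "ranked_briefing", "arguments": {"limit": 2}}}
print(json.loads(handle(json.dumps(request)))["result"]["content"][0]["text"])
```

The host application only ever sees the tool list and the text results, which is why the agent behind it does not need to change when the front-end does.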
Open discussions
A few things I am still thinking about, and that the current wave of projects does not fully answer.
Should the agent, its tools, and its caller all live on one machine? The convenient answer is yes. In practice they rarely do: I want to reach the agent from my phone while the core runs on my laptop (without going through a messaging channel), or from an IDE on my work machine while the agent runs on a Mac mini under my desk. Each of those requires exposing the agent beyond its host, and every exposure path has a security cost. A messaging channel gives anyone who can DM the bot a ready prompt-injection surface, and a leaked bot token lets an attacker impersonate the bot outright. Binding a port directly to the open internet is worse: earlier this year Censys found more than 21,000 OpenClaw instances with their gateway bound to 0.0.0.0 instead of localhost, many of them leaking API keys and chat logs (Censys, 2026). A few mitigations are worth considering: per-surface authentication (API keys, OAuth) to keep unauthenticated strangers out; per-caller allowlists so that a stolen token can only do a subset of things; and moving the whole thing onto a private overlay network like Tailscale, so the agent never has to sit on the public internet and nobody even gets to knock on the door.
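Two of these ideas are small enough to sketch: a per-caller allowlist, and a check that refuses to bind a gateway to anything but a loopback address. The token values, caller names, and tool names below are made up for illustration, and a real deployment would load secrets from a secure store rather than hard-coding them.

```python
import hmac
import ipaddress

# Hypothetical registry: token -> (caller name, tools that caller may invoke).
CALLERS = {
    "phone-token-123": ("phone", {"ranked_briefing", "add_note"}),
    "ide-token-456": ("ide", {"ranked_briefing"}),
}

def authorize(token, tool):
    """Allow the call only if the token is known and the tool is on its allowlist."""
    for known, (_caller, allowed) in CALLERS.items():
        # Constant-time comparison avoids leaking token contents through timing.
        if hmac.compare_digest(known.encode(), token.encode()):
            return tool in allowed
    return False

def is_loopback_bind(host):
    """True for localhost-style bind addresses, False for 0.0.0.0 and public IPs."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal; treat unknown hostnames as unsafe

print(authorize("ide-token-456", "ranked_briefing"))  # True
print(authorize("ide-token-456", "add_note"))         # False: valid token, tool not allowed
print(is_loopback_bind("0.0.0.0"))                    # False
```

With an allowlist like this, a stolen IDE token can still read briefings but cannot write notes, which limits the damage without preventing the leak.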
What happens when MCP, ACP, and messaging channels are not enough? They cover the common cases well, but they are generic by design: built to work across a wide range of callers, they expose only the thin slice of capabilities that everyone can agree on. A more advanced builder who wants to wire the assistant deeply into a product (with custom state, a specific UI dialect, or operations that do not round-trip cleanly through tool calls) will hit walls. One escape hatch is for the project itself to ship an SDK, for example one that lets builders define "magic commands" that pin down operations the agent should perform the same way every time. These show up as slash-style shortcuts a user types in chat (for example /compact to compress context, or /clear to reset the session), mapped to deterministic behavior. The upside is that common, stability-sensitive operations get fixed in place; the cost is that the commands only work inside products that know this SDK. My guess is that the better-polished assistants end up doing both: protocols so other systems can plug in, and an SDK so builders can own the user-facing surface themselves.
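A minimal sketch of the "magic command" idea: slash commands are intercepted before the model sees them and routed to ordinary functions, so their behavior is deterministic. The `Session` class, `handle_input`, and the placeholder summary string are hypothetical; a real /compact would typically ask the model to summarize the dropped messages.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    messages: list = field(default_factory=list)

def compact(session, keep_last=2):
    """Replace all but the last `keep_last` messages with a one-line placeholder."""
    dropped = max(len(session.messages) - keep_last, 0)
    if dropped:
        summary = f"[summary of {dropped} earlier messages]"
        session.messages = [summary] + session.messages[-keep_last:]
    return f"compacted {dropped} messages"

def clear(session):
    """Reset the session to an empty history."""
    session.messages.clear()
    return "session cleared"

COMMANDS = {"/compact": compact, "/clear": clear}

def handle_input(session, text):
    """Run a slash command deterministically, or queue the text for the model."""
    words = text.strip().split()
    if words and words[0] in COMMANDS:
        return COMMANDS[words[0]](session)
    session.messages.append(text)
    return None  # in a real product this is where the model would be called

session = Session()
for message in ["a", "b", "c", "d"]:
    handle_input(session, message)
print(handle_input(session, "/compact"))  # compacted 2 messages
print(session.messages)                   # ['[summary of 2 earlier messages]', 'c', 'd']
```

Because the dispatch table lives in the product rather than in the model's prompt, the same command does the same thing every time, which is exactly the property that generic protocols do not guarantee.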
Closing thoughts
Most people still use AI like a search engine. The projects above suggest something richer is possible — an assistant that runs on its own schedule, fields calls from other agents and applications, and meets you where you already work.
And further out: maybe "personal" is itself a limiting frame. Once the SDK layer matures and the interface protocols stabilize, nothing forces this kind of system to stay niche — it could just as well become the default, replacing the chatbot as how most people interact with AI. A personal assistant today; the agent everyone uses tomorrow.
References
- Anthropic. "Model Context Protocol." modelcontextprotocol.io
- Censys (2026). "OpenClaw in the Wild: Mapping the Public Exposure of a Viral AI Assistant." censys.com
- Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." AISec 2023.
- Nous Research. Hermes Agent. hermes-agent.nousresearch.com
- OpenClaw. Documentation and security model. docs.openclaw.ai
- QwenPaw. Repository. github.com/agentscope-ai/QwenPaw
- Copilot Digest skill specification. SKILL.md
- Cloudflare Tunnel. developers.cloudflare.com/cloudflare-one/connections/connect-networks/