I built a personal AI assistant in Rust. It runs tools, remembers context, talks on Telegram, and every command executes in a sandbox. Here’s what’s inside.
Think OpenClaw, but Rust-native. One static binary, no Node, no runtime, no npm. I’ve been building this for a while but didn’t want to announce it too early. Shipping it today.
If you’ve been wanting to self-host an AI assistant that isn’t stitched together from five npm packages – it’s on GitHub.
Why build this
I’ve written before about owning your content and owning your email. The same logic applies to AI assistants. If you rely on a hosted service for something that has access to your files, your credentials, and your daily workflow – you should be able to inspect it, audit it, and fork it if the project direction changes.
Infrastructure you depend on should be yours. Moltis is MIT licensed, runs on your hardware, and the code is all there. No telemetry, no phone-home, no vendor lock-in. If I disappear tomorrow, your stack still works.
What it is
One binary that runs the full assistant: web UI, provider routing, tools, sessions, memory, hooks, and integrations – without Node runtime overhead or dependency sprawl. The whole thing – 150k lines of Rust – compiles into a single 60MB executable. Web UI and assets included. No garbage collector.
Instead of chatting with one model in one tab, you get an assistant that can actually help you get things done: run commands, search the web, watch files, schedule tasks, and remember what you told it last week.
The interesting bits
Multi-provider routing. OpenAI Codex, GitHub Copilot, local models – all through one interface with fallback chains and per-provider metrics. The batch API support also gives you 50% cost savings on OpenAI calls. More providers already built, shipping as I QA them.
Local LLMs built in. Search and download models from Hugging Face directly from the UI. Automatic GGUF model setup. MLX support on Apple devices for up to 30% faster inference [1]. Fully offline capable. No cloud dependency if you don’t want one.
Streaming-first. Token streaming on every provider, including when tools are enabled. Tool call arguments stream as deltas as they arrive. No waiting for the full response before you see output.
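To make the delta streaming concrete, here’s a simplified sketch of how partial tool-call arguments can be accumulated as they arrive. The types and field names are illustrative, not the actual internals.

use std::collections::HashMap;

/// Illustrative fragment of a streamed tool call: which call it belongs to
/// plus a partial slice of its JSON arguments (e.g. `{"query":"ru`).
struct ToolCallDelta {
    index: usize,
    arguments_fragment: String,
}

#[derive(Default)]
struct ToolCallAccumulator {
    buffers: HashMap<usize, String>,
}

impl ToolCallAccumulator {
    /// Append each fragment as soon as it arrives so the UI can render it live.
    fn push(&mut self, delta: ToolCallDelta) {
        self.buffers
            .entry(delta.index)
            .or_default()
            .push_str(&delta.arguments_fragment);
    }

    /// When the stream ends, each buffer should hold a complete JSON object.
    fn finish(self) -> Vec<(usize, String)> {
        let mut done: Vec<_> = self.buffers.into_iter().collect();
        done.sort_by_key(|(index, _)| *index);
        done
    }
}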
Web browsing. Built-in web_search (Brave, Perplexity), web_fetch with
readability extraction, and browser tools for full page interaction. Browser runs
in a sandbox. All fetches go through SSRF protection – DNS is resolved before
the request, and private/loopback ranges are blocked.
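The resolve-then-check order is the important part. Here’s a minimal sketch of that pattern using only std name resolution; the real guard is more thorough.

use std::net::{IpAddr, ToSocketAddrs};

fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback()          // 127.0.0.0/8
                || v4.is_private()    // 10/8, 172.16/12, 192.168/16
                || v4.is_link_local() // 169.254/16 (cloud metadata endpoints)
                || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            v6.is_loopback()
                || v6.is_unspecified()
                || (v6.segments()[0] & 0xfe00) == 0xfc00 // fc00::/7 unique-local
        }
    }
}

fn check_target(host: &str, port: u16) -> std::io::Result<Vec<IpAddr>> {
    // Resolve first, validate every address, then connect only to the vetted
    // IPs -- never re-resolve afterwards, or DNS rebinding sneaks in.
    let addrs: Vec<IpAddr> = (host, port).to_socket_addrs()?.map(|a| a.ip()).collect();
    if addrs.iter().any(|ip| is_blocked(*ip)) {
        return Err(std::io::Error::other("blocked private or loopback address"));
    }
    Ok(addrs)
}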
MCP servers over stdio or HTTP/SSE with health polling, auto-restart on crash, and exponential backoff. You can edit server configs directly from the web UI.
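The restart policy boils down to a supervisor loop with a capped delay. A rough sketch, with spawn_server standing in for launching and waiting on one MCP server process (not the actual supervisor code):

use std::time::Duration;

fn supervise<F>(mut spawn_server: F, max_backoff: Duration)
where
    F: FnMut() -> Result<(), String>,
{
    let mut backoff = Duration::from_millis(500);
    loop {
        match spawn_server() {
            // A clean exit resets the delay; the server is relaunched either way.
            Ok(()) => backoff = Duration::from_millis(500),
            Err(e) => eprintln!("MCP server exited with error: {e}"),
        }
        std::thread::sleep(backoff);
        backoff = (backoff * 2).min(max_backoff); // exponential backoff, capped
    }
}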
Parallel tool execution. When the LLM requests multiple tool calls in one
turn, they run concurrently via futures::join_all. This matters when your agent
chains five tools in a row.
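A stripped-down version of that fan-out looks like this; run_tool is a placeholder dispatcher, and the real registry and argument types differ.

use futures::future::join_all;

async fn run_tool(name: &str, args: String) -> Result<String, String> {
    // ... dispatch to the real tool implementation here ...
    Ok(format!("{name} finished with {args}"))
}

async fn run_turn(calls: Vec<(String, String)>) -> Vec<Result<String, String>> {
    // Turn every requested call into a future, then drive them concurrently.
    // join_all preserves order, so results line up with the original requests.
    let pending = calls
        .into_iter()
        .map(|(name, args)| async move { run_tool(&name, args).await });
    join_all(pending).await
}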
Sub-agent delegation. The LLM can spawn child agent loops with spawn_agent, subject to nesting depth limits and tool filtering. Your assistant can delegate [2].
Authentication. Password, API keys, or passkeys (WebAuthn). First-run setup
code printed to terminal, no default passwords floating around. Per-IP
throttling on login attempts with 429 + Retry-After.
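A toy version of that per-IP limiter is below; the window, attempt cap, and bookkeeping are illustrative, not the shipped values.

use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

struct LoginThrottle {
    window: Duration,
    max_attempts: u32,
    attempts: HashMap<IpAddr, (Instant, u32)>,
}

impl LoginThrottle {
    fn new(window: Duration, max_attempts: u32) -> Self {
        Self { window, max_attempts, attempts: HashMap::new() }
    }

    /// Returns Some(retry_after) when the caller should answer with
    /// 429 Too Many Requests and a Retry-After header for that duration.
    fn check(&mut self, ip: IpAddr) -> Option<Duration> {
        let now = Instant::now();
        let entry = self.attempts.entry(ip).or_insert((now, 0));
        if now.duration_since(entry.0) > self.window {
            *entry = (now, 0); // window expired, start a fresh count
        }
        entry.1 += 1;
        if entry.1 > self.max_attempts {
            Some(self.window - now.duration_since(entry.0)) // time until the window resets
        } else {
            None
        }
    }
}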
Sandboxed execution. Every command runs in Docker, Podman, or Apple Container. Per-session isolation. Environment variables injected but redacted from output – plain text, base64, and hex forms. Images auto-rebuild on config change.
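The redaction trick is simple: precompute the alternate spellings of each injected secret and scrub all of them from tool output before the model sees it. A simplified sketch, assuming the base64 crate (the replacement marker and actual scrubbing code differ):

use base64::Engine as _;

fn redact(output: &str, secret: &str) -> String {
    let b64 = base64::engine::general_purpose::STANDARD.encode(secret);
    let hex: String = secret.bytes().map(|b| format!("{b:02x}")).collect();
    let mut result = output.to_string();
    // Replace the plain, base64, and hex spellings of the secret.
    for needle in [secret, b64.as_str(), hex.as_str()] {
        if !needle.is_empty() {
            result = result.replace(needle, "[REDACTED]");
        }
    }
    result
}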
Long-term memory. Hybrid vector + full-text search in SQLite. Local GGUF embeddings or OpenAI batch API. File watching with live sync. Auto-compaction when you hit 95% of the context window.
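To give a flavor of the hybrid part: one common way to merge a full-text hit list with a vector-similarity hit list is reciprocal rank fusion. The sketch below uses that as a simplified stand-in for the actual ranking code.

use std::collections::HashMap;

/// `fts_hits` and `vector_hits` are memory-entry ids ordered best-first;
/// `k` is the usual RRF damping constant (often around 60).
fn fuse(fts_hits: &[u64], vector_hits: &[u64], k: f64) -> Vec<u64> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for hits in [fts_hits, vector_hits] {
        for (rank, id) in hits.iter().enumerate() {
            // Each list contributes 1 / (k + rank); entries found by both rise.
            *scores.entry(*id).or_default() += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut ranked: Vec<(u64, f64)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    ranked.into_iter().map(|(id, _)| id).collect()
}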
Multi-channel. Web UI, Telegram, REST API, mobile PWA, push notifications. One agent, coherent context across all of them. Messages that arrive during an active run can be replayed individually or collected and sent as one batch.
Scheduled tasks. Cron-based task execution built in. No external scheduler needed. A heartbeat runs every 30 minutes (configurable), asks the LLM to check if anything needs your attention – inbox, calendar, reminders – and only notifies you when something does.
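The heartbeat is conceptually just a timer loop around an LLM check-in. Sketched below assuming a tokio runtime, with check_in and notify as placeholder functions rather than real APIs:

use std::time::Duration;

async fn heartbeat(interval: Duration) {
    let mut ticker = tokio::time::interval(interval);
    loop {
        ticker.tick().await;
        // Ask the model to review inbox/calendar/reminders and decide whether
        // anything actually needs the user's attention.
        if let Some(summary) = check_in().await {
            notify(&summary).await;
        } // otherwise stay silent -- no notification spam
    }
}

async fn check_in() -> Option<String> {
    // ... run an LLM turn with the "anything need attention?" prompt ...
    None
}

async fn notify(_summary: &str) {
    // ... push to Telegram / web push ...
}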
Voice. TTS and STT with multiple providers. Configure them from the settings UI. Local providers coming in a later release, needs more QA.
Self-extending. Pi-inspired self-extension: creates its own skills at runtime. Session branching, hot-reload.
Hook system. Lifecycle hooks on every event – BeforeToolCall,
AfterToolCall, SessionEnd, etc. Modifying hooks run sequentially, read-only
hooks run in parallel. Circuit breaker auto-disables failing hooks. Shell hooks
communicate via exit code + JSON on stdout. You can manage hooks from the CLI
(moltis hooks list) or edit them live from the web UI.
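A rough sketch of the runner side of that shell-hook contract; how the event payload is delivered to the hook (an env var here) is simplified for illustration.

use std::process::Command;

fn run_shell_hook(path: &str, event_json: &str) -> Result<serde_json::Value, String> {
    let output = Command::new(path)
        .env("MOLTIS_EVENT", event_json) // illustrative way of passing the event
        .output()
        .map_err(|e| format!("failed to launch hook: {e}"))?;

    // Non-zero exit means the hook rejected or failed the event.
    if !output.status.success() {
        return Err(format!("hook exited with {:?}", output.status.code()));
    }
    // Hooks reply with JSON on stdout; whatever they return feeds back into
    // the lifecycle (e.g. a modified tool call).
    serde_json::from_str(&String::from_utf8_lossy(&output.stdout))
        .map_err(|e| format!("hook returned invalid JSON: {e}"))
}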
Onboarding wizard. First run walks you through setting up the agent identity
(name, emoji, creature, vibe) and your user profile. The config is TOML-based
with environment variable overrides, and moltis config check validates
everything including typo detection with suggestions.
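The “did you mean?” part of that check can be as simple as edit distance against the known key set. A simplified sketch; the key schema and threshold are made up:

fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut current = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            current.push((prev[j] + cost).min(prev[j + 1] + 1).min(current[j] + 1));
        }
        prev = current;
    }
    prev[b.len()]
}

/// Suggest the closest known config key for an unrecognized one, if any is
/// within a small edit distance.
fn suggest<'a>(unknown_key: &str, known_keys: &[&'a str]) -> Option<&'a str> {
    known_keys
        .iter()
        .map(|k| (*k, levenshtein(unknown_key, k)))
        .filter(|(_, d)| *d <= 2) // only suggest near-misses
        .min_by_key(|(_, d)| *d)
        .map(|(k, _)| k)
}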
Observability. Prometheus metrics, OpenTelemetry tracing with OTLP export, structured logging. When something goes wrong, you’ll know where.
Tailscale integration. Expose the gateway over your tailnet via Tailscale Serve or Funnel, with status monitoring and mode switching from the web UI.
Rustacean details
If you’re the kind of person who reads Cargo.toml before screenshots, here are
the numbers from current main:
- 27 workspace crates split into focused modules (agents, gateway, tools, memory, voice, channels, etc.).
- 53 non-default feature flags across the workspace (77 including default entries) to compile capabilities in or out.
- 376 feature-gated code paths (#[cfg(feature = "...")]) in Rust sources.
- 56 trait definitions and 160 Arc<dyn ...> injection points, which is how most boundaries stay explicit and replaceable.
Not just “uses Rust”, actually leans on Rust:
- LlmProvider unifies streaming and tool-calling across providers.
- Sandbox abstracts Docker/Podman/Apple Container backends.
- ChannelPlugin, ChannelOutbound, and ChannelStatus define channel integration contracts.
- The gateway layer alone exposes 21 service traits for strongly typed boundaries.
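The trait names above are real; the single synchronous method below is an invented stand-in (the actual LlmProvider is streaming and tool-aware), just to show the Arc<dyn ...> injection style.

use std::sync::Arc;

trait LlmProvider: Send + Sync {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

struct Agent {
    // The agent only sees the trait object, so swapping OpenAI for a local
    // GGUF backend means injecting a different Arc here and nothing else.
    provider: Arc<dyn LlmProvider>,
}

impl Agent {
    fn new(provider: Arc<dyn LlmProvider>) -> Self {
        Self { provider }
    }

    fn answer(&self, prompt: &str) -> Result<String, String> {
        self.provider.complete(prompt)
    }
}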
And yes, there are grown-up guardrails:
- Workspace lints deny unsafe_code, unwrap_used, and expect_used by default (with narrow, explicit unsafe allowances only where local-LLM FFI requires it).
- 6 GitHub workflows cover formatting, linting, tests, coverage, E2E, and release builds.
- Release artifacts are automatically keyless-signed with Sigstore/Cosign (cosign sign-blob), with checksums plus .sig/.crt published per file.
- Docker images are built multi-arch with SBOM/provenance, then signed and verified by digest with Cosign in CI.
- 1,700+ test functions across crates, plus a dedicated benchmarks crate (WIP).
Try it today
# One-liner
curl -fsSL https://www.moltis.org/install.sh | sh
# Or via Homebrew
brew install moltis-org/tap/moltis
Also available as .deb, .rpm, .pkg.tar.zst, Snap, and AppImage. One-click deploy on DigitalOcean and Render.
Open source. MIT licensed. Treat it as alpha software – isolate your deployment, review permissions, and manage secrets carefully.
It’s not perfect yet, but it’s mine, and it runs on my hardware. If that matters to you too, I’d love to hear what you think.
[1] Based on benchmarks from Production-Grade Local LLM Inference on Apple Silicon and Benchmarking Apple’s MLX vs. llama.cpp. Results vary by model size and quantization.
[2] Sub-agent delegation is coming but not yet merged and available.