I built a personal AI assistant in Rust. It runs tools, remembers context, talks on Telegram, and every command executes in a sandbox. Here’s what’s inside.
Think OpenClaw, but Rust-native. One static binary, no Node, no runtime, no npm. I’ve been building this for a while but didn’t want to announce it too early. Shipping it today.
If you’ve been wanting to self-host an AI assistant that isn’t stitched together from five npm packages – it’s on GitHub.
Why build this
I’ve written before about owning your content and owning your email. The same logic applies to AI assistants. If you rely on a hosted service for something that has access to your files, your credentials, and your daily workflow – you should be able to inspect it, audit it, and fork it if the project direction changes.
Infrastructure you depend on should be yours. Moltis is MIT licensed, runs on your hardware, and the code is all there. No telemetry, no phone-home, no vendor lock-in. If I disappear tomorrow, your stack still works.
What it is
One binary that runs the full assistant: web UI, provider routing, tools, sessions, memory, hooks, and integrations – without Node runtime overhead or dependency sprawl. The whole thing – 150k lines of Rust – compiles into a single 60MB executable. Web UI and assets included. No garbage collector.
Instead of chatting with one model in one tab, you get an assistant that can actually help you get things done: run commands, search the web, watch files, schedule tasks, and remember what you told it last week.
The interesting bits
Multi-provider routing. OpenAI Codex, GitHub Copilot, local models – all through one interface with fallback chains and per-provider metrics. The batch API support also gives you 50% cost savings on OpenAI calls. More providers already built, shipping as I QA them.
Local LLMs built in. Search and download models from Hugging Face directly from the UI. Automatic GGUF model setup. MLX support on Apple devices for up to 30% faster inference [1]. Fully offline capable. No cloud dependency if you don’t want one.
Streaming-first. Token streaming on every provider, including when tools are enabled. Tool call arguments stream as deltas as they arrive. No waiting for the full response before you see output.
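To make the delta streaming concrete, here’s a simplified sketch of how partial tool-call arguments can be accumulated as they arrive. The types and field names are illustrative, not the actual internals.

use std::collections::HashMap;

/// Illustrative fragment of a streamed tool call: which call it belongs to
/// plus a partial slice of its JSON arguments (e.g. `{"query":"ru`).
struct ToolCallDelta {
    index: usize,
    arguments_fragment: String,
}

#[derive(Default)]
struct ToolCallAccumulator {
    buffers: HashMap<usize, String>,
}

impl ToolCallAccumulator {
    /// Append each fragment as soon as it arrives so the UI can render it live.
    fn push(&mut self, delta: ToolCallDelta) {
        self.buffers
            .entry(delta.index)
            .or_default()
            .push_str(&delta.arguments_fragment);
    }

    /// When the stream ends, each buffer should hold a complete JSON object.
    fn finish(self) -> Vec<(usize, String)> {
        let mut done: Vec<_> = self.buffers.into_iter().collect();
        done.sort_by_key(|(index, _)| *index);
        done
    }
}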
Web browsing. Built-in web_search (Brave, Perplexity), web_fetch with
readability extraction, and browser tools for full page interaction. Browser runs
in a sandbox. All fetches go through SSRF protection – DNS is resolved before
the request, and private/loopback ranges are blocked.
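The resolve-then-check order is the important part. Here’s a minimal sketch of that pattern using only std name resolution; the real guard is more thorough.

use std::net::{IpAddr, ToSocketAddrs};

fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback()          // 127.0.0.0/8
                || v4.is_private()    // 10/8, 172.16/12, 192.168/16
                || v4.is_link_local() // 169.254/16 (cloud metadata endpoints)
                || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            v6.is_loopback()
                || v6.is_unspecified()
                || (v6.segments()[0] & 0xfe00) == 0xfc00 // fc00::/7 unique-local
        }
    }
}

fn check_target(host: &str, port: u16) -> std::io::Result<Vec<IpAddr>> {
    // Resolve first, validate every address, then connect only to the vetted
    // IPs -- never re-resolve afterwards, or DNS rebinding sneaks in.
    let addrs: Vec<IpAddr> = (host, port).to_socket_addrs()?.map(|a| a.ip()).collect();
    if addrs.iter().any(|ip| is_blocked(*ip)) {
        return Err(std::io::Error::other("blocked private or loopback address"));
    }
    Ok(addrs)
}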
MCP servers over stdio or HTTP/SSE with health polling, auto-restart on crash, and exponential backoff. You can edit server configs directly from the web UI.
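The restart policy boils down to a supervisor loop with a capped delay. A rough sketch, with spawn_server standing in for launching and waiting on one MCP server process (not the actual supervisor code):

use std::time::Duration;

fn supervise<F>(mut spawn_server: F, max_backoff: Duration)
where
    F: FnMut() -> Result<(), String>,
{
    let mut backoff = Duration::from_millis(500);
    loop {
        match spawn_server() {
            // A clean exit resets the delay; the server is relaunched either way.
            Ok(()) => backoff = Duration::from_millis(500),
            Err(e) => eprintln!("MCP server exited with error: {e}"),
        }
        std::thread::sleep(backoff);
        backoff = (backoff * 2).min(max_backoff); // exponential backoff, capped
    }
}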
Parallel tool execution. When the LLM requests multiple tool calls in one
turn, they run concurrently via futures::join_all. This matters when your agent
chains five tools in a row.
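A stripped-down version of that fan-out looks like this; run_tool is a placeholder dispatcher, and the real registry and argument types differ.

use futures::future::join_all;

async fn run_tool(name: &str, args: String) -> Result<String, String> {
    // ... dispatch to the real tool implementation here ...
    Ok(format!("{name} finished with {args}"))
}

async fn run_turn(calls: Vec<(String, String)>) -> Vec<Result<String, String>> {
    // Turn every requested call into a future, then drive them concurrently.
    // join_all preserves order, so results line up with the original requests.
    let pending = calls
        .into_iter()
        .map(|(name, args)| async move { run_tool(&name, args).await });
    join_all(pending).await
}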
Sub-agent delegation. The LLM can spawn child agent loops with spawn_agent, subject to nesting depth limits and tool filtering. Your assistant can delegate [2].
Authentication. Password, API keys, or passkeys (WebAuthn). First-run setup
code printed to terminal, no default passwords floating around. Per-IP
throttling on login attempts with 429 + Retry-After.
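A toy version of that per-IP limiter is below; the window, attempt cap, and bookkeeping are illustrative, not the shipped values.

use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

struct LoginThrottle {
    window: Duration,
    max_attempts: u32,
    attempts: HashMap<IpAddr, (Instant, u32)>,
}

impl LoginThrottle {
    fn new(window: Duration, max_attempts: u32) -> Self {
        Self { window, max_attempts, attempts: HashMap::new() }
    }

    /// Returns Some(retry_after) when the caller should answer with
    /// 429 Too Many Requests and a Retry-After header for that duration.
    fn check(&mut self, ip: IpAddr) -> Option<Duration> {
        let now = Instant::now();
        let entry = self.attempts.entry(ip).or_insert((now, 0));
        if now.duration_since(entry.0) > self.window {
            *entry = (now, 0); // window expired, start a fresh count
        }
        entry.1 += 1;
        if entry.1 > self.max_attempts {
            Some(self.window - now.duration_since(entry.0)) // time until the window resets
        } else {
            None
        }
    }
}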
Sandboxed execution. Every command runs in Docker, Podman, or Apple Container. Per-session isolation. Environment variables injected but redacted from output – plain text, base64, and hex forms. Images auto-rebuild on config change.
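The redaction trick is simple: precompute the alternate spellings of each injected secret and scrub all of them from tool output before the model sees it. A simplified sketch, assuming the base64 crate (the replacement marker and actual scrubbing code differ):

use base64::Engine as _;

fn redact(output: &str, secret: &str) -> String {
    let b64 = base64::engine::general_purpose::STANDARD.encode(secret);
    let hex: String = secret.bytes().map(|b| format!("{b:02x}")).collect();
    let mut result = output.to_string();
    // Replace the plain, base64, and hex spellings of the secret.
    for needle in [secret, b64.as_str(), hex.as_str()] {
        if !needle.is_empty() {
            result = result.replace(needle, "[REDACTED]");
        }
    }
    result
}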
Long-term memory. Hybrid vector + full-text search in SQLite. Local GGUF embeddings or OpenAI batch API. File watching with live sync. Auto-compaction when you hit 95% of the context window.
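To give a flavor of the hybrid part: one common way to merge a full-text hit list with a vector-similarity hit list is reciprocal rank fusion. The sketch below uses that as a simplified stand-in for the actual ranking code.

use std::collections::HashMap;

/// `fts_hits` and `vector_hits` are memory-entry ids ordered best-first;
/// `k` is the usual RRF damping constant (often around 60).
fn fuse(fts_hits: &[u64], vector_hits: &[u64], k: f64) -> Vec<u64> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for hits in [fts_hits, vector_hits] {
        for (rank, id) in hits.iter().enumerate() {
            // Each list contributes 1 / (k + rank); entries found by both rise.
            *scores.entry(*id).or_default() += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut ranked: Vec<(u64, f64)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    ranked.into_iter().map(|(id, _)| id).collect()
}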
Multi-channel. Web UI, Telegram, REST API, mobile PWA, push notifications. One agent, coherent context across all of them. Messages that arrive during an active run can be replayed individually or collected and sent as one batch.
Scheduled tasks. Cron-based task execution built in. No external scheduler needed. A heartbeat runs every 30 minutes (configurable), asks the LLM to check if anything needs your attention – inbox, calendar, reminders – and only notifies you when something does.
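The heartbeat is conceptually just a timer loop around an LLM check-in. Sketched below assuming a tokio runtime, with check_in and notify as placeholder functions rather than real APIs:

use std::time::Duration;

async fn heartbeat(interval: Duration) {
    let mut ticker = tokio::time::interval(interval);
    loop {
        ticker.tick().await;
        // Ask the model to review inbox/calendar/reminders and decide whether
        // anything actually needs the user's attention.
        if let Some(summary) = check_in().await {
            notify(&summary).await;
        } // otherwise stay silent -- no notification spam
    }
}

async fn check_in() -> Option<String> {
    // ... run an LLM turn with the "anything need attention?" prompt ...
    None
}

async fn notify(_summary: &str) {
    // ... push to Telegram / web push ...
}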
Voice. TTS and STT with multiple providers. Configure them from the settings UI. Local providers coming in a later release, needs more QA.
Self-extending. Pi-inspired self-extension: creates its own skills at runtime. Session branching, hot-reload.
Hook system. Lifecycle hooks on every event – BeforeToolCall,
AfterToolCall, SessionEnd, etc. Modifying hooks run sequentially, read-only
hooks run in parallel. Circuit breaker auto-disables failing hooks. Shell hooks
communicate via exit code + JSON on stdout. You can manage hooks from the CLI
(moltis hooks list) or edit them live from the web UI.
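A rough sketch of the runner side of that shell-hook contract; how the event payload is delivered to the hook (an env var here) is simplified for illustration.

use std::process::Command;

fn run_shell_hook(path: &str, event_json: &str) -> Result<serde_json::Value, String> {
    let output = Command::new(path)
        .env("MOLTIS_EVENT", event_json) // illustrative way of passing the event
        .output()
        .map_err(|e| format!("failed to launch hook: {e}"))?;

    // Non-zero exit means the hook rejected or failed the event.
    if !output.status.success() {
        return Err(format!("hook exited with {:?}", output.status.code()));
    }
    // Hooks reply with JSON on stdout; whatever they return feeds back into
    // the lifecycle (e.g. a modified tool call).
    serde_json::from_str(&String::from_utf8_lossy(&output.stdout))
        .map_err(|e| format!("hook returned invalid JSON: {e}"))
}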
Onboarding wizard. First run walks you through setting up the agent identity
(name, emoji, creature, vibe) and your user profile. The config is TOML-based
with environment variable overrides, and moltis config check validates
everything including typo detection with suggestions.
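The “did you mean?” part of that check can be as simple as edit distance against the known key set. A simplified sketch; the key schema and threshold are made up:

fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut current = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            current.push((prev[j] + cost).min(prev[j + 1] + 1).min(current[j] + 1));
        }
        prev = current;
    }
    prev[b.len()]
}

/// Suggest the closest known config key for an unrecognized one, if any is
/// within a small edit distance.
fn suggest<'a>(unknown_key: &str, known_keys: &[&'a str]) -> Option<&'a str> {
    known_keys
        .iter()
        .map(|k| (*k, levenshtein(unknown_key, k)))
        .filter(|(_, d)| *d <= 2) // only suggest near-misses
        .min_by_key(|(_, d)| *d)
        .map(|(k, _)| k)
}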
Observability. Prometheus metrics, OpenTelemetry tracing with OTLP export, structured logging. When something goes wrong, you’ll know where.
Tailscale integration. Expose the gateway over your tailnet via Tailscale Serve or Funnel, with status monitoring and mode switching from the web UI.
Rustacean details
If you’re the kind of person who reads Cargo.toml before screenshots, here are
the numbers from current main:
- 27 workspace crates split into focused modules (agents, gateway, tools, memory, voice, channels, etc.).
- 53 non-default feature flags across the workspace (77 including default entries) to compile capabilities in or out.
- 376 feature-gated code paths (#[cfg(feature = "...")]) in Rust sources.
- 56 trait definitions and 160 Arc<dyn ...> injection points, which is how most boundaries stay explicit and replaceable.
Not just “uses Rust”, actually leans on Rust:
- LlmProvider unifies streaming and tool-calling across providers.
- Sandbox abstracts Docker/Podman/Apple Container backends.
- ChannelPlugin, ChannelOutbound, and ChannelStatus define channel integration contracts.
- The gateway layer alone exposes 21 service traits for strongly typed boundaries.
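The trait names above are real; the single synchronous method below is an invented stand-in (the actual LlmProvider is streaming and tool-aware), just to show the Arc<dyn ...> injection style.

use std::sync::Arc;

trait LlmProvider: Send + Sync {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

struct Agent {
    // The agent only sees the trait object, so swapping OpenAI for a local
    // GGUF backend means injecting a different Arc here and nothing else.
    provider: Arc<dyn LlmProvider>,
}

impl Agent {
    fn new(provider: Arc<dyn LlmProvider>) -> Self {
        Self { provider }
    }

    fn answer(&self, prompt: &str) -> Result<String, String> {
        self.provider.complete(prompt)
    }
}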
And yes, there are grown-up guardrails:
- Workspace lints deny unsafe_code, unwrap_used, and expect_used by default (with narrow, explicit unsafe allowances only where local-LLM FFI requires it).
- 6 GitHub workflows cover formatting, linting, tests, coverage, E2E, and release builds.
- Release artifacts are automatically keyless-signed with Sigstore/Cosign (cosign sign-blob), with checksums plus .sig/.crt published per file.
- Docker images are built multi-arch with SBOM/provenance, then signed and verified by digest with Cosign in CI.
- 1,700+ test functions across crates, plus a dedicated benchmarks crate (WIP).
Try it today
# One-liner
curl -fsSL https://www.moltis.org/install.sh | sh
# Or via Homebrew
brew install moltis-org/tap/moltis
Also available as .deb, .rpm, .pkg.tar.zst, Snap, and AppImage. One-click deploy on DigitalOcean and Render.
Open source. MIT licensed. Treat it as alpha software – isolate your deployment, review permissions, and manage secrets carefully.
It’s not perfect yet, but it’s mine, and it runs on my hardware. If that matters to you too, I’d love to hear what you think.
[1] Based on benchmarks from Production-Grade Local LLM Inference on Apple Silicon and Benchmarking Apple’s MLX vs. llama.cpp. Results vary by model size and quantization.
[2] Sub-agent delegation is coming but not yet merged and available.