● refactoringbuildingsamridhlimbu.com/projects/allarkive · v0.1
AllArkive
● open source · BSides Melb 2026A private, offline research assistant that runs on your own hardware, with citations you can check. Useful when your internet is fine. More useful when it isn't. Wikipedia, repair manuals, medical wiki, and Stack Exchange — all served by Kiwix, all searchable via a local LLM that refuses to answer when it has no source material to cite from.
API layer
hallucination guard
10–25 min
indexing (laptop)
Context
Built with Sham, 2026. Presented at BSides Melbourne, May 16–17, 2026. The core problem: most "RAG with citations" systems enforce grounding at the prompt layer — they instruct the model not to hallucinate and trust it to comply. AllArkive enforces it at the API boundary: if the KNN retrieval step returns nothing past the L2 distance threshold, the FastAPI layer returns a fixed fallback string and the LLM is never invoked. The model cannot hallucinate because it is never called. The system prompt is belt-and-braces, not the actual constraint.
Timeline
Apr 2026
Problem framing
The premise: you already have the internet. AllArkive is useful when you do, and more useful when you don't. Censorship, ISP outages, cloud rug-pulls. The library should work without the AI — the AI is the bonus.
May 2026
Architecture locked
Three replaceable layers: nginx landing page, Kiwix serving ZIM files (Wikipedia / iFixit / WikiMed / Stack Exchange), and a FastAPI RAG service wired as an OpenAI-compatible endpoint. Every service bound to 127.0.0.1 by default. Open WebUI doesn't know AllArkive exists — it just sees a model called allarkive-rag in the picker.
May 2026
The refusal gate
Most "RAG with citations" projects enforce no-hallucination with a system prompt and hope. AllArkive short-circuits the LLM call at the FastAPI layer if KNN retrieval returns nothing past the L2 threshold. The model is never given the chance to fabricate. (server.py:300–301)
May 16–17 2026
v0.1 shipped · BSides Melbourne
Talk delivered. Five containers, three ZIM bundles (minimal ~5 GB, balanced ~24 GB, comprehensive ~200 GB), bootstrap.sh that wires everything in one command. Balanced default: Wikipedia mini + WikiMed + iFixit + SuperUser SE + Unix SE + Ask Ubuntu.
May 2026
v0.2 — indexer rewrite
Full rewrite of the indexer: batched async embeddings (10–30× faster on CPU), int8 vector quantisation (768 B vs 3,072 B per chunk), offset-only chunk storage — chunks stored as (char_offset, char_len) pointers into the ZIM, not full text. Index drops to ~25% of ZIM size. Hybrid BM25 mode for large-ZIM Pi deployments (reciprocal rank fusion). Schema v2.
Key technical decisions
01api-layer refusal › prompt-layer refusal
The LLM call is short-circuited at the FastAPI boundary if KNN retrieval returns no chunks past the L2 distance threshold. The system prompt also instructs the model to cite every claim and respond with "no sources found" if nothing is relevant — but that's belt-and-braces. The API is the real boundary. The model cannot fabricate because it is never invoked.
02sqlite-vec › pgvector / qdrant / chroma
No daemon, no schema migration, no running service. The entire index is one file you can rsync. On the Pi profile the index is ~10–15% of ZIM size with int8 quantisation. The choice matches the deployment target: a laptop or a Pi, not a server room.
03openai-compatible /v1/chat/completions › custom api shape
Open WebUI is wired via OPENAI_API_BASE_URLS=http://rag:8000/v1. AllArkive appears as a model in the picker — any future OpenAI-compatible client works for free with no integration code.
04AGPL-3.0 › MIT / Apache
Deliberate, not default. GPL leaves a SaaS loophole: you can run GPL software as a network service without releasing the source. AGPL closes it. A company cannot host AllArkive as a service without open-sourcing their modifications.
The refusal gate
kairos/scheduler.pypy
1# embed query → KNN top-5 (max L2 distance 1.0, normalised)
2chunks = vec_db.knn(
3 embed(question), k=5, max_distance=1.0
4)
5
6# server.py:300–301 — LLM is never called if retrieval is empty
7if not chunks:
8- # system prompt: "if no passage is relevant, say no sources found"
9+ return NO_SOURCES_TEXT # model cannot hallucinate
10
11answer = ollama.chat(build_prompt(chunks, question))
12return rewrite_citations(answer) # [N] → [[N: title]](kiwix link)
The system prompt tells the model to cite every claim with [N], use only the provided passages, and respond with “no sources found” if nothing is relevant. That instruction matters, but the real enforcement is two lines above it: the LLM call never executes if retrieval returned nothing. Citations are then rewritten from [N] to clickable Kiwix deep-links — anchored to the exact article that supplied the passage.
Stack
Gluenginx · Docker Compose · bootstrap.sh · FastAPI RAG service
Local AIOllama (qwen2.5:7b default) · Open WebUI · nomic-embed-text
ArchiveKiwix · ZIM files (Wikipedia mini, WikiMed, iFixit, Stack Exchange)
Vector DBsqlite-vec (vec0 virtual table) · int8 quantised embeddings
TargetsmacOS · Linux · Raspberry Pi 4/5 · WSL2