● refactoringbuildingsamridhlimbu.com/projects/allarkive · v0.1

❯ cd projects/allarkive

AllArkive

● open source · BSides Melb 2026

A private, offline research assistant that runs on your own hardware, with citations you can check. Useful when your internet is fine. More useful when it isn't. Wikipedia, repair manuals, medical wiki, and Stack Exchange — all served by Kiwix, all searchable via a local LLM that refuses to answer when it has no source material to cite from.

clupai8o0/allarkive docs →

API layer

hallucination guard

10–25 min

indexing (laptop)

~24 GB

default bundle

Context

Built with Sham, 2026. Presented at BSides Melbourne, May 16–17, 2026. The core problem: most "RAG with citations" systems enforce grounding at the prompt layer — they instruct the model not to hallucinate and trust it to comply. AllArkive enforces it at the API boundary: if the KNN retrieval step returns nothing past the L2 distance threshold, the FastAPI layer returns a fixed fallback string and the LLM is never invoked. The model cannot hallucinate because it is never called. The system prompt is belt-and-braces, not the actual constraint.

Timeline

Apr 2026

Problem framing

The premise: you already have the internet. AllArkive is useful when you do, and more useful when you don't. Censorship, ISP outages, cloud rug-pulls. The library should work without the AI — the AI is the bonus.

May 2026

Architecture locked

Three replaceable layers: nginx landing page, Kiwix serving ZIM files (Wikipedia / iFixit / WikiMed / Stack Exchange), and a FastAPI RAG service wired as an OpenAI-compatible endpoint. Every service bound to 127.0.0.1 by default. Open WebUI doesn't know AllArkive exists — it just sees a model called allarkive-rag in the picker.

May 2026

The refusal gate

Most "RAG with citations" projects enforce no-hallucination with a system prompt and hope. AllArkive short-circuits the LLM call at the FastAPI layer if KNN retrieval returns nothing past the L2 threshold. The model is never given the chance to fabricate. (server.py:300–301)

May 16–17 2026

v0.1 shipped · BSides Melbourne

Talk delivered. Five containers, three ZIM bundles (minimal ~5 GB, balanced ~24 GB, comprehensive ~200 GB), bootstrap.sh that wires everything in one command. Balanced default: Wikipedia mini + WikiMed + iFixit + SuperUser SE + Unix SE + Ask Ubuntu.

May 2026

v0.2 — indexer rewrite

Full rewrite of the indexer: batched async embeddings (10–30× faster on CPU), int8 vector quantisation (768 B vs 3,072 B per chunk), offset-only chunk storage — chunks stored as (char_offset, char_len) pointers into the ZIM, not full text. Index drops to ~25% of ZIM size. Hybrid BM25 mode for large-ZIM Pi deployments (reciprocal rank fusion). Schema v2.

Key technical decisions

api-layer refusal › prompt-layer refusal

The LLM call is short-circuited at the FastAPI boundary if KNN retrieval returns no chunks past the L2 distance threshold. The system prompt also instructs the model to cite every claim and respond with "no sources found" if nothing is relevant — but that's belt-and-braces. The API is the real boundary. The model cannot fabricate because it is never invoked.

sqlite-vec › pgvector / qdrant / chroma

No daemon, no schema migration, no running service. The entire index is one file you can rsync. On the Pi profile the index is ~10–15% of ZIM size with int8 quantisation. The choice matches the deployment target: a laptop or a Pi, not a server room.

openai-compatible /v1/chat/completions › custom api shape

Open WebUI is wired via OPENAI_API_BASE_URLS=http://rag:8000/v1. AllArkive appears as a model in the picker — any future OpenAI-compatible client works for free with no integration code.

AGPL-3.0 › MIT / Apache

Deliberate, not default. GPL leaves a SaaS loophole: you can run GPL software as a network service without releasing the source. AGPL closes it. A company cannot host AllArkive as a service without open-sourcing their modifications.

The refusal gate

kairos/scheduler.pypy

1# embed query → KNN top-5 (max L2 distance 1.0, normalised)
2chunks = vec_db.knn(
3    embed(question), k=5, max_distance=1.0
4)
5
6# server.py:300–301 — LLM is never called if retrieval is empty
7if not chunks:
8-    # system prompt: "if no passage is relevant, say no sources found"
9+    return NO_SOURCES_TEXT  # model cannot hallucinate
10
11answer = ollama.chat(build_prompt(chunks, question))
12return rewrite_citations(answer)  # [N] → [[N: title]](kiwix link)

The system prompt tells the model to cite every claim with [N], use only the provided passages, and respond with “no sources found” if nothing is relevant. That instruction matters, but the real enforcement is two lines above it: the LLM call never executes if retrieval returned nothing. Citations are then rewritten from [N] to clickable Kiwix deep-links — anchored to the exact article that supplied the passage.

Stack

Gluenginx · Docker Compose · bootstrap.sh · FastAPI RAG service

Local AIOllama (qwen2.5:7b default) · Open WebUI · nomic-embed-text

ArchiveKiwix · ZIM files (Wikipedia mini, WikiMed, iFixit, Stack Exchange)

Vector DBsqlite-vec (vec0 virtual table) · int8 quantised embeddings

TargetsmacOS · Linux · Raspberry Pi 4/5 · WSL2

source docs →← back to projects

1# embed query → KNN top-5 (max L2 distance 1.0, normalised)

2chunks = vec_db.knn(

3 embed(question), k=5, max_distance=1.0

6# server.py:300–301 — LLM is never called if retrieval is empty

7if not chunks:

8- # system prompt: "if no passage is relevant, say no sources found"

9+ return NO_SOURCES_TEXT # model cannot hallucinate

11answer = ollama.chat(build_prompt(chunks, question))

12return rewrite_citations(answer) # [N] → [[N: title]](kiwix link)