● buildingsamridhlimbu.com/projects/load-balancer · v0.1
load-balancer
● systems · Go
Adaptive concurrent HTTP load balancer in Go: 21 iterations from basic round-robin to P2C-EWMA, AIMD concurrency control, circuit breakers, and Prometheus observability. SIT315 HD submission.
21 iterations · 5 strategies · HD grade
Context
SIT315 Concurrent and Distributed Systems assessment at Deakin — written in Go 1.23. The exercise: start with the simplest working load balancer and iterate toward production-grade. 21 versions. Each adds one capability — health awareness, adaptive concurrency, failure isolation. The progression is the point.
Timeline
2025 · v1–4
Round-robin baseline
SIT315 (Concurrent & Distributed Systems). Start simple: even distribution, no health awareness. Establish the request routing primitives in Go.
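A minimal sketch of what a round-robin baseline can look like in Go. The `RoundRobin` type, its fields, and the addresses are illustrative assumptions, not the project's actual code; the only technique shown is a shared atomic counter taken modulo the backend count.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RoundRobin cycles through a fixed list of backend addresses.
// Type and field names are hypothetical, chosen for this sketch.
type RoundRobin struct {
	backends []string
	next     atomic.Uint64 // shared counter, safe for concurrent callers
}

// Pick returns the next backend in rotation.
func (r *RoundRobin) Pick() string {
	// Add returns the incremented value, so subtract 1 to start at index 0.
	n := r.next.Add(1) - 1
	return r.backends[n%uint64(len(r.backends))]
}

func main() {
	rr := &RoundRobin{backends: []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.Pick()) // the fourth pick wraps back to the first backend
	}
}
```

Every backend gets the same share regardless of how it is performing, which is the limitation the later versions address.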
2025 · v5–10
Health-aware routing
Added weighted round-robin and least-connections. Backends with higher measured latency receive proportionally fewer requests.
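Least-connections can be sketched with one in-flight counter per backend. The names below (`Backend`, `PickLeastConn`, `Done`) are assumptions made for illustration; the project's own types may differ.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Backend tracks in-flight requests for one upstream server (hypothetical type).
type Backend struct {
	Addr     string
	inflight atomic.Int64
}

// PickLeastConn returns the backend with the fewest in-flight requests and
// increments its counter. Ties go to the earliest backend in the slice.
// The scan and the increment are not one atomic step, so two concurrent
// callers may occasionally pick the same backend; this sketch accepts that.
func PickLeastConn(backends []*Backend) *Backend {
	best := backends[0]
	for _, b := range backends[1:] {
		if b.inflight.Load() < best.inflight.Load() {
			best = b
		}
	}
	best.inflight.Add(1)
	return best
}

// Done must be called when the proxied request finishes.
func (b *Backend) Done() { b.inflight.Add(-1) }

func main() {
	backends := []*Backend{{Addr: "a"}, {Addr: "b"}}
	first := PickLeastConn(backends)  // tie, so the first backend wins
	second := PickLeastConn(backends) // "a" now has one request in flight
	first.Done()                      // "a" is idle again
	third := PickLeastConn(backends)
	fmt.Println(first.Addr, second.Addr, third.Addr) // prints: a b a
}
```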
2025 · v11–15
P2C-EWMA
Power-of-Two-Choices with Exponentially Weighted Moving Average. Pick two backends at random, route to the one with lower EWMA latency. Avoids hot-spots without explicit health scores.
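The selection rule described above can be sketched as follows. The smoothing factor `alpha = 0.2`, the millisecond units, the `Backend` type, and the injected `intn` function are all assumptions for this example, not values taken from the project.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

const alpha = 0.2 // assumed smoothing factor; higher reacts faster to new samples

// Backend holds a smoothed latency estimate (hypothetical type).
type Backend struct {
	Addr string
	mu   sync.Mutex
	ewma float64 // smoothed latency in milliseconds; 0 means no samples yet
}

// Observe folds a latency sample into the average:
// ewma = alpha*sample + (1-alpha)*ewma.
func (b *Backend) Observe(ms float64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.ewma == 0 {
		b.ewma = ms // the first sample seeds the average
		return
	}
	b.ewma = alpha*ms + (1-alpha)*b.ewma
}

// Latency returns the current smoothed latency.
func (b *Backend) Latency() float64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.ewma
}

// PickP2C samples two distinct backends and returns the one with the lower
// EWMA latency. intn must behave like rand.Intn: a value in [0, n).
func PickP2C(backends []*Backend, intn func(n int) int) *Backend {
	if len(backends) == 1 {
		return backends[0]
	}
	i := intn(len(backends))
	j := intn(len(backends) - 1)
	if j >= i {
		j++ // shift so the second choice never equals the first
	}
	a, b := backends[i], backends[j]
	if b.Latency() < a.Latency() {
		return b
	}
	return a
}

func main() {
	pool := []*Backend{{Addr: "fast"}, {Addr: "mid"}, {Addr: "slow"}}
	pool[0].Observe(10)
	pool[1].Observe(20)
	pool[2].Observe(200)

	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[PickP2C(pool, rand.Intn).Addr]++
	}
	fmt.Println(counts) // "slow" loses every comparison, so it receives no picks
}
```

Because the two choices are distinct, the slowest backend in the pool never wins a comparison. In this sketch a backend with no samples reports 0 and is therefore preferred until its first measurement arrives.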
2025 · v16–20
AIMD + circuit breakers
AIMD concurrency control — additive increase on success, multiplicative decrease on timeout. Circuit breakers (closed/half-open/open) with outlier quarantine.
2025 · v21
Observability + admin API
Prometheus /metrics, slog structured JSON logging, /admin/metrics/json. Dynamic backend add/remove without restart, live strategy switching, canary rollout controls. /admin/selftest and /debug/config for diagnostics.
Key technical decisions
01
p2c-ewma › pure round-robin
Round-robin ignores backend health. P2C-EWMA naturally avoids slow backends — latency is the signal, no explicit health scores needed. Two random picks, route to the better one.
02
aimd concurrency › fixed connection limit
The same scheme TCP uses for congestion avoidance. The concurrency limit grows one step at a time while a backend is healthy and halves on the first timeout. It adapts in real time without coordination between backends.
03
circuit breakers › timeout-only failure handling
A consistently failing backend should be quarantined, not just timed out. Warm-up ramp on recovery + half-open probing lets it return to rotation gradually without manual intervention.
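A hedged sketch of the closed / open / half-open cycle. The failure `threshold`, the `cooldown`, and every identifier here are assumptions; the warm-up ramp and outlier quarantine mentioned above are not modelled.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// State is the breaker's position in the closed -> open -> half-open cycle.
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

func (s State) String() string {
	return [...]string{"closed", "open", "half-open"}[s]
}

// Breaker guards one backend. threshold and cooldown are assumed knobs.
type Breaker struct {
	mu        sync.Mutex
	state     State
	failures  int
	threshold int           // consecutive failures before opening
	cooldown  time.Duration // time spent open before a probe is allowed
	openedAt  time.Time
}

func NewBreaker(threshold int, cooldown time.Duration) *Breaker {
	return &Breaker{threshold: threshold, cooldown: cooldown}
}

// Allow reports whether a request may be sent to the backend at time now.
func (b *Breaker) Allow(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	switch b.state {
	case Closed:
		return true
	case Open:
		if now.Sub(b.openedAt) < b.cooldown {
			return false
		}
		b.state = HalfOpen // cooldown elapsed: let exactly one probe through
		return true
	default: // HalfOpen: a probe is already in flight
		return false
	}
}

// Report records the outcome of a request that Allow let through.
func (b *Breaker) Report(ok bool, now time.Time) {
	b.mu.Lock()
	defer b.mu.Unlock()
	switch {
	case ok && b.state != Open:
		b.state, b.failures = Closed, 0
	case !ok:
		b.failures++
		if b.state == HalfOpen || b.failures >= b.threshold {
			b.state, b.openedAt = Open, now
		}
	}
}

// State returns the current state.
func (b *Breaker) State() State {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.state
}

// demo walks one breaker through a full trip and recovery.
func demo() string {
	b := NewBreaker(2, time.Second)
	t0 := time.Unix(0, 0)
	b.Report(false, t0)
	s1 := b.State() // one failure is below the threshold
	b.Report(false, t0)
	s2 := b.State() // second failure trips the breaker
	early := b.Allow(t0.Add(500 * time.Millisecond))
	probe := b.Allow(t0.Add(time.Second))
	s3 := b.State()
	b.Report(true, t0.Add(time.Second))
	return fmt.Sprintf("%v %v %v %v %v %v", s1, s2, early, probe, s3, b.State())
}

func main() {
	fmt.Println(demo()) // prints: closed open false true half-open closed
}
```

Passing `now` in explicitly keeps the state machine deterministic and easy to test; a real proxy would pass `time.Now()`.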
04
ip-hash sticky sessions › stateless round-robin
Stateful workloads need affinity. IP-Hash maps client IP → consistent backend so session state stays local — no external store needed.
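One way to sketch the IP-to-backend mapping, assuming an FNV-1a hash and a plain modulo over the backend list. Both choices, and the `PickByIP` name, are assumptions; the project may hash differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"net"
)

// PickByIP maps a client's remote address ("ip:port" or bare IP) to a backend.
// The port is stripped so every connection from one client lands on the same backend.
func PickByIP(remoteAddr string, backends []string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present; hash the raw string
	}
	h := fnv.New32a()
	h.Write([]byte(host)) // Write on a hash.Hash never returns an error
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	backends := []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}
	a := PickByIP("203.0.113.7:51000", backends)
	b := PickByIP("203.0.113.7:51001", backends)
	fmt.Println(a == b) // same client IP, different source ports
}
```

With a plain modulo, adding or removing a backend remaps most clients. If affinity has to survive membership changes, a consistent-hash ring is the usual alternative.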
05
prometheus /metrics › log-only observability
Metrics are queryable over time; logs are forensic. Structured slog JSON logging pairs with Prometheus so you get both queryable time-series and machine-readable event context.
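The logging half of that pairing can be sketched with the standard library's `log/slog` JSON handler. The field names (`backend`, `status`, `latency_ms`) and the helper functions are assumptions, and the timestamp is dropped here only so the demo output is reproducible.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// newLogger builds a JSON logger. dropTime removes the timestamp attribute,
// which is useful for deterministic demo output and should stay false in production.
func newLogger(w io.Writer, dropTime bool) *slog.Logger {
	opts := &slog.HandlerOptions{}
	if dropTime {
		opts.ReplaceAttr = func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{} // a zero Attr is discarded by the handler
			}
			return a
		}
	}
	return slog.New(slog.NewJSONHandler(w, opts))
}

// logRequest emits one structured event per routed request (hypothetical fields).
func logRequest(l *slog.Logger, backend string, status int, latencyMs int64) {
	l.Info("request routed",
		slog.String("backend", backend),
		slog.Int("status", status),
		slog.Int64("latency_ms", latencyMs),
	)
}

// renderLine returns the JSON line a single request would produce.
func renderLine(backend string, status int, latencyMs int64) string {
	var buf bytes.Buffer
	logRequest(newLogger(&buf, true), backend, status, latencyMs)
	return buf.String()
}

func main() {
	fmt.Print(renderLine("10.0.0.1:8080", 200, 12))
	// {"level":"INFO","msg":"request routed","backend":"10.0.0.1:8080","status":200,"latency_ms":12}
}
```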
AIMD in practice
```go
// additive increase on success
if success {
	concurrency++
}
// multiplicative decrease on timeout
if timeout {
	concurrency = int(float64(concurrency) * 0.5)
}
```
The same scheme TCP uses for congestion avoidance, applied to backend concurrency. The limit grows one step at a time while backends are healthy and is cut in half on the first timeout. Works without coordination between backends.
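Repeated halving eventually reaches zero, so a bounded variant is worth sketching. The floor, ceiling, and all names below are assumptions added for illustration, not settings from the project.

```go
package main

import (
	"fmt"
	"sync"
)

// AIMD holds a concurrency limit adjusted by additive increase and
// multiplicative decrease, clamped to [lo, hi] (assumed bounds).
type AIMD struct {
	mu     sync.Mutex
	limit  int
	lo, hi int
}

func NewAIMD(start, lo, hi int) *AIMD {
	return &AIMD{limit: start, lo: lo, hi: hi}
}

// OnSuccess raises the limit by one, up to the ceiling.
func (a *AIMD) OnSuccess() {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.limit < a.hi {
		a.limit++
	}
}

// OnTimeout halves the limit, never dropping below the floor.
func (a *AIMD) OnTimeout() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.limit /= 2
	if a.limit < a.lo {
		a.limit = a.lo
	}
}

// Limit returns the current concurrency limit.
func (a *AIMD) Limit() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.limit
}

// Trace replays a string of outcomes ('s' = success, anything else = timeout)
// and returns the limit after each one.
func Trace(a *AIMD, outcomes string) []int {
	out := make([]int, 0, len(outcomes))
	for _, o := range outcomes {
		if o == 's' {
			a.OnSuccess()
		} else {
			a.OnTimeout()
		}
		out = append(out, a.Limit())
	}
	return out
}

func main() {
	a := NewAIMD(8, 1, 10)
	fmt.Println(Trace(a, "ssstttts")) // prints: [9 10 10 5 2 1 1 2]
}
```

The trace shows the asymmetry: three successes add at most three slots, while three timeouts take the limit from 10 down to 1.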
Stack
Language: Go 1.23
Strategies: Round-Robin · Weighted RR · Least-Connections · P2C-EWMA · IP-Hash
Resilience: AIMD concurrency · token-bucket rate limiting · circuit breakers · outlier quarantine · warm-up ramp · global semaphore
Observability: Prometheus /metrics · slog JSON logging · /admin/metrics/json
Admin: dynamic backend add/remove · live strategy switching · canary rollout · /admin/selftest · /debug/config