● refactoringbuildingsamridhlimbu.com/projects/load-balancer · v0.1

❯ cd projects/load-balancer

load-balancer

● systems · Go

Adaptive concurrent HTTP load balancer in Go — 21 iterations from basic round-robin to P2C-EWMA, AIMD concurrency control, circuit breakers, and Prometheus observability. SIT315 HD submission.

Context

A SIT315 Concurrent and Distributed Systems assessment at Deakin, written in Go 1.23. The exercise: start with the simplest working load balancer and iterate toward production grade over 21 versions, each adding one capability such as health awareness, adaptive concurrency, or failure isolation. The progression is the point.

Timeline

2025 · v1–4

Round-robin baseline

Start simple: even distribution across backends, no health awareness. This stage establishes the request-routing primitives in Go.
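A minimal sketch of round-robin selection, assuming a fixed slice of backend addresses. `RoundRobin`, `Pick` and the addresses are hypothetical names, not the project's code. An atomic counter keeps selection safe under concurrent requests without a mutex.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RoundRobin cycles through a fixed list of backends (hypothetical type).
type RoundRobin struct {
	backends []string
	next     atomic.Uint64 // shared counter; safe for concurrent callers
}

// Pick returns the next backend in order. Add returns the new counter value,
// so subtracting one makes the first call return index 0.
func (rr *RoundRobin) Pick() string {
	n := rr.next.Add(1) - 1
	return rr.backends[n%uint64(len(rr.backends))]
}

func main() {
	rr := &RoundRobin{backends: []string{"a:8080", "b:8080", "c:8080"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.Pick()) // a:8080, b:8080, c:8080, then a:8080 again
	}
}
```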

2025 · v5–10

Health-aware routing

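Added weighted round-robin and least-connections. Backends with higher measured latency receive fewer requests: under least-connections, a slow backend holds each request longer, so its in-flight count stays high and new requests go elsewhere.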
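A minimal least-connections sketch, assuming in-flight counts are tracked under a single mutex. `LeastConn`, `Backend`, `Acquire` and `Release` are hypothetical names. It shows how slow backends end up with fewer requests without any explicit latency input.

```go
package main

import (
	"fmt"
	"sync"
)

// Backend tracks how many requests it is currently serving (hypothetical type).
type Backend struct {
	Addr     string
	inflight int
}

// LeastConn routes each request to the backend with the fewest in-flight requests.
type LeastConn struct {
	mu       sync.Mutex
	backends []*Backend
}

// Acquire picks the least-loaded backend and counts the request against it.
// Ties go to the earlier backend in the list. Callers must call Release.
func (lc *LeastConn) Acquire() *Backend {
	lc.mu.Lock()
	defer lc.mu.Unlock()
	best := lc.backends[0]
	for _, b := range lc.backends[1:] {
		if b.inflight < best.inflight {
			best = b
		}
	}
	best.inflight++
	return best
}

// Release marks one request on b as finished.
func (lc *LeastConn) Release(b *Backend) {
	lc.mu.Lock()
	b.inflight--
	lc.mu.Unlock()
}

func main() {
	lc := &LeastConn{backends: []*Backend{{Addr: "a:8080"}, {Addr: "b:8080"}}}
	first := lc.Acquire()  // a: tie broken by list order
	second := lc.Acquire() // b: a already has one request in flight
	lc.Release(second)
	fmt.Println(first.Addr, second.Addr, lc.Acquire().Addr) // a:8080 b:8080 b:8080
}
```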

2025 · v11–15

P2C-EWMA

Power-of-Two-Choices with Exponentially Weighted Moving Average. Pick two backends at random, route to the one with lower EWMA latency. Avoids hot-spots without explicit health scores.
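The P2C-EWMA rule can be sketched as follows. Assumptions: latency samples in milliseconds, a smoothing factor `alpha = 0.3`, and at least two backends. `Backend`, `Observe`, `Load` and `PickP2C` are hypothetical names rather than the project's API. Each sample updates `ewma = alpha*sample + (1 - alpha)*ewma`.

```go
package main

import (
	"fmt"
	"math/rand/v2"
	"sync"
)

// alpha is an assumed smoothing factor: higher values react faster to new samples.
const alpha = 0.3

// Backend holds a smoothed latency estimate (hypothetical type).
type Backend struct {
	Addr string
	mu   sync.Mutex
	ewma float64 // milliseconds; zero means no samples yet, so untried backends look fastest
}

// Observe folds one latency sample into the moving average.
func (b *Backend) Observe(latencyMs float64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.ewma == 0 {
		b.ewma = latencyMs // seed with the first sample
		return
	}
	b.ewma = alpha*latencyMs + (1-alpha)*b.ewma
}

// Load returns the current EWMA latency.
func (b *Backend) Load() float64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.ewma
}

// PickP2C samples two distinct backends at random and returns the one with
// the lower EWMA latency. It panics if fewer than two backends are given.
func PickP2C(backends []*Backend) *Backend {
	i := rand.IntN(len(backends))
	j := rand.IntN(len(backends) - 1)
	if j >= i {
		j++ // shift past i so the two picks are distinct
	}
	a, b := backends[i], backends[j]
	if b.Load() < a.Load() {
		return b
	}
	return a
}

func main() {
	fast := &Backend{Addr: "fast:8080"}
	slow := &Backend{Addr: "slow:8080"}
	fast.Observe(10)
	slow.Observe(200)
	slow.Observe(100) // 0.3*100 + 0.7*200 = 170
	fmt.Println(PickP2C([]*Backend{fast, slow}).Addr) // fast:8080
}
```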

2025 · v16–20

AIMD + circuit breakers

AIMD concurrency control — additive increase on success, multiplicative decrease on timeout. Circuit breakers (closed/half-open/open) with outlier quarantine.

2025 · v21

Observability + admin API

Prometheus /metrics, slog structured JSON logging, /admin/metrics/json. Dynamic backend add/remove without restart, live strategy switching, canary rollout controls. /admin/selftest and /debug/config for diagnostics.

Key technical decisions

p2c-ewma › pure round-robin

Round-robin ignores backend health. P2C-EWMA naturally avoids slow backends — latency is the signal, no explicit health scores needed. Two random picks, route to the better one.

aimd concurrency › fixed connection limit

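A fixed connection limit is either too low for a healthy backend or too high for a struggling one. AIMD applies the additive-increase/multiplicative-decrease rule TCP uses for congestion control: the limit grows by one per successful request and halves on the first timeout, so it adapts in real time without coordination between backends.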

circuit breakers › timeout-only failure handling

A consistently failing backend should be quarantined, not merely timed out request by request. On recovery, half-open probing plus a warm-up ramp lets it return to rotation gradually without manual intervention.

ip-hash sticky sessions › stateless round-robin

Stateful workloads need affinity. IP-Hash maps client IP → consistent backend so session state stays local — no external store needed.
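A minimal IP-hash sketch, assuming FNV-1a over the client host and plain modulo placement. `pickByIP` is a hypothetical name. The port is stripped so that different source ports from one client still map together. Plain modulo keeps a client on the same backend only while the pool is unchanged: adding or removing a backend remaps most clients, and a consistent-hash ring would limit that churn.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"net"
)

// pickByIP maps a client address (e.g. an http.Request's RemoteAddr) to a backend.
func pickByIP(remoteAddr string, backends []string) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present; hash the whole string
	}
	h := fnv.New32a()
	h.Write([]byte(host)) // hash.Hash writes never return an error
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	backends := []string{"a:8080", "b:8080", "c:8080"}
	// Same IP, different source ports: both requests reach the same backend.
	fmt.Println(pickByIP("203.0.113.7:51000", backends) == pickByIP("203.0.113.7:52000", backends)) // true
}
```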

prometheus /metrics › log-only observability

Metrics are queryable over time; logs are forensic. Structured slog JSON logging pairs with Prometheus so you get both queryable time-series and machine-readable event context.

AIMD in practice

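```go
// onResult (illustrative name) applies one AIMD step to a concurrency limit.
func onResult(concurrency int, success, timeout bool) int {
	switch {
	case timeout:
		// multiplicative decrease: halve on timeout, but keep at least one
		// slot so the backend can still be probed and recover
		return max(concurrency/2, 1)
	case success:
		// additive increase on success
		return concurrency + 1
	}
	return concurrency
}
```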

Each success raises the limit by one and each timeout halves it, so the limit climbs steadily while a backend is healthy and drops sharply the moment it struggles. Every backend's limit adapts on its own signals, with no coordination between backends.

Stack

Language: Go 1.23

Strategies: Round-Robin · Weighted RR · Least-Connections · P2C-EWMA · IP-Hash

Resilience: AIMD concurrency · token-bucket rate limiting · circuit breakers · outlier quarantine · warm-up ramp · global semaphore

Observability: Prometheus /metrics · slog JSON logging · /admin/metrics/json

Admin: dynamic backend add/remove · live strategy switching · canary rollout · /admin/selftest · /debug/config
