← Blog

Stario 3.3 — single-worker performance pass

Stario 3.3 is a performance-focused release. The server model stays the same: one process, one event loop, one request task per request. The changes are in the work that happens for every HTTP exchange: request body allocation, header handling, response serialization, compression negotiation, static file selection, and telemetry overhead when you deliberately want it out of the way.

The benchmark below reuses the endpoint shape from Cemrehan Çavdar’s Gin, Elysia, BlackSheep, and FastAPI benchmark: plaintext, JSON, route params, and a small POST validation endpoint.

Setup

bash
wrk -t 2 -c 128 -d 10s http://127.0.0.1:<port>/plaintext
wrk -t 2 -c 128 -d 10s http://127.0.0.1:<port>/json
wrk -t 2 -c 128 -d 10s http://127.0.0.1:<port>/user/42
wrk -t 2 -c 128 -d 10s -s projects/stario/benchmark/validate.lua \
  http://127.0.0.1:<port>/validate

All rows use one worker or process. Stario, Uvicorn, and Granian run on uvloop; Uvicorn targets force httptools. Access logs are disabled where the server exposes that option. Stario response compression is disabled so compression negotiation does not hide routing and response-writing cost.

Run host: Apple M1, 8 CPU cores, 16 GB memory, macOS 26.3.1, Python 3.14.

For JSON responses, Stario, FastAPI, and BlackSheep use ujson explicitly. Sanic depends on ujson and uses its own JSON response helper. The POST body is {"name":"Ada","age":42}. Stario, BlackSheep, and Sanic validate those fields manually; FastAPI uses Pydantic request validation, matching the original article’s FastAPI shape.

The current harness lives in projects/stario/benchmark/run.sh and writes summary.md plus config.txt to timestamped directories under projects/stario/benchmark/results/. It focuses on the framework comparison; the 3.2 release comparison below is kept here as release context.

Stario 3.3 vs 3.2

This figure keeps both Stario versions on the JSON tracer. Stario 3.2 did not have --tracer noop, so --tracer json is the cleaner release-to-release comparison.

Stario 3.3 vs 3.2, JSON tracerrequests/sec · higher is better
Stario 3.3 vs 3.2, JSON tracerHorizontal bar chart. Values are requests per second; higher is better.PlaintextStario 3.3, --tracer json34,009Stario 3.2, --tracer json26,938JSONStario 3.3, --tracer json35,907Stario 3.2, --tracer json27,444ParamsStario 3.3, --tracer json28,927Stario 3.2, --tracer json20,734ValidateStario 3.3, --tracer json24,665Stario 3.2, --tracer json23,097
Each endpoint is scaled independently so the fastest row for that endpoint fills the track.

Relative to 3.2, this run puts 3.3 ahead on plaintext (+26%), JSON (+31%), params (+40%), and validate (+7%).

Framework comparison

This table is not a release-to-release test. It asks a different question: what does the same single-worker route mix look like when access tracing is out of the request path?

Stario uses --tracer noop. FastAPI uses Uvicorn with uvloop, httptools, and no access log. BlackSheep is shown twice: once through Uvicorn with the same loop/parser settings, and once through Granian. Sanic runs as one process with access logging disabled.

Single-worker framework comparisonrequests/sec · higher is better
Single-worker framework comparisonHorizontal bar chart. Values are requests per second; higher is better.PlaintextStario 3.3, --tracer noop80,646FastAPI, Uvicorn26,680BlackSheep, Uvicorn55,829BlackSheep, Granian88,746Sanic48,429JSONStario 3.3, --tracer noop68,279FastAPI, Uvicorn24,877BlackSheep, Uvicorn53,243BlackSheep, Granian80,195Sanic54,030ParamsStario 3.3, --tracer noop78,234FastAPI, Uvicorn20,304BlackSheep, Uvicorn46,040BlackSheep, Granian87,730Sanic52,433ValidateStario 3.3, --tracer noop65,778FastAPI, Uvicorn19,019BlackSheep, Uvicorn45,386BlackSheep, Granian45,354Sanic43,770
Each endpoint is scaled independently so the fastest row for that endpoint fills the track.

The Granian row is useful context, not a claim about Stario’s server model. Granian moves a lot of HTTP server work into Rust; Stario’s runtime remains an asyncio protocol. That is an architecture difference you should keep visible when comparing the rows.

What changed in 3.3

The release keeps the public handler model intact. The faster rows come from smaller internal pieces:

  • Request bodies are allocated lazily; a typical GET no longer creates a body reader it will never use.

  • Parsed headers stay closer to their wire form in the HTTP hot path.

  • One-shot responses avoid the chunked writer path and do less header-map work.

  • Accept-Encoding negotiation is byte-native and cached for common headers.

  • Static file variant selection avoids parsing negotiation headers when a file has no compressed variants.

  • NoOpTracer gives benchmark and high-throughput runs a tracer that preserves the tracer protocol without serializing spans.