I built a Reverse Proxy in Rust to learn what production traffic actually demands

May 24, 2026 • 9 min read

Most backend projects treat the proxy layer like a black box. Requests come in, traffic gets forwarded, and as long as the app responds, nobody looks too closely at what happens in between. I wanted to understand that layer properly, so I built my own reverse proxy in Rust.

What started as a way to learn request routing and load balancing turned into something more useful. Once you stop treating traffic as a happy-path problem, the real questions show up fast: what happens when a backend starts flapping, when a client uploads too slowly, when an upstream hangs, or when you need to shut the system down without dropping in-flight requests?

Building this pushed me to think less like an application developer and more like an engineer responsible for traffic, failure, and control.

The full project is on GitHub.

What I built

I built ferrum-proxy, a reverse proxy written in Rust. It sits in front of backend services, accepts HTTP requests, matches them to routes, selects a healthy backend, forwards the request, and streams the response back to the client.

At a high level, the request path does a few things in sequence: assign a request ID, reject invalid or oversized input early, match a route, and then decide whether the request can be forwarded once or needs retry-aware handling.

pub async fn handle_request<B>(mut request: Request<B>, state: AppState) -> Response<ProxyBody>
where
    B: hyper::body::Body<Data = Bytes> + Send + Sync + 'static,
    B::Error: std::error::Error + Send + Sync + 'static,
{
    state.telemetry.record_request();

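    // Reuse the caller's x-request-id when it is valid header text; otherwise generate one.
    let request_id = request
        .headers()
        .get("x-request-id")
        .and_then(|v| v.to_str().ok())
        .map(str::to_string)
        .unwrap_or_else(next_request_id);

    let method = request.method().clone();
    let path = request.uri().path().to_string();

    // Reject oversized bodies early, based on the declared Content-Length.
    if let Some(content_length) =
        content_length_exceeds(request.headers(), state.upstream.max_request_body_bytes())
    {
        return complete_request(/* trimmed */);
    }

    // No matching route means the request never reaches an upstream.
    let Some(route) = match_route(&path, &state.config.routes) else {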
        return complete_request(/* trimmed */);
    };

    if should_retry_request(&state, &method) {
        // buffer once, then retry safely if policy allows it
    } else {
        // forward directly on the fast path
    }
}

I like this snippet because it shows the project for what it is: not just a server that forwards bytes, but a system that enforces policy before traffic ever reaches an upstream.

Under the hood, I added path-based routing, round-robin load balancing, active and passive health checks, bounded retries for safe requests, request and response size limits, timeout handling, graceful shutdown, and Prometheus-style metrics. I also built benchmark tooling around the live proxy so I could measure throughput and latency under different scenarios instead of guessing.

Why I built it

I built it because I wanted to get closer to the kind of engineering problems that only show up once software has to deal with real traffic. A lot of backend work lives behind frameworks and managed infrastructure, which is useful, but it can also hide the mechanics that shape reliability and performance.

I did not want to build another demo project where everything works as long as nothing goes wrong. I wanted to work on the part of the stack where failures are normal, tradeoffs matter, and small design decisions have visible consequences. Writing a reverse proxy forced me to think about timeouts, retries, streaming, backpressure, unhealthy backends, and shutdown behavior as first-class concerns.

This project was also a way to make my engineering taste visible. I wanted something that showed more than syntax or framework familiarity. I wanted a project that reflected how I think about systems: build the core path carefully, measure it, test it under realistic conditions, and be honest about what still needs improvement.

Core features

The proxy supports path-based routing, so incoming requests can be mapped to different backend pools based on URL prefixes. Once a route is matched, traffic is distributed with round-robin load balancing across healthy backends.
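A minimal sketch of those two steps, not the actual ferrum-proxy code. `Route`, `match_route_sketch`, and `RoundRobin` are hypothetical names, and "longest matching prefix wins" is an assumption about how overlapping prefixes are resolved:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical route entry: a URL prefix mapped to a pool of backend addresses.
pub struct Route {
    pub prefix: String,
    pub backends: Vec<String>,
}

/// Pick the route whose prefix matches the path.
/// Assumed policy: when several prefixes match, the longest one wins.
pub fn match_route_sketch<'a>(path: &str, routes: &'a [Route]) -> Option<&'a Route> {
    routes
        .iter()
        .filter(|r| path.starts_with(r.prefix.as_str()))
        .max_by_key(|r| r.prefix.len())
}

/// Round-robin cursor shared by all requests for one pool.
pub struct RoundRobin {
    next: AtomicUsize,
}

impl RoundRobin {
    pub fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    /// Advance the cursor, then walk the pool from that position and
    /// return the first backend the caller reports as healthy.
    pub fn pick<'a>(
        &self,
        backends: &'a [String],
        is_healthy: impl Fn(&str) -> bool,
    ) -> Option<&'a str> {
        let n = backends.len();
        if n == 0 {
            return None;
        }
        let start = self.next.fetch_add(1, Ordering::Relaxed);
        for i in 0..n {
            let candidate = backends[start.wrapping_add(i) % n].as_str();
            if is_healthy(candidate) {
                return Some(candidate);
            }
        }
        None
    }
}
```

Note that a plain `starts_with` check would also let `/api` match `/apiary`; a real matcher would likely need to respect path segment boundaries.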

I added both active and passive health checks. Active checks probe backend health endpoints in the background, while passive checks react to failures seen in real traffic. That made it possible to remove unstable backends from rotation and bring them back once they recovered.
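One way to combine the two signals is to feed them into the same per-backend state, so that a probe result and a failed live request are counted alike. The sketch below is an illustration under assumptions, not the project's implementation: `BackendHealth` and both thresholds are hypothetical.

```rust
/// Hypothetical per-backend health state fed by both probes and live traffic.
#[derive(Debug)]
pub struct BackendHealth {
    healthy: bool,
    consecutive_failures: u32,
    consecutive_successes: u32,
    /// Assumed knob: consecutive failures before the backend leaves rotation.
    fail_threshold: u32,
    /// Assumed knob: consecutive successes before it is allowed back in.
    recover_threshold: u32,
}

impl BackendHealth {
    pub fn new(fail_threshold: u32, recover_threshold: u32) -> Self {
        Self {
            healthy: true,
            consecutive_failures: 0,
            consecutive_successes: 0,
            fail_threshold,
            recover_threshold,
        }
    }

    /// Called by an active probe that failed or by a failed proxied request.
    pub fn record_failure(&mut self) {
        self.consecutive_successes = 0;
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        if self.consecutive_failures >= self.fail_threshold {
            self.healthy = false;
        }
    }

    /// Called by an active probe that succeeded or by a successful proxied request.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.consecutive_successes = self.consecutive_successes.saturating_add(1);
        if !self.healthy && self.consecutive_successes >= self.recover_threshold {
            self.healthy = true;
        }
    }

    pub fn is_healthy(&self) -> bool {
        self.healthy
    }
}
```

Requiring several consecutive results in each direction keeps a flapping backend from bouncing in and out of rotation on every request.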

The request path supports streaming for both inbound and outbound traffic, with explicit size limits and timeout handling to avoid letting slow or abusive clients hold resources indefinitely. I also added bounded retries for safe requests, graceful shutdown with connection draining, and Prometheus-style metrics so the system can be observed while it is under load.
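A declared Content-Length can be rejected before any body is read, but a chunked body carries no such header, so a streaming limit has to count bytes as they arrive. A minimal sketch of that counting, with `BodyLimit` and `LimitExceeded` as hypothetical names:

```rust
/// Hypothetical guard that counts bytes as body chunks stream through.
pub struct BodyLimit {
    max_bytes: u64,
    seen: u64,
}

/// Returned once the running total passes the configured limit.
#[derive(Debug, PartialEq, Eq)]
pub struct LimitExceeded {
    pub limit: u64,
}

impl BodyLimit {
    pub fn new(max_bytes: u64) -> Self {
        Self { max_bytes, seen: 0 }
    }

    /// Account for one chunk; fail as soon as the total exceeds the limit
    /// so the caller can stop reading instead of buffering the rest.
    pub fn admit(&mut self, chunk: &[u8]) -> Result<(), LimitExceeded> {
        self.seen = self.seen.saturating_add(chunk.len() as u64);
        if self.seen > self.max_bytes {
            Err(LimitExceeded { limit: self.max_bytes })
        } else {
            Ok(())
        }
    }
}
```

The same counter works in either direction: wrap the client body to cap uploads, or the upstream body to cap responses.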

To make performance work concrete, I built a benchmark workflow around the live proxy using wrk. That let me measure latency and throughput across healthy traffic, retries, large responses, and request uploads instead of relying on assumptions.

Engineering decisions

Rust felt like the right choice for this project because the problem space is concurrency-heavy, IO-bound, and sensitive to correctness. A reverse proxy has to manage many connections at once, move data efficiently, and stay predictable under failure. Rust gave me low-level control without forcing me to accept memory unsafety as the cost.

One of the biggest decisions was to treat streaming as the default instead of buffering everything up front. That matters for latency, memory usage, and behavior under larger payloads. At the same time, I had to be selective about where buffering was necessary, especially for retryable requests, because replaying a request safely requires a different tradeoff than simply forwarding it once.

I was also careful about where to draw boundaries around retry logic and health behavior, and that care shows up directly in the retry path. I did not want retries to be open-ended or invisible. The proxy only retries within a bounded budget, keeps track of which backends have already been tried, and stops once the request is no longer safe or useful to replay.

async fn forward_with_retries(
    state: &AppState,
    route: &RouteConfig,
    method: &Method,
    path: &str,
    request: Request<Bytes>,
    client_addr: Option<SocketAddr>,
    started: Instant,
    client_body_timeout: Duration,
    request_id: &str,
) -> Response<ProxyBody> {
    let replayable = ReplayableRequest::from(request);
    let max_attempts = state.config.retry.max_attempts;
    let total_timeout = Duration::from_millis(state.config.retry.total_timeout_ms);
    let mut attempted_backends = HashSet::new();
    let mut attempt = 1usize;

    loop {
        // Stop once the overall retry deadline has passed.
        if started.elapsed() >= total_timeout {
            return complete_request(/* retry budget exhausted */);
        }

        // Pass the set of backends already tried so they can be skipped.
        let backend = match select_backend(state, route, Some(&attempted_backends)) {
            Some(backend) => backend,
            None => return complete_request(/* no healthy backends left */),
        };
        attempted_backends.insert(backend.clone());

        let response = forward_request(/* trimmed */).await;
        let status = response.status();

        // Return as-is when the response is not retryable or the attempt budget is spent.
        if !should_retry_response(state, status) || attempt >= max_attempts {
            return response;
        }

        attempt += 1;
    }
}

I like this part because it reflects the kind of decision-making I care about in systems work. The question is not “can I add retries?” It is “how do I add retries without making failure behavior worse?”

Retrying everything is easy to implement and dangerous in practice. I limited retries to request types that are safe to replay and kept them inside a bounded timeout window. The same thinking applied to backend health: the goal was not just to mark things healthy or unhealthy, but to make failure handling stable enough that the proxy would not amplify backend problems.
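The eligibility check behind that policy can be sketched as a pure function. This is an illustration under assumptions, not the project's `should_retry_request` or `should_retry_response`: `RetryBudget` is a hypothetical type, "safe" is taken to mean the HTTP safe methods, and the retryable status set (502, 503, 504) is a guess.

```rust
use std::time::Duration;

/// HTTP methods defined as safe (read-only), so replaying them should not
/// change server state.
pub fn is_safe_method(method: &str) -> bool {
    matches!(method, "GET" | "HEAD" | "OPTIONS" | "TRACE")
}

/// Assumed set of upstream statuses worth retrying on another backend.
pub fn is_retryable_status(status: u16) -> bool {
    matches!(status, 502 | 503 | 504)
}

/// Hypothetical budget: a cap on attempts plus a cap on total elapsed time.
pub struct RetryBudget {
    pub max_attempts: usize,
    pub total_timeout: Duration,
}

impl RetryBudget {
    /// A retry is allowed only when every condition holds: the method is
    /// safe, the status is retryable, and both budgets still have room.
    pub fn allows_retry(
        &self,
        method: &str,
        status: u16,
        attempt: usize,
        elapsed: Duration,
    ) -> bool {
        is_safe_method(method)
            && is_retryable_status(status)
            && attempt < self.max_attempts
            && elapsed < self.total_timeout
    }
}
```

Keeping the decision in one place makes it easy to test and makes it obvious that a POST, a successful response, or an exhausted budget each end the loop on their own.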

Another important decision was how to measure the system. I chose to benchmark the live proxy with wrk instead of relying on function-level microbenchmarks, because the real costs are in sockets, HTTP parsing, routing, forwarding, streaming, retries, and logging under load. That gave me results that were much closer to how the system would actually behave in practice.

What was hard

The hardest part was that none of the interesting problems were isolated. Almost every feature touched something else. Retries affected buffering. Buffering affected memory pressure. Health checks affected load balancing. Timeouts affected both correctness and user experience. Even graceful shutdown turned into more than just “stop the server” once in-flight requests had to be handled properly.

Header handling was another place where the project became more real than it looked at first. A proxy cannot just forward everything blindly. Connection-specific headers, forwarding headers, and host information all have to be handled deliberately, or the proxy starts creating subtle correctness problems between clients and upstreams.
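To make the header problem concrete, here is a small sketch of request-header preparation. It is not the project's code: `prepare_upstream_headers` is a hypothetical helper, headers are modeled as plain string pairs rather than a real header map, and leaving Host untouched is an assumption. It drops the standard hop-by-hop headers, drops anything named in Connection, and appends the client address to X-Forwarded-For.

```rust
/// Headers that apply to a single connection and must not be forwarded.
const HOP_BY_HOP: [&str; 8] = [
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailer",
    "transfer-encoding",
    "upgrade",
];

/// Hypothetical helper: build the header list to send upstream.
pub fn prepare_upstream_headers(
    headers: &[(String, String)],
    client_ip: &str,
) -> Vec<(String, String)> {
    // Any header named in a Connection header is hop-by-hop as well.
    let listed: Vec<String> = headers
        .iter()
        .filter(|(name, _)| name.eq_ignore_ascii_case("connection"))
        .flat_map(|(_, value)| value.split(','))
        .map(|token| token.trim().to_ascii_lowercase())
        .collect();

    let mut out = Vec::new();
    let mut forwarded_for: Option<String> = None;

    for (name, value) in headers {
        let lower = name.to_ascii_lowercase();
        if HOP_BY_HOP.contains(&lower.as_str()) || listed.contains(&lower) {
            continue;
        }
        if lower == "x-forwarded-for" {
            // Simplification: if the header repeats, only the last value is kept.
            forwarded_for = Some(value.clone());
            continue;
        }
        out.push((name.clone(), value.clone()));
    }

    // Append this hop's client address to any existing chain.
    let chain = match forwarded_for {
        Some(previous) => format!("{previous}, {client_ip}"),
        None => client_ip.to_string(),
    };
    out.push(("x-forwarded-for".to_string(), chain));
    out
}
```

The same filtering applies to response headers on the way back, since hop-by-hop headers from the upstream describe the proxy-to-backend connection, not the client's.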

Failure handling was also harder than the happy path. It is easy to proxy a request when both client and backend behave normally. It is much harder to decide what should happen when a backend hangs, when a client sends data too slowly, or when one backend starts failing intermittently. Those are the situations that shape whether a system is trustworthy.

The other difficult part was performance measurement. It is tempting to treat one benchmark run as the answer, but that is not how benchmarking works. Results vary, especially on short runs, so I had to think about repeatability, scenario design, and what a useful baseline actually looks like. That changed the project from just building features to building a system I could evaluate with some discipline.
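One simple way to avoid overreading a single run is to summarize several runs by their median and spread. The sketch below assumes each sample is one number per run (for example requests per second reported by wrk); `RunSummary` and `summarize_runs` are hypothetical names, not part of the project's benchmark tooling.

```rust
/// Summary of repeated benchmark runs, one sample per run.
#[derive(Debug, PartialEq)]
pub struct RunSummary {
    pub median: f64,
    pub min: f64,
    pub max: f64,
    /// (max - min) / median: a rough indicator of run-to-run noise.
    pub relative_spread: f64,
}

/// Returns None for an empty slice or when any sample is NaN or infinite.
pub fn summarize_runs(samples: &[f64]) -> Option<RunSummary> {
    if samples.is_empty() || samples.iter().any(|s| !s.is_finite()) {
        return None;
    }

    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    let n = sorted.len();
    let median = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    };
    let (min, max) = (sorted[0], sorted[n - 1]);
    let relative_spread = if median == 0.0 { 0.0 } else { (max - min) / median };

    Some(RunSummary { median, min, max, relative_spread })
}
```

If the spread between runs is larger than the difference between two configurations, the comparison is not telling you much yet.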

How I validated it

I did not want this to be the kind of systems project that only works in a README. I tested it at multiple levels so I could validate both individual components and the full request path.

At the code level, I added unit tests around routing, balancing, health state, config validation, upstream behavior, and telemetry. That helped lock down the smaller pieces and made it easier to change internals without breaking core assumptions.

I also added integration and black-box tests that start the real server and send HTTP traffic through it. One of the tests I like most checks graceful shutdown against a live in-flight request. That matters because shutdown behavior is easy to hand-wave and surprisingly easy to get wrong.

#[tokio::test]
async fn drains_in_flight_request_during_graceful_shutdown() {
    let backend = spawn_slow_upstream(
        "drained-backend",
        StatusCode::OK,
        Duration::from_millis(100),
    )
    .await;

    // `config` and `base_url` come from test setup that is trimmed here.
    let (shutdown_tx, shutdown_rx) = oneshot::channel::<()>();

    let task = tokio::spawn(async move {
        let _ = server::run_with_shutdown(config, async move {
            let _ = shutdown_rx.await;
        })
        .await;
    });

    let request_task = tokio::spawn(async move {
        get(&base_url, "/api/users").await
    });

    sleep(Duration::from_millis(20)).await;
    let _ = shutdown_tx.send(());

    let response = timeout(Duration::from_secs(1), request_task).await.unwrap().unwrap();
    assert_eq!(response.status, StatusCode::OK);
    assert_eq!(response.body, "drained-backend");
}

Tests like that gave me more confidence than a basic happy-path check. They forced me to verify behavior at the exact boundary where production systems usually get exposed.
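The bookkeeping behind draining can be sketched with two atomics: a flag that stops new work and a counter of requests still in flight. This is an illustration of the idea using only the standard library, not how ferrum-proxy implements it; `Drain` and `InFlightGuard` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical drain tracker shared between the accept loop and handlers.
pub struct Drain {
    closing: AtomicBool,
    in_flight: AtomicUsize,
}

/// Held for the lifetime of one request; dropping it marks the request done.
pub struct InFlightGuard {
    drain: Arc<Drain>,
}

impl Drain {
    pub fn new() -> Self {
        Self {
            closing: AtomicBool::new(false),
            in_flight: AtomicUsize::new(0),
        }
    }

    /// Register a new request, or refuse it if shutdown has started.
    /// The counter is bumped before the flag is checked so a request cannot
    /// slip in unnoticed between the check and the increment.
    pub fn try_begin(drain: &Arc<Drain>) -> Option<InFlightGuard> {
        drain.in_flight.fetch_add(1, Ordering::SeqCst);
        if drain.closing.load(Ordering::SeqCst) {
            drain.in_flight.fetch_sub(1, Ordering::SeqCst);
            return None;
        }
        Some(InFlightGuard { drain: Arc::clone(drain) })
    }

    /// Stop admitting new requests; existing ones keep running.
    pub fn begin_shutdown(&self) {
        self.closing.store(true, Ordering::SeqCst);
    }

    /// True once shutdown has started and every admitted request has finished.
    pub fn is_drained(&self) -> bool {
        self.closing.load(Ordering::SeqCst) && self.in_flight.load(Ordering::SeqCst) == 0
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.drain.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}
```

A server built this way would stop accepting on `begin_shutdown` and exit once `is_drained` returns true or a drain deadline expires, whichever comes first.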

For performance validation, I ran wrk against the live proxy. Instead of measuring tiny internal functions, I measured the real system under healthy traffic, larger responses, retries, and uploads. I also repeated runs to get a more credible baseline, because one benchmark number on its own is not very useful.

That combination gave me something I trust more than a feature checklist. It gave me evidence that the system behaves the way I intended under both normal and stressed conditions.

What I learned

The biggest thing I learned is that systems work is mostly about edge cases, not the happy path. The basic request-forwarding flow is not the hard part. The hard part is deciding what the system should do when something is slow, partial, unhealthy, or inconsistent, and then making that behavior observable and predictable.

I also learned that performance work is easy to talk about and harder to do properly. It is not enough to say something is fast. You need a benchmark that reflects the real request path, a stable enough setup to compare runs, and enough discipline not to overread one good number. Building the benchmark harness was part of the engineering work, not something I tacked on at the end.

Another lesson was how much operational thinking changes the way you build software. Once you care about graceful shutdown, metrics, retries, backend health, and failure classification, the design gets sharper. You start asking better questions about what the system guarantees, what it exposes, and how it behaves when conditions are bad rather than ideal.

More than anything, this project pushed me to think in terms of behavior under load instead of just correctness in isolation. That shift made me feel closer to the kind of engineering work I want to keep doing.