API Gateway Patterns: Routing, Rate Limiting, Auth

An API gateway sits between clients and backend services, handling cross-cutting concerns that every API needs. This guide covers the patterns that make gateways effective.

Key Takeaways

  • Centralize cross-cutting concerns: API gateways consolidate routing, authentication, and rate limiting into one layer, so each microservice does not have to reimplement them.
  • Token bucket balances burst and sustained limits: The token bucket algorithm is a practical default for rate limiting, allowing legitimate traffic bursts while enforcing long-term rate caps.
  • Split auth responsibilities appropriately: Handle authentication and coarse-grained authorization at the gateway, but keep fine-grained authorization in backend services where business context lives.

As applications evolve from monoliths to distributed architectures, managing the interface between external clients and internal services becomes increasingly complex. An API gateway centralizes cross-cutting concerns like routing, authentication, rate limiting, and observability into a single layer, preventing each microservice from reimplementing these capabilities independently.

Without a gateway, every service must handle its own authentication, implement its own rate limiting, manage its own TLS termination, and expose its own endpoints. The gateway pattern consolidates this complexity into one manageable component.

Routing Patterns

Path-Based Routing

The most straightforward routing pattern directs requests to backend services based on URL path prefixes. Requests to /api/users/* route to the user service, /api/orders/* to the order service, and /api/products/* to the product service. Clients interact with a single domain and are unaware of the service boundaries behind the gateway.

Path-based routing requires careful namespace planning. Conflicting or overlapping paths create ambiguity. A clear naming convention established early prevents routing conflicts as the number of services grows.
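One way to keep overlapping prefixes unambiguous is to always pick the longest matching prefix and to match only on path-segment boundaries. The sketch below illustrates that rule; the route table and the `resolve_upstream` name are hypothetical, not taken from any particular gateway.

```python
from typing import Optional

# Hypothetical route table: path prefix -> upstream service name.
ROUTES = {
    "/api/users": "user-service",
    "/api/orders": "order-service",
    "/api/products": "product-service",
}


def resolve_upstream(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream for the longest matching prefix, or None."""
    best = None
    for prefix in routes:
        # Match on segment boundaries so /api/users-export does not
        # accidentally hit the /api/users route.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None
```

With this rule, adding a more specific route such as /api/users/admin later does not change where /api/users/42 is sent.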

Header-Based Routing

Some routing decisions depend on request headers rather than paths. API versioning is a common case: requests with Accept: application/vnd.api.v2+json route to the v2 service while older clients route to v1. Similarly, A/B test assignments carried in headers can route to different service versions.
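As a rough illustration of version routing on the Accept header, the following sketch extracts the version number from the vendor media type and falls back to v1 when none is present. The upstream names and the fallback policy are assumptions made for the example.

```python
import re

# Matches vendor media types such as application/vnd.api.v2+json.
_VERSION_RE = re.compile(r"application/vnd\.api\.v(\d+)\+json")

# Hypothetical mapping from API version to upstream service.
VERSIONED_UPSTREAMS = {1: "orders-v1", 2: "orders-v2"}


def upstream_for_accept(accept_header: str, default_version: int = 1) -> str:
    """Pick an upstream from the Accept header, defaulting to v1."""
    match = _VERSION_RE.search(accept_header or "")
    version = int(match.group(1)) if match else default_version
    # Assumption: unknown versions fall back to the default instead of failing.
    return VERSIONED_UPSTREAMS.get(version, VERSIONED_UPSTREAMS[default_version])
```

Whether an unknown version should fall back or be rejected with an error is a policy choice; the fallback here is only one option.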

Content-Based Routing

Advanced gateways inspect request bodies to make routing decisions. A multi-tenant platform might examine a tenant identifier in the request payload to route to the appropriate database shard or geographic region. This pattern adds latency since the gateway must parse the request body, so it should be used selectively.
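A minimal sketch of the tenant example, assuming a JSON body with a `tenant_id` field (a hypothetical field name) and a static tenant-to-region table. It also shows where the extra cost comes from: the whole body has to be parsed before a routing decision exists.

```python
import json

# Hypothetical tenant-to-region mapping.
TENANT_REGIONS = {"acme": "eu-west", "globex": "us-east"}
DEFAULT_REGION = "us-east"


def region_for_body(raw_body: bytes) -> str:
    """Choose a region from the tenant_id in a JSON body."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Malformed JSON or undecodable bytes: route to the default region.
        return DEFAULT_REGION
    if not isinstance(payload, dict):
        return DEFAULT_REGION
    tenant = payload.get("tenant_id")
    if not isinstance(tenant, str):
        return DEFAULT_REGION
    return TENANT_REGIONS.get(tenant, DEFAULT_REGION)
```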

Backend for Frontend (BFF)

The BFF pattern creates specialized gateway layers for different client types. A mobile BFF aggregates multiple backend calls into a single response optimized for mobile bandwidth constraints. A web BFF returns richer data structures suited for desktop interfaces. Each BFF handles the specific needs of its client type without polluting a shared API.
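To make the aggregation idea concrete, here is a small sketch of a mobile BFF handler that combines two backend calls into one trimmed response. The fetch functions are passed in as parameters and stand in for real service clients; the field names are assumptions for illustration.

```python
from typing import Callable


def mobile_order_summary(
    user_id: str,
    fetch_user: Callable[[str], dict],
    fetch_orders: Callable[[str], list],
) -> dict:
    """Aggregate two backend calls into one compact mobile response."""
    user = fetch_user(user_id)
    orders = fetch_orders(user_id)
    # Return only the fields a compact mobile screen needs.
    return {
        "name": user["name"],
        "order_count": len(orders),
        "latest_order_id": orders[-1]["id"] if orders else None,
    }
```

A web BFF could call the same backends and return the full order list instead.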

Rate Limiting Strategies

Rate limiting protects backend services from overload, prevents abuse, and ensures fair resource allocation among clients. The gateway is the natural enforcement point because it sees all incoming traffic before it reaches backend services.

Fixed Window

The simplest strategy allows a fixed number of requests within a time window, such as 100 requests per minute. When the window resets, the counter resets. Fixed windows are easy to implement but vulnerable to burst traffic at window boundaries. A client could send 100 requests at the end of one window and 100 more at the start of the next, creating a burst of 200 requests within seconds.
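The boundary problem is easy to reproduce with a minimal in-memory sketch. The class below is illustrative only: it takes the current time as an argument to stay deterministic, and it never evicts old counters, which a real implementation would need to do.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client_id, window_index) -> request count. Old windows are
        # never evicted in this sketch.
        self.counts: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        key = (client_id, window)
        count = self.counts.get(key, 0)
        if count >= self.limit:
            return False
        self.counts[key] = count + 1
        return True
```

With a limit of 100 per 60 seconds, 100 requests at t=59.9 and another 100 at t=60.1 are all admitted, because the two batches land in different windows.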

Sliding Window

Sliding window rate limiting smooths the burst problem by considering request history continuously rather than in fixed intervals. Instead of resetting the counter every minute, the algorithm considers the weighted count across the current and previous windows. This produces more consistent throttling behavior at the cost of a slightly more complex implementation.
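A common way to implement the weighted count is to scale the previous window's total by the fraction of it that still overlaps the sliding window, then add the current window's count. The sketch below assumes that approximation; as before, time is passed in explicitly and old counters are not evicted.

```python
class SlidingWindowLimiter:
    """Sliding window counter using a weighted previous-window count."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client_id, window_index) -> request count.
        self.counts: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window_seconds) / self.window_seconds
        current = self.counts.get((client_id, window), 0)
        previous = self.counts.get((client_id, window - 1), 0)
        # The previous window contributes only its still-overlapping share.
        estimated = previous * (1.0 - elapsed) + current
        if estimated + 1 > self.limit:
            return False
        self.counts[(client_id, window)] = current + 1
        return True
```

Under the same 100-per-60-seconds limit, a client that sent 100 requests at t=59.9 is rejected at t=60.1, since almost all of the previous window still counts against it. Halfway through the next window, half of that budget is available again.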

Token Bucket

The token bucket algorithm models a bucket that fills with tokens at a steady rate. Each request consumes a token. When the bucket is empty, requests are rejected or queued. The bucket has a maximum capacity, allowing short bursts up to the bucket size while enforcing a sustained rate limit.

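Token bucket is widely used in production API gateways because it naturally allows bursts (which legitimate clients produce) while still enforcing long-term rate limits. AWS API Gateway throttles requests with a token bucket. NGINX's limit_req module uses the closely related leaky bucket algorithm, and Kong's rate limiting plugins are built on fixed and sliding windows, so check which algorithm a given gateway actually implements before relying on its burst behavior.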
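The refill-and-consume cycle can be sketched in a few lines. This is a single-process illustration that assumes the caller supplies the current time and that the bucket starts full; a gateway running several instances would need shared state, which is out of scope here.

```python
class TokenBucket:
    """Token bucket: steady refill rate with a capped burst size."""

    def __init__(self, rate_per_second: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = capacity  # assumption: the bucket starts full
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With a rate of 1 token per second and a capacity of 5, a client can send 5 requests at once, then 1 per second after that. A long idle period never earns more than 5 tokens, which is what bounds the burst.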

Rate Limit Scopes

Rate limits can be applied at multiple granularities. Per-API-key limits prevent individual clients from monopolizing resources. Per-endpoint limits protect expensive operations (complex searches, report generation) more aggressively than cheap ones (health checks, simple reads). Global limits protect the overall system capacity regardless of which client is calling.

Effective rate limiting communicates clearly with clients. The 429 Too Many Requests status code signals throttling, and the standard Retry-After header tells clients how long to wait before retrying. The X-RateLimit-Remaining and X-RateLimit-Reset headers, a widely used convention rather than a formal standard, let clients self-regulate before hitting limits.
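As an illustration, the helper below builds the status code and headers for a rate-limited endpoint. It assumes X-RateLimit-Reset carries a Unix timestamp in seconds; some APIs send seconds remaining instead, so treat the format as an assumption of this sketch.

```python
import math


def rate_limit_response(
    allowed: bool, remaining: int, reset_epoch: float, now: float
) -> tuple[int, dict[str, str]]:
    """Return (status_code, headers) describing the client's quota state."""
    headers = {
        "X-RateLimit-Remaining": str(max(0, remaining)),
        # Assumption: reset is expressed as a Unix timestamp in seconds.
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if allowed:
        return 200, headers
    # Retry-After is given in whole seconds, rounded up, and at least 1.
    headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now)))
    return 429, headers
```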

Authentication and Authorization Patterns

Token Validation

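The most common gateway authentication pattern validates JWT or OAuth tokens on every request. The gateway verifies the token signature, checks expiration, and extracts claims (user ID, roles, permissions) that it passes to backend services as trusted headers. Backend services can then skip token validation and rely on the gateway's result, provided they are reachable only through the gateway and the gateway strips any client-supplied copies of those headers. Otherwise a caller could forge an identity by setting the headers directly.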

This pattern centralizes the authentication logic and key management in one place. When signing keys rotate, only the gateway configuration changes. When authentication rules evolve, only the gateway needs updating.
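The verify, check expiry, extract claims sequence can be sketched with the standard library alone. The example assumes HS256 (a shared secret) purely to stay self-contained; gateways more often verify asymmetric signatures with a vetted JWT library. The `sign_hs256` helper exists only to produce a demo token, and the X-User-Id and X-User-Roles header names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Demo-only helper that issues an HS256 token."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def validate_hs256(token: str, secret: bytes, now: float) -> Optional[dict]:
    """Return the claims if signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Pin the algorithm instead of trusting whatever the token says.
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None
    return claims


def trusted_headers(claims: dict) -> dict[str, str]:
    """Headers the gateway forwards upstream (hypothetical names)."""
    return {
        "X-User-Id": str(claims.get("sub", "")),
        "X-User-Roles": ",".join(claims.get("roles", [])),
    }
```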

API Key Authentication

For service-to-service and third-party API access, API keys provide a simpler authentication model. The gateway validates the key against a key store, looks up the associated rate limits and permissions, and either passes the request through or rejects it. API keys work well for machine-to-machine communication where OAuth flows would add unnecessary complexity.
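A minimal sketch of the lookup step, assuming an in-memory key store indexed by the SHA-256 digest of each key so that raw keys are never stored. The record fields, scope names, and demo key are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyRecord:
    client: str
    requests_per_minute: int
    scopes: frozenset


def _digest(api_key: str) -> str:
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical key store, indexed by digest rather than by raw key.
KEY_STORE = {
    _digest("demo-key-123"): KeyRecord("partner-a", 600, frozenset({"orders:read"})),
}


def authenticate(
    api_key: str, required_scope: str, store: dict = KEY_STORE
) -> Optional[KeyRecord]:
    """Return the key's record if it is known and has the scope, else None."""
    record = store.get(_digest(api_key or ""))
    if record is None or required_scope not in record.scopes:
        return None
    return record
```

The returned record carries the per-key rate limit, so the same lookup can feed the rate limiter.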

Mutual TLS (mTLS)

In zero-trust architectures, the gateway enforces mutual TLS, requiring clients to present valid certificates. The client proves possession of a private key during the TLS handshake, so its identity is verified at the connection level rather than by a bearer credential such as a token or API key, which anyone who obtains a copy can replay. mTLS is particularly common for inter-service communication within service meshes, where the gateway or sidecar proxy handles certificate management transparently.
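For a sense of what "requiring client certificates" means at the socket level, here is a sketch using Python's standard ssl module. The file path parameters are placeholders supplied by the caller; the essential line is setting verify_mode to CERT_REQUIRED on the server-side context.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that demands a client certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents
    # a certificate that chains to a trusted CA.
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The gateway's own certificate and private key.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA bundle used to verify client certificates.
        context.load_verify_locations(cafile=client_ca_file)
    return context
```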

Authorization at the Gateway

While authentication (who is this?) naturally belongs at the gateway, authorization (can they do this?) is more nuanced. Coarse-grained authorization, such as checking whether a user has a valid subscription or belongs to an allowed tenant, fits well at the gateway. Fine-grained authorization, such as checking whether a user can edit a specific document, requires business logic that belongs in the backend service.
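The split can be seen in what each check needs as input. In the sketch below, the gateway check reads only token claims, while the service check needs the document record itself. The claim and field names are assumptions for illustration.

```python
def gateway_allows(claims: dict, allowed_tenants: set) -> bool:
    """Coarse-grained: needs only token claims, no business data."""
    return (
        claims.get("tenant") in allowed_tenants
        and claims.get("subscription") == "active"
    )


def service_allows_edit(user_id: str, document: dict) -> bool:
    """Fine-grained: needs the document record held by the backend."""
    return user_id == document["owner_id"] or user_id in document.get("editors", ())
```

Putting the second check in the gateway would force it to load documents, which is exactly the business coupling the gateway should avoid.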

Choosing a Gateway

  • Kong: Open-source, plugin-extensible, built on NGINX. Strong community and commercial support. Best for teams wanting extensive customization.
  • AWS API Gateway: Fully managed, deeply integrated with AWS services. Best for AWS-native architectures seeking minimal operational overhead.
  • Envoy: High-performance proxy often used as a gateway in Kubernetes environments. Best for teams already using service mesh architectures.
  • Traefik: Auto-discovers services in Docker and Kubernetes environments. Best for dynamic container orchestration setups.
  • NGINX: Proven, performant, widely understood. Best for teams with existing NGINX expertise who need gateway capabilities.

Operational Considerations

The API gateway is a single point of entry, which makes it both powerful and risky. A gateway outage affects all backend services. Design for high availability with multiple instances behind a load balancer. Implement health checks that detect degraded gateway instances. Keep gateway logic as thin as possible, handling only cross-cutting concerns, to minimize the blast radius of gateway bugs.

Monitor gateway latency separately from backend latency. The gateway adds milliseconds to every request; if that overhead grows, it impacts all services. Set alerts on gateway error rates, latency percentiles, and connection counts to catch problems before they cascade.
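One simple way to isolate gateway overhead is to record both the total request time and the time spent waiting on the upstream, then look at percentiles of the difference. The sketch below assumes those two timings are available per request and uses a nearest-rank percentile.

```python
import math


def gateway_overhead_ms(samples: list[tuple[float, float]]) -> list[float]:
    """samples holds (total_ms, upstream_ms) pairs, one per request."""
    return [total - upstream for total, upstream in samples]


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Alerting on a high percentile of the overhead, rather than on total latency, keeps slow backends from masking or mimicking a gateway regression.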

Written by Ibrahim Samil Ceyisakar

Founder and Editor in Chief. Technology enthusiast tracking AI, digital business, and global market trends.