API Gateway Patterns: Routing, Rate Limiting, Auth

As applications evolve from monoliths to distributed architectures, managing the interface between external clients and internal services becomes increasingly complex. An API gateway centralizes cross-cutting concerns like routing, authentication, rate limiting, and observability into a single layer, preventing each microservice from reimplementing these capabilities independently.

Without a gateway, every service must handle its own authentication, implement its own rate limiting, manage its own TLS termination, and expose its own endpoints. The gateway pattern consolidates this complexity into one manageable component.

Routing Patterns

Path-Based Routing

The most straightforward routing pattern directs requests to backend services based on URL path prefixes. Requests to /api/users/* route to the user service, /api/orders/* to the order service, and /api/products/* to the product service. Clients interact with a single domain and are unaware of the service boundaries behind the gateway.

Path-based routing requires careful namespace planning. Conflicting or overlapping paths create ambiguity. A clear naming convention established early prevents routing conflicts as the number of services grows.
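As a rough illustration of prefix matching, the sketch below resolves a request path to an upstream. The route table, the service hostnames, and the function name resolve_upstream are assumptions made for this example, not part of any particular gateway. It matches on whole path segments and prefers the longest prefix, which is one common way to keep overlapping paths unambiguous.

```python
from typing import Optional

# Hypothetical route table: path prefix -> upstream base URL.
ROUTES = {
    "/api/users": "http://user-service:8080",
    "/api/orders": "http://order-service:8080",
    "/api/products": "http://product-service:8080",
}


def resolve_upstream(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream for the longest matching prefix, or None."""
    # Try longer prefixes first so a more specific route wins over a shorter one.
    for prefix in sorted(routes, key=len, reverse=True):
        # Match whole segments only: /api/users must not capture /api/users-admin.
        if path == prefix or path.startswith(prefix + "/"):
            return routes[prefix]
    return None
```

A path with no matching prefix returns None, which a gateway would typically translate into a 404 response.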

Header-Based Routing

Some routing decisions depend on request headers rather than paths. API versioning is a common case: requests with Accept: application/vnd.api.v2+json route to the v2 service while older clients route to v1. Similarly, A/B test assignments carried in headers can route to different service versions.
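The versioning case might look something like the following sketch. The upstream URLs, the default version, and the function name upstream_for_accept are hypothetical; the media type pattern follows the application/vnd.api.v2+json example above.

```python
import re

# Hypothetical upstreams for each API version.
VERSION_UPSTREAMS = {
    "1": "http://api-v1:8080",
    "2": "http://api-v2:8080",
}
DEFAULT_VERSION = "1"

# Matches media types such as application/vnd.api.v2+json.
_VERSION_RE = re.compile(r"application/vnd\.api\.v(\d+)\+json")


def upstream_for_accept(accept_header: str) -> str:
    """Pick an upstream from the Accept header, defaulting to v1."""
    match = _VERSION_RE.search(accept_header or "")
    version = match.group(1) if match else DEFAULT_VERSION
    # Unknown versions fall back to the default instead of failing the request.
    return VERSION_UPSTREAMS.get(version, VERSION_UPSTREAMS[DEFAULT_VERSION])
```

Falling back to v1 for unknown versions is a design choice made for this sketch; a stricter gateway could reject them with 406 Not Acceptable instead.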

Content-Based Routing

Advanced gateways inspect request bodies to make routing decisions. A multi-tenant platform might examine a tenant identifier in the request payload to route to the appropriate database shard or geographic region. This pattern adds latency since the gateway must parse the request body, so it should be used selectively.
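A minimal sketch of tenant-based routing, assuming a JSON body with a tenant_id field. The field name, the tenant-to-region table, and the size cap are all invented for illustration. The cap reflects the latency concern above: the gateway declines to parse large bodies.

```python
import json
from typing import Optional

# Hypothetical mapping from tenant ID to a regional backend.
TENANT_REGIONS = {
    "acme": "http://eu-west.internal:8080",
    "globex": "http://us-east.internal:8080",
}
MAX_BODY_BYTES = 64 * 1024  # assumed cap; larger bodies are not inspected


def route_by_tenant(body: bytes) -> Optional[str]:
    """Return the backend for the tenant named in the body, or None."""
    if len(body) > MAX_BODY_BYTES:
        return None
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return None
    if not isinstance(payload, dict):
        return None
    tenant_id = payload.get("tenant_id")
    if not isinstance(tenant_id, str):
        return None
    return TENANT_REGIONS.get(tenant_id)
```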

Backend for Frontend (BFF)

The BFF pattern creates specialized gateway layers for different client types. A mobile BFF aggregates multiple backend calls into a single response optimized for mobile bandwidth constraints. A web BFF returns richer data structures suited for desktop interfaces. Each BFF handles the specific needs of its client type without polluting a shared API.
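To make the aggregation idea concrete, here is a small sketch of a mobile BFF endpoint. The two fetch functions are stand-ins that return canned data where a real BFF would make HTTP calls, and every name and field here is hypothetical. The point is the shape: call the backends concurrently, then return only the fields the mobile screen needs.

```python
import asyncio


async def fetch_profile(user_id: str) -> dict:
    # Stand-in for a call to the user service.
    return {"id": user_id, "name": "Ada", "bio": "Engineer", "avatar_url": "/a.png"}


async def fetch_recent_orders(user_id: str) -> list[dict]:
    # Stand-in for a call to the order service, newest order first.
    return [
        {"id": "o2", "total": 12, "items": ["pen"]},
        {"id": "o1", "total": 30, "items": ["book", "lamp"]},
    ]


async def mobile_home(user_id: str) -> dict:
    """Aggregate two backend calls into one trimmed mobile response."""
    profile, orders = await asyncio.gather(
        fetch_profile(user_id), fetch_recent_orders(user_id)
    )
    return {
        "name": profile["name"],
        "order_count": len(orders),
        "latest_order_id": orders[0]["id"] if orders else None,
    }
```

A web BFF built on the same backends could return the full profile and order details instead of the trimmed summary.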

Rate Limiting Strategies

Rate limiting protects backend services from overload, prevents abuse, and ensures fair resource allocation among clients. The gateway is the natural enforcement point because it sees all incoming traffic before it reaches backend services.

Fixed Window

The simplest strategy allows a fixed number of requests within a time window, such as 100 requests per minute. When the window resets, the counter resets. Fixed windows are easy to implement but vulnerable to burst traffic at window boundaries. A client could send 100 requests at the end of one window and 100 more at the start of the next, creating a burst of 200 requests within seconds.
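The boundary problem is easy to reproduce with a small in-memory sketch. The class name and the injectable clock parameter are assumptions for this example; a production gateway would normally keep counters in a shared store rather than a local dict.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # client_id -> (window index, requests counted in that window)
        self.state: dict[str, tuple[int, int]] = {}

    def allow(self, client_id: str) -> bool:
        window_index = int(self.clock() // self.window_seconds)
        stored_index, count = self.state.get(client_id, (window_index, 0))
        if stored_index != window_index:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False
        self.state[client_id] = (window_index, count + 1)
        return True
```

With a limit of 100 per 60 seconds, a client that sends 100 requests at second 59 and 100 more at second 60 has all 200 accepted, because the two bursts fall into different windows.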

Sliding Window

Sliding window rate limiting smooths the burst problem by considering request history continuously rather than in fixed intervals. Instead of resetting the counter every minute, the algorithm considers the weighted count across the current and previous windows. This produces more consistent throttling behavior at the cost of slightly more complex implementation.
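One common approximation, sketched below under the same in-memory assumptions, weights the previous window's count by the fraction of it that still overlaps the sliding window: estimate = previous * (1 - elapsed_fraction) + current. The class name and the injectable clock are hypothetical.

```python
import time
from typing import Callable


class SlidingWindowLimiter:
    """Approximate a sliding window from current and previous window counts."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # client_id -> (window index, current count, previous count)
        self.state: dict[str, tuple[int, int, int]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        index = int(now // self.window_seconds)
        elapsed_fraction = (now % self.window_seconds) / self.window_seconds
        stored, current, previous = self.state.get(client_id, (index, 0, 0))
        if index == stored + 1:
            previous, current = current, 0  # rolled into the next window
        elif index != stored:
            previous, current = 0, 0  # idle for more than one full window
        # Weight the previous window by how much of it is still "in view".
        estimate = previous * (1.0 - elapsed_fraction) + current
        allowed = estimate + 1 <= self.limit
        if allowed:
            current += 1
        self.state[client_id] = (index, current, previous)
        return allowed
```

Replaying the boundary burst from the fixed-window case: after 100 requests at second 59, requests at second 60 are all rejected, and by second 90 only about half the limit has been restored.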

Token Bucket

The token bucket algorithm models a bucket that fills with tokens at a steady rate. Each request consumes a token. When the bucket is empty, requests are rejected or queued. The bucket has a maximum capacity, allowing short bursts up to the bucket size while enforcing a sustained rate limit.

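Token bucket is widely used in production API gateways because it tolerates the short bursts that legitimate clients produce while still enforcing a long-term rate limit. AWS API Gateway documents its throttling as a token bucket. Other gateways use related schemes: NGINX's limit_req module implements a leaky bucket with a configurable burst allowance, and Kong's bundled rate-limiting plugins count requests in fixed or sliding windows.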
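A minimal single-process sketch of the algorithm follows. The class name, the injectable clock, and the choice to reject rather than queue are assumptions for illustration. Tokens are refilled lazily on each call based on elapsed time, which avoids a background timer.

```python
import time
from typing import Callable


class TokenBucket:
    """Refill at `rate_per_second`, hold at most `capacity` tokens."""

    def __init__(self, rate_per_second: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate_per_second = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated_at
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # bucket empty: this sketch rejects instead of queueing
```

With a rate of 2 tokens per second and a capacity of 5, a client can burst 5 requests immediately, then sustain 2 per second; after a long idle period the burst allowance returns to 5 and no higher.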

Rate Limit Scopes

Rate limits can be applied at multiple granularities. Per-API-key limits prevent individual clients from monopolizing resources. Per-endpoint limits protect expensive operations (complex searches, report generation) more aggressively than cheap ones (health checks, simple reads). Global limits protect the overall system capacity regardless of which client is calling.

Effective rate limiting communicates clearly with clients. The 429 Too Many Requests status code signals throttling. The Retry-After header tells clients how long to wait before retrying. The widely used, though not standardized, X-RateLimit-Remaining and X-RateLimit-Reset headers let clients self-regulate before hitting limits.
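The sketch below shows one way a gateway might assemble these headers. The function names are hypothetical, and the sketch assumes X-RateLimit-Reset carries the reset time as epoch seconds; some APIs send seconds remaining instead, so clients should check the documentation of the API they call. Retry-After uses the delay-in-seconds form.

```python
import math


def quota_headers(remaining: int, reset_epoch: float) -> dict[str, str]:
    """Headers attached to every response so clients can pace themselves."""
    return {
        "X-RateLimit-Remaining": str(max(0, remaining)),
        # Assumption: reset is reported as epoch seconds.
        "X-RateLimit-Reset": str(math.ceil(reset_epoch)),
    }


def throttled_response(reset_epoch: float, now_epoch: float) -> tuple[int, dict[str, str]]:
    """Status and headers for a request rejected by the rate limiter."""
    headers = quota_headers(0, reset_epoch)
    # Round up so the client never retries before the window has reset.
    headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now_epoch)))
    return 429, headers
```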

Authentication and Authorization Patterns

Token Validation

The most common gateway authentication pattern validates JWT or OAuth tokens on every request. The gateway verifies the token signature, checks expiration, and extracts claims (user ID, roles, permissions) that it passes to backend services as trusted headers. Backend services skip authentication entirely, trusting the gateway's validation. That trust holds only if the gateway strips any client-supplied copies of those headers and the backends accept traffic only from the gateway.

This pattern centralizes the authentication logic and key management in one place. When signing keys rotate, only the gateway configuration changes. When authentication rules evolve, only the gateway needs updating.
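The validation steps can be sketched with the standard library alone for an HS256-signed JWT. This is an illustration of the checks, not a recommendation to hand-roll JWT handling: a real gateway would normally use a maintained JWT library and often asymmetric keys. The function names and the X-User-Id and X-User-Roles header names are assumptions for this example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token; included only so the sketch can be exercised."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    mac = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(mac)}"


def validate_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    # Pin the algorithm instead of trusting whatever the token declares.
    if header.get("alg") != "HS256":
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:
        return None
    return claims


def trusted_headers(claims: dict) -> dict[str, str]:
    """Headers the gateway forwards to backends (names are hypothetical)."""
    return {
        "X-User-Id": str(claims.get("sub", "")),
        "X-User-Roles": ",".join(claims.get("roles", [])),
    }
```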

API Key Authentication

For service-to-service and third-party API access, API keys provide a simpler authentication model. The gateway validates the key against a key store, looks up the associated rate limits and permissions, and either passes the request through or rejects it. API keys work well for machine-to-machine communication where OAuth flows would add unnecessary complexity.
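The lookup flow might be sketched as follows. The in-memory key store, the KeyRecord fields, the demo key, and the scope names are all hypothetical. Storing only a digest of each key is a design choice in this sketch, so that a leaked store does not expose usable keys.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyRecord:
    client_id: str
    requests_per_minute: int
    scopes: frozenset


def _digest(api_key: str) -> str:
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical key store, indexed by SHA-256 digest rather than the raw key.
KEY_STORE = {
    _digest("demo-key-123"): KeyRecord("partner-a", 600, frozenset({"orders:read"})),
}


def authenticate(api_key: Optional[str], required_scope: str) -> tuple[int, Optional[KeyRecord]]:
    """Return (HTTP status, record): 401 unknown key, 403 missing scope, 200 ok."""
    if not api_key:
        return 401, None
    record = KEY_STORE.get(_digest(api_key))
    if record is None:
        return 401, None
    if required_scope not in record.scopes:
        return 403, record
    return 200, record
```

The returned record carries the client's rate limit, so the gateway can pass requests_per_minute straight to whichever limiter it uses.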

Mutual TLS (mTLS)

In zero-trust architectures, the gateway enforces mutual TLS, requiring clients to present valid certificates. Because the client proves possession of a private key during the TLS handshake, there is no bearer secret in the request to steal and replay, which is a stronger identity guarantee than tokens or API keys alone provide. mTLS is particularly common for inter-service communication within service meshes, where the gateway or sidecar proxy handles certificate management transparently.
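In Python's standard ssl module, requiring client certificates comes down to setting verify_mode on the server context. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the file path parameters are optional only so the sketch can be exercised without certificate files. A real listener needs its own certificate chain and a CA bundle for verifying clients.

```python
import ssl
from typing import Optional


def build_mtls_server_context(cert_file: Optional[str] = None,
                              key_file: Optional[str] = None,
                              client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that rejects clients without a valid certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents a
    # certificate that chains to a CA loaded below.
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        context.load_verify_locations(cafile=client_ca_file)
    return context
```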

Authorization at the Gateway

While authentication (who is this?) naturally belongs at the gateway, authorization (can they do this?) is more nuanced. Coarse-grained authorization, such as checking whether a user has a valid subscription or belongs to an allowed tenant, fits well at the gateway. Fine-grained authorization, such as checking whether a user can edit a specific document, requires business logic that belongs in the backend service.
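The split can be illustrated with two small functions. The claim names (tenant, subscription), the allowed-tenant set, and the document fields are invented for this sketch. The first check needs only the validated token claims, so it can run at the gateway; the second needs the document itself, so it stays in the backend service.

```python
# Hypothetical set of tenants permitted to use this API.
ALLOWED_TENANTS = {"acme", "globex"}


def gateway_allows(claims: dict) -> bool:
    """Coarse-grained check at the gateway, decided from token claims alone."""
    tenant = claims.get("tenant")
    if not isinstance(tenant, str) or tenant not in ALLOWED_TENANTS:
        return False
    return claims.get("subscription") == "active"


def can_edit_document(user_id: str, document: dict) -> bool:
    """Fine-grained check in the backend, which has the document loaded."""
    return user_id == document["owner_id"] or user_id in document.get("editors", [])
```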

Choosing a Gateway

Kong: Open-source, plugin-extensible, built on NGINX. Strong community and commercial support. Best for teams wanting extensive customization.
AWS API Gateway: Fully managed, deeply integrated with AWS services. Best for AWS-native architectures seeking minimal operational overhead.
Envoy: High-performance proxy often used as a gateway in Kubernetes environments. Best for teams already using service mesh architectures.
Traefik: Auto-discovers services in Docker and Kubernetes environments. Best for dynamic container orchestration setups.
NGINX: Proven, performant, widely understood. Best for teams with existing NGINX expertise who need gateway capabilities.

Operational Considerations

The API gateway is a single point of entry, which makes it both powerful and risky. A gateway outage affects all backend services. Design for high availability with multiple instances behind a load balancer. Implement health checks that detect degraded gateway instances. Keep gateway logic as thin as possible, handling only cross-cutting concerns, to minimize the blast radius of gateway bugs.

Monitor gateway latency separately from backend latency. The gateway adds milliseconds to every request; if that overhead grows, it impacts all services. Set alerts on gateway error rates, latency percentiles, and connection counts to catch problems before they cascade.
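One way to isolate gateway overhead, sketched below, is to record both the total request time and the time spent waiting on the upstream, then track percentiles of the difference. The function names and sample numbers are made up for illustration, and the percentile uses the simple nearest-rank method; a metrics system would usually compute this from histograms.

```python
import math


def gateway_overhead_ms(total_ms: list[float], upstream_ms: list[float]) -> list[float]:
    """Per-request time spent in the gateway itself."""
    return [total - upstream for total, upstream in zip(total_ms, upstream_ms)]


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list."""
    if not samples:
        raise ValueError("percentile of empty sample list")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

For example, totals of 12, 15, 11, 40, and 13 ms against upstream times of 10, 12, 9, 20, and 11 ms give a median overhead of 2 ms but a worst case of 20 ms, a gap that an average would hide.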
