As applications evolve from monoliths to distributed architectures, managing the interface between external clients and internal services becomes increasingly complex. An API gateway centralizes cross-cutting concerns like routing, authentication, rate limiting, and observability into a single layer, preventing each microservice from reimplementing these capabilities independently.
Without a gateway, every service must handle its own authentication, implement its own rate limiting, manage its own TLS termination, and expose its own endpoints. The gateway pattern consolidates this complexity into one manageable component.
Routing Patterns
Path-Based Routing
The most straightforward routing pattern directs requests to backend services based on URL path prefixes. Requests to /api/users/* route to the user service, /api/orders/* to the order service, and /api/products/* to the product service. Clients interact with a single domain and are unaware of the service boundaries behind the gateway.
Path-based routing requires careful namespace planning. Conflicting or overlapping paths create ambiguity. A clear naming convention established early prevents routing conflicts as the number of services grows.
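To make the prefix matching and the overlap problem concrete, here is a minimal sketch of a longest-prefix router. The route table and service names are hypothetical, and real gateways typically express this in configuration rather than application code.

```python
from typing import Optional

# Hypothetical route table: path prefix -> upstream service name.
ROUTES = {
    "/api/users": "user-service",
    "/api/orders": "order-service",
    "/api/products": "product-service",
}


def resolve_upstream(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream for the longest prefix that matches on a segment boundary."""
    best = None
    for prefix in routes:
        # Match "/api/users" and "/api/users/42" but not "/api/users-export".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None
```

Preferring the longest matching prefix is one common way to resolve overlapping routes deterministically; it does not remove the need for a naming convention, since two teams can still claim the same prefix.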
Header-Based Routing
Some routing decisions depend on request headers rather than paths. API versioning is a common case: requests with Accept: application/vnd.api.v2+json route to the v2 service while older clients route to v1. Similarly, A/B test assignments carried in headers can route to different service versions.
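The versioning case can be sketched as a small function that reads the media type from the Accept header. The upstream names and the fallback to v1 are assumptions made for illustration.

```python
import re

# Matches media types such as "application/vnd.api.v2+json".
_VERSION_RE = re.compile(r"application/vnd\.api\.v(\d+)\+json")

# Hypothetical upstream names keyed by major API version.
UPSTREAMS = {1: "orders-v1", 2: "orders-v2"}
DEFAULT_VERSION = 1


def upstream_for(headers: dict[str, str]) -> str:
    """Pick an upstream from the Accept header, defaulting to v1."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    match = _VERSION_RE.search(normalized.get("accept", ""))
    version = int(match.group(1)) if match else DEFAULT_VERSION
    # Unknown versions fall back to the default rather than failing the request.
    return UPSTREAMS.get(version, UPSTREAMS[DEFAULT_VERSION])
```

Whether an unknown version should fall back to the default or be rejected is a policy choice; the fallback here is only one option.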
Content-Based Routing
Advanced gateways inspect request bodies to make routing decisions. A multi-tenant platform might examine a tenant identifier in the request payload to route to the appropriate database shard or geographic region. This pattern adds latency since the gateway must parse the request body, so it should be used selectively.
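As a rough illustration of the multi-tenant case, the sketch below parses a JSON body and maps a tenant identifier to a region. The tenant_id field name, the tenant table, and the default region are all hypothetical.

```python
import json

# Hypothetical mapping from tenant ID to regional upstream.
TENANT_REGIONS = {"acme": "eu-west", "globex": "us-east"}
DEFAULT_REGION = "us-east"


def region_for_body(body: bytes) -> str:
    """Choose a region from the tenant_id field of a JSON request body."""
    try:
        # Parsing the full body is the latency cost this pattern pays.
        payload = json.loads(body)
    except ValueError:
        return DEFAULT_REGION
    if not isinstance(payload, dict):
        return DEFAULT_REGION
    tenant = payload.get("tenant_id")
    if not isinstance(tenant, str):
        return DEFAULT_REGION
    return TENANT_REGIONS.get(tenant, DEFAULT_REGION)
```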
Backend for Frontend (BFF)
The BFF pattern creates specialized gateway layers for different client types. A mobile BFF aggregates multiple backend calls into a single response optimized for mobile bandwidth constraints. A web BFF returns richer data structures suited for desktop interfaces. Each BFF handles the specific needs of its client type without polluting a shared API.
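The aggregation a mobile BFF performs might look like the following sketch. The two fetch functions are stand-ins for calls to hypothetical backend services, and the response fields are invented for the example.

```python
import asyncio


async def fetch_profile(user_id: str) -> dict:
    # Stand-in for a call to a hypothetical user service.
    return {"id": user_id, "name": "Ada", "bio": "long text", "avatar_url": "https://example.com/a.png"}


async def fetch_recent_orders(user_id: str) -> list[dict]:
    # Stand-in for a call to a hypothetical order service.
    return [
        {"id": "o1", "total": 30, "line_items": ["..."]},
        {"id": "o2", "total": 12, "line_items": ["..."]},
    ]


async def mobile_home_screen(user_id: str) -> dict:
    """Aggregate two backend calls into one trimmed response for a mobile client."""
    # Issue both backend calls concurrently instead of one after the other.
    profile, orders = await asyncio.gather(
        fetch_profile(user_id), fetch_recent_orders(user_id)
    )
    # Return only the fields the mobile screen renders to save bandwidth.
    return {
        "name": profile["name"],
        "orders": [{"id": o["id"], "total": o["total"]} for o in orders],
    }
```

A web BFF built on the same backends could return the full profile and line items, since the trimming decision lives in each BFF rather than in a shared API.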
Rate Limiting Strategies
Rate limiting protects backend services from overload, prevents abuse, and ensures fair resource allocation among clients. The gateway is the natural enforcement point because it sees all incoming traffic before it reaches backend services.
Fixed Window
The simplest strategy allows a fixed number of requests within a time window, such as 100 requests per minute. When the window ends, the counter resets. Fixed windows are easy to implement but vulnerable to burst traffic at window boundaries. A client could send 100 requests at the end of one window and 100 more at the start of the next, creating a burst of 200 requests within seconds, twice the nominal limit.
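A minimal in-memory sketch of a fixed window counter shows both the simplicity and the boundary burst. It assumes a single gateway process and takes the current time as a parameter so the behavior is easy to follow; old windows are never evicted here, which a real limiter would need to handle.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed window."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client_id, window index) -> requests seen in that window.
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str, now: float) -> bool:
        # Requests in the same window share an index; a new index starts at zero.
        window = int(now // self.window_seconds)
        key = (client_id, window)
        count = self._counts.get(key, 0)
        if count >= self.limit:
            return False
        self._counts[key] = count + 1
        return True
```

With a limit of 100 per 60 seconds, 100 calls at t=59.9 and another 100 at t=60.1 are all allowed, because the two batches fall into different windows.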
Sliding Window
Sliding window rate limiting smooths the burst problem by considering request history continuously rather than in fixed intervals. Instead of resetting the counter every minute, the algorithm estimates the request count as a weighted sum: the current window's count plus the previous window's count scaled by the fraction of that window still covered by the sliding interval. This produces more consistent throttling behavior at the cost of a slightly more complex implementation.
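One common way to implement the weighted count is the sliding window counter, sketched below under the same assumptions as before (single process, caller-supplied clock, no eviction of old windows). It is an approximation that assumes requests in the previous window were evenly spread.

```python
class SlidingWindowLimiter:
    """Approximate a sliding window using the current and previous fixed windows."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client_id, window index) -> requests seen in that window.
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        # Fraction of the current window that has already elapsed.
        elapsed_fraction = (now % self.window_seconds) / self.window_seconds
        previous = self._counts.get((client_id, window - 1), 0)
        current = self._counts.get((client_id, window), 0)
        # Weight the previous window by how much of it the sliding interval still covers.
        estimated = previous * (1.0 - elapsed_fraction) + current
        if estimated >= self.limit:
            return False
        self._counts[(client_id, window)] = current + 1
        return True
```

Repeating the boundary scenario (limit 100 per 60 seconds, 100 requests at t=59.9), almost none of a second batch at t=60.1 gets through, because the previous window still carries nearly its full weight.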
Token Bucket
The token bucket algorithm models a bucket that fills with tokens at a steady rate. Each request consumes a token. When the bucket is empty, requests are rejected or queued. The bucket has a maximum capacity, allowing short bursts up to the bucket size while enforcing a sustained rate limit.
Token bucket is a popular choice in production API gateways because it naturally allows bursts (which legitimate clients produce) while still enforcing long-term rate limits. AWS API Gateway documents its throttling in terms of a token bucket, and NGINX's limit_req module uses the closely related leaky bucket algorithm. Kong's bundled rate limiting plugins, by contrast, are window-based.
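The refill-and-consume behavior can be sketched in a few lines. This is an illustrative single-bucket version with a caller-supplied clock, not the implementation used by any particular gateway; this sketch rejects rather than queues when the bucket is empty.

```python
class TokenBucket:
    """Refill at a steady rate up to `capacity`; each request spends tokens."""

    def __init__(self, rate_per_second: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = capacity  # Start full so an initial burst is allowed.
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at the bucket size.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With a rate of 1 token per second and a capacity of 5, a client can send 5 requests at once, then 1 per second; after a long idle period the burst allowance returns but never exceeds 5.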
Rate Limit Scopes
Rate limits can be applied at multiple granularities. Per-API-key limits prevent individual clients from monopolizing resources. Per-endpoint limits protect expensive operations (complex searches, report generation) more aggressively than cheap ones (health checks, simple reads). Global limits protect the overall system capacity regardless of which client is calling.
Effective rate limiting communicates clearly with clients. The 429 Too Many Requests status code signals throttling. The Retry-After header tells clients when to retry. The conventional (though not standardized) X-RateLimit-Remaining and X-RateLimit-Reset headers let clients self-regulate before hitting limits.
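A throttled response carrying these headers might be assembled as in the sketch below. The function name is hypothetical, and the choice to express X-RateLimit-Reset as Unix epoch seconds is an assumption, since gateways differ on whether that header carries a timestamp or a number of seconds.

```python
import math


def throttle_response(reset_at: float, now: float) -> tuple[int, dict[str, str]]:
    """Build the status code and headers for a throttled request."""
    # Round up so a client that waits Retry-After seconds is past the reset.
    retry_after = max(0, math.ceil(reset_at - now))
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Remaining": "0",
        # Assumption: reset time expressed as Unix epoch seconds.
        "X-RateLimit-Reset": str(int(reset_at)),
    }
    return 429, headers
```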
Authentication and Authorization Patterns
Token Validation
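The most common gateway authentication pattern validates JWT or OAuth tokens on every request. The gateway verifies the token signature, checks expiration, and extracts claims (user ID, roles, permissions) that it passes to backend services as trusted headers. Backend services skip authentication entirely, trusting the gateway's validation. That trust holds only if the gateway strips any client-supplied copies of those headers and the backends are reachable solely through the gateway; otherwise a caller could forge the identity headers directly.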
This pattern centralizes the authentication logic and key management in one place. When signing keys rotate, only the gateway configuration changes. When authentication rules evolve, only the gateway needs updating.
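The verify, check expiry, and extract steps can be sketched with the standard library for an HS256-signed JWT. This is for illustration only: production gateways would normally rely on a vetted JWT library and often on asymmetric algorithms, and the X-User-Id and X-User-Roles header names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Helper for the example only: mint a token to validate."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def validate(token: str, secret: bytes, now: float) -> Optional[dict[str, str]]:
    """Return trusted headers for the backend, or None if the token is rejected."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    # Pin the algorithm instead of trusting whatever the token header claims.
    if header.get("alg") != "HS256":
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None
    roles = claims.get("roles", [])
    if not isinstance(roles, list):
        roles = []
    return {
        "X-User-Id": str(claims.get("sub", "")),
        "X-User-Roles": ",".join(str(role) for role in roles),
    }
```

A gateway using this approach would also need to remove any incoming X-User-Id or X-User-Roles headers before adding its own, so that clients cannot supply them.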
API Key Authentication
For service-to-service and third-party API access, API keys provide a simpler authentication model. The gateway validates the key against a key store, looks up the associated rate limits and permissions, and either passes the request through or rejects it. API keys work well for machine-to-machine communication where OAuth flows would add unnecessary complexity.
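The validate-then-look-up flow could be sketched as follows. The key store contents, the KeyRecord fields, and the scope names are hypothetical; storing only a digest of each key is a design choice of this sketch, not something the pattern requires.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyRecord:
    client: str
    requests_per_minute: int
    scopes: frozenset[str]


def _digest(api_key: str) -> str:
    return hashlib.sha256(api_key.encode()).hexdigest()


# Hypothetical key store, indexed by SHA-256 digest so raw keys are not stored.
KEY_STORE = {
    _digest("demo-key-123"): KeyRecord("partner-a", 600, frozenset({"orders:read"})),
}


def authenticate(api_key: str, required_scope: str) -> Optional[KeyRecord]:
    """Return the key's record if the key is known and carries the scope, else None."""
    record = KEY_STORE.get(_digest(api_key))
    if record is None or required_scope not in record.scopes:
        return None
    # The caller can apply record.requests_per_minute as this client's rate limit.
    return record
```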
Mutual TLS (mTLS)
In zero-trust architectures, the gateway enforces mutual TLS, requiring clients to present valid certificates. Because the client must prove possession of the certificate's private key during the handshake, this provides stronger identity verification than bearer tokens or API keys, which anyone who obtains them can replay. mTLS is particularly common for inter-service communication within service meshes, where the gateway or sidecar proxy handles certificate management transparently.
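For a sense of what "requiring clients to present valid certificates" means at the TLS layer, here is a sketch using Python's ssl module. The function name and the file path parameters are hypothetical, and they are optional only so the example can run without certificate files; a real listener needs all three.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Create a server-side TLS context that demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Fail the handshake unless the client presents a certificate that verifies.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The gateway's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA bundle that client certificates must chain to.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```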
Authorization at the Gateway
While authentication (who is this?) naturally belongs at the gateway, authorization (can they do this?) is more nuanced. Coarse-grained authorization, such as checking whether a user has a valid subscription or belongs to an allowed tenant, fits well at the gateway. Fine-grained authorization, such as checking whether a user can edit a specific document, requires business logic that belongs in the backend service.
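The split can be illustrated with two small checks. The claim names (subscription_active, tenant) and document fields (owner_id, editors) are hypothetical; the point is only which data each check needs.

```python
def gateway_allows(claims: dict, allowed_tenants: frozenset[str]) -> bool:
    """Coarse-grained check that needs only token claims, no business data."""
    tenant = claims.get("tenant")
    return (
        claims.get("subscription_active") is True
        and isinstance(tenant, str)
        and tenant in allowed_tenants
    )


def service_allows_edit(user_id: str, document: dict) -> bool:
    """Fine-grained check that needs the document itself, so it stays in the backend."""
    return user_id == document.get("owner_id") or user_id in document.get("editors", [])
```

The first function can run at the gateway because everything it needs arrives with the request. The second depends on a record only the owning service can load.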
Choosing a Gateway
- Kong: Open-source, plugin-extensible, built on NGINX. Strong community and commercial support. Best for teams wanting extensive customization.
- AWS API Gateway: Fully managed, deeply integrated with AWS services. Best for AWS-native architectures seeking minimal operational overhead.
- Envoy: High-performance proxy often used as a gateway in Kubernetes environments. Best for teams already using service mesh architectures.
- Traefik: Auto-discovers services in Docker and Kubernetes environments. Best for dynamic container orchestration setups.
- NGINX: Proven, performant, widely understood. Best for teams with existing NGINX expertise who need gateway capabilities.
Operational Considerations
The API gateway is a single point of entry, which makes it both powerful and risky. A gateway outage affects all backend services. Design for high availability with multiple instances behind a load balancer. Implement health checks that detect degraded gateway instances. Keep gateway logic as thin as possible, handling only cross-cutting concerns, to minimize the blast radius of gateway bugs.
Monitor gateway latency separately from backend latency. The gateway adds milliseconds to every request; if that overhead grows, it impacts all services. Set alerts on gateway error rates, latency percentiles, and connection counts to catch problems before they cascade.
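Separating gateway overhead from backend latency amounts to subtracting the time spent waiting on the upstream from the total time the gateway observed, then tracking percentiles of the difference. The sketch below uses a nearest-rank percentile and made-up sample values; in practice these numbers would come from the gateway's metrics pipeline.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list; pct is in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def gateway_overhead_ms(requests: list[tuple[float, float]]) -> list[float]:
    """Each tuple is (total_ms seen by the gateway, upstream_ms waiting on the backend)."""
    return [total - upstream for total, upstream in requests]


# Made-up samples: the slow third request is slow because of its backend,
# not because of the gateway.
samples = [(12.0, 10.0), (25.0, 20.0), (103.0, 100.0), (9.0, 8.0)]
overhead = gateway_overhead_ms(samples)
p50 = percentile(overhead, 50)
p99 = percentile(overhead, 99)
```

Alerting on percentiles of the overhead series, rather than on total latency, helps distinguish a degrading gateway from a slow backend.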