Building Production APIs: Architecture Patterns That Scale
A production API needs five things beyond basic CRUD: authentication with proper token management, consistent error responses, rate limiting, request validation, and structured logging. Most APIs ship without at least two of these and pay for it later — usually when the first real user hits a race condition at 2 AM or a bad actor discovers there is no rate limit on the login endpoint.
After building APIs across 29 production projects at Mobibean, I have seen the same gaps repeat. Teams get the happy path working and ship. Then they spend months patching the things they skipped. This post covers the patterns that prevent those patches — concrete, with code, and prioritized by how much pain they save.
The 5 Non-Negotiable Patterns for Production APIs
Every API that serves real users needs these five things before launch. Not after. Not when you "get around to it." Before the first deploy.
1. Authentication With Proper Token Management
Short-lived JWT access tokens (15 minutes) paired with longer-lived refresh tokens are the standard pattern for most APIs. API keys work for server-to-server communication where there is no user session.
// JWT token pair generation
import jwt from "jsonwebtoken";

interface TokenPair {
  accessToken: string;
  refreshToken: string;
}

function generateTokenPair(userId: string): TokenPair {
  const accessToken = jwt.sign(
    { sub: userId, type: "access" },
    process.env.JWT_SECRET!,
    { expiresIn: "15m" }
  );
  const refreshToken = jwt.sign(
    { sub: userId, type: "refresh" },
    process.env.JWT_REFRESH_SECRET!,
    { expiresIn: "7d" }
  );
  return { accessToken, refreshToken };
}
Store refresh tokens in the database so you can revoke them. Never store JWTs in localStorage if your API serves a browser client — use httpOnly cookies instead.
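Here is a minimal sketch of what "revocable" can look like. It is not tied to any particular database: an in-memory Map stands in for a refresh-token table, and the class and method names (RefreshTokenStore, issue, verify, rotate, revoke) are made up for illustration. It also uses an opaque random token instead of a JWT to stay dependency-free. The same idea applies if you store a hash of the JWT string.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface StoredToken {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for a refresh_tokens table (hypothetical; use your database).
class RefreshTokenStore {
  private tokens = new Map<string, StoredToken>();

  // Store only a hash, so a leaked table does not expose usable tokens.
  private static hash(token: string): string {
    return createHash("sha256").update(token).digest("hex");
  }

  issue(userId: string, ttlMs: number, now: number = Date.now()): string {
    const token = randomBytes(32).toString("hex");
    this.tokens.set(RefreshTokenStore.hash(token), { userId, expiresAt: now + ttlMs });
    return token;
  }

  // Returns the user ID for a live token, or null if unknown, revoked, or expired.
  verify(token: string, now: number = Date.now()): string | null {
    const row = this.tokens.get(RefreshTokenStore.hash(token));
    if (!row || row.expiresAt <= now) return null;
    return row.userId;
  }

  revoke(token: string): void {
    this.tokens.delete(RefreshTokenStore.hash(token));
  }

  // Rotation: each refresh invalidates the old token and issues a new one.
  rotate(token: string, ttlMs: number, now: number = Date.now()): string | null {
    const userId = this.verify(token, now);
    if (userId === null) return null;
    this.revoke(token);
    return this.issue(userId, ttlMs, now);
  }
}
```

Rotating on every refresh means a stolen refresh token stops working as soon as the legitimate client refreshes, and logout becomes a single revoke call.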
2. Consistent Error Response Format
Every error from your API should have the same shape. Every single one. Frontend developers should never have to guess what an error response looks like.
// Standard error response: same shape, always
interface ApiError {
  error: {
    code: string; // machine-readable: "VALIDATION_ERROR"
    message: string; // human-readable: "Email is required"
    details?: unknown[]; // field-level errors, stack trace in dev
    requestId: string; // for support tickets and log correlation
  };
}
3. Rate Limiting
Without rate limiting, one aggressive client can take down your entire API. Implement per-user limits and global limits separately.
// Rate limiting middleware (rate-limiter-flexible counts requests per fixed window)
import { RateLimiterRedis } from "rate-limiter-flexible";
import { Redis } from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);
const rateLimiter = new RateLimiterRedis({
  storeClient: redis,
  keyPrefix: "rl",
  points: 100, // 100 requests
  duration: 60, // per 60 seconds
  blockDuration: 60, // block for 60s if exceeded
});

async function rateLimitMiddleware(req, res, next) {
  try {
    await rateLimiter.consume(req.userId || req.ip);
    next();
  } catch (rejection) {
    // A store failure rejects with an Error; do not report that as a rate limit
    if (rejection instanceof Error) return next(rejection);
    const retryAfter = Math.ceil(rejection.msBeforeNext / 1000) || 1;
    res.set("Retry-After", String(retryAfter));
    res.status(429).json({
      error: {
        code: "RATE_LIMIT_EXCEEDED",
        message: `Too many requests. Try again in ${retryAfter} seconds.`,
        requestId: req.id,
      },
    });
  }
}
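To make "per-user limits and global limits separately" concrete, here is a dependency-free sketch of the idea. It is an in-memory fixed-window counter, so it only works for a single process. FixedWindowLimiter and allowRequest are hypothetical names, and the limits are deliberately tiny so the behavior is easy to follow. In production you would back both limiters with a shared store such as Redis.

```typescript
// Fixed-window counter: at most `limit` hits per key in each `windowMs` window.
class FixedWindowLimiter {
  private counters = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the hit is allowed, false if the key is over its limit.
  tryConsume(key: string, now: number = Date.now()): boolean {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      this.counters.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Illustrative numbers only: 3 requests per user and 5 across all users, per minute.
const perUser = new FixedWindowLimiter(3, 60_000);
const globalLimit = new FixedWindowLimiter(5, 60_000);

function allowRequest(userId: string, now: number = Date.now()): boolean {
  // Check the per-user limit first so a blocked client cannot use up the global budget
  if (!perUser.tryConsume(`user:${userId}`, now)) return false;
  return globalLimit.tryConsume("global", now);
}
```

The ordering matters: a client that is already over its own limit is rejected before it touches the global counter, so one noisy user cannot starve everyone else.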
4. Input Validation at the Boundary
Validate every request before it reaches your business logic. Zod or a similar schema validation library makes this straightforward.
import { z } from "zod";

const CreateUserSchema = z.object({
  email: z.string().email("Invalid email format"),
  name: z.string().min(1, "Name is required").max(100),
  role: z.enum(["admin", "member", "viewer"]),
});

// Validate at the route handler, not inside business logic
app.post("/users", async (req, res) => {
  const result = CreateUserSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({
      error: {
        code: "VALIDATION_ERROR",
        message: "Invalid request body",
        details: result.error.issues,
        requestId: req.id,
      },
    });
  }
  // result.data is now typed and validated
  const user = await userService.create(result.data);
  // Send the created resource; returning a value from the handler sends nothing
  return res.status(201).json(user);
});
5. Structured Logging With Correlation IDs
When something breaks in production, you need to trace a request through your entire system. Assign a unique ID to every request and include it in every log entry.
import { randomUUID } from "crypto";
import pino from "pino";

const logger = pino();

function requestIdMiddleware(req, res, next) {
  req.id = req.headers["x-request-id"] || randomUUID();
  res.setHeader("x-request-id", req.id);
  req.log = logger.child({ requestId: req.id, userId: req.userId });
  next();
}
This costs almost nothing to implement and saves hours during every incident.
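One practical wrinkle is getting the request ID into log lines written deep inside service code, where there is no req object. A possible approach, sketched here with Node's built-in AsyncLocalStorage, is to store the ID in an async context. The names requestContext, logLine, and chargeCustomer are hypothetical, and the logger is a plain JSON.stringify stand-in, not pino.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface RequestContext {
  requestId: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Builds one structured log line; the request ID comes from the active context.
function logLine(
  level: "info" | "error",
  message: string,
  fields: Record<string, unknown> = {}
): string {
  const requestId = requestContext.getStore()?.requestId ?? "none";
  return JSON.stringify({ level, requestId, message, ...fields });
}

// Hypothetical service function: it takes no request or logger argument.
function chargeCustomer(amountCents: number): string {
  return logLine("info", "charging customer", { amountCents });
}

// In middleware you would call requestContext.run(...) around next() instead.
const line = requestContext.run({ requestId: "req_abc123" }, () => chargeCustomer(1999));
console.log(line);
```

Everything called inside run, including awaited async work, sees the same store, so service functions stay free of logging plumbing.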
API Architecture Decision Table
These are the decisions you face at the start of every API project. The right answer depends on context, not dogma.
| Decision | Option A | Option B | When to Choose A | When to Choose B |
|---|---|---|---|---|
| Protocol | REST | GraphQL | Multiple consumers with simple data needs; team knows REST well | Single frontend with complex nested data; need to reduce over-fetching |
| Architecture | Monolith | Microservices | Team under 10; product still finding market fit; MVP stage | Clear domain boundaries; independent deployment needed; team over 20 |
| Database | PostgreSQL | MongoDB | Relational data; need transactions; most SaaS apps | Document-oriented data; very early prototyping; schema changes constantly |
| Auth | JWT tokens | Session cookies | Mobile clients; API-first; third-party consumers | Server-rendered apps; simpler security model; single domain |
| Versioning | URL path (/v1/) | Header-based | Public APIs; clear consumer expectations | Internal APIs; gradual migration; fewer consumers |
| Hosting | Serverless (Lambda, Vercel) | Containers (ECS, Fly.io) | Unpredictable traffic; event-driven workloads | Steady traffic; WebSockets; long-running processes |
| Caching | Redis | In-memory (node-cache) | Multiple instances; shared state needed | Single instance; simple caching needs |
| Task processing | Queue (SQS, BullMQ) | Synchronous | Email, webhooks, reports, anything slow | Only when response time is under 200ms anyway |
For most SaaS applications we build: REST, monolith, PostgreSQL, JWT, URL versioning, containers, Redis. That stack handles the first 100K users without rearchitecting.
Database Design for API-First Applications
Start with PostgreSQL. This is not a controversial opinion: it is a sound default for the large majority of API projects. PostgreSQL gives you relational integrity, JSON columns for flexible data, full-text search, and a mature ecosystem of tooling.
Connection Pooling From Day One
Every database connection consumes memory on both your application server and the database. Without a connection pool, each API request opens a new connection, and you hit the database connection limit under moderate load.
Use PgBouncer or your ORM's built-in pooling. Configure this before your first deploy, not after you get a "too many connections" error in production.
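Pool sizing is mostly arithmetic, and it is worth doing before you pick a number. The sketch below shows one way to reason about it. The function name and the reserved-connection figure are assumptions for illustration, not a rule.

```typescript
// Splits the database's connection budget evenly across application instances.
function poolSizePerInstance(
  maxConnections: number, // the database's max_connections setting
  reservedConnections: number, // kept free for migrations, admin sessions, cron jobs
  instanceCount: number // API instances that will each open their own pool
): number {
  if (instanceCount < 1) throw new RangeError("instanceCount must be at least 1");
  const size = Math.floor((maxConnections - reservedConnections) / instanceCount);
  if (size < 1) throw new RangeError("not enough connections for this many instances");
  return size;
}

// Illustrative numbers: PostgreSQL's default max_connections is 100.
console.log(poolSizePerInstance(100, 10, 6)); // 15
```

The result is the value you would give your pool's maximum size setting (for example, the max option of a node-postgres Pool). If the number comes out uncomfortably small, that is the signal to put PgBouncer in front of the database.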
Migrations Over Manual Schema Changes
Never modify a production database schema by hand. Use a migration tool (Prisma Migrate, Drizzle Kit, Knex migrations) and version your schema changes alongside your code. Every schema change should be reviewable in a pull request.
Index Strategy
Start with indexes on your primary query patterns: foreign keys, columns used in WHERE clauses, and columns used for sorting. Do not add indexes preemptively on every column — each index slows down writes and uses disk space.
Use EXPLAIN ANALYZE on your slowest queries and add indexes based on actual data, not guesses.
When to Add Redis
Add Redis when you need one of these:
- Caching: Frequently read, rarely changing data (user profiles, configuration)
- Sessions or rate limiting: Shared state across multiple API instances
- Queues: Background job processing with BullMQ
- Pub/sub: Real-time features across instances
Do not add Redis "just in case." It is another piece of infrastructure to monitor, back up, and keep running. Add it when you have a specific need.
Error Handling That Does Not Drive Frontend Developers Crazy
Bad error handling is the number one complaint frontend developers have about backend APIs. The fix is straightforward: be consistent and be specific.
Standard Error Response Shape
Every error response from your API should follow this structure:
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid request body",
    "details": [
      { "field": "email", "message": "Invalid email format" },
      { "field": "name", "message": "Name is required" }
    ],
    "requestId": "req_abc123"
  }
}
The code is machine-readable (frontend uses it for conditional logic). The message is human-readable (for logs and debugging). The details array carries field-level errors for forms.
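Validation libraries rarely emit this exact field/message shape, so a small mapping step is usually needed. The sketch below assumes issues that carry a path array and a message, which is what Zod's issues look like. ValidationIssue and toFieldErrors are hypothetical names.

```typescript
// Minimal shape of a validation issue. Zod issues carry `path` and `message`
// among other fields; only these two are assumed here.
interface ValidationIssue {
  path: (string | number)[];
  message: string;
}

interface FieldError {
  field: string;
  message: string;
}

function toFieldErrors(issues: ValidationIssue[]): FieldError[] {
  return issues.map((issue) => ({
    // Nested paths become dotted names, e.g. ["address", "zip"] -> "address.zip"
    field: issue.path.join("."),
    message: issue.message,
  }));
}

const details = toFieldErrors([
  { path: ["email"], message: "Invalid email format" },
  { path: ["items", 0, "sku"], message: "Required" },
]);
console.log(details);
```

Doing this mapping in one place keeps the public error contract stable even if you later swap validation libraries.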
HTTP Status Codes: Use Them Correctly
| Status Code | Meaning | When to Use |
|---|---|---|
| 200 | OK | Successful GET, PUT, PATCH |
| 201 | Created | Successful POST that created a resource |
| 204 | No Content | Successful DELETE |
| 400 | Bad Request | Validation errors, malformed JSON |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Authenticated but insufficient permissions |
| 404 | Not Found | Resource does not exist |
| 409 | Conflict | Duplicate entry, state conflict |
| 422 | Unprocessable Entity | Valid JSON but business logic rejection |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Unhandled exception (never return details to client) |
The distinction between 401 and 403 matters. 401 means "I don't know who you are." 403 means "I know who you are, and you can't do this." Frontend developers use this distinction to decide whether to redirect to login (401) or show a permissions error (403).
Return All Validation Errors at Once
Nothing frustrates a user more than fixing one form error, submitting again, and getting a different error. Validate the entire request body and return every error in a single response. The Zod example above does this by default.
Never Leak Internal Errors
In development, include stack traces in error responses. In production, log them server-side and return a generic message to the client. A stack trace in a 500 response tells attackers which framework you use, what your file structure looks like, and where to probe next.
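One way to enforce this is a single function that turns any thrown value into a response, so there is exactly one place where the production/development decision is made. This is a sketch under assumptions: AppError, toErrorResponse, and the INTERNAL_ERROR code are hypothetical names, and you would call it from your framework's error-handling middleware.

```typescript
// Errors thrown on purpose carry a status and a code that are safe to show clients.
class AppError extends Error {
  readonly status: number;
  readonly code: string;

  constructor(status: number, code: string, message: string) {
    super(message);
    // Keeps instanceof working if the code is compiled to an ES5 target
    Object.setPrototypeOf(this, AppError.prototype);
    this.name = "AppError";
    this.status = status;
    this.code = code;
  }
}

interface ErrorResponse {
  status: number;
  body: { error: { code: string; message: string; details?: unknown[]; requestId: string } };
}

function toErrorResponse(err: unknown, requestId: string, isProduction: boolean): ErrorResponse {
  if (err instanceof AppError) {
    return {
      status: err.status,
      body: { error: { code: err.code, message: err.message, requestId } },
    };
  }
  // Anything else is unexpected: the caller logs it, the client gets a generic message
  const body: ErrorResponse["body"] = {
    error: { code: "INTERNAL_ERROR", message: "Something went wrong", requestId },
  };
  if (!isProduction && err instanceof Error && err.stack) {
    body.error.details = [err.stack];
  }
  return { status: 500, body };
}
```

Because unknown errors never contribute their own message to the body, a database driver error or a failed assertion cannot leak through by accident.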
Versioning Without Pain
Most APIs do not need versioning on day one. Add it when you need to make a breaking change and you have consumers who cannot update immediately.
URL Versioning: The Simple Default
GET /v1/users/123
GET /v2/users/123
URL versioning is visible, cacheable, and easy to understand. Every developer who sees /v1/ in a URL knows what it means. Header-based versioning (Accept: application/vnd.api+json;version=2) is more "correct" according to REST purists, but it is harder to test, harder to cache, and harder to explain to new team members.
For public APIs, use URL versioning. For internal APIs where you control all consumers, you often do not need versioning at all — just coordinate deployments.
How to Sunset Old Versions
- Add a Sunset header to responses from the old version, set to the date the version will be shut down
- Add a Deprecation header, plus a Link header pointing to the migration documentation
- Send email notifications to API key owners 90, 60, and 30 days before shutdown
- Monitor traffic to the old version; do not shut it down while it still receives significant traffic
- After the sunset date, return 410 Gone instead of silently breaking
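The header steps can be sketched as a small helper. The formats below follow my reading of RFC 8594 (Sunset, an HTTP-date) and RFC 9745 (Deprecation, a Structured Fields date), so check them against the specs before relying on them. VersionLifecycle, deprecationHeaders, and statusForOldVersion are hypothetical names.

```typescript
interface VersionLifecycle {
  deprecatedAt: Date; // when the version was marked deprecated
  sunsetAt: Date; // when the version stops responding
  migrationGuideUrl: string;
}

// Headers to attach to every response served by a deprecated version.
function deprecationHeaders(v: VersionLifecycle): Record<string, string> {
  return {
    // Structured Fields date: "@" followed by Unix seconds
    Deprecation: `@${Math.floor(v.deprecatedAt.getTime() / 1000)}`,
    // HTTP-date, which is the format toUTCString() produces
    Sunset: v.sunsetAt.toUTCString(),
    Link: `<${v.migrationGuideUrl}>; rel="deprecation"`,
  };
}

// After the sunset date the old version answers 410 Gone.
function statusForOldVersion(v: VersionLifecycle, now: Date, normalStatus: number): number {
  return now.getTime() >= v.sunsetAt.getTime() ? 410 : normalStatus;
}
```

Keeping the dates in one lifecycle object means the headers, the email schedule, and the eventual 410 all come from the same source of truth.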
When You Actually Need Versioning
You need versioning when you must change a response shape or remove a field that existing clients depend on. Adding new fields to a response is not a breaking change — existing clients simply ignore fields they do not recognize. Adding optional request parameters is not a breaking change either. Version when you must break the contract, not when you add to it.
Testing Production APIs
Unit tests for individual functions matter less for APIs than integration tests that verify the full request-response cycle. Your users do not call your internal functions — they call your HTTP endpoints.
Integration Tests First
Write tests that send real HTTP requests to your API and verify the response. Test the entire path: middleware, validation, business logic, database, response serialization.
import request from "supertest";
import { app } from "./app"; // assumed path to the Express app under test

describe("POST /v1/users", () => {
  it("creates a user with valid input", async () => {
    const res = await request(app)
      .post("/v1/users")
      .send({ email: "test@example.com", name: "Test", role: "member" })
      .expect(201);
    expect(res.body).toMatchObject({
      id: expect.any(String),
      email: "test@example.com",
    });
  });

  it("returns 400 with validation errors for invalid input", async () => {
    const res = await request(app)
      .post("/v1/users")
      .send({ email: "not-an-email" })
      .expect(400);
    expect(res.body.error.code).toBe("VALIDATION_ERROR");
    // Three issues: invalid email, missing name, missing role
    expect(res.body.error.details).toHaveLength(3);
  });
});
Contract Testing Between Services
If your API is consumed by other services, use contract testing (Pact is the standard tool) to verify that both sides agree on the request and response format. This catches breaking changes before they reach production.
Load Testing Before Launch
Run load tests with realistic traffic patterns before every launch. k6 and Artillery are both good options. Test with expected peak traffic, not average traffic. If your API serves a product that sends marketing emails, test with 10x normal traffic — that is what happens when 50,000 people click a link at the same time.
Health Check Endpoints
Every API needs a /health endpoint that returns 200 when the service is running and can reach its database. Load balancers, monitoring systems, and deployment pipelines all depend on this.
Add a /health/detailed endpoint (behind authentication) that reports database connection status, Redis status, queue depth, and memory usage. This saves time during incidents.
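The logic behind both endpoints can be as small as the sketch below. It assumes each dependency exposes a check function that resolves when healthy and rejects otherwise. runHealthChecks and the check names are hypothetical, and the real checks would be something like a SELECT 1 against the database or a PING against Redis.

```typescript
type Check = () => Promise<void>; // resolves when healthy, rejects otherwise

interface HealthReport {
  status: 200 | 503; // HTTP status the endpoint should return
  checks: Record<string, "ok" | "failed">;
}

async function runHealthChecks(checks: Record<string, Check>): Promise<HealthReport> {
  // Run all checks in parallel and record each outcome, so one failure does not hide the others
  const outcomes = await Promise.all(
    Object.entries(checks).map(async ([name, check]): Promise<[string, "ok" | "failed"]> => {
      try {
        await check();
        return [name, "ok"];
      } catch {
        return [name, "failed"];
      }
    })
  );

  const report: HealthReport = { status: 200, checks: {} };
  for (const [name, outcome] of outcomes) {
    report.checks[name] = outcome;
    if (outcome === "failed") report.status = 503;
  }
  return report;
}
```

The plain /health route would return only report.status, while the authenticated /health/detailed route can return the whole report. It is worth wrapping each check in a deadline as well, so a hung dependency cannot hang the health check itself.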
Common API Mistakes We See in Code Audits
These are the patterns we find most often when reviewing APIs built by other teams. Every one of them causes real production problems. If you are building an API for a SaaS product or preparing for launch, audit your code for these before shipping.
No Pagination
Returning all records from a database query works fine with 50 rows. It crashes your server with 50,000. Every list endpoint needs pagination from day one — cursor-based for large datasets, offset-based if you need page numbers.
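A cursor-based list endpoint boils down to "give me the rows after this ID, plus one extra to see if there are more." The sketch below shows that logic against an in-memory array. The paginate, encodeCursor, and decodeCursor names are hypothetical, and a real implementation would push the filter and limit into the SQL query.

```typescript
interface Page<T> {
  items: T[];
  nextCursor: string | null; // null means there are no more pages
}

// The cursor is the last seen ID, base64url-encoded so clients treat it as opaque.
const encodeCursor = (id: number): string => Buffer.from(String(id)).toString("base64url");
const decodeCursor = (cursor: string): number =>
  Number(Buffer.from(cursor, "base64url").toString("utf8"));

// `rows` must be sorted by ascending id. In SQL this is roughly:
//   SELECT ... WHERE id > $cursor ORDER BY id LIMIT $limit + 1
function paginate<T extends { id: number }>(rows: T[], limit: number, cursor?: string): Page<T> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const afterId = cursor ? decodeCursor(cursor) : -Infinity;
  // Fetch one extra row to learn whether another page exists
  const fetched = rows.filter((row) => row.id > afterId).slice(0, limit + 1);
  const items = fetched.slice(0, limit);
  const last = items[items.length - 1];
  const hasMore = fetched.length > limit;
  return { items, nextCursor: hasMore && last ? encodeCursor(last.id) : null };
}
```

Unlike offset pagination, the query cost stays flat as clients page deeper, and rows inserted mid-scan do not shift pages under the client.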
N+1 Queries
Fetching a list of users and then making a separate database query for each user's profile is the classic N+1. Use eager loading (Prisma include, SQL JOIN) or dataloader patterns to batch these into a single query.
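The batching idea is easier to see without an ORM in the way. In this sketch, FakeDb is a hypothetical in-memory stand-in that counts queries, and attachProfiles is an illustrative helper that loads every profile for a list of users in one call.

```typescript
interface User {
  id: number;
  name: string;
}

interface Profile {
  userId: number;
  bio: string;
}

// In-memory stand-in for a database that counts how many queries it serves.
class FakeDb {
  queryCount = 0;
  private profiles: Profile[] = [
    { userId: 1, bio: "Backend" },
    { userId: 2, bio: "Frontend" },
    { userId: 3, bio: "Ops" },
  ];

  // Equivalent of: SELECT * FROM profiles WHERE user_id = ANY($1)
  findProfilesByUserIds(userIds: number[]): Profile[] {
    this.queryCount += 1;
    return this.profiles.filter((p) => userIds.includes(p.userId));
  }
}

function attachProfiles(db: FakeDb, users: User[]): Array<User & { profile: Profile | null }> {
  // One query for all users instead of one query per user
  const profiles = db.findProfilesByUserIds(users.map((u) => u.id));
  const byUserId = new Map<number, Profile>();
  for (const p of profiles) byUserId.set(p.userId, p);
  return users.map((u) => ({ ...u, profile: byUserId.get(u.id) ?? null }));
}
```

The query count stays at one no matter how many users are in the list, which is the property to assert in a test when you suspect an N+1.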
No Request Timeouts
If your API calls a third-party service and that service hangs, your API hangs too — and holds a database connection the entire time. Set timeouts on every external HTTP call and every database query. Five seconds is a reasonable default for most external calls.
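A generic deadline wrapper is one way to apply this to any promise-returning call. The sketch below uses Promise.race. withTimeout and TimeoutError are hypothetical names.

```typescript
class TimeoutError extends Error {}

// Rejects with TimeoutError if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot keep the process alive
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

Note that this only stops waiting; it does not cancel the underlying work. For fetch calls, passing signal: AbortSignal.timeout(5000) aborts the request itself. For database queries, prefer the driver's or the database's own statement timeout.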
Synchronous Email and Notification Sending
Sending an email inside a request handler means your user waits for the SMTP server to respond before they get their API response. Move email, push notifications, webhook deliveries, and report generation to a background queue. The user gets a fast response; the email goes out seconds later.
Business Logic in Route Handlers
Route handlers should validate input, call a service function, and format the response. Business logic belongs in a service layer that can be tested independently and reused across different routes, background jobs, and CLI commands.
Frequently Asked Questions
REST or GraphQL — which should I choose?
Choose REST if you have multiple consumers (mobile app, web app, third-party integrations) with straightforward data needs. REST is simpler to cache, simpler to monitor, and every developer understands it. Choose GraphQL if you have a single frontend that needs to fetch deeply nested data and you want to avoid multiple round trips. For most projects we build using AI-augmented development, REST is the right default. GraphQL adds operational complexity (schema management, query cost analysis, N+1 prevention) that is only justified when you have the specific problems it solves.
How many endpoints is too many?
There is no magic number, but if your API has more than 50 endpoints and is maintained by a small team, you probably have too many. The symptom is not the endpoint count itself — it is that nobody can remember all the endpoints, documentation falls behind, and inconsistencies creep in. Group related endpoints behind clear resource boundaries and consider whether some functionality should be a separate service.
When should I move from a monolith to microservices?
Not until the monolith is causing specific, measurable problems. The most common trigger is deployment speed: when deploying one feature requires testing and deploying the entire application, and your team is large enough that this happens multiple times per day. Another trigger is scaling: when one part of your system needs to handle 100x the traffic of the rest, breaking it into a separate service lets you scale independently. If your team is under 10 people and your deploy pipeline takes less than 15 minutes, a monolith is almost certainly the better choice.
How do I handle API authentication for mobile apps?
Use the same JWT access/refresh token pattern described above, but store tokens in the device's secure storage (Keychain on iOS, Keystore on Android). Never store tokens in plain SharedPreferences or AsyncStorage. Implement token refresh automatically in your HTTP client so the user is never forced to re-login unless the refresh token expires. Set refresh token expiry to 30-90 days for mobile apps, since users expect persistent sessions.
Production API architecture is not about choosing the trendiest framework or following every best practice blog post. It is about making the five or six decisions that prevent the most common failures and implementing them before launch. Get authentication, error handling, rate limiting, validation, and logging right, and you have an API that can grow with your product instead of holding it back.
If you are building an API and want architecture guidance from someone who has shipped this 29 times, reach out. We offer API development as a standalone service and as part of full SaaS builds.
15 years of software architecture experience. Former Senior Backend Engineer at ClickFunnels. Building production software with AI-augmented workflows.