Microservices vs monolith

The microservices vs monolith question is a judgment call, not a technical fact — and interviewers know this. The candidate who says "always microservices" signals they follow hype. The candidate who says "always monolith" signals they haven't worked at scale. The correct answer is context-dependent and demonstrates that you understand the actual costs of distributed systems, not just the benefits shown in architecture diagrams.

What a monolith actually is

A monolith is a single deployable unit where all components run in the same process. All function calls are in-process (nanoseconds). A single deployment deploys everything. A single database transaction spans all operations. A monolith is not synonymous with "big ball of mud" — a well-structured monolith has clean module boundaries, dependency injection, and clear interfaces between components. The modular monolith is the architecture that gets this right.

Monoliths have genuine advantages: in-process function calls eliminate network latency and failure modes, a single deployment means no partial-deployment failures, ACID transactions span all components without distributed coordination, and a new engineer can run the entire system locally. These are not trivial benefits — they're the reason monoliths work well for teams under ~50 engineers and systems under ~10K requests/second.
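The latency gap between in-process and cross-service calls is worth quantifying. A back-of-envelope sketch, using illustrative figures (not measurements) for a function call and an intra-datacenter RPC:

```python
# Back-of-envelope comparison: in-process call vs cross-service network hop.
# Both figures below are rough illustrative assumptions, not measurements.
IN_PROCESS_CALL_S = 100e-9   # ~100 ns for a function call in the same process
NETWORK_HOP_S = 1e-3         # ~1 ms for an intra-datacenter RPC (serialization + network)

ratio = NETWORK_HOP_S / IN_PROCESS_CALL_S
print(f"A network hop is ~{ratio:,.0f}x slower than an in-process call")
```

With these assumptions, every boundary you turn into a network call costs roughly four orders of magnitude in latency, plus failure modes that an in-process call simply doesn't have.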

What microservices add (and cost)

Microservices decompose a system into independently deployable services, each owning its data and exposing an API. The benefits are real: independent scaling (the payment service can scale independently of the catalog service), independent deployment (the recommendation team deploys without coordinating with the order team), fault isolation (a failure in the recommendation service doesn't bring down checkout), and independent technology choices (the ML pipeline can use Python while the order service uses Go).

The costs are also real and are the source of most microservices failures:

Network latency. Every cross-service call is a network hop. A checkout flow that calls user-service, catalog-service, inventory-service, pricing-service, and order-service sequentially adds 5–50ms of network overhead before any business logic runs. At p99, with retries and timeouts, this compounds badly.
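The compounding in the checkout example above can be sketched numerically. The per-hop latencies here are hypothetical, and the tail argument assumes each hop's latency is independent:

```python
# Illustrative sketch of how sequential hops compound; service names and
# per-hop latencies are assumptions, not measurements.
hops_ms = {
    "user-service": 3,
    "catalog-service": 5,
    "inventory-service": 8,
    "pricing-service": 4,
    "order-service": 10,
}

# Sequential calls simply add up before any business logic runs.
median_total_ms = sum(hops_ms.values())
print(f"median network overhead: {median_total_ms} ms")

# Tail compounding: even if each hop is under its p99 99% of the time,
# the chance that at least one of the 5 hops is slow is much larger.
p_some_hop_slow = 1 - 0.99 ** len(hops_ms)
print(f"P(at least one hop exceeds its p99): {p_some_hop_slow:.1%}")
```

This is the "tail at scale" effect: the more services a request touches, the more often some request sees a slow hop, which is why retries and timeouts make p99 compound badly.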

Distributed tracing. When a request fails, which of the 10 services in the call chain caused it? Without distributed tracing (Jaeger, Zipkin, Datadog APM, AWS X-Ray), root cause analysis is hours of log archaeology. Distributed tracing is non-optional for microservices — it's an infrastructure investment that doesn't exist in a monolith.
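The core mechanism these tools provide is trace-context propagation: a trace ID is minted at the edge and forwarded on every cross-service call so logs from the whole chain can be joined. A hand-rolled sketch of just that idea (real systems use OpenTelemetry or the vendor SDKs; the header name here is hypothetical):

```python
# Minimal sketch of trace-ID propagation, the idea underlying distributed
# tracing. Not a real tracing SDK; the header name is a made-up convention.
import uuid

TRACE_HEADER = "x-trace-id"  # hypothetical header name

def make_outgoing_headers(incoming_headers: dict) -> dict:
    """Propagate the caller's trace ID, or start a new trace at the edge."""
    trace_id = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    return {TRACE_HEADER: trace_id}

def log(service: str, headers: dict, message: str) -> None:
    # Every service logs with the trace ID, so one request's logs across
    # 10 services can be stitched back together.
    print(f"trace={headers[TRACE_HEADER]} service={service} {message}")

edge = make_outgoing_headers({})          # request enters the system
downstream = make_outgoing_headers(edge)  # hop to the next service
log("order-service", downstream, "placing order")
assert edge[TRACE_HEADER] == downstream[TRACE_HEADER]  # same trace end to end
```

In a monolith, a single stack trace gives you this for free; in microservices, it's infrastructure you must build and operate.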

Data consistency. Operations that would be a single database transaction in a monolith now span multiple services with separate databases. Two-phase commit is slow and creates locks across services. The saga pattern (see saga pattern guide) provides eventual consistency but adds compensating transaction complexity. Many "simple" features become coordination problems.
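The saga idea referenced above can be sketched minimally: each step pairs an action with a compensating action, and on failure the completed steps are undone in reverse order. Step names here are hypothetical:

```python
# Minimal saga sketch: actions paired with compensations, undone in reverse
# on failure. Step names are hypothetical placeholders.
def run_saga(steps):
    """steps: list of (action, compensate) callables. Returns True on success."""
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for undo in reversed(done):   # compensate in reverse order
                undo()
            return False
    return True

log = []

def fail_shipping():
    raise RuntimeError("shipping service unavailable")

ok = run_saga([
    (lambda: log.append("reserve inventory"), lambda: log.append("release inventory")),
    (lambda: log.append("charge card"),       lambda: log.append("refund card")),
    (fail_shipping,                           lambda: None),
])
print(ok, log)
# The completed steps are compensated in reverse: refund, then release.
```

Note what's lost versus a monolith's single transaction: between the charge and the refund, other requests can observe the intermediate state. That window is the "eventual" in eventual consistency.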

Operational overhead. Each service needs a deployment pipeline, container image, Kubernetes manifests, health checks, auto-scaling policy, alerting rules, runbook, and on-call rotation. A 20-service system has 20× the operational surface of a monolith. Teams that can't staff this operational overhead end up with poorly maintained services that fail silently.

The modular monolith: the middle path

A modular monolith is structured internally as independent modules with explicit, enforced interfaces between them, but deploys as a single process. Each module owns its own database tables: no direct cross-module table access, even though the modules share a physical database. Modules communicate only through defined contracts (public interfaces, events, or a service layer), never by reaching into each other's internals, and there are no circular dependencies between modules.
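A sketch of what an enforced boundary looks like inside one process, with illustrative module and method names: the orders module depends only on a narrow inventory interface, never on the inventory module's internals or tables.

```python
# Sketch of a modular-monolith boundary in a single process. Module and
# method names are illustrative, not a prescribed structure.
from typing import Protocol

class InventoryApi(Protocol):          # the ONLY surface orders may use
    def reserve(self, sku: str, qty: int) -> bool: ...

class InventoryModule:
    """Owns its own tables; nothing outside the module queries them directly."""
    def __init__(self) -> None:
        self._stock = {"widget": 3}    # stands in for module-private tables

    def reserve(self, sku: str, qty: int) -> bool:
        if self._stock.get(sku, 0) >= qty:
            self._stock[sku] -= qty
            return True
        return False

class OrdersModule:
    def __init__(self, inventory: InventoryApi) -> None:   # dependency injection
        self._inventory = inventory

    def place_order(self, sku: str, qty: int) -> str:
        # In-process call: no network failure modes, and it can still share
        # one ACID transaction with the inventory tables if needed.
        return "confirmed" if self._inventory.reserve(sku, qty) else "rejected"

orders = OrdersModule(InventoryModule())
print(orders.place_order("widget", 2))   # confirmed
print(orders.place_order("widget", 2))   # rejected (only 1 left)
```

Because `OrdersModule` already depends on an interface rather than on concrete internals, swapping the in-process implementation for an HTTP client later turns the module into a service without redrawing the boundary.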

This architecture captures the organizational benefits of microservices (clear ownership, independent development) without the distributed systems costs (network hops, distributed transactions, distributed tracing). Shopify's Rails monolith and Stack Overflow's .NET monolith handle some of the highest traffic volumes on the web. The modular monolith is the correct default for systems with fewer than ~10 components that have genuinely different scaling requirements, and for teams that can't staff microservices operations.

The modular monolith is also the correct preparation for eventual microservices. When a specific module genuinely needs independent scaling (the ML recommendation module needs 10× more CPU than the rest), extracting it as a service is straightforward because the module boundary is already clean. Premature microservices create incorrect boundaries that are expensive to undo.

When microservices make sense

Specific scaling asymmetry. One component needs dramatically different resources than others. At YouTube: video transcoding is CPU-intensive and write-latency-tolerant, completely different from the read-path serving. At Twitter: the tweet fan-out service is write-intensive, the timeline read service is read-intensive. These don't fit in one binary.

Large independent teams. Conway's Law: systems mirror the communication structures of the organizations that build them. If you have 10 teams of 8 engineers each, each team needs independent deployment capability — otherwise every deployment requires coordinating 10 teams. Microservices aligned to team boundaries eliminate this coordination cost. Teams smaller than ~6 engineers rarely justify the operational overhead of owning a microservice independently.

External API products. The payments team exposes a payment API to internal and external consumers. It has a defined API contract, versioning requirements, and SLA obligations independent of the rest of the system. This is a natural service boundary regardless of team size.

Compliance and data isolation. PCI-DSS compliance for payment card data, HIPAA for health data, GDPR for EU user data — these regulatory requirements may mandate that specific data is isolated in a separate service with its own security perimeter, audit logs, and access controls.

How to answer in an interview

When asked "would you use microservices or a monolith?", frame your answer around the three factors: team size, scaling requirements, and operational maturity. "For a startup-scale system with a single team, I'd start with a modular monolith — clean module boundaries that can be split later. I'd extract services when I hit a specific scaling bottleneck in a component or when team growth creates coordination overhead. I'd want distributed tracing, service discovery, and runbooks in place before committing to microservices operations."

This answer demonstrates: you know the costs of microservices, you have a principled decision framework, and you understand that the choice is reversible. It's a significantly more credible answer than "microservices for everything" or "monolith is simpler."

Frequently asked questions

What are the real costs of microservices?
Network latency (ms per hop vs ns in-process), distributed tracing overhead, data consistency via sagas instead of transactions, 20× operational surface for a 20-service system. These are not hypothetical — they're the primary source of microservices project failures.

What is a modular monolith?
Single deployable unit structured as modules with enforced boundaries and separate data ownership. Gets clear ownership without network hops. The right default for teams under ~50 engineers. Easier to split later than to merge premature microservices.

How do you define the right service boundaries?
Domain-driven design bounded contexts: low coupling, high cohesion, independent deployability, own their data. Avoid layer-based splits (frontend/backend/database). The correct split follows team ownership and domain semantics, not technical convenience.

Should you start with a monolith or microservices?
Start with a modular monolith. Amazon, Netflix, Uber all started as monoliths. Extract services when a specific component needs independent scaling or when team size creates coordination overhead. Premature decomposition creates wrong boundaries that are expensive to fix.


Next: Saga pattern: distributed transactions →