01 — Foundation

What is a distributed systems simulator?

A distributed systems simulator is a tool that models the runtime behaviour of multi-node architectures — load balancers, databases, caches, message queues, microservices — without deploying any real infrastructure. Rather than watching a diagram sit still, you execute the architecture and observe how it responds to traffic, contention, and failure.

SysSimulator specifically uses discrete-event simulation (DES). In a DES engine, time is not continuous; it jumps from one meaningful event to the next. An event might be "request arrives at load balancer at t=12ms" or "database replica lags 40ms behind primary at t=850ms". Between events, the modelled state does not change, so the engine skips that time entirely. This makes simulation both computationally efficient and easy to analyse: every state transition is tracked, every queue depth is measurable, and given the same configuration and the same random inputs, a run is deterministic and reproducible.

The term "distributed systems simulator" means something different from the broader "system design simulator." A system design simulator often refers to a diagramming or interview-prep tool. A distributed systems simulator is specifically concerned with runtime behaviour: how latency propagates through a service graph, how a queue grows under load, how a cache miss storm cascades into database saturation. That's what SysSimulator models.

Definition

Discrete-event simulation (DES) models systems as a sequence of events in time. Each event changes the system state and may schedule future events. Time advances in jumps — only to the next scheduled event — making DES far more efficient than continuous-time simulation for request-driven systems.
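
To make the definition concrete, here is a minimal sketch of a discrete-event loop in Python. It is not SysSimulator's engine; the `run` function, the handler table, and the event names are hypothetical and chosen only to illustrate the idea that each event may schedule future events and the clock jumps between them.

```python
import heapq
import itertools

def run(initial_events, handlers, until_ms):
    """Process events in timestamp order; a handler may schedule future events."""
    tie = itertools.count()  # tie-breaker: equal timestamps pop in insertion order
    queue = [(t, next(tie), name) for t, name in initial_events]
    heapq.heapify(queue)
    log = []
    while queue:
        now, _, name = heapq.heappop(queue)  # the clock jumps to the next event
        if now > until_ms:
            break
        log.append((now, name))
        handler = handlers.get(name)
        if handler is not None:
            for delay_ms, next_name in handler(now):
                heapq.heappush(queue, (now + delay_ms, next(tie), next_name))
    return log

# Hypothetical handlers: each returns (delay_ms, next_event_name) pairs.
handlers = {
    "arrive_lb": lambda now: [(2, "arrive_app")],   # assumed 2 ms network hop
    "arrive_app": lambda now: [(15, "respond")],    # assumed 15 ms processing
}
log = run([(12, "arrive_lb"), (20, "arrive_lb")], handlers, until_ms=1000)
```

Nothing is computed for the idle time between t=14 and t=20; the loop simply pops the next entry from the heap.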

02 — Engine

How Rust and WebAssembly make browser-native simulation possible

Running a simulation engine in a browser sounds like it should be slow. JavaScript is garbage-collected, runs on a single main thread by default, and is not designed for tight event loops processing thousands of state transitions per second. SysSimulator's answer to this constraint is Rust compiled to WebAssembly.

Simulation Engine Architecture

User Interface

Flutter Web — canvas rendering, component palette, metrics dashboard, chaos controls

Bridge

JavaScript ↔ WASM FFI — typed message passing between the UI thread and the simulation module

Simulation Core

Rust / WebAssembly — discrete-event engine, event priority queue, request routing logic, chaos injection hooks

Cost Model

AWS pricing rules embedded at compile time — EC2, RDS, ElastiCache, Lambda, SQS, S3 cost estimation per architecture

Runtime

Your browser's WASM runtime — no server, no network call, no data leaves your machine

WebAssembly gives the Rust simulation core near-native execution speed inside the browser sandbox. The DES event loop — maintaining a priority queue of future events sorted by timestamp, popping the next event, processing it, enqueuing downstream events — runs entirely in WASM memory. For a simulation running at 10,000 requests per second across a twelve-node topology, this amounts to hundreds of thousands of event-queue operations per simulated second, all completing in wall-clock milliseconds on consumer hardware.

Rust was chosen specifically because its ownership model eliminates the class of memory bugs (use-after-free, double-free, data races) that would be catastrophic in a tight event loop. There is no garbage collector pausing simulation mid-run. The compiled WASM binary is small enough to load with the initial page request — the simulation engine is available the moment the canvas renders.

The privacy implication is significant: because the engine is browser-native, your architecture never leaves your machine. You can simulate a topology containing your actual service names, realistic traffic patterns, and proprietary component configurations without any data reaching an external server.

03 — Modelling

How request flow models real distributed system behaviour

When you place a Load Balancer connected to three App Servers connected to a PostgreSQL instance and a Redis Cache, you are not drawing a picture — you are defining a directed graph of queuing systems. Each component in SysSimulator is a node with configurable properties: capacity (concurrent requests it can handle), processing latency (mean and standard deviation), failure rate, and timeout threshold. Each connection is a directed edge carrying traffic with configurable network latency.
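
As a rough illustration of "a directed graph of queuing systems", the topology above could be represented like this. The `Node` and `Topology` classes, their field names, and every numeric value are assumptions made for this sketch, not SysSimulator's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    capacity: int               # concurrent requests the node can handle
    latency_mean_ms: float
    latency_stddev_ms: float
    failure_rate: float = 0.0
    timeout_ms: float = 1000.0

@dataclass
class Topology:
    nodes: dict = field(default_factory=dict)
    # source name -> list of (target name, network latency in ms)
    edges: dict = field(default_factory=dict)

    def add_node(self, node):
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, [])

    def connect(self, src, dst, network_latency_ms):
        self.edges[src].append((dst, network_latency_ms))

    def downstream(self, name):
        return [dst for dst, _ in self.edges[name]]

# Illustrative values only.
topo = Topology()
topo.add_node(Node("lb", capacity=10_000, latency_mean_ms=1, latency_stddev_ms=0.2))
topo.add_node(Node("postgres", capacity=100, latency_mean_ms=20, latency_stddev_ms=8))
topo.add_node(Node("redis", capacity=5_000, latency_mean_ms=1, latency_stddev_ms=0.3))
for i in (1, 2, 3):
    topo.add_node(Node(f"app{i}", capacity=200, latency_mean_ms=15, latency_stddev_ms=5))
    topo.connect("lb", f"app{i}", network_latency_ms=0.5)
    topo.connect(f"app{i}", "postgres", network_latency_ms=1.0)
    topo.connect(f"app{i}", "redis", network_latency_ms=0.3)
```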

When the simulation runs at a given RPS, the DES engine generates request arrival events as a Poisson process, so the gaps between arrivals are exponentially distributed. This is a standard approximation for traffic produced by many independent clients. Each request traverses the component graph, waiting in queue if all backends are saturated, being processed, making downstream calls, and completing. Every hop is timed. Every queue depth is tracked.
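
Poisson arrivals at a fixed rate can be generated by drawing exponentially distributed gaps between requests. The sketch below assumes a hypothetical `poisson_arrivals` helper and an explicit seed so that repeated runs produce the same arrival times; how SysSimulator seeds its generator is not stated in the source.

```python
import random

def poisson_arrivals(rps, duration_s, seed=42):
    """Return sorted arrival timestamps (seconds) for a Poisson process."""
    rng = random.Random(seed)       # private generator so runs are repeatable
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rps)   # exponential gap with mean 1/rps seconds
        if t >= duration_s:
            return arrivals
        arrivals.append(t)

arrivals = poisson_arrivals(rps=1000, duration_s=10)
```

The number of arrivals varies from seed to seed but stays close to `rps * duration_s`, which is the behaviour the configured RPS is meant to describe.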

Request arrival

Requests arrive following a Poisson process at the configured RPS. Burst traffic, scheduled spikes, and gradual ramp-up patterns can all be modelled.

Queue formation and backpressure

Each component has a concurrency limit. When it is reached, incoming requests queue. When a queue overflows its configured buffer size, requests are dropped and show up as errors in the error-rate and throughput metrics.
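
The admission rule described here can be sketched as a small state machine. The `Component` class and its method names are hypothetical; the sketch only shows the three outcomes for an arriving request (start, queue, drop) and how a completion frees a slot.

```python
from collections import deque

class Component:
    def __init__(self, concurrency_limit, buffer_size):
        self.concurrency_limit = concurrency_limit
        self.buffer_size = buffer_size
        self.in_flight = 0
        self.waiting = deque()
        self.dropped = 0

    def offer(self, request_id):
        """Return 'started', 'queued', or 'dropped' for an arriving request."""
        if self.in_flight < self.concurrency_limit:
            self.in_flight += 1
            return "started"
        if len(self.waiting) < self.buffer_size:
            self.waiting.append(request_id)
            return "queued"
        self.dropped += 1           # buffer overflow: counted as an error
        return "dropped"

    def complete(self):
        """Finish one request; return the promoted waiting request, if any."""
        if self.waiting:
            return self.waiting.popleft()   # freed slot is reused immediately
        self.in_flight -= 1
        return None

component = Component(concurrency_limit=2, buffer_size=1)
outcomes = [component.offer(i) for i in range(4)]
```

With a limit of two and a buffer of one, the fourth simultaneous request is the first to be dropped.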

Latency propagation

Each component adds variable processing time sampled from a configurable distribution (log-normal by default). Downstream calls add their own latency. Total request latency is the sum of all hops including queue wait time.
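
One way to sample per-hop latency from a log-normal distribution is to convert a configured mean and standard deviation into the distribution's underlying parameters. This sketch assumes the configured values describe the latency itself rather than its logarithm (the source does not say), and the function names are hypothetical.

```python
import math
import random

def lognormal_params(mean_ms, stddev_ms):
    """Convert a desired mean/stddev into (mu, sigma) of the underlying normal."""
    sigma_sq = math.log(1.0 + (stddev_ms / mean_ms) ** 2)
    mu = math.log(mean_ms) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)

def sample_request_latency(hops, rng, queue_wait_ms=0.0):
    """hops: list of (mean_ms, stddev_ms) per component; returns total ms."""
    total = queue_wait_ms
    for mean_ms, stddev_ms in hops:
        mu, sigma = lognormal_params(mean_ms, stddev_ms)
        total += rng.lognormvariate(mu, sigma)
    return total

rng = random.Random(7)
# Assumed hops: load balancer, app server, database.
total_ms = sample_request_latency([(1, 0.2), (15, 5), (20, 8)], rng, queue_wait_ms=3.0)
```

Summing the hops this way matches the sequential case described above; calls made in parallel would need a maximum rather than a sum.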

Cache and database interaction

The cache component tracks hit/miss ratio based on a configurable working set and Zipf distribution. Cache hits skip the database call entirely. Cache miss storms automatically increase database queue depth.
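
For intuition about how a Zipf-distributed workload translates into a hit ratio, the following sketch computes the share of requests that target the most popular keys. It assumes an idealised cache that always holds exactly the `cached_items` most popular keys, which a real eviction policy only approximates; the function name and the exponent default are also assumptions.

```python
def zipf_hit_ratio(working_set, cached_items, exponent=1.0):
    """Fraction of requests served from cache under an idealised top-k cache."""
    # Under Zipf, the key at popularity rank r is requested with weight 1 / r**exponent.
    weights = [1.0 / rank ** exponent for rank in range(1, working_set + 1)]
    return sum(weights[:cached_items]) / sum(weights)

# Assumed sizes: 10,000 distinct keys, cache large enough for the top 1,000.
hit_ratio = zipf_hit_ratio(working_set=10_000, cached_items=1_000)
```

Under these assumptions, caching the top 10% of keys serves well over half of all requests, which is why a modest cache can remove most of the database load.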

Metrics collection

The engine collects latency samples, throughput, error rate, and queue depths at configurable intervals. After the run, read P50, P95, and P99 latency percentiles, peak queue depths, and the AWS cost estimate for the modelled topology.
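
Percentiles such as P50, P95, and P99 can be read from collected latency samples in several ways. The sketch below uses the nearest-rank definition; which definition SysSimulator uses is not stated, so treat this as one reasonable choice rather than the tool's method.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

latencies_ms = list(range(1, 101))   # illustrative samples: 1 ms .. 100 ms
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```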

This is why the results feel meaningful rather than illustrative. You are reading output from a queueing model, the same mathematical framework used in academic capacity planning. Little's Law (L = λW, where L is the average number of requests in the system, λ the average arrival rate, and W the average time a request spends in the system) holds in the simulation just as it does in a real system.
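
Little's Law can be checked directly on a trace of request arrival and departure times. The sketch below uses a hypothetical three-request trace and measures L by integrating the number of requests in the system over time, independently of λ and W. The equality is exact here because the system is empty at the start and end of the observation window.

```python
def littles_law(intervals, window_s):
    """intervals: (arrival_s, departure_s) per request, all inside [0, window_s]."""
    # Sweep the timeline, integrating the number of requests in the system.
    changes = sorted([(a, 1) for a, _ in intervals] + [(d, -1) for _, d in intervals])
    area, in_system, last_t = 0.0, 0, 0.0
    for t, delta in changes:
        area += in_system * (t - last_t)
        in_system += delta
        last_t = t
    avg_in_system = area / window_s                                  # L
    arrival_rate = len(intervals) / window_s                         # lambda
    avg_time = sum(d - a for a, d in intervals) / len(intervals)     # W
    return avg_in_system, arrival_rate, avg_time

# Hypothetical trace: three overlapping requests observed for 10 seconds.
L, lam, W = littles_law([(0.0, 2.0), (1.0, 4.0), (3.0, 5.0)], window_s=10.0)
```

Here L is 0.7 requests on average, and λ·W gives the same value from 0.3 requests per second multiplied by a mean time in system of 7/3 seconds.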

04 — Positioning

How it differs from real load testing

The most important thing to understand about a distributed systems simulator is what it is not: it is not a load testing tool. Load testing — with tools like k6, Gatling, or Locust — sends actual HTTP requests to a running system and measures the real latency and error rate. SysSimulator sends no real traffic anywhere. The distinction matters because the two tools answer different questions.

Dimension	SysSimulator (DES)	Real Load Testing
Infrastructure required	✓ None — browser only	✗ Full deployed system
Stage of use	Design & architecture phase	Pre-launch & post-deploy
What it measures	Modelled latency, queue depth, throughput under assumptions	Real latency, real error rate, real resource consumption
Iteration speed	Seconds — change topology, re-run	Hours — redeploy, warm up, run test
Chaos injection	✓ Instant — node crash, partition, latency spike	✗ Requires chaos engineering tooling
Cost estimation	✓ Built-in AWS cost model	✗ Requires separate FinOps analysis
Production data risk	✓ Zero — no real traffic generated	Requires careful traffic isolation
Accuracy	Model-dependent — only as accurate as your assumptions	Ground truth for the deployed configuration

Use SysSimulator when you are deciding what to build: comparing architectures, evaluating whether a Redis cache layer actually moves your P99 latency, or stress-testing a new topology against a traffic spike before committing to it. Use load testing when you need to verify that what you built actually works at scale.

Practical workflow

The most effective pattern is simulation-first, load-test-to-verify: use SysSimulator to explore 3–5 candidate architectures and identify the promising one, then spin up that architecture and validate it with real load testing before shipping.

05 — Reliability

Chaos engineering without the blast radius

Chaos engineering in production — intentionally killing nodes, inducing network partitions, or saturating a database — requires careful controls to avoid customer impact. In SysSimulator, chaos is a first-class feature with zero blast radius: you trigger failure modes with a toggle and observe their effect on metrics immediately.

The 28 built-in chaos scenarios include node crashes, network partitions, memory pressure, latency injection, and traffic spikes. The value is in seeing propagation — a 200ms latency injection into a single database connection pool backs up the app server queue, which backs up the load balancer queue, which starts dropping requests. The blast radius is the whole request path, visible in the live metrics panel rather than in a post-mortem.
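
A back-of-the-envelope calculation shows why a 200ms injection backs up everything upstream: a pool with a fixed number of connections can complete at most `concurrency / latency` requests per second. The pool size, baseline latency, and arrival rate below are assumptions picked for illustration; only the 200ms figure comes from the text.

```python
def max_throughput_rps(concurrency, latency_ms):
    """Upper bound on completions per second for a fixed-size pool."""
    return concurrency * 1000.0 / latency_ms

pool_size = 50            # assumed database connection pool size
baseline_ms = 20          # assumed query latency before the injection
arrival_rps = 1000        # assumed request rate reaching the database

baseline_cap = max_throughput_rps(pool_size, baseline_ms)
degraded_cap = max_throughput_rps(pool_size, baseline_ms + 200)

# Requests per second that cannot be served and must queue upstream.
backlog_growth_rps = max(0.0, arrival_rps - degraded_cap)
```

Under these assumptions the pool's ceiling falls from 2,500 to roughly 227 requests per second, so more than 770 requests per second start accumulating in the app server queue.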

P99: Latency tracked. P50, P95, and P99 are sampled across all components.

28: Chaos scenarios. Node crash, partition, spike, pressure, timeout.

0s: Recovery time. Reset and re-run with no redeployment required.

06 — Applications

Who uses a distributed systems simulator and why

System design interview preparation

SysSimulator is used heavily by engineers preparing for senior and staff-level system design interviews at FAANG-tier companies. Designing a system on paper is one thing; understanding why a cache hit ratio of 80% versus 95% produces dramatically different database load requires intuition that static diagrams don't build. The simulator makes the numbers tangible — you experience the tradeoff rather than describe it.
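
The 80% versus 95% comparison is easy to work through by hand. In this sketch the 10,000 RPS figure is an assumed traffic level and `database_rps` is a hypothetical helper; it treats every cache miss as exactly one database query.

```python
def database_rps(total_rps, hit_percent):
    """Requests per second that miss the cache and reach the database."""
    return total_rps * (100 - hit_percent) / 100

load_at_80 = database_rps(10_000, 80)
load_at_95 = database_rps(10_000, 95)
ratio = load_at_80 / load_at_95
```

A 15-point improvement in hit ratio cuts database traffic from 2,000 to 500 requests per second, a fourfold reduction, because database load depends on the miss ratio rather than the hit ratio.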

Architecture review and decision-making

Engineering teams use it to explore architectural options before committing to implementation. Loading two candidate topologies into SysSimulator and running them at production-representative RPS quantifies the difference in minutes rather than after days of implementation work.

Capacity planning

The built-in AWS cost model lets teams translate architecture decisions into monthly cost estimates. Running the simulation at 2×, 5×, and 10× current traffic load shows where the first bottleneck appears and what it costs to address it through horizontal scaling versus architectural change.
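
The idea of finding the first bottleneck as traffic scales can be sketched with a simple utilisation check. This is a deliberate simplification: it assumes per-component load grows linearly with traffic, which a full simulation does not need to assume. The component names, load figures, and capacities are invented for illustration.

```python
def first_bottleneck(load_rps, capacity_rps, multipliers=(2, 5, 10)):
    """Return (multiplier, component) for the first saturation, or None."""
    for m in multipliers:
        saturated = [name for name in capacity_rps
                     if load_rps[name] * m > capacity_rps[name]]
        if saturated:
            # Report the most over-utilised component at this multiplier.
            worst = max(saturated, key=lambda n: load_rps[n] * m / capacity_rps[n])
            return m, worst
    return None

# Assumed current load and capacity per component, in requests per second.
load = {"lb": 1000, "app": 1000, "postgres": 200}
capacity = {"lb": 20_000, "app": 6_000, "postgres": 800}
result = first_bottleneck(load, capacity)
```

With these numbers nothing saturates at 2× traffic, and the database is the first component to exceed its capacity at 5×.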

Teaching distributed systems concepts

University courses and engineering bootcamps use SysSimulator to make distributed systems concepts concrete. Watching a topology's queue depth spike when you take a Kafka broker offline — and recover when you bring it back — builds durable mental models in a way that a lecture cannot.