Traditional load testing from a single machine quickly becomes a bottleneck for modern distributed systems. This article describes a scalable benchmarking architecture using Apache JMeter running on distributed cloud workers, Scribe for centralized log aggregation, and TimescaleDB for time-series performance analytics. The approach enables engineering teams to simulate large-scale production traffic while efficiently collecting and analyzing millions of performance events.
Most teams start performance testing the same way.
They install Apache JMeter, create a test plan, and run it from a laptop or a single VM.
For small systems this works fine. But once services begin handling tens of thousands of requests per second, the testing setup itself becomes the bottleneck.
The problem isn’t the application anymore.
It’s the load generator.
In modern distributed systems, realistic performance testing requires infrastructure capable of generating traffic at scale while collecting and analyzing millions of performance events in real time.
In this article, I’ll walk through a distributed performance benchmarking architecture that combines:
- Apache JMeter for load generation
- Cloud infrastructure (AWS EC2 or Google Cloud Compute) for scalable workers
- Scribe-based log aggregation for centralized metric collection
- TimescaleDB for high-volume time-series analytics
This architecture allows engineering teams to simulate production-scale traffic patterns and analyze performance results efficiently.
Cloud providers such as Amazon Web Services offer managed options like Distributed Load Testing on AWS, which simplifies large-scale benchmarking through managed infrastructure. However, many engineering teams prefer building custom benchmarking systems using open-source tools to gain greater control over workload generation, logging pipelines, and analytics storage. The architecture described in this article demonstrates one such approach using JMeter, distributed compute infrastructure, and time-series analytics.
In my work building large-scale data infrastructure and distributed services, performance testing frequently becomes a bottleneck long before the system under test reaches its limits. Designing a scalable benchmarking platform therefore requires treating the testing infrastructure itself as a distributed system.
The Problem with Traditional Load Testing
Most load testing environments fail for three reasons.
1. The Load Generator Becomes the Bottleneck
A single machine generating requests quickly runs into limits:
- CPU exhaustion
- Network saturation
- Thread scheduling overhead
The load generator collapses before the system under test ever reaches its limits.
2. Test Traffic Isn’t Realistic
Production traffic comes from:
- Multiple regions
- Different network latencies
- Varying traffic spikes
Running all requests from one machine produces unrealistic behavior.
3. Results Are Hard to Analyze
When running distributed tests, each worker produces logs locally. Without centralized aggregation, analyzing results across dozens of workers becomes difficult.
What we need instead is a distributed load generation system with centralized observability.
Architecture Overview
The architecture is built around four key components:
- Test Controller
- Distributed JMeter Workers
- Centralized Logging Pipeline
- Time-Series Analytics Database
The data flow looks like this:
Controller → Distributed JMeter Workers → Scribe Logging Layer → TimescaleDB → Analytics / Visualization
Each layer solves a specific scalability problem.
Scaling Load Generation with Cloud Workers
Apache JMeter is widely used for performance testing because it supports many protocols and provides flexible scripting capabilities.
To generate large-scale load, we deploy multiple JMeter workers across cloud instances.
Each worker:
- Runs JMeter in non-GUI mode
- Executes the same test plan
- Generates a portion of the overall traffic
Example execution command (non-GUI mode, writing results to a JTL file):
jmeter -n -t test-plan.jmx -l results.jtl
Workers can be provisioned using:
- AWS Auto Scaling Groups
- Google Cloud Managed Instance Groups
This enables horizontal scaling.
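As a sketch of how a controller might divide a target load across workers: the function below splits a total request rate evenly and builds the non-GUI command each worker would run. The `target_rps` property name is hypothetical; a real test plan would need to read whatever property you pass via `-J`.

```python
# Sketch: split a target load evenly across JMeter workers and build
# the non-GUI command each worker would run. The property name
# "target_rps" is hypothetical and must match what the test plan reads.

def split_load(target_rps: int, workers: int) -> list[int]:
    """Divide target_rps across workers, spreading any remainder."""
    base, extra = divmod(target_rps, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]

def jmeter_command(worker_rps: int, plan: str = "test-plan.jmx") -> str:
    # -n: non-GUI mode, -t: test plan, -l: results file,
    # -J: pass a user-defined property into the test plan.
    return f"jmeter -n -t {plan} -l results.jtl -Jtarget_rps={worker_rps}"

shares = split_load(100_000, 20)          # 20 workers sharing 100k rps
commands = [jmeter_command(rps) for rps in shares]
```

The controller would then ship one command to each worker (via SSH, a startup script, or an orchestration tool), so scaling out is just a matter of raising the worker count.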
Example load generation capacity:
| Workers | Estimated Throughput |
|---|---|
| 5 | ~20k requests/sec |
| 20 | ~100k requests/sec |
| 100 | 500k+ requests/sec |
The exact numbers depend on:
- Request complexity
- Instance type
- Network overhead
Why Centralized Logging Is Critical
When running distributed performance tests, every request generates metrics such as:
- Request latency
- Response code
- Endpoint
- Timestamp
During a large test run, this can easily produce hundreds of millions of records.
If workers store logs locally:
- Disk I/O slows down the test
- Results become fragmented
- Analysis becomes difficult
Instead, workers should stream metrics to a centralized logging pipeline.
Aggregating Logs with Scribe
A common approach is to use Scribe, a distributed log aggregation system designed for high-throughput environments.
Each JMeter worker emits structured events like this:
{
  "timestamp": "2026-03-01T12:10:22",
  "test_run_id": "benchmark_test_1",
  "worker_id": "worker-08",
  "endpoint": "/api/resource",
  "latency_ms": 118,
  "status_code": 200
}
Scribe collects logs from all workers and forwards them to the storage layer.
Benefits of this architecture:
- Decouples load generation from storage
- Avoids disk bottlenecks on worker nodes
- Enables centralized analysis pipelines
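A minimal sketch of how a worker might construct such an event before handing it to a Scribe client. The Scribe transport itself (typically a Thrift client writing to a log category) is assumed and omitted here; the field names follow the example event above.

```python
import json
import time

# Sketch: build the structured benchmark event a worker would emit.
# The Scribe transport (a Thrift client writing to a category) is
# assumed and not shown.

def make_event(test_run_id: str, worker_id: str, endpoint: str,
               latency_ms: int, status_code: int) -> str:
    """Serialize one request measurement as a JSON log line."""
    event = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()),
        "test_run_id": test_run_id,
        "worker_id": worker_id,
        "endpoint": endpoint,
        "latency_ms": latency_ms,
        "status_code": status_code,
    }
    return json.dumps(event)

line = make_event("benchmark_test_1", "worker-08", "/api/resource", 118, 200)
```

Emitting one self-describing JSON line per request keeps the downstream pipeline schema-agnostic: any collector that can forward lines can carry these events.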
Storing Benchmark Results in TimescaleDB
Performance test data is fundamentally time-series data.
Each request produces a timestamped measurement. Over a 10-minute test, a distributed system may generate hundreds of millions of events.
A traditional relational database can store this data, but time-series workloads benefit from specialized optimizations.
TimescaleDB is a PostgreSQL extension designed for exactly this use case.
Key features include:
Time-Partitioned Storage
TimescaleDB stores data in hypertables, automatically partitioned by time.
This allows efficient storage and querying of massive datasets.
High Write Throughput
Batch inserts enable TimescaleDB to ingest large volumes of benchmark data without becoming a bottleneck.
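The batching idea can be sketched independently of the database driver. The helper below groups a stream of events (tuples of timestamp, test_run_id, worker_id, endpoint, latency_ms, status_code) into fixed-size batches; each batch would then become a single multi-row INSERT, for example via psycopg2. The batch size of 5000 is an illustrative default, not a recommendation.

```python
from typing import Iterable, Iterator

# Sketch: group incoming benchmark events into fixed-size batches so
# they can be written with one multi-row INSERT per batch instead of
# one INSERT per event. The insert itself (e.g. via psycopg2) is omitted.

def batches(events: Iterable[tuple], size: int = 5000) -> Iterator[list]:
    """Yield lists of up to `size` events from a (possibly endless) stream."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Each yielded batch maps to one statement:
#   INSERT INTO perf_results VALUES (...), (...), ...
```

Because the generator never holds more than one batch in memory, it works equally well for a bounded result file or a continuous stream from the logging layer.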
Built-in Time-Series Analytics
TimescaleDB supports powerful analytical queries such as:
- Latency percentiles
- Time-bucket aggregations
- Rolling averages
These operations are critical when analyzing system performance.
Example Schema
A simple schema for storing benchmark results might look like this:
CREATE TABLE perf_results (
    timestamp   TIMESTAMPTZ NOT NULL,
    test_run_id TEXT,
    worker_id   TEXT,
    endpoint    TEXT,
    latency_ms  INT,
    status_code INT
);
Convert the table into a hypertable:
SELECT create_hypertable('perf_results', 'timestamp');
This enables TimescaleDB’s automatic partitioning and performance optimizations.
Example Performance Queries
Once results are stored, engineers can analyze system behavior using SQL.
P95 latency
SELECT percentile_cont(0.95)
WITHIN GROUP (ORDER BY latency_ms)
FROM perf_results
WHERE test_run_id='benchmark_42';
Error rate
SELECT
  COUNT(*) FILTER (WHERE status_code >= 500) * 1.0 / COUNT(*)
FROM perf_results
WHERE test_run_id='benchmark_42';
Requests per minute
SELECT
time_bucket('1 minute', timestamp) AS minute,
COUNT(*) AS requests
FROM perf_results
GROUP BY minute
ORDER BY minute;
These queries allow teams to quickly evaluate system performance under heavy load.
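As a quick sanity check on query output, the P95 calculation can also be reproduced client-side. The sketch below mirrors the linear interpolation that percentile_cont performs between the two nearest ranked values; it is a cross-check, not a replacement for the SQL.

```python
import math

# Sketch: client-side equivalent of SQL percentile_cont, using the
# same linear interpolation between the two nearest ranked values.

def percentile_cont(values: list, p: float) -> float:
    """Continuous percentile of `values` at fraction p (0 <= p <= 1)."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p           # fractional rank
    lo = math.floor(h)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])

latencies = [100, 110, 120, 180, 400]
p95 = percentile_cont(latencies, 0.95)   # interpolates near the tail
```

Running this over a sampled slice of perf_results is a cheap way to confirm that an aggregation pipeline has not silently dropped or double-counted events.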
Visualization and Monitoring
Once performance data is stored in TimescaleDB, it can be visualized using tools such as:
- Grafana
- Superset
- Metabase
Typical dashboards include:
- requests per second
- P50 / P95 / P99 latency
- error rate
- latency distribution over time
This turns load testing into a repeatable and observable engineering workflow rather than a one-off experiment.
Design Considerations When Building Benchmarking Systems
Several design principles emerge consistently when building large-scale benchmarking environments:
1. Load generation must scale independently of analytics
The infrastructure generating traffic should remain isolated from the systems used to store and analyze performance data. Coupling these layers can introduce measurement bias.
2. Benchmarking systems should simulate realistic production conditions
This includes:
- distributed traffic sources
- realistic request concurrency
- network variability
A centralized test client rarely captures these dynamics.
3. Observability is as important as traffic generation
Without structured logging and time-series analytics, identifying bottlenecks becomes extremely difficult in high-throughput environments.
These considerations significantly improve the reliability of performance benchmarks.
Example Deployment
A typical deployment on AWS might include:
| Component | Instance |
|---|---|
| Controller | t3.large |
| JMeter workers | 50 × c6i.large |
| Scribe collectors | m6i.large |
| TimescaleDB | self-managed PostgreSQL (the TimescaleDB extension is not available on RDS) |
This setup can generate hundreds of thousands of requests per second, depending on the workload.
Lessons Learned
Building a distributed performance benchmarking system highlights several important principles.
Treat benchmarking infrastructure like production infrastructure
Load testing systems must be designed with the same scalability and reliability considerations as production systems.
Decouple load generation and analytics
Separating workers from the analytics pipeline prevents measurement bias and improves reliability.
Store performance metrics as time-series data
Time-series databases such as TimescaleDB make large-scale performance analysis significantly easier.
Final Thoughts
As distributed systems continue to grow in scale, benchmarking infrastructure must evolve alongside them. Treating performance testing as a distributed systems problem — rather than a simple testing task — enables engineers to simulate real production workloads and identify bottlenecks earlier in the development lifecycle.
References
- Apache Software Foundation. Apache JMeter User Manual. https://jmeter.apache.org/usermanual/
- Apache Software Foundation. Remote Testing in JMeter. https://jmeter.apache.org/usermanual/remote-test.html
- Facebook. Scribe (open-source log aggregation server). https://github.com/facebookarchive/scribe
- Timescale Inc. TimescaleDB Documentation. https://docs.timescale.com/
- Kleppmann, M. Designing Data-Intensive Applications. O’Reilly Media, 2017.