Traditional load testing from a single machine quickly becomes a bottleneck for modern distributed systems. This article describes a scalable benchmarking architecture using Apache JMeter running on distributed cloud workers, Scribe for centralized log aggregation, and TimescaleDB for time-series performance analytics. The approach enables engineering teams to simulate large-scale production traffic while efficiently collecting and analyzing millions of performance events.
Most teams start performance testing the same way.
They install Apache JMeter, create a test plan, and run it from a laptop or a single VM.
For small systems this works fine. But once services begin handling tens of thousands of requests per second, the testing setup itself becomes the bottleneck.
The problem isn’t the application anymore.
It’s the load generator.
In modern distributed systems, realistic performance testing requires infrastructure capable of generating traffic at scale while collecting and analyzing millions of performance events in real time.
In this article, I’ll walk through a distributed performance benchmarking architecture that combines:
- Apache JMeter for load generation
- Cloud infrastructure (AWS EC2 or Google Cloud Compute) for scalable workers
- Scribe-based log aggregation for centralized metric collection
- TimescaleDB for high-volume time-series analytics
This architecture allows engineering teams to simulate production-scale traffic patterns and analyze performance results efficiently.
Cloud providers such as Amazon Web Services offer managed options like Distributed Load Testing on AWS, which simplifies large-scale benchmarking through managed infrastructure. However, many engineering teams prefer building custom benchmarking systems using open-source tools to gain greater control over workload generation, logging pipelines, and analytics storage. The architecture described in this article demonstrates one such approach using JMeter, distributed compute infrastructure, and time-series analytics.
In my work building large-scale data infrastructure and distributed services, performance testing frequently becomes a bottleneck long before the system under test reaches its limits. Designing a scalable benchmarking platform therefore requires treating the testing infrastructure itself as a distributed system.
The Problem with Traditional Load Testing
Most load testing environments fail for three reasons.
1. The Load Generator Becomes the Bottleneck
A single machine generating requests quickly runs into limits:
- CPU exhaustion
- Network saturation
- Thread scheduling overhead
The load generator collapses before the system under test ever reaches its limits.
2. Test Traffic Isn’t Realistic
Production traffic comes from:
- Multiple regions
- Different network latencies
- Varying traffic spikes
Running all requests from one machine produces unrealistic behavior.
3. Results Are Hard to Analyze
When running distributed tests, each worker produces logs locally. Without centralized aggregation, analyzing results across dozens of workers becomes difficult.
What we need instead is a distributed load generation system with centralized observability.
Architecture Overview
The architecture is built around four key components:
- Test Controller
- Distributed JMeter Workers
- Centralized Logging Pipeline
- Time-Series Analytics Database
The data flow looks like this:
Controller → Distributed JMeter Workers → Scribe Logging Layer → TimescaleDB → Analytics / Visualization
Each layer solves a specific scalability problem.
Scaling Load Generation with Cloud Workers
Apache JMeter is widely used for performance testing because it supports many protocols and provides flexible scripting capabilities.
To generate large-scale load, we deploy multiple JMeter workers across cloud instances.
Each worker:
- Runs JMeter in non-GUI mode
- Executes the same test plan
- Generates a portion of the overall traffic
Example execution command (non-GUI mode, writing results to a JTL file):
jmeter -n -t test-plan.jmx -l results.jtl
Workers can be provisioned using:
- AWS Auto Scaling Groups
- Google Cloud Managed Instance Groups
This enables horizontal scaling.
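As a sketch of how a controller might divide a target load across workers: the function below splits a total request rate evenly and builds the non-GUI command each worker would run. The `target_rps` property name is hypothetical; a real test plan would need to read whatever property you pass via `-J`.

```python
# Sketch: split a target load evenly across JMeter workers and build
# the non-GUI command each worker would run. The property name
# "target_rps" is hypothetical and must match what the test plan reads.

def split_load(target_rps: int, workers: int) -> list[int]:
    """Divide target_rps across workers, spreading any remainder."""
    base, extra = divmod(target_rps, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]

def jmeter_command(worker_rps: int, plan: str = "test-plan.jmx") -> str:
    # -n: non-GUI mode, -t: test plan, -l: results file,
    # -J: pass a user-defined property into the test plan.
    return f"jmeter -n -t {plan} -l results.jtl -Jtarget_rps={worker_rps}"

shares = split_load(100_000, 20)          # 20 workers sharing 100k rps
commands = [jmeter_command(rps) for rps in shares]
```

The controller would then ship one command to each worker (via SSH, a startup script, or an orchestration tool), so scaling out is just a matter of raising the worker count.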
Example load generation capacity:
| Workers | Estimated Throughput |
|---|---|
| 5 | ~20k requests/sec |
| 20 | ~100k requests/sec |
| 100 | 500k+ requests/sec |
The exact numbers depend on:
- Request complexity
- Instance type
- Network overhead
Why Centralized Logging Is Critical
When running distributed performance tests, every request generates metrics such as:
- Request latency
- Response code
- Endpoint
- Timestamp
During a large test run, this can easily produce hundreds of millions of records.
If workers store logs locally:
- Disk I/O slows down the test
- Results become fragmented
- Analysis becomes difficult
Instead, workers should stream metrics to a centralized logging pipeline.
Aggregating Logs with Scribe
A common approach is to use Scribe, a distributed log aggregation system designed for high-throughput environments.
Each JMeter worker emits structured events like this:
{
  "timestamp": "2026-03-01T12:10:22",
  "test_run_id": "benchmark_test_1",
  "worker_id": "worker-08",
  "endpoint": "/api/resource",
  "latency_ms": 118,
  "status_code": 200
}
Scribe collects logs from all workers and forwards them to the storage layer.
Benefits of this architecture:
- Decouples load generation from storage
- Avoids disk bottlenecks on worker nodes
- Enables centralized analysis pipelines
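A minimal sketch of how a worker might construct such an event before handing it to a Scribe client. The Scribe transport itself (typically a Thrift client writing to a log category) is assumed and omitted here; the field names follow the example event above.

```python
import json
import time

# Sketch: build the structured benchmark event a worker would emit.
# The Scribe transport (a Thrift client writing to a category) is
# assumed and not shown.

def make_event(test_run_id: str, worker_id: str, endpoint: str,
               latency_ms: int, status_code: int) -> str:
    """Serialize one request measurement as a JSON log line."""
    event = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()),
        "test_run_id": test_run_id,
        "worker_id": worker_id,
        "endpoint": endpoint,
        "latency_ms": latency_ms,
        "status_code": status_code,
    }
    return json.dumps(event)

line = make_event("benchmark_test_1", "worker-08", "/api/resource", 118, 200)
```

Emitting one self-describing JSON line per request keeps the downstream pipeline schema-agnostic: any collector that can forward lines can carry these events.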
Storing Benchmark Results in TimescaleDB
Performance test data is fundamentally time-series data.
Each request produces a timestamped measurement. Over a 10-minute test, a distributed system may generate hundreds of millions of events.
A traditional relational database can store this data, but time-series workloads benefit from specialized optimizations.
TimescaleDB is a PostgreSQL extension designed for exactly this use case.
Key features include:
Time-Partitioned Storage
TimescaleDB stores data in hypertables, automatically partitioned by time.
This allows efficient storage and querying of massive datasets.
High Write Throughput
Batch inserts enable TimescaleDB to ingest large volumes of benchmark data without becoming a bottleneck.
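The batching idea can be sketched independently of the database driver. The helper below groups a stream of events (tuples of timestamp, test_run_id, worker_id, endpoint, latency_ms, status_code) into fixed-size batches; each batch would then become a single multi-row INSERT, for example via psycopg2. The batch size of 5000 is an illustrative default, not a recommendation.

```python
from typing import Iterable, Iterator

# Sketch: group incoming benchmark events into fixed-size batches so
# they can be written with one multi-row INSERT per batch instead of
# one INSERT per event. The insert itself (e.g. via psycopg2) is omitted.

def batches(events: Iterable[tuple], size: int = 5000) -> Iterator[list]:
    """Yield lists of up to `size` events from a (possibly endless) stream."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Each yielded batch maps to one statement:
#   INSERT INTO perf_results VALUES (...), (...), ...
```

Because the generator never holds more than one batch in memory, it works equally well for a bounded result file or a continuous stream from the logging layer.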
Built-in Time-Series Analytics
TimescaleDB supports powerful analytical queries such as:
- Latency percentiles
- Time-bucket aggregations
- Rolling averages
These operations are critical when analyzing system performance.
Example Schema
A simple schema for storing benchmark results might look like this:
CREATE TABLE perf_results (
    timestamp   TIMESTAMPTZ NOT NULL,
    test_run_id TEXT,
    worker_id   TEXT,
    endpoint    TEXT,
    latency_ms  INT,
    status_code INT
);
Convert the table into a hypertable:
SELECT create_hypertable('perf_results', 'timestamp');
This enables TimescaleDB’s automatic partitioning and performance optimizations.
Example Performance Queries
Once results are stored, engineers can analyze system behavior using SQL.
P95 latency
SELECT percentile_cont(0.95)
WITHIN GROUP (ORDER BY latency_ms)
FROM perf_results
WHERE test_run_id='benchmark_42';
Error rate
SELECT
  COUNT(*) FILTER (WHERE status_code >= 500) * 1.0 / COUNT(*)
FROM perf_results
WHERE test_run_id='benchmark_42';
Requests per minute
SELECT
time_bucket('1 minute', timestamp) AS minute,
COUNT(*) AS requests
FROM perf_results
GROUP BY minute
ORDER BY minute;
These queries allow teams to quickly evaluate system performance under heavy load.
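As a quick sanity check on query output, the P95 calculation can also be reproduced client-side. The sketch below mirrors the linear interpolation that percentile_cont performs between the two nearest ranked values; it is a cross-check, not a replacement for the SQL.

```python
import math

# Sketch: client-side equivalent of SQL percentile_cont, using the
# same linear interpolation between the two nearest ranked values.

def percentile_cont(values: list, p: float) -> float:
    """Continuous percentile of `values` at fraction p (0 <= p <= 1)."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p           # fractional rank
    lo = math.floor(h)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])

latencies = [100, 110, 120, 180, 400]
p95 = percentile_cont(latencies, 0.95)   # interpolates near the tail
```

Running this over a sampled slice of perf_results is a cheap way to confirm that an aggregation pipeline has not silently dropped or double-counted events.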
Visualization and Monitoring
Once performance data is stored in TimescaleDB, it can be visualized using tools such as:
- Grafana
- Superset
- Metabase
Typical dashboards include:
- requests per second
- P50 / P95 / P99 latency
- error rate
- latency distribution over time
This turns load testing into a repeatable and observable engineering workflow rather than a one-off experiment.
Design Considerations When Building Benchmarking Systems
Several design principles emerge consistently when building large-scale benchmarking environments:
1. Load generation must scale independently of analytics
The infrastructure generating traffic should remain isolated from the systems used to store and analyze performance data. Coupling these layers can introduce measurement bias.
2. Benchmarking systems should simulate realistic production conditions
This includes:
- distributed traffic sources
- realistic request concurrency
- network variability
A centralized test client rarely captures these dynamics.
3. Observability is as important as traffic generation
Without structured logging and time-series analytics, identifying bottlenecks becomes extremely difficult in high-throughput environments.
These considerations significantly improve the reliability of performance benchmarks.
Example Deployment
A typical deployment on AWS might include:
| Component | Instance |
|---|---|
| Controller | t3.large |
| JMeter workers | 50 × c6i.large |
| Scribe collectors | m6i.large |
| TimescaleDB | self-managed PostgreSQL (the TimescaleDB extension is not available on RDS) |
This setup can generate hundreds of thousands of requests per second, depending on the workload.
Lessons Learned
Building a distributed performance benchmarking system highlights several important principles.
Treat benchmarking infrastructure like production infrastructure
Load testing systems must be designed with the same scalability and reliability considerations as production systems.
Decouple load generation and analytics
Separating workers from the analytics pipeline prevents measurement bias and improves reliability.
Store performance metrics as time-series data
Time-series databases such as TimescaleDB make large-scale performance analysis significantly easier.
Final Thoughts
As distributed systems continue to grow in scale, benchmarking infrastructure must evolve alongside them. Treating performance testing as a distributed systems problem — rather than a simple testing task — enables engineers to simulate real production workloads and identify bottlenecks earlier in the development lifecycle.
References
- Apache Software Foundation. Apache JMeter User Manual. https://jmeter.apache.org/usermanual/
- Apache Software Foundation. Remote Testing in JMeter. https://jmeter.apache.org/usermanual/remote-test.html
- Facebook. Scribe (open-source log aggregation server). https://github.com/facebookarchive/scribe
- Timescale Inc. TimescaleDB Documentation. https://docs.timescale.com/
- Kleppmann, M. Designing Data-Intensive Applications. O’Reilly Media, 2017.