This document explains how to use the built-in benchmarking tool to measure cursus's throughput, latency, and system behavior under load. The benchmark tool simulates realistic workloads with configurable numbers of concurrent producers and consumers, partitions, and message counts.
For information about performance tuning configuration parameters that affect benchmark results, see Performance Tuning.
The cursus benchmarking system consists of three primary components:
The benchmark tool executes a three-phase workflow: topic creation, concurrent message production, and consumption.
The benchmark tool is built as part of the standard build process:
make build
This compiles the benchmark binary to bin/bench. Alternatively, build only the benchmark tool:
go build -o bin/bench cmd/bench/main.go
Before running benchmarks, ensure the broker server is running:
./bin/cursus
To run the broker in the background instead:
./bin/cursus &
Wait for the broker to become ready (health check on port 9080):
curl http://localhost:9080/health
Execute the benchmark with default parameters:
./bin/bench
Broker Address: localhost:9000
Topic Name: bench-topic
Partitions: 12
Producers: 12
Consumers: 12
Messages per Producer: 100
The benchmark tool accepts the following flags:
| Flag | Type | Default | Description |
|---|---|---|---|
| -addr | string | localhost:9000 | Broker TCP address and port |
| -topic | string | bench-topic | Topic name for benchmark |
| -partitions | int | 12 | Number of partitions to create |
| -producers | int | 12 | Number of concurrent producers |
| -consumers | int | 12 | Number of concurrent consumers |
| -messages | int | 100 | Messages published per producer |
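The flag table maps naturally onto Go's standard flag package. The following is a minimal sketch of how such flags could be declared, not the tool's actual source: the benchConfig struct and the parseBenchFlags function are hypothetical names, while the flag names, defaults, and descriptions are taken from the table.

```go
package main

import (
	"flag"
	"fmt"
)

// benchConfig is a hypothetical container for the flags listed in the table.
type benchConfig struct {
	Addr       string
	Topic      string
	Partitions int
	Producers  int
	Consumers  int
	Messages   int
}

// parseBenchFlags parses args (without the program name) using the
// documented flag names and defaults.
func parseBenchFlags(args []string) (benchConfig, error) {
	var cfg benchConfig
	fs := flag.NewFlagSet("bench", flag.ContinueOnError)
	fs.StringVar(&cfg.Addr, "addr", "localhost:9000", "Broker TCP address and port")
	fs.StringVar(&cfg.Topic, "topic", "bench-topic", "Topic name for benchmark")
	fs.IntVar(&cfg.Partitions, "partitions", 12, "Number of partitions to create")
	fs.IntVar(&cfg.Producers, "producers", 12, "Number of concurrent producers")
	fs.IntVar(&cfg.Consumers, "consumers", 12, "Number of concurrent consumers")
	fs.IntVar(&cfg.Messages, "messages", 100, "Messages published per producer")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseBenchFlags([]string{"-producers", "50", "-messages", "1000"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	// Flags that were not passed keep their documented defaults.
	fmt.Printf("%+v\n", cfg)
}
```

Using a dedicated FlagSet rather than the global flag set keeps the parser easy to exercise with arbitrary argument lists.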
Test with high message volume and many producers:
./bin/bench -producers 50 -consumers 50 -messages 1000 -partitions 24
This produces 50,000 total messages (50 producers × 1000 messages each) distributed across 24 partitions.
Test serial performance:
./bin/bench -producers 1 -consumers 1 -messages 10000 -partitions 1
Benchmark publishing throughput without consumption overhead:
./bin/bench -producers 10 -consumers 0 -messages 5000
The RunTopicCreationPhase() method establishes a TCP connection and sends a CREATE command:
CREATE <topic> <partitions>
For example:
CREATE bench-topic 12
The method handles idempotent topic creation: if the topic already exists, the benchmark continues without error.
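As a small illustration of the command format, the sketch below builds the CREATE line from a topic name and partition count. The formatCreateCommand helper is a hypothetical name, and the sketch covers only the command text, not how it is framed or sent over the TCP connection.

```go
package main

import "fmt"

// formatCreateCommand (hypothetical helper) renders the documented
// "CREATE <topic> <partitions>" command text.
func formatCreateCommand(topic string, partitions int) string {
	return fmt.Sprintf("CREATE %s %d", topic, partitions)
}

func main() {
	fmt.Println(formatCreateCommand("bench-topic", 12)) // prints "CREATE bench-topic 12"
}
```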
The RunConcurrentProducerPhase() spawns NumProducers goroutines, each calling RunMessageProductionPhase().
Each producer:
Sends its messages via sendMessagesToPartition()
Uses the payload format bench-msg-P{producerID}-Part{partitionID}-Msg{msgIndex}
Protocol: Messages are encoded using util.EncodeMessage() and sent with length prefixes via util.WriteWithLength().
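The payload naming scheme can be sketched as a single format call. The benchPayload function name is hypothetical; only the bench-msg-P{producerID}-Part{partitionID}-Msg{msgIndex} pattern comes from the description above.

```go
package main

import "fmt"

// benchPayload (hypothetical helper) builds a payload that identifies the
// producer, partition, and message index it belongs to.
func benchPayload(producerID, partitionID, msgIndex int) string {
	return fmt.Sprintf("bench-msg-P%d-Part%d-Msg%d", producerID, partitionID, msgIndex)
}

func main() {
	fmt.Println(benchPayload(3, 7, 42)) // prints "bench-msg-P3-Part7-Msg42"
}
```

Embedding all three identifiers in the payload makes it possible to trace any consumed message back to the goroutine that produced it.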
The RunConsumerPhase() spawns one goroutine per partition via consumeMessagesFromPartition().
Each consumer:
Reads messages with util.ReadWithLength()
Uses a timeout of AckTimeout * 2 (10 seconds) for read operations to accommodate high-throughput scenarios.
A run prints output similar to the following:
Initializing Topic 'bench-topic' with 12 partitions...
Topic 'bench-topic' already exists, continuing...
Starting Producer Phase (12 Producers, 1200 Total Messages)
Producer Phase Finished in 2.345s
Starting Consumer Phase (12 Consumers)
Consumer0 finished reading 100/100 messages.
Consumer1 finished reading 100/100 messages.
...
Consumer Phase Finished in 1.234s
🧪 BENCHMARK RESULT [disk] 🧪
-------------------------------------
Topic : bench-topic
Partitions : 12
Producers : 12
Consumers : 12
Total Messages : 1200
Producer Duration : 2.345s
Consumer Duration : 1.234s
Total Duration (P+C) : 3.579s
Throughput (Combined) : 335.28 msg/sec
-------------------------------------
| Metric | Description | Calculation |
|---|---|---|
| Producer Duration | Time to publish all messages | time.Since(producerStart) |
| Consumer Duration | Time to consume all messages from disk | time.Since(consumerStart) |
| Total Duration (P+C) | Sum of producer and consumer phases | producerDuration + consumerDuration |
| Throughput (Combined) | Messages per second (end-to-end) | totalMessages / totalDuration.Seconds() |
| Throughput (Produce) | Publishing rate (producer-only mode) | totalMessages / producerDuration.Seconds() |
Important: Throughput includes both publish and consume operations. For producer-only benchmarks (-consumers 0), only "Throughput (Produce)" is displayed.
The repository includes a GitHub Actions workflow that runs benchmarks automatically on every push to main and on pull requests.
Workflow file: .github/workflows/benchmark.yml
The CI workflow waits up to 30 seconds for the broker to become ready by polling the health endpoint:
for i in {1..30}; do
  if curl -f http://localhost:9080/health 2>/dev/null; then
    echo "Broker server ready."
    break
  fi
  if [ "$i" -eq 30 ]; then
    echo "Broker failed to start within 30 seconds"
    exit 1
  fi
  sleep 1
done
This ensures the broker is fully operational before benchmark execution begins.
The Makefile provides a convenience target:
make bench
This internally invokes the benchmark binary with default parameters. To customize benchmark parameters in CI, modify the bench target in the Makefile or pass environment variables.
Each producer operates independently in its own goroutine. Within each producer, partition sends are parallelized:
BenchmarkRunner
├── Producer Goroutine 0
│   ├── Partition 0 Goroutine
│   ├── Partition 1 Goroutine
│   └── Partition N Goroutine
├── Producer Goroutine 1
│   ├── Partition 0 Goroutine
│   ├── Partition 1 Goroutine
│   └── Partition N Goroutine
└── ...
Message Distribution: Messages are distributed evenly across partitions using integer division:
msgsPerPartition = NumMessages / Partitions
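The remaining NumMessages % Partitions messages are distributed to the first partitions. For example, with 100 messages and 12 partitions, the base count is 8 and 4 messages are left over.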
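The distribution rule can be sketched as follows. The messagesPerPartition function is a hypothetical name, and the sketch assumes each of the first remainder partitions receives exactly one extra message.

```go
package main

import "fmt"

// messagesPerPartition gives every partition the integer quotient and hands
// one extra message to each of the first (numMessages % partitions) partitions.
func messagesPerPartition(numMessages, partitions int) []int {
	if partitions <= 0 {
		return nil
	}
	base := numMessages / partitions
	remainder := numMessages % partitions
	counts := make([]int, partitions)
	for p := range counts {
		counts[p] = base
		if p < remainder {
			counts[p]++ // assumption: one extra message per leading partition
		}
	}
	return counts
}

func main() {
	// 100 / 12 = 8 with remainder 4, so the first four partitions get 9.
	fmt.Println(messagesPerPartition(100, 12)) // prints "[9 9 9 9 8 8 8 8 8 8 8 8]"
}
```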
Each partition is consumed by a dedicated goroutine. Consumers are assigned to partitions using modulo arithmetic:
consumerID = partitionID % NumConsumers
This ensures partition messages are consumed in order while distributing load across consumers.
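The modulo assignment can be illustrated by grouping partitions under the consumer ID they map to. The assignPartitions function is a hypothetical helper; only the formula consumerID = partitionID % NumConsumers comes from the text.

```go
package main

import "fmt"

// assignPartitions maps each consumer ID to the partitions it is labeled
// with, using consumerID = partitionID % numConsumers.
func assignPartitions(partitions, numConsumers int) map[int][]int {
	assignment := make(map[int][]int)
	if numConsumers <= 0 {
		return assignment
	}
	for p := 0; p < partitions; p++ {
		c := p % numConsumers
		assignment[c] = append(assignment[c], p)
	}
	return assignment
}

func main() {
	assignment := assignPartitions(12, 5)
	for c := 0; c < 5; c++ {
		fmt.Printf("consumer %d -> partitions %v\n", c, assignment[c])
	}
}
```

With 12 partitions and 5 consumers, consumers 0 and 1 each cover three partitions and the rest cover two, so the load is balanced to within one partition.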
The benchmark client uses a configurable timeout for all network operations:
AckTimeout = 5 * time.Second
Read operations use twice this value (AckTimeout * 2 = 10 seconds).
If operations exceed the timeout, the benchmark fails with a descriptive error message indicating which producer/consumer encountered the timeout.
Producer errors are collected using a mutex-protected slice. If any producer fails, the benchmark reports:
| Aspect | Behavior |
|---|---|
| Error Collection | Producer errors stored in mutex-protected slice |
| Reported Metrics | Total number of failed producers, first error encountered |
| Benchmark Behavior | Terminates immediately if any producer phase errors occur |
Consumer errors are printed to stdout but do not terminate the benchmark.
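Each consumer reports how many messages it read against the expected count, for example: Consumer0 finished reading 100/100 messages.
This allows partial results even if some consumers encounter issues.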
All benchmark messages use the standard cursus protocol:
util.EncodeMessage(topic, payload) creates a topic-prefixed message
server.CompressMessage() compresses the payload
util.WriteWithLength() sends a 4-byte length prefix followed by message data
The consumer phase sends commands in this format:
CONSUME <topic> <partition> <offset>
For example, CONSUME bench-topic 0 0 instructs the broker to stream messages from partition 0 starting at offset 0.