This document describes the monitoring and observability capabilities built into cursus, including health check endpoints, Prometheus metrics, and structured logging.
These features enable operators to monitor broker health, track performance metrics, and diagnose issues in production environments.
cursus provides three primary observability mechanisms:
| Component | Port | Protocol | Purpose |
|---|---|---|---|
| Health Check Server | 9080 | HTTP | Load balancer health checks, liveness probes |
| Metrics Exporter | 9100 | HTTP (Prometheus) | Performance metrics, resource utilization |
| Structured Logging | N/A | stdout/stderr | Request tracing, command auditing, error diagnostics |
Note: All components can be independently configured and integrate with standard monitoring infrastructure (Prometheus, Grafana, ELK stack, etc.).
The health check system is implemented as a lightweight HTTP server running on a separate port from the main broker listener. This separation ensures health checks don’t compete with broker traffic.
startHealthCheckServer(): Initializes HTTP server

| Endpoint | Method | Response Codes | Response Body |
|---|---|---|---|
| /health | GET | 200, 503 | “OK” or “Broker not ready: Main listener not active” |
| / | GET | 200, 503 | Same as /health |
Note: Both endpoints return the same health status. The root endpoint (/) allows simple curl commands without specifying a path.
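The endpoint behaviour in the table above can be sketched with Go's standard net/http package. This is a minimal illustration, not the broker's actual code: brokerReady, healthHandler, newHealthMux and probe are hypothetical names, and the real startHealthCheckServer() may determine readiness differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// brokerReady is a hypothetical flag the broker would set once its main
// listener is accepting connections.
var brokerReady atomic.Bool

// healthHandler answers 200 "OK" when the broker is ready, 503 otherwise.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	if !brokerReady.Load() {
		http.Error(w, "Broker not ready: Main listener not active",
			http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, "OK")
}

// newHealthMux registers the same handler on /health and /.
// Note that the "/" pattern in http.ServeMux also matches any other path.
func newHealthMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", healthHandler)
	mux.HandleFunc("/", healthHandler)
	return mux
}

// probe issues an in-memory GET request and returns the status code.
func probe(path string, ready bool) int {
	brokerReady.Store(ready)
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	newHealthMux().ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(probe("/health", false)) // 503
	fmt.Println(probe("/", true))        // 200
}
```

In a real server the mux would be passed to http.ListenAndServe on the health check port rather than exercised through httptest.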
broker:
  port: 9000
  # Health check configuration
  health_check_port: 9080
  # Defaults to 9080 if not specified
The health check port is configured via the health_check_port field.

The Prometheus metrics exporter runs as a separate HTTP server (port 9100 by default), exposing metrics in the standard Prometheus text format. The exporter can be toggled via configuration.
The pkg/metrics package implementation isn’t shown in the provided files, so the metric categories below are inferred from the system design:
| Category | Metric Examples | Description |
|---|---|---|
| Topic Metrics | Topics created, deleted, active topics | Topic lifecycle tracking |
| Partition Metrics | Messages per partition, partition queue depth | Partition-level performance |
| Disk Metrics | Bytes written, flush operations, segment rotations | Storage subsystem performance |
| Consumer Metrics | Active consumers, messages delivered | Consumer tracking |
| Server Metrics | Active connections, commands processed | Server-level operations |
The TopicManager includes a metrics field [pkg/topic] for tracking deduplication and message flow.
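To make the "standard Prometheus text format" concrete, here is a small sketch of how one counter from the Server Metrics category could be rendered. It uses only the standard library. The metric name cursus_commands_processed_total, the counter type and the demo function are hypothetical; the real pkg/metrics package may well use the official Prometheus client library and different names.

```go
package main

import (
	"fmt"
	"strings"
	"sync/atomic"
)

// counter is a minimal monotonically increasing metric (illustrative only).
type counter struct {
	name, help string
	value      atomic.Uint64
}

// Inc adds one to the counter; safe for concurrent use.
func (c *counter) Inc() { c.value.Add(1) }

// render returns the counter in Prometheus text exposition format:
// a HELP line, a TYPE line, then the sample itself.
func (c *counter) render() string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", c.name, c.help)
	fmt.Fprintf(&b, "# TYPE %s counter\n", c.name)
	fmt.Fprintf(&b, "%s %d\n", c.name, c.value.Load())
	return b.String()
}

// demo increments a hypothetical counter twice and renders it.
func demo() string {
	c := &counter{
		name: "cursus_commands_processed_total", // assumed name
		help: "Commands processed by the broker.",
	}
	c.Inc()
	c.Inc()
	return c.render()
}

func main() {
	fmt.Print(demo())
}
```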
manifests/config.yaml
broker:
  enable_exporter: true
  exporter_port: 9100
prometheus:
  scrape_interval: "5s"
  scrape_timeout: "3s"
| Parameter | Type | Default | Description |
|---|---|---|---|
| enable_exporter | boolean | true | Enable/disable Prometheus exporter |
| exporter_port | integer | 9100 | Port for metrics HTTP server |
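The defaults in the table, together with the --exporter and --exporter-port flags, could be wired up with Go's standard flag package roughly as follows. This is a sketch under assumptions: exporterConfig and parseExporterFlags are hypothetical names, and how the real broker merges flag values with config.yaml is not covered here.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// exporterConfig holds the two exporter settings from the table above.
type exporterConfig struct {
	Enabled bool
	Port    int
}

// parseExporterFlags parses args, falling back to the documented defaults
// (exporter enabled, port 9100) for flags that are not supplied.
func parseExporterFlags(args []string) (exporterConfig, error) {
	var cfg exporterConfig
	fs := flag.NewFlagSet("cursus", flag.ContinueOnError)
	fs.BoolVar(&cfg.Enabled, "exporter", true, "enable the Prometheus exporter")
	fs.IntVar(&cfg.Port, "exporter-port", 9100, "port for the metrics HTTP server")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseExporterFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("exporter enabled=%v port=%d\n", cfg.Enabled, cfg.Port)
}
```

With the flag package, a boolean flag is disabled using the `--exporter=false` form; a bare `--exporter` sets it to true.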
Command-line Flags:
- --exporter: Enable exporter (boolean)
- --exporter-port: Port number (integer)

To scrape cursus metrics, configure Prometheus with:
prometheus.yml
scrape_configs:
  - job_name: 'cursus'
    scrape_interval: 5s
    scrape_timeout: 3s
    static_configs:
      - targets: ['localhost:9100']
For Docker deployments, use the service name:
static_configs:
  - targets: ['cursus:9100']
cursus implements structured logging throughout the codebase using Go’s standard log package with prefixed message categories. All logs are written to stdout/stderr for compatibility with container log aggregation systems.
Request Logging
[REQ] [192.168.1.10:52341] Received request. Topic: 'orders', Payload: 'CREATE orders 4'
Command Result Logging
[CMD] SUCCESS | Command: [CREATE orders 4] | Response: ✅ Topic 'orders' now has 4 partitions
[CMD] FAILURE | Command: [DELETE nonexistent] | Response: ERROR: topic 'nonexistent' not found
Streaming Logging
[STREAM] Completed streaming 342 messages for command [CONSUME orders 0 0]
Error Logging
[CONSUME_ERR] Error streaming data for command [CONSUME orders 0 0]: failed to read messages from disk
Operational Logging
📈 Prometheus exporter started on port 9100
🧩 Broker listening on :9000 (TLS=false, Gzip=false)
🩺 Health check endpoint started on port 9080
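The line formats above could be produced with the standard log package along these lines. This is a minimal sketch for the [CMD] category only; logCommandResult and render are hypothetical helpers, and the logger is created with flags set to 0 so that no timestamp is prepended (the real broker may keep the default timestamp).

```go
package main

import (
	"bytes"
	"log"
	"os"
)

// logCommandResult writes one [CMD] line in the format shown above.
func logCommandResult(l *log.Logger, ok bool, cmd, resp string) {
	status := "FAILURE"
	if ok {
		status = "SUCCESS"
	}
	l.Printf("[CMD] %s | Command: [%s] | Response: %s", status, cmd, resp)
}

// render captures a single [CMD] line in memory instead of writing to stdout.
func render(ok bool, cmd, resp string) string {
	var buf bytes.Buffer
	logCommandResult(log.New(&buf, "", 0), ok, cmd, resp)
	return buf.String()
}

func main() {
	// Flags of 0 disable the default date/time prefix of the log package.
	l := log.New(os.Stdout, "", 0)
	logCommandResult(l, true, "CREATE orders 4", "Topic 'orders' now has 4 partitions")
	logCommandResult(l, false, "DELETE nonexistent", "ERROR: topic 'nonexistent' not found")
}
```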
Log Categories Detail
| Prefix | Source | Level | Description |
|---|---|---|---|
| [REQ] | HandleConnection | INFO | Incoming request metadata |
| [INPUT_WARN] | HandleConnection | WARN | Malformed input |
| [CMD] | CommandHandler | INFO | Command execution results (SUCCESS/FAILURE) |
| [STREAM] | HandleConsumeCommand | INFO | Message streaming completion status |
| [CONSUME_ERR] | HandleConsumeCommand | ERROR | Errors during message consumption |
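Because the level is implied by the prefix rather than written into the line, a log pipeline has to map one to the other. The sketch below shows one way to do that using the table above. classify and the levels map are hypothetical names, and the code assumes each line starts directly with the prefix (no timestamp in front).

```go
package main

import (
	"fmt"
	"strings"
)

// levels maps each documented prefix to the level listed in the table.
var levels = map[string]string{
	"[REQ]":         "INFO",
	"[INPUT_WARN]":  "WARN",
	"[CMD]":         "INFO",
	"[STREAM]":      "INFO",
	"[CONSUME_ERR]": "ERROR",
}

// classify extracts the leading "[...]" prefix of a log line and looks up
// its level. ok is false for lines without a documented prefix, such as
// the operational startup messages.
func classify(line string) (prefix, level string, ok bool) {
	if !strings.HasPrefix(line, "[") {
		return "", "", false
	}
	end := strings.Index(line, "]")
	if end < 0 {
		return "", "", false
	}
	prefix = line[:end+1]
	level, ok = levels[prefix]
	return prefix, level, ok
}

func main() {
	line := "[CONSUME_ERR] Error streaming data for command [CONSUME orders 0 0]: failed to read messages from disk"
	prefix, level, ok := classify(line)
	fmt.Println(prefix, level, ok) // [CONSUME_ERR] ERROR true
}
```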
All connection-related logs include the remote client address for request tracing:
clientAddr := conn.RemoteAddr().String()
log.Printf("[%s] Received request. Topic: '%s', Payload: '%s'",
	clientAddr, topicName, payload)
This enables:
Complete Configuration Example
manifests/config.yaml
broker:
  # Network Ports
  port: 9000                    # Main broker TCP port
  health_check_port: 9080       # Health check HTTP port (optional, defaults to 9080)
  # Observability
  enable_exporter: true         # Enable Prometheus metrics
  exporter_port: 9100           # Prometheus exporter port
  # System Configuration
  cleanup_interval: 60          # Deduplication cleanup interval (seconds)
  use_tls: false                # TLS encryption
  enable_gzip: false            # Message compression
  # Performance Tuning
  channel_buffer_size: 10000    # Partition channel buffer
  disk_flush_batch_size: 500    # Messages per disk flush
  linger_ms: 100                # Max wait before flush (ms)
# Prometheus Scrape Configuration
prometheus:
  scrape_interval: "5s"
  scrape_timeout: "3s"
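Since the broker, health check and exporter each need their own port, a startup check on the port settings can catch misconfiguration early. The following is only a sketch of such a check: BrokerConfig, applyDefaults and validate are hypothetical names, only the port-related fields are modelled, and YAML loading is omitted because it would need a third-party parser.

```go
package main

import "fmt"

// BrokerConfig models the port-related fields of the broker section.
type BrokerConfig struct {
	Port            int
	HealthCheckPort int
	EnableExporter  bool
	ExporterPort    int
}

// applyDefaults fills in the documented defaults for unset (zero) ports.
func (c *BrokerConfig) applyDefaults() {
	if c.HealthCheckPort == 0 {
		c.HealthCheckPort = 9080
	}
	if c.ExporterPort == 0 {
		c.ExporterPort = 9100
	}
}

// validate rejects out-of-range ports and two services sharing one port.
// The exporter port is only checked when the exporter is enabled.
func (c *BrokerConfig) validate() error {
	ports := map[string]int{
		"port":              c.Port,
		"health_check_port": c.HealthCheckPort,
	}
	if c.EnableExporter {
		ports["exporter_port"] = c.ExporterPort
	}
	seen := map[int]string{}
	for name, p := range ports {
		if p < 1 || p > 65535 {
			return fmt.Errorf("%s: port %d out of range", name, p)
		}
		if other, dup := seen[p]; dup {
			return fmt.Errorf("%s and %s both use port %d", other, name, p)
		}
		seen[p] = name
	}
	return nil
}

func main() {
	cfg := BrokerConfig{Port: 9000, EnableExporter: true}
	cfg.applyDefaults()
	fmt.Println(cfg.HealthCheckPort, cfg.ExporterPort, cfg.validate())
}
```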
Priority Order (highest to lowest):
manifests/docker-compose.yml
services:
  broker:
    ports:
      - "9000:9000"   # Main broker port
      - "9100:9100"   # Metrics exporter
      - "9080:9080"   # Health check
    environment:
      - CONFIG_PATH=/root/config.yaml
    volumes:
      - ./config.yaml:/root/config.yaml
All three ports must be exposed for full observability: 9000 (main broker), 9100 (metrics exporter), and 9080 (health check).
Load Balancer Configuration:
Configure load balancers to use the health endpoint for backend health checks:
Health Check URL: http://broker:9080/health
Expected Status: 200
Check Interval: 10s
Timeout: 3s
Unhealthy Threshold: 3
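The effect of the unhealthy threshold can be illustrated with a small state tracker: a backend is taken out of rotation only after three consecutive failed checks, and any 200 response resets the count. This is a simplified model with hypothetical names (tracker, observe, healthyAfter); real load balancers typically also apply a separate healthy threshold before re-admitting a backend.

```go
package main

import "fmt"

// tracker counts consecutive failed health checks for one backend.
type tracker struct {
	threshold int // consecutive failures before the backend is unhealthy
	failures  int
}

// observe records one probe result (an HTTP status code) and reports
// whether the backend is still considered healthy afterwards.
func (t *tracker) observe(status int) bool {
	if status == 200 {
		t.failures = 0
	} else {
		t.failures++
	}
	return t.failures < t.threshold
}

// healthyAfter replays a sequence of probe results and returns the final state.
func healthyAfter(threshold int, statuses []int) bool {
	t := &tracker{threshold: threshold}
	healthy := true
	for _, s := range statuses {
		healthy = t.observe(s)
	}
	return healthy
}

func main() {
	fmt.Println(healthyAfter(3, []int{503, 503}))           // true
	fmt.Println(healthyAfter(3, []int{503, 503, 503}))      // false
	fmt.Println(healthyAfter(3, []int{503, 503, 200, 503})) // true
}
```

With a 10s check interval, three consecutive failures mean a failed broker is detected roughly 30 seconds after it stops responding.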
Kubernetes Liveness Probe:
livenessProbe:
  httpGet:
    path: /health
    port: 9080
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
Recommended Prometheus Alerts:
| Alert | Condition | Severity | Description |
|---|---|---|---|
| BrokerDown | up{job="cursus"} == 0 | Critical | Broker unreachable |
| HighDiskUsage | Disk segment count growing rapidly | Warning | May need compaction |
| ConsumerLag | Consumer offset far behind latest | Warning | Processing bottleneck |
| HighErrorRate | Command failure rate > 5% | Warning | Application issues |
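The HighErrorRate condition is stated in words because the exporter's metric names are not documented here. As a stopgap, the same ratio can be approximated from the [CMD] log lines. The sketch below assumes the SUCCESS/FAILURE log format shown in the logging section; failureRate and exceedsThreshold are hypothetical helpers, not part of cursus.

```go
package main

import (
	"fmt"
	"strings"
)

// failureRate returns the fraction of [CMD] lines that report FAILURE.
// Lines from other categories are ignored; with no [CMD] lines it returns 0.
func failureRate(lines []string) float64 {
	var total, failed int
	for _, line := range lines {
		switch {
		case strings.HasPrefix(line, "[CMD] SUCCESS"):
			total++
		case strings.HasPrefix(line, "[CMD] FAILURE"):
			total++
			failed++
		}
	}
	if total == 0 {
		return 0
	}
	return float64(failed) / float64(total)
}

// exceedsThreshold applies the "> 5%" condition from the alert table.
func exceedsThreshold(lines []string) bool {
	return failureRate(lines) > 0.05
}

func main() {
	lines := []string{
		"[CMD] SUCCESS | Command: [CREATE orders 4] | Response: ok",
		"[CMD] FAILURE | Command: [DELETE nonexistent] | Response: ERROR: topic 'nonexistent' not found",
		"[STREAM] Completed streaming 342 messages for command [CONSUME orders 0 0]",
	}
	fmt.Println(failureRate(lines), exceedsThreshold(lines))
}
```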
Recommended Setup:
docker logs -f cursus for development
filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*.log'
processors:
  - add_docker_metadata: ~
  - decode_json_fields:
      fields: ["message"]
      target: "json"
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "cursus-logs-%{+yyyy.MM.dd}"
Symptom: Health check returns 503
Diagnosis:
curl http://localhost:9080/health
# Response: "Broker not ready: Main listener not active"
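Since the 503 body points at the main listener, a direct TCP probe of the broker port helps separate "broker not listening" from "health server misreporting". The sketch below is one way to do that from Go; portOpen and selfTest are hypothetical helpers, and localhost:9000 assumes the default broker port from the configuration example.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// portOpen reports whether a TCP connection to addr succeeds within timeout.
func portOpen(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// selfTest exercises portOpen against a throwaway local listener:
// the first result should be true while the listener is up, the second
// true once the listener has been closed and the port refuses connections.
func selfTest() (openWhileListening, closedAfterShutdown bool) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false
	}
	addr := ln.Addr().String()
	openWhileListening = portOpen(addr, time.Second)
	ln.Close()
	closedAfterShutdown = !portOpen(addr, time.Second)
	return openWhileListening, closedAfterShutdown
}

func main() {
	fmt.Println("broker listener active:", portOpen("localhost:9000", 2*time.Second))
}
```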
Possible Causes:
Resolution:
netstat -an | grep 9000

Symptom: Prometheus cannot scrape metrics
Diagnosis:
curl http://localhost:9100/metrics
# Connection refused or timeout
Possible Causes:
Resolution:
docker ps
📈 Prometheus exporter started

Symptom: Expected log messages not appearing
Possible Causes:
Resolution:
# View container logs directly
docker logs -f cursus
# Check Docker logging driver
docker inspect cursus | grep LogConfig
# Verify log output
docker exec cursus ls -la /proc/1/fd/1
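cursus provides comprehensive observability through three independent systems: a health check server (port 9080), a Prometheus metrics exporter (port 9100), and structured logging to stdout/stderr.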
All three systems are production-ready, configurable, and integrate with standard monitoring infrastructure. The separation of concerns (different ports, different protocols) ensures that monitoring operations don’t interfere with broker functionality.