This document describes the platform-specific optimizations implemented in cursus’s disk persistence layer. The system uses Go’s build tag mechanism to provide optimized implementations for Linux systems while maintaining compatibility with Windows through standard I/O operations.
For general information about the disk persistence system architecture, see Disk Persistence System. For segment rotation and management, see Segment Management.
cursus uses build tags (//go:build) to compile different implementations based on the target operating system. This allows the system to leverage platform-specific system calls on Linux while providing a fallback implementation for Windows.
The DiskHandler struct is defined in common code, but the methods openSegment() and SendCurrentSegmentToConn() have platform-specific implementations that are selected at compile time.
When opening a segment file on Linux, cursus provides a hint to the kernel about the expected access pattern using unix.Fadvise() with the FADV_SEQUENTIAL flag.
The fadvise call provides performance benefits:
| Optimization | Description | Benefit |
|---|---|---|
| Aggressive Read-Ahead | Kernel pre-fetches more data from disk | Reduces read latency for sequential scans |
| Reduced Page Cache Priority | Pages are evicted sooner after use | Frees memory for more valuable cached data |
| I/O Scheduling Hint | Kernel can optimize disk I/O scheduling | Better disk throughput for sequential workloads |
The signature used is:
unix.Fadvise(int(f.Fd()), 0, 0, unix.FADV_SEQUENTIAL)
Where:
f.Fd()unix.FADV_SEQUENTIAL (sequential access pattern)The most significant optimization on Linux is the use of the sendfile() system call to transfer segment data directly from the file descriptor to a network socket without copying data through userspace.
The code first checks if the connection is a *net.TCPConn. If not, it falls back to io.Copy:
sysConn, ok := conn.(*net.TCPConn)
if !ok {
if _, err := d.file.Seek(0, 0); err != nil {
return err
}
_, err := io.Copy(conn, d.file)
return err
}
For TCP connections, the code obtains the underlying system socket:
rawConn, err := sysConn.SyscallConn()
Using rawConn.Control(), the code executes the sendfile system call with direct access to the file descriptor:
The sendfile loop implementation:
| Variable | Type | Purpose |
|---|---|---|
| inFd | int | File descriptor of source segment file |
| outFd | int | File descriptor of destination TCP socket |
| offset | *int64 | Current position in source file (updated by kernel) |
| size | int64 | Total bytes to transfer |
| n | int | Bytes transferred in this call |
The loop continues until offset >= size or an error occurs. The offset pointer is automatically updated by the kernel after each successful transfer.
| Aspect | Linux Implementation | Windows Implementation |
|---|---|---|
| System Call | unix.Sendfile() |
Standard io.Copy() |
| Data Path | Kernel-to-kernel transfer | Kernel → User → Kernel copies |
| Buffer Usage | No userspace buffering needed | Data copied through userspace buffer |
| CPU Overhead | Minimal (DMA transfer) | Higher (multiple memory copies) |
| Advisory Hints | fadvise(SEQUENTIAL) | None available |
| Code Complexity | More complex (raw socket handling) | Simpler (standard library) |
The Windows implementation:
if _, err := d.file.Seek(0, 0); err != nil {
return err
}
_, err := io.Copy(conn, d.file)
return err
This performs a standard buffered copy operation that reads data from the file into a userspace buffer and then writes it to the network socket.
The platform-specific optimizations are most beneficial in these scenarios:
| Scenario | Impact | Reason |
|---|---|---|
| Large segment transfers | High | sendfile eliminates multiple GB copies |
| High consumer count | High | Each consumer streams segments; CPU savings multiply |
| Memory-constrained systems | Medium | Reduced userspace buffer requirements |
| Network-saturated workloads | Medium | Lower CPU overhead allows higher network utilization |
| Small message workloads | Low | System call overhead may dominate |
For most use cases, the batching and asynchronous write path (covered in DiskHandler and Write Path) has a larger performance impact than the platform-specific optimizations.
However, for high-throughput segment streaming to consumers, the Linux optimizations provide measurable benefits.
The DiskManager orchestrates handler creation but does not directly interact with platform-specific code. It simply calls methods on DiskHandler that resolve to platform-specific implementations at compile time.
The GetHandler method creates handlers without knowledge of platform-specific implementations:
func (dm *DiskManager) GetHandler(topic string, partitionID int) (*DiskHandler, error) {
dm.mu.Lock()
defer dm.mu.Unlock()
key := fmt.Sprintf("%s_%d", topic, partitionID)
if dh, ok := dm.handlers[key]; ok {
return dh, nil
}
// Creates handler; platform-specific methods linked at compile time
dh, err := NewDiskHandler(dm.cfg, topic, partitionID, segmentSize)
if err != nil {
return nil, err
}
dm.handlers[key] = dh
return dh, nil
}
cursus’s platform-specific optimizations leverage Linux kernel features for high-performance segment transfers while maintaining Windows compatibility through standard I/O operations:
sendfile enables zero-copy file-to-socket transfersSyscallConn() for direct system call invocationos.OpenFile and bufio.Writerio.Copy for segment transfersThe build tag system ensures zero runtime overhead from platform detection—each platform compiles only its specific implementation.