fix: eliminate excessive procfs scanning causing high CPU usage (#45)

The procfs-based process lookup triggered a full scan on every cache
miss, and whenever the cache was older than 2 seconds, instead of relying
on the periodic refresh. This caused 50+ full procfs scans per enrichment
cycle when multiple connections lacked process info.

Changed get_process_for_connection() to do simple cache lookups only.
Periodic refresh (every 5s) is already handled by the enrichment thread.

Also added PROFILING.md with flamegraph profiling guide.
Marco Cadetg
2025-10-11 16:40:47 +02:00
committed by GitHub
parent 0d55a86605
commit 529cab9d12
3 changed files with 187 additions and 17 deletions

PROFILING.md Normal file

@@ -0,0 +1,180 @@
# RustNet Performance Profiling Guide
This guide explains how to profile RustNet to identify performance bottlenecks.
## Quick Start
### CPU Profiling with perf + flamegraph
The easiest way to profile CPU usage on Linux:
```bash
# 1. Install flamegraph tools
cargo install flamegraph
# 2. Build a release binary with debug symbols
# IMPORTANT: Debug symbols are required for meaningful flamegraphs!
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release --features linux-default
# Or add this to Cargo.toml temporarily:
# [profile.release]
# debug = true
# 3. Run with profiling (requires sudo for perf)
# Note: Use full path to flamegraph since sudo doesn't have your user's PATH
# IMPORTANT: Use -- before the command to profile
sudo -E ~/.cargo/bin/flamegraph -- ./target/release/rustnet
# Or specify interface and other args after the binary
sudo -E ~/.cargo/bin/flamegraph -- ./target/release/rustnet -i eth0
# Alternatively, preserve PATH for cleaner commands:
sudo env "PATH=$PATH" flamegraph -- ./target/release/rustnet
# 4. Open the generated flamegraph.svg in a browser
firefox flamegraph.svg
```
### Alternative: Using perf directly
If you prefer to use `perf` directly:
```bash
# Build with debug symbols
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release --features linux-default
# Record performance data (run for 30-60 seconds, then Ctrl+C to stop)
sudo perf record -F 99 -g ./target/release/rustnet -i eth0
# Generate flamegraph (requires the FlameGraph scripts on your PATH)
# Install from: https://github.com/brendangregg/FlameGraph
# perf.data was written by root, so perf script needs sudo as well
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
# Or view in perf's TUI
sudo perf report
```
### Profiling a Running Instance
If RustNet is already running:
```bash
# Find the PID
ps aux | grep rustnet
# Profile the running process (press Ctrl+C to stop and write the SVG)
sudo -E ~/.cargo/bin/flamegraph -p <PID> --output rustnet-live.svg
# Or with perf directly, sampling for 60 seconds
sudo perf record -F 99 -g -p <PID> -- sleep 60
sudo perf report
```
## Interpreting Flamegraphs
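Look for:
- **Wide bars**: A frame's width is proportional to the number of samples it appears in, so wide frames account for the most total CPU time
- **Tall stacks**: Deep call chains; height alone does not indicate cost, so only treat them as optimization targets when they are also wide
- **Hot spots**: Wide frames at the top of a stack, where the CPU was actually executing (colors are not meaningful in the default palette)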
Common hot spots:
- `packet_parser::parse_packet`: Normal - this is the core packet processing
- `DashMap::iter` or `iter_mut`: If this is a large portion, consider reducing iteration frequency
- `clone`: If excessive, reduce unnecessary cloning
- System calls (`read`, `write`, `ioctl`): Filesystem or network I/O overhead
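To make "width equals samples" concrete, here is a minimal sketch that ranks functions by self time from folded stacks, the `frame1;frame2;... count` lines that `stackcollapse-perf.pl` emits. The helper names `self_samples` and `hottest` are made up for this illustration and are not part of RustNet or the FlameGraph tools.

```rust
use std::collections::HashMap;

/// Sum sample counts per leaf frame from folded-stack lines
/// (`frame1;frame2;... count`). Lines that do not parse are skipped.
fn self_samples(folded: &str) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for line in folded.lines() {
        // The sample count is the last space-separated field on the line.
        let Some((stack, count)) = line.trim().rsplit_once(' ') else {
            continue;
        };
        let Ok(count) = count.parse::<u64>() else {
            continue;
        };
        // The rightmost frame is the function that was on-CPU.
        let leaf = stack.rsplit(';').next().unwrap_or(stack);
        *totals.entry(leaf.to_string()).or_insert(0) += count;
    }
    totals
}

/// Return (function, samples) pairs, largest sample count first.
fn hottest(folded: &str) -> Vec<(String, u64)> {
    let mut rows: Vec<_> = self_samples(folded).into_iter().collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    rows
}
```

The first entries returned by `hottest` correspond to the widest frames along the top edge of the flamegraph, which is usually where to start looking.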
## Benchmarking
For consistent benchmarks:
```bash
# Run with consistent traffic
sudo ./target/release/rustnet --interface eth0 &
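# $! is the PID of sudo, which is usually not the rustnet process itself,
# so look up rustnet by name once it has started
sleep 1
PID=$(pgrep -n -x rustnet)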
# Monitor CPU usage
top -p $PID
# Or use perf stat for detailed metrics
sudo perf stat -p $PID sleep 60
# Stop the application
sudo kill $PID
```
## Performance Regression Testing
After making changes, compare before/after:
```bash
# Baseline (before changes)
sudo perf stat -r 3 timeout 60s ./target/release/rustnet-before > /dev/null
# After changes
sudo perf stat -r 3 timeout 60s ./target/release/rustnet > /dev/null
```
Key metrics to compare:
- CPU cycles
- Instructions per cycle (IPC)
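- Cache misses (not in the default event set; request them with `-e cycles,instructions,cache-misses,context-switches`)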
- Context switches
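As a worked example of the comparison, the sketch below computes IPC and the relative change between two runs. The `PerfCounters` struct and `percent_change` helper are hypothetical; the counter values would be copied by hand from the two `perf stat` reports.

```rust
/// Counter values read off one `perf stat` report (hypothetical struct).
struct PerfCounters {
    cycles: u64,
    instructions: u64,
}

impl PerfCounters {
    /// Instructions per cycle; a higher value generally means fewer stalls.
    fn ipc(&self) -> f64 {
        self.instructions as f64 / self.cycles as f64
    }
}

/// Relative change from `before` to `after` in percent (negative = reduction).
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

For instance, going from 4,000,000,000 to 1,000,000,000 cycles is a change of -75%. Compare IPC alongside cycles: a drop in cycles with a stable IPC suggests less work is being done, rather than the same work running more efficiently.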
## Troubleshooting Flamegraphs
### Empty or Single-Entry Flamegraph
If your flamegraph only shows "rustnet (100%)" with no details:
**Problem**: Debug symbols are missing from the release build.
**Solution**:
```bash
# Rebuild with debug symbols
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release --features linux-default
# Or add to Cargo.toml:
# [profile.release]
# debug = true
# Then re-profile
sudo -E ~/.cargo/bin/flamegraph -- ./target/release/rustnet
```
### Flamegraph Shows Only Kernel Functions
**Problem**: Running with insufficient permissions or perf can't access user-space symbols.
**Solution**:
```bash
# Check perf_event_paranoid setting
cat /proc/sys/kernel/perf_event_paranoid
# If it's > 1, temporarily lower it (requires root):
sudo sysctl kernel.perf_event_paranoid=1
# Or run as root
sudo -E ~/.cargo/bin/flamegraph -- ./target/release/rustnet
```
### Very Short Flamegraph (< 1000 samples)
**Problem**: Profiling session too short, not enough data collected.
**Solution**:
```bash
# Let rustnet run for at least 30-60 seconds before stopping
# The more network traffic, the better the profile
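# For a fixed-length run, put timeout inside the profiled command so that
# flamegraph can still write the SVG after rustnet exits:
sudo -E ~/.cargo/bin/flamegraph -- timeout 60 ./target/release/rustnet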
```
## Debugging Slow TUI
If the TUI feels sluggish:
1. **Check refresh rate**: Default is 1000ms, can be adjusted with `--refresh-interval`
2. **Check connection count**: High connection counts increase sorting overhead
3. **Profile the UI loop**: Look for hot spots in `run_ui_loop`, `draw`, or `sort_connections`
4. **Monitor thread contention**: Check if packet processing threads are blocking the snapshot provider
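One way to check for contention (item 4) without a profiler is to time how long a lock acquisition blocks. This is a minimal sketch using only the standard library; `timed_read` and `contended_wait` are hypothetical helpers, and the `RwLock<u64>` merely stands in for whatever shared state the snapshot provider reads.

```rust
use std::sync::{mpsc, Arc, RwLock, RwLockReadGuard};
use std::thread;
use std::time::{Duration, Instant};

/// Acquire a read lock and report how long the caller waited for it.
fn timed_read<T>(lock: &RwLock<T>) -> (RwLockReadGuard<'_, T>, Duration) {
    let start = Instant::now();
    let guard = lock.read().unwrap();
    (guard, start.elapsed())
}

/// Demo: a writer holds the lock for `hold`; returns how long a reader waited.
fn contended_wait(hold: Duration) -> Duration {
    let lock = Arc::new(RwLock::new(0u64));
    let (ready_tx, ready_rx) = mpsc::channel();

    let writer_lock = Arc::clone(&lock);
    let writer = thread::spawn(move || {
        let mut guard = writer_lock.write().unwrap();
        ready_tx.send(()).unwrap(); // signal: the write lock is now held
        thread::sleep(hold);
        *guard += 1;
    });

    ready_rx.recv().unwrap(); // wait until the writer owns the lock
    let (guard, waited) = timed_read(&*lock);
    assert_eq!(*guard, 1); // the reader only got in after the writer finished
    drop(guard);
    writer.join().unwrap();
    waited
}
```

Logging the returned duration whenever it exceeds a few milliseconds would show whether the UI thread is being held up by writers, which a CPU flamegraph does not reveal because a blocked thread produces no samples.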


@@ -195,6 +195,7 @@ See [USAGE.md](USAGE.md) for complete timeout details.
- **[INSTALL.md](INSTALL.md)** - Detailed installation instructions for all platforms, permission setup, and troubleshooting
- **[USAGE.md](USAGE.md)** - Complete usage guide including command-line options, filtering, sorting, and logging
- **[ARCHITECTURE.md](ARCHITECTURE.md)** - Technical architecture, platform implementations, and performance details
- **[PROFILING.md](PROFILING.md)** - Performance profiling guide with flamegraph setup and optimization tips
- **[ROADMAP.md](ROADMAP.md)** - Planned features and future improvements
- **[RELEASE.md](RELEASE.md)** - Release process for maintainers
- **[EBPF_BUILD.md](EBPF_BUILD.md)** - eBPF build instructions and requirements


@@ -196,23 +196,12 @@ impl ProcessLookup for LinuxProcessLookup {
     fn get_process_for_connection(&self, conn: &Connection) -> Option<(u32, String)> {
         let key = ConnectionKey::from_connection(conn);
-        // Try cache first
-        {
-            let cache = self.cache.read().unwrap();
-            if cache.last_refresh.elapsed() < Duration::from_secs(2)
-                && let Some(process_info) = cache.lookup.get(&key)
-            {
-                return Some(process_info.clone());
-            }
-        }
-        // Cache is stale or miss, refresh
-        if self.refresh().is_ok() {
-            let cache = self.cache.read().unwrap();
-            cache.lookup.get(&key).cloned()
-        } else {
-            None
-        }
+        // Simple cache lookup with no refresh on cache miss.
+        // The enrichment thread (app.rs:495-500) handles periodic refresh every 5 seconds.
+        // IMPORTANT: Do NOT refresh here as it caused high CPU usage when called for every
+        // connection without process info (flamegraph showed this was the main bottleneck).
+        let cache = self.cache.read().unwrap();
+        cache.lookup.get(&key).cloned()
     }
     fn refresh(&self) -> Result<()> {
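The pattern this change moves to, read-only lookups on the hot path and a single background thread that owns the expensive refresh, can be sketched in isolation as follows. This is not the RustNet implementation: `ProcessCache`, `spawn_refresher`, the `u16` port key, and the `scan` closure are assumptions made for the example, and shutdown handling is omitted.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the process cache: port -> (pid, process name).
#[derive(Default)]
struct ProcessCache {
    lookup: RwLock<HashMap<u16, (u32, String)>>,
}

impl ProcessCache {
    /// Hot path: read-only lookup. A miss returns None and never triggers a scan.
    fn get(&self, port: u16) -> Option<(u32, String)> {
        self.lookup.read().unwrap().get(&port).cloned()
    }

    /// Expensive path: replace the whole map with a fresh scan result.
    fn refresh(&self, scan: impl FnOnce() -> HashMap<u16, (u32, String)>) {
        let fresh = scan(); // do the slow work before taking the write lock
        *self.lookup.write().unwrap() = fresh;
    }
}

/// Spawn the only thread that calls `refresh`, once per `interval`.
fn spawn_refresher(
    cache: Arc<ProcessCache>,
    interval: Duration,
    scan: impl Fn() -> HashMap<u16, (u32, String)> + Send + 'static,
) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        cache.refresh(&scan);
        thread::sleep(interval);
    })
}
```

The trade-off is that a connection seen between two refreshes has no process info until the next one, but the number of scans is bounded by the refresh interval rather than by the number of cache misses.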