From 529cab9d121af219d6ff889a33270c93729dcfde Mon Sep 17 00:00:00 2001
From: Marco Cadetg
Date: Sat, 11 Oct 2025 16:40:47 +0200
Subject: [PATCH] fix: eliminate excessive procfs scanning causing high CPU usage (#45)

The procfs-based process lookup was triggering a full scan on every cache
miss instead of relying on periodic refresh. This caused 50+ full procfs
scans per enrichment cycle when multiple connections lacked process info.

Changed get_process_for_connection() to do simple cache lookups only.
Periodic refresh (every 5s) is already handled by the enrichment thread.

Also added PROFILING.md with a flamegraph profiling guide.
---
 PROFILING.md                  | 180 ++++++++++++++++++++++++++++++++++
 README.md                     |   1 +
 src/network/platform/linux.rs |  23 ++---
 3 files changed, 187 insertions(+), 17 deletions(-)
 create mode 100644 PROFILING.md

diff --git a/PROFILING.md b/PROFILING.md
new file mode 100644
index 0000000..d4483b7
--- /dev/null
+++ b/PROFILING.md
@@ -0,0 +1,180 @@
+# RustNet Performance Profiling Guide
+
+This guide explains how to profile RustNet to identify performance bottlenecks.
+
+## Quick Start
+
+### CPU Profiling with perf + flamegraph
+
+The easiest way to profile CPU usage on Linux:
+
+```bash
+# 1. Install flamegraph tools
+cargo install flamegraph
+
+# 2. Build a release binary with debug symbols
+# IMPORTANT: Debug symbols are required for meaningful flamegraphs!
+CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release --features linux-default
+
+# Or add this to Cargo.toml temporarily:
+# [profile.release]
+# debug = true
+
+# 3. Run with profiling (requires sudo for perf)
+# Note: Use the full path to flamegraph since sudo doesn't have your user's PATH
+# IMPORTANT: Use -- before the command to profile
+sudo -E ~/.cargo/bin/flamegraph -- ./target/release/rustnet
+
+# Or specify the interface and other args after the binary
+sudo -E ~/.cargo/bin/flamegraph -- ./target/release/rustnet -i eth0
+
+# Alternatively, preserve PATH for cleaner commands:
+sudo env "PATH=$PATH" flamegraph -- ./target/release/rustnet
+
+# 4. Open the generated flamegraph.svg in a browser
+firefox flamegraph.svg
+```
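+
+Before profiling, it can be worth confirming that debug info actually made it into the binary. A quick sanity check (assuming GNU binutils is installed; exact output varies by toolchain):
+
+```bash
+# Should list .debug_* sections; if nothing prints, rebuild with debug symbols
+objdump -h target/release/rustnet | grep -i debug
+
+# "file" also reports whether the binary carries debug_info and is stripped
+file target/release/rustnet
+```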
+
+### Alternative: Using perf directly
+
+If you prefer to use `perf` directly:
+
+```bash
+# Build with debug symbols
+CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release --features linux-default
+
+# Record performance data (run for 30-60 seconds, then Ctrl+C to stop)
+sudo perf record -F 99 -g ./target/release/rustnet -i eth0
+
+# Generate a flamegraph (requires the FlameGraph scripts)
+# Install from: https://github.com/brendangregg/FlameGraph
+sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
+
+# Or view in perf's TUI
+sudo perf report
+```
+
+### Profiling a Running Instance
+
+If RustNet is already running:
+
+```bash
+# Find the PID
+ps aux | grep rustnet
+
+# Profile the running process for ~60 seconds, then Ctrl+C to stop
+sudo -E ~/.cargo/bin/flamegraph -p <PID> --output rustnet-live.svg
+
+# Or with perf directly
+sudo perf record -F 99 -g -p <PID> sleep 60
+sudo perf report
+```
+
+## Interpreting Flamegraphs
+
+Look for:
+- **Wide bars at the bottom**: Functions that consume a lot of total CPU time
+- **Tall stacks**: Deep call chains (potential optimization targets)
+- **Hot spots**: Functions with many samples (bright colors in some viewers)
+
+Common hot spots:
+- `packet_parser::parse_packet`: Normal - this is the core packet processing
+- `DashMap::iter` or `iter_mut`: If this is a large portion, consider reducing iteration frequency
+- `clone`: If excessive, reduce unnecessary cloning
+- System calls (`read`, `write`, `ioctl`): Filesystem or network I/O overhead
+
+## Benchmarking
+
+For consistent benchmarks:
+
+```bash
+# Run with consistent traffic
+sudo ./target/release/rustnet --interface eth0 &
+PID=$!
+
+# Monitor CPU usage
+top -p $PID
+
+# Or use perf stat for detailed metrics
+sudo perf stat -p $PID sleep 60
+
+# Stop the application
+sudo kill $PID
+```
+
+## Performance Regression Testing
+
+After making changes, compare before/after:
+
+```bash
+# Baseline (before changes)
+sudo perf stat -r 3 timeout 60s ./target/release/rustnet-before > /dev/null
+
+# After changes
+sudo perf stat -r 3 timeout 60s ./target/release/rustnet > /dev/null
+```
+
+Key metrics to compare:
+- CPU cycles
+- Instructions per cycle (IPC)
+- Cache misses
+- Context switches
+
+## Troubleshooting Flamegraphs
+
+### Empty or Single-Entry Flamegraph
+
+If your flamegraph only shows "rustnet (100%)" with no details:
+
+**Problem**: Debug symbols are missing from the release build.
+
+**Solution**:
+```bash
+# Rebuild with debug symbols
+CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release --features linux-default
+
+# Or add to Cargo.toml:
+# [profile.release]
+# debug = true
+
+# Then re-profile
+sudo -E ~/.cargo/bin/flamegraph -- ./target/release/rustnet
+```
+
+### Flamegraph Shows Only Kernel Functions
+
+**Problem**: Running with insufficient permissions, or perf can't access user-space symbols.
+
+**Solution**:
+```bash
+# Check the perf_event_paranoid setting
+cat /proc/sys/kernel/perf_event_paranoid
+
+# If it's > 1, temporarily lower it (requires root):
+sudo sysctl kernel.perf_event_paranoid=1
+
+# Or run as root
+sudo -E ~/.cargo/bin/flamegraph -- ./target/release/rustnet
+```
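+
+To confirm that user-space symbols resolve after adjusting permissions, a quick spot check on the recorded data can help (assuming a `perf.data` file from one of the perf-based workflows above):
+
+```bash
+# rustnet functions should now appear alongside kernel symbols near the top
+sudo perf report --stdio | head -n 40
+```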
+
+### Very Short Flamegraph (< 1000 samples)
+
+**Problem**: The profiling session was too short, so not enough data was collected.
+
+**Solution**:
+```bash
+# Let rustnet run for at least 30-60 seconds before stopping
+# The more network traffic, the better the profile
+
+# For longer profiling, put timeout inside the profiled command so the
+# flamegraph is still generated when rustnet exits:
+sudo -E ~/.cargo/bin/flamegraph -- timeout 60 ./target/release/rustnet
+```
+
+## Debugging Slow TUI
+
+If the TUI feels sluggish:
+
+1. **Check the refresh rate**: The default is 1000ms and can be adjusted with `--refresh-interval`
+2. **Check the connection count**: High connection counts increase sorting overhead
+3. **Profile the UI loop**: Look for hot spots in `run_ui_loop`, `draw`, or `sort_connections`
+4. **Monitor thread contention**: Check whether packet processing threads are blocking the snapshot provider
diff --git a/README.md b/README.md
index 6d8ec6f..36e9b41 100644
--- a/README.md
+++ b/README.md
@@ -195,6 +195,7 @@ See [USAGE.md](USAGE.md) for complete timeout details.
 - **[INSTALL.md](INSTALL.md)** - Detailed installation instructions for all platforms, permission setup, and troubleshooting
 - **[USAGE.md](USAGE.md)** - Complete usage guide including command-line options, filtering, sorting, and logging
 - **[ARCHITECTURE.md](ARCHITECTURE.md)** - Technical architecture, platform implementations, and performance details
+- **[PROFILING.md](PROFILING.md)** - Performance profiling guide with flamegraph setup and optimization tips
 - **[ROADMAP.md](ROADMAP.md)** - Planned features and future improvements
 - **[RELEASE.md](RELEASE.md)** - Release process for maintainers
 - **[EBPF_BUILD.md](EBPF_BUILD.md)** - eBPF build instructions and requirements
diff --git a/src/network/platform/linux.rs b/src/network/platform/linux.rs
index 2b1aac5..95ae9ae 100644
--- a/src/network/platform/linux.rs
+++ b/src/network/platform/linux.rs
@@ -196,23 +196,12 @@ impl ProcessLookup for LinuxProcessLookup {
     fn get_process_for_connection(&self, conn: &Connection) -> Option<(u32, String)> {
         let key = ConnectionKey::from_connection(conn);
 
-        // Try cache first
-        {
-            let cache = self.cache.read().unwrap();
-            if cache.last_refresh.elapsed() < Duration::from_secs(2)
-                && let Some(process_info) = cache.lookup.get(&key)
-            {
-                return Some(process_info.clone());
-            }
-        }
-
-        // Cache is stale or miss, refresh
-        if self.refresh().is_ok() {
-            let cache = self.cache.read().unwrap();
-            cache.lookup.get(&key).cloned()
-        } else {
-            None
-        }
+        // Simple cache lookup with no refresh on cache miss.
+        // The enrichment thread (app.rs:495-500) handles periodic refresh every 5 seconds.
+        // IMPORTANT: Do NOT refresh here as it caused high CPU usage when called for every
+        // connection without process info (flamegraph showed this was the main bottleneck).
+        let cache = self.cache.read().unwrap();
+        cache.lookup.get(&key).cloned()
     }
 
     fn refresh(&self) -> Result<()> {