Prompt
When collecting performance numbers, ensure that (1) the benchmark parameters match the target hardware and (2) the execution environment is stable and isolated. Otherwise the results can be misleading.
Apply this:
- Use hardware-appropriate CPU threading: don’t oversubscribe a small CPU with too many threads.
- For accelerator/GPU runs, reduce sources of contention by using a conservative CPU thread count (e.g., single-thread or a value aligned to physical cores) and keep the GPU as uncontended as possible.
- If results are unstable across runs, treat it as a performance “bug” in your test setup: check whether other processes are using the GPU, stop/avoid them, and retest until the output stabilizes before updating documentation.
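
One way to make "retest until the output stabilizes" concrete is to compare the spread of repeated timings against a tolerance. The following is a minimal sketch, not part of any benchmark tool: the function names and the 5% default tolerance are assumptions chosen for illustration, and the timings are whatever per-run numbers you recorded by hand or parsed yourself.

```python
import statistics


def relative_spread(samples):
    """Return (max - min) / median for a list of positive timings."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to judge stability")
    med = statistics.median(samples)
    if med <= 0:
        raise ValueError("timings must be positive")
    return (max(samples) - min(samples)) / med


def is_stable(samples, tolerance=0.05):
    """True when repeated runs agree within `tolerance` (assumed default: 5%)."""
    return relative_spread(samples) <= tolerance
```

If `is_stable` returns False, that is a cue to look for contention (for example other processes on the GPU) and rerun, rather than to average the noisy numbers into the documentation.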
Example (command-line style):
- CPU thread sizing:
- Prefer something aligned to the machine, e.g. ./benchncnn <loop> 1 ... or ./benchncnn <loop> 8 ... for an 8-core CPU, rather than a larger thread count.
- Retest after isolating the GPU:
- If you suspect other GPU activity, rerun the exact same command after eliminating contention, then record only the stable run results.
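
The thread-sizing advice above can be sketched as a small helper that caps a requested thread count at what the machine offers before building the command line. This is an illustration under assumptions: `clamp_threads` and `bench_command` are hypothetical names, the first two positional arguments are taken to be loop count and thread count as in the example above, and any remaining arguments are passed through untouched. Note that `os.cpu_count()` reports logical CPUs, which can exceed the physical core count, so pass `available` explicitly when you know the physical figure.

```python
import os


def clamp_threads(requested, available=None):
    """Cap the requested thread count to the CPUs available (at least 1)."""
    if available is None:
        # Logical CPU count; may be higher than the number of physical cores.
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


def bench_command(loop_count, requested_threads, available=None, extra_args=()):
    """Build an argument list: ./benchncnn <loop> <threads> [extra args...]."""
    threads = clamp_threads(requested_threads, available)
    return ["./benchncnn", str(loop_count), str(threads), *extra_args]
```

For example, `bench_command(4, 16, available=8)` yields a command that runs with 8 threads instead of oversubscribing an 8-core CPU with 16.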