Uses PRFC0 (the KOS default timer counter) for nanosecond timing. PRFC1
is available for any of the 33 SH4 event modes (cache misses, branches
taken, FPU instructions, etc.).
Zone begin/end costs ~10ns of overhead (just a perf_cntr_timer_ns() read
plus a subtract).
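
A minimal sketch of what a zone begin/end pair might look like, assuming a zone is just a start timestamp plus an accumulator. The names prof_now_ns, prof_zone_t, prof_zone_begin and prof_zone_end are hypothetical, not the profiler's actual API. The host fallback clock exists only so the sketch builds off-target; on KOS the time source would be perf_cntr_timer_ns().

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical time source. On KOS this would simply return
 * perf_cntr_timer_ns(); the C11 fallback below only exists so the
 * sketch can be built and tried on a host machine. */
static uint64_t prof_now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical zone record: accumulated time and hit count. */
typedef struct {
    const char *name;
    uint64_t    start_ns;  /* timestamp taken at zone begin */
    uint64_t    total_ns;  /* time accumulated across all calls */
    uint32_t    calls;     /* number of completed begin/end pairs */
} prof_zone_t;

/* Begin: one timer read. */
static inline void prof_zone_begin(prof_zone_t *z) {
    z->start_ns = prof_now_ns();
}

/* End: one timer read plus a subtract, added to the running total. */
static inline void prof_zone_end(prof_zone_t *z) {
    z->total_ns += prof_now_ns() - z->start_ns;
    z->calls++;
}
```

Keeping both functions inline and branch-free is what would keep the per-zone cost down to the timer read itself.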
A rolling 60-frame window smooths spikes for a readable FPS/ms
display.
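
One way the rolling window could be implemented is a ring buffer with a running sum, so each frame costs one subtract and one add instead of re-summing 60 samples. This is a sketch under that assumption; prof_window_t and the prof_window_* functions are hypothetical names.

```c
#include <stdint.h>

#define PROF_WINDOW 60  /* frames in the rolling window */

typedef struct {
    uint64_t samples[PROF_WINDOW]; /* frame times in ns */
    uint64_t sum;                  /* running sum of stored samples */
    uint32_t head;                 /* next slot to overwrite */
    uint32_t count;                /* valid samples, up to PROF_WINDOW */
} prof_window_t;

/* Record one frame time, evicting the oldest sample once the window is full. */
static void prof_window_push(prof_window_t *w, uint64_t frame_ns) {
    if (w->count == PROF_WINDOW)
        w->sum -= w->samples[w->head];
    else
        w->count++;
    w->samples[w->head] = frame_ns;
    w->sum += frame_ns;
    w->head = (w->head + 1) % PROF_WINDOW;
}

/* Average frame time over the window, in milliseconds. */
static double prof_window_avg_ms(const prof_window_t *w) {
    return w->count ? (double)w->sum / w->count / 1e6 : 0.0;
}

/* Average frames per second derived from the windowed frame time. */
static double prof_window_fps(const prof_window_t *w) {
    double ms = prof_window_avg_ms(w);
    return ms > 0.0 ? 1000.0 / ms : 0.0;
}
```

A zero-initialized prof_window_t is a valid empty window. With 59 frames at 16 ms and a single 76 ms spike, the displayed average moves only from 16 ms to 17 ms.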
The overlay renderer uses KOS bfont_draw_str() or
Shachi’s own font system.
Dumping to /pc/ (the host PC filesystem) works via dcload, the standard
KOS workflow.
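
Because dcload exposes the host filesystem through ordinary file I/O, the dump could be plain stdio. The sketch below assumes a CSV layout and a caller-supplied path (on target, something under /pc/); prof_dump_row_t and prof_dump_csv are hypothetical names, and the column format is an illustration rather than the profiler's actual output.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-zone summary row. */
typedef struct {
    const char *name;
    uint64_t    total_ns;
    uint32_t    calls;
} prof_dump_row_t;

/* Write one CSV line per zone. Returns 0 on success, -1 on failure
 * (for example when the path cannot be opened because dcload is not
 * attached). */
static int prof_dump_csv(const char *path, const prof_dump_row_t *rows, size_t n) {
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "zone,total_ns,calls\n");
    for (size_t i = 0; i < n; i++)
        fprintf(f, "%s,%" PRIu64 ",%" PRIu32 "\n",
                rows[i].name, rows[i].total_ns, rows[i].calls);
    return fclose(f) == 0 ? 0 : -1;
}
```

Checking the return value matters on hardware: without a dcload connection the open fails, and the profiler should skip the dump rather than crash.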
#ifdef NDEBUG compiles out all profiler calls, so release builds pay
zero cost.
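
A sketch of how the compile-time stripping might be wired up, assuming the profiler is reached only through macros. PROF_BEGIN, PROF_END and the prof_begin/prof_end hooks are hypothetical names; the counters stand in for real profiler work.

```c
#include <stdint.h>

/* Hypothetical hooks; a real build would route these into the profiler. */
static uint32_t prof_begin_count, prof_end_count;
static void prof_begin(const char *name) { (void)name; prof_begin_count++; }
static void prof_end(const char *name)   { (void)name; prof_end_count++; }

#ifdef NDEBUG
/* Release: each macro expands to a no-op expression, so no call is
 * emitted and the name argument is never evaluated. */
#define PROF_BEGIN(name) ((void)0)
#define PROF_END(name)   ((void)0)
#else
/* Debug: forward to the profiler hooks. */
#define PROF_BEGIN(name) prof_begin(name)
#define PROF_END(name)   prof_end(name)
#endif
```

Expanding to ((void)0) rather than to nothing keeps the macros usable as statements in both build types, for example as the body of an unbraced if.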