(developer talk, assuming familiarity with embedded development tools)
It may be possible to do unobtrusive statistical real-time profiling without a debugger: program a timer to fire at random intervals, peek at the interrupted program counter on the stack in the timer interrupt, and build a histogram of the relevant program counter values, say, in SDRAM. Mapping the program counter values back to the C++ code may give a distorted view due to compiler optimizations (inlining and instruction reordering blur which source line an address belongs to), but I can imagine that this approach would be useful.
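To make the idea a bit more concrete, here is a rough, untested-on-hardware sketch of the bookkeeping side: a histogram that bins sampled program counter values, and a small pseudo-random generator for the next timer interval. Everything named here (`PcHistogram`, `IntervalJitter`, the base address, the bin size and count) is a hypothetical choice for illustration, not existing firmware code. Reading the program counter itself is left out, since it depends on the core and on which stack was active.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative assumptions, not taken from any existing firmware:
constexpr std::uint32_t kCodeBase = 0x08000000u; // assumed start of the code region (typical STM32 flash base)
constexpr std::uint32_t kBinShift = 4;           // 2^4 = 16 bytes of code per histogram bin
constexpr std::size_t   kNumBins  = 1024;        // 1024 bins * 16 bytes = 16 KiB of code covered

// Histogram of sampled program counter values.
// On the target, the bins array would be placed in SDRAM.
struct PcHistogram {
    std::array<std::uint32_t, kNumBins> bins{};
    std::uint32_t out_of_range = 0; // samples that fall outside the covered region

    // Meant to be called from the timer interrupt with the interrupted PC.
    // On a Cortex-M core the interrupted PC is word 6 of the exception
    // stack frame (r0, r1, r2, r3, r12, lr, pc, xpsr).
    void record(std::uint32_t pc) {
        if (pc < kCodeBase) { ++out_of_range; return; }
        const std::uint32_t index = (pc - kCodeBase) >> kBinShift;
        if (index >= kNumBins) { ++out_of_range; return; }
        ++bins[index];
    }
};

// Picks the next timer period in ticks. Randomizing the period avoids
// sampling in lockstep with periodic code such as an audio callback.
// Assumes min_ticks <= max_ticks and that (max_ticks - min_ticks + 1)
// does not overflow 32 bits.
class IntervalJitter {
public:
    IntervalJitter(std::uint32_t min_ticks, std::uint32_t max_ticks, std::uint32_t seed)
        : min_ticks_(min_ticks),
          span_(max_ticks - min_ticks + 1),
          state_(seed != 0 ? seed : 1u) {} // xorshift state must never be zero

    std::uint32_t next() {
        // xorshift32: cheap enough to run inside an interrupt handler
        state_ ^= state_ << 13;
        state_ ^= state_ >> 17;
        state_ ^= state_ << 5;
        return min_ticks_ + state_ % span_;
    }

private:
    std::uint32_t min_ticks_;
    std::uint32_t span_;
    std::uint32_t state_;
};
```

On the target, the timer interrupt handler would call `record()` with the stacked program counter and then reload the timer with `next()`. After a run, the bins could be dumped and matched against the linker map or fed to `addr2line` to see which functions collect the most samples.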
Instrumenting functions to count function calls would hurt performance a lot, and the results would be boring for the DSP code.
Here is a tutorial about profiling, but I have not tried it myself.
On the RTOS side, ChibiOS can be configured to keep per-thread statistics. After recompiling and flashing the firmware, those could be read out using an ST-Link debugger, or self-reported by an object similar to the