how to capture the thread switch events log
echo 1 >/sys/kernel/debug/tracing/events/sched/sched_switch/enable
echo 1 >/sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
echo 1 >/sys/kernel/debug/tracing/tracing_on
reproduce the issue
echo 0 >/sys/kernel/debug/tracing/tracing_on
cat /sys/kernel/debug/tracing/trace
.use ftrace during boot phase
https://cloud.tencent.com/developer/article/1797239
. ftrace Event Tracing
https://www.kernel.org/doc/html/v5.4/trace/events.html
改变ftrace buffer size
比如将每个cpu的ftrace buffer size设为2M,如下cmd,这个cmd是将所有的cpu的trace buffer size改为2M:
echo 2048 >/sys/kernel/debug/tracing/buffer_size_kb
可以cat buffer_total_size_kb确定所有cpu的trace buffer size总和,以及查看per cpu trace buffer size,设置per_cpu目录下的buffer_size_kb将只改变这个cpu的buffer size,不影响其它cpu的,这与tracing目录下的buffer_size_kb不同:
/sys/kernel/debug/tracing/buffer_total_size_kb
/sys/kernel/debug/tracing/per_cpu/cpu0/buffer_size_kb
kmem:rss_stat traceevent usage
console:/sys/kernel/debug/tracing/events/kmem/rss_stat # ls enable filter format id trigger
console:/sys/kernel/debug/tracing/events/kmem/rss_stat # cat format name: rss_stat ID: 416 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned int mm_id; offset:8; size:4; signed:0; field:unsigned int curr; offset:12; size:4; signed:0; field:int member; offset:16; size:4; signed:1; field:long size; offset:24; size:8; signed:1; print fmt: "mm_id=%u curr=%d member=%d size=%ldB", REC->mm_id, REC->curr, REC->member, REC->size
上述format表示此trace event log输出的格式,带common前缀的表示所有trace event都会有,后面的才是此trace event独有的field
console:/sys/kernel/debug/tracing # echo 0 >tracing_on console:/sys/kernel/debug/tracing # echo kmem:rss_stat >set_event console:/sys/kernel/debug/tracing # echo >trace console:/sys/kernel/debug/tracing # echo 1 >tracing_on
trace一段时间后,关掉trace,cat trace或者trace_pipe(这两者类似,都是将trace buffer的数据打印出来)
console:/sys/kernel/debug/tracing # cat trace # tracer: nop # # entries-in-buffer/entries-written: 10/10 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | sh-3371 [003] ...1 123.119251: rss_stat: mm_id=763553689 curr=0 member=1 size=528384B cat-4845 [001] .... 123.119670: rss_stat: mm_id=763553689 curr=1 member=0 size=905216B cat-4845 [001] ...1 123.120010: rss_stat: mm_id=763553689 curr=0 member=1 size=430080B cat-4845 [001] ...1 123.120065: rss_stat: mm_id=763553689 curr=0 member=0 size=126976B cat-4845 [001] .... 123.121636: rss_stat: mm_id=2475105935 curr=1 member=0 size=757760B cat-4845 [001] ...1 123.125860: rss_stat: mm_id=2475105935 curr=1 member=0 size=1052672B cat-4845 [001] .... 123.132250: rss_stat: mm_id=2475105935 curr=1 member=0 size=2228224B cat-4845 [001] ...1 123.138207: rss_stat: mm_id=2475105935 curr=1 member=0 size=2088960B cat-4845 [001] ...1 123.138856: rss_stat: mm_id=2475105935 curr=1 member=0 size=2293760B cat-4845 [001] ...1 123.138861: rss_stat: mm_id=2475105935 curr=1 member=1 size=524288B
如果想看rss_stat event trace的callstack,可以执行如下cmd,它会将走到此event trace的callstack给记录下来:
echo stacktrace >events/kmem/rss_stat/trigger
这样,cat trace的结果如下:
console:/sys/kernel/debug/tracing # cat trace # tracer: nop # # entries-in-buffer/entries-written: 20/20 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | sh-3371 [000] ...1 343.463160: rss_stat: mm_id=3042163371 curr=0 member=1 size=528384B sh-3371 [000] ...2 343.463170: <stack trace> => _do_fork => __arm64_sys_clone => el0_svc_common => el0_svc_handler => el0_svc cat-4848 [000] .... 343.463674: rss_stat: mm_id=3042163371 curr=1 member=0 size=905216B cat-4848 [000] ...1 343.463679: <stack trace> => do_page_fault => do_mem_abort => el0_da cat-4848 [000] ...1 343.463981: rss_stat: mm_id=3042163371 curr=0 member=1 size=430080B cat-4848 [000] ...2 343.463984: <stack trace> => mmput => flush_old_exec => load_elf_binary => search_binary_handler => __do_execve_file => __arm64_sys_execve => el0_svc_common => el0_svc_handler => el0_svc cat-4848 [000] ...1 343.464036: rss_stat: mm_id=3042163371 curr=0 member=0 size=126976B cat-4848 [000] ...2 343.464038: <stack trace> => mmput => flush_old_exec => load_elf_binary => search_binary_handler => __do_execve_file => __arm64_sys_execve => el0_svc_common => el0_svc_handler => el0_svc cat-4848 [000] .... 343.466181: rss_stat: mm_id=1288803295 curr=1 member=0 size=790528B cat-4848 [000] ...1 343.466190: <stack trace> => do_page_fault => do_translation_fault => do_mem_abort => el0_da cat-4848 [000] ...1 343.470308: rss_stat: mm_id=1288803295 curr=1 member=0 size=1064960B cat-4848 [000] ...2 343.470318: <stack trace> => unmap_region => do_munmap => __arm64_sys_munmap => el0_svc_common => el0_svc_handler => el0_svc cat-4848 [000] .... 343.476642: rss_stat: mm_id=1288803295 curr=1 member=0 size=2236416B cat-4848 [000] ...1 343.476651: <stack trace> => do_page_fault => do_mem_abort => el0_da cat-4848 [000] ...1 343.483058: rss_stat: mm_id=1288803295 curr=1 member=0 size=2088960B cat-4848 [000] ...2 343.483067: <stack trace> => do_munmap => __arm64_sys_munmap => el0_svc_common => el0_svc_handler => el0_svc cat-4848 [000] ...1 343.483656: rss_stat: mm_id=1288803295 curr=1 member=0 size=2355200B cat-4848 [000] ...2 343.483660: <stack trace> => unmap_region => do_munmap => __arm64_sys_munmap => el0_svc_common => el0_svc_handler => el0_svc cat-4848 [000] ...1 343.484383: rss_stat: mm_id=1288803295 curr=1 member=1 size=643072B cat-4848 [000] ...2 343.484387: <stack trace> => unmap_region => do_munmap => __arm64_sys_munmap => el0_svc_common => el0_svc_handler => el0_svc
如果想看某一个进程的event trace,可以将目标进程的pid echo到/sys/kernel/debug/tracing/set_event_pid。
在tracing目录下还有一个set_ftrace_pid文件,如果是要trace event,是将pid echo进set_event_pid,而不是set_ftrace_pid,如果是echo进set_ftrace_pid,将不会起作用,还是会trace所有的进程的event。
trace options
console:/sys/kernel/debug/tracing # cat trace_options print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock trace_printk annotate nouserstacktrace nosym-userobj noprintk-msg-only context-info nolatency-format record-cmd norecord-tgid overwrite nodisable_on_free irq-info markers noevent-fork function-trace nofunction-fork nodisplay-graph nostacktrace notest_nop_accept notest_nop_refuse
比如function tracer默认会将pid打印出来,而function_graph默认则不会,执行如下命令使其将pid打印出来:
echo funcgraph-proc > trace_options
function/function_graph tracer
cd /sys/kernel/debug/tracing #set tracer: echo function >current_tracer #func tracer echo function_graph >current_tracer #func graph tracer #set interesting function: echo target_function >set_ftrace_filter #func tracer echo target_function >set_graph_function #func graph tracer #filter the un-wanted function: echo target_function >set_ftrace_notrace #func tracer echo target_function >set_graph_notrace #func graph tracer echo 1 >tracing_on #test job echo 0 >tracing_on cat trace
ftrace usage reference
https://www.kernel.org/doc/html/v5.4/trace/events.html