cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

###############
Timerlat tracer
###############

The timerlat tracer helps preemptive kernel developers find the sources
of wakeup latencies of real-time threads. Like cyclictest, the tracer
sets a periodic timer that wakes up a thread. The thread then computes
a *wakeup latency* value as the difference between the *current time*
and the *absolute time* at which the timer was set to expire. The main
goal of timerlat is to trace in a way that helps kernel developers.
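
As a rough user-space illustration of that computation (not the tracer's
in-kernel implementation), the Python sketch below arms a periodic absolute
deadline, sleeps until it, and records the difference between the current
time and the deadline. The function name measure_wakeup_latencies and its
parameters are hypothetical and exist only for this example.

```python
import time


def measure_wakeup_latencies(period_us=1000, count=5):
    """Return wakeup latencies in ns for `count` periodic activations.

    Hypothetical user-space analogue of the timerlat thread loop; the
    real tracer uses an in-kernel timer and a per-cpu kernel thread.
    """
    period_ns = period_us * 1000
    latencies = []
    # Absolute time at which the first "timer" is set to expire.
    deadline = time.monotonic_ns() + period_ns
    for _ in range(count):
        remaining = deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        # Guard against an early return so the latency is never negative.
        now = time.monotonic_ns()
        while now < deadline:
            now = time.monotonic_ns()
        # Wakeup latency: current time minus absolute expiration time.
        latencies.append(now - deadline)
        # Re-arm relative to the previous deadline, not to "now", so the
        # period does not drift with the measured latency.
        deadline += period_ns
    return latencies
```

Calling measure_wakeup_latencies(period_us=1000, count=5) returns five
non-negative integers. The values depend on the system load and scheduler,
and will generally be much larger than what the in-kernel tracer reports.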

Usage
-----

Write the ASCII text "timerlat" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example::

        [root@f32 ~]# cd /sys/kernel/tracing/
        [root@f32 tracing]# echo timerlat > current_tracer

It is possible to follow the trace by reading the trace file::

  [root@f32 tracing]# cat trace
  # tracer: timerlat
  #
  #                              _-----=> irqs-off
  #                             / _----=> need-resched
  #                            | / _---=> hardirq/softirq
  #                            || / _--=> preempt-depth
  #                            || /
  #                            ||||             ACTIVATION
  #         TASK-PID      CPU# ||||   TIMESTAMP    ID            CONTEXT                LATENCY
  #            | |         |   ||||      |         |                  |                       |
          <idle>-0       [000] d.h1    54.029328: #1     context    irq timer_latency       932 ns
           <...>-867     [000] ....    54.029339: #1     context thread timer_latency     11700 ns
          <idle>-0       [001] dNh1    54.029346: #1     context    irq timer_latency      2833 ns
           <...>-868     [001] ....    54.029353: #1     context thread timer_latency      9820 ns
          <idle>-0       [000] d.h1    54.030328: #2     context    irq timer_latency       769 ns
           <...>-867     [000] ....    54.030330: #2     context thread timer_latency      3070 ns
          <idle>-0       [001] d.h1    54.030344: #2     context    irq timer_latency       935 ns
           <...>-868     [001] ....    54.030347: #2     context thread timer_latency      4351 ns

The tracer creates a per-cpu kernel thread with real-time priority that
prints two lines at every activation. The first is the *timer latency*
observed at the *hardirq* context before the activation of the thread.
The second is the *timer latency* observed by the thread. The ACTIVATION
ID field serves to relate the *irq* execution to its respective *thread*
execution.
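
The pairing of *irq* and *thread* lines through the ACTIVATION ID can be
sketched in a few lines of Python. This is only an illustration that assumes
the text layout shown in the trace above; the regular expression and the
function name pair_activations are hypothetical, not part of any kernel
tooling.

```python
import re

# Matches timerlat lines such as:
#   <idle>-0  [000] d.h1  54.029328: #1  context  irq timer_latency  932 ns
TIMERLAT_RE = re.compile(
    r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+"
    r"#(?P<id>\d+)\s+context\s+(?P<ctx>irq|thread)\s+"
    r"timer_latency\s+(?P<lat>\d+)\s+ns"
)


def pair_activations(trace_text):
    """Group latencies by (cpu, activation id) -> {"irq": ns, "thread": ns}."""
    pairs = {}
    for line in trace_text.splitlines():
        m = TIMERLAT_RE.search(line)
        if m is None:
            continue  # header, osnoise: event, or stack trace line
        key = (int(m["cpu"]), int(m["id"]))
        pairs.setdefault(key, {})[m["ctx"]] = int(m["lat"])
    return pairs


# First four event lines of the trace shown above.
SAMPLE = """\
        <idle>-0       [000] d.h1    54.029328: #1     context    irq timer_latency       932 ns
         <...>-867     [000] ....    54.029339: #1     context thread timer_latency     11700 ns
        <idle>-0       [001] dNh1    54.029346: #1     context    irq timer_latency      2833 ns
         <...>-868     [001] ....    54.029353: #1     context thread timer_latency      9820 ns
"""
```

For the sample above, pair_activations(SAMPLE)[(0, 1)] holds an *irq*
latency of 932 ns and a *thread* latency of 11700 ns. The key includes the
CPU number because each per-cpu thread keeps its own activation counter.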

The *irq*/*thread* split is important to clarify in which context an
unexpectedly high value originates. The *irq* context can be delayed by
hardware-related actions, such as SMIs, NMIs, and IRQs, or by threads
masking interrupts. Once the timer fires, the delay can also be
influenced by blocking caused by threads, for example, by postponing the
scheduler execution via preempt_disable(), by the scheduler execution
itself, or by masking interrupts. Threads can also be delayed by
interference from other threads and IRQs.

Tracer options
--------------

The timerlat tracer is built on top of the osnoise tracer, so its
configuration is also done in the osnoise/ config directory. The
timerlat configs are:

 - cpus: CPUs at which a timerlat thread will execute.
 - timerlat_period_us: the period of the timerlat thread.
 - stop_tracing_us: stop the system tracing if a timer latency at the
   *irq* context is higher than the configured value. Writing 0
   disables this option.
 - stop_tracing_total_us: stop the system tracing if a timer latency at
   the *thread* context is higher than the configured value. Writing 0
   disables this option.
 - print_stack: save the stack of the IRQ occurrence. The stack is
   printed after the *thread context* event, or at the IRQ handler if
   *stop_tracing_us* is hit.
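
Setting these options amounts to writing text into files under osnoise/.
A minimal Python sketch follows, assuming the tracing system is mounted at
/sys/kernel/tracing; the helper name configure_timerlat and the
TIMERLAT_OPTIONS set are hypothetical and only mirror the list above.

```python
import os

# Option names taken from the list above; values are written as text.
TIMERLAT_OPTIONS = {
    "cpus",
    "timerlat_period_us",
    "stop_tracing_us",
    "stop_tracing_total_us",
    "print_stack",
}


def configure_timerlat(options, tracing_dir="/sys/kernel/tracing"):
    """Write each option into <tracing_dir>/osnoise/<name>.

    Hypothetical helper; it needs root when pointed at the real tracefs.
    """
    unknown = set(options) - TIMERLAT_OPTIONS
    if unknown:
        raise ValueError(f"unknown timerlat options: {sorted(unknown)}")
    for name, value in options.items():
        path = os.path.join(tracing_dir, "osnoise", name)
        with open(path, "w") as f:
            f.write(f"{value}\n")
```

For example, configure_timerlat({"timerlat_period_us": 1000,
"stop_tracing_total_us": 25}) would set a 1 ms period and stop tracing
once a *thread* latency exceeds 25 us. The tracing_dir parameter exists so
the helper can be exercised against a scratch directory.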

timerlat and osnoise
--------------------

The timerlat tracer can also take advantage of the osnoise: trace
events. For example::

        [root@f32 ~]# cd /sys/kernel/tracing/
        [root@f32 tracing]# echo timerlat > current_tracer
        [root@f32 tracing]# echo 1 > events/osnoise/enable
        [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
        [root@f32 tracing]# tail -10 trace
             cc1-87882   [005] d..h...   548.771078: #402268 context    irq timer_latency     13585 ns
             cc1-87882   [005] dNLh1..   548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
             cc1-87882   [005] dNLh2..   548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
             cc1-87882   [005] d...3..   548.771102: thread_noise:      cc1:87882 start 548.771078243 duration 9909 ns
      timerlat/5-1035    [005] .......   548.771104: #402268 context thread timer_latency     39960 ns

In this case, the timer latency does not have a single root cause but
multiple ones. First, the timer IRQ was delayed for 13 us, which may
point to a long IRQ-disabled section (see the IRQ stacktrace section).
Then the timer interrupt that wakes up the timerlat thread took 7597 ns,
and the qxl:21 device IRQ took 7139 ns. Finally, the cc1 thread noise
took 9909 ns before the context switch. Such evidence helps the
developer choose other tracing methods to debug and optimize the system.

It is worth mentioning that the *duration* values reported
by the osnoise: events are *net* values. For example, the
thread_noise does not include the overhead caused by the IRQ
execution (which accounted for 14736 ns, the sum of the two
irq_noise durations). But the values reported by the timerlat
tracer (timer_latency) are *gross* values.
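
The net/gross distinction can be checked with simple arithmetic on the
numbers in the trace above. The Python sketch below is a rough accounting
that assumes the pieces run back to back without overlapping; the variable
names are made up for this example.

```python
# Durations (ns) copied from the trace above.
timer_irq_latency_ns = 13585     # "context irq timer_latency"
irq_noise_ns = [7597, 7139]      # local_timer:236 and qxl:21
thread_noise_net_ns = 9909       # cc1:87882, net of IRQ time
thread_latency_ns = 39960        # "context thread timer_latency" (gross)

# Time spent in the two IRQs reported by the irq_noise events.
irq_total_ns = sum(irq_noise_ns)

# Gross thread noise: the net value plus the IRQs that interrupted it.
thread_noise_gross_ns = thread_noise_net_ns + irq_total_ns

# Part of the gross thread latency explained by the events above,
# assuming they do not overlap in time.
explained_ns = timer_irq_latency_ns + irq_total_ns + thread_noise_net_ns
unexplained_ns = thread_latency_ns - explained_ns
```

Here irq_total_ns is 14736, thread_noise_gross_ns is 24645, and
explained_ns is 38230, which leaves 1730 ns of the 39960 ns thread latency
not attributed to any osnoise: event under this assumption.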

The diagram below shows a CPU timeline, with the timerlat tracer's view
of it at the top and the osnoise: events at the bottom. Each "-" in the
timelines represents about 1 us, and time moves ==>::

      External     timer irq                   thread
       clock        latency                    latency
       event        13585 ns                   39960 ns
         |             ^                         ^
         v             |                         |
         |-------------|                         |
         |-------------+-------------------------|
                       ^                         ^
  =========================================================================
                    [tmr irq]  [dev irq]
  [another thread...^       v..^       v.......][timerlat/ thread]  <-- CPU timeline
  =========================================================================
                    |-------|  |-------|
                            |--^       v-------|
                            |          |       |
                            |          |       + thread_noise: 9909 ns
                            |          +-> irq_noise: 7139 ns
                            +-> irq_noise: 7597 ns

IRQ stacktrace
--------------

The osnoise/print_stack option is helpful in cases where thread noise
is the major factor in the timer latency because preemption or IRQs are
disabled. For example::

        [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us
        [root@f32 tracing]# echo 500 > osnoise/print_stack
        [root@f32 tracing]# echo timerlat > current_tracer
        [root@f32 tracing]# tail -21 per_cpu/cpu7/trace
          insmod-1026    [007] dN.h1..   200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns
          insmod-1026    [007] d..h1..   200.202587: #29800 context    irq timer_latency      1616 ns
          insmod-1026    [007] dN.h2..   200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns
          insmod-1026    [007] dN.h3..   200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns
          insmod-1026    [007] d...3..   200.203444: thread_noise:   insmod:1026 start 200.202586933 duration 838681 ns
      timerlat/7-1001    [007] .......   200.203445: #29800 context thread timer_latency    859978 ns
      timerlat/7-1001    [007] ....1..   200.203446: <stack trace>
  => timerlat_irq
  => __hrtimer_run_queues
  => hrtimer_interrupt
  => __sysvec_apic_timer_interrupt
  => asm_call_irq_on_stack
  => sysvec_apic_timer_interrupt
  => asm_sysvec_apic_timer_interrupt
  => delay_tsc
  => dummy_load_1ms_pd_init
  => do_one_initcall
  => do_init_module
  => __do_sys_finit_module
  => do_syscall_64
  => entry_SYSCALL_64_after_hwframe

In this case, it is possible to see that the thread added the highest
contribution to the *timer latency*, and the stack trace, saved during
the timerlat IRQ handler, points to a function named
dummy_load_1ms_pd_init, which had the following code (on purpose)::

	static int __init dummy_load_1ms_pd_init(void)
	{
		preempt_disable();
		mdelay(1);
		preempt_enable();
		return 0;
	}
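
Reading the stack trace above by hand means skipping the timer interrupt
frames to find the code that was interrupted. The Python sketch below shows
one way to automate that. The default entry_symbol is an assumption that
matches the x86 trace in this section, and the function name
interrupted_frames is hypothetical.

```python
def interrupted_frames(trace_text,
                       entry_symbol="asm_sysvec_apic_timer_interrupt"):
    """Return the stack frames listed after the interrupt entry stub.

    `entry_symbol` is an assumption that matches the x86 trace above;
    other architectures and kernel versions use different entry symbols.
    """
    frames = [
        line.strip()[2:].strip()
        for line in trace_text.splitlines()
        if line.strip().startswith("=>")
    ]
    if entry_symbol not in frames:
        return frames  # no known entry stub: return the whole stack
    return frames[frames.index(entry_symbol) + 1:]


# Stack trace copied from the example above.
STACK = """\
  => timerlat_irq
  => __hrtimer_run_queues
  => hrtimer_interrupt
  => __sysvec_apic_timer_interrupt
  => asm_call_irq_on_stack
  => sysvec_apic_timer_interrupt
  => asm_sysvec_apic_timer_interrupt
  => delay_tsc
  => dummy_load_1ms_pd_init
  => do_one_initcall
  => do_init_module
  => __do_sys_finit_module
  => do_syscall_64
  => entry_SYSCALL_64_after_hwframe
"""
```

For this stack, the first two frames returned by interrupted_frames(STACK)
are delay_tsc and dummy_load_1ms_pd_init, that is, the busy-wait delay and
the module init function that called it with preemption disabled.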