===========================================================
Clock sources, Clock events, sched_clock() and delay timers
===========================================================

This document tries to briefly explain some basic kernel timekeeping
abstractions. It partly pertains to the drivers usually found in
drivers/clocksource in the kernel tree, but the code may be spread out
across the kernel.

If you grep through the kernel source you will find a number of architecture-
specific implementations of clock sources, clockevents and several likewise
architecture-specific overrides of the sched_clock() function and some
delay timers.

To provide timekeeping for your platform, the clock source provides
the basic timeline, whereas clock events fire interrupts at certain points
on this timeline, providing facilities such as high-resolution timers.
sched_clock() is used for scheduling and timestamping, and delay timers
provide an accurate delay source using hardware counters.


Clock sources
-------------

The purpose of the clock source is to provide a timeline for the system that
tells you where you are in time. For example, issuing the command 'date' on
a Linux system will eventually read the clock source to determine exactly
what time it is.

Typically the clock source is a monotonic, atomic counter which provides
n bits that count from 0 to (2^n)-1, then wrap around to 0 and start over.
It will ideally NEVER stop ticking as long as the system is running. It
may stop during system suspend.

The clock source shall have as high a resolution as possible, and the
frequency shall be as stable and correct as possible compared to a real-world
wall clock. It should not move unpredictably back and forth in time or miss a
few cycles here and there.

It must be immune to the kind of effects that occur in hardware where, e.g.,
the counter register is read in two phases on the bus, the lowest 16 bits
first and the higher 16 bits in a second bus cycle, with the counter bits
potentially being updated in between, leading to the risk of very strange
values from the counter.
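
The two-phase read hazard can be reproduced in a small userspace model. The
sketch below is Python, not kernel code. SplitCounter, naive_read and
stable_read are hypothetical names, and re-reading the high half until it is
stable is one common workaround that this document does not itself prescribe.

```python
class SplitCounter:
    """Toy 32-bit counter exposed as two 16-bit registers (hypothetical)."""

    def __init__(self, value, ticks_per_access=1):
        self.value = value & 0xFFFFFFFF
        self.ticks_per_access = ticks_per_access

    def _advance(self):
        # The counter keeps running between bus accesses.
        self.value = (self.value + self.ticks_per_access) & 0xFFFFFFFF

    def read_lo(self):
        lo = self.value & 0xFFFF
        self._advance()
        return lo

    def read_hi(self):
        hi = self.value >> 16
        self._advance()
        return hi


def naive_read(counter):
    # Low half first, high half second: the two halves may belong to
    # different counter values if a carry happens in between.
    lo = counter.read_lo()
    hi = counter.read_hi()
    return (hi << 16) | lo


def stable_read(counter):
    # Re-read the high half until it is unchanged around the low read,
    # so both halves are known to come from the same 65536-cycle window.
    while True:
        hi = counter.read_hi()
        lo = counter.read_lo()
        if counter.read_hi() == hi:
            return (hi << 16) | lo
```

Starting the toy counter at 0x0000FFFF, naive_read() combines the old low half
with the new high half and lands roughly 65536 cycles in the future, while
stable_read() retries once and returns a value the counter actually held.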

When the wall-clock accuracy of the clock source isn't satisfactory, there
are various quirks and layers in the timekeeping code for e.g. synchronizing
the user-visible time to RTC clocks in the system or against networked time
servers using NTP, but all they do basically is update an offset against
the clock source, which provides the fundamental timeline for the system.
These measures do not affect the clock source per se, they only adapt the
system to its shortcomings.

The clock source struct shall provide means to translate the provided counter
into a nanosecond value as an unsigned long long (unsigned 64 bit) number.
Since this operation may be invoked very often, doing this in a strict
mathematical sense is not desirable: instead the number is taken as close as
possible to a nanosecond value using only the arithmetic operations
multiply and shift, so in clocksource_cyc2ns() you find:

  ns ~= (clocksource * mult) >> shift
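
A userspace sketch of this conversion, written in Python for illustration. The
function name mirrors the kernel helper but this is only a model. The mult and
shift values in the usage note are example numbers for hypothetical 100 MHz and
24 MHz counters, not values taken from any driver.

```python
U64_MASK = (1 << 64) - 1


def cyc2ns(cycles, mult, shift):
    """Approximate nanoseconds for a cycle count: one multiply, one shift.

    The mask models truncation of the product to an unsigned 64-bit value.
    """
    return ((cycles * mult) & U64_MASK) >> shift
```

With shift = 24, a 100 MHz counter (exactly 10 ns per cycle) can use
mult = 10 << 24, so cyc2ns(100_000_000, 10 << 24, 24) yields one second. A
24 MHz counter has 41.67 ns per cycle, which is not an integer: with
mult = 699050667 a single cycle converts to 41 ns, yet 24,000,000 cycles still
convert to 1,000,000,000 ns because the fractional part is carried in the low
bits of mult.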

You will find a number of helper functions in the clock source code intended
to aid in providing these mult and shift values, such as
clocksource_khz2mult() and clocksource_hz2mult(), which help determine the
mult factor from a fixed shift, and clocksource_register_hz() and
clocksource_register_khz(), which assign both shift and mult
factors using the frequency of the clock source as the only input.
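
How a mult factor follows from a frequency and a fixed shift can be sketched as
below. This is a Python model of the rounding arithmetic, assuming the helper
computes scaled nanoseconds per cycle rounded to nearest; the kernel header
remains the authority. max_cycles() is a hypothetical helper added here to show
why the shift cannot be chosen arbitrarily large.

```python
NSEC_PER_SEC = 1_000_000_000
U64_MAX = (1 << 64) - 1


def hz2mult(hz, shift):
    """Scaled nanoseconds-per-cycle for a counter at hz, rounded to nearest."""
    return ((NSEC_PER_SEC << shift) + hz // 2) // hz


def max_cycles(mult):
    """Largest cycle delta whose product with mult still fits in 64 bits."""
    return U64_MAX // mult
```

For a 24 MHz counter and shift = 24 this gives mult = 699050667. The product
cycles * mult then overflows 64 bits after about 1100 seconds worth of cycles,
which illustrates the trade-off: a larger shift gives a more precise mult but a
shorter interval that can be converted in one step.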

For really simple clock sources accessed from a single I/O memory location
there is nowadays even clocksource_mmio_init(), which will take a memory
location, bit width, a parameter telling whether the counter in the
register counts up or down, and the timer clock rate, and then conjure all
necessary parameters.

Since a 32-bit counter at say 100 MHz will wrap around to zero after some 43
seconds, the code handling the clock source will have to compensate for this.
That is the reason why the clock source struct also contains a 'mask'
member telling how many bits of the source are valid. This way the timekeeping
code knows when the counter will wrap around and can insert the necessary
compensation code on both sides of the wrap point so that the system timeline
remains monotonic.
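
The role of the mask can be illustrated with a short Python model. The function
names are hypothetical; the point is only that subtracting two raw readings and
masking the result gives the correct elapsed cycle count even when the counter
wrapped between the two reads, provided it wrapped at most once.

```python
def clocksource_mask(bits):
    """All-ones mask covering the valid bits of the counter."""
    return (1 << bits) - 1


def delta_cycles(now, last, mask):
    """Cycles elapsed between two raw reads, tolerant of one wrap-around."""
    return (now - last) & mask


def wrap_period_seconds(bits, hz):
    """Time until a counter of the given width wraps at frequency hz."""
    return (1 << bits) / hz
```

For example, with a 32-bit mask, a previous reading of 0xFFFFFFF0 and a current
reading of 0x00000010 give a delta of 0x20 cycles rather than a huge negative
number. wrap_period_seconds(32, 100_000_000) evaluates to about 42.9, matching
the "some 43 seconds" quoted above.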


Clock events
------------

Clock events are the conceptual reverse of clock sources: they take a
desired time specification value and calculate the values to poke into
hardware timer registers.

Clock events are orthogonal to clock sources. The same hardware
and register range may be used for the clock event, but it is essentially
a different thing. The hardware driving clock events has to be able to
fire interrupts, so as to trigger events on the system timeline. On an SMP
system, it is ideal (and customary) to have one such event-driving timer per
CPU core, so that each core can trigger events independently of any other
core.

You will notice that the clock event device code is based on the same basic
idea about translating counters to nanoseconds using mult and shift
arithmetic, and you find the same family of helper functions again for
assigning these values. The clock event driver does not need a 'mask'
attribute however: the system will not try to plan events beyond the time
horizon of the clock event.
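
The reverse direction can be sketched in the same style. This Python model
assumes that the event mult is the scaled number of cycles per nanosecond and
that requests are clamped to a minimum and maximum delta before conversion;
the function and parameter names are illustrative, not a copy of the kernel
code.

```python
NSEC_PER_SEC = 1_000_000_000


def event_mult(hz, shift):
    """Scaled cycles-per-nanosecond for a timer running at hz."""
    return (hz << shift) // NSEC_PER_SEC


def ns2cyc(delta_ns, mult, shift, min_delta_ns, max_delta_ns):
    """Cycles to program for an event delta_ns in the future."""
    # Clamp to the time horizon of the device: this is why no mask is needed.
    delta_ns = max(min_delta_ns, min(delta_ns, max_delta_ns))
    return (delta_ns * mult) >> shift
```

For a hypothetical 16-bit timer at 1 MHz with shift = 32, a 1 ms request
converts to 999 cycles (the truncated mult rounds down slightly), and a
1 second request is clamped to the 65,535,000 ns horizon so that the programmed
value still fits in the 16-bit register.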


sched_clock()
-------------

In addition to the clock sources and clock events there is a special weak
function in the kernel called sched_clock(). This function shall return the
number of nanoseconds since the system was started. An architecture may or
may not provide an implementation of sched_clock() on its own. If a local
implementation is not provided, the system jiffy counter will be used as
sched_clock().

As the name suggests, sched_clock() is used for scheduling the system,
determining the absolute timeslice for a certain process in the CFS scheduler
for example. It is also used for printk timestamps when you have selected to
include time information in printk for things like bootcharts.

Compared to clock sources, sched_clock() has to be very fast: it is called
much more often, especially by the scheduler. If you have to make trade-offs
in accuracy compared to the clock source, you may sacrifice accuracy
for speed in sched_clock(). It does, however, require some of the same basic
characteristics as the clock source, i.e. it should be monotonic.

The sched_clock() function may wrap only on unsigned long long boundaries,
i.e. after 64 bits. Since this is a nanosecond value this will mean it wraps
after circa 585 years. (For most practical systems this means "never".)
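
The 585-year figure follows directly from the width of the return type. A quick
check in Python, assuming a 365.25-day year (the helper names are hypothetical):

```python
NSEC_PER_SEC = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def wrap_seconds(bits):
    """Seconds until a nanosecond counter of the given width wraps."""
    return (1 << bits) / NSEC_PER_SEC


def wrap_years(bits):
    """Years until a nanosecond counter of the given width wraps."""
    return wrap_seconds(bits) / SECONDS_PER_YEAR
```

wrap_years(64) comes out between 584 and 585, whereas a 32-bit nanosecond value
would already wrap after roughly 4.3 seconds, which is why the full 64-bit
range matters here.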

If an architecture does not provide its own implementation of this function,
it will fall back to using jiffies, making its maximum resolution 1/HZ
seconds, where HZ is the jiffy frequency for the architecture. This will
affect scheduling accuracy and will likely show up in system benchmarks.
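
What the jiffies fallback means for resolution can be modelled in a few lines
of Python. This is a simplified sketch that ignores any initial jiffies offset;
the function names are hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000


def jiffy_resolution_ns(hz):
    """Smallest time step visible through a jiffies-based sched_clock()."""
    return NSEC_PER_SEC // hz


def jiffies_sched_clock(jiffies, hz):
    """Nanoseconds since boot as seen through the jiffy counter."""
    return jiffies * jiffy_resolution_ns(hz)
```

With HZ = 250 the returned value only ever advances in steps of 4,000,000 ns,
and with HZ = 100 in steps of 10,000,000 ns, so any two events within the same
jiffy receive the same timestamp.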

The clock driving sched_clock() may stop or reset to zero during system
suspend/sleep. This does not matter for its purpose of scheduling
events on the system. However, it may result in interesting timestamps in
printk().

The sched_clock() function should be callable in any context, be IRQ- and
NMI-safe, and return a sane value in any context.

Some architectures may have a limited set of time sources and lack a nice
counter to derive a 64-bit nanosecond value, so for example on the ARM
architecture, special helper functions have been created to provide a
sched_clock() nanosecond base from a 16- or 32-bit counter. Sometimes the
same counter that is also used as clock source is used for this purpose.
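
One way such a helper can extend a narrow counter is to remember an epoch (a
raw counter value together with the nanosecond time it corresponds to) and to
refresh that epoch more often than the counter wraps. The Python class below is
a sketch of that idea under stated assumptions; NarrowSchedClock and its
methods are hypothetical names, not the kernel's implementation.

```python
class NarrowSchedClock:
    """Toy model: 64-bit nanoseconds derived from an n-bit hardware counter."""

    def __init__(self, bits, mult, shift, first_cycles=0):
        self.mask = (1 << bits) - 1
        self.mult = mult
        self.shift = shift
        self.epoch_cyc = first_cycles & self.mask  # raw counter at epoch
        self.epoch_ns = 0                          # nanoseconds at epoch

    def _ns_since_epoch(self, cyc):
        # Masked subtraction tolerates one wrap since the last epoch update.
        delta = (cyc - self.epoch_cyc) & self.mask
        return (delta * self.mult) >> self.shift

    def read_ns(self, cyc):
        """Nanoseconds since start for the raw counter reading cyc."""
        return self.epoch_ns + self._ns_since_epoch(cyc)

    def update_epoch(self, cyc):
        """Must be called at least once per counter wrap period."""
        self.epoch_ns = self.read_ns(cyc)
        self.epoch_cyc = cyc & self.mask
```

For a 16-bit counter at 1 MHz (mult = 1000, shift = 0), updating the epoch at
raw value 60000 and then reading raw value 464 after the counter has wrapped
yields 66,000,000 ns: the 6000 cycles elapsed since the epoch are added to the
60,000,000 ns already accumulated.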

On SMP systems, it is crucial for performance that sched_clock() can be called
independently on each CPU without any synchronization performance hits.
Some hardware (such as the x86 TSC) will cause the sched_clock() function to
drift between the CPUs on the system. The kernel can work around this by
enabling the CONFIG_HAVE_UNSTABLE_SCHED_CLOCK option. This is another aspect
that makes sched_clock() different from the ordinary clock source.


Delay timers (some architectures only)
--------------------------------------

On systems with variable CPU frequency, the various kernel delay() functions
will sometimes behave strangely. Basically these delays usually use a hard
loop to delay a certain number of jiffy fractions using a "lpj" (loops per
jiffy) value, calibrated on boot.

Let's hope that your system is running at maximum frequency when this value
is calibrated: as a consequence, when the frequency is geared down to half the
full frequency, any delay() will be twice as long. Usually this does not
hurt, as you're commonly requesting that amount of delay *or more*. But
basically the semantics are quite unpredictable on such systems.
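
The doubling effect can be worked through with a small Python model. It assumes
that the speed of the delay loop scales linearly with the CPU frequency; the
function names and the lpj value in the usage note are made up for
illustration.

```python
USEC_PER_SEC = 1_000_000


def loops_for_udelay(usecs, lpj, hz):
    """Loop iterations chosen for a delay, based on the boot-time lpj."""
    return usecs * lpj * hz // USEC_PER_SEC


def actual_delay_us(usecs, lpj, hz, calib_freq, current_freq):
    """Real duration of the loop when the CPU now runs at current_freq."""
    loops = loops_for_udelay(usecs, lpj, hz)
    # Assumption: iterations per second scale linearly with CPU frequency.
    loops_per_sec_now = lpj * hz * current_freq // calib_freq
    return loops * USEC_PER_SEC // loops_per_sec_now
```

With lpj = 4,000,000 calibrated at 1 GHz and HZ = 250, a 100 microsecond
request spins for 100,000 iterations. At 1 GHz that takes 100 microseconds, but
at 500 MHz the same iteration count takes 200 microseconds.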

Enter timer-based delays. Using these, a timer read may be used instead of
a hard-coded loop for providing the desired delay.
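
The idea can be sketched as a loop that polls a free-running timer until enough
cycles have passed, independent of how fast the CPU executes the loop itself.
The Python model below uses a fake timer that advances on every read;
FakeTimer, cycles_for_us and timer_delay are hypothetical names.

```python
class FakeTimer:
    """Stand-in for a hardware counter; advances by step on every read."""

    def __init__(self, start, step, bits=32):
        self.mask = (1 << bits) - 1
        self.value = start & self.mask
        self.step = step

    def read(self):
        current = self.value
        self.value = (self.value + self.step) & self.mask
        return current


def cycles_for_us(usecs, timer_hz):
    """Timer cycles corresponding to a delay in microseconds."""
    return usecs * timer_hz // 1_000_000


def timer_delay(read_timer, cycles, mask):
    """Spin until at least cycles timer ticks have elapsed; return the count."""
    start = read_timer()
    while True:
        # Masked subtraction keeps the comparison valid across a wrap.
        elapsed = (read_timer() - start) & mask
        if elapsed >= cycles:
            return elapsed
```

Starting a fake 32-bit timer at 0xFFFFFFF0 with a step of 3,
timer_delay(t.read, 100, 0xFFFFFFFF) returns once 102 cycles have elapsed even
though the counter wrapped through zero during the wait.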

This is done by declaring a struct delay_timer and assigning the appropriate
function pointers and rate settings for this delay timer.

This is available on some architectures like OpenRISC or ARM.