.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=========================
System Suspend Code Flows
=========================

:Copyright: |copy| 2020 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

At least one global system-wide transition needs to be carried out for the
system to get from the working state into one of the supported
:doc:`sleep states <sleep-states>`.  Hibernation requires more than one
transition to occur for this purpose, but the other sleep states, commonly
referred to as *system-wide suspend* (or simply *system suspend*) states, need
only one.

For those sleep states, the transition from the working state of the system into
the target sleep state is referred to as *system suspend* too (in the majority
of cases, whether this means a transition or a sleep state of the system should
be clear from the context) and the transition back from the sleep state into the
working state is referred to as *system resume*.

The kernel code flows associated with the suspend and resume transitions for
different sleep states of the system are quite similar, but there are some
significant differences between the :ref:`suspend-to-idle <s2idle>` code flows
and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and
:ref:`standby <standby>` sleep states.

The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states
cannot be implemented without platform support and the difference between them
boils down to the platform-specific actions carried out by the suspend and
resume hooks that need to be provided by the platform driver to make them
available.  Apart from that, the suspend and resume code flows for these sleep
states are mostly identical, so the two are referred to collectively as
*platform-dependent suspend* states in what follows.
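As a rough illustration of the distinction drawn above, the following Python
sketch maps the labels that ``/sys/power/mem_sleep`` is documented to use
(``s2idle``, ``shallow`` for standby and ``deep`` for suspend-to-RAM) to the
code flow that applies.  The names ``FLOW_BY_LABEL`` and ``selected_flow`` are
made up for this example, and the function takes the file contents as a string
so that it does not depend on the file being present.

```python
# Labels used by /sys/power/mem_sleep, mapped to the flow names of this
# document.  The table and function names are hypothetical.
FLOW_BY_LABEL = {
    "s2idle": "suspend-to-idle",
    "shallow": "platform-dependent",  # standby
    "deep": "platform-dependent",     # suspend-to-RAM
}


def selected_flow(mem_sleep_text):
    """Return (label, flow) for the bracketed entry, e.g. 's2idle [deep]'."""
    for token in mem_sleep_text.split():
        if token.startswith("[") and token.endswith("]"):
            label = token[1:-1]
            return label, FLOW_BY_LABEL[label]
    raise ValueError("no selected entry in %r" % mem_sleep_text)


print(selected_flow("s2idle [deep]"))  # -> ('deep', 'platform-dependent')
```

On a real system the argument would typically be the text read from
``/sys/power/mem_sleep``, in which the currently selected variant is shown in
square brackets.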


.. _s2idle_suspend:

Suspend-to-idle Suspend Code Flow
=================================

The following steps are taken in order to transition the system from the working
state to the :ref:`suspend-to-idle <s2idle>` sleep state:

 1. Invoking system-wide suspend notifiers.

    Kernel subsystems can register callbacks to be invoked when the suspend
    transition is about to occur and when the resume transition has finished.

    That allows them to prepare for the change of the system state and to clean
    up after getting back to the working state.

 2. Freezing tasks.

    Tasks are frozen primarily in order to avoid unchecked hardware accesses
    from user space through MMIO regions or I/O registers exposed directly to
    it and to prevent user space from entering the kernel while the next step
    of the transition is in progress (which might be problematic for various
    reasons).

    All user space tasks are intercepted as though they were sent a signal and
    put into uninterruptible sleep until the end of the subsequent system resume
    transition.

    The kernel threads that choose to be frozen during system suspend for
    specific reasons are frozen subsequently, but they are not intercepted.
    Instead, they are expected to periodically check whether or not they need
    to be frozen and to put themselves into uninterruptible sleep if so.  [Note,
    however, that kernel threads can use locking and other concurrency controls
    available in kernel space to synchronize themselves with system suspend and
    resume, which can be much more precise than the freezing, so the latter is
    not a recommended option for kernel threads.]

 3. Suspending devices and reconfiguring IRQs.

    Devices are suspended in four phases called *prepare*, *suspend*,
    *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The runtime PM API is disabled for every device during the *late* suspend
    phase and high-level ("action") interrupt handlers are prevented from being
    invoked before the *noirq* suspend phase.

    Interrupts are still handled after that, but they are only acknowledged to
    interrupt controllers without performing any device-specific actions that
    would be triggered in the working state of the system (those actions are
    deferred until the subsequent system resume transition as described
    `below <s2idle_resume_>`_).

    IRQs associated with system wakeup devices are "armed" so that the resume
    transition of the system is started when one of them signals an event.

 4. Freezing the scheduler tick and suspending timekeeping.

    When all devices have been suspended, CPUs enter the idle loop and are put
    into the deepest available idle state.  While doing that, each of them
    "freezes" its own scheduler tick so that the timer events associated with
    the tick do not occur until the CPU is woken up by another interrupt source.

    The last CPU to enter the idle state also stops the timekeeping which
    (among other things) prevents high resolution timers from triggering going
    forward until the first CPU that is woken up restarts the timekeeping.
    That allows the CPUs to stay in the deep idle state relatively long in one
    go.

    From this point on, the CPUs can only be woken up by non-timer hardware
    interrupts.  If that happens, they go back to the idle state unless the
    interrupt that woke up one of them comes from an IRQ that has been armed for
    system wakeup, in which case the system resume transition is started.
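Steps 3 and 4 above can be modeled in user space with a few lines of Python.
This is only a sketch under simplifying assumptions: ``suspend_devices`` and
``s2idle_wait`` are hypothetical names, device and IRQ identifiers are plain
strings and integers, and the order of devices within a phase is just the list
order rather than the kernel's real ordering.

```python
# Phase names as listed in step 3 above.
SUSPEND_PHASES = ["prepare", "suspend", "late suspend", "noirq suspend"]


def suspend_devices(devices):
    """Visit every device in one phase before moving on to the next phase.

    The order of devices within a phase is simply the list order here; the
    kernel's actual ordering rules are not modeled.
    """
    log = []
    for phase in SUSPEND_PHASES:
        for dev in devices:
            log.append((phase, dev))
    return log


def s2idle_wait(interrupts, armed_irqs):
    """Model step 4: idle CPUs are woken by non-timer interrupts only.

    Returns (spurious_wakeups, wake_irq).  wake_irq is None if none of the
    interrupts came from an IRQ armed for system wakeup.
    """
    spurious = 0
    for irq in interrupts:
        if irq in armed_irqs:
            return spurious, irq  # the resume transition starts here
        spurious += 1  # not a wakeup source: the CPU goes back to idle
    return spurious, None


print(suspend_devices(["nvme0", "eth0"])[:2])
print(s2idle_wait([17, 42, 9], armed_irqs={9}))  # -> (2, 9)
```

In the second call, interrupts 17 and 42 only cause a CPU to wake up and go
back to idle, while interrupt 9, which is armed for system wakeup, starts the
resume transition.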


.. _s2idle_resume:

Suspend-to-idle Resume Code Flow
================================

The following steps are taken in order to transition the system from the
:ref:`suspend-to-idle <s2idle>` sleep state into the working state:

 1. Resuming timekeeping and unfreezing the scheduler tick.

    When one of the CPUs is woken up (by a non-timer hardware interrupt), it
    leaves the idle state entered in the last step of the preceding suspend
    transition, restarts the timekeeping (unless it has been restarted already
    by another CPU that woke up earlier) and the scheduler tick on that CPU is
    unfrozen.

    If the interrupt that has woken up the CPU was armed for system wakeup,
    the system resume transition begins.

 2. Resuming devices and restoring the working-state configuration of IRQs.

    Devices are resumed in four phases called *noirq resume*, *early resume*,
    *resume* and *complete* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The working-state configuration of IRQs is restored after the *noirq* resume
    phase and the runtime PM API is re-enabled for every device whose driver
    supports it during the *early* resume phase.

 3. Thawing tasks.

    Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_
    transition are "thawed", which means that they are woken up from the
    uninterruptible sleep that they went into at that time and user space tasks
    are allowed to exit the kernel.

 4. Invoking system-wide resume notifiers.

    This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition
    and the same set of callbacks is invoked at this point, but a different
    "notification type" parameter value is passed to them.
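The notifier behavior described in step 4 (one set of callbacks, called with
a different notification type on suspend and on resume) can be sketched as
follows.  This is a user-space model, not kernel code: ``NotifierChain``,
``make_subsystem`` and ``run_cycle`` are hypothetical names, and the strings
``PM_SUSPEND_PREPARE`` and ``PM_POST_SUSPEND`` merely borrow the names of the
kernel's notification types for readability.

```python
class NotifierChain:
    """A minimal list of callbacks, each called with an event name."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def call(self, event):
        for callback in self._callbacks:
            callback(event)


def make_subsystem(name, log):
    """Return one callback that handles both notification types."""
    def callback(event):
        if event == "PM_SUSPEND_PREPARE":
            log.append(name + ": prepare for suspend")
        elif event == "PM_POST_SUSPEND":
            log.append(name + ": clean up after resume")
    return callback


def run_cycle(names):
    """Register one callback per name and run a suspend/resume cycle."""
    log = []
    chain = NotifierChain()
    for name in names:
        chain.register(make_subsystem(name, log))
    chain.call("PM_SUSPEND_PREPARE")   # step 1 of the suspend transition
    # ... the rest of suspend and resume would happen here ...
    chain.call("PM_POST_SUSPEND")      # step 4 of the resume transition
    return log


for line in run_cycle(["net", "fs"]):
    print(line)
```

Each callback is registered once and decides what to do from the event value
alone, which mirrors the "same set of callbacks, different notification type"
arrangement described above.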


Platform-dependent Suspend Code Flow
====================================

The following steps are taken in order to transition the system from the working
state to a platform-dependent suspend state:

 1. Invoking system-wide suspend notifiers.

    This step is the same as step 1 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 2. Freezing tasks.

    This step is the same as step 2 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 3. Suspending devices and reconfiguring IRQs.

    This step is analogous to step 3 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_, but the arming of IRQs for system
    wakeup generally does not have any effect on the platform.

    There are platforms that can go into a very deep low-power state internally
    when all CPUs in them are in sufficiently deep idle states and all I/O
    devices have been put into low-power states.  On those platforms,
    suspend-to-idle can reduce system power very effectively.

    On the other platforms, however, low-level components (like interrupt
    controllers) need to be turned off in a platform-specific way (implemented
    in the hooks provided by the platform driver) to achieve comparable power
    reduction.

    That usually prevents in-band hardware interrupts from waking up the system,
    which must be done in a special platform-dependent way.  Then, the
    configuration of system wakeup sources usually starts when system wakeup
    devices are suspended and is finalized by the platform suspend hooks later
    on.

 4. Disabling non-boot CPUs.

    On some platforms the suspend hooks mentioned above must run in a one-CPU
    configuration of the system (in particular, the hardware cannot be accessed
    by any code running in parallel with the platform suspend hooks that may,
    and often do, trap into the platform firmware in order to finalize the
    suspend transition).

    For this reason, the CPU offline/online (CPU hotplug) framework is used
    to take all of the CPUs in the system, except for one (the boot CPU),
    offline (typically, the CPUs that have been taken offline go into deep idle
    states).

    This means that all tasks are migrated away from those CPUs and all IRQs are
    rerouted to the only CPU that remains online.

 5. Suspending core system components.

    This prepares the core system components for (possibly) losing power going
    forward and suspends the timekeeping.

 6. Platform-specific power removal.

    This is expected to remove power from all of the system components except
    for the memory controller and RAM (in order to preserve the contents of the
    latter) and some devices designated for system wakeup.

    In many cases control is passed to the platform firmware which is expected
    to finalize the suspend transition as needed.
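The effect of step 4 above (tasks migrated away from the offlined CPUs and
IRQs rerouted to the one CPU left online) can be pictured with a small Python
model.  It is a sketch only: ``disable_nonboot_cpus`` is a hypothetical helper,
the data layout is invented for the example, and the boot CPU is assumed to be
CPU 0.

```python
def disable_nonboot_cpus(cpus, boot_cpu=0):
    """Return the one-CPU configuration left after taking other CPUs offline.

    cpus maps a CPU number to {"tasks": [...], "irqs": [...]}.  Everything that
    belonged to an offlined CPU ends up on boot_cpu.
    """
    remaining = {
        "tasks": list(cpus[boot_cpu]["tasks"]),
        "irqs": list(cpus[boot_cpu]["irqs"]),
    }
    for cpu in sorted(cpus):
        if cpu == boot_cpu:
            continue
        remaining["tasks"].extend(cpus[cpu]["tasks"])  # migrate tasks away
        remaining["irqs"].extend(cpus[cpu]["irqs"])    # reroute IRQs
    return {boot_cpu: remaining}


before = {
    0: {"tasks": ["init"], "irqs": [1]},
    1: {"tasks": ["kworker"], "irqs": [33]},
    2: {"tasks": [], "irqs": [34]},
}
print(disable_nonboot_cpus(before))
```

After the call only CPU 0 remains in the mapping, which is the one-CPU
configuration that the platform suspend hooks may require.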


Platform-dependent Resume Code Flow
===================================

The following steps are taken in order to transition the system from a
platform-dependent suspend state into the working state:

 1. Platform-specific system wakeup.

    The platform is woken up by a signal from one of the designated system
    wakeup devices (which need not be an in-band hardware interrupt) and
    control is passed back to the kernel (the working configuration of the
    platform may need to be restored by the platform firmware before the
    kernel gets control again).

 2. Resuming core system components.

    The suspend-time configuration of the core system components is restored and
    the timekeeping is resumed.

 3. Re-enabling non-boot CPUs.

    The CPUs disabled in step 4 of the preceding suspend transition are taken
    back online and their suspend-time configuration is restored.

 4. Resuming devices and restoring the working-state configuration of IRQs.

    This step is the same as step 2 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 5. Thawing tasks.

    This step is the same as step 3 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 6. Invoking system-wide resume notifiers.

    This step is the same as step 4 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.
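The four step lists in this document are related in a simple way: each resume
list is its suspend list in reverse order, and the last three steps of the
platform-dependent resume flow are the ones it shares with suspend-to-idle
resume.  The Python sketch below checks that relationship.  The step labels are
abbreviations chosen for this example, and ``resume_steps`` and
``shared_resume_steps`` are hypothetical helper names.

```python
# Abbreviated step labels, in the order given in the two suspend sections.
S2IDLE_SUSPEND = [
    "notifiers",
    "tasks",
    "devices and IRQs",
    "tick and timekeeping",
]
PLATFORM_SUSPEND = [
    "notifiers",
    "tasks",
    "devices and IRQs",
    "non-boot CPUs",
    "core system components",
    "platform",
]


def resume_steps(suspend_steps):
    """Resume undoes the suspend steps in the opposite order."""
    return list(reversed(suspend_steps))


def shared_resume_steps():
    """Map platform-dependent resume step numbers to s2idle resume ones."""
    s2idle = resume_steps(S2IDLE_SUSPEND)
    platform = resume_steps(PLATFORM_SUSPEND)
    return {
        platform.index(step) + 1: s2idle.index(step) + 1
        for step in platform
        if step in s2idle
    }


print(resume_steps(PLATFORM_SUSPEND))
print(shared_resume_steps())  # -> {4: 2, 5: 3, 6: 4}
```

The resulting mapping says that steps 4, 5 and 6 of the platform-dependent
resume flow correspond to steps 2, 3 and 4 of the suspend-to-idle resume flow.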