cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

taskstats.rst (8141B)


=============================
Per-task statistics interface
=============================


Taskstats is a netlink-based interface for sending per-task and
per-process statistics from the kernel to userspace.

Taskstats was designed for the following benefits:

- efficiently provide statistics during lifetime of a task and on its exit
- unified interface for multiple accounting subsystems
- extensibility for use by future accounting patches

Terminology
-----------

"pid", "tid" and "task" are used interchangeably and refer to the standard
Linux task defined by struct task_struct. Per-pid stats are the same as
per-task stats.

"tgid", "process" and "thread group" are used interchangeably and refer to the
tasks that share an mm_struct, i.e. the traditional Unix process. Despite the
use of tgid, there is no special treatment for the task that is the thread
group leader: a process is deemed alive as long as it has any task belonging
to it.
     26
     27Usage
     28-----
     29
     30To get statistics during a task's lifetime, userspace opens a unicast netlink
     31socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
     32The response contains statistics for a task (if pid is specified) or the sum of
     33statistics for all tasks of the process (if tgid is specified).
     34
     35To obtain statistics for tasks which are exiting, the userspace listener
     36sends a register command and specifies a cpumask. Whenever a task exits on
     37one of the cpus in the cpumask, its per-pid statistics are sent to the
     38registered listener. Using cpumasks allows the data received by one listener
     39to be limited and assists in flow control over the netlink interface and is
     40explained in more detail below.
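
The cpumask carried by a register command is an ASCII string of
comma-separated cpu ranges. The following is a minimal sketch of how a
listener might build such strings and split the cpus among several listeners.
The helper names cpus_to_mask and masks_for_listeners are hypothetical and not
part of taskstats or getdelays.c.

```python
def cpus_to_mask(cpus):
    """Format cpu numbers as a comma-separated range string, e.g. "1-3,5"."""
    ordered = sorted(set(cpus))
    if not ordered:
        return ""
    ranges = []
    start = prev = ordered[0]
    for cpu in ordered[1:]:
        if cpu == prev + 1:
            prev = cpu          # still inside the current run
            continue
        ranges.append((start, prev))
        start = prev = cpu      # begin a new run
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def masks_for_listeners(ncpus, nlisteners):
    """Split cpus 0..ncpus-1 into contiguous chunks, one mask per listener."""
    base, extra = divmod(ncpus, nlisteners)
    masks, first = [], 0
    for i in range(nlisteners):
        count = base + (1 if i < extra else 0)
        masks.append(cpus_to_mask(range(first, first + count)))
        first += count
    return masks
```

Each listener would then send its own mask in a register command, so that it
only receives exit data for the cpus in its chunk.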
     41
     42If the exiting task is the last thread exiting its thread group,
     43an additional record containing the per-tgid stats is also sent to userspace.
     44The latter contains the sum of per-pid stats for all threads in the thread
     45group, both past and present.
     46
     47getdelays.c is a simple utility demonstrating usage of the taskstats interface
     48for reporting delay accounting statistics. Users can register cpumasks,
     49send commands and process responses, listen for per-tid/tgid exit data,
     50write the data received to a file and do basic flow control by increasing
     51receive buffer sizes.
     52
     53Interface
     54---------
     55
     56The user-kernel interface is encapsulated in include/linux/taskstats.h
     57
     58To avoid this documentation becoming obsolete as the interface evolves, only
     59an outline of the current version is given. taskstats.h always overrides the
     60description here.
     61
     62struct taskstats is the common accounting structure for both per-pid and
     63per-tgid data. It is versioned and can be extended by each accounting subsystem
     64that is added to the kernel. The fields and their semantics are defined in the
     65taskstats.h file.
     66
     67The data exchanged between user and kernel space is a netlink message belonging
     68to the NETLINK_GENERIC family and using the netlink attributes interface.
     69The messages are in the format::
     70
     71    +----------+- - -+-------------+-------------------+
     72    | nlmsghdr | Pad |  genlmsghdr | taskstats payload |
     73    +----------+- - -+-------------+-------------------+
     74
     75
     76The taskstats payload is one of the following three kinds:
     77
     781. Commands: Sent from user to kernel. Commands to get data on
     79a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
     80containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
     81the task/process for which userspace wants statistics.
     82
     83Commands to register/deregister interest in exit data from a set of cpus
     84consist of one attribute, of type
     85TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
     86attribute payload. The cpumask is specified as an ascii string of
     87comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8
     88the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest
     89in cpus before closing the listening socket, the kernel cleans up its interest
     90set over time. However, for the sake of efficiency, an explicit deregistration
     91is advisable.
     92
     932. Response for a command: sent from the kernel in response to a userspace
     94command. The payload is a series of three attributes of type:
     95
     96a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload but indicates
     97a pid/tgid will be followed by some stats.
     98
     99b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
    100are being returned.
    101
    102c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
    103same structure is used for both per-pid and per-tgid stats.
    104
    1053. New message sent by kernel whenever a task exits. The payload consists of a
    106   series of attributes of the following type:
    107
    108a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
    109b) TASKSTATS_TYPE_PID: contains exiting task's pid
    110c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
    111d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
    112e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
    113f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process
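
The attribute layouts above can be sketched as follows. This is an
illustrative encoder and walker, assuming the standard struct nlattr layout
(16-bit length, 16-bit type, value padded to 4 bytes) and attribute numbers
mirrored from taskstats.h; taskstats.h remains authoritative. The helper names
are hypothetical. The walker accepts the attributes either as a flat series or
with the pid/tgid and stats carried inside the AGGR attribute, which is how
getdelays.c reads replies.

```python
import struct

NLA_HDR = struct.Struct("=HH")  # nla_len, nla_type
NLA_TYPE_MASK = 0x3FFF          # strips the nested/byte-order flag bits

# Assumed to match the enums in taskstats.h
TASKSTATS_CMD_ATTR_PID = 1
TASKSTATS_CMD_ATTR_REGISTER_CPUMASK = 3
TASKSTATS_TYPE_PID = 1
TASKSTATS_TYPE_STATS = 3
TASKSTATS_TYPE_AGGR_PID = 4
TASKSTATS_TYPE_AGGR_TGID = 5


def nla_align(n):
    return (n + 3) & ~3


def pack_attr(attr_type, value):
    """Encode one attribute; nla_len covers header plus value, not padding."""
    length = NLA_HDR.size + len(value)
    pad = b"\0" * (nla_align(length) - length)
    return NLA_HDR.pack(length, attr_type) + value + pad


def pid_command(pid):
    """Payload of a command asking for the stats of one task."""
    return pack_attr(TASKSTATS_CMD_ATTR_PID, struct.pack("=I", pid))


def walk_attrs(payload):
    """Yield (type, value) pairs, descending into non-empty AGGR attributes."""
    offset = 0
    while offset + NLA_HDR.size <= len(payload):
        length, attr_type = NLA_HDR.unpack_from(payload, offset)
        if length < NLA_HDR.size or offset + length > len(payload):
            break  # malformed or truncated attribute
        attr_type &= NLA_TYPE_MASK
        value = payload[offset + NLA_HDR.size:offset + length]
        yield attr_type, value
        if attr_type in (TASKSTATS_TYPE_AGGR_PID, TASKSTATS_TYPE_AGGR_TGID):
            yield from walk_attrs(value)
        offset += nla_align(length)
```

A receiver can skip any attribute type it does not recognise, since each
attribute carries its own length.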

Per-tgid stats
--------------

Taskstats provides per-process stats, in addition to per-task stats, since
resource management is often done at a process granularity and aggregating task
stats in userspace alone is inefficient and potentially inaccurate (due to lack
of atomicity).

However, maintaining per-process, in addition to per-task stats, within the
kernel has space and time overheads. To address this, the taskstats code
accumulates each exiting task's statistics into a process-wide data structure.
When the last task of a process exits, the process level data accumulated also
gets sent to userspace (along with the per-task data).

When a user queries per-tgid data, the statistics of all live threads in
the group are summed and added to the accumulated total for previously exited
threads of the same thread group.
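
The accumulation scheme can be modelled in a few lines. This is a toy
userspace model of the bookkeeping described above, not the kernel code; the
class name ThreadGroupStats and the counter name cpu_ns are hypothetical.

```python
from collections import Counter


class ThreadGroupStats:
    """Toy model: totals of exited threads plus per-thread live counters."""

    def __init__(self):
        self.exited_total = Counter()  # process-wide accumulator
        self.live = {}                 # tid -> Counter of that thread

    def account(self, tid, **deltas):
        """Charge some usage to a live thread."""
        self.live.setdefault(tid, Counter()).update(deltas)

    def task_exit(self, tid):
        """Fold the exiting thread into the accumulator.

        Returns (per_pid_stats, per_tgid_stats), where per_tgid_stats is
        None unless this was the last thread of the group.
        """
        stats = self.live.pop(tid)
        self.exited_total.update(stats)
        last = not self.live
        return stats, (Counter(self.exited_total) if last else None)

    def query_tgid(self):
        """Sum of live threads added to the total of exited threads."""
        total = Counter(self.exited_total)
        for stats in self.live.values():
            total.update(stats)
        return total
```

Only the accumulator has to outlive an exiting thread, which keeps the cost
of per-process totals to one update per exit and one summation per query.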
    133
    134Extending taskstats
    135-------------------
    136
    137There are two ways to extend the taskstats interface to export more
    138per-task/process stats as patches to collect them get added to the kernel
    139in future:
    140
    1411. Adding more fields to the end of the existing struct taskstats. Backward
    142   compatibility is ensured by the version number within the
    143   structure. Userspace will use only the fields of the struct that correspond
    144   to the version its using.
    145
    1462. Defining separate statistic structs and using the netlink attributes
    147   interface to return them. Since userspace processes each netlink attribute
    148   independently, it can always ignore attributes whose type it does not
    149   understand (because it is using an older version of the interface).
    150
    151
    152Choosing between 1. and 2. is a matter of trading off flexibility and
    153overhead. If only a few fields need to be added, then 1. is the preferable
    154path since the kernel and userspace don't need to incur the overhead of
    155processing new netlink attributes. But if the new fields expand the existing
    156struct too much, requiring disparate userspace accounting utilities to
    157unnecessarily receive large structures whose fields are of no interest, then
    158extending the attributes structure would be worthwhile.
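
For option 1, a client might guard its parsing with the version number as
sketched below. The sketch relies on the version being the first member of
struct taskstats, a 16-bit value in host byte order. The helper names and the
known_version/known_size parameters are hypothetical and stand for whatever
the client was built against.

```python
import struct


def stats_version(stats_payload):
    """Read the 16-bit version at the start of a struct taskstats payload."""
    (version,) = struct.unpack_from("=H", stats_payload, 0)
    return version


def usable_prefix(stats_payload, known_version, known_size):
    """Return (effective_version, bytes the client may interpret).

    A newer kernel appends fields the client does not know, so the client
    reads only its own prefix. An older kernel sends a shorter struct, so
    the client must not look past what was actually received.
    """
    version = stats_version(stats_payload)
    effective = min(version, known_version)
    return effective, stats_payload[:min(len(stats_payload), known_size)]
```

The client would then decode only the fields defined up to the effective
version.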
    159
    160Flow control for taskstats
    161--------------------------
    162
    163When the rate of task exits becomes large, a listener may not be able to keep
    164up with the kernel's rate of sending per-tid/tgid exit data leading to data
    165loss. This possibility gets compounded when the taskstats structure gets
    166extended and the number of cpus grows large.
    167
    168To avoid losing statistics, userspace should do one or more of the following:
    169
    170- increase the receive buffer sizes for the netlink sockets opened by
    171  listeners to receive exit data.
    172
    173- create more listeners and reduce the number of cpus being listened to by
    174  each listener. In the extreme case, there could be one listener for each cpu.
    175  Users may also consider setting the cpu affinity of the listener to the subset
    176  of cpus to which it listens, especially if they are listening to just one cpu.
    177
    178Despite these measures, if the userspace receives ENOBUFS error messages
    179indicated overflow of receive buffers, it should take measures to handle the
    180loss of data.
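
A listener might apply the first measure and detect overflow along these
lines. This is a sketch using the standard socket options; the function names
are hypothetical, and what to do after an overflow (for example re-querying
the tasks of interest) is left to the application.

```python
import errno
import socket


def grow_receive_buffer(sock, size):
    """Request a larger receive buffer and return the size now in effect.

    The kernel may clamp or adjust the requested value, so the result is
    read back rather than assumed.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def receive_exit_data(sock, bufsize=65536):
    """Return (data, overflowed).

    overflowed is True when the receive buffer overran (ENOBUFS), meaning
    some exit records were dropped before userspace could read them.
    """
    try:
        return sock.recv(bufsize), False
    except OSError as exc:
        if exc.errno == errno.ENOBUFS:
            return b"", True
        raise
```

Treating ENOBUFS as a signal rather than a fatal error lets the listener keep
running while it records that its view of exited tasks is incomplete.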