====================
Credentials in Linux
====================

By: David Howells <dhowells@redhat.com>

.. contents:: :local:

Overview
========

There are several parts to the security check performed by Linux when one
object acts upon another:

 1. Objects.

     Objects are things in the system that may be acted upon directly by
     userspace programs.  Linux has a variety of actionable objects, including:

	- Tasks
	- Files/inodes
	- Sockets
	- Message queues
	- Shared memory segments
	- Semaphores
	- Keys

     As a part of the description of all these objects there is a set of
     credentials.  What's in the set depends on the type of object.

 2. Object ownership.

     Amongst the credentials of most objects, there will be a subset that
     indicates the ownership of that object.  This is used for resource
     accounting and limitation (disk quotas and task rlimits for example).

     In a standard UNIX filesystem, for instance, this will be defined by the
     UID marked on the inode.

 3. The objective context.

     Also amongst the credentials of those objects, there will be a subset that
     indicates the 'objective context' of that object.  This may or may not be
     the same set as in (2); in standard UNIX files, for instance, this is
     defined by the UID and the GID marked on the inode.

     The objective context is used as part of the security calculation that is
     carried out when an object is acted upon.

 4. Subjects.

     A subject is an object that is acting upon another object.

     Most of the objects in the system are inactive: they don't act on other
     objects within the system.  Processes/tasks are the obvious exception:
     they do stuff; they access and manipulate things.

     Objects other than tasks may under some circumstances also be subjects.
     For instance, an open file may send SIGIO to a task using the UID and EUID
     given to it by the task that called ``fcntl(F_SETOWN)`` upon it.  In this
     case, the file struct will have a subjective context too.

 5. The subjective context.

     A subject has an additional interpretation of its credentials.  A subset
     of its credentials forms the 'subjective context'.  The subjective context
     is used as part of the security calculation that is carried out when a
     subject acts.

     A Linux task, for example, has the FSUID, FSGID and the supplementary
     group list for when it is acting upon a file, which are quite separate
     from the real UID and GID that normally form the objective context of the
     task.

 6. Actions.

     Linux has a number of actions available that a subject may perform upon an
     object.  The set of actions available depends on the nature of the subject
     and the object.

     Actions include reading, writing, creating and deleting files; forking,
     signalling and tracing tasks.

 7. Rules, access control lists and security calculations.

     When a subject acts upon an object, a security calculation is made.  This
     involves taking the subjective context, the objective context and the
     action, and searching one or more sets of rules to see whether the subject
     is granted or denied permission to act in the desired manner on the
     object, given those contexts.

     There are two main sources of rules:

     a. Discretionary access control (DAC):

	 Sometimes the object will include sets of rules as part of its
	 description.  This is an 'Access Control List' or 'ACL'.  A Linux
	 file may supply more than one ACL.

	 A traditional UNIX file, for example, includes a permissions mask that
	 is an abbreviated ACL with three fixed classes of subject ('user',
	 'group' and 'other'), each of which may be granted certain privileges
	 ('read', 'write' and 'execute', whatever those map to for the object
	 in question).  UNIX file permissions do not allow the arbitrary
	 specification of subjects, however, and so are of limited use.

	 A Linux file might also sport a POSIX ACL.  This is a list of rules
	 that grants various permissions to arbitrary subjects.

     b. Mandatory access control (MAC):

	 The system as a whole may have one or more sets of rules that get
	 applied to all subjects and objects, regardless of their source.
	 SELinux and Smack are examples of this.

	 In the case of SELinux and Smack, each object is given a label as part
	 of its credentials.  When an action is requested, they take the
	 subject label, the object label and the action and look for a rule
	 that says that this action is either granted or denied.

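The 'user', 'group' and 'other' classes of a traditional permissions mask can
be illustrated with a small model.  The following is a hypothetical,
simplified sketch, not kernel code: ``struct toy_subject`` and
``toy_dac_permits()`` are invented names, and POSIX ACLs, capabilities such as
CAP_DAC_OVERRIDE and any MAC policy are deliberately ignored.  It only shows a
subjective context (FSUID, FSGID and supplementary groups) being checked
against an objective context (the inode's UID, GID and mode)::

	#include <assert.h>
	#include <stdbool.h>
	#include <stddef.h>
	#include <sys/types.h>

	/* Hypothetical subjective context: the IDs used when acting on a file. */
	struct toy_subject {
		uid_t fsuid;
		gid_t fsgid;
		const gid_t *groups;	/* supplementary groups */
		size_t ngroups;
	};

	static bool toy_in_group(const struct toy_subject *s, gid_t gid)
	{
		if (s->fsgid == gid)
			return true;
		for (size_t i = 0; i < s->ngroups; i++)
			if (s->groups[i] == gid)
				return true;
		return false;
	}

	/*
	 * Simplified DAC check against an inode's UID, GID and mode.  'want' is
	 * an OR of 4 (read), 2 (write) and 1 (execute).  Exactly one class
	 * applies: an owner is judged by the owner bits alone, even if 'group'
	 * or 'other' would grant more.  ACLs, capabilities and LSMs are ignored.
	 */
	static bool toy_dac_permits(const struct toy_subject *s, uid_t i_uid,
				    gid_t i_gid, mode_t mode, unsigned int want)
	{
		unsigned int granted;

		assert(s != NULL);
		if (s->fsuid == i_uid)
			granted = (mode >> 6) & 7;
		else if (toy_in_group(s, i_gid))
			granted = (mode >> 3) & 7;
		else
			granted = mode & 7;

		return (want & ~granted) == 0;
	}

For example, a subject that owns a file of mode 0070 is refused read access
even if it is also in the file's group, because only the owner bits apply to
it.  In the kernel, the corresponding decision is made by the VFS permission
code (``generic_permission()`` for instance), which layers ACLs and capability
overrides on top.
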

Types of Credentials
====================

The Linux kernel supports the following types of credentials:

 1. Traditional UNIX credentials.

	- Real User ID
	- Real Group ID

     The UID and GID are carried by most, if not all, Linux objects, even if in
     some cases they have to be invented (FAT or CIFS files for example, which
     are derived from Windows).  These (mostly) define the objective context of
     that object, with tasks being slightly different in some cases.

	- Effective, Saved and FS User ID
	- Effective, Saved and FS Group ID
	- Supplementary groups

     These are additional credentials used by tasks only.  Usually, the
     EUID/EGID/GROUPS will be used as the subjective context, and the real
     UID/GID as the objective.  For tasks, it should be noted that this is not
     always true.

 2. Capabilities.

	- Set of permitted capabilities
	- Set of inheritable capabilities
	- Set of effective capabilities
	- Capability bounding set

     These are only carried by tasks.  They indicate superior capabilities
     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
     These are manipulated implicitly by changes to the traditional UNIX
     credentials, but can also be manipulated directly by the ``capset()``
     system call.

     The permitted capabilities are those caps that the process might grant
     itself in its effective set through ``capset()``.  They also constrain the
     inheritable set: without CAP_SETPCAP, a task may only add capabilities to
     its inheritable set that are already in its permitted set.

     The effective capabilities are the ones that a task is actually allowed to
     make use of itself.

     The inheritable capabilities are the ones that may get passed across
     ``execve()``.

     The bounding set limits the capabilities that may be inherited across
     ``execve()``, especially when a binary is executed that will execute as
     UID 0.

 3. Secure management flags (securebits).

     These are only carried by tasks.  These govern the way the above
     credentials are manipulated and inherited over certain operations such as
     ``execve()``.  They aren't used directly as objective or subjective
     credentials.

 4. Keys and keyrings.

     These are only carried by tasks.  They carry and cache security tokens
     that don't fit into the other standard UNIX credentials.  They are for
     making such things as network filesystem keys available to the file
     accesses performed by processes, without ordinary programs needing to
     know about the security details involved.

     Keyrings are a special type of key.  They carry sets of other keys and can
     be searched for the desired key.  Each process may subscribe to a number
     of keyrings:

	- Per-thread keyring
	- Per-process keyring
	- Per-session keyring

     When a process accesses a key, if not already present, it will normally be
     cached on one of these keyrings for future accesses to find.

     For more information on using keys, see ``Documentation/security/keys/*``.

 5. LSM

     The Linux Security Module allows extra controls to be placed over the
     operations that a task may do.  Currently Linux supports several LSM
     options.

     Some work by labelling the objects in a system and then applying sets of
     rules (policies) that say what operations a task with one label may do to
     an object with another label.

 6. AF_KEY

     This is a socket-based approach to credential management for networking
     stacks [RFC 2367].  It isn't discussed by this document as it doesn't
     interact directly with task and file credentials; rather, it keeps
     system-level credentials.


When a file is opened, part of the opening task's subjective context is
recorded in the file struct created.  This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation.  An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.

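The constraints that ``capset()`` places on the capability sets listed above
can be expressed as subset tests on bitmasks.  The sketch below is a
simplified, hypothetical model loosely following the checks in
``cap_capset()`` in security/commoncap.c; ``toy_cap_t``, ``struct toy_caps``
and ``toy_capset_check()`` are invented names, a 64-bit mask stands in for
``kernel_cap_t``, and ``has_setpcap`` stands in for the caller holding
CAP_SETPCAP::

	#include <assert.h>
	#include <errno.h>
	#include <stdbool.h>
	#include <stdint.h>

	typedef uint64_t toy_cap_t;	/* one bit per capability */

	struct toy_caps {
		toy_cap_t permitted;
		toy_cap_t inheritable;
		toy_cap_t effective;
		toy_cap_t bset;		/* bounding set */
	};

	static bool toy_subset(toy_cap_t a, toy_cap_t b)
	{
		return (a & ~b) == 0;
	}

	/* Returns 0 if 'new' may replace 'old', else -EPERM. */
	static int toy_capset_check(const struct toy_caps *old,
				    const struct toy_caps *new, bool has_setpcap)
	{
		assert(old && new);
		/* Without CAP_SETPCAP, pI may only gain caps already in pP. */
		if (!has_setpcap &&
		    !toy_subset(new->inheritable, old->inheritable | old->permitted))
			return -EPERM;
		/* pI may never gain caps outside the bounding set. */
		if (!toy_subset(new->inheritable, old->inheritable | old->bset))
			return -EPERM;
		/* pP can only shrink. */
		if (!toy_subset(new->permitted, old->permitted))
			return -EPERM;
		/* pE must be a subset of the new pP. */
		if (!toy_subset(new->effective, new->permitted))
			return -EPERM;
		return 0;
	}

The bounding set is only consulted here; ``capset()`` does not change it.
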

File Markings
=============

Files on disk or obtained over the network may have annotations that form the
objective security context of that file.  Depending on the type of filesystem,
this may include one or more of the following:

 * UNIX UID, GID, mode;
 * Windows user ID;
 * Access control list;
 * LSM security label;
 * UNIX exec privilege escalation bits (SUID/SGID);
 * File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result.  In the case of ``execve()``, the
privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.

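The UNIX privilege escalation bits are ordinary mode bits, so they can be
inspected from userspace with ``stat()``.  A minimal sketch, in which
``exec_escalation_bits()`` is a hypothetical helper name::

	#include <assert.h>
	#include <stdbool.h>
	#include <sys/stat.h>

	/* Report whether 'path' carries the set-user-ID and/or set-group-ID bits. */
	static int exec_escalation_bits(const char *path, bool *suid, bool *sgid)
	{
		struct stat st;

		assert(path && suid && sgid);
		if (stat(path, &st) < 0)
			return -1;	/* errno is set by stat() */
		*suid = (st.st_mode & S_ISUID) != 0;
		*sgid = (st.st_mode & S_ISGID) != 0;
		return 0;
	}

File capabilities are stored separately, in the ``security.capability``
extended attribute, and are not covered by this sketch.
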

Task Credentials
================

In Linux, all of a task's credentials are held either directly (uid, gid) or
through pointers (groups, keys, LSM security) in a refcounted structure of type
'struct cred'.  Each task points to its credentials by a pointer called 'cred'
in its task_struct.
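
The UID and GID fields of the current task's ``struct cred`` can be observed
from userspace through ``getresuid()``, ``getresgid()`` and ``getgroups()``.
The following sketch takes such a snapshot; ``struct cred_snapshot`` and
``take_cred_snapshot()`` are hypothetical names, and the sketch does not
expose the kernel structure itself::

	#define _GNU_SOURCE
	#include <assert.h>
	#include <unistd.h>

	/* Hypothetical userspace snapshot of some of the current credentials. */
	struct cred_snapshot {
		uid_t ruid, euid, suid;
		gid_t rgid, egid, sgid;
		int ngroups;		/* number of supplementary groups */
	};

	static int take_cred_snapshot(struct cred_snapshot *s)
	{
		assert(s);
		if (getresuid(&s->ruid, &s->euid, &s->suid) < 0 ||
		    getresgid(&s->rgid, &s->egid, &s->sgid) < 0)
			return -1;
		/* getgroups(0, NULL) returns the count without copying anything. */
		s->ngroups = getgroups(0, NULL);
		return s->ngroups < 0 ? -1 : 0;
	}

The FSUID and FSGID are not returned by these calls; on Linux they normally
track the effective IDs unless changed with ``setfsuid()`` or ``setfsgid()``.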

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

 1. its reference count may be changed;

 2. the reference count on the group_info struct it points to may be changed;

 3. the reference count on the security data it points to may be changed;

 4. the reference count on any keyrings it points to may be changed;

 5. any keyrings it points to may be revoked, expired or have their security
    attributes changed; and

 6. the contents of any keyrings to which it points may be changed (the whole
    point of keyrings being a shared set of credentials, modifiable by anyone
    with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to.  First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy.  There are wrappers to aid
with this (see below).
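
The copy-and-replace discipline can be modelled outside the kernel.  The
following single-threaded sketch is hypothetical and is not the kernel
implementation: ``struct toy_cred``, ``toy_prepare()``, ``toy_commit()`` and
``toy_abort()`` are invented names, a C11 atomic pointer stands in for the
RCU-managed ``task->cred``, no mutex is modelled, and the old copy is freed
immediately rather than after an RCU grace period, which is only acceptable
because there are no concurrent readers::

	#include <assert.h>
	#include <stdatomic.h>
	#include <stdlib.h>
	#include <sys/types.h>

	struct toy_cred {
		atomic_int usage;	/* reference count */
		uid_t uid, euid, suid;
	};

	/* Stands in for current->cred: readers only ever see complete copies. */
	static _Atomic(struct toy_cred *) toy_current;

	static void toy_put(struct toy_cred *c)
	{
		if (atomic_fetch_sub(&c->usage, 1) == 1)
			free(c);	/* real code would defer this via RCU */
	}

	/* Take a private, mutable copy of the published credentials. */
	static struct toy_cred *toy_prepare(void)
	{
		struct toy_cred *new = malloc(sizeof(*new));

		if (!new)
			return NULL;
		const struct toy_cred *old = atomic_load(&toy_current);
		assert(old);
		new->uid = old->uid;
		new->euid = old->euid;
		new->suid = old->suid;
		atomic_init(&new->usage, 1);
		return new;
	}

	/* Publish the copy; this consumes the caller's reference to 'new'. */
	static int toy_commit(struct toy_cred *new)
	{
		struct toy_cred *old = atomic_exchange(&toy_current, new);

		toy_put(old);
		return 0;
	}

	/* Discard a copy that was never published. */
	static void toy_abort(struct toy_cred *new)
	{
		toy_put(new);
	}

Readers that load ``toy_current`` only ever see a fully built set of
credentials; the only in-place writes happen on the private copy before it is
published.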

A task may only alter its *own* credentials; it is no longer permitted for a
task to alter another's credentials.  This means the ``capset()`` system call
is no longer permitted to take any PID other than the one of the current
process.  Also, the ``keyctl_instantiate()`` and ``keyctl_negate()`` functions
no longer permit attachment to process-specific keyrings in the requesting
process, as the instantiating process may need to create them.


Immutable Credentials
---------------------

Once a set of credentials has been made public (by calling ``commit_creds()``
for example), it must be considered immutable, barring two exceptions:

 1. The reference count may be altered.

 2. While the keyring subscriptions of a set of credentials may not be
    changed, the keyrings subscribed to may have their contents altered.

To catch accidental credential alteration at compile time, struct task_struct
has *const* pointers to its credential sets, as does struct file.  Furthermore,
certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
pointers, thus rendering casts unnecessary, but they must temporarily cast away
the const qualification internally to be able to alter the reference count.

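The const-pointer convention can be sketched as follows.  This is a
hypothetical userspace model (``struct ref_cred``, ``ref_get_cred()`` and
``ref_put_cred()`` are invented names) of reference-counting helpers that
accept ``const`` pointers and cast the qualifier away internally, solely to
adjust the count::

	#include <assert.h>
	#include <stdatomic.h>
	#include <stdlib.h>
	#include <sys/types.h>

	struct ref_cred {
		atomic_int usage;
		uid_t uid;
	};

	static const struct ref_cred *ref_get_cred(const struct ref_cred *cred)
	{
		/* Cast away const only to bump the count; nothing else changes. */
		struct ref_cred *nonconst = (struct ref_cred *)cred;

		assert(cred);
		atomic_fetch_add(&nonconst->usage, 1);
		return cred;
	}

	static void ref_put_cred(const struct ref_cred *cred)
	{
		struct ref_cred *nonconst = (struct ref_cred *)cred;

		if (atomic_fetch_sub(&nonconst->usage, 1) == 1)
			free(nonconst);
	}

A caller holding only a ``const struct ref_cred *`` that tried to assign to
``->uid`` would be rejected by the compiler, which is the compile-time
protection described above.  Casting away ``const`` is only well defined here
because the objects are allocated, never defined ``const``.
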

Accessing Task Credentials
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of locking,
which simplifies things greatly.  It can just call::

	const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case)::

	kuid_t current_uid(void)	Current's real UID
	kgid_t current_gid(void)	Current's real GID
	kuid_t current_euid(void)	Current's effective UID
	kgid_t current_egid(void)	Current's effective GID
	kuid_t current_fsuid(void)	Current's file access UID
	kgid_t current_fsgid(void)	Current's file access GID
	kernel_cap_t current_cap(void)	Current's effective capabilities
	struct user_struct *current_user(void)  Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials::

	void current_uid_gid(kuid_t *, kgid_t *);
	void current_euid_egid(kuid_t *, kgid_t *);
	void current_fsuid_fsgid(kuid_t *, kgid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.


In addition, there is a function for obtaining a reference on the current
process's current set of credentials::

	const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred::

	struct user_struct *get_current_user(void);
	struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with ``put_cred()``,
``free_uid()`` or ``put_group_info()`` as appropriate.


Accessing Another Task's Credentials
------------------------------------

While a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials.  It
must use the RCU read lock and ``rcu_dereference()``.

The ``rcu_dereference()`` call is wrapped by::

	const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example::

	void foo(struct task_struct *t, struct foo_data *f)
	{
		const struct cred *tcred;
		...
		rcu_read_lock();
		tcred = __task_cred(t);
		f->uid = tcred->uid;
		f->gid = tcred->gid;
		/* groups is a pointer, so pin it before the lock is dropped */
		f->groups = get_group_info(tcred->groups);
		rcu_read_unlock();
		...
	}

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep while doing so, then the caller should get a
reference on them using::

	const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic internally.  The caller must call ``put_cred()`` on
the credentials so obtained when finished with them.

.. note::
   The result of ``__task_cred()`` should not be passed directly to
   ``get_cred()`` as this may race with ``commit_creds()``.

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller::

	kuid_t task_uid(task)		Task's real UID
	kuid_t task_euid(task)		Task's effective UID

If the caller is holding the RCU read lock at the time anyway, then::

	__task_cred(task)->uid
	__task_cred(task)->euid

should be used instead.  Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock should be taken, ``__task_cred()``
called, the result stored in a temporary pointer and then the credential
aspects read from that before dropping the lock.  This prevents the potentially
expensive RCU magic from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used::

	task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct.  For instance::

	kuid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic.  This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.

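Userspace has no equivalent of ``__task_cred()``, but the kernel reports a
task's UIDs in ``/proc/<pid>/status``, whose ``Uid:`` line lists the real,
effective, saved and filesystem UIDs in that order.  A minimal parsing sketch,
with ``proc_task_uids()`` as a hypothetical helper name::

	#include <assert.h>
	#include <stdio.h>
	#include <sys/types.h>
	#include <unistd.h>

	/*
	 * Read the real, effective, saved and FS UIDs of process 'pid' from
	 * /proc/<pid>/status.  Returns 0 on success, -1 on failure.
	 */
	static int proc_task_uids(pid_t pid, unsigned long ids[4])
	{
		char path[64], line[256];
		FILE *f;
		int ret = -1;

		assert(ids);
		snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
		f = fopen(path, "r");
		if (!f)
			return -1;
		while (fgets(line, sizeof(line), f)) {
			if (sscanf(line, "Uid: %lu %lu %lu %lu",
				   &ids[0], &ids[1], &ids[2], &ids[3]) == 4) {
				ret = 0;
				break;
			}
		}
		fclose(f);
		return ret;
	}

For example, ``proc_task_uids(getpid(), ids)`` reads the caller's own entry.
As with ``task_cred_xxx()``, the values are only a snapshot: the target may
change its credentials immediately after the file has been read.
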

Altering Credentials
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task.  This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling::

	struct cred *prepare_creds(void);

This locks ``current->cred_replace_mutex`` and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful.  It returns NULL if not successful (out of memory).

The mutex prevents ``ptrace()`` from altering the ptrace state of a process
while security checks on credential construction and changing are taking place,
as the ptrace state may alter the outcome, particularly in the case of
``execve()``.

The new credentials set should be altered appropriately, and any security
checks and hooks done.  Both the current and the proposed sets of credentials
are available for this purpose, as ``current_cred()`` will still return the
current set at this point.

When replacing the group list, the new list must be sorted before it is added
to the credential, as a binary search is used to test for membership.  In
practice, this means ``groups_sort()`` should be called before ``set_groups()``
or ``set_current_groups()``.  ``groups_sort()`` must not be called on a
``struct group_info`` which is shared, as it may permute elements as part of
the sorting process even if the array is already sorted.
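
The reason for sorting is that membership is then tested with a binary search.
The following userspace sketch illustrates the same idea with ``qsort()`` and
``bsearch()``; ``gid_cmp()``, ``sorted_group_copy()`` and
``in_sorted_groups()`` are hypothetical names, and the sketch sorts a private
copy so that a possibly shared input array is never permuted::

	#include <assert.h>
	#include <stdbool.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/types.h>

	static int gid_cmp(const void *a, const void *b)
	{
		gid_t x = *(const gid_t *)a, y = *(const gid_t *)b;

		return (x > y) - (x < y);	/* avoids overflow of x - y */
	}

	/* Return a sorted private copy of 'groups'; the caller frees it. */
	static gid_t *sorted_group_copy(const gid_t *groups, size_t n)
	{
		gid_t *copy;

		assert(groups || n == 0);
		copy = malloc((n ? n : 1) * sizeof(*copy));
		if (!copy)
			return NULL;
		if (n)
			memcpy(copy, groups, n * sizeof(*copy));
		qsort(copy, n, sizeof(*copy), gid_cmp);
		return copy;
	}

	static bool in_sorted_groups(const gid_t *sorted, size_t n, gid_t gid)
	{
		return bsearch(&gid, sorted, n, sizeof(*sorted), gid_cmp) != NULL;
	}
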

When the credential set is ready, it should be committed to the current process
by calling::

	int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise.  It will then use ``rcu_assign_pointer()`` to
actually commit the new credentials to ``current->cred``, release
``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and
notify the scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as ``sys_setresuid()``.

Note that this function consumes the caller's reference to the new credentials.
The caller should *not* call ``put_cred()`` on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may *not* be changed further.


Should the security checks fail or some other error occur after
``prepare_creds()`` has been called, then the following function should be
invoked::

	void abort_creds(struct cred *new);

This releases the lock on ``current->cred_replace_mutex`` that
``prepare_creds()`` got and then releases the new credentials.


A typical credentials alteration function would look something like this::

	int alter_suid(kuid_t suid)
	{
		struct cred *new;
		int ret;

		new = prepare_creds();
		if (!new)
			return -ENOMEM;

		new->suid = suid;
		/* security_alter_suid() stands in for whatever LSM hook applies */
		ret = security_alter_suid(new);
		if (ret < 0) {
			abort_creds(new);
			return ret;
		}

		return commit_creds(new);
	}

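From userspace, such an alteration is requested through a system call like
``setresuid()``, which is implemented with this prepare/check/commit pattern.
As a sketch (``set_saved_uid()`` is a hypothetical wrapper), an unprivileged
process may set its saved UID to its current effective UID, since each of the
three UIDs may be set to a value the process already holds, and passing -1
leaves a field unchanged::

	#define _GNU_SOURCE
	#include <assert.h>
	#include <unistd.h>

	/* Hypothetical wrapper: change only the saved UID. */
	static int set_saved_uid(uid_t suid)
	{
		/* -1 would mean "leave unchanged", so require a real UID. */
		assert(suid != (uid_t)-1);
		return setresuid((uid_t)-1, (uid_t)-1, suid);
	}
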

Managing Credentials
--------------------

There are some functions to help manage credentials:

 - ``void put_cred(const struct cred *cred);``

     This releases a reference to the given set of credentials.  If the
     reference count reaches zero, the credentials will be scheduled for
     destruction by the RCU system.

 - ``const struct cred *get_cred(const struct cred *cred);``

     This gets a reference on a live set of credentials, returning a pointer to
     that set of credentials.

 - ``struct cred *get_new_cred(struct cred *cred);``

     This gets a reference on a set of credentials that is under construction
     and is thus still mutable, returning a pointer to that set of credentials.


Open File Credentials
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as ``f_cred`` in place of
``f_uid`` and ``f_gid``.  Code that used to access ``file->f_uid`` and
``file->f_gid`` should now access ``file->f_cred->fsuid`` and
``file->f_cred->fsgid``.

It is safe to access ``f_cred`` without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).

To avoid "confused deputy" privilege escalation attacks, access control checks
during subsequent operations on an opened file should use these credentials
instead of "current"'s credentials, as the file may have been passed to a more
privileged process.
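
A related effect can be observed from userspace: DAC permissions are checked
when a file is opened, not on every read, so a descriptor keeps working after
the file's mode is tightened.  This hypothetical sketch
(``read_after_chmod()`` is an invented name) demonstrates that behaviour; it
does not inspect ``f_cred`` itself::

	#include <assert.h>
	#include <fcntl.h>
	#include <stdlib.h>
	#include <sys/stat.h>
	#include <unistd.h>

	/*
	 * Create a temporary file, open it for reading, remove all permission
	 * bits, then read through the descriptor obtained earlier.  Returns the
	 * number of bytes read, or -1 on setup failure.
	 */
	static ssize_t read_after_chmod(void)
	{
		char path[] = "/tmp/cred-demo-XXXXXX";
		char buf[8];
		ssize_t n = -1;
		int wfd, rfd;

		wfd = mkstemp(path);
		if (wfd < 0)
			return -1;
		if (write(wfd, "hello", 5) != 5)
			goto out_close;
		rfd = open(path, O_RDONLY);	/* permission is checked here */
		if (rfd < 0)
			goto out_close;
		if (chmod(path, 0) == 0)	/* mode 0000 from now on */
			n = read(rfd, buf, sizeof(buf));
		close(rfd);
	out_close:
		close(wfd);
		unlink(path);
		return n;
	}
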

Overriding the VFS's Use of Credentials
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as
``vfs_mkdir()`` with a different set of credentials.  This is done in the
following places:

 * ``sys_faccessat()``.
 * ``do_coredump()``.
 * nfs4recover.c.
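
The ``sys_faccessat()`` case is visible from userspace: ``access()`` checks a
path against the caller's real UID and GID, which the kernel does by
temporarily overriding the task's credentials, whereas ``faccessat()`` with
``AT_EACCESS`` asks about the effective IDs.  A small sketch, with
``check_both_ways()`` as a hypothetical helper name::

	#define _GNU_SOURCE
	#include <assert.h>
	#include <fcntl.h>
	#include <unistd.h>

	/*
	 * Check 'mode' (R_OK, W_OK, X_OK or F_OK) for 'path' twice: once with
	 * the real IDs, as access() does, and once with the effective IDs.
	 */
	static void check_both_ways(const char *path, int mode,
				    int *real_ok, int *eff_ok)
	{
		assert(path && real_ok && eff_ok);
		*real_ok = access(path, mode) == 0;
		*eff_ok = faccessat(AT_FDCWD, path, mode, AT_EACCESS) == 0;
	}

The two answers typically differ only when the real and effective IDs differ,
as in a set-user-ID program.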