unshare system call
===================

This document describes the new system call, unshare(). The document
provides an overview of the feature, why it is needed, how it can
be used, its interface specification, design, implementation and
how it can be tested.

Change Log
----------
version 0.1  Initial document, Janak Desai (janak@us.ibm.com), Jan 11, 2006

Contents
--------
	1) Overview
	2) Benefits
	3) Cost
	4) Requirements
	5) Functional Specification
	6) High Level Design
	7) Low Level Design
	8) Test Specification
	9) Future Work

1) Overview
-----------

Most legacy operating system kernels support an abstraction of threads
as multiple execution contexts within a process. These kernels provide
special resources and mechanisms to maintain these "threads". The Linux
kernel, in a clever and simple manner, makes no distinction
between processes and "threads". The kernel allows processes to share
resources and thus they can achieve legacy "threads" behavior without
requiring additional data structures and mechanisms in the kernel. The
power of implementing threads in this manner comes not only from
its simplicity but also from allowing application programmers to work
outside the confinement of all-or-nothing shared resources of legacy
threads. On Linux, at the time of thread creation using the clone system
call, applications can selectively choose which resources to share
between threads.

The unshare() system call adds a primitive to the Linux thread model that
allows threads to selectively 'unshare' any resources that were being
shared at the time of their creation. unshare() was conceptualized by
Al Viro in August 2000, on the Linux-Kernel mailing list, as part
of the discussion on POSIX threads on Linux. unshare() augments the
usefulness of Linux threads for applications that would like to control
shared resources without creating a new process. unshare() is a natural
addition to the set of available primitives on Linux that implement
the concept of process/thread as a virtual machine.

2) Benefits
-----------

unshare() would be useful to large application frameworks such as PAM
where creating a new process to control sharing/unsharing of process
resources is not possible. Since namespaces are shared by default
when creating a new process using fork or clone, unshare() can benefit
even non-threaded applications if they have a need to disassociate
from the default shared namespace. The following are two use cases
where unshare() can be used.

2.1 Per-security context namespaces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

unshare() can be used to implement polyinstantiated directories using
the kernel's per-process namespace mechanism. Polyinstantiated directories,
such as per-user and/or per-security context instance of /tmp, /var/tmp or
per-security context instance of a user's home directory, isolate user
processes when working with these directories. Using unshare(), a PAM
module can easily set up a private namespace for a user at login.
Polyinstantiated directories are required for Common Criteria certification
with Labeled System Protection Profile; however, with the availability
of the shared-tree feature in the Linux kernel, even regular Linux systems
can benefit from setting up private namespaces at login and
polyinstantiating /tmp, /var/tmp and other directories deemed
appropriate by system administrators.
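As a minimal sketch of the login-time setup described above, assuming the
caller holds CAP_SYS_ADMIN and that a per-user instance directory already
exists, a PAM module could do something along these lines. The function
name enter_private_tmp() and the instance_dir parameter are hypothetical,
and the mount(2) call that marks the tree private is an assumption of this
sketch rather than something this document prescribes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Give the calling process a private mount namespace and bind-mount
 * instance_dir over /tmp inside it. Returns 0 on success or a negative
 * errno value. instance_dir is a hypothetical, pre-created directory.
 */
int enter_private_tmp(const char *instance_dir)
{
	/* Detach from the namespace inherited at fork()/clone() time. */
	if (unshare(CLONE_NEWNS) == -1)
		return -errno;

	/*
	 * Assumption of this sketch: keep later mounts from propagating
	 * back to the original namespace when "/" is a shared mount.
	 */
	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
		return -errno;

	/* From here on, only this process tree sees instance_dir at /tmp. */
	if (mount(instance_dir, "/tmp", NULL, MS_BIND, NULL) == -1)
		return -errno;

	return 0;
}
```

A caller without CAP_SYS_ADMIN should expect the first step to fail with
EPERM, in which case nothing has been changed.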

2.2 Unsharing of virtual memory and/or open files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider a client/server application where the server is processing
client requests by creating processes that share resources such as
virtual memory and open files. Without unshare(), the server has to
decide what needs to be shared at the time of creating the process
which services the request. unshare() gives the server the ability to
disassociate parts of the context during the servicing of the
request. For large and complex middleware application frameworks, this
ability to unshare() after the process was created can be very
useful.

3) Cost
-------

In order to not duplicate code and to handle the fact that unshare()
works on an active task (as opposed to clone/fork working on a newly
allocated inactive task), unshare() had to make minor reorganizational
changes to the copy_* functions utilized by the clone/fork system calls.
There is a cost associated with altering existing, well tested and
stable code to implement a new feature that may not get exercised
extensively in the beginning. However, with proper design and code
review of the changes and creation of an unshare() test for the LTP,
the benefits of this new feature can exceed its cost.

4) Requirements
---------------

unshare() reverses sharing that was done using the clone(2) system call,
so unshare() should have an interface similar to clone(2). That is,
since flags in clone(int flags, void \*stack) specify what should
be shared, similar flags in unshare(int flags) should specify
what should be unshared. Unfortunately, this may appear to invert
the meaning of the flags from the way they are used in clone(2).
However, there was no easy solution that was less confusing and that
allowed incremental context unsharing in the future without an ABI change.

The unshare() interface should accommodate possible future addition of
new context flags without requiring a rebuild of old applications.
If and when new context flags are added, the unshare() design should allow
incremental unsharing of those resources on an as-needed basis.
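The inverted meaning of the flags can be seen in a small experiment. The
sketch below is illustrative and not taken from the unshare() patch: it
creates a child with clone(CLONE_FS) so that both tasks share file system
information, then has the child call unshare(CLONE_FS) before changing
directory. The helper names and the stack size are hypothetical choices.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Child: starts out sharing cwd, root and umask with the parent. */
static int child_unshares_fs(void *arg)
{
	(void)arg;
	if (unshare(CLONE_FS) == -1)
		return 2;	/* unshare() refused: leave shared state alone */
	if (chdir("/") == -1)	/* affects only the child's private copy */
		return 3;
	return 0;
}

/*
 * Returns 1 if the parent's working directory is unchanged after the
 * child's chdir(), 0 if it changed, -1 if the experiment could not run.
 */
int parent_cwd_survives_unshare(void)
{
	char before[PATH_MAX], after[PATH_MAX];
	char *stack;
	int status;
	pid_t pid;

	if (getcwd(before, sizeof(before)) == NULL)
		return -1;
	stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;

	/* In clone(), CLONE_FS means "share". The stack grows downwards. */
	pid = clone(child_unshares_fs, stack + CHILD_STACK_SIZE,
		    CLONE_FS | SIGCHLD, NULL);
	if (pid == -1 || waitpid(pid, &status, 0) == -1) {
		free(stack);
		return -1;
	}
	free(stack);

	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	if (getcwd(after, sizeof(after)) == NULL)
		return -1;
	return strcmp(before, after) == 0;
}
```

The same constant, CLONE_FS, asks clone() to share and asks unshare() to
stop sharing, which is the inversion the requirement above accepts.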

5) Functional Specification
---------------------------

NAME
	unshare - disassociate parts of the process execution context

SYNOPSIS
	#include <sched.h>

	int unshare(int flags);

DESCRIPTION
	unshare() allows a process to disassociate parts of its execution
	context that are currently being shared with other processes. Part
	of the execution context, such as the namespace, is shared by default
	when a new process is created using fork(2), while other parts,
	such as the virtual memory, open file descriptors, etc., may be
	shared by explicit request to share them when creating a process
	using clone(2).

	The main use of unshare() is to allow a process to control its
	shared execution context without creating a new process.

	The flags argument specifies one of, or the bitwise OR of several
	of, the following constants.

	CLONE_FS
		If CLONE_FS is set, file system information of the caller
		is disassociated from the shared file system information.

	CLONE_FILES
		If CLONE_FILES is set, the file descriptor table of the
		caller is disassociated from the shared file descriptor
		table.

	CLONE_NEWNS
		If CLONE_NEWNS is set, the namespace of the caller is
		disassociated from the shared namespace.

	CLONE_VM
		If CLONE_VM is set, the virtual memory of the caller is
		disassociated from the shared virtual memory.

RETURN VALUE
	On success, zero is returned. On failure, -1 is returned and errno
	is set to indicate the error.

ERRORS
	EPERM	CLONE_NEWNS was specified by a non-root process (process
		without CAP_SYS_ADMIN).

	ENOMEM	Cannot allocate sufficient memory to copy parts of caller's
		context that need to be unshared.

	EINVAL	Invalid flag was specified as an argument.

CONFORMING TO
	The unshare() call is Linux-specific and should not be used
	in programs intended to be portable.

SEE ALSO
	clone(2), fork(2)
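A short sketch of calling the interface specified above and interpreting
its result may help. Both helpers below are hypothetical names introduced
for illustration. With current C libraries, _GNU_SOURCE must be defined
before including <sched.h> for the unshare() prototype and the CLONE_*
constants to be visible.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Call unshare() and fold the -1/errno convention into one value. */
int try_unshare(int flags)
{
	return unshare(flags) == -1 ? -errno : 0;
}

/* Map the errno values listed under ERRORS to a short explanation. */
const char *unshare_strerror(int err)
{
	switch (err) {
	case EPERM:
		return "CLONE_NEWNS requested without CAP_SYS_ADMIN";
	case ENOMEM:
		return "not enough memory to copy the unshared context";
	case EINVAL:
		return "invalid flag in the flags argument";
	default:
		return "error not listed in this document";
	}
}
```

A caller might write rc = try_unshare(CLONE_FILES) and, when rc is
negative, report unshare_strerror(-rc) before continuing with the shared
file descriptor table.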

6) High Level Design
--------------------

Depending on the flags argument, the unshare() system call allocates
appropriate process context structures, populates them with values from
the current shared version, associates newly duplicated structures
with the current task structure and releases corresponding shared
versions. Helper functions of clone (copy_*) could not be used
directly by unshare() for the following two reasons.

  1) clone operates on a newly allocated not-yet-active task
     structure, whereas unshare() operates on the current active
     task. Therefore unshare() has to take appropriate task_lock()
     before associating newly duplicated context structures.

  2) unshare() has to allocate and duplicate all context structures
     that are being unshared, before associating them with the
     current task and releasing older shared structures. Failure
     to do so will create race conditions and/or oops when trying
     to back out due to an error. Consider the case of unsharing
     both virtual memory and namespace. After successfully unsharing
     vm, if the system call encounters an error while allocating
     new namespace structure, the error return path will have to
     reverse the unsharing of vm. As part of the reversal the
     system call will have to go back to older, shared, vm
     structure, which may not exist anymore.

Therefore code from copy_* functions that allocated and duplicated
current context structure was moved into new dup_* functions. Now,
copy_* functions call dup_* functions to allocate and duplicate
appropriate context structures and then associate them with the
task structure that is being constructed. The unshare() system call,
on the other hand, performs the following:

  1) Check flags to force missing, but implied, flags

  2) For each context structure, call the corresponding unshare()
     helper function to allocate and duplicate a new context
     structure, if the appropriate bit is set in the flags argument.

  3) If there is no error in allocation and duplication and there
     are new context structures, then lock the current task structure,
     associate new context structures with the current task structure,
     and release the lock on the current task structure.

  4) Appropriately release older, shared, context structures.
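The ordering in steps 2 to 4 can be illustrated with a small user-space
model. This is a sketch under stated assumptions, not kernel code: struct
ctx, struct task, the MODEL_* flags and model_unshare() are all
hypothetical, and the locking of step 3 is only marked by a comment.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for kernel context structures. */
struct ctx {
	int refcount;	/* number of tasks sharing this context */
	int payload;	/* placeholder for the real contents */
};

struct task {
	struct ctx *fs;
	struct ctx *files;
};

#define MODEL_FS    0x1	/* model-only bits, not the CLONE_* values */
#define MODEL_FILES 0x2

/* Allocate a private copy; nothing is attached to the task yet. */
static struct ctx *dup_ctx(const struct ctx *old)
{
	struct ctx *copy = malloc(sizeof(*copy));

	if (copy != NULL) {
		copy->refcount = 1;
		copy->payload = old->payload;
	}
	return copy;
}

/* Drop one reference; free the context when the last user is gone. */
static void put_ctx(struct ctx *c)
{
	if (c != NULL && --c->refcount == 0)
		free(c);
}

/* Returns 0 on success, -1 on failure with the task left untouched. */
int model_unshare(struct task *t, int flags)
{
	struct ctx *new_fs = NULL, *new_files = NULL;
	struct ctx *old_fs = NULL, *old_files = NULL;

	/* Step 2: duplicate everything that was requested. */
	if ((flags & MODEL_FS) && (new_fs = dup_ctx(t->fs)) == NULL)
		goto fail;
	if ((flags & MODEL_FILES) && (new_files = dup_ctx(t->files)) == NULL)
		goto fail;

	/* Step 3: commit. The kernel holds task_lock() around this. */
	if (new_fs != NULL) {
		old_fs = t->fs;
		t->fs = new_fs;
	}
	if (new_files != NULL) {
		old_files = t->files;
		t->files = new_files;
	}

	/* Step 4: only now drop the references to the shared copies. */
	put_ctx(old_fs);
	put_ctx(old_files);
	return 0;

fail:
	/* Nothing was attached, so backing out is just freeing copies. */
	free(new_fs);
	free(new_files);
	return -1;
}
```

Because no pointer in the task changes until every duplication has
succeeded, the failure path never needs to return to an older shared
structure that may already have been released.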

7) Low Level Design
-------------------

Implementation of unshare() can be grouped into the following four
items:

  a) Reorganization of existing copy_* functions

  b) unshare() system call service function

  c) unshare() helper functions for each different process context

  d) Registration of system call number for different architectures

7.1) Reorganization of copy_* functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each copy function such as copy_mm, copy_namespace, copy_files,
etc., had roughly two components. The first component allocated
and duplicated the appropriate structure and the second component
linked it to the task structure passed in as an argument to the copy
function. The first component was split into its own function.
These dup_* functions allocated and duplicated the appropriate
context structure. The reorganized copy_* functions invoked
their corresponding dup_* functions and then linked the newly
duplicated structures to the task structure with which the
copy function was called.
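The split can be pictured with a toy example. The structures and the
MODEL_SHARE_FS flag below are hypothetical and far simpler than their
kernel counterparts; the point is only that dup_fs() knows nothing about
tasks, while copy_fs() decides between sharing and duplicating and does
the linking.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a context structure and a task. */
struct fs_ctx {
	int users;	/* reference count */
	char cwd[64];
};

struct task {
	struct fs_ctx *fs;
};

#define MODEL_SHARE_FS 0x1	/* model-only flag standing in for CLONE_FS */

/* First component: allocate and duplicate, independent of any task. */
struct fs_ctx *dup_fs(const struct fs_ctx *old)
{
	struct fs_ctx *copy = malloc(sizeof(*copy));

	if (copy != NULL) {
		copy->users = 1;
		memcpy(copy->cwd, old->cwd, sizeof(copy->cwd));
	}
	return copy;
}

/* Second component: link a context to the task being constructed. */
int copy_fs(int flags, const struct task *parent, struct task *child)
{
	if (flags & MODEL_SHARE_FS) {
		parent->fs->users++;	/* share: just take a reference */
		child->fs = parent->fs;
		return 0;
	}
	child->fs = dup_fs(parent->fs);	/* private copy via the helper */
	return child->fs != NULL ? 0 : -1;
}
```

With the duplication isolated in dup_fs(), a caller that works on an
already running task can reuse it without going through copy_fs().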

7.2) unshare() system call service function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

       * Check flags
         Force implied flags. If CLONE_THREAD is set, force CLONE_VM.
         If CLONE_VM is set, force CLONE_SIGHAND. If CLONE_SIGHAND is
         set and signals are also being shared, force CLONE_THREAD. If
         CLONE_NEWNS is set, force CLONE_FS.

       * For each context flag, invoke the corresponding unshare_*
         helper routine with flags passed into the system call and a
         reference to a pointer to the new unshared structure.

       * If any new structures are created by unshare_* helper
         functions, take the task_lock() on the current task,
         modify appropriate context pointers, and release the
         task lock.

       * For all newly unshared structures, release the corresponding
         older, shared, structures.
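The flag check in the first item above can be modeled in user space as
follows. This is a sketch that mirrors the order given in the text; the
function name force_implied_flags() and the signals_shared parameter are
hypothetical, with the parameter standing in for the kernel's own test of
whether the caller's signal state has other users.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Return flags with every implied flag from the list above forced on. */
unsigned long force_implied_flags(unsigned long flags, bool signals_shared)
{
	/* Unsharing the thread group implies unsharing the vm. */
	if (flags & CLONE_THREAD)
		flags |= CLONE_VM;

	/* Unsharing the vm implies unsharing signal handlers. */
	if (flags & CLONE_VM)
		flags |= CLONE_SIGHAND;

	/* Shared signals cannot stay shared once handlers are split. */
	if ((flags & CLONE_SIGHAND) && signals_shared)
		flags |= CLONE_THREAD;

	/* A new namespace implies private file system information. */
	if (flags & CLONE_NEWNS)
		flags |= CLONE_FS;

	return flags;
}
```

For example, a request for CLONE_NEWNS alone comes back as
CLONE_NEWNS | CLONE_FS, so the helpers that follow see both bits.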

7.3) unshare_* helper functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The unshare_* helpers corresponding to CLONE_SYSVSEM, CLONE_SIGHAND,
and CLONE_THREAD return -EINVAL, since they are not implemented yet.
The other helpers check the flag value to see if the unsharing is
required for that structure. If it is, they invoke the corresponding
dup_* function to allocate and duplicate the structure and return
a pointer to it.

7.4) Finally
~~~~~~~~~~~~

Appropriately modify architecture specific code to register the
new system call.

8) Test Specification
---------------------

The test for unshare() should test the following:

  1) Valid flags: Test to check that clone flags for signals and
     signal handlers, for which unsharing is not implemented
     yet, return -EINVAL.

  2) Missing/implied flags: Test to make sure that unsharing the
     namespace without specifying unsharing of the filesystem correctly
     unshares both namespace and filesystem information.

  3) For each of the four supported kinds of unsharing (namespace,
     filesystem, files and vm), verify that the system call correctly
     unshares the appropriate structure. Verify that unsharing
     them individually as well as in combination with each
     other works as expected.

  4) Concurrent execution: Use shared memory segments and futex on
     an address in the shm segment to synchronize execution of
     about 10 threads. Have a couple of threads execute execve,
     a couple _exit and the rest unshare with different combinations
     of flags. Verify that unsharing is performed as expected and
     that there are no oops or hangs.

9) Future Work
--------------

The current implementation of unshare() does not allow unsharing of
signals and signal handlers. Signals are complex to begin with and
to unshare signals and/or signal handlers of a currently running
process is even more complex. If in the future there is a specific
need to allow unsharing of signals and/or signal handlers, it can
be incrementally added to unshare() without affecting legacy
applications using unshare().