.. SPDX-License-Identifier: GPL-2.0

Idmappings
==========

Most filesystem developers will have encountered idmappings. They are used when
reading ownership from or writing ownership to disk, reporting ownership to
userspace, or for permission checking. This document is aimed at filesystem
developers who want to know how idmappings work.

Formal notes
------------

An idmapping is essentially a translation of a range of ids into another or the
same range of ids. The notational convention for idmappings that is widely used
in userspace is::

 u:k:r

``u`` indicates the first element in the upper idmapset ``U`` and ``k``
indicates the first element in the lower idmapset ``K``. The ``r`` parameter
indicates the range of the idmapping, i.e. how many ids are mapped. From now
on, we will always prefix ids with ``u`` or ``k`` to make it clear whether
we're talking about an id in the upper or lower idmapset.

To see what this looks like in practice, let's take the following idmapping::

 u22:k10000:r3

and write down the mappings it will generate::

 u22 -> k10000
 u23 -> k10001
 u24 -> k10002

From a mathematical viewpoint, ``U`` and ``K`` are well-ordered sets and an
idmapping is an order isomorphism from ``U`` into ``K``. So ``U`` and ``K`` are
order isomorphic. In fact, ``U`` and ``K`` are always well-ordered subsets of
the set of all possible ids usable on a given system.

Briefly looking at this mathematically will help us highlight some properties
that make it easier to understand how we can translate between idmappings. For
example, we know that the inverse idmapping is an order isomorphism as well::

 k10000 -> u22
 k10001 -> u23
 k10002 -> u24

Given that we are dealing with order isomorphisms between subsets, we can embed
idmappings into each other, i.e. we can sensibly translate between different
idmappings. For example, assume we've been given the three idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r10000
 3. u0:k30000:r10000

and the id ``k11000``, which has been generated by the first idmapping by
mapping ``u1000`` from the upper idmapset down to ``k11000`` in the lower
idmapset.

Because we're dealing with order isomorphic subsets, it is meaningful to ask
what id ``k11000`` corresponds to in the second or third idmapping. The
straightforward algorithm is to apply the inverse of the first idmapping,
mapping ``k11000`` up to ``u1000``. Afterwards, we can map ``u1000`` down using
either the second or the third idmapping. The second idmapping would map
``u1000`` down to ``k21000``. The third idmapping would map ``u1000`` down to
``k31000``.

If we were given the same task for the following three idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r200
 3. u0:k30000:r300

we would fail to translate, as the sets aren't order isomorphic over the full
range of the first idmapping anymore (however, they are order isomorphic over
the full range of the second idmapping). Neither the second nor the third
idmapping contains ``u1000`` in the upper idmapset ``U``. This is equivalent to
not having an id mapped. We can simply say that ``u1000`` is unmapped in the
second and third idmapping. Internally, the kernel represents an unmapped id as
``(uid_t)-1`` or ``(gid_t)-1``; when reporting an unmapped id to userspace, it
substitutes the overflowuid or overflowgid, which both default to 65534.

The algorithm to calculate what a given id maps to is pretty simple. First, we
need to verify that the range can contain our target id. We will skip this step
for simplicity. After that, if we want to know what ``id`` maps to, we can do
simple calculations:

- If we want to map from left to right::

   u:k:r
   id - u + k = n

- If we want to map from right to left::

   u:k:r
   id - k + u = n

Instead of "left to right" we can also say "down" and instead of "right to
left" we can also say "up". Obviously, mapping down and up invert each other.

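The two formulas, together with the range check skipped above, can be sketched as a small userspace model. This is a hedged illustration, not kernel code: ``struct idmap``, ``map_down()``, ``map_up()``, ``print_idmap()`` and the ``ID_UNMAPPED`` sentinel are hypothetical names, and each idmapping is assumed to consist of a single ``u:k:r`` extent of 32-bit ids.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical single-extent idmapping "u:k:r" (not a kernel structure). */
struct idmap {
	uint32_t u; /* first id in the upper idmapset */
	uint32_t k; /* first id in the lower idmapset */
	uint32_t r; /* number of ids that are mapped */
};

/* Sentinel for "no mapping", mirroring (uid_t)-1. */
#define ID_UNMAPPED ((uint32_t)-1)

/* Map down (left to right): n = id - u + k, valid only if u <= id < u + r. */
static uint32_t map_down(struct idmap m, uint32_t id)
{
	if (id < m.u || id - m.u >= m.r)
		return ID_UNMAPPED;
	return id - m.u + m.k;
}

/* Map up (right to left): n = id - k + u, valid only if k <= id < k + r. */
static uint32_t map_up(struct idmap m, uint32_t id)
{
	if (id < m.k || id - m.k >= m.r)
		return ID_UNMAPPED;
	return id - m.k + m.u;
}

/* Print every mapping an idmapping generates, one "uX -> kY" per line. */
static void print_idmap(struct idmap m)
{
	for (uint32_t i = 0; i < m.r; i++)
		printf("u%" PRIu32 " -> k%" PRIu32 "\n", m.u + i,
		       map_down(m, m.u + i));
}
```

Calling ``print_idmap((struct idmap){ 22, 10000, 3 })`` prints the three mappings listed earlier for ``u22:k10000:r3``, and ``map_up()`` applied to the lower ids returns the upper ids again, which is the inverse property described above.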
To see whether the simple formulas above work, consider the following two
idmappings::

 1. u0:k20000:r10000
 2. u500:k30000:r10000

Assume we are given ``k21000`` in the lower idmapset of the first idmapping. We
want to know what id this was mapped from in the upper idmapset of the first
idmapping. So we're mapping up in the first idmapping::

 id     - k      + u  = n
 k21000 - k20000 + u0 = u1000

Now assume we are given the id ``u1100`` in the upper idmapset of the second
idmapping and we want to know what this id maps down to in the lower idmapset
of the second idmapping. This means we're mapping down in the second
idmapping::

 id    - u    + k      = n
 u1100 - u500 + k30000 = k30600

General notes
-------------

In the context of the kernel, an idmapping can be interpreted as mapping a
range of userspace ids into a range of kernel ids::

 userspace-id:kernel-id:range

A userspace id is always an element in the upper idmapset of an idmapping of
type ``uid_t`` or ``gid_t`` and a kernel id is always an element in the lower
idmapset of an idmapping of type ``kuid_t`` or ``kgid_t``. From now on
"userspace id" will be used to refer to the well known ``uid_t`` and ``gid_t``
types and "kernel id" will be used to refer to ``kuid_t`` and ``kgid_t``.

The kernel is mostly concerned with kernel ids. They are used when performing
permission checks and are stored in an inode's ``i_uid`` and ``i_gid`` fields.
A userspace id, on the other hand, is an id that is reported to userspace by
the kernel, or is passed by userspace to the kernel, or a raw device id that is
written to or read from disk.

Note that we are only concerned with idmappings as the kernel stores them, not
with how userspace would specify them.

For the rest of this document we will prefix all userspace ids with ``u`` and
all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
an idmapping will be written as ``u0:k10000:r10000``.

For example, within this idmapping, the id ``u1000`` is an id in the upper
idmapset or "userspace idmapset" starting with ``u0``. And it is mapped to
``k11000``, which is a kernel id in the lower idmapset or "kernel idmapset"
starting with ``k10000``.

A kernel id is always created by an idmapping. Such idmappings are associated
with user namespaces. Since we mainly care about how idmappings work, we're not
going to be concerned with how idmappings are created nor how they are used
outside of the filesystem context. This is best left to an explanation of user
namespaces.

The initial user namespace is special. It always has an idmapping of the
following form::

 u0:k0:r4294967295

which is an identity idmapping over the full range of ids available on this
system.

Other user namespaces usually have non-identity idmappings such as::

 u0:k10000:r10000

When a process creates or wants to change ownership of a file, or when the
ownership of a file is read from disk by a filesystem, the userspace id is
immediately translated into a kernel id according to the idmapping associated
with the relevant user namespace.

For instance, consider a file that is stored on disk by a filesystem as being
owned by ``u1000``:

- If a filesystem were to be mounted in the initial user namespace (as most
  filesystems are), then the initial idmapping will be used. As we saw, this is
  simply the identity idmapping. This would mean id ``u1000`` read from disk
  would be mapped to id ``k1000``. So an inode's ``i_uid`` and ``i_gid`` fields
  would contain ``k1000``.

- If a filesystem were to be mounted with an idmapping of ``u0:k10000:r10000``,
  then ``u1000`` read from disk would be mapped to ``k11000``. So an inode's
  ``i_uid`` and ``i_gid`` fields would contain ``k11000``.

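The two cases above can be modelled as a function that fills in the ownership fields of an inode when ownership is read from disk. This is a minimal sketch under stated assumptions: ``struct inode_model`` and ``read_owner()`` are hypothetical stand-ins for a filesystem's inode initialisation, ``map_down()`` is the single-extent formula from the formal notes, and one idmapping is used for both uids and gids for brevity, whereas the kernel keeps separate uid and gid idmappings.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t u, k, r; }; /* hypothetical u:k:r triple */

#define ID_UNMAPPED ((uint32_t)-1)

/* Only the two ownership fields of an inode are modelled here. */
struct inode_model {
	uint32_t i_uid; /* holds a kernel id */
	uint32_t i_gid; /* holds a kernel id */
};

/* Userspace id down into a kernel id: id - u + k, if u <= id < u + r. */
static uint32_t map_down(struct idmap m, uint32_t id)
{
	if (id < m.u || id - m.u >= m.r)
		return ID_UNMAPPED;
	return id - m.u + m.k;
}

/* Translate raw on-disk userspace ids into kernel ids immediately. */
static void read_owner(struct inode_model *inode, struct idmap fs,
		       uint32_t disk_uid, uint32_t disk_gid)
{
	inode->i_uid = map_down(fs, disk_uid);
	inode->i_gid = map_down(fs, disk_gid);
}
```

With the initial idmapping ``{ 0, 0, 4294967295 }`` an on-disk ``u1000`` ends up as ``k1000`` in the inode; with ``{ 0, 10000, 10000 }`` it ends up as ``k11000``.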
Translation algorithms
----------------------

We've already seen briefly that it is possible to translate between different
idmappings. We'll now take a closer look at how that works.

Crossmapping
~~~~~~~~~~~~

This translation algorithm is used by the kernel in quite a few places. For
example, it is used when reporting back the ownership of a file to userspace
via the ``stat()`` system call family.

If we've been given ``k11000`` from one idmapping, we can map that id up in
another idmapping. In order for this to work, both idmappings need to contain
the same kernel id in their kernel idmapsets. For example, consider the
following idmappings::

 1. u0:k10000:r10000
 2. u20000:k10000:r10000

and assume we are mapping ``u1000`` down to ``k11000`` in the first idmapping.
We can then translate ``k11000`` into a userspace id in the second idmapping
using the kernel idmapset of the second idmapping::

 /* Map the kernel id up into a userspace id in the second idmapping. */
 from_kuid(u20000:k10000:r10000, k11000) = u21000

Note how we can get back to the kernel id in the first idmapping by inverting
the algorithm::

 /* Map the userspace id down into a kernel id in the second idmapping. */
 make_kuid(u20000:k10000:r10000, u21000) = k11000

 /* Map the kernel id up into a userspace id in the first idmapping. */
 from_kuid(u0:k10000:r10000, k11000) = u1000

This algorithm allows us to answer the question of which userspace id a given
kernel id corresponds to in a given idmapping. In order to be able to answer
this question, both idmappings need to contain the same kernel id in their
respective kernel idmapsets.

For example, when the kernel reads a raw userspace id from disk, it maps it
down into a kernel id according to the idmapping associated with the
filesystem. Let's assume the filesystem was mounted with an idmapping of
``u0:k20000:r10000`` and it reads a file owned by ``u1000`` from disk. This
means ``u1000`` will be mapped to ``k21000``, which is what will be stored in
the inode's ``i_uid`` and ``i_gid`` fields.

When someone in userspace calls ``stat()`` or a related function to get
ownership information about the file, the kernel can't simply map the id back
up according to the filesystem's idmapping, as this would give the wrong owner
if the caller is using an idmapping.

So the kernel will map the id back up in the idmapping of the caller. Let's
assume the caller has the slightly unconventional idmapping
``u3000:k20000:r10000``; then ``k21000`` would map back up to ``u4000``.
Consequently, the user would see that this file is owned by ``u4000``.

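The crossmapping used for ``stat()`` can be sketched with simplified stand-ins for ``make_kuid()`` and ``from_kuid()``. This is an illustrative model only: ``model_make_kuid()``, ``model_from_kuid()``, ``stat_owner()`` and ``struct idmap`` are hypothetical names, the real kernel functions take a user namespace rather than a ``u:k:r`` triple, and each idmapping is assumed to have a single extent.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t u, k, r; }; /* hypothetical u:k:r triple */

#define ID_UNMAPPED ((uint32_t)-1)

/* Model of make_kuid(): userspace id down into a kernel id. */
static uint32_t model_make_kuid(struct idmap m, uint32_t uid)
{
	if (uid < m.u || uid - m.u >= m.r)
		return ID_UNMAPPED;
	return uid - m.u + m.k;
}

/* Model of from_kuid(): kernel id up into a userspace id. */
static uint32_t model_from_kuid(struct idmap m, uint32_t kuid)
{
	if (kuid == ID_UNMAPPED || kuid < m.k || kuid - m.k >= m.r)
		return ID_UNMAPPED;
	return kuid - m.k + m.u;
}

/*
 * Crossmapping as used for stat(): the on-disk id is mapped down in the
 * filesystem's idmapping when the inode is read, and the resulting kernel
 * id is mapped up in the caller's idmapping when ownership is reported.
 */
static uint32_t stat_owner(struct idmap fs, struct idmap caller,
			   uint32_t disk_uid)
{
	uint32_t kuid = model_make_kuid(fs, disk_uid); /* stored in i_uid */

	return model_from_kuid(caller, kuid);
}
```

With the filesystem idmapping ``{ 0, 20000, 10000 }`` and the caller idmapping ``{ 3000, 20000, 10000 }``, ``stat_owner(fs, caller, 1000)`` yields ``4000``, matching the walkthrough above. Both idmappings share the kernel id ``k21000``, which is the precondition for crossmapping.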
Remapping
~~~~~~~~~

It is possible to translate a kernel id from one idmapping to another one via
the userspace idmapset of the two idmappings. This is equivalent to remapping
a kernel id.

Let's look at an example. We are given the following two idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r10000

and we are given ``k11000`` in the first idmapping. In order to translate this
kernel id in the first idmapping into a kernel id in the second idmapping, we
need to perform two steps:

1. Map the kernel id up into a userspace id in the first idmapping::

    /* Map the kernel id up into a userspace id in the first idmapping. */
    from_kuid(u0:k10000:r10000, k11000) = u1000

2. Map the userspace id down into a kernel id in the second idmapping::

    /* Map the userspace id down into a kernel id in the second idmapping. */
    make_kuid(u0:k20000:r10000, u1000) = k21000

As you can see, we used the userspace idmapset in both idmappings to translate
the kernel id in one idmapping to a kernel id in another idmapping.

This allows us to answer the question of which kernel id we would need to use
to get the same userspace id in another idmapping. In order to be able to
answer this question, both idmappings need to contain the same userspace id in
their respective userspace idmapsets.

Note how we can easily get back to the kernel id in the first idmapping by
inverting the algorithm:

1. Map the kernel id up into a userspace id in the second idmapping::

    /* Map the kernel id up into a userspace id in the second idmapping. */
    from_kuid(u0:k20000:r10000, k21000) = u1000

2. Map the userspace id down into a kernel id in the first idmapping::

    /* Map the userspace id down into a kernel id in the first idmapping. */
    make_kuid(u0:k10000:r10000, u1000) = k11000

Another way to look at this translation is to treat it as inverting one
idmapping and applying another idmapping if both idmappings have the relevant
userspace id mapped. This will come in handy when working with idmapped mounts.

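The difference between crossmapping and remapping shows up clearly in code. The sketch below is a userspace model under the same single-extent assumption as before; ``remap_kid()``, ``map_down()``, ``map_up()`` and ``struct idmap`` are hypothetical names. Both mapping helpers pass ``ID_UNMAPPED`` through, so a failed first step makes the whole remapping fail.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t u, k, r; }; /* hypothetical u:k:r triple */

#define ID_UNMAPPED ((uint32_t)-1)

/* Userspace id down into a kernel id (the make_kuid() direction). */
static uint32_t map_down(struct idmap m, uint32_t uid)
{
	if (uid == ID_UNMAPPED || uid < m.u || uid - m.u >= m.r)
		return ID_UNMAPPED;
	return uid - m.u + m.k;
}

/* Kernel id up into a userspace id (the from_kuid() direction). */
static uint32_t map_up(struct idmap m, uint32_t kid)
{
	if (kid == ID_UNMAPPED || kid < m.k || kid - m.k >= m.r)
		return ID_UNMAPPED;
	return kid - m.k + m.u;
}

/*
 * Remap a kernel id from idmapping "from" into idmapping "to" by going up
 * through "from" and back down through "to". This only requires that both
 * userspace idmapsets contain the intermediate userspace id.
 */
static uint32_t remap_kid(struct idmap from, struct idmap to, uint32_t kid)
{
	return map_down(to, map_up(from, kid));
}
```

With the two idmappings above, crossmapping ``k11000`` into the second idmapping (``map_up(m2, 11000)``) fails because ``k11000`` is not in its kernel idmapset, while ``remap_kid(m1, m2, 11000)`` returns ``21000`` because ``u1000`` is in both userspace idmapsets. Remapping into an idmapping such as ``u0:k20000:r200`` fails, since ``u1000`` is unmapped there.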
Invalid translations
~~~~~~~~~~~~~~~~~~~~

It is never valid to use an id in the kernel idmapset of one idmapping as the
id in the userspace idmapset of another or the same idmapping. While the kernel
idmapset always indicates an idmapset in the kernel id space, the userspace
idmapset indicates an idmapset in the userspace id space. So the following
translations are forbidden::

 /* Map the userspace id down into a kernel id in the first idmapping. */
 make_kuid(u0:k10000:r10000, u1000) = k11000

 /* INVALID: Map the kernel id down into a kernel id in the second idmapping. */
 make_kuid(u10000:k20000:r10000, k11000) = k21000
                                 ~~~~~~

and equally wrong::

 /* Map the kernel id up into a userspace id in the first idmapping. */
 from_kuid(u0:k10000:r10000, k11000) = u1000

 /* INVALID: Map the userspace id up into a userspace id in the second idmapping. */
 from_kuid(u20000:k0:r10000, u1000) = u21000
                             ~~~~~

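The kernel makes such mix-ups hard to write: ``kuid_t`` is defined as a single-member struct rather than a plain integer, so a kernel id cannot silently be passed where a userspace id is expected. The sketch below mimics that idea in userspace. ``model_uid``, ``model_kuid``, ``model_make_kuid()`` and ``model_from_kuid()`` are hypothetical names, and single-extent idmappings are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Distinct wrapper types, in the spirit of the kernel's kuid_t. */
typedef struct { uint32_t val; } model_uid;  /* userspace id */
typedef struct { uint32_t val; } model_kuid; /* kernel id */

struct idmap { uint32_t u, k, r; }; /* hypothetical u:k:r triple */

#define ID_UNMAPPED ((uint32_t)-1)

/* Down: accepts only a userspace id. */
static model_kuid model_make_kuid(struct idmap m, model_uid uid)
{
	model_kuid kuid = { ID_UNMAPPED };

	if (uid.val >= m.u && uid.val - m.u < m.r)
		kuid.val = uid.val - m.u + m.k;
	return kuid;
}

/* Up: accepts only a kernel id. */
static model_uid model_from_kuid(struct idmap m, model_kuid kuid)
{
	model_uid uid = { ID_UNMAPPED };

	if (kuid.val >= m.k && kuid.val - m.k < m.r)
		uid.val = kuid.val - m.k + m.u;
	return uid;
}

/*
 * Both invalid translations from the text are rejected at compile time:
 *   model_make_kuid(second, kuid);  (kuid is a model_kuid, not a model_uid)
 *   model_from_kuid(second, uid);   (uid is a model_uid, not a model_kuid)
 */
```

Because the two wrapper structs are incompatible types, a C compiler reports an error for either invalid call, whereas with plain integers both would compile and silently produce a wrong id.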
Idmappings when creating filesystem objects
-------------------------------------------

The concepts of mapping an id down or mapping an id up are expressed in the two
kernel functions filesystem developers are rather familiar with and which we've
already used in this document::

 /* Map the userspace id down into a kernel id. */
 make_kuid(idmapping, uid)

 /* Map the kernel id up into a userspace id. */
 from_kuid(idmapping, kuid)

We will take an abbreviated look into how idmappings figure into creating
filesystem objects. For simplicity, we will only look at what happens when the
VFS has already completed path lookup right before it calls into the filesystem
itself. So we're concerned with what happens when e.g. ``vfs_mkdir()`` is
called. We will also assume that the directory we're creating filesystem
objects in is readable and writable for everyone.

When creating a filesystem object, the kernel will look at the caller's
filesystem ids. These are just regular ``uid_t`` and ``gid_t`` userspace ids,
but they are exclusively used when determining file ownership, which is why
they are called "filesystem ids". They are usually identical to the uid and gid
of the caller but can differ. We will just assume they are always identical so
as not to get lost in too many details.

When the caller enters the kernel, two things happen:

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping.
   (To be precise, the kernel will simply look at the kernel ids stashed in the
   credentials of the current task, but for our education we'll pretend this
   translation happens just in time.)
2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping.

The second step is important, as a regular filesystem will ultimately need to
map the kernel id back up into a userspace id when writing to disk.
So with the second step the kernel guarantees that a valid userspace id can be
written to disk. If it can't, the kernel will refuse the creation request so as
not to risk filesystem corruption.

The astute reader will have realized that this is simply a variation of the
crossmapping algorithm we introduced in a previous section. First, the kernel
maps the caller's userspace id down into a kernel id according to the caller's
idmapping and then maps that kernel id up according to the filesystem's
idmapping.

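The two steps, including the refusal when the second one fails, can be sketched as a single check. This is a simplified model, not the VFS code path (which goes through helpers such as ``fsuidgid_has_mapping()``): ``may_create()``, ``map_down()``, ``map_up()`` and ``struct idmap`` are hypothetical, only uids are modelled, and each idmapping is assumed to have a single extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct idmap { uint32_t u, k, r; }; /* hypothetical u:k:r triple */

#define ID_UNMAPPED ((uint32_t)-1)

/* Userspace id down into a kernel id (the make_kuid() direction). */
static uint32_t map_down(struct idmap m, uint32_t uid)
{
	if (uid == ID_UNMAPPED || uid < m.u || uid - m.u >= m.r)
		return ID_UNMAPPED;
	return uid - m.u + m.k;
}

/* Kernel id up into a userspace id (the from_kuid() direction). */
static uint32_t map_up(struct idmap m, uint32_t kid)
{
	if (kid == ID_UNMAPPED || kid < m.k || kid - m.k >= m.r)
		return ID_UNMAPPED;
	return kid - m.k + m.u;
}

/*
 * Step 1 maps the caller's fsuid down in the caller's idmapping; step 2
 * verifies that the kernel id maps up in the filesystem's idmapping. On
 * success, the id that would be written to disk is stored in *disk_uid.
 */
static bool may_create(struct idmap caller, struct idmap fs, uint32_t fsuid,
		       uint32_t *disk_uid)
{
	uint32_t kuid = map_down(caller, fsuid); /* step 1 */
	uint32_t uid = map_up(fs, kuid);         /* step 2 */

	if (uid == ID_UNMAPPED)
		return false; /* refuse rather than write an invalid id */
	*disk_uid = uid;
	return true;
}
```

Under this model, a caller with ``u0:k10000:r10000`` creating a file as ``u1000`` is refused on a filesystem with ``u0:k20000:r10000``, but on a filesystem with the initial idmapping the request succeeds and ``u11000`` would be written to disk, which are exactly the outcomes of the examples that follow.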
Let's see some examples with caller/filesystem idmappings but without mount
idmappings. This will exhibit some problems we can hit. After that, we will
revisit these examples, this time using mount idmappings, to see how they can
solve the problems we observed before.

Example 1
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295

Both the caller and the filesystem use the identity idmapping:

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping.

   For this second step the kernel will call the function
   ``fsuidgid_has_mapping()``, which ultimately boils down to calling
   ``from_kuid()``::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

In this example both idmappings are the same, so there's nothing exciting going
on. Ultimately, the userspace id that lands on disk will be ``u1000``.

Example 2
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k20000:r10000, k11000) = u-1

It's immediately clear that while the caller's userspace id could be
successfully mapped down into kernel ids in the caller's idmapping, the kernel
ids could not be mapped up according to the filesystem's idmapping. So the
kernel will deny this creation request.

Note that while this example is less common, because most filesystems can't be
mounted with non-initial idmappings, this is a general problem, as we can see
in the next examples.

Example 3
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k11000) = u11000

We can see that the translation always succeeds. The userspace id that the
filesystem will ultimately put to disk will always be identical to the value of
the kernel id that was created in the caller's idmapping. This has two main
consequences.

First, we can't allow a caller to ultimately write to disk with another
userspace id. We could only do this if we were to mount the whole filesystem
with the caller's or another idmapping. But that solution is limited to a few
filesystems and not very flexible, even though this use case is pretty
important in containerized workloads.

Second, the caller will usually not be able to create any files or access
directories that have stricter permissions, because the filesystem's kernel ids
usually don't map up into valid userspace ids in the caller's idmapping:

1. Map raw userspace ids down to kernel ids in the filesystem's idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Map kernel ids up to userspace ids in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k1000) = u-1

Example 4
~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295

In order to report ownership to userspace, the kernel uses the crossmapping
algorithm introduced in a previous section:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k1000) = u-1

The crossmapping algorithm fails in this case because the kernel id in the
filesystem idmapping cannot be mapped up to a userspace id in the caller's
idmapping. Thus, the kernel will report the ownership of this file as the
overflowid.

Example 5
~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000

In order to report ownership to userspace, the kernel uses the crossmapping
algorithm introduced in a previous section:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k21000) = u-1

Again, the crossmapping algorithm fails in this case because the kernel id in
the filesystem idmapping cannot be mapped up to a userspace id in the caller's
idmapping. Thus, the kernel will report the ownership of this file as the
overflowid.

Note how in the last two examples things would be simple if the caller were
using the initial idmapping. For a filesystem mounted with the initial
idmapping, it would be trivial. So we only consider a filesystem with an
idmapping of ``u0:k20000:r10000``:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k0:r4294967295, k21000) = u21000

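Examples 4 and 5, and the variant with a caller in the initial idmapping, can be reproduced with a small reporting function that falls back to the overflowuid when crossmapping fails, much like the kernel's ``from_kuid_munged()`` does. In this sketch ``report_owner()``, ``map_down()``, ``map_up()`` and ``struct idmap`` are hypothetical, and ``MODEL_OVERFLOWUID`` hard-codes 65534, the default value of the kernel's overflowuid, which on a real system is a sysctl.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t u, k, r; }; /* hypothetical u:k:r triple */

#define ID_UNMAPPED ((uint32_t)-1)
/* Default overflowuid; the real value is configurable at runtime. */
#define MODEL_OVERFLOWUID 65534u

/* Userspace id down into a kernel id (the make_kuid() direction). */
static uint32_t map_down(struct idmap m, uint32_t uid)
{
	if (uid == ID_UNMAPPED || uid < m.u || uid - m.u >= m.r)
		return ID_UNMAPPED;
	return uid - m.u + m.k;
}

/* Kernel id up into a userspace id (the from_kuid() direction). */
static uint32_t map_up(struct idmap m, uint32_t kid)
{
	if (kid == ID_UNMAPPED || kid < m.k || kid - m.k >= m.r)
		return ID_UNMAPPED;
	return kid - m.k + m.u;
}

/*
 * Crossmap an on-disk id through the filesystem's and the caller's
 * idmappings; report the overflowuid if no mapping exists for the caller.
 */
static uint32_t report_owner(struct idmap fs, struct idmap caller,
			     uint32_t disk_uid)
{
	uint32_t uid = map_up(caller, map_down(fs, disk_uid));

	return uid == ID_UNMAPPED ? MODEL_OVERFLOWUID : uid;
}
```

For a file owned by ``u1000`` and a caller with ``u0:k10000:r10000``, both Example 4 (initial filesystem idmapping) and Example 5 (``u0:k20000:r10000``) report ``65534``, while a caller in the initial idmapping sees ``u21000`` on the ``u0:k20000:r10000`` filesystem.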
Idmappings on idmapped mounts
-----------------------------

The examples we've seen in the previous section, where the caller's idmapping
and the filesystem's idmapping are incompatible, cause various issues for
workloads. For a more complex but common example, consider two containers
started on the host. To completely prevent the two containers from affecting
each other, an administrator may often use different non-overlapping idmappings
for the two containers::

 container1 idmapping:  u0:k10000:r10000
 container2 idmapping:  u0:k20000:r10000
 filesystem idmapping:  u0:k30000:r10000

An administrator wanting to provide easy read-write access to the following set
of files::

 dir id:       u0
 dir/file1 id: u1000
 dir/file2 id: u2000

to both containers currently can't.

Of course the administrator has the option to recursively change ownership via
``chown()``. For example, they could change ownership so that ``dir`` and all
files below it can be crossmapped from the filesystem's into the container's
idmapping. Let's assume they change ownership so it is compatible with the
first container's idmapping::

 dir id:       u10000
 dir/file1 id: u11000
 dir/file2 id: u12000

This would still leave ``dir`` rather useless to the second container. In fact,
``dir`` and all files below it would continue to appear owned by the overflowid
for the second container.

Or consider another increasingly popular example. Some service managers such as
systemd implement a concept called "portable home directories". A user may want
to use their home directories on different machines where they are assigned
different login userspace ids. Most users will have ``u1000`` as the login id
on their machine at home and all files in their home directory will usually be
owned by ``u1000``. At university or at work they may have another login id
such as ``u1125``. This makes it rather difficult to interact with their home
directory on their work machine.

In both cases, changing ownership recursively has grave implications. The most
obvious one is that ownership is changed globally and permanently. In the home
directory case, this change in ownership would even need to happen every time
the user switches from their home to their work machine. For really large sets
of files this becomes increasingly costly.

If the user is lucky, they are dealing with a filesystem that is mountable
inside user namespaces. But this would also change ownership globally, and the
change in ownership is tied to the lifetime of the filesystem mount, i.e. the
superblock. The only way to change ownership is to completely unmount the
filesystem and mount it again in another user namespace. This is usually
impossible because all users currently accessing the filesystem would lose
access. And it means that ``dir`` still can't be shared between two containers
with different idmappings.

But usually the user doesn't even have this option, since most filesystems
aren't mountable inside containers. And not having them mountable might be
desirable, as it doesn't require the filesystem to deal with malicious
filesystem images.

But the use cases mentioned above and more can be handled by idmapped mounts.
They make it possible to expose the same set of dentries with different
ownership at different mounts. This is achieved by marking the mounts with a
user namespace through the ``mount_setattr()`` system call. The idmapping
associated with it is then used to translate from the caller's idmapping to the
filesystem's idmapping and vice versa using the remapping algorithm we
introduced above.

Idmapped mounts make it possible to change ownership in a temporary and
localized way. The ownership changes are restricted to a specific mount and the
ownership changes are tied to the lifetime of the mount. All other users and
locations where the filesystem is exposed are unaffected.

Filesystems that support idmapped mounts don't have any real reason to support
being mountable inside user namespaces. A filesystem could be exposed
completely under an idmapped mount to get the same effect. This has the
advantage that filesystems can leave the creation of the superblock to
privileged users in the initial user namespace.

However, it is perfectly possible to combine idmapped mounts with filesystems
mountable inside user namespaces. We will touch on this further below.

Remapping helpers
~~~~~~~~~~~~~~~~~

Idmapping functions were added that translate between idmappings. They make use
of the remapping algorithm we've introduced earlier. We're going to look at
two of them:

- ``i_uid_into_mnt()`` and ``i_gid_into_mnt()``

  The ``i_*id_into_mnt()`` functions translate the filesystem's kernel ids into
  kernel ids in the mount's idmapping::

   /* Map the filesystem's kernel id up into a userspace id in the filesystem's idmapping. */
   from_kuid(filesystem, kid) = uid

   /* Map the filesystem's userspace id down into a kernel id in the mount's idmapping. */
   make_kuid(mount, uid) = kuid

- ``mapped_fsuid()`` and ``mapped_fsgid()``

  The ``mapped_fs*id()`` functions translate the caller's kernel ids into
  kernel ids in the filesystem's idmapping. This translation is achieved by
  remapping the caller's kernel ids using the mount's idmapping::

   /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
   from_kuid(mount, kid) = uid

   /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
   make_kuid(filesystem, uid) = kuid

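Both helpers are instances of the remapping algorithm, applied in opposite directions. The following is a hedged userspace model: ``model_i_uid_into_mnt()`` and ``model_mapped_fsuid()`` are hypothetical stand-ins that take bare single-extent idmappings, whereas the real helpers operate on user namespaces, inodes and the current credentials.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t u, k, r; }; /* hypothetical u:k:r triple */

#define ID_UNMAPPED ((uint32_t)-1)

/* Userspace id down into a kernel id (the make_kuid() direction). */
static uint32_t map_down(struct idmap m, uint32_t uid)
{
	if (uid == ID_UNMAPPED || uid < m.u || uid - m.u >= m.r)
		return ID_UNMAPPED;
	return uid - m.u + m.k;
}

/* Kernel id up into a userspace id (the from_kuid() direction). */
static uint32_t map_up(struct idmap m, uint32_t kid)
{
	if (kid == ID_UNMAPPED || kid < m.k || kid - m.k >= m.r)
		return ID_UNMAPPED;
	return kid - m.k + m.u;
}

/* Model of i_uid_into_mnt(): filesystem kernel id -> mount kernel id. */
static uint32_t model_i_uid_into_mnt(struct idmap fs, struct idmap mnt,
				     uint32_t kid)
{
	uint32_t uid = map_up(fs, kid); /* from_kuid(filesystem, kid) */

	return map_down(mnt, uid);      /* make_kuid(mount, uid) */
}

/* Model of mapped_fsuid(): caller kernel id -> filesystem kernel id. */
static uint32_t model_mapped_fsuid(struct idmap fs, struct idmap mnt,
				   uint32_t kid)
{
	uint32_t uid = map_up(mnt, kid); /* from_kuid(mount, kid) */

	return map_down(fs, uid);        /* make_kuid(filesystem, uid) */
}
```

With a filesystem idmapping of ``u0:k20000:r10000`` and a mount idmapping of ``u0:k10000:r10000``, ``model_i_uid_into_mnt()`` turns ``k21000`` into ``k11000`` and ``model_mapped_fsuid()`` turns ``k11000`` back into ``k21000``.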
Note that these two functions invert each other. Consider the following
idmappings::

 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:k10000:r10000

Assume a file owned by ``u1000`` is read from disk. The filesystem maps this id
to ``k21000`` according to its idmapping. This is what is stored in the
inode's ``i_uid`` and ``i_gid`` fields.

When the caller queries the ownership of this file via ``stat()``, the kernel
would usually simply use the crossmapping algorithm and map the filesystem's
kernel id up to a userspace id in the caller's idmapping.

But when the caller is accessing the file on an idmapped mount, the kernel will
first call ``i_uid_into_mnt()``, thereby translating the filesystem's kernel id
into a kernel id in the mount's idmapping::

 i_uid_into_mnt(k21000):
   /* Map the filesystem's kernel id up into a userspace id. */
   from_kuid(u0:k20000:r10000, k21000) = u1000

   /* Map the filesystem's userspace id down into a kernel id in the mount's idmapping. */
   make_kuid(u0:k10000:r10000, u1000) = k11000

Finally, when the kernel reports the owner to the caller, it will turn the
kernel id in the mount's idmapping into a userspace id in the caller's
idmapping::

  from_kuid(u0:k10000:r10000, k11000) = u1000

    688We can test whether this algorithm really works by verifying what happens when
    689we create a new file. Let's say the user is creating a file with ``u1000``.
    690
    691The kernel maps this to ``k11000`` in the caller's idmapping. Usually the
    692kernel would now apply the crossmapping, verifying that ``k11000`` can be
    693mapped to a userspace id in the filesystem's idmapping. Since ``k11000`` can't
    694be mapped up in the filesystem's idmapping directly this creation request
    695fails.
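
The failing crossmapping can be reproduced in a small userspace sketch. It assumes single-extent ``u:k:r`` idmappings and uses ``-1`` for "no mapping". ``struct idmapping``, ``map_down()``, ``map_up()`` and ``crossmap_create()`` are hypothetical names. ``crossmap_create()`` maps the caller's userspace id down in the caller's idmapping, then tries to map the resulting kernel id up in the filesystem's idmapping. It returns the id that would be written to disk.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical single-extent idmapping "u:k:r"; not a kernel type. */
   struct idmapping {
       uint32_t u; /* first id in the upper idmapset */
       uint32_t k; /* first id in the lower idmapset */
       uint32_t r; /* number of mapped ids */
   };

   #define NO_ID ((int64_t)-1) /* "this id has no mapping" */

   /* Userspace id down to kernel id (the make_kuid() direction). */
   static inline int64_t map_down(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.u || id - m.u >= m.r)
           return NO_ID;
       return m.k + (id - m.u);
   }

   /* Kernel id up to userspace id (the from_kuid() direction). */
   static inline int64_t map_up(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.k || id - m.k >= m.r)
           return NO_ID;
       return m.u + (id - m.k);
   }

   /*
    * Plain crossmapping without an idmapped mount: map the caller's
    * userspace id down, then require the kernel id to map up in the
    * filesystem's idmapping. Returns the on-disk id, or NO_ID on failure.
    */
   static inline int64_t crossmap_create(struct idmapping caller,
                                         struct idmapping fs, int64_t uid)
   {
       return map_up(fs, map_down(caller, uid));
   }

For the idmappings above, ``crossmap_create()`` returns ``-1`` for ``u1000``, because ``k11000`` lies outside the range ``k20000`` to ``k29999``. If the caller and the filesystem shared the same idmapping, it would return ``1000``.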

But when the caller is accessing the file on an idmapped mount, the kernel will
first call ``mapped_fs*id()``, thereby translating the caller's kernel id into
a kernel id in the filesystem's idmapping according to the mount's idmapping::

 mapped_fsuid(k11000):
    /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
    from_kuid(u0:k10000:r10000, k11000) = u1000

    /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
    make_kuid(u0:k20000:r10000, u1000) = k21000

When finally writing to disk, the kernel will then map ``k21000`` up into a
userspace id in the filesystem's idmapping::

   from_kuid(u0:k20000:r10000, k21000) = u1000

As we can see, we end up with an invertible and therefore information-preserving
algorithm. A file created from ``u1000`` on an idmapped mount will also be
reported as being owned by ``u1000``, and vice versa.
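
The invertibility claim can be checked exhaustively for small ranges. The following sketch assumes single-extent idmappings and uses hypothetical helper names. ``round_trip_holds()`` translates every kernel id in the filesystem's range into the mount's idmapping, which is the ``i_uid_into_mnt()`` direction. It then translates the result back, which is the ``mapped_fsuid()`` direction. Ids that the mount cannot represent are skipped.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical single-extent idmapping "u:k:r"; not a kernel type. */
   struct idmapping {
       uint32_t u; /* first id in the upper idmapset */
       uint32_t k; /* first id in the lower idmapset */
       uint32_t r; /* number of mapped ids */
   };

   #define NO_ID ((int64_t)-1) /* "this id has no mapping" */

   /* Userspace id down to kernel id (the make_kuid() direction). */
   static inline int64_t map_down(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.u || id - m.u >= m.r)
           return NO_ID;
       return m.k + (id - m.u);
   }

   /* Kernel id up to userspace id (the from_kuid() direction). */
   static inline int64_t map_up(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.k || id - m.k >= m.r)
           return NO_ID;
       return m.u + (id - m.k);
   }

   /*
    * For every kernel id in the filesystem's range, translate it into the
    * mount's idmapping and back again; the round trip must return the
    * starting id. Ids the mount cannot represent are skipped.
    */
   static inline bool round_trip_holds(struct idmapping fs, struct idmapping mnt)
   {
       for (int64_t kid = fs.k; kid < (int64_t)fs.k + fs.r; kid++) {
           int64_t mnt_kid = map_down(mnt, map_up(fs, kid));

           if (mnt_kid == NO_ID)
               continue;
           if (map_down(fs, map_up(mnt, mnt_kid)) != kid)
               return false;
       }
       return true;
   }

For the filesystem idmapping ``u0:k20000:r10000`` and the mount idmapping ``u0:k10000:r10000``, ``round_trip_holds()`` returns ``true``. Skipping unmapped ids is a simplification in this sketch: it only checks ids that are visible through the mount.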

Let's now briefly reconsider the failing examples from earlier in the context
of idmapped mounts.

Example 2 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:k10000:r10000

When the caller is using a non-initial idmapping, the common case is to attach
the same idmapping to the mount. We now perform three steps:

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Translate the caller's kernel id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(k11000):
      /* Map the kernel id up into a userspace id in the mount's idmapping. */
      from_kuid(u0:k10000:r10000, k11000) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k20000:r10000, u1000) = k21000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k20000:r10000, k21000) = u1000

So the ownership that lands on disk will be ``u1000``.
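
The three creation steps apply to any combination of caller, mount and filesystem idmappings. In the sketch below, ``create_on_idmapped_mount()`` and the other names are hypothetical. Each idmapping is assumed to be a single ``u:k:r`` extent. A result of ``-1`` means that one of the steps found no mapping and creation would fail.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical single-extent idmapping "u:k:r"; not a kernel type. */
   struct idmapping {
       uint32_t u; /* first id in the upper idmapset */
       uint32_t k; /* first id in the lower idmapset */
       uint32_t r; /* number of mapped ids */
   };

   #define NO_ID ((int64_t)-1) /* "this id has no mapping" */

   /* Userspace id down to kernel id (the make_kuid() direction). */
   static inline int64_t map_down(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.u || id - m.u >= m.r)
           return NO_ID;
       return m.k + (id - m.u);
   }

   /* Kernel id up to userspace id (the from_kuid() direction). */
   static inline int64_t map_up(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.k || id - m.k >= m.r)
           return NO_ID;
       return m.u + (id - m.k);
   }

   /* Returns the userspace id that lands on disk, or NO_ID. */
   static inline int64_t create_on_idmapped_mount(struct idmapping caller,
                                                  struct idmapping mnt,
                                                  struct idmapping fs,
                                                  int64_t uid)
   {
       /* 1. Caller's userspace id down into the caller's idmapping. */
       int64_t caller_kid = map_down(caller, uid);
       /* 2. mapped_fsuid(): up via the mount, down into the filesystem. */
       int64_t fs_kid = map_down(fs, map_up(mnt, caller_kid));

       /* 3. Verify the kernel id maps up in the filesystem's idmapping. */
       return map_up(fs, fs_kid);
   }

With this example's idmappings and ``u1000``, the function returns ``1000``. Swapping in the filesystem idmapping ``u0:k0:r4294967295`` (``{0, 0, UINT32_MAX}``) from the third example also yields ``1000``.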

Example 3 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u0:k10000:r10000

The same translation algorithm works with the third example.

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Translate the caller's kernel id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(k11000):
      /* Map the kernel id up into a userspace id in the mount's idmapping. */
      from_kuid(u0:k10000:r10000, k11000) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So the ownership that lands on disk will be ``u1000``.

Example 4 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u0:k10000:r10000

In order to report ownership to userspace, the kernel now performs three steps
using the translation algorithm we introduced earlier:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u0:k10000:r10000, u1000) = k11000

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the file's kernel id ``k1000`` couldn't be crossmapped into the
caller's idmapping. With the idmapped mount in place, it is first translated
into ``k11000`` in the mount's idmapping, which the caller's idmapping can map
up. The file is therefore reported as owned by ``u1000``.
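
The three reporting steps can be written as a single function. This is a hypothetical sketch that assumes single-extent idmappings. ``report_on_idmapped_mount()`` returns the userspace id the caller would see, or ``-1`` if some step finds no mapping. The sketch models a mount that is not idmapped by passing the identity idmapping ``u0:k0:r4294967295`` as the mount's idmapping. This is an assumption of the sketch.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical single-extent idmapping "u:k:r"; not a kernel type. */
   struct idmapping {
       uint32_t u; /* first id in the upper idmapset */
       uint32_t k; /* first id in the lower idmapset */
       uint32_t r; /* number of mapped ids */
   };

   #define NO_ID ((int64_t)-1) /* "this id has no mapping" */

   /* Userspace id down to kernel id (the make_kuid() direction). */
   static inline int64_t map_down(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.u || id - m.u >= m.r)
           return NO_ID;
       return m.k + (id - m.u);
   }

   /* Kernel id up to userspace id (the from_kuid() direction). */
   static inline int64_t map_up(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.k || id - m.k >= m.r)
           return NO_ID;
       return m.u + (id - m.k);
   }

   /* Returns the userspace id reported to the caller, or NO_ID. */
   static inline int64_t report_on_idmapped_mount(struct idmapping caller,
                                                  struct idmapping mnt,
                                                  struct idmapping fs,
                                                  int64_t disk_uid)
   {
       /* 1. On-disk id down into the filesystem's idmapping. */
       int64_t fs_kid = map_down(fs, disk_uid);
       /* 2. i_uid_into_mnt(): up in the filesystem, down into the mount. */
       int64_t mnt_kid = map_down(mnt, map_up(fs, fs_kid));

       /* 3. Kernel id up into the caller's idmapping. */
       return map_up(caller, mnt_kid);
   }

With this example's idmappings and the on-disk id ``u1000``, the function returns ``1000``. The same holds for the idmappings of Example 5. If the identity idmapping replaces the mount idmapping, the result for this example is ``-1``, which matches the earlier failure without an idmapped mount.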

Example 5 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:k10000:r10000

Again, in order to report ownership to userspace, the kernel now performs three
steps using the translation algorithm we introduced earlier:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k21000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k20000:r10000, k21000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u0:k10000:r10000, u1000) = k11000

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the file's kernel id ``k21000`` couldn't be crossmapped into the
caller's idmapping. Going through the mount's idmapping turns it into
``k11000``, which the caller's idmapping maps up to ``u1000``. The file is
now reported as owned by ``u1000``.

Changing ownership on a home directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We've seen above how idmapped mounts can be used to translate between
idmappings when either the caller, the filesystem, or both use a non-initial
idmapping. A wide range of use cases exist when the caller is using a
non-initial idmapping. This mostly happens in the context of containerized
workloads. As we have seen, the consequence is that for both filesystems
mounted with the initial idmapping and filesystems mounted with non-initial
idmappings, access to the filesystem fails because the kernel ids can't be
crossmapped between the caller's and the filesystem's idmapping.

As we've seen above, idmapped mounts provide a solution to this by remapping
the caller's or filesystem's idmapping according to the mount's idmapping.

Aside from containerized workloads, idmapped mounts have the advantage that
they also work when both the caller and the filesystem use the initial
idmapping, which means users on the host can change the ownership of
directories and files on a per-mount basis.

Consider our previous example where a user has their home directory on portable
storage. At home they have id ``u1000`` and all files in their home directory
are owned by ``u1000``, whereas at uni or work they have login id ``u1125``.

Taking their home directory with them becomes problematic. They can't easily
access their files, they might not be able to write to disk without applying
lax permissions or ACLs, and even if they can, they will end up with an
annoying mix of files and directories owned by ``u1000`` and ``u1125``.

Idmapped mounts make it possible to solve this problem. A user can create an
idmapped mount for their home directory on their work computer or their
computer at home, depending on what ownership they would prefer to end up on
the portable storage itself.

Let's assume they want all files on disk to belong to ``u1000``. When the user
plugs in their portable storage at their workstation, they can set up a job
that creates an idmapped mount with the minimal idmapping ``u1000:k1125:r1``.
So now when they create a file, the kernel performs the following steps we
already know from above::

 caller id:            u1125
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:k1125:r1

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k0:r4294967295, u1125) = k1125

2. Translate the caller's kernel id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(k1125):
      /* Map the kernel id up into a userspace id in the mount's idmapping. */
      from_kuid(u1000:k1125:r1, k1125) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So ultimately the file will be created with ``u1000`` on disk.

Now let's briefly look at what ownership the caller with id ``u1125`` will see
on their work computer:

::

 file id:              u1000
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:k1125:r1

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u1000:k1125:r1, u1000) = k1125

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k0:r4294967295, k1125) = u1125

So ultimately the caller will be told that the file belongs to ``u1125``,
which is the caller's userspace id on their workstation in our example.
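
A self-contained sketch can check both directions of the home directory scenario. The helper names and the single-extent ``struct idmapping`` are hypothetical. The caller and the filesystem both use the initial idmapping, which is modeled here as ``{0, 0, UINT32_MAX}``. The mount uses ``u1000:k1125:r1``. A result of ``-1`` means that some step found no mapping.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical single-extent idmapping "u:k:r"; not a kernel type. */
   struct idmapping {
       uint32_t u; /* first id in the upper idmapset */
       uint32_t k; /* first id in the lower idmapset */
       uint32_t r; /* number of mapped ids */
   };

   #define NO_ID ((int64_t)-1) /* "this id has no mapping" */

   /* Userspace id down to kernel id (the make_kuid() direction). */
   static inline int64_t map_down(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.u || id - m.u >= m.r)
           return NO_ID;
       return m.k + (id - m.u);
   }

   /* Kernel id up to userspace id (the from_kuid() direction). */
   static inline int64_t map_up(struct idmapping m, int64_t id)
   {
       if (id < 0 || id < m.k || id - m.k >= m.r)
           return NO_ID;
       return m.u + (id - m.k);
   }

   /* Creation: caller's uid -> id written to disk. */
   static inline int64_t disk_uid_on_create(struct idmapping caller,
                                            struct idmapping mnt,
                                            struct idmapping fs, int64_t uid)
   {
       int64_t caller_kid = map_down(caller, uid);
       int64_t fs_kid = map_down(fs, map_up(mnt, caller_kid));

       return map_up(fs, fs_kid);
   }

   /* Reporting: on-disk id -> uid shown to the caller. */
   static inline int64_t uid_reported(struct idmapping caller,
                                      struct idmapping mnt,
                                      struct idmapping fs, int64_t disk_uid)
   {
       int64_t fs_kid = map_down(fs, disk_uid);
       int64_t mnt_kid = map_down(mnt, map_up(fs, fs_kid));

       return map_up(caller, mnt_kid);
   }

In this sketch, creating a file as ``u1125`` yields ``1000`` on disk, and reading a ``u1000`` file back yields ``1125``. The mount idmapping covers only a single id, so other ids get no mapping. Creating a file as ``u1126`` returns ``-1``. Reading a file owned by ``u1001`` on disk also returns ``-1``.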

The raw userspace id that is put on disk is ``u1000``. So when the user takes
their home directory back to their home computer, where they are assigned
``u1000`` using the initial idmapping, and mounts the filesystem with the
initial idmapping, they will see all those files owned by ``u1000``.