cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

sharedsubtree.rst (30262B)


.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
	1) Overview
	2) Features
	3) Setting mount states
	4) Use cases
	5) Detailed semantics
	6) Quiz
	7) FAQ
	8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently.  Shared subtree semantics provide the necessary
mechanism to accomplish the above.

They also provide the building blocks for features like per-user namespaces
and versioned filesystems.

2) Features
-----------

Shared subtrees provide four different flavors of mounts (struct vfsmount, to
be precise):

	a. shared mount
	b. slave mount
	c. private mount
	d. unbindable mount


2a) A shared mount can be replicated to as many mountpoints as needed, and all
the replicas continue to be exactly the same.

	Here is an example:

	Let's say /mnt has a mount that is shared::

	    # mount --make-shared /mnt

	Note: the mount(8) command now supports the --make-shared flag,
	so the sample 'smount' program is no longer needed and has been
	removed.

	::

	    # mount --bind /mnt /tmp

	The above command replicates the mount at /mnt to the mountpoint /tmp
	and the contents of both the mounts remain identical.

	::

	    # ls /mnt
	    a b c

	    # ls /tmp
	    a b c

	Now let's say we mount a device at /tmp/a::

	    # mount /dev/sd0  /tmp/a

	    # ls /tmp/a
	    t1 t2 t3

	    # ls /mnt/a
	    t1 t2 t3

	Note that the mount has propagated to the mount at /mnt as well.

	And the same is true even when /dev/sd0 is mounted on /mnt/a. The
	contents will be visible under /tmp/a too.


2b) A slave mount is like a shared mount except that mount and umount events
	only propagate towards it.

	All slave mounts have a master mount which is shared.

	Here is an example:

	Let's say /mnt has a mount which is shared::

	    # mount --make-shared /mnt

	Let's bind mount /mnt to /tmp::

	    # mount --bind /mnt /tmp

	The new mount at /tmp becomes a shared mount and it is a replica of
	the mount at /mnt.

	Now let's make the mount at /tmp a slave of /mnt::

	    # mount --make-slave /tmp

	Let's mount /dev/sd0 on /mnt/a::

	    # mount /dev/sd0 /mnt/a

	    # ls /mnt/a
	    t1 t2 t3

	    # ls /tmp/a
	    t1 t2 t3

	Note that the mount event has propagated to the mount at /tmp.

	However, let's see what happens if we mount something on the mount at
	/tmp::

	    # mount /dev/sd1 /tmp/b

	    # ls /tmp/b
	    s1 s2 s3

	    # ls /mnt/b

	Note how the mount event has not propagated to the mount at
	/mnt.


2c) A private mount does not forward or receive propagation.

	This is the mount we are familiar with. It's the default type.


2d) An unbindable mount is a private mount that cannot be bind mounted.

	Let's say we have a mount at /mnt and we make it unbindable::

	    # mount --make-unbindable /mnt

	Let's try to bind mount this mount somewhere else::

	    # mount --bind /mnt /tmp
	    mount: wrong fs type, bad option, bad superblock on /mnt,
		    or too many mounted file systems

	Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

	The mount command (util-linux package) can be used to set mount
	states::

	    mount --make-shared mountpoint
	    mount --make-slave mountpoint
	    mount --make-private mountpoint
	    mount --make-unbindable mountpoint


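Once a state has been set, it can be read back from the optional fields of
/proc/<pid>/mountinfo (``shared:N``, ``master:N``, ``unbindable``). The
following is a minimal sketch, not part of the original text, that classifies
a single mountinfo line. The function name ``propagation_state`` is
hypothetical, and the sample line in the usage note is illustrative only.

```python
def propagation_state(mountinfo_line):
    """Return (mount point, state) for one /proc/<pid>/mountinfo line."""
    fields = mountinfo_line.split()
    # Fields 0-5 are fixed; zero or more optional fields follow, terminated
    # by a single "-" separator.
    sep = fields.index("-", 6)
    optional = fields[6:sep]
    shared = any(f.startswith("shared:") for f in optional)
    slave = any(f.startswith("master:") for f in optional)
    if "unbindable" in optional:
        state = "unbindable"
    elif shared and slave:
        state = "shared and slave"
    elif shared:
        state = "shared"
    elif slave:
        state = "slave"
    else:
        state = "private"
    return fields[4], state
```

For example, a line such as ``25 1 8:1 / / rw,relatime shared:1 - ext4
/dev/sda1 rw`` is classified as a shared mount at ``/``, and a line with no
optional fields is classified as private.
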
4) Use cases
------------

	A) A process wants to clone its own namespace, but still wants to
	   access the CD that got mounted recently.

	   Solution:

		The system administrator can make the mount at /cdrom shared::

		    mount --bind /cdrom /cdrom
		    mount --make-shared /cdrom

		Now any process that clones off a new namespace will have a
		mount at /cdrom which is a replica of the same mount in the
		parent namespace.

		So when a CD is inserted and mounted at /cdrom, that mount gets
		propagated to the other mounts at /cdrom in all the other
		cloned namespaces.

	B) A process wants its mounts to be invisible to any other process,
	   but still be able to see the other system mounts.

	   Solution:

		To begin with, the administrator can mark the entire mount tree
		as shareable::

		    mount --make-rshared /

		A new process can clone off a new namespace and mark some part
		of its namespace as slave::

		    mount --make-rslave /myprivatetree

		Henceforth any mounts within /myprivatetree done by the
		process will not show up in any other namespace. However, mounts
		done in the parent namespace under /myprivatetree still show
		up in the process's namespace.


	Apart from the above semantics, this feature provides the
	building blocks to solve the following problems:

	C)  Per-user namespace

		The above semantics allow a way to share mounts across
		namespaces.  But namespaces are associated with processes. If
		namespaces are made first-class objects with a user API to
		associate/disassociate a namespace with a userid, then each user
		could have his/her own namespace and tailor it to his/her
		requirements. This needs to be supported in PAM.

	D)  Versioned files

		If the entire mount tree is visible at multiple locations, then
		an underlying versioning file system can return different
		versions of the file depending on the path used to access that
		file.

		An example is::

		    mount --make-shared /
		    mount --rbind / /view/v1
		    mount --rbind / /view/v2
		    mount --rbind / /view/v3
		    mount --rbind / /view/v4

		and if /usr has a versioning filesystem mounted, then that
		mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
		/view/v4/usr too.

		A user can request the v3 version of the file
		/usr/fs/namespace.c by accessing /view/v3/usr/fs/namespace.c.
		The underlying versioning filesystem can then decipher that
		the v3 version of the filesystem is being requested and return
		the corresponding inode.

5) Detailed semantics
---------------------

	The section below explains the detailed semantics of the
	bind, rbind, move, mount, umount and clone-namespace operations.

	Note: the word 'vfsmount' and the noun 'mount' have been used
	to mean the same thing throughout this document.

5a) Mount states

	A given mount can be in one of the following states:

	1) shared
	2) slave
	3) shared and slave
	4) private
	5) unbindable

	A 'propagation event' is defined as an event generated on a vfsmount
	that leads to mount or unmount actions in other vfsmounts.

	A 'peer group' is defined as a group of vfsmounts that propagate
	events to each other.

	(1) Shared mounts

		A 'shared mount' is defined as a vfsmount that belongs to a
		'peer group'.

		For example::

			mount --make-shared /mnt
			mount --bind /mnt /tmp

		The mount at /mnt and that at /tmp are both shared and belong
		to the same peer group. Anything mounted or unmounted under
		/mnt or /tmp is reflected in all the other mounts of its peer
		group.


	(2) Slave mounts

		A 'slave mount' is defined as a vfsmount that receives
		propagation events and does not forward propagation events.

		A slave mount, as the name implies, has a master mount from
		which mount/unmount events are received. Events do not
		propagate from the slave mount to the master.  Only a shared
		mount can be made a slave by executing the following command::

			mount --make-slave mount

		A shared mount that is made a slave is no longer shared unless
		it is modified to become shared again.

	(3) Shared and Slave

		A vfsmount can be both shared and slave.  This state
		indicates that the mount is a slave of some vfsmount, and
		has its own peer group too.  This vfsmount receives propagation
		events from its master vfsmount, and also forwards propagation
		events to its 'peer group' and to its slave vfsmounts.

		Strictly speaking, the vfsmount is shared, having its own
		peer group, and this peer group is a slave of some other
		peer group.

		Only a slave vfsmount can be made 'shared and slave', either
		by executing the following command::

			mount --make-shared mount

		or by moving the slave vfsmount under a shared vfsmount.

	(4) Private mount

		A 'private mount' is defined as a vfsmount that does not
		receive or forward any propagation events.

	(5) Unbindable mount

		An 'unbindable mount' is defined as a vfsmount that does not
		receive or forward any propagation events and cannot
		be bind mounted.


	State diagram:

	The state diagram below explains the state transition of a mount,
	in response to various commands::

	    --------------------------------------------------------------------------
	    |             |make-shared |  make-slave  | make-private |make-unbindable|
	    |-------------|------------|--------------|--------------|---------------|
	    |shared       |shared      |*slave/private|   private    |  unbindable   |
	    |             |            |              |              |               |
	    |-------------|------------|--------------|--------------|---------------|
	    |slave        |shared      | **slave      |    private   |  unbindable   |
	    |             |and slave   |              |              |               |
	    |-------------|------------|--------------|--------------|---------------|
	    |shared       |shared      | slave        |    private   |  unbindable   |
	    |and slave    |and slave   |              |              |               |
	    |-------------|------------|--------------|--------------|---------------|
	    |private      |shared      |  **private   |    private   |  unbindable   |
	    |-------------|------------|--------------|--------------|---------------|
	    |unbindable   |shared      |**unbindable  |    private   |  unbindable   |
	    --------------------------------------------------------------------------

	    * if the shared mount is the only mount in its peer group, making it
	    a slave makes it private automatically, because there is no master
	    to which it can be slaved.

	    ** slaving a non-shared mount has no effect on the mount.

	Apart from the commands listed above, the 'move' operation also changes
	the state of a mount depending on the type of the destination mount.
	It's explained in section 5d.

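The state diagram above can be restated as a small lookup function. This is
only an illustrative sketch of the table, not kernel code: the function name
``transition`` and the ``alone_in_peer_group`` parameter are assumptions made
here to express the footnote about a lone shared mount.

```python
STATES = ("shared", "slave", "shared and slave", "private", "unbindable")


def transition(state, command, alone_in_peer_group=False):
    """Return the state a mount ends up in after a --make-* command."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    if command == "make-private":
        return "private"
    if command == "make-unbindable":
        return "unbindable"
    if command == "make-shared":
        # A mount that already has a master keeps it and gains a peer group.
        return "shared and slave" if "slave" in state else "shared"
    if command == "make-slave":
        if state == "shared":
            # No master is available when the peer group has one member.
            return "private" if alone_in_peer_group else "slave"
        if state == "shared and slave":
            return "slave"
        # Slaving a non-shared mount has no effect.
        return state
    raise ValueError(f"unknown command: {command}")
```

For instance, ``transition("slave", "make-shared")`` gives ``"shared and
slave"``, matching the second row of the table.
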
5b) Bind semantics

	Consider the following command::

	    mount --bind A/a  B/b

	where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
	is the destination mount and 'b' is the dentry in the destination mount.

	The outcome depends on the type of mount of 'A' and 'B'. The table
	below contains a quick reference::

	    --------------------------------------------------------------------------
	    |         BIND MOUNT OPERATION                                           |
	    |************************************************************************|
	    |source(A)->| shared      |       private  |       slave    | unbindable |
	    | dest(B)  |              |                |                |            |
	    |   |      |              |                |                |            |
	    |   v      |              |                |                |            |
	    |************************************************************************|
	    |  shared  | shared       |     shared     | shared & slave |  invalid   |
	    |          |              |                |                |            |
	    |non-shared| shared       |      private   |      slave     |  invalid   |
	    **************************************************************************

	Details:

    1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
	which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
	are created and mounted at the dentry 'b' on all mounts where 'B'
	propagates to. A new propagation tree containing 'C1',..,'Cn' is
	created. This propagation tree is identical to the propagation tree of
	'B'.  And finally the peer group of 'C' is merged with the peer group
	of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
	which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
	are created and mounted at the dentry 'b' on all mounts where 'B'
	propagates to. A new propagation tree is set up containing all new
	mounts 'C', 'C1', .., 'Cn' with exactly the same configuration as the
	propagation tree for 'B'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
	mount 'C' which is a clone of 'A' is created. Its root dentry is 'a'.
	'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
	'C3' ... are created and mounted at the dentry 'b' on all mounts where
	'B' propagates to. A new propagation tree containing the new mounts
	'C','C1',..  'Cn' is created. This propagation tree is identical to the
	propagation tree for 'B'. And finally the mount 'C' and its peer group
	are made the slave of mount 'Z'.  In other words, mount 'C' is in the
	state 'slave and shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
	invalid operation.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
	unbindable) mount. A new mount 'C' which is a clone of 'A' is created.
	Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
	which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
	peer group of 'A'.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
	new mount 'C' which is a clone of 'A' is created. Its root dentry is
	'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
	slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
	'Z'.  All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
	mount/unmount events on 'A' do not propagate anywhere else. Similarly
	mount/unmount events on 'C' do not propagate anywhere else.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
	invalid operation. An unbindable mount cannot be bind mounted.

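The eight bind cases above reduce to a function of the source state and
whether the destination is shared. The sketch below only encodes the resulting
state of the new mount 'C' as listed in the quick-reference table; it does not
model the creation of 'C1'..'Cn'. The name ``bind_result`` is hypothetical.

```python
def bind_result(source_state, dest_is_shared):
    """State of the new mount 'C' created by 'mount --bind A/a B/b'."""
    if source_state == "unbindable":
        # Cases 4 and 8: an unbindable mount cannot be bind mounted.
        raise ValueError("binding an unbindable mount is invalid")
    if source_state == "shared":
        # Cases 1 and 6: 'C' joins the peer group of 'A'.
        return "shared"
    if source_state == "private":
        # Cases 2 and 5.
        return "shared" if dest_is_shared else "private"
    if source_state == "slave":
        # Cases 3 and 7: 'C' stays a slave of 'Z' either way.
        return "shared and slave" if dest_is_shared else "slave"
    raise ValueError(f"state not covered by the table: {source_state}")
```
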
5c) Rbind semantics

	rbind is the recursive form of bind. Bind replicates the specified
	mount.  Rbind replicates all the mounts in the tree belonging to the
	specified mount.  An rbind mount is a bind mount applied to all the
	mounts in the tree.

	If the source tree that is rbound has some unbindable mounts,
	then the subtree under the unbindable mount is pruned in the new
	location.

	For example, let's say we have the following mount tree::

		A
	      /   \
	      B   C
	     / \ / \
	     D E F G

	Let's say all the mounts except the mount C in the tree are
	of a type other than unbindable.

	If this tree is rbound to, say, Z, we will have the following tree
	at the new location::

		Z
		|
		A'
	       /
	      B'		Note how the tree under C is pruned
	     / \		in the new location.
	    D' E'



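The pruning rule can be sketched as a recursive copy that skips every
unbindable mount together with everything beneath it. This is an illustrative
model under assumed data structures (a dict from mount name to child names);
``rbind_copy`` is a hypothetical name and nothing here is kernel code.

```python
def rbind_copy(tree, unbindable, node):
    """Copy the mount tree rooted at node, pruning unbindable subtrees.

    tree maps a mount name to the list of its child mount names.
    Returns (copied name, [copied children]).
    """
    if node in unbindable:
        raise ValueError("cannot rbind an unbindable mount")
    children = [
        rbind_copy(tree, unbindable, child)
        for child in tree.get(node, [])
        if child not in unbindable  # drop the mount and its whole subtree
    ]
    return (node + "'", children)
```

With ``tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}`` and
``unbindable = {"C"}``, copying from ``"A"`` keeps only A', B', D' and E',
as in the diagram above.
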
5d) Move semantics

	Consider the following command::

	    mount --move A  B/b

	where 'A' is the source mount, 'B' is the destination mount and 'b' is
	the dentry in the destination mount.

	The outcome depends on the type of the mount of 'A' and 'B'. The table
	below is a quick reference::

	    ---------------------------------------------------------------------------
	    |         MOVE MOUNT OPERATION                                            |
	    |*************************************************************************|
	    | source(A)->| shared      |       private  |       slave    | unbindable |
	    | dest(B)  |               |                |                |            |
	    |   |      |               |                |                |            |
	    |   v      |               |                |                |            |
	    |*************************************************************************|
	    |  shared  | shared        |     shared     |shared and slave|  invalid   |
	    |          |               |                |                |            |
	    |non-shared| shared        |      private   |    slave       | unbindable |
	    ***************************************************************************

	.. Note:: moving a mount residing under a shared mount is invalid.

	Details follow:

    1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
	mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1', 'A2'...'An'
	are created and mounted at dentry 'b' on all mounts that receive
	propagation from mount 'B'. A new propagation tree is created in the
	exact same configuration as that of 'B'. This new propagation tree
	contains all the new mounts 'A1', 'A2'...  'An'.  And this new
	propagation tree is appended to the already existing propagation tree
	of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
	mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
	are created and mounted at dentry 'b' on all mounts that receive
	propagation from mount 'B'. The mount 'A' becomes a shared mount and a
	propagation tree is created which is identical to that of
	'B'. This new propagation tree contains all the new mounts 'A1',
	'A2'...  'An'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
	mount 'A' is mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1',
	'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
	receive propagation from mount 'B'. A new propagation tree is created
	in the exact same configuration as that of 'B'. This new propagation
	tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
	propagation tree is appended to the already existing propagation tree of
	'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
	becomes 'shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
	is invalid, because mounting anything on the shared mount 'B' can
	create new mounts that get mounted on the mounts that receive
	propagation from 'B'.  And since the mount 'A' is unbindable, cloning
	it to mount at other mountpoints is not possible.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
	unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
	is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
	shared mount.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
	The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
	continues to be a slave mount of mount 'Z'.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
	'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
	unbindable mount.

5e) Mount semantics

	Consider the following command::

	    mount device  B/b

	'B' is the destination mount and 'b' is the dentry in the destination
	mount.

	The above operation is the same as the bind operation with the
	exception that the source mount is always a private mount.


5f) Unmount semantics

	Consider the following command::

	    umount A

	where 'A' is a mount mounted on mount 'B' at dentry 'b'.

	If mount 'B' is shared, then all most-recently-mounted mounts at dentry
	'b' on mounts that receive propagation from mount 'B' and do not have
	sub-mounts within them are unmounted.

	Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
	each other.

	Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
	'B1', 'B2' and 'B3' respectively.

	Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
	mount 'B1', 'B2' and 'B3' respectively.

	If 'C1' is unmounted, all the mounts that are most-recently-mounted on
	'B1' and on the mounts that 'B1' propagates to are unmounted.

	'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
	on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

	So all of 'C1', 'C2' and 'C3' should be unmounted.

	If either 'C2' or 'C3' has some child mounts, then that mount is not
	unmounted, but all other mounts are unmounted. However, if 'C1' is told
	to be unmounted and 'C1' has some sub-mounts, the umount operation
	fails entirely.

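The 'B1', 'B2', 'B3' example can be traced with a toy model in which each
peer keeps a stack of the mounts at dentry 'b', oldest first. This is a
simplified sketch under those assumptions, not the kernel algorithm; the
function name ``umount_top`` and its parameters are hypothetical.

```python
def umount_top(stacks, parent, with_children=frozenset()):
    """Unmount the most recent mount at dentry 'b' on parent and its peers.

    stacks maps each peer mount to the list of mounts stacked at 'b'.
    with_children names the mounts that have sub-mounts of their own.
    Returns the names of the mounts that were unmounted.
    """
    target = stacks[parent][-1]
    if target in with_children:
        # The mount named on the command line is busy: nothing is unmounted.
        raise OSError(f"{target} has sub-mounts, umount fails entirely")
    removed = []
    for peer, stack in stacks.items():
        if not stack:
            continue
        if peer != parent and stack[-1] in with_children:
            continue  # a propagated umount skips mounts with children
        removed.append(stack.pop())
    return removed
```

Starting from ``{"B1": ["A1", "C1"], "B2": ["A2", "C2"], "B3": ["A3",
"C3"]}``, unmounting on ``"B1"`` removes 'C1', 'C2' and 'C3'; if 'C2' is
listed in ``with_children``, only 'C1' and 'C3' are removed.
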
5g) Clone Namespace

	A cloned namespace contains all the mounts of the parent
	namespace.

	Let's say 'A' and 'B' are the corresponding mounts in the parent and the
	child namespace.

	If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
	each other.

	If 'A' is a slave mount of 'Z', then 'B' is also a slave mount of
	'Z'.

	If 'A' is a private mount, then 'B' is a private mount too.

	If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


6) Quiz
-------

	A. What is the result of the following command sequence?

		::

		    mount --bind /mnt /mnt
		    mount --make-shared /mnt
		    mount --bind /mnt /tmp
		    mount --move /tmp /mnt/1

		What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
		Should they all be identical? Or should only /mnt and /mnt/1
		be identical?


	B. What is the result of the following command sequence?

		::

		    mount --make-rshared /
		    mkdir -p /v/1
		    mount --rbind / /v/1

		What should the contents of /v/1/v/1 be?


	C. What is the result of the following command sequence?

		::

		    mount --bind /mnt /mnt
		    mount --make-shared /mnt
		    mkdir -p /mnt/1/2/3 /mnt/1/test
		    mount --bind /mnt/1 /tmp
		    mount --make-slave /mnt
		    mount --make-shared /mnt
		    mount --bind /mnt/1/2 /tmp1
		    mount --make-slave /mnt

		At this point we have the first mount at /tmp and
		its root dentry is 1. Let's call this mount 'A'.
		And then we have a second mount at /tmp1 with root
		dentry 2. Let's call this mount 'B'.
		Next we have a third mount at /mnt with root dentry
		mnt. Let's call this mount 'C'.

		'B' is the slave of 'A' and 'C' is a slave of 'B'::

		    A -> B -> C

		At this point, if we execute the following command::

		    mount --bind /bin /tmp/test

		the mount is attempted on 'A'.

		Will the mount propagate to 'B' and 'C'?

		What would the contents of /mnt/1/test be?

7) FAQ
------

	Q1. Why is bind mount needed? How is it different from symbolic links?

		Symbolic links can get stale if the destination mount gets
		unmounted or moved. Bind mounts continue to exist even if the
		other mount is unmounted or moved.

	Q2. Why can't the shared subtree be implemented using exportfs?

		exportfs is a heavyweight way of accomplishing part of what
		shared subtree can do. I cannot imagine a way to implement the
		semantics of slave mount using exportfs.

	Q3. Why is unbindable mount needed?

		Let's say we want to replicate the mount tree at multiple
		locations within the same subtree.

		If one rbind mounts a tree within the same subtree 'n' times,
		the number of mounts created grows at least exponentially
		with 'n'.  Having unbindable mounts can help prune the unneeded
		bind mounts. Here is an example.

		step 1:
		   let's say the root tree has just two directories with
		   one vfsmount::

			            root
			           /    \
			          tmp    usr

		   And we want to replicate the tree at multiple
		   mountpoints under /root/tmp

		step 2:
		      ::


			mount --make-shared /root

			mkdir -p /tmp/m1

			mount --rbind /root /tmp/m1

		      the new tree now looks like this::

			            root
			           /    \
			         tmp    usr
			        /
			       m1
			      /  \
			     tmp  usr
			     /
			    m1

		      it has two vfsmounts

		step 3:
		      ::

			mkdir -p /tmp/m2
			mount --rbind /root /tmp/m2

		      the new tree now looks like this::

			              root
			             /    \
			           tmp     usr
			          /    \
			        m1       m2
			       / \       /  \
			     tmp  usr   tmp  usr
			     / \          /
			    m1  m2      m1
			        / \     /  \
			      tmp usr  tmp   usr
			      /        / \
			     m1       m1  m2
			    /  \
			  tmp   usr
			  /  \
			 m1   m2

		      it has 6 vfsmounts

		step 4:
		      ::

			mkdir -p /tmp/m3
			mount --rbind /root /tmp/m3

		      I won't draw the tree, but it has 24 vfsmounts.


		At step i the number of vfsmounts is V[i] = i*V[i-1], with
		V[1] = 1.  That is V[i] = i! (1, 2, 6, 24, ...), which grows
		even faster than an exponential function. And this tree has
		way more mounts than what we really needed in the first place.

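The recurrence V[i] = i*V[i-1] can be checked against the counts quoted in
the steps above with a few lines of Python. This is only an arithmetic
illustration; the function name ``vfsmount_counts`` is made up for this
sketch.

```python
def vfsmount_counts(steps):
    """Return [V[1], ..., V[steps]] for V[i] = i * V[i-1] with V[1] = 1."""
    counts = [1]
    for i in range(2, steps + 1):
        counts.append(i * counts[-1])
    return counts


print(vfsmount_counts(4))  # [1, 2, 6, 24]
```
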
		One could use a series of umounts at each step to prune
		out the unneeded mounts. But there is a better solution.
		Unbindable mounts come in handy here.

		step 1:
		   let's say the root tree has just two directories with
		   one vfsmount::

			            root
			           /    \
			          tmp    usr

		   How do we set up the same tree at multiple locations under
		   /root/tmp?

		step 2:
		      ::


			mount --bind /root/tmp /root/tmp

			mount --make-rshared /root
			mount --make-unbindable /root/tmp

			mkdir -p /tmp/m1

			mount --rbind /root /tmp/m1

		      the new tree now looks like this::

			            root
			           /    \
			         tmp    usr
			        /
			       m1
			      /  \
			     tmp  usr

		step 3:
		      ::

			mkdir -p /tmp/m2
			mount --rbind /root /tmp/m2

		      the new tree now looks like this::

			            root
			           /    \
			         tmp    usr
			        /   \
			       m1     m2
			      /  \     / \
			     tmp  usr tmp usr

		step 4:
		      ::

			mkdir -p /tmp/m3
			mount --rbind /root /tmp/m3

		      the new tree now looks like this::

			                  root
			              /           \
			             tmp           usr
			         /    \    \
			       m1     m2     m3
			      /  \     / \    /  \
			     tmp  usr tmp usr tmp usr

    8438) Implementation
    844
    8458A) Datastructure
    846
	Four new fields are introduced to struct vfsmount:

	*   ->mnt_share
	*   ->mnt_slave_list
	*   ->mnt_slave
	*   ->mnt_master

	->mnt_share
		links together all the mounts to/from which this vfsmount
		sends/receives propagation events.

	->mnt_slave_list
		links all the mounts to which this vfsmount propagates.

	->mnt_slave
		links together all the slaves that its master vfsmount
		propagates to.

	->mnt_master
		points to the master vfsmount from which this vfsmount
		receives propagation.

	->mnt_flags
		takes two more flags to indicate the propagation status of
		the vfsmount.  MNT_SHARED indicates that the vfsmount is a shared
		vfsmount.  MNT_UNBINDABLE indicates that the vfsmount cannot be
		replicated.

	All the shared vfsmounts in a peer group form a cyclic list through
	->mnt_share.

	All vfsmounts with the same ->mnt_master form a cyclic list anchored
	in ->mnt_master->mnt_slave_list and going through ->mnt_slave.

	 ->mnt_master can point to arbitrary (and possibly different) members
	 of the master peer group.  To find all immediate slaves of a peer
	 group you need to go through _all_ ->mnt_slave_list of its members.
	 Conceptually it's just a single set - distribution among the
	 individual lists does not affect propagation or the way the
	 propagation tree is modified by operations.

	All vfsmounts in a peer group have the same ->mnt_master.  If it is
	non-NULL, they form a contiguous (ordered) segment of the slave list.

	An example propagation tree is shown in the figure below.
	[ NOTE: Though it looks like a forest, if we consider all the shared
	mounts as a conceptual entity called 'pnode', it becomes a tree]::


	        A <--> B <--> C <---> D
	       /|\            /|      |\
	      / F G          J K      H I
	     /
	    E<-->K
	        /|\
	       M L N

	In the above figure A, B, C and D are all shared and propagate to each
	other.  'A' has three slave mounts, 'E', 'F' and 'G'; 'C' has two slave
	mounts, 'J' and 'K'; and 'D' has two slave mounts, 'H' and 'I'.
	'E' is also shared with 'K' and they propagate to each other.  'K'
	has three slaves, 'M', 'L' and 'N'.

	A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'.

	A's ->mnt_slave_list links with the ->mnt_slave of 'E', 'K', 'F' and 'G'.

	E's ->mnt_share links with the ->mnt_share of 'K'.

	'E', 'K', 'F' and 'G' have their ->mnt_master pointing to the struct
	vfsmount of 'A'.

	'M', 'L' and 'N' have their ->mnt_master pointing to the struct
	vfsmount of 'K'.

	K's ->mnt_slave_list links with the ->mnt_slave of 'M', 'L' and 'N'.

	C's ->mnt_slave_list links with the ->mnt_slave of 'J' and 'K'.

	'J' and 'K' have their ->mnt_master pointing to the struct vfsmount
	of 'C'.

	Finally, D's ->mnt_slave_list links with the ->mnt_slave of 'H' and 'I'.

	'H' and 'I' have their ->mnt_master pointing to the struct vfsmount
	of 'D'.

	NOTE: The propagation tree is orthogonal to the mount tree.

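The linkage described above can be mimicked in a few lines of user-space
Python.  This is only a sketch under stated assumptions, not the kernel's
representation: ``Mount``, ``add_peer()``, ``add_slave()``, ``peer_group()``,
``immediate_slaves()`` and ``build_example()`` are hypothetical names, the
slave list is a plain Python list instead of a cyclic list threaded through
->mnt_slave, and the example uses only the top row of the figure with the
slaves F, G, J, H and I.  It shows why finding the immediate slaves of a
peer group means visiting the ->mnt_slave_list of every member.

```python
class Mount:
    """Toy stand-in for struct vfsmount; only propagation links are kept."""

    def __init__(self, name):
        self.name = name
        self.mnt_share = self       # next peer; a lone mount is a ring of one
        self.mnt_slave_list = []    # slaves whose mnt_master is this mount
        self.mnt_master = None      # master this mount receives events from


def add_peer(member, new):
    """Splice 'new' into the peer ring right after 'member'."""
    new.mnt_share = member.mnt_share
    member.mnt_share = new
    new.mnt_master = member.mnt_master  # peers share the same master


def add_slave(master, slave):
    """Make 'slave' receive propagation from 'master'."""
    slave.mnt_master = master
    master.mnt_slave_list.append(slave)


def peer_group(mount):
    """Walk the mnt_share ring once and return every member."""
    members = [mount]
    peer = mount.mnt_share
    while peer is not mount:
        members.append(peer)
        peer = peer.mnt_share
    return members


def immediate_slaves(mount):
    """Collect the slaves hanging off every member of mount's peer group."""
    return [s for peer in peer_group(mount) for s in peer.mnt_slave_list]


def build_example():
    """Peers A, B, C, D with slaves F, G (of A), J (of C) and H, I (of D)."""
    a, b, c, d = (Mount(n) for n in "ABCD")
    add_peer(a, b)
    add_peer(b, c)
    add_peer(c, d)
    for master, names in ((a, "FG"), (c, "J"), (d, "HI")):
        for name in names:
            add_slave(master, Mount(name))
    return a, b, c, d
```

Starting from any of the four peers, ``immediate_slaves()`` returns the same
set of five slaves, which matches the remark that the distribution of slaves
among the individual lists is conceptually a single set.
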
8B) Locking

	->mnt_share, ->mnt_slave, ->mnt_slave_list and ->mnt_master are protected
	by namespace_sem (exclusive for modifications, shared for reading).

	Normally ->mnt_flags modifications are serialized by vfsmount_lock.
	There are two exceptions: do_add_mount() and clone_mnt().
	The former modifies a vfsmount that has not yet been visible in any
	shared data structures.
	The latter holds namespace_sem and the only references to the vfsmount
	are in lists that can't be traversed without namespace_sem.

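The "exclusive for modifications, shared for reading" rule is the usual
reader/writer pattern.  As a rough user-space illustration only, the sketch
below builds such a lock from Python's ``threading.Condition``.  The class
name ``NamespaceSem`` is hypothetical, and its method names are chosen to
echo the kernel's rw-semaphore calls; it makes no attempt to reproduce the
kernel's fairness or sleeping behaviour.

```python
import threading


class NamespaceSem:
    """Minimal reader/writer lock: many readers or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of readers currently inside
        self._writer = False    # True while a writer holds the lock

    def down_read(self):
        """Enter as a reader; waits while a writer is active."""
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def up_read(self):
        """Leave as a reader; wakes waiters when the last reader leaves."""
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def down_write(self):
        """Enter as the writer; waits for all readers and writers to leave."""
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def up_write(self):
        """Leave as the writer and wake everyone who is waiting."""
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

In this model, code that walks the propagation links would bracket the walk
with ``down_read()`` and ``up_read()``, and code that changes them would use
``down_write()`` and ``up_write()``.
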
8C) Algorithm

	The crux of the implementation resides in the rbind/move operation.

	The overall algorithm breaks the operation into 3 phases (look at
	attach_recursive_mnt() and propagate_mnt()):

	1. prepare phase.
	2. commit phase.
	3. abort phase.

	Prepare phase:

	for each mount in the source tree:

		   a) Create the necessary number of mount trees to
		      be attached to each of the mounts that receive
		      propagation from the destination mount.
		   b) Do not attach any of the trees to its destination.
		      However note down its ->mnt_parent and ->mnt_mountpoint
		   c) Link all the new mounts to form a propagation tree that
		      is identical to the propagation tree of the destination
		      mount.

		   If this phase is successful, there should be 'n' new
		   propagation trees, where 'n' is the number of mounts in the
		   source tree.  There should also be 'm' new mount trees,
		   where 'm' is the number of mounts to which the destination
		   mount propagates.  Go to the commit phase.

		   If any memory allocation fails, go to the abort phase.

	Commit phase
		attach each of the mount trees to their corresponding
		destination mounts.

	Abort phase
		delete all the newly created trees.

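The prepare/commit/abort split can be sketched in user-space Python.  This
is a simplified model under stated assumptions, not a transcription of
attach_recursive_mnt() or propagate_mnt(): the ``Mount`` class,
``clone_tree()`` and ``attach_with_propagation()`` are hypothetical names,
the list ``receivers`` stands for the mounts that receive propagation from
the destination, ``max_clones`` is an invented knob that simulates a failed
allocation, and step c) (linking the new mounts into a propagation tree) is
not modelled.

```python
class Mount:
    """Toy mount with a parent pointer and a list of child mounts."""

    def __init__(self, name):
        self.name = name
        self.mnt_parent = None
        self.children = []


def clone_tree(src):
    """Copy a whole mount tree without attaching the copy anywhere."""
    copy = Mount(src.name)
    for child in src.children:
        child_copy = clone_tree(child)
        child_copy.mnt_parent = copy
        copy.children.append(child_copy)
    return copy


def attach_with_propagation(source, dest, receivers, max_clones=None):
    """Attach source at dest and a copy of it at every receiver.

    Returns True after a commit and False after an abort.
    """
    # Prepare: build every tree and note its intended parent,
    # but do not attach anything yet.
    pending = [(source, dest)]
    for made, target in enumerate(receivers):
        if max_clones is not None and made >= max_clones:
            # Abort: drop all newly created trees; nothing was attached.
            pending.clear()
            return False
        pending.append((clone_tree(source), target))

    # Commit: attach each tree to its recorded destination.
    for tree, parent in pending:
        tree.mnt_parent = parent
        parent.children.append(tree)
    return True
```

Because nothing is attached during the prepare phase, an abort leaves the
destination and every receiver exactly as they were before the call.
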
	.. Note::
	   all the propagation related functionality resides in the file pnode.c


------------------------------------------------------------------------

version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)

version 0.2  (Incorporated comments from Al Viro)