.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
    1) Overview
    2) Features
    3) Setting mount states
    4) Use cases
    5) Detailed semantics
    6) Quiz
    7) FAQ
    8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently.  Shared subtree semantics provide the necessary
mechanism to accomplish the above.

They also provide the necessary building blocks for features like
per-user namespaces and versioned filesystems.

2) Features
-----------

Shared subtree provides four different flavors of mounts; struct vfsmount to be
precise:

    a. shared mount
    b. slave mount
    c. private mount
    d. unbindable mount


2a) A shared mount can be replicated to as many mountpoints as desired, and
all the replicas continue to be exactly the same.

    Here is an example:

    Let's say /mnt has a mount that is shared::

        # mount --make-shared /mnt

    Note: mount(8) command now supports the --make-shared flag,
    so the sample 'smount' program is no longer needed and has been
    removed.

    ::

        # mount --bind /mnt /tmp

    The above command replicates the mount at /mnt to the mountpoint /tmp
    and the contents of both the mounts remain identical.

    ::

        # ls /mnt
        a b c

        # ls /tmp
        a b c

    Now let's say we mount a device at /tmp/a::

        # mount /dev/sd0 /tmp/a

        # ls /tmp/a
        t1 t2 t3

        # ls /mnt/a
        t1 t2 t3

    Note that the mount has propagated to the mount at /mnt as well.

    And the same is true even when /dev/sd0 is mounted on /mnt/a. The
    contents will be visible under /tmp/a too.


2b) A slave mount is like a shared mount except that mount and umount events
only propagate towards it.

    All slave mounts have a master mount which is shared.

    Here is an example:

    Let's say /mnt has a mount which is shared::

        # mount --make-shared /mnt

    Let's bind mount /mnt to /tmp::

        # mount --bind /mnt /tmp

    The new mount at /tmp becomes a shared mount and it is a replica of
    the mount at /mnt.

    Now let's make the mount at /tmp a slave of /mnt::

        # mount --make-slave /tmp

    Let's mount /dev/sd0 on /mnt/a::

        # mount /dev/sd0 /mnt/a

        # ls /mnt/a
        t1 t2 t3

        # ls /tmp/a
        t1 t2 t3

    Note the mount event has propagated to the mount at /tmp.

    However, let's see what happens if we mount something on the mount
    at /tmp::

        # mount /dev/sd1 /tmp/b

        # ls /tmp/b
        s1 s2 s3

        # ls /mnt/b

    Note how the mount event has not propagated to the mount at
    /mnt.


2c) A private mount does not forward or receive propagation.

    This is the mount we are familiar with. It's the default type.


2d) An unbindable mount is a private mount that cannot be bind mounted.

    Let's say we have a mount at /mnt and we make it unbindable::

        # mount --make-unbindable /mnt

    Let's try to bind mount this mount somewhere else::

        # mount --bind /mnt /tmp
        mount: wrong fs type, bad option, bad superblock on /mnt,
               or too many mounted file systems

    Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

    The mount command (util-linux package) can be used to set mount
    states::

        mount --make-shared mountpoint
        mount --make-slave mountpoint
        mount --make-private mountpoint
        mount --make-unbindable mountpoint


4) Use cases
------------

    A) A process wants to clone its own namespace, but still wants to
       access the CD that got mounted recently.

       Solution:

           The system administrator can make the mount at /cdrom shared::

               mount --bind /cdrom /cdrom
               mount --make-shared /cdrom

           Now any process that clones off a new namespace will have a
           mount at /cdrom which is a replica of the same mount in the
           parent namespace.

           So when a CD is inserted and mounted at /cdrom, that mount gets
           propagated to the other mount at /cdrom in all the other cloned
           namespaces.

    B) A process wants its mounts invisible to any other process, but
       still be able to see the other system mounts.

       Solution:

           To begin with, the administrator can mark the entire mount tree
           as shareable::

               mount --make-rshared /

           A new process can clone off a new namespace and mark some part
           of its namespace as slave::

               mount --make-rslave /myprivatetree

           Henceforth, any mounts within /myprivatetree done by the
           process will not show up in any other namespace. However, mounts
           done in the parent namespace under /myprivatetree still show
           up in the process's namespace.


    Apart from the above semantics, this feature provides the
    building blocks to solve the following problems:

    C) Per-user namespace

       The above semantics allow a way to share mounts across
       namespaces.  But namespaces are associated with processes. If
       namespaces are made first class objects with a user API to
       associate/disassociate a namespace with a userid, then each user
       could have his/her own namespace and tailor it to his/her
       requirements. This needs to be supported in PAM.

    D) Versioned files

       If the entire mount tree is visible at multiple locations, then
       an underlying versioning file system can return different
       versions of the file depending on the path used to access that
       file.

       An example is::

           mount --make-shared /
           mount --rbind / /view/v1
           mount --rbind / /view/v2
           mount --rbind / /view/v3
           mount --rbind / /view/v4

       and if /usr has a versioning filesystem mounted, then that
       mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
       /view/v4/usr too.

       A user can request the v3 version of the file /usr/fs/namespace.c
       by accessing /view/v3/usr/fs/namespace.c. The underlying
       versioning filesystem can then decipher that the v3 version of the
       filesystem is being requested and return the corresponding
       inode.

5) Detailed semantics
---------------------

    The section below explains the detailed semantics of
    bind, rbind, move, mount, umount and clone-namespace operations.

    Note: the word 'vfsmount' and the noun 'mount' have been used
    to mean the same thing, throughout this document.

5a) Mount states

    A given mount can be in one of the following states:

    1) shared
    2) slave
    3) shared and slave
    4) private
    5) unbindable

    A 'propagation event' is defined as an event generated on a vfsmount
    that leads to mount or unmount actions in other vfsmounts.

    A 'peer group' is defined as a group of vfsmounts that propagate
    events to each other.

    (1) Shared mounts

        A 'shared mount' is defined as a vfsmount that belongs to a
        'peer group'.

        For example::

            mount --make-shared /mnt
            mount --bind /mnt /tmp

        The mount at /mnt and that at /tmp are both shared and belong
        to the same peer group. Anything mounted or unmounted under
        /mnt or /tmp is reflected in all the other mounts of its peer
        group.


    (2) Slave mounts

        A 'slave mount' is defined as a vfsmount that receives
        propagation events and does not forward propagation events.

        A slave mount, as the name implies, has a master mount from which
        mount/unmount events are received.
        Events do not propagate from
        the slave mount to the master.  Only a shared mount can be made
        a slave by executing the following command::

            mount --make-slave mount

        A shared mount that is made a slave is no longer shared unless
        it is modified to become shared again.

    (3) Shared and Slave

        A vfsmount can be both shared as well as slave.  This state
        indicates that the mount is a slave of some vfsmount, and
        has its own peer group too.  This vfsmount receives propagation
        events from its master vfsmount, and also forwards propagation
        events to its 'peer group' and to its slave vfsmounts.

        Strictly speaking, the vfsmount is shared having its own
        peer group, and this peer-group is a slave of some other
        peer group.

        Only a slave vfsmount can be made 'shared and slave', either by
        executing the following command::

            mount --make-shared mount

        or by moving the slave vfsmount under a shared vfsmount.

    (4) Private mount

        A 'private mount' is defined as a vfsmount that does not
        receive or forward any propagation events.

    (5) Unbindable mount

        An 'unbindable mount' is defined as a vfsmount that does not
        receive or forward any propagation events and cannot
        be bind mounted.
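
The five states defined above can be summarized by three properties: whether the mount belongs to a peer group, whether it has a master, and whether it may be bind mounted. The following is a minimal Python sketch of that summary, not kernel code; the names ``Traits``, ``STATES``, ``receives_events`` and ``forwards_events`` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Traits:
    in_peer_group: bool  # exchanges events with the other mounts of its peer group
    has_master: bool     # receives events from a master mount
    bindable: bool       # may be the source of a bind mount


# One entry per state listed in section 5a (illustrative model only).
STATES = {
    "shared": Traits(in_peer_group=True, has_master=False, bindable=True),
    "slave": Traits(in_peer_group=False, has_master=True, bindable=True),
    "shared and slave": Traits(in_peer_group=True, has_master=True, bindable=True),
    "private": Traits(in_peer_group=False, has_master=False, bindable=True),
    "unbindable": Traits(in_peer_group=False, has_master=False, bindable=False),
}


def receives_events(state: str) -> bool:
    """A mount receives events from its peers or from its master."""
    traits = STATES[state]
    return traits.in_peer_group or traits.has_master


def forwards_events(state: str) -> bool:
    """Only mounts that belong to a peer group forward events."""
    return STATES[state].in_peer_group


for name in STATES:
    print(name, receives_events(name), forwards_events(name))
```

In this model a plain slave receives but never forwards, while a 'shared and slave' mount does both, which matches the definitions given for states (2) and (3).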


    State diagram:

    The state diagram below explains the state transition of a mount,
    in response to various commands::

        |---------------|-------------|----------------|--------------|-----------------|
        |               | make-shared | make-slave     | make-private | make-unbindable |
        |---------------|-------------|----------------|--------------|-----------------|
        | shared        | shared      | *slave/private | private      | unbindable      |
        |---------------|-------------|----------------|--------------|-----------------|
        | slave         | shared      | **slave        | private      | unbindable      |
        |               | and slave   |                |              |                 |
        |---------------|-------------|----------------|--------------|-----------------|
        | shared        | shared      | slave          | private      | unbindable      |
        | and slave     | and slave   |                |              |                 |
        |---------------|-------------|----------------|--------------|-----------------|
        | private       | shared      | **private      | private      | unbindable      |
        |---------------|-------------|----------------|--------------|-----------------|
        | unbindable    | shared      | **unbindable   | private      | unbindable      |
        |---------------|-------------|----------------|--------------|-----------------|

        * if the shared mount is the only mount in its peer group, making it
          slave makes it private automatically. Note that there is no master
          to which it can be slaved.

        ** slaving a non-shared mount has no effect on the mount.

    Apart from the commands listed above, the 'move' operation also changes
    the state of a mount depending on the type of the destination mount. It's
    explained in section 5d.

5b) Bind semantics

    Consider the following command::

        mount --bind A/a B/b

    where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
    is the destination mount and 'b' is the dentry in the destination mount.

    The outcome depends on the type of mount of 'A' and 'B'.
    The table
    below contains a quick reference::

        BIND MOUNT OPERATION

        |-------------|----------|----------|------------------|------------|
        | source(A)-> | shared   | private  | slave            | unbindable |
        | dest(B)     |          |          |                  |            |
        |   |         |          |          |                  |            |
        |   v         |          |          |                  |            |
        |-------------|----------|----------|------------------|------------|
        | shared      | shared   | shared   | shared and slave | invalid    |
        | non-shared  | shared   | private  | slave            | invalid    |
        |-------------|----------|----------|------------------|------------|

    Details:

    1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
       which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
       mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
       are created and mounted at the dentry 'b' on all mounts where 'B'
       propagates to. A new propagation tree containing 'C1',..,'Cn' is
       created. This propagation tree is identical to the propagation tree of
       'B'.  And finally the peer-group of 'C' is merged with the peer group
       of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
       which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
       mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
       are created and mounted at the dentry 'b' on all mounts where 'B'
       propagates to. A new propagation tree is set up containing all new
       mounts 'C', 'C1', .., 'Cn' with exactly the same configuration as the
       propagation tree for 'B'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
       mount 'C' which is a clone of 'A' is created. Its root dentry is 'a'.
       'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
       'C3' ... are created and mounted at the dentry 'b' on all mounts where
       'B' propagates to.
       A new propagation tree containing the new mounts
       'C', 'C1', .., 'Cn' is created. This propagation tree is identical to
       the propagation tree for 'B'. And finally the mount 'C' and its peer
       group is made the slave of mount 'Z'.  In other words, mount 'C' is in
       the state 'slave and shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
       invalid operation.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
       unbindable) mount. A new mount 'C' which is a clone of 'A' is created.
       Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
       which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
       mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
       peer-group of 'A'.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
       new mount 'C' which is a clone of 'A' is created. Its root dentry is
       'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
       slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
       'Z'.  All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
       mount/unmount events on 'A' do not propagate anywhere else. Similarly
       mount/unmount events on 'C' do not propagate anywhere else.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
       invalid operation. An unbindable mount cannot be bind mounted.

5c) Rbind semantics

    rbind works like bind, but on a whole tree: bind replicates only the
    specified mount, whereas rbind replicates all the mounts in the tree
    belonging to the specified mount. An rbind mount is a bind mount
    applied to all the mounts in the tree.

    If the source tree that is rbound has some unbindable mounts,
    then the subtree under the unbindable mount is pruned in the new
    location.
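
The pruning rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the kernel implements rbind: the helper ``rbind_copy`` and the dictionary representation of the mount tree are hypothetical, and the sample tree is the same one used in the example that follows.

```python
def rbind_copy(root, children, unbindable):
    """Return the replicated tree as {mount: [child mounts]}.

    `children` maps each mount name to the mounts mounted under it.
    An unbindable mount is skipped together with its whole subtree.
    """
    if root in unbindable:
        raise ValueError("an unbindable mount cannot be bind mounted")
    copy = {}

    def visit(mount):
        kept = [c for c in children.get(mount, []) if c not in unbindable]
        copy[mount + "'"] = [c + "'" for c in kept]
        for child in kept:
            visit(child)

    visit(root)
    return copy


# Mount tree rooted at A, where only C is unbindable.
children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
replica = rbind_copy("A", children, unbindable={"C"})
print(replica)
```

The replica contains A', B', D' and E' only; C and everything mounted under it are absent.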

    e.g.:

    Let's say we have the following mount tree::

                A
              /   \
             B     C
            / \   / \
           D   E F   G

    Let's say all the mounts except the mount C in the tree are
    of a type other than unbindable.

    If this tree is rbound to, say, Z, we will have the following tree
    at the new location::

                Z
                |
                A'
               /
              B'        Note how the tree under C is pruned
             / \        in the new location.
            D'  E'



5d) Move semantics

    Consider the following command::

        mount --move A  B/b

    where 'A' is the source mount, 'B' is the destination mount and 'b' is
    the dentry in the destination mount.

    The outcome depends on the type of the mount of 'A' and 'B'. The table
    below is a quick reference::

        MOVE MOUNT OPERATION

        |-------------|----------|----------|------------------|------------|
        | source(A)-> | shared   | private  | slave            | unbindable |
        | dest(B)     |          |          |                  |            |
        |   |         |          |          |                  |            |
        |   v         |          |          |                  |            |
        |-------------|----------|----------|------------------|------------|
        | shared      | shared   | shared   | shared and slave | invalid    |
        | non-shared  | shared   | private  | slave            | unbindable |
        |-------------|----------|----------|------------------|------------|

    .. Note:: moving a mount residing under a shared mount is invalid.

    Details follow:

    1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
       mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1', 'A2'...'An'
       are created and mounted at dentry 'b' on all mounts that receive
       propagation from mount 'B'. A new propagation tree is created in the
       exact same configuration as that of 'B'. This new propagation tree
       contains all the new mounts 'A1', 'A2'...  'An'.  And this new
       propagation tree is appended to the already existing propagation tree
       of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
       mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
       are created and mounted at dentry 'b' on all mounts that receive
       propagation from mount 'B'. The mount 'A' becomes a shared mount and a
       propagation tree is created which is identical to that of
       'B'. This new propagation tree contains all the new mounts 'A1',
       'A2'...  'An'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
       mount 'A' is mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1',
       'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
       receive propagation from mount 'B'. A new propagation tree is created
       in the exact same configuration as that of 'B'. This new propagation
       tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
       propagation tree is appended to the already existing propagation tree of
       'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
       becomes 'shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
       is invalid, because mounting anything on the shared mount 'B' can
       create new mounts that get mounted on the mounts that receive
       propagation from 'B'.  And since the mount 'A' is unbindable, cloning
       it to mount at other mountpoints is not possible.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
       unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
       is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
       shared mount.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
       The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
       continues to be a slave mount of mount 'Z'.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount.
       The mount
       'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
       unbindable mount.

5e) Mount semantics

    Consider the following command::

        mount device  B/b

    'B' is the destination mount and 'b' is the dentry in the destination
    mount.

    The above operation is the same as the bind operation with the exception
    that the source mount is always a private mount.


5f) Unmount semantics

    Consider the following command::

        umount A

    where 'A' is a mount mounted on mount 'B' at dentry 'b'.

    If mount 'B' is shared, then all most-recently-mounted mounts at dentry
    'b' on mounts that receive propagation from mount 'B' and do not have
    sub-mounts within them are unmounted.

    Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
    each other.

    Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
    'B1', 'B2' and 'B3' respectively.

    Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
    mount 'B1', 'B2' and 'B3' respectively.

    If 'C1' is unmounted, all the mounts that are most-recently-mounted on
    'B1' and on the mounts that 'B1' propagates to are unmounted.

    'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
    on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

    So all of 'C1', 'C2' and 'C3' should be unmounted.

    If any of 'C2' or 'C3' has some child mounts, then that mount is not
    unmounted, but all other mounts are unmounted. However if 'C1' is told
    to be unmounted and 'C1' has some sub-mounts, the umount operation
    fails entirely.

5g) Clone Namespace

    A cloned namespace contains all the mounts of the parent
    namespace.

    Let's say 'A' and 'B' are the corresponding mounts in the parent and the
    child namespace.

    If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
    each other.

    If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
    'Z'.

    If 'A' is a private mount, then 'B' is a private mount too.

    If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


6) Quiz
-------

    A. What is the result of the following command sequence?

       ::

           mount --bind /mnt /mnt
           mount --make-shared /mnt
           mount --bind /mnt /tmp
           mount --move /tmp /mnt/1

       What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
       Should they all be identical? Or should only /mnt and /mnt/1 be
       identical?


    B. What is the result of the following command sequence?

       ::

           mount --make-rshared /
           mkdir -p /v/1
           mount --rbind / /v/1

       What should the content of /v/1/v/1 be?


    C. What is the result of the following command sequence?

       ::

           mount --bind /mnt /mnt
           mount --make-shared /mnt
           mkdir -p /mnt/1/2/3 /mnt/1/test
           mount --bind /mnt/1 /tmp
           mount --make-slave /mnt
           mount --make-shared /mnt
           mount --bind /mnt/1/2 /tmp1
           mount --make-slave /mnt

       At this point we have the first mount at /tmp and
       its root dentry is 1. Let's call this mount 'A'.
       And then we have a second mount at /tmp1 with root
       dentry 2. Let's call this mount 'B'.
       Next we have a third mount at /mnt with root dentry
       mnt. Let's call this mount 'C'.

       'B' is the slave of 'A' and 'C' is a slave of 'B':
       A -> B -> C

       At this point, if we execute the following command::

           mount --bind /bin /tmp/test

       the mount is attempted on 'A'.

       Will the mount propagate to 'B' and 'C'?

       What would the contents of /mnt/1/test be?

7) FAQ
------

    Q1. Why is bind mount needed? How is it different from symbolic links?
        Symbolic links can get stale if the destination mount gets
        unmounted or moved. Bind mounts continue to exist even if the
        other mount is unmounted or moved.

    Q2. Why can't the shared subtree be implemented using exportfs?

        exportfs is a heavyweight way of accomplishing part of what
        shared subtree can do. I cannot imagine a way to implement the
        semantics of slave mount using exportfs.

    Q3. Why is unbindable mount needed?

        Let's say we want to replicate the mount tree at multiple
        locations within the same subtree.

        If one rbind mounts a tree within the same subtree 'n' times,
        the number of mounts created grows explosively with 'n'.
        Having unbindable mounts can help prune the unneeded bind
        mounts. Here is an example.

        step 1:
           let's say the root tree has just two directories with
           one vfsmount::

                    root
                   /    \
                 tmp    usr

           And we want to replicate the tree at multiple
           mountpoints under /root/tmp

        step 2:
           ::


               mount --make-shared /root

               mkdir -p /tmp/m1

               mount --rbind /root /tmp/m1

           the new tree now looks like this::

                    root
                   /    \
                 tmp    usr
                 /
                m1
               /  \
             tmp  usr
             /
            m1

           it has two vfsmounts

        step 3:
           ::

               mkdir -p /tmp/m2
               mount --rbind /root /tmp/m2

           the new tree now looks like this::

                          root
                        /      \
                     tmp       usr
                    /    \
                  m1       m2
                 / \       /  \
               tmp  usr   tmp  usr
               / \          /
             m1  m2      m1
                / \       /  \
              tmp usr   tmp   usr
              /        / \
             m1       m1  m2
            /  \
          tmp   usr
          /    \
         m1     m2

           it has 6 vfsmounts

        step 4:
           ::

               mkdir -p /tmp/m3
               mount --rbind /root /tmp/m3

           I won't draw the tree, but it has 24 vfsmounts.


        At step i the number of vfsmounts is V[i] = i*V[i-1], i.e.
        V[i] = i! with V[1] = 1. This grows even faster than an
        exponential function, and the tree has far more mounts than
        what we really needed in the first place.
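
The recurrence V[i] = i*V[i-1] can be checked against the counts quoted in the steps above (1, 2, 6 and 24 vfsmounts). The short Python sketch below does only that arithmetic; the function name ``vfsmounts_after_step`` is hypothetical, and the comparison with ``math.factorial`` rests on the observation that the recurrence with V[1] = 1 is the factorial.

```python
import math


def vfsmounts_after_step(step: int) -> int:
    """Evaluate V[step] for V[1] = 1 and V[i] = i * V[i-1]."""
    count = 1
    for i in range(2, step + 1):
        count *= i
    return count


for step in range(1, 5):
    # Prints the step number and the number of vfsmounts at that step.
    print(step, vfsmounts_after_step(step))

# The recurrence is the factorial, so it outgrows any fixed-base exponential.
assert all(vfsmounts_after_step(i) == math.factorial(i) for i in range(1, 10))
```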

        One could use a series of umounts at each step to prune
        out the unneeded mounts. But there is a better solution.
        Unbindable mounts come in handy here.

        step 1:
           let's say the root tree has just two directories with
           one vfsmount::

                    root
                   /    \
                 tmp    usr

           How do we set up the same tree at multiple locations under
           /root/tmp?

        step 2:
           ::


               mount --bind /root/tmp /root/tmp

               mount --make-rshared /root
               mount --make-unbindable /root/tmp

               mkdir -p /tmp/m1

               mount --rbind /root /tmp/m1

           the new tree now looks like this::

                    root
                   /    \
                 tmp    usr
                 /
                m1
               /  \
             tmp  usr

        step 3:
           ::

               mkdir -p /tmp/m2
               mount --rbind /root /tmp/m2

           the new tree now looks like this::

                    root
                   /    \
                 tmp    usr
                /   \
              m1     m2
             /  \   /  \
           tmp usr tmp usr

        step 4:
           ::

               mkdir -p /tmp/m3
               mount --rbind /root /tmp/m3

           the new tree now looks like this::

                        root
                    /          \
                  tmp          usr
              /    \    \
            m1     m2     m3
           /  \   /  \   /  \
         tmp usr tmp usr tmp usr

8) Implementation
-----------------

8A) Datastructure

    Four new fields are introduced to struct vfsmount:

    *   ->mnt_share
    *   ->mnt_slave_list
    *   ->mnt_slave
    *   ->mnt_master

    ->mnt_share
        links together all the mounts to/from which this vfsmount
        sends/receives propagation events.

    ->mnt_slave_list
        links all the mounts to which this vfsmount propagates.

    ->mnt_slave
        links together all the slaves that its master vfsmount
        propagates to.

    ->mnt_master
        points to the master vfsmount from which this vfsmount
        receives propagation.

    ->mnt_flags
        takes two more flags to indicate the propagation status of
        the vfsmount.  MNT_SHARED indicates that the vfsmount is a shared
        vfsmount.  MNT_UNBINDABLE indicates that the vfsmount cannot be
        replicated.
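
To make the role of these fields concrete, here is a small user-space Python sketch of how a propagation event could fan out through peers and slaves. It is a toy model under stated assumptions, not the kernel implementation: the class ``Mount`` and the helpers ``share``, ``enslave`` and ``propagation_targets`` are hypothetical, with ``peers`` standing in for ->mnt_share, ``slaves`` for ->mnt_slave_list and ``master`` for ->mnt_master.

```python
class Mount:
    def __init__(self, name):
        self.name = name
        self.peers = [self]   # stands in for ->mnt_share (the peer group)
        self.slaves = []      # stands in for ->mnt_slave_list
        self.master = None    # stands in for ->mnt_master


def share(a, b):
    """Merge the peer groups of a and b."""
    merged = a.peers + [m for m in b.peers if m not in a.peers]
    for mount in merged:
        mount.peers = merged


def enslave(slave, master):
    """Make `slave` receive events from `master`."""
    slave.master = master
    master.slaves.append(slave)


def propagation_targets(origin):
    """Names of the mounts that receive an event generated on `origin`.

    Events travel to peers and down to slaves, never up to a master.
    """
    seen = [origin]
    pending = [origin]
    while pending:
        current = pending.pop(0)
        for nxt in current.peers + current.slaves:
            if nxt not in seen:
                seen.append(nxt)
                pending.append(nxt)
    return [mount.name for mount in seen[1:]]


a, b, s, t = Mount("a"), Mount("b"), Mount("s"), Mount("t")
share(a, b)      # a and b form one peer group
enslave(s, a)    # s is a slave of a
enslave(t, b)    # t is a slave of b
print(propagation_targets(a))
print(propagation_targets(s))
```

An event on ``a`` reaches its peer and both slaves, while an event on the slave ``s`` reaches nothing, because ``s`` has no peers or slaves of its own in this example.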

    All the shared vfsmounts in a peer group form a cyclic list through
    ->mnt_share.

    All vfsmounts with the same ->mnt_master form a cyclic list anchored
    in ->mnt_master->mnt_slave_list and going through ->mnt_slave.

    ->mnt_master can point to arbitrary (and possibly different) members
    of the master peer group.  To find all immediate slaves of a peer group
    you need to go through _all_ ->mnt_slave_list of its members.
    Conceptually it's just a single set - distribution among the
    individual lists does not affect propagation or the way the propagation
    tree is modified by operations.

    All vfsmounts in a peer group have the same ->mnt_master.  If it is
    non-NULL, they form a contiguous (ordered) segment of the slave list.

    An example propagation tree looks as shown in the figure below.
    [ NOTE: Though it looks like a forest, if we consider all the shared
    mounts as a conceptual entity called 'pnode', it becomes a tree]::


                A <--> B <--> C <---> D
               /|\            /|      |\
              / F G          J K      H I
             /
            E<-->K
                /|\
               M L N

    In the above figure A, B, C and D are all shared and propagate to each
    other.  'A' has 3 slave mounts 'E', 'F' and 'G'; 'C' has 2 slave
    mounts 'J' and 'K'; and 'D' has two slave mounts 'H' and 'I'.
    'E' is also shared with 'K' and they propagate to each other.
    And
    'K' has 3 slaves 'M', 'L' and 'N'.

    A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'

    A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'

    E's ->mnt_share links with ->mnt_share of K

    'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'

    'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'

    K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'

    C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'

    J and K's ->mnt_master points to struct vfsmount of C

    and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'

    'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.


    NOTE: The propagation tree is orthogonal to the mount tree.

8B) Locking

    ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
    by namespace_sem (exclusive for modifications, shared for reading).

    Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
    There are two exceptions: do_add_mount() and clone_mnt().
    The former modifies a vfsmount that has not been visible in any shared
    data structures yet.
    The latter holds namespace_sem and the only references to vfsmount
    are in lists that can't be traversed without namespace_sem.

8C) Algorithm

    The crux of the implementation resides in the rbind/move operation.

    The overall algorithm breaks the operation into 3 phases: (look at
    attach_recursive_mnt() and propagate_mnt())

    1. prepare phase.
    2. commit phase.
    3. abort phase.

    Prepare phase:

    for each mount in the source tree:

        a) Create the necessary number of mount trees to
           be attached to each of the mounts that receive
           propagation from the destination mount.
        b) Do not attach any of the trees to its destination.
           However note down its ->mnt_parent and ->mnt_mountpoint
        c) Link all the new mounts to form a propagation tree that
           is identical to the propagation tree of the destination
           mount.

        If this phase is successful, there should be 'n' new
        propagation trees, where 'n' is the number of mounts in the
        source tree.  Go to the commit phase.

        Also there should be 'm' new mount trees, where 'm' is
        the number of mounts to which the destination mount
        propagates.

        If any memory allocations fail, go to the abort phase.

    Commit phase
        attach each of the mount trees to their corresponding
        destination mounts.

    Abort phase
        delete all the newly created trees.

    .. Note::
       all the propagation related functionality resides in the file pnode.c


------------------------------------------------------------------------

version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)

version 0.2  (Incorporated comments from Al Viro)