cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

dm-clone.rst (13154B)


.. SPDX-License-Identifier: GPL-2.0-only

========
dm-clone
========

Introduction
============

dm-clone is a device mapper target which produces a one-to-one copy of an
existing, read-only source device into a writable destination device: It
presents a virtual block device which makes all data appear immediately, and
redirects reads and writes accordingly.

The main use case of dm-clone is to clone a potentially remote, high-latency,
read-only, archival-type block device into a writable, fast, primary-type device
for fast, low-latency I/O. The cloned device is visible/mountable immediately
and the copy of the source device to the destination device happens in the
background, in parallel with user I/O.

For example, one could restore an application backup from a read-only copy,
accessible through a network storage protocol (NBD, Fibre Channel, iSCSI, AoE,
etc.), into a local SSD or NVMe device, and start using the device immediately,
without waiting for the restore to complete.

When the cloning completes, the dm-clone table can be removed altogether and be
replaced, e.g., by a linear table, mapping directly to the destination device.

The dm-clone target reuses the metadata library used by the thin-provisioning
target.

Glossary
========

   Hydration
     The process of filling a region of the destination device with data from
     the same region of the source device, i.e., copying the region from the
     source to the destination device.

Once a region gets hydrated we redirect all I/O regarding it to the destination
device.

Design
======

Sub-devices
-----------

The target is constructed by passing three devices to it (along with other
parameters detailed later):

1. A source device - the read-only device that gets cloned and source of the
   hydration.

2. A destination device - the destination of the hydration, which will become a
   clone of the source device.

3. A small metadata device - it records which regions are already valid in the
   destination device, i.e., which regions have already been hydrated, or have
   been written to directly, via user I/O.

The size of the destination device must be at least equal to the size of the
source device.

Regions
-------

dm-clone divides the source and destination devices into fixed-sized regions.
Regions are the unit of hydration, i.e., the minimum amount of data copied from
the source to the destination device.

The region size is configurable when you first create the dm-clone device. The
recommended region size is the same as the file system block size, which usually
is 4KB. The region size must be between 8 sectors (4KB) and 2097152 sectors
(1GB) and a power of two.
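The constraints above can be checked with a small helper before creating the
device (a hypothetical convenience function, not part of dmsetup or dm-clone):

```shell
# Hypothetical helper (not part of dmsetup): check that a candidate
# region size, given in 512-byte sectors, satisfies dm-clone's
# constraints: a power of two between 8 sectors (4KB) and 2097152
# sectors (1GB).
valid_region_size() {
    local rs="$1"
    [ "$rs" -ge 8 ] && [ "$rs" -le 2097152 ] && [ $(( rs & (rs - 1) )) -eq 0 ]
}

valid_region_size 8 && echo "8 sectors: ok"          # 4KB, the recommended size
valid_region_size 12 || echo "12 sectors: rejected"  # not a power of two
```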

Reads and writes from/to hydrated regions are serviced from the destination
device.

A read to a not yet hydrated region is serviced directly from the source device.

A write to a not yet hydrated region triggers hydration of that region
immediately, and the write is delayed until the hydration completes.

Note that a write request with size equal to region size will skip copying of
the corresponding region from the source device and overwrite the region of the
destination device directly.

Discards
--------

dm-clone interprets a discard request to a range that hasn't been hydrated yet
as a hint to skip hydration of the regions covered by the request, i.e., it
skips copying the region's data from the source to the destination device, and
only updates its metadata.
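The region arithmetic can be sketched as follows (an illustration, not kernel
code): only regions fully covered by the discard can skip hydration, since a
partially covered region still needs the rest of its data copied.

```shell
# Sketch of the region arithmetic (an illustration, not kernel code):
# only regions fully contained in the discarded sector range can skip
# hydration. Arguments: start sector, length in sectors, region size.
covered_regions() {
    local start="$1" len="$2" rs="$3"
    local first=$(( (start + rs - 1) / rs ))   # first fully covered region
    local last=$(( (start + len) / rs - 1 ))   # last fully covered region
    if [ "$last" -ge "$first" ]; then
        echo "$first-$last"
    else
        echo "none"
    fi
}

covered_regions 0 64 8   # 64 sectors at 0, 8-sector regions -> 0-7
covered_regions 4 8 8    # straddles two regions, covers neither -> none
```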

If the destination device supports discards, then by default dm-clone will pass
down discard requests to it.

Background Hydration
--------------------

dm-clone copies continuously from the source to the destination device, until
all of the device has been copied.

Copying data from the source to the destination device uses bandwidth. The user
can set a throttle to prevent more than a certain amount of copying occurring at
any one time. Moreover, dm-clone takes into account user I/O traffic going to
the devices and pauses the background hydration when there is I/O in-flight.

A message `hydration_threshold <#regions>` can be used to set the maximum number
of regions being copied, the default being 1 region.

dm-clone employs dm-kcopyd for copying portions of the source device to the
destination device. By default, we issue copy requests of size equal to the
region size. A message `hydration_batch_size <#regions>` can be used to tune the
size of these copy requests. Increasing the hydration batch size results in
dm-clone trying to batch together contiguous regions, so we copy the data in
batches of this many regions.
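For instance, with the recommended 8-sector (4KB) region size, raising the
batch size grows each dm-kcopyd copy request accordingly (illustrative
arithmetic, assuming all batched regions are contiguous):

```shell
# Illustrative arithmetic, assuming the batched regions are contiguous:
# each dm-kcopyd request copies batch_size * region_size sectors.
region_size=8   # sectors (4KB)
for batch in 1 16 128; do
    sectors=$(( batch * region_size ))
    echo "hydration_batch_size $batch -> $(( sectors * 512 / 1024 ))KB per copy request"
done
```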

When the hydration of the destination device finishes, a dm event will be sent
to user space.

Updating on-disk metadata
-------------------------

On-disk metadata is committed every time a FLUSH or FUA bio is written. If no
such requests are made then commits will occur every second. This means the
dm-clone device behaves like a physical disk that has a volatile write cache. If
power is lost you may lose some recent writes. The metadata should always be
consistent in spite of any crash.

Target Interface
================

Constructor
-----------

  ::

   clone <metadata dev> <destination dev> <source dev> <region size>
         [<#feature args> [<feature arg>]* [<#core args> [<core arg>]*]]

 ================ ==============================================================
 metadata dev     Fast device holding the persistent metadata
 destination dev  The destination device, where the source will be cloned
 source dev       Read only device containing the data that gets cloned
 region size      The size of a region in sectors

 #feature args    Number of feature arguments passed
 feature args     no_hydration or no_discard_passdown

 #core args       An even number of arguments corresponding to key/value pairs
                  passed to dm-clone
 core args        Key/value pairs passed to dm-clone, e.g. `hydration_threshold
                  256`
 ================ ==============================================================

Optional feature arguments are:

 ==================== =========================================================
 no_hydration         Create a dm-clone instance with background hydration
                      disabled
 no_discard_passdown  Disable passing down discards to the destination device
 ==================== =========================================================

Optional core arguments are:

 ================================ ==============================================
 hydration_threshold <#regions>   Maximum number of regions being copied from
                                  the source to the destination device at any
                                  one time, during background hydration.
 hydration_batch_size <#regions>  During background hydration, try to batch
                                  together contiguous regions, so we copy data
                                  from the source to the destination device in
                                  batches of this many regions.
 ================================ ==============================================
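Putting the pieces together, a table line combining one feature argument and
one core key/value pair might be assembled like this (the device paths are
placeholders, not a real setup):

```shell
# Assemble a dm-clone table line. Layout:
#   <start> <size> clone <metadata dev> <dest dev> <source dev> <region size>
#   <#feature args> <feature args>* <#core args> <core args>*
metadata_dev=/dev/sdc1   # placeholder device paths
dest_dev=/dev/sdb
source_dev=/dev/sda
table="0 1048576000 clone $metadata_dev $dest_dev $source_dev 8 \
1 no_discard_passdown 2 hydration_threshold 256"
echo "$table"
# The line would then be passed to: dmsetup create clone --table "$table"
```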

Status
------

  ::

   <metadata block size> <#used metadata blocks>/<#total metadata blocks>
   <region size> <#hydrated regions>/<#total regions> <#hydrating regions>
   <#feature args> <feature args>* <#core args> <core args>*
   <clone metadata mode>

 ======================= =======================================================
 metadata block size     Fixed block size for each metadata block in sectors
 #used metadata blocks   Number of metadata blocks used
 #total metadata blocks  Total number of metadata blocks
 region size             Configurable region size for the device in sectors
 #hydrated regions       Number of regions that have finished hydrating
 #total regions          Total number of regions to hydrate
 #hydrating regions      Number of regions currently hydrating
 #feature args           Number of feature arguments to follow
 feature args            Feature arguments, e.g. `no_hydration`
 #core args              Even number of core arguments to follow
 core args               Key/value pairs for tuning the core, e.g.
                         `hydration_threshold 256`
 clone metadata mode     ro if read-only, rw if read-write

                         In serious cases where even a read-only mode is deemed
                         unsafe no further I/O will be permitted and the status
                         will just contain the string 'Fail'. If the metadata
                         mode changes, a dm event will be sent to user space.
 ======================= =======================================================
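A script monitoring the clone can compute progress from the
`<#hydrated regions>/<#total regions>` status field, for example (the sample
value below is illustrative, not real dmsetup output; in practice the field
would be extracted from `dmsetup status clone` per the layout above):

```shell
# Report hydration progress from the "<hydrated>/<total>" status field.
# The sample value is illustrative, not real dmsetup output.
hydration_progress() {
    local hydrated="${1%%/*}" total="${1##*/}"
    echo "$(( hydrated * 100 / total ))% hydrated"
}

status_field="19840/128000"   # illustrative value
hydration_progress "$status_field"   # -> 15% hydrated
```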

Messages
--------

  `disable_hydration`
      Disable the background hydration of the destination device.

  `enable_hydration`
      Enable the background hydration of the destination device.

  `hydration_threshold <#regions>`
      Set background hydration threshold.

  `hydration_batch_size <#regions>`
      Set background hydration batch size.

Examples
========

Clone a device containing a file system
---------------------------------------

1. Create the dm-clone device.

   ::

    dmsetup create clone --table "0 1048576000 clone $metadata_dev $dest_dev \
      $source_dev 8 1 no_hydration"

2. Mount the device and trim the file system. dm-clone interprets the discards
   sent by the file system and it will not hydrate the unused space.

   ::

    mount /dev/mapper/clone /mnt/cloned-fs
    fstrim /mnt/cloned-fs

3. Enable background hydration of the destination device.

   ::

    dmsetup message clone 0 enable_hydration

4. When the hydration finishes, we can replace the dm-clone table with a linear
   table.

   ::

    dmsetup suspend clone
    dmsetup load clone --table "0 1048576000 linear $dest_dev 0"
    dmsetup resume clone

   The metadata device is no longer needed and can be safely discarded or reused
   for other purposes.

Known issues
============

1. We redirect reads to not-yet-hydrated regions to the source device. If
   reading the source device has high latency and the user repeatedly reads from
   the same regions, this behaviour could degrade performance. We should use
   these reads as hints to hydrate the relevant regions sooner. Currently, we
   rely on the page cache to cache these regions, so we hopefully don't end up
   reading them multiple times from the source device.

2. Release in-core resources, i.e., the bitmaps tracking which regions are
   hydrated, after the hydration has finished.

3. During background hydration, if we fail to read the source or write to the
   destination device, we print an error message, but the hydration process
   continues indefinitely, until it succeeds. We should stop the background
   hydration after a number of failures and emit a dm event for user space to
   notice.

Why not...?
===========

We explored the following alternatives before implementing dm-clone:

1. Use dm-cache with cache size equal to the source device and implement a new
   cloning policy:

   * The resulting cache device is not a one-to-one mirror of the source device
     and thus we cannot remove the cache device once cloning completes.

   * dm-cache writes to the source device, which violates our requirement that
     the source device must be treated as read-only.

   * Caching is semantically different from cloning.

2. Use dm-snapshot with a COW device equal to the source device:

   * dm-snapshot stores its metadata in the COW device, so the resulting device
     is not a one-to-one mirror of the source device.

   * No background copying mechanism.

   * dm-snapshot needs to commit its metadata whenever a pending exception
     completes, to ensure snapshot consistency. In the case of cloning, we don't
     need to be so strict and can rely on committing metadata every time a FLUSH
     or FUA bio is written, or periodically, like dm-thin and dm-cache do. This
     improves the performance significantly.

3. Use dm-mirror: The mirror target has a background copying/mirroring
   mechanism, but it writes to all mirrors, thus violating our requirement that
   the source device must be treated as read-only.

4. Use dm-thin's external snapshot functionality. This approach is the most
   promising among all alternatives, as the thinly-provisioned volume is a
   one-to-one mirror of the source device and handles reads and writes to
   un-provisioned/not-yet-cloned areas the same way as dm-clone does.

   Still:

   * There is no background copying mechanism, though one could be implemented.

   * Most importantly, we want to support arbitrary block devices as the
     destination of the cloning process and not restrict ourselves to
     thinly-provisioned volumes. Thin-provisioning has an inherent metadata
     overhead, for maintaining the thin volume mappings, which significantly
     degrades performance.

   Moreover, cloning a device shouldn't force the use of thin-provisioning. On
   the other hand, if we wish to use thin provisioning, we can just use a thin
   LV as dm-clone's destination device.