cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

dm-zoned.rst (8453B)


========
dm-zoned
========

The dm-zoned device mapper target exposes a zoned block device (ZBC and
ZAC compliant devices) as a regular block device without any write
pattern constraints. In effect, it implements a drive-managed zoned
block device which hides from the user (a file system or an application
doing raw block device accesses) the sequential write constraints of
host-managed zoned block devices and can mitigate the potential
device-side performance degradation due to excessive random writes on
host-aware zoned block devices.

For a more detailed description of the zoned block device models and
their constraints see (for SCSI devices):

https://www.t10.org/drafts.htm#ZBC_Family

and (for ATA devices):

http://www.t13.org/Documents/UploadedDocuments/docs2015/di537r05-Zoned_Device_ATA_Command_Set_ZAC.pdf

The dm-zoned implementation is simple and minimizes system overhead (CPU
and memory usage as well as storage capacity loss). For a 10 TB
host-managed disk with 256 MB zones, dm-zoned memory usage per disk
instance is at most 4.5 MB and as few as 5 zones will be used
internally for storing metadata and performing reclaim operations.

dm-zoned target devices are formatted and checked using the dmzadm
utility available at:

https://github.com/hgst/dm-zoned-tools

Algorithm
=========

dm-zoned implements an on-disk buffering scheme to handle non-sequential
write accesses to the sequential zones of a zoned block device.
Conventional zones are used for caching as well as for storing internal
metadata. It can also use a regular block device together with the zoned
block device; in that case the regular block device is logically split
into zones of the same size as those of the zoned block device. These
zones are placed in front of the zones of the zoned block device and are
handled just like conventional zones.

The zones of the device(s) are separated into 2 types:

1) Metadata zones: these are conventional zones used to store metadata.
Metadata zones are not reported as usable capacity to the user.

2) Data zones: all remaining zones, the vast majority of which will be
sequential zones used exclusively to store user data. The conventional
zones of the device may also be used for buffering random user writes.
Data in these zones may be directly mapped to the conventional zone, but
later moved to a sequential zone so that the conventional zone can be
reused for buffering incoming random writes.

dm-zoned exposes a logical device with a sector size of 4096 bytes,
irrespective of the physical sector size of the backend zoned block
device being used. This allows reducing the amount of metadata needed to
manage valid blocks (blocks written).

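As a rough illustration of why the 4096-byte logical sector size reduces
metadata, the Python sketch below compares the size of a validity bitmap
for one zone at 512-byte and 4096-byte block granularity. It assumes one
bitmap bit per block and a zone of exactly 256 MiB; both are assumptions
made for this example, not statements about the actual on-disk format.

.. code-block:: python

    ZONE_BYTES = 256 * 1024 * 1024  # assumed zone size (256 MiB)


    def bitmap_bytes(zone_bytes, block_bytes):
        """Bytes needed for a one-bit-per-block validity bitmap of one zone."""
        blocks = zone_bytes // block_bytes
        return blocks // 8


    bitmap_512 = bitmap_bytes(ZONE_BYTES, 512)    # 512-byte blocks
    bitmap_4096 = bitmap_bytes(ZONE_BYTES, 4096)  # 4096-byte blocks
    print(bitmap_512, bitmap_4096)

Under these assumptions the bitmap is eight times smaller with 4096-byte
blocks.
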
The on-disk metadata format is as follows:

1) The first block of the first conventional zone found contains the
super block which describes the on-disk amount and position of metadata
blocks.

2) Following the super block, a set of blocks is used to describe the
mapping of the logical device blocks. The mapping is done per chunk of
blocks, with the chunk size equal to the zone size of the zoned block
device. The mapping table is indexed by chunk number and each mapping
entry indicates the zone number of the device storing the chunk of data.
Each mapping entry may also indicate the zone number of a conventional
zone used to buffer random modifications to the data zone.

3) A set of blocks used to store bitmaps indicating the validity of
blocks in the data zones follows the mapping table. A valid block is
defined as a block that was written and not discarded. For a buffered
data chunk, a block is valid either in the data zone mapping the chunk
or in the buffer zone of the chunk, never in both.

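The chunk mapping described in item 2 can be pictured with the following
Python sketch. The names ``ChunkMapping``, ``locate`` and
``BLOCKS_PER_ZONE`` are hypothetical and chosen for illustration only;
the in-memory model says nothing about the actual on-disk encoding of
the mapping entries.

.. code-block:: python

    from dataclasses import dataclass
    from typing import List, Optional

    BLOCKS_PER_ZONE = 65536  # assumed: 256 MiB zone / 4096-byte blocks


    @dataclass
    class ChunkMapping:
        # Zone storing the chunk of data, None if the chunk is unmapped.
        data_zone: Optional[int] = None
        # Conventional zone buffering random writes, None if unbuffered.
        buffer_zone: Optional[int] = None


    def locate(block: int) -> tuple:
        """Split a logical block number into (chunk number, offset in chunk)."""
        return divmod(block, BLOCKS_PER_ZONE)


    # Mapping table indexed by chunk number (4 chunks in this toy example).
    table: List[ChunkMapping] = [ChunkMapping() for _ in range(4)]
    table[1] = ChunkMapping(data_zone=7, buffer_zone=2)

    chunk, offset = locate(70000)
    entry = table[chunk]
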
For a logical chunk mapped to a conventional zone, all write operations
are processed by directly writing to the zone. If the mapping zone is a
sequential zone, the write operation is processed directly only if the
write offset within the logical chunk is equal to the write pointer
offset within the sequential data zone (i.e. the write operation is
aligned on the zone write pointer). Otherwise, write operations are
processed indirectly using a buffer zone. In that case, an unused
conventional zone is allocated and assigned to the chunk being
accessed. Writing a block to the buffer zone of a chunk will
automatically invalidate the same block in the sequential zone mapping
the chunk. If all blocks of the sequential zone become invalid, the zone
is freed and the chunk buffer zone becomes the primary zone mapping the
chunk, resulting in native random write performance similar to a regular
block device.

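The write handling rule above can be summarized as a small decision
function. This is a minimal sketch with hypothetical names
(``ZoneType``, ``write_target``); it only models the choice between a
direct and a buffered write, not buffer zone allocation or block
invalidation.

.. code-block:: python

    from enum import Enum


    class ZoneType(Enum):
        CONVENTIONAL = "conventional"
        SEQUENTIAL = "sequential"


    def write_target(zone_type, write_offset, write_pointer):
        """Return "direct" or "buffer" for a write to a mapped chunk.

        write_offset is the offset of the write within the chunk and
        write_pointer the write pointer offset within the data zone, both
        in blocks. write_pointer is ignored for conventional zones.
        """
        if zone_type is ZoneType.CONVENTIONAL:
            return "direct"
        if write_offset == write_pointer:
            # Aligned on the zone write pointer: sequential write is legal.
            return "direct"
        return "buffer"

For example, ``write_target(ZoneType.SEQUENTIAL, 10, 128)`` selects the
buffered path because the write is not aligned on the write pointer.
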
Read operations are processed according to the block validity
information provided by the bitmaps. Valid blocks are read either from
the sequential zone mapping a chunk, or if the chunk is buffered, from
the buffer zone assigned. If the accessed chunk has no mapping, or the
accessed blocks are invalid, the read buffer is zeroed and the read
operation is terminated.

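A read of a single block can be sketched as follows. Sets of valid block
offsets stand in for the bitmaps and dictionaries stand in for zone
contents; these stand-ins and the name ``read_block`` are assumptions
made for the example.

.. code-block:: python

    BLOCK_SIZE = 4096


    def read_block(offset, data_valid, buffer_valid, data_zone, buffer_zone):
        """Return the contents of one block of a chunk.

        data_valid and buffer_valid are sets of valid block offsets;
        data_zone and buffer_zone map a block offset to its contents.
        A block is expected to be valid in at most one of the two zones.
        """
        if offset in buffer_valid:
            return buffer_zone[offset]
        if offset in data_valid:
            return data_zone[offset]
        # Unmapped or invalid block: the read buffer is zeroed.
        return bytes(BLOCK_SIZE)


    data_zone = {0: b"d" * BLOCK_SIZE}
    buffer_zone = {1: b"b" * BLOCK_SIZE}
    first = read_block(0, {0}, {1}, data_zone, buffer_zone)
    second = read_block(1, {0}, {1}, data_zone, buffer_zone)
    third = read_block(2, {0}, {1}, data_zone, buffer_zone)
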
After some time, the limited number of conventional zones available may
be exhausted (all used to map chunks or buffer sequential zones) and
unaligned writes to unbuffered chunks become impossible. To avoid this
situation, a reclaim process regularly scans used conventional zones and
tries to reclaim the least recently used zones by copying the valid
blocks of the buffer zone to a free sequential zone. Once the copy
completes, the chunk mapping is updated to point to the sequential zone
and the buffer zone is freed for reuse.

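The reclaim step can be modeled roughly as below. This is a simplified
sketch, not the kernel implementation: zones are dictionaries mapping a
block offset to its contents, ``last_used`` holds hypothetical access
timestamps, and each conventional zone is assumed to map exactly one
chunk.

.. code-block:: python

    def reclaim_lru(chunk_to_zone, zones, last_used, free_seq_zones):
        """Move the least recently used conventional zone to a sequential zone.

        Returns a (victim zone, target zone) tuple.
        """
        # Pick the least recently used conventional zone.
        victim = min(last_used, key=last_used.get)
        chunk = next(c for c, z in chunk_to_zone.items() if z == victim)
        target = free_seq_zones.pop(0)

        # Copy the valid blocks in ascending offset order.
        zones[target] = {off: zones[victim][off] for off in sorted(zones[victim])}

        # Only after the copy completes: remap the chunk, free the old zone.
        chunk_to_zone[chunk] = target
        zones[victim] = {}
        del last_used[victim]
        return victim, target


    chunk_to_zone = {0: 1, 1: 2}
    zones = {1: {5: b"a"}, 2: {3: b"b"}, 10: {}}
    last_used = {1: 100, 2: 50}
    result = reclaim_lru(chunk_to_zone, zones, last_used, [10])
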
Metadata Protection
===================

To protect metadata against corruption in case of sudden power loss or
system crash, 2 sets of metadata zones are used. One set, the primary
set, is used as the main metadata region, while the secondary set is
used as a staging area. Modified metadata is first written to the
secondary set and validated by updating the super block in the secondary
set; a generation counter is used to indicate that this set contains the
newest metadata. Once this operation completes, in-place metadata
block updates can be done in the primary metadata set. This ensures that
one of the sets is always consistent (all modifications committed or none
at all). Flush operations are used as a commit point. Upon reception of
a flush request, metadata modification activity is temporarily blocked
(for both incoming BIO processing and reclaim process) and all dirty
metadata blocks are staged and updated. Normal operation is then
resumed. Flushing metadata thus only temporarily delays write and
discard requests. Read requests can be processed concurrently while
metadata flush is being executed.

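The ordering of the commit steps can be sketched as follows. The class
``MetadataSet`` and the function ``flush`` are hypothetical, and the
final update of the primary generation counter is an assumption made so
that both sets end up equal after a successful flush; the text above only
specifies the order of the staging and in-place updates.

.. code-block:: python

    class MetadataSet:
        def __init__(self):
            self.generation = 0  # generation counter kept in the super block
            self.blocks = {}     # metadata block number -> contents


    def flush(primary, secondary, dirty):
        """Commit dirty metadata blocks (block number -> contents)."""
        # 1. Stage the modified blocks in the secondary set.
        secondary.blocks.update(dirty)
        # 2. Validate the staged data by updating the secondary super
        #    block with a newer generation.
        secondary.generation = primary.generation + 1
        # 3. Only now update the primary set in place.
        primary.blocks.update(dirty)
        # 4. Assumption: the primary super block then takes the same
        #    generation.
        primary.generation = secondary.generation


    primary, secondary = MetadataSet(), MetadataSet()
    flush(primary, secondary, {3: b"new mapping block"})

If a crash interrupts step 3, the secondary set still holds a complete
copy of the modifications, which is the property the staging scheme
relies on.
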
If a regular device is used in conjunction with the zoned block device,
a third set of metadata (without the zone bitmaps) is written to the
start of the zoned block device. This metadata has a generation counter of
'0' and will never be updated during normal operation; it just serves for
identification purposes. The first and second copy of the metadata
are located at the start of the regular block device.

Usage
=====

A zoned block device must first be formatted using the dmzadm tool. This
will analyze the device zone configuration, determine where to place the
metadata sets on the device and initialize the metadata sets.

Ex::

	dmzadm --format /dev/sdxx


If two drives are to be used, both devices must be specified, with the
regular block device as the first device.

Ex::

	dmzadm --format /dev/sdxx /dev/sdyy


Formatted device(s) can also be started with the dmzadm utility:

Ex::

	dmzadm --start /dev/sdxx /dev/sdyy


Information about the internal layout and current usage of the zones can
be obtained with the 'status' callback from dmsetup:

Ex::

	dmsetup status /dev/dm-X

will return a line::

	0 <size> zoned <nr_zones> zones <nr_unmap_rnd>/<nr_rnd> random <nr_unmap_seq>/<nr_seq> sequential

where <nr_zones> is the total number of zones, <nr_unmap_rnd> is the number
of unmapped (i.e. free) random zones, <nr_rnd> the total number of random
zones, <nr_unmap_seq> the number of unmapped sequential zones, and <nr_seq>
the total number of sequential zones.

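For scripting, the status line can be split into its fields. The Python
sketch below assumes the output matches the format shown above exactly;
the function name ``parse_status`` is hypothetical and the sample line
uses made-up numbers.

.. code-block:: python

    import re

    STATUS_RE = re.compile(
        r"^0 (?P<size>\d+) zoned (?P<nr_zones>\d+) zones "
        r"(?P<nr_unmap_rnd>\d+)/(?P<nr_rnd>\d+) random "
        r"(?P<nr_unmap_seq>\d+)/(?P<nr_seq>\d+) sequential$"
    )


    def parse_status(line):
        """Parse one dm-zoned status line into a dict of integers."""
        match = STATUS_RE.match(line.strip())
        if match is None:
            raise ValueError("unrecognized dm-zoned status line")
        return {name: int(value) for name, value in match.groupdict().items()}


    # Made-up sample values, for illustration only.
    sample = "0 1000 zoned 100 zones 10/20 random 50/78 sequential"
    status = parse_status(sample)
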
Normally the reclaim process will be started once less than 50 percent
of the random zones are free. In order to start the reclaim process
manually even before reaching this threshold the 'dmsetup message'
function can be used:

Ex::

	dmsetup message /dev/dm-X 0 reclaim

will start the reclaim process and random zones will be moved to sequential
zones.
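
The 50 percent threshold can be checked against the counters reported by
the status line. This is a minimal sketch; the function name
``reclaim_expected`` is hypothetical and the check only mirrors the rule
stated above, not the exact condition used by the kernel.

.. code-block:: python

    def reclaim_expected(nr_unmap_rnd, nr_rnd, threshold_percent=50):
        """Return True if free random zones are below the threshold."""
        if nr_rnd == 0:
            return False
        # Integer arithmetic avoids rounding issues with percentages.
        return nr_unmap_rnd * 100 < nr_rnd * threshold_percent


    below = reclaim_expected(9, 20)   # 45 percent free
    at = reclaim_expected(10, 20)     # exactly 50 percent free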