=========
dm-switch
=========

The device-mapper switch target creates a device that supports an
arbitrary mapping of fixed-size regions of I/O across a fixed set of
paths.  The path used for any specific region can be switched
dynamically by sending the target a message.

It maps I/O to underlying block devices efficiently when there is a large
number of fixed-sized address regions but there is no simple pattern
that would allow for a compact representation of the mapping such as
dm-stripe.

Background
----------

Dell EqualLogic and some other iSCSI storage arrays use a distributed
frameless architecture.  In this architecture, the storage group
consists of a number of distinct storage arrays ("members") each having
independent controllers, disk storage and network adapters.  When a LUN
is created it is spread across multiple members.  The details of the
spreading are hidden from initiators connected to this storage system.
The storage group exposes a single target discovery portal, no matter
how many members are being used.  When iSCSI sessions are created, each
session is connected to an Ethernet port on a single member.  Data to a
LUN can be sent on any iSCSI session, and if the blocks being accessed
are stored on another member the I/O will be forwarded as required.  This
forwarding is invisible to the initiator.  The storage layout is also
dynamic, and the blocks stored on disk may be moved from member to
member as needed to balance the load.

This architecture simplifies the management and configuration of both
the storage group and initiators.  In a multipathing configuration, it
is possible to set up multiple iSCSI sessions to use multiple network
interfaces on both the host and target to take advantage of the
increased network bandwidth.  An initiator could use a simple
round-robin algorithm to send I/O across all paths and let the storage
array members forward it as necessary, but there is a performance
advantage to sending data directly to the correct member.

A device-mapper table already lets you map different regions of a
device onto different targets.  However, in this architecture the LUN is
spread with an address region size on the order of tens of megabytes,
which means the resulting table could have more than a million entries
and consume far too much memory.

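To make the scale concrete, the arithmetic can be sketched as follows.
The LUN size and region size here are hypothetical figures chosen only
for illustration; they are not taken from any particular array.

```shell
# Hypothetical figures chosen for illustration; not taken from a real array.
lun_bytes=$((16 * 1024 * 1024 * 1024 * 1024))   # 16 TiB LUN
region_bytes=$((16 * 1024 * 1024))               # 16 MiB address region

# A plain device-mapper table would need one entry per region.
regions=$((lun_bytes / region_bytes))
echo "regions: $regions"
```

With these assumed sizes the division yields 1048576 regions, which is
the kind of table size the paragraph above refers to.
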
Using this device-mapper switch target we can now build a two-layer
device hierarchy:

    Upper Tier - Determine which array member the I/O should be sent to.
    Lower Tier - Load balance amongst paths to a particular member.

The lower tier consists of a single dm multipath device for each member.
Each of these multipath devices contains the set of paths directly to
the array member in one priority group, and leverages existing path
selectors to load balance amongst these paths.  We also build a
non-preferred priority group containing paths to other array members for
failover reasons.

The upper tier consists of a single dm-switch device.  This device uses
a bitmap to look up the location of the I/O and choose the appropriate
lower tier device to route the I/O.  By using a bitmap we are able to
use 4 bits for each address range in a 16 member group (which is very
large for us).  This is a much denser representation than the dm table
b-tree can achieve.

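As a rough illustration of the density claim, the sketch below estimates
the bitmap size.  It assumes that each entry takes the smallest number
of bits able to hold a path number, and it reuses a hypothetical count
of 1048576 regions.  The helper name ``entry_bits`` is made up for this
example, and the driver's actual in-memory layout may round differently.

```shell
# Hypothetical helper: smallest number of bits b (at least 1) such that
# 2^b is greater than or equal to the number of paths.
entry_bits() {
    paths=$1
    bits=1
    while [ $((1 << bits)) -lt "$paths" ]; do
        bits=$((bits + 1))
    done
    echo "$bits"
}

bits=$(entry_bits 16)                        # 16-member group
regions=1048576                              # assumed region count
table_bytes=$(( (regions * bits + 7) / 8 ))  # round up to whole bytes
echo "bits per entry: $bits, bitmap bytes: $table_bytes"
```

Under these assumptions a 16-member group needs 4 bits per region, so a
million regions fit in 512 KiB.
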
Construction Parameters
=======================

    <num_paths> <region_size> <num_optional_args> [<optional_args>...] [<dev_path> <offset>]+
        <num_paths>
            The number of paths across which to distribute the I/O.

        <region_size>
            The number of 512-byte sectors in a region. Each region can be redirected
            to any of the available paths.

        <num_optional_args>
            The number of optional arguments. Currently, no optional arguments
            are supported and so this must be zero.

        <dev_path>
            The block device that represents a specific path to the device.

        <offset>
            The offset of the start of data on the specific <dev_path> (in units
            of 512-byte sectors). This number is added to the sector number when
            forwarding the request to the specific path. Typically it is zero.

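The parameters above follow the usual device-mapper table prefix (start
sector, length in sectors, target name).  A minimal sketch of assembling
such a line is shown below.  The helper name ``switch_table`` and the
device size of 2097152 sectors are assumptions made for illustration,
and every offset is taken to be zero.

```shell
# Hypothetical helper: print a dm-switch table line.
# usage: switch_table <total_sectors> <region_size> <dev_path>...
switch_table() {
    total_sectors=$1
    region_sectors=$2
    shift 2
    # After the shift, $# is the number of device paths (<num_paths>).
    # The trailing 0 is <num_optional_args>, which must be zero.
    line="0 $total_sectors switch $# $region_sectors 0"
    for dev in "$@"; do
        line="$line $dev 0"      # <dev_path> <offset>, offset assumed 0
    done
    printf '%s\n' "$line"
}

switch_table 2097152 128 /dev/vg1/switch0 /dev/vg1/switch1 /dev/vg1/switch2
```

The printed line could then be passed to ``dmsetup create`` through
``--table``.
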
Messages
========

set_region_mappings <index>:<path_nr> [<index>]:<path_nr> [<index>]:<path_nr>...

Modify the region table by specifying which regions are redirected to
which paths.

<index>
    The region number (region size was specified in the construction
    parameters).
    If index is omitted, the next region (previous index + 1) is used.
    Expressed in hexadecimal (WITHOUT any prefix like 0x).

<path_nr>
    The path number in the range 0 ... (<num_paths> - 1).
    Expressed in hexadecimal (WITHOUT any prefix like 0x).

R<n>,<m>
    This parameter allows repetitive patterns to be loaded quickly. <n> and <m>
    are hexadecimal numbers. The last <n> mappings are repeated in the next <m>
    slots.

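The index and repeat rules can be sketched as a small shell function
that rewrites an argument list with every ``R<n>,<m>`` replaced by the
explicit entries it stands for.  The name ``expand_mappings`` is
hypothetical.  The sketch only knows about regions set earlier in the
same argument list, so it is an illustration of the syntax rather than
a model of the driver.

```shell
# Hypothetical helper: expand R<n>,<m> in set_region_mappings arguments.
# All numbers are hexadecimal without a prefix, as in the message itself.
# The body runs in a subshell so its variables do not leak to the caller.
expand_mappings() (
    idx=-1                      # index of the most recently written region
    out=""
    for arg in "$@"; do
        case $arg in
        R*)
            spec=${arg#R}
            n=$((0x${spec%,*}))         # length of the pattern to repeat
            m=$((0x${spec#*,}))         # number of following slots to fill
            while [ "$m" -gt 0 ]; do
                idx=$((idx + 1))
                src=$((idx - n))
                [ "$src" -ge 0 ] || exit 1      # nothing to copy from
                eval "path=\$map_$src"
                [ -n "$path" ] || exit 1        # region unknown to this sketch
                eval "map_$idx=\$path"
                out="$out :$path"
                m=$((m - 1))
            done
            ;;
        *)
            if [ -n "${arg%%:*}" ]; then
                idx=$((0x${arg%%:*}))   # explicit index
            else
                idx=$((idx + 1))        # omitted index: previous index + 1
            fi
            eval "map_$idx=\${arg#*:}"
            out="$out $arg"
            ;;
        esac
    done
    printf '%s\n' "${out# }"
)

expand_mappings 1000:1 :2 R2,10
```

Because ``<m>`` is hexadecimal, ``R2,10`` fills 16 slots, not 10.
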
Status
======

No status line is reported.

Example
=======

Assume that you have volumes vg1/switch0, vg1/switch1 and vg1/switch2
of the same size.

Create a switch device with a 64 KiB region size (128 sectors of 512
bytes)::

    dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
        switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"

Set mappings for the first 7 entries to point to devices switch0, switch1,
switch2, switch0, switch1, switch2, switch1::

    dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1

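To see how a sector is routed after this message, the lookup can be
sketched as follows.  It assumes that the region index is the sector
number divided by the region size, rounded down.  The helper name
``path_for_sector`` is made up, and only the seven regions set above are
modelled.

```shell
region_sectors=128   # region size from the table above, in 512-byte sectors

# Hypothetical helper: print the path number that serves a given sector,
# for the seven regions configured by the message above.
path_for_sector() {
    region=$(( $1 / region_sectors ))
    [ "$region" -lt 7 ] || return 1      # only regions 0..6 are modelled here
    set -- 0 1 2 0 1 2 1                 # path numbers for regions 0..6
    shift "$region"                      # drop the entries before this region
    echo "$1"
}

path_for_sector 300   # sector 300 lies in region 2
```

Sector 300 falls in region 2, which the message mapped to path 2, that
is /dev/vg1/switch2.
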
Set repetitive mapping. This command::

    dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10

is equivalent to::

    dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
        :1 :2 :1 :2 :1 :2 :1 :2 :1 :2