================================
Coherent Accelerator (CXL) Flash
================================

Introduction
============

    The IBM Power architecture provides support for CAPI (Coherent
    Accelerator Power Interface), which is available to certain PCIe slots
    on Power 8 systems. CAPI can be thought of as a special tunneling
    protocol through PCIe that allows PCIe adapters to look like
    special-purpose co-processors that can read or write an application's
    memory and generate page faults. As a result, the host interface to
    an adapter running in CAPI mode does not require the data buffers to
    be mapped to the device's memory (IOMMU bypass), nor does it require
    memory to be pinned.

    On Linux, Coherent Accelerator (CXL) kernel services present CAPI
    devices as PCI devices by implementing a virtual PCI host bridge.
    This abstraction simplifies the infrastructure and programming
    model, allowing drivers to look similar to other native PCI
    device drivers.

    CXL provides a mechanism by which user space applications can
    talk directly to a device (network or storage), bypassing the typical
    kernel/device driver stack. The CXL Flash Adapter Driver gives
    user space applications direct access to Flash storage.

    The CXL Flash Adapter Driver is a kernel module that sits in the
    SCSI stack as a low-level device driver (below the SCSI disk and
    protocol drivers) for the IBM CXL Flash Adapter. This driver is
    responsible for the initialization of the adapter, setting up the
    special path for user space access, and performing error recovery. It
    communicates directly with the Flash Accelerator Functional Unit (AFU)
    as described in Documentation/powerpc/cxl.rst.

    The cxlflash driver supports two mutually exclusive modes of
    operation at the device (LUN) level:

        - Any flash device (LUN) can be configured to be accessed as a
          regular disk device (e.g., /dev/sdc). This is the default mode.

        - Any flash device (LUN) can be configured to be accessed from
          user space with a special block library. This mode further
          specifies the means of accessing the device and provides for
          either raw access to the entire LUN (referred to as direct
          or physical LUN access) or access to a kernel/AFU-mediated
          partition of the LUN (referred to as virtual LUN access). The
          segmentation of a disk device into virtual LUNs is assisted
          by special translation services provided by the Flash AFU.

Overview
========

    The Coherent Accelerator Interface Architecture (CAIA) introduces a
    concept of a master context. A master typically has special privileges
    granted to it by the kernel or hypervisor, allowing it to perform
    AFU-wide management and control. The master may or may not be involved
    directly in each user I/O, but at a minimum is involved in the
    initial setup before the user application is allowed to send requests
    directly to the AFU.

    The CXL Flash Adapter Driver establishes a master context with the
    AFU. It uses memory mapped I/O (MMIO) for this control and setup. The
    Adapter Problem Space Memory Map looks like this::

                     +-------------------------------+
                     |    512 * 64 KB User MMIO      |
                     |        (per context)          |
                     |       User Accessible         |
                     +-------------------------------+
                     |    512 * 128 B per context    |
                     |    Provisioning and Control   |
                     |   Trusted Process accessible  |
                     +-------------------------------+
                     |         64 KB Global          |
                     |   Trusted Process accessible  |
                     +-------------------------------+
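
    The sizes in the diagram above can be cross-checked with simple
    arithmetic. The C sketch below is illustrative only. Its constants and
    helper names are hypothetical and do not come from the cxlflash
    sources. The offset helper also assumes something this document does
    not state: that the per-context 64 KB user MMIO windows are laid out
    contiguously, in context id order.

```c
#include <stdint.h>

/* Hypothetical constants mirroring the memory map diagram. */
#define NUM_CONTEXTS       512u
#define USER_MMIO_PER_CTX  (64u * 1024u)   /* 64 KB per context */
#define PROV_CTRL_PER_CTX  128u            /* 128 B per context */
#define GLOBAL_AREA_SIZE   (64u * 1024u)   /* 64 KB global area */

/* Total size of the user-accessible region: 512 * 64 KB = 32 MB. */
static uint64_t user_mmio_total(void)
{
    return (uint64_t)NUM_CONTEXTS * USER_MMIO_PER_CTX;
}

/* Total size of the provisioning and control region: 512 * 128 B = 64 KB. */
static uint64_t prov_ctrl_total(void)
{
    return (uint64_t)NUM_CONTEXTS * PROV_CTRL_PER_CTX;
}

/*
 * Offset of a context's 64 KB window inside the user MMIO region,
 * ASSUMING a contiguous per-context layout (not confirmed by the text).
 * Returns UINT64_MAX for an out-of-range context id.
 */
static uint64_t user_mmio_offset(uint32_t ctx_id)
{
    if (ctx_id >= NUM_CONTEXTS)
        return UINT64_MAX;
    return (uint64_t)ctx_id * USER_MMIO_PER_CTX;
}
```

    Under these assumptions, the whole user region spans 32 MB, and the
    window for context 1 starts 64 KB into it.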

    This driver configures itself into the SCSI software stack as an
    adapter driver. The driver is the only entity considered a Trusted
    Process, and therefore the only one allowed to program the
    Provisioning and Control and Global areas in the MMIO space shown
    above. The master context driver discovers all LUNs attached to the
    CXL Flash adapter and instantiates SCSI block devices (/dev/sdb,
    /dev/sdc, etc.) for each unique LUN seen from each path.

    Once these SCSI block devices are instantiated, an application
    written to a specification provided by the block library may access
    the Flash from user space without requiring a system call per I/O.

    This master context driver also provides a series of ioctls that the
    block library uses to enable this user space access. The driver
    supports two modes for accessing the block device.

    The first mode is called the virtual mode. In this mode a single SCSI
    block device (/dev/sdb) may be carved up into any number of distinct
    virtual LUNs. The virtual LUNs may be resized as long as the sum of
    the sizes of all the virtual LUNs, along with their associated
    metadata, does not exceed the physical capacity.
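
    The resize rule for virtual LUNs amounts to a capacity check. What
    follows is a minimal C sketch, not the driver's actual accounting. It
    assumes sizes are tracked in blocks and that the metadata overhead is
    known up front as a block count, since this document does not say how
    large that overhead is. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical capacity check: would resizing virtual LUN `idx` to
 * `new_size` blocks keep the sum of all virtual LUN sizes plus
 * `metadata` blocks within `physical` blocks?
 */
static bool vlun_resize_fits(const uint64_t *sizes, size_t count, size_t idx,
                             uint64_t new_size, uint64_t metadata,
                             uint64_t physical)
{
    uint64_t total = metadata;

    if (idx >= count || metadata > physical)
        return false;

    for (size_t i = 0; i < count; i++) {
        uint64_t s = (i == idx) ? new_size : sizes[i];

        /* Compare against the remaining space so the sum cannot overflow. */
        if (s > physical - total)
            return false;
        total += s;
    }
    return true;
}
```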

    The second mode is called the physical mode. In this mode a single
    block device (/dev/sdb) may be opened directly by the block library
    and the entire space for the LUN is available to the application.

    Only the physical mode provides persistence of the data, i.e., the
    data written to the block device will survive application exit and
    restart, as well as system reboot. The virtual LUNs do not persist
    (i.e., they do not survive after the application terminates or the
    system reboots).


Block library API
=================

    Applications intending to access the CXL Flash from user space
    should use the block library, as it abstracts the details of
    interfacing directly with the cxlflash driver that are necessary for
    performing administrative actions (e.g., setup, teardown, resize).
    The block library can be thought of as a 'user' of services,
    implemented as IOCTLs, that are provided by the cxlflash driver
    specifically for devices (LUNs) operating in user space access
    mode. While it is not a requirement that applications understand
    the interface between the block library and the cxlflash driver,
    a high-level overview of each supported service (IOCTL) is provided
    below.

    The block library can be found on GitHub:
    http://github.com/open-power/capiflash


CXL Flash Driver LUN IOCTLs
===========================

    Users, such as the block library, that wish to interface with a flash
    device (LUN) via user space access need to use the services provided
    by the cxlflash driver. As these services are implemented as ioctls,
    a file descriptor handle must first be obtained in order to establish
    the communication channel between a user and the kernel. This file
    descriptor is obtained by opening the device special file associated
    with the SCSI disk device (/dev/sdb) that was created during LUN
    discovery. Because of where the cxlflash driver sits within the
    SCSI protocol stack, this open is not actually seen by the cxlflash
    driver. Upon successful open, the user receives a file descriptor
    (herein referred to as fd1) that should be used for issuing the
    subsequent ioctls listed below.

    The structure definitions for these IOCTLs are available in:
    uapi/scsi/cxlflash_ioctl.h

DK_CXLFLASH_ATTACH
------------------

    This ioctl obtains, initializes, and starts a context using the CXL
    kernel services. These services provide a context id (u16) that
    uniquely identifies the context and its allocated resources. The
    services additionally provide a second file descriptor (herein
    referred to as fd2) that is used by the block library to initiate
    memory mapped I/O (via mmap()) to the CXL flash device and poll for
    completion events. This file descriptor is intentionally installed by
    this driver rather than by the CXL kernel services to allow for
    intermediary notification and access in the event of a
    non-user-initiated close(), such as a killed process. This design
    point is described in further detail in the description for the
    DK_CXLFLASH_DETACH ioctl.

    There are a few important aspects regarding the "tokens" (context id
    and fd2) that are provided back to the user:

        - These tokens are only valid for the process under which they
          were created. The child of a forked process cannot continue
          to use the context id or file descriptor created by its parent
          (see DK_CXLFLASH_VLUN_CLONE for further details).

        - These tokens are only valid for the lifetime of the context and
          the process under which they were created. Once either is
          destroyed, the tokens are to be considered stale and subsequent
          usage will result in errors.

        - A valid adapter file descriptor (fd2 >= 0) is only returned on
          the initial attach for a context. Subsequent attaches to an
          existing context (DK_CXLFLASH_ATTACH_REUSE_CONTEXT flag present)
          do not provide the adapter file descriptor, as it was previously
          made known to the application.

        - When a context is no longer needed, the user shall detach from
          the context via the DK_CXLFLASH_DETACH ioctl. When the attach
          returned a valid adapter file descriptor and the return flag
          DK_CXLFLASH_APP_CLOSE_ADAP_FD was present, the application _must_
          close the adapter file descriptor following a successful detach.

        - When the attach returns with a valid fd2 and the return flag
          DK_CXLFLASH_APP_CLOSE_ADAP_FD is present, the application _must_
          close fd2 in the following circumstances:

          + Following a successful detach of the last user of the context
          + Following a successful recovery on the context's original fd2
          + In the child process of a fork(), following a clone ioctl,
            on the fd2 associated with the source context

        - At any time, a close on fd2 will invalidate the tokens. Applications
          should exercise caution to only close fd2 when appropriate (outlined
          in the previous bullet) to avoid premature loss of I/O.
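
    The rules above for closing fd2 can be condensed into a single
    predicate. The helper below is hypothetical and belongs to neither the
    driver nor the block library. Its flag constant is an illustrative
    stand-in. Real code should test the attach return flags against
    DK_CXLFLASH_APP_CLOSE_ADAP_FD from uapi/scsi/cxlflash_ioctl.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in; use DK_CXLFLASH_APP_CLOSE_ADAP_FD in real code. */
#define APP_CLOSE_ADAP_FD_FLAG 0x2ULL

/* The situations listed above in which fd2 may need to be closed. */
enum fd2_event {
    FD2_EVT_DETACH_LAST_USER,   /* successful detach of the last user */
    FD2_EVT_RECOVERED_ORIG_FD,  /* successful recovery on the original fd2 */
    FD2_EVT_CHILD_AFTER_CLONE,  /* child process, after the clone ioctl */
    FD2_EVT_OTHER               /* anything else: keep fd2 open */
};

/* Returns true when the application must close fd2 for this event. */
static bool must_close_fd2(uint64_t attach_return_flags, int64_t fd2,
                           enum fd2_event evt)
{
    /* Only a valid fd2 returned together with the flag is the app's job. */
    if (fd2 < 0 || !(attach_return_flags & APP_CLOSE_ADAP_FD_FLAG))
        return false;

    switch (evt) {
    case FD2_EVT_DETACH_LAST_USER:
    case FD2_EVT_RECOVERED_ORIG_FD:
    case FD2_EVT_CHILD_AFTER_CLONE:
        return true;
    default:
        return false;   /* closing early would invalidate the tokens */
    }
}
```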

DK_CXLFLASH_USER_DIRECT
-----------------------
    This ioctl is responsible for transitioning the LUN to direct
    (physical) mode access and configuring the AFU for direct access from
    user space on a per-context basis. Additionally, the block size and
    last logical block address (LBA) are returned to the user.

    As mentioned previously, when operating in user space access mode,
    LUNs may be accessed in whole or in part. Only one mode is allowed
    at a time, and if one mode is active (outstanding references exist),
    requests to use the LUN in a different mode are denied.
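
    The one-mode-at-a-time rule behaves like a reference-counted lock on
    the LUN's access mode. The minimal sketch below models that policy. It
    does not reproduce driver code. The types are hypothetical, and so is
    the error value (-EINVAL) returned for a denied request.

```c
#include <errno.h>

enum lun_mode { LUN_MODE_NONE, LUN_MODE_PHYSICAL, LUN_MODE_VIRTUAL };

struct lun_state {
    enum lun_mode mode;   /* current user space access mode */
    unsigned int users;   /* outstanding references */
};

/* Take a reference in `mode`; deny it if a different mode is active. */
static int lun_get(struct lun_state *lun, enum lun_mode mode)
{
    if (lun->mode == LUN_MODE_NONE)
        lun->mode = mode;
    else if (lun->mode != mode)
        return -EINVAL;   /* assumed error value for "denied" */
    lun->users++;
    return 0;
}

/* Drop a reference; the LUN leaves its mode when the last one goes. */
static void lun_put(struct lun_state *lun)
{
    if (lun->users > 0 && --lun->users == 0)
        lun->mode = LUN_MODE_NONE;
}
```

    The mode is released only when the last reference is dropped, which
    matches the "outstanding references exist" condition above.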

    The AFU is configured for direct access from user space by adding an
    entry to the AFU's resource handle table. The index of the entry is
    treated as a resource handle that is returned to the user. The user
    is then able to use the handle to reference the LUN during I/O.
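
    Handing out a table index as the resource handle can be sketched as a
    simple slot allocator. The table size, entry layout, and function names
    below are hypothetical, because this document does not describe the
    real AFU resource handle table format. The release function mirrors
    DK_CXLFLASH_RELEASE, which makes an entry available again.

```c
#include <stdbool.h>
#include <stdint.h>

#define RHT_ENTRIES 16   /* hypothetical table size */

struct rht_entry {
    bool     in_use;
    uint64_t lun_id;     /* LUN this handle refers to */
};

struct rht {
    struct rht_entry e[RHT_ENTRIES];
};

/* Claim the first free entry; its index is the resource handle (-1 if full). */
static int rht_alloc(struct rht *t, uint64_t lun_id)
{
    for (int i = 0; i < RHT_ENTRIES; i++) {
        if (!t->e[i].in_use) {
            t->e[i].in_use = true;
            t->e[i].lun_id = lun_id;
            return i;
        }
    }
    return -1;
}

/* Invalidate a handle so its entry can be reused; false if not in use. */
static bool rht_release(struct rht *t, int handle)
{
    if (handle < 0 || handle >= RHT_ENTRIES || !t->e[handle].in_use)
        return false;
    t->e[handle].in_use = false;
    return true;
}
```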

DK_CXLFLASH_USER_VIRTUAL
------------------------
    This ioctl is responsible for transitioning the LUN to virtual access
    mode and configuring the AFU for virtual access from user space
    on a per-context basis. Additionally, the block size and last logical
    block address (LBA) are returned to the user.

    As mentioned previously, when operating in user space access mode,
    LUNs may be accessed in whole or in part. Only one mode is allowed
    at a time, and if one mode is active (outstanding references exist),
    requests to use the LUN in a different mode are denied.

    The AFU is configured for virtual access from user space by adding
    an entry to the AFU's resource handle table. The index of the entry
    is treated as a resource handle that is returned to the user. The
    user is then able to use the handle to reference the LUN during I/O.

    By default, the virtual LUN is created with a size of 0. The user
    would need to use the DK_CXLFLASH_VLUN_RESIZE ioctl to grow the
    virtual LUN to a desired size. To avoid having to perform this
    resize for the initial creation of the virtual LUN, the user has the
    option of specifying a size as part of the DK_CXLFLASH_USER_VIRTUAL
    ioctl, such that when success is returned to the user, the
    resource handle that is provided already references provisioned
    storage. This is reflected by the last LBA being a non-zero value.

    When a LUN is accessible from more than one port, this ioctl will
    return with the DK_CXLFLASH_ALL_PORTS_ACTIVE return flag set. This
    provides the user with a hint that I/O can be retried in the event
    of an I/O error, as the LUN can be reached over multiple paths.
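
    An application might use the DK_CXLFLASH_ALL_PORTS_ACTIVE hint to
    decide whether to retry a failed I/O. The minimal sketch below uses an
    illustrative flag value that stands in for the real definition in
    uapi/scsi/cxlflash_ioctl.h. The retry limit is an application policy,
    since this document does not prescribe one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in; use DK_CXLFLASH_ALL_PORTS_ACTIVE in real code. */
#define ALL_PORTS_ACTIVE_FLAG 0x1ULL

/*
 * Retry a failed I/O only if the LUN is reachable over multiple paths
 * (per the return flags from DK_CXLFLASH_USER_VIRTUAL) and the
 * application's retry budget is not yet exhausted.
 */
static bool should_retry_io(uint64_t return_flags, unsigned int attempts,
                            unsigned int max_retries)
{
    return (return_flags & ALL_PORTS_ACTIVE_FLAG) && attempts < max_retries;
}
```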

DK_CXLFLASH_VLUN_RESIZE
-----------------------
    This ioctl is responsible for resizing a previously created virtual
    LUN and will fail if invoked upon a LUN that is not in virtual
    mode. Upon success, an updated last LBA is returned to the user
    indicating the new size of the virtual LUN associated with the
    resource handle.

    The partitioning of virtual LUNs is jointly mediated by the cxlflash
    driver and the AFU. An allocation table is kept for each LUN that is
    operating in the virtual mode and used to program a LUN translation
    table that the AFU references when provided with a resource handle.
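
    This document does not describe the format of the allocation and
    translation tables. As an illustration only, the sketch below assumes
    a chunk-based scheme with three parts:

    - The physical LUN is divided into fixed-size chunks.
    - A per-LUN allocation table records which chunks are in use.
    - Each virtual LUN's translation table maps its virtual chunks to
      physical chunks.

    The chunk size, table sizes, and all names are hypothetical. Resizing
    to 0 frees every chunk, which is what DK_CXLFLASH_RELEASE does first
    for virtual LUNs.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_BLOCKS 1024u  /* hypothetical blocks per chunk */
#define PHYS_CHUNKS  64u    /* hypothetical chunks on the physical LUN */
#define MAX_VCHUNKS  16u    /* hypothetical max chunks per virtual LUN */

/* Per-LUN allocation table: which physical chunks are in use. */
struct alloc_table {
    bool used[PHYS_CHUNKS];
};

/* Per-virtual-LUN translation table: virtual chunk -> physical chunk. */
struct vlun {
    uint32_t nchunks;
    uint32_t map[MAX_VCHUNKS];
};

/* Grow a virtual LUN by one chunk; false when no space is left. */
static bool vlun_grow_one(struct alloc_table *at, struct vlun *v)
{
    if (v->nchunks >= MAX_VCHUNKS)
        return false;
    for (uint32_t p = 0; p < PHYS_CHUNKS; p++) {
        if (!at->used[p]) {
            at->used[p] = true;
            v->map[v->nchunks++] = p;
            return true;
        }
    }
    return false;
}

/* Shrink to size 0, returning every chunk to the allocation table. */
static void vlun_resize_zero(struct alloc_table *at, struct vlun *v)
{
    while (v->nchunks > 0)
        at->used[v->map[--v->nchunks]] = false;
}

/* Translate a virtual LBA to a physical LBA; UINT64_MAX if out of range. */
static uint64_t vlun_translate(const struct vlun *v, uint64_t vlba)
{
    uint64_t vchunk = vlba / CHUNK_BLOCKS;

    if (vchunk >= v->nchunks)
        return UINT64_MAX;
    return (uint64_t)v->map[vchunk] * CHUNK_BLOCKS + vlba % CHUNK_BLOCKS;
}
```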

    This ioctl can return -EAGAIN if an AFU sync operation takes too long.
    In addition to returning a failure to the user, cxlflash will also
    schedule an asynchronous AFU reset. Should the user choose to retry the
    operation, it is expected to succeed. If this ioctl fails with -EAGAIN,
    the user can either retry the operation or treat it as a failure.
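
    A caller that chooses to retry on -EAGAIN can bound the number of
    attempts. The sketch below assumes a hypothetical wrapper that returns
    0 or a negative errno value. A real wrapper around ioctl(2) would have
    to convert the -1/errno convention into that form. The stub exists
    only to exercise the loop.

```c
#include <errno.h>

/*
 * Hypothetical wrapper type: performs one resize (or clone) attempt and
 * returns 0 on success or a negative errno value on failure.
 */
typedef int (*ioctl_attempt_fn)(void *ctx);

/* Retry only on -EAGAIN (the driver schedules an AFU reset meanwhile). */
static int retry_on_eagain(ioctl_attempt_fn fn, void *ctx,
                           unsigned int max_tries)
{
    int rc = -EAGAIN;

    for (unsigned int i = 0; i < max_tries && rc == -EAGAIN; i++)
        rc = fn(ctx);
    return rc;   /* any other error is returned to the caller unchanged */
}

/* Illustrative stub: fails with -EAGAIN until *remaining reaches 0. */
static int stub_attempt(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EAGAIN;
    }
    return 0;
}
```

    The same pattern applies to DK_CXLFLASH_VLUN_CLONE, which documents the
    same -EAGAIN behavior.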

DK_CXLFLASH_RELEASE
-------------------
    This ioctl is responsible for releasing a previously obtained
    reference to either a physical or virtual LUN. This can be
    thought of as the inverse of the DK_CXLFLASH_USER_DIRECT or
    DK_CXLFLASH_USER_VIRTUAL ioctls. Upon success, the resource handle
    is no longer valid and the entry in the resource handle table is
    made available to be used again.

    As part of the release process for virtual LUNs, the virtual LUN
    is first resized to 0 to clear out and free the translation tables
    associated with the virtual LUN reference.

DK_CXLFLASH_DETACH
------------------
    This ioctl is responsible for unregistering a context with the
    cxlflash driver and releasing outstanding resources that were
    not explicitly released via the DK_CXLFLASH_RELEASE ioctl. Upon
    success, all "tokens" which had been provided to the user from
    DK_CXLFLASH_ATTACH onward are no longer valid.

    When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
    attach, the application _must_ close the fd2 associated with the context
    following the detach of the final user of the context.

DK_CXLFLASH_VLUN_CLONE
----------------------
    This ioctl is responsible for cloning a previously created
    context to a more recently created context. It exists solely to
    support maintaining user space access to storage after a process
    forks. Upon success, the child process (which invoked the ioctl)
    will have access to the same LUNs via the same resource handle(s)
    as the parent, but under a different context.

    Context sharing across processes is not supported with CXL, and
    therefore each fork must be met with establishing a new context
    for the child process. This ioctl simplifies the state management
    and playback required by a user in such a scenario. When a process
    forks, the child process can clone the parent's context by first
    creating a context (via DK_CXLFLASH_ATTACH) and then using this ioctl
    to perform the clone from the parent to the child.

    The clone itself is fairly simple. The resource handle and LUN
    translation tables are copied from the parent context to the child's
    and then synced with the AFU.
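
    The copy step of the clone can be modeled as duplicating the resource
    handle state while the child keeps its own context id. This is a
    hypothetical data model, not the driver's structures. Its `synced`
    flag merely stands in for the sync with the AFU. Because entries keep
    their indices, the child can go on using the parent's resource handle
    values.

```c
#include <stdbool.h>
#include <string.h>

#define CTX_RHT_ENTRIES 8   /* hypothetical */

struct ctx_model {
    int  id;                          /* context id (differs per process) */
    bool rht_valid[CTX_RHT_ENTRIES];  /* resource handles in use */
    int  rht_lun[CTX_RHT_ENTRIES];    /* LUN referenced by each handle */
    bool synced;                      /* stand-in for the AFU sync step */
};

/* Copy the parent's handle state into the child, keeping the child's id. */
static void ctx_clone(struct ctx_model *child, const struct ctx_model *parent)
{
    memcpy(child->rht_valid, parent->rht_valid, sizeof(child->rht_valid));
    memcpy(child->rht_lun, parent->rht_lun, sizeof(child->rht_lun));
    child->synced = true;   /* the real driver syncs with the AFU here */
}
```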

    When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
    attach, the application _must_ close, in the child process, the fd2
    associated with the source context (which remains resident and
    accessible in the parent process) following the clone. This avoids a
    stale entry in the file descriptor table of the child process.

    This ioctl can return -EAGAIN if an AFU sync operation takes too long.
    In addition to returning a failure to the user, cxlflash will also
    schedule an asynchronous AFU reset. Should the user choose to retry the
    operation, it is expected to succeed. If this ioctl fails with -EAGAIN,
    the user can either retry the operation or treat it as a failure.

DK_CXLFLASH_VERIFY
------------------
    This ioctl is used to detect various changes, such as a change in the
    capacity of the disk or in the number of visible LUNs. In cases
    where the changes affect the application (such as a LUN resize), the
    cxlflash driver will report the changed state to the application.

    The user calls this ioctl to validate that a LUN has not changed in
    response to a check condition. As the user is operating out of band
    from the kernel, they will see these types of events without the
    kernel's knowledge. When such an event is encountered, the user's
    architected behavior is to call this ioctl, indicating what they want
    to verify and passing along any appropriate information. For now, only
    verifying a LUN change (i.e., a size change) with sense data is
    supported.
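
    An application might inspect sense data from a check condition before
    deciding to call DK_CXLFLASH_VERIFY. The sketch below recognizes the
    SCSI UNIT ATTENTION sense key with ASC 2Ah / ASCQ 09h (CAPACITY DATA
    HAS CHANGED). It handles both fixed format (70h/71h) and descriptor
    format (72h/73h) sense data, following the SCSI Primary Commands
    standard. The function itself is hypothetical and is not part of the
    driver or the block library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_UNIT_ATTENTION 0x06
#define ASC_PARAMS_CHANGED       0x2A
#define ASCQ_CAPACITY_CHANGED    0x09

/* True if the sense data reports UNIT ATTENTION / CAPACITY DATA HAS CHANGED. */
static bool sense_capacity_changed(const uint8_t *sense, size_t len)
{
    uint8_t key, asc, ascq;

    if (len < 4)
        return false;

    switch (sense[0] & 0x7F) {       /* response code */
    case 0x70:
    case 0x71:                       /* fixed format */
        if (len < 14)
            return false;
        key = sense[2] & 0x0F;
        asc = sense[12];
        ascq = sense[13];
        break;
    case 0x72:
    case 0x73:                       /* descriptor format */
        key = sense[1] & 0x0F;
        asc = sense[2];
        ascq = sense[3];
        break;
    default:
        return false;
    }
    return key == SENSE_KEY_UNIT_ATTENTION &&
           asc == ASC_PARAMS_CHANGED && ascq == ASCQ_CAPACITY_CHANGED;
}
```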

DK_CXLFLASH_RECOVER_AFU
-----------------------
    This ioctl is used to drive recovery (if such an action is warranted)
    of a specified user context. Any state associated with the user context
    is re-established upon successful recovery.

    User contexts are put into an error condition when the device needs to
    be reset or is terminating. Users are notified of this error condition
    by seeing all 0xF's on an MMIO read. Upon encountering this, the
    architected behavior for a user is to call this ioctl to recover
    their context. A user may also call this ioctl at any time to
    check if the device is operating normally. If a failure is returned
    from this ioctl, the user is expected to gracefully clean up their
    context via the release/detach ioctls. Until they do, the context they
    hold is not relinquished. The user may also optionally exit the process,
    at which time the context/resources they held will be freed as part of
    the release file operation (fop).
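
    Detecting the all-0xF's error indication and acting on the recovery
    result could look like the following. The helper names are
    hypothetical. The sketch assumes the application checks the value of
    each MMIO read it performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 64-bit MMIO read that returns all 0xF's signals a context error. */
static bool mmio64_in_error(uint64_t value)
{
    return value == UINT64_MAX;   /* 0xFFFFFFFFFFFFFFFF */
}

/* Same check for a 32-bit MMIO read. */
static bool mmio32_in_error(uint32_t value)
{
    return value == UINT32_MAX;   /* 0xFFFFFFFF */
}

enum after_recover { RESUME_IO, CLEAN_UP_CONTEXT };

/*
 * Decide the next step from the DK_CXLFLASH_RECOVER_AFU result
 * (0 = success): resume on success, otherwise clean up the context.
 */
static enum after_recover recover_outcome(int recover_rc)
{
    return recover_rc == 0 ? RESUME_IO : CLEAN_UP_CONTEXT;
}
```

    A CLEAN_UP_CONTEXT outcome corresponds to issuing the release and
    detach ioctls described above.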

    When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
    attach, the application _must_ unmap and close the fd2 associated with the
    original context after this ioctl returns success and indicates that
    the context was recovered (DK_CXLFLASH_RECOVER_AFU_CONTEXT_RESET).

DK_CXLFLASH_MANAGE_LUN
----------------------
    This ioctl is used to switch a LUN from a mode where it is available
    for file-system access (legacy) to a mode where it is set aside for
    exclusive user space access (superpipe). In case a LUN is visible
    across multiple ports and adapters, this ioctl is used to uniquely
    identify each LUN by its World Wide Node Name (WWNN).


CXL Flash Driver Host IOCTLs
============================

    Each host adapter instance that is supported by the cxlflash driver
    has a special character device associated with it to enable a set of
    host management functions. These character devices are hosted in a
    class dedicated for cxlflash and can be accessed via `/dev/cxlflash/*`.

    Applications can be written to perform various functions using the
    host ioctl APIs below.

    The structure definitions for these IOCTLs are available in:
    uapi/scsi/cxlflash_ioctl.h

HT_CXLFLASH_LUN_PROVISION
-------------------------
    This ioctl is used to create and delete persistent LUNs on cxlflash
    devices that lack an external LUN management interface. It is only
    valid when used with AFUs that support the LUN provision capability.

    When sufficient space is available, LUNs can be created by specifying
    the target port to host the LUN and a desired size in 4K blocks. Upon
    success, the LUN ID and WWID of the created LUN will be returned and
    the SCSI bus can be scanned to detect the change in LUN topology. Note
    that partial allocations are not supported. Should a creation fail due
    to a space issue, the target port can be queried for its current LUN
    geometry.

    To remove a LUN, the device must first be disassociated from the Linux
    SCSI subsystem. The LUN deletion can then be initiated by specifying a
    target port and LUN ID. Upon success, the LUN geometry associated with
    the port will be updated to reflect the new number of provisioned LUNs
    and the available capacity.

    To query the LUN geometry of a port, the target port is specified and
    upon success, the following information is presented:

        - Maximum number of provisioned LUNs allowed for the port
        - Current number of provisioned LUNs for the port
        - Maximum total capacity of provisioned LUNs for the port (4K blocks)
        - Current total capacity of provisioned LUNs for the port (4K blocks)

    With this information, the number of available LUNs and the available
    capacity can be calculated.
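
    The calculation mentioned above comes down to two subtractions. A fit
    check is added because partial allocations are not supported. In the
    minimal sketch below, the struct mirrors the four reported values, but
    its layout and all names are hypothetical and do not match the uapi
    definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the four geometry values listed above. */
struct port_geometry {
    uint64_t max_luns;        /* maximum provisioned LUNs allowed */
    uint64_t cur_luns;        /* currently provisioned LUNs */
    uint64_t max_cap_blocks;  /* maximum total capacity, 4K blocks */
    uint64_t cur_cap_blocks;  /* currently provisioned capacity, 4K blocks */
};

static uint64_t avail_luns(const struct port_geometry *g)
{
    return g->max_luns > g->cur_luns ? g->max_luns - g->cur_luns : 0;
}

static uint64_t avail_blocks(const struct port_geometry *g)
{
    return g->max_cap_blocks > g->cur_cap_blocks
               ? g->max_cap_blocks - g->cur_cap_blocks : 0;
}

/* Partial allocations are not supported, so the whole request must fit. */
static bool can_create_lun(const struct port_geometry *g, uint64_t size_4k)
{
    return avail_luns(g) > 0 && size_4k <= avail_blocks(g);
}

/* Round a byte count up to whole 4K blocks. */
static uint64_t bytes_to_4k_blocks(uint64_t bytes)
{
    return bytes / 4096 + (bytes % 4096 != 0);
}
```

    For example, take a port that reports 16 maximum and 4 current LUNs,
    with 1,000,000 maximum and 250,000 current 4K blocks. It has 12 LUNs
    and 750,000 blocks (about 2.9 GiB) available.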

HT_CXLFLASH_AFU_DEBUG
---------------------
    This ioctl is used to debug AFUs by supporting a command pass-through
    interface. It is only valid when used with AFUs that support the AFU
    debug capability.

    With the exception of buffer management, AFU debug commands are opaque
    to cxlflash and treated as pass-through. For debug commands that do
    require data transfer, the user supplies an adequately sized data buffer
    and must specify the data transfer direction with respect to the host.
    A maximum transfer size of 256K is imposed. Note that partial read
    completions are not supported: when errors are experienced with a host
    read data transfer, the data buffer is not copied back to the user.
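
    Before issuing a pass-through command, an application could validate
    its buffer request against the rules above. This sketch assumes "256K"
    means 256 KiB. The direction values, the error code (-EINVAL), and the
    function name are hypothetical and are not the uapi definitions.

```c
#include <errno.h>
#include <stddef.h>

#define AFU_DEBUG_MAX_XFER (256u * 1024u)  /* 256K limit, assumed to be KiB */

/* Hypothetical transfer directions, relative to the host. */
enum xfer_dir { XFER_NONE, XFER_HOST_READ, XFER_HOST_WRITE };

/*
 * Commands without data must not pass a buffer; commands with data need
 * a direction, a non-empty buffer, and a length within the 256K limit.
 */
static int afu_debug_check(enum xfer_dir dir, const void *buf, size_t len)
{
    if (dir == XFER_NONE)
        return (buf == NULL && len == 0) ? 0 : -EINVAL;
    if (buf == NULL || len == 0 || len > AFU_DEBUG_MAX_XFER)
        return -EINVAL;
    return 0;
}
```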