cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux
cxl.rst (16192B)


====================================
Coherent Accelerator Interface (CXL)
====================================

Introduction
============

    The coherent accelerator interface is designed to allow the
    coherent connection of accelerators (FPGAs and other devices) to a
    POWER system. These devices need to adhere to the Coherent
    Accelerator Interface Architecture (CAIA).

    IBM refers to this as the Coherent Accelerator Processor Interface
    or CAPI. In the kernel it's referred to by the name CXL to avoid
    confusion with the ISDN CAPI subsystem.

    Coherent in this context means that the accelerator and CPUs can
    both access system memory directly and with the same effective
    addresses.


Hardware overview
=================

    ::

         POWER8/9             FPGA
       +----------+        +---------+
       |          |        |         |
       |   CPU    |        |   AFU   |
       |          |        |         |
       |          |        |         |
       |          |        |         |
       +----------+        +---------+
       |   PHB    |        |         |
       |   +------+        |   PSL   |
       |   | CAPP |<------>|         |
       +---+------+  PCIE  +---------+

    The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP)
    unit which is part of the PCIe Host Bridge (PHB). Linux manages
    the CAPP through calls into OPAL; it does not program the CAPP
    directly.

    The FPGA (or coherently attached device) consists of two parts:
    the POWER Service Layer (PSL) and the Accelerator Function Unit
    (AFU). The AFU is used to implement specific functionality behind
    the PSL. The PSL, among other things, provides memory address
    translation services to allow each AFU direct access to userspace
    memory.

    The AFU is the core part of the accelerator (e.g. the compression
    or crypto function). The kernel has no knowledge of the function
    of the AFU. Only userspace interacts directly with the AFU.

    The PSL provides the translation and interrupt services that the
    AFU needs. This is what the kernel interacts with. For example, if
    the AFU needs to read a particular effective address, it sends
    that address to the PSL. The PSL then translates it, fetches the
    data from memory and returns it to the AFU. If the PSL has a
    translation miss, it interrupts the kernel and the kernel services
    the fault. The context in which this fault is serviced depends on
    who owns that acceleration function.

    - POWER8 and PSL Version 8 are compliant with CAIA Version 1.0.
    - POWER9 and PSL Version 9 are compliant with CAIA Version 2.0.

    PSL Version 9 provides new features, including:

    * Interaction with the nest MMU on the P9 chip.
    * Native DMA support.
    * Support for sending ASB_Notify messages for host thread wakeup.
    * Support for atomic operations.

    Cards with a PSL9 won't work on a POWER8 system and cards with a
    PSL8 won't work on a POWER9 system.

AFU Modes
=========

    There are two programming modes supported by the AFU: dedicated
    and AFU directed. An AFU may support one or both modes.

    When using dedicated mode, only one MMU context is supported. In
    this mode, only one userspace process can use the accelerator at
    a time.

    When using AFU directed mode, up to 16K simultaneous contexts can
    be supported. This means up to 16K simultaneous userspace
    applications may use the accelerator (although specific AFUs may
    support fewer). In this mode, the AFU sends a 16-bit context ID
    with each of its requests. This tells the PSL which context is
    associated with each operation. If the PSL can't translate an
    operation, the ID can also be accessed by the kernel so it can
    determine the userspace context associated with an operation.


MMIO space
==========

    A portion of the accelerator MMIO space can be directly mapped
    from the AFU to userspace. Either the whole space can be mapped or
    just a per-context portion. The hardware is self-describing, hence
    the kernel can determine the offset and size of the per-context
    portion.


Interrupts
==========

    AFUs may generate interrupts that are destined for userspace. These
    are received by the kernel as hardware interrupts and passed on to
    userspace via the read syscall documented below.

    Data storage faults and error interrupts are handled by the kernel
    driver.


Work Element Descriptor (WED)
=============================

    The WED is a 64-bit parameter passed to the AFU when a context is
    started. Its format is up to the AFU, hence the kernel has no
    knowledge of what it represents. Typically it will be the
    effective address of a work queue or status block where the AFU
    and userspace can share control and status information.


User API
========

1. AFU character devices
^^^^^^^^^^^^^^^^^^^^^^^^

    For AFUs operating in AFU directed mode, two character device
    files will be created. /dev/cxl/afu0.0m will correspond to a
    master context and /dev/cxl/afu0.0s will correspond to a slave
    context. Master contexts have access to the full MMIO space an
    AFU provides. Slave contexts have access to only the per-process
    MMIO space an AFU provides.

    For AFUs operating in dedicated process mode, the driver will
    only create a single character device per AFU called
    /dev/cxl/afu0.0d. This will have access to the entire MMIO space
    that the AFU provides (like master contexts in AFU directed mode).

    The types described below are defined in include/uapi/misc/cxl.h.

    The following file operations are supported on both slave and
    master devices.

    A userspace library libcxl is available here:

        https://github.com/ibm-capi/libcxl

    This provides a C interface to this kernel API.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API.

    A dedicated mode AFU only has one context and only allows the
    device to be opened once.

    An AFU directed mode AFU can have many contexts; the device can be
    opened once for each context that is available.

    When all available contexts are allocated, the open call will fail
    and return -ENOSPC.

    Note:
          IRQs need to be allocated for each context, which may limit
          the number of contexts that can be created, and therefore
          how many times the device can be opened. The POWER8 CAPP
          supports 2040 IRQs and 3 are used by the kernel, so 2037 are
          left. If 1 IRQ is needed per context, then only 2037
          contexts can be allocated. If 4 IRQs are needed per context,
          then only 2037/4 = 509 contexts can be allocated.
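
    The arithmetic in the note can be sketched as a small helper. This
    is only an illustration: the function and parameter names are
    hypothetical, and the real limit may be lower still if the AFU
    itself supports fewer contexts.

```c
#include <assert.h>

/*
 * Upper bound on the number of contexts that fit in an IRQ budget.
 * Hypothetical helper: none of these names come from the cxl driver.
 */
static unsigned int max_contexts(unsigned int total_irqs,
                                 unsigned int kernel_irqs,
                                 unsigned int irqs_per_context)
{
        assert(irqs_per_context > 0);
        assert(kernel_irqs <= total_irqs);

        /* Integer division rounds down: a partial context is useless. */
        return (total_irqs - kernel_irqs) / irqs_per_context;
}
```

    With the POWER8 figures from the note, max_contexts(2040, 3, 4)
    evaluates to 509 and max_contexts(2040, 3, 1) to 2037.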


ioctl
-----

    CXL_IOCTL_START_WORK:
        Starts the AFU context and associates it with the current
        process. Once this ioctl is successfully executed, all memory
        mapped into this process is accessible to this AFU context
        using the same effective addresses. No additional calls are
        required to map/unmap memory. The AFU memory context will be
        updated as userspace allocates and frees memory. This ioctl
        returns once the AFU context is started.

        Takes a pointer to a struct cxl_ioctl_start_work

            ::

                struct cxl_ioctl_start_work {
                        __u64 flags;
                        __u64 work_element_descriptor;
                        __u64 amr;
                        __s16 num_interrupts;
                        __s16 reserved1;
                        __s32 reserved2;
                        __u64 reserved3;
                        __u64 reserved4;
                        __u64 reserved5;
                        __u64 reserved6;
                };

            flags:
                Indicates which optional fields in the structure are
                valid.

            work_element_descriptor:
                The Work Element Descriptor (WED) is a 64-bit argument
                defined by the AFU. Typically this is an effective
                address pointing to an AFU specific structure
                describing what work to perform.

            amr:
                Authority Mask Register (AMR), same as the powerpc
                AMR. This field is only used by the kernel when the
                corresponding CXL_START_WORK_AMR value is specified in
                flags. If not specified the kernel will use a default
                value of 0.

            num_interrupts:
                Number of userspace interrupts to request. This field
                is only used by the kernel when the corresponding
                CXL_START_WORK_NUM_IRQS value is specified in flags.
                If not specified the minimum number required by the
                AFU will be allocated. The min and max number can be
                obtained from sysfs.

            reserved fields:
                For ABI padding and future extensions.

    CXL_IOCTL_GET_PROCESS_ELEMENT:
        Get the current context id, also known as the process element.
        The value is returned from the kernel as a __u32.
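
    As a rough illustration of how flags gates the optional
    START_WORK fields, the sketch below fills a local mirror of
    struct cxl_ioctl_start_work. The mirror struct, the SKETCH_*
    constants and prepare_start_work() are hypothetical names, and
    the numeric flag values are assumptions; real code should take
    the struct and the CXL_START_WORK_* values from
    include/uapi/misc/cxl.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_ioctl_start_work, for illustration only. */
struct start_work_sketch {
        uint64_t flags;
        uint64_t work_element_descriptor;
        uint64_t amr;
        int16_t  num_interrupts;
        int16_t  reserved1;
        int32_t  reserved2;
        uint64_t reserved3;
        uint64_t reserved4;
        uint64_t reserved5;
        uint64_t reserved6;
};

/* ASSUMED values; use CXL_START_WORK_* from the uapi header instead. */
#define SKETCH_START_WORK_AMR      0x1ULL
#define SKETCH_START_WORK_NUM_IRQS 0x2ULL

/*
 * Zero the request, set the WED, and only mark num_interrupts as valid
 * when the caller asks for a specific count (num_irqs >= 0). Leaving the
 * flag clear lets the kernel pick the AFU's minimum, as described above.
 */
static void prepare_start_work(struct start_work_sketch *w,
                               uint64_t wed, int num_irqs)
{
        assert(w != NULL);
        assert(num_irqs <= INT16_MAX);

        memset(w, 0, sizeof(*w));        /* reserved fields stay zero */
        w->work_element_descriptor = wed;

        if (num_irqs >= 0) {
                w->flags |= SKETCH_START_WORK_NUM_IRQS;
                w->num_interrupts = (int16_t)num_irqs;
        }
}
```

    With the real header, the populated structure would then be
    passed by pointer to ioctl() with CXL_IOCTL_START_WORK on the
    file descriptor returned by open.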


mmap
----

    An AFU may have an MMIO space to facilitate communication with the
    AFU. If it does, the MMIO space can be accessed via mmap. The size
    and contents of this area are specific to the particular AFU. The
    size can be discovered via sysfs.

    In AFU directed mode, master contexts are allowed to map all of
    the MMIO space and slave contexts are allowed to only map the
    per-process MMIO space associated with the context. In dedicated
    process mode the entire MMIO space can always be mapped.

    This mmap call must be done after the START_WORK ioctl.

    Care should be taken when accessing MMIO space. Only 32-bit and
    64-bit accesses are supported by POWER8. Also, the AFU will be
    designed with a specific endianness, so all MMIO accesses should
    take endianness into account (the endian(3) helpers such as
    le64toh() and be64toh() are recommended). These endian issues
    apply equally to shared memory queues the WED may describe.
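
    A minimal sketch of endian-aware 64-bit accessors follows. It
    assumes an AFU whose registers are little-endian, which is purely
    an assumption for the example since each AFU defines its own
    endianness; the helper names are hypothetical. The pointer would
    normally point into the region returned by mmap.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE        /* exposes le64toh()/htole64() on glibc */
#endif
#include <assert.h>
#include <endian.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 64-bit register of a (hypothetically) little-endian AFU. */
static uint64_t mmio_read64_le(const volatile uint64_t *reg)
{
        uint64_t raw;

        assert(reg != NULL);
        raw = *reg;              /* single 64-bit access, as POWER8 requires */
        return le64toh(raw);     /* convert to host byte order */
}

/* Write one 64-bit register of a (hypothetically) little-endian AFU. */
static void mmio_write64_le(volatile uint64_t *reg, uint64_t value)
{
        assert(reg != NULL);
        *reg = htole64(value);   /* convert from host byte order */
}
```

    For a big-endian AFU the same pattern applies with be64toh() and
    htobe64(). The conversions are no-ops when host and AFU agree, so
    the accessors can be used unconditionally.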


read
----

    Reads events from the AFU. Blocks if no events are pending
    (unless O_NONBLOCK is supplied). Returns -EIO in the case of an
    unrecoverable error or if the card is removed.

    read() will always return an integral number of events.

    The buffer passed to read() must be at least 4K bytes.

    The result of the read will be a buffer of one or more events.
    Each event is of type struct cxl_event and varies in size::

            struct cxl_event {
                    struct cxl_event_header header;
                    union {
                            struct cxl_event_afu_interrupt irq;
                            struct cxl_event_data_storage fault;
                            struct cxl_event_afu_error afu_error;
                    };
            };

    The struct cxl_event_header is defined as

        ::

            struct cxl_event_header {
                    __u16 type;
                    __u16 size;
                    __u16 process_element;
                    __u16 reserved1;
            };

        type:
            This defines the type of event. The type determines how
            the rest of the event is structured. These types are
            described below and defined by enum cxl_event_type.

        size:
            This is the size of the event in bytes including the
            struct cxl_event_header. The start of the next event can
            be found at this offset from the start of the current
            event.

        process_element:
            Context ID of the event.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_AFU_INTERRUPT then the event
    structure is defined as

        ::

            struct cxl_event_afu_interrupt {
                    __u16 flags;
                    __u16 irq; /* Raised AFU interrupt number */
                    __u32 reserved1;
            };

        flags:
            These flags indicate which optional fields are present
            in this struct. Currently all fields are mandatory.

        irq:
            The IRQ number sent by the AFU.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_DATA_STORAGE then the event
    structure is defined as

        ::

            struct cxl_event_data_storage {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 addr;
                    __u64 dsisr;
                    __u64 reserved3;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        addr:
            The address that the AFU unsuccessfully attempted to
            access. Valid accesses will be handled transparently by the
            kernel but invalid accesses will generate this event.

        dsisr:
            This field gives information on the type of fault. It is a
            copy of the DSISR from the PSL hardware when the address
            fault occurred. The form of the DSISR is as defined in the
            CAIA.

        reserved fields:
            For future extensions.

    If the event type is CXL_EVENT_AFU_ERROR then the event structure
    is defined as

        ::

            struct cxl_event_afu_error {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 error;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        error:
            Error status from the AFU. Defined by the AFU.

        reserved fields:
            For future extensions and padding.
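
    Because events vary in size, a buffer returned by read() has to
    be walked using the size field of each header. The sketch below
    shows one way to do that, with bounds checks added as a
    precaution. The mirror struct, the callback type and
    walk_events() are hypothetical names; real code would use struct
    cxl_event_header from include/uapi/misc/cxl.h and dispatch on
    enum cxl_event_type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header, for illustration only. */
struct event_header_sketch {
        uint16_t type;
        uint16_t size;
        uint16_t process_element;
        uint16_t reserved1;
};

/* Hypothetical per-event callback: header plus pointer to its payload. */
typedef void (*event_cb)(const struct event_header_sketch *hdr,
                         const uint8_t *payload, void *ctx);

/*
 * Visit every event in buf[0..len). Returns the number of events seen,
 * or -1 if a header is truncated or carries an impossible size.
 */
static int walk_events(const uint8_t *buf, size_t len, event_cb cb, void *ctx)
{
        size_t off = 0;
        int count = 0;

        assert(buf != NULL || len == 0);

        while (off < len) {
                struct event_header_sketch hdr;

                if (len - off < sizeof(hdr))
                        return -1;
                /* memcpy avoids alignment assumptions about buf */
                memcpy(&hdr, buf + off, sizeof(hdr));

                /* size includes the header and must fit in the buffer */
                if (hdr.size < sizeof(hdr) || hdr.size > len - off)
                        return -1;

                if (cb)
                        cb(&hdr, buf + off + sizeof(hdr), ctx);

                off += hdr.size;   /* next event starts size bytes on */
                count++;
        }
        return count;
}
```

    A caller would pass the buffer and the byte count returned by
    read(). Rejecting a size smaller than the header also guards
    against an endless loop on a zeroed buffer.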


2. Card character device (PowerVM guest only)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    In a PowerVM guest, an extra character device is created for the
    card. The device is only used to write (flash) a new image on the
    FPGA accelerator. Once the image is written and verified, the
    device tree is updated and the card is reset to reload the updated
    image.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API. The device can only be opened once.

ioctl
-----

CXL_IOCTL_DOWNLOAD_IMAGE / CXL_IOCTL_VALIDATE_IMAGE:
    Starts and controls flashing a new FPGA image. Partial
    reconfiguration is not supported (yet), so the image must contain
    a copy of the PSL and AFU(s). Since an image can be quite large,
    the caller may have to iterate, splitting the image into smaller
    chunks.

    Takes a pointer to a struct cxl_adapter_image::

        struct cxl_adapter_image {
            __u64 flags;
            __u64 data;
            __u64 len_data;
            __u64 len_image;
            __u64 reserved1;
            __u64 reserved2;
            __u64 reserved3;
            __u64 reserved4;
        };

    flags:
        These flags indicate which optional fields are present in
        this struct. Currently all fields are mandatory.

    data:
        Pointer to a buffer with part of the image to write to the
        card.

    len_data:
        Size of the buffer pointed to by data.

    len_image:
        Full size of the image.
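
    The chunked iteration mentioned above could be organised as in
    the following sketch, which only fills in the data, len_data and
    len_image fields for successive chunks. The mirror struct,
    describe_chunk() and the chunk size are hypothetical. This text
    does not say which ioctl is issued for each chunk or what the
    maximum chunk size is, so the sketch stops short of the ioctl
    call itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_adapter_image, for illustration only. */
struct adapter_image_sketch {
        uint64_t flags;
        uint64_t data;
        uint64_t len_data;
        uint64_t len_image;
        uint64_t reserved1;
        uint64_t reserved2;
        uint64_t reserved3;
        uint64_t reserved4;
};

/*
 * Describe the chunk of `image` that starts at `offset` and is at most
 * `max_chunk` bytes long. Returns the offset of the following chunk,
 * which equals image_len once the whole image has been described.
 */
static size_t describe_chunk(struct adapter_image_sketch *img,
                             const uint8_t *image, size_t image_len,
                             size_t offset, size_t max_chunk)
{
        size_t n;

        assert(img != NULL && image != NULL);
        assert(max_chunk > 0 && offset < image_len);

        n = image_len - offset;
        if (n > max_chunk)
                n = max_chunk;

        memset(img, 0, sizeof(*img));               /* flags/reserved = 0 */
        img->data = (uint64_t)(uintptr_t)(image + offset);
        img->len_data = n;                          /* size of this chunk */
        img->len_image = image_len;                 /* size of whole image */
        return offset + n;
}
```

    A caller would loop while the returned offset is smaller than
    the image length, handing each populated structure to the
    appropriate ioctl on the card device.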


Sysfs Class
===========

    A cxl sysfs class is added under /sys/class/cxl to facilitate
    enumeration and tuning of the accelerators. Its layout is
    described in Documentation/ABI/testing/sysfs-class-cxl


Udev rules
==========

    The following udev rules could be used to create a symlink to the
    most logical chardev to use in any programming mode (afuX.Yd for
    dedicated, afuX.Ys for AFU directed), since the API is virtually
    identical for each::

        SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
        SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
                          KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"