cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

****************************
RDMA Transport (RTRS)
****************************

RTRS (RDMA Transport) is a reliable high-speed transport library
that establishes an optimal number of connections between client and
server machines over an RDMA (InfiniBand, RoCE, iWARP) transport. It is
optimized for transferring (reading/writing) IO blocks.

Its core interface follows BIO semantics: a caller can either write data
from an sg list to the remote side or request ("read") a data transfer
from the remote side into a given sg list.

RTRS provides I/O fail-over and load-balancing capabilities by using
multipath I/O (see "add_path" and "mp_policy" configuration entries in
Documentation/ABI/testing/sysfs-class-rtrs-client).

RTRS is used by the RNBD (RDMA Network Block Device) modules.

==================
Transport protocol
==================

Overview
--------
An established connection between a client and a server is called an RTRS
session. A session is associated with a set of memory chunks reserved on the
server side for a given client for RDMA transfers. A session
consists of multiple paths, each representing a separate physical link
between client and server. Paths are used for load balancing and failover.
Each path consists of as many connections (QPs) as there are CPUs on
the client.
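
The session/path/connection hierarchy described above can be pictured with a
small data model. This is only an illustrative sketch: the class names, the
"link" labels and build_session() are hypothetical and are not taken from the
driver.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Connection:
    cpu: int  # client CPU this connection (QP) belongs to


@dataclass
class Path:
    link: str  # label for the physical link (hypothetical)
    connections: List[Connection] = field(default_factory=list)


@dataclass
class Session:
    name: str
    paths: List[Path] = field(default_factory=list)


def build_session(name: str, links: List[str], client_cpus: int) -> Session:
    """Create one path per link, each with one connection per client CPU."""
    paths = [
        Path(link, [Connection(cpu) for cpu in range(client_cpus)])
        for link in links
    ]
    return Session(name, paths)


session = build_session("demo", ["link0", "link1"], client_cpus=4)
total_connections = sum(len(p.connections) for p in session.paths)
print(total_connections)  # 2 paths * 4 CPUs = 8
```

With two links and four client CPUs the session ends up with eight
connections in total.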

When processing an incoming write or read request, the RTRS client uses the
memory chunks reserved for it on the server side. Their number, size and
addresses need to be exchanged between client and server during the connection
establishment phase. Apart from the memory-related information, the client
needs to inform the server about the session name and identify each path and
connection individually.

On an established session the client sends write or read messages to the
server. The server uses the immediate field to tell the client which request
is being acknowledged and to carry the errno. The client uses the immediate
field to tell the server which of the memory chunks has been accessed and at
which offset the message can be found.
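
One way to picture the two uses of the 32-bit immediate field is as simple
bit packing. The sketch below is an assumption-laden illustration: the bit
widths (CHUNK_BITS, ERRNO_BITS) and the function names are invented for the
example and do not describe the driver's actual encoding.

```python
CHUNK_BITS = 17               # assumed: offsets inside a chunk fit in 17 bits
ERRNO_BITS = 9                # assumed width for the absolute errno value
ID_BITS = 32 - ERRNO_BITS     # remaining bits carry the request id


def pack_req_imm(chunk_id: int, offset: int) -> int:
    """Client -> server: which chunk was written and where the message is."""
    if not 0 <= offset < (1 << CHUNK_BITS):
        raise ValueError("offset does not fit")
    if not 0 <= chunk_id < (1 << (32 - CHUNK_BITS)):
        raise ValueError("chunk id does not fit")
    return (chunk_id << CHUNK_BITS) | offset


def unpack_req_imm(imm: int):
    return imm >> CHUNK_BITS, imm & ((1 << CHUNK_BITS) - 1)


def pack_rsp_imm(req_id: int, errno_val: int) -> int:
    """Server -> client: which request is acknowledged, plus its errno."""
    err = abs(errno_val)
    if err >= (1 << ERRNO_BITS) or not 0 <= req_id < (1 << ID_BITS):
        raise ValueError("value does not fit")
    return (err << ID_BITS) | req_id


def unpack_rsp_imm(imm: int):
    return imm & ((1 << ID_BITS) - 1), -(imm >> ID_BITS)


print(unpack_req_imm(pack_req_imm(5, 4096)))   # (5, 4096)
print(unpack_rsp_imm(pack_rsp_imm(42, -5)))    # (42, -5)
```

Both directions round-trip, which is the only property the sketch is meant to
show.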

The module parameter always_invalidate was introduced to address the security
problem discussed at the LPC RDMA MC 2019. When always_invalidate=Y, the server
invalidates each RDMA buffer before handing it over to the RNBD server, which
then passes it to the block layer. A new rkey is generated and registered for
the buffer after it returns from the block layer and the RNBD server.
The new rkey is sent back to the client along with the IO result.
This procedure is the default behaviour of the driver. The invalidation and
registration on each IO cause a performance drop of up to 20%. A user of the
driver may choose to load the modules with this mechanism switched off
(always_invalidate=N) if they understand and accept the risk of a malicious
client being able to corrupt the memory of a server it is connected to. This
might be a reasonable option in a scenario where all the clients and all the
servers are located within a secure datacenter.
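
The per-IO buffer lifecycle described above can be modelled roughly as
follows. This is a toy simulation, not driver code: ServerBuffer, serve_io()
and the counter used to generate rkeys are hypothetical stand-ins for the
real memory registration machinery.

```python
import itertools


class ServerBuffer:
    """Toy model of one server-side RDMA buffer and its rkey."""

    _next_rkey = itertools.count(1)   # hypothetical rkey generator

    def __init__(self):
        self.rkey = next(self._next_rkey)
        self.valid = True             # whether remote access is permitted

    def invalidate(self):
        self.valid = False

    def reregister(self) -> int:
        self.rkey = next(self._next_rkey)
        self.valid = True
        return self.rkey


def serve_io(buf, always_invalidate, handle_io):
    """Run one IO; return (io_result, new_rkey), new_rkey is None if unused."""
    if always_invalidate:
        buf.invalidate()              # old rkey is unusable while the IO runs
    result = handle_io(buf)           # stands in for RNBD server + block layer
    new_rkey = buf.reregister() if always_invalidate else None
    return result, new_rkey


buf = ServerBuffer()
old_rkey = buf.rkey
result, new_rkey = serve_io(buf, True, lambda b: 0)
print(new_rkey != old_rkey)  # True: the client must use the new rkey next time
```

With always_invalidate switched off, serve_io() leaves the rkey untouched,
which is what saves the invalidation and registration cost on each IO.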


Connection establishment
------------------------

1. The client establishes the connections belonging to a path of a session one
by one by attaching RTRS_MSG_CON_REQ messages to the rdma_connect requests.
These include the uuid of the session and the uuid of the path to be
established. The server uses them to find an existing session/path or
to create a new one when necessary. The message also contains the protocol
version and magic for compatibility, the total number of connections per
session (as many as there are CPUs on the client), the id of the current
connection and the reconnect counter, which is used to resolve situations
where the client is trying to reconnect a path while the server is still
destroying the old one.

2. The server accepts the connection requests one by one and attaches
RTRS_MSG_CON_RSP messages to the rdma_accept. Apart from the magic and
protocol version, the messages include an error code, the queue depth
supported by the server (the number of memory chunks which are going to be
allocated for that session) and the maximum size of one IO. The
RTRS_MSG_NEW_RKEY_F flag is set when always_invalidate=Y.

3. After all connections of a path are established, the client sends the
RTRS_MSG_INFO_REQ message, containing the name of the session, to the server.
This message requests the address information from the server.

4. The server replies to the session info request message with
RTRS_MSG_INFO_RSP, which contains the addresses and keys of the RDMA buffers
allocated for that session.

5. The session becomes connected after all paths to be established are
connected (i.e. steps 1-4 have finished for all paths requested for a session).

6. Server and client periodically exchange heartbeat messages (empty RDMA
messages with an immediate field), which are used to detect a crash on the
remote side or a network outage in the absence of IO.

7. On any RDMA-related error or in the case of a heartbeat timeout, the
corresponding path is disconnected, all inflight IOs are failed over to a
healthy path, if any, and the reconnect mechanism is triggered.
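
Steps 6 and 7 can be sketched as a periodic check over all paths. The model
below is hypothetical throughout (the Path fields, check_heartbeats() and the
timeout handling are assumptions made for illustration); it only shows the
idea of disconnecting a silent path and moving its inflight IOs to a healthy
one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Path:
    name: str
    last_heartbeat: float                 # time the last heartbeat arrived
    connected: bool = True
    inflight: List[int] = field(default_factory=list)   # request ids


def pick_healthy(paths: List[Path]) -> Optional[Path]:
    """Return the first connected path, or None if there is none."""
    for path in paths:
        if path.connected:
            return path
    return None


def check_heartbeats(paths: List[Path], now: float, timeout: float):
    """Disconnect timed-out paths and fail their inflight IOs over.

    Returns (paths_to_reconnect, ids_of_ios_with_no_healthy_path).
    """
    to_reconnect = []
    for path in paths:
        if path.connected and now - path.last_heartbeat > timeout:
            path.connected = False
            to_reconnect.append(path)

    failed = []
    for path in to_reconnect:
        pending, path.inflight = path.inflight, []
        target = pick_healthy(paths)
        if target is None:
            failed.extend(pending)        # nothing left to fail over to
        else:
            target.inflight.extend(pending)
    return to_reconnect, failed


a = Path("a", last_heartbeat=0.0, inflight=[1, 2])
b = Path("b", last_heartbeat=9.0, inflight=[3])
reconnect, failed = check_heartbeats([a, b], now=10.0, timeout=5.0)
print([p.name for p in reconnect], b.inflight, failed)
```

The caller would then hand the returned paths to whatever reconnect mechanism
is in place.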

CLT                                     SRV
*for each connection belonging to a path and for each path:
RTRS_MSG_CON_REQ  ------------------->
                   <------------------- RTRS_MSG_CON_RSP
...
*after all connections are established:
RTRS_MSG_INFO_REQ ------------------->
                   <------------------- RTRS_MSG_INFO_RSP
*heartbeat is started from both sides:
                   -------------------> [RTRS_HB_MSG_IMM]
[RTRS_HB_MSG_ACK] <-------------------
[RTRS_HB_MSG_IMM] <-------------------
                   -------------------> [RTRS_HB_MSG_ACK]
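
The message order in the diagram can be written down as a small generator.
This is a sketch only: handshake_messages() is a hypothetical helper, and
establishing the paths strictly one after another is an assumption made to
keep the example short.

```python
def handshake_messages(num_paths: int, client_cpus: int):
    """Yield (sender, message, path, connection) in the order shown above."""
    for path in range(num_paths):
        # One request/response pair per connection of the path.
        for con in range(client_cpus):
            yield ("CLT", "RTRS_MSG_CON_REQ", path, con)
            yield ("SRV", "RTRS_MSG_CON_RSP", path, con)
        # Once all connections of the path exist, exchange session info.
        yield ("CLT", "RTRS_MSG_INFO_REQ", path, None)
        yield ("SRV", "RTRS_MSG_INFO_RSP", path, None)


for sender, message, path, con in handshake_messages(num_paths=1, client_cpus=2):
    print(sender, message, path, con)
```

For one path and two client CPUs this prints two CON_REQ/CON_RSP pairs
followed by a single INFO_REQ/INFO_RSP pair.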

IO path
-------

* Write (always_invalidate=N) *

1. When processing a write request, the client selects one of the memory chunks
on the server side and RDMA-writes the user data, user header and the
RTRS_MSG_RDMA_WRITE message there. Apart from the type (write), the message
only contains the size of the user header. The client uses the IMM field to
tell the server which chunk has been accessed and at what offset the
RTRS_MSG_RDMA_WRITE message can be found.

2. When confirming a write request, the server sends an "empty" RDMA message
with an immediate field. The 32-bit field is used to specify the outstanding
inflight IO and the error code.

CLT                                                          SRV
usr_data + usr_hdr + rtrs_msg_rdma_write -----------------> [RTRS_IO_REQ_IMM]
[RTRS_IO_RSP_IMM]                        <----------------- (id + errno)
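
The write path can be illustrated with byte buffers standing in for the
server's memory chunks. Everything concrete here is assumed for the example:
the chunk size, the message layout (MSG_FMT), the numeric message type and
the placement of data, header and message back to back inside the chunk.

```python
import struct

CHUNK_BITS = 12                  # assumed chunk size of 4 KiB
CHUNK_SIZE = 1 << CHUNK_BITS
MSG_FMT = "<HH"                  # assumed layout: type, size of user header
MSG_TYPE_WRITE = 1               # hypothetical numeric type


def client_write(chunks, chunk_id: int, usr_data: bytes, usr_hdr: bytes) -> int:
    """Place data, header and message in a chunk; return the immediate."""
    msg = struct.pack(MSG_FMT, MSG_TYPE_WRITE, len(usr_hdr))
    payload = usr_data + usr_hdr + msg
    if len(payload) > CHUNK_SIZE:
        raise ValueError("request does not fit into one chunk")
    chunks[chunk_id][:len(payload)] = payload   # stands in for the RDMA write
    msg_off = len(usr_data) + len(usr_hdr)
    return (chunk_id << CHUNK_BITS) | msg_off


def server_handle_write(chunks, imm: int):
    """Locate the message via the immediate, then recover header and data."""
    chunk_id, msg_off = imm >> CHUNK_BITS, imm & (CHUNK_SIZE - 1)
    chunk = chunks[chunk_id]
    msg_type, usr_len = struct.unpack_from(MSG_FMT, chunk, msg_off)
    usr_hdr = bytes(chunk[msg_off - usr_len:msg_off])
    usr_data = bytes(chunk[:msg_off - usr_len])
    return msg_type, usr_hdr, usr_data


chunks = [bytearray(CHUNK_SIZE) for _ in range(4)]
imm = client_write(chunks, 2, b"block-data", b"hdr")
print(server_handle_write(chunks, imm))  # (1, b'hdr', b'block-data')
```

The server needs nothing but the immediate value to find the message, and the
message in turn tells it where the user header ends and the data begins.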

* Write (always_invalidate=Y) *

1. When processing a write request, the client selects one of the memory chunks
on the server side and RDMA-writes the user data, user header and the
RTRS_MSG_RDMA_WRITE message there. Apart from the type (write), the message
only contains the size of the user header. The client uses the IMM field to
tell the server which chunk has been accessed and at what offset the
RTRS_MSG_RDMA_WRITE message can be found. The server first invalidates the
rkey associated with the memory chunk and, once that has finished, passes the
IO to the RNBD server module.

2. When confirming a write request, the server sends an "empty" RDMA message
with an immediate field. The 32-bit field is used to specify the outstanding
inflight IO and the error code. The new rkey is sent back using a
SEND_WITH_IMM WR. When the client receives the new rkey message, it validates
the message, updates the rkey for the rbuffer, finishes the IO and then posts
the recv buffer back for later use.

CLT                                                          SRV
usr_data + usr_hdr + rtrs_msg_rdma_write -----------------> [RTRS_IO_REQ_IMM]
[RTRS_MSG_RKEY_RSP]                      <----------------- (RTRS_MSG_RKEY_RSP)
[RTRS_IO_RSP_IMM]                        <----------------- (id + errno)
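
The client-side handling of the new rkey message (validate, update the rkey,
finish the IO, post the recv buffer back) might look roughly like this. The
RkeyRsp fields, the Client class and the way the request id and errno reach
the handler are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class RkeyRsp:
    buf_id: int   # which server-side buffer the key belongs to (assumed field)
    rkey: int     # the freshly registered rkey


class Client:
    def __init__(self, rkeys, recv_posted: int):
        self.rkeys = list(rkeys)        # current rkey per server-side buffer
        self.recv_posted = recv_posted  # recv buffers currently posted
        self.completed = []             # (req_id, errno) of finished IOs

    def on_rkey_rsp(self, msg: RkeyRsp, req_id: int, errno_val: int) -> bool:
        self.recv_posted -= 1           # the message consumed one recv buffer
        valid = 0 <= msg.buf_id < len(self.rkeys)
        if valid:
            self.rkeys[msg.buf_id] = msg.rkey             # update first ...
            self.completed.append((req_id, errno_val))    # ... then finish IO
        self.recv_posted += 1           # post the recv buffer back
        return valid


client = Client(rkeys=[10, 11], recv_posted=2)
client.on_rkey_rsp(RkeyRsp(buf_id=1, rkey=99), req_id=7, errno_val=0)
print(client.rkeys, client.completed)  # [10, 99] [(7, 0)]
```

Updating the rkey before completing the IO means that a request issued from
the completion callback already sees the new key.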


* Read (always_invalidate=N) *

1. When processing a read request, the client selects one of the memory chunks
on the server side and RDMA-writes the user header and the
RTRS_MSG_RDMA_READ message there. This message contains the type (read), the
size of the user header, flags (specifying whether memory invalidation is
necessary) and the list of addresses, along with keys, for the data to be read
into.

2. When confirming a read request, the server transfers the requested data
first, attaches an invalidation message if requested and finally sends an
"empty" RDMA message with an immediate field. The 32-bit field is used to
specify the outstanding inflight IO and the error code.

CLT                                           SRV
usr_hdr + rtrs_msg_rdma_read --------------> [RTRS_IO_REQ_IMM]
[RTRS_IO_RSP_IMM]            <-------------- usr_data + (id + errno)
or in case client requested invalidation:
[RTRS_IO_RSP_IMM_W_INV]      <-------------- usr_data + (INV) + (id + errno)
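
The read path can be sketched as the server scattering data into the buffers
the client advertised. The field names, the NEED_INVAL flag bit and the use
of a dictionary keyed by rkey to stand in for registered client memory are
assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

NEED_INVAL = 0x1   # hypothetical flag bit: client asks for invalidation


@dataclass
class Desc:
    addr: int      # here: offset inside the region selected by key
    key: int
    length: int


@dataclass
class ReadMsg:
    usr_len: int   # size of the user header
    flags: int
    descs: List[Desc]


def server_handle_read(msg: ReadMsg, data: bytes,
                       client_mem: Dict[int, bytearray]) -> str:
    """Write data into the client's buffers; return the response type."""
    pos = 0
    for d in msg.descs:
        region = client_mem[d.key]
        part = data[pos:pos + d.length]
        if d.addr + len(part) > len(region):
            raise ValueError("descriptor exceeds registered region")
        region[d.addr:d.addr + len(part)] = part   # stands in for RDMA write
        pos += len(part)
    if msg.flags & NEED_INVAL:
        return "RTRS_IO_RSP_IMM_W_INV"
    return "RTRS_IO_RSP_IMM"


mem = {1: bytearray(8), 2: bytearray(8)}
msg = ReadMsg(usr_len=0, flags=0, descs=[Desc(0, 1, 4), Desc(2, 2, 4)])
print(server_handle_read(msg, b"abcdefgh", mem), bytes(mem[1]), bytes(mem[2]))
```

The response type depends only on the flag the client set in the request,
mirroring the two alternatives in the diagram above.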

* Read (always_invalidate=Y) *

1. When processing a read request, the client selects one of the memory chunks
on the server side and RDMA-writes the user header and the
RTRS_MSG_RDMA_READ message there. This message contains the type (read), the
size of the user header, flags (specifying whether memory invalidation is
necessary) and the list of addresses, along with keys, for the data to be read
into. The server first invalidates the rkey associated with the memory chunk
and, once that has finished, passes the IO to the RNBD server module.

2. When confirming a read request, the server transfers the requested data
first, attaches an invalidation message if requested and finally sends an
"empty" RDMA message with an immediate field. The 32-bit field is used to
specify the outstanding inflight IO and the error code. The new rkey is sent
back using a SEND_WITH_IMM WR. When the client receives the new rkey message,
it validates the message, updates the rkey for the rbuffer, finishes the IO
and then posts the recv buffer back for later use.

CLT                                           SRV
usr_hdr + rtrs_msg_rdma_read --------------> [RTRS_IO_REQ_IMM]
[RTRS_IO_RSP_IMM]            <-------------- usr_data + (id + errno)
[RTRS_MSG_RKEY_RSP]          <-------------- (RTRS_MSG_RKEY_RSP)
or in case client requested invalidation:
[RTRS_IO_RSP_IMM_W_INV]      <-------------- usr_data + (INV) + (id + errno)

=========================================
Contributors List (in alphabetical order)
=========================================
Danil Kipnis <danil.kipnis@profitbricks.com>
Fabian Holler <mail@fholler.de>
Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Jack Wang <jinpu.wang@profitbricks.com>
Kleber Souza <kleber.souza@profitbricks.com>
Lutz Pogrell <lutz.pogrell@cloud.ionos.com>
Milind Dumbare <Milind.dumbare@gmail.com>
Roman Penyaev <roman.penyaev@profitbricks.com>