========================
MMC Asynchronous Request
========================

Rationale
=========

How significant is the cache maintenance overhead?

It depends. Fast eMMC and multiple cache levels with speculative cache
pre-fetch make the cache overhead relatively significant. If the DMA
preparations for the next request are done in parallel with the current
transfer, the DMA preparation overhead does not affect MMC performance.

The intention of non-blocking (asynchronous) MMC requests is to minimize the
time between when an MMC request ends and another MMC request begins.

Using mmc_wait_for_req(), the MMC controller is idle while dma_map_sg() and
dma_unmap_sg() are processing. Using non-blocking MMC requests makes it
possible to prepare the caches for the next job in parallel with an active
MMC request.
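The effect of this overlap can be estimated with a small user-space model.
The sketch below is not kernel code: ``struct req_cost`` and both functions
are hypothetical names, and it assumes every request has the same cost and
that the CPU cleans up the previous request and prepares the next one while
the current transfer is running.

.. code-block:: c

  #include <assert.h>

  /* All times are in arbitrary units (for example microseconds). */
  struct req_cost {
      unsigned long prep;  /* stands for dma_map_sg() and descriptor setup */
      unsigned long xfer;  /* time the MMC controller spends transferring */
      unsigned long post;  /* stands for dma_unmap_sg() */
  };

  static unsigned long max_ul(unsigned long a, unsigned long b)
  {
      return a > b ? a : b;
  }

  /* Blocking model: every step of every request runs back to back. */
  unsigned long blocking_total(struct req_cost c, unsigned int n)
  {
      return (unsigned long)n * (c.prep + c.xfer + c.post);
  }

  /*
   * Non-blocking model: while request i is transferring, the CPU cleans up
   * request i - 1 and prepares request i + 1. Each step costs whichever is
   * longer, the transfer or the CPU work done in parallel with it.
   */
  unsigned long pipelined_total(struct req_cost c, unsigned int n)
  {
      unsigned long total = c.prep;  /* the first prepare cannot overlap */
      unsigned int i;

      assert(n > 0);  /* assumption: a run has at least one request */

      for (i = 0; i < n; i++) {
          unsigned long cpu = 0;

          if (i > 0)
              cpu += c.post;  /* clean up request i - 1 */
          if (i + 1 < n)
              cpu += c.prep;  /* prepare request i + 1 */
          total += max_ul(c.xfer, cpu);
      }
      return total + c.post;  /* the last clean-up cannot overlap */
  }

With ``prep = 5``, ``xfer = 100``, ``post = 5`` and ten requests, the
blocking model gives 1100 time units and the pipelined model 1010. These
numbers are illustrative only and are not measurements.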

MMC block driver
================

The mmc_blk_issue_rw_rq() function in the MMC block driver is made
non-blocking.

The increase in throughput is proportional to the time it takes to prepare
a request (the major part of the preparation is dma_map_sg() and
dma_unmap_sg()) and to how fast the memory is. The faster the MMC/SD is, the
more significant the request preparation time becomes. Roughly, the expected
performance gain is 5% for large writes and 10% for large reads on an L2 cache
platform. In power save mode, when clocks run at a lower frequency, the DMA
preparation may cost even more. As long as these slower preparations run
in parallel with the transfer, performance won't be affected.

Details on measurements from IOZone and mmc_test
================================================

https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req

MMC core API extension
======================

There is one new public function, mmc_start_req().

It starts a new MMC command request for a host. The function isn't
truly non-blocking. If there is an ongoing async request, it waits
for that request to complete, starts the new one, and returns. It
doesn't wait for the new request to complete. If there is no ongoing
request, it starts the new request and returns immediately.
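The calling behaviour described above can be illustrated with a toy
user-space model. Everything in it is hypothetical: ``struct sketch_req``,
``struct sketch_host`` and ``sketch_start_req()`` are stand-ins invented for
this example, not the kernel's structures or the real signature of
mmc_start_req(), and waiting for the hardware is reduced to setting a flag.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>

  /* Hypothetical stand-ins, not the kernel's structures. */
  struct sketch_req {
      int id;
      int started;
      int done;
  };

  struct sketch_host {
      struct sketch_req *ongoing;  /* request currently in flight, or NULL */
  };

  /* Stand-in for blocking until the hardware signals completion. */
  static void sketch_wait_for_done(struct sketch_req *req)
  {
      req->done = 1;
  }

  /*
   * Finish the ongoing request (if any), start the new one, and return
   * without waiting for the new one. Returns the request that was just
   * completed, or NULL if the host was idle.
   */
  struct sketch_req *sketch_start_req(struct sketch_host *host,
                                      struct sketch_req *new_req)
  {
      struct sketch_req *prev = host->ongoing;

      assert(new_req != NULL);  /* assumption of this sketch only */

      if (prev)
          sketch_wait_for_done(prev);

      new_req->started = 1;
      host->ongoing = new_req;
      return prev;
  }

Calling ``sketch_start_req()`` twice in a row shows the pattern: the first
call returns NULL at once because the host is idle, and the second call
completes the first request before starting the second one.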

MMC host extensions
===================

There are two optional members in mmc_host_ops -- pre_req() and
post_req() -- that the host driver may implement in order to move work
to before and after the actual mmc_host_ops.request() function is called.

In the DMA case, pre_req() may do dma_map_sg() and prepare the DMA
descriptor, and post_req() runs dma_unmap_sg().
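A minimal user-space sketch of how optional hooks of this kind can be wired
up follows. The ops table, the ``hook_*`` names and the ``mapped`` flag are
hypothetical and only mimic the shape of the idea. The real mmc_host_ops
callbacks take different arguments, and real mapping goes through the DMA
API.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>

  struct hook_req {
      int mapped;  /* 1 while the buffers count as "DMA-mapped" */
      int issued;  /* 1 once the request has been handed to the hardware */
  };

  /* Hypothetical ops table; pre_req and post_req may be NULL. */
  struct hook_host_ops {
      void (*pre_req)(struct hook_req *req);
      void (*request)(struct hook_req *req);
      void (*post_req)(struct hook_req *req);
  };

  /* Stand-in for dma_map_sg() plus descriptor setup. */
  void hook_pre_req(struct hook_req *req)
  {
      req->mapped = 1;
  }

  /* Stand-in for dma_unmap_sg(). */
  void hook_post_req(struct hook_req *req)
  {
      req->mapped = 0;
  }

  /*
   * If pre_req() did not run, the mapping has to happen here, on the
   * critical path. A real driver would then also unmap on completion.
   */
  void hook_request(struct hook_req *req)
  {
      if (!req->mapped)
          req->mapped = 1;
      req->issued = 1;
  }

  /* Core-side helpers: call the optional hooks only if the driver set them. */
  void hook_core_prepare(const struct hook_host_ops *ops, struct hook_req *req)
  {
      if (ops->pre_req)
          ops->pre_req(req);
  }

  void hook_core_issue(const struct hook_host_ops *ops, struct hook_req *req)
  {
      assert(ops->request != NULL);  /* request() is mandatory in this sketch */
      ops->request(req);
  }

  void hook_core_finish(const struct hook_host_ops *ops, struct hook_req *req)
  {
      if (ops->post_req)
          ops->post_req(req);
  }

A driver that leaves both hooks NULL still works in this model. It simply
pays the mapping cost inside ``request()``, which is the situation the hooks
are meant to avoid.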

Optimize for the first request
==============================

The first request in a series of requests can't be prepared in parallel
with the previous transfer, since there is no previous request.

The argument is_first_req in pre_req() indicates that there is no previous
request. The host driver may optimize for this scenario to minimize
the performance loss. One way to optimize for this is to split the current
request into two chunks: prepare the first chunk and start the request,
then prepare the second chunk and start its transfer.

Pseudocode to handle is_first_req scenario with minimal prepare overhead::

  if (is_first_req && req->size > threshold) {
     /* start MMC transfer for the complete transfer size */
     mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);

     /*
      * Begin to prepare DMA while cmd is being processed by MMC.
      * The first chunk of the request should take the same time
      * to prepare as the "MMC process command time".
      * If the prepare time exceeds the MMC cmd time, the transfer
      * is delayed; guesstimate max 4k as first chunk size.
      */
     prepare_1st_chunk_for_dma(req);
     /* flush pending desc to the DMAC (dmaengine.h) */
     dma_issue_pending(req->dma_desc);

     prepare_2nd_chunk_for_dma(req);
     /*
      * The second issue_pending should be called before MMC runs out
      * of the first chunk. If the MMC runs out of the first data chunk
      * before this call, the transfer is delayed.
      */
     dma_issue_pending(req->dma_desc);
  }