cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack

netfs_library.rst (20961B)


      1.. SPDX-License-Identifier: GPL-2.0
      2
      3=================================
      4Network Filesystem Helper Library
      5=================================
      6
      7.. Contents:
      8
      9 - Overview.
     10 - Per-inode context.
     11   - Inode context helper functions.
     12 - Buffered read helpers.
     13   - Read helper functions.
     14   - Read helper structures.
     15   - Read helper operations.
     16   - Read helper procedure.
     17   - Read helper cache API.
     18
     19
     20Overview
     21========
     22
     23The network filesystem helper library is a set of functions designed to aid a
     24network filesystem in implementing VM/VFS operations.  For the moment, that
     25just includes turning various VM buffered read operations into requests to read
     26from the server.  The helper library, however, can also interpose other
     27services, such as local caching or local data encryption.
     28
     29Note that the library module doesn't link against local caching directly, so
     30access must be provided by the netfs.
     31
     32
     33Per-Inode Context
     34=================
     35
     36The network filesystem helper library needs a place to store a bit of state for
     37its use on each netfs inode it is helping to manage.  To this end, a context
     38structure is defined::
     39
	struct netfs_inode {
		struct inode inode;
		const struct netfs_request_ops *ops;
		struct fscache_cookie *cache;
	};
     45
     46A network filesystem that wants to use netfs lib must place one of these in its
     47inode wrapper struct instead of the VFS ``struct inode``.  This can be done in
     48a way similar to the following::
     49
	struct my_inode {
		struct netfs_inode netfs; /* Netfslib context and vfs inode */
		...
	};
     54
     55This allows netfslib to find its state by using ``container_of()`` from the
     56inode pointer, thereby allowing the netfslib helper functions to be pointed to
     57directly by the VFS/VM operation tables.
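
As a rough userspace illustration of the embedding (this is not kernel code),
the sketch below uses cut-down stand-ins for ``struct inode`` and ``struct
netfs_inode`` and a hypothetical ``struct my_inode`` with an invented
``server_handle`` field.  The ``sketch_*`` helper names are assumptions made
for this example only.  It shows how a pointer to the embedded VFS inode can
be turned back into the netfs context and the filesystem's own wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; the real ones are much larger. */
struct inode { unsigned long i_ino; };
struct netfs_request_ops;		/* Opaque in this sketch */

struct netfs_inode {
	struct inode inode;		/* Embedded VFS inode */
	const struct netfs_request_ops *ops;
};

/* Hypothetical filesystem wrapper; server_handle is an invented field. */
struct my_inode {
	struct netfs_inode netfs;	/* Netfslib context and vfs inode */
	int server_handle;
};

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the netfs context from the VFS inode pointer. */
static struct netfs_inode *sketch_netfs_inode(struct inode *inode)
{
	assert(inode != NULL);
	return container_of(inode, struct netfs_inode, inode);
}

/* Recover the filesystem's own wrapper from the VFS inode pointer. */
static struct my_inode *sketch_my_inode(struct inode *inode)
{
	return container_of(sketch_netfs_inode(inode), struct my_inode, netfs);
}
```

Because no separate allocation or lookup is involved, the conversion is just
pointer arithmetic, which is what makes it cheap enough to do on every call.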
     58
     59The structure contains the following fields:
     60
     61 * ``inode``
     62
     63   The VFS inode structure.
     64
     65 * ``ops``
     66
     67   The set of operations provided by the network filesystem to netfslib.
     68
     69 * ``cache``
     70
     71   Local caching cookie, or NULL if no caching is enabled.  This field does not
     72   exist if fscache is disabled.
     73
     74
     75Inode Context Helper Functions
     76------------------------------
     77
To help deal with the per-inode context, a number of helper functions are
provided.  Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::
     81
	void netfs_inode_init(struct netfs_inode *ctx,
			      const struct netfs_request_ops *ops);
     84
     85then a function to cast from the VFS inode structure to the netfs context::
     86
	struct netfs_inode *netfs_inode(struct inode *inode);
     88
     89and finally, a function to get the cache cookie pointer from the context
     90attached to an inode (or NULL if fscache is disabled)::
     91
	struct fscache_cookie *netfs_i_cookie(struct netfs_inode *ctx);
     93
     94
     95Buffered Read Helpers
     96=====================
     97
     98The library provides a set of read helpers that handle the ->read_folio(),
     99->readahead() and much of the ->write_begin() VM operations and translate them
    100into a common call framework.
    101
    102The following services are provided:
    103
    104 * Handle folios that span multiple pages.
    105
    106 * Insulate the netfs from VM interface changes.
    107
    108 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
    109   don't match folio sizes or folio alignments and that may cross folios.
    110
    111 * Allow the netfs to expand a readahead request in both directions to meet its
    112   needs.
    113
    114 * Allow the netfs to partially fulfil a read, which will then be resubmitted.
    115
    116 * Handle local caching, allowing cached data and server-read data to be
    117   interleaved for a single request.
    118
 * Handle clearing of bufferage that isn't on the server.
    120
    121 * Handle retrying of reads that failed, switching reads from the cache to the
    122   server as necessary.
    123
 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.
    126
    127From the network filesystem, the helpers require a table of operations.  This
    128includes a mandatory method to issue a read operation along with a number of
    129optional methods.
    130
    131
    132Read Helper Functions
    133---------------------
    134
    135Three read helpers are provided::
    136
	void netfs_readahead(struct readahead_control *ractl);
	int netfs_read_folio(struct file *file,
			     struct folio *folio);
	int netfs_write_begin(struct netfs_inode *ctx,
			      struct file *file,
			      struct address_space *mapping,
			      loff_t pos,
			      unsigned int len,
			      struct folio **_folio,
			      void **_fsdata);
    147
    148Each corresponds to a VM address space operation.  These operations use the
    149state in the per-inode context.
    150
For ->readahead() and ->read_folio(), the network filesystem just points
directly at the corresponding read helper; whereas for ->write_begin(), it may
be a little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.
    156
The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations.  Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.
    160
    161If an error occurs, the ->free_request() will be called to clean up the
    162netfs_io_request struct allocated.  If some parts of the request are in
    163progress when an error occurs, the request will get partially completed if
    164sufficient data is read.
    165
    166Additionally, there is::
    167
	void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
				     ssize_t transferred_or_error,
				     bool was_async);
    171
    172which should be called to complete a read subrequest.  This is given the number
    173of bytes transferred or a negative error code, plus a flag indicating whether
    174the operation was asynchronous (ie. whether the follow-on processing can be
    175done in the current context, given this may involve sleeping).
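
The combined "bytes transferred or negative error" argument can be illustrated
with a small userspace sketch.  ``struct sketch_subreq``, ``enum
sketch_outcome`` and ``sketch_terminated()`` are hypothetical names invented
for this example; they only model how such a value might be interpreted and do
not reproduce what the helper library actually does internally.  The
``was_async`` flag is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Cut-down, hypothetical stand-in for struct netfs_io_subrequest. */
struct sketch_subreq {
	size_t len;		/* Size of the slice */
	size_t transferred;	/* Bytes read so far */
	int error;		/* 0 or a negative errno */
};

enum sketch_outcome { SKETCH_FAILED, SKETCH_SHORT, SKETCH_COMPLETE };

/*
 * Interpret a value that is either a byte count or a negative error code,
 * as described for the second argument of netfs_subreq_terminated().
 */
static enum sketch_outcome sketch_terminated(struct sketch_subreq *subreq,
					     ssize_t transferred_or_error)
{
	if (transferred_or_error < 0) {
		subreq->error = (int)transferred_or_error;
		return SKETCH_FAILED;
	}

	/* A source must not report more data than the slice still needs. */
	assert((size_t)transferred_or_error <= subreq->len - subreq->transferred);

	subreq->transferred += (size_t)transferred_or_error;
	return subreq->transferred < subreq->len ? SKETCH_SHORT : SKETCH_COMPLETE;
}
```

Folding the error into the same argument keeps the completion callback to a
single value, at the cost of the caller having to test the sign first.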
    176
    177
    178Read Helper Structures
    179----------------------
    180
    181The read helpers make use of a couple of structures to maintain the state of
    182the read.  The first is a structure that manages a read request as a whole::
    183
	struct netfs_io_request {
		struct inode		*inode;
		struct address_space	*mapping;
		struct netfs_cache_resources cache_resources;
		void			*netfs_priv;
		loff_t			start;
		size_t			len;
		loff_t			i_size;
		const struct netfs_request_ops *netfs_ops;
		unsigned int		debug_id;
		...
	};
    196
    197The above fields are the ones the netfs can use.  They are:
    198
    199 * ``inode``
    200 * ``mapping``
    201
    202   The inode and the address space of the file being read from.  The mapping
    203   may or may not point to inode->i_data.
    204
    205 * ``cache_resources``
    206
    207   Resources for the local cache to use, if present.
    208
    209 * ``netfs_priv``
    210
    211   The network filesystem's private data.  The value for this can be passed in
    212   to the helper functions or set during the request.
    213
    214 * ``start``
    215 * ``len``
    216
    217   The file position of the start of the read request and the length.  These
    218   may be altered by the ->expand_readahead() op.
    219
    220 * ``i_size``
    221
    222   The size of the file at the start of the request.
    223
    224 * ``netfs_ops``
    225
    226   A pointer to the operation table.  The value for this is passed into the
    227   helper functions.
    228
    229 * ``debug_id``
    230
    231   A number allocated to this operation that can be displayed in trace lines
    232   for reference.
    233
    234
    235The second structure is used to manage individual slices of the overall read
    236request::
    237
	struct netfs_io_subrequest {
		struct netfs_io_request *rreq;
		loff_t			start;
		size_t			len;
		size_t			transferred;
		unsigned long		flags;
		unsigned short		debug_index;
		...
	};
    247
    248Each subrequest is expected to access a single source, though the helpers will
    249handle falling back from one source type to another.  The members are:
    250
    251 * ``rreq``
    252
    253   A pointer to the read request.
    254
    255 * ``start``
    256 * ``len``
    257
    258   The file position of the start of this slice of the read request and the
    259   length.
    260
    261 * ``transferred``
    262
    263   The amount of data transferred so far of the length of this slice.  The
    264   network filesystem or cache should start the operation this far into the
    265   slice.  If a short read occurs, the helpers will call again, having updated
    266   this to reflect the amount read so far.
    267
    268 * ``flags``
    269
    270   Flags pertaining to the read.  There are two of interest to the filesystem
    271   or cache:
    272
    273   * ``NETFS_SREQ_CLEAR_TAIL``
    274
    275     This can be set to indicate that the remainder of the slice, from
    276     transferred to len, should be cleared.
    277
    278   * ``NETFS_SREQ_SEEK_DATA_READ``
    279
    280     This is a hint to the cache that it might want to try skipping ahead to
    281     the next data (ie. using SEEK_DATA).
    282
    283 * ``debug_index``
    284
    285   A number allocated to this slice that can be displayed in trace lines for
    286   reference.
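
The relationship between ``start``, ``len`` and ``transferred`` described above
can be sketched in userspace as follows.  The structure is a cut-down stand-in,
``SKETCH_SREQ_CLEAR_TAIL`` is a hypothetical flag bit (the kernel's actual flag
values are not given in this document) and the ``sketch_*`` functions are
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit standing in for NETFS_SREQ_CLEAR_TAIL. */
#define SKETCH_SREQ_CLEAR_TAIL	(1UL << 0)

struct sketch_subreq {
	long long start;	/* File position of the slice */
	size_t len;		/* Length of the slice */
	size_t transferred;	/* Bytes transferred so far */
	unsigned long flags;
};

/* File position at which the next (re)issued read should begin. */
static long long sketch_resume_pos(const struct sketch_subreq *subreq)
{
	assert(subreq->transferred <= subreq->len);
	return subreq->start + (long long)subreq->transferred;
}

/* Number of bytes of the slice that have not been transferred yet. */
static size_t sketch_remaining(const struct sketch_subreq *subreq)
{
	assert(subreq->transferred <= subreq->len);
	return subreq->len - subreq->transferred;
}

/*
 * If the clear-tail flag is set, zero the slice buffer from transferred to
 * len and treat the slice as complete.  Returns the number of bytes cleared.
 */
static size_t sketch_clear_tail(struct sketch_subreq *subreq, unsigned char *buf)
{
	size_t remaining = sketch_remaining(subreq);

	if (!(subreq->flags & SKETCH_SREQ_CLEAR_TAIL))
		return 0;
	memset(buf + subreq->transferred, 0, remaining);
	subreq->transferred = subreq->len;
	return remaining;
}
```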
    287
    288
    289Read Helper Operations
    290----------------------
    291
    292The network filesystem must provide the read helpers with a table of operations
    293through which it can issue requests and negotiate::
    294
	struct netfs_request_ops {
		void (*init_request)(struct netfs_io_request *rreq, struct file *file);
		void (*free_request)(struct netfs_io_request *rreq);
		int (*begin_cache_operation)(struct netfs_io_request *rreq);
		void (*expand_readahead)(struct netfs_io_request *rreq);
		bool (*clamp_length)(struct netfs_io_subrequest *subreq);
		void (*issue_read)(struct netfs_io_subrequest *subreq);
		bool (*is_still_valid)(struct netfs_io_request *rreq);
		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
					 struct folio *folio, void **_fsdata);
		void (*done)(struct netfs_io_request *rreq);
	};
    307
    308The operations are as follows:
    309
    310 * ``init_request()``
    311
    312   [Optional] This is called to initialise the request structure.  It is given
    313   the file for reference.
    314
    315 * ``free_request()``
    316
    317   [Optional] This is called as the request is being deallocated so that the
    318   filesystem can clean up any state it has attached there.
    319
    320 * ``begin_cache_operation()``
    321
   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read.  The netfs
   library module cannot access the cache directly, so the network filesystem
   should call something like fscache_begin_read_operation() to do this.
    326
    327   The cache gets to store its state in ->cache_resources and must set a table
    328   of operations of its own there (though of a different type).
    329
    330   This should return 0 on success and an error code otherwise.  If an error is
    331   reported, the operation may proceed anyway, just without local caching (only
    332   out of memory and interruption errors cause failure here).
    333
    334 * ``expand_readahead()``
    335
    336   [Optional] This is called to allow the filesystem to expand the size of a
    337   readahead read request.  The filesystem gets to expand the request in both
    338   directions, though it's not permitted to reduce it as the numbers may
    339   represent an allocation already made.  If local caching is enabled, it gets
    340   to expand the request first.
    341
    342   Expansion is communicated by changing ->start and ->len in the request
    343   structure.  Note that if any change is made, ->len must be increased by at
    344   least as much as ->start is reduced.
    345
    346 * ``clamp_length()``
    347
    348   [Optional] This is called to allow the filesystem to reduce the size of a
    349   subrequest.  The filesystem can use this, for example, to chop up a request
    350   that has to be split across multiple servers or to put multiple reads in
    351   flight.
    352
   This should return true on success and false on failure, matching the bool
   return type in the table above.
    354
    355 * ``issue_read()``
    356
    357   [Required] The helpers use this to dispatch a subrequest to the server for
    358   reading.  In the subrequest, ->start, ->len and ->transferred indicate what
    359   data should be read from the server.
    360
    361   There is no return value; the netfs_subreq_terminated() function should be
    362   called to indicate whether or not the operation succeeded and how much data
    363   it transferred.  The filesystem also should not deal with setting folios
    364   uptodate, unlocking them or dropping their refs - the helpers need to deal
    365   with this as they have to coordinate with copying to the local cache.
    366
    367   Note that the helpers have the folios locked, but not pinned.  It is
    368   possible to use the ITER_XARRAY iov iterator to refer to the range of the
    369   inode that is being operated upon without the need to allocate large bvec
    370   tables.
    371
    372 * ``is_still_valid()``
    373
    374   [Optional] This is called to find out if the data just read from the local
    375   cache is still valid.  It should return true if it is still valid and false
    376   if not.  If it's not still valid, it will be reread from the server.
    377
    378 * ``check_write_begin()``
    379
    380   [Optional] This is called from the netfs_write_begin() helper once it has
    381   allocated/grabbed the folio to be modified to allow the filesystem to flush
    382   conflicting state before allowing it to be modified.
    383
    384   It should return 0 if everything is now fine, -EAGAIN if the folio should be
    385   regrabbed and any other error code to abort the operation.
    386
 * ``done()``
    388
    389   [Optional] This is called after the folios in the request have all been
    390   unlocked (and marked uptodate if applicable).
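
The constraint on ``expand_readahead()`` given above (->len must grow by at
least as much as ->start is reduced) can be illustrated with a userspace
sketch.  ``struct sketch_request`` and ``sketch_expand_readahead()`` are
hypothetical, and the choice of rounding outwards to a fixed granule size is an
assumption made for the example, not something the library requires.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down, hypothetical stand-in for the start/len pair of a request. */
struct sketch_request {
	long long start;
	size_t len;
};

/*
 * Round the request outwards to multiples of granule: move start down to a
 * boundary and move the end up to one.  Because the end never moves
 * backwards, len grows by at least as much as start is reduced.
 */
static void sketch_expand_readahead(struct sketch_request *rreq, size_t granule)
{
	long long g = (long long)granule;
	long long old_start = rreq->start;
	long long end = old_start + (long long)rreq->len;
	long long new_start, new_end;

	assert(granule > 0 && old_start >= 0);

	new_start = old_start - old_start % g;	/* Round start down */
	new_end = (end + g - 1) / g * g;	/* Round end up */

	rreq->start = new_start;
	rreq->len = (size_t)(new_end - new_start);
}
```

For instance, a request for 3000 bytes at position 5000 with a 4096-byte
granule becomes a request for 4096 bytes at position 4096, so the original
range is still fully covered.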
    391
    392
    393
    394Read Helper Procedure
    395---------------------
    396
    397The read helpers work by the following general procedure:
    398
    399 * Set up the request.
    400
    401 * For readahead, allow the local cache and then the network filesystem to
    402   propose expansions to the read request.  This is then proposed to the VM.
    403   If the VM cannot fully perform the expansion, a partially expanded read will
    404   be performed, though this may not get written to the cache in its entirety.
    405
    406 * Loop around slicing chunks off of the request to form subrequests:
    407
    408   * If a local cache is present, it gets to do the slicing, otherwise the
    409     helpers just try to generate maximal slices.
    410
    411   * The network filesystem gets to clamp the size of each slice if it is to be
    412     the source.  This allows rsize and chunking to be implemented.
    413
   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.
    416
    417   * The next slice begins at the end of the last one.
    418
    419   * As slices finish being read, they terminate.
    420
    421 * When all the subrequests have terminated, the subrequests are assessed and
    422   any that are short or have failed are reissued:
    423
    424   * Failed cache requests are issued against the server instead.
    425
    426   * Failed server requests just fail.
    427
    428   * Short reads against either source will be reissued against that source
    429     provided they have transferred some more data:
    430
    431     * The cache may need to skip holes that it can't do DIO from.
    432
    433     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
    434       end of the slice instead of reissuing.
    435
 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.
    443
    444 * Any folios that need writing to the cache will then have DIO writes issued.
    445
    446 * Synchronous operations will wait for reading to be complete.
    447
    448 * Writes to the cache will proceed asynchronously and the folios will have the
    449   PG_fscache mark removed when that completes.
    450
    451 * The request structures will be cleaned up when everything has completed.
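
The slicing step of the procedure above can be sketched in userspace.  This is
a simplified model under stated assumptions: there is no cache, every slice
comes from the server, and a single ``max_slice`` limit stands in for whatever
clamping the network filesystem would apply (for example an rsize).  The names
``struct sketch_slice`` and ``sketch_slice_request()`` are invented for this
example.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_slice {
	long long start;	/* File position of the slice */
	size_t len;		/* Length of the slice */
};

/*
 * Slice [start, start + len) into subrequests of at most max_slice bytes.
 * Up to max_out slices are stored in out; the return value is the number of
 * slices the request needs.
 */
static size_t sketch_slice_request(long long start, size_t len, size_t max_slice,
				   struct sketch_slice *out, size_t max_out)
{
	size_t done = 0, count = 0;

	assert(max_slice > 0);

	while (done < len) {
		size_t slice = len - done;	/* Try a maximal slice first */

		if (slice > max_slice)		/* Then clamp it */
			slice = max_slice;

		if (count < max_out) {
			out[count].start = start + (long long)done;
			out[count].len = slice;
		}
		count++;
		done += slice;	/* The next slice begins where this one ended */
	}
	return count;
}
```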
    452
    453
    454Read Helper Cache API
    455---------------------
    456
    457When implementing a local cache to be used by the read helpers, two things are
    458required: some way for the network filesystem to initialise the caching for a
    459read request and a table of operations for the helpers to call.
    460
The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work.  If using fscache, for
example, the network filesystem would call::
    464
	int fscache_begin_read_operation(struct netfs_io_request *rreq,
					 struct fscache_cookie *cookie);
    467
    468passing in the request pointer and the cookie corresponding to the file.
    469
    470The netfs_io_request object contains a place for the cache to hang its
    471state::
    472
	struct netfs_cache_resources {
		const struct netfs_cache_ops	*ops;
		void				*cache_priv;
		void				*cache_priv2;
	};
    478
    479This contains an operations table pointer and two private pointers.  The
    480operation table looks like the following::
    481
	struct netfs_cache_ops {
		void (*end_operation)(struct netfs_cache_resources *cres);

		void (*expand_readahead)(struct netfs_cache_resources *cres,
					 loff_t *_start, size_t *_len, loff_t i_size);

		enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
						       loff_t i_size);

		int (*read)(struct netfs_cache_resources *cres,
			    loff_t start_pos,
			    struct iov_iter *iter,
			    bool seek_data,
			    netfs_io_terminated_t term_func,
			    void *term_func_priv);

		int (*prepare_write)(struct netfs_cache_resources *cres,
				     loff_t *_start, size_t *_len, loff_t i_size,
				     bool no_space_allocated_yet);

		int (*write)(struct netfs_cache_resources *cres,
			     loff_t start_pos,
			     struct iov_iter *iter,
			     netfs_io_terminated_t term_func,
			     void *term_func_priv);

		int (*query_occupancy)(struct netfs_cache_resources *cres,
				       loff_t start, size_t len, size_t granularity,
				       loff_t *_data_start, size_t *_data_len);
	};
    512
    513With a termination handler function pointer::
    514
	typedef void (*netfs_io_terminated_t)(void *priv,
					      ssize_t transferred_or_error,
					      bool was_async);
    518
    519The methods defined in the table are:
    520
    521 * ``end_operation()``
    522
    523   [Required] Called to clean up the resources at the end of the read request.
    524
    525 * ``expand_readahead()``
    526
    527   [Optional] Called at the beginning of a netfs_readahead() operation to allow
    528   the cache to expand a request in either direction.  This allows the cache to
    529   size the request appropriately for the cache granularity.
    530
   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request.  ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.
    549
    550 * ``read()``
    551
    552   [Required] Called to read from the cache.  The start file offset is given
    553   along with an iterator to read to, which gives the length also.  It can be
    554   given a hint requesting that it seek forward from that start position for
    555   data.
    556
    557   Also provided is a pointer to a termination handler function and private
    558   data to pass to that function.  The termination function should be called
    559   with the number of bytes transferred or an error code, plus a flag
    560   indicating whether the termination is definitely happening in the caller's
    561   context.
    562
    563 * ``prepare_write()``
    564
    565   [Required] Called to prepare a write to the cache to take place.  This
    566   involves checking to see whether the cache has sufficient space to honour
    567   the write.  ``*_start`` and ``*_len`` indicate the region to be written; the
    568   region can be shrunk or it can be expanded to a page boundary either way as
    569   necessary to align for direct I/O.  i_size holds the size of the object and
    570   is provided for reference.  no_space_allocated_yet is set to true if the
    571   caller is certain that no data has been written to that region - for example
    572   if it tried to do a read from there already.
    573
    574 * ``write()``
    575
    576   [Required] Called to write to the cache.  The start file offset is given
    577   along with an iterator to write from, which gives the length also.
    578
    579   Also provided is a pointer to a termination handler function and private
    580   data to pass to that function.  The termination function should be called
    581   with the number of bytes transferred or an error code, plus a flag
    582   indicating whether the termination is definitely happening in the caller's
    583   context.
    584
    585 * ``query_occupancy()``
    586
    587   [Required] Called to find out where the next piece of data is within a
    588   particular region of the cache.  The start and length of the region to be
    589   queried are passed in, along with the granularity to which the answer needs
    590   to be aligned.  The function passes back the start and length of the data,
    591   if any, available within that region.  Note that there may be a hole at the
    592   front.
    593
    594   It returns 0 if some data was found, -ENODATA if there was no usable data
    595   within the region or -ENOBUFS if there is no caching on this file.
    596
    597Note that these methods are passed a pointer to the cache resource structure,
    598not the read request structure as they could be used in other situations where
    599there isn't a read request structure as well, such as writing dirty data to the
    600cache.
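
To make the ``query_occupancy()`` contract described above more concrete, here
is a userspace sketch against a toy cache model.  Everything in it is
hypothetical: ``struct sketch_cache`` represents the cache as one flag per
granule, the function takes that model instead of a ``struct
netfs_cache_resources``, the granularity comes from the model rather than a
separate parameter, and the -ENOBUFS case is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache model: one flag per granule saying if it holds data. */
struct sketch_cache {
	const bool *present;
	size_t nr_granules;
	size_t granule;		/* Bytes per granule */
};

/*
 * Find the first run of cached granules overlapping [start, start + len).
 * Returns 0 and passes back the granule-aligned start and length of that
 * run, or returns -ENODATA if the region holds no data.
 */
static int sketch_query_occupancy(const struct sketch_cache *cache,
				  long long start, size_t len,
				  long long *_data_start, size_t *_data_len)
{
	size_t first, last, i, run;

	assert(cache->granule > 0 && start >= 0);
	if (len == 0 || cache->nr_granules == 0)
		return -ENODATA;

	first = (size_t)start / cache->granule;
	last = ((size_t)start + len - 1) / cache->granule;
	if (last >= cache->nr_granules)
		last = cache->nr_granules - 1;

	/* Skip any hole at the front of the region. */
	for (i = first; i <= last && !cache->present[i]; i++)
		;
	if (i > last)
		return -ENODATA;

	/* Measure the run of populated granules. */
	for (run = 0; i + run <= last && cache->present[i + run]; run++)
		;

	*_data_start = (long long)(i * cache->granule);
	*_data_len = run * cache->granule;
	return 0;
}
```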
    601
    602
    603API Function Reference
    604======================
    605
    606.. kernel-doc:: include/linux/netfs.h
    607.. kernel-doc:: fs/netfs/buffered_read.c
    608.. kernel-doc:: fs/netfs/io.c