.. SPDX-License-Identifier: GPL-2.0

Block and Inode Allocation Policy
---------------------------------

ext4 recognizes (better than ext3, anyway) that data locality is
generally a desirable quality of a filesystem. On a spinning disk,
keeping related blocks near each other reduces the amount of movement
that the head actuator and disk must perform to access a data block,
thus speeding up disk IO. On an SSD there are of course no moving parts,
but locality can increase the size of each transfer request while
reducing the total number of requests. This locality may also have the
effect of concentrating writes on a single erase block, which can speed
up file rewrites significantly. Therefore, it is useful to reduce
fragmentation whenever possible.

The first tool that ext4 uses to combat fragmentation is the multi-block
allocator. When a file is first created, the block allocator
speculatively allocates 8KiB of disk space to the file on the assumption
that the space will get written soon. When the file is closed, the
unused speculative allocations are of course freed, but if the
speculation is correct (typically the case for full writes of small
files) then the file data gets written out in a single multi-block
extent. A second related trick that ext4 uses is delayed allocation.
Under this scheme, when a file needs more blocks to absorb file writes,
the filesystem defers deciding the exact placement on the disk until the
dirty buffers are written out to disk. By not committing to a particular
placement until it's absolutely necessary (the commit timeout is hit,
sync() is called, or the kernel runs out of memory), the hope is that
the filesystem can make better placement decisions.
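
To make the benefit of delayed allocation concrete, here is a minimal toy
simulation. It assumes 4KiB blocks, a tiny first-fit allocator, and two
files whose whole-block appends are interleaved. ``ToyAllocator``,
``extents`` and ``simulate`` are hypothetical names, and the model ignores
speculative preallocation and everything else the real ext4 allocator does;
it only compares choosing blocks at write time with choosing them all at
writeback.

.. code-block:: python

    BLOCK_SIZE = 4096  # assumed block size for this toy model


    class ToyAllocator:
        """Hypothetical first-fit allocator over a tiny disk of `nblocks` blocks."""

        def __init__(self, nblocks):
            self.used = [False] * nblocks

        def alloc(self, count):
            """Take the first run of `count` free contiguous blocks; if none
            exists, fall back to the first `count` free blocks anywhere."""
            run = 0
            for blk, busy in enumerate(self.used):
                run = 0 if busy else run + 1
                if run == count:
                    blocks = list(range(blk - count + 1, blk + 1))
                    break
            else:
                blocks = [b for b, busy in enumerate(self.used) if not busy][:count]
                if len(blocks) < count:
                    raise OSError("disk full")
            for b in blocks:
                self.used[b] = True
            return blocks


    def extents(blocks):
        """Count runs of consecutive block numbers; fewer runs means less fragmentation."""
        blocks = sorted(blocks)
        return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


    def simulate(delayed, writes):
        """Replay `writes`, a list of (file_name, nbytes) appends, and report
        how many extents each file ends up with."""
        alloc = ToyAllocator(64)
        placed = {}   # file name -> allocated block numbers
        pending = {}  # file name -> dirty bytes that have no disk location yet
        for name, nbytes in writes:
            if delayed:
                # Delayed allocation: remember the dirty data, defer placement.
                pending[name] = pending.get(name, 0) + nbytes
            else:
                # Eager allocation: pick disk blocks on every single write.
                nblocks = -(-nbytes // BLOCK_SIZE)  # round up to whole blocks
                placed.setdefault(name, []).extend(alloc.alloc(nblocks))
        # Writeback (commit timeout, sync(), memory pressure): place each file's
        # dirty data in one request, so it can land in one contiguous run.
        for name, nbytes in pending.items():
            nblocks = -(-nbytes // BLOCK_SIZE)
            placed.setdefault(name, []).extend(alloc.alloc(nblocks))
        return {name: extents(blocks) for name, blocks in placed.items()}


    # Two files appending one block at a time, interleaved.
    writes = [("a", BLOCK_SIZE), ("b", BLOCK_SIZE)] * 4
    print(simulate(False, writes))  # {'a': 4, 'b': 4}
    print(simulate(True, writes))   # {'a': 1, 'b': 1}

With eager allocation the interleaved writes leave each file split into four
one-block extents; deferring placement until writeback lets each file land
in a single extent, because the allocator finally sees each file's total
size in one request.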

The third trick that ext4 (and ext3) uses is that it tries to keep a
file's data blocks in the same block group as its inode. This cuts down
on the seek penalty when the filesystem first has to read a file's inode
to learn where the file's data blocks live and then seek over to the
file's data blocks to begin I/O operations.
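
A simplified sketch of how an allocator might steer data toward the inode's
block group follows. Inode numbers start at 1, so inode ``ino`` lives in
group ``(ino - 1) // inodes_per_group``. The helper names, the example
geometry, and the choice of the group's first block as the "goal" are
illustrative assumptions; this is not ext4's exact goal computation.

.. code-block:: python

    def inode_group(ino, inodes_per_group):
        """Block group that holds inode number `ino` (inode numbers start at 1)."""
        return (ino - 1) // inodes_per_group


    def goal_block(ino, inodes_per_group, blocks_per_group, first_data_block=0):
        """Hypothetical, simplified goal: the first block of the inode's own group.

        first_data_block is 1 on filesystems with 1KiB blocks and 0 otherwise.
        """
        return first_data_block + inode_group(ino, inodes_per_group) * blocks_per_group


    def pick_block(free, goal):
        """Return the first free block at or after `goal`, wrapping around the disk.

        `free` is a list of booleans, one per block (True means free).
        """
        nblocks = len(free)
        for i in range(nblocks):
            blk = (goal + i) % nblocks
            if free[blk]:
                return blk
        raise OSError("no free blocks")


    # With 8192 inodes and 32768 blocks per group, inode 8193 is the first
    # inode of group 1, so its data is steered toward block 32768.
    print(goal_block(8193, 8192, 32768))  # 32768

Searching forward from the goal means the data lands in the inode's group
whenever that group has free space, and only spills into later groups once
it fills up.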

The fourth trick is that all the inodes in a directory are placed in the
same block group as the directory, when feasible. The working assumption
here is that all the files in a directory might be related; therefore it
is useful to try to keep them all together.
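
The directory-affinity preference can be sketched the same way.
``group_for_new_inode`` is a hypothetical helper that prefers the parent
directory's group and otherwise falls back to a plain linear scan; ext4's
actual fallback search differs, so treat this only as an illustration of the
preference order.

.. code-block:: python

    def group_for_new_inode(parent_group, free_inodes):
        """Pick a block group for a new (non-directory) inode.

        Hypothetical simplification: use the parent directory's group if it
        has a free inode, otherwise the next group (wrapping) that does.
        `free_inodes` holds the free inode count of each group and is updated.
        """
        ngroups = len(free_inodes)
        for i in range(ngroups):
            group = (parent_group + i) % ngroups
            if free_inodes[group] > 0:
                free_inodes[group] -= 1  # consume one inode in the chosen group
                return group
        raise OSError("no free inodes left")


    # Three files created in a directory that lives in group 0, where
    # group 0 has room for only two more inodes.
    free = [2, 2, 2]
    print([group_for_new_inode(0, free) for _ in range(3)])  # [0, 0, 1]

The first two files share the directory's group; only when that group runs
out of inodes does the third file spill over to a neighbouring group.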

The fifth trick is that the disk volume is cut up into 128MiB block
groups (assuming 4KiB blocks); these mini-containers are used as
outlined above to try to maintain data locality. However, there is a
deliberate quirk: when a directory is created in the root directory, the
inode allocator scans the block groups and puts that directory into the
least heavily loaded block group that it can find. This encourages
directories to spread out over a disk; as the top-level directory/file
blobs fill up one block group, the allocators simply move on to the next
block group. Allegedly this scheme evens out the loading on the block
groups, though the author suspects that the directories which are so
unlucky as to land towards the end of a spinning drive get a raw deal
performance-wise.
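
The 128MiB figure follows from the block bitmap: by default each group's
block bitmap occupies a single block, and one block of bits can track
``8 * block_size`` blocks, so a group spans ``8 * block_size * block_size``
bytes. The sketch below works through that arithmetic and adds a
deliberately simplified "least heavily loaded" choice for top-level
directories that uses the free block count as the only load measure. The
function names are hypothetical, and ext4's real policy for such directories
(the Orlov allocator) weighs more than free space.

.. code-block:: python

    def blocks_per_group(block_size):
        """A group's block bitmap is one block, i.e. 8 * block_size bits,
        and each bit tracks one block."""
        return 8 * block_size


    def group_size_bytes(block_size):
        return blocks_per_group(block_size) * block_size


    def group_count(disk_bytes, block_size):
        """Number of block groups covering the disk (the last may be partial)."""
        size = group_size_bytes(block_size)
        return (disk_bytes + size - 1) // size


    def pick_group_for_top_level_dir(free_blocks):
        """Hypothetical simplification: pick the group with the most free blocks.
        max() returns the first maximum, so ties go to the lowest group number."""
        return max(range(len(free_blocks)), key=lambda g: free_blocks[g])


    print(group_size_bytes(4096) // 2**20)                    # 128 (MiB per group)
    print(group_count(2**40, 4096))                           # 8192 groups in 1 TiB
    print(pick_group_for_top_level_dir([100, 500, 500, 20]))  # 1

With 1KiB blocks the same formula gives 8MiB groups, which is why the group
size quoted above depends on the block size.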

Of course, if all of these mechanisms fail, one can always use e4defrag
to defragment files.