.. SPDX-License-Identifier: GPL-2.0

Layout
------

The layout of a standard block group is approximately as follows (each
of these fields is discussed in a separate section below):

.. list-table::
   :widths: 1 1 1 1 1 1 1 1
   :header-rows: 1

   * - Group 0 Padding
     - ext4 Super Block
     - Group Descriptors
     - Reserved GDT Blocks
     - Data Block Bitmap
     - inode Bitmap
     - inode Table
     - Data Blocks
   * - 1024 bytes
     - 1 block
     - many blocks
     - many blocks
     - 1 block
     - 1 block
     - many blocks
     - many more blocks

For the special case of block group 0, the first 1024 bytes are unused,
to allow for the installation of x86 boot sectors and other oddities.
The superblock starts at byte offset 1024, whichever block that
happens to be (usually 0). However, if the block size is 1024 bytes,
then block 0 is marked in use and the superblock goes in block 1.
For all other block groups, there is no padding.
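
The rule above can be sketched as a small calculation. This is only an
illustration: the helper name ``superblock_location`` is hypothetical, and
the block sizes in the loop are example values.

```python
SUPERBLOCK_OFFSET = 1024  # bytes of padding at the start of block group 0


def superblock_location(block_size):
    """Return (block number, byte offset inside that block) of the primary superblock."""
    return divmod(SUPERBLOCK_OFFSET, block_size)


for size in (1024, 2048, 4096):
    block, offset = superblock_location(size)
    print(f"block size {size}: superblock in block {block}, offset {offset}")
```

With a 1024-byte block size the padding fills block 0 entirely, so the
superblock lands in block 1; with larger block sizes it sits 1024 bytes
into block 0.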

The ext4 driver primarily works with the superblock and the group
descriptors that are found in block group 0. Redundant copies of the
superblock and group descriptors are written to some of the block groups
across the disk in case the beginning of the disk gets trashed, though
not all block groups necessarily host a redundant copy (see following
paragraph for more details). If the group does not have a redundant
copy, the block group begins with the data block bitmap. Note also that
when the filesystem is freshly formatted, mkfs will allocate “reserved
GDT blocks” space after the block group descriptors and before the start
of the block bitmaps to allow for future expansion of the filesystem. By
default, a filesystem is allowed to increase in size by a factor of
1024x over the original filesystem size.

The location of the inode table is given by ``grp.bg_inode_table_*``. It
is a contiguous range of blocks large enough to contain
``sb.s_inodes_per_group * sb.s_inode_size`` bytes.
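
As a rough illustration of the size computation, the sketch below rounds
the inode table up to whole blocks. The function name is hypothetical and
the numbers in the usage line are example values, not defaults taken from
this document.

```python
def inode_table_blocks(inodes_per_group, inode_size, block_size):
    """Blocks needed to hold s_inodes_per_group * s_inode_size bytes."""
    table_bytes = inodes_per_group * inode_size
    # Ceiling division: a partially filled block still occupies a whole block.
    return -(-table_bytes // block_size)


# Example values only: 8192 inodes of 256 bytes with 4096-byte blocks.
print(inode_table_blocks(8192, 256, 4096))
```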

As for the ordering of items in a block group, it is generally
established that the super block and the group descriptor table, if
present, will be at the beginning of the block group. The bitmaps and
the inode table can be anywhere, and it is quite possible for the
bitmaps to come after the inode table, or for both to be in different
groups (flex_bg). Leftover space is used for file data blocks, indirect
block maps, extent tree blocks, and extended attributes.

Flexible Block Groups
---------------------

Starting in ext4, there is a new feature called flexible block groups
(flex_bg). In a flex_bg, several block groups are tied together as one
logical block group; the bitmap spaces and the inode table space in the
first block group of the flex_bg are expanded to include the bitmaps
and inode tables of all other block groups in the flex_bg. For example,
if the flex_bg size is 4, then group 0 will contain (in order) the
superblock, group descriptors, data block bitmaps for groups 0-3, inode
bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
space in group 0 is for file data. The effect of this is to group the
block group metadata close together for faster loading, and to enable
large files to be contiguous on disk. Backup copies of the superblock
and group descriptors are always at the beginning of block groups, even
if flex_bg is enabled. The number of block groups that make up a
flex_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.
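
A minimal sketch of the flex_bg arithmetic follows. It assumes that
flex_bgs are consecutive, aligned runs of block groups (as in the
groups 0-3 example above); both helper names are hypothetical.

```python
def groups_per_flex(s_log_groups_per_flex):
    """Number of block groups in one flex_bg: 2 ^ s_log_groups_per_flex."""
    return 1 << s_log_groups_per_flex


def flex_bg_first_group(group, s_log_groups_per_flex):
    """First block group of the flex_bg that contains `group`.

    Assumption: flex_bgs are aligned runs of consecutive block groups.
    """
    size = groups_per_flex(s_log_groups_per_flex)
    return (group // size) * size


# With s_log_groups_per_flex = 2, a flex_bg holds 4 groups, and the
# packed metadata for group 5 would be found in group 4.
print(groups_per_flex(2), flex_bg_first_group(5, 2))
```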

Meta Block Groups
-----------------

Without the META_BG option, for safety reasons, all copies of the block
group descriptors are kept in the first block group. Given the default
128 MiB (2^27 bytes) block group size and 64-byte group descriptors, ext4
can have at most 2^27 / 64 = 2^21 block groups. This limits the entire
filesystem size to 2^21 * 2^27 = 2^48 bytes, or 256 TiB.
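
The limit can be checked with a few lines of arithmetic. The constant
names below are illustrative; the values are the ones quoted in the
paragraph above.

```python
GROUP_BYTES = 2**27   # default block group size: 128 MiB
DESC_SIZE = 64        # bytes per group descriptor

# All descriptors must fit inside the first block group.
max_groups = GROUP_BYTES // DESC_SIZE
max_fs_bytes = max_groups * GROUP_BYTES

print(max_groups == 2**21)        # True
print(max_fs_bytes // 2**40)      # filesystem limit in TiB: 256
```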

The solution to this problem is to use the metablock group feature
(META_BG), which is already in ext3 for all 2.6 releases. With the
META_BG feature, ext4 filesystems are partitioned into many metablock
groups. Each metablock group is a cluster of block groups whose group
descriptor structures can be stored in a single disk block. For ext4
filesystems with 4 KB block size, a single metablock group partition
includes 64 block groups, or 8 GiB of disk space. The metablock group
feature moves the location of the group descriptors from the congested
first block group of the whole filesystem into the first group of each
metablock group itself. The backups are in the second and last group of
each metablock group. This increases the 2^21 maximum block groups limit
to the hard limit 2^32, allowing support for a 512 PiB filesystem.
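
The figures in this paragraph follow from the block size, the 64-byte
descriptor size and the 128 MiB block group size used earlier in this
section. The sketch below reproduces them; the constant names are
illustrative only.

```python
BLOCK_SIZE = 4096     # 4 KB filesystem block
DESC_SIZE = 64        # bytes per group descriptor
GROUP_BYTES = 2**27   # 128 MiB per block group

# One metablock group = as many block groups as fit in one descriptor block.
groups_per_meta_bg = BLOCK_SIZE // DESC_SIZE
meta_bg_bytes = groups_per_meta_bg * GROUP_BYTES

# With the 2^32 hard limit on the number of block groups:
max_fs_bytes = 2**32 * GROUP_BYTES

print(groups_per_meta_bg)         # 64
print(meta_bg_bytes // 2**30)     # size in GiB: 8
print(max_fs_bytes // 2**50)      # size in PiB: 512
```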

The change in the filesystem format replaces the current scheme where
the superblock is followed by a variable-length set of block group
descriptors. Instead, the superblock and a single block group descriptor
block are placed at the beginning of the first, second, and last block
groups in a meta-block group. A meta-block group is a collection of
block groups which can be described by a single block group descriptor
block. Since the size of the block group descriptor structure is 32
bytes (the size without the 64bit feature; the 64-byte figure used above
applies when that feature is enabled), a meta-block group contains 32
block groups for filesystems with a 1 KB block size, and 128 block groups
for filesystems with a 4 KB block size. Filesystems can either be created
using this new block group descriptor layout, or existing filesystems can
be resized on-line, and the field s_first_meta_bg in the superblock will
indicate the first block group using this new layout.
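
To make the placement concrete, the sketch below lists which block groups
of a given meta-block group would carry the descriptor block. It is a
simplification under stated assumptions: meta-block groups are numbered
from 0 and cover the whole filesystem (``s_first_meta_bg`` is ignored),
the final meta-block group is not truncated, and the function name is
hypothetical.

```python
def meta_bg_descriptor_groups(meta_bg_index, block_size, desc_size=32):
    """Block groups holding the descriptor block of one meta-block group.

    Returns the first, second and last block group of the meta-block
    group, following the description above.
    """
    groups_per_meta_bg = block_size // desc_size
    first = meta_bg_index * groups_per_meta_bg
    return [first, first + 1, first + groups_per_meta_bg - 1]


print(meta_bg_descriptor_groups(0, 1024))   # [0, 1, 31]
print(meta_bg_descriptor_groups(1, 4096))   # [128, 129, 255]
```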

Please see an important note about ``BLOCK_UNINIT`` in the section about
block and inode bitmaps.

Lazy Block Group Initialization
-------------------------------

A new feature for ext4 is a set of three block group descriptor flags
that enable mkfs to skip initializing other parts of the block group
metadata. Specifically, the INODE_UNINIT and BLOCK_UNINIT flags mean
that the inode and block bitmaps for that group can be calculated and
therefore the on-disk bitmap blocks are not initialized. This is
generally the case for an empty block group or a block group containing
only fixed-location block group metadata. The INODE_ZEROED flag means
that the inode table has been initialized; mkfs will unset this flag and
rely on the kernel to initialize the inode tables in the background.
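
A small decoder shows how the three flags might be interpreted. The bit
values below (0x1, 0x2, 0x4) are not given in this section; they are an
assumption based on the kernel's ``EXT4_BG_*`` definitions and should be
checked against the group descriptor documentation. The class and
function names are hypothetical.

```python
from enum import IntFlag


class BgFlags(IntFlag):
    # Assumed values; verify against the block group descriptor section.
    INODE_UNINIT = 0x1   # inode bitmap not initialized on disk
    BLOCK_UNINIT = 0x2   # block bitmap not initialized on disk
    INODE_ZEROED = 0x4   # inode table has been zeroed


def describe(bg_flags):
    """Summarize what is initialized on disk for one block group."""
    return {
        "inode_bitmap_initialized": not (bg_flags & BgFlags.INODE_UNINIT),
        "block_bitmap_initialized": not (bg_flags & BgFlags.BLOCK_UNINIT),
        "inode_table_zeroed": bool(bg_flags & BgFlags.INODE_ZEROED),
    }


# A freshly formatted, empty group with lazy initialization:
# both bitmaps still to be computed, inode table not yet zeroed.
print(describe(BgFlags.INODE_UNINIT | BgFlags.BLOCK_UNINIT))
```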
    131
    132By not writing zeroes to the bitmaps and inode table, mkfs time is
    133reduced considerably. Note the feature flag is RO_COMPAT_GDT_CSUM,
    134but the dumpe2fs output prints this as “uninit_bg”. They are the same
    135thing.