cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux
Log | Files | Refs | README | LICENSE | sfeed.txt

directory.rst (13283B)


      1.. SPDX-License-Identifier: GPL-2.0
      2
      3Directory Entries
      4-----------------
      5
      6In an ext4 filesystem, a directory is more or less a flat file that maps
      7an arbitrary byte string (usually ASCII) to an inode number on the
      8filesystem. There can be many directory entries across the filesystem
      9that reference the same inode number--these are known as hard links, and
     10that is why hard links cannot reference files on other filesystems. As
     11such, directory entries are found by reading the data block(s)
     12associated with a directory file for the particular directory entry that
     13is desired.
     14
     15Linear (Classic) Directories
     16~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     17
     18By default, each directory lists its entries in an “almost-linear”
     19array. I write “almost” because it's not a linear array in the memory
     20sense because directory entries are not split across filesystem blocks.
     21Therefore, it is more accurate to say that a directory is a series of
     22data blocks and that each block contains a linear array of directory
     23entries. The end of each per-block array is signified by reaching the
     24end of the block; the last entry in the block has a record length that
     25takes it all the way to the end of the block. The end of the entire
     26directory is of course signified by reaching the end of the file. Unused
     27directory entries are signified by inode = 0. By default the filesystem
     28uses ``struct ext4_dir_entry_2`` for directory entries unless the
     29“filetype” feature flag is not set, in which case it uses
     30``struct ext4_dir_entry``.
     31
     32The original directory entry format is ``struct ext4_dir_entry``, which
     33is at most 263 bytes long, though on disk you'll need to reference
     34``dirent.rec_len`` to know for sure.
     35
     36.. list-table::
     37   :widths: 8 8 24 40
     38   :header-rows: 1
     39
     40   * - Offset
     41     - Size
     42     - Name
     43     - Description
     44   * - 0x0
     45     - __le32
     46     - inode
     47     - Number of the inode that this directory entry points to.
     48   * - 0x4
     49     - __le16
     50     - rec_len
     51     - Length of this directory entry. Must be a multiple of 4.
     52   * - 0x6
     53     - __le16
     54     - name_len
     55     - Length of the file name.
     56   * - 0x8
     57     - char
     58     - name[EXT4_NAME_LEN]
     59     - File name.
     60
     61Since file names cannot be longer than 255 bytes, the new directory
     62entry format shortens the name_len field and uses the space for a file
     63type flag, probably to avoid having to load every inode during directory
     64tree traversal. This format is ``ext4_dir_entry_2``, which is at most
     65263 bytes long, though on disk you'll need to reference
     66``dirent.rec_len`` to know for sure.
     67
     68.. list-table::
     69   :widths: 8 8 24 40
     70   :header-rows: 1
     71
     72   * - Offset
     73     - Size
     74     - Name
     75     - Description
     76   * - 0x0
     77     - __le32
     78     - inode
     79     - Number of the inode that this directory entry points to.
     80   * - 0x4
     81     - __le16
     82     - rec_len
     83     - Length of this directory entry.
     84   * - 0x6
     85     - __u8
     86     - name_len
     87     - Length of the file name.
     88   * - 0x7
     89     - __u8
     90     - file_type
     91     - File type code, see ftype_ table below.
     92   * - 0x8
     93     - char
     94     - name[EXT4_NAME_LEN]
     95     - File name.
     96
     97.. _ftype:
     98
     99The directory file type is one of the following values:
    100
    101.. list-table::
    102   :widths: 16 64
    103   :header-rows: 1
    104
    105   * - Value
    106     - Description
    107   * - 0x0
    108     - Unknown.
    109   * - 0x1
    110     - Regular file.
    111   * - 0x2
    112     - Directory.
    113   * - 0x3
    114     - Character device file.
    115   * - 0x4
    116     - Block device file.
    117   * - 0x5
    118     - FIFO.
    119   * - 0x6
    120     - Socket.
    121   * - 0x7
    122     - Symbolic link.
    123
    124To support directories that are both encrypted and casefolded directories, we
    125must also include hash information in the directory entry. We append
    126``ext4_extended_dir_entry_2`` to ``ext4_dir_entry_2`` except for the entries
    127for dot and dotdot, which are kept the same. The structure follows immediately
    128after ``name`` and is included in the size listed by ``rec_len`` If a directory
    129entry uses this extension, it may be up to 271 bytes.
    130
    131.. list-table::
    132   :widths: 8 8 24 40
    133   :header-rows: 1
    134
    135   * - Offset
    136     - Size
    137     - Name
    138     - Description
    139   * - 0x0
    140     - __le32
    141     - hash
    142     - The hash of the directory name
    143   * - 0x4
    144     - __le32
    145     - minor_hash
    146     - The minor hash of the directory name
    147
    148
    149In order to add checksums to these classic directory blocks, a phony
    150``struct ext4_dir_entry`` is placed at the end of each leaf block to
    151hold the checksum. The directory entry is 12 bytes long. The inode
    152number and name_len fields are set to zero to fool old software into
    153ignoring an apparently empty directory entry, and the checksum is stored
    154in the place where the name normally goes. The structure is
    155``struct ext4_dir_entry_tail``:
    156
    157.. list-table::
    158   :widths: 8 8 24 40
    159   :header-rows: 1
    160
    161   * - Offset
    162     - Size
    163     - Name
    164     - Description
    165   * - 0x0
    166     - __le32
    167     - det_reserved_zero1
    168     - Inode number, which must be zero.
    169   * - 0x4
    170     - __le16
    171     - det_rec_len
    172     - Length of this directory entry, which must be 12.
    173   * - 0x6
    174     - __u8
    175     - det_reserved_zero2
    176     - Length of the file name, which must be zero.
    177   * - 0x7
    178     - __u8
    179     - det_reserved_ft
    180     - File type, which must be 0xDE.
    181   * - 0x8
    182     - __le32
    183     - det_checksum
    184     - Directory leaf block checksum.
    185
    186The leaf directory block checksum is calculated against the FS UUID, the
    187directory's inode number, the directory's inode generation number, and
    188the entire directory entry block up to (but not including) the fake
    189directory entry.
    190
    191Hash Tree Directories
    192~~~~~~~~~~~~~~~~~~~~~
    193
    194A linear array of directory entries isn't great for performance, so a
    195new feature was added to ext3 to provide a faster (but peculiar)
    196balanced tree keyed off a hash of the directory entry name. If the
    197EXT4_INDEX_FL (0x1000) flag is set in the inode, this directory uses a
    198hashed btree (htree) to organize and find directory entries. For
    199backwards read-only compatibility with ext2, this tree is actually
    200hidden inside the directory file, masquerading as “empty” directory data
    201blocks! It was stated previously that the end of the linear directory
    202entry table was signified with an entry pointing to inode 0; this is
    203(ab)used to fool the old linear-scan algorithm into thinking that the
    204rest of the directory block is empty so that it moves on.
    205
    206The root of the tree always lives in the first data block of the
    207directory. By ext2 custom, the '.' and '..' entries must appear at the
    208beginning of this first block, so they are put here as two
    209``struct ext4_dir_entry_2`` s and not stored in the tree. The rest of
    210the root node contains metadata about the tree and finally a hash->block
    211map to find nodes that are lower in the htree. If
    212``dx_root.info.indirect_levels`` is non-zero then the htree has two
    213levels; the data block pointed to by the root node's map is an interior
    214node, which is indexed by a minor hash. Interior nodes in this tree
    215contains a zeroed out ``struct ext4_dir_entry_2`` followed by a
    216minor_hash->block map to find leafe nodes. Leaf nodes contain a linear
    217array of all ``struct ext4_dir_entry_2``; all of these entries
    218(presumably) hash to the same value. If there is an overflow, the
    219entries simply overflow into the next leaf node, and the
    220least-significant bit of the hash (in the interior node map) that gets
    221us to this next leaf node is set.
    222
    223To traverse the directory as a htree, the code calculates the hash of
    224the desired file name and uses it to find the corresponding block
    225number. If the tree is flat, the block is a linear array of directory
    226entries that can be searched; otherwise, the minor hash of the file name
    227is computed and used against this second block to find the corresponding
    228third block number. That third block number will be a linear array of
    229directory entries.
    230
    231To traverse the directory as a linear array (such as the old code does),
    232the code simply reads every data block in the directory. The blocks used
    233for the htree will appear to have no entries (aside from '.' and '..')
    234and so only the leaf nodes will appear to have any interesting content.
    235
    236The root of the htree is in ``struct dx_root``, which is the full length
    237of a data block:
    238
    239.. list-table::
    240   :widths: 8 8 24 40
    241   :header-rows: 1
    242
    243   * - Offset
    244     - Type
    245     - Name
    246     - Description
    247   * - 0x0
    248     - __le32
    249     - dot.inode
    250     - inode number of this directory.
    251   * - 0x4
    252     - __le16
    253     - dot.rec_len
    254     - Length of this record, 12.
    255   * - 0x6
    256     - u8
    257     - dot.name_len
    258     - Length of the name, 1.
    259   * - 0x7
    260     - u8
    261     - dot.file_type
    262     - File type of this entry, 0x2 (directory) (if the feature flag is set).
    263   * - 0x8
    264     - char
    265     - dot.name[4]
    266     - “.\0\0\0”
    267   * - 0xC
    268     - __le32
    269     - dotdot.inode
    270     - inode number of parent directory.
    271   * - 0x10
    272     - __le16
    273     - dotdot.rec_len
    274     - block_size - 12. The record length is long enough to cover all htree
    275       data.
    276   * - 0x12
    277     - u8
    278     - dotdot.name_len
    279     - Length of the name, 2.
    280   * - 0x13
    281     - u8
    282     - dotdot.file_type
    283     - File type of this entry, 0x2 (directory) (if the feature flag is set).
    284   * - 0x14
    285     - char
    286     - dotdot_name[4]
    287     - “..\0\0”
    288   * - 0x18
    289     - __le32
    290     - struct dx_root_info.reserved_zero
    291     - Zero.
    292   * - 0x1C
    293     - u8
    294     - struct dx_root_info.hash_version
    295     - Hash type, see dirhash_ table below.
    296   * - 0x1D
    297     - u8
    298     - struct dx_root_info.info_length
    299     - Length of the tree information, 0x8.
    300   * - 0x1E
    301     - u8
    302     - struct dx_root_info.indirect_levels
    303     - Depth of the htree. Cannot be larger than 3 if the INCOMPAT_LARGEDIR
    304       feature is set; cannot be larger than 2 otherwise.
    305   * - 0x1F
    306     - u8
    307     - struct dx_root_info.unused_flags
    308     -
    309   * - 0x20
    310     - __le16
    311     - limit
    312     - Maximum number of dx_entries that can follow this header, plus 1 for
    313       the header itself.
    314   * - 0x22
    315     - __le16
    316     - count
    317     - Actual number of dx_entries that follow this header, plus 1 for the
    318       header itself.
    319   * - 0x24
    320     - __le32
    321     - block
    322     - The block number (within the directory file) that goes with hash=0.
    323   * - 0x28
    324     - struct dx_entry
    325     - entries[0]
    326     - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
    327
    328.. _dirhash:
    329
    330The directory hash is one of the following values:
    331
    332.. list-table::
    333   :widths: 16 64
    334   :header-rows: 1
    335
    336   * - Value
    337     - Description
    338   * - 0x0
    339     - Legacy.
    340   * - 0x1
    341     - Half MD4.
    342   * - 0x2
    343     - Tea.
    344   * - 0x3
    345     - Legacy, unsigned.
    346   * - 0x4
    347     - Half MD4, unsigned.
    348   * - 0x5
    349     - Tea, unsigned.
    350   * - 0x6
    351     - Siphash.
    352
    353Interior nodes of an htree are recorded as ``struct dx_node``, which is
    354also the full length of a data block:
    355
    356.. list-table::
    357   :widths: 8 8 24 40
    358   :header-rows: 1
    359
    360   * - Offset
    361     - Type
    362     - Name
    363     - Description
    364   * - 0x0
    365     - __le32
    366     - fake.inode
    367     - Zero, to make it look like this entry is not in use.
    368   * - 0x4
    369     - __le16
    370     - fake.rec_len
    371     - The size of the block, in order to hide all of the dx_node data.
    372   * - 0x6
    373     - u8
    374     - name_len
    375     - Zero. There is no name for this “unused” directory entry.
    376   * - 0x7
    377     - u8
    378     - file_type
    379     - Zero. There is no file type for this “unused” directory entry.
    380   * - 0x8
    381     - __le16
    382     - limit
    383     - Maximum number of dx_entries that can follow this header, plus 1 for
    384       the header itself.
    385   * - 0xA
    386     - __le16
    387     - count
    388     - Actual number of dx_entries that follow this header, plus 1 for the
    389       header itself.
    390   * - 0xE
    391     - __le32
    392     - block
    393     - The block number (within the directory file) that goes with the lowest
    394       hash value of this block. This value is stored in the parent block.
    395   * - 0x12
    396     - struct dx_entry
    397     - entries[0]
    398     - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
    399
    400The hash maps that exist in both ``struct dx_root`` and
    401``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes
    402long:
    403
    404.. list-table::
    405   :widths: 8 8 24 40
    406   :header-rows: 1
    407
    408   * - Offset
    409     - Type
    410     - Name
    411     - Description
    412   * - 0x0
    413     - __le32
    414     - hash
    415     - Hash code.
    416   * - 0x4
    417     - __le32
    418     - block
    419     - Block number (within the directory file, not filesystem blocks) of the
    420       next node in the htree.
    421
    422(If you think this is all quite clever and peculiar, so does the
    423author.)
    424
    425If metadata checksums are enabled, the last 8 bytes of the directory
    426block (precisely the length of one dx_entry) are used to store a
    427``struct dx_tail``, which contains the checksum. The ``limit`` and
    428``count`` entries in the dx_root/dx_node structures are adjusted as
    429necessary to fit the dx_tail into the block. If there is no space for
    430the dx_tail, the user is notified to run e2fsck -D to rebuild the
    431directory index (which will ensure that there's space for the checksum.
    432The dx_tail structure is 8 bytes long and looks like this:
    433
    434.. list-table::
    435   :widths: 8 8 24 40
    436   :header-rows: 1
    437
    438   * - Offset
    439     - Type
    440     - Name
    441     - Description
    442   * - 0x0
    443     - u32
    444     - dt_reserved
    445     - Zero.
    446   * - 0x4
    447     - __le32
    448     - dt_checksum
    449     - Checksum of the htree directory block.
    450
    451The checksum is calculated against the FS UUID, the htree index header
    452(dx_root or dx_node), all of the htree indices (dx_entry) that are in
    453use, and the tail block (dx_tail).