directory.rst (13283B)
1.. SPDX-License-Identifier: GPL-2.0 2 3Directory Entries 4----------------- 5 6In an ext4 filesystem, a directory is more or less a flat file that maps 7an arbitrary byte string (usually ASCII) to an inode number on the 8filesystem. There can be many directory entries across the filesystem 9that reference the same inode number--these are known as hard links, and 10that is why hard links cannot reference files on other filesystems. As 11such, directory entries are found by reading the data block(s) 12associated with a directory file for the particular directory entry that 13is desired. 14 15Linear (Classic) Directories 16~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 17 18By default, each directory lists its entries in an “almost-linear” 19array. I write “almost” because it's not a linear array in the memory 20sense because directory entries are not split across filesystem blocks. 21Therefore, it is more accurate to say that a directory is a series of 22data blocks and that each block contains a linear array of directory 23entries. The end of each per-block array is signified by reaching the 24end of the block; the last entry in the block has a record length that 25takes it all the way to the end of the block. The end of the entire 26directory is of course signified by reaching the end of the file. Unused 27directory entries are signified by inode = 0. By default the filesystem 28uses ``struct ext4_dir_entry_2`` for directory entries unless the 29“filetype” feature flag is not set, in which case it uses 30``struct ext4_dir_entry``. 31 32The original directory entry format is ``struct ext4_dir_entry``, which 33is at most 263 bytes long, though on disk you'll need to reference 34``dirent.rec_len`` to know for sure. 35 36.. list-table:: 37 :widths: 8 8 24 40 38 :header-rows: 1 39 40 * - Offset 41 - Size 42 - Name 43 - Description 44 * - 0x0 45 - __le32 46 - inode 47 - Number of the inode that this directory entry points to. 48 * - 0x4 49 - __le16 50 - rec_len 51 - Length of this directory entry. Must be a multiple of 4. 52 * - 0x6 53 - __le16 54 - name_len 55 - Length of the file name. 56 * - 0x8 57 - char 58 - name[EXT4_NAME_LEN] 59 - File name. 60 61Since file names cannot be longer than 255 bytes, the new directory 62entry format shortens the name_len field and uses the space for a file 63type flag, probably to avoid having to load every inode during directory 64tree traversal. This format is ``ext4_dir_entry_2``, which is at most 65263 bytes long, though on disk you'll need to reference 66``dirent.rec_len`` to know for sure. 67 68.. list-table:: 69 :widths: 8 8 24 40 70 :header-rows: 1 71 72 * - Offset 73 - Size 74 - Name 75 - Description 76 * - 0x0 77 - __le32 78 - inode 79 - Number of the inode that this directory entry points to. 80 * - 0x4 81 - __le16 82 - rec_len 83 - Length of this directory entry. 84 * - 0x6 85 - __u8 86 - name_len 87 - Length of the file name. 88 * - 0x7 89 - __u8 90 - file_type 91 - File type code, see ftype_ table below. 92 * - 0x8 93 - char 94 - name[EXT4_NAME_LEN] 95 - File name. 96 97.. _ftype: 98 99The directory file type is one of the following values: 100 101.. list-table:: 102 :widths: 16 64 103 :header-rows: 1 104 105 * - Value 106 - Description 107 * - 0x0 108 - Unknown. 109 * - 0x1 110 - Regular file. 111 * - 0x2 112 - Directory. 113 * - 0x3 114 - Character device file. 115 * - 0x4 116 - Block device file. 117 * - 0x5 118 - FIFO. 119 * - 0x6 120 - Socket. 121 * - 0x7 122 - Symbolic link. 123 124To support directories that are both encrypted and casefolded directories, we 125must also include hash information in the directory entry. We append 126``ext4_extended_dir_entry_2`` to ``ext4_dir_entry_2`` except for the entries 127for dot and dotdot, which are kept the same. The structure follows immediately 128after ``name`` and is included in the size listed by ``rec_len`` If a directory 129entry uses this extension, it may be up to 271 bytes. 130 131.. list-table:: 132 :widths: 8 8 24 40 133 :header-rows: 1 134 135 * - Offset 136 - Size 137 - Name 138 - Description 139 * - 0x0 140 - __le32 141 - hash 142 - The hash of the directory name 143 * - 0x4 144 - __le32 145 - minor_hash 146 - The minor hash of the directory name 147 148 149In order to add checksums to these classic directory blocks, a phony 150``struct ext4_dir_entry`` is placed at the end of each leaf block to 151hold the checksum. The directory entry is 12 bytes long. The inode 152number and name_len fields are set to zero to fool old software into 153ignoring an apparently empty directory entry, and the checksum is stored 154in the place where the name normally goes. The structure is 155``struct ext4_dir_entry_tail``: 156 157.. list-table:: 158 :widths: 8 8 24 40 159 :header-rows: 1 160 161 * - Offset 162 - Size 163 - Name 164 - Description 165 * - 0x0 166 - __le32 167 - det_reserved_zero1 168 - Inode number, which must be zero. 169 * - 0x4 170 - __le16 171 - det_rec_len 172 - Length of this directory entry, which must be 12. 173 * - 0x6 174 - __u8 175 - det_reserved_zero2 176 - Length of the file name, which must be zero. 177 * - 0x7 178 - __u8 179 - det_reserved_ft 180 - File type, which must be 0xDE. 181 * - 0x8 182 - __le32 183 - det_checksum 184 - Directory leaf block checksum. 185 186The leaf directory block checksum is calculated against the FS UUID, the 187directory's inode number, the directory's inode generation number, and 188the entire directory entry block up to (but not including) the fake 189directory entry. 190 191Hash Tree Directories 192~~~~~~~~~~~~~~~~~~~~~ 193 194A linear array of directory entries isn't great for performance, so a 195new feature was added to ext3 to provide a faster (but peculiar) 196balanced tree keyed off a hash of the directory entry name. If the 197EXT4_INDEX_FL (0x1000) flag is set in the inode, this directory uses a 198hashed btree (htree) to organize and find directory entries. For 199backwards read-only compatibility with ext2, this tree is actually 200hidden inside the directory file, masquerading as “empty” directory data 201blocks! It was stated previously that the end of the linear directory 202entry table was signified with an entry pointing to inode 0; this is 203(ab)used to fool the old linear-scan algorithm into thinking that the 204rest of the directory block is empty so that it moves on. 205 206The root of the tree always lives in the first data block of the 207directory. By ext2 custom, the '.' and '..' entries must appear at the 208beginning of this first block, so they are put here as two 209``struct ext4_dir_entry_2`` s and not stored in the tree. The rest of 210the root node contains metadata about the tree and finally a hash->block 211map to find nodes that are lower in the htree. If 212``dx_root.info.indirect_levels`` is non-zero then the htree has two 213levels; the data block pointed to by the root node's map is an interior 214node, which is indexed by a minor hash. Interior nodes in this tree 215contains a zeroed out ``struct ext4_dir_entry_2`` followed by a 216minor_hash->block map to find leafe nodes. Leaf nodes contain a linear 217array of all ``struct ext4_dir_entry_2``; all of these entries 218(presumably) hash to the same value. If there is an overflow, the 219entries simply overflow into the next leaf node, and the 220least-significant bit of the hash (in the interior node map) that gets 221us to this next leaf node is set. 222 223To traverse the directory as a htree, the code calculates the hash of 224the desired file name and uses it to find the corresponding block 225number. If the tree is flat, the block is a linear array of directory 226entries that can be searched; otherwise, the minor hash of the file name 227is computed and used against this second block to find the corresponding 228third block number. That third block number will be a linear array of 229directory entries. 230 231To traverse the directory as a linear array (such as the old code does), 232the code simply reads every data block in the directory. The blocks used 233for the htree will appear to have no entries (aside from '.' and '..') 234and so only the leaf nodes will appear to have any interesting content. 235 236The root of the htree is in ``struct dx_root``, which is the full length 237of a data block: 238 239.. list-table:: 240 :widths: 8 8 24 40 241 :header-rows: 1 242 243 * - Offset 244 - Type 245 - Name 246 - Description 247 * - 0x0 248 - __le32 249 - dot.inode 250 - inode number of this directory. 251 * - 0x4 252 - __le16 253 - dot.rec_len 254 - Length of this record, 12. 255 * - 0x6 256 - u8 257 - dot.name_len 258 - Length of the name, 1. 259 * - 0x7 260 - u8 261 - dot.file_type 262 - File type of this entry, 0x2 (directory) (if the feature flag is set). 263 * - 0x8 264 - char 265 - dot.name[4] 266 - “.\0\0\0” 267 * - 0xC 268 - __le32 269 - dotdot.inode 270 - inode number of parent directory. 271 * - 0x10 272 - __le16 273 - dotdot.rec_len 274 - block_size - 12. The record length is long enough to cover all htree 275 data. 276 * - 0x12 277 - u8 278 - dotdot.name_len 279 - Length of the name, 2. 280 * - 0x13 281 - u8 282 - dotdot.file_type 283 - File type of this entry, 0x2 (directory) (if the feature flag is set). 284 * - 0x14 285 - char 286 - dotdot_name[4] 287 - “..\0\0” 288 * - 0x18 289 - __le32 290 - struct dx_root_info.reserved_zero 291 - Zero. 292 * - 0x1C 293 - u8 294 - struct dx_root_info.hash_version 295 - Hash type, see dirhash_ table below. 296 * - 0x1D 297 - u8 298 - struct dx_root_info.info_length 299 - Length of the tree information, 0x8. 300 * - 0x1E 301 - u8 302 - struct dx_root_info.indirect_levels 303 - Depth of the htree. Cannot be larger than 3 if the INCOMPAT_LARGEDIR 304 feature is set; cannot be larger than 2 otherwise. 305 * - 0x1F 306 - u8 307 - struct dx_root_info.unused_flags 308 - 309 * - 0x20 310 - __le16 311 - limit 312 - Maximum number of dx_entries that can follow this header, plus 1 for 313 the header itself. 314 * - 0x22 315 - __le16 316 - count 317 - Actual number of dx_entries that follow this header, plus 1 for the 318 header itself. 319 * - 0x24 320 - __le32 321 - block 322 - The block number (within the directory file) that goes with hash=0. 323 * - 0x28 324 - struct dx_entry 325 - entries[0] 326 - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. 327 328.. _dirhash: 329 330The directory hash is one of the following values: 331 332.. list-table:: 333 :widths: 16 64 334 :header-rows: 1 335 336 * - Value 337 - Description 338 * - 0x0 339 - Legacy. 340 * - 0x1 341 - Half MD4. 342 * - 0x2 343 - Tea. 344 * - 0x3 345 - Legacy, unsigned. 346 * - 0x4 347 - Half MD4, unsigned. 348 * - 0x5 349 - Tea, unsigned. 350 * - 0x6 351 - Siphash. 352 353Interior nodes of an htree are recorded as ``struct dx_node``, which is 354also the full length of a data block: 355 356.. list-table:: 357 :widths: 8 8 24 40 358 :header-rows: 1 359 360 * - Offset 361 - Type 362 - Name 363 - Description 364 * - 0x0 365 - __le32 366 - fake.inode 367 - Zero, to make it look like this entry is not in use. 368 * - 0x4 369 - __le16 370 - fake.rec_len 371 - The size of the block, in order to hide all of the dx_node data. 372 * - 0x6 373 - u8 374 - name_len 375 - Zero. There is no name for this “unused” directory entry. 376 * - 0x7 377 - u8 378 - file_type 379 - Zero. There is no file type for this “unused” directory entry. 380 * - 0x8 381 - __le16 382 - limit 383 - Maximum number of dx_entries that can follow this header, plus 1 for 384 the header itself. 385 * - 0xA 386 - __le16 387 - count 388 - Actual number of dx_entries that follow this header, plus 1 for the 389 header itself. 390 * - 0xE 391 - __le32 392 - block 393 - The block number (within the directory file) that goes with the lowest 394 hash value of this block. This value is stored in the parent block. 395 * - 0x12 396 - struct dx_entry 397 - entries[0] 398 - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. 399 400The hash maps that exist in both ``struct dx_root`` and 401``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes 402long: 403 404.. list-table:: 405 :widths: 8 8 24 40 406 :header-rows: 1 407 408 * - Offset 409 - Type 410 - Name 411 - Description 412 * - 0x0 413 - __le32 414 - hash 415 - Hash code. 416 * - 0x4 417 - __le32 418 - block 419 - Block number (within the directory file, not filesystem blocks) of the 420 next node in the htree. 421 422(If you think this is all quite clever and peculiar, so does the 423author.) 424 425If metadata checksums are enabled, the last 8 bytes of the directory 426block (precisely the length of one dx_entry) are used to store a 427``struct dx_tail``, which contains the checksum. The ``limit`` and 428``count`` entries in the dx_root/dx_node structures are adjusted as 429necessary to fit the dx_tail into the block. If there is no space for 430the dx_tail, the user is notified to run e2fsck -D to rebuild the 431directory index (which will ensure that there's space for the checksum. 432The dx_tail structure is 8 bytes long and looks like this: 433 434.. list-table:: 435 :widths: 8 8 24 40 436 :header-rows: 1 437 438 * - Offset 439 - Type 440 - Name 441 - Description 442 * - 0x0 443 - u32 444 - dt_reserved 445 - Zero. 446 * - 0x4 447 - __le32 448 - dt_checksum 449 - Checksum of the htree directory block. 450 451The checksum is calculated against the FS UUID, the htree index header 452(dx_root or dx_node), all of the htree indices (dx_entry) that are in 453use, and the tail block (dx_tail).