.. SPDX-License-Identifier: GPL-2.0

=======================
Squashfs 4.0 Filesystem
=======================

Squashfs is a compressed read-only filesystem for Linux.

It uses zlib, lz4, lzo, or xz compression to compress files, inodes and
directories.  Inodes in the system are very small and all blocks are packed to
minimise data overhead.  Block sizes greater than 4 KiB are supported up to a
maximum of 1 MiB (default block size 128 KiB).

Squashfs is intended for general read-only filesystem use, for archival
use (i.e. in cases where a .tar.gz file may be used), and in constrained
block device/memory systems (e.g. embedded systems) where low overhead is
needed.

Mailing list: squashfs-devel@lists.sourceforge.net
Web site: www.squashfs.org

1. Filesystem Features
----------------------

Squashfs filesystem features versus Cramfs:

============================== ============ ==========
Feature                        Squashfs     Cramfs
============================== ============ ==========
Max filesystem size            2^64 bytes   256 MiB
Max file size                  ~ 2 TiB      16 MiB
Max files                      unlimited    unlimited
Max directories                unlimited    unlimited
Max entries per directory      unlimited    unlimited
Max block size                 1 MiB        4 KiB
Metadata compression           yes          no
Directory indexes              yes          no
Sparse file support            yes          no
Tail-end packing (fragments)   yes          no
Exportable (NFS etc.)          yes          no
Hard link support              yes          no
"." and ".." in readdir        yes          no
Real inode numbers             yes          no
32-bit uids/gids               yes          no
File creation time             yes          no
Xattr support                  yes          no
ACL support                    no           no
============================== ============ ==========

Squashfs compresses data, inodes and directories.  In addition, inode and
directory data are highly compacted, and packed on byte boundaries.  Each
compressed inode is on average 8 bytes in length (the exact length varies by
file type: regular file, directory, symbolic link, and block/char device
inodes have different sizes).

2. Using Squashfs
-----------------

As squashfs is a read-only filesystem, the mksquashfs program must be used to
create populated squashfs filesystems.  This and other squashfs utilities,
along with usage instructions, can be obtained from http://www.squashfs.org.

The squashfs-tools development tree is now located on kernel.org::

	git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git

3. Squashfs Filesystem Design
-----------------------------

A squashfs filesystem consists of a maximum of nine parts, packed together on a
byte alignment::

	 ---------------
	|  superblock   |
	|---------------|
	|  compression  |
	|    options    |
	|---------------|
	|  datablocks   |
	|  & fragments  |
	|---------------|
	|  inode table  |
	|---------------|
	|   directory   |
	|     table     |
	|---------------|
	|   fragment    |
	|    table      |
	|---------------|
	|    export     |
	|    table      |
	|---------------|
	|    uid/gid    |
	|  lookup table |
	|---------------|
	|     xattr     |
	|     table     |
	 ---------------

Compressed data blocks are written to the filesystem as files are read from
the source directory, and checked for duplicates.  Once all file data has been
written, the completed inode, directory, fragment, export, uid/gid lookup and
xattr tables are written.

3.1 Compression options
-----------------------

Compressors can optionally support compression-specific options (e.g.
dictionary size).  If non-default compression options have been used, then
these are stored here.

3.2 Inodes
----------

Metadata (inodes and directories) is compressed in 8 KiB blocks.  Each
compressed block is prefixed by a two-byte length; the top bit is set if the
block is uncompressed.  A block will be uncompressed if the -noI option is set,
or if the compressed block was larger than the uncompressed block.
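
The two-byte prefix described above can be decoded with a few lines of Python.
This is only a minimal sketch: the helper name ``parse_metadata_header`` is
hypothetical, and the little-endian byte order is an assumption that this
document does not state.

```python
import struct

# Top bit of the 16-bit length: set when the block is stored uncompressed.
UNCOMPRESSED_BIT = 1 << 15


def parse_metadata_header(raw: bytes):
    """Return (stored_length, is_compressed) for a metadata block prefix.

    Assumption: the two-byte length is little-endian.
    """
    (header,) = struct.unpack("<H", raw[:2])
    is_compressed = not (header & UNCOMPRESSED_BIT)
    stored_length = header & ~UNCOMPRESSED_BIT & 0xFFFF
    return stored_length, is_compressed
```

A reader would call this on the first two bytes of each metadata block, then
read ``stored_length`` bytes and decompress them only if ``is_compressed`` is
true.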
    119
    120Inodes are packed into the metadata blocks, and are not aligned to block
    121boundaries, therefore inodes overlap compressed blocks.  Inodes are identified
    122by a 48-bit number which encodes the location of the compressed metadata block
    123containing the inode, and the byte offset into that block where the inode is
    124placed (<block, offset>).
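
As an illustration of the <block, offset> encoding, the sketch below packs and
unpacks a 48-bit inode number.  The split into a 32-bit block location in the
upper bits and a 16-bit offset in the lower bits is an assumption (it is the
natural split for 8 KiB metadata blocks, but this document does not spell it
out), and both function names are hypothetical.

```python
OFFSET_BITS = 16  # assumption: low 16 bits hold the offset within the block


def split_inode_ref(ref: int):
    """Split a 48-bit inode number into (block_location, offset)."""
    if not 0 <= ref < (1 << 48):
        raise ValueError("inode reference must fit in 48 bits")
    return ref >> OFFSET_BITS, ref & ((1 << OFFSET_BITS) - 1)


def make_inode_ref(block_location: int, offset: int) -> int:
    """Inverse of split_inode_ref()."""
    return (block_location << OFFSET_BITS) | offset
```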

To maximise compression there are different inodes for each file type
(regular file, directory, device, etc.), the inode contents and length
varying with the type.

To further maximise compression, two types of regular file inode and
directory inode are defined: inodes optimised for frequently occurring
regular files and directories, and extended types where extra
information has to be stored.

3.3 Directories
---------------

Like inodes, directories are packed into compressed metadata blocks, stored
in a directory table.  Directories are accessed using the start address of
the metablock containing the directory and the offset into the
decompressed block (<block, offset>).

Directories are organised in a slightly complex way, and are not simply
a list of file names.  The organisation takes advantage of the
fact that (in most cases) the inodes of the files will be in the same
compressed metadata block, and can therefore share the start block.
Directories are therefore organised as a two-level list: a directory
header containing the shared start block value, followed by a sequence of
directory entries, each of which uses that shared start block.  A new
directory header is written whenever the inode start block changes.  The
directory header/directory entry list is repeated as many times as necessary.
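
The two-level organisation can be sketched as a grouping step over the sorted
entries.  This is a simplified model, not the on-disk layout: the tuple shape
``(name, inode_start_block, inode_offset)`` and the function name are
hypothetical, and real headers carry more fields than the start block.

```python
from itertools import groupby


def build_directory(entries):
    """Group sorted directory entries under shared-start-block headers.

    entries: list of (name, inode_start_block, inode_offset), sorted by name.
    Returns a list of (header, entry_list) pairs.  A new header is started
    whenever the inode start block changes from one entry to the next.
    """
    listing = []
    for start_block, run in groupby(entries, key=lambda e: e[1]):
        header = {"start_block": start_block}
        listing.append((header, [(name, offset) for name, _, offset in run]))
    return listing
```

Because ``groupby`` only merges consecutive items, an entry whose inode lives
in a different metadata block always begins a new header, even if a later
entry returns to an earlier block.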

Directories are sorted in alphabetical order, and can contain a directory
index to speed up file lookup.  Directory indexes store one entry per
metablock, each entry storing the index/filename mapping to the first
directory header in each metadata block.  At lookup the index is scanned
linearly, looking for the first filename alphabetically larger than the
filename being looked up.  At this point the location of the metadata block
the filename is in has been found: it is the block given by the preceding
index entry.
The general idea of the index is to ensure only one metadata block needs to be
decompressed to do a lookup irrespective of the length of the directory.
This scheme has the advantage that it doesn't require extra memory overhead
and doesn't require much extra storage on disk.
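
The linear index scan can be sketched as follows.  The index is modelled as a
list of ``(first_name, metablock)`` pairs, which is an illustrative
simplification, and Python string comparison stands in for whatever byte-wise
comparison an implementation actually uses.

```python
def find_metablock(index, dir_start, name):
    """Return the metadata block that should contain `name`.

    index: list of (first_name, metablock) pairs in sorted order, one per
           metadata block covered by the directory index.
    dir_start: block to search when `name` sorts before every index entry.
    """
    found = dir_start
    for first_name, metablock in index:
        if first_name > name:
            # First filename larger than the target: stop scanning.
            break
        found = metablock
    return found
```

Only the block returned here needs to be decompressed and searched, however
many blocks the directory occupies.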

3.4 File data
-------------

Regular files consist of a sequence of contiguous compressed blocks, and/or a
compressed fragment block (tail-end packed block).  The compressed size
of each datablock is stored in a block list contained within the
file inode.
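
Because only sizes are stored, finding where datablock ``n`` starts means
summing the sizes of the blocks before it.  The sketch below shows that walk.
It assumes each block list entry is a 32-bit value whose bit 24 marks an
uncompressed block and whose remaining low bits give the stored size; that
encoding and the function name are assumptions, not something this document
states.

```python
UNCOMPRESSED_BIT = 1 << 24  # assumption: flag bit inside each block list entry


def locate_block(file_start, block_list, n):
    """Return (disk_offset, stored_size, is_compressed) for datablock n.

    file_start: disk offset of the file's first datablock.
    block_list: one entry per datablock, in file order.
    """
    offset = file_start
    for entry in block_list[:n]:
        offset += entry & ~UNCOMPRESSED_BIT  # add stored size of earlier blocks
    entry = block_list[n]
    stored_size = entry & ~UNCOMPRESSED_BIT
    return offset, stored_size, not (entry & UNCOMPRESSED_BIT)
```

The cost of this walk grows with ``n``, which is why caching the mapping from
block index to disk location pays off for large files.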

To speed up access to datablocks when reading 'large' files (256 MiB or
larger), the code implements an index cache that caches the mapping from
block index to datablock location on disk.

The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
retaining a simple and space-efficient block list on disk.  The cache
is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
Larger files use multiple slots, with 1.75 TiB files using all 8 slots.
The index cache is designed to be memory efficient, and by default uses
16 KiB.
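
The figures in the paragraph above are consistent with each other, as this
short check shows.  It uses only the numbers quoted in the text; the variable
names are illustrative.

```python
KIB, GIB, TIB = 2**10, 2**30, 2**40

SLOTS = 8                   # slots in the index cache
BYTES_PER_SLOT = 224 * GIB  # largest file one slot can cover
BLOCK_SIZE = 128 * KIB      # block size the figures are quoted for
CACHE_BYTES = 16 * KIB      # default memory used by the cache

# Largest file covered when all slots are used by a single file, in TiB.
max_file_tib = SLOTS * BYTES_PER_SLOT / TIB

# Number of datablocks whose locations one slot has to span.
blocks_per_slot = BYTES_PER_SLOT // BLOCK_SIZE

# Memory available to each slot.
bytes_per_slot_in_memory = CACHE_BYTES // SLOTS
```

With these inputs, ``max_file_tib`` comes to 1.75, matching the stated limit,
and each slot has 2 KiB of memory to span 1,835,008 datablocks.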

3.5 Fragment lookup table
-------------------------

Regular files can contain a fragment index which is mapped to a fragment
location on disk and compressed size using a fragment lookup table.  This
fragment lookup table is itself stored compressed into metadata blocks.
A second index table is used to locate these.  For speed of access, and
because it is small, this second index table is read at mount time and cached
in memory.
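
The two-level lookup can be sketched as simple index arithmetic.  The sketch
assumes fixed-size table entries packed back to back into 8 KiB metadata
blocks; the entry size is passed in as a parameter because this document does
not give it, and the function name is hypothetical.

```python
METADATA_SIZE = 8192  # uncompressed metadata block size (section 3.2)


def locate_entry(index, entry_size, index_table):
    """Map a table index to (metadata_block_location, offset_in_block).

    index_table: the small second-level table cached at mount time, holding
                 the disk location of each metadata block of the lookup table.
    """
    entries_per_block = METADATA_SIZE // entry_size
    block_no, slot = divmod(index, entries_per_block)
    return index_table[block_no], slot * entry_size
```

With a hypothetical 16-byte entry, 512 entries fit in each metadata block, so
index 512 resolves to offset 0 of the second block listed in ``index_table``.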

3.6 Uid/gid lookup table
------------------------

For space efficiency regular files store uid and gid indexes, which are
converted to 32-bit uids/gids using an id lookup table.  This table is
stored compressed into metadata blocks.  A second index table is used to
locate these.  For speed of access, and because it is small, this second
index table is read at mount time and cached in memory.

3.7 Export table
----------------

To enable Squashfs filesystems to be exportable (via NFS etc.), filesystems
can optionally (disabled with the -no-exports mksquashfs option) contain
an inode number to inode disk location lookup table.  This is required to
enable Squashfs to map inode numbers passed in filehandles to the inode
location on disk, which is necessary when the export code reinstantiates
expired/flushed inodes.

This table is stored compressed into metadata blocks.  A second index table is
used to locate these.  For speed of access, and because it is small, this
second index table is read at mount time and cached in memory.

3.8 Xattr table
---------------

The xattr table contains extended attributes for each inode.  The xattrs
for each inode are stored in a list, each list entry containing a type,
name and value field.  The type field encodes the xattr prefix
("user.", "trusted." etc.) and it also encodes how the name/value fields
should be interpreted.  Currently the type indicates whether the value
is stored inline (in which case the value field contains the xattr value),
or if it is stored out of line (in which case the value field stores a
reference to where the actual value is stored).  This allows large values
to be stored out of line, improving scanning and lookup performance, and it
also allows values to be de-duplicated, the value being stored once, and
all other occurrences holding an out of line reference to that value.
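
A sketch of how such a type field might be decoded is shown below.  The
numeric values (prefix ids 0, 1 and 2, an 8-bit prefix mask, and 0x100 as the
out-of-line flag) are assumptions made for illustration; this document only
says that the type encodes the prefix and the inline/out-of-line distinction.

```python
PREFIX_MASK = 0xFF       # assumption: low byte selects the name prefix
VALUE_OUT_OF_LINE = 0x100  # assumption: flag bit for out-of-line values

# Assumed prefix ids; only "user." and "trusted." are named in the text.
PREFIXES = {0: "user.", 1: "trusted.", 2: "security."}


def decode_xattr_type(type_field: int):
    """Return (prefix, value_is_out_of_line) for an xattr type field."""
    prefix = PREFIXES.get(type_field & PREFIX_MASK)
    if prefix is None:
        raise ValueError(f"unknown xattr prefix id in type {type_field:#x}")
    return prefix, bool(type_field & VALUE_OUT_OF_LINE)
```

When the second element is true, the value field is read as a reference to the
stored value rather than as the value itself.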

The xattr lists are packed into compressed 8 KiB metadata blocks.
To reduce overhead in inodes, rather than storing the on-disk
location of the xattr list inside each inode, a 32-bit xattr id
is stored.  This xattr id is mapped into the location of the xattr
list using a second xattr id lookup table.

4. TODOs and Outstanding Issues
-------------------------------

4.1 TODO list
-------------

Implement ACL support.

4.2 Squashfs Internal Cache
---------------------------

Blocks in Squashfs are compressed.  To avoid repeatedly decompressing
recently accessed data Squashfs uses two small metadata and fragment caches.

The cache is not used for file datablocks; these are decompressed and cached in
the page-cache in the normal way.  The cache is used to temporarily cache
fragment and metadata blocks which have been read as a result of a metadata
(i.e. inode or directory) or fragment access.  Because metadata and fragments
are packed together into blocks (to gain greater compression), the read of a
particular piece of metadata or fragment will retrieve other metadata/fragments
which have been packed with it.  Because of locality of reference, these may be
read in the near future.  Temporarily caching them ensures they are available
for near future access without requiring an additional read and decompress.
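
The benefit of such a cache can be illustrated with a small sketch.  It uses a
plain least-recently-used policy as a stand-in; this document does not describe
the replacement policy Squashfs actually uses, and the class name and the
``read_and_decompress`` callback are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Tiny cache of decompressed blocks, keyed by on-disk start address."""

    def __init__(self, capacity, read_and_decompress):
        self._capacity = capacity
        self._load = read_and_decompress  # callback: block_start -> bytes
        self._entries = OrderedDict()

    def get(self, block_start):
        if block_start in self._entries:
            # Hit: no read or decompression needed; mark as recently used.
            self._entries.move_to_end(block_start)
            return self._entries[block_start]
        data = self._load(block_start)
        self._entries[block_start] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data
```

Two inodes packed into the same metadata block then cost one read and one
decompression between them, as long as the second access arrives before the
block is evicted.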

In the future this internal cache may be replaced with an implementation which
uses the kernel page cache.  Because the page cache operates on page sized
units this may introduce additional complexity in terms of locking and
associated race conditions.