cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux
Log | Files | Refs | README | LICENSE | sfeed.txt

zram.rst (15291B)


      1========================================
      2zram: Compressed RAM-based block devices
      3========================================
      4
      5Introduction
      6============
      7
      8The zram module creates RAM-based block devices named /dev/zram<id>
      9(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
     10in memory itself. These disks allow very fast I/O and compression provides
     11good amounts of memory savings. Some of the use cases include /tmp storage,
     12use as swap disks, various caches under /var and maybe many more. :)
     13
     14Statistics for individual zram devices are exported through sysfs nodes at
     15/sys/block/zram<id>/
     16
     17Usage
     18=====
     19
     20There are several ways to configure and manage zram device(-s):
     21
     22a) using zram and zram_control sysfs attributes
     23b) using zramctl utility, provided by util-linux (util-linux@vger.kernel.org).
     24
     25In this document we will describe only 'manual' zram configuration steps,
     26IOW, zram and zram_control sysfs attributes.
     27
     28In order to get a better idea about zramctl please consult util-linux
     29documentation, zramctl man-page or `zramctl --help`. Please be informed
     30that zram maintainers do not develop/maintain util-linux or zramctl, should
     31you have any questions please contact util-linux@vger.kernel.org
     32
     33Following shows a typical sequence of steps for using zram.
     34
     35WARNING
     36=======
     37
     38For the sake of simplicity we skip error checking parts in most of the
     39examples below. However, it is your sole responsibility to handle errors.
     40
     41zram sysfs attributes always return negative values in case of errors.
     42The list of possible return codes:
     43
     44========  =============================================================
     45-EBUSY	  an attempt to modify an attribute that cannot be changed once
     46	  the device has been initialised. Please reset device first.
     47-ENOMEM	  zram was not able to allocate enough memory to fulfil your
     48	  needs.
     49-EINVAL	  invalid input has been provided.
     50========  =============================================================
     51
     52If you use 'echo', the returned value is set by the 'echo' utility,
     53and, in general case, something like::
     54
     55	echo 3 > /sys/block/zram0/max_comp_streams
     56	if [ $? -ne 0 ]; then
     57		handle_error
     58	fi
     59
     60should suffice.
     61
     621) Load Module
     63==============
     64
     65::
     66
     67	modprobe zram num_devices=4
     68
     69This creates 4 devices: /dev/zram{0,1,2,3}
     70
     71num_devices parameter is optional and tells zram how many devices should be
     72pre-created. Default: 1.
     73
     742) Set max number of compression streams
     75========================================
     76
     77Regardless of the value passed to this attribute, ZRAM will always
     78allocate multiple compression streams - one per online CPU - thus
     79allowing several concurrent compression operations. The number of
     80allocated compression streams goes down when some of the CPUs
     81become offline. There is no single-compression-stream mode anymore,
     82unless you are running a UP system or have only 1 CPU online.
     83
     84To find out how many streams are currently available::
     85
     86	cat /sys/block/zram0/max_comp_streams
     87
     883) Select compression algorithm
     89===============================
     90
     91Using comp_algorithm device attribute one can see available and
     92currently selected (shown in square brackets) compression algorithms,
     93or change the selected compression algorithm (once the device is initialised
     94there is no way to change compression algorithm).
     95
     96Examples::
     97
     98	#show supported compression algorithms
     99	cat /sys/block/zram0/comp_algorithm
    100	lzo [lz4]
    101
    102	#select lzo compression algorithm
    103	echo lzo > /sys/block/zram0/comp_algorithm
    104
    105For the time being, the `comp_algorithm` content does not necessarily
    106show every compression algorithm supported by the kernel. We keep this
    107list primarily to simplify device configuration and one can configure
    108a new device with a compression algorithm that is not listed in
    109`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API
    110and, if some of the algorithms were built as modules, it's impossible
    111to list all of them using, for instance, /proc/crypto or any other
    112method. This, however, has an advantage of permitting the usage of
    113custom crypto compression modules (implementing S/W or H/W compression).
    114
    1154) Set Disksize
    116===============
    117
    118Set disk size by writing the value to sysfs node 'disksize'.
    119The value can be either in bytes or you can use mem suffixes.
    120Examples::
    121
    122	# Initialize /dev/zram0 with 50MB disksize
    123	echo $((50*1024*1024)) > /sys/block/zram0/disksize
    124
    125	# Using mem suffixes
    126	echo 256K > /sys/block/zram0/disksize
    127	echo 512M > /sys/block/zram0/disksize
    128	echo 1G > /sys/block/zram0/disksize
    129
    130Note:
    131There is little point creating a zram of greater than twice the size of memory
    132since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
    133size of the disk when not in use so a huge zram is wasteful.
    134
    1355) Set memory limit: Optional
    136=============================
    137
    138Set memory limit by writing the value to sysfs node 'mem_limit'.
    139The value can be either in bytes or you can use mem suffixes.
    140In addition, you could change the value in runtime.
    141Examples::
    142
    143	# limit /dev/zram0 with 50MB memory
    144	echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
    145
    146	# Using mem suffixes
    147	echo 256K > /sys/block/zram0/mem_limit
    148	echo 512M > /sys/block/zram0/mem_limit
    149	echo 1G > /sys/block/zram0/mem_limit
    150
    151	# To disable memory limit
    152	echo 0 > /sys/block/zram0/mem_limit
    153
    1546) Activate
    155===========
    156
    157::
    158
    159	mkswap /dev/zram0
    160	swapon /dev/zram0
    161
    162	mkfs.ext4 /dev/zram1
    163	mount /dev/zram1 /tmp
    164
    1657) Add/remove zram devices
    166==========================
    167
    168zram provides a control interface, which enables dynamic (on-demand) device
    169addition and removal.
    170
    171In order to add a new /dev/zramX device, perform a read operation on the hot_add
    172attribute. This will return either the new device's device id (meaning that you
    173can use /dev/zram<id>) or an error code.
    174
    175Example::
    176
    177	cat /sys/class/zram-control/hot_add
    178	1
    179
    180To remove the existing /dev/zramX device (where X is a device id)
    181execute::
    182
    183	echo X > /sys/class/zram-control/hot_remove
    184
    1858) Stats
    186========
    187
    188Per-device statistics are exported as various nodes under /sys/block/zram<id>/
    189
    190A brief description of exported device attributes follows. For more details
    191please read Documentation/ABI/testing/sysfs-block-zram.
    192
    193======================  ======  ===============================================
    194Name            	access            description
    195======================  ======  ===============================================
    196disksize          	RW	show and set the device's disk size
    197initstate         	RO	shows the initialization state of the device
    198reset             	WO	trigger device reset
    199mem_used_max      	WO	reset the `mem_used_max` counter (see later)
    200mem_limit         	WO	specifies the maximum amount of memory ZRAM can
    201				use to store the compressed data
    202writeback_limit   	WO	specifies the maximum amount of write IO zram
    203				can write out to backing device as 4KB unit
    204writeback_limit_enable  RW	show and set writeback_limit feature
    205max_comp_streams  	RW	the number of possible concurrent compress
    206				operations
    207comp_algorithm    	RW	show and change the compression algorithm
    208compact           	WO	trigger memory compaction
    209debug_stat        	RO	this file is used for zram debugging purposes
    210backing_dev	  	RW	set up backend storage for zram to write out
    211idle		  	WO	mark allocated slot as idle
    212======================  ======  ===============================================
    213
    214
    215User space is advised to use the following files to read the device statistics.
    216
    217File /sys/block/zram<id>/stat
    218
    219Represents block layer statistics. Read Documentation/block/stat.rst for
    220details.
    221
    222File /sys/block/zram<id>/io_stat
    223
    224The stat file represents device's I/O statistics not accounted by block
    225layer and, thus, not available in zram<id>/stat file. It consists of a
    226single line of text and contains the following stats separated by
    227whitespace:
    228
    229 =============    =============================================================
    230 failed_reads     The number of failed reads
    231 failed_writes    The number of failed writes
    232 invalid_io       The number of non-page-size-aligned I/O requests
    233 notify_free      Depending on device usage scenario it may account
    234
    235                  a) the number of pages freed because of swap slot free
    236                     notifications
    237                  b) the number of pages freed because of
    238                     REQ_OP_DISCARD requests sent by bio. The former ones are
    239                     sent to a swap block device when a swap slot is freed,
    240                     which implies that this disk is being used as a swap disk.
    241
    242                  The latter ones are sent by filesystem mounted with
    243                  discard option, whenever some data blocks are getting
    244                  discarded.
    245 =============    =============================================================
    246
    247File /sys/block/zram<id>/mm_stat
    248
    249The mm_stat file represents the device's mm statistics. It consists of a single
    250line of text and contains the following stats separated by whitespace:
    251
    252 ================ =============================================================
    253 orig_data_size   uncompressed size of data stored in this disk.
    254                  Unit: bytes
    255 compr_data_size  compressed size of data stored in this disk
    256 mem_used_total   the amount of memory allocated for this disk. This
    257                  includes allocator fragmentation and metadata overhead,
    258                  allocated for this disk. So, allocator space efficiency
    259                  can be calculated using compr_data_size and this statistic.
    260                  Unit: bytes
    261 mem_limit        the maximum amount of memory ZRAM can use to store
    262                  the compressed data
    263 mem_used_max     the maximum amount of memory zram has consumed to
    264                  store the data
    265 same_pages       the number of same element filled pages written to this disk.
    266                  No memory is allocated for such pages.
    267 pages_compacted  the number of pages freed during compaction
    268 huge_pages	  the number of incompressible pages
    269 huge_pages_since the number of incompressible pages since zram set up
    270 ================ =============================================================
    271
    272File /sys/block/zram<id>/bd_stat
    273
    274The bd_stat file represents a device's backing device statistics. It consists of
    275a single line of text and contains the following stats separated by whitespace:
    276
    277 ============== =============================================================
    278 bd_count	size of data written in backing device.
    279		Unit: 4K bytes
    280 bd_reads	the number of reads from backing device
    281		Unit: 4K bytes
    282 bd_writes	the number of writes to backing device
    283		Unit: 4K bytes
    284 ============== =============================================================
    285
    2869) Deactivate
    287=============
    288
    289::
    290
    291	swapoff /dev/zram0
    292	umount /dev/zram1
    293
    29410) Reset
    295=========
    296
    297	Write any positive value to 'reset' sysfs node::
    298
    299		echo 1 > /sys/block/zram0/reset
    300		echo 1 > /sys/block/zram1/reset
    301
    302	This frees all the memory allocated for the given device and
    303	resets the disksize to zero. You must set the disksize again
    304	before reusing the device.
    305
    306Optional Feature
    307================
    308
    309writeback
    310---------
    311
    312With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
    313to backing storage rather than keeping it in memory.
    314To use the feature, admin should set up backing device via::
    315
    316	echo /dev/sda5 > /sys/block/zramX/backing_dev
    317
    318before disksize setting. It supports only partitions at this moment.
    319If admin wants to use incompressible page writeback, they could do it via::
    320
    321	echo huge > /sys/block/zramX/writeback
    322
    323To use idle page writeback, first, user need to declare zram pages
    324as idle::
    325
    326	echo all > /sys/block/zramX/idle
    327
    328From now on, any pages on zram are idle pages. The idle mark
    329will be removed until someone requests access of the block.
    330IOW, unless there is access request, those pages are still idle pages.
    331Additionally, when CONFIG_ZRAM_MEMORY_TRACKING is enabled pages can be
    332marked as idle based on how long (in seconds) it's been since they were
    333last accessed::
    334
    335        echo 86400 > /sys/block/zramX/idle
    336
    337In this example all pages which haven't been accessed in more than 86400
    338seconds (one day) will be marked idle.
    339
    340Admin can request writeback of those idle pages at right timing via::
    341
    342	echo idle > /sys/block/zramX/writeback
    343
    344With the command, zram will writeback idle pages from memory to the storage.
    345
    346Additionally, if a user choose to writeback only huge and idle pages
    347this can be accomplished with::
    348
    349        echo huge_idle > /sys/block/zramX/writeback
    350
    351If an admin wants to write a specific page in zram device to the backing device,
    352they could write a page index into the interface.
    353
    354	echo "page_index=1251" > /sys/block/zramX/writeback
    355
    356If there are lots of write IO with flash device, potentially, it has
    357flash wearout problem so that admin needs to design write limitation
    358to guarantee storage health for entire product life.
    359
    360To overcome the concern, zram supports "writeback_limit" feature.
    361The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
    362any writeback. IOW, if admin wants to apply writeback budget, they should
    363enable writeback_limit_enable via::
    364
    365	$ echo 1 > /sys/block/zramX/writeback_limit_enable
    366
    367Once writeback_limit_enable is set, zram doesn't allow any writeback
    368until admin sets the budget via /sys/block/zramX/writeback_limit.
    369
    370(If admin doesn't enable writeback_limit_enable, writeback_limit's value
    371assigned via /sys/block/zramX/writeback_limit is meaningless.)
    372
    373If admin wants to limit writeback as per-day 400M, they could do it
    374like below::
    375
    376	$ MB_SHIFT=20
    377	$ 4K_SHIFT=12
    378	$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
    379		/sys/block/zram0/writeback_limit.
    380	$ echo 1 > /sys/block/zram0/writeback_limit_enable
    381
    382If admins want to allow further write again once the budget is exhausted,
    383they could do it like below::
    384
    385	$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
    386		/sys/block/zram0/writeback_limit
    387
    388If an admin wants to see the remaining writeback budget since last set::
    389
    390	$ cat /sys/block/zramX/writeback_limit
    391
    392If an admin wants to disable writeback limit, they could do::
    393
    394	$ echo 0 > /sys/block/zramX/writeback_limit_enable
    395
    396The writeback_limit count will reset whenever you reset zram (e.g.,
    397system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
    398writeback happened until you reset the zram to allocate extra writeback
    399budget in next setting is user's job.
    400
    401If admin wants to measure writeback count in a certain period, they could
    402know it via /sys/block/zram0/bd_stat's 3rd column.
    403
    404memory tracking
    405===============
    406
    407With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
    408zram block. It could be useful to catch cold or incompressible
    409pages of the process with*pagemap.
    410
    411If you enable the feature, you could see block state via
    412/sys/kernel/debug/zram/zram0/block_state". The output is as follows::
    413
    414	  300    75.033841 .wh.
    415	  301    63.806904 s...
    416	  302    63.806919 ..hi
    417
    418First column
    419	zram's block index.
    420Second column
    421	access time since the system was booted
    422Third column
    423	state of the block:
    424
    425	s:
    426		same page
    427	w:
    428		written page to backing store
    429	h:
    430		huge page
    431	i:
    432		idle page
    433
    434First line of above example says 300th block is accessed at 75.033841sec
    435and the block's state is huge so it is written back to the backing
    436storage. It's a debugging feature so anyone shouldn't rely on it to work
    437properly.
    438
    439Nitin Gupta
    440ngupta@vflare.org