=============
dm-log-writes
=============

This target takes 2 devices, one to pass all IO to normally, and one to log all
of the write operations to.  This is intended for file system developers wishing
to verify the integrity of metadata or data as the file system is written to.
There is a log_write_entry written for every WRITE request and the target is
able to take arbitrary data from userspace to insert into the log.  The data
that is in the WRITE requests is copied into the log to make the replay happen
exactly as it happened originally.

Log Ordering
============

We log things in order of completion once we are sure the write is no longer in
cache.  This means that normal WRITE requests are not actually logged until the
next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
the log in a way that correlates to what is on disk and not what is in cache,
to make it easier to detect improper waiting/flushing.

This works by attaching all WRITE requests to a list once the write completes.
Once we see a REQ_PREFLUSH request we splice this list onto the request and once
the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
simulate the worst case scenario with regard to power failures.  Consider the
following example (W means write, C means complete):

        W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following:

        W3,W2,flush,W1....

Again, this simulates what is actually on disk, which allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.
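
The ordering rule above can be sketched as a small bash model.  This is only an
illustration of the splice-on-flush behaviour described in this section, not the
kernel implementation; the function name ``simulate_log`` and the event tokens
(``W<n>``, ``C<n>``, ``Wflush``, ``Cflush``) are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Toy model of the ordering rules described above; not the kernel code.
# Events: W<n> = write issued, C<n> = write completed,
#         Wflush = flush issued, Cflush = flush completed.
simulate_log() {
    local pending=() spliced=() log=() ev
    for ev in "$@"; do
        case "$ev" in
            Wflush)
                # Splice the writes completed so far onto the flush.
                spliced=("${pending[@]}")
                pending=()
                ;;
            Cflush)
                # Log the spliced writes, then the flush itself.
                log+=("${spliced[@]}" flush)
                spliced=()
                ;;
            C*)
                # A completed write waits on the pending list.
                pending+=("W${ev#C}")
                ;;
            W*)
                # Issued but not yet complete: nothing to record.
                ;;
        esac
    done
    echo "logged: ${log[*]}"
    echo "pending: ${pending[*]}"
}

simulate_log W1 W2 W3 C3 C2 Wflush C1 Cflush
```

For the sequence from the example, the model reports ``W3 W2 flush`` as logged
and leaves ``W1`` pending, since W1 completed only after the flush was issued.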

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete as those requests will obviously bypass the device cache.

Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise we would
have all the DISCARD requests, and then the WRITE requests and then the FLUSH
request.  Consider the following example:

        WRITE block 1, DISCARD block 1, FLUSH

If we logged DISCARD when it completed, the replay would look like this:

        DISCARD 1, WRITE 1, FLUSH

which isn't quite what happened and wouldn't be caught during the log replay.

Target interface
================

i) Constructor

   log-writes <dev_path> <log_dev_path>

   ============= ==============================================
   dev_path      Device that all of the IO will go to normally.
   log_dev_path  Device where the log entries are written to.
   ============= ==============================================

ii) Status

    <#logged entries> <highest allocated sector>

    =========================== ========================
    #logged entries             Number of logged entries
    highest allocated sector    Highest allocated sector
    =========================== ========================

iii) Messages

    mark <description>

        You can use a dmsetup message to set an arbitrary mark in a log.
        For example say you want to fsck a file system after every
        write, but first you need to replay up to the mkfs to make sure
        we're fsck'ing something reasonable, you would do something like
        this::

          mkfs.btrfs -f /dev/mapper/log
          dmsetup message log 0 mark mkfs
          <run test>

        This would allow you to replay the log up to the mkfs mark and
        then replay from that point on doing the fsck check in the
        interval that you want.

        Every log has a mark at the end labeled "dm-log-writes-end".
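
As a rough illustration of the constructor and status formats above, the
following bash helpers build a table line and split a status line into its two
fields.  The helper names and the sample numbers are hypothetical.  The sketch
assumes the usual ``dmsetup`` conventions: a table line has the form
``<start> <length> <target> <args>`` with lengths in 512-byte sectors, and
``dmsetup status <name>`` prefixes the target's status with
``<start> <length> <target>``.

```shell
#!/usr/bin/env bash
# Build a table line: <start> <length> log-writes <dev_path> <log_dev_path>.
log_writes_table() {
    local sectors=$1 dev=$2 log_dev=$3
    printf '0 %s log-writes %s %s\n' "$sectors" "$dev" "$log_dev"
}

# Split one status line into the two fields documented above.
# Expected input: <start> <length> log-writes <#logged entries> <highest sector>
parse_log_writes_status() {
    local start length target entries highest
    read -r start length target entries highest <<<"$1"
    if [ "$target" != "log-writes" ]; then
        echo "not a log-writes target: $target" >&2
        return 1
    fi
    echo "entries=$entries highest_sector=$highest"
}

# Hypothetical 2097152-sector (1 GiB) device and a made-up status line.
log_writes_table 2097152 /dev/sdb /dev/sdc
parse_log_writes_status "0 2097152 log-writes 42 81920"
```

On a real system the sector count would come from ``blockdev --getsz`` and the
status line from ``dmsetup status log``, both run as root.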

Userspace component
===================

There is a userspace tool that will replay the log for you in various ways.
It can be found here: https://github.com/josefbacik/log-writes

Example usage
=============

Say you want to test fsync on your file system.  You would do something like
this::

  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs

  mount /dev/mapper/log /mnt/btrfs-test
  <some test that does fsync at the end>
  dmsetup message log 0 mark fsync
  md5sum /mnt/btrfs-test/foo
  umount /mnt/btrfs-test

  dmsetup remove log
  replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
  mount /dev/sdb /mnt/btrfs-test
  md5sum /mnt/btrfs-test/foo
  <verify md5sum's are correct>

Another option is to do a complicated file system operation and verify the file
system is consistent during the entire operation.  You could do this with::

  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs

  mount /dev/mapper/log /mnt/btrfs-test
  <fsstress to dirty the fs>
  btrfs filesystem balance /mnt/btrfs-test
  umount /mnt/btrfs-test
  dmsetup remove log

  replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
  btrfsck /dev/sdb
  replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
        --fsck "btrfsck /dev/sdb" --check fua

This replays the log until it sees a FUA request, runs the fsck command and,
if the fsck passes, replays to the next FUA, continuing until the log is
exhausted or the fsck command exits abnormally.
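
The replay-and-check loop described above can be modelled in a few lines of
bash.  This is a toy sketch, not how ``replay-log`` is implemented: the function
names are hypothetical, log entries are reduced to the tokens ``W`` and
``FUA``, and the check command stands in for fsck.  It also assumes the check
runs after the FUA entry has been applied, which the text above does not
specify.

```shell
#!/usr/bin/env bash
# Toy model: entries are "W" (plain write) or "FUA".
# check_cmd stands in for fsck and receives the current entry number.
replay_with_checks() {
    local check_cmd=$1; shift
    local i=0 entry
    for entry in "$@"; do
        i=$((i + 1))
        # A real replay would write the entry to the replay device here.
        if [ "$entry" = "FUA" ]; then
            if ! "$check_cmd" "$i"; then
                echo "check failed after entry $i"
                return 1
            fi
        fi
    done
    echo "replayed $i entries, all checks passed"
}

always_ok() { return 0; }
fail_at_5() { [ "$1" -ne 5 ]; }

replay_with_checks always_ok W W FUA W FUA W
replay_with_checks fail_at_5 W W FUA W FUA W || true
```

The first call runs the check at entries 3 and 5 and replays everything; the
second stops at entry 5, mirroring a replay that halts when fsck fails.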