cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

errseq.rst (6585B)


=====================
The errseq_t datatype
=====================

An errseq_t is a way of recording errors in one place, and allowing any
number of "subscribers" to tell whether it has changed since a previous
point where it was sampled.

The initial use case for this is tracking errors for file
synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
but it may be usable in other situations.

It's implemented as an unsigned 32-bit value.  The low order bits are
designated to hold an error code (between 1 and MAX_ERRNO).  The upper bits
are used as a counter.  This is done with atomics instead of locking so that
these functions can be called from any context.

Note that there is a risk of collisions if new errors are being recorded
frequently, since we have so few bits to use as a counter.

To mitigate this, the bit between the error value and counter is used as
a flag to tell whether the value has been sampled since a new value was
recorded.  That allows us to avoid bumping the counter if no one has
sampled it since the last time an error was recorded.

Thus we end up with a value that looks something like this:

+--------------------------------------+----+------------------------+
| 31..13                               | 12 | 11..0                  |
+--------------------------------------+----+------------------------+
| counter                              | SF | errno                  |
+--------------------------------------+----+------------------------+

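The layout above can be picked apart with a few masks and shifts.  The
following is a minimal userspace sketch, not kernel code: the ``DEMO_*``
macros and ``demo_*`` helpers are hypothetical names, and the field widths
assume MAX_ERRNO is 4095, which matches the 12-bit errno field in the table.

```c
#include <stdint.h>

/* Userspace stand-ins; DEMO_ and demo_ names are illustrative only. */
typedef uint32_t errseq_t;

#define DEMO_MAX_ERRNO  4095u           /* bits 11..0 hold the errno */
#define DEMO_SEEN       (1u << 12)      /* bit 12 is the "seen" flag (SF) */
#define DEMO_CTR_SHIFT  13              /* bits 31..13 hold the counter */

/* Extract the errno field (stored as a positive number). */
static unsigned int demo_errno(errseq_t v)
{
        return v & DEMO_MAX_ERRNO;
}

/* Return 1 if the value has been sampled since it was recorded. */
static int demo_seen(errseq_t v)
{
        return (v & DEMO_SEEN) != 0;
}

/* Extract the counter field. */
static uint32_t demo_counter(errseq_t v)
{
        return v >> DEMO_CTR_SHIFT;
}

/* Assemble a value from its three fields (inverse of the accessors). */
static errseq_t demo_pack(uint32_t counter, int seen, unsigned int err)
{
        return (counter << DEMO_CTR_SHIFT) |
               (seen ? DEMO_SEEN : 0u) |
               (err & DEMO_MAX_ERRNO);
}
```

With 19 bits available, the counter in this sketch wraps after 524288
increments, which is why avoiding unnecessary bumps matters.
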
The general idea is for "watchers" to sample an errseq_t value and keep
it as a running cursor.  That value can later be used to tell whether
any new errors have occurred since that sampling was done, and atomically
record the state at the time that it was checked.  This allows us to
record errors in one place, and then have a number of "watchers" that
can tell whether the value has changed since they last checked it.

A new errseq_t should always be zeroed out.  An errseq_t value of all zeroes
is the special (but common) case where there has never been an error. An all
zero value thus serves as the "epoch" if one wishes to know whether there
has ever been an error set since it was first initialized.

API usage
=========

Let me tell you a story about a worker drone.  Now, he's a good worker
overall, but the company is a little...management heavy.  He has to
report to 77 supervisors today, and tomorrow the "big boss" is coming in
from out of town and he's sure to test the poor fellow too.

They're all handing him work to do -- so much he can't keep track of who
handed him what, but that's not really a big problem.  The supervisors
just want to know when he's finished all of the work they've handed him so
far and whether he made any mistakes since they last asked.

He might have made the mistake on work they didn't actually hand him,
but he can't keep track of things at that level of detail; all he can
remember is the most recent mistake that he made.

Here's our worker_drone representation::

        struct worker_drone {
                errseq_t        wd_err; /* for recording errors */
        };

Every day, the worker_drone starts out with a blank slate::

        struct worker_drone wd;

        wd.wd_err = (errseq_t)0;

The supervisors come in and get an initial read for the day.  They
don't care about anything that happened before their watch begins::

        struct supervisor {
                errseq_t        s_wd_err; /* private "cursor" for wd_err */
                spinlock_t      s_wd_err_lock; /* protects s_wd_err */
        };

        struct supervisor       su;

        su.s_wd_err = errseq_sample(&wd.wd_err);
        spin_lock_init(&su.s_wd_err_lock);

Now they start handing him tasks to do.  Every few minutes they ask him to
finish up all of the work they've handed him so far.  Then they ask him
whether he made any mistakes on any of it::

        spin_lock(&su.s_wd_err_lock);
        err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
        spin_unlock(&su.s_wd_err_lock);

Up to this point, that just keeps returning 0.

Now, the owners of this company are quite miserly and have given him
substandard equipment with which to do his job. Occasionally it
glitches and he makes a mistake.  He sighs a heavy sigh, and marks it
down::

        errseq_set(&wd.wd_err, -EIO);

...and then gets back to work.  The supervisors eventually poll again
and they each get the error when they next check.  Subsequent calls will
return 0, until another error is recorded, at which point it's reported
to each of them once.

Note that the supervisors can't tell how many mistakes he made, only
whether one was made since they last checked, and the latest value
recorded.

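To make the "reported to each of them once" behaviour concrete, here is a
small userspace model of the set, sample and check-and-advance steps built
on C11 atomics.  It is a sketch for experimentation, not the code in
lib/errseq.c: every ``demo_*`` and ``DEMO_*`` name is hypothetical,
MAX_ERRNO is assumed to be 4095, and having the sample step return 0 for a
value nobody has seen yet is an assumption of this model.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace model only: demo_* names are illustrative, not kernel symbols. */
typedef uint32_t errseq_t;

#define DEMO_MAX_ERRNO  4095u           /* errno field, bits 11..0 */
#define DEMO_SEEN       (1u << 12)      /* "seen" flag, bit 12 */
#define DEMO_CTR_INC    (1u << 13)      /* one counter step, bits 31..13 */

/* Record err (a negative errno such as -EIO) in *eseq. */
static void demo_set(_Atomic errseq_t *eseq, int err)
{
        errseq_t old = atomic_load(eseq);
        errseq_t next;

        if (err >= 0 || err < -(int)DEMO_MAX_ERRNO)
                return;         /* not a valid negative errno */

        do {
                /* Swap in the new errno and clear the seen flag. */
                next = (old & ~(DEMO_MAX_ERRNO | DEMO_SEEN)) | (errseq_t)-err;
                /* Bump the counter only if someone sampled the old value. */
                if (old & DEMO_SEEN)
                        next += DEMO_CTR_INC;
        } while (next != old &&
                 !atomic_compare_exchange_weak(eseq, &old, next));
}

/* Take a starting cursor; an unseen value is treated as "no error yet". */
static errseq_t demo_sample(_Atomic errseq_t *eseq)
{
        errseq_t cur = atomic_load(eseq);

        return (cur & DEMO_SEEN) ? cur : 0;
}

/* Report a change since *since (0 if none) and move the cursor forward. */
static int demo_check_and_advance(_Atomic errseq_t *eseq, errseq_t *since)
{
        errseq_t old = atomic_load(eseq);
        errseq_t next;

        if (old == *since)
                return 0;

        /* Mark the value as seen so the next demo_set() bumps the counter. */
        next = old | DEMO_SEEN;
        if (next != old)
                atomic_compare_exchange_strong(eseq, &old, next);
        *since = next;
        return -(int)(next & DEMO_MAX_ERRNO);
}
```

In this model, two cursors sampled from a zeroed value each get -EIO from
their first ``demo_check_and_advance()`` call after ``demo_set(&wd_err, -EIO)``,
and 0 from every later call until another error is recorded.  Calling
``demo_set()`` twice before anyone checks leaves the counter untouched.
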
Occasionally the big boss comes in for a spot check and asks the worker
to do a one-off job for him. He's not really watching the worker
full-time like the supervisors, but he does need to know whether a
mistake occurred while his job was processing.

He can just sample the current errseq_t in the worker, and then use that
to tell whether an error has occurred later::

        errseq_t since = errseq_sample(&wd.wd_err);
        /* submit some work and wait for it to complete */
        err = errseq_check(&wd.wd_err, since);

Since he's just going to discard "since" after that point, he doesn't
need to advance it here. He also doesn't need any locking since it's
not usable by anyone else.

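The one-off pattern can be modelled in userspace in much the same way.
This is only a sketch under stated assumptions: the ``demo_*`` names are
hypothetical, MAX_ERRNO is assumed to be 4095, and ``demo_one_off_job()``
fakes the "submit some work and wait" step by optionally recording an
error itself.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace model only: demo_* names are illustrative, not kernel symbols. */
typedef uint32_t errseq_t;

#define DEMO_MAX_ERRNO  4095u           /* errno field, bits 11..0 */
#define DEMO_SEEN       (1u << 12)      /* "seen" flag, bit 12 */
#define DEMO_CTR_INC    (1u << 13)      /* one counter step, bits 31..13 */

/* Record err (a negative errno); bump the counter only if old was seen. */
static void demo_set(_Atomic errseq_t *eseq, int err)
{
        errseq_t old = atomic_load(eseq);
        errseq_t next;

        if (err >= 0 || err < -(int)DEMO_MAX_ERRNO)
                return;         /* not a valid negative errno */

        do {
                next = (old & ~(DEMO_MAX_ERRNO | DEMO_SEEN)) | (errseq_t)-err;
                if (old & DEMO_SEEN)
                        next += DEMO_CTR_INC;
        } while (next != old &&
                 !atomic_compare_exchange_weak(eseq, &old, next));
}

/* Take a starting point; an unseen value is treated as "no error yet". */
static errseq_t demo_sample(_Atomic errseq_t *eseq)
{
        errseq_t cur = atomic_load(eseq);

        return (cur & DEMO_SEEN) ? cur : 0;
}

/* Read-only check: no cursor is advanced and no flag is set. */
static int demo_check(_Atomic errseq_t *eseq, errseq_t since)
{
        errseq_t cur = atomic_load(eseq);

        return cur == since ? 0 : -(int)(cur & DEMO_MAX_ERRNO);
}

/* The big boss: sample, run the job, check.  job_errno is 0 on success. */
static int demo_one_off_job(_Atomic errseq_t *wd_err, int job_errno)
{
        errseq_t since = demo_sample(wd_err);

        /* Stand-in for "submit some work and wait for it to complete". */
        if (job_errno)
                demo_set(wd_err, -job_errno);

        return demo_check(wd_err, since);
}
```

Because ``since`` lives on the caller's stack and is thrown away afterwards,
nothing here needs a lock, which mirrors the reasoning in the text.
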
Serializing errseq_t cursor updates
===================================

Note that the errseq_t API does not protect the errseq_t cursor during a
check_and_advance operation. Only the canonical error code is handled
atomically.  In a situation where more than one task might be using the
same errseq_t cursor at the same time, it's important to serialize
updates to that cursor.

If that's not done, then it's possible for the cursor to go backward
in which case the same error could be reported more than once.

Because of this, it's often advantageous to first do an errseq_check to
see if anything has changed, and only later do an
errseq_check_and_advance after taking the lock. e.g.::

        if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err))) {
                /* su.s_wd_err is protected by s_wd_err_lock */
                spin_lock(&su.s_wd_err_lock);
                err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
                spin_unlock(&su.s_wd_err_lock);
        }

That avoids the spinlock in the common case where nothing has changed
since the last time it was checked.

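The peek-then-lock pattern can also be tried out in userspace.  In the
sketch below, a C11 ``atomic_flag`` stands in for spinlock_t and a relaxed
atomic load stands in for READ_ONCE; both substitutions, the ``demo_*``
names and the assumed MAX_ERRNO of 4095 are illustrative choices rather
than anything taken from the kernel sources.

```c
#include <errno.h>      /* EIO and friends, for callers of this sketch */
#include <stdatomic.h>
#include <stdint.h>

/* Userspace model only: demo_* names are illustrative, not kernel symbols. */
typedef uint32_t errseq_t;

#define DEMO_MAX_ERRNO  4095u           /* errno field, bits 11..0 */
#define DEMO_SEEN       (1u << 12)      /* "seen" flag, bit 12 */

struct demo_supervisor {
        _Atomic errseq_t s_wd_err;      /* shared cursor, peeked locklessly */
        atomic_flag      s_wd_err_lock; /* stand-in for spinlock_t */
};

/* Read-only check: returns 0 if nothing changed since "since". */
static int demo_check(_Atomic errseq_t *eseq, errseq_t since)
{
        errseq_t cur = atomic_load(eseq);

        return cur == since ? 0 : -(int)(cur & DEMO_MAX_ERRNO);
}

/* Caller must hold the lock that protects *since. */
static int demo_check_and_advance(_Atomic errseq_t *eseq,
                                  _Atomic errseq_t *since)
{
        errseq_t old = atomic_load(eseq);
        errseq_t next;

        if (old == atomic_load(since))
                return 0;

        next = old | DEMO_SEEN;
        if (next != old)
                atomic_compare_exchange_strong(eseq, &old, next);
        atomic_store(since, next);
        return -(int)(next & DEMO_MAX_ERRNO);
}

/* Poll for a new error, taking the lock only when something changed. */
static int demo_poll(_Atomic errseq_t *wd_err, struct demo_supervisor *su)
{
        int err = 0;

        /* Cheap lockless peek; a stale cursor only costs an extra lock. */
        if (demo_check(wd_err, atomic_load_explicit(&su->s_wd_err,
                                                    memory_order_relaxed))) {
                while (atomic_flag_test_and_set_explicit(&su->s_wd_err_lock,
                                                         memory_order_acquire))
                        ;       /* spin until the lock is free */
                err = demo_check_and_advance(wd_err, &su->s_wd_err);
                atomic_flag_clear_explicit(&su->s_wd_err_lock,
                                           memory_order_release);
        }
        return err;
}
```

The result of the lockless peek is only a hint.  The value that gets
reported comes from the second check made under the lock, so two tasks
that both pass the peek still advance the cursor one at a time.
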
Functions
=========

.. kernel-doc:: lib/errseq.c