cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux
Log | Files | Refs | README | LICENSE | sfeed.txt

botching-up-ioctls.rst (11573B)


      1=================================
      2(How to avoid) Botching up ioctls
      3=================================
      4
      5From: https://blog.ffwll.ch/2013/11/botching-up-ioctls.html
      6
      7By: Daniel Vetter, Copyright © 2013 Intel Corporation
      8
      9One clear insight kernel graphics hackers gained in the past few years is that
     10trying to come up with a unified interface to manage the execution units and
     11memory on completely different GPUs is a futile effort. So nowadays every
     12driver has its own set of ioctls to allocate memory and submit work to the GPU.
     13Which is nice, since there's no more insanity in the form of fake-generic, but
     14actually only used once interfaces. But the clear downside is that there's much
     15more potential to screw things up.
     16
     17To avoid repeating all the same mistakes again I've written up some of the
     18lessons learned while botching the job for the drm/i915 driver. Most of these
     19only cover technicalities and not the big-picture issues like what the command
     20submission ioctl exactly should look like. Learning these lessons is probably
     21something every GPU driver has to do on its own.
     22
     23
     24Prerequisites
     25-------------
     26
     27First the prerequisites. Without these you have already failed, because you
     28will need to add a 32-bit compat layer:
     29
     30 * Only use fixed sized integers. To avoid conflicts with typedefs in userspace
     31   the kernel has special types like __u32, __s64. Use them.
     32
     33 * Align everything to the natural size and use explicit padding. 32-bit
     34   platforms don't necessarily align 64-bit values to 64-bit boundaries, but
     35   64-bit platforms do. So we always need padding to the natural size to get
     36   this right.
     37
     38 * Pad the entire struct to a multiple of 64-bits if the structure contains
     39   64-bit types - the structure size will otherwise differ on 32-bit versus
     40   64-bit. Having a different structure size hurts when passing arrays of
     41   structures to the kernel, or if the kernel checks the structure size, which
     42   e.g. the drm core does.
     43
     44 * Pointers are __u64, cast from/to a uintprt_t on the userspace side and
     45   from/to a void __user * in the kernel. Try really hard not to delay this
     46   conversion or worse, fiddle the raw __u64 through your code since that
     47   diminishes the checking tools like sparse can provide. The macro
     48   u64_to_user_ptr can be used in the kernel to avoid warnings about integers
     49   and pointers of different sizes.
     50
     51
     52Basics
     53------
     54
     55With the joys of writing a compat layer avoided we can take a look at the basic
     56fumbles. Neglecting these will make backward and forward compatibility a real
     57pain. And since getting things wrong on the first attempt is guaranteed you
     58will have a second iteration or at least an extension for any given interface.
     59
     60 * Have a clear way for userspace to figure out whether your new ioctl or ioctl
     61   extension is supported on a given kernel. If you can't rely on old kernels
     62   rejecting the new flags/modes or ioctls (since doing that was botched in the
     63   past) then you need a driver feature flag or revision number somewhere.
     64
     65 * Have a plan for extending ioctls with new flags or new fields at the end of
     66   the structure. The drm core checks the passed-in size for each ioctl call
     67   and zero-extends any mismatches between kernel and userspace. That helps,
     68   but isn't a complete solution since newer userspace on older kernels won't
     69   notice that the newly added fields at the end get ignored. So this still
     70   needs a new driver feature flags.
     71
     72 * Check all unused fields and flags and all the padding for whether it's 0,
     73   and reject the ioctl if that's not the case. Otherwise your nice plan for
     74   future extensions is going right down the gutters since someone will submit
     75   an ioctl struct with random stack garbage in the yet unused parts. Which
     76   then bakes in the ABI that those fields can never be used for anything else
     77   but garbage. This is also the reason why you must explicitly pad all
     78   structures, even if you never use them in an array - the padding the compiler
     79   might insert could contain garbage.
     80
     81 * Have simple testcases for all of the above.
     82
     83
     84Fun with Error Paths
     85--------------------
     86
     87Nowadays we don't have any excuse left any more for drm drivers being neat
     88little root exploits. This means we both need full input validation and solid
     89error handling paths - GPUs will die eventually in the oddmost corner cases
     90anyway:
     91
     92 * The ioctl must check for array overflows. Also it needs to check for
     93   over/underflows and clamping issues of integer values in general. The usual
     94   example is sprite positioning values fed directly into the hardware with the
     95   hardware just having 12 bits or so. Works nicely until some odd display
     96   server doesn't bother with clamping itself and the cursor wraps around the
     97   screen.
     98
     99 * Have simple testcases for every input validation failure case in your ioctl.
    100   Check that the error code matches your expectations. And finally make sure
    101   that you only test for one single error path in each subtest by submitting
    102   otherwise perfectly valid data. Without this an earlier check might reject
    103   the ioctl already and shadow the codepath you actually want to test, hiding
    104   bugs and regressions.
    105
    106 * Make all your ioctls restartable. First X really loves signals and second
    107   this will allow you to test 90% of all error handling paths by just
    108   interrupting your main test suite constantly with signals. Thanks to X's
    109   love for signal you'll get an excellent base coverage of all your error
    110   paths pretty much for free for graphics drivers. Also, be consistent with
    111   how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its
    112   userspace library. The i915 driver botched this with the set_tiling ioctl,
    113   now we're stuck forever with some arcane semantics in both the kernel and
    114   userspace.
    115
    116 * If you can't make a given codepath restartable make a stuck task at least
    117   killable. GPUs just die and your users won't like you more if you hang their
    118   entire box (by means of an unkillable X process). If the state recovery is
    119   still too tricky have a timeout or hangcheck safety net as a last-ditch
    120   effort in case the hardware has gone bananas.
    121
    122 * Have testcases for the really tricky corner cases in your error recovery code
    123   - it's way too easy to create a deadlock between your hangcheck code and
    124   waiters.
    125
    126
    127Time, Waiting and Missing it
    128----------------------------
    129
    130GPUs do most everything asynchronously, so we have a need to time operations and
    131wait for outstanding ones. This is really tricky business; at the moment none of
    132the ioctls supported by the drm/i915 get this fully right, which means there's
    133still tons more lessons to learn here.
    134
    135 * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and
    136   v4l use by default nowadays. But let userspace know which timestamps are
    137   derived from different clock domains like your main system clock (provided
    138   by the kernel) or some independent hardware counter somewhere else. Clocks
    139   will mismatch if you look close enough, but if performance measuring tools
    140   have this information they can at least compensate. If your userspace can
    141   get at the raw values of some clocks (e.g. through in-command-stream
    142   performance counter sampling instructions) consider exposing those also.
    143
    144 * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most
    145   convenient time specification, but it's mostly the standard.
    146
    147 * Check that input time values are normalized and reject them if not. Note
    148   that the kernel native struct ktime has a signed integer for both seconds
    149   and nanoseconds, so beware here.
    150
    151 * For timeouts, use absolute times. If you're a good fellow and made your
    152   ioctl restartable relative timeouts tend to be too coarse and can
    153   indefinitely extend your wait time due to rounding on each restart.
    154   Especially if your reference clock is something really slow like the display
    155   frame counter. With a spec lawyer hat on this isn't a bug since timeouts can
    156   always be extended - but users will surely hate you if their neat animations
    157   starts to stutter due to this.
    158
    159 * Consider ditching any synchronous wait ioctls with timeouts and just deliver
    160   an asynchronous event on a pollable file descriptor. It fits much better
    161   into event driven applications' main loop.
    162
    163 * Have testcases for corner-cases, especially whether the return values for
    164   already-completed events, successful waits and timed-out waits are all sane
    165   and suiting to your needs.
    166
    167
    168Leaking Resources, Not
    169----------------------
    170
    171A full-blown drm driver essentially implements a little OS, but specialized to
    172the given GPU platforms. This means a driver needs to expose tons of handles
    173for different objects and other resources to userspace. Doing that right
    174entails its own little set of pitfalls:
    175
    176 * Always attach the lifetime of your dynamically created resources to the
    177   lifetime of a file descriptor. Consider using a 1:1 mapping if your resource
    178   needs to be shared across processes -  fd-passing over unix domain sockets
    179   also simplifies lifetime management for userspace.
    180
    181 * Always have O_CLOEXEC support.
    182
    183 * Ensure that you have sufficient insulation between different clients. By
    184   default pick a private per-fd namespace which forces any sharing to be done
    185   explicitly. Only go with a more global per-device namespace if the objects
    186   are truly device-unique. One counterexample in the drm modeset interfaces is
    187   that the per-device modeset objects like connectors share a namespace with
    188   framebuffer objects, which mostly are not shared at all. A separate
    189   namespace, private by default, for framebuffers would have been more
    190   suitable.
    191
    192 * Think about uniqueness requirements for userspace handles. E.g. for most drm
    193   drivers it's a userspace bug to submit the same object twice in the same
    194   command submission ioctl. But then if objects are shareable userspace needs
    195   to know whether it has seen an imported object from a different process
    196   already or not. I haven't tried this myself yet due to lack of a new class
    197   of objects, but consider using inode numbers on your shared file descriptors
    198   as unique identifiers - it's how real files are told apart, too.
    199   Unfortunately this requires a full-blown virtual filesystem in the kernel.
    200
    201
    202Last, but not Least
    203-------------------
    204
    205Not every problem needs a new ioctl:
    206
    207 * Think hard whether you really want a driver-private interface. Of course
    208   it's much quicker to push a driver-private interface than engaging in
    209   lengthy discussions for a more generic solution. And occasionally doing a
    210   private interface to spearhead a new concept is what's required. But in the
    211   end, once the generic interface comes around you'll end up maintainer two
    212   interfaces. Indefinitely.
    213
    214 * Consider other interfaces than ioctls. A sysfs attribute is much better for
    215   per-device settings, or for child objects with fairly static lifetimes (like
    216   output connectors in drm with all the detection override attributes). Or
    217   maybe only your testsuite needs this interface, and then debugfs with its
    218   disclaimer of not having a stable ABI would be better.
    219
    220Finally, the name of the game is to get it right on the first attempt, since if
    221your driver proves popular and your hardware platforms long-lived then you'll
    222be stuck with a given ioctl essentially forever. You can try to deprecate
    223horrible ioctls on newer iterations of your hardware, but generally it takes
    224years to accomplish this. And then again years until the last user able to
    225complain about regressions disappears, too.