.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
.. [see the bottom of this file for redistribution information]

Reporting regressions
+++++++++++++++++++++

"*We don't cause regressions*" is the first rule of Linux kernel development;
Linux founder and lead developer Linus Torvalds established it himself and
ensures it's obeyed.

This document describes what the rule means for users and how the Linux kernel's
development model ensures all reported regressions are addressed; aspects
relevant for kernel developers are left to
Documentation/process/handling-regressions.rst.


The important bits (aka "TL;DR")
================================

#. It's a regression if something running fine with one Linux kernel works worse
   or not at all with a newer version. Note, the newer kernel has to be compiled
   using a similar configuration; the detailed explanations below describe this
   and other fine print in more detail.

#. Report your issue as outlined in Documentation/admin-guide/reporting-issues.rst,
   which already covers all aspects important for regressions; they are repeated
   below for convenience. Two of them are important: start your report's subject
   with "[REGRESSION]" and CC or forward it to `the regression mailing list
   <https://lore.kernel.org/regressions/>`_ (regressions@lists.linux.dev).

#. Optional, but recommended: when sending or forwarding your report, make the
   Linux kernel regression tracking bot "regzbot" track the issue by specifying
   when the regression started like this::

       #regzbot introduced: v5.13..v5.14-rc1


All the details on Linux kernel regressions relevant for users
==============================================================


The important basics
--------------------


What is a "regression" and what is the "no regressions rule"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's a regression if some application or practical use case running fine with
one Linux kernel works worse or not at all with a newer version compiled using a
similar configuration. The "no regressions rule" forbids this from taking place;
if it happens by accident, developers that caused it are expected to quickly fix
the issue.

It thus is a regression when a WiFi driver from Linux 5.13 works fine, but with
5.14 doesn't work at all, works significantly slower, or misbehaves somehow.
It's also a regression if a perfectly working application suddenly shows erratic
behavior with a newer kernel version; such issues can be caused by changes in
procfs, sysfs, or one of the many other interfaces Linux provides to userland
software. But keep in mind, as mentioned earlier: 5.14 in this example needs to
be built from a configuration similar to the one from 5.13. This can be achieved
using ``make olddefconfig``, as explained in more detail below.

Note the "practical use case" in the first sentence of this section: despite
the "no regressions" rule, developers are free to change any aspect of the
kernel and even APIs or ABIs to userland, as long as no existing application or
use case breaks.

Also be aware the "no regressions" rule covers only interfaces the kernel
provides to the userland. It thus does not apply to kernel-internal interfaces
like the module API, which some externally developed drivers use to hook into
the kernel.

How do I report a regression?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Just report the issue as outlined in
Documentation/admin-guide/reporting-issues.rst, which already describes the
important points. The following aspects outlined there are especially relevant
for regressions:

 * When checking for existing reports to join, also search the `archives of the
   Linux regressions mailing list <https://lore.kernel.org/regressions/>`_ and
   `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.

 * Start your report's subject with "[REGRESSION]".

 * In your report, clearly mention the last kernel version that worked fine and
   the first broken one. Ideally try to find the exact change causing the
   regression using a bisection, as explained below in more detail.

 * Remember to let the Linux regressions mailing list
   (regressions@lists.linux.dev) know about your report:

   * If you report the regression by mail, CC the regressions list.

   * If you report your regression to some bug tracker, forward the submitted
     report by mail to the regressions list while CCing the maintainer and the
     mailing list for the subsystem in question.

   If it's a regression within a stable or longterm series (e.g.
   v5.15.3..v5.15.5), remember to CC the `Linux stable mailing list
   <https://lore.kernel.org/stable/>`_ (stable@vger.kernel.org).

   If you performed a successful bisection, also CC everyone the culprit's
   commit message mentions in lines starting with "Signed-off-by:".

When CCing or forwarding your report to the list, consider directly telling the
aforementioned Linux kernel regression tracking bot about your report. To do
that, include a paragraph like this in your mail::

       #regzbot introduced: v5.13..v5.14-rc1

Regzbot will then consider your mail a report for a regression introduced in the
specified version range. In the above case Linux v5.13 still worked fine and
Linux v5.14-rc1 was the first version where you encountered the issue. If you
performed a bisection to find the commit that caused the regression, specify the
culprit's commit-id instead::

       #regzbot introduced: 1f2e3d4c5d

Placing such a "regzbot command" is in your interest, as it will ensure the
report won't fall through the cracks unnoticed. If you omit this, the Linux
kernel's regression tracker will take care of telling regzbot about your
regression, as long as you send a copy to the regressions mailing list. But the
regression tracker is just one human who sometimes has to rest or occasionally
might even enjoy some time away from computers (as crazy as that might sound).
Relying on this person thus will result in an unnecessary delay before the
regression is mentioned `on the list of tracked and unresolved Linux
kernel regressions <https://linux-regtracking.leemhuis.info/regzbot/>`_ and the
weekly regression reports sent by regzbot. Such delays can result in Linus
Torvalds being unaware of important regressions when deciding between "continue
development or call this finished and release the final?".

Are all regressions really fixed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Nearly all of them are, as long as the change causing the regression (the
"culprit commit") is reliably identified. Some regressions can be fixed without
this, but often it's required.

Who needs to find the root cause of a regression?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Developers of the affected code area should try to locate the culprit on their
own. But for them that's often impossible to do with reasonable effort, as quite
a lot of issues only occur in a particular environment outside the developer's
reach -- for example, a specific hardware platform, firmware, Linux distro,
system configuration, or application. That's why in the end it's often up to
the reporter to locate the culprit commit; sometimes users might even need to
run additional tests afterwards to pinpoint the exact root cause. Developers
should offer advice and reasonably help where they can, to make this process
relatively easy and achievable for typical users.

How can I find the culprit?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Perform a bisection, as roughly outlined in
Documentation/admin-guide/reporting-issues.rst and described in more detail by
Documentation/admin-guide/bug-bisect.rst. It might sound like a lot of work, but
in many cases it finds the culprit relatively quickly. If it's hard or
time-consuming to reliably reproduce the issue, consider teaming up with other
affected users to narrow down the search range together.
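
As a rough sketch of what such a bisection can look like when the check can be
scripted, the shell function below wraps the standard ``git bisect``
subcommands. The function name ``bisect_regression`` and the check script are
hypothetical. For kernels each step usually means building, installing, and
booting the checked-out revision, so most users will instead mark each step by
hand with ``git bisect good`` or ``git bisect bad``.

```shell
# Sketch: bisect between a known-good and a known-bad revision.
# Usage: bisect_regression <good> <bad> <check-script>
# <check-script> is a hypothetical executable that exits 0 if the
# checked-out revision works, 1 if it shows the regression, and 125 if
# the revision cannot be tested (git then skips it).
bisect_regression() {
    good="$1"
    bad="$2"
    check="$3"

    # Tell git the search range: first the bad, then the good revision.
    git bisect start "$bad" "$good" || return 1

    # Let git check out one revision after another and run the check on each.
    git bisect run "$check"

    # Print the bisection log (worth attaching to the report), then return
    # the work tree to where it was before the bisection started.
    git bisect log
    git bisect reset
}
```

With the versions used as examples in this document, a call could look like
``bisect_regression v5.13 v5.14-rc1 /path/to/check.sh``. Keep the check script
outside the source tree, as git changes the tree's content during the
bisection.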

Who can I ask for advice when it comes to regressions?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Send a mail to the regressions mailing list (regressions@lists.linux.dev) while
CCing the Linux kernel's regression tracker (regressions@leemhuis.info); if the
issue might better be dealt with in private, feel free to omit the list.


Additional details about regressions
------------------------------------


What is the goal of the "no regressions rule"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users should feel safe when updating kernel versions and not have to worry
something might break. It is in the interest of the kernel developers to make
updating attractive: they don't want users to stay on stable or longterm Linux
series that are either abandoned or more than one and a half years old. That's
in everybody's interest, as `those series might have known bugs, security
issues, or other problematic aspects already fixed in later versions
<http://www.kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/>`_.
Additionally, the kernel developers want to make it simple and appealing for
users to test the latest pre-release or regular release. That's also in
everybody's interest, as it's a lot easier to track down and fix problems if
they are reported shortly after being introduced.

Is the "no regressions" rule really adhered to in practice?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's taken really seriously, as can be seen by many mailing list posts from
Linux creator and lead developer Linus Torvalds, some of which are quoted in
Documentation/process/handling-regressions.rst.

Exceptions to this rule are extremely rare; in the past developers almost always
turned out to be wrong when they assumed a particular situation warranted an
exception.

Who ensures the "no regressions" rule is actually followed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The subsystem maintainers should take care of that; they are watched and
supported by the tree maintainers -- e.g. Linus Torvalds for mainline and
Greg Kroah-Hartman et al. for various stable/longterm series.

All of them are helped by people trying to ensure no regression report falls
through the cracks. One of them is Thorsten Leemhuis, who's currently acting as
the Linux kernel's "regressions tracker"; to facilitate this work he relies on
regzbot, the Linux kernel regression tracking bot. That's why you want to get
your report onto the radar of these people by CCing or forwarding each report to
the regressions mailing list, ideally with a "regzbot command" in your mail to
get it tracked immediately.

How quickly are regressions normally fixed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Developers should fix any reported regression as quickly as possible, to provide
affected users with a solution in a timely manner and prevent more users from
running into the issue; nevertheless developers need to take enough time and
care to ensure regression fixes do not cause additional damage.

The answer thus depends on various factors like the impact of a regression, its
age, or the Linux series in which it occurs. In the end though, most regressions
should be fixed within two weeks.

Is it a regression, if the issue can be avoided by updating some software?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Almost always: yes. If a developer tells you otherwise, ask the regression
tracker for advice as outlined above.

Is it a regression, if a newer kernel works slower or consumes more energy?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes, but the difference has to be significant. A five percent slow-down in a
micro-benchmark thus is unlikely to qualify as a regression, unless it also
influences the results of a broad benchmark by more than one percent. If in
doubt, ask for advice.

Is it a regression, if an external kernel module breaks when updating Linux?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

No, as the "no regressions" rule is about interfaces and services the Linux
kernel provides to the userland. It thus does not cover building or running
externally developed kernel modules, as they run in kernel-space and hook into
the kernel using internal interfaces that are occasionally changed.

How are regressions handled that are caused by security fixes?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In extremely rare situations security issues can't be fixed without causing
regressions; such fixes are given priority, as they are the lesser evil in the
end. Luckily this dilemma can almost always be avoided, as key developers for
the affected area and often Linus Torvalds himself try very hard to fix security
issues without causing regressions.

If you nevertheless face such a case, check the mailing list archives to see
whether people tried their best to avoid the regression. If not, report it; if
in doubt, ask for advice as outlined above.

What happens if fixing a regression is impossible without causing another?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sadly these things happen, but luckily not very often; if they occur, expert
developers of the affected code area should look into the issue to find a fix
that avoids regressions or at least limits their impact. If you run into such a
situation, do what was outlined already for regressions caused by security
fixes: check earlier discussions to see whether people already tried their best
and ask for advice if in doubt.

A quick note while at it: these situations could be avoided if people regularly
gave mainline pre-releases (say v5.15-rc1 or -rc3) from each development cycle
a test run. This is best explained by imagining a change integrated between
Linux v5.14 and v5.15-rc1 which causes a regression, but at the same time is a
hard requirement for some other improvement applied for 5.15-rc1. All these
changes often can simply be reverted and the regression thus solved, if someone
finds and reports it before 5.15 is released. A few days or weeks later this
solution can become impossible, as some software might have started to rely on
aspects introduced by one of the follow-up changes: reverting all changes would
then cause a regression for users of said software and thus is out of the
question.

Is it a regression, if some feature I relied on was removed months ago?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is, but often it's hard to fix such regressions due to the aspects outlined
in the previous section. It hence needs to be dealt with on a case-by-case
basis. This is another reason why it's in everybody's interest to regularly test
mainline pre-releases.

Does the "no regression" rule apply if I seem to be the only affected person?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It does, but only for practical usage: the Linux developers want to be free to
remove support for hardware that can only be found in attics and museums these
days.

Note, sometimes regressions can't be avoided to make progress -- and the latter
is needed to prevent Linux from stagnation. Hence, if only very few users seem
to be affected by a regression, it might for the greater good be in their and
everyone else's interest to let things pass. Especially if there is an
easy way to circumvent the regression somehow, for example by updating some
software or using a kernel parameter created just for this purpose.

Does the regression rule apply for code in the staging tree as well?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Not according to the `help text for the configuration option covering all
staging code <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/Kconfig>`_,
which since its early days states::

       Please note that these drivers are under heavy development, may or
       may not work, and may contain userspace interfaces that most likely
       will be changed in the near future.

The staging developers nevertheless often adhere to the "no regressions" rule,
but sometimes bend it to make progress. That's for example why some users had to
deal with (often negligible) regressions when a WiFi driver from the staging
tree was replaced by a totally different one written from scratch.

Why do later versions have to be "compiled with a similar configuration"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Because the Linux kernel developers sometimes integrate changes known to cause
regressions, but make them optional and disable them in the kernel's default
configuration. This trick allows progress, as the "no regressions" rule
otherwise would lead to stagnation.

Consider for example a new security feature blocking access to some kernel
interfaces often abused by malware, which at the same time are required to run a
few rarely used applications. The outlined approach makes both camps happy:
people using these applications can leave the new security feature off, while
everyone else can enable it without running into trouble.

How to create a configuration similar to the one of an older kernel?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Start your machine with a known-good kernel and configure the newer Linux
version with ``make olddefconfig``. This makes the kernel's build scripts pick
up the configuration file (the ".config" file) from the running kernel as base
for the new one you are about to compile; afterwards they set all new
configuration options to their default value, which should disable new features
that might cause regressions.
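
A minimal sketch of this step as a shell function, under a few assumptions:
``prepare_config`` is a hypothetical helper name, and
``/boot/config-$(uname -r)`` is merely where many distributions store the
running kernel's configuration, so the path may differ on your system. Copying
the old configuration explicitly is a deliberate variation of the automatic
pick-up described above; it makes visible which file serves as the base.
``scripts/diffconfig`` ships with the kernel sources and lists the options that
differ between two configuration files.

```shell
# Usage: prepare_config <path-to-newer-kernel-sources> [<old-config-file>]
prepare_config() {
    src="$1"
    # Assumption: the distribution stores the running kernel's config here.
    old="${2:-/boot/config-$(uname -r)}"

    # Use the known-good configuration as base for the newer sources.
    cp "$old" "$src/.config" || return 1

    # Keep existing choices; set all new options to their default value.
    make -C "$src" olddefconfig || return 1

    # Show what differs between the old and the new configuration.
    "$src/scripts/diffconfig" "$old" "$src/.config"
}
```

Reviewing the listed differences before compiling helps to spot new options
that ended up enabled and might explain a change in behavior.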

Can I report a regression I found with pre-compiled vanilla kernels?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You need to ensure the newer kernel was compiled with a similar configuration
file as the older one (see above), as those that built them might have enabled
some known-to-be incompatible feature for the newer kernel. If in doubt, report
the matter to the kernel's provider and ask for advice.


More about regression tracking with "regzbot"
---------------------------------------------

What is regression tracking and why should I care about it?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rules like "no regressions" need someone to ensure they are followed, otherwise
they are broken either accidentally or on purpose. History has shown this to be
true for Linux kernel development as well. That's why Thorsten Leemhuis, the
Linux kernel's regression tracker, and some other people try to ensure all
regressions are fixed by keeping an eye on them until they are resolved. None of
them is paid for this, which is why the work is done on a best-effort basis.

Why and how are Linux kernel regressions tracked using a bot?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tracking regressions completely manually has proven to be quite hard due to the
distributed and loosely structured nature of the Linux kernel development
process. That's why the Linux kernel's regression tracker developed regzbot to
facilitate the work, with the long term goal to automate regression tracking as
much as possible for everyone involved.

Regzbot works by watching for replies to reports of tracked regressions.
Additionally, it's looking out for posted or committed patches referencing such
reports with "Link:" tags; replies to such patch postings are tracked as well.
Combined, this data provides good insights into the current state of the fixing
process.

How to see which regressions regzbot tracks currently?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Check out `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.

What kind of issues are supposed to be tracked by regzbot?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The bot is meant to track regressions, hence please don't involve regzbot for
regular issues. But it's okay for the Linux kernel's regression tracker if you
involve regzbot to track severe issues, like reports about hangs, corrupted
data, or internal errors (Panic, Oops, BUG(), warning, ...).

How to change aspects of a tracked regression?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By using a 'regzbot command' in a direct or indirect reply to the mail with the
report. The easiest way to do that: find the report in your "Sent" folder or the
mailing list archive and reply to it using your mailer's "Reply-all" function.
In that mail, use one of the following commands in a stand-alone paragraph (IOW:
use blank lines to separate one or multiple of these commands from the rest of
the mail's text).

 * Update when the regression started to happen, for example after performing a
   bisection::

       #regzbot introduced: 1f2e3d4c5d

 * Set or update the title::

       #regzbot title: foo

 * Monitor a discussion or bugzilla.kernel.org ticket where additional aspects
   of the issue or a fix are discussed::

       #regzbot monitor: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
       #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=123456789

 * Point to a place with further details of interest, like a mailing list post
   or a ticket in a bug tracker that are slightly related, but about a different
   topic::

       #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789

 * Mark a regression as invalid::

       #regzbot invalid: wasn't a regression, problem has always existed

Regzbot supports a few other commands primarily used by developers or people
tracking regressions. These, along with more details about the aforementioned
regzbot commands, can be found in the `getting started guide
<https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_ and
the `reference documentation <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_
for regzbot.

..
   end-of-content
..
   This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
   of the file. If you want to distribute this text under CC-BY-4.0 only,
   please use "The Linux kernel developers" for author attribution and link
   this as source:
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-regressions.rst
..
   Note: Only the content of this RST file as found in the Linux kernel sources
   is available under CC-BY-4.0, as versions of this text that were processed
   (for example by the kernel's build system) might contain content taken from
   files which use a more restrictive license.
    451   files which use a more restrictive license.