.. SPDX-License-Identifier: GPL-2.0

#########
UML HowTo
#########

.. contents:: :local:

************
Introduction
************

Welcome to User Mode Linux

User Mode Linux is the first Open Source virtualization platform (first
release date 1999) and the second virtualization platform for an x86 PC.

How is UML Different from a VM using Virtualization package X?
==============================================================

We have come to assume that virtualization also means some level of
hardware emulation. In fact, it does not. As long as a virtualization
package provides the OS with devices which the OS can recognize and
has a driver for, the devices do not need to emulate real hardware.
Most OSes today have built-in support for a number of "fake"
devices used only under virtualization.
User Mode Linux takes this concept to the ultimate extreme: there
is not a single real device in sight. It is 100% artificial or, to
use the correct term, 100% paravirtual. All UML devices are abstract
concepts which map onto something provided by the host: files, sockets,
pipes, etc.

The other major difference between UML and various virtualization
packages is that the UML kernel and the UML programs operate in
distinctly different ways.
The UML kernel is just a process running on Linux, the same as any other
program. It can be run by an unprivileged user and it does not require
any special CPU features.
The UML userspace, however, is a bit different. The Linux kernel on the
host machine assists UML in intercepting everything the program running
on a UML instance is trying to do and making the UML kernel handle all
of its requests.
This is different from other virtualization packages, which do not
distinguish between the guest kernel and guest programs. This difference
results in a number of advantages and disadvantages of UML compared to,
for example, QEMU, which we will cover later in this document.

Why Would I Want User Mode Linux?
=================================

* If the User Mode Linux kernel crashes, your host kernel is still fine. It
  is not accelerated in any way (vhost, kvm, etc.) and it is not trying to
  access any devices directly.  It is, in fact, a process like any other.

* You can run a usermode kernel as a non-root user (you may need to
  arrange appropriate permissions for some devices).

* You can run a very small VM with a minimal footprint for a specific
  task (for example, 32M or less).

* You can get extremely high performance for anything which is a "kernel
  specific task" such as forwarding, firewalling, etc., while still being
  isolated from the host kernel.

* You can play with kernel concepts without breaking things.

* You are not bound by "emulating" hardware, so you can try weird and
  wonderful concepts which are very difficult to support when emulating
  real hardware, such as time travel and making your system clock
  dependent on what UML does (very useful for things like tests).

* It's fun.

Why not to run UML
==================

* The syscall interception technique used by UML makes it inherently
  slower for any userspace applications. While it can do kernel tasks
  on par with most other virtualization packages, its userspace is
  **slow**. The root cause is that UML has a very high cost of creating
  new processes and threads (something most Unix/Linux applications
  take for granted).

* UML is strictly uniprocessor at present. If you want to run an
  application which needs many CPUs to function, it is clearly the
  wrong choice.

***********************
Building a UML instance
***********************

There is no UML installer in any distribution. While you can use
off-the-shelf install media to install into a blank VM using a
virtualization package, there is no UML equivalent. You have to use
appropriate tools on your host to build a viable filesystem image.

This is extremely easy on Debian: you can do it using debootstrap. It is
also easy on OpenWRT: the build process can build UML images. For all
other distros, YMMV.

Creating an image
=================

Create a sparse raw disk image::

   # dd if=/dev/zero of=disk_image_name bs=1 count=1 seek=16G

This will create a 16G disk image. The OS will initially allocate only one
block and will allocate more blocks as UML writes to them. As of kernel
version 4.19, UML fully supports TRIM (as usually used by flash drives).
Using TRIM inside the UML image by specifying discard as a mount option
or by running ``tune2fs -o discard /dev/ubdXX`` will request UML to
return any unused blocks to the OS.
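
The same kind of sparse image can also be created with GNU ``truncate``,
which makes it easy to check that no space is allocated up front. The
sketch below assumes GNU coreutils (``truncate`` and ``stat -c``); the
helper names ``create_sparse_image`` and ``show_image_usage`` are
hypothetical.

```shell
# Hypothetical helpers; assume GNU coreutils.

# create_sparse_image PATH SIZE
# SIZE uses truncate's syntax, e.g. 16G. The file is extended without
# writing any data, so blocks are only allocated when UML writes to them.
create_sparse_image() {
    truncate -s "$2" "$1"
}

# show_image_usage PATH
# Prints the apparent size and the space actually allocated, in bytes.
show_image_usage() {
    apparent=$(stat -c %s "$1")
    # %b is the number of allocated blocks, %B the size of each of them.
    allocated=$(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
    echo "apparent=$apparent allocated=$allocated"
}

# Example:
#   create_sparse_image disk_image_name 16G
#   show_image_usage disk_image_name
```

On filesystems that support sparse files, the allocated figure typically
stays small until the guest starts writing to the disk.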

Create a filesystem on the disk image and mount it::

   # mkfs.ext4 ./disk_image_name && mount ./disk_image_name /mnt

This example uses ext4; any other filesystem such as ext3, btrfs, xfs,
jfs, etc. will work too.

Create a minimal OS installation on the mounted filesystem::

   # debootstrap buster /mnt http://deb.debian.org/debian

debootstrap does not set up the root password, fstab, hostname or
anything related to networking. It is up to the user to do that.

Set the root password. The easiest way to do that is to chroot into the
mounted image::

   # chroot /mnt
   # passwd
   # exit

Edit key system files
=====================

UML block devices are called ubds. The fstab created by debootstrap
will be empty and it needs an entry for the root file system::

   /dev/ubda   /   ext4    discard,errors=remount-ro  0       1

The image's hostname will be the same as that of the host on which you
create the image. It is a good idea to change it to avoid
"Oh, bummer, I rebooted the wrong machine".

UML supports two classes of network devices. The older uml_net devices,
which are scheduled to become obsolete, are called ethX. The newer vector
IO devices are significantly faster and support some standard virtual
network encapsulations like Ethernet over GRE and Ethernet over L2TPv3.
These are called vecX (vec0, vec1, and so on).

Depending on which one is in use, ``/etc/network/interfaces`` will
need entries like::

   # legacy UML network devices
   auto eth0
   iface eth0 inet dhcp

   # vector UML network devices
   auto vec0
   iface vec0 inet dhcp
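
When images are built repeatedly, the edits above can be scripted. The
sketch below writes the fstab entry, a hostname and the interfaces stanza
into a mounted image. The function name ``uml_prepare_root`` is
hypothetical, and it assumes the root device is ``/dev/ubda``, the name
used by the ``root=/dev/ubda`` launch example later in this document.

```shell
# Hypothetical helper. Run as root with the image mounted.
# uml_prepare_root ROOTDIR HOSTNAME IFACE   (IFACE is eth0 or vec0)
uml_prepare_root() {
    root=$1
    name=$2
    iface=$3
    [ -d "$root/etc" ] || { echo "no etc/ under $root" >&2; return 1; }

    # Root filesystem on the first UBD device (assumed to be /dev/ubda).
    echo '/dev/ubda   /   ext4    discard,errors=remount-ro  0       1' \
        > "$root/etc/fstab"

    # A distinct hostname avoids rebooting the wrong machine.
    echo "$name" > "$root/etc/hostname"

    # Append a DHCP stanza for the chosen UML network device.
    mkdir -p "$root/etc/network"
    printf 'auto %s\niface %s inet dhcp\n' "$iface" "$iface" \
        >> "$root/etc/network/interfaces"
}

# Example:
#   uml_prepare_root /mnt umlguest vec0
```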

We now have a UML image which is nearly ready to run; all we need is a
UML kernel and modules for it.

Most distributions have a UML package. Even if you intend to use your own
kernel, testing the image with a stock one is always a good start. These
packages come with a set of modules which should be copied to the target
filesystem. The location is distribution-dependent. For Debian these
reside under /usr/lib/uml/modules. Copy the contents of this directory
recursively to the mounted UML filesystem::

   # cp -rax /usr/lib/uml/modules/. /mnt/lib/modules

If you have compiled your own kernel, you need to use the usual "install
modules to a location" procedure by running::

   # make ARCH=um INSTALL_MOD_PATH=/mnt modules_install

This will install modules into /mnt/lib/modules/$(KERNELRELEASE).
To specify the full module installation path, use::

   # make ARCH=um MODLIB=/mnt/lib/modules modules_install

At this point the image is ready to be brought up.

*************************
Setting Up UML Networking
*************************

UML networking is designed to emulate an Ethernet connection. This
connection may be either point-to-point (similar to a connection
between machines using a back-to-back cable) or a connection to a
switch. UML supports a wide variety of means to build these
connections to all of: local machine, remote machine(s), local and
remote UML and other VM instances.

+-----------+--------+------------------------------------+------------+
| Transport |  Type  |        Capabilities                | Throughput |
+===========+========+====================================+============+
| tap       | vector | checksum, tso                      | > 8Gbit    |
+-----------+--------+------------------------------------+------------+
| hybrid    | vector | checksum, tso, multipacket rx      | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| raw       | vector | checksum, tso, multipacket rx, tx  | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| EoGRE     | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| EoL2TPv3  | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| bess      | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| fd        | vector | dependent on fd type               | varies     |
+-----------+--------+------------------------------------+------------+
| tuntap    | legacy | none                               | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| daemon    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| socket    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| pcap      | legacy | rx only                            | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| ethertap  | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| vde       | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+

* All transports which have tso and checksum offloads can deliver speeds
  approaching 10G on TCP streams.

* All transports which have multi-packet rx and/or tx can deliver packet
  rates of up to 1 Mpps (million packets per second) or more.

* All legacy transports are generally limited to ~600-700Mbit and 0.05 Mpps.

* GRE and L2TPv3 allow connections to all of: local machine, remote
  machines, remote network devices and remote UML instances.

* Socket allows connections only between UML instances.

* Daemon and bess require running a local switch. This switch may be
  connected to the host as well.

Network configuration privileges
================================

The majority of the supported networking modes need ``root`` privileges.
For example, in the legacy tuntap networking mode, users were required
to be part of the group associated with the tunnel device.

For newer network drivers like the vector transports, ``root`` privilege
is required to issue an ioctl to set up the tun interface and/or use
raw sockets where needed.

This can be achieved by granting the user a particular capability instead
of running UML as root.  For the vector transports, a user can add the
capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW`` to the UML binary.
After that, UML can be run with normal user privileges, along with
full networking.

For example::

   # sudo setcap cap_net_raw,cap_net_admin+ep linux
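
To confirm that the capabilities were applied, the output of ``getcap``
can be checked. The sketch below only looks for the two capability names,
so it accepts both the older ``file = caps+ep`` and the newer
``file caps=ep`` output styles seen with different libcap versions. The
function name ``uml_caps_ok`` is hypothetical.

```shell
# Hypothetical helper: reads getcap(8) output on stdin and succeeds only if
# both cap_net_admin and cap_net_raw are listed. It does not inspect the
# effective/permitted flags.
uml_caps_ok() {
    caps=$(cat)
    case $caps in
        *cap_net_admin*) ;;
        *) return 1 ;;
    esac
    case $caps in
        *cap_net_raw*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   getcap ./linux | uml_caps_ok && echo "vector networking capabilities set"
```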

Configuring vector transports
===============================

All vector transports support a similar syntax:

If X is the interface number as in vec0, vec1, vec2, etc., the general
syntax for options is::

   vecX:transport="Transport Name",option=value,option=value,...,option=value
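
When vector options are generated from scripts, a small helper keeps the
``vecX:transport=...`` syntax in one place. This is a sketch:
``uml_vec_opt`` is a hypothetical name, and it does not validate option
names or values.

```shell
# Hypothetical helper: build a vector transport option string.
# uml_vec_opt INDEX TRANSPORT [option=value ...]
uml_vec_opt() {
    spec="vec$1:transport=$2"
    shift 2
    # Append each option=value pair, comma separated.
    for opt in "$@"; do
        spec="$spec,$opt"
    done
    printf '%s\n' "$spec"
}

# Example (prints vec0:transport=tap,ifname=tap0,depth=128,gro=1):
#   uml_vec_opt 0 tap ifname=tap0 depth=128 gro=1
```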

Common options
--------------

These options are common for all transports:

* ``depth=int`` - sets the queue depth for vector IO. This is the
  number of packets UML will attempt to read or write in a single
  system call. The default number is 64 and is generally sufficient
  for most applications that need throughput in the 2-4 Gbit range.
  Higher speeds may require larger values.

* ``mac=XX:XX:XX:XX:XX:XX`` - sets the interface MAC address value.

* ``gro=[0,1]`` - sets GRO off or on. Enables receive/transmit offloads.
  The effect of this option depends on the host side support in the transport
  which is being configured. In most cases it will enable TCP segmentation and
  RX/TX checksumming offloads. The setting must be identical on the host side
  and the UML side. The UML kernel will produce warnings if it is not.
  For example, GRO is enabled by default on local machine interfaces
  (e.g. veth pairs, bridges), so it should be enabled in UML in the
  corresponding UML transports (raw, tap, hybrid) in order for networking to
  operate correctly.

* ``mtu=int`` - sets the interface MTU.

* ``headroom=int`` - adjusts the default headroom (32 bytes) reserved
  in case a packet needs to be re-encapsulated, for instance into VXLAN.

* ``vec=0`` - disables multipacket IO and falls back to packet-at-a-time
  mode.
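
For the ``mac=`` option, a randomly chosen address should be locally
administered and unicast, so that it cannot clash with a vendor-assigned
address. A minimal sketch, assuming ``od`` and ``/dev/urandom`` are
available on the host; ``uml_random_mac`` is a hypothetical name.

```shell
# Hypothetical helper: prints a random locally administered unicast MAC
# address in the XX:XX:XX:XX:XX:XX form expected by mac=.
uml_random_mac() {
    # Six random bytes as unsigned decimal numbers (word splitting intended).
    set -- $(od -An -N6 -tu1 /dev/urandom)
    # First octet: set bit 1 (locally administered), clear bit 0 (unicast).
    first=$(( ($1 | 2) & 254 ))
    printf '%02x:%02x:%02x:%02x:%02x:%02x\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

# Example:
#   linux ... vec0:transport=raw,ifname=p-veth0,mac=$(uml_random_mac) ...
```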

Shared Options
--------------

* ``ifname=str`` - transports which bind to a local network interface
  have a shared option: the name of the interface to bind to.

* ``src, dst, src_port, dst_port`` - all transports which use sockets
  which have the notion of source and destination and/or source port
  and destination port use these to specify them.

* ``v6=[0,1]`` - specifies whether a v6 connection is desired for all
  transports which operate over IP. Additionally, for transports that
  have some differences in the way they operate over v4 and v6 (for example
  EoL2TPv3), sets the correct mode of operation. In the absence of this
  option, the socket type is determined based on what the src and dst
  arguments resolve or parse to.

tap transport
-------------

Example::

   vecX:transport=tap,ifname=tap0,depth=128,gro=1

This will connect vecX to tap0 on the host. tap0 must already exist (for
example, created using tunctl) and be up.

tap0 can be configured as a point-to-point interface and given an IP
address so that UML can talk to the host. Alternatively, it is possible
to connect UML to a tap interface which is connected to a bridge.

While tap relies on the vector infrastructure, it is not a true vector
transport at this point, because Linux does not support multi-packet
IO on tap file descriptors for normal userspace apps like UML. This
is a privilege which is offered only to something which can hook up
to it at kernel level via specialized interfaces like vhost-net. A
vhost-net-like helper for UML is planned at some point in the future.

Privileges required: tap transport requires either:

* the tap interface to exist, be persistent and be owned by the UML
  user (for example, created using ``tunctl -u uml-user -t tap0``), or

* the binary to have the ``CAP_NET_ADMIN`` privilege.
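
For the first option, the persistent, user-owned tap device can also be
created with iproute2's ``ip tuntap`` instead of ``tunctl``. The sketch
below only prints the host commands so they can be reviewed before running
them as root. ``uml_tap_cmds`` is a hypothetical name, and the address in
the example is an arbitrary placeholder.

```shell
# Hypothetical helper: print (not run) host commands for a persistent tap
# device owned by USER, so UML can open it without CAP_NET_ADMIN.
# uml_tap_cmds USER IFACE [ADDR/PREFIX]
uml_tap_cmds() {
    user=$1
    ifname=$2
    addr=${3:-}
    echo "ip tuntap add dev $ifname mode tap user $user"
    # Optional host-side address so the guest can reach the host.
    if [ -n "$addr" ]; then
        echo "ip addr add $addr dev $ifname"
    fi
    echo "ip link set $ifname up"
}

# Review the output, then apply it as root, for example:
#   uml_tap_cmds uml-user tap0 192.168.5.1/24 | sudo sh
```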

hybrid transport
----------------

Example::

   vecX:transport=hybrid,ifname=tap0,depth=128,gro=1

This is an experimental/demo transport which couples tap for transmit
and a raw socket for receive. The raw socket allows multi-packet
receive, resulting in significantly higher packet rates than normal tap.

Privileges required: hybrid requires the ``CAP_NET_RAW`` capability for
the UML user, in addition to the requirements for the tap transport.

raw socket transport
--------------------

Example::

   vecX:transport=raw,ifname=p-veth0,depth=128,gro=1

This transport uses vector IO on raw sockets. While you can bind to any
interface including a physical one, the most common use is to bind to
the "peer" side of a veth pair with the other side configured on the
host.

Example host configuration for Debian:

**/etc/network/interfaces**::

   auto veth0
   iface veth0 inet static
    address 192.168.4.1
    netmask 255.255.255.252
    broadcast 192.168.4.3
    pre-up ip link add veth0 type veth peer name p-veth0 && \
          ifconfig p-veth0 up

UML can now bind to p-veth0 like this::

   vec0:transport=raw,ifname=p-veth0,depth=128,gro=1

If the UML guest is configured with 192.168.4.2 and netmask 255.255.255.252,
it can talk to the host on 192.168.4.1.
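
On hosts that do not use ``/etc/network/interfaces``, the same veth pair
can be set up with iproute2 directly. As with the Debian stanza, the host
keeps ``veth0`` and UML binds to the peer. The helper below is a
hypothetical sketch (``uml_veth_cmds`` is not an existing tool) that only
prints the commands.

```shell
# Hypothetical helper: print iproute2 commands equivalent to the Debian
# stanza above. The host end gets the address; UML binds to the peer end.
# uml_veth_cmds HOST_IF PEER_IF HOST_ADDR/PREFIX
uml_veth_cmds() {
    host_if=$1
    peer_if=$2
    addr=$3
    echo "ip link add $host_if type veth peer name $peer_if"
    # "brd +" derives the broadcast address from the prefix.
    echo "ip addr add $addr brd + dev $host_if"
    echo "ip link set $host_if up"
    echo "ip link set $peer_if up"
}

# Same addressing as the example above (255.255.255.252 is a /30):
#   uml_veth_cmds veth0 p-veth0 192.168.4.1/30 | sudo sh
```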

The raw transport also provides some support for offloading some of the
filtering to the host. The two options to control it are:

* ``bpffile=str`` - filename of raw bpf code to be loaded as a socket filter.

* ``bpfflash=int`` - 0/1, allows loading of bpf from inside User Mode Linux.
  This option allows the use of the ethtool load firmware command to
  load bpf code.

In either case the bpf code is loaded into the host kernel. While this is
presently limited to legacy bpf syntax (not ebpf), it is still a security
risk. It is not recommended to allow this unless the User Mode Linux
instance is considered trusted.

Privileges required: raw socket transport requires the ``CAP_NET_RAW``
capability.

GRE socket transport
--------------------

Example::

   vecX:transport=gre,src=$src_host,dst=$dst_host

This will configure an Ethernet over ``GRE`` (aka ``GRETAP`` or
``GREIRB``) tunnel which will connect the UML instance to a ``GRE``
endpoint at host ``$dst_host``. ``GRE`` supports the following additional
options:

* ``rx_key=int`` - GRE 32-bit integer key for rx packets; if set,
  ``tx_key`` must be set too

* ``tx_key=int`` - GRE 32-bit integer key for tx packets; if set,
  ``rx_key`` must be set too

* ``sequence=[0,1]`` - enable GRE sequence

* ``pin_sequence=[0,1]`` - pretend that the sequence is always reset
  on each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force IPv4 or IPv6 sockets respectively

* GRE checksum is not presently supported

GRE has a number of caveats:

* You can use only one GRE connection per IP address. There is no way to
  multiplex connections as each GRE tunnel is terminated directly on
  the UML instance.

* The key is not really a security feature. While it was intended as such,
  its "security" is laughable. It is, however, a useful feature to
  ensure that the tunnel is not misconfigured.

An example configuration for a Linux host with a local address of
192.168.128.1 to connect to a UML instance at 192.168.129.1:

**/etc/network/interfaces**::

   auto gt0
   iface gt0 inet static
    address 10.0.0.1
    netmask 255.255.255.0
    broadcast 10.0.0.255
    mtu 1500
    pre-up ip link add gt0 type gretap local 192.168.128.1 \
           remote 192.168.129.1 || true
    down ip link del gt0 || true
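
The example above does not use keys. When ``rx_key`` and ``tx_key`` are
set, they have to mirror the host side: the key the host sends with
(``okey`` in ``ip link``) is the one UML expects as ``rx_key``, and the key
UML sends with (``tx_key``) is the one the host accepts as ``ikey``. The
hypothetical helper below prints a matching pair of settings as a sketch;
the interface name ``gt0`` follows the example above.

```shell
# Hypothetical helper: print a host-side gretap command and the matching
# UML-side option for a keyed GRE tunnel.
# uml_gre_pair HOST_IP UML_IP KEY_TO_UML KEY_TO_HOST
uml_gre_pair() {
    host_ip=$1
    uml_ip=$2
    key_to_uml=$3
    key_to_host=$4
    # Host: accept KEY_TO_HOST on receive, send with KEY_TO_UML.
    printf 'ip link add gt0 type gretap local %s remote %s ikey %s okey %s\n' \
        "$host_ip" "$uml_ip" "$key_to_host" "$key_to_uml"
    # UML: src is its own address, dst is the host endpoint.
    printf 'vec0:transport=gre,src=%s,dst=%s,rx_key=%s,tx_key=%s\n' \
        "$uml_ip" "$host_ip" "$key_to_uml" "$key_to_host"
}

# Example, using the addresses from the configuration above:
#   uml_gre_pair 192.168.128.1 192.168.129.1 100 200
```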

Additionally, GRE has been tested against a variety of network equipment.

Privileges required: GRE requires ``CAP_NET_RAW``.

l2tpv3 socket transport
-----------------------

**Warning**: L2TPv3 has a "bug". It is the "bug" known as "has more
options than GNU ls". While it has some advantages, there are usually
easier (and less verbose) ways to connect a UML instance to something.
For example, most devices which support L2TPv3 also support GRE.

Example::

    vec0:transport=l2tpv3,udp=1,src=$src_host,dst=$dst_host,srcport=$src_port,dstport=$dst_port,depth=128,rx_session=0xffffffff,tx_session=0xffff

This will configure an Ethernet over L2TPv3 fixed tunnel which will
connect the UML instance to an L2TPv3 endpoint at host $dst_host using
the L2TPv3 UDP flavour and UDP destination port $dst_port.

L2TPv3 always requires the following additional options:

* ``rx_session=int`` - l2tpv3 32-bit integer session for rx packets

* ``tx_session=int`` - l2tpv3 32-bit integer session for tx packets

As the tunnel is fixed, these are not negotiated and they are
preconfigured on both ends.

Additionally, L2TPv3 supports the following optional parameters:

* ``rx_cookie=int`` - l2tpv3 32-bit integer cookie for rx packets - same
  functionality as GRE key, more to prevent misconfiguration than provide
  actual security

* ``tx_cookie=int`` - l2tpv3 32-bit integer cookie for tx packets

* ``cookie64=[0,1]`` - use 64-bit cookies instead of 32-bit.

* ``counter=[0,1]`` - enable l2tpv3 counter

* ``pin_counter=[0,1]`` - pretend that the counter is always reset on
  each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force v6 sockets

* ``udp=[0,1]`` - use raw sockets (0) or UDP (1) version of the protocol

L2TPv3 has a number of caveats:

* You can use only one connection per IP address in raw mode. There is
  no way to multiplex connections as each L2TPv3 tunnel is terminated
  directly on the UML instance. UDP mode can use different ports for
  this purpose.

Here is an example of how to configure a Linux host to connect to UML
via L2TPv3:

**/etc/network/interfaces**::

   auto l2tp1
   iface l2tp1 inet static
    address 192.168.126.1
    netmask 255.255.255.0
    broadcast 192.168.126.255
    mtu 1500
    pre-up ip l2tp add tunnel remote 127.0.0.1 \
           local 127.0.0.1 encap udp tunnel_id 2 \
           peer_tunnel_id 2 udp_sport 1706 udp_dport 1707 && \
           ip l2tp add session name l2tp1 tunnel_id 2 \
           session_id 0xffffffff peer_session_id 0xffffffff
    down ip l2tp del session tunnel_id 2 session_id 0xffffffff && \
           ip l2tp del tunnel tunnel_id 2
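
Session IDs and UDP ports also have to mirror each other: the host's
``session_id`` is what UML must send as ``tx_session``, the host's
``peer_session_id`` is what UML expects as ``rx_session``, and the host's
``udp_sport``/``udp_dport`` become UML's ``dstport``/``srcport``. The
hypothetical helper below prints both sides from one set of values. It is
a sketch: the tunnel ID and session name follow the example above, and the
UML option names ``srcport``/``dstport`` are taken from the L2TPv3 example
earlier in this section.

```shell
# Hypothetical helper: print host-side "ip l2tp" commands and the matching
# UML-side option for a UDP-encapsulated L2TPv3 tunnel.
# uml_l2tp_pair HOST_IP UML_IP HOST_PORT UML_PORT HOST_SESSION UML_SESSION
uml_l2tp_pair() {
    host_ip=$1; uml_ip=$2
    host_port=$3; uml_port=$4
    host_sess=$5; uml_sess=$6
    # Host: sends from HOST_PORT to UML_PORT; its own session is HOST_SESSION.
    printf 'ip l2tp add tunnel remote %s local %s encap udp tunnel_id 2 peer_tunnel_id 2 udp_sport %s udp_dport %s\n' \
        "$uml_ip" "$host_ip" "$host_port" "$uml_port"
    printf 'ip l2tp add session name l2tp1 tunnel_id 2 session_id %s peer_session_id %s\n' \
        "$host_sess" "$uml_sess"
    # UML: the mirror image of the host settings.
    printf 'vec0:transport=l2tpv3,udp=1,src=%s,dst=%s,srcport=%s,dstport=%s,rx_session=%s,tx_session=%s\n' \
        "$uml_ip" "$host_ip" "$uml_port" "$host_port" "$uml_sess" "$host_sess"
}

# With the values from the example above (both ends on 127.0.0.1):
#   uml_l2tp_pair 127.0.0.1 127.0.0.1 1706 1707 0xffffffff 0xffffffff
```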

Privileges required: L2TPv3 requires ``CAP_NET_RAW`` for raw IP mode and
no special privileges for the UDP mode.

BESS socket transport
---------------------

BESS is a high-performance modular network switch.

https://github.com/NetSys/bess

It has support for a simple sequential packet socket mode which, in
more recent versions, uses vector IO for high performance.

Example::

   vecX:transport=bess,src=$unix_src,dst=$unix_dst

This will configure a BESS transport using the unix_src Unix domain
socket address as source and unix_dst socket address as destination.

For BESS configuration and how to allocate a BESS Unix domain socket port,
please see the BESS documentation:

https://github.com/NetSys/bess/wiki/Built-In-Modules-and-Ports

BESS transport does not require any special privileges.

Configuring Legacy transports
=============================

Legacy transports are now considered obsolete. Please use the vector
versions.

***********
Running UML
***********

This section assumes that either the user-mode-linux package from the
distribution or a custom-built kernel has been installed on the host.

These add an executable called linux to the system. This is the UML
kernel. It can be run just like any other executable.
It will take most normal Linux kernel arguments as command line
arguments.  Additionally, it will need some UML-specific arguments
in order to do something useful.

Arguments
=========

Mandatory Arguments:
--------------------

* ``mem=int[K,M,G]`` - amount of memory. By default in bytes. It will
  also accept K, M or G qualifiers.

* ``ubdX[s,d,c,t]=`` virtual disk specification. This is not really
  mandatory, but it is likely to be needed in nearly all cases so we can
  specify a root file system.
  The simplest possible image specification is the name of the image
  file for the filesystem (created using one of the methods described
  in `Creating an image`_).

  * UBD devices support copy on write (COW). The changes are kept in
    a separate file which can be discarded, allowing a rollback to the
    original pristine image.  If COW is desired, the UBD image is
    specified as: ``cow_file,master_image``.
    Example: ``ubd0=Filesystem.cow,Filesystem.img``

  * UBD devices can be set to use synchronous IO. Any writes are
    immediately flushed to disk. This is done by adding ``s`` after
    the ``ubdX`` specification.

  * UBD performs some heuristics on devices specified as a single
    filename to make sure that a COW file has not been specified as
    the image. To turn them off, use the ``d`` flag after ``ubdX``.

  * UBD supports TRIM - asking the host OS to reclaim any unused
    blocks in the image. To turn it off, specify the ``t`` flag after
    ``ubdX``.

* ``root=`` root device - most likely ``/dev/ubda`` (this is a Linux
  filesystem image)

Important Optional Arguments
----------------------------

If UML is run as "linux" with no extra arguments, it will try to start an
xterm for every console configured inside the image (up to 6 in most
Linux distributions). This makes it nice and easy to use UML on a host
with a GUI. It is, however, the wrong approach if UML is to be used as a
testing harness or run in a text-only environment.

To change this behaviour, we need to map a console to one of the supported
"line" channels other than the default xterm.

Example which will divert console number 1 to stdin/stdout::

   con1=fd:0,fd:1

UML supports a wide variety of serial line channels which are specified using
the following syntax::

   conX=channel_type:options[,channel_type:options]

If the channel specification contains two parts separated by a comma, the first
one is input, the second one output.

* The null channel - discard all input or output. Example: ``con=null`` will
  set all consoles to null by default.

* The fd channel - use file descriptor numbers for input/output. Example:
  ``con1=fd:0,fd:1``.

* The port channel - start a telnet server on TCP port number. Example:
  ``con1=port:4321``.  The host must have /usr/sbin/in.telnetd (usually part of
  a telnetd package) and the port-helper from the UML utilities (see the
  information for the xterm channel below).  UML will not boot until a client
  connects.

* The pty and pts channels - use system pty/pts.

* The tty channel - bind to an existing system tty. Example: ``con1=/dev/tty8``
  will make UML use the host's 8th console (usually unused).

* The xterm channel (the default) - bring up an xterm on this channel
  and direct IO to it. Note that in order for xterm to work, the host must
  have the UML distribution package installed. This usually contains the
  port-helper and other utilities needed for UML to communicate with the xterm.
  Alternatively, these need to be compiled and installed from source. All
  options applicable to consoles also apply to UML serial lines which are
  presented as ttyS inside UML.

Starting UML
============

We can now run UML::

   # linux mem=2048M umid=TEST \
    ubd0=Filesystem.img \
    vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
    root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1

This will run an instance with ``2048M RAM`` and try to use the image file
called ``Filesystem.img`` as root. It will connect to the host using tap0.
All consoles except ``con1`` will be disabled and console 1 will
use standard input/output, making it appear in the same terminal it was
started from.
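
For scripted or repeated runs, the same command line can be wrapped in a
small function so that the image, memory size and network option become
parameters. The sketch below is hypothetical (``uml_start``, ``UML_BIN``
and ``DRY_RUN`` are names introduced here); with ``DRY_RUN=1`` it prints
the command instead of starting UML.

```shell
# Hypothetical launcher for the command shown above.
# uml_start IMAGE MEM NETSPEC [UMID]
uml_start() {
    image=$1
    mem=$2
    net=$3
    umid=${4:-TEST}
    # Console setup as above: all consoles null by default, con0 output
    # to stderr, con1 on stdin/stdout.
    set -- "${UML_BIN:-linux}" "mem=$mem" "umid=$umid" "ubd0=$image" "$net" \
        root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example:
#   DRY_RUN=1 uml_start Filesystem.img 2048M \
#       vec0:transport=tap,ifname=tap0,depth=128,gro=1
```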
    701
    702Logging in
    703============
    704
    705If you have not set up a password when generating the image, you will have to
    706shut down the UML instance, mount the image, chroot into it and set it - as
    707described in the Generating an Image section.  If the password is already set,
    708you can just log in.
    709
    710The UML Management Console
    711============================
    712
    713In addition to managing the image from "the inside" using normal sysadmin tools,
    714it is possible to perform a number of low-level operations using the UML
    715management console. The UML management console is a low-level interface to the
    716kernel on a running UML instance, somewhat like the i386 SysRq interface. Since
    717there is a full-blown operating system under UML, there is much greater
    718flexibility possible than with the SysRq mechanism.
    719
    720There are a number of things you can do with the mconsole interface:
    721
* get the kernel version
* add and remove devices
* halt or reboot the machine
* send SysRq commands
* pause and resume the UML
* inspect processes running inside UML
* inspect UML internal /proc state
    729
You need the mconsole client (``uml_mconsole``), which is a part of the UML
tools package available in most Linux distributions.
    732
    733You also need ``CONFIG_MCONSOLE`` (under 'General Setup') enabled in the UML
    734kernel.  When you boot UML, you'll see a line like::
    735
    736   mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole
    737
If you specify a unique machine id on the UML command line, e.g.
``umid=debian``, you'll see this::
    740
    741   mconsole initialized on /home/jdike/.uml/debian/mconsole
    742
    743
    744That file is the socket that uml_mconsole will use to communicate with
    745UML.  Run it with either the umid or the full path as its argument::
    746
    747   # uml_mconsole debian
    748
    749or
    750
    751   # uml_mconsole /home/jdike/.uml/debian/mconsole
    752
    753
    754You'll get a prompt, at which you can run one of these commands:
    755
    756* version
    757* help
    758* halt
    759* reboot
    760* config
    761* remove
    762* sysrq
    764* cad
    765* stop
    766* go
    767* proc
    768* stack
    769
    770version
    771-------
    772
    773This command takes no arguments.  It prints the UML version::
    774
    775   (mconsole)  version
    776   OK Linux OpenWrt 4.14.106 #0 Tue Mar 19 08:19:41 2019 x86_64
    777
    778
There are a couple of actual uses for this.  It's a simple no-op which
    780can be used to check that a UML is running.  It's also a way of
    781sending a device interrupt to the UML. UML mconsole is treated internally as
    782a UML device.
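
Because ``version`` is harmless, it also makes a convenient liveness check in
host scripts. The sketch below assumes that ``uml_mconsole`` accepts a command
after the umid, runs it and exits (rather than opening the interactive
prompt), and that a successful reply starts with ``OK`` as in the transcript
above; the function name and the ``MCONSOLE`` override are hypothetical::

   # Hypothetical helper: succeed if the UML instance named by $1 answers
   # "version" on its management console.
   # MCONSOLE may name a different client (or a stub for testing).
   uml_alive() {
       reply=$("${MCONSOLE:-uml_mconsole}" "$1" version 2>/dev/null) || return 1
       case $reply in
           OK*) return 0 ;;   # replies start with OK on success
           *)   return 1 ;;
       esac
   }

For example, ``uml_alive debian && echo running``. The mconsole is set up by
the kernel, so it may answer before userspace inside UML has finished booting.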
    783
    784help
    785----
    786
    787This command takes no arguments. It prints a short help screen with the
    788supported mconsole commands.
    789
    790
    791halt and reboot
    792---------------
    793
    794These commands take no arguments.  They shut the machine down immediately, with
    795no syncing of disks and no clean shutdown of userspace.  So, they are
    796pretty close to crashing the machine::
    797
    798   (mconsole)  halt
    799   OK
    800
    801config
    802------
    803
    804"config" adds a new device to the virtual machine. This is supported
    805by most UML device drivers. It takes one argument, which is the
    806device to add, with the same syntax as the kernel command line::
    807
    808   (mconsole) config ubd3=/home/jdike/incoming/roots/root_fs_debian22
    809
    810remove
    811------
    812
    813"remove" deletes a device from the system.  Its argument is just the
    814name of the device to be removed. The device must be idle in whatever
    815sense the driver considers necessary.  In the case of the ubd driver,
    816the removed block device must not be mounted, swapped on, or otherwise
    817open, and in the case of the network driver, the device must be down::
    818
    819   (mconsole)  remove ubd3
    820
    821sysrq
    822-----
    823
    824This command takes one argument, which is a single letter.  It calls the
    825generic kernel's SysRq driver, which does whatever is called for by
    826that argument.  See the SysRq documentation in
    827Documentation/admin-guide/sysrq.rst in your favorite kernel tree to
    828see what letters are valid and what they do.
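
Since ``halt`` and ``reboot`` skip the disk sync, a host script can ask the
guest's SysRq handler to flush and remount the filesystems first. A sketch,
assuming the UML kernel has SysRq support enabled and that
``uml_mconsole UMID sysrq s`` runs ``sysrq s`` as a single mconsole command and
exits; ``s`` (emergency sync) and ``u`` (emergency remount read-only) are
standard SysRq letters. The function name, the ``MCONSOLE`` and ``SYNC_WAIT``
variables and the two-second default wait are hypothetical. The emergency
actions run asynchronously, so the wait is a guess rather than a guarantee::

   # Hypothetical "gentler halt": sync, remount read-only, then halt UML $1.
   # SysRq 's' = emergency sync, 'u' = emergency remount read-only.
   uml_safe_halt() {
       mc=${MCONSOLE:-uml_mconsole}
       "$mc" "$1" sysrq s || return 1
       sleep "${SYNC_WAIT:-2}"      # give the asynchronous sync time to run
       "$mc" "$1" sysrq u || return 1
       sleep "${SYNC_WAIT:-2}"
       "$mc" "$1" halt
   }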
    829
    830cad
    831---
    832
This invokes the ``Ctrl-Alt-Del`` action in the running image.  What exactly
    834this ends up doing is up to init, systemd, etc.  Normally, it reboots the
    835machine.
    836
    837stop
    838----
    839
    840This puts the UML in a loop reading mconsole requests until a 'go'
    841mconsole command is received. This is very useful as a
    842debugging/snapshotting tool.
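
As an example of snapshotting, the sketch below pauses an instance, copies
its disk file and resumes it. It assumes that ``uml_mconsole UMID COMMAND``
runs one command and exits, and uses GNU ``cp`` for ``--sparse=always``; the
helper name and the ``MCONSOLE`` override are hypothetical. The copy is taken
without the guest's cooperation, so treat it as crash-consistent at best:
data still in the guest's page cache is not included::

   # Hypothetical helper: pause UML $1, copy disk file $2 to $3, resume.
   uml_snapshot() {
       mc=${MCONSOLE:-uml_mconsole}
       "$mc" "$1" stop || return 1
       cp --sparse=always "$2" "$3"
       status=$?
       # Resume even if the copy failed.
       "$mc" "$1" go
       return $status
   }

For an instance running on a COW file, the COW file is the only disk file
that changes, so it is the one to copy.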
    843
    844go
    845--
    846
This resumes a UML that has been paused by a 'stop' command. Note that
    848when the UML has resumed, TCP connections may have timed out and if
    849the UML is paused for a long period of time, crond might go a little
    850crazy, running all the jobs it didn't do earlier.
    851
    852proc
    853----
    854
This takes one argument - the name of a file in ``/proc`` - and prints its
contents to the mconsole standard output.
    857
    858stack
    859-----
    860
This takes one argument - the PID of a process. Its stack is
printed to standard output.
    863
    864*******************
    865Advanced UML Topics
    866*******************
    867
    868Sharing Filesystems between Virtual Machines
    869============================================
    870
    871Don't attempt to share filesystems simply by booting two UMLs from the
    872same file.  That's the same thing as booting two physical machines
    873from a shared disk.  It will result in filesystem corruption.
    874
    875Using layered block devices
    876---------------------------
    877
    878The way to share a filesystem between two virtual machines is to use
    879the copy-on-write (COW) layering capability of the ubd block driver.
    880Any changed blocks are stored in the private COW file, while reads come
    881from either device - the private one if the requested block is valid in
    882it, the shared one if not.  Using this scheme, the majority of data
    883which is unchanged is shared between an arbitrary number of virtual
    884machines, each of which has a much smaller file containing the changes
    885that it has made.  With a large number of UMLs booting from a large root
    886filesystem, this leads to a huge disk space saving.
    887
    888Sharing file system data will also help performance, since the host will
    889be able to cache the shared data using a much smaller amount of memory,
    890so UML disk requests will be served from the host's memory rather than
    891its disks.  There is a major caveat in doing this on multisocket NUMA
    892machines.  On such hardware, running many UML instances with a shared
master image and COW changes may cause issues like NMIs from excessive
inter-socket traffic.
    895
    896If you are running UML on high-end hardware like this, make sure to
    897bind UML to a set of logical CPUs residing on the same socket using the
``taskset`` command or have a look at the "Tuning UML" section.
    899
    900To add a copy-on-write layer to an existing block device file, simply
    901add the name of the COW file to the appropriate ubd switch::
    902
    903   ubd0=root_fs_cow,root_fs_debian_22
    904
    905where ``root_fs_cow`` is the private COW file and ``root_fs_debian_22`` is
    906the existing shared filesystem.  The COW file need not exist.  If it
    907doesn't, the driver will create and initialize it.
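
For example, two instances can share ``root_fs_debian_22`` with one COW file
each. A sketch only: the helper name, the ``uml1.cow``/``uml2.cow`` file names
and the CPU numbers are hypothetical placeholders (pick CPUs on the same
socket), ``con=pts`` is used so the two instances do not compete for one
terminal, and each instance is pinned with ``taskset`` following the advice
above::

   # Hypothetical helper: boot two UMLs on one shared backing file $1.
   # Each gets a private COW file (created on first boot) and its own CPU.
   start_cow_pair() {
       for n in 1 2; do
           "${TASKSET:-taskset}" -c "$n" "${UML:-linux}" umid="uml$n" \
               ubd0="uml$n.cow,$1" root=/dev/ubda con=pts &
       done
       wait
   }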
    908
    909Disk Usage
    910----------
    911
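UML has TRIM support which will release any unused space in its disk
image files to the underlying OS, so the image files become sparse and
their apparent size (as shown by ``ls -l``) can be much larger than the
space they actually use. Use either ``ls -ls`` or ``du`` to verify the
actual disk usage.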
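
The difference between the apparent size and the space actually allocated
can be seen on any sparse file. The sketch below uses a freshly created
1 GiB sparse file as a stand-in for a trimmed disk image (it assumes GNU
``truncate`` and ``du``)::

   # Create a 1 GiB sparse file as a stand-in for a UML disk image.
   img=$(mktemp)
   trap 'rm -f "$img"' EXIT
   truncate -s 1G "$img"
   ls -l  "$img"                   # apparent size: 1073741824 bytes
   ls -ls "$img"                   # first column: allocated size in blocks
   du -k "$img"                    # allocated space in KiB
   du -k --apparent-size "$img"    # apparent size in KiB: 1048576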
    915
COW validity
------------
    918
    919Any changes to the master image will invalidate all COW files. If this
    920happens, UML will *NOT* automatically delete any of the COW files and
will refuse to boot. In this case the only solution is to either
restore the old image (including its last modified timestamp) or remove
all COW files, which will result in their recreation. In the latter case,
any changes in the COW files will be lost.
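
Keeping a copy of the master image that preserves its modification time
makes the "restore the old image" route possible. A minimal sketch using GNU
``cp`` (``-p`` preserves the timestamp, ``--sparse=always`` keeps the copy
sparse); the helper names are hypothetical::

   # Hypothetical helpers: back up and restore a COW backing file while
   # preserving its modification time (cp -p) and its holes (--sparse).
   save_master()    { cp -p --sparse=always "$1" "$2"; }   # MASTER BACKUP
   restore_master() { cp -p --sparse=always "$2" "$1"; }   # MASTER BACKUP

For example, run ``save_master root_fs_debian_22 root_fs_debian_22.orig``
before experimenting; if the master is modified by mistake,
``restore_master root_fs_debian_22 root_fs_debian_22.orig`` puts back both
its contents and its timestamp.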
    925
    926Cows can moo - uml_moo : Merging a COW file with its backing file
    927-----------------------------------------------------------------
    928
    929Depending on how you use UML and COW devices, it may be advisable to
    930merge the changes in the COW file into the backing file every once in
    931a while.
    932
    933The utility that does this is uml_moo.  Its usage is::
    934
    935   uml_moo COW_file new_backing_file
    936
    937
    938There's no need to specify the backing file since that information is
    939already in the COW file header.  If you're paranoid, boot the new
    940merged file, and if you're happy with it, move it over the old backing
    941file.
    942
    943``uml_moo`` creates a new backing file by default as a safety measure.
It also has a destructive merge option (``-d``) which will merge the COW file
directly into its current backing file.  This is really only usable
when the backing file only has one COW file associated with it.  If
there are multiple COWs associated with a backing file, a ``-d`` merge of
    948one of them will invalidate all of the others.  However, it is
    949convenient if you're short of disk space, and it should also be
    950noticeably faster than a non-destructive merge.
    951
    952``uml_moo`` is installed with the UML distribution packages and is
    953available as a part of UML utilities.
    954
    955Host file access
    956==================
    957
    958If you want to access files on the host machine from inside UML, you
can treat it as a separate machine and either NFS-mount directories
    960from the host or copy files into the virtual machine with scp.
    961However, since UML is running on the host, it can access those
    962files just like any other process and make them available inside the
    963virtual machine without the need to use the network.
    964This is possible with the hostfs virtual filesystem.  With it, you
    965can mount a host directory into the UML filesystem and access the
    966files contained in it just as you would on the host.
    967
    968*SECURITY WARNING*
    969
Unless hostfs is given a root directory on the UML kernel command line, it
allows the UML instance to mount any part of the host filesystem that is
accessible to the user running UML, and to write to it. Always confine
hostfs to a specific "harmless" directory (for example ``/var/tmp``) when
running UML. This is especially important if UML is being run as root.
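
The UML kernel's ``hostfs=`` command-line option provides this confinement:
its argument is a host directory, and every hostfs mount inside UML is then
resolved relative to it. A sketch with a hypothetical ``/var/tmp/uml-share``
directory; the ``host$`` lines run on the host and the ``uml#`` line runs as
root inside UML::

   host$ mkdir -p /var/tmp/uml-share/data
   host$ linux ubd0=Filesystem.img root=/dev/ubda hostfs=/var/tmp/uml-share

   uml# mount none /mnt/host -t hostfs -o data

With this setup the mount inside UML gives access to the host's
``/var/tmp/uml-share/data``, not to ``/data``.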
    974
    975Using hostfs
    976------------
    977
    978To begin with, make sure that hostfs is available inside the virtual
    979machine with::
    980
    981   # cat /proc/filesystems
    982
    983``hostfs`` should be listed.  If it's not, either rebuild the kernel
    984with hostfs configured into it or make sure that hostfs is built as a
    985module and available inside the virtual machine, and insmod it.
    986
    987
    988Now all you need to do is run mount::
    989
    990   # mount none /mnt/host -t hostfs
    991
    992will mount the host's ``/`` on the virtual machine's ``/mnt/host``.
    993If you don't want to mount the host root directory, then you can
    994specify a subdirectory to mount with the -o switch to mount::
    995
    996   # mount none /mnt/home -t hostfs -o /home
    997
will mount the host's ``/home`` on the virtual machine's ``/mnt/home``.
    999
   1000hostfs as the root filesystem
   1001-----------------------------
   1002
   1003It's possible to boot from a directory hierarchy on the host using
   1004hostfs rather than using the standard filesystem in a file.
   1005To start, you need that hierarchy.  The easiest way is to loop mount
   1006an existing root_fs file::
   1007
   1008   #  mount root_fs uml_root_dir -o loop
   1009
   1010
   1011You need to change the filesystem type of ``/`` in ``etc/fstab`` to be
   1012'hostfs', so that line looks like this::
   1013
   1014   /dev/ubd/0       /        hostfs      defaults          1   1
   1015
   1016Then you need to chown to yourself all the files in that directory
that are owned by root.  Use ``chown -h`` so that symbolic links are
changed themselves rather than the host files they point to::

   #  find . -uid 0 -exec chown -h jdike {} \;
   1020
   1021Next, make sure that your UML kernel has hostfs compiled in, not as a
   1022module.  Then run UML with the boot device pointing at that directory::
   1023
   1024   ubd0=/path/to/uml/root/directory
   1025
   1026UML should then boot as it does normally.
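
The ``etc/fstab`` change above can be scripted. A sketch using ``awk``,
assuming a plain whitespace-separated fstab; awk rewrites the modified line
with single spaces between fields, and the file is replaced by a new one, so
its permissions follow your umask. The helper name is hypothetical::

   # Hypothetical helper: set the type of the "/" entry in fstab $1 to hostfs.
   # Comment lines and all other entries are copied through unchanged.
   fstab_root_to_hostfs() {
       awk '$1 !~ /^#/ && $2 == "/" { $3 = "hostfs" } { print }' "$1" > "$1.new" &&
           mv "$1.new" "$1"
   }

For example: ``fstab_root_to_hostfs uml_root_dir/etc/fstab``.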
   1027
   1028Hostfs Caveats
   1029--------------
   1030
Hostfs does not keep track of changes made to the host filesystem from
outside UML. As a result, if a file is changed without UML's
knowledge, UML will not know about it and its own in-memory cache of
the file may be stale. While it is possible to fix this, it is not
   1035something which is being worked on at present.
   1036
   1037Tuning UML
   1038============
   1039
UML at present is strictly uniprocessor. It will, however, spin up a
number of threads to handle various functions.

The UBD driver, SIGIO and the MMU emulation do that. If the system is
idle, these threads will be migrated to other processors on an SMP host.
   1045This, unfortunately, will usually result in LOWER performance because of
   1046all of the cache/memory synchronization traffic between cores. As a
   1047result, UML will usually benefit from being pinned on a single CPU,
   1048especially on a large system. This can result in performance differences
   1049of 5 times or higher on some benchmarks.
   1050
   1051Similarly, on large multi-node NUMA systems UML will benefit if all of
   1052its memory is allocated from the same NUMA node it will run on. The
   1053OS will *NOT* do that by default. In order to do that, the sysadmin
   1054needs to create a suitable tmpfs ramdisk bound to a particular node
and use that as the source for UML RAM allocation by specifying it
in one of the environment variables that UML checks. UML will look at the
values of ``TMPDIR``, ``TMP`` or ``TEMP`` for that. If that fails, it will
look for shmfs mounted under ``/dev/shm``. If everything else fails, it will
use ``/tmp/`` regardless of the filesystem type used for it. In the example
below, X is the NUMA node and Y is a CPU on that node::

   mount -t tmpfs -o mpol=bind:X none /mnt/tmpfs-nodeX
   TEMP=/mnt/tmpfs-nodeX taskset -c Y linux options options options..
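
The placeholders can be filled in from sysfs, which lists the CPUs of each
NUMA node in ``/sys/devices/system/node/nodeN/cpulist``. A sketch with
hypothetical helper names and a ``SYSFS`` override for testing;
``uml_on_node`` must run as root because of the mount, pins UML to the first
CPU of the node, and passes its remaining arguments to the UML binary::

   # Print the CPU list of NUMA node $1, e.g. "8-15,24-31".
   node_cpus() {
       cat "${SYSFS:-/sys}/devices/system/node/node$1/cpulist"
   }

   # Print the first CPU of NUMA node $1.
   first_cpu() {
       node_cpus "$1" | cut -d, -f1 | cut -d- -f1
   }

   # Hypothetical launcher: node-bound tmpfs for UML RAM, pinned to one CPU.
   uml_on_node() {
       node=$1; shift
       dir=/mnt/tmpfs-node$node
       mkdir -p "$dir" || return 1
       mountpoint -q "$dir" ||
           mount -t tmpfs -o mpol=bind:"$node" none "$dir" || return 1
       TEMP=$dir taskset -c "$(first_cpu "$node")" linux "$@"
   }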
   1063
   1064*******************************************
   1065Contributing to UML and Developing with UML
   1066*******************************************
   1067
   1068UML is an excellent platform to develop new Linux kernel concepts -
   1069filesystems, devices, virtualization, etc. It provides unrivalled
   1070opportunities to create and test them without being constrained to
   1071emulating specific hardware.
   1072
Example - want to see how Linux will work with 4096 "proper" network
   1074devices?
   1075
   1076Not an issue with UML. At the same time, this is something which
   1077is difficult with other virtualization packages - they are
   1078constrained by the number of devices allowed on the hardware bus
   1079they are trying to emulate (for example 16 on a PCI bus in qemu).
   1080
   1081If you have something to contribute such as a patch, a bugfix, a
   1082new feature, please send it to ``linux-um@lists.infradead.org``.
   1083
Please follow all standard Linux patch guidelines, such as cc-ing
relevant maintainers and running ``./scripts/checkpatch.pl`` on your patch.
For more details see ``Documentation/process/submitting-patches.rst``.
   1087
Note - the list does not accept HTML or attachments; all emails must
be formatted as plain text.
   1090
   1091Developing always goes hand in hand with debugging. First of all,
you can always run UML under gdb, and there is a section
later on about how to do that. That, however, is not the only way to
debug a Linux kernel. Quite often adding tracing statements and/or
using UML-specific approaches such as ptracing the UML kernel process
are significantly more informative.
   1097
   1098Tracing UML
   1099=============
   1100
   1101When running, UML consists of a main kernel thread and a number of
   1102helper threads. The ones of interest for tracing are NOT the ones
   1103that are already ptraced by UML as a part of its MMU emulation.
   1104
The threads of interest are usually the first three threads visible in a
``ps`` display. The one with the lowest PID number and using the most CPU is
usually the kernel thread. The other two are the disk
(ubd) device helper thread and the SIGIO helper thread.
Running ``strace`` on the kernel thread usually results in the following picture::
   1110
   1111   host$ strace -p 16566
   1112   --- SIGIO {si_signo=SIGIO, si_code=POLL_IN, si_band=65} ---
   1113   epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
   1114   epoll_wait(4, [], 64, 0)                = 0
   1115   rt_sigreturn({mask=[PIPE]})             = 16967
   1116   ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
   1117   ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
   1118   ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
   1119   ptrace(PTRACE_SETREGS, 16967, NULL, 0xd5f34f38) = 0
   1120   ptrace(PTRACE_SETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=2696}]) = 0
   1121   ptrace(PTRACE_SYSEMU, 16967, NULL, 0)   = 0
   1122   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=16967, si_uid=0, si_status=SIGTRAP, si_utime=65, si_stime=89} ---
   1123   wait4(16967, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP | 0x80}], WSTOPPED|__WALL, NULL) = 16967
   1124   ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
   1125   ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
   1126   ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
   1127   timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=2830912}}, NULL) = 0
   1128   getpid()                                = 16566
   1129   clock_nanosleep(CLOCK_MONOTONIC, 0, {tv_sec=1, tv_nsec=0}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
   1130   --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1631716592, ptr=0x614204f0}} ---
   1131   rt_sigreturn({mask=[PIPE]})             = -1 EINTR (Interrupted system call)
   1132
   1133This is a typical picture from a mostly idle UML instance.
   1134
* The UML interrupt controller uses epoll - this is UML waiting for IO
  interrupts::
   1137
   1138   epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
   1139
   1140* The sequence of ptrace calls is part of MMU emulation and running the
   1141  UML userspace.
   1142* ``timer_settime`` is part of the UML high res timer subsystem mapping
   1143  timer requests from inside UML onto the host high resolution timers.
   1144* ``clock_nanosleep`` is UML going into idle (similar to the way a PC
   1145  will execute an ACPI idle).
   1146
As you can see, UML will generate quite a bit of output even when idle. The
output can be very informative when observing IO. It shows the actual IO
calls, their arguments and return values.
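
Because most of that output is MMU emulation, it often helps to hide the
``ptrace`` and ``wait4`` calls when looking at IO. A sketch using standard
``strace`` options (``-p`` to attach, ``-e trace=!set`` to exclude system
calls, ``-tt`` for timestamps, ``-o`` to write to a file); the helper name,
the default output file name and the ``STRACE`` override are hypothetical::

   # Hypothetical helper: trace UML kernel thread $1 without the ptrace/wait4
   # traffic of the MMU emulation, writing the log to $2.
   uml_strace_io() {
       "${STRACE:-strace}" -tt -p "$1" -e 'trace=!ptrace,wait4' \
           -o "${2:-uml-trace.txt}"
   }

For example, ``uml_strace_io 16566`` with the PID found as described above;
interrupting it with Ctrl-C detaches strace and lets UML continue.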
   1150
   1151Kernel debugging
   1152================
   1153
   1154You can run UML under gdb now, though it will not necessarily agree to
   1155be started under it. If you are trying to track a runtime bug, it is
   1156much better to attach gdb to a running UML instance and let UML run.
   1157
   1158Assuming the same PID number as in the previous example, this would be::
   1159
   1160   # gdb -p 16566
   1161
This will STOP the UML instance, so you must enter ``cont`` at the GDB
   1163command line to request it to continue. It may be a good idea to make
   1164this into a gdb script and pass it to gdb as an argument.
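
Such a script can also stop gdb from halting on the signals that UML uses
internally: the trace above already shows SIGIO and SIGALRM, and UML also
handles SIGSEGV itself. A sketch using gdb's ``handle`` and ``cont`` commands
and its ``-p``/``-x`` options; the helper name, the signal list and the
``GDB`` override are assumptions to adjust as needed::

   # Hypothetical helper: attach gdb to UML kernel thread $1, let signals
   # that UML handles itself pass through silently, and continue running.
   uml_gdb() {
       script=$(mktemp) || return 1
       printf '%s\n' \
           'handle SIGSEGV pass nostop noprint' \
           'handle SIGIO pass nostop noprint' \
           'handle SIGALRM pass nostop noprint' \
           'cont' > "$script"
       "${GDB:-gdb}" -p "$1" -x "$script"
       status=$?
       rm -f "$script"
       return $status
   }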
   1165
   1166Developing Device Drivers
   1167=========================
   1168
   1169Nearly all UML drivers are monolithic. While it is possible to build a
UML driver as a kernel module, that limits the possible functionality
to purely in-kernel code that is not UML-specific.  The reason for this is that
   1172in order to really leverage UML, one needs to write a piece of
   1173userspace code which maps driver concepts onto actual userspace host
   1174calls.
   1175
   1176This forms the so-called "user" portion of the driver. While it can
   1177reuse a lot of kernel concepts, it is generally just another piece of
   1178userspace code. This portion needs some matching "kernel" code which
   1179resides inside the UML image and which implements the Linux kernel part.
   1180
   1181*Note: There are very few limitations in the way "kernel" and "user" interact*.
   1182
   1183UML does not have a strictly defined kernel-to-host API. It does not
   1184try to emulate a specific architecture or bus. UML's "kernel" and
   1185"user" can share memory, code and interact as needed to implement
   1186whatever design the software developer has in mind. The only
limitations are purely technical. Because many functions and
variables have the same names in the kernel and in userspace, the developer
should be careful about which includes and libraries they are trying to refer to.
   1190
   1191As a result a lot of userspace code consists of simple wrappers.
   1192E.g. ``os_close_file()`` is just a wrapper around ``close()``
   1193which ensures that the userspace function close does not clash
   1194with similarly named function(s) in the kernel part.
   1195
   1196Using UML as a Test Platform
   1197============================
   1198
   1199UML is an excellent test platform for device driver development. As
   1200with most things UML, "some user assembly may be required". It is
   1201up to the user to build their emulation environment. UML at present
   1202provides only the kernel infrastructure.
   1203
Part of this infrastructure is the ability to load and parse flattened
device tree (FDT) blobs as used in Arm or Open Firmware platforms. These
   1206are supplied as an optional extra argument to the kernel command
   1207line::
   1208
   1209    dtb=filename
   1210
The device tree is loaded and parsed at boot time and is accessible by
drivers which query it. At present this facility is
   1213intended solely for development purposes. UML's own devices do not
   1214query the device tree.
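
A device tree blob for experiments can be produced from source with the
device tree compiler, ``dtc``. A sketch: the node name and ``compatible``
string below are made up for illustration (a driver under development would
look for its own), the helper name and the ``DTC`` override are hypothetical,
and ``dtc`` must be installed::

   # Hypothetical helper: write a one-node device tree source next to the
   # output and compile it to the blob $1 (e.g. test.dtb).
   make_test_dtb() {
       printf '%s\n' \
           '/dts-v1/;' \
           '/ {' \
           '    test-device {' \
           '        compatible = "example,uml-test-device";' \
           '    };' \
           '};' > "$1.dts" || return 1
       "${DTC:-dtc}" -I dts -O dtb -o "$1" "$1.dts"
   }

For example, run ``make_test_dtb test.dtb`` and add ``dtb=test.dtb`` to the
UML command line.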
   1215
   1216Security Considerations
   1217-----------------------
   1218
   1219Drivers or any new functionality should default to not
accepting arbitrary filenames, BPF code or other parameters
   1221which can affect the host from inside the UML instance.
   1222For example, specifying the socket used for IPC communication
   1223between a driver and the host at the UML command line is OK
   1224security-wise. Allowing it as a loadable module parameter
   1225isn't.
   1226
If such functionality is desirable for a particular application
   1228(e.g. loading BPF "firmware" for raw socket network transports),
   1229it should be off by default and should be explicitly turned on
   1230as a command line parameter at startup.
   1231
   1232Even with this in mind, the level of isolation between UML
   1233and the host is relatively weak. If the UML userspace is
   1234allowed to load arbitrary kernel drivers, an attacker can
   1235use this to break out of UML. Thus, if UML is used in
   1236a production application, it is recommended that all modules
   1237are loaded at boot and kernel module loading is disabled
   1238afterwards.
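
One way to implement this inside the UML instance is the
``kernel.modules_disabled`` sysctl, which, once set to 1, blocks all further
module loading until the next boot (it cannot be reset to 0). A sketch of a
boot-time step, with a hypothetical helper name and ``MODPROBE``/``SYSCTL``
overrides; it has to run as root inside UML, after everything the instance
needs has been loaded::

   # Hypothetical boot-time step: load the listed modules, then disable
   # module loading for the rest of this boot.
   lock_modules() {
       for m in "$@"; do
           "${MODPROBE:-modprobe}" "$m" || return 1
       done
       "${SYSCTL:-sysctl}" -w kernel.modules_disabled=1
   }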