cachepc-linux

Fork of AMDESE/linux with modifications for CachePC side-channel attack
git clone https://git.sinitax.com/sinitax/cachepc-linux

memory-model.rst (8081B)


.. SPDX-License-Identifier: GPL-2.0

.. _physical_memory_model:

=====================
Physical Memory Model
=====================

Physical memory in a system may be addressed in different ways. The
simplest case is when the physical memory starts at address 0 and
spans a contiguous range up to the maximal address. That range may,
however, contain small holes that are not accessible to the CPU. There
may also be several contiguous ranges at completely distinct
addresses. On NUMA systems, in addition, different memory banks are
attached to different CPUs.

Linux abstracts this diversity using one of two memory models:
FLATMEM and SPARSEMEM. Each architecture defines which memory models
it supports, what the default memory model is and whether it is
possible to manually override that default.

All the memory models track the status of physical page frames using
`struct page` objects arranged in one or more arrays.

Regardless of the selected memory model, there is a one-to-one
mapping between the physical page frame number (PFN) and the
corresponding `struct page`.

Each memory model defines the :c:func:`pfn_to_page` and
:c:func:`page_to_pfn` helpers that convert a PFN to a `struct page`
and vice versa.

FLATMEM
=======

The simplest memory model is FLATMEM. This model is suitable for
non-NUMA systems with contiguous, or mostly contiguous, physical
memory.

In the FLATMEM memory model, there is a global `mem_map` array that
maps the entire physical memory. For most architectures, the holes
have entries in the `mem_map` array. The `struct page` objects
corresponding to the holes are never fully initialized.

To allocate the `mem_map` array, architecture-specific setup code should
call the :c:func:`free_area_init` function. The `mem_map` array is not
usable, however, until the call to :c:func:`memblock_free_all` that
hands all the memory to the page allocator.

An architecture may free parts of the `mem_map` array that do not cover
actual physical pages. In that case, the architecture-specific
:c:func:`pfn_valid` implementation should take the holes in the
`mem_map` into account.

With FLATMEM, the conversion between a PFN and the `struct page` is
straightforward: `PFN - ARCH_PFN_OFFSET` is an index into the
`mem_map` array.
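
The index arithmetic can be modelled in user space as follows. This is
a minimal sketch, not kernel code: the `struct page` stand-in, the
array size and the value chosen for `ARCH_PFN_OFFSET` are assumptions
made for illustration, and the helpers carry a hypothetical `flat_`
prefix to distinguish them from the real kernel helpers.

.. code-block:: c

   #include <stddef.h>

   /* Stand-in for the kernel's struct page; the real one is larger. */
   struct page {
       unsigned long flags;
   };

   /* Assumed for illustration: memory starts at PFN 0x80000, 16 frames. */
   #define ARCH_PFN_OFFSET 0x80000UL
   #define NR_FRAMES       16

   static struct page mem_map[NR_FRAMES];

   /* PFN -> struct page: index mem_map by (PFN - ARCH_PFN_OFFSET). */
   static struct page *flat_pfn_to_page(unsigned long pfn)
   {
       return mem_map + (pfn - ARCH_PFN_OFFSET);
   }

   /* struct page -> PFN: array index plus ARCH_PFN_OFFSET. */
   static unsigned long flat_page_to_pfn(const struct page *page)
   {
       return (unsigned long)(page - mem_map) + ARCH_PFN_OFFSET;
   }

With these assumed values, PFN `0x80003` maps to `&mem_map[3]`, and the
two helpers are inverses of each other for every valid PFN.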

`ARCH_PFN_OFFSET` defines the first page frame number for systems
whose physical memory starts at an address other than 0.

SPARSEMEM
=========

SPARSEMEM is the most versatile memory model available in Linux and it
is the only memory model that supports several advanced features such
as hot-plug and hot-remove of the physical memory, alternative memory
maps for non-volatile memory devices and deferred initialization of
the memory map for larger systems.

The SPARSEMEM model presents the physical memory as a collection of
sections. A section is represented by struct mem_section, which
contains `section_mem_map`. Logically, `section_mem_map` is a pointer
to an array of struct pages. However, it is stored in an encoded form,
together with additional information that aids section management. The
section size and the maximal number of sections are specified using the
`SECTION_SIZE_BITS` and `MAX_PHYSMEM_BITS` constants defined by each
architecture that supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is the
actual width of a physical address that an architecture supports,
`SECTION_SIZE_BITS` is an arbitrary value.

The maximal number of sections is denoted `NR_MEM_SECTIONS` and
defined as

.. math::

   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}
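
As a worked example of this formula, the sketch below plugs in one
pair of values. The numbers (27 and 46) are assumptions chosen for
illustration; the real values are set per architecture and should be
taken from that architecture's headers. `SECTIONS_SHIFT` and
`SECTION_SIZE_BYTES` are helper names introduced here.

.. code-block:: c

   /* Assumed example values, not authoritative for any architecture. */
   #define SECTION_SIZE_BITS  27
   #define MAX_PHYSMEM_BITS   46

   /* Number of address bits left over to select a section. */
   #define SECTIONS_SHIFT     (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)

   /* NR_MEM_SECTIONS = 2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) */
   #define NR_MEM_SECTIONS    (1UL << SECTIONS_SHIFT)

   /* Bytes of physical address space covered by one section. */
   #define SECTION_SIZE_BYTES (1UL << SECTION_SIZE_BITS)

With these assumptions each section covers 128 MiB and there are
2^19 = 524288 possible sections.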

The `mem_section` objects are arranged in a two-dimensional array
called `mem_sections`. The size and placement of this array depend
on `CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
sections:

* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_sections`
  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
  single `mem_section` object.
* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_sections`
  array is dynamically allocated. Each row contains `PAGE_SIZE` worth of
  `mem_section` objects and the number of rows is calculated to fit
  all the memory sections.
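
The row and column arithmetic for the `CONFIG_SPARSEMEM_EXTREME` case
can be sketched as below. This is an illustration under stated
assumptions, not the kernel implementation: the 16-byte
`struct mem_section` stand-in, the 4 KiB `PAGE_SIZE`, the value of
`NR_MEM_SECTIONS` and the two `section_nr_to_*` helpers are all
hypothetical.

.. code-block:: c

   #include <stdint.h>

   /* Stand-in with an assumed size of 16 bytes. */
   struct mem_section {
       uint64_t section_mem_map;
       uint64_t other;             /* placeholder for further fields */
   };

   #define PAGE_SIZE         4096UL        /* assumed page size */
   #define NR_MEM_SECTIONS   (1UL << 19)   /* assumed, see the formula */

   /* One row holds PAGE_SIZE worth of mem_section objects. */
   #define SECTIONS_PER_ROOT (PAGE_SIZE / sizeof(struct mem_section))

   /* Enough rows to fit all sections, rounding up. */
   #define NR_SECTION_ROOTS \
       ((NR_MEM_SECTIONS + SECTIONS_PER_ROOT - 1) / SECTIONS_PER_ROOT)

   /* Row of the two-dimensional array that holds section nr. */
   static unsigned long section_nr_to_row(unsigned long nr)
   {
       return nr / SECTIONS_PER_ROOT;
   }

   /* Position of section nr within its row. */
   static unsigned long section_nr_to_col(unsigned long nr)
   {
       return nr % SECTIONS_PER_ROOT;
   }

Under these assumptions a row holds 256 objects and 2048 rows are
needed. Setting the per-row count to 1 gives the layout described for
the case where `CONFIG_SPARSEMEM_EXTREME` is disabled.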

The architecture setup code should call sparse_init() to
initialize the memory sections and the memory maps.

With SPARSEMEM there are two possible ways to convert a PFN to the
corresponding `struct page`: "classic sparse" and "sparse
vmemmap". The selection is made at build time and it is determined by
the value of `CONFIG_SPARSEMEM_VMEMMAP`.

Classic sparse encodes the section number of a page in `page->flags`
and uses the high bits of a PFN to access the section that maps that
page frame. Inside a section, the PFN is the index into the array of
pages.
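
A user-space model of the classic sparse lookup might look like the
sketch below. Everything in it is an assumption made for illustration:
the tiny section size (16 pages), the number of sections, the
`sparse_` helper names, and the use of the whole `flags` word to hold
the section number. As a simplification, the sketch indexes each
section's array by the offset of the PFN within the section.

.. code-block:: c

   #include <stddef.h>

   #define PAGE_SHIFT        12
   #define SECTION_SIZE_BITS 16    /* assumed tiny sections: 16 pages */
   #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
   #define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
   #define NR_SECTIONS       4

   struct page {
       unsigned long flags;        /* here: only the section number */
   };

   struct mem_section {
       struct page *section_mem_map;
   };

   static struct page section_pages[NR_SECTIONS][PAGES_PER_SECTION];
   static struct mem_section mem_sections[NR_SECTIONS];

   /* Point every section at its array and tag each page with its section. */
   static void sparse_sketch_init(void)
   {
       for (unsigned long nr = 0; nr < NR_SECTIONS; nr++) {
           mem_sections[nr].section_mem_map = section_pages[nr];
           for (unsigned long i = 0; i < PAGES_PER_SECTION; i++)
               section_pages[nr][i].flags = nr;
       }
   }

   /* High PFN bits select the section, low bits the page inside it. */
   static struct page *sparse_pfn_to_page(unsigned long pfn)
   {
       unsigned long nr = pfn >> PFN_SECTION_SHIFT;

       return mem_sections[nr].section_mem_map +
              (pfn & (PAGES_PER_SECTION - 1));
   }

   /* The section number stored in the page recovers the high PFN bits. */
   static unsigned long sparse_page_to_pfn(const struct page *page)
   {
       unsigned long nr = page->flags;
       unsigned long off =
           (unsigned long)(page - mem_sections[nr].section_mem_map);

       return (nr << PFN_SECTION_SHIFT) + off;
   }

After `sparse_sketch_init()`, PFN `0x25` resolves to page 5 of
section 2, and converting that page back yields `0x25` again.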

Sparse vmemmap uses a virtually mapped memory map to optimize the
:c:func:`pfn_to_page` and :c:func:`page_to_pfn` operations. There is a
global `struct page *vmemmap` pointer that points to a virtually
contiguous array of `struct page` objects. A PFN is an index into that
array and the offset of the `struct page` from `vmemmap` is the PFN of
that page.
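
The resulting conversions reduce to pointer arithmetic, as in this
minimal sketch. It is a user-space model under assumptions: a small
static array (`vmemmap_backing`, a name introduced here) stands in for
the virtually contiguous range, and the `vmemmap_` helper names are
hypothetical.

.. code-block:: c

   #include <stddef.h>

   /* Stand-in for the kernel's struct page. */
   struct page {
       unsigned long flags;
   };

   /* Assumed: 64 page frames, backed by an ordinary array. */
   #define NR_PFNS 64

   static struct page vmemmap_backing[NR_PFNS];
   static struct page *vmemmap = vmemmap_backing;

   /* PFN -> struct page: the PFN indexes the array directly. */
   static struct page *vmemmap_pfn_to_page(unsigned long pfn)
   {
       return vmemmap + pfn;
   }

   /* struct page -> PFN: the element offset from vmemmap. */
   static unsigned long vmemmap_page_to_pfn(const struct page *page)
   {
       return (unsigned long)(page - vmemmap);
   }

Compared with the classic sparse lookup, no section table is consulted
and nothing needs to be stored in the page itself.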

To use vmemmap, an architecture has to reserve a range of virtual
addresses that will map the physical pages containing the memory
map and make sure that `vmemmap` points to that range. In addition,
the architecture should implement the :c:func:`vmemmap_populate` method
that will allocate the physical memory and create page tables for the
virtual memory map. If an architecture does not have any special
requirements for the vmemmap mappings, it can use the default
:c:func:`vmemmap_populate_basepages` provided by the generic memory
management.

The virtually mapped memory map allows storing `struct page` objects
for persistent memory devices in pre-allocated storage on those
devices. This storage is represented with struct vmem_altmap
that is eventually passed to vmemmap_populate() through a long chain
of function calls. The vmemmap_populate() implementation may use the
`vmem_altmap` along with the :c:func:`vmemmap_alloc_block_buf` helper to
allocate the memory map on the persistent memory device.

ZONE_DEVICE
===========

The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
`struct page` `mem_map` services for device-driver-identified physical
address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact
that the page objects for these address ranges are never marked online,
and that a reference must be taken against the device, not just the page,
to keep the memory pinned for active use. `ZONE_DEVICE`, via
:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
:c:func:`get_user_pages` service for the given range of PFNs. Since the
page reference count never drops below 1, the page is never tracked as
free memory and the page's `struct list_head lru` space is repurposed
for back referencing to the host device / driver that mapped the memory.

While `SPARSEMEM` presents memory as a collection of sections,
optionally collected into memory blocks, `ZONE_DEVICE` users need a
smaller granularity for populating the `mem_map`. Given that
`ZONE_DEVICE` memory is never marked online, its memory ranges are
never exposed through the sysfs memory hotplug API on memory block
boundaries. The implementation relies on this lack of user-API
constraint to allow sub-section sized memory ranges to be specified to
:c:func:`arch_add_memory`, the top-half of memory hotplug. Sub-section
support allows for 2MB as the cross-arch common alignment granularity
for :c:func:`devm_memremap_pages`.
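
The 2MB granularity can be illustrated with a simple alignment check.
This is only a sketch: `range_is_subsection_aligned()` is a
hypothetical helper written for this example, the constant names are
assumptions, and the code does not reproduce the validation that
:c:func:`devm_memremap_pages` itself performs.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Assumed sub-section size: 2^21 bytes = 2MB. */
   #define SUBSECTION_SHIFT 21
   #define SUBSECTION_SIZE  (UINT64_C(1) << SUBSECTION_SHIFT)

   /*
    * True when a non-empty physical range starts on a 2MB boundary
    * and its size is a multiple of 2MB.
    */
   static bool range_is_subsection_aligned(uint64_t start, uint64_t size)
   {
       return size != 0 &&
              (start & (SUBSECTION_SIZE - 1)) == 0 &&
              (size & (SUBSECTION_SIZE - 1)) == 0;
   }

For example, a 2MB range starting at 4GB passes the check, while the
same range shifted by one 4KB page does not.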

The users of `ZONE_DEVICE` are:

* pmem: Map platform persistent memory to be used as a direct-I/O target
  via DAX mappings.

* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
  event callbacks to allow a device driver to coordinate memory management
  events related to device memory, typically GPU memory. See
  Documentation/vm/hmm.rst.

* p2pdma: Create `struct page` objects to allow peer devices in a
  PCI/PCIe topology to coordinate direct-DMA operations between themselves,
  i.e. bypass host memory.