Introduction to QEMU

QEMU (Quick Emulator) is a widely used open-source machine emulator and virtualizer. It can also act as a stand-alone hosted hypervisor.

  • QEMU as an emulator – It emulates the CPU and other hardware resources for the guest OS through dynamic binary translation, so the guest OS believes it is interacting directly with the hardware.
  • QEMU as a virtual machine – When paired with a hardware-assisted accelerator such as KVM, QEMU runs guest code directly on the host's physical CPU. This lets the virtual machine get close to the performance of the physical machine.

QEMU provides two modes of operation. In user mode, QEMU can launch Linux processes compiled for one CPU on another CPU, translating system calls on the fly. Those system calls are serviced by the host kernel, so information the kernel reports (such as CPU details) describes the host CPU instead of the emulated CPU. User mode is mainly used to test the output of cross compilers.

In system mode, QEMU emulates a complete virtual machine with a configurable CPU, memory size and peripherals. It is much slower than user-mode emulation because the target kernel is emulated as well.

 

QEMU Architecture

When QEMU is used as a system emulator, it emulates a virtual machine that can run an operating system independently. As shown in the above figure, each virtual machine corresponds to one QEMU process on the host. Each vCPU of the virtual machine corresponds to a thread of that QEMU process.
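The process/thread mapping can be pictured with a small standalone sketch. This is not QEMU's code. It assumes POSIX threads, and `vcpu_thread_fn()` (a hypothetical name) only counts iterations at the point where a real vCPU thread would run guest code under TCG or KVM.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_VCPUS 8

/* Hypothetical model: the guest is this process, each vCPU is one thread. */
typedef struct VCPU {
    int index;              /* vCPU number inside the guest */
    unsigned long executed; /* "work" done by this vCPU thread */
    pthread_t thread;
} VCPU;

/* Stand-in for a vCPU execution loop: counts instead of running guest code. */
static void *vcpu_thread_fn(void *arg)
{
    VCPU *cpu = arg;
    for (int i = 0; i < 1000; i++) {
        cpu->executed++;
    }
    return NULL;
}

/* Start n vCPU threads, wait for all of them, and return the total work done.
 * Returns 0 if n is out of range or a thread cannot be created. */
unsigned long run_guest(VCPU *cpus, int n)
{
    unsigned long total = 0;

    if (n <= 0 || n > MAX_VCPUS) {
        return 0;
    }
    for (int i = 0; i < n; i++) {
        cpus[i].index = i;
        cpus[i].executed = 0;
        if (pthread_create(&cpus[i].thread, NULL, vcpu_thread_fn, &cpus[i]) != 0) {
            for (int j = 0; j < i; j++) {
                pthread_join(cpus[j].thread, NULL); /* clean up started threads */
            }
            return 0;
        }
    }
    for (int i = 0; i < n; i++) {
        pthread_join(cpus[i].thread, NULL);
        total += cpus[i].executed;
    }
    return total;
}
```

Each thread writes only to its own `VCPU` entry. The threads therefore share the process's memory, as vCPUs share guest RAM, without racing on the counters.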

KVM (Kernel based Virtual Machine)

KVM enables full virtualization by using the processor's hardware virtualization extensions (Intel VT-x or AMD-V) to run virtual machines. It is a virtualization module built into the Linux kernel and reuses kernel facilities such as scheduling and memory management. It dramatically increases the speed of a guest OS, making it nearly as responsive and usable as the native (host) OS.

KVM consists of a core kernel module, kvm.ko, and a processor-specific module (kvm-intel.ko or kvm-amd.ko). KVM itself mainly virtualizes the CPU and memory and leaves most device emulation to user space. It is therefore combined with QEMU to form a complete virtualization solution.
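To make the QEMU/KVM split concrete, here is a minimal sketch (not QEMU code) of how a user-space program talks to kvm.ko. It opens the `/dev/kvm` character device and issues the `KVM_GET_API_VERSION` ioctl from `<linux/kvm.h>`, which is one of the first checks a VMM such as QEMU performs. It assumes a Linux host with kernel headers installed; the function name `kvm_api_version()` is our own.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Query the KVM API version through /dev/kvm, the device node kvm.ko
 * exposes to user space. Returns the version reported by the kernel,
 * or -1 if KVM is not available or not accessible on this host. */
int kvm_api_version(void)
{
    int fd = open("/dev/kvm", O_RDWR);
    if (fd < 0) {
        return -1; /* module not loaded, or no permission */
    }
    int version = ioctl(fd, KVM_GET_API_VERSION, 0);
    close(fd);
    return version < 0 ? -1 : version;
}
```

On a host with KVM loaded and accessible, `kvm_api_version()` returns `KVM_API_VERSION` (12). Otherwise it returns -1. Creating a VM and vCPUs continues through further ioctls on the same device.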

QEMU Hardware emulation

QEMU emulation is built from several modules. Some of the important ones are described below.

  • QEMU Memory model – The QEMU memory API models the guest's memory, I/O buses and the corresponding controllers, including:
  1.  Conventional RAM
  2.  Memory-mapped I/O (MMIO)
  3.  Memory controllers (which dynamically map physical memory regions into different address spaces)
  • QEMU Object Model (QOM) – QOM is an object-oriented framework that QEMU implements in C. It abstracts devices, buses and other components as objects.
  • QEMU Bus Model – These modules connect system devices and CPUs for communication between devices and between devices and CPUs.
  • QEMU ARM interrupt – QEMU models interrupt lines as GPIO lines (qemu_irq). On ARM, an interrupt travels from the device through the interrupt controller (GIC) to the CPU:
    [ device ] -----> [ GIC ] -----> [ CPU ]
  • QEMU device (qdev) – qdev is the interface QEMU uses to create guest devices and connect them to each other.
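The interrupt path in the list above can be modelled in a few lines of standalone C. This is a hypothetical sketch of the idea behind qemu_irq: a handler callback, an opaque pointer and a line number, as created by `qemu_allocate_irq()` and driven by `qemu_set_irq()`. `IRQLine`, `irq_set()` and `IntController` are illustrative names. The controller is a generic stand-in for the GIC, not a GIC model.

```c
#include <stdbool.h>
#include <stddef.h>

/* An IRQ line is a callback plus context, similar in spirit to qemu_irq. */
typedef void (*IRQHandler)(void *opaque, int n, int level);

typedef struct IRQLine {
    IRQHandler handler;
    void *opaque; /* object that receives the signal */
    int n;        /* input number on that object */
} IRQLine;

static void irq_set(IRQLine *irq, int level)
{
    if (irq && irq->handler) {
        irq->handler(irq->opaque, irq->n, level);
    }
}

/* CPU end of the chain: remembers whether its IRQ input is asserted. */
typedef struct CPU {
    bool irq_pending;
} CPU;

static void cpu_irq_handler(void *opaque, int n, int level)
{
    (void)n;
    ((CPU *)opaque)->irq_pending = (level != 0);
}

/* Interrupt controller: ORs its enabled input lines into one output. */
typedef struct IntController {
    unsigned pending; /* bit i set: input line i asserted */
    unsigned enabled; /* bit i set: input line i enabled */
    IRQLine *out;     /* line towards the CPU */
} IntController;

static void ic_input_handler(void *opaque, int n, int level)
{
    IntController *ic = opaque;
    if (level) {
        ic->pending |= 1u << n;
    } else {
        ic->pending &= ~(1u << n);
    }
    irq_set(ic->out, (ic->pending & ic->enabled) != 0);
}
```

Suppose the watchdog's line is wired to input 0 of the controller, as in `IRQLine wdt_irq = { ic_input_handler, &ic, 0 }`. Then `irq_set(&wdt_irq, 1)` sets `cpu.irq_pending` and lowering the line clears it again. A line whose enable bit is clear never reaches the CPU.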

ARM Watchdog modeling (SP805)

This chapter demonstrates how to model the ARM Watchdog module (SP805) and interface it to the existing ARM versatilepb platform. Refer to the SP805 Technical Reference Manual (TRM) for more details.

Module State structure
typedef struct SP805State {
    /*< private >*/
    SysBusDevice parent_obj;
    QEMUTimer *timer;
    /*< public >*/
    MemoryRegion iomem;
    qemu_irq wdt_irq;
    uint32_t regs[SP805_WDT_REGS_MAX];
    uint32_t wdt_lockreg;
    uint32_t wdt_perIdreg[SP805_PERIFID_REG_MAX]; // PERIPHERAL ID Registers
    uint32_t wdt_pcellreg[SP805_PCELL_REG_MAX]; // PCELL ID registers
    uint32_t wdt_clk_freq; // WDT clock frequency
    int64_t wdt_start_clk; // Virtual clock when wdt counter started
}SP805State;

This structure holds the per-instance state of the device. That state comprises the parent SysBusDevice (the bus the device is attached to), the watchdog timer, the MMIO region (iomem), the IRQ line, and the register contents. The structure is also central to migration: the register state is serialized on the source host and deserialized on the destination host.

Module Class
typedef struct SP805Class {
    SysBusDeviceClass parent_class; // Inheritance using nested struct
    void (*wdt_reload) (SP805State *s);
}SP805Class;

In QEMU, the OOP functionality is implemented with nested structures used to store class and instance definitions, and callbacks as virtual methods.
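The nested-struct pattern can be shown in isolation. The following is a minimal, hypothetical sketch with our own type and function names, not QOM itself. The parent class and parent instance are embedded as the first members, and a "virtual method" stored in the class struct is overridden by the subclass and dispatched from generic code.

```c
#include <stdint.h>

typedef struct DeviceBase DeviceBase;
typedef struct Watchdog Watchdog;

/* Base "class": virtual methods shared by all devices. */
typedef struct DeviceBaseClass {
    void (*reset)(DeviceBase *dev);
} DeviceBaseClass;

/* Subclass: embeds the parent class first, then adds its own method
 * (compare SP805Class and its wdt_reload callback). */
typedef struct WatchdogClass {
    DeviceBaseClass parent_class;
    void (*reload)(Watchdog *wdt);
} WatchdogClass;

/* Base instance: every object points to its class. */
struct DeviceBase {
    const DeviceBaseClass *klass;
};

/* Subclass instance: the parent must be the first member. */
struct Watchdog {
    DeviceBase parent_obj;
    uint32_t load;
    uint32_t counter;
};

/* Up/down-casts; QEMU's equivalents (e.g. SP805_WDT()) also check the type. */
static DeviceBase *to_device_base(Watchdog *w) { return &w->parent_obj; }
static Watchdog *to_watchdog(DeviceBase *d) { return (Watchdog *)d; }
static const WatchdogClass *watchdog_get_class(const Watchdog *w)
{
    return (const WatchdogClass *)w->parent_obj.klass;
}

static void watchdog_reload(Watchdog *w)
{
    w->counter = w->load;
}

/* Override of the parent's reset "virtual method". */
static void watchdog_reset(DeviceBase *d)
{
    Watchdog *w = to_watchdog(d);
    w->load = 0xFFFFFFFFu;
    watchdog_get_class(w)->reload(w); /* dispatch through the class */
}

static const WatchdogClass watchdog_class = {
    .parent_class = { .reset = watchdog_reset },
    .reload = watchdog_reload,
};

/* Generic code that only knows the base type still reaches the override. */
static void device_reset(DeviceBase *d)
{
    d->klass->reset(d);
}

static void watchdog_instance_init(Watchdog *w)
{
    w->parent_obj.klass = &watchdog_class.parent_class;
    w->load = 0;
    w->counter = 0;
}
```

The casts are valid because C guarantees that a pointer to a struct, suitably converted, points to its first member and vice versa. That guarantee is why `parent_obj` and `parent_class` must come first.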

VMState Structures
static const VMStateDescription vmstate_sp805_wdt = {
    .name = "vmstate sp805 wdt",
    .version_id = 0,
    .minimum_version_id = 0,
    .fields = (VMStateField[]) {
        VMSTATE_TIMER_PTR(timer, SP805State),
        VMSTATE_UINT32_ARRAY(regs, SP805State, SP805_WDT_REGS_MAX),
        VMSTATE_UINT32(wdt_lockreg, SP805State),
        VMSTATE_UINT32_ARRAY(wdt_perIdreg, SP805State, SP805_PERIFID_REG_MAX),
        VMSTATE_UINT32_ARRAY(wdt_pcellreg, SP805State, SP805_PCELL_REG_MAX),
        VMSTATE_END_OF_LIST()
    }
};

Migration is the process of moving a guest VM from one hypervisor to another while the guest is still running. The guest continues to work normally and is unaware that the hypervisor has changed. Usually, migration involves moving the entire guest from one physical host machine to another.

The fields listed in this structure are the device state that is preserved across migration. VMState provides put() and get() handlers for each field, which serialize and deserialize it. Refer to migration.rst in the QEMU documentation for more details.
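To illustrate the idea behind VMState field descriptions, here is a hypothetical mini version; it does not reproduce VMState's real wire format. Each migrated field is described once by name, offset and size, and generic save/load loops perform the serialization. Unlike real VMState, it copies raw host-endian bytes and has no versioning. All names are our own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per migrated field, like a VMSTATE_* line. */
typedef struct FieldDesc {
    const char *name;
    size_t offset; /* offsetof(state struct, field) */
    size_t size;   /* bytes to copy */
} FieldDesc;

typedef struct DemoWdtState {
    uint32_t regs[6];
    uint32_t lockreg;
    void *not_migrated; /* host pointers are never migrated */
} DemoWdtState;

static const FieldDesc demo_wdt_fields[] = {
    { "regs",    offsetof(DemoWdtState, regs),    sizeof(((DemoWdtState *)0)->regs) },
    { "lockreg", offsetof(DemoWdtState, lockreg), sizeof(uint32_t) },
    { NULL, 0, 0 } /* terminator, like VMSTATE_END_OF_LIST() */
};

/* "put": append every described field to buf; returns bytes written (0 on overflow). */
size_t state_save(const void *state, const FieldDesc *f, uint8_t *buf, size_t cap)
{
    size_t pos = 0;
    for (; f->name; f++) {
        if (pos + f->size > cap) {
            return 0;
        }
        memcpy(buf + pos, (const uint8_t *)state + f->offset, f->size);
        pos += f->size;
    }
    return pos;
}

/* "get": restore the fields in the same order; returns bytes consumed (0 if short). */
size_t state_load(void *state, const FieldDesc *f, const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    for (; f->name; f++) {
        if (pos + f->size > len) {
            return 0;
        }
        memcpy((uint8_t *)state + f->offset, buf + pos, f->size);
        pos += f->size;
    }
    return pos;
}
```

Only fields listed in the table travel to the destination. As with `wdt_clk_freq` in `vmstate_sp805_wdt`, anything left out must be recreated by the destination's own initialization.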

/* read register callback */
static uint64_t sp805_wdt_read(void *opaque, hwaddr offset, unsigned size)
{
    SP805State *s = SP805_WDT(opaque);

    /* registers below the lock register are word-spaced: byte offset -> index */
    if (offset < SP805_WDT_LOCK_REG) {
        offset >>= 2;
    }

    switch (offset) {
    case SP805_WDT_LOAD_REG:
        return s->regs[SP805_WDT_LOAD_REG];
    /* …. code to read every reg */
    default:
        return 0; /* unmapped offsets read as zero */
    }
}

/* write register callback */
static void sp805_wdt_write(void *opaque, hwaddr offset, uint64_t data,
                            unsigned size)
{
    SP805State *s = SP805_WDT(opaque);
    SP805Class *swc = SP805_WDT_GET_CLASS(s);

    /* make sure register writes are unlocked */
    bool is_unlocked = (s->wdt_lockreg == SP805_WDT_SPECIAL_UNLOCK_CODE);

    if (offset < SP805_WDT_LOCK_REG) {
        offset >>= 2;
    }

    switch (offset) {
    case SP805_WDT_LOAD_REG:
        if (is_unlocked) {
            s->regs[SP805_WDT_LOAD_REG] = data;
            swc->wdt_reload(s);
        }
        break;
    /* …. some code to write every reg */
    }
}
/* register read and write callbacks */
static const MemoryRegionOps sp805_wdt_ops = {
    .read = sp805_wdt_read,
    .write = sp805_wdt_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
};

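The read and write callbacks are registered in the MemoryRegionOps structure. memory_region_init_io() attaches that structure to the device's MMIO region during initialization.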
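The `offset >>= 2` step in both callbacks converts the byte offsets of the word-spaced low registers into array indices. The lock and ID registers, by contrast, are matched by their byte offsets. The hypothetical `sp805_decode()` below applies the same rule to the full SP805 register map from the TRM:

- WdogLoad at 0x000 through WdogMIS at 0x014
- WdogLock at 0xC00
- the integration-test registers at 0xF00/0xF04
- PeriphID at 0xFE0 to 0xFEC
- PCellID at 0xFF0 to 0xFFC

The enum names are our own.

```c
#include <stdint.h>

/* Hypothetical register identifiers; offsets follow the SP805 TRM. */
typedef enum Sp805Reg {
    SP805_REG_INVALID = -1,
    SP805_REG_LOAD,       /* 0x000 WdogLoad */
    SP805_REG_VALUE,      /* 0x004 WdogValue */
    SP805_REG_CONTROL,    /* 0x008 WdogControl */
    SP805_REG_INTCLR,     /* 0x00C WdogIntClr */
    SP805_REG_RIS,        /* 0x010 WdogRIS */
    SP805_REG_MIS,        /* 0x014 WdogMIS */
    SP805_REG_LOCK,       /* 0xC00 WdogLock */
    SP805_REG_ITCR,       /* 0xF00 WdogITCR */
    SP805_REG_ITOP,       /* 0xF04 WdogITOP */
    SP805_REG_PERIPH_ID0, /* 0xFE0..0xFEC WdogPeriphID0..3 */
    SP805_REG_PCELL_ID0 = SP805_REG_PERIPH_ID0 + 4, /* 0xFF0..0xFFC WdogPCellID0..3 */
} Sp805Reg;

Sp805Reg sp805_decode(uint64_t offset)
{
    if (offset & 3) {
        return SP805_REG_INVALID;   /* only aligned 32-bit registers */
    }
    if (offset < 0xC00) {           /* the "offset < LOCK_REG" case */
        uint64_t idx = offset >> 2; /* byte offset -> word index */
        return idx <= (uint64_t)SP805_REG_MIS ? (Sp805Reg)idx : SP805_REG_INVALID;
    }
    switch (offset) {
    case 0xC00: return SP805_REG_LOCK;
    case 0xF00: return SP805_REG_ITCR;
    case 0xF04: return SP805_REG_ITOP;
    default: break;
    }
    if (offset >= 0xFE0 && offset <= 0xFEC) {
        return (Sp805Reg)(SP805_REG_PERIPH_ID0 + (int)((offset - 0xFE0) >> 2));
    }
    if (offset >= 0xFF0 && offset <= 0xFFC) {
        return (Sp805Reg)(SP805_REG_PCELL_ID0 + (int)((offset - 0xFF0) >> 2));
    }
    return SP805_REG_INVALID;
}
```

For example, a guest access at 0x008 decodes to index 2 (WdogControl), while 0xC00 stays a byte offset and decodes to WdogLock.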

Device reset
static void sp805_wdt_reset(DeviceState *dev)
{
    SP805State *s = SP805_WDT(dev);

    s->regs[SP805_WDT_LOAD_REG] = 0xFFFFFFFF;
    s->regs[SP805_WDT_CURRENT_REG] = 0xFFFFFFFF;
    s->regs[SP805_WDT_CONTROL_REG] = 0x0;
    s->regs[SP805_WDT_RAW_INT_STS_REG] = 0x0;
    s->regs[SP805_WDT_MSK_INT_STS_REG] = 0x0;
    s->wdt_lockreg = 0x0;
    s->wdt_perIdreg[SP805_WDT_PERIPH_ID0_IDX] = 0x05;
    s->wdt_perIdreg[SP805_WDT_PERIPH_ID1_IDX] = 0x18;
    s->wdt_perIdreg[SP805_WDT_PERIPH_ID2_IDX] = 0x14;
    s->wdt_perIdreg[SP805_WDT_PERIPH_ID3_IDX] = 0x00;
    s->wdt_pcellreg[SP805_WDT_PCELL_ID0_IDX] = 0x0D;
    s->wdt_pcellreg[SP805_WDT_PCELL_ID1_IDX] = 0xF0;
    s->wdt_pcellreg[SP805_WDT_PCELL_ID2_IDX] = 0x05;
    s->wdt_pcellreg[SP805_WDT_PCELL_ID3_IDX] = 0xB1;

    timer_del(s->timer);
}

This callback is invoked through qemu_system_reset(). That function runs when the machine is initialized and whenever the system is reset.
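The identification values written in sp805_wdt_reset() follow the standard PrimeCell ID layout. As a sketch (`primecell_decode()` is a hypothetical helper), decoding them shows what the guest sees:

- part number 0x805
- designer code 0x41 (ARM)
- revision 1
- component ID 0xB105F00D

```c
#include <stdint.h>

/* Fields of the PrimeCell identification registers. */
typedef struct PrimeCellId {
    uint32_t part_number;  /* 12 bits: PeriphID0[7:0], PeriphID1[3:0] */
    uint32_t designer;     /* 8 bits:  PeriphID1[7:4], PeriphID2[3:0] */
    uint32_t revision;     /* 4 bits:  PeriphID2[7:4] */
    uint32_t component_id; /* PCellID3..PCellID0 concatenated */
} PrimeCellId;

PrimeCellId primecell_decode(const uint32_t periph[4], const uint32_t pcell[4])
{
    PrimeCellId id;

    id.part_number = (periph[0] & 0xFFu) | ((periph[1] & 0x0Fu) << 8);
    id.designer = ((periph[1] >> 4) & 0x0Fu) | ((periph[2] & 0x0Fu) << 4);
    id.revision = (periph[2] >> 4) & 0x0Fu;
    id.component_id = (pcell[0] & 0xFFu) | ((pcell[1] & 0xFFu) << 8) |
                      ((pcell[2] & 0xFFu) << 16) | ((pcell[3] & 0xFFu) << 24);
    return id;
}
```

Feeding it the reset values {0x05, 0x18, 0x14, 0x00} and {0x0D, 0xF0, 0x05, 0xB1} is a quick way to confirm that a register dump taken from the guest matches the model.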

Device initialization
static void sp805_wdt_init(Object *obj)
{
    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
    SP805State *s = SP805_WDT(obj);
    /* the SP805 occupies a 4 KiB register window */
    memory_region_init_io(&s->iomem, OBJECT(s), &sp805_wdt_ops, s, TYPE_SP805_WDT, 0x1000);

    sysbus_init_mmio(sbd, &s->iomem);
    sysbus_init_irq(sbd, &s->wdt_irq);
}

static void sp805_wdt_realize(DeviceState *dev, Error **errp)
{
    SP805State *s = SP805_WDT(dev);

    s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, sp805_wdt_timer_expired, dev);   
    s->wdt_clk_freq = SP805_PCLK_HZ;
}

static void sp805_wdt_class_init(ObjectClass *klass, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    SP805Class *swc = SP805_WDT_CLASS(klass);

    dc->desc = "ARM Watchdog Controller Module (SP805)";
    dc->realize = sp805_wdt_realize;
    dc->reset = sp805_wdt_reset;
    dc->vmsd = &vmstate_sp805_wdt;
    swc->wdt_reload = sp805_wdt_reload;
}

static const TypeInfo sp805_wdt_info = {
    .parent = TYPE_SYS_BUS_DEVICE,
    .name = TYPE_SP805_WDT,
    .instance_size = sizeof(SP805State),
    .instance_init = sp805_wdt_init,
    .class_init = sp805_wdt_class_init,
    .class_size = sizeof(SP805Class),
};

The .class_init callbacks are called early, by a mechanism similar to C++ constructors, and they initialise the structures used to store the class definitions. They are recursively chained, i.e. first the parent callback is called to initialise the parent structure members, then the current callback is called, to fill in its own members.

The .instance_init callback is automatically called when new instances of a class are created. Similarly, they are also recursively chained. .instance_init may also create children objects, recursively.

.realize is a bit trickier. Whereas .instance_init is the very first thing called automatically when an object is created, .realize is the very last. It is called explicitly once the whole hierarchy of objects has been created, and it signals that everything is ready. A subclass's .realize usually has to call the parent's .realize itself.

Some definite rules for initialization

  • anything that can fail and return an error must go in realize (because instance_init has no failure-return mechanism)
  • anything you need to do to set up a QOM property on the object must go in instance_init (so that the property can be set by the user of the device object before realizing it)
  • anything that changes the state of the simulation must go in realize (some QMP monitor commands to introspect objects will do instance_init/look at object/delete)
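These rules, and the chaining described above for .class_init and .realize, can be sketched with plain C structs. In this hypothetical model, instance_init only sets property defaults and cannot fail. realize validates a property and reports an error. The subclass saves the parent's realize before overriding it; QEMU itself offers device_class_set_parent_realize() for that step. QEMU also uses `Error **` rather than the `const char **` used here.

```c
#include <stddef.h>

typedef struct Dev Dev;
typedef int (*RealizeFn)(Dev *dev, const char **errp);

typedef struct DevClass {
    RealizeFn realize;
} DevClass;

typedef struct WdtClass {
    DevClass parent_class;    /* must be first */
    RealizeFn parent_realize; /* saved by wdt_class_init() */
} WdtClass;

struct Dev {
    const DevClass *klass;
    int realized;
    unsigned clk_hz; /* "property": set between instance_init and realize */
};

/* The parent's part of realization. */
static int dev_realize_base(Dev *dev, const char **errp)
{
    (void)errp;
    dev->realized = 1;
    return 0;
}

/* Subclass realize: may fail, then chains to the parent. */
static int wdt_realize(Dev *dev, const char **errp)
{
    const WdtClass *wc = (const WdtClass *)dev->klass;
    if (dev->clk_hz == 0) {
        *errp = "clock frequency must be non-zero";
        return -1;
    }
    return wc->parent_realize(dev, errp);
}

static void wdt_class_init(WdtClass *wc)
{
    wc->parent_class.realize = dev_realize_base; /* parent class_init ran first */
    wc->parent_realize = wc->parent_class.realize;
    wc->parent_class.realize = wdt_realize;      /* then the override */
}

/* instance_init: cannot fail, only sets defaults so properties can be set. */
static void wdt_instance_init(Dev *dev, const WdtClass *wc)
{
    dev->klass = &wc->parent_class;
    dev->realized = 0;
    dev->clk_hz = 0;
}
```

Realizing with the default property fails cleanly and leaves the device unrealized. Setting `clk_hz` first lets the subclass check pass and the parent's realize run.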

Device registration
static void wdt_sp805_register_types(void)
{
    watchdog_add_model(&model);
    type_register_static(&sp805_wdt_info);
}
type_init(wdt_sp805_register_types)

This registers the SP805 device type with QOM. The device itself is then added to the qdev tree of the versatilepb machine, as the next section shows.

Device Interface to versatilepb machine
    /* Add SP805 ARM Watchdog Module */
    sysbus_create_simple("sp805.wdt", 0x101e1000, pic[0]);

Adding the line above to hw/arm/versatilepb.c instantiates the device in the versatilepb machine. sysbus_create_simple() is a helper that creates and realizes the device. It maps the device's first MMIO region at the address given as the second argument (0x101e1000) and connects its first IRQ output to the interrupt line given as the third argument (pic[0]).

Timer reload and expire
static void sp805_wdt_reload(SP805State *s)
{
    uint64_t reload;

    reload = muldiv64(s->regs[SP805_WDT_LOAD_REG], NANOSECONDS_PER_SECOND,
                      s->wdt_clk_freq); // load * 10^9 / clk_freq
    if(sp805_wdt_is_enabled(s)) {
        s->wdt_start_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
        timer_mod(s->timer, s->wdt_start_clk + reload); // modify the timer to expire after reload time
    }
}

static void sp805_wdt_timer_expired(void *dev)
{
    SP805State *s = SP805_WDT(dev);
    SP805Class *swc = SP805_WDT_GET_CLASS(s);

    if(s->regs[SP805_WDT_RAW_INT_STS_REG] == 0x0) {
        s->regs[SP805_WDT_RAW_INT_STS_REG] = 0x1;
        qemu_set_irq(s->wdt_irq, 1);
        swc->wdt_reload(s);
    }
    else
    {
        qemu_log_mask(CPU_LOG_RESET, "Watchdog timer %" HWADDR_PRIx " expired.\n", s->iomem.addr);
        watchdog_perform_action();
        timer_del(s->timer);
    }
}

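The QEMU timer API (timer_mod() and timer_del()) is used to reload the counter. The first time the timer expires, the model sets the raw interrupt status, raises the watchdog interrupt and reloads the counter. If the timer expires again while the interrupt is still pending, watchdog_perform_action() carries out the configured watchdog action, which resets the machine by default.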
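The two-stage behaviour can be modelled outside QEMU as a small state machine. In this hypothetical sketch, `WdtModel`, `wdt_expire()` and `wdt_int_clear()` are our names, not QEMU's. `wdt_int_clear()` stands for a guest write to WdogIntClr, which per the SP805 TRM clears the interrupt and reloads the counter. The reset branch only records that watchdog_perform_action() would run.

```c
#include <stdbool.h>

typedef struct WdtModel {
    bool ris;        /* WdogRIS: raw interrupt status */
    bool irq_level;  /* level driven on the IRQ line */
    int reloads;     /* number of counter reloads */
    bool reset_done; /* watchdog action performed */
} WdtModel;

/* Timer expiry, mirroring sp805_wdt_timer_expired(). */
void wdt_expire(WdtModel *m)
{
    if (!m->ris) {
        m->ris = true;        /* stage 1: raise the interrupt ... */
        m->irq_level = true;
        m->reloads++;         /* ... and reload the counter */
    } else {
        m->reset_done = true; /* stage 2: watchdog_perform_action() */
    }
}

/* Guest services the watchdog by writing WdogIntClr. */
void wdt_int_clear(WdtModel *m)
{
    m->ris = false;
    m->irq_level = false;
    m->reloads++;
}
```

A guest that services the interrupt between expiries never triggers the reset. One that ignores it is reset on the second expiry, which is exactly what the test application below relies on.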

The SP805 module code is compiled as part of QEMU (qemu-system-arm), and the new SP805 device becomes part of the Versatile PB machine device tree (i.e., the qdev tree).

The reset values of the WDT registers were verified from the QEMU monitor by reading the device's memory directly. The register dump of the SP805 peripheral ID and PrimeCell ID registers is verified as shown below.

SP805 ARM Watchdog timer application
/**
 * This is a very basic application for the SP805 watchdog timer using /dev/mem.
 * A few basic registers are programmed for demonstration; it can be extended
 * to use the other registers.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

typedef uint32_t u32;

int main(void)
{
    size_t sp805_memmap_size = 0x1000; // Physical memory size mapped for SP805
    off_t sp805_base = 0x101e1000;     // Physical base address
    volatile u32 *sp805_vptr;          // volatile: every access must reach the device
    void *map;
    int fd;

    // Map the SP805 physical address into user space to get a virtual address for it
    if ((fd = open("/dev/mem", O_RDWR | O_SYNC)) == -1) {
        perror("Failed to open /dev/mem");
        return EXIT_FAILURE;
    }
    map = mmap(NULL, sp805_memmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, sp805_base);
    if (map == MAP_FAILED) {
        perror("Failed to map SP805 registers");
        close(fd);
        return EXIT_FAILURE;
    }
    sp805_vptr = (volatile u32 *)map;

    // Unlock register writes: the model only accepts writes after 0x1ACCE551
    // has been written to WdogLock (Base + 0xC00)
    sp805_vptr[0xC00 / 4] = 0x1ACCE551;

    /**
     * Set interval for 100 ms - refer TRM page no 11
     * Interval = (WdogLoad + 1) * watchdog clock period
     * SP805 clock freq is 1 MHz (hence clock period is 1 us)
     */
    sp805_vptr[0] = 0x0001869F; // WdogLoad register (Base + 0x00)
    sp805_vptr[2] = 0x1;        // WdogControl register (Base + 0x08)

    // Wait until the timer expires so that the file descriptor is not closed
    while (sp805_vptr[4] != 0x1) // WdogRIS register (Base + 0x10)
        ;

    munmap(map, sp805_memmap_size);
    close(fd);
    return 0;
}

The application above can be cross-compiled and run inside the emulated versatilepb guest. When the watchdog expires again while the interrupt is still pending, QEMU resets the machine and the guest starts rebooting.
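As a cross-check on the numbers in the application, the sketch below reproduces in plain C the conversion that sp805_wdt_reload() performs with muldiv64() (load * 10^9 / clk_freq). It assumes the 1 MHz watchdog clock stated in the application's comment. `sp805_load_to_ns()` is a hypothetical helper, not part of QEMU.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Timeout in ns for `load` counter ticks at `clk_hz`, mirroring
 * load * 10^9 / clk_freq in sp805_wdt_reload(). */
uint64_t sp805_load_to_ns(uint32_t load, uint32_t clk_hz)
{
    if (clk_hz == 0) {
        return 0; /* guard against division by zero */
    }
    /* (2^32 - 1) * 10^9 < 2^64, so a 32-bit load cannot overflow this product;
     * QEMU's muldiv64() uses a wider intermediate and is safe for 64-bit inputs. */
    return (uint64_t)load * NS_PER_SEC / clk_hz;
}
```

For WdogLoad = 0x1869F (99999) at 1 MHz this gives 99,999,000 ns. The (WdogLoad + 1) formula from the TRM gives exactly 100 ms, so the model fires one clock period (1 us) earlier, which is negligible for this test.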

References

  1. QEMU developers documentation https://readthedocs.org/projects/qemu/downloads/pdf/latest/
  2. QEMU https://wiki.archlinux.org/index.php/QEMU
  3. ARM Watchdog Module (SP805) TRM https://developer.arm.com/documentation/ddi0270/b/
  4. QEMU Source Code Notes https://chenyufei.info/notes/qemu-src.html
  5. qdev for programmers writeup by Paolo Bonzini https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg00842.html
  6. QEMU Code Overview Architecture & internals tour by Stefan Hajnoczi
  7. Porting QEMU to Plan 9: QEMU Internals and Port Strategy by Nathaniel Wesley Filardo