Introducing gem5: An Open-Source Computer Architecture Simulator



What happens when we increase cache memory size? 

How does it impact my system performance? 

Does increasing cache size/RAM always improve my system performance?


The answers to these questions are not straightforward, as they depend on several factors such as the type of application we run and the processor architecture. Trying out different hardware combinations (in this case, cache configurations) on actual systems is expensive and often impractical. In such cases, we can use computer architectural simulators to analyze various aspects of computer systems.
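To see why a larger cache does not automatically help, here is a minimal sketch of a toy cache model. It assumes a fully associative cache with LRU replacement and two made-up access patterns (a loop over 64 lines and a one-pass stream). Neither the model nor the patterns come from gem5; they only illustrate how the answer depends on the application.

```python
from collections import OrderedDict


def lru_miss_rate(trace, num_lines):
    """Miss rate of a toy fully associative LRU cache holding num_lines lines."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(trace)


# Hypothetical workload A: loops 10 times over a working set of 64 lines.
looping = list(range(64)) * 10
# Hypothetical workload B: streams once through 640 distinct lines.
streaming = list(range(640))

for size in (16, 32, 64, 128, 256):
    print(size, lru_miss_rate(looping, size), lru_miss_rate(streaming, size))
```

In this toy setup the looping workload misses on every access until the cache can hold its whole working set, then gains nothing from further growth, while the streaming workload never benefits at all. Real applications sit somewhere in between, which is why measuring with a simulator is useful.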


A computer architectural simulator is a software tool used to model and mimic the behavior of a computer system’s architecture. It allows researchers, developers, and engineers to study and analyze various aspects of computer systems, such as processors, memory hierarchies, and interconnections, without needing physical hardware. Simulators provide a platform to design, experiment with, and evaluate computer systems in a virtual environment.


A computer architectural simulator helps users to:

  • Perform detailed analysis of hardware components like CPUs, caches, memory, and interconnects by modeling them in software.
  • Tweak system parameters (e.g., cache sizes, clock speeds, pipeline stages) to study their impact on performance.
  • Execute real or synthetic workloads to measure system performance, energy consumption, and other metrics.
  • Explore unconventional architectures or emerging technologies.
  • Track system behavior and issues using debugging tools. 

Based on their simulation behavior, simulators can be classified into:

  • Functional Simulators: Focus on the correctness and behavior of the system rather than performance. Example: Simulating the sequence of instructions executed by a CPU.
  • Cycle-Accurate Simulators: Model the timing of every hardware component cycle-by-cycle. Example: Evaluating the latency and throughput of a pipeline.
  • Full-System Simulators: Simulate an entire system, including processors, peripherals, memory, and OS. Example: Testing how an operating system performs on new hardware.
  • Trace-Driven Simulators: Use traces (logs) of past executions to replay system behavior and analyze performance.
  • Event-Driven Simulators: Focus on discrete events (e.g., cache misses, branch predictions) and simulate their effects.
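The event-driven idea can be sketched in a few lines. This is a minimal illustration, not how any particular simulator implements its event queue; the event names and latencies (2, 10, and 100 time units) are assumptions chosen for the example.

```python
import heapq


class EventQueue:
    """Toy discrete-event queue: time jumps straight to the next event."""

    def __init__(self):
        self._heap = []
        self._seq = 0   # tie-breaker so actions are never compared directly
        self.now = 0
        self.log = []

    def schedule(self, delay, action):
        heapq.heappush(self._heap, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        while self._heap:
            self.now, _, action = heapq.heappop(self._heap)
            action(self)


def l1_miss(q):
    q.log.append((q.now, "L1 miss"))
    q.schedule(10, l2_miss)          # assumed L2 lookup latency


def l2_miss(q):
    q.log.append((q.now, "L2 miss"))
    q.schedule(100, dram_response)   # assumed DRAM latency


def dram_response(q):
    q.log.append((q.now, "DRAM response"))


q = EventQueue()
q.schedule(2, l1_miss)
q.run()
print(q.log)
```

Because the simulated clock advances only when an event fires, idle stretches between events cost nothing to simulate, which is the main appeal of this style.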

Let us glance through some of the popular computer architectural simulators used for academic and industrial research:

  • SimpleScalar: A widely used simulator for basic processor modeling, known for its simplicity and ease of use.
  • Sniper: A fast and accurate x86 multi-core simulator optimized for performance estimation.
  • MARSSx86: A full-system x86 simulator focused on detailed modeling of CPUs and memory systems.
  • QEMU: Primarily an emulator but often used for system-level simulations due to its speed.
  • ZSim: A simulator aimed at high-speed simulation of large-scale multi-core systems.
  • Simics: A commercial simulator that provides detailed full-system modeling with support for various ISAs.
  • Synopsys Platform Architect Ultra: A commercial tool for system-level performance and power modeling, focused on architecture exploration and optimization of SoC designs. It is widely used in industry for early-stage design and analysis.
  • SPARTA: A high-performance, modular, event-driven framework for detailed microarchitecture modeling. It is particularly useful for cycle-accurate simulations and fine-grained performance analysis.
  • gem5: A flexible, modular, and open-source simulator capable of detailed modeling of CPUs, GPUs, memory systems, and full-system simulations, supporting multiple ISAs such as ARM, x86, RISC-V, and MIPS.

In this blog, we will discuss gem5 further, as it stands out among these simulators due to several key advantages:

  • Modularity and Flexibility: gem5’s object-oriented design and modular components make it highly extensible for custom hardware-software co-design research.
  • Comprehensive Support: It supports multiple ISAs, including ARM, x86, RISC-V, MIPS, and SPARC, making it versatile for cross-architecture studies.
  • Detailed Modeling: gem5 provides accurate microarchitectural details for CPUs, GPUs, and memory subsystems, which is crucial for in-depth research.
  • Full-System Simulation: It allows full-system simulation, enabling the study of hardware and software interactions, unlike some simulators that are limited to user-level simulations.
  • Active Community: gem5 has a large, active community contributing to its development, ensuring continuous improvement, bug fixes, and support.
  • Open Source: As an open-source tool, it is freely available and can be modified to meet specific research needs, unlike commercial options such as Simics.

Introduction to gem5

gem5 is a state-of-the-art open-source computer architecture simulator widely used in academia and industry for modeling and evaluating computer systems. It provides a flexible and modular framework for simulating diverse architectures, from simple single-core systems to complex multi-core and heterogeneous setups. gem5 is primarily used for research and development in computer architecture, system software, and hardware-software co-design.

gem5 is written primarily in C++ and Python. It can run in two modes: full-system mode (FS mode), which simulates a complete system with devices and an operating system, and syscall emulation mode (SE mode), which runs user-space-only programs with system services provided directly by the simulator. gem5 supports executing Alpha, ARM, MIPS, Power, SPARC, RISC-V, and 64-bit x86 binaries on CPU models including two simple single-CPI models, an out-of-order model, and an in-order pipelined model. It can also run precompiled binaries for performance evaluation.


Memory models

gem5 provides two memory models for simulating memory systems: Classic and Ruby. The table below summarizes their key features.
 

Feature                   | Classic Model               | Ruby Model
--------------------------|-----------------------------|----------------------------------
Cache Coherence Protocols | Predefined (MOESI, MESI)    | Fully customizable
Ease of Use               | Simple to configure and use | Complex, requires expertise
Simulation Speed          | Faster                      | Slower
Flexibility               | Limited                     | High
Custom Protocol Support   | No                          | Yes
Use Case                  | General-purpose simulations | Advanced research and experiments

Let’s discuss how to use gem5 and try some small exercises to familiarize yourself with the tool.


First, we need to install gem5. The following steps walk through the installation.

Step 1: Install dependencies

sudo apt install build-essential git m4 scons zlib1g zlib1g-dev libprotobuf-dev protobuf-compiler libprotoc-dev libgoogle-perftools-dev python3-dev python3


Step 2: Clone gem5 repo

git clone https://github.com/gem5/gem5

Step 3: Build the system

We can build gem5 for any supported ISA. Here, I am using RISC-V as an example. Run the command from the root of the cloned gem5 directory:

scons build/RISCV/gem5.opt -j9

 

Experimenting with gem5 

Now, let’s try out some experiments with the RISC-V build we just created and analyze its performance.

Let’s start by measuring two metrics: the L2 (level 2) cache miss rate and IPC (Instructions Per Cycle). We will vary the L2 cache size and observe its impact on both.
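For reference, both metrics are simple ratios. The sketch below spells them out; the function names are my own and the numbers are placeholders, not measurements from any simulation.

```python
def ipc(instructions, cycles):
    """Instructions Per Cycle: committed instructions divided by elapsed CPU cycles."""
    return instructions / cycles


def miss_rate(misses, accesses):
    """Fraction of cache accesses that missed; the hit rate is 1 - miss rate."""
    return misses / accesses


# Placeholder numbers, for illustration only.
print(ipc(1_000_000, 2_000_000))   # 0.5
print(miss_rate(250, 1_000))       # 0.25
print(1 - miss_rate(250, 1_000))   # 0.75 (hit rate)
```

A higher IPC means the CPU is getting more work done per cycle, so a lower L2 miss rate generally shows up as a higher IPC when the workload is sensitive to memory latency.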

For this, we need to select an application and build its binary for the required ISA, in this case RISC-V.

I used Canneal from the PARSEC benchmark suite, built with the RISC-V toolchain.

You can find the source code and the steps to build RISC-V binaries for various applications, including Canneal, at the link below:
https://github.com/RALC88/riscv-vectorized-benchmark-suite

After building the benchmark binaries, run them with different cache sizes; in this case, I am experimenting with the L2 cache size. To run the Canneal benchmark with an L2 cache size of 512 kB, use the command below:

./build/RISCV/gem5.opt configs/deprecated/example/se.py --cmd=/home/siva/gem5/canneal_serial.exe --options="1 15000 2000 input_can/200000.nets 64" --caches --l2cache --l2_size=512kB --cpu-type=RiscvO3CPU
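To repeat the run for several L2 sizes, a small helper can generate one command per size. This is a sketch under a few assumptions: the sizes other than 512kB and the output directory names are my own choices, and each run is given its own directory through gem5's --outdir option so that the stats files do not overwrite each other.

```python
import shlex

GEM5 = "./build/RISCV/gem5.opt"
CONFIG = "configs/deprecated/example/se.py"
BINARY = "/home/siva/gem5/canneal_serial.exe"
OPTIONS = "1 15000 2000 input_can/200000.nets 64"


def sweep_commands(l2_sizes):
    """Build one gem5 command (as an argument list) per L2 cache size."""
    commands = []
    for size in l2_sizes:
        commands.append([
            GEM5,
            f"--outdir=m5out_l2_{size}",   # hypothetical directory naming
            CONFIG,
            f"--cmd={BINARY}",
            f"--options={OPTIONS}",
            "--caches",
            "--l2cache",
            f"--l2_size={size}",
            "--cpu-type=RiscvO3CPU",
        ])
    return commands


# Example sizes; only 512kB appears in the walkthrough above.
for command in sweep_commands(["256kB", "512kB", "1MB", "2MB"]):
    print(shlex.join(command))
```

The script only prints the commands. To launch them, each list could be passed to subprocess.run(command, check=True) from the gem5 root directory.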

The available command-line arguments are defined in the gem5/configs/common/Options.py file.

Once the simulation ends, you can check the stats file (gem5/m5out/stats.txt) for the required parameters. I have given the values of the L2 hit rate and IPC for different L2 cache sizes as a reference. The results demonstrate the impact of cache size on IPC and hit rate for a given application.
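Rather than searching stats.txt by hand, the values can be pulled out with a short script. The sketch below assumes the usual "name value # description" line layout. The stat names and values in SAMPLE are made up for illustration, and the exact names differ between gem5 versions, so check your own stats.txt for the names to look up.

```python
def parse_stats(text):
    """Parse 'name value # description' lines into a {name: float} dict."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop the trailing description
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue                             # banner or non-numeric line
    return stats


def select(stats, keyword):
    """Return the stats whose name contains keyword (case-insensitive)."""
    return {k: v for k, v in stats.items() if keyword.lower() in k.lower()}


# Made-up sample; real stat names and values will differ.
SAMPLE = """
---------- Begin Simulation Statistics ----------
system.cpu.ipc                      0.50    # made-up value
system.l2.overallMissRate::total    0.25    # made-up value
---------- End Simulation Statistics   ----------
"""

stats = parse_stats(SAMPLE)
print(select(stats, "ipc"))
print(select(stats, "missrate"))
```

For a real run, the text would come from something like open("m5out/stats.txt").read(). If the file contains several statistics dumps, this sketch keeps the last value seen for each name.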

[Figures: L2 hit rate and IPC for different L2 cache sizes]

Similarly, we can verify various architectural concepts and explore our own architecture designs.

 

Conclusion 

The gem5 simulator is a versatile and powerful tool for computer architecture research, enabling detailed simulation and analysis of hardware and software interactions. Its flexibility, support for multiple ISAs, and modular design make it invaluable for exploring emerging technologies, optimizing performance, and studying complex systems.

The gem5 community has been active over the last few years, and the simulator is frequently updated with new features. As a next exercise to get started with gem5, try comparing different CPU models (Timing, Atomic, and Out-of-Order) by analyzing the difference in their IPC values.
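Once you have an IPC value for each CPU model, a small helper makes the comparison easier to read. This is only a sketch: the function name is my own, and the IPC numbers are placeholders rather than results from any gem5 run.

```python
def compare_ipc(ipc_by_model, baseline):
    """Return (model, ipc, ratio to baseline) rows, highest IPC first."""
    base = ipc_by_model[baseline]
    rows = [(model, value, value / base) for model, value in ipc_by_model.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder IPC values; replace them with numbers from your own stats files.
measured = {"Atomic": 1.0, "Timing": 0.25, "O3": 0.5}

for model, value, ratio in compare_ipc(measured, baseline="Timing"):
    print(f"{model:8s} IPC={value:.2f}  x{ratio:.2f} vs Timing")
```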

 

Feel free to reach out to our team for any further discussion. Write to us at sales@vayavyalabs.com or you can reach us here
