Rapid Design Space Exploration with Virtual Prototypes

Jonas Kaufmann — Wed, Aug 28, 2024

Note: SimBricks has been completely rewritten to enable new use cases and improve usability, so the Python code shown here is outdated. For details, see our newer blog posts or the documentation.

Designing tomorrow’s heterogeneous systems is a complex task. System architects face many system and component design choices. While experienced system architects can immediately dismiss many of these designs, a large space of plausible designs remains. Narrowing it down further is a challenge, as architects cannot reliably compare designs without spending months prototyping a complete system for each one. For custom chips or large-scale systems, building multiple physical testbeds with different designs for comparison is out of the question. Complete system prototypes are necessary because the metrics of interest ultimately relate to the performance and cost of the complete system. The increasingly complex and fine-grained interactions between components make projecting full-system metrics from component metrics extremely difficult.

We built SimBricks to solve exactly this problem. No matter where in the design process system architects stand, SimBricks allows them to build virtual prototypes of the complete system in simulation for full-system testing and performance evaluation with their actual workloads and software. This enables rapid design space exploration to figure out optimal design choices based on full-system metrics. The SimBricks orchestration framework enables users to assemble virtual prototypes with different component parameters and even run them in parallel given sufficient computational resources.

The rest of this post illustrates how this works with a concrete example.

Even Simple Systems Have a Large Design Space

We use the following, conceptually simple, AI inference serving system architecture:

[Figure: A heterogeneous system with M clients connected to an external network and N servers with X hardware accelerators each, which are connected to an internal network on the other side. There is also a load balancer in the internal network.]

The system has M clients connected to an external network. The clients send requests to the load balancer in the internal network, which then forwards them to one of N servers with X AI inference hardware accelerators each. Here, N and X are system-level design parameters the system architect can play with. The internal network topology and parameters such as link speeds are also part of the design space. Both networks also carry background traffic. Finally, there is an assortment of component-level parameters, such as the number of cores and amount of memory available at the servers, and architectural choices for the hardware accelerators, like clock speed and the dimensions of their compute units.

Even for this rather simple system, there are many questions to explore in a full-system evaluation: To serve M clients with given workload characteristics, how many servers (N) does the system need to achieve the given service-level objective (e.g. bounded tail latencies)? Do additional hardware accelerators per server (X) reduce the number of servers required? Are more accelerators operating at a lower frequency preferable to fewer that operate at a higher frequency? How does background traffic affect SLOs?
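
Once a sweep over N has produced request latencies, the first question can be answered mechanically. The following is a minimal sketch of that post-processing step, not part of SimBricks: the function names, the nearest-rank p99 definition, and the dictionary layout are assumptions made for illustration, and the sample values are placeholders rather than measurements.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # 1-based nearest rank, ceil(0.99 * n), computed with integer arithmetic
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def smallest_n_meeting_slo(latencies_by_n, slo_ms):
    """Return the smallest server count whose p99 latency meets the SLO.

    latencies_by_n maps a server count N to the request latencies (in ms)
    observed for that design. Returns None if no design meets the SLO.
    """
    for n in sorted(latencies_by_n):
        if p99(latencies_by_n[n]) <= slo_ms:
            return n
    return None


# Placeholder samples for illustration only, NOT simulation results.
example = {
    1: [10.0] * 95 + [80.0] * 5,
    2: [8.0] * 99 + [40.0],
    4: [5.0] * 100,
}
print(smallest_n_meeting_slo(example, slo_ms=20.0))  # prints 2
```

The same pattern extends to the other questions: group results by X or by accelerator clock frequency instead of N and compare the resulting tail latencies.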

Easily Exploring Designs with SimBricks Orchestration

As a first step, the user chooses simulators for each system component. For standard components, users can draw on our library of pre-configured components and simulators. For custom components, the system architect needs to develop simulation models. They can decide the required level of detail here, depending on the project stage. For example, for the hardware accelerator, an early behavioral model in C++ or SystemC already enables users to answer early what-if questions. Later on, the virtual prototype can help test and explore different RTL design decisions. Critically, the virtual prototype also runs the complete software and workloads, possibly directly from the end-customer, to measure the full-system metrics that ultimately matter.

To build the simulation, the user writes a Python script for the SimBricks orchestration framework that describes the system to simulate and which simulators to use (see our earlier post). Orchestration scripts make it easy to evaluate a whole range of configurations, one for each combination of parameter values. Since orchestration scripts are just Python code creating objects that describe the simulation, users can rely on all Python constructs, such as loops, modules, etc. Our example below uses itertools.product() and a few simple for loops to generate a complete list of different combinations of parameter values:

from simbricks.orchestration import experiments as exp
from simbricks.orchestration import nodeconfig as node
from simbricks.orchestration import simulators as sim

import itertools

experiments = []

# The system- and component-level design choices to compare
M_opts = [4, 16, 128]
N_opts = [1, 2, 4, 8]
X_opts = [1, 2]  # number of accelerators per server
accel_clk_freq_opts = [100, 400]
background_traffic_opts = [0.5, 0.8]

# Actually build the experiments now
for (
    M, N, X,
    Hz, background_traffic
) in itertools.product(
    M_opts, N_opts, X_opts,
    accel_clk_freq_opts, background_traffic_opts
):
    experiment = exp.Experiment(
        f"exploration-{M}c-{N}s-{X}x-{Hz}-{background_traffic}"
    )
 
    # Instantiate external & internal network, add background traffic
    # ...
 
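    for i in range(N):
        for j in range(X):
            # Instantiate and connect all servers
            # ...
            pass  # placeholder so the elided loop body is valid Python

    for i in range(M):
        # Instantiate and connect all clients
        # ...
        pass  # placeholder so the elided loop body is valid Python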
 
    # Instantiate load balancer
    # ...
 
    experiments.append(experiment)
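
To get a feel for how quickly such a sweep grows, the sketch below enumerates the same parameter combinations without importing SimBricks and only generates the experiment names. It assumes X_opts is the list of accelerators-per-server options; the helper experiment_names() is hypothetical and exists only for this illustration.

```python
import itertools

# Same option lists as in the orchestration script above.
M_opts = [4, 16, 128]
N_opts = [1, 2, 4, 8]
X_opts = [1, 2]  # assumed: accelerators per server
accel_clk_freq_opts = [100, 400]
background_traffic_opts = [0.5, 0.8]


def experiment_names():
    """Return the name of every design point the sweep would create."""
    return [
        f"exploration-{M}c-{N}s-{X}x-{Hz}-{background_traffic}"
        for M, N, X, Hz, background_traffic in itertools.product(
            M_opts, N_opts, X_opts,
            accel_clk_freq_opts, background_traffic_opts
        )
    ]


names = experiment_names()
print(len(names))  # 3 * 4 * 2 * 2 * 2 = 96
print(names[0])    # exploration-4c-1s-1x-100-0.5
```

Even these five short lists yield 96 distinct virtual prototypes, and every additional parameter multiplies that count, which is why running the simulations in parallel matters.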

Fast Design Space Sweeps with Parallel Exploration

To make design space sweeps faster, SimBricks can run multiple different simulations in parallel. The orchestration framework automates this if simbricks-run is invoked with the --parallel flag. SimBricks manages CPU and memory resources and schedules parallel simulations accordingly to avoid slowdowns caused by resource oversubscription. SimBricks also supports running simulations across multiple physical hosts, both to run larger simulations and to increase parallelism. Stay tuned for a dedicated post on distributed simulations in SimBricks! Until then:

Join the Discussion on our Slack

Subscribe to our newsletter