Designing tomorrow’s heterogeneous systems is a complex task. System architects face many system-level and component-level design choices. While experienced architects can immediately dismiss many of the resulting designs, a large space of plausible designs remains. Narrowing it down further is a challenge, because architects cannot reliably compare designs without spending months prototyping a complete system for each one. For custom chips or large-scale systems, building multiple physical testbeds with different designs for comparison is out of the question. Yet complete system prototypes are necessary, since the metrics that ultimately matter are the performance and cost of the complete system, and the increasingly complex and fine-grained interactions between components make projecting full-system metrics from component metrics extremely difficult.
We built SimBricks to solve exactly this problem. No matter where in the design process system architects stand, SimBricks allows them to build virtual prototypes of the complete system in simulation for full-system testing and performance evaluation with their actual workloads and software. This enables rapid design space exploration to figure out optimal design choices based on full-system metrics. The SimBricks orchestration framework enables users to assemble virtual prototypes with different component parameters and even run them in parallel given sufficient computational resources.
The rest of this post illustrates how this works with a concrete example.
Even Simple Systems Have a Large Design Space
We use the following, conceptually simple, AI inference serving system architecture:
The system has M clients connected to an external network. The clients send
requests to the load balancer in the internal network, which then forwards them
to one of N servers with X AI inference hardware accelerators each. Here, N and
X are system-level design parameters the system architect can play with. The
internal network topology and parameters such as link speeds are also part of
the design space. Both networks also carry background traffic. Finally, there is
an assortment of component-level parameters, such as the number of cores and
amount of memory available at the servers, and architectural choices for the
hardware accelerators, such as clock speed and the dimensions of their compute
units.
Even for this rather simple system, there are many questions to explore in a
full-system evaluation: To serve M clients with given workload characteristics,
how many servers (N) does the system need to achieve the given service-level
objective (e.g., bounded tail latencies)? Do additional hardware accelerators
per server (X) reduce the number of servers required? Are more accelerators
operating at a lower frequency preferable to fewer that operate at a higher
frequency? How does background traffic affect SLOs?
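To make the "bounded tail latency" objective concrete, here is a minimal sketch of how such an SLO could be checked against request latencies measured in one virtual prototype. It assumes per-request latencies in milliseconds have already been extracted from the simulation output; the helper names `percentile` and `meets_slo`, the nearest-rank percentile definition, and the sample numbers are illustrative assumptions, not part of SimBricks.

```python
def percentile(samples, p):
    """Nearest-rank percentile of samples, for an integer p in 1..100."""
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be an integer between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, which avoids floating-point rounding
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_slo(latencies_ms, bound_ms, p=99):
    """True if the p-th percentile latency stays within the bound."""
    return percentile(latencies_ms, p) <= bound_ms


# Hypothetical measurements: 98 fast requests and two slow outliers
latencies_ms = [10] * 98 + [50, 200]
print(percentile(latencies_ms, 99))          # 50
print(meets_slo(latencies_ms, bound_ms=100)) # True
```

Running the same check on every candidate configuration turns the questions above into a simple pass/fail comparison per design.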
Easily Exploring Designs with SimBricks Orchestration
As a first step, the user chooses simulators for each system component. For standard components, users can draw on our library of pre-configured components and simulators. For custom components, the system architect needs to develop simulation models. They can decide the required level of detail here, depending on the project stage. For example, for the hardware accelerator, an early behavioral model in C++ or SystemC already enables users to answer early what-if questions. Later on, the virtual prototype can help test and explore different RTL design decisions. Critically, the virtual prototype also runs the complete software and workloads, possibly directly from the end-customer, to measure the full-system metrics that ultimately matter.
To build the simulation, the user writes a Python script for the SimBricks
orchestration framework that describes the system to simulate and which
simulators to use (see our earlier post). SimBricks orchestration scripts make
it easy to evaluate a whole range of configurations with different combinations
of parameter values. Since orchestration scripts are just Python code creating
objects that describe the simulation, users can rely on all Python constructs,
such as loops, modules, etc. Our example below uses itertools.product() and a
few simple for loops to generate a complete list of the different combinations
of parameter values:
import itertools

from simbricks.orchestration import experiments as exp
from simbricks.orchestration import nodeconfig as node
from simbricks.orchestration import simulators as sim

experiments = []

# The system- and component-level design choices to compare
M_opts = [4, 16, 128]
N_opts = [1, 2, 4, 8]
num_accel_per_server_opts = [1, 2]
accel_clk_freq_opts = [100, 400]
background_traffic_opts = [0.5, 0.8]

# Actually build the experiments now, one per combination of parameter values
for (
    M, N, X,
    Hz, background_traffic
) in itertools.product(
    M_opts, N_opts, num_accel_per_server_opts,
    accel_clk_freq_opts, background_traffic_opts
):
    experiment = exp.Experiment(
        f"exploration-{M}c-{N}s-{X}x-{Hz}-{background_traffic}"
    )

    # Instantiate external & internal network, add background traffic
    # ...

    for i in range(N):
        for j in range(X):
            # Instantiate and connect all servers
            # ...
            pass  # keeps the elided loop body syntactically valid

    for i in range(M):
        # Instantiate and connect all clients
        # ...
        pass  # keeps the elided loop body syntactically valid

    # Instantiate load balancer
    # ...

    experiments.append(experiment)
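As a quick sanity check on how large this sweep is, the following standalone sketch enumerates the same option lists without touching SimBricks at all. It counts the resulting configurations, confirms that the generated experiment names are unique, and reports the size of the largest system. The `hosts` and `accelerators` tallies are our own back-of-the-envelope bookkeeping (clients plus servers plus one load balancer), not something the orchestration framework computes.

```python
import itertools

# Option lists copied from the orchestration script above
M_opts = [4, 16, 128]               # number of clients
N_opts = [1, 2, 4, 8]               # number of servers
num_accel_per_server_opts = [1, 2]  # accelerators per server (X)
accel_clk_freq_opts = [100, 400]
background_traffic_opts = [0.5, 0.8]

configs = list(itertools.product(
    M_opts, N_opts, num_accel_per_server_opts,
    accel_clk_freq_opts, background_traffic_opts,
))

# Same naming scheme as the experiments; names must be unique per configuration
names = [
    f"exploration-{M}c-{N}s-{X}x-{Hz}-{bg}"
    for (M, N, X, Hz, bg) in configs
]
assert len(set(names)) == len(configs)

# Rough size of each simulated system: M clients + N servers + 1 load balancer
hosts = [M + N + 1 for (M, N, X, Hz, bg) in configs]
accelerators = [N * X for (M, N, X, Hz, bg) in configs]

print(len(configs))       # 96
print(max(hosts))         # 137
print(max(accelerators))  # 16
```

Even five short option lists already produce 96 complete systems to evaluate, the largest with well over a hundred simulated hosts.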
Fast Design Space Sweeps with Parallel Exploration
To make design space sweeps faster, SimBricks can run multiple different
simulations in parallel. The orchestration framework automates this if
simbricks-run is invoked with the --parallel flag. SimBricks manages CPU and
memory resources and schedules parallel simulations accordingly to avoid
slowdowns because of resource oversubscription. SimBricks also supports running
simulations across multiple physical hosts to run larger simulations and for
more parallelism. Stay tuned for a dedicated post on distributed simulations in
SimBricks! Until then: