SimBricks allows us to easily create detailed virtual prototypes of complex network systems. To evaluate new network components or transport protocols, we often want to prototype a large-scale network and understand the effect of the new component or protocol on, for instance, a full data center. However, scaling the system also increases the resources needed to run the virtual prototype, potentially to thousands of CPU cores, which makes it infeasible to run on a single machine. SimBricks’ ability to distribute the underlying simulations across multiple machines makes such prototypes possible as long as enough compute resources are available, but it does not address the root cause. Fortunately, we often do not need to model every component in full detail, so we can simulate non-critical components with less detailed simulators that are also less resource-intensive. By implementing mixed-fidelity simulations and splitting up slow bottleneck simulators, we can reduce the required compute resources while maintaining good accuracy and keeping simulation times low.

Scaling Virtual Prototypes

To scale the virtual prototype, we simply add more components to the system that we want to model. With modular simulation, each added component translates to an additional simulator process, which naturally parallelizes the simulation and keeps the simulation time low. However, every added simulator process also increases the required compute resources, so large-scale systems need large numbers of CPU cores and large amounts of memory.

Reducing Resource Requirements

For many evaluation tasks, a virtual prototype that models every component of the system in full detail is not actually necessary. For example, a large network system consists of thousands of hosts, but we are usually not interested in a detailed model of all of them. Instead, a few hosts might be used to gather detailed measurements, while the rest only generate realistic background traffic. This means that we can model the components of our virtual prototype at different levels of fidelity.

Mixed-Fidelity Simulations

In most cases, a component of a system can be modelled by different simulators. A network host, for example, could be simulated in detail by the architectural simulator gem5 or at the protocol level by the network simulator ns-3. The modular architecture of SimBricks lets us choose between various simulators for each component, enabling full-system simulations with mixed fidelity. Additionally, we can simulate multiple less detailed components in a single simulator process, such as many hosts in one ns-3 instance, which reduces the number of required CPU cores.
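To make the resource effect concrete, here is a minimal sketch of such an assignment. The `Host` class, the fidelity labels, and both functions are hypothetical and are not part of the SimBricks API. The sketch also assumes that each detailed host occupies one process and that all protocol-level hosts share a single process, ignoring any additional simulators such as NICs.

```python
from dataclasses import dataclass

# Hypothetical fidelity labels for illustration; not SimBricks identifiers.
DETAILED = "gem5"   # assumed: one simulator process per host
PROTOCOL = "ns-3"   # assumed: many hosts share one simulator process


@dataclass(frozen=True)
class Host:
    name: str
    measured: bool  # True if we need detailed measurements from this host


def assign_simulators(hosts):
    """Use the detailed simulator for measured hosts, the protocol-level one otherwise."""
    return {h.name: DETAILED if h.measured else PROTOCOL for h in hosts}


def count_processes(assignment):
    """Each detailed host needs its own process; all protocol-level hosts share one."""
    detailed = sum(1 for sim in assignment.values() if sim == DETAILED)
    shared = 1 if any(sim == PROTOCOL for sim in assignment.values()) else 0
    return detailed + shared


# 1000 hosts, of which only the first two are measured in detail.
hosts = [Host(f"host{i}", measured=(i < 2)) for i in range(1000)]
assignment = assign_simulators(hosts)
print(count_processes(assignment))  # 2 detailed processes + 1 shared process
```

Under these assumptions, modelling all 1000 hosts in detail would take 1000 host processes, whereas the mixed-fidelity assignment needs three.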

Splitting Up Bottleneck Simulators

However, when we move many components into a single simulator process, this process might become a bottleneck in the simulation, even if each component is modelled with low fidelity. To overcome this limitation, we split the bottleneck simulator into multiple fragments, simulate each fragment in its own process, and add corresponding SimBricks channels between the new fragment processes. In the case of a network, for example, we can split the topology along Ethernet links and replace those links with SimBricks channels. Although this increases the number of processes again, it keeps the simulation time low. Together, the two approaches reduce the required resources while keeping simulation times manageable.
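Splitting along links can be sketched as a simple graph partition. The function `split_topology`, the node names, and the mapping of nodes to fragments are all assumptions made for this example. In a real setup, each cut link would be replaced by a SimBricks channel between two fragment processes.

```python
from collections import defaultdict


def split_topology(links, fragment_of):
    """Split a topology along links.

    links: iterable of (node_a, node_b) Ethernet links.
    fragment_of: dict mapping each node to a fragment id.
    Returns (internal, channels): the links kept inside each fragment, and the
    cut links that would be replaced by channels between fragment processes.
    """
    internal = defaultdict(list)
    channels = []
    for a, b in links:
        fa, fb = fragment_of[a], fragment_of[b]
        if fa == fb:
            internal[fa].append((a, b))
        else:
            channels.append((a, b))
    return dict(internal), channels


# Toy topology: two racks, each with a top-of-rack switch, joined by a core switch.
links = [
    ("h0", "tor0"), ("h1", "tor0"),
    ("h2", "tor1"), ("h3", "tor1"),
    ("tor0", "core"), ("tor1", "core"),
]
fragment_of = {"h0": 0, "h1": 0, "tor0": 0, "core": 0,
               "h2": 1, "h3": 1, "tor1": 1}

internal, channels = split_topology(links, fragment_of)
print(channels)  # the only cut link is tor1 <-> core
```

Choosing fragments so that few links are cut keeps the number of channels between processes small.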


Orchestrating Complex Virtual Prototypes

Configuring and running a virtual prototype composed of many simulator instances is already a challenging task. Mixed-fidelity simulations and split-up bottleneck simulators add further to this complexity. For example, moving a host into a network simulator or splitting up the network topology requires us to configure the network simulator directly, which quickly becomes laborious. To overcome this complexity, we separate the specification of the system from its implementation, which determines how the system is simulated. With this abstraction, using mixed fidelity and splitting up simulators become implementation choices. Additionally, we introduce a network abstraction layer, so that we can configure the network simulator through our orchestration framework instead of configuring the simulator directly. The system specification and the implementation both use the network abstraction to specify a network system and to generate network descriptions. Each network simulator receives a description, configures the corresponding part of the system, and then simulates it. This provides a practical way to configure and run virtual prototypes of large-scale network systems with manageable resource requirements while maintaining good accuracy and low simulation time.
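The following sketch illustrates the idea of separating specification from implementation. It is not the SimBricks orchestration API: `SystemSpec`, `generate_descriptions`, the dictionary layout of a description, and the simulator instance names are all hypothetical. The same specification is combined with two different placements, and each simulator instance receives a description of only the part it simulates.

```python
from dataclasses import dataclass, field


@dataclass
class SystemSpec:
    """What the system is: hosts, switches, and links. No simulator choices here."""
    hosts: list = field(default_factory=list)
    switches: list = field(default_factory=list)
    links: list = field(default_factory=list)  # (node_a, node_b) pairs


def generate_descriptions(spec, placement):
    """How the system is simulated.

    placement maps each node to a simulator instance name. Returns one
    description per instance: the nodes it simulates, the links fully inside
    it, and the links that cross over to another instance.
    """
    descs = {}
    for node in spec.hosts + spec.switches:
        d = descs.setdefault(placement[node],
                             {"nodes": [], "links": [], "external": []})
        d["nodes"].append(node)
    for a, b in spec.links:
        sa, sb = placement[a], placement[b]
        if sa == sb:
            descs[sa]["links"].append((a, b))
        else:
            descs[sa]["external"].append((a, b))
            descs[sb]["external"].append((a, b))
    return descs


spec = SystemSpec(hosts=["h0", "h1"], switches=["sw"],
                  links=[("h0", "sw"), ("h1", "sw")])

# Two implementation choices for the same specification.
all_in_one = {"h0": "net0", "h1": "net0", "sw": "net0"}
detailed_h0 = {"h0": "host-sim0", "h1": "net0", "sw": "net0"}

print(generate_descriptions(spec, all_in_one))
print(generate_descriptions(spec, detailed_h0))
```

Switching `h0` from protocol-level to detailed simulation changes only the placement, while the specification stays untouched.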

As always, feel free to: