In our effort to develop SimBricks from a research simulation tool into a virtual prototyping solution for a broad range of use-cases, we are introducing a fundamentally new architecture. Originally, SimBricks was a synchronous tool: the user writes a simulation configuration and then runs it by invoking simbricks-run, which runs the simulation to completion and then returns. By default, simbricks-run would run all simulator instances locally, but it optionally also supported running some simulators, again synchronously, via SSH on different machines (e.g., for distributed simulations). This approach is simple to understand, requires minimal setup, and was easy to develop for our research.
However, this simple architecture has fundamental problems going forward.
This is a preview post! At the time of posting, we have not publicly released the new architecture yet.
Why #1: Multi-User Environments
First off, simulating the often complex and large virtual prototypes can take substantial computational resources, so users typically run them on separate server machines, often in shared clusters (we certainly do!). Running SimBricks virtual prototypes remotely is straightforward: connect to a remote machine via ssh and run the simulation. However, sharing these machines with other users doing the same is a challenge: there is no explicit resource management, and simulation performance degrades significantly when not enough cores are available.
Why #2: Robust and Asynchronous Runs
Since virtual prototyping runs can take substantial time and resources, robust and asynchronous user interaction is key. With the status quo, this typically boiled down to running SimBricks in a tmux session, so the running simulation survives an ssh disconnect. While a bit clunky, this can be workable for very technical users running smaller non-distributed simulations. However, when running distributed simulations, SimBricks would run each simulator on a remote host through an SSH connection. A single one of these SSH connections resetting would kill the complete simulation.
Why #3: Deployment Logistics
The local and ssh-based execution also complicates deployment. Users need direct access to environments with all simulators and their dependencies installed. Even at our research institute with relatively flexible IT, this is often tricky on shared infrastructure. Our easiest option so far has been running per-user Docker containers with multi-gigabyte images. Moreover, some of the proprietary simulators require very specific and mutually incompatible OS environments, such as a specific Ubuntu version in one case and CentOS in another. In practice, this leads to multiple deployment headaches for users, especially when combined with updates and license management.
Why #4: Enabling New Use-Cases
So far, SimBricks has primarily targeted heavily technical users, but as we seek to enable new use-cases we also aim to make SimBricks usable for non-simulation-experts. For these users, setting up specific Docker images and running on remote machines via SSH is simply not feasible. Further, for many virtual prototyping use-cases, simulating the virtual prototype is only a small part of a larger workflow, and the previous architecture made integration into other processes and tools a challenge. A useful graphical user interface is similarly hard to build within these constraints.
What & How: The New Architecture
To address these challenges, we have substantially re-architected SimBricks, as shown in the figure above. The new architecture comprises three main components: frontend, backend, and runners. These components communicate through HTTP REST APIs. Briefly, frontends configure virtual prototypes and submit them to the backend, and then synchronously or asynchronously retrieve output and results. The backend determines and schedules where and when to run the submitted virtual prototype simulations. For this, the backend assigns complete virtual prototypes or fragments thereof to individual runners (akin to GitHub or GitLab CI runners), which then run the necessary simulators, collect output, and report back to the backend.
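To make this flow concrete, here is a minimal sketch of what a client-to-backend interaction over the REST API could look like. The endpoint paths, payload fields, and backend address are illustrative assumptions, not the released API.

```python
import requests

# Hypothetical backend address and endpoints -- illustrative assumptions,
# not the actual SimBricks REST API.
BACKEND = "http://simbricks-backend.example.com:8000"

# Submit a virtual prototype configuration for execution.
resp = requests.post(
    f"{BACKEND}/runs",
    json={
        "name": "my-prototype",
        "config": {},  # serialized virtual prototype configuration goes here
    },
)
run_id = resp.json()["id"]

# Check on the run asynchronously, e.g., later or from another machine ...
status = requests.get(f"{BACKEND}/runs/{run_id}").json()["status"]

# ... and, once it completes, retrieve the aggregated output and results.
output = requests.get(f"{BACKEND}/runs/{run_id}/output").json()
```

Because all interaction goes through this API rather than a long-lived SSH session, a dropped connection only affects the client, not the running simulation.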
Initially, we provide only a programmatic frontend/SDK. With this frontend, users configure virtual prototypes through our Python orchestration framework, similar to how this previously worked. However, unlike before, users then submit these virtual prototype configurations to the backend, using the simbricks CLI tool or, from scripts, our Python client library. The CLI tool and libraries can then either asynchronously retrieve output and results later, or synchronously receive and process output as the virtual prototype runs. In the future, we will extend SimBricks with a graphical (Web) frontend that also enables graphical configuration and interaction.
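As a rough sketch of what this could look like from a script, the snippet below uses a hypothetical client API; the module layout, class, and method names are placeholders for illustration, not the final library interface.

```python
# Hypothetical sketch of the programmatic frontend. The imports, class, and
# method names below are placeholder assumptions, not the final SimBricks API.
from simbricks.orchestration import simulation  # assumed module layout
from simbricks.client import BackendClient      # hypothetical client library

# Configure the virtual prototype with the orchestration framework,
# much like before ...
sim = simulation.Simulation("my-prototype")
# ... add hosts, NICs, networks, and their connections here ...

# ... but instead of running it locally, submit it to the backend.
client = BackendClient(url="https://simbricks.example.com/api")
run = client.submit(sim)

# Either process output synchronously as the virtual prototype runs ...
for line in run.stream_output():
    print(line)

# ... or detach and fetch the results asynchronously later.
results = client.get_results(run.id)
```

The simbricks CLI tool offers the same submit-and-retrieve workflow for interactive use.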
The backend is responsible for storing configurations and results, and for scheduling: deciding which submitted virtual prototype simulations from which users run on which runners, and when. The backend also aggregates output and results from the runners and stores them for future retrieval.
Finally, runners actually execute virtual prototype simulations. Each runner is set up with a (set of) suitable environment(s) for the simulators it supports, and different runners can be configured differently and independently. As a result, even mutually incompatible simulators can be supported through different runners, running on shared or separate machines. Crucially, a set of runners on the same infrastructure can be shared by multiple users, as managed by the backend.
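As an illustration of this flexibility, one could imagine describing two runners along the following lines. The configuration keys and values are hypothetical stand-ins to show the idea of per-runner environments, not actual SimBricks settings.

```python
# Hypothetical runner descriptions -- keys and values are illustrative
# assumptions, not actual SimBricks configuration. The idea: each runner
# advertises the simulators its local environment supports, and the backend
# only assigns it (fragments of) virtual prototypes it can actually run.
runner_open_source = {
    "name": "runner-oss",
    "backend_url": "https://simbricks.example.com/api",
    "simulators": ["gem5", "qemu", "ns-3"],  # one shared open-source image
}
runner_proprietary = {
    "name": "runner-vendor",
    "backend_url": "https://simbricks.example.com/api",
    "simulators": ["vendor-rtl-sim"],  # isolated, OS-specific environment
}
```

A virtual prototype that combines open-source and proprietary simulators would then be split by the backend into fragments placed on both runners.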
Stay Tuned For More Details & Updates
This new architecture is a first step towards a more useful and easier-to-use SimBricks. Along with this architecture change, we have also re-designed the SimBricks virtual prototype configuration abstractions. More on this in our next blog post. Until then: