Cost-Efficient Simulation in the Cloud: Paving the Way for Scalable Autonomy

April 5, 2024

In the fast-paced arena of advanced driver-assistance systems (ADAS) and automated driving (AD) development, the ability to rapidly and accurately validate and verify an array of safety cases through simulation is not just advantageous—it is essential.

As these simulations grow in complexity and scale, so too does the need for robust compute and storage, typically supplied by cloud providers like Amazon Web Services, Microsoft Azure, and Google Cloud Platform. However, this presents a formidable challenge: cost.

Overlooking the financial dimension of simulation can have significant implications. At the heart of the issue is a paradox: While advanced simulations are indispensable for accelerating ADAS and AD development and ensuring safety, the escalating costs associated with scaling simulations in the cloud threaten to slow progress and stifle innovation. This tension between the need for comprehensive testing and the imperative of cost control underscores a critical challenge facing the industry. Without addressing the dual demands of scalability and affordability, ADAS and AD teams will face delays, compromising safety and efficiency. This blog post explores strategies and solutions designed to resolve this dilemma.

The Role of Simulation Software in ADAS and AD Development

Simulation software plays an important role in ADAS and AD development, facilitating the testing of algorithms, sensor configurations, and vehicle behaviors across a multitude of virtual environments. It enables an unparalleled breadth of testing, including scenarios that are impractical or impossible to replicate in the real world.

The ability to execute thousands of scenarios in parallel dramatically accelerates the ADAS and AD development process, increases efficiency, and decreases time to market.

Scalability

By running simulations at scale, ADAS and AD development teams can enhance test coverage and boost developer efficiency. When running a large number of simulations, teams can cover a wide range of scenarios, including edge cases that may not be frequently encountered in real-world driving. This helps ensure that an ADAS or AD system can handle a variety of situations.

Simulations can be run in parallel, testing multiple scenarios at once. This can significantly speed up the testing and validation process compared to real-world testing. For example, when running 10,000 unique scenarios with each scenario taking about ten minutes to simulate, running them sequentially would take almost 70 days of continuous simulation. If a team were to run 1,000 simulations at once, they could assess the same set of 10,000 scenarios in less than 2 hours. This demonstrates the significant time savings and efficiency gains achieved by running simulations at scale. 
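The arithmetic behind this example can be checked with a short sketch. It assumes every scenario takes exactly ten minutes and that scenarios run in fixed-size batches with no scheduling overhead; the function name wall_clock_minutes is illustrative and not part of any product API.

```python
import math


def wall_clock_minutes(num_scenarios: int, minutes_per_scenario: float, parallelism: int) -> float:
    """Wall-clock time when scenarios run in equal-length batches of `parallelism`."""
    batches = math.ceil(num_scenarios / parallelism)
    return batches * minutes_per_scenario


sequential = wall_clock_minutes(10_000, 10, parallelism=1)      # 100,000 minutes
parallel = wall_clock_minutes(10_000, 10, parallelism=1_000)    # 100 minutes

print(f"sequential: {sequential / 60 / 24:.1f} days")   # sequential: 69.4 days
print(f"parallel:   {parallel / 60:.2f} hours")          # parallel:   1.67 hours
```

In practice, scenario durations vary and startup and teardown add overhead, so real wall-clock times would be somewhat higher than this idealized estimate.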

Cost Efficiency

Cost efficiency is crucial in test and validation tooling for ADAS and AD software development. Here are four key reasons this is true:

  1. High development costs: Developing ADAS and AD software is complex and costly. It involves significant investment in hardware, software, talent, and infrastructure. Therefore, any opportunity to reduce costs without compromising on safety or performance can have a significant impact on the overall budget.
  2. Scale of testing: ADAS and AD systems need to be tested across a wide range of scenarios to ensure safety and reliability. This requires running millions, or even billions, of miles in simulation, which can be computationally intensive and expensive.

Comparing the costs of driving 10,000 miles in simulation vs. real-world driving illustrates the cost efficiency of simulation in the cloud.
Figure 1: Mature AV programs drive 25,000+ miles a day and supplement with simulation. Teams can run simulations at scale with Cloud Engine, part of Applied Intuition’s ADAS and AD development platform, at a fraction of the cost.

  3. Data management: ADAS and AD systems generate and consume massive amounts of data. It is crucial to efficiently store, retrieve, and process this data. Costs can escalate quickly if not managed properly.
  4. Iterative development: ADAS and AD development is an iterative process, with continuous integration and continuous deployment (CI/CD) of new software versions. Each iteration needs to be tested thoroughly, adding to the overall cost.

Applied Intuition’s Approach to Cost Efficiency

At Applied Intuition, our infrastructure team has developed and refined a variety of strategies to maximize cost efficiency in cloud-based simulations, enabling teams to achieve their development goals while reducing cost. 

Simulation queue management

Applied Intuition’s ADAS and AD development platform (ADP) enables teams to dynamically schedule and run simulation jobs at scale on different cloud compute instance types. This helps teams maximize resource utilization across their test and validation suites.

Engineering solution: Kubernetes is the industry-standard open-source software for running workloads in the cloud. However, it has known scaling limitations that arise for compute- and data-intensive applications. Instead of using Kubernetes’ scheduler, Applied Intuition developed a custom, purpose-built workload scheduler to give teams greater control over the cloud resources they use to run their simulation jobs. This scheduler dynamically schedules and executes simulation jobs across various instance types, optimizing resource utilization and reducing costs.

Impact: By parallelizing simulation work with startup and teardown tasks, and enabling dynamic routing of simulations, Applied Intuition has significantly reduced the per-simulation execution overhead.

Screen capture from the Applied Development Platform (ADP) shows how users can manage simulation queues to reduce overhead.
Figure 2: In ADP, users can manage simulation queues—distinct machine specifications on which they can run simulations from the cloud—to enable dynamic routing of simulations, thereby reducing overhead. 

Our team has also improved simulation resiliency by allowing simulations to be interrupted and dynamically routed. This approach has resulted in a 30% reduction in computing costs for Applied Intuition customers, enhancing the scalability of their testing and validation efforts.
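To make the idea of routing simulations across queues concrete, here is a minimal sketch. It is not Applied Intuition's scheduler: the SimQueue and SimJob types, the route function, the queue names, and the hourly prices are all hypothetical. It assumes a queue is defined by a machine specification and that a job should go to the cheapest queue that satisfies its requirements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimQueue:
    name: str
    vcpus: int
    gpus: int
    hourly_cost: float  # illustrative figure, not real cloud pricing


@dataclass(frozen=True)
class SimJob:
    job_id: str
    vcpus: int
    gpus: int


def route(job: SimJob, queues: list[SimQueue]) -> Optional[SimQueue]:
    """Return the cheapest queue whose machine spec satisfies the job, or None."""
    eligible = [q for q in queues if q.vcpus >= job.vcpus and q.gpus >= job.gpus]
    return min(eligible, key=lambda q: q.hourly_cost, default=None)


# Hypothetical queues: each one is a distinct machine specification.
QUEUES = [
    SimQueue("cpu-small", vcpus=8, gpus=0, hourly_cost=0.40),
    SimQueue("cpu-large", vcpus=32, gpus=0, hourly_cost=1.60),
    SimQueue("gpu", vcpus=16, gpus=1, hourly_cost=3.00),
]

jobs = [
    SimJob("planner-regression", vcpus=4, gpus=0),
    SimJob("multi-agent-traffic", vcpus=16, gpus=0),
    SimJob("camera-sensor-sim", vcpus=8, gpus=1),
]

for job in jobs:
    queue = route(job, QUEUES)
    print(job.job_id, "->", queue.name if queue else "unschedulable")
```

A production scheduler would also weigh queue depth, instance availability, and data locality; the sketch only shows the cost-aware matching step.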

Ephemeral compute

Recognizing the potential for cost savings, Applied Intuition has enabled the use of ephemeral compute options, such as spot instances, which are less reliable but priced significantly lower than their on-demand counterparts. This strategy leverages a cloud provider’s unused capacity to offer substantial discounts, albeit with the caveat of increased volatility in availability.

Figure 3: Infrastructure required to reliably utilize AWS Spot instances. 

Engineering solution: One of the challenges in using spot instances is that they might not be available during periods of high demand, and the cloud provider can reclaim them at any time if regular demand increases.

To counteract the unreliability of ephemeral compute, Applied Intuition developed infrastructure capable of running stateful simulations on a stateless, interruptible compute platform. By managing instance termination in real time and implementing a dual-queue intelligent routing engine, we ensure that simulations can seamlessly transition to on-demand compute resources when necessary.
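The dual-queue idea can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not the actual routing engine: the Simulation class, the drain function, and the interruptions mapping are hypothetical, and interruptions are injected deterministically instead of coming from a real cloud provider's termination notice. It assumes progress is checkpointed so that an interrupted simulation resumes on on-demand capacity without starting over.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Simulation:
    sim_id: str
    total_steps: int
    completed_steps: int = 0  # checkpointed progress survives rerouting


def run_on_spot(sim: Simulation, interrupt_at: Optional[int]) -> bool:
    """Advance the simulation on spot capacity; return False if interrupted."""
    while sim.completed_steps < sim.total_steps:
        if sim.completed_steps == interrupt_at:
            return False  # simulated termination notice; progress so far is kept
        sim.completed_steps += 1
    return True


def run_on_demand(sim: Simulation) -> None:
    """On-demand capacity is assumed not to be reclaimed, so the run completes."""
    sim.completed_steps = sim.total_steps


def drain(spot_queue: deque, on_demand_queue: deque, interruptions: dict) -> dict:
    """Try spot first; reroute interrupted simulations to the on-demand queue."""
    placements = {}
    while spot_queue:
        sim = spot_queue.popleft()
        if run_on_spot(sim, interruptions.get(sim.sim_id)):
            placements[sim.sim_id] = "spot"
        else:
            on_demand_queue.append(sim)
    while on_demand_queue:
        sim = on_demand_queue.popleft()
        run_on_demand(sim)
        placements[sim.sim_id] = "on-demand"
    return placements


sims = [Simulation("sim-a", 100), Simulation("sim-b", 100), Simulation("sim-c", 100)]
# Hypothetical scenario: sim-b loses its spot instance after 40 steps.
result = drain(deque(sims), deque(), interruptions={"sim-b": 40})
print(result)
```

Only the interrupted simulation pays the on-demand price, and only for its remaining steps, which is where the savings in this kind of design would come from.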

Impact: This approach has led to a 75% reduction in costs for large-scale simulation workloads, with minimal impact on reliability or execution speed.

Intelligent data management

ADAS and AD development require scalable storage solutions. Not only will drive logs and their processing utilize storage, but simulations also generate vast quantities of data, necessitating efficient strategies for storage, retrieval, and processing. To manage large swaths of data, Applied Intuition developed custom methods for users to tag, cycle, and tier their data, thus driving down object storage costs.

Engineering solution: First, our team developed custom tagging logic for all workloads (simulation, drive log analysis, etc.) running inside of the ADAS and AD development platform. We then implemented data lifecycle policies based on the custom tagging. Finally, we applied intelligent tiering logic to reduce the unit cost of storage for data that is accessed less frequently.
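As an illustration of how tagging, lifecycle policies, and tiering fit together, the sketch below builds tag-scoped lifecycle rules. It assumes Amazon S3 as the object store and uses the rule shape S3 accepts for bucket lifecycle configurations; the tag key "workload", the tag values, and the day thresholds are hypothetical and do not describe Applied Intuition's actual policies.

```python
def lifecycle_rule(workload_tag: str, infrequent_after_days: int, expire_after_days: int) -> dict:
    """Build one tag-scoped rule in the S3 lifecycle-configuration shape."""
    if expire_after_days <= infrequent_after_days:
        raise ValueError("expiration must come after the tier transition")
    return {
        "ID": f"{workload_tag}-lifecycle",
        # Applies only to objects carrying this (hypothetical) workload tag.
        "Filter": {"Tag": {"Key": "workload", "Value": workload_tag}},
        "Status": "Enabled",
        # Move rarely accessed data to a cheaper infrequent-access tier.
        "Transitions": [{"Days": infrequent_after_days, "StorageClass": "STANDARD_IA"}],
        # Delete the data once it is no longer needed.
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical thresholds: simulation output ages out faster than drive-log analysis.
config = {
    "Rules": [
        lifecycle_rule("simulation", infrequent_after_days=30, expire_after_days=180),
        lifecycle_rule("drive-log-analysis", infrequent_after_days=60, expire_after_days=730),
    ]
}

for rule in config["Rules"]:
    print(rule["ID"], rule["Transitions"][0]["Days"], rule["Expiration"]["Days"])
```

With boto3, a dictionary of this shape could be passed as the LifecycleConfiguration argument of put_bucket_lifecycle_configuration; other cloud providers offer comparable lifecycle mechanisms with different syntax.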

Impact: Through these efforts, Applied Intuition’s development platform has achieved approximately a 50% reduction in cloud storage costs for our customers. For large organizations, this could mean millions of dollars in annual savings.

The role of simulation in ADAS and AD development is both critical and multifaceted, demanding scalability, efficiency, and cost-effectiveness.

Applied Intuition’s approach to simulation queue management, the utilization of ephemeral compute, and intelligent data management enables ADAS and AD development teams to prioritize cost efficiency without compromising on performance or safety.

Contact our engineering team to learn more about our approach to cost-efficient cloud simulation. Discover how Applied Intuition helps teams accelerate ADAS and AD development while reducing costs.