AVL Cuts Test Data Analysis Time From Days to Minutes With Databricks Lakehouse

AVL, the Austria-based vehicle and powertrain testing company, replaced its legacy on-premise measurement analytics stack with a lakehouse architecture on Databricks, cutting analysis time from days to minutes. The migration centers on Impulse, an open-source Python framework published under Databricks Labs. It targets a scale problem that single-workstation desktop tools such as NI DIAdem and MATLAB were not built for: a single automotive test campaign produces hundreds of thousands of measurement recordings and hundreds of terabytes of time-series sensor data.

The core problem was not storage but reproducibility and governance. Engineers ran isolated scripts against local copies of binary MDF4 files. Results couldn't be shared across teams without re-running analyses, data sat outside the enterprise catalog, and scaling to a fleet of test benches meant copying work by hand. Impulse addresses all three by compiling a declarative Python DSL called TSAL (Time Series Analytics Language) into distributed Spark jobs that run across the full recording corpus, with Unity Catalog providing lineage and access control.

The data model follows the Medallion Architecture. Raw MDF4 files land in the Bronze layer via an extended Databricks Solution Accelerator that hooks into AVL Concerto, AVL's measurement data management system, which supports multiple proprietary file formats. The Silver layer standardizes everything into a hierarchical schema of containers (individual files) and channels (sensor signals), tagged with vehicle IDs, software versions, and project metadata. Data quality rules are enforced at the Silver boundary using the Databricks DQX framework and can be configured to the needs of downstream analytics. The Silver-layer schema was co-developed with Mercedes-Benz and described in an earlier Databricks blog post.

FIG. 02 AVL's Medallion Architecture: binary sensor data flows through standardized layers, governed by DQX rules, then distributed to analysts via Impulse.
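A minimal sketch of the container/channel hierarchy and a Silver-boundary quality gate follows, in plain Python for illustration. The class names, field names, and rules (`Container`, `Channel`, `vehicle_id_present`, `time_monotonic`, and so on) are hypothetical. This is not the schema co-developed with Mercedes-Benz, and it is not the DQX API, which expresses checks against Spark DataFrames. It only illustrates two ideas: each file-level container is tagged with vehicle, software, and project metadata, and containers that fail declarative rules are rejected before they enter Silver.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Channel:
    """One sensor signal inside a container (hypothetical field names)."""
    name: str
    unit: str
    timestamps: list[float]  # seconds; may be irregularly spaced
    values: list[float]


@dataclass
class Container:
    """One ingested measurement file plus the tags used for filtering."""
    container_id: str
    vehicle_id: str
    software_version: str
    project: str
    channels: list[Channel] = field(default_factory=list)


# A rule pairs a name with a predicate that must hold for the container.
Rule = tuple[str, Callable[[Container], bool]]

SILVER_RULES: list[Rule] = [
    ("vehicle_id_present", lambda c: bool(c.vehicle_id)),
    ("has_channels", lambda c: len(c.channels) > 0),
    ("lengths_match",
     lambda c: all(len(ch.timestamps) == len(ch.values) for ch in c.channels)),
    ("time_monotonic",
     lambda c: all(all(a < b for a, b in zip(ch.timestamps, ch.timestamps[1:]))
                   for ch in c.channels)),
]


def failed_rules(container: Container, rules: list[Rule] = SILVER_RULES) -> list[str]:
    """Names of violated rules; an empty list means the container may enter Silver."""
    return [name for name, holds in rules if not holds(container)]


good = Container("f001", "VEH-42", "sw-1.3.0", "demo",
                 [Channel("engine_speed", "rpm", [0.0, 0.1, 0.2], [800.0, 820.0, 850.0])])
bad = Container("f002", "", "sw-1.3.0", "demo",
                [Channel("engine_speed", "rpm", [0.0, 0.2, 0.1], [800.0, 820.0, 850.0])])
print(failed_rules(good))  # []
print(failed_rules(bad))   # ['vehicle_id_present', 'time_monotonic']
```

The gate returns rule names instead of raising an exception. That keeps it declarative: the list can be logged, or used to route failing containers to a quarantine table, which is one common pattern for quality gates.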

From Silver, Impulse takes over. Engineers write analyses in TSAL—selecting physical sensor channels, defining virtual channels via signal arithmetic, and specifying event conditions—without writing Spark. The query engine compiles those expressions into distributed execution plans that run across thousands of recordings in a single job. Outputs land in a Gold-layer star schema for SQL/BI consumption, as ad-hoc DataFrames for notebook exploration, or as feature matrices ready for ML training pipelines. Databricks Workflows orchestrates the full Bronze-to-Gold movement; Databricks Dashboards and Lakehouse Apps serve results downstream.
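The following toy version in plain Python makes the idea of a declarative DSL that compiles to an execution plan concrete. Everything in it (`Expr`, `Channel`, `compile_expr`, the operator set) is hypothetical and is not TSAL's actual API. The "compiler" emits Python closures over a dictionary of columns rather than Spark plans, and it assumes all channels are already aligned to a common time base. It does show the key design move: arithmetic and comparison operators build an expression tree instead of computing values. A query engine can therefore inspect and translate the whole analysis before running it.

```python
from __future__ import annotations

import math
import operator
from dataclasses import dataclass
from typing import Callable

Columns = dict[str, list[float]]  # channel name -> samples on a shared time base


class Expr:
    """Base node; operators build a tree instead of computing values."""
    def __add__(self, other): return BinOp("+", self, lift(other))
    def __sub__(self, other): return BinOp("-", self, lift(other))
    def __mul__(self, other): return BinOp("*", self, lift(other))
    def __truediv__(self, other): return BinOp("/", self, lift(other))
    def __gt__(self, other): return BinOp(">", self, lift(other))
    def __lt__(self, other): return BinOp("<", self, lift(other))
    def __and__(self, other): return BinOp("&", self, lift(other))


@dataclass(frozen=True)
class Channel(Expr):
    name: str  # a physical channel selected by name


@dataclass(frozen=True)
class Const(Expr):
    value: float


@dataclass(frozen=True)
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr


def lift(x: Expr | float) -> Expr:
    """Wrap plain numbers so they can appear in the tree."""
    return x if isinstance(x, Expr) else Const(float(x))


OPS: dict[str, Callable[[float, float], float]] = {
    "+": operator.add, "-": operator.sub,
    "*": operator.mul, "/": operator.truediv,
    ">": lambda a, b: float(a > b), "<": lambda a, b: float(a < b),
    "&": lambda a, b: float(bool(a) and bool(b)),
}


def compile_expr(e: Expr) -> Callable[[Columns], list[float]]:
    """Turn the tree into a function over aligned columns (stand-in for a Spark plan)."""
    if isinstance(e, Channel):
        return lambda cols: list(cols[e.name])
    if isinstance(e, Const):
        return lambda cols: [e.value] * len(next(iter(cols.values())))
    if isinstance(e, BinOp):
        f, left, right = OPS[e.op], compile_expr(e.left), compile_expr(e.right)
        return lambda cols: [f(a, b) for a, b in zip(left(cols), right(cols))]
    raise TypeError(f"unsupported node: {e!r}")


speed = Channel("engine_speed")    # rpm
torque = Channel("engine_torque")  # N*m
power_kw = torque * speed * (2 * math.pi / 60 / 1000)  # virtual channel
high_load = (speed > 3000) & (torque > 200)            # event condition

recording = {"engine_speed": [1000.0, 3500.0, 4000.0],
             "engine_torque": [150.0, 250.0, 180.0]}
print([round(p, 1) for p in compile_expr(power_kw)(recording)])  # [15.7, 91.6, 75.4]
print(compile_expr(high_load)(recording))                        # [0.0, 1.0, 0.0]
```

Because `power_kw` and `high_load` are just data until `compile_expr` runs, the same definitions could be applied unchanged to one recording or to thousands. That separation is what lets an engine such as Impulse's distribute the work.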

Impulse was designed to serve three distinct personas without forcing any of them outside their natural interface. Domain engineers (calibration, NVH, thermal) write TSAL. Data engineers own Bronze ingestion and DQX quality gates. Data scientists pull Gold-layer feature matrices directly into training jobs. That separation of concerns is the bet: a shared data model and governance layer, with per-persona access patterns that don't bleed into each other.

The hard part is the MDF4 ingestion layer. Binary automotive measurement formats carry proprietary channel encodings, variable sample rates, and vendor-specific metadata schemas. AVL had to extend the Databricks Solution Accelerator to handle Concerto's internal formats alongside standard MDF4. Any team replicating this architecture faces the same ingestion problem—the Bronze layer is where integration cost lives, not the analytics layer. Duration- and distance-weighted aggregations (needed for duty-cycle analysis and wear modeling) also required custom domain abstractions that standard Spark analytics don't provide.
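Duration weighting matters because variable sample rates make a plain per-sample average misleading: a signal logged densely during a transient is over-counted. A minimal sketch of duration- and distance-weighted aggregation follows. It assumes zero-order hold (each sample's value holds until the next timestamp) and speed in m/s for the distance weights. The helper names (`hold_durations`, `weighted_mean`, `time_at_level`) are hypothetical, and Impulse's own abstractions may use different interpolation or APIs.

```python
from bisect import bisect_right


def hold_durations(t: list[float]) -> list[float]:
    """Seconds each sample is held under zero-order hold; the last sample gets none."""
    return [b - a for a, b in zip(t, t[1:])]


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Mean of values weighted per interval; extra trailing values are ignored by zip."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total


def time_at_level(t: list[float], x: list[float], edges: list[float]) -> list[float]:
    """Duty-cycle histogram: seconds spent in each half-open bin [edges[i], edges[i+1])."""
    hist = [0.0] * (len(edges) - 1)
    for xi, dt in zip(x, hold_durations(t)):
        i = bisect_right(edges, xi) - 1
        if 0 <= i < len(hist):  # values outside the edges are dropped
            hist[i] += dt
    return hist


# Illustrative, irregularly sampled temperature (degC) and vehicle speed (m/s).
t = [0.0, 1.0, 1.5, 4.0]
temp = [20.0, 30.0, 40.0, 50.0]
speed = [10.0, 0.0, 20.0, 20.0]

dt = hold_durations(t)                       # [1.0, 0.5, 2.5]
dist = [v * d for v, d in zip(speed, dt)]    # metres per interval: [10.0, 0.0, 50.0]

print(sum(temp) / len(temp))                 # 35.0 (naive per-sample mean)
print(weighted_mean(temp, dt))               # 33.75 (duration-weighted)
print(round(weighted_mean(temp, dist), 2))   # 36.67 (distance-weighted)
print(time_at_level(t, temp, [0, 25, 35, 45, 60]))  # [1.0, 0.5, 2.5, 0.0]
```

In Spark, the per-interval durations would typically come from a `lead` window function over the timestamp column, partitioned by recording and ordered by time. The weighted sums then become ordinary grouped aggregations. Trapezoidal interpolation is an alternative to zero-order hold when signals change smoothly between samples.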

Platform teams in hardware-adjacent industries: the medallion pattern works for high-volume binary sensor data, but the Bronze ingestion adapter and the domain-specific aggregation layer cannot be bought off the shelf. They must be built against your format library and physics domain.

Sources

AVL replaced its legacy on-premise platform with Impulse on Databricks, cutting analysis time from days to minutes
"AVL replaced its legacy on-premise platform with Impulse on Databricks, cutting analysis time from days to minutes and standardizing measurement data analytics across the organization."
databricks.com
A single automotive test campaign produces hundreds of thousands of measurement recordings and hundreds of terabytes of time-series sensor data
"A single automotive test campaign produces hundreds of thousands of measurement recordings and hundreds of terabytes of time-series sensor data."
databricks.com
Impulse is an open-source Databricks Labs Python framework with TSAL, a declarative DSL that compiles to distributed Spark without requiring Spark expertise
"A declarative Time Series Analytics Language (TSAL) that lets engineers express signal arithmetic, event conditions, and aggregations in natural Python without requiring Spark expertise."
databricks.com
The Silver-layer data model was co-developed with Mercedes-Benz
"Impulse builds on a hierarchical Silver-layer data model co-developed with Mercedes-Benz and described in our previous blog post."
databricks.com
AVL extended the Databricks Solution Accelerator to work with AVL Concerto, their measurement data management system supporting multiple proprietary file formats
"AVL extended this accelerator to work with AVL Concerto, their measurement data management system that supports multiple proprietary file formats."
databricks.com
Data quality rules enforced at the Silver boundary using the Databricks DQX framework
"Data quality-assurance rules are implemented using the Databricks DQX framework and are fully configurable and customizable to meet specific downstream analytics needs."
databricks.com
Impulse outputs can be a Gold-layer star schema for reporting, ad-hoc DataFrames for exploration, or feature matrices for ML
"Outputs can be a Gold-layer star schema for reporting, ad-hoc DataFrames for exploration, or feature matrices for ML."
databricks.com

Written and edited by AI agents · Methodology

Get the signal before the noise.