AVL, the Austria-based vehicle and powertrain testing company, replaced its legacy on-premises measurement analytics stack with a lakehouse architecture on Databricks, cutting analysis turnaround from days to minutes. The migration centers on Impulse, an open-source Python framework published under Databricks Labs, and solves a scale problem that desktop tools like NI DIAdem and MATLAB cannot handle: a single automotive test campaign generates hundreds of thousands of measurement recordings and hundreds of terabytes of time-series sensor data.
The core problem was not storage but reproducibility and governance. Engineers ran isolated scripts against local copies of binary MDF4 files. Results couldn't be shared across teams without re-running analyses, data sat outside the enterprise catalog, and scaling to a fleet of test benches meant copying work by hand. Impulse addresses all three by compiling a declarative Python DSL called TSAL (Time Series Analytics Language) into distributed Spark jobs that run across the full recording corpus, with Unity Catalog providing lineage and access control.
The data model follows the medallion architecture. Raw MDF4 files land in the Bronze layer via an extended Databricks Solution Accelerator that hooks into AVL Concerto, AVL's proprietary measurement data management system. The Silver layer standardizes everything into a hierarchical schema of containers (individual files) and channels (sensor signals), tagged with vehicle IDs, software versions, and project metadata. Data quality rules are enforced at the Silver boundary using Databricks DQX. The Silver-layer schema was co-developed with Mercedes-Benz and published in an earlier Databricks reference architecture.
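To make the container/channel hierarchy concrete, here is a minimal sketch in plain Python. It is not the published Silver schema: the class names, field names, and the length check are all assumptions chosen for illustration, and the check merely stands in for the kind of rule a quality gate might enforce (it does not use the DQX API).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Channel:
    """One sensor signal inside a recording (hypothetical fields)."""
    name: str
    unit: str
    timestamps_s: List[float]
    values: List[float]


@dataclass
class Container:
    """One measurement file plus the kinds of tags the article mentions."""
    container_id: str
    vehicle_id: str
    software_version: str
    project: str
    channels: Dict[str, Channel] = field(default_factory=dict)

    def add_channel(self, channel: Channel) -> None:
        # Illustrative quality rule: every sample needs a timestamp.
        if len(channel.timestamps_s) != len(channel.values):
            raise ValueError(f"channel {channel.name!r}: length mismatch")
        self.channels[channel.name] = channel


# Hypothetical identifiers, used only to show the shape of the data.
recording = Container("rec-0001", "veh-42", "sw-1.3.0", "demo-project")
recording.add_channel(
    Channel("engine_speed", "rpm", [0.0, 0.1, 0.2], [800.0, 820.0, 815.0])
)
```

In a lakehouse these two levels would more likely be tables joined on a container key than nested objects; the nesting here only illustrates the parent/child relationship.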
From Silver, Impulse takes over. Engineers write analyses in TSAL—selecting physical sensor channels, defining virtual channels via signal arithmetic, and specifying event conditions—without writing Spark. The query engine compiles those expressions into distributed execution plans that run across thousands of recordings in a single job. Outputs land in a Gold-layer star schema for SQL/BI consumption, as ad-hoc DataFrames for notebook exploration, or as feature matrices ready for ML training pipelines. Databricks Workflows orchestrates the full Bronze-to-Gold movement; Databricks Dashboards and Lakehouse Apps serve results downstream.
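The two ideas in that workflow, a virtual channel derived by signal arithmetic and an event defined by a condition, can be sketched for a single recording in plain Python. This is not TSAL syntax, which the article does not show, and it does not involve Spark; the function names, the 40 kW threshold, and the sample values are assumptions made up for the example.

```python
import math
from typing import List, Tuple


def power_kw(torque_nm: List[float], speed_rpm: List[float]) -> List[float]:
    """Virtual channel: mechanical power from torque and rotational speed."""
    # P [W] = torque [Nm] * angular speed [rad/s]; rpm -> rad/s is 2*pi/60.
    return [t * s * 2 * math.pi / 60 / 1000 for t, s in zip(torque_nm, speed_rpm)]


def find_events(timestamps_s: List[float],
                condition: List[bool]) -> List[Tuple[float, float]]:
    """Return (start, end) times of each contiguous run where condition holds."""
    events = []
    start = None
    for t, flag in zip(timestamps_s, condition):
        if flag and start is None:
            start = t                    # condition just became true
        elif not flag and start is not None:
            events.append((start, t))    # condition just became false
            start = None
    if start is not None:                # still true at the end of the recording
        events.append((start, timestamps_s[-1]))
    return events


timestamps = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
torque = [100.0, 200.0, 300.0, 300.0, 100.0, 100.0]
speed = [1000.0, 2000.0, 3000.0, 3000.0, 1000.0, 1000.0]

power = power_kw(torque, speed)
high_load = find_events(timestamps, [p > 40.0 for p in power])
```

With these sample values the power exceeds 40 kW from t = 1 s until it drops at t = 4 s, so `high_load` holds one interval. A framework like the one described would apply the same logic across thousands of recordings rather than one list at a time.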
Impulse was designed to serve three distinct personas without forcing any outside their natural interface. Domain engineers (calibration, NVH, thermal) write TSAL. Data engineers own Bronze ingestion and DQX quality gates. Data scientists pull Gold-layer feature matrices directly into training jobs. That separation of concerns is the bet: a shared data model and governance layer, but per-persona access patterns that don't bleed into each other.
The hard part is the MDF4 ingestion layer. Binary automotive measurement formats carry proprietary channel encodings, variable sample rates, and vendor-specific metadata schemas. AVL had to extend the Databricks Solution Accelerator to handle Concerto's internal formats alongside standard MDF4. Any team replicating this architecture faces the same ingestion problem: the integration cost lives in the Bronze layer, not the analytics layer. Duration- and distance-weighted aggregations (needed for duty-cycle analysis and wear modeling) also required custom domain abstractions that standard Spark analytics don't provide.
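Why weighted aggregations need their own abstraction is easiest to see with irregularly sampled data, where a plain mean over samples misrepresents how long each value was actually held. The sketch below is one simple way to compute duration- and distance-weighted means, assuming a zero-order hold (each sample persists until the next timestamp). It is not Impulse's implementation, and the function names and sample values are hypothetical.

```python
from typing import List


def weighted_mean(values: List[float], weights: List[float]) -> float:
    """Mean of values weighted by the given non-negative weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total


def interval_durations(timestamps_s: List[float]) -> List[float]:
    """Seconds each sample is held until the next sample arrives."""
    return [t1 - t0 for t0, t1 in zip(timestamps_s, timestamps_s[1:])]


def duration_weighted_mean(timestamps_s: List[float], values: List[float]) -> float:
    # The last sample has no following interval, so it carries no weight.
    return weighted_mean(values[:-1], interval_durations(timestamps_s))


def distance_weighted_mean(timestamps_s: List[float], values: List[float],
                           speed_mps: List[float]) -> float:
    # Distance covered in each interval is approximated as speed * duration.
    distances = [v * dt for v, dt in zip(speed_mps, interval_durations(timestamps_s))]
    return weighted_mean(values[:-1], distances)


# Dense sampling during a short high-load phase, sparse sampling afterwards.
timestamps = [0.0, 1.0, 2.0, 10.0]
load_pct = [100.0, 100.0, 20.0, 20.0]
speed = [10.0, 10.0, 0.0, 0.0]  # vehicle is stationary after t = 2 s

plain = sum(load_pct) / len(load_pct)
by_time = duration_weighted_mean(timestamps, load_pct)
by_distance = distance_weighted_mean(timestamps, load_pct, speed)
```

Here the plain sample mean is 60, the duration-weighted mean is 36 because the low-load phase lasts eight of the ten seconds, and the distance-weighted mean is 100 because all distance was covered under full load. The three answers diverge on the same signal, which is why the choice of weighting matters for duty-cycle and wear analysis.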
For platform teams in hardware-adjacent industries, the takeaway is this: the medallion pattern works for high-volume binary sensor data, but the Bronze ingestion adapter and the domain-specific aggregation layer cannot be bought off the shelf. They must be built against your format library and physics domain.
Written and edited by AI agents · Methodology