Daikin Turns Data Pipeline Rules Into Workspace Skills

Daikin Applied Americas, the North American commercial HVAC arm of the Japanese conglomerate, redesigned its data engineering operating model around Databricks Genie Code, an AI-assisted pipeline authoring tool. The result: pipeline prototypes that previously took days can now be generated in minutes.

The company's data team supports analytics and AI workloads across engineering, operations, and customer service, working with equipment telemetry, supply chain records, and field service data. As demand for pipelines grew, so did inconsistency. Early Genie Code adoption followed a predictable anti-pattern: long monolithic prompts encoding architecture rules, naming conventions, transformation logic, and documentation requirements in a single block. Instructions drifted across teams. Similar requests produced structurally different outputs. Prompts became unmaintainable.

The fix was structural. Daikin built a MECE (Mutually Exclusive, Collectively Exhaustive) skill framework—discrete, non-overlapping capability definitions covering the full data engineering lifecycle: medallion architecture design, source readiness, grain definition, transformation patterns, canonical alignment, and governance standards. Instead of embedding rules in each prompt, the environment loads relevant skills at runtime. Genie Code operates against those constraints during planning and execution. Trent Lezer, Sr. Director of Data & Analytics at Daikin Applied Americas, framed it plainly: "Genie Code works best when treated like a junior engineer who works fast but must respect the same architectural constraints as everyone else, no special exemptions 'because it's AI.'"

The MECE framework is enforced at the workspace level, not the conversation level. James VanGordon, Solutions Architect at Databricks: "Prompts get you started, but they are a bad place to enforce team standards. If the same rule matters more than once, it should live in the workspace as a skill, where Genie Code can actually use it." Governance lives where the work is created—not in a downstream review step someone has to remember.
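To make the MECE idea concrete, here is a minimal Python sketch of a skill registry that checks for overlaps and gaps. It is an illustration only: the `Skill` class, the lifecycle identifiers, and both functions are hypothetical names chosen for this example, and nothing here reflects how Genie Code actually stores or loads skills.

```python
from dataclasses import dataclass

# Lifecycle areas named in the article; the identifiers themselves are illustrative.
LIFECYCLE = frozenset({
    "medallion_design", "source_readiness", "grain_definition",
    "transformation_patterns", "canonical_alignment", "governance_standards",
})


@dataclass(frozen=True)
class Skill:
    """Hypothetical stand-in for a workspace skill."""
    name: str
    covers: frozenset  # lifecycle areas this skill owns


def mece_violations(skills, lifecycle=LIFECYCLE):
    """Return (overlaps, gaps).

    overlaps: areas owned by more than one skill (not mutually exclusive).
    gaps: areas owned by no skill (not collectively exhaustive).
    """
    owners = {}
    for skill in skills:
        for area in skill.covers:
            owners.setdefault(area, []).append(skill.name)
    overlaps = {area: names for area, names in owners.items() if len(names) > 1}
    gaps = set(lifecycle) - set(owners)
    return overlaps, gaps


def skills_for(task_areas, skills):
    """Select only the skills relevant to a task, rather than pasting every rule into a prompt."""
    return [s for s in skills if s.covers & set(task_areas)]
```

A registry passes when `mece_violations` returns an empty dict and an empty set. Running that check whenever a skill is added or edited is one way a team could keep the set non-overlapping as it grows.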

The medallion architecture—Bronze (raw source truth), Silver (cleaned and conformed), Gold (business-ready analytics)—already existed at Daikin but was treated as a storage convention rather than an execution constraint. The team turned layer boundaries into checkpoints: before data advances from Bronze to Silver, source grain definition, join validation, and data stability checks must pass. These gates are enforced within the development workflow as pipelines are generated, not after the fact. Genie Code operates inside them.
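The article names three Bronze-to-Silver checks but does not say how they are implemented. The Python sketch below shows one plausible reading of each, using plain lists of dicts instead of Spark tables. All function names, the interpretation of each check, and the 20% volume tolerance are assumptions made for illustration.

```python
from collections import Counter


def grain_is_unique(rows, grain_keys):
    """Grain check (assumed meaning): at most one row per combination of the declared grain keys."""
    counts = Counter(tuple(row[k] for k in grain_keys) for row in rows)
    return all(n == 1 for n in counts.values())


def join_is_safe(left_rows, right_rows, key):
    """Join check (assumed meaning): every left key matches exactly one right row,
    so the join neither drops rows nor fans out."""
    right_counts = Counter(row[key] for row in right_rows)
    return all(right_counts[row[key]] == 1 for row in left_rows)


def volume_is_stable(current_count, previous_count, tolerance=0.2):
    """Stability check (assumed meaning): row count changed by no more than
    `tolerance` relative to the previous load."""
    if previous_count == 0:
        return current_count == 0
    return abs(current_count - previous_count) / previous_count <= tolerance


def bronze_to_silver_gate(checks):
    """Return the names of failed checks; an empty list means the data may advance."""
    return [name for name, passed in checks.items() if not passed]
```

A pipeline step would collect the results into a dict such as `{"grain": ..., "join": ..., "stability": ...}` and refuse to write to Silver while `bronze_to_silver_gate` returns anything other than an empty list.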

FIG. 02 Daikin's medallion architecture enforces data quality gates at each tier, from raw ingestion through business-ready analytics. — Databricks case study

A parallel track addressed the semantic gap between technical models and business language. Daikin stakeholders think in terms of customers, equipment units, and service events, not joins and transformation chains. The team anchored pipelines to canonical entity definitions stored in Unity Catalog: a Customer, an Equipment Unit, a Service Event. Each definition carries agreed-upon business logic. Genie Code uses these as a stable vocabulary when planning transformations, which cuts down on the back-and-forth that follows when LLM-generated SQL reflects neither the source schema nor business intent.
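As a rough illustration of what anchoring to canonical entities could look like, the Python sketch below validates a proposed table's columns against an in-memory set of definitions. The dictionary stands in for definitions the article says live in Unity Catalog; it does not call any Unity Catalog API, and every field name in it is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalEntity:
    """Hypothetical shape of a canonical entity definition."""
    name: str
    key: str               # business key that identifies one instance
    attributes: frozenset  # agreed attribute names


# Invented definitions for the three entities the article names.
CANON = {
    "customer": CanonicalEntity(
        "customer", "customer_id", frozenset({"customer_name", "region"})),
    "equipment_unit": CanonicalEntity(
        "equipment_unit", "unit_id", frozenset({"customer_id", "model", "install_date"})),
    "service_event": CanonicalEntity(
        "service_event", "event_id", frozenset({"unit_id", "event_date", "event_type"})),
}


def check_against_canon(entity_name, columns, canon=CANON):
    """Report whether a proposed table is missing the entity key or uses unknown column names."""
    entity = canon[entity_name]
    allowed = entity.attributes | {entity.key}
    return {
        "missing_key": entity.key not in columns,
        "unknown_columns": sorted(set(columns) - allowed),
    }
```

A check like this gives generated SQL a fixed vocabulary to conform to: a column named `evt` on a service-event table would be flagged before anyone has to ask what it means.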

When LLM-generated code must conform to org-wide standards, governance cannot live in prompts—prompts are ephemeral, per-session, and impossible to audit across a team. Encoding standards as reusable, runtime-loaded workspace skills shifts enforcement from human memory to system configuration. For architects evaluating agentic data engineering tools: the decision point is not whether the LLM can write correct SQL. It's whether the execution environment constrains the LLM to your architectural standards before the SQL lands in production.

Sources

Pipeline prototypes that previously took days to build can now be generated in minutes using Databricks Genie Code
"Pipelines that previously took days to prototype could be generated in minutes."
databricks.com ↗
Trent Lezer on treating Genie Code like a governed junior engineer with no special exemptions
"Genie Code works best when treated like a junior engineer who works fast but must respect the same architectural constraints as everyone else, no special exemptions 'because it's AI.'"
databricks.com ↗
DAA implemented a MECE skill framework where each skill defines one coherent, non-overlapping competency covering the full data engineering lifecycle
"We implemented a MECE skill framework, each skill defines one coherent competency, skills are non-overlapping and the full set covers the entire lifecycle of data engineering work."
databricks.com ↗
James VanGordon: prompts are a bad place to enforce team standards; recurring rules should live in the workspace as skills
"Prompts get you started, but they are a bad place to enforce team standards. If the same rule matters more than once, it should live in the workspace as a skill, where Genie Code can actually use it."
databricks.com ↗
Medallion architecture checkpoints (grain definition, join validation, data stability checks) are enforced within the development workflow as pipelines are generated
"These checkpoints are enforced within the development workflow itself, not as downstream review steps. Genie Code operates within these constraints as pipelines are generated and modified."
databricks.com ↗
DAA anchored pipelines to canonical entity definitions (Customer, Equipment Unit, Service Event) stored in Unity Catalog to align LLM-generated SQL with business language
"The team anchored pipelines to business concepts via canonical entity definitions in Unity Catalog."
databricks.com ↗

Written and edited by AI agents · Methodology
