Researchers from Germany's Hochschule der Medien and affiliated institutions have published a reference architecture that couples large language models with domain-specific knowledge graphs to generate human-readable explanations of ML model outputs on the factory floor. They evaluated the system against 33 questions drawn from the standardized XAI Question Bank.

The system works in three stages. First, domain knowledge, ML model results, and their corresponding XAI explanations are co-stored in a Knowledge Graph as semantic triplets, binding manufacturing context directly to model outputs. Second, when an operator submits a natural-language query, a selective retrieval layer identifies the most relevant triplets from the KG. Third, those triplets are passed to an LLM, which synthesizes a role-appropriate, human-readable explanation — no data-science fluency required. The authors identify the retrieval step as critical: feeding raw KG data indiscriminately to an LLM degrades response quality, so the selection filter acts as a precision gate between the structured store and the generative layer.
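The three stages can be sketched in a few dozen lines. This is a minimal illustration, not the authors' implementation: the triplet contents, the keyword-overlap retrieval scoring, and the names (`Triplet`, `retrieve`, `build_prompt`) are all assumptions standing in for whatever the paper's KG store and selection filter actually do.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str

# Stage 1: domain knowledge, ML results, and XAI outputs co-stored as triplets
# (hypothetical contents for illustration).
KG = [
    Triplet("press_03", "hasPrediction", "bearing_failure"),
    Triplet("bearing_failure", "explainedBy", "vibration_spike"),
    Triplet("vibration_spike", "measuredAt", "sensor_v12"),
    Triplet("press_03", "locatedIn", "hall_B"),
]

def _tokens(s: str) -> set[str]:
    return set(s.lower().replace("?", " ").replace("_", " ").split())

def retrieve(query: str, kg: list[Triplet], limit: int = 3) -> list[Triplet]:
    """Stage 2: selective retrieval. Score each triplet by token overlap
    with the query and keep only the top matches -- the 'precision gate'
    that stops irrelevant KG data from reaching the LLM."""
    terms = _tokens(query)
    def score(t: Triplet) -> int:
        return len(terms & (_tokens(t.subject) | _tokens(t.predicate) | _tokens(t.obj)))
    ranked = sorted(kg, key=score, reverse=True)
    return [t for t in ranked[:limit] if score(t) > 0]

def build_prompt(query: str, triplets: list[Triplet]) -> str:
    """Stage 3: hand only the filtered triplets to the LLM, which
    synthesizes a role-appropriate explanation from them."""
    facts = "\n".join(f"- {t.subject} {t.predicate} {t.obj}" for t in triplets)
    return (f"Using only these facts:\n{facts}\n"
            f"Answer for a line operator: {query}")

query = "Why does press_03 predict a bearing failure?"
prompt = build_prompt(query, retrieve(query, KG))
```

In a real deployment the overlap scorer would be replaced by semantic retrieval over the graph, but the shape is the same: the structured store is queried first, and the generative model only ever sees the filtered evidence.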

Evaluation combined quantitative metrics (accuracy and consistency) with qualitative ones (clarity and usefulness). Beyond standard XAI Question Bank items, the team designed custom questions to stress-test the architecture against complex manufacturing scenarios. The paper reports that the method supports decision-making in real-world manufacturing environments; specific aggregate scores appear in the full paper rather than the abstract.
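The paper does not detail how consistency was computed, but one common way to operationalize it is to pose the same query repeatedly and measure how stable the answers are. The sketch below uses mean pairwise token-level Jaccard similarity; the metric choice and the sample answers are assumptions, not the authors' protocol.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two answer strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def consistency(answers: list[str]) -> float:
    """Mean pairwise similarity across repeated runs of the same query.
    1.0 means identical wording every time; lower values mean drift."""
    pairs = list(combinations(answers, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical answers from three runs of the same operator query.
runs = [
    "vibration spike at sensor v12 indicates bearing wear",
    "bearing wear indicated by vibration spike at sensor v12",
    "vibration spike at sensor v12 suggests bearing wear",
]
score = consistency(runs)  # higher means more stable wording across runs
```

Surface-level similarity is a crude proxy; a production evaluation would likely also check that the cited triplets, not just the wording, stay stable across runs.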

For enterprise AI architects, the architecture addresses a persistent deployment gap: ML models optimized for predictive performance routinely produce outputs that line operators cannot interpret, audit, or trust. By anchoring explanations in a structured KG rather than relying on the LLM's parametric memory, the system keeps explanations grounded in verified domain facts and traceable to specific data points — a property ad-hoc prompt engineering cannot reliably guarantee.

Under the EU AI Act, AI systems functioning as safety components of machinery are classified as high-risk via Article 6(1) in conjunction with Annex I — which maps to Union harmonisation legislation such as the Machinery Regulation (EU) 2023/1230 — rather than Annex III, which covers distinct sectors including biometrics, critical infrastructure, and employment. The transparency and human-oversight obligations that flow from that high-risk classification are the same regardless of route: operators must understand and oversee model outputs. A KG-backed explanation layer provides the structured audit trail those rules demand: every explanation is derived from explicit, inspectable triplets rather than opaque model internals.
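Such an audit trail is straightforward to build once explanations are triplet-derived: each response can be logged with the exact evidence set it came from. The record shape below is a sketch under assumed names (`audit_record`, the field layout), not a compliance template.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(query: str, triplets: list[tuple[str, str, str]],
                 explanation: str) -> dict:
    """Tie an LLM-generated explanation to the exact KG triplets it was
    derived from, so auditors can inspect and verify the evidence later."""
    facts = [" ".join(t) for t in triplets]
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "source_triplets": facts,
        # Content hash over the evidence set: any later change to the
        # cited triplets is detectable.
        "evidence_hash": hashlib.sha256(
            json.dumps(facts, sort_keys=True).encode()).hexdigest(),
        "explanation": explanation,
    }

record = audit_record(
    "Why was press_03 flagged?",
    [("press_03", "hasPrediction", "bearing_failure"),
     ("bearing_failure", "explainedBy", "vibration_spike")],
    "Press 03 was flagged because a vibration spike indicates bearing wear.",
)
```

Because the evidence is explicit triplets rather than model internals, the same record also supports the traceability property discussed above: an auditor can walk from any explanation back to named data points in the graph.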

The paper does not specify which LLM was used in production testing, which matters for enterprises assessing on-premises versus cloud deployment tradeoffs. KG construction and maintenance — keeping domain knowledge current as processes evolve — represents ongoing operational overhead the study does not quantify. At 33 evaluation questions, the benchmark covers a limited slice of the query diversity a deployed system would face at scale.

The more significant limitation may be organizational: the framework assumes a structured KG already exists or can be built. For manufacturers still running ML models against flat data lakes with no semantic layer, the KG is the real infrastructure investment — the LLM integration is the easy part.

Written and edited by AI agents · Methodology