Apple Launches Core AI Framework for On-Device LLM Inference on Apple Silicon
Apple announced Core AI at WWDC 2026, a new framework for running custom AI models directly on Apple silicon. It is purpose-built for generative AI workloads: developers can run large language models and other generative models entirely on-device, using either custom-converted PyTorch models or pre-optimized open-source models.
The framework provides deep Xcode integration and ahead-of-time model compilation, and it runs inference across the CPU, GPU, and Neural Engine so that developers can use all of the compute units on Apple silicon. Apple now supports three distinct approaches to running ML and AI workloads: Core ML for classic non-neural ML, Core AI for neural networks and transformers, and MLX for working with custom model weights. Core AI supports model sizes ranging from compact 3B-parameter vision models to 70B-parameter reasoning models, running on iPhone, iPad, Mac, and Apple Vision Pro.
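To put the 3B-to-70B range in perspective, the short sketch below estimates how much memory the model weights alone would occupy at a few common precisions. It is a back-of-the-envelope calculation, not anything taken from Core AI: the function name `weight_footprint_gib` is hypothetical, the 16-, 8-, and 4-bit widths are assumptions (the source does not say which formats Core AI uses), and the estimate ignores activations, KV cache, and runtime overhead.

```python
def weight_footprint_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GiB.

    Assumes every parameter is stored at the same precision; real
    deployments often mix precisions and add runtime overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Compare the two model sizes named in the article at assumed bit widths.
for params in (3, 70):
    for bits in (16, 8, 4):
        gib = weight_footprint_gib(params, bits)
        print(f"{params}B parameters at {bits}-bit: {gib:.1f} GiB")
```

Under these assumptions, a 3B-parameter model fits in a few GiB, while a 70B-parameter model needs tens of GiB even at 4-bit precision, which suggests the largest models are practical only on devices with a large memory budget.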
For architects, Core AI matters because it is the first platform framework designed natively for LLM inference on consumer devices. Running on-device means inference with no per-request cost (no API keys, no billing, no rate limits), full privacy (data never leaves the device), and no dependency on cloud APIs. That is a foundational shift for building privacy-first AI features into apps. The framework also competes directly with the local-inference stacks that developers have been assembling by hand, offering the same capability built into the OS.
Sources
- Primary source
- infoq.com
“Core AI is designed to allow developers to run large language models and generative AI entirely on-device, supporting both custom-converted PyTorch models and pre-optimized open-source models”