At WWDC 26, Apple introduced the Core AI framework, the official successor to Core ML. It’s designed to permit builders to run giant language fashions and generative AI totally on-device, supporting each custom-converted PyTorch fashions and pre-optimized open-source fashions.
Apple says the brand new Core AI framework offers a unified structure for deploying fashions starting from compact 3B-parameter imaginative and prescient fashions to large-scale LLMs, together with reasoning fashions with as much as 70B-parameter reasoning fashions, throughout the iPhone, iPad, Mac, and Apple Imaginative and prescient Professional.
Core AI is the expertise underpinning Apple Intelligence, and with the following launch of its OSes and toolchain, Apple is making it accessible to builders to construct what it calls “{custom} intelligence”. Core AI, which may solely run on Apple Silicon, ensures consumer information privateness, zero server dependencies, and nil per-token cloud prices.
Key Core AI capabilities embody unified {hardware} entry, permitting workloads to seamlessly run throughout the CPU, GPU, and Neural Engine below one API; a memory-safe Swift API enabling zero-copy information paths and fine-grained management over inference reminiscence; and ahead-of-time (AOT) compilation, which shifts work off the consumer’s system, yielding near-instant load occasions.
As talked about, you may convert a PyTorch mannequin right into a Core AI mannequin utilizing Core AI PyTorch. The best strategy is exporting a PyTorch as a torch.export.ExportedProgram and convert it to a CoreAI AIProgram utilizing TorchConverter().add_exported_program(ep).to_coreai().
Alternatively, you may creator a brand new Core AI mannequin from a PyTorch one utilizing built-in composite ops offered by the library, akin to consideration, RoPE embeddings, RMSNorm, and gather-matmul, registering {custom} decreasing operate to map new PyTorch ops to Core AI IR, and even creating {custom} Steel kernels for lower-level optimization.
When changing a PyTorch mannequin, an crucial step is compressing it for deployment on Apple {hardware}. This course of applies optimization methods akin to quantization and palettization, that are designed to align with the execution patterns of the Core AI runtime by default, guaranteeing environment friendly on-device efficiency.
Mannequin compression might help scale back the reminiscence footprint of your mannequin (disk dimension and at runtime), scale back inference latency, scale back energy consumption, or optimize them all of sudden.
One essential facet of operating an AIModel is its computerized specialization to the present {hardware} and OS model, which is carried by means of when the mannequin is first loaded into the mannequin cache. Consequently, the primary try to make use of a mannequin could take considerably longer than subsequent runs, as soon as the mannequin has been already cached. Builders can management how and when this course of occurs by customizing SpecializationOptions, accessing the AICacheModel to test whether or not a mannequin is already accessible or delete cached ones, and even sharing the mannequin cache throughout an app group.
With the introduction of Core AI, Apple is offering assist for 3 distinct approaches to run ML/AI on its working programs: Core ML, Core AI, and MLX Swift. Based mostly on developer discussions primarily based on Hacker Information, Apple appears to counsel utilizing Core ML for “basic, non-neural ML”, akin to determination timber or tabular characteristic engineering, Core AI for neural networks and transformers, and MLX for working with {custom} mannequin weights—although doubtlessly with decrease efficiency. Group suggestions additionally notes that whereas Core AI “makes it simpler to include high-performance LLMs”, its long-term worth will rely “on the the longer term progress of the official Core AI/neighborhood”.