Kuzco is a Swift package for integrating large language models (LLMs) directly into iOS and macOS apps. Built on top of llama.cpp, Kuzco runs models entirely on-device: no data leaves the user's hardware, and inference requires no network round trips.
Key Features:
- Local LLM Execution: Runs models on-device using llama.cpp.
- Multiple Model Architectures: Supports LLaMA, Mistral, Phi, Gemma, Qwen, and more.
- Async/Await Native: Modern Swift concurrency with streaming responses (see the sketch after this list).
- Cross-Platform: Works on iOS, macOS, and Mac Catalyst.
- Flexible Model Settings: Fine-tune context length, batch size, GPU layers, and CPU threads.
- Customizable Sampling: Control temperature, top-K, top-P, repetition penalties.
- Automatic Architecture Detection: Infers the model architecture from the file name (see the second sketch below).
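
Taken together, the settings and sampling knobs above might be wired up roughly like this. The sketch is illustrative only: `ModelSettings`, `SamplingConfig`, `KuzcoModel`, and `generate(prompt:sampling:)` are assumed names, not Kuzco's confirmed API, so consult the package documentation for the real entry points.

```swift
import Kuzco

// NOTE: type and method names below are hypothetical stand-ins,
// not Kuzco's verified public API.

// Runtime settings: context length, batch size, GPU offload, CPU threads.
let settings = ModelSettings(
    contextLength: 4096,
    batchSize: 512,
    gpuLayers: 32,   // layers offloaded to the GPU via Metal; 0 = CPU only
    cpuThreads: 6
)

// Sampling controls: temperature, top-K, top-P, repetition penalty.
let sampling = SamplingConfig(
    temperature: 0.7,
    topK: 40,
    topP: 0.9,
    repetitionPenalty: 1.1
)

// Load a GGUF model bundled with the app and stream a response token by token.
let modelURL = Bundle.main.url(forResource: "mistral-7b-instruct", withExtension: "gguf")!
let model = try await KuzcoModel(url: modelURL, settings: settings)

for try await token in model.generate(prompt: "Explain on-device inference in one sentence.",
                                      sampling: sampling) {
    print(token, terminator: "")  // tokens arrive incrementally as an AsyncSequence
}
```

Streaming the reply as an `AsyncSequence` fits Swift concurrency well: a SwiftUI view can append tokens as they arrive rather than blocking until the full completion is ready.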
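And a rough idea of how filename-based architecture detection could work. This is a guess at the general technique, not Kuzco's actual implementation:

```swift
/// Hypothetical sketch: the architectures listed in the feature list above.
enum ModelArchitecture: String, CaseIterable {
    case llama, mistral, phi, gemma, qwen
}

/// Returns the first architecture whose name appears in the file name,
/// e.g. "mistral-7b-instruct-q4.gguf" -> .mistral.
func detectArchitecture(fromFilename filename: String) -> ModelArchitecture? {
    let lowered = filename.lowercased()
    return ModelArchitecture.allCases.first { lowered.contains($0.rawValue) }
}
```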
Use Cases:
- Building offline AI-powered features in iOS and macOS apps.
- Creating privacy-focused chat applications.
- Implementing on-device AI for resource-constrained devices.
- Experimenting with different LLM architectures and configurations.

