Kuzco

An open-source Swift package for on-device LLM inference on Apple platforms, built on llama.cpp. It offers async APIs and extensive customization; alternatives include MLX, ggml, and Core ML–based models.

Kuzco is a Swift package for integrating large language models (LLMs) directly into iOS and macOS apps. Built on top of llama.cpp, it runs models entirely on-device, so prompts and responses never leave the device and inference is not subject to network latency.

Key Features:

  • Local LLM Execution: Runs models on-device using llama.cpp.
  • Multiple Model Architectures: Supports LLaMA, Mistral, Phi, Gemma, Qwen, and more.
  • Async/Await Native: Modern Swift concurrency with streaming responses (see the sketch after this list).
  • Cross-Platform: Works on iOS, macOS, and Mac Catalyst.
  • Flexible Model Settings: Fine-tune context length, batch size, GPU layers, and CPU threads.
  • Customizable Sampling: Control temperature, top-K, top-P, repetition penalties.
  • Automatic Architecture Detection: Detects model architectures from filenames.
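
In practice these features map onto a small workflow: load a GGUF model from a local path, configure engine and sampling settings, and consume the generated tokens with Swift's for try await. The sketch below illustrates that flow under assumed names; KuzcoEngine, ModelSettings, SamplingOptions, and stream(prompt:sampling:) are illustrative placeholders rather than Kuzco's documented API, so consult the package's README for the exact types and signatures.

    import Foundation
    import Kuzco  // module name assumed; the types below are illustrative placeholders

    @main
    struct ChatDemo {
        static func main() async throws {
            // Local GGUF model file; the architecture is typically inferred
            // from the filename (e.g. "mistral-7b-instruct.Q4_K_M.gguf").
            let modelPath = "/path/to/mistral-7b-instruct.Q4_K_M.gguf"

            // Assumed settings type: tune context window, batch size,
            // GPU offloading, and CPU threads for the target device.
            let settings = ModelSettings(
                contextLength: 4096,
                batchSize: 512,
                gpuLayers: 32,   // layers offloaded to Metal
                cpuThreads: 6
            )

            // Assumed sampling type, mirroring llama.cpp's options:
            // temperature, top-K, nucleus (top-P), and repetition penalty.
            let sampling = SamplingOptions(
                temperature: 0.7,
                topK: 40,
                topP: 0.95,
                repetitionPenalty: 1.1
            )

            let engine = try await KuzcoEngine(modelPath: modelPath, settings: settings)

            // Stream tokens as they are generated and print them incrementally.
            for try await token in engine.stream(
                prompt: "Summarize the plot of Hamlet in two sentences.",
                sampling: sampling
            ) {
                print(token, terminator: "")
            }
        }
    }

Consuming generation as an async token stream, rather than waiting for the full completion, is what makes incremental UI updates (for example, a typing effect in a chat view) straightforward on-device.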

Use Cases:

  • Building offline AI-powered features in iOS and macOS apps.
  • Creating privacy-focused chat applications.
  • Implementing on-device AI for resource-constrained devices.
  • Experimenting with different LLM architectures and configurations.
