## About
llama-swap is used by thousands of users for reliable, on-demand loading of AI backends like llama.cpp, vLLM, Whisper, and stable-diffusion.cpp. It follows a simple design philosophy: one binary, one configuration file, no dependencies.

It comes with no strings attached: MIT licensed, with zero telemetry collection.
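As a sketch, that single configuration file maps model names to the commands that serve them. The model name and file path below are hypothetical, and `${PORT}` is the placeholder llama-swap fills in with the port it assigns; see the project docs for the full schema:

```bash
# A minimal, hypothetical config.yaml: one model entry with the
# command llama-swap runs on demand when that model is requested.
cat > config.yaml <<'EOF'
models:
  "qwen-8b":
    cmd: /app/llama-server --port ${PORT} -m /models/qwen-8b.gguf
EOF
```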
## Installation
llama-swap is available on Linux, macOS, and Windows. You can use one of the installation methods below or build it from source.
Container images are built nightly and are available for multiple hardware platforms: CUDA, Intel, Vulkan, MUSA, and CPU-only inference.
```bash
# Pull the container for your hardware
docker pull ghcr.io/mostlygeek/llama-swap:cuda
docker pull ghcr.io/mostlygeek/llama-swap:vulkan
docker pull ghcr.io/mostlygeek/llama-swap:intel
docker pull ghcr.io/mostlygeek/llama-swap:musa
docker pull ghcr.io/mostlygeek/llama-swap:cpu
# Pull specific versions of llama-swap and llama.cpp
docker pull ghcr.io/mostlygeek/llama-swap:v179-cuda-b7565
# Run with a custom configuration
docker run -v ./config.yaml:/app/config.yaml ghcr.io/mostlygeek/llama-swap
```

All available images can be found on the GitHub Packages page.
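Once running, llama-swap serves an OpenAI-compatible API and starts the matching backend on demand when a request names a model from the config. A hedged example, assuming the container's port 8080 is published to the host (e.g. with `-p 8080:8080`) and a model named `qwen-8b` as in the sketch above:

```bash
# The first request for a model spins up its backend;
# subsequent requests reuse the running instance.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-8b", "messages": [{"role": "user", "content": "Hello"}]}'
```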