From bf2f95c684863e90b33186e794846384feae5073 Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Thu, 25 Dec 2025 10:00:07 +0100 Subject: [PATCH] chore(docs): update docs with cuda 13 instructions and the new vibevoice backend Signed-off-by: Ettore Di Giacinto --- README.md | 49 +++++++++++------- docs/content/features/GPU-acceleration.md | 1 + docs/content/features/mcp.md | 1 + docs/content/features/text-to-audio.md | 42 +++++++++++++++ .../getting-started/container-images.md | 31 +++++++++-- docs/content/installation/docker.md | 18 +++++++ docs/content/reference/compatibility-table.md | 36 ++++++------- docs/content/reference/nvidia-l4t.md | 51 ++++++++++++++++--- 8 files changed, 185 insertions(+), 44 deletions(-) diff --git a/README.md b/README.md index d67886a11..170e34b6c 100644 --- a/README.md +++ b/README.md @@ -146,6 +146,9 @@ docker run -ti --name local-ai -p 8080:8080 localai/localai:latest ### NVIDIA GPU Images: ```bash +# CUDA 13.0 +docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13 + # CUDA 12.0 docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12 @@ -153,7 +156,11 @@ docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gp docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11 # NVIDIA Jetson (L4T) ARM64 +# CUDA 12 (for Nvidia AGX Orin and similar platforms) docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64 + +# CUDA 13 (for Nvidia DGX Spark) +docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13 ``` ### AMD GPU Images (ROCm): @@ -180,6 +187,9 @@ docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan # CPU version docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu +# NVIDIA CUDA 13 version +docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13 + # NVIDIA CUDA 12 version docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12 @@ -269,39 +279,40 @@ LocalAI supports a comprehensive range of AI backends with multiple acceleration ### Text Generation & Language Models | Backend | Description | Acceleration Support | |---------|-------------|---------------------| -| **llama.cpp** | LLM inference in C/C++ | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU | -| **vLLM** | Fast LLM inference with PagedAttention | CUDA 12, ROCm, Intel | -| **transformers** | HuggingFace transformers framework | CUDA 11/12, ROCm, Intel, CPU | -| **exllama2** | GPTQ inference library | CUDA 12 | +| **llama.cpp** | LLM inference in C/C++ | CUDA 11/12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU | +| **vLLM** | Fast LLM inference with PagedAttention | CUDA 12/13, ROCm, Intel | +| **transformers** | HuggingFace transformers framework | CUDA 11/12/13, ROCm, Intel, CPU | +| **exllama2** | GPTQ inference library | CUDA 12/13 | | **MLX** | Apple Silicon LLM inference | Metal (M1/M2/M3+) | | **MLX-VLM** | Apple Silicon Vision-Language Models | Metal (M1/M2/M3+) | ### Audio & Speech Processing | Backend | Description | Acceleration Support | |---------|-------------|---------------------| -| **whisper.cpp** | OpenAI Whisper in C/C++ | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU | -| **faster-whisper** | Fast Whisper with CTranslate2 | CUDA 12, ROCm, Intel, CPU | -| **bark** | Text-to-audio generation | CUDA 12, ROCm, Intel | +| 
**whisper.cpp** | OpenAI Whisper in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU | +| **faster-whisper** | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, CPU | +| **bark** | Text-to-audio generation | CUDA 12/13, ROCm, Intel | | **bark-cpp** | C++ implementation of Bark | CUDA, Metal, CPU | -| **coqui** | Advanced TTS with 1100+ languages | CUDA 12, ROCm, Intel, CPU | -| **kokoro** | Lightweight TTS model | CUDA 12, ROCm, Intel, CPU | -| **chatterbox** | Production-grade TTS | CUDA 11/12, CPU | +| **coqui** | Advanced TTS with 1100+ languages | CUDA 12/13, ROCm, Intel, CPU | +| **kokoro** | Lightweight TTS model | CUDA 12/13, ROCm, Intel, CPU | +| **chatterbox** | Production-grade TTS | CUDA 11/12/13, CPU | | **piper** | Fast neural TTS system | CPU | | **kitten-tts** | Kitten TTS models | CPU | | **silero-vad** | Voice Activity Detection | CPU | -| **neutts** | Text-to-speech with voice cloning | CUDA 12, ROCm, CPU | +| **neutts** | Text-to-speech with voice cloning | CUDA 12/13, ROCm, CPU | +| **vibevoice** | Real-time TTS with voice cloning | CUDA 12/13, ROCm, Intel, CPU | ### Image & Video Generation | Backend | Description | Acceleration Support | |---------|-------------|---------------------| -| **stablediffusion.cpp** | Stable Diffusion in C/C++ | CUDA 12, Intel SYCL, Vulkan, CPU | -| **diffusers** | HuggingFace diffusion models | CUDA 11/12, ROCm, Intel, Metal, CPU | +| **stablediffusion.cpp** | Stable Diffusion in C/C++ | CUDA 12/13, Intel SYCL, Vulkan, CPU | +| **diffusers** | HuggingFace diffusion models | CUDA 11/12/13, ROCm, Intel, Metal, CPU | ### Specialized AI Tasks | Backend | Description | Acceleration Support | |---------|-------------|---------------------| -| **rfdetr** | Real-time object detection | CUDA 12, Intel, CPU | -| **rerankers** | Document reranking API | CUDA 11/12, ROCm, Intel, CPU | +| **rfdetr** | Real-time object detection | CUDA 12/13, Intel, CPU | +| **rerankers** | Document reranking API | CUDA 11/12/13, ROCm, Intel, CPU | | **local-store** | Vector database | CPU | | **huggingface** | HuggingFace API integration | API-based | @@ -311,11 +322,13 @@ LocalAI supports a comprehensive range of AI backends with multiple acceleration |-------------------|-------------------|------------------| | **NVIDIA CUDA 11** | llama.cpp, whisper, stablediffusion, diffusers, rerankers, bark, chatterbox | Nvidia hardware | | **NVIDIA CUDA 12** | All CUDA-compatible backends | Nvidia hardware | -| **AMD ROCm** | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, bark, neutts | AMD Graphics | -| **Intel oneAPI** | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, exllama2, coqui, kokoro, bark | Intel Arc, Intel iGPUs | +| **NVIDIA CUDA 13** | All CUDA-compatible backends | Nvidia hardware | +| **AMD ROCm** | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, bark, neutts, vibevoice | AMD Graphics | +| **Intel oneAPI** | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, exllama2, coqui, kokoro, bark, vibevoice | Intel Arc, Intel iGPUs | | **Apple Metal** | llama.cpp, whisper, diffusers, MLX, MLX-VLM, bark-cpp | Apple M1/M2/M3+ | | **Vulkan** | llama.cpp, whisper, stablediffusion | Cross-platform GPUs | -| **NVIDIA Jetson** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI | +| **NVIDIA Jetson (CUDA 12)** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (AGX Orin, etc.) 
| +| **NVIDIA Jetson (CUDA 13)** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (DGX Spark) | | **CPU Optimized** | All backends | AVX/AVX2/AVX512, quantization support | ### 🔗 Community and integrations diff --git a/docs/content/features/GPU-acceleration.md b/docs/content/features/GPU-acceleration.md index 7c619962b..2f10054d3 100644 --- a/docs/content/features/GPU-acceleration.md +++ b/docs/content/features/GPU-acceleration.md @@ -82,6 +82,7 @@ The image list is on [quay](https://quay.io/repository/go-skynet/local-ai?tab=ta - CUDA `11` tags: `master-gpu-nvidia-cuda-11`, `v1.40.0-gpu-nvidia-cuda-11`, ... - CUDA `12` tags: `master-gpu-nvidia-cuda-12`, `v1.40.0-gpu-nvidia-cuda-12`, ... +- CUDA `13` tags: `master-gpu-nvidia-cuda-13`, `v1.40.0-gpu-nvidia-cuda-13`, ... In addition to the commands to run LocalAI normally, you need to specify `--gpus all` to docker, for example: diff --git a/docs/content/features/mcp.md b/docs/content/features/mcp.md index 26134d3df..9172b9f76 100644 --- a/docs/content/features/mcp.md +++ b/docs/content/features/mcp.md @@ -257,6 +257,7 @@ It might be handy to install packages before starting the container to setup the services: local-ai: image: localai/localai:latest + #image: localai/localai:latest-gpu-nvidia-cuda-13 #image: localai/localai:latest-gpu-nvidia-cuda-12 container_name: local-ai restart: always diff --git a/docs/content/features/text-to-audio.md b/docs/content/features/text-to-audio.md index d10c8ad09..b9e5bdb82 100644 --- a/docs/content/features/text-to-audio.md +++ b/docs/content/features/text-to-audio.md @@ -122,6 +122,48 @@ curl --request POST \ Future versions of LocalAI will expose additional control over audio generation beyond the text prompt. +### VibeVoice + +[VibeVoice-Realtime](https://github.com/microsoft/VibeVoice) is a real-time text-to-speech model that generates natural-sounding speech with voice cloning capabilities. + +#### Setup + +Install the `vibevoice` model in the Model gallery. + +#### Usage + +Use the tts endpoint by specifying the vibevoice backend: + +``` +curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{ + "model": "vibevoice", + "input":"Hello!" + }' | aplay +``` + +#### Voice cloning + +VibeVoice supports voice cloning through voice preset files. You can configure a model with a specific voice: + +```yaml +name: vibevoice +backend: vibevoice +parameters: + model: microsoft/VibeVoice-Realtime-0.5B +tts: + voice: "Frank" # or use audio_path to specify a .pt file path + # Available English voices: Carter, Davis, Emma, Frank, Grace, Mike +``` + +Then you can use the model: + +``` +curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{ + "model": "vibevoice", + "input":"Hello!" + }' | aplay +``` + ### Vall-E-X [VALL-E-X](https://github.com/Plachtaa/VALL-E-X) is an open source implementation of Microsoft's VALL-E X zero-shot TTS model. diff --git a/docs/content/getting-started/container-images.md b/docs/content/getting-started/container-images.md index a4e94d8b8..5f4db3929 100644 --- a/docs/content/getting-started/container-images.md +++ b/docs/content/getting-started/container-images.md @@ -70,6 +70,16 @@ Standard container images do not have pre-installed models. 
Use these if you wan {{% /tab %}} +{{% tab title="GPU Images CUDA 13" %}} + +| Description | Quay | Docker Hub | +| --- | --- |-------------------------------------------------------------| +| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-gpu-nvidia-cuda-13` | `localai/localai:master-gpu-nvidia-cuda-13` | +| Latest tag | `quay.io/go-skynet/local-ai:latest-gpu-nvidia-cuda-13` | `localai/localai:latest-gpu-nvidia-cuda-13` | +| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-gpu-nvidia-cuda-13` | `localai/localai:{{< version >}}-gpu-nvidia-cuda-13` | + +{{% /tab %}} + {{% tab title="Intel GPU" %}} | Description | Quay | Docker Hub | @@ -98,9 +108,9 @@ Standard container images do not have pre-installed models. Use these if you wan | Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-vulkan` | `localai/localai:{{< version >}}-vulkan` | {{% /tab %}} -{{% tab title="Nvidia Linux for tegra" %}} +{{% tab title="Nvidia Linux for tegra (CUDA 12)" %}} -These images are compatible with Nvidia ARM64 devices, such as the Jetson Nano, Jetson Xavier NX, and Jetson AGX Xavier. For more information, see the [Nvidia L4T guide]({{%relref "reference/nvidia-l4t" %}}). +These images are compatible with Nvidia ARM64 devices with CUDA 12, such as the Jetson Nano, Jetson Xavier NX, and Jetson AGX Orin. For more information, see the [Nvidia L4T guide]({{%relref "reference/nvidia-l4t" %}}). | Description | Quay | Docker Hub | | --- | --- |-------------------------------------------------------------| @@ -110,6 +120,18 @@ These images are compatible with Nvidia ARM64 devices, such as the Jetson Nano, {{% /tab %}} +{{% tab title="Nvidia Linux for tegra (CUDA 13)" %}} + +These images are compatible with Nvidia ARM64 devices with CUDA 13, such as the Nvidia DGX Spark. For more information, see the [Nvidia L4T guide]({{%relref "reference/nvidia-l4t" %}}). + +| Description | Quay | Docker Hub | +| --- | --- |-------------------------------------------------------------| +| Latest images from the branch (development) | `quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-cuda-13` | `localai/localai:master-nvidia-l4t-arm64-cuda-13` | +| Latest tag | `quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-cuda-13` | `localai/localai:latest-nvidia-l4t-arm64-cuda-13` | +| Versioned image | `quay.io/go-skynet/local-ai:{{< version >}}-nvidia-l4t-arm64-cuda-13` | `localai/localai:{{< version >}}-nvidia-l4t-arm64-cuda-13` | + +{{% /tab %}} + {{< /tabs >}} ## All-in-one images @@ -147,11 +169,13 @@ services: image: localai/localai:latest-aio-cpu # For a specific version: # image: localai/localai:{{< version >}}-aio-cpu - # For Nvidia GPUs decomment one of the following (cuda11 or cuda12): + # For Nvidia GPUs decomment one of the following (cuda11, cuda12, or cuda13): # image: localai/localai:{{< version >}}-aio-gpu-nvidia-cuda-11 # image: localai/localai:{{< version >}}-aio-gpu-nvidia-cuda-12 + # image: localai/localai:{{< version >}}-aio-gpu-nvidia-cuda-13 # image: localai/localai:latest-aio-gpu-nvidia-cuda-11 # image: localai/localai:latest-aio-gpu-nvidia-cuda-12 + # image: localai/localai:latest-aio-gpu-nvidia-cuda-13 healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"] interval: 1m @@ -203,6 +227,7 @@ docker run -p 8080:8080 --name local-ai -ti -v localai-models:/models localai/lo | Versioned image (e.g. 
for CPU) | `quay.io/go-skynet/local-ai:{{< version >}}-aio-cpu` | `localai/localai:{{< version >}}-aio-cpu` | | Latest images for Nvidia GPU (CUDA11) | `quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-11` | `localai/localai:latest-aio-gpu-nvidia-cuda-11` | | Latest images for Nvidia GPU (CUDA12) | `quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-12` | `localai/localai:latest-aio-gpu-nvidia-cuda-12` | +| Latest images for Nvidia GPU (CUDA13) | `quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-13` | `localai/localai:latest-aio-gpu-nvidia-cuda-13` | | Latest images for AMD GPU | `quay.io/go-skynet/local-ai:latest-aio-gpu-hipblas` | `localai/localai:latest-aio-gpu-hipblas` | | Latest images for Intel GPU | `quay.io/go-skynet/local-ai:latest-aio-gpu-intel` | `localai/localai:latest-aio-gpu-intel` | diff --git a/docs/content/installation/docker.md b/docs/content/installation/docker.md index 125968d28..1a3ea706c 100644 --- a/docs/content/installation/docker.md +++ b/docs/content/installation/docker.md @@ -58,6 +58,11 @@ docker run -ti --name local-ai -p 8080:8080 localai/localai:latest #### GPU Images +**NVIDIA CUDA 13:** +```bash +docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13 +``` + **NVIDIA CUDA 12:** ```bash docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12 @@ -84,10 +89,17 @@ docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan ``` **NVIDIA Jetson (L4T ARM64):** + +CUDA 12 (for Nvidia AGX Orin and similar platforms): ```bash docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64 ``` +CUDA 13 (for Nvidia DGX Spark): +```bash +docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13 +``` + ### All-in-One (AIO) Images **Recommended for beginners** - These images come pre-configured with models and backends, ready to use immediately. 
@@ -100,6 +112,11 @@ docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu #### GPU Images +**NVIDIA CUDA 13:** +```bash +docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13 +``` + **NVIDIA CUDA 12:** ```bash docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12 @@ -130,6 +147,7 @@ services: api: image: localai/localai:latest-aio-cpu # For GPU support, use one of: + # image: localai/localai:latest-aio-gpu-nvidia-cuda-13 # image: localai/localai:latest-aio-gpu-nvidia-cuda-12 # image: localai/localai:latest-aio-gpu-nvidia-cuda-11 # image: localai/localai:latest-aio-gpu-hipblas diff --git a/docs/content/reference/compatibility-table.md b/docs/content/reference/compatibility-table.md index 2511afcd5..b34b3d452 100644 --- a/docs/content/reference/compatibility-table.md +++ b/docs/content/reference/compatibility-table.md @@ -18,10 +18,10 @@ LocalAI will attempt to automatically load models which are not explicitly confi | Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration | |----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------| -| [llama.cpp]({{%relref "features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU | -| [vLLM](https://github.com/vllm-project/vllm) | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12, ROCm, Intel | -| [transformers](https://github.com/huggingface/transformers) | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12, ROCm, Intel, CPU | -| [exllama2](https://github.com/turboderp-org/exllamav2) | GPTQ | yes | GPT only | no | no | CUDA 12 | +| [llama.cpp]({{%relref "features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA 11/12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU | +| [vLLM](https://github.com/vllm-project/vllm) | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12/13, ROCm, Intel | +| [transformers](https://github.com/huggingface/transformers) | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12/13, ROCm, Intel, CPU | +| [exllama2](https://github.com/turboderp-org/exllamav2) | GPTQ | yes | GPT only | no | no | CUDA 12/13 | | [MLX](https://github.com/ml-explore/mlx-lm) | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) | | [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) | | [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A | @@ -30,47 +30,49 @@ LocalAI will attempt to automatically load models which are not explicitly confi | Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration | 
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------| -| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU | -| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel, CPU | +| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU | +| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel, CPU | | [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | CPU | -| [bark](https://github.com/suno-ai/bark) | bark | no | Audio generation | no | no | CUDA 12, ROCm, Intel | +| [bark](https://github.com/suno-ai/bark) | bark | no | Audio generation | no | no | CUDA 12/13, ROCm, Intel | | [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | Audio-Only | no | no | CUDA, Metal, CPU | -| [coqui](https://github.com/idiap/coqui-ai-TTS) | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12, ROCm, Intel, CPU | -| [kokoro](https://github.com/hexgrad/kokoro) | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12, ROCm, Intel, CPU | -| [chatterbox](https://github.com/resemble-ai/chatterbox) | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12, CPU | +| [coqui](https://github.com/idiap/coqui-ai-TTS) | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU | +| [kokoro](https://github.com/hexgrad/kokoro) | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU | +| [chatterbox](https://github.com/resemble-ai/chatterbox) | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12/13, CPU | | [kitten-tts](https://github.com/KittenML/KittenTTS) | Kitten TTS | no | Text-to-speech | no | no | CPU | | [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD | no | Voice Activity Detection | no | no | CPU | -| [neutts](https://github.com/neuphonic/neuttsair) | NeuTTSAir | no | Text-to-speech with voice cloning | no | no | CUDA 12, ROCm, CPU | +| [neutts](https://github.com/neuphonic/neuttsair) | NeuTTSAir | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, CPU | +| [vibevoice](https://github.com/microsoft/VibeVoice) | VibeVoice-Realtime | no | Real-time text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU | | [mlx-audio](https://github.com/Blaizzy/mlx-audio) | MLX | no | Text-tospeech | no | no | Metal (Apple Silicon) | ## Image & Video Generation | Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration | |----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------| -| [stablediffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12, 
Intel SYCL, Vulkan, CPU | -| [diffusers](https://github.com/huggingface/diffusers) | SD, various diffusion models,... | no | Image/Video generation | no | no | CUDA 11/12, ROCm, Intel, Metal, CPU | +| [stablediffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12/13, Intel SYCL, Vulkan, CPU | +| [diffusers](https://github.com/huggingface/diffusers) | SD, various diffusion models,... | no | Image/Video generation | no | no | CUDA 11/12/13, ROCm, Intel, Metal, CPU | | [transformers-musicgen](https://github.com/huggingface/transformers) | MusicGen | no | Audio generation | no | no | CUDA, CPU | ## Specialized AI Tasks | Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration | |----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------| -| [rfdetr](https://github.com/roboflow/rf-detr) | RF-DETR | no | Object Detection | no | no | CUDA 12, Intel, CPU | -| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CUDA 11/12, ROCm, Intel, CPU | +| [rfdetr](https://github.com/roboflow/rf-detr) | RF-DETR | no | Object Detection | no | no | CUDA 12/13, Intel, CPU | +| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CUDA 11/12/13, ROCm, Intel, CPU | | [local-store](https://github.com/mudler/LocalAI) | Vector database | no | Vector storage | yes | no | CPU | | [huggingface](https://huggingface.co/docs/hub/en/api) | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based | ## Acceleration Support Summary ### GPU Acceleration -- **NVIDIA CUDA**: CUDA 11.7, CUDA 12.0 support across most backends +- **NVIDIA CUDA**: CUDA 11.7, CUDA 12.0, CUDA 13.0 support across most backends - **AMD ROCm**: HIP-based acceleration for AMD GPUs - **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision) - **Vulkan**: Cross-platform GPU acceleration - **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+) ### Specialized Hardware -- **NVIDIA Jetson (L4T)**: ARM64 support for embedded AI +- **NVIDIA Jetson (L4T CUDA 12)**: ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier) +- **NVIDIA Jetson (L4T CUDA 13)**: ARM64 support for embedded AI (DGX Spark) - **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+ - **Darwin x86**: Intel Mac support diff --git a/docs/content/reference/nvidia-l4t.md b/docs/content/reference/nvidia-l4t.md index b019aa70c..9cc81c09b 100644 --- a/docs/content/reference/nvidia-l4t.md +++ b/docs/content/reference/nvidia-l4t.md @@ -5,16 +5,43 @@ title = "Running on Nvidia ARM64" weight = 27 +++ -LocalAI can be run on Nvidia ARM64 devices, such as the Jetson Nano, Jetson Xavier NX, and Jetson AGX Xavier. The following instructions will guide you through building the LocalAI container for Nvidia ARM64 devices. +LocalAI can be run on Nvidia ARM64 devices, such as the Jetson Nano, Jetson Xavier NX, Jetson AGX Orin, and Nvidia DGX Spark. The following instructions will guide you through building and using the LocalAI container for Nvidia ARM64 devices. 
+ +## Platform Compatibility + +- **CUDA 12 L4T images**: Compatible with Nvidia AGX Orin and similar platforms (Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier) +- **CUDA 13 L4T images**: Compatible with Nvidia DGX Spark ## Prerequisites - Docker engine installed (https://docs.docker.com/engine/install/ubuntu/) - Nvidia container toolkit installed (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-ap) +## Pre-built Images + +Pre-built images are available on quay.io and dockerhub: + +### CUDA 12 (for AGX Orin and similar platforms) + +```bash +docker pull quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64 +# or +docker pull localai/localai:latest-nvidia-l4t-arm64 +``` + +### CUDA 13 (for DGX Spark) + +```bash +docker pull quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-cuda-13 +# or +docker pull localai/localai:latest-nvidia-l4t-arm64-cuda-13 +``` + ## Build the container -Build the LocalAI container for Nvidia ARM64 devices using the following command: +If you need to build the container yourself, use the following commands: + +### CUDA 12 (for AGX Orin and similar platforms) ```bash git clone https://github.com/mudler/LocalAI @@ -24,18 +51,30 @@ cd LocalAI docker build --build-arg SKIP_DRIVERS=true --build-arg BUILD_TYPE=cublas --build-arg BASE_IMAGE=nvcr.io/nvidia/l4t-jetpack:r36.4.0 --build-arg IMAGE_TYPE=core -t quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core . ``` -Otherwise images are available on quay.io and dockerhub: +### CUDA 13 (for DGX Spark) ```bash -docker pull quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core +git clone https://github.com/mudler/LocalAI + +cd LocalAI + +docker build --build-arg SKIP_DRIVERS=false --build-arg BUILD_TYPE=cublas --build-arg CUDA_MAJOR_VERSION=13 --build-arg CUDA_MINOR_VERSION=0 --build-arg BASE_IMAGE=ubuntu:24.04 --build-arg IMAGE_TYPE=core -t quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-cuda-13-core . ``` ## Usage -Run the LocalAI container on Nvidia ARM64 devices using the following command, where `/data/models` is the directory containing the models: +Run the LocalAI container on Nvidia ARM64 devices using the following commands, where `/data/models` is the directory containing the models: + +### CUDA 12 (for AGX Orin and similar platforms) ```bash -docker run -e DEBUG=true -p 8080:8080 -v /data/models:/models -ti --restart=always --name local-ai --runtime nvidia --gpus all quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core +docker run -e DEBUG=true -p 8080:8080 -v /data/models:/models -ti --restart=always --name local-ai --runtime nvidia --gpus all quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64 +``` + +### CUDA 13 (for DGX Spark) + +```bash +docker run -e DEBUG=true -p 8080:8080 -v /data/models:/models -ti --restart=always --name local-ai --runtime nvidia --gpus all quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-cuda-13 ``` Note: `/data/models` is the directory containing the models. You can replace it with the directory containing your models.
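
For unattended setups, the `docker run` invocations above translate into a Docker Compose service in the usual way. The snippet below is a minimal sketch rather than an official example: it reuses the CUDA 13 L4T image tag and the `/data/models` volume shown above, and assumes the NVIDIA Container Toolkit is installed so that the `nvidia` runtime and GPU device reservations are available to Compose. On CUDA 12 devices such as the AGX Orin, use the `latest-nvidia-l4t-arm64` tag instead.

```yaml
# Minimal Compose sketch for the CUDA 13 L4T image.
# Assumes the NVIDIA Container Toolkit is installed and the nvidia runtime is registered.
services:
  local-ai:
    image: localai/localai:latest-nvidia-l4t-arm64-cuda-13
    restart: always
    runtime: nvidia            # same effect as --runtime nvidia above
    environment:
      - DEBUG=true
    ports:
      - "8080:8080"
    volumes:
      - /data/models:/models   # replace with the directory containing your models
    deploy:
      resources:
        reservations:
          devices:             # Compose equivalent of --gpus all
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start it with `docker compose up -d` and check readiness with `curl http://localhost:8080/readyz`, as in the AIO Compose example above.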