Ollama: get up and running with large language models, locally.
This chart deploys Ollama on a Kubernetes cluster.
- Kubernetes >= 1.16.0-0 for CPU only
- Kubernetes >= 1.26.0-0 for stable GPU support (NVIDIA and AMD)
- Note: not all GPUs are currently supported by Ollama (especially AMD)
To install the `ollama` chart in the `ollama` namespace:
```
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm install ollama ollama-helm/ollama --namespace ollama
```
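Chart defaults can be overridden at install time with a values file. A minimal sketch (the overrides below are illustrative; the keys come from the values table at the end of this document):

```
# values.yaml -- illustrative overrides; adjust to your environment
persistentVolume:
  enabled: true
  size: 50Gi       # example size; the chart default is 30Gi
nodeSelector: {}   # optionally constrain scheduling with node labels
```

Pass the file with `--values values.yaml` on `helm install`, as in the upgrade example below.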
First, read the Ollama release notes to make sure there are no backwards-incompatible changes. Make adjustments to your values as needed, then run `helm upgrade`:
```
# -- This pulls the latest version of the ollama chart from the repo.
helm repo update
helm upgrade ollama ollama-helm/ollama --namespace ollama --values values.yaml
```
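If you manage the Ollama data PVC yourself, you can keep it across upgrades by pointing the chart at it. A minimal sketch using the documented `persistentVolume.existingClaim` key (the claim name `ollama-data` is hypothetical):

```
persistentVolume:
  enabled: true
  # name of a pre-created, ready PVC; when set, the chart will not create its own
  existingClaim: "ollama-data"
```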
To uninstall/delete the `ollama` deployment in the `ollama` namespace:
```
helm delete ollama --namespace ollama
```
Substitute your values if they differ from the examples. See `helm delete --help` for a full reference on `delete` parameters and flags.
Example: enable GPU integration and pull two models at container startup:

```
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true
    # -- GPU type: 'nvidia' or 'amd'
    type: 'nvidia'
    # -- Specify the number of GPUs
    number: 2
  # -- List of models to pull at container startup
  models:
    - mistral
    - llama2
```
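For AMD GPUs, the same block can be switched to `type: 'amd'`; as noted in the values table, this appends a `rocm` suffix to the image tag when `image.tag` is not overridden, since the AMD and CPU/CUDA variants are different images. A sketch:

```
ollama:
  gpu:
    enabled: true
    type: 'amd'   # selects the ROCm image variant when image.tag is left empty
    number: 1
```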
Example: pull a single model at container startup:

```
ollama:
  models:
    - llama2
```
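If models must be pulled from an insecure registry at startup, the chart also documents an `ollama.insecure` flag; a minimal sketch:

```
ollama:
  # -- Add insecure flag for pulling at container startup
  insecure: true
  models:
    - llama2
```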
Example: expose Ollama through an Ingress:

```
ingress:
  enabled: true
  hosts:
    - host: ollama.domain.lan
      paths:
        - path: /
          pathType: Prefix
```

The Ollama API is then reachable at `ollama.domain.lan`.
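To serve the API over TLS, the documented `ingress.tls` and `ingress.className` keys can be combined with the host above. A sketch assuming an nginx IngressClass and a pre-existing TLS secret named `ollama-tls` (both hypothetical):

```
ingress:
  enabled: true
  className: "nginx"           # example IngressClass; use the one available in your cluster
  hosts:
    - host: ollama.domain.lan
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: ollama-tls   # hypothetical secret holding the certificate
      hosts:
        - ollama.domain.lan
```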
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| affinity | object | `{}` | Affinity for pod assignment |
| autoscaling.enabled | bool | `false` | Enable autoscaling |
| autoscaling.maxReplicas | int | `100` | Number of maximum replicas |
| autoscaling.minReplicas | int | `1` | Number of minimum replicas |
| autoscaling.targetCPUUtilizationPercentage | int | `80` | Target CPU utilization percentage for autoscaling |
| extraArgs | list | `[]` | Additional arguments on the output Deployment definition |
| extraEnv | list | `[]` | Additional environment variables on the output Deployment definition |
| fullnameOverride | string | `""` | String to fully override template |
| image.pullPolicy | string | `"IfNotPresent"` | Docker pull policy |
| image.repository | string | `"ollama/ollama"` | Docker image repository |
| image.tag | string | `""` | Docker image tag; overrides the image tag whose default is the chart appVersion |
| imagePullSecrets | list | `[]` | Docker registry secret names as an array |
| ingress.annotations | object | `{}` | Additional annotations for the Ingress resource |
| ingress.className | string | `""` | IngressClass that will be used to implement the Ingress (Kubernetes 1.18+) |
| ingress.enabled | bool | `false` | Enable ingress controller resource |
| ingress.hosts[0].host | string | `"ollama.local"` |  |
| ingress.hosts[0].paths[0].path | string | `"/"` |  |
| ingress.hosts[0].paths[0].pathType | string | `"Prefix"` |  |
| ingress.tls | list | `[]` | The TLS configuration for hostnames to be covered by this Ingress record |
| livenessProbe.enabled | bool | `true` | Enable livenessProbe |
| livenessProbe.failureThreshold | int | `6` | Failure threshold for livenessProbe |
| livenessProbe.initialDelaySeconds | int | `60` | Initial delay seconds for livenessProbe |
| livenessProbe.path | string | `"/"` | Request path for livenessProbe |
| livenessProbe.periodSeconds | int | `10` | Period seconds for livenessProbe |
| livenessProbe.successThreshold | int | `1` | Success threshold for livenessProbe |
| livenessProbe.timeoutSeconds | int | `5` | Timeout seconds for livenessProbe |
| nameOverride | string | `""` | String to partially override template (will maintain the release name) |
| nodeSelector | object | `{}` | Node labels for pod assignment |
| ollama.gpu.enabled | bool | `false` | Enable GPU integration |
| ollama.gpu.number | int | `1` | Specify the number of GPUs |
| ollama.gpu.type | string | `"nvidia"` | GPU type: 'nvidia' or 'amd'. If 'ollama.gpu.enabled' is set, the default value is 'nvidia'. If set to 'amd', a 'rocm' suffix is added to the image tag when 'image.tag' is not overridden, because the AMD and CPU/CUDA variants are different images |
| ollama.insecure | bool | `false` | Add insecure flag for pulling at container startup |
| ollama.models | object | `{}` | List of models to pull at container startup. The more you add, the longer the container will take to start if the models are not already present. Example: `models: [llama2, mistral]` |
| persistentVolume.accessModes | list | `["ReadWriteOnce"]` | Ollama server data Persistent Volume access modes. Must match those of the existing PV or dynamic provisioner. Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ |
| persistentVolume.annotations | object | `{}` | Ollama server data Persistent Volume annotations |
| persistentVolume.enabled | bool | `true` | Enable persistence using PVC |
| persistentVolume.existingClaim | string | `""` | If you'd like to bring your own PVC for persisting Ollama state, pass the name of the created and ready PVC here. If set, this chart will not create the default PVC. Requires server.persistentVolume.enabled: true |
| persistentVolume.size | string | `"30Gi"` | Ollama server data Persistent Volume size |
| persistentVolume.storageClass | string | `""` | Ollama server data Persistent Volume Storage Class. If defined, storageClassName: