How to run an Ollama server for large language models

Tags: Ollama, LLM

Follow these steps to run an Ollama server and set up a conda environment to use it with Python.

Initial setup

Perform these steps on the head node.

  1. Download and install the Ollama server
    mkdir ~/ollama
    cd ~/ollama
    curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
    tar xvf ollama-linux-amd64.tgz
  2. Set up a conda environment and install the Ollama Python library (a quick verification sketch follows this list). If you have not yet set up Anaconda for the first time, please follow the steps outlined here: Initialize Anaconda
    conda create --name ollama-test python=3.12 anaconda
    conda activate ollama-test
    pip install ollama
    conda deactivate
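
To confirm the install, you can activate the new environment and run a quick check. This is a minimal sketch; it only verifies that the ollama Python package is importable and prints its installed version.
    # Run inside the activated environment (conda activate ollama-test),
    # for example by saving this as check_ollama.py and running "python check_ollama.py".
    import importlib.metadata
    import ollama  # import succeeds only if "pip install ollama" worked
    print("ollama package version:", importlib.metadata.version("ollama"))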

At this point, everything is ready to use. Each time you need to use it, first initialize everything as described below.

Environment Initialization

  1. Start a Slurm job
    srun --time=1-00:00:00 --partition=gpu --ntasks=1 --gres=gpu:1 --x11 --pty /bin/bash -i
  2. Start the Ollama server (a quick health-check sketch follows this list)
    cd ~/ollama
    nohup bin/ollama serve > ollama.log 2>&1 &
  3. Pull any models you want to use, for example:
    bin/ollama pull llama3.2
  4. Activate the previously created conda environment
    conda activate ollama-test

Now everything is ready for you to use.
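
Before running any client code, you can confirm that the server is answering requests. A minimal health-check sketch, assuming the server is listening on Ollama's default address http://localhost:11434 on the node where the Slurm job is running:
    # Quick health check against the local Ollama server.
    import urllib.request

    with urllib.request.urlopen("http://localhost:11434") as resp:
        # The server answers with a short status message when it is up.
        print(resp.status, resp.read().decode())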

You can look at the ollama.log file to watch what the server is doing, and you can write Python code using the ollama Python library installed during setup to interact with the server.
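
For example, here is a minimal sketch of a chat request with the ollama Python library; it assumes the llama3.2 model pulled during initialization and the default local server address:
    # Send a single chat message to the local Ollama server and print the reply.
    import ollama

    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": "In one sentence, what is Slurm?"}],
    )
    print(response["message"]["content"])
If your code runs on a different node than the server, the library also provides an ollama.Client class whose host argument can point at the server node instead of the default localhost address.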

Because the server was started inside a Slurm job running on a GPU partition, it should be able to find and use the GPU; the startup messages in the ollama.log file will indicate whether the GPU was detected.