How to run an Ollama server for large language models

Tags: Ollama, LLM

Follow these steps to run an Ollama server and set up a conda environment to use it with Python.

Initial setup

Perform these steps on the head node.

  1. Download and install the Ollama server
    mkdir ~/ollama
    cd ~/ollama
    curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
    tar xvf ollama-linux-amd64.tgz
  2. Set up a conda environment and install the Ollama Python library (a quick verification sketch follows this list). If you have not yet set up Anaconda for the first time, please follow the steps outlined here: Initialize Anaconda
    conda create --name ollama-test python=3.12 anaconda
    conda activate ollama-test
    pip install ollama
    conda deactivate
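
To confirm the install, you can activate the new environment and run a quick check. This is a minimal sketch; it only verifies that the ollama Python package is importable and prints its installed version.
    # Run inside the activated environment (conda activate ollama-test),
    # for example by saving this as check_ollama.py and running "python check_ollama.py".
    import importlib.metadata
    import ollama  # import succeeds only if "pip install ollama" worked
    print("ollama package version:", importlib.metadata.version("ollama"))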

At this point, everything is ready to use. Each time you need to use it, first initialize everything as described below.

Environment Initialization

  1. Start a Slurm job
    srun --time=1-00:00:00 --partition=gpu --ntasks=1 --gres=gpu:1 --x11 --pty /bin/bash -i
  2. Start the Ollama server (a quick health-check sketch follows this list)
    cd ~/ollama
    nohup bin/ollama serve > ollama.log 2>&1 &
  3. Pull any models you want to use, for example:
    bin/ollama pull llama3.2
  4. Activate the previously created conda environment
    conda activate ollama-test

Now everything is ready for you to use.
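
Before running any client code, you can confirm that the server is answering requests. A minimal health-check sketch, assuming the server is listening on Ollama's default address http://localhost:11434 on the node where the Slurm job is running:
    # Quick health check against the local Ollama server.
    import urllib.request

    with urllib.request.urlopen("http://localhost:11434") as resp:
        # The server answers with a short status message when it is up.
        print(resp.status, resp.read().decode())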

You can look at the ollama.log file to watch what the server is doing, and you can write Python code using the ollama Python library installed during setup to interact with the server.
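
For example, here is a minimal sketch of a chat request with the ollama Python library; it assumes the llama3.2 model pulled during initialization and the default local server address:
    # Send a single chat message to the local Ollama server and print the reply.
    import ollama

    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": "In one sentence, what is Slurm?"}],
    )
    print(response["message"]["content"])
If your code runs on a different node than the server, the library also provides an ollama.Client class whose host argument can point at the server node instead of the default localhost address.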

Because the server was started inside a Slurm job running on a GPU partition, it should be able to find and use the GPU; the startup messages in the ollama.log file will indicate whether the GPU was detected.