Set Up an LLM Dev Environment on Windows 11
Welcome to this follow-up to my previous article (you can read it here).
In this article, we’ll explore how to set up an LLM development environment on a Windows 11 PC with an NVIDIA GPU (mine is a 3080 Ti) using NVIDIA Docker and WSL 2. All the steps are based on the official guide at https://docs.nvidia.com/cuda/wsl-user-guide/index.html.
Initial Setup
1. Install NVIDIA GPU Drivers
Visit NVIDIA’s driver download page and select the driver corresponding to your GPU model. This is the only driver you need to install during this entire setup; in particular, do not install a separate Linux GPU driver inside WSL.
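Once the driver is installed, you can verify it directly from Windows; the driver ships a command-line tool, nvidia-smi, that works in PowerShell or Command Prompt:
nvidia-smi
It should list your GPU model and driver version.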
2. Install WSL 2
On Windows 11, WSL support is built in. Open Windows Terminal, Command Prompt, or PowerShell, install your Linux distribution, and make sure WSL is up to date:
wsl --install
wsl --update
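To confirm that your distribution runs under WSL 2 (required for GPU support), list the installed distributions and their versions:
wsl -l -v
The VERSION column should show 2 for the distribution you plan to use.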
Installing CUDA on WSL
1. Enter WSL
Use the following command to enter WSL:
wsl
2. Prepare for CUDA Installation
Remove the outdated GPG key with the command:
sudo apt-key del 7fa2af80
Set up the CUDA repository for Ubuntu:
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-wsl-ubuntu-12-4-local_12.4.1-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-4-local_12.4.1-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
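After the toolkit installs, it usually lands under /usr/local/cuda-12.4 (adjust the path if you installed a different version). You can add its bin directory to your PATH and confirm the compiler version:
export PATH=/usr/local/cuda-12.4/bin:$PATH
nvcc --version
nvcc should report release 12.4.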
3. Verify the Installation
Check your NVIDIA driver and CUDA version with:
nvidia-smi
It should print a table showing your GPU, the driver version, and the CUDA version the driver supports.
To verify that Docker is installed on WSL, enter:
docker --version
It should print the installed Docker version.
If Docker isn’t installed, follow the guide here.
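Before moving on, it is worth checking that containers can actually see the GPU. A quick smoke test (the nvidia/cuda image tag below is just an example; any recent CUDA base image should work):
sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
If this prints the same GPU table as nvidia-smi on the host, GPU passthrough into containers is working.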
Setting Up Jupyter Notebook
1. Start Jupyter via Docker
Here I used the TensorFlow Docker image tensorflow/tensorflow:2.14.0-gpu-jupyter, because it was the only version that supported the CUDA release installed on my machine (12.0). Change the version tag to match your CUDA version, then run:
sudo docker run -v /mnt/<path-to-your-code>:/tf/projects --gpus all -p 8888:8888 tensorflow/tensorflow:2.14.0-gpu-jupyter
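In this command, -v mounts your Windows code directory (under /mnt) into the container at /tf/projects, --gpus all exposes the GPU to the container, and -p 8888:8888 publishes Jupyter’s port to the host.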
After the image downloads and the container starts, it prints a URL (including an access token) for Jupyter Notebook; open that URL in your browser.
2. Install Dependencies
Install the required packages from Jupyter Notebook code cells:
! pip install -q -U torch
! pip install -q -U transformers
! pip install -q -U accelerate
Then restart the kernel so the newly installed packages are picked up.
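Before loading a model, it is worth confirming that PyTorch can actually see the GPU from inside the container. A minimal check from a notebook cell:
import torch
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # should print your GPU, e.g. a 3080 Ti
If this prints False, the container was most likely started without --gpus all.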
Run an LLM from Hugging Face
Here I used Microsoft’s Phi-3-mini-4k-instruct, which is small and does not require authentication. The code below comes from its Hugging Face model card: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Fix the seed for reproducible generation
torch.random.manual_seed(0)

# Load the model onto the GPU with an automatically chosen dtype
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# A short chat history; the model will answer the last user message
messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Deterministic (greedy) decoding, up to 500 new tokens
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
You will get the model’s answer to the final question (solving 2x + 3 = 7) as the response, and while it generates you can see GPU memory usage climb in nvidia-smi, confirming that the code used the GPU to generate the response.
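You can also confirm this from the notebook itself; a small sketch using PyTorch’s memory counters:
# Rough check that the model weights actually occupy GPU memory
print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"GPU memory reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")
Several gigabytes allocated here indicates the Phi-3 weights are resident on the GPU.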
Summary
This setup lets you use your GPU efficiently when running large language models on Windows 11, providing a powerful platform for your machine learning work. I cannot guarantee it will work 100% for you; the most reliable way to experiment with LLMs is still native Linux or the cloud (Google Colab, for example).