17 October 2025


Running non-quantized Hugging Face LLMs locally in Ollama — 
example: converting Hermes-4-14B into a quantized GGUF model (hermes4-14b.Q4_K_M.gguf).

This guide uses Visual Studio Code v1.105.1 (vscode) and macOS Tahoe 26.0.1.

Log in at huggingface.co and create an access token.
Start the macOS Terminal.
Replace hf_your_token_here below with your newly created Hugging Face token.

~/> export HUGGING_FACE_HUB_TOKEN=hf_your_token_here
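
(Newer versions of the huggingface_hub library read the HF_TOKEN variable instead of the legacy HUGGING_FACE_HUB_TOKEN; exporting both is a harmless safety net.)

~/> export HF_TOKEN=hf_your_token_here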

~/> git clone https://github.com/ggerganov/llama.cpp

~/> cd llama.cpp

Open the directory in vscode with

~/llama.cpp> code .  

In vscode, click on any .py file.
In the bottom-right corner, click where it says e.g. Python 3.13.7...



Click on + Create Virtual Environment in the popup list


Select Venv (last in the list)



Select Custom in the list


Select the Python version you want to use for the virtual environment, e.g. 3.11.13
(don't select the very latest version, because many Python packages are not yet compatible with it)


Select the name venv and press Enter


Select Skip package installation

Done. You now have a virtual Python environment.

In vscode, select New Terminal from the Terminal menu

You will now see (.venv) in the terminal prompt



This means that Python packages you install with e.g. pip install are installed only in the .venv environment and do not affect other Python environments.
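
An optional sanity check that the terminal really uses the interpreter inside .venv (the exact path will differ on your machine):

(.venv) ~/llama.cpp> which python3

This should print a path ending in llama.cpp/.venv/bin/python3.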

In vscode terminal:


(.venv) ~/llama.cpp> python3 -m pip install transformers sentencepiece protobuf safetensors huggingface_hub mistral-common torch
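
An optional import check before building (the version numbers printed will vary):

(.venv) ~/llama.cpp> python3 -c "import torch, transformers; print(torch.__version__, transformers.__version__)"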
(.venv) ~/llama.cpp> mkdir build
(.venv) ~/llama.cpp> cd build
(.venv) ~/llama.cpp/build> brew install cmake
(.venv) ~/llama.cpp/build> cmake ..
(.venv) ~/llama.cpp/build> cmake --build . --config Release
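
When the build finishes, the binaries used later in this guide land in build/bin. A quick check:

(.venv) ~/llama.cpp/build> ls bin/ | grep -E "llama-(quantize|cli)"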

Download Hermes-4-14B from Hugging Face and create a quantized version:

(.venv) ~/llama.cpp/build> cd ../../
(.venv) ~/>  brew install git-lfs
(.venv) ~/> git lfs install
(.venv) ~/> git clone https://huggingface.co/NousResearch/Hermes-4-14B
(.venv) ~/> cd Hermes-4-14B
(.venv) ~/Hermes-4-14B> git lfs pull
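
The weights are multi-gigabyte safetensors files. To verify that git lfs pull fetched the real files rather than small LFS pointer files (a few hundred bytes each), check the sizes:

(.venv) ~/Hermes-4-14B> ls -lh *.safetensors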

(.venv) ~/Hermes-4-14B> cd ../llama.cpp/
(.venv) ~/llama.cpp> python3 convert_hf_to_gguf.py ../Hermes-4-14B --outfile hermes4-14b-fp16.gguf

(.venv) ~/llama.cpp> ./build/bin/llama-quantize hermes4-14b-fp16.gguf hermes4-14b.Q4_K_M.gguf Q4_K_M
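
Before importing into Ollama, you can optionally smoke-test the quantized file with the llama-cli binary built earlier (the prompt is just an example):

(.venv) ~/llama.cpp> ./build/bin/llama-cli -m hermes4-14b.Q4_K_M.gguf -p "Say hello in one sentence." -n 64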

(.venv) ~/llama.cpp> vi Modelfile

FROM ./hermes4-14b.Q4_K_M.gguf
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""


(.venv) ~/llama.cpp> ollama create hermes4local -f Modelfile
(.venv) ~/llama.cpp> ollama run hermes4local
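
You can also pass a one-shot prompt directly on the command line (the prompt is just an example):

(.venv) ~/llama.cpp> ollama run hermes4local "Explain GGUF quantization in two sentences."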