Running non-quantized Hugging Face LLMs locally in Ollama.
Example: converting Hermes-4-14B into a quantized GGUF model (hermes4-14b.Q4_K_M.gguf).
This guide uses Visual Studio Code (VS Code) v1.105.1 and macOS Tahoe 26.0.1.
Log in at huggingface.co and create an access token.
Start the macOS Terminal.
Replace hf_your_token_here with your newly created Hugging Face token:
~/> export HUGGING_FACE_HUB_TOKEN=hf_your_token_here
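A quick check that the variable is set in the current shell. Note that newer versions of huggingface_hub primarily read HF_TOKEN, so exporting that name as well is a harmless safeguard:
~/> echo $HUGGING_FACE_HUB_TOKEN
~/> export HF_TOKEN=$HUGGING_FACE_HUB_TOKEN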
~/> git clone https://github.com/ggerganov/llama.cpp
~/> cd llama.cpp
Open the directory in VS Code with
~/llama.cpp> code .
In VS Code, click on any .py file.
Click the Python version shown in the lower right status bar (e.g. Python 3.13.7).
Click + Create Virtual Environment in the popup list.
Select Venv (last in the list).
Select Custom in the list.
Select the Python version you want the virtual environment to use, e.g. 3.11.13
(don't pick the very latest version; many Python packages are not yet compatible with it).
Enter .venv as the environment name and press Enter.
Select Skip package installation.
Done. You now have a virtual Python environment.
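If you prefer the terminal over the GUI, the same environment can be created directly (assuming a suitable Python such as 3.11 is installed, e.g. via Homebrew):
~/llama.cpp> python3.11 -m venv .venv
~/llama.cpp> source .venv/bin/activate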
In VS Code, select New Terminal from the Terminal menu.
You will now see (.venv) at the start of the terminal prompt.
This means that packages installed with e.g. pip install go only into the .venv environment and do not affect other Python environments on the machine.
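A quick sanity check that the terminal really uses the virtual environment; the printed path should point into .venv (with <you> standing in for your user name):
(.venv) ~/llama.cpp> which python3
/Users/<you>/llama.cpp/.venv/bin/python3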
In the VS Code terminal:
(.venv) ~/llama.cpp> python3 -m pip install transformers sentencepiece protobuf safetensors huggingface_hub mistral-common torch
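Before building, you can optionally confirm that the heavier packages import cleanly:
(.venv) ~/llama.cpp> python3 -c "import torch, transformers; print(torch.__version__, transformers.__version__)"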
(.venv) ~/llama.cpp> mkdir build
(.venv) ~/llama.cpp> cd build
(.venv) ~/llama.cpp/build> brew install cmake
(.venv) ~/llama.cpp/build> cmake ..
(.venv) ~/llama.cpp/build> cmake --build . --config Release
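The build places its tools under build/bin; a quick check that the quantization binary used in a later step actually got built:
(.venv) ~/llama.cpp/build> ls bin/llama-quantize
bin/llama-quantize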
Download Hermes-4-14B from Hugging Face and create the quantized version:
(.venv) ~/llama.cpp/build> cd ../../
(.venv) ~/> brew install git-lfs
(.venv) ~/> git lfs install
(.venv) ~/> git clone https://huggingface.co/NousResearch/Hermes-4-14B
(.venv) ~/> cd Hermes-4-14B
(.venv) ~/Hermes-4-14B> git lfs pull
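Assuming the weights ship as .safetensors shards (as most current Hugging Face models do), verify that they were really downloaded and are not just unresolved LFS pointer stubs; pointers are ~130 bytes of text, real shards several GB each:
(.venv) ~/Hermes-4-14B> ls -lh *.safetensors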
(.venv) ~/Hermes-4-14B> cd ../llama.cpp/
(.venv) ~/llama.cpp> python3 convert_hf_to_gguf.py ../Hermes-4-14B --outfile hermes4-14b-fp16.gguf
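fp16 stores about 2 bytes per parameter, so for a 14B model the resulting GGUF should be on the order of 28 GB:
(.venv) ~/llama.cpp> ls -lh hermes4-14b-fp16.gguf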
(.venv) ~/llama.cpp> ./build/bin/llama-quantize hermes4-14b-fp16.gguf hermes4-14b.Q4_K_M.gguf Q4_K_M
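Q4_K_M averages a bit under 5 bits per weight, so the quantized file should come out at roughly a third of the fp16 size:
(.venv) ~/llama.cpp> ls -lh hermes4-14b*.gguf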
Create a Modelfile with the following content:
(.venv) ~/llama.cpp> vi Modelfile
FROM ./hermes4-14b.Q4_K_M.gguf
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
(.venv) ~/llama.cpp> ollama create hermes4local -f Modelfile
(.venv) ~/llama.cpp> ollama run hermes4local
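ollama run also accepts a one-shot prompt as an argument, which makes for a quick smoke test:
(.venv) ~/llama.cpp> ollama run hermes4local "Explain in one sentence what Q4_K_M quantization does."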