17 October 2025


Running non-quantized Hugging Face LLMs locally in Ollama.
Example: converting Hermes-4-14B into a quantized GGUF model (hermes4-14b.Q4_K_M.gguf).

This text uses Visual Studio Code (vscode) v1.105.1 and macOS Tahoe 26.0.1.

Log in to huggingface and create an access token.
Start the macOS Terminal.
Replace hf_your_token_here with the token you just created on huggingface:

~/> export HUGGING_FACE_HUB_TOKEN=hf_your_token_here
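
Once the Python packages further down are installed, you can check that the token is picked up (huggingface-cli comes with the huggingface_hub package; newer versions also ship an hf command that works the same way):

~/> huggingface-cli whoami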

~/> git clone https://github.com/ggerganov/llama.cpp

~/> cd llama.cpp

Open the directory in vscode with

~/llama.cpp> code .  

In vscode, click on any .py file.
Click in the bottom right corner where it says e.g. Python 3.13.7....



Click on +Create virtual environment in the popup list


Select venv  - last in the list



Select Custom in the list


Select the Python version you want to create the virtual environment from, e.g. 3.11.13
(don't select the very latest version, since many Python packages are not yet compatible with it)


Select the name venv and press enter


Select Skip package installation

Done. You now have a virtual Python environment.
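
If you prefer the terminal over the vscode popups, roughly the same environment can be created by hand (a sketch, assuming Python 3.11 is installed, e.g. via brew install python@3.11):

~/llama.cpp> python3.11 -m venv .venv
~/llama.cpp> source .venv/bin/activate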

In vscode, select New Terminal from the Terminal menu.

You will now see (.venv) in the terminal prompt.

This means that Python packages installed with e.g. pip install only go into the .venv environment and do not affect other Python environments.

In vscode terminal:


(.venv) ~/llama.cpp> python3 -m pip install transformers sentencepiece protobuf safetensors huggingface_hub mistral-common torch
(.venv) ~/llama.cpp> mkdir build
(.venv) ~/llama.cpp> cd build
(.venv) ~/llama.cpp/build> brew install cmake
(.venv) ~/llama.cpp/build> cmake ..
(.venv) ~/llama.cpp/build> cmake --build . --config Release
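
If the build succeeded, the tools used below should be in build/bin (the exact set of binaries depends on the llama.cpp version):

(.venv) ~/llama.cpp/build> ls bin/llama-quantize bin/llama-cli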

Download Hermes-4-14B from Hugging Face and create a quantized version:

(.venv) ~/llama.cpp/build> cd ../../
(.venv) ~/>  brew install git-lfs
(.venv) ~/> git lfs install
(.venv) ~/> git clone https://huggingface.co/NousResearch/Hermes-4-14B
(.venv) ~/> cd Hermes-4-14B
(.venv) ~/Hermes-4-14B> git lfs pull
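
A quick sanity check that git lfs pull really replaced the pointer files with the multi-gigabyte weight shards (file names vary between model repos):

(.venv) ~/Hermes-4-14B> ls -lh *.safetensors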

(.venv) ~/Hermes-4-14B> cd ../llama.cpp/
(.venv) ~/llama.cpp> python3 convert_hf_to_gguf.py ../Hermes-4-14B --outfile hermes4-14b-fp16.gguf

(.venv) ~/llama.cpp> ./build/bin/llama-quantize hermes4-14b-fp16.gguf hermes4-14b.Q4_K_M.gguf Q4_K_M
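
Before importing into Ollama, the quantized file can be smoke-tested directly with llama.cpp's own CLI (flags may differ slightly between llama.cpp versions):

(.venv) ~/llama.cpp> ./build/bin/llama-cli -m hermes4-14b.Q4_K_M.gguf -p "Hello, who are you?" -n 64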

(.venv) ~/llama.cpp> vi Modelfile

FROM ./hermes4-14b.Q4_K_M.gguf
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""


(.venv) ~/llama.cpp> ollama create hermes4local -f Modelfile
(.venv) ~/llama.cpp> ollama run hermes4local


19 February 2025

Setting up sgpt on macOS to run with ollama locally

(a command-line productivity tool powered by AI large language models (LLMs))


> pip install "shell-gpt[litellm]"
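
To confirm that the install succeeded and the sgpt command is on your PATH:

> sgpt --help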


Create this file: ~/.config/shell_gpt/.sgptrc


> vi ~/.config/shell_gpt/.sgptrc


Add the lines below:

CHAT_CACHE_PATH=/tmp/chat_cache
CACHE_PATH=/tmp/cache
CHAT_CACHE_LENGTH=100
CACHE_LENGTH=100
REQUEST_TIMEOUT=60
DEFAULT_MODEL=ollama/mistral:7b-instruct
DEFAULT_COLOR=magenta
ROLE_STORAGE_PATH=~/.config/shell_gpt/roles
DEFAULT_EXECUTE_SHELL_CMD=false
DISABLE_STREAMING=false
CODE_THEME=dracula
OPENAI_FUNCTIONS_PATH=~/.config/shell_gpt/functions
OPENAI_USE_FUNCTIONS=false
SHOW_FUNCTIONS_OUTPUT=false
API_BASE_URL=http://127.0.0.1:11434
PRETTIFY_MARKDOWN=true
USE_LITELLM=true
SHELL_INTERACTION=true
OS_NAME=auto
SHELL_NAME=auto


Download a model with ollama, e.g. mistral:7b-instruct, which is the DEFAULT_MODEL (see the config file above).

> ollama pull mistral:7b-instruct


# list downloaded ollama models

> ollama list


Example of output:


NAME                         ID              SIZE      MODIFIED
mistral:7b-instruct          f974a74358d6    4.1 GB    About an hour ago
llama3.1:8b-instruct-q8_0    b158ded76fa0    8.5 GB    10 days ago
deepseek-r1:14b              ea35dfe18182    9.0 GB    4 weeks ago
llama3.2-vision:latest       085a1fdae525    7.9 GB    4 weeks ago
llama3.2:latest              a80c4f17acd5    2.0 GB    4 weeks ago
phi4:latest                  ac896e5b8b34    9.1 GB    4 weeks ago


# Start ollama server

> ollama serve
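
In another terminal you can check that the server is reachable on the address used by API_BASE_URL in the config above (ollama should answer a plain GET on the root path with a short "running" message):

> curl http://127.0.0.1:11434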


# Change the model name to one of the names above, e.g. phi4:latest, if you don't want to use the DEFAULT_MODEL (ollama/mistral:7b-instruct)

> sgpt --model ollama/phi4:latest  "What is the fibonacci sequence"
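
sgpt can also suggest shell commands with the --shell flag (documented in the shell_gpt README); it then asks whether to execute, describe or abort the suggested command:

> sgpt --model ollama/phi4:latest --shell "list all png files in the current directory"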


More info and docs:   https://github.com/TheR1D/shell_gpt 

2 February 2025

Transformers (how LLMs work) explained visually | DL5 (a 3Blue1Brown video)

Also see all the other videos about AI and LLMs from 3Blue1Brown.