KoboldCPP
KoboldCPP’s design goals and community culture lean more toward open-ended play than strict, utilitarian chat. Consequently, it doesn’t default to a “you are a helpful assistant”-style system instruction the way many other UIs do. This makes it a great choice for creative writing, role-playing, and exploratory conversations. It also has a very lightweight GUI that is easy to use.
Step 1: Download KoboldCPP
Go to the KoboldCPP GitHub releases page. At the top of the page should be the latest release (1.99.3 at the time of this writing). Scroll down the page just a bit, and you’ll see the list of Assets for the latest release, which should have packages for your operating system. If you scroll down too far you’ll start to get into older releases.
For Windows, download koboldcpp.exe. If you have an older computer or lack a usable graphics card, there are “oldpc” and “nocuda” builds as well (“nocuda” essentially means “no NVIDIA graphics card”).

For modern Macs (with Apple Silicon M chips), download koboldcpp-mac-arm64. Older Intel-chip Macs unfortunately are not supported.

For Linux, download the latest koboldcpp-linux-x64 binary. As with Windows, “oldpc” and “nocuda” options are available.
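If you prefer the terminal, the download can also be scripted. A minimal sketch for Linux, assuming the version number mentioned above and the usual GitHub release URL layout (check the releases page for the exact tag and asset name before running):

```shell
# Version from the time of this writing; adjust to the latest release.
VERSION=1.99.3

# Fetch the Linux binary from the KoboldCPP GitHub releases page.
curl -L -o koboldcpp-linux-x64 \
  "https://github.com/LostRuins/koboldcpp/releases/download/v$VERSION/koboldcpp-linux-x64"
```

The same pattern works for the Mac binary by swapping in koboldcpp-mac-arm64.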
Step 2: Prepare the Binary
Windows: No setup required; just double-click the .exe file.

Mac: Open Terminal, navigate to the download location, and run:

chmod +x koboldcpp-mac-arm64

Linux: In the terminal, navigate to the download directory and run:

chmod +x koboldcpp-linux-x64

Step 3: Run KoboldCPP
Windows: Double-click the .exe file to launch the application.

Mac/Linux: In Terminal, run:

./koboldcpp-mac-arm64   # Mac
./koboldcpp-linux-x64   # Linux

If the app is blocked on Mac, allow it from “Privacy & Security” settings, then reopen it.
Step 4: Download an LLM Model
Visit sites hosting GGUF or GGML models (e.g., Hugging Face).
Download a model compatible with your RAM (7B for 8GB+, 13B for 16GB+).
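As a concrete sketch, GGUF files can be fetched from Hugging Face with the huggingface-cli tool. The repository and file names below are illustrative examples only, not recommendations; substitute a model sized for your RAM:

```shell
# One-time setup: install the Hugging Face Hub CLI.
pip install huggingface_hub

# Download a single quantized GGUF file into the current directory.
# Repo and file names are examples; browse Hugging Face for alternatives.
huggingface-cli download TheBloke/Llama-2-7B-Chat-GGUF \
  llama-2-7b-chat.Q4_K_M.gguf --local-dir .
```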
Step 5: Load the Model in KoboldCPP
When KoboldCPP starts, either select your downloaded model via the GUI or provide the file path in the CLI prompt.
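For the CLI route, the model path can be passed directly at launch. A sketch assuming the Linux binary and KoboldCPP’s usual flag names (run the binary with --help to confirm the flags on your version):

```shell
# Load a model and start the local web UI in one step.
# The .gguf path is a placeholder for whatever model you downloaded.
./koboldcpp-linux-x64 --model ./llama-2-7b-chat.Q4_K_M.gguf --contextsize 4096
```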
Step 6: Optional GPU Acceleration
If your computer has a compatible GPU:
Windows/Linux: Download a CUDA-enabled binary; make sure you have the correct CUDA toolkit installed.
Mac: Apple Silicon supports Metal GPU acceleration.
Select the number of GPU layers or enable acceleration in KoboldCPP’s settings.
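From the command line, the equivalent settings are flags. A sketch for an NVIDIA card on Linux, assuming KoboldCPP’s usual CUDA and layer-offload flags (verify with --help; the right layer count depends on your VRAM):

```shell
# Offload 35 of the model's layers to the GPU via CUDA/cuBLAS.
# Lower --gpulayers if you run out of VRAM; raise it if you have headroom.
./koboldcpp-linux-x64 --model ./llama-2-7b-chat.Q4_K_M.gguf \
  --usecublas --gpulayers 35
```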
Step 7: Start Prompting
The interface allows entering prompts, adjusting context, and saving your conversations.
Experiment with small models first; monitor system usage as you go.
Everything on Shared Sapience is free and open to all. However, it takes a tremendous amount of time and effort to keep these resources and guides up to date and useful for everyone.
If enough of my amazing readers could help with just a few dollars a month, I could dedicate myself full-time to helping Seekers, Builders, and Protectors collaborate better with AI and work toward a better future.
Even if you can’t support financially, becoming a free subscriber is a huge help in advancing the mission of Shared Sapience.
If you’d like to help by becoming a free or paid subscriber, simply use the Subscribe/Upgrade button below, or send a one-time quick tip with Buy me a Coffee by clicking here. I’m deeply grateful for any support you can provide - thank you!


