KoboldCPP

KoboldCPP’s design goals and community culture lean more toward open-ended play than strict, utilitarian chat. Consequently, it doesn’t default to a “you are a helpful assistant” system instruction the way many other UIs do. This makes it a great choice for creative writing, role-playing, and exploratory conversations. It also has a lightweight GUI that is easy to use.

Step 1: Download KoboldCPP

  • Go to the KoboldCPP GitHub releases page. At the top of the page should be the latest release (1.99.3 at the time of this writing). Scroll down the page just a bit, and you’ll see the list of Assets for the latest release, which should have packages for your operating system. If you scroll down too far you’ll start to get into older releases.

    • For Windows, download koboldcpp.exe.

      • If you have an older computer or you are lacking a usable graphics card, they have “oldpc” and “nocuda” options as well (nocuda essentially means “no nvidia graphics card”).

    • For modern Macs (with Apple Silicon M chips), download koboldcpp-mac-arm64. Unfortunately, older Intel-based Macs are not supported.

    • For Linux, download the latest koboldcpp-linux-x64 binary.

      • Similar to Windows, they have an “oldpc” and “nocuda” option as well.

Step 2: Prepare the Binary

  • Windows: No preparation required; the .exe is ready to run.

  • Mac: Open Terminal, navigate to the download location, execute:

chmod +x koboldcpp-mac-arm64

  • Linux: In the terminal, navigate to the download directory and enter:

chmod +x koboldcpp-linux-x64

Step 3: Run KoboldCPP

  • Windows: Double-click the .exe file to launch the application.

  • Mac/Linux: In Terminal, run:

./koboldcpp-mac-arm64        # Mac
./koboldcpp-linux-x64        # Linux

  • If macOS blocks the app, open System Settings → “Privacy & Security,” click “Open Anyway” next to the blocked binary, then run it again.

Step 4: Download an LLM Model

  • Visit sites hosting GGUF models (e.g., Hugging Face). GGUF is the current standard format; KoboldCPP can also load older legacy GGML files.

  • Download a model sized for your RAM; as a rough guide, a quantized 7B model wants 8GB+ and a 13B wants 16GB+.
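The sizing guideline above can be sketched as a quick rule of thumb (the thresholds mirror the guide and assume quantized models; actual requirements vary with the quantization level you pick):

```shell
# Rule-of-thumb sketch: pick the largest model class that comfortably
# fits in system RAM. Thresholds match the guide above, not a hard rule.
suggest_model_size() {
  ram_gb=$1
  if [ "$ram_gb" -ge 16 ]; then
    echo "13B"             # ~16 GB RAM or more: a quantized 13B model
  elif [ "$ram_gb" -ge 8 ]; then
    echo "7B"              # ~8 GB RAM: a quantized 7B model
  else
    echo "3B or smaller"   # below that, look for smaller models
  fi
}

suggest_model_size 8       # prints: 7B
```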

Step 5: Load the Model in KoboldCPP

  • When KoboldCPP starts, either select your downloaded model via the GUI or provide the file path in the CLI prompt.
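If you prefer to skip the launcher GUI entirely, the model can be passed directly on the command line with the --model flag. A minimal sketch (the path and filename are placeholders for your own download):

```shell
# Load a GGUF model directly, bypassing the launcher window.
# The path below is a placeholder; point it at your downloaded file.
./koboldcpp-linux-x64 --model ~/models/your-model.gguf
```

The same flag works on Windows from a Command Prompt (koboldcpp.exe --model your-model.gguf) and on Mac with the mac-arm64 binary.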

Step 6: Optional GPU Acceleration

  • If your computer has a compatible GPU:

    • Windows/Linux: Download a CUDA-enabled binary; make sure you have the correct CUDA toolkit installed.

    • Mac: Apple Silicon supports Metal GPU acceleration.

    • Select the number of GPU layers or enable acceleration in KoboldCPP’s settings.
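From the command line, GPU offload can be requested with flags instead of the GUI. A hedged sketch for the CUDA build (the layer count is illustrative; the right number depends on your model size and VRAM):

```shell
# Offload some transformer layers to an NVIDIA GPU (CUDA build).
# 20 is illustrative: raise it until VRAM is nearly full,
# lower it if the model fails to load.
./koboldcpp-linux-x64 --usecublas --gpulayers 20 --model ~/models/your-model.gguf
```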

Step 7: Start Prompting

  • The interface allows entering prompts, adjusting context, and saving your conversations.

  • Experiment with small models first; monitor system usage as you go.
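Beyond the browser interface, KoboldCPP also serves a local API (by default on port 5001) that scripts and other front-ends can call. A minimal sketch with curl, assuming the server is already running with a model loaded (the prompt and max_length values are just examples):

```shell
# Send a prompt to the local KoboldCPP server and print the JSON response.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time,", "max_length": 50}'
```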


Everything on Shared Sapience is free and open to all. However, it takes a tremendous amount of time and effort to keep these resources and guides up to date and useful for everyone.

If enough of my amazing readers could help with just a few dollars a month, I could dedicate myself full-time to helping Seekers, Builders, and Protectors collaborate better with AI and work toward a better future.

Even if you can’t support financially, becoming a free subscriber is a huge help in advancing the mission of Shared Sapience.

If you’d like to help by becoming a free or paid subscriber, simply use the Subscribe/Upgrade button below, or send a one-time quick tip with Buy me a Coffee by clicking here. I’m deeply grateful for any support you can provide - thank you!
