iru on Nostr:
Model is https://huggingface.co/TheBloke/airoboros-65B-gpt4-1.2-GGML
Software is https://github.com/ggerganov/llama.cpp
I won't pretend that response was fast. A 30B or even 13B model might be faster than Pygmalion.
llama.cpp can offload layers to the GPU.
KoboldCpp is built on llama.cpp, so it can run these models too.
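For anyone who wants to try the GPU offload, here is a rough Python sketch of how one might pick a layer count and build the llama.cpp command line. It assumes a llama.cpp build with GPU support and the `./main` example binary; `-m`, `-p` and `-ngl` (`--n-gpu-layers`) are llama.cpp options, but the model file name, the 40 GiB file size and the 24 GiB VRAM figure are made-up placeholders, and the helper names are mine. The estimate treats every layer as the same size and ignores the context cache and scratch buffers, so treat the result as an upper bound and go lower if you run out of memory.

```python
import shlex


def layers_that_fit(total_layers, model_bytes, vram_bytes):
    """Rough upper bound on how many layers fit in VRAM.

    Assumes all layers are the same size (model_bytes / total_layers) and
    ignores the KV cache and scratch buffers, so the real limit is lower.
    """
    if total_layers <= 0 or model_bytes <= 0 or vram_bytes < 0:
        raise ValueError("sizes must be positive")
    # Integer arithmetic: vram_bytes / (model_bytes / total_layers), floored.
    fit = (vram_bytes * total_layers) // model_bytes
    return min(total_layers, fit)


def build_llama_cpp_command(model_path, prompt, gpu_layers, binary="./main"):
    """Return an argv list that runs llama.cpp with some layers on the GPU."""
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be non-negative")
    return [binary, "-m", model_path, "-ngl", str(gpu_layers), "-p", prompt]


if __name__ == "__main__":
    GIB = 2**30
    # LLaMA 65B has 80 transformer layers; the sizes below are placeholders.
    n_layers = layers_that_fit(80, 40 * GIB, 24 * GIB)
    # Hypothetical file name, substitute the GGML file you downloaded.
    cmd = build_llama_cpp_command("airoboros-65b.ggml.bin", "Hello", n_layers)
    print(shlex.join(cmd))
```

The argv list can be passed straight to `subprocess.run(cmd)` once the model path points at a real file.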