Monday, May 6, 2024

How to Run Exl2 LLMs Locally for Fast Inference

This video shows how to install exllamav2 and run any model in exl2 format locally.

Code:

pip install huggingface_hub
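
The huggingface_hub package provides the huggingface-cli tool used in the next steps to authenticate and download the model.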

huggingface-cli login
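
The login command prompts for a Hugging Face access token, which you can create under Settings > Access Tokens on huggingface.co; a read-scoped token is enough for downloading models.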


mkdir llama38b

cd llama38b

huggingface-cli download hjhj3168/Llama-3-8b-Orthogonalized-exl2 --local-dir . --local-dir-use-symlinks False
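
The target directory is "." because the previous step already changed into llama38b, which keeps the weights at /home/ubuntu/llama38b/ where the test command below expects them. The --local-dir-use-symlinks False flag copies real files into the directory instead of symlinking into the Hugging Face cache, so the folder is self-contained; newer huggingface_hub versions deprecate this flag and copy real files by default.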


cd ..


git clone https://github.com/turboderp/exllamav2
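
exllamav2 is a fast inference library for running LLMs locally on consumer GPUs, and exl2 is its quantization format.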

cd exllamav2


conda create -n exl2 python=3.11

conda activate exl2
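
The remaining steps run inside this isolated exl2 environment, so the build and its dependencies don't touch the system Python.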


pip install -r requirements.txt

pip install .
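
Installing from source typically compiles exllamav2's C++/CUDA extension, so expect the step to take a few minutes and make sure a CUDA toolkit compatible with your installed PyTorch is available.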


python test_inference.py -m /home/ubuntu/llama38b/ -p "To travel without ticket in train,"
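
test_inference.py loads the model, completes the prompt, and reports generation speed. If you want to script generation yourself, exllamav2 also exposes a Python API. The sketch below follows the pattern of the repo's bundled examples at the time of writing; the class names, the generate_simple call, and the sampling values are assumptions to verify against the current repo, and the model path matches the directory created above.

from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the downloaded exl2 model directory (path from the steps above)
config = ExLlamaV2Config()
config.model_dir = "/home/ubuntu/llama38b"
config.prepare()

# Load the model, letting exllamav2 split layers across available GPUs
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

# Simple (non-streaming) generator; sampling values here are illustrative
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Generate up to 128 new tokens from the same prompt used above
output = generator.generate_simple("To travel without ticket in train,", settings, 128)
print(output)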
