Thursday, October 19, 2023

Step-by-Step Mistral 7B Installation Locally on Linux, Windows, or in the Cloud

This is a detailed tutorial on how to locally install the Mistral 7B model on AWS, Linux, Windows, or anywhere else you like.
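
Before running anything, it is worth confirming that your machine actually exposes a GPU to Python. A quick check, assuming PyTorch is already installed in your environment:

nvidia-smi

python3 -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"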





Commands Used:


# Install optimum and the transformers commit pinned for this tutorial
pip3 install optimum

pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79


# Build and install AutoGPTQ v0.4.2 from source
git clone https://github.com/PanQiWei/AutoGPTQ

cd AutoGPTQ

git checkout v0.4.2

pip3 install .
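
To confirm the packages installed correctly, a quick sanity check is to import them and print their versions; if any of these imports fail, the corresponding install step did not complete:

python3 -c "import transformers, optimum, auto_gptq; print(transformers.__version__, optimum.__version__, auto_gptq.__version__)"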



from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


model_name_or_path = "TheBloke/SlimOpenOrca-Mistral-7B-GPTQ"

# To use a different branch, change revision

# For example: revision="gptq-4bit-32g-actorder_True"


model = AutoModelForCausalLM.from_pretrained(model_name_or_path,

                                             device_map="auto",

                                             trust_remote_code=False,

                                             revision="main")


tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)


system_message = "You are an expert at bathroom renovations."

prompt = """

Renovate the following old bathroom:

I have a 25-year-old house with an old bathroom. I want to renovate it completely.

Think about it step by step, and give me the steps to renovate the bathroom. Also give me the cost of every step in Australian dollars.

"""


prompt_template=f'''<|im_start|>system

{system_message}<|im_end|>

<|im_start|>user

{prompt}<|im_end|>

<|im_start|>assistant

'''
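
The template above is the ChatML format this SlimOpenOrca fine-tune expects. On newer transformers releases that ship chat templates, you can usually build the same string from the tokenizer instead of writing it by hand; this is just a sketch, and chat_prompt is an illustrative name:

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": prompt},
]

# Should produce a ChatML string equivalent to the manual prompt_template above
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)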


print("\n\n*** Generate:")


input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)

print(tokenizer.decode(output[0]))
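
Note that decode prints the prompt together with the completion. If you only want the newly generated text, slice off the prompt tokens before decoding:

# Keep only the tokens generated after the prompt
generated_tokens = output[0][input_ids.shape[1]:]

print(tokenizer.decode(generated_tokens, skip_special_tokens=True))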


# Inference can also be done using transformers' pipeline


print("*** Pipeline:")

pipe = pipeline(

    "text-generation",

    model=model,

    tokenizer=tokenizer,

    max_new_tokens=512,

    do_sample=True,

    temperature=0.7,

    top_p=0.95,

    top_k=40,

    repetition_penalty=1.1

)


print(pipe(prompt_template)[0]['generated_text'])
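
If you want to ask several questions in a row, it can help to wrap the pipeline call in a small helper that rebuilds the ChatML prompt each time; a minimal sketch (the ask function is just for illustration, not part of any library):

def ask(question, system=system_message):
    # Rebuild the ChatML prompt for each new question
    template = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
    return pipe(template)[0]['generated_text']

print(ask("What tiles would you recommend for a small bathroom?"))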

1 comment:

Jim McLeod said...

Which CUDA version have you loaded?
I am getting a RuntimeError: "The detected CUDA version (12.3) mismatches the version that was used to compile PyTorch (11.8). Please make sure to use the same CUDA versions."

Thanks.