Thursday, August 17, 2023

Tutorial to Install WizardLM Locally - Step by Step

This is the easiest and quickest guide to installing WizardLM, a model based on Llama 2, on your local machine or on an AWS GPU instance.
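
Before downloading anything, it is worth confirming that a GPU is visible from the machine: the 4-bit GPTQ weights for this 70B model are roughly 35 GB on disk, so an A100/A6000-class card (or a comparable AWS GPU instance) is assumed throughout. A quick check from a notebook cell:

!nvidia-smi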

Commands Used:

!pip install transformers

!git clone https://github.com/PanQiWei/AutoGPTQ
%cd AutoGPTQ
!pip3 install .
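
If the build succeeded, the quantized loader should import cleanly and CUDA should be visible. A minimal sanity check (this assumes torch was pulled in as a dependency of the packages above):

import torch
import auto_gptq
print(torch.cuda.is_available())  # should print True on a GPU machine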


from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# GPTQ-quantized WizardLM 70B hosted on the Hugging Face Hub
model_name_or_path = "TheBloke/WizardLM-70B-V1.0-GPTQ"

# Use the CUDA kernels; set to True only if the optional Triton kernels are installed
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)


model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)
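
The first load downloads roughly 35 GB of shards, so expect it to take a while. An optional way to confirm the weights actually landed on the GPU (a standard torch call, not part of AutoGPTQ):

import torch
print(f"GPU memory allocated: {torch.cuda.memory_allocated(0) / 1e9:.1f} GB")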



prompt = "Tell me about Stoics"

prompt_template=f'''A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:

'''


print("\n\n*** Generate:")


input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)

print(tokenizer.decode(output[0]))
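
Note that decoding output[0] echoes the prompt back along with the answer. A minimal variant that prints only the newly generated tokens, reusing the input_ids and output variables from above:

# Slice off the prompt tokens so only the model's reply is printed
new_tokens = output[0][input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))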


# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)


print("*** Pipeline:")

pipe = pipeline(

    "text-generation",

    model=model,

    tokenizer=tokenizer,

    max_new_tokens=512,

    temperature=0.7,

    top_p=0.95,

    repetition_penalty=1.15

)


print(pipe(prompt_template)[0]['generated_text'])
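
For follow-up questions, the template and the pipeline call can be wrapped in a small helper so every question is formatted the same way (ask is a hypothetical name for this sketch, not part of the model or the library):

def ask(question):
    # Rebuild the same Vicuna-style chat template used above
    template = ("A chat between a curious user and an artificial intelligence assistant. "
                "The assistant gives helpful, detailed, and polite answers to the "
                f"user's questions. USER: {question} ASSISTANT:")
    return pipe(template)[0]['generated_text']

print(ask("Tell me about Stoics"))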
