While testing ipex-llm, I observed a difference in model output after calling optimize_model(), which defaulted to sym_int4.
Please help clarify the following:
What is causing this variation in output?
Does the optimize_model() call ensure that model accuracy remains the same across eval benchmarks such as HumanEval, MMLU, etc.?
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
import sys
import warnings
warnings.filterwarnings("ignore")
import torch
torch.manual_seed(100)
from transformers import AutoTokenizer, AutoModelForCausalLM
model_path = 'meta-llama/Llama-2-7b-chat-hf'
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             trust_remote_code=True,
                                             use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
system_prompt = "You are a creative poet. Write a poem about the given topic. Use only 100 words"
user_prompt = "Write a poem about owls and starry nights"
prompt_template = f"<s>[INST] <<SYS>>\n {system_prompt} \n<</SYS>>\n\n {user_prompt} [/INST]"
print("*"*10 + "Original model output" + "*"*10)
print(tokenizer.decode(model.generate(tokenizer.encode(prompt_template, return_tensors="pt"), max_new_tokens=100)[0], skip_special_tokens=True))
sys.stdout.flush()
from ipex_llm import optimize_model
model = optimize_model(model)
print("*"*10 + "IPEX-LLM Optimized model output" + "*"*10)
print(tokenizer.decode(model.generate(tokenizer.encode(prompt_template, return_tensors="pt"), max_new_tokens=100)[0], skip_special_tokens=True))
sys.stdout.flush()
output:
**********Original model output**********
[INST] <<SYS>>
You are a creative poet. Write a poem about the given topic. Use only 100 words
<</SYS>>
Write a poem about owls and starry nights [/INST] Sure! Here is a 100-word poem about owls and starry nights:
Silent sentinels of the night,
Owls perch on boughs, their eyes alight.
Glittering stars above, a twinkling sight,
A magical night, pure delight.
Converting the current model to sym_int4 format......
**********IPEX-LLM Optimized model output**********
[INST] <<SYS>>
You are a creative poet. Write a poem about the given topic. Use only 100 words
<</SYS>>
Write a poem about owls and starry nights [/INST] Sure, here is a poem about owls and starry nights in exactly 100 words:
Owls hoot in the night's embrace
Their soft coos echo through space
While stars twinkle bright and slow
A celestial show to know
Nature's symphony so grand
In this peaceful night's command
We apply further optimizations in ipex-llm to improve performance, and these can change some logits and therefore the generated output; this is expected.
At the same time, we run accuracy benchmarks (e.g. the tasks in https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) to make sure that our optimizations have no obvious negative impact on accuracy.
If you observe any wrong output from the ipex-llm optimized model, please let us know and we will check it. Thanks!
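To illustrate why low-bit conversion can shift logits, here is a minimal pure-Python sketch of generic symmetric 4-bit quantization applied to one block of weights. It is not ipex-llm's actual sym_int4 implementation: the helper names (quantize_sym_int4, dequantize), the block size of 64, the scale choice (max absolute value divided by 7) and the random data are all assumptions made for illustration.

```python
import random


def quantize_sym_int4(weights):
    """Quantize one block of floats to signed 4-bit codes with a shared scale.

    Hypothetical helper: a generic symmetric scheme, not the ipex-llm kernel.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to code 7; fall back to 1.0 for an all-zero block.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Clamp to the signed 4-bit range [-8, 7].
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the codes and the shared scale."""
    return [c * scale for c in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
# Made-up data: one hidden-state vector and one weight row of an output layer.
hidden = [random.gauss(0, 1) for _ in range(64)]
row = [random.gauss(0, 0.02) for _ in range(64)]

codes, scale = quantize_sym_int4(row)
row_q = dequantize(codes, scale)

# Largest per-weight rounding error introduced by the round trip.
max_err = max(abs(a - b) for a, b in zip(row, row_q))
# The same "logit" computed with the original and the quantized weights.
logit_fp = dot(hidden, row)
logit_q = dot(hidden, row_q)

print("max weight error:", max_err)
print("logit (float weights):", logit_fp)
print("logit (4-bit weights):", logit_q)
```

Each weight moves by at most half a quantization step, so each logit moves slightly. During generation, a small shift can change which token is chosen at one step, and every later token is then conditioned on a different prefix, so the two texts can diverge without either being wrong.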
env :
Python 3.9
ipex-llm 2.1.0b20240416
torch 2.2.2
transformers 4.31.0