
Model output is different when using default optimize_model #10782

Open

vishnumadhu365 opened this issue Apr 17, 2024 · 1 comment

@vishnumadhu365
While testing ipex-llm, I observed a difference in the model output after calling optimize_model(), which defaulted to sym_int4 quantization (a sketch of selecting a different precision explicitly is included below).
Please help clarify the following:

  1. What is causing this variation in output?
  2. Does calling optimize_model() ensure that model accuracy stays the same across eval benchmarks like HumanEval, MMLU, etc.?

Thanks!
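For context: if the sym_int4 default is the concern, optimize_model appears to accept a low_bit argument for choosing the quantization format. A minimal sketch; the low_bit keyword and the "sym_int8"/"fp16" format names are assumptions and may differ across ipex-llm versions:

from transformers import AutoModelForCausalLM
from ipex_llm import optimize_model

model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf',
                                             trust_remote_code=True,
                                             use_cache=True)

# Assumed usage: pick a less aggressive precision than the default sym_int4
model_int8 = optimize_model(model, low_bit="sym_int8")   # 8-bit weight quantization
# model_fp16 = optimize_model(model, low_bit="fp16")     # keep fp16 weights, no int quantization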

env:
Python 3.9
ipex-llm 2.1.0b20240416
torch 2.2.2
transformers 4.31.0

reproducer:

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["TRANSFORMERS_VERBOSITY"] = "error"

import sys
import warnings
warnings.filterwarnings("ignore")

import torch
torch.manual_seed(100)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'meta-llama/Llama-2-7b-chat-hf'

# Load the original (unquantized) model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             trust_remote_code=True,
                                             use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)

system_prompt = "You are a creative poet. Write a poem about the given topic. Use only 100 words"
user_prompt = "Write a poem about owls and starry nights"
prompt_template = f"<s>[INST] <<SYS>>\n {system_prompt} \n<</SYS>>\n\n {user_prompt}  [/INST]"

print("*"*10 + "Original model output" + "*"*10)
print(tokenizer.decode(model.generate(tokenizer.encode(prompt_template, return_tensors="pt"), max_new_tokens=100)[0], skip_special_tokens=True))
sys.stdout.flush()

# Apply ipex-llm optimization; defaults to sym_int4 weight quantization
from ipex_llm import optimize_model
model = optimize_model(model)

print("*"*10 + "IPEX-LLM Optimized model output" + "*"*10)
print(tokenizer.decode(model.generate(tokenizer.encode(prompt_template, return_tensors="pt"), max_new_tokens=100)[0], skip_special_tokens=True))
sys.stdout.flush()

output:

**********Original model output**********
[INST] <<SYS>>
 You are a creative poet. Write a poem about the given topic. Use only 100 words 
<</SYS>>

 Write a poem about owls and starry nights  [/INST]  Sure! Here is a 100-word poem about owls and starry nights:

Silent sentinels of the night,
Owls perch on boughs, their eyes alight.
Glittering stars above, a twinkling sight,
A magical night, pure delight.
Converting the current model to sym_int4 format......
**********IPEX-LLM Optimized model output**********
[INST] <<SYS>>
 You are a creative poet. Write a poem about the given topic. Use only 100 words 
<</SYS>>

 Write a poem about owls and starry nights  [/INST]  Sure, here is a poem about owls and starry nights in exactly 100 words:

Owls hoot in the night's embrace
Their soft coos echo through space
While stars twinkle bright and slow
A celestial show to know
Nature's symphony so grand
In this peaceful night's command
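To quantify how much the sym_int4 weights actually shift the model, one option is to compare next-token logits on the same prompt before and after optimization, instead of eyeballing the generated text. A minimal sketch that would replace the optimize_model step in the reproducer above, reusing model, tokenizer and prompt_template (the metrics below are illustrative, not part of the ipex-llm API):

import torch
import torch.nn.functional as F
from ipex_llm import optimize_model

inputs = tokenizer(prompt_template, return_tensors="pt")

# Next-token logits from the original model (before optimization)
with torch.no_grad():
    orig_logits = model(**inputs).logits[:, -1, :].float()

model = optimize_model(model)  # defaults to sym_int4

# Next-token logits from the quantized model on the same prompt
with torch.no_grad():
    quant_logits = model(**inputs).logits[:, -1, :].float()

print("max abs logit diff:", (orig_logits - quant_logits).abs().max().item())
print("cosine similarity :", F.cosine_similarity(orig_logits, quant_logits, dim=-1).item())
print("same argmax token :", bool((orig_logits.argmax(-1) == quant_logits.argmax(-1)).all()))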
@hkvision
Contributor

hkvision commented Apr 22, 2024

Hi,

We are doing further optimizations in ipex-llm for better performance, which may change some logits and outputs; this is expected.
At the same time, we run accuracy benchmarks (e.g. the tasks in https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) to make sure our optimizations do not have any obvious negative impact on accuracy.
If you observe any incorrect output from the ipex-llm optimized model, feel free to let us know and we will look into it. Thanks!
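For a quick local sanity check before (or alongside) full benchmarks, a teacher-forced perplexity probe on a short passage can serve as a rough accuracy proxy. A minimal sketch using only the standard transformers loss output, reusing the model and tokenizer from the reproducer; the sample text is illustrative and this is not part of ipex-llm's benchmark suite:

import torch

def perplexity(model, tokenizer, text):
    # Teacher-forced perplexity over a short passage; a rough sanity check,
    # not a replacement for benchmarks such as MMLU or HumanEval.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

sample = "Owls are nocturnal birds of prey known for their nearly silent flight."
print("perplexity after optimize_model:", perplexity(model, tokenizer, sample))
# Running the same check before optimize_model gives a baseline to compare against.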
