
[Bug] Long text evaluation parameters are not clear #1035

Open
bullw opened this issue Apr 10, 2024 · 3 comments
bullw commented Apr 10, 2024

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

python 3.10.1
OpenCompass 0.2.3
vllm 0.2.3

Reproduces the problem - code/configuration sample

configs/models/chatglm/vllm_chatglm2_6b_32k.py

from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='chatglm2-6b-32k-vllm',
        path='THUDM/chatglm2-6b-32k',
        max_out_len=512,
        max_seq_len=4096,
        batch_size=32,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]

Reproduces the problem - command or script

python run.py --model vllm_chatglm2_6b_32k --datasets longbench leval

Reproduces the problem - error message

My evaluation scores differ from the documented long-text evaluation results by about 20 points; I cannot reproduce the documented scores.

  1. Should the max_seq_len and max_out_len parameters be modified in some way?

Other information

No response

liushz (Collaborator) commented Apr 10, 2024

For optimal performance, set max_seq_len to the highest value feasible, such as 32768 or even higher if possible. As for max_out_len, it typically has a preset default within the dataset configuration; you can adjust it to 256, or simply keep the default.
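Applied to the config above, that advice would look roughly like the sketch below. This is one reading of the suggestion, not an official recommendation: the 32768 value assumes the model's full 32k context window, and the reduced batch_size is an assumption about GPU memory at longer sequence lengths, not something stated in this thread.

from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='chatglm2-6b-32k-vllm',
        path='THUDM/chatglm2-6b-32k',
        # Raise max_seq_len to the model's full 32k context so long
        # LongBench/LEval inputs are not truncated at 4096 tokens.
        max_seq_len=32768,
        # Per the reply above, this can be 256, or left at each
        # dataset's preset default.
        max_out_len=256,
        # Assumption: a smaller batch may be needed at 32k sequence length.
        batch_size=8,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]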


bullw commented Apr 12, 2024

Thank you very much. I reproduced most of the scores.

One more question: for the subsets scored with rouge1, rouge2, rougeL, and rougeLsum, the score differences are still very large.

  1. What could be the reason?
  2. Which of these metrics is used for the leaderboard ranking?

[two screenshots of the ROUGE score comparison attached]

bullw commented Apr 12, 2024

@liushz
