I'm evaluating with the officially supported tasks/models/datasets.
Environment
This is my CPU and GPU; I used the following machine for the test, with max-workers=32.
CPU Info: AMD EPYC 7713 64-Core Processor (255 logical cores)
GPU Info: NVIDIA H800-SXM4-80GB x 8
Reproduces the problem - code/configuration sample
Official code, unmodified.
Reproduces the problem - command or script
I used a ~2B model (e.g. Qwen1.5-1.8B) to test 13 datasets; the model was loaded via Hugging Face.
I recorded the time taken for each stage and found that the inference task took about 20 minutes and the evaluation task took about 12 minutes.
The files the evaluation reads to compute ppl and gen scores (the predictions dir) total about 500 MB. So why does the evaluation take 12 minutes with multiprocessing? Shouldn't a computation over 500 MB finish in about a minute?
I looked at the code that runs the evaluation task (opencompass/runners/local.py, lines 61 ~ 210) and found that the most time-consuming part of the evaluation is the serialization and deserialization of the configuration file (writing it to disk and loading it back). The code looks like this:
```python
# opencompass/runners/local.py, line 180 ~ 188
# Dump task config to file
mmengine.mkdir_or_exist('tmp/')
param_file = f'tmp/{os.getpid()}_{index}_params.py'
try:
    task.cfg.dump(param_file)  # ************** the most time-consuming
    tmpl = get_command_template(gpu_ids)
    get_cmd = partial(task.get_command,
                      cfg_path=param_file,
                      template=tmpl)
```
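To confirm that the `cfg.dump` call dominates, I timed the individual steps. A minimal sketch of such a timing helper (`timed` is my own hypothetical helper, not part of OpenCompass; in the real code you would wrap `lambda: task.cfg.dump(param_file)`):

```python
import time

# Hypothetical helper (not in OpenCompass): time a zero-argument
# callable and print how long it took, returning its result too.
def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f'{label}: {elapsed:.3f}s')
    return result, elapsed

# Usage sketch: a cheap stand-in computation in place of cfg.dump.
res, secs = timed('dump', lambda: sum(range(1000)))
```

Wrapping each step of the launch path this way is how I attributed most of the 12 minutes to config serialization rather than the score computation itself.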
When the runner starts tasks, I split evaluation tasks from inference tasks. Inference tasks are unchanged, while evaluation tasks no longer go through the multi-process launch:
Step 1: Modify the submit function (opencompass/runners/local.py, line 133):
```python
def submit(task, index):
    # ...
    if num_gpus > 0:
        tqdm.write(f'launch {task.name} on GPU ' + ','.join(map(str, gpu_ids)))
    else:
        tqdm.write(f'launch {task.name} on CPU ')
    # Modified: dispatch eval tasks to an in-process launcher
    if "OpenICLEvalTask" in self.task_cfg['type']:
        res = self._launch_eval(task, gpu_ids, index)
    else:
        res = self._launch_infer(task, gpu_ids, index)  # old self._launch
    pbar.update()
    with lock:
        gpus[gpu_ids] += 1
    return res
With the modified code, evaluating the same 13 datasets took about 40 seconds (down from about 12 minutes).
I hope the maintainers can fix this. My modification does not yet write logs to the per-dataset log files, so I have not opened a PR.
Reproduces the problem - error message
Evaluation time improvement
Other information
No response