Evaluating baichuan-7b on HumaneEval gives results far below the leaderboard #1038

Open
2 tasks done
zh190920 opened this issue Apr 11, 2024 · 2 comments

@zh190920

Prerequisites

Issue type

I am evaluating with an officially supported task/model/dataset.

Environment

{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0',
'GPU 0,1,2,3,4,5,6,7': 'NVIDIA H800',
'MMEngine': '0.10.3',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 11.8, V11.8.89',
'OpenCV': '4.9.0',
'PyTorch': '2.2.2+cu121',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2022.2-Product Build 20220804 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.3.2 (Git Hash '
'2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.1\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 8.9.2\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
'CUDNN_VERSION=8.9.2, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=pedantic '
'-Wno-error=old-style-cast -Wno-missing-braces '
'-fdiagnostics-color=always -faligned-new '
'-Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, '
'USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, '
'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, '
'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, '
'USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]',
'TorchVision': '0.17.2+cu121',
'numpy_random_seed': 2147483648,
'opencompass': '0.2.2+',
'sys.platform': 'linux'}

Reproducing the problem - code/configuration sample

`from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='baichuan-7b-hf',
        path="/home/jovyan/zh/benchmark/pretrain_model/baichuan-7b",
        tokenizer_path='/home/jovyan/zh/benchmark/pretrain_model/baichuan-7b',
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            trust_remote_code=True,
            use_fast=False,
        ),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,
        # batch_padding=False,
        model_kwargs=dict(device_map='auto', trust_remote_code=True),
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]`
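One thing that may be worth checking with a config like the one above (an assumption, not something confirmed in this thread) is whether `max_out_len=100` cuts completions off before the function body is finished. The sketch below is a crude heuristic: it only tests whether prompt plus completion still parses as Python. The helper name `fails_to_parse` and the sample strings are hypothetical and are not part of OpenCompass.

```python
import ast


def fails_to_parse(prompt: str, completion: str) -> bool:
    """Return True if prompt + completion is not syntactically valid Python.

    A parse failure is only a hint of truncation; a completion can also be
    invalid for other reasons, or be truncated and still parse.
    """
    try:
        ast.parse(prompt + completion)
    except SyntaxError:
        return True
    return False


# Hypothetical HumanEval-style prompt and two made-up completions.
prompt = 'def add(a, b):\n    """Return a + b."""\n'
complete = "    return a + b\n"
cut_off = "    return (a +"  # generation stopped mid-expression

print(fails_to_parse(prompt, complete))
print(fails_to_parse(prompt, cut_off))
```

If many saved predictions fail this check, raising the output length limit and rerunning would be a reasonable next experiment.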

Reproducing the problem - command or script

python run.py --models hf_baichuan_7b --datasets humaneval_gen_8e312c

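Reproducing the problem - error message

The result is far below the leaderboard: the leaderboard lists baichuan-7b at 9.1, but with the default configuration I only get 1.81.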
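For context when comparing the two numbers, HumanEval results are usually reported as pass@k, computed per problem with the unbiased estimator from the HumanEval paper and then averaged. The sketch below assumes both scores are pass@1 percentages, which the thread does not state. The function names are illustrative and are not OpenCompass APIs.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(per_problem: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over (n, c) pairs, expressed as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in per_problem]
    return 100.0 * sum(scores) / len(scores)
```

With a single greedy sample per problem (n = 1, k = 1), this reduces to the percentage of problems whose one completion passes all tests, so generation settings and post-processing of the completion affect the score directly.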

Other information

No response

@zh190920 zh190920 changed the title on Apr 11, 2024, correcting a typo in the original Chinese title (帮当 → 榜单, "leaderboard")
@zh190920
Author

@hzhwcmhf @x22x22 @Sanster Looking forward to your reply

@PJY-coder

I ran into something similar: when I tested wizardcoder and starcoder2, both models scored very poorly on humaneval.
