Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
benchmark
framework
evaluation
dataset
hallucination
aquila
unconstrained
baichuan
gpt-3
hallucinations
gpt-4
large-language-models
llm
chatgpt
chatglm
internlm
qwen
hallucination-detection
truthfulqa
Updated Mar 25, 2024 - Python