Datasets cannot be loaded on Windows: datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset #3406

Closed
1 task done
chinsulee opened this issue Apr 24, 2024 · 3 comments
Labels
wontfix This will not be worked on

Comments

@chinsulee

Reminder

  • I have read the README and searched the existing issues.

Reproduction

04/24/2024 10:50:33 - INFO - llmtuner.data.loader - Loading dataset oaast_rm_zh.json...
Generating train split: 0 examples [00:00, ? examples/s]
Exception in thread Thread-6 (run_exp):
Traceback (most recent call last):
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\datasets\builder.py", line 2027, in _prepare_split_single
    num_examples, num_bytes = writer.finalize()
                              ^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\datasets\arrow_writer.py", line 611, in finalize
    raise SchemaInferenceError("Please pass features or at least one example when writing data")
datasets.arrow_writer.SchemaInferenceError: Please pass features or at least one example when writing data

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "E:\LLM[LLM_files]\LLaMA-Factory-main\src\llmtuner\train\tuner.py", line 39, in run_exp
    run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
  File "E:\LLM[LLM_files]\LLaMA-Factory-main\src\llmtuner\train\dpo\workflow.py", line 28, in run_dpo
    dataset = get_dataset(tokenizer, model_args, data_args, training_args, stage="rm")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\LLM[LLM_files]\LLaMA-Factory-main\src\llmtuner\data\loader.py", line 147, in get_dataset
    all_datasets.append(load_single_dataset(dataset_attr, model_args, data_args))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\LLM[LLM_files]\LLaMA-Factory-main\src\llmtuner\data\loader.py", line 95, in load_single_dataset
    dataset = load_dataset(
              ^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\datasets\load.py", line 2609, in load_dataset
    builder_instance.download_and_prepare(
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\datasets\builder.py", line 1027, in download_and_prepare
    self._download_and_prepare(
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\datasets\builder.py", line 1122, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\datasets\builder.py", line 1882, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\datasets\builder.py", line 2038, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

Expected behavior

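On Windows, after installing the complete runtime environment, only the example dataset can be loaded and used for fine-tuning. Every other dataset fails to load, including both the bundled datasets and my own custom ones. Could someone help me work out how to fix this?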

System Info

No response

Others

No response

@codemayq
Collaborator

Please first confirm that the files for the dataset were downloaded successfully, and also upgrade your datasets version.
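A minimal sketch of those two checks, using only the Python standard library. The relative path `data/oaast_rm_zh.json` is a placeholder based on the file name in the log above, not a confirmed location; the helper names are hypothetical.

```python
from importlib import metadata
from pathlib import Path
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def file_status(path: str) -> str:
    """Report whether `path` is missing, empty, or present with some content."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    return "empty" if p.stat().st_size == 0 else "ok"


if __name__ == "__main__":
    print("datasets version:", installed_version("datasets"))
    # Placeholder path: point this at the dataset file on your machine.
    print("oaast_rm_zh.json:", file_status("data/oaast_rm_zh.json"))
```

A result of "missing" or "empty" would point to an incomplete download rather than to the datasets library itself.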

@codemayq added the pending (This problem is yet to be addressed.) label Apr 24, 2024
@chinsulee
Author

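Please first confirm that the files for the dataset were downloaded successfully, and also upgrade your datasets version.

I have already upgraded datasets to the latest version. The datasets I am using are all the local ones bundled with the LLaMA-Factory package, and they still fail to load. Only the example dataset is recognized and loaded, because it comes with an example.py script.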

@codemayq
Collaborator

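Please double-check that the contents of the downloaded data folder are complete and that the file contents are correct. Loading the bundled datasets is a standard operation, and no one else has run into this problem.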
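One way to run that double-check is sketched below, under two assumptions that this thread does not confirm. First, it assumes the dataset file is a JSON list of examples. Second, it flags paths containing glob-special characters: the project path in the traceback contains square brackets, and Python's glob-style matching treats `[...]` as a character class, so such a path used as a pattern may match no files. Whether that is what happens inside datasets here is a guess worth ruling out, not a diagnosis. The helper names are hypothetical.

```python
import glob
import json


def count_records(path: str) -> int:
    """Parse a JSON dataset file and return the number of top-level records.

    Raises json.JSONDecodeError if the file is truncated or corrupted.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file holds a JSON list with one entry per example.
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON list, got {type(data).__name__}")
    return len(data)


def has_glob_special_chars(path: str) -> bool:
    """Return True if `path` contains characters that glob treats as a pattern.

    glob.escape() only changes the string when it contains '*', '?' or '['.
    """
    return glob.escape(path) != path


if __name__ == "__main__":
    # Placeholder path: substitute the real location of the dataset file.
    path = "data/oaast_rm_zh.json"
    print("glob-special characters in path:", has_glob_special_chars(path))
    print("records:", count_records(path))
```

If the record count is positive but the path check is flagged, moving the project to a directory without brackets is a cheap experiment to try.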

@codemayq added the wontfix (This will not be worked on) label and removed the pending (This problem is yet to be addressed.) label Apr 29, 2024
@hiyouga closed this as not planned Apr 29, 2024