
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 191: invalid start byte #495

Open
sth0114 opened this issue Aug 19, 2023 · 0 comments

sth0114 commented Aug 19, 2023

Hardware: 4× GTX 1660 Ti (6 GB each).
Deployed with Docker. Running the run_sft.sh fine-tuning script fails with the error below.
Sat Aug 19 15:07:37 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1660 Ti Off | 00000000:02:00.0 Off | N/A |
| 0% 36C P8 5W / 120W | 10MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce GTX 1660 Ti Off | 00000000:03:00.0 Off | N/A |
| 0% 37C P8 8W / 120W | 10MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce GTX 1660 Ti Off | 00000000:82:00.0 Off | N/A |
| 0% 37C P8 7W / 120W | 10MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce GTX 1660 Ti Off | 00000000:83:00.0 Off | N/A |
| 0% 36C P8 6W / 120W | 10MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 705, in _get_config_dict
config_dict = cls._dict_from_json_file(resolved_config_file)
File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 796, in _dict_from_json_file
text = reader.read()
File "/usr/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 191: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "src/entry_point/sft_train.py", line 514, in <module>
main()
File "src/entry_point/sft_train.py", line 235, in main
model = LlamaForCausalLM.from_pretrained(
File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2360, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 591, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 620, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 708, in _get_config_dict
raise EnvironmentError(
OSError: It looks like the config file at '/home/BELLE/models/to_finetuned_model/config.json' is not a valid JSON file.
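The root cause is the first exception: byte 0xa5 at offset 191 of config.json is not a valid UTF-8 start byte, which typically means the file was saved (or corrupted) in a legacy encoding such as GBK rather than UTF-8. A minimal sketch of how to check which encoding a byte string decodes under; `SAMPLE` and `sniff_encoding` are hypothetical stand-ins, not the actual file contents:

```python
# 0xa5 cannot start a UTF-8 sequence (matching the error message), but it
# is legal inside multi-byte GBK text.
SAMPLE = b'{"comment": "\xa5\xa1"}'  # stand-in bytes, not the real config.json

def sniff_encoding(data: bytes) -> str:
    """Return the first candidate encoding that decodes the bytes cleanly."""
    for enc in ("utf-8", "gbk", "latin-1"):
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "unknown"

print(sniff_encoding(SAMPLE))  # anything but "utf-8" confirms a mis-encoded file
```

To test the real file, read it with `open(path, "rb")` and pass the bytes to the same function.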
08/19/2023 09:15:28 - WARNING - main - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, fp16-bits training: True, bf16-bits training: False
08/19/2023 09:15:28 - WARNING - main - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, fp16-bits training: True, bf16-bits training: False
08/19/2023 09:15:28 - WARNING - main - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, fp16-bits training: True, bf16-bits training: False
[Ranks 1, 2, and 3 emit the same UnicodeDecodeError followed by the same OSError traceback, interleaved; identical to the rank-0 traceback above.]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5881) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

src/entry_point/sft_train.py FAILED

Failures:
[1]:
time : 2023-08-19_09:15:32
host : ubuntu1
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 5882)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2023-08-19_09:15:32
host : ubuntu1
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 5883)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2023-08-19_09:15:32
host : ubuntu1
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 5884)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-08-19_09:15:32
host : ubuntu1
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 5881)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

The data files used:
train_file=belleMath.json
validation_file=belleMath-dev1K.json
