(venv) xiao@spider:~/ChatGLM-Finetuning$ CUDA_VISIBLE_DEVICES=0 deepspeed --master_port 8888 train.py \
[2023-11-07 15:52:33,664] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0: setting --include=localhost:0
[2023-11-07 15:52:33,673] [INFO] [runner.py:540:main] cmd = /home/xiao/ChatGLM-Finetuning/venv/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=8888 --enable_each_rank_log=None train.py --train_path data/spo_0.json --model_name_or_path ChatGLM2-6B --per_device_train_batch_size 1 --max_len 768 --max_src_len 512 --learning_rate 1e-4 --weight_decay 0.1 --num_train_epochs 2 --gradient_accumulation_steps 4 --warmup_ratio 0.1 --mode glm --train_type ptuning --seed 1234 --ds_file ds_zero2_no_offload.json --gradient_checkpointing --show_loss_step 10 --pre_seq_len 16 --prefix_projection True --output_dir ./output-glm
[2023-11-07 15:52:34,810] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-11-07 15:52:34,810] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-11-07 15:52:34,810] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-11-07 15:52:34,810] [INFO] [launch.py:247:main] dist_world_size=1
[2023-11-07 15:52:34,810] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-11-07 15:52:35,996] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2023-11-07 15:52:36.106900: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-11-07 15:52:36.132146: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-07 15:52:36.582648: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 261, in hf_raise_for_status
response.raise_for_status()
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/ChatGLM2-6B/resolve/main/tokenizer_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 430, in cached_file
resolved_file = hf_hub_download(
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1346, in hf_hub_download
raise head_call_error
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1232, in hf_hub_download
metadata = get_hf_file_metadata(
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1608, in get_hf_file_metadata
hf_raise_for_status(r)
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 293, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6549ecc6-73aede0d7a9923e64722742c;8af87e54-9119-4c1a-b48f-0fba11517e3b)
Repository Not Found for url: https://huggingface.co/ChatGLM2-6B/resolve/main/tokenizer_config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/xiao/ChatGLM-Finetuning/train.py", line 235, in <module>
main()
File "/home/xiao/ChatGLM-Finetuning/train.py", line 96, in main
tokenizer = MODE[args.mode]["tokenizer"].from_pretrained(args.model_name_or_path)
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1947, in from_pretrained
resolved_config_file = cached_file(
File "/home/xiao/ChatGLM-Finetuning/venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 451, in cached_file
raise EnvironmentError(
OSError: ChatGLM2-6B is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>
[2023-11-07 15:52:38,814] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 28520
[2023-11-07 15:52:38,815] [ERROR] [launch.py:434:sigkill_handler] ['/home/xiao/ChatGLM-Finetuning/venv/bin/python', '-u', 'train.py', '--local_rank=0', '--train_path', 'data/spo_0.json', '--model_name_or_path', 'ChatGLM2-6B', '--per_device_train_batch_size', '1', '--max_len', '768', '--max_src_len', '512', '--learning_rate', '1e-4', '--weight_decay', '0.1', '--num_train_epochs', '2', '--gradient_accumulation_steps', '4', '--warmup_ratio', '0.1', '--mode', 'glm', '--train_type', 'ptuning', '--seed', '1234', '--ds_file', 'ds_zero2_no_offload.json', '--gradient_checkpointing', '--show_loss_step', '10', '--pre_seq_len', '16', '--prefix_projection', 'True', '--output_dir', './output-glm'] exits with return code = 1
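
Reading the traceback: the final OSError says that ChatGLM2-6B is neither a local folder nor a valid model identifier, and the 401 above it comes from the resulting lookup of https://huggingface.co/ChatGLM2-6B/... on the Hub. One likely cause is that no directory named ChatGLM2-6B exists relative to the working directory (~/ChatGLM-Finetuning). A minimal sketch for checking that before launching; `explain_model_arg` is a hypothetical helper, not part of train.py or transformers, and it assumes a relative path is resolved against the directory the command is run from:

```python
from pathlib import Path


def explain_model_arg(model_name_or_path: str, cwd: str = ".") -> str:
    """Report how a --model_name_or_path value would be interpreted.

    Hypothetical helper for illustration only. It mirrors the wording of
    the OSError in the log: a value that is not an existing local folder
    ends up being looked up as a model identifier on the Hub.
    """
    local_dir = Path(cwd) / model_name_or_path
    if local_dir.is_dir():
        return f"local folder: {local_dir}"
    return (
        f"'{model_name_or_path}' is not a local folder under '{cwd}'; "
        "it would be looked up as a Hub model identifier"
    )


if __name__ == "__main__":
    # Run from the directory where the deepspeed command is launched.
    print(explain_model_arg("ChatGLM2-6B"))
```

If the second message is printed, either place the downloaded model files in a folder with that name, or pass the path (or full repo id, for example `THUDM/chatglm2-6b` if that is the intended model) to --model_name_or_path.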