Fine-tuning IndexError: list index out of range #215

aolerv opened this issue Jan 5, 2024 · 1 comment
aolerv commented Jan 5, 2024

(rwkv5_py310) root@autodl-container-f97d11abac-813971fc:~/autodl-tmp/RWKV-LM-main/RWKV-v5# ./demo-training-run.sh
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb972vb43
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb972vb43/_remote_module_non_scriptable.py
INFO:pytorch_lightning.utilities.rank_zero:########## work in progress ##########
/root/miniconda3/envs/rwkv5_py310/lib/python3.10/site-packages/pydantic/_internal/_config.py:321: UserWarning: Valid config keys have changed in V2:
* 'allow_population_by_field_name' has been renamed to 'populate_by_name'
* 'validate_all' has been renamed to 'validate_default'
  warnings.warn(message, UserWarning)
/root/miniconda3/envs/rwkv5_py310/lib/python3.10/site-packages/pydantic/_internal/fields.py:149: UserWarning: Field "model_persistence_threshold" has conflict with protected namespace "model".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/miniconda3/envs/rwkv5_py310/lib/python3.10/site-packages/pydantic/_internal/_config.py:321: UserWarning: Valid config keys have changed in V2:
* 'validate_all' has been renamed to 'validate_default'
  warnings.warn(message, UserWarning)
Files in model/0.1-1: ['.ipynb_checkpoints']
Traceback (most recent call last):
  File "/root/autodl-tmp/RWKV-LM-main/RWKV-v5/train.py", line 165, in <module>
    max_p = list_p[-1]
IndexError: list index out of range

#!/bin/bash

BASE_NAME="model/0.1-1"
N_LAYER="32"
N_EMBD="2560"
M_BSZ="16" # takes 16G VRAM (reduce this to save VRAM)
LR_INIT="1e-5"
LR_FINAL="1e-5"
GRAD_CP=0 # set to 1 to save VRAM (will be slower)
EPOCH_SAVE=10

# magic_prime = the largest 3n+2 prime smaller than datalen/ctxlen-1 (= 1498226207/512-1 = 2926222.06 in this case)
# use https://www.dcode.fr/prime-numbers-search

python train.py --load_model "/root/autodl-tmp/RWKV-LM-main/RWKV-v5/rwkv-5-World-3B-v2-20231113-ctx4096.pth" --wandb "RWKV-5-Test" --proj_dir $BASE_NAME \
 --ctx_len 4096 --my_pile_stage 3 --epoch_count 999999 --epoch_begin 0 \
 --data_file "text" --my_exit_tokens 20021619 --magic_prime 4877 \
 --num_nodes 1 --micro_bsz $M_BSZ --n_layer $N_LAYER --n_embd $N_EMBD --pre_ffn 0 --head_qk 0 \
 --lr_init $LR_INIT --lr_final $LR_FINAL --warmup_steps 10 --beta1 0.9 --beta2 0.99 --adam_eps 1e-8 --my_pile_edecay 0 --data_type "binidx" --vocab_size 65536 \
 --weight_decay 0.001 --epoch_save $EPOCH_SAVE --head_size_a 64 \
 --accelerator gpu --devices 1 --precision bf16 --strategy deepspeed_stage_2 --grad_cp $GRAD_CP --enable_progress_bar True --ds_bucket_mb 200

The environment was installed following these instructions:
pip install torch==1.13.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install pytorch-lightning==1.9.5 deepspeed==0.7.0 wandb ninja
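
For reference, the magic_prime rule quoted in the script comment (the largest prime of the form 3n+2 below datalen/ctxlen-1) can be checked locally instead of through the website. The following is only a sketch: the function names are made up for illustration, and it assumes that the value passed to --my_exit_tokens (20021619) is the dataset length in tokens.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test; fast enough for values of this size."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def largest_magic_prime(data_len: int, ctx_len: int) -> int:
    """Largest prime p with p % 3 == 2 and p < data_len / ctx_len - 1.

    Hypothetical helper, not part of RWKV-LM.
    """
    limit = data_len / ctx_len - 1
    candidate = math.ceil(limit) - 1  # largest integer strictly below limit
    while candidate >= 2:
        if candidate % 3 == 2 and is_prime(candidate):
            return candidate
        candidate -= 1
    raise ValueError("no 3n+2 prime below data_len / ctx_len - 1")


# Values from the command above: 20021619 tokens (assumed), ctx_len 4096.
print(largest_magic_prime(20021619, 4096))  # 4877
```

With these inputs the result matches the --magic_prime 4877 used in the command, so the IndexError does not appear to come from that argument.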

@aolerv aolerv closed this as completed Jan 5, 2024
@aolerv aolerv reopened this Jan 5, 2024
BlinkDL (Owner) commented Jan 15, 2024

Rename rwkv-5-World-3B-v2-20231113-ctx4096.pth to rwkv-init.pth and put it in $BASE_NAME (model/0.1-1 in your script).
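
A minimal sketch of that step, for anyone who prefers to script it. The helper names are hypothetical and not part of RWKV-LM. It rests on an assumption suggested by the log, which shows model/0.1-1 containing only .ipynb_checkpoints when `list_p[-1]` fails: that train.py looks in the project directory for checkpoint files named rwkv-*.pth. The sketch copies the file rather than moving it, so the original download stays in place.

```python
import shutil
from pathlib import Path


def has_rwkv_checkpoint(proj_dir: str) -> bool:
    """True if proj_dir holds a file that looks like rwkv-*.pth.

    Assumption: this mirrors the kind of file train.py expects to find.
    """
    d = Path(proj_dir)
    if not d.is_dir():
        return False
    return any(
        p.is_file() and p.name.startswith("rwkv") and p.suffix == ".pth"
        for p in d.iterdir()
    )


def place_init_checkpoint(checkpoint: str, proj_dir: str) -> Path:
    """Copy the base model into proj_dir under the name rwkv-init.pth."""
    dst_dir = Path(proj_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / "rwkv-init.pth"
    if not dst.exists():  # do not overwrite an existing init checkpoint
        shutil.copy2(checkpoint, dst)
    return dst


# Example call, using the paths from the issue (run from the RWKV-v5 directory):
# place_init_checkpoint(
#     "/root/autodl-tmp/RWKV-LM-main/RWKV-v5/rwkv-5-World-3B-v2-20231113-ctx4096.pth",
#     "model/0.1-1",
# )
```

Calling `has_rwkv_checkpoint("model/0.1-1")` before launching ./demo-training-run.sh is a quick way to confirm the directory is no longer empty.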
