The training process cannot continue #1536

Open
xgySTATISICT opened this issue Jul 4, 2023 · 7 comments
Comments

@xgySTATISICT

I tried to train, but the logs stopped updating at this step, even after 12 hours.
[Screenshot of the stalled training log; image not preserved.]

@DamithDR
Contributor

DamithDR commented Jul 4, 2023

@xgySTATISICT Can you post your configurations used to train the model?

@songzetao

I ran into the same problem. I tried both CPU and GPU, but training would not continue on either. Here is my configuration:

model = ClassificationModel(
    Model1,
    Model2,
    args={
        'num_train_epochs': 1,
        'overwrite_output_dir': True,
        'use_early_stopping': False,
        'use_cuda': False,
        'train_batch_size': 50,
        'do_lower_case': True,
        'silent': False,
        'no_cache': True,
        'no_save': True,
    },
)

# Train the model
model.train_model(train_df)

@DamithDR
Contributor

DamithDR commented Jul 28, 2023

@songzetao I have encountered a similar problem, and the following workaround helped; you may want to try it too. Add the following to your configuration. Basically, it turns off multiprocessing.

use_multiprocessing = False
use_multiprocessing_for_evaluation = False
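
As a minimal sketch of how this might be applied to the configuration posted earlier in this thread: the two settings are merged into the same `args` dictionary as the other options. That they are accepted as keys of that dictionary is an assumption here; the thread does not show the exact placement. The names `base_args` and `workaround` are illustrative only.

```python
# Options copied from the configuration posted earlier in this thread.
base_args = {
    "num_train_epochs": 1,
    "overwrite_output_dir": True,
    "use_early_stopping": False,
    "use_cuda": False,
    "train_batch_size": 50,
    "do_lower_case": True,
    "silent": False,
    "no_cache": True,
    "no_save": True,
}

# Workaround from this comment: turn off multiprocessing.
# Assumption: these are passed as keys of the same args dict.
workaround = {
    "use_multiprocessing": False,
    "use_multiprocessing_for_evaluation": False,
}

# Later keys win, so the workaround overrides any earlier value.
args = {**base_args, **workaround}

# The model would then be created as before (Model1, Model2 and train_df
# are the placeholders used in the original post):
# model = ClassificationModel(Model1, Model2, args=args)
# model.train_model(train_df)
```

Keeping the workaround in its own dictionary makes it easy to remove again once the underlying hang is understood.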

@songzetao

@DamithDR Thank you very much for your answer. It really worked. Thank you again!😊

@DamithDR
Contributor

@songzetao Glad it worked :)

@swardiantara

I encountered the same problem. I have tried several fixes suggested by others, as shown below.

args.use_multiprocessing = False
args.use_multiprocessing_for_evaluation = False
args.process_count = 1

os.environ["TOKENIZERS_PARALLELISM"] = "false"

But the training still gets stuck at: Converting to features started. Cache is not used.

@DamithDR
Contributor

@swardiantara Can you post any logs you get, and maybe a screenshot of where it got stuck?
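
When a run hangs without printing anything further, one way to produce a log worth posting is Python's standard `faulthandler` module, which can dump the stack of every thread if a call is still running after a timeout. The sketch below is only a suggestion and not part of the library discussed here; the helper name `run_with_hang_dump` and the 600-second default are hypothetical.

```python
import faulthandler
import sys
import time


def run_with_hang_dump(fn, timeout_s=600.0, log_file=sys.stderr):
    """Run fn(). While it is still running, dump all thread stacks to
    log_file every timeout_s seconds, so a hang leaves a traceback."""
    # log_file must be a real file with a file descriptor (fileno()).
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    try:
        return fn()
    finally:
        # Stop the watchdog once fn returns or raises.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Stand-in for a long call such as model.train_model(train_df).
    def slow_step():
        time.sleep(0.3)
        return "finished"

    print(run_with_hang_dump(slow_step, timeout_s=0.1))
```

In a real run the training call would be wrapped as `run_with_hang_dump(lambda: model.train_model(train_df))`. The dump covers only the process that installed it, so if the stall is inside a worker process, the main-process trace will most likely just show it waiting on that worker.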


4 participants