RuntimeError: context has already been set - Finetuning deepdoctection #313

Open
ghost opened this issue Apr 12, 2024 · 3 comments
Labels
bug Something isn't working

Comments

ghost commented Apr 12, 2024

Hello,
I am trying to fine-tune deepdoctection's Tensorpack model on the fintabnet dataset. I followed the instructions in the fine-tuning notebook, but I got this error:
RuntimeError Traceback (most recent call last)
Cell In[140], line 1
----> 1 temp = dd.train_faster_rcnn(path_config_yaml=path_config_yaml,
2 dataset_train=fintabnet,
3 path_weights=path_weights,
4 config_overwrite=config_overwrite,
5 log_dir="/kaggle/working/logs/",
6 build_train_config=build_train_config,
7 dataset_val=dataset_val,
8 build_val_config=build_val_config,
9 metric_name="coco",
10 pipeline_component_name="ImageLayoutService"
11 )
File /kaggle/working/deepdoctection/deepdoctection/train/tp_frcnn_train.py:253, in train_faster_rcnn(path_config_yaml, dataset_train, path_weights, config_overwrite, log_dir, build_train_config, dataset_val, build_val_config, metric_name, metric, pipeline_component_name)
File /kaggle/working/deepdoctection/deepdoctection/train/tp_frcnn_train.py:136, in get_train_dataflow(dataset, config, use_multi_proc_for_train, **build_train_kwargs)
File /kaggle/working/deepdoctection/deepdoctection/utils/file_utils.py:657, in set_mp_spawn()
655 if not _S.mp_context_set:
656 _S.freeze(False)
--> 657 mp.set_start_method("spawn")
658 _S.mp_context_set = True
659 _S.freeze()
File /opt/conda/lib/python3.10/multiprocessing/context.py:247, in DefaultContext.set_start_method(self, method, force)
245 def set_start_method(self, method, force=False):
246 if self._actual_context is not None and not force:
--> 247 raise RuntimeError('context has already been set')
248 if method is None and force:
249 self._actual_context = None
RuntimeError: context has already been set
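
For reference, the message itself comes from Python's standard library: multiprocessing.set_start_method() refuses a second call once the start method has been fixed, unless force=True is passed. Below is a minimal sketch, independent of deepdoctection, that reproduces the message. The function name is made up for illustration.

```python
import multiprocessing as mp


def set_start_method_twice() -> str:
    """Fix the start method, then try to set it again without force."""
    # force=True lets this first call succeed whatever the earlier state was.
    mp.set_start_method("spawn", force=True)
    try:
        # Same call as the one marked in the traceback above.
        mp.set_start_method("spawn")
    except RuntimeError as err:
        return str(err)
    return "no error"


print(set_start_method_twice())  # prints: context has already been set
```

Anything that fixes the context earlier in the same kernel session, such as a previous set_start_method() call or creating a multiprocessing.Pool or Queue, leads to the same RuntimeError. Whether that is what happens in this notebook is an assumption; the traceback does not confirm it.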

My code:
temp = dd.train_faster_rcnn(
    path_config_yaml=path_config_yaml,
    dataset_train=fintabnet,
    path_weights=path_weights,
    config_overwrite=config_overwrite,
    log_dir="/kaggle/working/logs/",
    build_train_config=build_train_config,
    dataset_val=dataset_val,
    build_val_config=build_val_config,
    metric_name="coco",
    pipeline_component_name="ImageLayoutService",
)

Log :
log-3.log

Contributor

JaMe76 commented Apr 12, 2024

In build_train_config_overwrite, add the argument "use_multi_proc=False". If you are running evaluation, do the same for build_val_config_overwrite.
Fintabnet converts PDFs into images, and the default setting uses multiprocessing to speed this up. Unfortunately, this collides with the multiprocessing image loading used during training, so you need to switch off one of the multiprocessing transformations in the training dataflow.

I hope this works.

Author

ghost commented Apr 13, 2024

I get the same error when I add use_multi_proc=False:

import os
import deepdoctection as dd

fintabnet = dd.get_dataset("fintabnet")
fintabnet.dataflow.categories.set_cat_to_sub_cat({"item": "item"})
fintabnet.dataflow.categories.filter_categories(["row", "column"])
# path_config_yaml = os.path.join(dd.get_configs_dir_path(), "tp/cell/conf_frcnn_cell.yaml")
path_config_yaml = "/kaggle/input/yamlfile/conf_frcnn_cell.yaml"
path_weights = os.path.join(dd.get_weights_dir_path(), "item/model-1750000.data-00000-of-00001")
dataset_train = fintabnet
config_overwrite = [
    "TRAIN.STEPS_PER_EPOCH=5000",
    "TRAIN.EVAL_PERIOD=20",
    "TRAIN.STARTING_EPOCH=1",
    "PREPROC.TRAIN_SHORT_EDGE_SIZE=[400,600]",
    "TRAIN.CHECKPOINT_PERIOD=20",
    "TRAIN.LR_SCHEDULE=1x",
    "BACKBONE.FREEZE_AT=0",
]
# Dataflow build options are passed as "key=value" strings.
build_train_config = ["max_datapoints=10000", "rows_and_cols=True", "use_multi_proc=False"]
dataset_val = fintabnet
build_val_config = ["max_datapoints=4000", "rows_and_cols=True", "use_multi_proc=False"]

RuntimeError Traceback (most recent call last)
Cell In[23], line 1
----> 1 temp = dd.train_faster_rcnn(path_config_yaml=path_config_yaml,
2 dataset_train=fintabnet,
3 path_weights=path_weights,
4 config_overwrite=config_overwrite,
5 log_dir="/kaggle/working/logs/",
6 build_train_config=build_train_config,
7 dataset_val=dataset_val,
8 build_val_config=build_val_config,
9 metric_name="coco",
10 pipeline_component_name="ImageLayoutService"
11 )

File /kaggle/working/deepdoctection/deepdoctection/train/tp_frcnn_train.py:253, in train_faster_rcnn(path_config_yaml, dataset_train, path_weights, config_overwrite, log_dir, build_train_config, dataset_val, build_val_config, metric_name, metric, pipeline_component_name)
249 model = ResNetFPNModel(config=config)
251 warmup_schedule, lr_schedule, step_number = train_frcnn_config(config)
--> 253 train_dataflow = get_train_dataflow(dataset_train, config, True, **build_train_dict)
254 # This is what's commonly referred to as "epochs"
256 try:

File /kaggle/working/deepdoctection/deepdoctection/train/tp_frcnn_train.py:136, in get_train_dataflow(dataset, config, use_multi_proc_for_train, **build_train_kwargs)
122 def get_train_dataflow(
123 dataset: DatasetBase, config: AttrDict, use_multi_proc_for_train: bool, **build_train_kwargs: str
124 ) -> DataFlow:
125 """
126 Return a dataflow for training TP Frcnn. The returned dataflow depends on the dataset and the configuration of
127 the model, as the augmentation is part of the data preparation.
(...)
133 :return: A dataflow
134 """
--> 136 set_mp_spawn()
137 cfg = config
138 df = dataset.dataflow.build(**build_train_kwargs)

File /kaggle/working/deepdoctection/deepdoctection/utils/file_utils.py:657, in set_mp_spawn()
655 if not _S.mp_context_set:
656 _S.freeze(False)
--> 657 mp.set_start_method("spawn")
658 _S.mp_context_set = True
659 _S.freeze()

File /opt/conda/lib/python3.10/multiprocessing/context.py:247, in DefaultContext.set_start_method(self, method, force)
245 def set_start_method(self, method, force=False):
246 if self._actual_context is not None and not force:
--> 247 raise RuntimeError('context has already been set')
248 if method is None and force:
249 self._actual_context = None

RuntimeError: context has already been set
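
Note that the traceback shows set_mp_spawn() being called on line 136 of get_train_dataflow, before dataset.dataflow.build(**build_train_kwargs) on line 138, so the build options cannot influence that call. A quick way to check whether the start method was already fixed in the current kernel before calling dd.train_faster_rcnn is the standard-library call multiprocessing.get_start_method(allow_none=True), which reports the state without changing it. This is a diagnostic sketch only; the helper name is hypothetical and not part of deepdoctection.

```python
import multiprocessing as mp


def describe_start_method() -> str:
    """Report whether the multiprocessing start method is already fixed."""
    # allow_none=True returns None instead of fixing the default context.
    method = mp.get_start_method(allow_none=True)
    if method is None:
        return "start method not fixed yet"
    return f"start method already fixed to {method!r}"


print(describe_start_method())
```

If this reports an already fixed method in a cell run just before training, a plain mp.set_start_method("spawn") call like the one in the traceback would be expected to raise.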

Author

ghost commented Apr 13, 2024

When I add use_multi_proc_for_train=False instead, I get a different error:
fintabnet = dd.get_dataset("fintabnet")
fintabnet.dataflow.categories.set_cat_to_sub_cat({"item": "item"})
fintabnet.dataflow.categories.filter_categories(["row", "column"])
# path_config_yaml = os.path.join(dd.get_configs_dir_path(), "tp/cell/conf_frcnn_cell.yaml")
path_config_yaml = "/kaggle/input/yamlfile/conf_frcnn_cell.yaml"
path_weights = os.path.join(dd.get_weights_dir_path(), "item/model-1750000.data-00000-of-00001")
dataset_train = fintabnet
config_overwrite = [
    "TRAIN.STEPS_PER_EPOCH=5000",
    "TRAIN.EVAL_PERIOD=20",
    "TRAIN.STARTING_EPOCH=1",
    "PREPROC.TRAIN_SHORT_EDGE_SIZE=[400,600]",
    "TRAIN.CHECKPOINT_PERIOD=20",
    "TRAIN.LR_SCHEDULE=1x",
    "BACKBONE.FREEZE_AT=0",
]
build_train_config = ["max_datapoints=10000", "rows_and_cols=True", "use_multi_proc_for_train=False"]
dataset_val = fintabnet
build_val_config = ["max_datapoints=4000", "rows_and_cols=True", "use_multi_proc_for_val=False"]

TypeError Traceback (most recent call last)
Cell In[53], line 1
----> 1 temp = dd.train_faster_rcnn(path_config_yaml=path_config_yaml,
2 dataset_train=fintabnet,
3 path_weights=path_weights,
4 config_overwrite=config_overwrite,
5 log_dir="/kaggle/working/logs/",
6 build_train_config=build_train_config,
7 dataset_val=dataset_val,
8 build_val_config=build_val_config,
9 metric_name="coco",
10 pipeline_component_name="ImageLayoutService"
11 )

File /kaggle/working/deepdoctection/deepdoctection/train/tp_frcnn_train.py:253, in train_faster_rcnn(path_config_yaml, dataset_train, path_weights, config_overwrite, log_dir, build_train_config, dataset_val, build_val_config, metric_name, metric, pipeline_component_name)
249 model = ResNetFPNModel(config=config)
251 warmup_schedule, lr_schedule, step_number = train_frcnn_config(config)
--> 253 train_dataflow = get_train_dataflow(dataset_train, config, True, **build_train_dict)
254 # This is what's commonly referred to as "epochs"
256 try:
TypeError: get_train_dataflow() got multiple values for argument 'use_multi_proc_for_train'
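
This TypeError is ordinary Python argument binding: the traceback shows get_train_dataflow(dataset_train, config, True, **build_train_dict), so use_multi_proc_for_train is already supplied positionally as True, and a build option with the same key supplies it a second time. The sketch below reproduces this with a stand-in function that only copies the signature shape from the traceback. parse_overrides is a hypothetical helper, and the assumption that the "key=value" strings are split into a dict this way is mine, not confirmed by the thread.

```python
def get_train_dataflow(dataset, config, use_multi_proc_for_train, **build_train_kwargs):
    """Stand-in with the same parameter layout as in the traceback."""
    return use_multi_proc_for_train, build_train_kwargs


def parse_overrides(items):
    """Hypothetical helper: turn 'key=value' strings into a dict of strings."""
    return dict(item.split("=", 1) for item in items)


# A key that is not a named parameter lands in **build_train_kwargs.
ok = parse_overrides(["max_datapoints=10000", "use_multi_proc=False"])
flag, kwargs = get_train_dataflow("dataset", {}, True, **ok)

# A key equal to a named parameter collides with the positional True.
clash = parse_overrides(["max_datapoints=10000", "use_multi_proc_for_train=False"])
try:
    get_train_dataflow("dataset", {}, True, **clash)
    message = "no error"
except TypeError as err:
    message = str(err)

print(message)
```

So use_multi_proc_for_train is not a dataflow build option in this call path. It is a parameter of get_train_dataflow that train_faster_rcnn sets itself.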

JaMe76 added the bug label on Apr 15, 2024.