
networkx.exception.NetworkXError: The node ETS is not in the digraph. #4183

Closed
LeonTing1010 opened this issue May 8, 2024 · 5 comments · Fixed by #4202
Labels: bug (Something isn't working) · module: timeseries (related to the timeseries module) · Needs Triage (Issue requires Triage) · priority: 0 (Maximum priority)

Comments

@LeonTing1010 reported the following traceback:

Traceback (most recent call last):
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/networkx/classes/digraph.py", line 927, in predecessors
return iter(self._pred[n])
KeyError: 'ETS'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/leo/web3/LLM/langchain/mlts/auto_gluon.py", line 32, in
forecast_entry_df = predictor.predict(train_data)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/autogluon/timeseries/predictor.py", line 845, in predict
predictions = self._learner.predict(
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/autogluon/timeseries/learner.py", line 185, in predict
return self.load_trainer().predict(
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/autogluon/timeseries/trainer/abstract_trainer.py", line 892, in predict
model_pred_dict = self.get_model_pred_dict(
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/autogluon/timeseries/trainer/abstract_trainer.py", line 1199, in get_model_pred_dict
pred_time_dict_total = self._get_total_pred_time_from_marginal(pred_time_dict_marginal)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/autogluon/timeseries/trainer/abstract_trainer.py", line 1211, in _get_total_pred_time_from_marginal
for base_model in self.get_minimum_model_set(model_name):
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/autogluon/timeseries/trainer/abstract_trainer.py", line 174, in get_minimum_model_set
minimum_model_set = list(nx.bfs_tree(self.model_graph, model, reverse=True))
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/networkx/algorithms/traversal/breadth_first_search.py", line 235, in bfs_tree
T.add_edges_from(edges_gen)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/networkx/classes/digraph.py", line 768, in add_edges_from
for e in ebunch_to_add:
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/networkx/algorithms/traversal/breadth_first_search.py", line 170, in bfs_edges
yield from generic_bfs_edges(G, source, successors, depth_limit, sort_neighbors)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/networkx/algorithms/traversal/breadth_first_search.py", line 77, in generic_bfs_edges
queue = deque([(source, depth_limit, neighbors(source))])
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/networkx/classes/digraph.py", line 929, in predecessors
raise NetworkXError(f"The node {n} is not in the digraph.") from err
networkx.exception.NetworkXError: The node ETS is not in the digraph.
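For context, networkx raises this error whenever `bfs_tree` is asked to traverse from a source node that is not present in the graph, which is what happens here: `get_minimum_model_set` asks for the ancestors of a model ('ETS') that is not a node in the trainer's `model_graph`. A minimal networkx-only sketch that reproduces the same exception (the node names and edge are made up for illustration):

```python
import networkx as nx

# Hypothetical model graph containing only the models that were actually registered.
model_graph = nx.DiGraph()
model_graph.add_edge("WeightedEnsemble", "SeasonalNaive")

# Asking for the ancestors of a model that is not a node in the graph raises
# NetworkXError: "The node ETS is not in the digraph."
try:
    list(nx.bfs_tree(model_graph, "ETS", reverse=True))
except nx.NetworkXError as err:
    print(err)
```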

@LeonTing1010 added the 'bug: unconfirmed' and 'Needs Triage' labels on May 8, 2024
@Innixma added the 'module: timeseries' label on May 8, 2024
@Innixma (Contributor) commented on May 8, 2024:

Hello @LeonTing1010,

Can you please provide additional details, such as a reproducible code example and the AutoGluon version used?
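For anyone else reporting the same issue, a minimal sketch for looking up the installed version (assuming the package was installed from PyPI as the `autogluon.timeseries` distribution):

```python
# Print the installed AutoGluon timeseries version to include in the report.
from importlib.metadata import version

print(version("autogluon.timeseries"))
```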

@Innixma added this to the 1.2 Release milestone on May 8, 2024
@vamshik113 commented:

Getting the same error, but for the DeepAR model.

Here are the logs:

`Warning: path already exists! This predictor may overwrite an existing predictor! path="output_model_2"
Beginning AutoGluon training...
AutoGluon will save models to 'output_model_2'
=================== System Info ===================
AutoGluon Version: 1.1.0
Python Version: 3.11.0
Operating System: Linux
Platform Machine: x86_64
Platform Version: #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024
CPU Count: 8
GPU Count: 0
Memory Avail: 24.42 GB / 31.05 GB (78.6%)
Disk Space Avail: 6.77 GB / 48.28 GB (14.0%)
WARNING: Available disk space is low and there is a risk that AutoGluon will run out of disk during fit, causing an exception.
We recommend a minimum available disk space of 10 GB, and large datasets may require more.

Setting presets to: best_quality

Fitting with arguments:
{'enable_ensemble': True,
'eval_metric': MAPE,
'freq': 'W-MON',
'hyperparameters': 'default',
'known_covariates_names': [],
'num_val_windows': 10,
'prediction_length': 16,
'quantile_levels': [0.1, 0.5, 0.9],
'random_seed': 123,
'refit_every_n_windows': 1,
'refit_full': True,
'skip_model_selection': False,
'target': 'ship_qty',
'verbosity': 2}

Provided train_data has 373779 rows, 1071 time series. Median time series length is 349 (min=349, max=349).

Provided data contains following columns:
target: 'ship_qty'
past_covariates:
categorical: ['category', 'market_code']
continuous (float): ['market_size_sales', 'stock_days', 'product_consumption', 'weighted_avg_product_price', 'price_gap_1', 'covid', ...]

To learn how to fix incorrectly inferred types, please see documentation for TimeSeriesPredictor.fit

AutoGluon will gauge predictive performance using evaluation metric: 'MAPE'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.

Starting training. Start time is 2024-05-10 11:50:25
Models that will be trained: ['SeasonalNaive', 'RecursiveTabular', 'DirectTabular', 'CrostonSBA', 'NPTS', 'DynamicOptimizedTheta', 'AutoETS', 'AutoARIMA', 'Chronos[base]', 'TemporalFusionTransformer', 'DeepAR', 'PatchTST']
Training timeseries model SeasonalNaive.
-1.9552 = Validation score (-MAPE)
17.19 s = Training runtime
1.04 s = Validation (prediction) runtime
Training timeseries model RecursiveTabular.
-1.7324 = Validation score (-MAPE)
3158.88 s = Training runtime
1.06 s = Validation (prediction) runtime
Training timeseries model DirectTabular.
-2.2581 = Validation score (-MAPE)
54.85 s = Training runtime
1.15 s = Validation (prediction) runtime
Training timeseries model CrostonSBA.
-1.9592 = Validation score (-MAPE)
32.32 s = Training runtime
1.63 s = Validation (prediction) runtime
Training timeseries model NPTS.
-1.5177 = Validation score (-MAPE)
124.04 s = Training runtime
16.45 s = Validation (prediction) runtime
Training timeseries model DynamicOptimizedTheta.
-2.1870 = Validation score (-MAPE)
193.77 s = Training runtime
32.40 s = Validation (prediction) runtime
Training timeseries model AutoETS.
-1.8336 = Validation score (-MAPE)
72.40 s = Training runtime
4.96 s = Validation (prediction) runtime
Training timeseries model AutoARIMA.
Warning: AutoARIMA/W4 failed for 12 time series (1.1%). Fallback model SeasonalNaive was used for these time series.
-1.4207 = Validation score (-MAPE)
673.52 s = Training runtime
94.51 s = Validation (prediction) runtime
Training timeseries model Chronos[base].
Warning: Exception caused Chronos[base] to fail during training... Skipping this model.
Chronos[base]/W0 requires a GPU to run, but no GPU was detected. Please make sure that you are using a computer with a CUDA-compatible GPU and import torch; torch.cuda.is_available() returns True.
Training timeseries model TemporalFusionTransformer.
Warning: Exception caused TemporalFusionTransformer to fail during training... Skipping this model.
Could not deserialize ATN with version � (expected 4).
Training timeseries model DeepAR.
Warning: Exception caused DeepAR to fail during training... Skipping this model.
Could not deserialize ATN with version � (expected 4).
Training timeseries model PatchTST.
Warning: Exception caused PatchTST to fail during training... Skipping this model.
Could not deserialize ATN with version � (expected 4).
Fitting simple weighted ensemble.
Ensemble weights: {'AutoARIMA': 0.59, 'NPTS': 0.41}
-1.3364 = Validation score (-MAPE)
44.60 s = Training runtime
110.96 s = Validation (prediction) runtime
Training complete. Models trained: ['SeasonalNaive', 'RecursiveTabular', 'DirectTabular', 'CrostonSBA', 'NPTS', 'DynamicOptimizedTheta', 'AutoETS', 'AutoARIMA', 'WeightedEnsemble']
Total runtime: 4539.08 s
Best model: WeightedEnsemble
Best model score: -1.3364
WARNING: refit_full functionality for TimeSeriesPredictor is experimental and is not yet supported by all models.
Refitting models via refit_full using all of the data (combined train and validation)...
Models trained in this way will have the suffix '_FULL' and have NaN validation score.
This process is not bound by time_limit, but should take less time than the original fit call.
Fitting model: SeasonalNaive_FULL | Skipping fit via cloning parent ...
Fitting model: RecursiveTabular_FULL
341.75 s = Training runtime
Fitting model: DirectTabular_FULL
4.26 s = Training runtime
Fitting model: CrostonSBA_FULL | Skipping fit via cloning parent ...
Fitting model: NPTS_FULL | Skipping fit via cloning parent ...
Fitting model: DynamicOptimizedTheta_FULL | Skipping fit via cloning parent ...
Fitting model: AutoETS_FULL | Skipping fit via cloning parent ...
Fitting model: AutoARIMA_FULL | Skipping fit via cloning parent ...
Fitting model: WeightedEnsemble_FULL | Skipping fit via cloning parent ...
Refit complete. Models trained: ['SeasonalNaive_FULL', 'RecursiveTabular_FULL', 'DirectTabular_FULL', 'CrostonSBA_FULL', 'NPTS_FULL', 'DynamicOptimizedTheta_FULL', 'AutoETS_FULL', 'AutoARIMA_FULL', 'WeightedEnsemble_FULL']
Total runtime: 346.19 s
Updated best model to 'WeightedEnsemble_FULL' (Previously 'WeightedEnsemble'). AutoGluon will default to using 'WeightedEnsemble_FULL' for predict().
Model not specified in predict, will default to the model with the best validation score: WeightedEnsemble_FULL`

And here is the error when predicting:

`---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/networkx/classes/digraph.py:936, in DiGraph.predecessors(self, n)
935 try:
--> 936 return iter(self._pred[n])
937 except KeyError as err:

KeyError: 'DeepAR_FULL'

The above exception was the direct cause of the following exception:

NetworkXError Traceback (most recent call last)
Cell In[10], line 27
10 predictor = TimeSeriesPredictor(
11 freq=frequency,
12 prediction_length=16,
(...)
16 quantile_levels=[0.1, 0.5, 0.9]
17 )
19 predictor.fit(
20 train_data,
21 presets=train_preset_type,
(...)
24
25 )
---> 27 predictions = predictor.predict(
28 data=train_data,
29 )
31 predictions_1 = predictions.reset_index()
34 model_name = "AutoGluon"

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/autogluon/timeseries/predictor.py:845, in TimeSeriesPredictor.predict(self, data, known_covariates, model, use_cache, random_seed)
843 if known_covariates is not None:
844 known_covariates = self._to_data_frame(known_covariates)
--> 845 predictions = self._learner.predict(
846 data,
847 known_covariates=known_covariates,
848 model=model,
849 use_cache=use_cache,
850 random_seed=random_seed,
851 )
852 return predictions.reindex(original_item_id_order, level=ITEMID)

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/autogluon/timeseries/learner.py:185, in TimeSeriesLearner.predict(self, data, known_covariates, model, use_cache, random_seed, **kwargs)
183 known_covariates = self.feature_generator.transform_future_known_covariates(known_covariates)
184 known_covariates = self._align_covariates_with_forecast_index(known_covariates=known_covariates, data=data)
--> 185 return self.load_trainer().predict(
186 data=data,
187 known_covariates=known_covariates,
188 model=model,
189 use_cache=use_cache,
190 random_seed=random_seed,
191 **kwargs,
192 )

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/autogluon/timeseries/trainer/abstract_trainer.py:892, in AbstractTimeSeriesTrainer.predict(self, data, known_covariates, model, use_cache, random_seed, **kwargs)
882 def predict(
883 self,
884 data: TimeSeriesDataFrame,
(...)
889 **kwargs,
890 ) -> TimeSeriesDataFrame:
891 model_name = self._get_model_for_prediction(model)
--> 892 model_pred_dict = self.get_model_pred_dict(
893 model_names=[model_name],
894 data=data,
895 known_covariates=known_covariates,
896 use_cache=use_cache,
897 random_seed=random_seed,
898 )
899 return model_pred_dict[model_name]

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/autogluon/timeseries/trainer/abstract_trainer.py:1199, in AbstractTimeSeriesTrainer.get_model_pred_dict(self, model_names, data, known_covariates, record_pred_time, raise_exception_if_failed, use_cache, random_seed)
1195 if self.cache_predictions and use_cache:
1196 self._save_cached_pred_dicts(
1197 dataset_hash, model_pred_dict=model_pred_dict, pred_time_dict=pred_time_dict_marginal
1198 )
-> 1199 pred_time_dict_total = self._get_total_pred_time_from_marginal(pred_time_dict_marginal)
1201 final_model_pred_dict = {model_name: model_pred_dict[model_name] for model_name in model_names}
1202 final_pred_time_dict_total = {model_name: pred_time_dict_total[model_name] for model_name in model_names}

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/autogluon/timeseries/trainer/abstract_trainer.py:1211, in AbstractTimeSeriesTrainer._get_total_pred_time_from_marginal(self, pred_time_dict_marginal)
1209 pred_time_dict_total = defaultdict(float)
1210 for model_name in pred_time_dict_marginal.keys():
-> 1211 for base_model in self.get_minimum_model_set(model_name):
1212 if pred_time_dict_marginal[base_model] is not None:
1213 pred_time_dict_total[model_name] += pred_time_dict_marginal[base_model]

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/autogluon/timeseries/trainer/abstract_trainer.py:174, in SimpleAbstractTrainer.get_minimum_model_set(self, model, include_self)
172 if not isinstance(model, str):
173 model = model.name
--> 174 minimum_model_set = list(nx.bfs_tree(self.model_graph, model, reverse=True))
175 if not include_self:
176 minimum_model_set = [m for m in minimum_model_set if m != model]

File <class 'networkx.utils.decorators.argmap'> compilation 4:3, in argmap_bfs_tree_1(G, source, reverse, depth_limit, sort_neighbors, backend, **backend_kwargs)
1 import bz2
2 import collections
----> 3 import gzip
4 import inspect
5 import itertools

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/networkx/utils/backends.py:633, in _dispatchable.call(self, backend, *args, **kwargs)
628 """Returns the result of the original function, or the backend function if
629 the backend is specified and that backend implements func."""
631 if not backends:
632 # Fast path if no backends are installed
--> 633 return self.orig_func(*args, **kwargs)
635 # Use backend_name in this function instead of backend
636 backend_name = backend

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/networkx/algorithms/traversal/breadth_first_search.py:285, in bfs_tree(G, source, reverse, depth_limit, sort_neighbors)
277 T.add_node(source)
278 edges_gen = bfs_edges(
279 G,
280 source,
(...)
283 sort_neighbors=sort_neighbors,
284 )
--> 285 T.add_edges_from(edges_gen)
286 return T

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/networkx/classes/digraph.py:774, in DiGraph.add_edges_from(self, ebunch_to_add, **attr)
719 def add_edges_from(self, ebunch_to_add, **attr):
720 """Add all the edges in ebunch_to_add.
721
722 Parameters
(...)
772 >>> G.add_edges_from(list((5, n) for n in G.nodes))
773 """
--> 774 for e in ebunch_to_add:
775 ne = len(e)
776 if ne == 3:

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/networkx/algorithms/traversal/breadth_first_search.py:218, in bfs_edges(G, source, reverse, depth_limit, sort_neighbors)
214 yield from generic_bfs_edges(
215 G, source, lambda node: iter(sort_neighbors(successors(node))), depth_limit
216 )
217 else:
--> 218 yield from generic_bfs_edges(G, source, successors, depth_limit)

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/networkx/algorithms/traversal/breadth_first_search.py:117, in generic_bfs_edges(G, source, neighbors, depth_limit, sort_neighbors)
115 n = len(G)
116 depth = 0
--> 117 next_parents_children = [(source, neighbors(source))]
118 while next_parents_children and depth < depth_limit:
119 this_parents_children = next_parents_children

File ~/vamshi/ml-platform/.venv/lib/python3.11/site-packages/networkx/classes/digraph.py:938, in DiGraph.predecessors(self, n)
936 return iter(self._pred[n])
937 except KeyError as err:
--> 938 raise NetworkXError(f"The node {n} is not in the digraph.") from err

NetworkXError: The node DeepAR_FULL is not in the digraph.`

@shchur (Collaborator) commented on May 14, 2024:

@vamshik113 according to the log, you are saving the predictor to a directory that already contains a trained predictor. Can you please try removing the directory or providing a new path to the predictor and see if the problem persists?
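A minimal sketch of the suggested workaround, based on the fit arguments shown in the log above (the new directory name is a placeholder):

```python
from autogluon.timeseries import TimeSeriesPredictor

# Point the predictor at a directory that does not exist yet, so that no
# predictor.pkl / trainer.pkl / model files from a previous run can interfere.
predictor = TimeSeriesPredictor(
    path="output_model_3",            # fresh directory (placeholder name)
    freq="W-MON",
    prediction_length=16,
    target="ship_qty",
    quantile_levels=[0.1, 0.5, 0.9],
)
```

`fit()` and `predict()` can then be called exactly as in the original snippet.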

@Innixma (Contributor) commented on May 15, 2024:

@shchur Even if they are saving to an existing predictor location, shouldn't the logic work correctly without issue? The only way I'd foresee an error is if:

1. The old predictor.pkl / trainer.pkl / learner.pkl isn't overwritten,
2. The new model files do not overwrite the old ones, or
3. We have logic that looks for the existence of files to determine what to do next, which introduces bugs when re-using the same save location as an old run.
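For anyone debugging this, a quick diagnostic sketch to see which models a previously saved predictor actually reports (the path "output_model_2" is taken from the log above; per that log, refit models should appear with the '_FULL' suffix and a NaN validation score):

```python
from autogluon.timeseries import TimeSeriesPredictor

# Reload the predictor saved by the earlier fit() call and list the models it
# reports, to check whether the refit *_FULL models were actually registered.
predictor = TimeSeriesPredictor.load("output_model_2")
print(predictor.leaderboard())
```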

@Innixma added the 'bug' label and removed the 'bug: unconfirmed' label on May 15, 2024
@Innixma changed the milestone from 1.2 Release to 1.1.1 Release on May 15, 2024
@Innixma added the 'priority: 0' label on May 15, 2024
@vamshik113 commented:

@shchur Yes, I figured out that the issue was caused by the existing directory that already contained trained predictors. After deleting it, the issue was resolved.
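For anyone hitting the same error, a small sketch of the cleanup step (assuming the stale directory, "output_model_2" in the log above, is safe to delete):

```python
import shutil
from pathlib import Path

# Remove the predictor directory left over from a previous run so that the
# next fit() starts from a clean state instead of mixing old and new artifacts.
stale_path = Path("output_model_2")
if stale_path.exists():
    shutil.rmtree(stale_path)
```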
