Online Training and Incremental Training for streaming data (New data) #1731

Open
Rajaram1604 opened this issue Feb 21, 2024 · 3 comments

@Rajaram1604

Rajaram1604 commented Feb 21, 2024

I need to train a model online/incrementally on streaming data, updating it as soon as new data becomes available.

When I call fit() with BayesianEstimator on only the new data, starting from the existing model, it wipes out what the model learned from the old data.

Please see #1167, which describes the same problem.

Also, when I use fit_update() to merge the new data into the existing model, it fails with: ValueError: Data contains unexpected states for variable:

For now I am combining the old and new data and calling fit(), which trains the model from scratch. This approach does not suit me because the old data is too large, so I don't want to retrain from scratch every time new data becomes available.

I want to update the existing model, not wipe or overwrite what it has already learned.

Is either of the following possible?

  1. Online/incremental training on streaming data without structural change (no node or parameter changes) for a Bayesian network model using pgmpy.
  2. Online/incremental training on streaming data with structural change (when a new node or parameter needs to be added to the previous model) for a Bayesian network model using pgmpy.
@Rajaram1604 Rajaram1604 changed the title Online Training and Incremental Training for streaming data Online Training and Incremental Training for streaming data (New data) Feb 21, 2024
@ankurankan
Member

@Rajaram1604

> Online/incremental training on streaming data without structural change (no node or parameter changes) for a Bayesian network model using pgmpy.

I think you should be able to use the fit_update method to do incremental updates of only the parameters. The error you mentioned is probably happening because some of the states that the variables take in your new data were not present in the previous training dataset. pgmpy currently cannot handle this automatically, but you can pass all possible states of each variable in the initial fit call through the state_names argument. This should fix the error you are getting with fit_update.
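
To illustrate why declaring every state up front matters, here is a minimal, library-independent sketch of a count-based incremental estimate for a single discrete variable. It is not pgmpy's implementation; the class `IncrementalCategorical` and its methods are hypothetical names used only for illustration.

```python
from collections import Counter


class IncrementalCategorical:
    """Running estimate of P(X) for one discrete variable with a fixed state list.

    Hypothetical helper for illustration; not part of pgmpy.
    """

    def __init__(self, states):
        # All possible states are declared once, before any data is seen.
        self.states = list(states)
        self.counts = {s: 0 for s in self.states}

    def update(self, batch):
        # Reject values outside the declared state space, similar in spirit
        # to the "unexpected states" error described above.
        unexpected = set(batch) - set(self.states)
        if unexpected:
            raise ValueError(f"Data contains unexpected states: {sorted(unexpected)}")
        # Add the new counts on top of the old ones; nothing is retrained.
        for value, n in Counter(batch).items():
            self.counts[value] += n

    def probabilities(self):
        total = sum(self.counts.values())
        if total == 0:
            # No data yet: fall back to a uniform distribution.
            return {s: 1 / len(self.states) for s in self.states}
        return {s: c / total for s, c in self.counts.items()}


est = IncrementalCategorical(["low", "mid", "high"])
est.update(["low", "low", "mid"])  # the first batch never contains "high"
est.update(["high", "low"])        # a later batch introduces "high" without error
probs = est.probabilities()
```

If the estimator had been created with only the states seen in the first batch, the second update would raise instead of extending the table. That is the situation passing state_names to the initial fit call is meant to avoid.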

> Online/incremental training on streaming data with structural change (when a new node or parameter needs to be added to the previous model) for a Bayesian network model using pgmpy.

Technically, this is possible with the HillClimbSearch algorithm (https://pgmpy.org/structure_estimator/hill.html) by passing the previous DAG as the start_dag argument. However, because hill climbing is a local optimization method, I am not sure how well this approach would work. I think there must be better methods for this, but I am not familiar with the literature on the topic.

@Rajaram1604
Author

Thanks, Ankur, for the quick response. Yes, it works after passing state_names with all the possible values that may appear in later updates when fitting the model on the initial data, i.e. when calling fit().

Also, regarding structural change, I have gone through the shared link. As I understand it, HillClimbSearch and the other supported algorithms such as PC derive the nodes and edges from the training data.

But my question is different: I need to add new nodes (latent variables) and edges to the previously trained model without retraining it from scratch, so that the model adapts to the node and edge changes incrementally.

@ankurankan
Member

@Rajaram1604

> Also, regarding structural change, I have gone through the shared link. As I understand it, HillClimbSearch and the other supported algorithms such as PC derive the nodes and edges from the training data.

In this case, if you already know all possible nodes, you can simply add them to the initial learned structure as disconnected nodes and then pass it as the starting point for the next iteration.
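
As a rough, library-independent sketch of that step, the learned structure can be extended with the known nodes while leaving its edges untouched. The function `extend_start_dag` and the node names are hypothetical; in pgmpy the resulting graph would be what gets passed as start_dag to the next HillClimbSearch run.

```python
def extend_start_dag(nodes, edges, known_nodes):
    """Return (nodes, edges) with any missing known_nodes added as isolated nodes.

    Hypothetical helper for illustration; not part of pgmpy.
    """
    # dict.fromkeys removes duplicates while preserving insertion order.
    extended_nodes = list(dict.fromkeys(list(nodes) + list(known_nodes)))
    # Edges are copied unchanged: new nodes start with no parents and no children.
    return extended_nodes, list(edges)


# Structure learned in the previous iteration (made-up example).
learned_nodes = ["A", "B", "C"]
learned_edges = [("A", "B"), ("B", "C")]

# "D" and "E" are known in advance but have not been connected yet.
start_nodes, start_edges = extend_start_dag(
    learned_nodes, learned_edges, ["C", "D", "E"]
)
```

The search in the next iteration is then free to attach the new nodes wherever the data supports an edge, while starting from the structure learned so far.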

Also, since you mentioned that you have latent variables in the model: learning those isn't possible with pgmpy yet. I also looked at the literature, and there seem to be better ways to do online structure learning. I think this paper might be quite relevant to your problem setting: https://arxiv.org/pdf/1904.13247.pdf
