
Parameter max_epochs isn't respected from history #674

Open
ratteripenta opened this issue Jul 28, 2020 · 8 comments

Comments

@ratteripenta

ratteripenta commented Jul 28, 2020

Looking at the code in the NeuralNet.fit_loop function:

epochs = epochs if epochs is not None else self.max_epochs

From what I understand, max_epochs is actually applied once per call to fit. With early stopping and continued training, however, what I'd expect is that the model is trained for at most max_epochs in total.

In other words, in my opinion, training should not proceed further than what has been defined by max_epochs (i.e. net.history[-1].epoch should never exceed net.max_epochs).
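To make the current behavior concrete, here is a minimal toy sketch. ToyNet is a hypothetical stand-in for skorch's NeuralNet, not the real class: each call to fit runs another max_epochs epochs, so the history keeps growing past max_epochs.

```python
class ToyNet:
    """Hypothetical stand-in for skorch's NeuralNet, for illustration only."""

    def __init__(self, max_epochs=10):
        self.max_epochs = max_epochs
        self.history = []  # one entry per completed epoch

    def fit(self, epochs=None):
        # Mirrors the line from fit_loop quoted above.
        epochs = epochs if epochs is not None else self.max_epochs
        for _ in range(epochs):
            self.history.append({'epoch': len(self.history) + 1})
        return self


net = ToyNet(max_epochs=10)
net.fit()
net.fit()  # "continued training" runs another full 10 epochs
print(net.history[-1]['epoch'])  # 20, i.e. twice max_epochs
```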

@thomasjpfan
Member

Thank you for raising this issue!

I agree the current behavior of max_epochs is strange. It is not consistent with sklearn, for example:

from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
X, y = make_regression(n_features=4, n_informative=2,
                       random_state=0, shuffle=False)
regr = RandomForestRegressor(n_estimators=100,
                             max_depth=2, random_state=0,
                             warm_start=True)
regr.fit(X, y)
len(regr.estimators_)
# 100

regr.fit(X, y)
# UserWarning: Warm-start fitting without increasing n_estimators does not fit new trees.

@ratteripenta
Author

Would changing this behavior break something somewhere else?

skorch/skorch/net.py

Lines 714 to 724 in 7a84568

epochs = epochs if epochs is not None else self.max_epochs

dataset_train, dataset_valid = self.get_split_datasets(
    X, y, **fit_params)
on_epoch_kwargs = {
    'dataset_train': dataset_train,
    'dataset_valid': dataset_valid,
}

for _ in range(epochs):
    self.notify('on_epoch_begin', **on_epoch_kwargs)

I'm thinking along the lines of making the iteration check the last epoch number in the history attribute. Then again, the history seems to be modified via an event listener, so that would be more on the side of "implied and expected" rather than clearly written-out code.

With that said, fit_loop calls the NeuralNet's on_epoch_begin, which in turn records a new epoch.

What are your thoughts on this?
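For concreteness, the history-based cap could look roughly like this (a hypothetical sketch against a toy stand-in, not a patch to skorch's actual fit_loop):

```python
class ToyNet:
    """Hypothetical stand-in for skorch's NeuralNet, for illustration only."""

    def __init__(self, max_epochs=10):
        self.max_epochs = max_epochs
        self.history = []  # one entry per completed epoch

    def fit_loop(self, epochs=None):
        epochs = epochs if epochs is not None else self.max_epochs
        # Proposed cap: count epochs already in the history so that
        # training never proceeds past max_epochs in total.
        remaining = max(self.max_epochs - len(self.history), 0)
        for _ in range(min(epochs, remaining)):
            self.history.append({'epoch': len(self.history) + 1})
        return self


net = ToyNet(max_epochs=10)
net.fit_loop()           # trains 10 epochs
net.fit_loop()           # no-op: max_epochs is already exhausted
print(len(net.history))  # 10
```

Whether a second call should silently do nothing or warn, as sklearn does for warm_start without increasing n_estimators, would be part of the same design decision.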

@thomasjpfan
Member

thomasjpfan commented Jul 28, 2020

This change is technically possible.

Since this is a backward incompatible change, we have to decide if it is worth a deprecation cycle.

@BenjaminBossan
Collaborator

Yes, we have a few divergences from sklearn here. On the one hand, as you said, the epoch counter in the fit loop always starts from 0, even if you've already cycled through a few epochs. Relatedly, calling partial_fit also goes through max_epochs epochs, instead of just 1.

I think in this case, we should think pragmatically. A few design choices in skorch diverge from sklearn because training a neural net often is a different beast than training a "regular" estimator.

Personally, the most common use case I can think of where this behavior matters is when I'm still in an exploratory phase: training the net for a few epochs, then cancelling with KeyboardInterrupt, checking stuff, then continuing training. At that point, I would be annoyed if the net stopped training because I had "exhausted" my max_epochs (the same goes for only fitting one epoch with partial_fit). However, this can be circumvented by increasing max_epochs, so it's not a big deal, just annoying.
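Under the capped interpretation of max_epochs, the circumvention described here would amount to raising max_epochs before continuing (again a toy sketch with hypothetical names; in real skorch this would presumably go through net.set_params):

```python
class ToyNet:
    """Hypothetical stand-in implementing the capped behavior, for illustration."""

    def __init__(self, max_epochs=5):
        self.max_epochs = max_epochs
        self.history = []  # one entry per completed epoch

    def fit_loop(self, epochs=None):
        epochs = epochs if epochs is not None else self.max_epochs
        remaining = max(self.max_epochs - len(self.history), 0)
        for _ in range(min(epochs, remaining)):
            self.history.append({'epoch': len(self.history) + 1})
        return self


net = ToyNet(max_epochs=5)
net.fit_loop()           # 5 epochs; the cap is reached
net.fit_loop()           # no-op
net.max_epochs = 8       # bump the budget to keep exploring
net.fit_loop()           # trains 3 more epochs
print(len(net.history))  # 8
```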

So before making a change and going through an expensive deprecation cycle, as Thomas mentioned, we should collect the most relevant use cases and determine how a change would affect them. In this specific case, I believe ergonomics should trump sklearn compatibility.

@ratteripenta
Author

As odd as it might sound to you guys, I'm actually not after sklearn compatibility here. What seems inconsistent to me is having a parameter named max_epochs that is, at best, used in the manner of fit_epochs. @BenjaminBossan I can see what you mean about exploration, but that would be using the system while disregarding what the parameters seem to stand for at face value; you know how it works despite the somewhat misleading parameter names.

But I really appreciate how quickly you guys got to this issue! I hope you can discuss this through and determine what's best for the library in this respect.

@githubnemo
Contributor

@karmus89 Can you elaborate on which problem you thought max_epochs was solving for you when you found out that it didn't? I think that perspective would add value to the discussion. Without that insight we might just as well rename the parameter to fit_epochs and be done with it, but I imagine that is not what we are really after?

@ratteripenta
Author

@githubnemo As indicated by the earlier discussion, I expected max_epochs to be a training-session-scoped variable. As it stands, it is just a variable that sets the maximum length of a single fit call, regardless of how many epochs the model has already been trained for. In that sense, fit_epochs would be consistent with how the current implementation works.

@BenjaminBossan
Collaborator

I agree that the name can be confusing in this context. I'm open to discussing renaming or changing the implementation, but backwards compatibility is a big factor here. The benefit needs to be big enough compared to the potential cost.
