Base function to check if the model is a clusterer (analogous to base.is_classifier() and base.is_regressor())? #28960

Closed
adrinjalali opened this issue May 6, 2024 Discussed in #28904 · 4 comments · Fixed by #28936

Comments

@adrinjalali
Member

Discussed in #28904

Originally posted by aoot April 26, 2024
According to the note on figuring out the model type, it is recommended to use the sklearn.base.is_classifier() or sklearn.base.is_regressor() function instead of checking the _estimator_type attribute directly.

However, since the _estimator_type attribute can be "classifier", "regressor", or "clusterer", is there a base function such as sklearn.base.is_clusterer() to check whether the model is a clusterer?

Thanks for your input!

#28936 is an effort to fix this. Not sure what to think of it.

@jeremiedbb
Member

It makes sense, but at the same time I'm not sure we want to move in that direction:

  • There are more types of estimators (decomposition, outlier detectors, density estimators, ...). Do we want to do it for these as well?
  • I thought that at some point we wanted to deprecate estimator_type in favor of tags. Is that still the case? (I think we should.)

@jeremiedbb jeremiedbb added New Feature and removed Needs Triage Issue requires triage labels May 6, 2024
@adrinjalali
Member Author

I don't mind deprecating _estimator_type and adding it to tags. But that's independent of having a helper function to check if an estimator is a classifier, regressor, or a clusterer. I think we don't have to cover all cases with these helper functions. We only need the ones most commonly used.
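For illustration, a minimal sketch of what such a helper could look like. It assumes the helper mirrors the `_estimator_type` attribute check discussed in this thread. The `ToyClusterer` and `ToyClassifier` classes are hypothetical stand-ins, not scikit-learn estimators, and this is not necessarily how #28936 implements it.

```python
def is_clusterer(estimator):
    """Return True if the estimator declares itself a clusterer."""
    # Assumption: same attribute lookup as the existing classifier/regressor checks.
    return getattr(estimator, "_estimator_type", None) == "clusterer"


class ToyClusterer:  # hypothetical stand-in, not a scikit-learn class
    _estimator_type = "clusterer"


class ToyClassifier:  # hypothetical stand-in, not a scikit-learn class
    _estimator_type = "classifier"


print(is_clusterer(ToyClusterer()))   # True
print(is_clusterer(ToyClassifier()))  # False
print(is_clusterer(object()))         # False: no _estimator_type declared
```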

@ChVeen
Contributor

ChVeen commented May 7, 2024

For the estimator types not so commonly used, one could imagine a more generic function like

def is_estimator_type(estimator, expected_type: str):
    return getattr(estimator, "_estimator_type", None) == expected_type

in order to check for a given category.
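A possible usage of the generic function above, as a sketch. The toy classes and the "outlier_detector" label are assumptions made up for this example, and the function is repeated so the snippet runs on its own.

```python
def is_estimator_type(estimator, expected_type: str) -> bool:
    # Objects that declare no _estimator_type compare as None and never match.
    return getattr(estimator, "_estimator_type", None) == expected_type


class ToyOutlierDetector:  # hypothetical stand-in, not a scikit-learn class
    _estimator_type = "outlier_detector"


class NoTypeDeclared:  # hypothetical object without the attribute
    pass


detector = ToyOutlierDetector()
print(is_estimator_type(detector, "outlier_detector"))          # True
print(is_estimator_type(detector, "clusterer"))                 # False
print(is_estimator_type(NoTypeDeclared(), "outlier_detector"))  # False
```

Because the lookup falls back to `None`, the check returns False for arbitrary objects instead of raising.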

@adrinjalali
Member Author

What we're planning to do would be more like:

get_tags(estimator)["estimator_type"] == expected_type
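To make the tags-based idea concrete, here is a rough sketch. The `get_tags` function below is a hypothetical stand-in written for this example, not scikit-learn's function. The `_tags` class attribute and the `has_estimator_type` helper are also assumptions, and the sketch uses `.get` rather than indexing so that a missing tag does not raise.

```python
def get_tags(estimator):
    """Hypothetical stand-in: return the tags declared on the estimator as a dict."""
    # Assumption: tags live in a plain `_tags` dict on the class.
    return dict(getattr(estimator, "_tags", {}))


def has_estimator_type(estimator, expected_type):
    # .get() returns None when no "estimator_type" tag is declared.
    return get_tags(estimator).get("estimator_type") == expected_type


class ToyDensityEstimator:  # hypothetical stand-in, not a scikit-learn class
    _tags = {"estimator_type": "density_estimator"}


est = ToyDensityEstimator()
print(has_estimator_type(est, "density_estimator"))  # True
print(has_estimator_type(est, "clusterer"))          # False
```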

aazuspan added a commit to lemma-osu/sknnr-spatial that referenced this issue May 17, 2024
This fix allows fitting unsupervised estimators with the assumption that
they will always predict to shape (n_samples,).

Output dtype is now determined based on the `_estimator_type` attribute.
This is likely a temporary solution as `_estimator_type` is planned for
deprecation in favor of tags and explicit estimator type checking
functions, but neither of those solutions are fully implemented yet.

See scikit-learn/scikit-learn#28960