
Loosened to dist <= stop_thresh to converge in on 1D constant data #28951

Merged: 12 commits merged into scikit-learn:main from the feature/meanshift-stop_thresh branch on May 17, 2024.


@akikuno (Contributor) commented May 5, 2024

Reference Issues/PRs

Discussed in #28926

What does this implement/fix? Explain your changes.

As @ogrisel suggested, I loosened the convergence check in the `_mean_shift_single_seed` function from a strict `dist < stop_thresh` to `dist <= stop_thresh`. This fixes MeanShift failing to converge on 1D constant data within 300 iterations (the default `max_iter`).
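To illustrate why the inclusive comparison matters, here is a simplified pure-Python sketch of the per-seed loop with a flat kernel on 1D data. It is not scikit-learn's implementation. It assumes the stopping threshold is `1e-3 * bandwidth` and that the bandwidth is estimated as the average distance to the k-th nearest neighbour. On constant data that estimate is 0, so the threshold is 0 and the shift is exactly 0, and a strict `0 < 0` never fires. The names `estimate_bandwidth_1d`, `climb_from_seed` and the `inclusive` flag are hypothetical.

```python
def estimate_bandwidth_1d(data, quantile=0.3):
    # Hypothetical 1D analogue of a k-NN bandwidth estimate: the average, over
    # all points, of the distance to the k-th nearest neighbour (each point
    # counts as its own first neighbour, at distance 0).
    k = max(1, int(len(data) * quantile))
    total = 0.0
    for x in data:
        dists = sorted(abs(x - y) for y in data)
        total += dists[k - 1]
    return total / len(data)


def climb_from_seed(seed, data, bandwidth, max_iter=300, inclusive=True):
    # Hypothetical flat-kernel mean shift for a single 1D seed.
    # inclusive=True uses `dist <= stop_thresh`; False uses the strict `<`.
    stop_thresh = 1e-3 * bandwidth  # 0.0 whenever the bandwidth is 0
    mean = seed
    n_iter = 0
    while True:
        # Points within `bandwidth` of the current mean (distance 0 included).
        within = [x for x in data if abs(x - mean) <= bandwidth]
        if not within:
            break
        old_mean = mean
        mean = sum(within) / len(within)
        dist = abs(mean - old_mean)
        converged = dist <= stop_thresh if inclusive else dist < stop_thresh
        if converged or n_iter == max_iter:
            break
        n_iter += 1
    return mean, n_iter


data = [1.0] * 10  # 1D constant data
bandwidth = estimate_bandwidth_1d(data)
print(bandwidth)
print(climb_from_seed(1.0, data, bandwidth))
print(climb_from_seed(1.0, data, bandwidth, inclusive=False))
```

With the inclusive check the seed stops after the first mean update. With the strict check it keeps shifting by exactly 0 until it hits `max_iter`.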

Any other comments?


github-actions bot commented May 5, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d8e3889.

@jeremiedbb (Member):

Thanks for the PR @akikuno. Please add a non-regression test in test_mean_shift.py and an entry in the v1.5.rst changelog.

sklearn/cluster/tests/test_mean_shift.py (outdated)
```python
import numpy as np
from sklearn.cluster import MeanShift

# Test convergence using 2D constant data
x = np.concatenate([np.zeros((10, 10)), np.ones((10, 10))])
n_iter = MeanShift().fit(x).n_iter_
assert n_iter < 300
```
Member:

I would remove the 2D case. The 1D case is enough as a non-regression test.

Contributor (author):

@ogrisel
Thank you so much for all your guidance! I have learnt a lot.

Member:

+1 to remove the 2D case

Member:

@akikuno This comment has not been addressed yet.

Contributor (author):

@ogrisel
Sorry, I mistakenly thought it had already been changed.
I have now removed the 2D case.

akikuno and others added 3 commits May 7, 2024 08:39
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Comment on lines 46 to 48
- |Efficiency| The `clustering.MeanShift` class has now improved computational speed as it properly converges for constant data.
:pr:`28951` by :user:`Akihiro Kuno <akikuno>`.

Member:

I would consider it a bug fix and move it to the sklearn.cluster section of the changelog.

Contributor (author):

@jeremiedbb
Thanks for your updates! I have moved the changelog entry to the sklearn.cluster section.

doc/whats_new/v1.5.rst (outdated)
@jeremiedbb (Member) left a comment:

LGTM. Thanks @akikuno

@jeremiedbb added this to the 1.5 milestone on May 13, 2024
@jeremiedbb merged commit e796d0a into scikit-learn:main on May 17, 2024
30 checks passed
@akikuno deleted the feature/meanshift-stop_thresh branch on May 18, 2024 03:23
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request May 20, 2024
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
@jeremiedbb mentioned this pull request on May 20, 2024
jeremiedbb added a commit that referenced this pull request May 21, 2024
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
3 participants