Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH add possibility to have a callable for verbose_feature_names_out of ColumnTransformer #28934

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

MarcBresson
Copy link

@MarcBresson MarcBresson commented May 2, 2024

What does this implement?

This brings the possibility to pass a callable to the verbose_feature_names_out parameter of ColumnTransformer. Instead of the new feature name being "transormer_name__feature_name", we could have "feature_name$this is amazing$TRANSFORMER_NAME".

Any other comments?

I have a few questions:

  • How do you type sklearn as there are no stub files ?
  • What is the version number I should put in .. versionchanged ?

In advance, thank you for your time.

Copy link

github-actions bot commented May 2, 2024

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff

ruff detected issues. Please run ruff check --fix --output-format=full . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.4.4.


sklearn/compose/_column_transformer.py:10:1: I001 [*] Import block is un-sorted or un-formatted
   |
 8 |   #         Joris Van den Bossche
 9 |   # License: BSD
10 | / import warnings
11 | | from collections import Counter, UserList
12 | | from itertools import chain
13 | | from functools import partial
14 | | from numbers import Integral, Real
15 | | 
16 | | import numpy as np
17 | | from scipy import sparse
18 | | 
19 | | from ..base import TransformerMixin, _fit_context, clone
20 | | from ..pipeline import _fit_transform_one, _name_estimators, _transform_one
21 | | from ..preprocessing import FunctionTransformer
22 | | from ..utils import Bunch
23 | | from ..utils._estimator_html_repr import _VisualBlock
24 | | from ..utils._indexing import _determine_key_type, _get_column_indices
25 | | from ..utils._metadata_requests import METHODS
26 | | from ..utils._param_validation import HasMethods, Hidden, Interval, StrOptions
27 | | from ..utils._set_output import (
28 | |     _get_container_adapter,
29 | |     _get_output_config,
30 | |     _safe_set_output,
31 | | )
32 | | from ..utils.metadata_routing import (
33 | |     MetadataRouter,
34 | |     MethodMapping,
35 | |     _raise_for_params,
36 | |     _routing_enabled,
37 | |     process_routing,
38 | | )
39 | | from ..utils.metaestimators import _BaseComposition
40 | | from ..utils.parallel import Parallel, delayed
41 | | from ..utils.validation import (
42 | |     _check_feature_names_in,
43 | |     _get_feature_names,
44 | |     _is_pandas_df,
45 | |     _num_samples,
46 | |     check_array,
47 | |     check_is_fitted,
48 | | )
49 | | 
50 | | __all__ = ["ColumnTransformer", "make_column_transformer", "make_column_selector"]
   | |_^ I001
   |
   = help: Organize imports

Found 1 error.
[*] 1 fixable with the `--fix` option.

Generated for commit: fe4f428. Link to the linter CI: here

@adrinjalali
Copy link
Member

cc @thomasjpfan

pretty sure we had the conversation about this at some point.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

sklearn/compose/_column_transformer.py Show resolved Hide resolved
@MarcBresson
Copy link
Author

MarcBresson commented May 13, 2024

Hey,

just added the possibility to use string format. I also added unit tests.

One of the test I created creates name clashing, and I remember seeing something about that somewhere in sklearn's doc. However, I can't seem to put my hand on it.

From the behaviour of ColumnTransformer, I know it is not important since it uses np arrays rather than pandas dataframe, but do you think we should issue a warning somewhere in the doc?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants