Better handling dtypes #316

Open
lcrmorin opened this issue May 5, 2024 · 2 comments

Comments

@lcrmorin

lcrmorin commented May 5, 2024

At the moment the data type needs to be provided manually. It accepts 'numerical' or 'categorical', and the default is 'numerical'.

For quality of life I would suggest:

  • inferring type from the data
  • setting inference as default behaviour
  • recognising standard pandas dtypes as data types, so that they can be passed directly, e.g. X[column].dtype

Any thoughts on these proposals?
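A minimal sketch of what such inference could look like, assuming a plain mapping from pandas dtypes to the two accepted values. The helper name `infer_binning_dtype` is hypothetical and not part of the library; only the `pandas.api.types` checks are real pandas functions.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def infer_binning_dtype(series: pd.Series) -> str:
    """Hypothetical helper: map a pandas dtype to 'numerical' or 'categorical'."""
    # pandas reports bool as a numeric dtype, so handle it before the numeric check.
    if is_bool_dtype(series):
        return "categorical"
    if is_numeric_dtype(series):
        return "numerical"
    # object, string and pandas Categorical columns all end up here.
    return "categorical"


X = pd.DataFrame({
    "age": [23, 45, 31],
    "income": [1200.5, 3100.0, 2200.0],
    "city": ["Paris", "Lyon", "Paris"],
    "segment": pd.Categorical(["a", "b", "a"]),
})

# One inferred type per column, usable wherever the type is currently typed by hand.
inferred = {col: infer_binning_dtype(X[col]) for col in X.columns}
```

Treating booleans as categorical is a choice made for this sketch only; a real implementation might decide otherwise.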

@guillermo-navas-palencia
Owner

Hi @lcrmorin.

There is a good reason to avoid inferring types directly (although this is done in BinningProcess for obvious reasons). The main problem occurs with integer variables: there is no automatic way to distinguish between ordinal and categorical ones.
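The ambiguity can be illustrated with a small sketch. The column names and values are made up for illustration: one integer column is an ordinal count and the other holds nominal labels encoded as integers, yet pandas gives both the same dtype.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "n_children": [0, 1, 2, 3, 1, 0],   # ordinal: the order and distance are meaningful
    "region_code": [3, 1, 2, 3, 1, 2],  # nominal: integers are just labels
})

# Both columns carry the same integer dtype, so the dtype alone cannot separate them.
same_dtype = df["n_children"].dtype == df["region_code"].dtype
both_numeric = all(is_numeric_dtype(df[col]) for col in df.columns)
```

Any dtype-based rule would therefore classify `region_code` as numerical unless the user overrides it.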

@guillermo-navas-palencia guillermo-navas-palencia added this to the Backlog milestone May 6, 2024
@guillermo-navas-palencia guillermo-navas-palencia added the question Further information is requested label May 6, 2024
@lcrmorin
Author

lcrmorin commented May 6, 2024

I feel like:

  • Most ML algorithms would treat integers as numerical.
  • Such a change would help the majority of users, while the edge case of categorical variables encoded as integers concerns far fewer people.
  • If you are encoding categorical variables as integers, maybe that is on you.
