Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-15991: Infogram pydocs updates #15992

Open
wants to merge 7 commits into
base: rel-3.46.0
Choose a base branch
from
Open

Conversation

shaunyogeshwaran
Copy link
Contributor

@shaunyogeshwaran shaunyogeshwaran commented Dec 28, 2023

fixes #15991
Crafted new examples to showcase the different parameters.

@shaunyogeshwaran shaunyogeshwaran added this to the 3.44.0.4 milestone Dec 28, 2023
@shaunyogeshwaran shaunyogeshwaran self-assigned this Dec 28, 2023
@shaunyogeshwaran shaunyogeshwaran linked an issue Jan 3, 2024 that may be closed by this pull request
@shaunyogeshwaran shaunyogeshwaran changed the base branch from master to rel-3.44.0 January 3, 2024 07:34
@hannah-tillman hannah-tillman changed the title [DOCS] Infogram pydocs updates GH-15991: Infogram pydocs updates Jan 5, 2024
hannah-tillman
hannah-tillman previously approved these changes Jan 12, 2024
@mn-mikke mn-mikke modified the milestones: 3.44.0.4, 3.46.0.2 Mar 4, 2024
tomasfryda
tomasfryda previously approved these changes Mar 5, 2024
Copy link
Contributor

@tomasfryda tomasfryda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you @shaunyogeshwaran !

>>> pcols = ["SEX", "MARRIAGE", "AGE"]
>>> ig = H2OInfogram(protected_columns=pcols)
>>> ig.train(y=y, x=x, training_frame=train)
>>> ig.plot()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is example for algorithm_params I would expect call of ig.algorithm_params

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was wrong here. Its a example for data_fraction so I would expect. to call data_fractions..

So just call: ig.data_fraction

@@ -745,6 +762,23 @@ def total_information_threshold(self):
information is the x-axis of the Core Infogram. Default is -1 which gets set to 0.1.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope but I don't understand Default is -1 which gets set to 0.1.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the first line is not correct since the default is -0.1.

A number between 0 and 1 representing a threshold for total information, defaulting to 0.1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wendycwong any ideas about what's going on with the default values here?

(I can update the schema and fix the first line issue)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hannah-tillman I understand it now.

It comes from here:

https://github.com/h2oai/h2o-3/blob/master/h2o-admissibleml/src/main/java/hex/Infogram/Infogram.java#L185-L187

@wendycwong any reason why we not set it directly here?

Nevertheless, its out of scope of this PR. @shaunyogeshwaran.

>>> x = train.columns
>>> x.remove(y)
>>> pcols = ["SEX", "MARRIAGE", "AGE"]
>>> ig = H2OInfogram(protected_columns=pcols, data_fraction=0.7)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work the same as split_frame?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -768,6 +802,23 @@ def net_information_threshold(self):
the y-axis of the Core Infogram. Default is -1 which gets set to 0.1.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. First line says number between 0-1,... Default -1 sets to 0.1..

@@ -816,6 +884,23 @@ def safety_index_threshold(self):
gets set to 0.1.

Type: ``float``, defaults to ``-1.0``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. First line says number between 0-1,... Default -1 sets to 0.1..

@@ -792,6 +843,23 @@ def relevance_index_threshold(self):
which gets set to 0.1.

Type: ``float``, defaults to ``-1.0``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. First line says number between 0-1,... Default -1 sets to 0.1..

Copy link
Collaborator

@valenad1 valenad1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the examples works, but I would use different number than default to show how the parameter change the result, and I would rather avoid any unused stuff in the example. So the test should be used for some metric be deleted from example

Copy link
Collaborator

@valenad1 valenad1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @shaunyogeshwaran. Can you please run the all the examples ones again. E.g this will fail for sure.

@valenad1
Copy link
Collaborator

And please merge it to re-3.46.0, the rel-3.44.0 is not used anymore.

@valenad1
Copy link
Collaborator

@wendycwong

import h2o
from h2o.estimators.infogram import H2OInfogram
h2o.init()
f = "https://erin-data.s3.amazonaws.com/admissible/data/taiwan_credit_card_uci.csv"
col_types = {'SEX': "enum", 'MARRIAGE': "enum", 'default_payment_next_month': "enum"}
df = h2o.import_file(path=f, col_types=col_types)
y = "default_payment_next_month"
x = df.columns
x.remove(y)
gbm_params = {'ntrees':3}
pcols = ["SEX", "MARRIAGE", "AGE"]
ig = H2OInfogram(protected_columns=pcols)
ig.train(y=y, x=x, training_frame=df, algorithm_params=gbm_params)
ig.algorithm_params

gives me:

TypeError                                 Traceback (most recent call last)
/var/folders/2z/9gdhqbns0djdj9crb242sgc00000gn/T/ipykernel_9099/4280293040.py in <cell line: 13>()
     11 pcols = ["SEX", "MARRIAGE", "AGE"]
     12 ig = H2OInfogram(protected_columns=pcols)
---> 13 ig.train(y=y, x=x, training_frame=df, algorithm_params=gbm_params)
     14 ig.algorithm_params

~/h2o-3/_valenad-notebooks/../h2o-py/build/main/h2o/estimators/infogram.py in train(self, x, y, training_frame, verbose, **kwargs)
   1202                                                                                    " and <= 1."
   1203 
-> 1204         parms = sup._make_parms(x,y,training_frame, extend_parms_fn = extend_parms, **kwargs)
   1205 
   1206         sup._train(parms, verbose=verbose)

TypeError: _make_parms() got an unexpected keyword argument 'algorithm_params'.

Ticket here.

@valenad1 valenad1 modified the milestones: 3.46.0.2, 3.46.0.3 May 17, 2024
@shaunyogeshwaran shaunyogeshwaran changed the base branch from rel-3.44.0 to rel-3.46.0 May 20, 2024 07:44
@shaunyogeshwaran shaunyogeshwaran dismissed stale reviews from tomasfryda and hannah-tillman May 20, 2024 07:44

The base branch was changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Infogram pydocs updates
5 participants