added datapoint for a small dataset #1249

Open · wants to merge 1 commit into main

Conversation

@levscaut (Contributor) commented Oct 19, 2023

Why are these changes needed?

Currently, the default LGBMClassifier does not have a datapoint for the small dataset generated from the code snippet below:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, train_size=0.5)
```

Applying LGBMClassifier to this dataset yields lower performance than expected:

```python
from flaml.default import LGBMClassifier

lgbm = LGBMClassifier().fit(X_train, y_train)
lgbm.score(X_test, y_test)
```

Hence, I added a datapoint with the meta-features of this dataset to the LGBMClassifier default config. I'm not sure whether this new datapoint will break other scenarios that use the default LGBMClassifier; test/default passes on my machine.
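For context, here is a minimal sketch of the kind of meta-features such a datapoint would describe; the names and values below (n_rows, n_features, n_classes, percent_numeric) are illustrative assumptions, not the exact schema stored under flaml/default.

```python
# Illustrative only: simple size-based meta-features for the iris half-split above.
# The actual meta-features and normalization used by flaml.default may differ.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.5, random_state=0
)

meta_features = {
    "n_rows": X_train.shape[0],            # 75 training instances
    "n_features": X_train.shape[1],        # 4 numeric features
    "n_classes": len(np.unique(y_train)),  # 3 classes
    "percent_numeric": 1.0,                # iris is fully numeric
}
print(meta_features)
```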

According to my tests, XGBClassifier and RandomForestClassifier don't have this issue; they perform well on this dataset.

Related issue number

#1247


@sonichi (Collaborator) commented Oct 21, 2023

The change is simple. Some performance tests will be needed to approve this PR.

@levscaut (Contributor, Author)

> The change is simple. Some performance tests will be needed to approve this PR.

Thanks for the review! I'm happy to help run performance tests if needed.

@amueller

@sonichi how were the original 5 selected? I did some similar work a couple of years ago and used a greedy approximation because I was going for a ranking, not a hard subset. Did you use a larger benchmark set and some partitioning of the space or integer programming to get to the subset?

I can run my benchmark suite on the branch and see if it helps improve accuracy on the datasets I'm looking at. But we probably also want to run against whatever system / benchmark you originally used.
Maybe at least running against the AutoML benchmark?

@sonichi (Collaborator) commented Oct 24, 2023

> @sonichi how were the original 5 selected? I did some similar work a couple of years ago and used a greedy approximation because I was going for a ranking, not a hard subset. Did you use a larger benchmark set and some partitioning of the space or integer programming to get to the subset?
>
> I can run my benchmark suite on the branch and see if it helps improve accuracy on the datasets I'm looking at. But we probably also want to run against whatever system / benchmark you originally used. Maybe at least running against the AutoML benchmark?

I used a new greedy algorithm in the zero-shot AutoML paper to select the portfolio from a large set of candidate configurations.
I agree that we should run against the AutoML benchmark for the multiclass tasks.
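For readers following along, here is a rough sketch of the greedy idea as described above (my reading of the approach, not the actual FLAML code): starting from an empty portfolio, repeatedly add the candidate configuration that most reduces the remaining gap to the per-task oracle score on the meta-training tasks. The `scores` data below is a placeholder.

```python
# Toy greedy portfolio construction: scores[c][t] is assumed to be the validation
# score of candidate configuration c on meta-training task t (placeholder data).
def build_portfolio(scores: dict, k: int) -> list:
    tasks = list(next(iter(scores.values())))
    oracle = {t: max(s[t] for s in scores.values()) for t in tasks}

    def regret(port):
        # Total gap to the oracle if the best portfolio member were chosen per task.
        return sum(oracle[t] - max(scores[c][t] for c in port) for t in tasks)

    portfolio = []
    for _ in range(min(k, len(scores))):
        best_next = min(
            (c for c in scores if c not in portfolio),
            key=lambda c: regret(portfolio + [c]),
        )
        portfolio.append(best_next)
    return portfolio


# Tiny example: "a" covers task1 best, "b" covers task2 best.
scores = {
    "a": {"task1": 0.9, "task2": 0.5},
    "b": {"task1": 0.6, "task2": 0.9},
    "c": {"task1": 0.7, "task2": 0.7},
}
print(build_portfolio(scores, k=2))  # -> ['b', 'a']
```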

@amueller

Can you please link the paper? I'm not sure which one that is.

@sonichi (Collaborator) commented Oct 24, 2023

@amueller

With the PR, the model no longer fails catastrophically on my benchmark (a subset of OpenML CC-18 with a 50/50 train/test split and 10-fold cross-validation), but it's still not competitive. I assume making it perform well would at least require running the greedy algorithm again on an expanded benchmark. Let me check the paper for details.
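In case it helps anyone reproduce a rough version of this, the harness is essentially the following (a sketch; the dataset names, seed, and scoring below are placeholders rather than my exact benchmark setup):

```python
# Rough sketch: 50/50 train/test split, then score flaml's zero-shot default
# LGBMClassifier on the held-out half. The 10-fold CV is omitted here for brevity.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from flaml.default import LGBMClassifier

for name in ["vehicle", "mfeat-factors"]:  # placeholder subset of CC-18 multiclass datasets
    X, y = fetch_openml(name, version=1, as_frame=True, return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.5, stratify=y, random_state=0
    )
    clf = LGBMClassifier().fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```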

@amueller

OK, looks like the portfolio building is mostly the same as in my work and in autosklearn 2.0, apart from some minor differences and the use of meta-features instead of just iterating through configurations. I'm somewhat surprised by how well the meta-feature-based zero-shot works, tbh. Very cool!
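To make the meta-feature based lookup concrete (my paraphrase of the idea, not FLAML's internals): a new dataset's meta-features are matched against stored entries, and the configuration attached to the nearest entry is used zero-shot. Everything below is made up for illustration.

```python
# Toy zero-shot lookup: pick the stored entry whose (normalized) meta-features are
# closest to the new dataset. The entries, scaling, and hyperparameter values here
# are invented for illustration; FLAML's real config files are structured differently.
import numpy as np

STORED = {
    # name -> (meta-feature vector [rows, features, classes], suggested hyperparameters)
    "large_tabular": (np.array([100_000, 50, 10]), {"n_estimators": 2000, "num_leaves": 100}),
    "small_iris_like": (np.array([75, 4, 3]), {"n_estimators": 100, "num_leaves": 7}),
}
SCALE = np.array([100_000, 100, 10])  # assumed normalization constants


def suggest_config(meta: np.ndarray) -> dict:
    nearest = min(STORED, key=lambda k: np.linalg.norm((meta - STORED[k][0]) / SCALE))
    return STORED[nearest][1]


print(suggest_config(np.array([75, 4, 3])))  # -> the "small_iris_like" configuration
```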

@levscaut (Contributor, Author)

Is the code for the portfolio mining process still in the repo? I can help with the experiment if we have more specific information, like the experiment code or which extra datasets to include.

@sonichi (Collaborator) commented Oct 26, 2023

@amueller

The list of datasets used for the original work is in the paper. I'm not 100% sure how datasets were selected for the AutoML benchmark vs. CC-18 (which is what I'm using). Also, I'm currently using a somewhat non-standard splitting strategy that splits the data 50/50.
It might be interesting to also include subsampled versions of the datasets in the initial pool to broaden the space of datasets that are explored, but that very much feels like a research question that goes beyond a simple PR.

@levscaut (Contributor, Author)

Agreed on that; running over the whole benchmark is a bit too much work for this PR. It would be great if there were a simple performance test to check how this new point affects existing flaml default usage, and then we could decide whether to keep or discard this change.

@sonichi (Collaborator) commented Nov 1, 2023

> Agreed on that; running over the whole benchmark is a bit too much work for this PR. It would be great if there were a simple performance test to check how this new point affects existing flaml default usage, and then we could decide whether to keep or discard this change.

There are around 15 multi-class tasks in the benchmark, so it's manageable to run just default.lightgbm before and after. We can merge if the performance doesn't degrade. The performance likely wouldn't change, because the added dataset is not similar to them.

@levscaut (Contributor, Author) commented Nov 6, 2023

I've been digging into the zero-shot paper for experiment details. I selected all the multiclass tasks from the paper, followed the 10-fold evaluation using the default LGBM, and got the following results before and after adding this datapoint:

| dataset | mean_score_old | mean_score_new | duration (minutes) |
| --- | --- | --- | --- |
| car | 0.901042 | 0.901042 | 1.61618 |
| cnae-9 | 0.853704 | 0.853704 | 12.0663 |
| fabert | 0.712031 | 0.712031 | 76.7194 |
| mfeat-factors | 0.9695 | 0.9695 | 17.4963 |
| segment | 0.944589 | 0.944589 | 3.97898 |
| vehicle | 0.781373 | 0.781373 | 1.91733 |
| connect-4 | 0.649939 | 0.649939 | 7.5523 |
| Fashion-MNIST | 0.903129 | 0.903129 | 698.402 |
| Helena | 0.0657854 | 0.055831 | 251.748 |
| Jannis | 0.71568 | 0.71568 | 39.1677 |
| jungle_chess_2pcs_raw_endgame_complete | 0.679398 | 0.679398 | 3.86058 |
| Shuttle | 0.885672 | 0.828362 | 8.92685 |
| Volkert | 0.688921 | 0.688921 | 142.019 |
| Covertype | 0.609783 | 0.609783 | 100.821 |
| Dionis | 0.17365 | 0.172989 | 4770.91 |
| dilbert | 0.9894 | 0.9894 | 735.268 |
| Robert | 0.5194 | 0.5194 | 7950.23 |

Unfortunately, it appears that this particular datapoint does slightly diminish performance on a subset of datasets (Helena, Shuttle, and Dionis). I will investigate whether I can adjust the datapoint to prevent any negative effects on current tasks.

@amueller commented Nov 6, 2023

I'm not sure that tweaks based on such a small number of tasks will be very robust. You don't have any other datasets to confirm that additional changes generalize, right? So I suspect you're likely to overfit to the three tasks you just identified.
