How to get the parameters of the best model after ensembling #106

Open
kilig000123 opened this issue Apr 25, 2024 · 3 comments

Comments

@kilig000123

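After using HyperGBM, my dataset improved substantially on several metrics. Once the experiment has finished, however, I would like to know which concrete models make up the best ensembled result: which models are used and what their detailed parameters are. So far I have only found the ensemble's weights and scores.
If there is a way to do this, please let me know; it is very important for my understanding of the model.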

@lixfz
Collaborator

lixfz commented Apr 26, 2024

If you can already find the weights and scores, getting the details of each individual model is straightforward.

See the following example:

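# `experiment` is the HyperGBM experiment created earlier (e.g. with hypergbm.make_experiment)
estimator = experiment.run()          # fitted pipeline returned by the experiment
ensembled = estimator.steps[-1][-1]   # last pipeline step: the ensemble
weights = ensembled.weights_          # weight assigned to each candidate model
models = ensembled.estimators         # candidate models (may contain None)

for i, (w, m) in enumerate(zip(weights, models)):
    if m is not None:                 # print only models that are part of the ensemble
        print('-' * 30)
        print(i, w, m)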

The output looks like this:

------------------------------
0 0.55 HyperGBMEstimator(task=binary, reward_metric=precision, cv=True,
data_pipeline: DataFrameMapper(df_out=True,
                df_out_dtype_transforms=[(ColumnSelector(include:['object', 'string']),
                                          'int')],
                features=[(ColumnSelector(include:['object', 'string', 'category', 'bool']),
                           Pipeline(steps=[('categorical_imputer_0',
                                            SafeSimpleImputer(strategy='constant')),
                                           ('categorical_label_encoder_0',
                                            MultiLabelEncoder())])),
                          (ColumnSelector(include:number, exclude:timedelta),
                           Pipeline(steps=[('numeric_imputer_0',
                                            FloatOutputImputer(strategy='median')),
                                           ('numeric_log_standard_scaler_0',
                                            LogStandardScaler())]))],
                input_df=True)
gbm_model: CatBoostClassifierWrapper(learning_rate=0.5, depth=10, l2_leaf_reg=20, silent=True, n_estimators=200, random_state=55954, eval_metric='Precision')
)
------------------------------
4 0.4 HyperGBMEstimator(task=binary, reward_metric=precision, cv=True,
data_pipeline: DataFrameMapper(df_out=True,
                df_out_dtype_transforms=[(ColumnSelector(include:['object', 'string']),
                                          'int')],
                features=[(ColumnSelector(include:['object', 'string', 'category', 'bool']),
                           Pipeline(steps=[('categorical_imputer_0',
                                            SafeSimpleImputer(strategy='constant')),
                                           ('categorical_label_encoder_0',
                                            MultiLabelEncoder())])),
                          (ColumnSelector(include:number, exclude:timedelta),
                           Pipeline(steps=[('numeric_imputer_0',
                                            FloatOutputImputer(strategy='median')),
                                           ('numeric_robust_scaler_0',
                                            RobustScaler())]))],
                input_df=True)
gbm_model: LGBMClassifierWrapper(boosting_type='goss', early_stopping_rounds=10,
                      learning_rate=0.5, max_depth=5, n_estimators=200,
                      num_leaves=440, random_state=58258, reg_alpha=10,
                      reg_lambda=0.5, verbosity=-1)
)
------------------------------
...
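The loop above only prints each member's repr. If you want the same information in a structured form (for example, to compare hyperparameters across members), a small helper along the following lines could collect it. This is a sketch under assumptions: it assumes each member exposes its fitted booster as a `gbm_model` attribute (inferred from the `gbm_model:` label in the printed output) and that the booster follows the scikit-learn `get_params()` convention. `summarize_ensemble`, `FakeBooster` and `FakeMember` are hypothetical names, and the two stub classes exist only so the example runs without HyperGBM installed.

```python
from typing import Any, Dict, List, Optional, Sequence


def summarize_ensemble(weights: Sequence[float],
                       models: Sequence[Optional[Any]]) -> List[Dict[str, Any]]:
    """Return one row (index, weight, model class, params) per kept ensemble member."""
    rows = []
    for i, (w, m) in enumerate(zip(weights, models)):
        if m is None:  # this candidate is not part of the ensemble
            continue
        # Assumption: the fitted booster is reachable as `gbm_model`, matching the
        # label in the printed repr; otherwise fall back to the member itself.
        inner = getattr(m, "gbm_model", m)
        get_params = getattr(inner, "get_params", None)  # scikit-learn convention
        params = get_params() if callable(get_params) else {}
        rows.append({"index": i, "weight": w,
                     "model": type(inner).__name__, "params": params})
    return rows


# Hypothetical stand-ins so the sketch runs without HyperGBM installed.
class FakeBooster:
    def __init__(self, **params):
        self._params = params

    def get_params(self, deep=True):
        return dict(self._params)


class FakeMember:
    def __init__(self, gbm_model):
        self.gbm_model = gbm_model


weights = [0.6, 0.0, 0.4]  # made-up values for the demo
models = [FakeMember(FakeBooster(learning_rate=0.5, depth=10)), None,
          FakeMember(FakeBooster(max_depth=5, num_leaves=440))]
for row in summarize_ensemble(weights, models):
    print(row)
```

With a real experiment you would call `summarize_ensemble(ensembled.weights_, ensembled.estimators)`; the returned list of dicts can also be passed to `pandas.DataFrame` for a tabular view.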

@kilig000123
Author

In this output I see categorical_label_encoder_0. What kind of categorical encoding is this, exactly?

@lixfz
Collaborator

lixfz commented Apr 28, 2024

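HyperGBM optimizes the whole pipeline, from preprocessing through model training. categorical_label_encoder_0 is a preprocessing step applied to the categorical data. Taking the output of my example above as an illustration: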


Here, the categorical columns are encoded with MultiLabelEncoder, which is Hypernets' wrapper around LabelEncoder.
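To make the encoding concrete, here is a minimal, self-contained sketch of what the categorical branch of the pipeline does: fill missing values with a constant (the `categorical_imputer_0` step), then map each distinct value of every column to an integer (the `categorical_label_encoder_0` step). It is an illustration, not Hypernets' code. The class name `PerColumnLabelEncoder` is hypothetical; the fill value `"missing_value"` is borrowed from the default that scikit-learn's `SimpleImputer` uses for string data (SafeSimpleImputer's actual fill value is an assumption here); and the codes follow the sorted order of the distinct values, as scikit-learn's `LabelEncoder` does.

```python
from typing import Dict, List, Optional

Columns = Dict[str, List[Optional[str]]]


class PerColumnLabelEncoder:
    """Hypothetical illustration: constant imputation, then label encoding per column."""

    def __init__(self, fill_value: str = "missing_value"):
        # "missing_value" mirrors scikit-learn SimpleImputer's default constant for strings.
        self.fill_value = fill_value

    def _impute(self, values: List[Optional[str]]) -> List[str]:
        return [self.fill_value if v is None else v for v in values]

    def fit(self, columns: Columns) -> "PerColumnLabelEncoder":
        # One mapping per column; codes follow the sorted order of distinct values.
        self.mappings_ = {
            name: {v: code for code, v in enumerate(sorted(set(self._impute(values))))}
            for name, values in columns.items()
        }
        return self

    def transform(self, columns: Columns) -> Dict[str, List[int]]:
        # Categories not seen during fit raise KeyError in this sketch.
        return {name: [self.mappings_[name][v] for v in self._impute(values)]
                for name, values in columns.items()}

    def fit_transform(self, columns: Columns) -> Dict[str, List[int]]:
        return self.fit(columns).transform(columns)


data = {"color": ["red", "green", None, "red"], "size": ["M", "L", "S", "M"]}
encoder = PerColumnLabelEncoder()
print(encoder.fit_transform(data))  # {'color': [2, 0, 1, 2], 'size': [1, 0, 2, 1]}
```

The integer codes are only identifiers: "green" gets 0 and "red" gets 2 purely because of alphabetical order.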

For the full definition of HyperGBM's default search space, see the source files search_space.py and sklearn_ops.py.
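As a rough picture of what "whole-pipeline optimization" means: the two ensemble members printed above differ not only in their booster (CatBoost vs. LightGBM) but also in their numeric scaler (LogStandardScaler vs. RobustScaler), because each trial chooses preprocessing and model hyperparameters together. The toy sketch below illustrates that idea with plain random sampling from a small conditional space, where the available hyperparameters depend on the chosen model. The space, its values and `sample_trial` are hypothetical; HyperGBM's real space in search_space.py is much larger, and its searchers are not limited to random sampling.

```python
import random
from typing import Any, Dict

# Toy conditional search space (hypothetical values): a trial picks a numeric
# scaler AND a model, and each model has its own hyperparameter choices.
SEARCH_SPACE: Dict[str, Any] = {
    "numeric_scaler": ["LogStandardScaler", "RobustScaler"],
    "model": {
        "CatBoost": {"learning_rate": [0.1, 0.5], "depth": [6, 10]},
        "LightGBM": {"learning_rate": [0.1, 0.5], "num_leaves": [31, 440]},
    },
}


def sample_trial(rng: random.Random) -> Dict[str, Any]:
    """Draw one end-to-end configuration: scaler, model and the model's hyperparameters."""
    scaler = rng.choice(SEARCH_SPACE["numeric_scaler"])
    model = rng.choice(sorted(SEARCH_SPACE["model"]))
    params = {name: rng.choice(choices)
              for name, choices in SEARCH_SPACE["model"][model].items()}
    return {"numeric_scaler": scaler, "model": model, "params": params}


rng = random.Random(0)  # fixed seed so repeated runs draw the same trials
for _ in range(3):
    print(sample_trial(rng))
```

Sampling the scaler and the model in a single draw is what lets different trials, and hence different ensemble members, end up with different preprocessing.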
