
Lightgbm trains much slower than catboost. #6456

Open
fengshansi opened this issue May 16, 2024 · 13 comments

@fengshansi

On Ubuntu 22.04.2 LTS, Python 3.11.4, LightGBM 4.3.0.
The dataset has 3,000 rows.
params:

{
    "boosting_type": "gbdt",
    "objective": "binary",
    "verbose": -1,
    "n_jobs": -1,
    "device": "cpu",
    "random_state": 1,
    "metric": "None",
    "learning_rate": 0.03,
}

feval:

from sklearn.metrics import f1_score

def weighted_f1_score(preds, train_data):
    # with the built-in "binary" objective, preds are probabilities
    labels = train_data.get_label()
    preds_binary = (preds > 0.5).astype(int)
    f1 = f1_score(labels, preds_binary, average="weighted")
    return "weighted_f1_score", f1, True  # name, value, is_higher_better

I use several categorical features.

lightgbm.train(
    params=params,
    train_set=train_dataset,
    num_boost_round=iterations,
    feval=weighted_f1_score,
    categorical_feature=current_cat_feature,
    callbacks=[
        lightgbm.early_stopping(50, first_metric_only=False),
        lightgbm.log_evaluation(period=20, show_stdv=True),
    ],
)

Training takes about 5 minutes, much slower than CatBoost.

@jmoralez
Collaborator

Hey @fengshansi, thanks for using LightGBM. Unfortunately, this isn't enough information; we'd also need the following:

  • How many iterations are you running?
  • At which iteration is LightGBM stopping?
  • At which iteration is CatBoost stopping?
  • Which parameters are you using for CatBoost?
  • How many features do you have?
  • Are you also using your custom metric in CatBoost?

For 3,000 samples, 5 minutes sounds like a lot, so I'm guessing your custom metric is the bottleneck here, but it's very hard to tell from just this information. One quick check is to time a single call of the metric in isolation, as in the sketch below.
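A minimal sketch with made-up data of the reported size (the 3,000 rows come from this thread; the round count is an assumption):

import time

import numpy as np
from sklearn.metrics import f1_score

preds = np.random.rand(3000)                 # fake probabilities
labels = np.random.randint(0, 2, size=3000)  # fake binary labels

start = time.perf_counter()
f1_score(labels, (preds > 0.5).astype(int), average="weighted")
per_call = time.perf_counter() - start

# the metric runs once per boosting round and validation set
n_rounds = 300  # assumed upper bound on boosting rounds
print(f"one call: {per_call * 1e3:.2f} ms, ~{per_call * n_rounds:.2f} s over {n_rounds} rounds")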

@fengshansi
Author

fengshansi commented May 16, 2024

Thank you for your help.
My answer is:

  • The maximum number of iterations is 300, with early stopping after 60 rounds, for both LightGBM and CatBoost.
  • LightGBM's best iteration is 90, so it runs 150 iterations in total (90 + 60 early-stopping rounds).
  • For CatBoost, I actually use Optuna with 50 trials to search parameters. Even running it 50 times takes 1 minute 3 seconds.
    A single run of 70 iterations without early stopping takes 0.2 seconds.
    All timings are from a Jupyter notebook.
  • The search space for the 50 trials is:

    "learning_rate": trial.suggest_float("learning_rate", 0.001, 0.1, log=True),
    "depth": trial.suggest_int("depth", 1, 10),
    "subsample": trial.suggest_float("subsample", 0.05, 1.0),
    "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.05, 1.0),
    "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 1, 100),

  • 10 categorical features and 10 numeric features.
  • I don't use a custom metric in CatBoost; CatBoost offers macro F1 natively.

@jmoralez
Collaborator

How long does it take if you remove your custom metric?

@shiyu1994
Collaborator

Thanks for using LightGBM. Could you also provide information about how CatBoost is configured? In my experience, the speed of CatBoost varies a lot depending on the tree structure and boosting mode you select; these choices often trade speed against accuracy. See the illustration below.
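For illustration, a hypothetical configuration touching the settings that matter most for speed (the parameter names are real CatBoost options; the values are just examples, not a recommendation):

from catboost import CatBoostClassifier

# "Ordered" boosting is noticeably slower than "Plain"; the grow policy
# changes the tree structure and therefore the cost per iteration.
clf = CatBoostClassifier(
    boosting_type="Plain",        # alternative: "Ordered"
    grow_policy="SymmetricTree",  # alternatives: "Depthwise", "Lossguide"
    iterations=300,
)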

@fengshansi
Author

fengshansi commented May 16, 2024

How long does it take if you remove your custom metric?

Also 5 minutes; I use metric="binary_logloss".

@fengshansi
Author

Could you also provide information about how CatBoost is used?

{
    "iterations": 300,
    "learning_rate": 0.07116892811065063,
    "depth": 5,
    "loss_function": "Logloss",
    "verbose": 20,
    "eval_metric": "TotalF1:average=Macro",
    "subsample": 0.2697512982046929,
    "colsample_bylevel": 0.932255235452595,
    "early_stopping_rounds": 60,
    "min_data_in_leaf": 98,
}
I use a CatBoostClassifier.

@mayer79
Contributor

mayer79 commented May 27, 2024

Without data and working code, I fear we are stuck here.

@fengshansi
Author

Here are the code and data: https://github.com/fengshansi/lgbm_compare.

@jmoralez
Collaborator

@fengshansi can you try using the same parameters in both? For example, you're setting 0.3 as the learning rate for LightGBM but 0.7 for CatBoost, and the higher rate converges faster. Also, the default num_leaves in LightGBM is 31, while your CatBoost depth of 6 produces trees with 64 leaves. Something like the sketch below would make the runs more comparable.
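A hypothetical alignment (the mapping between the two libraries is approximate, not exact; num_leaves=64 only mimics a full depth-6 symmetric tree):

# hypothetical LightGBM params matched to the CatBoost notebook settings
lgb_params = {
    "objective": "binary",
    "learning_rate": 0.7,  # match the CatBoost run instead of 0.3
    "num_leaves": 64,      # a full depth-6 tree has 2**6 = 64 leaves
    "max_depth": 6,        # cap depth too, since CatBoost trees are symmetric
    "verbose": -1,
}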

@mayer79
Contributor

mayer79 commented Jun 1, 2024

@fengshansi: On my laptop (8 threads), running your two notebooks gives:

LightGBM

[timing screenshot]

CatBoost

[timing screenshot]

Thus, LightGBM is 4-5 times faster (installed via pip).

@fengshansi
Author


Unbelievable, my LightGBM run takes nearly 5 minutes.

@mayer79
Contributor

mayer79 commented Jun 1, 2024

Oops :-). I reset the notebook kernels before running each of them.

@fengshansi
Author


I reinstalled LightGBM, but it is still very slow, with Python 3.11.4 and LightGBM 4.3.0.
