monotone_constraints not working with xgb.regressor (python) #10249

Closed
niltecedu opened this issue May 2, 2024 · 8 comments

Comments

@niltecedu

niltecedu commented May 2, 2024

Environment Info:
OS=Windows 10/11
Python version=3.11
Xgboost version=2.0.3
pandas = "2.2.0"

Summary:
I am trying to add monotonic constraints to my forecasts, but it is causing issues. I have tried all versions from 2.0.0 to 2.0.3, and I have tried training the model on both GPU and CPU.

This is what I get when I try to predict:

Error:
xgboost.core.XGBoostError: Invalid Parameter format for monotone_constraints expect but value='{'wind_speed_10m:ms_53.77_1.702': 1, 'wind_speed_10m:ms_53.84_1.767': 1, 'wind_speed_10m:ms_53.9_1.832': 1, 'wind_speed_10m:ms_53.97_1.897': 1, 'wind_speed_10m:ms_54.03_1.962': 1, 'wind_speed_10m:ms_54.1_2.027': 1, 'wind_speed_100m:ms_53.77_1.702': 1, 'wind_speed_100m:ms_53.84_1.767': 1, 'wind_speed_100m:ms_53.9_1.832': 1, 'wind_speed_100m:ms_53.97_1.897': 1, 'wind_speed_100m:ms_54.03_1.962': 1, 'wind_speed_100m:ms_54.1_2.027': 1, 'windlimit': 1, 'relative_humidity_2m:p_53.77_1.702': -1, 'relative_humidity_2m:p_53.84_1.767': -1, 'relative_humidity_2m:p_53.9_1.832': -1, 'relative_humidity_2m:p_53.97_1.897': -1, 'relative_humidity_2m:p_54.03_1.962': -1, 'relative_humidity_2m:p_54.1_2.027': -1}'

The model is saved and loaded with joblib dump and load.

Code snippet for fitting:


alpha = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

xbg_regressor_wind = xgb.XGBRegressor(
    objective="reg:quantileerror",
    quantile_alpha=alpha,
    # n_jobs=1,
    device=main_device,
    colsample_bytree=0.8,
    gamma=0.3,
    learning_rate=0.02,
    max_depth=4,
    n_estimators=800,
    subsample=0.6,
    min_child_weight=3,
    monotone_constraints=data_loader.create_wind_mono_cst(wind_x_df),
)

Code snippet for predicting:


wind_model_path = os.path.join(
    ieeeutils.get_data_path_from_env(),
    "models",
    "model_Wind_MWh_credit_xgboost.pickle",
)
wind_model = load(wind_model_path)
wind_model.set_params(device="cpu")


for quantile in tqdm(range(10, 100, 10)):
    # Load wind and predicting it first

    # Redeclaring the variables, WHAT COULD GO WRONG?
    wind_x_df, wind_y_df = ieeeutils.split_x_and_y(
        wind_predict_data_dict[quantile], "Wind_MWh_credit"
    )
    solar_x_df, solar_y_df = ieeeutils.split_x_and_y(
        solar_predict_data_dict[quantile], "Solar_MWh_credit"
    )

    scores_wind = wind_model.predict(wind_x_df)
    modelling_table_wind[f"q{quantile}_Wind_MWh_credit"] = scores_wind[:, 4]

The same constraints work with sklearn's ensemble.HistGradientBoostingRegressor, if that's any help.

@trivialfis
Member

Hi, could you please provide some info for us to reproduce:

  • The actual Python value of the constraint parameter.
  • The list of feature names.

@niltecedu
Author

niltecedu commented May 13, 2024

Hey @trivialfis

Here are the feature names:

['t_2m:C_53.77_1.702', 't_2m:C_53.84_1.767', 't_2m:C_53.9_1.832', 't_2m:C_53.97_1.897', 't_2m:C_54.03_1.962', 't_2m:C_54.1_2.027', 'wind_speed_10m:ms_53.77_1.702', 'wind_speed_10m:ms_53.84_1.767', 'wind_speed_10m:ms_53.9_1.832', 'wind_speed_10m:ms_53.97_1.897', 'wind_speed_10m:ms_54.03_1.962', 'wind_speed_10m:ms_54.1_2.027', 'wind_speed_100m:ms_53.77_1.702', 'wind_speed_100m:ms_53.84_1.767', 'wind_speed_100m:ms_53.9_1.832', 'wind_speed_100m:ms_53.97_1.897', 'wind_speed_100m:ms_54.03_1.962', 'wind_speed_100m:ms_54.1_2.027', 'wind_dir_10m:d_53.77_1.702', 'wind_dir_10m:d_53.84_1.767', 'wind_dir_10m:d_53.9_1.832', 'wind_dir_10m:d_53.97_1.897', 'wind_dir_10m:d_54.03_1.962', 'wind_dir_10m:d_54.1_2.027', 'wind_dir_100m:d_53.77_1.702', 'wind_dir_100m:d_53.84_1.767', 'wind_dir_100m:d_53.9_1.832', 'wind_dir_100m:d_53.97_1.897', 'wind_dir_100m:d_54.03_1.962', 'wind_dir_100m:d_54.1_2.027', 'precip_1h:mm_53.77_1.702', 'precip_1h:mm_53.84_1.767', 'precip_1h:mm_53.9_1.832', 'precip_1h:mm_53.97_1.897', 'precip_1h:mm_54.03_1.962', 'precip_1h:mm_54.1_2.027', 'relative_humidity_2m:p_53.77_1.702', 'relative_humidity_2m:p_53.84_1.767', 'relative_humidity_2m:p_53.9_1.832', 'relative_humidity_2m:p_53.97_1.897', 'relative_humidity_2m:p_54.03_1.962', 'relative_humidity_2m:p_54.1_2.027', 'Wind_MWh_credit', 'windlimit']

Here is the Python value of the constraints being passed to the regressor object:

{'wind_speed_10m:ms_53.77_1.702': 1, 'wind_speed_10m:ms_53.84_1.767': 1, 'wind_speed_10m:ms_53.9_1.832': 1, 'wind_speed_10m:ms_53.97_1.897': 1, 'wind_speed_10m:ms_54.03_1.962': 1, 'wind_speed_10m:ms_54.1_2.027': 1, 'wind_speed_100m:ms_53.77_1.702': 1, 'wind_speed_100m:ms_53.84_1.767': 1, 'wind_speed_100m:ms_53.9_1.832': 1, 'wind_speed_100m:ms_53.97_1.897': 1, 'wind_speed_100m:ms_54.03_1.962': 1, 'wind_speed_100m:ms_54.1_2.027': 1, 'windlimit': 1, 'relative_humidity_2m:p_53.77_1.702': -1, 'relative_humidity_2m:p_53.84_1.767': -1, 'relative_humidity_2m:p_53.9_1.832': -1, 'relative_humidity_2m:p_53.97_1.897': -1, 'relative_humidity_2m:p_54.03_1.962': -1, 'relative_humidity_2m:p_54.1_2.027': -1}
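For anyone debugging a similar setup: XGBoost also accepts monotone constraints in positional form, one value per feature column. The sketch below is only a possible workaround, not a confirmed fix for this issue. It expands a name-keyed dict like the one above into a tuple ordered by the training columns. The helper name `to_positional` and the three-column feature list are hypothetical, chosen for illustration.

```python
from typing import Mapping, Sequence, Tuple


def to_positional(
    constraints: Mapping[str, int], feature_names: Sequence[str]
) -> Tuple[int, ...]:
    """Expand a name-keyed constraint dict into one value per feature column.

    Features that are not mentioned in `constraints` get 0 (unconstrained).
    """
    # Fail early if the dict names a column that is not in the feature list.
    unknown = sorted(set(constraints) - set(feature_names))
    if unknown:
        raise KeyError(f"constraints reference unknown features: {unknown}")

    # Only -1, 0 and 1 are meaningful monotone directions.
    bad = {k: v for k, v in constraints.items() if v not in (-1, 0, 1)}
    if bad:
        raise ValueError(f"constraint values must be -1, 0 or 1: {bad}")

    return tuple(int(constraints.get(name, 0)) for name in feature_names)


# Hypothetical three-column example (a subset of the names in this thread).
features = [
    "t_2m:C_53.77_1.702",
    "wind_speed_10m:ms_53.77_1.702",
    "relative_humidity_2m:p_53.77_1.702",
]
mono = {
    "wind_speed_10m:ms_53.77_1.702": 1,
    "relative_humidity_2m:p_53.77_1.702": -1,
}
positional = to_positional(mono, features)
print(positional)  # (0, 1, -1)
```

The feature list passed in should be the columns actually used for training, in the same order, since the positional form depends entirely on column order.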

@trivialfis
Member

Hi, I tried to create a reproducer based on the parameters, but couldn't see the error:

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.datasets import make_regression

alpha = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

mono = {
    "wind_speed_10m:ms_53.77_1.702": 1,
    "wind_speed_10m:ms_53.84_1.767": 1,
    "wind_speed_10m:ms_53.9_1.832": 1,
    "wind_speed_10m:ms_53.97_1.897": 1,
    "wind_speed_10m:ms_54.03_1.962": 1,
    "wind_speed_10m:ms_54.1_2.027": 1,
    "wind_speed_100m:ms_53.77_1.702": 1,
    "wind_speed_100m:ms_53.84_1.767": 1,
    "wind_speed_100m:ms_53.9_1.832": 1,
    "wind_speed_100m:ms_53.97_1.897": 1,
    "wind_speed_100m:ms_54.03_1.962": 1,
    "wind_speed_100m:ms_54.1_2.027": 1,
    "windlimit": 1,
    "relative_humidity_2m:p_53.77_1.702": -1,
    "relative_humidity_2m:p_53.84_1.767": -1,
    "relative_humidity_2m:p_53.9_1.832": -1,
    "relative_humidity_2m:p_53.97_1.897": -1,
    "relative_humidity_2m:p_54.03_1.962": -1,
    "relative_humidity_2m:p_54.1_2.027": -1,
}

fname = [
    "t_2m:C_53.77_1.702",
    "t_2m:C_53.84_1.767",
    "t_2m:C_53.9_1.832",
    "t_2m:C_53.97_1.897",
    "t_2m:C_54.03_1.962",
    "t_2m:C_54.1_2.027",
    "wind_speed_10m:ms_53.77_1.702",
    "wind_speed_10m:ms_53.84_1.767",
    "wind_speed_10m:ms_53.9_1.832",
    "wind_speed_10m:ms_53.97_1.897",
    "wind_speed_10m:ms_54.03_1.962",
    "wind_speed_10m:ms_54.1_2.027",
    "wind_speed_100m:ms_53.77_1.702",
    "wind_speed_100m:ms_53.84_1.767",
    "wind_speed_100m:ms_53.9_1.832",
    "wind_speed_100m:ms_53.97_1.897",
    "wind_speed_100m:ms_54.03_1.962",
    "wind_speed_100m:ms_54.1_2.027",
    "wind_dir_10m:d_53.77_1.702",
    "wind_dir_10m:d_53.84_1.767",
    "wind_dir_10m:d_53.9_1.832",
    "wind_dir_10m:d_53.97_1.897",
    "wind_dir_10m:d_54.03_1.962",
    "wind_dir_10m:d_54.1_2.027",
    "wind_dir_100m:d_53.77_1.702",
    "wind_dir_100m:d_53.84_1.767",
    "wind_dir_100m:d_53.9_1.832",
    "wind_dir_100m:d_53.97_1.897",
    "wind_dir_100m:d_54.03_1.962",
    "wind_dir_100m:d_54.1_2.027",
    "precip_1h:mm_53.77_1.702",
    "precip_1h:mm_53.84_1.767",
    "precip_1h:mm_53.9_1.832",
    "precip_1h:mm_53.97_1.897",
    "precip_1h:mm_54.03_1.962",
    "precip_1h:mm_54.1_2.027",
    "relative_humidity_2m:p_53.77_1.702",
    "relative_humidity_2m:p_53.84_1.767",
    "relative_humidity_2m:p_53.9_1.832",
    "relative_humidity_2m:p_53.97_1.897",
    "relative_humidity_2m:p_54.03_1.962",
    "relative_humidity_2m:p_54.1_2.027",
    "Wind_MWh_credit",
    "windlimit",
]

n_features = len(fname)

X, y = make_regression(256, n_features)
X_df = pd.DataFrame(X, columns=fname)

xgb_regressor_wind = xgb.XGBRegressor(
    objective="reg:quantileerror",
    quantile_alpha=alpha,
    # n_jobs=1,
    colsample_bytree=0.8,
    gamma=0.3,
    learning_rate=0.02,
    max_depth=4,
    n_estimators=800,
    subsample=0.6,
    min_child_weight=3,
    monotone_constraints=mono,
)
xgb_regressor_wind.fit(X_df, y)

@niltecedu
Author

Hey, the problem isn't during fitting but rather during prediction. Fitting the model is fine, but using the fitted model causes issues.

@trivialfis
Member

@niltecedu I added a predict call to the previous snippet, and it works fine as well.

@niltecedu
Author

Hey @trivialfis, oddly enough this snippet does work, but when I use my actual weather data it doesn't :/
I can't seem to get to the root of the issue. The monotone dict is auto-generated based on the DataFrame column names.

But the columns and the dict I sent are what fail in my program.
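The generator used in the report (`data_loader.create_wind_mono_cst`) is not shown in this thread, so the following is only a guess at what generating a constraint dict from column names might look like. The function name `build_constraints`, the prefix rules, and the sample columns are all assumptions. The point of the sketch is the final checks: the result should be a real `dict` with plain `int` values.

```python
from typing import Dict, Iterable, Mapping


def build_constraints(
    columns: Iterable[str], rules: Mapping[str, int]
) -> Dict[str, int]:
    """Return {column: direction} for each column that starts with a rule prefix."""
    result: Dict[str, int] = {}
    for col in columns:
        for prefix, direction in rules.items():
            if col.startswith(prefix):
                result[col] = direction
                break  # first matching prefix wins
    return result


# Hypothetical prefix rules, modelled on the constraint dict posted earlier.
rules = {"wind_speed_": 1, "windlimit": 1, "relative_humidity_": -1}
columns = [
    "t_2m:C_53.77_1.702",
    "wind_speed_10m:ms_53.77_1.702",
    "relative_humidity_2m:p_53.77_1.702",
    "windlimit",
]
constraints = build_constraints(columns, rules)

# Sanity checks worth running on the real generator's output as well.
assert isinstance(constraints, dict)
assert all(type(v) is int for v in constraints.values())
assert set(constraints) <= set(columns)
```

Running the same three assertions against the real generator's return value, right before it is passed to `XGBRegressor`, would show whether the value has the expected type at that point.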

@trivialfis
Member

Is it possible that you turned the parameter into a string instead of a dictionary?
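A quick way to test that hypothesis is to inspect the type of the value before it reaches the model. The sketch below uses only the standard library. It returns a dict unchanged, and turns a string that looks like a dict literal back into a dict with `ast.literal_eval`. The helper name `coerce_constraints` is hypothetical, and whether a stringified dict is the actual cause here is unconfirmed.

```python
import ast
from typing import Any, Dict


def coerce_constraints(value: Any) -> Dict[str, int]:
    """Return a name-keyed constraint dict, parsing a dict-literal string if needed."""
    if isinstance(value, dict):
        return {str(k): int(v) for k, v in value.items()}
    if isinstance(value, str):
        # literal_eval only evaluates Python literals, never arbitrary code.
        # It raises ValueError or SyntaxError if the string is not a literal.
        parsed = ast.literal_eval(value)
        if isinstance(parsed, dict):
            return {str(k): int(v) for k, v in parsed.items()}
    raise TypeError(f"expected a dict or a dict-literal string, got {type(value).__name__}")


# Simulate a dict that was accidentally stringified somewhere along the way.
as_string = str({"windlimit": 1, "relative_humidity_2m:p_53.77_1.702": -1})
print(type(as_string).__name__)  # str

restored = coerce_constraints(as_string)
print(type(restored).__name__)  # dict
```

On a loaded model, printing `type(wind_model.get_params()["monotone_constraints"])` before calling `predict` would show which case applies.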

@trivialfis
Member

Feel free to reopen if there's a way to reproduce it.
