[BUG]: Cannot use predict_proba() on a finalized model #3979

Open
3 tasks done
DrBrule opened this issue Apr 17, 2024 · 1 comment
Labels
bug Something isn't working

Comments

DrBrule commented Apr 17, 2024

pycaret version checks

Issue Description

I have trained a RandomForest classifier and would like to finalize it for deployment. I also need the class probabilities so that I can pass the top 5 or so classes to a downstream ensemble model, and predict_proba is a convenient way to get them. It works on the trained model.

However, once I run finalize_model on my rf, I can no longer call predict_proba on the finalized model.
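
For context on the goal described above, here is a minimal sketch of picking the top-k classes from one row of probabilities. It is not taken from the issue: `top_k_classes` is a hypothetical helper, and the `classes` and `proba_row` values are made-up stand-ins for what `rf.classes_` and one row of `rf.predict_proba(...)` might hold.

```python
import heapq


def top_k_classes(proba_row, classes, k=5):
    """Return up to k (class_label, probability) pairs, highest probability first."""
    pairs = zip(classes, proba_row)
    return heapq.nlargest(k, pairs, key=lambda pair: pair[1])


# Hypothetical stand-ins for rf.classes_ and one row of rf.predict_proba(...)
classes = ["a", "b", "c", "d"]
proba_row = [0.10, 0.55, 0.05, 0.30]

top2 = top_k_classes(proba_row, classes, k=2)
print(top2)  # [('b', 0.55), ('d', 0.3)]
```

Because the helper only iterates over its inputs, it should also accept a NumPy row and a NumPy array of class labels, though that is not exercised here.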

Reproducible Example

from pycaret.classification import ClassificationExperiment

# get data (not provided here)
filtered = load_data()

multi_class = ['facebook_interest_id',
        'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
        'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
        'max_has_text','mean_budget','max_genre_id','time_signature'] 

data = filtered[multi_class]
# categories - are max_genre_id, max_has_text, key, mode, facebook_interest_id

s = ClassificationExperiment()
s.setup(data, 
    target = 'facebook_interest_id',
    session_id = 123,
    categorical_features = ['max_genre_id','max_has_text','mode','key'],
    train_size=.8,
    remove_multicollinearity = True,
    multicollinearity_threshold=.5
    )

rf          = s.create_model('rf' , max_depth=17, min_samples_split=3)
final_rf    = s.finalize_model(rf)

# works fine
s.predict_model(final_rf, data.iloc[[1]] )


# get example data for PoC testing, single line for inference
test_data  = s.pipeline.transform( data.iloc[[1]].drop(columns='facebook_interest_id') )

# works fine
rf.predict_proba(test_data)

# errors out
final_rf.predict_proba(X=test_data)


Expected Behavior

final_rf.predict_proba(test_data) should return class probabilities, just as the non-finalized model does:

rf.predict_proba(test_data)

array([[1.11865776e-03, 0.00000000e+00, 8.07102502e-06, ...,
0.00000000e+00, 1.57480315e-04, 5.23560209e-05]])


Actual Results

```python-traceback
pred = final_rf.predict_proba(data)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/pipeline.py", line 341, in predict_proba
    Xt = transform.transform(Xt)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/preprocess/transformers.py", line 233, in transform
    X = to_df(X, index=getattr(y, "index", None))
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/utils/generic.py", line 103, in to_df
    data = pd.DataFrame(data, index, columns)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/frame.py", line 822, in __init__
    mgr = ndarray_to_mgr(
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 319, in ndarray_to_mgr
    values = _prep_ndarraylike(values, copy=copy_on_sanitize)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 575, in _prep_ndarraylike
    values = np.array([convert(v) for v in values])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
```


Installed Versions

<details>
System:
    python: 3.10.14 (main, Mar 21 2024, 11:21:31) [Clang 14.0.6 ]
executable: /Users/jack/miniforge3/envs/pycaret/bin/python
   machine: macOS-13.6-arm64-arm-64bit

PyCaret required dependencies:
                 pip: 23.3.1
          setuptools: 68.2.2
             pycaret: 3.3.0
             IPython: 8.23.0
          ipywidgets: 8.1.2
                tqdm: 4.66.2
               numpy: 1.26.4
              pandas: 2.1.4
              jinja2: 3.1.3
               scipy: 1.11.4
              joblib: 1.3.0
             sklearn: 1.4.2
                pyod: 1.1.3
            imblearn: 0.12.2
   category_encoders: 2.6.3
            lightgbm: 4.1.0
               numba: 0.59.1
            requests: 2.31.0
          matplotlib: 3.7.5
          scikitplot: 0.3.7
         yellowbrick: 1.5
              plotly: 5.20.0
    plotly-resampler: Not installed
             kaleido: 0.2.1
           schemdraw: 0.15
         statsmodels: 0.14.1
              sktime: 0.28.0
               tbats: 1.1.3
            pmdarima: 2.0.4
              psutil: 5.9.8
          markupsafe: 2.1.5
             pickle5: Not installed
         cloudpickle: 3.0.0
         deprecation: 2.1.0
              xxhash: 3.4.1
           wurlitzer: 3.0.3

PyCaret optional dependencies:
                shap: 0.45.0
           interpret: 0.6.0
                umap: 0.5.6
     ydata_profiling: 4.7.0
  explainerdashboard: 0.4.7
             autoviz: Not installed
           fairlearn: 0.7.0
          deepchecks: Not installed
             xgboost: 2.0.3
            catboost: 1.1.1
              kmodes: 0.12.2
             mlxtend: 0.23.1
       statsforecast: 1.5.0
        tune_sklearn: 0.5.0
                 ray: 2.10.0
            hyperopt: 0.2.7
              optuna: 3.6.1
               skopt: 0.10.1
              mlflow: 2.11.3
              gradio: 4.26.0
             fastapi: 0.110.1
             uvicorn: 0.29.0
              m2cgen: 0.10.0
           evidently: 0.4.16
               fugue: 0.8.6
           streamlit: Not installed
             prophet: Not installed</details>
DrBrule added the bug (Something isn't working) label Apr 17, 2024

DrBrule commented Apr 19, 2024

Update: the same error occurs when I load a model after saving it. The loaded model can run predict() but not predict_proba().

I upgraded to pycaret 3.3.1, thinking the problem might be related to joblib / pickling, but still no luck.

>>> rf.predict(df_features.iloc[0:1])
0    6003180715102
Name: facebook_interest_id, dtype: int64
>>> rf.predict_proba(df_features.iloc[0:1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/pipeline.py", line 341, in predict_proba
    Xt = transform.transform(Xt)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/preprocess/transformers.py", line 233, in transform
    X = to_df(X, index=getattr(y, "index", None))
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/utils/generic.py", line 103, in to_df
    data = pd.DataFrame(data, index, columns)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/frame.py", line 822, in __init__
    mgr = ndarray_to_mgr(
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 319, in ndarray_to_mgr
    values = _prep_ndarraylike(values, copy=copy_on_sanitize)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 575, in _prep_ndarraylike
    values = np.array([convert(v) for v in values])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Somewhat astonishingly, predict_log_proba() works fine. Baffling.
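
Given the observation that predict_log_proba() works, two possible stopgaps can be sketched. Neither is a confirmed fix and neither has been run against PyCaret here. The first exponentiates the log-probabilities. The second assumes the finalized model is a scikit-learn-style pipeline that exposes `.steps`, and calls predict_proba on its last step with features that were already transformed (as `test_data` is in the example above). The helper names and the stub classes are hypothetical; the stubs exist only so the sketch runs on its own.

```python
import math


def proba_from_log_proba(model, X):
    """Recover probabilities by exponentiating model.predict_log_proba(X)."""
    log_proba = model.predict_log_proba(X)
    # math.exp(-inf) is 0.0, so zero-probability classes map back to 0.
    return [[math.exp(value) for value in row] for row in log_proba]


def final_estimator_proba(pipeline, X_transformed):
    """Call predict_proba on the last pipeline step, skipping preprocessing.

    Assumes X_transformed has already been through the preprocessing steps.
    """
    estimator = pipeline.steps[-1][1]
    return estimator.predict_proba(X_transformed)


class StubEstimator:
    """Hypothetical stand-in for a fitted classifier."""

    def predict_proba(self, X):
        return [[0.25, 0.75, 0.0] for _ in X]

    def predict_log_proba(self, X):
        return [
            [math.log(p) if p > 0 else float("-inf") for p in row]
            for row in self.predict_proba(X)
        ]


class StubPipeline:
    """Hypothetical stand-in for a finalized pipeline exposing .steps."""

    def __init__(self, estimator):
        self.steps = [("model", estimator)]

    def predict_log_proba(self, X):
        return self.steps[-1][1].predict_log_proba(X)


stub_pipeline = StubPipeline(StubEstimator())
one_row = [[0.0, 1.0]]  # placeholder features

recovered = proba_from_log_proba(stub_pipeline, one_row)
direct = final_estimator_proba(stub_pipeline, one_row)
print(recovered)
print(direct)
```

With the real objects, the calls would look like `proba_from_log_proba(final_rf, ...)` and `final_estimator_proba(final_rf, test_data)`. Whether either matches the output of `rf.predict_proba` would need to be checked against the non-finalized model.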
