[BUG]: Cannot use predict_proba() on a finalized model #3979

DrBrule · 2024-04-17T15:18:07Z

### pycaret version checks

I have checked that this issue has not already been reported here.
I have confirmed this bug exists on the latest version of pycaret.
I have confirmed this bug exists on the master branch of pycaret (pip install -U git+https://github.com/pycaret/pycaret.git@master).

### Issue Description

I have trained a RandomForest classifier and would like to finalize it for deployment. I also need the class probabilities, so that I can pass the top 5 or so classes to a downstream ensemble model. predict_proba is a natural way to get them, and it works on the trained model.

However, if I run finalize_model on my rf, I can no longer use predict_proba on that finalized model.
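
For context, the top-k step described above can be sketched as follows. This is a minimal illustration and not part of the original report: the helper name `top_k_classes`, the probabilities, and the class labels are all made up, and it assumes the column order of the probability matrix matches the label array passed in (for a scikit-learn classifier that array is its `classes_` attribute).

```python
import numpy as np


def top_k_classes(proba, classes, k=5):
    """Return, for each row, the k most probable (label, probability) pairs, best first.

    Hypothetical helper: `proba` has the shape of a predict_proba result
    (n_samples, n_classes) and `classes` gives the label for each column.
    """
    proba = np.asarray(proba, dtype=float)
    classes = np.asarray(classes)
    k = min(k, proba.shape[1])
    # argsort is ascending, so reverse each row and keep the first k columns
    order = np.argsort(proba, axis=1)[:, ::-1][:, :k]
    return [
        [(classes[j].item(), float(row[j])) for j in idx]
        for row, idx in zip(proba, order)
    ]


# Made-up probabilities for one sample over four made-up class labels
proba = np.array([[0.10, 0.60, 0.05, 0.25]])
classes = np.array([101, 102, 103, 104])

top2 = top_k_classes(proba, classes, k=2)
print(top2)
```

With a real model, `proba` would be the array returned by predict_proba and `classes` the estimator's label array.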

### Reproducible Example

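from pycaret.classification import ClassificationExperiment

# get data (loading and filtering not shown here; load_data() is a placeholder)
filtered = load_data()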

multi_class = ['facebook_interest_id',
        'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
        'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
        'max_has_text','mean_budget','max_genre_id','time_signature'] 

data = filtered[multi_class]
# categories - are max_genre_id, max_has_text, key, mode, facebook_interest_id

s = ClassificationExperiment()
s.setup(data, 
    target = 'facebook_interest_id',
    session_id = 123,
    categorical_features = ['max_genre_id','max_has_text','mode','key'],
    train_size=.8,
    remove_multicollinearity = True,
    multicollinearity_threshold=.5
    )

rf          = s.create_model('rf' , max_depth=17, min_samples_split=3)
final_rf    = s.finalize_model(rf)

# works fine
s.predict_model(final_rf, data.iloc[[1]] )


# get example data for PoC testing, single line for inference
test_data  = s.pipeline.transform( data.iloc[[1]].drop(columns='facebook_interest_id') )

# works fine
rf.predict_proba(test_data)

# errors out
final_rf.predict_proba(X=test_data)
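
One thing the traceback in this report suggests is that `final_rf.predict_proba` goes through `pycaret/internal/pipeline.py`, which means `final_rf` is likely a pipeline that runs preprocessing itself, while `rf` is the bare estimator. The sketch below uses a plain scikit-learn `Pipeline` on synthetic data (not PyCaret's pipeline class, and not the data from this issue) to show the difference between the two entry points. It does not reproduce or fix the error reported here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the real dataset from the issue is not available
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_classes=3, random_state=0
)

pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    ]
)
pipe.fit(X, y)

raw_row = X[:1]

# Whole pipeline: takes raw features and applies the preprocessing itself
proba_pipeline = pipe.predict_proba(raw_row)

# Final step only: expects features that were already preprocessed
transformed_row = pipe[:-1].transform(raw_row)
proba_estimator = pipe[-1].predict_proba(transformed_row)

# Both routes give the same probabilities for the same sample
print(np.allclose(proba_pipeline, proba_estimator))
```

If the finalized PyCaret model behaves the same way, passing already-transformed rows to it would preprocess them a second time. That is an assumption to check, not a confirmed cause.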



### Expected Behavior

rf.predict_proba(test_data)

array([[1.11865776e-03, 0.00000000e+00, 8.07102502e-06, ...,
0.00000000e+00, 1.57480315e-04, 5.23560209e-05]])


### Actual Results

```python-traceback
pred = final_rf.predict_proba(data)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/pipeline.py", line 341, in predict_proba
    Xt = transform.transform(Xt)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/preprocess/transformers.py", line 233, in transform
    X = to_df(X, index=getattr(y, "index", None))
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/utils/generic.py", line 103, in to_df
    data = pd.DataFrame(data, index, columns)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/frame.py", line 822, in __init__
    mgr = ndarray_to_mgr(
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 319, in ndarray_to_mgr
    values = _prep_ndarraylike(values, copy=copy_on_sanitize)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 575, in _prep_ndarraylike
    values = np.array([convert(v) for v in values])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
```



### Installed Versions

<details>
System:
    python: 3.10.14 (main, Mar 21 2024, 11:21:31) [Clang 14.0.6 ]
executable: /Users/jack/miniforge3/envs/pycaret/bin/python
   machine: macOS-13.6-arm64-arm-64bit

PyCaret required dependencies:
                 pip: 23.3.1
          setuptools: 68.2.2
             pycaret: 3.3.0
             IPython: 8.23.0
          ipywidgets: 8.1.2
                tqdm: 4.66.2
               numpy: 1.26.4
              pandas: 2.1.4
              jinja2: 3.1.3
               scipy: 1.11.4
              joblib: 1.3.0
             sklearn: 1.4.2
                pyod: 1.1.3
            imblearn: 0.12.2
   category_encoders: 2.6.3
            lightgbm: 4.1.0
               numba: 0.59.1
            requests: 2.31.0
          matplotlib: 3.7.5
          scikitplot: 0.3.7
         yellowbrick: 1.5
              plotly: 5.20.0
    plotly-resampler: Not installed
             kaleido: 0.2.1
           schemdraw: 0.15
         statsmodels: 0.14.1
              sktime: 0.28.0
               tbats: 1.1.3
            pmdarima: 2.0.4
              psutil: 5.9.8
          markupsafe: 2.1.5
             pickle5: Not installed
         cloudpickle: 3.0.0
         deprecation: 2.1.0
              xxhash: 3.4.1
           wurlitzer: 3.0.3

PyCaret optional dependencies:
                shap: 0.45.0
           interpret: 0.6.0
                umap: 0.5.6
     ydata_profiling: 4.7.0
  explainerdashboard: 0.4.7
             autoviz: Not installed
           fairlearn: 0.7.0
          deepchecks: Not installed
             xgboost: 2.0.3
            catboost: 1.1.1
              kmodes: 0.12.2
             mlxtend: 0.23.1
       statsforecast: 1.5.0
        tune_sklearn: 0.5.0
                 ray: 2.10.0
            hyperopt: 0.2.7
              optuna: 3.6.1
               skopt: 0.10.1
              mlflow: 2.11.3
              gradio: 4.26.0
             fastapi: 0.110.1
             uvicorn: 0.29.0
              m2cgen: 0.10.0
           evidently: 0.4.16
               fugue: 0.8.6
           streamlit: Not installed
             prophet: Not installed</details>

DrBrule · 2024-04-19T15:16:52Z

Update: the same error occurs when I load a model after saving it. The loaded model can run predict() but not predict_proba().

I upgraded to pycaret 3.3.1, thinking the problem might be related to joblib or pickling, but still no luck.

>>> rf.predict(df_features.iloc[0:1])
0    6003180715102
Name: facebook_interest_id, dtype: int64
>>> rf.predict_proba(df_features.iloc[0:1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/pipeline.py", line 341, in predict_proba
    Xt = transform.transform(Xt)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/preprocess/transformers.py", line 233, in transform
    X = to_df(X, index=getattr(y, "index", None))
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/utils/generic.py", line 103, in to_df
    data = pd.DataFrame(data, index, columns)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/frame.py", line 822, in __init__
    mgr = ndarray_to_mgr(
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 319, in ndarray_to_mgr
    values = _prep_ndarraylike(values, copy=copy_on_sanitize)
  File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 575, in _prep_ndarraylike
    values = np.array([convert(v) for v in values])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Somewhat astonishingly, predict_log_proba() works fine. Baffling.
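
For anyone reading the tracebacks above: the final ValueError is raised by NumPy while pandas builds a DataFrame, not by the random forest. The message says the outer sequence had two elements (`(2,)`) that do not share a common shape. The sketch below shows the same kind of failure with a made-up ragged list. It only illustrates the error class. What the two elements actually are inside PyCaret is not visible in the traceback, and the helper name `try_build_array` is hypothetical. On NumPy 1.24 or later (this report lists 1.26.4) the ragged case raises; older versions only warn.

```python
import numpy as np


def try_build_array(values):
    """Return None if np.array(values) succeeds, otherwise the ValueError message."""
    try:
        np.array(values)
    except ValueError as exc:
        return str(exc)
    return None


# Two rows of equal length: NumPy can build a regular 2-D array
rectangular = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Two rows of different length: no regular array shape exists
ragged = [[1.0, 2.0, 3.0], [4.0, 5.0]]

rect_msg = try_build_array(rectangular)
ragged_msg = try_build_array(ragged)
print(rect_msg)
print(ragged_msg)
```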

DrBrule added the bug label on Apr 17, 2024
