I have trained a RandomForest classifier - I'd like to finalize it for deployment, but one of the things I would like is to get the probabilities of the classes to get the top 5 or so outputs to pass to a downstream ensemble model. predict_proba is a nice way for this to work. It works on the trained model.
However, if I run finalize_model on my rf, I can no longer use predict_proba on that finalized model.
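To make the downstream goal concrete, here is a minimal sketch of how the top-k classes could be pulled out of a `predict_proba` result. The helper name `top_k_classes` and the toy arrays are hypothetical stand-ins; in practice the inputs would be something like `rf.predict_proba(test_data)` and `rf.classes_`.

```python
import numpy as np


def top_k_classes(proba, classes, k=5):
    """Return the k most probable (label, probability) pairs for each row."""
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    k = min(k, proba.shape[1])
    # Sort column indices by descending probability; the stable sort keeps
    # the original class order for ties. Then keep the first k columns.
    top_idx = np.argsort(-proba, axis=1, kind="stable")[:, :k]
    top_proba = np.take_along_axis(proba, top_idx, axis=1)
    return [
        list(zip(classes[idx].tolist(), p.tolist()))
        for idx, p in zip(top_idx, top_proba)
    ]


# Toy stand-ins (assumed values, not real model output).
example_proba = np.array([[0.1, 0.6, 0.05, 0.25]])
example_classes = np.array([101, 102, 103, 104])
top2 = top_k_classes(example_proba, example_classes, k=2)
```

Each row of the result is a list of `(class_label, probability)` pairs, which is the shape I would hand to the downstream ensemble.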
Reproducible Example
from pycaret.classification import ClassificationExperiment

# get data (not provided here)
data=load_data()
multi_class= ['facebook_interest_id',
'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
'max_has_text','mean_budget','max_genre_id','time_signature']
data=filtered[multi_class]
# categories - are max_genre_id, max_has_text, key, mode, facebook_interest_id
s=ClassificationExperiment()
s.setup(data,
target='facebook_interest_id',
session_id=123,
categorical_features= ['max_genre_id','max_has_text','mode','key'],
train_size=.8,
remove_multicollinearity=True,
multicollinearity_threshold=.5
)
rf=s.create_model('rf' , max_depth=17, min_samples_split=3)
final_rf=s.finalize_model(rf)
# works fine
s.predict_model(final_rf, data.iloc[[1]])
# get example data for PoC testing, single line for inference
test_data=s.pipeline.transform(data.iloc[[1]].drop(columns='facebook_interest_id'))
# works fine
rf.predict_proba(test_data)
# errors out
final_rf.predict_proba(X=test_data)
### Actual Results
```python-traceback
pred = final_rf.predict_proba(data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/pipeline.py", line 341, in predict_proba
Xt = transform.transform(Xt)
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/preprocess/transformers.py", line 233, in transform
X = to_df(X, index=getattr(y, "index", None))
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/utils/generic.py", line 103, in to_df
data = pd.DataFrame(data, index, columns)
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/frame.py", line 822, in __init__
mgr = ndarray_to_mgr(
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 319, in ndarray_to_mgr
values = _prep_ndarraylike(values, copy=copy_on_sanitize)
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 575, in _prep_ndarraylike
values = np.array([convert(v) for v in values])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
```
Update: the same error occurs when I load a model after saving it. The loaded model can run predict() but not predict_proba().
I upgraded to pycaret 3.3.1, thinking the problem might be related to joblib/pickling, but still no luck.
>>> rf.predict(df_features.iloc[0:1])
0 6003180715102
Name: facebook_interest_id, dtype: int64
>>> rf.predict_proba(df_features.iloc[0:1])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/pipeline.py", line 341, in predict_proba
Xt = transform.transform(Xt)
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/internal/preprocess/transformers.py", line 233, in transform
X = to_df(X, index=getattr(y, "index", None))
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pycaret/utils/generic.py", line 103, in to_df
data = pd.DataFrame(data, index, columns)
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/frame.py", line 822, in __init__
mgr = ndarray_to_mgr(
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 319, in ndarray_to_mgr
values = _prep_ndarraylike(values, copy=copy_on_sanitize)
File "/Users/jack/miniforge3/envs/pycaret/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 575, in _prep_ndarraylike
values = np.array([convert(v) for v in values])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
Somewhat astonishingly, predict_log_proba() works fine. Baffling.
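Since `predict_log_proba()` works, one possible stopgap is to exponentiate its output to get the probabilities back. This is only a sketch: it assumes the finalized pipeline's `predict_log_proba` returns the elementwise log of the class probabilities, which I have not verified against the pipeline internals. `proba_via_log_proba` and `_StubModel` are hypothetical names used for illustration.

```python
import numpy as np


def proba_via_log_proba(model, X):
    """Recover class probabilities by exponentiating predict_log_proba output."""
    log_proba = np.asarray(model.predict_log_proba(X), dtype=float)
    # exp(-inf) == 0.0, so classes with zero probability come back as 0.
    return np.exp(log_proba)


class _StubModel:
    """Hypothetical stand-in for the finalized pipeline (illustration only)."""

    def predict_log_proba(self, X):
        # log(0) is -inf; suppress the divide-by-zero warning for the demo.
        with np.errstate(divide="ignore"):
            return np.log(np.array([[0.5, 0.5, 0.0]]))


recovered = proba_via_log_proba(_StubModel(), X=None)
```

With the real objects the call would look like `proba_via_log_proba(final_rf, test_data)`; whether the values match `rf.predict_proba` should be checked before relying on it.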
pycaret version checks
I have checked that this issue has not already been reported here.
I have confirmed this bug exists on the latest version of pycaret.
I have confirmed this bug exists on the master branch of pycaret (pip install -U git+https://github.com/pycaret/pycaret.git@master).
For comparison, calling predict_proba on the trained (non-finalized) model returns class probabilities as expected:
rf.predict_proba(test_data.drop(columns='facebook_interest_id'))
array([[1.11865776e-03, 0.00000000e+00, 8.07102502e-06, ...,
0.00000000e+00, 1.57480315e-04, 5.23560209e-05]])