[BUG]: #3943

Open
3 tasks done
am-vaibhav opened this issue Mar 13, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@am-vaibhav

pycaret version checks

Issue Description

There is an issue when finalizing the model. The code is:

setup(oppr, target='stage_name', ignore_features=ignore_columns, fix_imbalance=True,
      normalize=True, normalize_method='robust', transformation=True,
      fold_strategy='stratifiedkfold', fold=5, fold_shuffle=True)
best = compare_models(include=['rf'], sort='F1')
final_best = finalize_model(best)

The error is:

*** IndexError: Length of values (7530) does not match the length of index (6857). This usually happens when transformations that drop rows aren't applied on all the columns.

I believe this happens because SMOTE is used to fix the imbalanced target. How can I fix it?

Reproducible Example

# Assumes the classification module, since the setup uses fix_imbalance and sorts by F1.
from pycaret.classification import setup, compare_models, finalize_model

# oppr (the DataFrame) and ignore_columns (a list of column names) come from the reporter's data and are not shown.
setup(oppr, target='stage_name', ignore_features=ignore_columns, fix_imbalance=True,
      normalize=True, normalize_method='robust', transformation=True,
      fold_strategy='stratifiedkfold', fold=5, fold_shuffle=True)
best = compare_models(include=['rf'], sort='F1')
final_best = finalize_model(best)  # raises the IndexError shown under Actual Results

Expected Behavior

finalize_model(best) should complete without raising an error when fix_imbalance=True is set in setup().

Actual Results

*** IndexError: Length of values (7530) does not match the length of index (6857). This usually happens when transformations that drop rows aren't applied on all the columns.

Installed Versions

'3.3.0'

@am-vaibhav am-vaibhav added the bug Something isn't working label Mar 13, 2024
@ohamza-dgs

I get the same error when using finalize_model(). My pipeline is just a simple setup() followed by create_model() and then tune_model(). After tuning, I call finalize_model() on the tuned_model object, which throws the error:

"IndexError: Length of values (17512) does not match length of index (18434). This usually happens when transformations that drop rows aren't applied on all the columns."

Although some suggested solutions recommend it, setting "index=True/False" in setup() does not fix the issue. Disabling the "n_features_to_select" and "polynomial_features" parameters in setup() usually fixes it, but not every time!

pycaret.version = 3.2.0

@CJC-ds

CJC-ds commented May 10, 2024

Also encountering this issue in pycaret 3.2.0
Has this been fixed yet?

I also tried setting index=False in setup(), but still encounter the same error.
Feature selection is required in my case, so disabling n_features_to_select, as suggested above, is not really an option for me.

I found a workaround for this...
After checking the source code, the error is raised while preparing a merge between original_df and the transformed df. Because oversampling with SMOTE adds rows for the minority class, the two indices do not align.
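As a rough illustration of the mismatch described above, here is a minimal pandas-only sketch. It is not PyCaret's code path: the frame, the index values, and the "resampled" array are made up, and pandas reports the mismatch as a ValueError rather than the IndexError seen in this issue. It only shows that values produced after oversampling cannot be aligned back onto the shorter original index.

```python
import numpy as np
import pandas as pd

# Hypothetical original data: 3 rows with their own index.
original = pd.DataFrame({"feature": [0.1, 0.2, 0.3]}, index=[10, 11, 12])

# Hypothetical target after oversampling: 2 synthetic rows were added,
# so there are now 5 values for an index of length 3.
resampled_target = np.array([0, 1, 0, 1, 1])

try:
    # Assigning 5 values onto a 3-row frame cannot be aligned.
    original["target"] = resampled_target
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

The same kind of length mismatch appears in the reports above (7530 values against an index of 6857).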

The main purpose of this class method is to return a df with the columns in the correct order.
Order does not really matter in my case, and I have not checked any downstream implications of this fix...

If you care about ordering, you can add your own column order to the monkey patch below at ... .

from pycaret.internal.preprocess.transformers import TransformerWrapper

def _reorder_cols(self, df, original_df):
  ...
  return df

TransformerWrapper._reorder_cols = _reorder_cols
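For anyone who does care about column order, one possible way to fill in such a patch is sketched below. This is an assumption, not the method from the comment above and not PyCaret's implementation: `reorder_like_original` and `StandInWrapper` are hypothetical names, and the stand-in class replaces the real `TransformerWrapper` so the snippet runs without PyCaret installed. It orders columns by name only and never merges on the index, so a different row count does not matter. It also does not re-attach columns that exist only in `original_df`, and it has not been tested against PyCaret itself.

```python
import pandas as pd


def reorder_like_original(df: pd.DataFrame, original_df: pd.DataFrame) -> pd.DataFrame:
    """Order df's columns to follow original_df without merging on the index."""
    # Columns present in both frames, kept in the original order.
    kept = [col for col in original_df.columns if col in df.columns]
    # Columns that only exist after the transformation go at the end.
    new = [col for col in df.columns if col not in original_df.columns]
    return df[kept + new]


def _reorder_cols(self, df, original_df):
    # Same signature as the patch above; delegates to the helper.
    return reorder_like_original(df, original_df)


class StandInWrapper:
    """Hypothetical stand-in for the class being patched."""


# Monkey patch: attach the replacement method to the class.
StandInWrapper._reorder_cols = _reorder_cols

original = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
# Transformed frame: one extra row, shuffled columns, one new column, "b" dropped.
resampled = pd.DataFrame({"c": [5, 6, 7], "new": [0, 0, 0], "a": [1, 2, 3]})

result = StandInWrapper()._reorder_cols(resampled, original)
print(list(result.columns), len(result))
```

With the real library, the assignment would target `TransformerWrapper._reorder_cols` as in the comment above, and it would need to run before `setup()` so the pipeline picks up the patched method.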
