Results to_pandas() method is turning a list into a string #468

Open
rbyh opened this issue May 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@rbyh
Contributor

rbyh commented May 10, 2024

image
@rbyh rbyh added the bug Something isn't working label May 10, 2024
@rbyh
Contributor Author

rbyh commented May 10, 2024

In this example I have responses to a QuestionCheckBox question, each of which is a list of strings. When I convert the results to a dataframe, the lists of selected options are converted into strings that look like lists.

@johnjosephhorton
Contributor

@rbyh Can you investigate best practices with pandas here? Pandas is meant to be a 'flat' format, so I don't know what we should be doing.

@rbyh
Contributor Author

rbyh commented May 11, 2024

Pandas should preserve the format, e.g., here a column containing lists of strings keeps that type:

import pandas as pd

# Example DataFrame with a list in a column
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Interests': [['reading', 'cycling'], ['painting'], ['writing', 'cooking']]}
df = pd.DataFrame(data)

type(df['Interests'][0])

Will return:

<class 'list'>

I think the issue is the intermediate CSV conversion steps in to_pandas(). We can probably skip them with this fix:

import pandas as pd

    # Method on the Results class, shown at its in-class indentation.
    def to_pandas(self, remove_prefix: bool = False) -> pd.DataFrame:
        """Convert the results to a pandas DataFrame, ensuring that lists remain as lists.

        :param remove_prefix: Whether to remove the prefix from the column names.

        """
        df = pd.DataFrame(self.data)  
        
        if remove_prefix:
            # Optionally remove prefixes from column names
            df.columns = [col.split('.')[-1] for col in df.columns]
        
        df_sorted = df.sort_index(axis=1)  # Sort columns alphabetically
        return df_sorted
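
To illustrate the CSV diagnosis above, here is a minimal sketch of how a CSV round trip turns a list into a string. It assumes the intermediary step behaves like `DataFrame.to_csv` followed by `pd.read_csv`; the actual internals of to_pandas() are not shown in this thread, and the column name `answer.interests` is hypothetical.

```python
import io

import pandas as pd

# Hypothetical data standing in for checkbox answers (lists of strings).
df = pd.DataFrame({"answer.interests": [["reading", "cycling"], ["painting"]]})

# Round trip through CSV text, as an intermediary CSV step might do (assumption).
csv_text = df.to_csv(index=False)
round_tripped = pd.read_csv(io.StringIO(csv_text))

original_value = df["answer.interests"].iloc[0]
restored_value = round_tripped["answer.interests"].iloc[0]

print(type(original_value))  # the value is still a list before the round trip
print(type(restored_value))  # after the round trip it comes back as a str
print(restored_value)
```

CSV has no list type, so each list is written as its string representation and read back as plain text, which would explain the behavior reported here.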

@johnjosephhorton
Contributor

It's a good fix but it broke some other tests in a complicated way, so I'm not quite ready to implement.

@rbyh
Contributor Author

rbyh commented Jun 1, 2024

Bumping this. While working on examples for extracting themes and turning them into checkbox question options, I frequently need to add a step that transforms the list-as-string back into a true list.
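
A minimal sketch of that kind of transformation step, assuming the strings are Python list literals such as `"['a', 'b']"`. The column name `themes` and the helper `parse_list` are hypothetical, not part of the library.

```python
import ast

import pandas as pd

# Hypothetical column holding list-as-string values.
df = pd.DataFrame({"themes": ["['price', 'quality']", "['service']", "[]"]})


def parse_list(value):
    """Return a real list for strings like "['a', 'b']"; pass anything else through."""
    if isinstance(value, str):
        try:
            parsed = ast.literal_eval(value)  # safely evaluates Python literals only
        except (ValueError, SyntaxError):
            return value  # not a Python literal, leave unchanged
        if isinstance(parsed, list):
            return parsed
    return value


df["themes"] = df["themes"].apply(parse_list)
print(type(df["themes"].iloc[0]))
```

`ast.literal_eval` is used rather than `eval` because it only accepts literals, so arbitrary text in a response cannot execute code.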

@rbyh
Contributor Author

rbyh commented Jun 1, 2024

image


2 participants