[BUG] ValueError: Expected Pandas/Arrow/cuDF/Spark dataframe(s) or igraph/NetworkX graph when calling spark.sql() #556

Open
DataBoyTX opened this issue Mar 15, 2024 · 1 comment
@DataBoyTX
Contributor

Describe the bug

The following code used to work but now throws an error. Presumably the type of the resulting df changed from SparkDataFrame to pyspark.sql.connect.dataframe.DataFrame.

df = spark.sql("SELECT * FROM honeypot")

g2 = graphistry.edges(df, 'attackerIP', 'victimIP')

g2.plot()

Simply adding .toPandas() to the df passed to edges() fixes the problem, but we should handle this in the client.
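Until the client handles this, the workaround can be wrapped in a small helper. This is only a sketch: `to_pandas_if_spark` is a hypothetical name, not part of PyGraphistry, and it assumes that any object whose class lives under a `pyspark.sql` module and exposes a callable `toPandas` is a Spark DataFrame (classic or Spark Connect). It checks by module name so it does not need to import pyspark itself.

```python
def to_pandas_if_spark(df):
    """Return a pandas DataFrame for Spark inputs; return anything else unchanged.

    Hypothetical helper (not a PyGraphistry API). Detection is duck-typed:
    the object's class must be defined under a pyspark.sql module and must
    expose a callable toPandas().
    """
    module = type(df).__module__ or ""
    to_pandas = getattr(df, "toPandas", None)
    if module.startswith("pyspark.sql") and callable(to_pandas):
        # toPandas() collects the full result onto the driver, so this only
        # suits data that fits in driver memory.
        return to_pandas()
    return df
```

With this in place, the failing call would become `graphistry.edges(to_pandas_if_spark(df), 'attackerIP', 'victimIP')`, and pandas, Arrow, or cuDF inputs would pass through untouched.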

error:


ValueError: Expected Pandas/Arrow/cuDF/Spark dataframe(s) or igraph/NetworkX graph.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File <command-2934552628071172>, line 1
----> 1 g.plot()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/graphistry/PlotterBase.py:1404, in PlotterBase.plot(self, graph, nodes, name, description, render, skip_upload, as_files, memoize, extra_html, override_html_style)
   1401 PyGraphistry.refresh()
   1402 logger.debug("4. @PloatterBase plot: PyGraphistry.org_name(): {}".format(PyGraphistry.org_name()))
-> 1404 dataset = self._plot_dispatch(g, n, name, description, 'arrow', self._style, memoize)
   1405 if skip_upload:
   1406     return dataset

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/graphistry/PlotterBase.py:1701, in PlotterBase._plot_dispatch(self, graph, nodes, name, description, mode, metadata, memoize)
   1698 except ImportError:
   1699     pass
-> 1701 error('Expected Pandas/Arrow/cuDF/Spark dataframe(s) or igraph/NetworkX graph.')

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/graphistry/util.py:280, in error(msg)
    279 def error(msg):
--> 280     raise ValueError(msg)

ValueError: Expected Pandas/Arrow/cuDF/Spark dataframe(s) or igraph/NetworkX graph.
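The traceback shows `_plot_dispatch` falling through to the generic error, which is consistent with the input not matching any DataFrame type it checks for. One possible client-side approach, sketched here under assumptions, is to collect every Spark DataFrame class that can be imported in the current environment and test against all of them. The names `CANDIDATES`, `spark_dataframe_types`, and `is_spark_dataframe` are hypothetical and this is not PyGraphistry's actual dispatch code. The two module paths are the classic `pyspark.sql.DataFrame` and the Spark Connect class named in this report.

```python
import importlib

# (module, class) pairs to probe. Either may be missing depending on the
# installed PySpark version and whether Spark Connect dependencies exist.
CANDIDATES = (
    ("pyspark.sql", "DataFrame"),
    ("pyspark.sql.connect.dataframe", "DataFrame"),
)


def spark_dataframe_types():
    """Return a tuple of the Spark DataFrame classes importable right now."""
    found = []
    for module_name, class_name in CANDIDATES:
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # Module absent or its optional dependencies are not installed.
            continue
        cls = getattr(module, class_name, None)
        if isinstance(cls, type) and cls not in found:
            found.append(cls)
    return tuple(found)


def is_spark_dataframe(obj):
    """True if obj is an instance of any importable Spark DataFrame class."""
    # isinstance() with an empty tuple is simply False, so this is safe
    # when PySpark is not installed at all.
    return isinstance(obj, spark_dataframe_types())
```

Probing by import keeps PySpark an optional dependency, and adding another DataFrame flavor later would only mean extending `CANDIDATES`.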

To Reproduce

Lab 2 - Data Preparation and Styling-ExpectedPandasArrowSparkDataframe.zip

@lmeyerov
Contributor

We should support multiple Spark versions. It sounds like this potentially impacts these:
