Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataframe output of all_pairs_dijkstra_shortest_paths? #1119

Open
thomasaarholt opened this issue Feb 27, 2024 · 1 comment
Open

Dataframe output of all_pairs_dijkstra_shortest_paths? #1119

thomasaarholt opened this issue Feb 27, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@thomasaarholt
Copy link

What is the expected enhancement?

I experience that it is very slow to convert the dictionary returned by all_pairs_dijkstra_shortest_paths on a large graph into a dataframe. Unnesting the dict into three lists (of 1.6 Million rows) works well, but when I read this into a dataframe as follows, it takes an extremely long time and can crash my system.

import polars as pl
mapping = all_pairs_dijkstra_shortest_paths(...)
left, right, path = nested_dict_to_df(mapping)
df_distances = pl.DataFrame(
    [
        pl.Series(name="left", values=left, dtype=pl.Int32),
        pl.Series(name="right", values=right, dtype=pl.Int32),
        pl.Series(name="path", values=path, dtype=pl.List(Float64)),
    ]
)

I believe this is due to the dictionary entries not being contiguous in memory, or some similar effect. It would be very nice to be able to export rustworkx output directly to a dataframe for further processing.

@IvanIsCoding IvanIsCoding added the enhancement New feature or request label Feb 28, 2024
@IvanIsCoding
Copy link
Collaborator

I think this is aligned with #1033, I don't think it will be easy to implement but it is our most requested integration.

In the meantime, maybe you can use .explode() with .apply() to see if it performs slightly better?
Something along these lines of Pandas code to avoid Python loops which are known to be slow. It might be able to be ported to Polars as well.

import rustworkx as rx
import pandas as pd

graph = rx.generators.heavy_square_graph(5)
all_paths = rx.all_pairs_dijkstra_shortest_paths(graph, lambda _: 1.0)

df = pd.DataFrame(
    {
        "U": all_paths.keys(),
        "V": all_paths.values(),
    }
)

df = df.explode("V")
df["U->V"] = df.apply(lambda x: all_paths[x["U"]][x["V"]], axis=1)

If you print(df.tail()):

     U   V                                               U->V
64  64   1  (64, 44, 60, 38, 56, 13, 55, 12, 54, 32, 50, 2...
64  64   5  (64, 44, 60, 18, 59, 17, 58, 16, 57, 35, 53, 1...
64  64  25  (64, 44, 60, 38, 56, 13, 55, 12, 54, 32, 50, 6...
64  64  45  (64, 44, 60, 38, 56, 13, 55, 12, 54, 32, 50, 2...
64  64   0  (64, 44, 60, 38, 56, 13, 55, 12, 54, 32, 50, 2...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants