Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capacity keyword arguments to graph creation #975

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jakelishman
Copy link
Member

This is just a simple exposure of the allocation control from Rust space to Python space.

I was playing around with this for DAG construction in Qiskit and it (surprisingly to me) didn't have a meaningful effect on performance for what I was doing, but even so, it just feels like something I'd like to be able to expose there as well.

@coveralls
Copy link

coveralls commented Aug 29, 2023

Pull Request Test Coverage Report for Build 9098666721

Details

  • 40 of 40 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.05%) to 96.496%

Totals Coverage Status
Change from base Build 9009838732: 0.05%
Covered Lines: 16717
Relevant Lines: 17324

💛 - Coveralls

@IvanIsCoding
Copy link
Collaborator

I’ll let @mtreinish give his opinion as well but I am against exposing this to end-users.

Mostly because Python users should not worry about it. For example, Python lists don’t let you pick a capacity. I think we are leaking a detail from Rust into Python

Also, I think for most of the Qiskit DAG code we spent more time on Rust <-> Python conversion than anything else. For some algorithms it takes more time to convert objects than to actually run the core logic

@jakelishman
Copy link
Member Author

That's fair - it's always going to be the case that the other stuff we're doing is slower, it just seemed unnecessary to me to pay the resize cost even as a papercut when I know how many nodes and edges I'm about to generate to put into my graph.

@IvanIsCoding
Copy link
Collaborator

That's fair - it's always going to be the case that the other stuff we're doing is slower, it just seemed unnecessary to me to pay the resize cost even as a papercut when I know how many nodes and edges I'm about to generate to put into my graph.

I could see a constructor for PyGraph/PyDiGraph which accepts nodes/edges, a little bit similar to what extend_from_edge_list does now

But if we add with_capacity then we need to explain to the end user what is a capacity, why it is different from the number of nodes/edges, etc. Which I would rather not do

@jakelishman
Copy link
Member Author

I have a generator expression (or equivalent opaque iterable) in Python-space (technically it's actually generated in Rust space, but we need to go via DAGCircuit, so that's irrelevant here) whose length I know ahead of time, but I don't actually want to generate Python-space allocated storage for the collection ahead of time, especially since that would then have copy semantics on entry to extend_from_edge_list.

Feel free to close the PR if you don't want to deal with explaining capacity to users - I have no proof that it'd have any measurable impact right now anyway.

@mtreinish
Copy link
Member

mtreinish commented Aug 30, 2023

For me I think we probably want to expose this functionality in some way for performance. Saving on memory allocations can save a bunch of runtime especially for very large graphs. But, I also share Ivan's concerns around exposing too much internal details to the UI. I'm wondering if there is a compromise here on the interface, like instead of doing with_capacity adding kwargs to the constructor like target_node_count and target_edge_count and document the performance benefit and behavior there

@jakelishman
Copy link
Member Author

jakelishman commented Aug 30, 2023

I'm happy to retarget this PR to whatever interface you guys would prefer, if it's something you'd consider adding. From my side, my ask is only that I can reserve space for nodes and/or edges before I actually provide them - anything that lets me do that, no matter how wordy / hidden, is fine for me, if you guys agree to add it.

(and I can of course live with it on our side if it's not added at all, for the above reasons)

@enavarro51
Copy link
Contributor

enavarro51 commented Nov 1, 2023

What about initial_node_count and initial_edge_count? That indicates to a Python user that it's not a limit, but a starting point. I do think it's probably worthwhile, especially since we now have a lot of hybrid Python/Rust developers on the qiskit side and probably elsewhere as well.

Or could call the method initial_size and just use node_count and edge_count. Or ...

@mtreinish mtreinish added this to the 0.15.0 milestone Feb 22, 2024
This adds `initial_node_count` and `initial_edge_count` optional keyword
arguments to the Python-space constructors of `PyGraph` and `PyDiGraph`
to expose the `with_capacity` constructor from Rust space.  This also
easily allows these to be used during pickling operations.
@jakelishman jakelishman changed the title Add with_capacity constructors in Python space Add capacity keyword arguments to graph creation May 15, 2024
@jakelishman
Copy link
Member Author

Updated to use the new forms suggested above.

Comment on lines +213 to +214
("initial_node_count", self.graph.node_count()),
("initial_edge_count", self.graph.edge_bound()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly question: why do you use node_count and edge_bound (and not say node_bound and edge_bound, or node_count and edge_count)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do I understand correctly that the purpose of this function is to speed up unpickling of rustworkx graphs (by passing the initial_node_count and initial_edge_count arguments to __new__)?

@alexanderivrii
Copy link
Contributor

FWIW, I also like the ability to specify the number of expected nodes/edges when creating rustworkx graphs, so I am in favor of this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants