Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] lazy loading regression #481

Open
lmeyerov opened this issue May 3, 2023 · 10 comments
Open

[BUG] lazy loading regression #481

lmeyerov opened this issue May 3, 2023 · 10 comments

Comments

@lmeyerov
Copy link
Contributor

lmeyerov commented May 3, 2023

after 0.29.0 (big pyg[ai] merge), looks like import is slow again, likely some top-level imports getting triggered:

Screenshot from 2023-05-02 21-24-27

for reference, pandas is just 600ms on same box -- so we can be targeting subsecond, but aren't

step #1 is to profile again: https://www.graphistry.com/blog/import-pygraphistry-from-10s-to-1s-by-python-import-lazy-loading-and-tuna

@lmeyerov
Copy link
Contributor Author

lmeyerov commented May 3, 2023

as part of the fix, maybe we add a unit test? at least '1 out of 3 cold starts should be subsecond'

@silkspace
Copy link
Contributor

@tanmoyio could this have to do with new inits?

@lmeyerov
Copy link
Contributor Author

@tanmoyio @dcolinmorgan can we look as part of landing cu_cat?

@lmeyerov
Copy link
Contributor Author

I documented how I tracked it down last time: https://www.graphistry.com/blog/import-pygraphistry-from-10s-to-1s-by-python-import-lazy-loading-and-tuna

@dcolinmorgan
Copy link
Contributor

dcolinmorgan commented Jul 24, 2023

as in lazy loading cucat? we already lazy load cudf and cuml within #486 , or at least that was the whole idea, perhaps its not working correctly as 9s seems far too long

@lmeyerov
Copy link
Contributor Author

i mean why is import graphistry slow -- is it pulling in some unexpected deps during import time that need to be lazy loaded?

@lmeyerov
Copy link
Contributor Author

w/ lazy loading, cu_cat, cudf, etc shouldn't load at import graphistry, i'd expect at most just pandas

@dcolinmorgan
Copy link
Contributor

dcolinmorgan commented Jul 27, 2023

yes i see the issue now on 29.3 -- on eks where cuda is available, cudf loads in 4.7s for total g. load time of 6.7s, whereas on my mac laptop entire graphistry 29.3 loads in 0.829s. seems lazy cudf is getting initiated before user request. So the question is to front load or break loadings into pieces.

re: #439 seems like there are 2 options:

  1. if cudf is available, load it and run on full gpu by default via 'auto' flag. otherwise
  2. wait to load cudf until user triggers with cu-cat/cuml flags

from your comment above, the second option seems expected and wanted -- should not be a hard fix. I suppose i never noticed since I've always loaded cudf before graphistry
Screenshot 2023-07-27 at 18 54 48

@lmeyerov
Copy link
Contributor Author

lmeyerov commented Jul 27, 2023

Option 3. Lazy load

Delay any cudf imports until the code using it (including "auto") is run. 'import graphistry' shouldn't need to import cudf.

Can you find the path triggering the import? It looks like plotter imports embed_utils, and embed_utils is importing cudf at import time somewhere instead of when its methods are used

@dcolinmorgan
Copy link
Contributor

dcolinmorgan commented Jul 28, 2023

ok yeah i see the issue in embed_utils -- at some point we needed to typecheck between cudf and pd, and loaded cudf to do that, so i just swapped to using getmodule and loadtime back down to 1.8s all on pandas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants