Modify the behavior of delete_cache so that only files that are not being used by any active sessions are deleted #8098

Open
mberco-quandl opened this issue Apr 22, 2024 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@mberco-quandl

mberco-quandl commented Apr 22, 2024

  • [YES] I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
I am having difficulty applying the unload event listener to storage management. I would like two behaviours: 1) the ability to delete specific Gradio.Files and 2) the ability to delete the entire temporary directory associated with a session (e.g. "root:/tmp/gradio/c2ab059f2860eee1b4963a90a6815042e18404e4/").

Feature 1) - why doesn't .unload support inputs? In my testing it's impossible to access a Gradio.File path without passing it in as an input to the function triggered by unload. Being able to control specific files that get deleted upon connection termination would be desirable.
Feature 2) - even if I knew how to access the string "c2ab059f2860eee1b4963a90a6815042e18404e4" associated with my session ID, how would I then pass that into the function triggered by unload? Being able to clear the temporary storage associated with a session would be desirable.

@freddyaboulton helpfully suggested unload as a solution to storage leaks here but I can't apply it given the shortcomings. Any workarounds? Am I missing something?

Describe the solution you'd like

  1. The unload event listener should accept inputs.
  2. A method to clear the temporary storage associated with a connection as soon as the connection is terminated, without relying on Gradio.Blocks(delete_cache=).

Additional context
Thank you!

@freddyaboulton
Collaborator

Hi @mberco-quandl, Gradio does not save files separately for each session hash. The hash you see (c2ab059f2860eee1b4963a90a6815042e18404e4) is not the session hash but a hash of the file itself. Gradio saves files in a path that depends on the file contents so that duplicate files are not saved more than once. Deleting those files is therefore not recommended, because another session may be using them at the same time.
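To make the content-based path concrete, here is a minimal sketch of content-addressed storage. It is not Gradio's implementation: the function names and the directory layout are hypothetical, and SHA-1 is assumed only because the directory name quoted in this thread is 40 hex characters long.

```python
import hashlib
import tempfile
from pathlib import Path


def content_addressed_path(cache_root, filename, data):
    """Return cache_root/<hash of data>/<filename> (hypothetical layout)."""
    digest = hashlib.sha1(data).hexdigest()  # assumption: SHA-1, 40 hex chars
    return Path(cache_root) / digest / filename


def save(cache_root, filename, data):
    """Write data once; identical contents always map to the same path."""
    target = content_addressed_path(cache_root, filename, data)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_bytes(data)
    return target


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    first = save(root, "report.csv", b"x,y\n1,2\n")
    other = save(root, "report.csv", b"x,y\n3,4\n")  # same name, new contents
    again = save(root, "report.csv", b"x,y\n1,2\n")  # same name, same contents
    print(first != other, first == again)
```

Under this scheme, two files with the same name but different contents land in different directories, while identical uploads from different sessions share one path, which is why deleting a path on behalf of one session can affect another.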

The unload event does not accept inputs because the connection has been closed and so we cannot access the values of components used in the clients.
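Since component values are unavailable once the connection has closed, one possible workaround is to record file paths on the server while the session is still live and look them up by session identifier at cleanup time. The sketch below is an assumption-laden illustration: `register_file` and `cleanup_session` are hypothetical helpers, and it assumes the cleanup handler can learn which session ended. It should only be pointed at files the application created itself, not at shared cache paths.

```python
import os
import tempfile
from collections import defaultdict

# Hypothetical server-side registry: session id -> paths created for it.
_session_files = defaultdict(set)


def register_file(session_id, path):
    """Remember that `path` was produced for `session_id`."""
    _session_files[session_id].add(path)


def cleanup_session(session_id):
    """Delete every file recorded for the session; return the removed paths."""
    removed = []
    for path in sorted(_session_files.pop(session_id, set())):
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass  # already gone, nothing to do
    return removed


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    register_file("session-a", path)
    print(cleanup_session("session-a") == [path])
```

The cleanup function needs only the session identifier, so it does not depend on any component inputs being passed to it.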

Why don't you want to use delete_cache? It was designed for the problem you are trying to solve.

@mberco-quandl
Author

mberco-quandl commented Apr 22, 2024

Ah interesting. Would a collision occur if two sessions uploaded different files with the same names? What if there is no intention of sharing tmp files between sessions?

The issue is the lack of control in a space-constrained deployment. In my application I know that users have no need to share files across sessions or to access files after closing a connection. I would, however, like users to be able to access files for as long as the connection is open.

My understanding is that delete_cache would delete a file even if a user were still expecting to access it, because deletion is based only on the file's age since creation. This is very good but not perfect. I'm tempted to set delete_cache very low to optimize storage, but I'm also wary of deleting files users have not yet had a chance to download. Say they triggered a job and then went to get coffee. Having more control over storage would be desirable.
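The concern can be illustrated with a small sketch of an age-only cleanup policy. This is a stand-alone illustration of the behaviour as described in this comment, not Gradio's code; `delete_older_than` is a hypothetical helper, and the file's modification time is used as a stand-in for its creation time.

```python
import os
import tempfile
import time
from pathlib import Path


def delete_older_than(cache_dir, max_age_seconds, now=None):
    """Delete every file under cache_dir older than max_age_seconds.

    Age is measured from the modification time (a proxy for creation time).
    Nothing here checks whether a session still needs the file.
    """
    now = time.time() if now is None else now
    deleted = []
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            deleted.append(path)
    return deleted


if __name__ == "__main__":
    cache = Path(tempfile.mkdtemp())
    result = cache / "result.csv"
    result.write_text("done")
    # Pretend the job finished ten minutes ago while the user was away.
    ten_minutes_ago = time.time() - 600
    os.utime(result, (ten_minutes_ago, ten_minutes_ago))
    # A five-minute limit removes the file although the session is still open.
    print(delete_older_than(cache, max_age_seconds=300))
```

A short age limit saves space but removes the result in the coffee-break scenario; a long one keeps it but lets storage grow.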

Thank you!

@abidlabs
Member

@freddyaboulton would it be possible to modify the behavior of delete_cache so that only files that are not being used by any active sessions are deleted, based on the heartbeat connection?
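One way the proposed behaviour could work is sketched below. Everything in it is hypothetical: the `ActiveFileTracker` class, its method names, and the timeout rule are assumptions made for illustration, and nothing here reflects how Gradio tracks heartbeats internally.

```python
import time


class ActiveFileTracker:
    """Track which files belong to sessions that sent a recent heartbeat."""

    def __init__(self, heartbeat_timeout):
        self.heartbeat_timeout = heartbeat_timeout
        self._last_seen = {}  # session id -> time of last heartbeat
        self._files = {}      # session id -> set of file paths in use

    def heartbeat(self, session_id, now=None):
        self._last_seen[session_id] = time.time() if now is None else now

    def attach(self, session_id, path):
        self._files.setdefault(session_id, set()).add(path)

    def active_sessions(self, now=None):
        now = time.time() if now is None else now
        return {
            session_id
            for session_id, seen in self._last_seen.items()
            if now - seen <= self.heartbeat_timeout
        }

    def deletable(self, candidates, now=None):
        """Return the candidates that no active session is using."""
        in_use = set()
        for session_id in self.active_sessions(now):
            in_use |= self._files.get(session_id, set())
        return [path for path in candidates if path not in in_use]
```

A cleanup pass would first select files by age and then pass them through `deletable`, so a file shared by two sessions survives as long as either one is still sending heartbeats.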

@mberco-quandl
Author

@freddyaboulton @abidlabs what is the supported way to write files to the temp directory? Performing an operation on a pandas dataframe like df.to_csv("filename.csv") appears to write files to the working directory and not the temp directory. I would think to do something like df.to_csv("root:/tmp/gradio/filehash/filename.csv"), but how would I access the hash before the file has been created? Can you please elaborate on how the hash is generated and what happens if there are collisions?

Thanks!
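On the working-directory concern raised in this comment, a relative path such as "filename.csv" resolves against the process's working directory. A minimal sketch of one alternative is to write into a temporary directory the application controls, without knowing any cache hash in advance. The helper name is hypothetical, the standard-library `csv` module stands in for pandas, and whether and how the resulting path is then copied into Gradio's cache is not covered here.

```python
import csv
import tempfile
from pathlib import Path


def write_rows_to_temp_csv(rows, filename="filename.csv"):
    """Write rows to <fresh temp dir>/<filename> and return the full path."""
    out_dir = Path(tempfile.mkdtemp(prefix="app_outputs_"))
    out_path = out_dir / filename
    with out_path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return out_path


if __name__ == "__main__":
    path = write_rows_to_temp_csv([["a", "b"], [1, 2]])
    print(path.name, path.exists())
```

With pandas, the same idea would be to pass the generated path to `df.to_csv(path)` instead of a bare file name.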

@abidlabs
Member

abidlabs commented May 3, 2024

Hi @mberco-quandl, we don't support explicitly writing files to the cache. I don't quite follow why you need to do this.

I'm going to rename this issue to this:

would it be possible to modify the behavior of delete_cache so that only files that are not being used by any active sessions are deleted

@abidlabs changed the title from "pass inputs into Gradio.Blocks.unload event listener" to "Modify the behavior of delete_cache so that only files that are not being used by any active sessions are deleted" on May 3, 2024
@abidlabs added the "enhancement" (New feature or request) label and removed the "pending clarification" label on May 3, 2024
3 participants