Long-lived cache management #298

Open
softloft38p-michael opened this issue Dec 8, 2023 · 2 comments

@softloft38p-michael

I'm working on a project in which I have two caches: one for celery tasks and one for a file index per root path. My question is how best to set these two caches up so that individual tasks and indexes have a limited lifetime while the cache system itself persists indefinitely.

Currently I have this:

import diskcache

cache_root = '/path_to_caches'
TASK_CACHE = diskcache.FanoutCache(cache_root + '/task_cache', shards=16)
INDEX_CACHE = diskcache.FanoutCache(cache_root + '/index_cache', shards=16)

def get_task_cache(task_id: str):
    task_cache = TASK_CACHE.cache(task_id, expire=259_200)
    task_cache.touch()
    return task_cache

def get_index_cache(root_path: str):
    index_cache = INDEX_CACHE.cache(root_path, expire=7_776_000)
    index_cache.touch()
    return index_cache

An individual task_cache or index_cache is read and written to by multiple celery tasks at the same time.

Some questions I have are:

  • Is the above a reasonable way to ensure that a task is cleaned up at most 72 hours after last use and similarly for the index?
  • Is there a better way to structure this so an individual task or index gets its own FanoutCache? It would be nice to ensure a corrupt index does not destroy other indexes.
  • What is the recommended way to clean an individual .cache from FanoutCache? Is something like this sufficient:
    task_cache = TASK_CACHE.pop(task_id)
    task_cache.close()
    Or is there a single-function counterpart to .cache?
@grantjenks
Owner

Is the above a reasonable way to ensure that a task is cleaned up at most 72 hours after last use and similarly for the index?

Not really. Looks strange to me. This’ll just pollute your file system with task and index caches. The expire keyword is for the individual key-value items in the cache, not for the cache overall. Also, I don’t think touch() works that way. You have to touch a key. You don’t touch a cache.

Is there a better way to structure this so an individual task or index gets its own FanoutCache? It would be nice to ensure a corrupt index does not destroy other indexes.

Not really. The expectation is that they would all share a single fanout cache. If you create individual ones, you’ll have to delete them yourself.

What is the recommended way to clean an individual .cache from FanoutCache?

FanoutCache doesn’t store caches, and I admit that’s confusing. The .cache() method is simply an easy way to create a cache in a subdirectory. There’s no cache management functionality between the parent and the child.

@softloft38p-michael
Author

Thanks for the reply!

Not really. Looks strange to me. This’ll just pollute your file system with task and index caches. The expire keyword is for the individual key-value items in the cache, not for the cache overall. Also, I don’t think touch() works that way. You have to touch a key. You don’t touch a cache.

It seems I can make it work by switching from .cache to a key with an Index storing all the same data: is that a better approach?
