
Memoize with lease #178

Open
yurj opened this issue Jan 9, 2021 · 7 comments

Comments

@yurj commented Jan 9, 2021

Hi!

Use case:

A web API. The API must return values as soon as possible, and values should be refreshed if some time has passed since the last call, so the API is fast and the data stays reasonably fresh. The data depends on an external service that can take from one second to several seconds to compute. Data is collected at the beginning of the week using the external service; it rarely changes, but when it does, it is nice to have the change displayed within a few minutes.

memoize_stampede was not suited because when the cache expires, the caller has to wait for the full recomputation, so the API would be slow. So I modified stampede a little, removing the heuristic and adding a lease time parameter. It is not perfect: if the service has not been called for some time, it will display the cached value and only then update it. But since it is called from JavaScript on some home pages, this should not happen often, and changes in the data are rare, so it is not a problem.

Here is the code (it would be nice to include it in the recipes!):

import functools
import threading
import time

from diskcache.core import ENOVAL, args_to_key, full_name


def memoize_lease(cache, expire, lease, name=None, typed=False, tag=None):
    """Memoizing cache decorator with a cache lease.

    The decorator refreshes the cache entry when the lease expires. It
    always returns from the cache, calling the function in a separate
    thread only when the lease time has passed. Setting expire to None
    means the caller never waits for the execution of the function and
    always reads from the cache. Suitable for fairly long-running (and
    rarely called) functions, for example a long calculation or web pages
    that can change every 5-10 minutes and take some seconds to compute.

    If name is set to None (default), the callable name will be determined
    automatically.

    If typed is set to True, function arguments of different types will be
    cached separately. For example, f(3) and f(3.0) will be treated as distinct
    calls with distinct results.

    The original underlying function is accessible through the `__wrapped__`
    attribute. This is useful for introspection, for bypassing the cache, or
    for rewrapping the function with a different cache.

    >>> from diskcache import Cache
    >>> cache = Cache()
    >>> @memoize_lease(cache, expire=10, lease=1)
    ... def fib(number):
    ...     if number == 0:
    ...         return 0
    ...     elif number == 1:
    ...         return 1
    ...     else:
    ...         return fib(number - 1) + fib(number - 2)
    >>> print(fib(100))
    354224848179261915075

    An additional `__cache_key__` attribute can be used to generate the cache
    key used for the given arguments.

    >>> key = fib.__cache_key__(100)
    >>> del cache[key]

    Remember to call memoize_lease when decorating a callable. If you
    forget, then a TypeError will occur.

    :param cache: cache to store callable arguments and return values
    :param float expire: seconds until arguments expire
    :param float lease: minimum seconds after last execution
                        we want to update the cache value
    :param str name: name given for callable (default None, automatic)
    :param bool typed: cache different types separately (default False)
    :param str tag: text to associate with arguments (default None)
    :return: callable decorator

    """
    # Caution: Nearly identical code exists in Cache.memoize
    def decorator(func):
        "Decorator created by memoize call for callable."
        base = (full_name(func),) if name is None else (name,)

        def timer(*args, **kwargs):
            "Time execution of `func` and return result and time delta."
            start = time.time()
            result = func(*args, **kwargs)
            delta = time.time() - start
            return result, delta, time.time()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            "Wrapper for callable to cache arguments and return values."
            key = wrapper.__cache_key__(*args, **kwargs)
            trio, expire_time = cache.get(
                key, default=ENOVAL, expire_time=True, retry=True,
            )

            if trio is not ENOVAL:
                result, delta, last_exec = trio
                now = time.time()

                if (now - last_exec) > lease:
                    thread_key = key + (ENOVAL,)
                    thread_added = cache.add(thread_key, None,
                                             expire=delta, retry=True,)

                    if thread_added:
                        # Start thread for early recomputation.
                        def recompute():
                            with cache:
                                trio = timer(*args, **kwargs)
                                cache.set(
                                    key, trio, expire=expire,
                                    tag=tag, retry=True,)
                        thread = threading.Thread(target=recompute)
                        thread.daemon = True
                        thread.start()

                return result

            trio = timer(*args, **kwargs)
            cache.set(key, trio, expire=expire, tag=tag, retry=True)
            return trio[0]

        def __cache_key__(*args, **kwargs):
            "Make key for cache given function arguments."
            return args_to_key(base, args, kwargs, typed)

        wrapper.__cache_key__ = __cache_key__
        return wrapper

    return decorator
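To make the behavior easier to follow, here is a minimal in-memory sketch of the same idea, using a plain dict instead of diskcache (the name `memoize_lease_dict` is made up for illustration): serve the cached value immediately and refresh it in a background thread once the lease has passed.

```python
import functools
import threading
import time


def memoize_lease_dict(lease):
    """Simplified in-memory sketch of the lease idea: always serve the
    cached value, and refresh it in a background thread once `lease`
    seconds have passed since the last execution."""
    cache = {}

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            entry = cache.get(args)
            if entry is not None:
                result, last_exec = entry
                if time.time() - last_exec > lease:
                    # The real recipe uses cache.add() as a barrier so only
                    # one refresh thread runs; this sketch skips that guard.
                    def recompute():
                        cache[args] = (func(*args), time.time())
                    threading.Thread(target=recompute, daemon=True).start()
                return result  # stale value served without waiting
            # First call: nothing cached yet, so the caller must wait.
            cache[args] = (func(*args), time.time())
            return cache[args][0]
        return wrapper
    return decorator
```

Only the first call for a given argument blocks; every later call returns the cached value at once, possibly triggering a refresh in the background.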

@grantjenks (Owner)

Interesting. It's been a while since I looked at this code. But I think I understand it. The "expire" is a hard deadline but the "lease" is like a soft deadline. Once the "lease" is passed a thread is started to update the cache item.

If the API isn't called in a long time, values will still expire and cause a delay, right? Or do you set the lease to something short (like a minute) and the expiration to something long (like a week)?

Also, did you read through http://www.grantjenks.com/docs/diskcache/case-study-landing-page-caching.html ? It would be helpful to see Concurrency and Latency charts with this decorator.

@grantjenks grantjenks changed the title memoize lease Memoize with lease Jan 10, 2021
@grantjenks (Owner)

The term "lease" is new to me and I like it. Are there similar cache APIs?

@yurj (Author) commented Jan 10, 2021

> Interesting. It's been a while since I looked at this code. But I think I understand it. The "expire" is a hard deadline but the "lease" is like a soft deadline. Once the "lease" is passed a thread is started to update the cache item.
>
> If the API isn't called in a long time, values will still expire and cause a delay, right? Or do you set the lease to something short (like a minute) and the expiration to something long (like a week)?

Yes, this is the main idea. I can set the lease to a very high value if we are sure the API will be called in the meantime, so the expiration keeps being refreshed and the cache entry almost never expires. The data are opening calendars for public places. The schedule is decided some days in advance, but sometimes there are updates during the current week, so I'll use 5 minutes because I can't know when changes will happen. Once a week I'll run a full update of the data, so I'm sure I'm updated and aligned at the beginning of the week.

> Also, did you read through http://www.grantjenks.com/docs/diskcache/case-study-landing-page-caching.html ? It would be helpful to see Concurrency and Latency charts with this decorator.

That page was my starting point, but I haven't tested it as you did. I think the bottleneck could be the number of concurrent threads calculating different weeks for different public places at the same time, but usually that doesn't happen.

I'm thinking of generalizing it using two functions as parameters: one for the timer (maybe with a different name) and one for the condition check (in this case, if (now - last_exec) > lease), so we can implement many kinds of caching strategies. The timer function is the one that executes the function and saves the data useful for the caching strategy (plus the delta for the thread barrier), such as the last execution time, the number of times the function has been called, the function arguments, and so on. The condition function would use that data to decide whether to update the value in the cache.
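The generalization could be sketched like this (the names `memoize_policy`, `runner`, and `should_refresh` are hypothetical, and a plain dict stands in for diskcache): the runner executes the function and records whatever metadata the strategy needs, and the condition inspects that metadata to decide whether to recompute in the background.

```python
import functools
import threading
import time


def memoize_policy(cache, runner, should_refresh):
    """Sketch of a generalized lease decorator. `runner` executes the
    function and returns (result, metadata); `should_refresh` inspects
    the metadata to decide whether to recompute in a background thread."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            entry = cache.get(args)
            if entry is not None:
                result, meta = entry
                if should_refresh(meta):
                    def recompute():
                        cache[args] = runner(func, *args)
                    threading.Thread(target=recompute, daemon=True).start()
                return result
            entry = runner(func, *args)
            cache[args] = entry
            return entry[0]
        return wrapper
    return decorator


# The lease strategy, expressed as a runner/condition pair:
def timed_runner(func, *args):
    return func(*args), {'last_exec': time.time()}


def lease_condition(lease):
    return lambda meta: time.time() - meta['last_exec'] > lease
```

Other strategies, such as refreshing after N hits, would only need a different runner/condition pair; the wrapper itself stays unchanged.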

"Lease" came from dhcp protocol, where the lease is assigned for some time and then renew after expiring.

@yurj (Author) commented Jan 10, 2021

> The term "lease" is new to me and I like it. Are there similar cache APIs?

I don't know. I'm using diskcache because it is very fast and simple, and it handles concurrency very well. I've tested it with this unicorn app with 16 threads, and it spawns tens of remote calls which are saved using diskcache without any glitch. Very nice Python module!

@madeinoz67
I've only just come across this module (thanks, Python Bytes podcast, for the heads-up!) and I have a similar use case where my application makes remote API calls that are time-costly. diskcache will resolve these, but the cached data needs to have a definable lifetime so that it can be refreshed with current data. Love the idea of a lease, btw!

@grantjenks (Owner)

Hi @yurj , I made a couple small changes and opened a pull request. Feel free to open your own if you've made more modifications.

@yurj (Author) commented May 3, 2021

Can this be rewritten using asyncio to avoid threading? It should be similar to #196.
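A rough sketch of what that could look like, assuming the wrapped function is itself a coroutine (a plain dict stands in for diskcache here, since diskcache's calls block and would need a thread executor in a real event loop; the name `memoize_lease_async` is made up): the background refresh becomes an asyncio.Task instead of a thread.

```python
import asyncio
import functools
import time


def memoize_lease_async(cache, lease):
    """Sketch of an asyncio variant of the lease decorator: the refresh
    runs as an asyncio.Task instead of a thread, so no threading is
    needed as long as the wrapped coroutine does not block the loop."""
    tasks = set()  # keep references so pending tasks are not garbage collected

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args):
            entry = cache.get(args)
            if entry is not None:
                result, last_exec = entry
                if time.time() - last_exec > lease:
                    async def recompute():
                        cache[args] = (await func(*args), time.time())
                    task = asyncio.create_task(recompute())
                    tasks.add(task)
                    task.add_done_callback(tasks.discard)
                return result  # stale value served without awaiting
            cache[args] = (await func(*args), time.time())
            return cache[args][0]
        return wrapper
    return decorator
```

The `cache.add()` barrier from the threaded recipe is less critical here, since a single event loop never schedules two refreshes for the same key concurrently between awaits.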
