
Seeking clarity for persisting cache to disk #115

Open
polarathene opened this issue Aug 12, 2021 · 3 comments

polarathene commented Aug 12, 2021

I have a use case similar to one a past user discussed with you in #9. I have read over the README, some issues, and the olricd.yaml config examples.

I've been informed that Olric primarily provides an in-memory KV store (which I'll refer to as a cache), but is also capable of overflowing/persisting to disk (where a smaller in-memory pool caches the most active queries for latency, but can leverage the larger available disk space to extend the cache). Is this supported?

Olric implements an append-only log file, indexed with a builtin map (uint64 => uint64). It creates new tables and evacuates existing data to the new ones if it needs to shrink or expand. - README - Storage Engine

I'm not entirely sure what is being conveyed here. Is this about utilizing disk storage at all?

There's no information on where files would be written, so that persistence via a Docker volume could be added.
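For what it's worth, the storage-engine description quoted above (an append-only log indexed with a builtin `uint64 => uint64` map) can be sketched roughly like this. This is my own illustrative toy, not Olric's actual internals; all names here are made up:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// appendLog is a hypothetical sketch of the pattern: values are appended to
// a byte log, and an index maps a key's hash to the entry's offset in the log.
type appendLog struct {
	buf   []byte
	index map[uint64]uint64 // key hash -> offset into buf
}

func newAppendLog() *appendLog {
	return &appendLog{index: make(map[uint64]uint64)}
}

func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

// Set appends a length-prefixed value and records its offset. Overwrites
// append a new entry; the index simply points at the newest one, which is
// why such a log needs periodic compaction ("evacuating" data to new tables).
func (l *appendLog) Set(key string, value []byte) {
	offset := uint64(len(l.buf))
	var lenBuf [8]byte
	binary.LittleEndian.PutUint64(lenBuf[:], uint64(len(value)))
	l.buf = append(l.buf, lenBuf[:]...)
	l.buf = append(l.buf, value...)
	l.index[hashKey(key)] = offset
}

// Get looks up the latest offset for the key and reads the value back.
func (l *appendLog) Get(key string) ([]byte, bool) {
	offset, ok := l.index[hashKey(key)]
	if !ok {
		return nil, false
	}
	n := binary.LittleEndian.Uint64(l.buf[offset : offset+8])
	start := offset + 8
	return l.buf[start : start+n], true
}

func main() {
	log := newAppendLog()
	log.Set("a", []byte("one"))
	log.Set("a", []byte("two")) // index now points at the newer entry
	v, _ := log.Get("a")
	fmt.Println(string(v)) // prints "two"
}
```

Note that `buf` here is an in-memory slice; the same structure could just as easily be backed by a file, which is presumably where the persistence question comes in.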

I think, you could use Souin with the EmbeddedOlric or Olric provider. This way, it will store in memory and save on disk when it reaches the RAM limit. This way it will support the LRU based storage. - Advice from author of Souin (caching service that leverages olric)

Is this correct information? Will it also keep a copy of the in-memory contents on disk and persist with container restarts to allow filling the in-memory store/cache when a key exists on disk?


Use case - Caching processed responses from an image service and persisting to disk

This is new territory for me personally. We have a small site with about 20TB monthly traffic, heavy image content from user uploads.

To reduce storage costs we're adopting an API that receives a web request for an image asset and will process the original image to a transformed variant (differing image format, resolution, quality/filesize, etc).

Reducing the resource load with that approach benefits from caching the request responses, of course 😀 This is where Souin (and thus Olric) is meant to help. Presently we've only been scaling vertically on a single node (although it's great that scaling horizontally is an option!). I just need to know: can I leverage disk storage in addition to a smaller memory cache, or is only in-memory supported by Olric, requiring me to persist to disk elsewhere?

One benefit of persisting to disk, beyond just extending the memory cache, is that the most frequently cached requests would survive service restarts/failures (we use docker-compose for now).


Additional Questions

A few questions about configuring and understanding allocation size limits that the README wasn't clear on for me (in the eviction section, it also has a comment describing maxKeys as being in bytes for some reason?). I tried to understand these better by reading issue threads:

From #9, you mentioned that when more than one node is involved, partitions are distributed evenly for the maxInuse memory limit, and later that you had implemented a new algorithm for it. I wasn't sure if you addressed that user's concern about nodes having differing RAM available to allocate (e.g. 2GB on one node, 4GB on another), where the partition/allocation could adapt instead of being evenly divided? (node A using 2GB, node B being able to use more than 2GB)

In #106 (comment) I also see tableSize clarified: it's not just the 1MiB default size with the default 271 partitions (where you advise using a prime number if modifying) — you mention that this is the size per partition (a portion of a DMap?), so the actual total size to keep in mind is 271 MiB? (at least when configuring for a single node)

You then later mention in #106 inserting 1M keys with 10-byte values using 350Mb of memory. Is that 350 megabits (43.75 MB) or 350 megabytes (uppercase M)? 1M * 10B == 10MB of raw values, so I assume it was around 40MB with other overheads?
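As a back-of-envelope check, the two calculations in the questions above work out like this (my arithmetic on the figures quoted in the thread, not measured numbers):

```go
package main

import "fmt"

func main() {
	// Default tableSize (1 MiB) times the default partition count (271)
	// gives the minimum table allocation per node, per the #106 reading.
	const tableSizeMiB = 1
	const partitions = 271
	fmt.Println(tableSizeMiB*partitions, "MiB minimum table allocation")

	// Raw payload for 1M keys with 10-byte values, excluding keys,
	// entry headers, and index overhead.
	const keys = 1_000_000
	const valueBytes = 10
	fmt.Println(keys*valueBytes/1_000_000, "MB of raw value payload")
}
```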

@buraksezer buraksezer self-assigned this Aug 12, 2021
@buraksezer buraksezer added the question Further information is requested label Aug 12, 2021

develar commented Aug 15, 2021

It would be great to hear an official answer about persisting the cache to disk. It would allow avoiding the use of a separate database in addition to Olric, because RAM is cheap.

Maybe even a simple query to dump all keys and values would work. Of course, I fully realize this is not a simple task in a distributed environment, because more updates may arrive during or immediately after a hot backup.

buraksezer (Owner) commented

Hi all,

I'm so sorry for the late reply. There are a lot of questions here.

but also capable of overflowing/persisting to disk (where a smaller in-memory pool caches the most active queries for latency but can leverage larger available disk space to extend the cache). Is this supported?

Olric doesn't implement any aspects of on-disk persistence. It's a pure in-memory cache. I want to add this feature but there are no active plans for a production-ready implementation. Here is an early prototype.

It's designed as a reimplementation of Redis AOF in Go.

Is this correct information? Will it also keep a copy of the in-memory contents on disk and persist with container restarts to allow filling the in-memory store/cache when a key exists on disk?

I'm not sure how Souin works. It may implement its own persistence mechanism to persist overflowed items.

@develar

Maybe even simple query to dump all keys and values will work. Of course, I fully realize that in a distributed environment it is not a simple task, because immediately after/during hot backup maybe another updates comes.

Olric has a query interface. It doesn't have complicated features to query keys and values, but it's able to dump all keys and values:

c, err := dm.Query(query.M{"$onKey": query.M{"$regexMatch": ""}})

The Query function returns a cursor. You can use the Range method of the Cursor type to iterate over the keys:

err := c.Range(func(key string, value interface{}) bool {
	fmt.Printf("KEY: %s, VALUE: %v\n", key, value)
	return true // true means continue; return false to break the loop
})

Here is the documentation: https://github.com/buraksezer/olric#query

@polarathene I am going to answer the questions in the "Additional Questions" section in a few days.


polarathene commented Aug 21, 2021

I'm not sure how Souin works. It may implement its own persistence mechanism to persist overflowed items.

The author was mistaken. There did seem to be some persistence support in Olric via Badger in the past that was later dropped (perhaps that's why they recalled Olric supporting it).

They've instead recently provided Badger as an alternative provider choice for users of Souin who want to leverage disk capacity for caching large binary data.

Olric doesn't implement any aspects of on-disk persistence. It's a pure in-memory cache.

If Badger wasn't being used by Souin for this, presumably Olric would work well by managing an in-memory cache of keys to file paths?

The application (such as Souin) could then monitor Olric's cache and mirror evictions to the associated binary data on disk. That would probably still require persisting the Olric store itself to disk for rehydrating across restarts, so Badger is better suited for the task at the moment, I guess?


@polarathene I am going to answer the questions in the "Additional Questions" section in a few days.

That's great thank you! I'm not a Go developer, no rush :)

I've since learned a bit more about Redis and Memcached being more suitable for application-level caching versus Souin and Varnish (which focus on HTTP caching), and that Olric is more equivalent to Redis (which has disk persistence with RDB and AOF, but is mostly focused on being a solid in-memory KV store). It is probably the wrong layer/solution for those interested in caching results to a fixed-size cache on disk (e.g. from an image processing service/API) 😅
