[Bug] Fetching an item while writing the item into Cache #294

Open
nagardan opened this issue Oct 19, 2023 · 4 comments

Comments

@nagardan

Hi DiskCache Team,

We recently ran into an issue where two threads write and read the same item in the cache at the same time. The read fails with a pickle error (EOFError) because the cache file on disk is empty.

Example Error

  File "/usr/app/lib/python3.8/site-packages/diskcache/core.py", line 1199, in get
    value = self._disk.fetch(mode, filename, db_value, read)
  File "/usr/app/lib/python3.8/site-packages/diskcache/core.py", line 283, in fetch
    return pickle.load(reader)
EOFError: Ran out of input

After some digging, I believe it is caused by the following sequence:

  1. Thread 1: Item 1 is being added to the cache, and its file has been created but not yet written https://github.com/grantjenks/python-diskcache/blob/master/diskcache/core.py#L235
  2. Thread 2: Item 1 is fetched, and since the file is still empty, we see the above error.
  3. Thread 1: Item 1 is written to the cache.
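The failure in step 2 can be reproduced outside DiskCache: unpickling a file that exists but contains no bytes raises the same kind of EOFError. This is a minimal sketch that uses a temporary file as a stand-in for the cache value file. The helper name `load_empty_file_error` is hypothetical and the code does not touch DiskCache itself.

```python
import os
import pickle
import tempfile


def load_empty_file_error():
    """Create an empty file, try to unpickle it, and return the error text."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # the file now exists on disk but holds zero bytes
    try:
        with open(path, "rb") as reader:
            pickle.load(reader)  # same call as in the traceback above
    except EOFError as exc:
        return str(exc)
    finally:
        os.remove(path)
    return None  # only reached if unpickling unexpectedly succeeded


print(load_empty_file_error())
```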

I would be happy to submit a PR for the above error, but would like some guidance on a preferred approach:

  1. Retry up to 10 times if an EOFError occurs during a read (similar to how writes are retried).
  2. If an EOFError occurs, report that the item cannot be found, like this: https://github.com/grantjenks/python-diskcache/blob/master/diskcache/core.py#L1174-L1176. This may result in redundant re-writes to the cache, since the item would be reported as missing while it is still being written.

Any other suggestions are welcome.
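To make the two options concrete, here is a rough sketch that combines them: retry a few times on a truncated read, then fall back to reporting a miss. It works on a plain pickle file rather than on DiskCache internals. `load_or_miss`, `MISSING`, and the `retries` and `delay` parameters are hypothetical names for illustration, not part of the library.

```python
import pickle
import time

MISSING = object()  # hypothetical sentinel meaning "treat as a cache miss"


def load_or_miss(path, retries=10, delay=0.001):
    """Unpickle `path`; retry on empty or truncated data, else report a miss."""
    for _ in range(retries):
        try:
            with open(path, "rb") as reader:
                return pickle.load(reader)
        except (EOFError, pickle.UnpicklingError):
            # Option 1: the writer may not have finished yet, so wait and retry.
            time.sleep(delay)
        except FileNotFoundError:
            return MISSING
    # Option 2: after exhausting retries, behave as if the item is absent.
    return MISSING
```

A caller would compare the result against `MISSING` and fall back to its default value, which is where the possible re-write mentioned in option 2 comes from.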

@grantjenks
Owner

I didn’t think the item was committed to the SQLite database until after the value was written. Is it overwriting an existing key?

Otherwise, I would probably go with (2). Perhaps there is a way to prevent a duplicate write in that scenario by testing a read.

@nagardan
Author

That is correct! It is written first.

We see this error quite infrequently but it still occurs.

I wonder if an overwrite and a read could cause it since the file would be remade? Any thoughts?

@grantjenks
Owner

Is it a big value? I wonder if there’s a period where the serialized value is basically in an inconsistent state. Like, half of it is written when the read occurs.

So if loading the value fails in any way, then the item should be treated as though it were not present in the cache.

Another thought is to write the value to a different temporary file and then rename the file into the correct place. On Linux there are ways to guarantee the rename is atomic. In that case, I think the cache would always be in a consistent state. Probably should use https://docs.python.org/3/library/os.html#os.replace
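As a sketch of the temporary-file idea, something along these lines could work, assuming the temporary file is created in the same directory as the destination so the rename stays on one filesystem. `atomic_pickle_dump` is a hypothetical helper, not DiskCache's actual write path.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(value, path):
    """Pickle `value` to a temp file, then move it into place with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as writer:
            pickle.dump(value, writer, protocol=pickle.HIGHEST_PROTOCOL)
            writer.flush()
            os.fsync(writer.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern a concurrent reader should never open a file that exists but is still empty, because the destination name only appears once the value is fully written.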

@nagardan
Author

The strange part is that the error means the file is empty.

I will try to see where I can reproduce this.
