Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cookies] Fix --cookies-from-browser=safari with unusual field order #9853

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rcombs
Copy link

@rcombs rcombs commented May 4, 2024

Description of your pull request and other information

These fields can be specified in any order. DataParser doesn't support skipping backwards, so I make a temporary copy for each.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

These fields can be specified in any order. `DataParser` doesn't support skipping backwards, so I make a temporary copy for each.
@seproDev seproDev added the bug Bug that is not site-specific label May 4, 2024
@pukkandan
Copy link
Member

I don't think this's the best approach.

@mbway if you are available, could you pls review?

@mbway
Copy link
Contributor

mbway commented May 4, 2024

The idea with the DataParser is to scan through the file and make sure every byte is accounted for (with unknown parts explicitly skipped). Allowing for random access would make the implementation more complex which is why it can't currently skip backwards.

The current PR introduces the possibility for sections of the file to be unaccounted for. If that doesn't bother anyone then this implementation looks OK.

To keep consistent ordering, my suggestion would be to sort the fields by their offset and read them in order. For example:

field_offsets = {
    "domain": p.read_uint(),
    "name": p.read_uint(),
    "path": p.read_uint(),
    "value": p.read_uint(),
}
field_values = {}
try:
    for field_name, field_offset in sorted(field_offsets.items(), key=itemgetter(1)):
        p.skip_to(field_offset)
        field_values[field_name] = p.read_cstring()
except UnicodeError:
    ...

Also, @rcombs do you have more information about how you found that the fields can be out of order? I'm curious because I could barely find any information about the format.

Would it be possible to add a test with an example of a cookie database with some out-of-order fields?
For the test here

def test_safari_cookie_parsing(self):
I cleared safari completely and created a HTML page that sets a single cookie then copied the resulting database directly into the test. I probably should have explained that in a comment.

@pukkandan
Copy link
Member

To keep consistent ordering, my suggestion would be to sort the fields by their offset and read them in order.

I like this

The current PR introduces the possibility for sections of the file to be unaccounted for. If that doesn't bother anyone then this implementation looks OK.

Not a big deal, but it would be nice to reassure that the data is structured as we expect.

@rcombs
Copy link
Author

rcombs commented May 5, 2024

do you have more information about how you found that the fields can be out of order?

My cookies file has records with out-of-order fields. I wrote this because the parser was erroring on it and taking down the process, and it fixes it for me.

Would it be possible to add a test with an example of a cookie database with some out-of-order fields?

Probably possible, not sure how to go about it, and I'm not willing to disclose my actual cookie database.

The current PR introduces the possibility for sections of the file to be unaccounted for.

Note that portions of the file may already be unaccounted for; we skip to the end of the record at the end of each one, and there is routinely additional data in that space. If present, it seems to usually be a binary plist containing a StoragePartition key for partitioned/CHIPS cookies; I've also seen NSHTTPCookieAcceptPolicy in a bplist at the end of this file, but seemingly not in the main cookie-record format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug that is not site-specific
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

None yet

4 participants