Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ie/youtube] Extract upload date timestamp if available #9856

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

coletdjnz
Copy link
Member

@coletdjnz coletdjnz commented May 4, 2024

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

ADD DESCRIPTION HERE

resolves #9829

fixes #4962

TODO update youtube tests and more thorough testing

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

if not upload_date or (
live_status in ('not_live', None)
not timestamp
and live_status in ('not_live', None)
and 'no-youtube-prefer-utc-upload-date' not in self.get_param('compat_opts', [])
Copy link
Member Author

@coletdjnz coletdjnz May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should this compat opt work now?

        upload_date = (
            dt.datetime.fromtimestamp(timestamp, dt.timezone.utc).strftime('%Y%m%d') if (timestamp and 'no-youtube-prefer-utc-upload-date' not in self.get_param('compat_opts', []) else
            (
                unified_strdate(get_first(microformats, 'uploadDate'))
                or unified_strdate(search_meta('uploadDate'))
            ))

maintaining this compat opt is turning out to be a bit of a pain

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe youtube-dl is also planning to implement timestamp as UTC. @dirkf Can you confirm? In that case, we should just make the compat option a no-op

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm. It also gets messy with the fact we will be changing the scheduled, live and past live streams / premieres upload_date to be utc now, since we can accurately get it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my WIP extractor gets the timestamp when possible, currently believed to be always, but I need to go back and check the age-gate API data.

Per #9829 (comment), the meta fields in the video page as well as the hydration JSON are using ISO 8601 with TZ offset. Presumably the API fields will mirror the hydration JSON.

The example code that I posted included the idea of own_upload_date to represent the cases where the automatic core upload_date shouldn't be used (because it should be in Pacific time, not UTC) and the unresolved TODO: ... to handle the calculation of upload_date in Pacific time from a known timestamp relative to UTC.

The problem is to know which actual TZ offset from (PST, PDT) applies in Mountain View, CA, for a particular UTC time. Even in Py >= 3.9 with zone_info, the library docs say that there's no cross-platform solution to this sort of TZ processing. If it should be required to do that the code would need to include the DST rules for the Pacific TZ, for each year where the calculation has to be done.

I think that this really argues for "retiring" the compat option and hoping that all the data sources used by the extractor are now providing ISO 8601 data with TZ.

Comment on lines +4570 to +4575
upload_date = (
dt.datetime.fromtimestamp(timestamp, dt.timezone.utc).strftime('%Y%m%d') if timestamp else
(
unified_strdate(get_first(microformats, 'uploadDate'))
or unified_strdate(search_meta('uploadDate'))
))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If timestamp, don't set upload_date at all. Core code will do it

Copy link
Member Author

@coletdjnz coletdjnz May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have the old upload_date fallback below. Additionally it helps us avoid having to rewrite all the below code depending on if timestamp is avail or not

yt_dlp/utils/_utils.py Show resolved Hide resolved
@pukkandan
Copy link
Member

Why a1af9ff?

@coletdjnz
Copy link
Member Author

coletdjnz commented May 4, 2024

Why a1af9ff?

To ensure it is accurate in the event they remove the timezone (similar to what they had before), when using 3.8. (It may change between utc-7 and utc-8 if the timezone it is using is US/Pacific, or could change to UTC without a timezone to be consistent with other dates returned... who knows)

Just ensuring we don't end up wrong upload dates on videos where possible.

Alternatively if we don't want to risk it not being US/Pacific we could set the default to NO_DEFAULT regardless to not extract if there is no timezone. Actually I'll do this, simplifies things. Can deal with it later if they change things

@seproDev seproDev added the site-enhancement Feature request for some website label May 4, 2024
@coletdjnz
Copy link
Member Author

updated working YoutubeIE tests with timestamp

pukkandan and others added 2 commits May 12, 2024 23:18
Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
@coletdjnz
Copy link
Member Author

Made --compat-options no-youtube-prefer-utc-upload-date a noop and updated README to reflect that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
site-enhancement Feature request for some website
Projects
Status: Release blocker
Development

Successfully merging this pull request may close these issues.

[youtube] Upload date being wrong by one day timestamp field not set on YouTube videos
5 participants