[YouTube] comments are not downloading #9358
Comments
This comment was marked as outdated.
This is probably caused by the A/B test that YouTube is running on comments; they are switching from [...]
Also, @bashonly, if the A/B test is the cause of the error, then this can definitely be reproduced without an account: you just need to get visitor data with a visitor ID that has the A/B test.
@bashonly
Windows (PC where comments were correct):
@alekskor1063 Can you add Contact: discord (pukkandan#4207) / email (pukkandan.ytdlp@gmail.com) cc @coletdjnz
@pukkandan I sent them through Discord.
Looks like I'm getting nothing but failures now, so maybe the A/B testing is over for me.
There's nothing you can do about this as a yt-dlp user. The functions _extract_comment and _comment_entries in youtube.py have to be updated or rewritten because, as you pointed out, YouTube has moved the comments payload to the path frameworkUpdates.entityBatchUpdate.mutations in the JSON response. I'm working on it. Hopefully, within the next 2-3 days, I'll post a pull request or hand the code I write to someone else. This issue is affecting me as well.
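To make the new layout concrete, here is a minimal sketch of reading comment payloads from that path. It is not yt-dlp code: `index_comment_entities` is a hypothetical helper, the key names are the ones that appear in this thread (`frameworkUpdates`, `entityBatchUpdate`, `mutations`, `commentEntityPayload`, `properties`, `commentId`), and the sample response is invented for illustration.

```python
from typing import Any, Dict


def index_comment_entities(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map commentId -> commentEntityPayload for one JSON response.

    Hypothetical helper; assumes the key names quoted in this thread.
    """
    updates = response.get('frameworkUpdates') or {}
    batch = updates.get('entityBatchUpdate') or {}
    mutations = batch.get('mutations') or []

    index: Dict[str, Dict[str, Any]] = {}
    for mutation in mutations:
        payload = (mutation.get('payload') or {}).get('commentEntityPayload')
        if not payload:
            continue  # some mutations carry other payload kinds
        comment_id = (payload.get('properties') or {}).get('commentId')
        if comment_id:
            index[comment_id] = payload
    return index


# Invented sample data, shaped like the path described above.
sample_response = {
    'frameworkUpdates': {
        'entityBatchUpdate': {
            'mutations': [
                {'payload': {'commentEntityPayload': {
                    'properties': {'commentId': 'abc'}}}},
                {'payload': {'engagementToolbarStateEntityPayload': {
                    'heartState': 'TOOLBAR_HEART_STATE_HEARTED'}}},
            ],
        },
    },
}

entities = index_comment_entities(sample_response)
```

Building the index once per response avoids rescanning the whole mutations list for every comment on the page.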
Update: Still working on it. Hopefully it will be done in another 2-3 days. A complication on YouTube's side means the code needs temporary accommodations, which can be discarded later (once YouTube has finally made up its mind about which model to use in its JSON responses). @themodfather360 reported above that the commentRenderer model is no longer used in YouTube's JSON responses, but I've seen otherwise. YouTube still uses the commentRenderer model for some videos, and sometimes even for the same video, alternating between commentViewModel and commentRenderer from one moment to the next.
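Since both models can show up, one way to cope is to check each item for either shape before parsing it. The sketch below is only an illustration under that assumption: `detect_comment_model` is a hypothetical name, and the nesting it checks is the one visible in the patch posted in this thread, not a documented YouTube schema.

```python
from typing import Any, Dict, Optional, Tuple


def detect_comment_model(
        content: Dict[str, Any]) -> Tuple[Optional[str], Optional[Dict[str, Any]]]:
    """Return (model_name, model_dict) for one continuation item.

    Hypothetical helper; the key nesting is assumed from the patch in this thread.
    """
    thread = content.get('commentThreadRenderer') or {}

    # New format: a view model, either nested in the thread or at the top level.
    view_model = ((thread.get('commentViewModel') or {}).get('commentViewModel')
                  or content.get('commentViewModel'))
    if view_model:
        return 'commentViewModel', view_model

    # Old format: a renderer, either nested under 'comment' or at the top level.
    renderer = ((thread.get('comment') or {}).get('commentRenderer')
                or content.get('commentRenderer'))
    if renderer:
        return 'commentRenderer', renderer

    return None, None


# Invented sample items, one per format.
new_item = {'commentThreadRenderer': {
    'commentViewModel': {'commentViewModel': {'commentId': 'abc'}}}}
old_item = {'commentThreadRenderer': {
    'comment': {'commentRenderer': {'commentId': 'xyz'}}}}

new_kind, new_model = detect_comment_model(new_item)
old_kind, old_model = detect_comment_model(old_item)
```

Checking per item rather than per response keeps the code working even if the format changes between two requests for the same video.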
There's a (so-far) unmerged pull request for Invidious that adds support for the new comment format:
# This patch is public domain (CC0).
diff --git a/yt_dlp/extractor/youtube.py b/yt_dlp/extractor/youtube.py
--- a/yt_dlp/extractor/youtube.py
+++ b/yt_dlp/extractor/youtube.py
@@ -3307,23 +3307,22 @@ def _extract_heatmap(self, data):
'value': ('intensityScoreNormalized', {float_or_none}),
})) or None
- def _extract_comment(self, comment_renderer, parent=None):
- comment_id = comment_renderer.get('commentId')
- if not comment_id:
- return
+ def _extract_comment(self, view_model, entity, parent=None):
+ entity_payload = entity['payload']['commentEntityPayload']
+ comment_id = entity_payload.get('properties').get('commentId')
info = {
'id': comment_id,
- 'text': self._get_text(comment_renderer, 'contentText'),
- 'like_count': self._get_count(comment_renderer, 'voteCount'),
- 'author_id': traverse_obj(comment_renderer, ('authorEndpoint', 'browseEndpoint', 'browseId', {self.ucid_or_none})),
- 'author': self._get_text(comment_renderer, 'authorText'),
- 'author_thumbnail': traverse_obj(comment_renderer, ('authorThumbnail', 'thumbnails', -1, 'url', {url_or_none})),
+ 'text': self._get_text(entity_payload, ('properties', 'content', 'contetn')),
+ 'like_count': self._get_count(entity_payload, ('toolbar', 'likeCountNotliked')),
+ 'author_id': traverse_obj(entity_payload, ('author', 'channelId', {self.ucid_or_none})),
+ 'author': self._get_text(entity_payload, ('author', 'displayName')),
+ 'author_thumbnail': traverse_obj(entity_payload, ('author', 'avatarThumbnailUrl', {url_or_none})),
'parent': parent or 'root',
}
# Timestamp is an estimate calculated from the current time and time_text
- time_text = self._get_text(comment_renderer, 'publishedTimeText') or ''
+ time_text = self._get_text(entity_payload, ('properties', 'publishedTime')) or ''
timestamp = self._parse_time_text(time_text)
info.update({
@@ -3333,25 +3332,23 @@ def _extract_comment(self, comment_renderer, parent=None):
})
info['author_url'] = urljoin(
- 'https://www.youtube.com', traverse_obj(comment_renderer, ('authorEndpoint', (
- ('browseEndpoint', 'canonicalBaseUrl'), ('commandMetadata', 'webCommandMetadata', 'url'))),
+ 'https://www.youtube.com', traverse_obj(entity_payload,
+ ('author', 'channelCommand', 'innertubeCommand', 'browseEndpoint', 'canonicalBaseUrl'),
expected_type=str, get_all=False))
- author_is_uploader = traverse_obj(comment_renderer, 'authorIsChannelOwner')
+ author_is_uploader = traverse_obj(entity_payload, ('author', 'isCreator'))
if author_is_uploader is not None:
info['author_is_uploader'] = author_is_uploader
comment_abr = traverse_obj(
- comment_renderer, ('actionButtons', 'commentActionButtonsRenderer'), expected_type=dict)
+ entity, ('payload', 'engagementToolbarStateEntityPayload', 'heartState'), expected_type=str)
if comment_abr is not None:
- info['is_favorited'] = 'creatorHeart' in comment_abr
+ info['is_favorited'] = comment_abr == 'TOOLBAR_HEART_STATE_HEARTED'
- badges = self._extract_badges([traverse_obj(comment_renderer, 'authorCommentBadge')])
- if self._has_badge(badges, BadgeType.VERIFIED):
- info['author_is_verified'] = True
+ info['author_is_verified'] = traverse_obj(entity_payload, ('author', 'isVerified')) == 'true'
- is_pinned = traverse_obj(comment_renderer, 'pinnedCommentBadge')
- if is_pinned:
+ pinned_text = traverse_obj(view_model, 'pinnedText')
+ if pinned_text:
info['is_pinned'] = True
return info
@@ -3388,21 +3385,25 @@ def extract_header(contents):
break
return _continuation
- def extract_thread(contents):
+ def extract_thread(contents, entity_payloads):
if not parent:
tracker['current_page_thread'] = 0
for content in contents:
if not parent and tracker['total_parent_comments'] >= max_parents:
yield
comment_thread_renderer = try_get(content, lambda x: x['commentThreadRenderer'])
- comment_renderer = get_first(
- (comment_thread_renderer, content), [['commentRenderer', ('comment', 'commentRenderer')]],
- expected_type=dict, default={})
-
- comment = self._extract_comment(comment_renderer, parent)
- if not comment:
+ view_model = traverse_obj(comment_thread_renderer, ('commentViewModel', 'commentViewModel'))
+ if not view_model:
+ view_model = content.get('commentViewModel')
+ if not view_model:
continue
- comment_id = comment['id']
+ comment_id = view_model['commentId']
+ for entity in entity_payloads:
+ if traverse_obj(entity, ('payload', 'commentEntityPayload', 'properties', 'commentId')) == comment_id:
+ entity = entity
+ break
+
+ comment = self._extract_comment(view_model, entity, parent)
if comment.get('is_pinned'):
tracker['pinned_comment_ids'].add(comment_id)
# Sometimes YouTube may break and give us infinite looping comments.
@@ -3495,7 +3496,7 @@ def extract_thread(contents):
check_get_keys = None
if not is_forced_continuation and not (tracker['est_total'] == 0 and tracker['running_total'] == 0):
check_get_keys = [[*continuation_items_path, ..., (
- 'commentsHeaderRenderer' if is_first_continuation else ('commentThreadRenderer', 'commentRenderer'))]]
+ 'commentsHeaderRenderer' if is_first_continuation else ('commentThreadRenderer', 'commentViewModel'))]]
try:
response = self._extract_response(
item_id=None, query=continuation,
@@ -3527,7 +3528,7 @@ def extract_thread(contents):
break
continue
- for entry in extract_thread(continuation_items):
+ for entry in extract_thread(continuation_items, response['frameworkUpdates']['entityBatchUpdate']['mutations']):
if not entry:
return
yield entry
This patch may work, but has not been tested enough.
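For readers who want to see the field mapping without the diff noise, here is a standalone sketch of the lookups the patch performs on a commentEntityPayload. It is an illustration only: `dig` and `comment_from_entity` are hypothetical helpers, not yt-dlp functions, the key names are copied from the patch, the text key is assumed to be `content` under `properties.content`, and the values are assumed to be plain strings or booleans. The sample payload is invented.

```python
from typing import Any, Dict, Optional


def dig(obj: Any, *keys: str) -> Any:
    """Follow nested dict keys, returning None as soon as one is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def comment_from_entity(entity_payload: Dict[str, Any],
                        parent: Optional[str] = None) -> Dict[str, Any]:
    """Collect the raw fields the patch reads (hypothetical helper)."""
    return {
        'id': dig(entity_payload, 'properties', 'commentId'),
        # Assumption: the intended text key is 'content'.
        'text': dig(entity_payload, 'properties', 'content', 'content'),
        'time_text': dig(entity_payload, 'properties', 'publishedTime'),
        'like_count_text': dig(entity_payload, 'toolbar', 'likeCountNotliked'),
        'author': dig(entity_payload, 'author', 'displayName'),
        'author_id': dig(entity_payload, 'author', 'channelId'),
        'author_is_uploader': dig(entity_payload, 'author', 'isCreator'),
        'parent': parent or 'root',
    }


# Invented sample payload using the key names from the patch.
sample_payload = {
    'properties': {
        'commentId': 'abc',
        'content': {'content': 'Nice video'},
        'publishedTime': '2 days ago',
    },
    'toolbar': {'likeCountNotliked': '12'},
    'author': {'displayName': '@someone', 'channelId': 'UC123', 'isCreator': False},
}

comment = comment_from_entity(sample_payload)
```

Missing keys come back as None instead of raising, which matters here because the thread reports that the response shape is still changing.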
Here is a branch with the patch from @minamotorin plus additional fixes: https://github.com/jakeogh/yt-dlp/tree/youtube_comments_ab. It works for me, but needs more testing. It attempts to handle both the new and the old comment format. The patch has a typo: The path is correct,
@jakeogh
This comment was marked as spam.
#9775 Here's the pull request also mentioned above; they are working on comment downloading. They are posting about it today, and it may even be merged soon.
When is this going to be fixed? The problem has been known for over 2 months now, and a working pull request has been sitting for a week. I bet many of us see this feature as the second most critical one, right after downloading the videos themselves.
Closes #9358 Authored by: jakeogh, minamotorin, shoxie007, bbilly1 Co-authored-by: minamotorin <76122224+minamotorin@users.noreply.github.com> Co-authored-by: shoxie007 <74592022+shoxie007@users.noreply.github.com> Co-authored-by: Simon <35427372+bbilly1@users.noreply.github.com>
Any idea when this will hit stable? Otherwise, how do I change branches to dev without having to compile it myself?
You can update to nightly with the following command:
Region
RU
Provide a description that is worded well enough to be understood
My thoughts: