[Pornhub] Give better error for geo-restriction than Unable to extract title #9889

Open
11 tasks done
ClearBlueOcean opened this issue May 9, 2024 · 7 comments
Labels
geo-blocked Content is geo-blocked NSFW patch-available There is patch available that should fix this issue. Someone needs to make a PR with it site-bug Issue with a specific website

Comments

@ClearBlueOcean

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

Canada

Provide a description that is worded well enough to be understood

Parsing the title fails on each run. The issue occurs for all videos on Pornhub, using both stable and nightly builds. Same result with or without quotes around the URL. The issue has been occurring for several days, and I've had no problems downloading from other sites.

The common locations for the title look fine in the DOM:

<title>Wild College Orgy: three Hot Babes get Naughty with Students at Dorm Party - Pornhub.com</title>
and
<meta name="twitter:title" content="Wild College Orgy: Three Hot Babes Get Naughty with Students at Dorm Party">

This issue seems similar to #7527 (closed in Jul 2023).

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

╰─○ yt-dlp -vU "https://www.pornhub.com/view_video.php?viewkey=66266bc7d9ef2"

[debug] Command-line config: ['-vU', 'https://www.pornhub.com/view_video.php?viewkey=66266bc7d9ef2']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version nightly@2024.05.08.232715 from yt-dlp/yt-dlp-nightly-builds [6b54cccdc] (zip)
[debug] Python 3.12.3 (CPython x86_64 64bit) - Linux-6.8.0-31-generic-x86_64-with-glibc2.39 (OpenSSL 3.0.13 30 Jan 2024, glibc 2.39)
[debug] exe versions: none
[debug] Optional libraries: certifi-2023.11.17, requests-2.31.0, sqlite3-3.45.1, urllib3-2.0.7
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests
[debug] Loaded 1810 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-nightly-builds/releases/latest
Latest version: nightly@2024.05.08.232715 from yt-dlp/yt-dlp-nightly-builds
yt-dlp is up to date (nightly@2024.05.08.232715 from yt-dlp/yt-dlp-nightly-builds)
[PornHub] Extracting URL: https://www.pornhub.com/view_video.php?viewkey=66266bc7d9ef2
[PornHub] 66266bc7d9ef2: Downloading pc webpage
ERROR: [PornHub] 66266bc7d9ef2: Unable to extract title; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/home/deepblue/.local/bin/yt-dlp/yt_dlp/extractor/common.py", line 734, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepblue/.local/bin/yt-dlp/yt_dlp/extractor/pornhub.py", line 306, in _real_extract
    'twitter:title', webpage, default=None) or self._html_search_regex(
                                               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepblue/.local/bin/yt-dlp/yt_dlp/extractor/common.py", line 1357, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepblue/.local/bin/yt-dlp/yt_dlp/extractor/common.py", line 1321, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
@ClearBlueOcean ClearBlueOcean added site-bug Issue with a specific website triage Untriaged issue labels May 9, 2024
@bashonly bashonly added NSFW cant-reproduce The issue cannot be reliably reproduced labels May 9, 2024
@ClearBlueOcean
Author

I've been doing further debugging using --write-pages and --print-traffic and have identified the cause. Documenting here in case others have similar issues.

Aylo (formerly MindGeek) is blocking access to its sites, including Pornhub, in the US states of Utah, Virginia, Texas, Montana, Mississippi, Arkansas, and North Carolina in response to recently enacted age-verification laws in those states. In my case, my desktop and my server are in different locations, so the site was blocked only on my server.

yt-dlp supported sites affected:

  • Pornhub
  • Thumbzilla
  • Tube8
  • YouPorn
  • RedTube
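
For anyone who wants to check their own setup, here is a minimal sketch that scans the page dumps written by --write-pages for the age-verification notice. The phrase it matches is the one used in the patch posted in this thread; the helper name `find_blocked_dumps` and the assumption that the dumps sit in one directory as `*.dump` files are mine, not part of yt-dlp.

```python
import re
from pathlib import Path

# Phrase shown on the age-verification block page (taken from the patch in this thread).
AGE_VERIFY_RE = re.compile(
    r'your elected officials in \w+ are requiring us to verify your age '
    r'before allowing you access to our website')


def find_blocked_dumps(directory='.'):
    """Return the names of *.dump files in `directory` that contain the block notice.

    `find_blocked_dumps` is a hypothetical helper for illustration only.
    """
    blocked = []
    for path in sorted(Path(directory).glob('*.dump')):
        # Dumps are HTML; tolerate stray bytes rather than failing on decode errors.
        text = path.read_text(encoding='utf-8', errors='replace')
        if AGE_VERIFY_RE.search(text):
            blocked.append(path.name)
    return blocked


if __name__ == '__main__':
    for name in find_blocked_dumps():
        print(f'{name}: age-verification block page')
```

If the script prints a file name, the request was served the block page rather than the video page, which would explain the missing title.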

@pukkandan pukkandan removed triage Untriaged issue cant-reproduce The issue cannot be reliably reproduced labels May 15, 2024
@pukkandan pukkandan reopened this May 15, 2024
@pukkandan pukkandan added the geo-blocked Content is geo-blocked label May 15, 2024
@pukkandan pukkandan changed the title [Pornhub] Error: Unable to extract title [Pornhub] Give better error for geo-restriction than Unable to extract title May 15, 2024
@bashonly
Member

If we intend to fix this, someone with an IP address in one of the impacted US states will need to share a page dump of the blocked page.

@ClearBlueOcean
Author

Sure. I've attached a basic "Save Page as..." HTML file of the geo-blocked Pornhub page from my initial issue here: Pornhub.zip (accessed from Texas). I can provide more info if needed (full curl trace, debug headers, HTML from each of the 7 geo-blocked states and/or from all 5 supported Aylo websites that have geo-block restrictions, etc.). I'm happy to help debug or provide whatever info is helpful.

The page contents are quite clear regarding geo-blocking.

Pornhub.zip

@bashonly
Member

@ClearBlueOcean could you add --write-pages to the command you used in the OP and share the .dump file?

@ClearBlueOcean
Author

@ClearBlueOcean
Author

Or if easier: my repository

@bashonly
Member

diff --git a/yt_dlp/extractor/pornhub.py b/yt_dlp/extractor/pornhub.py
index d94f28ceb..97e2260d9 100644
--- a/yt_dlp/extractor/pornhub.py
+++ b/yt_dlp/extractor/pornhub.py
@@ -294,6 +294,11 @@ def dl_webpage(platform):
                 'PornHub said: %s' % error_msg,
                 expected=True, video_id=video_id)
 
+        if age_verify_msg := self._search_regex(
+                r'(your elected officials in \w+ are requiring us to verify your age before allowing you access to our website)',
+                webpage, 'age verification message', default=None):
+            self.raise_geo_restricted(f'PornHub said: {age_verify_msg}')
+
         if any(re.search(p, webpage) for p in (
                 r'class=["\']geoBlocked["\']',
                 r'>\s*This content is unavailable in your country'))
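
To illustrate what the patch changes, here is a minimal standalone sketch of the check order outside yt-dlp: the block-page checks run before title extraction, so a blocked page produces a geo-restriction error instead of "Unable to extract title". `GeoRestrictedError`, `check_geo_block` and `extract_title` are hypothetical stand-ins written for this example (in yt-dlp the patch calls `self.raise_geo_restricted`), and the two title patterns are simplified assumptions, not the extractor's real ones.

```python
import re


class GeoRestrictedError(Exception):
    """Local stand-in for the error yt-dlp raises on geo-restricted content."""


# Age-verification notice, as matched by the patch above.
AGE_VERIFY_RE = (
    r'(your elected officials in \w+ are requiring us to verify your age '
    r'before allowing you access to our website)')

# Geo-block markers already present as context lines in the diff above.
GEO_BLOCK_PATTERNS = (
    r'class=["\']geoBlocked["\']',
    r'>\s*This content is unavailable in your country',
)


def check_geo_block(webpage):
    """Raise GeoRestrictedError if `webpage` looks like a block page."""
    mobj = re.search(AGE_VERIFY_RE, webpage)
    if mobj:
        raise GeoRestrictedError(f'PornHub said: {mobj.group(1)}')
    if any(re.search(p, webpage) for p in GEO_BLOCK_PATTERNS):
        raise GeoRestrictedError('This content is unavailable in your country')


def extract_title(webpage):
    """Simplified title lookup: block checks first, then meta tag, then <title>."""
    check_geo_block(webpage)
    mobj = (re.search(r'<meta name="twitter:title" content="([^"]+)"', webpage)
            or re.search(r'<title>(.+?)</title>', webpage))
    if not mobj:
        raise ValueError('Unable to extract title')
    return mobj.group(1)
```

One thing to be aware of: `\w+` matches a single word, so if the block page spells out a two-word state name such as North Carolina, the pattern would need widening (for example to `[\w ]+`). Whether the page does so is not confirmed in this thread.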

@bashonly bashonly added the patch-available There is patch available that should fix this issue. Someone needs to make a PR with it label May 19, 2024