ytdl_hook: make path and json available to other scripts #14098

nurupo · 2024-05-09T15:22:47Z

It's useful for user scripts to be able to use the same ytdl binary that
ytdl_hook uses without having to replicate ytdl_hook's process of
searching for the ytdl binary.

Some user scripts might also find it useful to be able to access ytdl's
json output that the ytdl_hook already receives, sparing user scripts
from having to make a duplicate ytdl binary invocation to get the json
output.

Fixes #14097, #12852 and #10410.

Tested to work with:

local utils = require 'mp.utils'

local function print_property(name, value)
    print(name .. ": " .. utils.to_string(value))
end

mp.observe_property("user-data/ytdl/json", "native", print_property)
mp.observe_property("user-data/ytdl/path", "string", print_property)

kasper93 · 2024-05-09T15:29:58Z

player/lua/ytdl_hook.lua

@@ -1016,6 +1020,8 @@ function run_ytdl_hook(url)
    msg.verbose("youtube-dl succeeded!")
    msg.debug('ytdl parsing took '..os.clock()-start_time..' seconds')

+    mp.set_property_native("user-data/ytdl/json", json)


I think this should not be persistent storage and instead script-message to notify scripts that may be interested in it.

The json is a bit lengthy and it indeed doesn't make much sense to hold on to it for long. It does feel a bit odd though having to access the path through user-data/ytdl/path property, but access the json via an entirely different mechanism -- script-message.

Anyway, sure, I can change that. Before I do though, I would like to avoid having to change it back. Do we want to wait for someone else's input on this or am I good to make the change?

The difference for me is that, ytdl/path is persistent and will not change. While json is an result of given extraction.

Sure you can wait for other comments to gather more opinions.

One problem I see with storing it in user-data is that it should always reflect what we have for the current file. With the way this is implemented now it only gets updated when acquisition is successful, so e.g. when switching to the next playlist item, the json in user-data is not the correct one for the current path until we get a new one.
By also storing the path which was used for the json, scripts can check if it is for the current path.

Sending a script-message like I did in my PR avoids any race conditions where the path is updated but the json is still the old one, or the other way around.

fwiw another option would be to store the parsed json in user-data/ytdl/data, so that one could e.g. use user-data/ytdl/data/format_id in a conditional profile.

But that might be going too far and if someone really wants to do something like that, they can always make a script that parses the json and stores that table in user-data.

file-local-options/ exists, but it has little bit different purpose.

Exporting parsed json sounds like a good idea, but since we don't have control over the format of it, it wouldn't very usable. That's why I see passing just a json and document it is directly copied from ytdl as better option. We just provide a bridge for the data to flow, not really any assurances about it.

If the main issue with how the PR does mp.set_property_native() is that it doesn't always reflect what we have for the current file, then how about making it always reflect what we have for the current file? For example, have ytdl_hook delete the property on the ~~end-file event~~ on_after_end_file hook? That would also remove the need to communicate which url the json is for, as the property would always refer to the currently open file.

I have pushed this change right now, please check it out.

You can test with, e.g.:

local utils = require 'mp.utils' mp.register_event("file-loaded", function() print("user-data/ytdl/path: " .. mp.get_property("user-data/ytdl/path")) print("user-data/ytdl/json: " .. utils.to_string(mp.get_property_native("user-data/ytdl/json"))) end)

Sharing the json it via script-message works too, but then we also need to specify the url to avoid race conditions as @christoph-heinrich has mentioned.

If we want to share the parsed json, i.e. a lua table, we can do so only via properties. Non-parsed json, i.e. a string, can be shared via both: properties and script-message. We could even share both parsed and non-parsed json via properties if we wanted to, though that might be a bit redundant.

So we have to decide on communicating via a property or script-message, and on sharing a parsed json, an unparsed json or both.

kasper93 · 2024-05-09T15:31:02Z

This needs to be documented, see how #10410 did that.

/cc @christoph-heinrich

na-na-hi · 2024-05-09T15:41:31Z

Has the situation changed since #10410, which was described as "never going to land"?

Also this PR doesn't address the following:
#12852 (comment)

This is only set when ytdl_hook ran at least once, how useful is that for other scripts?

#10410 (comment)

Such raw data cannot be relied upon by scripts, as it can change at any time. For instance if tomorrow ytdl-hook invokes ytdl twice, or if tomorrow it's invoked in a way where the JSON doesn't have the other tracks info (because ytdl-hook doesn't actually need the info for other tracks when not using all-formats=yes), etc.

kasper93 · 2024-05-09T15:50:36Z

This is only set when ytdl_hook ran at least once, how useful is that for other scripts?

Chicken and egg problem. Someone has to detect and set the path of ytdl. And ytdl_hook is the entrypoint for all ytld operations, so imho it is expected that all "child" scripts that may depend on this information should run only if the value is set.

Such raw data cannot be relied upon by scripts, as it can change at any time. For instance if tomorrow ytdl-hook invokes ytdl twice, or if tomorrow it's invoked in a way where the JSON doesn't have the other tracks info (because ytdl-hook doesn't actually need the info for other tracks when not using all-formats=yes), etc.

This is the question for script makers. If this info is useful. It is not "raw data" it is json with defined structure upstream. If any info is not included in the result the "child' script can act accordingly. I think the above comment is fucusing on some specific use-case with all-formats mention. But in fact having json output from ytdl might be used in various ways and if user is adding a "child" script they should also make sure that ytdl-hook config is correct.

If the alternative is to rerun youtube-dl in every script that might need that info, I think the solution of sharing the json is better.

github-actions · 2024-05-09T16:09:58Z

Download the artifacts for this pull request:

Windows

macOS

na-na-hi · 2024-05-09T17:12:16Z

Someone has to detect and set the path of ytdl.

This is only needed if the scripts want to run the ytdl binary themselves, right? In this case why can't the path detection function be extracted and made as a common function available to scripts?

If the alternative is to rerun youtube-dl in every script that might need that info, I think the solution of sharing the json is better.

Agreed, but this still needs to be documented as such, which means mpv passes ytdl's output as-is.

kasper93 · 2024-05-09T17:16:40Z

This is only needed if the scripts want to run the ytdl binary themselves, right? In this case why can't the path detection function be extracted and made as a common function available to scripts?

Currently we don't detect path in separate step. Script just try to run the subprocess and if it succeeded it means we "have" ytdl. This is only a problem if user has custom path set in options.

na-na-hi · 2024-05-09T17:27:58Z

This is only a problem if user has custom path set in options.

In this case scripts can just read that option directly.

kasper93 · 2024-05-09T18:02:34Z

I don't think they can. Also there is still some detection.

nurupo · 2024-05-09T18:16:46Z

This is only needed if the scripts want to run the ytdl binary themselves, right? In this case why can't the path detection function be extracted and made as a common function available to scripts?

The ytdl binary search can indeed be extracted into a separate function, and even placed in a separate script if desired. I extracted ytdl_hook's ytdl path detection in one of my "library" scripts that my other scripts use. Doing so in a user script is inefficient though, as my understanding is that the result can't be cached among different user scripts, every user script that requires that "library" script has its own instance of it. Now, if it was provided my mpv, like you are proposing, then I imagine the result of searching for the ytdl binary could be cached among multiple user scripts?

nurupo · 2024-05-09T18:24:08Z

Has the situation changed since #10410, which was described as "never going to land"?

My understanding for the "never going to land" comment is that a big part of that was the PR being essentially the XY problem (XY solution?).

This is only set when ytdl_hook ran at least once, how useful is that for other scripts?

Can't speak for everyone, everyone's scripts are different, but it is very useful for me. My scripts act on the input file/URL, so they run ytdl only when mp.get_property("path") is a URL, like YouTube, which is the case when the ytdl binary path gets set by ytdl_hook, so my scripts can use it.

Anyway, the intention of the PR is to simply make things that yt_hook already has, available to other scripts, in case they find them useful, with the minimal amount of changes and technical dept. If they find them useful -- good, if not, then the user scripts are not worse than where they were before this PR.

Technical debt incurring thoughts

If the ytdl binary path not being set unless the ytdl_hook runs at least once is a deal breaker for the PR, we could make the ytdl binary searching code in ytdl_hook run on the script init, always setting the ytdl binary path. It is inefficient though because that will result in the ytdl executable always getting invoked on ytdl_hook init. It also goes against the spirit of keeping the changes to ytdl_hook minimal for this PR, avoiding adding extra technical dept to ytdl_hook to support this.

Such raw data cannot be relied upon by scripts, as it can change at any time. For instance if tomorrow ytdl-hook invokes ytdl twice, or if tomorrow it's invoked in a way where the JSON doesn't have the other tracks info (because ytdl-hook doesn't actually need the info for other tracks when not using all-formats=yes), etc.

The yt-dlp json format stability is obviously at the whim of the yt-dlp project, who can break it if they so desire. So user-data/ytdl/json must be documented as such, as an unstable/uncontrollable format as far as mpv is concerned. Since mpv can't promise the stability of the json format, mpv / ytdl_hook breaking it shouldn't be much of an issue.

Technical debt incurring thoughts

If the json format must be future-proofed against mpv breakage by all costs, once ytdl_hook introduces a json format breaking change, ytdl_hook could preserve the old behavior for that variable at a cost of an extra ytdl invocation, possibly gated behind some backwards compatibility config flag, but that's sounds like a lot of technical dept and with the json format already being uncontrollable I'd say don't preserve the old behavior and just let it break.

player/lua/ytdl_hook.lua

na-na-hi · 2024-05-09T21:01:44Z

I don't think they can.

Should be possible with mp.get_opt.

Now, if it was provided my mpv, like you are proposing, then I imagine it could be cached among multiple user scripts?

To make the result cached, we can have a separate script dedicated to the path extraction. The first script which wants the path sends a script-message to that script, which then broadcasts the result with another script-message or by setting a user-data key. Then make the built-in ytdl script also use this method. This way cached path extraction isn't called when unnecessary and can be done even when the built-in ytdl script is disabled or the hook is never called.

My scripts act on the input file/URL, so they run ytdl only when mp.get_property("path") is a URL, like YouTube, which is the case when the ytdl binary path is set by ytdl_hook.

In this case isn't it more appropriate to use script-message for this since you're only interested when the ytdl_hook is called?

player/lua/ytdl_hook.lua

It's useful for user scripts to be able to use the same ytdl binary that ytdl_hook uses without having to replicate ytdl_hook's process of searching for the ytdl binary. Some user scripts might also find it useful to be able to access ytdl's json output that the ytdl_hook already receives, sparing user scripts from having to make a duplicate ytdl binary invocation to get the json output.

nurupo · 2024-05-17T12:49:35Z

After playing around with yt-dlp, I noticed that when it errors, it doesn't produce json output with the error information, it only prints the error to stdout/stderr. For example, if a Twitch channel channel_name is offline, you get:

ERROR: [twitch:stream] channel_name: The channel is not currently live

and no json.

So if user scripts wanted to react to a specific yt-dlp error, they can't do that with the json output alone as there would be no json output in the case of an error, the scripts also need to access the stdout/stderr outputs, checking for error patterns, to make sense of what has happened. It upsets me that yt-dlp doesn't provide error information in a machine-readable format like json, but it is what it is. So I'm thinking of changing the PR to export the entire subprocess return value, the table containing: stdout, stderr, status, killed_by_us, etc., as a single native property or a script-message. That way user scripts have all the information they can possibly have, the json, the errors, the exit code, etc.

kasper93 reviewed May 9, 2024

View reviewed changes

christoph-heinrich reviewed May 9, 2024

View reviewed changes

player/lua/ytdl_hook.lua Outdated Show resolved Hide resolved

nurupo force-pushed the ytdl-export-path-and-json branch 2 times, most recently from afc282b to 2cce6c3 Compare May 15, 2024 08:17

guidocella reviewed May 15, 2024

View reviewed changes

player/lua/ytdl_hook.lua Outdated Show resolved Hide resolved

nurupo force-pushed the ytdl-export-path-and-json branch from 2cce6c3 to 924ebf0 Compare May 15, 2024 09:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ytdl_hook: make path and json available to other scripts #14098

ytdl_hook: make path and json available to other scripts #14098

nurupo commented May 9, 2024 •

edited

kasper93 May 9, 2024

nurupo May 9, 2024 •

edited

kasper93 May 9, 2024

christoph-heinrich May 9, 2024

christoph-heinrich May 9, 2024 •

edited

kasper93 May 9, 2024

nurupo May 15, 2024

kasper93 commented May 9, 2024

na-na-hi commented May 9, 2024

kasper93 commented May 9, 2024

github-actions bot commented May 9, 2024

na-na-hi commented May 9, 2024 •

edited

kasper93 commented May 9, 2024

na-na-hi commented May 9, 2024

kasper93 commented May 9, 2024 •

edited

nurupo commented May 9, 2024 •

edited

nurupo commented May 9, 2024 •

edited

na-na-hi commented May 9, 2024 via email

nurupo commented May 17, 2024

ytdl_hook: make path and json available to other scripts #14098

Are you sure you want to change the base?

ytdl_hook: make path and json available to other scripts #14098

Conversation

nurupo commented May 9, 2024 • edited

kasper93 May 9, 2024

Choose a reason for hiding this comment

nurupo May 9, 2024 • edited

Choose a reason for hiding this comment

kasper93 May 9, 2024

Choose a reason for hiding this comment

christoph-heinrich May 9, 2024

Choose a reason for hiding this comment

christoph-heinrich May 9, 2024 • edited

Choose a reason for hiding this comment

kasper93 May 9, 2024

Choose a reason for hiding this comment

nurupo May 15, 2024

Choose a reason for hiding this comment

kasper93 commented May 9, 2024

na-na-hi commented May 9, 2024

kasper93 commented May 9, 2024

github-actions bot commented May 9, 2024

na-na-hi commented May 9, 2024 • edited

kasper93 commented May 9, 2024

na-na-hi commented May 9, 2024

kasper93 commented May 9, 2024 • edited

nurupo commented May 9, 2024 • edited

nurupo commented May 9, 2024 • edited

na-na-hi commented May 9, 2024 via email

nurupo commented May 17, 2024

nurupo commented May 9, 2024 •

edited

nurupo May 9, 2024 •

edited

christoph-heinrich May 9, 2024 •

edited

na-na-hi commented May 9, 2024 •

edited

kasper93 commented May 9, 2024 •

edited

nurupo commented May 9, 2024 •

edited

nurupo commented May 9, 2024 •

edited