Add DownloadToEpub.koplugin #9943

roygbyte · 2022-12-23T23:04:22Z

Plugin adds a "Download To EPUB" button to external links.

This change is

poire-z · 2022-12-30T13:40:11Z

12/30/22-14:32:32 WARN error in wrapped function: plugins/downloadtoepub.koplugin/main.lua:197: attempt to index global 'Trapper' (a nil value)
(fixed it, then:)

12/30/22-14:34:07 DEBUG DownloadToEpub: Begin download of http://www.magazine-litteraire.com/content/recherche/article?id=20651 outputting to /koreader/koreader-emulator-x86_64-linux-gnu-debug/koreader/EPUB Downloads/
12/30/22-14:34:07 INFO  unwrapped info: Error downloading EPUB: %1

Using another working link:
12/30/22-14:38:41 WARN error in wrapped function: plugins/downloadtoepub.koplugin/libs/xml2lua/xml2lua.lua:140: plugins/gazette.koplugin/libs/gazette/epub/templates/container.xml: No such file or directory

Guess I need that other PR of yours :) but enough playing for today :)

poire-z · 2022-12-30T13:44:52Z

Ok, a bit more playing.
I choosed some existing directory named test , and it created alongside it: testDefault title.epub.

roygbyte · 2022-12-30T15:40:12Z

He's playing with it! How nice (:

You'll want the latest commits (incoming...), which should fix the errors you'd been having.

roygbyte · 2023-01-22T15:50:10Z

Is there anything I need to change for this PR to be integrated? Also, well, do we want this PR integrated?

poire-z · 2023-01-22T16:55:00Z

OK, tried it again ;)
Followed 3 links from footnotes in some book, and got 2 succesful downloads to EPUB (the 3rd got me some single sentence, I didn't bother go looking at the webpage).
The 2 EPUBs were readable enough: some (assuming) full text, with images ! So, it gets the job done.

Major issue is that a next download did overwrite the previous: all were saved to a single Default title.epub - so when following items from your History menu, my 3 links did open the last and single EPUB :/

Also, the default download directory was outside my HOME directory (dunno if it's some remnants from my previous test), and I had to go hunt for it :) And it doesn't use the directory I select ! It keeps saving EPUBs in the original place.
Also, dunno if I could have interrupted the download if it would take a long time or got stuck downloading 100 images. It seems that if I tap while the "Downloading...", it is saved but my tap dismisses the success message - but I can read it if I... reload the current document :)

So, a few usability issues to fix, but the main purpose works.

Dunno if you were supposed to download and tweak the CSS. But I guess we can all be fine without any CSS (which, on the web, are wilder than what crengine can support). Dunno if you could/should clean them out from the resulting HTML.

Probably some <base href=> needed (dunno if crengine supports that, I think not), so probably internal website uris should be transformed to full urls as they will be external links for us.

Anyway, code/PR wise, same comments as before about gazette: it's 35 files, with funky and variable coding style :) All this makes me not want to go look at the code :/
So, you'll probably be the only maintainer, and get many bug reports, given the wild URLs and HTML we can find on the web.

Also, can you explain in a few words how this works, from the moment you get the main HTML data to when you decide what to be downloaded (images ? css ?), and how you transform that main HTML data ? Why xml2lua ? Just to extract things, or do you make a DOM, and serialize it back to HTML so we're sure to get correct and balanced XHTML (that I guess, crengine appreciates :).

(Personally, I would have made it as a single Lua script, doing the download + HTML as a string (no DOM) + some regexp to extract images and download/rework their URI like for Wikipedia.)

Are you using this plugin ? How often ? How happy ?

(I wish some other people would comment and test, because I have no real interest in the feature :/ This was just my curiosity and sympathy making me test and write all this :)
My only real bother are these 35 added files :) and the fact that even when merged, this will stay a Work in progress and you'll never rest, and you'll end up fighting/adjusting to web pages that failed. Probably more rough than RSS and gazette :)

NiLuJe · 2023-01-22T18:08:55Z

I have zero interest in the plugin from a user POV, and my past experience with reviewing a from-scratch massively modular PR was... not a fun one (and made actually grasping how it all fit together basically impossible without resorting to a local IDE), so I have yet to find the block of time + will necessary to review it ;).

roygbyte · 2023-01-25T10:56:34Z

Ahhhh I am haunted by the same issues my other work! Alas :'( I'm starting to see what you mean by the modularity being undesirable. There definitely can be some ways to reduce the file count by 50 percent at least. (On a side note: no longer taking Java so it can't get any worse, right?) Anyways ... it sounds like I should trash this and the other thing because they're so far gone and ... tbh .... I'm not using my ereader much more anyways :')

Frenzie · 2023-01-25T11:31:47Z

I'm in the same boat btw. I'd certainly like to take a better look, but the code structure is less inviting to me. I can't grok it very well just looking at the GH diff or Reviewable due to the unnecessarily complex inheritance.

If you compare the recent vocabbuilder plugin for example that was very clear for me to read.

I'm not really sure how to turn that into useful/actionable feedback, which is why I hadn't responded yet. But for example, I don't really understand why "request" and "requestfactory" are separate files. I know the factory pattern is a popular thing in Java but my understanding of it is a bit different (but possibly wrong or outdated). I'd envision the pattern more like you have the request class and then you have several different factories depending on what type of request you're doing. I'm not sure how much sense that makes here regardless but we don't even have a postrequestfactory vs a getrequestfactory or something. :-)

A similar example is the htmldocument stuff. To me there's no clear reason to split it up into 7 (?) files. I think I'd basically just have one or two files, the htmldocument and the template. That's mainly because at this point those are all one or two functions in an entirely new file/object. (Also see the ResourceIterator in webpageadapter.)

In short, it comes across to me as using a pattern for the sake of using a pattern (or perhaps: to learn the pattern?) rather than because it's necessarily useful here.

Anyway, as in writing good text, writing good code also often comes down to rewriting. That's not trashing it. ;-)

roygbyte · 2023-01-25T22:48:47Z

I'd envision the pattern more like you have the request class and then you have several different factories depending on what type of request you're doing.

Yeah, I was building it with a mind towards the future, where there would need to be many different factories for other subscription services. Probably that's why it's so bloated--those other services never appeared.

In short, it comes across to me as using a pattern for the sake of using a pattern (or perhaps: to learn the pattern?) rather than because it's necessarily useful here.

Actually, ha, yeah, the epub builder stuff twined with me learning about these types of patterns and then going whole hog on them. Oooopsies :)

Anyway, as in writing good text, writing good code also often comes down to rewriting. That's not trashing it. ;-)

True true true...! That would make it 10x easier to work with, too, since I'm not switching buffers so much to make a little change. Hmmm!

Frenzie added the Plugin label Dec 24, 2022

roygbyte added 6 commits January 1, 2023 10:25

Add dependencies

bc07612

Add the code

5590693

Incorrect return values were captured. That's fixed now.

78b1800

Don't notify about something that's not done

c02a63d

Make it less borked. Add history feature.

63182ab

Load files from this plugin, not another

273f869

roygbyte force-pushed the add_download_to_epub branch from 41678cc to 273f869 Compare January 1, 2023 15:05

roygbyte added 2 commits January 1, 2023 15:40

It now be smart enough to redownload history items that went poof

e8ea6ce

No trapper; Use ConfirmBox; Add NetworkMgr

8e80450

roygbyte marked this pull request as ready for review January 1, 2023 23:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add DownloadToEpub.koplugin #9943

Add DownloadToEpub.koplugin #9943

roygbyte commented Dec 23, 2022 •

edited

poire-z commented Dec 30, 2022

poire-z commented Dec 30, 2022

roygbyte commented Dec 30, 2022

roygbyte commented Jan 22, 2023 •

edited

poire-z commented Jan 22, 2023 •

edited

NiLuJe commented Jan 22, 2023 •

edited

roygbyte commented Jan 25, 2023 •

edited

Frenzie commented Jan 25, 2023

roygbyte commented Jan 25, 2023

Add DownloadToEpub.koplugin #9943

Are you sure you want to change the base?

Add DownloadToEpub.koplugin #9943

Conversation

roygbyte commented Dec 23, 2022 • edited

poire-z commented Dec 30, 2022

poire-z commented Dec 30, 2022

roygbyte commented Dec 30, 2022

roygbyte commented Jan 22, 2023 • edited

poire-z commented Jan 22, 2023 • edited

NiLuJe commented Jan 22, 2023 • edited

roygbyte commented Jan 25, 2023 • edited

Frenzie commented Jan 25, 2023

roygbyte commented Jan 25, 2023

roygbyte commented Dec 23, 2022 •

edited

roygbyte commented Jan 22, 2023 •

edited

poire-z commented Jan 22, 2023 •

edited

NiLuJe commented Jan 22, 2023 •

edited

roygbyte commented Jan 25, 2023 •

edited