Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DownloadToEpub.koplugin #9943

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

roygbyte
Copy link
Contributor

@roygbyte roygbyte commented Dec 23, 2022

Plugin adds a "Download To EPUB" button to external links.


This change is Reviewable

@Frenzie Frenzie added the Plugin label Dec 24, 2022
@poire-z
Copy link
Contributor

poire-z commented Dec 30, 2022

12/30/22-14:32:32 WARN error in wrapped function: plugins/downloadtoepub.koplugin/main.lua:197: attempt to index global 'Trapper' (a nil value)
(fixed it, then:)

12/30/22-14:34:07 DEBUG DownloadToEpub: Begin download of http://www.magazine-litteraire.com/content/recherche/article?id=20651 outputting to /koreader/koreader-emulator-x86_64-linux-gnu-debug/koreader/EPUB Downloads/
12/30/22-14:34:07 INFO  unwrapped info: Error downloading EPUB: %1

Using another working link:
12/30/22-14:38:41 WARN error in wrapped function: plugins/downloadtoepub.koplugin/libs/xml2lua/xml2lua.lua:140: plugins/gazette.koplugin/libs/gazette/epub/templates/container.xml: No such file or directory

Guess I need that other PR of yours :) but enough playing for today :)

@poire-z
Copy link
Contributor

poire-z commented Dec 30, 2022

Ok, a bit more playing.
I choosed some existing directory named test , and it created alongside it: testDefault title.epub.

@roygbyte
Copy link
Contributor Author

He's playing with it! How nice (:

You'll want the latest commits (incoming...), which should fix the errors you'd been having.

@roygbyte roygbyte marked this pull request as ready for review January 1, 2023 23:32
@roygbyte
Copy link
Contributor Author

roygbyte commented Jan 22, 2023

Is there anything I need to change for this PR to be integrated? Also, well, do we want this PR integrated?

@poire-z
Copy link
Contributor

poire-z commented Jan 22, 2023

OK, tried it again ;)
Followed 3 links from footnotes in some book, and got 2 succesful downloads to EPUB (the 3rd got me some single sentence, I didn't bother go looking at the webpage).
The 2 EPUBs were readable enough: some (assuming) full text, with images ! So, it gets the job done.

Major issue is that a next download did overwrite the previous: all were saved to a single Default title.epub - so when following items from your History menu, my 3 links did open the last and single EPUB :/
image

Also, the default download directory was outside my HOME directory (dunno if it's some remnants from my previous test), and I had to go hunt for it :) And it doesn't use the directory I select ! It keeps saving EPUBs in the original place.
Also, dunno if I could have interrupted the download if it would take a long time or got stuck downloading 100 images. It seems that if I tap while the "Downloading...", it is saved but my tap dismisses the success message - but I can read it if I... reload the current document :)

image

image

So, a few usability issues to fix, but the main purpose works.

image

Dunno if you were supposed to download and tweak the CSS. But I guess we can all be fine without any CSS (which, on the web, are wilder than what crengine can support). Dunno if you could/should clean them out from the resulting HTML.

image
Probably some <base href=> needed (dunno if crengine supports that, I think not), so probably internal website uris should be transformed to full urls as they will be external links for us.

Anyway, code/PR wise, same comments as before about gazette: it's 35 files, with funky and variable coding style :) All this makes me not want to go look at the code :/
So, you'll probably be the only maintainer, and get many bug reports, given the wild URLs and HTML we can find on the web.

Also, can you explain in a few words how this works, from the moment you get the main HTML data to when you decide what to be downloaded (images ? css ?), and how you transform that main HTML data ? Why xml2lua ? Just to extract things, or do you make a DOM, and serialize it back to HTML so we're sure to get correct and balanced XHTML (that I guess, crengine appreciates :).

(Personally, I would have made it as a single Lua script, doing the download + HTML as a string (no DOM) + some regexp to extract images and download/rework their URI like for Wikipedia.)

Are you using this plugin ? How often ? How happy ?

(I wish some other people would comment and test, because I have no real interest in the feature :/ This was just my curiosity and sympathy making me test and write all this :)
My only real bother are these 35 added files :) and the fact that even when merged, this will stay a Work in progress and you'll never rest, and you'll end up fighting/adjusting to web pages that failed. Probably more rough than RSS and gazette :)

@NiLuJe
Copy link
Member

NiLuJe commented Jan 22, 2023

I have zero interest in the plugin from a user POV, and my past experience with reviewing a from-scratch massively modular PR was... not a fun one (and made actually grasping how it all fit together basically impossible without resorting to a local IDE), so I have yet to find the block of time + will necessary to review it ;).

@roygbyte
Copy link
Contributor Author

roygbyte commented Jan 25, 2023

Ahhhh I am haunted by the same issues my other work! Alas :'( I'm starting to see what you mean by the modularity being undesirable. There definitely can be some ways to reduce the file count by 50 percent at least. (On a side note: no longer taking Java so it can't get any worse, right?) Anyways ... it sounds like I should trash this and the other thing because they're so far gone and ... tbh .... I'm not using my ereader much more anyways :')

@Frenzie
Copy link
Member

Frenzie commented Jan 25, 2023

I'm in the same boat btw. I'd certainly like to take a better look, but the code structure is less inviting to me. I can't grok it very well just looking at the GH diff or Reviewable due to the unnecessarily complex inheritance.

If you compare the recent vocabbuilder plugin for example that was very clear for me to read.

I'm not really sure how to turn that into useful/actionable feedback, which is why I hadn't responded yet. But for example, I don't really understand why "request" and "requestfactory" are separate files. I know the factory pattern is a popular thing in Java but my understanding of it is a bit different (but possibly wrong or outdated). I'd envision the pattern more like you have the request class and then you have several different factories depending on what type of request you're doing. I'm not sure how much sense that makes here regardless but we don't even have a postrequestfactory vs a getrequestfactory or something. :-)

A similar example is the htmldocument stuff. To me there's no clear reason to split it up into 7 (?) files. I think I'd basically just have one or two files, the htmldocument and the template. That's mainly because at this point those are all one or two functions in an entirely new file/object. (Also see the ResourceIterator in webpageadapter.)

In short, it comes across to me as using a pattern for the sake of using a pattern (or perhaps: to learn the pattern?) rather than because it's necessarily useful here.

Anyway, as in writing good text, writing good code also often comes down to rewriting. That's not trashing it. ;-)

@roygbyte
Copy link
Contributor Author

I'd envision the pattern more like you have the request class and then you have several different factories depending on what type of request you're doing.

Yeah, I was building it with a mind towards the future, where there would need to be many different factories for other subscription services. Probably that's why it's so bloated--those other services never appeared.

In short, it comes across to me as using a pattern for the sake of using a pattern (or perhaps: to learn the pattern?) rather than because it's necessarily useful here.

Actually, ha, yeah, the epub builder stuff twined with me learning about these types of patterns and then going whole hog on them. Oooopsies :)

Anyway, as in writing good text, writing good code also often comes down to rewriting. That's not trashing it. ;-)

True true true...! That would make it 10x easier to work with, too, since I'm not switching buffers so much to make a little change. Hmmm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants