Get archive content metadata #21

Open
kefniark opened this issue Jul 15, 2020 · 3 comments

Comments

@kefniark

Description

I'm trying your RAR decoder as a replacement for libarchive. It works well, but I'm running into a performance issue.

Some of the books I'm dealing with (in CBR) are quite big: over 500MB and 300 pages.
But I don't see how to get file descriptions of the RAR content with this library without extracting everything. I end up loading the whole book into memory (which takes >20s) when I just want to list the files and load the first 2 pages.

libarchive, for example, exposes a method .getFilesObject() to list files and access their metadata, and the reading/decoding is a separate async operation.

I searched but couldn't find a way to get this kind of feature with bitjs. Am I missing something?

@codedread
Owner

  1. Out of curiosity, what kind of files are they?

  2. Are you sure that the files have been added to the archive in the order in which they need to be extracted? I have encountered RAR and ZIP files where this is not the case, which means the whole archive needs to be extracted to put files in the right order.

I'm afraid I haven't dug into the RAR format too much lately, so I've forgotten if what you're asking is possible (like is the metadata at the beginning of the file? the end of the file? interspersed throughout the file?). But it seems possible we could introduce a mode that lets the client code extract a file at a time. I want something like this for my comic book reader anyway: codedread/kthoom#31
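A rough sketch of what a "one file at a time" mode could look like from the client side, written as an async generator. Everything here is hypothetical: `extractOneAtATime`, `firstPages`, and the injected `decodeEntry` function are illustration names, not part of bitjs. It also assumes each entry can be decoded on its own, which does not hold for solid RAR archives, where earlier entries must be decompressed first.

```javascript
// Hypothetical sketch: none of these names are part of bitjs.
// `entries` is an assumed list of { name } records read from archive headers.
// `decodeEntry` is an assumed async function that decompresses one entry.
async function* extractOneAtATime(entries, decodeEntry) {
  for (const entry of entries) {
    // Decoding for an entry starts only when the consumer asks for it.
    const bytes = await decodeEntry(entry);
    yield { name: entry.name, bytes };
  }
}

// Collect the first `count` entries, then stop pulling from the generator.
// Entries after that point are never passed to `decodeEntry`.
async function firstPages(entries, decodeEntry, count = 2) {
  const pages = [];
  if (count <= 0) return pages;
  for await (const page of extractOneAtATime(entries, decodeEntry)) {
    pages.push(page);
    if (pages.length >= count) break;
  }
  return pages;
}
```

Because the generator is pull-based, breaking out of the loop is enough to skip the rest of the archive; no cancellation flag is needed.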

Out of curiosity, why move from libarchive to this library? Since this is a bespoke JavaScript library that does unarchiving of files, I would guess that any library that compiles a C library like unrar to WebAssembly would have more complete support for the format.

@kefniark
Author

  1. Nothing special, just big CBR books, full of illustrations.
  2. No, indeed they are not sorted, and that's the problem: a certain percentage of my sample books are not sorted. But because I access those files locally, I can read any part of them without streaming.
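When entries are stored out of order but the names are available up front, one way to find the first pages is to sort the names naturally and take the head of the list. This is only a sketch: `firstPageNames` is a made-up helper, the image extension list is an assumption, and it presumes page order follows file names.

```javascript
// Natural ordering, so that "page10.jpg" sorts after "page2.jpg".
const pageOrder = new Intl.Collator('en', { numeric: true, sensitivity: 'base' });

// Hypothetical helper: `names` is an assumed array of entry names taken from
// the archive listing. Returns the names of the first `count` image entries.
function firstPageNames(names, count = 2) {
  // filter() returns a new array, so the caller's list is not reordered.
  const images = names.filter((n) => /\.(jpe?g|png|gif|webp)$/i.test(n));
  return images.sort(pageOrder.compare).slice(0, count);
}
```

Only the entries named in the result would then need to be decoded.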

> Out of curiosity, why move from libarchive to this library?

libarchive works really well for cbz and cbt, but their unrar library is quite buggy and not updated:

In my tests, I get errors with a lot of books in CBR format, which is why I was trying other unrar libraries to find a workaround.

For the moment, my best workaround is to use libarchive to get the RAR file descriptions and fall back to node-unrar-js to access the RAR content. It works well and solves my problem, but it's quite an ugly hack 😄
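The shape of that workaround, sketched as a small wrapper that takes the listing backend and the decoding backend as separate functions. `createHybridReader`, `listEntries`, and `extractEntry` are hypothetical names for illustration; they are not APIs of libarchive, node-unrar-js, or bitjs, and the real calls would be wired in by the caller.

```javascript
// Hypothetical wrapper: `listEntries` and `extractEntry` are injected, so the
// listing and the decoding can come from two different libraries.
function createHybridReader({ listEntries, extractEntry }) {
  let listing = null;       // cached promise for the entry list
  const cache = new Map();  // entry name -> promise of decoded bytes

  return {
    // Resolve the entry list once; later calls reuse the same promise.
    list() {
      if (!listing) {
        listing = Promise.resolve().then(() => listEntries());
      }
      return listing;
    },
    // Decode a single entry on demand and remember the result.
    read(name) {
      if (!cache.has(name)) {
        cache.set(name, Promise.resolve().then(() => extractEntry(name)));
      }
      return cache.get(name);
    },
  };
}
```

Keeping the two backends behind one object at least contains the hack in one place, so either side can be swapped out later.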

I'm quite surprised there is no good, up-to-date unrar library in JS without crazy dependencies.

@codedread
Owner

bitjs is an attempt to avoid any dependencies for this functionality, but its support is not complete (but good enough, imo).

TBH, I'm not that surprised, since RAR is an undocumented, proprietary format, so unarchivers have to either compile the unrar source or reverse-engineer it.

I'll think about this some more - I can't promise anything though!
