Statistics pages not including archive chapters estimated under 1hr time in reading statistics #2961

Open
Cducharme84 opened this issue May 19, 2024 · 3 comments
Labels
bug Something isn't working db-migration This story needs a DB Migration

Comments


Cducharme84 commented May 19, 2024

What happened?

I noticed that, despite having read over 40 series totaling more than 6,100 pages, both the user and statistics pages show 2 hours of total read time. All series and chapter estimates display correctly on the series page, so the estimates appear to be calculated correctly before read state comes into play.

After digging into the DB, I noticed that all the read time values appear to be integers counted in hours, so anything calculated at under an hour is stored as 0 everywhere the values appear. For the series and volume tables this gives the intended result: the series page shows a rough, hour-based estimate.

Chapters whose estimated read time is under an hour, however, are recorded as zero, and that zero is then used in read time calculations, including those on the statistics pages.
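The truncation described above can be illustrated with a minimal Python sketch. This is not Kavita's code (which is C#); the reading rate and the function names are hypothetical and exist only to show how storing whole hours turns every sub-hour estimate into 0.

```python
# Hypothetical reading rate, NOT taken from Kavita; chosen only for illustration.
ASSUMED_PAGES_PER_HOUR = 60


def whole_hours_estimate(pages: int) -> int:
    """Estimate read time in whole hours, discarding the fractional part."""
    return pages // ASSUMED_PAGES_PER_HOUR


def fractional_hours_estimate(pages: int) -> float:
    """Estimate read time in hours, keeping the fractional part."""
    return pages / ASSUMED_PAGES_PER_HOUR


# Three short comic-sized chapters and one long one.
chapter_pages = [24, 32, 45, 180]

whole = [whole_hours_estimate(p) for p in chapter_pages]
fractional = [fractional_hours_estimate(p) for p in chapter_pages]

print(whole)            # the three short chapters all become 0
print(sum(whole))       # total using whole hours
print(sum(fractional))  # total keeping fractions
```

Under these assumed numbers, only the 180-page chapter contributes to the whole-hour total, which is the kind of undercount reported here.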

Dup on Discord helped identify the section of https://github.com/Kareadita/Kavita/blob/97ffdd097504ff9896f626bc7e0deb0c6e743d9d/API/Services/StatisticService.cs that appears to be at fault. The following logic discards chapters with a read time value of zero from the statistics calculation:

.Where(p => p.chapter.AvgHoursToRead > 0)
.SumAsync(p =>
     p.chapter.AvgHoursToRead * (p.progress.PagesRead / (1.0f * p.chapter.Pages))));
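To make the effect of that query concrete, here is a rough Python mirror of the filter and sum. It is a sketch under stated assumptions, not a port of `StatisticService.cs`: the `ChapterProgress` class and the sample rows are hypothetical, and only the filter condition and the summed expression follow the snippet above.

```python
from dataclasses import dataclass


@dataclass
class ChapterProgress:
    """Hypothetical stand-in for one joined chapter/progress row."""
    avg_hours_to_read: int  # whole hours, as described in this issue
    pages: int
    pages_read: int


def total_hours_read(rows):
    """Mirror of the quoted query: skip zero-hour chapters, then sum
    estimated hours weighted by the fraction of pages read."""
    return sum(
        r.avg_hours_to_read * (r.pages_read / r.pages)
        for r in rows
        if r.avg_hours_to_read > 0
    )


# 40 fully read short chapters stored as 0 hours, plus one 2-hour chapter.
rows = [ChapterProgress(0, 24, 24) for _ in range(40)]
rows.append(ChapterProgress(2, 120, 120))

total = total_hours_read(rows)
print(total)  # only the single 2-hour chapter contributes
```

Note that in this sketch, dropping the `> 0` filter alone would not change the total, because a stored 0 multiplied by any fraction is still 0. If the real data behaves the same way, the stored whole-hour value would also need to change for short chapters to count.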

In my case the discrepancy is about 11 days of estimated read time, based on an archive read count of 6,100+ pages, because the display drops chapters with under 1 hour of estimated read time from the calculation.

Going by the highly scientific data gathering found in the show your server channel on Discord, people who mention reading mostly comics, and reading frequently, seem to show lower totals than I would expect in their screenshots. The same issue may also exclude manga chapters under 1 hr from the calculation, but I have less experience with typical manga archive sizes, so that is conjecture.

Possible solutions:

  1. Remove the exclusion of chapters under an hour from the statistics calculation. I do not know whether this logic was added on purpose, so while this sounds simple, some refactoring may be involved if it was meant to accommodate something outside of statistics.
  2. If the exclusion is needed for other types of books in the statistics calculation, use the low minutes-per-page value for completed archive-based chapters, regardless of their estimated reading time.
  3. Refactor the estimates to float to allow decimal storage, which involves a database schema change. I do not see this option as attractive or even needed, but it may allow more freedom in read estimate displays outside the statistics pages. If the team leans this way, it would honestly be best handled as a feature request to gauge whether it is worthwhile; I just wanted to present my thoughts on solutions.
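As a rough illustration of options 2 and 3, the sketch below derives read time directly from pages read using a fractional rate, so short chapters are never rounded down to zero. The minutes-per-page value and the function name are hypothetical and not taken from Kavita.

```python
# Hypothetical rate, NOT Kavita's actual value; for illustration only.
ASSUMED_MINUTES_PER_PAGE = 0.5


def hours_read_from_pages(pages_read_per_chapter,
                          minutes_per_page=ASSUMED_MINUTES_PER_PAGE):
    """Convert pages read into hours, keeping fractional time throughout."""
    total_minutes = sum(pages_read_per_chapter) * minutes_per_page
    return total_minutes / 60.0


# 40 fully read chapters of 24 pages each: every one is under an hour,
# yet together they add up to a meaningful total.
chapters = [24] * 40
hours = hours_read_from_pages(chapters)
print(hours)
```

Whether the rate is applied at query time (option 2) or the estimate is stored as a float (option 3) is a design choice for the maintainers; the sketch only shows that keeping fractions preserves the contribution of short chapters.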

What did you expect?

All completed chapters to have page count included with reading stats

Kavita Version Number - If you do not see your version number listed, please update Kavita and see if your issue still persists.

Nightly Testing Branch

What operating system is Kavita being hosted from?

Docker (Dockerhub Container)

If the issue is being seen on Desktop, what OS are you running where you see the issue?

None

If the issue is being seen in the UI, what browsers are you seeing the problem on?

No response

If the issue is being seen on Mobile, what OS are you running where you see the issue?

None

If the issue is being seen on the Mobile UI, what browsers are you seeing the problem on?

No response

Relevant log output

No response

Additional Notes

Attached is a user with the reading statistics way off
IMG_0309

@Cducharme84 Cducharme84 added the needs-triage Needs to be triaged by a developer and assigned a release label May 19, 2024
@DieselTech
Collaborator

To add to this, and I think it is likely related: the server stats don't really line up with what is expected.

image

It states that total read time is 5.5 days for over 100k files (which is also wrong: I have about 100k comics alone, plus about 40-50k manga across a few different libraries). The total size should be approx. 4.3 TB for everything added to Kavita. Some of my manga series are 7-10 days of read time on their own.

@Cducharme84
Author

Oh yeah, in my first paragraph I meant the user and server statistics pages. Since mine show 2 hrs on both screens, I had assumed the server stats screen displays read time for all users added together; it just happens that my other user hasn't read much on this DB instance, so the two totals were the same.

@majora2007 majora2007 added this to To do in v0.8 - PDF & Comic Love via automation Jun 9, 2024
@majora2007 majora2007 added the db-migration This story needs a DB Migration label Jun 9, 2024
@majora2007
Member

This seems to be an oversight from building stats on top of the estimated reading time feature and is a great find. Unfortunately this requires a DB migration and a bit of rework to the codebase. I'll try to get to this in v0.8.3.

@majora2007 majora2007 added bug Something isn't working and removed needs-triage Needs to be triaged by a developer and assigned a release labels Jun 9, 2024