Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Real-time parsing works as not intended v.1.9.1 #2630

Open
H1r0Sh1mA opened this issue Feb 9, 2024 · 14 comments
Open

Real-time parsing works as not intended v.1.9.1 #2630

H1r0Sh1mA opened this issue Feb 9, 2024 · 14 comments
Labels

Comments

@H1r0Sh1mA
Copy link

GoAccess ver 1.9.1
Env: docker container
Command:

command: >
      /app/logs/access.log
      -o /app/report/access.log.html
      --real-time-html
      --origin=https://my.domain
      --ws-url=wss://my.domain/ws
      --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"'
      --keep-last=30
      --time-format='%T'
      --date-format='%d/%b/%Y'
      --db-path=/app/internal_db
      --geoip-database=/app/geo_db/GeoIP2-Country.mmdb
      --geoip-database=/app/geo_db/GeoLite2-ASN.mmdb
      --ignore-panel=OS
      --ignore-panel=BROWSERS
      --persist
      --restore

Problem:
During the real-time parsing goaccess re-parsing the entire log file every time a new record is added, causing it to count all the requests and stats again, including those that were already counted in previous parsing iteration.

Describing in more details:
we have 1 record in access log (total 1 line) -> report returns 1 total/valid request (all stats seems fine)
adding 1 record to access log (total 2 lines) -> report returns 3 total/valid/ requests (goaccess read the log file from the beginning and adding the 1st and the 2d records to the previous iteration)
adding 1 record to access log (total 3 lines) -> report returns 6 total/valid requests (goaccess read the log file from the beginning again and adding all the 3 records to previous iteration)
etc.....

The one shot (no real-time monitoring) execution works as intended.

@H1r0Sh1mA H1r0Sh1mA changed the title Real-time parsing works as not intended Real-time parsing works as not intended v.1.9.1 Feb 9, 2024
@allinurl
Copy link
Owner

allinurl commented Feb 9, 2024

Does this occur when you execute it directly on the machine, without using Docker?

@allinurl allinurl added the docker label Feb 9, 2024
@H1r0Sh1mA
Copy link
Author

H1r0Sh1mA commented Feb 10, 2024

Does this occur when you execute it directly on the machine, without using Docker?

Standalone goaccess working as intended (with minimum workload).

Got some updates:

In my initial question i didnt mention one important detail - my access log rotating every 1 hour.
Its inode number does not changing
Here are rotation options:

/var/log/docker/*.log {
        hourly
        rotate 336
        copytruncate
        dateext
        datehourago
        dateformat _%Y-%m-%d_%H%M%S
        extension .log
        missingok
        compress
        notifempty
        sharedscripts
        postrotate
                docker exec `docker ps -q -f name=nginx_container` nginx -s reload > /dev/null 2>/dev/null || true
        endscript
}

The 1st hour after goaccess started im getting an accurate stats.

At the moment the second hour ticks and the access.log has been rotated (access.log inode value is still the same) - then the deviations started to appears.
Some of them:
In Overall Analyzed Requests pane

  1. Unique Visitors counter stops count
  2. Total\Valid requests starting to count with huge deviation (~10x )
  3. Tx Amount starting to count with huge deviation
  4. Requested Files counter stops count
  5. Log Size is correct

On the Time Distribution pane

  • Hour before the access.log has been rotated:
    hits -> 849,513
    visitors-> 48,750
    TX Amount -> 11Gb

this stats are fine

  • The next hour after the access.log has been rotated :
    hits -> 6,981,924
    visitors -> 2,767 (this stat is staying in freezed condition - its value does not changing)
    TX Amount -> 105.6 Gb

this stats received in the first 20 mins of a hour and they are NOT correct
Just to know the correct values for the request count during the fresh hour: wc -l access.log returns 213618 total requests
So im expecting to receive ~ 1,063,131 total requests instead of 6,981,924 in goaccess dashboard

@allinurl
Copy link
Owner

Are you running this using tail -F? Have you tried passing the log directly, for example?

# goaccess access.log -o /reports/access.log.html --real-time-html --origin=https://my.domain/ --ws-url=wss://my.domain/ws --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"' --time-format='%T' --date-format='%d/%b/%Y' --db-path=/internal_db --geoip-database=/geo_db/GeoIP2-Country.mmdb --persist --restore

@H1r0Sh1mA
Copy link
Author

H1r0Sh1mA commented Feb 10, 2024

Are you running this using tail -F? Have you tried passing the log directly, for example?

# goaccess access.log -o /reports/access.log.html --real-time-html --origin=https://my.domain/ --ws-url=wss://my.domain/ws --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"' --time-format='%T' --date-format='%d/%b/%Y' --db-path=/internal_db --geoip-database=/geo_db/GeoIP2-Country.mmdb --persist --restore

in my docker container im running goaccess like this (as stated in my docker-compose file):

command: >
      /app/logs/access.log
      -o /app/report/access.log.html
      --real-time-html
      --origin=https://my.domain
      --ws-url=wss://my.domain/ws
      --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"'
      --keep-last=30
      --time-format='%T'
      --date-format='%d/%b/%Y'
      --db-path=/app/internal_db
      --geoip-database=/app/geo_db/GeoIP2-Country.mmdb
      --geoip-database=/app/geo_db/GeoLite2-ASN.mmdb
      --ignore-panel=OS
      --ignore-panel=BROWSERS
      --persist
      --restore

so the log goes directly to goaccess without piping

@allinurl
Copy link
Owner

does it function properly outside of Docker when passing the log directly?

@H1r0Sh1mA
Copy link
Author

H1r0Sh1mA commented Feb 10, 2024

does it function properly outside of Docker when passing the log directly?

Yep, standalone goaccess (outside Docker env) working as intended (with minimum workload).

@allinurl
Copy link
Owner

Got it. I'm just trying to pinpoint the problem as precisely as possible. So, it seems the issue arises when running via Docker or when piping data in using tail -F, right?

If you remove --persist and --restore, does the issue still occur with the real-time counters? Also, are you experiencing the same issues with terminal output? Any additional details would be appreciated so I can attempt to reproduce this on my end. Thanks!

@H1r0Sh1mA
Copy link
Author

H1r0Sh1mA commented Feb 10, 2024

The issue arises when running via Docker.
I did not use tail -F as i read its more advisable to send logs direct to goaccess to avoid potential issues.
I will try to remove --persist and --restore options and report as soon as possible.
I will also try to check the TUI behavior.

I can provide you with the docker-compose file im using.

Thank you so much for your time.

@H1r0Sh1mA
Copy link
Author

H1r0Sh1mA commented Feb 13, 2024

Update
removing of --persist and --restore options does not help.
Here is the docker-compose file

  parser:
    image: allinurl/goaccess
    logging:
      driver: syslog
      options:
        tag: "goaccess_parser"
    expose:
      - "7890"
    command: >
      /app/logs/access.log
      -o /app/report/acces.log.html
      --jobs=2
      --real-time-html
      --origin=https://my.domain
      --ws-url=wss://my.domain/ws
      --log-format='%^ %^ %^ %^ %^ %v %h %^ %^[%d:%t %^] \"%r\" %s %b \"%R\" \"%u\" \"%^\" \"%^\" \"%T\"'
      --keep-last=30
      --time-format=%H:%M:%S
      --date-format=%d/%b/%Y
      --db-path=/app/internal_db
      --geoip-database=/app/geo_db/GeoIP2-Country.mmdb
      --geoip-database=/app/geo_db/GeoLite2-ASN.mmdb
      --ignore-panel=OS
      --ignore-panel=BROWSERS
    volumes:
      - ./logs:/app/logs:ro
      - ./reports:/app/report
      - ./parser/internal_db:/app/internal_db
      - ./parser/geo_db:/app/geo_db

Unique Visitors Per Day Pane (wrong values):
image

CUM. T.S\MAX. T.S values looking odd to me

Time Distribution Pane (15h is ok, 16h is wrong. 16h values received in 11 minutes)
image

@H1r0Sh1mA
Copy link
Author

Update

Time Distribution Pane:

Hours from different days overlap each other:
Say - Yesterday 13th hour has 513k hits.
When 13th hour ticks today, then its value starting to add to yesterdays 13th hour (513k for yesterday + XXXk for today)

@allinurl
Copy link
Owner

@H1r0Sh1mA, just to clarify, are you encountering two separate issues? One regarding the real-time problem with Docker and the other regarding the accuracy of the stats numbers? If so, I suggest opening a new issue for the stats so we can address them separately.

Regarding the discrepancy with the cum/max/avg time served, it seems like there might be an issue with your log format. Could you share the actual lines from your access log? That would help us investigate further. Thanks

@H1r0Sh1mA
Copy link
Author

H1r0Sh1mA commented Feb 17, 2024

Could you share the actual lines from your access log? That would help us investigate further.

Feb 17 12:00:04 nginx_hostname nginx_container[13971]: my.domain 109.124.233.55 - - [17/Feb/2024:12:00:04 +0000] "POST /orders/event HTTP/1.1" 200 0 "-" "-" "0.001" "10.0.4.163:80" "0.002"

the log format arguments:

--log-format='%^ %^ %^ %^ %^ %v %h %^ %^[%d:%t %^] \"%r\" %s %b \"%R\" \"%u\" \"%^\" \"%^\" \"%T\"'

escaping trails (like: \"%r\" ) are using due to the docker-compose format style. Now im issuing it like this:

   command: [
      "/app/logs/accesst.log",
      "-o", "/app/report/access.html",
      "--jobs=8",
      "--keep-last=3",
      "--real-time-html",
      "--origin=https://my.domain",
      "--ws-url=wss://my.domain/ws",
      "--log-format='%^ %^ %^ %^ %^ %v %h %^ %^[%d:%t %^] \"%r\" %s %b \"%R\" \"%u\" \"%^\" \"%^\" \"%T\"'",
      "--time-format=%H:%M:%S",
      "--date-format=%d/%b/%Y",
      "--date-spec=min",
      "--db-path=/app/internal_db",
      "--geoip-database=/app/geo_db/GeoIP2-Country.mmdb",
      "--geoip-database=/app/geo_db/GeoLite2-ASN.mmdb",
      "--ignore-panel=OS",
      "--ignore-panel=BROWSERS",
      "--persist",
      "--restore"
    ]

Didnt have a chance yet to test --datetime-format='%d/%b/%Y:%H:%M:%S %z' instead of --time-format=%H:%M:%S and --date-format=%d/%b/%Y

just to clarify, are you encountering two separate issues? One regarding the real-time problem with Docker and the other regarding the accuracy of the stats numbers?

I am experiencing all these issues in docker environment with real-time parsing mode.
That why i have put them all here.

To avoid confusion i will open issues one by one separately.

@H1r0Sh1mA
Copy link
Author

no updates so far?

@allinurl
Copy link
Owner

allinurl commented Mar 27, 2024

Some questions here...

  1. It seems like there might be an issue with your logrotate configuration. Can you verify how it's performing?

  2. Can you also share a simple way to reproduce this issue? I need the most basic command that triggers this issue so that we can narrow down the parameters that might not be causing it. e.g.,

 /app/logs/access.log
      -o /app/report/access.log.html
      --real-time-html
      --log-format='%^ %^ %^ %^ %^ %^ %v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%h:%^" "%T"'
      --time-format='%T'
      --date-format='%d/%b/%Y'
  1. Also, have you tried running it as posted on the README.md and see if that also causes issues?
    tail -F access.log | docker run -p 7890:7890 --rm -i -e LANG=$LANG allinurl/goaccess -a -o html --log-format COMBINED --real-time-html - > report.html
  1. Can you replicate this issue on another machine that also performs log rotation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants