Introduce fatlog to record big response packets #336

Open: wants to merge 2 commits into base branch unstable.


CharlesChen888 (Member) commented Apr 19, 2024

Why fatlog?

We already have slowlog, which keeps a record of recent queries that take a relatively long time to execute. However, sometimes that is not enough for debugging.

For example, we often receive questions from users like: "Why is my QPS not high, but the network flow is substantial?" or "Why did the network flow suddenly increase?"

This PR introduces a new log, which we call "fatlog", that records each query that triggers a large response. Each entry stores the command, response size, timestamp, client name and client network address. This helps to monitor traffic spikes and to find big keys.

Alternative solution?

  • Slowlog: a command with a big response packet may not be slow enough to trigger slowlog.
  • Big keys: we can detect big keys, but we cannot tell whether they were accessed at a given time, and accessing a big key does not necessarily generate a large response.
  • Client traffic statistics: these show which clients generate traffic, but we still need to locate the exact commands.

Detailed introduction

Just like slowlog, fatlog is not written to a file; it is kept in memory and accessed through a series of commands:

FATLOG GET [count]
FATLOG LEN
FATLOG RESET

For example:

127.0.0.1:6379> config set fatlog-log-bigger-than 10
OK
127.0.0.1:6379> set a 1234567890
OK
127.0.0.1:6379> get a
"1234567890"
127.0.0.1:6379> fatlog get
1) 1) (integer) 3
   2) (integer) 1713516667
   3) (integer) 17
   4) 1) "get"
      2) "a"
   5) "127.0.0.1:62969"
   6) ""

A help command is also provided:

FATLOG HELP

As with slowlog, two config options control how many entries are kept and the response size above which a command is logged:

fatlog-max-len (default value: 128)
fatlog-log-bigger-than (default value: 16kb)

A new command flag is also added (currently used only by EXEC) to exclude a command from fatlog:

#define CMD_SKIP_FATLOG (1ULL<<29)

About the implementation

You will find that a lot of changes have been applied to slowlog.c. That is because fatlog largely reuses the slowlog code, since both logs have a similar entry format and are controlled by similar commands. So what used to be slowlog.c now contains the code for both slowlog and fatlog. I tried to think of a name meaning "both slow and fat", and "heavy load" seems to cover both situations.

Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>
madolson (Member) commented:

I'm not sure about this specific feature. At least from what I've seen, it's less common for a single command to produce an unexpectedly large result without it also showing up in the slowlog.

CharlesChen888 (Member, Author) commented Apr 22, 2024

> At least from what I've seen, it's less common for a single command to produce an unexpectedly large result without it also showing up in the slowlog.

This does happen. Some of our users are very sensitive to network flow spikes, and they couldn't find any clue in slowlog, so they asked us about it.

Slowlog only records command execution time. But a hash lookup plus a memory copy (a simple GET, even one with a 1MB response) can be very fast compared to the default slowlog threshold of 10ms.
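A rough order-of-magnitude estimate supports this. Assuming a memory-copy bandwidth of about 10 GB/s (an assumption; real figures vary by hardware and ignore the hash lookup, which is cheap by comparison), copying a 1MB reply takes about 0.1 ms, two orders of magnitude below the default slowlog-log-slower-than of 10000 microseconds:

```latex
t_{\text{copy}} \approx \frac{1\ \text{MB}}{10\ \text{GB/s}}
              = \frac{10^{6}\ \text{B}}{10^{10}\ \text{B/s}}
              = 10^{-4}\ \text{s} = 0.1\ \text{ms}
              \ll 10\ \text{ms}
```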

hwware (Member) commented Apr 24, 2024

@CharlesChen888 Before I review your PR, could you please provide some background on why you renamed slowlogCommand to heavyLoadLogCommand? The top comment says you want to add new commands for fatlog, but the PR also changes the slowlog files, which confuses me.

CharlesChen888 (Member, Author) commented:

@hwware This is because fatlog shares a lot of code with slowlog, so the function that used to be called slowlogCommand now implements both the SLOWLOG and FATLOG commands. I tried to think of a name meaning "both slow and fat", and "heavy load" seems to cover both situations.

madolson (Member) commented May 6, 2024

> This does happen. Some of our users are very sensitive to network flow spikes, and they couldn't find any clue in slowlog, so they asked us about it.

Can you be more precise? People ask us about issues at AWS too, but we've always been able to debug this type of problem without what you propose. What do you mean when you say they are sensitive to "network flow spikes"? We've seen many more issues caused by a sudden burst of smaller commands than by this kind of "single large packet" issue.

Labels: major-decision-pending (Needs decision by core team)

3 participants