Fluentd logs showing SIGKILL issue while tailing a log file that is updated almost every second #4304
-
It doesn't reproduce in my environment.
Config:

```
<source>
  @type tail
  tag test
  path /test/fluentd/input/test.log*
  pos_file /test/fluentd/pos/pos
  refresh_interval 1s
  <parse>
    @type none
  </parse>
</source>
<match test.**>
  @type stdout
</match>
```

Add a dummy log every 0.3 s:

```
$ while true; do echo "foo" >> test.log; sleep 0.3; done
```

Then, SIGKILL doesn't occur.
-
Thanks for your reply.
When I check the memory consumption of the fluentd worker process during this time, its memory usage is really not high, so I'm unsure how it can lead to the SIGKILL issue.
-
Thanks for your report.
Can you also check the memory consumption of the entire system (or the container)?
-
I monitored the `kubectl top pod` output until the SIGKILL happened, and the memory usage is very normal (~600 MB) while the memory limit given to the pod is 4 GB. One query: do we need to provide any extra configuration/settings for Fluentd in a situation where the service keeps the log file open the entire time and continues to write to the open file descriptor every few milliseconds?
-
Hmm, it may be something wrong with the disk, CPU, or some other resource, not the memory...
If the cause is the load of tracing such a fast file update, setting `enable_stat_watcher` to `false` (so in_tail relies on its watch timer instead of inotify) may improve it:

```
<source>
  @type tail
  @log_level debug
  path /data0/podman/storage/overlay-containers/*/userdata/ctr.log
  pos_file /var/log/td-agent/podman.pos
  refresh_interval 1s
  tag kubernetes.podman.*
  enable_stat_watcher false
  <parse>
    @type cri
  </parse>
</source>
```
-
On the system, when I run the `top` Linux command, I see a few processes along with Ruby consuming more than 100% CPU (~150% to ~200% CPU usage). Could that be a cause?
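To narrow down which process or thread is actually busy, a couple of standard tools can help (a sketch, not from the thread; `<pid>` stands for the fluentd worker PID, and `pidstat` assumes the sysstat package is installed):

```
# Per-thread CPU usage of the worker process
$ top -H -p <pid>

# Sample per-thread CPU once per second
$ pidstat -t -p <pid> 1
```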
-
I think we should consider the possibility that it is the cause.
Does this mean that once the CPU shortage was resolved, this issue no longer occurred?
-
Thanks for the quick response. The `top` command was run on the node where Fluentd is running, not inside the Fluentd container, and that shows more than 100% CPU utilization. Can we consider this resource utilization? Do we have any flag or configuration that can be given to Fluentd to show more detailed logs about which exact event is causing the SIGKILL issue?
-
This will reduce the load of Fluentd, right? The system may SIGKILL highly loaded processes when resources are not enough. I'm not familiar with the SIGKILL mechanism, but I think you should resolve the resource issue first.
Sorry, I'm not familiar with k8s.
I think not; it is not a Fluentd issue, since the SIGKILL is sent from outside Fluentd.
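One common sender of SIGKILL is the kernel OOM killer, and it usually leaves a trace in the kernel log. A minimal check on the node (not mentioned in the thread; exact message formats vary by kernel) could be:

```
# Look for OOM-killer activity around the time of the SIGKILL
$ dmesg -T | grep -i -E "out of memory|oom|killed process"

# On systemd-based nodes the kernel journal works too
$ journalctl -k | grep -i oom
```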
-
When I look at the Fluentd debug-level logs, I see the SIGKILL happening just after tailing the log file from the source where it's getting updated every few milliseconds. At this time the CPU and memory given to the Fluentd pod were quite high (CPU: 7 cores, memory: 10 GB).
-
I moved this to Discussion Q&A because, at this point, we cannot judge whether this is a Fluentd bug or not (see the contributing guidelines).
-
Note: additional info #4306
-
#3614: as I see here, the pos file having two entries may be causing Fluentd not to tail the latest log file after log rotation happens.
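For illustration only (the container ID, offsets, and inode values below are made up): an in_tail pos file holds one tab-separated entry of path, byte offset, and inode per watched file, so a stale duplicate entry for a rotated path would look roughly like this:

```
/data0/podman/storage/overlay-containers/abc123/userdata/ctr.log	00000000004f1a2b	0000000000a1b2c3
/data0/podman/storage/overlay-containers/abc123/userdata/ctr.log	0000000000000000	0000000000a1b2c4
```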
-
hi @daipom, I see the SIGKILL in a pattern: after every log rotation (every 1 hr) of the source ctr.log file, within a 10-minute interval we always see the SIGKILL signal in the Fluentd pod logs. Below are the trace logs; every time we see SIGKILL, this is the log pattern in Fluentd. When we remove this source from Fluentd, I don't see the SIGKILL issue.
-
OK, I'll try with `refresh_interval 60s` and update.
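For reference, a sketch of that change applied to the source above (other parameters unchanged; `refresh_interval` only controls how often the wildcard path is rescanned for new files):

```
<source>
  @type tail
  path /data0/podman/storage/overlay-containers/*/userdata/ctr.log
  pos_file /var/log/td-agent/podman.pos
  refresh_interval 60s   # was 1s; rescan the glob once a minute instead
  enable_stat_watcher false
  tag kubernetes.podman.*
  <parse>
    @type cri
  </parse>
</source>
```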
-
hi @daipom
Fluentd keeps tailing the ctr.log file, and no issues are seen:

```
{"log":{"message":"following tail of /data0/podman/storage/overlay-containers/2f0b40af4be32c9e4ef3983e370fa5d2996f93442f7316c44dbc66999983e93a/userdata/ctr.log"},"extension":{"worker_id":0},"type":"log","level":"info","timezone":"xx","system":"xx","systemid":"xx","host":"xx","time":"2023-09-25T11:13:15+0300"}
```

When, for the first time, the ctr.log file rotation happens and the tail starts on the new ctr.log file after rotation, the next log shows the SIGKILL.
-
hi @daipom, when I stop the log rotation mechanism, I never see the SIGKILL issue and Fluentd runs successfully.
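Since rotation seems implicated, in_tail's rotation-related parameters may be worth trying; this is a hedged sketch, not a confirmed fix (`follow_inodes` is generally recommended with wildcard paths to avoid duplicate or missed reads across rotations):

```
<source>
  @type tail
  path /data0/podman/storage/overlay-containers/*/userdata/ctr.log
  pos_file /var/log/td-agent/podman.pos
  tag kubernetes.podman.*
  follow_inodes true   # track files by inode so rotated files are handled correctly
  rotate_wait 5        # keep reading a rotated file for 5s before closing it
  refresh_interval 60s
  <parse>
    @type cri
  </parse>
</source>
```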
-
Sorry for the delay.
-
Describe the bug
Version used: td-agent 4.4.2 / fluentd 1.15.3
Fluentd configuration used: the source log ctr.log comes from podman services, and this log file is updated more often than once per second. When Fluentd tails this file, a SIGKILL occurs:

```
{"log":{"message":"following tail of /xxx/podman/storage/overlay-containers//userdata/ctr.log"},"extension":{"worker_id":0},"type":"log","level":"info","timezone":"xxx/xxx","system":"BSSC","systemid":"035d867d3f4641bc89b5e2858f0116a9","host":"bcmt-fluentd-worker-bssc-fluentd-daemonset-86pfk.ncms","time":"2023-09-14T08:11:02+0300"}
{"time":"2023-09-14T08:23:00+0300","level":"error","message":"Worker 0 finished unexpectedly with signal SIGKILL"}
```

When debug logs are enabled, it shows the following:

```
{"log":{"message":"tailing paths: target = /xxx/podman/storage/overlay-containers/060a7faf16e6f7b481e81b214165cda82d16fb7a2dbc16e13c11d41db95969ca/userdata/ctr.log,/data0/podman/storage/overlay-containers/xxx/userdata/ctr.log | existing = /data0/podman/storage/overlay-containers//userdata/ctr.log,/data0/podman/storage/overlay-containers/def7/userdata/ctr.log"},"extension":{"worker_id":0},"type":"log","level":"debug","timezone":"Europe/Helsinki","system":"BSSC","systemid":"035d867d3f4641bc89b5e2858f0116a9","host":"bcmt-fluentd-worker-bssc-fluentd-daemonset-86pfk.ncms","time":"2023-09-14T09:55:29+0300"}
```

To Reproduce
Version used: td-agent 4.4.2 / fluentd 1.15.3
Expected behavior
The SIGKILL shouldn't appear in the Fluentd logs.
Your Environment
Your Configuration
Your Error Log
Additional context
No response