Describe the bug
When running as a DaemonSet in an EKS cluster and tailing containerd logs, Fluent Bit occasionally corrupts the time field of a record, leaving the chunk file stuck in the tail.0/ storage directory and unable to be flushed to the OUTPUT.
To Reproduce
Start Fluent Bit as a DaemonSet on Kubernetes.
Use the tail plugin to collect container logs.
Please refer to the configs in the Configuration section below.
Contents of the chunk file when stuck:
```
/var/log/flb-storage/tail.0/1-1715689020.629303639.flb
config.seen<D9>#2024-05-11T08:42:31.887233529+01:00<BC>cni.projectcalico.org/podIPs<B2>a-random-ip<A8>pod _name<B9>opensearch-cluster-data-2<A6>labels<8B><BA>app.kubernetes.io/instance<AF>opensearch-data<B6>app.kubernetes.io/name<AA>opensearch<D9> app.kubernetes.io/team-component<B7>opensearch-cluster-data<B8>controller-revision-hash<D9>"opensearch-cluster-data-868b795d8d<AD>helm.sh/chart<B1>opensearch-2.17.0<B7>sidecar.istio.io/inject<A5>false<D9>"statefulset.kubernetes.io/pod-name<B9>opensearch-cluster-data-2<A4> team<A4>a-random-team<B9>app.kubernetes.io/version<A6>2.11.1<BB>app.kubernetes.io/component<B7> opensearch-cluster-data<BC>app.kubernetes.io/managed-by<A4>Helm<AE> namespace_name<AA>opensearch<AF> container_image<D9>random-container-image-name<A6> pod_id<D9>$d641198c-6c29-4744-b20a-21a828f62f9b<A4>time<B4>14:42.26019153+01:00<AF>es_index_prefix<BE>a-random-index-prefix<A2>_p<A1>F<B4>kubernetes_namespace<82>
```
The above is just a snippet of the whole file, and some fields were amended to protect the data.
Pay close attention to the time field.
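For anyone else triaging a stuck chunk, the time field is easy to locate in the raw bytes: in the dump above, <A4> is the msgpack fixstr header for the 4-byte key time, and <B4> is the fixstr header for the 20-byte corrupted value that follows it (an intact timestamp like the one at the start of the chunk is 35 bytes and uses a str8 header, <D9> followed by # = 0x23). Below is a minimal diagnostic sketch along those lines; it only handles the fixstr and str8 encodings and is not a full .flb chunk parser:

```python
#!/usr/bin/env python3
"""Scan a Fluent Bit .flb chunk for msgpack-encoded "time" values.

Rough diagnostic sketch, not a chunk-format parser: it looks for the
fixstr-encoded key "time" (0xa4 't' 'i' 'm' 'e') in the raw bytes and
decodes the string value that immediately follows it.
"""
import sys

FIXSTR_KEY = b"\xa4time"  # msgpack fixstr header (0xa0 | len=4) + "time"

def scan(path: str) -> None:
    data = open(path, "rb").read()
    pos = 0
    while (pos := data.find(FIXSTR_KEY, pos)) != -1:
        pos += len(FIXSTR_KEY)
        if pos >= len(data):
            break
        head = data[pos]
        if 0xA0 <= head <= 0xBF:           # fixstr: low 5 bits are the length
            length, start = head & 0x1F, pos + 1
        elif head == 0xD9:                 # str8: next byte is the length
            length, start = data[pos + 1], pos + 2
        else:
            continue                       # "time" key with a non-string value
        value = data[start:start + length].decode("utf-8", "replace")
        # An intact value looks like 2024-05-11T08:42:31.887233529+01:00;
        # a corrupted one is missing the leading date portion.
        flag = "" if value[:4].isdigit() else "   <-- suspicious"
        print(f"offset {pos}: {value}{flag}")

if __name__ == "__main__":
    scan(sys.argv[1])
```

Running it as python3 scan_chunk.py /var/log/flb-storage/tail.0/1-1715689020.629303639.flb should flag the truncated 14:42.26019153+01:00 value while leaving intact timestamps unflagged.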
Error message from the OUTPUT:
```
2024-05-15 01:13:43 +0000 [error]: #1 failed to process request error_class=Fluent::Plugin::Parser::ParserError error="invalid time format: value = 14:42.26019153+01:00, error_class = ArgumentError, error = invalid xmlschema format: \"14:42.26019153+01:00\""
```
The original log file from which this entry was collected did not exhibit this truncation at the beginning of the datetime string.
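For context on the failure: the corrupted value looks like the tail end of an RFC 3339 timestamp whose leading date and hour bytes were lost, so it can no longer match the %Y-%m-%dT%H:%M:%S.%L shape that the docker_no_time parser (and the downstream xmlschema parse in Fluentd) expect. A quick illustrative check, using a regex that approximates that shape rather than Fluentd's exact validation:

```python
import re

# Rough RFC 3339 shape the docker_no_time parser expects
# (illustrative approximation, not Fluentd's exact check).
RFC3339 = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+(?:Z|[+-]\d{2}:\d{2})$"
)

for value in (
    "2024-05-11T08:42:31.887233529+01:00",  # intact timestamp from the chunk
    "14:42.26019153+01:00",                 # corrupted value from the chunk
):
    print(value, "->", "ok" if RFC3339.match(value) else "invalid time format")
```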
Expected behavior
The log files should be parsed and chunked correctly, as millions of others are.
Screenshots
N/A
Your Environment
Version used: 3.0.3
Configuration:
```yaml
custom_parsers.conf: |
  [PARSER]
      Name        docker_no_time
      Format      json
      Time_Keep   Off
      Time_Key    time
      Time_Format %Y-%m-%dT%H:%M:%S.%L
fluent-bit.conf: |
  [SERVICE]
      Daemon                              Off
      Flush                               1
      Log_Level                           error
      Parsers_File                        /fluent-bit/etc/parsers.conf
      Parsers_File                        /fluent-bit/etc/conf/custom_parsers.conf
      HTTP_Server                         On
      HTTP_Listen                         0.0.0.0
      HTTP_Port                           2020
      Health_Check                        On
      scheduler.cap                       300
      storage.path                        /var/log/flb-storage/
      storage.max_chunks_up               128
      storage.sync                        full
      storage.backlog.mem_limit           5M
      storage.delete_irrecoverable_chunks on

  [INPUT]
      Name                              tail
      Path                              /var/log/containers/*.log
      multiline.parser                  cri
      Tag                               kube.*
      Skip_Long_Lines                   On
      Skip_Empty_Lines                  On
      Buffer_Chunk_Size                 64KB
      Buffer_Max_Size                   128KB
      DB                                /var/log/flb-storage/containers.db
      storage.type                      filesystem
      storage.pause_on_chunks_overlimit on

  [INPUT]
      Name                              systemd
      Tag                               host.*
      Systemd_Filter                    _SYSTEMD_UNIT=kubelet.service
      Systemd_Filter                    _SYSTEMD_UNIT=docker.service
      Systemd_Filter                    _SYSTEMD_UNIT=containerd.service
      DB                                /var/log/flb-storage/systemd.db
      Read_From_Tail                    On
      storage.type                      filesystem
      storage.pause_on_chunks_overlimit on

  [FILTER]
      Name             kubernetes
      Match            kube.*
      Kube_URL         https://kubernetes.default.svc.cluster.local:443
      Kube_CA_File     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      Kube_Token_File  /var/run/secrets/kubernetes.io/serviceaccount/token
      Kube_Tag_Prefix  kube.var.log.containers.
      Merge_Log        On
      Labels           On
      Annotations      On
      Buffer_Size      1MB
      Use_Kubelet      On
      namespace_labels On

  [FILTER]
      Name         modify
      Match        host.*
      Rename       _HOSTNAME hostname
      Rename       _SYSTEMD_UNIT systemd_unit
      Rename       MESSAGE log
      Remove_regex ^((?!hostname|systemd_unit|log).)*$

  [FILTER]
      Name         aws
      Match        host.*
      imds_version v2

  [FILTER]
      Name  modify
      Match *
      Add   environment_name env-name
      Add   cluster_name cluster-name

  [FILTER]
      Name   lua
      Match  *
      script /fluent-bit/scripts/index_name_filter.lua
      call   index_name

  [OUTPUT]
      Name                     http
      Alias                    an-alias-name
      Match                    *
      Host                     a-host-name.com
      Port                     443
      http_User                ${FLUENTD_USER}
      http_Passwd              ${FLUENTD_PASSWORD}
      URI                      /a-given-tag
      Format                   json
      header                   User-Agent a-user-agent
      header_tag               FLUENT-TAG
      json_date_format         iso8601
      tls                      on
      tls.verify               off
      compress                 gzip
      Retry_Limit              no_limits
      net.dns.resolver         async
      log_suppress_interval    10s
      storage.total_limit_size 500M
      Log_Level                error
```
Environment name and version: Kubernetes 1.27.12
Server type and version: Running as docker images on Kubernetes (fluent-bit:3.0.3-debug)
Operating System and version: Linux
Filters and plugins: tail, systemd, kubernetes, modify, http
Additional context
This means that affected log entries get stuck and are never processed and indexed as they should be.
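One way to at least detect the condition is to watch for chunks that linger in the tail.0/ directory. A hypothetical monitoring sketch, not part of the original report; the directory follows from the storage.path setting above, and the 10-minute threshold is an arbitrary assumption for "stuck":

```python
import os
import time

# Path from the storage.path setting plus the tail input's subdirectory;
# the 10-minute threshold is an arbitrary assumption for "stuck".
CHUNK_DIR = "/var/log/flb-storage/tail.0"
MAX_AGE_SECONDS = 600

now = time.time()
for entry in os.scandir(CHUNK_DIR):
    age = now - entry.stat().st_mtime
    if entry.name.endswith(".flb") and age > MAX_AGE_SECONDS:
        # With Flush 1, a healthy chunk should be flushed within seconds,
        # so anything this old is likely blocked (e.g., by a corrupted
        # time field as described above).
        print(f"possibly stuck chunk: {entry.path} (age {int(age)}s)")
```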
Potentially related issues:
#8413 #8718 #8798 #5217