Consumer not receiving messages when power off and restart, consumer's ack floor is ahead of stream's last sequence #5412

Open
stitchcula opened this issue May 13, 2024 · 12 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@stitchcula commented May 13, 2024

Observed behavior

When the server is unexpectedly powered off and then restarted, all consumers are unable to self-recover and fail to receive messages. The log is as follows:

[1022] 2024/05/11 16:31:07.334599 [INF] Starting nats-server
[1022] 2024/05/11 16:31:07.382891 [INF]   Version:  2.10.14
[1022] 2024/05/11 16:31:07.383039 [INF]   Git:      [31af767]
[1022] 2024/05/11 16:31:07.383055 [DBG]   Go build: go1.21.9
[1022] 2024/05/11 16:31:07.383112 [INF]   Name:     NCFGS7EGBHTKZ46NGOLKN365O3RGONUATABFDU3MDRJOOWGBOEKH3LQ4
[1022] 2024/05/11 16:31:07.383150 [INF]   Node:     jAsr7q4d
[1022] 2024/05/11 16:31:07.383172 [INF]   ID:       NCFGS7EGBHTKZ46NGOLKN365O3RGONUATABFDU3MDRJOOWGBOEKH3LQ4
[1022] 2024/05/11 16:31:07.383248 [WRN] Plaintext passwords detected, use nkeys or bcrypt
[1022] 2024/05/11 16:31:07.383363 [INF] Using configuration file: /etc/nats/simpleJetStream.conf
[1022] 2024/05/11 16:31:07.383430 [WRN] Maximum payloads over 8.00 MB are generally discouraged and could lead to poor performance
[1022] 2024/05/11 16:31:07.513725 [INF] Starting http monitor on 0.0.0.0:8422
[1022] 2024/05/11 16:31:07.514385 [INF] Starting JetStream
[1022] 2024/05/11 16:31:07.553176 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[1022] 2024/05/11 16:31:07.553260 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[1022] 2024/05/11 16:31:07.553283 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[1022] 2024/05/11 16:31:07.553303 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[1022] 2024/05/11 16:31:07.553323 [INF] 
[1022] 2024/05/11 16:31:07.553342 [INF]          https://docs.nats.io/jetstream
[1022] 2024/05/11 16:31:07.553361 [INF] 
[1022] 2024/05/11 16:31:07.553381 [INF] ---------------- JETSTREAM ----------------
[1022] 2024/05/11 16:31:07.553433 [INF]   Max Memory:      3.73 GB
[1022] 2024/05/11 16:31:07.553464 [INF]   Max Storage:     93.13 GB
[1022] 2024/05/11 16:31:07.553488 [INF]   Store Directory: "/opt/data/nats/jetstream"
[1022] 2024/05/11 16:31:07.553510 [INF] -------------------------------------------
[1022] 2024/05/11 16:31:08.304405 [INF]   Starting restore for stream '$G > XANOLE_OUTPUT_VALUES'
[1022] 2024/05/11 16:31:08.403004 [WRN] Filestore [XANOLE_OUTPUT_VALUES] Stream state detected prior state, could not locate msg block 5054
[1022] 2024/05/11 16:31:08.405046 [INF]   Restored 0 messages for stream '$G > XANOLE_OUTPUT_VALUES' in 101ms
[1022] 2024/05/11 16:31:11.701798 [INF]   Recovering 2 consumers for stream - '$G > XANOLE_OUTPUT_VALUES'
[1022] 2024/05/11 16:31:13.597179 [WRN] Detected consumer '$G > XANOLE_OUTPUT_VALUES > consumer_default' ack floor 14084 is ahead of stream's last sequence 0
[1022] 2024/05/11 16:31:13.597335 [WRN] Detected consumer '$G > XANOLE_OUTPUT_VALUES > consumer_401899c7-a201-42b4-9953-e59d4b427d43' ack floor 14084 is ahead of stream's last sequence 0

After the restart, the situation described in this issue reproduces almost 100% of the time: the consumer's ack floor is greater than the stream's last sequence, leaving all consumers unable to consume.

Note that:

[WRN] Filestore [XANOLE_OUTPUT_VALUES] Stream state detected prior state, could not locate msg block 5054
...
[WRN] Detected consumer '$G > XANOLE_OUTPUT_VALUES > consumer_default' ack floor 14084 is ahead of stream's last sequence 0

Block 5054 appears to be the last blk index recorded in index.db, but the corresponding block cannot be found on disk. This inconsistency may be caused by index.db and the blk files not being synced to disk atomically. When func (fs *fileStore) recoverFullState() (rerr error) in server/filestore.go cannot find the corresponding [index].blk, it falls back to setting the stream's LastSeq to 0. The consumer's metadata is persisted independently, so when func (o *consumer) checkStateForInterestStream() error in server/consumer.go reads it and finds the recorded AckFloor higher than the stream's actual LastSeq, it reports the warning above.
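
For illustration, here is a minimal sketch of the shape of that check; the types and the function name below are simplified stand-ins, not the actual nats-server code:

package ackcheck

import "fmt"

// streamState and consumerState are illustrative stand-ins for the recovered
// state described above, not the real filestore structures.
type streamState struct {
    LastSeq uint64 // falls back to 0 when the recorded msg block is missing
}

type consumerState struct {
    AckFloorStreamSeq uint64 // persisted independently of the stream state
}

// checkAckFloor mirrors the consistency check: if recovery lost the stream's
// tail, the consumer's ack floor can point past the stream's last sequence,
// and the consumer stalls with the warning seen in the log above.
func checkAckFloor(cs consumerState, ss streamState) error {
    if cs.AckFloorStreamSeq > ss.LastSeq {
        return fmt.Errorf("consumer ack floor %d is ahead of stream's last sequence %d",
            cs.AckFloorStreamSeq, ss.LastSeq)
    }
    return nil
}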

By the way, I've noticed that the filestore's persistence and recovery mechanisms vary slightly across server versions.

I tested various sync intervals, including sync: always, but none of them seems to have much effect:

jetstream {
    store_dir: /userdata/nats
    max_mem: 100M
    max_file: 500M
    sync: always # 1s to 2min
}

The related issue is #4566. I believe this is a server-side problem and should not be circumvented by client-side operations like deleting and re-adding consumers.

This issue is beyond my ability to fix quickly. I earnestly seek assistance. @derekcollison @bruth

If you need more information or logs, please feel free to contact me, and I will provide them as soon as possible.

Expected behavior

After the server unexpectedly powers off and restarts, all consumers should be able to self-recover and messages should be consumed normally.
Some degree of data loss is acceptable.

Server and client version

I conducted power-off tests on the following server versions, and all exhibited the issue:

  • v2.10.6
  • v2.10.11
  • v2.10.14
  • v2.10.15
  • v2.11.0-preview.2
  • main (#31b155b)

All client versions have been cross-tested:

  • v1.30.0
  • v1.30.2
  • v1.34.1

Host environment

I have used multiple servers for testing, with various file systems (xfs, ext4) and types of disks (nano flash, SSD, mechanical hard disk), to rule out randomness.

For example:

Linux XXX 5.10.110 #3 SMP Tue Oct 24 15:37:17 CST 2023 aarch64 GNU/Linux, ext4, nano flash
Linux XXX 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 GNU/Linux, xfs, SSD
Linux XXX 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 GNU/Linux, xfs, HDD

Steps to reproduce

  • Use a single-node NATS JetStream instance running on a standalone server. The server configuration is as follows:
jetstream {
    store_dir: /userdata/nats
    max_mem: 100M
    max_file: 500M
    sync: always # 1s to 2min
}
  • Create the stream and consumer (a rough client-side sketch of this step is shown after this list). The consumer configuration is as follows:
Information for Consumer XANOLE_OUTPUT_VALUES > consumer_default created 2024-05-11T16:03:32+08:00

Configuration:

                    Name: consumer_default
        Delivery Subject: _INBOX.YOhDaj4BWhBtA4ueOQxq0x
          Filter Subject: xanole.outputs.xxx>
          Deliver Policy: All
     Deliver Queue Group: consumer_default
              Ack Policy: Explicit
                Ack Wait: 30.00s
           Replay Policy: Instant
         Max Ack Pending: 1,000
            Flow Control: false

State:

  Last Delivered Message: Consumer sequence: 1,074,833 Stream sequence: 1,074,913 Last delivery: 423µs ago
    Acknowledgment Floor: Consumer sequence: 1,074,832 Stream sequence: 1,074,912 Last Ack: 2ms ago
        Outstanding Acks: 1 out of maximum 1,000
    Redelivered Messages: 0
    Unprocessed Messages: 0
         Active Interest: Active using Queue Group xhouse_inputs_agents_default
  • After a period of normal production and consumption, cut the server's power supply directly and then restart it; the issue reproduces almost 100% of the time.

  • Using reboot or kill -9 does not seem to reproduce the issue.
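
For reference, a rough nats.go sketch of how the stream and the push consumer above could be created; the stream name and consumer settings mirror the info block, while the subjects and the use of a durable name are assumptions (the report redacts the exact filter subject):

package main

import (
    "time"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        panic(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        panic(err)
    }

    // File-backed stream; the subject is a placeholder standing in for the
    // redacted subjects in the report.
    if _, err := js.AddStream(&nats.StreamConfig{
        Name:     "XANOLE_OUTPUT_VALUES",
        Subjects: []string{"xanole.outputs.>"},
        Storage:  nats.FileStorage,
    }); err != nil {
        panic(err)
    }

    // Push consumer with explicit acks and a deliver queue group, matching
    // the consumer info above; treating it as a durable is an assumption.
    if _, err := js.AddConsumer("XANOLE_OUTPUT_VALUES", &nats.ConsumerConfig{
        Durable:        "consumer_default",
        DeliverSubject: nats.NewInbox(),
        DeliverGroup:   "consumer_default",
        FilterSubject:  "xanole.outputs.>", // placeholder for the redacted filter
        DeliverPolicy:  nats.DeliverAllPolicy,
        AckPolicy:      nats.AckExplicitPolicy,
        AckWait:        30 * time.Second,
        ReplayPolicy:   nats.ReplayInstantPolicy,
        MaxAckPending:  1000,
    }); err != nil {
        panic(err)
    }
}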

@stitchcula added the defect (Suspected defect such as a bug or regression) label on May 13, 2024
@derekcollison (Member)

After cutting the power, could you snapshot the store directory and share it with us privately, after the reboot but before you restart the nats-server?

Also, what size payloads have you set the system to use? I see it is complaining that they are over 8 MB, which we strongly discourage.

@derekcollison (Member)

Or provide more information: how many messages are inbound to the WQ, how big are they, and how are the consumers operating? Does the stream usually hold messages, or do the consumers usually consume messages quickly and keep the count very low or at zero?

@bfoxstudio (Contributor)

Same problem in our KV tests when the container is restarted by draining the Docker node or rebooting the VM; see #5205.

@yuzhou-nj

I have encountered this problem too. The consumer's sequence exceeds the stream's, so I cannot get new messages from the stream until the stream's sequence catches up with the consumer's.

@derekcollison (Member)

We are trying to reproduce this, so any and all information on how you are interacting with the stream is helpful.

How fast are messages being sent into the system?
How fast are they being consumed?

In general, is the stream near empty or at empty most of the time?

@katrinwab

In our case, 1 msg per second was sent.
The cluster has 5 nodes.

We drained the nodes holding the meta leader and the stream leader, waited 2 minutes, then activated the nodes again.
After all that, the consumer didn't see all messages (some messages were being pulled, but jetstream_consumer_num_pending kept increasing).

@katrinwab

We also have problems when we drain a non-leader node.
Afterwards, jetstream_stream_last_seq on the drained node is less than on the non-drained nodes, and that difference increases over time.

@wallyqs (Member) commented May 14, 2024

Thanks everyone for sharing the symptoms from this issue; we're currently looking for a reproducible example that trips this condition.
@stitchcula I see in the logs that you have a warning about maximum payloads over 8.00 MB. Do you have an idea of the size of the messages you are sending, or whether persisted messages are close to 8 MB?

@bfoxstudio (Contributor)

> Thanks everyone for sharing the symptoms from this issue; we're currently looking for a reproducible example that trips this condition. @stitchcula I see in the logs that you have a warning about maximum payloads over 8.00 MB. Do you have an idea of the size of the messages you are sending, or whether persisted messages are close to 8 MB?

We have standard 1 MB payloads. We tested versions from master back to 2.10.5, and whenever the Docker server is restarted, the consumer problem and the last-sequence problem reproduce.

@stitchcula (Author) commented May 15, 2024

In my testing, the payload size is around 50 KB, with a consumption rate of 100 messages per second. The consumers usually consume messages pretty quickly and keep the number of pending messages very low.

To avoid incidents in production, I temporarily forked a branch and made an aggressive fix: if consumer initialization finds the ack floor higher than the stream's lastSeq, it forcibly resets the ack floor to lastSeq.

This allows the consumer to consume normally, but data loss occurs. I believe the fundamental issue is the stream's persistence mechanism: it should be improved so that a power outage does not lose the cursor.
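
The workaround amounts to something like the sketch below; this is only an illustration of the idea, not the actual patch from my fork:

// clampAckFloor illustrates the aggressive fix: if recovery left the
// consumer's ack floor past the stream's last sequence, pull it back to
// lastSeq so delivery can resume, accepting that the cursor position (and
// therefore some data) is lost.
func clampAckFloor(ackFloorStreamSeq, streamLastSeq uint64) uint64 {
    if ackFloorStreamSeq > streamLastSeq {
        return streamLastSeq
    }
    return ackFloorStreamSeq
}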

@stitchcula (Author)

> snapshot the store directory and share with us privately after reboot but before you restart the nats-server

I have already taken a snapshot of the store directory after rebooting but before restarting the nats-server. How can I share it with you?

Thanks for the hard work!

@wallyqs (Member) commented May 15, 2024

@stitchcula can you send it to wally@nats.io, if possible?
