
EC Volume corruption? 500 error from volume servers with needle_read.go:58 entry not found1: offset found id size , expected size 623 #5491

Open
werdnum opened this issue Apr 10, 2024 · 3 comments

Comments

@werdnum
Contributor

werdnum commented Apr 10, 2024

Describe the bug
Some files seem to become unreadable, getting EIO when trying to read them.

This also seems to cause weed backup memory usage to grow without bound unless I limit it in systemd.

Logs are absolutely full of failures to read a certain volume.

System Setup
Kubernetes Helm chart - initially set up a year or two ago, and I've since modified it heavily. Easiest is probably to dump all the Kubernetes configs in YAML format, which I did here...

https://gist.github.com/werdnum/dab4ebfbf7968efd6d64cb94a6f20bc9

Expected behavior
Able to read files

Screenshots
Not applicable, but the logs are absolutely packed with this:

Filer:

[seaweedfs-filer-0] I0410 14:11:53.688292 stream.go:353 read 4416,31f29420acc8ae failed, err: http://seaweedfs-volume-4.seaweedfs-volume:8080/4416,31f29420acc8ae?readDeleted=true: 500 Internal Server Error
[seaweedfs-filer-0] I0410 14:11:53.689917 stream.go:353 read 4416,31f29420acc8ae failed, err: http://seaweedfs-volume-3.seaweedfs-volume:8080/4416,31f29420acc8ae?readDeleted=true: 500 Internal Server Error
[seaweedfs-filer-0] I0410 14:11:53.691530 stream.go:353 read 4416,31f29420acc8ae failed, err: http://seaweedfs-volume-1.seaweedfs-volume:8080/4416,31f29420acc8ae?readDeleted=true: 500 Internal Server Error
[seaweedfs-filer-0] I0410 14:11:53.693570 stream.go:353 read 4416,31f29420acc8ae failed, err: http://seaweedfs-volume-0.seaweedfs-volume:8080/4416,31f29420acc8ae?readDeleted=true: 500 Internal Server Error
[seaweedfs-filer-0] I0410 14:11:53.694584 stream.go:353 read 4416,31f29420acc8ae failed, err: http://seaweedfs-volume-6.seaweedfs-volume:8080/4416,31f29420acc8ae?readDeleted=true: 500 Internal Server Error
[seaweedfs-filer-0] I0410 14:11:53.695231 stream.go:353 read 4416,31f29420acc8ae failed, err: http://seaweedfs-volume-5.seaweedfs-volume:8080/4416,31f29420acc8ae?readDeleted=true: 500 Internal Server Error
[seaweedfs-filer-0] I0410 14:11:53.696541 stream.go:353 read 4416,31f29420acc8ae failed, err: http://seaweedfs-volume-2.seaweedfs-volume:8080/4416,31f29420acc8ae?readDeleted=true: 500 Internal Server Error

Some possibly related stuff in volume server logs:

E0410 13:50:36.059204 needle_read.go:58 entry not found1: offset 20499632 found id 63383632386565373531303130353034 size -1809667580, expected size 623
E0410 13:50:37.564308 needle_read.go:58 entry not found1: offset 20499632 found id 63383632386565373531303130353034 size -1809667580, expected size 623
E0410 13:50:39.820598 needle_read.go:58 entry not found1: offset 20499632 found id 63383632386565373531303130353034 size -1809667580, expected size 623
E0410 13:56:13.674201 needle_read.go:58 entry not found1: offset 20499632 found id 63383632386565373531303130353034 size -1809667580, expected size 623
E0410 14:02:56.008375 needle_read.go:58 entry not found1: offset 333248432 found id 66373363376463356465353066326337 size 452384547, expected size 623
E0410 14:03:58.802600 needle_read.go:58 entry not found1: offset 20499632 found id 63383632386565373531303130353034 size -1809667580, expected size 623

For some reason I can't find matching entries for the same volume number in both the filer and volume server logs, but maybe it's just not being logged correctly?
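
For what it's worth, my rough reading of that error is that the volume server looks up the needle's offset and size in the index, reads the header at that offset in the data file, and finds an id/size there that don't match, i.e. the bytes at that offset no longer look like the needle the index points to. A minimal sketch of that kind of check follows; this is purely illustrative, not the actual needle_read.go code, and the 16-byte cookie(4)+id(8)+size(4) big-endian header layout is my assumption:

// Illustrative sketch only (not the real needle_read.go): read a needle header
// from a data file at the offset the index points to, and compare the size
// stored there with the size the index expects.
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// checkNeedleAt assumes a 16-byte header: cookie(4) + id(8) + size(4), big-endian.
// The layout is an assumption for illustration.
func checkNeedleAt(datPath string, offset int64, expectedSize int32) error {
	f, err := os.Open(datPath)
	if err != nil {
		return err
	}
	defer f.Close()

	header := make([]byte, 16)
	if _, err := f.ReadAt(header, offset); err != nil {
		return err
	}
	id := header[4:12]
	size := int32(binary.BigEndian.Uint32(header[12:16]))
	if size != expectedSize {
		// Same shape as the log lines above: whatever sits at this offset is
		// not the needle the index says should be there.
		return fmt.Errorf("entry not found: offset %d found id %x size %d, expected size %d",
			offset, id, size, expectedSize)
	}
	return nil
}

func main() {
	// Hypothetical path; offset and expected size taken from the log lines above.
	if err := checkNeedleAt("/data/4416.dat", 20499632, 623); err != nil {
		fmt.Println(err)
	}
}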

Grepping over all logs for the signatures, it seems like the corruption is only in a few places...

❯ for n in 0 1 2 3 4 5 6; do kubectl logs -n seaweedfs seaweedfs-volume-${n} | grep needle_read.go
done | gh gist create -

With context: https://gist.github.com/werdnum/a3a0fc21f56c51a04572716ff015d5dc

Volume 4416 is an erasure-coded volume. I tried to decode it...

> ec.decode -volumeId 4416
ec volume 4416 shard locations: map[seaweedfs-volume-0.seaweedfs-volume:8080:5 seaweedfs-volume-1.seaweedfs-volume:8080:2112 seaweedfs-volume-2.seaweedfs-volume:8080:130 seaweedfs-volume-3.seaweedfs-volume:8080:768 seaweedfs-volume-4.seaweedfs-volume:8080:24 seaweedfs-volume-5.seaweedfs-volume:8080:12288 seaweedfs-volume-6.seaweedfs-volume:8080:1056]
collectEcShards: ec volume 4416 collect shards to seaweedfs-volume-4.seaweedfs-volume:8080 from: map[seaweedfs-volume-0.seaweedfs-volume:8080:5 seaweedfs-volume-1.seaweedfs-volume:8080:2112 seaweedfs-volume-2.seaweedfs-volume:8080:130 seaweedfs-volume-3.seaweedfs-volume:8080:768 seaweedfs-volume-4.seaweedfs-volume:8080:24 seaweedfs-volume-5.seaweedfs-volume:8080:12288 seaweedfs-volume-6.seaweedfs-volume:8080:1056]
copy 4416.[8 9] seaweedfs-volume-3.seaweedfs-volume:8080 => seaweedfs-volume-4.seaweedfs-volume:8080
copy 4416.[1 7] seaweedfs-volume-2.seaweedfs-volume:8080 => seaweedfs-volume-4.seaweedfs-volume:8080
copy 4416.[5] seaweedfs-volume-6.seaweedfs-volume:8080 => seaweedfs-volume-4.seaweedfs-volume:8080
copy 4416.[0 2] seaweedfs-volume-0.seaweedfs-volume:8080 => seaweedfs-volume-4.seaweedfs-volume:8080
copy 4416.[6] seaweedfs-volume-1.seaweedfs-volume:8080 => seaweedfs-volume-4.seaweedfs-volume:8080
generateNormalVolume from ec volume 4416 on seaweedfs-volume-4.seaweedfs-volume:8080
error: generate normal volume 4416 on seaweedfs-volume-4.seaweedfs-volume:8080: rpc error: code = Unknown desc = ec volume 4416 missing shard 0

(note that each time I run it, it complains about a different shard being missing)
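
The shard-location numbers in that map look like bit masks of shard ids; the copy lines line up with that reading (e.g. 768 on volume-3 = shards 8 and 9, 130 on volume-2 = shards 1 and 7). A quick sketch to decode them; the bitmask interpretation is my assumption based on the output above:

// Decode the shard-location values from the ec.decode output into shard ids,
// assuming each value is a bitmask with bit i set when shard i is present.
package main

import "fmt"

func shardIDs(mask uint32) []int {
	var ids []int
	for i := 0; i < 32; i++ {
		if mask&(1<<i) != 0 {
			ids = append(ids, i)
		}
	}
	return ids
}

func main() {
	// Values copied from the ec volume 4416 shard-location map above.
	locations := map[string]uint32{
		"seaweedfs-volume-0": 5,
		"seaweedfs-volume-1": 2112,
		"seaweedfs-volume-2": 130,
		"seaweedfs-volume-3": 768,
		"seaweedfs-volume-4": 24,
		"seaweedfs-volume-5": 12288,
		"seaweedfs-volume-6": 1056,
	}
	for server, mask := range locations {
		fmt.Printf("%s: shards %v\n", server, shardIDs(mask))
	}
}

Decoded that way, the seven servers appear to cover shards 0-13 between them, which makes the "missing shard 0" error from generateNormalVolume even more confusing.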

ec.rebuild returns in under a second, so I don't think it's actually doing anything.


@eliphatfs

Could be the same issue as #5465.

@eliphatfs

Oh, it is different; I can decode successfully.

@werdnum
Contributor Author

werdnum commented May 8, 2024

Happened again today. When I try to rebuild, I get this:

> ec.decode -volumeId=238
ec volume 238 shard locations: map[seaweedfs-volume-0.seaweedfs-volume:8080:1023 seaweedfs-volume-1.seaweedfs-volume:8080:10 seaweedfs-volume-2.seaweedfs-volume:8080:9216 seaweedfs-volume-3.seaweedfs-volume:8080:4608 seaweedfs-volume-4.seaweedfs-volume:8080:144 seaweedfs-volume-5.seaweedfs-volume:8080:2304 seaweedfs-volume-6.seaweedfs-volume:8080:96]
collectEcShards: ec volume 238 collect shards to seaweedfs-volume-0.seaweedfs-volume:8080 from: map[seaweedfs-volume-0.seaweedfs-volume:8080:1023 seaweedfs-volume-1.seaweedfs-volume:8080:10 seaweedfs-volume-2.seaweedfs-volume:8080:9216 seaweedfs-volume-3.seaweedfs-volume:8080:4608 seaweedfs-volume-4.seaweedfs-volume:8080:144 seaweedfs-volume-5.seaweedfs-volume:8080:2304 seaweedfs-volume-6.seaweedfs-volume:8080:96]
generateNormalVolume from ec volume 238 on seaweedfs-volume-0.seaweedfs-volume:8080
error: generate normal volume 238 on seaweedfs-volume-0.seaweedfs-volume:8080: rpc error: code = Unknown desc = WriteDatFile /data/238: copy /data/238 small block 0: EOF
