New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: Lemmy instances sending every activity twice #4609
Comments
This seems like a bug in Lemmy so youre definitely right to report it here. Doesnt matter if you observe it from another platform. It looks like all requests go to @phiresky Do you have any idea about this? All I can think of is that theres more than one federation worker active for a single target instance, but the code looks correct. Edit: It could help to write unit tests for the outgoing federation queue, to ensure that it starts and stops workers as expected, and sends activities correctly. |
I know that lemmy.world definitely had a misconfiguration for a bit where they had multiple instances of Lemmy running without the appropriate -federation-worker flags set (since those where not required before 0.19). If this issue doesn't happen always and for every instance then it's likely a misconfiguration in those instances. That doesn't mean we shouldn't do something to improve the situation. For example we could add a heuristic to the federation workers that detects and logs an error when the federation_queue_state table has been updated by someone else which would mean overlapping federation workers running. The best solution would be a horizontal scaling method that does not require manual configuration of federation instance ranges but I can't think of a way to do that without huge complexity. |
The bug also seems to happen for instances that are running only a single Lemmy backend, eg mander.xyz. |
Have done some more testing. Test 1Sent a Follow from my server to a community on mander.xyz using a Ruby script. Script in here #!/usr/bin/env ruby
require 'securerandom'
require 'openssl'
require 'http' # needs 'gem install http'
# change these
my_site = "lemmon.website"
my_dir = "u"
my_actor = "freamon"
my_private_key_location = "/home/andrew/keys/#{my_actor}/private.pem"
my_public_key_location = "https://#{my_site}/#{my_dir}/#{my_actor}"
target_site = "mander.xyz"
target_dir = "c"
target_actor = "printforgood"
uuid = SecureRandom.uuid
json = <<~EOT
{
"@context": [
"https://w3id.org/security/v1",
"https://www.w3.org/ns/activitystreams"
],
"id":"https://#{my_site}/follows/#{uuid}",
"type":"Follow",
"actor":"https://#{my_site}/#{my_dir}/#{my_actor}",
"object":"https://#{target_site}/#{target_dir}/#{target_actor}"
}
EOT
date = Time.now.utc.httpdate
digest = Base64.strict_encode64(OpenSSL::Digest.digest('SHA256', json))
digest.prepend("SHA-256=")
keypair = OpenSSL::PKey::RSA.new(File.read("#{my_private_key_location}"))
signed_string = "(request-target): post /#{target_dir}/#{target_actor}/inbox\ncontent-type: application/activity+json\ndate: #{date}\ndigest: #{digest}\nhost: #{target_site}"
signature = Base64.strict_encode64(keypair.sign(OpenSSL::Digest::SHA256.new, signed_string))
header = 'keyId="' + my_public_key_location + '#main-key",algorithm="rsa-sha256",headers="(request-target) content-type date digest host",signature="' + signature + '"'
response = HTTP.headers({ 'Host': target_site, 'User-Agent': 'ActivityPub +https://' + my_site, 'Content-Type': 'application/activity+json', 'Digest': digest, 'Date': date, 'Signature': header })
.post("https://#{target_site}/#{target_dir}/#{target_actor}/inbox", body: json)
p response # header
p response.to_s # body (if any) Received 3 duplicate Accepts in response. ip.address - - [09/Apr/2024:18:30:27 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://mander.xyz"
ip.address - - [09/Apr/2024:18:30:27 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://mander.xyz"
ip.address - - [09/Apr/2024:18:30:27 +0000] "POST /inbox HTTP/1.1" 200 0 "-" "Lemmy/0.19.3; +https://mander.xyz" My backend server just gives each request a UUID and dumps the bodies into a file: -rw-r--r-- 1 andrew andrew 475 Apr 9 18:30 activity_from_printforgood_737b341e-54f4-42fe-9bdb-ae89c7f89222.json
-rw-r--r-- 1 andrew andrew 475 Apr 9 18:30 activity_from_printforgood_38b15127-ba42-4b4a-b552-946a6707d2e9.json
-rw-r--r-- 1 andrew andrew 475 Apr 9 18:30 activity_from_printforgood_f8a6f4ed-376c-4ade-9abe-87d9035435fc.json Each file contains the same Accept JSON. JSON in here{
"@context": [
"https://join-lemmy.org/context.json",
"https://www.w3.org/ns/activitystreams"
],
"actor": "https://mander.xyz/c/printforgood",
"to": [
"https://lemmon.website/u/freamon"
],
"object": {
"actor": "https://lemmon.website/u/freamon",
"to": null,
"object": "https://mander.xyz/c/printforgood",
"type": "Follow",
"id": "https://lemmon.website/follows/194d816b-1599-4587-95ae-d274c1d4209c"
},
"type": "Accept",
"id": "https://mander.xyz/activities/accept/b0f94ef2-f411-4b09-8548-9d81b98b01c7"
} Test 2Sent a Follow from a Lemmy instance running inside a VM and tunnelled through to via ngrok.com, to a community on mander.xyz, using the lemmy-ui Received only 1 Accept in response: Test 3Still on the VM. Extracted the private key for my lemmy user, and sent a Follow to a different community on mander.xyz (that was conveniently stuck at 'Pending'), using the Ruby script from Test 1. Received 2 Accepts in response, one of which lemmy dutifully rejected Test 4Used the lemmy-ui again to subscribe to a different community on mander.xyz. Also received 2 Accepts in response, one of which lemmy rejected Test 5Same as test 2-5, but for communities on lemmy.ml instead. Follows were sent using the lemmy-ui, and the Ruby script, but only 1 Accept was ever sent back Test 6Sent a Follow from my server to a community on lemmy.ml using the Ruby script. Received only 1 Accept back in response ThoughtsI've considered the possibility that me using lemmon.website to experiment with ActivityPub is a factor, and Lemmy instances have more than 1 idea of what the site is, and are sending activity for each version. The argument against this, is that I've interacted way more with lemmy.ml than mander.xyz, and there's no duplications coming from lemmy.ml. Also, when lemmy.world was on 0.18.5, it likewise never sent me dupes. I even tried to insert another record for 'lemmon.website' into the lemmy DB on the VM I was using, and - unsurprisingly - it rejected it for not being unique. Likewise, the Ruby script I've used - it works without issue for lemmy.ml (and Mastodon instances, and Pixelfed instances, etc), but me using it in Test 3 seemed to be the trigger that tripped mander.xyz up into starting sending me activity twice. I haven't been able to form anything conclusive from this, but maybe something will stand out to people more familiar with Lemmy's code. |
I've seen duplicate activities on Lemmy <-> Lemmy before, it seems unlikely to be related to the software on the receiving side. |
It would be good to get a reproduction with an instance that someone actually controls because otherwise it's hard to say whether there is a misconfiguration or not and to get the relevant logs / db entries (which are on the sending side) |
It might be relevant since in general every actor has their own inbox, and the merging of these inboxes into a single site-wide inbox might not work correctly depending on how other end presents their inboxes. If lemmy sees inboxes as separate then it may send events to multiple inboxes |
I also thought about that, but based on the logs posted by @freamon, all activities were sent to I messaged the admin of mander.xyz to help investigate, but havent received a response so far. |
Some good news / bad news for you - lemmy.ml is duplicating activities. I subscribed to asklemmy@lemmy.ml from a test Lemmy instance tunnelled through via ngrok, so I could screenshot the 'inspect' window - you can see a pattern of 200 OK, 400 Bad Request for every activity. I've included two images to demonstrate that they are the same activity. (I don't imagine it's a config issue for you guys, but LW changed their config to fix the problem when they were doing it) |
Note that the duplication has returned: https://lemmy.ml/comment/11010288 Anyone know if lemmy.world is testing that config change again or something? |
@freamon brought this up in their matrix chat the other day and the answer was nothing changed on lemmy.world's end |
@jlandahl - LW had to reset a container, so you shouldn't be getting any more duplicate activity from them. It was the same with lemmy.ml incidentally - a few comments up I was saying that they were duplicating too, but then they reset as part of testing an upgrade to 0.19.4, and that stopped it. As a Issue that seems to start randomly, and is stopped by 'turning it off and on again', I suspect this will be a hard problem to solve. I noticed when I was recording the video the other day (of which this is an excerpt), that there was a recurring pattern: 30 seconds of nothing, then a flood of activity, then back to 30 seconds of nothing, etc. I imagine this is by design, and my instance has to wait its turn to get a bunch in one, rather than get activities as they are created. It made me wonder whether 'version 1' of an activity was being generated but not actually sent, just queued up and stuck somewhere. Then Lemmy was generating 'version 2' because it thought the first version wasn't delivered successfully, and then both versions were being sent at the same time. |
Hello, I'm part of the Lemmy.World infra team. I'd be happy to help finding the root cause of this issue, but it's currently not clear what information we should be looking for? We're currently running with the default log level for our federation container and there are no logs even mentioning lemmon.website, except for some unrelated html error pages from a kbin instance. |
To fix this we have to narrow down what exactly is causing the duplicate activities. My understanding is that the bug doesnt happen with the default ansible configuration, and affected instances have some different config. Some things worth trying on a test instance:
|
We've previously been using multiple federation containers and now we are using a single federation container.
It seems to be random. Restarting fixes it according to @freamon's feedback after we restart the container.
This is probably something @freamon can answer. |
Yeah - restarting seems to fix it (at least from my perspective). It's for all activities - anything a remote community might Announce, and - for my one local community - anything a user might send. (I just verified that last part by upvoting a random post on endlesstalk.org. You can tell how broke that instance is anyway by how out-of-sync it is with votes generally (e.g. compare https://endlesstalk.org/post/34649989 with the fedi-link), but the upvote I gave for a post on !tails@lemmon.website was sent to me a sweet 7 times. There's some info here about his container setup, with a comment that also suggests that a restart fixes it.) |
I added some extra logging in #4726 to figure out where the problem is coming from. Once that is merged and released, run Lemmy with |
Requirements
Summary
This is discussed this post on on Lemmy - you can see me protesting that it should be me filing the report, but I've slept since then, and Nutomic said please (?!). There's a question of whether it's due to instance configuration, Lemmy itself, or some combination of the two (it only stated with 0.19.x, but not all instances do it, and there isn't always a correlation between it happening and the amount of containers an instance is using). I was reluctant to make this Issue because there's information you'll need about instance config that I'm not going to be able to give to you, because I'm not running Lemmy, so this is all second-hand info and observation.
The post on Lemmy has a video. The background to it is that I wanted to use ActivityPub for an Automod on lemmy.world/c/starwarsmemes, so I sent a Follow request to that community (using a Ruby script). When anything happens lemmy.world sends an Announce activity to my server, and the Automod can, amongst other things, send a notification to my phone when a new post is made. In the video, the red terminal is showing a
tail -f
on the nginx log; the green terminal is showing the.object.id
,.object.type
, and.object.actor
from the JSON contained inside each request. starwarsmemes is pretty quiet at the minute, so I was able to monitor the results of activity that I myself was generating as a user on lemmy.world. Hopefully, the video makes it clear that duplicate activity is being sent every time.I've seen similar behaviour before: post on lemmy.ca and post on endlesstalk.org. The background to those posts is that I was running a fake lemmy community (best viewed from here), that was using a basic ActivityPub server to federate comments and votes, and I could tell from that that those instances (among others) were duplicating everything.
I saw it first with lemmy.world when they upgraded to 0.19.3, and my dumb AutoMod pinged my phone 4 times to tell me about the same new post (I mentioned it to the admins in this comment).
Outside of my server, you can also see duplicate activity trip other recipients up. One place is Lemmy modlogs. For example, this modlog shows me manually restoring a comment that the AutoMod removed. I did this with my own fair hand, using the lemmy-ui, so I know that whilst I didn't do the same thing twice, activity was sent twice to endlesstalk.org about it. Another is PieFed. They've tightened up their ability to filter out duplicates now, but - if they've not been deleted since - you can still see what happened here and here, as well as here and here (the originals weren't updated, btw). In both circumstances (modlogs and piefed), I'm guessing it's hit-and-miss because it depends on how much time there was between the original activity and its duplicate.
It's entirely possible that is down to instance mis-configuration (in which, quite a few instances are making the same mistake, which is a problem in itself). It's only slightly vague suggestions from admins that the software itself is at fault. That's why I'd prefer it was them making this Issue instead of me, but I've been requested to do it, so here it is.
Steps to Reproduce
Technical Details
(I can't you much here, for the aforementioned I'm-not-running-Lemmy reasons)
Version
0.19.3
Lemmy Instance URL
lemmy.world
The text was updated successfully, but these errors were encountered: