Pitch
Right now, Mastodon goes pretty much sideways the moment replication lag grows unacceptably, yet it's entirely unaware that this is happening.
This happened on my instance while I was asleep (unfortunately): the replica ran out of disk space, and PostgreSQL decided it would just keep serving queries but stop applying new records. This led to everything seemingly being "ok", but the instance was rather unusable, as it would constantly 404 on posts you'd swear you had just seen on your timeline a moment ago.
It would be nice if Mastodon could monitor replication lag and warn about it, or at least temporarily remove the failing SQL server from the pool.
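To make the idea a bit more concrete, here is a rough sketch of the "monitor, warn, and temporarily exclude" behaviour. It is written in Python purely for illustration and is not how Mastodon (a Rails application) handles its connection pool. `ReplicaPool`, `max_lag_seconds`, `cooldown_seconds` and the probe callables are all hypothetical names. The only real piece is the PostgreSQL function `pg_last_xact_replay_timestamp()`, which a probe could query on the replica; note that on a quiet primary this value also goes stale, so a real check would probably need something more robust.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Query a probe could run on a replica. pg_last_xact_replay_timestamp() is a
# real PostgreSQL function; it returns NULL on a server that is not a standby.
LAG_QUERY = (
    "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) "
    "AS lag_seconds"
)


@dataclass
class ReplicaPool:
    """Hypothetical pool that skips replicas whose lag is too high."""

    # Maps a replica name to a callable returning its lag in seconds
    # (or None if the lag is unknown). In practice this would run LAG_QUERY.
    probes: Dict[str, Callable[[], Optional[float]]]
    max_lag_seconds: float = 30.0   # assumed threshold
    cooldown_seconds: float = 60.0  # assumed exclusion period
    clock: Callable[[], float] = time.monotonic
    _excluded_until: Dict[str, float] = field(default_factory=dict)

    def _is_excluded(self, name: str, now: float) -> bool:
        return self._excluded_until.get(name, float("-inf")) > now

    def check(self) -> List[str]:
        """Probe all replicas; return the names that were newly excluded."""
        now = self.clock()
        newly_excluded = []
        for name, probe in self.probes.items():
            try:
                lag = probe()
            except Exception:
                lag = None  # an unreachable replica is treated as unhealthy
            if lag is None or lag > self.max_lag_seconds:
                if not self._is_excluded(name, now):
                    newly_excluded.append(name)  # caller can log a warning
                self._excluded_until[name] = now + self.cooldown_seconds
        return newly_excluded

    def available(self) -> List[str]:
        """Replicas that reads may currently be routed to."""
        now = self.clock()
        return [n for n in self.probes if not self._is_excluded(n, now)]
```

A periodic job would call `check()` and log a warning for every name it returns, while read queries would only be routed to the servers listed by `available()` (falling back to the primary if that list is empty).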
Motivation
I think everyone running a bigger instance, or even everyone using a primary/replica setup, will benefit greatly from this.
Obviously this can be prevented with additional monitoring outside of Mastodon, or by having someone on-call (replication lag is probably something you'd monitor for anyway). However, I don't think it would be too unreasonable to add some intelligence to Mastodon to monitor this :)
smiba changed the title from "Give mastodon a better understanding of replication lag" to "Give mastodon a better understanding of database replication lag" on May 5, 2024