Give mastodon a better understanding of database replication lag #30176

smiba · 2024-05-05T08:53:11Z

Pitch

Right now, Mastodon goes pretty much sideways the moment replication lag grows unacceptably large, yet it is entirely unaware that this is happening.

This happened on my instance while I was (unfortunately) asleep: the replica ran out of disk space, and Postgres kept serving queries but stopped applying new records. Everything therefore seemed "ok", but the instance was effectively unusable, because it would constantly return 404 for posts you'd swear you had just seen on your timeline a moment ago.

It would be nice if Mastodon could monitor replication lag and warn about it, or at least temporarily blacklist the lagging database server from the pool.
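To make the suggestion a bit more concrete, here is a minimal Python sketch of the decision logic only, assuming lag is measured on each replica with PostgreSQL's `pg_last_xact_replay_timestamp()` function. The names `ReplicaStatus`, `healthy_replicas`, `choose_reader`, the 30-second threshold, and the "fall back to the primary" behaviour are all hypothetical and not part of Mastodon (which is a Rails application, so a real implementation would look quite different). In the failure described above, the replica stops replaying, so its last replay timestamp freezes and the measured lag keeps growing, which is exactly what this check would catch.

```python
import logging
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional

# Query one could run on each replica (PostgreSQL). It returns NULL when the
# server is not in recovery or has not replayed any transaction yet.
LAG_QUERY = "SELECT now() - pg_last_xact_replay_timestamp()"

# Hypothetical threshold; a real setting would need to be configurable.
DEFAULT_MAX_LAG = timedelta(seconds=30)


@dataclass
class ReplicaStatus:
    name: str
    lag: Optional[timedelta]  # None means the lag could not be determined


def healthy_replicas(statuses: List[ReplicaStatus],
                     max_lag: timedelta = DEFAULT_MAX_LAG) -> List[str]:
    """Return names of replicas whose lag is known and within max_lag."""
    healthy = []
    for status in statuses:
        if status.lag is None or status.lag > max_lag:
            # Warn instead of failing silently; unknown lag counts as unhealthy.
            logging.warning("excluding replica %s (lag: %s)", status.name, status.lag)
        else:
            healthy.append(status.name)
    return healthy


def choose_reader(statuses: List[ReplicaStatus],
                  max_lag: timedelta = DEFAULT_MAX_LAG) -> str:
    """Route reads to the first healthy replica, else fall back to the primary."""
    healthy = healthy_replicas(statuses, max_lag)
    return healthy[0] if healthy else "primary"


if __name__ == "__main__":
    statuses = [
        ReplicaStatus("replica-1", timedelta(minutes=10)),  # stalled replica
        ReplicaStatus("replica-2", timedelta(seconds=2)),
    ]
    print(choose_reader(statuses))  # replica-2
```

Treating an unknown lag as unhealthy is a deliberately conservative choice. One caveat: on a quiet primary with few writes, the timestamp-based measure overstates lag even when the replica is fully caught up, so a more robust check might instead compare `pg_last_wal_replay_lsn()` on the replica with `pg_current_wal_lsn()` on the primary.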

Motivation

I think everyone running a bigger instance, or indeed anyone using a primary/replica setup, would benefit greatly from this.

Obviously, this can be caught with additional monitoring outside of Mastodon, or by having someone on call (replication lag is probably something you'd monitor anyway). However, I don't think it would be unreasonable to add some intelligence to Mastodon itself to detect this :)

smiba added the "suggestion" (Feature suggestion) label on May 5, 2024

smiba changed the title from "Give mastodon a better understanding of replication lag" to "Give mastodon a better understanding of database replication lag" on May 5, 2024
