
fix(caldav): automatically delete outdated scheduling objects #45235

Draft · miaulalala wants to merge 10 commits into master from fix/remove-old-scheduling-objects
Conversation

@miaulalala (Contributor) commented May 8, 2024

Summary

oc_schedulingobjects currently grows without ever deleting outdated objects. This PR adds a repair step that is declared as expensive, so admins can decide to run it at their convenience for the initial delete.

The delete is chunked to 50k rows per transaction so the database isn't locked for a long time (especially in clustered setups this could cause issues). MySQL needs special treatment as it doesn't support a LIMIT on DELETE queries, so there we first SELECT the ids to delete and then run the DELETE on those (see the sketch below).
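A minimal sketch of that chunked delete, assuming a hypothetical helper method on the CalDAV backend (the method name and exact structure are illustrative, not the final diff; assumes `Doctrine\DBAL\Platforms\MySQLPlatform` and `OCP\DB\QueryBuilder\IQueryBuilder` are imported):

```php
// Sketch only: deletes at most $limit scheduling objects last modified
// before $modifiedBefore, with the MySQL SELECT-then-DELETE workaround
// described above. Names are assumptions.
public function deleteOutdatedSchedulingObjects(int $modifiedBefore, int $limit): int {
	if (!$this->db->getDatabasePlatform() instanceof MySQLPlatform) {
		// Platforms that accept a row limit on DELETE can delete directly.
		$query = $this->db->getQueryBuilder();
		$query->delete('schedulingobjects')
			->where($query->expr()->lte('lastmodified', $query->createNamedParameter($modifiedBefore)))
			->setMaxResults($limit);
		return $query->executeStatement();
	}

	// MySQL: SELECT the ids of the outdated rows first ...
	$select = $this->db->getQueryBuilder();
	$select->select('id')
		->from('schedulingobjects')
		->where($select->expr()->lte('lastmodified', $select->createNamedParameter($modifiedBefore)))
		->setMaxResults($limit);
	$result = $select->executeQuery();
	$ids = array_map(static fn (array $row): int => (int)$row[0], $result->fetchAll(\PDO::FETCH_NUM));
	$result->closeCursor();
	if ($ids === []) {
		return 0;
	}

	// ... then DELETE exactly those rows.
	$delete = $this->db->getQueryBuilder();
	$delete->delete('schedulingobjects')
		->where($delete->expr()->in('id', $delete->createNamedParameter($ids, IQueryBuilder::PARAM_INT_ARRAY)));
	return $delete->executeStatement();
}
```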

After the repair step has run, a regular cron job is added to the jobs list. It runs every hour and removes scheduling objects that are older than an hour. We don't really need them and could theoretically delete them as soon as they're processed by the ITip\Broker, but as rooms and resources are also handled in a cron job, keeping them until the principal rooms and resources are added is probably a good idea, as I can't exclude unwanted side effects. I also updated the rooms and resources job to run every half hour for that reason.
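A rough skeleton of such an hourly job, pieced together from the snippets quoted later in this review (the class name and wiring are assumptions, not necessarily the committed code):

```php
use OCP\AppFramework\Utility\ITimeFactory;
use OCP\BackgroundJob\TimedJob;

// Hypothetical job class; the interval/time-sensitivity lines match the
// snippet discussed below, the rest is a sketch.
class DeleteOutdatedSchedulingObjects extends TimedJob {
	public function __construct(
		private CalDavBackend $calDavBackend,
		ITimeFactory $time,
	) {
		parent::__construct($time);
		$this->setInterval(60 * 60);
		$this->setTimeSensitivity(self::TIME_SENSITIVE);
	}

	protected function run($argument): void {
		// Delete scheduling objects last modified more than an hour ago.
		$time = $this->time->getTime() - (60 * 60);
		$this->calDavBackend->deleteOutdatedSchedulingObjects($time, 50000);
	}
}
```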


@miaulalala miaulalala self-assigned this May 8, 2024
@miaulalala miaulalala added the 2. developing (Work in progress), performance 🚀 and feature: caldav (Related to CalDAV internals) labels May 8, 2024
@miaulalala miaulalala added this to the Nextcloud 30 milestone May 8, 2024
A comment by miaulalala was marked as outdated.

@miaulalala (Contributor, Author) commented May 13, 2024

Do a slow repair job for this so we don't kill the instance:

  1. Loop over the data and DELETE 50,000 rows per iteration. MySQL needs to be handled differently as it doesn't do LIMITs on these queries 💢
  2. Add a repair step that does the same (see the sketch below)
  3. The background job is TIME_SENSITIVE and should run every hour; the table holds a longblob, so it's better to delete often in smaller batches
  4. Wrap the DELETE in a transaction so no index updates are done during the run.
     See "More than 1000 expressions in a list are not allowed on Oracle" (activity#1384) for how we handled it for activity
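For item 2, a minimal sketch of the repair step's loop, assuming the hypothetical chunked delete helper sketched in the summary (names are illustrative):

```php
// Hypothetical repair-step body: keep deleting in 50k chunks until a
// chunk comes back short, i.e. nothing outdated is left.
private const CHUNK_SIZE = 50000;

public function run(IOutput $output): void {
	$modifiedBefore = $this->time->getTime() - (60 * 60);
	do {
		// Each call touches at most CHUNK_SIZE rows, so no single
		// transaction holds locks for long.
		$deleted = $this->calDavBackend->deleteOutdatedSchedulingObjects($modifiedBefore, self::CHUNK_SIZE);
		$output->info("Deleted $deleted outdated scheduling objects");
	} while ($deleted === self::CHUNK_SIZE);
}
```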

@miaulalala (Contributor, Author) commented:
public static function getExpensiveRepairSteps() {
	return [
		new OldGroupMembershipShares(\OC::$server->getDatabaseConnection(), \OC::$server->getGroupManager()),
		\OC::$server->get(ValidatePhoneNumber::class),
	];
}
This is how steps are registered for the expensive repair job.

@miaulalala miaulalala force-pushed the fix/remove-old-scheduling-objects branch from cbcae93 to f2d7e9f on May 16, 2024 18:22
@miaulalala miaulalala marked this pull request as ready for review on May 16, 2024 18:23
@miaulalala miaulalala requested review from nickvergessen, Altahrim, a team and yemkareems and removed the request for a team on May 16, 2024 18:23
@miaulalala miaulalala requested a review from sorbaugh on May 16, 2024 18:23
@miaulalala miaulalala marked this pull request as draft on May 16, 2024 18:34
@ChristophWurst (Member) left a comment

Let's get rid of the hourly full table scan

👍 otherwise

Comment on lines +2759 to +2765
$queryResults = $this->atomic(function () use ($modifiedBefore, $limit) {
	$query = $this->db->getQueryBuilder();
	$query->delete('schedulingobjects')
		->where($query->expr()->lte('lastmodified', $query->createNamedParameter($modifiedBefore)))
		->setMaxResults($limit);
	return $query->executeStatement();
}, $this->db);
Member:

An explicit transaction is superfluous for a single query.

Contributor (Author):

Yeah, I need to move this further up so that each 1k delete doesn't do the full table scan. Will an index solve this? I don't think it will, seeing as there is a blob associated with this query. When I discussed it with Joas, this was the intention behind the transaction (even though I applied it at the wrong code point 😉).

Member:

I don't see how an explicit transaction is different from our AUTOCOMMIT=1 sessions.

}, $result->fetchAll(\PDO::FETCH_NUM));
$result->closeCursor();

$queryResult = 0;
Member:

Suggested change:
-	$queryResult = 0;
+	$numDeleted = 0;

naming

Comment on lines +2776 to +2779
$query->select('id')
	->from('schedulingobjects')
	->where($query->expr()->lte('lastmodified', $query->createNamedParameter($modifiedBefore)))
	->setMaxResults($limit);
Member:

If you order by lastmodified, the values of the chunks will be closer together, making it more efficient for the database to traverse the index.
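Applied to the snippet above, the suggestion would look roughly like this (sketch, not the committed code):

```php
$query->select('id')
	->from('schedulingobjects')
	->where($query->expr()->lte('lastmodified', $query->createNamedParameter($modifiedBefore)))
	->orderBy('lastmodified', 'ASC') // oldest first, so each chunk is contiguous in the index
	->setMaxResults($limit);
```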

Member:

There is no index for this query; it will perform a full table scan:

> EXPLAIN SELECT id FROM oc_schedulingobjects WHERE lastmodified <= 1700000000;
+------+-------------+----------------------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table                | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+----------------------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | oc_schedulingobjects | ALL  | NULL          | NULL | NULL    | NULL | 104  | Using where |
+------+-------------+----------------------+------+---------------+------+---------+------+------+-------------+

Contributor (Author):

Add an index on lastmodified?

Member:

Yes.
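A minimal migration sketch for such an index, assuming Nextcloud's SimpleMigrationStep API (the step class name and index name are made up for illustration):

```php
use Closure;
use OCP\DB\ISchemaWrapper;
use OCP\Migration\IOutput;
use OCP\Migration\SimpleMigrationStep;

class VersionXXXXDateYYYYMMDDHHMMSS extends SimpleMigrationStep {
	public function changeSchema(IOutput $output, Closure $schemaClosure, array $options): ?ISchemaWrapper {
		/** @var ISchemaWrapper $schema */
		$schema = $schemaClosure();
		$table = $schema->getTable('schedulingobjects');
		if (!$table->hasIndex('schedulobj_lastmodified_idx')) {
			// Lets the chunked SELECT/DELETE use an index scan instead
			// of a full table scan.
			$table->addIndex(['lastmodified'], 'schedulobj_lastmodified_idx');
		}
		return $schema;
	}
}
```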

Comment on lines +40 to +41
$this->setInterval(60 * 60);
$this->setTimeSensitivity(self::TIME_SENSITIVE);
Member:

Why is this time sensitive? Wouldn't it be sufficient to clean up once a day in the maintenance window?

Contributor (Author):

Well, it entirely depends on the size of the instance. For 100 users, no problem. For 1k or 10k users, a lot of data could be produced in a short amount of time, especially if the calendar sees heavy use. That's why I'd rather clean up once an hour during business hours than have this run outside of them with lots of rows.

Member:

Right now we have no cleanup at all and most instances are fine. So I think 99.99% of instances will also survive if data is only cleaned up daily :)

A large range DELETE query can cause table locks that block other operations. We have seen this with #27695.
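For reference, the daily, non-time-sensitive variant suggested here would be roughly this (sketch; TIME_INSENSITIVE is the IJob counterpart to TIME_SENSITIVE):

```php
$this->setInterval(24 * 60 * 60);
$this->setTimeSensitivity(self::TIME_INSENSITIVE);
```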

Member:

Relevant too: #43605

 * @param array $argument
 */
protected function run($argument): void {
	$time = $this->time->getTime() - (60 * 60);
Member:

Caveat with timing this once per hour while the resources job runs twice: it assumes that background jobs run reliably. This might not be the case for AJAX cron.

Contributor (Author):

The resources job running every 30 minutes is just a precaution. I don't remember my exact thought pattern as to why, but I can revert. It was something along the lines of not deleting scheduling objects that might not have been processed yet, but the more I think about it, the less sense it makes, as I only delete scheduling objects that are at least an hour old.

Labels
2. developing (Work in progress) · feature: caldav (Related to CalDAV internals) · performance 🚀
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove old scheduling objects from INBOX and oc_schedulingobjects via cron
3 participants