
fix(caldav): automatically delete outdated scheduling objects #45235

Draft · miaulalala wants to merge 10 commits into master from fix/remove-old-scheduling-objects
Conversation

@miaulalala (Contributor) commented May 8, 2024

Summary

oc_schedulingobjects currently grows without ever deleting outdated objects. This PR adds a repair step that is declared as expensive, so admins can decide to run it at their convenience for the initial delete.

The delete is chunked to 50k rows per transaction so the database isn't locked for a long time (especially in clustered setups this could cause issues). MySQL needs special treatment as it doesn't support a LIMIT on DELETE queries, so there we first SELECT the ids to delete and then run the DELETE on those (see the sketch below).
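A minimal sketch of that chunked delete, assuming a hypothetical helper method on the CalDAV backend (the method name and exact structure are illustrative, not the final diff; assumes `Doctrine\DBAL\Platforms\MySQLPlatform` and `OCP\DB\QueryBuilder\IQueryBuilder` are imported):

```php
// Sketch only: deletes at most $limit scheduling objects last modified
// before $modifiedBefore, with the MySQL SELECT-then-DELETE workaround
// described above. Names are assumptions.
public function deleteOutdatedSchedulingObjects(int $modifiedBefore, int $limit): int {
	if (!$this->db->getDatabasePlatform() instanceof MySQLPlatform) {
		// Platforms that accept a row limit on DELETE can delete directly.
		$query = $this->db->getQueryBuilder();
		$query->delete('schedulingobjects')
			->where($query->expr()->lte('lastmodified', $query->createNamedParameter($modifiedBefore)))
			->setMaxResults($limit);
		return $query->executeStatement();
	}

	// MySQL: SELECT the ids of the outdated rows first ...
	$select = $this->db->getQueryBuilder();
	$select->select('id')
		->from('schedulingobjects')
		->where($select->expr()->lte('lastmodified', $select->createNamedParameter($modifiedBefore)))
		->setMaxResults($limit);
	$result = $select->executeQuery();
	$ids = array_map(static fn (array $row): int => (int)$row[0], $result->fetchAll(\PDO::FETCH_NUM));
	$result->closeCursor();
	if ($ids === []) {
		return 0;
	}

	// ... then DELETE exactly those rows.
	$delete = $this->db->getQueryBuilder();
	$delete->delete('schedulingobjects')
		->where($delete->expr()->in('id', $delete->createNamedParameter($ids, IQueryBuilder::PARAM_INT_ARRAY)));
	return $delete->executeStatement();
}
```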

After the repair step has run, a regular cron job is added to the jobs list. It runs every hour and removes scheduling objects that are older than an hour. We don't really need them and could theoretically delete them as soon as they're processed by the ITip\Broker, but as rooms and resources are also handled in a cron job, keeping them until the principal rooms and resources are added is probably a good idea, as I can't exclude unwanted side effects. I also updated the rooms and resources job to run every half hour for that reason.
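A rough skeleton of such an hourly job, pieced together from the snippets quoted later in this review (the class name and wiring are assumptions, not necessarily the committed code):

```php
use OCP\AppFramework\Utility\ITimeFactory;
use OCP\BackgroundJob\TimedJob;

// Hypothetical job class; the interval/time-sensitivity lines match the
// snippet discussed below, the rest is a sketch.
class DeleteOutdatedSchedulingObjects extends TimedJob {
	public function __construct(
		private CalDavBackend $calDavBackend,
		ITimeFactory $time,
	) {
		parent::__construct($time);
		$this->setInterval(60 * 60);
		$this->setTimeSensitivity(self::TIME_SENSITIVE);
	}

	protected function run($argument): void {
		// Delete scheduling objects last modified more than an hour ago.
		$time = $this->time->getTime() - (60 * 60);
		$this->calDavBackend->deleteOutdatedSchedulingObjects($time, 50000);
	}
}
```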


@miaulalala miaulalala self-assigned this May 8, 2024
@miaulalala miaulalala added the 2. developing (Work in progress), performance 🚀 and feature: caldav (Related to CalDAV internals) labels May 8, 2024
@miaulalala miaulalala added this to the Nextcloud 30 milestone May 8, 2024
A comment by miaulalala was marked as outdated.

@miaulalala (Contributor, Author) commented May 13, 2024

Do a slow repair job for this so we don't kill the instance:

  1. Loop over the data and DELETE 50,000 rows per iteration. MySQL needs to be handled differently as it doesn't do LIMITs on these queries 💢
  2. Add a repair step that does the same (see the sketch below)
  3. The background job is TIME_SENSITIVE and should run every hour; the table holds a longblob, so it's better to delete often in smaller batches
  4. Wrap the DELETE in a transaction so no index updates are done during the run.
     See "More than 1000 expressions in a list are not allowed on Oracle" (activity#1384) for how we handled it for activity
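For item 2, a minimal sketch of the repair step's loop, assuming the hypothetical chunked delete helper sketched in the summary (names are illustrative):

```php
// Hypothetical repair-step body: keep deleting in 50k chunks until a
// chunk comes back short, i.e. nothing outdated is left.
private const CHUNK_SIZE = 50000;

public function run(IOutput $output): void {
	$modifiedBefore = $this->time->getTime() - (60 * 60);
	do {
		// Each call touches at most CHUNK_SIZE rows, so no single
		// transaction holds locks for long.
		$deleted = $this->calDavBackend->deleteOutdatedSchedulingObjects($modifiedBefore, self::CHUNK_SIZE);
		$output->info("Deleted $deleted outdated scheduling objects");
	} while ($deleted === self::CHUNK_SIZE);
}
```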

@miaulalala (Contributor, Author) commented:
public static function getExpensiveRepairSteps() {
	return [
		new OldGroupMembershipShares(\OC::$server->getDatabaseConnection(), \OC::$server->getGroupManager()),
		\OC::$server->get(ValidatePhoneNumber::class),
	];
}
This is how steps are registered for the expensive repair job.

@miaulalala miaulalala force-pushed the fix/remove-old-scheduling-objects branch from cbcae93 to f2d7e9f on May 16, 2024 18:22
@miaulalala miaulalala marked this pull request as ready for review on May 16, 2024 18:23
@miaulalala miaulalala requested review from nickvergessen, Altahrim, a team and yemkareems and removed the request for a team on May 16, 2024 18:23
@miaulalala miaulalala requested a review from sorbaugh on May 16, 2024 18:23
@miaulalala miaulalala marked this pull request as draft on May 16, 2024 18:34
@ChristophWurst (Member) left a comment

Let's get rid of the hourly full table scan

👍 otherwise

Comment on lines +2759 to +2765
$queryResults = $this->atomic(function () use ($modifiedBefore, $limit) {
	$query = $this->db->getQueryBuilder();
	$query->delete('schedulingobjects')
		->where($query->expr()->lte('lastmodified', $query->createNamedParameter($modifiedBefore)))
		->setMaxResults($limit);
	return $query->executeStatement();
}, $this->db);
Member:

An explicit transaction is superfluous for a single query.

Contributor (Author):

Yeah, I need to move this further up so that each 1k delete doesn't do the full table scan. Will an index solve this? I don't think it will, seeing as there is a blob associated with this query. When I discussed it with Joas, this was the intention behind the transaction (even though I applied it at the wrong code point 😉).

Member:

I don't see how an explicit transaction is different from our AUTOCOMMIT=1 sessions.

}, $result->fetchAll(\PDO::FETCH_NUM));
$result->closeCursor();

$queryResult = 0;
Member:

Suggested change:
-	$queryResult = 0;
+	$numDeleted = 0;

naming

Comment on lines +2776 to +2779
$query->select('id')
	->from('schedulingobjects')
	->where($query->expr()->lte('lastmodified', $query->createNamedParameter($modifiedBefore)))
	->setMaxResults($limit);
Member:

If you order by lastmodified, the values of the chunks will be closer together, making it more efficient for the database to traverse the index.
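Applied to the snippet above, the suggestion would look roughly like this (sketch, not the committed code):

```php
$query->select('id')
	->from('schedulingobjects')
	->where($query->expr()->lte('lastmodified', $query->createNamedParameter($modifiedBefore)))
	->orderBy('lastmodified', 'ASC') // oldest first, so each chunk is contiguous in the index
	->setMaxResults($limit);
```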

Member:

There is no index for this query; it will perform a full table scan:

> EXPLAIN SELECT id FROM oc_schedulingobjects WHERE lastmodified <= 1700000000;
+------+-------------+----------------------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table                | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+----------------------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | oc_schedulingobjects | ALL  | NULL          | NULL | NULL    | NULL | 104  | Using where |
+------+-------------+----------------------+------+---------------+------+---------+------+------+-------------+

Contributor (Author):

Add an index on lastmodified?

Member:

Yes.
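A minimal migration sketch for such an index, assuming Nextcloud's SimpleMigrationStep API (the step class name and index name are made up for illustration):

```php
use Closure;
use OCP\DB\ISchemaWrapper;
use OCP\Migration\IOutput;
use OCP\Migration\SimpleMigrationStep;

class VersionXXXXDateYYYYMMDDHHMMSS extends SimpleMigrationStep {
	public function changeSchema(IOutput $output, Closure $schemaClosure, array $options): ?ISchemaWrapper {
		/** @var ISchemaWrapper $schema */
		$schema = $schemaClosure();
		$table = $schema->getTable('schedulingobjects');
		if (!$table->hasIndex('schedulobj_lastmodified_idx')) {
			// Lets the chunked SELECT/DELETE use an index scan instead
			// of a full table scan.
			$table->addIndex(['lastmodified'], 'schedulobj_lastmodified_idx');
		}
		return $schema;
	}
}
```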

Comment on lines +40 to +41
$this->setInterval(60 * 60);
$this->setTimeSensitivity(self::TIME_SENSITIVE);
Member:

Why is this time sensitive? Wouldn't it be sufficient to clean up once a day in the maintenance window?

Contributor (Author):

Well, it entirely depends on the size of the instance. For 100 users, no problem. For 1k or 10k users, a lot of data could be produced in a short amount of time, especially if the calendar sees heavy use. That's why I'd rather clean up once an hour during business hours than have this run outside of them with lots of rows.

Member:

Right now we have no cleanup at all and most instances are fine. So I think 99.99% of instances will also survive if data is only cleaned up daily :)

A large range DELETE query can cause table locks that block other operations. We have seen this with #27695.
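For reference, the daily, non-time-sensitive variant suggested here would be roughly this (sketch; TIME_INSENSITIVE is the IJob counterpart to TIME_SENSITIVE):

```php
$this->setInterval(24 * 60 * 60);
$this->setTimeSensitivity(self::TIME_INSENSITIVE);
```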

Member:

Relevant too: #43605

 * @param array $argument
 */
protected function run($argument): void {
	$time = $this->time->getTime() - (60 * 60);
Member:

Caveat with timing this once per hour while the resources job runs twice: it assumes that background jobs run reliably. This might not be the case for AJAX cron.

Contributor (Author):

The resources job running every 30 minutes is just a precaution. I don't remember my exact thought pattern as to why, but I can revert. It was something along the lines of not deleting scheduling objects that might not have been processed yet, but the more I think about it, the less sense it makes, as I only delete scheduling objects that are at least an hour old.

Labels
2. developing (Work in progress) · feature: caldav (Related to CalDAV internals) · performance 🚀
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove old scheduling objects from INBOX and oc_schedulingobjects via cron
3 participants