Cron Task Scheduler

This repository contains tools for scheduling tasks and finding the non-overlapping times between them within a specified interval.
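
As a rough illustration of what "non-overlapping times" means here, the sketch below merges busy job intervals and returns the free gaps inside a window. It is not taken from the repository: the function name find_free_slots and the input format (a list of start/end tuples) are assumptions made for this example.

```python
from datetime import datetime


def find_free_slots(busy, window_start, window_end):
    """Return (start, end) gaps inside the window that no busy interval covers."""
    free = []
    cursor = window_start
    # Sorting by start time lets overlapping intervals be handled in one pass.
    for start, end in sorted(busy):
        if end <= cursor or start >= window_end:
            continue  # already covered, or outside the window
        if start > cursor:
            free.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < window_end:
        free.append((cursor, window_end))
    return free


# Hypothetical job intervals, for illustration only.
day = datetime(2024, 1, 1)
busy = [
    (day.replace(hour=9), day.replace(hour=10)),
    (day.replace(hour=9, minute=30), day.replace(hour=11)),
    (day.replace(hour=13), day.replace(hour=14)),
]
free = find_free_slots(busy, day.replace(hour=8), day.replace(hour=15))
```

Overlapping jobs (the first two above) are treated as one busy block, so only gaps where no job is running are reported.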

Scripts

data_preprocessor.py

This script processes the input datasets and returns a single output dataset that contains information about the cron jobs.

Dependencies

pandas

Inputs

The script takes three datasets:

Functions dataset - This dataset contains information about the scheduled functions in PostgreSQL.
Dags dataset - This dataset contains information about the Directed Acyclic Graphs (DAGs) that the cron jobs are a part of.
Priority dataset - This dataset contains the priority assigned to each cron job.

Usage

The script can be executed by running the following command in the terminal:

python data_preprocessor.py

It will prompt the user to enter the paths for the three datasets: functions, dags, and priority.

Alternatively, the script can be imported into another Python script and the main() function can be called with the paths to the three datasets as arguments.

import data_preprocessor

cron_data = data_preprocessor.main(functions_path, dags_path, priority_path)

Note

Make sure the input datasets are in the correct format and contain the expected columns before running the script; otherwise it may not work as expected.
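
One way to catch format problems early is to check the columns before calling the script. The sketch below is only an illustration: check_columns is a hypothetical helper, and the column names are placeholders, since the columns the script actually expects are not listed here.

```python
import pandas as pd


def check_columns(df, required, name):
    """Raise a ValueError listing any required columns missing from df."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"{name} dataset is missing columns: {missing}")


# Hypothetical column names, for illustration only.
functions = pd.DataFrame({"job_name": ["job_a"], "schedule": ["0 0 * * *"]})
check_columns(functions, ["job_name", "schedule"], "functions")
```

Replace the placeholder lists with the columns each of the three datasets is actually expected to have.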

gantt_chart_generator.py

This script generates a Gantt chart of the schedules of multiple jobs. The input is a pandas DataFrame with the columns 'job_name', 'schedule', 'category', and 'duration', where 'schedule' is a crontab expression and 'duration' is a pandas Timedelta giving how long each job runs. The chart covers all job runs from the current time to the end of a user-specified interval.

Dependencies

pandas
croniter
plotly

Usage

To use the script, run the following command in the terminal:

python gantt_chart_generator.py

The script will prompt you for the interval for the Gantt chart (e.g. "days=7, weeks=2, months=1") and a list of tasks to include in the chart (separated by commas, or enter "all" to include all tasks). The Gantt chart will be generated and displayed using plotly.
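
The two prompts could be parsed roughly as follows. This is a minimal sketch, not the script's own code: parse_interval and parse_tasks are hypothetical names, and the sketch assumes the interval is entered as comma-separated unit=value pairs and converted to a dictionary such as {'days': 7}.

```python
def parse_interval(text):
    """Turn 'days=7, weeks=2' into {'days': 7, 'weeks': 2}."""
    interval = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        unit, _, value = part.partition("=")
        interval[unit.strip()] = int(value)
    return interval


def parse_tasks(text, all_tasks):
    """Return every task for 'all', otherwise the comma-separated names."""
    if text.strip().lower() == "all":
        return list(all_tasks)
    return [name.strip() for name in text.split(",") if name.strip()]
```

A malformed entry such as "days" without a value raises a ValueError from int(), which the caller can catch to prompt again.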

Example

Here is an example of calling create_gantt_chart from another Python script:

import pandas as pd
from gantt_chart_generator import create_gantt_chart

# Create a sample DataFrame with job schedules
df = pd.DataFrame({
    'job_name': ['Job 1', 'Job 2', 'Job 3'],
    'schedule': ['0 0 * * *', '0 0 * * 1', '0 0 * * 2'],
    'category': ['category1', 'category1', 'category2'],
    'duration': [pd.Timedelta(minutes=60), pd.Timedelta(minutes=30), pd.Timedelta(minutes=45)],
})

# Generate the Gantt chart
interval = {'days': 7}  # Show schedules for the next 7 days
tasks = ['Job 1', 'Job 3']  # Include only Job 1 and Job 3 in the chart
create_gantt_chart(df, interval, tasks)

Version

1.0.1

cron_schedule_lookup.py

This script parses and interprets crontab schedules to generate a list of datetimes for tasks within a specified time interval and date range. It takes a dataframe containing tasks' names and their crontab schedules, as well as user-specified input for the time interval, start date, and end date, and returns a dataframe containing the names of all tasks with the exact datetimes they start to run.

Dependencies

pandas
croniter

Usage

To use the script, call the get_task_datetimes function and pass in the dataframe containing the tasks and their crontab schedules, as well as the time interval, time zone, start date, and end date as arguments. The function will return a dataframe containing the task names and their corresponding datetimes within the specified time interval and date range.

The time_zone argument sets the time zone of the returned datetimes and defaults to UTC. To use a different time zone, pass its name as a string (e.g. "Asia/Tokyo" for Tokyo time). You can find a list of available time zones here.

Version

1.1.1