(Or how to cure an afternoon headache)
Published on Mar 27, 2019
Clocks by endlesswatts, published under the Pixabay License.
Regularly scheduled processes run on servers every day: sending email notifications, importing data, updating software, removing out-of-date logs, and so on. The Cron system facilitates such events. Cron, a software utility, runs specified commands at a designated minute, hour, day of the month, month, day of the week, or any combination of these: it keeps scheduled jobs on schedule, so you don’t have to.
A crontab is a file that contains cronjobs (or, more simply, tasks). You may have implemented a cronjob by shelling into your server and editing your user crontab:
# Show the contents of your crontab.
crontab -l
# Open the crontab, and add, edit, or delete cronjobs.
crontab -e
# Then, close the file, and wait for the Cron daemon to reload.
Sound familiar?
Unfortunately, shelling into your server to alter or view the contents of a crontab can create problems, unexpected inconveniences, and afternoon headaches. Manually manipulating the contents of a server may lead to service interruptions, and adjusting a crontab – without any record of its history – may result in exhausting exercises in reimagining the past.
At DataMade, in particular, we’ve encountered problems with toggling cronjobs off, but forgetting to toggle them back on, and shelling into the server to remind ourselves of the schedule for data scrapes and imports. That can be time-consuming and a little frustrating.
DataMade found a better way! We follow, among others, one relatively straightforward best practice: use version control. Crontabs, just like application source code, can be integrated with a version control system and a continuous deployment pipeline. Read on to learn how.
Step 1. Create a crontab (again, a file that contains your cronjobs). This file needs a particular name (e.g., “repo-name-cronjobs”). It should include only letters and dashes (no periods!). The file can live wherever you like, but we recommend placing it inside a scripts directory at the root of your application: scripts/repo-name-cronjobs.
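If you want to catch a bad file name before it reaches the server, a small check can help. The following is a minimal sketch, not part of DataMade's tooling: the function name valid_cron_name is hypothetical, and it enforces only the "letters and dashes" rule described above.

```shell
#!/bin/sh
# valid_cron_name NAME (hypothetical helper): succeed only if NAME is
# non-empty and contains nothing but ASCII letters and dashes.
valid_cron_name() {
    case "$1" in
        ''|*[!A-Za-z-]*) return 1 ;;  # empty, or contains a forbidden character
        *) return 0 ;;
    esac
}

# Usage: the first name passes, the second is rejected because of its periods.
valid_cron_name "repo-name-cronjobs" && echo "repo-name-cronjobs: ok"
valid_cron_name "repo.name.cronjobs" || echo "repo.name.cronjobs: rejected"
```

A check like this could run in a test suite or a pre-commit hook, so a misnamed crontab never gets deployed.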
Step 2. Write the cronjobs. Does your app run management commands, scrape the web, send emails? Once per week, every day, every hour? A very simple cronjob looks like the following:
# /etc/cron.d/repo-name-cronjobs
* * * * * regina echo 'version control' >> /var/log/repo-name-cronjob.log
# You need a newline at the end of your crontab, or cron ignores the file.
Let’s untangle this.
The first line indicates one of several possible locations of crontabs on the server.
DataMade uses Ubuntu. On Ubuntu, the Cron service scans, every minute, the /etc/cron.d/ directory (among other places, e.g., /etc/cron.daily/, /etc/cron.hourly/) and looks for new, updated, or recurring tasks. Your OS may store crontabs elsewhere: determine the location, and add that, plus the file name, to the top of your crontab. Using Ubuntu? Read this friendly Cron overview.
The second line provides the cronjob itself. It outputs the string “version control” to a test log. The five asterisks at the beginning of the task indicate it should run every minute. Need something more complex? Crontab Guru can help you find the right syntax.
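To make the five schedule fields more concrete, here is a small sketch that writes a few illustrative entries to a scratch file and prints them. The "regina" user and the log path come from the example above; the schedules, the echoed strings, and the scratch file location are assumptions chosen for illustration.

```shell
#!/bin/sh
# Write a few illustrative /etc/cron.d-style entries to a scratch file.
# Nothing here touches the real /etc/cron.d directory.
sample="${TMPDIR:-/tmp}/sample-cronjobs"
cat > "$sample" <<'EOF'
# minute hour day-of-month month day-of-week user command
# Every minute.
* * * * * regina echo 'version control' >> /var/log/repo-name-cronjob.log
# Every hour, on the hour.
0 * * * * regina echo 'hourly' >> /var/log/repo-name-cronjob.log
# Every day at 02:30.
30 2 * * * regina echo 'nightly' >> /var/log/repo-name-cronjob.log
# Every Monday at 09:00 (day-of-week 1 is Monday).
0 9 * * 1 regina echo 'weekly' >> /var/log/repo-name-cronjob.log
EOF

# Print only the schedule entries, skipping the comment lines.
grep -v '^#' "$sample"
```

Each asterisk means "every value" for its field, so replacing one with a number narrows the schedule along that dimension.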
The cronjob also includes a username, in this case “regina,” which indicates that cron should run the command as the “regina” user. Remember! Our crontab lives in /etc/cron.d, where each entry includes a user field (a user crontab edited with crontab -e omits it). You may need a similar indication, but this depends on your OS.
Finally, do not forget the newline! Cron ignores the crontab without it.
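A missing final newline is easy to overlook in a text editor, so it may be worth checking for one automatically. This is a minimal sketch under that assumption: the helper name ends_with_newline is hypothetical, and it relies on the shell stripping trailing newlines from command substitution.

```shell
#!/bin/sh
# ends_with_newline FILE (hypothetical helper): succeed if FILE is non-empty
# and its last byte is a newline. "$(tail -c 1 FILE)" expands to an empty
# string in that case, because command substitution drops trailing newlines.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Usage: report on a crontab passed as the first argument, if there is one.
if [ -n "${1:-}" ]; then
    if ends_with_newline "$1"; then
        echo "$1 ends with a newline"
    else
        echo "$1 is missing its final newline" >&2
    fi
fi
```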
Step 3. Update deployment scripts. DataMade uses AWS CodeDeploy, a service that automates deployments by pushing code to servers. A series of event hooks (delineated in shell scripts) determines the shape of the deployment lifecycle. In DataMade deployments, the AfterInstall hook (after_install.sh) contains our instructions about handling crontabs.
# Move the crontab from the scripts directory to the beloved `/etc/cron.d`
PROJECT_DIR="/path/to/app/on/the/server"
mv "$PROJECT_DIR/scripts/repo-name-cronjobs" /etc/cron.d/repo-name-cronjobs
# Adjust the ownership and permissions, so that the Cron service can effectively interact with the file
chown root:root /etc/cron.d/repo-name-cronjobs
chmod 644 /etc/cron.d/repo-name-cronjobs
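The same steps can be wrapped in a function, which makes them easier to try against a scratch directory before pointing them at /etc/cron.d. This is a sketch under a few assumptions, not DataMade's actual hook: the name install_crontab is hypothetical, it copies the file rather than moving it (so the versioned copy stays in the project directory), and it changes ownership only when run as root.

```shell
#!/bin/sh
# install_crontab SRC DEST_DIR (hypothetical helper): copy the crontab SRC
# into DEST_DIR, set its mode to 644, and make root the owner when possible.
install_crontab() {
    dest="$2/$(basename "$1")"
    cp "$1" "$dest" || return 1
    chmod 644 "$dest" || return 1
    # chown to root only works as root; skip it otherwise, so the
    # function can be exercised safely against a scratch directory.
    if [ "$(id -u)" -eq 0 ]; then
        chown root:root "$dest" || return 1
    fi
}

# Usage inside a deployment hook (paths are placeholders):
#   PROJECT_DIR="/path/to/app/on/the/server"
#   install_crontab "$PROJECT_DIR/scripts/repo-name-cronjobs" /etc/cron.d
```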
Feeling reluctant to get started? Take a look at some DataMade crontabs-under-control in the wild: a simple, straightforward example, a crontab with variables, and a cornucopia of cronjobs.
The Cron utility determines the basic shape of scheduling infrastructure on a server. Cron, in other words, organizes day-to-day tasks into a system, so that our digital tools can operate without surprise.
This type of infrastructure can elude obvious integration with version control. Happily, philosophies like “Infrastructure as Code” (IaC) make versioning patterns for Cron less elusive. IaC argues, in part, that infrastructural components should reside in version control, just as source code does. Versioning crontabs simply extends the IaC model to the scheduling of tasks.
More practically, versioning crontabs makes development more easeful, less error-prone, and more collaborative. At DataMade, we rarely shell into servers and adjust cronjobs, like ghosts without fingerprints. We keep clear records of how our crontabs change, and in turn, this enables precise conversations about cron intricacies, such as implementing flock or setting up unusual temporal boundaries. Fewer problems, little inconvenience, and no afternoon headaches.