Cron Jobs on Google App Engine
As a side project, I've been working on a mobile site to track my local bus system. I'll have a more detailed post about that once it is a fully polished site. Essentially, the site parses XML feeds containing latitude/longitude data and uses a maps API to display the buses' locations. It's all fairly simple, and it's built on Google App Engine, which should keep it free, or at least cheap, to run.
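The post doesn't show the feed format, so the sketch below assumes a hypothetical feed in which each bus is a `<vehicle>` element carrying `id`, `lat`, and `lon` attributes. It uses Python's standard `xml.etree.ElementTree` to turn that XML into (id, latitude, longitude) tuples that could then be handed to a maps API; `SAMPLE_FEED` and `parse_positions` are illustrative names, not part of the real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout: the real transit feed's element and
# attribute names are not shown in this post.
SAMPLE_FEED = """<vehicles>
  <vehicle id="12" lat="40.7128" lon="-74.0060"/>
  <vehicle id="34" lat="40.7306" lon="-73.9352"/>
</vehicles>"""


def parse_positions(xml_text):
    """Return a list of (vehicle_id, latitude, longitude) tuples."""
    root = ET.fromstring(xml_text)
    positions = []
    for vehicle in root.iter("vehicle"):
        positions.append((
            vehicle.get("id"),
            float(vehicle.get("lat")),
            float(vehicle.get("lon")),
        ))
    return positions


if __name__ == "__main__":
    for vid, lat, lon in parse_positions(SAMPLE_FEED):
        print(vid, lat, lon)
```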
I always wanted the app to check the XML feeds continually (they are updated every minute) so I could store the data persistently and then serve it from memcache during large spikes in traffic (hopefully my site will have this good problem). To do this, I needed to set up a cron job. Well, I was in luck: Google recently added support for cron jobs, a feature App Engine previously lacked.
The setup is really simple and intuitive. Here's the relevant file structure:
/app.yaml
/cron.yaml
/main.py
/models.py
/my_cron.py
app.yaml:
application: myapp
version: 1
runtime: python
api_version: 1
handlers:
- url: /my_cron
  script: my_cron.py
  login: admin
- url: .*
  script: main.py
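Whenever a request is sent to your app, its URL is checked against the list of handlers in order, and the first pattern that matches wins. If the URL is myapp.appspot.com/my_cron, the file my_cron.py is executed; every other URL falls through to the catch-all .* pattern and goes to the main guts of the site, main.py. Notice the login: admin line. It makes sure only an administrator can access this URL, while App Engine's cron service is still allowed to request it.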
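To make the routing rule concrete, here is a simplified model of how the handler list behaves: patterns are tried top to bottom, each must match the whole URL path, and the first match wins, which is why the catch-all .* goes last. This is an illustrative sketch, not App Engine's actual dispatcher; `HANDLERS` and `route` are hypothetical names, and the admin login check is reduced to a boolean.

```python
import re

# Mirrors the handlers section of app.yaml: (pattern, script, admin_only).
HANDLERS = [
    (r"/my_cron", "my_cron.py", True),
    (r".*", "main.py", False),
]


def route(path, is_admin=False):
    """Return the script for path, trying handlers in order.

    Raises PermissionError when a non-admin hits an admin-only handler.
    """
    for pattern, script, admin_only in HANDLERS:
        if re.fullmatch(pattern, path):  # pattern must match the whole path
            if admin_only and not is_admin:
                raise PermissionError(path + " requires an admin login")
            return script
    raise LookupError("no handler matches " + path)


if __name__ == "__main__":
    print(route("/"))
    print(route("/stops/42"))
    print(route("/my_cron", is_admin=True))
```

Swapping the two entries in `HANDLERS` would send /my_cron to main.py, because .* matches every path.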
my_cron.py:
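from models import MyModel
from tools import get_data  # helper from tools.py (not shown in this post)

# App Engine runs this script (CGI-style) each time cron requests /my_cron.
info = get_data()  # hypothetical call: fetch and parse the XML feed
i = MyModel(info=info)
i.put()  # db.Model.put() writes the entity to the datastore

print 'Content-Type: text/plain'
print ''
print 'OK'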
Another benefit of setting up this cron job was that it forced me to move my models out of main.py and into models.py (the code is looking more Django-y by the day).
Now let's tell App Engine how often we want my_cron.py to be executed.
cron.yaml:
cron:
- description: grabs some data
  url: /my_cron
  schedule: every 20 minutes
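The schedule parameter is the most important part. It can be a simple interval, as I have shown, or something much more elaborate (specific days of the week, times of day, and so on), following the schedule format described in App Engine's cron documentation.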
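The simple form used here reads as "every N minutes" or "every N hours". The sketch below parses only that interval form (App Engine's richer syntax with days and times of day is not handled) and lists the next few run times after a given start. `parse_interval` and `next_runs` are hypothetical helpers, and the sketch assumes runs are spaced exactly one interval apart, ignoring how long each job takes.

```python
import re
from datetime import datetime, timedelta

# Only the simple interval form is supported in this sketch.
_INTERVAL = re.compile(r"every (\d+) (minutes|hours)")


def parse_interval(schedule):
    """Turn 'every 20 minutes' or 'every 2 hours' into a timedelta."""
    match = _INTERVAL.fullmatch(schedule.strip())
    if match is None:
        raise ValueError("unsupported schedule: %r" % schedule)
    count, unit = int(match.group(1)), match.group(2)
    if unit == "minutes":
        return timedelta(minutes=count)
    return timedelta(hours=count)


def next_runs(schedule, start, n=3):
    """Return the next n run times after start, one interval apart."""
    step = parse_interval(schedule)
    return [start + step * i for i in range(1, n + 1)]


if __name__ == "__main__":
    start = datetime(2009, 4, 14, 12, 0)
    for run in next_runs("every 20 minutes", start):
        print(run.strftime("%H:%M"))
```

Starting from 12:00, this prints 12:20, 12:40 and 13:00.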
That was easy.
Alan
14 April 2009