/var/

Various programming stuff


Using browserify and watchify to improve your client-side-javascript workflow

The problem

Once upon a time, when people started adding client-side code to their projects, they (mainly due to the lack of decent client-side libraries, but also because of the NIH syndrome) just put their own code in script nodes or .js files, using document.getElementById to manipulate the DOM (good luck checking for all possible browser edge cases) and window.XMLHttpRequest to try doing AJAX.

After these dark times came the age of the javascript framework: prototype, jquery, dojo. These were (and still are) some great libraries (even NIH-sick people used them to handle browser incompatibilities): You just downloaded the .js file of the framework, put it inside your project’s static files, added it to your page with a script tag and then filled your client-side code with funny $(‘#id’) symbols!

Coming to the modern age of client-side development, the number of decent libraries has greatly increased, and instead of monolithic frameworks there are different libraries for different needs. So instead of just downloading a single file and adding a script node for it to your HTML, you need to download all the required javascript files, put them in your static files directory and then micro-manage the script nodes of each of your pages depending on which libraries that page needs! So if you want to use (for example) moment.js in your client-side code, you need to go to every HTML page that uses that specific client-side code and add a moment.js script element!

As you can understand, this leads to really ugly situations: people avoid refactoring their code to use external libraries, keep all their client-side code in a single global module, use CDNs to avoid downloading the javascript libraries and, of course, never upgrade those libraries!

The solution

browserify and watchify are two sister tools from the server-side-javascript (node.js and friends) world that greatly improve your javascript workflow: Using them, you no longer need to micro-manage your script tags but instead you just declare the libraries each of your client-side modules is using - or you can even create your own reusable modules! Also, installing (or updating) javascript libraries is as easy as running a single command!

How do they work? With browserify you create a single main.js for each of your HTML pages and declare its requirements in it using require. You then pass your main.js through browserify and it will create a single file (e.g. bundle.js) that contains all the requirements (of course each requirement could have other requirements - they’ll also be included automatically in the resulting .js file). That’s the only file you need to put in the script tag of your HTML! Using watchify, you can watch your main.js for changes (the changes may also be in the files required from main.js) and automatically regenerate the resulting bundle.js, so that you’ll just need to hit F5 to refresh and get the new version!

browserify not only concatenates your javascript libraries into a single bundle but can also transform your CoffeeScript, TypeScript, JSX etc. files to javascript and add them to the bundle as well. This is possible through a concept called transforms - there are a lot of transforms that you can use.

Below, I will propose a really simple and generic workflow that should cover most of your javascript needs. I should mention that I mainly develop django apps and my development machine runs windows, however you can easily use exactly the same workflow with any kind of server-side technology (ruby, python, javascript, java, php or even static HTML pages!) or development machine (windows, linux, osx) - it’s exactly the same!

Installing required tools

As already mentioned, you need two node.js tools. Just install them globally using npm (installing node.js and npm is really easy - there’s even a package for windows):

npm install -g browserify watchify

The -g switch installs the packages globally so that the browserify and watchify commands are available from your command prompt - after that, entering browserify or watchify at your command prompt should work.

Starting your (node.js) project

Although you may already have a project structure, in order to use browserify you’ll need to create a node.js project (project from now on) that needs just two things:

  • a package.json that lists various options for your project
  • a node_modules directory that contains the packages that your project uses

To create the package.json you can either copy-paste a simple one or run npm init inside a folder of your project. After npm init you’ll need to answer a bunch of questions and a package.json will be created in the same folder. If you don’t want to answer these questions (most probably you only want to use node.js for browserify - otherwise you wouldn’t be reading this) then just put an empty json object {} in package.json.

I recommend adding package.json to the top-level folder of your version-controlled source-code tree - please put this file in your version control - the node_modules directory will be created in the same directory as package.json and should be ignored by your version control.
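
For example, if you use git, a single line in your .gitignore takes care of this:

node_modules/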

Running browserify for the first time

Now the time has come to create a main.js file. This can be put anywhere you like (based on your project structure) - I’ll suppose that main.js is inside the src/ folder of your project. Just put a console.log("Hello, world") in main.js for now. To test that everything is working, run:

browserify src/main.js

You should see some minified-js gibberish in your console (something like (function e(t,n,r){function s(o,u){if(!n[o]){if(!t[o]){var a=typeof ...) ), which means that everything works fine. Now, create a dist directory which will contain your bundle files and run

browserify src/main.js -o dist/bundle.js

The -o switch will write the same minified-js gibberish to the dist/bundle.js file instead of stdout. Finally, include a script element for that file in your HTML and you should see “Hello, world” in your javascript console when opening the HTML file!
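
Assuming your HTML file lives next to the dist directory, the script element could look like this:

<script src="dist/bundle.js"></script>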

Using external libraries

To use a library from your main.js you need to install it and get a reference to it through require. Let’s try to use moment.js: To install the library run

npm install moment --save

This will create a moment directory inside node_modules that will contain the moment.js library. It will also add a dependency to your package.json (that’s what the --save switch does), something like this:

"dependencies": {
  "moment": "^2.10.3"
}

Whenever you install more client-side libraries they’ll be saved there. When you want to re-install everything (for instance when you clone your project) you can just do a

npm install

and all dependencies of package.json will be installed in node_modules (that’s why node_modules should not be tracked).

After you’ve installed moment.js to your project change src/main.js to:

moment = require('moment')
console.log(moment() );

and rerun browserify src/main.js -o dist/bundle.js. When you reload your HTML you’ll see that you are able to use moment - all this without changing your HTML!

As you can understand, in order to use a library with browserify, the library must be available as an npm package. The nice thing is that most libraries already are - as another example, let’s try to use underscore.js, supposing that (for some reason) we need underscore version 1.7:

npm install underscore@1.7 --save

you’ll see that your package.json dependencies now also contain underscore.js 1.7:

{
  "dependencies": {
    "moment": "^2.10.3",
    "underscore": "^1.7.0"
  }
}

If you want to upgrade underscore to the latest version, run:

npm install underscore@latest --save

and you’ll see that your package.json will contain the latest version of underscore.js.

Finally, let’s change our src/main.js to use underscore:

moment = require('moment')
_ = require('underscore')

_([1,2,3]).map(function(x) {
  console.log(x+1);
});

After you re-create your bundle you should see 2 3 4 in your console!

Introducing watchify

Running browserify to re-create the bundle.js every time you change your js files feels like repetitive work - this is where watchify comes to the rescue: watchify is a tool that watches your source code and its dependencies and, when a change is detected, recreates the bundle automagically!

To run it, you can use:

watchify src/main.js -o dist/bundle.js -v

and you’ll see something like: 155544 bytes written to dist/bundle.js (0.57 seconds) — try changing main.js and you’ll see that bundle.js will also be re-written!

Some things to keep in mind with watchify usage:

  • The -v flag enables verbose output (or else you won’t see any messages at all) - I like using it to be sure that everything is ok.
  • You need to use the -o flag with watchify - it can’t output to stdout (we’ll see that this changes our workflow for production a bit later)
  • watchify takes the same parameters as browserify - so if you do any transformations with browserify you can also do them with watchify

In the following, I’ll assume that you are running watchify src/main.js -o dist/bundle.js -v so your bundles will always be re-created when changes are found.

Creating your own modules

Using browserify we can create our own modules and require them in other modules using the module.exports mechanism!

Creating a module is really simple: in a normal javascript file, either assign directly to module.exports or add each local object you want to expose as an attribute of module.exports - everything else will be private to the module.

As an example, let’s create an src/modules folder and put a file module1.js inside it, containing the following:

var variable = 'variable'
var variable2 = 'variable2'
var funct = function(x) {
  return x+1;
}
var funct2 = function(x) {
  return x+1;
}

module.exports['variable'] = variable
module.exports['funct'] = funct

As you see, although we’ve defined a number of things in that module, only the variable and funct attributes of module.exports will be visible when the module is used. To use the module, change main.js like this:

module1 = require('./modules/module1')
console.log(module1.funct(9))

When you refresh your HTML you’ll see 10 in the console. So, require will return the module.exports object of each module. It will search either in your project’s node_modules (when you use just the module name, for example moment) or locally (when the path starts with either ./ or ../ - in our case we required the module module1.js from the modules folder).

As a final example, we’ll create another module that is used by module1: create a file named module2.js inside the modules folder with the following contents:

var funct = function(x) {
    return x+1;
}

module.exports = funct

After that, change module1.js to this:

module2 = require('./module2')

var variable = 'variable'
var funct = function(x) {
    return module2(x)+1;
}

module.exports['variable'] = variable
module.exports['funct'] = funct

So module1 will import the module2 module (from the same directory) and call it (since a function is assigned to its module.exports). When you refresh your HTML you should see 11!

Uglifying your bundle

If you had taken a look at the file size of your bundle.js after including moment.js or underscore.js, you’d have seen that it was greatly increased. Take a peek at bundle.js and you’ll see why: the contents of the module files are concatenated as they are, without any changes! This may be nice for development / debugging, however for production we’d like our bundle.js to be minified - or uglified, as it is called in the javascript world.

To help us with this uglification we’ll use uglify-js. First of all, install it globally:

npm install uglify-js -g

and you’ll be able to use the uglifyjs command to uglify your bundles! To use the uglifyjs command for your bundle.js try this

uglifyjs dist\bundle.js  > dist\bundle.min.js

and you’ll see the size of the bundle.min.js greatly reduced! To achieve even better minification (and code mangling as an added bonus) you could pass the -mc options to uglify:

uglifyjs dist\bundle.js -mc > dist\bundle.min.js

and you’ll see an even smaller bundle.min.js!

As a final step, we can combine the output of browserify and uglify to a single command using a pipe:

browserify src/main.js | uglifyjs -mc > dist/bundle.js

this will create the uglified bundle.js! Using the pipe to output to uglifyjs is not possible with watchify since watchify cannot output to stdout — however, as we’ll see in the next section this is not a real problem.

The client-side javascript workflow

The proposed client-side javascript workflow uses two commands, one for the development and one for creating the production bundle.

For the development, we’ll use watchify since we need to immediately re-create the bundle when a javascript source file is changed and we don’t want any uglification:

watchify src/main.js -o dist/bundle.js -v

For creating our production bundle, we’ll use browserify and uglify:

browserify src/main.js  | uglifyjs -mc warnings=false > dist/bundle.js

(I’ve added warnings=false to uglifyjs to suppress warnings).

The above two commands can either be put in batch files or added to your existing workflow (for example as fabric commands if you use fabric). However, since we already have a javascript project (i.e. a package.json) we can use that to run these commands. Just add a scripts section to your package.json like this:

{
  "dependencies": {
    "moment": "^2.10.3",
    "underscore": "^1.8.3"
  },
  "scripts": {
    "watch": "watchify src/main.js -o dist/bundle.js -v",
    "build": "browserify src/main.js  | uglifyjs -mc warnings=false > dist/bundle.js"
  }
}

and you’ll be able to run npm run watch to start watchifying for changes and npm run build to create your production bundle!

Conclusion

In the above we saw two (three if we include uglifyjs) javascript tools that will greatly improve our javascript workflow. Using these we can easily require (import) external javascript libraries into our project without any micromanagement of script tags in HTML files. We can also separate our own client-side code into self-contained modules that only export interfaces and don’t pollute the global namespace. The resulting production client-side javascript file will be minified and ready to be served to the users’ browsers.

All the above are possible with minimal changes to our code and development workflow:

  • create a package.json and install your dependencies
  • require the external libraries (instead of using them off the global namespace)
  • define your module’s interface through module.exports (instead of polluting the global namespace)
  • point the script elements in your HTML files to bundle.js (instead of the individual javascript files)
  • run npm run watch when developing and npm run build before deploying

Show 404 page on django when DEBUG=True

The default 404 error page of django can be easily overridden by adding a template named 404.html to the top level directory of your templates. However, in your development environment you’ll never see this template because when DEBUG=True django renders the debug “not found” page to help you debug your url configuration.

If you want to display that page in your development environment you can always change the DEBUG setting to False, however there’s a better way: Add a url pattern for django’s default 404 view - just add the following to your urls.py:

import django.views.defaults

urlpatterns = patterns('',
    # Other url patterns ...
    url(r'^404/$', django.views.defaults.page_not_found, ),
)

You’ll then be able to see your 404 page by visiting the defined URL!

Upgrade for newer Django versions

I’ve recently found out that for newer Django versions the page_not_found view needs a second parameter (beyond request) named exception (see here). Thus if you just add the view to your urls as I propose you’ll get an exception. To fix it, you can do something like:

from django.urls import path
from django.views.defaults import page_not_found


def custom_page_not_found(request):
    return page_not_found(request, None)

urlpatterns = [
    path("404/", custom_page_not_found),
]

This creates a simple view that calls the builtin page_not_found passing it None as the exception parameter.

Calling the REST API of Pusher from python

Introduction

Pusher is one of the best real-time frameworks right now. Using it you can add real-time events to your projects without needing to configure and run HTTP servers that support real-time events in your environment. I used it recently in a project and it worked really well, having a very simple API and a nice interface for debugging your requests.

The only problem I’ve found was that the Pusher python API misses some features that the APIs for other languages have, specifically finding out the users on a presence channel.

Pusher supports real-time events through the use of “channels”. Each pusher client will subscribe to a channel and receive messages that are sent to that channel. A special kind of channel is the presence channel, which keeps a list of its subscribers. You can query the Pusher REST API (or, for example, the Pusher Javascript API) to find out the names of the users in a presence channel - however this is not currently possible with the python API.

Unfortunately, calling the Pusher REST API is not so easy, since each request needs a rather complicated signing, so I’ve written this post to help developers that need to call this API from python (to get the users of a presence channel or for any other method the REST API supports).

Signing the request

Quoting from the Pusher REST API documentation, to sign a request we need a signature:

The signature is a HMAC SHA256 hex digest. This is generated by signing a string made up of the following components concatenated with newline characters \n:

  • The uppercase request method (e.g. POST)
  • The request path (e.g. /some/resource)
  • The query parameters sorted by key, with keys converted to lowercase, then joined as in the query string. Note that the string must not be url escaped (e.g. given the keys auth_key: foo, Name: Something else, you get auth_key=foo&name=Something else)

So, we need to create a string and then sign it using our Pusher api_key and secret. To help with this, we create a Token class which will be initialized with our Pusher key/secret and correctly sign a string:

class Token(object):
    def __init__(self, key, secret):
        self.key = key
        self.secret = secret

    def sign(self, string):
        return hmac.new(self.secret, string, hashlib.sha256).hexdigest()

It uses the hmac and hashlib python modules.
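
As a quick sanity check, the Token can be exercised on its own - the key, secret, path and timestamp below are made-up placeholders:

token = Token('my_key', 'my_secret')
# Build the string to be signed: method, path and sorted query parameters joined with newlines
sign_data = '\n'.join([
    'GET',
    '/apps/33/channels/presence-chat/users',
    'auth_key=my_key&auth_timestamp=1422000000&auth_version=1.0',
])
signature = token.sign(sign_data)  # a 64-character hex digest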

Generating the complete query string

We can now create a function that will sign a request using an instance of the above token:

def create_signed_query_string(token, partial_path, request_params):
    params = {
        'auth_key': token.key,
        'auth_timestamp': int(time.time()),
        'auth_version': '1.0'
    }
    params.update(request_params)
    keys = sorted(params.keys() )
    params_list = []
    for k in keys:
        params_list.append( '{0}={1}'.format(k, params[k]) )

    query_string = '&'.join(params_list)

    sign_data = '\n'.join(['GET', partial_path, query_string])
    query_string += '&auth_signature=' + token.sign(sign_data);
    return query_string

create_signed_query_string receives an instance of a Token, the path that we want to request without the server part (for example /apps/33/channels/my-channel/users) and a dictionary of request parameters. It then adds three extra fields to the request parameters (auth_key, auth_timestamp, auth_version) and creates a list of these parameters in the key=value form, with the keys alphabetically sorted. After that it joins the above key=value parameters using & to create the query_string and then creates the string to be signed (sign_data) by concatenating the HTTP method (GET), the path and the query_string with newlines. Finally, it appends the signing result as an extra query parameter named auth_signature.

Requesting the users of the presence channel

The create_signed_query_string can now be used to get the users of a presence channel like this:

def get_users(app_id, key, secret, channel):
    partial_path =  '/apps/{0}/channels/{1}/users'.format(app_id, channel)
    token = Token(key, secret)
    qs = create_signed_query_string(token, partial_path, {})
    full_path = 'http://api.pusherapp.com/{0}?{1}'.format(partial_path, qs)
    r = requests.get(full_path)
    return r.text

The get_users function will generate the path of the Pusher REST API (using our pusher app_id and channel name) and initialize a signing Token using the pusher key and secret. It will then pass these to create_signed_query_string to generate the complete query_string and build the full_path, to which a simple HTTP GET request is issued. The result is a JSON list of the users in the presence channel.

Complete example

A complete example of getting the presence users of a channel is the following:

import time
import hashlib
import hmac
import requests

app_id = 'pusher_app_id'
key = 'pusher_key'
secret = 'pusher_secret'
channel = 'pusher_presence_channel'


class Token(object):
    def __init__(self, key, secret):
        self.key = key
        self.secret = secret

    def sign(self, string):
        return hmac.new(self.secret, string, hashlib.sha256).hexdigest()


def create_signed_query_string(token, partial_path, method, request_params):
    params = {
        'auth_key': token.key,
        'auth_timestamp': int(time.time()),
        'auth_version': '1.0'
    }
    params.update(request_params)
    keys = sorted(params.keys() )
    params_list = []
    for k in keys:
        params_list.append( '{0}={1}'.format(k, params[k]) )

    query_string = '&'.join(params_list)

    sign_data = '\n'.join([method, partial_path, query_string])
    query_string += '&auth_signature=' + token.sign(sign_data);
    return query_string


def get_users(channel):
    partial_path =  '/apps/{0}/channels/{1}/users'.format(app_id, channel)
    token = Token(key, secret)
    qs = create_signed_query_string(token, partial_path, 'GET', {})
    full_path = 'http://api.pusherapp.com/{0}?{1}'.format(partial_path, qs)
    r = requests.get(full_path)
    return r.text

print get_users(channel)

Conclusion

With the above we are able to not only easily get the users of a Pusher presence channel in python but to also call any method we want from the Pusher REST API by implementing a function similar to get_users.

Asynchronous tasks in django with django-rq

Update 03/05/2022: The github project has been updated to work with latest version of Django (4.0.4) and Python (3.10): https://github.com/spapas/django-test-rq

Update 01/09/15: I’ve written a new post about rq and django with some more advanced techniques !

Introduction

Job queuing (asynchronous tasks) is a common requirement for non-trivial django projects. Whenever an operation can take more than half a second it should be put in a job queue in order to be run asynchronously by a separate worker. This is really important since the response to a user request needs to be immediate or else the users will experience laggy behavior and start complaining! Even for fairly quick tasks (like sending email through an SMTP server) you need to use an asynchronous task if you care about your users, since the time required for such a task is not really bounded.

Using job queues adds complexity not only for the developers of the application (who need to create the asynchronous tasks and give feedback to the users when they’ve finished, since they can’t use the normal HTTP response) but also for the administrators, since, in order to support job queues, at least two more components will be needed:

  • A job queue that stores the jobs to be executed, in first-in-first-out order. This could be the normal database of the project, however that’s not recommended for performance reasons; most of the time it is a specific component called a “message broker”
  • One (or more) workers that monitor the job queue and, when there is work to do, dequeue and execute it

These can all run on the same server, but if it gets saturated they can easily be separated (even more work for the administrators).

Beyond job queuing, another related requirement for many projects is to schedule a task to be run in the future (similar to the at unix command) or at specific time intervals (similar to the cron unix command). For instance, if a user registers today we may need to check after one or two days whether he’s logged in and used our application - if he hasn’t then he’s probably having problems and we can call him to help. Also, we could check every night whether any users that have registered to our application haven’t activated their account through email activation and delete those accounts. Scheduled tasks should also be run by the workers mentioned above.

Job queues in python

The best known solution for job queues in python is celery, which is a really great project that supports many brokers, integrates nicely with python/django (but can also be used with other languages) and has many more features (most of them only useful in really big, enterprise projects). I’ve already used it in a previous application, however, because celery is really complex I found it rather difficult to configure successfully and I was never perfectly sure that my asynchronous tasks would actually work or that I’d used the correct configuration for my needs!

Celery also has many dependencies in order to be able to talk with the different broker backends it supports, improve multithreading support etc. They may be required in enterprise apps but not for most Django web based projects.

So, for small-to-average projects I recommend using a different asynchronous task solution instead of celery, particularly (as you’ve already guessed from the title of this post) RQ. RQ is simpler than celery, it integrates great with django using the excellent django-rq package and doesn’t actually have any more dependencies beyond redis support which is used as a broker (however most modern django projects already use redis for their caching needs as an alternative to memcached).

It even supports job scheduling through the rq-scheduler package (celery also supports job scheduling through celery beat): a different process (the scheduler) polls the scheduled-jobs queue for any jobs whose time has come and, if there are any, puts them in the normal job queue.

Although RQ and friends are really easy to use (and have nice documentation) I wasn’t able to find a complete example of using it with django, so I’ve implemented one (found at https://github.com/spapas/django-test-rq - since I’ve updated this project a bit with new stuff, please check out the django-test-rq-simple tag: git checkout django-test-rq-simple), mainly for my own testing purposes. To help others that also want to use RQ in their project but don’t know where to start, I’ll present it in the following paragraphs, along with some comments on how to actually use RQ in your production environment.

django-test-rq

This is a simple django project that can be used to asynchronously run and schedule jobs and examine their behavior. The job to be scheduled just downloads a provided URL and counts its length. There is only one django application (tasks) that contains two views, one to display existing tasks and create new ones and one to display some info for the jobs.

models.py

Two models (Task and ScheduledTask) for saving individual tasks and scheduled tasks and one model (ScheduledTaskInstance) to save scheduled instances of each scheduled task.

from django.db import models
import requests
from rq import get_current_job


class Task(models.Model):
    # A model to save information about an asynchronous task
    created_on = models.DateTimeField(auto_now_add=True)
    name = models.CharField(max_length=128)
    job_id = models.CharField(max_length=128)
    result = models.CharField(max_length=128, blank=True, null=True)


class ScheduledTask(models.Model):
    # A model to save information about a scheduled task
    created_on = models.DateTimeField(auto_now_add=True)
    name = models.CharField(max_length=128)
    # A scheduled task has a common job id for all its occurences
    job_id = models.CharField(max_length=128)


class ScheduledTaskInstance(models.Model):
    # A model to save information about instances of a scheduled task
    scheduled_task = models.ForeignKey('ScheduledTask')
    created_on = models.DateTimeField(auto_now_add=True)
    result = models.CharField(max_length=128, blank=True, null=True)

forms.py

A very simple form to create a new task.

from django import forms

class TaskForm(forms.Form):
    """ A simple form to read a url from the user in order to find out its length
    and either run it asynchronously or schedule it schedule_times times,
    every schedule_interval seconds.
    """
    url = forms.CharField(label='URL', max_length=128, help_text='Enter a url (starting with http/https) to start a job that will download it and count its words' )
    schedule_times = forms.IntegerField(required=False, help_text='How many times to run this job. Leave empty or 0 to run it only once.')
    schedule_interval = forms.IntegerField(required=False, help_text='How much time (in seconds) between runs of the job. Leave empty to run it only once.')

    def clean(self):
        data = super(TaskForm, self).clean()
        schedule_times = data.get('schedule_times')
        schedule_interval = data.get('schedule_interval')

        if schedule_times and not schedule_interval or not schedule_times and schedule_interval:
            msg = 'Please fill both schedule_times and schedule_interval to schedule a job or leave them both empty'
            self.add_error('schedule_times', msg)
            self.add_error('schedule_interval', msg)

views.py

This is actually very simple if you’re familiar with Class Based Views. Two CBVs are defined, one for the Task form + Task display and another for the Job display.

from django.views.generic.edit import FormView
from django.views.generic import TemplateView
from forms import TaskForm
from tasks import get_url_words, scheduled_get_url_words
from models import Task,ScheduledTask
from rq.job import Job
import django_rq
import datetime

class TasksHomeFormView(FormView):
    """
    A class that displays a form to read a url to read its contents and if the job
    is to be scheduled or not and information about all the tasks and scheduled tasks.

    When the form is submitted, the task will be either scheduled based on the
    parameters of the form or will be just executed asynchronously immediately.
    """
    form_class = TaskForm
    template_name = 'tasks_home.html'
    success_url = '/'

    def form_valid(self, form):
        url = form.cleaned_data['url']
        schedule_times = form.cleaned_data.get('schedule_times')
        schedule_interval = form.cleaned_data.get('schedule_interval')

        if schedule_times and schedule_interval:
            # Schedule the job with the form parameters
            scheduler = django_rq.get_scheduler('default')
            job = scheduler.schedule(
                scheduled_time=datetime.datetime.now(),
                func=scheduled_get_url_words,
                args=[url],
                interval=schedule_interval,
                repeat=schedule_times,
            )
        else:
            # Just execute the job asynchronously
            get_url_words.delay(url)
        return super(TasksHomeFormView, self).form_valid(form)

    def get_context_data(self, **kwargs):
        ctx = super(TasksHomeFormView, self).get_context_data(**kwargs)
        ctx['tasks'] = Task.objects.all().order_by('-created_on')
        ctx['scheduled_tasks'] = ScheduledTask.objects.all().order_by('-created_on')
        return ctx


class JobTemplateView(TemplateView):
    """
    A simple template view that gets a job id as a kwarg parameter
    and tries to fetch that job from RQ. It will then print all attributes
    of that object using __dict__.
    """
    template_name = 'job.html'

    def get_context_data(self, **kwargs):
        ctx = super(JobTemplateView, self).get_context_data(**kwargs)
        redis_conn = django_rq.get_connection('default')
        try:
            job = Job.fetch(self.kwargs['job'], connection=redis_conn)
            job = job.__dict__
        except:
            job = None

        ctx['job'] = job
        return ctx

tasks.py

Here two jobs are defined: one to be used for simple asynchronous tasks and the other to be used for scheduled asynchronous tasks (since for scheduled tasks we want to group their runs per job id).

The @job decorator adds the delay() method (used in views.py) to the function. It’s not really required for scheduled_get_url_words since that one is called through scheduler.schedule.

When a task finishes, it can return a value (like we do with return task.result) which will be saved in redis for a limited amount of time (500 seconds by default - it could even be saved forever). This may be useful in some cases, however I think that for normal web applications it’s not that useful, and since here we use normal django models for each task, we can save the result to that model’s instance instead.

import requests
from models import Task, ScheduledTask, ScheduledTaskInstance
from rq import get_current_job
from django_rq import job


@job
def get_url_words(url):
    # This creates a Task instance to save the job instance and job result
    job = get_current_job()

    task = Task.objects.create(
        job_id=job.get_id(),
        name=url
    )
    response = requests.get(url)
    task.result = len(response.text)
    task.save()
    return task.result


@job
def scheduled_get_url_words(url):
    """
    This creates a ScheduledTask instance for each group of
    scheduled task - each time this scheduled task is run
    a new instance of ScheduledTaskInstance will be created
    """
    job = get_current_job()

    task, created = ScheduledTask.objects.get_or_create(
        job_id=job.get_id(),
        name=url
    )
    response = requests.get(url)
    response_len = len(response.text)
    ScheduledTaskInstance.objects.create(
        scheduled_task=task,
        result = response_len,
    )
    return response_len

settings.py

import os
BASE_DIR = os.path.dirname(os.path.dirname(__file__))

SECRET_KEY = '123'
DEBUG = True
TEMPLATE_DEBUG = True
ALLOWED_HOSTS = []

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'django_extensions',
    'django_rq',

    'tasks',
)

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
)

ROOT_URLCONF = 'django_test_rq.urls'
WSGI_APPLICATION = 'django_test_rq.wsgi.application'

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}

LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True

STATIC_URL = '/static/'

# Use redis for caches
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        }
    }
}

# Use the same redis as with caches for RQ
RQ_QUEUES = {
    'default': {
        'USE_REDIS_CACHE': 'default',
    },
}

SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"
RQ_SHOW_ADMIN_LINK = True

# Add a logger for rq_scheduler in order to display when jobs are queueud
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {
            'format': '%(asctime)s %(levelname)s %(message)s'
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'simple'
        },
    },

    'loggers': {
        'django.request': {
            'handlers': ['console'],
            'level': 'DEBUG',
            'propagate': True,
        },
        'rq_scheduler': {
            'handlers': ['console'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}

By default, rq_scheduler won’t log anything so we won’t be able to see any output when new instances of each scheduled task are queued for execution. That’s why we’ve overridden the LOGGING setting in order to actually log rq_scheduler output to the console.

Running the project

I recommend using Vagrant to start a stock ubuntu/trusty32 box. After that, install redis, virtualenv and virtualenvwrapper and create/activate a virtualenv named rq. You can go to the home directory of django-test-rq and install the requirements through pip install -r requirements.txt and create the database tables with python manage.py migrate. Finally you may run the project with python manage.py runserver_plus.

rqworker and rqscheduler

Before scheduling any tasks we need to run two more processes:

  • rqworker: This is a worker that dequeues jobs from the queue and executes them. We could run more than one instance of this worker if needed.
  • rqscheduler: This is a process that runs every one minute and checks if there are scheduled jobs that have to be executed. If yes, it will add them to the queue in order to be executed by a worker.

For development

If you want to run rqworker and rqscheduler in your development environment you can just run python manage.py rqworker and python manage.py rqscheduler through screen/tmux. If everything is all right you should see tasks being added to the queue and scheduled (you may need to refresh the homepage before seeing everything since a task may be executed after the response is created).

Also, keep in mind that rqscheduler polls once every minute by default, so you may need to wait up to a minute to see a ScheduledTask instance. This also means that you can’t run a scheduled task more often than once per minute.

For production

Creating daemons through screen is not sufficient for a production environment, since we’d like to have logging, monitoring and, of course, rqworker and rqscheduler starting automatically when the server boots.

For this, I recommend using the supervisord tool which can be used to monitor and control a number of processes. There are other similar tools, however I’ve found supervisord the easiest to use.

In order to monitor/control a process through supervisord you need to add a [program:progname] section to supervisord’s configuration and pass a number of parameters. The progname is the name of the monitored process. Here’s how rqworker can be configured using supervisord:

[program:rqworker]
command=python manage.py rqworker
directory=/vagrant/progr/py/rq/django-test-rq
environment=PATH="/home/vagrant/.virtualenvs/rq/bin"
user=vagrant

The options used will chdir to directory and execute command as user. The environment option can be used to set environment variables - here we set PATH in order to use a specific virtual environment. This will allow you to monitor rqworker through supervisord and log its output to a file in /var/log/supervisor (by default). A similar entry needs to be added for rqscheduler of course (see the example after the following output). If everything has been configured correctly, when you reload the supervisord settings you can run sudo /usr/bin/supervisorctl and should see something like

rqscheduler                      RUNNING    pid 1561, uptime 0:00:03
rqworker                         RUNNING    pid 1562, uptime 0:00:03
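
For completeness, here’s how the rqscheduler entry could look - it mirrors the rqworker one (same directory, virtualenv and user assumptions as above):

[program:rqscheduler]
command=python manage.py rqscheduler
directory=/vagrant/progr/py/rq/django-test-rq
environment=PATH="/home/vagrant/.virtualenvs/rq/bin"
user=vagrant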

Also, the log files should contain some debug info.

Conclusion

Although using job queues makes things more difficult for the developer and adds at least one (and probably more) point of failure to a project (the workers, the broker etc), their usage is unavoidable, even for very simple projects.

Unless a complex, enterprise solution like celery is really required for a project, I recommend using the much simpler and easier to configure RQ for all your asynchronous and scheduled task needs. Using RQ (and the related projects django-rq and rq-scheduler) we can easily add production-ready queued and scheduled jobs to any django project.

In this article we presented a small introduction to RQ and its friends and saw how to configure django to use it in a production ready environment using a small django project (https://github.com/spapas/django-test-rq) which was implemented as a companion to help readers quickly test the concepts presented here.

Django model auditing

Introduction

An audit trail is a common requirement in most non-trivial applications. Organizations need to know who made a change, when it was made and what was actually changed. In this post we will see three different solutions for adding this functionality to Django: doing it ourselves, using django-simple-history and using django-reversion.

Update 24/09/2015: Added a paragraph describing the django-reversion-compare which is a great addon for django-reversion that makes finding differences between versions a breeze!

Adding simple auditing functionality ourselves

A simple way to do auditing is to keep four extra fields in our models: created_by, created_on, modified_by and modified_on. The first two will be filled when the model instance is created, while the latter two will be changed whenever the model instance is saved. So we only get the who and the when. Sometimes these are enough, so let’s see how easy it is to implement this in django.

We’ll need an abstract model that could be used as a base class for models that need auditing:

from django.conf import settings
from django.db import models

class Auditable(models.Model):
    created_on = models.DateTimeField(auto_now_add = True)
    created_by = models.ForeignKey(settings.AUTH_USER_MODEL, related_name='created_by')

    modified_on = models.DateTimeField(auto_now = True)
    modified_by = models.ForeignKey(settings.AUTH_USER_MODEL, related_name='modified_by')

    class Meta:
        abstract = True

Models inheriting from Auditable will contain their datetime of creation and modification, which will be automatically filled using the very useful auto_now_add (which sets the current datetime when the model instance is created) and auto_now (which sets the current datetime when the model instance is saved).

Such models will also have two foreign keys to User, one for the user that created them and one for the user that modified them. The problem with these two fields is that they cannot be filled automatically (like the datetimes) because the user that actually created/changed the objects must be provided!

Since I am really fond of CBVs I will present a simple mixin that can be used with CreateView and UpdateView and does exactly that:

class AuditableMixin(object):
    def form_valid(self, form):
        if not form.instance.created_by:
            form.instance.created_by = self.request.user
        form.instance.modified_by = self.request.user
        return super(AuditableMixin, self).form_valid(form)

The above mixin overrides the form_valid method of CreateView and UpdateView: first it checks whether the object is being created (in that case it won’t have a created_by user yet) in order to set the created_by attribute to the current user. After that it sets the modified_by attribute of the object to the current user. Finally, it calls the next form_valid method to do whatever is required (save the model instance and redirect to success_url by default).

The views using AuditableMixin should allow only logged in users (or else an exception will be thrown). Also, don’t forget to exclude the created_by and modified_by fields from your model form (created_on and modified_on will automatically be excluded).

Example

Let’s see a simple example of creating a small django application using the previously defined abstract model and mixin:

models.py

from django.conf import settings
from django.core.urlresolvers import reverse
from django.db import models

from auditable.models import Auditable


class Book(Auditable):
    name = models.CharField(max_length=128)
    author = models.CharField(max_length=128)

    def get_absolute_url(self):
        return reverse("book_list")

In the above we suppose that the Auditable abstract model is imported from the auditable.models module and that a view named book_list that shows all books exists.

forms.py

from django.forms import ModelForm

from models import Book


class BookForm(ModelForm):
    class Meta:
        model = Book
        fields = ['name', 'author']

Show only name and author fields (and not the auditable fields) in the Book ModelForm.

views.py

from django.views.generic.edit import CreateView, UpdateView
from django.views.generic import ListView

from auditable.views import AuditableMixin

from models import Book
from forms import BookForm


class BookCreateView(AuditableMixin, CreateView):
    model = Book
    form_class = BookForm


class BookUpdateView(AuditableMixin, UpdateView):
    model = Book
    form_class = BookForm


class BookListView(ListView):
    model = Book

We import the AuditableMixin from auditable.views and make our Create and Update views inherit from this mixin in addition to CreateView and UpdateView. Note that our mixin is placed before CreateView in order to call form_valid in the proper order: when multiple inheritance is used like this, python checks each class from left to right to find the proper method and calls it. For example, in our BookCreateView, when the form_valid method is called, python first checks whether BookCreateView has a form_valid method. Since it does not, it checks whether AuditableMixin has a form_valid method and calls it. Since we call super(...).form_valid() inside the AuditableMixin form_valid, the form_valid of CreateView will also be called.

A simple ListView is also added to just show the info on all books.

urls.py

from django.conf.urls import patterns, include, url

from views import BookCreateView, BookUpdateView, BookListView

urlpatterns = patterns('',
    url(r'^accounts/login/$', 'django.contrib.auth.views.login', ),
    url(r'^accounts/logout/$', 'django.contrib.auth.views.logout', ),

    url(r'^create/$', BookCreateView.as_view(), name='create_book'),
    url(r'^update/(?P<pk>\d+)/$', BookUpdateView.as_view(), name='update_book'),
    url(r'^$', BookListView.as_view(), name='book_list'),
)

Just add the previously defined Create/Update/List views along with login/logout views.

templates

You’ll need four templates:

  • books/book_list.html: Show the list of books
  • books/book_form.html: Show the book editing form
  • registration/login.html: Login form
  • registration/logout.html: Logout message

Using django-simple-history

django-simple-history can be used to store not only the user and date of each modification but also a different version for each modification. To do that, for every model that is registered with django-simple-history, it will create a second table in the database hosting all versions (historical records) of that model. As you can understand, this is really powerful since we can see exactly what was changed and also run normal SQL queries on it!

Installation

To use django-simple-history in a project, after we do a pip install django-simple-history, we just need to add it to INSTALLED_APPS and add the simple_history.middleware.HistoryRequestMiddleware to the MIDDLEWARE_CLASSES list.

Finally, to keep the historical records for a model, just add an instance of HistoricalRecords as an attribute of this model.

Example

For example, our previously defined Book model will be modified like this:

from simple_history.models import HistoricalRecords


class SHBook(models.Model):
    name = models.CharField(max_length=128)
    author = models.CharField(max_length=128)

    def get_absolute_url(self):
        return reverse("shbook_list")

    history = HistoricalRecords()

When we run python manage.py makemigrations and migrate this, we’ll see that beyond the table for SHBook, a table for HistoricalSHBook will be created:

Migrations for 'sample':
  0002_historicalshbook_shbook.py:
    - Create model HistoricalSHBook
    - Create model SHBook

Let’s see the schema of historicalshbook:

CREATE TABLE "sample_historicalshbook" (
    "id" integer NOT NULL,
    "name" varchar(128) NOT NULL,
    "author" varchar(128) NOT NULL,
    "history_id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "history_date" datetime NOT NULL,
    "history_type" varchar(1) NOT NULL,
    "history_user_id" integer NULL REFERENCES "auth_user" ("id")
);

So we see that it has the same fields as SHBook (id, name, author) with the addition of a primary key (history_id) for the historical record, the date and user of the change (history_date, history_user_id) and the type of the change (create / update / delete).

So, just by adding a HistoricalRecords() attribute to our model definition we get complete auditing for the instances of that model.

Usage

To find out information about the historical records we just use the history attribute (the HistoricalRecords() instance we added) of that model:

For example, running SHBook.history.filter(id=1) will return all historical records of the book with id = 1. For each one of them we can use the following:

  • get the user that made the change through the history_user attribute
  • get the date of the change through the history_date attribute
  • get the type of the change through the history_type attribute (and the corresponding get_history_type_display())
  • get a model instance as it was then through the history_object attribute (in order to save() it and revert to this version)
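
Putting these together, here’s a short sketch (assuming the book with id 1 from above exists):

book_history = SHBook.history.filter(id=1)
for record in book_history:
    user = record.history_user                        # who made the change
    when = record.history_date                        # when the change was made
    change_type = record.get_history_type_display()   # create / update / delete

# Revert the book to its oldest recorded version
oldest = SHBook.history.filter(id=1).order_by('history_date')[0]
oldest.history_object.save()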

Using django-reversion

django-reversion offers more or less the same functionality as django-simple-history but follows a different philosophy: instead of creating an extra table holding the history records for each model, it serializes all the fields of each model to JSON and stores that JSON in a text field in the database.

This has the advantage that no extra tables are created in the database, but the disadvantage that you can’t easily query your historical records. So you may choose one or the other depending on your actual requirements.

Installation

To use django-reversion in a project, after we do a pip install django-reversion, we just need to add it to INSTALLED_APPS and add the reversion.middleware.RevisionMiddleware to the MIDDLEWARE_CLASSES list.

In order to save the revisions of a model, you need to register this model with django-reversion. This can be done either through the django admin, by inheriting the admin class of that model from reversion.VersionAdmin, or, if you don’t want to use the admin, by using the reversion.register decorator.

Example

To use django-reversion to keep track of changes to Book we can modify it like this:

import reversion


@reversion.register
class RBook(models.Model):
    name = models.CharField(max_length=128)
    author = models.CharField(max_length=128)

    def get_absolute_url(self):
        return reverse("rbook_list")

django-reversion uses two tables in the database to keep track of revisions: revision and version. Let’s take a look at their schemata:

.schema reversion_revision
CREATE TABLE "reversion_revision" (
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "manager_slug" varchar(200) NOT NULL,
    "date_created" datetime NOT NULL,
    "comment" text NOT NULL,
    "user_id" integer NULL REFERENCES "auth_user" ("id")
);

.schema reversion_version
CREATE TABLE "reversion_version" (
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "object_id" text NOT NULL,
    "object_id_int" integer NULL,
    "format" varchar(255) NOT NULL,
    "serialized_data" text NOT NULL,
    "object_repr" text NOT NULL,
    "content_type_id" integer NOT NULL REFERENCES "django_content_type" ("id"),
    "revision_id" integer NOT NULL REFERENCES "reversion_revision" ("id")
);

As we can understand, the revision table holds information like who created the revision (user_id) and when (date_created), while the version table stores a reference to the object that was modified (through a GenericForeignKey) and the actual data (in the serialized_data field). By default JSON is used to serialize the data (the serialization format is stored in the format field). In this simple case there’s a one-to-one relation between revision and version.

If we create an instance of RBook we’ll see the following in the database:

sqlite> select * from reversion_revision;
1|default|2015-01-21 10:31:25.233000||1

sqlite> select * from reversion_version;
1|1|1|json|[{"fields": {"name": "asdasdasd", "author": "asdasd"}, "model": "sample.rbook", "pk": 1}]|RBook object|12|1

date_created and user_id are stored on revision while format, serialized_data, content_type_id and object_id_int (the GenericForeignKey) are stored in version.

Usage

To find out information about an object you have to use the reversion.get_for_object(object) method. In order to be easily used in templates I recommend creating the following get_versions() method in each model that is registered with django-reversion

def get_versions(self):
    return reversion.get_for_object(self)

Now, each version has a revision attribute for the corresponding revision and can be used to do the following:

  • get the user that made the change through the revision.user attribute
  • get the date of the change through the revision.date_created attribute
  • get the values of the object fields as they were in this revision using the field_dict attribute
  • get a model instance as it was on that revision using the object_version.object attribute
  • revert to that previous version of that object using the revert() method
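
A short sketch combining the above (assuming an RBook with pk 1 exists and that reversion has been imported):

book = RBook.objects.get(pk=1)
for version in reversion.get_for_object(book):
    user = version.revision.user                   # who made the change
    when = version.revision.date_created           # when the change was made
    fields = version.field_dict                    # field values at that revision
    old_instance = version.object_version.object   # model instance as it was then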

Comparing versions with django-reversion-compare

A great addon for django-reversion is django-reversion-compare, which helps you find the differences between versions of your objects. When you use django-reversion-compare, you’ll be able to select two (different) versions of your object and be presented with a list of all the differences found in the fields of that object between the two versions. The diff algorithm is smart, so you’ll be able to easily recognise the changes.

To use django-reversion-compare, after installing it you should just inherit your admin views from reversion_compare.admin.CompareVersionAdmin (instead of reversion.VersionAdmin) and you’ll get the reversion-compare views instead of reversion views in the admin for the history of the object.

Also, in case you need to give access to normal, non-admin users to the history of an object (this is useful for auditing reasons), you can use the reversion_compare.views.HistoryCompareDetailView as a normal DetailView to create a non-admin history and compare diff view.
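
For example, here’s a minimal sketch of such a urls.py entry for the RBook model - the URL pattern and the rbook_history.html template name are assumptions:

from django.conf.urls import url
from reversion_compare.views import HistoryCompareDetailView

from models import RBook

urlpatterns = [
    # Non-admin history & compare view for a single RBook instance
    url(r'^rbook-history/(?P<pk>\d+)/$',
        HistoryCompareDetailView.as_view(model=RBook, template_name='rbook_history.html'),
        name='rbook_history'),
]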

Conclusion

In the above we saw that it is really easy to add basic (who and when) auditing capabilities to your models: you just need to inherit your models from the Auditable abstract class and your Create and Update CBVs from AuditableMixin. If you want to know exactly what was changed then you have two solutions: django-simple-history, which creates an extra table for each of your models so you’ll be able to query your historical records (and easily extract aggregates, statistics etc), and django-reversion, which saves each version as a JSON object, so no extra tables are created.

All three solutions for auditing have been implemented in a sample project at https://github.com/spapas/auditing-sample.

You can clone the project and, preferably in a virtual environment, install the requirements (pip install -r requirements.txt), run the migrations (python manage.py migrate - it uses sqlite3 by default) and start the local development server (python manage.py runserver).