/var/

Various programming stuff

Using both Python 2 and 3 in Windows

The release of Django 2.0 was a milestone in the history of Python since it completely dropped support for Python 2.x. I am a long time user of Django and, loyal to the philosophy of “if it is working don’t change” I was always using Python 2.x.

Of course right now this needs to change since new applications will need to be developed to the latest version of Django (and Python). I am using Windows for development (almost exclusively) and what I wanted was to be able to create and use virtual environments for both my old projects (using Python 2) and new projects (using Python 3).

The above requirement is not as straight-forward as I’d like - actually it is if you know what you need to do and which tools you must use. That’s why I decided to write a quick step by step tutorial on how to use both versions of Python in your Windows environment. Notice that I am using Windows 10, Python 2.7.14 an Python 3.6.4.

First of all, let’s download both versions of Python from the Python download page. I downloaded the files python-2.7.14.msi and python-3.6.4.exe (not sure why the one is .msi and the other is .exe it doesn’t matter anyway).

Firstly I am installing Python 2.7.14 and selecting:

  • Install for all users
  • Install to default folder
  • Press next to following screen (install default customizations)
  • This won’t add the python.exe of Python 2.7 to path

Next I am install Python 3.6.4:

  • Make sure to click “Install launcher for all users (recommended)”
  • I also check “Add Python 3.6to PATH” (to add the Python 3.6 executable to path)
  • I then just click “Install Now” (this will put Python 3.6 to c:)

Right now if you open a terminal window (Windows+r, cmd.exe) and run Python you will initiate the Python 3.6 interpreter. This is useful for just dropping in a Python interpreter.

Remember the “Install launcher for all users (recommended)” we clicked before? This installs the python launcher which is used to select between Python versions. If you run it without parameters the Python 3.6 interpreter will by started. You can pass the -2 parameter to start the python 2.x interpreter or -3 to explicitly declare the python 3.x interpreter:

C:\Users\Serafeim>py -2
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z


C:\Users\Serafeim>py -3
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z

With py we can easily start the Python interpreter we want. This is not enough though - we need to use virtualenv and create proper virtual environments for our projects. To do this you can add the -m option to py to run a module with the proper python version. For example, to start an http server with Python 2 you would use the module SimpleHTTPServer while for python 3 you would use http.server (as per this):

C:\progr\py\pelican\spapas.github.io>py -2 -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...

C:\progr\py\pelican\spapas.github.io>py -3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...

Now, to create the virtual environments we’ll use the virtualenv module which is installed by default in Python 2.x (try running py -2 -m pip freeze to see the list of installed packages for Python 2.x) and the venv module which is included in the Python 3.x core. So to create a virtualenv for Python 2 we’ll run py -2 -m virtualenv name-of-virtualenv and for Python 3 py -3 -m venv name-of-virtualenv.

C:\progr\py>py -2 -m virtualenv venv-2
New python executable in C:\progr\py\venv-2\Scripts\python.exe
Installing setuptools, pip, wheel...done.

C:\progr\py>py -3 -m venv venv-3

C:\progr\py>venv-2\Scripts\activate

(venv-2) C:\progr\py>python -V
Python 2.7.14

(venv-2) C:\progr\py>deactivate

C:\progr\py>venv-3\Scripts\activate

(venv-3) C:\progr\py>python -V
Python 3.6.4

That’s how easy it is to have both Python 2.7 and Python 3.6 in your Windows!

My essential django package list

In this article I’d like to present a list of django packages (add-ons) that I use in most of my projects. I am using django for more than 5 years as my day to day work tool to develop applications for the public sector organization I work for. So please keep in mind that these packages are targeted to the “enterpripse” audience and some things that target public (open access) web apps may be missing.

Some of these packages are included in a django cookiecutter I am using to start new projects and also you can find example usage of some of these packages in my Mailer Server project. I’ll also be happy to answer any questions about these, preferably in stackoverflow if the questions are fit for that site.

Finally, before going to the package list, notice that most of the following are django-related and I won’t include any generic python packages (with the exception of xhtml2pdf which I discuss later). There are a bunch of python-generic packages that I usually use (xlrd, xlwt, requests, raven, unidecode, database connectors etc) but these won’t be discussed here.

Packages I use in all my projects

These packages are used more or less in all my projects - I just install them when I create a new project (or create the project through the cookiecutter I mention above)/

django-tables2

django-tables2, along with django-filter (following) are the two packages I always use in my projets. You create a tables.py module inside your applications where you define a bunch of tables using more or less the same methodology as with django-forms: For a Book model, Create a BookTable class, relate it with the model and, if needed override some of the columns it automatically generates. For example you can configure a column that will behave as a link to a detail view of the book, a column that will behave as a date (with the proper formatting) or even a column that will display a custom django template snippet.

You’ll then be able to configure the data (queryset) of this table and pass it your views context. Finally use {% render_table table %} to actually display your table. What you’ll get? The actual html table (with a nice style that can be configured if needed), free pagination and free column-header sorting. After you get the hang of it, you’ll be able to add tables for your models in a couple of minutes - this is DRY for me. It also offers some class based views and mixins for even more DRY adding tables to your views.

Finally, please notice that since version 1.8, django-tables2 supports exporting data using tablib, an old requirement of most users of this library.

django-filter

django-filter is used to create filters for your data. It plays well with django-tables2 but you can use it to create filters for any kind of listing you want. It works similar to django-tables2: Create a filters.py module in your application and define the a BookFilter class there by relating it with the Book model, specifying the fields of the model you want to filter with and maybe override some of the default options. New versions have some great built-in configuration functionality by customizing which method will be used for filtering or even adding multiple filter methods - for example you could define your BookFilter like this:

class BookFilter(django_filters.FilterSet):
    class Meta:
        model = Book
        fields = {
            'name': ['icontains'],
            'author__last_name': ['icontains', 'startswith'],
            'publication_date': ['year', 'month', ],
        }

which will give you the following filter form fields:

  • an ilike ‘%value%’ for the book name
  • an ilike ‘%value%’ for the author name
  • a like ‘value%’ for the author name
  • a year(publication_date) = value for the publication_date
  • a month(publication_date) = value for the publication_date

and their (AND) combinations!

The BookFilter can be used to create a filter form in your template and then in your views pass to it the initial queryset along with request.GET (which will contain the filter values) to return the filtered data (and usually pass it to the table). I’ve created a sample project that uses both django-tables2 and django-filters for you to use: https://github.com/spapas/django_table_filtering. Also, I’ve written an article which describes a technique for automatically creating a filter-table view.

django-crispy-forms

The forms that are created by default by django-forms are very basic and not styled properly. To overcome this and have better styles for my forms, I always use django-crispy-forms. It actually has two modes: Using the crispy template filter and using the crispy template tag. Using the crispy template filter is very simple - just take a plain old django form and render it in your template like this {{ form|crispy }}. If the django-crispy-forms has been configured correctly (with the correct template pack) the form you’ll get will be much nicer than the django-default one. This is completely automatic, you don’t need to do anything else!

Now, if you have some special requirements from a form, for example multi-column rendering, adding tabs, accordions etc then you’ll need to use the {% crispy %} template tag. To use this you must create the layout of your form in the form’s costructor using the FormHelper django-crispy-forms API. This may seem cumbersome at first (why not just create the form’s layout in the django template) but using a class to define your form’s layout has other advantages, for example all the form layout is in the actual form (not in the template) you can control programatically the layout of the form (f.e display some fields only for administrators), you can use inheritance and virtual methods to override how a form is rendered etc.

To help you understand how a FormHelper looks like, here’s a form that is used to edit an access Card for visitors that displays all fields horizontally inside a panel (I am using Bootstrap for all styling and layout purposes):

class CardForm(ModelForm):
    class Meta:
        model = Card
        fields = ['card_number', 'card_text', 'enabled']

    def __init__(self, *args, **kwargs):
        self.helper = FormHelper()
        self.helper.form_method = 'post'
        self.helper.layout = Layout(
            Fieldset(
                '',
                HTML(u"""<div class="panel panel-primary">
                   <div class="panel-heading">
                       <h3 class="panel-title">Add card</h3>
                   </div>
                   <div class="panel-body">"""),
                Div(
                   Div('card_number', css_class='col-md-4'),
                   Div('card_text', css_class='col-md-4'),
                   Div('enable',css_class='col-md-4'),
                   css_class='row'
                ),
            ),

            FormActions(
                Submit('submit', u'Save', css_class='btn btn-success'),
                HTML(u"<a class='btn btn-primary' href='{% url \"card_list\" %}'>Return</a>" ),
            ),
            HTML(u"</div></div>"),
        )

        super(CardForm, self).__init__(*args, **kwargs)

Notice that forms that will be rendered with a FormHelper actually contain their <form> tag (you don’t need to write it yourself like with plain django forms) so you have to define their method (post in this example) and submit button.

django-extensions

django-extensions is a swiss-army-knife of django tools. I use it always in my projects because of the runserver_plus and shell_plus commands. The first uses the Werkzeug debugger with django which makes django development an absolute joy (open a python shell wherever you want in your code and start writing commands)! The second opens a better shell (your models and a bunch of django stuff are auto-imported, a better shall will be used if found etc).

The runserver_plus and shell_plus alone will be more than enough for me to use this however it adds some more usefull management commands like: admin_generator to quickly create an admin.py for your app, graph_models to generate a graphviz dot file of your models, update_permissions to synchronize the list of permissions if you have added one to an existing model and many, many others. Take a look at them and you’ll probably find more useful things!

django-autocomplete-light

django-autocomplete-light is the best auto-complete library for django, especially after v3 was released (which greatly reduces the magic and uses normal CBVs for configuring the querysets). You will create an AutocompleteView for a model (similar to your other class based views) and then automatically use this view through a widget in the admin or in your own forms. It is fully configurable (both the results and the selection templates), supports many to many fields, creating new instances and even autocompleteing django-taggit tags! If for some reason it seems that it is not working please keep in mind that you need to includ jquery and {{ form.media }} to your templates or else the required client side code won’t be executed.

I think it is an essential for all cases because dropdowns with lots of choices have a very bad user experience - and the same is true with many to many fields (you could use the checkboxes widget to improve their behavior a little but you will have bad behavior when there are many choices).

django-reversion

django-reversion is a really important package for my projects. When it is configured properly (by adding a a reversion-middleware), it offers full auditing for all changes in the your model instances you select (and properly groups them in case of changes to multiple instances in a simple request). It saves a JSON representation of all the versions of an instance in your database. Keep in mind that this may increase your database size but if you need full auditing then is is probably the best way to do it. I have written an article about django model auditing that discusses this package and django-simple-history (following) more.

django-compressor

django-compressor is package that combines and minifies your css and javascript (both files and line snippets) into static files. There are other tools for this but I have never used them since django-compressor satisfies my needs. Although I’ve written about browserify and friends from the node-js world I don’t recommend using such tools in django to combine and minify your javascript and css unless you specifically require them.

It has an online and an offline mode. For the online mode, when a request is done it will check if the compressed file exist and if not it will create it. This may lead to problems with permissions if your application server user cannot write to your static folders and also your users will see exceptions if for some reason you have included a file that cannot be found. For the offline mode, you need to run a management command that will create the static files while deploying the applications - this mode is recommended because any missing files problems etc will be resolved while deploying the app.

django-debug-toolbar

django-debug-toolbar: This is a well known package for debugging django apps that is always included in the development configuration of my projects. It has various panels that help you debug your application but, at least for me, the most helpful is the one that displays you all SQL queries that are executed for a page load. Because of how the django orm is working it will go on and follow all relations something that will lead to hundreds of queries. For example, let’s say that you have simple Book model with a foreign key to an Author model that has N instances in your database. If you do a Book.objects.all() and want display the author name for each book in a template then you’ll always do N+1 queries to the database! This is really easy to miss because in the django you’ll just do {% for book in books %}{{ book.name}}, {{ book.author.name }}{% endif %} — however the {{ book.author.name }} will go on and do an extra SQL query!!! Such cases are easily resolved by using select_related (and prefetch_related) but you must be sure to use select_related for all your queries (and if you add some extra things to your template you must remember to also add them to your select_related clause for the query).

So, what I recommend before going to production is to visit all your pages using django-debug-toolbar and take a quick look at the number of SQL queries. If you see something that does not make sense (for example you see more than 10 queries) then you’ll need to think about the problem I just mentioned. Please notice that this, at least for me, is not premature optimization - this is not actually optimization! This is about writing correct code. Let’s suppose that you could not use the django orm anymore and you had to use plain old SQL queries. Would you write SELECT * FROM books and then for each row do another SELECT * FROM authors WHERE id=? passing the author of each book or do only select * from books b LEFT JOIN authors a on b.author_id = a.id?

Packages I use when I need their functionality

The packages following are also essential to me but only when I need their functionality. I don’t use them in all my projects but, when I need the capabilities they offer then I will use these packages (and not some others). For example, if I need to render PDFs in my applications then I will use the xhtml2pdf, if I need to login through LDAP I will use django-auth-ldap etc.

xhtml2pdf

xhtml2pdf is the package I use for creating PDF’s with django as I’ve alreadly discussed in the PDFs in Django article (this is not a django-specific package like most others I discuss here but it plays really good with django). You create a normal django template, add some styling to it and dump it to html. Notice that there’s a django-xhtml2pdf project but has not been recently updated and after all as you can see in my article it is easy to just call xhtml2pdf directly. The xhtml2pdf library is actually a wrapper around the excellent reportlab library which does the low-level pdf output.

Notice that the xhtml2pdf library had some maintenance problems (that’s why some people are suggesting other PDF solutions like WeasyPrint) however they seem to have been fixed now. Also, I have found out that, at least for my needs (using Windows as my development environment), other soltuons are much inferior to xhtml2pdf. I urge you to try xhtml2pdf first and only if you find that it does not cover your needs (and have asked me about your problem) try the other solutions.

django-auth-ldap

django-auth-ldap is the package you’ll want to use if your organization uses LDAP (or Active Directory) and you want to use it for logging in. Just configure your LDAP server settings, add the ldap authenticator and you’ll be ready to go. Please notice that this package is a django wrapper of the python-ldap package which actually provides the LDAP connection.

django-localflavor

django-localflavor offers useful stuff for various countries, mainly form fields with the correct validation and lists of choices. For example, for my country (Greece) you’ll get a GRPhoneNumberField, a GRPostalCodeField and a GRTaxNumberCodeField. Use it instead of re-implementing the behavior.

django-constance

django-constance is a simple package that enables you to add quick-configurable settings in your application. To change the settings.py file you need to edit the source and restart the application - for most installations this is a full re-deployment of the application. Fully re-deploying the app just to change a setting is not very good practice (depending on the setting of course but if it is a business setting it usually should be done by business users and not by administrators).

That’s where django-constance comes to help you. You can define some extra settings which can be changed through the django admin and their new value will be available immediately. Also you can configure where these settings will be saved. One option is the database but this is not recommended - instead you can use redis so that the settings values will be available much quicker!

django-rq

django-rq is a django wrapper for the rq library. I use it when I need asynchronous tasks (which is on almost all of my projects). More info can be found on the two articles I have writtten about django rq (asynchronous tasks in django and django-rq redux).

django-rules-light

One of the least known packages from those I discuss here, django-rules-light is one of the most useful when is needed. This package allows you to define complex rules for doing actions on model instances. Each rule is a function that gets the user that wants to do the action and the object that the user wants to action on. The function returns True or False to allow or not allow the action. You can then use these in both your code to programatically check if the user can do the the action and your templates to decide what buttons and options you will display. There are also various helper methods for CBVs that make everything easier.

To properly understand the value of django-rules-light you need to have some more complex than usual action rules. For example if your actions for an object are view / edit and all your users can view and edit their own objects then you don’t really need this package. However, if your administrators can view all objects and your object can be finalized so no changes are allowed unless an administrator tries to change it then you’ll greatly benefit from using it!

django-generic-scaffold

django-generic-scaffold is a package I have created that can be used to quickly (and DRYly) create CRUD CBVs for your models. I usually don’t want to give access to the django-admin to non-technical users however sometimes I want to quickly create the required CBVs for them (list, detail, create, edit delete). Using django-generic-scaffold you can just create a scaffold which is related with a Model and all the views will be automagically created - you only need to link them to your urls.py. The created CBVs are fully configurable by adding extra mixins or even changing the parent class of each CBV.

Notice that this package does not create any source files - instead all CBVs are created on-the-fly using type. For example, to create CRUD CBVs for a Book model you’ll do this in scaffolding.py:

class BookCrudManager(CrudManager):
    model = models.Book
    prefix = 'books'

and then in your urls.py you’ll just append the generated urls to your url list:

book_crud = BookCrudManager()
urlpatterns += book_crud.get_url_patterns()

Now you can visit the corresponding views (for example /books or /bookscreate - depends on the prefix) to add/view/edit etc your books!

django-taggit

django-taggit is the library you’ll want to use if you have to use tags with your models. A tag is a synonym for keyword, i.e adding some words/phrases to your instances that are used to categorise and desribe em. The relation between your to-be-tagged-model and your tags is many to many. To use it, you just add tags = TaggableManager() to your model and you are ready to go! Of course it will need some more configuration to be included in django admin and django forms but thankfully, autocomplete-lights can be integrated with django-taggit!

django-sendfile

django-sendfile is a very important - at least to - me library. Sometimes, user uploaded files (media in django) should not be visible to all users so you’ll need to implement some access control through your django app. However, it is important to not serve these media files through your application server (uwsi, gunicorn etc) but use a web server (nginx, apache ect) for serving them. This is needed because your application server’s purpose is not serving files from the disk - keep in mind that the application server usually has a specified amount of workers (usually analogous to the number of CPUs of your server, for example 4 workers ) - think what will happen if some large media files are server through these workers to users with a slow connection! With 4 such concurrent connections your application won’t be able to serve any other content!

So this package (along with the support of X-Sendfile from the web servers) helps you fulfill the above requirements: It allowes you to check permissions to your media through your django application but then offload the serving of your media files to the web server. More info about django-sendfile can be found on this SO answer but with a few words, with django-sendfile you create a view that checks if a file is allowed to be served and, if yes, instruct the web server to actually serve that file by appending a specific header to the response.

django-reversion-compare

django-reversion-compare is an addon to django-reversion. It allows you to compare two different versions of a model instance and highlights their differences using various smart algorithms (so if you have a long text field you won’t only see that these are different but you’ll also see where exactly they differ, with output similar to the one you get when using diff).

django-simple-history

django-simple-history has similar capabilities with django-reversion - (auditing and keeping versions of models) with a very important difference: While django-reversion keeps a JSON representation of each version in the database (making querying very difficult), django-simple-history creates an extra, history table for each model instance you want to track and adds each change as a new row to that table. As can be understood this will make the history table really huge but has the advantage that you can easily query for old values. I usually use django-reversion unless I know that I will need the history querying.

django-import-export

django-import-export can be used to enchance your models with mass import and export capabilities (from example from/to CSV). You will add an ModelResource class that describes (and configures) how your Model should be imported/exported. The ModelResource class can then be easily used in your views and, more importantly, it is integrated to the django-admin. I have to confess that I have not used django-import-export for importing data because I prefer implementing my own views for that (to have better control over the whole process and because the data I usually need to import does not usually map 1 to 1 with a model but I need to create more model instances etc). However I am using the export capabilities of django-import-export in various projects with great success, especially the admin integration which easily fulfills the exporting data capabilities of most users.

django-extra-views

Beyond django-forms, django supports a feature called Formsets which allows you to add/edit multiple object instances in a single view (by displaying all instances one after the other). The classic request/response cycle of django is preserved in Formsets, so your form instances will be submitted all together when you submit the form. The logic extension to the Formset is the ModelFormset i.e each form in a Formset is a ModelForm and InlineFormSet where you have a Parent model that has a bunch of children and you are editing the Parent and its children in a single Form. For example, you have a School and a Student model where each Student has a ForeignKey to School. The usual case would be to edit the Student model and select her school (through a drop down or even-better if you use django-autocomplete-light through a proper autocompelte widget). However, you may for some reason want to edit the School and display (and edit) the list of its Students — that’s where you’ll use an InlineFormSet!

The above features (Formsets, Modelformsets and Inlineformsets) are not supported natively by django CBVs — that’s where django-extra-views comes to the foreground. You can use the corresponding CBVs of django-extra-views to support the multiple-form workflows described above. Using these CBVs are more or less similar to using the good-old django FormView.

easy-thumbnails

easy-thumbnails is a great library if you want to support thumbnails. Actually, thumbnails is not 100% correct - this package can be used to generate and manage various versions of your original images, for example you may have a small version of the image that will be used as a thumbnail, a larger version that will be used in a gallery-carousel view and an even larger version (but not the original one which could be huge) that will be used when the user clicks on the gallery to view a larger version of the image. To do that you define the image configurations you support in your settings.py and then you have access to your thumbnails both in your templates and in your views. Notice that a specific thumbnail congfiguration for an image will be created only once since the generated images are saved so each thumbnail will be generated on the first request it contains it and will be reused in the following such requests.

django-rest-framework

django-rest-framework is definitely the best package for creating web APIs with django. I don’t recommend using it if you want to create a quick JSON search API (take a look at django non-HTML responses) but if you want to go the SPA way or if you want to create multiple APIs for various models then this is the way to go. Integrating it to your project will need some effort (that’s why I don’t recommend it for quick and dirty APIs) because you’ll need to create a serializers.py which will define the serializers (more or less the fields) for the models you’ll want to expose through your API and then create the views (or the viewsets which are families of views for example list, detail, delete, update, create) and add them to your urls.py. You’ll also need to configure authentication, authorization and probably filtering and pagination. This may seem like a lot of work but the result are excellent - you’ll get a full REST API supporting create, list, detail, update, delete for any complex configuration of your models. You can take a look at a sample application in my React tutorial repository (yes this is a repository that has a tutorial for React and friends but the back-end is in django and django-rest-framework).

django-rest-framework integrates nicely with django-filter (mentioned above) to re-use the filters you have created for your model listings in your REST APIs - DRY at its best!

django-waffle

django-waffle is described as a feature flipper. What does this mean? Let’s say that you want to control at will when a specific view will be enabled - this is the library you’ll want to use. Or you have developed a new feature and you want to give access to it only on a subset of users for a pilot run of the feature - once again you should use django flipper. It offers a nice admin interface where you can configure the flags that will be used for the various feature enabling/disabling (and if they are active or not) and various template tags and functions that you can use to test if the features should be activated or not.

django-allauth

django-allauth should be used in two cases: When you want a better user registration workflow than the default (non-existant) one or you want to integrate your application with an external OAuth provider (i.e allow your users to login through their facebook, google, twitter, github etc accounts). I have mainly used this package for the first case and I can confirm that it works great and you can create as complex flows as you want (for example, in one of my projects I have the following user registration-activation flow: A user registers using a custom form and using his email as username, he receives an email with a confirmation link, after he has confirmed his email he receivs a custom message to wait for his account activation and the administrators of the application are notified, the administrators enable the new user’s account after checking some things and only then he’ll be able to log-in). One thing that must be noticed about django-allauth is that it (in my opition) does not have very good documentation but there are lots of answers about django-allauth in stackoverflow and the source code is very clear so you can always use the source as documentation for this package.

django-modeltranslation

The django-modeltranslation library is the library I recommend for when you want to have translations to your models. To use it you add a translation.py file where you declare the models and their fields that should be translated. The, depending on which languages you have configured in your settings.py after you run makemigrations and migrate you’ll see that django-modeltranslation will have included extra fields to the database, each one with the corresponding language name (for example if you have added a field name to the translations and have english and greek as language, django-modeltranslation will add the fields name_en and name_el to your table). You can the edit the i18n fields (using forms or the django admin) and depending on the current language of your site when you use name you’ll get either name_el or name_en.

django-widget-tweaks

If for some reason you don’t want to do django-crispy-forms, or you have a form in which you want to do a specific layout change but without fully implementing the FormHelper then you can actually render the form in HTML and output the fields one by one. One thing that cannot be done though is passing custom options to the rendered form field. When you do a {{ form.field }} to your template django will render the form field using its default options - yes this can be overriden using custom widgets but I don’t recommend it for example if you only want to add a class to the rendered <input>!

Instead, you can use django-widget-tweaks to pass some specific class names or attributes to the rendered form fields - so if you use {{ form.field|add_class:"myclass" }} the rendered <input> will have a myclass css class.

django-simple-captcha

Use django-simple-captcha to add (configurable) captchas to your django forms. This is a very simple package that does not have any requirements beyond the Pillow library for the captcha image generator. The generated captchas are simple images with some added noise so it won’t integrate reCAPTCHA with which you may be more familiar. I deliberatly propose this package for captchas so you won’t need to integrate with Google services.

wagtail

wagtail is a great django CMS. I use it when I need to create a CMS or I need to add CMS like capabilities to a project. It has many capabilities, too many to be listed here. I urge you to try it if you need a CMS!

Conclusion

The above packages should cover most of your django needs. I have listed only packages with good documentation, that have been recently updated and work with new django versions and should be fairly easy to integrate with your projects. If you need anything more or want to take a general look at some of the packages that have are availablie I recommend starting with the django packages site.

One important thing to notice here is that some of the above packages are not really complex and their functionality can be re-implemented by you in a couple of hours. For example, you could replicate the functionality of django-constance by adding a config dict and a couple of methods (and template tags) of storing and retrieving the keys of that dict with redis. Or add some custom clean methods to your forms instead of using the form fields from django-localflavor. Also, some of these packages have similar functionality and can be used (along with a little custom code) to replicate the functionality of other packages, for exmaple instead of using django-waffle you could use django-constance to configure if the features should be enabled or disabled and django-rules-light to control if the users have access to the feature. Also, you could probably use django-waffle for access control, i.e allow only admins to access a specific views.

Please don’t do this. This violates DRY and violates being disciplined. Each package has its purpose and being DRY means that you use it for its purpose, not re-implementing it and not re-using it for other purposes. When somebody (or you after some months) sees that package in requirements or INSTALLED_APPS he will conclude that you are using it for its intented purpose and thank you because you have saved him some time - please don’t make him waste his time by needing to read your source code to understand any smart tricks or reinventing the wheel.

Automatically create a category table in Postgresql by extracting unique table values

Recently I was given an SQL table containing some useful data. The problem with that table was that it was was non-properly normalized but was completely flat, i.e all its columns contained varchar values while, some of them should have instead contained foreign keys to other tables. Here’s an example of how this table looked like:

Parts
ID Serial Category Manufacturer
1 423431 Monitor LG
2 534552 Monitor LG
3 435634 Printer HP
4 534234 Printer Samsung
5 234212 Monitor Samsung
6 123123 Monitor LG

The normalized version of this table should have been instead like this:

Parts
ID Serial Category_id Manufacturer_id
1 423431 1 1
2 534552 1 1
3 435634 2 2
4 534234 2 3
5 234212 1 3
6 123123 1 2

with the following extra tables that contain the category values with proper foreign keys:

Category
ID Name
1 Monitor
2 Printer
Manufacturer
ID Name
1 Monitor
2 Printer

The normalized version should be used instead of the flat one for reasons which at this moment must be all too obvious.

Having such non-normalized tables is also a common problem I experience with Django: When creating a model I usually define some CharField with pre-defined choices which “will never change”. Sometime during the development (if I am lucky) or when the project is live for years I will be informed that the choices need not only to be changed but must be changed by the users of the site without the need to change the source code! Or that the choices/categories have properties (beyond their name) that need to be defined and used in the project. Both of these cases mean that these categories need to be extracted from simple strings to full Django models (i.e get normalized in their own table)!

In this post I will present a function written in PL/pgsql that will automatically normalize a column from a flat table like the previous. Specifically, using the previous example, if you have a table named part that has a non-normalized column named category then when you call select export_relation('part', 'category') the following will happen:

  • A new table named part_category will be created. This table will contain two columns id and name, with id being the primary key and name having a unique constraint. If the table exists it will be dropped and re-ccreated
  • A new column named category_id will be added to part. This column will be a foreign key to the new table part_category.
  • For each unique value v of category:
    • Insert a new record in part_category with v in the name field of the table and save the inserted id to current_id
    • Set current_id to category_id to all rows of part where category has the value of v

Before diving in to the PL/pgSQL script that does the above changes to the table I’d like to notice that I am not very experienced with PL/pgSQL since I rarerly use it (I actually avoid writing code in the database) however, because the case I described is ideal for using a database script I’ve bitten the bullet and implemented it.

Beyond it’s actual functionality, this script can be used as a reference/cookbook for common PL/pgSQL tasks:

  • Create/define a PL/pgSQL function (stored procedure)
  • Declare variables
  • Assign values to variables
  • Execute SQL commands with variable defined table / column names
  • Log process in PL/pgSQL
  • Executing code conditionally
  • Loop through the rows of a query
  • Save the primary key of an inserted row

The script works however I feel that more experienced PL/pgSQL developers would write things different - if you have any proposals please comment out and I’ll be happy to incorporate them to the script.

Now, let’s now take a look at the actual script:

CREATE OR REPLACE FUNCTION export_relation(table_name varchar, column_name varchar)
RETURNS void AS $$
DECLARE
    val varchar;
    res boolean;
    new_table_name varchar;
    new_column_name varchar;
    current_id int;
BEGIN
    new_table_name := table_name || '_' || column_name;
    new_column_name := column_name || '_id' ;

    execute format ('drop table if exists %s cascade', new_table_name);
    execute format ('CREATE TABLE %s (id serial NOT NULL, name character varying(255) NOT NULL, CONSTRAINT %s_pkey PRIMARY KEY (id), CONSTRAINT %s_unique UNIQUE (name))WITH ( OIDS=FALSE)',
        new_table_name, new_table_name, new_table_name
    );

    execute format('select exists(SELECT column_name  FROM information_schema.columns WHERE table_name=''%s'' and column_name=''%s'') as x', table_name, new_column_name) into res;
    if res is false then
        raise notice 'Creating colum';
        execute format ('ALTER TABLE %s ADD COLUMN %s integer null', table_name, new_column_name);
        execute format ('ALTER TABLE %s ADD CONSTRAINT fk_%s FOREIGN KEY (%s) REFERENCES %s(id)', table_name, new_column_name, new_column_name, new_table_name);
    end if ;

    for val IN execute format ('select %s from %s group by(%s)', column_name, table_name, column_name) LOOP
        RAISE NOTICE 'Inserting new value %s ...', val;
        execute format ('insert into  %s(name) values (''%s'') returning id', new_table_name, val) into current_id;
        raise notice 'Created ID %', current_id;
        execute format ('update %s set %s = %s where %s = ''%s''', table_name, new_column_name,current_id , column_name, val );
    END LOOP;

    /* Uncomment this if you want to drop the flat column
    raise notice 'Dropping colmn';
    execute format ('alter table %s drop column %s', table_name, column_name);
    */

END;
$$ LANGUAGE plpgsql;

The first line creates or updates the script. You can just copy over this script to an SQL window and run it as many times as you like (making changes between runs) and the script will be always updated. The function that is created is actually a procedure since it returns void and takes two parameters. The DECLARE section that follows contains all the variables that are used in the script:

  • val is the current value of the category when looping through their values
  • res is a boolean variable used for a conditional
  • new_table_name is the name of the table that will be created
  • new_column_name is the name of the column that will be added to the old table
  • current_id is the id of the last inserted value in the new table

After the BEGIN the actual procedure starts: First the values of new_table_name and new_column_name are initialized to be used throughout the code and then the new table is dropped (if exists) and re-created. Noticce the execute format (parameter) function that executes the SQL contained in the parameter which is a string and is constructed using the variables we’ve defined. The next line checks if the old table has the new column (i.e category_id) and saves the result in the res variable to check if the new column exists and if not add it to the old table.

A loop enumerating all unique values of the category column of the old table is then executed. Notice that val will contain a single value since the SQL that is executed will return a single column (that’s why it is declared as varchar). If we returned more than one column from the select the val could be declared as record and access its properties through dot notation (val.prop1 etc). The value is inserted to the newly created table using a insert into table values () returning id SQL syntax (so that the new id will be returned - this is an insert/select hybrid command) and saved to the current_id variable. The current_id variable then is used to update the new column that was added to the old table with the proper foreign key value.

Notice that I’ve a commented out code in the end - if you want you can uncomment it and the old (flat) column will be dropped - so in my examply the category column will be removed since I will have category_id to find out the name of each category. I recommend to uncomment this and actually drop the column since when you have both category and category_id the values of these two columns are going to get out of sync and since you’ll have duplicate information your table will be even more non-normalized. You can of course keep the column to make sure that the script works as you want since if the column is not dropped you can easily return to the previous state of the database by removeing the new table and column.

To call it just run select export_relation('part', 'category') and you should see some debug info in the messages tab. When the script is finished you’ll have the part_category table and category_id column in the part table.

Creating custom components for ag-grid

Recently I had to integrate an application with a javascript (client-side) excel-like grid. After some research I found out that ag-grid is (at least for me) the best javascript grid library. It has an open source version with a MIT license so it can safely be used by all your projects and a commercial, enteprise version that includes a bunch of extra features. The open source version should be sufficient for most projects however I really recommend buying the commercial version to support this very useful library.

The greatest characteristic of ag-grid in my opinion is its openess and API that enables you to extend it to support all your needs! An example of this API will be presented in this article where I will implement two components that can be included in your grid:

  • An array editor through which you will be able to create cells that can be used to insert a list of values. For example, if you have a grid of employees, you will be able to add a “children” cell that will contain the names of each employee’s children properly separated (one line per child)
  • An object editor through which you will be able to create cells that can be used to insert flat objects. For example, for the grid of employees, you may add an “address” cell that, when edited will be expanded to seperate fields for Address, City, Zip code and Country.

A common ground

First of all, let’s create a simple example that will act as a common ground for our components:

As you can see, I have defined a bunch of columns for the grid and then create a new grid passing it the myGrid div and the gridOptions (which are kept to a minimum). Finally, there’s an event handler for button click that adds an empty row:

var columns = [
    {
        headerName: 'ID', field: 'id', width: 50, editable: true
    }, {
        headerName: 'Name', field: 'name', width: 100, editable: true
    }, {
        headerName: "Address", field: "address", width: 200, editable: true
    }, {
        headerName: "Children", field: "children", width: 200, editable: true
    }
];

var gridOptions = {
    columnDefs: columns,
    rowData: []
};

new agGrid.Grid(document.querySelector('#myGrid'), gridOptions);

document.querySelector('#addRow').addEventListener("click", function() {
    gridOptions.api.addItems([{}]);
});

You may be able to edit both the address and the children of each employee in the above example however this is not intuitive. The editors will be able to enter any kind of address they want and add the children seperated by commas, spaces, dashes or whatever they want. Of course you could add validators to enforce some formatting for these fields however I think that using custom components has a much better user experience.

Please notice that the above example along with the following components will be implemented in pure (no-framework) javascript. Integrating ag-grid with a javascript framework like angular or react in an SPA should not be difficult however I find it easier to adjust my SPA so that the grid component is seperate and does not need interoperability with other SPA components since all components like renderers and editors will be integrated to the grid!

Also, using pure javascript for your custom components makes them faster than adding another layer of indirection through react as can be seen on the on ag-grid react integration:

If you do use React, be aware that you are adding an extra layer of indirection into ag-Grid. ag-Grid’s internal framework is already highly tuned to work incredibly fast and does not require React or anything else to make it faster. If you are looking for a lightning fast grid, even if you are using React and the ag-grid-react component, consider using plain ag-Grid Components (as explained on the pages for rendering etc) inside ag-Grid instead of creating React counterparts.

Editors and renderers

A custom column in ag-grid actually has two distinctive parts: An object that is used for rendering and an object that is used for editing the cell value. You can have columns with the built in editor and a custom renderer, columns with the built in renderer and a custom editor and cells with custom editors and rendererers. I urge you to read the documentation on both renderers and editors in order to understand most of the decisions I have made for the implemented components.

A renderer can be:

  • a function that receives the value of the cell and returns either an HTML string or a complete DOM object
  • a class that provides methods for returning the HTML string or DOM object for the cell

The function is much easier to use however biting the bullet and providing a renderer class is better for non-trivial rendering. This is because the function will be called each time the cell needs to be refreshed (refer to the docs on what refreshing means) while, the class provides a specific refresh() method that is called instead. This way, using the class you can generate the DOM structure for the cell once, when it is first created and then when its value changes you’ll only call its refresh method to update the value. We’ll see how this works later.

An editor is a class that should provide methods for returning the DOM structure for the cell editing (for example an <input> field) and the current value of the field.

Both renderer and editor classes can be attached to columns using cellEditor and cellRenderer column properties. You also may pass per-column properties to each cell using the cellEditorParams and cellRendererParams propertie. For example, you may have a renderer for booleans that displays icons for true/false and you want to use different icons depending on the column type, or you may want to create a validation-editor that receives a function and accepts the value you enter only if the function returns true - the valid function could be different for different column types.

Creating the object cell editor

The first component we’ll present here is an object renderer/editor. This component will receiver a list of fields and will allow the user to edit them in a popup grouped together. Here’s a fiddle with the Address of each employee using the object editing component:

To integrate it with the ag-grid I’ve added an addressFields list containg the fields of the object like this:

var addressFields = [
  {'name': 'address', 'label': 'Address' },
  {'name': 'zip', 'label': 'ZIP' },
  {'name': 'city', 'label': 'City' },
]

and then passed this as a parameter to both the renderer and editor for the address field:

{
    headerName: "Address", field: "address", width: 200, editable: true,
    cellRenderer: ObjectCellRenderer,
    cellEditor: ObjectEditor,
    cellEditorParams: {
      fields: addressFields
    },
    cellRendererParams: {
      fields: addressFields
    }
}

The ObjectEditor and ObjectCellRenderer are the actual editor and renderer of the component. I will start by representing the renderer first:

function ObjectCellRenderer() {}

ObjectCellRenderer.prototype.init = function (params) {
    // Create the DOM element to display
    this.span = document.createElement('span');
    this.span.innerHTML='';
    this.refresh(params)
};

The ObjectCellRender is an javascript object to which we define an init method. This method will be called by ag-grid when the component is first created, passing it a params object with various useful params, like the user-defined parameters (from cellRendererParams) and the actual value of othe cell. We just create an empty span DOM element that will be used to display the value of the object and call refresh.

ObjectCellRenderer.prototype.refresh = function(params) {
    var res = ''
    if(params.value) {
        // If we have a value build the representation
        for(var i=0;i<params.fields.length;i++) {
            res += params.fields[i].label + ': ';
            res += params.value[params.fields[i].name] + ' ';
        }
    }
    // Put representation to the DOM element
    this.span.innerHTML=res;
}

ObjectCellRenderer.prototype.getGui = function () {
    return this.span;
};

The refresh method generates the text value of the cell (that will be put inside the span we created in init). It first checks if the value attribute of params is defined and if yes, it appends the label of each object attribute (which we pass through cellRendererParams.fields.label) along with its value (which is retrieved from the params.value using cellRendererParams.fields.name). Notice ag-grid puts the result of the getGui method in the cell - so we just return the span we create. Also, we created the span element in init but filled it in refresh - to avoid it creating the same element lots of times (this would be more imporntant of course on more expensive operations).

Now let’s continue with ObjectEditor:

function ObjectEditor() {}

// Surpress some keypresses
var onKeyDown = function(event) {
    var key = event.which || event.keyCode;
    if (key == 37 ||  // left
        key == 39 || // right
        key == 9 ) {  // tab
        event.stopPropagation();
    }
}

ObjectEditor.prototype.init = function (params) {
    // Create the container DOM element
    this.container = document.createElement('div');
    this.container.style = "border-radius: 15px; border: 1px solid grey;background: #e6e6e6;padding: 10px; ";
    this.container.onkeydown = onKeyDown

    // Create the object-editing form
    for(i=0;i<params.fields.length;i++) {
        var field = params.fields[i];
        var label = document.createElement('label');
                label.innerHTML = field.label+': ';
        var input = document.createElement('input');
        input.name = field.name;
        if (params.value) {
            input.value = params.value[field.name];
        }

        this.container.appendChild(label);
        this.container.appendChild(input);
        this.container.appendChild(document.createElement('br'));
    }

    // Create a save button
    var saveButton = document.createElement('button');
    saveButton.appendChild(document.createTextNode('Ok'))
    saveButton.addEventListener('click', function (event) {
        params.stopEditing();
    });
    this.container.appendChild(saveButton);
};

The init function of ObjectEditor ise used to create a container div element that will hold the actual input elements. Then, using the fields that were passed as a parameter to the editor it creates a label, an input and a br element and inserts them one by one to the container div. The input is instantiated with the current value of each attribute while its name is taken from the name of the corresponding field (from the fields parameter). Finally, a saveButton is created that will stop the editing when clicked.

ObjectEditor.prototype.getGui = function () {
    return this.container;
};

ObjectEditor.prototype.afterGuiAttached = function () {
    var inputs = this.container.getElementsByTagName('input');
    inputs[0].focus();
};

ObjectEditor.prototype.getValue = function () {
    var res = {};
    // Create the cell value (an object) from the inputs values
    var inputs = this.container.getElementsByTagName('input');
    for(j=0;j<inputs.length;j++) {
          res[inputs[j].name] = inputs[j].value.replace(/^\s+|\s+$/g, "");
    }
    return res;
};

ObjectEditor.prototype.destroy = function () {
};

ObjectEditor.prototype.isPopup = function () {
    return true;
};

The other methods of ObjectEditor are simpler: getGui actually returns the container we built in the init, afterGuiAttached is called when the component is attached to the DOM and focuses on the first input element, getValue enumerates the input elements, takes their value (and names) and return an object with the name/value pairs, destroy dosn’t do anything however it must be defined and can be used for cleaning up if needed and isPopup returns true to display the container as a popup instead of inline.

Creating the array-like cell editor

The multi-line renderer/editor will allow a cell to contain a list of values. Here’s the complete fiddle where the “children” column is using the multi-line component:

To integrate it with ag-grid we just need to use the corresponding editor and renderer:

{
    headerName: "Children", field: "children", width: 200, editable: true,
    cellRenderer: MultiLineCellRenderer,
    cellEditor: MultiLineEditor
}

The MultiLineCellRenderer is similar to the ObjectCellRenderer. A span/container element is created at the init method and the refresh method is called to fill it. The refresh method outputs the number of items in the list (i.e it writes N items) and uses the span’s title to display a tooltip with the values of the items:

function MultiLineCellRenderer() {}

MultiLineCellRenderer.prototype.init = function (params) {
    this.span = document.createElement('span');
    this.span.title=''
    this.span.innerHTML='';
    this.refresh(params);
}

MultiLineCellRenderer.prototype.refresh = function (params) {
    if (params.value === "" || params.value === undefined || params.value === null) {
        this.span.innerHTML = '0 items';
    } else {
        var res = ''
        // Create the tooltip for the cell
        for(var i=0;i<params.value.length;i++) {
            res += (i+1)+': ' + params.value[i]
            res+='\n';
        }
        this.span.title = res;
        this.span.innerHTML = params.value.length + ' item' + (params.value.length==1?"":"s");
    }
};

MultiLineCellRenderer.prototype.getGui = function () {
    return this.span;
};

The logic of the MultiLineEditor is also similar to the ObjectEditor:

function MultiLineEditor() {}

MultiLineEditor.prototype.onKeyDown = function (event) {
    var key = event.which || event.keyCode;
    if (key == 37 ||  // left
        key == 39 || // right
        key == 9 ) {  // tab
        event.stopPropagation();
    }
};

// Used to append list items (their value along with the remove button)
MultiLineEditor.prototype.addLine = function(val) {
    var li = document.createElement('li');
    var span = document.createElement('span');
    var removeButton = document.createElement('button');
    removeButton.style='margin-left: 5px; text-align: right; '
    removeButton.innerHTML = 'Remove'
    span.innerHTML = val;
    li.appendChild(span)
    li.appendChild(removeButton)

    this.ul.appendChild(li);
    var that = this;
    removeButton.addEventListener('click', function(event) {
        that.ul.removeChild(this.parentElement);
    });
}

MultiLineEditor.prototype.init = function (params) {
    var that = this;
    this.container = document.createElement('div');
    this.container.style = "border-radius: 15px; border: 1px solid grey;background: #e6e6e6;padding: 10px;";

    this.ul = document.createElement('ol');
    if (params.value) {
        for(i=0;i<params.value.length;i++) {
            this.addLine(params.value[i]);
        }
    }
    this.container.appendChild(this.ul);
    this.input = document.createElement('input');
    this.container.appendChild(this.input);

    this.addButton = document.createElement('button');
    this.addButton.innerHTML = 'Add';
    this.addButton.addEventListener('click', function (event) {
        var val = that.input.value;
        if(!val || val==undefined || val=='') return;
        that.addLine(val, that.ul);
        that.input.value='';
    });
    this.container.appendChild(this.addButton);

    this.saveButton = document.createElement('button');
    this.saveButton.innerHTML = 'Ok';
    this.saveButton.addEventListener('click', function (event) {
        params.stopEditing();
    });
    this.container.appendChild(this.saveButton);

};

A div element that will display the popup editor is created first. In this container we add a ol element and then, if there are values in the list they will be appended in that ol using the addLine method. This method creates a li element the value of each list item along with a remove button which removes the corresponding li element when clicked (that that=this is needed because the click callback function of the button has a different this than the addLine method so that needs to be used instead).

After the list of the items, there’s an input whose value will be inserted to the list when clicking the add button. The same addLine function we used when initializing the component is used here to append the input’s value to the ol. Finally the save button stops the list editing:

MultiLineEditor.prototype.getGui = function () {
    return this.container;
};

MultiLineEditor.prototype.afterGuiAttached = function () {
    var inputs = this.container.getElementsByTagName('input');
    inputs[0].focus();
};

MultiLineEditor.prototype.getValue = function () {
    var res = [];
    // The value is the text of all the span items of the li items
    var items = this.ul.getElementsByTagName('span');
    for(var i=0;i<items.length;i++) {
        res.push(items[i].innerHTML);
    }
    return res;
};

MultiLineEditor.prototype.destroy = function () {
};

MultiLineEditor.prototype.isPopup = function () {
    return true;
};

For the value of this component, we just enumerate through the li span items of the list we created through addLine and add it to a normal javascript array.

Some more discussion about the components

The above components can be used as they are however I think that their greatest value is that they should show-off some of the capabilities of ag-grid and used as templates to build up your own stuff. Beyond styling or changing the layout of the multi-line and the object editor, I can think of a great number of extensions for them. Some examples, left as an excerise to the reader:

  • The renderer components are rather simple, you may create any DOM structure you want in them
  • Pass a function as a parameter to the object or multi-line editor that will be used as a validator for the values (i.e not allow nulls or enforce a specific format) - then you can add another “error” span that will be displayed if the validation does not pass
  • Make the object editor smarter by passing the type of each attribute (i.e text, number, boolean, date) and check that each input corresponds to that type
  • Pass a min/max number of list items for the multi-line editor
  • The multi-line editor doesn’t really need to have only text values for each item. You could combine the multi-line editor with the object editor so that each line has a specific, object-like structure. For example, for each child we may have two attributes, name and date of birth

Getting a logical backup of all databases of your Postgresql server

In this small post I will present a small bash script that could be used to create logical backups of all the databases in a Postgresql server along with some other goodies. More specifically, the script will:

  • Create two files at /tmp to output information (one for debugging and one with info for sending it through email at the end)
  • Create a backup directory with the current date
  • Create a list of all databases found on the server
  • For each database, vacuum and analyze it, backup it, gzip it and put it in the backup directory
  • Write info about the backup in the info log file
  • Do the same for global objects
  • Send an email when the backup is finished with the info log

Now, in my system I’m using an external folder at /mnt/backupdb to put my backups. You may either use the same technique or connect remotely to a postgresql database (so you need to change the parameters of vacuumdb, pg_dump and pg_dumpall to define the server and credentials to connect to) and put the backups to a local disc.

#!/bin/sh
echo "" > /tmp/db_backup.log
echo "" > /tmp/db_backup_info.log
date_str=$(date +"%Y%m%d_%H%M%S")
backup_dir=/mnt/backupdb/pg_backup.$date_str

mkdir $backup_dir
pushd $backup_dir > /dev/null

dbs=`sudo -u postgres psql -Upostgres -lt | grep -v : | cut -d \| -f 1 | grep -v template | grep -v -e '^\s*$' | sed -e 's/  *$//'|  tr '\n' ' '`
echo "Will backup: $dbs to $backup_dir" >> /tmp/db_backup_info.log
for db in $dbs; do
  echo "Starting backup for $db" >> /tmp/db_backup_info.log
  filename=$db.$date_str.sql.gz
  sudo -u postgres vacuumdb --analyze -Upostgres $db >> /tmp/db_backup.log
  sudo -u postgres pg_dump -Upostgres -v $db -F p 2>> /tmp/db_backup.log | gzip > $filename
  size=`stat $filename --printf="%s"`
  kb_size=`echo "scale=2; $size / 1024.0" | bc`
  echo "Finished backup for $db - size is $kb_size KB" >> /tmp/db_backup_info.log
done

echo "Backing up global objects" >> /tmp/db_backup_info.log
filename=global.$date_str.sql.gz
sudo -u postgres pg_dumpall -Upostgres -v -g 2>> /tmp/db_backup.log | gzip > $filename
size=`stat $filename --printf="%s"`
kb_size=`echo "scale=2; $size / 1024.0" | bc`
echo "Finished backup for global - size is $kb_size KB" >> /tmp/db_backup_info.log

echo "Ok!" >> /tmp/db_backup_info.log
mail -s "Backup results" spapas@mymail.foo.bar  < /tmp/db_backup_info.log
popd > /dev/null

Let’s explain a bit the above script: The two first lines (echo …) will just clear out the two files /tmp/db_backup.log and /tmp/db_backup_info.log. The first will contain debug info from the commands and the second one will contain our info that will be sent through an email at the end of the backup. After that, we initialize date_str with the current date in the form 20161102_145011 and the backup_dir with the correct directory to save the backups to. We then create the backup directory and switch to it with pushd.

The following, rather long command will assign the names of the databases to the dbs variable. So how is it working? psql -lt lists the names of the databases, but lists also more non-needed information which we remove with the following commands (grep, cut etc). The sed removes whitespace and the tr concatenates individual lines to a single line so dbs will have a value like ‘db1 db2 …’. For each one of these files then we assign its name and date to a filename and then, after we execute vacuumdb we use pg_dump with gzip to actually create the backup and output it to the file. The other two lines (size and kb_size) are used to calculate the size of the backup file (to be sure that something is actually created) - you’ll need to install bc for that. The same process is followed the to backup global objects (usernames etc) using pg_dumpall -g. Finally, we send a mail with a subject of “Backup results” and body the contents of /tmp/db_backup_info.log.

I’ve saved this file to /var/lib/pgsql/db_backup_all.sh. To run I propose using cron — just edit your crontab (through vi /etc/crontab) and add the line

15 2  *  *  * root       /usr/bin/bash /var/lib/pgsql/db_backup_all.sh

This will run the backup every night at 2.15. Uses the root user to have access rights to the backup folder. One thing to be careful about is that on Redhat/Centos distributions, the above won’t work because sudo requires a tty to work and cron doesn’t have one. To fix this, comment out the line

Defaults    requiretty

of your /etc/sudoers file.

Update 02/12/2016: Here’s a little better version of the above script that

  1. Create two files for each database, one with SQL script backup, one with binary backup. Although with SQL backup you can check out the backup and maybe do changes before applying it, the binary backup is a more foolproof method of restoring everything to your database! Also, instead of restoring the database through psql (as required by the SQL script backup), using the binary backup you can restore through the pg_restore tool.
  2. Adds a function to output the file size (so the script is more DRY)
#!/bin/sh

function output_file_size {
  size=`stat $1 --printf="%s"`
  kb_size=`echo "scale=2; $size / 1024.0" | bc`
  echo "Finished backup for $2 - size is $kb_size KB" >> /tmp/db_backup_info.log
}

echo "" > /tmp/db_backup.log
echo "" > /tmp/db_backup_info.log
date_str=$(date +"%Y%m%d_%H%M%S")
backup_dir=/mnt/backupdb/dbpg/pg_backup.$date_str

mkdir $backup_dir
pushd $backup_dir > /dev/null
dbs=`sudo -u postgres psql -Upostgres -lt | cut -d \| -f 1 | grep -v template | grep -v -e '^\s*$' | sed -e 's/  *$//'|  tr '\n' ' '`
#dbs='dgul  hrms  mailer_server  missions  postgres'
echo "Will backup: $dbs to $backup_dir" >> /tmp/db_backup_info.log
for db in $dbs; do
  echo "Starting backup for $db" >> /tmp/db_backup_info.log
  filename=$db.$date_str.sql.gz
  filename_binary=$db.$date_str.bak.gz
  sudo -u postgres vacuumdb --analyze -Upostgres $db >> /tmp/db_backup.log
  sudo -u postgres pg_dump -Upostgres -v $db -F p 2>> /tmp/db_backup.log | gzip > $filename
  sudo -u postgres pg_dump -Upostgres -v $db -F c 2>> /tmp/db_backup.log | gzip > $filename_binary
  output_file_size $filename "$db sql"
  output_file_size $filename_binary "$db bin"
done
echo "Backing up global objects" >> /tmp/db_backup_info.log
filename=global.$date_str.sql.gz
sudo -u postgres pg_dumpall -Upostgres -v -g 2>> /tmp/db_backup.log | gzip > $filename
output_file_size $filename global
echo "Ok!" >> /tmp/db_backup_info.log
mail -s "Backup results" spapas@hcg.gr  < /tmp/db_backup_info.log
popd > /dev/null