/var/

Various programming stuff

How to create a custom filtered adapter in Android

Introduction

Android offers a nice component named AutoCompleteTextView that can be used to auto-fill a text box from a list of values. In its simplest form, you just create an array adapter, passing it a list of objects (that have a proper toString() method). Then you type some characters in the text box and by default it will filter the results by matching against the beginning of each backing object’s toString() result.

However, there are times when you don’t want to match only at the beginning of the string (for example, you want to match in the middle of it), or you don’t want to search only in the result of the object’s toString() method, or you want to do something fancier with how each object is displayed. For this you must subclass ArrayAdapter and add a custom Filter.

Unfortunately this isn’t as straightforward as I’d like and I couldn’t find a quick and easy tutorial on how it can be done.

So here goes nothing: in the following I’ll show you a very simple Android application with a minimum viable custom filtered adapter implementation. You can find the whole project on GitHub: https://github.com/spapas/CustomFilteredAdapeter but I am also going to discuss everything here.

The application

Just create a new project with an empty activity from Android Studio. Use Kotlin as the language.

The layout

I’ll keep it as simple as possible:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout
        xmlns:android="http://schemas.android.com/apk/res/android"
        xmlns:tools="http://schemas.android.com/tools"
        android:orientation="vertical"
        android:layout_width="match_parent" android:layout_height="match_parent"
        tools:context=".MainActivity">
    <TextView
            android:layout_width="match_parent" android:layout_height="wrap_content"
            android:text="Hello World!"
            android:textSize="32sp"
            android:textAlignment="center"/>
    <AutoCompleteTextView
            android:layout_marginTop="32dip"
            android:layout_width="match_parent" android:layout_height="wrap_content"
            android:id="@+id/autoCompleteTextView"/>
</LinearLayout>

You should just care about the AutoCompleteTextView with an id of autoCompleteTextView.

The backing data object

I’ll use a simple PoiDao Kotlin data class for this:

data class PoiDao(
    val id: Int,
    val name: String,
    val city: String,
    val category_name: String
)

I’d like to be able to search in all of name, city and category_name of each object. To create a list of the pois to be used with the adapter I can do something like:

val poisArray = listOf(
    PoiDao(1, "Taco Bell", "Athens", "Restaurant"),
    PoiDao(2, "McDonalds", "Athens","Restaurant"),
    PoiDao(3, "KFC", "Piraeus", "Restaurant"),
    PoiDao(4, "Shell", "Lamia","Gas Station"),
    PoiDao(5, "BP", "Thessaloniki", "Gas Station")
)

The custom adapter

This will be an ArrayAdapter<PoiDao> implementing also the Filterable interface:

inner class PoiAdapter(context: Context, @LayoutRes private val layoutResource: Int, private val allPois: List<PoiDao>):
    ArrayAdapter<PoiDao>(context, layoutResource, allPois),
    Filterable {
    private var mPois: List<PoiDao> = allPois

    override fun getCount(): Int {
        return mPois.size
    }

    override fun getItem(p0: Int): PoiDao? {
        return mPois.get(p0)
    }

    override fun getItemId(p0: Int): Long {
        // Or just return p0
        return mPois.get(p0).id.toLong()
    }

    override fun getView(position: Int, convertView: View?, parent: ViewGroup?): View {
        val view: TextView = convertView as TextView? ?: LayoutInflater.from(context).inflate(layoutResource, parent, false) as TextView
        view.text = "${mPois[position].name} ${mPois[position].city} (${mPois[position].category_name})"
        return view
    }

    override fun getFilter(): Filter {
        // See next section
    }
}

You’ll see that we add an instance variable named mPois that is initialized with allPois (the full list of pois that is passed to the adapter). The mPois variable will contain the filtered results. Then, for getCount and getItem we return the corresponding values from mPois; getItemId is mostly used when you have an SQLite-backed adapter but I’m including it here for completeness.

The getView will create the specific line for each item in the dropdown. As you’ll see the layout that is passed must have a text child which is set based on some of the attributes of the corresponding poi for each position. Notice that we can use whatever view layout we want for our dropdown result line (this is the layoutResource parameter) but we need to configure it (i.e bind it with the values of the backing object) here properly.

Finally we create a custom instance of the Filter, explained in the next section.

The custom filter

The getFilter creates an object instance of a Filter and returns it:

override fun getFilter(): Filter {
    return object : Filter() {
        override fun publishResults(charSequence: CharSequence?, filterResults: Filter.FilterResults) {
            mPois = filterResults.values as List<PoiDao>
            notifyDataSetChanged()
        }

        override fun performFiltering(charSequence: CharSequence?): Filter.FilterResults {
            val queryString = charSequence?.toString()?.toLowerCase()

            val filterResults = Filter.FilterResults()
            filterResults.values = if (queryString==null || queryString.isEmpty())
                allPois
            else
                allPois.filter {
                    it.name.toLowerCase().contains(queryString) ||
                    it.city.toLowerCase().contains(queryString) ||
                    it.category_name.toLowerCase().contains(queryString)
                }
            return filterResults
        }
    }
}

This object instance overrides two methods of Filter: performFiltering and publishResults. The performFiltering method is where the actual filtering is done; it should return a FilterResults object whose values attribute contains the filtered values. In this method we take the charSequence parameter and convert it to lowercase. Then, if this parameter is not empty, we filter allPois by checking whether the relevant fields of each element (i.e. name, city and category_name in our case) contain the query, using contains. If the query is empty then we just return all pois. A warning for Java developers: here the if is used as an expression (i.e. its result is assigned to filterResults.values).

After performFiltering has finished, the publishResults method is called. This method receives the filtered results in its filterResults parameter. It sets the mPois of the custom adapter to the result of the filter operation and calls notifyDataSetChanged to display the results.

Using the custom adapter

To use the custom adapter you can do something like this in your activity’s onCreate:

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    val poisArray = listOf(
        // See previous sections
    )
    val adapter = PoiAdapter(this, android.R.layout.simple_list_item_1, poisArray)
    autoCompleteTextView.setAdapter(adapter)
    autoCompleteTextView.threshold = 3

    autoCompleteTextView.setOnItemClickListener() { parent, _, position, id ->
        val selectedPoi = parent.adapter.getItem(position) as PoiDao?
        autoCompleteTextView.setText(selectedPoi?.name)
    }
}

We create the PoiAdapter passing it the poisArray and android.R.layout.simple_list_item_1 as the layout. That layout consists of a single TextView (with the id text1). As we’ve already discussed you can pass something more complex here. The threshold defines the number of characters that the user needs to enter before the filtering starts (default is 2).

Please notice that when the user clicks (selects) on an item of the dropdown we set the contents of the textview (or else it will just use the object’s toString() method to set it).

Fixing your Django async job - database integration

I’ve already written two articles about django-rq and implementing asynchronous tasks in Django. However I’ve found out that there’s a very important thing missing from them: How to properly integrate your asynchronous tasks with your Django database. This is very important because if it is not done right you will start experiencing strange errors about missing database objects or duplicate keys. The most troublesome thing about these errors is that they are not consistent. Your app may work fine but for some reason you’ll see some of your asynchronous tasks fail with these errors. When you re-queue the async jobs everything will be ok.

Of course this behavior (code that works only sometimes) smells of a race condition but it’s not easy to debug if you don’t know the full story.

In the following I will describe the cause of this error and how you can fix it. As a companion to this article I’ve implemented a small project that can be used to test the error and the fix: https://github.com/spapas/async-job-db-fix.

Notice that although this article is written for django-rq it should also help people that have the same problems with other async job systems (like celery or django-q).

Description of the project

The project is very simple: you just add a url and it will retrieve its content asynchronously and report its length. For the models, it just has a Task model which is used to describe what we want the asynchronous task to do and to store the result:

from django.db import models

class Task(models.Model):
    created_on = models.DateTimeField(auto_now_add=True)
    url = models.CharField(max_length=128)
    url_length = models.PositiveIntegerField(blank=True, null=True)
    job_id = models.CharField(max_length=128, blank=True, null=True)
    result = models.CharField(max_length=128, blank=True, null=True)

It also has a home view that can be used to start new asynchronous tasks by creating a Task object with the url we got and passing it to the asynchronous task:

from django.views.generic.edit import FormView
from .forms import TaskForm
from .tasks import get_url_length
from .models import Task

import time
from django.db import transaction

class TasksHomeFormView(FormView):
    form_class = TaskForm
    template_name = 'tasks_home.html'
    success_url = '/'

    def form_valid(self, form):
        task = Task.objects.create(url=form.cleaned_data['url'])
        get_url_length.delay(task.id)
        return super(TasksHomeFormView, self).form_valid(form)

    def get_context_data(self, **kwargs):
        ctx = super(TasksHomeFormView, self).get_context_data(**kwargs)
        ctx['tasks'] = Task.objects.all().order_by('-created_on')
        return ctx

And finally the asynchronous job itself that retrieves the task from the database, requests its url and saves its length:

import requests
from .models import Task
from rq import get_current_job
from django_rq import job

@job
def get_url_length(task_id):
    jb = get_current_job()
    task = Task.objects.get(
        id=task_id
    )
    response = requests.get(task.url)
    task.url_length = len(response.text)
    task.job_id = jb.get_id()
    task.result = 'OK'
    task.save()

The above should be fairly obvious: The user visits the homepage and enters a url at the input. When he presses submit the view will create a new Task object with the url that the user entered and fire-off the get_url_length asynchronous job passing the task id of the task that was just created. It will then return immediately without waiting for the asynchronous job to complete. The user will need to refresh to see the result of his job; this is the usual behavior with async jobs.

The asynchronous job on the other hand will retrieve from the database the task whose id it got as a parameter, do the work it needs to do and update the result when it is finished.

Unfortunately, the above simple setup will probably behave erratically by randomly throwing database related errors!

Cause of the problem

In the previous section I said probably because the erratic behavior is caused by a specific setting of your Django project: ATOMIC_REQUESTS. This setting can be set on your database connection and if it is True then each request will be atomic. This means that each request will be tied to a database transaction, i.e. a transaction will be started when your request starts and committed only when your request finishes; if for some reason your request throws an error then the transaction will be rolled back. An example of this setting is:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
        'ATOMIC_REQUESTS': True,
    }
}

Now, in my opinion, ATOMIC_REQUESTS is a great thing to have because it makes everything much easier. I always set it to True in my projects because then I don’t need to actually think about transactions and requests; I know that if there’s a problem in a request the whole transaction will be rolled back and no garbage will be left in the database. If on the other hand for some reason a request does not need to be tied to a transaction I just turn it off for that specific view (using transaction.non_atomic_requests). Please notice that by default ATOMIC_REQUESTS has a False value, which means that the database will be in autocommit mode, meaning that every command will be committed immediately.

So although the ATOMIC_REQUESTS is great, it is actually the reason that there are problems with asynchronous tasks. Why? Let’s take a closer look at what the form_valid of the view does:

def form_valid(self, form):
    task = Task.objects.create(url=form.cleaned_data['url']) #1
    get_url_length.delay(task.id) #2
    return super(TasksHomeFormView, self).form_valid(form) #3

It creates the task in #1, fires off the asynchronous task in #2 and continues the execution of the view processing in #3. The important thing to understand here is that the transaction will be committed only after #3 is finished. This means that there’s a possibility that the asynchronous task will be started before #3 is finished, thus it won’t find the task because, from the worker’s point of view, the task will not have been created yet(!) This is a little counter-intuitive but you must remember that the async task is run by a worker which is a different process than your application server; the worker may be able to start before the transaction is committed.
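The visibility issue can be reproduced without Django at all. The following is only a sketch under my own assumptions: it uses Python's built-in sqlite3 module, a throwaway database file, and two connections named web and worker (hypothetical names standing in for the request handler and the rq worker). A row inserted inside an open transaction should not be visible to the second connection until the first one commits:

```python
import os
import sqlite3
import tempfile


def count_tasks(conn):
    # fetchall() exhausts the cursor so no read lock lingers afterwards.
    return conn.execute("SELECT COUNT(*) FROM task").fetchall()[0][0]


def rows_seen_by_worker():
    """Return the row counts a second connection sees before/after commit."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.sqlite3")

        # "web" plays the request handler, "worker" plays the async worker.
        web = sqlite3.connect(db_path)
        web.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, url TEXT)")
        web.commit()
        worker = sqlite3.connect(db_path)

        # The INSERT implicitly opens a transaction that stays open,
        # similar to the per-request transaction of ATOMIC_REQUESTS.
        web.execute("INSERT INTO task (url) VALUES ('http://example.com')")

        before = count_tasks(worker)  # the job starts "too early"
        web.commit()                  # the request has finished
        after = count_tasks(worker)   # the job is re-queued later

        worker.close()
        web.close()
        return before, after


print(rows_seen_by_worker())
```

The first count corresponds to the failing job and the second one to the re-queued job that suddenly works.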

If you want to actually see the problem every time you can add a small delay between the start of the async task and the form_valid something like this:

def form_valid(self, form):
    task = Task.objects.create(url=form.cleaned_data['url'])
    get_url_length.delay(task.id)
    time.sleep(1)
    return super(TasksHomeFormView, self).form_valid(form)

This will make the view slower so the asynchronous worker will always have time to start executing the task (and get the not found error). Also notice that if you had ATOMIC_REQUESTS: False the above code would work fine because the task would be created immediately (auto-committed) and the async job would be able to find it.

The solution

So how is this problem solved? Well it’s not that difficult now that you know what’s causing it!

One solution would be to set ATOMIC_REQUESTS to False but that would make all database commands auto-commit so you’d lose the tying of each request to a transaction. Another solution would be to keep ATOMIC_REQUESTS set to True and disable atomic requests for the specific view that starts the asynchronous job using transaction.non_atomic_requests. This is a viable solution; however, I don’t like it because I’d lose the comfort of a transaction per request for this specific request and I would need to add my own transaction handling.

A third solution is to avoid messing with the database in your view and create the task object in the async job. Any parameters you want to pass to the async job would be passed directly to the async function. This may work fine in some cases but I find it more safe to create the task in the database before starting the async job so that I have better control and error handling. This way even if there’s an error in my worker and for some reason the async job never starts or it breaks before being able to handle the database, I will have the task object in the database because it will have been created in the view.

Is there anything better? Isn’t there a way to start executing the async job after the transaction of the view is committed? Actually yes, there is! For this, transaction.on_commit comes to the rescue! This function receives a callback that will be called after the transaction is committed! Thus, to properly fix your project, you should change the form_valid method like this:

def form_valid(self, form):
    task = Task.objects.create(url=form.cleaned_data['url'])
    transaction.on_commit(lambda: get_url_length.delay(task.id))
    time.sleep(1)
    return super(TasksHomeFormView, self).form_valid(form)

Notice that I need to use lambda to create a callback function that will call get_url_length.delay(task.id) when the transaction is committed. Now even though I have the delay there, the async job will start after the transaction is committed, i.e. after the view handler is finished (after the 1 second delay).
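To make the ordering more tangible, here is a toy model of the idea. It is not Django's implementation: ToyTransaction and handle_request are hypothetical names I made up for this sketch. The only behavior it tries to mirror is that a callback registered with on_commit is merely remembered, runs after a successful commit, and is dropped if the transaction is rolled back:

```python
class ToyTransaction:
    """Hypothetical stand-in for a request transaction; not Django's code."""

    def __init__(self):
        self._callbacks = []
        self.log = []

    def on_commit(self, callback):
        # Only remember the callback; do not run it yet.
        self._callbacks.append(callback)

    def commit(self):
        self.log.append("commit")
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback()

    def rollback(self):
        self.log.append("rollback")
        self._callbacks = []  # queued callbacks are dropped


def handle_request(tx, fail=False):
    """Mimic form_valid: create the task, queue the job, finish the view."""
    tx.log.append("create task")
    tx.on_commit(lambda: tx.log.append("enqueue job"))
    tx.log.append("render response")
    if fail:
        tx.rollback()
    else:
        tx.commit()
    return tx.log


print(handle_request(ToyTransaction()))
print(handle_request(ToyTransaction(), fail=True))
```

In the successful case the job is enqueued last, after the commit; in the failing case it is never enqueued, so no worker goes looking for a task that was rolled back.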

Conclusion

From the above you should be able to understand why sometimes you have problems when your async jobs use the database. To fix it you have various options but, at least for me, the best solution is to start your async jobs after the transaction is committed using transaction.on_commit. Just change each async_job.delay(parameters) call to transaction.on_commit(lambda: async_job.delay(parameters)) and you will be fine!

Use du to find out the disk usage of each directory in unix

One usual problem I have when dealing with production servers is that their disks get filled. This results in various warnings and errors and should be fixed immediately. The first step to resolve this issue is to actually find out where that hard disk space is used!

For this you can use the du unix tool with some parameters. The problem is that du has various parameters (not needed for the task at hand) and the various places I search for contain other info not related to this specific task.

Thus I’ve decided to write this small blog post to help people struggling with this, and also to help me avoid googling for it, wading through pages that also contain other du recipes, and the trial and error that this would require.

So to print out the disk usage summary for a directory go to that directory and run du -h -s *; you need to have access to the child subdirectories so probably it’s better to try this as root (unless you go to your home dir for example).

Here’s a sample usage:

[root@server1 /]# cd /
[root@server1 /]# du -h -s *
7.2M    bin
55M     boot
164K    dev
35M     etc
41G     home
236M    lib
25M     lib64
20K     lost+found
8.0K    media
155G    mnt
0       proc
1.6G    root
12M     sbin
8.0K    srv
427M    tmp
3.2G    usr
8.9G    var

The parameters are -h to print human readable sizes (G, M etc) and -s to print only a total for each argument. Since this will output a summary for each argument, I finally pass * which the shell expands to all files/dirs in that directory. If I used du -h -s /tmp instead I’d get the total usage only for the /tmp directory.

Another trick that may help you quickly find the offending directories is to append the | grep G pipe command (i.e. run du -h -s * | grep G) which will keep only the entries containing a G (i.e. only print the folders that are at least 1 GB in size). Yes, I know that this will also print entries that have a G in their name, but since there aren’t many directories that have a G in their name you should be ok.

If you run the above from / so that /proc is included you may get a bunch of du: cannot access 'proc/nnn/task/nnn/fd/4': No such file or directory errors; just add the 2> /dev/null redirection to send the stderr output to /dev/null, i.e. run du -h -s * 2> /dev/null.

Finally, please notice that if there are lots of files in your directory you’ll get a lot of output entries (since the * will match both files and directories). In this case you can use echo */ or ls -d */ to list only the directories; append that command inside a ` pair or $() (to substitute for the command output) instead of the * to only get the sizes of the directories, i.e run du -h -s $(echo */) or du -h -s `echo */`.

One thing that you must be aware of is that this command may take a long time, especially if you have lots of small files somewhere. Just let it run and it should finish after some time. If it takes too long, try to exclude any mounted network directories (either SMB or NFS) since these will take an extra long time.

Also, if you want a nice interactive output using ncurses you can download and compile the ncdu tool (NCurses Disk Usage).

Adding a delay to Django HTTP responses

Sometimes you’d like to make your Django views slower by adding a fake delay. This may sound controversial (why would somebody want to make some of his views slower?); however, it is a real requirement, at least when developing an application.

For example, you may be using a REST API and you want to implement a spinner while your form is loading. However, usually when developing your responses will load so soon that you won’t be able to admire your spinner in all its glory! Also, when you submit a POST form (i.e a form that changes your data), it is advisable to disable your submit button so that when your users double click it the form won’t be submitted two times (it may seem strange to some people but this is a very common error that has bitten me many times; there are many users that think that they need to double click the buttons; thus I always disable my submit buttons after somebody clicks them); in this case you also need to make your response a little slower to make sure that the button is actually disabled!

I will propose two methods for adding this delay to your responses. One that will affect all (or most) your views using a middleware and another that you can add to any CBV you want using a mixin; please see my previous CBV guide for more on Django CBVs and mixins. For the middleware solution we’ll also take a quick look at what is the Django middleware mechanism and how it can be used to add functionality.

Using middleware

The Django middleware is a mechanism for adding your own code to the Django request / response cycle. I’ll try to explain this a bit: Django is waiting for an HTTP request (i.e. GET a url with these headers and these query parameters); it will parse this HTTP request and prepare an HTTP response (i.e. some headers and a payload). Your view will be the main actor for receiving the HTTP request and returning the HTTP response. However, using this middleware mechanism Django allows you to enable other actors (the middleware) that will universally modify the HTTP request before passing it to your view and will also modify the view’s HTTP response before sending it back to the client.

Actually, a list of middleware called … MIDDLEWARE is defined by default in the settings.py of all new Django projects; these are used to add various capabilities that are universally needed, for example session support, various security enablers, django message support and others. You can easily attach your own middleware to that list to add extra functionality. Notice that the order of the middleware in the MIDDLEWARE list actually matters. Middleware later in the list will be executed after the ones previous in the list; we’ll see some consequences of this later.

Now the time has come to take a quick look at how to implement a middleware, taken from the Django docs:

class SimpleMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
        # One-time configuration and initialization.

    def __call__(self, request):
        # Code to be executed for each request before
        # the view (and later middleware) are called.

        response = self.get_response(request)

        # Code to be executed for each request/response after
        # the view is called.

        return response

Actually you can implement the middleware as a nested function; however, I prefer the classy version. The comments should be really enlightening: when your project is started the constructor (__init__) will be called once, so for example if you want to read a configuration setting from the database then you should do it in __init__ to avoid calling the database every time your middleware is executed (i.e. for every request). The __call__ is a special method that gets invoked when you call the class instance as a function, i.e. if you do something like:

sm = SimpleMiddleware(get_response)
sm(request)

Then sm(request) will execute the __call__; there are various similar Python special methods, for example __len__, __eq__ etc.

Now, as you can see the __call__ special method has four parts:

  • Code that is executed before the self.get_response() method is called; here you should modify the request object. Middleware will reach this point in the order they are listed.
  • The actual call to self.get_response()
  • Code that is executed after the self.get_response() method is called; here you should modify the response object. Middleware will reach this point in the reverse order they are listed.
  • Returning the response to be used by the next middleware

Notice that get_response will call the next middleware; while the get_response for the last middleware will actually call the view. Then the view will return a response which could be modified (if needed) by the middlewares in the opposite order of their definition list.

As an example, let’s define two simple middlewares:

class M1:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        print("M1 before response")
        response = self.get_response(request)
        print("M1 after response")
        return response

class M2:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        print("M2 before response")
        response = self.get_response(request)
        print("M2 after response")
        return response

When you define MIDDLEWARE = ['M1', 'M2'] you’ll see the following:

# Got the request
M1 before response
M2 before response
# The view is rendered to the response now
M2 after response
M1 after response
# Return the response
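The nesting above can be reproduced with plain Python, without a running Django project. This is just a sketch: make_middleware, build_chain and the fake view are hypothetical helpers of mine, and the loop only imitates the general idea of wrapping the view with the listed middleware from last to first so that the first entry ends up outermost:

```python
calls = []


def make_middleware(name):
    """Build a middleware class that records when it runs."""

    class Recorder:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            calls.append(f"{name} before response")
            response = self.get_response(request)
            calls.append(f"{name} after response")
            return response

    return Recorder


def view(request):
    calls.append("view")
    return "response"


def build_chain(middleware_classes, view_func):
    # Wrap from the last middleware to the first, so that the first
    # entry in the list ends up as the outermost callable.
    handler = view_func
    for middleware_class in reversed(middleware_classes):
        handler = middleware_class(handler)
    return handler


chain = build_chain([make_middleware("M1"), make_middleware("M2")], view)
chain("request")
print(calls)
```

Each get_response here is simply the next callable in the chain, which is why the "after" parts run in the reverse order of the "before" parts.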

Please notice that a middleware may choose not to call self.get_response to continue the chain and instead directly return a response (for example a 403 Forbidden response).

After this quick introduction to how middleware works, let’s take a look at a skeleton for the time-delay middleware:

import time

class TimeDelayMiddleware(object):

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        time.sleep(1)
        response = self.get_response(request)
        return response

This is really simple, I’ve just added an extra line to the previous middleware. This line adds a one-second delay to all responses. I’ve added it before self.get_response - because this delay does not depend on anything, I could have added it after self.get_response without changes in the behavior. Also, the order of this middleware in the MIDDLEWARE list doesn’t matter since it doesn’t depend on other middleware (it just needs to run to add the delay).

This middleware may have a little more functionality, for example to configure the delay from the settings or add the delay only for specific urls (by checking the request.path). Here’s how these extra features could be implemented:

import time
from django.conf import settings

class TimeDelayMiddleware(object):

    def __init__(self, get_response):
        self.get_response = get_response
        self.delay = settings.REQUEST_TIME_DELAY


    def __call__(self, request):
        if '/api/' in request.path:
            time.sleep(self.delay)
        response = self.get_response(request)
        return response

The above will add the delay only to requests whose path contains '/api/'. Another case is if you want to only add the delay for POST requests by checking that request.method == 'POST'.
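As an illustration of the POST-only case, here is a minimal sketch. PostDelayMiddleware, its delay attribute and the injectable sleep argument are my own assumptions (they are not part of Django or of the original middleware), and I use startswith instead of a substring check just to show an alternative. Since Django instantiates a middleware with only get_response, the extra argument has a default value:

```python
import time
from types import SimpleNamespace


class PostDelayMiddleware:
    """Delay only POST requests under /api/ (hypothetical variant)."""

    delay = 1  # seconds; in a real project this could come from settings

    def __init__(self, get_response, sleep=time.sleep):
        self.get_response = get_response
        self.sleep = sleep  # injectable so the delay can be faked in tests

    def __call__(self, request):
        if request.method == "POST" and request.path.startswith("/api/"):
            self.sleep(self.delay)
        return self.get_response(request)


# Exercise it without Django: fake requests and a fake sleep function
# that records the requested delay instead of actually waiting.
slept = []
middleware = PostDelayMiddleware(lambda request: "response", sleep=slept.append)
middleware(SimpleNamespace(method="GET", path="/api/items/"))
middleware(SimpleNamespace(method="POST", path="/api/items/"))
print(slept)
```

Only the POST request asks for a delay; the GET request passes straight through to the next callable.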

Now, to install this middleware, you can configure your MIDDLEWARE like this in your settings.py (let’s say that you have an application named core containing a module named middleware):

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',

    'core.middleware.TimeDelayMiddleware',
]

The other middleware are the default ones in Django. One more thing to consider is that if you have a single settings.py this middleware will also be active in production; one way to avoid the delay there is to check settings.DEBUG and only call time.sleep when DEBUG == True. However, the proper way to do it is to have different settings for your development and production environments and add the TimeDelayMiddleware only to your development MIDDLEWARE list. Having different settings for each environment is a common practice in Django and I totally recommend it.

Using CBVs

Another method to add a delay to the execution of a view is to implement a TimeDelayMixin and inherit your Class Based View from it. As we’ve seen in the CBV guide, the dispatch method is the one that is always called when your CBV is rendered, thus your TimeDelayMixin could be implemented like this:

import time

class TimeDelayMixin(object):

    def dispatch(self, request, *args, **kwargs):
        time.sleep(1)
        return super().dispatch(request, *args, **kwargs)

This is very simple (and you can use similar techniques as described for the middleware above to configure the delay time or add the delay only when settings.DEBUG == True etc). To actually use it, just inherit your view from this mixin, e.g.:

class DelayedSampleListView(TimeDelayMixin, ListView):
    model = Sample

Now whenever you call your DelayedSampleListView you’ll see it after the configured delay!
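One detail worth spelling out is that the mixin must be listed before the view class, so that its dispatch runs first and then hands over to the view through super(). Here is a Django-free sketch of that mechanism; BaseView is a hypothetical stand-in for a real class based view, and the delay_seconds attribute is my own addition to show one way the delay could be made configurable per view:

```python
import time


class BaseView:
    """Hypothetical stand-in for a Django class based view."""

    def dispatch(self, request, *args, **kwargs):
        return f"handled {request}"


class TimeDelayMixin:
    delay_seconds = 1  # override per view to tune the delay

    def dispatch(self, request, *args, **kwargs):
        time.sleep(self.delay_seconds)
        # super() resolves to the next class in the MRO, i.e. the view.
        return super().dispatch(request, *args, **kwargs)


class DelayedView(TimeDelayMixin, BaseView):
    delay_seconds = 0.01


print([cls.__name__ for cls in DelayedView.__mro__])
print(DelayedView().dispatch("GET /"))
```

If the base classes were listed the other way round, BaseView.dispatch would be found first and the mixin's delay would never run.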

What is really interesting is that the dispatch method also exists (and has the same functionality) in Django Rest Framework CBVs, thus using the same mixin you can add the delay not only to your normal CBVs but also to your DRF API views!

Easy immutable objects in Javascript

With the rise of Redux and other similar Javascript frameworks (e.g. Hyperapp) that try to be a little more functional (functional as in functional programming), a new problem was introduced to Javascript programmers (at least to those that weren’t familiar with functional programming): how to keep their application’s state “immutable”.

Immutable means that the state should be an object that does not change (mutate): instead of changing it, Redux needs a new state object to be created each time.

So when something happens in your application you need to discard the existing state object and create a new one from scratch by modifying and copying the previous state values. This is easy in most toy-apps that are used when introducing the concept, for example if your state is {counter: 0} then you could just define the reducer for the ADD action (i.e when the user clicks the + button) like this:

let reducer = (state={counter: 0}, action) => {
  switch (action.type) {
    case 'ADD': return { 'counter': state.counter+1 }
    default: return state
  }
}

Unfortunately, your application will definitely have a much more complex state than this!

In the following, I’ll do a quick introduction on how to keep your state objects immutable using modern Javascript techniques, I’ll present how complex it is to modify non-trivial immutable objects and finally I’ll give you a quick recipe for modifying your non-trivial immutable objects. If you want to play with the concepts I’ll introduce you can do it at a playground I’ve created on repl.it.

Please keep in mind that this article has been written for ES6 - take a look at my browserify with ES6 article to see how you can also use it in your projects with Browserify.

Also, if you’d like to see a non-toy React + Redux application or you’d like a gentle introduction to the concepts I talked about (state, reducers, actions etc) you can follow along with my React Redux tutorial. This is a rather old article (considering how quickly the state of Javascript frameworks changes) but the basic concepts introduced there still hold today.

Immutable objects

Let’s start our descent into avoiding mutations by supposing that you had something a little more complex than the initial example, for example your state was like this:

let state = {
    'counter': 0,
    'name': 'John',
    'age': 36
}

If you continued the same example then your ADD reducer would need to return something like this:

return {
    'counter': state.counter+1,
    'name': state.name,
    'age': state.age
}

This gets difficult and error prone very soon - and what happens if you later need to add another attribute to your state?

The correct way to implement this would be to enumerate all properties of state except ‘counter’, copy them to a new object, and then assign counter+1 to the new object’s counter attribute. You could implement this by hand but, thankfully, there’s the Object.assign method! This method copies all attributes from a list of source objects to a target object, which it returns as the result. It is defined like this:

Object.assign(target, ...sources)

The target parameter is the object that will receive all attributes from sources (which is a variadic argument - you can have as many sources as you want, even 0, in which case the target will be returned as is). For a quick example, running:

let o = {'a': 1}
let oo = Object.assign(o, {'b': 2, 'c': 1}, {'c': 3})
console.log(oo, o===oo)

will print { a: 1, b: 2, c: 3 } true, i.e the attributes ‘b’ and ‘c’ were copied to o and it was assigned to oo. Notice that o and oo are the same object (thus o is now modified). Also, notice that the attributes of objects to the right have priority over the attributes of the objects to the left ('c': 1 was overridden by 'c': 3).

As you should have guessed by now, you should never pass the state as the target but instead you should create a new object, thus the ADD reducer should return the following:

return Object.assign({}, state, {'counter': state.counter+1})

This means that it will create a new object which will copy all current attributes of state and increase the existing counter attribute.

I’d like to also add here that instead of using the Object.assign method you could use the spread syntax to do more or less the same. The spread syntax on an object takes this object’s attributes and outputs them as key-value pairs (for them to be used to initialize other objects). Thus, you can use the spread syntax to create a new object that has the same attributes as another object like this:

let newState = {...state}
// which is similar to
newState = Object.assign({}, state)

Of course you usually need to override some attributes, which can be passed directly to the newly created object, for example for the ADD reducer:

return {...state, 'counter': state.counter+1 }

Like Object.assign, you can have as many sources as you want in your spread syntax, thus nothing stops you from using ... multiple times to copy the attributes of multiple objects. For example you could define ADD like this:

return {...state, ...{'counter': state.counter+1 } }

The order is similar to Object.assign, i.e the attributes that follow will override the previous ones.
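To put the pieces together, here is a minimal sketch of a complete reducer that uses the spread syntax. The RENAME action and the names initialState, before and after are assumptions made only for this illustration; the point is that the passed state is never modified and that later keys override the copied ones.

```javascript
// Hypothetical reducer for illustration; RENAME is an assumed action type.
const initialState = { counter: 0, name: 'John', age: 36 }

const reducer = (state = initialState, action) => {
  switch (action.type) {
    case 'ADD':
      // Copy every attribute of state, then override counter.
      return { ...state, counter: state.counter + 1 }
    case 'RENAME':
      // Keys that follow win, so action.name replaces the copied name.
      return { ...state, name: action.name }
    default:
      return state
  }
}

const before = { counter: 5, name: 'John', age: 36 }
const after = reducer(before, { type: 'ADD' })

console.log(after)            // { counter: 6, name: 'John', age: 36 }
console.log(before.counter)   // 5 (the original state is untouched)
console.log(before === after) // false
```

Notice that the default branch returns the very same state object, so an unknown action does not create a needless copy.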

One final comment is that both Object.assign and copying objects with the spread syntax will do a “shallow” copy i.e it will copy only the outer object, not the objects its keys refer to. An example of this behavior is that if you run the following:

let a = {'val': 3 }
let x = {a }
let y = {...x}
console.log(x, y)
x['val2'] = 4
y['val2'] = 5
a['val'] = 33
console.log(x, y)

you’ll get:

{ a: { val: 3 } } { a: { val: 3 } }
{ a: { val: 33 }, val2: 4 } { a: { val: 33 }, val2: 5 }

i.e x and y got a different val2 attribute since they are not the same object, however both x and y have a reference to the same a, thus when its val attribute was changed this change appeared in both x and y!

What the above means is that if you have a state object containing other objects (or arrays) you will also need to copy these children objects to keep your state immutable. We’ll see examples on this later.
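As a quick sketch of what copying the children looks like for the small example above (the names shallow and deeper are only illustrative), you spread the inner object as well as the outer one:

```javascript
const a = { val: 3 }
const x = { a }

// Shallow copy: shallow.a is the very same object as x.a.
const shallow = { ...x }

// Copy the child too, so that the copy owns its own `a`.
const deeper = { ...x, a: { ...x.a } }

a.val = 33

console.log(shallow.a.val) // 33 (the change leaked into the shallow copy)
console.log(deeper.a.val)  // 3  (the deeper copy is not affected)
console.log(shallow.a === x.a, deeper.a === x.a) // true false
```

This only goes one level down; every extra level of nesting needs its own spread.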

Immutable arrays

One thing we haven’t talked about yet is what happens if there’s an array in the state, for example your state is let state=[] and you have an APPEND reducer that puts something at the end of that array. The naive (and wrong) way to do it is to call push directly on the state - this will mutate your state and is not allowed!

You need to copy the array elements and the tool for this job is Array.slice. This method takes two optional arguments (begin and end) that define the range of elements that will be copied; if you call it without arguments then it will copy the whole array. Using slice, your APPEND reducer can be like this:

let newState = state.slice()
newState.push('new element')
return newState

Also, you could use the Array.concat method, which will return a new array by copying all the elements of its arguments:

return state.concat(['new element'])

This will append new element to a new object that will have the elements of state (it won’t modify the existing state) and is easier if you have this exact requirement. The advantage of slice is that you can use it to add/remove/modify elements from any place in the original array. For example, here’s how you can add an element after the first element of an array:

let x = ['a', 'b', 'c' ]
let y = x.slice(0,1).concat(['second' ], x.slice(1,3))

Now y will be equal to [ 'a', 'second', 'b', 'c' ]. So the above will get the first (0-th) element from the x array and concat it with another element (second) and the remaining elements of x. Remember that x is not modified since concat will create a new array.

In a similar fashion to objects, instead of using concat it is much easier to use the spread syntax. The spread syntax for an array will output its elements one after the other for them to be used by other arrays. Thus, continuing from the previous example, [...x] will return a new array with the elements of x (so it is similar to x.slice() or x.concat()), thus to re-generate the previous example you’ll do something like

let y = [...x.slice(0,1), 'second', ...x.slice(1,3)]

All three of concat, slice or the spread syntax will do a shallow copy (similar to how Object.assign works) so the same conclusions from the previous section are true here: If you have arrays inside other arrays (or objects) you’ll need to copy the inner arrays recursively.
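The same slice + spread combination covers adding, removing and modifying an element at any position. Here is a minimal sketch; insertAt, removeAt and replaceAt are hypothetical helper names used only for this illustration, they are not part of any library.

```javascript
// Hypothetical helpers: each one returns a new array and leaves arr untouched.
const insertAt = (arr, idx, el) => [...arr.slice(0, idx), el, ...arr.slice(idx)]
const removeAt = (arr, idx) => [...arr.slice(0, idx), ...arr.slice(idx + 1)]
const replaceAt = (arr, idx, el) => [...arr.slice(0, idx), el, ...arr.slice(idx + 1)]

const x = ['a', 'b', 'c']

console.log(insertAt(x, 1, 'second')) // [ 'a', 'second', 'b', 'c' ]
console.log(removeAt(x, 1))           // [ 'a', 'c' ]
console.log(replaceAt(x, 2, 'z'))     // [ 'a', 'b', 'z' ]
console.log(x)                        // [ 'a', 'b', 'c' ]
```

Keep in mind that these are still shallow copies: the elements themselves are shared between the old and the new array.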

More complex cases

We’ll now take a look at some more complex cases and see how quickly it gets difficult because of the shallow copying. Let’s suppose that our state is the following:

let state = {
  'user': {
    'first_name': 'John',
    'last_name': 'Doe',
    'address': {
      'city': 'Athens',
      'country': 'Greece',
      'zip': '12345'
    }
  }
}

and we want to assign a groups attribute to the state. This can be easily done with assign:

let groups = [{
    'name': 'group1'
}]

state = Object.assign({}, state, {
  'groups': groups
})

or spread:

state = {
  ...state, 'groups': groups
}

Notice that instead of 'groups': groups I could have used the shorthand syntax and written only groups and it would still work (i.e state = {...state, groups} is the same). In all cases, the resulting state will be:

{
  'user': {
    'first_name': 'John',
    'last_name': 'Doe',
    'address': {
      'city': 'Athens',
      'country': 'Greece',
      'zip': '12345'
    }
  },
  'groups': [{
    'name': 'group1'
  }]
}

From now on I’ll only use the spread syntax which is more compact.

Let’s try to change the user’s first name. This is not as easy as the first example because we need to:

  • Create a new copy of the user object with the new first name
  • Create a new copy of the state object with the new user object created above

This can be done in two steps like this:

let user ={...state['user'], 'first_name': 'Jack'}
state = {...state, user}

or in one step like this:

state = {...state, 'user':{
  ...state['user'], 'first_name': 'Jack'}
}

The single step assignment is the combination of the two steps described above. It is a little more complex but it saves typing and is preferred because it allows the reducer function to have a single expression. Now let’s try to modify the user’s zip code. We’ll do it in three steps first:

let address ={...state['user']['address'], 'zip': '54321'}
user ={...state['user'], address}
state = {...state, user}

And now in one:

state = {...state, 'user': {
  ...state['user'], 'address': {
    ...state['user']['address'], 'zip': 54321
  }
}}

Now, as can be seen in the above examples, modifying (without mutating) a complex state object is not very easy - it needs a lot of thinking and is very error prone! This will be even more apparent when we also bring array modifications into the equation, for example by adding another two groups:

state = {
  ...state, groups: [
    ...state['groups'].slice(),
    {name: 'group2', id: 2},
    {name: 'group3', id: 3}
  ]
}

The above copies the existing state and assigns to it a new groups object by copying the existing groups and appending two more groups to that array! The state now will be:

{
  user: {
    first_name: 'Jack',
    last_name: 'Doe',
    address: { city: 'Athens', country: 'Greece', zip: 54321 }
  },
  groups: [
    { name: 'group1' },
    { name: 'group2', id: 2 },
    { name: 'group3', id: 3 }
  ]
}

As a final example, how can we add the missing id attribute to the first group? Following the above techniques:

state = {
  ...state, groups: [
    {...state['groups'][0], 'id': 1},
    ...state['groups'].slice(1)
  ]
}

Once more, what does the above do?

  • Creates a new object and copies all existing properties of state to it
  • Creates a new array and assigns it to the new state’s groups attribute
  • For the first element of that array it copies all attributes of the first element of state['groups'] and assigns it an id=1 attribute
  • For the remaining elements of that array it copies all elements of state['groups'] after the first one

Now think what would happen if we had an even more complex state with 3 or 4 nested levels!

Immutability’s little helpers

As you’ve seen from the previous examples, using immutable objects is not as easy as it seems from the toy examples. Actually, drilling down into complex immutable objects and returning new ones that have some values changed is a well-known problem in the functional world and already has a solution called “lenses”. This is a funny name but it more or less means that you use a magnifying lens to look at exactly the value you want and modify or retrieve it. The problem with lenses is that, although they solve the problem I mentioned, to use them you’ll need to dive deep into functional programming and you’ll also need to include an extra library in your project (even if you only want this specific capability).

For completeness, here are the docs on lens from Ramda, which is a well known Javascript functional library. To use it you need to understand what prop is, what assoc is and then how to use the lens with view, set and over. For me, these are way too many things to remember for such a specific thing. Also, notice that the minified version of Ramda is around 45 kb which is not small. Yes, if I wanted to fully use Ramda or a similar library I’d be delighted to use all these techniques and include it as a dependency - however most people prefer to stick with more familiar (and more procedural) concepts.

The helpers I’m going to present here are more or less a poor man’s lens, i.e you will be able to use the basic functionality of a lens but…

  • without the peculiar syntax and
  • without the need to learn more functional concepts than what you’ll want and
  • without the need to include any more external dependencies

Pretty good deal, no?

In any case, a lens has two parts, a get and a set. The get will be used to drill down and retrieve a value from a complex object while the set will be used to drill down and assign a value to a complex object. The set does not modify the object but returns a new one. The get lens is not really needed since you can easily drill down to an object using the good old index syntax but I’ll include it here for completeness.

We’ll start with the get which seems easier. For this, I’ll just create a function that will take an object and a path inside that object as parameters and retrieve the value at that path. The path could be either a string of the form 'a.0.c.d' or an array ['a', '0', 'c', 'd'] - a numerical index means that we expect an array at that point of the object.

Thus, for the object {'a': [{'b': {'c': {'d': 32} }}]} when the lens getter is called with either 'a.0.b.c' or ['a', 0, 'b', 'c'] as the path, it should return {'d': 32}.

To implement the get helper I will use a functional concept, reduce. I’ve already explained this concept in my previous react-redux tutorial so I urge you to read that article for more info. Using reduce we can apply one by one accumulatively the members of the path to the initial object and the result will be the value of that path. Here’s the implementation of pget (from property get):

const objgetter = (accumulator, currentValue) => accumulator[currentValue];
const pget = (obj, path) =>  (
    (typeof path === 'string' || path instanceof String)?path.split('.'):path
).reduce(objgetter, obj)

I have defined an objgetter reducer function that gets an accumulated object and the current value of the path and just returns the currentValue index of that accumulated object. Finally, for the get lens (named pget) I just check to see if the path is a string or an array (if it’s a string I split it on dots) and then I “reduce” the path using the objgetter defined above and starting by the original object as the initial value. To understand how it is working, let’s try calling it for an object:

const s1 = {'a': [{'b': {'c': {'d': 32} }}]}
console.log(pget(s1, ['a', 0, 'b', 'c']))

The above pget will call reduce on the passed array using the defined objgetter above as the reducer function and s1 as the original object. So, the reducer function will be called with the following values each time:

accumulator             currentValue
s1                      'a'
s1['a']                 0
s1['a'][0]              'b'
s1['a'][0]['b']         'c'
s1['a'][0]['b']['c']    (result)

Thus the result will be exactly what we wanted: {'d': 32}. An interesting thing is that it’s working fine without the need to differentiate between arrays and objects because of how index access [] works.

Continuing with the set lens (which will be more difficult), I’ll first present a simple version that works only with objects and an array path but displays the main idea of how this will work: It uses recursion i.e it will call itself to gradually build the new object. Here’s how it is implemented:
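Here’s a short sketch that exercises pget a bit more (the definitions are repeated so that the snippet runs on its own). It shows that a string path and an array path reach the same value, and what happens with paths that do not exist; the threw variable is only there for the illustration.

```javascript
const objgetter = (accumulator, currentValue) => accumulator[currentValue]
const pget = (obj, path) => (
  (typeof path === 'string' || path instanceof String) ? path.split('.') : path
).reduce(objgetter, obj)

const s1 = { a: [{ b: { c: { d: 32 } } }] }

// String and array paths reach the same value.
console.log(pget(s1, 'a.0.b.c.d'))             // 32
console.log(pget(s1, ['a', 0, 'b', 'c', 'd'])) // 32

// A missing last key simply yields undefined...
console.log(pget(s1, 'a.0.b.x'))               // undefined

// ...but a missing intermediate key throws, because the reducer
// then tries to index into undefined.
let threw = false
try {
  pget(s1, 'a.0.x.c')
} catch (e) {
  threw = e instanceof TypeError
}
console.log(threw) // true
```

So pget does no error handling of its own; if a path may be missing you’ll need to guard the call yourself.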

const pset0 = (obj, path, val) => {
  let idx = path[0]

  if(path.length==1) {
    return {
      ...obj, [idx]: val
    }
  } else {
    let remaining = path.slice(1)
    return {
      ...obj,
      [idx]: pset0(obj[idx], remaining, val)
    }
  }
}

As already explained, I have assumed that the path is an array of indices and that the obj is a complex object (no arrays in it please); the function returns a new object with the old object’s value at the path replaced with val. This function checks to see if the path has only one element; if yes it will assign the value to that attribute of the object it received. If not, it will call itself recursively by skipping the current index and assign the return value to the current index of the current object. Let’s see how it works for the following call:

const s2 = {a0: 0, a: {b0: 0, b: {c0: 0, c: 3}}}
console.log(pset0(s2, ['a', 'b', 'c'], 4))

#  Call parameters                   Return
1  pset0(s2, ['a', 'b', 'c'], 4)     {...s2, ['a']: pset0(s2['a'], ['b', 'c'], 4)}
2  pset0(s2['a'], ['b', 'c'], 4)     {...s2['a'], ['b']: pset0(s2['a']['b'], ['c'], 4)}
3  pset0(s2['a']['b'], ['c'], 4)     {...s2['a']['b'], ['c']: 4}

Thus, the first time it is called it will return a new object with the attributes of s2 but overriding its 'a' index with the return of the second call. The second call will return a new object with the attributes of s2['a'] but overriding its 'b' index with the return of the third call. Finally, the third call will return an object with the attributes of s2['a']['b'], setting the 'c' index to 4. The result will be, as expected, equal to:

{a0: 0, a: {b0: 0, b: {c0: 0, c: 4 }}}
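A nice side effect of this recursion is that only the objects on the path are recreated; everything else is shared between the old and the new object. The following sketch checks that (pset0 is repeated in a slightly more compact form so that the snippet runs on its own, and I’ve changed a0 to an object, which is an assumption made just to be able to compare references):

```javascript
const pset0 = (obj, path, val) => {
  const idx = path[0]
  if (path.length === 1) {
    return { ...obj, [idx]: val }
  }
  return { ...obj, [idx]: pset0(obj[idx], path.slice(1), val) }
}

const s2 = { a0: { keep: true }, a: { b0: 0, b: { c0: 0, c: 3 } } }
const s3 = pset0(s2, ['a', 'b', 'c'], 4)

// The old object still has the old value.
console.log(s2.a.b.c, s3.a.b.c) // 3 4

// Every object along the path was recreated...
console.log(s3 === s2, s3.a === s2.a, s3.a.b === s2.a.b) // false false false

// ...while the untouched branch is the very same object.
console.log(s3.a0 === s2.a0) // true
```

The shared branches are not a problem as long as nobody mutates them, which is the whole point of keeping the state immutable.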

Now that we’ve understood the logic we can extend the above function with the following extras:

  • support for arrays in the object using numerical indeces
  • support for array (['a', 'b']) or string path ('a.b') parameter
  • support for a direct value to set on the path or a function that will be applied on that value

Here’s the resulting set lens:

const pset = (obj, path, val) => {
  let parts = (typeof path === 'string' || path instanceof String)?path.split('.'):path
  const cset = (obj, cidx, val) => {
    let newval = val
    if (typeof val === "function") {
      newval = val(obj[cidx])
    }
    if(Array.isArray(obj)) {
      return [
        ...obj.slice(0, cidx*1),
        newval,
        ...obj.slice(cidx*1+1)
        ]
    } else {
      return {
        ...obj, [cidx]: newval
      }
    }
  }

  let pidx = parts[0]
  if(parts.length==1) {
    return cset(obj, pidx, val)
  } else {
    let remaining = parts.slice(1)
    return cset(obj, pidx, pset(obj[pidx], remaining, val))
  }
}

It may seem a little complex but I think it’s easy to understand: The parts line in the beginning just checks to see if the path is an array or a string and splits the string to its parts. The cset function that follows is a local function that is used to make the copy of the object or array and set the new value. Here’s how it works: It first checks to see if the val parameter is a function or not. If it is a function it will apply this function to the object’s value at that index to get newval, else it will just use val as newval. After that it checks if the object it got is an array or not. If it is an array it will do the slice trick we saw before to copy the elements of the array except for the one at the index, where it will put newval (notice that the index at that point must be numerical but that’s up to you to assert). If the current obj is not an array then it must be an object, thus it uses the spread syntax to copy the object’s attributes and reassign the current index to newval.

The last part of pset is similar to pset0; it just uses cset to do the new object/array generation instead of doing it in place like pset0. As already explained, pset is called recursively until only one element remains on the path, in which case newval will be assigned to the current index of the current obj.

Let’s try to use pset for the following rather complex state:

let state2 = {
  'users': {
    'results': [
      {'name': 'Sera', 'groups': ['g1', 'g2', 'g3']},
      {'name': 'John', 'groups': ['g1', 'g2', 'g3']},
      {'name': 'Joe', 'groups': []}
    ],
    'pagination': {
      'total': 100,
      'perpage': 5,
      'number': 0
    }
  },
  'groups': {
    'results': [
    ]
    ,
    'total': 0
  }
}

Let’s call it four times, each call nested inside the next one, to change various attributes:

let new_state2 = pset(
    pset(
        pset(
            pset(state2, "users.results.2.groups.0", 'aa'),
        "users.results.0.name", x=>x.toUpperCase()),
    "users.pagination.total", x=>x+1),
'users.results.1.name', 'Jack')

And here’s the result:

{
    "users": {
        "results": [{
            "name": "SERA",
            "groups": ["g1", "g2", "g3"]
        }, {
            "name": "Jack",
            "groups": ["g1", "g2", "g3"]
        }, {
            "name": "Joe",
            "groups": ["aa"]
        }],
        "pagination": {
            "total": 101,
            "perpage": 5,
            "number": 0
        }
    },
    "groups": {
        "results": [],
        "total": 0
    }
}

This should be self explanatory.
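To tie this back to Redux, here is a minimal sketch of how pset could be used inside a reducer. The action types NEXT_PAGE and RENAME_USER and the small state object are assumptions made only for this example; pset itself is repeated from above so that the snippet runs on its own.

```javascript
const pset = (obj, path, val) => {
  const parts = (typeof path === 'string' || path instanceof String) ? path.split('.') : path
  // Copy obj (array or object) and put the new value at cidx.
  const cset = (obj, cidx, val) => {
    const newval = (typeof val === 'function') ? val(obj[cidx]) : val
    if (Array.isArray(obj)) {
      return [...obj.slice(0, cidx * 1), newval, ...obj.slice(cidx * 1 + 1)]
    }
    return { ...obj, [cidx]: newval }
  }
  const pidx = parts[0]
  if (parts.length === 1) {
    return cset(obj, pidx, val)
  }
  return cset(obj, pidx, pset(obj[pidx], parts.slice(1), val))
}

// Hypothetical reducer; the action types are assumptions for illustration.
const reducer = (state, action) => {
  switch (action.type) {
    case 'NEXT_PAGE':
      return pset(state, 'users.pagination.number', n => n + 1)
    case 'RENAME_USER':
      return pset(state, ['users', 'results', action.index, 'name'], action.name)
    default:
      return state
  }
}

const state = {
  users: {
    results: [{ name: 'Sera' }, { name: 'John' }],
    pagination: { total: 100, perpage: 5, number: 0 }
  }
}

const st1 = reducer(state, { type: 'NEXT_PAGE' })
const st2 = reducer(st1, { type: 'RENAME_USER', index: 1, name: 'Jack' })

console.log(st2.users.pagination.number) // 1
console.log(st2.users.results[1].name)   // Jack
console.log(state.users.pagination.number, state.users.results[1].name) // 0 John
```

Each case of the reducer stays a single expression, no matter how deep the modified value is.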

I’ve published the above immutable little helpers as an npm package: https://www.npmjs.com/package/poor-man-lens (yes I decided to use the poor man lens name instead of the immutable little helpers) - they are too simple and could be easily copied and pasted to your project but I’ve seen even smaller npm packages and I wanted to try to see if it is easy to publish a package to npm (answer: it is very easy - easier than python’s pip). Also there’s a github repository for these utils in case somebody wants to contribute anything or look at the source: https://github.com/spapas/poor-man-lens.

Notice that this package has been written in ES5 (and actually has a polyfill for Object.assign) thus you should probably be able to use it anywhere you want, even directly from the browser by directly including the pml.js file.

Conclusion

Using the above techniques you should be able to easily keep your state objects immutable. For simple cases you can stick to the spread syntax or Object.assign / Array.slice but for more complex cases you may want to consider either copying directly the pset and pget utils I explained above or just using the poor-man-lens npm package.