/var/

Various programming stuff

Declarative Ecto query filters

Continuing my Elixir journey I’d like to discuss here a method to implement one of my favorite Django features: Declarative query filters. This functionality is not a core Django feature but it is offered through the excellent django-filter package: Using this, you can create a Filter class which defines which fields are to be used for filtering the queryset and how each field will be queried (i.e using things like exact, like, year of date etc).

This is a functionality I am greatly missing in Elixir/phoenix so I’ve tried implementing it on my own. Of course, django-filter has various other capabilities that result from the implicit generation of things that Django offers like automatically creating the html for the declared fields, automatically declare the fields based on their types etc but such things are not supported by phoenix in any case so I won’t be trying in them here.

During my research I’ve seen a bunch of blog posts or packages about this thing however they didn’t properly support joins (i.e you could only filter on fields on a specific schema) or needed too much work to filter on joins (i.e define different filters for each part of the join). In the solution I’ll present here you’ll just define a filter for the specific query you need to filter no matter how many joins it has (just like in django-filters).

What will it do

The solution is more or less a self contained Elixir module named QueryFilterEx that can be used to declaratively filter your queries. To use that you’ll need to declare your filters using a simple array of maps. The filters should then be added in your form using a different input for each filter; then your queryset will be filtered with all the values you’ve added to the inputs using AND.

The module has a very simple API consisting of three functions:

  • get_changeset_from_params(params, filters): Pass it the GET request parameters you got from your form and the declared filters array to return you a proper changeset (which you can then use to build your form in your html)
  • make_filter_changeset(filters, params): This function actually generates the changeset using the filters and a Map of filter_name: value pairs (it is actually used by get_changeset_from_params)
  • filter(query, changeset, filters): Filter the query using the previously created changeset and the declared filters array

You can find a sample of the technique presented in this article in my PHXCRD repository: https://github.com/spapas/phxcrd for example in the user_controller or authority_controller.

Preparing the query

In order to use the QueryFilterEx module you’ll need to properly “prepare” your Ecto query. By preparing I don’t mean a big deal just the fact that you’ll need to name all your relations (or at least name all the relations you’re going to use for filtering). This is very simple to do, for example for the following query:

from(a in Authority,
  join: ak in AuthorityKind,
  on: [id: a.authority_kind_id],
  preload: [authority_kind: ak]
)

you can name the relations by adding two as: atoms like this:

from(a in Authority, as: :authority,
  join: ak in AuthorityKind, as: :authority_kind,
  on: [id: a.authority_kind_id],
  preload: [authority_kind: ak]
)

So after each join: you’ll add a name for your joined relation (and also add a name for your initial relation). Please notice that you can use any name you want for these (not related to the schema names).

Declaring the filters

To declare the filters you’ll just add an array of simple Elixir maps. Each map must have the following fields:

  • :name This is the name of the specific filter; it is mainly used in conjunction with the queryset and the form fields to set initial values etc
  • :type This is the type of the specific filter; it should be a proper Ecto type like :string, :date, :integer etc. This is needed to properly cast the values and catch errors
  • :binding This is the name of the relation this filter concerns which you defined in your query using :as (discussed in previous section)
  • :field_name This is the actual name of the field you want to filter on
  • :method How to filter on this field; I’ve defined a couple of methods I needed but you can implement anything you want

The methods I’ve implemented are the following:

  • :eq Equality
  • :ilike Field value starts with the input - ignore case
  • :icontains Field value contains the input - ignore case
  • :year Field is a date or datetime an its year is the same as the value
  • :date Field is a datetime and its date part is equal to the value

Anything else will just be compared using = (same as :eq).

Integrating it with a controller

As an example let’s see how QueryFilterEx is integrated it with the phxcrd user_controller. The query I’d like to filter on is the following (see that everything I’ll need is named using :as:

from(u in User, as: :user,
  left_join: a in Authority, as: :authority,
  on: a.id == u.authority_id,
  left_join: up in UserPermission,
  on: up.user_id == u.id,
  left_join: p in Permission, as: :permission,
  on: up.permission_id == p.id,
  preload: [authority: a, permissions: p]
)

To declare the filters I like to create a module attribute ending with filters, something like @user_filters for example. Here’s the filters I’m going to use for user_controller:

@user_filters [
  %{name: :username, type: :string, binding: :user, field_name: :username, method: :ilike},
  %{name: :authority_name, type: :string, binding: :authority, field_name: :name, method: :icontains},
  %{name: :permission_name, type: :string, binding: :permission, field_name: :name, method: :ilike},
  %{name: :last_login_date, type: :date, binding: :user, field_name: :last_login, method: :date}
]

So it will check if the user.username and permission.name start with the passed value, authority.name contains the passed value and if the user.login_date (which is a datetime) is the same as the passed date value.

Finally, here’s the full code of the index controller:

def index(conn, params) do
  changeset = QueryFilterEx.get_changeset_from_params(params, @user_filters)

  users =
    from(u in User,
      as: :user,
      left_join: a in Authority, as: :authority,
      on: a.id == u.authority_id,

      left_join: up in UserPermission,
      on: up.user_id == u.id,
      left_join: p in Permission, as: :permission,
      on: up.permission_id == p.id,
      preload: [authority: a, permissions: p]
    )
    |> QueryFilterEx.filter(changeset, @user_filters)
    |> Repo.all()

  render(conn, "index.html", users: users, changeset: changeset)
end

It is very simple, it just uses the get_changeset_from_params method I discussed before to generate the changeset and then uses it to filter the query. Also please notice that it passes the changeset to the template to be properly rendered in the filter form.

The template

The template for the user index action is the following:

<%= form_for @changeset, AdminRoutes.user_path(@conn, :index), [method: :get, class: "filter-form", as: :filter],  fn f -> %>
  <%= label f, :username, gettext "Username" %>
  <%= text_input f, :username  %>

  <%= label f, :authority_name, gettext "Authority name" %>
  <%= text_input f, :authority_name  %>

  <%= label f, :permission_name, gettext "Permission name" %>
  <%= text_input f, :permission_name  %>

  <%= label f, :last_login_date, gettext "Last login date" %>
  <%= text_input f, :last_login_date  %>
  <%= error_tag f, :last_login_date %>

  <%= submit gettext("Filter"), class: "ml-5" %>
  <%= link gettext("Reset"), to: AdminRoutes.user_path(@conn, :index), class: "button button-outline ml-2" %>
<% end %>
<%= for user <- @users do %>
<!-- display the user info -->
<% end %>

Notice that it gets the @changeset and uses it to properly fill the initial values and display error messages. For this case I’ve only added an error_tag for the :last_login_date field, the others since are strings do not really need it since they will accept all values.

Also, the form method form must be :get since we only filter (not change anything) and I’ve passed the as: :filter option to the form_for to collect the parameters under the filter server side parameter (this can be anything you want and can be optionally be passed to QueryFilterEx.get_changeset_from_params to know which parameter the filters are collected on).

How does this work?

In this section I’ll try to explain exactly how the QueryFilterEx module works. Before continuing I want to thank the people at the Elixir forum and #elixir-lang Freenode IRC chat that helped me with understanding how to be able to create dynamic bindings.

So I’ll split this explanation in two parts: Explain QueryFilterEx.get_changeset_from_params and make_filter_changeset (easy) and then explain QueryFilterEx.filter (more difficult).

QueryFilterEx.get_changeset_from_params and make_filter_changeset

This function generates a changeset using the GET request parameters and the list of declared filters. The create changeset is a schemaless one since it may contains fields of various schemas (or fields that are not even exist on a schema). To generate it it uses the cast/4 function passing it a {data, types} first parameter to generate the schemaless changeset. It has two public methods: get_changeset_from_params and make_filter_changeset. The get_changeset_from_params is the one we’ve used to integrate with the controller and is used to retrieve the filter parameters from the request parameters based on the collect parameter of the form we mentioned before (the as: :filter). If such parameters are found they will be passed to make_filter_changeset (or else it will pass an empty struct). Notice that the filter_name by default is "filter" but you can change it to anything you want.

def get_changeset_from_params(params, filters, filter_name \\ "filter") do
  case params do
    %{^filter_name => filter_params} ->
      filters |> make_filter_changeset(filter_params)

    _ ->
      filters |> make_filter_changeset(%{})
  end
end

The make_filter_changeset is the function that actually creates the schemaless changeset. To do that it uses two private functions that operate on the passed filters array: make_filter_keys to extract the :name field of each key filter and the make_filter_types to generate a Map of %{name: :type} as needed by the types of the {data, types} tuple passed to cast (the data is just an empty Map):

defp make_filter_keys(filters) do
  filters |> Enum.map(& &1.name)
end

defp make_filter_types(filters) do
  filters |> Enum.map(&{&1.name, &1.type}) |> Map.new()
end

def make_filter_changeset(filters, params) do
  data = %{}
  types = filters |> make_filter_types

  {data, types}
  |> Ecto.Changeset.cast(params, filters |> make_filter_keys)
  |> Map.merge(%{action: :insert})
end

One interesting thing here is the Map.merge(%{action: :insert}) that is piped to the generated changeset. This is needed to actually display the validation errors, if there’s no action to the changeset (and there won’t be since we aren’t going do any updates to the database with this changeset) then the casting errors won’t be displayed.

Please notice that although I use the get_changeset_from_params in my controller the important function here is the make_filter_changeset. The get_changeset_from_params is mainly used to retrieve the filter-related GET query parameter; however to use QueryFilterEx you can just create (however you want) a Map of filter_name: value pairs and pass it to make_filter_changeset to get the changeset.

QueryFilterEx.filter

The filter method gets three parameters. The query, the changeset (that was created with make_filter_changeset) and the declared filters. This function will then check all declared filters one by one and see if the changeset contains a change for this filter (i.e if the field has a value). If yes it will append a where/3 to the query based on the passed value of the changeset and the declared filter :method.

To do that it just uses Enum.reduce starting with the initial query as an accumulator and reducing on all the declared filters:

def filter(query, changeset, filters) do
  changes = Map.fetch!(changeset, :changes)
  filters |> Enum.reduce(query, creat_where_clauses_reducer(changes))
end

defp creat_where_clauses_reducer(changes) do
  fn %{name: name, field_name: field_name, binding: binding, method: method}, acc ->
    case Map.fetch(changes, name) do
      {:ok, value} ->
        acc |> creat_where_clause(field_name, binding,  method, value)

      _ ->
        acc
    end
  end
end

Notice that the creat_where_clauses_reducer function returns a function (the reducer) that reduce will use. This function checks to see if the current changes of the changeset contain the filter_name:. If yes it will pass the following values to the creat_where_clause function:

  • The accumulated query (acc)
  • The field_name:, :binding and :method values of the current filter
  • The value of the changes of the changeset

If the current filter_name is not contained in the changes then it just returns the accumulated query as it is.

Let’s now take a look at the creat_where_clause function:

defp creat_where_clause(acc, field_name, binding,  method, value) do
  case method do
    :eq -> acc |> where(
      [{^binding, t}],
      field(t, ^field_name) == ^value
    )
    :ilike -> acc |> where(
      [{^binding, t}],
      ilike(field(t, ^field_name), ^("#{value}%") )
    )
    :icontains -> acc |> where(
      [{^binding, t}],
      ilike(field(t, ^field_name), ^("%#{value}%") )
    )
    :year -> acc  |> where(
      [{^binding, t}],
      fragment("extract (year from ?) = ?", field(t, ^field_name), ^value)
    )
    :date -> acc  |> where(
      [{^binding, t}],
      fragment("? >= cast(? as date) and ? < (cast(? as date) + '1 day'::interval"), field(t, ^field_name), ^value, field(t, ^field_name), ^value)
    )
    _ -> acc |> where(
      [{^binding, t}],
      field(t, ^field_name) == ^value
    )

  end
end

This function is just a simple case that pipes the accumulated query to a different where clause depending on the method:. Let’s take a closer look at what happens when :method == :eq:

acc |> where(
  [{^binding, t}],
  field(t, ^field_name) == ^value
)

This may seem a little confusing so let’s take a look at a simple where first:

from(u in User) |> where([u], u.name == "root") |> Repo.all()

Nothing fancy here, now let’s add a named query:

from(u in User, as: :user) |> where([user: u], u.name == "root") |> Repo.all()

Notice that now we can declare that u is an alias for the users named binding. What if we used the tuples syntax for the user: u instead of the keyword one:

from(u in User, as: :user) |> where([{:user, u}], u.name == "root") |> Repo.all()

Yes this still works. What if we wanted to use a variable for the binding name in the where?

binding = :user
from(u in User, as: :user) |> where([{^binding, u}], u.name == "root") |> Repo.all()

I think it starts to make sense now, let’s finally use a variable for the field name also:

binding = :user
field_name = :name
from(u in User, as: :user) |> where([{^binding, u}], field(u, ^field_name) == "root") |> Repo.all()

So this is exactly how this works!

Beyond the :eq I’ve got the definitions for the other methods I described there, the most complex one is probably the :date which is something like:

where(
  [{^binding, t}],
  fragment("? >= cast(? as date) and ? < (cast(? as date) + '1 day'::interval"), field(t, ^field_name), ^value, field(t, ^field_name), ^value)
)

What this does is that it generates the following SQL fragment:

field_name >= cast(value as date) AND field_name < (cast(value as date) + '1 day'::interval)

You can add your own methods by adding more clauses to the case of the creat_where_clause function and following a similar pattern.

Conclusion

By using the QueryFilterEx module presented here you can very quickly declare the fields you want to filter on and the method you want to use for each field no matter if these fields are in the same schema or are accessed through joins. You can easily extend the functionality of the module by adding your own methods. The only extra thing you need to do is to just add names to your queries.

Phoenix forms integration with select2 and ajax

During the past months I’ve tried to implement a project using Elixir and the Phoenix framework. Old visitors of my blog will probably remember that I mainly use Django for back-end development but I decided to also give Phoenix a try.

My first impressions are positive but I don’t want to go into detail in this post; I’ll try to add a more extensive post comparing Elixir / Phoenix with Python / Django someday.

The problem that this particular post will try to explain is how to properly integrate a jQuery select2 dropdown ajax with autocomplete search to your Phonix Forms. This seems like a very common problem however I couldn’t find a proper solution anywhere in the internet. It seems that most people using Phoenix prefer to implement their autocompletes using SPA like functionality (react etc). Also I found this project that seems to be working, however it does not use select2 and I really didn’t like to mess with a different JS library for reasons that should be too obvious to most people.

So here we’ll implement a simple solution for allowing your foreign key value to be autocompleted through ajax using select2. The specific example is that you have a User that belongs to an Authority i.e user has a field named authority_id which is a foreign key to authority. We’ll add a functionality to the user edit form to select the authority using ajax-autocomplete.

Please notice that you can find a working version of this tutorial in my Phoenix Crud template project: https://github.com/spapas/phxcrd. This project contains various other functionality that I need but you should be able to test the user - authority integration by following the instructions there.

The schemas

For this tutorial, we’ll use two schemas: A User and an Authority. Each User belongs to an Authority (thus will have a foreign key to Authority; that’s what we want to set using the ajax select2). Here are the ecto schemas for these entities:

defmodule Phxcrd.Auth.Authority do
  use Ecto.Schema

  import Ecto.Changeset
  alias Phxcrd.Auth.User

  schema "authorities" do
    field :name, :string
    has_many :users, User, on_replace: :nilify
    timestamps()
  end

  @doc false
  def changeset(authority, attrs) do
    authority
    |> cast(attrs, [:name])
    |> validate_required([:name], message: "The field is required")
    |> unique_constraint(:name, message: "The name already exists!")
  end

  use Accessible
end
defmodule Phxcrd.Auth.User do
  use Ecto.Schema

  import Ecto.Changeset
  alias Phxcrd.Auth.Authority

  schema "users" do
    field :email, :string
    field :username, :string
    field :password_hash, :string
    field :password, :string, virtual: true

    belongs_to :authority, Authority

    timestamps()
  end

  @doc false
  def changeset(user, attrs) do
    user
    |> cast(attrs, [:username, :email, :authority_id])
    |> validate_required([:username, :email ])
  end

  use Accessible
end

Notice that both these entities are contained in the Auth context and were created using mix phx.gen.html; I won’t include the migrations here.

The search API

Let’s now take a look at the search api for Authority. I’ve added an ApiController which contains the following function:

def search_authorities(conn, params) do
    q = params["q"]

    authorities =
      from(a in Authority,
        where: ilike(a.name, ^"%#{q}%")
      )
      |> limit(20)
      |> Repo.all()

    render(conn, "authorities.json", authorities: authorities)
end

Notice that this retrieves a q parameter and makes an ilike query to Authority.name. It then passes the results to the view for rendering. Here’s the corresponding function for ApiView:

def render("authorities.json", %{authorities: authorities}) do
    %{results: Enum.map(authorities, &authority_json/1)}
  end

  def authority_json(a) do
    %{
      id: a.id,
      text: a.name
    }
end

Notice that select2 wants its results in a JSON struct with the following form {results: [{id: 1, name: "Authority 1"}]}.

To add this controller action to my routes I’ve added this to router.ex:

scope "/api", PhxcrdWeb do
    pipe_through :api

    get "/search_authorities", ApiController, :search_authorities
end

Thus if you visit http://127.0.0.1/search_authorities?q=A you should retrieve authorities containing A in their name.

The controller

Concenring the UserController I’ve added the following methods to it for creating and updating users:

def new(conn, _params) do
  changeset = Auth.change_user(%User{})
  render(conn, "new.html", changeset: changeset)
end

def create(conn, %{"user" => user_params}) do
  case Auth.create_user(user_params) do
    {:ok, user} ->
      conn
      |> put_flash(:info, "#{user.name} created!")
      |> redirect(to: Routes.user_path(conn, :show, user))

    {:error, changeset} ->
      render(conn, "new.html", changeset: changeset)
  end
end

def edit(conn, %{"id" => id}) do
  user = Auth.get_user!(id)
  changeset = Auth.change_user(user)
  render(conn, "edit.html", user: user, changeset: changeset)
end

def update(conn, %{"id" => id, "user" => user_params}) do
  user = Auth.get_user!(id)

  user_params = Map.merge(%{"authority_id" => nil}, user_params)

  case Auth.update_user(user, user_params) do
    {:ok, user} ->
      conn
      |> put_flash(:info, "User updated successfully.")
      |> redirect(to: Routes.user_path(conn, :show, user))

    {:error, %Ecto.Changeset{} = changeset} ->
      render(conn, "edit.html", user: user, changeset: changeset)
  end
end

Most of these are more or less the default things that mix phx.gen.html creates. One thing that may seem a strange here is the user_params = Map.merge(%{"authority_id" => nil}, user_params) line of update. What happens here is that I want to be able to clear the authority of a user (I’ll explain how in the next sections). If I do that then the user_params that is passed to update will not contain an authority_id key thus the authority_id won’t be changed at all (so even though I cleared it, it will keep its previous value after I save it). To fix that I set a default value of nil to authority_id; if the user has actually selected an authority from the form this will be overriden when merging the two maps. So the resulting user_params will always contain an authority_id key, either set to nil or to the selected authority.

Beyond that I wont’ go into detail explaining the above functions, but if something seems strange feel free to ask. I also won’t explain the Auth.* functions; all these are created by phoenix in the context module.

The view

The UserView module contains a simple but very important function:

def get_select_value(changeset, attr) do
  case changeset.changes[attr] do
    nil -> Map.get(changeset.data, attr)
    z -> z
  end
end

This functions gets two parameters: The changeset and the name of the attribute (:authority_id in our case). What it does is to first check if this attribute is contained in the changeset.changes; if yes it will return that value. If it isn’t contained in the changeset.changes then it will return the value of changeset.data for that attribute.

This is a little complex but let’s try to understand its logic: When you start editing a User you want to display the current authority of that instance. However, when you submit an edited user and retrieve an errored form (for example because you forgot to fill the username) you want to display the authority that was submitted in the form. So the changeset.changes contains the changes that were submitted just before while the changeset.data contain the initial value of the struct.

Update 02/07/2019: Please notice that instead of using the get_select_value I presented before you can use the Ecto.Changeset.get_field function that does exactly this! So get_select_value could be defined like this:

def get_select_value(changeset, attr) do
  changeset |> Ecto.Changeset.get_field(attr)
end

The form template

Both the :new and :edit actions include a common form.html.eex template:

<%= form_for @changeset, @action, fn f -> %>
  <%= if @changeset.action do %>
  <div class="alert alert-danger">
    <p><%= gettext("Problems while saving") %></p>
  </div>
  <% end %>
  <div class='row'>
    <div class='column'>
      <%= label f, :username %>
      <%= text_input f, :username %>
      <%= error_tag f, :username %>
    </div>
    <div class='column'>
      <%= label f, :email %>
      <%= text_input f, :email %>
      <%= error_tag f, :email %>
    </div>
  </div>

  <div class='row'>
    <div class='column'>
      <%= label f, :authority %>
      <%= select(f,
        :authority_id, [
          (with sv when not is_nil(sv) <- get_select_value(@changeset, :authority_id),
                                     a <- Phxcrd.Auth.get_authority!(sv), do: {a.name, a.id})
        ],
        style: "width: 100%")
        %>
      <%= error_tag f, :authority_id %>
    </div>

  </div>

  <div>
    <%= submit gettext("Save") %>
  </div>
<% end %>

This is a custom Phoenix form but it has the following addition which is more or less the meat of this article (along with the get_select_value function I explained before):

select(f, :authority_id, [
        (with sv when not is_nil(sv) <- get_select_value(@changeset, :authority_id),
                                   a <- Phxcrd.Auth.get_authority!(sv), do: {a.name, a.id})
      ],
      style: "width: 100%")

So this will create an html select element which will contain a single value (the array in the third parameter of select): The authority of that object or the authority that the user had submitted in the form. For this it uses get_select_value to retrieve the :authority_id and if it’s not nil it passes it to get_authority! to retrieve the actual authority and return a tuple with its name and id.

By default when you create a select element you’ll pass an array of all options in the third parameter, for example:

select(f, :authority_id, Phxcrd.Auth.list_authorities |> Enum.map(&{&1.name, &1.id}))

Of course this beats the purpose of using ajax since all options will be rendered.

The final step is to add the required custom javascript to convert that select to select2-with-ajax:

$(function () {
    $('#user_authority_id').select2({
      allowClear: true,
      placeholder: 'Select authority',
      ajax: {
        url: '<%= Routes.api_path(@conn, :search_authorities) %>',
        dataType: 'json',
        delay: 150,
        minimumInputLength: 2
      }
    });
})

The JS very rather simple; the allowClear option will display an x so that you can clear the selected authority while the ajax url will be that of the :search_authorities.

Conclusion

Although this article may seem a little long, as I’ve already mentioned the most important thing to keep is how to properly set the value that should be displayed in your select2 widget. Beyond that everything is a walk in the park by following the docs.

How to create a custom filtered adapter in Android

Introduction

Android offers a nice component named AutoCompleteTextView that can be used to auto-fill a text box from a list of values. In its simplest form, you just create an array adapter passing it a list of objects (that have a proper toString() method). Then you type some characters to the textbox and by default it will filter the results searching in the beginning of the backing object’s toString() result.

However there are times that you don’t want to look at the beginning of the string (because you want to look at the middle of the string) or you don’t want to just to search in toString() method of the object or you want to do some more fancy things in object output. For this you must override the ArrayAdapter and add a custom Filter.

Unfurtunately this isn’t as straightforward as I’d like and I couldn’t find a quick and easy tutorial on how it can be done.

So here goes nothing: In the following I’ll show you a very simple android application that will have the minimum viable custom filtered adapter implementation. You can find the whole project in github: https://github.com/spapas/CustomFilteredAdapeter but I am going to discuss everything here also.

The application

Just create a new project with an empty activity from Android Studio. Use kotlin as the language.

The layout

I’ll keep it as simple as possible:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout
        xmlns:android="http://schemas.android.com/apk/res/android"
        xmlns:tools="http://schemas.android.com/tools"
        android:orientation="vertical"
        android:layout_width="match_parent" android:layout_height="match_parent"
        tools:context=".MainActivity">
    <TextView
            android:layout_width="match_parent" android:layout_height="wrap_content"
            android:text="Hello World!"
            android:textSize="32sp"
            android:textAlignment="center"/>
    <AutoCompleteTextView
            android:layout_marginTop="32dip"
            android:layout_width="match_parent" android:layout_height="wrap_content"
            android:id="@+id/autoCompleteTextView"/>
</LinearLayout>

You should just care about the AutoCompleteTextView with an id of autoCompleteTextView.

The backing data object

I’ll use a simple PoiDao Kotlin data class for this:

data class PoiDao(
    val id: Int,
    val name: String,
    val city: String,
    val category_name: String
)

I’d like to be able to search to both name, city and category_name of each object. To create a list of the pois to be used to the adapter I can do something like:

val poisArray = listOf(
    PoiDao(1, "Taco Bell", "Athens", "Restaurant"),
    PoiDao(2, "McDonalds", "Athens","Restaurant"),
    PoiDao(3, "KFC", "Piraeus", "Restaurant"),
    PoiDao(4, "Shell", "Lamia","Gas Station"),
    PoiDao(5, "BP", "Thessaloniki", "Gas Station")
)

The custom adapter

This will be an ArrayAdapter<PoiDao> implementing also the Filterable interface:

inner class PoiAdapter(context: Context, @LayoutRes private val layoutResource: Int, private val allPois: List<PoiDao>):
    ArrayAdapter<PoiDao>(context, layoutResource, allPois),
    Filterable {
    private var mPois: List<PoiDao> = allPois

    override fun getCount(): Int {
        return mPois.size
    }

    override fun getItem(p0: Int): PoiDao? {
        return mPois.get(p0)
    }

    override fun getItemId(p0: Int): Long {
        // Or just return p0
        return mPois.get(p0).id.toLong()
    }

    override fun getView(position: Int, convertView: View?, parent: ViewGroup?): View {
        val view: TextView = convertView as TextView? ?: LayoutInflater.from(context).inflate(layoutResource, parent, false) as TextView
        view.text = "${mPois[position].name} ${mPois[position].city} (${mPois[position].category_name})"
        return view
    }

    override fun getFilter(): Filter {
        // See next section
    }
}

You’ll see that we add an instance variable named mPois that gets initialized in the start with allPois (which is the initial list of all pois that is passed to the adapter). The mPois will contain the filtered results. Then, for getCount and getItem we return the corresponding valeus from mPois; the getItemId is used when you have an sqlite backed adapter but I’m including it here for completeness.

The getView will create the specific line for each item in the dropdown. As you’ll see the layout that is passed must have a text child which is set based on some of the attributes of the corresponding poi for each position. Notice that we can use whatever view layout we want for our dropdown result line (this is the layoutResource parameter) but we need to configure it (i.e bind it with the values of the backing object) here properly.

Finally we create a custom instance of the Filter, explained in the next section.

The custom filter

The getFilter creates an object instance of a Filter and returns it:

override fun getFilter(): Filter {
    return object : Filter() {
        override fun publishResults(charSequence: CharSequence?, filterResults: Filter.FilterResults) {
            mPois = filterResults.values as List<PoiDao>
            notifyDataSetChanged()
        }

        override fun performFiltering(charSequence: CharSequence?): Filter.FilterResults {
            val queryString = charSequence?.toString()?.toLowerCase()

            val filterResults = Filter.FilterResults()
            filterResults.values = if (queryString==null || queryString.isEmpty())
                allPois
            else
                allPois.filter {
                    it.name.toLowerCase().contains(queryString) ||
                    it.city.toLowerCase().contains(queryString) ||
                    it.category_name.toLowerCase().contains(queryString)
                }
            return filterResults
        }
    }
}

This object instance overrides two methods of Filter: performFiltering and publishResults. The performFiltering is where the actual filtering is done; it should return a FilterResults object containing a values attribute with the filtered values. In this method we retrieve the charSequence parameter and converit it to lowercase. Then, if this parameter is not empty we filter the corresponding elements of allPois (i.e name, city and category_name in our case) using contains. If the query parameter is empty then we just return all pois. Warning java developers; here the if is used as an expression (i.e its result will be assigned to filterResults.values).

After the performFiltering has finished, the publishResults method is called. This method retrieves the filtered results in its filterResults parameter. Thus it sets mPois of the custom adapter is set to the result of the filter operation and calls notifyDataSetChanged to display the results.

Using the custom adapter

To use the custom adapter you can do something like this in your activity’s onCreate:

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    val poisArray = listOf(
        // See previous sections
    )
    val adapter = PoiAdapter(this, android.R.layout.simple_list_item_1, poisArray)
    autoCompleteTextView.setAdapter(adapter)
    autoCompleteTextView.threshold = 3

    autoCompleteTextView.setOnItemClickListener() { parent, _, position, id ->
        val selectedPoi = parent.adapter.getItem(position) as PoiDao?
        autoCompleteTextView.setText(selectedPoi?.name)
    }
}

We create the PoiAdapter passing it the poisArray and android.R.layout.simple_list_item_1 as the layout. That layout just contains a textview named text. As we’ve already discussed you can pass something more complex here. The thresold defined the number of characters that the user that needs to enter to do the filtering (default is 2).

Please notice that when the user clicks (selects) on an item of the dropdown we set the contents of the textview (or else it will just use the object’s toString() method to set it).

Fixing your Django async job - database integration

I’ve already written two articles about django-rq and implementing asynchronous tasks in Django. However I’ve found out that there’s a very important thing missing from them: How to properly integrate your asynchronous tasks with your Django database. This is very important because if it is not done right you will start experiencing strange errors about missing database objects or duplicate keys. The most troublesome thing about these errors is that they are not consistent. Your app may work fine but for some reason you’ll see some of your asynchronous tasks fail with these errors. When you re-queue the async jobs everything will be ok.

Of course this behavior (code that runs sometimes) smells of a race condition but its not easy to debug it if you don’t know the full story.

In the following I will describe the cause of this error and how you can fix it. As a companion to this article I’ve implemented a small project that can be used to test the error and the fix: https://github.com/spapas/async-job-db-fix.

Notice that although this article is written for django-rq it should also help people that have the same problems with other async job systems (like celery or django-q).

Description of the project

The project is very simple, you can just add a url and it will retrieve its content asynchronously and report its length. For the models, it just has a Task model which is used to provide information about what we want to the asynchronous task to do and retrieve the result:

from django.db import models

class Task(models.Model):
    created_on = models.DateTimeField(auto_now_add=True)
    url = models.CharField(max_length=128)
    url_length = models.PositiveIntegerField(blank=True, null=True)
    job_id = models.CharField(max_length=128, blank=True, null=True)
    result = models.CharField(max_length=128, blank=True, null=True)

It also has a home view that can be used to start new asynchronous tasks by creating a Task object with the url we got and passing it to the asynchronous task:

from django.views.generic.edit import FormView
from .forms import TaskForm
from .tasks import get_url_length
from .models import Task

import time
from django.db import transaction

class TasksHomeFormView(FormView):
    form_class = TaskForm
    template_name = 'tasks_home.html'
    success_url = '/'

    def form_valid(self, form):
        task = Task.objects.create(url=form.cleaned_data['url'])
        get_url_length.delay(task.id)
        return super(TasksHomeFormView, self).form_valid(form)

    def get_context_data(self, **kwargs):
        ctx = super(TasksHomeFormView, self).get_context_data(**kwargs)
        ctx['tasks'] = Task.objects.all().order_by('-created_on')
        return ctx

And finally the asynchronous job itself that retrieves the task from the database, requests its url and saves its length:

import requests
from .models import Task
from rq import get_current_job
from django_rq import job

@job
def get_url_length(task_id):
    jb = get_current_job()
    task = Task.objects.get(
        id=task_id
    )
    response = requests.get(task.url)
    task.url_length = len(response.text)
    task.job_id = jb.get_id()
    task.result = 'OK'
    task.save()

The above should be fairly obvious: The user visits the homepage and enters a url at the input. When he presses submit the view will create a new Task object with the url that the user entered and fire-off the get_url_length asynchronous job passing the task id of the task that was just created. It will then return immediately without waiting for the asynchronous job to complete. The user will need to refresh to see the result of his job; this is the usual behavior with async jobs.

The asynchronous job on the other hand will retrieve the task whose id got as a parameter from the database, do the work it needs to do and update the result when it is finished.

Unfortunately, the above simple setup will probably behave erratically by randomly throwing database related errors!

Cause of the problem

In the previous section I said probably because the erratic behavior is caused by a specific setting of your Django project; the ATOMIC_REQUESTS. This setting can be set on your database connection and if it is TRUE then each request will be atomic. This means that each request will be tied with a database transaction i.e a transaction will be started when your request starts and commited only when your requests finishes; if for some reason your request throws an error then the transaction will be rolled back. An example of this setting is:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
        'ATOMIC_REQUESTS': True,
    }
}

Now, in my opinion, ATOMIC_REQUESTS is a great thing to have because it makes everything much easier. I always set it to True to my projects because I don’t need to actually think about transactions and requests; I know that if there’s a problem in a request the whole transaction will be rolle back and no garbage will be left in the database. If on the other hand for some reason a request does not need to be tied to a transaction I just set it off for this specific transaction (using transaction.non_atomic_requests_). Please notice that by default the ATOMIC_REQUESTS has a False value which means that the database will be in autocommit mode meaning that every command will be executed immediately.

So although the ATOMIC_REQUESTS is great, it is actually the reason that there are problems with asynchronous tasks. Why? Let’s take a closer look at what the form_valid of the view does:

def form_valid(self, form):
    task = Task.objects.create(url=form.cleaned_data['url']) #1
    get_url_length.delay(task.id) #2
    return super(TasksHomeFormView, self).form_valid(form) #3

It creates the task in #1, fires off the asynchronous task in #2 and continues the execution of the view processing in #3. The important thing to understand here is that the transaction will be commited only after #3 is finished. This means that there’s a possibility that the asynchronous task will be started before #3 is finished thus it won’t find the task because the task will not be created yet(!) This is a little counter-intuitive but you must remember that the async task is run by a worker which is a different process than your application server; the worker may be able to start before the transaction is commited.

If you want to actually see the problem every time you can add a small delay between the start of the async task and the form_valid something like this:

def form_valid(self, form):
    task = Task.objects.create(url=form.cleaned_data['url'])
    get_url_length.delay(task.id)
    time.sleep(1)
    return super(TasksHomeFormView, self).form_valid(form)

This will make the view more slow so the asynchronous worker will always have time to start executing the task (and get the not found error). Also notice that if you had ATOMIC_REQUESTS: False the above code would work fine because the task would be created immediately (auto-commited) and the async job would be able to find it.

The solution

So how is this problem solved? Well it’s not that difficult now that you know what’s causing it!

One solution would be to set ATOMIC_REQUESTS to False but that would make all database commands auto-commit so you’ll lose request-transaction-tieing. Another solution would be to set ATOMIC_REQUESTS to True and disable atomic requests for the specific view that starts the asynchronous job using transaction.non_atomic_requests_. This is a viable solution however I don’t like it because I’d lose the comfort of transaction per request for this specific request and I would need to add my own transaction handling.

A third solution is to avoid messing with the database in your view and create the task object in the async job. Any parameters you want to pass to the async job would be passed directly to the async function. This may work fine in some cases but I find it more safe to create the task in the database before starting the async job so that I have better control and error handling. This way even if there’s an error in my worker and for some reason the async job never starts or it breaks before being able to handle the database, I will have the task object in the database because it will have been created in the view.

Is there anything better? Isn’t there a way to start the executing the async job after the transaction of the view is commited? Actually yes, there is! For this, transaction.on_commit comes to the rescue! This function receives a callback that will be called after the transaction is commited! Thus, to properly fix you project, you should change the form_valid method like this:

def form_valid(self, form):
    task = Task.objects.create(url=form.cleaned_data['url'])
    transaction.on_commit(lambda: get_url_length.delay(task.id))
    time.sleep(1)
    return super(TasksHomeFormView, self).form_valid(form)

Notice that I need to use lambda to create a callback function that will call get_url_length.delay(task.id) when the transaction is commited. Now even though I have the delay there the async job will start after the transaction is commited, ie after the view handler is finished (after the 1 second delay).

Conclusion

From the above you should be able to understand why sometimes you have problems when your async jobs use the database. To fix it you have various options but at least for me, the best solution is to start your async jobs after the transaction is commited using transaction.on_commit. Just change each async.job.delay(parameters) call to transaction.on_commit(lambda: async.job.delay(parameters)) and you will be fine!

Use du to find out the disk usage of each directory in unix

One usual problem I have when dealing with production servers is that their disks get filled. This results in various warnings and errors and should be fixed immediately. The first step to resolve this issue is to actually find out where is that hard disk space is used!

For this you can use the du unix tool with some parameters. The problem is that du has various parameters (not needed for the task at hand) and the various places I search for contain other info not related to this specific task.

Thus I’ve decided to write this small blog post to help people struggling with this and also to help me avoid googling for it by searching in pages that also contain other du recipies and also avoid the trial and error that this would require.

So to print out the disk usage summary for a directory go to that directory and run du -h -s *; you need to have access to the child subdirectories so probably it’s better to try this as root (unless you go to your home dir for example).

Here’s a sample usage:

[root@server1 /]# cd /
[root@server1 /]# du -h -s *
7.2M    bin
55M     boot
164K    dev
35M     etc
41G     home
236M    lib
25M     lib64
20K     lost+found
8.0K    media
155G    mnt
0       proc
1.6G    root
12M     sbin
8.0K    srv
427M    tmp
3.2G    usr
8.9G    var

The parameters are -h to print human readable sizes (G, M etc) and -s to print a summary usage of each parameter. Since this will output the summary for each parameter I finally pass * to be changed to all files/dirs in that directory. If I used du -h -s /tmp instead I’d get the total usage only for the /tmp directory.

Another trick that may help you quickly find out the offending directories is to append the | grep G pipe command (i.e run du -h -s * | grep G) which will filter out only the entries containing a G (i.e only print the folders having more than 1 GB size). Yeh I know that this will also print entries that have also a G in their name but since there aren’t many directores that have G in their name you should be ok.

If you run the above from / so that /proc is included you may get a bunch of du: cannot access 'proc/nnn/task/nnn/fd/4': No such file or directory errors; just add the 2> /dev/null pipe redirect to redirect the stderr output to /dev/null, i.e run du -h -s * 2> /dev/null.

Finally, please notice that if there are lots of files in your directory you’ll get a lot of output entries (since the * will match both files and directories). In this case you can use echo */ or ls -d */ to list only the directories; append that command inside a ` pair or $() (to substitute for the command output) instead of the * to only get the sizes of the directories, i.e run du -h -s $(echo */) or du -h -s `echo */`.

One thing that you must be aware of is that this command may take a long time especially if you have lots of small files somewhere. Just let it run and it should finish after some time. If it takes too long time try to exclude any mounted network directories (either with SMB or NFS) since these will take extra long time.

Also, if you awant a nice interactive output using ncurses you can download and compile the ncdu tool (NCurses Disk Usage).