/var/

Various programming stuff

Hello! If you are using an ad blocker but find something useful here and want to support me please consider disabling your ad blocker for this site.

Thank you,
Serafeim

Multiple storages for the same FileField in Django

When you need to support user-uploaded files from Django (usually called media) you will probably use a FileField in your models. This is translated to a simple varchar (text) field in the database that contains a unique identifier for the file. This usually would be the path to the file, however this is not always the case!

What really happens is that Django has an underlying concept called a File Storage which is a class that has information on how to talk to the actual storage backend, and particularly how to translate the unique identifier stored on the db to an actual file object. By default Django stores files in the file system using the FileSystemStorage however it is possible to use different backends through an add-on (for example Amazon S3) or even write your own.

Each FileField can be configured to use a different storage backend by passing the storage parameter; if you don’t use this parameter then the default storage backend is used. So you can easily configure a FileField that would upload files to your filesystem and another one that would upload files to S3.

However, one thing that is not supported though is to use multiple storages for the same FileField depending on some parameter of the model instance. Unfortunately, in a recent project I had to do exactly that: We had a FileField on a model that contained hundreds of GBs of files stored on the filesystem; we wanted to be able to upload the files of new instances of that model on S3 but also wanted to keep the old files on the filesystem to avoid moving all these to S3 (which would result to a lot of downtime). I also wanted a way to be “flexible” on this i.e to be able to change again the storage backend for some instances if needed and definitely not move/copy all these files!

If you take a peek at the FileField options you’ll see that there’s a storage parameter that can be a callable. However this callable is initialized with the models and is not evaluated again until the app is restarted so it can’t be used to decide on the storage for each model instance.

The only thing that is evaluated each time a file is uploaded through the FileField is when upload_to is a function. This function receives the model instance and returns the path that the file will be uploaded to.

The idea is to use this upload_to function to return a different path depending on the model instance and then use a custom storage backend that will use the path to decide on the actual storage backend to use.

This is the code I ended up with for the upload_to function:

def file_upload_path(instance, filename):
    dt_str = app.created_on.strftime("%Y/%m/%d")
    file_storage = ""

    if instance.id >= settings.STORAGE_CHANGE_ID: 
        file_storage = settings.STORAGE_SELECTION_STR + "/"

    return "protected/{0}{1}/{2}/{3}".format(file_storage, dt_str, instance.id, filename)

class Model(models.Model):
    file = models.FileField(upload_to=file_upload_path)

What happens here is that I have a setting STORAGE_CHANGE_ID that is the id of the instance after which all instances will use the different storage backend. You can use whatever method you want here to decide on the storage that would be used; the only thing to keep in mind is to put the storage somewhere on the returned path.

I also have a setting STORAGE_SELECTION_STR that is the string that will be used in the path to differentiate the storage backend. The STORAGE_SELECTION_STR has the value of minios3 for this project.

Using this function the paths of the instances that are >= STORAGE_CHANGE_ID will be of the form protected/minios3/2021/04/11/1234/filename.ext while for the old files these will be of the form protected/2021/04/11/1234/filename.ext. Notice the minios3 string in between.

Of course this is not enough. We also need to tell Django to use the different storage backend for the new files. In order to do this we have to implement a custom storage class like this:

from django.core.files.storage import FileSystemStorage, Storage
from storages.backends.s3boto3 import S3Boto3Storage
from django.conf import settings


class FilenameBasedStorage(Storage):
    minio_choice = settings.STORAGE_SELECTION_STR

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def _open(self, name, mode="rb"):
        if self.minio_choice in name:
            return S3Boto3Storage().open(name, mode)
        else:
            return FileSystemStorage().open(name, mode)

    def _save(self, name, content):
        if self.minio_choice in name:
            return S3Boto3Storage().save(name, content)
        else:
            return FileSystemStorage().save(name, content)

    def delete(self, name):
        if self.minio_choice in name:
            return S3Boto3Storage().delete(name)
        else:
            return FileSystemStorage().delete(name)

    def exists(self, name):
        if self.minio_choice in name:
            return S3Boto3Storage().exists(name)
        else:
            return FileSystemStorage().exists(name)

    def size(self, name):
        if self.minio_choice in name:
            return S3Boto3Storage().size(name)
        else:
            return FileSystemStorage().size(name)

    def url(self, name):
        if self.minio_choice in name:
            return S3Boto3Storage().url(name)
        else:
            return FileSystemStorage().url(name)

    def path(self, name):
        if self.minio_choice in name:
            return S3Boto3Storage().path(name)
        else:
            return FileSystemStorage().path(name)

This class should be self explainable: It uses the settings.STORAGE_SELECTION_STR we mentioned above to decide which storage backend to use and then it forwards each method to the corresponding backend (either the filesystem storage or the S3 storage).

One thing to notice is that the django.core.files.storage.Storage class this class inherits from has more methods that can be implemented (and would raise if called without implementing them) however this implementation works fine for my needs.

One question some readers may have is what happens if the user uploads a file named test-minios3.pdf (i.e a file containing the STORAGE_SELECTION_STR). Well you may just as well ignore it; it will just be saved always on the minio-s3 storage backend. Or you can make sure to remove that string from the filename before saving it on the file_upload_path. I chose to ignore it since it doesn’t matter for my use case.

Finally, we need to tell Django to use this storage class for the file field. We can do this by adding it to the FileField like:

file = models.FileField(upload_to=file_upload_path, storage=FilenameBasedStorage)

or we can configure it on the DEFAULT_FILE_STORAGE setting (for Django < 4.2) or on the STORAGES dict (for Django >= 4.2).

I hope this helps someone else that needs to do something similar!

Comments