Aggregations and Ordering in Django

Filtering data is the same as asking which objects with given features we have. Queries to the database can answer some other questions. For example, how many objects do we have? How much money there is on all bank accounts?

We can also gather our data in groups and aggregate the values for each of them.

Another operation that may be really helpful is sorting data by its attributes. Let’s see how it works in Django ORM.

Ordering

By default, results returned by a QuerySet are ordered by the ordering tuple given by the ordering option in the model’s Meta. You can override this on a per-QuerySet basis by using the order_by method.

Let’s try to create a model that will give us information regarding the daily temperature. The methods are not quite accurate yet, but we may rely on historical data and statistics. The only model in our application is called DayWeather:

from django.db import models


class DayWeather(models.Model):
    date = models.DateField()
    precipitation = models.FloatField()
    temperature = models.FloatField()
    was_raining = models.BooleanField()

Are you curious to find the 3 coldest days in our database? Let’s look for an answer:

top_three_coldest_days = DayWeather.objects.order_by('temperature')[:3]

Remember that ordering works like sorting: the first value is the smallest and the last is the greatest. We call the order_by method for Object Manager with temperature as a parameter and it returns data sorted by temperature field.

This method might be applied for Object Manager or QuerySet; you may filter data and afterward order it or even make aggregation operations and sort the complete outcome.

How about getting the top 3 most sultry days from the database?

top_three_hottest_days = DayWeather.objects.order_by('-temperature')[:3]

All that changes is adding the minus to our parameter. The minus sign means the reversed order. In our case, we get the biggest values of temperature first.

Aggregation

When specifying the field to be aggregated in an aggregate function, Django will allow you to use the same double underscore notation that is used when referring to related fields in filters. Django will then handle any table joins that are required to retrieve and aggregate the related value.

Let’s proceed with our weather example. Say, if we want to know how many days will be rainy and gloomy next month, we should look at how many days it rained in the same month last year. Assume that we have variables last_year and next_month:

raining_days = DayWeather.objects.filter(
    date__year=last_year, date__month=next_month, was_raining=True
)

raining_days_forecast = raining_days.count()

We make a QuerySet and call the method count on it. We think it is a good approximation to count the number of raining days in the same month last year and base our forecast on this number.

We may also wonder about the average temperature next week. For this prediction we analyze the temperature for the past week:

from datetime import date, timedelta
from django.db.models import Avg

query = DayWeather.objects.filter(date__gt=date.today() - timedelta(days=7))

average_temperature = query.aggregate(average=Avg('temperature'))['average']

We consider the aggregate method and pass any custom name as a parameter. The value of this parameter is a special function Avg, and the parameter of the Avg function is the name of the field we need to process.

The result of this function is Python’s dictionary {‘average’: …}. We get the value of the average temperature by the custom name we choose early in the aggregate method.

Group by Aggregations

We look through Django aggregation functions and find Avg, Count, Max, Min, StdDev, Sum, Variance. We can apply any of these functions to the numerical field values of the QuerySet.

The other task is to predict the total precipitation for each month for a whole year ahead. Should we create twelve QuerySets and process them one by one? Well, this is one way to do it; the other is to group values by month:

from django.db.models import Sum

precipitation = DayWeather.objects.filter(date__year=last_year) \
                          .values('date__month') \
                          .annotate(sum=Sum('precipitation'))

# precipitation is <Queryset [{'date__month': 1, 'sum': ...}, ...]>

We make a query and group our values by the month calling values method and passing a field or a field lookup to it. Then we call the annotate method; the syntax rules for it are the same as for aggregate in the previous example.

The result this time is a QuerySet consisting of customized objects in the form of dictionaries. We can access each object by index or convert the QuerySet to Python collection and work with it as we would with any other collection.

Count Function vs Count Method

The last prediction we want to make is the number of warm days with the outside temperature greater than or equal to 20 degrees Celsius per week. We again look at the values of the last year for each week:

from django.db.models import Count

warm_days = DayWeather.objects.filter(date__year=last_year) \
                      .filter(temperature__gte=20) \
                      .values('date__week') \
                      .annotate(count=Count('date'))

# warm_days is <Queryset [{'date__week': 1, 'count': ...}, ...]>

The call is similar to the previous one, but this time we pass the function Count to the annotated method.

There is a difference between the Count function and the count method of a QuerySet. The Count function returns statistics for each annotated value; the method returns the number of elements in the whole QuerySet.