Django web framework. Ridiculously fast, fully loaded, reassuringly secure, exceedingly scalable, incredibly versatile.
With Django, you can take web applications from concept to launch in a matter of hours. Django takes care of much of the hassle of web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
For more information, visit Django's website at djangoproject.com/.
This guide describes how you can use Astra DB for your Django applications in a manner that is as idiomatic as possible within the Django way of doing things. The practices outlined here, in most cases, even make it possible to migrate an existing Django application to using Astra DB with minimal changes.
RDBMS-based applications and Astra DB
For more complex applications that fully leverage the capabilities of a relational database, such as foreign keys, adjustments of the data model would be needed according to the fundamental approach to data modeling in Astra DB (i.e. in Cassandra).
On this page we adopt and suggest using the django-cassandra-engine Python package, which essentially provides Django object models on a Cassandra backend. Note, however, that the package is not as feature-rich as its RDBMS counterpart, which in certain cases might require you to do a bit more manual plumbing.
This page comes with a fully working sample application as a companion repository, ready to be cloned and launched once you go through its setup steps. All you need is an Astra DB instance and a corresponding database token. Refer to the README on the repository for more details, including a full setup guide.
The reference application ("partyfinder") is a very simple vanilla Django website to browse, create and delete "parties" happening in given cities at specific dates. Additionally, to illustrate the use of advanced Cassandra-specific features (namely, LWTs), a sort of "count-me-in" feature is also implemented to keep a consistent count of who will be attending a given party.
Astra DB usage in Django
With the Cassandra package for Django, you can switch between databases in a mostly seamless way: development still follows the "object-mapper" philosophy of defining models for the entities in the database and, so to speak, letting the Django engine figure the rest out by itself.
A difference is that, instead of the native django.db.models.Model, you have to subclass DjangoCassandraModel. Moreover, to comply with the underlying CQL data types available for columns, the fields in a model are drawn from the cassandra.cqlengine.columns package. Finally, when creating a model for Cassandra, special syntax makes it possible to specify which part of the primary key is in the clustering columns. The following example comes from the reference application:
    import uuid

    from django.utils import timezone
    from cassandra.cqlengine import columns
    from django_cassandra_engine.models import DjangoCassandraModel

    # A model for this app
    class Party(DjangoCassandraModel):
        city = columns.Text(
            primary_key=True,
        )
        id = columns.UUID(
            primary_key=True,
            clustering_order='asc',  # (allowed: 'asc' , 'desc', lowercase)
            default=uuid.uuid4,
        )
        name = columns.Text()
        people = columns.Integer(default=0)
        date = columns.DateTime(default=timezone.now)

        class Meta:
            get_pk_field='id'
Pitfalls of using Models
With object mappers, and the available Cassandra models, you can handle most of an application's needs. Still, a word of caution about usage of models is in order.
Models, if used casually, may encourage the wrong read pattern on a Cassandra table: models implement methods such as .all(), which in general map to the dreaded "allow filtering" clause in terms of queries to Cassandra, and are generally to be avoided in production. Another example is that the model's .filter(...) method might be given filtering conditions that do not map to the sensible query patterns the table is designed for, thereby hindering performance or resulting in query timeouts. In short: do not let the model fool you; you still have to play by Cassandra's rules.
Inadvertently querying a table the wrong way
The table for Party objects above ends up having PRIMARY KEY (( city ), id). This means that to get a given party one should do something like:

    party = Party.objects.get(city=city, id=id)

It is worth noting that if you omit the city, the following line would raise no error:

    party = Party.objects.get(id=id)

and (assuming global uniqueness of the IDs) would even appear to work fine, but the underlying CQL query would be a performance killer in a production application.
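To make the cost of omitting the partition key concrete, here is a toy, in-memory picture of a table with PRIMARY KEY ((city), id). It is only an illustration: the dictionary layout and the helper functions are assumptions made for this sketch, not how Cassandra or the Django engine work internally.

```python
import uuid

# Toy stand-in for a table with PRIMARY KEY ((city), id):
# rows are grouped by partition key (city), then keyed by clustering column (id).
partitions = {}


def insert_party(city, party_id, name):
    """Store a row inside the partition for its city."""
    partitions.setdefault(city, {})[party_id] = {
        "city": city,
        "id": party_id,
        "name": name,
    }


def get_by_city_and_id(city, party_id):
    """Go straight to one partition; return (row, partitions visited)."""
    return partitions.get(city, {}).get(party_id), 1


def get_by_id_only(party_id):
    """Without the city, partitions are inspected one by one until a match."""
    visited = 0
    for rows in partitions.values():
        visited += 1
        if party_id in rows:
            return rows[party_id], visited
    return None, visited


# Invented sample data: one party per city.
ids = {city: uuid.uuid4() for city in ("Rome", "Paris", "Oslo")}
for city, party_id in ids.items():
    insert_party(city, party_id, f"Party in {city}")

row_fast, visited_fast = get_by_city_and_id("Oslo", ids["Oslo"])
row_slow, visited_slow = get_by_id_only(ids["Oslo"])
print(visited_fast, visited_slow)
```

Both calls return the same row, but the second one has to walk through every partition before reaching it. On a real cluster those partitions are spread over many nodes, which is why such a query scales so poorly.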
There is another reason to be wary of models: some of the advanced techniques to use
Cassandra simply don't fit into the models philosophy.
For these, you need to access the underlying
Session object and
directly run CQL code on it. Luckily, there is a way to do so, and it is exemplified
in the sample application (keep reading to see how to do it).
Implications of Cassandra data models
The proper road to a successful Cassandra (or Astra DB)-backed application starts from designing the right data model. But it is also possible to migrate existing Django applications: in most cases, as observed above, a one-to-one reformulation of the models would do the trick.
However, the no-relations, no-joins, no-foreign-keys nature of the NoSQL database at hand means that if the existing application makes use of these things, a bit of work is warranted to go back to data-modeling-related issues.
In other words, if models in your pre-existing, relational-based application contain RDBMS-related specifications such as foreign keys with cascading deletes, you have to consider more structural changes, such as moving the burden of joins or cascading deletes to the application itself or, even better, rethinking your tables (and models) in a way that works without these costly operations. Good tutorials and hands-on learning resources on data modeling with Cassandra are available online.
In some cases the best approach is to bypass the "model layer" altogether and directly execute CQL code on your Session object, taking care of manually handling the results in case of "read" queries.
For example, the Session is needed to use Batches, LWTs and work with TTL.
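As an illustration of one such case, a TTL is set by writing the CQL statement yourself. The sketch below assumes a cursor object exposing execute(query, params) with %s placeholders, as in the sample application's view code; the helper name insert_party_with_ttl and the choice of columns (taken from the Party model shown earlier) are assumptions made for the example.

```python
import uuid


def insert_party_with_ttl(cursor, city, name, ttl_seconds):
    """Insert a party row that Cassandra expires after ttl_seconds.

    `cursor` is assumed to expose execute(query, params) with %s placeholders.
    Model defaults (such as the date) are not applied when writing raw CQL.
    """
    party_id = uuid.uuid4()
    cursor.execute(
        "INSERT INTO party (city, id, name, people) "
        "VALUES (%s, %s, %s, %s) USING TTL %s",
        (city, party_id, name, 0, int(ttl_seconds)),
    )
    return party_id
```

In a view, the cursor would typically be obtained with connection.cursor() from django.db, and the returned UUID can be used to build a link to the new party.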
In the application example we have a feature that allows users to change the people field for a party. However, we don't want these operations to succeed if the number seen by users on their browsers does not match the stored value anymore (think race conditions and concurrent access to the application: we certainly don't want to risk this counter going below zero!).
A possible (if perhaps not optimal, performance-wise) solution to this problem is offered by Lightweight Transactions. Essentially we want to run the CQL equivalent of "Update the people of that row, but only if the current value is so-and-so. Report back whether the update succeeded."
This is achieved, in the appropriate view function, by the following code, which retrieves the database session and runs "raw CQL" on it:
    import uuid  # needed for the uuid.UUID(...) conversion below

    from django.db import connection
    # ... ...

    def change_party_people(request, city, id, prev_value, delta):
        delta_num = int(delta)
        cursor = connection.cursor()
        # Conditional update (LWT): applied only if "people" still equals prev_value
        change_applied = cursor.execute(
            'UPDATE party SET people = %s WHERE city=%s AND id=%s IF people = %s',
            (
                delta_num + prev_value,
                city,
                uuid.UUID(id),  # must respect Cassandra type system
                prev_value,
            ),
        ).one()['[applied]']
        if not change_applied:
            lwt_message = '?LWT_FAILED=1'
        else:
            lwt_message = ''
        # etc, etc ...
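The conditional update in the code above behaves like a compare-and-set. The following in-memory sketch mimics that behaviour in plain Python to make the success and failure cases concrete. The class and method names are made up for the illustration, and a lock merely stands in for the coordination that Cassandra performs for LWTs.

```python
import threading


class PeopleCounter:
    """Toy counter supporting an 'update only if unchanged' operation."""

    def __init__(self, people=0):
        self._people = people
        self._lock = threading.Lock()

    @property
    def people(self):
        with self._lock:
            return self._people

    def update_if(self, expected, new_value):
        """Set the counter to new_value only if it currently equals expected.

        Returns True if the update was applied, False otherwise.
        """
        with self._lock:
            if self._people != expected:
                return False
            self._people = new_value
            return True


counter = PeopleCounter(people=3)

# Two users both loaded the page while the counter showed 3.
first_applied = counter.update_if(expected=3, new_value=4)   # "+1" click
second_applied = counter.update_if(expected=3, new_value=2)  # stale "-1" click
print(first_applied, second_applied, counter.people)
```

The second update is rejected because its expected value is stale. In the view above, the same situation is reported back to the user through the LWT_FAILED flag.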
Configuration and DB access in Django
Now let's look at how to configure a Django application to use Astra DB and how the access parameters and secrets are passed to it.
The general project-level settings are given in
parties/parties/settings.py. In that file, you should
first add the
"django_cassandra_engine" item to the
INSTALLED_APPS so that it comes first in the list.
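A sketch of the resulting list follows. The entries after the first one are the defaults generated by django-admin startproject and are shown only for context; your project's list may differ.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    "django_cassandra_engine",  # must come first in the list
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # ... your own apps ...
]
```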
Second, you should replace the definition of the storage engine (sqlite3 by default on newly-created applications) in the DATABASES setting with one that points to the Cassandra engine and carries the connection parameters for your Astra DB instance.
Note that you should add the line from cassandra.auth import PlainTextAuthProvider earlier in the file.
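The two definitions might look as follows. This is a sketch based on the configuration format documented by django-cassandra-engine, not a copy of the sample application's file: reading the values with os.environ and the variable name AUTH_USERNAME are assumptions, and the sample application may load its secrets differently.

```python
# settings.py (fragment)
import os

from cassandra.auth import PlainTextAuthProvider

# Before: the default generated by "django-admin startproject"
# DATABASES = {
#     "default": {
#         "ENGINE": "django.db.backends.sqlite3",
#         "NAME": BASE_DIR / "db.sqlite3",
#     }
# }

# After: Astra DB through the Cassandra engine
DATABASES = {
    "default": {
        "ENGINE": "django_cassandra_engine",
        "NAME": os.environ["KEYSPACE_NAME"],  # the keyspace name
        "OPTIONS": {
            "connection": {
                "auth_provider": PlainTextAuthProvider(
                    os.environ["AUTH_USERNAME"],  # hypothetical variable name
                    os.environ["AUTH_PASSWORD"],
                ),
                "cloud": {
                    "secure_connect_bundle": os.environ["SECURE_BUNDLE_PATH"],
                },
            },
        },
    }
}
```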
In the above database connection settings, there are four variables that should be set in a secure and portable manner (e.g. through use of a .env file as shown in the application example, or otherwise). They are:
- KEYSPACE_NAME, the name of the keyspace in your Astra DB instance. Note that you don't have to create the tables yourself: tables are created based on model definitions when you issue Django's sync_cassandra command before running the application the first time (see instructions in the sample application's README);
- a username variable and AUTH_PASSWORD: these may be either the "clientID/clientSecret" pair from your database token, or alternatively the literal "token" and the token string (starting with AstraCS:);
- SECURE_BUNDLE_PATH, the full path to the Secure Connect Bundle for your database. This can be downloaded manually or, as described in the application's README, through use of Astra CLI along with the rest of the above setup.
Third, you may consider adding the line
CASSANDRA_FALLBACK_ORDER_BY_PYTHON = True. This means that, when a model's
order_by() directive cannot be mapped to CQL according to the table's clustering, the model can fall back to in-code sorting. Although this may be non-optimal in general (especially for large result sets), it can still be a safe and useful choice if you know that the amount of data involved is small.
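Conceptually, the fallback amounts to sorting the already-retrieved rows in application code. The snippet below illustrates that idea with plain dictionaries standing in for model instances; it is not the engine's actual implementation, and the sample data is invented.

```python
from datetime import datetime
from operator import itemgetter

# Rows as they might come back from the database, in clustering order (not by date).
rows = [
    {"name": "Rooftop night", "date": datetime(2023, 3, 1)},
    {"name": "Garden brunch", "date": datetime(2023, 1, 15)},
    {"name": "Karaoke", "date": datetime(2023, 2, 10)},
]

# In-code sorting: every row must already be in memory before sorting starts.
rows_by_date = sorted(rows, key=itemgetter("date"))
print([row["name"] for row in rows_by_date])
```

Since the whole result set has to be fetched and held in memory before it can be sorted, this approach is reasonable only when the number of rows is small.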
Dependencies and Cassandra drivers
Two dependencies are needed for a Django application backed by Astra DB: Django itself and the django-cassandra-engine package.
(The other package found in the sample app's requirements serves the purpose of reading secrets from a .env file in the Django app's settings.)
It should be noted that current versions of the Cassandra engine for Django automatically install ScyllaDB's version of the Cassandra drivers, i.e. scylla-driver. These are a drop-in replacement for the package by DataStax (cassandra-driver), meaning that:
- both are imported with statements such as from cassandra.cluster import Cluster and the like;
- it is unwise to install both at once as that would introduce namespace collisions.
If you prefer to work with the driver package by DataStax, the application will work just fine: simply uninstall the drivers by Scylla (pip uninstall scylla-driver), and then install the desired drivers (pip install cassandra-driver). Not a single line of code needs to change.
Note: at the time of writing (January 2023), the differences between the two drivers are minor and mostly confined to additional support for Scylla-specific database architecture. As such, there are no implications for the functionality or the performance of applications based on Astra DB.
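To check which of the two driver distributions is present in an environment, one option is to query the installed package metadata. This is a small standard-library sketch (Python 3.8 or later); the function name is hypothetical.

```python
from importlib import metadata


def installed_cassandra_drivers():
    """Return {distribution name: version} for whichever drivers are installed."""
    found = {}
    for name in ("scylla-driver", "cassandra-driver"):
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            pass
    return found


drivers = installed_cassandra_drivers()
if len(drivers) > 1:
    print("Both drivers are installed; consider removing one:", drivers)
else:
    print("Installed driver(s):", drivers)
```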
Caveats and Troubleshooting
In this section we collect a handy list of warnings and things to keep in mind when using Astra DB with Django, whether by migration or when designing an app from scratch.
In Cassandra models, there is no max_length parameter for text fields, corresponding to the absence of such a property for the CQL text type.
Likewise, you should not add the editable=False parameter for primary-key columns when defining models.
For a model class subclassing DjangoCassandraModel with a multi-column primary key (regardless of the partition/clustering distinction) one must provide a get_pk_field attribute through a Meta class: in this way the Django engine would be able to resolve queries such as <Model class>.objects.get(pk=...). You can see an example of this in the model quoted earlier. Failure to comply with this requirement would make the application fail to start with an informative error. If you are using the model in a sensible way (from a Cassandra perspective), you can pay little attention to this since you should not, as a matter of fact, be triggering such a query anywhere in your code, implicitly or explicitly.
The django-cassandra-engine package does support most of the features of its native, RDBMS counterpart; however, in some cases, a little more manual plumbing might be in order. In particular, the native models support fields of type FileField, which pairs with the form field of the same name and handles upload of files by storing the actual file content on disk and a path to it on DB. The Cassandra engine has no such facility, requiring you to manually handle what happens once the endpoint has received file uploads via a form POST (you can still use the form field, though). A similar consideration holds for the more specific ImageField.
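One way to do this plumbing by hand is to write the uploaded content to disk yourself and store only the resulting path in a text column. The sketch below assumes an object with the name attribute and chunks() method that Django's uploaded-file objects provide. The function name, the destination directory and the idea of keeping the path in a columns.Text field are illustrative choices, not part of the Cassandra engine.

```python
import uuid
from pathlib import Path


def save_upload(uploaded_file, dest_dir):
    """Write an uploaded file to dest_dir and return the path as a string.

    `uploaded_file` is assumed to expose `.name` and `.chunks()`.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the base name (no directories) and add a random prefix
    # so that two uploads with the same file name do not collide.
    safe_name = Path(uploaded_file.name).name
    target = dest_dir / f"{uuid.uuid4().hex}_{safe_name}"
    with open(target, "wb") as out:
        for chunk in uploaded_file.chunks():
            out.write(chunk)
    return str(target)
```

In a view, this could be called with an entry of request.FILES after form validation, and the returned string saved on the model instance in a text column.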
Once the application is ready and the DB has been synchronized with it (using ./manage.py sync_cassandra or an equivalent command), you will still see warnings about a number of "unapplied migrations". You can ignore these warnings (incidentally, the migrate command is not even supported by the Cassandra engine, being supplanted by sync_cassandra).
If you change the model and try to run the application, or forget to run the sync_cassandra management operation altogether, chances are you will see the application crash with no messages or with just an unhelpful Segmentation fault (core dumped) message. In this case, please make sure that (1) your database is not in "Hibernated" state, (2) you have launched a sync operation after all changes to any model.
If you use a model's filter(...) method but with a filtering condition (a WHERE clause) that is not a good match to the structure of your database table, the application will most likely function, but possibly exhibit bad performance. It is your responsibility to make sure that usage of models does not sweep violations of data modeling best practices under the rug.
As remarked above, if you request objects to be sorted in a way that is not compliant with the structure of your table, you can still enable a fallback behaviour whereby the rows are sorted post-retrieval in Python code (you do this by setting CASSANDRA_FALLBACK_ORDER_BY_PYTHON = True in settings.py, but you should do this only if there are few rows involved). Don't be alarmed if you still see a warning about it in the application logs (the warning would be a true exception if you hadn't enabled the fallback).
- Django homepage, djangoproject.com;
- The sample application referenced throughout this page, "partyfinder";
- Another Django application using Cassandra from DataStax' Sample App Gallery: a simple standard blog engine.