TL;DR:
Django ORM and internal utilities have rich functionality that allows you to:
- automate some complex data manipulations,
- automate processes that rely on multiple models in a way that will not require updates every time the data structure changes,
- collect data from the database quickly without diving into the project very deeply.

In order to do this, we need a few tools like the Django Model `_meta` object and some basic Python code.
Reasoning
There are many cases where you need to automate tasks with Django models without knowing the model structure in advance.

Let’s take a look at one example: GDPR compliance. Under GDPR, a user may ask for the deletion of their data. In that case, we need to remove all of that user’s data from the database; however, in a large enterprise project, the data may be spread across multiple tables, multiple models, or even multiple databases and projects. In a single day, several developers can add new models, and it is very easy to lose track of these changes. Without fairly abstract code and some Django internal utilities, it is hard to write code that handles these cases and does not require updates every time the data structure changes.
A very simple model structure example
from django.conf import settings
from django.db import models

class Meeting(models.Model):
    topic = models.CharField(max_length=200)
    participants = models.ManyToManyField(settings.AUTH_USER_MODEL, through='Participation', blank=True)

class Participation(models.Model):
    meeting = models.ForeignKey('Meeting', on_delete=models.CASCADE, related_name='participations')
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name='participations')

class Comment(models.Model):
    meeting = models.ForeignKey('Meeting', on_delete=models.CASCADE, related_name='comments')
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    description = models.TextField(max_length=3000)
    attachment = models.FileField(null=True, blank=True)

class UserDocumentGroup(models.Model):
    name = models.CharField(max_length=200)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)

class UserUploadedDocument(models.Model):
    group = models.ForeignKey(UserDocumentGroup, on_delete=models.CASCADE)
    document = models.FileField()
These are the basic Django models that I will be using throughout this article.
Introduction to Django internals
Django has various internal utilities that allow you to access and manipulate the structure of your models without knowing them in advance. Let’s explore some of the possibilities before we dive into solving something.

For example, it would be good to be able to get a list of all models in the project and maybe filter them by some arbitrary rules. Luckily, Django has a very simple way to do it.
from django.apps import apps
all_models = apps.get_models()
import pprint
pprint.pprint(all_models)
# [<class 'django.contrib.auth.models.Permission'>,
# <class 'django.contrib.auth.models.Group'>,
# <class 'django.contrib.auth.models.User'>,
# <class 'django.contrib.contenttypes.models.ContentType'>,
# <class 'meetings.models.Meeting'>,
# <class 'meetings.models.Participation'>,
# <class 'meetings.models.Comment'>,
# <class 'meetings.models.UserDocumentGroup'>,
# <class 'meetings.models.UserUploadedDocument'>]
As you can see, the list contains all models, including internal Django models like ContentType. In order to work with this list effectively, we need some of the Django Model `_meta` API.

Minor note: unfortunately, the official Django documentation on this topic is fairly sparse: https://docs.djangoproject.com/en/5.1/ref/models/meta/ . A good understanding of this API goes beyond the use of the Options.get_fields method. So, if you are interested in this topic, I recommend importing a model and playing around with it in a Django shell. It is simple to understand once you have played with it a bit, but it takes some time to dive into. It is also worth checking the docs about fields, https://docs.djangoproject.com/en/5.1/ref/models/fields/, and consulting the official docs about any API you are going to use and any questions you stumble upon.

For now, I will use some utilities and explain how they work along the way.

For example, let’s try to filter all models that have a foreign key or a one-to-one field to the `User` model.
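from django.contrib.auth import get_user_model
from django.db import models

User = get_user_model()

def filter_user_related_models(all_models: list[type[models.Model]]) -> list[type[models.Model]]:
    user_related_models = []
    for model in all_models:
        for field in model._meta.get_fields(include_parents=True, include_hidden=True):
            # Parentheses matter: `and` binds tighter than `or`, so without them
            # every one-to-one field would match regardless of its target model.
            if (field.one_to_one or field.many_to_one) and field.related_model == User:
                user_related_models.append(model)
                break
    return user_related_models

pprint.pprint(filter_user_related_models(all_models))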
# [<class 'meetings.models.Participation'>,
# <class 'meetings.models.Comment'>,
# <class 'meetings.models.UserDocumentGroup'>]
A few explanations regarding the code above:
- the `get_fields` method returns all fields of a model, including relations,
- `one_to_one` and `many_to_one` are flags that indicate that the field is an O2O or FK field.

Regarding related fields, I need to stop and explain a little more with examples. Let’s play a little with data in the shell.
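One detail worth checking in a condition like the one in the filter above: Python’s `and` binds more tightly than `or`, so `a or b and c` is evaluated as `a or (b and c)`. The sketch below uses plain stand-in objects, not real Django fields (the names `USER`, `PROFILE` and the three sample fields are hypothetical), to show how the grouping changes which fields match.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Django field objects: only the three
# attributes used by the condition are modelled here.
USER = "User"
PROFILE = "Profile"

o2o_to_profile = SimpleNamespace(one_to_one=True, many_to_one=False, related_model=PROFILE)
fk_to_user = SimpleNamespace(one_to_one=False, many_to_one=True, related_model=USER)
o2o_to_user = SimpleNamespace(one_to_one=True, many_to_one=False, related_model=USER)


def matches_ungrouped(field) -> bool:
    # Parsed as: one_to_one or (many_to_one and related_model == USER)
    return bool(field.one_to_one or field.many_to_one and field.related_model == USER)


def matches_grouped(field) -> bool:
    # Explicit grouping: the target model is checked for both relation kinds.
    return bool((field.one_to_one or field.many_to_one) and field.related_model == USER)


for name, field in [
    ("o2o_to_profile", o2o_to_profile),
    ("fk_to_user", fk_to_user),
    ("o2o_to_user", o2o_to_user),
]:
    print(name, matches_ungrouped(field), matches_grouped(field))
# o2o_to_profile True False
# fk_to_user True True
# o2o_to_user True True
```

Only the first row differs: a one-to-one field pointing at some other model passes the ungrouped check, but not the grouped one.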
Meeting._meta.get_field("comments").many_to_one
# Out[43]: False
Comment._meta.get_field("meeting").many_to_one
# Out[45]: True
Meeting._meta.get_field("comments").one_to_many
# Out[44]: True
Comment._meta.get_field("meeting").one_to_many
# Out[46]: False
So what happened? The answer is simple, but nuanced: the get_field and get_fields methods give access not only to the fields of a model, but also to its relations, including reverse relations. So if Comment has an FK to Meeting, it is accessible through Comment._meta.get_field("meeting") from the Comment side and through Meeting._meta.get_field("comments") from the Meeting side, but these are two different objects.
Comment._meta.get_field("meeting")
# Out[47]: <django.db.models.fields.related.ForeignKey: meeting>
Meeting._meta.get_field("comments")
# Out[48]: <ManyToOneRel: meetings.comment>
These are very important distinctions. If a model has an FK to another model, then at the business level of abstraction it most likely belongs to that other model, but not vice versa. For example, Comment is a part of Meeting, but Meeting is not a part of Comment. Another example: if Order has an FK to User, then Order is a part of User, but User is not a part of Order. ManyToMany relations usually reflect a different situation: most likely a relation between two separate entities that are not parts of each other.
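To make the distinction concrete, here is a minimal sketch of a helper that turns the relation flags into the “belongs to” reading described above. It runs on plain stand-in objects so it can be tried without a project; the helper name `relation_direction` and the stand-ins are hypothetical, while the flag names (`is_relation`, `many_to_one`, `one_to_many`, `one_to_one`, `many_to_many`) are the ones Django fields expose.

```python
from types import SimpleNamespace


def relation_direction(field) -> str:
    """Classify a field-like object by its relation flags."""
    if not getattr(field, "is_relation", False):
        return "not a relation"
    if field.many_to_one:
        return "forward FK: this model belongs to the related model"
    if field.one_to_many:
        return "reverse FK: the related model belongs to this model"
    if field.one_to_one:
        return "one-to-one: direction depends on which side declares the field"
    if field.many_to_many:
        return "many-to-many: two separate entities"
    return "unknown"


# Stand-in for Comment._meta.get_field("meeting")
comment_meeting = SimpleNamespace(
    is_relation=True, many_to_one=True, one_to_many=False,
    one_to_one=False, many_to_many=False,
)
# Stand-in for Meeting._meta.get_field("comments")
meeting_comments = SimpleNamespace(
    is_relation=True, many_to_one=False, one_to_many=True,
    one_to_one=False, many_to_many=False,
)
# Stand-in for a plain column such as Meeting.topic
meeting_topic = SimpleNamespace(is_relation=False)

print(relation_direction(comment_meeting))
print(relation_direction(meeting_comments))
print(relation_direction(meeting_topic))
```

In a real shell, the same helper should accept the objects returned by `get_field` directly, since it only reads these flags.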
Back to GDPR example
For our GDPR example, all of the above means that we need to look at the FK and O2O relations that go down from User.

First, let’s collect and, for better understanding, visualise what we are working with. There is a very simple tree library that can help us with this: https://treelib.readthedocs.io/en/latest/

I will start by collecting all models that are part of `User`. Knowing all the information above, I will write a recursive function that goes over all User model relations, collects the related models that are part of User, and adds them to the tree.
from django.db import models
from treelib import Node, Tree

def get_traced_relations(
    model: type[models.Model],
    parent: str | None = None,
    tree: Tree | None = None,
) -> Tree:
    if tree is None:
        # First call: create the tree and add the root node for the model.
        tree = Tree()
        tree.create_node(identifier=str(model), parent=parent)
        parent = str(model)
    for field in model._meta.get_fields(include_parents=True):
        # Follow only relations where the related model "belongs to" this one.
        if field.is_relation and (field.one_to_many or field.one_to_one):
            field_node_id = f"{field.related_model} ({field.remote_field})"
            tree.create_node(
                identifier=field_node_id,
                parent=parent,
                data={
                    "model": field.related_model,
                    "field": field.remote_field,
                },
            )
            get_traced_relations(
                model=field.related_model,
                parent=field_node_id,
                tree=tree,
            )
    return tree
And let’s check the result. Treelib has a useful API that allows us to visualise the tree.
tree = get_traced_relations(User)
tree.show()
# <class 'django.contrib.auth.models.User'>
# ├── <class 'meetings.models.Comment'> (meetings.Comment.user)
# ├── <class 'meetings.models.Participation'> (meetings.Participation.user)
# └── <class 'meetings.models.UserDocumentGroup'> (meetings.UserDocumentGroup.user)
#     └── <class 'meetings.models.UserUploadedDocument'> (meetings.UserUploadedDocument.group)
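The sample models contain no cycles, but a self-referential FK, or two models pointing at each other, could make a recursive walk like this one loop forever. As a hedged illustration only, the sketch below shows the usual guard, a set of already visited items, on a plain dictionary instead of real models. The `RELATIONS` graph and the `collect_owned` name are hypothetical.

```python
from __future__ import annotations

# Hypothetical relation graph: each key "owns" the models listed in its value.
# "Comment" -> "Comment" mimics a self-referential FK such as a reply-to field.
RELATIONS = {
    "User": ["Comment", "UserDocumentGroup"],
    "Comment": ["Comment"],
    "UserDocumentGroup": ["UserUploadedDocument"],
    "UserUploadedDocument": [],
}


def collect_owned(model: str, seen: set[str] | None = None) -> list[str]:
    """Depth-first walk that visits every reachable model exactly once."""
    if seen is None:
        seen = set()
    if model in seen:
        return []  # already visited: stop here instead of recursing forever
    seen.add(model)
    result = [model]
    for child in RELATIONS.get(model, []):
        result.extend(collect_owned(child, seen))
    return result


print(collect_owned("User"))
# ['User', 'Comment', 'UserDocumentGroup', 'UserUploadedDocument']
```

The same idea could be applied to the model walk by passing a set of visited models through the recursion, at the cost of showing each model only once in the tree.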
The Treelib library also has a way to make a Graphviz diagram from the tree.

[Figure: Graphviz diagram of the User relation tree]
As you can see, it even collected `UserUploadedDocument`, which is a part of UserDocumentGroup, which in turn is a part of User.

Now, let’s actually collect some data for the user. There are multiple ways to do it, but to keep it simple, I will use a very basic approach. For this we will need a few utilities.

First, we need a function that returns the path from a node up to the tree root.
def path_to_node(tree: Tree, node: Node) -> list[Node]:
    path = [node]
    if (parent := tree.parent(node.identifier)):
        path.extend(path_to_node(tree, parent))
    return path
Second, we need to build a lookup for the queryset. This is just a regular lookup, like the one used in `UserUploadedDocument.objects.filter(group__user=user)`.
def get_lookup_for_node(path: list[Node]) -> str:
    return "__".join(node.data["field"].name for node in path if not node.is_root())
Now let’s see what we have. I exclude the root specifically, because the root is the `User` model, so we don’t need to query it.
for node in tree.all_nodes():
    if not node.is_root():
        print(
            node.data["model"],
            "lookup",
            get_lookup_for_node(path_to_node(tree, node))
        )
# <class 'meetings.models.Participation'> lookup user
# <class 'meetings.models.Comment'> lookup user
# <class 'meetings.models.UserDocumentGroup'> lookup user
# <class 'meetings.models.UserUploadedDocument'> lookup group__user
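The way the lookup string is assembled can be tried in isolation. The sketch below is a simplified stand-in, not the treelib-based code above: `FakeNode` and `build_lookup` are hypothetical names, and each node only stores the FK field name and a link to its parent. It walks from a node up to the root and joins the field names with `__`, which is the order Django expects when filtering by a related object.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a tree node: FK field name plus parent link."""

    field_name: str | None  # None for the root (the User model)
    parent: FakeNode | None = None


def build_lookup(node: FakeNode) -> str:
    """Walk from the node up to the root and join FK names with '__'."""
    parts = []
    current = node
    while current is not None and current.field_name is not None:
        parts.append(current.field_name)
        current = current.parent
    return "__".join(parts)


root = FakeNode(field_name=None)            # User
group = FakeNode("user", parent=root)       # UserDocumentGroup.user
document = FakeNode("group", parent=group)  # UserUploadedDocument.group

print(build_lookup(group))     # user
print(build_lookup(document))  # group__user
```

The deepest model contributes the first segment, so the path has to be read from the node towards the root, not the other way round.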
Now, having all of this, we can actually run the query. But to use it, we need to know what we are searching for. For example, we have 2 file fields:
- `attachment` in the `Comment` model,
- `document` in the `UserUploadedDocument` model.
As we agreed before, because all of these models are related to `User` via FK, we can assume that these files were uploaded by the user. So let’s create a function that will find every document that was uploaded by the user.

First, let’s create some data. For simplicity, I will use the model-bakery (https://github.com/model-bakers/model_bakery) library.
from model_bakery import baker
user = baker.make("auth.User")
meeting = baker.make("meetings.Meeting")
baker.make("meetings.Comment", meeting=meeting, user=user)
baker.make("meetings.Comment", meeting=meeting, user=user)
document_group = baker.make("meetings.UserDocumentGroup", user=user)
baker.make("meetings.UserUploadedDocument", group=document_group)
baker.make("meetings.UserUploadedDocument", group=document_group)
In order to find all files uploaded by the user, we can walk over the tree nodes, find the models that have a file field, and query them from the database.
from django.db import models

def get_all_models_with_file_fields(tree: Tree) -> list[Node]:
    models_to_search = []
    for node in tree.all_nodes():
        if node.is_root():
            continue
        for field in node.data["model"]._meta.get_fields():
            if isinstance(field, models.FileField):
                models_to_search.append(node)
                break
    return models_to_search

get_all_models_with_file_fields(tree)
# [Node(tag=<class 'meetings.models.Comment'> (meetings.Comment.user), identifier=<class 'meetings.models.Comment'> (meetings.Comment.user), data={'model': <class 'meetings.models.Comment'>, 'field': <django.db.models.fields.related.ForeignKey: user>}),
# Node(tag=<class 'meetings.models.UserUploadedDocument'> (meetings.UserUploadedDocument.group), identifier=<class 'meetings.models.UserUploadedDocument'> (meetings.UserUploadedDocument.group), data={'model': <class 'meetings.models.UserUploadedDocument'>, 'field': <django.db.models.fields.related.ForeignKey: group>})]
As you can see, we have 2 models that have a file field, which matches the model declarations exactly. Now, let’s just collect the needed data.
def collect_data_for_user(
    user: User, tree: Tree, nodes: list[Node]
) -> dict[type[models.Model], models.QuerySet]:
    data = {}
    for node in nodes:
        model = node.data["model"]
        # Build e.g. {"group__user": user} and filter the model by it.
        data[model] = model.objects.filter(
            **{get_lookup_for_node(path_to_node(tree, node)): user}
        )
    return data

desired_nodes = get_all_models_with_file_fields(tree)
collect_data_for_user(user, tree, desired_nodes)
# Out[41]:
# {meetings.models.Comment: <QuerySet [<Comment: Comment object (4)>, <Comment: Comment object (5)>]>,
# meetings.models.UserUploadedDocument: <QuerySet [<UserUploadedDocument: UserUploadedDocument object (4)>, <UserUploadedDocument: UserUploadedDocument object (5)>]>}
And there you go. We found every instance of every model in the database that has a file field and is related to the user we specified. Now, if we are obliged by GDPR to delete user data, we can rely on this script regardless of how many models we have and how many times developers have changed them. We don’t need to update this script every time the model structure changes, nor do we need to track every minor change in the code. However, when it comes to deletion, precautions must be taken, because automated tools built in this manner tend to sweep up too much data.
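One possible precaution, offered here as an assumption and not as part of the script above, is a dry run: count what would be deleted per model and refuse to continue when the total looks suspicious. The sketch below uses plain lists in place of querysets so it runs anywhere. `plan_deletion` and `max_rows` are hypothetical names, and with real querysets you would likely call `.count()` instead of `len()`.

```python
from __future__ import annotations


def plan_deletion(collected: dict[str, list], max_rows: int = 1000) -> dict[str, int]:
    """Return per-model row counts, refusing to continue if the total is too large."""
    counts = {name: len(rows) for name, rows in collected.items()}
    total = sum(counts.values())
    if total > max_rows:
        raise RuntimeError(
            f"Refusing to delete {total} rows (limit {max_rows}); review manually."
        )
    return counts


# Stand-in data: model name -> rows that the automated walk has collected.
collected = {
    "Comment": ["comment 4", "comment 5"],
    "UserUploadedDocument": ["document 4", "document 5"],
}
print(plan_deletion(collected))
# {'Comment': 2, 'UserUploadedDocument': 2}
```

Reviewing such a summary before running the real deletion gives a human the chance to notice a relation that should not have been followed.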
Conclusion
Django internal utilities are a very powerful tool for automation. There are many use cases where they can help a lot, such as business intelligence, data analysis and automation of processes. The example above is very basic, yet it can demonstrate the power of these tools.