How to improve query speed over 2 million records in Django RESTful APIs
I have a dataset of 2 million scientific research publication records. I used Django REST Framework to write APIs for searching the data by title and abstract. A search takes 12 seconds with Postgres as the DB, but drops to 6 seconds if I use MongoDB instead.
But even 6 seconds sounds like a long wait for the user to me. I indexed the title and abstract, but indexing the abstract failed because some of the abstract texts are too long.
Here is the Django model using MongoDB (MongoEngine as the ODM):
# Document class and imports reconstructed; the class name is assumed.
from mongoengine import DateTimeField, Document, IntField, StringField

class Publication(Document):
    title = StringField()
    journal_title = StringField()
    abstract = StringField()
    full_text = StringField()
    pub_year = IntField()
    pub_date = DateTimeField()
    pmid = IntField()
    link = StringField()
How do I improve the query performance? What stack makes search and retrieval faster?
Some pointers about optimisation for the Django ORM with Postgres:
- Use `db_index=True` on fields that will be searched often and have some degree of repetition between entries, like “title”.
- Use `values_list()` to select only the columns you want from a QuerySet.
- If you’re doing full-text search on any of those columns (like a `contains` query), bear in mind that Django has support for full-text search directly on a Postgres database.
- Use `print(queryset.query)` to check what kind of SQL query is going to your database and whether it can be improved upon.
- Many Postgres optimisation techniques rely on custom SQL queries, which can be written in Django using `Manager.raw()` or a cursor from `connection.cursor()`.
- Remember that there are many, many ways to search for data in a database, be it relational or non-relational in nature. In your case, MongoDB is not “faster” than Postgres; it’s just doing a better job at querying what you really want.