How to improve 2 million data query speed in Django RESTful APIs

How to improve 2 million data query speed in Django RESTful APIs

I have a scientific research publications data of 2 Million records. I used django restframework to write apis for searching the data in title and abstract. This is taking me 12 seconds while using postgres as db, but if I used MongoDB as db, it goes down to 6seconds.
But even 6 seconds sounds a lot of waiting for user to me. I indexed the title and abstract, but abstract indexing failed because some of the abstract texts are too lengthy.
Here is the django Model using MongoDB(MongoEngine as ODM):
class Journal(Document):
title = StringField()
journal_title = StringField()
abstract = StringField()
full_text = StringField()
pub_year = IntField()
pub_date = DateTimeField()
pmid = IntField()
link = StringField()

How do I improve the query performance, what stack makes the search and retrieval more faster?.


Solution 1:

Some pointers about optimisation for the Django ORM with Postgres:

  • Use db_index=True on fields that will be search upon often and have some degree of repetition between entries, like “title”.
  • Use values() and values_list() to select only the columns you want from a QuerySet.
  • If you’re doing full text search in any of those columns (like a contains query), bear in mind that Django has support for full text search directly on a Postgres database.
  • Use print queryset.query to check what kind of SQL query is going into your database and if it can be improved upon.
  • Many Postgres optimisation techniques rely in custom SQL queries that can be made in Django by using RawSQL expressions.
  • Remember that there are many, many ways to search for data in a database, be it relational or not-relational in nature. In your case, MongoDB is not “faster” than Postgres, it’s just doing a better job at querying what you really want.
Related:  MongoDB bulk lookup by bson_id