Django Blog #35: Search Algorithms in Django

by Alex

Stemming and ranking results

Django offers a SearchQuery class that translates search terms into a search query object. By default, the terms are passed through stemming algorithms, which helps to find better matches. You can also order results by relevance. PostgreSQL provides a ranking function that orders results based on how often the query terms appear and how close together they are. Edit the views.py file of the blog application and add the following imports:

from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank

Then take a look at these lines:

results = Post.objects.annotate(
    search=SearchVector('title', 'body'),
).filter(search=query)

Replace them with these:

search_vector = SearchVector('title', 'body')
search_query = SearchQuery(query)
results = Post.objects.annotate(
    search=search_vector,
    rank=SearchRank(search_vector, search_query)
).filter(search=search_query).order_by('-rank')

This code creates a SearchQuery object, filters the results by it, and uses SearchRank to order them by relevance. You can open http://127.0.0.1:8000/blog/search/ in your browser and run some tests to see how stemming and ranking work.
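
To get a feel for what stemming does before a match is attempted, here is a toy sketch in plain Python. It is only an illustration: the suffix list and the names toy_stem and matches are made up for this example, and PostgreSQL's real stemmers follow far more careful, language-specific rules.

```python
# Toy suffix list for illustration only; not the rules PostgreSQL applies.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Lowercase a word and strip one common suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, text):
    """Return True if every stemmed query word appears among the stemmed text words."""
    stems = {toy_stem(w) for w in text.split()}
    return all(toy_stem(w) in stems for w in query.split())


# "posts", "posting" and "posted" all reduce to the same stem.
print(toy_stem("posts"), toy_stem("posting"), toy_stem("posted"))

# The query and the text use different word forms but still match.
print(matches("ranking results", "How PostgreSQL ranks a result"))
```

The point is only that a query and a document can match even when they use different forms of the same word.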

Weighting queries

You can give specific vectors more weight when ordering results by relevance. For example, you can rank posts that match in the title higher than posts that match only in the body. Edit the previous lines of the blog application’s views.py file to make them look like this:

search_vector = SearchVector('title', weight='A') + SearchVector('body', weight='B')
search_query = SearchQuery(query)
results = Post.objects.annotate(
    rank=SearchRank(search_vector, search_query)
).filter(rank__gte=0.3).order_by('-rank')

This code applies different weights to the search vectors built from the title and body fields. The available weights are D, C, B, and A, which by default correspond to the numbers 0.1, 0.2, 0.4, and 1.0, respectively. Here the title vector gets weight A (1.0) and the body vector gets weight B (0.4), so matches in the title prevail over matches in the body. The results are filtered to display only those with a rank of at least 0.3.
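
To make the effect of the weights more concrete, here is a toy scoring function in plain Python. It is not PostgreSQL's ranking algorithm, which also takes term frequency and proximity into account, so real rank values will differ from these numbers. The function name toy_rank and the sample posts are hypothetical; only the letter-to-number mapping comes from the defaults listed above.

```python
# Weight letters and the default numeric values listed above.
WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}


def toy_rank(post, word, field_weights):
    """Sum the weight of every field whose text contains the word (toy model)."""
    score = 0.0
    for field, letter in field_weights.items():
        if word.lower() in post[field].lower().split():
            score += WEIGHTS[letter]
    return score


# Hypothetical sample data, standing in for Post objects.
posts = [
    {"title": "Weekly notes", "body": "Some django tricks I learned"},
    {"title": "Django tips", "body": "Notes on views and forms"},
    {"title": "Gardening", "body": "Tomatoes and basil"},
]
field_weights = {"title": "A", "body": "B"}

# Highest score first, like order_by('-rank').
ranked = sorted(
    ((toy_rank(p, "django", field_weights), p["title"]) for p in posts),
    reverse=True,
)
for score, title in ranked:
    print(score, title)
```

The post that matches in the title comes first, the one that matches only in the body second, and the post without a match last.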

Search by trigram similarity

Another search approach is trigram similarity. A trigram is a group of three consecutive characters. You can measure the similarity of two strings by counting the trigrams they share. This approach works well for measuring the similarity of words in many languages. To use trigrams in PostgreSQL, you must first install the pg_trgm extension. Use the following command to connect to the database:

psql blog

Then this one to install the pg_trgm extension:

CREATE EXTENSION pg_trgm;

Now change the view so that it searches by trigrams. Edit the views.py file of the blog application and add the following import:

from django.contrib.postgres.search import TrigramSimilarity

Then replace the Post search query with the following lines:

results = Post.objects.annotate(
    similarity=TrigramSimilarity('title', query),
).filter(similarity__gt=0.3).order_by('-similarity')
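
To see roughly what the similarity score measures, here is a small approximation in plain Python. It assumes the behavior described in the pg_trgm documentation: text is lowercased and split into alphanumeric words, each word is padded with two spaces in front and one behind, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. The function names are made up for this sketch, and it handles ASCII letters and digits only.

```python
import re


def trigrams(text):
    """Return the set of trigrams in text, padding each word as pg_trgm is documented to do."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("yango", "django"))
```

In this sketch yango and django share 3 of 10 distinct trigrams, so the score is 0.3. Comparing a query against a longer title adds more trigrams that are not shared, which typically lowers the score, so you may need to tune the threshold in the filter for your own data.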

Open http://127.0.0.1:8000/blog/search/ in your browser and try different trigram searches. For example, type yango and you should get results that include the word django, provided your blog has posts with that word in the title. The project now has a powerful search engine. More information about full-text search can be found at https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/search/.
