zoukankan      html  css  js  c++  java
  • Extending Django’s database API to include fulltext search

    Extending Django’s database API to include full-text search | Mercurytide

    Extending Django’s database API to include full-text search

    Published 17th August 2006

     

    Introducing Django

    Django is an open-source web application framework that-to quote its creators-"encourages rapid development and clean, pragmatic design." It is written in Python, and provides components to build high-quality web applications: an object-relational mapper, a full-featured templating system, URL dispatching, session management, authentication, and much more besides.

    One valuable measure of a framework is the ease with which it can be extended to meet one's needs. Here we will introduce Django's database API, and demonstrate its flexibility by extending it to support MySQL's full-text search capabilities.

    The source code included in this article is released into the public domain.

    Django's Database API

    Django's object-relational mapper provides a rich API for building database queries.

    Django's object-relational mapper provides a rich API for building database queries. This API insulates the application developer from the details of the SQL statements, while still allowing queries with complex selection criteria to be made in an efficient manner. Another notable benefit of this approach is that it removes entirely the possibility of SQL injection attacks, as the application developer is no longer inserting values directly into SQL statements.

    Objects are retrieved from the database with a QuerySet, which is an abstraction of an SQL SELECT statement. QuerySets provide methods that narrow down the results to objects matching specific criteria, much like the WHERE clause in an SQL statement-in fact, behind the scenes a QuerySet builds an SQL statement as its methods are called. QuerySet instances are obtained from a model class's Manager instance, which is normally called objects. Here are a few examples of the use of QuerySets:

    # Retrieve the QuerySet containing all articles
    articles = Article.objects.all()

    # Include only articles written before this year
    articles = articles.filter(posted_date__lt='2006-01-01')

    # But leave out those written by me
    articles = articles.exclude(author__exact='Andrew')

    # And finally, sort them by rating and date
    articles = articles.order_by('rating', 'posted_date')

    QuerySets can be filtered and sorted very cheaply, as they are lazily evaluated: these actions manipulate the QuerySet's internal SQL statement, and the statement is not executed until you try to access the results of the QuerySet, by iteration or slicing, for example.

    # Get the top five articles; only now is the database accessed.
    a = articles[:5]

    To extend this interface, we'll develop a Manager subclass and a QuerySet subclass, but first we'll briefly describe MySQL's full text search.

    MySQL and Full Text Search

    MySQL has a built-in full text search engine; while not as powerful as dedicated libraries like Lucene and Xapian, it is integrated into the database engine and query syntax, and so is easy to use from database- backed applications. We'll be looking here only at its support for ‘natural language' queries; see the MySQL Documentation for details of its other features.

    Full text search is enabled by creating a full-text index on a column or set of columns in a table. The index will be automatically updated whenever rows are inserted, updated, or deleted, so the search results will never be out of date. The CREATE INDEX statement is used to make the index:

    CREATE FULLTEXT INDEX index_name ON table_name (column_names)

    Searching is done with the MATCH...AGAINST expression, which gives the column to search and the query:

    MATCH(column_names) AGAINST ('words to find')

    For natural language queries, this evaluates to a number representing the relevance of the row to the given query. A full query with this expression might be:

    SELECT title, MATCH(title, text) AGAINST ('Django framework')
      AS `relevance`
    FROM fulltext_article
    WHERE MATCH(title, text) AGAINST ('Django framework')

    This returns the title and relevance score of all articles that matched the words Django and framework; by default they will be sorted in order of decreasing relevance. Don't worry about the repeated MATCH...AGAINST expressions-MySQL will see that they are the same and only perform the search once. Note particularly that the columns passed to MATCH must be the same columns that were specified when creating the index.

    Extending the Database API with Search

    One of Django's design philosophies is consistency, which is important to apply to extensions as well. To keep the full text search interface consistent with the rest of Django, it should work with Managers and QuerySets, so that it can be used in the same way. In practical terms, that means the programmer should be able to write statements like these:

    Article.objects.search('Django Jazz Guitar')
    Article.objects.filter(posted_date__gt='2006-07-01').search(
         'Django Python')

    And these statements should return QuerySets that can themselves be filtered further if need be. To achieve this, we will develop a QuerySet subclass called SearchQuerySet that provides the search() method and builds the MATCH...AGAINST expression into the SQL statement; and a Manager subclass called SearchManager that returns SearchQuerySet subclasses. Since the Manager class also provides many of the QuerySet's methods for convenience, SearchQuerySet should also provide a search() method to be consistent with this. Here is the code:

    from django.db import models, backend

    class SearchQuerySet(models.query.QuerySet):
         def __init__(self, model=None, fields=None):
             super(SearchQuerySet, self).__init__(model)
             self._search_fields = fields

         def search(self, query):
             meta = self.model._meta

             # Get the table name and column names from the model
             # in `table_name`.`column_name` style
             columns = [meta.get_field(name,
                                                 many_to_many=False).column
                 for name in self._search_fields]
             full_names = ["%s.%s" %
                            (backend.quote_name(meta.db_table),
                             backend.quote_name(column))
                             for column in columns]

             # Create the MATCH...AGAINST expressions
             fulltext_columns = ", ".join(full_names)
             match_expr = ("MATCH(%s) AGAINST (%%s)" %
                                    fulltext_columns)

             # Add the extra SELECT and WHERE options
             return self.extra(select={'relevance': match_expr},
                                     where=[match_expr],
                                     params=[query, query])


    class SearchManager(models.Manager):
         def __init__(self, fields):
             super(SearchManager, self).__init__()
             self._search_fields = fields

         def get_query_set(self):
             return SearchQuerySet(self.model, self._search_fields)

         def search(self, query):
             return self.get_query_set().search(query)

    Here, SearchQuerySet.search() asks Django for the table and column names, builds a MATCH...AGAINST expression, then in one line adds that to the SELECT and WHERE clauses of the query.

    Conveniently, all the real work is done by Django itself.

    Conveniently, all the real work is done by Django itself. The _meta object in a model class stores all the ‘meta information' about the model and its fields, but here we only need the table name and the column names (the SearchQuerySet instance is told which fields are to be searched when it is initialised).

    The QuerySet.extra() method provides an easy way to add extra columns, WHERE clause expressions, and table references to the QuerySet's internal SQL statement: the select parameter maps column aliases to expressions (think of "SELECT expression AS alias"), the where parameter is a list of expressions for the WHERE clause, and the params parameter is a list of values that will be interpolated into the SQL statement, with the required quoting and escaping, in place of the substring ‘%s'.

    What about SearchManager? As said above, it must return SearchQuerySet instances instead of plain QuerySets. Fortunately, the Manager class was written with this sort of specialisation in mind: it has a get_query_set() method that returns an instance of the appropriate QuerySet sub-class, and this method is called whenever a Manager needs to create a new QuerySet instance. Overriding the get_query_set() method to return a SearchQuerySet is trivial. When creating a SearchQuerySet instance, it passes in the fields to search, which were provided in its own constructor. We also wanted SearchManager to implement a search() convenience method, which just delegates to SearchQuerySet.search().

    Using the Search Component

    We'll now demonstrate the use of these subclasses. Here is a simple model that represents articles posted on a web site; so that it can be searched, we create a SearchManager instance and assign it to objects:

    from django.db import models
    from fulltext.search import SearchManager

    class Article(models.Model):
         posted_date = models.DateField(db_index=True)
         title = models.CharField(maxlength=100)
         text = models.TextField()

         # Use a SearchManager for retrieving objects,
         # and tell it which fields to search.
         objects = SearchManager(('title', 'text'))

         class Admin:
             pass

         def __str__(self):
             return "%s (%s)" % (self.title, self.posted_date)

    Articles have a title, body text, and the date they were posted. We will define a FULLTEXT INDEX on the title and text columns in the database, and we pass a tuple of the corresponding field names to the SearchManager instance to declare that here. Here's the SQL statement that creates the index:

    CREATE FULLTEXT INDEX fulltext_article_title_text
         ON fulltext_article (title, text);

    Given a Django project, an application that defines the Article model, and a database populated with suitable articles, the full text search can be easily demonstrated with Python's interactive interpreter:

    >>> # How many articles are there?
    >>> len(Article.objects.all())

    12

    >>> # Find articles about frameworks:
    >>> Article.objects.search('framework')

    [<Article: Introducing Django (2006-07-17)>,
    <Article: Django on Windows HOWTO (2006-04-03)>,
    <Article: Django for non-programmers (2006-05-02)>]


    >>> # Show the relevance score for these articles:
    >>> [(a, a.relevance)
    ... for a in Article.objects.search('framework')]

    [(<Article: Introducing Django (2006-07-17)>, 1.0980),
    (<Article: Django on Windows HOWTO (2006-04-03)>, 0.6015),
    (<Article: Django for non-programmers (2006-05-02)>, 0.5842)]


    >>> # Limit the results to those posted before June:
    >>> Article.objects.search('framework').filter(
    ... posted_date__lt='2006-06-01')

    [<Article: Django on Windows HOWTO (2006-04-03)>,
    <Article: Django for non-programmers (2006-05-02)>]


    >>> # Note that filter() also returns a SearchQuerySet:
    >>> Article.objects.filter(
    ... posted_date__lt='2006-06-01').search('framework')

    [<Article: Django on Windows HOWTO (2006-04-03)>,
    <Article: Django for non-programmers (2006-05-02)>]



    Final Remarks

    I'll now let you in on a secret: since the start of June, Django has supported a search operator for queries that uses MySQL's full text search:

    # This uses the Boolean search mode, not
    # natural language queries
    Article.objects.filter(title__search='+Django -Rails')

    However, the techniques demonstrated in this article can be built upon extend the database API to support any SQL feature, whether that's supporting full text search, grouping and aggregate queries, or any other SQL feature or database-specific extension.

    Technorati tags: Django; Python; MySQL; search; database; and fulltext.

    References

    Mercurytide is a forward thinking, dynamic, and innovative Internet applications development company. We are well-established with a proven track-record within the industry of attracting blue chip clients from around the world. We produce regular white papers on a variety of technology-orientated topics. For more details see the Mercurytide web site.

    View other news from August 2006

  • 相关阅读:
    关于字符函数的一些应用总结
    常用数字函数简单举例
    dump浅析B树索引
    pl/sql developer 登陆sys用户
    关于oracle11g简单的网络配置
    关于oracle数据库读取数据的三种方式
    创建表
    SQL*Plus 与数据库的交互
    Oracle编程艺术--配置环境
    MongoDB--(NoSQL)入门介绍
  • 原文地址:https://www.cnblogs.com/lexus/p/2540935.html
Copyright © 2011-2022 走看看