Full Text Search

Postgres has built-in functions to handle Full Text Search queries. This is like a "search engine" within Postgres.

Preparation#

For this guide we'll use the following example data:

idtitleauthordescription
1The Poky Little PuppyJanette Sebring LowreyPuppy is slower than other, bigger animals.
2The Tale of Peter RabbitBeatrix PotterRabbit eats some vegetables.
3TootleGertrude CramptonLittle toy train has big dreams.
4Green Eggs and HamDr. SeussSam has changing food preferences and eats unusually colored food.
5Harry Potter and the Goblet of FireJ.K. RowlingFourth year of school starts, big drama ensues.

Usage#

The functions we'll cover in this guide are:

to_tsvector()#

Converts your data into searchable "tokens". to_tsvector() stands for "to text search vector". For example:

1select to_tsvector('green eggs and ham')
2
3-- Returns 'egg':2 'green':1 'ham':4

Collectively these tokens are called a "document" which Postgres can use for comparisons.

to_tsquery()#

Converts a query string into "tokens" to match. to_tsquery() stands for "to text search query".

This conversion step is important because we will want to "fuzzy match" on keywords. For example if a user searches for "eggs", and a column has the value "egg", we probably still want to return a match.

Match: @@#

The @@ symbol is the "match" symbol for Full Text Search. It returns any matches between a to_tsvector result and a to_tsquery result.

Take the following example:

1select *
2from books
3where title = 'Harry';

The equality symbol above (=) is very "strict" on what it matches. In a full text search context, we might want to find all "Harry Potter" books and so we can rewrite the example above:

1select *
2from books
3where to_tsvector(title) @@ to_tsquery('Harry');

Basic Full Text Queries#

Search a single column#

To find all books where the description contain the word big:

1select
2  *
3from
4  books
5where
6  to_tsvector(description)
7  @@ to_tsquery('big');

Search multiple columns#

To find all books where description or title contain the word little:

1select
2  *
3from
4  books
5where
6  to_tsvector(description || ' ' || title) -- concat columns, but be sure to include a space to separate them!
7  @@ to_tsquery('little');

Match all search words#

To find all books where description contains BOTH of the words little and big, we can use the & symbol:

1select
2  *
3from
4  books
5where
6  to_tsvector(description)
7  @@ to_tsquery('little & big'); -- use & for AND in the search query

Match any search words#

To find all books where description contain ANY of the words little or big, use the | symbol:

1select
2  *
3from
4  books
5where
6  to_tsvector(description)
7  @@ to_tsquery('little | big'); -- use | for OR in the search query

Notice how searching for big includes results with the word bigger (or biggest, etc).

Creating Indexes#

Now that we have Full Text Search working, let's create an index. This will allow Postgres to "build" the documents pre-emptively so that they don't need to be created at the time we execute the query. This will make our queries much faster.

Searchable columns#

Let's create a new column fts inside the books table to store the searchable index of the title and description columns.

We can use a special feature of Postgres called Generated Columns to ensure that the index is updated any time the values in the title and description columns change.

1alter table
2  books
3add column
4  fts tsvector generated always as (to_tsvector('english', description || ' ' || title)) stored;
5
6create index books_fts on books using gin (fts); -- generate the index
7
8select id, fts
9from books;

Search using the new column#

Now that we've created and populated our index, we can search it using the same techniques as before:

1select
2  *
3from
4  books
5where
6  fts @@ to_tsquery('little & big');

Query Operators#

Visit PostgreSQL: Text Search Functions and Operators to learn about additional query operators you can use to do more advanced full text queries, such as:

Proximity: <->#

The proximity symbol is useful for searching for terms that are a certain "distance" apart. For example, to find the phrase big dreams, where the a match for "big" is followed immediately by a match for "dreams":

1select
2  *
3from
4  books
5where
6  to_tsvector(description) @@ to_tsquery('big <-> dreams');

We can also use the <-> to find words within a certain distance of eachother. For example to find year and school within 2 words of each other:

1select
2  *
3from
4  books
5where
6  to_tsvector(description) @@ to_tsquery('year <2> school');

Negation: !#

The negation symbol can be used to find phrases which don't contain a search term. For example, to find records that have the word big but not little:

1select
2  *
3from
4  books
5where
6  to_tsvector(description) @@ to_tsquery('big & !little');

Resources#