Full-Text Search
55 minPostgreSQL provides built-in full-text search capabilities that enable powerful text searching without external search engines. Full-text search converts documents into searchable vectors (tsvector) and queries into search patterns (tsquery), enabling fast, relevance-ranked searches. This built-in capability eliminates the need for external search systems for many applications, reducing complexity and infrastructure requirements.
The tsvector type represents a document in a normalized form optimized for searching. Text is tokenized (broken into words), normalized (lowercased, stemmed), and duplicate words are removed. The to_tsvector() function converts text to tsvector, and you can specify a text search configuration (like 'english') that determines how text is processed. Different languages have different configurations that handle language-specific stemming and stop words appropriately.
The tsquery type represents search queries with support for boolean operators (AND &, OR |, NOT !) and phrase searches. The to_tsquery() function converts search strings to tsquery format. Queries can include wildcards (*) for prefix matching and can specify phrase searches using <-> operator. Understanding tsquery syntax enables you to create sophisticated search queries that match user intent.
GIN indexes on tsvector columns enable fast full-text searches. The @@ operator matches tsvector against tsquery, returning true if the document matches the query. Without indexes, full-text search requires sequential scans, making indexing essential for performance. GIN indexes are particularly efficient for full-text search, enabling fast queries even on large text collections.
Ranking functions (ts_rank, ts_rank_cd) calculate relevance scores for search results, enabling you to sort results by relevance. These functions consider factors like term frequency and proximity. You can customize ranking by adjusting weights for different document sections (title vs body) or using different ranking algorithms. Ranking enables users to find the most relevant results first.
Full-text search supports advanced features including highlighting matches, searching multiple languages, custom text search configurations, and integration with other PostgreSQL features. You can combine full-text search with regular SQL queries, enabling powerful hybrid searches. Understanding full-text search capabilities enables you to build search functionality directly in PostgreSQL without external dependencies.
Key Concepts
- Full-text search uses tsvector for documents and tsquery for search patterns.
- to_tsvector() converts text to searchable vectors.
- GIN indexes on tsvector enable fast full-text searches.
- Ranking functions calculate relevance scores for search results.
- Full-text search supports boolean operators and phrase searches.
Learning Objectives
Master
- Creating and using tsvector columns for full-text search
- Building tsquery search patterns with boolean operators
- Creating GIN indexes for full-text search performance
- Using ranking functions to sort results by relevance
Develop
- Understanding text search algorithms and indexing
- Designing effective search interfaces
- Optimizing full-text search performance
Tips
- Use to_tsvector('english', text) to create searchable vectors.
- Create GIN index: CREATE INDEX ON table USING gin(tsvector_column);
- Use ts_rank() to sort by relevance: ORDER BY ts_rank(tsvector, query) DESC.
- Combine multiple fields: to_tsvector('english', title || ' ' || content).
Common Pitfalls
- Not creating indexes on tsvector columns, causing slow searches.
- Using wrong text search configuration, missing language-specific optimizations.
- Not using ranking, returning results in arbitrary order.
- Not updating tsvector when source text changes, causing stale search indexes.
Summary
- PostgreSQL provides built-in full-text search capabilities.
- tsvector and tsquery enable powerful text searching.
- GIN indexes are essential for full-text search performance.
- Ranking functions enable relevance-based result sorting.
Exercise
Implement full-text search for a blog posts table.
-- Create a blog posts table
CREATE TABLE blog_posts (
id SERIAL PRIMARY KEY,
title VARCHAR(200),
content TEXT,
author VARCHAR(100),
published_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Insert sample blog posts
INSERT INTO blog_posts (title, content, author) VALUES
('Getting Started with PostgreSQL', 'PostgreSQL is a powerful database system...', 'John Doe'),
('Advanced SQL Techniques', 'Learn about window functions and CTEs...', 'Jane Smith'),
('Database Performance Optimization', 'Tips for improving query performance...', 'John Doe');
-- Create a full-text search index
ALTER TABLE blog_posts ADD COLUMN search_vector tsvector;
UPDATE blog_posts SET search_vector = to_tsvector('english', title || ' ' || content);
CREATE INDEX idx_blog_posts_search ON blog_posts USING gin(search_vector);
-- Perform full-text search
SELECT title, author, ts_rank(search_vector, query) as rank
FROM blog_posts, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;
Exercise Tips
- Try different search configurations: to_tsvector('spanish', text) for Spanish.
- Use phrase search: to_tsquery('english', 'postgresql <-> performance').
- Highlight matches: ts_headline('english', content, query) to show matching text.
- Update tsvector automatically: CREATE TRIGGER to update on text changes.