When Postgres Indexing Went Wrong

Indexing in Postgres seems simple, but it’s important to understand the basics of how it really works and the best practices for preventing system downtime.

TL;DR: Be careful when creating indexes. I learned this lesson the hard way when concurrent indexing failed silently.

Critical incident

At a previous company, we managed a high-volume Postgres instance with billions of rows of transactional data. As we scaled, query performance became a key priority, and one of the first optimizations was adding indexes. To avoid downtime, we used CREATE INDEX CONCURRENTLY, which allows indexing large tables without locking out writes for hours. Initially, p99 query performance improved dramatically.

A few weeks later, another team launched a new feature built to rely heavily on the new index. Everything seemed routine until traffic surged.

At first, the problem was subtle. A few queries took longer than expected. But within hours, the load began to spike. Query response times slowed to a crawl, and some requests were timing out.

We couldn’t immediately see why. The index was in place, and a quick EXPLAIN ANALYZE confirmed it was being used. But users were still experiencing massive slowdowns, and we were on the brink of a full-scale production outage.

It wasn’t until we checked the server logs that we pieced together what had happened:

CREATE INDEX CONCURRENTLY idx_users_email_2019 ON users_2019 (email);
ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 54321; blocked by process 54322.

Concurrent indexing can fail (silently)

Concurrent indexing requires more total work than a standard index build and takes significantly longer to complete. It uses a two-phase approach to avoid locking the table:

  • Phase 1: A snapshot of the current data gets taken, and the index is built on that.
  • Phase 2: Postgres then catches up with any changes (inserts, updates, or deletes) that happened during phase 1.

Because the table stays live during the build, the CREATE INDEX command can fail partway through, leaving an incomplete index behind. An “invalid” index is ignored by the query planner, but it can have serious consequences if left unmonitored.

postgres=# \d users_2019
            Table "public.users_2019"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
  ...   |         |           |          |
Indexes:
    "idx_users_email_2019" btree (email) INVALID

In our case, the issue was amplified by the fact that our data was partitioned. The index had failed on some partitions but not others, leading to a situation where some queries were using the index while others were hitting unindexed partitions. This imbalance resulted in uneven query performance and significantly increased load on the system.
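
Listing every partition’s indexes along with their validity would have surfaced the problem immediately. Here’s a rough sketch of such a catalog query (the users parent table name is illustrative):

SELECT
    child.relname AS partition,
    idx.indexrelid::regclass AS index,
    idx.indisvalid
FROM pg_inherits inh
JOIN pg_class child ON child.oid = inh.inhrelid
JOIN pg_index idx ON idx.indrelid = child.oid
WHERE inh.inhparent = 'users'::regclass;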

If we hadn’t caught it when we did, we would have faced a full-blown production outage, impacting every user on the platform.

Best practices for Postgres indexing

To help others navigate this terrain, here are some best practices for Postgres indexing that can prevent these issues:

Avoid dangerous operations

Always use the CONCURRENTLY flag when creating indexes in production. Without it, even smaller tables can block writes for unacceptably long periods, leading to system downtime. CONCURRENTLY consumes more CPU and I/O, but the trade-off is worth it to maintain availability. Keep in mind that only one concurrent index build can run on a given table at a time, so plan accordingly for large datasets.
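
As a minimal sketch, a production index build might pair CONCURRENTLY with a lock_timeout so the command fails fast instead of queueing behind long-running transactions (the timeout values here are illustrative):

-- Give up quickly on lock waits instead of queueing behind long transactions
SET lock_timeout = '10s';
-- Large index builds can legitimately run for hours
SET statement_timeout = 0;

CREATE INDEX CONCURRENTLY idx_users_email_2019 ON users_2019 (email);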

Monitor concurrent index creation closely

Don’t take successful index creation for granted. The system view pg_stat_progress_create_index can be queried for progress reporting while an index build is underway.

postgres=# SELECT * FROM pg_stat_progress_create_index;
-[ RECORD 1 ]------+---------------------------------------
pid                | 896799
datid              | 16402
datname            | postgres
relid              | 17261
index_relid        | 136565
command            | CREATE INDEX CONCURRENTLY
phase              | building index: loading tuples in tree
lockers_total      | 0
lockers_done       | 0
current_locker_pid | 0
blocks_total       | 0
blocks_done        | 0
tuples_total       | 10091384
tuples_done        | 1775295
partitions_total   | 0
partitions_done    | 0
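
The raw counters are hard to eyeball, so you can turn them into rough completion percentages. A small sketch against the same view (the block and tuple counters only move during certain phases):

SELECT
    pid,
    phase,
    round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "% blocks",
    round(100.0 * tuples_done / nullif(tuples_total, 0), 1) AS "% tuples"
FROM pg_stat_progress_create_index;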

Manually validate indexes

If you don’t check your indexes, you might think you can rely on them when you can’t. And although an invalid index is ignored during querying, it still incurs maintenance overhead on every write. Common causes of index failures include:

    • Deadlocks: Index creation might conflict with ongoing transactions, leading to deadlocks.
    • Disk Space: Large indexes may fail due to insufficient disk space.
    • Constraint Violations: Creating unique indexes on columns with non-unique data will result in failures.

You can find all invalid indexes by running the following:

SELECT *
FROM pg_class
JOIN pg_index ON pg_index.indexrelid = pg_class.oid
WHERE pg_index.indisvalid = false;

You can also query the pg_stat_all_indexes and pg_statio_all_indexes system views to verify that the index is being accessed.
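
For example, a quick sketch that checks how often an index has been scanned since statistics were last reset (the index name follows the earlier examples):

SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_all_indexes
WHERE indexrelname = 'idx_users_email_2019';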

Fix invalid indexes

Invalid indexes can be rebuilt using the REINDEX command. It’s equivalent to dropping and recreating the index, except that without CONCURRENTLY it locks out writes to the table and blocks any reads that attempt to use the index. Note that REINDEX CONCURRENTLY requires Postgres 12 or later.

REINDEX INDEX CONCURRENTLY idx_users_email_2019;

If a problem occurs during the rebuild, it will leave behind a new invalid index suffixed with _ccnew. Drop it and retry REINDEX CONCURRENTLY.

postgres=# \d users_2019
            Table "public.users_2019"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 email  | text    |           |          |
Indexes:
    "idx_users_email_2019" btree (email) INVALID
    "idx_users_email_2019_ccnew" btree (email) INVALID

If the invalid index is suffixed with _ccold, it’s the original index that wasn’t fully replaced. You can safely drop it, as the rebuild has succeeded.
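
Putting it together, the cleanup-and-retry sequence might look like this (index names follow the examples above):

-- Drop the leftover invalid index from the failed rebuild without blocking writes
DROP INDEX CONCURRENTLY idx_users_email_2019_ccnew;
-- Then retry the rebuild
REINDEX INDEX CONCURRENTLY idx_users_email_2019;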

Create partition indexes consistently

For newly created partitioned tables, or small ones (under ~100k rows), you can simply create indexes synchronously on the parent table, and Postgres will automatically propagate them to all partitions, including any created in the future.

CREATE INDEX idx_users_email ON users (email);

But it’s currently not possible to use the CONCURRENTLY flag when creating an index on the root partitioned table. What you should use instead is the ONLY keyword, which tells Postgres not to apply the index recursively to child partitions, so no table is locked.

-- Create an index on the parent table (metadata-only operation)
CREATE INDEX idx_users_email ON ONLY users (email);

This creates an invalid index first. Then we can create indexes for each partition and attach them to the parent index:

CREATE INDEX CONCURRENTLY idx_users_email_2019
    ON users_2019 (email);
ALTER INDEX idx_users_email
    ATTACH PARTITION idx_users_email_2019;

CREATE INDEX CONCURRENTLY idx_users_email_2020
    ON users_2020 (email);
ALTER INDEX idx_users_email
    ATTACH PARTITION idx_users_email_2020;

-- repeat for all partitions

Only once all partition indexes are attached will the index on the root table automatically be marked as valid. The parent index itself is just “virtual” and has no storage of its own, but it ensures that all partitions maintain a consistent indexing strategy.
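
You can confirm that the parent index has flipped to valid by checking pg_index directly; a minimal sketch, assuming the index name from above:

SELECT indisvalid
FROM pg_index
WHERE indexrelid = 'idx_users_email'::regclass;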

Check the query execution plan

Using the EXPLAIN ANALYZE command provides a comprehensive view of the query execution plan, detailing how Postgres processes your query. This breakdown is essential for verifying that the expected indexes are being utilized effectively.

EXPLAIN ANALYZE SELECT * FROM users_2019 WHERE email = '[email protected]';

Index Scan using idx_users_email_2019 on users_2019  (cost=0.15..0.25 rows=1 width=48) (actual time=0.015..0.018 rows=1 loops=1)
  Index Cond: (email = '[email protected]'::text)
Planning Time: 0.123 ms
Execution Time: 0.028 ms

Remove unused indexes

Sometimes the indexes we add aren’t as valuable as expected. To prune our indexes to optimize write performance, we can check which indexes haven’t been used:

SELECT
    indexrelid::regclass AS index,
    relid::regclass AS "table"
FROM
    pg_stat_user_indexes
    JOIN pg_index USING (indexrelid)
WHERE
    idx_scan = 0 AND indisunique IS false;
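
Keep in mind that these statistics are tracked per server, so an index that looks idle on the primary may still be serving reads on a replica. Once you’re confident an index is truly unused, it can be dropped without blocking writes; a sketch with a hypothetical index name:

DROP INDEX CONCURRENTLY idx_users_last_login;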

By implementing these best practices, you can avoid scary mistakes. Remember to monitor, validate, and understand the implications of your indexing strategy. The cost of overlooking these details can be significant, and a proactive approach will help you maintain a stable and efficient database.

At Bemi, we specialize in handling audit trails at large volumes, where storage optimization and the right indexing strategies are crucial. We have to deeply understand Postgres storage and indexing internals to ensure 100% reliability and performance. We’ve had to build out index health monitoring at scale and also automated safeguards to ensure indexes are always valid and queries optimized. In a future blog, I’ll share some of the internal performance tooling and tech we use under the hood.