Last updated: March 16, 2026
When your PostgreSQL query planner selects a suboptimal index scan path, query performance can degrade dramatically. Developers often spend hours analyzing EXPLAIN output, statistics, and configuration settings to understand why the planner made the wrong choice. AI tools now offer practical solutions for diagnosing and resolving these index selection issues faster.
Table of Contents
- Understanding PostgreSQL Index Scan Selection
- Why Index Scan Paths Go Wrong
- Practical Example: Identifying the Wrong Index Choice
- Using AI Tools for Query Analysis
- How AI Tools Analyze Execution Plans
- Common Fixes the AI Might Suggest
- Real-World Debugging Workflow
- Prevention Strategies
- Advanced Analysis: Using pg_stat_statements with AI
- Planner Configuration Tuning
- Building a Query Performance Dashboard
- AI Tool Effectiveness Comparison
- Real-World Example: Production Outage Response
- Building a Local AI Query Analyzer
Understanding PostgreSQL Index Scan Selection
PostgreSQL’s query planner evaluates multiple factors when deciding between index scans, sequential scans, or bitmap scans. The planner considers table statistics, index selectivity estimates, correlation values, and configuration parameters like random_page_cost and effective_cache_size. When these estimates are inaccurate or when multiple indexes exist, the planner may choose a scan path that performs poorly in practice.
A common scenario involves a table with multiple indexes where the planner selects a less efficient index due to misestimated row counts or poor correlation statistics. The planner might overestimate how many rows a predicate matches and choose a sequential scan when an index scan would be faster. Alternatively, it might pick an index on a single selective column while ignoring a composite index that would narrow the scan further.
Understanding why these mis-selections occur helps you provide better context to AI tools. The more information you can give about your schema, data distribution, and query patterns, the more accurate the AI’s recommendations will be.
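To make the effect of a row-count misestimate concrete, here is a deliberately simplified cost comparison. It is a sketch, not PostgreSQL's actual cost formula: the function names and the "one random heap page per matching row" worst case are assumptions made for illustration. Only the parameter defaults (seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost) are PostgreSQL's documented values.

```python
# Toy model only: PostgreSQL's real cost functions are considerably more detailed.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005


def seq_scan_cost(table_pages: int, table_rows: int) -> float:
    """Read every page sequentially and process every row."""
    return table_pages * SEQ_PAGE_COST + table_rows * CPU_TUPLE_COST


def index_scan_cost(matching_rows: int,
                    random_page_cost: float = RANDOM_PAGE_COST) -> float:
    """Assumed worst case for an uncorrelated index: one random page per match."""
    per_row = random_page_cost + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST
    return matching_rows * per_row


def cheaper_path(table_pages: int, table_rows: int, estimated_matches: int) -> str:
    """Return which path the toy model prefers for a given row estimate."""
    seq = seq_scan_cost(table_pages, table_rows)
    idx = index_scan_cost(estimated_matches)
    return "index scan" if idx < seq else "seq scan"


# Same table, two different row estimates for the same predicate.
print(cheaper_path(10_000, 1_000_000, estimated_matches=1_000))
print(cheaper_path(10_000, 1_000_000, estimated_matches=50_000))
```

In this toy model the only thing that changes between the two calls is the row estimate, which is why stale or skewed statistics can flip the chosen path.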
Why Index Scan Paths Go Wrong
Several specific conditions commonly cause the PostgreSQL planner to choose suboptimal index scans:
Outdated Statistics: After bulk inserts or large deletes, statistics may not reflect actual data distribution. A column that once had high selectivity might now have low selectivity, but the planner doesn’t know this without updated statistics.
Correlation Issues: PostgreSQL tracks column correlation, which measures how closely the physical row order on disk matches the sort order of the column's values. High correlation makes index scans cheaper because neighboring index entries point to neighboring heap pages. Poor correlation estimates can cause the planner to avoid efficient index scans.
Index Column Order: For composite indexes, the column order matters. An index on (status, customer_id) performs differently than (customer_id, status) depending on your query pattern.
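Data Type Mismatches: Implicit type conversions can prevent index usage entirely. If your query compares an integer column with a value typed as numeric (for example, a parameter bound as a decimal), PostgreSQL casts the column rather than the value and cannot use a plain index on that column.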
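The outdated-statistics case can be checked mechanically. The following is a minimal sketch that assumes you have already fetched relname, n_live_tup, and n_mod_since_analyze from pg_stat_user_tables into dictionaries; the function name and the sample rows are hypothetical, and the 10% default threshold simply mirrors the default autovacuum_analyze_scale_factor.

```python
from typing import Iterable


def stale_tables(stats_rows: Iterable[dict], threshold: float = 0.10) -> list[str]:
    """Return table names whose modified-row fraction exceeds the threshold.

    Each row is assumed to carry relname, n_live_tup and n_mod_since_analyze,
    as selected from pg_stat_user_tables.
    """
    stale = []
    for row in stats_rows:
        live = row["n_live_tup"]
        modified = row["n_mod_since_analyze"]
        # Treat an apparently empty table with pending modifications as stale.
        if live > 0:
            fraction = modified / live
        else:
            fraction = 1.0 if modified > 0 else 0.0
        if fraction > threshold:
            stale.append(row["relname"])
    return stale


# Hypothetical sample rows for illustration.
sample = [
    {"relname": "orders", "n_live_tup": 1_000_000, "n_mod_since_analyze": 400_000},
    {"relname": "customers", "n_live_tup": 50_000, "n_mod_since_analyze": 120},
]
print(stale_tables(sample))
```

Tables flagged this way are reasonable candidates for a manual ANALYZE before you spend time on the plan itself.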
Practical Example: Identifying the Wrong Index Choice
Consider an orders table with three single-column indexes:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at);
A query filtering by customer_id, status, and created_at might use the wrong index:
SELECT * FROM orders
WHERE customer_id = 12345
AND status = 'pending'
AND created_at > '2025-01-01';
The planner might choose a sequential scan or a suboptimal index because it misestimates how many rows match the status = 'pending' condition.
Using AI Tools for Query Analysis
AI tools can analyze EXPLAIN output and suggest improvements. When you paste the query and its execution plan, these tools can identify patterns indicating misaligned index selection.
Step 1: Capture the Execution Plan
Run EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) to get detailed timing and buffer information:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE customer_id = 12345
AND status = 'pending'
AND created_at > '2025-01-01';
Step 2: Analyze with AI Assistance
Paste the EXPLAIN output into an AI coding assistant. A good prompt would be:
“Analyze this PostgreSQL execution plan. The query filters by customer_id, status, and created_at. Explain why the planner chose a sequential scan and suggest which index would be more appropriate.”
The AI can identify issues like:
- Missing composite index for the query pattern
- Outdated statistics causing poor selectivity estimates
- Incorrect correlation values affecting index choice
- Implicit type conversions preventing index usage
- Suboptimal index column ordering
How AI Tools Analyze Execution Plans
Modern AI coding assistants can parse PostgreSQL execution plans and identify patterns that indicate performance problems. When you share an EXPLAIN ANALYZE output with an AI tool, it can recognize indicators such as high actual row counts compared to estimated rows, excessive buffer reads, or sequential scans on large tables.
The AI examines the plan node by node, understanding the cost estimates at each stage. It looks for discrepancies between estimated and actual row counts—a key indicator that statistics are outdated. It also recognizes when bitmap scans could replace index scans or when index-only scans would reduce I/O.
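You can run the same estimated-versus-actual check yourself before involving an AI tool. This sketch walks the plan tree produced by EXPLAIN (ANALYZE, FORMAT JSON) and reports nodes whose row estimate is off by a chosen factor. The function name, the 10x default, and the sample plan are assumptions for illustration; the keys "Plan Rows", "Actual Rows", "Node Type", "Relation Name", and "Plans" are the ones PostgreSQL emits in JSON plans.

```python
def find_misestimates(plan: dict, min_ratio: float = 10.0) -> list[dict]:
    """Collect plan nodes whose estimated and actual row counts differ
    by at least min_ratio in either direction."""
    findings = []
    estimated = plan.get("Plan Rows", 0)
    actual = plan.get("Actual Rows", 0)
    # Clamp to 1 so zero-row nodes do not cause division by zero.
    ratio = max(estimated, 1) / max(actual, 1)
    if ratio >= min_ratio or ratio <= 1 / min_ratio:
        findings.append({
            "node": plan.get("Node Type"),
            "relation": plan.get("Relation Name"),
            "estimated": estimated,
            "actual": actual,
        })
    for child in plan.get("Plans", []):
        findings.extend(find_misestimates(child, min_ratio))
    return findings


# Hypothetical plan fragment for illustration.
sample_plan = {
    "Node Type": "Nested Loop", "Plan Rows": 100, "Actual Rows": 120,
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders",
         "Plan Rows": 50, "Actual Rows": 48000},
        {"Node Type": "Index Scan", "Relation Name": "customers",
         "Plan Rows": 1, "Actual Rows": 1},
    ],
}
for finding in find_misestimates(sample_plan):
    print(finding)
```

Nodes reported here are the ones worth pointing out explicitly in your prompt, since they are where the planner's assumptions diverge from reality.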
What to Include in Your AI Query
For the best results, provide the AI with context:
-- Table structure
\d orders
-- Query being analyzed
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 12345 AND status = 'pending';
-- Current indexes on the table
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'orders';
Share the complete EXPLAIN output, table definitions, relevant index definitions, and any error messages or unusual behavior you’ve observed. The more context you provide, the more accurate the AI’s analysis will be.
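If you assemble this context often, a small helper keeps the prompt consistent. The sketch below is one possible layout, not a required format: the function name, section titles, and opening instruction are assumptions chosen for illustration.

```python
def build_analysis_prompt(query: str, explain_output: str,
                          table_ddl: str = "", index_ddl: str = "") -> str:
    """Combine the query, plan, and schema context into one prompt string.

    Empty sections are skipped so the prompt never contains blank headings.
    """
    sections = [
        ("QUERY", query),
        ("EXPLAIN OUTPUT", explain_output),
        ("TABLE DEFINITION", table_ddl),
        ("INDEXES", index_ddl),
    ]
    parts = ["Analyze this PostgreSQL execution plan and explain "
             "the planner's index choice."]
    for title, body in sections:
        body = body.strip()
        if body:
            parts.append(f"{title}:\n{body}")
    return "\n\n".join(parts)


prompt = build_analysis_prompt(
    query="SELECT * FROM orders WHERE customer_id = 12345 AND status = 'pending';",
    explain_output="Seq Scan on orders  (cost=0.00..25000.00 rows=50 width=64)",
    index_ddl="CREATE INDEX idx_orders_customer_id ON orders(customer_id);",
)
print(prompt)
```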
Common Fixes the AI Might Suggest
Create a Composite Index
If your query frequently filters on multiple columns, a composite index often helps:
CREATE INDEX idx_orders_customer_status_created
ON orders(customer_id, status, created_at);
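Column order in a composite B-tree index usually follows a rule of thumb: equality-filtered columns first, then the range-filtered column. The helper below is a sketch of that rule only; the function name and the generated index naming scheme are hypothetical, and it does not consider column selectivity or sort requirements.

```python
def composite_index_sql(table: str, equality_cols: list[str],
                        range_cols: list[str]) -> str:
    """Build a CREATE INDEX statement with equality columns before range columns."""
    columns = list(equality_cols)
    # Append range columns last, skipping any already used for equality.
    columns += [col for col in range_cols if col not in columns]
    if not columns:
        raise ValueError("at least one column is required")
    index_name = f"idx_{table}_{'_'.join(columns)}"
    return f"CREATE INDEX {index_name} ON {table} ({', '.join(columns)});"


# customer_id and status use "=", created_at uses ">" in the example query.
print(composite_index_sql("orders", ["customer_id", "status"], ["created_at"]))
```

For the example query, this yields the same column order as the index shown above: customer_id, status, then created_at.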
Update Statistics
Run ANALYZE orders; to refresh table statistics. The planner relies on these statistics to estimate row counts.
Adjust Planner Parameters
For complex queries, tweaking parameters can help:
SET random_page_cost = 1.1;
SET effective_cache_size = '4GB';
However, SET changes apply to every query in the current session, so test thoroughly before applying them globally.
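Steer the Planner Toward an Index
PostgreSQL has no built-in index hint syntax. As a last resort, you can disable competing scan types for a single transaction to check whether the planner can use the index you expect (the pg_hint_plan extension offers hint comments if you need something more permanent):
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE customer_id = 12345
AND status = 'pending'
AND created_at > '2025-01-01';
ROLLBACK;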
Real-World Debugging Workflow
A practical approach combines AI analysis with manual verification:
- Identify slow queries using pg_stat_statements
- Run EXPLAIN ANALYZE on problematic queries
- Use AI tools to interpret the plan and suggest indexes
- Test suggested indexes with proper benchmarking
- Monitor query performance after changes
Prevention Strategies
Rather than debugging after problems occur, consider proactive measures:
- Monitor query performance with pg_stat_statements
- Set up alerts for queries exceeding expected execution times
- Regularly run ANALYZE on tables with frequent data changes
- Review index usage with pg_stat_user_indexes
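The index usage review can be partly automated. This sketch assumes you have fetched relname, indexrelname, and idx_scan from pg_stat_user_indexes into dictionaries; the function name and sample rows are hypothetical. Keep in mind that idx_scan counts scans since the statistics were last reset, and that unique or primary key indexes enforce constraints even when they are never scanned.

```python
def unused_indexes(index_stats: list[dict], min_scans: int = 1) -> list[str]:
    """Return "table.index" names scanned fewer than min_scans times."""
    return [
        f"{row['relname']}.{row['indexrelname']}"
        for row in index_stats
        if row["idx_scan"] < min_scans
    ]


# Hypothetical sample rows for illustration.
sample_stats = [
    {"relname": "orders", "indexrelname": "idx_orders_customer_id", "idx_scan": 15230},
    {"relname": "orders", "indexrelname": "idx_orders_status", "idx_scan": 0},
]
print(unused_indexes(sample_stats))
```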
Advanced Analysis: Using pg_stat_statements with AI
Combine PostgreSQL’s built-in statistics with AI analysis for systematic performance improvement:
-- First, enable the pg_stat_statements extension
-- (the module must also be listed in shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- View slowest queries by mean execution time
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Export for AI analysis
\copy (SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 50) TO '/tmp/slow_queries.csv' CSV;
Feed this CSV to Claude or GPT-4 with the request: “Analyze these 50 slowest PostgreSQL queries and recommend the top 5 to optimize for maximum impact.”
The AI will identify patterns (missing indexes on common columns, poorly written JOINs, inefficient GROUP BY) rather than analyzing each query individually.
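Mean execution time alone can hide cheap queries that run very often. Before handing the export to an AI tool, you might rank it by total time (calls multiplied by mean time). The sketch below assumes the three-column, header-less CSV layout produced by the \copy command above (query, calls, mean_exec_time); the function name and sample data are hypothetical.

```python
import csv
import io


def rank_by_total_time(csv_text: str, top_n: int = 5) -> list[tuple[str, float]]:
    """Rank queries by calls * mean_exec_time, highest total first."""
    ranked = []
    for query, calls, mean_ms in csv.reader(io.StringIO(csv_text)):
        ranked.append((query, int(calls) * float(mean_ms)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical export: a fast but very frequent query and a slow but rare one.
sample_csv = (
    "SELECT * FROM orders WHERE id = $1,90000,2.5\n"
    '"SELECT count(*) FROM order_items",12,4000.0\n'
)
for query, total_ms in rank_by_total_time(sample_csv):
    print(f"{total_ms:>12.1f} ms  {query}")
```

In this sample the 2.5 ms query dominates total time despite its low mean, which is the kind of pattern a mean-only ranking would bury.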
Planner Configuration Tuning
Sometimes the issue is not the index but the planner’s perception of cost:
-- View current planner settings
SELECT name, setting, unit FROM pg_settings WHERE name LIKE '%cost%';
-- Expected output:
-- random_page_cost = 4.0 (default)
-- seq_page_cost = 1.0
-- For SSD storage (much faster random access), reduce random_page_cost:
ALTER SYSTEM SET random_page_cost = 1.1;
-- For servers with a large amount of RAM available for caching, raise effective_cache_size:
ALTER SYSTEM SET effective_cache_size = '64GB';
-- Apply changes
SELECT pg_reload_conf();
-- Test the same slow query again
EXPLAIN (ANALYZE, BUFFERS) SELECT ... FROM ...;
This is often overlooked but can shift the planner’s decisions dramatically. AI tools frequently suggest this after analyzing EXPLAIN output.
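To confirm that a setting change actually altered the plan, you can diff the scan node chosen for each table before and after. This is a minimal sketch over EXPLAIN (FORMAT JSON) plan trees; the function names and sample plans are hypothetical, and if a table is scanned more than once only the last scan found is kept.

```python
def scan_nodes(plan: dict) -> dict[str, str]:
    """Map each relation name in the plan to the node type that scans it."""
    found = {}
    if "Relation Name" in plan:
        found[plan["Relation Name"]] = plan["Node Type"]
    for child in plan.get("Plans", []):
        found.update(scan_nodes(child))
    return found


def changed_scans(before: dict, after: dict) -> dict[str, tuple[str, str]]:
    """Return relations whose scan node type differs between two plans."""
    old, new = scan_nodes(before), scan_nodes(after)
    return {
        rel: (old[rel], new[rel])
        for rel in old
        if rel in new and old[rel] != new[rel]
    }


# Hypothetical plans captured before and after lowering random_page_cost.
plan_before = {"Node Type": "Limit", "Plans": [
    {"Node Type": "Seq Scan", "Relation Name": "orders"}]}
plan_after = {"Node Type": "Limit", "Plans": [
    {"Node Type": "Index Scan", "Relation Name": "orders"}]}
print(changed_scans(plan_before, plan_after))
```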
Building a Query Performance Dashboard
Track planner effectiveness over time:
import psycopg2
from datetime import datetime, timezone


class QueryPerformanceTracker:
    def __init__(self, connection_string):
        self.conn_str = connection_string
        self.metrics_log = []

    def track_query(self, query_name, query_sql, schema_context=""):
        """Execute and log query performance."""
        # EXPLAIN ANALYZE really executes query_sql, so pass read-only queries.
        conn = psycopg2.connect(self.conn_str)
        try:
            with conn.cursor() as cursor:
                # Run EXPLAIN ANALYZE; psycopg2 parses the JSON result into Python objects
                cursor.execute(f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query_sql}")
                explain_result = cursor.fetchone()[0]
        finally:
            # Close the connection even if the query fails
            conn.close()

        # Extract key metrics
        top = explain_result[0]
        metric = {
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'query_name': query_name,
            'planning_ms': top['Planning Time'],
            'execution_ms': top['Execution Time'],
            'total_rows_scanned': self._extract_total_rows(top['Plan']),
            'index_scans': self._count_index_scans(top['Plan'])
        }
        self.metrics_log.append(metric)
        return metric

    def _extract_total_rows(self, plan):
        """Recursively count actual rows returned by all plan nodes."""
        rows = plan.get('Actual Rows', 0)
        for child in plan.get('Plans', []):
            rows += self._extract_total_rows(child)
        return rows

    def _count_index_scans(self, plan):
        """Count how many index scans in the plan."""
        count = 1 if 'Index' in plan.get('Node Type', '') else 0
        for child in plan.get('Plans', []):
            count += self._count_index_scans(child)
        return count

    def generate_report(self):
        """Identify improving or degrading queries."""
        if len(self.metrics_log) < 2:
            return None

        # Group measurements by query name, preserving insertion order
        trends = {}
        for m in self.metrics_log:
            trends.setdefault(m['query_name'], []).append(m)

        report = {}
        for query_name, measurements in trends.items():
            if len(measurements) < 2:
                continue
            first = measurements[0]
            last = measurements[-1]
            if first['execution_ms'] == 0:
                # Avoid dividing by zero for queries that ran instantly
                continue
            improvement = (
                (first['execution_ms'] - last['execution_ms']) / first['execution_ms'] * 100
            )
            report[query_name] = {
                'first_exec_ms': first['execution_ms'],
                'latest_exec_ms': last['execution_ms'],
                'improvement_percent': improvement,
                'trend': 'improving' if improvement > 0 else 'degrading'
            }
        return report
This tracker identifies which optimizations actually worked and which caused regressions.
AI Tool Effectiveness Comparison
| Tool | Index Recommendation | Join Rewrite | Statistics Analysis | Explanation Clarity |
|---|---|---|---|---|
| Claude (Opus) | 9/10 | 8/10 | 9/10 | 10/10 |
| GPT-4 | 7/10 | 8/10 | 6/10 | 8/10 |
| GitHub Copilot | 5/10 | 6/10 | 3/10 | 6/10 |
| ChatGPT | 6/10 | 7/10 | 4/10 | 7/10 |
Claude excels at understanding the reasoning behind the planner’s decisions, while GPT-4 is faster at generating working rewrites.
Real-World Example: Production Outage Response
Scenario: Slow checkout causing 503 errors on e-commerce platform.
Immediate diagnosis using AI:
# 1. Capture slow query from logs
slow_query = """
SELECT o.*, c.*, p.*, COUNT(oi.id) as item_count
FROM orders o
JOIN customers c ON o.customer_id = c.id
LEFT JOIN payments p ON o.id = p.order_id
LEFT JOIN order_items oi ON o.id = oi.order_id
WHERE o.created_at > NOW() - INTERVAL '1 hour'
GROUP BY o.id, c.id, p.id
ORDER BY o.created_at DESC
LIMIT 100;
"""
# 2. Get EXPLAIN output
explain_output = """
Limit (cost=45287.34..45287.59 rows=100)
-> GroupAggregate (cost=45287.34..98345.67 rows=50000)
-> Nested Loop Left Join (cost=1200.45..12345.67 rows=500000)
-> Nested Loop (cost=1200.45..5000.23 rows=50000)
-> Seq Scan on orders o (cost=500.00..2000.00 rows=50000)
Filter: (created_at > now() - '01:00:00'::interval)
-> Seq Scan on customers c (cost=0.00..0.50 rows=1)
-> Seq Scan on payments p (cost=0.00..10000.00 rows=500000)
-> Seq Scan on order_items oi (cost=0.00..50000.00 rows=500000)
"""
# 3. Send to Claude with request for immediate fixes
# Claude identifies:
# - Sequential scan on orders with created_at filter (should use index)
# - Missing indexes on foreign keys in JOIN conditions
# - GROUP BY on multiple tables causing expensive aggregation
# - Nested loop joins instead of hash joins
# 4. AI-generated optimized query:
optimized = """
SELECT o.id, o.customer_id, o.created_at, c.name, p.id as payment_id,
COUNT(oi.id) as item_count
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
LEFT JOIN (
SELECT DISTINCT order_id, id FROM payments WHERE order_id IS NOT NULL
) p ON o.id = p.order_id
LEFT JOIN order_items oi ON o.id = oi.order_id
WHERE o.created_at > NOW() - INTERVAL '1 hour'
GROUP BY o.id, o.customer_id, o.created_at, c.name, p.id
ORDER BY o.created_at DESC
LIMIT 100;
"""
# 5. Create missing indexes immediately
indexes_to_add = [
"CREATE INDEX idx_orders_created_id ON orders(created_at DESC, id);",
"CREATE INDEX idx_payments_order_id ON payments(order_id);",
"CREATE INDEX idx_order_items_order_id ON order_items(order_id);"
]
This systematic approach turns a panicked outage into a structured response with high-confidence fixes.
Building a Local AI Query Analyzer
Create a tool that captures EXPLAIN output locally and sends it to an AI model for analysis:
import subprocess
import anthropic


class LocalQueryAnalyzer:
    def __init__(self):
        # Reads the API key from the ANTHROPIC_API_KEY environment variable
        self.client = anthropic.Anthropic()

    def analyze_slow_query(self, query, database_url):
        """Run EXPLAIN locally, then ask Claude to analyze the plan.

        The query text and plan are sent to the configured API endpoint, so
        they stay inside your network only if the client uses a local endpoint.
        """
        # Run EXPLAIN locally
        explain_output = self._get_explain(query, database_url)

        # Send the query and plan to Claude for analysis
        analysis = self.client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1500,
            system="You are a PostgreSQL expert. Analyze EXPLAIN output and provide specific optimization recommendations.",
            messages=[{
                "role": "user",
                "content": f"""Analyze this PostgreSQL performance problem:

QUERY:
{query}

EXPLAIN OUTPUT:
{explain_output}

Provide:
1. Root cause (specific plan nodes causing slowness)
2. Top 3 index recommendations with CREATE INDEX statements
3. Query rewrite if needed
4. Expected improvement percentage"""
            }]
        )
        return analysis.content[0].text

    def _get_explain(self, query, database_url):
        """Execute EXPLAIN ANALYZE and capture output."""
        # Pass arguments as a list (no shell) so quotes in the query cannot break the command
        cmd = ["psql", database_url, "-c",
               f"EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) {query}"]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout
Frequently Asked Questions
How do I know if ANALYZE needs to run on my table?
Check the last_analyze timestamp and the number of rows modified since then: SELECT schemaname, relname, last_analyze, n_mod_since_analyze FROM pg_stat_user_tables; If more than 1% of rows changed since the last ANALYZE, run it.
Can I test optimizer changes safely?
Yes. Use SET LOCAL inside a transaction so the setting is discarded when the transaction ends: BEGIN; SET LOCAL random_page_cost = 1.1; EXPLAIN ...; ROLLBACK;
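If you test settings often, generating the transaction wrapper avoids accidentally leaving a setting active. The following sketch only builds the SQL text; the function name is hypothetical, and you would run the resulting script yourself through psql or a driver. Note that EXPLAIN ANALYZE executes the query, and the ROLLBACK undoes any writes along with the setting.

```python
def trial_script(query: str, settings: dict[str, str]) -> str:
    """Wrap a query in BEGIN / SET LOCAL / EXPLAIN / ROLLBACK."""
    lines = ["BEGIN;"]
    for name, value in settings.items():
        # Accept only plain parameter names such as random_page_cost.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected setting name: {name!r}")
        escaped = value.replace("'", "''")
        lines.append(f"SET LOCAL {name} = '{escaped}';")
    statement = query.strip().rstrip(";")
    lines.append(f"EXPLAIN (ANALYZE, BUFFERS) {statement};")
    lines.append("ROLLBACK;")
    return "\n".join(lines)


print(trial_script(
    "SELECT * FROM orders WHERE customer_id = 12345;",
    {"random_page_cost": "1.1", "effective_cache_size": "4GB"},
))
```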
Should I follow every AI optimization suggestion?
No. Benchmark each suggestion in staging first. Some trade-offs (like larger indexes using more cache) have hidden costs.
How often should I update table statistics?
For static tables, rarely. For tables where 5% or more of rows change daily, run ANALYZE daily via a cron job. High-write tables may need hourly ANALYZE during peak times.
Can AI handle our proprietary query patterns?
Yes, if you provide the schema and sample data. More context generally improves AI analysis. Include table sizes and typical data distribution.