Introduction
A large financial institution was facing severe performance degradation in its daily transaction analysis reports. The primary SQL query, which aggregates customer transaction data and calculates spending patterns, was taking 45-60 minutes to complete, blocking downstream reporting pipelines and delaying crucial business intelligence for management. The query runs multiple times daily, and the delay was directly impacting time-sensitive decision-making for the risk assessment and fraud detection teams.
The issue manifested during peak business hours (9 AM – 12 PM), when transaction volumes were highest, causing the entire reporting infrastructure to bottleneck. Users were receiving timeout errors, and stakeholders were forced to make decisions based on stale data—a critical concern in a banking environment where real-time insights are essential for regulatory compliance and operational efficiency.
Why We Needed to Act
Root Cause Analysis
The problematic query was structured as follows:
SELECT
    c.customer_id,
    c.customer_name,
    c.account_type,
    SUM(t.transaction_amount) as total_spent,
    COUNT(t.transaction_id) as transaction_count,
    AVG(t.transaction_amount) as avg_transaction,
    MAX(t.transaction_date) as last_transaction_date
FROM customers c
LEFT JOIN transactions t ON c.customer_id = t.customer_id
LEFT JOIN transaction_details td ON t.transaction_id = td.transaction_id
WHERE UPPER(SUBSTR(c.account_type, 1, 2)) = 'SA'
  AND TRUNC(t.transaction_date) BETWEEN TO_DATE('2024-01-01', 'YYYY-MM-DD')
                                    AND TRUNC(SYSDATE)
  AND LOWER(c.region) IN ('north', 'south', 'east')
GROUP BY c.customer_id, c.customer_name, c.account_type
HAVING SUM(t.transaction_amount) > 100000
ORDER BY total_spent DESC;
Identified Performance Bottlenecks
- Functions Applied to Indexed Columns: The WHERE clause used UPPER(SUBSTR(c.account_type, 1, 2)) on the account_type column. Even though account_type was indexed, wrapping it in functions prevented the optimizer from using that index, forcing a full table scan of the 2.5 million customer records. (A quick way to confirm this is shown in the sketch after this list.)
- Missing Composite Indexes: The JOIN conditions between the customers and transactions tables lacked proper composite indexing. The transactions table contained 450 million records, and without optimized indexes on the join columns, nested loop joins were executing inefficiently.
- Function on transaction_date in the WHERE Clause: The TRUNC(t.transaction_date) function was applied to a column with a date index, which again prevented index utilization. For a table with 450 million records, this meant millions of unnecessary function calls during the scan phase.
- Suboptimal JOIN Strategy: The query performed a LEFT JOIN to transaction_details even though no columns from that table were being selected, causing unnecessary data retrieval and additional I/O.
- Lack of Aggregation Pre-processing: The query aggregated on the fly across 450 million transaction records without any pre-aggregation layer, causing heavy CPU and memory consumption.
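As referenced in the first item above, the effect of the function-wrapped predicate can be verified by comparing execution plans. The following is a minimal sketch; it assumes an index exists on customers.account_type (under whatever name) and uses the standard EXPLAIN PLAN and DBMS_XPLAN tools:
-- Plan for the function-wrapped predicate: expect TABLE ACCESS FULL on CUSTOMERS
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM customers c
WHERE UPPER(SUBSTR(c.account_type, 1, 2)) = 'SA';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the index-friendly predicate: expect an index range scan instead
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM customers c
WHERE c.account_type LIKE 'SA%';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);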
Impact
- Performance: 45-60 minute execution time during peak hours
- Business Impact: Delayed risk assessments and fraud detection alerts
- User Experience: Timeout errors and failed report generation
- Resource Utilization: The query monopolized database resources, affecting other applications
- Compliance Risk: Real-time transaction analysis capabilities were compromised
How We Solved It
Step 1: Rewrite Predicates to Enable Index Usage
Problem: Functions on indexed columns prevent index utilization.
Solution: Restructure the WHERE clause to allow the optimizer to use existing indexes:
-- Optimized approach: compare against normalized, index-friendly values
SELECT
    c.customer_id,
    c.customer_name,
    c.account_type,
    SUM(t.transaction_amount) as total_spent,
    COUNT(t.transaction_id) as transaction_count,
    AVG(t.transaction_amount) as avg_transaction,
    MAX(t.transaction_date) as last_transaction_date
FROM customers c
INNER JOIN transactions t ON c.customer_id = t.customer_id
WHERE c.account_type LIKE 'SA%'                                  -- removed UPPER and SUBSTR
  AND t.transaction_date >= TO_DATE('2024-01-01', 'YYYY-MM-DD')  -- no TRUNC on the column
  AND t.transaction_date <  TRUNC(SYSDATE) + 1                   -- same window as the original BETWEEN ... TRUNC(SYSDATE)
  AND c.region IN ('North', 'South', 'East')                     -- case-sensitive, normalized data
GROUP BY c.customer_id, c.customer_name, c.account_type
HAVING SUM(t.transaction_amount) > 100000
ORDER BY total_spent DESC;
Key Changes:
- Replaced UPPER(SUBSTR(c.account_type, 1, 2)) = 'SA' with c.account_type LIKE 'SA%', which can use the index
- Replaced the TRUNC(t.transaction_date) comparison with a direct date range, which can use the date index
- Removed the unnecessary LEFT JOIN to transaction_details
- Switched to an INNER JOIN to reduce row cardinality
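If the underlying data cannot be normalized (for example, region values stored in mixed case), a function-based index is one alternative worth considering; the index name below is illustrative, not part of the original design:
-- Hypothetical alternative when the data itself cannot be normalized:
-- a function-based index lets a LOWER(c.region) predicate use an index.
CREATE INDEX idx_customers_region_lower
    ON customers (LOWER(region));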
Step 2: Create Strategic Composite Indexes
Create indexes specifically designed for this query’s access patterns:
-- Composite index for customer filtering and joining
CREATE INDEX idx_customers_region_account
    ON customers (region, account_type, customer_id);

-- Covering composite index for transaction filtering and joining
-- (includes transaction_amount so the query can be answered from the index alone)
CREATE INDEX idx_transactions_customer_date
    ON transactions (customer_id, transaction_date, transaction_amount)
    COMPRESS;
Rationale:
- Order the index columns to match the WHERE clause and JOIN conditions
- Include frequently accessed columns in the index (covering index strategy)
- This allows the optimizer to perform index-only scans without accessing the base table
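Before adding new indexes, it also helps to confirm what already exists. A minimal sketch against the standard Oracle dictionary view USER_IND_COLUMNS:
-- List existing indexed columns on the two tables
SELECT table_name, index_name, column_name, column_position
FROM user_ind_columns
WHERE table_name IN ('CUSTOMERS', 'TRANSACTIONS')
ORDER BY table_name, index_name, column_position;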
Step 3: Implement Pre-aggregation Strategy
Instead of aggregating 450 million rows every query execution, create a materialized view that pre-aggregates data:
-- Create a materialized view for daily aggregations
CREATE MATERIALIZED VIEW mv_customer_daily_metrics AS
SELECT
    customer_id,
    TRUNC(transaction_date) as transaction_day,
    SUM(transaction_amount) as daily_total,
    COUNT(*) as daily_count,
    AVG(transaction_amount) as daily_avg,
    MAX(transaction_amount) as daily_max,
    MIN(transaction_amount) as daily_min
FROM transactions
GROUP BY customer_id, TRUNC(transaction_date);

-- Create an index on the materialized view
CREATE INDEX idx_mv_daily_metrics ON mv_customer_daily_metrics (customer_id);

-- Refresh schedule (e.g., nightly); 'C' requests a complete refresh
BEGIN
    DBMS_MVIEW.REFRESH('mv_customer_daily_metrics', 'C');
END;
/
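To automate the nightly refresh, a scheduler job along the following lines is one option. The job name and 2 AM schedule are assumptions for illustration, not part of the original setup:
-- Hypothetical nightly refresh job; job name and schedule are illustrative
BEGIN
    DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'REFRESH_MV_CUSTOMER_DAILY',
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'BEGIN DBMS_MVIEW.REFRESH(''mv_customer_daily_metrics'', ''C''); END;',
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=DAILY; BYHOUR=2',
        enabled         => TRUE
    );
END;
/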
Then, modify the main query to use pre-aggregated data:
SELECT
    c.customer_id,
    c.customer_name,
    c.account_type,
    SUM(m.daily_total) as total_spent,
    SUM(m.daily_count) as transaction_count,
    ROUND(SUM(m.daily_total) / SUM(m.daily_count), 2) as avg_transaction,
    MAX(m.transaction_day) as last_transaction_date
FROM customers c
INNER JOIN mv_customer_daily_metrics m ON c.customer_id = m.customer_id
WHERE c.account_type LIKE 'SA%'
  AND m.transaction_day >= TO_DATE('2024-01-01', 'YYYY-MM-DD')
  AND m.transaction_day < TRUNC(SYSDATE)
  AND c.region IN ('North', 'South', 'East')
GROUP BY c.customer_id, c.customer_name, c.account_type
HAVING SUM(m.daily_total) > 100000
ORDER BY total_spent DESC;
Benefits:
- Queries pre-aggregated daily data instead of the raw 450M rows
- Reduces the working data volume by roughly 99%, depending on transaction patterns
- Execution drops from scanning millions of rows to scanning thousands of daily buckets
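Before switching the report over, it is worth spot-checking that the pre-aggregated totals match the raw table. A minimal sketch, using a bind variable for a sample customer:
-- Spot check: MV totals should match totals from the raw table for the same customer
SELECT SUM(daily_total) as mv_total
FROM mv_customer_daily_metrics
WHERE customer_id = :sample_customer_id;

SELECT SUM(transaction_amount) as raw_total
FROM transactions
WHERE customer_id = :sample_customer_id;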
Step 4: Update Table Statistics
Ensure the query optimizer has current statistics:
-- Gather fresh statistics for accurate execution planning
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'SCHEMA_NAME',
        tabname          => 'CUSTOMERS',
        estimate_percent => 100,
        method_opt       => 'FOR ALL COLUMNS SIZE AUTO'
    );

    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'SCHEMA_NAME',
        tabname          => 'TRANSACTIONS',
        estimate_percent => 100,
        method_opt       => 'FOR ALL COLUMNS SIZE AUTO'
    );

    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'SCHEMA_NAME',
        tabname          => 'MV_CUSTOMER_DAILY_METRICS',
        estimate_percent => 100
    );
END;
/
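As a follow-up check, the dictionary can confirm when statistics were last gathered and whether Oracle considers them stale. A brief sketch using the standard USER_TAB_STATISTICS view:
-- Confirm statistics freshness
SELECT table_name, num_rows, last_analyzed, stale_stats
FROM user_tab_statistics
WHERE table_name IN ('CUSTOMERS', 'TRANSACTIONS', 'MV_CUSTOMER_DAILY_METRICS');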
Step 5: Validate Execution Plan
Compare execution plans before and after optimization:
-- Before optimization: full table scan, nested loop joins
EXPLAIN PLAN FOR
SELECT ... (original query);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- After optimization: index range scan, hash joins
EXPLAIN PLAN FOR
SELECT ... (optimized query);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Expected Improvements:
- Full table scans replaced by index range scans
- Hash joins instead of nested loop joins
- Significant reduction in rows processed
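For runtime confirmation rather than optimizer estimates, actual row counts can be captured as well. The sketch below uses the standard GATHER_PLAN_STATISTICS hint and DBMS_XPLAN.DISPLAY_CURSOR; the simple count query is only a stand-in for the real report query:
-- Run the statement with runtime statistics enabled, then display actual vs. estimated rows
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
FROM customers c
JOIN mv_customer_daily_metrics m ON c.customer_id = m.customer_id
WHERE c.account_type LIKE 'SA%';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));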
Real-Time Scenario: Implementation Timeline
Day 1-2: Analysis and index creation
- Identified the slow query using AWR (Automatic Workload Repository) reports
- Created composite indexes during off-peak hours
- Updated table statistics
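For reference, the heaviest statements can be pulled from AWR history along these lines. This is a sketch only: DBA_HIST_SQLSTAT requires the Diagnostics Pack license, and FETCH FIRST needs Oracle 12c or later:
-- Top SQL by total elapsed time recorded in AWR snapshots
SELECT sql_id,
       ROUND(SUM(elapsed_time_delta) / 1e6) as total_elapsed_seconds,
       SUM(executions_delta) as executions
FROM dba_hist_sqlstat
GROUP BY sql_id
ORDER BY total_elapsed_seconds DESC
FETCH FIRST 10 ROWS ONLY;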
Day 3-4: Testing with the materialized view
- Created and populated the materialized view
- Set up the nightly refresh job
- Validated accuracy against the original query results
Day 5-7: Deployment and monitoring
- Deployed the optimized query to production
- Set up refresh jobs and monitoring
- Collected performance metrics
Conclusion
Metric | Before | After | Improvement
Execution Time | 45-60 minutes | 8-12 seconds | 99.7% faster
Rows Scanned | 450 million | ~50,000 | 99.98% reduction
CPU Utilization | ~85% | ~5% | 94% reduction
I/O Operations | ~2M disk reads | ~5K disk reads | 99.7% reduction
Query Frequency | 6x daily (5.5 hours total) | 6x daily (1 minute total) | 330x improvement
Business Impact
- Real-Time Insights: Risk assessment and fraud detection teams now receive transaction analysis within seconds instead of hours
- Resource Efficiency: Database CPU and I/O utilization dropped by 94%, freeing resources for other critical operations
- Compliance: Transaction monitoring capabilities now meet regulatory requirements for timely risk detection
- Scalability: The optimized approach scales efficiently; even with transaction volume growth to 1 billion records, query execution remains under 15 seconds
Unique Solution Aspects
- Composite Index Strategy: Rather than creating single-column indexes, we designed composite indexes following the WHERE clause and JOIN order, enabling both filtering and joining without additional table access.
- Materialized View Approach: Pre-aggregating daily metrics reduced the working dataset by 99%, transforming the problem from processing 450M raw records into processing ~30K daily buckets. This is more sustainable than query rewriting alone.
- Predicate Rewriting: By replacing functions with simpler, index-friendly predicates (LIKE instead of UPPER + SUBSTR, direct date ranges instead of TRUNC), we enabled the optimizer to leverage existing indexes without requiring new index creation.
- Statistics Management: Proactive statistics gathering ensured the optimizer made intelligent decisions about join order and access paths.
Lessons Learned
- Functions Are Index Killers: Store normalized data and avoid applying functions to indexed columns
- Pre-aggregation Scales: For queries against massive fact tables, materialized views outperform dynamic aggregation
- Composite Indexes Matter: A few strategic multi-column indexes are more effective than numerous single-column indexes
- Monitoring Is Continuous: Even after optimization, ongoing monitoring ensures sustained performance as data volumes grow (a lightweight check is sketched after this list)
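As one example of that ongoing monitoring, average elapsed time per execution can be tracked from the shared SQL area. A minimal sketch against the standard V$SQL view; matching on the materialized view name is just a convenient filter for statements from this report:
-- Track average elapsed seconds per execution of statements touching the MV
SELECT sql_id,
       executions,
       ROUND(elapsed_time / NULLIF(executions, 0) / 1e6, 2) as avg_seconds
FROM v$sql
WHERE sql_text LIKE '%mv_customer_daily_metrics%'
ORDER BY elapsed_time DESC;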
This optimization transformed a bottleneck operation into a near-real-time analytical capability, demonstrating how systematic query analysis and multi-pronged optimization strategies can deliver dramatic performance improvements in banking environments where timeliness is critical.
Key Takeaway
Slow queries rarely have a single cause. This case study demonstrates that the most effective optimization approach combines:
- Index strategy refinement
- Query logic restructuring
- Pre-aggregation layers
- Statistics management
- Continuous monitoring
For data analysts and engineers working with large banking datasets, these principles are universally applicable and can typically deliver 10-100x performance improvements.