Introduction
A large financial institution was facing severe performance degradation in its daily transaction analysis reports. The primary SQL query, which aggregates customer transaction data and calculates spending patterns, was taking 45-60 minutes to complete, blocking downstream reporting pipelines and delaying crucial business intelligence for management. The query runs multiple times daily, and the delay was directly impacting time-sensitive decision-making for the risk assessment and fraud detection teams.
The issue manifested during peak business hours (9 AM – 12 PM), when transaction volumes were highest, causing the entire reporting infrastructure to bottleneck. Users were receiving timeout errors, and stakeholders were forced to make decisions based on stale data—a critical concern in a banking environment where real-time insights are essential for regulatory compliance and operational efficiency.
Why We Needed to Act
Root Cause Analysis
The problematic query was structured as follows:
SELECT
    c.customer_id,
    c.customer_name,
    c.account_type,
    SUM(t.transaction_amount) as total_spent,
    COUNT(t.transaction_id) as transaction_count,
    AVG(t.transaction_amount) as avg_transaction,
    MAX(t.transaction_date) as last_transaction_date
FROM customers c
LEFT JOIN transactions t ON c.customer_id = t.customer_id
LEFT JOIN transaction_details td ON t.transaction_id = td.transaction_id
WHERE UPPER(SUBSTR(c.account_type, 1, 2)) = 'SA'
  AND TRUNC(t.transaction_date) BETWEEN TO_DATE('2024-01-01', 'YYYY-MM-DD')
                                    AND TRUNC(SYSDATE)
  AND LOWER(c.region) IN ('north', 'south', 'east')
GROUP BY c.customer_id, c.customer_name, c.account_type
HAVING SUM(t.transaction_amount) > 100000
ORDER BY total_spent DESC;
Identified Performance Bottlenecks
- Functions Applied to Indexed Columns: The WHERE clause used UPPER(SUBSTR(c.account_type, 1, 2)) on the account_type column. Even though account_type was indexed, wrapping it in functions prevented the optimizer from using that index, forcing a full table scan of the 2.5 million customer records. (A quick way to confirm this is shown in the sketch after this list.)
- Missing Composite Indexes: The JOIN conditions between the customers and transactions tables lacked proper composite indexing. The transactions table contained 450 million records, and without optimized indexes on the join columns, nested loop joins were executing inefficiently.
- Function on transaction_date in the WHERE Clause: The TRUNC(t.transaction_date) function was applied to a column with a date index, which again prevented index utilization. For a table with 450 million records, this meant millions of unnecessary function calls during the scan phase.
- Suboptimal JOIN Strategy: The query performed a LEFT JOIN to transaction_details even though no columns from that table were being selected, causing unnecessary data retrieval and additional I/O.
- Lack of Aggregation Pre-processing: The query aggregated on the fly across 450 million transaction records without any pre-aggregation layer, causing heavy CPU and memory consumption.
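As referenced in the first item above, the effect of the function-wrapped predicate can be verified by comparing execution plans. The following is a minimal sketch; it assumes an index exists on customers.account_type (under whatever name) and uses the standard EXPLAIN PLAN and DBMS_XPLAN tools:
-- Plan for the function-wrapped predicate: expect TABLE ACCESS FULL on CUSTOMERS
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM customers c
WHERE UPPER(SUBSTR(c.account_type, 1, 2)) = 'SA';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the index-friendly predicate: expect an index range scan instead
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM customers c
WHERE c.account_type LIKE 'SA%';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);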
Impact
- Performance: 45-60 minute execution time during peak hours
- Business Impact: Delayed risk assessments and fraud detection alerts
- User Experience: Timeout errors and failed report generation
- Resource Utilization: The query monopolized database resources, affecting other applications
- Compliance Risk: Real-time transaction analysis capabilities were compromised
How We Solved It
Step 1: Rewrite Predicates to Enable Index Usage
Problem: Functions on indexed columns prevent index utilization.
Solution: Restructure the WHERE clause to allow the optimizer to use existing indexes:
-- Optimized approach: compare against normalized, index-friendly values
SELECT
    c.customer_id,
    c.customer_name,
    c.account_type,
    SUM(t.transaction_amount) as total_spent,
    COUNT(t.transaction_id) as transaction_count,
    AVG(t.transaction_amount) as avg_transaction,
    MAX(t.transaction_date) as last_transaction_date
FROM customers c
INNER JOIN transactions t ON c.customer_id = t.customer_id
WHERE c.account_type LIKE 'SA%'                                  -- removed UPPER and SUBSTR
  AND t.transaction_date >= TO_DATE('2024-01-01', 'YYYY-MM-DD')  -- no TRUNC on the column
  AND t.transaction_date <  TRUNC(SYSDATE) + 1                   -- same window as the original BETWEEN ... TRUNC(SYSDATE)
  AND c.region IN ('North', 'South', 'East')                     -- case-sensitive, normalized data
GROUP BY c.customer_id, c.customer_name, c.account_type
HAVING SUM(t.transaction_amount) > 100000
ORDER BY total_spent DESC;
Key Changes:
- Replaced UPPER(SUBSTR(c.account_type, 1, 2)) = 'SA' with c.account_type LIKE 'SA%', which can use the index
- Replaced the TRUNC(t.transaction_date) comparison with a direct date range, which can use the date index
- Removed the unnecessary LEFT JOIN to transaction_details
- Switched to an INNER JOIN to reduce row cardinality
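If the underlying data cannot be normalized (for example, region values stored in mixed case), a function-based index is one alternative worth considering; the index name below is illustrative, not part of the original design:
-- Hypothetical alternative when the data itself cannot be normalized:
-- a function-based index lets a LOWER(c.region) predicate use an index.
CREATE INDEX idx_customers_region_lower
    ON customers (LOWER(region));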
Step 2: Create Strategic Composite Indexes
Create indexes specifically designed for this query’s access patterns:
-- Composite index for customer filtering and joining
CREATE INDEX idx_customers_region_account
    ON customers (region, account_type, customer_id);

-- Covering composite index for transaction filtering and joining
-- (includes transaction_amount so the query can be answered from the index alone)
CREATE INDEX idx_transactions_customer_date
    ON transactions (customer_id, transaction_date, transaction_amount)
    COMPRESS;
Rationale:
- Order the index columns to match the WHERE clause and JOIN conditions
- Include frequently accessed columns in the index (covering index strategy)
- This allows the optimizer to perform index-only scans without accessing the base table
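Before adding new indexes, it also helps to confirm what already exists. A minimal sketch against the standard Oracle dictionary view USER_IND_COLUMNS:
-- List existing indexed columns on the two tables
SELECT table_name, index_name, column_name, column_position
FROM user_ind_columns
WHERE table_name IN ('CUSTOMERS', 'TRANSACTIONS')
ORDER BY table_name, index_name, column_position;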
Step 3: Implement Pre-aggregation Strategy
Instead of aggregating 450 million rows every query execution, create a materialized view that pre-aggregates data:
-- Create a materialized view for daily aggregations
CREATE MATERIALIZED VIEW mv_customer_daily_metrics AS
SELECT
    customer_id,
    TRUNC(transaction_date) as transaction_day,
    SUM(transaction_amount) as daily_total,
    COUNT(*) as daily_count,
    AVG(transaction_amount) as daily_avg,
    MAX(transaction_amount) as daily_max,
    MIN(transaction_amount) as daily_min
FROM transactions
GROUP BY customer_id, TRUNC(transaction_date);

-- Create an index on the materialized view
CREATE INDEX idx_mv_daily_metrics ON mv_customer_daily_metrics (customer_id);

-- Refresh schedule (e.g., nightly); 'C' requests a complete refresh
BEGIN
    DBMS_MVIEW.REFRESH('mv_customer_daily_metrics', 'C');
END;
/
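To automate the nightly refresh, a scheduler job along the following lines is one option. The job name and 2 AM schedule are assumptions for illustration, not part of the original setup:
-- Hypothetical nightly refresh job; job name and schedule are illustrative
BEGIN
    DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'REFRESH_MV_CUSTOMER_DAILY',
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'BEGIN DBMS_MVIEW.REFRESH(''mv_customer_daily_metrics'', ''C''); END;',
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=DAILY; BYHOUR=2',
        enabled         => TRUE
    );
END;
/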
Then, modify the main query to use pre-aggregated data:
SELECT
    c.customer_id,
    c.customer_name,
    c.account_type,
    SUM(m.daily_total) as total_spent,
    SUM(m.daily_count) as transaction_count,
    ROUND(SUM(m.daily_total) / SUM(m.daily_count), 2) as avg_transaction,
    MAX(m.transaction_day) as last_transaction_date
FROM customers c
INNER JOIN mv_customer_daily_metrics m ON c.customer_id = m.customer_id
WHERE c.account_type LIKE 'SA%'
  AND m.transaction_day >= TO_DATE('2024-01-01', 'YYYY-MM-DD')
  AND m.transaction_day < TRUNC(SYSDATE)
  AND c.region IN ('North', 'South', 'East')
GROUP BY c.customer_id, c.customer_name, c.account_type
HAVING SUM(m.daily_total) > 100000
ORDER BY total_spent DESC;
Benefits:
- Queries pre-aggregated daily data instead of the raw 450M rows
- Reduces the working data volume by roughly 99%, depending on transaction patterns
- Execution drops from scanning millions of rows to scanning thousands of daily buckets
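Before switching the report over, it is worth spot-checking that the pre-aggregated totals match the raw table. A minimal sketch, using a bind variable for a sample customer:
-- Spot check: MV totals should match totals from the raw table for the same customer
SELECT SUM(daily_total) as mv_total
FROM mv_customer_daily_metrics
WHERE customer_id = :sample_customer_id;

SELECT SUM(transaction_amount) as raw_total
FROM transactions
WHERE customer_id = :sample_customer_id;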
Step 4: Update Table Statistics
Ensure the query optimizer has current statistics:
-- Gather fresh statistics for accurate execution planning
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'SCHEMA_NAME',
        tabname          => 'CUSTOMERS',
        estimate_percent => 100,
        method_opt       => 'FOR ALL COLUMNS SIZE AUTO'
    );

    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'SCHEMA_NAME',
        tabname          => 'TRANSACTIONS',
        estimate_percent => 100,
        method_opt       => 'FOR ALL COLUMNS SIZE AUTO'
    );

    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'SCHEMA_NAME',
        tabname          => 'MV_CUSTOMER_DAILY_METRICS',
        estimate_percent => 100
    );
END;
/
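As a follow-up check, the dictionary can confirm when statistics were last gathered and whether Oracle considers them stale. A brief sketch using the standard USER_TAB_STATISTICS view:
-- Confirm statistics freshness
SELECT table_name, num_rows, last_analyzed, stale_stats
FROM user_tab_statistics
WHERE table_name IN ('CUSTOMERS', 'TRANSACTIONS', 'MV_CUSTOMER_DAILY_METRICS');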
Step 5: Validate Execution Plan
Compare execution plans before and after optimization:
-- Before optimization: full table scan, nested loop joins
EXPLAIN PLAN FOR
SELECT ... (original query);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- After optimization: index range scan, hash joins
EXPLAIN PLAN FOR
SELECT ... (optimized query);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Expected Improvements:
- Full table scans replaced by index range scans
- Hash joins instead of nested loop joins
- Significant reduction in rows processed
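For runtime confirmation rather than optimizer estimates, actual row counts can be captured as well. The sketch below uses the standard GATHER_PLAN_STATISTICS hint and DBMS_XPLAN.DISPLAY_CURSOR; the simple count query is only a stand-in for the real report query:
-- Run the statement with runtime statistics enabled, then display actual vs. estimated rows
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
FROM customers c
JOIN mv_customer_daily_metrics m ON c.customer_id = m.customer_id
WHERE c.account_type LIKE 'SA%';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));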
Real-Time Scenario: Implementation Timeline
Day 1-2: Analysis and index creation
- Identified the slow query using AWR (Automatic Workload Repository) reports
- Created composite indexes during off-peak hours
- Updated table statistics
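For reference, the heaviest statements can be pulled from AWR history along these lines. This is a sketch only: DBA_HIST_SQLSTAT requires the Diagnostics Pack license, and FETCH FIRST needs Oracle 12c or later:
-- Top SQL by total elapsed time recorded in AWR snapshots
SELECT sql_id,
       ROUND(SUM(elapsed_time_delta) / 1e6) as total_elapsed_seconds,
       SUM(executions_delta) as executions
FROM dba_hist_sqlstat
GROUP BY sql_id
ORDER BY total_elapsed_seconds DESC
FETCH FIRST 10 ROWS ONLY;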
Day 3-4: Testing with the materialized view
- Created and populated the materialized view
- Set up the nightly refresh job
- Validated accuracy against the original query results
Day 5-7: Deployment and monitoring
- Deployed the optimized query to production
- Set up refresh jobs and monitoring
- Collected performance metrics
Conclusion
Metric | Before | After | Improvement
Execution Time | 45-60 minutes | 8-12 seconds | 99.7% faster
Rows Scanned | 450 million | ~50,000 | 99.98% reduction
CPU Utilization | ~85% | ~5% | 94% reduction
I/O Operations | ~2M disk reads | ~5K disk reads | 99.7% reduction
Query Frequency | 6x daily (5.5 hours total) | 6x daily (1 minute total) | 330x improvement
Business Impact
- Real-Time Insights: Risk assessment and fraud detection teams now receive transaction analysis within seconds instead of hours
- Resource Efficiency: Database CPU and I/O utilization dropped by 94%, freeing resources for other critical operations
- Compliance: Transaction monitoring capabilities now meet regulatory requirements for timely risk detection
- Scalability: The optimized approach scales efficiently; even with transaction volume growth to 1 billion records, query execution remains under 15 seconds
Unique Solution Aspects
- Composite Index Strategy: Rather than creating single-column indexes, we designed composite indexes following the WHERE clause and JOIN order, enabling both filtering and joining without additional table access.
- Materialized View Approach: Pre-aggregating daily metrics reduced the working dataset by 99%, transforming the problem from processing 450M raw records into processing ~30K daily buckets. This is more sustainable than query rewriting alone.
- Predicate Rewriting: By replacing functions with simpler, index-friendly predicates (LIKE instead of UPPER + SUBSTR, direct date ranges instead of TRUNC), we enabled the optimizer to leverage existing indexes without requiring new index creation.
- Statistics Management: Proactive statistics gathering ensured the optimizer made intelligent decisions about join order and access paths.
Lessons Learned
- Functions Are Index Killers: Store normalized data and avoid applying functions to indexed columns
- Pre-aggregation Scales: For queries against massive fact tables, materialized views outperform dynamic aggregation
- Composite Indexes Matter: A few strategic multi-column indexes are more effective than numerous single-column indexes
- Monitoring Is Continuous: Even after optimization, ongoing monitoring ensures sustained performance as data volumes grow (a lightweight check is sketched after this list)
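As one example of that ongoing monitoring, average elapsed time per execution can be tracked from the shared SQL area. A minimal sketch against the standard V$SQL view; matching on the materialized view name is just a convenient filter for statements from this report:
-- Track average elapsed seconds per execution of statements touching the MV
SELECT sql_id,
       executions,
       ROUND(elapsed_time / NULLIF(executions, 0) / 1e6, 2) as avg_seconds
FROM v$sql
WHERE sql_text LIKE '%mv_customer_daily_metrics%'
ORDER BY elapsed_time DESC;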
This optimization transformed a bottleneck operation into a near-real-time analytical capability, demonstrating how systematic query analysis and multi-pronged optimization strategies can deliver dramatic performance improvements in banking environments where timeliness is critical.
Key Takeaway
Slow queries rarely have a single cause. This case study demonstrates that the most effective optimization approach combines:
- Index strategy refinement
- Query logic restructuring
- Pre-aggregation layers
- Statistics management
- Continuous monitoring
For data analysts and engineers working with large banking datasets, these principles are universally applicable and can typically deliver 10-100x performance improvements.