This one took me a lot of time to debug. First, the process runs for an hour to an hour and a half when all is fine, and 4-5 hours when it is not. Second, it is part of an ETL process running during the night, so it's hard to watch it run "live". Third, sometimes it runs just fine and other times it is very slow, with no change in the system or the code. And the worst part is, whenever I tried to reproduce the issue, the query always ran as fast as expected. In the end it turned out the problem was not in this statement/process at all, but in another one running at the same time. But let's see the raw facts.
Day 1
The complaint is that one of the steps of the nightly ETL needed a lot more time to complete last night than the nights before. The ETL process runs as one relatively complex scheduler job chain with 30+ steps and complex dependencies. Fortunately the process is very well instrumented, and one can easily see which step started when, which other steps were active at that time, and so on.
Of course, nobody had changed anything: the last change in the ETL was last Wednesday, while the problem first appeared on Saturday (and Sunday, and Monday…). In such situations one can get some hints from the graphs generated by OEM Cloud Control. Here's what I saw for one "fast" and one "slow" execution:
[OEM Cloud Control activity graphs of the "fast" and the "slow" execution]
The symptoms are easy to see: only one session is working on this query, there is no parallel execution. Let's see the execution plan from the last run (last night):
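For reference, a plan like this can be pulled straight from the cursor cache with DBMS_XPLAN. A minimal sketch; &sql_id is a SQL*Plus substitution placeholder for the statement's real SQL_ID:

-- Show the cached plan of the cursor; with a parallel plan the TYPICAL
-- format also prints the PX columns (TQ, IN-OUT, PQ Distrib).
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'TYPICAL'));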
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | | 11M(100)| | | | | | |
| 1 | LOAD AS SELECT | | | | | | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10012 | 445M| 1737G| | 11M (1)| 31:41:37 | | | Q1,12 | P->S | QC (RAND) |
| 4 | HASH JOIN OUTER BUFFERED | | 445M| 1737G| 779M| 11M (1)| 31:41:37 | | | Q1,12 | PCWP | |
| 5 | PX RECEIVE | | 33M| 2738M| | 104K (2)| 00:17:41 | | | Q1,12 | PCWP | |
| 6 | PX SEND HASH | :TQ10010 | 33M| 2738M| | 104K (2)| 00:17:41 | | | Q1,10 | P->P | HASH |
| 7 | VIEW | | 33M| 2738M| | 104K (2)| 00:17:41 | | | Q1,10 | PCWP | |
| 8 | WINDOW SORT | | 33M| 1636M| 1896M| 104K (2)| 00:17:41 | | | Q1,10 | PCWP | |
| 9 | HASH JOIN RIGHT OUTER | | 33M| 1636M| | 5799 (17)| 00:00:59 | | | Q1,10 | PCWP | |
| 10 | PX RECEIVE | | 33M| 409M| | 544 (19)| 00:00:06 | | | Q1,10 | PCWP | |
| 11 | PX SEND HASH | :TQ10007 | 33M| 409M| | 544 (19)| 00:00:06 | | | Q1,07 | P->P | HASH |
| 12 | PX BLOCK ITERATOR | | 33M| 409M| | 544 (19)| 00:00:06 | | | Q1,07 | PCWC | |
| 13 | INDEX FAST FULL SCAN | DIM_xxxx_x_xx_x_xxx_xx_x | 33M| 409M| | 544 (19)| 00:00:06 | | | Q1,07 | PCWP | |
| 14 | PX RECEIVE | | 33M| 1227M| | 5092 (15)| 00:00:52 | | | Q1,10 | PCWP | |
| 15 | PX SEND HASH | :TQ10008 | 33M| 1227M| | 5092 (15)| 00:00:52 | | | Q1,08 | P->P | HASH |
| 16 | HASH JOIN RIGHT OUTER | | 33M| 1227M| | 5092 (15)| 00:00:52 | | | Q1,08 | PCWP | |
| 17 | PX RECEIVE | | 152K| 1786K| | 11 (0)| 00:00:01 | | | Q1,08 | PCWP | |
| 18 | PX SEND BROADCAST | :TQ10000 | 152K| 1786K| | 11 (0)| 00:00:01 | | | Q1,00 | P->P | BROADCAST |
| 19 | PX BLOCK ITERATOR | | 152K| 1786K| | 11 (0)| 00:00:01 | | | Q1,00 | PCWC | |
| 20 | TABLE ACCESS FULL | DIM_xxxxxxxx | 152K| 1786K| | 11 (0)| 00:00:01 | | | Q1,00 | PCWP | |
| 21 | PX BLOCK ITERATOR | | 33M| 849M| | 5015 (14)| 00:00:51 | | | Q1,08 | PCWC | |
| 22 | TABLE ACCESS FULL | DIM_xxxxxxxx | 33M| 849M| | 5015 (14)| 00:00:51 | | | Q1,08 | PCWP | |
| 23 | PX RECEIVE | | 448M| 1713G| | 5689K (1)| 16:03:56 | | | Q1,12 | PCWP | |
| 24 | PX SEND HASH | :TQ10011 | 448M| 1713G| | 5689K (1)| 16:03:56 | | | Q1,11 | P->P | HASH |
| 25 | VIEW | | 448M| 1713G| | 5689K (1)| 16:03:56 | | | Q1,11 | PCWP | |
| 26 | SORT UNIQUE | | 448M| 50G| 54G| 5689K (1)| 16:03:56 | | | Q1,11 | PCWP | |
| 27 | WINDOW SORT | | 448M| 50G| 54G| 5689K (1)| 16:03:56 | | | Q1,11 | PCWP | |
| 28 | PX RECEIVE | | 448M| 50G| | 120K (16)| 00:20:21 | | | Q1,11 | PCWP | |
| 29 | PX SEND HASH | :TQ10009 | 448M| 50G| | 120K (16)| 00:20:21 | | | Q1,09 | P->P | HASH |
| 30 | HASH JOIN | | 448M| 50G| | 120K (16)| 00:20:21 | | | Q1,09 | PCWP | |
| 31 | PX RECEIVE | | 21 | 126 | | 2 (0)| 00:00:01 | | | Q1,09 | PCWP | |
| 32 | PX SEND BROADCAST | :TQ10001 | 21 | 126 | | 2 (0)| 00:00:01 | | | Q1,01 | P->P | BROADCAST |
| 33 | PX BLOCK ITERATOR | | 21 | 126 | | 2 (0)| 00:00:01 | | | Q1,01 | PCWC | |
| 34 | TABLE ACCESS FULL | DIM_xxx_xxx | 21 | 126 | | 2 (0)| 00:00:01 | | | Q1,01 | PCWP | |
| 35 | HASH JOIN | | 448M| 47G| | 119K (16)| 00:20:12 | | | Q1,09 | PCWP | |
| 36 | PX RECEIVE | | 8 | 128 | | 2 (0)| 00:00:01 | | | Q1,09 | PCWP | |
| 37 | PX SEND BROADCAST | :TQ10002 | 8 | 128 | | 2 (0)| 00:00:01 | | | Q1,02 | P->P | BROADCAST |
| 38 | PX BLOCK ITERATOR | | 8 | 128 | | 2 (0)| 00:00:01 | | | Q1,02 | PCWC | |
| 39 | TABLE ACCESS FULL | DIM_xxx_xxxxxx | 8 | 128 | | 2 (0)| 00:00:01 | | | Q1,02 | PCWP | |
| 40 | HASH JOIN | | 448M| 40G| | 118K (15)| 00:20:03 | | | Q1,09 | PCWP | |
| 41 | PX RECEIVE | | 135 | 2160 | | 2 (0)| 00:00:01 | | | Q1,09 | PCWP | |
| 42 | PX SEND BROADCAST | :TQ10003 | 135 | 2160 | | 2 (0)| 00:00:01 | | | Q1,03 | P->P | BROADCAST |
| 43 | PX BLOCK ITERATOR | | 135 | 2160 | | 2 (0)| 00:00:01 | | | Q1,03 | PCWC | |
| 44 | TABLE ACCESS FULL | DIM_xxx_xxxx | 135 | 2160 | | 2 (0)| 00:00:01 | | | Q1,03 | PCWP | |
| 45 | HASH JOIN | | 455M| 34G| | 117K (15)| 00:19:54 | | | Q1,09 | PCWP | |
| 46 | PX RECEIVE | | 55 | 385 | | 2 (0)| 00:00:01 | | | Q1,09 | PCWP | |
| 47 | PX SEND BROADCAST | :TQ10004 | 55 | 385 | | 2 (0)| 00:00:01 | | | Q1,04 | P->P | BROADCAST |
| 48 | PX BLOCK ITERATOR | | 55 | 385 | | 2 (0)| 00:00:01 | | | Q1,04 | PCWC | |
| 49 | TABLE ACCESS FULL | DIM_xxxxxxxx | 55 | 385 | | 2 (0)| 00:00:01 | | | Q1,04 | PCWP | |
| 50 | HASH JOIN | | 463M| 32G| | 116K (14)| 00:19:45 | | | Q1,09 | PCWP | |
| 51 | PX RECEIVE | | 152K| 1786K| | 12 (9)| 00:00:01 | | | Q1,09 | PCWP | |
| 52 | PX SEND BROADCAST | :TQ10005 | 152K| 1786K| | 12 (9)| 00:00:01 | | | Q1,05 | P->P | BROADCAST |
| 53 | PX BLOCK ITERATOR | | 152K| 1786K| | 12 (9)| 00:00:01 | | | Q1,05 | PCWC | |
| 54 | TABLE ACCESS FULL | DIM_xxxxxxxx | 152K| 1786K| | 12 (9)| 00:00:01 | | | Q1,05 | PCWP | |
| 55 | HASH JOIN | | 557M| 32G| | 115K (13)| 00:19:34 | | | Q1,09 | PCWP | |
| 56 | PX RECEIVE | | 2 | 10 | | 2 (0)| 00:00:01 | | | Q1,09 | PCWP | |
| 57 | PX SEND BROADCAST | :TQ10006 | 2 | 10 | | 2 (0)| 00:00:01 | | | Q1,06 | P->P | BROADCAST |
| 58 | PX BLOCK ITERATOR | | 2 | 10 | | 2 (0)| 00:00:01 | | | Q1,06 | PCWC | |
| 59 | TABLE ACCESS FULL | DIM_xxxxxxxxx | 2 | 10 | | 2 (0)| 00:00:01 | | | Q1,06 | PCWP | |
| 60 | PX PARTITION RANGE ALL| | 557M| 30G| | 114K (12)| 00:19:22 | 1 | 4429 | Q1,09 | PCWC | |
| 61 | TABLE ACCESS FULL | FACT_xxxxxx | 557M| 30G| | 114K (12)| 00:19:22 | 1 | 4429 | Q1,09 | PCWP | |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hm, that looks like a lot of parallelism to me. So the optimizer decided to go with a parallel plan, but the execution engine decided to ignore this and execute serially.
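How can one tell the execution really was serial, and not just slow? Two checks I find useful, as a sketch (&sql_id is again a placeholder):

-- Executions performed by PX servers for this cursor:
-- 0 here, with executions > 0, means the parallel plan ran serially.
SELECT sql_id, child_number, executions, px_servers_executions
  FROM v$sql
 WHERE sql_id = '&sql_id';

-- Instance-wide counters of parallel operations and downgrades.
SELECT name, value
  FROM v$sysstat
 WHERE name LIKE 'Parallel operations%';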
I decided to run the statement manually, to see what would happen. Well, what happened was a parallel execution that completed in ~70 minutes. No matter what I tried, as long as the parallel hint is present, it runs in parallel! And yes, the hint is in the query in the ETL packages.
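To be clear about the shape of what I was running: the statement below is purely illustrative (all object names are invented), but the real one looks the same, a direct-path INSERT … SELECT with the parallel hint inline:

-- Illustrative only: names are invented; the real query joins one fact
-- table to several dimensions, as the plan above shows.
-- ALTER SESSION ENABLE PARALLEL DML;  -- needed if the load step itself should run in parallel
INSERT /*+ APPEND PARALLEL(fact_target, 8) */ INTO fact_target
SELECT /*+ PARALLEL(f, 8) */
       f.fact_key, d.dim_value
  FROM fact_source f
  LEFT JOIN dim_source d ON d.dim_key = f.dim_key;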
I'd like to quote here one piece of the extremely valuable advice by Jonathan Lewis, about the parallel hint. This hint is not mandatory. It does not instruct the optimizer "make this run with that amount of parallel processes". No, the hint only… hm… hints: "OK, why don't you check whether a better plan could be produced with that amount of parallelism?" The optimizer has the right to say "No". But this is not the case with this query: the optimizer has created a parallel plan. This is easy to see in v$sql_plan.
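For example (a sketch; &sql_id is a placeholder), the PX row sources are right there in the cached plan:

-- A plan that contains PX operations is a parallel plan,
-- whatever the execution engine ended up doing with it.
SELECT id, operation, options, object_node
  FROM v$sql_plan
 WHERE sql_id = '&sql_id'
   AND operation LIKE 'PX%'
 ORDER BY id;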
In other words, the optimizer, which is guilty "by default" of many performance problems, has done the right thing here. Therefore the standard approaches like creating SQL plan baselines or SQL profiles cannot help us.
I spent a few more hours trying to reproduce the problem. During the tests everything ran fine, so I gave up for the day. Let's see what happens tonight…
Day 2
We have yet another slow execution. This is very annoying. No matter how hard I try to reproduce it manually, it always runs "fast" (relatively fast: an hour and a few minutes). But when it runs as part of the ETL job chain, we get a slow, serial execution. And the most irritating thing is, the plan in the SQL cache is parallel again, but the execution is serial. And no, we do not have Resource Manager limitations during the night.
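For the record, a sketch of how that claim can be double-checked: no Resource Manager plan active at night, and no directive capping the degree of parallelism:

-- Which Resource Manager plan (if any) is currently active?
SELECT name, value FROM v$parameter WHERE name = 'resource_manager_plan';

-- Any directives that limit the degree of parallelism?
SELECT plan, group_or_subplan, parallel_degree_limit_p1
  FROM dba_rsrc_plan_directives
 WHERE parallel_degree_limit_p1 IS NOT NULL;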
Maybe we need more parallel query slaves? Let’s increase the limit from 20 to 32 for tonight:
alter system set parallel_max_servers=32;
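And a sketch of what is worth watching tonight: whether the PX server pool actually gets exhausted while the job chain is running:

-- PX servers currently in use vs. available.
SELECT status, COUNT(*) FROM v$px_process GROUP BY status;

-- Cumulative PX server statistics (busy, idle, started, highwater).
SELECT statistic, value FROM v$pq_sysstat WHERE statistic LIKE 'Servers%';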
Day 3
We have another serial execution 🙁
But it's Friday, and I do not want to touch it and risk urgent interventions during the weekend. It may run slower, but nobody is affected: the ETL completes before sunrise 🙂
(to be continued)