(to part 1)
Day N
OK, now I’m ready for the big gun – Metalink notes 729903.1, 400886.1 and 444164.1. In those notes I found a way to trace the parallel execution. It’s not very well explained, so the interpretation of the logs will be real guesswork (have this in mind – i may be wrong).
I’ve asked the developers to enable _px_trace
in the ETL code. One emergency release and we’re ready to wait for tonight execution.
Day N+1
F*CK! Last night the execution was fast 🙁
The only change in the code is enabling _px_trace
for the session, which starts this specific query (and disabling it after the query completion). Nothing else!
Fast or slow, I can’t wait to take a look at the trace and see if it is readable. Remember, I’m not an expert on reading PX traces.
This is what I found at line 5694 (yes, I’ve scanned the first 5693 lines without finding something related to my case):
2013-02-20 01:53:58.837069*:PX_Granule:kxfr.c@5598:kxfrFlushObjs(): flushed 4436 objects and 0 temp objects
kxfrialo [ 24520/ 23550]
best object 0x21f7ab9ce0
hgt:0 blks:21872380 acp:0 nds:1 thr:4
kxfrialo [ 24520/ 0]
threads requested = 4 (from kxfrComputeThread())
kxfrialo [ 24520/ 0]
adjusted no. threads = 4 (from kxfrAdjustDOP())
kxfrAllocSlaves [ 24520/ 0]
DOP trace -- call kxfpgsg to get 4 slaves
Looks like we’ll use parallel 4. Remember, for parallel execution X you need at least 2 x X threads – X producers and X consumers.
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@9448:kxfpgsg(): reqthreads:4 KXFPLDBL/KXFPADPT/ load balancing:on adaptive:on
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@17464:kxfpiinfo(): inst [cpus:mxslv]
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@17468:kxfpiinfo(): 1 [16 :32 ]
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@22218:kxfpGetNumActiveSlaves():
number of active slaves on the instance: 24,number of active slaves but available to use: 0
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@17785:kxfpclinfo(): inst(load:user:pct:fact:servtarget:queued:started:granted:active:active(free)): aff
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@17791:kxfpclinfo(): 1 (30 :1 :100:53 :256 :0 :0 :0 :24 :0 )
load adapt num servers requested to = 4 (from kxfpAdaptDOP())
kxfpgsg [ 24520/ 0]
getting 2 sets of 4 threads, client parallel query execution flg=0x230
If I read this correctly, 24 slaves are already active. Wtf? Let’s see what follows
Height=4, Affinity List Size=0, inst_total=1, coord=1
Insts 1
Threads 4
kxfpg1sg [ 24520/ 0]
q:0x22ffaeb5e0 req_threads:4 nthreads:4 unit:1 #inst:1 normal
jStart:0 jEnd:32 jIncr:1 isGV:0 i:0 instno:1 kxfpilthno:4
kxfpg1srv [ 24520/ 0]
trying to get slave P000 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P001 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P002 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P003 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P004 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P005 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P006 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P007 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P008 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P009 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P010 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P011 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P012 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P013 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P014 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P015 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P016 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P017 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P018 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P019 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P020 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P021 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P022 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
trying to get slave P023 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24520/ 0]
Looks like P000 to P023 are already busy, they do not want to work for us
kxfpg1srv [ 24520/ 0]
trying to get slave P024 on instance 1 for q:0x22ffaeb5e0
slave P024 is local
found slave P024 dp:0x21319a0220 flg:0
kxfpcrel [ 24520/ 0]
Creating local slave 24 flg:30
kxfpcre1 [ 24520/ 0]
Creating slave 24 flg:30
free descriptor found dp:0x21319a0e48
Allocated slave P024 dp:0x21319a0e48 flg:4
Got It. 1 so far.
kxfpg1srv [ 24530/ 10]
trying to get slave P025 on instance 1 for q:0x22ffaeb5e0
slave P025 is local
found slave P025 dp:0x21319a0220 flg:0
kxfpcrel [ 24530/ 0]
Creating local slave 25 flg:30
kxfpcre1 [ 24530/ 0]
Creating slave 25 flg:30
free descriptor found dp:0x21319a0da8
Allocated slave P025 dp:0x21319a0da8 flg:4
Got It. 2 so far.
kxfpg1srv [ 24550/ 20]
trying to get slave P026 on instance 1 for q:0x22ffaeb5e0
slave P026 is local
found slave P026 dp:0x21319a0220 flg:0
kxfpcrel [ 24550/ 0]
Creating local slave 26 flg:30
kxfpcre1 [ 24550/ 0]
Creating slave 26 flg:30
free descriptor found dp:0x21319a0a70
Allocated slave P026 dp:0x21319a0a70 flg:4
Got It. 3 so far.
kxfpg1srv [ 24560/ 10]
trying to get slave P027 on instance 1 for q:0x22ffaeb5e0
slave P027 is local
found slave P027 dp:0x21319a0220 flg:0
kxfpcrel [ 24560/ 0]
Creating local slave 27 flg:30
kxfpcre1 [ 24560/ 0]
Creating slave 27 flg:30
free descriptor found dp:0x21319a09d0
Allocated slave P027 dp:0x21319a09d0 flg:4
Got It. 4 so far.
kxfpPrepareJoin [ 24580/ 20]
grln:3 grlr:3 size:1536 var:486 evntCnt:11 eventSize:466 totSize:2014
kxfpg1sg [ 24580/ 0]
wait reply from qref 0x22fd47e038
kxfpg1sg [ 24580/ 0]
received reply from qref 0x22fd47e038
kxfpg1sg [ 24580/ 0]
wait reply from qref 0x22fd481aa8
kxfpg1sg [ 24580/ 0]
received reply from qref 0x22fd481aa8
kxfpg1sg [ 24580/ 0]
wait reply from qref 0x1f23ebf38
kxfpg1sg [ 24580/ 0]
received reply from qref 0x1f23ebf38
kxfpg1sg [ 24580/ 0]
wait reply from qref 0x1f23ebc90
kxfpg1sg [ 24580/ 0]
received reply from qref 0x1f23ebc90
kxfpg1sg [ 24580/ 0]
got 4 servers (sync), errors=0x0 returning
kxfpg1sg [ 24580/ 0]
q:0x22ffaeb5e0 req_threads:4 nthreads:4 unit:1 #inst:1 normal
jStart:0 jEnd:32 jIncr:1 isGV:0 i:0 instno:1 kxfpilthno:4
OK, we’ve created 4 slaves so far (for the first set). We need 4 more for the second set…
kxfpg1srv [ 24580/ 0]
trying to get slave P000 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P001 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P002 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P003 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P004 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P005 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P006 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P007 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P008 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P009 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P010 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P011 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P012 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P013 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P014 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P015 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P016 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P017 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P018 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P019 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P020 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P021 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P022 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P023 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P024 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P025 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P026 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
trying to get slave P027 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv [ 24580/ 0]
P000-P027 seem to be busy already (actually P024-P027 are busy for us)
trying to get slave P028 on instance 1 for q:0x22ffaeb5e0
slave P028 is local
found slave P028 dp:0x21319a0220 flg:0
kxfpcrel [ 24580/ 0]
Creating local slave 28 flg:30
kxfpcre1 [ 24580/ 0]
Creating slave 28 flg:30
free descriptor found dp:0x21319a0698
Allocated slave P028 dp:0x21319a0698 flg:4
Got It. 1 so far.
kxfpg1srv [ 24600/ 20]
trying to get slave P029 on instance 1 for q:0x22ffaeb5e0
slave P029 is local
found slave P029 dp:0x21319a0220 flg:0
kxfpcrel [ 24600/ 0]
Creating local slave 29 flg:30
kxfpcre1 [ 24600/ 0]
Creating slave 29 flg:30
free descriptor found dp:0x21319a05f8
Allocated slave P029 dp:0x21319a05f8 flg:4
Got It. 2 so far.
kxfpg1srv [ 24610/ 10]
trying to get slave P030 on instance 1 for q:0x22ffaeb5e0
slave P030 is local
found slave P030 dp:0x21319a0220 flg:0
kxfpcrel [ 24610/ 0]
Creating local slave 30 flg:30
kxfpcre1 [ 24610/ 0]
Creating slave 30 flg:30
free descriptor found dp:0x21319a02c0
Allocated slave P030 dp:0x21319a02c0 flg:4
Got It. 3 so far.
kxfpg1srv [ 24630/ 20]
trying to get slave P031 on instance 1 for q:0x22ffaeb5e0
slave P031 is local
found slave P031 dp:0x21319a0220 flg:0
kxfpcrel [ 24630/ 0]
Creating local slave 31 flg:30
kxfpcre1 [ 24630/ 0]
Creating slave 31 flg:30
free descriptor found dp:0x21319a0220
Allocated slave P031 dp:0x21319a0220 flg:4
Got It. 4 so far.
kxfpPrepareJoin [ 24640/ 10]
grln:3 grlr:3 size:1536 var:486 evntCnt:11 eventSize:466 totSize:2014
kxfpg1sg [ 24640/ 0]
wait reply from qref 0x1f23eb740
kxfpg1sg [ 24640/ 0]
received reply from qref 0x1f23eb740
kxfpg1sg [ 24640/ 0]
wait reply from qref 0x1f23eb498
kxfpg1sg [ 24640/ 0]
received reply from qref 0x1f23eb498
kxfpg1sg [ 24640/ 0]
wait reply from qref 0x1f23eb1f0
kxfpg1sg [ 24650/ 10]
received reply from qref 0x1f23eb1f0
kxfpg1sg [ 24650/ 0]
wait reply from qref 0x1f23eaf48
kxfpg1sg [ 24650/ 0]
received reply from qref 0x1f23eaf48
kxfpg1sg [ 24650/ 0]
got 4 servers (sync), errors=0x0 returning
Phew, that was close. We have the other 4 slaves ready – P028-P031. Remember, the limit is set to 32 total.
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10118:kxfpgsg(): Acquired 8 slaves on 1 instances avg height:4 #set:2 qser:893953
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10128:kxfpgsg(): P024 inst 1 spid 31409
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10128:kxfpgsg(): P025 inst 1 spid 31413
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10128:kxfpgsg(): P026 inst 1 spid 31417
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10128:kxfpgsg(): P027 inst 1 spid 31422
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10137:kxfpgsg(): P028 inst 1 spid 31427
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10137:kxfpgsg(): P029 inst 1 spid 31433
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10137:kxfpgsg(): P030 inst 1 spid 31438
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10137:kxfpgsg(): P031 inst 1 spid 31443
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10144:kxfpgsg(): Instance(servers):
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10147:kxfpgsg(): inst:1 #slvs:8
Let’s wait and see if we can get a slow execution tonight.