Apr 012013
 

This post is also available in: Bulgarian

(to part 1)

Day N

OK, now I’m ready for the big gun – Metalink notes 729903.1, 400886.1 and 444164.1. In those notes I found a way to trace the parallel execution. It’s not very well explained, so the interpretation of the logs will be real guesswork (have this in mind – i may be wrong).

I’ve asked the developers to enable _px_trace in the ETL code. One emergency release and we’re ready to wait for tonight execution.

Day N+1
F*CK! Last night the execution was fast 🙁

The only change in the code is enabling _px_trace for the session, which starts this specific query (and disabling it after the query completion). Nothing else!

Fast or slow, I can’t wait to take a look at the trace and see if it is readable. Remember, I’m not an expert on reading PX traces.

This is what I found at line 5694 (yes, I’ve scanned the first 5693 lines without finding something related to my case):

2013-02-20 01:53:58.837069*:PX_Granule:kxfr.c@5598:kxfrFlushObjs(): flushed 4436 objects and 0 temp objects
        kxfrialo                                                       [   24520/ 23550]
          best object 0x21f7ab9ce0
            hgt:0 blks:21872380 acp:0 nds:1 thr:4
kxfrialo                                                       [   24520/     0]
          threads requested = 4 (from kxfrComputeThread())
kxfrialo                                                       [   24520/     0]
          adjusted no. threads = 4 (from kxfrAdjustDOP())
kxfrAllocSlaves                                                [   24520/     0]
          DOP trace -- call kxfpgsg to get 4 slaves

Looks like we’ll use parallel 4. Remember, for parallel execution X you need at least 2 x X threads – X producers and X consumers.

2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@9448:kxfpgsg():         reqthreads:4 KXFPLDBL/KXFPADPT/ load balancing:on adaptive:on
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@17464:kxfpiinfo():      inst [cpus:mxslv]
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@17468:kxfpiinfo():      1    [16  :32   ]
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@22218:kxfpGetNumActiveSlaves():
        number of active slaves on the instance: 24,number of active slaves but available to use: 0
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@17785:kxfpclinfo():     inst(load:user:pct:fact:servtarget:queued:started:granted:active:active(free)): aff
2013-02-20 01:53:58.837069*:PX_Messaging:kxfp.c@17791:kxfpclinfo():     1   (30  :1   :100:53  :256   :0     :0      :0      :24    :0     )
          load adapt num servers requested to = 4 (from kxfpAdaptDOP())
kxfpgsg                                                        [   24520/     0]
          getting 2 sets of 4 threads, client parallel query execution flg=0x230

If I read this correctly, 24 slaves are already active. Wtf? Let’s see what follows

          Height=4, Affinity List Size=0, inst_total=1, coord=1
          Insts     1
          Threads   4
kxfpg1sg                                                       [   24520/     0]
          q:0x22ffaeb5e0 req_threads:4 nthreads:4 unit:1 #inst:1 normal
          jStart:0  jEnd:32  jIncr:1 isGV:0  i:0 instno:1 kxfpilthno:4
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P000 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P001 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P002 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P003 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P004 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P005 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P006 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P007 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P008 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P009 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P010 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P011 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P012 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P013 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P014 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P015 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P016 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P017 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P018 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P019 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P020 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P021 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P022 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]
          trying to get slave P023 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24520/     0]

Looks like P000 to P023 are already busy, they do not want to work for us

kxfpg1srv                                                      [   24520/     0]
          trying to get slave P024 on instance 1 for q:0x22ffaeb5e0
          slave P024 is local
          found slave P024 dp:0x21319a0220 flg:0
kxfpcrel                                                       [   24520/     0]
          Creating local slave 24 flg:30
kxfpcre1                                                       [   24520/     0]
          Creating slave 24 flg:30
          free descriptor found dp:0x21319a0e48
          Allocated slave P024 dp:0x21319a0e48 flg:4
          Got It. 1 so far.
kxfpg1srv                                                      [   24530/    10]
          trying to get slave P025 on instance 1 for q:0x22ffaeb5e0
          slave P025 is local
          found slave P025 dp:0x21319a0220 flg:0
kxfpcrel                                                       [   24530/     0]
          Creating local slave 25 flg:30
kxfpcre1                                                       [   24530/     0]
          Creating slave 25 flg:30
          free descriptor found dp:0x21319a0da8
          Allocated slave P025 dp:0x21319a0da8 flg:4
          Got It. 2 so far.
kxfpg1srv                                                      [   24550/    20]
          trying to get slave P026 on instance 1 for q:0x22ffaeb5e0
          slave P026 is local
          found slave P026 dp:0x21319a0220 flg:0
kxfpcrel                                                       [   24550/     0]
          Creating local slave 26 flg:30
kxfpcre1                                                       [   24550/     0]
          Creating slave 26 flg:30
          free descriptor found dp:0x21319a0a70
          Allocated slave P026 dp:0x21319a0a70 flg:4
          Got It. 3 so far.
kxfpg1srv                                                      [   24560/    10]
          trying to get slave P027 on instance 1 for q:0x22ffaeb5e0
          slave P027 is local
          found slave P027 dp:0x21319a0220 flg:0
kxfpcrel                                                       [   24560/     0]
          Creating local slave 27 flg:30
kxfpcre1                                                       [   24560/     0]
          Creating slave 27 flg:30
          free descriptor found dp:0x21319a09d0
          Allocated slave P027 dp:0x21319a09d0 flg:4
          Got It. 4 so far.
kxfpPrepareJoin                                                [   24580/    20]
          grln:3 grlr:3 size:1536 var:486 evntCnt:11 eventSize:466 totSize:2014
kxfpg1sg                                                       [   24580/     0]
          wait reply from qref 0x22fd47e038
kxfpg1sg                                                       [   24580/     0]
          received reply from qref 0x22fd47e038
kxfpg1sg                                                       [   24580/     0]
          wait reply from qref 0x22fd481aa8
kxfpg1sg                                                       [   24580/     0]
          received reply from qref 0x22fd481aa8
kxfpg1sg                                                       [   24580/     0]
          wait reply from qref 0x1f23ebf38
kxfpg1sg                                                       [   24580/     0]
          received reply from qref 0x1f23ebf38
kxfpg1sg                                                       [   24580/     0]
          wait reply from qref 0x1f23ebc90
kxfpg1sg                                                       [   24580/     0]
          received reply from qref 0x1f23ebc90
kxfpg1sg                                                       [   24580/     0]
          got 4 servers (sync), errors=0x0 returning
kxfpg1sg                                                       [   24580/     0]
          q:0x22ffaeb5e0 req_threads:4 nthreads:4 unit:1 #inst:1 normal
          jStart:0  jEnd:32  jIncr:1 isGV:0  i:0 instno:1 kxfpilthno:4

OK, we’ve created 4 slaves so far (for the first set). We need 4 more for the second set…

kxfpg1srv                                                      [   24580/     0]
          trying to get slave P000 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P001 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P002 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P003 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P004 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P005 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P006 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P007 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P008 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P009 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P010 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P011 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P012 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P013 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P014 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P015 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P016 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P017 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P018 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P019 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P020 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P021 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P022 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P023 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P024 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P025 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P026 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]
          trying to get slave P027 on instance 1 for q:0x22ffaeb5e0
kxfpg1srv                                                      [   24580/     0]

P000-P027 seem to be busy already (actually P024-P027 are busy for us)

          trying to get slave P028 on instance 1 for q:0x22ffaeb5e0
          slave P028 is local
          found slave P028 dp:0x21319a0220 flg:0
kxfpcrel                                                       [   24580/     0]
          Creating local slave 28 flg:30
kxfpcre1                                                       [   24580/     0]
          Creating slave 28 flg:30
          free descriptor found dp:0x21319a0698
          Allocated slave P028 dp:0x21319a0698 flg:4
          Got It. 1 so far.
kxfpg1srv                                                      [   24600/    20]
          trying to get slave P029 on instance 1 for q:0x22ffaeb5e0
          slave P029 is local
          found slave P029 dp:0x21319a0220 flg:0
kxfpcrel                                                       [   24600/     0]
          Creating local slave 29 flg:30
kxfpcre1                                                       [   24600/     0]
          Creating slave 29 flg:30
          free descriptor found dp:0x21319a05f8
          Allocated slave P029 dp:0x21319a05f8 flg:4
          Got It. 2 so far.
kxfpg1srv                                                      [   24610/    10]
          trying to get slave P030 on instance 1 for q:0x22ffaeb5e0
          slave P030 is local
          found slave P030 dp:0x21319a0220 flg:0
kxfpcrel                                                       [   24610/     0]
          Creating local slave 30 flg:30
kxfpcre1                                                       [   24610/     0]
          Creating slave 30 flg:30
          free descriptor found dp:0x21319a02c0
          Allocated slave P030 dp:0x21319a02c0 flg:4
          Got It. 3 so far.
kxfpg1srv                                                      [   24630/    20]
          trying to get slave P031 on instance 1 for q:0x22ffaeb5e0
          slave P031 is local
          found slave P031 dp:0x21319a0220 flg:0
kxfpcrel                                                       [   24630/     0]
          Creating local slave 31 flg:30
kxfpcre1                                                       [   24630/     0]
          Creating slave 31 flg:30
          free descriptor found dp:0x21319a0220
          Allocated slave P031 dp:0x21319a0220 flg:4
          Got It. 4 so far.
kxfpPrepareJoin                                                [   24640/    10]
          grln:3 grlr:3 size:1536 var:486 evntCnt:11 eventSize:466 totSize:2014

kxfpg1sg                                                       [   24640/     0]
          wait reply from qref 0x1f23eb740
kxfpg1sg                                                       [   24640/     0]
          received reply from qref 0x1f23eb740
kxfpg1sg                                                       [   24640/     0]
          wait reply from qref 0x1f23eb498
kxfpg1sg                                                       [   24640/     0]
          received reply from qref 0x1f23eb498
kxfpg1sg                                                       [   24640/     0]
          wait reply from qref 0x1f23eb1f0
kxfpg1sg                                                       [   24650/    10]
          received reply from qref 0x1f23eb1f0
kxfpg1sg                                                       [   24650/     0]
          wait reply from qref 0x1f23eaf48
kxfpg1sg                                                       [   24650/     0]
          received reply from qref 0x1f23eaf48
kxfpg1sg                                                       [   24650/     0]
          got 4 servers (sync), errors=0x0 returning

Phew, that was close. We have the other 4 slaves ready – P028-P031. Remember, the limit is set to 32 total.

2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10118:kxfpgsg():        Acquired 8 slaves on 1 instances avg height:4 #set:2 qser:893953
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10128:kxfpgsg():        P024 inst 1 spid 31409
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10128:kxfpgsg():        P025 inst 1 spid 31413
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10128:kxfpgsg():        P026 inst 1 spid 31417
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10128:kxfpgsg():        P027 inst 1 spid 31422
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10137:kxfpgsg():        P028 inst 1 spid 31427
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10137:kxfpgsg():        P029 inst 1 spid 31433
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10137:kxfpgsg():        P030 inst 1 spid 31438
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10137:kxfpgsg():        P031 inst 1 spid 31443
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10144:kxfpgsg():        Instance(servers):
2013-02-20 01:53:58.965839*:PX_Messaging:kxfp.c@10147:kxfpgsg():        inst:1 #slvs:8

Let’s wait and see if we can get a slow execution tonight.

 Posted by at 6:41

Sorry, the comment form is closed at this time.