Parallel problems – part 1

 Oracle  Comments Off on Parallel problems – part 1
Mar 272013
 

This one took me a lot of time for debugging. First, the process runs for an hour – hour and a half when all is fine and 4-5 hours when it is not. Second, it is part of an ETL process running in the night – it’s hard to see it running “live”. Third, sometimes it runs just fine, other times it is very slow – with no change in the system or code. And teh worst is, when I try to reproduce the issue, the query always runs fast as expected. and in the end it turned out the problem is not on this statement/process, but in one other, which runs in the same time. But let’s see the raw facts.

Day 1
The compleint is that one of the steps of the nightly ETL needed a lot more time to complete last night than then nights before. The ETL process is running as one relatively complex scheduler job chain with 30+ steps and comeplx dependencies. Fortunately the process is very well instrumented an one can easily see what step started when, what other steps were active at that time and so on.

Of course, nobody have changed anything – the last change in the ETL as last Wednesday, while the problem first appeared in Saturday (and Sunday, and Monday…). In such situations one can get some hints gorm teh graph generated by teh OEM Cloud Control. Here;s what i saw for one “fast” and one “slow” execution:

screenshot-1.

The symptoms are easy to see – only one session is working on this query, there’s no parallel execution. Let’s see the execution plan form the last run (last night):

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                        |                          |       |       |       |    11M(100)|          |       |       |        |      |            |
|   1 |  LOAD AS SELECT                         |                          |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR                        |                          |       |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)                  | :TQ10012                 |   445M|  1737G|       |    11M  (1)| 31:41:37 |       |       |  Q1,12 | P->S | QC (RAND)  |
|   4 |     HASH JOIN OUTER BUFFERED            |                          |   445M|  1737G|   779M|    11M  (1)| 31:41:37 |       |       |  Q1,12 | PCWP |            |
|   5 |      PX RECEIVE                         |                          |    33M|  2738M|       |   104K  (2)| 00:17:41 |       |       |  Q1,12 | PCWP |            |
|   6 |       PX SEND HASH                      | :TQ10010                 |    33M|  2738M|       |   104K  (2)| 00:17:41 |       |       |  Q1,10 | P->P | HASH       |
|   7 |        VIEW                             |                          |    33M|  2738M|       |   104K  (2)| 00:17:41 |       |       |  Q1,10 | PCWP |            |
|   8 |         WINDOW SORT                     |                          |    33M|  1636M|  1896M|   104K  (2)| 00:17:41 |       |       |  Q1,10 | PCWP |            |
|   9 |          HASH JOIN RIGHT OUTER          |                          |    33M|  1636M|       |  5799  (17)| 00:00:59 |       |       |  Q1,10 | PCWP |            |
|  10 |           PX RECEIVE                    |                          |    33M|   409M|       |   544  (19)| 00:00:06 |       |       |  Q1,10 | PCWP |            |
|  11 |            PX SEND HASH                 | :TQ10007                 |    33M|   409M|       |   544  (19)| 00:00:06 |       |       |  Q1,07 | P->P | HASH       |
|  12 |             PX BLOCK ITERATOR           |                          |    33M|   409M|       |   544  (19)| 00:00:06 |       |       |  Q1,07 | PCWC |            |
|  13 |              INDEX FAST FULL SCAN       | DIM_xxxx_x_xx_x_xxx_xx_x |    33M|   409M|       |   544  (19)| 00:00:06 |       |       |  Q1,07 | PCWP |            |
|  14 |           PX RECEIVE                    |                          |    33M|  1227M|       |  5092  (15)| 00:00:52 |       |       |  Q1,10 | PCWP |            |
|  15 |            PX SEND HASH                 | :TQ10008                 |    33M|  1227M|       |  5092  (15)| 00:00:52 |       |       |  Q1,08 | P->P | HASH       |
|  16 |             HASH JOIN RIGHT OUTER       |                          |    33M|  1227M|       |  5092  (15)| 00:00:52 |       |       |  Q1,08 | PCWP |            |
|  17 |              PX RECEIVE                 |                          |   152K|  1786K|       |    11   (0)| 00:00:01 |       |       |  Q1,08 | PCWP |            |
|  18 |               PX SEND BROADCAST         | :TQ10000                 |   152K|  1786K|       |    11   (0)| 00:00:01 |       |       |  Q1,00 | P->P | BROADCAST  |
|  19 |                PX BLOCK ITERATOR        |                          |   152K|  1786K|       |    11   (0)| 00:00:01 |       |       |  Q1,00 | PCWC |            |
|  20 |                 TABLE ACCESS FULL       | DIM_xxxxxxxx             |   152K|  1786K|       |    11   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|  21 |              PX BLOCK ITERATOR          |                          |    33M|   849M|       |  5015  (14)| 00:00:51 |       |       |  Q1,08 | PCWC |            |
|  22 |               TABLE ACCESS FULL         | DIM_xxxxxxxx             |    33M|   849M|       |  5015  (14)| 00:00:51 |       |       |  Q1,08 | PCWP |            |
|  23 |      PX RECEIVE                         |                          |   448M|  1713G|       |  5689K  (1)| 16:03:56 |       |       |  Q1,12 | PCWP |            |
|  24 |       PX SEND HASH                      | :TQ10011                 |   448M|  1713G|       |  5689K  (1)| 16:03:56 |       |       |  Q1,11 | P->P | HASH       |
|  25 |        VIEW                             |                          |   448M|  1713G|       |  5689K  (1)| 16:03:56 |       |       |  Q1,11 | PCWP |            |
|  26 |         SORT UNIQUE                     |                          |   448M|    50G|    54G|  5689K  (1)| 16:03:56 |       |       |  Q1,11 | PCWP |            |
|  27 |          WINDOW SORT                    |                          |   448M|    50G|    54G|  5689K  (1)| 16:03:56 |       |       |  Q1,11 | PCWP |            |
|  28 |           PX RECEIVE                    |                          |   448M|    50G|       |   120K (16)| 00:20:21 |       |       |  Q1,11 | PCWP |            |
|  29 |            PX SEND HASH                 | :TQ10009                 |   448M|    50G|       |   120K (16)| 00:20:21 |       |       |  Q1,09 | P->P | HASH       |
|  30 |             HASH JOIN                   |                          |   448M|    50G|       |   120K (16)| 00:20:21 |       |       |  Q1,09 | PCWP |            |
|  31 |              PX RECEIVE                 |                          |    21 |   126 |       |     2   (0)| 00:00:01 |       |       |  Q1,09 | PCWP |            |
|  32 |               PX SEND BROADCAST         | :TQ10001                 |    21 |   126 |       |     2   (0)| 00:00:01 |       |       |  Q1,01 | P->P | BROADCAST  |
|  33 |                PX BLOCK ITERATOR        |                          |    21 |   126 |       |     2   (0)| 00:00:01 |       |       |  Q1,01 | PCWC |            |
|  34 |                 TABLE ACCESS FULL       | DIM_xxx_xxx              |    21 |   126 |       |     2   (0)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|  35 |              HASH JOIN                  |                          |   448M|    47G|       |   119K (16)| 00:20:12 |       |       |  Q1,09 | PCWP |            |
|  36 |               PX RECEIVE                |                          |     8 |   128 |       |     2   (0)| 00:00:01 |       |       |  Q1,09 | PCWP |            |
|  37 |                PX SEND BROADCAST        | :TQ10002                 |     8 |   128 |       |     2   (0)| 00:00:01 |       |       |  Q1,02 | P->P | BROADCAST  |
|  38 |                 PX BLOCK ITERATOR       |                          |     8 |   128 |       |     2   (0)| 00:00:01 |       |       |  Q1,02 | PCWC |            |
|  39 |                  TABLE ACCESS FULL      | DIM_xxx_xxxxxx           |     8 |   128 |       |     2   (0)| 00:00:01 |       |       |  Q1,02 | PCWP |            |
|  40 |               HASH JOIN                 |                          |   448M|    40G|       |   118K (15)| 00:20:03 |       |       |  Q1,09 | PCWP |            |
|  41 |                PX RECEIVE               |                          |   135 |  2160 |       |     2   (0)| 00:00:01 |       |       |  Q1,09 | PCWP |            |
|  42 |                 PX SEND BROADCAST       | :TQ10003                 |   135 |  2160 |       |     2   (0)| 00:00:01 |       |       |  Q1,03 | P->P | BROADCAST  |
|  43 |                  PX BLOCK ITERATOR      |                          |   135 |  2160 |       |     2   (0)| 00:00:01 |       |       |  Q1,03 | PCWC |            |
|  44 |                   TABLE ACCESS FULL     | DIM_xxx_xxxx             |   135 |  2160 |       |     2   (0)| 00:00:01 |       |       |  Q1,03 | PCWP |            |
|  45 |                HASH JOIN                |                          |   455M|    34G|       |   117K (15)| 00:19:54 |       |       |  Q1,09 | PCWP |            |
|  46 |                 PX RECEIVE              |                          |    55 |   385 |       |     2   (0)| 00:00:01 |       |       |  Q1,09 | PCWP |            |
|  47 |                  PX SEND BROADCAST      | :TQ10004                 |    55 |   385 |       |     2   (0)| 00:00:01 |       |       |  Q1,04 | P->P | BROADCAST  |
|  48 |                   PX BLOCK ITERATOR     |                          |    55 |   385 |       |     2   (0)| 00:00:01 |       |       |  Q1,04 | PCWC |            |
|  49 |                    TABLE ACCESS FULL    | DIM_xxxxxxxx             |    55 |   385 |       |     2   (0)| 00:00:01 |       |       |  Q1,04 | PCWP |            |
|  50 |                 HASH JOIN               |                          |   463M|    32G|       |   116K (14)| 00:19:45 |       |       |  Q1,09 | PCWP |            |
|  51 |                  PX RECEIVE             |                          |   152K|  1786K|       |    12   (9)| 00:00:01 |       |       |  Q1,09 | PCWP |            |
|  52 |                   PX SEND BROADCAST     | :TQ10005                 |   152K|  1786K|       |    12   (9)| 00:00:01 |       |       |  Q1,05 | P->P | BROADCAST  |
|  53 |                    PX BLOCK ITERATOR    |                          |   152K|  1786K|       |    12   (9)| 00:00:01 |       |       |  Q1,05 | PCWC |            |
|  54 |                     TABLE ACCESS FULL   | DIM_xxxxxxxx             |   152K|  1786K|       |    12   (9)| 00:00:01 |       |       |  Q1,05 | PCWP |            |
|  55 |                  HASH JOIN              |                          |   557M|    32G|       |   115K (13)| 00:19:34 |       |       |  Q1,09 | PCWP |            |
|  56 |                   PX RECEIVE            |                          |     2 |    10 |       |     2   (0)| 00:00:01 |       |       |  Q1,09 | PCWP |            |
|  57 |                    PX SEND BROADCAST    | :TQ10006                 |     2 |    10 |       |     2   (0)| 00:00:01 |       |       |  Q1,06 | P->P | BROADCAST  |
|  58 |                     PX BLOCK ITERATOR   |                          |     2 |    10 |       |     2   (0)| 00:00:01 |       |       |  Q1,06 | PCWC |            |
|  59 |                      TABLE ACCESS FULL  | DIM_xxxxxxxxx            |     2 |    10 |       |     2   (0)| 00:00:01 |       |       |  Q1,06 | PCWP |            |
|  60 |                   PX PARTITION RANGE ALL|                          |   557M|    30G|       |   114K (12)| 00:19:22 |     1 |  4429 |  Q1,09 | PCWC |            |
|  61 |                    TABLE ACCESS FULL    | FACT_xxxxxx              |   557M|    30G|       |   114K (12)| 00:19:22 |     1 |  4429 |  Q1,09 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Hm, looks a lot of parallel to me. So the optimizer decided to go with a parallel plan, but the execution engine decided to ignore this and do a serial execution..

I decided to run the statement manually, to see what will happen. Well, what happened is a parallel execution compelted in ~70 minutes. o matter what I try, if the hint parallel exists, it runs in parallel! And yes, the hint is in the query in the ETL packages.

I’d like to quote here one of the extremely valuable advices by Jonathan Lewis – it’s about the hint parallel. This hint is not mandatory. It does not instruct the optimizer “make this run with that amount of parallel processes”. No, the hint only… hm… hints “ok, why don’t you check if a better plan could be produced with that amount of parallelism”. The optimizer has the right to say “No”. But this is not the case with this query – the optimizer have created a parallel plan. This is easy to see in v$sql_plan. In other words, the optimizer, which is guilty “by default” for many performance problem, has done the right thing. Therefore the standard approaches like creating baselines or sql profiles cannot help us.

After have spent a few hours trying to reproduce the problem. During the tests everything runs fine. So I gave up for the day. Let’s see what will happen tonight…

Day 2

We have yet another slow execution. This is very annoying. No matter how hard I try to reproduce it manually, it always runs “fast” (relatively fast – for a hour and few minutes). But when it runs as a part of the ETL job chain, we have a slow, serial execution. And the most irritating thing is, the plan in the SQL cache is parallel again, but the execution is serial. And no, we do not have resource management limitations during the night.

Maybe we need more parallel query slaves? Let’s increase the limit from 20 to 32 for tonight:

alter system set parallel_max_servers=32;

Day 3

We have another serial execution 🙁

But it’s Friday. I do not want to touch it, to avoid urgent interventions during the weekend. It may work slower, but nobody is affected – the ETL completes before sunrise 🙂

(to be continued)

 Posted by at 7:36

OBIEE databаse requirements

 Oracle  Comments Off on OBIEE databаse requirements
Mar 222013
 

I am reading the installation guide for Oracle Business Intelligence. And I wonder who is that extremely smart and knowledgeable person that wrote the following in the documentation:

Ensure that your database is not hardened for security purposes. Installing Oracle Business Intelligence against a hardened database is not supported.

It sounds like Oracle Business Intelligence Enterprise Edition Plus – not the cheapest product on the market – is designed to be used only in small shops with less than 3 employees. Because every bigger organization has to take care of the security of the database. Even if it’s a BI/DWH database.

On the other hand, it’s lot easier to write it that way…

 Posted by at 9:41

ORA-16631: Operation requires shutdown of database/instance “”

 Oracle  Comments Off on ORA-16631: Operation requires shutdown of database/instance “”
Mar 052013
 

One more funny bug. We have the following
– 4 databases in the configuration (1 primary and 3 standby)
– we have restarted one of the standbys. Everything works as expected
dgmgrl shows something different:

DGMGRL> show configuration

Configuration - DG

  Protection Mode: MaxAvailability
  Databases:
    primary  - Primary database
    standby1 - Physical standby database
    standby2 - Physical standby database (disabled)
    standby3 - Logical standby database (disabled)

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

(disabled) – WTF? I have to say that we have restarted standby1 – we didn’t even touch the others.

OK, let’s try the easiest solution for every DG problem:

DGMGRL> enable database standby2;

Here comes the surprise:

ORA-16631: Operation requires shutdown of database/instance ""

Errr… which instance do you need restarted?

The only good thing about this crappy error is that it lets you find the following note in MOS:
Metalink note 1258074.1

Ora-16631: Operation Requires Shutdown Of Database Or Instance “” On Physical Standby

Applies to:
Oracle Server – Enterprise Edition – Version: 11.1.0.7 and later [Release: 11.1 and later ]
Information in this document applies to any platform.

Symptoms

ORA-16631: Operation requires shutdown of database/instance “”

Cause
Both the primary and standby were shutdown in an graceful fashion.
Since standby is shutdown before primary, standby is marked as shutdown disabled
standby is restarted first and then the primary.
standby sends a bootstrap request to the primary but since it’s not started yet,
it’s request goes unanswered. When primary is then started and as part of
its bootstrap processing the primary intentionally leaves the standby in a
shutdown disabled state. The FSFO state then transitions to an unsynchronized
state since the standby is not available.

Solution
— on primary —-

dgmgrl /
edit database 'standby db unique name here' set state='ONLINE';
show configuration;
exit

( or )

—- on standby —

sqlplus / as sysdba
shu immediate
startup mount

OK, we have problem with both physical, and logical standby. And we did not restart any of them, nor the primary database. I can only guess that during the restart standby1 sent to primary the “I’m leaving the party” signal. Then primary made a mistake and dropped the connection to all the standbys. After taht, when standby1 has beens started, he sent a bootstrap signal. Primay have added him to the configuration, leaving the others away.
Fortunately the solution worked. Of course I choose the first one. I’m not sure what will happen is I restart standby2, maybe we will loose standby1 again… 🙂

 Posted by at 9:43

PLS-00201: identifier ‘SECRETUSER.ASHVIEWER’ must be declared

 Oracle  Comments Off on PLS-00201: identifier ‘SECRETUSER.ASHVIEWER’ must be declared
Mar 012013
 

It happens when I try to deploy ASH Analytics on one of our production databases. We’ve done this before and there was no issue with it.. It seams this is a “feature” of the latest OEM Cloud control. We’ve patched recently the OMS to 12.1.0.2.1 (PSU Jan’2013) and probably this is the problem.

Usually ASH Analytics is easy to setup. Doug Burns have demonstrated this in his lecture “Falling in Love All Over Again: OEM 12c Performance Page Enhancements” on the latest BGOUG seminar. Generally you need a user with some specific privileges (OEM tells you what is needed). After you create such user, you click on Performance home -> ASH analytics, provle the credentials, and the feature gets deployed by and OEM job. But in the latest release the deployment job fails with

Error while executing the script : oracle.sysman.assistants.common.dbutil.SQLFatalErrorException:
Error creating PL/SQL Object PRVT_AWR_DATA : PACKAGE BODY – PLS-00201: identifier ‘SECRETUSER.ASHVIEWER’ must be declared
Error creating PL/SQL Object PRVT_AWR_DATA : PACKAGE BODY – PL/SQL: Statement ignored
at line number – 2580
Driver SQL Script encountered errors.
Fail

After three failures I knew it was not my fault. And, of course, I could not find anything similar in Metalink nor Google.

Fortunately at the beginning of the job log you can see the scripts that will be executed. Even better: the packages are not wrapped and it is easy the identify the problem. And the problem is the wrong order of execution of the scripts.

The sql scripts can be found on the OEM app server, in plugins/oracle.sysman.db.oms.plugin_12.1.0.3.0/sql/db/latest/instance/. The order in the job is:

prvs_awr_data.sql
prvt_awr_data.sql
prvs_awr_data_cp.sql
prvt_awr_data_cp.sql
dbms_compare_period.sql
prvt_compare_period.sql
eaddm_pkgdef.sql
eaddm_pkgbody.sql
ashviewer_pkgdefs.sql
ashviewer_pkgbodys.sql

The second script fails with the aforementioned error, because it needs the package ashviewer. As you can see, this package is created by the last 2 files. The solution is to run the scripts in the flowing order:

prvs_awr_data.sql

ashviewer_pkgdefs.sql
ashviewer_pkgbodys.sql

prvt_awr_data.sql
prvs_awr_data_cp.sql
prvt_awr_data_cp.sql
dbms_compare_period.sql
prvt_compare_period.sql
eaddm_pkgdef.sql
eaddm_pkgbody.sql

After this ASH Analytics works just fine.

 Posted by at 14:55