UPDATE GLOBAL INDEXES – testing suite

 Oracle  Comments Off on UPDATE GLOBAL INDEXES – testing suite
Aug 302010
 

In order to reproduce the problems, described in my post; and to test the possible solutions, I made a quick testing package. It has some flaws – for example, does not behave nice if the config table is empty – but it is a “quick and dry” solution. I put it here, so I do not loose it 🙂

The idea is: one simple script starts however sql*plus I want (for example 20 or 40).

for i in `seq 1 20`;
do
  sqlplus /nolog @loader.sql &
done;

Then every instance calls the procedure start_single_process. That’s all

Loading data can be started when you execute (form another, “control” session) the procedure start_load. I would always give some “wait”, even small (0.01 sec). If you do not give “wait” at all, it gets ugly.

You can stop the loading at every moment by executing pause_load. Executing stop_load makes all sql*plus sessions to exit.

create or replace package multi_ses_load is

  c_status_start constant number := 1; -- the LOADER sessions should run
  c_status_pause constant number := 0; -- the LOADER sessions should stop
  c_status_exit  constant number := 2; -- the LOADER sessions should exit
  -- every LOADER process should call this procedure. The procedure will wait for the START signal
  procedure start_single_process;

  -- issies the START signal. Parameters:
  --    part_no: in which partition should we load data
  --    wait: how many seconds to wait between 2 inserts
  --    refresh_iterations: how often to refresh the data from the config table
  procedure start_load(in_part_no            in number,
                       in_wait_sec           in number default 0,
                       in_refresh_iterations in number default 10);

  -- changes in which partition should we load data
  procedure change_part(in_part_no in number);

  -- changes how many seconds to wait between 2 inserts
  procedure change_wait(in_wait_sec in number);

  -- changes how often to refresh the data flrom the config table
  procedure change_refresh(in_refresh_iterations in number);

  -- stops the load process until another START signal
  procedure pause_load;

  -- restarts the load without changing any parameters
  procedure restart_load;

  -- makes the loading processes to exit
  procedure stop_load;

end multi_ses_load;

create or replace package body multi_ses_load is

  c_alert_name constant varchar2(20) := 'SESS_ALERT';

  -- every LOADER process should call this procedure. The procedure will wait for the START signal
  procedure start_single_process is
    v_alert_msg          VARCHAR2(1800);
    v_alert_status       PLS_INTEGER;
    v_part_no            number;
    v_wait_sec           number;
    v_refresh_iterations number;
    v_curr_stat          number;
    v_ses_clob           clob;
    v_my_sid             number;
    v_seed               number;
    procedure read_config_vars is
    begin
      -- reed the config variables form teh config table
      select curr_stat, part_no, wait_sec, refresh_iterations
        into v_curr_stat, v_part_no, v_wait_sec, v_refresh_iterations
        from ses_load_parames;
    end;
  
  begin
    -- init the random generator. we will use the current SID and number of seconds since mignight for seed
    Select Sid
      into v_my_sid
      from v$mystat
     where rownum = 1;
    v_seed := (sysdate - trunc(sysdate)) * 1000 * (24 * 60 * 60) + v_my_sid;
    dbms_random.initialize(v_seed);
    -- read the config vars
    read_config_vars;
    while v_curr_stat in (c_status_start, c_status_pause) -- if the current status is START or PAUSE, we proceed; 
     loop
      if v_curr_stat = c_status_pause then
        -- if the status us PAUSE, wait for the signal
        -- we should register the current session as a consumer for the alert
        dbms_alert.register(c_alert_name);
        -- now we can wait for the alert
        dbms_alert.waitone(c_alert_name, v_alert_msg, v_alert_status);
        -- refresh the config vars - they may be changed
        read_config_vars;
        if v_curr_stat = c_status_exit then
          exit;
        end if;
      end if;
      -- do 'v_refresh_iterations' inserts
      for i in 1 .. v_refresh_iterations loop
        -- insert a row into 'v_part_no' partition
        insert into ...
        values (dbms_random.string('P', 32), ...); -- here we should take care of the partition placement
        commit;
        dbms_lock.sleep(v_wait_sec); -- wiat for 'v_wait_sec' seconds
      end loop;
      -- refresh the config vars after whe have done enough iterations 
      read_config_vars;
    end loop;
  end;

  -- issues the START signal. Parameters:
  --    part_no: in which partition should we load data
  --    wait: how many seconds to wait between 2 inserts
  --    refresh_iterations: how often to refresh the data flrom the config table
  procedure start_load(in_part_no            in number,
                       in_wait_sec           in number default 0,
                       in_refresh_iterations in number default 10) is
  begin
    -- we set the parameters in the config table
    update ses_load_parames
       set curr_stat = c_status_start, part_no = in_part_no, wait_sec = in_wait_sec,
           refresh_iterations = in_refresh_iterations;
    commit;
    -- set teh LOADER processes free 🙂
    dbms_alert.signal(c_alert_name, 'Газ!');
    commit;
  end;

  -- changes in which partition should we load data
  procedure change_part(in_part_no in number) is
  begin
    update ses_load_parames
       set part_no = in_part_no;
    commit;
  end;

  -- changes how many seconds to wait between 2 inserts
  procedure change_wait(in_wait_sec in number) is
  begin
    update ses_load_parames
       set wait_sec = in_wait_sec;
    commit;
  end;

  -- changes how often to refresh the data flrom the config table
  procedure change_refresh(in_refresh_iterations in number) is
  begin
    update ses_load_parames
       set refresh_iterations = in_refresh_iterations;
    commit;
  end;

  -- stops the load process until another START signal
  procedure pause_load is
  begin
    update ses_load_parames
       set curr_stat = c_status_pause;
    commit;
  end;

  -- restarts the load without changing any parameters
  procedure restart_load is
  begin
    -- we set the parameters in the config table
    update ses_load_parames
       set curr_stat = c_status_start;
    commit;
    -- set the LOADER processes free 🙂
    dbms_alert.signal(c_alert_name, 'Аре пак на пангара!');
    commit;
  end;

  -- makes the loading processes to exit
  procedure stop_load is
  begin
    -- put the exit flag in teh table
    update ses_load_parames
       set curr_stat = c_status_exit;
    commit;
    -- is some process is in PAUSE, we should alert him to reread the config vars
    dbms_alert.signal(c_alert_name, 'Аре да си одиме, момчета!');
    commit;
  end;

end multi_ses_load;
 Posted by at 18:24

UPDATE GLOBAL INDEXES

 Oracle  Comments Off on UPDATE GLOBAL INDEXES
Aug 302010
 

(this is relevant for 10.2.)

Let me tell you a story. It is about an application, that “freezes” for a few seconds every hour. In fact, identification of the problem took us quite some time. The symptoms are visible on application server level. So we checked this and that, in the app, network, OS, DB…

We noticed the following: this application uses one very busy table (among the others). The information there is kept for a few hours, then can be deleted; but during those few hours we gather lots of gigabytes. The operations on this table are Insert and Select. The table is partitioned by hour, and every hour a DB job is run to purge old data using

alter table ... truncate partition ... UPDATE GLOBAL INDEXES

This UPDATE GLOBAL INDEXES is put, because the primary key uses a global index. For partitioned tables an index can be unique inly if it is global or if the partition kay column is part of the index.

There is one major problem in the scheme. The strange application “freeze” happens every hour at the very same minute, when the TRUNCATE job is started.

So we looked carefully in the database, holding the mentioned table. There we saw that for a few seconds almost all INSERT-ing sessions INSER wait for cursor: pin S wait on X. There is also some library cache lock. We are talking for 4-5-6 seconds, but quite a lot sessions.

* * *

What happens indeed?

Disclaimer: I might be wring here! If you find some mistake, please write me back.

truncate (or, in our case, alter table ... truncate partition) is a DDL statement. Every DDL statement invalidates all cursors on the object (table), that gets modified (altered). Then another session comes, trying to do an insert (we have tens of INSERTing sessions every second). The first INSERT notices the invalid cursor and tries to parse it. We are talking about parsing a single INSERT if a single row in a table – should complete in milliseconds. But for this parse to happen, the session need to “lock” the definition of the table in the library cache. Otherwise ith may parse the query, come up with a plan, but somebody else might change the structure of the table in the same time and the plan may be wrong. So the firs INSERTing session needs a library cache lock.

But the truncate is still running. The insert have to wait for the truncate to finish. The it will receive the lock on the table structure and complete the parse. It all seems waste of time for a simple truncate – it does not change the table structure at all. But it is a DDL. So the first insert is blocked, and waits.

A few milliseconds later another session comes, trying to execute an insert. So the second session has to check if the insert statement is already parsed. Calculates the hash of the statement, goes to the library cache and, just to be sure, puts one instant lock on the hash. In 10g/11g this is done using a mutex. yes, but the first insert is still trying to parse (in fact, waits to begin parsing). And here we have the rare chance to see a real mutex – an operation that is supposed to be so fast, to happen in an instant. This wait is known as cursor: pin S wait on X.

And the same happens with the next session trying to do an insert. And the next. And next…

* * *

Usually a truncate is quite fast. It should be only a data dictionary update, does not depend on the table size. But here we have some malformed truncate, that, other than lowering HWM, goes to fix some index.

After the completion of the tuncate everything goes to normal. But until then, we have a few seconds downtime. And this is every hour.

Oracle have identified this as a problem in bug 7640597. They even made a patch. The idea if the patch is very simple: do not touch the logic, simply the cursor invalidation is postponed until the very end of the truncate. This way all cursor can be re-parsed.

unfortunately, after releasing the patch, some unwanted “side effects” appeared. So Oracle removed it form Metalink. The general solution is in bug 8441239, fixed in 11.2. But it is not backported to older versions as of the writing of this post.

* * *

So we have a few options:

1. We may avoid doing a TRUNCATE (or other DDL – like alter table exchange partition). Instead, we can do a DELETE. This will add some load to the database; generate lots of redo (and undo); and work slower. But it is a simple DML, does not invalidate any cursor.

2. We may try to do the TRUNCATE faster by creating partitions every 30 minutes, or even less. This way every TRUNCATE will have less work on the index. But this only masks the problem – it is not a solution.

3. We may add the partitioning key column to the primary key. Then we may do the index local, and the TRUNCATE will complete in an instant, as we usually expect for a TRUNCATE. But this is not always possible – depends on the data and the application.

4. We may upgrade to 11.2. But this cannot be done easily – a lots of tests are needed

5. We may request a backport of the patch. But then we will need “a serious business justification”. This may take weeks, even months; and we will be the first ones to use it on production.

The solution depends on the situation. I will show you my testing suite, which I used to evaluate the options

Stay tuned!

 Posted by at 17:35