OK, the server is not that bad. He could handle the load. We are talking about a Dev server, after all. It runs Dev databases.
The problem is, it runs about 30 Dev. At every given time only 5-6 (or even 1-2) are active. But it gets much worse when the time comes for the automatic statistic gathering. Then the server is knocked down to the knees. All 30 or so DBs start fighting for the disks – and the disks cannot handle the massive requests. Then the ASM get severely overloaded and even sops responding for a few seconds. In some extreme cases some of the databases get terminated:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 4 Serial number: 1
ASMB (ospid: 8290): terminating the instance due to error 15064
One more detail: all those 30 databases are created as a copy of one “master” database. Whatever you put in the master – sooner or later gets populated to all the rest. Every developer or a team can recreate his team DB as a current copy of the master. Or a copy of other team DB, but this happens rarely. So, if out master DB knows the statistics have to be gathered at 22:00, sooner or later all Dev databases start gathering stats at 22:00. This is, in fact, the most resource-intensive thing that can happen to the server.
So hare’s what I’ve done:
create or replace procedure p_scramble_stats_start as
v_start_hour char(1);
v_start_min char(2);
BEGIN
-- disable
DBMS_SCHEDULER.DISABLE(name => '"SYS"."MONDAY_WINDOW"', force => TRUE);
DBMS_SCHEDULER.DISABLE(name => '"SYS"."TUESDAY_WINDOW"', force => TRUE);
DBMS_SCHEDULER.DISABLE(name => '"SYS"."WEDNESDAY_WINDOW"', force => TRUE);
DBMS_SCHEDULER.DISABLE(name => '"SYS"."THURSDAY_WINDOW"', force => TRUE);
DBMS_SCHEDULER.DISABLE(name => '"SYS"."FRIDAY_WINDOW"', force => TRUE);
DBMS_SCHEDULER.DISABLE(name => '"SYS"."SATURDAY_WINDOW"', force => TRUE);
DBMS_SCHEDULER.DISABLE(name => '"SYS"."SUNDAY_WINDOW"', force => TRUE);
-- "choose" the time
select to_char(ora_hash(instance_name, 6)), to_char(ora_hash(instance_name, 59))
into v_start_hour, v_start_min
from v$instance;
-- set
DBMS_SCHEDULER.SET_ATTRIBUTE(name => '"SYS"."MONDAY_WINDOW"',
attribute => 'REPEAT_INTERVAL',
value => 'freq=daily;byday=MON;byhour='||v_start_hour||';byminute='||v_start_min||';bysecond=0');
DBMS_SCHEDULER.SET_ATTRIBUTE(name => '"SYS"."TUESDAY_WINDOW"',
attribute => 'REPEAT_INTERVAL',
value => 'freq=daily;byday=TUE;byhour='||v_start_hour||';byminute='||v_start_min||';bysecond=0');
DBMS_SCHEDULER.SET_ATTRIBUTE(name => '"SYS"."WEDNESDAY_WINDOW"',
attribute => 'REPEAT_INTERVAL',
value => 'freq=daily;byday=WED;byhour='||v_start_hour||';byminute='||v_start_min||';bysecond=0');
DBMS_SCHEDULER.SET_ATTRIBUTE(name => '"SYS"."THURSDAY_WINDOW"',
attribute => 'REPEAT_INTERVAL',
value => 'freq=daily;byday=THU;byhour='||v_start_hour||';byminute='||v_start_min||';bysecond=0');
DBMS_SCHEDULER.SET_ATTRIBUTE(name => '"SYS"."FRIDAY_WINDOW"',
attribute => 'REPEAT_INTERVAL',
value => 'freq=daily;byday=FRI;byhour='||v_start_hour||';byminute='||v_start_min||';bysecond=0');
DBMS_SCHEDULER.SET_ATTRIBUTE(name => '"SYS"."SATURDAY_WINDOW"',
attribute => 'REPEAT_INTERVAL',
value => 'freq=daily;byday=SAT;byhour='||v_start_hour||';byminute='||v_start_min||';bysecond=0');
DBMS_SCHEDULER.SET_ATTRIBUTE(name => '"SYS"."SUNDAY_WINDOW"',
attribute => 'REPEAT_INTERVAL',
value => 'freq=daily;byday=SUN;byhour='||v_start_hour||';byminute='||v_start_min||';bysecond=0');
-- enable
DBMS_SCHEDULER.ENABLE(name => '"SYS"."MONDAY_WINDOW"');
DBMS_SCHEDULER.ENABLE(name => '"SYS"."TUESDAY_WINDOW"');
DBMS_SCHEDULER.ENABLE(name => '"SYS"."WEDNESDAY_WINDOW"');
DBMS_SCHEDULER.ENABLE(name => '"SYS"."THURSDAY_WINDOW"');
DBMS_SCHEDULER.ENABLE(name => '"SYS"."FRIDAY_WINDOW"');
DBMS_SCHEDULER.ENABLE(name => '"SYS"."SATURDAY_WINDOW"');
DBMS_SCHEDULER.ENABLE(name => '"SYS"."SUNDAY_WINDOW"');
END;
This procedure picks a random hour an minute (from 00:00 to 6:59) based in the current instance name. Then sets all maintenance windows (used for statistics gathering) to be open at that time. In our case the statistics gathering needs 5 to 15 minutes, so that’s OK.
I have done a scheduler job on all DBS job, which runs this procedure every evening at 22:00. This way, no matter which database is copied, the stats gathering time is changed the same day.
This solved the problem for us.