Optimizer Statistics

Tweak the Optimizer with optimal stats

Extended Stats

Oracle 11g introduced some interesting features concerning extended statistics:

Indicate relations

Sometimes columns show relations, i.e. certain combinations never show up. Imagine a hotel booking agency. They have hotels in many categories, from very simple 1-star up to superior 5- and maybe even 6-star hotels listed. One table of their database could list up how many rooms each hotel has in different room categories. Very unlikely that a 1-star hotel lists a luxury presidential suite – or a 6-star has a dormatory room for backpack-travelers with sleeping bags. That is, two columns hotel_id and room_category have typical relations – and others never showing up. Which will prove relevant to execution plans.

So how to make use of this database model knowledge? Hardly a chance before Oracle 11g – but now we can:

dbms_stats.create_extended_stats(
  'booking_schema',           -- schema owner
  'rooms_available',          -- table name
  '(hotel_id, room_category)' -- two related columns
);

Now gather statistics again – and the optimizer will take the relation into consideration. By the way: The same can be achieved during stats gathering using method_opt => 'FOR COLUMNS (hotel_id, room_category)'.

Expression stats

Free text name columns on a web interface – and the user has LOCKED THE CAPS KEY from time to time? 'Smith' == 'SMITH' == 'smith'? For the optimizer, the three expressions are each unique, and it would gather the statistics accordingly. Hot should it know better? Of course if the DBA tells it:

dbms_stats.create_extended_stats(
  'ownname'  => 'SCOTT',
  'tabname'  => 'EMP',
  'extension => 'LOWER(ENAME)'
);

Drop extended statistics

Did they mess up? Things got worse instead of better? Of course there is a way to get rid of them again. But to do so, the DBA first needs to know which are defined:

SELECT * FROM dba_stat_extensions;

The obvious columns to look at are, of course, owner, table_name, and extension – corresponding to the parameters passed to the create command above. There are some more columns available as well, for the advanced DBA …

Now to drop the unwanted extension: Same syntax as with their creation, just call a different procedure, called dbms_stats.drop_extended_stats.

Gather only stale stats

I fully agree: It makes absolutely no sense to gather stats on unchanged objects again and again (except keeping CPU and disks busy). Ever heard of something like "stale statistics"? Those are the once requiring a refresh. How does the optimizer know? It doesn't – except you placed the objects on monitoring: ALTER <TABLE|INDEX> … MONITORING;.

Now, for a last time full statistics must be gathered (as the object was not monitored in the past). Afterwards, the options parameter can be set to either GATHER STALE or GATHER AUTO – and statistics will be considered stale and thus be recollected only for objects having changed more than 10%.

DBMS_STATS.AUTO_SAMPLE_SIZE

As every DBA should know, statistics can either be CALCULATEd (which is rather time and ressource consuming – and completely unpracticable on many larger databases) or ESTIMATEd. The latter uses a (hopefully representable) sample size to "simulate" a calculation. That's like with elections: Having a good sample, it can be pretty precise even if only a few percent of the voters have been counted. The question just is: What is a good sample size? Chose it too large, and it takes too long. Have it too small, and you can not rely on the results.

To ease the answer for the DBA, Oracle 9i introduced dbms_stats.auto_sample_size to be passed to the estimate_percent parameter of the gather-procedures – promising to find the "best suiting" value for the given object, which also automatically adjusts with its changes.

That did not work out well in some cases before Oracle 11g – but meanwhile they improved to be more reliable. And even faster. More accurate. And, finally, "deterministic" – whatever that is meant to say, as it somehow implicates that before two gather-runs on the same object would result in different stats …

Parallelism: DBMS_STATS.AUTO_DEGREE

Similarly to above mentioned [AUTO_SAMPLE_SIZE, Oracle 11g introduced dbms_stats.auto_degree to determine the best parallel statistics gathering. This can be passed to the dbms_stats.gather_*_stats() procedures degree parameter.

What benefit do the different `method_opts` bring?

There are quite some available. I won't list them all here, and the experiences mentioned on this page are gathered from other DBAs on the web. But at least for the junior DBA, the following indications may prove helpful.

For what objects to gather statistics?

FOR ALL INDEXED COLUMNS
- only gather statistics on indexed columns, ignoring all other
- not generally recommended – but might be useful in special cases
FOR COLUMNS (xx,yy)
- Something very useful also called "extended stats" and explained above
FOR ALL COLUMNS
- consider ALL columns, not just indexed ones
- this is what is used in most cases

How much statistics to gather?

This is where opions differ – a lot: The parameter SIZE which can be passed to above options. Let's walk them slowly upwards:

SIZE 1
- gather minimal statistics (1 bucked; max would be 254)
- does not create histograms
- this was the default with Oracle 9i
SIZE AUTO
- let the optimizer chose the "best size" automatically
- let the optimizer decide which columns are "important", and create histograms on those automatically
- based on workload
- sometimes "suboptimal"
- default with Oracle 10g
SIZE REPEAT
- only re-analyze indexes with existing histograms
- intention: SKEW in larger intervals, then REPEAT in shorter intervals between SKEWs
SIZE SKEWONLY
- examines value distribution for every column of every index, creating histograms where "useful"
- very time intensive
- creates "tons" of data (histograms)

The best strategy can not be given "in general" – but very much depends on the data model. The DBA should also keep in mind where all these statistics are stored: in the data dictionary, which is located in the SYSAUX tablespace. So the more detailed the statistics get, the more this will blow up the SYSAUX tablespace. Also questionable is the performance gain with huge statistics: the full-table-scan may be ommitted on a data table on the cost of a full-table-scan in the data dictionary …

In most cases, the DBA will decide to minimal global statistics (SIZE 1) without any histograms, and more detailed statistics on certain objects where it is justified. Which requires a good knowledge of the data model, of course. But between "best" and "most" somewhere lies "optimal" (concerning the relation between time of evaluation and benefit).

Incremental Statistics

This is a feature new with 11g, and should be quite interesting with e.g. partitioned tables where most partitions stay unchanged, while only a few of them are permanently modified. Instead of generating statistics for the entire table again and again, the idea here is similar to incremental backups: just update what has changed:

dbms_stats.set_table_prefs(
  'SCOTT',
  'EMP',
  'INCREMENTAL',
  'TRUE'
);
dbms_stats.gather_table_stats(
  ownname  => 'SCOTT',
  tabname  => 'EMP',
  partname => 'PART_2012',
  granularity => 'AUTO'
);

This way, global stats are updated incrementally from partition stats.

Sounds too good to be true? Nothing comes for free? Booby-traps granted? OK – so where's the hook in this case? Almost at the same place as above: It again uses the data dictionary to store a lot of additional data. But not only this:

no sample size supported (except for 'AUTO'), so it uses a sample size of 100% and takes some time …
uses SYSAUX to store increments, which may result in really blowing up that tablespace, as it stores quite a lot of data
lots of overhead especially for 'small' tables
needs incremental statistics for ALL partitions of the affected table to start with
doesn't work with locked statistics

So before using this on a production database, better thoroughly test it on the corresponding staging DB!

Reschedule statistic gathering

Should the window Oracle configured for statistic gathering collide with your internal schedules, it's possible to change that. If you cannot change your own timing, simply change Oracle’s. How to achieve that is e.g. described at DBPilot, which is what the following is based upon.

-- find the window group
SELECT CLIENT_NAME,STATUS,CONSUMER_GROUP,CLIENT_TAG,WINDOW_GROUP FROM DBA_AUTOTASK_CLIENT;

-- OK, group for optimizer stats is ROA$AT_WGRP_OS. Now find the windows defined for it:
SELECT * FROM DBA_SCHEDULER_WINGROUP_MEMBERS WHERE WINDOW_GROUP_NAME='ORA$AT_WGRP_OS';

-- We have 7 windows, one for each day of the week. To get details of all defined windows:
-- SELECT WINDOW_NAME,REPEAT_INTERVAL,DURATION,ENABLED FROM DBA_SCHEDULER_WINDOWS;
-- let's restrict that to what we need:
SELECT WINDOW_NAME,REPEAT_INTERVAL,DURATION,ENABLED FROM DBA_SCHEDULER_WINDOWS
WHERE window_name IN (SELECT window_name FROM dba_scheduler_wingroup_members WHERE window_group_name='ORA$AT_WGRP_OS');

-- We see the job runs mon-fri at 10pm, and sat+sun at 4am. All 7 windows are enabled. What can we do?
-- a) adjust the defined windows' REPEAT_INTERVALs
-- b) do not touch existing windows but create our own 7-days in parallel
-- c) like b), but for simplicity just create a single window matching all days

-- temporarily disable stats collection
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>null);
-- verify
SELECT CLIENT_NAME,STATUS,CONSUMER_GROUP,CLIENT_TAG,WINDOW_GROUP FROM DBA_AUTOTASK_CLIENT 
WHERE CLIENT_NAME='auto optimizer stats collection';
-- create a single new scheduler object to run daily at 2am (customer request)
begin
  dbms_scheduler.create_schedule(
    schedule_name=>'GATHER_STATS_SCHEDULE',
    start_date=>cast(sysdate as timestamp) at time zone 'EUROPE/VIENNA',
    --start_date=>systimestamp,
    repeat_interval=>'freq=daily;byday=MON,TUE,WED,THU,FRI,SAT,SUN;byhour=2;byminute=0;bysecond=0',
    end_date=>null,
    comments=>'scheduler window for stats collection: daily at 2am'
  );
end;
/
-- if we later want to delete it:
-- exec dbms_scheduler.drop_schedule(schedule_name=>'GATHER_STATS_SCHEDULE');
-- verify:
SELECT OWNER,SCHEDULE_NAME,REPEAT_INTERVAL FROM DBA_SCHEDULER_SCHEDULES WHERE SCHEDULE_NAME='GATHER_STATS_SCHEDULE';
SELECT START_DATE,END_DATE,COMMENTS FROM DBA_SCHEDULER_SCHEDULES WHERE SCHEDULE_NAME='GATHER_STATS_SCHEDULE';
-- now create a new scheduler window using this object:
begin
  dbms_scheduler.create_window(
    window_name=>'GATHERSTATS_DAILY',
    resource_plan=>'DEFAULT_MAINTENANCE_PLAN',
    schedule_name=>'GATHER_STATS_SCHEDULE',
    duration=>interval '4' hour,
    window_priority=>'LOW',
    comments=>'Scheduler Window to run our nightly gathering of optimizer stats (use INSTEAD of the defaults)'
  );
end;
/
-- to get rid of it again:
-- dbms_scheduler.drop_window(window_name=>'GATHERSTATS_DAILY');
-- verify:
SELECT WINDOW_NAME,REPEAT_INTERVAL,DURATION,SCHEDULE_NAME,ENABLED FROM DBA_SCHEDULER_WINDOWS 
WHERE WINDOW_NAME=UPPER('GATHERSTATS_DAILY');
-- now add it to the default maintenance group:
begin
  dbms_scheduler.add_group_member(
    group_name => 'MAINTENANCE_WINDOW_GROUP',
    member => 'GATHERSTATS_DAILY'
  );
end;
/
-- to remove:
-- dbms_stats.remove_group_member(member => 'GATHERSTATS_DAILY');
-- now associate the AOSC job with our new window and activate it:
begin
  dbms_auto_task_admin.enable(
    client_name => 'auto optimizer stats collection',
    operation => null,
    window_name => 'GATHERSTATS_DAILY'
  );
end;
/
-- verify:
SELECT CLIENT_NAME,STATUS,CONSUMER_GROUP,CLIENT_TAG,WINDOW_GROUP FROM DBA_AUTOTASK_CLIENT 
WHERE CLIENT_NAME='auto optimizer stats collection';
SELECT * from DBA_SCHEDULER_WINGROUP_MEMBERS where WINDOW_GROUP_NAME='ORA$AT_WGRP_OS';
SELECT WINDOW_NAME,WINDOW_NEXT_TIME,AUTOTASK_STATUS STATUS,OPTIMIZER_STATS FROM DBA_AUTOTASK_WINDOW_CLIENTS;
-- all green? Then deactivate the original windows:
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'MONDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'TUESDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'WEDNESDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'THURSDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'FRIDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'SATURDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'SUNDAY_WINDOW');
-- verify:
SELECT WINDOW_NAME,WINDOW_NEXT_TIME,AUTOTASK_STATUS STATUS,OPTIMIZER_STATS FROM DBA_AUTOTASK_WINDOW_CLIENTS;
-- Done.

2021-08-18

IzzySoft