Tweak the Optimizer with optimal stats
Extended Stats
Oracle 11g introduced some interesting features concerning extended statistics:
Indicate relations
Sometimes columns show relations, i.e. certain combinations never show up.
Imagine a hotel booking agency. They have hotels in many categories, from very
simple 1-star up to superior 5- and maybe even 6-star hotels listed. One table
of their database could list up how many rooms each hotel has in different room
categories. Very unlikely that a 1-star hotel lists a luxury presidential suite
– or a 6-star has a dormatory room for backpack-travelers with sleeping bags.
That is, two columns hotel_id and room_category have typical relations –
and others never showing up. Which will prove relevant to execution plans.
So how to make use of this database model knowledge? Hardly a chance before Oracle 11g – but now we can:
dbms_stats.create_extended_stats(
'booking_schema', -- schema owner
'rooms_available', -- table name
'(hotel_id, room_category)' -- two related columns
);
Now gather statistics again – and the optimizer will take the relation into
consideration. By the way: The same can be achieved during stats gathering
using method_opt => 'FOR COLUMNS (hotel_id, room_category)'.
Expression stats
Free text name columns on a web interface – and the user has LOCKED THE CAPS KEY from time to time? 'Smith' == 'SMITH' == 'smith'? For the optimizer, the three expressions are each unique, and it would gather the statistics accordingly. Hot should it know better? Of course if the DBA tells it:
dbms_stats.create_extended_stats(
'ownname' => 'SCOTT',
'tabname' => 'EMP',
'extension => 'LOWER(ENAME)'
);
Drop extended statistics
Did they mess up? Things got worse instead of better? Of course there is a way to get rid of them again. But to do so, the DBA first needs to know which are defined:
SELECT * FROM dba_stat_extensions;
The obvious columns to look at are, of course, owner, table_name, and
extension – corresponding to the parameters passed to the create
command above. There are some more columns available as well, for the advanced
DBA …
Now to drop the unwanted extension: Same syntax as with their creation, just
call a different procedure, called dbms_stats.drop_extended_stats.
Gather only stale stats
I fully agree: It makes absolutely no sense to gather stats on unchanged
objects again and again (except keeping CPU and disks busy). Ever heard of
something like "stale statistics"? Those are the once requiring a refresh. How
does the optimizer know? It doesn't – except you placed the objects on
monitoring: ALTER <TABLE|INDEX> … MONITORING;.
Now, for a last time full statistics must be gathered (as the object was not
monitored in the past). Afterwards, the options parameter can be set to either
GATHER STALE or GATHER AUTO – and statistics will be considered stale and
thus be recollected only for objects having changed more than 10%.
DBMS_STATS.AUTO_SAMPLE_SIZE
As every DBA should know, statistics can either be CALCULATEd (which is rather time and ressource consuming – and completely unpracticable on many larger databases) or ESTIMATEd. The latter uses a (hopefully representable) sample size to "simulate" a calculation. That's like with elections: Having a good sample, it can be pretty precise even if only a few percent of the voters have been counted. The question just is: What is a good sample size? Chose it too large, and it takes too long. Have it too small, and you can not rely on the results.
To ease the answer for the DBA, Oracle 9i introduced
dbms_stats.auto_sample_size to be passed to the estimate_percent parameter
of the gather-procedures – promising to find the "best suiting" value for the
given object, which also automatically adjusts with its changes.
That did not work out well in some cases before Oracle 11g – but meanwhile they improved to be more reliable. And even faster. More accurate. And, finally, "deterministic" – whatever that is meant to say, as it somehow implicates that before two gather-runs on the same object would result in different stats …
Parallelism: DBMS_STATS.AUTO_DEGREE
Similarly to above mentioned [AUTO_SAMPLE_SIZE, Oracle 11g
introduced dbms_stats.auto_degree to determine the best parallel statistics
gathering. This can be passed to the dbms_stats.gather_*_stats() procedures
degree parameter.
What benefit do the different method_opts bring?
There are quite some available. I won't list them all here, and the experiences mentioned on this page are gathered from other DBAs on the web. But at least for the junior DBA, the following indications may prove helpful.
For what objects to gather statistics?
FOR ALL INDEXED COLUMNS- only gather statistics on indexed columns, ignoring all other
- not generally recommended – but might be useful in special cases
FOR COLUMNS (xx,yy)- Something very useful also called "extended stats" and explained above
FOR ALL COLUMNS- consider ALL columns, not just indexed ones
- this is what is used in most cases
How much statistics to gather?
This is where opions differ – a lot: The parameter SIZE which can be passed
to above options. Let's walk them slowly upwards:
SIZE 1- gather minimal statistics (1 bucked; max would be 254)
- does not create histograms
- this was the default with Oracle 9i
SIZE AUTO- let the optimizer chose the "best size" automatically
- let the optimizer decide which columns are "important", and create histograms on those automatically
- based on workload
- sometimes "suboptimal"
- default with Oracle 10g
SIZE REPEAT- only re-analyze indexes with existing histograms
- intention: SKEW in larger intervals, then REPEAT in shorter intervals between SKEWs
SIZE SKEWONLY- examines value distribution for every column of every index, creating histograms where "useful"
- very time intensive
- creates "tons" of data (histograms)
The best strategy can not be given "in general" – but very much depends on the
data model. The DBA should also keep in mind where all these statistics are
stored: in the data dictionary, which is located in the SYSAUX tablespace. So
the more detailed the statistics get, the more this will blow up the SYSAUX
tablespace. Also questionable is the performance gain with huge statistics: the
full-table-scan may be ommitted on a data table on the cost of a
full-table-scan in the data dictionary …
In most cases, the DBA will decide to minimal global statistics (SIZE 1) without any histograms, and more detailed statistics on certain objects where it is justified. Which requires a good knowledge of the data model, of course. But between "best" and "most" somewhere lies "optimal" (concerning the relation between time of evaluation and benefit).
Incremental Statistics
This is a feature new with 11g, and should be quite interesting with e.g. partitioned tables where most partitions stay unchanged, while only a few of them are permanently modified. Instead of generating statistics for the entire table again and again, the idea here is similar to incremental backups: just update what has changed:
dbms_stats.set_table_prefs(
'SCOTT',
'EMP',
'INCREMENTAL',
'TRUE'
);
dbms_stats.gather_table_stats(
ownname => 'SCOTT',
tabname => 'EMP',
partname => 'PART_2012',
granularity => 'AUTO'
);
This way, global stats are updated incrementally from partition stats.
Sounds too good to be true? Nothing comes for free? Booby-traps granted? OK – so where's the hook in this case? Almost at the same place as above: It again uses the data dictionary to store a lot of additional data. But not only this:
- no sample size supported (except for 'AUTO'), so it uses a sample size of 100% and takes some time …
- uses
SYSAUXto store increments, which may result in really blowing up that tablespace, as it stores quite a lot of data - lots of overhead especially for 'small' tables
- needs incremental statistics for ALL partitions of the affected table to start with
- doesn't work with locked statistics
So before using this on a production database, better thoroughly test it on the corresponding staging DB!
Reschedule statistic gathering
Should the window Oracle configured for statistic gathering collide with your internal schedules, it's possible to change that. If you cannot change your own timing, simply change Oracle’s. How to achieve that is e.g. described at DBPilot, which is what the following is based upon.
-- find the window group
SELECT CLIENT_NAME,STATUS,CONSUMER_GROUP,CLIENT_TAG,WINDOW_GROUP FROM DBA_AUTOTASK_CLIENT;
-- OK, group for optimizer stats is ROA$AT_WGRP_OS. Now find the windows defined for it:
SELECT * FROM DBA_SCHEDULER_WINGROUP_MEMBERS WHERE WINDOW_GROUP_NAME='ORA$AT_WGRP_OS';
-- We have 7 windows, one for each day of the week. To get details of all defined windows:
-- SELECT WINDOW_NAME,REPEAT_INTERVAL,DURATION,ENABLED FROM DBA_SCHEDULER_WINDOWS;
-- let's restrict that to what we need:
SELECT WINDOW_NAME,REPEAT_INTERVAL,DURATION,ENABLED FROM DBA_SCHEDULER_WINDOWS
WHERE window_name IN (SELECT window_name FROM dba_scheduler_wingroup_members WHERE window_group_name='ORA$AT_WGRP_OS');
-- We see the job runs mon-fri at 10pm, and sat+sun at 4am. All 7 windows are enabled. What can we do?
-- a) adjust the defined windows' REPEAT_INTERVALs
-- b) do not touch existing windows but create our own 7-days in parallel
-- c) like b), but for simplicity just create a single window matching all days
-- temporarily disable stats collection
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>null);
-- verify
SELECT CLIENT_NAME,STATUS,CONSUMER_GROUP,CLIENT_TAG,WINDOW_GROUP FROM DBA_AUTOTASK_CLIENT
WHERE CLIENT_NAME='auto optimizer stats collection';
-- create a single new scheduler object to run daily at 2am (customer request)
begin
dbms_scheduler.create_schedule(
schedule_name=>'GATHER_STATS_SCHEDULE',
start_date=>cast(sysdate as timestamp) at time zone 'EUROPE/VIENNA',
--start_date=>systimestamp,
repeat_interval=>'freq=daily;byday=MON,TUE,WED,THU,FRI,SAT,SUN;byhour=2;byminute=0;bysecond=0',
end_date=>null,
comments=>'scheduler window for stats collection: daily at 2am'
);
end;
/
-- if we later want to delete it:
-- exec dbms_scheduler.drop_schedule(schedule_name=>'GATHER_STATS_SCHEDULE');
-- verify:
SELECT OWNER,SCHEDULE_NAME,REPEAT_INTERVAL FROM DBA_SCHEDULER_SCHEDULES WHERE SCHEDULE_NAME='GATHER_STATS_SCHEDULE';
SELECT START_DATE,END_DATE,COMMENTS FROM DBA_SCHEDULER_SCHEDULES WHERE SCHEDULE_NAME='GATHER_STATS_SCHEDULE';
-- now create a new scheduler window using this object:
begin
dbms_scheduler.create_window(
window_name=>'GATHERSTATS_DAILY',
resource_plan=>'DEFAULT_MAINTENANCE_PLAN',
schedule_name=>'GATHER_STATS_SCHEDULE',
duration=>interval '4' hour,
window_priority=>'LOW',
comments=>'Scheduler Window to run our nightly gathering of optimizer stats (use INSTEAD of the defaults)'
);
end;
/
-- to get rid of it again:
-- dbms_scheduler.drop_window(window_name=>'GATHERSTATS_DAILY');
-- verify:
SELECT WINDOW_NAME,REPEAT_INTERVAL,DURATION,SCHEDULE_NAME,ENABLED FROM DBA_SCHEDULER_WINDOWS
WHERE WINDOW_NAME=UPPER('GATHERSTATS_DAILY');
-- now add it to the default maintenance group:
begin
dbms_scheduler.add_group_member(
group_name => 'MAINTENANCE_WINDOW_GROUP',
member => 'GATHERSTATS_DAILY'
);
end;
/
-- to remove:
-- dbms_stats.remove_group_member(member => 'GATHERSTATS_DAILY');
-- now associate the AOSC job with our new window and activate it:
begin
dbms_auto_task_admin.enable(
client_name => 'auto optimizer stats collection',
operation => null,
window_name => 'GATHERSTATS_DAILY'
);
end;
/
-- verify:
SELECT CLIENT_NAME,STATUS,CONSUMER_GROUP,CLIENT_TAG,WINDOW_GROUP FROM DBA_AUTOTASK_CLIENT
WHERE CLIENT_NAME='auto optimizer stats collection';
SELECT * from DBA_SCHEDULER_WINGROUP_MEMBERS where WINDOW_GROUP_NAME='ORA$AT_WGRP_OS';
SELECT WINDOW_NAME,WINDOW_NEXT_TIME,AUTOTASK_STATUS STATUS,OPTIMIZER_STATS FROM DBA_AUTOTASK_WINDOW_CLIENTS;
-- all green? Then deactivate the original windows:
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'MONDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'TUESDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'WEDNESDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'THURSDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'FRIDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'SATURDAY_WINDOW');
exec dbms_auto_task_admin.disable(client_name=>'auto optimizer stats collection',operation=>null,window_name=>'SUNDAY_WINDOW');
-- verify:
SELECT WINDOW_NAME,WINDOW_NEXT_TIME,AUTOTASK_STATUS STATUS,OPTIMIZER_STATS FROM DBA_AUTOTASK_WINDOW_CLIENTS;
-- Done.
