Manual Switchover in a Data Guard environment without relying on dgmgrl
If you use Data Guard and want to perform a switch-over, you risk a license
violation: If you do that via dgmgrl
, your new standby database will end up
with an open_mode
of READ ONLY WITH APPLY
– which requires the Active
Data Guard license you might not have. And no, you didn't do anything wrong –
that's by design, and (apart from using Clusterware or setting up a complete
Grid environment), since 11g, there's no way to avoid that (for details, see
Avoid unintentional use of unlicensed Options).
So using Data Guard is fully within the limits of the Enterprise Edition
license – as long as you don't perform a switch-over via dgmgrl
. You can even
configure that for FSFO: on a fail-over, only the later primary will survive
(the old primary will be left broken as long as you left auto-reinstate
off), so no violation. But no convenient
application-transparent switchover with a single command, sorry.
Now what will you do after a fail-over, to move the primary back to its original
location? Of course you could simply force another fail-over (e.g. by killing the
primary's system monitor (smon
) process). But then you'll need to rebuild the
standby a second time. While RMAN makes that easy, you'd also risk some data loss,
which you certainly do not want.
Hence, do it the „old-fashioned“, manual way:
Contents
Preparation
- Shutdown application (ideally) to prevent further data changes
- If you've established application services: Stop them and have them cancel
sessions which are still active
On primary:exec dbms_service.disconnect_session('${service_name}',1)
,exec dbms_service.stop_service('${service_name}')
(0: POST_TRANSACTION, 1: IMMEDIATE, 2: NO_REPLAY) - Optionally: shutdown listener to prevent new client connections (if your
applications always use the service to connect, this is not required)
On primary only:lsnrctl stop <listener>
- Check that no user sessions are remaining
select username,count(*) from v$session group by username order by 1;
- Check DG status with observer:
dgmgrl sys@myPrim
,show configuration
- Stop observer (
stop observer
) - Last checks on primary & standby (standby commands are commented out, all others are for primary. This is an optional verification for the extra careful; if DG stated "success", these steps should – at least with MaxAvailibility – just confirm):
ALTER SESSION SET nls_date_format='DD-MON-YYYY HH24:MI:SS';
alter system archive log current;
SELECT * FROM (
SELECT sequence#, first_time, next_time, first_change#, next_change#, applied
FROM v$archived_log
WHERE dest_id=2
ORDER BY sequence# desc
) where rownum <10;
-- compare this with standby:
-- select sequence#,first_time,last_time,first_change#,last_change# from v$standby_log where sequence# > 0;
-- funnily, first_change# on standby will correspond to next_change# on primary
-- (and the sequence# is one ahead there)
col dest_name for a50
select dest_name,status,error from v$archive_dest where dest_name='LOG_ARCHIVE_DEST_2';
col message for a120
select * from (
select timestamp,message from v$dataguard_status order by timestamp desc
) where rownum < 10;
(note I used alter system archive log current
here incidentally: other than
alter system switch logfile
, which returns the SQL prompt immediately and leaves
the log switch to background processes, this only returns the prompt once the job
has been completed)
Alternative check method:
select status, gap_status from v$archive_dest_status where dest_id = 2;
-- should show: VALID, NO GAP
select name, value, datum_time from v$dataguard_stats;
-- no "transport lag" or "apply lag" should be shown, "finish time" should be zero
select switchover_status from v$database;
-- should be TO STANDBY (maybe SESSIONS ACTIVE) on primary, TO PRIMARY (or SESSIONS ACTIVE) on standby
Commit the primary
Now we start to commit the primary and convert it into a standby. During commit it should do its last syncs to the current standby.
alter database commit to switchover to standby;
shutdown immediate
startup nomount
alter database mount standby database;
alter database recover managed standby database disconnect from session;
select name,open_mode,database_role from v$database;
Note that alter database commit to switchover to standby
should imply the following
shutdown immediate
– so if on the latter command you get an ORA-01012: not logged on
,
that's fine: just reconnect in this case (conn / as sysdba
) and continue with the
next step.
Managed recovery should be started automatically by the broker if we wait a few seconds. When in a hurry, we can try to be faster :) Output from the last command should confirm: open_mode MOUNTED, database_role PHYSICAL STANDBY. Now we're ready to…
Open the standby
alter database commit to switchover to primary;
shutdown immediate
startup
alter system register;
select name,open_mode,database_role from v$database;
Here we expect to see: open_mode READ WRITE, database_role PRIMARY. If you use services for your application connections, your corresponding startup trigger should have taken care for those here, so verify that with the listener on the new primary:
lsnrctl services <listener>
All green? Then go for the
Final steps
If all above went well, we need to bring up what we've brought down in the preparation phase:
- Start the application listener on the new standby:
lsnrctl start <listener>
- optionally: perform an
alter system register
on the new standby, then check withlsnrctl services <listener>
to make sure the application service is NOT shown (the corresponding startup trigger should have taken care for that) - Connect
dgmgrl
with the database and check the status viashow configuration
If there are any errors, fix them first before continuing! Expected errors: both databases are in the wrong mode (DG wasn't involved in the switch, so its configuration still reflects their original roles). To fix that:edit configuration set protection mode as MaxPerformance;
,disable fast_start failover;
,remove configuration
– then setup your original configuration with the roles reversed. - Start the observer:
nohup dgmgrl sys@myPrim \
"start observer file='/<path/to>/fsfo_${ORACLE_SID}.dat'" \
| tee -a /<path/to>/observer_${ORACLE_SID}.log &
- Once the oberver has settled, verify with a final
show configuration
.
If all went well, you're done now. If it went too well too many times, you might even script some of the steps (as every good lazy admin would do) so the entire show goes faster next time.
After going to all this trouble, you certainly want to avoid the surprise if after a failover, Data Guard suddenly decides to reinstate the broken primary (if it e.g. only disappeared temporarily due to network issues or a host reboot) – and seing your old standby opened READ ONLY WITH APPLY. So make sure to keep Data Guard from doing that:
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverAutoReinstate=FALSE;