Site Logo

IzzySoft


Das inoffizielle Android-HandbuchDas inoffizielle Android-Handbuch
Buy at Amazon for 16,99 €
Das inoffizielle Android-System-Handbuch (Professional Series)Das inoffizielle Android-System-Handbuch (Professional Series)
Buy at Amazon for 13,49 €
Die besten Android-Apps (Action)Die besten Android-Apps (Action)
Buy at Amazon for 2,24 €
As of 2024-10-15 04:07
prices & availability might be subject to change.

Manual Switchover in a Data Guard environment without relying on dgmgrl

If you use Data Guard and want to perform a switch-over, you risk a license violation: If you do that via dgmgrl, your new standby database will end up with an open_mode of READ ONLY WITH APPLY – which requires the Active Data Guard license you might not have. And no, you didn't do anything wrong – that's by design, and (apart from using Clusterware or setting up a complete Grid environment), since 11g, there's no way to avoid that (for details, see Avoid unintentional use of unlicensed Options).

So using Data Guard is fully within the limits of the Enterprise Edition license – as long as you don't perform a switch-over via dgmgrl. You can even configure that for FSFO: on a fail-over, only the later primary will survive (the old primary will be left broken as long as you left auto-reinstate off), so no violation. But no convenient application-transparent switchover with a single command, sorry.

Now what will you do after a fail-over, to move the primary back to its original location? Of course you could simply force another fail-over (e.g. by killing the primary's system monitor (smon) process). But then you'll need to rebuild the standby a second time. While RMAN makes that easy, you'd also risk some data loss, which you certainly do not want.

Hence, do it the „old-fashioned“, manual way:

Contents

Preparation

  1. Shutdown application (ideally) to prevent further data changes
  2. If you've established application services: Stop them and have them cancel sessions which are still active
    On primary: exec dbms_service.disconnect_session('${service_name}',1), exec dbms_service.stop_service('${service_name}')
    (0: POST_TRANSACTION, 1: IMMEDIATE, 2: NO_REPLAY)
  3. Optionally: shutdown listener to prevent new client connections (if your applications always use the service to connect, this is not required)
    On primary only: lsnrctl stop <listener>
  4. Check that no user sessions are remaining
    select username,count(*) from v$session group by username order by 1;
  5. Check DG status with observer: dgmgrl sys@myPrim, show configuration
  6. Stop observer (stop observer)
  7. Last checks on primary & standby (standby commands are commented out, all others are for primary. This is an optional verification for the extra careful; if DG stated "success", these steps should – at least with MaxAvailibility – just confirm):
ALTER SESSION SET nls_date_format='DD-MON-YYYY HH24:MI:SS';
alter system archive log current;
SELECT * FROM (
  SELECT sequence#, first_time, next_time, first_change#, next_change#, applied
    FROM v$archived_log
   WHERE dest_id=2
   ORDER BY sequence# desc
) where rownum <10;

-- compare this with standby:
-- select sequence#,first_time,last_time,first_change#,last_change# from v$standby_log where sequence# > 0;
-- funnily, first_change# on standby will correspond to next_change# on primary
-- (and the sequence# is one ahead there)

col dest_name for a50
select dest_name,status,error from v$archive_dest where dest_name='LOG_ARCHIVE_DEST_2';

col message for a120
select * from (
  select timestamp,message from v$dataguard_status order by timestamp desc
) where rownum < 10;

(note I used alter system archive log current here incidentally: other than alter system switch logfile, which returns the SQL prompt immediately and leaves the log switch to background processes, this only returns the prompt once the job has been completed)

Alternative check method:

select status, gap_status from v$archive_dest_status where dest_id = 2;
-- should show: VALID, NO GAP
select name, value, datum_time from v$dataguard_stats;
-- no "transport lag" or "apply lag" should be shown, "finish time" should be zero
select switchover_status from v$database;
-- should be TO STANDBY (maybe SESSIONS ACTIVE) on primary, TO PRIMARY (or SESSIONS ACTIVE) on standby

Commit the primary

Now we start to commit the primary and convert it into a standby. During commit it should do its last syncs to the current standby.

alter database commit to switchover to standby;
shutdown immediate
startup nomount
alter database mount standby database;
alter database recover managed standby database disconnect from session;
select name,open_mode,database_role from v$database;

Note that alter database commit to switchover to standby should imply the following shutdown immediate – so if on the latter command you get an ORA-01012: not logged on, that's fine: just reconnect in this case (conn / as sysdba) and continue with the next step.

Managed recovery should be started automatically by the broker if we wait a few seconds. When in a hurry, we can try to be faster :) Output from the last command should confirm: open_mode MOUNTED, database_role PHYSICAL STANDBY. Now we're ready to…

Open the standby

alter database commit to switchover to primary;
shutdown immediate
startup
alter system register;
select name,open_mode,database_role from v$database;

Here we expect to see: open_mode READ WRITE, database_role PRIMARY. If you use services for your application connections, your corresponding startup trigger should have taken care for those here, so verify that with the listener on the new primary:

lsnrctl services <listener>

All green? Then go for the

Final steps

If all above went well, we need to bring up what we've brought down in the preparation phase:

  1. Start the application listener on the new standby: lsnrctl start <listener>
  2. optionally: perform an alter system register on the new standby, then check with lsnrctl services <listener> to make sure the application service is NOT shown (the corresponding startup trigger should have taken care for that)
  3. Connect dgmgrl with the database and check the status via show configuration
    If there are any errors, fix them first before continuing! Expected errors: both databases are in the wrong mode (DG wasn't involved in the switch, so its configuration still reflects their original roles). To fix that: edit configuration set protection mode as MaxPerformance;, disable fast_start failover;, remove configuration – then setup your original configuration with the roles reversed.
  4. Start the observer:
    nohup dgmgrl sys@myPrim \
    "start observer file='/<path/to>/fsfo_${ORACLE_SID}.dat'" \
    | tee -a /<path/to>/observer_${ORACLE_SID}.log &
  5. Once the oberver has settled, verify with a final show configuration.

If all went well, you're done now. If it went too well too many times, you might even script some of the steps (as every good lazy admin would do) so the entire show goes faster next time.

After going to all this trouble, you certainly want to avoid the surprise if after a failover, Data Guard suddenly decides to reinstate the broken primary (if it e.g. only disappeared temporarily due to network issues or a host reboot) – and seing your old standby opened READ ONLY WITH APPLY. So make sure to keep Data Guard from doing that:

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverAutoReinstate=FALSE;
2020-11-18