Perform Switchover and Failover with Oracle Data Guard Broker

This post guides you to perform Switchover and Failover with Oracle Data Guard Broker.

In this serries:

Switchover

Running this command on primary or standby:

dgmgrl sys/syspassword
DGMGRL> switchover to dg10
 Performing switchover NOW, please wait…

Result will be something like this:

DGMGRL> switchover to dg10
 Performing switchover NOW, please wait…
 New primary database "dg10" is opening…
 Operation requires shutdown of instance "DG9" on database "dg9"
 Shutting down instance "DG9"…
 ORA-01017: invalid username/password; logon denied
 Warning: You are no longer connected to ORACLE.
 Please complete the following steps to finish switchover:
         shut down instance "DG9" of database "dg9"
         start up instance "DG9" of database "dg9"

You may need to restart both dg10 – new active database and dg9 – new standby manually

sqlplus / as sysdba
SQL> shutdown immediate;
SQL> startup;

Now login to dgmgrl and check

DGMGRL> show configuration;
 Configuration - mydgconfig
 Protection Mode: MaxPerformance
   Databases:
     dg10 - Primary database
     dg9  - Physical standby database
 Fast-Start Failover: DISABLED
 Configuration Status:
 SUCCESS

DG10 now new primary database, DG9 became physical standby

DGMGRL> show database dg9
 Database - dg9
 Role:            PHYSICAL STANDBY
   Intended State:  APPLY-ON
   Transport Lag:   0 seconds
   Apply Lag:       0 seconds
   Real Time Query: ON
   Instance(s):
     DG9
 Database Status:
 SUCCESS

Check each database with transport lag and apply lag

DGMGRL> show database dg10
 Database - dg10
 Role:            PRIMARY
   Intended State:  TRANSPORT-ON
   Instance(s):
     DG9
 Database Status:
 SUCCESS

Check dg10 with TRANSPORT-ON

Now switch back

DGMGRL> switchover to dg9;
 Performing switchover NOW, please wait…

If you have some errors while switching, please refer to “Some typical errors” section at the end of this post.

Failover & Reinstate

Run this command on STANDBY because FAILOVER is in case your PRIMARY is damaged.

DGMGRL> FAILOVER to dg10;
 Performing failover NOW, please wait…

In just few seconds, its done. Check again

DGMGRL> FAILOVER to dg10;
 Performing failover NOW, please wait…
 Failover succeeded, new primary is "dg10"
 DGMGRL> show configuration
 Configuration - mydgconfig
 Protection Mode: MaxPerformance
   Databases:
     dg10 - Primary database
     dg9  - Physical standby database (disabled)
       ORA-16661: the standby database needs to be reinstated
 Fast-Start Failover: DISABLED
 Configuration Status:
 SUCCESS

Now if we got dg9 back without datafiles damaged -> just reinstate and add to dg10 the primary

If dg9 is back with datafiles damaged, then we need to restore the datafiles and then reinstate.

Restoring the datafiles is done from RMAN with some commands like this:

rman target /
STARTUP MOUNT;
RESTORE DATAFILE 2;
SELECT FLASHBACK_ON FROM V$DATABASE;
EXIT

Reinstate

Reinstate can be understand as re-register the database to the data guard configuration, so, you need to prepare database restored as above. Using those commands:

DGMGRL> reinstate database dg9;
 Reinstating database "dg9", please wait…

You will have something like this:

DGMGRL> reinstate database dg9;
 Reinstating database "dg9", please wait…
 Operation requires shutdown of instance "DG9" on database "dg9"
 Shutting down instance "DG9"…
 ORA-01017: invalid username/password; logon denied
 Warning: You are no longer connected to ORACLE.
 Please complete the following steps and reissue the REINSTATE command:
         shut down instance "DG9" of database "dg9"
         start up and mount instance "DG9" of database "dg9"

Now you need to restart mount database DG9 manually, you will have result like:

SQL> startup mount;
 ORACLE instance started.
 Total System Global Area 1603411968 bytes
 Fixed Size                  2213776 bytes
 Variable Size             436209776 bytes
 Database Buffers         1157627904 bytes
 Redo Buffers                7360512 bytes
 Database mounted.

Now run the reinstate again at the DG10 – PRIMARY database:

DGMGRL> reinstate database dg9
 Reinstating database "dg9", please wait…

Note: You may have to do it twice (reinstate and restart mount the database).

Finally you will have some successful screens:

DGMGRL> reinstate database dg9
 Reinstating database "dg9", please wait…
 Reinstatement of database "dg9" succeeded

Now check the configuration:

DGMGRL> show configuration
 Configuration - mydgconfig
 Protection Mode: MaxPerformance
   Databases:
     dg10 - Primary database
     dg9  - Physical standby database
       Error: ORA-16766: Redo Apply is stopped
 Fast-Start Failover: DISABLED
 Configuration Status:
 ERROR

Now dg9 is back but with some errors. Check the log we have

The log at:

tail -200f /u01/app/oracle/diag/rdbms/dg9/DG9/trace/alert_DG9.log

Result is:

Additional information: 3
 Recovery Slave PR00 previously exited with exception 38701
 Errors in file /u01/app/oracle/diag/rdbms/dg9/DG9/trace/DG9_mrp0_16080.trc:
 ORA-38701: Flashback database log 18 seq 65 thread 1: "/u01/app/oracle/flash_recovery_area/DG9/flashback/o1_mf_gwo46ctq_.flb"
 ORA-27037: unable to obtain file status
 Linux-x86_64 Error: 2: No such file or directory

Please check at the Some Typical Errors section below for ORA-27037 & ORA-38701.

Some Typical Errors & Solutions

ORA-16610: command “Broker automatic health check” in progress and DGM-17017: unable to determine configuration status

Solution: You need to restart primary and physical standby database manually.

DGMGRL> show configuration;
 Configuration - mydgconfig
 Protection Mode: MaxPerformance
   Databases:
     dg9  - Primary database
     dg10 - Physical standby database
 Fast-Start Failover: DISABLED
 Configuration Status:
 ORA-16610: command "Broker automatic health check" in progress
 DGM-17017: unable to determine configuration status

ORA-16778, ORA-16810 solution

Solution: You may need for a while and retry, the databases is preparing for synchronization.

DGMGRL> show configuration
 Configuration - mydgconfig
 Protection Mode: MaxPerformance
   Databases:
     dg9  - Primary database
       Error: ORA-16778: redo transport error for one or more databases
 dg10 - Physical standby database   
Error: ORA-16810: multiple errors or warnings detected for the database
 Fast-Start Failover: DISABLED
 Configuration Status:
 ERROR

ORA-38760: This database instance failed to turn on flashback database and Slave exiting with ORA-38760 exception

Slave exiting with ORA-38760 exception is a general error which means you can relay on this error and need to find more specific error

Same with ORA-38760. In my case I got more detail error: Error: ORA-16766: Redo Apply is stopped

Error: ORA-16766: Redo Apply is stopped

Solution:

This message mean there was something to you standby database. Check the log at:

tail -200f /u01/app/oracle/diag/rdbms/dg10/DG9/trace/alert_DG9.log

You will see something like:

RFS[6]: Assigned to RFS process 44261
 RFS[6]: Identified database type as 'physical standby': Client is ARCH pid 11634
 RFS[6]: Selected log 8 for thread 1 sequence 159 dbid 1381157600 branch 1023448483
 Fri Nov 15 12:30:14 2019
 Archived Log entry 168 added for thread 1 sequence 159 ID 0x5261d062 dest 1:
 Fri Nov 15 12:30:17 2019
 Media Recovery Log /home/data/flash_recovery_area/DG10/archivelog/2019_11_15/o1_mf_1_159_gwwfz6o4_.arc
 Media Recovery Waiting for thread 1 sequence 160 (in transit)
 Recovery of Online Redo Log: Thread 1 Group 7 Seq 160 Reading mem 0
   Mem# 0: /u01/app/oracle/oradata/DG9/standby_redo01.log
 Fri Nov 15 12:30:26 2019
 RFS[7]: Assigned to RFS process 44263
 RFS[7]: Identified database type as 'physical standby': Client is ARCH pid 11648
 Errors in file /u01/app/oracle/diag/rdbms/dg10/DG9/trace/DG9_rfs_44263.trc:
 ORA-16401: archive log rejected by Remote File Server (RFS)

Now do crosscheck archivelog and full backup on PRIMARY:

rman target /
RMAN> crosscheck archivelog all;
RMAN> BACKUP DATABASE PLUS ARCHIVELOG;

If you have huge datafiles and archivelogs, please consider delete them first using:

DELETE NOPROMPT ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-2';

Now copy all the backed up and archivelog to STANDBY:

scp -r oracle@dg9:/home/data/flash_recovery_area/DG9/archivelog/ /home/data/flash_recovery_area/DG9

scp -r oracle@dg9:/home/data/flash_recovery_area/DG9/backupset/ /home/data/flash_recovery_area/DG9

Next, crosscheck archivelog and backup on STANDBY

rman target /
RMAN> crosscheck archivelog all;
RMAN> BACKUP DATABASE PLUS ARCHIVELOG;

Now stop/start STANDBY database manually and retry with DGMGRL.

ORA-27037 & ORA-38701 Solution

 ORA-38701: Flashback database log 18 seq 65 thread 1: "/u01/app/oracle/flash_recovery_area/DG9/flashback/o1_mf_gwo46ctq_.flb"
 ORA-27037: unable to obtain file status
 

These errors happen when you have some trouble with your flashback even if you restore point list is empty:

RMAN> list restore point all;
 using target database control file instead of recovery catalog
 SCN              RSP Time  Type       Time      Name
 
 RMAN>

Solution:

Do exactly as follow:

  1. Shutdown the database immediate
  2. startup mount
  3. turn off flashback: alter database flashback off;
  4. Open the database (read only if on standby)
  5. shutdown the database immediate (again)
  6. startup mount
  7. Turn on flashback (alter database flashback on)
  8. Open the database (read only if on standby)
  9. Check dgmgrl again