This post guides you to perform Switchover and Failover with Oracle Data Guard Broker.
In this serries:
- Install Oracle Linux 5,6,7 from USB with ISO file
- Install Oracle 11g R2 Database On Oracle Linux 6.10 with EM (Enterprise Manager)
- Install Oracle 11g R2 Database On Oracle Linux 7.7 with EM (Enterprise Manager)
- How to configure X11 forwarding to install Oracle databases
- Perform Switchover and Failover with Oracle Data Guard Broker
Switchover
Running this command on primary or standby:
dgmgrl sys/syspassword DGMGRL> switchover to dg10 Performing switchover NOW, please wait…
Result will be something like this:
DGMGRL> switchover to dg10 Performing switchover NOW, please wait… New primary database "dg10" is opening… Operation requires shutdown of instance "DG9" on database "dg9" Shutting down instance "DG9"… ORA-01017: invalid username/password; logon denied Warning: You are no longer connected to ORACLE. Please complete the following steps to finish switchover: shut down instance "DG9" of database "dg9" start up instance "DG9" of database "dg9"
You may need to restart both dg10 – new active database and dg9 – new standby manually
sqlplus / as sysdba SQL> shutdown immediate; SQL> startup;
Now login to dgmgrl and check
DGMGRL> show configuration; Configuration - mydgconfig Protection Mode: MaxPerformance Databases: dg10 - Primary database dg9 - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS
DG10 now new primary database, DG9 became physical standby
DGMGRL> show database dg9 Database - dg9 Role: PHYSICAL STANDBY Intended State: APPLY-ON Transport Lag: 0 seconds Apply Lag: 0 seconds Real Time Query: ON Instance(s): DG9 Database Status: SUCCESS
Check each database with transport lag and apply lag
DGMGRL> show database dg10 Database - dg10 Role: PRIMARY Intended State: TRANSPORT-ON Instance(s): DG9 Database Status: SUCCESS
Check dg10 with TRANSPORT-ON
Now switch back
DGMGRL> switchover to dg9; Performing switchover NOW, please wait…
If you have some errors while switching, please refer to “Some typical errors” section at the end of this post.
Failover & Reinstate
Run this command on STANDBY because FAILOVER is in case your PRIMARY is damaged.
DGMGRL> FAILOVER to dg10; Performing failover NOW, please wait…
In just few seconds, its done. Check again
DGMGRL> FAILOVER to dg10; Performing failover NOW, please wait… Failover succeeded, new primary is "dg10" DGMGRL> show configuration Configuration - mydgconfig Protection Mode: MaxPerformance Databases: dg10 - Primary database dg9 - Physical standby database (disabled) ORA-16661: the standby database needs to be reinstated Fast-Start Failover: DISABLED Configuration Status: SUCCESS
Now if we got dg9 back without datafiles damaged -> just reinstate and add to dg10 the primary
If dg9 is back with datafiles damaged, then we need to restore the datafiles and then reinstate.
Restoring the datafiles is done from RMAN with some commands like this:
rman target / STARTUP MOUNT; RESTORE DATAFILE 2; SELECT FLASHBACK_ON FROM V$DATABASE; EXIT
Reinstate
Reinstate can be understand as re-register the database to the data guard configuration, so, you need to prepare database restored as above. Using those commands:
DGMGRL> reinstate database dg9; Reinstating database "dg9", please wait…
You will have something like this:
DGMGRL> reinstate database dg9; Reinstating database "dg9", please wait… Operation requires shutdown of instance "DG9" on database "dg9" Shutting down instance "DG9"… ORA-01017: invalid username/password; logon denied Warning: You are no longer connected to ORACLE. Please complete the following steps and reissue the REINSTATE command: shut down instance "DG9" of database "dg9" start up and mount instance "DG9" of database "dg9"
Now you need to restart mount database DG9 manually, you will have result like:
SQL> startup mount; ORACLE instance started. Total System Global Area 1603411968 bytes Fixed Size 2213776 bytes Variable Size 436209776 bytes Database Buffers 1157627904 bytes Redo Buffers 7360512 bytes Database mounted.
Now run the reinstate again at the DG10 – PRIMARY database:
DGMGRL> reinstate database dg9 Reinstating database "dg9", please wait…
Note: You may have to do it twice (reinstate and restart mount the database).
Finally you will have some successful screens:
DGMGRL> reinstate database dg9 Reinstating database "dg9", please wait… Reinstatement of database "dg9" succeeded
Now check the configuration:
DGMGRL> show configuration Configuration - mydgconfig Protection Mode: MaxPerformance Databases: dg10 - Primary database dg9 - Physical standby database Error: ORA-16766: Redo Apply is stopped Fast-Start Failover: DISABLED Configuration Status: ERROR
Now dg9 is back but with some errors. Check the log we have
The log at:
tail -200f /u01/app/oracle/diag/rdbms/dg9/DG9/trace/alert_DG9.log
Result is:
Additional information: 3 Recovery Slave PR00 previously exited with exception 38701 Errors in file /u01/app/oracle/diag/rdbms/dg9/DG9/trace/DG9_mrp0_16080.trc: ORA-38701: Flashback database log 18 seq 65 thread 1: "/u01/app/oracle/flash_recovery_area/DG9/flashback/o1_mf_gwo46ctq_.flb" ORA-27037: unable to obtain file status Linux-x86_64 Error: 2: No such file or directory
Please check at the Some Typical Errors section below for ORA-27037 & ORA-38701.
Some Typical Errors & Solutions
ORA-16610: command “Broker automatic health check” in progress and DGM-17017: unable to determine configuration status
Solution: You need to restart primary and physical standby database manually.
DGMGRL> show configuration; Configuration - mydgconfig Protection Mode: MaxPerformance Databases: dg9 - Primary database dg10 - Physical standby database Fast-Start Failover: DISABLED Configuration Status: ORA-16610: command "Broker automatic health check" in progress DGM-17017: unable to determine configuration status
ORA-16778, ORA-16810
solution
Solution: You may need for a while and retry, the databases is preparing for synchronization.
DGMGRL> show configuration Configuration - mydgconfig Protection Mode: MaxPerformance Databases: dg9 - Primary database Error: ORA-16778: redo transport error for one or more databasesdg10 - Physical standby database
Error: ORA-16810: multiple errors or warnings detected for the database
Fast-Start Failover: DISABLED Configuration Status: ERROR
ORA-38760: This database instance failed to turn on flashback database and Slave exiting with ORA-38760 exception
Slave exiting with ORA-38760 exception is a general error which means you can relay on this error and need to find more specific error
Same with ORA-38760. In my case I got more detail error: Error: ORA-16766: Redo Apply is stopped
Error: ORA-16766: Redo Apply is stopped
Solution:
This message mean there was something to you standby database. Check the log at:
tail -200f /u01/app/oracle/diag/rdbms/dg10/DG9/trace/alert_DG9.log
You will see something like:
RFS[6]: Assigned to RFS process 44261 RFS[6]: Identified database type as 'physical standby': Client is ARCH pid 11634 RFS[6]: Selected log 8 for thread 1 sequence 159 dbid 1381157600 branch 1023448483 Fri Nov 15 12:30:14 2019 Archived Log entry 168 added for thread 1 sequence 159 ID 0x5261d062 dest 1: Fri Nov 15 12:30:17 2019 Media Recovery Log /home/data/flash_recovery_area/DG10/archivelog/2019_11_15/o1_mf_1_159_gwwfz6o4_.arc Media Recovery Waiting for thread 1 sequence 160 (in transit) Recovery of Online Redo Log: Thread 1 Group 7 Seq 160 Reading mem 0 Mem# 0: /u01/app/oracle/oradata/DG9/standby_redo01.log Fri Nov 15 12:30:26 2019 RFS[7]: Assigned to RFS process 44263 RFS[7]: Identified database type as 'physical standby': Client is ARCH pid 11648 Errors in file /u01/app/oracle/diag/rdbms/dg10/DG9/trace/DG9_rfs_44263.trc: ORA-16401: archive log rejected by Remote File Server (RFS)
Now do crosscheck archivelog and full backup on PRIMARY:
rman target / RMAN> crosscheck archivelog all; RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
If you have huge datafiles and archivelogs, please consider delete them first using:
DELETE NOPROMPT ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-2';
Now copy all the backed up and archivelog to STANDBY:
scp -r oracle@dg9:/home/data/flash_recovery_area/DG9/archivelog/ /home/data/flash_recovery_area/DG9 scp -r oracle@dg9:/home/data/flash_recovery_area/DG9/backupset/ /home/data/flash_recovery_area/DG9
Next, crosscheck archivelog and backup on STANDBY
rman target /
RMAN> crosscheck archivelog all;
RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
Now stop/start STANDBY database manually and retry with DGMGRL.
ORA-27037 & ORA-38701 Solution
ORA-38701: Flashback database log 18 seq 65 thread 1: "/u01/app/oracle/flash_recovery_area/DG9/flashback/o1_mf_gwo46ctq_.flb" ORA-27037: unable to obtain file status
These errors happen when you have some trouble with your flashback even if you restore point list is empty:
RMAN> list restore point all; using target database control file instead of recovery catalog SCN RSP Time Type Time Name RMAN>
Solution:
Do exactly as follow:
- Shutdown the database immediate
- startup mount
- turn off flashback: alter database flashback off;
- Open the database (read only if on standby)
- shutdown the database immediate (again)
- startup mount
- Turn on flashback (alter database flashback on)
- Open the database (read only if on standby)
- Check dgmgrl again