Bhavani's Blog: Work around ORA-600: internal error arguments: [3020], [3 ]– Oracle 11gr2 RAC

Mitigate/Work around ORA-00600: internal error code, arguments: [3020], [3], [346523], [12929435], [], [], [], [], [], [], [], [] – Oracle 11gr2 RAC – Active data-guard

Let me brief the background, we have banking database with 2 node RAC in production and we have configured Active data-guard and it is fully utilized for reports database for every minute and we cannot effort down time for even for a minute.

The steps I followed were Inspired by “The Arup Nanda Blog: Resolving Gaps in Data Guard Apply Using Incremental RMAN Backup “ : http://arup.blogspot.com/2009/12/resolving-gaps-in-data-guard-apply.html

A big thanks to Arup (I never interacted with him)

I will narrate the chain of events as they happened:

We use the following query to check for the sync:

Query-1

SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference" FROM (SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#))ARCH, (SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL WHERE ARCH.THREAD# = APPL.THREAD# ORDER BY 1;

Thread Last Sequence Received Last Sequence Applied Difference

---------- ---------------------- --------------------- ----------

1 44878 44878 0

2 15739 15739 0

My team mate noticed the difference has risen to 7 & 8 in the respective threads

I started investigating and look at the alert log and we had the ora-600 error and also looked at the following query:

Query-2

SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#,BLOCK#, BLOCKS FROM V$MANAGED_STANDBY;

(Sample query output not actual result)

PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS

--------- ------------ ---------- ---------- ---------- ----------

ARCH CONNECTED 0 0 0 0

RFS IDLE 0 0 0 0

RFS IDLE 1 44883 41973 1

RFS IDLE 0 0 0 0

RFS IDLE 2 15743 271919 1

RFS IDLE 0 0 0 0

MRP0 WAIT_FOR_LOG 2 15743 0 0

13 rows selected.

Noticed MRP0 process was missing in the actual result

Also verified in alert log:

Errors in file /u01/app/oracle/diag/rdbms/flexprdb/FLEXDRDB1/trace/FLEXDRDB1_mrp0_24466.trc:

ORA-00600: internal error code, arguments: [3020], [3], [346523], [12929435], [], [], [], [], [], [], [], []

ORA-10567: Redo is inconsistent with data block (file# 3, block# 346523, file offset is 2838716416 bytes)

ORA-10564: tablespace UNDOTBS1

ORA-01110: data file 3: '+FLEXDATA/flexprdb/datafile/undotbs1.304.742240195'

ORA-10560: block type 'KTU UNDO BLOCK'

MRP0: Background Media Recovery process shutdown (FLEXDRDB1)

Here I had to do something which can be faster than oracle support response; then I recalled Arup’s blog and thought why not take chance...

As alert log messaged saying “ORA-10567: Redo is inconsistent with data block (file# 3, block# 346523, file offset is 2838716416 bytes).

The inference is somehow tablespace UNDOTBS1 having file 3: '+FLEXDATA/flexprdb/datafile/undotbs1.304.742240195' has an issue in recovery.”

Following were the steps I have taken:

1. Taken RMAN backup of datafile 3 as mentioned in the alert log file from the production having a size of 50 MB backup peace it took around 5 minutes to take the backup.

2. FTP to DR site it took around 5 minutes

3. Cataloged the file directory for RMAN restore at Standby database side

4. Shutdown the standby database by command “shutdown immediate” and it was not shutting down (its almost 10 minutes) for some reason , I saw the following message:

Fri Feb 18 16:54:14 2011

SHUTDOWN: Active processes prevent shutdown operation

Probably I did not use the command “ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL” Still puzzling as MRP0 is down and its taking so much time to shutdown!!!

5. Took decision to give command “SHUTDOWN ABORT”

6. RMAN> STARTUP MOUNT;

7. RMAN> RESTORE DATAFILE 3;

Alert log message:

Fri Feb 18 16:56:26 2011

ERROR: failed to establish dependency between database FLEXPRDB and diskgroup resource ora.FLEXDATA.dg

Fri Feb 18 17:01:14 2011

Full restore complete of datafile 3 +FLEXDATA/flexprdb/datafile/undotbs1.304.742240195. Elapsed time: 0:04:05

checkpoint is 6808181049

last deallocation scn is 6808094618

Undo Optimization current scn is 6807046599

Fri Feb 18 17:01:24 2011

RFS[1]: Assigned to RFS process 8532

RFS[1]: Identified database type as 'physical standby': Client is ARCH pid 16697

8. ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

9. Executed the query-1 at SQL prompt I could see the archives were getting applied

10. Also validated with query-2 to find MRP0 is back again – I was very relieved for the success

11. As users were really depended on the active data-guard, I forced to use the following commands to give back the database to users for reports usage

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

ALTER DATABASE OPEN

This process took more than 20-30 minutes as it had around 30 archives to be applied to make it consistent and then followed by:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGGILE DISCONNECT;

SQL> select open_mode from v$database;

OPEN_MODE

--------------------

READ ONLY WITH APPLY

Oracle Version: Oracle RAC Active data-guard standby database

SQL> select banner from v$version;

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

PL/SQL Release 11.2.0.1.0 - Production

CORE 11.2.0.1.0 Production

TNS for Linux: Version 11.2.0.1.0 - Production

NLSRTL Version 11.2.0.1.0 - Production

I hope this will help somebody and I appreciate your feedback....

6 comments:

NagaAppaniFebruary 21, 2011 at 10:17 AM
Good Bhavani, I can Visualize how u did it... Thanks for sharing...
Sunil BholaFebruary 27, 2011 at 10:19 PM
As per above note you told :- "I forced to use the following commands to give back the database to users for reports usage"

as you are using 11G you would have used active dataguard:-

recover managed standby database cancel;
alter database open read only;
recovery managed standby database disconnect using current logfile;
BhavaniFebruary 28, 2011 at 11:08 PM
Thanks for the feedback and I agree with you.
-Bhavani
Arup NandaMarch 2, 2011 at 7:44 AM
Wow! Thanks, Bhavani for the reference to my blog post. I have to admit when I wrote it, I thought of only one use case - to refresh standby after a long desync. I didn't think about the other use cases such as redo entry being corrupt. That's a brilliant use of this incremental RMAN apply.
UnknownMarch 17, 2011 at 5:53 AM
Good Bhavani, Thanks for sharing.
shy123September 11, 2019 at 12:51 PM
Hi Bhavani-I am trying to understand how the standby catches up with the primary after having restored the datafile in question. I mean what mechanism does it use to sync up after the datafile restore to get in sync with the Primary. Appreciate your feedback!

Bhavani's Blog

Saturday, February 19, 2011

Work around ORA-600: internal error arguments: [3020], [3 ]– Oracle 11gr2 RAC – Active data-guard

A big thanks to Arup (I never interacted with him)

6 comments: