Computing the Offset of Corrupted ASM Block:
SQL> select GROUP_NUMBER,NAME,ALLOCATION_UNIT_SIZE from v$asm_diskgroup;
GROUP_NUMBER NAME ALLOCATION_UNIT_SIZE
------------ ------------------------- --------------------
1 DATA 1048576
SQL> select GROUP_NUMBER, DISK_NUMBER, name, path
from v$asm_disk;
GROUP_NUMBER DISK_NUMBER NAME PATH
------------ ----------- ------------------------- --------------------
1 0 DATA_0000 /u01/oradata/oravol1
1 1 DATA_0001 /u01/oradata/oravol2
1 2 DATA_0002 /u01/oradata/oravol3
1 3 DATA_0003 /u01/oradata/oravol4
1 4 DATA_0004 /u01/oradata/oravol5
SQL> select BLOCK_SIZE from v$asm_file where FILE_NUMBER=516;
BLOCK_SIZE
----------
512
SQL> select DISK_KFFXP, AU_KFFXP from x$kffxp
where XNUM_KFFXP=24 and group_kffxp=1 and NUMBER_KFFXP=516;
DISK_KFFXP AU_KFFXP
---------- ----------
1 60884
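The extent number used in the x$kffxp query and the corrupted block's disk offset follow from plain arithmetic, so they can be computed instead of worked out by hand. The Python sketch below assumes, as in this 10.2 setup, that each extent is exactly one AU (coarse striping, no variable-size extents) and that redo block N starts at byte N × 512 of the file. The function names are hypothetical.

```python
# Hypothetical helpers: locate a redo block of an ASM file on its disk.
# Assumption: one extent == one AU (coarse striping, no variable extents),
# and file block N starts at byte N * block_size within the file.

AU_SIZE = 1048576   # ALLOCATION_UNIT_SIZE from v$asm_diskgroup
BLOCK_SIZE = 512    # BLOCK_SIZE of file 516 from v$asm_file


def extent_of_block(block_no: int, block_size: int = BLOCK_SIZE,
                    au_size: int = AU_SIZE) -> int:
    """Extent number (the XNUM_KFFXP to query) holding the given file block."""
    return (block_no * block_size) // au_size


def disk_offset(block_no: int, au_no: int, block_size: int = BLOCK_SIZE,
                au_size: int = AU_SIZE) -> int:
    """Byte offset on the disk, given the AU (AU_KFFXP) that stores the extent."""
    offset_in_au = (block_no * block_size) % au_size
    return au_no * au_size + offset_in_au


if __name__ == "__main__":
    block = 50941                          # block reported by ORA-00353
    xnum = extent_of_block(block)          # extent to look up in x$kffxp
    off = disk_offset(block, au_no=60884)  # AU_KFFXP returned for that extent
    # dd's iseek counts in bs-sized units, so divide by the block size.
    print(xnum, off, hex(off), off // BLOCK_SIZE)
```

Running it prints `24 63842417152 0xedd4dfa00 124692221`: the extent number, the byte offset, the same offset in hex, and the iseek value for dd.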
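Extent 24 of file 516 is stored in AU 60884 on disk #1, /u01/oradata/oravol2. Block 50941 is 1789 blocks into that extent (50941 - 24 × 2048), so it starts at byte offset 60884 × 1048576 + 1789 × 512 = 63842417152 (0xEDD4DFA00) on that disk.

Interpreting the truss Output of ARCH: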
fd#261 is /u01/oradata/oravol2 for ARCH.

Reading Offsets by ARCH:
bash-3.00$ grep "pread(261" arc0.truss.log
26085: pread(261, 0xFFFFFD7FFC32DE00, 131072, 0xEDE600000) = 131072
26085: pread(261, 0xFFFFFD7FFC21CE00, 131072, 0xEDE620000) = 131072
26085: pread(261, 0xFFFFFD7FFC10BE00, 131072, 0xEDE640000) = 131072
26085: pread(261, 0xFFFFFD7FFBE2DE00, 131072, 0xEDE660000) = 131072
26085: pread(261, 0xFFFFFD7FFBA2DE00, 131072, 0xEDE680000) = 131072
26085: pread(261, 0xFFFFFD7FFB42DE00, 131072, 0xEDE6A0000) = 131072
26085: pread(261, 0xFFFFFD7FFB53DE00, 131072, 0xEDE6C0000) = 131072
26085: pread(261, 0xFFFFFD7FFB64DE00, 131072, 0xEDE6E0000) = 131072
26085: pread(261, 0xFFFFFD7FFADCDE00, 131072, 0xEDE700000) = 131072
26085: pread(261, 0xFFFFFD7FFAE6DE00, 131072, 0xEDE800000) = 131072
26085: pread(261, 0xFFFFFD7FFAEDDE00, 131072, 0xEDE720000) = 131072
26085: pread(261, 0xFFFFFD7FFAF7DE00, 131072, 0xEDE820000) = 131072
26085: pread(261, 0xFFFFFD7FFC2CDE00, 131072, 0xEDE740000) = 131072
26085: pread(261, 0xFFFFFD7FFC36DE00, 131072, 0xEDE840000) = 131072
26085: pread(261, 0xFFFFFD7FFC1BCE00, 131072, 0xEDE760000) = 131072
26085: pread(261, 0xFFFFFD7FFC25CE00, 131072, 0xEDE860000) = 131072
26085: pread(261, 0xFFFFFD7FFC0ABE00, 131072, 0xEDE780000) = 131072
26085: pread(261, 0xFFFFFD7FFC14BE00, 131072, 0xEDE880000) = 131072
26085: pread(261, 0xFFFFFD7FFBDCDE00, 131072, 0xEDE7A0000) = 131072
26085: pread(261, 0xFFFFFD7FFBE6DE00, 131072, 0xEDE8A0000) = 131072
26085: pread(261, 0xFFFFFD7FFB9CDE00, 131072, 0xEDE7C0000) = 131072
26085: pread(261, 0xFFFFFD7FFBA6DE00, 131072, 0xEDE8C0000) = 131072
26085: pread(261, 0xFFFFFD7FFB3CDE00, 131072, 0xEDE7E0000) = 131072
26085: pread(261, 0xFFFFFD7FFB46DE00, 131072, 0xEDE8E0000) = 131072
26085: pread(261, 0xFFFFFD7FFB51DE00, 131072, 0xEDE900000) = 131072
26085: pread(261, 0xFFFFFD7FFB62DE00, 131072, 0xEDE920000) = 131072
26085: pread(261, 0xFFFFFD7FFAE0DE00, 131072, 0xEDE940000) = 131072
26085: pread(261, 0xFFFFFD7FFAF1DE00, 131072, 0xEDE960000) = 131072
26085: pread(261, 0xFFFFFD7FFC30DE00, 131072, 0xEDE980000) = 131072
26085: pread(261, 0xFFFFFD7FFC1FCE00, 131072, 0xEDE9A0000) = 131072
26085: pread(261, 0xFFFFFD7FFC0EBE00, 131072, 0xEDE9C0000) = 131072
26085: pread(261, 0xFFFFFD7FFBE0DE00, 131072, 0xEDE9E0000) = 131072
26085: pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512
26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
26085: pread(261, 0xFFFFFD7FFBA4DE00, 131072, 0xEDD500000) = 131072
26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
26085: pread(261, 0xFFFFFD7FFBA4DE00, 131072, 0xEDD500000) = 131072
26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085: pread(261, 0xFFFFFD7FFC53BE00, 16384, 0xEDDED4000) = 16384
bash-3.00$
As seen above, offsets starting with 0xEDE, 0xEDD5 and 0xEDDE are greater than the corrupted offset 0xEDD4DFA00, so they are out of scope. The reads left to examine are at 0xEDD400000 (512 bytes), 0xEDD400200 (130560 bytes) and 0xEDD420000 (512 bytes); the last two were each issued twice. Together they cover only bytes 0xEDD400000 through 0xEDD4201FF, which does not include 0xEDD4DFA00.
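Sorting the ARCH reads by eye works for a few dozen lines; for a longer trace, a small parser can test whether any read actually covered the corrupted offset. This is a sketch: the regular expression is written against the truss lines shown in this article, and the `Err#<n> <NAME>` form assumed for failed calls is an assumption about truss output, not something visible in this trace. `IoCall` and `parse_truss` are hypothetical names.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Written against lines like:
#   26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
# Failed calls are assumed to end in "Err#<n> <NAME>" instead of "= <n>".
IO_RE = re.compile(
    r"^\s*\d+:\s+(?P<call>pread|pwrite)\((?P<fd>\d+),\s*0x[0-9A-Fa-f]+,\s*"
    r"(?P<size>\d+),\s*(?P<off>0x[0-9A-Fa-f]+)\)\s*(?:=\s*\d+|(?P<err>Err#\d+.*))"
)


class IoCall(NamedTuple):
    call: str
    fd: int
    size: int
    offset: int
    error: Optional[str]

    def covers(self, target: int) -> bool:
        """True if the byte range [offset, offset + size) contains target."""
        return self.offset <= target < self.offset + self.size


def parse_truss(lines: Iterable[str]) -> Iterator[IoCall]:
    """Yield one IoCall per pread/pwrite line; other lines are skipped."""
    for line in lines:
        m = IO_RE.match(line)
        if m:
            yield IoCall(m["call"], int(m["fd"]), int(m["size"]),
                         int(m["off"], 16), m["err"])


if __name__ == "__main__":
    CORRUPT = 0xEDD4DFA00
    sample = [  # a few of the ARCH reads listed in the trace
        "26085: pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512",
        "26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560",
        "26085: pread(261, 0xFFFFFD7FFBA4DE00, 131072, 0xEDD500000) = 131072",
        "26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512",
    ]
    # Full trace: with open("arc0.truss.log") as f: calls = list(parse_truss(f))
    calls = list(parse_truss(sample))
    in_scope = [c for c in calls if c.offset < CORRUPT]
    print([hex(c.offset) for c in in_scope])       # reads below the bad offset
    print(any(c.covers(CORRUPT) for c in calls))   # did any read touch it?
    print([c for c in calls if c.error])           # failed calls, if any
```

On the sample it reports the three in-scope offsets and `False` for coverage. Pointed at lgwr.truss.log, the `error` field does roughly the job of the `grep Err` pipelines used for LGWR.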
ARCH did not read the corrupted block #50941, yet it reported an error.

dd Output of the Corrupted Block:
Offset of the corrupted ASM block in 512-byte blocks: 63842417152 / 512 = 124692221
bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124692221 count=1|od -x
0000000 2201 0000 f0fd 0000 001b 0000 80d8 2304
0000020 3838 322e 3731 312e 3431 7807 0a6c 111e
0000040 2230 3001 002c 0605 3131 3730 3130 3306
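od -x prints 16-bit words in the host's byte order, so on this little-endian machine each word appears byte-swapped. The sketch below, a pair of hypothetical helpers rather than an Oracle tool, rebuilds the raw bytes and decodes the header fields. The field layout in the docstring is an assumption that is consistent with every dump in this article (for example, archive log 1_25 shows 0x19 = 25 in the sequence position).

```python
import struct
from typing import Dict, Iterable


def od_words_to_bytes(words: Iterable[str], little_endian: bool = True) -> bytes:
    """Rebuild raw bytes from `od -x` output words.

    od -x prints each 16-bit word in the host's byte order, so on a
    little-endian host the word "2201" stands for the bytes 01 22.
    """
    fmt = "<H" if little_endian else ">H"
    return b"".join(struct.pack(fmt, int(w, 16)) for w in words)


def redo_header(raw: bytes) -> Dict[str, int]:
    """Decode the start of a redo block header (assumed layout, little-endian).

    byte 0      block type (0x01 in every dump here)
    byte 1      format byte (0x22 in every dump here)
    bytes 2-3   not used here
    bytes 4-7   block number
    bytes 8-11  log sequence number (assumed; reads 25 in archive 1_25)
    """
    btype, fmt_byte, _unused, block_no, seq = struct.unpack_from("<BBHII", raw)
    return {"type": btype, "format": fmt_byte, "block": block_no, "sequence": seq}


if __name__ == "__main__":
    line = "0000000 2201 0000 f0fd 0000 001b 0000 80d8 2304"  # dd | od -x output
    hdr = redo_header(od_words_to_bytes(line.split()[1:]))
    print(hex(hdr["block"]), hdr["block"], hdr["sequence"])  # 0xf0fd 61693 27
    print("corrupted" if hdr["block"] != 50941 else "ok")
```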
The block number field reads 0x0000f0fd (61693; the bytes are swapped because the platform is little endian), not 50941 (0x0000c6fd), so the block is corrupted. The reason ARCH did not read this block is hidden in the error message:
ORA-00353: log corruption near block 50941 change 9160702125 time 03/09/2009 1
It says "near".

Finding the Other Corrupted Block:
dd Outputs on pread() of ARCH:
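As seen above, the block numbers increase from 0xC000 to 0xC0FF, but in the last call they jump to 0xC800. Offset 0xEDD420000 lies 256 blocks into AU 60884, whose first block is 0xC000 (the start of extent 24), so block 0xC100 was expected there.
truss Output of ARCH for block# 0xC800:
26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512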
26085: 01 "\0\0\0C8\0\01B\0\0\0 \80 H -\00505 4 1 4 5 0\v 6 6 6 6 6 6 4
26085: 1 4 5 00F 2 1 2 . 1 5 6 . 2 3 0 . 2 1 807 x l\n07\f %1F01 0 ,\0
26085: 0505 3 5 6 0 705 3 8 0 3 50E 8 8 . 2 4 1 . 1 3 6 . 2 2 007 x l\n
26085: 07\f %1F01 0 ,\00505 6 2 0 5 1\b a d a m k a c i0E 1 9 5 . 2 4 4
26085: . 6 2 . 1 4 507 x l\n07\f % "01 0 ,\00505 6 2 0 5 1\b a d a m k
26085: a c i\f 7 8 . 1 9 0 . 6 8 . 1 707 x l\n07\f % #01 0 ,\00502 - 1
26085: 05 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07\f % .02 - 2
26085: ,\00502 - 105 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07
26085: \f &0102 - 2 ,\00505 6 1 1 4 105 1 9 5 5 60E 1 9 5 . 2 4 4 . 6 2
26085: . 1 4 707 x l\n07\f &\r01 0 ,\00505 6 1 1 4 105 1 9 5 5 6\f 8 8
26085: . 2 3 4 . 5 . 2 3 107 x l\n07\f &0F01 0 ,\00502 - 105 K A Y A 2
26085: 0E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07\f &1002 - 2 ,\00506 1 1
26085: 1 0 1 605 O K A Y A\f 8 5 . 1 0 8 . 8 7 . 5 007 x l\n07\f & !01
26085: 0 ,\00502 - 105 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n
26085: 07\f & "02 - 2 ,\00505 4 1 9 3 806 6 4 3 2 5 5\r 8 8 . 2 2 5 . 1
26085: 2 0 . 5 307 x l\n07\f & +01 0 ,\00505 5 3 0 5 506 0 9 1 2 1 90E
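The mismatch can also be checked mechanically: compute which block belongs at a disk offset, then decode the block number from the first bytes of the truss buffer. This is a sketch with hypothetical function names. The two-characters-per-byte reading of the truss data is inferred from the dump above, not taken from truss documentation, so only the 12 header bytes are decoded.

```python
from typing import Dict

AU_SIZE = 1048576
BLOCK_SIZE = 512

# Backslash escapes that appear in the truss data lines above.
TRUSS_ESCAPES: Dict[str, int] = {
    "\\0": 0x00, "\\b": 0x08, "\\n": 0x0A,
    "\\v": 0x0B, "\\f": 0x0C, "\\r": 0x0D,
}


def expected_block(offset: int, au_no: int, xnum: int,
                   block_size: int = BLOCK_SIZE, au_size: int = AU_SIZE) -> int:
    """File block that belongs at a disk offset inside AU au_no (extent xnum)."""
    au_start = au_no * au_size
    if not au_start <= offset < au_start + au_size:
        raise ValueError("offset is not inside this AU")
    return xnum * (au_size // block_size) + (offset - au_start) // block_size


def decode_truss_bytes(text: str, count: int) -> bytes:
    """Decode the first `count` bytes of a truss data line.

    Inferred convention (an assumption): every byte takes two characters,
    a printable byte is a space plus the character, some control bytes are
    backslash escapes, and any other byte is two hex digits.
    """
    out = []
    for i in range(0, 2 * count, 2):
        tok = text[i:i + 2]
        if tok in TRUSS_ESCAPES:
            out.append(TRUSS_ESCAPES[tok])
        elif tok.startswith(" "):
            out.append(ord(tok[1]))
        else:
            out.append(int(tok, 16))
    return bytes(out)


if __name__ == "__main__":
    want = expected_block(0xEDD420000, au_no=60884, xnum=24)
    # First 12 bytes of the buffer returned by pread(..., 512, 0xEDD420000).
    header = decode_truss_bytes(r'01 "\0\0\0C8\0\01B\0\0\0', 12)
    found = int.from_bytes(header[4:8], "little")  # block number field
    print(hex(want), hex(found))  # 0xc100 0xc800
```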
Then the following messages were written to the trace file:
26085: write(2, " * * * 2 0 0 9 - 0 3 -".., 27) = 27
26085: write(2, "\n", 1) = 1
26085: write(2, " ", 1) = 1
26085: write(2, "\n", 1) = 1
26085: write(2, " C o r r u p t r e d o".., 51) = 51
26085: write(2, "\n", 1) = 1
26085: write(2, " F l a g : 0 x 3 0 F".., 80) = 80
26085: write(2, "\n", 1) = 1
26085: write(2, " - - - - - D u m p o".., 39) = 39
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 4 6 3 8 3 0 2 0 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 0 3 0 4 3 3 c 5 c 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 0 5 c 5 0 3 0 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 0 5 c 3 0 2 0 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 2 3 9 3 5 3 2 2 0 0 9".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 0 3 1 3 0 3 0 2 0 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 1 3 0 3 0 5 c 3 2".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 4 3 0 3 0 5 c 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 2 0 3 0 5 c 3 2 3 0 3 9".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 0 5 c 3 0 2 0 3 6".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 0 3 0 4 3 3 c 2 0 3 7".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 a 3 5 3 2 3 9 3 0 2 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 0 5 c 3 0 3 0 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 4 2 5 4 2 0 4 4 0 a 4 9".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 2 0 3 8 2 0 3 0 2 0 3 5".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 0 3 5 3 0 3 6 5 c 3 8".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " R e r e a d i n g l o".., 78) = 78
26085: write(2, "\n", 1) = 1
Rereading the block fails like this. There are 2 problems:
Checking for Missing IO of LGWR in the truss Output:
bash-3.00$ grep Err lgwr.truss.log|grep pwrite
bash-3.00$ grep Err lgwr.truss.log|grep pread
bash-3.00$
No missing IO: neither command returns a failed pwrite or pread.

Checking the IO Buffers of LGWR:
fd#260 is /u01/oradata/oravol2 for LGWR. The last write to the block:
25925: pwrite(260, 0x380D78400, 76288, 0xEDD420000) = 76288
25925: 01 "\0\0\0C8\0\01B\0\0\0 \80 H -\00505 4 1 4 5 0\v 6 6 6 6 6 6 4
25925: 1 4 5 00F 2 1 2 . 1 5 6 . 2 3 0 . 2 1 807 x l\n07\f %1F01 0 ,\0
25925: 0505 3 5 6 0 705 3 8 0 3 50E 8 8 . 2 4 1 . 1 3 6 . 2 2 007 x l\n
As seen above, the contents of the redo buffer are already corrupted: the block number is 0xC800, the same value ARCH read back from offset 0xEDD420000. However, this same LGWR had generated a correct archived log:
bash-3.00$ dd if=/u01/app/oracle/product/10.2.0/dbs/arch/1_25_681074311.dbf bs=512 skip=256 count=1|od -x
1+0 records in
1+0 records out
0000000 2201 0000 0100 0000 0019 0000 8000 d162
0000020 3534 332e 2e33 3032 0733 6b78 0904 3c0c
0000040 0114 2c30 0500 3205 3031 3631 6905 6e69
The block number field is 0x0100 = 256, which is correct for skip=256. So the problem looks like a configuration issue or a bug on the OS/storage side.
This write-up covers redo corruption only, but the database also encountered corruptions in UNDO, INDEX, TABLE and CONTROL FILES. The root cause is the same.
This issue will be updated when the OS vendor sends a comment.
The vendor later reinstalled the operating system, and the problem has not occurred since.
As seen above, the last successful sequence before the corruption is 25.
Header of Archive Log:
(root@gdksun1:bin)$ dd if=/u01/app/oracle/product/10.2.0/dbs/arch/1_25_681074311.dbf bs=512 skip=50941 count=1|od -x
0000000 2201 0000 c6fd 0000 0019 0000 81d8 54c6
0000020 2e32 3134 362e 0736 6b78 1207 2f0f 0212
0000040 332d 002c 0505 3831 3834 0532 7567 6469
0000060 0c65 3838 322e 3433 382e 2e38 3138 7807
0000100 076b 0f12 172f 3001 002c 0505 3032 3834
0000120 0739 7362 7361 6369 0e69 3538 312e 3530
0000140 312e 3535 322e 3233 7807 076b 0f12 172f
The block number is 0x0000c6fd (the bytes are swapped because the platform is little endian). Since 50941 = 0x0000c6fd, the block number in the archived log is correct. That means LGWR had successfully written the correct redo before the log switch.