[QA-65] ubGuard 4.0.0-4.0.1 Prerelease Commands Created: 19/Jan/22  Updated: 19/Jan/22

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: ubTools - ubGuard Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0


 Description   

This document explains ubGuard 4.0.0-4.0.1 prerelease commands.

 Comments   
Comment by ubTools Support [ 19/Jan/22 08:15 PM ]
COMMANDS:

ubGuard executable is <UBGUARD_HOME>/bin/ubguard.sh (ubguard.bat for Windows).

Prerequisite for all commands:

  • Oracle listeners must be running on primary and standby servers.

Setup:

Prerequisites:

  • <UBGUARD_HOME>/conf/setup.properties must be filled.
  • Primary databases must be in OPEN state.
  • Standby databases must be in MOUNT state.

Usage:

CMD> ubguard.sh setup

Start:

Prerequisites:

  • Primary database must be in OPEN state.
  • Standby database must be in MOUNT state.

Usage:

CMD> ubguard.sh start guard -d <ubguard_database_alias>

Stop:

Usage:

CMD> ubguard.sh stop guard -d <ubguard_database_alias>

Status:

Prerequisites:

  • Primary database must be in OPEN state.
  • Standby database must be in MOUNT or OPEN state.

Usage:

CMD> ubguard.sh status guard -d <ubguard_database_alias>

Failover:

Prerequisites:

  • Standby database must be in MOUNT state.

Usage:

CMD> ubguard.sh failover to <ubguard_database_alias> [-f]

Default Failover:

A default failover is a failover without the "-f" option. It applies the archivelogs to the standby database, and then activates the standby database as the primary database. It causes less data loss, but takes longer.

The alternative manual method by RMAN on standby:

CMD> ubguard.sh stop guard -d <ubguard_database_alias>
CMD> SET ORACLE_SID=<SID>
CMD> rman target /
RMAN> RECOVER DATABASE;
RMAN> SQL 'ALTER DATABASE ACTIVATE STANDBY DATABASE';
RMAN> ALTER DATABASE OPEN;
RMAN> EXIT;

If the manual method is used, ubGuard setup must be run again to update ubGuard's catalog.

Forced Failover:

A forced failover is a failover with the "-f" option. It does not apply archivelogs to the standby database; it only activates the standby database as the primary database. It causes more data loss, but takes less time.

The alternative manual method by RMAN on standby:

CMD> ubguard.sh stop guard -d <ubguard_database_alias>
CMD> SET ORACLE_SID=<SID>
CMD> rman target /
RMAN> SQL 'ALTER DATABASE ACTIVATE STANDBY DATABASE';
RMAN> ALTER DATABASE OPEN;
RMAN> EXIT; 

If the manual method is used, ubGuard setup must be run again to update ubGuard's catalog.
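
The two failover modes differ only in the "-f" flag. A minimal Python sketch of assembling the command line follows. The function name, the alias "mydb_stby" and the install path "/opt/ubguard" are hypothetical; only the command syntax is taken from the usage line above.

```python
def failover_command(alias, forced=False, ubguard_home="/opt/ubguard"):
    """Build the argument list for 'ubguard.sh failover to <alias> [-f]'.

    ubguard_home is a hypothetical path standing in for <UBGUARD_HOME>.
    """
    argv = [f"{ubguard_home}/bin/ubguard.sh", "failover", "to", alias]
    if forced:
        # Forced failover: archivelogs are not applied, so more data loss
        # is accepted in exchange for a shorter failover time.
        argv.append("-f")
    return argv


print(" ".join(failover_command("mydb_stby")))
print(" ".join(failover_command("mydb_stby", forced=True)))
```

The returned list can be passed to subprocess.run(argv, check=True) on the standby server once the prerequisites above are met.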

Switchover:

Prerequisites:

  • Primary database must be in MOUNT or OPEN state.
  • Standby database must be in MOUNT state.

Usage:

CMD> ubguard.sh switchover to <ubguard_database_alias>




[QA-64] ubGuard 4.0.0-1.0.0 Prerelease Commands Created: 19/Mar/19  Updated: 19/Jan/22

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: ubTools - ubGuard Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0


 Description   
This document explains ubGuard 4.0.0-1.0.0 prerelease commands.

 Comments   
Comment by ubTools Support [ 19/Mar/19 02:08 PM ]
COMMANDS:

ubGuard executable is <UBGUARD_HOME>/bin/ubguard.sh (ubguard.bat for Windows).

Prerequisite for all commands:

  • Oracle listeners must be running on primary and standby servers.

Setup:

Prerequisites:

  • <UBGUARD_HOME>/conf/setup.properties must be filled.
  • Primary databases must be opened.
  • Standby databases must be in MOUNT state.

Usage:

CMD> ubguard.sh setup

Start:

Prerequisites:

  • Primary database must be opened.
  • Standby database must be in MOUNT state.

Usage:

CMD> ubguard.sh start guard -d <ubguard_database_alias>

Stop:

Usage:

CMD> ubguard.sh stop guard -d <ubguard_database_alias>

Status:

Usage:

CMD> ubguard.sh status guard -d <ubguard_database_alias>

Failover:

Prerequisites:

  • Standby database must be in MOUNT state.

Usage:

CMD> ubguard.sh failover to <ubguard_database_alias> [-f]

Default Failover:

A default failover is a failover without the "-f" option. It gets all missing archivelogs from the primary server, applies them to the standby database, and then activates the standby database as the primary database. It causes less data loss, but takes longer.

The alternative manual method by RMAN on standby:

CMD> ubguard.sh stop guard -d <ubguard_database_alias>
--> Copy all missing archivelogs from primary server to standby server
CMD> SET ORACLE_SID=<SID>
CMD> rman target /
RMAN> SHUTDOWN IMMEDIATE;
RMAN> STARTUP MOUNT;
RMAN> RECOVER DATABASE;
RMAN> SQL 'ALTER DATABASE ACTIVATE STANDBY DATABASE';
RMAN> ALTER DATABASE OPEN;
RMAN> EXIT;

If the manual method is used, ubGuard setup must be run again to update ubGuard's catalog.

Forced Failover:

A forced failover is a failover with the "-f" option. It neither gets archivelogs from the primary server nor applies archivelogs to the standby database; it only activates the standby database as the primary database. It causes more data loss, but takes less time.

The alternative manual method by RMAN on standby:

CMD> ubguard.sh stop guard -d <ubguard_database_alias>
CMD> SET ORACLE_SID=<SID>
CMD> rman target /
RMAN> SHUTDOWN IMMEDIATE;
RMAN> STARTUP MOUNT;
RMAN> SQL 'ALTER DATABASE ACTIVATE STANDBY DATABASE';
RMAN> ALTER DATABASE OPEN;
RMAN> EXIT; 

If the manual method is used, ubGuard setup must be run again to update ubGuard's catalog.





[QA-63] ORA-600 [3020] on the standby after adding a datafile on primary Created: 16/Feb/18  Updated: 16/Feb/18

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 11.2.0.4
Operating System: Linux
Host Name: .
Database Name: .

 Description   
Problem:

The customer has added a datafile on the primary database. After the datafile was created on the standby database, ORA-600 [3020] was encountered while applying an archivelog to this datafile on standby.

ORA-600 [3020]:

This is called a 'STUCK RECOVERY'.

  There is an inconsistency between the information stored in the redo 
  and the information stored in a database block being recovered.

Ref: Doc ID 30866.1



 Comments   
Comment by ubTools Support [ 16/Feb/18 10:36 AM ]
PROBLEM OCCURRENCE:

Adding datafile on the Primary:

Tue Feb 13 07:44:52 2018
ALTER TABLESPACE MENKUL2018_DATA
  ADD DATAFILE '/orassd/orcl/datafile/menkul2018_data02.dbf'
  SIZE 5G
  AUTOEXTEND ON
  NEXT 100M
  MAXSIZE UNLIMITED
Completed: ALTER TABLESPACE MENKUL2018_DATA
  ADD DATAFILE '/orassd/orcl/datafile/menkul2018_data02.dbf'
  SIZE 5G
  AUTOEXTEND ON
  NEXT 100M
  MAXSIZE UNLIMITED
Tue Feb 13 07:45:38 2018

Applying Archivelogs on the Standby:

Tue Feb 13 07:46:55 2018
ALTER DATABASE RECOVER AUTOMATIC STANDBY DATABASE UNTIL CHANGE 69358633799
Media Recovery Start
 started logmerger process
Tue Feb 13 07:46:55 2018
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 24 slaves
Media Recovery Log /u01/ORCL/archive/52b86b9a_1_91417_922239972.arc
Tue Feb 13 07:47:07 2018
Successfully added datafile 232 to media recovery
Datafile #232: '/u01/oracle/app/oradata/ORCL/datafile/ORCL_STBY/datafile/o1_mf_menkul20_f84vg0lt_.dbf'
Incomplete Recovery applied until change 69358633799 time 02/13/2018 07:45:38
Tue Feb 13 07:47:08 2018
Media Recovery Complete (ORCL)
Completed: ALTER DATABASE RECOVER AUTOMATIC STANDBY DATABASE UNTIL CHANGE 69358633799

.....

Tue Feb 13 08:26:55 2018
ALTER DATABASE RECOVER AUTOMATIC STANDBY DATABASE UNTIL CHANGE 69358697423
Media Recovery Start
 started logmerger process
Tue Feb 13 08:26:55 2018
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 24 slaves
Media Recovery Log /u01/ORCL/archive/52b86b9a_1_91425_922239972.arc
Tue Feb 13 08:26:57 2018
Errors in file /u01/oracle/app/diag/rdbms/orcl_stby/ORCL/trace/ORCL_pr0i_21945.trc  (incident=131897):
ORA-00600: internal error code, arguments: [3020], [232], [3], [973078531], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 232, block# 3, file offset is 24576 bytes)
ORA-10564: tablespace MENKUL2018_DATA
ORA-01110: data file 232: '/u01/oracle/app/oradata/ORCL/datafile/ORCL_STBY/datafile/o1_mf_menkul20_f84vg0lt_.dbf'
ORA-10560: block type '0'
Incident details in: /u01/oracle/app/diag/rdbms/orcl_stby/ORCL/incident/incdir_131897/ORCL_pr0i_21945_i131897.trc
Tue Feb 13 08:26:59 2018
Comment by ubTools Support [ 16/Feb/18 11:06 AM ]
ANALYSIS of the RESULT:
Ref: ORCL_pr0i_21945.trc

Data:

REDO Dump:

KCOX_FUTURE: CHANGE IN FUTURE OF BLOCK

*** 2018-02-13 08:26:56.826
RECOVERY STUCK AT BLOCK 3 OF FILE 232
Redo record scn: 0x0010.2619b07d
CHANGE #1 TYP:0 CLS:12 AFN:232 DBA:0x3a000003 OBJ:4294967295
 SCN:0x0010.2618c0fb SEQ:2 OP:22.5 ENC:0 RBL:0

Buffer read during recovery:
.....

The stuck recovery happened at file#232 block#3 with KCOX_FUTURE: CHANGE IN FUTURE OF BLOCK information.

Block Dump:

buffer tsn: 141 rdba: 0x3a000003 (232/3)
scn: 0x0000.00000000 seq: 0x01 flg: 0x05 tail: 0x00000001
frmt: 0x02 chkval: 0x9d03 type: 0x00=unknown
on-disk scn: 0x0.0

SCN is 0, type is unknown. flg is 0x05:

Where flg: 0x05 contains flag 0x1 (unused,unformatted block).

Ref: Oracle Doc ID 17896895.8

Comment:

The change vector expected SCN:0x0010.2618c0fb on the block, but the SCN on the block was 0x0000.00000000.

Oracle was trying to apply redo from the archivelog to an unformatted block. The redo in the archivelog is ahead of the block in the datafile (a change in the future of the block). This inconsistency causes the stuck recovery.
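
The hex values in the dumps can be tied back to the decimal values in the alert log. The Python sketch below decodes the DBA and the SCNs. It assumes the usual smallfile layout (upper 10 bits of the DBA are the relative file number, lower 22 bits are the block number) and an 8 KB block size; the block size is inferred from the ORA-10567 offset and is not stated in the trace. The function names are illustrative only.

```python
def decode_dba(dba):
    """Split a 32-bit data block address into (relative file#, block#).

    Assumes a smallfile tablespace: 10 bits of file number, 22 bits of block number.
    """
    return dba >> 22, dba & 0x3FFFFF


def scn_to_decimal(scn):
    """Convert 'wrap.base' hex notation (e.g. 0x0010.2618c0fb) to a decimal SCN."""
    wrap, base = scn.lower().replace("0x", "").split(".")
    return (int(wrap, 16) << 32) + int(base, 16)


file_no, block_no = decode_dba(0x3A000003)
print(file_no, block_no)                    # 232 3
print(0x3A000003)                           # 973078531, the third ORA-600 argument
print(block_no * 8192)                      # 24576, assuming an 8 KB block size
print(scn_to_decimal("0x0010.2618c0fb"))    # 69358633211, SCN expected by the change vector
print(scn_to_decimal("0x0000.00000000"))    # 0, SCN found on the block
```

The expected SCN 69358633211 is below the "UNTIL CHANGE 69358633799" of the first recovery call in the alert log, which is the call that added datafile 232 on the standby.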

Comment by ubTools Support [ 16/Feb/18 12:12 PM ]
ANALYSIS of the ROOT CAUSE:

Data:

The datafile was created at sequence#91417 and the problem happened at sequence#91425, at file#232 block#3.

REDO Dump Commands:

SQL> ALTER SYSTEM dump  logfile '/u01/ORCL/archive/52b86b9a_1_91417_922239972.arc' dba min 232 3 dba max 232 3;

System altered.

SQL> ALTER SYSTEM dump  logfile '/u01/ORCL/archive/52b86b9a_1_91418_922239972.arc' dba min 232 3 dba max 232 3;

System altered.

SQL> ALTER SYSTEM dump  logfile '/u01/ORCL/archive/52b86b9a_1_91419_922239972.arc' dba min 232 3 dba max 232 3;

System altered.

SQL> ALTER SYSTEM dump  logfile '/u01/ORCL/archive/52b86b9a_1_91420_922239972.arc' dba min 232 3 dba max 232 3;

System altered.

SQL> ALTER SYSTEM dump  logfile '/u01/ORCL/archive/52b86b9a_1_91421_922239972.arc' dba min 232 3 dba max 232 3;

System altered.

SQL> ALTER SYSTEM dump  logfile '/u01/ORCL/archive/52b86b9a_1_91422_922239972.arc' dba min 232 3 dba max 232 3;

System altered.

SQL>  ALTER SYSTEM dump  logfile '/u01/ORCL/archive/52b86b9a_1_91423_922239972.arc' dba min 232 3 dba max 232 3;

System altered.

SQL> ALTER SYSTEM dump  logfile '/u01/ORCL/archive/52b86b9a_1_91424_922239972.arc' dba min 232 3 dba max 232 3;

System altered.

SQL>  ALTER SYSTEM dump  logfile '/u01/ORCL/archive/52b86b9a_1_91425_922239972.arc' dba min 232 3 dba max 232 3;

System altered.

SQL> 
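
The nine dump commands above differ only in the sequence number. A small Python sketch that generates them for a range of sequences is shown below. The function name and the "{seq}" template are illustrative; the statement text and the archivelog path follow the commands above.

```python
def redo_dump_statements(path_template, first_seq, last_seq, file_no, block_no):
    """Yield one ALTER SYSTEM DUMP LOGFILE statement per archivelog sequence,
    restricted to a single block through DBA MIN / DBA MAX."""
    for seq in range(first_seq, last_seq + 1):
        logfile = path_template.format(seq=seq)
        yield (f"ALTER SYSTEM DUMP LOGFILE '{logfile}' "
               f"DBA MIN {file_no} {block_no} DBA MAX {file_no} {block_no};")


template = "/u01/ORCL/archive/52b86b9a_1_{seq}_922239972.arc"
for stmt in redo_dump_statements(template, 91417, 91425, 232, 3):
    print(stmt)
```

The output can be spooled to a script and run in SQL*Plus on the server that holds the archivelogs.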

REDO Dumps:

DUMP OF REDO FROM FILE '/u01/ORCL/archive/52b86b9a_1_91417_922239972.arc'
.....
REDO RECORD - Thread:1 RBA: 0x016519.00000765.01e8 LEN: 0x0058 VLD: 0x01
SCN: 0x0010.2618c0fb SUBSCN:  1 02/13/2018 07:44:59
(LWN RBA: 0x016519.00000763.0010 LEN: 0004 NST: 0002 SCN: 0x0010.2618c0f9)
CHANGE #1 TYP:1 CLS:12 AFN:232 DBA:0x3a000003 OBJ:4294967295 SCN:0x0010.2618c0fb SEQ:1 OP:22.4 ENC:0 RBL:0
ktfbbfo - File BitMap Block Format:
BitMap Control:
RelFno: 232, BeginBlock: 128, Flag: 0, First: 0, Free: 63488

REDO RECORD - Thread:1 RBA: 0x016519.00000763.0010 LEN: 0x0244 VLD: 0x05
SCN: 0x0010.2618c0fb SUBSCN:  1 02/13/2018 07:44:59
CHANGE #1 TYP:0 CLS:69 AFN:3 DBA:0x00c018c0 OBJ:4294967295 SCN:0x0010.2618c0ee SEQ:1 OP:5.4 ENC:0 RBL:0
ktucm redo: slt: 0x0007 sqn: 0x000060e7 srt: 0 sta: 9 flg: 0x2 ktucf redo: uba: 0x3000319f.07d5.06 ext: 2 spc: 7334 fbi: 0
CHANGE #2 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:0 OP:17.30 ENC:0
Add datafiles to tablespace #141
file #232  relative file #232. '/orassd/orcl/datafile/menkul2018_data02.dbf'
flags(reuse): 0x0
Checkpointed at scn:  0x0010.2618c0f0 02/13/2018 07:44:56
.....
DUMP OF REDO FROM FILE '/u01/ORCL/archive/52b86b9a_1_91425_922239972.arc'
.....
REDO RECORD - Thread:1 RBA: 0x016521.0001a022.0034 LEN: 0x0040 VLD: 0x01
SCN: 0x0010.2619b07d SUBSCN: 14 02/13/2018 08:23:51
(LWN RBA: 0x016521.00019dca.0010 LEN: 1012 NST: 0002 SCN: 0x0010.2619b069)
CHANGE #1 TYP:0 CLS:12 AFN:232 DBA:0x3a000003 OBJ:4294967295 SCN:0x0010.2618c0fb SEQ:2 OP:22.5 ENC:0 RBL:0
ktfbbredo - File BitMap Block Redo:
Use Bits:

Comment:

The datafile has been created at sequence#91417 by REDO OP code 17.30, which means:

the OP:17.30 redo which adds the <file#> datafile

Ref: Oracle Doc ID 27229389.8

Change vectors with OP codes 22.4 and 5.4 appear before the one that adds the datafile.

Change Vector for OP Code 22.4:

It changes absolute file#232 (AFN:232 DBA:0x3a000003). This is the problem: Oracle tries to apply a change vector to a file that has not been created yet.

Change Vector for OP Code 5.4:

It changes absolute file#3 (AFN:3 DBA:0x00c018c0). This is a different file, so it is out of scope.
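
To pick out the change vectors that touch a given file from a long redo dump, the CHANGE lines can be filtered by AFN. The Python sketch below assumes the line layout shown in the dumps above; the regular expression and the function name are illustrative, not part of any Oracle tool.

```python
import re

# Matches the header line of a block change vector as printed in the redo dump.
CHANGE_RE = re.compile(
    r"CHANGE #\d+ TYP:\d+ CLS:\d+ AFN:(?P<afn>\d+) DBA:(?P<dba>0x[0-9a-fA-F]+) "
    r"OBJ:\d+\s+SCN:(?P<scn>0x[0-9a-fA-F]+\.[0-9a-fA-F]+) "
    r"SEQ:(?P<seq>\d+) OP:(?P<op>\d+\.\d+)"
)


def parse_change(line):
    """Return AFN, DBA, SCN, SEQ and OP of a change vector line, or None."""
    m = CHANGE_RE.search(line)
    if m is None:
        return None
    return {"afn": int(m["afn"]), "dba": m["dba"], "scn": m["scn"],
            "seq": int(m["seq"]), "op": m["op"]}


# Lines copied from the redo dumps of sequence#91417 and sequence#91425.
DUMP_LINES = [
    "CHANGE #1 TYP:1 CLS:12 AFN:232 DBA:0x3a000003 OBJ:4294967295 SCN:0x0010.2618c0fb SEQ:1 OP:22.4 ENC:0 RBL:0",
    "CHANGE #1 TYP:0 CLS:69 AFN:3 DBA:0x00c018c0 OBJ:4294967295 SCN:0x0010.2618c0ee SEQ:1 OP:5.4 ENC:0 RBL:0",
    "CHANGE #1 TYP:0 CLS:12 AFN:232 DBA:0x3a000003 OBJ:4294967295 SCN:0x0010.2618c0fb SEQ:2 OP:22.5 ENC:0 RBL:0",
]

for line in DUMP_LINES:
    change = parse_change(line)
    scope = "file 232" if change["afn"] == 232 else "other file"
    print(change["op"], change["dba"], change["scn"], scope)
```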

Comment by ubTools Support [ 16/Feb/18 12:26 PM ]
SOLUTION:

Problem:

Oracle tries to apply archivelog to a file which was not created on standby yet.

Fix:

This is Oracle bug 27229389.

Workaround:

Copy the datafile from the primary to the standby, so that recovery of the copy no longer requires the affected archivelogs.





[QA-62] ubGuard 3.0.0 Commands Created: 26/Dec/17  Updated: 26/Dec/17

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: ubTools - ubGuard Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0


 Description   
This document explains ubGuard 3.0.0 commands.

 Comments   
Comment by ubTools Support [ 26/Dec/17 01:55 PM ]
COMMANDS:

ubGuard executable is <UBGUARD_HOME>/bin/ubguard.sh (ubguard.bat for Windows).

Prerequisite for all commands:

  • The relevant Oracle listeners must be running.

Setup:

Prerequisites:

  • <UBGUARD_HOME>/conf/setup.properties must be filled.
  • Primary databases must be opened.
  • Standby databases must be in MOUNT state.

Usage:

CMD> ubguard.sh setup

Start:

Prerequisites:

  • Primary database must be opened.
  • Standby database must be in MOUNT state.

Usage:

CMD> ubguard.sh start guard -i <ubguard_instance_alias>

Stop:

Usage:

CMD> ubguard.sh stop guard -i <ubguard_instance_alias>

Status:

Usage:

CMD> ubguard.sh status guard -i <ubguard_instance_alias>

Failover:

Prerequisites:

  • Standby database must be in MOUNT state.

Usage:

CMD> ubguard.sh failover to <ubguard_instance_alias> [-f]

Default Failover:

A default failover is a failover without the "-f" option. It gets all missing archivelogs from the primary server, applies them to the standby database, and then activates the standby database as the primary database. It causes less data loss, but takes longer.

The alternative manual method by RMAN on standby:

CMD> ubguard.sh stop guard -i <ubguard_instance_alias>
--> Copy all missing archivelogs from primary server to standby server
CMD> SET ORACLE_SID=<SID>
CMD> rman target /
RMAN> SHUTDOWN IMMEDIATE;
RMAN> STARTUP MOUNT;
RMAN> RECOVER DATABASE;
RMAN> SQL 'ALTER DATABASE ACTIVATE STANDBY DATABASE';
RMAN> ALTER DATABASE OPEN;
RMAN> EXIT; 

If the manual method is used, ubGuard setup must be run again to update ubGuard's catalog.

Forced Failover:

A forced failover is a failover with the "-f" option. It neither gets archivelogs from the primary server nor applies archivelogs to the standby database; it only activates the standby database as the primary database. It causes more data loss, but takes less time.

The alternative manual method by RMAN on standby:

CMD> ubguard.sh stop guard -i <ubguard_instance_alias>
CMD> SET ORACLE_SID=<SID>
CMD> rman target /
RMAN> SHUTDOWN IMMEDIATE;
RMAN> STARTUP MOUNT;
RMAN> SQL 'ALTER DATABASE ACTIVATE STANDBY DATABASE';
RMAN> ALTER DATABASE OPEN;
RMAN> EXIT; 

If the manual method is used, ubGuard setup must be run again to update ubGuard's catalog.





[QA-60] "PRVF-5507 : NTP daemon or service is not running on any node ..." even if NTP is running. Created: 05/Mar/16  Updated: 05/Mar/16

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 11.2.0.4 RAC
Operating System: Linux
Operating System Version: Oracle Linux 7.2
Host Name: .
Database Name: .

 Description   
CVU gives the following error:
$./runcluvfy.sh stage -pre crsinst -n sygnx01,sygnx02 -verbose
.....
No NTP Daemons or Services were found to be running
PRVF-5507 : NTP daemon or service is not running on any node but NTP configuration file exists on the following node(s):
sygnx02,sygnx01
Result: Clock synchronization check using Network Time Protocol(NTP) failed


 Comments   
Comment by ubTools Support [ 05/Mar/16 03:12 PM ]
NTP status:
[root@sygnx01 ~]# systemctl status ntpd
 ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2016-03-05 14:32:46 EET; 1h 29min ago
  Process: 1074 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1081 (ntpd)
   CGroup: /system.slice/ntpd.service
           1081 /usr/sbin/ntpd -u ntp:ntp -x -g

NTP is running.

Comment by ubTools Support [ 05/Mar/16 03:17 PM ]
CVU Trace:

Generating Trace:

$ export CV_TRACELOC=/tmp
$ export SRVM_TRACE=true
$ ./runcluvfy.sh stage -pre crsinst -n sygnx01,sygnx02 -verbose
.....

Excerpt from the trace:

[20421@***.***.com] [Worker 0] [ 2016-03-05 16:20:46.657 EET ] [RuntimeExec.runCommand:77]  /tmp/CVU_11.2.0.4.0_grid/exectask.sh -chkfile /var/run/ntpd.pid
[20421@***.***.com] [Worker 0] [ 2016-03-05 16:20:46.659 EET ] [RuntimeExec.runCommand:142]  runCommand: Waiting for the process
[20421@***.***.com] [Thread-216] [ 2016-03-05 16:20:46.659 EET ] [StreamReader.run:61]  In StreamReader.run
[20421@***.***.com] [Thread-217] [ 2016-03-05 16:20:46.659 EET ] [StreamReader.run:61]  In StreamReader.run
[20421@***.***.com] [Thread-216] [ 2016-03-05 16:20:46.668 EET ] [StreamReader.run:65]  OUTPUT><CV_VRES>1</CV_VRES><CV_LOG>Exectask: file check failed</CV_LOG><CV_ERES>0</CV_ERES>
.....
[20421@sygnx01.sankomenkul.com] [main] [ 2016-03-05 16:20:46.669 EET ] [TaskDaemonLiveliness.displayDaemonLivelinessOutput:283]  Daemon 'ntpd' is not running on node: 'sygnx01'

"/var/run/ntpd.pid" doesn't exist.

Comment by ubTools Support [ 05/Mar/16 03:22 PM ]
Solution

No PID file was defined in "/etc/sysconfig/ntpd", so "/var/run/ntpd.pid" was never created. The problem was solved by adding the "-p" option as below:

#OPTIONS="-g"
OPTIONS="-x -g -p /var/run/ntpd.pid"
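
Since CVU decides whether ntpd is running by checking for "/var/run/ntpd.pid", it helps to verify the OPTIONS line before re-running cluvfy. The following Python sketch extracts the PID file from the contents of "/etc/sysconfig/ntpd". The function name is illustrative, and only the "-p <path>" form of the option is handled.

```python
import shlex


def ntpd_pidfile(sysconfig_text):
    """Return the path passed to ntpd's -p option in the active OPTIONS line,
    or None when no PID file is configured. Commented-out lines are ignored."""
    pidfile = None
    for raw in sysconfig_text.splitlines():
        line = raw.strip()
        if not line.startswith("OPTIONS="):
            continue  # skips comments such as '#OPTIONS="-g"' and other variables
        tokens = shlex.split(line)  # removes the shell quoting
        args = tokens[0].partition("=")[2].split()
        pidfile = None  # a later OPTIONS line overrides an earlier one
        for i, arg in enumerate(args):
            if arg == "-p" and i + 1 < len(args):
                pidfile = args[i + 1]
    return pidfile


before = 'OPTIONS="-g"'
after = '#OPTIONS="-g"\nOPTIONS="-x -g -p /var/run/ntpd.pid"'
print(ntpd_pidfile(before))   # None
print(ntpd_pidfile(after))    # /var/run/ntpd.pid
```

After changing the file, ntpd must be restarted so that the PID file is actually written.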

Additional note:

NTP (ntpd) has been replaced by chrony (a new feature) as the default time service in Oracle Linux 7.

Ref: Oracle Note: Unable to Configure NTP after Oracle Linux 7 Installation (Doc ID 1995703.1)





[QA-59] Unable to use the full CPU speed when CPUfreq Governor is ondemand. Created: 29/Sep/15  Updated: 02/Oct/15

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

File Attachments: PNG File AWR.png     PNG File EMTopActivity.png    
Product Version: 11.2.0.3
Operating System: Linux
Operating System Version: 2.6.32-504.23.4.el6.x86_64
Host Name: .
Database Name: .

 Description   
The customer is unable to use the full CPU speed. The CPUfreq Governor is OnDemand.

 Comments   
Comment by ubTools Support [ 29/Sep/15 12:16 PM ]
See the following notes for the basic definitions of CPUfreq Governors:
Comment by ubTools Support [ 29/Sep/15 12:21 PM ]
ENVIRONMENT:

Data:
for the CPU0(similar for the others):

[perftest1]/sys/devices/system/cpu/cpu0/cpufreq $ more *
::::::::::::::
affected_cpus
::::::::::::::
0
cpuinfo_cur_freq: Permission denied
::::::::::::::
cpuinfo_max_freq
::::::::::::::
2000000
::::::::::::::
cpuinfo_min_freq
::::::::::::::
1200000
::::::::::::::
cpuinfo_transition_latency
::::::::::::::
10000

*** ondemand: directory ***

::::::::::::::
related_cpus
::::::::::::::
0
::::::::::::::
scaling_available_frequencies
::::::::::::::
2000000 1900000 1800000 1700000 1600000 1500000 1400000 1300000 1200000
::::::::::::::
scaling_available_governors
::::::::::::::
ondemand userspace performance
::::::::::::::
scaling_cur_freq
::::::::::::::
2000000
::::::::::::::
scaling_driver
::::::::::::::
acpi-cpufreq
::::::::::::::
scaling_governor
::::::::::::::
ondemand
::::::::::::::
scaling_max_freq
::::::::::::::
2000000
::::::::::::::
scaling_min_freq
::::::::::::::
1200000
::::::::::::::
scaling_setspeed
::::::::::::::
<unsupported>

*** stats: directory ***

[perftest1]/sys/devices/system/cpu/cpu0/cpufreq $ cd ondemand
[perftest1]/sys/devices/system/cpu/cpu0/cpufreq/ondemand $ ls -ltr
total 0
-r--r--r-- 1 root root 4096 Sep 29 15:17 sampling_rate_min
-r--r--r-- 1 root root 4096 Sep 29 15:17 sampling_rate_max
-rw-r--r-- 1 root root 4096 Sep 29 15:17 up_threshold
-rw-r--r-- 1 root root 4096 Sep 29 15:17 sampling_rate
-rw-r--r-- 1 root root 4096 Sep 29 15:17 powersave_bias
-rw-r--r-- 1 root root 4096 Sep 29 15:17 ignore_nice_load
[perftest1]/sys/devices/system/cpu/cpu0/cpufreq/ondemand $ more *
::::::::::::::
ignore_nice_load
::::::::::::::
0
::::::::::::::
powersave_bias
::::::::::::::
0
::::::::::::::
sampling_rate
::::::::::::::
10000
::::::::::::::
sampling_rate_max
::::::::::::::
4294967295
::::::::::::::
sampling_rate_min
::::::::::::::
10000
::::::::::::::
up_threshold
::::::::::::::
95
[perftest1]/sys/devices/system/cpu/cpu0/cpufreq/ondemand $

View:

  • scaling_governor: CPU scaling governor is ondemand.
  • cpuinfo_min_freq: Minimum CPU frequency is 1200000 kHz (1.2 GHz).
  • cpuinfo_max_freq: Maximum CPU frequency is 2000000 kHz (2.0 GHz).
  • sampling_rate: The kernel looks at the CPU usage every 10000 us (10 ms) to make decisions about CPU frequency.
  • up_threshold: The kernel will increase the CPU frequency if the average CPU usage over a sampling_rate interval (10 ms) is higher than 95%.
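
The same view can be collected with a short script instead of reading each file by hand. The Python sketch below assumes the sysfs layout shown above (an "ondemand" directory under each CPU's "cpufreq" directory, as on this 2.6.32 kernel); the function name and the dictionary keys are illustrative.

```python
from pathlib import Path


def read_cpufreq(cpufreq_dir):
    """Read the governor settings of one CPU and convert them to readable units.

    Frequencies are stored in kHz and sampling_rate in microseconds.
    """
    base = Path(cpufreq_dir)

    def read(rel):
        return (base / rel).read_text().strip()

    return {
        "governor": read("scaling_governor"),
        "min_ghz": int(read("cpuinfo_min_freq")) / 1e6,
        "max_ghz": int(read("cpuinfo_max_freq")) / 1e6,
        "sampling_ms": int(read("ondemand/sampling_rate")) / 1000,
        "up_threshold_pct": int(read("ondemand/up_threshold")),
    }

# Example call on the server itself:
#   read_cpufreq("/sys/devices/system/cpu/cpu0/cpufreq")
```

The "ondemand" files exist only while the ondemand governor is active, so the sketch fails with FileNotFoundError under another governor.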
Comment by ubTools Support [ 29/Sep/15 12:24 PM ]
The atop tool (http://www.atoptool.nl/) will be used to monitor CPU frequencies.

From the man page of atop:

In  case  that the kernel module 'cpufreq_stats' is active (after issueing 'modprobe cpufreq_stats'), the average frequency ('avgf')
            and the average scaling percentage ('avgscal') is shown. Otherwise the current frequency ('curf') and the current scaling percentage
            ('curscal') is shown at the moment that the sample is taken.

In order to compare the CPU usages to the frequencies, the "cpufreq_stats" kernel module should be loaded. Otherwise, atop shows the frequency at the moment each sample is taken, not the average over the sampling interval.

Comment by ubTools Support [ 29/Sep/15 12:29 PM ]
METHOD:
  • The tests will be done when CPU scaling governors are ondemand and then performance.
  • The same work load will be generated by HP's LOAD RUNNER tool.
  • The results will be compared.
Comment by ubTools Support [ 29/Sep/15 12:57 PM ]
TEST1:

CPU scaling governor is ondemand.

An atop snapshot:

ATOP - avsprddbflx05                2015/09/29  15:40:21                ---------                  10s elapsed
PRC | sys    7.86s | user  80.03s | #proc   1347 | #tslpi  1791 | #tslpu     0 | #zombie    0 | no  procacct |
CPU | sys      66% | user    800% | irq      13% | idle    626% | wait     95% | avgf 1.63GHz | avgscal  81% |
cpu | sys       4% | user     86% | irq       1% | idle      6% | cpu000 w  3% | avgf 1.94GHz | avgscal  96% |
cpu | sys       4% | user     77% | irq       5% | idle     11% | cpu004 w  3% | avgf 1.90GHz | avgscal  94% |
cpu | sys       4% | user     72% | irq       0% | idle     17% | cpu001 w  7% | avgf 1.83GHz | avgscal  91% |
cpu | sys       3% | user     67% | irq       0% | idle     20% | cpu002 w  9% | avgf 1.80GHz | avgscal  90% |
cpu | sys       3% | user     61% | irq       0% | idle     28% | cpu003 w  8% | avgf 1.73GHz | avgscal  86% |
cpu | sys       6% | user     54% | irq       1% | idle     34% | cpu009 w  6% | avgf 1.62GHz | avgscal  80% |
cpu | sys       3% | user     52% | irq       1% | idle     36% | cpu005 w  8% | avgf 1.68GHz | avgscal  83% |
cpu | sys       7% | user     47% | irq       1% | idle     29% | cpu008 w 16% | avgf 1.67GHz | avgscal  83% |
cpu | sys       6% | user     45% | irq       0% | idle     48% | cpu013 w  1% | avgf 1.52GHz | avgscal  76% |
cpu | sys       7% | user     39% | irq       1% | idle     53% | cpu015 w  1% | avgf 1.49GHz | avgscal  74% |
cpu | sys       3% | user     41% | irq       0% | idle     49% | cpu006 w  7% | avgf 1.57GHz | avgscal  78% |
cpu | sys       5% | user     34% | irq       0% | idle     52% | cpu010 w  9% | avgf 1.51GHz | avgscal  75% |
cpu | sys       2% | user     35% | irq       1% | idle     55% | cpu007 w  7% | avgf 1.55GHz | avgscal  77% |
cpu | sys       4% | user     32% | irq       0% | idle     63% | cpu014 w  1% | avgf 1.43GHz | avgscal  71% |
cpu | sys       4% | user     31% | irq       0% | idle     59% | cpu011 w  6% | avgf 1.46GHz | avgscal  72% |
cpu | sys       2% | user     26% | irq       0% | idle     68% | cpu012 w  3% | avgf 1.43GHz | avgscal  71% |
CPL | avg1    5.33 | avg5    5.46 | avg15   4.57 | csw   377039 | intr  323539 |              | numcpu    16 |
MEM | tot   126.1G | free   38.8G | cache   3.6G | dirty   4.0M | buff  146.3M | slab  577.8M |              |
SWP | tot    17.1G | free   17.1G |              |              |              | vmcom  12.9G | vmlim  42.6G |
NET | transport    | tcpi   20869 | tcpo   21016 | udpi   70875 | udpo   71067 | tcpao     33 | tcppo      1 |
NET | network      | ipi   128214 | ipo    92084 | ipfrw      0 | deliv  91742 | icmpi      0 | icmpo      0 |

  PID    TID   SYSCPU   USRCPU    VGROW   RGROW  RUID       EUID       THR  ST   EXC  S   CPU  CMD        1/64
13661      -    0.15s    4.50s   -24.0M  -14.3M  grid       oracle       1  --     -  R   47%  oracle
15747      -    0.16s    3.80s       0K    684K  grid       oracle       1  --     -  S   40%  oracle
13733      -    0.32s    3.57s   32768K  31360K  grid       oracle       1  --     -  R   39%  oracle
27274      -    0.61s    3.21s   24576K  11976K  grid       oracle       1  --     -  R   39%  oracle
14869      -    0.17s    3.29s       0K  -1880K  grid       oracle       1  --     -  S   35%  oracle

The "CPU" shows overall statistics for all CPUs.
The "cpu" shows statistics for single CPU.

Analysis:

  • Although the maximum CPU frequency is 2.0 GHz, the server could not use its full speed. It averaged 1.63 GHz, which is 81% of the full CPU speed.
  • When CPU usage was 91% (sys:4+user:86+irq:1) on cpu000, it averaged 1.94 GHz, which is 96% of the full CPU speed.
  • When CPU usage was 51% (sys:6+user:45+irq:0) on cpu013, it averaged 1.52 GHz, which is 76% of the full CPU speed.
  • When CPU usage was 28% (sys:2+user:26+irq:0) on cpu012, it averaged 1.43 GHz, which is 71% of the full CPU speed.
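
The per-CPU figures in this analysis can be extracted from the atop snapshot automatically. The Python sketch below parses the "cpu" lines and adds sys, user and irq into one busy percentage. The regular expression is written against the line layout of this snapshot only and is an assumption, not an atop interface.

```python
import re

# One per-CPU line of the atop snapshot (lower-case "cpu", not the "CPU" summary).
CPU_LINE = re.compile(
    r"cpu\s*\|\s*sys\s+(?P<sys>\d+)%\s*\|\s*user\s+(?P<user>\d+)%"
    r"\s*\|\s*irq\s+(?P<irq>\d+)%\s*\|\s*idle\s+(?P<idle>\d+)%"
    r"\s*\|\s*(?P<name>cpu\d+)\s+w\s+(?P<wait>\d+)%"
    r"\s*\|\s*avgf\s+(?P<avgf>[\d.]+)GHz\s*\|\s*avgscal\s+(?P<avgscal>\d+)%"
)


def parse_cpu_line(line):
    """Return name, busy % (sys+user+irq), average frequency and scaling %,
    or None for lines that are not per-CPU lines."""
    m = CPU_LINE.match(line.strip())
    if m is None:
        return None
    busy = int(m["sys"]) + int(m["user"]) + int(m["irq"])
    return {"name": m["name"], "busy_pct": busy,
            "avgf_ghz": float(m["avgf"]), "avgscal_pct": int(m["avgscal"])}


SAMPLE = [
    "cpu | sys       4% | user     86% | irq       1% | idle      6% | cpu000 w  3% | avgf 1.94GHz | avgscal  96% |",
    "cpu | sys       6% | user     45% | irq       0% | idle     48% | cpu013 w  1% | avgf 1.52GHz | avgscal  76% |",
    "cpu | sys       2% | user     26% | irq       0% | idle     68% | cpu012 w  3% | avgf 1.43GHz | avgscal  71% |",
]

for line in SAMPLE:
    cpu = parse_cpu_line(line)
    print(cpu["name"], cpu["busy_pct"], cpu["avgf_ghz"], cpu["avgscal_pct"])
```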
Comment by ubTools Support [ 29/Sep/15 01:22 PM ]
TEST2:

CPU scaling governor is performance.

An atop snapshot:

ATOP - avsprddbflx05                2015/09/29  16:16:27                ---------                  10s elapsed
PRC | sys    7.06s | user  86.40s | #proc   1313 | #tslpi  1756 | #tslpu     0 | #zombie    0 | no  procacct |
CPU | sys      57% | user    864% | irq      14% | idle    623% | wait     43% | avgf 2.00GHz | avgscal 100% |
cpu | sys       3% | user     80% | irq       7% | idle     10% | cpu004 w  1% | avgf 2.00GHz | avgscal 100% |
cpu | sys       5% | user     81% | irq       2% | idle     11% | cpu000 w  2% | avgf 2.00GHz | avgscal 100% |
cpu | sys       3% | user     82% | irq       1% | idle     14% | cpu001 w  1% | avgf 2.00GHz | avgscal 100% |
cpu | sys       3% | user     74% | irq       0% | idle     21% | cpu002 w  2% | avgf 2.00GHz | avgscal 100% |
cpu | sys       3% | user     73% | irq       0% | idle     22% | cpu003 w  2% | avgf 2.00GHz | avgscal 100% |
cpu | sys       3% | user     62% | irq       0% | idle     32% | cpu005 w  3% | avgf 2.00GHz | avgscal 100% |
cpu | sys       6% | user     58% | irq       1% | idle     26% | cpu008 w 10% | avgf 2.00GHz | avgscal 100% |
cpu | sys       3% | user     51% | irq       0% | idle     43% | cpu006 w  2% | avgf 2.00GHz | avgscal 100% |
cpu | sys       3% | user     47% | irq       0% | idle     47% | cpu010 w  3% | avgf 2.00GHz | avgscal 100% |
cpu | sys       3% | user     46% | irq       0% | idle     49% | cpu013 w  2% | avgf 2.00GHz | avgscal 100% |
cpu | sys       2% | user     44% | irq       1% | idle     50% | cpu007 w  3% | avgf 2.00GHz | avgscal 100% |
cpu | sys       6% | user     40% | irq       1% | idle     48% | cpu009 w  6% | avgf 2.00GHz | avgscal 100% |
cpu | sys       6% | user     33% | irq       1% | idle     57% | cpu011 w  3% | avgf 2.00GHz | avgscal 100% |
cpu | sys       2% | user     34% | irq       0% | idle     60% | cpu012 w  3% | avgf 2.00GHz | avgscal 100% |
cpu | sys       4% | user     28% | irq       0% | idle     66% | cpu014 w  2% | avgf 2.00GHz | avgscal 100% |
cpu | sys       2% | user     28% | irq       0% | idle     69% | cpu015 w  1% | avgf 2.00GHz | avgscal 100% |
CPL | avg1    5.98 | avg5    6.41 | avg15   4.75 | csw   382133 | intr  340254 |              | numcpu    16 |
MEM | tot   126.1G | free   36.6G | cache   5.3G | dirty  28.9M | buff  193.0M | slab  836.2M |              |
SWP | tot    17.1G | free   17.1G |              |              |              | vmcom  13.0G | vmlim  42.6G |
NET | transport    | tcpi   10272 | tcpo   10302 | udpi   75222 | udpo   75458 | tcpao     30 | tcppo      2 |
NET | network      | ipi   111030 | ipo    85760 | ipfrw      0 | deliv  85494 | icmpi      0 | icmpo      0 |

  PID    TID   SYSCPU   USRCPU    VGROW   RGROW  RUID       EUID       THR  ST   EXC  S   CPU  CMD        1/62
15847      -    0.17s    4.59s       0K    896K  grid       oracle       1  --     -  S   48%  oracle
14867      -    0.14s    3.79s       0K   2220K  grid       oracle       1  --     -  R   40%  oracle
15835      -    0.15s    3.76s    8192K    384K  grid       oracle       1  --     -  R   39%  oracle
14871      -    0.24s    3.59s       0K      0K  grid       oracle       1  --     -  R   39%  oracle
15849      -    0.14s    3.55s       0K  -1216K  grid       oracle       1  --     -  R   37%  oracle

Analysis:

  • The maximum CPU frequency is 2.0 GHz and all CPUs could use 100% of the full CPU speed.
Comment by ubTools Support [ 29/Sep/15 03:19 PM ]
COMPARISON:

30 minutes load test results...

1st: When CPU scaling governor is ondemand.
2nd: When CPU scaling governor is performance.

Data:

Top Activity: (image, see attachment EMTopActivity.png)

AWR: (image, see attachment AWR.png)

Analysis:

  • "row cache lock" wait time decreased because the holders finished their work faster and, as a result, held the resources for a shorter time.
  • DB time decreased 31.4%, mostly due to the decrease in "row cache lock" waits.
  • Logical reads increased 16.3% because more buffer gets could be done at the higher CPU frequency.
Comment by ubTools Support [ 29/Sep/15 03:34 PM ]
SUMMARY:

Analysis:

  • Changing CPU scaling governor from "ondemand" to "performance" increased the performance.
  • Performance improvement is noticeable when:
    • The difference between the minimum and maximum CPU frequencies is high.
    • CPU usage is not heavy (below up_threshold: 95%).
    • There are sessions waiting for other sessions on CPU.

Recommendations:

  • If performance is more important than heat generation, set the CPU scaling governor to "performance".
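
A minimal Python sketch of applying this recommendation through sysfs follows. It writes the governor name into each CPU's "scaling_governor" file, using the directory layout shown in the ENVIRONMENT comment. The function name is illustrative; the change needs root privileges and, when made this way, does not survive a reboot.

```python
from pathlib import Path


def set_governor(governor, cpu_root="/sys/devices/system/cpu"):
    """Set the scaling governor on every cpuN directory below cpu_root.

    Returns the names of the CPUs that were changed. Raises ValueError when
    the governor is not listed in scaling_available_governors.
    """
    changed = []
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        available = (gov_file.parent / "scaling_available_governors").read_text().split()
        if governor not in available:
            raise ValueError(f"{governor!r} not available, choose from {available}")
        gov_file.write_text(governor + "\n")
        changed.append(gov_file.parent.parent.name)
    return changed

# Example call as root on the server itself:
#   set_governor("performance")
```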
Comment by ubTools Support [ 30/Sep/15 09:24 AM ]
CPU TIME and LOGICAL READS:

Data:

                             ondemand     performance   Difference(%)
CPU time per second          7.8s         8.3s          6.4
Logical reads per second     535,236.2    622,625.2     16.3
CPU time per logical read    14.6us       13.3us        8.9

Analysis:

An 8.9% improvement in CPU time per logical read caused a 31.4% improvement in DB time.
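
The figures in the table can be re-derived with a few lines of arithmetic. The Python sketch below uses only the values from the table; the function names are illustrative. Note that the 8.9% difference is computed from the rounded per-read values (14.6us and 13.3us), as in the table.

```python
def pct_diff(before, after):
    """Relative difference between two measurements, in percent of 'before'."""
    return abs(after - before) / before * 100.0


def us_per_read(cpu_seconds, logical_reads):
    """CPU time per logical read in microseconds."""
    return cpu_seconds / logical_reads * 1e6


# Values from the table: ondemand first, performance second.
cpu_time = (7.8, 8.3)
logical_reads = (535236.2, 622625.2)

per_read = tuple(round(us_per_read(c, r), 1) for c, r in zip(cpu_time, logical_reads))

print(per_read)                             # (14.6, 13.3)
print(round(pct_diff(*cpu_time), 1))        # 6.4
print(round(pct_diff(*logical_reads), 1))   # 16.3
print(round(pct_diff(*per_read), 1))        # 8.9
```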

Comment by ubTools Support [ 02/Oct/15 01:45 PM ]
The focus here is to show how CPU scaling governor affects Oracle service and wait times; not to show how to tune Oracle events such as "row cache lock" above.




[QA-57] ORA-04030 returned by "__libc_sbrk(0x0000000001010020) Err#12 ENOMEM" Created: 02/Dec/13  Updated: 28/Feb/17

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Third-party Problem Votes: 0

Product Version: 10.2.0.4
Operating System: IBM-AIX
Operating System Version: 6.1
Host Name: .
Database Name: .

 Description   
The customer encountered the following problem:
ORA-04030: out of process memory when trying to allocate 2093096 bytes (QERHJ hash-joi,QERHJ list array)


 Comments   
Comment by ubTools Support [ 02/Dec/13 02:52 PM ]
ANALYSIS 1:

PGASTAT:

SQL> select * from v$pgastat order by value;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
UNIT
------------
maximum PGA used for manual workareas                                     0
bytes

over allocation count                                                     0


total PGA used for manual workareas                                       0
bytes


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
UNIT
------------
cache hit percentage                                                  98.53
percent

process count                                                           126


max processes count                                                     135



NAME                                                                  VALUE
---------------------------------------------------------------- ----------
UNIT
------------
recompute count (total)                                              132370


total PGA used for auto workareas                                   4399104
bytes

total freeable PGA memory                                         106823680
bytes


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
UNIT
------------
maximum PGA used for auto workareas                               153909248
bytes

global memory bound                                               214743040
bytes

total PGA inuse                                                   747691008
bytes


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
UNIT
------------
total PGA allocated                                              1180690432
bytes

aggregate PGA auto target                                        1265577984
bytes

maximum PGA allocated                                            1299183616
bytes


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
UNIT
------------
aggregate PGA target parameter                                   2147483648
bytes

extra bytes read/written                                         1.2622E+10
bytes

PGA memory freed back to OS                                      6.0510E+10
bytes


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
UNIT
------------
bytes processed                                                  8.5171E+11
bytes


19 rows selected.

SQL>

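The pga_aggregate_target parameter is not exceeded: both total PGA allocated (1180690432 bytes) and maximum PGA allocated (1299183616 bytes) are below the aggregate PGA target parameter (2147483648 bytes).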
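
The comparison can be made explicit as a share of the target. This is a minimal sketch over values copied from the v$pgastat output above; the dictionary and function names are illustrative.

```python
# Values copied from the v$pgastat output above (bytes).
pgastat = {
    "total PGA inuse": 747_691_008,
    "total PGA allocated": 1_180_690_432,
    "maximum PGA allocated": 1_299_183_616,
    "aggregate PGA target parameter": 2_147_483_648,
}

def pct_of_target(stat_name, stats=pgastat):
    """Express one statistic as a percentage of the PGA target parameter."""
    return stats[stat_name] / stats["aggregate PGA target parameter"] * 100.0

for name in ("total PGA inuse", "total PGA allocated", "maximum PGA allocated"):
    print(f"{name:<24} {pct_of_target(name):5.1f}% of target")
```

Even the historical maximum stays well under the target, which supports looking outside Oracle's own PGA accounting for the cause.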

HEAPDUMP:

Set Up:

To set up tracing to trap the ORA-4030 error, run the following on the server in SQL*Plus:

SQL> ALTER SYSTEM SET EVENTS '4030 trace name heapdump level 536870917;name errorstack level 3';
Once the error reoccurs with the event set, you can turn off tracing using the following command in SQL*Plus:

ALTER SYSTEM SET EVENTS '4030 trace name context off; name context off';

Ref: Oracle note: Master Note for Diagnosing OS Memory Problems and ORA-4030 (Doc ID 1088267.1)

TRACE:

Heap:

HEAP DUMP heap name="session heap"  desc=11044a830
 extent sz=0xff80 alt=32767 het=32767 rec=0 flg=2 opc=2
 parent=1101981f0 owner=70000033f6789e8 nex=0 xsz=0x0
.....
Total heap size    =108241256

Internal Parameters:

  _pga_max_size                       = 419420 KB
.....
  _smm_max_size                       = 209710 KB
  _smm_px_max_size                 = 1048576 KB

No PGA limits are exceeded.
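
The units differ between the heap dump (bytes) and the internal parameters (KB), so a quick conversion helps. This is an illustrative sketch only: comparing the session heap with _pga_max_size is an assumption made here for orientation, not a statement of how Oracle enforces the limit.

```python
KB = 1024

requested_bytes = 2_093_096        # from the ORA-04030 message
session_heap_bytes = 108_241_256   # "Total heap size" in the heap dump
pga_max_size_kb = 419_420          # _pga_max_size from the trace

session_heap_kb = session_heap_bytes / KB
after_request_kb = (session_heap_bytes + requested_bytes) / KB
share_of_limit = after_request_kb / pga_max_size_kb * 100.0

print(f"session heap:       {session_heap_kb:,.0f} KB")
print(f"heap + request:     {after_request_kb:,.0f} KB")
print(f"share of _pga_max_size: {share_of_limit:.1f}%")
```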

Comment by ubTools Support [ 02/Dec/13 03:02 PM ]
ANALYSIS 2:

System Calls:

truss -fae -o <outputFile> -p <V$PROCESS.SPID> excerpt:

14483680:	43122723: __libc_sbrk(0x0000000001010020)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000000FE0020)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000001004020)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000000FE0020)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000001001020)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000000FE0020)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000001000420)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000000FDF420)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000001000120)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000000FDF420)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000001000060)	Err#12 ENOMEM
14483680:	43122723: __libc_sbrk(0x0000000000FDF420)	Err#12 ENOMEM
14483680:	43122723: statx("/oracle/admin/ATSD/udump", 0x0FFFFFFFFFFF41A8, 176, 0) = 0
14483680:	43122723: close(5)				= 0
14483680:	43122723: statx("/oracle/admin/ATSD/udump/atsd2_ora_14483680.trc", 0x0FFFFFFFFFFF44C0, 176, 01) Err#2  ENOENT
14483680:	43122723: statx("/oracle/admin/ATSD/udump/atsd2_ora_14483680.trc", 0x0FFFFFFFFFFF44C0, 176, 0) Err#2  ENOENT
14483680:	43122723: kopen("/oracle/admin/ATSD/udump/atsd2_ora_14483680.trc", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP) = 5
14483680:	43122723: kwrite(5, 0x0000000104A1C468, 0)	= 0
14483680:	43122723: kwrite(5, " / o r a c l e / a d m i".., 47) = 47

When the ORA-4030 error occurred, the trace file "/oracle/admin/ATSD/udump/atsd2_ora_14483680.trc" was created. So, the problem occurred before its generation, at __libc_sbrk, which returned ENOMEM. The system could not give memory to the Oracle process.
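
To see how much memory each failed call asked for, the hexadecimal arguments can be converted to bytes. This is a rough sketch, assuming the __libc_sbrk argument is the requested increment in bytes, as it is for sbrk(); the regular expression and function name are illustrative.

```python
import re

# Three lines copied from the truss excerpt above, plus one successful call.
TRUSS_EXCERPT = """\
14483680:\t43122723: __libc_sbrk(0x0000000001010020)\tErr#12 ENOMEM
14483680:\t43122723: __libc_sbrk(0x0000000000FE0020)\tErr#12 ENOMEM
14483680:\t43122723: __libc_sbrk(0x0000000001000060)\tErr#12 ENOMEM
14483680:\t43122723: close(5)\t\t\t\t= 0
"""

SBRK_FAILURE = re.compile(r"__libc_sbrk\((0x[0-9A-Fa-f]+)\)\s+Err#(\d+)\s+(\w+)")

def failed_sbrk_requests(truss_text):
    """Return (requested_bytes, errno_name) for every failed __libc_sbrk call."""
    return [(int(m.group(1), 16), m.group(3))
            for m in SBRK_FAILURE.finditer(truss_text)]

for size, err in failed_sbrk_requests(TRUSS_EXCERPT):
    print(f"{size:>12,d} bytes ({size / 2**20:.2f} MiB) -> {err}")
```

Under that assumption, every failed request is roughly 16 MiB, and the process retries with slightly smaller sizes before giving up.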

User resource limits:

oracle@atlasdb2:/home/oracle/dunal >ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user)  unlimited
oracle@atlasdb2:/home/oracle/dunal >

No limit was found for the oracle user.

Comment by ubTools Support [ 02/Dec/13 03:10 PM ]
The system admin will work on this problem. The solution will be added here.
Comment by ubTools Support [ 28/Feb/17 09:01 AM ]
There was no response from the system admin. However, the problem was an operating system resource limit that prevented the oracle user's processes from allocating memory.




[QA-56] "cursor: mutex X" Created: 27/Jul/13  Updated: 28/Jul/13

Status: Open
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Unresolved Votes: 0

Product Version: 11.2.0.3.7
Operating System: Solaris
Host Name: .
Database Name: .

 Description   
The customer upgraded from Oracle 10.2.0.5 to Oracle 11.2.0.3.7 and then encountered a library cache lock problem.

Some excerpt from the AWR:

Elapsed:	 	 15.01 (mins)	 	 
DB Time:	 	 6,548.72 (mins)	 	 
.....
Top 5 Timed Foreground Events

Event	Waits	Time(s)	Avg wait (ms)	% DB time	Wait Class
library cache lock	425	364,186	856909	92.69	Concurrency
enq: TX - row lock contention	288	11,500	39930	2.93	Application
TCP Socket (KGAS)	136,317	9,552	70	2.43	Network
DB CPU	 	4,167	 	1.06	 
db file sequential read	742,541	2,176	3	0.55	User I/O
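
The severity is easier to judge after a little arithmetic on the excerpt. This is a small sketch using only the AWR figures above; the variable names are illustrative.

```python
# Figures copied from the AWR excerpt above.
elapsed_min = 15.01
db_time_min = 6_548.72
lcl_waits = 425            # library cache lock waits
lcl_time_s = 364_186       # library cache lock wait time in seconds

# Average number of sessions active in the database during the snapshot.
avg_active_sessions = db_time_min / elapsed_min

# Average library cache lock wait, in milliseconds and in minutes.
avg_wait_ms = lcl_time_s / lcl_waits * 1000
avg_wait_min = lcl_time_s / lcl_waits / 60

# Share of DB time spent on library cache lock.
lcl_pct_db_time = lcl_time_s / (db_time_min * 60) * 100.0

print(f"average active sessions: {avg_active_sessions:.0f}")
print(f"avg library cache lock wait: {avg_wait_ms:,.0f} ms ({avg_wait_min:.1f} min)")
print(f"library cache lock share of DB time: {lcl_pct_db_time:.2f}%")
```

An average wait of more than 14 minutes inside a 15-minute snapshot indicates that most waiters were blocked for nearly the whole interval.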


 Comments   
Comment by ubTools Support [ 27/Jul/13 10:42 AM ]
Some excerpt from the HANGANALYZE trace:
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (opusdata.opusdata)
                   os id: 25465
              process id: 256, oracle@<hostname>
              session id: 39
        session serial #: 61345
    }
    is waiting for 'library cache lock' with wait info:
    {
                      p1: 'handle address'=0x22038dff68
                      p2: 'lock address'=0x21ef361978
                      p3: '100*mode+namespace'=0x520002
            time in wait: 5 min 33 sec
           timeout after: never
                 wait id: 5128
                blocking: 0 sessions
            wait history:
              * time between current wait and wait #1: 0.001507 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 0.003166 sec
                     wait id: 5127            p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000002 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000002 sec
                     wait id: 5126            p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000011 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 0.003906 sec
                     wait id: 5125            p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (opusdata.opusdata)
                   os id: 22379
              process id: 500, oracle@<hostname>
              session id: 8008
        session serial #: 62080
    }
    which is waiting for 'cursor: mutex X' with wait info:
    {
                      p1: 'idn'=0xd4d88873
                      p2: 'value'=0x1
                      p3: 'where'=0x400000000
            time in wait: 0.000000 sec
      heur. time in wait: 1 min 3 sec
           timeout after: never
                 wait id: 7092018
                blocking: 426 sessions
            wait history:
              * time between current wait and wait #1: 0.000007 sec
              1.       event: 'cursor: mutex X'
                 time waited: 0.000002 sec
                     wait id: 7092017         p1: 'idn'=0xd4d88873
                                              p2: 'value'=0x1
                                              p3: 'where'=0x400000000
              * time between wait #1 and #2: 0.000009 sec
              2.       event: 'cursor: mutex X'
                 time waited: 0.000002 sec
                     wait id: 7092016         p1: 'idn'=0xd4d88873
                                              p2: 'value'=0x1
                                              p3: 'where'=0x400000000
              * time between wait #2 and #3: 0.000007 sec
              3.       event: 'cursor: mutex X'
                 time waited: 0.000003 sec
                     wait id: 7092015         p1: 'idn'=0xd4d88873
                                              p2: 'value'=0x1
                                              p3: 'where'=0x400000000
    }
 
Chain 1 Signature: 'cursor: mutex X'<='library cache lock'
Chain 1 Signature Hash: 0xfbcb6c60
.....
              process id: 3136, oracle@<hostname>
.....
    is waiting for 'library cache lock' with wait info:
.....
    and is blocked by 'instance: 1, os id: 22379, session id: 8008',

Many other sessions in the trace are waiting on library cache lock, and they are blocked by SID#8008, which is waiting on cursor: mutex X. SID#8008 blocks 426 sessions.

Comment by ubTools Support [ 27/Jul/13 01:03 PM ]
Unfortunately, the blocker session SID#8008 exited, but a new blocker, SID#7579, appeared. An excerpt from its ERRORSTACK LEVEL 3 trace:
KGX Atomic Operation Log 3c6bb9388
       Mutex 22c1df2c18(7579, 0) idn d4d88873 oper EXCL
       Cursor Parent uid 7579 efd 15 whr 1 slp 0
       oper=OPERATION_DEFAULT pt1=0 pt2=0 pt3=0
       pt4=0 u41=0 stt=0
      KGX Atomic Operation Log 3c6bb93d8
       Mutex 22c1df2d30(0, 1) idn d4d88873 oper GET_EXCL
       hash table uid 7579 efd 15 whr 4 slp 36782
       oper=OPERATION_DEFAULT pt1=0 pt2=0 pt3=0
       pt4=0 u41=0 stt=0

Mutex IDN 0xd4d88873 is held in EXCL mode at mutex address 0x22c1df2c18. The holder SID is 7579. The mutex type is Cursor Parent.
The same mutex IDN is requested in EXCL mode (oper GET_EXCL) at mutex address 0x22c1df2d30, and the request waits. The mutex type is hash table.
See Oracle note Understanding and Reading Systemstates (Doc ID 423153.1) for interpretation.

The waiting session holds the same mutex IDN in EXCL mode, but at a different mutex address and with a different mutex type. No other holder was encountered in the SYSTEMSTATE trace.

Comment by ubTools Support [ 28/Jul/13 01:26 PM ]
Some excerpt from AWR:
Foreground Wait Events
.....
Event	Waits	%Time -outs	Total Wait Time (s)	Avg wait (ms)	Waits /txn	% DB time
.....
cursor: mutex X	8,066,619	0	835	0	138.17	0.21
.....
Mutex Sleep Summary

ordered by number of sleeps desc

Mutex Type	Location	Sleeps	Wait Time (ms)
hash table	kkshhcdel [KKSHBKLOC4]	8,061,510	0

The mutex sleep location is kkshhcdel [KKSHBKLOC4]. Nothing was found about it in Metalink.
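
AWR rounds the average "cursor: mutex X" wait to 0 ms, which hides how short the individual waits are. This is a small sketch over the figures in the excerpt; relating the sleep count to the wait count is done here only for illustration.

```python
# Figures copied from the AWR excerpt above.
mutex_waits = 8_066_619        # cursor: mutex X waits
mutex_wait_time_s = 835        # total wait time in seconds
kkshhcdel_sleeps = 8_061_510   # sleeps at kkshhcdel [KKSHBKLOC4]

avg_wait_us = mutex_wait_time_s / mutex_waits * 1e6
sleeps_per_wait_pct = kkshhcdel_sleeps / mutex_waits * 100.0

print(f"average cursor: mutex X wait: {avg_wait_us:.0f} us")
print(f"sleeps at kkshhcdel relative to waits: {sleeps_per_wait_pct:.2f}%")
```

The waits are very short but extremely numerous, and the sleep count at this single location is nearly equal to the total number of waits.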

Comment by ubTools Support [ 28/Jul/13 05:33 PM ]
Workaround:

Change the SQL text.

Comment by ubTools Support [ 28/Jul/13 05:36 PM ]
The customer will open an SR with Oracle Support. I'll update this issue later with the SR result.




[QA-55] deinstall tool drops database Created: 26/Mar/13  Updated: 26/Mar/13

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Administration Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 11.2.0.3
Operating System: Solaris
Host Name: .
Database Name: .

 Description   
Oracle® Database Upgrade Guide 11g Release 2 (11.2) Part Number E23633-07 states:
Known Issue with the Deinstallation Tool for This Release
Cause: After upgrading from 11.2.0.1 or 11.2.0.2 to 11.2.0.3, deinstallation of the Oracle home in the earlier release of Oracle Database
may result in the deletion of the old Oracle base that was associated with it. This may also result in the deletion of data files, audit files, etc.,
which are stored under the old Oracle base.

Action: Before deinstalling the Oracle home in the earlier release, edit the orabase_cleanup.lst file found in the $Oracle_Home/utl directory and
remove the "oradata" and "admin" entries. Then, deinstall the Oracle home using the 11.2.0.3 deinstallation tool.

Ref: http://docs.oracle.com/cd/E11882_01/server.112/e23633/intro.htm#BHCEECDJ

In our case:

  • There were no oradata or admin entries in $ORACLE_HOME/utl/orabase_cleanup.lst.
  • There were no database files in $ORACLE_BASE/oradata, only a soft link to the ASM disk that contains the database.

Nevertheless, the deinstall tool dropped the database.
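
The check recommended by the Upgrade Guide can be scripted. This is a minimal sketch, assuming orabase_cleanup.lst holds one entry per line; the file format, the function name, and the simple substring match are assumptions made for illustration.

```python
from pathlib import Path

RISKY_ENTRIES = ("oradata", "admin")

def risky_cleanup_entries(lst_path, risky=RISKY_ENTRIES):
    """Return (line_number, text) for lines that mention a risky entry.

    Plain substring matching is used, so unrelated entries that merely
    contain one of the words are reported as well.
    """
    hits = []
    lines = Path(lst_path).read_text().splitlines()
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if any(word in text.lower() for word in risky):
            hits.append((number, text))
    return hits
```

Calling risky_cleanup_entries() with the path to $ORACLE_HOME/utl/orabase_cleanup.lst returns an empty list when no such entries exist. As this case shows, an empty result does not guarantee that the deinstall tool will leave the database alone.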



 Comments   
Comment by ubTools Support [ 26/Mar/13 03:52 PM ]
Be careful when using deinstall. If you want to keep your database, don't use it until this problem is fixed.




[QA-54] Unable to close database by srvctl and racgimon takes 100% of CPU. Created: 26/Mar/13  Updated: 26/Mar/13

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 11.2.0.3
Operating System: Solaris
Host Name: .
Database Name: .

 Description   
Unable to close the database:
$ srvctl stop database -d ESIBASE
PRKP-1002 : Error stopping instance ESIBASE1 on node ersteracsrv1
CRS-0216: Could not stop resource 'ora.ESIBASE.ESIBASE1.inst'.
$

Two racgimon processes each take 100% of a CPU (USR + SYS) in the prstat output:

  PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
  8286 oracle    64  36 0.0 0.0 0.0 0.0 0.0 0.0   0  37 .15   0 racgimon/1
  7903 oracle    65  35 0.0 0.0 0.0 0.0 0.0 0.0   0  35 .15   0 racgimon/1
 10015 root     0.0 0.8 0.0 0.0 0.0 0.0  99 0.0  21   1 398   0 prstat/1
  7818 oracle   0.2 0.0 0.0 0.0 0.0 0.0 100 0.0  62   1  7K   0 oracle/2
 10055 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0   7   0 318   0 sleep/1
   816 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  30   1 275   0 init.cssd/1
  1916 oracle   0.1 0.0 0.0 0.0 0.0 0.0 100 0.0  60   0 719  59 oracle/1
  1878 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 170   0 694   1 oracle/1
  1874 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 170   0 691   1 oracle/1
  1872 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  81   0 401  31 oracle/1
  1621 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  53   0 343   1 oracle/1
  1894 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  21   0  54   0 oracle/2
  1625 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  64   0 201   1 oracle/1
  1623 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  64   0 201   1 oracle/1
  1870 oracle   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  59   0 347   1 oracle/2
 NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU
    72 oracle     21G   21G    57%   0:07:37  26%
    54 root      117M  180M   0.5%   0:00:06 0.2%
     1 noaccess  136M  207M   0.6%   0:00:12 0.0%
     6 daemon   6408K 7496K   0.0%   0:00:00 0.0%
     1 smmsp    1136K 7244K   0.0%   0:00:00 0.0%
Total: 134 processes, 439 lwps, load averages: 2.26, 1.82, 1.13
#


 Comments   
Comment by ubTools Support [ 26/Mar/13 03:01 PM ]
ANALYSIS 1:

truss output of one of the racgimon processes:

# truss -fae -p 8286

8286:   close(346745079)                                Err#9 EBADF
8286:   close(346745080)                                Err#9 EBADF
8286:   close(346745081)                                Err#9 EBADF
8286:   close(346745082)                                Err#9 EBADF
8286:   close(346745083)                                Err#9 EBADF
8286:   close(346745084)                                Err#9 EBADF
8286:   close(346745085)                                Err#9 EBADF
8286:   close(346745086)                                Err#9 EBADF

# truss -faec -p 8286
psargs: /u01/app/oracle/product/10.2/bin/racgimon startd ESIBASE
^C
syscall               seconds   calls  errors
close                   2.374 1857265 1857265
                     --------  ------   ----
sys totals:             2.374 1857265 1857265
usr time:               1.079
elapsed:               23.090
#

Comment:

racgimon could not close file descriptors. It repeatedly calls close() on a different file descriptor number, incremented by 1 in each subsequent close() system call.

Every close() system call returns EBADF, which means "The fildes argument is not a valid file descriptor".
Ref: http://docs.oracle.com/cd/E23823_01/html/816-5167/close-2.html#REFMAN2close-2

Comment by ubTools Support [ 26/Mar/13 03:09 PM ]
ANALYSIS 2:

prctl output of racgimon:

# prctl 8286
process: 8286: /u01/app/oracle/product/10.2/bin/racgimon startd ESIBASE
NAME    PRIVILEGE       VALUE    FLAG   ACTION
RECIPIENT
process.max-port-events
        privileged      65.5K       -   deny
 -
        system          2.15G     max   deny
 -
process.max-msg-messages
        privileged      8.19K       -   deny
 -
        system          4.29G     max   deny
 -
process.max-msg-qbytes
        privileged      64.0KB      -   deny
 -
        system          16.0EB    max   deny
 -
process.max-sem-ops
        privileged        512       -   deny
 -
        system          2.15G     max   deny
 -
process.max-sem-nsems
        privileged        512       -   deny
 -
        system          32.8K     max   deny
 -
process.max-address-space
        privileged      16.0EB    max   deny
 -
        system          16.0EB    max   deny
 -
process.max-file-descriptor
        privileged      2.15G     max   deny
 -
        system          2.15G     max   deny
 -
process.max-core-size
        basic               0B      -   deny
8286
        system          8.00EB    max   deny
 -
process.max-stack-size
        basic           10.0MB      -   deny
8286
        privileged       125TB      -   deny
 -
        system           125TB    max   deny
 -
process.max-data-size
        privileged      16.0EB    max   deny
 -
        system          16.0EB    max   deny
 -
process.max-file-size
        privileged      8.00EB    max   deny,signal=XFSZ
 -
        system          8.00EB    max   deny
 -
process.max-cpu-time
        privileged      18.4Es    inf   signal=XCPU
 -
        system          18.4Es    inf   none
 -
task.max-cpu-time
        system          18.4Es    inf   none
 -
task.max-lwps
        system          2.15G     max   deny
 -
project.max-contracts
        privileged      10.0K       -   deny
 -
        system          2.15G     max   deny
 -
project.max-device-locked-memory
        privileged      2.19GB      -   deny
 -
        system          16.0EB    max   deny
 -
project.max-locked-memory
        system          16.0EB    max   deny
 -
project.max-port-ids
        privileged      8.19K       -   deny
 -
        system          65.5K     max   deny
 -
project.max-shm-memory
        privileged      24.0GB      -   deny
 -
        system          16.0EB    max   deny
 -
project.max-shm-ids
        privileged        128       -   deny
 -
        system          16.8M     max   deny
 -
project.max-msg-ids
        privileged        128       -   deny
 -
        system          16.8M     max   deny
 -
project.max-sem-ids
        privileged        128       -   deny
 -
        system          16.8M     max   deny
 -
project.max-crypto-memory
        privileged      8.77GB      -   deny
 -
        system          16.0EB    max   deny
 -
project.max-tasks
        system          2.15G     max   deny
 -
project.max-lwps
        system          2.15G     max   deny
 -
project.cpu-cap
        system          4.29G     inf   deny
 -
project.cpu-shares
        privileged          1       -   none
 -
        system          65.5K     max   none
 -
zone.max-swap
        system          16.0EB    max   deny
 -
zone.max-locked-memory
        system          16.0EB    max   deny
 -
zone.max-shm-memory
        system          16.0EB    max   deny
 -
zone.max-shm-ids
        system          16.8M     max   deny
 -
zone.max-sem-ids
        system          16.8M     max   deny
 -
zone.max-msg-ids
        system          16.8M     max   deny
 -
zone.max-lwps
        system          2.15G     max   deny
 -
zone.cpu-cap
        system          4.29G     inf   deny
 -
zone.cpu-shares
        privileged          1       -   none
 -
        system          65.5K     max   none
 -

$ prctl -n process.max-file-descriptor -i process $$
process: 7615: -sh
NAME    PRIVILEGE       VALUE    FLAG   ACTION
RECIPIENT
process.max-file-descriptor
        basic           4.10K       -   deny
7615
        system          2.15G     max   deny
 -
$

Comment:

For racgimon, the privileged value of process.max-file-descriptor was 2.15G descriptors (the system maximum), although no privileged value had been set explicitly.
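
A rough estimate shows why such a limit matters. This is a sketch under two assumptions that the traces do not confirm directly: that racgimon walks the descriptor numbers up to the limit, and that "2.15G" is the signed 32-bit maximum. The call counts come from the truss -c output in ANALYSIS 1.

```python
# Figures from the "truss -faec" summary in ANALYSIS 1.
close_calls = 1_857_265
elapsed_s = 23.090

# Assumption: "2.15G" in the prctl output is 2**31 - 1.
fd_limit = 2**31 - 1

calls_per_s = close_calls / elapsed_s
hours_to_sweep = fd_limit / calls_per_s / 3600

print(f"close() calls per second: {calls_per_s:,.0f}")
print(f"hours to sweep {fd_limit:,d} descriptors: {hours_to_sweep:.1f}")
```

At the observed rate, which includes truss overhead, one pass over the whole range would take several hours. With a privileged limit of 65536, the same pass would take well under a second.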

Comment by ubTools Support [ 26/Mar/13 03:21 PM ]
WORKAROUND:

Set the privileged value explicitly, as in the example below:

# projmod -s -K "process.max-file-descriptor=(basic,4096,deny),(privileged,65536,deny)" 'user.oracle'

After setting, check as below:

$ prctl -n process.max-file-descriptor -i process $$
process: 708: -sh
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
process.max-file-descriptor
        basic           4.10K       -   deny                               708
        privileged      65.5K       -   deny                                 -
        system          2.15G     max   deny                                 -
$

For a similar problem in lower Oracle versions, see Oracle note srvctl Slow or Fails to Start/Stop Database Instance and crsd.bin/racgmain/racgimon High CPU Usage [ID 1457387.1].





[QA-53] Starting Listener Hangs with "TNS-12531: TNS:cannot allocate memory" in Listener Log Created: 02/Jul/12  Updated: 12/Jul/12

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - SQL*Net Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 11.2.0.3
Operating System: Linux
Host Name: .
Database Name: .

 Description   
Starting the LISTENER hangs. The following errors repeat in an infinite loop in the listener.log:
02-JUL-2012 15:54:06 * 12531
TNS-12531: TNS:cannot allocate memory
02-JUL-2012 15:54:06 * 12531
TNS-12531: TNS:cannot allocate memory
02-JUL-2012 15:54:06 * 12531
TNS-12531: TNS:cannot allocate memory
02-JUL-2012 15:54:06 * 12531
TNS-12531: TNS:cannot allocate memory
02-JUL-2012 15:54:06 * 12531
TNS-12531: TNS:cannot allocate memory


 Comments   
Comment by ubTools Support [ 02/Jul/12 02:19 PM ]
LISTENER tracing was enabled in listener.ora as below:
TRACE_LEVEL_LISTENER     = 16
TRACE_FILE_LISTENER      = listener.trc
TRACE_UNIQUE_LISTENER    = TRUE
TRACE_TIMESTAMP_LISTENER = TRUE

listener.trc was generated in $ORACLE_BASE/diag/tnslsnr/linux1/listener/trace/ as below:

2012-07-02 15:55:03.847203 : snlinGetAddrInfo:entry
2012-07-02 15:55:03.847276 : snlinGetAddrInfo:getaddrinfo() failed with error -3
2012-07-02 15:55:03.847295 : snlinGetAddrInfo:exit
2012-07-02 15:55:03.847307 : nserror:entry
2012-07-02 15:55:03.847319 : nserror:nsres: id=0, op=65, ns=12531, ns2=0; nt[0]=0, nt[1]=0, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
2012-07-02 15:55:03.847331 : nsmfr:entry
2012-07-02 15:55:03.847342 : nsmfr:1528 bytes at 0xa193a0
2012-07-02 15:55:03.847352 : nsmfr:normal exit
2012-07-02 15:55:03.847363 : nsopenmplx:error exit
2012-07-02 15:55:03.847373 : nsopen:unable to allocate context area
2012-07-02 15:55:03.847384 : nsopen:error exit
2012-07-02 15:55:03.847395 : nsanswer:error exit
2012-07-02 15:55:03.847411 : nsglhc:nsanswer error 12531

The problem appeared in the getaddrinfo() call, which failed with error -3.

Comment by ubTools Support [ 02/Jul/12 02:22 PM ]
An IPv4 address for the hostname was defined in /etc/hosts, but there was no IPv6 definition.

Even though only the IPv4 address was used in listener.ora, the problem occurred again.

The problem disappeared after adding the same hostname with an IPv6 address to /etc/hosts.
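
Name resolution per address family can be checked outside Oracle. This is a minimal sketch using Python's standard socket module; the function name is illustrative, and the listener's own lookup may use different flags. On Linux with glibc, getaddrinfo() error -3 corresponds to EAI_AGAIN (temporary failure in name resolution).

```python
import socket

FAMILIES = {"IPv4": socket.AF_INET, "IPv6": socket.AF_INET6}

def resolve_by_family(hostname):
    """Return {family_name: sorted addresses, or the resolver error text}."""
    results = {}
    for name, family in FAMILIES.items():
        try:
            infos = socket.getaddrinfo(hostname, None, family, socket.SOCK_STREAM)
            # Element 4 of each entry is the socket address; its first field is the IP.
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[name] = f"failed: [{exc.errno}] {exc.strerror}"
    return results
```

Calling resolve_by_family() with the listener host name shows at a glance whether the IPv6 lookup fails while the IPv4 lookup succeeds, which is the situation described above.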





[QA-52] "Transaction recovery: lock conflict caught and ignored" messages in ALERT LOG. Created: 30/Dec/11  Updated: 16/Jan/12

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Administration Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 11.2.0.1.0 (RAC)
Operating System: HP-UX
Operating System Version: B.11.31
Host Name: .
Database Name: .

 Description   
The customer encounters the following messages:

ALERT LOG:

.....
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
Transaction recovery: lock conflict caught and ignored
.....

SMON TRACE:

.....
*** 2011-12-26 14:42:46.401
Serial Transaction recovery caught exception 30319
Serial Transaction recovery caught exception 601

*** 2011-12-26 14:46:25.455
Serial Transaction recovery caught exception 601
Serial Transaction recovery caught exception 601
Serial Transaction recovery caught exception 601
Serial Transaction recovery caught exception 601
.....

The customer said the messages started after SUPPLEMENTAL LOGGING was enabled. However, they did not disappear after it was disabled.



 Comments   
Comment by ubTools Support [ 30/Dec/11 01:09 PM ]
DEAD TRANSACTIONS:

SQL:

select b.name useg, b.inst# instid, b.status$ status, a.ktuxeusn
xid_usn, a.ktuxeslt xid_slot, a.ktuxesqn xid_seq, a.ktuxesiz undoblocks,
a.ktuxesta txstatus
from x$ktuxe a, undo$ b
where a.ktuxecfl like '%DEAD%'
and a.ktuxeusn = b.us#;

Data:

USEG	INSTID	STATUS	XID_USN	XID_SLOT	XID_SEQ	UNDOBLOCKS	TXSTATUS
_SYSSMU1209_1270276489$	1	3	1209	3	1100382	3033	ACTIVE
_SYSSMU1482_3325964579$	2	2	1482	16	496322	0	INACTIVE
_SYSSMU1681_4095893383$	2	2	1681	5	472365	0	INACTIVE
_SYSSMU2072_3213080551$	2	2	2072	2	120912	0	INACTIVE

Definition:

  • Transaction id: XID_USN.XID_SLOT.XID_SEQ

Comment:

  • There is an active dead transaction in _SYSSMU1209_1270276489$ undo segment.
  • The dead transaction id is 1209.3.1100382 which is 0x04B9.003.0010CA5E in hexadecimal.
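
The decimal-to-hexadecimal conversion of the transaction id can be reproduced as follows. This is a small sketch; the function names and the field widths (4, 3, and 8 hex digits) simply mirror the notation used above.

```python
def xid_to_hex(usn, slot, seq):
    """Format XID_USN.XID_SLOT.XID_SEQ in the hexadecimal notation used in dumps."""
    return f"0x{usn:04X}.{slot:03X}.{seq:08X}"

def xid_from_hex(text):
    """Parse '0xUUUU.SSS.QQQQQQQQ' back into decimal (usn, slot, seq)."""
    usn, slot, seq = text.lower().removeprefix("0x").split(".")
    return int(usn, 16), int(slot, 16), int(seq, 16)

print(xid_to_hex(1209, 3, 1100382))
print(xid_from_hex("0x04B9.003.0010CA5E"))
```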
Comment by ubTools Support [ 30/Dec/11 01:27 PM ]
UNDO HEADER:

Reading Transaction Table in the UNDO header:

SQL:

  • SQL> ALTER SYSTEM DUMP UNDO HEADER '_SYSSMU1209_1270276489$';

Data:

.....
  TRN TBL::
 
  index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num    cmt
  ------------------------------------------------------------------------------------------------
   0x00    9    0x00  0x10ca61  0x001a  0x001d.ac01b6ea  0x04c30103  0x0000.000.00000000  0x00000001   0x00000000  1324239683
   0x01    9    0x00  0x10c8f0  0x000e  0x001d.abff4525  0x04c30043  0x0000.000.00000000  0x00000001   0x00000000  1324239603
   0x02    9    0x00  0x10ca1f  0x0005  0x001d.abfad814  0x00c2e328  0x0000.000.00000000  0x00000003   0x00000000  1324239446
   0x03   10    0x90  0x10ca5e  0x0002  0x001d.ab7efd87  0x00c0f5ea  0x0000.000.00000000  0x00000bd9   0x04c1f938  0
   0x04    9    0x00  0x10c46d  0x000a  0x001d.abda8e80  0x04c28e5f  0x0000.000.00000000  0x00000001   0x00000000  1324238461
   0x05    9    0x00  0x10c91c  0x0015  0x001d.abfadb90  0x00c2e32d  0x0000.000.00000000  0x00000001   0x00000000  1324239447
   0x06    9    0x00  0x10cdbb  0x001d  0x001d.abd50f70  0x04c28e80  0x0000.000.00000000  0x00000001   0x00000000  1324238283
   0x07    9    0x00  0x10c77a  0x0004  0x001d.abd90c5b  0x00c29cd8  0x0000.000.00000000  0x00000001   0x00000000  1324238409
   0x08    9    0x00  0x10c229  0x0020  0x001d.abe1de1e  0x00c2a8d2  0x0000.000.00000000  0x00000001   0x00000000  1324238704
   0x09    9    0x00  0x10ca28  0x0006  0x001d.abd4dfb6  0x04c28e5f  0x0000.000.00000000  0x00000001   0x00000000  1324238278
   0x0a    9    0x00  0x10c6b7  0x0008  0x001d.abe1c7f3  0x00c2a8c2  0x0000.000.00000000  0x00000001   0x00000000  1324238701
   0x0b    9    0x00  0x10c9e6  0x0017  0x001d.abfdbd74  0x04c30007  0x0000.000.00000000  0x00000001   0x00000000  1324239554
   0x0c    9    0x00  0x10cb45  0x0011  0x001d.abfc5eea  0x04c2ff9d  0x0000.000.00000000  0x00000001   0x00000000  1324239502
   0x0d    9    0x00  0x10c444  0x001c  0x001d.abca9d1f  0x00c22bc1  0x0000.000.00000000  0x00000001   0x00000000  1324237948
   0x0e    9    0x00  0x10c7e3  0x0000  0x001d.abffbb9a  0x04c3005f  0x0000.000.00000000  0x00000001   0x00000000  1324239618
   0x0f    9    0x00  0x10ca72  0x0007  0x001d.abd82320  0x00c29c21  0x0000.000.00000000  0x00000001   0x00000000  1324238375
   0x10    9    0x00  0x10c501  0x001f  0x001d.abf33edd  0x00c2e03f  0x0000.000.00000000  0x00000001   0x00000000  1324239208
   0x11    9    0x00  0x10ca90  0x000b  0x001d.abfdbc34  0x04c30004  0x0000.000.00000000  0x00000001   0x00000000  1324239554
   0x12    9    0x00  0x10c2ef  0x0018  0x001d.ac09d85c  0x04c3036f  0x0000.000.00000000  0x00000001   0x00000000  1324239959
   0x13    9    0x00  0x10c8ae  0x0010  0x001d.abe83852  0x04c2ea99  0x0000.000.00000000  0x00000001   0x00000000  1324238911
   0x14    9    0x00  0x10c5ad  0x0016  0x001d.abd3e99a  0x04c28de0  0x0000.000.00000000  0x00000001   0x00000000  1324238242
   0x15    9    0x00  0x10c62c  0x000c  0x001d.abfb4d1a  0x04c2ff8d  0x0000.000.00000000  0x00000001   0x00000000  1324239464
   0x16    9    0x00  0x10c72b  0x001b  0x001d.abd4d238  0x04c28e4e  0x0000.000.00000000  0x00000001   0x00000000  1324238274
   0x17    9    0x00  0x10c2da  0x0001  0x001d.abff0f85  0x04c30029  0x0000.000.00000000  0x00000001   0x00000000  1324239598
   0x18    9    0x00  0x10c589  0xffff  0x001d.ad480910  0x00000000  0x0000.000.00000000  0x00000000   0x00000000  1324254620
   0x19    9    0x00  0x10c628  0x000d  0x001d.abca6bcb  0x00c22ba5  0x0000.000.00000000  0x00000001   0x00000000  1324237944
   0x1a    9    0x00  0x10c4a7  0x0012  0x001d.ac04e46b  0x04c30232  0x0000.000.00000000  0x00000001   0x00000000  1324239791
   0x1b    9    0x00  0x10c2e6  0x0009  0x001d.abd4df89  0x04c28e5c  0x0000.000.00000000  0x00000001   0x00000000  1324238277
   0x1c    9    0x00  0x10c755  0x0014  0x001d.abcb5957  0x04c28b14  0x0000.000.00000000  0x00000001   0x00000000  1324237971
   0x1d    9    0x00  0x10cd54  0x0021  0x001d.abd6f01b  0x00c29b7b  0x0000.000.00000000  0x00000001   0x00000000  1324238343
   0x1e    9    0x00  0x10c5e3  0x0019  0x001d.abca546a  0x00c22b90  0x0000.000.00000000  0x00000001   0x00000000  1324237940
   0x1f    9    0x00  0x10c232  0x0002  0x001d.abf7fc92  0x00c2e1be  0x0000.000.00000000  0x00000001   0x00000000  1324239355
   0x20    9    0x00  0x10c391  0x0013  0x001d.abe5ff89  0x04c2e999  0x0000.000.00000000  0x00000001   0x00000000  1324238832
   0x21    9    0x00  0x10cc70  0x000f  0x001d.abd77e3c  0x00c29bd6  0x0000.000.00000000  0x00000001   0x00000000  1324238361
  EXT TRN CTL::
  usn: 1209
.....

Definitions:

  • state = 10 means an active transaction.
  • dba points to the starting UNDO block address.
  • usn is the undo segment number.
  • usn.index.wrap# gives the transaction id.

Comment:

Slot 0x03 holds an active transaction, 0x04b9.003.0010ca5e, whose dba is 0x00c0f5ea (12645866 in decimal).
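
The conversion above can be checked with a short Python sketch. It assumes the classic Oracle data block address layout (upper 10 bits = relative file number, lower 22 bits = block number), which is the same split that DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DATA_BLOCK_ADDRESS_BLOCK return. The function names are illustrative only.

```python
# Sketch: decode the transaction id and the undo block address quoted above.
# Assumption: a DBA is 32 bits, upper 10 bits = relative file#, lower 22 bits = block#.

def parse_xid(xid):
    """Split 'usn.slot.wrap#' (hexadecimal fields) into three integers."""
    usn, slot, wrap = (int(part, 16) for part in xid.split("."))
    return usn, slot, wrap

def decode_dba(dba):
    """Return (relative file#, block#) for a data block address."""
    return dba >> 22, dba & 0x3FFFFF

usn, slot, wrap = parse_xid("0x04b9.003.0010ca5e")
dba = int("0x00c0f5ea", 16)
file_no, block_no = decode_dba(dba)

print(usn, slot, wrap)          # 1209 3 1100382
print(dba, file_no, block_no)   # 12645866 3 62954
```

The usn of 1209 matches the "usn: 1209" line of the undo header dump.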

Comment by ubTools Support [ 30/Dec/11 01:43 PM ]
UNDO BLOCK:

Reading UNDO Block:

SQL:

  • fileID: select DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(12645866) from x$dual;
  • blockID: select DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(12645866) from x$dual;
  • alter system dump datafile <fileID> block <blockID>;

Data:

.....
UNDO BLK:  
xid: 0x04b9.003.0010ca5e  seq: 0x1447 cnt: 0x2e  irb: 0x2c  icl: 0x0   flg: 0x0000
 
 Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x1f8c     0x02 0x1dac     0x03 0x1d3c     0x04 0x1ccc     0x05 0x1c64     
0x06 0x1c0c     0x07 0x1b7c     0x08 0x1b0c     0x09 0x1a9c     0x0a 0x1a24     
0x0b 0x19cc     0x0c 0x183c     0x0d 0x17cc     0x0e 0x175c     0x0f 0x16e4     
0x10 0x168c     0x11 0x15fc     0x12 0x158c     0x13 0x151c     0x14 0x14b4     
0x15 0x145c     0x16 0x12f4     0x17 0x1284     0x18 0x1214     0x19 0x11ac     
0x1a 0x1154     0x1b 0x0f9c     0x1c 0x0f2c     0x1d 0x0ebc     0x1e 0x0e44     
0x1f 0x0dec     0x20 0x0c3c     0x21 0x0bcc     0x22 0x0b5c     0x23 0x0af4     
0x24 0x0a9c     0x25 0x08c4     0x26 0x0854     0x27 0x07e4     0x28 0x076c     
0x29 0x0714     0x2a 0x0604     0x2b 0x022c     0x2c 0x01c4     0x2d 0x0154     
0x2e 0x00e4     
.....

Definitions

  • irb points to the last UNDO RECORD in the UNDO block.
  • rci points to the previous UNDO RECORD. If rci=0, it's the first UNDO RECORD.
  • Recovery starts from irb and follows the rci chain until rci is zero.

Comment:

  • Recovery of the transaction 0x04b9.003.0010ca5e starts from UNDO RECORD 0x2c (the irb value).
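
The irb/rci chain described above can be sketched in Python. This is only an illustration of the traversal order, not Oracle's recovery code. The rci map is built on the assumption that every record points to the one before it, which is what the UNDO RECORDS dump of this transaction shows (0x2c -> 0x2b -> ... -> 0x01 -> 0x00).

```python
# Sketch: follow the undo record chain from irb back to the first record.

def undo_chain(irb, rci_of):
    """Yield record numbers starting at irb and following rci until it is zero."""
    rec = irb
    seen = set()
    while rec != 0:
        if rec in seen:                 # guard against a corrupted, looping chain
            raise ValueError("cycle in rci chain at record 0x%02x" % rec)
        seen.add(rec)
        yield rec
        rec = rci_of[rec]

# rci value per record number; in this dump each record points to its predecessor.
rci_of = {rec: rec - 1 for rec in range(0x01, 0x2c + 1)}

chain = list(undo_chain(0x2c, rci_of))
print(len(chain), hex(chain[0]), hex(chain[-1]))   # 44 0x2c 0x1
```

Records 0x2d and 0x2e exist in the block (cnt: 0x2e) but are not on the chain, because the walk starts at irb.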
Comment by ubTools Support [ 30/Dec/11 02:16 PM ]
UNDO RECORDS:

Reading UNDO Records:

Data:

.....

*-----------------------------
* Rec #0x2c  slt: 0x03  objn: 939468(0x000e55cc)  objd: 941274  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x2b   
.....
*-----------------------------
* Rec #0x2b  slt: 0x03  objn: 939468(0x000e55cc)  objd: 941274  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x2a   
.....
*-----------------------------
* Rec #0x2a  slt: 0x03  objn: 939468(0x000e55cc)  objd: 941274  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x29   
.....
*-----------------------------
* Rec #0x29  slt: 0x03  objn: 1126679(0x00113117)  objd: 1126679  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x28   
.....
*-----------------------------
* Rec #0x28  slt: 0x03  objn: 1123018(0x001122ca)  objd: 1123018  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x27   
.....
*-----------------------------
* Rec #0x27  slt: 0x03  objn: 1162285(0x0011bc2d)  objd: 1162285  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x26   
.....
*-----------------------------
* Rec #0x26  slt: 0x03  objn: 1162273(0x0011bc21)  objd: 1162273  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x25   
.....
*-----------------------------
* Rec #0x25  slt: 0x03  objn: 939450(0x000e55ba)  objd: 939450  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x24   
.....
*-----------------------------
* Rec #0x24  slt: 0x03  objn: 1126696(0x00113128)  objd: 1126696  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x23   
.....
*-----------------------------
* Rec #0x23  slt: 0x03  objn: 1123035(0x001122db)  objd: 1123035  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x22   
.....
*-----------------------------
* Rec #0x22  slt: 0x03  objn: 1162285(0x0011bc2d)  objd: 1162285  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x21   
.....
*-----------------------------
* Rec #0x21  slt: 0x03  objn: 1162273(0x0011bc21)  objd: 1162273  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x20   
.....
*-----------------------------
* Rec #0x20  slt: 0x03  objn: 939408(0x000e5590)  objd: 941229  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x1f   
.....
*-----------------------------
* Rec #0x1f  slt: 0x03  objn: 1126655(0x001130ff)  objd: 1126655  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x1e   
.....
*-----------------------------
* Rec #0x1e  slt: 0x03  objn: 1122994(0x001122b2)  objd: 1122994  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x1d   
.....
*-----------------------------
* Rec #0x1d  slt: 0x03  objn: 1162285(0x0011bc2d)  objd: 1162285  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x1c   
.....
*-----------------------------
* Rec #0x1c  slt: 0x03  objn: 1162273(0x0011bc21)  objd: 1162273  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x1b   
.....
*-----------------------------
* Rec #0x1b  slt: 0x03  objn: 939429(0x000e55a5)  objd: 941242  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x1a   
.....
*-----------------------------
* Rec #0x1a  slt: 0x03  objn: 1126678(0x00113116)  objd: 1126678  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x19   
.....
*-----------------------------
* Rec #0x19  slt: 0x03  objn: 1123017(0x001122c9)  objd: 1123017  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x18   
.....
*-----------------------------
* Rec #0x18  slt: 0x03  objn: 1162285(0x0011bc2d)  objd: 1162285  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x17   
.....
*-----------------------------
* Rec #0x17  slt: 0x03  objn: 1162273(0x0011bc21)  objd: 1162273  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x16   
.....
*-----------------------------
* Rec #0x16  slt: 0x03  objn: 939466(0x000e55ca)  objd: 941272  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x15   
.....
*-----------------------------
* Rec #0x15  slt: 0x03  objn: 1126681(0x00113119)  objd: 1126681  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x14   
.....
*-----------------------------
* Rec #0x14  slt: 0x03  objn: 1123020(0x001122cc)  objd: 1123020  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x13   
.....
*-----------------------------
* Rec #0x13  slt: 0x03  objn: 1162285(0x0011bc2d)  objd: 1162285  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x12   
.....
*-----------------------------
* Rec #0x12  slt: 0x03  objn: 1162273(0x0011bc21)  objd: 1162273  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x11   
.....
*-----------------------------
* Rec #0x11  slt: 0x03  objn: 939420(0x000e559c)  objd: 941236  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x10   
.....
*-----------------------------
* Rec #0x10  slt: 0x03  objn: 1126647(0x001130f7)  objd: 1126647  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x0f   
.....
*-----------------------------
* Rec #0xf  slt: 0x03  objn: 1122986(0x001122aa)  objd: 1122986  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x0e   
.....
*-----------------------------
* Rec #0xe  slt: 0x03  objn: 1162285(0x0011bc2d)  objd: 1162285  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x0d   
.....
*-----------------------------
* Rec #0xd  slt: 0x03  objn: 1162273(0x0011bc21)  objd: 1162273  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x0c   
.....
*-----------------------------
* Rec #0xc  slt: 0x03  objn: 939418(0x000e559a)  objd: 941235  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x0b   
.....
*-----------------------------
* Rec #0xb  slt: 0x03  objn: 1126653(0x001130fd)  objd: 1126653  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x0a   
.....
*-----------------------------
* Rec #0xa  slt: 0x03  objn: 1122992(0x001122b0)  objd: 1122992  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x09   
.....
*-----------------------------
* Rec #0x9  slt: 0x03  objn: 1162285(0x0011bc2d)  objd: 1162285  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x08   
.....
*-----------------------------
* Rec #0x8  slt: 0x03  objn: 1162273(0x0011bc21)  objd: 1162273  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x07   
.....
*-----------------------------
* Rec #0x7  slt: 0x03  objn: 939438(0x000e55ae)  objd: 941251  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x06   
.....
*-----------------------------
* Rec #0x6  slt: 0x03  objn: 1126696(0x00113128)  objd: 1126696  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x05   
.....
*-----------------------------
* Rec #0x5  slt: 0x03  objn: 1123035(0x001122db)  objd: 1123035  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x04   
.....
*-----------------------------
* Rec #0x4  slt: 0x03  objn: 1162285(0x0011bc2d)  objd: 1162285  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x03   
.....
*-----------------------------
* Rec #0x3  slt: 0x03  objn: 1162273(0x0011bc21)  objd: 1162273  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x02   
.....
*-----------------------------
* Rec #0x2  slt: 0x03  objn: 939448(0x000e55b8)  objd: 939448  tblspc: 9(0x00000009)
*       Layer:  11 (Row)   opc: 1   rci 0x01   
.....
*-----------------------------
* Rec #0x1  slt: 0x03  objn: 1126675(0x00113113)  objd: 1126675  tblspc: 9(0x00000009)
*       Layer:  10 (Index)   opc: 22   rci 0x00   
.....
KDO Op code: LMN row dependencies Disabled
.....

Definitions:

  • objn means object id.

Comment:

  • The objects that need recovery:
    select * from dba_objects
    where object_id in (939468,1126679,1123018,1162285,1162273,939450,1126696,1123035,939408,
    1126655,1122994,939429,1126678,1123017,939466,1126681,
    1123020,939420,1126647,1122986,939418,1126653,1122992,939438,939448,1126675);
    
  • The first UNDO record includes LMN.
    --
    When running RAC and compatible 11.1 or higher, SMON could fail to
    recover transactions which had undo records for supplemental logging.
     
      (1) SMON is spinning
      (2) Must be RAC and compatible 11.1 or higher
      (3) Supplemental logging must have been enabled.
     
      If so, dump the undo for the transaction mentioned.  If the records
      show LMN entries, it is this bug.
    

    Ref: Bug 9489626 ORA-600 [4464] in RAC and SMON spins on cpu for a table with supplemental logging
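
The object list in the dba_objects query above can be collected from the dump instead of by hand. A minimal Python sketch follows. The regular expression is an assumption based on the header lines shown in this ticket, and only three of them are repeated here as sample input.

```python
import re

# Matches the record header lines of the undo block dump, e.g.
# "* Rec #0x2c  slt: 0x03  objn: 939468(0x000e55cc)  objd: 941274 ..."
REC_HEADER = re.compile(r"Rec #0x([0-9a-f]+)\s+slt: 0x([0-9a-f]+)\s+objn: (\d+)")

def distinct_objn(dump_text, slot):
    """Return the distinct objn values, in dump order, for one transaction slot."""
    seen = []
    for match in REC_HEADER.finditer(dump_text):
        rec_slot = int(match.group(2), 16)
        objn = int(match.group(3))
        if rec_slot == slot and objn not in seen:
            seen.append(objn)
    return seen

# Three header lines copied from the dump above.
sample = """
* Rec #0x2c  slt: 0x03  objn: 939468(0x000e55cc)  objd: 941274  tblspc: 9(0x00000009)
* Rec #0x2b  slt: 0x03  objn: 939468(0x000e55cc)  objd: 941274  tblspc: 9(0x00000009)
* Rec #0x29  slt: 0x03  objn: 1126679(0x00113117)  objd: 1126679  tblspc: 9(0x00000009)
"""

ids = distinct_objn(sample, slot=0x03)
print("object_id in (%s)" % ",".join(str(i) for i in ids))
# object_id in (939468,1126679)
```

Filtering on slt keeps records of other transactions in the same undo block out of the list.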
Comment by ubTools Support [ 30/Dec/11 02:22 PM ]
ACTIONS:

Bug:

This problem is Oracle Bug:9857702:

.....
Affects:
Product (Component) Oracle Server (Rdbms)  
Range of versions believed to be affected Versions >= 11.1 but BELOW 12.1  
Versions confirmed as being affected
•11.2.0.1 
•11.1.0.7 
 
Platforms affected Generic (all / most platforms affected)  

Fixed:
This issue is fixed in
•12.1 (Future Release) 
•11.2.0.2 (Server Patch Set) 
•11.1.0.7.8 Patch Set Update 
•11.1.0.7 Patch 40 on Windows Platforms  
.....

Ref: Bug 9857702 ORA-600 [4464] / ORA-600 [4139] by ROLLBACK for a table with supplemental logging enabled

Workaround:

  • Recreate objects that need recovery.
Comment by ubTools Support [ 30/Dec/11 02:29 PM ]
Waiting for customer action.
Comment by ubTools Support [ 16/Jan/12 03:28 PM ]
The customer dropped the identified objects, and the problem disappeared.




[QA-50] PRVF-5410 : Check of common NTP Time Server failed, PRVF-5416 : Query of NTP daemon failed on all nodes Created: 13/May/11  Updated: 13/May/11

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: CVU 11g
Operating System: Solaris
Operating System Version: 10
Host Name: .
Database Name: .

 Description   
Errors:

The customer encountered the following errors in CVU:

./cluvfy stage -pre crsinst -n detrac1,detrac2 -verbose
.....
NTP common Time Server Check started...
PRVF-5410 : Check of common NTP Time Server failed
PRVF-5416 : Query of NTP daemon failed on all nodes
Result: Clock synchronization check using Network Time Protocol(NTP) failed 
.....

NTP:

# ntpq -p
          remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*<REMOVED>    LOCAL(0)         <REMOVED>
# 

The output is the same on both nodes.

CVU log:

.....
[978@detrac1] [main] [ 2011-05-13 17:09:51.490 EEST ] [TaskNTP.getTimeServerInfo:838]  Output from NTP query command on node detrac1 is =
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
  *<REMOVED>    LOCAL(0)        <REMOVED>

[978@detrac1] [main] [ 2011-05-13 17:09:51.492 EEST ] [TaskNTP.getTimeServerInfo:864]  Parsing of NTP query output line FAILED. Line=
 *<REMOVED>    LOCAL(0)        <REMOVED>
[978@detrac1] [main] [ 2011-05-13 17:09:51.492 EEST ] [TaskNTP.getTimeServerInfo:880]  NTP query on node detrac1 did NOT produce valid output.
[978@detrac1] [main] [ 2011-05-13 17:09:51.492 EEST ] [TaskNTP.getTimeServerInfo:838]  Output from NTP query command on node detrac2 is =
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
  *<REMOVED>    LOCAL(0)         <REMOVED>

[978@detrac1] [main] [ 2011-05-13 17:09:51.493 EEST ] [TaskNTP.getTimeServerInfo:864]  Parsing of NTP query output line FAILED. Line=
 *<REMOVED>    LOCAL(0)       <REMOVED>
[978@detrac1] [main] [ 2011-05-13 17:09:51.494 EEST ] [TaskNTP.getTimeServerInfo:880]  NTP query on node detrac2 did NOT produce valid output.
.....

Ref: $CVU_HOME/cv/log/cvutrace.log.0



 Comments   
Comment by ubTools Support [ 13/May/11 05:47 PM ]
Action:
The network administrator reconfigured NTP so that the refid is an IP address instead of LOCAL(0).

NTP:

# ntpq -p
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*<REMOVED>    72.14.188.52     <REMOVED>
#

CVU Log:

.....
[967@detrac1] [main] [ 2011-05-13 19:21:19.426 EEST ] [TaskNTP.getTimeServerInfo:838]  Output from NTP query command on node detrac1 is =
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*<REMOVED>    72.14.188.52    <REMOVED>

[967@detrac1] [main] [ 2011-05-13 19:21:19.433 EEST ] [TimeServerNode.addDataToNode:66]  TimeServerNode:addDataToNode():Parsing line:
*<REMOVED>    72.14.188.52     <REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.434 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[0]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.434 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[1]=72.14.188.52
[967@detrac1] [main] [ 2011-05-13 19:21:19.434 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[2]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.435 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[3]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.435 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[4]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.436 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[5]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.436 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[6]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.437 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[7]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.437 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[8]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.438 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[9]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.438 EEST ] [TaskNTP.getTimeServerInfo:838]  Output from NTP query command on node detrac2 is =
     remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*<REMOVED>    72.14.188.52     <REMOVED>

[967@detrac1] [main] [ 2011-05-13 19:21:19.439 EEST ] [TimeServerNode.addDataToNode:66]  TimeServerNode:addDataToNode():Parsing line:
*<REMOVED>    72.14.188.52     <REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.440 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[0]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.440 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[1]=72.14.188.52
[967@detrac1] [main] [ 2011-05-13 19:21:19.441 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[2]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.441 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[3]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.441 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[4]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.442 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[5]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.442 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[6]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.443 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[7]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.443 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[8]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.444 EEST ] [TimeServerNode.addDataToNode:79]  Parsed Value[9]=<REMOVED>
[967@detrac1] [main] [ 2011-05-13 19:21:19.444 EEST ] [TaskNTP.doTimeServerCheck:736]  tsId=72.14.188.52; tServer
.....

CVU could now parse the ntpq output.
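
CVU's parser itself is not shown in this ticket, so the following Python sketch is only a pre-check motivated by the observation above: parsing failed while the refid was LOCAL(0) and succeeded once it was an IP address. The function name and the sample peer lines are hypothetical. Only the two refid values come from the ticket.

```python
import ipaddress

def refid_is_ip(peer_line):
    """Return True if the second column (refid) of an 'ntpq -p' peer line is an IP address."""
    fields = peer_line.split()
    if len(fields) < 2:
        return False
    try:
        ipaddress.ip_address(fields[1])
    except ValueError:
        return False
    return True

# Hypothetical peer lines: the remote name and the numeric columns are made up.
before = "*ntp.example.com  LOCAL(0)      3 u   12   64  377   0.42   1.20   0.05"
after = "*ntp.example.com  72.14.188.52  3 u   12   64  377   0.42   1.20   0.05"

print(refid_is_ip(before), refid_is_ip(after))   # False True
```

Running such a check on the "ntpq -p" output of every node before cluvfy would flag the LOCAL(0) case early, assuming the refid format is indeed what CVU rejects.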

CVU Output:

.....
NTP common Time Server Check started...
NTP Time Server "72.14.188.52" is common to all nodes on which the NTP daemon is running
Check of common NTP Time Server passed

Clock time offset check from NTP Time Server started...
Checking on nodes "[detrac1, detrac2]"...
Check: Clock time offset from NTP Time Server

Time Server: 72.14.188.52
Time Offset Limit: 1000.0 msecs
  Node Name     Time Offset               Status
  ------------  ------------------------  ------------------------
  detrac1       -2.332                    passed
  detrac2       -2.842                    passed
Time Server "72.14.188.52" has time offsets that are within permissible limits for nodes "[detrac1, detrac2]".
Clock time offset check passed

Result: Clock synchronization check using Network Time Protocol(NTP) passed
.....
Comment by ubTools Support [ 13/May/11 06:04 PM ]
Solution:

The network administrator reconfigured NTP so that the refid is an IP address instead of LOCAL(0).





[QA-49] ORA-4031: High Allocation for "Oracle Text Commit new id" in Shared Pool. Created: 05/Nov/10  Updated: 05/Nov/10

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 10.2.0.4
Operating System: Solaris
Host Name: .
Database Name: .

 Description   
The customer encountered ORA-4031, and a trace file was generated. The SGA is managed by ASMM. The application uses Oracle Text.

 Comments   
Comment by ubTools Support [ 05/Nov/10 10:24 PM ]
Analysis of the Trace:

The Requested SUBPOOL:

.....
=================================
Begin 4031 Diagnostic Information
=================================
.....
HEAP DUMP heap name="sga heap(3,0)"  desc=380043660
 extent sz=0xfe0 alt=216 het=32767 rec=9 flg=-126 opc=0
 parent=0 owner=0 nex=0 xsz=0x1000000
 latch set 3 of 4
 durations enabled for this heap
 reserved granules for root 0 (granule size 16777216)
.....

The allocation was requested from sga heap(3,0), which is (SUBPOOL:3,DURATION:0).

All SUBPOOLS and Their DURATION Memories:

.....
HEAP DUMP heap name="sga heap(1,0)"  desc=380030610
Total heap size    =218102664
Total free space   =  1066928
Total reserved free space   =  8439520
Unpinned space     = 38812528  rcr=11971 trn=17906
Permanent space    =208595160
HEAP DUMP heap name="sga heap(1,1)"  desc=380031e68
Total heap size    = 67108512
Total free space   =  2912528
Total reserved free space   =  1382816
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(1,2)"  desc=3800336c0
Total heap size    =167771280
Total free space   = 92743480
Total reserved free space   =  3852856
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(1,3)"  desc=380034f18
Total heap size    =268434048
Total free space   = 74547592
Total reserved free space   = 13497472
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,0)"  desc=380039e38
Total heap size    =201325536
Total free space   =    17200
Total reserved free space   =  8435920
Unpinned space     = 26474112  rcr=7934 trn=8094
Permanent space    =192871456
HEAP DUMP heap name="sga heap(2,1)"  desc=38003b690
Total heap size    = 83885640
Total free space   = 48723768
Total reserved free space   =  1035792
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,2)"  desc=38003cee8
Total heap size    =369096816
Total free space   =258674312
Total reserved free space   = 16982464
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(2,3)"  desc=38003e740
Total heap size    =218102664
Total free space   = 17202608
Total reserved free space   = 10966696
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,0)"  desc=380043660
Total heap size    =184548408
Total free space   =    13008
Total reserved free space   =  5061928
Unpinned space     = 26943408  rcr=4930 trn=9425
Permanent space    =179472608
HEAP DUMP heap name="sga heap(3,1)"  desc=380044eb8
Total heap size    = 67108512
Total free space   = 27568352
Total reserved free space   =     4744
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,2)"  desc=380046710
Total heap size    =352319688
Total free space   =233302736
Total reserved free space   = 15981216
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(3,3)"  desc=380047f68
Total heap size    =385873944
Total free space   =143746536
Total reserved free space   = 19402616
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(4,0)"  desc=38004ce88
Total heap size    =184548408
Total free space   =     8616
Total reserved free space   =  7592328
Unpinned space     = 28725496  rcr=8459 trn=9864
Permanent space    =176946600
HEAP DUMP heap name="sga heap(4,1)"  desc=38004e6e0
Total heap size    = 83885640
Total free space   = 33356784
Total reserved free space   =  1189120
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(4,2)"  desc=38004ff38
Total heap size    =335542560
Total free space   =238988592
Total reserved free space   = 16293768
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
HEAP DUMP heap name="sga heap(4,3)"  desc=380051790
Total heap size    =721416504
Total free space   =445595432
Total reserved free space   = 33743680
Unpinned space     =        0  rcr=0 trn=0
Permanent space    =        0
.....

All PERMANENT SPACE was allocated in DURATION 0. Although the other DURATIONS of subpool 3, namely (3,1), (3,2) and (3,3), have plenty of free space, DURATION 0 cannot allocate from them.

.....
duration memory (duration 0) cannot take free memory from other durations within the same subpool.  
It can only get more memory by being given a new complete EXTENT (granule) from the granule management code.
.....

Ref: Oracle Bug 9911213: ORA-04031 AFTER APPLYING 10.2.0.4 PATCSHET

Since the DB_CACHE_SIZE parameter set a lower limit for the BUFFER CACHE, the SHARED POOL could not grow by allocating a new EXTENT, and ORA-4031 was raised.
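
To make the imbalance concrete, the "Total free space" figures of subpool 3 from the heap dump above can be compared in a few lines of Python. The helper name is illustrative.

```python
# "Total free space" per (subpool, duration), copied from the heap dump above.
free_space = {
    (3, 0): 13008,
    (3, 1): 27568352,
    (3, 2): 233302736,
    (3, 3): 143746536,
}

def free_outside_duration0(free_by_heap, subpool):
    """Sum the free space of all durations except 0 within one subpool."""
    return sum(size for (sp, dur), size in free_by_heap.items()
               if sp == subpool and dur != 0)

stranded = free_outside_duration0(free_space, 3)
print(free_space[(3, 0)], stranded)   # 13008 404617624
```

Roughly 400 MB is free in durations 1 to 3 of subpool 3, while duration 0, where the request was made, has about 13 KB left.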

SUBPOOL Allocations:

.....
==============================
Memory Utilization of Subpool 1
================================
     Allocation Name          Size   
_________________________  __________
"free memory              "   215299680  
.....
"sql area                 "   151923248
.....
"Oracle Text Commit new id"   399237696
.....
"library cache            "    30711448
.....
==============================
Memory Utilization of Subpool 2
================================
     Allocation Name          Size   
_________________________  __________
"free memory              "   367295736
.....
"sql area                 "   160984248
.....
"Oracle Text Commit new id"   392833064
.....
"library cache            "    35069800
.....
==============================
Memory Utilization of Subpool 3
================================
     Allocation Name          Size   
_________________________  __________
"free memory              "   450731968  
.....
"sql area                 "   182415376
.....
"Oracle Text Commit new id"   417149240
.....
"library cache            "    39156336
.....
==============================
Memory Utilization of Subpool 4
================================
     Allocation Name          Size   
_________________________  __________
"free memory              "   781766288
.....
"sql area                 "   156513808
.....
"Oracle Text Commit new id"   410783408
.....
"library cache            "    31300664

The total size of Oracle Text Commit new id across the four subpools is about 1.5 GB (399237696+392833064+417149240+410783408 = 1620003408 bytes), which is high.
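
The sum can be verified with the per-subpool figures from the trace. A short Python check:

```python
# "Oracle Text Commit new id" size per subpool, copied from the trace above.
text_commit_new_id = {1: 399237696, 2: 392833064, 3: 417149240, 4: 410783408}

total = sum(text_commit_new_id.values())
print(total, round(total / 2**30, 2))   # 1620003408 1.51
```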

Comment by ubTools Support [ 05/Nov/10 10:35 PM ]
Oracle Text Commit new id Allocation Trend:

An Excerpt from SGA Stat:

SQL> select a.instance_number,begin_interval_time, bytes from dba_hist_sgastat a, dba_hist_snapshot b
  2  where pool='shared pool' and
  3        a.snap_id=b.snap_id and
  4        a.instance_number=b.instance_number and
  5        name='Oracle Text Commit new id'
  6  order by begin_interval_time;

.....
              1 06/10/2010 01:00:07,750                                                      352864368
              1 06/10/2010 02:00:55,107                                                      353711568
.....
              1 12/10/2010 11:00:12,212                                                      448444792
              1 12/10/2010 12:00:27,412                                                      449299672
              1 12/10/2010 13:00:12,435                                                      450157752
              1 12/10/2010 14:00:19,294                                                      450179512
.....
              1 04/11/2010 14:31:10,604                                                     1622639416
              1 04/11/2010 14:40:18,341                                                     1623339552
              1 04/11/2010 14:50:28,971                                                     1623879936
              1 04/11/2010 15:00:40,721                                                     1623880712

722 rows selected.

SQL>

The Oracle Text Commit new id allocation grew continuously, in small increments.
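
An average growth rate can be estimated from the first and last rows shown above. This Python sketch assumes the timestamps are in dd/mm/yyyy format (consistent with the row order) and ignores the elided rows in between, so it is only an average between the two endpoints.

```python
from datetime import datetime

# Timestamp format of begin_interval_time as printed above (assumed dd/mm/yyyy).
FMT = "%d/%m/%Y %H:%M:%S,%f"

def bytes_per_day(first, last):
    """Average growth in bytes per day between two (timestamp, bytes) samples."""
    (t0, b0), (t1, b1) = first, last
    elapsed = datetime.strptime(t1, FMT) - datetime.strptime(t0, FMT)
    return (b1 - b0) / (elapsed.total_seconds() / 86400)

rate = bytes_per_day(("06/10/2010 01:00:07,750", 352864368),
                     ("04/11/2010 15:00:40,721", 1623880712))
print(round(rate / 1e6, 1), "million bytes per day")
```

The result is on the order of 40 MB per day, which gives a rough idea of how long an instance can run before the allocation becomes a problem again.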

Comment by ubTools Support [ 05/Nov/10 10:43 PM ]
Summary:

Root Cause:

This problem is Oracle BUG:8593562, which is encountered in Oracle Text environments.

.....
It is incremented as the space is allocated, but not decremented as it is freed.
 It will reset when the instance is restarted.
.....
The bug is currently in work by Development and expected to be resolved in a future release.
.....

Ref: Growth of "Oracle Text Commit new id" memory with Sync on Commit Index [ID 872413.1]

Workaround:

  • Restart the INSTANCE.




[QA-48] Unable to start VIP because of invalid RX packets numbers. Created: 18/Mar/09  Updated: 19/Mar/09

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle 10.2.0.4, RAC
Operating System: IBM-AIX
Operating System Version: 6.1

 Description   
When starting a VIP on its own node, the start fails and the VIP is started on the other node instead.

Starting the VIP:

# ./crs_start ora.akyorap2.vip
Attempting to start `ora.akyorap2.vip` on member `akyorap2`
Start of `ora.akyorap2.vip` on member `akyorap2` failed.
Attempting to start `ora.akyorap2.vip` on member `akyorap1`
Start of `ora.akyorap2.vip` on member `akyorap1` succeeded.
#

The log level was increased to get more detailed diagnostic data.

Setting Log Level:

#./crsctl debug log res "ora.akyorap2.vip:1" 
Set Resource Debug Module: ora.akyorap2.vip  Level: 1
#

Errors from the Log:
(<ORA_CRS_HOME>/log/<nodeName>/racg/ora.akyorap2.vip.log)

Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] checkIf: start for if=en1
Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] IsIfAlive: start for if=en1

2009-03-18 20:58:52.212: [    RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] defaultgw:  started
Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] defaultgw:  completed with 10.46.180.1

2009-03-18 20:58:52.212: [    RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S 10.46.180.52  -c 1 -w 1 10.46.180.1

2009-03-18 20:58:52.212: [    RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:51 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S 10.46.180.52  -c 1 -w 1 10.46.180.1

2009-03-18 20:58:52.212: [    RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] IsIfAlive: RX packets checked if=en1 failed
Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] Interface en1 checked failed (host=akyorap2)
Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] IsIfAlive: end for if=en1

2009-03-18 20:58:52.212: [    RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] checkIf: end for if=en1
Invalid parameters, or failed to bring up VIP (host=akyorap2)


 Comments   
Comment by ubTools Support [ 18/Mar/09 08:10 PM ]
The problem originates in IsIfAlive() of $ORA_CRS_HOME/racgvip.

Here is the related excerpt from racgvip:

  # Check the status of the interface thro' pinging gateway
  if [ -n "$DEFAULTGW" ]
  then
    _RET=1
    # get base IP address of the interface
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    do
      if [ -n "$tmpIP" ]
      then
        logx "About to execute command: $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW"
        $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      else
        logx "About to execute command: $PING $PING_TIMEOUT $DEFAULTGW"
        $PING $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      fi
      _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
      if [ "$_O1" != "$_O2" ]
      then
        # RX packets numbers changed
        _RET=0
        break
      fi
      $SLEEP 1
      x=`$EXPR $x - 1`
    done
    if [ $_RET -ne 0 ]
    then
      logx "IsIfAlive: RX packets checked if=$_IF failed"
    else
      logx "IsIfAlive: RX packets checked if=$_IF OK"
    fi
....

According to the code above, the function does the following:

  • Assigns the current RX packet number to _O1 variable as the first RX packet number.
  • Loops $CHECK_TIMES times:
    • Pings default gateway.
    • Assigns the current RX packet number to _O2 variable as the next RX packet number.
    • If the RX packet number changed (_O1 != _O2), breaks the loop.
    • Otherwise, sleeps 1 second.
  • If the RX packet number has NOT changed (_O1 == _O2), raises the error; otherwise it's OK.
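The steps above can be sketched in Python as follows. This is only an illustrative model of the excerpt, not the racgvip code itself: `is_if_alive`, `read_rx` and `ping_gateway` are hypothetical names, and the default of two attempts is an assumption based on the two ping lines in the log. Like the shell script, it compares the counter values as strings.

```python
from typing import Callable


def is_if_alive(read_rx: Callable[[], str],
                ping_gateway: Callable[[], None],
                check_times: int = 2,
                sleep: Callable[[float], None] = lambda seconds: None) -> bool:
    """Return True if the RX counter changes after pinging the gateway."""
    first = read_rx()                 # plays the role of _O1
    for _ in range(check_times):      # plays the role of the CHECK_TIMES loop
        ping_gateway()
        current = read_rx()           # plays the role of _O2
        if current != first:          # string comparison, as in the script
            return True               # RX packet number changed
        sleep(1)
    return False


# A counter that grows between reads: the check succeeds.
counter = iter(["17297", "17298"])
print(is_if_alive(lambda: next(counter), lambda: None))   # True

# A reader that always returns the same string: the check fails.
print(is_if_alive(lambda: "-", lambda: None))             # False
```

Because the comparison is purely textual, any value that never changes makes the check fail, whether or not it is a real packet counter.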
Comment by ubTools Support [ 18/Mar/09 08:28 PM ]
racgvip was modified as below to dump the values of _O1 and _O2:
...
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    logx "--------------> by dunal: _O1: $_O1"

    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    do
      if [ -n "$tmpIP" ]
      then
        logx "About to execute command: $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW"
        $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      else
        logx "About to execute command: $PING $PING_TIMEOUT $DEFAULTGW"
        $PING $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      fi
      _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
      logx "--------------> by dunal: _O2: $_O2"
...

As seen above, logx "--------------> by dunal: ..." lines were added to the script. Don't do this unless you are sure about what you are doing.

After restarting the VIP, the values of _O1 and _O2 are dumped in the logs.

Failed Node:

...
Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] --------------> by dunal: _O1: -

2009-03-18 20:58:52.212: [    RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18
 20:58:49 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S
10.46.180.52  -c 1 -w 1 10.46.180.1
Wed Mar 18 20:58:50 GMT+02:00 2009 [ 413770 ] --------------> by dunal: _O2: -

2009-03-18 20:58:52.212: [    RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18
 20:58:51 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S
10.46.180.52  -c 1 -w 1 10.46.180.1
Wed Mar 18 20:58:51 GMT+02:00 2009 [ 413770 ] --------------> by dunal: _O2: -

2009-03-18 20:58:52.212: [    RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18
 20:58:52 GMT+02:00 2009 [ 413770 ] IsIfAlive: RX packets checked if=en1 failed
Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] Interface en1 checked failed (host
=akyorap2)
...

As seen above, both values are '-'. That is wrong, but because they are identical, the check concludes that the RX packet number has not changed.

Successful Node:

Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] --------------> by dunal: _O1: 17297

2009-03-18 20:58:55.793: [    RACG][1] [397546][1][ora.akyorap2.vip]: Wed Mar 18
 20:58:55 GMT+02:00 2009 [ 405728 ] About to execute command: /usr/sbin/ping -S
10.46.180.51  -c 1 -w 1 10.46.180.1
Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] --------------> by dunal: _O2: 17298

2009-03-18 20:58:55.793: [    RACG][1] [397546][1][ora.akyorap2.vip]: Wed Mar 18
 20:58:55 GMT+02:00 2009 [ 405728 ] IsIfAlive: RX packets checked if=en1 OK

_O1 and _O2 are different. That means RX packet number changed and the interface is up.

Comment by ubTools Support [ 18/Mar/09 08:44 PM ]

netstat Output on Failed Node:

/usr/bin/netstat -f inet -n -I en1 | /usr/bin/awk "{ if (/^en1/) {print $5; exit}}"
en1   1500  link#3      0.21.5e.34.55.bc       -    34601     0    16269     3     0

Column #5 is '-'. This is wrong and caused the problem.

netstat Output on Successful Node:

en1   1500  link#3      0.21.5e.34.57.fe            29223     0    10609     3     0

Column #5 is 29223. This is the expected number.

Headers of netstat on Failed Node:

#/usr/bin/netstat -f inet -n -I en1 
Name  Mtu   Network     Address           ZoneID    Ipkts Ierrs    Opkts Oerrs  Coll
en1   1500  link#3      0.21.5e.34.55.bc       -    35645     0    16801     3     0
en1   1500  10.46.180   10.46.180.52           -    35645     0    16801     3     0

Headers of netstat on Successful Node:

#/usr/bin/netstat -f inet -n -I en1 
Name  Mtu   Network     Address           ZoneID    Ipkts Ierrs    Opkts Oerrs  Coll
en1   1500  link#3      0.21.5e.34.57.fe            29743     0    10762     3     0
en1   1500  10.46.180   10.46.180.51                29743     0    10762     3     0
en1   1500  10.46.180   10.46.180.53                29743     0    10762     3     0
en1   1500  10.46.180   10.46.180.54                29743     0    10762     3     0

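The difference is the ZoneID column: it contains '-' on the failed node and is blank on the successful node, so Ipkts is the fifth whitespace-separated field only on the successful node.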
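The column shift can be reproduced with a short Python sketch that splits the two sample lines on whitespace, as awk does. The helper names `awk_field` and `ipkts_from_end` are hypothetical, and counting from the end of the line is only an illustration, not something racgvip does.

```python
# The link#3 lines from the two netstat outputs shown in this issue.
FAILED = "en1   1500  link#3      0.21.5e.34.55.bc       -    35645     0    16801     3     0"
OK = "en1   1500  link#3      0.21.5e.34.57.fe            29743     0    10762     3     0"


def awk_field(line: str, n: int) -> str:
    """Return field n (1-based) after whitespace splitting, like awk's $n."""
    return line.split()[n - 1]


def ipkts_from_end(line: str) -> str:
    """Ipkts is the fifth field from the end: Ipkts Ierrs Opkts Oerrs Coll."""
    return line.split()[-5]


print(awk_field(FAILED, 5))     # -
print(awk_field(FAILED, 6))     # 35645
print(awk_field(OK, 5))         # 29743
print(ipkts_from_end(FAILED))   # 35645
print(ipkts_from_end(OK))       # 29743
```

With these sample lines, a fixed field index matches Ipkts on only one of the two layouts, while an index counted from the end of the line matches on both.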

This looks like a network configuration problem. The issue will stay open pending an update from the network administrators.

Comment by ubTools Support [ 19/Mar/09 12:54 PM ]
The Network Administrator said it was an AIX bug:

However, the fix for that bug changes ZoneID from a blank value to '-'. After this fix, no VIP could be started.

Comment by ubTools Support [ 19/Mar/09 01:11 PM ]
No solution was found on Metalink.
Comment by ubTools Support [ 19/Mar/09 01:45 PM ]
Looks like an inconsistency of Oracle on AIX 6.1.

Workaround:

The netstat column that racgvip captures must be changed from 5 to 6.

Original lines for _O1:

...
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
...

Modified line for _O1:

...
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$6; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
...

Original lines for _O2:

...
      fi
      _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
      if [ "$_O1" != "$_O2" ]
      then
        # RX packets numbers changed
...

Modified line for _O2:

...
      fi
      _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$6; exit}}"`
      if [ "$_O1" != "$_O2" ]
      then
        # RX packets numbers changed
...

After that, the VIPs could be started on the correct nodes:

./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....ap1.gsd application    ONLINE    ONLINE    akyorap1
ora....ap1.ons application    ONLINE    ONLINE    akyorap1
ora....ap1.vip application    ONLINE    ONLINE    akyorap1
ora....ap2.gsd application    ONLINE    ONLINE    akyorap2
ora....ap2.ons application    ONLINE    ONLINE    akyorap2
ora....ap2.vip application    ONLINE    ONLINE    akyorap2

Note: Don't edit Oracle scripts unless you know what you're doing.





[QA-47] ORA-00354 ORA-00353 ORA-00312: Redolog Block Corruption Created: 10/Mar/09  Updated: 10/Apr/09

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle 10.2.0.4 SE,RAC
Operating System: Solaris
Operating System Version: 10

 Description   
Problem:

The import causes the instance to hang. During the import, only one instance is open.

imp system/manager file=../yedek/gedik_full.dmp log=../yedek/gedik_full_imp3.log full=y FEEDBACK=1000000
buffer=10000000 RESUMABLE=y RESUMABLE_TIMEOUT=72000

Diagnostic Data for Oracle:

Alert Log:

Mon Mar  9 19:38:45 2009
ARC0: Log corruption near block 50941 change 9160702125 time ?
Mon Mar  9 19:38:45 2009
Errors in file /u01/app/oracle/admin/ORCL/bdump/orcl1_arc0_26085.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 50941 change 9160702125 time 03/09/2009 19:38:35
ORA-00312: online log 1 thread 1: '+DATA/orcl/onlinelog/group_1.516.680795507'
ARC0: All Archive destinations made inactive due to error 354
Mon Mar  9 19:38:45 2009
ARC0: Closing local archive destination LOG_ARCHIVE_DEST_1: '/u01/app/oracle/product/10.2.0/dbs/arch/1_27_681074311.dbf' (error 354)
 (ORCL1)
Committing creation of archivelog '/u01/app/oracle/product/10.2.0/dbs/arch/1_27_681074311.dbf' (error 354)
ARCH: Archival stopped, error occurred. Will continue retrying

Archive Log Trace:

Corrupt redo block 50941 detected: bad block number
Flag: 0x30 Format: 0x38 Block: 0x20302030 Seq: 0x5c305c79 Beg: 0x3030 Cks:0x5
c31
----- Dump of Corrupt Redo Buffer -----
5c463830203020305c305c795c3130305c3230305c305c305c305c3020665c30
3030433c5c305c345c305c305c305c305c305c3035320a303a35323920202009
5c305c5030305c303033203120305c32303920383022203520305c315c353034
5c305c3020305c3020725c305c3820373035317231305c315c305c330a305c30
3239353220093a35203843203030433f20372034203530395c3230225c393830
30313030203020315c3630795c3431305c3430305c463830203020305c305c79
5c3130305c32303035320a303a3532395c2020095c305c305c305c30433c2066
5c3430305c305c305c305c305c305c3020305c305c305c5030305c3030292031
20305c32303920383022203520305c310a3320363239353220093a355c305c20
5c305c30203620303038203331725c455c625c365c3331305c305c3020665c30
3030433c20372034203530395c3230225c3938305c3530302030203035320a79
3a3532393020200931305c365c305c3038305c6230305c4641305c3431305c35
5c305c3030305c3020305c3542305c3f5c305c3031305c3030305c3131305c31
425420440a4920353239353220093a35314345205c305c395c3130305c323062
20382030203530395c313022203530305c305c345c305c302035303020372034
303530365c3831315c3230304646463035320a463a3532394320200938412032
Rereading log member '+DATA/orcl/onlinelog/group_1.516.680795507' (corruption
)
...
Corrupt redo block 50941 detected: bad block number
Flag: 0x0 Format: 0x0 Block: 0x00000000 Seq: 0x00000000 Beg: 0x0 Cks:0x0
----- Dump of Corrupt Redo Buffer -----
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
Rereading log member '+DATA/orcl/onlinelog/group_1.516.680795507' (corruption
)
...
Corrupt redo block 50941 detected: bad block number
Flag: 0x30 Format: 0x38 Block: 0x20302030 Seq: 0x5c305c79 Beg: 0x3030 Cks:0x5
c31
----- Dump of Corrupt Redo Buffer -----
5c463830203020305c305c795c3130305c3230305c305c305c305c3020665c30
3030433c5c305c345c305c305c305c305c305c3035320a303a35323920202009
5c305c5030305c303033203120305c32303920383022203520305c315c353034
5c305c3020305c3020725c305c3820373035317231305c315c305c330a305c30
3239353220093a35203843203030433f20372034203530395c3230225c393830
30313030203020315c3630795c3431305c3430305c463830203020305c305c79
5c3130305c32303035320a303a3532395c2020095c305c305c305c30433c2066
5c3430305c305c305c305c305c305c3020305c305c305c5030305c3030292031
20305c32303920383022203520305c310a3320363239353220093a355c305c20
5c305c30203620303038203331725c455c625c365c3331305c305c3020665c30
3030433c20372034203530395c3230225c3938305c3530302030203035320a79
3a3532393020200931305c365c305c3038305c6230305c4641305c3431305c35
5c305c3030305c3020305c3542305c3f5c305c3031305c3030305c3131305c31
425420440a4920353239353220093a35314345205c305c395c3130305c323062
20382030203530395c313022203530305c305c345c305c302035303020372034
303530365c3831315c3230304646463035320a463a3532394320200938412032
*** 2009-03-10 03:55:10.757 62692 kcrr.c

As seen above, even though the database hangs, the contents of the redo buffer dump change between rereads.

Diagnostic Data for Solaris:

Soft Link Mapping to Raw Devices:

oravol1: disk@g600a0b80005a81660000074949959b42:b,raw
oravol2: disk@g600a0b80005a816600000742499595ea:b,raw
oravol3: disk@g600a0b80005a8166000007444995971e:b,raw
oravol4: disk@g600a0b80005a8c9f000004f049959717:b,raw
oravol5: disk@g600a0b80005a8c9f000004f249959991:b,raw

Open File Descriptors of ARCH process:

bash-3.00$ ps -ef|grep arc0
  oracle 19941 14227   0 04:10:31 pts/12      0:00 grep arc0
  oracle 26085     1   0 19:25:05 ?           0:29 ora_arc0_ORCL1

bash-3.00$ ls -ltr /proc/26085/path
...
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 261
 -> /devices/scsi_vhci/disk@g600a0b80005a816600000742499595ea:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 260
 -> /devices/scsi_vhci/disk@g600a0b80005a8c9f000004f049959717:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 259
 -> /devices/scsi_vhci/disk@g600a0b80005a81660000074949959b42:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 257
 -> /devices/scsi_vhci/disk@g600a0b80005a8166000007444995971e:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 256
 -> /devices/scsi_vhci/disk@g600a0b80005a8c9f000004f249959991:b,raw
...
bash-3.00$

Gathering truss output for ARCH:

truss -fae -w 261,260,259,257,256 -r 261,260,259,257,256 -o arc0.truss.log -p 26085

The command above traces system calls and dumps the pread()/pwrite() I/O buffers for file descriptors 261, 260, 259, 257 and 256.

Open File Descriptors of LGWR process:

bash-3.00$ ps -ef|grep lgwr
  oracle 28447     1   0   Mar 04 ?           0:17 asm_lgwr_+ASM1
  oracle 25925     1   0 19:24:49 ?           0:38 ora_lgwr_ORCL1
  oracle 26468 14227   0 04:21:02 pts/12      0:00 grep lgwr

bash-3.00$ ls -ltr /proc/25925/path
...
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 260
 -> /devices/scsi_vhci/disk@g600a0b80005a816600000742499595ea:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 259
 -> /devices/scsi_vhci/disk@g600a0b80005a81660000074949959b42:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 258
 -> /devices/scsi_vhci/disk@g600a0b80005a8c9f000004f049959717:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 257
 -> /devices/scsi_vhci/disk@g600a0b80005a8166000007444995971e:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 256
 -> /devices/scsi_vhci/disk@g600a0b80005a8c9f000004f249959991:b,raw
...

Gathering truss output for LGWR:

bash-3.00$ truss -fae -w 260,259,258,257,256 -r 260,259,258,257,256 -o lgwr.truss.log -p 25925 &

The command above traces system calls and dumps the pread()/pwrite() I/O buffers for file descriptors 260, 259, 258, 257 and 256.



 Comments   
Comment by ubTools Support [ 10/Mar/09 03:49 AM ]
Last Successful Log Switch:
Beginning log switch checkpoint up to RBA [0x19.2.10], SCN: 9160700232
Mon Mar  9 19:38:21 2009
Thread 1 advanced to log sequence 25 (LGWR switch)
  Current log# 1 seq# 25 mem# 0: +DATA/orcl/onlinelog/group_1.516.680795507
Thread 1 cannot allocate new log, sequence 26
Checkpoint not complete
  Current log# 1 seq# 25 mem# 0: +DATA/orcl/onlinelog/group_1.516.680795507
Mon Mar  9 19:38:28 2009
Completed checkpoint up to RBA [0x19.2.10], SCN: 9160700232

As seen above, the last successful sequence before the corruption is 25.

Header of Archive Log:

(root@gdksun1:bin)$ dd if=/u01/app/oracle/product/10.2.0/dbs/arch/1_25_681074311.dbf bs=512 skip=50941 count=1|od -x

0000000 2201 0000   c6fd 0000      0019 0000 81d8 54c6
                    <blockNo>
0000020 2e32 3134 362e 0736 6b78 1207 2f0f 0212
0000040 332d 002c 0505 3831 3834 0532 7567 6469
0000060 0c65 3838 322e 3433 382e 2e38 3138 7807
0000100 076b 0f12 172f 3001 002c 0505 3032 3834
0000120 0739 7362 7361 6369 0e69 3538 312e 3530
0000140 312e 3535 322e 3233 7807 076b 0f12 172f

The block number is 0x0000c6fd (bytes swapped since the platform is little endian). Since 50941 = 0x0000c6fd, the block number in the archive log is correct. That means LGWR had successfully written the correct redo before the log switch.
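The block number can also be decoded from the `od -x` words programmatically. The sketch below is only an illustration: `block_number_from_od` is a hypothetical helper, and it assumes the layout seen in the dumps of this issue, where bytes 4 to 7 of the block hold the block number and `od -x` prints 16-bit words of a little-endian host.

```python
import struct
from typing import List


def block_number_from_od(words: List[str]) -> int:
    """Decode the 32-bit block number from the first od -x words of a block.

    On a little-endian host, the od -x word 'c6fd' stands for the bytes
    fd c6 on disk, so each word is packed back as a little-endian 16-bit value.
    """
    raw = b"".join(struct.pack("<H", int(word, 16)) for word in words)
    # Assumption: bytes 4..7 hold the block number, as in the dumps above.
    return struct.unpack_from("<I", raw, 4)[0]


print(block_number_from_od(["2201", "0000", "c6fd", "0000"]))   # 50941
```

The same helper applied to the other dumps in this issue gives the values discussed in the comments, for example 0xf0fd = 61693 and 0xc800 = 51200.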

Comment by ubTools Support [ 10/Mar/09 03:59 AM ]
Computing the Offset of Corrupted ASM Block:

SQL> select GROUP_NUMBER,NAME,ALLOCATION_UNIT_SIZE from v$asm_diskgroup;

GROUP_NUMBER NAME                      ALLOCATION_UNIT_SIZE
------------ ------------------------- --------------------
           1 DATA                                   1048576

SQL>  select  GROUP_NUMBER,  DISK_NUMBER, name, path
           from v$asm_disk;

GROUP_NUMBER DISK_NUMBER NAME                      PATH
------------ ----------- ------------------------- --------------------
           1           0 DATA_0000                 /u01/oradata/oravol1
           1           1 DATA_0001                 /u01/oradata/oravol2
           1           2 DATA_0002                 /u01/oradata/oravol3
           1           3 DATA_0003                 /u01/oradata/oravol4
           1           4 DATA_0004                 /u01/oradata/oravol5
  • ASM File Name: +DATA/orcl/onlinelog/group_1.516.680795507
  • ASM File#: 516
  • Corrupted Block#: 50941
  • File Block Size:
SQL> select BLOCK_SIZE from  v$asm_file where FILE_NUMBER=516;

BLOCK_SIZE
----------
       512
  • Blocks per ASM Extent: 1048576/512 = 2048
  • ASM Extent#: 50941/2048 = 24 (rounded down)
  • Block# in ASM Extent: 50941 - 24*2048 = 1789
  • Disk# and ASM Extent Offset:
SQL> select DISK_KFFXP,  AU_KFFXP from x$kffxp
     where XNUM_KFFXP=24 and group_kffxp=1 and  NUMBER_KFFXP=516;

DISK_KFFXP   AU_KFFXP
---------- ----------
         1      60884

Disk#1: /u01/oradata/oravol2
ASM Extent Offset: 60884*1048576 = 63841501184 --> 0xEDD400000
ASM Corrupted Block Offset: 63841501184 + 1789*512 = 63842417152 --> 0xEDD4DFA00
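The arithmetic above can be re-checked with a short Python sketch. The function names are hypothetical, the constants come from the query results in this comment, and, as in the calculation above, one file extent is assumed to map to exactly one allocation unit.

```python
AU_SIZE = 1048576     # ALLOCATION_UNIT_SIZE from v$asm_diskgroup
BLOCK_SIZE = 512      # BLOCK_SIZE from v$asm_file


def locate(file_block: int):
    """Split a file block number into (extent number, block within extent)."""
    blocks_per_extent = AU_SIZE // BLOCK_SIZE
    return divmod(file_block, blocks_per_extent)


def disk_offset(au_number: int, block_in_extent: int) -> int:
    """Byte offset on the ASM disk for a block inside allocation unit au_number."""
    return au_number * AU_SIZE + block_in_extent * BLOCK_SIZE


extent, block_in_extent = locate(50941)
offset = disk_offset(60884, block_in_extent)   # 60884 is AU_KFFXP from x$kffxp

print(extent, block_in_extent)   # 24 1789
print(offset, hex(offset))       # 63842417152 0xedd4dfa00
print(offset // BLOCK_SIZE)      # 124692221
```

The last value is the disk offset expressed in 512-byte blocks, which is the unit needed for a `dd` call with bs=512.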

Comment by ubTools Support [ 10/Mar/09 04:47 AM ]
Interpreting the truss Output of ARCH:

fd#261 is /u01/oradata/oravol2 for ARCH.

Reading Offsets by ARCH:

bash-3.00$ grep "pread(261" arc0.truss.log
26085:  pread(261, 0xFFFFFD7FFC32DE00, 131072, 0xEDE600000) = 131072
26085:  pread(261, 0xFFFFFD7FFC21CE00, 131072, 0xEDE620000) = 131072
26085:  pread(261, 0xFFFFFD7FFC10BE00, 131072, 0xEDE640000) = 131072
26085:  pread(261, 0xFFFFFD7FFBE2DE00, 131072, 0xEDE660000) = 131072
26085:  pread(261, 0xFFFFFD7FFBA2DE00, 131072, 0xEDE680000) = 131072
26085:  pread(261, 0xFFFFFD7FFB42DE00, 131072, 0xEDE6A0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB53DE00, 131072, 0xEDE6C0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB64DE00, 131072, 0xEDE6E0000) = 131072
26085:  pread(261, 0xFFFFFD7FFADCDE00, 131072, 0xEDE700000) = 131072
26085:  pread(261, 0xFFFFFD7FFAE6DE00, 131072, 0xEDE800000) = 131072
26085:  pread(261, 0xFFFFFD7FFAEDDE00, 131072, 0xEDE720000) = 131072
26085:  pread(261, 0xFFFFFD7FFAF7DE00, 131072, 0xEDE820000) = 131072
26085:  pread(261, 0xFFFFFD7FFC2CDE00, 131072, 0xEDE740000) = 131072
26085:  pread(261, 0xFFFFFD7FFC36DE00, 131072, 0xEDE840000) = 131072
26085:  pread(261, 0xFFFFFD7FFC1BCE00, 131072, 0xEDE760000) = 131072
26085:  pread(261, 0xFFFFFD7FFC25CE00, 131072, 0xEDE860000) = 131072
26085:  pread(261, 0xFFFFFD7FFC0ABE00, 131072, 0xEDE780000) = 131072
26085:  pread(261, 0xFFFFFD7FFC14BE00, 131072, 0xEDE880000) = 131072
26085:  pread(261, 0xFFFFFD7FFBDCDE00, 131072, 0xEDE7A0000) = 131072
26085:  pread(261, 0xFFFFFD7FFBE6DE00, 131072, 0xEDE8A0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB9CDE00, 131072, 0xEDE7C0000) = 131072
26085:  pread(261, 0xFFFFFD7FFBA6DE00, 131072, 0xEDE8C0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB3CDE00, 131072, 0xEDE7E0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB46DE00, 131072, 0xEDE8E0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB51DE00, 131072, 0xEDE900000) = 131072
26085:  pread(261, 0xFFFFFD7FFB62DE00, 131072, 0xEDE920000) = 131072
26085:  pread(261, 0xFFFFFD7FFAE0DE00, 131072, 0xEDE940000) = 131072
26085:  pread(261, 0xFFFFFD7FFAF1DE00, 131072, 0xEDE960000) = 131072
26085:  pread(261, 0xFFFFFD7FFC30DE00, 131072, 0xEDE980000) = 131072
26085:  pread(261, 0xFFFFFD7FFC1FCE00, 131072, 0xEDE9A0000) = 131072
26085:  pread(261, 0xFFFFFD7FFC0EBE00, 131072, 0xEDE9C0000) = 131072
26085:  pread(261, 0xFFFFFD7FFBE0DE00, 131072, 0xEDE9E0000) = 131072
26085:  pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512
26085:  pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
26085:  pread(261, 0xFFFFFD7FFBA4DE00, 131072, 0xEDD500000) = 131072
26085:  pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085:  pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
26085:  pread(261, 0xFFFFFD7FFBA4DE00, 131072, 0xEDD500000) = 131072
26085:  pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085:  pread(261, 0xFFFFFD7FFC53BE00, 16384, 0xEDDED4000) = 16384
bash-3.00$

As seen above, offsets starting with 0xEDE, 0xEDD5 and 0xEDDE are greater than our corrupted offset of 0xEDD4DFA00, so they are out of scope.

The following calls should be examined:

  • 26085: pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512
    • This is the ASM Extent Offset. In other words, it's the base offset. (0xEDD400000+512)<0xEDD4DFA00. So, it doesn't read the corrupted block.
  • 26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
    • (0xEDD400200+130560)<0xEDD4DFA00. It doesn't read the corrupted block.
  • 26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
    • (0xEDD420000+512)<0xEDD4DFA00. It doesn't read the corrupted block.
  • 26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
    • Same as before.
  • 26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
    • Same as before.

ARCH did not read the corrupted block #50941, but it still reported an error.
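The comparison above can be expressed as a range-overlap test. The sketch below is only an illustration with a hypothetical helper `covers`. It checks the distinct pread() calls near the corrupted offset, taken from the truss output above, against the 512-byte block at 0xEDD4DFA00.

```python
CORRUPT_OFFSET = 0xEDD4DFA00   # offset of block 50941 on /u01/oradata/oravol2
BLOCK_SIZE = 512


def covers(read_offset: int, length: int,
           target: int = CORRUPT_OFFSET, size: int = BLOCK_SIZE) -> bool:
    """True if [read_offset, read_offset + length) overlaps the target block."""
    return read_offset < target + size and target < read_offset + length


# (offset, length) pairs of the pread(261, ...) calls closest to the target.
preads = [
    (0xEDD400000, 512),
    (0xEDD400200, 130560),
    (0xEDD420000, 512),
    (0xEDD500000, 131072),
    (0xEDDED4000, 16384),
]

for offset, length in preads:
    print(hex(offset), length, covers(offset, length))

print(any(covers(offset, length) for offset, length in preads))   # False
```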

dd Output of the Corrupted Block:

ASM Corrupted Block Offset in 512 byte block: 63842417152/512=124692221

bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124692221 count=1|od -x

0000000 2201 0000 f0fd 0000 001b 0000 80d8 2304
                  <blockNo>
0000020 3838 322e 3731 312e 3431 7807 0a6c 111e
0000040 2230 3001 002c 0605 3131 3730 3130 3306

0x0000f0fd (61693) is not 50941, so the block is corrupted.

The reason why ARCH did not read this block is hidden in the error messages:

ORA-00353: log corruption near block 50941 change 9160702125 time 03/09/2009 1

It says "near".

Comment by ubTools Support [ 10/Mar/09 06:16 AM ]
Finding the Other Corrupted Block:

dd Outputs on pread() of ARCH:

  • 26085: pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512
    • Offset: 0xEDD400000 = 63841501184
    • Offset in 512 byte block: 63841501184/512=124690432
      bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124690432 count=1|od -x
      
      0000000 2201 0000 c000 0000 001b 0000 8000 621d
                        <blockNo>
      ...
      
  • 26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
    • First Block Offset: 0xEDD400200 = 63841501696
    • First Block Offset in 512 byte block: 63841501696/512=124690433 (the block right after the previous one)
      bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124690433 count=1|od -x
      
      0000000 2201 0000 c001 0000 001b 0000 8124 5172
                        <blockNo>
      ..
      
    • Last Block Offset: 0xEDD400200 + 130560-512= 63841631744
    • Last Block Offset in 512 byte block: 63841631744/512=124690687
      bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124690687 count=1|od -x
      
      0000000 2201 0000 c0ff 0000 001b 0000 8018 4635
                        <blockNo>
      ..
      
  • 26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
    • Offset: 0xEDD420000 = 63841632256
    • Offset in 512 byte block: 63841632256/512 = 124690688
      bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124690688 count=1|od -x
      
      0000000 2201 0000 c800 0000 001b 0000 805c 2d48
                        <blockNo>
      ..
      

As seen above, the block numbers increase from 0xC000 to 0xC0FF. In the last call, however, the block number jumps to 0xC800 instead of the expected 0xC100.
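The expected block number for each disk block can be derived from its position inside the allocation unit. The sketch below is only an illustration: `expected_block_no` is a hypothetical helper, the observed values are copied from the dd outputs above, and it assumes that extent 24 of the file starts at disk block 124690432, as computed earlier in this issue.

```python
AU_FIRST_DISK_BLOCK = 124690432       # 0xEDD400000 / 512
EXTENT_FIRST_FILE_BLOCK = 24 * 2048   # first file block of extent 24 = 0xC000


def expected_block_no(disk_block: int) -> int:
    """Redo block number that should be stored at the given 512-byte disk block."""
    return EXTENT_FIRST_FILE_BLOCK + (disk_block - AU_FIRST_DISK_BLOCK)


# disk block -> block number found in the dd output
observed = {
    124690432: 0xC000,
    124690433: 0xC001,
    124690687: 0xC0FF,
    124690688: 0xC800,
}

for disk_block, seen in observed.items():
    expected = expected_block_no(disk_block)
    status = "OK" if expected == seen else "MISMATCH"
    print(f"{disk_block}: expected {expected:#06x}, observed {seen:#06x} -> {status}")
```

Only the last entry is reported as a mismatch, which matches the jump described above.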

truss Output of ARCH for block# 0xC800:

26085:  pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085:    01 "\0\0\0C8\0\01B\0\0\0 \80 H -\00505 4 1 4 5 0\v 6 6 6 6 6 6 4
                 <blockNo>
26085:     1 4 5 00F 2 1 2 . 1 5 6 . 2 3 0 . 2 1 807 x l\n07\f %1F01 0 ,\0
26085:    0505 3 5 6 0 705 3 8 0 3 50E 8 8 . 2 4 1 . 1 3 6 . 2 2 007 x l\n
26085:    07\f %1F01 0 ,\00505 6 2 0 5 1\b a d a m k a c i0E 1 9 5 . 2 4 4
26085:     . 6 2 . 1 4 507 x l\n07\f % "01 0 ,\00505 6 2 0 5 1\b a d a m k
26085:     a c i\f 7 8 . 1 9 0 . 6 8 . 1 707 x l\n07\f % #01 0 ,\00502 - 1
26085:    05 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07\f % .02 - 2
26085:     ,\00502 - 105 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07
26085:    \f &0102 - 2 ,\00505 6 1 1 4 105 1 9 5 5 60E 1 9 5 . 2 4 4 . 6 2
26085:     . 1 4 707 x l\n07\f &\r01 0 ,\00505 6 1 1 4 105 1 9 5 5 6\f 8 8
26085:     . 2 3 4 . 5 . 2 3 107 x l\n07\f &0F01 0 ,\00502 - 105 K A Y A 2
26085:    0E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07\f &1002 - 2 ,\00506 1 1
26085:     1 0 1 605 O K A Y A\f 8 5 . 1 0 8 . 8 7 . 5 007 x l\n07\f & !01
26085:     0 ,\00502 - 105 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n
26085:    07\f & "02 - 2 ,\00505 4 1 9 3 806 6 4 3 2 5 5\r 8 8 . 2 2 5 . 1
26085:     2 0 . 5 307 x l\n07\f & +01 0 ,\00505 5 3 0 5 506 0 9 1 2 1 90E

Then, the following messages were written to the trace file:

26085:  write(2, " * * *   2 0 0 9 - 0 3 -".., 27)      = 27
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, "  ", 1)                               = 1
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " C o r r u p t   r e d o".., 51)      = 51
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " F l a g :   0 x 3 0   F".., 80)      = 80
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " - - - - -   D u m p   o".., 39)      = 39
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 4 6 3 8 3 0 2 0 3 0".., 64)      = 64
                                   <blockNoPiece0>
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 0 3 0 4 3 3 c 5 c 3 0".., 64)      = 64
           <blockNoPiece1>
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 0 5 c 5 0 3 0 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 0 5 c 3 0 2 0 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 2 3 9 3 5 3 2 2 0 0 9".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 0 3 1 3 0 3 0 2 0 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 1 3 0 3 0 5 c 3 2".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 4 3 0 3 0 5 c 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 2 0 3 0 5 c 3 2 3 0 3 9".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 0 5 c 3 0 2 0 3 6".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 0 3 0 4 3 3 c 2 0 3 7".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 a 3 5 3 2 3 9 3 0 2 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 0 5 c 3 0 3 0 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 4 2 5 4 2 0 4 4 0 a 4 9".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 2 0 3 8 2 0 3 0 2 0 3 5".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 0 3 5 3 0 3 6 5 c 3 8".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " R e r e a d i n g   l o".., 78)      = 78
26085:  write(2, "\n", 1)                               = 1

Rereading the block fails in the same way.

There are 2 problems:

  • The redo block# jumped from 0xC0FF to 0xC800, so the on-disk image is corrupted.
  • The in-memory image of the block is different from the on-disk image.
Comment by ubTools Support [ 10/Mar/09 10:19 AM ]
Checking for missing IO of LGWR in the truss output:
bash-3.00$ grep Err lgwr.truss.log|grep pwrite
bash-3.00$ grep Err lgwr.truss.log|grep pread
bash-3.00$

No missing IO.

Checking IO buffers of LGWR:

fd#260 is /u01/oradata/oravol2 for LGWR.
Offset: 0xEDD420000.

The Last write to block:

25925: pwrite(260, 0x380D78400, 76288, 0xEDD420000) = 76288
25925: 01 "\0\0\0C8\0\01B\0\0\0 \80 H -\00505 4 1 4 5 0\v 6 6 6 6 6 6 4
                 <blockNo>
25925: 1 4 5 00F 2 1 2 . 1 5 6 . 2 3 0 . 2 1 807 x l\n07\f %1F01 0 ,\0
25925: 0505 3 5 6 0 705 3 8 0 3 50E 8 8 . 2 4 1 . 1 3 6 . 2 2 007 x l\n

As seen above, the contents of the redo buffer are corrupted: the block number is 0xC800.

However, this LGWR had generated a correct archive log:

bash-3.00$ dd if=/u01/app/oracle/product/10.2.0/dbs/arch/1_25_681074311.dbf bs=512 skip=256 count=1|od -x
1+0 records in
1+0 records out
0000000 2201 0000 0100 0000 0019 0000 8000 d162
                  <blockNo>
0000020 3534 332e 2e33 3032 0733 6b78 0904 3c0c
0000040 0114 2c30 0500 3205 3031 3631 6905 6e69

0x0100 = 256, which is the correct block number.

Comment by ubTools Support [ 10/Mar/09 10:35 AM ]
This looks like a configuration issue or a bug on the OS/storage side.

This issue covers redo corruption only. However, the database also encounters corruption in UNDO, INDEX, TABLE and CONTROL FILES. The root cause is the same:
The on-disk image of the block and its in-memory image are not the same.

Similar to QA-37.

This issue will be updated when a comment is sent by the OS vendor.

Comment by ubTools Support [ 10/Apr/09 01:13 PM ]
The operating system was reinstalled by the vendor. The problem has not occurred since then.




[QA-46] ORA-12545: Connect failed in RAC environment because of an implicit redirect to another node. Created: 27/Feb/09  Updated: 27/Feb/09

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - SQL*Net Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Not a Problem Votes: 0

Product Version: Oracle 10.2.0.4 Standard Edition, RAC
Operating System: Solaris
Operating System Version: 10

 Description   
Description:

The clients cannot connect to the database and receive an ORA-12545 error, even though they can ping the database server.

Diagnostic Data for Oracle:

Remote and Local Listeners for Both Nodes:

SQL> show parameter listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string
remote_listener                      string      LISTENERS_ORCL
SQL>

Remote Listener Configuration for Both Nodes:

LISTENERS_ORCL =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = gdksun1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = gdksun2-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = gdksun1-pubext-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = gdksun2-pubext-vip)(PORT = 1521))
  )

tns alias

SUNGDK =
 (DESCRIPTION =
   (ADDRESS = (PROTOCOL = TCP)(HOST = <IP0>)(PORT = 1521))
   (ADDRESS = (PROTOCOL = TCP)(HOST = <IP1>)(PORT = 1521))
   (LOAD_BALANCE = yes)
   (CONNECT_DATA =
     (SERVER = DEDICATED)
     (SERVICE_NAME = ORCL)
     (FAILOVER_MODE =
       (TYPE = SELECT)
       (METHOD = BASIC)
       (RETRIES = 180)
       (DELAY = 5)
     )
   )
 )

sqlnet trace parameters

TRACE_LEVEL_CLIENT      = 16
TRACE_FILE_CLIENT       = sqlnet.trc
TRACE_DIRECTORY_CLIENT  = <directoryName>
TRACE_UNIQUE_CLIENT     = ON
TRACE_TIMESTAMP_CLIENT  = ON

sqlnet trace

(5996) [27-ŞUB-2009 20:57:50:875] nttgetport: port resolved to 1521
(5996) [27-ŞUB-2009 20:57:50:875] nttgetport: exit
(5996) [27-ŞUB-2009 20:57:50:875] nttbnd2addr: using host IP address: <IP1>
(5996) [27-ŞUB-2009 20:57:50:875] nttbnd2addr: exit
(5996) [27-ŞUB-2009 20:57:50:875] nsc2addr: normal exit

The host IP and port are resolved to <IP1> and 1521, respectively.

(5996) [27-ŞUB-2009 20:57:50:937] nscon: sending NSPTCN packet
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: entry
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: plen=58, type=1
(5996) [27-ŞUB-2009 20:57:50:937] nttwr: entry
(5996) [27-ŞUB-2009 20:57:50:937] nttwr: socket 420 had bytes written=58
(5996) [27-ŞUB-2009 20:57:50:937] nttwr: exit
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 58 bytes to transport
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: packet dump
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 00 3A 00 00 01 00 00 00  |.:......|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 01 38 01 2C 00 00 08 00  |.8.,....|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 7F FF 86 0E 00 00 01 00  |........|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 01 3E 00 3A 00 00 02 00  |.>.:....|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 21 21 00 00 00 00 00 00  |!!......|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 00 00 00 00 0A C0 00 00  |........|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 00 0A 00 00 00 00 00 00  |........|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 00 00                    |..      |
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: normal exit

A connect packet (NSPTCN) is sent to <IP1>.

(5996) [27-ŞUB-2009 20:57:50:937] nsdofls: sending NSPTDA packet
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: entry
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: plen=328, type=6
(5996) [27-ŞUB-2009 20:57:50:937] nttwr: entry
(5996) [27-ŞUB-2009 20:57:50:937] nttwr: socket 420 had bytes written=328
(5996) [27-ŞUB-2009 20:57:50:937] nttwr: exit
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 328 bytes to transport
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: packet dump
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 01 48 00 00 06 00 00 00  |.H......|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 00 00 28 44 45 53 43 52  |..(DESCR|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 49 50 54 49 4F 4E 3D 28  |IPTION=(|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 41 44 44 52 45 53 53 3D  |ADDRESS=|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 28 50 52 4F 54 4F 43 4F  |(PROTOCO|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 4C 3D 54 43 50 29 28 48  |L=TCP)(H|
...
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 75 72 61 64 3F 54 75 6C  |urad?Tul|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: 75 6E 61 79 29 29 29 29  |unay))))|
(5996) [27-ŞUB-2009 20:57:50:937] nspsend: normal exit

A data packet (NSPTDA) is sent to <IP1>.

(5996) [27-ŞUB-2009 20:57:50:937] nscon: recving a packet
(5996) [27-ŞUB-2009 20:57:50:937] nsprecv: entry
(5996) [27-ŞUB-2009 20:57:50:937] nsbal: entry
(5996) [27-ŞUB-2009 20:57:50:937] nsbgetfl: entry
(5996) [27-ŞUB-2009 20:57:50:937] nsbgetfl: normal exit
(5996) [27-ŞUB-2009 20:57:50:937] nsmal: entry
(5996) [27-ŞUB-2009 20:57:50:937] nsmal: 48 bytes at 0x15bcf60
(5996) [27-ŞUB-2009 20:57:50:937] nsmal: normal exit
(5996) [27-ŞUB-2009 20:57:50:937] nsbal: normal exit
(5996) [27-ŞUB-2009 20:57:50:937] nsprecv: reading from transport...
(5996) [27-ŞUB-2009 20:57:50:937] nttrd: entry
(5996) [27-ŞUB-2009 20:57:50:968] nttrd: socket 420 had bytes read=10
(5996) [27-ŞUB-2009 20:57:50:968] nttrd: exit
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 10 bytes from transport
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: tlen=10, plen=10, type=5
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: packet dump
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 00 0A 00 00 05 02 00 00  |........|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 01 85                    |..      |
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: normal exit
(5996) [27-ŞUB-2009 20:57:50:968] nscon: got NSPTRD packet

A redirect packet (NSPTRD) is received from <IP1>.

(5996) [27-ŞUB-2009 20:57:50:968] nsrdr: recving a packet
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: entry
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: reading from transport...
(5996) [27-ŞUB-2009 20:57:50:968] nttrd: entry
(5996) [27-ŞUB-2009 20:57:50:968] nttrd: socket 420 had bytes read=399
(5996) [27-ŞUB-2009 20:57:50:968] nttrd: exit
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 399 bytes from transport
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: tlen=399, plen=399, type=6
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: packet dump
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 01 8F 00 00 06 00 00 00  |........|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 00 40 28 41 44 44 52 45  |.@(ADDRE|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 53 53 3D 28 50 52 4F 54  |SS=(PROT|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 4F 43 4F 4C 3D 54 43 50  |OCOL=TCP|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 29 28 48 4F 53 54 3D 67  |)(HOST=g|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 64 6B 73 75 6E 32 29 28  |dksun2)(|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 50 4F 52 54 3D 31 35 32  |PORT=152|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 31 29 29 00 28 44 45 53  |1)).(DES|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 43 52 49 50 54 49 4F 4E  |CRIPTION|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 3D 28 41 44 44 52 45 53  |=(ADDRES|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 53 3D 28 50 52 4F 54 4F  |S=(PROTO|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 43 4F 4C 3D 54 43 50 29  |COL=TCP)|
...
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 3D 4D 75 72 61 64 3F 54  |=Murad?T|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 75 6C 75 6E 61 79 29 29  |ulunay))|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 28 49 4E 53 54 41 4E 43  |(INSTANC|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 45 5F 4E 41 4D 45 3D 6F  |E_NAME=o|
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: 72 63 6C 32 29 29 29     |rcl2))) |
(5996) [27-ŞUB-2009 20:57:50:968] nsprecv: normal exit
(5996) [27-ŞUB-2009 20:57:50:968] nsrdr: got NSPTDA packet

A data packet (NSPTDA) is received from <IP1>. It carries the redirect address (HOST=gdksun2)(PORT=1521).

(5996) [27-ŞUB-2009 20:57:50:984] nttgetport: port resolved to 1521
(5996) [27-ŞUB-2009 20:57:50:984] nttgetport: exit
(5996) [27-ŞUB-2009 20:57:50:984] nttbnd2addr: looking up IP addr for host: gdksun2
(5996) [27-ŞUB-2009 20:57:53:640] nttbnd2addr:  *** hostname lookup failure! ***
(5996) [27-ŞUB-2009 20:57:53:640] nttbnd2addr: exit


As seen above, although the initial request was sent to <IP1>, it is now redirected to a host named gdksun2, and the client cannot resolve that hostname.
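
The redirect target can also be pulled out of the packet dump mechanically. The following is a minimal Python sketch, assuming the dump lines keep the layout shown above (a run of hex bytes after "nsprecv:" followed by the "|...|" text column). The function names are hypothetical and are not part of any Oracle tool.

```python
import re

# Hex-byte column of an "nsprecv:" packet-dump line; the trailing "|"
# is required so that lines such as "nsprecv: 10 bytes ..." are skipped.
HEX_COLUMN = re.compile(r"nsprecv: ((?:[0-9A-F]{2} )+) *\|")


def payload_from_dump(lines):
    """Concatenate the hex bytes of packet-dump lines into a bytes object."""
    data = bytearray()
    for line in lines:
        match = HEX_COLUMN.search(line)
        if match:
            data += bytes.fromhex(match.group(1))
    return bytes(data)


def redirect_hosts(payload):
    """Return every HOST value found in the decoded payload."""
    text = payload.decode("latin-1")
    return re.findall(r"\(HOST=([^)]+)\)", text)


# Seven dump lines from the redirect packet above (timestamps omitted).
sample = [
    "nsprecv: 00 40 28 41 44 44 52 45  |.@(ADDRE|",
    "nsprecv: 53 53 3D 28 50 52 4F 54  |SS=(PROT|",
    "nsprecv: 4F 43 4F 4C 3D 54 43 50  |OCOL=TCP|",
    "nsprecv: 29 28 48 4F 53 54 3D 67  |)(HOST=g|",
    "nsprecv: 64 6B 73 75 6E 32 29 28  |dksun2)(|",
    "nsprecv: 50 4F 52 54 3D 31 35 32  |PORT=152|",
    "nsprecv: 31 29 29 00 28 44 45 53  |1)).(DES|",
]
print(redirect_hosts(payload_from_dump(sample)))  # ['gdksun2']
```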

(5996) [27-ŞUB-2009 20:57:53:640] nserror: entry
(5996) [27-ŞUB-2009 20:57:53:640] nserror: nsres: id=0, op=77, ns=12545, ns2=12560; nt[0]=515, nt[1]=1001, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
(5996) [27-ŞUB-2009 20:57:53:640] snsbitts_ts: entry
(5996) [27-ŞUB-2009 20:57:53:640] snsbitts_ts: acquired the bit
(5996) [27-ŞUB-2009 20:57:53:640] snsbitts_ts: normal exit
(5996) [27-ŞUB-2009 20:57:53:640] snsbitcl_ts: entry
(5996) [27-ŞUB-2009 20:57:53:640] snsbitcl_ts: normal exit
(5996) [27-ŞUB-2009 20:57:53:640] nsc2addr: error exit
(5996) [27-ŞUB-2009 20:57:53:640] nsmfr: entry
(5996) [27-ŞUB-2009 20:57:53:640] nsmfr: 318 bytes at 0x15bce20
(5996) [27-ŞUB-2009 20:57:53:640] nsmfr: normal exit
(5996) [27-ŞUB-2009 20:57:53:640] nsmfr: entry
(5996) [27-ŞUB-2009 20:57:53:640] nsmfr: 164 bytes at 0x15b9920
(5996) [27-ŞUB-2009 20:57:53:640] nsmfr: normal exit
(5996) [27-ŞUB-2009 20:57:53:640] nladtrm: entry
(5996) [27-ŞUB-2009 20:57:53:640] nladtrm: exit
(5996) [27-ŞUB-2009 20:57:53:640] nscall: error exit
(5996) [27-ŞUB-2009 20:57:53:640] nioqper:  error from nscall
(5996) [27-ŞUB-2009 20:57:53:640] nioqper:    nr err code: 0
(5996) [27-ŞUB-2009 20:57:53:640] nioqper:    ns main err code: 12545
(5996) [27-ŞUB-2009 20:57:53:640] nioqper:    ns (2)  err code: 12560
(5996) [27-ŞUB-2009 20:57:53:640] nioqper:    nt main err code: 515
(5996) [27-ŞUB-2009 20:57:53:640] nioqper:    nt (2)  err code: 1001
(5996) [27-ŞUB-2009 20:57:53:640] nioqper:    nt OS   err code: 0
(5996) [27-ŞUB-2009 20:57:53:640] niomapnserror: entry
(5996) [27-ŞUB-2009 20:57:53:640] niqme: entry
(5996) [27-ŞUB-2009 20:57:53:640] niqme: reporting NS-12545 error as ORA-12545
(5996) [27-ŞUB-2009 20:57:53:640] niqme: exit
(5996) [27-ŞUB-2009 20:57:53:640] niomapnserror: returning error 12545
(5996) [27-ŞUB-2009 20:57:53:640] niomapnserror: exit
(5996) [27-ŞUB-2009 20:57:53:640] niotns: Couldn't connect, returning 12545

As a result, the client gets the ORA-12545 error.



 Comments   
Comment by ubTools Support [ 27/Feb/09 10:16 PM ]
In this issue, the client was redirected to the other, less loaded node, which the remote client cannot reach.

This is expected behavior, for the following reasons:

  • Depending on the listener.ora configuration, the listener sends an IP address or a hostname back to the client.
  • When load balancing is in use in a RAC environment, a request sent to one listener may be redirected to another node if that node is less loaded.

In both cases, if the listener sends back an IP address or hostname that the client cannot reach, the client encounters an error.

Solutions:

  • Replace the hostname with an IP address in listener.ora, or make the hostname resolvable on the client side by adding it to DNS or to an "/etc/hosts"-like configuration file.
  • If the database server has multiple IP addresses and each of them is reachable by only some group of clients, define a separate listener for each group and allow only one listener to take part in load balancing.
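
Before changing the configuration, it can help to confirm on the client which listener hostnames fail to resolve. The following is a minimal Python sketch, assuming the address list is available as text. The function names are hypothetical, and the resolver is passed in as a parameter so that it can be replaced in tests.

```python
import re
import socket


def hosts_in_descriptor(descriptor):
    """Extract HOST values from a TNS address list or connect descriptor."""
    pattern = r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)"
    return re.findall(pattern, descriptor, re.IGNORECASE)


def unresolvable_hosts(hosts, resolve=socket.gethostbyname):
    """Return the hosts that the given resolver cannot look up."""
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed


LISTENERS_ORCL = """
(ADDRESS_LIST =
  (ADDRESS = (PROTOCOL = TCP)(HOST = gdksun1-vip)(PORT = 1521))
  (ADDRESS = (PROTOCOL = TCP)(HOST = gdksun2-vip)(PORT = 1521))
  (ADDRESS = (PROTOCOL = TCP)(HOST = gdksun1-pubext-vip)(PORT = 1521))
  (ADDRESS = (PROTOCOL = TCP)(HOST = gdksun2-pubext-vip)(PORT = 1521))
)
"""
hosts = hosts_in_descriptor(LISTENERS_ORCL)
print(hosts)
```

Calling unresolvable_hosts(hosts) on the client machine lists the names that would need a DNS or hosts-file entry. A name that resolves may still be unreachable, so this covers only the hostname lookup failure seen in the trace.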




[QA-45] 'direct path read temp' hangs on read() system call when ASMLIB in use. Created: 02/Feb/09  Updated: 03/Feb/09

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle 10.2.0.4 Standard Edition, RAC
Operating System: Linux
Operating System Version: SLES 10 SP1 (x86-64)

 Description   
Environment:
Database......: Oracle 10.2.0.4 Standard Edition, RAC
ASMLIB........: oracleasm-2.6.16.46-0.12-smp-2.0.3-1.x86_64.rpm
                oracleasmlib-2.0.2-1.x86_64.rpm
                oracleasm-support-2.1.2-1.SLE10.x86_64.rpm

Description:

A direct path read temp wait hangs in the read() system call when ASMLIB is in use.

Diagnostic Data for Oracle:

Wait Event:

SQL> select SEQ#,EVENT,P1,P2,P3,WAIT_TIME,SECONDS_IN_WAIT from v$session_wait where sid=512 and state='WAITING';

      SEQ# EVENT
---------- ----------------------------------------------------------------
        P1         P2         P3  WAIT_TIME SECONDS_IN_WAIT
---------- ---------- ---------- ---------- ---------------
     46619 direct path read temp
       202     285578          7          0            5611
...
SQL> select SEQ#,EVENT,P1,P2,P3,WAIT_TIME,SECONDS_IN_WAIT from v$session_wait where sid=512 and state='WAITING';

      SEQ# EVENT
---------- ----------------------------------------------------------------
        P1         P2         P3  WAIT_TIME SECONDS_IN_WAIT
---------- ---------- ---------- ---------- ---------------
     46619 direct path read temp
       202     285578          7          0            5824

SQL>

The session has been waiting on direct path read temp for 5824 seconds, and the SEQ# column is not changing, so it is still the same wait. That is far too long to read just 7 blocks (P3) from disk.
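
The reasoning above (same SEQ#, growing SECONDS_IN_WAIT) can be expressed as a small check over two samples. The following is a minimal Python sketch. The WaitSample type, the is_stuck function and the 60-second threshold are hypothetical choices for illustration, and fetching the rows from v$session_wait is left out.

```python
from dataclasses import dataclass


@dataclass
class WaitSample:
    seq: int              # SEQ# from v$session_wait
    event: str            # EVENT
    seconds_in_wait: int  # SECONDS_IN_WAIT


def is_stuck(first, second, threshold_seconds=60):
    """Return True when both samples show one wait that keeps growing.

    An unchanged SEQ# and event mean the session never left the wait
    between the two samples; the threshold is an arbitrary cutoff.
    """
    same_wait = first.seq == second.seq and first.event == second.event
    still_growing = second.seconds_in_wait > first.seconds_in_wait
    return same_wait and still_growing and second.seconds_in_wait >= threshold_seconds


# The two samples shown above for sid=512.
a = WaitSample(46619, "direct path read temp", 5611)
b = WaitSample(46619, "direct path read temp", 5824)
print(is_stuck(a, b))  # True
```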

Stack Trace:

SQL> select spid from v$session s,v$process p where s.paddr=p.addr and s.sid=512;

SPID
------------
2359

SQL> oradebug SETOSPID 2359
Oracle pid: 38, Unix process pid: 2359, image: oracle@gdksun1

SQL> oradebug dump errorstack 3
Statement processed.

SQL> oradebug TRACEFILE_NAME
/u03/app/oracle/admin/ORCL/udump/orcl1_ora_2359.trc
SQL>

<from the trace file>

Current SQL statement for this session:
CREATE INDEX "ACD2" ON "ACCOUNT_DETAIL" ...
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF177ECC40 ? 7FFF177ECCA0 ?
                                                   7FFF177ECBE0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFF177ECC40 ? 7FFF177ECCA0 ?
                                                   7FFF177ECBE0 ? 000000000 ?
ksdxfdmp()+1118      call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFF177ECC40 ? 7FFF177ECCA0 ?
                                                   7FFF177ECBE0 ? 000000000 ?
ksdxcb()+1547        call     ksdxfdmp()           7FFF177EDD90 ? 000000011 ?
                                                   000000003 ? 7FFF177EDED0 ?
                                                   7FFF177EDE30 ? 000000000 ?
sspuser()+111        call     ksdxcb()             000000001 ? 000000011 ?
                                                   000000001 ? 000000001 ?
                                                   7FFF177EDE30 ? 000000000 ?
__funlockfile()+80   call     sspuser()            000000001 ? 000000011 ?
                                                   000000001 ? 000000001 ?
                                                   7FFF177EDE30 ? 000000000 ?
__read_nocancel()+7  signal   __funlockfile()      00000000D ? 7FFF177EE970 ?
                                                   000000050 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000000 ? 2B4E95CCE000 ?
call_instance_read(  call     __read_nocancel()    00000000D ? 7FFF177EE970 ?
)+12                                               000000050 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000000 ? 2B4E95CCE000 ?
asm_io_v2()+185      call     call_instance_read(  00000000D ? 7FFF177EE970 ?
                              )                    000000050 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000000 ? 2B4E95CCE000 ?
kfkOsmIO()+1205      call     asm_io_v2()          00000000D ? 7FFF177EE970 ?
                                                   000000246 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000000 ? 2B4E95CCE000 ?
kfkReapIO()+497      call     kfkOsmIO()           2B4E95830588 ? 2B4E95AAE000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 2B4E95B2E000 ?
kfkIOPriv()+770      call     kfkReapIO()          000000000 ? 006110320 ?
                                                   2B4E95830588 ? 006110320 ?
                                                   006110320 ? 2B4E95B2E000 ?
kfdIOPriv()+95       call     kfkIOPriv()          000000000 ? 000000000 ?
                                                   000000024 ? 000000000 ?
                                                   2B4E95B66040 ? 000000001 ?
kfioReapIO()+476     call     kfdIOPriv()          000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   2B4E95B66040 ? 000000001 ?
kfioRequest()+197    call     kfioReapIO()         7FFF177EED68 ? 000000001 ?
                                                   0FFFFFFFF ? 000000000 ?
                                                   2B4E95B66040 ? 000000001 ?
ksfd_osmwat()+874    call     kfioRequest()        000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   7FFF177EED68 ? 2B4E00000001 ?
ksfdwtio()+693       call     ksfd_osmwat()        000000001 ? 000000000 ?
                                                   07FFFFFFF ? 000000000 ?
                                                   7FFF177EED68 ? 2B4E00000001 ?
ksfdwat1()+220       call     ksfdwtio()           000000001 ? 000000030 ?
                                                   07FFFFFFF ? 000000000 ?
                                                   7FFF177EED68 ? 2B4E00000001 ?
ksfdrwat0()+1269     call     ksfdwat1()           000000001 ? 000000030 ?
                                                   07FFFFFFF ? 000000000 ?
                                                   7FFF177EED68 ? 2B4E00000001 ?
ksfdblock()+156      call     ksfdrwat0()          000000001 ? 000000030 ?
                                                   07FFFFFFF ? 000000000 ?
                                                   2B4E7FFFFFFF ? 2B4E00000001 ?
kcflwi()+48          call     ksfdblock()          7FFF177F11C0 ? 000000001 ?
                                                   000000010 ? 000000000 ?
                                                   2B4E7FFFFFFF ? 2B4E00000001 ?
kcflci()+689         call     kcflwi()             2B4E95FF3F28 ? 000000001 ?
                                                   000000010 ? 000000000 ?
                                                   2B4E7FFFFFFF ? 2B4E00000001 ?
kcblci()+197         call     kcflci()             2B4E95FF3F28 ? 000000000 ?
                                                   0000000CA ? 000045B8A ?
                                                   7FFF177F1270 ? 000000000 ?
kcblcio()+280        call     kcblci()             2B4E95D168F0 ? 2B4E95FF3E70 ?
                                                   000000001 ? 000045B8A ?
                                                   7FFF177F1270 ? 000000000 ?
kcblsltck()+50       call     kcblcio()            2B4E95D168F0 ? 2B4E95FF3E70 ?
                                                   000000001 ? 000045B8A ?
                                                   7FFF177F1270 ? 000000000 ?
stsCheckIO()+194     call     kcblsltck()          2B4E95D168F0 ? 2B4E95FF3E70 ?
                                                   000000001 ? 000045B8A ?
                                                   7FFF177F1270 ? 000000000 ?
srsnext()+746        call     stsCheckIO()         2B4E95D16FE0 ? 2B4E958F9108 ?
                                                   000000000 ? 000000001 ?
                                                   7FFF177F1270 ? 000000000 ?
srsget()+138         call     srsnext()            2B4E9602FE14 ? 000000000 ?
                                                   2B4E95D16FE0 ? 2B4E958F8F10 ?
                                                   2B4E00000000 ? 000000000 ?
sorgetqbf()+297      call     srsget()             2B4E95D16F28 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   2B4E95D372B0 ? 2B4E95D37468 ?
qersoFetch()+176     call     sorgetqbf()          2B4E95D16F28 ? 2B4E95D37468 ?
                                                   2B4E95D372B0 ? 7FFF177F1584 ?
                                                   2B4E95D372B0 ? 2B4E95D37468 ?
qerliFetch()+304     call     qersoFetch()         5EEF015A0 ? 002D292E4 ?
                                                   7FFF177F1748 ? 000000001 ?
                                                   5EEF01648 ? 2B4E95D37468 ?
kdicrws()+8744       call     qerliFetch()         5EEF01358 ? 00143CCD6 ?
                                                   2B4E9581C140 ? 000000001 ?
                                                   5EEF01648 ? 5D2E60688 ?
kdicdrv()+335        call     kdicrws()            5D2E60688 ? 5D2E60B60 ?
                                                   000000000 ? 000000001 ?
                                                   2B4E95D367B0 ? 5D2E60610 ?
opiexe()+12879       call     kdicdrv()            5D2E60B60 ? 5D2E60380 ?
                                                   000000002 ? 000000001 ?
                                                   2B4E95D367B0 ? 000000000 ?
opiosq0()+3316       call     opiexe()             000000004 ? 000000000 ?
                                                   7FFF177F3748 ? 000000004 ?
                                                   2B4E95D367B0 ? 000000000 ?
opiosq()+11          call     opiosq0()            000000003 ? 00000000F ?
                                                   7FFF177F65E0 ? 000000000 ?
                                                   2B4E95D367B0 ? 000000000 ?
opiodr()+984         call     opiosq()             000000003 ? 00000000F ?
                                                   7FFF177F65E0 ? 000000000 ?
                                                   2B4E95D367B0 ? 000000000 ?
ttcpip()+1012        call     opiodr()             00000004A ? 00000000F ?
                                                   7FFF177F65E0 ? 000000004 ?
                                                   0053E3F30 ? 000000000 ?
opitsk()+1322        call     ttcpip()             0060AB150 ? 7FFF177F4540 ?
                                                   7FFF177F65E0 ? 000000000 ?
                                                   7FFF177F60D8 ? 7FFF177F6748 ?
opiino()+1026        call     opitsk()             000000003 ? 000000000 ?
                                                   7FFF177F65E0 ? 000000001 ?
                                                   000000000 ? 612CA0900000000 ?
opiodr()+984         call     opiino()             00000003C ? 000000004 ?
                                                   7FFF177F77A8 ? 000000000 ?
                                                   000000000 ? 612CA0900000000 ?
opidrv()+547         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFF177F77A8 ? 000000000 ?
                                                   0053E3D00 ? 612CA0900000000 ?
sou2o()+114          call     opidrv()             00000003C ? 000000004 ?
                                                   7FFF177F77A8 ? 000000000 ?
                                                   0053E3D00 ? 612CA0900000000 ?
opimai_real()+163    call     sou2o()              7FFF177F7780 ? 00000003C ?
                                                   000000004 ? 7FFF177F77A8 ?
                                                   0053E3D00 ? 612CA0900000000 ?
main()+116           call     opimai_real()        000000002 ? 7FFF177F7810 ?
                                                   000000004 ? 7FFF177F77A8 ?
                                                   0053E3D00 ? 612CA0900000000 ?
__libc_start_main()  call     main()               000000002 ? 7FFF177F7810 ?
+244                                               000000004 ? 7FFF177F77A8 ?
                                                   0053E3D00 ? 612CA0900000000 ?
_start()+41          call     __libc_start_main()  0006D23A8 ? 000000002 ?
                                                   7FFF177F7968 ? 000000000 ?
                                                   0053E3D00 ? 000000002 ?

This looks like a hang in __read_nocancel().

Diagnostic Data for Linux:

strace Output:

oracle@gdksun1:~> strace -fp 2359
Process 2359 attached - interrupt to quit
read(13,

The process is blocked in a read() system call on file descriptor (fd) 13.

lsof Output:

oracle@gdksun1:~> lsof -p 2359|grep 13
oracle  2359 oracle  DEL    REG   0,12                         131074 /2
oracle  2359 oracle  mem    REG    8,2    133423                16839 /lib64/ld-2.4.so
oracle  2359 oracle  mem    REG   8,17    681761                52138 /u03/app/oracle/product/10.2.0/db_1/lib/libocr10.so
oracle  2359 oracle  mem    REG   8,17    691049                52139 /u03/app/oracle/product/10.2.0/db_1/lib/libocrb10.so
oracle  2359 oracle  mem    REG   8,17  11385162                44025 /u03/app/oracle/product/10.2.0/db_1/lib/libjox10.so
oracle  2359 oracle    6w   REG   8,17   1494136                90989 /u03/app/oracle/admin/ORCL/bdump/alert_ORCL1.log
oracle  2359 oracle   13u   REG   0,19         0 18446604444591769000 /dev/oracleasm/iid/0000000000000002
oracle@gdksun1:~>

File descriptor 13 is an ASMLIB device (/dev/oracleasm/iid/0000000000000002).
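
On Linux the same lookup can be done without lsof, because each entry under /proc/<pid>/fd is a symbolic link to the open file. The following is a minimal Python sketch; the function name is hypothetical, and reading another user's process normally requires the same user or root.

```python
import os


def fd_target(pid, fd):
    """Return the path that /proc/<pid>/fd/<fd> points to (Linux only)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


if __name__ == "__main__":
    # Demonstrate on this process: open /dev/null and resolve its descriptor.
    demo_fd = os.open("/dev/null", os.O_RDONLY)
    print(fd_target(os.getpid(), demo_fd))
    os.close(demo_fd)
```

For the hung process, fd_target(2359, 13) would be expected to return the /dev/oracleasm/iid/... path that lsof reported.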

gdb Output:

oracle@gdksun1:~> gdb $ORACLE_HOME/bin/oracle 2359
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
Using host libthread_db library "/lib64/libthread_db.so.1".
Attaching to program: /u03/app/oracle/product/10.2.0/db_1/bin/oracle, process 2359
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libskgxp10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libskgxp10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libhasgen10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libhasgen10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libskgxn2.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libskgxn2.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libocr10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libocr10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libocrb10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libocrb10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libocrutl10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libocrutl10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libjox10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libjox10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libclsra10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libclsra10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libdbcfg10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libdbcfg10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libnnz10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libnnz10.so
Reading symbols from /usr/lib64/libaio.so.1...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 47616513025792 (LWP 2359)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libnuma.so...done.
Loaded symbols for /usr/lib64/libnuma.so
Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...done.
Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so
0x00002b4e9512f910 in __read_nocancel () from /lib64/libpthread.so.0
(gdb) backtrace
#0  0x00002b4e9512f910 in __read_nocancel () from /lib64/libpthread.so.0
#1  0x00002b4e95d6772c in call_instance_read (priv=<value optimized out>, buf=0x7fff177ee970, size=80)
 at asmlib_v2.c:540
#2  0x00002b4e95d67869 in asm_io_v2 (ctx=0xd, requests=<value optimized out>, reqlen=582,
 waitreqs=0xffffffffffffffff, waitlen=0,
    completions=0x2b4e95cce000, complen=1, timeout=4294967295, statusp=0x7fff177eea14) at asmlib_v2.c:705
#3  0x0000000000b1804d in kfkOsmIO ()
#4  0x0000000000b113e9 in kfkReapIO ()
#5  0x0000000000b0b342 in kfkIOPriv ()
#6  0x0000000000a8939f in kfdIOPriv ()
#7  0x0000000000b06eec in kfioReapIO ()
#8  0x0000000000b04de5 in kfioRequest ()
#9  0x00000000008d669a in ksfd_osmwat ()
#10 0x00000000008be64d in ksfdwtio ()
#11 0x00000000008bb3a4 in ksfdwat1 ()
#12 0x00000000008bb1f5 in ksfdrwat0 ()
#13 0x00000000008bb464 in ksfdblock ()
#14 0x00000000026c0e98 in kcflwi ()
#15 0x00000000026c0e31 in kcflci ()
#16 0x0000000001011435 in kcblci ()
#17 0x0000000001010e20 in kcblcio ()
#18 0x0000000001010ca2 in kcblsltck ()
#19 0x00000000020588f0 in stsCheckIO ()
#20 0x0000000002063908 in srsnext ()
#21 0x0000000002062eba in srsget ()
#22 0x000000000205d089 in sorgetqbf ()
#23 0x0000000002d64222 in qersoFetch ()
#24 0x0000000002d2e412 in qerliFetch ()
#25 0x00000000014327d2 in kdicrws ()
#26 0x000000000142fa47 in kdicdrv ()
#27 0x0000000002f2cf07 in opiexe ()
#28 0x00000000034c55d4 in opiosq0 ()
#29 0x00000000034c48db in opiosq ()
#30 0x00000000012e88f4 in opiodr ()
#31 0x0000000003a4b900 in ttcpip ()
#32 0x00000000012e3fc4 in opitsk ()
#33 0x00000000012e6ee4 in opiino ()
#34 0x00000000012e88f4 in opiodr ()
#35 0x00000000012da313 in opidrv ()
#36 0x0000000001e62466 in sou2o ()
#37 0x00000000006d24cb in opimai_real ()
#38 0x00000000006d241c in main ()
(gdb)

This looks like a hang in __read_nocancel(), the same as in the Oracle stack trace.

An Excerpt from /var/log/messages:

Feb  2 22:46:51 gdksun1 kernel: 509 [RAIDarray.mpp]mppLnx_do_queuecommand: mppLnx_scsi_execute_async failed.

At the same time as the hang, the message "mppLnx_do_queuecommand: mppLnx_scsi_execute_async failed" appeared in /var/log/messages.



 Comments   
Comment by ubTools Support [ 03/Feb/09 12:03 AM ]
The problem is caused by the hang in __read_nocancel() from /lib64/libpthread.so.0.

The OS vendor's driver appears to be incompatible with Oracle ASMLIB.





[QA-44] TNS connection lost for big SQL*Net packets, and slow performance for small SQL*Net packets. Created: 26/Nov/08  Updated: 26/Nov/08

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle 9i
Operating System: TRU64

 Description   
The customer has the following errors in the SQL*Net SERVER trace:
[17-KAS-2008 06:02:33:866] nspsend: transport write error
[17-KAS-2008 06:02:33:867] nserror: nsres: id=0, op=67, ns=12547, ns2=12560; nt[0]=517, nt[1]=32, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[17-KAS-2008 06:02:33:868] nsdo: nsctxrnk=0
[17-KAS-2008 06:02:33:869] nioqsn: send failed: bl = 47, nicbl = 59
[17-KAS-2008 06:02:33:870] nioqper:  error from nioqsn
[17-KAS-2008 06:02:33:871] nioqper:    nr err code: 0
[17-KAS-2008 06:02:33:872] nioqper:    ns main err code: 12547
[17-KAS-2008 06:02:33:873] nioqper:    ns (2)  err code: 12560
[17-KAS-2008 06:02:33:875] nioqper:    nt main err code: 517
[17-KAS-2008 06:02:33:876] nioqper:    nt (2)  err code: 32
[17-KAS-2008 06:02:33:877] nioqper:    nt OS   err code: 0
[17-KAS-2008 06:02:33:878] nioqer: entry
[17-KAS-2008 06:02:33:879] nioqce: entry
[17-KAS-2008 06:02:33:880] nioqce: exit
[17-KAS-2008 06:02:33:881] nioqer: exit
[17-KAS-2008 06:02:33:882] nioqsn:  returning error: 3113

nt[1]=32 is an Operating System Dependent (OSD) error code.

An excerpt from truss output of SERVER process:

531245: lseek(7, 0, SEEK_CUR)                           = 501528
531245: write(7, " [ 1 7 - K A S - 2 0 0 8".., 34)      = 34
531245: lseek(7, 0, SEEK_CUR)                           = 501562
531245: write(7, " e n t r y\n", 6)                     = 6
531245: write(12, "07DB\0\006\0\0\0\0\002C2".., 2011)   Err#32 Broken pipe
531245:     Received signal #13, SIGPIPE [ignored]
531245:       siginfo: SIGPIPE
531245: lseek(8, 69632, SEEK_SET)                       = 69632
531245: read(8, "12\0DB0F\0\0 t\0DC0F\0\0".., 512)      = 512
531245: lseek(8, 13312, SEEK_SET)                       = 13312
531245: read(8, "19\0A203\0\09E\0A303\0\0".., 512)      = 512
531245: gettimeofday(0x000000011FFF7A90, 0x00000000)    = 0
531245: lseek(7, 0, SEEK_CUR)                           = 501568

The OSD error is Err#32 (Broken pipe), which is also defined in errno.h:

  • #define EPIPE 32 /* Broken pipe */
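The same lookup can be done without opening errno.h. The snippet below is a minimal sketch; the helper name describe_errno is hypothetical, and errno numbering is platform-specific, so it should be run on a system whose numbering matches the server's (32 maps to EPIPE in the errno.h quoted above and also on Linux).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, OS message) for a numeric errno value."""
    # errno.errorcode maps numbers to names on the machine running this script.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_errno(32)
print(name, "-", message)
```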

The client-side SQL*Net trace shows that the client is waiting for a response from the server in the nttrd() call.

Since the server process has lost its connection, it cannot send a message to the client. Since the client never gets a response, its window stays in the "Not Responding" state in Windows.



 Comments   
Comment by ubTools Support [ 26/Nov/08 01:38 PM ]
The customer uses a CISCO ASA 5520 series firewall, Version 8.0.4, which has an "inspect sqlnet" option. After this option was disabled, the problem was solved.




[QA-43] Slow performance while navigating on the forms items. Created: 26/Nov/08  Updated: 26/Nov/08

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle 10g 10.2.0.4
Operating System: Windows
Operating System Version: 2003

 Description   
The customer encountered slow performance while navigating on their forms items. The problem occurred sporadically. When it occurs, navigation takes 3-4 seconds, which is not acceptable.

An excerpt from EVENT 10046 trace file:

*** [ Windows thread id: 5580 ]
*** 2008-11-24 21:27:20.390
RPC CALL:...(statement removed);
=====================
PARSING IN CURSOR #5 len=56 dep=1 uid=78 oct=3 lid=78 tim=1420397928
hv=3822139714 ad='57c1a3bc'
SELECT ...(statement removed)
END OF STMT
PARSE #5:c=0,e=32,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1420397924
EXEC #5:c=0,e=31,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1420398028
FETCH #5:c=0,e=14,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=1,tim=1420398156
RPC EXEC:c=0,e=0
WAIT #0: nam='SQL*Net message to client' ela= 4 driver id=1297371904
#bytes=1 p3=0 obj#=-1 tim=1420398646
*** 2008-11-24 21:27:20.765
WAIT #0: nam='SQL*Net message from client' ela= 3 driver id=1297371904
#bytes=1 p3=0 obj#=-1 tim=1420772565
RPC CALL:...(statement removed);
EXEC #5:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1420773632
FETCH #5:c=0,e=14,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=1,tim=1420773671
RPC EXEC:c=0,e=0
WAIT #0: nam='SQL*Net message to client' ela= 5 driver id=1297371904
#bytes=1 p3=0 obj#=-1 tim=1420774146
*** [ Windows thread id: 4416 ]
*** 2008-11-24 21:27:24.765
WAIT #0: nam='SQL*Net message from client' ela= 6 driver id=1297371904
#bytes=1 p3=0 obj#=-1 tim=1424766291
RPC CALL:...(statement removed);
=====================
PARSING IN CURSOR #7 len=70 dep=1 uid=78 oct=3 lid=78 tim=1424767161
hv=2516830579 ad='5af8ae9c'
SELECT ...(statement removed)
END OF STMT
PARSE #7:c=0,e=78,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1424767156
EXEC #7:c=0,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1424767987
FETCH #7:c=0,e=94,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=1,tim=1424768238
RPC EXEC:c=0,e=0
WAIT #0: nam='SQL*Net message to client' ela= 7 driver id=1297371904
#bytes=1 p3=0 obj#=-1 tim=1424768583
PUT: vc (659FD948), msg (5CF1D5B0), size (90), flgs (3)
*** [ Windows thread id: 5580 ]
*** 2008-11-24 21:27:24.765
WAIT #0: nam='SQL*Net message from client' ela= 4 driver id=1297371904
#bytes=1 p3=0 obj#=-1 tim=1424771121
RPC CALL:...(statement removed);
=====================
PARSING IN CURSOR #8 len=90 dep=1 uid=78 oct=3 lid=78 tim=1424772412
hv=1036237733 ad='5d1575f0'
SELECT ...(statement removed)
END OF STMT
PARSE #8:c=0,e=56,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1424772407
EXEC #8:c=0,e=55,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1424772567
FETCH #8:c=0,e=14,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1424772608
RPC EXEC:c=0,e=0
WAIT #0: nam='SQL*Net message to client' ela= 6 driver id=1297371904
#bytes=4 p3=0 obj#=-1 tim=1424773339


 Comments   
Comment by ubTools Support [ 26/Nov/08 12:10 PM ]
The time was spent between the following two operations:
WAIT #0: nam='SQL*Net message to client' ela= 5 driver id=1297371904
 #bytes=1 p3=0 obj#=-1 tim=1420774146
*** [ Windows thread id: 4416 ]
*** 2008-11-24 21:27:24.765
WAIT #0: nam='SQL*Net message from client' ela= 6 driver id=1297371904
 #bytes=1 p3=0 obj#=-1 tim=1424766291

As seen from the excerpt above, the Windows thread ID switched to 4416 between them.

elapsed time: (tim=1424766291) - (tim=1420774146) = 3992145 microseconds = 3.992145 seconds.
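The subtraction above can be automated when a trace is too long to scan by eye. The following is a minimal sketch, assuming that tim= values are in microseconds (as in the calculation above); the helper name largest_gap is hypothetical, and the sample lines are copied from the trace excerpt in the description.

```python
import re

# A few lines copied from the 10046 trace excerpt in this ticket.
TRACE = """\
EXEC #5:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1420773632
FETCH #5:c=0,e=14,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=1,tim=1420773671
#bytes=1 p3=0 obj#=-1 tim=1420774146
#bytes=1 p3=0 obj#=-1 tim=1424766291
PARSING IN CURSOR #7 len=70 dep=1 uid=78 oct=3 lid=78 tim=1424767161
"""


def largest_gap(trace_text):
    """Return (previous_tim, current_tim, gap) for the widest gap
    between consecutive tim= values, in file order."""
    tims = [int(value) for value in re.findall(r"\btim=(\d+)", trace_text)]
    pairs = zip(tims, tims[1:])
    prev, cur = max(pairs, key=lambda pair: pair[1] - pair[0])
    return prev, cur, cur - prev


prev, cur, gap = largest_gap(TRACE)
print(f"{prev} -> {cur}: {gap} microseconds = {gap / 1_000_000} seconds")
```

The function needs at least two tim= values in its input; the line pair it reports is the place to look for a thread switch or another stall.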

The customer has a SHARED SERVER configuration, but the SHARED_SERVERS parameter was set to 1. Unfortunately, we did not get an opportunity to debug the SHARED SERVER operations. However, increasing the SHARED_SERVERS parameter solved this thread switch problem, since there are now pre-created shared servers.





[QA-42] ORA-27040 ORA-19504 OSD-04002: While backing up by RMAN to shared disk on Windows. Created: 18/Sep/08  Updated: 18/Sep/08

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Administration Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 10.2.0.4
Operating System: Windows
Operating System Version: Windows 2000

 Description   
(The solution is simple, but since it saves setup time, it's documented here.)

Note:145843.1, How to Configure RMAN to Write to Shared Drives on Windows NT/2000, was implemented. Although the script works on the RMAN command line, it fails when it is defined as a job in Enterprise Manager.

An excerpt from the script:

run
{
 allocate channel ch0 device type disk format '\\host\RMAN\RMAN_%U';
 ...
} 

An excerpt from the output log of EM:

RMAN> run

2> {

3> allocate channel ch0 device type disk format '\host\RMAN\RMAN_%U';
...
10> }
...
RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of backup plus archivelog command at 09/18/2008 17:37:36

ORA-19504: failed to create file "C:\host\RMAN\RMAN_1KJQTRAU_1_1"

ORA-27040: file create error, unable to create file

OSD-04002: unable to open file

O/S-Error: (OS 3) The system cannot find the path specified.


 Comments   
Comment by ubTools Support [ 18/Sep/08 02:54 PM ]
As seen above, the file name in the script is '\\host\RMAN\RMAN_%U'. But, it's converted by EM to:
  • '\host\RMAN\RMAN_%U'
  • Then "C:\host\RMAN\RMAN_1KJQTRAU_1_1"
Comment by ubTools Support [ 18/Sep/08 02:57 PM ]
The '\' character is a special (escape) character in Java/C. The correct file name in the EM job should be:
  • '\\\host\RMAN\RMAN_%U'

Three '\' characters should be used before the hostname, not two.
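The effect can be reproduced with a small model. This is only a sketch under an assumption: that EM applies one escape-processing pass in which every pair of backslashes collapses into one and a lone backslash is kept. The function name one_escape_pass is hypothetical and is not part of EM or RMAN.

```python
def one_escape_pass(text):
    # Assumed model of the conversion done by EM: each pair of
    # backslashes becomes a single backslash; a lone backslash is kept.
    return text.replace("\\\\", "\\")


two = r"\\host\RMAN\RMAN_%U"     # as written in the original script
three = r"\\\host\RMAN\RMAN_%U"  # with three leading backslashes

print(one_escape_pass(two))    # what reaches RMAN from the original script
print(one_escape_pass(three))  # what reaches RMAN with the extra backslash
```

Under this model the two-backslash form arrives at RMAN with a single leading backslash, which matches the EM output log, while the three-backslash form arrives as a proper UNC path.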





[QA-41] Startup database fails with ORA-600 [4000], ORA-600 [4137]. Created: 16/Jun/08  Updated: 17/Jun/08

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle 8.1.7.3.0
Operating System: HP-UX

 Description   
After a hardware problem, the database crashed.

Since an ARCHIVELOG was missing, and restoring the previous backup was not acceptable, the customer wanted to open the database in an inconsistent state.



 Comments   
Comment by ubTools Support [ 16/Jun/08 11:38 PM ]
Steps to open the database:
  • setting _ALLOW_RESETLOGS_CORRUPTION=TRUE in init<SID>.ora.
  • startup mount;
  • recover database until cancel;
    <--cancel
  • alter database open resetlogs;

But, it failed with the following error:

ORA-00600: internal error code, arguments: [4000], [9], [], [], [], [], [], []

Oracle Note:47456.1:

DESCRIPTION:

  This has the potential to be a very serious error.

  It means that Oracle has tried to find an undo segment number in the 
  dictionary cache and failed.

ARGUMENTS:
  Arg [a] Undo segment number

FUNCTIONALITY:      
  KERNEL TRANSACTION UNDO

IMPACT:             
  INSTANCE FAILURE - Instance will not restart
  STATEMENT FAILURE
Comment by ubTools Support [ 17/Jun/08 12:13 AM ]
An excerpt from the trace file:
ORA-00600: internal error code, arguments: [4000], [9], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
...
Block header dump:  0x0080003e
 Object id on Block? Y
 seg/obj: 0x12  csc: 0x570.b8368d16  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
...
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   xid:  0x0009.019.000dc23f    uba: 0x58c13ddb.0523.46  --U-    1  fsc 0x0000.b8368d17

It looks like a problem regarding obj$ and its undo. If the need for undo can be bypassed, the missing undo will not matter. To do that, the SCN needs to be bumped further.

csc shows the SCN of the last block cleanout. We guessed that it could be used as the target SCN for the bump, as below:

  • 0x570.b8368d16 => 0x570b8368d16 => Decimal: 5981685058838 => divide by 1024*1024*1024 and round up = 5571
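The arithmetic above can be checked with a few lines of code. This is a minimal sketch, assuming the usual wrap.base SCN notation in which the base is a 32-bit value; the function names are hypothetical.

```python
def scn_from_wrap_base(text):
    """Convert an SCN printed as hex 'wrap.base' (e.g. '0x570.b8368d16')
    into a single integer, treating the base as the low 32 bits."""
    wrap_hex, base_hex = text.split(".")
    return (int(wrap_hex, 16) << 32) | int(base_hex, 16)


def giga_scn(scn):
    """Smallest number of 2**30 units that is not below the SCN
    (integer division rounded up)."""
    return -(-scn // 2**30)


scn = scn_from_wrap_base("0x570.b8368d16")
print(scn, giga_scn(scn))
```

Rounding up rather than down keeps the bumped SCN above the csc value taken from the block header.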

Bump SCN as below and restart:

  • Setting _MINIMUM_GIGA_SCN = 5571 in init<SID>.ora
  • startup mount;
  • recover database until cancel;
    <--cancel
  • alter database open resetlogs;

ORA-600 [4000] disappeared. But now, the following error appeared:

ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []

Oracle Note:47456.1:

DESCRIPTION:        

  While backing out an undo record (i.e. at the time of rollback) we found a
  transaction id mis-match indicating either a corruption in the rollback 
  segment or corruption in an object which the rollback segment is trying to
  apply undo records on.

  This would indicate a corrupted rollback segment. 

FUNCTIONALITY:      
 Kernel Transaction Undo Recovery
 
IMPACT:             
  POSSIBLE PHYSICAL CORRUPTION in Rollback segments
Comment by ubTools Support [ 17/Jun/08 12:21 AM ]
Restart the database:
  • Setting _CORRUPTED_ROLLBACK_SEGMENTS in init<SID>.ora
  • startup mount;
  • recover database until cancel;
    <--cancel
  • alter database open resetlogs;

The database is opened.

Since it was opened in an inconsistent state, a full export followed by an import into a new database is required to get rid of the inconsistency in the Oracle dictionary. However, the customer data will not be consistent after the import, so it should be reviewed by the customer.

Comment by ubTools Support [ 17/Jun/08 09:28 AM ]
The database was opened inconsistently. It'll be recreated with full export/import.




[QA-40] "Oracle Database Server" status is INVALID after applying 10.2.0.4 PatchSet. Created: 15/Jun/08  Updated: 15/Jun/08

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Administration Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
Operating System: IBM-AIX

 Description   
After applying the 10.2.0.4.0 PatchSet to 10.2.0.3.0, the catupgrd.sql log shows the following:
...
SQL> CREATE OR REPLACE PACKAGE BODY dbms_sqlpa wrapped
  2  a000000
  3  1
  4  abcd
  5  abcd
  6  abcd
...
Warning: Package Body created with compilation errors.

SQL> show errors;
Errors for PACKAGE BODY DBMS_SQLPA:

LINE/COL ERROR
-------- -----------------------------------------------------------------
113/5    PL/SQL: SQL Statement ignored
118/44   PL/SQL: ORA-00904: "OTHER_XML": invalid identifier
SQL> 
...
Component                                Status         Version  HH:MM:SS
Oracle Database Server                  INVALID      10.2.0.4.0  00:09:22
JServer JAVA Virtual Machine              VALID      10.2.0.4.0  00:02:43
Oracle XDK                                VALID      10.2.0.4.0  00:00:29
Oracle Database Java Packages             VALID      10.2.0.4.0  00:00:14
Oracle Text                               VALID      10.2.0.4.0  00:00:21
Oracle XML Database                       VALID      10.2.0.4.0  00:02:02
Oracle Workspace Manager                  VALID      10.2.0.4.3  00:00:43
Oracle Data Mining                        VALID      10.2.0.4.0  00:00:20
OLAP Analytic Workspace                   VALID      10.2.0.4.0  00:00:16
OLAP Catalog                              VALID      10.2.0.4.0  00:00:55
Oracle OLAP API                           VALID      10.2.0.4.0  00:00:43
Oracle interMedia                         VALID      10.2.0.4.0  00:02:24
Spatial                                   VALID      10.2.0.4.0  00:01:34
Oracle Ultra Search                       VALID      10.2.0.4.0  00:00:22
Oracle Expression Filter                  VALID      10.2.0.4.0  00:00:09
Oracle Enterprise Manager                 VALID      10.2.0.4.0  00:01:36
Oracle Rule Manager                       VALID      10.2.0.4.0  00:00:08
.


 Comments   
Comment by ubTools Support [ 15/Jun/08 06:32 PM ]
Compiling DBMS_SQLPA causes the problem. An ERRORSTACK trace for ORA-904 would help to find the object that is expected to include the OTHER_XML column. But since OTHER_XML is a known column of PLAN_TABLE, the trace was not needed to diagnose the problem.

There were both SYS.PLAN_TABLE as a table and PUBLIC.PLAN_TABLE as a public synonym in the database:

SQL> select owner,object_name,object_type from dba_objects where owner in ('SYS','PUBLIC') and upper(object_name)  like 'PLAN_TABLE%';

OWNER
------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
-------------------
PUBLIC
PLAN_TABLE
SYNONYM

SYS
PLAN_TABLE
TABLE

OWNER
------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
-------------------

SYS
PLAN_TABLE$
TABLE


SQL> select TABLE_OWNER,TABLE_NAME from dba_synonyms where OWNER='PUBLIC' and SYNONYM_NAME='PLAN_TABLE';

TABLE_OWNER                    TABLE_NAME
------------------------------ ------------------------------
SYS                            PLAN_TABLE$

SQL>

However, the columns of the SYS.PLAN_TABLE table and of the PUBLIC.PLAN_TABLE synonym (which points to SYS.PLAN_TABLE$) are not the same:

SQL> desc sys.plan_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 STATEMENT_ID                                       VARCHAR2(30)
 TIMESTAMP                                          DATE
 REMARKS                                            VARCHAR2(80)
 OPERATION                                          VARCHAR2(30)
 OPTIONS                                            VARCHAR2(255)
 OBJECT_NODE                                        VARCHAR2(128)
 OBJECT_OWNER                                       VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(30)
 OBJECT_INSTANCE                                    NUMBER(38)
 OBJECT_TYPE                                        VARCHAR2(30)
 OPTIMIZER                                          VARCHAR2(255)
 SEARCH_COLUMNS                                     NUMBER
 ID                                                 NUMBER(38)
 PARENT_ID                                          NUMBER(38)
 POSITION                                           NUMBER(38)
 COST                                               NUMBER(38)
 CARDINALITY                                        NUMBER(38)
 BYTES                                              NUMBER(38)
 OTHER_TAG                                          VARCHAR2(255)
 PARTITION_START                                    VARCHAR2(255)
 PARTITION_STOP                                     VARCHAR2(255)
 PARTITION_ID                                       NUMBER(38)
 OTHER                                              LONG

SQL>

SQL> desc sys.plan_table$
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 STATEMENT_ID                                       VARCHAR2(30)
 PLAN_ID                                            NUMBER
 TIMESTAMP                                          DATE
 REMARKS                                            VARCHAR2(4000)
 OPERATION                                          VARCHAR2(30)
 OPTIONS                                            VARCHAR2(255)
 OBJECT_NODE                                        VARCHAR2(128)
 OBJECT_OWNER                                       VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(30)
 OBJECT_ALIAS                                       VARCHAR2(65)
 OBJECT_INSTANCE                                    NUMBER(38)
 OBJECT_TYPE                                        VARCHAR2(30)
 OPTIMIZER                                          VARCHAR2(255)
 SEARCH_COLUMNS                                     NUMBER
 ID                                                 NUMBER(38)
 PARENT_ID                                          NUMBER(38)
 DEPTH                                              NUMBER(38)
 POSITION                                           NUMBER(38)
 COST                                               NUMBER(38)
 CARDINALITY                                        NUMBER(38)
 BYTES                                              NUMBER(38)
 OTHER_TAG                                          VARCHAR2(255)
 PARTITION_START                                    VARCHAR2(255)
 PARTITION_STOP                                     VARCHAR2(255)
 PARTITION_ID                                       NUMBER(38)
 OTHER                                              LONG
 OTHER_XML                                          CLOB
 DISTRIBUTION                                       VARCHAR2(30)
 CPU_COST                                           NUMBER(38)
 IO_COST                                            NUMBER(38)
 TEMP_SPACE                                         NUMBER(38)
 ACCESS_PREDICATES                                  VARCHAR2(4000)
 FILTER_PREDICATES                                  VARCHAR2(4000)
 PROJECTION                                         VARCHAR2(4000)
 TIME                                               NUMBER(38)
 QBLOCK_NAME                                        VARCHAR2(30)

SQL>

Since table access takes precedence over synonym access, the SYS.PLAN_TABLE table was used. But this table doesn't have a column named OTHER_XML, which caused the problem.
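The difference between the two DESC outputs above can be listed mechanically. This is a minimal sketch; the column names are copied from those outputs, and the variable names are hypothetical.

```python
# Column names copied from "desc sys.plan_table" above.
PLAN_TABLE = """
STATEMENT_ID TIMESTAMP REMARKS OPERATION OPTIONS OBJECT_NODE OBJECT_OWNER
OBJECT_NAME OBJECT_INSTANCE OBJECT_TYPE OPTIMIZER SEARCH_COLUMNS ID PARENT_ID
POSITION COST CARDINALITY BYTES OTHER_TAG PARTITION_START PARTITION_STOP
PARTITION_ID OTHER
""".split()

# Column names copied from "desc sys.plan_table$" above.
PLAN_TABLE_DOLLAR = """
STATEMENT_ID PLAN_ID TIMESTAMP REMARKS OPERATION OPTIONS OBJECT_NODE
OBJECT_OWNER OBJECT_NAME OBJECT_ALIAS OBJECT_INSTANCE OBJECT_TYPE OPTIMIZER
SEARCH_COLUMNS ID PARENT_ID DEPTH POSITION COST CARDINALITY BYTES OTHER_TAG
PARTITION_START PARTITION_STOP PARTITION_ID OTHER OTHER_XML DISTRIBUTION
CPU_COST IO_COST TEMP_SPACE ACCESS_PREDICATES FILTER_PREDICATES PROJECTION
TIME QBLOCK_NAME
""".split()

# Columns that exist in PLAN_TABLE$ but not in the old SYS.PLAN_TABLE,
# kept in the order in which DESC lists them.
missing = [col for col in PLAN_TABLE_DOLLAR if col not in PLAN_TABLE]
print(missing)
```

OTHER_XML is among the missing columns, which is consistent with the ORA-00904 raised while compiling DBMS_SQLPA.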

After dropping the SYS.PLAN_TABLE table, the PUBLIC.PLAN_TABLE synonym is used:

SQL> drop table sys.plan_table;

Table dropped.

SQL>



SQL> desc plan_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 STATEMENT_ID                                       VARCHAR2(30)
 PLAN_ID                                            NUMBER
 TIMESTAMP                                          DATE
 REMARKS                                            VARCHAR2(4000)
 OPERATION                                          VARCHAR2(30)
 OPTIONS                                            VARCHAR2(255)
 OBJECT_NODE                                        VARCHAR2(128)
 OBJECT_OWNER                                       VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(30)
 OBJECT_ALIAS                                       VARCHAR2(65)
 OBJECT_INSTANCE                                    NUMBER(38)
 OBJECT_TYPE                                        VARCHAR2(30)
 OPTIMIZER                                          VARCHAR2(255)
 SEARCH_COLUMNS                                     NUMBER
 ID                                                 NUMBER(38)
 PARENT_ID                                          NUMBER(38)
 DEPTH                                              NUMBER(38)
 POSITION                                           NUMBER(38)
 COST                                               NUMBER(38)
 CARDINALITY                                        NUMBER(38)
 BYTES                                              NUMBER(38)
 OTHER_TAG                                          VARCHAR2(255)
 PARTITION_START                                    VARCHAR2(255)
 PARTITION_STOP                                     VARCHAR2(255)
 PARTITION_ID                                       NUMBER(38)
 OTHER                                              LONG
 OTHER_XML                                          CLOB
 DISTRIBUTION                                       VARCHAR2(30)
 CPU_COST                                           NUMBER(38)
 IO_COST                                            NUMBER(38)
 TEMP_SPACE                                         NUMBER(38)
 ACCESS_PREDICATES                                  VARCHAR2(4000)
 FILTER_PREDICATES                                  VARCHAR2(4000)
 PROJECTION                                         VARCHAR2(4000)
 TIME                                               NUMBER(38)
 QBLOCK_NAME                                        VARCHAR2(30)

SQL>

After that, applying the PatchSet did not produce an INVALID status:

Component                                Status         Version  HH:MM:SS
Oracle Database Server                    VALID      10.2.0.4.0  00:09:20
JServer JAVA Virtual Machine              VALID      10.2.0.4.0  00:02:56
Oracle XDK                                VALID      10.2.0.4.0  00:00:28
Oracle Database Java Packages             VALID      10.2.0.4.0  00:00:14
Oracle Text                               VALID      10.2.0.4.0  00:00:22
Oracle XML Database                       VALID      10.2.0.4.0  00:02:05
Oracle Workspace Manager                  VALID      10.2.0.4.3  00:00:45
Oracle Data Mining                        VALID      10.2.0.4.0  00:00:21
OLAP Analytic Workspace                   VALID      10.2.0.4.0  00:00:16
OLAP Catalog                              VALID      10.2.0.4.0  00:00:55
Oracle OLAP API                           VALID      10.2.0.4.0  00:00:41
Oracle interMedia                         VALID      10.2.0.4.0  00:02:24
Spatial                                   VALID      10.2.0.4.0  00:01:37
Oracle Ultra Search                       VALID      10.2.0.4.0  00:00:22
Oracle Expression Filter                  VALID      10.2.0.4.0  00:00:09
Oracle Enterprise Manager                 VALID      10.2.0.4.0  00:01:37
Oracle Rule Manager                       VALID      10.2.0.4.0  00:00:08
.
Comment by ubTools Support [ 15/Jun/08 06:34 PM ]
  • Drop SYS.PLAN_TABLE table.
  • Install PatchSet.




[QA-39] Database hangs on "cursor: pin S wait on X" wait events. Created: 07/Jun/08  Updated: 07/Jun/08

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
Operating System: IBM-AIX

 Description   
The database hangs.

The ASH report shows that almost all activity is on the cursor: pin S wait on X wait event.

Top User Events
Event                        Event Class   % Activity     Avg Active Sessions
cursor: pin S wait on X      Concurrency        98.90                   13.74

An excerpt from the SYSTEMSTATE (level 10) dump:

PROCESS 23:
...
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=42033 wait_time=0 seconds since wait started=0
                idn=ec048ba, value=137000000000, where|sleeps=5007f3271
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000d44f4280
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
       Cursor Pin uid 4939 efd 0 whr 5 slp 44251
       opr=2 pso=7000000cefa5df0 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...
PROCESS 33:
...
    ----------------------------------------
    SO: 7000000f78e0770, type: 4, owner: 7000000fa8bbcf8, flag: INIT/-/-/0x00
    (session) sid: 4976 trans: 7000000f10ce318, creator: 7000000fa8bbcf8, flag: (8100041) USR/- BSY/-/-/-/-/-
              DID: 0001-0021-000193D1, short-term DID: 0000-0000-00000000
              txn branch: 0
              oct: 3, prv: 0, sql: 7000000fc179fa8, psql: 7000000fc68f018, user: 0/SYS
    O/S info: user: orapaky0, term: , ospid: 5099938, machine: akmenkulp2
              program: sqlplus@akmenkulp2 (TNS V1-V3)
    application name: sqlplus@akmenkulp2 (TNS V1-V3), hash value=0
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=27479 wait_time=0 seconds since wait started=0
                idn=ec048ba, value=137000000000, where|sleeps=5007f33be
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000fc64adb8
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
       Cursor Pin uid 4976 efd 0 whr 5 slp 15527
       opr=2 pso=7000000d86f7d88 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000fc64ad80
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper EXCL
       Cursor Pin uid 4976 efd 0 whr 1 slp 0
       opr=3 pso=7000000cee2b2c0 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...
PROCESS 34:
...
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=63882 wait_time=0 seconds since wait started=0
                idn=ec048ba, value=137000000000, where|sleeps=5007f348e
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000fdd75c30
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
       Cursor Pin uid 4693 efd 0 whr 5 slp 62048
       opr=2 pso=7000000e7548610 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...
PROCESS 39:
...
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=12512 wait_time=0 seconds since wait started=0
                idn=ec048ba, value=137000000000, where|sleeps=5007f36cf
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000fd857ad8
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
       Cursor Pin uid 4970 efd 0 whr 5 slp 12395
       opr=2 pso=7000000cea1c250 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...
PROCESS 47:
...
      waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=48746 wait_time=0 seconds since wait started=0
                  idn=16a1ebe6, value=132e00000000, where|sleeps=5002494fe
...
        ----------------------------------------
        KGX Atomic Operation Log 7000000c5ca4690
         Mutex 7000000d71fd6b0(4910, 0) idn 16a1ebe6 oper GET_SHRD
         Cursor Pin uid 4896 efd 0 whr 5 slp 48634
         opr=2 pso=7000000e24f7e30 flg=0
         pcs=7000000d71fd6b0 nxt=0 flg=35 cld=0 hd=7000000fda5fc00 par=7000000d71fdaa0
         ct=0 hsh=0 unp=0 unn=0 hvl=d71fdd78 nhv=1 ses=7000000f78b5cc0
         hep=7000000d71fd730 flg=80 ld=1 ob=7000000d76d3848 ptr=70000008a8ff478 fex=70000008a8fe788
        ----------------------------------------
...
PROCESS 53:
...
      waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=38686 wait_time=0 seconds since wait started=0
                  idn=16a1ebe6, value=132e00000000, where|sleeps=500249647
...
        ----------------------------------------
        KGX Atomic Operation Log 7000000db1dee98
         Mutex 7000000d71fd6b0(4910, 0) idn 16a1ebe6 oper GET_SHRD
         Cursor Pin uid 4828 efd 0 whr 5 slp 38647
         opr=2 pso=7000000e7f58940 flg=0
         pcs=7000000d71fd6b0 nxt=0 flg=35 cld=0 hd=7000000fda5fc00 par=7000000d71fdaa0
         ct=0 hsh=0 unp=0 unn=0 hvl=d71fdd78 nhv=1 ses=7000000f78b5cc0
         hep=7000000d71fd730 flg=80 ld=1 ob=7000000d76d3848 ptr=70000008a8ff478 fex=70000008a8fe788
        ----------------------------------------
...
PROCESS 54:
...
      (session) sid: 4910 trans: 0, creator: 7000000d81123c0, flag: (e1) USR/- BSY/-/-/-/-/-
                DID: 0001-002F-00044CCF, short-term DID: 0000-0000-00000000
                txn branch: 0
                oct: 3, prv: 0, sql: 7000000dbd12ce0, psql: 7000000dbb98da0, user: 55/SYSMAN
      O/S info: user: orapaky0, term: unknown, ospid: 1234, machine: akmenkulp2
                program: OMS
      client info: akmenkulp2_Management_Service
      application name: OEM.DefaultPool, hash value=3997945242
      action name: /database/instance/sitemap, hash value=105676648
      waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=54571 wait_time=0 seconds since wait started=0
                  idn=ec048ba, value=137000000000, where|sleeps=5007f3abc
...
        ----------------------------------------
        KGX Atomic Operation Log 7000000dcd36f50
         Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
         Cursor Pin uid 4910 efd 0 whr 5 slp 53641
         opr=2 pso=7000000e7c7f338 flg=0
         pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
         ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
         hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
        ----------------------------------------
...
        ----------------------------------------
        KGX Atomic Operation Log 7000000dcd36e38
         Mutex 7000000d71fd6b0(4910, 0) idn 16a1ebe6 oper EXCL
         Cursor Pin uid 4910 efd 0 whr 1 slp 0
         opr=3 pso=7000000e79b9030 flg=0
         pcs=7000000d71fd6b0 nxt=0 flg=35 cld=0 hd=7000000fda5fc00 par=7000000d71fdaa0
         ct=0 hsh=0 unp=0 unn=0 hvl=d71fdd78 nhv=1 ses=7000000f78b5cc0
         hep=7000000d71fd730 flg=80 ld=1 ob=7000000d76d3848 ptr=70000008a8ff478 fex=70000008a8fe788
        ----------------------------------------
...
PROCESS 55:
...
      waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=39147 wait_time=0 seconds since wait started=0
                  idn=16a1ebe6, value=132e00000000, where|sleeps=5002496e9
...
        ----------------------------------------
        KGX Atomic Operation Log 7000000dbb57a10
         Mutex 7000000d71fd6b0(4910, 0) idn 16a1ebe6 oper GET_SHRD
         Cursor Pin uid 4973 efd 0 whr 5 slp 39118
         opr=2 pso=7000000ce4e6fe8 flg=0
         pcs=7000000d71fd6b0 nxt=0 flg=35 cld=0 hd=7000000fda5fc00 par=7000000d71fdaa0
         ct=0 hsh=0 unp=0 unn=0 hvl=d71fdd78 nhv=1 ses=7000000f78b5cc0
         hep=7000000d71fd730 flg=80 ld=1 ob=7000000d76d3848 ptr=70000008a8ff478 fex=70000008a8fe788
        ----------------------------------------
...
PROCESS 62:
...
      (session) sid: 4805 trans: 0, creator: 7000000eab6d018, flag: (e1) USR/- BSY/-/-/-/-/-
                DID: 0001-003E-00010563, short-term DID: 0000-0000-00000000
                txn branch: 0
                oct: 3, prv: 0, sql: 7000000dce39e08, psql: 7000000dc0fc240, user: 72/MENKUL2008
      O/S info: user: akbank, term: L1058, ospid: 3468:3428, machine: AA\L1058
                program: toad.exe
      application name: TOAD 9.1.0.62, hash value=3156025525
      waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=41562 wait_time=0 seconds since wait started=0
                  idn=ec048ba, value=137000000000, where|sleeps=5007f3dc4
...
        ----------------------------------------
        KGX Atomic Operation Log 7000000db0f6b20
         Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
         Cursor Pin uid 4805 efd 0 whr 5 slp 41523
         opr=2 pso=7000000ce1b0070 flg=0
         pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
         ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
         hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
        ----------------------------------------
...
        ----------------------------------------
        KGX Atomic Operation Log 7000000db0f6ae8
         Mutex 7000000e51cde60(4805, 0) idn 7a99f649 oper EXCL
         Cursor Pin uid 4805 efd 0 whr 1 slp 0
         opr=3 pso=7000000d8b887e8 flg=0
         pcs=7000000e51cde60 nxt=0 flg=35 cld=0 hd=7000000dcd2b108 par=7000000d9991e20
         ct=0 hsh=0 unp=0 unn=0 hvl=d99920f8 nhv=1 ses=7000000f88942e0
         hep=7000000e51cdee0 flg=80 ld=1 ob=7000000e59bf060 ptr=7000000882ed3d8 fex=7000000882ec6e8
        ----------------------------------------
...
PROCESS 65:
...
      waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=35648 wait_time=0 seconds since wait started=0
                  idn=7a99f649, value=12c500000000, where|sleeps=500028b22
...
        ----------------------------------------
        KGX Atomic Operation Log 7000000fd34bbf8
         Mutex 7000000e51cde60(4805, 0) idn 7a99f649 oper GET_SHRD
         Cursor Pin uid 4570 efd 0 whr 5 slp 35618
         opr=2 pso=7000000d898d0c0 flg=0
         pcs=7000000e51cde60 nxt=0 flg=35 cld=0 hd=7000000dcd2b108 par=7000000d9991e20
         ct=0 hsh=0 unp=0 unn=0 hvl=d99920f8 nhv=1 ses=7000000f88942e0
         hep=7000000e51cdee0 flg=80 ld=1 ob=7000000e59bf060 ptr=7000000882ed3d8 fex=7000000882ec6e8
        ----------------------------------------
...
PROCESS 68:
...
   (session) sid: 4698 trans: 0, creator: 7000000f98a7320, flag: (41) USR/- BSY/-/-/-/-/-
              DID: 0001-0044-00000344, short-term DID: 0000-0000-00000000
              txn branch: 0
              oct: 3, prv: 0, sql: 7000000c5281eb8, psql: 7000000d4d7e910, user: 72/MENKUL2008
    O/S info: user: geneks, term: AKYGM011, ospid: 3168:3148, machine: AKYATIRIM\AKYGM011
              program:
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=44607 wait_time=0 seconds since wait started=0
                idn=ec048ba, value=137000000000, where|sleeps=5007f3dfe
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000dc155c48
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
       Cursor Pin uid 4698 efd 0 whr 5 slp 8722
       opr=2 pso=7000000cea6fce8 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000dc155c10
       Mutex 7000000e5f9cd90(4698, 0) idn 651e7adb oper EXCL
       Cursor Pin uid 4698 efd 0 whr 1 slp 0
       opr=3 pso=7000000cef62b68 flg=0
       pcs=7000000e5f9cd90 nxt=0 flg=34 cld=1 hd=7000000dcea52a8 par=7000000e5f9d708
       ct=1 hsh=0 unp=0 unn=0 hvl=e5f9d098 nhv=1 ses=7000000f782cbe0
       hep=7000000e5f9ce10 flg=80 ld=1 ob=7000000e53207c8 ptr=700000099b3ef88 fex=700000099b3e298
      ----------------------------------------
...
PROCESS 70:
...
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=34939 wait_time=0 seconds since wait started=0
                idn=ec048ba, value=137000000000, where|sleeps=5007f3e0f
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000fc176d20
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
       Cursor Pin uid 4784 efd 0 whr 5 slp 33075
       opr=2 pso=7000000ceb24778 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...
PROCESS 73:
...
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=24155 wait_time=0 seconds since wait started=0
                idn=651e7adb, value=125a00000000, where|sleeps=5000121e2
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000dc4eba08
       Mutex 7000000e5f9cd90(4698, 0) idn 651e7adb oper GET_SHRD
       Cursor Pin uid 4637 efd 0 whr 5 slp 8674
       opr=2 pso=7000000cef8dcb0 flg=0
       pcs=7000000e5f9cd90 nxt=0 flg=34 cld=1 hd=7000000dcea52a8 par=7000000e5f9d708
       ct=1 hsh=0 unp=0 unn=0 hvl=e5f9d098 nhv=1 ses=7000000f782cbe0
       hep=7000000e5f9ce10 flg=80 ld=1 ob=7000000e53207c8 ptr=700000099b3ef88 fex=700000099b3e298
      ----------------------------------------
...
PROCESS 76:
...
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=35688 wait_time=0 seconds since wait started=0
                idn=ec048ba, value=137000000000, where|sleeps=5007f3ed4
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000fc655c80
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
       Cursor Pin uid 4806 efd 0 whr 5 slp 35673
       opr=2 pso=7000000ce702c90 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...
PROCESS 142:
...
      waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=48159 wait_time=0 seconds since wait started=0
                  idn=16a1ebe6, value=132e00000000, where|sleeps=5002499ac
...
        ----------------------------------------
        KGX Atomic Operation Log 7000000db235510
         Mutex 7000000d71fd6b0(4910, 0) idn 16a1ebe6 oper GET_SHRD
         Cursor Pin uid 4592 efd 0 whr 5 slp 48111
         opr=2 pso=7000000e2252830 flg=0
         pcs=7000000d71fd6b0 nxt=0 flg=35 cld=0 hd=7000000fda5fc00 par=7000000d71fdaa0
         ct=0 hsh=0 unp=0 unn=0 hvl=d71fdd78 nhv=1 ses=7000000f78b5cc0
         hep=7000000d71fd730 flg=80 ld=1 ob=7000000d76d3848 ptr=70000008a8ff478 fex=70000008a8fe788
        ----------------------------------------
...


 Comments   
Comment by ubTools Support [ 07/Jun/08 05:46 PM ]
The mutex identifier (idn) value helps to locate the address of the mutex. For example:
PROCESS 23:
...
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=42033 wait_time=0 seconds since wait started=0
                idn=ec048ba, value=137000000000, where|sleeps=5007f3271
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000d44f4280
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
       Cursor Pin uid 4939 efd 0 whr 5 slp 44251
       opr=2 pso=7000000cefa5df0 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------

The mutex with identifier 0xec048ba, at address 0x7000000d92bce40, is requested in shared mode (GET_SHRD).
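
The same wait line also points at the holder. The following is a minimal sketch, not an official decoder: it assumes the 64-bit layout in which the upper 32 bits of the wait's "value" parameter carry the SID of the session holding the mutex in exclusive mode. The function name holder_sid is illustrative.

```python
def holder_sid(value_hex: str) -> int:
    """Return the SID stored in the upper 32 bits of the wait's 'value'.

    Assumption: 64-bit platform layout, where the upper 32 bits hold the
    SID of the exclusive holder.
    """
    return int(value_hex, 16) >> 32


# "value=" fields copied from the SYSTEMSTATE dump excerpts in this issue.
for value in ("137000000000", "132e00000000", "12c500000000", "125a00000000"):
    print(value, "->", holder_sid(value))
```

The decoded SIDs (4976, 4910, 4805, 4698) agree with the SIDs shown in parentheses on the corresponding "Mutex" lines of the dump.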

Comment by ubTools Support [ 07/Jun/08 06:11 PM ]
According to the SYSTEMSTATE dump, the following table shows which process is waiting for which process:
Waiter Process#   Holder Process#
23                33
33                33
34                33
39                33
47                54
53                54
54                33
55                54
62                33
65                62
68                33
70                33
73                68
76                33
142               54

According to the table above, there is a deadlock and the root holder is process#33. This process waits for itself, which means it is a self-deadlock.
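
The chain-following step can be sketched as below. This is only an illustration of the reasoning: the dictionary is copied from the waiter/holder table, and the function name root_holder is hypothetical.

```python
# Waiter -> holder pairs copied from the table in this comment.
WAITS_FOR = {
    23: 33, 33: 33, 34: 33, 39: 33, 47: 54, 53: 54, 54: 33, 55: 54,
    62: 33, 65: 62, 68: 33, 70: 33, 73: 68, 76: 33, 142: 54,
}


def root_holder(process: int, waits_for: dict) -> int:
    """Follow holder links until reaching a process that waits for itself
    or is not waiting at all. Stops early if a longer cycle is revisited."""
    seen = set()
    while process in waits_for and waits_for[process] != process:
        if process in seen:  # cycle involving more than one process
            break
        seen.add(process)
        process = waits_for[process]
    return process


# Every waiter in the table resolves to the same root.
roots = {root_holder(p, WAITS_FOR) for p in WAITS_FOR}
print(roots)
```

For example, process#73 resolves through 68 to 33, and process#142 resolves through 54 to 33.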

Comment by ubTools Support [ 07/Jun/08 06:25 PM ]
Process#33 state:
PROCESS 33:
...
    ----------------------------------------
    SO: 7000000f78e0770, type: 4, owner: 7000000fa8bbcf8, flag: INIT/-/-/0x00
    (session) sid: 4976 trans: 7000000f10ce318, creator: 7000000fa8bbcf8, flag: (8100041) USR/- BSY/-/-/-/-/-
              DID: 0001-0021-000193D1, short-term DID: 0000-0000-00000000
              txn branch: 0
              oct: 3, prv: 0, sql: 7000000fc179fa8, psql: 7000000fc68f018, user: 0/SYS
    O/S info: user: orapaky0, term: , ospid: 5099938, machine: akmenkulp2
              program: sqlplus@akmenkulp2 (TNS V1-V3)
    application name: sqlplus@akmenkulp2 (TNS V1-V3), hash value=0
    waiting for 'cursor: pin S wait on X' blocking sess=0x0 seq=27479 wait_time=0 seconds since wait started=0
                idn=ec048ba, value=137000000000, where|sleeps=5007f33be
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000fc64adb8
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
       Cursor Pin uid 4976 efd 0 whr 5 slp 15527
       opr=2 pso=7000000d86f7d88 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...
      ----------------------------------------
      KGX Atomic Operation Log 7000000fc64ad80
       Mutex 7000000d92bce40(4976, 0) idn ec048ba oper EXCL
       Cursor Pin uid 4976 efd 0 whr 1 slp 0
       opr=3 pso=7000000cee2b2c0 flg=0
       pcs=7000000d92bce40 nxt=7000000cb6ed710 flg=35 cld=0 hd=7000000fc8611f0 par=7000000d94637a0
       ct=4 hsh=0 unp=0 unn=0 hvl=cb6eda00 nhv=1 ses=7000000f78e0770
       hep=7000000d92bcec0 flg=80 ld=1 ob=7000000d9c1b3a8 ptr=7000000991e6108 fex=7000000991e5418
      ----------------------------------------
...

Process#33 holds the mutex with identifier 0xec048ba in exclusive mode, yet it requests the same mutex in shared mode. This is a self-deadlock bug.
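
A rough way to spot this pattern in a large dump is sketched below. It assumes that the first number in parentheses on a "Mutex" line is the holder SID and that the "Cursor Pin uid" on the following line is the SID of the session owning that log entry, which is consistent with the excerpts in this issue. The regular expressions and the function name self_deadlocks are illustrative, not part of any Oracle tool.

```python
import re

MUTEX_RE = re.compile(
    r"Mutex\s+(?P<addr>[0-9a-f]+)\((?P<holder>\d+),\s*\d+\)\s+"
    r"idn\s+(?P<idn>[0-9a-f]+)\s+oper\s+(?P<oper>\w+)"
)
UID_RE = re.compile(r"Cursor Pin uid\s+(?P<uid>\d+)")


def self_deadlocks(dump_text: str) -> list:
    """Return (sid, idn) pairs where a session requests a mutex in shared
    mode (GET_SHRD) while the Mutex line names that same session as holder."""
    found = []
    pending = None  # last Mutex line seen, waiting for its Cursor Pin line
    for line in dump_text.splitlines():
        m = MUTEX_RE.search(line)
        if m:
            pending = m
            continue
        u = UID_RE.search(line)
        if u and pending:
            if pending["oper"] == "GET_SHRD" and pending["holder"] == u["uid"]:
                found.append((int(u["uid"]), pending["idn"]))
            pending = None
    return found


# Two KGX entries copied from the dump: process#33 (sid 4976) and process#23.
SAMPLE = """\
 Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
 Cursor Pin uid 4976 efd 0 whr 5 slp 15527
 Mutex 7000000d92bce40(4976, 0) idn ec048ba oper GET_SHRD
 Cursor Pin uid 4939 efd 0 whr 5 slp 44251
"""
print(self_deadlocks(SAMPLE))
```

Only the entry of sid 4976 is reported, because the other session (sid 4939) is an ordinary waiter on a mutex held by someone else.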

Comment by ubTools Support [ 07/Jun/08 06:39 PM ]
Unfortunately, there is no stack trace for process#33 to show which kernel function it was running. However, the following SQL run by process#33 helps to narrow down the candidate Oracle bugs:
      LIBRARY OBJECT HANDLE: handle=7000000fc179fa8 mtx=7000000fc17a0d8(1) cdp=1
      name=
select i.obj#, i.rowcnt, i.leafcnt, i.distkey, i.lblkkey, i.dblkkey,i.clufac, i.blevel
, i.analyzetime, i.samplesize, decode(i.pctthres$,null,null,mod(trunc(i.pctthres$/256),256)),
 i.flags, ist.cachedblk, ist.cachehit, ist.logicalread from ind$ i, ind_stats$ ist
 where i.obj# = ist.obj#(+) and i.bo#=:1 order by i.obj#
      hash=25d75620e6d3487e18921ac30ec048ba timestamp=06-06-2008 23:01:28
...

It looks like a statistics collection SQL.
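
In this dump, the mutex idn (ec048ba) equals the last 32 bits of the hash shown for the library object, which ties the SQL text to the mutex. A small sketch of that comparison follows; the function name idn_from_hash is illustrative, and the relationship is read off this dump rather than taken from Oracle documentation.

```python
def idn_from_hash(full_hash_hex: str) -> str:
    """Return the low 32 bits of a library cache hash as unpadded hex."""
    return format(int(full_hash_hex, 16) & 0xFFFFFFFF, "x")


# "hash=" value copied from the LIBRARY OBJECT HANDLE excerpt.
print(idn_from_hash("25d75620e6d3487e18921ac30ec048ba"))
```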

Comment by ubTools Support [ 07/Jun/08 06:44 PM ]
Oracle Note:5907779.8:
This problem is introduced in 10.2.0.3 

A process may hang with a self deadlock typically when executing 
DBMS_STATS. The hung process shows itself waiting on a
"cursor: pin S wait on X" waitevent waiting for an object
that it has pinned itself.

According to the note, this problem has been fixed in Oracle 10.2.0.4.





[QA-38] DBMS_XMLPARSER.FREEPARSER doesn't release UGA memory. Created: 06/Jun/08  Updated: 06/Jun/08

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
Operating System: Solaris

 Description   
DBMS_XMLPARSER.FREEPARSER doesn't release UGA memory.

Session memory statistics before operation:

SQL> select name,value from v$sesstat a, v$statname b
  2  where a.statistic#=b.statistic#
  3  and b.name like '%memory%'
  4  and sid = 58
  5  order by value desc;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
session pga memory                                                   424336
session pga memory max                                               424336
session uga memory                                                   209872
session uga memory max                                               209872
sorts (memory)                                                           16
workarea memory allocated                                                14

6 rows selected.

Operation:

...
dbms_xmlparser.parseclob (v_parser, data_for_table);
...
dbms_xmlparser.freeParser(v_parser);
...

Session memory statistics after operation:

SQL> select name,value from v$sesstat a, v$statname b
  2  where a.statistic#=b.statistic#
  3  and b.name like '%memory%'
  4  and sid = 58
  5  order by value desc;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
session pga memory                                                 52396928
session pga memory max                                             52396928
session uga memory                                                 51816784
session uga memory max                                             51816784
sorts (memory)                                                           19
workarea memory allocated                                                14

6 rows selected.

An excerpt from HEAPDUMP LEVEL 4 (UGA) dump:

...
EXTENT 788 addr=ffffffff7ce90080
  Chunk ffffffff7ce90090 sz=      392    free      "               "
  Chunk ffffffff7ce90218 sz=      184    freeable  "kgiobdtb       "
  Chunk ffffffff7ce902d0 sz=     1112    recreate  "koh-kghu sessi "  latch=0
     ds ffffffff7ce9db50 sz=     1112 ct=        1
  Chunk ffffffff7ce90728 sz=     2136    freeable  "PLS non-lib hp "  ds=ffffffff7cf6abd8
  Chunk ffffffff7ce90f80 sz=     4288    freeable  "qmxdpls_subhea "  ds=ffffffff7ce96b78
  Chunk ffffffff7ce92040 sz=     4288    freeable  "qmxdpls_subhea "  ds=ffffffff7ce96b78
  Chunk ffffffff7ce93100 sz=     4288    freeable  "qmxdpls_subhea "  ds=ffffffff7ce96b78
  Chunk ffffffff7ce941c0 sz=     4288    freeable  "qmxdpls_subhea "  ds=ffffffff7ce96b78
  Chunk ffffffff7ce95280 sz=     4328    freeable  "qmxdpls_subhea "  ds=ffffffff7ce96b78
  Chunk ffffffff7ce96368 sz=       48    freeable  "allocator state"
  Chunk ffffffff7ce96398 sz=       72    freeable  "persistant defi"
  Chunk ffffffff7ce963e0 sz=       48    freeable  "kgbt           "
  Chunk ffffffff7ce96410 sz=       48    freeable  "frame segment  "
  Chunk ffffffff7ce96440 sz=       64    freeable  "qmxdpls_init_ug"
  Chunk ffffffff7ce96480 sz=       48    freeable  "frame segment  "
  Chunk ffffffff7ce964b0 sz=       72    freeable  "frame segment  "
  Chunk ffffffff7ce964f8 sz=       72    freeable  "kxsxsi: frame  "
  Chunk ffffffff7ce96540 sz=     1568    recreate  "qmxdpls_subhea "  latch=0
     ds ffffffff7ce96b78 sz= 50681480 ct=    11820
        ffffffff779d6940 sz=     4288
        ffffffff779d7a00 sz=     4288
        ffffffff779d8ac0 sz=     4288
        ffffffff779d9b80 sz=     4288
        ffffffff779dac40 sz=     4288
        ffffffff779dbd00 sz=     4288
        ffffffff779dcdc0 sz=     4288
        ffffffff779dde80 sz=     4288
        ffffffff779def40 sz=     4288
        ffffffff779c04c0 sz=     4288
        ffffffff779c1580 sz=     4288
        ffffffff779c2640 sz=     4288
        ffffffff779c3700 sz=     4288
        ffffffff779c47c0 sz=     4288
        ffffffff779c5880 sz=     4288
        ffffffff779c6940 sz=     4288
        ffffffff779c7a00 sz=     4288
...
       ffffffff7ce93100 sz=     4288
        ffffffff7ce941c0 sz=     4288
        ffffffff7ce95280 sz=     4328
  Chunk ffffffff7ce96b60 sz=      160    freeable  "qmxdpls_heapptr"
  Chunk ffffffff7ce96c00 sz=      232    freeable  "lob ctl struct "
  Chunk ffffffff7ce96ce8 sz=       80    freeable  "frame          "
  Chunk ffffffff7ce96d38 sz=       40    freeable  "private oac inf"
  Chunk ffffffff7ce96d60 sz=      128    freeable  "bnrdef and uac "
  Chunk ffffffff7ce96de0 sz=      600    recreate  "bind var heap  "  latch=0
     ds ffffffff7ce971f0 sz=      600 ct=        1
  Chunk ffffffff7ce97038 sz=      928    freeable  "kgiob          "
  Chunk ffffffff7ce973d8 sz=     4160    freeable  "koh-kghu sessi "  ds=ffffffff7cf65710
  Chunk ffffffff7ce98418 sz=     8192    freeable  "kdit           "
  Chunk ffffffff7ce9a418 sz=       40    free      "               "
  Chunk ffffffff7ce9a440 sz=     8192    freeable  "kdit           "
  Chunk ffffffff7ce9c440 sz=       48    freeable  "ktatt          "
  Chunk ffffffff7ce9c470 sz=       48    freeable  "kdit           "
  Chunk ffffffff7ce9c4a0 sz=       80    freeable  "kgicu          "
  Chunk ffffffff7ce9c4f0 sz=     5672    free      "               "
  Chunk ffffffff7ce9db18 sz=     2520    freeable  "koh-kghu sessio"
  Chunk ffffffff7ce9e4f0 sz=       48    freeable  "frame segment  "
  Chunk ffffffff7ce9e520 sz=       40    freeable  "frame segment  "
  Chunk ffffffff7ce9e548 sz=       72    freeable  "kxsxsi: frame  "
  Chunk ffffffff7ce9e590 sz=     2464    perm      "perm           "  alo=432
  Chunk ffffffff7ce9ef30 sz=       48    freeable  "allocator state"
  Chunk ffffffff7ce9ef60 sz=       80    freeable  "frame          "
  Chunk ffffffff7ce9efb0 sz=      128    freeable  "bnrdef and uac "
  Chunk ffffffff7ce9f030 sz=      600    recreate  "bind var heap  "  latch=0
     ds ffffffff7ce9f440 sz=      600 ct=        1
  Chunk ffffffff7ce9f288 sz=      928    freeable  "kgiob          "
  Chunk ffffffff7ce9f628 sz=     2520    freeable  "koh-kghu sessio"
EXTENT 789 addr=ffffffff7ce30080
  Chunk ffffffff7ce30090 sz=     2016    perm      "perm           "  alo=2016
 ...
Total heap size    = 51790440
FREE LISTS:
 Bucket 0 size=56
...
 Bucket 16 size=524312
 Bucket 17 size=2097176
Total free space   =   870336
UNPINNED RECREATABLE CHUNKS (lru first):
PERMANENT CHUNKS:
  Chunk ffffffff7ce9e590 sz=     2464    perm      "perm           "  alo=432
  Chunk ffffffff7ce30090 sz=     2016    perm      "perm           "  alo=2016
  Chunk ffffffff7cf70090 sz=      288    perm      "perm           "  alo=288
  Chunk ffffffff7cf600a8 sz=    20320    perm      "perm           "  alo=20320
Permanent space    =    25088
******************************************************

DBMS_SESSION.FREE_UNUSED_USER_MEMORY did not help.



 Comments   
Comment by ubTools Support [ 06/Jun/08 01:35 PM ]
The UGA (within the PGA) had been filled by a big recreatable chunk, "qmxdpls_subhea", whose subheap is 50681480 bytes. (See QA-8 for simple definitions of HEAPDUMP terms.)
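
To see which allocation comment dominates a heap dump, the "Chunk" lines can be summed per comment. The sketch below assumes the line layout shown in the excerpt above; the regular expression and the function name bytes_by_comment are illustrative.

```python
import re
from collections import Counter

# Matches lines such as:
#   Chunk ffffffff7ce90f80 sz=     4288    freeable  "qmxdpls_subhea "  ds=...
CHUNK_RE = re.compile(
    r'Chunk\s+[0-9a-f]+\s+sz=\s*(?P<size>\d+)\s+(?P<type>\S+)\s+"(?P<name>[^"]*)"'
)


def bytes_by_comment(heapdump_text: str) -> Counter:
    """Sum chunk sizes per allocation comment (trailing blanks stripped)."""
    totals = Counter()
    for m in CHUNK_RE.finditer(heapdump_text):
        totals[m["name"].strip()] += int(m["size"])
    return totals


# A few lines copied from the HEAPDUMP excerpt in the description.
SAMPLE = '''
  Chunk ffffffff7ce90f80 sz=     4288    freeable  "qmxdpls_subhea "  ds=ffffffff7ce96b78
  Chunk ffffffff7ce92040 sz=     4288    freeable  "qmxdpls_subhea "  ds=ffffffff7ce96b78
  Chunk ffffffff7ce95280 sz=     4328    freeable  "qmxdpls_subhea "  ds=ffffffff7ce96b78
  Chunk ffffffff7ce98418 sz=     8192    freeable  "kdit           "
'''
for name, size in bytes_by_comment(SAMPLE).most_common():
    print(name, size)
```

Only lines beginning with "Chunk" are counted, so the extent list printed under a "ds" line is not added a second time.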

Oracle Note:3518909.8:

Calling Dbms_xmlparser.freeParser / dbms_xmldom.freeDocument in the procudure do not 
appear to free the memory.

The leaked memory shows in heapdumps as "qmxdpls_subheap"

Although the mentioned bug was fixed in Oracle 9.2.0.6, the customer encounters the same problem in Oracle 9.2.0.8.

On the next use of DBMS_XMLPARSER.PARSECLOB after a previous DBMS_XMLPARSER.FREEPARSER call within the same session, the UGA did not grow. This is acceptable to the customer.





[QA-37] "ORA-01187: cannot read from file" in one of the RAC Node. Created: 08/May/08  Updated: 12/May/08

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Administration Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle 10g 10.2.0.3
Operating System: Linux
Operating System Version: RHEL 4

 Description   
One of the RAC nodes encounters the following errors, while no problem occurs on the other node:

From ALERT LOG:

Generic Alert Log Error May 2, 2008 11:21:30 PM ORA-12012: error on auto execute of job 8913
ORA-01187: cannot read from file ORA-01187: cannot read from file 96 because it failed verification tests
ORA-01110: data file 96: '/u64/oradata/DMSDB/LBPRD_IDX_SKU_005DMSDB.dbf'
ORA-06512: at "SYS.PRVT_ADVISOR", line 1624
ORA-06512: at "SYS.DBMS_ADVISOR", line 186
ORA-06512: at "SYS.DBMS_SPACE", line 1347
ORA-06512: at "SYS.DBMS_SPACE", line 1566
because it failed verification tests
Trace File:  /u00/app/oracle/oracle/admin/DMSDB/bdump/dmsdb2_j000_21660.trc 

From dmsdb2_j000_21660.trc:

*** 2008-04-07 23:03:34.750
GATHER_STATS_JOB: GATHER_TABLE_STATS('"LOADOWNER"','"MARKEDPRODUCT"','"MARKEDPRODUCT_20080406"', ...)
ORA-01187: cannot read from file 88 because it failed verification tests
ORA-01110: data file 88: '/u64/oradata/DMSDB/MRK_IDX_004DMSDB.dbf'


 Comments   
Comment by ubTools Support [ 08/May/08 12:32 PM ]
From the trace file, the problem can be reproduced by DBMS_STATS.GATHER_TABLE_STATS().

The following strace command captures the system calls:

strace -f -o strace.log sqlplus / as sysdba <<EOF
exec DBMS_STATS.GATHER_TABLE_STATS('<owner>','<tableName>','<partitionName>',1,DEGREE=>2);
exit;
EOF
Comment by ubTools Support [ 08/May/08 12:35 PM ]
An excerpt from strace.log showing db files that were opened with the O_DIRECT flag:
16822 open("/u51/oradata/DMSDB/system001DMSDB.dbf", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 14
...
16822 open("/u31/oradata/DMSDB/ctl1DMSDB.ctl", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 15
16822 open("/u32/oradata/DMSDB/ctl2DMSDB.ctl", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 16
16822 open("/u33/oradata/DMSDB/ctl3DMSDB.ctl", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 17
16822 open("/u52/oradata/DMSDB/undotbs002DMSDB2.dbf", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 18
...
16822 open("/u70/oradata/DMSDB/TEMPDMSDB_002.dbf", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 20
16822 open("/u70/oradata/DMSDB/TEMPDMSDB_002.dbf", O_RDWR|O_DIRECT|O_LARGEFILE) = 21
...
16822 open("/u52/oradata/DMSDB/sysaux001DMSDB.dbf", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 28
16822 open("/u51/oradata/DMSDB/system002DMSDB.dbf", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 29
16822 open("/u55/oradata/DMSDB/undotbs001DMSDB1.dbf", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 30
16822 open("/u61/oradata/DMSDB/MRK_IDX_001DMSDB.dbf", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 31
...

A second excerpt from strace.log showing db files that were opened without the O_DIRECT flag:

16822 open("/u65/oradata/DMSDB/dor_data_200805_001_DMSDB.dbf", O_RDWR|O_SYNC|O_LARGEFILE) = 19
16822 open("/u65/oradata/DMSDB/dor_data_200805_002_DMSDB.dbf", O_RDWR|O_SYNC|O_LARGEFILE) = 27
...
16822 open("/u64/oradata/DMSDB/MRK_IDX_004DMSDB.dbf", O_RDWR|O_SYNC|O_LARGEFILE) = 32
Comment by ubTools Support [ 08/May/08 12:52 PM ]
The customer uses OCFS2.

The O_DIRECT flag of the open() system call bypasses the file system (FS) cache, so disk I/O occurs directly between the user address space and the disk.

OCFS opens db files with the O_DIRECT flag to eliminate inconsistency among the FS caches of the nodes. Since RAC provides consistency among the SGAs and there will be no db buffers in the FS cache, no consistency problem occurs.
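
Finding the data files that were opened without O_DIRECT can be automated. The sketch below assumes each open() call sits on one unwrapped line of strace.log; the regular expression and the function name files_without_o_direct are illustrative.

```python
import re

# Matches open("<path>", FLAG|FLAG|...) as written by strace.
OPEN_RE = re.compile(r'open\("(?P<path>[^"]+)",\s*(?P<flags>[A-Z_|]+)\)')


def files_without_o_direct(strace_text: str, suffix: str = ".dbf") -> list:
    """List files with the given suffix that were opened without O_DIRECT."""
    paths = []
    for m in OPEN_RE.finditer(strace_text):
        flags = m["flags"].split("|")
        if m["path"].endswith(suffix) and "O_DIRECT" not in flags:
            if m["path"] not in paths:  # report each file once
                paths.append(m["path"])
    return paths


# Two calls taken from the strace.log excerpts in this issue.
SAMPLE = '''
16822 open("/u51/oradata/DMSDB/system001DMSDB.dbf", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 14
16822 open("/u64/oradata/DMSDB/MRK_IDX_004DMSDB.dbf", O_RDWR|O_SYNC|O_LARGEFILE) = 32
'''
print(files_without_o_direct(SAMPLE))
```

The file reported here, MRK_IDX_004DMSDB.dbf, is the same data file named in the ORA-01110 message of the trace file.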

From Ref: Oracle Note:391771.1:

48. Any special flags to run Oracle RAC?
OCFS2 volumes containing the Voting diskfile (CRS), Cluster registry (OCR),
Data files, Redo logs, Archive logs and Control files must be mounted
with the datavolume and nointr mount options.
The datavolume option ensures that the Oracle processes opens these files
with the o_direct flag.
The nointr option ensures that the ios are not interrupted by signals.

# mount -o datavolume,nointr -t ocfs2 /dev/sda1 /u01/db

The customer was not using the datavolume,nointr options. After mounting with datavolume,nointr, the problem was solved.
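
A quick check for volumes mounted without these options might look like the sketch below. It assumes the mount options are visible in /proc/mounts-style text (device, mount point, file system type, options); the sample lines and the function name ocfs2_mounts_missing_options are hypothetical.

```python
REQUIRED = {"datavolume", "nointr"}


def ocfs2_mounts_missing_options(mounts_text: str) -> dict:
    """Map each ocfs2 mount point to the required options it lacks."""
    missing = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ocfs2":
            continue  # not an ocfs2 entry
        lacking = REQUIRED - set(fields[3].split(","))
        if lacking:
            missing[fields[1]] = sorted(lacking)
    return missing


# Hypothetical /proc/mounts content, for illustration only.
SAMPLE = """\
/dev/sda1 /u01/db ocfs2 rw,datavolume,nointr 0 0
/dev/sdb1 /u64 ocfs2 rw 0 0
/dev/sdc1 /home ext3 rw 0 0
"""
print(ocfs2_mounts_missing_options(SAMPLE))
```

On a live system the input would come from reading /proc/mounts on each RAC node.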





[QA-36] Who is the inventor of Response Time Analysis(RTA) in Oracle ? Created: 08/Apr/08  Updated: 01/May/08

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: X
Operating System: Generic

 Description   
This issue has moved to http://www.ubtools.com/web/public/resources/logs/rta_inventor.




[QA-35] ORA-00600 [kturrur11], [65535], [0]: Instance crashed. Created: 08/Nov/07  Updated: 08/Nov/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
Operating System: IBM-AIX

 Description   
The instance crashes with the following error code:
ORA-00600: internal error code, arguments: [kturrur11], [65535], [0], [], [], [], [], []

Stack trace:

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              FFFFFFFFFFFA3B0 ? 000000000 ?
ksedmp+0290          bl       ksedst               1047C9C10 ?
ksfdmp+0018          bl       03F53584             
kgerinv+00dc         bl       _ptrgl               
kgeasnmierr+0040     bl       kgerinv              0FFFFFFFF ? 0000000BA ?
                                                   FFFFFFFFFFFAD18 ?
                                                   FFFFFFFFFFFAD20 ?
                                                   FFFFFFFFFFFAC08 ?
kturgmbu+02b8        bl       kgeasnmierr          000085188 ? 000000000 ?
                                                   000550021 ? 200000002 ?
                                                   000000000 ? 00000FFFF ?
                                                   000000000 ? 000000000 ?
kturrur+01c8         bl       kturgmbu             1001AEFA8 ? 70000020F745AC0 ?
                                                   0000F4240 ? 104BF4B00 ?
                                                   000085188 ? FFFFFFFFFFFA950 ?
                                                   70000020A3BC630 ? 1102B0D58 ?
ktundo+016c          bl       kturrur              1102B0D58 ? 000000000 ?
                                                   100000000 ? FFFFFFFFFFFA9D0 ?
                                                   FFFFFF00000003 ? 15453015F ?
                                                   000000000 ? 000000000 ?
ktubko+0794          bl       ktundo               1FFFFBC80 ?
                                                   5B16B1B30F745928 ?
                                                   1001DD940 ? 70000020A3C8EA0 ?
                                                   000000528 ? FFFFFFFFFFFBE08 ?
                                                   400000000 ? FFFFFFFFFFFBF08 ?
kturrt+15fc          bl       ktubko               70000020A3C52F0 ? 600000000 ?
                                                   000000000 ? 0E8E65D60 ?
                                                   3B008CEC89 ? 55002100000000 ?
kturec+0dcc          bl       kturrt               FFFFFFFFFFFC528 ?
                                                   21000000000000 ? 1FFFFC5E0 ?
                                                   000000000 ? 0000009A0 ?
                                                   40A288838 ? 110021A88 ?
kturax+0300          bl       kturec               5522880400 ? 000000000 ?
                                                   19E370001 ? 000000001 ?
                                                   FFFFFFFFFFFFCAC0 ?
                                                   11FFFFFEFF ? 400000000 ?
ktprbeg+02b0         bl       kturax               10FDB1B2B0 ? 004AD8530 ?
ktmmon+0ebc          bl       ktprbeg              080000000 ?
ktmSmonMain+0030     bl       ktmmon               000000000 ?
ksbrdp+03e0          bl       _ptrgl               
opirip+03fc          bl       01FC66A0             
opidrv+0448          bl       opirip               1103BD070 ? 4103BE990 ?
                                                   FFFFFFFFFFFF860 ?
sou2o+0090           bl       opidrv               32023373FC ? 400000020 ?
                                                   FFFFFFFFFFFF860 ?
opimai_real+0150     bl       01FC0DF4             
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0090         bl       main                 000000000 ? 000000000 ?


 Comments   
Comment by ubTools Support [ 08/Nov/07 08:40 AM ]
From Oracle Note:4940513.8:
Bug 4940513  OERI[kturrur11] can occur with multi block undo
 This note gives a brief overview of bug 4940513.
Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected	Versions < 11
    Versions confirmed as being affected	

        * 9.2.0.6
        * 9.2.0.7
        * 10.1.0.4
        * 10.1.0.5
        * 10.2.0.2 

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in	

        * 9.2.0.8 (Server Patch Set)
        * 10.2.0.3 (Server Patch Set)
        * 11g (Future version) 

Symptoms:
	
Related To:

    * Internal Error May Occur (ORA-600)
    * ORA-600 [kturrur11] 

	

    * (None Specified) 

Description

    In rare situations the server could raise ORA-600 [kturrur11][65535][0]

    Workaround:
      Avoid the multi block undo code path by making sure that the 
      block size in the undo tablespace is large enough to accomodate the 
      largest column that is changed by any SQL statement. If the block size 
      of the data tablespaces is larger than the block size of the undo 
      tablespace, increase the blocksize of the undo tablespace to that of 
      the data tablespace.

Comment by ubTools Support [ 08/Nov/07 10:32 AM ]
Workaround:

Drop segments which need recovery.

Finding the UNDO segment from ALERT LOG:

Errors in file /product/10g/admin/DTWP/bdump/dtwp1_smon_7024648.trc:
ORA-00600: internal error code, arguments: [kturrur11], [65535], [0], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Sat Nov  3 13:19:38 2007
ORACLE Instance DTWP1 (pid = 15) - Error 600 encountered while recovering transaction (85, 33).
Sat Nov  3 13:19:38 2007
Errors in file /product/10g/admin/DTWP/bdump/dtwp1_smon_7024648.trc:
ORA-00600: internal error code, arguments: [kturrur11], [65535], [0], [], [], [], [], []

SMON is trying to roll back a transaction in (UNDOSEGMENT#85, UNDOSLOT#33).

Identifying the UNDO segment:

select segment_name,owner,tablespace_name from dba_rollback_segs
where segment_id=85;  

SEGMENT_NAME                   OWNER  TABLESPACE_NAME
------------------------------ ------ ------------------------------
_SYSSMU85$                     PUBLIC UNDOTBS1

Undo block in the SMON trace:

********************************************************************************
UNDO BLK:  
xid: 0x0055.021.00085188  seq: 0xffff cnt: 0x1   irb: 0x1   icl: 0x0   flg: 0x0000
 
 Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x0018     
 
*-----------------------------
* Rec #0x1  slt: 0x21  objn: 125213(0x0001e91d)  objd: 564547  tblspc: 10(0x0000000a)
*       Layer:   5 (Transaction Undo)   opc: 1   rci 0x00   
Undo type:  Multi-block undo Mid-piece   Last buffer split:  Yes 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x5b16b1af
*-----------------------------

Transaction ID: xid: 0x0055.021.00085188

Hexadecimal 55 = Decimal 85
Hexadecimal 21 = Decimal 33
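
The hexadecimal conversions above can be reproduced with a short script. This is a minimal sketch, assuming the xid string has the three dot-separated hexadecimal fields seen in the trace (undo segment number, slot, and the wrap# value that also appears in the transaction table); the function name parse_xid is hypothetical.

```python
def parse_xid(xid):
    """Split an xid string such as '0x0055.021.00085188' into integers.

    Assumption: the three dot-separated fields are hexadecimal and stand for
    undo segment number, transaction slot and wrap#.
    """
    usn_hex, slot_hex, wrap_hex = xid.strip().split(".")
    return int(usn_hex, 16), int(slot_hex, 16), int(wrap_hex, 16)


usn, slot, wrap = parse_xid("0x0055.021.00085188")
print(f"undo segment={usn} slot={slot} wrap#={wrap:#x}")
```

The first two numbers are the pair that SMON reports in the alert log as the transaction it is recovering.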

That means this UNDO block is the block that SMON is reading to roll back transaction (85, 33).

irb points to the last UNDO RECORD in the UNDO block. rci points to the previous UNDO RECORD in the chain; if rci=0, the record is the first UNDO RECORD. The recovery operation starts from the record that irb points to and follows the chain through rci until rci is zero.
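
The traversal described above can be sketched as follows. This is only an illustration of the irb/rci chain, not Oracle's implementation; the function name and the dictionary that maps a record number to its rci value are assumptions made for the example.

```python
def undo_apply_order(irb, rci_by_record):
    """Return UNDO RECORD numbers in the order a rollback would visit them.

    irb: record number where the chain starts (the last record written).
    rci_by_record: hypothetical mapping {record number: rci of that record}.
    """
    order = []
    seen = set()
    rec = irb
    while rec != 0:
        if rec in seen:
            # Guard against a corrupted chain that points back to itself.
            raise ValueError(f"rci chain loops back to record {rec:#x}")
        seen.add(rec)
        order.append(rec)
        rec = rci_by_record[rec]  # follow the chain to the previous record
    return order


# The block from the SMON trace: irb = 0x1 and record 0x1 has rci 0x00.
print(undo_apply_order(0x1, {0x1: 0x00}))

# A made-up three-record chain, for illustration only.
print(undo_apply_order(0x3, {0x3: 0x2, 0x2: 0x1, 0x1: 0x0}))
```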

In this case, the UNDO block contains just one UNDO RECORD. This UNDO RECORD includes UNDO DATA for object#125213.

Object that needs recovery:

select owner,object_name,object_type from dba_objects where object_id=125213;

Owner : OWBRUN
Object_name : sm_post_ind2
Object_type : INDEX

The index was dropped, but the problem did not disappear. It was then decided to drop this UNDO segment after identifying all objects in it.

Reading Transaction Table in the UNDO header:

ALTER SYSTEM DUMP UNDO HEADER '_SYSSMU85$';

...
********************************************************************************
Undo Segment:  _SYSSMU85$ (85)
********************************************************************************
...
TRN TBL::
 
  index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num    cmt
  ------------------------------------------------------------------------------------------------
   0x00    9    0x00  0x85435  0xffff  0x0847.4024f907  0x00000000  0x0000.000.00000000  0x00000000
   0x00000000  1194066440
   0x01    9    0x00  0x84b3c  0x0004  0x0847.4024f89b  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x02    9    0x00  0x85237  0x0006  0x0847.4024f895  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x03    9    0x00  0x85406  0x0011  0x0847.4024f877  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x04    9    0x00  0x851d9  0x000a  0x0847.4024f89e  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x05    9    0x00  0x85234  0x002f  0x0847.4024f881  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x06    9    0x00  0x8543f  0x002b  0x0847.4024f897  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x07    9    0x00  0x850ce  0x002e  0x0847.4024f8ac  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x08    9    0x00  0x853f3  0x001a  0x0847.4024f88a  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x09    9    0x00  0x85188  0x001f  0x0847.4024f87d  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x0a    9    0x00  0x84f75  0x0014  0x0847.4024f8a0  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x0b    9    0x00  0x832f2  0x0007  0x0847.4024f8aa  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x0c    9    0x00  0x85313  0x001e  0x0847.4024f8d2  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x0d    9    0x00  0x85320  0x000c  0x0847.4024f8d0  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x0e    9    0x00  0x849fb  0x0012  0x0847.4024f890  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x0f    9    0x00  0x8530c  0x000e  0x0847.4024f88e  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x10    9    0x00  0x84ac9  0x0015  0x0847.4024f8de  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x11    9    0x00  0x854f4  0x0009  0x0847.4024f87a  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x12    9    0x00  0x84ce9  0x0002  0x0847.4024f892  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x13    9    0x00  0x85220  0x001b  0x0847.4024f8c4  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x14    9    0x00  0x85119  0x001d  0x0847.4024f8a2  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x15    9    0x00  0x8540c  0x0025  0x0847.4024f8e0  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x16    9    0x00  0x85177  0x0017  0x0847.4024f8ea  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x17    9    0x00  0x84f02  0x002a  0x0847.4024f8ec  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x18    9    0x00  0x84e2d  0x0027  0x0847.4024f8f7  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x19    9    0x00  0x8537a  0x0020  0x0847.4024f8a6  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x1a    9    0x00  0x8530b  0x000f  0x0847.4024f88c  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x1b    9    0x00  0x841bc  0x0029  0x0847.4024f8c6  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x1c    9    0x00  0x852a9  0x002d  0x0847.4024f8fb  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x1d    9    0x00  0x84d24  0x0019  0x0847.4024f8a4  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x1e    9    0x00  0x85419  0x0010  0x0847.4024f8d3  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x1f    9    0x00  0x84ea2  0x0005  0x0847.4024f87f  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x20    9    0x00  0x853a5  0x000b  0x0847.4024f8a8  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x21   10    0x10  0x85188  0x0919  0x0847.4024f8fd  0x5b16b1b3  0x0000.000.00000000  0x00000002   0x00000000  0
   0x22    9    0x00  0x85279  0x002c  0x0847.4024f886  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x23    9    0x00  0x847b0  0x0028  0x0847.4024f8bb  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x24    9    0x00  0x851cf  0x0023  0x0847.4024f8b9  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x25    9    0x00  0x84a9c  0x0016  0x0847.4024f8e1  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x26   10    0x90  0x7539b  0x0003  0x0847.3fcdcb21  0x4f81d3d2  0x0000.000.00000000  0x0000dbd9   0x00000000  0
   0x27    9    0x00  0x850ac  0x001c  0x0847.4024f8f9  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x28    9    0x00  0x8531b  0x0013  0x0847.4024f8bc  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x29    9    0x00  0x854f0  0x000d  0x0847.4024f8c7  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x2a    9    0x00  0x85301  0x0018  0x0847.4024f8ed  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x2b    9    0x00  0x83c38  0x0001  0x0847.4024f899  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x2c    9    0x00  0x85051  0x0008  0x0847.4024f888  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x2d    9    0x00  0x84a3c  0x0000  0x0847.4024f904  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x2e    9    0x00  0x84f35  0x0024  0x0847.4024f8ae  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x2f    9    0x00  0x85100  0x0022  0x0847.4024f884  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440

State#10 means an active transaction. dba points to the starting UNDO block address.

There are 2 active transactions. One of them is in slot 0x21, which is the same slot seen in the SMON trace that causes this ORA-600 [kturrur11] error. The other active transaction is in slot 0x26, which has a dba of 0x4f81d3d2.

The object in slot 0x21 was found above, but the object in slot 0x26 is not known yet.
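
Picking the active slots out of a long TRN TBL listing by eye is error-prone. The following is a minimal sketch that scans the dump text for rows whose state column is 10. The regular expression is an assumption based only on the column layout shown in this dump (index, state, cflags, wrap#, uel, scn, dba), and the sample rows are copied from the listing above.

```python
import re

# One TRN TBL row: index, state, cflags, wrap#, uel, scn, dba, ...
ROW = re.compile(
    r"^\s*(0x[0-9a-f]{2})\s+(\d+)\s+0x[0-9a-f]+\s+0x[0-9a-f]+\s+0x[0-9a-f]+"
    r"\s+0x[0-9a-f]{4}\.[0-9a-f]{8}\s+(0x[0-9a-f]{8})"
)


def active_slots(dump_text):
    """Return (slot, dba) pairs for rows whose state column is 10."""
    found = []
    for line in dump_text.splitlines():
        match = ROW.match(line)
        if match and int(match.group(2)) == 10:
            found.append((int(match.group(1), 16), int(match.group(3), 16)))
    return found


SAMPLE = """\
   0x20    9    0x00  0x853a5  0x000b  0x0847.4024f8a8  0x00000000  0x0000.000.00000000  0x00000000   0x00000000
  1194066440
   0x21   10    0x10  0x85188  0x0919  0x0847.4024f8fd  0x5b16b1b3  0x0000.000.00000000  0x00000002   0x00000000  0
   0x26   10    0x90  0x7539b  0x0003  0x0847.3fcdcb21  0x4f81d3d2  0x0000.000.00000000  0x0000dbd9   0x00000000  0
"""

for slot, dba in active_slots(SAMPLE):
    print(f"slot {slot:#04x}  dba {dba:#010x}")
```

Wrapped continuation lines (the ones that carry only the cmt value) do not match the pattern and are skipped.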

Object that needs recovery:

Hexadecimal 4f81d3d2 = Decimal 1333908434

select DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(1333908434) from x$dual;
318

select DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(1333908434) from x$dual;
119762
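
The two DBMS_UTILITY results above can be cross-checked without a database connection. This is a sketch under the assumption that the 32-bit data block address uses the common layout of a 10-bit relative file number followed by a 22-bit block number; the function name decode_dba is hypothetical.

```python
def decode_dba(dba):
    """Split a 32-bit data block address into (file number, block number).

    Assumption: the upper 10 bits hold the relative file number and the
    lower 22 bits hold the block number.
    """
    file_no = dba >> 22
    block_no = dba & ((1 << 22) - 1)
    return file_no, block_no


dba = int("4f81d3d2", 16)
print(dba, decode_dba(dba))
```

The pair it returns for 0x4f81d3d2 agrees with the rdba line of the block dump, which reports (318/119762).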

alter system dump datafile 318 block 119762;

...
*** SESSION ID:(489.37) 2007-11-03 16:23:56.878
Start dump data blocks tsn: 1 file#: 318 minblk 119762 maxblk 119762
buffer tsn: 1 rdba: 0x4f81d3d2 (318/119762)
...
UNDO BLK:  
xid: 0x0055.026.0007539b  seq: 0xf72d cnt: 0x5b  irb: 0x1   icl: 0x0   flg: 0x0000
 
 Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x1f8c     0x02 0x1f30     0x03 0x1ed4     0x04 0x1e78     0x05 0x1e1c     
0x06 0x1dc0     0x07 0x1d64     0x08 0x1d08     0x09 0x1cac     0x0a 0x1c50     
0x0b 0x1bf4     0x0c 0x1b9c     0x0d 0x1b48     0x0e 0x1af0     0x0f 0x1a98     
0x10 0x1a40     0x11 0x19e8     0x12 0x1990     0x13 0x193c     0x14 0x18e4     
0x15 0x1890     0x16 0x1838     0x17 0x17e0     0x18 0x178c     0x19 0x1738     
0x1a 0x16e0     0x1b 0x1688     0x1c 0x1634     0x1d 0x15dc     0x1e 0x1584     
0x1f 0x152c     0x20 0x14d4     0x21 0x147c     0x22 0x1424     0x23 0x13cc     
0x24 0x1374     0x25 0x131c     0x26 0x12c4     0x27 0x126c     0x28 0x1214     
0x29 0x11bc     0x2a 0x1168     0x2b 0x1110     0x2c 0x10b8     0x2d 0x1064     
0x2e 0x1010     0x2f 0x0fbc     0x30 0x0f64     0x31 0x0f0c     0x32 0x0eb8     
0x33 0x0e60     0x34 0x0e0c     0x35 0x0db8     0x36 0x0d64     0x37 0x0d0c     
0x38 0x0cb4     0x39 0x0c60     0x3a 0x0c08     0x3b 0x0bb0     0x3c 0x0b58     
0x3d 0x0b04     0x3e 0x0aac     0x3f 0x0a58     0x40 0x0a04     0x41 0x09ac     
0x42 0x0954     0x43 0x08fc     0x44 0x08a4     0x45 0x0850     0x46 0x07f8     
0x47 0x07a0     0x48 0x074c     0x49 0x06f4     0x4a 0x06a0     0x4b 0x0648     
0x4c 0x05f0     0x4d 0x0598     0x4e 0x0540     0x4f 0x04e8     0x50 0x0490     
0x51 0x0438     0x52 0x03e0     0x53 0x0388     0x54 0x0334     0x55 0x02dc     
0x56 0x0284     0x57 0x022c     0x58 0x01d4     0x59 0x0180     0x5a 0x012c     
0x5b 0x00d4     
 
*-----------------------------

irb points to the UNDO RECORD of 0x1.

*-----------------------------
* Rec #0x1  slt: 0x26  objn: 125212(0x0001e91c)  objd: 564548  tblspc: 10(0x0000000a)
*       Layer:  10 (Index)   opc: 22   rci 0x00   
Undo type:  Regular undo    User Undo Applied  Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
rdba: 0x4f81d3d1
*-----------------------------
...

The rci of UNDO RECORD 0x1 is 0x00. That means this is both the first and the last UNDO RECORD in the chain.

Object ID in this UNDO RECORD is 125212.

SQL> select owner,object_name,object_type from dba_objects where object_id in (125213,125212);

OWNER
------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
-------------------
OWBRUN
SM_POST_IND1
INDEX

Luckily, this object is also an INDEX, and it was dropped as well. After making sure that there were no new active transactions in this UNDO segment, the following steps were performed:

  • Shut down the database
  • Set the following parameters in the PFILE/SPFILE:

_smu_debug_mode=4
_offline_rollback_segments=(_SYSSMU85$)

  • Start up the database
  • drop rollback segment "_SYSSMU85$";

After the UNDO segment is successfully dropped, the internal parameters above should be removed. In our case, however, although the original internal error (ORA-600 [kturrur11]) disappeared while dropping the UNDO segment, another internal error (ORA-600 [kddummy_blkchk]) was encountered. It is tracked as a separate issue, QA-34.

Since all objects needing recovery in the UNDO segment were dropped, there is no need to re-create the database after using the _offline_rollback_segments parameter.





[QA-34] ORA-00600 [kddummy_blkchk] while dropping UNDO segment. Created: 08/Nov/07  Updated: 08/Nov/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Blocker
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
Operating System: IBM-AIX

 Description   
While dropping an offlined UNDO segment (by _offline_rollback_segments), the following error appeared:
SQL> drop rollback segment "_SYSSMU85$";
drop rollback segment "_SYSSMU85$"
*
ERROR at line 1:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [846985],
[38508], [], [], [], []

Then, the instance crashed. After re-starting the instance, it crashed again.

Stack trace:

ORA-00600: internal error code, arguments: [kddummy_blkchk], [2], [846985], [38508], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              FFFFFFFFFFF9D10 ? 000000000 ?
ksedmp+0290          bl       ksedst               1047C9C10 ?
ksfdmp+0018          bl       03F53584             
kgerinv+00dc         bl       _ptrgl               
kseinpre+0040        bl       kgerinv              110040AA0 ? 000000000 ?
                                                   1048470A0 ? 07FFFFFFF ?
                                                   000000000 ?
ksesin+0048          bl       kseinpre             1048470A0 ? 07FFFFFFF ?
                                                   000000000 ?
kco_blkchk+0778      bl       ksesin               10484752C ? 300000003 ?
                                                   000000000 ? 000000002 ?
                                                   000000000 ? 0000CEC89 ?
                                                   000000000 ? 00000966C ?
kcoapl+0d24          bl       kco_blkchk           FFF00FFFFFFA310 ?
                                                   284422800B4E4358 ?
                                                   102FD4FDC ? 7000001F5151F50 ?
                                                   000000080 ?
kcbapl+0178          bl       kcoapl               FFFFFFFFFFFC218 ?
                                                   7000001E815A000 ? 100000001 ?
                                                   7FFFFFFF000000F7 ?
                                                   200000000000 ? 20BD260C8 ?
                                                   000000000 ?
kcrfw_redo_gen+2964  bl       kcbapl               000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
kcbchg1_main+25e0    bl       kcrfw_redo_gen       102DA5AC7 ?
                                                   2D30491F0A2889F8 ?
                                                   FFFFFFFFFFFAB20 ?
                                                   700000010008000 ? 1000024A4 ?
                                                   000000001 ? 400000000000001 ?
                                                   000000000 ?
kcbchg1+038c         bl       kcbchg1_main         000000000 ? 0000001F4 ?
                                                   000000000 ? 110366678 ?
                                                   0000023A0 ? 70000020B29AFD8 ?
ktbchgro+0380        bl       kcbchg1              00A288AD0 ? 30A2889FA ?
                                                   FFFFFFFFFFFB620 ?
                                                   FFFFFFFFFFFB658 ? 000000000 ?
                                                   000000000 ?
ktfbapp+0044         bl       ktbchgro             000000000 ? 300000003 ?
                                                   FFFFFFFFFFFCB48 ?
                                                   FFFFFFFFFFFC218 ?
                                                   FFFFFFFFFFFBFD8 ?
                                                   FFFFFFFFFFFC0B0 ?
                                                   FFFFFFFFFFFC4D8 ?
                                                   FFFFFFFFFFFC5B0 ?
kteopgen+00ec        bl       ktfbapp              000000000 ? FFFFFFFFFFFC218 ?
                                                   044244040 ? 0FFFFFFFF ?
                                                   FFFFFFFFFFFBFD8 ?
kteopdelete+1468     bl       kteopgen             FFFFFFFFFFFCB48 ? 000000000 ?
                                                   FFFFFFFFFFFBFD8 ?
                                                   FFFFFFFFFFFC140 ?
                                                   FFFFFFFFFFFC218 ?
                                                   FFFFFFFFFFFC0B0 ? 000000000 ?
                                                   1101EBDCC ?
ktsxfastdele+0118    bl       kteopdelete          700000209E9B238 ? 100000001 ?
                                                   100B5A770 ? 000000000 ?
                                                   FFFFFFFFFFFC270 ? 000000000 ?
                                                   000000000 ?
kteopshrink+0308     bl       01FC21A0             
ktssdrbm_segment+0a  bl       kteopshrink          100000001 ? FFFFFFFFFFFCAD8 ?
f8                                                 000000001 ? 000000001 ?
                                                   0000001A0 ? 700000200C016A0 ?
                                                   000000000 ?
ktssdro_segment+06c  bl       ktssdrbm_segment     FFFFFFFFFFFD498 ?
8                                                  FFFFFFFFFFFD560 ? 100008043 ?
                                                   1FFFFFFFF ?
ktssdt_segs+0350     bl       ktssdro_segment      70000020A3C52F0 ? 600007530 ?
                                                   0001DCE78 ?
ktmmon+1048          bl       ktssdt_segs          000000000 ?
                                                   7FFFFFFF7FFFFFFF ?
                                                   7FFFFFFF7FFFFFFF ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
                                                   7FFFFFFC7FFFFFFC ?
                                                   0472CD5BE ?
ktmSmonMain+0030     bl       ktmmon               000000000 ?
ksbrdp+03e0          bl       _ptrgl               
opirip+03fc          bl       01FC66A0             
opidrv+0448          bl       opirip               1103BD070 ? 4103BE990 ?
                                                   FFFFFFFFFFFF860 ?
sou2o+0090           bl       opidrv               32023373FC ? 400000020 ?
                                                   FFFFFFFFFFFF860 ?
opimai_real+0150     bl       01FC0DF4             
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0090         bl       main                 000000000 ? 000000000 ?

UNDO segment status:

SQL> select status$ from undo$ where us#=85;

   STATUS$
----------
         1

UNDO$ structure from $ORACLE_HOME/rdbms/admin/sql.bsq:

create table undo$                                     /* undo segment table */
( us#           number not null,                      /* undo segment number */
  name          varchar2("M_IDEN") not null,    /* name of this undo segment */
  user#         number not null,      /* owner: 0 = SYS(PRIVATE), 1 = PUBLIC */
  file#         number not null,               /* segment header file number */
  block#        number not null,              /* segment header block number */
  scnbas        number,           /* highest commit time in rollback segment */
  scnwrp        number,              /* scnbas - scn base, scnwrp - scn wrap */
  xactsqn       number,               /* highest transaction sequence number */
  undosqn       number,                /* highest undo block sequence number */
  inst#         number,    /* parallel server instance that owns the segment */
  status$       number not null,              /* segment status (see KTS.H): */
  /* 1 = INVALID, 2 = AVAILABLE, 3 = IN USE, 4 = OFFLINE, 5 = NEED RECOVERY,
   * 6 = PARTLY AVAILABLE (contains in-doubt txs)
   */
  ts#           number,                                 /* tablespace number */
  ugrp#         number,                      /* The undo group it belongs to */
  keep          number,
  optimal       number,
  flags         number,
  spare1        number,
  spare2        number,
  spare3        number,
  spare4        varchar2(1000),
  spare5        varchar2(1000),
  spare6        date
)

status$=1 means INVALID or DOES NOT EXIST. That means the UNDO segment doesn't exist.
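
When checking several undo segments at once, the status$ codes from the sql.bsq comment above can be kept in a small lookup table. This is a minimal sketch: the code-to-name mapping is copied from that comment, while the dictionary and function names are hypothetical.

```python
# status$ codes as listed in the undo$ definition in sql.bsq.
UNDO_STATUS = {
    1: "INVALID",
    2: "AVAILABLE",
    3: "IN USE",
    4: "OFFLINE",
    5: "NEED RECOVERY",
    6: "PARTLY AVAILABLE",
}


def describe_status(code):
    """Translate an undo$.status$ value into its name from sql.bsq."""
    return UNDO_STATUS.get(code, f"UNKNOWN ({code})")


# us#=85 returned status$=1 in the query above.
print(85, describe_status(1))
```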



 Comments   
Comment by ubTools Support [ 08/Nov/07 08:03 AM ]
Since the UNDO segment doesn't exist, most probably its segment type was converted to TEMP. After setting the following event in the SPFILE/PFILE, the problem disappeared.
event="10061 trace name context forever, level 10" 

This event prevents SMON from cleaning up temporary segments.

Comment by ubTools Support [ 08/Nov/07 12:39 PM ]
The current UNDO tablespace was dropped, and a new one was created. Then, event 10061 was removed.




[QA-31] How did Oracle compute the selectivity on index ? Created: 15/Sep/07  Updated: 30/Sep/15

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - SQL Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

File Attachments: File prod_ora_537_SELECT_PROD_I1_10053.trc     Zip Archive sqlt_s4880_prod_fsthqdr2_I1_main.zip    
Product Version: 9.2.0.7.0
Operating System: HP-UX
Operating System Version: B.11.11
SQL_TEXT: CREATE OR REPLACE VIEW IC_ITEM_INV_V
(ITEM_ID, LOT_NO, SUBLOT_NO, LOT_ID, LOT_STATUS,
 LOT_CREATED, EXPIRE_DATE, QC_GRADE, WHSE_CODE, LOCATION,
 LOCT_ONHAND, LOCT_ONHAND2, COMMIT_QTY, COMMIT_QTY2)
AS
SELECT l.item_id, l.lot_no, l.sublot_no, l.lot_id, s.lot_status,
          l.lot_created, l.expire_date, l.qc_grade, b.whse_code, b.LOCATION,
          b.loct_onhand, b.loct_onhand2, 0, 0
     FROM ic_lots_mst l, ic_loct_inv b, ic_lots_sts s
    WHERE l.item_id = b.item_id
      AND l.inactive_ind = 0
      AND l.lot_id = b.lot_id
      AND b.lot_status = s.lot_status(+)
      AND NVL (s.order_proc_ind, 1) = 1
      AND NVL (s.rejected_ind, 0) = 0
      AND b.loct_onhand > 0
   UNION ALL
   SELECT /*+ INDEX(t IC_TRAN_PNDI1) */ t.item_id, l.lot_no, l.sublot_no, t.lot_id, t.lot_status,
          l.lot_created, l.expire_date, l.qc_grade, t.whse_code, t.LOCATION,
          0, 0, t.trans_qty commit_qty, t.trans_qty2 commit_qty2
     FROM ic_lots_mst l, ic_tran_pnd t, ic_item_mst i
    WHERE i.item_id = l.item_id
      AND i.item_id = t.item_id
      AND l.inactive_ind = 0
      AND t.lot_id = l.lot_id
      AND t.delete_mark = 0
      AND t.completed_ind = 0
      AND t.trans_qty < 0
/



SELECT SUM (loct_onhand), SUM (loct_onhand2), SUM (commit_qty),
         SUM (commit_qty2), SUM (loct_onhand) + SUM (commit_qty), lot_no,
         sublot_no, lot_id, lot_status, lot_created, LOCATION, expire_date,
         qc_grade
    FROM xtdba.ic_item_inv_v_sil x
   WHERE item_id = 5125
     AND whse_code = '350'
     AND loct_onhand >= 0
     AND expire_date >
                    TO_DATE ('06-SEP-2007, 11:59:59', 'DD-MON-YYYY, HH:MI:SS')
     AND lot_id > 0
     AND LOCATION <> 'NONE'
GROUP BY lot_no,
         sublot_no,
         lot_id,
         lot_status,
         lot_created,
         LOCATION,
         expire_date,
         qc_grade
  HAVING SUM (loct_onhand) + SUM (commit_qty) > 0
ORDER BY lot_created

 Description   
The customer wanted to know how Oracle computes the selectivity on index IC_TRAN_PNDI1. They are not sure whether the Oracle optimizer computes it correctly.

 Comments   
Comment by ubTools Support [ 15/Sep/07 11:11 AM ]
Event 10053 trace file.
Comment by ubTools Support [ 15/Sep/07 11:14 AM ]
SQLTXPLAIN report.
Comment by ubTools Support [ 15/Sep/07 11:15 AM ]
SQLTXPLAIN report.
Comment by ubTools Support [ 15/Sep/07 11:44 AM ]
BASE STATISTICAL INFORMATION
 
***********************
Table stats    Table: IC_TRAN_PND   Alias:  T
  (Using composite stats)
  TOTAL ::  CDN: 34357548  NBLKS:  737250  AVG_ROW_LEN:  143
-- Index stats
  INDEX NAME: IC_TRAN_PNDI1  COL#: 2 7 6 8 
    TOTAL ::  LVLS: 3   #LB: 341316  #DK: 113700  LB/K: 3  DB/K: 248  CLUF: 28283677
...
***********************

Definition of BASE STATISTICAL INFORMATION

CDN: Cardinality, number of rows.
NBLKS: Number of blocks.
AVG_ROW_LEN: Average row length.

COL#: Column numbers in order.
LVLS: Index depth.
#LB: Number of leaf blocks.
#DK: Number of distinct keys.
LB/K: Leaf blocks per key.
DB/K: Data blocks per key.
CLUF: Clustering factor.

SINGLE TABLE ACCESS PATH

Column: DELETE_MAR  Col#: 28     Table: IC_TRAN_PND   Alias:  T
    NDV: 3         NULLS: 0         DENS: 3.3333e-01 LO:  0  HI: 2
    NO HISTOGRAM: #BKT: 1 #VAL: 2
Column: COMPLETED_  Col#: 24     Table: IC_TRAN_PND   Alias:  T
    NDV: 2         NULLS: 0         DENS: 5.0000e-01 LO:  0  HI: 1
    NO HISTOGRAM: #BKT: 1 #VAL: 2
Column:  TRANS_QTY  Col#: 16     Table: IC_TRAN_PND   Alias:  T
    NDV: 62267     NULLS: 0         DENS: 1.6060e-05 LO:  -3116050  HI: 150907016871
    NO HISTOGRAM: #BKT: 1 #VAL: 2
Column:    ITEM_ID  Col#: 2      Table: IC_TRAN_PND   Alias:  T
    NDV: 4527      NULLS: 0         DENS: 2.2090e-04 LO:  3  HI: 9816
    NO HISTOGRAM: #BKT: 1 #VAL: 2
Column:  WHSE_CODE  Col#: 6      Table: IC_TRAN_PND   Alias:  T
    NDV: 105       NULLS: 0         DENS: 9.5238e-03
    NO HISTOGRAM: #BKT: 1 #VAL: 2
Column:     LOT_ID  Col#: 7      Table: IC_TRAN_PND   Alias:  T
    NDV: 13443     NULLS: 0         DENS: 7.4388e-05 LO:  0  HI: 46635
    NO HISTOGRAM: #BKT: 1 #VAL: 2
Column:   LOCATION  Col#: 8      Table: IC_TRAN_PND   Alias:  T
    NDV: 31        NULLS: 0         DENS: 3.2258e-02
    NO HISTOGRAM: #BKT: 1 #VAL: 2
  TABLE: IC_TRAN_PND     ORIG CDN: 34357548  ROUNDED CDN: 1  CMPTD CDN: 0
...

Definition of SINGLE TABLE ACCESS PATH

NDV: Number of distinct values.
NULLS: Number of NULLs.
DENS: Density.
LO: Lowest value for numeric columns.
HI: Highest value for numeric columns.
...

Comment by ubTools Support [ 15/Sep/07 11:54 AM ]
According to the execution plan, these are the predicates:

Access Predicates:

T.ITEM_ID=5125
AND T.LOT_ID=L.LOT_ID
AND T.WHSE_CODE='350'

Filter Predicates:

B.ITEM_ID=T.ITEM_ID
AND T.LOT_ID>0
AND T.LOCATION<>'NONE'

Column order of IC_TRAN_PNDI1:

  • ITEM_ID
  • LOT_ID
  • WHSE_CODE
  • LOCATION
Comment by ubTools Support [ 15/Sep/07 12:12 PM ]
According to the execution plan, T.LOT_ID is joined with L.LOT_ID. That means T.LOT_ID gets its values from the join. So, the index is accessed by the following columns:
  • ITEM_ID
  • LOT_ID
  • WHSE_CODE

That's why the access predicates consist of these columns. T.LOCATION<>'NONE' is not included in the access predicates, because <> cannot be used to access an index.

After the index is accessed by the access predicates, the filter predicates are applied in order to eliminate rows in the index without going to the table. Additionally, T.LOCATION<>'NONE' is used in the filter predicates to filter index keys in the index.

Comment by ubTools Support [ 15/Sep/07 12:59 PM ]
Note: Since there are no NULLs or histograms on the IC_TRAN_PNDI1 index columns and all predicates are ANDed, other situations for selectivity computations are not covered here.

Selectivity of access predicates:

Column     Operation  Formula     Value
ITEM_ID    =          1/NDV=DENS  2.2090e-04
LOT_ID     =          1/NDV=DENS  7.4388e-05
WHSE_CODE  =          1/NDV=DENS  9.5238e-03

Since the columns are ANDed, combined selectivity means:

= Sel(ITEM_ID)*Sel(LOT_ID)*Sel(WHSE_CODE)
= 2.2090e-04*7.4388e-05*9.5238e-03
= 1.5649e-10

Selectivity of filter predicates:

After accessing the index, the filter operation starts. In our case, the access predicates are also used in the filter operation to eliminate rows in the index, because their values are known.

However, their selectivity is not re-computed, since it was already computed for accessing the index. So, T.LOT_ID>0 in the filter predicates does not change the selectivity, even though its operation is not an equality as in the access predicates.

Column     Operation  Formula         Value
LOCATION   <>         1-(1/NDV=DENS)  1-3.2258e-02=0.967742

Since the columns are ANDed, combined selectivity means:

= Sel(ITEM_ID)*Sel(LOT_ID)*Sel(WHSE_CODE)*Sel(LOCATION)
= 2.2090e-04*7.4388e-05*9.5238e-03*0.967742
= 1.5144e-10
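
The arithmetic above can be re-run from the NDV values in the column statistics. This is a minimal sketch that assumes, as in the note above, no NULLs, no histograms and ANDed predicates, so that an equality has selectivity 1/NDV and an inequality has 1-(1/NDV). The variable and function names are hypothetical.

```python
from math import prod

# NDV values copied from the SINGLE TABLE ACCESS PATH section of the trace.
NDV = {"ITEM_ID": 4527, "LOT_ID": 13443, "WHSE_CODE": 105, "LOCATION": 31}


def sel_equal(column):
    """Selectivity of column = value without histograms or NULLs: 1/NDV."""
    return 1.0 / NDV[column]


def sel_not_equal(column):
    """Selectivity of column <> value: 1 - 1/NDV."""
    return 1.0 - 1.0 / NDV[column]


# Access predicates: the three equality columns, ANDed together.
ix_sel = prod(sel_equal(c) for c in ("ITEM_ID", "LOT_ID", "WHSE_CODE"))

# Access + filter predicates: multiply by the LOCATION <> 'NONE' selectivity.
tb_sel = ix_sel * sel_not_equal("LOCATION")

print(f"access predicates:          {ix_sel:.4e}")
print(f"access + filter predicates: {tb_sel:.4e}")
```

Because the script divides by NDV directly instead of multiplying the already rounded DENS values, its results can differ from the hand calculation in the last printed digit.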

Comment by ubTools Support [ 15/Sep/07 01:33 PM ]
The Event 10053 trace file needs to be interpreted to see whether the optimizer's computation matches ours.

Final cost at the bottom:

Final - All Rows Plan:
  JOIN ORDER: 1
  CST: 29  CDN: 2  RSC: 28  RSP: 28  BYTES: 308
  IO-RSC: 28  IO-RSP: 28  CPU-RSC: 0  CPU-RSP: 0

The final cost is 29.

Going backward through the trace to break down the final cost of 29:

 
BASE STATISTICAL INFORMATION
***********************
Table stats    Table: IC_ITEM_INV_V_SIL   Alias:  X
  TOTAL ::  (NOT ANALYZED)    CDN: 0  NBLKS:  0  AVG_ROW_LEN:  0
_OPTIMIZER_PERCENT_PARALLEL = 0
  BEST_CST: 13.00  PATH: 2  Degree:  1

The cost of IC_ITEM_INV_V_SIL is 13.

 
GENERAL PLANS
***********************
Join order[1]:  IC_ITEM_INV_V_SIL[X]#0
GROUP BY sort
GROUP BY cardinality:  1, TABLE cardinality:  2
HAVING selectivity:  5.0000e-02  -> GROUPS:  1
    SORT resource      Sort statistics
      Sort width:          299 Area size:     1048576 Max Area size:   104857600   Degree: 1
      Blocks to Sort:        1 Row size:          180 Rows:          2
      Initial runs:          1 Merge passes:        1 IO Cost / pass:         30
      Total IO sort cost: 16
      Total CPU sort cost: 0
      Total Temp space used: 0
Best so far: TABLE#: 0  CST:         29  CDN:          2  BYTES:        308
    SORT resource      Sort statistics
      Sort width:          299 Area size:     1048576 Max Area size:   104857600   Degree: 1
      Blocks to Sort:        1 Row size:          180 Rows:          2
      Initial runs:          1 Merge passes:        1 IO Cost / pass:         30
      Total IO sort cost: 16
      Total CPU sort cost: 0
      Total Temp space used: 0
..

The cost of sorting IC_ITEM_INV_V_SIL is 16.

Total cost (29) = Accessing IC_ITEM_INV_V_SIL (13) + Sorting IC_ITEM_INV_V_SIL (16)

Going backward through the trace to break down the cost of 13:

Join result: cost: 7  cdn: 1  rcz: 98
Best so far: TABLE#: 0  CST:          1  CDN:          1  BYTES:          4
Best so far: TABLE#: 1  CST:          1  CDN:          1  BYTES:         11
Best so far: TABLE#: 3  CST:          3  CDN:          1  BYTES:         65
Best so far: TABLE#: 2  CST:          7  CDN:          1  BYTES:         98
Final - All Rows Plan:
  JOIN ORDER: 2
  CST: 7  CDN: 1  RSC: 7  RSP: 7  BYTES: 98
  IO-RSC: 7  IO-RSP: 7  CPU-RSC: 0  CPU-RSP: 0

JOIN ORDER: 2 is selected with the cost of 7.

Going backward through the trace to break down the cost of 7:

Join order[2]:  IC_ITEM_MST_B[B]#0  IC_ITEM_MST_TL[T]#1  IC_LOTS_MST[L]#3  IC_TRAN_PND[T]#2
...
Now joining: IC_TRAN_PND[T]#2 *******
NL Join
  Outer table: cost: 3  cdn: 1  rcz: 65  resp:  3
  Access path: index (scan)
      Index: IC_TRAN_PNDI1
  TABLE: IC_TRAN_PND
      RSC_CPU: 0   RSC_IO: 4
  IX_SEL:  1.5650e-10  TB_SEL:  1.5144e-10
    Join:  resc: 7  resp: 7
  Best NL cost: 7  resp: 7

Our index IC_TRAN_PNDI1 appears here, so this is the stopping point for our case.

The following table compares the selectivity values computed by Oracle with the manually computed values.

                                   Oracle      Manual
IX_SEL (Access predicates)         1.5650e-10  1.5649e-10
TB_SEL (Access+Filter predicates)  1.5144e-10  1.5144e-10

No computation errors found.
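
The comparison can also be written as a tolerance check. The following is a minimal Python sketch, not part of the original analysis: the relative tolerance of 1e-4 is an assumption (the trace prints five significant digits), and the function name is hypothetical.

```python
import math

# Selectivity values copied from the comparison table.
oracle = {"IX_SEL": 1.5650e-10, "TB_SEL": 1.5144e-10}
manual = {"IX_SEL": 1.5649e-10, "TB_SEL": 1.5144e-10}


def selectivities_agree(oracle_value, manual_value, rel_tol=1e-4):
    """Return True when both values match within the assumed relative tolerance."""
    return math.isclose(oracle_value, manual_value, rel_tol=rel_tol)


# Compare each selectivity that appears in both columns.
results = {name: selectivities_agree(oracle[name], manual[name]) for name in oracle}
print(results)
```

A difference in the last printed digit, as with IX_SEL, stays inside this tolerance and is treated as rounding.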

Comment by ubTools Support [ 30/Sep/15 02:32 PM ]
The calculations above are based on the book "Cost-Based Oracle Fundamentals" by Jonathan Lewis.




[QA-30] Memory leak on MMNL background process. Created: 16/Jul/07  Updated: 18/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Administration Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 10.1.0.3.0
Operating System: Solaris
Operating System Version: 5.10

 Description   

Problem:

The MMNL background process keeps growing in size until the server crashes.

Analysis:

bash-3.00$ ps -ef|grep mmnl
oracle 2250 1 0 Jun 28 ? 12:03 ora_mmnl_bgw
oracle 21397 20996 0 13:31:42 pts/5 0:00 grep mmnl

SQL> select s.sid, n.name,s.value
from v$sesstat s , v$statname n
where s.statistic# = n.statistic#
and n.name like '%memory%'
and s.sid in
(select se.sid from v$session se, v$process pr
where se.paddr=pr.addr and pr.spid=2250)
order by value desc;

SID NAME VALUE
---------- ---------------------------------------------------------------- ----------
1646 session pga memory 463496
1646 session pga memory max 463496
1646 session uga memory 88640
1646 session uga memory max 88640
1646 workarea memory allocated 0
1646 sorts (memory) 0

6 rows selected.

SQL>

bash-3.00$ pmap -x 2250
2250: ora_mmnl_bgw
Address Kbytes RSS Anon Locked Mode Mapped File
0000000100000000 81016 78904 - - r-x-- oracle
000000010501C000 856 592 112 - rwx-- oracle
00000001050F2000 3128 1352 64 - rwx-- [ heap ]
0000000105400000 4190208 1255504 4096 - rwx-- [ heap ]
0000000205000000 3731456 2145424 1138688 - rwx-- [ heap ]

0000000380000000 253952 253952 - 253952 rwxsR [ ism shmid=0xc ]
0000040000000000 290816 290816 - 290816 rwxsR [ ism shmid=0xd ]
0000040040000000 290816 290816 - 290816 rwxsR [ ism shmid=0xe ]
0000040080000000 16 16 - 16 rwxsR [ ism shmid=0xf ]

FFFFFFFF7B500000 64 24 - - rwx-- [ anon ]
FFFFFFFF7B530000 128 16 - - rw--- [ anon ]
FFFFFFFF7B600000 8 - - - rw-s- dev:291,0 ino:240652
FFFFFFFF7B750000 64 24 16 - rw--- [ anon ]
FFFFFFFF7B760000 64 24 24 - rw--- [ anon ]
FFFFFFFF7B770000 64 56 48 - rw--- [ anon ]
FFFFFFFF7B800000 16 16 - - r-x-- liblgrp.so.1
FFFFFFFF7B904000 8 8 - - rwx-- liblgrp.so.1
FFFFFFFF7BA78000 8 8 - - rwxs- [ anon ]
FFFFFFFF7BB00000 8 8 - - r-x-- libc_psr.so.1
FFFFFFFF7BC00000 8 8 8 - rwx-- [ anon ]
FFFFFFFF7BD00000 8 8 - - r-x-- libmd5.so.1
FFFFFFFF7BE02000 8 8 - - rwx-- libmd5.so.1
FFFFFFFF7BF00000 8 8 8 - rwx-- [ anon ]
FFFFFFFF7C000000 640 168 - - r-x-- libm.so.2
FFFFFFFF7C19E000 40 24 8 - rwx-- libm.so.2
FFFFFFFF7C200000 8 8 - - r-x-- libkstat.so.1
FFFFFFFF7C302000 8 8 8 - rwx-- libkstat.so.1
FFFFFFFF7C400000 32 24 - - r-x-- librt.so.1
FFFFFFFF7C508000 8 8 - - rwx-- librt.so.1
FFFFFFFF7C600000 32 32 - - r-x-- libaio.so.1
FFFFFFFF7C708000 8 8 - - rwx-- libaio.so.1
FFFFFFFF7C800000 8 8 8 - rwx-- [ anon ]
FFFFFFFF7C900000 912 656 - - r-x-- libc.so.1
FFFFFFFF7CAE4000 64 64 64 - rwx-- libc.so.1
FFFFFFFF7CAF4000 8 - - - rwx-- libc.so.1
FFFFFFFF7CB00000 24 16 16 - rwx-- [ anon ]
FFFFFFFF7CC00000 32 16 - - r-x-- libgen.so.1
FFFFFFFF7CD08000 8 8 - - rwx-- libgen.so.1
FFFFFFFF7CE00000 56 32 - - r-x-- libsocket.so.1
FFFFFFFF7CF0E000 16 16 - - rwx-- libsocket.so.1
FFFFFFFF7D000000 688 248 - - r-x-- libnsl.so.1
FFFFFFFF7D1AC000 64 64 - - rwx-- libnsl.so.1
FFFFFFFF7D1BC000 32 8 - - rwx-- libnsl.so.1
FFFFFFFF7D200000 1912 320 - - r-x-- libnnz10.so
FFFFFFFF7D4DC000 632 232 - - rwx-- libnnz10.so
FFFFFFFF7D57A000 8 - - - rwx-- libnnz10.so
FFFFFFFF7D600000 40 16 - - r-x-- libdbcfg10.so
FFFFFFFF7D708000 8 8 - - rwx-- libdbcfg10.so
FFFFFFFF7D800000 8488 8200 - - r-x-- libjox10.so
FFFFFFFF7E148000 536 480 - - rwx-- libjox10.so
FFFFFFFF7E200000 8 8 8 - rwx-- [ anon ]
FFFFFFFF7E300000 16 16 - - r-x-- libocrutl10.so
FFFFFFFF7E402000 16 16 - - rwx-- libocrutl10.so
FFFFFFFF7E500000 8 8 8 - rwx-- [ anon ]
FFFFFFFF7E600000 144 40 - - r-x-- libocrb10.so
FFFFFFFF7E722000 8 8 - - rwx-- libocrb10.so
FFFFFFFF7E800000 200 72 - - r-x-- libocr10.so
FFFFFFFF7E930000 16 16 - - rwx-- libocr10.so
FFFFFFFF7EA00000 8 8 - - r-x-- libskgxn2.so
FFFFFFFF7EB00000 8 8 - - rwx-- libskgxn2.so
FFFFFFFF7EC00000 1480 352 - - r-x-- libhasgen10.so
FFFFFFFF7EE70000 72 56 - - rwx-- libhasgen10.so
FFFFFFFF7EE82000 8 - - - rwx-- libhasgen10.so
FFFFFFFF7EF00000 8 8 8 - rwx-- [ anon ]
FFFFFFFF7F000000 8 8 - - r-x-- libskgxp10.so
FFFFFFFF7F100000 8 8 - - rwx-- libskgxp10.so
FFFFFFFF7F200000 8 8 - - r-x-- libodmd10.so
FFFFFFFF7F300000 8 8 - - rwx-- libodmd10.so
FFFFFFFF7F400000 8 8 - - r-x-- libdl.so.1
FFFFFFFF7F500000 8 8 8 - rwx-- [ anon ]
FFFFFFFF7F600000 176 176 - - r-x-- ld.so.1
FFFFFFFF7F72C000 16 16 8 - rwx-- ld.so.1
FFFFFFFF7FFF0000 64 48 24 - rw--- [ stack ]
---------------- ---------- ---------- ---------- ----------
total Kb 8859344 4329168 1143232 835600
bash-3.00$

bash-3.00$ truss -p 2250

open("/dev/kstat", O_RDONLY) = 58843
ioctl(58843, KSTAT_IOC_CHAIN_ID, 0x00000000) = 755
ioctl(58843, KSTAT_IOC_READ, "kstat_headers") Err#12 ENOMEM
brk(0x2EBC064D0) = 0
brk(0x2EBC264D0) = 0
ioctl(58843, KSTAT_IOC_READ, "kstat_headers") = 755
brk(0x2EBC264D0) = 0
brk(0x2EBC2A4D0) = 0
ioctl(58843, KSTAT_IOC_READ, "cpu_info0") = 755
ioctl(58843, KSTAT_IOC_READ, "cpu_info1") = 755
ioctl(58843, KSTAT_IOC_READ, "cpu_info2") = 755
ioctl(58843, KSTAT_IOC_READ, "cpu_info3") = 755
ioctl(58843, KSTAT_IOC_READ, "cpu_info8") = 755
ioctl(58843, KSTAT_IOC_READ, "cpu_info9") = 755
ioctl(58843, KSTAT_IOC_READ, "cpu_info10") = 755
ioctl(58843, KSTAT_IOC_READ, "cpu_info11") = 755
ioctl(58843, KSTAT_IOC_READ, "cpu_info512") = 755
pset_bind(PS_QUERY, P_PID, -1, 0xFFFFFFFF7FFFD5AC) = 0
open("/dev/kstat", O_RDONLY) = 58844



 Comments   
Comment by ubTools Support [ 16/Jul/07 05:47 AM ]

Cause:

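The session PGA memory is only 463496 bytes. At the OS level, however, the resident size of the process is far higher, even after the shared memory (ISM) segments are subtracted:

4329168 - (253952+290816+290816+16) = 3493568 (KB)

3493568 KB (about 3.3 GB) is far too high for this process. The pmap output shows that the bulk of this memory is allocated in the [ heap ] segments.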
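
The subtraction can be reproduced with a few lines of Python. This is only an illustrative sketch: the figures are copied by hand from the pmap -x output (total RSS and the four ISM segments), and the function name is hypothetical.

```python
# Values in KB, copied from the pmap -x output of process 2250.
TOTAL_RSS_KB = 4329168
ISM_SEGMENTS_KB = [253952, 290816, 290816, 16]  # shared memory (ISM) segments
SESSION_PGA_BYTES = 463496  # "session pga memory" from v$sesstat


def private_rss_kb(total_kb, shared_segments_kb):
    """Resident size left after removing the shared memory segments."""
    return total_kb - sum(shared_segments_kb)


private_kb = private_rss_kb(TOTAL_RSS_KB, ISM_SEGMENTS_KB)
print("private RSS:", private_kb, "KB")
print("private RSS in GB:", round(private_kb / 1024 / 1024, 2))
print("session PGA in KB:", round(SESSION_PGA_BYTES / 1024, 1))
```

Comparing the last two figures makes the mismatch between what Oracle accounts for and what the OS reports easy to see.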

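Oracle opens /dev/kstat to read operating system kernel statistics, but does not close it before the next open. The truss output shows this: consecutive open() calls return increasing file descriptors (58843, then 58844) with no close() in between. There should be one close() system call for each open() call.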

Bug:

ORACLE BUG: 3701351.

Base Bug#3559340 includes a fix for Oracle 10.1.0.3.

Comment by ubTools Support [ 02/Aug/07 12:50 PM ]
Base Bug#3559340 is fixed in Oracle 10.1.0.4.




[QA-29] ORA-600 [2845] while selecting, Invalid ROWID. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 7.3.4.5.0
Operating System: TRU64

 Description   
ORA-600 [2845] while selecting, Invalid ROWID.

 Comments   
Comment by ubTools Support [ 15/Jul/07 06:14 PM ]

Error code:

ORA-00600: internal error code, arguments: [2845], [0], [30], [0], [], [], [], []

Error code definition:

Oracle is reading a range of blocks from a database file.
If the starting block number or file number is 0, or the file number is greater than can be accommodated in the SGA (DB_FILES), error ORA-600 [2845] is raised.

Ref: Metalink Note: 31057.1 ORA-600 [2845] "Read of bad DBA Requested"

Cursor dump:

********************   Cursor Dump   ************************
Current cursor: 30, pgadep: 0
  pgactx: 14a415b8 ctxcbk: 0 ctxqbc: 14a41990 qbcrws: 14a40ef0
Cursor Dump:
----------------------------------------
...
----------------------------------------
Cursor 30 (1400ea8e0): CURFETCH  curiob: 14019be40
curflg: 46 curpar: 0 curusr: c curses 8a2f38
cursor name: SELECT "NOTE" FROM "MEDIX"."PAT_SES" WHERE "ROWID"=:1
child pin: 5485b30, child lock: 54f2960, parent lock: 54c5f68
xscflg: 80110676, parent handle: 14a46070
 nxt: 3.0x00000018  nxt: 2.0x000007d8  nxt: 1.0x000004e0
Cursor frame allocation dump:
frm: -------- Comment --------  Size  Seg Off
bind 0: dty=1 mxl=32(18) mal=00 scl=00 pre=00 oacflg=01
  bfp=140192748 bln=18 avl=18 flg=05
  value="00000000.0000.0000"
----------------------------------------
...

Problem explanation:

As seen in the cursor dump above, the current cursor number is 30. Cursor 30 has a bind variable that holds a ROWID, but the value of this bind variable is "00000000.0000.0000". In Oracle7, this ROWID points to block# 0, slot# 0, file# 0, which is invalid.
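
To make the decoding explicit, here is a minimal Python sketch that splits an Oracle7 ROWID string into its parts. It assumes the three dot-separated fields are hexadecimal and ordered block, slot, file; the class and function names are hypothetical.

```python
from typing import NamedTuple


class Oracle7Rowid(NamedTuple):
    block: int
    slot: int
    file: int


def parse_oracle7_rowid(text: str) -> Oracle7Rowid:
    """Split a 'block.slot.file' ROWID string into integers (fields assumed hex)."""
    block_hex, slot_hex, file_hex = text.split(".")
    return Oracle7Rowid(int(block_hex, 16), int(slot_hex, 16), int(file_hex, 16))


# Bind value taken from the cursor dump.
rowid = parse_oracle7_rowid("00000000.0000.0000")
print(rowid)

# Block# 0 or file# 0 is the condition described in the error code definition.
print("suspicious:", rowid.block == 0 or rowid.file == 0)
```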

Recommendation:

  • Check the application for any incorrect ROWID usage.
  • Call Oracle Support.




[QA-28] ORA-00600 [729]: UGA memory leak. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 8.1.7.3.0
Operating System: Windows
Operating System Version: 2000

 Description   
ORA-00600 [729]: UGA memory leak.

 Comments   
Comment by ubTools Support [ 15/Jul/07 06:04 PM ]

Error code:

ORA-00600: internal error code, arguments: [729], [480], [space leak], [], [], [], [], []

Error code definition:

A space leak has been detected in the User Global Area (UGA). There is no data corruption as a result of this error. It is an internal memory housekeeping problem. Second argument is the number of bytes leaked.

UGA Heap dump:

******** ERROR: UGA memory leak detected 480 ********
******************************************************
HEAP DUMP heap name="session heap"  desc=0x222bd6f4
extent sz=0x108c alt=32767 het=32767 rec=0 flg=3 opc=3
parent=212550 owner=ad83d50 nex=0 xsz=0x108c
EXTENT 0
 Chunk 2330b100 sz=     3844    free      "               "
EXTENT 1
 Chunk 232f5174 sz=      516    free      "               "
EXTENT 2
 Chunk 236f0050 sz=     4176    free      "               "
EXTENT 3
 Chunk 236d0050 sz=     1228    free      "               "
EXTENT 4
 Chunk 236f18e4 sz=     1280    free      "               "
EXTENT 5
 Chunk 23307098 sz=     4228    free      "               "
EXTENT 6
 Chunk 2330a27c sz=     3696    free      "               "
EXTENT 7
 Chunk 23308130 sz=     1008    free      "               "
 Chunk 23308520 sz=      480    freeable  "define var info"
 Chunk 23308700 sz=     2740    free      "               "
EXTENT 8
 Chunk 23306214 sz=     2832    perm      "perm           "  alo=2832
 Chunk 23306d24 sz=      864    free      "               "
EXTENT 9
 Chunk 233091c8 sz=     4228    free      "               "
EXTENT 10
 Chunk 232f405c sz=      612    free      "               "
EXTENT 11
...

Problem explanation:

As seen in the UGA heap dump, EXTENT 7 contains a freeable chunk of 480 bytes tagged "define var info". Its size matches the second argument of the error (480), so this chunk appears to be the leaked one.
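
Finding the freeable chunks by eye becomes tedious in a long heap dump. The following Python sketch sums the sizes of freeable chunks per tag; the regular expression is an assumption based only on the chunk lines shown in this dump, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:  Chunk 23308520 sz=      480    freeable  "define var info"
CHUNK_RE = re.compile(r'Chunk\s+([0-9a-f]+)\s+sz=\s*(\d+)\s+(\w+)\s+"([^"]*)"')


def freeable_by_tag(dump_text):
    """Return {tag: total bytes} for all chunks marked 'freeable'."""
    totals = defaultdict(int)
    for _addr, size, kind, tag in CHUNK_RE.findall(dump_text):
        if kind == "freeable":
            totals[tag.strip()] += int(size)
    return dict(totals)


# A few chunk lines copied from EXTENT 7 and EXTENT 8 of the heap dump.
sample = '''
 Chunk 23308130 sz=     1008    free      "               "
 Chunk 23308520 sz=      480    freeable  "define var info"
 Chunk 23308700 sz=     2740    free      "               "
 Chunk 23306214 sz=     2832    perm      "perm           "  alo=2832
'''

totals = freeable_by_tag(sample)
print(totals)
```

The total for a tag can then be compared with the byte count reported in the second argument of the ORA-600 [729] error.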

Workaround:

This error does not involve data corruption, and small memory leaks can be safely ignored by adding the following event to init.ora:

  • event = "10262 trace name context forever, level 500"

Then, restart your database. This event disables the reporting of space leaks smaller than 500 bytes.

You can see the details at Metalink Note:31056.1 ORA-600 [729] "UGA Space Leak"

Bug:

Bug:2177050: ORA-600 [729] after application of the 8.1.7.3 patchset. The resulting trace file will include a memory dump which shows unfreed memory chunks with the tags "define var info" and/or "oactoid info".
Ref: Metalink Note:31056.1 ORA-600 [729] "UGA Space Leak"





[QA-27] ORA-00600 [kcbgcur_1] by PQ operation. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 8.1.6.1.0
Operating System: Linux
Operating System Version: 2.2.14-5.0

 Description   
ORA-00600 [kcbgcur_1] by PQ operation.

 Comments   
Comment by ubTools Support [ 15/Jul/07 05:59 PM ]

Error code:

ORA-00600: internal error code, arguments: [kcbgcur_1], [], [], [], [], [], [], []

Oracle kernel function from which the problem is raised:

kcbgcur().

This function belongs to the Oracle Cache Layer.

Undo block dump:

UNDO BLK:  
xid: 0x0005.05e.000000c4  seq: 0x8c  cnt: 0x31  irb: 0x19  icl: 0x0   flg: 0x0000

Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x1f38     0x02 0x1e88     0x03 0x1de4     0x04 0x1d3c     0x05 0x1c94    
0x06 0x1bf4     0x07 0x1b54     0x08 0x1ac4     0x09 0x1a20     0x0a 0x1978    
0x0b 0x18d4     0x0c 0x1820     0x0d 0x1784     0x0e 0x16e0     0x0f 0x1638    
0x10 0x1598     0x11 0x14e8     0x12 0x1448     0x13 0x13a4     0x14 0x1308    
0x15 0x126c     0x16 0x11d0     0x17 0x112c     0x18 0x1084     0x19 0x0fe0    
0x1a 0x0f0c     0x1b 0x0e60     0x1c 0x0db8     0x1d 0x0d28     0x1e 0x0c90    
0x1f 0x0bf0     0x20 0x0b28     0x21 0x0a88     0x22 0x09ec     0x23 0x0950    
0x24 0x08ac     0x25 0x0814     0x26 0x077c     0x27 0x06e4     0x28 0x0650    
0x29 0x05b4     0x2a 0x0524     0x2b 0x0480     0x2c 0x03f4     0x2d 0x035c    
0x2e 0x02c0     0x2f 0x0230     0x30 0x01a0     0x31 0x0108
...
*-----------------------------
* Rec #0x19  slt: 0x5e  objn: 0(0x00000000)  objd: 0  tblspc: 0(0x00000000)
*       Layer:  11 (Row)   opc: 1   rci 0x18  
Undo type:  Regular undo   Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
rdba: 0x00000000
*-----------------------------
KDO undo record:
KTB Redo
op: 0x02  ver: 0x01  
op: C  uba: 0x00c0083d.008c.18
KDO Op code: IRP  xtype: XA  bdba: 0x0040760a  hdba: 0x004075d9
itli: 1  ispac: 0  maxfr: 4863
tabn: 0 slot: 130(0x82) size/delt: 56
fb: --H-FL-- lb: 0x0 cc: 4
null: ----
col  0: [ 3]  37 34 34
col  1: [20]  45 6c 65 63 74 72 6f 6e 69 63 20 73 74 72 75 63 74 75 72 65
col  2: [ 0]
col  3: [ 0]
*-----------------------------
...

Problem explanation:

irb points to the undo record in the undo block at which rollback begins. Here irb is 0x19, so record 0x19 is the first undo record to be applied. In that record, the object number (objn) and the data object number (objd) are both 0. This may be the problem: Oracle may not be able to determine the real object number during this rollback.

Bug:

It looks like:

  • Bug:984947 A PARALLEL QUERY SLAVE GOT ORA-600[KCBGCUR_1]




[QA-26] ORA-00600 [12700] by SNP process. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 8.0.5.2.1
Operating System: Windows

 Description   
ORA-00600 [12700] by SNP process.

 Comments   
Comment by ubTools Support [ 15/Jul/07 05:54 PM ]

Error code:

ORA-00600: internal error code, arguments: [12700], [62], [4202128], [133], [], [], [], []

Current SQL statement for this session:

SELECT source from source$ WHERE obj# =:1 ORDER BY line

Oracle kernel function from which the problem is raised:

rtbhiopn().

Error code definition:

Oracle is trying to access a row using its ROWID, which has been obtained from an index.

A mismatch was found between the index rowid and the data block it is pointing to.
The rowid points to a non-existent row in the data block.
The corruption can be in data and/or index blocks.

ORA-600 [12700] can also be reported due to a consistent read (CR) problem.

The information dumped to the trace file varies greatly between releases:

- in Oracle 7.3.x it is ORA-600 [12700][a1][a2] , where
 Arg [a1] dba (Data Block Address)
 Arg [a2] slot number (number of the row in the block pointed by the dba)

- in Oracle 8.x and 9.x, it is ORA-600 [12700][a1][a2][a3] , where
 Arg [a1] dataobj# from sys.obj$
 Arg [a2] relative dba of the data block
 Arg [a3] slot number of the row in the data block  

Details: Metalink Note:28229.1 ORA-600 [12700] "Index entry Points to Missing ROWID"

Error code interpretation:

Argument             Dec           Hex
----------    ----------    ----------
[62]                  62          0x3E
[4202128]        4202128      0x401E90
[133]                133          0x85

The problem concerns slot# 133 of rdba 4202128 (0x401E90) in object# 62.

Index block dump:

Block header dump: rdba: 0x00401ede
Object id on Block? Y
seg/obj: 0x63  csc: 0x00.2fbe43  itc: 2  flg: -  typ: 2 - INDEX
    fsl: 0  fnx: 0x0 ver: 0x01

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0012.01a.00000130  0x008027fb.000e.02  C---    0  scn 0x0000.0000c71c
0x02   0x0002.013.00000768  0x00807cf2.4f11.08  --U-  217  fsc 0x0000.002fbe45

Leaf block dump
===============
header address 74698844=0x473d05c
kdxcolev 0
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 217
kdxcofbo 470=0x1d6
kdxcofeo 471=0x1d7
kdxcoavs 1
kdxlespl 0
kdxlende 0
kdxlenxt 4202207=0x401edf
kdxleprv 4203488=0x4023e0
kdxledsz 6
kdxlecol 0
kdxlebksz 3940
row#0[471] flag: ----, lock: 2, data:(6):  00 40 1e 93 00 5b
col 0; len 3; (3):  c2 27 11
col 1; len 3; (3):  c2 02 62
...
row#213[3876] flag: ----, lock: 2, data:(6):  00 40 1e 90 00 85
col 0; len 3; (3):  c2 27 11
col 1; len 3; (3):  c2 05 0b
row#214[3892] flag: ----, lock: 2, data:(6):  00 40 1e 90 00 86
col 0; len 3; (3):  c2 27 11
col 1; len 3; (3):  c2 05 0c
...

Data block dump:

Block header dump: rdba: 0x00401e90
Object id on Block? Y
seg/obj: 0x3e  csc: 0x00.2fbe43  itc: 1  flg: -  typ: 1 - DATA
    fsl: 0  fnx: 0x0 ver: 0x01

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0002.013.00000768  0x00807cf2.4f11.0b  --U-  139  fsc 0x009b.002fbe45

data_block_dump
===============
tsiz: 0xfb8
hsiz: 0x128
pbl: 0x04b43044
bdba: 0x00401e90
flag=---------
ntab=1
nrow=139
frre=-1
fsbo=0x128
fseo=0x2d4
avsp=0x111
tosp=0x23a
0xe:pti[0] nrow=139 offs=0
0x12:pri[0] offs=0xfb6
0x14:pri[1] offs=0xfb4
0x16:pri[2] offs=0xfb2
.
0x11a:pri[132] offs=0x382
0x11c:pri[133] sfll=0
0x11e:pri[134] sfll=0
0x120:pri[135] sfll=0
0x122:pri[136] sfll=0
0x124:pri[137] sfll=0
0x126:pri[138] sfll=0
block_row_dump:
tab 0, row 0, @0xfb6
tl: 2 fb: --HDFL-- lb: 0x1
tab 0, row 1, @0xfb4
tl: 2 fb: --HDFL-- lb: 0x1
.
tab 0, row 132, @0x382
tl: 42 fb: --H-FL-- lb: 0x1 cc: 3
col  0: [ 3]  c2 27 11
col  1: [ 3]  c2 05 0a
col  2: [30]
09 09 09 09 09 09 6c 5f 6e 65 78 74 48 6f 6c 64 44 65 73 69 72 65 4e 75 6d
62 65 72 2c 0a
end_of_block_dump

Problem explanation:

As seen in the index block dump, kdxledsz is 6. That means this index is a unique B*Tree index that stores each ROWID in the 6-byte restricted format. The first 4 bytes hold the rdba, and the last 2 bytes hold the slot#.

The internal error returned 0x401E90 for the rdba and 0x85 for the slot#. The restricted ROWID stored in the index must be the combination of the two, that is 0x00401E900085. This value is present in the index dump, in row#213 (data: 00 40 1e 90 00 85).
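
The combination of rdba and slot# can be checked with a short Python sketch. It assumes the layout described above (4 bytes of rdba followed by 2 bytes of slot#, most significant byte first); the function names are hypothetical.

```python
def restricted_rowid_hex(rdba: int, slot: int) -> str:
    """Build the 6-byte restricted ROWID as a hex string: 4-byte rdba + 2-byte slot#."""
    return rdba.to_bytes(4, "big").hex() + slot.to_bytes(2, "big").hex()


def split_index_rowid(data_hex: str):
    """Split the 'data:(6)' bytes of an index row into (rdba, slot#)."""
    raw = bytes.fromhex(data_hex.replace(" ", ""))
    if len(raw) != 6:
        raise ValueError("expected exactly 6 bytes")
    return int.from_bytes(raw[:4], "big"), int.from_bytes(raw[4:], "big")


# Arguments [4202128] and [133] from the ORA-600 [12700] error.
expected = restricted_rowid_hex(4202128, 133)
print(expected)

# Data bytes of row#213 from the index block dump.
rdba, slot = split_index_rowid("00 40 1e 90 00 85")
print(hex(rdba), slot)
```

Both directions give the same result, which confirms that index row#213 is the entry that points to the missing row.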

The pri[] entries list the row slots of the data block. In this error, the returned slot# is 133. However, as seen in the data block dump, no row is allocated for this slot: pri[133] through pri[138] have no row offset, and the highest slot# with a row in the block dump is 132.

Although there is a value in the index block, there is no matching row in the data block. The data block looks corrupted.

Workaround:

Most probably, object# 62 is source$, the table read by the current SQL statement. Restore the SYSTEM tablespace from backup and recover it.





[QA-25] ORA-00600 [kkslgop1] in SELECT when CURSOR_SHARING IS NOT EXACT. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 8.1.7.2.0
Operating System: IBM-AIX

 Description   
ORA-00600 [kkslgop1] in SELECT when CURSOR_SHARING IS NOT EXACT.

 Comments   
Comment by ubTools Support [ 15/Jul/07 05:42 PM ]

Error code:

ORA-00600: internal error code, arguments: [kkslgop1], [], [], [], [], [], [], []

Current SQL statement for this session:

SELECT COMP_TIME FROM CSTMAPSTATUS  WHERE CSTID = :"SYS_B_0" AND  SLOTNO = :"SYS_B_1"

Oracle kernel function from which the problem is raised:

kkslgop().

This is a function of Oracle Compilation Layer.

Process state:

PROCESS STATE
-------------
...
   ----------------------------------------
   SO: 404c6264, type: 3, owner: 403cde98, pt: 0, flag: INIT/-/-/0x00
   (session) trans: 40e83928, creator: 403cde98, flag: (8000041) USR/- BSY/-/-/-/-/-
             DID: 0001-0014-00000002, short-term DID: 0000-0000-00000000
             txn branch: 40f8201c
             oct: 3, prv: 0, user: 24/APPMGR
   O/S info: user: Administrator, term: CIMMB, ospid: 219:228, machine: PDP1_MES_DOM\CIMMB
             program: TIME_GAP.exe
   last wait for 'SQL*Net message from dblink' blocking sess=0x0 seq=60687 wait_time=-1
               driver id=54435000, #bytes=1, =0
     ----------------------------------------
...

Problem explanation:

As you see in your SQL statement, the bind variables (:"SYS_B_0" and :"SYS_B_1") are system-generated. In other words, cursor sharing is enabled in your database (CURSOR_SHARING is not EXACT).

Also, as seen in the process state, your last wait event is SQL*Net message from dblink. That means a database link operation was performed before the error.
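
System-generated binds can be spotted mechanically in a trace. The following Python sketch assumes only the :"SYS_B_n" naming seen in the statement above; the function name is hypothetical.

```python
import re

# System-generated binds appear as :"SYS_B_<number>" in the SQL text.
SYS_BIND_RE = re.compile(r':"SYS_B_(\d+)"')


def system_generated_binds(sql_text):
    """Return the names of system-generated bind variables found in the SQL text."""
    return ["SYS_B_" + number for number in SYS_BIND_RE.findall(sql_text)]


# Current SQL statement from the trace file.
sql = 'SELECT COMP_TIME FROM CSTMAPSTATUS  WHERE CSTID = :"SYS_B_0" AND  SLOTNO = :"SYS_B_1"'
binds = system_generated_binds(sql)
print(binds)
```

A non-empty result suggests that the literals in the original statement were replaced with binds by cursor sharing.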

Workaround:

Use cursor_sharing=exact

Bug:

  • Bug:2169897 ORA-600 ARGUMENTS: [KKSLGOP1] VIA SELECT ACROSS DB_LINK
  • Bug:2159152 CURSOR_SHARING=FORCE MAY NOT SHARE STATEMENTS USING VIEWS IN 8172/8173




[QA-24] ORA-07445 [000000010112A75C] during import. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 8.1.7.2.0
Operating System: Solaris
Operating System Version: 5.8

 Description   

Error code:

ORA-07445: exception encountered: core dump [000000010112A75C] [SIGSEGV] [Address not mapped to object] [260] [] []

Current SQL statement for this session:

CREATE PROCEDURE TableParse_Proc wrapped
0
abcd
abcd
abcd
abcd
abcd
..

Oracle kernel function from which the problem is raised:

parfs4_freelist_sort()

Process state:

PROCESS STATE
-------------
...
   ----------------------------------------
   SO: 399071ce0, type: 3, owner: 39905dc18, pt: 0, flag: INIT/-/-/0x00
   (session) trans: 399e4f3b8, creator: 39905dc18, flag: (10000041) USR/- BSY/-/-/-/-/-
             DID: 0001-0008-00000002, short-term DID: 0000-0000-00000000
             txn branch: 0
             oct: 24, prv: 0, user: 360/WINRLS
   O/S info: user: mtrxdev, term: pts/2, ospid: 8915, machine: vhdcap5g
             program: imp@vhdcap5g (TNS V1-V3)
   last wait for 'SQL*Net more data from client' blocking sess=0x0 seq=45627 wait_time=-2
               driver id=62657100, #bytes=2882, =0
     ----------------------------------------

Problem explanation:

As seen above, this problem was encountered by the import utility (imp) while creating a wrapped PL/SQL procedure.

Bug:

Several bugs describe this problem together with an additional ORA-4030 error. The base bug is below:

  • Bug:2278310 IMPORT OF WRAPPED PL/SQL PROCEDURE FAILS WITH ORA-04030





[QA-23] ORA-00600 [15851] while creating unique index. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 7.3.4.0.0
Operating System: Windows
Operating System Version: 4.0

 Description   
ORA-00600 [15851] while creating unique index.

 Comments   
Comment by ubTools Support [ 15/Jul/07 05:31 PM ]

Error code:

ORA-00600: internal error code, arguments: [15851], [8], [8], [1], [2], [], [], []

Oracle kernel function from which the problem is raised:

srsqb1nx().

Problem explanation:

Most probably, this is a sort problem that occurs while the index is being created.

Bug:

Metalink Note:1032586.6 ORA-600 [15851]





[QA-22] ORA-00600 [13004] while creating index. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 7.3.4.0.0
Operating System: Windows
Operating System Version: 4.0

 Description   
ORA-00600 [13004] while creating index.

 Comments   
Comment by ubTools Support [ 15/Jul/07 05:27 PM ]

Error code:

ORA-00600: internal error code, arguments: [13004], [], [], [], [], [], [], []

Oracle kernel function from which the problem is raised:

kkrirop().

This is a function of Oracle Compilation Layer.

Bug:

Bug:994802 CREATE INDEX RESULTS IN ORA-600 [13004]





[QA-21] ORA-07445 [11]: SMON crashed. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 8.1.7.1.0
Operating System: HP-UX
Operating System Version: B.11.00

 Description   
ORA-07445 [11]: SMON crashed.

 Comments   
Comment by ubTools Support [ 15/Jul/07 05:23 PM ]

Error code:

 
ORA-07445: exception encountered: core dump [11] [3221212616] [240] [0] [] []

Oracle kernel function from which the problem is raised:

 
kdb4_dup_keys().

This is a function of Oracle Data Layer.

Cursor dump:

 
********************   Cursor Dump   ************************
Current cursor: 1, pgadep: 1
  pgactx: c00000014e736d90 ctxcbk: c00000014e776720 ctxqbc: 0 ctxrws: c00000014e7253c8
Cursor Dump:
----------------------------------------
Cursor 1 (80000001000befe8): CURBOUND  curiob: 80000001000c1358
curflg: 5 curpar: 0 curusr: 0 curses c00000012c18a070
cursor name: delete from uet$ where ts#=:1 and segfile#=:2 and segblock#=:3 and ext#=:4
child pin: c00000013511e670, child lock: c00000013511d630, parent lock: c00000013511d6a0
xscflg: 20100466, parent handle: c00000014e748b20, xscfl2: 5100400
 nxt: 3.0x00000560  nxt: 2.0x000005e0  nxt: 1.0x000005e0
Cursor frame allocation dump:
frm: -------- Comment --------  Size  Seg Off
bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
  bfp=80000001000d2470 bln=22 avl=02 flg=05
  value=1
bind 1: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
  bfp=80000001000d2440 bln=24 avl=02 flg=05
  value=2
bind 2: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
  bfp=80000001000d2410 bln=24 avl=02 flg=05
  value=2
bind 3: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
  bfp=80000001000d23e0 bln=24 avl=02 flg=05
  value=59
End of cursor dump

Recommendation:

Check if sys.uet$ is corrupted.

Bug:

Bug:2106455 SMON CRASHES WITH ORA-07445 IN KDB4_DUP_KEYS





[QA-20] ORA-00600 [723]: Memory leak in LGWR. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 8.1.6.0.0
Operating System: HP-UX
Operating System Version: B.11.00

 Description   
ORA-00600 [723]: Memory leak in LGWR.

 Comments   
Comment by ubTools Support [ 15/Jul/07 05:17 PM ]

Error code:

 
ORA-00600: internal error code, arguments: [723], [5200], [5200], [memory leak], [], [], [], []

Oracle kernel function from which the problem is raised:

 
ksmdpg()

Deallocate variable PGA. Just free top PGA heap, the callback will free the extents to the OSD.
Ref: Bug:1283286

Process state:

 
PROCESS STATE
-------------
Process global information:
    process: 0, call: 0, xact: 0, curses: 0, usrses: 0
No process is allocated.
END OF PROCESS STATE

PGA Heap dump:

 

******** ERROR: PGA memory leak detected 5200 > 3616 ********
******************************************************
HEAP DUMP heap name="pga heap"  desc=0x40003190
extent sz=0x2148 alt=40 het=32767 rec=0 flg=3 opc=3
parent=0 owner=0 nex=0 xsz=0x2148
EXTENT 0
 Chunk 400de8e0 sz=     4432    free      "               "
 Chunk 400dfa30 sz=      256    freeable  "LGWR PIC bds ar"
 Chunk 400dfb30 sz=      896    freeable  "LGWR PIC ins ar"
 Chunk 400dfeb0 sz=      896    freeable  "LGWR PIC ins ar"
 Chunk 400e0230 sz=      568    free      "               "
 Chunk 400e0468 sz=      896    freeable  "LGWR PIC ins ar"
 Chunk 400e07e8 sz=      568    free      "               "
EXTENT 1
.

Problem explanation:

As seen above and in the rest of your trace, some chunks are tagged "LGWR PIC ins ar" or similar. They are freeable chunks, and their sizes sum to 5200 bytes, which matches the leak size reported in the error. These chunks are leaked.

Also, the process state shows that no process is allocated for LGWR. Most probably, the database was being shut down.

Workaround:

This error does not involve data corruption, and small memory leaks can be safely ignored by adding the following event to init.ora:

  • event = "10262 trace name context forever, level 6000"

Then, restart your database. This event disables the reporting of space leaks smaller than 6000 bytes.

You can see the details at Metalink Note:39308.1 ORA-600 [723] "PGA memory leak"

Bug:

Bug:1125724 ORA-600[723] DURING SHUTDOWN





[QA-19] ORA-00600 [2845] in UPDATE. WRONG ROWID VALUE. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 7.3.2.3.0
Operating System: HP-UX
Operating System Version: B.10.20

 Description   

Error code:

 
ORA-00600: internal error code, arguments: [2845], [0], [50], [39314], [], [], [], []

Current SQL statement for this session:

 
update pers_auth_str_tbl  set asgn_str=:b1 where rowid=:b2

Oracle kernel function from which the problem is raised:

 
kcfrbd()

This function belongs to Oracle's Cache Layer.

Values of bind variables:

 
:b1 = 0
:b2 = "9992"

Data types of bind variables:

 
:b1 : Number
:b2 : Varchar2

Problem explanation:

As you see, the ROWID value in :b2 is "9992". This is not a valid ROWID. The ROWID format in Oracle7 is 'BBBBBBBB.SSSS.FFFF' (Block.Slot.File).
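
An application could reject such values before they reach the database. The following Python sketch checks only the shape of the string against the 'BBBBBBBB.SSSS.FFFF' pattern; it assumes each position is a hexadecimal digit, and the function name is hypothetical.

```python
import re

# 8 hex digits (block), 4 hex digits (slot), 4 hex digits (file), separated by dots.
ORACLE7_ROWID_RE = re.compile(r"[0-9A-Fa-f]{8}\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}")


def looks_like_oracle7_rowid(value: str) -> bool:
    """Return True when the whole string has the BBBBBBBB.SSSS.FFFF shape."""
    return ORACLE7_ROWID_RE.fullmatch(value) is not None


# The bind value from the trace, followed by an illustrative well-formed value.
for candidate in ("9992", "0000A1B2.0003.0005"):
    print(candidate, looks_like_oracle7_rowid(candidate))
```

This is a format check only: a well-formed string can still point to a row that does not exist.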

Bug:

Check Bug#632396. This bug says:

  • The correct behaviour is to return an "Invalid Rowid" message.

Recommendation:

Use the proper datatype and a valid ROWID value for the bind variable.






[QA-18] ORA-00600 [6033] in SELECT. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 8.1.7.0.0
Operating System: Solaris
Operating System Version: 5.8

 Description   

Error code:

 
ORA-00600: internal error code, arguments: [6033], [], [], [], [], [], [], []

Current SQL statement for this session:

 
SELECT *   FROM CSD_BOUNCE_CONTENT_BODY  WHERE BOUNCE_CONTENT_ID = :b1

Oracle kernel function from which the problem is raised:

 
kdifxs()

This is a function of Oracle Data Layer and responsible for fetching a row in an index scan.

Leaf block dump:

 
Leaf block dump
===============
header address 2567438500=0x990800a4
kdxcolev 0
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 663
kdxcofbo 1410=0x582
kdxcofeo 4421=0x1145
kdxcoavs 8687
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 62923132=0x3c0217c
kdxledsz 0
kdxlebksz 16152
kdxlepnro 11
kdxlepnco 1
...

This is index object# 0x1ec41. As seen above, kdxcoopc is 0xa0, which means that this is a key-compressed V8 B*Tree index. Also, kdxledsz is 0. In other words, this is a non-unique index.

Recommendations:

Check the table for possible corruption with the following statement:

  • SQL> analyze table CSD_BOUNCE_CONTENT_BODY validate structure cascade;

If no corruption is detected, please see the following bugs:

  • ORA-600 [6033] DURING WORK FLOW ORDER IMPORT PROCESS
  • ORA-600 [711], [1], [0X2EDFE84] [KDIFXS - PREFIX CONTEXT] WITH COMPRESSED INDEX





[QA-17] Which parameters affect CBO ? Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - SQL Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Admin
Resolution: Answered Votes: 0

Product Version: Generic
Operating System: Generic

 Description   
Which parameters affect CBO ?

 Comments   
Comment by ubTools Support [ 15/Jul/07 02:37 PM ]
The parameters affecting CBO are included in Event 10053 trace files.

Sample:

 
SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL> select f1 from test10053 where f1=23;

       F1
----------
       23

The trace file is generated under USER_DUMP_DEST. Here is an excerpt from the trace file:

 
*** 2005-01-10 13:09:03.010
*** SESSION ID:(8.1072) 2005-01-10 13:09:03.008
QUERY
select f1 from test10053 where f1=23
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
OPTIMIZER_FEATURES_ENABLE = 9.2.0
OPTIMIZER_MODE/GOAL = Choose
_OPTIMIZER_PERCENT_PARALLEL = 101
HASH_AREA_SIZE = 1048576
HASH_JOIN_ENABLED = TRUE
HASH_MULTIBLOCK_IO_COUNT = 0
SORT_AREA_SIZE = 524288
OPTIMIZER_SEARCH_LIMIT = 5
PARTITION_VIEW_ENABLED = FALSE
_ALWAYS_STAR_TRANSFORMATION = FALSE
_B_TREE_BITMAP_PLANS = TRUE
STAR_TRANSFORMATION_ENABLED = FALSE
_COMPLEX_VIEW_MERGING = TRUE
_PUSH_JOIN_PREDICATE = TRUE
PARALLEL_BROADCAST_ENABLED = TRUE
OPTIMIZER_MAX_PERMUTATIONS = 2000
OPTIMIZER_INDEX_CACHING = 0
_SYSTEM_INDEX_CACHING = 0
OPTIMIZER_INDEX_COST_ADJ = 100
OPTIMIZER_DYNAMIC_SAMPLING = 1
_OPTIMIZER_DYN_SMP_BLKS = 32
QUERY_REWRITE_ENABLED = FALSE
QUERY_REWRITE_INTEGRITY = ENFORCED
_INDEX_JOIN_ENABLED = TRUE
_SORT_ELIMINATION_COST_RATIO = 0
_OR_EXPAND_NVL_PREDICATE = TRUE
_NEW_INITIAL_JOIN_ORDERS = TRUE
ALWAYS_ANTI_JOIN = CHOOSE
ALWAYS_SEMI_JOIN = CHOOSE
_OPTIMIZER_MODE_FORCE = TRUE
_OPTIMIZER_UNDO_CHANGES = FALSE
_UNNEST_SUBQUERY = TRUE
_PUSH_JOIN_UNION_VIEW = TRUE
_FAST_FULL_SCAN_ENABLED = TRUE
_OPTIM_ENHANCE_NNULL_DETECTION = TRUE
_ORDERED_NESTED_LOOP = TRUE
_NESTED_LOOP_FUDGE = 100
_NO_OR_EXPANSION = FALSE
_QUERY_COST_REWRITE = TRUE
QUERY_REWRITE_EXPRESSION = TRUE
_IMPROVED_ROW_LENGTH_ENABLED = TRUE
_USE_NOSEGMENT_INDEXES = FALSE
_ENABLE_TYPE_DEP_SELECTIVITY = TRUE
_IMPROVED_OUTERJOIN_CARD = TRUE
_OPTIMIZER_ADJUST_FOR_NULLS = TRUE
_OPTIMIZER_CHOOSE_PERMUTATION = 0
_USE_COLUMN_STATS_FOR_FUNCTION = TRUE
_SUBQUERY_PRUNING_ENABLED = TRUE
_SUBQUERY_PRUNING_REDUCTION_FACTOR = 50
_SUBQUERY_PRUNING_COST_FACTOR = 20
_LIKE_WITH_BIND_AS_EQUALITY = FALSE
_TABLE_SCAN_COST_PLUS_ONE = TRUE
_SORTMERGE_INEQUALITY_JOIN_OFF = FALSE
_DEFAULT_NON_EQUALITY_SEL_CHECK = TRUE
_ONESIDE_COLSTAT_FOR_EQUIJOINS = TRUE
_OPTIMIZER_COST_MODEL = CHOOSE
_GSETS_ALWAYS_USE_TEMPTABLES = FALSE
DB_FILE_MULTIBLOCK_READ_COUNT = 16
_NEW_SORT_COST_ESTIMATE = TRUE
_GS_ANTI_SEMI_JOIN_ALLOWED = TRUE
_CPU_TO_IO = 0
_PRED_MOVE_AROUND = TRUE
***************************************
BASE STATISTICAL INFORMATION
***********************
Table stats    Table: TEST10053   Alias: TEST10053
 TOTAL ::  CDN: 1000  NBLKS:  2  AVG_ROW_LEN:  7
-- Index stats
 INDEX NAME: I_TEST10053  COL#: 1
   TOTAL ::  LVLS: 1   #LB: 3  #DK: 1000  LB/K: 1  DB/K: 1  CLUF: 2
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
Column:         F1  Col#: 1      Table: TEST10053   Alias: TEST10053
   NDV: 1000      NULLS: 0         DENS: 1.0000e-03 LO:  1  HI: 1000
   NO HISTOGRAM: #BKT: 1 #VAL: 2
 TABLE: TEST10053     ORIG CDN: 1000  ROUNDED CDN: 1  CMPTD CDN: 1
 Access path: tsc  Resc:  2  Resp:  2
 Access path: index (iff)
     Index: I_TEST10053
 TABLE: TEST10053
     RSC_CPU: 0   RSC_IO: 2
 IX_SEL:  0.0000e+00  TB_SEL:  1.0000e+00
 Access path: iff  Resc:  2  Resp:  2
 Access path: index (equal)
     Index: I_TEST10053
 TABLE: TEST10053
     RSC_CPU: 0   RSC_IO: 1
 IX_SEL:  0.0000e+00  TB_SEL:  1.0000e-03
 BEST_CST: 1.00  PATH: 4  Degree:  1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]: TEST10053 [TEST10053]
Best so far: TABLE#: 0  CST:          1  CDN:          1  BYTES:          3
Final:
 CST: 1  CDN: 1  RSC: 1  RSP: 1  BYTES: 3
 IO-RSC: 1  IO-RSP: 1  CPU-RSC: 0  CPU-RSP: 0

Warnings:

  • To generate Event 10053 data, the statement must be HARD PARSED.
  • RBO doesn't generate Event 10053 data.
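To pull the parameter list out of such a trace file, a small parser is enough. The following Python sketch assumes the layout shown in the excerpt above (a "PARAMETERS USED BY THE OPTIMIZER" heading followed by NAME = VALUE lines, ended by a line of asterisks); the function name and the shortened sample text are illustrative only, and other Oracle versions may format the section differently.

```python
# Shortened copy of the trace excerpt above, used as sample input.
SAMPLE_TRACE = """\
QUERY
select f1 from test10053 where f1=23
***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
OPTIMIZER_FEATURES_ENABLE = 9.2.0
OPTIMIZER_MODE/GOAL = Choose
_OPTIMIZER_PERCENT_PARALLEL = 101
HASH_AREA_SIZE = 1048576
***************************************
BASE STATISTICAL INFORMATION
***********************
_OPTIMIZER_PERCENT_PARALLEL = 0
"""


def optimizer_parameters(trace_text):
    """Return the NAME = VALUE pairs of the optimizer parameter section."""
    params = {}
    in_section = False
    for line in trace_text.splitlines():
        if line.strip() == "PARAMETERS USED BY THE OPTIMIZER":
            in_section = True
            continue
        if not in_section:
            continue
        if line.startswith("*"):
            if params:
                break      # asterisks after the parameters: section is over
            continue       # asterisks directly under the heading
        name, sep, value = line.partition(" = ")
        if sep:
            params[name.strip()] = value.strip()
    return params


for name, value in optimizer_parameters(SAMPLE_TRACE).items():
    print(name, "=", value)
```

Only the first parameter section is read, so the later `_OPTIMIZER_PERCENT_PARALLEL = 0` line in the statistics section does not overwrite the value reported in the parameter list.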




[QA-16] Does commit cause checkpoint ? Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Generic
Operating System: Generic

 Description   
Does commit cause checkpoint ?

 Comments   
Comment by ubTools Support [ 15/Jul/07 02:34 PM ]
A commit does not cause a checkpoint by itself. If it did, there would be no need for redo logging.

See the article by K Gopalakrishnan published in Oracle Internals magazine, in which he explains the relationship and differences between commit-SCN and SCN.





[QA-15] SQ enqueue problem. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 10.1.0.3
Operating System: Generic

 Description   
No new connections to the database are allowed, except as SYSDBA.

 Comments   
Comment by ubTools Support [ 15/Jul/07 02:32 PM ]

An excerpt from SYSTEMSTATE dump:

The SYSTEMSTATE dump was generated in USER_DUMP_DEST as below:

 
SQL> connect / as sysdba
SQL> alter session set max_dump_file_size=UNLIMITED;
SQL> alter session set events 'IMMEDIATE trace name SYSTEMSTATE level 10';
-- 2 or 3 minutes later
SQL> alter session set events 'IMMEDIATE trace name SYSTEMSTATE level 10';

Many sessions in the SYSTEMSTATE dump are waiting for enq: SQ - contention, as shown below:

(session) trans: 0, creator: 41b5c31f8, flag: (e1) USR/- BSY/////-
DID: 0001-0057-00000021, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
O/S info: user: , term: , ospid: , machine:
program:
waiting for 'enq: SQ - contention' blocking sess=0x4234f5b68 seq=1 wait_time=0
name|mode=53510006, object #=8e, 0=0
Dumping Session Wait History
for 'enq: SQ - contention' count=1 wait_time=3007641
name|mode=53510006, object #=8e, 0=0
for 'enq: SQ - contention' count=1 wait_time=3007783
...
SO: 40e830a70, type: 54, owner: 4234e5f20, flag: INIT///0x00
LIBRARY OBJECT LOCK: lock=40e830a70 handle=41985e8b8 mode=N
call pin=41a6d2740 session pin=0 hpc=0000 hlc=0000
htl=40e830ae0[40e8308e8,40e6833d8] htb=40e8d2cc8
user=4234e5f20 session=4234e5f20 count=1 flags=PNC/[0400] savepoint=2
LIBRARY OBJECT HANDLE: handle=41985e8b8
name=SYS.AUDSES$
hash=deaeba50687d3d62c586aafe9b84f98c timestamp=07-20-2004 15:34:57
namespace=TABL flags=KGHP/TIM/SML/[02000000]
kkkk-dddd-llll=0000-0001-0001 lock=N pin=S latch#=34 hpc=022e hlc=022e

The blocking session(0x4234f5b68):

SO: 4234f5b68 , type: 4, owner: 41b5c8c60, flag: INIT///0x00
(session) trans: 41cde12f8, creator: 41b5c8c60, flag: (e1) USR/- BSY/////-
DID: 0001-0056-00000020, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
O/S info: user: , term: , ospid: , machine:
program:
waiting for 'gc cr request' blocking sess=0x0 seq=2 wait_time=0
file#=1, block#=1ea, class#=1
Dumping Session Wait History
for 'gc cr request' count=1 wait_time=1230415
file#=1, block#=1ea, class#=1
for 'gc cr request' count=1 wait_time=1230287
file#=1, block#=1ea, class#=1
for 'gc cr request' count=1 wait_time=1220829
file#=1, block#=1ea, class#=1
for 'gc cr request' count=1 wait_time=1230501
file#=1, block#=1ea, class#=1
for 'gc cr request' count=1 wait_time=1230312
file#=1, block#=1ea, class#=1
for 'gc cr request' count=1 wait_time=1230295
file#=1, block#=1ea, class#=1
for 'gc cr request' count=1 wait_time=1230618
file#=1, block#=1ea, class#=1
for 'gc cr request' count=1 wait_time=1230445
file#=1, block#=1ea, class#=1
for 'gc cr request' count=1 wait_time=1229421
file#=1, block#=1ea, class#=1
for 'gc cr request' count=1 wait_time=1231336
file#=1, block#=1ea, class#=1
temporary object counter: 0

Problem interpretation:

The blocking session was waiting on a RAC-related wait event, gc cr request, repeatedly for the same file# and block#. Unfortunately, no SYSTEMSTATE dumps were generated on the other nodes at the time of the problem, so it was not possible to identify the root blocker on the other node or to find out why it held the same buffer for so long.

The sessions were waiting for the SQ enqueue on the SYS.AUDSES$ sequence. During connection, the value of V$SESSION.AUDSID is obtained from the SYS.AUDSES$ sequence. SYSDBA connections do not use this sequence, so they were not blocked.
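The name|mode=53510006 value in the dump can be decoded by hand. The Python sketch below assumes the commonly documented encoding of the first parameter of enqueue waits: the two high-order bytes hold the ASCII lock type and the low-order 16 bits hold the requested lock mode. The function name is made up for this example.

```python
def decode_enqueue_p1(p1_hex):
    """Split an enqueue wait's name|mode value into (lock type, mode)."""
    p1 = int(p1_hex, 16)
    # Two high-order bytes: ASCII characters of the lock type.
    lock_type = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    # Low-order 16 bits: requested lock mode.
    mode = p1 & 0xFFFF
    return lock_type, mode


print(decode_enqueue_p1("53510006"))  # ('SQ', 6)
```

0x53 and 0x51 are the ASCII codes of "S" and "Q", which matches the enq: SQ - contention wait name, and the requested mode is 6.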

Solution:

The default cache size of SYS.AUDSES$ was 20. It has been increased to 1000.





[QA-14] is the current CPU breakdown formula correct ? Created: 15/Jul/07  Updated: 19/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Generic
Operating System: Generic

 Description   
is the current CPU breakdown formula correct ?

CPU used by this session = parse time cpu + recursive cpu usage + others



 Comments   
Comment by ubTools Support [ 15/Jul/07 02:14 PM ]

Answer:

This is the best-known formula, but it is wrong, even though I've read it in many Oracle documents.

parse time cpu includes the parse CPU time of both recursive and user statements. recursive cpu usage includes both the parse CPU time and the non-parse CPU time of recursive statements. That means the parse CPU usage of recursive statements is included in both parse time cpu and recursive cpu usage. In other words, it is counted twice, and the formula above is not correct.

ubTools offers the following formula:

CPU used by this session = parse time cpu + others(exec_and_fetch_time_cpu)
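The double counting is easy to see with numbers. The following Python sketch uses made-up CPU figures (they do not come from any real system) and builds the three statistics from their components as described above; the variable names are illustrative only.

```python
# Hypothetical CPU figures (centiseconds) for one session; not from a real trace.
user_parse = 10
user_exec_fetch = 50
recursive_parse = 5
recursive_exec_fetch = 20

# The three statistics, built from their components as described above.
cpu_used_by_this_session = (user_parse + user_exec_fetch
                            + recursive_parse + recursive_exec_fetch)
parse_time_cpu = user_parse + recursive_parse
recursive_cpu_usage = recursive_parse + recursive_exec_fetch

# Common formula: "others" is what remains after removing both statistics.
others_common = cpu_used_by_this_session - parse_time_cpu - recursive_cpu_usage
# ubTools formula: "others" is the exec and fetch CPU time.
others_ubtools = cpu_used_by_this_session - parse_time_cpu

print("common formula, others :", others_common)
print("user exec/fetch        :", user_exec_fetch)
print("difference             :", user_exec_fetch - others_common)
print("ubTools formula, others:", others_ubtools)
```

With these figures the common formula leaves 45 for "others" although the user statements spent 50 in exec and fetch; the missing 5 is exactly the recursive parse time, which was subtracted twice. The ubTools formula leaves 70, the exec and fetch time of user and recursive statements together.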

Question:

If there is little or no SQL processing done within PL/SQL, should I also subtract recursive cpu usage from CPU used by this session to get the others cpu component ?

Answer:

NO. A formula should explain all cases. It should not work only for some scenarios.

Also, from Oracle's perspective, both SQL statements and PL/SQL statements are associated with a cursor internally. In other words, they are not treated differently in PARSE, EXEC and FETCH calls. If a statement is called by another statement, it is called a recursive statement. So, both SQL and PL/SQL statements can be recursive statements.

  • If there is no SQL processing in PL/SQL, it means there is no SQL in the parent and child PL/SQLs. There are 2 scenarios for this case:
    • If there is no child PL/SQL in the parent PL/SQL, recursive cpu usage is ZERO. Since it's zero, there is no need to subtract it from CPU used by this session.
    • If there is a child PL/SQL in the parent PL/SQL, the PARSE call for the child PL/SQL is done in recursive mode. In this case, the parse time cpu of the recursive statement is already included in recursive cpu usage. So, recursive cpu usage should not be subtracted from CPU used by this session.
  • If there is little SQL processing in PL/SQL with little parse time cpu, the distortion in the wrong formula mentioned above is small. But recursive cpu usage should still NOT be subtracted from CPU used by this session, even if the distortion is small. Why should DBAs subtract it if they have a more correct formula? There is no need.

Recommendation:

The current Response Time Performance Analysis (RTA) implementations are not correct. RTA has not reached its next level yet. That's why ubTools offers a new technique, Microstate Response-time Performance Profiling (MRPP).

A question on this topic, referring to ubTools, was asked on Tom Kyte's site:

Question:

 
Tom,

Just wanted to: what exactly is "CPU used by this session". One site(
http://www.ubtools.com/cgi-bin/ib/ikonboard.cgi?act=ST;f=25;t=4
says
<>
CPU used by this session = parse time cpu + recursive cpu usage + others

This is the most well-known, but wrong formula I've read in many Oracle
documentations.

parse time cpu includes parse cpu time of both recursive and user statements.
recursive cpu usage includes both parse cpu time and non-parse cpu of recursive
statements. That means parse cpu usage of recursive statements is included in
both parse time cpu and recursive cpu usage. In other words, it's duplicated and
formula above is not correct.

ubTools offers the following formula:

CPU used by this session = parse time cpu + others(exec_and_fetch_time_cpu)
<>

what is exec_and_fetch_time_cpu ?

Regards

Tom Kyte's answer:

 
I am not so sure they are correct.  unless they are talking about the
description of cpu used by this session (i is not clear to me whether they are
saying "the description is wrong" or "the value reported by the statistic is
wrong"

if the values were wrong, the cpu times reported for most things would exceed
elapsed time by large margins.  so, they should be able to demonstrate that for
us.

(and you would have to ask the author of an article in most cases "what did you
mean by this "exec and fetch time cpu" and how exactly do you think we could
find it)

I think they were saying "the description provided is wrong", but I have an
easier description.

cpu use by this session is cpu used by that session.

Our answer:

We said the current CPU breakdown formula is incorrect, not the description of Oracle statistics.

CPU used by this session is the total CPU usage at session or instance level. CPU usage has 3 components:

  • Parse
  • Exec
  • Fetch

These components can be seen in SQL_TRACE / EVENT10046 traces. The Parse component is available through the parse time cpu statistic. Since there is no Oracle statistic for the Exec/Fetch components, we call them others.

We did not mention the values of the CPU usage statistics in this discussion. Talking about the values would open another commonly misunderstood topic. Here is a brief explanation:

  • In busy environments, the distortion in CPU measurement is minimal. In non-busy environments, it may not be minimal. However, non-busy environments usually have no big performance problem, so the distortion in CPU usage does not matter in many cases.
  • The wait measurement includes serious distortions in busy environments.

ubTools has said for years that RESPONSE TIME ANALYSIS (RTA) CANNOT BE IMPLEMENTED AT INSTANCE LEVEL. RTA IS A METHOD FOR SESSION LEVEL.

For the full details with the proven samples, see Microstate Response-time Performance Profiling (MRPP).





[QA-13] Dumping a stack trace is too slow in 10g. Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Database Tuning Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 10.1.0.2
Operating System: Generic

 Description   

Error code:

ORA-4031 (may not be seen by end users).

Error code definition:

The CPU usage reaches 98% in KERNEL mode. The strace utility on Linux reports that the process spins on the read() system call.

System calls:

An excerpt from strace -p <OSPID> output:

 
read(29, "<\345&\0\1\0\21\0\330Wk\10\0\0\0@\30\0\0\0\0\0\0\0", 24) = 24
read(29, "I\345&\0\1\0\22\0\230\226\217\10\0\0\0@\30\0\0\0\0\0\0"..., 24) = 24
read(29, "\20\1\0\0\1\0\24\0\260\n3\0\0\0\0`\20\0\0\0\0\0\0\0", 24) = 24
read(29, "\34\1\0\0\1\0\34\0`\200\353\0\0\0\0`\10\0\0\0\0\0\0\0", 24) = 24
read(29, "\'\1\0\0\1\0\34\0h\200\353\0\0\0\0`\4\0\0\0\0\0\0\0", 24) = 24
read(29, "3\1\0\0\1\0\34\0l\200\353\0\0\0\0`\4\0\0\0\0\0\0\0", 24) = 24
read(29, "V\345&\0\4\0\361\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
read(29, "`\345&\0\1\0\f\0\300XF\4\0\0\0@\4\0\0\0\0\0\0\0", 24) = 24
read(29, "k\345&\0\1\0\21\0\20Xk\10\0\0\0@\30\0\0\0\0\0\0\0", 24) = 24
read(29, "y\345&\0\1\0\22\0\260\226\217\10\0\0\0@\30\0\0\0\0\0\0"..., 24) = 24
read(29, "\20\1\0\0\1\0\24\0\300\n3\0\0\0\0`\20\0\0\0\0\0\0\0", 24) = 24
read(29, "\34\1\0\0\1\0\34\0p\200\353\0\0\0\0`\10\0\0\0\0\0\0\0", 24) = 24
read(29, "\'\1\0\0\1\0\34\0x\200\353\0\0\0\0`\4\0\0\0\0\0\0\0", 24) = 24
read(29, "3\1\0\0\1\0\34\0|\200\353\0\0\0\0`\4\0\0\0\0\0\0\0", 24) = 24

The file descriptor is 29 (the first argument of the read() system call). According to the Linux lsof command, descriptor 29 is 1081732 /oracle/product/10.1.0/bin/oracle. In other words, the process is reading the Oracle executable.

strace -c -p <OSPID> output for 1 minute:

 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00   21.570914          40    543060           read
  0.00    0.000016           2         7           lseek
  0.00    0.000003           3         1           getpid
  0.00    0.000001           1         1           open
  0.00    0.000001           1         1           readlink
------ ----------- ----------- --------- --------- ----------------
100.00   21.570935                543070           total

The read() system call was called 543,060 times in one minute. That's why CPU utilization in KERNEL mode is high.
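The per-second call rate and the average cost per call follow directly from the summary. The Python sketch below parses rows of a strace -c style summary and derives both figures; the sample text repeats the numbers reported above, the 60-second window is the one-minute sampling period mentioned in the text, and the function name is made up for this example.

```python
# Rows copied from the strace -c summary above (whitespace-separated).
SAMPLE_SUMMARY = """\
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 21.570914 40 543060 read
0.00 0.000016 2 7 lseek
0.00 0.000003 3 1 getpid
------ ----------- ----------- --------- --------- ----------------
100.00 21.570935 543070 total
"""


def parse_strace_summary(text):
    """Return {syscall: (seconds, calls)} for the per-syscall rows."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-syscall rows have 5 fields (6 when the errors column is filled).
        if len(fields) < 5 or fields[-1] == "total":
            continue
        try:
            seconds = float(fields[1])
            calls = int(fields[3])
        except ValueError:
            continue  # header or separator line
        rows[fields[-1]] = (seconds, calls)
    return rows


seconds, calls = parse_strace_summary(SAMPLE_SUMMARY)["read"]
print("read() calls per second :", calls / 60)
print("average usec per read() :", round(seconds / calls * 1_000_000))
```

That is about 9,051 read() calls per second at roughly 40 microseconds each, which agrees with the usecs/call column of the summary.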

An excerpt from stack trace by OS debugger:

 
#0 0x200000000137fa81 in read () from /lib/tls/libpthread.so.0
#1 0x4000000004d7a7e0 in sskgds_getsnm ()
#2 0x4000000002766160 in skdsttpcs ()
#3 0x4000000001131920 in ksedst ()
#4 0x40000000011b0140 in ksm_4031_dump ()

Oracle's ksm_4031_dump() function dumps ORA-4031 traces. The top of the stack shows the read() system call.

Problem interpretation:

The process gets an ORA-4031 error and then tries to dump a trace file for this error. But while dumping the trace, it spins on read() system calls.

Workaround:

Set the following parameters in pfile/spfile:

 
_4031_dump_bitvec        = 0
_4031_max_dumps         = 0

Bug:

Ref: Oracle Note:3964602 DUMPING A CALL STACK TRACE IS SLOW.






[QA-12] Do read()/write() system calls block users in physical IO ? Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: ???
Operating System: Generic

 Description   
do read()/write() system calls block users until physical IO to disk is completed ?

 Comments   
Comment by ubTools Support [ 15/Jul/07 01:35 PM ]
There is a common misconception that read()/write() system calls block users until physical IO to disk is completed.

read()/write() system calls do not normally block users during physical IO unless the file is opened with the O_DIRECT or O_SYNC flags. With buffered IO, users are blocked only while buffers are copied between user address space and kernel address space (a read() of data that is not yet in the kernel buffer cache must still wait for that data to be read in). So, although read()/write() calls look synchronous from the user's perspective, the physical IO is not necessarily performed synchronously.

With asynchronous IO calls (e.g. aio_read()/aio_write()), users are blocked only while the IO requests are enqueued, not while buffers are copied between user address space and kernel address space and not during physical IO.
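The difference between a buffered write() and a write() on a file opened with O_SYNC can be observed from user space. The following Python sketch times a single os.write() call in both modes; the helper name, file names and buffer size are arbitrary choices for this example, and no particular timings are claimed because they depend on the operating system, file system and storage.

```python
import os
import tempfile
import time


def timed_write(path, data, extra_flags=0):
    """Write data with one os.write() call; return (bytes written, seconds)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        written = os.write(fd, data)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return written, elapsed


data = b"x" * (1024 * 1024)
with tempfile.TemporaryDirectory() as tmp:
    # Plain buffered IO: write() returns once the data is copied to the kernel.
    buffered = timed_write(os.path.join(tmp, "buffered.dat"), data)
    # O_SYNC is not defined on every platform; fall back to buffered IO there.
    synced = timed_write(os.path.join(tmp, "synced.dat"), data,
                         getattr(os, "O_SYNC", 0))
    print("buffered write():", buffered)
    print("O_SYNC write()  :", synced)
```

On a system with ordinary disks, the O_SYNC call is expected to take noticeably longer, because it returns only after the data has reached the device.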





[QA-11] How to see the tasks of Oracle background processes ? Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 10.1.0.3.0
Operating System: Generic

 Description   
How to see the tasks of Oracle background processes ?

 Comments   
Comment by ubTools Support [ 15/Jul/07 01:29 PM ]

Answer:

Use the following query:

select substr(DEST,1,10) DEST, DESCRIPTION from x$messages order by DEST;

 
DEST DESCRIPTION

---------- ----------------------------------------------------------------

* Monitor Cleanup

* KSB action for X-instance calls

* generic shutdown background

* Scumnt mount lock

* database close in progress

* Poll system events broadcast channel

* svr actn for shrd grp reg/dereg

ARB* ASM to slave BG msg

ARC* Archiver wakeup

ARCH Archiver Message

ARCH Archiver shutdown


CJQ* Shutdown Job Queue Process

CJQ* Job Queue Interupt

CJQ* Job Queue Interupt

CJQ* Job Queue Interupt

CJQ* Job Queue Timout

CJQ0 Check for async messages from other instances

CJQ0 Coordinator send broadcast timeout

CKPT create/scrub cmon foregrounds

CKPT perform RM action in CKPT

CKPT identify control file

CKPT close control file


CKPT release (XR,4,0) enqueue

CKPT CKPT stat update timeout action

CKPT CKPT reuse call completion action

CKPT CKPT reuse range call continuation

CKPT CKPT reuse call continuation

CKPT refresh control file

CKPT check for parameters from other instances

CKPT start background

CKPT CPU dynamic reconfiguration

CKPT check for quiesce messages

CKPT unquiesce the instance during database close


CKPT unsubscribe to quiesce channel

CKPT subscribe to quiesce channel

CKPT Get Proxy Lock

CKPT Db Checkpt Compl check

CKPT Db Checkpt Request check

CKPT update recovery-based i/o statistics

CKPT Compile Environment Monitor

CKPT SQL Memory Management Calculation

CKPT free PX memory chunks in background

CKPT KKX: drop ncomp dll action

CKPT Flashback barrier


CKPT hold alert level

CKPT recovery area alert action

CKPT start change tracking in ckpt

CKPT get (XR,4,0) enqueue

CKPT sense a heartbeat

CKPT set heartbeat sensing

CKPT emulate i/o errors on a disk

CKPT timeout

CKPT Run self test on group

CKPT asynchronously dismount disk group

CKPT dismount disk group


CKPT query disk group status

CKPT check disk status

CKPT update disk status

CKPT update disk group status

CKPT kfc CKPT dismount disk group

CKPT kfc CKPT mount disk group

CTWR change tracking message

CTWR change tracking timeout action

DBW* hardware clock went backwards

DBW* DBWR write buffers

DBW* get/release open thread enqueue


DBW* mount/dismount all db files

DBW0 SGA memory tuning parameter update - DBW0

DBW0 Db mount lock

DBW0 kfcb Poke DBW0

DBW0 kfc mount disk group

DBW0 kfc dismount disk group

DBW0 kfc invalidate file extent

DBW0 Reserve lock name space lock

DBW0 Release lock name space lock

DBW0 complete Release space call

DBW0 verify/invalidate all db files


DBW0 recovery db file verification

DBW0 identify db file

DBW0 close and unlock db file

DBW0 lock db file

DBW0 offline db file

DBW0 Db File check

DBW0 Message to flush IMU txns

DBW0 Db Instance Lock Mgmt

DIAG write trace records out

DIAG Clusterwise dump request

DIAG poradebug commands


DIAG write trace records out

DIAG write trace records out

DIAG write trace records out

DMON DMON Wakeup

DMON DMON shutdown

DMON DMON Verify Standby shutdown for PM violation

DMON Standby site request resync

DMON Metadata file available

DMON DMON rcv NS status

DMON DMON Receive Message

DMON DMON Disable DRC


DMON DMON Interrupt Routine

INSV INSV Wakeup

INSV NetSlave Shutdown Message

INSV INSV Receive Message

LCK0 ksim LCK0 functions

LCK0 ksim reg/dereg instance group

LCK0 ksim query instance group

LCK0 ksim polling interrupt action

LCK0 KSXR remote instance died

LCK0 KSXR finialize

LCK0 kxfp signal recv function


LCK0 get and hold global enqueue

LCK0 perform a user instance lock operation

LCK0 SMON purge object number cache

LCK0 KQLM interrupt action

LCK0 KQLM invalidation instance lock operation

LCK0 KQLM pin instance lock operation

LCK0 KQR timeout action

LCK0 KQR get instance lock

LCK0 sequence bckgrnd instance lock

LCK0 release TS enq for sort segment

LCK0 kea signal recv function


LCK0 get TS enq for sort segment

LCK0 release quiesce enqueue

LCK0 get quiesce enqueue

LCK0 KCL lock affinity timeout action

LCK0 Check SCN adjust

LCK0 Cross-instance broadcast message

LCK0 ksim get value

LGWR LGWR failure

LGWR kfr ACD relocation

LGWR kfr Incr Ckpt

LGWR kfr Poke LGWR


LGWR kfr Dismount disk group

LGWR kfr mount disk group

LGWR LGWR to Start DMON

LGWR free KTU instance lock

LGWR convert KTU instance lock

LGWR get KTU instance lock

LGWR dml_locks = 0 global enforcement

LGWR Open/close/mount/dismount thread

LGWR Redo writer generate offline immed marker

LGWR Redo writer log switch operations

LGWR LGWR re-eval standby locks


LGWR Redo writer interrupt action

LGWR Redo writer IO's

LMD* Flush side-channel msgs LMD

LNS* Network Server wakeup

LNS* Network Server forced

LNS* Network Server shutdown

LNS* Network Server reinit

MMAN lock memory at startup

MMAN Memory Management

MMAN Handle sga_target resize

MMAN Reset advisory pool when advisory turned ON


MMAN Complete deferred initialization of components

MMAN lock memory timeout action

MMNL tune undo retention

MMNL MMNL Periodic MQL Selector

MMNL ASH Sampler (KEWA)

MMNL MMON SWRF Raw Metrics Capture

MMON reload failed KSPD callbacks

MMON SGA memory tuning

MMON background recovery area alert action

MMON Flashback Marker

MMON tablespace alert monitor


MMON UNDO MMON ACTION

MMON MMON Local action Listener

MMON MMON Remote action Listener

MMON Advisor delete expired tasks

MMON ASH Emergency Flusher (KEWA)

MMON MMON SWRF Auto DBFUS Task

MMON MMON SWRF Auto Purge Task

MMON MMON SWRF Auto Flush Task

MMON alert message purge

MMON alert message cleanup

MMON Check for sync messages from other instances


MMON ADDM (KEH)

MMON threshold reconciliation

MMON metrics monitoring

MMON shutdown MMON

MMON run-once action driver

MMON MMON testing slave

MMON MMON testing action

MMON MMON Completion Callback Dispatcher

MMON Job Autostart action force

MMON Coordinator autostart timeout

MMON Check for autostart messages from other instances


MMON Compute cache stats in background

MMON undo usage

MMON recovery area alert action

MMON SGA memory tuning parameter update

MMON reconfiguration MMON action

NSV* NetSlave Wakeup Message

NSV* NetSlave Receive Message

NSV* NetSlave Metadata Resync

NSV* NetSlave Health Check Message

NSV* NetSlave Shutdown Message

NSV* NetSlave request Primary to resync


NSV* NetSlave Check DRC version

QMNC Shutdown Q Monitor Coord

RBAL ASM to master BG msg

RBAL BG load lib msg

RBAL|SMON OSM to BG mesg

RECO distributed recovery wakeup

RECO distributed recovery shutdown

RSM* RSM Wakeup

RSM* RSM Receive Message

RSM* RSM Receive Message Response

RVWR Open/close flashback thread


RVWR RVWR IO's

SMON kfcl instance recovery

TEST Reliable Test Dummy Call


212 rows selected.


SQL>




[QA-9] How to set an event in other session ? Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Administration Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: ???
Operating System: Generic

 Description   
How to set an event in other session ?

 Comments   
Comment by ubTools Support [ 15/Jul/07 01:19 PM ]

Answer:

Use SYS.DBMS_SYSTEM.SET_EV() procedure. Here is the specification for this procedure:

 
PROCEDURE SET_EV
Argument Name                  Type                    In/Out Default?
------------------------------ ----------------------- ------ --------
SI                             BINARY_INTEGER          IN
SE                             BINARY_INTEGER          IN
EV                             BINARY_INTEGER          IN
LE                             BINARY_INTEGER          IN
NM                             VARCHAR2                IN
  • SI: V$SESSION.SID
  • SE: V$SESSION.SERIAL#
  • EV: Event number. For example:
    • 10046: SQL traces.
    • 10053: Optimizer traces.
    • NNN : ORA-NNN errors.
    • 65535: IMMEDIATE traces.
  • LE: Event level. For Event 10046 events:
    • 0: Disable event.
    • 1: PARSE, FETCH, EXEC, EXECUTION PLAN
    • 4: Level 1 + BINDS
    • 8: Level 1 + WAITS
    • 12: Level 4 + Level 8
  • NM: Event name. For example:
    • ERRORSTACK.......: For error stack traces.
    • PROCESSSTATE...: For process states
    • SYSTEMSTATE.......: For System states.
    • ''..................................: For CONTEXT FOREVER.

Sample:

Dumps PROCESSSTATE trace IMMEDIATELY in LEVEL 10:

 
SQL> exec dbms_system.set_ev(8,1056,65535,10,'PROCESSSTATE');

Dumps ERRORSTACK trace in LEVEL 3 on ORA-942 error:

 
SQL> exec dbms_system.set_ev(8,1060,942,3,'ERRORSTACK');

Dumps Event 10046 trace in LEVEL 8 for CONTEXT FOREVER:

 
SQL> exec dbms_system.set_ev(8,1060,10046,8,'');
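The Event 10046 levels listed above combine like flags (4 for binds, 8 for waits, 12 for both). The Python sketch below computes the level from the wanted options and builds the SQL*Plus command text in the same form as the samples; both function names are made up for this example, and the script only builds a string, it does not connect to a database.

```python
def level_10046(binds=False, waits=False):
    """Return the Event 10046 level for the requested extra detail."""
    level = (4 if binds else 0) | (8 if waits else 0)
    return level or 1  # level 1 when neither binds nor waits are requested


def build_set_ev(sid, serial, event, level, name=""):
    """Build the SQL*Plus command text for SYS.DBMS_SYSTEM.SET_EV()."""
    if name and not name.isalpha():
        raise ValueError("event name must contain letters only")
    return "exec dbms_system.set_ev({},{},{},{},'{}');".format(
        int(sid), int(serial), int(event), int(level), name)


# Event 10046 with waits, same session as in the sample above.
print(build_set_ev(8, 1060, 10046, level_10046(waits=True)))
# ERRORSTACK level 3 on ORA-942.
print(build_set_ev(8, 1060, 942, 3, "ERRORSTACK"))
```

Converting the numeric arguments with int() and restricting the event name to letters keeps malformed input out of the generated command.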




[QA-8] Heapdump Interpretation Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: ???
Operating System: Generic

 Description   
I have a process which is taking up way more memory than I'd expected. The process runs a PL/SQL that does some nested loop joins on a PL/SQL table.

The background process is using > 200Mb of private memory and this number goes up if we tweak the WHERE clause in the join to return more data.

I did a heapdump of the process and the trace file looks like this (lots of stuff trimmed):

 
...
EXTENT 437
Chunk 925dfe4 sz= 1836 perm "perm " alo=1836
Chunk 925e710 sz= 1156 recreate "session heap " latch=0
ds 92693fc sz= 30315156 ct= 440
b7aa56c sz= 3980
92f30a0 sz= 1072
afb6e34 sz= 16472
afb2dcc sz= 16472
afaed64 sz= 16472
...

I presume that "session heap" is the UGA for this process' session. Basically it goes on like this for several pages with sz anywhere between 16k and 1Mb. How can I interpret this? I presume the memory is to do with cursor information. This is a sort but the sort area size is only 10Mb and cannot account for all the private memory in use.

I'm just trying to decide if this is a reasonable amount of memory to be using (i.e. explain what it is using it for) and just put up with it, or if something has gone wrong. I'm on 8.1.5 on Linux 2.2 (I know, I know...)

Thanks for any insight!



 Comments   
Comment by ubTools Support [ 15/Jul/07 01:03 PM ]

Answer:

A heap consists of memory areas named extents. Each extent consists of memory areas named chunks.

Interpretation:

 
EXTENT 437
  Chunk  925dfe4 sz=     1836    perm      "perm           "  alo=1836
  Chunk  925e710 sz=     1156    recreate  "session heap   "  latch=0

EXTENT 437        ---> extent number
925dfe4           ---> chunk address (hexadecimal)
sz=               ---> size of chunk in bytes (decimal)
perm              ---> permanent memory class
"perm           " ---> chunk comment

The memory classes are the following:

  • Recreatable (can be removed and then recreated when requested, e.g. shared SQL statements)
  • Free (free, no object in it)
  • Freeable (used for session/call duration)
  • Permanent (for permanent objects)

Chunks in the same extent are contiguous. In your case, the first chunk address (0x925dfe4) + its size (1836 = 0x72c) = the second chunk address (0x925e710).
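
The address arithmetic above can be checked with a short script. This is only a sketch in Python, not an Oracle utility: the regular expression and the helper names (parse_chunks, is_contiguous) are assumptions made for illustration and match only the layout of the two chunk lines quoted in the question.

```python
import re

# Chunk lines copied from the heapdump excerpt in the question.
DUMP = """\
Chunk 925dfe4 sz= 1836 perm "perm " alo=1836
Chunk 925e710 sz= 1156 recreate "session heap " latch=0
"""

# Assumed layout: hex address, decimal size, memory class, quoted comment.
CHUNK_RE = re.compile(r'Chunk\s+([0-9a-fA-F]+)\s+sz=\s*(\d+)\s+(\w+)\s+"([^"]*)"')


def parse_chunks(text):
    """Return (address, size, memory_class, comment) for each chunk line."""
    chunks = []
    for line in text.splitlines():
        match = CHUNK_RE.search(line)
        if match:
            addr, size, mem_class, comment = match.groups()
            chunks.append((int(addr, 16), int(size), mem_class, comment.strip()))
    return chunks


def is_contiguous(chunks):
    """True if every chunk ends exactly where the next one starts."""
    return all(a[0] + a[1] == b[0] for a, b in zip(chunks, chunks[1:]))


chunks = parse_chunks(DUMP)
print(hex(chunks[0][0] + chunks[0][1]))  # 0x925e710
print(is_contiguous(chunks))             # True
```

The same parser could be pointed at a full trace file to total sz per chunk comment, provided the file follows the layout assumed here.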

For your problem:

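Shared memory segments such as the SGA are included in the process address space, so the size reported for a process can include shared memory in addition to its private memory. This may be what you are seeing. Search Metalink for the pmap command.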





[QA-7] _TRACE_FILES_PUBLIC parameter Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: ???
Operating System: Generic

 Description   

Parameter:

 
Name.................: _TRACE_FILES_PUBLIC
Values...............: TRUE/FALSE
Default value........: FALSE
Initial Release......: ?
Scope................: Instance

Explanation:

By default, trace files are created without read permission for users outside the Oracle software owner's group. Here is a sample on Linux:

 
$ ls -ltr
total 4
-rw-r-----    1 oracle   oinstall     2146 Jan  6 11:37 linkplus_ora_18653.trc

With _TRACE_FILES_PUBLIC=TRUE, other groups can read trace files.

 
$ ls -ltr
total 8
-rw-r--r--    1 oracle   oinstall     2742 Jan  6 12:00 linkplus_ora_18759.trc
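
The difference between the two listings is the read bit for "others". A minimal Python sketch of that check follows; the octal modes 0640 and 0644 are inferred from the two ls listings, and others_can_read is a hypothetical helper name, not part of any Oracle tool.

```python
import stat


def others_can_read(mode):
    """True if the 'others' read bit is set in a file mode."""
    return bool(mode & stat.S_IROTH)


# Modes inferred from the two listings (regular file, 0640 and 0644).
default_mode = stat.S_IFREG | 0o640  # _TRACE_FILES_PUBLIC=FALSE
public_mode = stat.S_IFREG | 0o644   # _TRACE_FILES_PUBLIC=TRUE

print(stat.filemode(default_mode), others_can_read(default_mode))  # -rw-r----- False
print(stat.filemode(public_mode), others_can_read(public_mode))    # -rw-r--r-- True
```

For a real trace file, the mode can be read with os.stat(path).st_mode and passed to the same helper.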

Warning:

Set this parameter to TRUE only when all users who can access the trace directory are trusted, since trace files may include sensitive data such as the values of BIND variables.






[QA-6] _OPTIM_PEEK_USER_BINDS parameter Created: 15/Jul/07  Updated: 16/Sep/07

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Internals Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: 9.0.1
Operating System: Generic

 Description   

Parameter:

Name.................: _OPTIM_PEEK_USER_BINDS
Values...............: TRUE/FALSE
Default value........: TRUE
Initial Release......: 9.0.1
Scope................: Instance/Session

Explanation:

Before Oracle 9.0.1, values of bind variables were not known in the PARSE phase. Since they were not known, it was not possible to generate execution plans according to bind values.

With 9i and onwards, Oracle peeks at the values of bind variables in the first (hard) PARSE and generates the execution plan according to the values seen in that first PARSE. If the data is skewed, the plan may not be optimal for subsequent bind values.
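
The effect can be illustrated with a toy model. The Python sketch below is not Oracle's optimizer: the table contents, the 5% selectivity threshold, the plan names and the CursorCache class are all assumptions chosen for illustration. It only shows that a plan chosen from the first peeked value is reused for later binds.

```python
from collections import Counter

# Hypothetical skewed column: 'A' is very common, 'B' is rare.
ROWS = ["A"] * 990 + ["B"] * 10
FREQ = Counter(ROWS)


def choose_plan(value, threshold=0.05):
    """Toy cost rule (assumption): index access for selective values, full scan otherwise."""
    selectivity = FREQ[value] / len(ROWS)
    return "INDEX RANGE SCAN" if selectivity < threshold else "FULL TABLE SCAN"


class CursorCache:
    """Keeps one plan per SQL text, decided by the bind value of the first parse."""

    def __init__(self):
        self.plans = {}

    def execute(self, sql, bind):
        if sql not in self.plans:      # first parse: peek at the bind value
            self.plans[sql] = choose_plan(bind)
        return self.plans[sql]         # later executions reuse the cached plan


sql = "SELECT * FROM t WHERE c = :b"
cache = CursorCache()
print(cache.execute(sql, "B"))  # INDEX RANGE SCAN (chosen for the rare value)
print(cache.execute(sql, "A"))  # INDEX RANGE SCAN (reused, although 'A' matches most rows)
print(choose_plan("A"))         # FULL TABLE SCAN (what a fresh parse with 'A' would pick)
```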






Generated at Thu Mar 28 15:40:02 UTC 2024 using JIRA Standard Edition, Version: 3.12.3-#302.