History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: QA-45
Type: Oracle - Operating System Oracle - Operating System
Status: Closed Closed
Resolution: Answered
Priority: Major Major
Assignee: ubTools Support
Reporter: ubTools Support
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Questions & Answers

'direct path read temp' hangs on read() system call when ASMLIB in use.

Created: 02/Feb/09 11:13 PM   Updated: 03/Feb/09 12:03 AM
Fix Version/s: None

Product Version: Oracle 10.2.0.4 Standard Edition, RAC
Operating System: Linux
Operating System Version: SLES 10 SP1 (x86-64)


 Description  « Hide
Environment:
Database......: Oracle 10.2.0.4 Standard Edition, RAC
ASMLIB........: oracleasm-2.6.16.46-0.12-smp-2.0.3-1.x86_64.rpm
                oracleasmlib-2.0.2-1.x86_64.rpm
                oracleasm-support-2.1.2-1.SLE10.x86_64.rpm

Description:

direct path read temp hangs on read() system call when ASMLIB in use.

Diagnostic Data for Oracle:

Wait Event:

SQL> select SEQ#,EVENT,P1,P2,P3,WAIT_TIME,SECONDS_IN_WAIT from v$session_wait where sid=512 and state='WAITING';

      SEQ# EVENT
---------- ----------------------------------------------------------------
        P1         P2         P3  WAIT_TIME SECONDS_IN_WAIT
---------- ---------- ---------- ---------- ---------------
     46619 direct path read temp
       202     285578          7          0            5611
...
SQL> select SEQ#,EVENT,P1,P2,P3,WAIT_TIME,SECONDS_IN_WAIT from v$session_wait where sid=512 and state='WAITING';

      SEQ# EVENT
---------- ----------------------------------------------------------------
        P1         P2         P3  WAIT_TIME SECONDS_IN_WAIT
---------- ---------- ---------- ---------- ---------------
     46619 direct path read temp
       202     285578          7          0            5824

SQL>

The session is waiting for the completion of direct path read temp for 5824 seconds. The SEQ# column is not changing. It's TOO long to read just 7 blocks from the disk.

Stack Trace:

SQL> select spid from v$session s,v$process p where s.paddr=p.addr and s.sid=512;

SPID
------------
2359

SQL> oradebug SETOSPID 2359
Oracle pid: 38, Unix process pid: 2359, image: oracle@gdksun1

SQL> oradebug dump errorstack 3
Statement processed.

SQL> oradebug TRACEFILE_NAME
/u03/app/oracle/admin/ORCL/udump/orcl1_ora_2359.trc
SQL>

<from the trace file>

Current SQL statement for this session:
CREATE INDEX "ACD2" ON "ACCOUNT_DETAIL" ...
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF177ECC40 ? 7FFF177ECCA0 ?
                                                   7FFF177ECBE0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFF177ECC40 ? 7FFF177ECCA0 ?
                                                   7FFF177ECBE0 ? 000000000 ?
ksdxfdmp()+1118      call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFF177ECC40 ? 7FFF177ECCA0 ?
                                                   7FFF177ECBE0 ? 000000000 ?
ksdxcb()+1547        call     ksdxfdmp()           7FFF177EDD90 ? 000000011 ?
                                                   000000003 ? 7FFF177EDED0 ?
                                                   7FFF177EDE30 ? 000000000 ?
sspuser()+111        call     ksdxcb()             000000001 ? 000000011 ?
                                                   000000001 ? 000000001 ?
                                                   7FFF177EDE30 ? 000000000 ?
__funlockfile()+80   call     sspuser()            000000001 ? 000000011 ?
                                                   000000001 ? 000000001 ?
                                                   7FFF177EDE30 ? 000000000 ?
__read_nocancel()+7  signal   __funlockfile()      00000000D ? 7FFF177EE970 ?
                                                   000000050 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000000 ? 2B4E95CCE000 ?
call_instance_read(  call     __read_nocancel()    00000000D ? 7FFF177EE970 ?
)+12                                               000000050 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000000 ? 2B4E95CCE000 ?
asm_io_v2()+185      call     call_instance_read(  00000000D ? 7FFF177EE970 ?
                              )                    000000050 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000000 ? 2B4E95CCE000 ?
kfkOsmIO()+1205      call     asm_io_v2()          00000000D ? 7FFF177EE970 ?
                                                   000000246 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000000 ? 2B4E95CCE000 ?
kfkReapIO()+497      call     kfkOsmIO()           2B4E95830588 ? 2B4E95AAE000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 2B4E95B2E000 ?
kfkIOPriv()+770      call     kfkReapIO()          000000000 ? 006110320 ?
                                                   2B4E95830588 ? 006110320 ?
                                                   006110320 ? 2B4E95B2E000 ?
kfdIOPriv()+95       call     kfkIOPriv()          000000000 ? 000000000 ?
                                                   000000024 ? 000000000 ?
                                                   2B4E95B66040 ? 000000001 ?
kfioReapIO()+476     call     kfdIOPriv()          000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   2B4E95B66040 ? 000000001 ?
kfioRequest()+197    call     kfioReapIO()         7FFF177EED68 ? 000000001 ?
                                                   0FFFFFFFF ? 000000000 ?
                                                   2B4E95B66040 ? 000000001 ?
ksfd_osmwat()+874    call     kfioRequest()        000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   7FFF177EED68 ? 2B4E00000001 ?
ksfdwtio()+693       call     ksfd_osmwat()        000000001 ? 000000000 ?
                                                   07FFFFFFF ? 000000000 ?
                                                   7FFF177EED68 ? 2B4E00000001 ?
ksfdwat1()+220       call     ksfdwtio()           000000001 ? 000000030 ?
                                                   07FFFFFFF ? 000000000 ?
                                                   7FFF177EED68 ? 2B4E00000001 ?
ksfdrwat0()+1269     call     ksfdwat1()           000000001 ? 000000030 ?
                                                   07FFFFFFF ? 000000000 ?
                                                   7FFF177EED68 ? 2B4E00000001 ?
ksfdblock()+156      call     ksfdrwat0()          000000001 ? 000000030 ?
                                                   07FFFFFFF ? 000000000 ?
                                                   2B4E7FFFFFFF ? 2B4E00000001 ?
kcflwi()+48          call     ksfdblock()          7FFF177F11C0 ? 000000001 ?
                                                   000000010 ? 000000000 ?
                                                   2B4E7FFFFFFF ? 2B4E00000001 ?
kcflci()+689         call     kcflwi()             2B4E95FF3F28 ? 000000001 ?
                                                   000000010 ? 000000000 ?
                                                   2B4E7FFFFFFF ? 2B4E00000001 ?
kcblci()+197         call     kcflci()             2B4E95FF3F28 ? 000000000 ?
                                                   0000000CA ? 000045B8A ?
                                                   7FFF177F1270 ? 000000000 ?
kcblcio()+280        call     kcblci()             2B4E95D168F0 ? 2B4E95FF3E70 ?
                                                   000000001 ? 000045B8A ?
                                                   7FFF177F1270 ? 000000000 ?
kcblsltck()+50       call     kcblcio()            2B4E95D168F0 ? 2B4E95FF3E70 ?
                                                   000000001 ? 000045B8A ?
                                                   7FFF177F1270 ? 000000000 ?
stsCheckIO()+194     call     kcblsltck()          2B4E95D168F0 ? 2B4E95FF3E70 ?
                                                   000000001 ? 000045B8A ?
                                                   7FFF177F1270 ? 000000000 ?
srsnext()+746        call     stsCheckIO()         2B4E95D16FE0 ? 2B4E958F9108 ?
                                                   000000000 ? 000000001 ?
                                                   7FFF177F1270 ? 000000000 ?
srsget()+138         call     srsnext()            2B4E9602FE14 ? 000000000 ?
                                                   2B4E95D16FE0 ? 2B4E958F8F10 ?
                                                   2B4E00000000 ? 000000000 ?
sorgetqbf()+297      call     srsget()             2B4E95D16F28 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   2B4E95D372B0 ? 2B4E95D37468 ?
qersoFetch()+176     call     sorgetqbf()          2B4E95D16F28 ? 2B4E95D37468 ?
                                                   2B4E95D372B0 ? 7FFF177F1584 ?
                                                   2B4E95D372B0 ? 2B4E95D37468 ?
qerliFetch()+304     call     qersoFetch()         5EEF015A0 ? 002D292E4 ?
                                                   7FFF177F1748 ? 000000001 ?
                                                   5EEF01648 ? 2B4E95D37468 ?
kdicrws()+8744       call     qerliFetch()         5EEF01358 ? 00143CCD6 ?
                                                   2B4E9581C140 ? 000000001 ?
                                                   5EEF01648 ? 5D2E60688 ?
kdicdrv()+335        call     kdicrws()            5D2E60688 ? 5D2E60B60 ?
                                                   000000000 ? 000000001 ?
                                                   2B4E95D367B0 ? 5D2E60610 ?
opiexe()+12879       call     kdicdrv()            5D2E60B60 ? 5D2E60380 ?
                                                   000000002 ? 000000001 ?
                                                   2B4E95D367B0 ? 000000000 ?
opiosq0()+3316       call     opiexe()             000000004 ? 000000000 ?
                                                   7FFF177F3748 ? 000000004 ?
                                                   2B4E95D367B0 ? 000000000 ?
opiosq()+11          call     opiosq0()            000000003 ? 00000000F ?
                                                   7FFF177F65E0 ? 000000000 ?
                                                   2B4E95D367B0 ? 000000000 ?
opiodr()+984         call     opiosq()             000000003 ? 00000000F ?
                                                   7FFF177F65E0 ? 000000000 ?
                                                   2B4E95D367B0 ? 000000000 ?
ttcpip()+1012        call     opiodr()             00000004A ? 00000000F ?
                                                   7FFF177F65E0 ? 000000004 ?
                                                   0053E3F30 ? 000000000 ?
opitsk()+1322        call     ttcpip()             0060AB150 ? 7FFF177F4540 ?
                                                   7FFF177F65E0 ? 000000000 ?
                                                   7FFF177F60D8 ? 7FFF177F6748 ?
opiino()+1026        call     opitsk()             000000003 ? 000000000 ?
                                                   7FFF177F65E0 ? 000000001 ?
                                                   000000000 ? 612CA0900000000 ?
opiodr()+984         call     opiino()             00000003C ? 000000004 ?
                                                   7FFF177F77A8 ? 000000000 ?
                                                   000000000 ? 612CA0900000000 ?
opidrv()+547         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFF177F77A8 ? 000000000 ?
                                                   0053E3D00 ? 612CA0900000000 ?
sou2o()+114          call     opidrv()             00000003C ? 000000004 ?
                                                   7FFF177F77A8 ? 000000000 ?
                                                   0053E3D00 ? 612CA0900000000 ?
opimai_real()+163    call     sou2o()              7FFF177F7780 ? 00000003C ?
                                                   000000004 ? 7FFF177F77A8 ?
                                                   0053E3D00 ? 612CA0900000000 ?
main()+116           call     opimai_real()        000000002 ? 7FFF177F7810 ?
                                                   000000004 ? 7FFF177F77A8 ?
                                                   0053E3D00 ? 612CA0900000000 ?
__libc_start_main()  call     main()               000000002 ? 7FFF177F7810 ?
+244                                               000000004 ? 7FFF177F77A8 ?
                                                   0053E3D00 ? 612CA0900000000 ?
_start()+41          call     __libc_start_main()  0006D23A8 ? 000000002 ?
                                                   7FFF177F7968 ? 000000000 ?
                                                   0053E3D00 ? 000000002 ?

Looks like an hang in __read_nocancel() .

Diagnostic Data for Linux:

strace Output:

oracle@gdksun1:~> strace -fp 2359
Process 2359 attached - interrupt to quit
read(13,

The process is sleeping on the file descriptor(fd) of 13 by read() system call.

lsof Output:

oracle@gdksun1:~> lsof -p 2359|grep 13
oracle  2359 oracle  DEL    REG               0,12
                         131074 /2
oracle  2359 oracle  mem    REG                8,2
    133423                16839 /lib64/ld-2.4.so
oracle  2359 oracle  mem    REG               8,17
    681761                52138 /u03/app/oracle/product/10.2.0/db_1/lib/libocr10.so
oracle  2359 oracle  mem    REG               8,17
    691049                52139 /u03/app/oracle/product/10.2.0/db_1/lib/libocrb10.so
oracle  2359 oracle  mem    REG               8,17
  11385162                44025 /u03/app/oracle/product/10.2.0/db_1/lib/libjox10.so
oracle  2359 oracle    6w   REG               8,17
   1494136                90989 /u03/app/oracle/admin/ORCL/bdump/alert_ORCL1.log
oracle  2359 oracle   13u   REG               0,19
         0 18446604444591769000 /dev/oracleasm/iid/0000000000000002
oracle@gdksun1:~>

The fd#13 is an ASM device.

gdb output:

oracle@gdksun1:~> gdb $ORACLE_HOME/bin/oracle 2359
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
Using host libthread_db library "/lib64/libthread_db.so.1".
Attaching to program: /u03/app/oracle/product/10.2.0/db_1/bin/oracle, process 2359
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libskgxp10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libskgxp10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libhasgen10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libhasgen10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libskgxn2.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libskgxn2.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libocr10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libocr10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libocrb10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libocrb10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libocrutl10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libocrutl10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libjox10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libjox10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libclsra10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libclsra10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libdbcfg10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libdbcfg10.so
Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libnnz10.so...done.
Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libnnz10.so
Reading symbols from /usr/lib64/libaio.so.1...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 47616513025792 (LWP 2359)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libnuma.so...done.
Loaded symbols for /usr/lib64/libnuma.so
Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...done.
Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so
0x00002b4e9512f910 in __read_nocancel () from /lib64/libpthread.so.0
(gdb)
(gdb)
(gdb)
(gdb)
(gdb)
(gdb)
(gdb)
(gdb)
(gdb)
(gdb)
(gdb)
(gdb) backtrace
#0  0x00002b4e9512f910 in __read_nocancel () from /lib64/libpthread.so.0
#1  0x00002b4e95d6772c in call_instance_read (priv=<value optimized out>, buf=0x7fff177ee970, size=80)
 at asmlib_v2.c:540
#2  0x00002b4e95d67869 in asm_io_v2 (ctx=0xd, requests=<value optimized out>, reqlen=582,
 waitreqs=0xffffffffffffffff, waitlen=0,
    completions=0x2b4e95cce000, complen=1, timeout=4294967295, statusp=0x7fff177eea14) at asmlib_v2.c:705
#3  0x0000000000b1804d in kfkOsmIO ()
#4  0x0000000000b113e9 in kfkReapIO ()
#5  0x0000000000b0b342 in kfkIOPriv ()
#6  0x0000000000a8939f in kfdIOPriv ()
#7  0x0000000000b06eec in kfioReapIO ()
#8  0x0000000000b04de5 in kfioRequest ()
#9  0x00000000008d669a in ksfd_osmwat ()
#10 0x00000000008be64d in ksfdwtio ()
#11 0x00000000008bb3a4 in ksfdwat1 ()
#12 0x00000000008bb1f5 in ksfdrwat0 ()
#13 0x00000000008bb464 in ksfdblock ()
#14 0x00000000026c0e98 in kcflwi ()
#15 0x00000000026c0e31 in kcflci ()
#16 0x0000000001011435 in kcblci ()
#17 0x0000000001010e20 in kcblcio ()
#18 0x0000000001010ca2 in kcblsltck ()
#19 0x00000000020588f0 in stsCheckIO ()
#20 0x0000000002063908 in srsnext ()
#21 0x0000000002062eba in srsget ()
#22 0x000000000205d089 in sorgetqbf ()
#23 0x0000000002d64222 in qersoFetch ()
#24 0x0000000002d2e412 in qerliFetch ()
#25 0x00000000014327d2 in kdicrws ()
#26 0x000000000142fa47 in kdicdrv ()
#27 0x0000000002f2cf07 in opiexe ()
#28 0x00000000034c55d4 in opiosq0 ()
#29 0x00000000034c48db in opiosq ()
#30 0x00000000012e88f4 in opiodr ()
#31 0x0000000003a4b900 in ttcpip ()
#32 0x00000000012e3fc4 in opitsk ()
#33 0x00000000012e6ee4 in opiino ()
#34 0x00000000012e88f4 in opiodr ()
#35 0x00000000012da313 in opidrv ()
#36 0x0000000001e62466 in sou2o ()
#37 0x00000000006d24cb in opimai_real ()
#38 0x00000000006d241c in main ()
(gdb)

Looks like a hang in __read_nocancel() . It's the same as in Oracle stack trace.

An Excerpt from /var/log/messages:

Feb  2 22:46:51 gdksun1 kernel: 509 [RAIDarray.mpp]mppLnx_do_queuecommand: mppLnx_scsi_execute_async failed.

At the same time, mppLnx_do_queuecommand: mppLnx_scsi_execute_async failed appeared in /var/log/messages.



 All   Comments   Change History      Sort Order: Ascending order - Click to sort in descending order
ubTools Support - 03/Feb/09 12:03 AM
The problem caused by __read_nocancel () from /lib64/libpthread.so.0.

OS Vendor driver looks incompatible with Oracle ASMLIB.