<< Back to previous view |
![]() |
[QA-45] 'direct path read temp' hangs on read() system call when ASMLIB in use. Created: 02/Feb/09 Updated: 03/Feb/09 |
|
Status: | Closed |
Project: | Questions & Answers |
Fix Version/s: | None |
Type: | Oracle - Operating System | Priority: | Major |
Reporter: | ubTools Support | Assignee: | ubTools Support |
Resolution: | Answered | Votes: | 0 |
Product Version: | Oracle 10.2.0.4 Standard Edition, RAC |
Operating System: | Linux |
Operating System Version: | SLES 10 SP1 (x86-64) |
Description |
Environment:
Database......: Oracle 10.2.0.4 Standard Edition, RAC ASMLIB........: oracleasm-2.6.16.46-0.12-smp-2.0.3-1.x86_64.rpm oracleasmlib-2.0.2-1.x86_64.rpm oracleasm-support-2.1.2-1.SLE10.x86_64.rpm Description: direct path read temp hangs on read() system call when ASMLIB in use. Diagnostic Data for Oracle: Wait Event: SQL> select SEQ#,EVENT,P1,P2,P3,WAIT_TIME,SECONDS_IN_WAIT from v$session_wait where sid=512 and state='WAITING'; SEQ# EVENT ---------- ---------------------------------------------------------------- P1 P2 P3 WAIT_TIME SECONDS_IN_WAIT ---------- ---------- ---------- ---------- --------------- 46619 direct path read temp 202 285578 7 0 5611 ... SQL> select SEQ#,EVENT,P1,P2,P3,WAIT_TIME,SECONDS_IN_WAIT from v$session_wait where sid=512 and state='WAITING'; SEQ# EVENT ---------- ---------------------------------------------------------------- P1 P2 P3 WAIT_TIME SECONDS_IN_WAIT ---------- ---------- ---------- ---------- --------------- 46619 direct path read temp 202 285578 7 0 5824 SQL> The session is waiting for the completion of direct path read temp for 5824 seconds. The SEQ# column is not changing. It's TOO long to read just 7 blocks from the disk. Stack Trace: SQL> select spid from v$session s,v$process p where s.paddr=p.addr and s.sid=512; SPID ------------ 2359 SQL> oradebug SETOSPID 2359 Oracle pid: 38, Unix process pid: 2359, image: oracle@gdksun1 SQL> oradebug dump errorstack 3 Statement processed. SQL> oradebug TRACEFILE_NAME /u03/app/oracle/admin/ORCL/udump/orcl1_ora_2359.trc SQL> <from the trace file> Current SQL statement for this session: CREATE INDEX "ACD2" ON "ACCOUNT_DETAIL" ... ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst()+31 call ksedst1() 000000000 ? 000000001 ? 7FFF177ECC40 ? 7FFF177ECCA0 ? 7FFF177ECBE0 ? 000000000 ? ksedmp()+610 call ksedst() 000000000 ? 000000001 ? 7FFF177ECC40 ? 7FFF177ECCA0 ? 7FFF177ECBE0 ? 000000000 ? ksdxfdmp()+1118 call ksedmp() 000000003 ? 000000001 ? 7FFF177ECC40 ? 7FFF177ECCA0 ? 7FFF177ECBE0 ? 000000000 ? ksdxcb()+1547 call ksdxfdmp() 7FFF177EDD90 ? 000000011 ? 000000003 ? 7FFF177EDED0 ? 7FFF177EDE30 ? 000000000 ? sspuser()+111 call ksdxcb() 000000001 ? 000000011 ? 000000001 ? 000000001 ? 7FFF177EDE30 ? 000000000 ? __funlockfile()+80 call sspuser() 000000001 ? 000000011 ? 000000001 ? 000000001 ? 7FFF177EDE30 ? 000000000 ? __read_nocancel()+7 signal __funlockfile() 00000000D ? 7FFF177EE970 ? 000000050 ? FFFFFFFFFFFFFFFF ? 000000000 ? 2B4E95CCE000 ? call_instance_read( call __read_nocancel() 00000000D ? 7FFF177EE970 ? )+12 000000050 ? FFFFFFFFFFFFFFFF ? 000000000 ? 2B4E95CCE000 ? asm_io_v2()+185 call call_instance_read( 00000000D ? 7FFF177EE970 ? ) 000000050 ? FFFFFFFFFFFFFFFF ? 000000000 ? 2B4E95CCE000 ? kfkOsmIO()+1205 call asm_io_v2() 00000000D ? 7FFF177EE970 ? 000000246 ? FFFFFFFFFFFFFFFF ? 000000000 ? 2B4E95CCE000 ? kfkReapIO()+497 call kfkOsmIO() 2B4E95830588 ? 2B4E95AAE000 ? 000000000 ? 000000000 ? 000000000 ? 2B4E95B2E000 ? kfkIOPriv()+770 call kfkReapIO() 000000000 ? 006110320 ? 2B4E95830588 ? 006110320 ? 006110320 ? 2B4E95B2E000 ? kfdIOPriv()+95 call kfkIOPriv() 000000000 ? 000000000 ? 000000024 ? 000000000 ? 2B4E95B66040 ? 000000001 ? kfioReapIO()+476 call kfdIOPriv() 000000000 ? 000000000 ? 000000000 ? 000000000 ? 2B4E95B66040 ? 000000001 ? kfioRequest()+197 call kfioReapIO() 7FFF177EED68 ? 000000001 ? 0FFFFFFFF ? 000000000 ? 2B4E95B66040 ? 000000001 ? ksfd_osmwat()+874 call kfioRequest() 000000000 ? 000000000 ? 000000000 ? 000000000 ? 7FFF177EED68 ? 2B4E00000001 ? ksfdwtio()+693 call ksfd_osmwat() 000000001 ? 000000000 ? 07FFFFFFF ? 000000000 ? 7FFF177EED68 ? 2B4E00000001 ? ksfdwat1()+220 call ksfdwtio() 000000001 ? 000000030 ? 07FFFFFFF ? 000000000 ? 7FFF177EED68 ? 2B4E00000001 ? ksfdrwat0()+1269 call ksfdwat1() 000000001 ? 000000030 ? 07FFFFFFF ? 000000000 ? 7FFF177EED68 ? 2B4E00000001 ? ksfdblock()+156 call ksfdrwat0() 000000001 ? 000000030 ? 07FFFFFFF ? 000000000 ? 2B4E7FFFFFFF ? 2B4E00000001 ? kcflwi()+48 call ksfdblock() 7FFF177F11C0 ? 000000001 ? 000000010 ? 000000000 ? 2B4E7FFFFFFF ? 2B4E00000001 ? kcflci()+689 call kcflwi() 2B4E95FF3F28 ? 000000001 ? 000000010 ? 000000000 ? 2B4E7FFFFFFF ? 2B4E00000001 ? kcblci()+197 call kcflci() 2B4E95FF3F28 ? 000000000 ? 0000000CA ? 000045B8A ? 7FFF177F1270 ? 000000000 ? kcblcio()+280 call kcblci() 2B4E95D168F0 ? 2B4E95FF3E70 ? 000000001 ? 000045B8A ? 7FFF177F1270 ? 000000000 ? kcblsltck()+50 call kcblcio() 2B4E95D168F0 ? 2B4E95FF3E70 ? 000000001 ? 000045B8A ? 7FFF177F1270 ? 000000000 ? stsCheckIO()+194 call kcblsltck() 2B4E95D168F0 ? 2B4E95FF3E70 ? 000000001 ? 000045B8A ? 7FFF177F1270 ? 000000000 ? srsnext()+746 call stsCheckIO() 2B4E95D16FE0 ? 2B4E958F9108 ? 000000000 ? 000000001 ? 7FFF177F1270 ? 000000000 ? srsget()+138 call srsnext() 2B4E9602FE14 ? 000000000 ? 2B4E95D16FE0 ? 2B4E958F8F10 ? 2B4E00000000 ? 000000000 ? sorgetqbf()+297 call srsget() 2B4E95D16F28 ? 000000000 ? 000000000 ? 000000000 ? 2B4E95D372B0 ? 2B4E95D37468 ? qersoFetch()+176 call sorgetqbf() 2B4E95D16F28 ? 2B4E95D37468 ? 2B4E95D372B0 ? 7FFF177F1584 ? 2B4E95D372B0 ? 2B4E95D37468 ? qerliFetch()+304 call qersoFetch() 5EEF015A0 ? 002D292E4 ? 7FFF177F1748 ? 000000001 ? 5EEF01648 ? 2B4E95D37468 ? kdicrws()+8744 call qerliFetch() 5EEF01358 ? 00143CCD6 ? 2B4E9581C140 ? 000000001 ? 5EEF01648 ? 5D2E60688 ? kdicdrv()+335 call kdicrws() 5D2E60688 ? 5D2E60B60 ? 000000000 ? 000000001 ? 2B4E95D367B0 ? 5D2E60610 ? opiexe()+12879 call kdicdrv() 5D2E60B60 ? 5D2E60380 ? 000000002 ? 000000001 ? 2B4E95D367B0 ? 000000000 ? opiosq0()+3316 call opiexe() 000000004 ? 000000000 ? 7FFF177F3748 ? 000000004 ? 2B4E95D367B0 ? 000000000 ? opiosq()+11 call opiosq0() 000000003 ? 00000000F ? 7FFF177F65E0 ? 000000000 ? 2B4E95D367B0 ? 000000000 ? opiodr()+984 call opiosq() 000000003 ? 00000000F ? 7FFF177F65E0 ? 000000000 ? 2B4E95D367B0 ? 000000000 ? ttcpip()+1012 call opiodr() 00000004A ? 00000000F ? 7FFF177F65E0 ? 000000004 ? 0053E3F30 ? 000000000 ? opitsk()+1322 call ttcpip() 0060AB150 ? 7FFF177F4540 ? 7FFF177F65E0 ? 000000000 ? 7FFF177F60D8 ? 7FFF177F6748 ? opiino()+1026 call opitsk() 000000003 ? 000000000 ? 7FFF177F65E0 ? 000000001 ? 000000000 ? 612CA0900000000 ? opiodr()+984 call opiino() 00000003C ? 000000004 ? 7FFF177F77A8 ? 000000000 ? 000000000 ? 612CA0900000000 ? opidrv()+547 call opiodr() 00000003C ? 000000004 ? 7FFF177F77A8 ? 000000000 ? 0053E3D00 ? 612CA0900000000 ? sou2o()+114 call opidrv() 00000003C ? 000000004 ? 7FFF177F77A8 ? 000000000 ? 0053E3D00 ? 612CA0900000000 ? opimai_real()+163 call sou2o() 7FFF177F7780 ? 00000003C ? 000000004 ? 7FFF177F77A8 ? 0053E3D00 ? 612CA0900000000 ? main()+116 call opimai_real() 000000002 ? 7FFF177F7810 ? 000000004 ? 7FFF177F77A8 ? 0053E3D00 ? 612CA0900000000 ? __libc_start_main() call main() 000000002 ? 7FFF177F7810 ? +244 000000004 ? 7FFF177F77A8 ? 0053E3D00 ? 612CA0900000000 ? _start()+41 call __libc_start_main() 0006D23A8 ? 000000002 ? 7FFF177F7968 ? 000000000 ? 0053E3D00 ? 000000002 ? Looks like an hang in __read_nocancel() . Diagnostic Data for Linux: strace Output: oracle@gdksun1:~> strace -fp 2359 Process 2359 attached - interrupt to quit read(13, The process is sleeping on the file descriptor(fd) of 13 by read() system call. lsof Output: oracle@gdksun1:~> lsof -p 2359|grep 13 oracle 2359 oracle DEL REG 0,12 131074 /2 oracle 2359 oracle mem REG 8,2 133423 16839 /lib64/ld-2.4.so oracle 2359 oracle mem REG 8,17 681761 52138 /u03/app/oracle/product/10.2.0/db_1/lib/libocr10.so oracle 2359 oracle mem REG 8,17 691049 52139 /u03/app/oracle/product/10.2.0/db_1/lib/libocrb10.so oracle 2359 oracle mem REG 8,17 11385162 44025 /u03/app/oracle/product/10.2.0/db_1/lib/libjox10.so oracle 2359 oracle 6w REG 8,17 1494136 90989 /u03/app/oracle/admin/ORCL/bdump/alert_ORCL1.log oracle 2359 oracle 13u REG 0,19 0 18446604444591769000 /dev/oracleasm/iid/0000000000000002 oracle@gdksun1:~> The fd#13 is an ASM device. gdb output: oracle@gdksun1:~> gdb $ORACLE_HOME/bin/oracle 2359 GNU gdb 6.6 Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-suse-linux"... Using host libthread_db library "/lib64/libthread_db.so.1". Attaching to program: /u03/app/oracle/product/10.2.0/db_1/bin/oracle, process 2359 Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libskgxp10.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libskgxp10.so Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libhasgen10.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libhasgen10.so Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libskgxn2.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libskgxn2.so Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libocr10.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libocr10.so Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libocrb10.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libocrb10.so Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libocrutl10.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libocrutl10.so Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libjox10.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libjox10.so Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libclsra10.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libclsra10.so Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libdbcfg10.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libdbcfg10.so Reading symbols from /u03/app/oracle/product/10.2.0/db_1/lib/libnnz10.so...done. Loaded symbols for /u03/app/oracle/product/10.2.0/db_1/lib/libnnz10.so Reading symbols from /usr/lib64/libaio.so.1...done. Loaded symbols for /usr/lib64/libaio.so.1 Reading symbols from /lib64/libdl.so.2...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libm.so.6...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/libpthread.so.0...done. [Thread debugging using libthread_db enabled] [New Thread 47616513025792 (LWP 2359)] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libnsl.so.1...done. Loaded symbols for /lib64/libnsl.so.1 Reading symbols from /lib64/libc.so.6...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib64/libnuma.so...done. Loaded symbols for /usr/lib64/libnuma.so Reading symbols from /opt/oracle/extapi/64/asm/orcl/1/libasm.so...done. Loaded symbols for /opt/oracle/extapi/64/asm/orcl/1/libasm.so 0x00002b4e9512f910 in __read_nocancel () from /lib64/libpthread.so.0 (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) backtrace #0 0x00002b4e9512f910 in __read_nocancel () from /lib64/libpthread.so.0 #1 0x00002b4e95d6772c in call_instance_read (priv=<value optimized out>, buf=0x7fff177ee970, size=80) at asmlib_v2.c:540 #2 0x00002b4e95d67869 in asm_io_v2 (ctx=0xd, requests=<value optimized out>, reqlen=582, waitreqs=0xffffffffffffffff, waitlen=0, completions=0x2b4e95cce000, complen=1, timeout=4294967295, statusp=0x7fff177eea14) at asmlib_v2.c:705 #3 0x0000000000b1804d in kfkOsmIO () #4 0x0000000000b113e9 in kfkReapIO () #5 0x0000000000b0b342 in kfkIOPriv () #6 0x0000000000a8939f in kfdIOPriv () #7 0x0000000000b06eec in kfioReapIO () #8 0x0000000000b04de5 in kfioRequest () #9 0x00000000008d669a in ksfd_osmwat () #10 0x00000000008be64d in ksfdwtio () #11 0x00000000008bb3a4 in ksfdwat1 () #12 0x00000000008bb1f5 in ksfdrwat0 () #13 0x00000000008bb464 in ksfdblock () #14 0x00000000026c0e98 in kcflwi () #15 0x00000000026c0e31 in kcflci () #16 0x0000000001011435 in kcblci () #17 0x0000000001010e20 in kcblcio () #18 0x0000000001010ca2 in kcblsltck () #19 0x00000000020588f0 in stsCheckIO () #20 0x0000000002063908 in srsnext () #21 0x0000000002062eba in srsget () #22 0x000000000205d089 in sorgetqbf () #23 0x0000000002d64222 in qersoFetch () #24 0x0000000002d2e412 in qerliFetch () #25 0x00000000014327d2 in kdicrws () #26 0x000000000142fa47 in kdicdrv () #27 0x0000000002f2cf07 in opiexe () #28 0x00000000034c55d4 in opiosq0 () #29 0x00000000034c48db in opiosq () #30 0x00000000012e88f4 in opiodr () #31 0x0000000003a4b900 in ttcpip () #32 0x00000000012e3fc4 in opitsk () #33 0x00000000012e6ee4 in opiino () #34 0x00000000012e88f4 in opiodr () #35 0x00000000012da313 in opidrv () #36 0x0000000001e62466 in sou2o () #37 0x00000000006d24cb in opimai_real () #38 0x00000000006d241c in main () (gdb) Looks like a hang in __read_nocancel() . It's the same as in Oracle stack trace. An Excerpt from /var/log/messages: Feb 2 22:46:51 gdksun1 kernel: 509 [RAIDarray.mpp]mppLnx_do_queuecommand: mppLnx_scsi_execute_async failed. At the same time, mppLnx_do_queuecommand: mppLnx_scsi_execute_async failed appeared in /var/log/messages. |
Comments |
Comment by ubTools Support [ 03/Feb/09 12:03 AM ] |
The problem caused by __read_nocancel () from /lib64/libpthread.so.0.
OS Vendor driver looks incompatible with Oracle ASMLIB. |