[QA-48] Unable to start VIP because of invalid RX packets numbers. Created: 18/Mar/09 Updated: 19/Mar/09 |
|
Status: | Closed |
Project: | Questions & Answers |
Fix Version/s: | None |
Type: | Oracle - Operating System | Priority: | Major |
Reporter: | ubTools Support | Assignee: | ubTools Support |
Resolution: | Answered | Votes: | 0 |
Product Version: | Oracle 10.2.0.4, RAC |
Operating System: | IBM-AIX |
Operating System Version: | 6.1 |
Description |
When a VIP is started on its own node, the start fails and the VIP is started on the other node instead.

Starting the VIP:

    # ./crs_start ora.akyorap2.vip
    Attempting to start `ora.akyorap2.vip` on member `akyorap2`
    Start of `ora.akyorap2.vip` on member `akyorap2` failed.
    Attempting to start `ora.akyorap2.vip` on member `akyorap1`
    Start of `ora.akyorap2.vip` on member `akyorap1` succeeded.
    #

The log level was increased to get more detailed diagnostic data.

Setting Log Level:

    #./crsctl debug log res "ora.akyorap2.vip:1"
    Set Resource Debug Module: ora.akyorap2.vip Level: 1
    #

Errors from the Log:

    Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] checkIf: start for if=en1
    Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] IsIfAlive: start for if=en1
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] defaultgw: started
    Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] defaultgw: completed with 10.46.180.1
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S 10.46.180.52 -c 1 -w 1 10.46.180.1
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:51 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S 10.46.180.52 -c 1 -w 1 10.46.180.1
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] IsIfAlive: RX packets checked if=en1 failed
    Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] Interface en1 checked failed (host =akyorap2)
    Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] IsIfAlive: end for if=en1
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] checkIf: end for if=en1
    Invalid parameters, or failed to bring up VIP (host=akyorap2)
|
Comments |
Comment by ubTools Support [ 18/Mar/09 08:10 PM ] |
The problem originates in IsIfAlive() of $ORA_CRS_HOME/racgvip.
Here is the relevant excerpt from racgvip:

    # Check the status of the interface thro' pinging gateway
    if [ -n "$DEFAULTGW" ]
    then
      _RET=1
      # get base IP address of the interface
      tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
      # get RX packets numbers
      _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
      x=$CHECK_TIMES
      while [ $x -gt 0 ]
      do
        if [ -n "$tmpIP" ]
        then
          logx "About to execute command: $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW "
          $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
        else
          logx "About to execute command: $PING $PING_TIMEOUT $DEFAULTGW"
          $PING $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
        fi
        _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
        if [ "$_O1" != "$_O2" ]
        then
          # RX packets numbers changed
          _RET=0
          break
        fi
        $SLEEP 1
        x=`$EXPR $x - 1`
      done
      if [ $_RET -ne 0 ]
      then
        logx "IsIfAlive: RX packets checked if=$_IF failed"
      else
        logx "IsIfAlive: RX packets checked if=$_IF OK"
      fi
    ....

According to the code above, it does the following:

 1. Reads the base IP address of the interface with $LSATTR.
 2. Reads column 5 of the $NETSTAT -n -I output for the interface into _O1. The script expects the RX packet count in that column.
 3. Pings the default gateway, up to $CHECK_TIMES times, sleeping one second between attempts.
 4. After each ping, reads column 5 again into _O2.
 5. If _O1 and _O2 differ, the RX packet count has changed and the check succeeds (_RET=0). If they never differ, the check fails.
|
Comment by ubTools Support [ 18/Mar/09 08:28 PM ] |
racgvip was modified as below to dump the values of _O1 and _O2:
    ...
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    logx "--------------> by dunal: _O1: $_O1"
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    do
      if [ -n "$tmpIP" ]
      then
        logx "About to execute command: $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW "
        $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      else
        logx "About to execute command: $PING $PING_TIMEOUT $DEFAULTGW"
        $PING $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
      fi
      _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
      logx "--------------> by dunal: _O2: $_O2"
    ...

As seen above, the logx "--------------> by dunal: ..." lines were added to the script. Do not do this unless you are sure of what you are doing. After the VIP was restarted, the values of _O1 and _O2 were dumped in the logs.

Failed Node:

    ...
    Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] --------------> by dunal: _O1: -
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S 10.46.180.52 -c 1 -w 1 10.46.180.1
    Wed Mar 18 20:58:50 GMT+02:00 2009 [ 413770 ] --------------> by dunal: _O2: -
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:51 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S 10.46.180.52 -c 1 -w 1 10.46.180.1
    Wed Mar 18 20:58:51 GMT+02:00 2009 [ 413770 ] --------------> by dunal: _O2: -
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] IsIfAlive: RX packets checked if=en1 failed
    Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] Interface en1 checked failed (host =akyorap2)
    ...

As seen above, both values are '-', which is not a valid packet count. Because the two values are identical on every attempt, the script concludes that the RX packet count did not change.
Successful Node:

    Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] --------------> by dunal: _O1: 17297
    2009-03-18 20:58:55.793: [ RACG][1] [397546][1][ora.akyorap2.vip]: Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] About to execute command: /usr/sbin/ping -S 10.46.180.51 -c 1 -w 1 10.46.180.1
    Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] --------------> by dunal: _O2: 17298
    2009-03-18 20:58:55.793: [ RACG][1] [397546][1][ora.akyorap2.vip]: Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] IsIfAlive: RX packets checked if=en1 OK

Here _O1 and _O2 are different. That means the RX packet count changed and the interface is considered up.
|
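The comparison can be reproduced outside racgvip with the values dumped in the logs. The following is only a minimal sketch, not the Oracle script: rx_changed is a hypothetical helper name, and the netstat reads are replaced by literal sample values so the logic can be run anywhere.

```shell
#!/bin/sh
# Sketch of the _O1/_O2 comparison from IsIfAlive, with netstat replaced by
# literal samples. rx_changed is a hypothetical helper, not part of racgvip.

# rx_changed FIRST NEXT...: FIRST plays the role of _O1, each NEXT value is
# one _O2 reading taken after a ping.
rx_changed() {
    o1=$1
    shift
    for o2 in "$@"; do
        if [ "$o1" != "$o2" ]; then
            return 0    # value changed: interface considered alive
        fi
    done
    return 1            # value never changed: check fails
}

# Successful node: the counter moved from 17297 to 17298.
if rx_changed 17297 17298; then healthy=OK; else healthy=failed; fi

# Failed node: every reading is the literal '-'.
if rx_changed - - -; then broken=OK; else broken=failed; fi

echo "successful node: $healthy"
echo "failed node: $broken"
```

The script never checks that the value is numeric, so a constant string such as '-' is indistinguishable from a counter that is not moving.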
Comment by ubTools Support [ 18/Mar/09 08:44 PM ] |
netstat Output on Failed Node:

    /usr/bin/netstat -f inet -n -I en1 | /usr/bin/awk "{ if (/^en1/) {print $5; exit}}"
    en1 1500 link#3 0.21.5e.34.55.bc - 34601 0 16269 3 0

Column 5 is '-'. This is wrong and caused the problem.

netstat Output on Successful Node:

    en1 1500 link#3 0.21.5e.34.57.fe 29223 0 10609 3 0

Column 5 is 29223. This is the expected number.

Headers of netstat on Failed Node:

    #/usr/bin/netstat -f inet -n -I en1
    Name  Mtu   Network    Address           ZoneID  Ipkts  Ierrs  Opkts  Oerrs  Coll
    en1   1500  link#3     0.21.5e.34.55.bc  -       35645  0      16801  3      0
    en1   1500  10.46.180  10.46.180.52      -       35645  0      16801  3      0

Headers of netstat on Successful Node:

    #/usr/bin/netstat -f inet -n -I en1
    Name  Mtu   Network    Address           ZoneID  Ipkts  Ierrs  Opkts  Oerrs  Coll
    en1   1500  link#3     0.21.5e.34.57.fe          29743  0      10762  3      0
    en1   1500  10.46.180  10.46.180.51              29743  0      10762  3      0
    en1   1500  10.46.180  10.46.180.53              29743  0      10762  3      0
    en1   1500  10.46.180  10.46.180.54              29743  0      10762  3      0

The difference is the ZoneID column: the failed node prints '-' there, while the successful node leaves it blank, so Ipkts is the sixth whitespace-separated field on the failed node and the fifth on the successful node. This looks like a network configuration problem. The issue stays open pending an update from the Network Administrators.
|
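The effect of the extra ZoneID value on awk's field numbering can be checked against the rows quoted above. This is a small sketch only: field is a hypothetical helper, and the two rows are copied from the netstat output in this comment rather than read from a live system.

```shell
#!/bin/sh
# Rows copied from the netstat output quoted in this issue.
failed_row='en1 1500 link#3 0.21.5e.34.55.bc - 34601 0 16269 3 0'
ok_row='en1 1500 link#3 0.21.5e.34.57.fe 29223 0 10609 3 0'

# field N ROW: print whitespace-separated field N of ROW.
# Hypothetical helper for this sketch, not part of racgvip.
field() {
    printf '%s\n' "$2" | awk -v n="$1" '{ print $n }'
}

echo "failed node, field 5: $(field 5 "$failed_row")"
echo "failed node, field 6: $(field 6 "$failed_row")"
echo "successful node, field 5: $(field 5 "$ok_row")"
```

On the failed node, field 5 is the ZoneID placeholder and the packet count has moved to field 6. On the successful node the blank ZoneID is skipped by awk, so the packet count is still field 5.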
Comment by ubTools Support [ 19/Mar/09 12:54 PM ] |
The Network Administrator said it was an AIX bug:
However, this fix changes the ZoneID value from blank to '-'. After this fix was applied, no VIP could be started. |
Comment by ubTools Support [ 19/Mar/09 01:11 PM ] |
No solution was found on Metalink. |
Comment by ubTools Support [ 19/Mar/09 01:45 PM ] |
This looks like an inconsistency between the Oracle script and AIX 6.1.

Workaround: the netstat column number captured by racgvip must be changed from 5 to 6.

Original lines for _O1:

    ...
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    ...

Modified line for _O1:

    ...
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$6; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    ...

Original lines for _O2:

    ...
    fi
    _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    if [ "$_O1" != "$_O2" ]
    then
      # RX packets numbers changed
    ...

Modified line for _O2:

    ...
    fi
    _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$6; exit}}"`
    if [ "$_O1" != "$_O2" ]
    then
      # RX packets numbers changed
    ...

After this change, each VIP could be started on its correct node:

    ./crs_stat -t
    Name           Type         Target  State   Host
    ------------------------------------------------------------
    ora....ap1.gsd application  ONLINE  ONLINE  akyorap1
    ora....ap1.ons application  ONLINE  ONLINE  akyorap1
    ora....ap1.vip application  ONLINE  ONLINE  akyorap1
    ora....ap2.gsd application  ONLINE  ONLINE  akyorap2
    ora....ap2.ons application  ONLINE  ONLINE  akyorap2
    ora....ap2.vip application  ONLINE  ONLINE  akyorap2

Note: Don't edit Oracle scripts unless you know what you're doing. |
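A hard-coded column 6 only fits nodes where netstat prints a ZoneID value. One possible alternative, shown here purely as an untested sketch and not as Oracle's fix, is to count from the end of the row: in both layouts quoted in this issue, Ipkts is followed by exactly four columns (Ierrs, Opkts, Oerrs, Coll). The helper name ipkts_from_end is hypothetical, the sample rows are copied from this issue, and the interface is matched exactly rather than by prefix as racgvip does.

```shell
#!/bin/sh
# ipkts_from_end IF: read netstat-style rows on stdin and print the Ipkts
# value of the first row whose first field is IF, counting from the right.
# Hypothetical helper; assumes Ipkts is always the fifth field from the end.
ipkts_from_end() {
    awk -v ifname="$1" '$1 == ifname { print $(NF-4); exit }'
}

# Layout with a ZoneID value (failed node in this issue).
with_zone='Name Mtu Network Address ZoneID Ipkts Ierrs Opkts Oerrs Coll
en1 1500 link#3 0.21.5e.34.55.bc - 35645 0 16801 3 0
en1 1500 10.46.180 10.46.180.52 - 35645 0 16801 3 0'

# Layout with a blank ZoneID (successful node in this issue).
without_zone='Name Mtu Network Address ZoneID Ipkts Ierrs Opkts Oerrs Coll
en1 1500 link#3 0.21.5e.34.57.fe 29743 0 10762 3 0
en1 1500 10.46.180 10.46.180.51 29743 0 10762 3 0'

printf '%s\n' "$with_zone" | ipkts_from_end en1
printf '%s\n' "$without_zone" | ipkts_from_end en1
```

Whether this holds on other AIX levels or netstat options has not been verified here, so the same caution about editing Oracle scripts applies.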