AnsweredAssumed Answered

Server crashes everyday - rejecting I/O to offline device

Question asked by Saravanan Sanmugavel on Feb 20, 2016
Latest reply on Feb 24, 2016 by Abid Syed

We are using Hitachi DF600F on Ubuntu Trusty (Ubuntu 14.04.2 LTS) and server crashes everyday with below error, can someone please take a look and suggest what could be wrong.

 

Feb 20 05:00:02 rwcr6s5 kernel: [18317.194933] scsi host2: BM_4729 : Alloc of WRB_HANDLE Failedfor the CID : 384

Feb 20 05:00:57 rwcr6s5 kernel: [18372.738283] scsi host2: BM_2208 : Event CXN_KILLED_BAD_WRB_INDEX_ERROR[15] received on CID : 384

Feb 20 05:00:57 rwcr6s5 kernel: [18372.739748]  connection2:0: detected conn error (1011)

Feb 20 05:00:58 rwcr6s5 iscsid: Kernel reported iSCSI connection 2:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)

Feb 20 05:01:01 rwcr6s5 iscsid: connection2:0 is operational after recovery (1 attempts)

Feb 20 05:01:15 rwcr6s5 kernel: [18390.876217] scsi host2: BM_2177 : Cmd Notification CMD_KILLED_INVALID_R2T_RCVD[19] on CID : 385

Feb 20 05:01:15 rwcr6s5 kernel: [18390.877966] scsi host2: BM_1111 : In ISCSI_OP_REJECT

Feb 20 05:01:15 rwcr6s5 kernel: [18390.878950]  connection2:0: pdu (op 0x5 itt 0x5c01e400) rejected. Reason code 0x9

Feb 20 05:01:15 rwcr6s5 kernel: [18390.880651] scsi host2: BM_2208 : Event CXN_KILLED_FIN_RCVD[13] received on CID : 385

Feb 20 05:01:15 rwcr6s5 kernel: [18390.882128]  connection2:0: detected conn error (1011)

Feb 20 05:01:16 rwcr6s5 iscsid: Kernel reported iSCSI connection 2:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)

Feb 20 05:01:19 rwcr6s5 iscsid: connection2:0 is operational after recovery (1 attempts)

Feb 20 05:02:17 rwcr6s5 kernel: [18452.769625] scsi host2: BM_2208 : Event CXN_KILLED_BAD_WRB_INDEX_ERROR[15] received on CID : 386

Feb 20 05:02:17 rwcr6s5 kernel: [18452.770919]  connection2:0: detected conn error (1011)

Feb 20 05:02:18 rwcr6s5 iscsid: Kernel reported iSCSI connection 2:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)

Feb 20 05:02:21 rwcr6s5 iscsid: connection2:0 is operational after recovery (1 attempts)

Feb 20 05:02:27 rwcr6s5 kernel: [18463.059157] scsi host2: BM_2208 : Event CXN_KILLED_BAD_WRB_INDEX_ERROR[15] received on CID : 387

Feb 20 05:02:27 rwcr6s5 kernel: [18463.060591]  connection2:0: detected conn error (1011)

.

.

.

Feb 20 06:05:16 rwcr6s5 iscsid: Kernel reported iSCSI connection 2:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)

Feb 20 06:05:19 rwcr6s5 iscsid: connection2:0 is operational after recovery (1 attempts)

Feb 20 06:05:46 rwcr6s5 kernel: [22260.482950] scsi host2: BM_1111 : In ISCSI_OP_REJECT

Feb 20 06:05:46 rwcr6s5 kernel: [22260.510666]  connection2:0: pdu (op 0x5 itt 0x5c01dd00) rejected. Reason code 0x9

Feb 20 06:05:46 rwcr6s5 kernel: [22260.535976] scsi host3: BG_134 : Error detected on the adapter

Feb 20 06:05:46 rwcr6s5 kernel: [22260.535977] scsi host3: BG_143 : UE_LOW TXULP1  bit set

Feb 20 06:05:46 rwcr6s5 kernel: [22260.571966] scsi host2: BG_134 : Error detected on the adapter

Feb 20 06:05:46 rwcr6s5 kernel: [22260.571967] scsi host2: BG_143 : UE_LOW TXULP1  bit set

Feb 20 06:05:46 rwcr6s5 kernel: [22260.674788] scsi host2: BM_2208 : Event CXN_KILLED_FIN_RCVD[13] received on CID : 441

Feb 20 06:05:46 rwcr6s5 kernel: [22260.725516]  connection2:0: detected conn error (1011)

Feb 20 06:05:46 rwcr6s5 kernel: [22260.725606] be2net 0000:04:00.1: Unrecoverable Error detected in the adapter

Feb 20 06:05:46 rwcr6s5 kernel: [22260.751254] be2net 0000:04:00.1: Please reboot server to recover

Feb 20 06:05:46 rwcr6s5 kernel: [22260.776343] be2net 0000:04:00.1: UE: TXULP1  bit set

Feb 20 06:05:46 rwcr6s5 kernel: [22260.801178] be2net 0000:04:00.0: Unrecoverable Error detected in the adapter

Feb 20 06:05:46 rwcr6s5 kernel: [22260.825605] be2net 0000:04:00.0: Please reboot server to recover

Feb 20 06:05:46 rwcr6s5 kernel: [22260.849518] be2net 0000:04:00.0: UE: TXULP1  bit set

Feb 20 06:05:46 rwcr6s5 kernel: [22260.851903] bonding: bond0: link status down for backup interface em50, disabling it in 200 ms

Feb 20 06:05:46 rwcr6s5 kernel: [22260.951905] bonding: bond0: link status down for active interface em49, disabling it in 200 ms

Feb 20 06:05:46 rwcr6s5 kernel: [22261.055847] bonding: bond0: link status definitely down for interface em50, disabling it

Feb 20 06:05:46 rwcr6s5 kernel: [22261.159880] bonding: bond0: link status definitely down for interface em49, disabling it

Feb 20 06:05:46 rwcr6s5 kernel: [22261.159884] device em49 left promiscuous mode

Feb 20 06:05:46 rwcr6s5 kernel: [22261.159904] be2net 0000:04:00.0: Setting HW VLAN filtering failed.

Feb 20 06:05:46 rwcr6s5 iscsid: Kernel reported iSCSI connection 2:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)

Feb 20 06:05:47 rwcr6s5 kernel: [22261.871833] br0: port 1(bond0) entered disabled state

Feb 20 06:05:47 rwcr6s5 kernel: [22261.871980] br2: port 1(bond0.1300) entered disabled state

Feb 20 06:05:47 rwcr6s5 kernel: [22261.872078] br3: port 1(bond0.1320) entered disabled state

Feb 20 06:05:47 rwcr6s5 kernel: [22261.872180] br1: port 1(bond0.1321) entered disabled state

Feb 20 06:05:49 rwcr6s5 iscsid: Received iferror -5: Input/output error.

Feb 20 06:05:49 rwcr6s5 iscsid: cannot make a connection to 192.168.110.250:3260 (-5,2)

Feb 20 06:05:52 rwcr6s5 iscsid: Received iferror -5: Input/output error.

Feb 20 06:05:52 rwcr6s5 iscsid: cannot make a connection to 192.168.110.250:3260 (-5,2)

Feb 20 06:05:53 rwcr6s5 kernel: [22267.566711]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4300457632, last ping 4300458884, now 4300460136

Feb 20 06:05:53 rwcr6s5 kernel: [22267.624840]  connection1:0: detected conn error (1011)

Feb 20 06:05:53 rwcr6s5 iscsid: Kernel reported iSCSI connection 1:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)

Feb 20 06:05:55 rwcr6s5 iscsid: Received iferror -5: Input/output error.

Feb 20 06:05:55 rwcr6s5 iscsid: cannot make a connection to 192.168.110.250:3260 (-5,2)

Feb 20 06:05:56 rwcr6s5 iscsid: Received iferror -5: Input/output error.

.

.

.

Feb 20 06:22:45 rwcr6s5 iscsid: Received iferror -5: Input/output error.

Feb 20 06:22:45 rwcr6s5 iscsid: cannot make a connection to 192.168.110.250:3260 (-5,2)

Feb 20 06:22:45 rwcr6s5 kernel: [23280.016267] sd 2:0:1:2: rejecting I/O to offline device

Feb 20 06:22:45 rwcr6s5 kernel: [23280.027098] sd 2:0:1:2: rejecting I/O to offline device

Feb 20 06:22:45 rwcr6s5 kernel: [23280.033387] sd 2:0:1:2: rejecting I/O to offline device

Feb 20 06:22:45 rwcr6s5 kernel: [23280.038438] sd 2:0:1:2: rejecting I/O to offline device

 

 

Below are the kernel and package details :

 

uname -a

Linux reanvrsv101 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

 

dpkg -l | grep -i scsi

ii  lsscsi                              0.27-2                                amd64        list all SCSI devices (or hosts) currently on system

ii  open-iscsi                          2.0.873-3ubuntu9                      amd64        High performance, transport independent iSCSI implementation

 

dpkg -l | grep -i multipath

ii  multipath-tools                     0.4.9-3ubuntu7.4                      amd64        maintain multipath block device access

ii  multipath-tools-boot                0.4.9-3ubuntu7.4                      all          Support booting from multipath devices

 

 

multipath -ll

Error: : Inappropriate ioctl for device

cciss TUR  failed in CCISS_GETLUNINFO: Inappropriate ioctl for device

lun5 (360060e801059677005701f07000000f5) dm-44 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 2:0:1:5  sdo  8:224  active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 3:0:0:5  sdm  8:192  active ready  running

lun4 (360060e801059677005701f07000000f4) dm-46 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 3:0:0:4  sdk  8:160  active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 2:0:1:4  sdl  8:176  active ready  running

lun3 (360060e801059677005701f07000000f3) dm-41 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 2:0:1:3  sdj  8:144  active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 3:0:0:3  sdi  8:128  active ready  running

lun2 (360060e801059677005701f07000000f2) dm-47 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 3:0:0:2  sdg  8:96   active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 2:0:1:2  sdh  8:112  active ready  running

lun1 (360060e801059677005701f07000000f1) dm-36 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 2:0:1:1  sdf  8:80   active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 3:0:0:1  sde  8:64   active ready  running

lun12 (360060e801059677005701f07000000fc) dm-39 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 3:0:0:12 sdw  65:96  active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 2:0:1:12 sdah 66:16  active ready  running

lun11 (360060e801059677005701f07000000fb) dm-38 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 2:0:1:11 sdaf 65:240 active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 3:0:0:11 sdv  65:80  active ready  running

lun9 (360060e801059677005701f07000000f9) dm-43 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 2:0:1:9  sdaa 65:160 active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 3:0:0:9  sds  65:32  active ready  running

lun10 (360060e801059677005701f07000000fa) dm-42 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 3:0:0:10 sdt  65:48  active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 2:0:1:10 sdac 65:192 active ready  running

lun8 (360060e801059677005701f07000000f8) dm-48 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 3:0:0:8  sdq  65:0   active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 2:0:1:8  sdx  65:112 active ready  running

lun7 (360060e801059677005701f07000000f7) dm-45 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 2:0:1:7  sdu  65:64  active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 3:0:0:7  sdp  8:240  active ready  running

lun6 (360060e801059677005701f07000000f6) dm-40 HITACHI,DF600F

size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw

|-+- policy='round-robin 0' prio=1 status=active

| `- 3:0:0:6  sdn  8:208  active ready  running

`-+- policy='round-robin 0' prio=0 status=enabled

  `- 2:0:1:6  sdr  65:16  active ready  running

Outcomes