Checking the Logs on an HBA Card Fault and Identifying the PCI Slot for HBA Card Replacement

CentOS 5, 6 | March 29, 2014 21:09

When an HBA card port fails on a Linux host that uses SAN storage, the card to be replaced can be identified as follows.

1. At the OS level, find the FC Link Down message in /var/log/messages and note the PCI bus address of the affected HBA card

[root@marine1 ~]# grep -A7 "kernel: lpfc" /var/log/messages
Mar 26 01:29:43 marine1 kernel: lpfc 0000:0b:00.1: 1:1305 Link Down Event xe received Data: xe x20 x80110 x0 x0
Mar 26 01:30:13 marine1 kernel:  rport-3:0-2: blocked FC remote port time out: saving binding
Mar 26 01:30:13 marine1 kernel: Error:Mpx:Path Bus 3 Tgt 0 Lun 141 to 000492600113 is dead.
Mar 26 01:30:13 marine1 kernel: Error:Mpx:Path Bus 3 Tgt 0 Lun 161 to 000492600113 is dead.
Mar 26 01:30:13 marine1 kernel: Error:Mpx:Path Bus 3 Tgt 0 Lun 162 to 000492600113 is dead.
Mar 26 01:30:13 marine1 kernel: Error:Mpx:Path Bus 3 Tgt 0 Lun 163 to 000492600113 is dead.
Mar 26 01:30:13 marine1 kernel: Error:Mpx:Path Bus 3 Tgt 0 Lun 9 to 000492600113 is dead.
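
When the log is long, the PCI addresses of the ports that went down can be pulled out mechanically. The function below is a minimal sketch that assumes the lpfc message layout shown above (lpfc, then domain:bus:device.function, then "Link Down Event"); the name lpfc_link_down_ports is hypothetical, and other FC drivers log in a different format.

```shell
# Hypothetical helper: print each PCI address that logged an lpfc "Link Down Event".
# Reads syslog text on stdin; assumes the lpfc message format shown above.
lpfc_link_down_ports() {
    grep 'kernel: lpfc .*Link Down Event' |
        sed -n 's/.*kernel: lpfc \([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{2\}:[0-9a-fA-F]\{2\}\.[0-7]\):.*/\1/p' |
        sort -u    # one line per port, even if the link went down several times
}
```

On the host (as root), lpfc_link_down_ports < /var/log/messages prints 0000:0b:00.1 for the log above.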

2. Using the PCI bus address found above, look up the HBA card, the PCI slot it is installed in, and the SCSI host number it maps to

[root@marine1 ~]# lspci | grep -i "fibre channel" | grep -i "0b:00.1"
0b:00.1 Fibre Channel: Emulex Corporation Zephyr-X LightPulse Fibre Channel Host Adapter (rev 02)
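
The kernel log prints the address with its PCI domain (0000:0b:00.1), whereas plain lspci omits the domain on machines that only have domain 0000, which is why the grep above searches for 0b:00.1. A small sketch of that conversion follows; pci_strip_domain is a hypothetical name. (lspci -s 0000:0b:00.1 also accepts the full form directly.)

```shell
# Hypothetical helper: 0000:0b:00.1 -> 0b:00.1 (domain:bus:dev.fn -> bus:dev.fn).
# Addresses that already lack a domain are printed unchanged.
pci_strip_domain() {
    case $1 in
        *:*:*) printf '%s\n' "${1#*:}" ;;   # drop everything up to the first colon
        *)     printf '%s\n' "$1" ;;
    esac
}

# Usage on the host; grep -F keeps the "." from acting as a regex wildcard:
# lspci | grep -i "fibre channel" | grep -F "$(pci_strip_domain 0000:0b:00.1)"
```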

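[root@marine1 ~]# dmidecode -t slot | grep "0b:00" -B7
	Designation: PCI-E Slot 9
	Type: x8 PCI Express Gen 2 x16
	Current Usage: In Use
	Length: Long
	Characteristics:
		3.3 V is provided
		PME signal is supported
	Bus Address: 0000:0b:00.0

dmidecode records the slot under function 0 of the device (0000:0b:00.0), while the failed port is function 1 (0b:00.1) of the same card, which is why the grep matches on 0b:00 without the function number. The failed card is therefore the one in PCI-E Slot 9.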
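
To avoid depending on the -B7 context length, the slot can also be found by parsing the whole dmidecode -t slot output. The sketch below (slot_for_pci is a hypothetical name) ignores the function number when comparing addresses. It assumes the SMBIOS tables fill in each slot's Bus Address, which not every BIOS does; without that field the function prints nothing.

```shell
# Hypothetical helper: map a PCI port (domain:bus:device.function) to its slot name.
# Reads `dmidecode -t slot` output on stdin. The function number is ignored because
# dmidecode records the slot under one function (here .0) while the port may be .1.
slot_for_pci() {
    local dev=${1%.*}    # 0000:0b:00.1 -> 0000:0b:00
    awk -v dev="$dev" '
        /^[ \t]*Designation:/ {
            slot = $0
            sub(/^[ \t]*Designation:[ \t]*/, "", slot)   # keep only the slot name
        }
        /^[ \t]*Bus Address:/ {
            addr = $NF                  # e.g. 0000:0b:00.0
            sub(/\.[0-7]$/, "", addr)   # drop the function number
            if (addr == dev) print slot
        }'
}
```

For the system above, dmidecode -t slot | slot_for_pci 0000:0b:00.1 would be expected to print PCI-E Slot 9.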

[root@marine1 ~]# find /sys/class/fc_host/ -type l -name device -exec ls -l {} \; | grep -i "0b:00.1"
lrwxrwxrwx 1 root root 0 3 26 11:35 /sys/class/fc_host/host3/device -> ../../../devices/pci0000:00/0000:00:07.0/0000:0b:00.1/host3
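
The find command resolves which SCSI host sysfs created for the port. A sketch of the same lookup using readlink -f follows; fc_host_for_pci and its optional second argument (an alternate sysfs root, there only so the function can be tried on a mock directory tree) are hypothetical. On kernels where the /sys/class/fc_host/hostN entries are themselves symlinks, find without -L does not descend into them, whereas the glob below should be resolved through them.

```shell
# Hypothetical helper: print the SCSI host (e.g. host3) that sysfs attaches to a PCI port.
# $1 = PCI address as in the kernel log; $2 = sysfs root (default /sys, override for testing).
fc_host_for_pci() {
    local pci=$1 root=${2:-/sys} link target
    for link in "$root"/class/fc_host/host*/device; do
        [ -e "$link" ] || continue             # no FC hosts: the glob stays unexpanded
        target=$(readlink -f "$link")          # e.g. .../0000:0b:00.1/host3
        if [ "$(basename "$(dirname "$target")")" = "$pci" ]; then
            basename "$target"                 # the hostN directory name
        fi
    done
}
```

On the host above, fc_host_for_pci 0000:0b:00.1 should print host3.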

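3. Using the PCI bus address and the host number it maps to, confirm the fault at the storage level (example based on EMC storage managed with PowerPath)

In the sysfs output above, 0000:0b:00.1 maps to host3, which is the adapter numbered 3 in the powermt output.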

[root@marine1 ~]# powermt display
Symmetrix logical device count=163
CLARiiON logical device count=0
Invista logical device count=0
==============================================================================
----- Host Bus Adapters ---------  ------ I/O Paths -----  ------ Stats ------
###  HW Path                       Summary   Total   Dead  IO/Sec Q-IOs Errors
==============================================================================
   2 lpfc                          optimal     163      0       -     0      0
   3 lpfc                          failed      163    163       -     0    163
   4 lpfc                          optimal     163      0       -     0      0
   5 lpfc                          optimal     163      0       -     0      0
   6 lpfc                          optimal     163      0       -     0      0
   7 lpfc                          optimal     163      0       -     0      0
   8 lpfc                          optimal     163      0       -     0      0
   9 lpfc                          optimal     163      0       -     0      0
  10 lpfc                          optimal     163      0       -     0      0
  11 lpfc                          optimal     163      0       -     0      0
  12 lpfc                          optimal     163      0       -     0      0
  13 lpfc                          optimal     163      0       -     0      0
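
On hosts with many adapters the failed one is easy to miss in this table. A minimal awk filter over the powermt display summary is sketched below; it assumes the column order shown above (###, HW Path, Summary, Total, Dead, ...) and a single-word HW Path, and the name powermt_failed_hbas is hypothetical.

```shell
# Hypothetical helper: list adapters whose PowerPath summary is not "optimal".
# Reads `powermt display` output on stdin; column order as in the listing above.
powermt_failed_hbas() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 5 && $3 != "optimal" {
        # $1 = HBA (host) number, $3 = summary, $4 = total paths, $5 = dead paths
        printf "host%s %s dead=%s/%s\n", $1, $3, $5, $4
    }'
}
```

For the output above, powermt display | powermt_failed_hbas prints host3 failed dead=163/163.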

[root@marine1 ~]# powermt display dev=all | head -8 && powermt display dev=all | grep '1$'
Pseudo name=emcpowera
Symmetrix ID=000492600113
Logical device ID=0055
state=alive; policy=SymmOpt; priority=0; queued-IOs=0;
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --  -- Stats ---
###  HW Path                I/O Paths  Interf.    Mode    State   Q-IOs Errors
==============================================================================
   3 lpfc                   sdfj       FA 10fB    active  dead        0      1
   3 lpfc                   sdfk       FA 10fB    active  dead        0      1
   3 lpfc                   sdfl       FA 10fB    active  dead        0      1
   3 lpfc                   sdfm       FA 10fB    active  dead        0      1
   3 lpfc                   sdfn       FA 10fB    active  dead        0      1
   3 lpfc                   sdfo       FA 10fB    active  dead        0      1
   3 lpfc                   sdfp       FA 10fB    active  dead        0      1
...
(remaining output omitted)
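
The grep '1$' filter keys on lines ending in 1, which matches the dead paths here (Errors = 1) but can also match unrelated lines, for example a Logical device ID ending in 1. A more explicit filter is sketched below, assuming the path-line layout shown above (host number first, native device third, and a dead state column); dead_paths_for_host is a hypothetical name.

```shell
# Hypothetical helper: native sd devices with dead paths on one HBA (host) number.
# Reads `powermt display dev=all` output on stdin; layout as in the listing above.
dead_paths_for_host() {
    awk -v host="$1" '$1 == host && / dead / { print $3 }' | sort -u
}
```

powermt display dev=all | dead_paths_for_host 3 lists the native devices behind host3 (sdfj, sdfk, and so on); the number of lines should match the Dead count that powermt display reports for that host.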

[root@marine1 ~]# powermt check dev=all
Warning: Symmetrix device path sdfj is currently dead.
Do you want to remove it (y/n/a/q)?
[root@marine1 ~]#