  • InfiniBand IPoIB Debug FAQ

    Here's an update to my initial attempt at an IPoIB FAQ:

    ping doesn't work between IPoIB nodes. What should I do?

    First, verify that the ports are active.

    This can be done via:

    cat /sys/class/infiniband/mthca0/ports/1/state

    This should print "4: ACTIVE", assuming the HCA is mthca0 and port 1 is
    the one plugged into the subnet (switch, etc.).
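
    The same check can be repeated for every HCA and port in one pass. The
    following is a minimal sketch, assuming a POSIX shell. The function names
    ib_port_states and ib_state_is_active are made up for illustration, and
    the sysfs root is a parameter only so the function can be pointed at a
    test directory:

```shell
# List the state of every InfiniBand port found under a sysfs root.
# Default root is /sys/class/infiniband; pass another path for testing.
ib_port_states() {
    root="${1:-/sys/class/infiniband}"
    for f in "$root"/*/ports/*/state; do
        [ -r "$f" ] || continue              # glob matched nothing readable
        port_dir=$(dirname "$f")             # .../<hca>/ports/<port>
        port=$(basename "$port_dir")
        hca=$(basename "$(dirname "$(dirname "$port_dir")")")
        printf '%s port %s: %s\n' "$hca" "$port" "$(cat "$f")"
    done
}

# Succeeds only if the given state string is the ACTIVE state.
ib_state_is_active() {
    case "$1" in
        "4: ACTIVE") return 0 ;;
        *)           return 1 ;;
    esac
}
```

    Calling ib_port_states with no argument reads the live sysfs tree; any
    port that does not report "4: ACTIVE" is a candidate for the checks
    described next.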

    If the port is not active, there could be several reasons:

    1. You need a subnet manager (SM) in your subnet to bring the ports to
    active. Do you have an SM? It can be embedded in a switch or some other
    IB hardware, or run on an end node (HCA), although OpenIB (gen2) does not
    currently support running it on an end node.

    2. If you have an SM in your subnet, there might be a cabling problem
    where the SM cannot "reach" your end node.

    If the port is active, report the subnet configuration and which SM is
    in use.

    Do /sys/class/net/ib0/statistics/rx_packets and/or "tcpdump -i ib0"
    show anything on the other nodes while you try to ping them?
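
    One way to read the counter is to sample it twice while the ping is
    running on the other node. A small sketch, assuming a POSIX shell; the
    function name rx_delta and the 5 second default are illustrative choices,
    not part of any IPoIB tool:

```shell
# Report how many packets were received during a short window.
# Arguments: path to an rx_packets counter file, seconds to wait (default 5).
rx_delta() {
    counter_file="$1"
    wait_s="${2:-5}"
    before=$(cat "$counter_file") || return 1
    sleep "$wait_s"
    after=$(cat "$counter_file") || return 1
    echo $((after - before))
}

# Example use on the node being pinged (ib0 is an assumed interface name):
#   rx_delta /sys/class/net/ib0/statistics/rx_packets 5
```

    A result of 0 while the peer is pinging suggests the packets never reach
    this node's ib0 interface.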

    There are two levels of IPoIB debug which can be enabled when building
    the kernel: IP-over-InfiniBand debugging and IP-over-InfiniBand data path
    debugging. The latter has performance implications and should only be
    enabled when all else fails. Enable the first level of IPoIB debug and
    then:

    mount -t ipoib_debugfs none /ipoib_debugfs/
    cat /ipoib_debugfs/ib0_mcg
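
    Before mounting, it can help to confirm which of the two debug levels the
    running kernel was built with. A hedged sketch: the two option names are
    the kernel's Kconfig symbols for the debug levels named above, while the
    function name ipoib_debug_options and the config file location are
    assumptions (the path varies by distribution):

```shell
# Report which IPoIB debug options are enabled in a kernel config file.
# Argument: path to the config, e.g. "/boot/config-$(uname -r)" (assumed).
ipoib_debug_options() {
    config="$1"
    for opt in CONFIG_INFINIBAND_IPOIB_DEBUG CONFIG_INFINIBAND_IPOIB_DEBUG_DATA; do
        # The "=y" suffix keeps DEBUG from matching the DEBUG_DATA line.
        if grep -q "^${opt}=y" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
        fi
    done
}

# Example use:
#   ipoib_debug_options "/boot/config-$(uname -r)"
```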

    Other things to verify and supply to help isolate the problem:

    1. Verify the firmware version via

    cat /sys/class/infiniband/mthca0/fw_ver

    For PCI-X HCAs, version 3.2.0 is recommended. For PCIe HCAs, version
    4.5.3 is recommended.
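
    Comparing the reported version against the recommended one can be
    scripted. This sketch treats the recommended version as a minimum, which
    is an assumption on top of the text above; the function name
    version_at_least is illustrative, and "sort -V" is a GNU coreutils
    extension:

```shell
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example use for a PCI-X HCA (mthca0 is an assumed device name):
#   version_at_least "$(cat /sys/class/infiniband/mthca0/fw_ver)" 3.2.0 \
#       || echo "firmware older than 3.2.0"
```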

    2. Make sure the IB modules are loaded:
    /sbin/lsmod | grep ib_
    should show ib_mthca (HCA driver) as well as ib_ipoib. There are other
    IB modules, but those are the two which need to be loaded explicitly;
    any others they depend on will follow.
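
    The module check can also be done without eyeballing the lsmod output. A
    sketch, assuming a module list in /proc/modules format (module name in
    the first field); the function name modules_loaded is made up for
    illustration:

```shell
# Check that each named module appears in a /proc/modules style list.
# Arguments: list file, then one or more module names.
# Returns non-zero if any module is missing.
modules_loaded() {
    list="$1"; shift
    missing=0
    for mod in "$@"; do
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$list"; then
            echo "$mod: loaded"
        else
            echo "$mod: MISSING"
            missing=1
        fi
    done
    return "$missing"
}

# Example use:
#   modules_loaded /proc/modules ib_mthca ib_ipoib
```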

    3. Make sure there are no errors in /var/log/messages pertaining to ib_.

    4. Report the IP configuration via
    /sbin/ifconfig -a
    and
    ip addr show dev ib0
    (assuming ib0 is the network interface being configured)

    Both are requested because ifconfig can only show the first 16 octets of
    the HW address (and the last two of those are actually wrong, because the
    SIOCGIFHWADDR ioctl that it uses can only return 14 bytes). IPoIB has a
    20-byte HW address, so four bytes are cut off entirely and six in total
    are not reported correctly. These are the low-order bytes of the port
    GID, which is probably where the difference between port GIDs is.

    To see the real IB hardware address, you need to do something like
    "ip addr show dev ib0". For example,
        # ifconfig ib0
        ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
                  BROADCAST MULTICAST  MTU:2044  Metric:1
                  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 txqueuelen:128
                  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

        # ip addr show dev ib0
        5: ib0: <BROADCAST,MULTICAST> mtu 2044 qdisc noop qlen 128
            link/[32] 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:01:07:8c:e4:61 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
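
    The 20-byte address printed by "ip addr show" can be split into its
    parts to make the port GID easier to compare between nodes. This sketch
    assumes the IPoIB link-layer layout of one reserved/flags byte, a
    three-byte queue pair number (QPN) and a 16-byte GID; the function name
    ipoib_hwaddr_parts is made up for illustration:

```shell
# Split a 20-byte IPoIB hardware address (colon-separated hex, as printed
# by "ip addr show") into flags, QPN and GID.
ipoib_hwaddr_parts() {
    addr="$1"
    # Expect exactly 20 colon-separated bytes.
    nbytes=$(printf '%s\n' "$addr" | awk -F: '{ print NF }')
    [ "$nbytes" -eq 20 ] || { echo "expected 20 bytes, got $nbytes" >&2; return 1; }
    printf '%s\n' "$addr" | awk -F: '{
        printf "flags: %s\n", $1
        printf "qpn:   0x%s%s%s\n", $2, $3, $4
        printf "gid:   "
        # Print the 16 GID bytes as eight colon-separated 16-bit groups.
        for (i = 5; i <= 20; i += 2)
            printf "%s%s%s", $i, $(i + 1), (i < 19 ? ":" : "\n")
    }'
}

# Example use with the address from the output above:
#   ipoib_hwaddr_parts 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:01:07:8c:e4:61
```

    For the address shown above, the GID part comes out as
    fe80:0000:0000:0000:0002:c901:078c:e461, whose low-order bytes are the
    ones ifconfig fails to show.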


    5. Use
    ip neigh show dev ib0
    to display the ARP (neighbor) table for the IB interface ib0.
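
    When the neighbor table is long, the entries that failed to resolve are
    usually the interesting ones. A small sketch that filters them out,
    assuming the usual iproute2 output where the entry state is the last
    field on each line; the function name unresolved_neighbors is
    illustrative:

```shell
# Read "ip neigh show" output on stdin and print the IP addresses whose
# entry ends in FAILED or INCOMPLETE, i.e. neighbors that did not resolve.
unresolved_neighbors() {
    awk '$NF == "FAILED" || $NF == "INCOMPLETE" { print $1 }'
}

# Example use:
#   ip neigh show dev ib0 | unresolved_neighbors
```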
  • Original article: https://www.cnblogs.com/super119/p/2017826.html