DMU clients hang without explanation



                                                          Uppsala,  3-AUG-2001

    Hi,

    We have a Digital UNIX cluster consisting of some 30 nodes.
    One node is the DMU master; the other nodes boot from it. Some
    of the nodes are used for CPU-intensive calculations and I/O work, and
    these hang about once a month without leaving any trace in the error
    logs (uerf, /var/adm/messages, /var/adm/syslog.dated). The other nodes
    show no problems whatsoever.

    I suspect the hangs are related to network access between the client
    nodes and the DMU master node, possibly triggered by the CPU- and
    I/O-intensive jobs running on these nodes. Does anybody know if this
    is a plausible guess, and if so, what would be the best way to
    improve the situation?

    The machines in question are:
    DMU master:   DPWS 600au, running Digital UNIX 4.0E
    DMU client 1: DPWS 600au,                      4.0E
               2: DS10                             4.0F
               3: DS10                             4.0F
               4: XP1000                           4.0F
    The DMU master and clients 1, 2 and 3 are connected to the same Cisco
    XL3548 switch with full-duplex 100 Mb/s connections. The other DMU
    clients are connected to similar Cisco switches via a Gigabit backbone.
    The remaining nodes are DEC 3000/300 and AlphaStation 200 machines
    running Digital UNIX 4.0E and 4.0F. We have no problems with these
    machines, only with clients 1-4.

    Thank you for your kind help,

                                   Roger Ruber.

  **************************************************************************
  *                      Roger Ruber, ruber@tsl.uu.se                      *
  *     The Svedberg Laboratory, P.O. Box 533, S-75121 Uppsala, Sweden     *
  *   +46 - 18 - 471 3109 (telephone)    (facsimile) +46 - 18 - 471 3833   *
  **************************************************************************