SUMMARY: trucluster question



I received two replies, both saying that one node of the cluster is
expected to crash when the interconnect is lost. I find this somewhat
surprising (and distressing), since I expected the orphaned node to
wait patiently until the cluster interconnect was restored and then
rejoin the cluster.

Steve

On Wed, Jul 31, 2002 at 03:05:40PM -0400, Steve Feehan wrote:
> I have just set up a two-node TruCluster (5.1A) on two DS10Ls with
> a LAN interconnect.
> 
> To see what happens when the LAN interconnect is broken, I unplugged
> the cable. Before disconnecting, I spread the file systems across the
> two nodes (i.e., / and /var on member1, /usr on member2) just to make
> things interesting.
> 
> I disconnected the cable and both systems appeared to hang, which is
> expected.
> 
> After about two minutes one node came back online, with cfsmgr
> showing that it had taken over the other node's file systems.
> 
> The unexpected bit is that the other node had crashed. I switched over
> to the console to find it at the >>> prompt.
> 
> So my two questions:
> 
>  1. why did one of the members crash?
> 
>  2. is there a way to reduce the timeout between the cluster
>     members separating and one member taking over? And if so, is
>     this a good idea, or should I not mess with the defaults?
> 
> Thanks.
> 
> -- 
> Steve Feehan
> Unix Systems Administrator
> Structural Biology and Bioinformatics Group
> University of Vermont

-- 
Steve Feehan