
SUMMARY: hardware RAID domain panics



Summarized issue:
ES45, Tru64 5.1A (no patches)
External hardware RAID: Western Scientific F4 Tornado RAID IDE-SCSI
    3TB, partitioned and presented to Tru64 as 2TB and 1.3TB LUNs. Each
    LUN holds a single domain containing a single fileset.
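For scale: a sector count held in a signed 32-bit field tops out at
2**31 - 1, and 2**31 sectors of 512 bytes is 1 TiB (about 1.1 TB), so
both LUNs here are larger than that. A minimal sketch of the arithmetic,
assuming decimal terabytes and 512-byte sectors (neither is stated in
the report):

```python
# Sketch: sectors needed per LUN, compared with 32-bit integer limits.
# ASSUMPTIONS: 1 TB = 10**12 bytes and 512-byte sectors.
SECTOR_BYTES = 512
INT32_MAX = 2**31 - 1    # largest signed 32-bit value
UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit value


def sectors_for(size_bytes: int) -> int:
    """Number of 512-byte sectors needed to cover size_bytes (rounded up)."""
    return -(-size_bytes // SECTOR_BYTES)  # ceiling division


luns = {"2TB LUN": 2 * 10**12, "1.3TB LUN": 13 * 10**11}

for name, size in luns.items():
    n = sectors_for(size)
    print(f"{name}: {n} sectors, "
          f"fits signed 32-bit: {n <= INT32_MAX}, "
          f"fits unsigned 32-bit: {n <= UINT32_MAX}")
```

Under these assumptions both LUNs fit an unsigned 32-bit count but not
a signed one.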

Both domains worked as configured until they were roughly 40% and 56%
full, at which point AdvFS I/O errors began, followed shortly by domain
panics and the domain being withdrawn from service.

fixfdmn showed the following:

fixfdmn -n d12
fixfdmn: Checking the RBMT.
fixfdmn: Can't read page at block -660733904 on '/dev/disk/dsk12c'.
fixfdmn: Invalid argument
fixfdmn: Error correcting the RBMT.

Was this OS- or hardware-related?

Additional evidence came later from examining the disklabel I had applied:
#            size       offset    fstype  fsize  bsize   cpg  # ~Cyl values
  a:       131072            0    unused      0      0        #      0 - 7
  b:       262144       131072    unused      0      0        #      8 - 23
  c:  -1651834880            0     AdvFS                      #      0 - 161323
  d:            0            0    unused      0      0        #      0 - 0
  e:            0            0    unused      0      0        #      0 - 0
  f:            0            0    unused      0      0        #      0 - 0
  g:   1321369600       393216    unused      0      0        #     24 - 80673
  h:   1321369600   1321762816    unused      0      0        #  80674 - 161323
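The negative size on partition c is consistent with a 32-bit overflow:
read as an unsigned 32-bit value it becomes 2643132416 sectors, which is
exactly where partition h ends (1321762816 + 1321369600) and, assuming
512-byte sectors, about 1.35 TB, in line with the 1.3TB LUN. A minimal
sketch of that arithmetic; the sector size and the 16384-sector cylinder
are inferred from the label, not stated in the report:

```python
# Sketch: reinterpret a size printed as a negative signed 32-bit integer
# as the unsigned 32-bit value it most likely represents, then
# cross-check it against other entries in the same label.


def as_uint32(value: int) -> int:
    """Reinterpret a signed 32-bit value as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


c_size = as_uint32(-1651834880)             # 'c' size as printed in the label
h_offset, h_size = 1321762816, 1321369600   # 'h' partition from the same label
sectors_per_cyl = 131072 // 8               # inferred: 'a' is 131072 sectors over cyls 0-7

print(c_size)                         # 2643132416
print(c_size == h_offset + h_size)    # True: 'c' ends exactly where 'h' ends
print(c_size * 512)                   # 1353283796992 bytes, assuming 512-byte sectors
print(c_size // sectors_per_cyl - 1)  # 161323, the last ~Cyl value shown for 'c'
```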
-------------------------

Answer: The problem was twofold, and was not hardware-related.
	1. Patch Kit 3, at minimum, is required for its AdvFS fixes
	   (I have installed Patch Kit 6 for 5.1A).
	2. The disklabel applied to the LUNs was wrong, as hinted
	   by the negative partition size (partition c) in the label.
	   I had applied a default disklabel by running
		disklabel -rw dsk12
	   This is wrong! I should have used the following syntax,
	   which forces disklabel to query the disk (in this case
	   the hardware RAID controller) for disk information:
		disklabel -rwt advfs dsk12 junk
	   where 'junk' is any name not found in /etc/disktab.
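Before creating a domain on a newly labeled LUN, a quick sanity check of
the saved label text would have caught the bad entry. A minimal sketch;
check_label is a hypothetical helper, and the row layout it parses is
assumed from the listing above, not from any documented format:

```python
from __future__ import annotations

import re

# Matches partition rows such as "  c:  -1651834880   0   AdvFS ..." in the
# layout shown above (ASSUMPTION about the disklabel output format).
ROW = re.compile(r"^\s*([a-h]):\s+(-?\d+)\s+(-?\d+)")


def check_label(text: str) -> list[str]:
    """Return warnings for suspicious partition entries in disklabel text."""
    parts = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            parts[m.group(1)] = (int(m.group(2)), int(m.group(3)))

    warnings = []
    for name, (size, offset) in sorted(parts.items()):
        if size < 0 or offset < 0:
            warnings.append(f"{name}: negative size/offset ({size}, {offset})")

    # Partition 'c' conventionally spans the whole disk; nothing should end past it.
    if "c" in parts and parts["c"][0] >= 0:
        disk_end = parts["c"][1] + parts["c"][0]
        for name, (size, offset) in sorted(parts.items()):
            if size > 0 and offset + size > disk_end:
                warnings.append(f"{name}: extends past the end of 'c'")
    return warnings


sample = """
  c:  -1651834880            0     AdvFS                      #      0 - 161323
  g:   1321369600       393216    unused      0      0        #     24 - 80673
"""
print(check_label(sample))  # ['c: negative size/offset (-1651834880, 0)']
```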

Many thanks to:
John Farmer
Bob Harris
Robert Collins
Alan Rollow
-- 
Neil R. Smith, Comp. Sys. Mngr.		neils@xxxxxxxx
Dept. Atmospheric Sci., Texas A&M Univ.	979/845-6272 FAX:979/862-4466