We are a fairly large ISP with several Portmaster PM2E-30's in several
different geographical locations. These units are all on UPS, and are all
on UTP ethernet, on a couple of different heterogeneous LANs, mostly Linux
servers, a couple of Suns. NYNEX T1's interconnect the LANs, with IRX 114's
and 111's on either end of the T1's, with Adtran and TXport CSU/DSU's
(each T1 with the same CSU/DSU on either end). We are using Cardinal v.34
modems (damn good, btw, and very inexpensive) and have some Motorola
BitSurfr's and UTA220's on the async ports of these units. The problem is
this:
We have an alarming number of incidents where part of the FLASH in the
Portmaster becomes corrupted, usually the `System Messages' area, so that
every once in a while, a Portmaster will lose its brains, and upon any
connection to the unit, instead of the usual, comforting

    ComOS - Livingston PortMaster

one sees something like

    SYSMSG453 SYSMSG455 SYSMSG932

and no login: prompt appears until a CR prompts it. After logging in, a
`sho ses' (or any other command, for that matter) elicits the same
response from the Portmaster.
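
For anyone who wants to catch this before subscribers do, the symptom is
easy to test for once the banner text has been captured (from a telnet
session log, say). This is only a rough sketch: the function name and the
regular expression are my own, and it assumes that the corrupted output
always takes the form of bare SYSMSG<number> tokens and that a healthy unit
always prints the banner quoted above.

```python
import re

# Assumption: a healthy unit's banner contains this text, as quoted above.
HEALTHY_BANNER = "ComOS - Livingston PortMaster"

# Assumption: corrupted message lookups appear as bare tokens like SYSMSG453.
SYSMSG_TOKEN = re.compile(r"\bSYSMSG\d+\b")


def classify_banner(text):
    """Return 'corrupted', 'healthy', or 'unknown' for captured banner text."""
    # Check for the failure symptom first, so a mixed capture is still flagged.
    if SYSMSG_TOKEN.search(text):
        return "corrupted"
    if HEALTHY_BANNER in text:
        return "healthy"
    return "unknown"
```

How the banner gets captured is left out on purpose, since that depends on
the local setup; the function only looks at text it is handed.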
In most cases, either re-applying the ComOS (through PMConsole) or a FLASH
format and then ComOS reload solves the problem. However, there have been
a few times when we have not been able to recover the machine, and have
had to RMA the unit.
After the first couple of times this happened, the folks at Livingston
indicated that this could be the result of power spikes/surges, so we put
all the units on conditioned power. The problem has continued since then.
Besides, it happens to Portmasters in geographically diverse locations, so
it is unlikely that a power problem is the cause.
We are baffled. I have been reading this list for close to 2 years now,
and have _maybe_ once heard of a similar problem, perhaps happening once
to one person/organization. However, this happens to us fairly frequently,
maybe once a month or more. Early on, it happened to 4 different machines
in one week, in 2 different locations. However, this was before the PM's
were put onto UPS's.
Does anyone have any idea what is so different about our network that
could cause this sort of problem? All our subscribers are doing SLIP/PPP
with either MacTCP/MacPPP, Windoze 3.1 with Trumpet or some commercial
package, or Win95. We use RADIUS authentication on a Linux machine. I
cannot think of any other info that might be relevant to this.
Any clues?
Chris Woods Senior System Administrator USAinternet, Inc.
GCS/CM/IT d- s++:+ a- C++++$ ULS++++$ P+++$>++++ L++++$ E W$ N+ !o
K++ !w--- !O !M-- !V-- PS+? !PE !Y+>++ PGP+ t+@ !5 X !R tv? b+ DI++
D+@ G++ e h---- r+++ y++++
cjwoods@usa1.net http://www.usa1.com 508-774-4700