Re: Pm3.3.2c1 upgrade

Rich Adamson (adar0@routers.com)
Fri, 27 Sep 96 17:14:35 CST

Next message: Luc Croteau: "Radius authentication"
Previous message: David Denney: "Re: Microcom Modems"

>VERY OLD Pm2e's have some problem which causes the box to lose part of
>its mind when upgraded to 3.3.2. Along with the S0 problem which can
>be solved by reformatting, there is also a problem with the Ethernet MAC
>address changing to some constant value. If you notice the bad
>MAC address, the box REQUIRES RMAing to fix.

I've just spent the better part of three weeks upgrading various
'older' PM2's (e.g., 2.4, 3.1.4) to 3.3.2c1, and think I can add
constructive comments to the above.

1. The problem referred to above, where the MAC address changes,
happens with the older models that have an EPROM version of "B",
and possibly those up to about "E". The problem is easy to spot:
if you translate the hex codes to ASCII, the MAC address reads
"<cr><lf>WARN". You can see the changed MAC address in every
device (router, Unix system) that has a displayable ARP cache.
If it affects your operation (i.e., you have two or more older
PM2's on the exact same Ethernet segment) and it actually does
create a problem for you, then Livingston will replace the EPROMs.
However, there is no significant reason for people to replace
the EPROMs on older PM2's just to "remain current".
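For anyone scanning ARP caches, the hex-to-ASCII check in item 1
can be done mechanically. This is only a sketch. It assumes the bad
address is exactly the six bytes CR, LF, W, A, R, N
(0d:0a:57:41:52:4e), which is inferred from the "<cr><lf>WARN"
description rather than read off an affected box. The function
names are hypothetical.

```python
# Hypothetical helper: does a MAC address, read as ASCII, spell "\r\nWARN"?
# The byte values are inferred from the "<cr><lf>WARN" description,
# not taken from a real PM2.

SUSPECT_TEXT = "\r\nWARN"  # CR, LF, then the letters W, A, R, N


def mac_to_ascii(mac: str) -> str:
    """Decode a colon- or dash-separated MAC address into raw characters."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    # int(..., 16) also accepts octets printed without a leading zero
    return bytes(int(o, 16) for o in octets).decode("latin-1")


def is_warn_mac(mac: str) -> bool:
    """True if the address decodes to the suspect "<cr><lf>WARN" text."""
    return mac_to_ascii(mac) == SUSPECT_TEXT


def warn_mac() -> str:
    """Build the colon-separated MAC address that decodes to SUSPECT_TEXT."""
    return ":".join(f"{b:02x}" for b in SUSPECT_TEXT.encode("ascii"))


if __name__ == "__main__":
    print(warn_mac())                        # 0d:0a:57:41:52:4e
    print(is_warn_mac("d:a:57:41:52:4e"))    # True
    print(is_warn_mac("00:11:22:33:44:55"))  # False
```

Some arp implementations print octets without leading zeros
(e.g. "d:a:57:..."). Parsing each octet with int(..., 16) handles
both forms, so the same check can be run against any arp cache
listing.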

2. None of the problems seen in upgrading lots of PM2's have
required any form of RMA. There are valid workarounds for
just about every problem. The real exception is when the
NVRAM is lost, and for the most part one cannot predetermine
or test for that to know if it has gone south. The issue
becomes one of having to "change" NVRAM values, and once a problem
is identified, there's no going back. Since the NVRAM is glued
onto the motherboard, one basically has to replace the motherboard.

3. The upgrading of older PM2's (pre-3.0) to later releases is
very unpredictable at best. We have upgraded several boxes from
2.4 -> 3.0.4 -> 3.3.2c1 with success, and have another batch that
has been a total failure. The failures seem to relate to the
"amount" of traffic going through the box at the time of upgrading,
and to the number of ports (and their exact configuration) at the
time of upgrading. I've had lots of PM2 10-port boxes upgrade
successfully when no traffic was going through them, and had ZERO
success and multiple failures with the exact same box when traffic
was moving at the same time the upgrade was applied. I have also
seen the exact same failures with 3.1.4: the more traffic, the
higher the probability of upgrade failure.

>Megazone, if you don't know about this, PLEASE look up RMA 5141
>in your records. This is one of the boxes we sent in for this
>condition. Some engineer at Livingston gave my boss the reason
>why this happens to older boxes at Interop. I need to find out
>from him.
>
>So, if you're having problems, check your MAC address!
>
>Now, on all our newer boxes, we have had no problems in upgrading.
>So, aside from the hardware, things have been happy.
>
>> Like people who
>> started getting weird local IPs for users - they must have had some random
>> bits misplaced that were in areas not used for 3.3.1 or prior, and when they
>> loaded 3.3.2 or up suddenly those bits were sitting in the middle of a user
>> data structure.
>
>I would question an upgrade program that didn't first "memset"
>usable memory addresses to a known state.

It is somewhat upsetting that changes to the NVRAM layout actually occur
with production hardware, and that the vendor has not detected or
addressed the issue in the upgrades. Based upon my experience with the
product, it is highly likely the change in NVRAM layout was not "planned"
but rather the result of insufficient pre-release testing. It's
certainly not that difficult to read between the lines and understand
the level of bugs in the later ComOS releases that we've all seen.
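As an illustration of the kind of detection being asked for here
(and of the quoted "memset" suggestion), an upgrade could refuse to
trust NVRAM whose layout it does not recognise. The sketch below is
hypothetical throughout. ComOS's real NVRAM format is not described
in this thread, so the 2-byte signature, the version header, and the
prepare_nvram() name are all assumptions made for illustration.

```python
import struct

# All values below are hypothetical; nothing here reflects ComOS internals.
MAGIC = b"NV"                   # assumed 2-byte signature at offset 0
LAYOUT_VERSION = 3              # assumed layout number for the new release
HEADER = struct.Struct(">2sH")  # signature + big-endian 16-bit layout version


def prepare_nvram(nvram: bytearray) -> bool:
    """Wipe and re-initialise NVRAM if its layout is not the current one.

    Returns True if the region was wiped, False if it was left untouched.
    """
    if len(nvram) < HEADER.size:
        raise ValueError("NVRAM region smaller than its header")
    magic, version = HEADER.unpack_from(nvram, 0)
    if magic == MAGIC and version == LAYOUT_VERSION:
        return False  # same layout: keep the stored configuration
    # Unknown or older layout: do the equivalent of memset(0) over the
    # whole region so no stale bits end up inside new data structures.
    nvram[:] = bytes(len(nvram))
    HEADER.pack_into(nvram, 0, MAGIC, LAYOUT_VERSION)
    return True
```

This trades the stored configuration for a known state, which is
what the quoted poster says he would rather have (re-init every port,
say with expect) than a box that loses its mind. A real upgrader
could instead migrate the fields it knows about from each older
layout, at the cost of more code to test.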

...

>> There is. If it detects an error during the upgrade it aborts. But 'data'
>> is not considered an error, and the PM doesn't know if data in the FLASH is
>> supposed to be there or not, it could be spurious bits set in a data
>> structure. Everything I've seen so far has been either spurious data (the
>> odd local IPs, the S0 port acting weird) or a HW problem on the FLASH. The
>> former is recoverable with a wipe and reconfigure, the latter is an RMA.
>
>If the mapping has changed to such a serious degree, why was the
>install program not designed to re-init all affected structures to
>stable, maybe empty, values? I would rather be told that I will
>need to re-init all ports on a PM (say, with expect) than have
>something lose its mind. But, I'm not in the Livingston development
>loop, so my reasoning may be faulty.

See my response above. You already know it wasn't "planned". Add to
that the very significant list of serious problems with PMconsole for
Windows, and you can guess at the production release objectives and
deadline pressures within the company.

Rich
adar0@routers.com