Here is Sun's commentary on the bug (thanks Barron)... note that the
patches mentioned hadn't been released yet the last time I looked.
On 7 Jun 1996 23:36:21 GMT, in comp.unix.solaris cathe@Eng.Sun.COM
(Cathe A. Ray) wrote:
>Sun doesn't ordinarily announce patches when they're released. But
>we've just finished a series of TCP-related fixes and improvements, and
>we want to make sure that the news gets out as quickly as possible to
>the many people who can benefit from our work.
>
>This patch announcement will be of interest mostly to folks who use Sun
>workstations over "slow" links, like most dial-up lines. Please note,
>though, that you might benefit from the work we'll discuss here even if
>you've never used one of our workstations directly. (Many companies
>who provide Internet access use Suns as part of the communication path.
>And the patches are for Suns running Solaris 2.4 and up.)
>
>Also note: This message is coming to you directly from the engineers
>who did the work. We wanted to get the information out to you right
>away, but we really aren't trying to replace all the other Sun sources
>of information you might have access to. Please, don't send us lots of
>detailed questions--we're not volunteering to answer them (or even
>respond to many of the followups here). We just really wanted to make
>sure this message got out. Thanks.
>
>Cathe A. Ray
>Manager, Internet Engineering
>
>
> TCP Performance Improvements For Slow Network Links
> ===================================================
>
>Our Sun team is responsible for basic network communications software.
>We've been putting in a lot of work lately on improving the performance
>of TCP over slow network links. Now we're finished; testing is
>complete; and the patches (for Solaris 2.4 and later) will be available
>shortly.
>
>We undertook the work in response to feedback from customers serving
>WWW users over asynchronous PPP links. Users of LANs and WANs built on
>10base-T and faster media never saw the problem behavior, which
>actually affected FTP and other TCP-based applications as well.
>
>With the new patches in place, slow links will operate with roughly the
>same efficiency as fast links. Without the patches, efficiency of very
>slow links could, under Solaris 2.5, sink to as low as 5 per cent of the
>theoretical maximum.
>
>In the following sections we will describe in detail what was wrong and
>how we fixed it. If you don't need to know all that, just check the
>table below for the patch numbers. They'll be available soon from our
>usual patch sources. We're confident that customers who have seen the
>problem will now observe a remarkable improvement. Others will see no
>change.
>
> SPARC:
>
> |-----------|-----------|-----------|-----------------|
> |    2.4    |    2.5    |   2.5.1   | module affected |
> |-----------|-----------|-----------|-----------------|
> | 101945-xx | 103169-05 | 103582-01 | /kernel/drv/ip  |
> | 101945-xx | 103447-03 | 103630-01 | /kernel/drv/tcp |
> |-----------|-----------|-----------|-----------------|
>
> X86:
> |-----------|-----------|-----------|-----------------|
> |    2.4    |    2.5    |   2.5.1   | module affected |
> |-----------|-----------|-----------|-----------------|
> | 101946-xx | 103170-05 | 103581-01 | /kernel/drv/ip  |
> | 101946-xx | 103448-03 | 103631-01 | /kernel/drv/tcp |
> |-----------|-----------|-----------|-----------------|
>
> PowerPC:
> |-----------|-----------------|
> |   2.5.1   | module affected |
> |-----------|-----------------|
> | 103583-01 | /kernel/drv/ip  |
> | 103632-01 | /kernel/drv/tcp |
> |-----------|-----------------|
>
> Note: Where a revision number is indicated, you should ask
> for the patch of at least that revision. The revision number
> of the 2.4 patch was not available at the time of this
> posting. Always try to get "the latest version" of any patch
> you go after.
>
>
>HISTORY
>
>Strangely, the decline in throughput was the result of several
>improvements we made over the years to the TCP retransmission
>algorithms and parameters. Every change improved performance for
>systems with fast links. The cumulative effect for slow links was just
>the reverse; but almost all our systems--and our customers'--were
>hooked up to fast links, and the drawbacks went largely unnoticed. That
>was the state of affairs at the time 2.4 was released.
>
>By the time 2.5 came out, async hookups to the Web had exploded. We had
>implemented another relatively minor TCP bug fix. Customers with fast
>links were better off. The efficiency of slow links declined. We
>quickly learned we had a problem.
>
>We tracked down the inconsistencies and rewrote the code. We've
>redesigned the algorithm for good behavior across all supported
>configurations. We've added slow links and a wide mix of simulated
>platforms to our test beds, and tested the fixes in both high-speed and
>slow-speed networks. The problem is resolved.
>
>Excellence is a moving target.
>
>
>TECHNICAL DETAILS
>
>Here are some technical details. As you'll see, we've made it a pretty
>frank discussion. (Please be aware, though, that we do not intend to
>spend much time debating our decisions here.)
>
>The throughput troubles on slow lines result from an excessive rate of
>retransmissions. The rate, in turn, is caused by a mis-tuned adaptive
>algorithm.
>
>TCP packets are retransmitted if no response is received before a
>timeout period has expired. Our routines implement a variant of the
>familiar Karn and Jacobson adaptive algorithms, which attempts to
>predict an efficient timeout value based on the time it took previous
>packets to complete a roundtrip. Elapsed values are combined into a
>smoothed average roundtrip time ("RTT") and variance.
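
For readers who want to see the idea in code, here is a minimal sketch of
the textbook Jacobson estimator that the paragraph above alludes to. It is
not Sun's code. The class name, the 1/8 and 1/4 gains, and the
"SRTT + 4 * RTTVAR" formula are assumptions taken from the commonly
published form of the algorithm; the announcement only says Solaris uses
"a variant".

```python
class RttEstimator:
    """Textbook smoothed-RTT estimator (hypothetical helper, not Solaris code)."""

    def __init__(self, initial_rto_ms):
        self.srtt = None                 # smoothed round-trip time, ms
        self.rttvar = None               # smoothed mean deviation, ms
        self.rto = float(initial_rto_ms)

    def sample(self, rtt_ms):
        """Fold one valid round-trip measurement into the estimate."""
        if self.srtt is None:
            # The first measurement seeds both the average and the deviation.
            self.srtt = float(rtt_ms)
            self.rttvar = rtt_ms / 2.0
        else:
            err = rtt_ms - self.srtt
            # Deviation is updated from the old average, then the average moves.
            self.rttvar += (abs(err) - self.rttvar) / 4.0
            self.srtt += err / 8.0
        self.rto = self.srtt + 4.0 * self.rttvar
        return self.rto


est = RttEstimator(initial_rto_ms=3000)
for rtt in (1800, 2100, 1900, 2000):   # made-up dial-up round trips, ms
    est.sample(rtt)
print(round(est.srtt), round(est.rto))
```

The point to notice is that the estimate only moves when sample() is
called. If no usable measurement ever arrives, the timeout stays at
whatever it started with.
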
>
>The key elements in this calculation are the initial RTT value and the
>subsequent RTT's factored in. The changes we have made involve both of
>these key areas.
>
>
>INITIAL RTT VALUES
>
>As an unintended result of several cumulative changes, the kernel
>parameter "tcp_rexmit_interval_initial" was actually not being used. In
>fact, all Internet Routing Entry (IRE) RTT values were being
>initialized to 512 milliseconds. TCP was using that as an initial
>setting.
>
>For connections which flow through a route with a roundtrip time less
>than that (such as a LAN or WAN built on 10base-T) all was well. When
>the connection closed, the actual IRE RTT value was updated and the
>predictive timeout value successfully adjusted.
>
>For connections with an RTT greater than 512 ms, however, the timeout
>would necessarily trip, and retransmissions occur. If the actual time
>differed sufficiently from the original estimated value, TCP was never
>able to send a segment without one or more retransmissions. A realistic
>RTT for the route could never be established. This scenario is the
>beginning of the explanation of what has been happening on several-hop
>Internet or asynchronous PPP links.
>
>Our solution is to initialize all IRE RTT's to zero instead of 512 ms.
>Any new connection for a route will now, when lookup discloses the zero
>value, get the value of the "tcp_rexmit_interval_initial" parameter
>instead. (And it's been increased to 3 seconds.) So in most cases the
>adaptive algorithm will now be able to adjust timeout values effectively.
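
The lookup described above can be sketched roughly as follows. The
function names are hypothetical, and treating the cached IRE RTT directly
as the first timeout is a simplification for illustration. Only the 512 ms
seed, the zero sentinel and the 3 second default come from the
announcement.

```python
TCP_REXMIT_INTERVAL_INITIAL_MS = 3000   # new default quoted in the announcement
OLD_IRE_SEED_MS = 512                   # old seed for every IRE RTT


def initial_timeout_ms(ire_rtt_ms,
                       rexmit_initial_ms=TCP_REXMIT_INTERVAL_INITIAL_MS):
    """Pick the first timeout for a new connection (hypothetical helper).

    A cached IRE RTT of zero means nothing has been learned for this
    route yet, so fall back to the tunable. Otherwise use the cache.
    """
    if ire_rtt_ms == 0:
        return rexmit_initial_ms
    return ire_rtt_ms


def first_segment_times_out(timeout_ms, actual_rtt_ms):
    """True if the timer fires before the ACK can possibly arrive."""
    return actual_rtt_ms > timeout_ms


slow_link_rtt_ms = 2000   # assumed round trip on a dial-up PPP link

# Old behaviour: the IRE was seeded with 512 ms, so the timer always fired.
print(first_segment_times_out(initial_timeout_ms(OLD_IRE_SEED_MS),
                              slow_link_rtt_ms))   # True

# New behaviour: the IRE holds zero, so the 3 second tunable is used.
print(first_segment_times_out(initial_timeout_ms(0),
                              slow_link_rtt_ms))   # False
```
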
>
>
>RTO (RETRANSMIT TIMEOUT) ALGORITHM INTERACTION
>
>Another factor contributing to packet congestion and retransmission was
>a change to the RTO algorithm, introduced in a 2.4 Kernel Patch. The
>intent was to make the behavior more "conservative"--that is, lower the
>risk of poor timeout values. The effect on low-speed links was
>unexpectedly contrary.
>
>A key (and unintended) effect of the code change was that RTT data from
>retransmitted packets was discarded. This behavior, together with the
>poor initial RTT values described earlier, meant that the adaptive
>algorithm was deprived of the information needed to adjust the RTO.
>
>Our solution is to keep the RTO update conservative, but now to
>update the RTO after no more than one receive window's worth of valid
>RTT's. Further, when an invalid RTT is seen--an ACK of a retransmitted
>segment, for example--any valid RTT information is fed into the RTO
>algorithm.
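
A toy simulation makes the starvation easy to see. This is only a sketch
of the pre-patch behaviour as described above, under loud assumptions: a
link with a fixed round-trip time, and a crude stand-in for the real RTO
update (set the RTO to twice the measured RTT). The function name and the
numbers are made up.

```python
def run_connection(initial_rto_ms, link_rtt_ms, segments):
    """Send `segments` segments over a link with a fixed round-trip time.

    Returns (retransmissions, final_rto_ms). Samples from retransmitted
    segments are discarded, as in the behaviour the post describes.
    """
    rto = initial_rto_ms
    retransmissions = 0
    for _ in range(segments):
        if link_rtt_ms > rto:
            # The timer fired first, so the segment was sent twice. The ACK
            # could belong to either copy, so the sample is thrown away and
            # the RTO stays exactly where it was.
            retransmissions += 1
            continue
        # A clean sample: let the (stand-in) adaptive step adjust the RTO.
        rto = 2 * link_rtt_ms
    return retransmissions, rto


print(run_connection(512, 2000, 10))    # (10, 512): stuck, never adapts
print(run_connection(3000, 2000, 10))   # (0, 4000): adapts on first sample
```

With a starting value below the real round trip, every sample is tainted
by a retransmission, so there is nothing left to learn from.
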
>
>
>ZERO WINDOW PROBE BUG FIX
>
>The problems described so far affect Solaris 2.4 and 2.5 equally. What
>changed with 2.5?
>
>One important fix we included in 2.5 was for the "zero window probe"
>bug, a well-publicized problem affecting just about all versions of
>UNIX. As part of that rewrite, we removed a nondescript piece of logic
>that implemented a simple "backoff" scheme. The excised code caused the
>RTO to be lengthened by one-eighth as a result of certain failures. It
>seemed not to be needed; but it had concealed the presence of the other
>bugs by providing a means for the RTO to reach a successful value. When
>this code was removed the other underlying problems were exposed.
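
To see why that small backoff masked the other bugs, here is a sketch that
counts how many timeouts it takes for an RTO growing by one-eighth per
failure to reach the real round-trip time. The function name and the
2000 ms link are assumptions. The 512 ms start and the one-eighth step
are from the text above.

```python
def failures_until_rto_covers(rto_ms, link_rtt_ms):
    """Count the timeouts needed before the RTO reaches the link RTT.

    Each failure lengthens the RTO by one-eighth (hypothetical model of
    the removed backoff). Returns (failures, final_rto_ms).
    """
    if rto_ms <= 0:
        raise ValueError("rto_ms must be positive")
    failures = 0
    while rto_ms < link_rtt_ms:
        rto_ms += rto_ms / 8.0
        failures += 1
    return failures, rto_ms


failures, rto = failures_until_rto_covers(512, 2000)
print(failures, round(rto))   # 12 2104
```

That is slow, but it does get there. With the backoff removed, the RTO in
this model would never get there at all.
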
>
>
>IRE RTT LOGIC
>
>This last part of the problem concerns the interaction between TCP and
>the Solaris-specific Internet Routing Entries. The IRE RTT logic caches
>RTT values to be re-used when a new connection is made over a familiar
>link.
>
>This is a fine approach. The implementation, however, had a flaw: the
>IRE RTT was updated regardless of the RTT value supplied by TCP.
>
>As you will have guessed by now, users of high-speed links saw no
>effect. But in highly variable RTT routes, when a connection dominated
>by small segments was closed, a problem could result. An RTT too short
>for large segments was used to update the IRE RTT, and a subsequent
>connection dominated by large segments (like FTP) experienced an
>excessive retransmission rate. It was a different path to a familiar
>dilemma: too small a timeout value.
>
>Naturally the most highly variable RTT's tend to be seen on async PPP
>links, where the RTT of the route is compounded from (1) wire latency,
>(2) low bandwidth, and (3) congestion/queuing delays as more than one
>segment is transmitted by TCP.
>
>Our solution is to add a new ndd variable "tcp_rtt_updates". It allows
>tuning or disabling of IRE RTT updates. A value of zero disables IRE
>RTT updates. A value greater than zero specifies how many RTT updates
>to the RTO are required--that is, how many chances the algorithm has
>had to adapt the timeout--before a closing connection will be allowed
>to update the RTT in the IRE.
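
The gate described above might look something like this. It is a sketch,
not the kernel logic: the function names, the dictionary standing in for
the IRE cache, and the inclusive ">=" comparison are all assumptions. Only
the meaning of zero and of a positive "tcp_rtt_updates" comes from the
announcement.

```python
def may_update_ire_rtt(tcp_rtt_updates, rto_updates_on_connection):
    """Decide at close time whether a connection may overwrite the IRE RTT."""
    if tcp_rtt_updates == 0:
        return False   # zero disables IRE RTT updates entirely
    # Assumed inclusive: the connection must have adapted at least this often.
    return rto_updates_on_connection >= tcp_rtt_updates


def close_connection(ire_cache, dest, measured_rtt_ms,
                     rto_updates_on_connection, tcp_rtt_updates):
    """Apply the gate to a hypothetical per-destination RTT cache."""
    if may_update_ire_rtt(tcp_rtt_updates, rto_updates_on_connection):
        ire_cache[dest] = measured_rtt_ms
    return ire_cache.get(dest, 0)


cache = {"198.51.100.7": 2400}   # made-up cached RTT in ms

# A short connection that adapted only twice may not overwrite the cache.
print(close_connection(cache, "198.51.100.7", 300, 2, 20))    # 2400

# A long connection that adapted 25 times may.
print(close_connection(cache, "198.51.100.7", 1900, 25, 20))  # 1900
```

The idea is that a brief connection of small segments no longer gets to
leave a too-short RTT behind for the next bulk transfer.
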
>
>
>CONCLUSION
>
>We've fished out, fixed, and explained some subtle flaws in our
>adaptive retransmission algorithm. We take the responsibility for
>introducing them--and the credit, too, for practically every piece was,
>by itself, a successful response to our customers' needs. Better and
>exhaustive testing would have shown up the flaws earlier, privately,
>harmlessly. That's always our goal, and our customers have a right to
>expect the best. Yes.
>
>There's always tomorrow. In the meantime: we killed this one, folks.
>Our sincere thanks for your attention--and your business.
>
>--
>
>Cathe A. Ray | Love makes the world go `round
>SunSoft (415) 786-5178 | but Chocolate makes the trip worthwhile!
>cathe.ray@Eng.sun.com |