Discussion:
fetch hangs, part 2
Matt White
2004-07-18 17:16:54 UTC
I looked in the archives and saw that a couple of others also had this
problem. Fetch is getting distributions for ports and it'll just hang.
Killing the build and restarting it will usually cause it to advance. Did
any of you guys who saw this problem before come up with a solution?


-Matt
Andre Guibert de Bruet
2004-07-18 17:38:34 UTC
Post by Matt White
I looked in the archives and saw that a couple of others also had this
problem. Fetch is getting distributions for ports and it'll just hang.
Killing the build and restarting it will usually cause it to advance. Did
any of you guys who saw this problem before come up with a solution?
This report is devoid of anything that can give a clue as to what the
problem could be. What state is the process wedged in? What platform? What
NIC are you using? You're running CURRENT as of what date? Is it 100%
reproducible on your system? If so, can you make a ktrace available? Does
wget hang in a similar fashion?

Regards,
Andy
Andre Guibert de Bruet | Enterprise Software Consultant
Silicon Landmark, LLC. | http://siliconlandmark.com/
Matt White
2004-07-18 18:37:57 UTC
Sorry, I was referring to a previous post which described exactly the
problem I was having. I was more asking if those guys had figured out what
was going on. In fact, I think that's pretty clear from my message.

But as to your questions:

Not 100% reproducible, but happens enough that I can't leave a large fetch
going and expect it to be done when I get back. Haven't tried wget because
the previous posts indicated that they were unable to get it to work with
the ports system.

This is a 5.2.1 system cvsupped as of 7/17. Device is a RealTek 8139.
Processes wedge in sbwait. No ktrace right now because I need to head out,
but I should be able to get one this evening.


-Matt


Conrad J. Sabatier
2004-07-18 18:48:56 UTC
Post by Matt White
I looked in the archives and saw that a couple of others also had this
problem. Fetch is getting distributions for ports and it'll just hang.
Killing the build and restarting it will usually cause it to advance. Did
any of you guys who saw this problem before come up with a solution?
I found that changing my buffer size settings made all the difference.
I had been using the following in /etc/sysctl.conf:

net.inet.tcp.recvspace=131072
net.inet.tcp.sendspace=65536

Changing them to:

net.inet.tcp.recvspace=65536
net.inet.tcp.sendspace=32768

Solved the problem.
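
For anyone who wants to try smaller buffers on a single connection without touching the system-wide sysctl values, here is a minimal sketch in Python. It assumes that per-socket SO_RCVBUF and SO_SNDBUF settings are a reasonable stand-in for the sysctl defaults, which may not hold on every system. The helper name is hypothetical and the sizes simply mirror the smaller values above.

```python
import socket

def make_socket_with_buffers(recv_bytes=65536, send_bytes=32768):
    """Create an unconnected TCP socket with explicit buffer sizes.

    Hypothetical helper: the defaults mirror the smaller values quoted
    above and are not a recommendation.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request per-socket buffer sizes before connecting. The kernel may
    # round or clamp these, so read them back rather than trusting them.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recv_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_bytes)
    return s

if __name__ == "__main__":
    sock = make_socket_with_buffers()
    print("recv buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    print("send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    sock.close()
```

The values read back can differ from the ones requested, so it is worth printing them before drawing conclusions.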
--
Conrad J. Sabatier <***@cox.net> -- "In Unix veritas"
Matt White
2004-07-18 19:08:40 UTC
Conrad:

Thanks, but it looks like those are the default settings, so I'm guessing
setting them isn't going to help me. It's probably better to figure out
what's wrong here anyway so we can use whatever settings we want. If I
don't hear anything more, I'll try to get a ktrace tonight.


-Matt

_______________________________________________
http://lists.freebsd.org/mailman/listinfo/freebsd-current
Darren Pilgrim
2004-07-20 20:48:10 UTC
I've been having the same (or a similar) problem. Fetching the distfile for
a port will hang. I poked some more at this and found that all the sites
fetch is hanging on successfully negotiate the TCP connection then go dead
without closing the connection. My workaround has been to edit the Makefile
and remove the offending site from MASTER_SITES or manually fetch from
ftp.freebsd.org.
Robert Watson
2004-07-20 23:18:26 UTC
Post by Darren Pilgrim
I've been having the same (or a similar) problem. Fetching the distfile
for a port will hang. I poked some more at this and found that all the
sites fetch is hanging on successfully negotiate the TCP connection then
go dead without closing the connection. My workaround has been to edit
the Makefile and remove the offending site from MASTER_SITES or manually
fetch from ftp.freebsd.org.
Could you try disabling TCP SACK and see if things "get better"? It's one
of the things that has changed in the TCP code recently. This could well
be a user space fetch issue, but it would be worth trying it to see, if
only to rule it out.

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
Darren Pilgrim
2004-07-21 10:54:34 UTC
Post by Robert Watson
Post by Darren Pilgrim
I've been having the same (or a similar) problem. Fetching the distfile
for a port will hang. I poked some more at this and found that all the
sites fetch is hanging on successfully negotiate the TCP connection then
go dead without closing the connection. My workaround has been to edit
the Makefile and remove the offending site from MASTER_SITES or manually
fetch from ftp.freebsd.org.
Could you try disabling TCP SACK
Set net.inet.tcp.delayed_ack=0?
Post by Robert Watson
and see if things "get better"? It's one
of the things that has changed in the TCP code recently. This could well
be a user space fetch issue, but it would be worth trying it to see, if
only to rule it out.
It didn't change. The connections still hung.
Daniel Lang
2004-07-21 15:02:58 UTC
Hi Darren,

Darren Pilgrim wrote on Wed, Jul 21, 2004 at 03:54:34AM -0700:
[..]
Post by Darren Pilgrim
Post by Robert Watson
Could you try disabling TCP SACK
Set net.inet.tcp.delayed_ack=0?
No, the correct one is:

net.inet.tcp.sack.enable=0

However, a bug was fixed in the SACK code just recently.
I think the commit went in yesterday or the day before.
Still, it is possible that there are more issues around...

HTH,
Daniel
--
IRCnet: Mr-Spock
- In this mail there is a ghost that will bite you in the behind -
Daniel Lang * ***@leo.org * +49 89 289 18532 * http://www.leo.org/~dl/
Andre Oppermann
2004-07-21 15:34:34 UTC
Post by Darren Pilgrim
Post by Robert Watson
Post by Darren Pilgrim
I've been having the same (or a similar) problem. Fetching the distfile
for a port will hang. I poked some more at this and found that all the
sites fetch is hanging on successfully negotiate the TCP connection then
go dead without closing the connection. My workaround has been to edit
the Makefile and remove the offending site from MASTER_SITES or manually
fetch from ftp.freebsd.org.
Could you try disabling TCP SACK
Set net.inet.tcp.delayed_ack=0?
Post by Robert Watson
and see if things "get better"? It's one
of the things that has changed in the TCP code recently. This could well
be a user space fetch issue, but it would be worth trying it to see, if
only to rule it out.
It didn't change. The connections still hung.
Darren sent me a list of sites where he experiences the problem, and I
have tested them with FreeBSD 5-CURRENT (yesterday's), 4.10-STABLE and
Windows 2000 SP4. None worked. On FreeBSD, fetch waited for the connection
timeout and then moved on to the second site to fetch the tarball
(successfully).

Maybe he was just not patient enough to wait for fetch to move on.
--
Andre
Darren Pilgrim
2004-07-21 19:47:19 UTC
Post by Andre Oppermann
Darren sent me a list with sites where he experiences the problem and I
have tested them with FreeBSD 5-CURRENT (yesterday's), 4.10-STABLE and
Windows 2kSP4. None worked. For FreeBSD fetch waited for the connection
timeout and went to the second site to fetch the tarball (successfully).
Maybe he was just not patient enough to wait for fetch to move on.
Probably not. :)

On the machine in question, 1-2 minutes is longer than the time from extract
to package for most ports. I've since set FETCH_CMD=/usr/bin/fetch -ARrT 10
in /etc/make.conf to cycle through sites more quickly.
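
The idea of a short timeout plus moving on to the next site can be sketched roughly as follows in Python. This is only an illustration of the behaviour, not how fetch or the ports system implements it; the function name, the injectable `opener` parameter and the example URLs in the usage note are all hypothetical.

```python
import urllib.request

def fetch_first_available(urls, timeout=10, opener=urllib.request.urlopen):
    """Try each URL in order and return (url, data) from the first that works.

    `opener` is injectable so the logic can be exercised without a network.
    """
    errors = []
    for url in urls:
        try:
            # The timeout bounds each blocking socket operation, so a site
            # that accepts the connection and then goes silent is abandoned
            # after `timeout` seconds rather than stalling indefinitely.
            with opener(url, timeout=timeout) as response:
                return url, response.read()
        except OSError as exc:
            # URLError, timeouts and connection errors are all OSError
            # subclasses; record the failure and move on to the next site.
            errors.append((url, exc))
    raise RuntimeError(f"all sites failed: {errors}")
```

Calling it with a list such as `["http://mirror-a.example/foo.tar.gz", "http://mirror-b.example/foo.tar.gz"]` would return the data from whichever site answers first in list order.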
Don Lewis
2004-07-21 17:20:24 UTC
Post by Darren Pilgrim
Post by Robert Watson
Post by Darren Pilgrim
I've been having the same (or a similar) problem. Fetching the distfile
for a port will hang. I poked some more at this and found that all the
sites fetch is hanging on successfully negotiate the TCP connection then
go dead without closing the connection. My workaround has been to edit
the Makefile and remove the offending site from MASTER_SITES or manually
fetch from ftp.freebsd.org.
Could you try disabling TCP SACK
Set net.inet.tcp.delayed_ack=0?
Post by Robert Watson
and see if things "get better"? It's one
of the things that has changed in the TCP code recently. This could well
be a user space fetch issue, but it would be worth trying it to see, if
only to rule it out.
It didn't change. The connections still hung.
Smells somewhat like a broken path MTU discovery problem. There might
be a hop in the path with a smaller than expected MTU and something
between there and the server that is filtering ICMP.

An interesting experiment would be to see what happens if a smaller than
normal MSS is requested. This would probably be easiest for someone
with a PPP connection.
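
For those without a PPP link, one way to run a similar experiment from user space is to request a smaller MSS on a single socket. The Python sketch below assumes the platform exposes TCP_MAXSEG and honours it when set before connect(), which varies between systems. The helper names and the value 536 are illustrative choices, not anything taken from fetch.

```python
import socket

def make_small_mss_socket(mss=536):
    """Create an unconnected TCP socket that asks for a reduced MSS."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before connect() so the value can be used when the
    # connection is negotiated. The kernel may adjust or reject it.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    return s

def connect_with_small_mss(host, port, mss=536, timeout=10):
    """Connect to host:port using a socket that requested a small MSS."""
    s = make_small_mss_socket(mss)
    s.settimeout(timeout)
    s.connect((host, port))
    return s
```

If transfers that hang with the default MSS complete with the smaller one, that would point toward a path MTU problem rather than fetch itself.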
Matt White
2004-07-21 17:43:20 UTC
In my case, I tracked this down to either my netscreen5 or the cable into
it going south. It's a bit strange because other software wasn't having an
issue, so I'll probably look at this more in the next few days. But for
now, removing that box from my network has made fetch happy again, which in
turn has made me happy again.


-Matt


Doug White
2004-07-21 18:51:33 UTC
Post by Matt White
In my case, I tracked this down to either my netscreen5 or the cable into
it going south. It's a bit strange because other software wasn't having an
issue, so I'll probably look at this more in the next few days. But for
now, removing that box from my network has made fetch happy again, which in
turn has made me happy again.
It might have been breaking passive-mode FTP. I've noticed this problem
coming from certain places. Thankfully not from my workplace or home :)

Some older stateful-inspection firewalls don't understand passive mode and
drop the connection.
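
A quick way to check whether a firewall treats passive and active FTP differently is to try the same listing in both modes. The following is a rough Python sketch using the standard ftplib module; the helper names are hypothetical, and it assumes the server allows anonymous login.

```python
from ftplib import FTP

def make_ftp_client(passive=True, timeout=10):
    """Return an unconnected FTP client with the transfer mode preselected."""
    ftp = FTP(timeout=timeout)  # no host given, so nothing connects yet
    ftp.set_pasv(passive)       # True = passive mode, False = active mode
    return ftp

def list_root(host, passive=True, timeout=10):
    """Log in anonymously and list the top-level directory of `host`."""
    ftp = make_ftp_client(passive, timeout)
    try:
        ftp.connect(host)
        ftp.login()             # anonymous login
        return ftp.nlst()
    finally:
        ftp.close()             # close the socket even if a step failed
```

If `list_root(host, passive=True)` times out while `list_root(host, passive=False)` succeeds through the same firewall, or the other way round, the firewall's FTP handling is a likely suspect.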
--
Doug White | FreeBSD: The Power to Serve
***@gumbysoft.com | www.FreeBSD.org
Matt White
2004-07-21 20:10:12 UTC
I don't think so in this case since I was using this same netscreen with a
FreeBSD 5.1 box without incident. I'm actually suspicious of the cabling,
but I have higher priorities than figuring that out at the moment.


-Matt


