other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
Change code from PRC_UNREACH_ADMIN_PROHIB to PRC_UNREACH_PORT for
ICMP_UNREACH_PROTOCOL and ICMP_UNREACH_PORT
And let TCP treat PRC_UNREACH_PORT like PRC_UNREACH_ADMIN_PROHIB
This should fix the case where port unreachables for udp returned
ENETRESET instead of ECONNREFUSED
Problem found by: Bill Fenner <fenner@research.att.com>
Reviewed by: jlemon
sysctl, net.inet.ip.fw.permanent_rules.
This allows you to install rules that are persistent across flushes,
which is very useful if you want a default set of rules that
maintains your access to remote machines while you're reconfiguring
the other rules.
Reviewed by: Mark Murray <markm@FreeBSD.org>
very specific scenarios, and now that we have had net.inet.tcp.blackhole for
quite some time there is really no reason to use it any more.
(last of three commits)
using it. Not checking this may have caused the wrong IP address to be
used when processing certain IP options (see example below). This also
caused the wrong route to be passed to ip_output() when forwarding, but
fortunately ip_output() is smart enough to detect this.
This example demonstrates the wrong behavior of the Record Route option
observed with this bug. Host ``freebsd'' is acting as the gateway for
the ``sysv''.
1. On the gateway, we add the route to the destination. The new route
will use the primary address of the loopback interface, 127.0.0.1:
: freebsd# route add 10.0.0.66 -iface lo0 -reject
: add host 10.0.0.66: gateway lo0
2. From the client, we ping the destination. We see the correct replies.
Please note that this also causes the relevant route on the ``freebsd''
gateway to be cached in ipforward_rt variable:
: sysv# ping -snv 10.0.0.66
: PING 10.0.0.66: 56 data bytes
: ICMP Host Unreachable from gateway 192.168.0.115
: ICMP Host Unreachable from gateway 192.168.0.115
: ICMP Host Unreachable from gateway 192.168.0.115
:
: ----10.0.0.66 PING Statistics----
: 3 packets transmitted, 0 packets received, 100% packet loss
3. On the gateway, we delete the route to the destination, thus making
the destination reachable through the `default' route:
: freebsd# route delete 10.0.0.66
: delete host 10.0.0.66
4. From the client, we ping destination again, now with the RR option
turned on. The surprise here is the 127.0.0.1 in the first reply.
This is caused by the bug in ip_rtaddr() not checking the cached
route is still up befor use. The debug code also shows that the
wrong (down) route is further passed to ip_output(). The latter
detects that the route is down, and replaces the bogus route with
the valid one, so we see the correct replies (192.168.0.115) on
further probes:
: sysv# ping -snRv 10.0.0.66
: PING 10.0.0.66: 56 data bytes
: 64 bytes from 10.0.0.66: icmp_seq=0. time=10. ms
: IP options: <record route> 127.0.0.1, 10.0.0.65, 10.0.0.66,
: 192.168.0.65, 192.168.0.115, 192.168.0.120,
: 0.0.0.0(Current), 0.0.0.0, 0.0.0.0
: 64 bytes from 10.0.0.66: icmp_seq=1. time=0. ms
: IP options: <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
: 192.168.0.65, 192.168.0.115, 192.168.0.120,
: 0.0.0.0(Current), 0.0.0.0, 0.0.0.0
: 64 bytes from 10.0.0.66: icmp_seq=2. time=0. ms
: IP options: <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
: 192.168.0.65, 192.168.0.115, 192.168.0.120,
: 0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:
: ----10.0.0.66 PING Statistics----
: 3 packets transmitted, 3 packets received, 0% packet loss
: round-trip (ms) min/avg/max = 0/3/10
A route generated from an RTF_CLONING route had the RTF_WASCLONED flag
set but did not have a reference to the parent route, as documented in
the rtentry(9) manpage. This prevented such routes from being deleted
when their parent route is deleted.
Now, for example, if you delete an IP address from a network interface,
all ARP entries that were cloned from this interface route are flushed.
This also has an impact on netstat(1) output. Previously, dynamically
created ARP cache entries (RTF_STATIC flag is unset) were displayed as
part of the routing table display (-r). Now, they are only printed if
the -a option is given.
netinet/in.c, netinet/in_rmx.c:
When address is removed from an interface, also delete all routes that
point to this interface and address. Previously, for example, if you
changed the address on an interface, outgoing IP datagrams might still
use the old address. The only solution was to delete and re-add some
routes. (The problem is easily observed with the route(8) command.)
Note, that if the socket was already bound to the local address before
this address is removed, new datagrams generated from this socket will
still be sent from the old address.
PR: kern/20785, kern/21914
Reviewed by: wollman (the idea)
is transmitted as all ones". This got broken after introduction
of delayed checksums as follows. Some guys (including Jonathan)
think that it is allowed to transmit all ones in place of a zero
checksum for TCP the same way as for UDP. (The discussion still
takes place on -net.) Thus, the 0 -> 0xffff checksum fixup was
first moved from udp_output() (see udp_usrreq.c, 1.64 -> 1.65)
to in_cksum_skip() (see sys/i386/i386/in_cksum.c, 1.17 -> 1.18,
INVERT expression). Besides that I disagree that it is valid for
TCP, there was no real problem until in_cksum.c,v 1.20, where the
in_cksum() was made just a special version of in_cksum_skip().
The side effect was that now every incoming IP datagram failed to
pass the checksum test (in_cksum() returned 0xffff when it should
actually return zero). It was fixed next day in revision 1.21,
by removing the INVERT expression. The latter also broke the
0 -> 0xffff fixup for UDP checksums.
Before this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1)
: 4500 001c 0001 0000 4011 7cce 7f00 0001
: 7f00 0001 80ed 80ee 0008 0000
After this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1)
: 4500 001c 0001 0000 4011 7cce 7f00 0001
: 7f00 0001 80ed 80ee 0008 ffff
come from a dummynet pipe. Without this, the code which increments
the per-ifaddr stats can dereference an uninitialised pointer. This
should make dummynet usable again.
Reported by: "Dmitry A. Yanko" <fm@astral.ntu-kpi.kiev.ua>
Reviewed by: luigi, joe
on certain types of SOCK_RAW sockets. Also, use the ip.ttl MIB
variable instead of MAXTTL constant as the default time-to-live
value for outgoing IP packets all over the place, as we already
do this for TCP and UDP.
Reviewed by: wollman
an IP header with ip_len in network byte order. For certain
values of ip_len, this could cause icmp_error() to write
beyond the end of an mbuf, causing mbuf free-list corruption.
This problem was observed during generation of ICMP redirects.
We now make quite sure that the copy of the IP header kept
for icmp_error() is stored in a non-shared mbuf header so
that it will not be modified by ip_output().
Also:
- Calculate the correct number of bytes that need to be
retained for icmp_error(), instead of assuming that 64
is enough (it's not).
- In icmp_error(), use m_copydata instead of bcopy() to
copy from the supplied mbuf chain, in case the first 8
bytes of IP payload are not stored directly after the IP
header.
- Sanity-check ip_len in icmp_error(), and panic if it is
less than sizeof(struct ip). Incoming packets with bad
ip_len values are discarded in ip_input(), so this should
only be triggered by bugs in the code, not by bad packets.
This patch results from code and suggestions from Ruslan, Bosko,
Jonathan Lemon and Matt Dillon, with important testing by Mike
Tancsa, who could reproduce this problem at will.
Reported by: Mike Tancsa <mike@sentex.net>
Reviewed by: ru, bmilekic, jlemon, dillon
it doesn't block packets whose destination address has been translated to
the loopback net by ipnat.
Add warning comments about the ip_checkinterface feature.
However, if the RTF_DELCLONE and RTF_WASCLONED condition passes, but the ref
count is > 1, we won't decrement the count at all. This could lead to
route entries never being deleted.
Here, we call rtfree() not only if the initial two conditions fail, but
also if the ref count is > 1 (and we therefore don't immediately delete
the route, but let rtfree() handle it).
This is an urgent MFC candidate. Thanks go to Mike Silbersack for the
fix, once again. :-)
Submitted by: Mike Silbersack <silby@silby.com>
addressed to the interface on the other side of the box follow their
historical path.
Explicitly block packets sent to the loopback network sent from the outside,
which is consistent with the behavior of the forwarding path between
interfaces as implemented in in_canforward().
Always check the arrival interface when matching the packet destination
against the interface broadcast addresses. This bug allowed TCP
connections to be made to the broadcast address of an interface on the
far side of the system because the M_BCAST flag was not set because the
packet was unicast to the interface on the near side. This was broken
when the directed broadcast code was removed from revision 1.32. If
the directed broadcast code was stil present, the destination would not
have been recognized as local until the packet was forwarded to the output
interface and ether_output() looped a copy back to ip_input() with
M_BCAST set and the receive interface set to the output interface.
Optimize the order of the tests.
Reviewed by: jlemon
if an arriving packet belongs to us, also check that the packet arrived
through the correct interface. Skip this check if the packet was locally
generated.
When we recieve a fragmented TCP packet (other than the first) we can't
extract header information (we don't have state to reference). In a rather
unelegant fashion we just move on and assume a non-match.
Recent additions to the TCP header-specific section of the code neglected
to add the logic to the fragment code so in those cases the match was
assumed to be positive and those parts of the rule (which should have
resulted in a non-match/continue) were instead skipped (which means
the processing of the rule continued even though it had already not
matched).
Fault can be spread out over Rich Steenbergen (tcpoptions) and myself
(tcp{seq,ack,win}).
rwatson sent me a patch that got me thinking about this whole situation
(but what I'm committing / this description is mine so don't blame him).
For TCP, verify that the sequence number in the ICMP packet falls within
the tcp receive window before performing any actions indicated by the
icmp packet.
Clean up some layering violations (access to tcp internals from in_pcb)
This piece of code has not been referenced since it was put there
in 1995. Also done a codebased search on popular networking libraries
and third-party applications. This is an orphan.
Reviewed by: jesper
connection, but send it immediately. Prior to this change, it was possible
to delay a delayed-ack for multiple times, resulting in degraded TCP
behavior in certain corner cases.
error will be passed up to the user, who will close the connection, so
it does not appear to make a sense to leave the connection open.
This also fixes a bug with kqueue, where the filter does not set EOF
on the connection, because the connection is still open.
Also remove calls to so{rw}wakeup, as we aren't doing anything with
them at the moment anyway.
Reviewed by: alfred, jesper
reset TCP connections which are in the SYN_SENT state, if the sequence
number in the echoed ICMP reply is correct. This behavior can be
controlled by the sysctl net.inet.tcp.icmp_may_rst.
Currently, only subtypes 2,3,10,11,12 are treated as such
(port, protocol and administrative unreachables).
Assocaiate an error code with these resets which is reported to the
user application: ENETRESET.
Disallow resetting TCP sessions which are not in a SYN_SENT state.
Reviewed by: jesper, -net
and 1.84 of src/sys/netinet/udp_usrreq.c
The changes broken down:
- remove 0 as a wildcard for addresses and port numbers in
src/sys/netinet/in_pcb.c:in_pcbnotify()
- add src/sys/netinet/in_pcb.c:in_pcbnotifyall() used to notify
all sessions with the specific remote address.
- change
- src/sys/netinet/udp_usrreq.c:udp_ctlinput()
- src/sys/netinet/tcp_subr.c:tcp_ctlinput()
to use in_pcbnotifyall() to notify multiple sessions, instead of
using in_pcbnotify() with 0 as src address and as port numbers.
- remove check for src port == 0 in
- src/sys/netinet/tcp_subr.c:tcp_ctlinput()
- src/sys/netinet/udp_usrreq.c:udp_ctlinput()
as they are no longer needed.
- move handling of redirects and host dead from in_pcbnotify() to
udp_ctlinput() and tcp_ctlinput(), so they will call
in_pcbnotifyall() to notify all sessions with the specific
remote address.
Approved by: jlemon
Inspired by: NetBSD
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.
Notes:
o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.
Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project