an IP header with ip_len in network byte order. For certain
values of ip_len, this could cause icmp_error() to write
beyond the end of an mbuf, causing mbuf free-list corruption.
This problem was observed during generation of ICMP redirects.
We now make quite sure that the copy of the IP header kept
for icmp_error() is stored in a non-shared mbuf header so
that it will not be modified by ip_output().
Also:
- Calculate the correct number of bytes that need to be
retained for icmp_error(), instead of assuming that 64
is enough (it's not).
- In icmp_error(), use m_copydata instead of bcopy() to
copy from the supplied mbuf chain, in case the first 8
bytes of IP payload are not stored directly after the IP
header.
- Sanity-check ip_len in icmp_error(), and panic if it is
less than sizeof(struct ip). Incoming packets with bad
ip_len values are discarded in ip_input(), so this should
only be triggered by bugs in the code, not by bad packets.
This patch results from code and suggestions from Ruslan, Bosko,
Jonathan Lemon and Matt Dillon, with important testing by Mike
Tancsa, who could reproduce this problem at will.
Reported by: Mike Tancsa <mike@sentex.net>
Reviewed by: ru, bmilekic, jlemon, dillon
it doesn't block packets whose destination address has been translated to
the loopback net by ipnat.
Add warning comments about the ip_checkinterface feature.
However, if the RTF_DELCLONE and RTF_WASCLONED condition passes, but the ref
count is > 1, we won't decrement the count at all. This could lead to
route entries never being deleted.
Here, we call rtfree() not only if the initial two conditions fail, but
also if the ref count is > 1 (and we therefore don't immediately delete
the route, but let rtfree() handle it).
This is an urgent MFC candidate. Thanks go to Mike Silbersack for the
fix, once again. :-)
Submitted by: Mike Silbersack <silby@silby.com>
addressed to the interface on the other side of the box follow their
historical path.
Explicitly block packets sent to the loopback network sent from the outside,
which is consistent with the behavior of the forwarding path between
interfaces as implemented in in_canforward().
Always check the arrival interface when matching the packet destination
against the interface broadcast addresses. This bug allowed TCP
connections to be made to the broadcast address of an interface on the
far side of the system because the M_BCAST flag was not set because the
packet was unicast to the interface on the near side. This was broken
when the directed broadcast code was removed from revision 1.32. If
the directed broadcast code was stil present, the destination would not
have been recognized as local until the packet was forwarded to the output
interface and ether_output() looped a copy back to ip_input() with
M_BCAST set and the receive interface set to the output interface.
Optimize the order of the tests.
Reviewed by: jlemon
if an arriving packet belongs to us, also check that the packet arrived
through the correct interface. Skip this check if the packet was locally
generated.
When we recieve a fragmented TCP packet (other than the first) we can't
extract header information (we don't have state to reference). In a rather
unelegant fashion we just move on and assume a non-match.
Recent additions to the TCP header-specific section of the code neglected
to add the logic to the fragment code so in those cases the match was
assumed to be positive and those parts of the rule (which should have
resulted in a non-match/continue) were instead skipped (which means
the processing of the rule continued even though it had already not
matched).
Fault can be spread out over Rich Steenbergen (tcpoptions) and myself
(tcp{seq,ack,win}).
rwatson sent me a patch that got me thinking about this whole situation
(but what I'm committing / this description is mine so don't blame him).
For TCP, verify that the sequence number in the ICMP packet falls within
the tcp receive window before performing any actions indicated by the
icmp packet.
Clean up some layering violations (access to tcp internals from in_pcb)
This piece of code has not been referenced since it was put there
in 1995. Also done a codebased search on popular networking libraries
and third-party applications. This is an orphan.
Reviewed by: jesper
connection, but send it immediately. Prior to this change, it was possible
to delay a delayed-ack for multiple times, resulting in degraded TCP
behavior in certain corner cases.
error will be passed up to the user, who will close the connection, so
it does not appear to make a sense to leave the connection open.
This also fixes a bug with kqueue, where the filter does not set EOF
on the connection, because the connection is still open.
Also remove calls to so{rw}wakeup, as we aren't doing anything with
them at the moment anyway.
Reviewed by: alfred, jesper
reset TCP connections which are in the SYN_SENT state, if the sequence
number in the echoed ICMP reply is correct. This behavior can be
controlled by the sysctl net.inet.tcp.icmp_may_rst.
Currently, only subtypes 2,3,10,11,12 are treated as such
(port, protocol and administrative unreachables).
Assocaiate an error code with these resets which is reported to the
user application: ENETRESET.
Disallow resetting TCP sessions which are not in a SYN_SENT state.
Reviewed by: jesper, -net
and 1.84 of src/sys/netinet/udp_usrreq.c
The changes broken down:
- remove 0 as a wildcard for addresses and port numbers in
src/sys/netinet/in_pcb.c:in_pcbnotify()
- add src/sys/netinet/in_pcb.c:in_pcbnotifyall() used to notify
all sessions with the specific remote address.
- change
- src/sys/netinet/udp_usrreq.c:udp_ctlinput()
- src/sys/netinet/tcp_subr.c:tcp_ctlinput()
to use in_pcbnotifyall() to notify multiple sessions, instead of
using in_pcbnotify() with 0 as src address and as port numbers.
- remove check for src port == 0 in
- src/sys/netinet/tcp_subr.c:tcp_ctlinput()
- src/sys/netinet/udp_usrreq.c:udp_ctlinput()
as they are no longer needed.
- move handling of redirects and host dead from in_pcbnotify() to
udp_ctlinput() and tcp_ctlinput(), so they will call
in_pcbnotifyall() to notify all sessions with the specific
remote address.
Approved by: jlemon
Inspired by: NetBSD
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.
Notes:
o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.
Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project
treat 0 as a wildcard in src/sys/in_pbc.c:in_pcbnotify()
It's sufficient to check for src|local port, as we'll have no
sessions with src|local port == 0
Without this a attacker sending ICMP messages, where the attached
IP header (+ 8 bytes) has the address and port numbers == 0, would
have the ICMP message applied to all sessions.
PR: kern/25195
Submitted by: originally by jesper, reimplimented by jlemon's advice
Reviewed by: jlemon
Approved by: jlemon
actually in the kernel. This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there. As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).
This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size. This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.
Reviewed by: bde
Add new PRC_UNREACH_ADMIN_PROHIB in sys/sys/protosw.h
Remove condition on TCP in src/sys/netinet/ip_icmp.c:icmp_input
In src/sys/netinet/ip_icmp.c:icmp_input set code = PRC_UNREACH_ADMIN_PROHIB
or PRC_UNREACH_HOST for all unreachables except ICMP_UNREACH_NEEDFRAG
Rename sysctl icmp_admin_prohib_like_rst to icmp_unreach_like_rst
to reflect the fact that we also react on ICMP unreachables that
are not administrative prohibited. Also update the comments to
reflect this.
In sys/netinet/tcp_subr.c:tcp_ctlinput add code to treat
PRC_UNREACH_ADMIN_PROHIB and PRC_UNREACH_HOST different.
PR: 23986
Submitted by: Jesper Skriver <jesper@skriver.dk>
address is configured on a interface. This is useful for routers with
dynamic interfaces. It is now possible to say:
0100 allow tcp from any to any established
0200 skipto 1000 tcp from any to any
0300 allow ip from any to any
1000 allow tcp from 1.2.3.4 to me 22
1010 deny tcp from any to me 22
1020 allow tcp from any to any
and not have to worry about the behaviour if dynamic interfaces configure
new IP numbers later on.
The check is semi expensive (traverses the interface address list)
so it should be protected as in the above example if high performance
is a requirement.
were performed to determine if the received packet should be reset. This
created erroneous ratelimiting and false alarms in some cases. The code
has now been reorganized so that the checks for validity come before
the call to badport_bandlim. Additionally, a few changes in the symbolic
names of the bandlim types have been made, as well as a clarification of
exactly which type each RST case falls under.
Submitted by: Mike Silbersack <silby@silby.com>
turned on, and the case of it not being defined at all.
i.e. Disabling bridging re-enables some of the checks it disables.
Submitted by: "Rogier R. Mulhuijzen" <drwilco@drwilco.net>
available, the error return should be EADDRNOTAVAIL rather than
EAGAIN.
PR: 14181
Submitted by: Dima Dorfman <dima@unixfreak.org>
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
splimp() -- we need it because dummynet can be invoked by the
bridging code at splimp().
This should cure the pipe "stalls" that several people have been
reporting on -stable while using bridging+dummynet (the problem
would not affect routers using dummynet).
reserved and now allocated TCP flags in incoming packets. This patch
stops overloading those bits in the IP firewall rules, and moves
colliding flags to a seperate field, ipflg. The IPFW userland
management tool, ipfw(8), is updated to reflect this change. New TCP
flags related to ECN are now included in tcp.h for reference, although
we don't currently implement TCP+ECN.
o To use this fix without completely rebuilding, it is sufficient to copy
ip_fw.h and tcp.h into your appropriate include directory, then rebuild
the ipfw kernel module, and ipfw tool, and install both. Note that a
mismatch between module and userland tool will result in incorrect
installation of firewall rules that may have unexpected effects. This
is an MFC candidate, following shakedown. This bug does not appear
to affect ipfilter.
Reviewed by: security-officer, billf
Reported by: Aragon Gouveia <aragon@phat.za.net>
to supress logging when ARP replies arrive on the wrong interface:
"/kernel: arp: 1.2.3.4 is on dc0 but got reply from 00:00:c5:79:d0:0c on dc1"
the default is to log just to give notice about possibly incorrectly
configured networks.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.
messages send by routers when they deny our traffic, this causes
a timeout when trying to connect to TCP ports/services on a remote
host, which is blocked by routers or firewalls.
rfc1122 (Requirements for Internet Hosts) section 3.2.2.1 actually
requi re that we treat such a message for a TCP session, that we
treat it like if we had recieved a RST.
quote begin.
A Destination Unreachable message that is received MUST be
reported to the transport layer. The transport layer SHOULD
use the information appropriately; for example, see Sections
4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol
that has its own mechanism for notifying the sender that a
port is unreachable (e.g., TCP, which sends RST segments)
MUST nevertheless accept an ICMP Port Unreachable for the
same purpose.
quote end.
I've written a small extension that implement this, it also create
a sysctl "net.inet.tcp.icmp_admin_prohib_like_rst" to control if
this new behaviour is activated.
When it's activated (set to 1) we'll treat a ICMP administratively
prohibited message (icmp type 3 code 9, 10 and 13) for a TCP
sessions, as if we recived a TCP RST, but only if the TCP session
is in SYN_SENT state.
The reason for only reacting when in SYN_SENT state, is that this
will solve the problem, and at the same time minimize the risk of
this being abused.
I suggest that we enable this new behaviour by default, but it
would be a change of current behaviour, so if people prefer to
leave it disabled by default, at least for now, this would be ok
for me, the attached diff actually have the sysctl set to 0 by
default.
PR: 23086
Submitted by: Jesper Skriver <jesper@skriver.dk>
1. ICMP ECHO and TSTAMP replies are now rate limited.
2. RSTs generated due to packets sent to open and unopen ports
are now limited by seperate counters.
3. Each rate limiting queue now has its own description, as
follows:
Limiting icmp unreach response from 439 to 200 packets per second
Limiting closed port RST response from 283 to 200 packets per second
Limiting open port RST response from 18724 to 200 packets per second
Limiting icmp ping response from 211 to 200 packets per second
Limiting icmp tstamp response from 394 to 200 packets per second
Submitted by: Mike Silbersack <silby@silby.com>
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.
IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.
Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.
* Some dummynet code incorrectly handled a malloc()-allocated pseudo-mbuf
header structure, called "pkt," and could consequently pollute the mbuf
free list if it was ever passed to m_freem(). The fix involved passing not
pkt, but essentially pkt->m_next (which is a real mbuf) to the mbuf
utility routines.
* Also, for dummynet, in bdg_forward(), made the code copy the ethernet header
back into the mbuf (prepended) because the dummynet code that follows expects
it to be there but it is, unfortunately for dummynet, passed to bdg_forward
as a seperate argument.
PRs: kern/19551 ; misc/21534 ; kern/23010
Submitted by: Thomas Moestl <tmoestl@gmx.net>
Reviewed by: bmilekic
Approved by: luigi
instead.
Also, fix a small set of "avail." If we're setting `avail,' we shouldn't
be re-checking whether m_flags is M_EXT, because we know that it is, as if
it wasn't, we would have already returned several lines above.
Reviewed by: jlemon
only be checked if the system is currently performing New Reno style
fast recovery. However, this value was being checked regardless of the
NR state, with the end result being that the congestion window was never
opened.
Change the logic to check t_dupack instead; the only code path that
allows it to be nonzero at this point is NewReno, so if it is nonzero,
we are in fast recovery mode and should not touch the congestion window.
Tested by: phk
PPTP links are no longer dropped by simple (and inappropriate in this
case) "inactivity timeout" procedure, only when requested through the
control connection.
It is now possible to have multiple PPTP servers running behind NAT.
Just redirect the incoming TCP traffic to port 1723, everything else
is done transparently.
Problems were reported and the fix was tested by:
Michael Adler <Michael.Adler@compaq.com>,
David Andersen <dga@lcs.mit.edu>
<sys/proc.h> to <sys/systm.h>.
Correctly document the #includes needed in the manpage.
Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.
Define __offsetof() in <machine/ansi.h>
Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>
Remove myriad of local offsetof() definitions.
Remove includes of <stddef.h> in kernel code.
NB: Kernelcode should *never* include from /usr/include !
Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.
Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.
Paritials reviews by: various.
Significant brucifications by: bde
of IP datagram. This fixes the problem when firewall denied fragmented
packets whose last fragment was less than minimum protocol header size.
Found by: Harti Brandt <brandt@fokus.gmd.de>
PR: kern/22309
in the code enforces this. So, do not check for and attempt a
false reassembly if only IP_RF is set.
Also, removed the dead code, since we no longer use dtom() on
return from ip_reass().
statistics on a per network address basis.
Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.
Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is evoked, instead of displaying the per-interface
stats.
and instead reapply the revision 1.49 of mbuf.h, i.e.
Fixed regression of the type of the `header' member of struct pkthdr from
`void *' to caddr_t in rev.1.51. This mainly caused an annoying warning
for compiling ip_input.c.
Requested by: bde