it is marked as RTF_UP. This appears to fix a crash that was sometimes
triggered when dhclient(8) tried to send a packet after an interface
had been detatched.
Reviewed by: sam
o pickup Giant in divert_packet to protect sbappendaddr since it
can be entered through MPSAFE callouts or through ip_input when
mpsafenet is 1
o add missing locking on output
o add locking to abort and shutdown
o add a ctlinput handler to invalidate held routing table references
on an ICMP redirect (may not be needed)
Supported by: FreeBSD Foundation
o add assertions in tcp_respond to validate inpcb locking assumptions
o use local variable instead of chasing pointers in tcp_respond
Supported by: FreeBSD Foundation
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive
Supported by: FreeBSD Foundation
has a LOR between IPFW inpcb locks but I'm committing it now as the
lesser of two evils (the other being unlocked use of in_pcblookup).
Supported by: FreeBSD Foundation
to a routing table entry w/o bumping the reference count or locking
against the entry being free'd. This caused major havoc (for some
reason it appeared most frequently for folks running natd). Fix
is to bump the reference count whenever we copy the route cache
contents into a private copy so the entry cannot be reclaimed out
from under us. This is a short term fix as the forthcoming routing
table changes will eliminate this cache entirely.
Supported by: FreeBSD Foundation
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all. secpolicy no longer contain
spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy. assign ID field to
all SPD entries. make it possible for racoon to grab SPD entry on
pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header. a mode is always needed
to compare them.
- fixed that the incorrect time was set to
sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
XXX in theory refcnt should do the right thing, however, we have
"spdflush" which would touch all SPs. another solution would be to
de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion. ipsec_*_policy ->
ipsec_*_pcbpolicy.
- avoid variable name confusion.
(struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
"src" of the spidx specifies ICMP type, and the port field in "dst"
of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
kernel forwards the packets.
Tested by: nork
Obtained from: KAME
header copy made on input path: this is now handled differently.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
directly on the radix tree and does not hold any routing table refernces.
This fixes the reference counting problems that manifested itself as a
panic during unmount of filesystems that were mounted by NFS over an
interface that had been removed.
Supported by: FreeBSD Foundation
previously only considered the send sequence space. Unfortunately,
some OSes (windows) still use a random positive increments scheme for
their syn-ack ISNs, so I must consider receive sequence space as well.
The value of 250000 bytes / second for Microsoft's ISN rate of increase
was determined by testing with an XP machine.
we will generate for a given ip/port tuple has advanced far enough
for the time_wait socket in question to be safely recycled.
- Have in_pcblookup_local use tcp_twrecycleable to determine if
time_Wait sockets which are hogging local ports can be safely
freed.
This change preserves proper TIME_WAIT behavior under normal
circumstances while allowing for safe and fast recycling whenever
ephemeral port space is scarce.
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).
Supported by: FreeBSD Foundation
the module. Previously we grabbed the mutex used by the callouts,
then stopped the callout with callout_stop, but if the callout
was already active and blocked by the mutex then it would continue
later and reference the mutex after it was destroyed. Instead
stop the callout first then lock.
Supported by: FreeBSD Foundation
o add some more debugging help for figuring out why folks are
getting complaints about releasing routing table entries with
a zero refcnt
o fix comment that talked about spl's
o remove duplicate define of DUMMYNET_DEBUG
Supported by: FreeBSD Foundation
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
make ip{,6}_ecn_egress() return integer to tell the caller that
this packet should be dropped.
- handle ECN at fragment reassembly in ip_input.c and frag6.c.
Obtained from: KAME
with an mbuf until it is reclaimed. This is in contrast to tags that
vanish when an mbuf chain passes through an interface. Persistent tags
are used, for example, by MAC labels.
Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp. This fixes problems
with "tag leakage".
Pointed out by: Jonathan Stone
Reviewed by: Robert Watson
(aka RFC2292bis). Though I believe this commit doesn't break
backward compatibility againt existing binaries, it breaks
backward compatibility of API.
Now, the applications which use Advanced Sockets API such as
telnet, ping6, mld6query and traceroute6 use RFC3542 API.
Obtained from: KAME
that at most 20% of sockets can be in time_wait at one time, ensuring
that time_wait sockets do not starve real connections from inpcb
structures.
No implementation change is needed, jlemon already implemented a nice
LRU-ish algorithm for tcp_tw structure recycling.
This should reduce the need for sysadmins to lower the default msl on
busy servers.
when loaded as a module
o cleanup data structures on module unload when no application has
been started (i.e. kldload, kldunload w/o mrtd)
o remove extraneous unlocks immediately prior to destroying them
Supported by: FreeBSD Foundation
address has been changed when PFIL_HOOKS is enabled and, if it has,
arrange for the proper action by ip*_forward.
Supported by: FreeBSD Foundation
Submitted by: Pyun YongHyeon
trashed after being freed. This has caused several panics including
kern/42277 related to soft updates. Jim Kuhn tracked the problem
down to ipfw limit rule processing. In the expiry of dynamic rules,
it is possible for an O_LIMIT_PARENT rule to be removed when it still
has live children. When the children eventually do expire, a pointer
to the (long gone) parent is dereferenced and a count decremented.
Since this memory can, and is, allocated for other purposes (in the
case of kern/42277 an inodedep structure), chaos ensues. The offset
in question in inodedep is the offset of the 16 bit count field in
the ipfw2 ipfw_dyn_rule.
Submitted by: Jim Kuhn <jkuhn@sandvine.com>
Reviewed by: "Evgueni V. Gavrilov" <aquatique@rusunix.org>
Reviewed by: Ben Pfountz <netprince@vt.edu>
MFC after: 1 week
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.
Other/related changes:
o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts
Notes:
1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)
RTF_STATIC routes. Do not check for RTF_HOST so as to avoid being DoSed
when an RTF_GENMASK route exists in the table.
Add a more verbose comment about exactly what this code does.
Submitted by: ru
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules
Heavy lifting by: "Max Laier" <max@love2party.net>
Supported by: FreeBSD Foundation
Obtained from: NetBSD (bits of pfil.h and pfil.c)
attached network could exhaust kernel memory, and cause a system
panic, by sending a flood of spoofed ARP requests.
Approved by: jake (mentor)
Reported by: Apple Product Security <product-security@apple.com>
Skinny is the protocol used by Cisco IP phones to talk to Cisco Call
Managers. With this code, one can use a Cisco IP phone behind a FreeBSD
NAT gateway.
Currently, having the Call Manager behind the NAT gateway is not supported.
More information on enabling Skinny support in libalias, natd, and ppp
can be found in those applications' manpages.
PR: 55843
Reviewed by: ru
Approved by: ru
MFC after: 30 days
sending an ICMP packet doesn't cause a panic. A better solution is needed;
possibly defering the transmit to a dedicated thread.
Observed by: "Aaron Wohl" <freebsd@soith.com>
o change timeout to MPSAFE callout
o restructure rule deletion to deal with locking requirements
o replace static buffer used for ipfw control operations with malloc'd storage
Sponsored by: FreeBSD Foundation
o change time to MPSAFE callout
o make debug printfs conditional on DUMMYNET_DEBUG and runtime controllable
by net.inet.ip.dummynet.debug
o make boot-time printf dependent on bootverbose
Sponsored by: FreeBSD Foundation
Special thanks to Pavlin Radoslavov <pavlin@icir.org> for testing and
fixing numerous problems.
Sponsored by: FreeBSD Foundation
Reviewed by: Pavlin Radoslavov <pavlin@icir.org>
Changes from the original implementation:
- Fragmentation is handled by the function m_fragment, which can
be called from whereever fragmentation is needed. Note that this
function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing
use.
- m_fragment works slightly differently from the old fragmentation
code in that it allocates a seperate mbuf cluster for each fragment.
This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent
fragments. While that is a nice feature in practice, it nerfed the
usefulness of mbuf_stress_test.
- Add two modes of random fragmentation. Chains with fragments all of
the same random length and chains with fragments that are each uniquely
random in length may now be requested.
NB: There is a known LOR on the forwarding path; this needs to be resolved
together with a similar issue in the bridge. For the moment it is
believed to be benign.
Sponsored by: FreeBSD Fondation
returned mbuf can be NULL. Check for NULL in rip_output() when
prepending an IP header. This prevents mbuf exhaustion from
causing a local kernel panic when sending raw IP packets.
PR: kern/55886
Reported by: Pawel Malachowski <pawmal-posting@freebsd.lublin.pl>
MFC after: 3 days
mac_reflect_mbuf_icmp()
mac_reflect_mbuf_tcp()
These entry points permit MAC policies to do "update in place"
changes to the labels on ICMP and TCP mbuf headers when an ICMP or
TCP response is generated to a packet outside of the context of
an existing socket. For example, in respond to a ping or a RST
packet to a SYN on a closed port.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
from queue(3).
Improve vertical compactness by using a IGMP_PRINTF() macro rather
than #ifdefing IGMP_DEBUG a large number of debugging printfs.
Reviewed by: mdodd (SLIST changes)
specific interfaces. This is required by aodvd, and may in future help us
in getting rid of the requirement for BPF from our import of isc-dhcp.
Suggested by: fenestro
Obtained from: BSD/OS
Reviewed by: mini, sam
Approved by: jake (mentor)
of bw_meter entries were processed up to one second ahead.
After an unappropriate rescheduling of some of the bw_meter
entries, the upcalls weren't delivered.
* pim_register_prepare() uses the appropriate sw_csum flag to
call ip_fragment() so the IP checksum is computed properly.
* Modify pim_register_prepare() to take care of IP packets that
don't need fragmentation.
* Add-back in_delayed_cksum() to encap_send(), because it seems it
should be there.
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
binaries in /bin and /sbin installed in /lib. Only the versioned files
reside in /lib, the .so symlink continues to live /usr/lib so the
toolchain doesn't need to be modified.
segments are lost for the application. This broke, for example,
ports/benchmarks/dbs which needs the SYN segment to filter the
contents of the trace buffer for the connection it is interested in.
This patch makes the SYN segments available again. Unfortunately they
are now associated with the listening socket instead of the new one, so
a change to applications is required, but without this patch it wouldn't
work altogether.
PR: kern/45966
code has rotten a bit so that the header length is not correct at
the point when tcp_trace is called. Temporarily compute the correct
value before the call and restore the old value after. This makes
ports/benchmarks/dbs to almost work.
This is a NOP unless you compile with TCPDEBUG.
in tcp_input that leave the function before hitting the tcp_trace
function call for the TCPDEBUG option. This has made TCPDEBUG mostly
useless (and tools like ports/benchmarks/dbs not working). Add
tcp_trace calls to the return paths that could be identified in this
maze.
This is a NOP unless you compile with TCPDEBUG.
new ATMIOCOPENVCC/CLOSEVCC. This allows us to not only use UBR channels
for IP over ATM, but also CBR, VBR and ABR. Change the format of the
link layer address to specify the channel characteristics. The old
format is still supported and opens UBR channels.
Disabled by default. To enable it, the new "options PIM" must be
added to the kernel configuration file (in addition to MROUTING):
options MROUTING # Multicast routing
options PIM # Protocol Independent Multicast
2. Add support for advanced multicast API setup/configuration and
extensibility.
3. Add support for kernel-level PIM Register encapsulation.
Disabled by default. Can be enabled by the advanced multicast API.
4. Implement a mechanism for "multicast bandwidth monitoring and upcalls".
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
This change allows one to specify almost the complete traffic parameters
for IPoverATM channels through the routing table. Up to now we used
4 byte DL addresses (flag, vpi, vciH, vciL). This format is still allowed.
If the address is longer, however, the 5th byte is interpreted as the
traffic class (UBR, CBR, VBR or ABR) and the remaining bytes are the
parameters for this traffic class:
UBR: 0 byte or 3 byte PCR
CBR: 3 byte PCR
VBR: 3 byte PCR, 3 byte SCR, 3 byte MBS
ABR: 3 byte PCR, 3 byte MCR, 3 byte ICR, 3 byte TBE, 1 byte NRM,
1 byte TRM, 2 bytes ADTF, 1 byte RIF, 1 byte RDF and 1 byte CDF
A script to generate the corresponding 'route add' arguments will follow soon.
sysctl:
- sysctlbyname("net.inet.ip.mfctable", ...)
- sysctlbyname("net.inet.ip.viftable", ...)
This change is needed so netstat can use sysctlbyname() to read
the data from those tables.
Otherwise, in some cases "netstat -g" may fail to report the
multicast forwarding information (e.g., if we run a multicast
router on PicoBSD).
* Bug fix: when sending IGMPMSG_WRONGVIF upcall to the multicast
routing daemon, set properly "im->im_vif" to the receiving
incoming interface of the packet that triggered that upcall
rather than to the expected incoming interface of that packet.
* Bug fix: add missing increment of counter "mrtstat.mrts_upcalls"
* Few formatting nits (e.g., replace extra spaces with TABs)
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
code used to call rtrequest(RTM_DELETE, ...). This is a problem, because
the function that just has called us (route_output)
is not really happy with the route it just is creating beeing ripped out
from under it. Unfortunately we also cannot return an error from
ifa_rtrequest. Therefore mark the route just as RTF_REJECT.
check for raw IP system management operations is often (although
not always) implicit due to the namespacing of raw IP sockets. I.e.,
you have to have privilege to get a raw IP socket, so much of the
management code sitting on raw IP sockets assumes that any requests
on the socket should be granted privilege.
Obtained from: TrustedBSD Project
Product of: France
Set 31 is still special because rules belonging to it are not deleted
by the "ipfw flush" command, but must be deleted explicitly with
"ipfw delete set 31" or by individual rule numbers.
This implement a flexible form of "persistent rules" which you might
want to have available even after an "ipfw flush".
Note that this change does not violate POLA, because you could not
use set 31 in a ruleset before this change.
sbin/ipfw changes to allow manipulation of set 31 will follow shortly.
Suggested by: Paul Richards
lastest rev of the spec. Use an explicit flag for Fast Recovery. [1]
Fix bug with exiting Fast Recovery on a retransmit timeout
diagnosed by Lu Guohan. [2]
Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com>
Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2]
Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>,
Sally Floyd <floyd@acm.org> [1]
Since we already had 'O_NOP' instructions which always match, all
I needed to do is allow the NOP command to have arbitrary length
(i.e. move its label in a different part of the switch() which
validates instructions).
The kernel must know nothing about comments, everything else is
done in userland (which will be described in the upcoming ipfw2.c
commit).
support matching a list of addr/mask pairs so one can write
more efficient rulesets which were not possible before e.g.
add 100 skipto 1000 not src-ip 10.0.0.0/8,127.0.0.1/8,192.168.0.0/16
The change is fully backward compatible.
ipfw2 and manpage commit to follow.
MFC after: 3 days
Should work with both regular and fast ipsec (mutually exclusive).
See manpage for more details.
Submitted by: Ari Suutari (ari.suutari@syncrontech.com)
Revised by: sam
MFC after: 1 week
"ipid" options. This feature has been requested by several users.
On passing, fix some minor bugs in the parser. This change is fully
backward compatible so if you have an old /sbin/ipfw and a new
kernel you are not in trouble (but you need to update /sbin/ipfw
if you want to use the new features).
Document the changes in the manpage.
Now you can write things like
ipfw add skipto 1000 iplen 0-500
which some people were asking to give preferential treatment to
short packets.
The 'MFC after' is just set as a reminder, because I still need
to merge the Alpha/Sparc64 fixes for ipfw2 (which unfortunately
change the size of certain kernel structures; not that it matters
a lot since ipfw2 is entirely optional and not the default...)
PR: bin/48015
MFC after: 1 week
this makes connect act more sensibly in these cases.
PR: 50839
Submitted by: Barney Wolff <barney@pit.databus.com>
Patch delayed by laziness of: silby
MFC after: 1 week
is not called, and no static rules match an outgoing packet, the
latter retains its source IP address. This is in support of the
"static NAT only" mode.
only meaningful for fragments. Also don't bother to byte-swap the
ip_id when we do generate it; it is only used at the receiver as a
nonce. I tried several different permutations of this code with no
measurable difference to each other or to the unmodified version, so
I've settled on the one for which gcc seems to generate the best code.
(If anyone cares to microoptimize this differently for an architecture
where it actually matters, feel free.)
Suggested by: Steve Bellovin's paper in IMW'02
sure that the MAC label on TCP responses during TIMEWAIT is
properly set from either the socket (if available), or the mbuf
that it's responding to.
Unfortunately, this is made somewhat difficult by the TCP code,
as tcp_twstart() calls tcp_twrespond() after discarding the socket
but without a reference to the mbuf that causes the "response".
Passing both the socket and the mbuf works arounds this--eventually
it might be good to make sure the mbuf always gets passed in in
"response" scenarios but working through this provided to
complicate things too much.
Approved by: re (scottl)
Reviewed by: hsu
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
copying for mbuf headers now works properly in m_dup_pkthdr(), so
we don't need to do an explicit copy.
Approved by: re (jhb)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
state. Those changed attempted to work around the changed invariant
that inp->in_socket was sometimes now NULL, but the logic wasn't
quite right, meaning that inp->in_socket would be dereferenced by
cr_canseesocket() if security.bsd.see_other_uids, jail, or MAC
were in use. Attempt to clarify and correct the logic.
Note: the work-around originally introduced with the reduced TCP
wait state handling to use cr_cansee() instead of cr_canseesocket()
in this case isn't really right, although it "Does the right thing"
for most of the cases in the base system. We'll need to address
this at some point in the future.
Pointed out by: dcs
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
of asserting that an mbuf has a packet header. Use it instead of hand-
rolled versions wherever applicable.
Submitted by: Hiten Pandya <hiten@unixdaemons.com>