jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.
Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.
The patch being committed is not identical to the patch
in the PR. The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely. This change has also been
presented and addressed on the freebsd-hackers mailing
list.
Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800
possible while maintaining compatibility with the widest range of TCP stacks.
The algorithm is as follows:
---
For connections in the ESTABLISHED state, only resets with
sequence numbers exactly matching last_ack_sent will cause a reset,
all other segments will be silently dropped.
For connections in all other states, a reset anywhere in the window
will cause the connection to be reset. All other segments will be
silently dropped.
---
The necessity of accepting all in-window resets was discovered
by jayanth and jlemon, both of whom have seen TCP stacks that
will respond to FIN-ACK packets with resets not meeting the
strict last_ack_sent check.
Idea by: Darren Reed
Reviewed by: truckman, jlemon, others(?)
1. rt_check() cleanup:
rt_check() is only necessary for some address families to gain access
to the corresponding arp entry, so call it only in/near the *resolve()
routines where it is actually used -- at the moment this is
arpresolve(), nd6_storelladdr() (the call is embedded here),
and atmresolve() (the call is just before atmresolve to reduce
the number of changes).
This change will make it a lot easier to decouple the arp table
from the routing table.
There is an extra call to rt_check() in if_iso88025subr.c to
determine the routing info length. I have left it alone for
the time being.
The interface of arpresolve() and nd6_storelladdr() now changes slightly:
+ the 'rtentry' parameter (really a hint from the upper level layer)
is now passed unchanged from *_output(), so it becomes the route
to the final destination and not to the gateway.
+ the routines will return 0 if resolution is possible, non-zero
otherwise.
+ arpresolve() returns EWOULDBLOCK in case the mbuf is being held
waiting for an arp reply -- in this case the error code is masked
in the caller so the upper layer protocol will not see a failure.
2. arpcom untangling
Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
and use the IFP2AC macro to access arpcom fields.
This mostly affects the netatalk code.
=== Detailed changes: ===
net/if_arcsubr.c
rt_check() cleanup, remove a useless variable
net/if_atmsubr.c
rt_check() cleanup
net/if_ethersubr.c
rt_check() cleanup, arpcom untangling
net/if_fddisubr.c
rt_check() cleanup, arpcom untangling
net/if_iso88025subr.c
rt_check() cleanup
netatalk/aarp.c
arpcom untangling, remove a block of duplicated code
netatalk/at_extern.h
arpcom untangling
netinet/if_ether.c
rt_check() cleanup (change arpresolve)
netinet6/nd6.c
rt_check() cleanup (change nd6_storelladdr)
from tcp_hostcache would have overridden a (now) lower MTU of
an interface or route that changed since first PMTU discovery.
The bug would have caused TCP to redo the PMTU discovery when
not strictly necessary.
Make a comment about already pre-initialized default values
more clear.
Reviewed by: sam
source address of a packet exists in the routing table. The
default route is ignored because it would match everything and
render the check pointless.
This option is very useful for routers with a complete view of
the Internet (BGP) in the routing table to reject packets with
spoofed or unrouteable source addresses.
Example:
ipfw add 1000 deny ip from any to any not versrcreach
also known in Cisco-speak as:
ip verify unicast source reachable-via any
Reviewed by: luigi
implementation taken directly from OpenBSD.
I've resisted committing this for quite some time because of concern over
TIME_WAIT recycling breakage (sequential allocation ensures that there is a
long time before ports are recycled), but recent testing has shown me that
my fears were unwarranted.
TIME_WAIT recycling cases I was able to generate with http testing tools.
In short, as the old algorithm relied on ticks to create the time offset
component of an ISN, two connections with the exact same host, port pair
that were generated between timer ticks would have the exact same sequence
number. As a result, the second connection would fail to pass the TIME_WAIT
check on the server side, and the SYN would never be acknowledged.
I've "fixed" this by adding random positive increments to the time component
between clock ticks so that ISNs will *always* be increasing, no matter how
quickly the port is recycled.
Except in such contrived benchmarking situations, this problem should never
come up in normal usage... until networks get faster.
No MFC planned, 4.x is missing other optimizations that are needed to even
create the situation in which such quick port recycling will occur.
in favour of rtalloc_ign(), which is what would end up being called
anyways.
There are 25 more instances of rtalloc() in net*/ and
about 10 instances of rtalloc_ign()
we convert ip_len into a network byte order; in_delayed_cksum() still
expects it in host byte order.
The symtom was the ``in_cksum_skip: out of data by %d'' complaints
from the kernel.
To add to the previous commit log. These fixes make tcpdump(1) happy
by not complaining about UDP/TCP checksum being bad for looped back
IP multicast when multicast router is deactivated.
Reported by: Vsevolod Lobko
to implement this mistake.
Fixed some nearby style bugs (initialization in declaration, misformatting
of this initialization, missing blank line after the declaration, and
comparision of the non-boolean result of the initialization with 0 using
"!". In KNF, "!" is not even used to compare booleans with 0).
It was fixed by moving problemetic checks, as well as checks that
doesn't need locking before locks are acquired.
Submitted by: Ryan Sommers <ryans@gamersimpact.com>
In co-operation with: cperciva, maxim, mlaier, sam
Tested by: submitter (previous patch), me (current patch)
Reviewed by: cperciva, mlaier (previous patch), sam (current patch)
Approved by: sam
Dedicated to: enough!
+ struct ifnet: remove unused fields, move ipv6-related field close
to each other, add a pointer to l3<->l2 translation tables (arp,nd6,
etc.) for future use.
+ struct route: remove an unused field, move close to each
other some fields that might likely go away in the future
Previously, Giant would be grabbed at entry to the IP local delivery code
when debug.mpsafenet was set to true, as that implied Giant wouldn't be
grabbed in the driver path. Now, we will use this primitive to
conditionally grab Giant in the event the entire network stack isn't
running MPSAFE (debug.mpsafenet == 0).
Compute the payload checksum for a locally originated IP multicast where
God intended, in ip_mloopback(), rather than doing it in ip_output() and
only when multicast router is active. This is more correct as we do not
fool ip_input() that the packet has the correct payload checksum when in
fact it does not (when multicast router is inactive). This is also more
efficient if we don't join the multicast group we send to, thus allowing
the hardware to checksum the payload.
- Add gre_mtx to protect global softc list.
- Hold gre_mtx over various list operations (insert, delete).
- Centralize if_gre interface teardown in gre_destroy(), and call this
from modevent unload and gre_clone_destroy().
- Export gre_mtx to ip_gre.c, which walks the gre list to look up gre
interfaces during encapsulation. Add a wonking comment on how we need
some sort of drain/reference count mechanism to keep gre references
alive while in use and simultaneous destroy.
This commit does not lockdown softc data, which follows in a future
commit.
- Add encapmtx to protect ip_encap.c global variables (encapsulation
list).
- Unifdef #ifdef 0 pieces of encap_init() which was (and now really
is) basically a no-op.
- Lock encapmtx when walking encaptab, modifying it, comparing
entries, etc.
- Remove spl's.
Note that currently there's no facilite to make sure outstanding
use of encapsulation methods on a table entry have drained bfore
we allow a table entry to be removed. As such, it's currently the
caller's responsibility to make sure that draining takes place.
Reviewed by: mlaier
ifp is now passed explicitly to ether_demux; no need to look it up again.
Make mtag a global var in ip_input.
Noticed by: rwatson
Approved by: bms(mentor)
to NET_UNLOCK_GIANT(). While they are used in similar ways, the
semantics are quite different -- NET_LOCK_GIANT() and NET_UNLOCK_GIANT()
directly wrap mutex lock and unlock operations, whereas drop/pickup
special case the handling of Giant recursion. Add a comment saying
as much.
Add NET_ASSERT_GIANT(), which conditionally asserts Giant based
on the value of debug_mpsafenet.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.
Approved by: bms(mentor)