Commit Graph

2005 Commits

Author SHA1 Message Date
mlaier
5eba798674 Commit pf version 3.5 and link additional files to the kernel build.
Version 3.5 brings:
 - Atomic commits of ruleset changes (reduce the chance of ending up in an
   inconsistent state).
 - A 30% reduction in the size of state table entries.
 - Source-tracking (limit number of clients and states per client).
 - Sticky-address (the flexibility of round-robin with the benefits of
   source-hash).
 - Significant improvements to interface handling.
 - and many more ...
2004-06-16 23:24:02 +00:00
mlaier
18ff360027 Prepare for pf 3.5 import:
- Remove pflog and pfsync modules. Things will change in such a fashion
   that there will be one module with pf+pflog that can be loaded into
   GENERIC without problems (which is what most people want). pfsync is no
   longer possible as a module.
 - Add multicast address for in-kernel multicast pfsync protocol. Protocol
   glue will follow once the import is done.
 - Add one more mbuf tag
2004-06-16 22:59:06 +00:00
maxim
32bf9d060d o connect(2): if there is no a route to the destination
do not pick up the first local ip address for the source
ip address, return ENETUNREACH instead.

Submitted by:	Gleb Smirnoff
Reviewed by:	-current (silence)
2004-06-16 10:02:36 +00:00
bms
cafb94bcea Fix build for IPSEC && !INET6
PR:		kern/66125
Submitted by:	Cyrille Lefevre
2004-06-16 09:35:07 +00:00
bms
ff08157f93 Reverse a patch which has no effect on -CURRENT and should probably be
applied directly to -STABLE.

Noticed by:	iedowse
Pointy hat to:	bms
2004-06-16 08:50:14 +00:00
bms
075deb0c4c In ip_forward(), when calculating the MTU in effect for an IPSEC transport
mode tunnel, take the per-route MTU into account, *if* and *only if* it
is non-zero (as found in struct rt_metrics/rt_metrics_lite).

PR:		kern/42727
Obtained from:	NetBSD (ip_input.c rev 1.151)
2004-06-16 08:33:09 +00:00
bms
e8e6ce5e86 In ip_forward(), set m->m_pkthdr.len correctly such that the mbuf chain
is sane, and ipsec4_getpolicybyaddr() will therefore complete.

PR:		kern/42727
Obtained from:	KAME (kame/freebsd4/sys/netinet/ip_input.c rev 1.42)
2004-06-16 08:28:54 +00:00
bms
deb499d51d Disconnect a temporarily-connected UDP socket in out-of-mbufs case. This
fixes the problem of UDP sockets getting wedged in a connected state (and
bound to their destination) under heavy load.
Temporary bind/connect should probably be deleted in future
as an optimization, as described in "A Faster UDP" [Partridge/Pink 1993].

Notes:
 - INP_LOCK() is already held in udp_output(). The connection is in effect
   happening at a layer lower than the socket layer, therefore in theory
   socket locking should not be needed.
 - Inlining the in_pcbdisconnect() operation buys us nothing (in the case
   of the current state of the code), as laddr is not part of the
   inpcb hash or the udbinfo hash. Therefore there should be no need
   to rehash after restoring laddr in the error case (this was a
   concern of the original author of the patch).

PR:		kern/41765
Requested by:	gnn
Submitted by:	Jinmei Tatuya (with cleanups)
Tested by:	spray(8)
2004-06-16 05:41:00 +00:00
rwatson
1e2bc9e8f6 Convert GIANT_REQUIRED to NET_ASSERT_GIANT for socket access. 2004-06-16 03:36:06 +00:00
rwatson
029226f3a8 Grab the socket buffer send or receive mutex when performing a
read-modify-write on the sb_state field.  This commit catches only
the "easy" ones where it doesn't interact with as yet unmerged
locking.
2004-06-15 03:51:44 +00:00
rwatson
f2c0db1521 The socket field so_state is used to hold a variety of socket related
flags relating to several aspects of socket functionality.  This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties.  The
following fields are moved to sb_state:

  SS_CANTRCVMORE            (so_state)
  SS_CANTSENDMORE           (so_state)
  SS_RCVATMARK              (so_state)

Rename respectively to:

  SBS_CANTRCVMORE           (so_rcv.sb_state)
  SBS_CANTSENDMORE          (so_snd.sb_state)
  SBS_RCVATMARK             (so_rcv.sb_state)

This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts).  In the future, we may
wish to coallesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
2004-06-14 18:16:22 +00:00
mlaier
977d97b004 Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by:	(i386)LINT
2004-06-13 17:29:10 +00:00
dfr
a1fa8042f5 Add a new driver to support IP over firewire. This driver is intended to
conform to the rfc2734 and rfc3146 standard for IP over firewire and
should eventually supercede the fwe driver. Right now the broadcast
channel number is hardwired and we don't support MCAP for multicast
channel allocation - more infrastructure is required in the firewire
code itself to fix these problems.
2004-06-13 10:54:36 +00:00
rwatson
f1bc833e95 Socket MAC labels so_label and so_peerlabel are now protected by
SOCK_LOCK(so):

- Hold socket lock over calls to MAC entry points reading or
  manipulating socket labels.

- Assert socket lock in MAC entry point implementations.

- When externalizing the socket label, first make a thread-local
  copy while holding the socket lock, then release the socket lock
  to externalize to userspace.
2004-06-13 02:50:07 +00:00
rwatson
82295697cd Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
  soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
  so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
  various contexts in the stack, both at the socket and protocol
  layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
  this could result in frobbing of a non-present socket if
  sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
  they don't free the socket.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
2004-06-12 20:47:32 +00:00
csjp
5931514a19 Modify ip fw so that whenever UID or GID constraints exist in a
ruleset, the pcb is looked up once per ipfw_chk() activation.

This is done by extracting the required information out of the PCB
and caching it to the ipfw_chk() stack. This should greatly reduce
PCB looking contention and speed up the processing of UID/GID based
firewall rules (especially with large UID/GID rulesets).

Some very basic benchmarks were taken which compares the number
of in_pcblookup_hash(9) activations to the number of firewall
rules containing UID/GID based contraints before and after this patch.

The results can be viewed here:
o http://people.freebsd.org/~csjp/ip_fw_pcb.png

Reviewed by:	andre, luigi, rwatson
Approved by:	bmilekic (mentor)
2004-06-11 22:17:14 +00:00
rwatson
e7bc2202da Remove unneeded Giant acquisition in divert_packet(), which is
left over from debug.mpsafenet affecting only the forwarding
plane.  Giant is now acquired in the ithread/netisr or in the
system call code.
2004-06-11 04:06:51 +00:00
rwatson
71dd84f2b8 Lock down parallel router_info list for tracking multicast IGMP
versions of various routers seen:

- Introduce igmp_mtx.
- Protect global variable 'router_info_head' and list fields
  in struct router_info with this mutex, as well as
  igmp_timers_are_running.
- find_rti() asserts that the caller acquires igmp_mtx.
- Annotate a failure to check the return value of
  MALLOC(..., M_NOWAIT).
2004-06-11 03:42:37 +00:00
ru
30a30620ee init_tables() must be run after sys/net/route.c:route_init(). 2004-06-10 20:20:37 +00:00
ru
27bed143c8 Introduce a new feature to IPFW2: lookup tables. These are useful
for handling large sparse address sets.  Initial implementation by
Vsevolod Lobko <seva@ip.net.ua>, refined by me.

MFC after:	1 week
2004-06-09 20:10:38 +00:00
ume
4ef088056e do not send icmp response if the original packet is encrypted.
Obtained from:	KAME
MFC after:	1 week
2004-06-07 09:56:59 +00:00
bmilekic
dcf58b3319 Move the locking of the pcb into raw_output(). Organize code so
that m_prepend() is not called with possibility to wait while the
pcb lock is held.  What still needs revisiting is whether the
ripcbinfo lock is really required here.

Discussed with: rwatson
2004-06-03 03:15:29 +00:00
phk
f43aa0c4bc add missing #include <sys/module.h> 2004-05-30 20:27:19 +00:00
phk
d6f7d2bde6 Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>
2004-05-30 17:57:46 +00:00
csjp
3006bc70da Add a super-user check to ipfw_ctl() to make sure that the calling
process is a non-prison root. The security.jail.allow_raw_sockets
sysctl variable is disabled by default, however if the user enables
raw sockets in prisons, prison-root should not be able to interact
with firewall rule sets.

Approved by:	rwatson, bmilekic (mentor)
2004-05-25 15:02:12 +00:00
yar
45f0ba1547 When checking for possible port theft, skip over a TCP inpcb
unless it's in the closed or listening state (remote address
== INADDR_ANY).

If a TCP inpcb is in any other state, it's impossible to steal
its local port or use it for port theft.  And if there are
both closed/listening and connected TCP inpcbs on the same
localIP:port couple, the call to in_pcblookup_local() will
find the former due to the design of that function.

No objections raised in:	-net, -arch
MFC after:			1 month
2004-05-20 06:35:02 +00:00
maxim
a4dd24e359 o Calculate a number of bytes to copy (cnt) correctly:
+----+-+-+-+-+----+----+- - - - - - - - - - - -  -+----+
  |    | |C| | |    |    |                          |    |
  | IP |N|O|L|P|    | IP |                          | IP |
  | #1 |O|D|E|T|    | #2 |                          | #n |
  |    |P|E|N|R|    |    |                          |    |
  +----+-+-+-+-+----+----+- - - - - - - - - - - -  -+----+
               ^    ^<---- cnt - (IPOPT_MINOFF - 1) ---->|
               |    |
src            |    +-- cp[IPOPT_OFF + 1] + sizeof(struct in_addr)
               |
dst            +-- cp[IPOPT_OFF + 1]

PR:		kern/66386
Submitted by:	Andrei Iltchenko
MFC after:	3 weeks
2004-05-11 19:14:44 +00:00
maxim
5839c11830 o IFNAMSIZ does include the trailing \0.
Approved by:	andre

o Document net.inet.icmp.reply_src.
2004-05-07 01:24:53 +00:00
andre
832d1bd181 Provide the sysctl net.inet.ip.process_options to control the processing
of IP options.

 net.inet.ip.process_options=0  Ignore IP options and pass packets unmodified.
 net.inet.ip.process_options=1  Process all IP options (default).
 net.inet.ip.process_options=2  Reject all packets with IP options with ICMP
  filter prohibited message.

This sysctl affects packets destined for the local host as well as those
only transiting through the host (routing).

IP options do not have any legitimate purpose anymore and are only used
to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP
stacks.

Reviewed by:	sam (mentor)
2004-05-06 18:46:03 +00:00
rwatson
ff404935e2 Switch to using the inpcb MAC label instead of socket MAC label when
labeling new mbufs created from sockets/inpcbs in IPv4.  This helps avoid
the need for socket layer locking in the lower level network paths
where inpcb locks are already frequently held where needed.  In
particular:

- Use the inpcb for label instead of socket in raw_append().
- Use the inpcb for label instead of socket in tcp_output().
- Use the inpcb for label instead of socket in tcp_respond().
- Use the inpcb for label instead of socket in tcp_twrespond().
- Use the inpcb for label instead of socket in syncache_respond().

While here, modify tcp_respond() to avoid assigning NULL to a stack
variable and centralize assertions about the inpcb when inp is
assigned.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, McAfee Research
2004-05-04 02:11:47 +00:00
rwatson
e15e5d4977 Assert inpcb lock in udp_append().
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, McAfee Research
2004-05-04 01:08:15 +00:00
rwatson
2f2ce5b406 Assert the inpcb lock on 'last' in udp_append(), since it's always
called with it, and also requires it.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, McAfee Research
2004-05-04 00:10:16 +00:00
maxim
96efdc3250 o Fix misindentation in the previous commit. 2004-05-03 17:15:34 +00:00
andre
d18bda0b0d Back out a change that slipped into the previous commit for which other
supporting parts have not yet been committed.

Remove pre-mature IP options ignoring option.
2004-05-03 16:07:13 +00:00
andre
7e338f7bd0 Optimize IP fastforwarding some more:
o New function ip_findroute() to reduce code duplication for the
  route lookup cases. (luigi)

o Store ip_len in host byte order on the stack instead of using
  it via indirection from the mbuf.  This allows to defer the host
  byte conversion to a later point and makes a quicker fallback to
  normal ip_input() processing. (luigi)

o Check if route is dampned with RTF_REJECT flag and drop packet
  already here when ARP is unable to resolve destination address.
  An ICMP unreachable is sent to inform the sender.

o Check if interface output queue is full and drop packet already
  here.  No ICMP notification is sent because signalling source quench
  is depreciated.

o Check if media_state is down (used for ethernet type interfaces)
  and drop the packet already here.  An ICMP unreachable is sent to
  inform the sender.

o Do not account sent packets to the interface address counters.  They
  are only for packets with that 'ia' as source address.

o Update and clarify some comments.

Submitted by:	luigi (most of it)
2004-05-03 13:52:47 +00:00
darrenr
a50f040209 Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier. 2004-05-02 15:10:17 +00:00
darrenr
4e8ce3156b oops, I forgot this file in a prior commit (change was still sitting here,
uncommitted):

Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg
(the type of tag to claim) and push it out of ip_var.h into mbuf.h
alongside all of the other macros that work ok mbuf's and tag's.
2004-05-02 15:07:37 +00:00
darrenr
8f62dbebe1 Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg
(the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside
all of the other macros that work ok mbuf's and tag's.
2004-05-02 06:36:30 +00:00
bmilekic
6bbcc9da29 Give jail(8) the feature to allow raw sockets from within a
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.

Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.

The patch being committed is not identical to the patch
in the PR.  The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely.  This change has also been
presented and addressed on the freebsd-hackers mailing
list.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800
2004-04-26 19:46:52 +00:00
silby
051b00be73 Tighten up reset handling in order to make reset attacks as difficult as
possible while maintaining compatibility with the widest range of TCP stacks.

The algorithm is as follows:

---
For connections in the ESTABLISHED state, only resets with
sequence numbers exactly matching last_ack_sent will cause a reset,
all other segments will be silently dropped.

For connections in all other states, a reset anywhere in the window
will cause the connection to be reset.  All other segments will be
silently dropped.
---

The necessity of accepting all in-window resets was discovered
by jayanth and jlemon, both of whom have seen TCP stacks that
will respond to FIN-ACK packets with resets not meeting the
strict last_ack_sent check.

Idea by:        Darren Reed
Reviewed by:    truckman, jlemon, others(?)
2004-04-26 02:56:31 +00:00
luigi
53bd42643d Another small set of changes to reduce diffs with the new arp code. 2004-04-25 15:00:17 +00:00
luigi
93066eb95b remove a stale comment on the behaviour of arpresolve 2004-04-25 14:06:23 +00:00
luigi
131ad9c351 Start the arp timer at init time.
It runs so rarely that it makes no sense to wait until the first request.
2004-04-25 12:50:14 +00:00
luigi
59063f7a08 This commit does two things:
1. rt_check() cleanup:
    rt_check() is only necessary for some address families to gain access
    to the corresponding arp entry, so call it only in/near the *resolve()
    routines where it is actually used -- at the moment this is
    arpresolve(), nd6_storelladdr() (the call is embedded here),
    and atmresolve() (the call is just before atmresolve to reduce
    the number of changes).
    This change will make it a lot easier to decouple the arp table
    from the routing table.

    There is an extra call to rt_check() in if_iso88025subr.c to
    determine the routing info length. I have left it alone for
    the time being.

    The interface of arpresolve() and nd6_storelladdr() now changes slightly:
     + the 'rtentry' parameter (really a hint from the upper level layer)
       is now passed unchanged from *_output(), so it becomes the route
       to the final destination and not to the gateway.
     + the routines will return 0 if resolution is possible, non-zero
       otherwise.
     + arpresolve() returns EWOULDBLOCK in case the mbuf is being held
       waiting for an arp reply -- in this case the error code is masked
       in the caller so the upper layer protocol will not see a failure.

2. arpcom untangling
    Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
    and use the IFP2AC macro to access arpcom fields.
    This mostly affects the netatalk code.

=== Detailed changes: ===
net/if_arcsubr.c
   rt_check() cleanup, remove a useless variable

net/if_atmsubr.c
   rt_check() cleanup

net/if_ethersubr.c
   rt_check() cleanup, arpcom untangling

net/if_fddisubr.c
   rt_check() cleanup, arpcom untangling

net/if_iso88025subr.c
   rt_check() cleanup

netatalk/aarp.c
   arpcom untangling, remove a block of duplicated code

netatalk/at_extern.h
   arpcom untangling

netinet/if_ether.c
   rt_check() cleanup (change arpresolve)

netinet6/nd6.c
   rt_check() cleanup (change nd6_storelladdr)
2004-04-25 09:24:52 +00:00
silby
67d79f0cfb Wrap two long lines in the previous commit. 2004-04-23 23:29:49 +00:00
andre
7357a88fdb Correct an edge case in tcp_mss() where the cached path MTU
from tcp_hostcache would have overridden a (now) lower MTU of
an interface or route that changed since first PMTU discovery.
The bug would have caused TCP to redo the PMTU discovery when
not strictly necessary.

Make a comment about already pre-initialized default values
more clear.

Reviewed by:	sam
2004-04-23 22:44:59 +00:00
andre
d4f49f008f Add the option versrcreach to verify that a valid route to the
source address of a packet exists in the routing table.  The
default route is ignored because it would match everything and
render the check pointless.

This option is very useful for routers with a complete view of
the Internet (BGP) in the routing table to reject packets with
spoofed or unrouteable source addresses.

Example:

 ipfw add 1000 deny ip from any to any not versrcreach

also known in Cisco-speak as:

  ip verify unicast source reachable-via any

Reviewed by:	luigi
2004-04-23 14:28:38 +00:00
andre
e8723e5528 Fix a potential race when purging expired hostcache entries.
Spotted by:	luigi
2004-04-23 13:54:28 +00:00
silby
b4c0a798a2 Take out an unneeded variable I forgot to remove in the last commit,
and make two small whitespace fixes so that diffs vs rev 1.142 are minimal.
2004-04-22 08:34:55 +00:00
silby
760a7deec6 Simplify random port allocation, and add net.inet.ip.portrange.randomized,
which can be used to turn off randomized port allocation if so desired.

Requested by:	alfred
2004-04-22 08:32:14 +00:00
bms
8fb0962eb0 Fix a typo in a comment. 2004-04-20 19:04:24 +00:00
silby
f0d28bbf0c Switch from using sequential to random ephemeral port allocation,
implementation taken directly from OpenBSD.

I've resisted committing this for quite some time because of concern over
TIME_WAIT recycling breakage (sequential allocation ensures that there is a
long time before ports are recycled), but recent testing has shown me that
my fears were unwarranted.
2004-04-20 06:45:10 +00:00
silby
743d110741 Enhance our RFC1948 implementation to perform better in some pathlogical
TIME_WAIT recycling cases I was able to generate with http testing tools.

In short, as the old algorithm relied on ticks to create the time offset
component of an ISN, two connections with the exact same host, port pair
that were generated between timer ticks would have the exact same sequence
number.  As a result, the second connection would fail to pass the TIME_WAIT
check on the server side, and the SYN would never be acknowledged.

I've "fixed" this by adding random positive increments to the time component
between clock ticks so that ISNs will *always* be increasing, no matter how
quickly the port is recycled.

Except in such contrived benchmarking situations, this problem should never
come up in normal usage...  until networks get faster.

No MFC planned, 4.x is missing other optimizations that are needed to even
create the situation in which such quick port recycling will occur.
2004-04-20 06:33:39 +00:00
luigi
eb6b02962a Replace Bcopy with 'the real thing' as in the rest of the file. 2004-04-18 11:45:49 +00:00
luigi
94049d0810 In an effort to simplify the routing code, try to deprecate rtalloc()
in favour of rtalloc_ign(), which is what would end up being called
anyways.

There are 25 more instances of rtalloc() in net*/ and
about 10 instances of rtalloc_ign()
2004-04-14 01:13:14 +00:00
imp
b49b7fe799 Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
ru
a6980b04fc Fixed a bug in previous revision: compute the payload checksum before
we convert ip_len into a network byte order; in_delayed_cksum() still
expects it in host byte order.

The symtom was the ``in_cksum_skip: out of data by %d'' complaints
from the kernel.

To add to the previous commit log.  These fixes make tcpdump(1) happy
by not complaining about UDP/TCP checksum being bad for looped back
IP multicast when multicast router is deactivated.

Reported by:	Vsevolod Lobko
2004-04-07 10:01:39 +00:00
bde
014c3b4511 Fixed misspelling of IPPORT_MAX as USHRT_MAX. Don't include <sys/limits.h>
to implement this mistake.

Fixed some nearby style bugs (initialization in declaration, misformatting
of this initialization, missing blank line after the declaration, and
comparision of the non-boolean result of the initialization with 0 using
"!".  In KNF, "!" is not even used to compare booleans with 0).
2004-04-06 10:59:11 +00:00
rwatson
e391576e63 Two missed in previous commit -- compare pointer with NULL rather than
using it as a boolean.
2004-04-05 00:52:05 +00:00
rwatson
bdbf43dbba Prefer NULL to 0 when checking pointer values as integers or booleans. 2004-04-05 00:49:07 +00:00
pjd
2e6142d4d9 Fix a panic possibility caused by returning without releasing locks.
It was fixed by moving problemetic checks, as well as checks that
doesn't need locking before locks are acquired.

Submitted by:		Ryan Sommers <ryans@gamersimpact.com>
In co-operation with:	cperciva, maxim, mlaier, sam
Tested by:		submitter (previous patch), me (current patch)
Reviewed by:		cperciva, mlaier (previous patch), sam (current patch)
Approved by:		sam
Dedicated to:		enough!
2004-04-04 20:14:55 +00:00
luigi
c54de1f76f + arpresolve(): remove an unused argument
+ struct ifnet: remove unused fields, move ipv6-related field close
  to each other, add a pointer to l3<->l2 translation tables (arp,nd6,
  etc.) for future use.

+ struct route: remove an unused field, move close to each
  other some fields that might likely go away in the future
2004-04-04 06:14:55 +00:00
deischen
a883ccc958 Unbreak natd.
Reported and submitted by:	Sean McNeil (sean at mcneil.com)
2004-04-02 17:57:57 +00:00
des
38842c29ce Raise WARNS level to 2. 2004-03-31 21:33:55 +00:00
des
2209468b0e Deal with aliasing warnings.
Reviewed by:	ru
Approved by:	silence on the lists
2004-03-31 21:32:58 +00:00
rwatson
8525af93ba Invert the logic of NET_LOCK_GIANT(), and remove the one reference to it.
Previously, Giant would be grabbed at entry to the IP local delivery code
when debug.mpsafenet was set to true, as that implied Giant wouldn't be
grabbed in the driver path.  Now, we will use this primitive to
conditionally grab Giant in the event the entire network stack isn't
running MPSAFE (debug.mpsafenet == 0).
2004-03-28 23:12:19 +00:00
pjd
c83007d58a Remove unused argument. 2004-03-28 15:48:00 +00:00
pjd
49554d1bd8 Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:
- in_pcbbind(),
	- in_pcbbind_setup(),
	- in_pcbconnect(),
	- in_pcbconnect_setup(),
	- in6_pcbbind(),
	- in6_pcbconnect(),
	- in6_pcbsetport().
"It should simplify/clarify things a great deal." --rwatson

Requested by:	rwatson
Reviewed by:	rwatson, ume
2004-03-27 21:05:46 +00:00
pjd
02bc133779 Remove unused argument.
Reviewed by:	ume
2004-03-27 20:41:32 +00:00
ume
11f479f519 Validate IPv6 socket options more carefully to avoid a panic.
PR:		kern/61513
Reviewed by:	cperciva, nectar
2004-03-26 19:52:18 +00:00
pjd
89f5b6c374 Remove unused function.
It was used in FreeBSD 4.x, but now we're using cr_canseesocket().
2004-03-25 15:12:12 +00:00
ru
869b51c6d0 Untangle IP multicast routing interaction with delayed payload checksums.
Compute the payload checksum for a locally originated IP multicast where
God intended, in ip_mloopback(), rather than doing it in ip_output() and
only when multicast router is active.  This is more correct as we do not
fool ip_input() that the packet has the correct payload checksum when in
fact it does not (when multicast router is inactive).  This is also more
efficient if we don't join the multicast group we send to, thus allowing
the hardware to checksum the payload.
2004-03-25 08:46:27 +00:00
rwatson
aaf338640e Lock down global variables in if_gre:
- Add gre_mtx to protect global softc list.
- Hold gre_mtx over various list operations (insert, delete).
- Centralize if_gre interface teardown in gre_destroy(), and call this
  from modevent unload and gre_clone_destroy().
- Export gre_mtx to ip_gre.c, which walks the gre list to look up gre
  interfaces during encapsulation.  Add a wonking comment on how we need
  some sort of drain/reference count mechanism to keep gre references
  alive while in use and simultaneous destroy.

This commit does not lockdown softc data, which follows in a future
commit.
2004-03-22 16:04:43 +00:00
mdodd
97af430ce1 - Fix indentation lost by 'diff -b'.
- Un-wrap short line.
2004-03-21 18:51:26 +00:00
mdodd
3092080960 Remove interface type specific code from arprequest(), and in_arpinput().
The AF_ARP case in the (*if_output)() routine will handle the interface type
specific bits.

Obtained from:	NetBSD
2004-03-21 06:36:05 +00:00
des
3cb81148d8 Run through indent(1) so I can read the code without getting a headache.
The result isn't quite knf, but it's knfer than the original, and far
more consistent.
2004-03-16 21:30:41 +00:00
mdodd
90eb7f448e De-register. 2004-03-14 00:44:11 +00:00
rwatson
fea945fb73 Lock down IP-layer encapsulation library:
- Add encapmtx to protect ip_encap.c global variables (encapsulation
   list).
 - Unifdef #ifdef 0 pieces of encap_init() which was (and now really
   is) basically a no-op.
 - Lock encapmtx when walking encaptab, modifying it, comparing
   entries, etc.
 - Remove spl's.

Note that currently there's no facilite to make sure outstanding
use of encapsulation methods on a table entry have drained bfore
we allow a table entry to be removed.  As such, it's currently the
caller's responsibility to make sure that draining takes place.

Reviewed by:	mlaier
2004-03-10 02:48:50 +00:00
rwatson
7866f6e355 Scrub unused variable zeroin_addr. 2004-03-10 01:01:04 +00:00
hsu
01e90a548b To comply with the spec, do not copy the TOS from the outer IP
header to the inner IP header of the PIM Register if this is a PIM
Null-Register message.

Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2004-03-08 07:47:27 +00:00
hsu
ef9b350961 Include <sys/types.h> for autoconf/automake detection.
Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2004-03-08 07:45:32 +00:00
mlaier
206b5dcf52 Add some missing DUMMYNET_UNLOCK() in config_pipe().
Noticed by:	Simon Coggins
Approved by:	bms(mentor)
2004-03-03 01:33:22 +00:00
mlaier
a32329ae33 Two minor follow-ups on the MT_TAG removal:
ifp is now passed explicitly to ether_demux; no need to look it up again.
Make mtag a global var in ip_input.

Noticed by:	rwatson
Approved by:	bms(mentor)
2004-03-02 14:37:23 +00:00
rwatson
8da3c338df Rename NET_PICKUP_GIANT() to NET_LOCK_GIANT(), and NET_DROP_GIANT()
to NET_UNLOCK_GIANT().  While they are used in similar ways, the
semantics are quite different -- NET_LOCK_GIANT() and NET_UNLOCK_GIANT()
directly wrap mutex lock and unlock operations, whereas drop/pickup
special case the handling of Giant recursion.  Add a comment saying
as much.

Add NET_ASSERT_GIANT(), which conditionally asserts Giant based
on the value of debug_mpsafenet.
2004-03-01 22:37:01 +00:00
ume
178eaf92b9 fix -O0 compilation without INET6.
Pointed out by:	ru
2004-03-01 19:10:31 +00:00
rwatson
2336a6f7fe Remove unneeded {} originally used to hold local variables for dummynet
in a code block, as the variable is now gone.

Submitted by:	sam
2004-02-28 19:50:43 +00:00
rwatson
4bcda84b60 Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These
were needed by the MAC Framework until inpcbs gained labels.

Submitted by:	sam
2004-02-28 15:12:20 +00:00
mlaier
d937176b34 Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)
2004-02-26 04:27:55 +00:00
mlaier
428f1c9a0f Tweak existing header and other build infrastructure to be able to build
pf/pflog/pfsync as modules. Do not list them in NOTES or modules/Makefile
(i.e. do not connect it to any (automatic) builds - yet).

Approved by: bms(mentor)
2004-02-26 03:53:54 +00:00
truckman
1de257deb3 Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire().  Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails.  Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by:	bms
2004-02-26 00:27:04 +00:00
mlaier
1504165dce Re-remove MT_TAGs. The problems with dummynet have been fixed now.
Tested by: -current, bms(mentor), me
Approved by: bms(mentor), sam
2004-02-25 19:55:29 +00:00
bde
232b17fc86 Fixed namespace pollution in rev.1.74. Implementation of the syncache
increased <netinet/tcp_var>'s already large set of prerequisites, and
this was handled badly.  Just don't declare the complete syncache struct
unless <netinet/pcb.h> is included before <netinet/tcp_var.h>.

Approved by:	jlemon (years ago, for a more invasive fix)
2004-02-25 13:03:01 +00:00
bde
50325f4c96 Don't use the negatively-opaque type uma_zone_t or be chummy with
<vm/uma.h>'s idempotency indentifier or its misspelling.
2004-02-25 11:53:19 +00:00
hsu
bb0d027044 Relax a KASSERT condition to allow for a valid corner case where
the FIN on the last segment consumes an extra sequence number.

Spurious panic reported by Mike Silbersack <silby@silby.com>.
2004-02-25 08:53:17 +00:00
andre
5ef70fe223 Convert the tcp segment reassembly queue to UMA and limit the maximum
amount of segments it will hold.

The following tuneables and sysctls control the behaviour of the tcp
segment reassembly queue:

 net.inet.tcp.reass.maxsegments (loader tuneable)
  specifies the maximum number of segments all tcp reassemly queues can
  hold (defaults to 1/16 of nmbclusters).

 net.inet.tcp.reass.maxqlen
  specifies the maximum number of segments any individual tcp session queue
  can hold (defaults to 48).

 net.inet.tcp.reass.cursegments (readonly)
  counts the number of segments currently in all reassembly queues.

 net.inet.tcp.reass.overflows (readonly)
  counts how often either the global or local queue limit has been reached.

Tested by:	bms, silby
Reviewed by:	bms, silby
2004-02-24 15:27:41 +00:00
pjd
806650a364 Fixed ucred structure leak.
Approved by:	scottl (mentor)
PR:		54163
MFC after:	3 days
2004-02-19 14:13:21 +00:00
mlaier
60723c3260 Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is
not working properly with the patch in place.

Approved by: bms(mentor)
2004-02-18 00:04:52 +00:00
ume
92aaace604 IPSEC and FAST_IPSEC have the same internal API now;
so merge these (IPSEC has an extra ipsecstat)

Submitted by:	"Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
2004-02-17 14:02:37 +00:00
bms
2b958c2272 Shorten the name of the socket option used to enable TCP-MD5 packet
treatment.

Submitted by:	Vincent Jardin
2004-02-16 22:21:16 +00:00
ume
a9d87abe7e don't update outgoing ifp, if ipsec tunnel mode encapsulation
was not made.

Obtained from:	KAME
2004-02-16 17:05:06 +00:00