freebsd-dev

Author	SHA1	Message	Date
Poul-Henning Kamp	41ee9f1c69	Add some missing <sys/module.h> includes which are masked by the one on death-row in <sys/kernel.h>	2004-05-30 17:57:46 +00:00
Christian S.J. Peron	b5ef991561	Add a super-user check to ipfw_ctl() to make sure that the calling process is a non-prison root. The security.jail.allow_raw_sockets sysctl variable is disabled by default, however if the user enables raw sockets in prisons, prison-root should not be able to interact with firewall rule sets. Approved by: rwatson, bmilekic (mentor)	2004-05-25 15:02:12 +00:00
Yaroslav Tykhiy	4658dc8325	When checking for possible port theft, skip over a TCP inpcb unless it's in the closed or listening state (remote address == INADDR_ANY). If a TCP inpcb is in any other state, it's impossible to steal its local port or use it for port theft. And if there are both closed/listening and connected TCP inpcbs on the same localIP:port couple, the call to in_pcblookup_local() will find the former due to the design of that function. No objections raised in: -net, -arch MFC after: 1 month	2004-05-20 06:35:02 +00:00
Maxim Konovalov	a49b21371a	o Calculate a number of bytes to copy (cnt) correctly. [ASCII diagram of the option buffer: IP #1, NOP, CODE, LEN, PTR, an empty address slot, IP #2, ..., IP #n. dst points at the empty slot, cp[IPOPT_OFF + 1]; src points at IP #2, cp[IPOPT_OFF + 1] + sizeof(struct in_addr); the span copied is cnt - (IPOPT_MINOFF - 1) bytes.] PR: kern/66386 Submitted by: Andrei Iltchenko MFC after: 3 weeks	2004-05-11 19:14:44 +00:00
Maxim Konovalov	d0946241ac	o IFNAMSIZ does include the trailing \0. Approved by: andre o Document net.inet.icmp.reply_src.	2004-05-07 01:24:53 +00:00
Andre Oppermann	2bde81acd6	Provide the sysctl net.inet.ip.process_options to control the processing of IP options. net.inet.ip.process_options=0 Ignore IP options and pass packets unmodified. net.inet.ip.process_options=1 Process all IP options (default). net.inet.ip.process_options=2 Reject all packets with IP options with ICMP filter prohibited message. This sysctl affects packets destined for the local host as well as those only transiting through the host (routing). IP options do not have any legitimate purpose anymore and are only used to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP stacks. Reviewed by: sam (mentor)	2004-05-06 18:46:03 +00:00
Robert Watson	c18b97c630	Switch to using the inpcb MAC label instead of socket MAC label when labeling new mbufs created from sockets/inpcbs in IPv4. This helps avoid the need for socket layer locking in the lower level network paths where inpcb locks are already frequently held where needed. In particular: - Use the inpcb for label instead of socket in raw_append(). - Use the inpcb for label instead of socket in tcp_output(). - Use the inpcb for label instead of socket in tcp_respond(). - Use the inpcb for label instead of socket in tcp_twrespond(). - Use the inpcb for label instead of socket in syncache_respond(). While here, modify tcp_respond() to avoid assigning NULL to a stack variable and centralize assertions about the inpcb when inp is assigned. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-05-04 02:11:47 +00:00
Robert Watson	87f2bb8caf	Assert inpcb lock in udp_append(). Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-05-04 01:08:15 +00:00
Robert Watson	cbe42d48bd	Assert the inpcb lock on 'last' in udp_append(), since it's always called with it, and also requires it. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-05-04 00:10:16 +00:00
Maxim Konovalov	1a0c4873ed	o Fix misindentation in the previous commit.	2004-05-03 17:15:34 +00:00
Andre Oppermann	7652802b06	Back out a change that slipped into the previous commit for which other supporting parts have not yet been committed. Remove premature IP options ignoring option.	2004-05-03 16:07:13 +00:00
Andre Oppermann	06bb56f43c	Optimize IP fastforwarding some more: o New function ip_findroute() to reduce code duplication for the route lookup cases. (luigi) o Store ip_len in host byte order on the stack instead of using it via indirection from the mbuf. This allows the host byte conversion to be deferred to a later point and makes a quicker fallback to normal ip_input() processing. (luigi) o Check if route is dampened with RTF_REJECT flag and drop packet already here when ARP is unable to resolve destination address. An ICMP unreachable is sent to inform the sender. o Check if interface output queue is full and drop packet already here. No ICMP notification is sent because signalling source quench is deprecated. o Check if media_state is down (used for ethernet type interfaces) and drop the packet already here. An ICMP unreachable is sent to inform the sender. o Do not account sent packets to the interface address counters. They are only for packets with that 'ia' as source address. o Update and clarify some comments. Submitted by: luigi (most of it)	2004-05-03 13:52:47 +00:00
Darren Reed	2f3f1e6773	Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier.	2004-05-02 15:10:17 +00:00
Darren Reed	7fbb130049	oops, I forgot this file in a prior commit (change was still sitting here, uncommitted): Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work on mbufs and tags.	2004-05-02 15:07:37 +00:00
Darren Reed	ab884d993e	Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work on mbufs and tags.	2004-05-02 06:36:30 +00:00
Bosko Milekic	5a59cefcd1	Give jail(8) the feature to allow raw sockets from within a jail, which is less restrictive but allows for more flexible jail usage (for those who are willing to make the sacrifice). The default is off, but allowing raw sockets within jails can now be accomplished by tuning security.jail.allow_raw_sockets to 1. Turning this on will allow you to use things like ping(8) or traceroute(8) from within a jail. The patch being committed is not identical to the patch in the PR. The committed version is more friendly to APIs which pjd is working on, so it should integrate into his work quite nicely. This change has also been presented and addressed on the freebsd-hackers mailing list. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/65800	2004-04-26 19:46:52 +00:00
Mike Silbersack	80dd2a81fb	Tighten up reset handling in order to make reset attacks as difficult as possible while maintaining compatibility with the widest range of TCP stacks. The algorithm is as follows: --- For connections in the ESTABLISHED state, only resets with sequence numbers exactly matching last_ack_sent will cause a reset, all other segments will be silently dropped. For connections in all other states, a reset anywhere in the window will cause the connection to be reset. All other segments will be silently dropped. --- The necessity of accepting all in-window resets was discovered by jayanth and jlemon, both of whom have seen TCP stacks that will respond to FIN-ACK packets with resets not meeting the strict last_ack_sent check. Idea by: Darren Reed Reviewed by: truckman, jlemon, others(?)	2004-04-26 02:56:31 +00:00
Luigi Rizzo	b2a8ac7ca5	Another small set of changes to reduce diffs with the new arp code.	2004-04-25 15:00:17 +00:00
Luigi Rizzo	491522eade	remove a stale comment on the behaviour of arpresolve	2004-04-25 14:06:23 +00:00
Luigi Rizzo	cfff63f1b8	Start the arp timer at init time. It runs so rarely that it makes no sense to wait until the first request.	2004-04-25 12:50:14 +00:00
Luigi Rizzo	cd46a114fc	This commit does two things: 1. rt_check() cleanup: rt_check() is only necessary for some address families to gain access to the corresponding arp entry, so call it only in/near the resolve() routines where it is actually used -- at the moment this is arpresolve(), nd6_storelladdr() (the call is embedded here), and atmresolve() (the call is just before atmresolve to reduce the number of changes). This change will make it a lot easier to decouple the arp table from the routing table. There is an extra call to rt_check() in if_iso88025subr.c to determine the routing info length. I have left it alone for the time being. The interface of arpresolve() and nd6_storelladdr() now changes slightly: + the 'rtentry' parameter (really a hint from the upper level layer) is now passed unchanged from _output(), so it becomes the route to the final destination and not to the gateway. + the routines will return 0 if resolution is possible, non-zero otherwise. + arpresolve() returns EWOULDBLOCK in case the mbuf is being held waiting for an arp reply -- in this case the error code is masked in the caller so the upper layer protocol will not see a failure. 2. arpcom untangling Where possible, use 'struct ifnet' instead of 'struct arpcom' variables, and use the IFP2AC macro to access arpcom fields. This mostly affects the netatalk code. === Detailed changes: === net/if_arcsubr.c rt_check() cleanup, remove a useless variable net/if_atmsubr.c rt_check() cleanup net/if_ethersubr.c rt_check() cleanup, arpcom untangling net/if_fddisubr.c rt_check() cleanup, arpcom untangling net/if_iso88025subr.c rt_check() cleanup netatalk/aarp.c arpcom untangling, remove a block of duplicated code netatalk/at_extern.h arpcom untangling netinet/if_ether.c rt_check() cleanup (change arpresolve) netinet6/nd6.c rt_check() cleanup (change nd6_storelladdr)	2004-04-25 09:24:52 +00:00
Mike Silbersack	6b2fc10b64	Wrap two long lines in the previous commit.	2004-04-23 23:29:49 +00:00
Andre Oppermann	2d166c0202	Correct an edge case in tcp_mss() where the cached path MTU from tcp_hostcache would have overridden a (now) lower MTU of an interface or route that changed since first PMTU discovery. The bug would have caused TCP to redo the PMTU discovery when not strictly necessary. Make a comment about already pre-initialized default values more clear. Reviewed by: sam	2004-04-23 22:44:59 +00:00
Andre Oppermann	22b5770b99	Add the option versrcreach to verify that a valid route to the source address of a packet exists in the routing table. The default route is ignored because it would match everything and render the check pointless. This option is very useful for routers with a complete view of the Internet (BGP) in the routing table to reject packets with spoofed or unrouteable source addresses. Example: ipfw add 1000 deny ip from any to any not versrcreach also known in Cisco-speak as: ip verify unicast source reachable-via any Reviewed by: luigi	2004-04-23 14:28:38 +00:00
Andre Oppermann	b62dccc7e5	Fix a potential race when purging expired hostcache entries. Spotted by: luigi	2004-04-23 13:54:28 +00:00
Mike Silbersack	174624e01d	Take out an unneeded variable I forgot to remove in the last commit, and make two small whitespace fixes so that diffs vs rev 1.142 are minimal.	2004-04-22 08:34:55 +00:00
Mike Silbersack	6ac48b7409	Simplify random port allocation, and add net.inet.ip.portrange.randomized, which can be used to turn off randomized port allocation if so desired. Requested by: alfred	2004-04-22 08:32:14 +00:00
Bruce M Simpson	de9f59f850	Fix a typo in a comment.	2004-04-20 19:04:24 +00:00
Mike Silbersack	6dd946b3f7	Switch from using sequential to random ephemeral port allocation, implementation taken directly from OpenBSD. I've resisted committing this for quite some time because of concern over TIME_WAIT recycling breakage (sequential allocation ensures that there is a long time before ports are recycled), but recent testing has shown me that my fears were unwarranted.	2004-04-20 06:45:10 +00:00
Mike Silbersack	c1537ef063	Enhance our RFC1948 implementation to perform better in some pathological TIME_WAIT recycling cases I was able to generate with http testing tools. In short, as the old algorithm relied on ticks to create the time offset component of an ISN, two connections with the exact same host, port pair that were generated between timer ticks would have the exact same sequence number. As a result, the second connection would fail to pass the TIME_WAIT check on the server side, and the SYN would never be acknowledged. I've "fixed" this by adding random positive increments to the time component between clock ticks so that ISNs will always be increasing, no matter how quickly the port is recycled. Except in such contrived benchmarking situations, this problem should never come up in normal usage... until networks get faster. No MFC planned, 4.x is missing other optimizations that are needed to even create the situation in which such quick port recycling will occur.	2004-04-20 06:33:39 +00:00
Luigi Rizzo	ac912b2dc8	Replace Bcopy with 'the real thing' as in the rest of the file.	2004-04-18 11:45:49 +00:00
Luigi Rizzo	e6e51f0518	In an effort to simplify the routing code, try to deprecate rtalloc() in favour of rtalloc_ign(), which is what would end up being called anyways. There are 25 more instances of rtalloc() in net*/ and about 10 instances of rtalloc_ign()	2004-04-14 01:13:14 +00:00
Warner Losh	f36cfd49ad	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 20:46:16 +00:00
Ruslan Ermilov	390cdc6a76	Fixed a bug in previous revision: compute the payload checksum before we convert ip_len into network byte order; in_delayed_cksum() still expects it in host byte order. The symptom was the ``in_cksum_skip: out of data by %d'' complaints from the kernel. To add to the previous commit log. These fixes make tcpdump(1) happy by not complaining about UDP/TCP checksum being bad for looped back IP multicast when multicast router is deactivated. Reported by: Vsevolod Lobko	2004-04-07 10:01:39 +00:00
Bruce Evans	30a4ab088a	Fixed misspelling of IPPORT_MAX as USHRT_MAX. Don't include <sys/limits.h> to implement this mistake. Fixed some nearby style bugs (initialization in declaration, misformatting of this initialization, missing blank line after the declaration, and comparison of the non-boolean result of the initialization with 0 using "!". In KNF, "!" is not even used to compare booleans with 0).	2004-04-06 10:59:11 +00:00
Robert Watson	47f32f6fa6	Two missed in previous commit -- compare pointer with NULL rather than using it as a boolean.	2004-04-05 00:52:05 +00:00
Robert Watson	24459934e9	Prefer NULL to 0 when checking pointer values as integers or booleans.	2004-04-05 00:49:07 +00:00
Pawel Jakub Dawidek	52710de1cb	Fix a panic possibility caused by returning without releasing locks. It was fixed by moving problematic checks, as well as checks that don't need locking, before locks are acquired. Submitted by: Ryan Sommers <ryans@gamersimpact.com> In co-operation with: cperciva, maxim, mlaier, sam Tested by: submitter (previous patch), me (current patch) Reviewed by: cperciva, mlaier (previous patch), sam (current patch) Approved by: sam Dedicated to: enough!	2004-04-04 20:14:55 +00:00
Luigi Rizzo	f7c5baa1c6	+ arpresolve(): remove an unused argument + struct ifnet: remove unused fields, move ipv6-related field close to each other, add a pointer to l3<->l2 translation tables (arp,nd6, etc.) for future use. + struct route: remove an unused field, move close to each other some fields that might likely go away in the future	2004-04-04 06:14:55 +00:00
Daniel Eischen	ab39bc9a92	Unbreak natd. Reported and submitted by: Sean McNeil (sean at mcneil.com)	2004-04-02 17:57:57 +00:00
Dag-Erling Smørgrav	e271f829b8	Raise WARNS level to 2.	2004-03-31 21:33:55 +00:00
Dag-Erling Smørgrav	2871c50186	Deal with aliasing warnings. Reviewed by: ru Approved by: silence on the lists	2004-03-31 21:32:58 +00:00
Robert Watson	7101d752b2	Invert the logic of NET_LOCK_GIANT(), and remove the one reference to it. Previously, Giant would be grabbed at entry to the IP local delivery code when debug.mpsafenet was set to true, as that implied Giant wouldn't be grabbed in the driver path. Now, we will use this primitive to conditionally grab Giant in the event the entire network stack isn't running MPSAFE (debug.mpsafenet == 0).	2004-03-28 23:12:19 +00:00
Pawel Jakub Dawidek	56dc72c3b6	Remove unused argument.	2004-03-28 15:48:00 +00:00
Pawel Jakub Dawidek	b0330ed929	Reduce 'td' argument to 'cred' (struct ucred) argument in those functions: - in_pcbbind(), - in_pcbbind_setup(), - in_pcbconnect(), - in_pcbconnect_setup(), - in6_pcbbind(), - in6_pcbconnect(), - in6_pcbsetport(). "It should simplify/clarify things a great deal." --rwatson Requested by: rwatson Reviewed by: rwatson, ume	2004-03-27 21:05:46 +00:00
Pawel Jakub Dawidek	6823b82399	Remove unused argument. Reviewed by: ume	2004-03-27 20:41:32 +00:00
Hajimu UMEMOTO	a5d1aae31a	Validate IPv6 socket options more carefully to avoid a panic. PR: kern/61513 Reviewed by: cperciva, nectar	2004-03-26 19:52:18 +00:00
Pawel Jakub Dawidek	8da601dfb7	Remove unused function. It was used in FreeBSD 4.x, but now we're using cr_canseesocket().	2004-03-25 15:12:12 +00:00
Ruslan Ermilov	26f16ebeb1	Untangle IP multicast routing interaction with delayed payload checksums. Compute the payload checksum for a locally originated IP multicast where God intended, in ip_mloopback(), rather than doing it in ip_output() and only when multicast router is active. This is more correct as we do not fool ip_input() that the packet has the correct payload checksum when in fact it does not (when multicast router is inactive). This is also more efficient if we don't join the multicast group we send to, thus allowing the hardware to checksum the payload.	2004-03-25 08:46:27 +00:00
Robert Watson	bdae44a844	Lock down global variables in if_gre: - Add gre_mtx to protect global softc list. - Hold gre_mtx over various list operations (insert, delete). - Centralize if_gre interface teardown in gre_destroy(), and call this from modevent unload and gre_clone_destroy(). - Export gre_mtx to ip_gre.c, which walks the gre list to look up gre interfaces during encapsulation. Add a wonking comment on how we need some sort of drain/reference count mechanism to keep gre references alive while in use and simultaneous destroy. This commit does not lockdown softc data, which follows in a future commit.	2004-03-22 16:04:43 +00:00
Matthew N. Dodd	2964fb6538	- Fix indentation lost by 'diff -b'. - Un-wrap short line.	2004-03-21 18:51:26 +00:00
Matthew N. Dodd	64bf80ce1b	Remove interface type specific code from arprequest(), and in_arpinput(). The AF_ARP case in the (*if_output)() routine will handle the interface type specific bits. Obtained from: NetBSD	2004-03-21 06:36:05 +00:00
Dag-Erling Smørgrav	f0f93429cf	Run through indent(1) so I can read the code without getting a headache. The result isn't quite knf, but it's knfer than the original, and far more consistent.	2004-03-16 21:30:41 +00:00
Matthew N. Dodd	e952fa39de	De-register.	2004-03-14 00:44:11 +00:00
Robert Watson	fe5a02c927	Lock down IP-layer encapsulation library: - Add encapmtx to protect ip_encap.c global variables (encapsulation list). - Unifdef #ifdef 0 pieces of encap_init() which was (and now really is) basically a no-op. - Lock encapmtx when walking encaptab, modifying it, comparing entries, etc. - Remove spl's. Note that currently there's no facility to make sure outstanding use of encapsulation methods on a table entry have drained before we allow a table entry to be removed. As such, it's currently the caller's responsibility to make sure that draining takes place. Reviewed by: mlaier	2004-03-10 02:48:50 +00:00
Robert Watson	846840ba95	Scrub unused variable zeroin_addr.	2004-03-10 01:01:04 +00:00
Jeffrey Hsu	a062038267	To comply with the spec, do not copy the TOS from the outer IP header to the inner IP header of the PIM Register if this is a PIM Null-Register message. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2004-03-08 07:47:27 +00:00
Jeffrey Hsu	4c9792f9d3	Include <sys/types.h> for autoconf/automake detection. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2004-03-08 07:45:32 +00:00
Max Laier	b81dae751b	Add some missing DUMMYNET_UNLOCK() in config_pipe(). Noticed by: Simon Coggins Approved by: bms(mentor)	2004-03-03 01:33:22 +00:00
Max Laier	4672d81921	Two minor follow-ups on the MT_TAG removal: ifp is now passed explicitly to ether_demux; no need to look it up again. Make mtag a global var in ip_input. Noticed by: rwatson Approved by: bms(mentor)	2004-03-02 14:37:23 +00:00
Robert Watson	6200a93f82	Rename NET_PICKUP_GIANT() to NET_LOCK_GIANT(), and NET_DROP_GIANT() to NET_UNLOCK_GIANT(). While they are used in similar ways, the semantics are quite different -- NET_LOCK_GIANT() and NET_UNLOCK_GIANT() directly wrap mutex lock and unlock operations, whereas drop/pickup special case the handling of Giant recursion. Add a comment saying as much. Add NET_ASSERT_GIANT(), which conditionally asserts Giant based on the value of debug_mpsafenet.	2004-03-01 22:37:01 +00:00
Hajimu UMEMOTO	04d3a45241	fix -O0 compilation without INET6. Pointed out by: ru	2004-03-01 19:10:31 +00:00
Robert Watson	768bbd68cc	Remove unneeded {} originally used to hold local variables for dummynet in a code block, as the variable is now gone. Submitted by: sam	2004-02-28 19:50:43 +00:00
Robert Watson	a7b6a14aee	Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These were needed by the MAC Framework until inpcbs gained labels. Submitted by: sam	2004-02-28 15:12:20 +00:00
Max Laier	25a4adcec4	Bring eventhandler callbacks for pf. This enables pf to track dynamic address changes on interfaces (dialup) with the "on (<ifname>)"-syntax. This also brings hooks in anticipation of tracking cloned interfaces, which will be in future versions of pf. Approved by: bms(mentor)	2004-02-26 04:27:55 +00:00
Max Laier	cc5934f5af	Tweak existing header and other build infrastructure to be able to build pf/pflog/pfsync as modules. Do not list them in NOTES or modules/Makefile (i.e. do not connect it to any (automatic) builds - yet). Approved by: bms(mentor)	2004-02-26 03:53:54 +00:00
Don Lewis	47934cef8f	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms	2004-02-26 00:27:04 +00:00
Max Laier	ac9d7e2618	Re-remove MT_TAGs. The problems with dummynet have been fixed now. Tested by: -current, bms(mentor), me Approved by: bms(mentor), sam	2004-02-25 19:55:29 +00:00
Bruce Evans	0613995bd0	Fixed namespace pollution in rev.1.74. Implementation of the syncache increased <netinet/tcp_var>'s already large set of prerequisites, and this was handled badly. Just don't declare the complete syncache struct unless <netinet/pcb.h> is included before <netinet/tcp_var.h>. Approved by: jlemon (years ago, for a more invasive fix)	2004-02-25 13:03:01 +00:00
Bruce Evans	a545b1dc4d	Don't use the negatively-opaque type uma_zone_t or be chummy with <vm/uma.h>'s idempotency identifier or its misspelling.	2004-02-25 11:53:19 +00:00
Jeffrey Hsu	89c02376fc	Relax a KASSERT condition to allow for a valid corner case where the FIN on the last segment consumes an extra sequence number. Spurious panic reported by Mike Silbersack <silby@silby.com>.	2004-02-25 08:53:17 +00:00
Andre Oppermann	12e2e97051	Convert the tcp segment reassembly queue to UMA and limit the maximum amount of segments it will hold. The following tuneables and sysctls control the behaviour of the tcp segment reassembly queue: net.inet.tcp.reass.maxsegments (loader tuneable) specifies the maximum number of segments all tcp reassembly queues can hold (defaults to 1/16 of nmbclusters). net.inet.tcp.reass.maxqlen specifies the maximum number of segments any individual tcp session queue can hold (defaults to 48). net.inet.tcp.reass.cursegments (readonly) counts the number of segments currently in all reassembly queues. net.inet.tcp.reass.overflows (readonly) counts how often either the global or local queue limit has been reached. Tested by: bms, silby Reviewed by: bms, silby	2004-02-24 15:27:41 +00:00
Pawel Jakub Dawidek	41fe0c8ad5	Fixed ucred structure leak. Approved by: scottl (mentor) PR: 54163 MFC after: 3 days	2004-02-19 14:13:21 +00:00
Max Laier	36e8826ffb	Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is not working properly with the patch in place. Approved by: bms(mentor)	2004-02-18 00:04:52 +00:00
Hajimu UMEMOTO	da0f40995d	IPSEC and FAST_IPSEC have the same internal API now; so merge these (IPSEC has an extra ipsecstat) Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>	2004-02-17 14:02:37 +00:00
Bruce M Simpson	88f6b0435e	Shorten the name of the socket option used to enable TCP-MD5 packet treatment. Submitted by: Vincent Jardin	2004-02-16 22:21:16 +00:00
Hajimu UMEMOTO	70dbc6cbfc	don't update outgoing ifp, if ipsec tunnel mode encapsulation was not made. Obtained from: KAME	2004-02-16 17:05:06 +00:00
Bruce M Simpson	91179f796d	Spell types consistently throughout this file. Do not use the __packed attribute, as we are often #include'd from userland without <sys/cdefs.h> in front of us, and it is not strictly necessary. Noticed by: Sascha Blank	2004-02-16 14:40:56 +00:00
Bruce M Simpson	32ff046639	Final brucification pass. Spell types consistently (u_int). Remove bogus casts. Remove unnecessary parenthesis. Submitted by: bde	2004-02-14 21:49:48 +00:00
Max Laier	97075d0c0a	Do not expose ip_dn_find_rule inline function to userland and unbreak world.	2004-02-13 22:26:36 +00:00
Max Laier	189a0ba4e7	Do not check receive interface when pfil(9) hook changed address. Approved by: bms(mentor)	2004-02-13 19:20:43 +00:00
Max Laier	1094bdca51	This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing them mostly with packet tags (one case is handled by using an mbuf flag since the linkage between "caller" and "callee" is direct and there's no need to incur the overhead of a packet tag). This is (mostly) work from: sam Silence from: -arch Approved by: bms(mentor), sam, rwatson	2004-02-13 19:14:16 +00:00
Bruce M Simpson	265ed01285	Brucification. Submitted by: bde	2004-02-13 18:21:45 +00:00
Hajimu UMEMOTO	efddf5c64d	supported IPV6_RECVPATHMTU socket option. Obtained from: KAME	2004-02-13 14:50:01 +00:00
Bruce M Simpson	b30190b542	Update the prototype for tcpsignature_apply() to reflect the spelling of the types used by m_apply()'s callback function, f, as documented in mbuf(9). Noticed by: njl	2004-02-12 20:16:09 +00:00
Bruce M Simpson	bca0e5bfc3	style(9) pass; whitespace and comments. Submitted by: njl	2004-02-12 20:12:48 +00:00
Bruce M Simpson	a0194ef1ea	Remove an unnecessary initialization that crept in from the code which verifies TCP-MD5 digests. Noticed by: njl	2004-02-12 20:08:28 +00:00
Bruce M Simpson	45d370ee8b	Fix a typo; left out preprocessor conditional for sigoff variable, which is only used by TCP_SIGNATURE code. Noticed by: Roop Nanuwa	2004-02-11 09:46:54 +00:00
Bruce M Simpson	1cfd4b5326	Initial import of RFC 2385 (TCP-MD5) digest support. This is the first of two commits; bringing in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC. For the uninitiated, this is a TCP option which provides for a means of authenticating TCP sessions which came into being before IPSEC. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence. Several parts of the TCP and IP headers, including the segment payload, are digested with MD5, including a shared secret. The PF_KEY interface is used to manage the secrets using security associations in the SADB. There is a limitation here in that as there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity. Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem. This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment. Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request. Sponsored by: sentex.net	2004-02-11 04:26:04 +00:00
Hajimu UMEMOTO	f073c60f73	pass pcb rather than so. it is expected that per socket policy works again.	2004-02-03 18:20:55 +00:00
Andre Oppermann	b74d89bbbb	Add sysctl net.inet.icmp.reply_src to specify the interface name used for the ICMP reply source in response to packets which are not directly addressed to us. By default continue with normal source selection. Reviewed by: bms	2004-02-02 22:53:16 +00:00
Andre Oppermann	1488eac8ec	More verbose description of the source ip address selection for ICMP replies. Reviewed by: bms	2004-02-02 22:17:09 +00:00
Poul-Henning Kamp	be8a62e821	Introduce the SO_BINTIME option which takes a high-resolution timestamp at packet arrival. For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL since it has higher resolution and lower overhead. Simultaneous use of the two options is possible and they will return consistent timestamps. This introduces an extra test and a function call for SO_TIMEVAL, but I have not been able to measure that.	2004-01-31 10:40:25 +00:00
Maxim Sobolev	4c83789253	Remove NetBSD'isms (add FreeBSD'isms?), which makes gre(4) working again.	2004-01-30 09:03:01 +00:00
Ruslan Ermilov	0ca2861fc9	Correct the descriptions of the net.inet.{udp,raw}.recvspace sysctls.	2004-01-27 22:17:39 +00:00
Maxim Sobolev	7735aeb9bb	Add support for WCCPv2. It should be enabled manually using the link2 ifconfig(8) flag since the header for version 2 is the same but the IP payload is prepended with an additional 4-byte field. Inspired by: Roman Synyuk <roman@univ.kiev.ua> MFC after: 2 weeks	2004-01-26 12:33:56 +00:00
Maxim Sobolev	6e628b8187	(whitespace-only) Kill trailing spaces.	2004-01-26 12:21:59 +00:00
Andre Oppermann	241f1e33b1	Remove leftover FREE() from changes in rev 1.50. Noticed by: Jun Kuriyama <kuriyama@imgsrc.co.jp>	2004-01-23 01:39:12 +00:00
Andre Oppermann	201d185b69	Split the overloaded variable 'win' into two for their specific purposes: recwin and sendwin. This removes a big source of confusion and makes following the code much easier. Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)	2004-01-22 23:22:14 +00:00
Andre Oppermann	1ddba8d63e	Move the reduction by one of the syncache limit after the zone has been allocated. Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)	2004-01-22 23:14:48 +00:00
Andre Oppermann	73080de2be	Remove an unused variable and put the sockaddr_in6 onto the stack instead of malloc'ing it. Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)	2004-01-22 23:10:11 +00:00
Jeffrey Hsu	61a36e3dfc	Merge from DragonFlyBSD rev 1.10: date: 2003/09/02 10:04:47; author: hsu; state: Exp; lines: +5 -6 Account for when Limited Transmit is not congestion window limited. Obtained from: DragonFlyBSD	2004-01-20 21:40:25 +00:00
Poul-Henning Kamp	5e289f9eb6	Mostly mechanical rework of libalias: Makes it possible to have multiple packet aliasing instances in a single process by moving all static and global variables into an instance structure called "struct libalias". Redefine a new API based on s/PacketAlias/LibAlias/g Add new "instance" argument to all functions in the new API. Implement old API in terms of the new API.	2004-01-17 10:52:21 +00:00
Hajimu UMEMOTO	548c676b32	do not deref freed pointer Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net> Reviewed by: itojun	2004-01-13 09:51:47 +00:00
Andre Oppermann	bed824fa90	Disable the minmssoverload connection drop by default until the detection logic is refined.	2004-01-12 15:46:04 +00:00
Don Lewis	e29ef13f6c	Check that sa_len is the appropriate value in tcp_usr_bind(), tcp6_usr_bind(), tcp_usr_connect(), and tcp6_usr_connect() before checking to see whether the address is multicast so that the proper errno value will be returned if sa_len is incorrect. The checks are identical to the ones in in_pcbbind_setup(), in6_pcbbind(), and in6_pcbladdr(), which are called after the multicast address check passes. MFC after: 30 days	2004-01-10 08:53:00 +00:00
Andre Oppermann	1ddc17c1d5	Reduce TCP_MINMSS default to 216. The AX.25 protocol (packet radio) is frequently used with an MTU of 256 because of slow speeds and a high packet loss rate.	2004-01-09 14:14:10 +00:00
Andre Oppermann	53369ac9bb	Limiters and sanity checks for TCP MSS (maximum segment size) resource exhaustion attacks. For network link optimization TCP can adjust its MSS and thus packet size according to the observed path MTU. This is done dynamically based on feedback from the remote host and network components along the packet path. This information can be abused to pretend an extremely low path MTU. The resource exhaustion works in two ways: o during tcp connection setup the advertized local MSS is exchanged between the endpoints. The remote endpoint can set this arbitrarily low (except for a minimum MTU of 64 octets enforced in the BSD code). When the local host is sending data it is forced to send many small IP packets instead of a large one. For example instead of the normal TCP payload size of 1448 it forces TCP payload size of 12 (MTU 64) and thus we have a 120 times increase in workload and packets. On fast links this quickly saturates the local CPU and may also hit pps processing limits of network components along the path. This type of attack is particularly effective for servers where the attacker can download large files (WWW and FTP). We mitigate it by enforcing a minimum MTU settable by sysctl net.inet.tcp.minmss defaulting to 256 octets. o the local host is receiving data on a TCP connection from the remote host. The local host has no control over the packet size the remote host is sending. The remote host may choose to do what is described in the first attack and send the data in packets with a TCP payload of at least one byte. For each packet the tcp_input() function will be entered, the packet is processed and a sowakeup() is signalled to the connected process. For example an attack with 2 Mbit/s gives 4716 packets per second and the same amount of sowakeup()s to the process (and context switches). This type of attack is particularly effective for servers where the attacker can upload large amounts of data. Normally this is the case with WWW server where large POSTs can be made. We mitigate this by calculating the average MSS payload per second. If it goes below 'net.inet.tcp.minmss' and the pps rate is above 'net.inet.tcp.minmssoverload' defaulting to 1000 this particular TCP connection is reset and dropped. MITRE CVE: CAN-2004-0002 Reviewed by: sam (mentor) MFC after: 1 day	2004-01-08 17:40:07 +00:00
Andre Oppermann	bf87c82ebb	If path mtu discovery is enabled set the DF bit in all cases we send packets on a tcp connection. PR: kern/60889 Tested by: Richard Wendland <richard@wendland.org.uk> Approved by: re (scottl)	2004-01-08 11:17:11 +00:00
Andre Oppermann	e0f630ea7a	Do not set the ip_id to zero when DF is set on packet and restore the general pre-randomid behaviour. Setting the ip_id to zero causes several problems with packet reassembly when a device along the path removes the DF bit for some reason. Other BSD and Linux have found and fixed the same issues. PR: kern/60889 Tested by: Richard Wendland <richard@wendland.org.uk> Approved by: re (scottl)	2004-01-08 11:13:40 +00:00
Andre Oppermann	dba7bc6a65	Enable the following TCP options by default to give it more exposure: rfc3042 Limited retransmit rfc3390 Increasing TCP's initial congestion Window inflight TCP inflight bandwidth limiting All my production servers have it enabled and there have been no issues. I am confident about having them on by default and it gives us better overall TCP performance. Reviewed by: sam (mentor)	2004-01-06 23:29:46 +00:00
Andre Oppermann	87c3bd2755	According to RFC1812 we have to ignore ICMP redirects when we are acting as router (ipforwarding enabled). This doesn't fix the problem that host routes from ICMP redirects are never removed from the kernel routing table but removes the problem for machines doing packet forwarding. Reviewed by: sam (mentor)	2004-01-06 23:20:07 +00:00
Ruslan Ermilov	3b95e1346a	Document the net.inet.ip.subnets_are_local sysctl.	2003-12-30 16:05:03 +00:00
Maxim Sobolev	73d7ddbc56	Sync with NetBSD: if_gre.c rev.1.41-1.49 o Spell output with two ts. o Remove assigned-to but not used variable. o fix grammatical error in a diagnostic message. o u_short -> u_int16_t. o gi_len is ip_len, so it has to be network byteorder. if_gre.h rev.1.11-1.13 o prototype must not have variable name. o u_short -> u_int16_t. o Spell address with two d's. ip_gre.c rev.1.22-1.29 o KNF - return is not a function. o The "osrc" variable in gre_mobile_input() is only ever set but not referenced; remove it. o correct (false) assumptions on mbuf chain. not sure if it really helps, but anyways, it is necessary to perform m_pullup. o correct arg to m_pullup (need to count IP header size as well). o remove redundant adjustment of m->m_pkthdr.len. o clear m_flags just for safety. o tabify. o u_short -> u_int16_t. MFC after: 2 weeks	2003-12-30 11:41:43 +00:00
Sam Leffler	437ffe1823	o eliminate widespread on-stack mbuf use for bpf by introducing a new bpf_mtap2 routine that does the right thing for an mbuf and a variable-length chunk of data that should be prepended. o while we're sweeping the drivers, use u_int32_t uniformly when prepending the address family (several places were assuming sizeof(int) was 4) o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated mbufs have been eliminated; this may better be moved to the bpf routines Reviewed by: arch@ and several others	2003-12-28 03:56:00 +00:00
Maxim Konovalov	fad1d65260	o Fix a comment: softticks lives in sys/kern/kern_timeout.c. PR: kern/60613 Submitted by: Gleb Smirnoff MFC after: 3 days	2003-12-27 14:08:53 +00:00
Hajimu UMEMOTO	8b8a0cef40	NULL is not 0. Submitted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>	2003-12-24 18:22:04 +00:00
Ruslan Ermilov	3117579171	I didn't notice it right away, but check the right length too.	2003-12-23 14:08:50 +00:00
Ruslan Ermilov	78e2d2bd28	Fix a problem introduced in revision 1.84: m_pullup() does not necessarily return the same mbuf chain so we need to recompute mtod() consumers after pulling up.	2003-12-23 13:33:23 +00:00
Peter Wemm	a89ec05e3e	Catch a few places where NULL (pointer) was used where 0 (integer) was expected.	2003-12-23 02:36:43 +00:00
Sam Leffler	ededbec187	o move mutex init/destroy logic to the module load/unload hooks; otherwise they are initialized twice when the code is statically configured in the kernel because the module load method gets invoked before the user application calls ip_mrouter_init o add a mutex to synchronize the module init/done operations; this sort of was done using the value of ip_mroute but X_ip_mrouter_done sets it to NULL very early on which can lead to a race against ip_mrouter_init--using the additional mutex means this is safe now o don't call ip_mrouter_reset from ip_mrouter_init; this now happens once at module load and X_ip_mrouter_done does the appropriate cleanup work to ensure the data structures are in a consistent state so that a subsequent init operation inherits good state Reviewed by: juli	2003-12-20 18:32:48 +00:00
John Baldwin	a5b061f9d2	Fix some becuase -> because typos. Reported by: Marco Wertejuk <wertejuk@mwcis.com>	2003-12-17 16:12:01 +00:00
Robert Watson	2d92ec9858	Switch TCP over to using the inpcb label when responding in timed wait, rather than the socket label. This avoids reaching up to the socket layer during connection close, which requires locking changes. To do this, introduce MAC Framework entry point mac_create_mbuf_from_inpcb(), which is called from tcp_twrespond() instead of calling mac_create_mbuf_from_socket() or mac_create_mbuf_netlayer(). Introduce MAC Policy entry point mpo_create_mbuf_from_inpcb(), and implementations for various policies, which generally just copy label data from the inpcb to the mbuf. Assert the inpcb lock in the entry point since we require consistency for the inpcb label reference. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-12-17 14:55:11 +00:00
Maxim Konovalov	1c86761b2a	o IN_MULTICAST wants an address in host byte order. PR: kern/60304 Submitted by: demon MFC after: 1 week	2003-12-16 18:21:47 +00:00
Maksim Yevmenkin	a6a66f5c4c	Do not panic when flushing dummynet firewall rules Reviewed by: andre Approved by: re (scottl)	2003-12-06 09:01:25 +00:00
Andre Oppermann	f5bd8e9aff	Swap destination and source arguments of two bcopy() calls. Before committing the initial tcp_hostcache I changed them from memcpy() to conform with FreeBSD style without realizing the difference in argument definition. This fixes hostcache operation for IPv6 (in general and explicitly IPv6 path mtu discovery) and T/TCP (RFC1644). Submitted by: Taku YAMAMOTO <taku@cent.saitama-u.ac.jp> Approved by: re (rwatson)	2003-12-02 21:25:12 +00:00
Sam Leffler	d559f5c3d8	Include opt_ipsec.h so IPSEC/FAST_IPSEC is defined and the appropriate code is compiled in to support the O_IPSEC operator. Previously no support was included and ipsec rules were always matching. Note that we do not return an error when an ipsec rule is added and the kernel does not have IPsec support compiled in; this is done intentionally but we may want to revisit this (document this in the man page). PR: 58899 Submitted by: Bjoern A. Zeeb Approved by: re (rwatson)	2003-12-02 00:23:45 +00:00
Andre Oppermann	cd6c4060c8	Fix an optimization where I made an ifdef'd out section too broad. When the hostcache bucket limit is reached the last bucket wasn't removed from the bucket row but inserted a few lines later at the bucket row head again. This leads to an infinite loop when the same bucket row is accessed the next time for a lookup/insert or purge action. Tested by: imp, Matt Smith Approved by: re (rwatson)	2003-11-28 16:33:03 +00:00
Andre Oppermann	623f556031	Fix verify_rev_path() function. The author of this function tried to cut corners which completely broke down when the routing table locking was introduced. Reviewed by: sam (mentor) Approved by: re (rwatson)	2003-11-27 09:40:13 +00:00
Andre Oppermann	0cfbbe3bde	Make sure all uses of stack allocated struct route's are properly zeroed. Doing a bzero on the entire struct route is not more expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst. Reviewed by: sam (mentor) Approved by: re (scottl)	2003-11-26 20:31:13 +00:00
Sam Leffler	5bd311a566	Split the "inp" mutex class into separate classes for each of divert, raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness complaints. Reviewed by: rwatson Approved by: re (rwatson)	2003-11-26 01:40:44 +00:00
Andre Oppermann	943ae30252	Restructure a too broad ifdef which was disabling the setting of the tcp flightsize sysctl value for local networks in the !INET6 case. Approved by: re (scottl)	2003-11-25 20:58:59 +00:00
Sam Leffler	6714d7c751	Correct a problem where ipfw-generated packets were being returned for ipfw processing w/o an indication the packets were generated by ipfw--and so should not be processed (this manifested itself as a LOR.) The flag bit in the mbuf that was used to mark the packets was not listed in M_COPYFLAGS so if a packet had a header prepended (as done by IPsec) the flag was lost. Correct this by defining a new M_PROTO6 flag and use it to mark packets that need this processing. Reviewed by: bms Approved by: re (rwatson) MFC after: 2 weeks	2003-11-24 03:57:03 +00:00
Sam Leffler	6a3ca7514d	Use MPSAFE callouts only when debug.mpsafenet is 1. Both timer routines potentially transmit packets that may enter KAME IPsec w/o Giant if the callouts are marked MPSAFE. Reviewed by: ume Approved by: re (rwatson)	2003-11-23 18:13:41 +00:00
Thomas Moestl	1f831750b5	bzero() the sockaddr used for the destination address for rtalloc_ign() in in_pcbconnect_setup() before it is filled out. Otherwise, stack junk would be left in sin_zero, which could cause host routes to be ignored because they failed the comparison in rn_match(). This should fix the wrong source address selection for connect() to 127.0.0.1, among other things. Reviewed by: sam Approved by: re (rwatson)	2003-11-23 03:02:00 +00:00
Andre Oppermann	97d8d152c2	Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)	2003-11-20 20:07:39 +00:00
Andre Oppermann	26d02ca7ba	Remove RTF_PRCLONING from routing table and adjust users of it accordingly. The define is left intact for ABI compatibility with userland. This is a pre-step for the introduction of tcp_hostcache. The network stack remains fully useable with this change. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)	2003-11-20 19:47:31 +00:00
Maxim Konovalov	dbf7b38125	Fix an arguments order in check_uidgid() call. PR: kern/59314 Submitted by: Andrey V. Shytov Approved by: re (rwatson, jhb)	2003-11-20 10:28:33 +00:00
Robert Watson	a557af222b	Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-18 00:39:07 +00:00
Olivier Houchard	8c8268cb4f	In rip_abort(), unlock the inpcb if we didn't detach it, or we may recurse on the lock before destroying the mutex. Submitted by: sam	2003-11-17 19:21:53 +00:00
Brian Feldman	633461295a	Fix a few cases where MT_TAG-type "fake mbufs" are created on the stack, but do not have mh_nextpkt initialized. Sometimes what's there is "1", and the ip_input() code pukes trying to m_free() it, rendering divert sockets and such broken. This really underscores the need to get rid of MT_TAG. Reviewed by: rwatson	2003-11-17 03:17:49 +00:00
Andre Oppermann	be7e82e44a	Make two casts correct for all types of 64bit platforms. Explained by: bde	2003-11-16 12:50:33 +00:00
Andre Oppermann	df903fee84	Correct a cast to make it compile on 64bit platforms (noticed by tinderbox) and remove two unnecessary variable initializations. Make the introduction comment more clear with regard to which parts of the packet are touched. Requested by: luigi	2003-11-15 17:03:37 +00:00
Andre Oppermann	c76ff7084f	Make ipstealth global as we need it in ip_fastforward too.	2003-11-15 01:45:56 +00:00
Andre Oppermann	02c1c7070e	Remove the global one-level rtcache variable and associated complex locking and rework ip_rtaddr() to do its own rtlookup. Adapt all its callers to this and make ip_output() callable with NULL rt pointer. Reviewed by: sam (mentor)	2003-11-14 21:48:57 +00:00
Andre Oppermann	9188b4a169	Introduce ip_fastforward and remove ip_flow. Short description of ip_fastforward: o adds full direct process-to-completion IPv4 forwarding code o handles ip fragmentation incl. hw support (ip_flow did not) o sends icmp needfrag to source if DF is set (ip_flow did not) o supports ipfw and ipfilter (ip_flow did not) o supports divert, ipfw fwd and ipfilter nat (ip_flow did not) o returns anything it can't handle back to normal ip_input Enable with sysctl -w net.inet.ip.fastforwarding=1 Reviewed by: sam (mentor)	2003-11-14 21:02:22 +00:00
Sam Leffler	f7bbe2c0f1	add missing inpcb lock before call to tcp_twclose (which reclaims the inpcb) Supported by: FreeBSD Foundation	2003-11-13 05:18:23 +00:00
Sam Leffler	1b73ca0bf1	o reorder some locking asserts to reflect the order of the locks o correct a read-lock assert in in_pcblookup_local that should be a write-lock assert (since time wait close cleanups may alter state) Supported by: FreeBSD Foundation	2003-11-13 05:16:56 +00:00
Andre Oppermann	16d6c90f5d	Move global variables for icmp_input() to its stack. With SMP or preemption two CPUs can be in the same function at the same time and clobber each others variables. Remove register declaration from local variables. Reviewed by: sam (mentor)	2003-11-13 00:32:13 +00:00
Andre Oppermann	2683ceb661	Do not fragment a packet with hardware assistance if it has the DF bit set. Reviewed by: sam (mentor)	2003-11-12 23:35:40 +00:00
Bruce M Simpson	83453a06de	Add a new sysctl knob, net.inet.udp.strict_mcast_mship, to the udp_input path. This switch toggles between strict multicast delivery, and traditional multicast delivery. The traditional (default) behaviour is to deliver multicast datagrams to all sockets which are members of that group, regardless of the network interface where the datagrams were received. The strict behaviour is to deliver multicast datagrams received on a particular interface only to sockets whose membership is bound to that interface. Note that as a matter of course, multicast consumers specifying INADDR_ANY for their interface get joined on the interface where the default route happens to be bound. This switch has no effect if the interface which the consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING. The original patch has been cleaned up somewhat from that submitted. It has been tested on a multihomed machine with multiple QuickTime RTP streams running over the local switch, which doesn't do IGMP snooping. PR: kern/58359 Submitted by: William A. Carrel Reviewed by: rwatson MFC after: 1 week	2003-11-12 20:17:11 +00:00
Andre Oppermann	122aad88d5	dropwithreset is not needed in this case as tcp_drop() is already notifying the other side. Before we were sending two RST packets.	2003-11-12 19:38:01 +00:00
Robert Watson	eca8a663d4	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 03:14:31 +00:00
Sam Leffler	a0bf1601a7	correct typos Pointed out by: Mike Silbersack	2003-11-11 18:16:54 +00:00
Sam Leffler	3d0b255a9a	o add missing inpcb locking in tcp_respond o replace spl's with lock assertions Supported by: FreeBSD Foundation	2003-11-11 17:54:47 +00:00
Sam Leffler	383df78dc8	use Giant-less callouts when debug_mpsafenet is non-zero Supported by: FreeBSD Foundation	2003-11-10 23:29:33 +00:00
Ian Dowse	3ab2096b80	In in_pcbconnect_setup(), don't use the cached inp->inp_route unless it is marked as RTF_UP. This appears to fix a crash that was sometimes triggered when dhclient(8) tried to send a packet after an interface had been detached. Reviewed by: sam	2003-11-10 22:45:37 +00:00
Jeffrey Hsu	1ce43e2348	Mark TCP syncache timer as not Giant-free ready yet.	2003-11-10 20:42:04 +00:00
Sam Leffler	7138d65c3f	replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF macros that expand to include assertions when the system is built with INVARIANTS Supported by: FreeBSD Foundation	2003-11-08 23:36:32 +00:00
Sam Leffler	252f24a2cf	divert socket fixups: o pickup Giant in divert_packet to protect sbappendaddr since it can be entered through MPSAFE callouts or through ip_input when mpsafenet is 1 o add missing locking on output o add locking to abort and shutdown o add a ctlinput handler to invalidate held routing table references on an ICMP redirect (may not be needed) Supported by: FreeBSD Foundation	2003-11-08 23:09:42 +00:00
Sam Leffler	8484384564	assert optional inpcb is passed in locked Supported by: FreeBSD Foundation	2003-11-08 23:03:29 +00:00
Sam Leffler	59daba27d9	add locking assertions Supported by: FreeBSD Foundation	2003-11-08 23:02:36 +00:00
Sam Leffler	3c47a187b7	assert inpcb is locked in udp_output Supported by: FreeBSD Foundation	2003-11-08 23:00:48 +00:00
Sam Leffler	c29afad673	o correct locking problem: the inpcb must be held across tcp_respond o add assertions in tcp_respond to validate inpcb locking assumptions o use local variable instead of chasing pointers in tcp_respond Supported by: FreeBSD Foundation	2003-11-08 22:59:22 +00:00
Sam Leffler	2a0746208b	use local values instead of chasing pointers Supported by: FreeBSD Foundation	2003-11-08 22:57:13 +00:00
Sam Leffler	fa286d7db2	replace mtx_assert by INP_LOCK_ASSERT Supported by: FreeBSD Foundation	2003-11-08 22:55:52 +00:00
Sam Leffler	50d7c061a3	add some missing locking Supported by: FreeBSD Foundation	2003-11-08 22:53:41 +00:00
Sam Leffler	1d78192b35	the sbappendaddr call in socket_send must be protected by Giant because it can happen from an MPSAFE callout Supported by: FreeBSD Foundation	2003-11-08 22:51:18 +00:00
Sam Leffler	e3f268fc89	add locking assertions that turn into noops if INET6 is configured; this is necessary because the ipv6 code shares the in_pcb code with ipv4 but (presently) lacks proper locking Supported by: FreeBSD Foundation	2003-11-08 22:48:27 +00:00
Sam Leffler	7902224c6b	o add a flags parameter to netisr_register that is used to specify whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive Supported by: FreeBSD Foundation	2003-11-08 22:28:40 +00:00
Sam Leffler	27a940c9a2	unbreak compilation of FAST_IPSEC Supported by: FreeBSD Foundation	2003-11-08 00:34:34 +00:00
Sam Leffler	aab621f060	MFp4: reminder that random id code is not reentrant Supported by: FreeBSD Foundation	2003-11-07 23:31:29 +00:00
Sam Leffler	8f1ee3683d	Move uid/gid checking logic out of line and lock inpcb usage. This has a LOR between IPFW inpcb locks but I'm committing it now as the lesser of two evils (the other being unlocked use of in_pcblookup). Supported by: FreeBSD Foundation	2003-11-07 23:26:57 +00:00
Hajimu UMEMOTO	aef3a65eb7	use ipsec_getnhist() instead of obsoleted ipsec_gethist(). Submitted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net> Reviewed by: Ari Suutari <ari@suutari.iki.fi> (ipfw@)	2003-11-07 20:25:47 +00:00
Sam Leffler	ad67584665	Fix locking of the ip forwarding cache. We were holding a reference to a routing table entry w/o bumping the reference count or locking against the entry being free'd. This caused major havoc (for some reason it appeared most frequently for folks running natd). Fix is to bump the reference count whenever we copy the route cache contents into a private copy so the entry cannot be reclaimed out from under us. This is a short term fix as the forthcoming routing table changes will eliminate this cache entirely. Supported by: FreeBSD Foundation	2003-11-07 01:47:52 +00:00
Hajimu UMEMOTO	0f9ade718d	- cleanup SP refcnt issue. - share policy-on-socket for listening socket. - don't copy policy-on-socket at all. secpolicy no longer contain spidx, which saves a lot of memory. - deep-copy pcb policy if it is an ipsec policy. assign ID field to all SPD entries. make it possible for racoon to grab SPD entry on pcb. - fixed the order of searching SA table for packets. - fixed to get a security association header. a mode is always needed to compare them. - fixed that the incorrect time was set to sadb_comb_{hard\|soft}_usetime. - disallow port spec for tunnel mode policy (as we don't reassemble). - a user can define a policy-id. - clear enc/auth key before freeing. - fixed that the kernel crashed when key_spdacquire() was called because key_spdacquire() had been implemented incompletely. - preparation for 64bit sequence number. - maintain ordered list of SA, based on SA id. - cleanup secasvar management; refcnt is key.c responsibility; alloc/free is keydb.c responsibility. - cleanup, avoid double-loop. - use hash for spi-based lookup. - mark persistent SP "persistent". XXX in theory refcnt should do the right thing, however, we have "spdflush" which would touch all SPs. another solution would be to de-register persistent SPs from sptree. - u_short -> u_int16_t - reduce kernel stack usage by auto variable secasindex. - clarify function name confusion. ipsec__policy -> ipsec__pcbpolicy. - avoid variable name confusion. (struct inpcbpolicy )pcb_sp, spp (struct secpolicy ), sp (struct secpolicy ) - count number of ipsec encapsulations on ipsec4_output, so that we can tell ip_output() how to handle the packet further. - When the value of the ul_proto is ICMP or ICMPV6, the port field in "src" of the spidx specifies ICMP type, and the port field in "dst" of the spidx specifies ICMP code. - avoid from applying IPsec transport mode to the packets when the kernel forwards the packets. Tested by: nork Obtained from: KAME	2003-11-04 16:02:05 +00:00
Robert Watson	3de758d3e3	Note that when ip_output() is called from ip_forward(), it will already have its options inserted, so the opt argument to ip_output() must be NULL.	2003-11-03 18:03:05 +00:00
Robert Watson	eecfe773aa	Remove comment about desire for eventual explicit labeling of ICMP header copy made on input path: this is now handled differently. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-03 18:01:38 +00:00
Sam Leffler	04df2fbbb8	Remove bogus RTFREE that was added in rev 1.47. The rmx code operates directly on the radix tree and does not hold any routing table references. This fixes the reference counting problems that manifested themselves as a panic during unmount of filesystems that were mounted by NFS over an interface that had been removed. Supported by: FreeBSD Foundation	2003-11-03 06:11:44 +00:00
Sam Leffler	9ce7877897	Correct rev 1.56 which (incorrectly) reversed the test used to decide if in_pcbpurgeif0 should be invoked. Supported by: FreeBSD Foundation	2003-11-03 03:22:39 +00:00
Mike Silbersack	4bd4fa3fe6	Add an additional check to the tcp_twrecycleable function; I had previously only considered the send sequence space. Unfortunately, some OSes (windows) still use a random positive increments scheme for their syn-ack ISNs, so I must consider receive sequence space as well. The value of 250000 bytes / second for Microsoft's ISN rate of increase was determined by testing with an XP machine.	2003-11-02 07:47:03 +00:00
Mike Silbersack	96af9ea52b	- Add a new function tcp_twrecycleable, which tells us if the ISN which we will generate for a given ip/port tuple has advanced far enough for the time_wait socket in question to be safely recycled. - Have in_pcblookup_local use tcp_twrecycleable to determine if time_wait sockets which are hogging local ports can be safely freed. This change preserves proper TIME_WAIT behavior under normal circumstances while allowing for safe and fast recycling whenever ephemeral port space is scarce.	2003-11-01 07:30:08 +00:00
Brooks Davis	9bf40ede4a	Replace the if_name and if_unit members of struct ifnet with new members if_xname, if_dname, and if_dunit. if_xname is the name of the interface and if_dname/unit are the driver name and instance. This change paves the way for interface renaming and enhanced pseudo device creation and configuration semantics. Approved By: re (in principle) Reviewed By: njl, imp Tested On: i386, amd64, sparc64 Obtained From: NetBSD (if_xname)	2003-10-31 18:32:15 +00:00
Sam Leffler	9c63e9dbd7	Overhaul routing table entry cleanup by introducing a new rtexpunge routine that takes a locked routing table reference and removes all references to the entry in the various data structures. This eliminates instances of recursive locking and also closes races where the lock on the entry had to be dropped prior to calling rtrequest(RTM_DELETE). This also cleans up confusion where the caller held a reference to an entry that might have been reclaimed (and in some cases used that reference). Supported by: FreeBSD Foundation	2003-10-30 23:02:51 +00:00
Sam Leffler	d0402f1b73	Potential fix for races shutting down callouts when unloading the module. Previously we grabbed the mutex used by the callouts, then stopped the callout with callout_stop, but if the callout was already active and blocked by the mutex then it would continue later and reference the mutex after it was destroyed. Instead stop the callout first then lock. Supported by: FreeBSD Foundation	2003-10-29 19:15:00 +00:00
Sam Leffler	3520e9d61d	o add locking to protect routing table refcnt manipulations o add some more debugging help for figuring out why folks are getting complaints about releasing routing table entries with a zero refcnt o fix comment that talked about spl's o remove duplicate define of DUMMYNET_DEBUG Supported by: FreeBSD Foundation	2003-10-29 19:03:58 +00:00
Hajimu UMEMOTO	59dfcba4aa	add ECN support in layer-3. - implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c. make ip{,6}_ecn_egress() return integer to tell the caller that this packet should be dropped. - handle ECN at fragment reassembly in ip_input.c and frag6.c. Obtained from: KAME	2003-10-29 15:07:04 +00:00
Hajimu UMEMOTO	11de19f44d	ip6_savecontrol() argument is redundant	2003-10-29 12:52:28 +00:00
Sam Leffler	9c855a36c1	Introduce the notion of "persistent mbuf tags"; these are tags that stay with an mbuf until it is reclaimed. This is in contrast to tags that vanish when an mbuf chain passes through an interface. Persistent tags are used, for example, by MAC labels. Add an m_tag_delete_nonpersistent function to strip non-persistent tags from mbufs and use it to strip such tags from packets as they pass through the loopback interface and when turned around by icmp. This fixes problems with "tag leakage". Pointed out by: Jonathan Stone Reviewed by: Robert Watson	2003-10-29 05:40:07 +00:00
Sam Leffler	395bb18680	speedup stream socket recv handling by tracking the tail of the mbuf chain instead of walking the list for each append Submitted by: ps/jayanth Obtained from: netbsd (jason thorpe)	2003-10-28 05:47:40 +00:00
Hajimu UMEMOTO	618d51bbdc	revert following unwanted changes: - __packed to __attribute__((__packed__)) - uintN_t back to u_intN_t Reported by: bde	2003-10-25 10:57:08 +00:00
Hajimu UMEMOTO	16cd67e933	correct namespace pollution. Submitted by: bde	2003-10-25 09:37:10 +00:00
Hajimu UMEMOTO	c302f5bc07	remove the ip6r0_addr and ip6r0_slmap members from ip6_rthdr0{} according to rfc2292bis. Obtained from: KAME	2003-10-24 20:37:05 +00:00
Hajimu UMEMOTO	5434eaa208	correct tab and order.	2003-10-24 19:51:49 +00:00
Hajimu UMEMOTO	f95d46333d	Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542 (aka RFC2292bis). Though I believe this commit doesn't break backward compatibility against existing binaries, it breaks backward compatibility of the API. Now, the applications which use Advanced Sockets API such as telnet, ping6, mld6query and traceroute6 use RFC3542 API. Obtained from: KAME	2003-10-24 18:26:30 +00:00
Mike Silbersack	0709c23335	Reduce the number of tcp time_wait structs to maxsockets / 5; this ensures that at most 20% of sockets can be in time_wait at one time, ensuring that time_wait sockets do not starve real connections from inpcb structures. No implementation change is needed, jlemon already implemented a nice LRU-ish algorithm for tcp_tw structure recycling. This should reduce the need for sysadmins to lower the default msl on busy servers.	2003-10-24 05:44:14 +00:00
Sam Leffler	ac6b0748be	o restructure initialization code so data structures are setup when loaded as a module o cleanup data structures on module unload when no application has been started (i.e. kldload, kldunload w/o mrtd) o remove extraneous unlocks immediately prior to destroying them Supported by: FreeBSD Foundation	2003-10-24 00:09:18 +00:00
Mike Silbersack	184dcdc7c8	Change all SYSCTLS which are readonly and have a related TUNABLE from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful error messages.	2003-10-21 18:28:36 +00:00
Hajimu UMEMOTO	b339980338	enclose IPv6 part with ifdef INET6. Obtained from: KAME	2003-10-20 16:19:01 +00:00
Hajimu UMEMOTO	31b3783c8d	correct linkmtu handling. Obtained from: KAME	2003-10-20 15:27:48 +00:00

2078 Commits