freebsd-nq

Author	SHA1	Message	Date
Julian Elischer	7e170af886	Remove two lines that somehow snuck back in after testing. ip is now an argument to the function ipfw_log()	2007-01-09 21:03:07 +00:00
Maxim Konovalov	8b5b885047	o One more typo in the comment. PR: kern/107609 Submitted by: Dr. Markus Waldeck	2007-01-06 13:12:24 +00:00
Paolo Pisati	3d2fff0d3d	Prevent adding a rule with a nat action in case IPFIREWALL_NAT was not defined. Reviewed: luigi	2007-01-05 12:15:31 +00:00
Paolo Pisati	61c0e134f5	Wrap ipfw nat support in a new kernel config option named "IPFIREWALL_NAT": this way nat is turned off by default and POLA is preserved. Reviewed by: rwatson	2007-01-03 11:12:54 +00:00
Julian Elischer	3b62120e87	Remove a bunch of dependencies on the IP header being the first thing in the mbuf. First moves toward being able to cope better with having layer 2 (or other encapsulation data) before the IP header in the packet being examined. More commits to come to round out this functionality. This commit should have no practical effect but clears the way for what is coming. Reviewed by: luigi, yar MFC After: 2 weeks	2007-01-02 19:57:31 +00:00
Warner Losh	6796a2d434	Fix typo in comment. Submitted by: remko	2007-01-01 00:35:34 +00:00
Warner Losh	74eb3236c7	Add comment about udp checksums being off in BSD 4.2 compatibility mode. Submitted by: Dr. Markus Waldeck PR: kern/106657	2006-12-31 21:34:53 +00:00
John Baldwin	54e3607de6	Whitespace fix and remove an extra cast.	2006-12-30 17:53:28 +00:00
Paolo Pisati	ff2f6fe80f	Summer of Code 2005: improve libalias - part 2 of 2 With the second (and last) part of my previous Summer of Code work, we get: -ipfw's in kernel nat -redirect_* and LSNAT support General information about nat syntax and some examples are available in the ipfw (8) man page. The redirect and LSNAT syntax are identical to natd, so please refer to natd (8) man page. To enable in kernel nat in rc.conf, two options were added: o firewall_nat_enable: equivalent to natd_enable o firewall_nat_interface: equivalent to natd_interface Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet to continue being checked by the firewall ruleset after being (de)aliased. NOTA BENE: due to some problems with libalias architecture, in kernel nat won't work with TSO enabled nic, thus you have to disable TSO via ifconfig (ifconfig foo0 -tso). Approved by: glebius (mentor)	2006-12-29 21:59:17 +00:00
Randall Stewart	139bc87fda	a) macro-ization of all mbuf and random number access plus timers. This makes the code more portable and able to change out the mbuf or timer system used more easily ;-) b) removal of all use of pkt-hdr's until only the places we need them (before ip_output routines). c) remove a bunch of code not needed due to <b> aka worrying about pkthdr's :-) d) There was one last reorder problem, it looks like, where if a restart occurs and we release and relock (at the point where we setup our alias vtag) we would end up possibly getting the wrong TSN in place. The code that fixed the TSN's just needed to be shifted around BEFORE the release of the lock.. also code that set the state (since this also could contribute). Approved by: gnn	2006-12-29 20:21:42 +00:00
John Baldwin	08651e1f24	Some whitespace nits and remove a few casts.	2006-12-29 14:58:18 +00:00
Paolo Pisati	ccd57eea11	o made in kernel libalias mpsafe o fixed a comment o made in kernel libalias a bit less verbose (disabled automatic logging every time a new link is added or deleted) Approved by: glebius (mentor)	2006-12-15 12:50:06 +00:00
Randall Stewart	a5d547add3	1) Fixes on a number of different collision case LOR's. 2) Fix all "magic numbers" to be constants. 3) A collision case that would generate two associations to the same peer due to a missing lock is fixed. 4) Added tracking of where timers are stopped. Approved by: gnn	2006-12-14 17:02:55 +00:00
Christian S.J. Peron	826cef3d75	Fix LOR between the syncache and inpcb locks when MAC is present in the kernel. This LOR snuck in with some of the recent syncache changes. To fix this, the inpcb handling was changed: - Hang a MAC label off the syncache object - When the syncache entry is initially created, the PCB lock is held because we extract information from the PCB while initializing the syncache entry. While we do this, copy the MAC label associated with the PCB and use it for the syncache entry. - When the packet is transmitted, copy the label from the syncache entry to the mbuf so it can be processed by security policies which analyze mbuf labels. This change required that the MAC framework be extended to support the label copy operations from the PCB to the syncache entry, and then from the syncache entry to the mbuf. These functions really should be referencing the syncache structure instead of the label. However, due to some of the complexities associated with exposing this syncache structure we operate directly on its label pointer. This should be OK since we aren't making any access control decisions within this code directly, we are merely allocating and copying label storage so we can properly initialize mbuf labels for any packets the syncache code might create. This also has a nice side effect of caching. Prior to this change, the PCB would be looked up/locked for each packet transmitted. Now the label is cached at the time the syncache entry is initialized. Submitted by: andre [1] Discussed with: rwatson [1] andre submitted the tcp_syncache.c changes	2006-12-13 06:00:57 +00:00
Bjoern A. Zeeb	7d32aa0cc9	In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument. This is the "+ one more change" missed in the original commit. Noticed by: tinderbox Pointy hat to: me (#1)	2006-12-12 17:44:46 +00:00
Bjoern A. Zeeb	1d54aa3ba9	MFp4: 92972, 98913 + one more change In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.	2006-12-12 12:17:58 +00:00
Bruce M Simpson	3dbee59bd4	Back out revision 1.264. Fixing the IP accounting issue, if we plan to do so, needs to be better thought out; the 'fix' introduces a hash lookup and a possible kernel panic. Reported by: Mark Tinguely	2006-12-10 13:44:00 +00:00
Robert Watson	ece4c06484	Improve style(9) conformance of igmp.c.	2006-12-04 00:41:48 +00:00
Warner Losh	850adc0cd7	Make sure that carp_header is 36 bytes long	2006-12-01 18:37:41 +00:00
Paolo Pisati	5910c1c1b9	Make libalias.conf parsing a bit smarter. This closes PR kern/106112. While here, add mbuf's #includes I forgot in the previous commit. Approved by: gleb	2006-12-01 16:34:53 +00:00
Paolo Pisati	e876228edc	Remove m_megapullup from ng_nat and put it under libalias. Approved by: gleb	2006-12-01 16:27:11 +00:00
Robert Watson	e3fd5ffdf1	Consistently use #ifdef INET6 rather than mixing and matching with #if defined(INET6). Don't comment the end of short #ifdef blocks. Comment cleanup. Line wrap.	2006-11-30 10:54:54 +00:00
Sam Leffler	21367f630d	Change error codes returned by protocol operations when an inpcb is marked INP_DROPPED or INP_TIMEWAIT: o return ECONNRESET instead of EINVAL for close, disconnect, shutdown, rcvd, rcvoob, and send operations o return ECONNABORTED instead of EINVAL for accept These changes should reduce confusion in applications since EINVAL is normally interpreted to mean an invalid file descriptor. This change does not conflict with POSIX or other standards I checked. The return of EINVAL has always been possible but rare; it's become more common with recent changes to the socket/inpcb handling and with finer-grained locking and preemption. Note: there are other instances of EINVAL for this state that were left unchanged; they should be reviewed. Reviewed by: rwatson, andre, ru MFC after: 1 month	2006-11-22 17:16:54 +00:00
Bjoern A. Zeeb	89e7e7e32a	Add SCTP as a known upper layer protocol over v6. We are not yet aware of the protocol internals but this way SCTP traffic over v6 will not be discarded. Reported by: Peter Lei via rrs Tested by: Peter Lei <peterlei cisco.com>	2006-11-13 19:07:32 +00:00
Randall Stewart	7f34832b95	In a true restart case, the send_lock was not being acquired. This meant that when we cleanup the outbound we may have one in transit to be added with the old sequence number. This is bad since then we lose a message :( Also the report_outbound needed to have the right lock when it's called, which it did not.. I added the lock with of course a flag since we want to have the lock before we call it in the restart case. This also fixed the FIX ME case where, in the cookie collision case, we mark for retransmit any that were bundled with the cookie that was dropped. This also means changes to the output routine so we can assure getting the COOKIE-ACK sent BEFORE we retransmit the Data. Approved by: gnn	2006-11-11 22:44:12 +00:00
Randall Stewart	6a91f103b6	Turns out we would reset the TSN seq counter during a colliding INIT. This is fine except when we have data outstanding... we basically reset it to the previous value it was.. so then we end up assigning the same TSN to two different data chunks. This patch: 1) Finds a missing lock for when we change the stream numbers during COOKIE and INIT-ACK processing.. we were NOT locking the send_buffer.. which COULD cause problems (found by inspection looking for <2>) 2) Fixes a case during a colliding INIT where we incorrectly reset the sending Sequence thus in some cases duplicately assigning a TSN. 3) Additional enhancements to logging so we can see strm/tsn in the receiver AND new tracking to watch what the sender is doing with TSN and STRM seq's. Approved by: gnn	2006-11-11 15:59:01 +00:00
Randall Stewart	de0e935b29	This patch fixes a LOR that happens during INIT-ACK collision. We were calling select_a_tag() inside sctp_send_initate_ack(). During collision cases we have a stcb and thus a SCTP_LOCK. When we call select_a_tag it (below it) locks the INFO lock. We now 1) pre-select the nonce-tie-tags in sctputil.c during setup of a tcb. 2) In the other case where we have to select tags, we unlock after incr the ref cnt (so assoc won't go away) and then do the tag selection followed by a relock and decr the refcnt. Approved by: gnn	2006-11-10 13:34:55 +00:00
Randall Stewart	08598d7067	Fixes an issue with handling of stream reset. When a reset comes in we need to calculate the length and therefore the number of listed streams (if any) based on the TLV type. Otherwise if we get a retran we could in theory panic by sending a notification to a user with an incorrect list and thus no memory listing the streams. Found in IOS by devtest :-) Approved by: gnn	2006-11-09 21:01:07 +00:00
Randall Stewart	03b0b02163	-Fixes first of all the getcred on IPv6 and V4. The copies were incorrect and so was the locking. -A bug was also found that would create a race and panic when an abort arrived on a socket being read from. -Also fix the reader to get MSG_TRUNC when a partial delivery is aborted. -Also addresses a couple of Coverity-caught error path memory leaks and a couple of other valid complaints Approved by: gnn	2006-11-08 00:21:13 +00:00
Joe Marcus Clarke	1bc3d4c1d1	Fix TFTP NAT support by making sure the appropriate fingerprinting checks are done. Reviewed by: piso	2006-11-07 21:06:48 +00:00
Robert Watson	b96fbb37da	Convert three new suser(9) calls introduced between when the priv(9) patch was prepared and committed to priv(9) calls. Add XXX comments as, in each case, the semantics appear to differ from the TCP/UDP versions of the calls with respect to jail, and because cr_canseecred() is not used to validate the query. Obtained from: TrustedBSD Project	2006-11-06 14:54:06 +00:00
Randall Stewart	f4ad963c9f	This change tracks down the EEOR->NonEEOR mode failure to wakeup on close of the sender. It basically moves the return (when the asoc has a reader/writer) further down and gets the wakeup and assoc appending (of the PD-API event) moved up before the return. It also moves the flag set right before the return so we can assure only once adding the PD-API events. Approved by: gnn	2006-11-06 14:34:21 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Ruslan Ermilov	9274ba8a1f	Revert previous commit, and instead make the expression in rev. 1.2 match the style of this file. OK'ed by: rrs	2006-11-05 14:36:59 +00:00
Randall Stewart	50cec91936	Tons of fixes to get all the 64bit issues removed. This also moves two 16 bit ints to become 32 bit values so we do not have to use atomic_add_16. Most of the changes are %p, casts and other various nasties that were in the original code base. With this commit my machine will now do a build universe.. however I as yet have not tested on a 64bit machine .. it may not work :-(	2006-11-05 13:25:18 +00:00
Ruslan Ermilov	11acae799a	Fix pointer arithmetic to be 64-bit friendly.	2006-11-04 08:45:50 +00:00
Ruslan Ermilov	e349e6b8a0	Remove bogus casts that Randall for some reason didn't borrow from my supplied patch.	2006-11-04 08:19:01 +00:00
John Birrell	5051417909	Remove a bogus cast in an attempt to fix the tinderbox builds on lots of arches.	2006-11-04 05:39:39 +00:00
Randall Stewart	562a89b562	More 64 bit pointer fun. %p changed in multiple prints; the mtod() was also fixed.	2006-11-03 23:04:34 +00:00
Randall Stewart	249820a7d8	Fix two of the 64bit errors on the printfs.	2006-11-03 21:19:54 +00:00
Randall Stewart	cef8ad061a	Somehow I missed this one. The sys/cdefs.h include was out of order with respect to the __FBSDID..	2006-11-03 19:48:56 +00:00
Randall Stewart	73932c69b6	Oops... in my fix up of all the $FreeBSD:$-> $FreeBSD$ I inserted a few to the new files.. but I failed to add the #include <sys/cdefs.h> Which causes a compile error.. sorry about that... got it now :-) Approved by: gnn	2006-11-03 17:21:53 +00:00
Randall Stewart	f8829a4a40	Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the work of Peter Lei and Michael Tuexen. They are my two other key developers working on the project.. and they need attaboys too: ** peterlei@cisco.com tuexen@fh-muenster.de ** I did do a make sysent which updated the syscall's and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0 So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP. I will see about committing some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into lib's.. I will talk to George about this :-) There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a Mac but that's another beast too.. If you have a Mac and want to use SCTP contact Michael, he maintains a web site with a loadable module with this code :-) Reviewed by: gnn Approved by: gnn	2006-11-03 15:23:16 +00:00
Oleg Bulyzhin	35da9180dc	- Use non-recursive mutex. MTX_RECURSE is unnecessary since rev. 1.70 - Pay respect to net.isr.direct: use netisr_dispatch() instead of ip_input() Reviewed by: glebius, rwatson - purge_flow_set(): - Do not leak memory while purging queues which are not bound to pipe. - style(9) cleanup MFC after: 2 months	2006-10-29 12:09:24 +00:00
Oleg Bulyzhin	c2df509a1d	- Convert net.inet.ip.dummynet.curr_time net.inet.ip.dummynet.searches net.inet.ip.dummynet.search_steps to SYSCTL_LONG nodes. It will prevent frequent wrap around on 64bit archs. - Implement simple mechanics for dummynet(4) internal time correction. Under certain circumstances (system high load, dummynet lock contention, etc) dummynet's tick counter can be significantly slower than it should be. (I've observed up to 25% difference on one of my production servers). Since this counter is used for packet scheduling, its accuracy is vital for precise bandwidth limitation. Introduce new sysctl nodes: net.inet.ip.dummynet. tick_lost - number of ticks coalesced by taskqueue thread. tick_adjustment - number of time corrections done. tick_diff - adjusted vs non-adjusted tick counter difference tick_delta - last vs 'standard' tick difference (usec). tick_delta_sum - accumulated (and not corrected yet) time difference (usec). Reviewed by: glebius MFC after: 2 month	2006-10-27 13:05:37 +00:00
Oleg Bulyzhin	b2b05096fd	Use separate thread for servicing dummynet(4). Utilize taskqueue(9) API. Submitted by: glebius MFC after: 2 month	2006-10-27 11:16:58 +00:00
Oleg Bulyzhin	c447b19f6e	style(9) cleanup. MFC after: 2 month	2006-10-27 10:52:32 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Julian Elischer	010b65f54a	revert last change.. premature.. need to wait until if_ethersubr.c uses pfil to get to ipfw.	2006-10-21 00:16:31 +00:00
Julian Elischer	3df668cc38	Move some variables to a more likely place and remove "temporary" stuff that is not needed any more.	2006-10-20 19:32:08 +00:00
Maxim Konovalov	428b67b194	o Do not do args->f_id.addr_type == 6 when there is IS_IP6_FLOW_ID() exactly for that.	2006-10-11 12:14:28 +00:00
Maxim Konovalov	f16ccf6814	o Kill a nit in the comment.	2006-10-11 12:00:53 +00:00
Maxim Konovalov	5f197ce41e	o Extend not very informative ipfw(4) message 'drop session, too many entries' by src:port and dst:port pairs. IPv6 part is non-functional as ``limit'' does not support IPv6 flows. PR: kern/103967 Submitted by: based on Bruce Campbell patch MFC after: 1 month	2006-10-11 11:52:34 +00:00
Ruslan Ermilov	cc81ddd9db	Merge the rest of my changes.	2006-10-11 07:11:56 +00:00
Paolo Pisati	f3d9aab351	Various mdoc and grammar fixes. Approved by: glebius Reviewed by: glebius, ru	2006-10-08 13:53:45 +00:00
Bjoern A. Zeeb	7002145d8e	Set scope on MC address so IPv6 carp advertisement will not get dropped in ip6_output. In case this fails handle the error directly and log it[1]. In addition permit CARP over v6 in ip_fw2. PR: kern/98622 Similar patch by: suz Discussed with: glebius [1] Tested by: Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr MFC after: 3 days	2006-10-07 10:19:58 +00:00
Gleb Smirnoff	f7a679b200	Save space on stack moving token ring stuff to its own hack block.	2006-10-04 11:08:14 +00:00
Gleb Smirnoff	9b9a52b496	Style rev. 1.152.	2006-10-04 10:59:21 +00:00
Andre Oppermann	6a7c943c59	Remove stone-aged and irrelevant "#ifndef notdef".	2006-09-29 16:44:45 +00:00
Bruce M Simpson	910e1364b6	Nits. Submitted by: ru	2006-09-29 16:16:41 +00:00
Bruce M Simpson	2d20d32344	Push removal of mrouted down to the rest of the tree.	2006-09-29 15:45:11 +00:00
Maxim Konovalov	acc03ac6bb	o Convert w/spaces to tabs in the previous commit.	2006-09-29 06:46:31 +00:00
Mike Silbersack	d4bdcb16cc	Rather than autoscaling the number of TIME_WAIT sockets to maxsockets / 5, scale it to min(ephemeral port range / 2, maxsockets / 5) so that people with large gobs of memory and/or large maxsockets settings will not exhaust their entire ephemeral port range with sockets in the TIME_WAIT state during periods of heavy load. Those who wish to tweak the size of the TIME_WAIT zone can still do so with net.inet.tcp.maxtcptw. Reviewed by: glebius, ru	2006-09-29 06:24:26 +00:00
Andre Oppermann	2c30ec0a1f	When tcp_output() receives an error upon sending a packet it reverts parts of its internal state to ignore the failed send and try again a bit later. If the error is EPERM the packet got blocked by the local firewall and the revert may cause the session to get stuck and retry indefinitely. This way we treat it like a packet loss and let the retransmit timer and timeouts do their work over time. The correct behavior is to drop a connection that gets an EPERM error. However this _may_ introduce some POLA problems and a two commit approach was chosen. Discussed with: glebius PR: kern/25986 PR: kern/102653	2006-09-28 18:02:46 +00:00
Andre Oppermann	6a2257d911	When doing TSO correctly do the check to prevent a maximum sized IP packet from overflowing.	2006-09-28 13:59:26 +00:00
Bruce M Simpson	050596b4a0	Fix the IPv4 multicast routing detach path. On interface detach whilst the MROUTER is running, the system would panic as described in the PR. The fix in the PR is a good start, however, the other state associated with the multicast forwarding cache has to be freed in order to avoid leaking memory and other possible panics. More care and attention is needed in this area. PR: kern/82882 MFC after: 1 week	2006-09-28 12:21:08 +00:00
Bruce M Simpson	d966841427	The IPv4 code should clean up multicast group state when an interface goes away. Without this change, it leaks in_multi (and often ether_multi state) if many clonable interfaces are created and destroyed in quick succession. The concept of this fix is borrowed from KAME. Detailed information about this behaviour, as well as test cases, are available in the PR. PR: kern/78227 MFC after: 1 week	2006-09-28 10:04:07 +00:00
Paolo Pisati	7c00cc76f0	Compilation.	2006-09-27 02:08:44 +00:00
Paolo Pisati	be4f3cd0d9	Summer of Code 2005: improve libalias - part 1 of 2 With the first part of my previous Summer of Code work, we get: -made libalias modular: -support for 'particular' protocols (like ftp/irc/etcetc) is no longer hardcoded inside libalias, but it's available through external modules loadable at runtime -modules are available both in kernel (/boot/kernel/alias_.ko) and user land (/lib/libalias_) -protocols/applications modularized are: cuseeme, ftp, irc, nbt, pptp, skinny and smedia -added logging support for kernel side -cleanup After a buildworld, do a 'mergemaster -i' to install the file libalias.conf in /etc or manually copy it. During startup (and after every HUP signal) user land applications running the new libalias will try to read a file in /etc called libalias.conf: that file contains the list of modules to load. User land applications affected by this commit are ppp and natd: if libalias.conf is present in /etc you won't notice any difference. The only kernel land bit affected by this commit is ng_nat: if you are using ng_nat, and it doesn't correctly handle ftp/irc/etcetc sessions anymore, remember to kldload the corresponding module (i.e. kldload alias_ftp). General information and details about the inner working are available in the libalias man page under the section 'MODULAR ARCHITECTURE (AND ipfw(4) SUPPORT)'. NOTA BENE: this commit affects _ONLY_ libalias, ipfw in-kernel nat support will be part of the next libalias-related commit. Approved by: glebius Reviewed by: glebius, ru	2006-09-26 23:26:53 +00:00
John-Mark Gurney	e16fa5ca55	fix calculating to_tsecr... This prevents the rtt calculations from going all wonky...	2006-09-26 01:21:46 +00:00
Bruce M Simpson	13c8384424	Fix an incompatibility between CARP and IPv4 multicast routing, whereby the VRRPv2 advertisements will originate from the wrong source address. This only affects kernels compiled with MROUTING and after the MRT_INIT ioctl() has been issued. Set imo_multicast_vif in carp's softc to the invalid value -1 after it is zeroed by softc allocation, to stop the ip_output() path looking up the incorrect source address thinking a vif is set. PR: kern/100532 Submitted by: Bohus Plucinsky MFC after: 1 week	2006-09-25 11:53:54 +00:00
Bruce M Simpson	e2fd806b36	Spleling Submitted by: pjd	2006-09-25 11:48:07 +00:00
Bruce M Simpson	07ea6709ea	Account for output IP datagrams on the ifaddr where they originated from, not the first ifaddr on the ifp. This is similar to what NetBSD does. PR: kern/72936 Submitted by: alfred Reviewed by: andre	2006-09-25 10:11:16 +00:00
John-Mark Gurney	4dc630cdd2	if min is greater than max, prefer max over min... I managed to get a retransmit timer that was going to take 19 days to trigger... Reviewed by: silby	2006-09-25 07:22:39 +00:00
John-Mark Gurney	402865f637	now that we don't automagically increase the MTU of host routes, when we copy the loopback interface, copy its mtu also.. This means that we again have large mtu support for local ip addresses...	2006-09-23 19:24:10 +00:00
Bruce M Simpson	f1edc3bde5	Always set the IP version in the TCP input path, to preserve the header field for possible later IPSEC SPD lookup, even when the kernel is built without 'options INET6'. PR: kern/57760 MFC after: 1 week Submitted by: Joachim Schueth	2006-09-23 16:26:31 +00:00
Andre Oppermann	7ff0b850a6	Make tcp_usr_send() free the passed mbufs on error in all cases as the comment to it claims. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-17 13:39:35 +00:00
John Hay	724e825a16	Handle a list of IPv6 src and dst addresses correctly, eg. ipfw add allow ip6 from any to 2000::/16,2002::/16 PR: 102422 (part 3) Submitted by: Andrey V. Elsukov <bu7cher at yandex dot ru> MFC after: 5 days	2006-09-16 10:27:05 +00:00
Andre Oppermann	31ecb34a4e	When doing TSO subtract hdrlen from TCP_MAXWIN to prevent ip->ip_len from wrapping when we generate a maximally sized packet for later segmentation. Noticed by: gallatin Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-15 16:08:09 +00:00
Andrey A. Chernov	239e71c612	Add missing #ifdef INET6 (can't be compiled)	2006-09-14 10:22:35 +00:00
Andre Oppermann	67d828b162	Remove unnecessary includes and follow common ordering style.	2006-09-13 13:21:17 +00:00
Andre Oppermann	bf6d304ab2	Rewrite of TCP syncookies to remove locking requirements and to enhance functionality: - Remove a rwlock acquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead. - Syncookie secrets are different for and stored per syncache bucket row. Secrets expire after 16 seconds and are reseeded on-demand. - The computational overhead for syncookie generation and verification is one MD5 hash computation as before. - Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1. This implementation extends the original idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled. The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK. A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c. Reviewed by: silby (slightly earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-13 13:08:27 +00:00
Christian S.J. Peron	d94f2a68f8	Introduce a new entry point, mac_create_mbuf_from_firewall. This entry point exists to allow the mandatory access control policy to properly initialize mbufs generated by the firewall. An example where this might happen is keep alive packets, or ICMP error packets in response to other packets. This takes care of kernel panics associated with uninitialized mbuf labels when the firewall generates packets. [1] I modified this patch from its original version, the initial patch introduced a number of entry points which were programmatically equivalent. So I introduced only one. Instead, we should leverage mac_create_mbuf_netlayer() which is used for similar situations, an example being icmp_error() This will minimize the impact associated with the MFC Submitted by: mlaier [1] MFC after: 1 week This is a RELENG_6 candidate	2006-09-12 04:25:13 +00:00
Andre Oppermann	384a05bfd0	Fix a NULL pointer dereference of ro->ro_rt->rt_flags by checking for the validity of ro->ro_rt first. This prevents crashing on any non-normally routed IP packet. Coverity CID: 162 (incorrectly, it was re-introduced by previous commit)	2006-09-11 19:56:10 +00:00
John-Mark Gurney	3ae2ad088e	make use of the host route's mtu for processing. This means we can now support a network w/ split mtu's by assigning each host route the correct mtu. an aspiring programmer could write a daemon to probe hosts and find out if they support a larger mtu.	2006-09-10 17:49:09 +00:00
Gleb Smirnoff	3e630ef9a9	Add a sysctl net.inet.tcp.nolocaltimewait that allows suppressing the creation of compressed TIME_WAIT states if both connection endpoints are local. Default is off.	2006-09-08 13:09:15 +00:00
Ruslan Ermilov	751dea2935	Back when we had T/TCP support, we used to apply different timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that it has gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!). Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues).	2006-09-07 13:06:00 +00:00
Andre Oppermann	b3c0f300fb	Second step of TSO (TCP segmentation offload) support in our network stack. TSO is only used if we are in a pure bulk sending state. The presence of TCP-MD5, SACK retransmits, SACK advertizements, IPSEC and IP options prevent using TSO. With TSO the TCP header is the same (except for the sequence number) for all generated packets. This makes it impossible to transmit any options which vary per generated segment or packet. The length of TSO bursts is limited to TCP_MAXWIN. The sysctl net.inet.tcp.tso globally controls the use of TSO and is enabled. TSO enabled sends originating from tcp_output() have the CSUM_TCP and CSUM_TSO flags set, m_pkthdr.csum_data filled with the header pseudo-checksum and m_pkthdr.tso_segsz set to the segment size (net payload size, not counting IP+TCP headers or TCP options). IPv6 currently lacks a pseudo-header checksum function and thus doesn't support TSO yet. Tested by: Jack Vogel <jfvogel-at-gmail.com> Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-07 12:53:01 +00:00
Ruslan Ermilov	3c89486cc7	Remove a microoptimization for i386 that was a micropessimization for amd64.	2006-09-07 09:49:08 +00:00
Andre Oppermann	233dcce118	First step of TSO (TCP segmentation offload) support in our network stack. o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6 o add CSUM_TSO flag to mbuf pkthdr csum_flags field o add tso_segsz field to mbuf pkthdr o enhance ip_output() packet length check to allow for large TSO packets o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities o adjust all callers of tcp_maxmtu[46]() accordingly Discussed on: -current, -net Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-06 21:51:59 +00:00
Andre Oppermann	6fbfd5825f	Check inp_flags instead of inp_vflag for INP_ONESBCAST flag. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-06 19:04:36 +00:00
Andre Oppermann	773725a255	Fix the socket option IP_ONESBCAST by giving it its own case in ip_output() and skip over the normal IP processing. Add a supporting function ifa_ifwithbroadaddr() to verify and validate the supplied subnet broadcast address. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-06 17:12:10 +00:00
Gleb Smirnoff	2c857a9be9	o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely badly under high load. For example, with 40k sockets and 25k tcptw entries, the connect() syscall can run for seconds. Debugging showed that it iterates the cycle millions of times and purges thousands of tcptw entries at a time. Besides practical unusability, this change is architecturally wrong. First, in_pcblookup_local() is used in the connect() and bind() syscalls; no purging of stale entries should be done here. Second, it is a layering violation. o Return the tcptw purging cycle to tcp_timer_2msl_tw(), which was removed in rev. 1.78 by rwatson. The commit log of that revision says nothing about the reason the cycle was removed. Now we need this cycle, since the major cleaner of stale tcptw structures is removed. o Disable the probably necessary, but now unused, tcp_twrecycleable() function. Reviewed by: ru	2006-09-06 13:56:35 +00:00
Gleb Smirnoff	c3e07bf82a	Finally fix rev. 1.256 Pointy hat to: glebius	2006-09-05 14:00:59 +00:00
Gleb Smirnoff	23ebab416c	Remove extra parenthesis in last commit. Nitpicked by: ru	2006-09-05 12:22:54 +00:00
Gleb Smirnoff	1f1f90c3a7	- Make net.inet.tcp.maxtcptw modifiable at run time. - If net.inet.tcp.maxtcptw was ever set explicitly, do not change it if kern.ipc.maxsockets is changed.	2006-09-05 12:08:47 +00:00
Thomas Quinot	d438d81581	Fix typo in comment.	2006-09-04 08:32:17 +00:00
John Hay	1c31b456b9	Recognise IPv6 PIM packets. MFC after: 1 week	2006-08-31 16:56:45 +00:00
Mohan Srinivasan	2374501ca4	Fix for a bug that causes the computation of "len" in tcp_output() to get messed up, resulting in an inconsistency between the TCP state and so_snd.	2006-08-26 17:53:19 +00:00
Julian Elischer	afad78e259	comply with style police Submitted by: ru MFC after: 1 month	2006-08-18 22:36:05 +00:00
Julian Elischer	c487be961a	Allow ipfw to forward to a destination that is specified by a table. for example: fwd tablearg ip from any to table(1) where table 1 has entries of the form: 1.1.1.0/24 10.2.3.4 208.23.2.0/24 router2 This allows trivial implementation of a secondary routing table implemented in the firewall layer. I expect more work (under discussion with Glebius) to follow this to clean up some of the messy parts of ipfw related to tables. Reviewed by: Glebius MFC after: 1 month	2006-08-17 22:49:50 +00:00
Julian Elischer	b7522c27d2	Remove the IPFIREWALL_FORWARD_EXTENDED option and make it on by default as it always was in older versions of FreeBSD. This option is pointless as it is needed in just about every interesting usage of forward that I have ever seen. It doesn't make the system any safer and just wastes huge amounts of developer time when the system doesn't behave as expected when code is moved from 4.x to 6.x or 7.x. Reviewed by: glebius MFC after: 1 week	2006-08-17 00:37:03 +00:00
Mohan Srinivasan	464469c713	Fixes an edge case bug in timewait handling where ticks rolling over causing the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry). Reviewed by: silby	2006-08-11 21:15:23 +00:00
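The rollover edge case in the entry above can be sketched as follows. This is a minimal illustration, not the committed fix: it assumes that an expiry value of 0 is reserved to mean "not scheduled", and the names tw_expiry() and tw_expiry_example() are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: compute a timewait expiry from the current tick
 * counter. Assumption: 0 is reserved as a "not scheduled" marker, so a
 * sum that wraps to exactly 0 must be nudged forward by one tick.
 * Unsigned arithmetic keeps the wraparound well defined in C.
 */
unsigned int
tw_expiry(unsigned int now_ticks, unsigned int timeout)
{
	unsigned int expiry = now_ticks + timeout;	/* may wrap to 0 */

	if (expiry == 0)
		expiry = 1;	/* never hand out the reserved value */
	return (expiry);
}

/* Small usage example covering the normal and the wrapping case. */
void
tw_expiry_example(void)
{
	assert(tw_expiry(10, 5) == 15);		/* no wrap */
	assert(tw_expiry(UINT_MAX, 1) == 1);	/* wraps to 0, bumped to 1 */
}
```

The one-tick adjustment slightly lengthens the timeout in the wrapping case, which is harmless for a timer measured in multiples of MSL.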
Brooks Davis	43bc7a9c62	With the exception of the if_name() macro, all definitions in net_osdep.h were unused or already in if_var.h, so add if_name() to if_var.h and remove net_osdep.h along with all references to it. Longer term we may want to kill off if_name() entirely since all modern BSDs have if_xname variables, rendering it unnecessary.	2006-08-04 21:27:40 +00:00
Oleg Bulyzhin	0e0b1bb57a	Remove useless NULL pointer check: we are using M_WAITOK flag for memory allocation. Submitted by: Andrey Elsukov <bu7cher at yandex dot ru> Approved by: glebius (mentor) MFC after: 1 week	2006-08-04 10:50:51 +00:00
Robert Watson	e850475248	Move soisdisconnected() in tcp_discardcb() to one of its calling contexts, tcp_twstart(), but not to the other, tcp_detach(), as the socket is already being torn down and therefore there are no listeners. This avoids a panic if kqueue state is registered on the socket at close(), and eliminates two XXX comments. There is one case remaining in which tcp_discardcb() reaches up to the socket layer as part of the TCP host cache, which would be good to avoid. Reported by: Goran Gajic <ggajic at afrodita dot rcub dot bg dot ac dot yu>	2006-08-02 16:18:05 +00:00
Oleg Bulyzhin	9b1858ca78	Do not leak memory while flushing rules. Noticed by: yar Approved by: glebius (mentor) MFC after: 1 week	2006-08-02 14:58:51 +00:00
Robert Watson	a152f8a361	Change semantics of socket close and detach. Add a new protocol switch function, pru_close, to notify protocols that the file descriptor or other consumer of a socket is closing the socket. pru_abort is now a notification of close also, and no longer detaches. pru_detach is no longer used to notify of close, and will be called during socket tear-down by sofree() when all references to a socket evaporate after an earlier call to abort or close the socket. This means detach is now an unconditional teardown of a socket, whereas previously sockets could persist after detach if the protocol retained a reference. This facilitates sharing mutexes between layers of the network stack as the mutex is required during the checking and removal of references at the head of sofree(). With this change, pru_detach can now assume that the mutex will no longer be required by the socket layer after completion, whereas before this was not necessarily true. Reviewed by: gnn	2006-07-21 17:11:15 +00:00
Stephan Uphoff	d915b28015	Fix race conditions on enumerating pcb lists by moving the initialization ( and where appropriate the destruction) of the pcb mutex to the init/finit functions of the pcb zones. This allows locking of the pcb entries and race condition free comparison of the generation count. Rearrange locking a bit to avoid extra locking operation to update the generation count in in_pcballoc(). (in_pcballoc now returns the pcb locked) I am planning to convert pcb list handling from a type safe to a reference count model soon. ( As this allows really freeing the PCBs) Reviewed by: rwatson@, mohans@ MFC after: 1 week	2006-07-18 22:34:27 +00:00
Sam Leffler	6b7330e2d4	Revise network interface cloning to take an optional opaque parameter that can specify configuration parameters: o rev cloner api's to add optional parameter block o add SIOCCREATE2 that accepts parameter data o rev vlan support to use new api (maintain old code) Reviewed by: arch@	2006-07-09 06:04:01 +00:00
Max Laier	05206588f2	Make in-kernel multicast protocols for pfsync and carp work after enabling dynamic resizing of multicast membership array. Reported and testing by: Maxim Konovalov, Scott Ullrich Reminded by: thompsa MFC after: 2 weeks	2006-07-08 00:01:01 +00:00
Robert Watson	be54a5eeb3	Remove unneeded mac.h include. MFC after: 3 days	2006-07-06 13:25:01 +00:00
Oleg Bulyzhin	6372145725	Complete timebase (time_second -> time_uptime) conversion. PR: kern/94249 Reviewed by: andre (few months ago) Approved by: glebius (mentor)	2006-07-05 23:37:21 +00:00
Maxim Konovalov	764a094c3f	o Kill BUGS section as it is not valid since rev. 1.4 alias_pptp.c. Spotted by: ru.unix.bsd activists MFC after: 1 week	2006-07-04 20:39:38 +00:00
Yaroslav Tykhiy	4b97d7affd	There is a consensus that ifaddr.ifa_addr should never be NULL, except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and get ready to the system crashing honestly instead of masking possible bugs. Suggested by: glebius, jhb, ru	2006-06-29 19:22:05 +00:00
Yaroslav Tykhiy	ad67537233	Use TAILQ_FOREACH consistently.	2006-06-29 17:09:47 +00:00
Gleb Smirnoff	4d09f5a030	Fix URL to Bellovin's paper. Submitted by: Anton Yuzhaninov <citrin rambler-co.ru>	2006-06-29 13:38:36 +00:00
Bjoern A. Zeeb	333ad3bc40	Eliminate the offset argument from send_reject. It's not been used since FreeBSD-SA-06:04.ipfw. Adopt send_reject6 to what had been done for legacy IP: no longer send or permit sending rejects for any but the first fragment. Discussed with: oleg, csjp (some weeks ago)	2006-06-29 11:17:16 +00:00
Bjoern A. Zeeb	421d8aa603	Use INPLOOKUP_WILDCARD instead of just 1 more consistently. OKed by: rwatson (some weeks ago)	2006-06-29 10:49:49 +00:00
Pawel Jakub Dawidek	835d4b8924	- Use suser_cred(9) instead of directly checking cr_uid. - Change the order of conditions to first verify that we actually need to check for privileges and then eventually check them. Reviewed by: rwatson	2006-06-27 11:35:53 +00:00
Andre Oppermann	cc477a6347	In syncache_respond() do not reply with a MSS that is larger than what the peer announced to us but make it at least tcp_minmss in size. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-26 17:54:53 +00:00
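The MSS rule described in the entry above can be sketched as a small clamp. This is an illustration only: clamp_reply_mss() and its parameter names are hypothetical, and the floor value used in the example (216) is just a sample argument.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the clamping rule: reply with the smaller of
 * our own MSS and the MSS the peer announced, but never go below a
 * configured floor (the role tcp_minmss plays in the commit message).
 */
unsigned int
clamp_reply_mss(unsigned int our_mss, unsigned int peer_mss,
    unsigned int min_mss)
{
	unsigned int mss = our_mss < peer_mss ? our_mss : peer_mss;

	if (mss < min_mss)
		mss = min_mss;	/* enforce the lower bound last */
	return (mss);
}

/* Usage example: one case per branch of the rule. */
void
clamp_reply_mss_example(void)
{
	assert(clamp_reply_mss(1460, 536, 216) == 536);	/* peer is smaller */
	assert(clamp_reply_mss(1460, 9000, 216) == 1460);	/* we are smaller */
	assert(clamp_reply_mss(1460, 64, 216) == 216);		/* floor applies */
}
```

Applying the floor after taking the minimum means a peer that announces a tiny MSS cannot push the reply below the floor.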
Andre Oppermann	8bfb19180d	Some cleanups and janitorial work to tcp_syncache: o don't assign remote/local host/port information manually between provided struct in_conninfo and struct syncache, bcopy() it instead o rename sc_tsrecent to sc_tsreflect in struct syncache to better capture the purpose of this field o rename sc_request_r_scale to sc_requested_r_scale for ditto reasons o fix IPSEC error case printf's to report correct function name o in syncache_socket() only transpose enhanced tcp options parameters to struct tcpcb when the inpcb doesn't have TF_NOOPT set o in syncache_respond() reorder stack variables o in syncache_respond() remove bogus KASSERT() No functional changes. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-26 16:14:19 +00:00
Andre Oppermann	f72167f4d1	Some cleanups and janitorial work to tcp_dooptions(): o redefine the parameter 'is_syn' to 'flags', add TO_SYN flag and adjust its usage accordingly o update the comments to the tcp_dooptions() invocation in tcp_input():after_listen to reflect reality o move the logic checking the echoed timestamp out of tcp_dooptions() to the only place that uses it next to the invocation described in the previous item o adjust parsing of TCPOPT_SACK_PERMITTED to use the same style as the others o add comments in to struct tcpopt.to_flags #defines No functional changes. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-26 15:35:25 +00:00
Andre Oppermann	dfabcc1d29	Reverse the source/destination parameters to in[6]_pcblookup_hash() in syncache_respond() for the #ifdef MAC case. Submitted by: Tai-hwa Liang <avatar-at-mmlab.cse.yzu.edu.tw>	2006-06-26 09:43:55 +00:00
Robert Watson	b4470c1639	In tcp6_usr_attach(), return immediately if SS_ISDISCONNECTED, to avoid dereferencing an uninitialized inp variable. Submitted by: Michiel Boland <michiel at boland dot org> MFC after: 1 month	2006-06-26 09:38:08 +00:00
Andre Oppermann	a846263567	Decrement the global syncache counter in syncache_expand() when the entry is removed from the bucket. This fixes the syncache statistics.	2006-06-25 11:11:33 +00:00
Andre Oppermann	649ac0ce5f	Move the syncookie MD5 context from globals to the stack to make it MP safe.	2006-06-22 15:07:45 +00:00
Hajimu UMEMOTO	a0a59ae4af	- Pullup even when the extension header is unknown, to prevent infinite loop with net.inet6.ip6.fw.deny_unknown_exthdrs=0. - Teach ipv6 and ipencap as they appear in an IPv4/IPv6 over IPv6 tunnel. - Test the next extension header even when the routing header type is unknown with net.inet6.ip6.fw.deny_unknown_exthdrs=0. Found by: xcast-fan-club MFC after: 1 week	2006-06-22 13:22:54 +00:00
Andre Oppermann	c9f7b0ad5b	Allocate a zero'ed syncache hashtable. mtx_init() tests the supplied memory location for already existing/initialized mutexes. With random data in the memory location this fails (ie. after a soft reboot). Reported by: brueffer, YAMAMOTO Shigeru Submitted by: YAMAMOTO Shigeru <shigeru-at-iij.ad.jp>	2006-06-20 08:11:30 +00:00
David Malone	5e1aa27995	When we receive an out-of-window SYN for an "ESTABLISHED" connection, ACK the SYN as required by RFC793, rather than ignoring it. NetBSD have had a similar change since 1999. PR: 93236 Submitted by: Grant Edwards <grante@visi.com> MFC after: 1 month	2006-06-19 12:33:52 +00:00
Andre Oppermann	6593a94979	Remove T/TCP RFC1644 Connection Count comparison macros. They are no longer used and needed. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-18 14:24:12 +00:00
Andre Oppermann	2f1a4ccfc1	Do not access syncache entry before it was allocated for the TF_NOOPT case in syncache_add(). Found by: Coverity Prevent CID: 1473	2006-06-18 13:03:42 +00:00
Andre Oppermann	8411d000a1	Move all syncache related structures to tcp_syncache.c. They are only used there. This unbreaks userland programs that include tcp_var.h. Discussed with: rwatson	2006-06-18 12:26:11 +00:00
Andre Oppermann	bdfbf1e203	Remove double lock acquisition in syncookie_lookup() which came from last minute conversions to macros. Pointy hat to: andre	2006-06-18 11:48:03 +00:00
Andre Oppermann	ee2e4c1d4e	Fix the !INET6 compile. Reported by: alc	2006-06-17 18:42:07 +00:00
Andre Oppermann	93f0d0c5bf	Rearrange fields in struct syncache and syncache_head to make them more cache line friendly. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-17 17:57:36 +00:00
Andre Oppermann	0c529372f0	ANSIfy and tidy up comments. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-17 17:49:11 +00:00
Andre Oppermann	351630c40d	Add locking to TCP syncache and drop the global tcpinfo lock as early as possible for the syncache_add() case. The syncache timer no longer acquires the tcpinfo lock and timeout/retransmit runs can happen in parallel with bucket granularity. On a P4 the additional locks cause a slight degradation of 0.7% in tcp connections per second. When IP and TCP input are deserialized and can run in parallel this little overhead can be neglected. The syncookie handling still leaves room for improvement and its random salts may be moved to the syncache bucket head structures to remove the second lock operation currently required for it. However this would be a more involved change from the way syncookies work at the moment. Reviewed by: rwatson Tested by: rwatson, ps (earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-17 17:32:38 +00:00
Oleg Bulyzhin	254c472561	Add support of 'tablearg' feature for: - 'tag' & 'untag' action parameters. - 'tagged' & 'limit' rule options. Rule examples: pipe 1 tag tablearg ip from table(1) to any allow ip from any to table(2) tagged tablearg allow tcp from table(3) to any 25 setup limit src-addr tablearg sbin/ipfw/ipfw2.c: 1) new macros GET_UINT_ARG - support of 'tablearg' keyword, argument range checking. PRINT_UINT_ARG - support of 'tablearg' keyword. 2) strtoport(): do not silently truncate/accept invalid port list expressions like: '1,2-abc' or '1,2-3-4' or '1,2-3x4'. style(9) cleanup. Approved by: glebius (mentor) MFC after: 1 month	2006-06-15 09:39:22 +00:00
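The stricter port-list checking mentioned in item 2) of the entry above can be sketched as follows. This is not the code from ipfw2.c: port_list_valid() and parse_port() are hypothetical names, only numeric ports are handled (no service names), and rejecting a range whose upper bound is below its lower bound is an added assumption.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse one decimal port in [0, 65535] at *sp and advance *sp past it.
 * Returns 1 on success, 0 on failure.
 */
static int
parse_port(const char **sp, unsigned long *out)
{
	char *end;

	if (**sp < '0' || **sp > '9')
		return (0);	/* empty item, sign, or non-digit */
	*out = strtoul(*sp, &end, 10);
	if (*out > 65535)
		return (0);	/* also catches strtoul() overflow */
	*sp = end;
	return (1);
}

/* Return 1 if s is a comma-separated list of PORT or PORT-PORT items. */
int
port_list_valid(const char *s)
{
	unsigned long lo, hi;

	for (;;) {
		if (!parse_port(&s, &lo))
			return (0);
		if (*s == '-') {
			s++;
			if (!parse_port(&s, &hi) || hi < lo)
				return (0);
		}
		if (*s == '\0')
			return (1);
		if (*s != ',')
			return (0);	/* trailing junk such as "-4" or "x4" */
		s++;
	}
}

/* Usage example with the expressions quoted in the commit message. */
void
port_list_valid_example(void)
{
	assert(port_list_valid("1,2-3"));
	assert(!port_list_valid("1,2-abc"));
	assert(!port_list_valid("1,2-3-4"));
	assert(!port_list_valid("1,2-3x4"));
}
```

The key point is that whatever follows a parsed item must be either a comma or the end of the string, so nothing is silently truncated.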
Oleg Bulyzhin	58a0fab73f	install_state(): style(9) cleanup Approved by: glebius (mentor) MFC after: 1 month	2006-06-15 08:54:29 +00:00
Andrew Thompson	5feebeeb53	Enable proxy ARP answers on any of the bridged interfaces if proxy record belongs to another interface within the bridge group. PR: kern/94408 Submitted by: Eygene A. Ryabinkin MFC after: 1 month	2006-06-09 00:33:30 +00:00
Oleg Bulyzhin	458009ae93	install_state() should properly initialize 'addr_type' field of newly created flows for O_LIMIT rules. Otherwise 'ipfw -d show' is unable to display PARENT rules properly. (This bug was exposed by ipfw2.c rev.1.90) Approved by: glebius (mentor) MFC after: 2 weeks	2006-06-08 11:27:45 +00:00
Oleg Bulyzhin	d2dc1907e8	Fix following rules: pipe X (tag|altq) Y ... Approved by: glebius (mentor) MFC after: 2 weeks	2006-06-08 11:13:23 +00:00
Robert Watson	f2de87fec4	Push acquisition of pcbinfo lock out of tcp_usr_attach() into tcp_attach() after the call to soreserve(), as it doesn't require the global lock. Rearrange inpcb locking here also. MFC after: 1 month	2006-06-04 09:31:34 +00:00
Robert Watson	d8ab0ec661	When entering a timer on a tcpcb, don't continue processing if it has been dropped. This prevents a bug introduced during the socket/pcb refcounting work from occurring, in which occasionally the retransmit timer may fire after a connection has been reset, causing the resulting R|A TCP packet to have a source port of 0, as the port reservation has been released. While here, fix up some RUNLOCK->WUNLOCK bugs. MFC after: 1 month	2006-06-03 19:37:08 +00:00
Robert Watson	f24618aaf0	Acquire udbinfo lock after call to soreserve() rather than before, as it is not required. This simplifies error-handling, and reduces the time that this lock is held. MFC after: 1 month	2006-06-03 19:29:26 +00:00
Christian S.J. Peron	16d878cc99	Fix the following bpf(4) race condition which can result in a panic: (1) bpf peer attaches to interface netif0 (2) Packet is received by netif0 (3) ifp->if_bpf pointer is checked and handed off to bpf (4) bpf peer detaches from netif0 resulting in ifp->if_bpf being initialized to NULL. (5) ifp->if_bpf is dereferenced by bpf machinery (6) Kaboom This race condition likely explains the various different kernel panics reported around sending SIGINT to tcpdump or dhclient processes. But really this race can result in kernel panics anywhere you have frequent bpf attach and detach operations with high packet per second load. Summary of changes: - Remove the bpf interface's "driverp" member - When we attach bpf interfaces, we now set the ifp->if_bpf member to the bpf interface structure. Once this is done, ifp->if_bpf should never be NULL. [1] - Introduce bpf_peers_present function, an inline operation which will do a lockless read of the bpf peer list associated with the interface. It should be noted that the bpf code will pickup the bpf_interface lock before adding or removing bpf peers. This should serialize the access to the bpf descriptor list, removing the race. - Expose the bpf_if structure in bpf.h so that the bpf_peers_present function can use it. This also removes the struct bpf_if; hack that was there. - Adjust all consumers of the raw if_bpf structure to use bpf_peers_present Now what happens is: (1) Packet is received by netif0 (2) Check to see if bpf descriptor list is empty (3) Pickup the bpf interface lock (4) Hand packet off to process From the attach/detach side: (1) Pickup the bpf interface lock (2) Add/remove from bpf descriptor list Now that we are storing the bpf interface structure with the ifnet, there is no need to walk the bpf interface list to locate the correct bpf interface. We now simply look up the interface, and initialize the pointer. This has a nice side effect of changing a bpf interface attach operation from O(N) (where N is the number of bpf interfaces), to O(1). [1] From now on, we can no longer check ifp->if_bpf to tell us whether or not we have any bpf peers that might be interested in receiving packets. In collaboration with: sam@ MFC after: 1 month	2006-06-02 19:59:33 +00:00
Robert Watson	ad3a630f7e	Minor restyling and cleanup around ipport_tick(). MFC after: 1 month	2006-06-02 08:18:27 +00:00
Oleg Bulyzhin	6a7d5cb645	Implement internal (i.e. inside kernel) packet tagging using mbuf_tags(9). Since tags are kept while packet resides in kernelspace, it's possible to use other kernel facilities (like netgraph nodes) for altering those tags. Submitted by: Andrey Elsukov <bu7cher at yandex dot ru> Submitted by: Vadim Goncharov <vadimnuclight at tpu dot ru> Approved by: glebius (mentor) Idea from: OpenBSD PF MFC after: 1 month	2006-05-24 13:09:55 +00:00
Maxim Konovalov	d45e4f9945	o In udp|rip_disconnect() acquire a socket lock before the socket state modification. To prevent races do that while holding inpcb lock. Reviewed by: rwatson	2006-05-21 19:28:46 +00:00
Maxim Konovalov	635354c446	o Add missed error check: in ip_ctloutput() sooptcopyin() returns a result but we never examine it. Reviewed by: rwatson MFC after: 2 weeks	2006-05-21 17:52:08 +00:00
Bruce M Simpson	8d7d85149e	Initialize the new members of struct ip_moptions as a defensive programming measure. Note that whilst these members are not used by the ip_output() path, we are passing an instance of struct ip_moptions here which is declared on the stack (which could be considered a bad thing). ip_output() does not consume struct ip_moptions, but in case it does in future, declare an in_multi vector on the stack too to behave more like ip_findmoptions() does.	2006-05-18 19:51:08 +00:00
Gleb Smirnoff	e5f88c4492	Since m_pullup() can return a new mbuf, change gre_input2() to return mbuf back to gre_input(). If the former returns mbuf back to the latter, then pass it to raw_input(). Coverity ID: 829	2006-05-16 11:15:22 +00:00
Gleb Smirnoff	ffb761f624	- Backout one line from 1.78. The tp can be freed by tcp_drop(). - Style next line. Coverity ID: 912	2006-05-16 10:51:26 +00:00
Maxim Konovalov	eb16472f74	o In rip_disconnect() do not call rip_abort(), just mark a socket as not connected. In soclose() case rip_detach() will kill inpcb for us later. This keeps the rawconnect regression test from panicking the system. Reviewed by: rwatson X-MFC after: with all 1st April inpcb changes	2006-05-15 09:28:57 +00:00
Max Laier	0e7185f6e7	Use only lower 64bit of src/dest (and src/dest port) for hashing of IPv6 connections and get rid of the flow_id as it is not guaranteed to be stable; some (most?) current implementations seem to just zero it out. PR: kern/88664 Reported by: jylefort Submitted by: Joost Bekkers (w/ changes) Tested by "regisr" <regisrApoboxDcom>	2006-05-14 23:42:24 +00:00
Bruce M Simpson	3548bfc964	Fix a long-standing limitation in IPv4 multicast group membership. By making the imo_membership array a dynamically allocated vector, this minimizes disruption to existing IPv4 multicast code. This change breaks the ABI for the kernel module ip_mroute.ko, and may cause a small amount of churn for folks working on the IGMPv3 merge. Previously, sockets were subject to a compile-time limitation on the number of IPv4 group memberships, which was hard-coded to 20. The imo_membership relationship, however, is 1:1 with regards to a tuple of multicast group address and interface address. Users who ran routing protocols such as OSPF ran into this limitation on machines with a large system interface tree.	2006-05-14 14:22:49 +00:00
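The move from a fixed-size membership array to a dynamically allocated vector, as described in the entry above, can be sketched in userland C. This is an illustration under stated assumptions: struct membership_vec, membership_add(), the doubling policy, and the starting size of 4 are all hypothetical and do not reflect the kernel's actual structures or limits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable vector; int stands in for a membership entry. */
struct membership_vec {
	int	*items;
	size_t	 count;		/* slots in use */
	size_t	 cap;		/* slots allocated */
};

/*
 * Append one entry, doubling the allocation when full but never
 * exceeding max_cap. Returns 0 on success, -1 on failure.
 */
int
membership_add(struct membership_vec *v, int item, size_t max_cap)
{
	if (v->count == v->cap) {
		size_t ncap = v->cap == 0 ? 4 : v->cap * 2;
		int *n;

		if (ncap > max_cap)
			ncap = max_cap;
		if (ncap == v->cap)
			return (-1);	/* upper bound reached */
		n = realloc(v->items, ncap * sizeof(*n));
		if (n == NULL)
			return (-1);	/* old buffer is still valid */
		v->items = n;
		v->cap = ncap;
	}
	v->items[v->count++] = item;
	return (0);
}

/* Usage example: exceed the old hard-coded limit of 20 entries. */
void
membership_add_example(void)
{
	struct membership_vec v = { NULL, 0, 0 };
	int i;

	for (i = 0; i < 25; i++)
		assert(membership_add(&v, i, 4095) == 0);
	assert(v.count == 25);
	free(v.items);
}
```

Keeping an explicit upper bound preserves a per-socket resource limit while removing the compile-time constant.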
Max Laier	656faadcb8	Remove ip6fw. Since ipfw has fully functional IPv6 support now and - in contrast to ip6fw - is properly locked, it is time to retire ip6fw.	2006-05-12 20:39:23 +00:00
Max Laier	e93187482d	Reintroduce net.inet6.ip6.fw.enable sysctl to dis/enable the ipv6 processing separately. Also use pfil hook/unhook instead of keeping the check functions in pfil just to return there based on the sysctl. While here fix some whitespace on a nearby SYSCTL_ macro.	2006-05-12 04:41:27 +00:00
Max Laier	432288dcb6	Don't claim "(+ipv6)" if we didn't build with INET6.	2006-05-11 15:22:38 +00:00
Robert Watson	59b8854eee	Modify UDP to use sosend_dgram() instead of sosend(). This allows for significantly optimized UDP socket I/O when using a single UDP socket from many threads or processes that share it, by avoiding significant locking and other overhead in the general sosend() path that isn't necessary for simple datagram sockets. Specifically, this change results in a significant performance improvement for threaded name service in BIND9 under load. Suggested by: Jinmei_Tatsuya at isc dot org	2006-05-06 11:24:59 +00:00
Bjoern A. Zeeb	91b309a1c4	Make sure the ip data pointer is correct before touching it again after ipsec4_output processing else KAME IPSec using the handbook configuration with gif(4) will panic the kernel. Problem reported by: t. patterson <tp lot.org> Tested by: t. patterson <tp lot.org>	2006-05-05 07:31:03 +00:00
Robert Watson	3127286870	Only return (tw) from tcp_twclose() if reuse is passed, otherwise return NULL. In principle this shouldn't change the behavior, but avoids returning a potentially invalid/inappropriate pointer to the caller. Found with: Coverity Prevent (tm) Submitted by: pjd MFC after: 3 months	2006-05-05 06:50:23 +00:00
Pawel Jakub Dawidek	1d7d0bfe5e	/tmp/cvsTXPIwQ	2006-05-05 06:24:34 +00:00
Marcel Moolenaar	7c5a8ab212	In in_pcbdrop(), fix !INVARIANTS build.	2006-04-25 23:23:13 +00:00
Robert Watson	8e3f3b169e	Rename 'last' to 'inp' in udp_append(): the name 'last' is due to the fact that the loop through inpcb's in udp_input() tracks the last inpcb while looping. We keep that name in the calling loop but not in the delivery routine itself. MFC after: 3 months	2006-04-25 17:38:08 +00:00
Robert Watson	10702a2840	Abstract inpcb drop logic, previously just setting of INP_DROPPED in TCP, into in_pcbdrop(). Expand logic to detach the inpcb from its bound address/port so that dropping a TCP connection releases the inpcb resource reservation, which since the introduction of socket/pcb reference count updates, has been persisting until the socket closed rather than being released implicitly due to prior freeing of the inpcb on TCP drop. MFC after: 3 months	2006-04-25 11:17:35 +00:00
Robert Watson	c78cbc7b1d	Instead of calling tcp_usr_detach() from tcp_usr_abort(), break out common pcb tear-down logic into tcp_detach(), which is called from either. Invoke tcp_drop() from the tcp_usr_abort() path rather than tcp_disconnect(), as we want to drop it immediately not perform a FIN sequence. This is one reason why some people were experiencing panics in sodealloc(), as the netisr and aborting thread were simultaneously trying to tear down the socket. This bug could often be reproduced using repeated runs of the listenclose regression test. MFC after: 3 months PR: 96090 Reported by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris Tested by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris	2006-04-24 08:20:02 +00:00
Robert Watson	9106a6d6b0	Replace isn_mtx direct use with ISN_*() lock macros so that locking details/strategy can be changed without touching every use. MFC after: 3 months	2006-04-23 12:27:42 +00:00
Robert Watson	4c0e8f41f6	Introduce a new TCP mutex, isn_mtx, which protects the initial sequence number state, rather than re-using pcbinfo. This introduces some additional mutex operations during isn query, but avoids hitting the TCP pcbinfo lock out of yet another frequently firing TCP timer. MFC after: 3 months	2006-04-22 19:23:24 +00:00
Robert Watson	602cc7f12b	Assert the inpcb lock when rehashing an inpcb. Improve consistency of style around some current assertions. MFC after: 3 months	2006-04-22 19:15:20 +00:00
Robert Watson	6466b28a40	Remove pcbinfo locking from in_setsockaddr() and in_setpeeraddr(); holding the inpcb lock is sufficient to prevent races in reading the address and port, as both the inpcb lock and pcbinfo lock are required to change the address/port. Improve consistency of spelling in assertions about inp != NULL. MFC after: 3 months	2006-04-22 19:10:02 +00:00
Paul Saab	4f590175b7	Allow for nmbclusters and maxsockets to be increased via sysctl. An eventhandler is used to update all the various zones that depend on these values.	2006-04-21 09:25:40 +00:00
Gleb Smirnoff	4cbb118526	Merge rev. 1.240 of ip_output.c, so that IPFIREWALL_FORWARD_EXTENDED kernel option will affect both forwarding methods - classic and fast.	2006-04-18 09:20:16 +00:00
Robert Watson	3cbe7fafa5	Modify tcp_timewait() to accept an inpcb reference, not a tcptw reference. For now, we allow the possibility that the in_ppcb pointer in the inpcb may be NULL if a timewait socket has had its tcptw structure recycled. This allows tcp_timewait() to consistently unlock the inpcb. Reported by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months	2006-04-09 16:59:19 +00:00
Mohan Srinivasan	1714e18e79	Eliminate debug code that catches bugs in the hinting of sack variables (tcp_sack_output_debug checks cached hints against computed values by walking the scoreboard and reports discrepancies). The sack hinting code has been stable for many months now so it is time for the debug code to go. Leaving tcp_sack_output_debug ifdef'ed out in case we need to resurrect it at a later point.	2006-04-06 17:21:16 +00:00
Robert Watson	a460ae4b4c	Don't unlock a timewait structure if the pointer is NULL in tcp_timewait(). This corrects a bug (or lack of fixing of a bug) in tcp_input.c:1.295. Submitted by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months	2006-04-05 08:45:59 +00:00
Mohan Srinivasan	1f65c2cd31	Certain (bad) values of sack blocks can end up corrupting the sack scoreboard. Make the checks in tcp_sack_doack() more robust to prevent this. Submitted by: Raja Mukerji (raja@mukerji.com) Reviewed by: Mohan Srinivasan	2006-04-05 00:11:04 +00:00
Gleb Smirnoff	a73b656763	Add a tunable net.inet.tcp.maxtcptw, that allows to set a limit on tcptw zone independently from setting a limit on socket zone.	2006-04-04 14:31:37 +00:00
Robert Watson	ae0e714308	Before dereferencing intotw() when INP_TIMEWAIT, check for inp_ppcb being NULL. We currently do allow this to happen, but may want to remove that possibility in the future. This case can occur when a socket is left open after TCP wraps up, and the timewait state is recycled. This will be cleaned up in the future. Found by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months	2006-04-04 12:26:07 +00:00
Robert Watson	cb895fb9b0	In TCP notify routines, check inpcb for INP_TIMEWAIT and INP_DROPPED. The INP_DROPPED check replaces the current NULL checks; the INP_TIMEWAIT checks appear to have always been required, but not been there, which is/was a bug. This avoids unconditionally casting of in_ppcb to a tcpcb, when it may be a twtcb, which may have resulted in obscure ICMP-related panics in earlier releases. MFC after: 3 months	2006-04-03 14:07:50 +00:00
Robert Watson	afa39e25c4	Change inp_ppcb from caddr_t to void *, fix/remove associated related casts. Consistently use intotw() to cast inp_ppcb pointers to struct tcptw * pointers. Consistently use intotcpcb() to cast inp_ppcb pointers to struct tcpcb * pointers. Don't assign tp to the results of intotcpcb() during variable declaration at the top of functions, as that is before the asserts relating to locking have been performed. Do this later in the function after appropriate assertions have run to allow that operation to be considered safe. MFC after: 3 months	2006-04-03 13:33:55 +00:00
Robert Watson	43f56a32a0	Style tweaks: convert to ANSI from K&R function prototypes. MFC after: 3 months	2006-04-03 12:59:27 +00:00
Robert Watson	2fc5ae87d0	Update comment on tcp_close() for new world order. MFC after: 3 months	2006-04-03 12:52:13 +00:00
Robert Watson	e6e65783d6	Clarify comment on handling of non-timewait TCP states in tcp_usr_detach(). MFC after: 3 months	2006-04-03 12:43:56 +00:00
Robert Watson	fa38deac65	Fix up locking surrounding tcp_drop sysctl: in the new world order, we don't free inpcbs until after the socket is closed, so we always need to unlock an inpcb after calling tcp_drop() on it. MFC after: 3 months	2006-04-03 11:57:12 +00:00
Robert Watson	3d2d3ef434	After checking for SO_ISDISCONNECTED in tcp_usr_accept(), return immediately rather than jumping to the normal output handling, which assumes we've pulled out the inpcb, which hasn't happened at this point (and isn't necessary). Return ECONNABORTED instead of EINVAL when the inpcb has entered INP_TIMEWAIT or INP_DROPPED, as this is the documented error value. This may correct the panic seen by Ganbold. MFC after: 1 month Reported by: Ganbold <ganbold at micom dot mng dot net>	2006-04-03 09:52:55 +00:00
Robert Watson	a34f6c1e1d	Correct incorrect assertion in div_bind(): inp must not be NULL here. Reported by: tegge MFC after: 3 months	2006-04-03 09:01:17 +00:00
Robert Watson	953b5606df	During reformulation of tcp_usr_detach(), the call to initiate TCP disconnect for fully connected sockets was dropped, meaning that if the socket was closed while the connection was alive, it would be leaked. Structure tcp_usr_detach() so that there are two clear parts: initiating disconnect, and reclaiming state, and reintroduce the tcp_disconnect() call in the first part. MFC after: 3 months	2006-04-02 16:42:51 +00:00
Robert Watson	34af7bae80	Properly handle an edge case previously not handled correctly: a socket can have a tcp connection that has entered time wait attached to it, in the event that shutdown() is called on the socket and the FINs properly exchange before close(). In this case we don't detach or free the inpcb, just leave the tcptw detached and freed, but we must release the inpcb lock (which we didn't previously). MFC after: 3 months	2006-04-01 23:53:25 +00:00
Robert Watson	623dce13c6	Update TCP for infrastructural changes to the socket/pcb refcount model, pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, the receive code no longer requires the pcbinfo lock, and the send code only requires it if building a new connection on an otherwise unconnected socket triggered via sendto() with an address. This should significantly reduce tcbinfo lock contention in the receive and send cases. - In order to support the invariant that so_pcb != NULL, it is now necessary for the TCP code to not discard the tcpcb any time a connection is dropped, but instead leave the tcpcb until the socket is shutdown. This case is handled by setting INP_DROPPED, to substitute for using a NULL so_pcb to indicate that the connection has been dropped. This requires the inpcb lock, but not the pcbinfo lock. - Unlike all other protocols in the tree, TCP may need to retain access to the socket after the file descriptor has been closed. Set SS_PROTOREF in tcp_detach() in order to prevent the socket from being freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether or not it needs to free the socket when the connection finally does close. The typical case where this occurs is if close() is called on a TCP socket before all sent data in the send socket buffer has been transmitted or acknowledged. If INP_SOCKREF is found when the connection is dropped, we release the inpcb, tcpcb, and socket instead of flagging INP_DROPPED. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this. - Annotate the existence of a long-standing race in the TCP timer code, in which timers are stopped but not drained when the socket is freed, as waiting for drain may lead to deadlocks, or have to occur in a context where waiting is not permitted. This race has been handled by testing to see if the tcpcb pointer in the inpcb is NULL (and vice versa), which is not normally permitted, but may be true if an inpcb and tcpcb have been freed. Add a counter to test how often this race has actually occurred, and a large comment for each instance where we compare potentially freed memory with NULL. This will have to be fixed in the near future, but requires us to further address how to handle the timer shutdown issue. - Several TCP calls no longer potentially free the passed inpcb/tcpcb, so no longer need to return a pointer to indicate whether the argument passed in is still valid. - Un-macroize debugging and locking setup for various protocol switch methods for TCP, as it led to more obscurity, and as locking becomes more customized to the methods, offers less benefit. - Assert copyright on tcp_usrreq.c due to significant modifications that have been made as part of this work. These changes significantly modify the memory management and connection logic of our TCP implementation, and are (as such) High Risk Changes, and likely to contain serious bugs. Please report problems to the current@ mailing list ASAP, ideally with simple test cases, and optionally, packet traces. MFC after: 3 months	2006-04-01 16:36:36 +00:00
Robert Watson	14ba8add01	Update in_pcb-derived basic socket types following changes to pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, in protocol shutdown methods, and in raw IP send. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this. - Invoke in_pcbfree() after in_pcbdetach() in order to free the detached in_pcb structure for a socket. MFC after: 3 months	2006-04-01 16:20:54 +00:00
Robert Watson	4c7c478d0f	Break out in_pcbdetach() into two functions: - in_pcbdetach(), which removes the link between an inpcb and its socket. - in_pcbfree(), which frees a detached pcb. Unlike the previous in_pcbdetach(), neither of these functions will attempt to conditionally free the socket, as they are responsible only for managing in_pcb memory. Mirror these changes into in6_pcbdetach() by breaking it into in6_pcbdetach() and in6_pcbfree(). While here, eliminate undesired checks for NULL inpcb pointers in sockets, as we will now have as an invariant that sockets will always have valid so_pcb pointers. MFC after: 3 months	2006-04-01 16:04:42 +00:00
Robert Watson	bc725eafc7	Change protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they either occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months	2006-04-01 15:42:02 +00:00
Robert Watson	ac45e92ff2	Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this. This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they are not fully updated by this commit. This will be corrected shortly in followup commits to these components. MFC after: 3 months	2006-04-01 15:15:05 +00:00
Robert Watson	6882aa2c03	Define two new inpcb flags in the inp_vflag field, which for whatever reason, seems to be where new flags are getting defined: INP_DROPPED - The protocol has terminated this connection and the socket is not reusable: when the socket code enters the protocol, an error is immediately returned. This will substitute for NULLing the so_pcb socket field, helping to implement the invariant that all valid sockets have valid pcb's in TCP. INP_SOCKREF - The protocol has become the owner of the socket reference, and will need to free it when freeing the pcb, which will be used when a TCP socket is closed but still has queued data. MFC after: 1 month	2006-03-26 11:30:31 +00:00
Robert Watson	a07b8fd178	Minor style tweak: tab after #define, not space. MFC after: 1 month	2006-03-26 11:26:12 +00:00
Robert Watson	1c53f80637	Explicitly assert socket pointer is non-NULL in tcp_input() so as to provide better debugging information. Prefer explicit comparison to NULL for tcpcb pointers rather than treating them as booleans. MFC after: 1 month	2006-03-26 01:33:41 +00:00
Gleb Smirnoff	0fa0801895	o Introduce carp_multicast_cleanup(), which removes and frees multicast addresses from carp interface. [1] o Rewrite carpdetach(), so that it does the following things: [1] - Stops callouts. - Decrements carp_suppress_preempt, if needed. - Downs interface and sets CARP state to INIT. - Calls carp_multicast_cleanup(). - Detaches softc from carp_if and, if we are the last, frees the carp_if. o Use new carpdetach() in carp_clone_destroy(). o In carp_ifdetach() acquire the carp_if lock and cleanup all interfaces hanging on carp_if. [1] o Make carp_ifdetach() static and use EVENT(9) to call it from if_detach(). [2] o In carp_setrun() exit if the softc doesn't have a valid pointer to parent. [1] Obtained from: OpenBSD [1] Submitted by: Dan Lukes <dan obluda.cz> [2] PR: kern/82908 [2]	2006-03-21 14:29:48 +00:00
Giorgos Keramidas	f92e18f4fc	Add descriptions for the sysctls: net.inet.icmp.drop_redirect net.inet.icmp.log_redirect net.inet.icmp.icmplim net.inet.icmp.icmplim_output Approved & text by: andre	2006-03-20 21:44:12 +00:00

2815 Commits
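
Several of the April 2006 commits above (6882aa2c03, 4c7c478d0f, 14ba8add01, 3d2d3ef434) describe the same pattern: so_pcb stays non-NULL while the protocol is attached, a dropped connection is marked with INP_DROPPED instead of NULLing the pointer, and teardown is split into a detach step and a free step. What follows is only a user-space sketch of that pattern, not the kernel code. Every model_* name, the struct layouts, and the flag value are assumptions made for illustration; the real definitions live in the FreeBSD sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative flag value only; the kernel defines its own. */
#define MODEL_INP_DROPPED 0x01

struct model_inpcb;

/* Hypothetical, heavily reduced stand-ins for struct socket / struct inpcb. */
struct model_socket {
    struct model_inpcb *so_pcb;      /* invariant: non-NULL while attached */
};

struct model_inpcb {
    struct model_socket *inp_socket; /* back pointer to the owning socket */
    int inp_flags;
};

/* Attach a fresh pcb to the socket; returns NULL on allocation failure. */
struct model_inpcb *model_attach(struct model_socket *so)
{
    struct model_inpcb *inp = calloc(1, sizeof(*inp));
    if (inp == NULL)
        return NULL;
    inp->inp_socket = so;
    so->so_pcb = inp;
    return inp;
}

/* The protocol ends the connection: flag the pcb, keep it linked. */
void model_drop(struct model_inpcb *inp)
{
    inp->inp_flags |= MODEL_INP_DROPPED;
}

/* Socket-layer entry point: assert the invariant, then test the flag. */
int model_usr_accept(struct model_socket *so)
{
    struct model_inpcb *inp = so->so_pcb;

    assert(inp != NULL);             /* an assertion, not an error path */
    if (inp->inp_flags & MODEL_INP_DROPPED)
        return ECONNABORTED;
    return 0;
}

/* Step 1: remove the link between pcb and socket; frees nothing. */
void model_pcbdetach(struct model_inpcb *inp)
{
    assert(inp->inp_socket != NULL);
    inp->inp_socket->so_pcb = NULL;
    inp->inp_socket = NULL;
}

/* Step 2: free a pcb that has already been detached. */
void model_pcbfree(struct model_inpcb *inp)
{
    assert(inp->inp_socket == NULL);
    free(inp);
}

/* Detach returns void: it cannot fail, and it never frees the socket. */
void model_usr_detach(struct model_socket *so)
{
    struct model_inpcb *inp = so->so_pcb;

    assert(inp != NULL);
    model_pcbdetach(inp);
    model_pcbfree(inp);
}
```

In this model a caller attaches, may see the connection dropped (after which model_usr_accept() reports ECONNABORTED while so_pcb is still valid), and finally calls model_usr_detach(). The SS_PROTOREF / INP_SOCKREF ownership handoff described in 623dce13c6 is deliberately left out to keep the sketch small.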