freebsd-skq

Author	SHA1	Message	Date
rrs	ad17d8ebf0	Found by Michael. In cases where we run out of memory (no more inp space) we don't properly NULL the INP on return. Obtained from: tuexen MFC after: 3 Days	2010-06-09 22:05:29 +00:00
rrs	73f9a7ebf6	Fix several bugs all having to do with freeing an sctp_inpcb: 1) Make sure not to remove the flag on the PCB until after the close() caller is back in control with the lock. Otherwise a quickly freeing assoc could kill the inpcb and cause a panic. 2) Make sure all calls to log_closing have not released the locks before calling the log function, we don't want the logging function to crash us due to a freed inpcb. 3) Make sure that when we get to the end, we release all locks (after removing them from view) and as long as we are NOT the inp-kill timer removing the inp, call the callout_drain() function so a racing timer won't later call in and cause a racing crash. MFC after: 1 week	2010-06-09 16:42:42 +00:00
rrs	d1171df905	BUG:Turns out we need to use both bit maps to calculate the cum-ack (we were not doing it for the NR-Sack case). With this fix NR-sack should now work correctly. MFC after: 1 week	2010-06-09 16:39:18 +00:00
rrs	8bbbdc4764	2 Bugs: 1) Only use both mapping arrays when NR sack is off. This way we can hold off moving the cumack (not the best but workable) when NR-sack is on. 2) We must make sure to just return on the move of the bit to the NR array if the cum-ack has already gone past the TSN. This prevents marking a bit behind the array and hitting the invariant code that panics us. MFC after: 1 week	2010-06-08 03:39:31 +00:00
rrs	79f1540a6e	This fixes a BUG in the handling of the cum-ack calculation. We were only paying attention to the nr-mapping-array. Which seems to make sense on the surface, by definition things up to the cum-ack should be deliverable thus in the nr-mapping-array. However (there is always a gotcha) that's not true when it comes to large messages. The stack may hold the message while re-assembling it and not deliver it based on several thresholds. If that happens (which it would for smaller large messages) then the cum-ack is figured wrong. We now properly use both arrays in the cum-ack calculation. MFC after: 1 week.	2010-06-07 18:29:10 +00:00
rrs	259851f89c	Oops... my bad.. we don't need a SOCK_UNLOCK() after calling socantrcvmore_locked() since it will unlock the lock for you. MFC after: 1 week	2010-06-07 11:33:20 +00:00
rrs	de042002c0	Fix so we call socantrcvmore_locked so we don't see a race where we unlock to call the non-locked version and have the socket go away. MFC after: 1 week	2010-06-07 04:01:38 +00:00
rrs	3bcf4834bb	1) Optimize the cleanup and don't always depend on the timer. This is done by considering the locks we will destroy and if they are contended we consider it the same as a reference count being up. Fixing this appears to cleanup another crash that was appearing with all the timers where the socket buf lock got corrupted. 2) Fix the sysctl code to take a lot more care when looking at INP's that are in the GONE or ALLGONE state. MFC after: 1 week	2010-06-06 20:34:17 +00:00
rrs	6bf375889d	Ok, yet another bug in killing off all the hundreds of apitesters.. Basically we end up with attempting to destroy a lock that's contended on. A cookie echo arrives at the same time that the close is happening. The close gets the lock but the cookie echo has already passed the check for the gone flag and is then locked waiting on the create lock.. when we go to destroy it bam. For now we do the timer destroy for all calls to close.. We can probably optimize this later so that we check what's being contended on and if there is contention then do the timer thing. but this is probably safest since the inp has been removed from all lists and references and only the timer can find it.. once the locks are released all other places will instantly see the GONE flag and bail (that's what the change in sctp_input is one place that was lacking the bail code). MFC after: 1 week	2010-06-06 19:24:32 +00:00
rrs	e9703449d0	1) Further enhance the INVARIANT lock validation (no locks are held) by checking the create and inp locks as well. 2) Fix a bug in that when a socket is closed an INIT-ACK is returned, we do NOT unlock the locked_tcb unless it's different (an unlikely scenario). If we blindly unlock as we were doing before we can end up unlocking the actual stcb that's about to be sent down to the free function which requires the lock be held. MFC after: 1 week	2010-06-06 16:11:16 +00:00
rrs	aaa6b56e3f	Fix a bug in the sctp_inpcb_free. Basically if the socket was setup to do an abortive close an association that was in the accept_queue could get stuck and never freed. Now we properly start the kill timer on the socket and turn off the flag (same thing we do for the graceful close method). MFC after: 1 week	2010-06-06 16:09:12 +00:00
rrs	54047d0058	Fix a bug in sctp_abort_assoc(). DON'T call the sctp_inpcb_free when the gone flag is set. You don't know what locks the caller has set and there is already a kill timer running. MFC after: 1 week	2010-06-06 16:07:40 +00:00
rrs	13d687dbf6	Hopefully this fixes a LOR by making so we only hold the iterator lock during updates to the iterators work. MFC after: 1 week	2010-06-06 02:33:46 +00:00
rrs	923bc21fb4	Bruce's fix for some return's in error legs. MFC after: 1 week	2010-06-06 02:32:20 +00:00
rrs	28122090a3	Purge out a Windows def that somehow slipped past the scrubber. MFC after: 1 Week	2010-06-05 21:39:52 +00:00
rrs	246b12c936	Spacing issues MFC after: 1 Week	2010-06-05 21:33:16 +00:00
rrs	c4f6e9b730	This change does the following: 1) Fix the alignment of a comment. 2) Fix a BUG where we were NOT paying attention to the RESEND marking on retransmitting control chunks.. and worse we were not decrementing the retran count that could cause us to loop forever. 3) Add in the validate_no_lock function on invariants so that we will really check all ways out to be sure a lock does not slip out locked. MFC after: 1 week.	2010-06-05 21:27:43 +00:00
rrs	5a1c7d3374	Use the proper increment macro when increasing the number on sent_queue_retran_cnt. MFC after: 1 week	2010-06-05 21:22:58 +00:00
rrs	24eae4311a	This does two changes: 1) Makes it so that the INVARIANT function validate nolocks is available anywhere. 2) Fixes a BUG where a close has been done on a collision socket and the cookie processing would return leaving a lock held. MFC after: 1 week	2010-06-05 21:20:28 +00:00
rrs	7d3c46ab4c	This fixes a bug in the close up of a socket that had un-accepted assoc's. Basically the assoc (and inp) would get stuck and never get cleaned up. MFC after: 1 week	2010-06-05 21:17:23 +00:00
zec	b16b48273a	Virtualize the IPv4 multicast routing code. Submitted by: iprebeg Reviewed by: bms, bz, Pavlin Radoslavov MFC after: 30 days	2010-06-02 15:44:43 +00:00
qingli	f6ab4a6810	This patch fixes the problem where proxy ARP entries cannot be added over the if_ng interface. MFC after: 3 days	2010-05-25 20:42:35 +00:00
rrs	1f1a47d59f	This adds back the Iterator to the sctp code base. We now properly have ONE thread that services all VNET's. Also we purge out the old timer based iterator code which had multiple LOR's and other issues. MFC after: 3 days	2010-05-16 17:03:56 +00:00
rrs	f5c91155a5	Fix an old long time bug in generating a fwd-tsn. This would appear when greater than the size of mbuf TSN's would need to be skipped. MFC after: 3 days	2010-05-12 18:33:25 +00:00
rrs	3c1a227e65	More PR-SCTP bugs: - Make sure that when you kick the streams you add correctly using a 16 bit unsigned. - Make sure when sending out you allow FWD-TSN to skip over and list the ACKED chunks in the stream/seq list (so the rcv will kick the stream) MFC after: 3 days	2010-05-12 18:00:15 +00:00
tuexen	594aca58ad	Get rid of unused constants. MFC after: 3 days.	2010-05-12 16:10:33 +00:00
rrs	078a73da46	This fixes PR-SCTP issues: - Slide the map at the proper place. - Mark the bits in the nr_array ONLY if there is no marking. - When generating a FWD-TSN we allow us to skip past ACKED chunks too. MFC after: 1 week	2010-05-12 13:45:46 +00:00
rrs	f0f6266342	This fixes a bug with the one-2-one model socket when a user sets up a socket to a server, sends data, and closes the socket before the server has called accept(). It used to NOT work at all. Now we add a flag to the assoc and defer assoc cleanup so that the accept will succeed.	2010-05-11 17:02:29 +00:00
bz	0a90ef1728	MFP4: @176978-176982, 176984, 176990-176994, 177441 "Whitespace" churn after the VIMAGE/VNET whirls. Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed. Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOBALS (before r185088) and try to reduce the diff to stable/7 and earlier as much as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9. This also removes some header file pollution for putatively static global variables. Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed. Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days	2010-04-29 11:52:42 +00:00
bz	c7fd54ae5a	Enhance the historic behaviour of raw sockets and jails in a way that we allow all possible jail IPs as source address rather than forcing the "primary". While IPv6 naturally has source address selection, for legacy IP we do not go through the pain in case IP_HDRINCL was not set. People should bind(2) for that. This will, for example, allow ping(|6) -S to work correctly for non-primary addresses. Reported by: (ten 211.ru) Tested by: (ten 211.ru) MFC after: 4 days	2010-04-27 15:07:08 +00:00
bms	6def960c90	Fix a regression where DVMRP diagnostic traffic, such as that used by mrinfo and mtrace, was dropped by the IGMP TTL check. IGMP control traffic must always have a TTL of 1. Submitted by: Matthew Luckie MFC after: 3 days	2010-04-27 14:14:21 +00:00
tuexen	8156e27dd7	Sending a FWDTSN chunk should not affect the retran count. MFC after: 3 days.	2010-04-25 19:00:37 +00:00
tuexen	92b6c67524	Undo my latest fix since that wasn't one at all. MFC after: 3 days.	2010-04-25 15:04:57 +00:00
tuexen	312805d71c	* Fix compilation when using SCTP_AUDITING_ENABLED. * Fix delaying of SACK by taking out old optimization code which does not optimize anymore. * Fix fast retransmission of chunks abandoned by the "number of retransmissions" policy. MFC after: 3 days.	2010-04-23 08:19:47 +00:00
bz	b883f7a391	Avoid memory access after free. Use the (shortened) copy for the ipsec mtu lookup as well. PR: kern/145736 Submitted by: Peter Molnar (peter molnar.cc) MFC after: 3 days	2010-04-21 10:21:34 +00:00
tuexen	df535bd79d	Update highest_tsn variables when sliding mapping arrays.	2010-04-20 08:51:21 +00:00
tuexen	be51b44753	Really print the nr_mapping array when it should be printed. MFC after: 3 days.	2010-04-20 08:50:19 +00:00
luigi	6758ecb23d	whitespace fixes (trailing whitespace, bad indentation after a merge, etc.)	2010-04-19 16:17:30 +00:00
ken	fc7b7bb0cb	Don't clear other flags (e.g. CSUM_TCP) when setting CSUM_TSO. This was causing TSO to break for the Xen netfront driver. Reviewed by: gibbs, rwatson MFC after: 7 days	2010-04-19 15:15:36 +00:00
tuexen	ea377e0111	Get delayed SACK working again. MFC after: 3 days.	2010-04-19 14:15:58 +00:00
tuexen	66c04c10a0	Fix a bug where SACKs are not sent when they should. Move some protection code to INVARIANTS. Cleanups. MFC after: 3 days.	2010-04-17 12:22:44 +00:00
bz	d7a91dc6bf	Plug reference leaks in the link-layer code ("new-arp") that previously prevented the link-layer entry from being freed. In both in.c and in6.c (though that code path seems to be basically dead) plug a reference leak in case of a pending callout being drained. In if_ether.c consistently add a reference before resetting the callout and in case we canceled a pending one remove the reference for that. In the final case in arptimer, before freeing the expired entry, remove the reference again and explicitly call callout_stop() to clear the active flag. In nd6.c:nd6_free() we are only ever called from the callout function and thus need to remove the reference there as well before calling into llentry_free(). In if_llatbl.c when freeing entire tables make sure that in case we cancel a pending callout to remove the reference as well. Reviewed by: qingli (earlier version) MFC after: 10 days Problem observed, patch tested by: simon on ipv6gw.f.o, Christian Kratzer (ck cksoft.de), Evgenii Davidov (dado korolev-net.ru) PR: kern/144564 Configurations still affected: with options FLOWTABLE	2010-04-11 16:04:08 +00:00
bz	1f5c413779	Try to help with a virtualized dummynet after r206428. This adds the explicit include (so far probably included through one of the few "hidden" includes in other header files) for vnet.h and adds a cast to unbreak LINT-VIMAGE.	2010-04-10 22:11:01 +00:00
rpaulo	d89608b359	Honor the CE bit even when the CWR bit is set. PR: 145600 Submitted by: Richard Scheffenegger <rs at netapp.com> MFC after: 1 week	2010-04-10 12:47:06 +00:00
bms	a59efb5aef	Fix a few issues related to the legacy 4.4 BSD multicast APIs. IPv4 addresses can and do change during normal operation. Testing by pfSense developers exposed an issue where OpenOSPFD was using the IPv4 address to leave the OSPF link-scope multicast groups on a dynamic OpenVPN tun interface, rather than using RFC 3678 with the interface index, which won't be raced when the interface's addresses change. In inp_join_group(): If we are already a member of an ASM group, and IP_ADD_MEMBERSHIP or MCAST_JOIN_GROUP ioctls are re-issued, return EADDRINUSE as per the legacy 4.4BSD multicast API. This bends RFC 3678 slightly, but does not violate POLA for apps using the old API. It also stops us falling through to kicking IGMP state transactions in what is otherwise a no-op case. [This has already been dealt with in HEAD, but make it explicit before we MFC the change to 8.] In inp_leave_group(): Fix a bogus conditional. Move the ifp null check to ioctls MCAST_LEAVE* in the switch..case where it actually belongs. If an interface was specified, by primary IPv4 address, for ioctl IP_DROP_MEMBERSHIP or MCAST_LEAVE_GROUP (an ASM full leave operation), then and only then should we look up the ifp from the IPv4 address in mreqs.imr_interface. If not, we fall through to imo_match_group() as before, but only in the IP_DROP_MEMBERSHIP case. With these changes, the legacy 4.4BSD multicast API idempotence should be mostly preserved in the SSM enabled IPv4 stack. Found by: ermal (with pfSense) MFC after: 3 days	2010-04-10 12:05:31 +00:00
luigi	ed181b3acb	This commit enables partial operation of dummynet with kernels compiled with "options VIMAGE". As it is now, there is still a single instance of the pipes, and it is only usable from vnet0 (the main instance). Trying to use a pipe from a different vimage does not crash the system as it did before, but the traffic coming out from the pipe goes to the wrong place, and i still need to figure out where. Support for per-vimage pipes is almost there (just a matter of uncommenting the VNET_* definitions for dn_cfg, plus putting into the structure the remaining static variables), however i need first to figure out how init/uninit work, and also to understand where packets are ending up on exit from a pipe. In summary: vimage support for dummynet is not complete yet, but we are getting there.	2010-04-09 18:02:19 +00:00
luigi	0881f9be0f	no need to pass an argument to dn_compat_calc_size() MFC after: 3 days	2010-04-09 16:06:53 +00:00
luigi	e00fa2c8d4	Hopefully fix the recent breakage in rule deletion. A few more tests and this will also go into -stable where the problem is more critical.	2010-04-07 08:23:58 +00:00
tuexen	be2bd893e0	Fix an off-by-one bug in zeroing out the mapping arrays. Fix sctp_print_mapping_array(). MFC after: 1 week	2010-04-06 18:57:50 +00:00
tuexen	a8e5a68f92	Use also SCTP/IPv6 checksum offloading in special cases. MFC after: 2 weeks	2010-04-03 23:51:41 +00:00
tuexen	238a37de82	* Fix some race condition in SACK/NR-SACK processing. * Fix handling of mapping arrays when draining mbufs or processing FORWARD-TSN chunks. * Cleanup code (no duplicate code anymore for SACKs and NR-SACKs). Part of this code was developed together with rrs. MFC after: 2 weeks.	2010-04-03 15:40:14 +00:00
delphij	69ea0c9b4e	Add definition of IPv6 mobility header's protocol number, as assigned by IANA and defined in RFC 3775. Obtained from: KAME	2010-03-31 23:02:25 +00:00
luigi	f0058daed2	fix bug in previous commit related to rule deletion (stable/8 just fixed moments ago)	2010-03-31 02:20:22 +00:00
luigi	8e0cabacd0	remove a leftover debugging message	2010-03-29 12:27:49 +00:00
luigi	564e0558f0	Fix handling of set manipulations. This patch has two fixes for potential kernel panics (one wrong index, one access to the wrong lock) and two fixes to wrong logic in a conditional. The potential panics are also on stable/8, so I am going to MFC the fix quickly.	2010-03-29 12:19:23 +00:00
rrs	e4906bb78b	Adds the option of keeping per-cpu statistics in SCTP. This may be useful since it gets rid of atomics but I want it to remain an option until I can do further testing on if it really speeds things up.	2010-03-24 20:02:40 +00:00
rrs	96102fe418	lagging file I forgot to commit with my nr-sack fixes... oops Reviewed by: tuexen@freebsd.org	2010-03-24 20:01:14 +00:00
rrs	4938adaeeb	Fix for NR-Sack code. The code was NOT working properly when enabled. Basically most of the operations were incorrect causing bad sacks when you enabled nr-sack. The fixes range across 4 files and unify most of the processing so that we only test nr_sack flags to decide which type of sack to generate. Optimization left for this is to combine the sack generation code and make it capable of generating either sack thus shrinking out a routine. Reviewed by: tuexen@freebsd.org	2010-03-24 19:45:36 +00:00
luigi	9cd70e5323	Honor ip.fw.one_pass when a packet comes out of a pipe without being delayed. I forgot to handle this case when i did the mtag cleanup three months ago. PR: 145004	2010-03-24 15:16:59 +00:00
rrs	a4998a854d	Fixes a bug where SACKs in the face of mapping_array expansion would break. Basically once we expanded the array we no longer had both mapping arrays in sync which the sack processing code depends on. This would mean we were randomly referring to memory that was probably not there. This mostly just gave us bad sack results going back to the peer. If INVARIANTS was on of course we would hit the panic routine in the sack_check call. We also add a print routine for the place where one would panic in invariants so one can see what the main mapping array holds. Reviewed by: tuexen@freebsd.org MFC after: 2 weeks	2010-03-23 01:36:50 +00:00
kmacy	01cb21605b	- boot-time size the ipv4 flowtable and the maximum number of flows - increase flow cleaning frequency and decrease flow caching time when near the flow limit - stop allocating new flows when within 3% of maxflows don't start allocating again until below 12.5% MFC after: 7 days	2010-03-22 23:04:12 +00:00
luigi	5bd32ef7a5	Add a priority-based packet scheduler. Sponsored by: The ONELAB2 Project Submitted by: Riccardo Panicucci	2010-03-21 16:30:32 +00:00
luigi	2122ae15e7	no need for ipfw_flush_tables(), we just need ipfw_destroy_tables()	2010-03-21 15:54:07 +00:00
luigi	8cf7b4ad59	revise documentation	2010-03-21 15:52:55 +00:00
kmacy	7ef5a84218	- spread tcp timer callout load evenly across cpus if net.inet.tcp.per_cpu_timers is set to 1 - don't default to acquiring tcbinfo lock exclusively in rexmt MFC after: 7 days	2010-03-20 19:47:30 +00:00
bz	d9875d4fd4	Add pcb reference counting to the pcblist sysctl handler functions to ensure type stability while caching the pcb pointers for the copyout. Reviewed by: rwatson MFC after: 7 days	2010-03-17 18:28:27 +00:00
luigi	3ada53d651	small fixes to estimate the buffer size when requesting all pipes/flows.	2010-03-15 18:09:21 +00:00
luigi	3c242d0b3e	+ implement (two lines) the kernel side of 'lookup dscp N' to use the dscp as a search key in table lookups; + (re)implement a sysctl variable to control the expire frequency of pipes and queues when they become empty; + add 'queue number' as optional part of the flow_id. This can be enabled with the command queue X config mask queue ... and makes it possible to support priority-based schedulers, where packets should be grouped according to the priority and not some fields in the 5-tuple. This is implemented as follows: - redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but without changing the size or shape of the structure, so there are no ABI changes. On passing, also document how other fields are used, and remove some useless assignments in ip_fw2.c - implement small changes in the userland code to set/read the field; - revise the functions in ip_dummynet.c to manipulate masks so they also handle the additional field; There are no ABI changes in this commit.	2010-03-15 17:14:27 +00:00
rwatson	1fdd3bccc0	Abstract out initialization of most aspects of struct inpcbinfo from their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy() to do this work in a central spot. As inpcbinfo becomes more complex due to ongoing work to add connection groups, this will reduce code duplication. MFC after: 1 month Reviewed by: bz Sponsored by: Juniper Networks	2010-03-14 18:59:11 +00:00
rrs	5db64758fc	The proper fix for the delayed SCTP checksum is to have the delayed function take an argument as to the offset to the SCTP header. This allows it to work for V4 and V6. This of course means changing all callers of the function to either pass the header len, if they have it, or create it (ip_hl << 2 or sizeof(ip6_hdr)). PR: 144529 MFC after: 2 weeks	2010-03-12 22:58:52 +00:00
kmacy	128542c758	- restructure flowtable to support ipv6 - add a name argument to flowtable_alloc for printing with ddb commands - extend ddb commands to print destination address or 4-tuples - don't parse ports in ulp header if FL_HASH_ALL is not passed - add kern_flowtable_insert to enable more generic use of flowtable (e.g. system calls for adding entries) - don't hash loopback addresses - cleanup whitespace - keep statistics per-cpu for per-cpu flowtables to avoid cache line contention - add sysctls to accumulate stats and report aggregate MFC after: 7 days	2010-03-12 05:03:26 +00:00
luigi	0d5da117aa	implement listing of a subset of pipes/queues/schedulers. The filtering of the output is done in the kernel instead of userland to reduce the amount of data transferred.	2010-03-11 22:42:33 +00:00
luigi	5bde959c5f	fix handling of commands issued by RELENG_7 version of /sbin/ipfw, Submitted by: Riccardo Panicucci	2010-03-10 14:21:05 +00:00
qingli	93013817b0	One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed. MFC after: 5 days	2010-03-09 01:11:45 +00:00
luigi	91eb56543a	cosmetic changes and C++ compatibility	2010-03-08 11:27:39 +00:00
luigi	4cac8d2a86	don't use C++ keywords as variable names	2010-03-08 11:27:08 +00:00
luigi	d13cb4f803	do not report an error unnecessarily	2010-03-08 11:22:47 +00:00
bz	721ece0e76	Destroy TCP UMA zones (empty or not) upon network stack teardown to not leak them, otherwise making UMA/vmstat unhappy with every stopped vnet. We will still leak pages (especially for zones marked NOFREE). Reshuffle cleanup order in tcp_destroy() to get rid of what we can easily free first. Sponsored by: ISPsystem Reviewed by: rwatson MFC after: 5 days	2010-03-07 15:58:44 +00:00
bz	07f7a52d59	Not only flush the ipfw tables when unloading ipfw or tearing down a virtual network stack, but also free the Radix Node Head. Sponsored by: ISPsystem Reviewed by: julian MFC after: 5 days	2010-03-07 15:37:58 +00:00
rwatson	7502c4d558	Locking the tcbinfo structure should not be necessary in tcp_timer_delack(), so don't. MFC after: 1 week Reviewed by: bz Sponsored by: Juniper Networks	2010-03-07 14:23:44 +00:00
rwatson	14fa088a3b	Add comment in tcp_discardcb() talking about how we don't, but should, address TCP races relating to not calling tcp_drain() on stopped callouts. Discussed with: bz	2010-03-07 14:13:59 +00:00
rwatson	480b74ed20	Make udp_set_kernel_tunneling() less forgiving when its invariants are violated: so_pcb can never be NULL for a valid UDP socket, and it is always SOCK_DGRAM. Use sotoinpcb() as the rest of the UDP code does. MFC after: 1 week Reviewed by: bz Sponsored by: Juniper Networks	2010-03-07 10:47:47 +00:00
rwatson	c25f1494fd	Remove unnecessary locking of divcbinfo lock from div_output(): this has not been required since FreeBSD 7.0 when the so_pcb pointer leading to inp was guaranteed to be stable when a valid socket reference is held (as it is in the output path). MFC after: 1 week Reviewed by: bz Sponsored by: Juniper Networks	2010-03-06 22:04:45 +00:00
rwatson	7255ccc6fe	Add a comment to tcp_usr_accept() to indicate why it is we acquire the tcbinfo lock there: r175612, which re-added it, masked a race between sonewconn(2) and accept(2) that could allow an incompletely initialized address on a newly-created socket on a listen queue to be exposed. Full details can be found in that commit message. MFC after: 1 week Sponsored by: Juniper Networks	2010-03-06 21:38:31 +00:00
bz	f82acabd2e	Destroy UDP UMA zones (empty or not) upon network stack teardown to not leak them making the VM subsystem unhappy with every stopped vnet(). We will still leak pages (especially as zones are marked NOFREE). () This will also keep vmstat -z more usable. Sponsored by: ISPsystem MFC after: 5 days	2010-03-06 21:24:32 +00:00
rwatson	72ccf68411	Wrap use of rw_try_upgrade() on pcbinfo with macro INP_INFO_TRY_UPGRADE() to match other pcbinfo locking macros. MFC after: 1 week	2010-03-06 21:24:11 +00:00
luigi	b84681e7ab	plug a memory leak on pipe's reconfiguration	2010-03-05 17:53:28 +00:00
luigi	3aef100f01	fix a memory leak when deleting RED queues	2010-03-05 12:58:19 +00:00
luigi	8399f05e14	portability fixes	2010-03-04 21:52:40 +00:00
luigi	34f9fab9a3	don't use keywords as variable names.	2010-03-04 21:01:59 +00:00
luigi	70c24f778e	use callout_drain() (outside the lock) when unloading the module. This prevents a potential deadlock. Submitted by: Francesco Magno	2010-03-04 16:53:38 +00:00
luigi	e983b27b49	improve compatibility with RELENG_7.2	2010-03-04 16:52:26 +00:00
luigi	5ceeac4aa8	Bring in the most recent version of ipfw and dummynet, developed and tested over the past two months in the ipfw3-head branch. This also happens to be the same code available in the Linux and Windows ports of ipfw and dummynet. The major enhancement is a completely restructured version of dummynet, with support for different packet scheduling algorithms (loadable at runtime), faster queue/pipe lookup, and a much cleaner internal architecture and kernel/userland ABI which simplifies future extensions. In addition to the existing schedulers (FIFO and WF2Q+), we include a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new, very fast version of WF2Q+ called QFQ. Some test code is also present (in sys/netinet/ipfw/test) that lets you build and test schedulers in userland. Also, we have added a compatibility layer that understands requests from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries, and replies correctly (at least, it does its best; sometimes you just cannot tell who sent the request and how to answer). The compatibility layer should make it possible to MFC this code in a relatively short time. Some minor glitches (e.g. handling of ipfw set enable/disable, and a workaround for a bug in RELENG_7's /sbin/ipfw) will be fixed with separate commits. CREDITS: This work has been partly supported by the ONELAB2 project, and mostly developed by Riccardo Panicucci and myself. The code for the qfq scheduler is mostly from Fabio Checconi, and Marta Carbone and Francesco Magno have helped with testing, debugging and some bug fixes.	2010-03-02 17:40:48 +00:00
joel	bb682915c9	The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software. Obtained from: NetBSD	2010-03-01 17:05:46 +00:00
bz	b8a1e8dec8	Upon virtual network stack teardown properly release the TCP syncache resources. Sponsored by: ISPsystem Reviewed by: rwatson MFC After: 5 days	2010-02-20 21:45:04 +00:00
tuexen	f9cc41e4ee	Fix handling of SHUTDOWN-ACK chunk in COOKIE_WAIT and COOKIE_ECHOED. MFC after: 1 week	2010-02-20 20:30:40 +00:00
bz	29381991cf	Split up ip_drain() into an outer lock and iterator part and a "locked" version that will only handle a single network stack instance. The latter is called directly from ip_destroy(). Hook up an ip_destroy() function to release resources from the legacy IP network layer upon virtual network stack teardown. Sponsored by: ISPsystem Reviewed by: rwatson MFC After: 5 days	2010-02-20 19:59:52 +00:00
tuexen	02181ec064	* Fix another u_long -> uint32_t issue. * Remove an unused global variable. * Fix an issue reported by Bruce Cran related to reusing SCTP socket which were connected. MFC after: 1 week	2010-02-19 18:00:38 +00:00
pjd	c527452336	No need to include security/mac/mac_framework.h here.	2010-02-18 22:26:01 +00:00
tuexen	93bada478f	Use uint32_t instead of u_long. MFC after: 1 week	2010-02-18 13:46:54 +00:00
luigi	c2328f70d5	remove recursive lock/unlock calls, we do them already before entering the switch. Reported by: Marta Carbone	2010-02-17 13:06:06 +00:00
tuexen	06fc12b77a	Add missing SCTP_PACKED. Spotted by Irene Ruengeler. MFC after: 1 week	2010-02-13 21:38:15 +00:00
bz	0cce20af31	Properly free resources when destroying the TCP hostcache while tearing down a network stack (in the VIMAGE jail+vnet case). For that break out the logic from tcp_hc_purge() into an internal function we can call from both, the sysctl handler and the tcp_hc_destroy(). Sponsored by: ISPsystem Reviewed by: silby, lstewart MFC After: 8 days	2010-02-09 21:31:53 +00:00
tuexen	78aa3f59ba	Restore the checksum received before processing the packet. MFC after: 1 week	2010-02-04 21:02:29 +00:00
qingli	4d8ba24be3	Some of the existing ppp and vpn related scripts create and set the IP addresses of the tunnel end points to the same value. In these cases the loopback route is not installed for the local end. Verified by: avg MFC after: 5 days	2010-02-02 20:38:30 +00:00
luigi	d774a108f2	use u_char instead of u_int for short bitfields. For our compiler the two constructs are completely equivalent, but some compilers (including MSC and tcc) use the base type for alignment, which in the cases touched here result in aligning the bitfields to 32 bit instead of the 8 bit that is meant here. Note that almost all other headers where small bitfields are used have u_int8_t instead of u_int. MFC after: 3 days	2010-02-01 14:13:44 +00:00
tuexen	01ee00225c	Use [] instead of [0] for flexible arrays. Obtained from: Bruce Cran MFC after: 1 week	2010-01-22 07:53:41 +00:00
tuexen	5aaf03563a	Get rid of a lot of duplicated code for NR-SACK handling. Generalize the SACK code to also handle NR-SACKs.	2010-01-17 21:00:28 +00:00
rrs	e0b03cdcce	Bug fix: If the allocation of a socket failed and we freed the inpcb, it was possible to not set the proper flags on the pcb (i.e. the socket is not there). This is HIGHLY unlikely since no one else should be able to find the socket.. but for consistency we do the proper loop thing to make sure that we mark the socket as gone on the PCB.	2010-01-17 19:47:59 +00:00
rrs	735b231916	Pulls out another leaked windows ifdef that somehow made its way through the scrubber.	2010-01-17 19:40:21 +00:00
rrs	c85a2af4da	This change syncs up the socketAPI stream-reset values to match those in linux and the I-D just released to the IETF.	2010-01-17 19:35:38 +00:00
rrs	09211b9ce2	More leaked ifdefs for APPLE and its mobility stuff.	2010-01-17 19:24:30 +00:00
rrs	3a0bea0af0	Remove another set of "leaked" ifdefs that somehow found their way into FreeBSD.	2010-01-17 19:21:50 +00:00
rrs	317a5adf4b	Remove strange APPLE define that leaked through the scrubber scripts. Scripts are now fixed so this won't happen again.	2010-01-17 19:17:16 +00:00
bz	5d1c4cb181	Garbage collect references to the no longer implemented tcp_fasttimo(). Discussed with: rwatson MFC after: 5 days	2010-01-17 13:07:52 +00:00
bz	d80ba03e3c	Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control whether to use source address selection (default) or the primary jail address for unbound outgoing connections. This is intended to be used by people upgrading from single-IP jails to multi-IP jails but not having to change firewall rules, application ACLs, ... but to force their connections (unless otherwise changed) to the primary jail IP they had been using for years, as well as for people preferring to implement similar policies. Note that for IPv6, if configured incorrectly, this might lead to scope violations, which single-IPv6 jails could as well, as by the design of jails. [1] Reviewed by: jamie, hrs (ipv6 part) Pointed out by: hrs [1] MFC After: 2 weeks Asked for by: Jase Thew (bazerka beardz.net)	2010-01-17 12:57:11 +00:00
ume	185bf1f1d5	Change 'me' to match any IPv6 address configured on an interface in the system as well as any IPv4 address. Reviewed by: David Horn <dhorn2000__at__gmail.com>, luigi, qingli MFC after: 2 weeks	2010-01-17 08:39:48 +00:00
tuexen	c0a018dc4a	Get rid of support of an old version of the SCTP-AUTH draft. Get rid of unused MD5 code. MFC after: 1 week	2010-01-16 20:04:17 +00:00
qingli	316634c7ad	Ensure an address is removed from the interface address list when the installation of that address fails. PR: 139559	2010-01-08 17:49:24 +00:00
ru	ce510bcb3f	Complete the swap of carp(4) log levels and document the change. MFC after: 3 days	2010-01-08 16:14:41 +00:00
mbr	7450f52a57	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00
luigi	51e5ccee24	we don't use dummynet_drain!	2010-01-07 13:53:47 +00:00
luigi	057d16827d	check that we have an ipv4 packet before swapping ip_len and ip_off. This should fix the handling of ipv6 packets which i broke when i made ipfw operate on packets in network format. Reported by: Hajimu UMEMOTO	2010-01-07 12:00:54 +00:00
luigi	db333db4e6	Following up on a request from Ermal Luci to make ip_divert work as a client of pf(4), make ip_divert not depend on ipfw. This is achieved by moving to ip_var.h the struct ipfw_rule_ref (which is part of the mtag for all reinjected packets) and other declarations of global variables, and moving to raw_ip.c global variables for filter and divert hooks. Note that names and locations could be made more generic (ipfw_rule_ref is really a generic reference robust to reconfigurations; the packet filter is not necessarily ipfw; filters and their clients are not necessarily limited to ipv4), but _right now_ most of this stuff works on ipfw and ipv4, so i don't feel like doing a gratuitous renaming, at least for the time being.	2010-01-07 10:39:15 +00:00
luigi	6ea737556e	some header shuffling to help decoupling ip_divert from ipfw	2010-01-07 10:08:05 +00:00
luigi	6a3745e3ec	put ip_len in correct order for ip_output(). This prevents a panic when ipfw generates packets on its own (such as reject or keepalives for dynamic rules). Reported by: Chagin Dmitry	2010-01-07 09:28:17 +00:00
luigi	543315e6a4	this file does not require ip_dummynet.h	2010-01-05 11:00:31 +00:00
qingli	281d5caa0e	An existing incomplete ARP entry would expire a subsequent statically configured entry of the same host. This bug occurred because the expiration timer was not cancelled when installing the static entry. Since there exists a potential race condition with respect to timer cancellation, simply check for the LLE_STATIC bit inside the expiration function instead of cancelling the active timer. MFC after: 5 days	2010-01-05 00:35:46 +00:00
luigi	40024ff7c3	Various cleanup done in ipfw3-head branch including: - use a uniform mtag format for all packets that exit and re-enter the firewall in the middle of a rulechain. On reentry, all tags containing reinject info are renamed to MTAG_IPFW_RULE so the processing is simpler. - make ipfw and dummynet use ip_len and ip_off in network format everywhere. Conversion is done only once instead of tracking the format in every place. - use a macro FREE_PKT to dispose of mbufs. This eases portability. On passing i also removed a few typos, staticise or localise variables, remove useless declarations and other minor things. Overall the code shrinks a bit and is hopefully more readable. I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr. For ng_ipfw i am actually waiting for feedback from glebius@ because we might have some small changes to make. For if_bridge and if_ethersubr feedback would be welcome (there are still some redundant parts in these two modules that I would like to remove, but first i need to check functionality).	2010-01-04 19:01:22 +00:00
tuexen	67e62f9811	Correct usage of parenthesis. PR: kern/142066 Approved by: rrs (mentor) Obtained from: Henning Petersen, Bruce Cran. MFC after: 2 weeks	2010-01-04 18:25:38 +00:00
np	10cde58f33	Avoid NULL dereference in arpresolve.	2010-01-03 06:43:13 +00:00
qingli	0897bcc8ad	Consolidate the route message generation code for when address aliases were added or deleted. The announced route entry for an address alias is no longer empty because this empty route entry was causing some route daemon to fail and exit abnormally. MFC after: 5 days	2009-12-30 22:13:01 +00:00
qingli	ed965a92bc	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was due to the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations due to multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days	2009-12-30 21:35:34 +00:00
syrinx	3c572e438b	Make sure the multicast forwarding cache entry's stall queue is properly initialized before trying to insert an entry into it. PR: kern/142052 Reviewed by: bms MFC after: now	2009-12-30 08:52:13 +00:00
luigi	7236f425fc	we really need htonl() here, see the comment a few lines above in the code.	2009-12-29 00:02:57 +00:00
antoine	bfd388c026	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month	2009-12-28 22:56:30 +00:00
bz	7eddc3a63a	Make the compiler happy after r201125: - + remove two unnecessary initializations in ip_output; + + remove one unnecessary initializations in ip_output;	2009-12-28 21:14:18 +00:00
luigi	1a1b4d40fb	introduce a local variable rte acting as a cache of ro->ro_rt within ip_output, achieving (in random order of importance): - a reduction of the number of 'r's in the source code; - improved legibility; - a reduction of 64 bytes in the .text	2009-12-28 14:48:32 +00:00
luigi	9c18067568	+ remove an unused #define print_ip; + remove two unnecessary initializations in ip_output; + localize 'len'; + introduce a temporary variable n to count the number of fragments, the compiler seems unable to identify a common subexpression (written 3 times, used twice); + document some assumptions on ip_len and ip_hl	2009-12-28 14:09:46 +00:00
luigi	b41c473d90	bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects to find it there. Unfortunately this reintroduces the dependency on ip_fw_pfil.c	2009-12-28 12:29:13 +00:00
luigi	483862a5a2	bring in several cleanups tested in ipfw3-head branch, namely: r201011 - move most of ng_ipfw.h into ip_fw_private.h, as this code is ipfw-specific. This removes a dependency on ng_ipfw.h from some files. - move many equivalent definitions of direction (IN, OUT) for reinjected packets into ip_fw_private.h - document the structure of the packet tags used for dummynet and netgraph; r201049 - merge some common code to attach/detach hooks into a single function. r201055 - remove some duplicated code in ip_fw_pfil. The input and output processing uses almost exactly the same code so there is no need to use two separate hooks. ip_fw_pfil.o goes from 2096 to 1382 bytes of .text r201057 (see the svn log for full details) - macros to make the conversion of ip_len and ip_off between host and network format more explicit r201113 (the remaining parts) - readability fixes -- put braces around some large for() blocks, localize variables so the compiler does not think they are uninitialized, do not insist on precise allocation size if we have more than we need. r201119 - when doing a lookup, keys must be in big endian format because this is what the radix code expects (this fixes a bug in the recently-introduced 'lookup' option) No ABI changes in this commit. MFC after: 1 week	2009-12-28 10:47:04 +00:00
luigi	ffe8fa8dad	readability fixes -- add braces on large blocks, remove unnecessary initializations	2009-12-28 10:19:53 +00:00
luigi	5596409e34	explain details of operation of table lookups, and improve portability	2009-12-28 10:12:35 +00:00
luigi	19c9e43f09	diverted packet must re-enter _after_ the matching rule, or we create loops. The divert cookie (that can be set from userland too) contains the matching rule nr, so we must start from nr+1. Reported by: Joe Marcus Clarke	2009-12-27 10:19:10 +00:00
luigi	62c83b51a2	fix poor indentation resulting from a merge	2009-12-24 17:35:28 +00:00
luigi	4c57fc7f52	mostly style changes, such as removal of trailing whitespace, reformatting to avoid unnecessary line breaks, small block restructuring to avoid unnecessary nesting, replace macros with function calls, etc. As a side effect of code restructuring, this commit fixes one bug: previously, if a realloc() failed, memory was leaked. Now, the realloc is not there anymore, as we first count how much memory we need and then do a single malloc.	2009-12-23 18:53:11 +00:00
luigi	d90c98559e	fix build with the new fast lookup structure. Also remove some unnecessary headers	2009-12-23 12:15:21 +00:00
luigi	be2e837cde	fix build on 64-bit architectures. Also fix the indentation on a few lines.	2009-12-23 12:00:50 +00:00
luigi	2043aec456	merge code from ipfw3-head to reduce contention on the ipfw lock and remove all O(N) sequences from kernel critical sections in ipfw. In detail: 1. introduce a IPFW_UH_LOCK to arbitrate requests from the upper half of the kernel. Some things, such as 'ipfw show', can be done holding this lock in read mode, whereas insert and delete require IPFW_UH_WLOCK. 2. introduce a mapping structure to keep rules together. This replaces the 'next' chain currently used in ipfw rules. At the moment the map is a simple array (sorted by rule number and then rule_id), so we can find a rule quickly instead of having to scan the list. This reduces many expensive lookups from O(N) to O(log N). 3. when an expensive operation (such as insert or delete) is done by userland, we grab IPFW_UH_WLOCK, create a new copy of the map without blocking the bottom half of the kernel, then acquire IPFW_WLOCK and quickly update pointers to the map and related info. After dropping IPFW_LOCK we can then continue the cleanup protected by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side is only blocked for O(1). 4. do not pass pointers to rules through dummynet, netgraph, divert etc, but rather pass a <slot, chain_id, rulenum, rule_id> tuple. 
We validate the slot index (in the array of #2) with chain_id, and if successful do a O(1) dereference; otherwise, we can find the rule in O(log N) through <rulenum, rule_id>. All the above does not change the userland/kernel ABI, though there are some disgusting casts between pointers and uint32_t. Operation costs now are as follows:

Function                                Old     Now        Planned
-------------------------------------------------------------------
+ skipto X, non cached                  O(N)    O(log N)
+ skipto X, cached                      O(1)    O(1)
XXX dynamic rule lookup                 O(1)    O(log N)   O(1)
+ skipto tablearg                       O(N)    O(1)
+ reinject, non cached                  O(N)    O(log N)
+ reinject, cached                      O(1)    O(1)
+ kernel blocked during setsockopt()    O(N)    O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will be fixed in a day or two, without changing the userland/kernel ABI. Supported by: Valeria Paoli MFC after: 1 month	2009-12-22 19:01:47 +00:00
jhb	beb0e14aae	- Rename the __tcpi_(snd\|rcv)_mss fields of the tcp_info structure to remove the leading underscores since they are now implemented. - Implement the tcpi_rto and tcpi_last_data_recv fields in the tcp_info structure. Reviewed by: rwatson MFC after: 2 weeks	2009-12-22 15:47:40 +00:00
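
Commit 2043aec456 above replaces the linked rule chain with an array sorted by rule number and then rule_id, and passes a <slot, chain_id, rulenum, rule_id> tuple instead of a rule pointer. The following is a minimal sketch of that lookup scheme, not the actual ipfw code: the struct and function names (`rule`, `chain`, `rule_ref`, `find_slot`, `resolve_ref`) are hypothetical, and the idea that `chain_id` changes whenever the map is replaced is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real ipfw structures. */
struct rule {
	uint32_t rulenum;
	uint32_t rule_id;
};

struct chain {
	struct rule **map;	/* sorted by (rulenum, rule_id) */
	int n_rules;
	uint32_t id;		/* assumed to change when the map is swapped */
};

struct rule_ref {
	uint32_t slot;
	uint32_t chain_id;
	uint32_t rulenum;
	uint32_t rule_id;
};

/*
 * Binary search: index of the first rule whose key is
 * >= (rulenum, rule_id), or n_rules if there is none. O(log N).
 */
static int
find_slot(const struct chain *ch, uint32_t rulenum, uint32_t rule_id)
{
	int lo = 0, hi = ch->n_rules;

	assert(ch->map != NULL || ch->n_rules == 0);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		const struct rule *r = ch->map[mid];

		if (r->rulenum < rulenum ||
		    (r->rulenum == rulenum && r->rule_id < rule_id))
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo);
}

/*
 * Resolve a reference carried by a reinjected packet: trust the cached
 * slot only if the map has not been replaced since; otherwise search.
 */
static int
resolve_ref(const struct chain *ch, const struct rule_ref *ref)
{
	if (ref->chain_id == ch->id && ref->slot < (uint32_t)ch->n_rules)
		return ((int)ref->slot);			/* O(1) */
	return (find_slot(ch, ref->rulenum, ref->rule_id));	/* O(log N) */
}
```

A reference whose `chain_id` still matches resolves in O(1); a stale one falls back to the O(log N) search, matching the "reinject, cached" and "reinject, non cached" rows of the cost table in that commit message.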

3905 Commits