freebsd-skq

Author	SHA1	Message	Date
melifaro	235fbf1304	Fix panic when handling non-inet arp message introduced in r286825. Submitted by: delphij	2015-08-18 06:16:19 +00:00
melifaro	bc522110e3	Split arpresolve() into fast/slow path. This change isolates the most common case (e.g. successful lookup) from more complicates scenarios. It also (tries to) make code more simple by avoiding retry: cycle. The actual goal is to prepare code to the upcoming change that will allow LL address retrieval without acquiring LLE lock at all. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D3383	2015-08-16 12:23:58 +00:00
tuexen	41e76c86c5	Allow the path MTU to grow up to the outgoing interface MTU. MFC after: 3 days	2015-08-14 14:26:13 +00:00
melifaro	0ddc1c9d3a	Move lle update code from from gigantic ip_arpinput() to separate bunch of functions. The goal is to isolate actual lle updates to permit more fine-grained locking. Do all lle link-level update under AFDATA wlock. Sponsored by: Yandex LLC	2015-08-13 13:38:09 +00:00
hiren	3d944dc04e	Remove unused TCPTV_SRTTDFLT. We initialize srtt with TCPTV_SRTTBASE when we don't have any rtt estimate. Differential Revision: D3334 Sponsored by: Limelight Networks	2015-08-12 16:08:37 +00:00
melifaro	0c24547a66	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
melifaro	d8f92ce2cf	Store addresses instead of sockaddrs inside llentry. This permits us having all (not fully true yet) all the info needed in lookup process in first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not restrict any custom lltable consumers with long keys use the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
melifaro	8e6b3a8d59	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
melifaro	ba06112c24	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
melifaro	4f240a9c31	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp\|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt agrument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update description from these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
kp	120a48fab6	tcp_reass_zone is not a VNET variable. This fixes a panic during 'sysctl -a' on VIMAGE kernels. The tcp_reass_zone variable is not VNET_DEFINE() so we can not mark it as a VNET variable (with CTLFLAG_VNET).	2015-08-09 19:07:24 +00:00
marius	89fb1d12ff	Fix compilation after r286458.	2015-08-08 21:42:15 +00:00
marius	fbf037b786	Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT.	2015-08-08 21:41:59 +00:00
melifaro	33c52eed18	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2015-08-08 18:14:59 +00:00
melifaro	20bb5966e2	MFP r274553: * Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete lla_create now returns with LLE_EXCLUSIVE lock for lle. * Provide typedefs for new/existing lltable callbacks. Reviewed by: ae	2015-08-08 17:48:54 +00:00
melifaro	a915efe931	Simplify ip[6] simploop: Do not pass 'dst' sockaddr to ip[6]_mloopback: - We have explicit check for AF_INET in ip_output() - We assume ip header inside passed mbuf in ip_mloopback - We assume ip6 header inside passed mbuf in ip6_mloopback	2015-08-08 15:58:35 +00:00
jch	349429fe82	Fix a kernel assertion issue introduced with r286227: Avoid too strict INP_INFO_RLOCK_ASSERT checks due to tcp_notify() being called from in6_pcbnotify(). Reported by: Larry Rosenman <ler@lerctr.org> Submitted by: markj, jch	2015-08-08 08:40:36 +00:00
markj	b56b272f75	The mbuf parameter to ip_output_pfil() must be an output parameter since pfil(9) hooks may modify the chain. X-MFC-With: r286028	2015-08-03 17:47:02 +00:00
jch	67927a7a7c	Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc.	2015-08-03 12:13:54 +00:00
tuexen	7f035ef5b6	Don't take the port numbers for packets containing ABORT chunks from a freed mbuf. Just use them from the stcb. MFC after: 3 days	2015-08-02 16:07:30 +00:00
ae	5db04a4584	Remove unneded #include "opt_inet.h".	2015-07-31 09:02:28 +00:00
hiren	6aa0502e8d	Update snd_una description to make it more readable. Differential Revision: https://reviews.freebsd.org/D3179 Reviewed by: gnn Sponsored by: Limelight Networks	2015-07-30 19:24:49 +00:00
eri	1434c6f800	Avoid double reference decrement when firewalls force relooping of packets When firewalls force a reloop of packets and the caller supplied a route the reference to the route might be reduced twice creating issues. This is especially the scenario when a packet is looped because of operation in the firewall but the new route lookup gives a down route. Differential Revision: https://reviews.freebsd.org/D3037 Reviewed by: gnn Approved by: gnn(mentor)	2015-07-29 20:10:36 +00:00
eri	d4d2ec9641	ip_output normalization and fixes ip_output has a big chunk of code used to handle special cases with pfil consumers which also forces a reloop on it. Gather all this code together to make it readable and properly handle the reloop cases. Some of the issues identified: M_IP_NEXTHOP is not handled properly in existing code. route reference leaking is possible with in FIB number change route flags checking is not consistent in the function Differential Revision: https://reviews.freebsd.org/D3022 Reviewed by: gnn Approved by: gnn(mentor) MFC after: 4 weeks	2015-07-29 18:04:01 +00:00
pkelsey	c409257912	Revert r265338, r271089 and r271123 as those changes do not handle non-inline urgent data and introduce an mbuf exhaustion attack vector similar to FreeBSD-SA-15:15.tcp, but not requiring VNETs. Address the issue described in FreeBSD-SA-15:15.tcp. Reviewed by: glebius Approved by: so Approved by: jmallett (mentor) Security: FreeBSD-SA-15:15.tcp Sponsored by: Norse Corp, Inc.	2015-07-29 17:59:13 +00:00
ae	271b2043d8	Eliminate the use of m_copydata() in gif_encapcheck(). ip_encap already has inspected mbuf's data, at least an IP header. And it is safe to use mtod() and do direct access to needed fields. Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that mbuf has a packet header. Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also remove "martian filters" checks. According to RFC 4213 it is enough to verify that the source address is the address of the encapsulator, as configured on the decapsulator. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-07-29 14:07:43 +00:00
ae	75425458ac	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149	2015-07-29 08:12:05 +00:00
tuexen	a7c1a8d16f	Fix a typo reported by Erik Cederstrand. MFC after: 1 week	2015-07-28 08:50:13 +00:00
tuexen	63528763eb	Provide consistent error causes whenever an ABORT chunk is sent. MFC after: 1 week	2015-07-27 22:35:54 +00:00
tuexen	f6d397ca56	Improve locking on Mac OS X. This does not change the functionality on FreeBSD. Reviewed by: rrs MFC after: 1 week	2015-07-26 10:37:40 +00:00
tuexen	a486c29457	Fix and improve a debug message. The SID was reported as an SSN. MFC after: 1 week	2015-07-26 10:17:17 +00:00
tuexen	c0e1a0d3a9	Move including netinet/icmp6.h around to avoid a problem when including netinet/icmp6.h and net/netmap.h. Both use ni_flags... This allows to build multistack with SCTP support. MFC after: 1 week	2015-07-25 18:26:09 +00:00
kp	5bc481bce4	Remove stale comment. The IPv6 pseudo header checksum was added by bz in r235961. Sponsored by: Essen FreeBSD Hackathon	2015-07-25 16:14:55 +00:00
rrs	f1b6d4d83c	Fix silly syntax error emacs chugged in for me.. gesh. MFC after: 3 weeks	2015-07-24 14:13:43 +00:00
rrs	69d858f584	Fix an issue with MAC OS locking and also optimize the case where we are sending back a stream-reset and a sack timer is running, in that case we should just send the SACK. MFC after: 3 weeks	2015-07-24 14:09:03 +00:00
rrs	606fc6cd55	Fix several problems with Stream Reset. 1) We were not handling (or sending) the IN_PROGRESS case if the other side (or our side) was not able to reset (awaiting more data). 2) We would improperly send a stream-reset when we should not. Not waiting until the TSN had been assigned when data was inqueue. Reviewed by: tuexen	2015-07-22 11:30:37 +00:00
delphij	722d66a533	Fix resource exhaustion due to sessions stuck in LAST_ACK state. Submitted by: Jonathan Looney (Juniper SIRT) Reviewed by: lstewart Security: CVE-2015-5358 Security: SA-15:13.tcp	2015-07-21 23:42:15 +00:00
eri	98d59caff0	IPSEC, remove variable argument function its already due. Differential Revision: https://reviews.freebsd.org/D3080 Reviewed by: gnn, ae Approved by: gnn(mentor)	2015-07-21 21:46:24 +00:00
rrs	e4870a15db	When a tunneling protocol is being used with UDP we must release the lock on the INP before calling the tunnel protocol, else a LOR may occur (it does with SCTP for sure). Instead we must acquire a ref count and release the lock, taking care to allow for the case where the UDP socket has gone away and not unlocking since the refcnt decrement on the inp will do the unlock in that case. Reviewed by: tuexen MFC after: 3 weeks	2015-07-21 09:54:31 +00:00
luigi	b13eda2674	fix a typo in a comment	2015-07-18 15:28:32 +00:00
kevlo	cad9ad69e5	Since the IETF has redefined the meaning of the tos field to accommodate a set of differentiated services, set IPTOS_PREC_* macros using IPTOS_DSCP_* macro definitions. While here, add IPTOS_DSCP_VA macro according to RFC 5865. Differential Revision: https://reviews.freebsd.org/D3119 Reviewed by: gnn	2015-07-18 06:48:30 +00:00
pkelsey	b4b4952931	Check TCP timestamp option flag so that the automatic receive buffer scaling code does not use an uninitialized timestamp echo reply value from the stack when timestamps are not enabled. Differential Revision: https://reviews.freebsd.org/D3060 Reviewed by: hiren Approved by: jmallett (mentor) MFC after: 3 days Sponsored by: Norse Corp, Inc.	2015-07-17 17:36:33 +00:00
eri	faf2b7a96c	Correct issue presented in r285051, apparently neither clang nor gcc complain about this. But clang intis the var to NULL correctly while gcc on at least mips does not. Correct the undefined behavior by initializing the variable properly. PR: 201371 Differential Revision: https://reviews.freebsd.org/D3036 Reviewed by: gnn Approved by: gnn(mentor)	2015-07-09 16:28:36 +00:00
tuexen	be47d64dfd	Export the ssthresh value per SCTP path via the sysctl interface. MFC after: 1 month	2015-07-07 06:34:28 +00:00
eri	c02a77e9f8	Avoid doing multiple route lookups for the same destination IP during forwarding ip_forward() does a route lookup for testing this packet can be sent to a known destination, it also can do another route lookup if it detects that an ICMP redirect is needed, it forgets all of this and handovers to ip_output() to do the same lookup yet again. This optimisation just does one route lookup during the forwarding path and handovers that to be considered by ip_output(). Differential Revision: https://reviews.freebsd.org/D2964 Approved by: ae, gnn(mentor) MFC after: 1 week	2015-07-02 18:10:41 +00:00
np	a503d4a154	Fix leak in tcp_lro_rx. Simply clearing M_PKTHDR isn't enough, any tags hanging off the header need to be freed too. Differential Revision: https://reviews.freebsd.org/D2708 Reviewed by: ae@, hiren@	2015-06-30 17:19:58 +00:00
hiren	160de052b2	Avoid a situation where we do not set persist timer after a zero window condition. If you send a 0-length packet, but there is data is the socket buffer, and neither the rexmt or persist timer is already set, then activate the persist timer. PR: 192599 Differential Revision: D2946 Submitted by: jlott at averesystems dot com Reviewed by: jhb, jch, gnn, hiren Tested by: jlott at averesystems dot com, jch MFC after: 2 weeks	2015-06-29 21:23:54 +00:00
hiren	0b2e37f199	Reverting r284710. Today I learned: iff == if and only if. Suggested by: many	2015-06-22 22:16:06 +00:00
hiren	36e0fd586d	Fix a typo: s/iff/if/ Sponsored by: Limelight Networks	2015-06-22 21:53:55 +00:00
tuexen	84a04be05a	Fix two KTRACE related bugs. Reported by: Coverity CID: 1018058, 1018060 MFC after: 3 days	2015-06-19 21:55:12 +00:00
tuexen	d78e6c8ac7	When setting the primary address, return an error whenever it fails. MFC after: 3 days	2015-06-19 12:48:22 +00:00
ae	38dbbbe90e	Fix possible use after free in encap[46]_input(). There is small window, when encap_detach() can free matched entry directly after we release encapmtx. Instead of use pointer to the matched entry, save pointers to needed variables from this entry and use them after release mutex. Pass argument stored in the encaptab entry to encap_fillarg(), instead of pointer to matched entry. Also do not allocate new mbuf tag, when argument that we plan to save in this tag is NULL. Also make encaptab variable static. Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-06-18 18:28:38 +00:00
tuexen	25a52b7a51	Fix a bug related to flow assignment I introduce in https://svnweb.freebsd.org/base?view=revision&revision=275483 MFC after: 3 days	2015-06-17 19:26:23 +00:00
tuexen	2af840e2ac	Add FIB support for SCTP. This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379 MFC after: 3 days	2015-06-17 15:20:14 +00:00
eri	00bae21874	If there is a system with a bpf consumer running and a packet is wanted to be transmitted but the arp cache entry expired, which triggers an arp request to be sent, the bpf code might want to sleep but crash the system due to a non sleep lock held from the arp entry not released properly. Release the lock before calling the arp request code to solve the issue as is done on all the other code paths. PR: 200323 Approved by: ae, gnn(mentor) MFC after: 1 week Sponsored by: Netgate Differential Revision: https://reviews.freebsd.org/D2828	2015-06-17 12:23:04 +00:00
tuexen	70a364cfb0	Correctly detect the case where the last address is removed. MFC after: 3 days	2015-06-14 22:14:00 +00:00
tuexen	c96e74f040	Stop the heartbeat timer when removing a net. Thanks to the reporter of https://code.google.com/p/sctp-refimpl/issues/detail?id=14 for reporting the issue. MFC after: 3 days	2015-06-14 17:48:44 +00:00
tuexen	dd78cae9d3	Fix the reporting of the PMTUD state for specific paths. MFC after: 3 days	2015-06-12 18:59:29 +00:00
tuexen	146839518d	Code cleanup. MFC after: 3 days	2015-06-12 17:20:09 +00:00
tuexen	9532b4863e	In case of an output error, continue with the next net, don't try to continue sending on the same net. This fixes a bug where an invalid mbuf chain was constructed, if a full size frame of control chunks should be sent and there is a output error. Based on a discussion with rrs@, change move to the next net. This fixes the bug and improves the behaviour. Thanks to Irene Ruengeler for spending a lot of time in narrowing this problem down. MFC after: 3 days	2015-06-12 16:01:41 +00:00
jch	582bfe6ed0	Fix a callout race condition introduced in TCP timers callouts with r281599. In TCP timer context, it is not enough to check callout_stop() return value to decide if a callout is still running or not, previous callout_reset() return values have also to be checked. Differential Revision: https://reviews.freebsd.org/D2763 Reviewed by: hiren Approved by: hiren MFC after: 1 day Sponsored by: Verisign, Inc.	2015-06-10 20:43:07 +00:00
tuexen	1aac6c70c3	Export a pointer to the SCTP socket. This is needed to add SCTP support to sockstat. MFC after: 3 days	2015-06-04 12:46:56 +00:00
tuexen	e78026ed4c	Remove printf() noise... MFC after: 3 days	2015-05-29 08:31:15 +00:00
tuexen	0de1f8c7b5	Report the MTU consistently as specified in https://tools.ietf.org/html/rfc6458 Thanks to Irene Ruengeler for helping me to fix this bug. MFC after: 3 days	2015-05-28 20:33:28 +00:00
tuexen	a864af1358	Take source and destination address into account when determining the scope. This fixes a problem when a client with a global address connects to a server with a private address. Thanks to Irene Ruengeler in helping me to find the issue. MFC after: 3 days	2015-05-28 19:28:08 +00:00
tuexen	2ff25b445e	Retire SCTP_DONT_DO_PRIVADDR_SCOPE which was never defined. MFC after: 3 days	2015-05-28 18:52:32 +00:00
tuexen	9794079730	Fix a bug where messages would not be sent in SHUTDOWN_RECEIVED state. This problem was reported by Mark Bonnekessel and Markus Boese. Thanks to Irene Ruengeler for helping me to fix the cause of the problem. It can be tested with the following packetdrill script: +0.0 socket(..., SOCK_STREAM, IPPROTO_SCTP) = 3 +0.0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR) +0.0 fcntl(3, F_SETFL, O_RDWR\|O_NONBLOCK) = 0 // Check the handshake with an empty(!) cookie +0.1 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0.0 > sctp: INIT[flgs=0, tag=1, a_rwnd=..., os=..., is=..., tsn=0, ...] +0.1 < sctp: INIT_ACK[flgs=0, tag=2, a_rwnd=10000, os=1, is=1, tsn=0, STATE_COOKIE[len=4, val=...]] +0.0 > sctp: COOKIE_ECHO[flgs=0, len=4, val=...] +0.1 < sctp: COOKIE_ACK[flgs=0] +0.0 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 +0.0 write(3, ..., 1024) = 1024 +0.0 > sctp: DATA[flgs=BE, len=1040, tsn=0, sid=0, ssn=0, ppid=0] +0.0 write(3, ..., 1024) = 1024 // Pending due to Nagle +0.0 < sctp: SHUTDOWN[flgs=0, cum_tsn=0] +0.0 > sctp: DATA[flgs=BE, len=1040, tsn=1, sid=0, ssn=1, ppid=0] +0.0 < sctp: SACK[flgs=0, cum_tsn=1, a_rwnd=10000, gaps=[], dups=[]] // Do we need another SHUTDOWN here? +0.0 > sctp: SHUTDOWN_ACK[flgs=0] +0.0 < sctp: SHUTDOWN_COMPLETE[flgs=0] +0.0 close(3) = 0 MFC after: 3 days	2015-05-28 18:34:02 +00:00
tuexen	ec768a1460	Use macros for overhead in a consistent way. No functional change. Thanks to Irene Ruengeler for suggesting the change. MFC after: 3 days	2015-05-28 17:57:56 +00:00
tuexen	67b3bbe09c	Some more debug info cleanup. MFC after: 3 days	2015-05-28 16:39:22 +00:00
tuexen	a82f33e60c	Fix and cleanup the debug information. This has no user-visible changes. Thanks to Irene Ruengeler for proving a patch. MFC after: 3 days	2015-05-28 16:00:23 +00:00
tuexen	d4fad6a818	Address some compiler warnings. No functional change. MFC after: 3 days	2015-05-28 14:24:21 +00:00
jkim	318c4f97e6	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
hiren	0e45b26b69	Add a new sysctl net.inet.tcp.hostcache.purgenow=1 to expire and purge all entries in hostcache immediately. In collaboration with: bz, rwatson MFC after: 1 week Relnotes: yes Sponsored by: Limelight Networks	2015-05-20 01:08:01 +00:00
hiren	b33b449313	Correct the wording as we are increasing the window size. Reviewed by: jhb Sponsored by: Limelight Networks	2015-05-19 19:17:20 +00:00
ae	cbc4e577f0	Add an ability accept encapsulated packets from different sources by one gif(4) interface. Add new option "ignore_source" for gif(4) interface. When it is enabled, gif's encapcheck function requires match only for packet's destination address. Differential Revision: https://reviews.freebsd.org/D2004 Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-05-15 12:19:45 +00:00
tuexen	02ec72fed7	Ensure that the COOKIE-ACK can be sent over UDP if the COOKIE-ECHO was received over UDP. Thanks to Felix Weinrank for makeing me aware of the problem and to Irene Ruengeler for providing the fix. MFC after: 1 week	2015-05-12 08:08:16 +00:00
gnn	ac5008de11	Add a state transition call to show that we have entered TIME_WAIT. Although this is not important to the rest of the TCP processing it is a conveneint way to make the DTrace state-transition probe catch this important state change. MFC after: 1 week	2015-05-01 12:49:03 +00:00
gnn	d333d3e495	Move the SIFTR DTrace probe out of the writing thread context and directly into the place where the data is collected.	2015-04-30 17:43:40 +00:00
gnn	08d35a248b	Brief demo script showing the various values that can be read via the new SIFTR statically defined tracepoint (SDT). Differential Revision: https://reviews.freebsd.org/D2387 Reviewed by: bz, markj	2015-04-29 17:19:55 +00:00
melifaro	9f3d7ccd07	Make rule table kernel-index rewriting support any kind of objects. Currently we have tables identified by their names in userland with internal kernel-assigned indices. This works the following way: When userland wishes to communicate with kernel to add or change rule(s), it makes indexed sorted array of table names (internally ipfw_obj_ntlv entries), and refer to indices in that array in rule manipulation. Prior to committing new rule to the ruleset kernel a) finds all referenced tables, bump their refcounts and change values inside the opcodes to be real kernel indices b) auto-creates all referenced but not existing tables and then do a) for them. Kernel does almost the same when exporting rules to userland: prepares array of used tables in all rules in range, and prepends it before the actual ruleset retaining actual in-kernel indexes for that. There is also special translation layer for legacy clients which is able to provide 'real' indices for table names (basically doing atoi()). While it is arguable that every subsystem really needs names instead of numbers, there are several things that should be noted: 1) every non-singleton subsystem needs to store its runtime state somewhere inside ipfw chain (and be able to get it fast) 2) we can't assume object numbers provided by humans will be dense. Existing nat implementation (O(n) access and LIST inside chain) is a good example. Hence the following: * Convert table-centric rewrite code to be more generic, callback-based * Move most of the code from ip_fw_table.c to ip_fw_sockopt.c * Provide abstract API to permit subsystems convert their objects between userland string identifier and in-kernel index. (See struct opcode_obj_rewrite) for more details * Create another per-chain index (in next commit) shared among all subsystems * Convert current NAT44 implementation to use new API, O(1) lookups, shared index and names instead of numbers (in next commit). Sponsored by: Yandex LLC	2015-04-27 08:29:39 +00:00
ae	b38774ffc9	Remove now unneded KEY_FREESP() for case when ipsec[46]_process_packet() returns EJUSTRETURN. Sponsored by: Yandex LLC	2015-04-27 01:11:09 +00:00
ae	5a6412a276	Fix possible use after free due to security policy deletion. When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(), we hold one reference to security policy and release it just after return from this function. But IPSec processing can be deffered and when we release reference to security policy after ipsec[46]_process_packet(), user can delete this security policy from SPDB. And when IPSec processing will be done, xform's callback function will do access to already freed memory. To fix this move KEY_FREESP() into callback function. Now IPSec code will release reference to SP after processing will be finished. Differential Revision: https://reviews.freebsd.org/D2324 No objections from: #network Sponsored by: Yandex LLC	2015-04-27 00:55:56 +00:00
tuexen	5106da568a	Don't panic under INVARIANTS when receiving a SACK which cumacks a TSN never sent. While there, fix two typos. MFC after: 1 week	2015-04-26 21:47:15 +00:00
bapt	9ec79fef30	mdoc: fix rendering issues	2015-04-26 11:39:25 +00:00
ae	2af9b531aa	Fix possible reference leak. Sponsored by: Yandex LLC	2015-04-24 21:05:29 +00:00
glebius	2ac0a031ed	Improve carp(4) locking: - Use the carp_sx to serialize not only CARP ioctls, but also carp_attach() and carp_detach(). - Use cif_mtx to lock only access to those the linked list. - These locking changes allow us to do some memory allocations with M_WAITOK and also properly call callout_drain() in carp_destroy(). - In carp_attach() assert that ifaddr isn't attached. We always come here with a pristine address from in[6]_control(). Reviewed by: oleg Sponsored by: Nginx, Inc.	2015-04-21 20:25:12 +00:00
glebius	14b7122d6d	Provide functions to determine presence of a given address configured on a given interface. Discussed with: np Sponsored by: Nginx, Inc.	2015-04-17 11:57:06 +00:00
jch	b227cb3d85	Fix an old and well-documented use-after-free race condition in TCP timers: - Add a reference from tcpcb to its inpcb - Defer tcpcb deletion until TCP timers have finished Differential Revision: https://reviews.freebsd.org/D2079 Submitted by: jch, Marc De La Gueronniere <mdelagueronniere@verisign.com> Reviewed by: imp, rrs, adrian, jhb, bz Approved by: jhb Sponsored by: Verisign, Inc.	2015-04-16 10:00:06 +00:00
adrian	efbeb67101	Fix RSS build - netisr input / NETISR_IP_DIRECT is used here.	2015-04-15 00:57:21 +00:00
mjg	7f65d178b4	Replace struct filedesc argument in getsock_cap with struct thread This is is a step towards removal of spurious arguments.	2015-04-11 16:00:33 +00:00
mjg	22da590f11	fd: remove filedesc argument from fdclose Just accept a thread instead. This makes it consistent with fdalloc. No functional changes.	2015-04-11 15:40:28 +00:00
delphij	34b5443a48	Attempt to fix build after 281351 by defining full prototype for the functions that were moved to ip_reass.c.	2015-04-11 01:06:59 +00:00
glebius	d2eb1d4ad8	o Use Jenkins hash. With previous hash, for a single source IP address and sequential IP ID case (e.g. ping -f), distribution fell into 8-10 buckets out of 64. With Jenkins hash, distribution is even. o Add random seed to the hash. Sponsored by: Nginx, Inc.	2015-04-10 06:55:43 +00:00
glebius	a0f9f3c303	Move all code related to IP fragment reassembly to ip_reass.c. Some function names have changed and comments are reformatted or added, but there is no functional change. Claim copyright for me and Adrian. Sponsored by: Nginx, Inc.	2015-04-10 06:02:37 +00:00
glebius	1f2f31e87d	Now that IP reassembly is no longer under single lock, book-keeping amount of allocations in V_nipq is racy. To fix that, we would simply stop doing book-keeping ourselves, and rely on UMA doing that. There could be a slight overcommit due to caches, but that isn't a big deal. o V_nipq and V_maxnipq go away. o net.inet.ip.fragpackets is now just SYSCTL_UMA_CUR() o net.inet.ip.maxfragpackets could have been just SYSCTL_UMA_MAX(), but historically it has special semantics about values of 0 and -1, so provide sysctl_maxfragpackets() to handle these special cases. o If zone limit lowers either due to net.inet.ip.maxfragpackets or due to kern.ipc.nmbclusters, then new function ipq_drain_tomax() goes over buckets and frees the oldest packets until we are in the limit. The code that (incorrectly) did that in ip_slowtimo() is removed. o ip_reass() doesn't check any limits and calls uma_zalloc(M_NOWAIT). If it fails, a new function ipq_reuse() is called. This function will find the oldest packet in the currently locked bucket, and if there is none, it will search in other buckets until success. Sponsored by: Nginx, Inc.	2015-04-09 22:13:27 +00:00
glebius	8685b42463	In the ip_reass() do packet examination and adjusting before acquiring locks and doing lookups. Sponsored by: Nginx, Inc.	2015-04-09 21:32:32 +00:00
glebius	a68631b0e4	Make ip reassembly queue mutexes per-vnet, putting them into the structure that they protect. Sponsored by: Nginx, Inc.	2015-04-09 21:17:07 +00:00
glebius	977624bc79	Use TAILQ_FOREACH_SAFE() instead of implementing it ourselves. Sponsored by: Nginx, Inc.	2015-04-09 09:00:32 +00:00
glebius	03afa52780	If V_maxnipq is set to zero, drain the queue here and now, instead of relying on timeouts. Sponsored by: Nginx, Inc.	2015-04-09 08:56:23 +00:00
glebius	ddd581998f	o Since we always update either fragdrop or fragtimeout stat counter when we free a fragment, provide two inline functions that do that for us: ipq_drop() and ipq_timeout(). o Rename ip_free_f() to ipq_free() to match the name scheme of IP reassembly. o Remove assertion from ipq_free(), since it requires extra argument to be passed, but locking scheme is simple enough and function is static. Sponsored by: Nginx, Inc.	2015-04-09 08:52:02 +00:00

1 2 3 4 5 ...

5263 Commits