freebsd-nq

Author	SHA1	Message	Date
Scott Long	4c7070db25	Import the 'iflib' API library for network drivers. From the author: "iflib is a library to eliminate the need for frequently duplicated device independent logic propagated (poorly) across many network drivers." Participation is purely optional. The IFLIB kernel config option is provided for drivers that want to transition between legacy and iflib modes of operation. ixl and ixgbe driver conversions will be committed shortly. We hope to see participation from the Broadcom and maybe Chelsio drivers in the near future. Submitted by: mmacy@nextbsd.org Reviewed by: gallatin Differential Revision: D5211	2016-05-18 04:35:58 +00:00
Eitan Adler	cef367e6a1	Don't repeat the the word 'the' (one manual change to fix grammar) Confirmed With: db Approved by: secteam (not really, but this is a comment typo fix)	2016-05-17 12:52:31 +00:00
Bjoern A. Zeeb	54d9f34ea3	Mark the unused arguments of various SYSINIT functions __unused. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-05-17 00:32:36 +00:00
Don Lewis	1ef3d54d20	When handling SIOCSIFNAME ensure that the new interface name is NUL terminated. Reject the rename attempt if the name is too long. MFC after: 1 week	2016-05-15 21:37:36 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem on with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Nick Hibma	6d07c1575b	Allow silencing of 'promiscuous mode enabled/disabled' messages. PR: 166255 Submitted by: eugen.grosbein.net Obtained from: eugen.grosbein.net MFC after: 1 week	2016-05-12 19:42:13 +00:00
Alan Somers	8907f744ff	Improve performance and functionality of the bitstring(3) api Two new functions are provided, bit_ffs_at() and bit_ffc_at(), which allow for efficient searching of set or cleared bits starting from any bit offset within the bit string. Performance is improved by operating on longs instead of bytes and using ffsl() for searches within a long. ffsl() is a compiler builtin in both clang and gcc for most architectures, converting what was a brute force while loop search into a couple of instructions. All of the bitstring(3) API continues to be contained in the header file. Some of the functions are large enough that perhaps they should be uninlined and moved to a library, but that is beyond the scope of this commit. sys/sys/bitstring.h: Convert the majority of the existing bit string implementation from macros to inline functions. Properly protect the implementation from inadvertant macro expansion when included in a user's program by prefixing all private macros/functions and local variables with '_'. Add bit_ffs_at() and bit_ffc_at(). Implement bit_ffs() and bit_ffc() in terms of their "at" counterparts. Provide a kernel implementation of bit_alloc(), making the full API usable in the kernel. Improve code documenation. share/man/man3/bitstring.3: Add pre-exisiting API bit_ffc() to the synopsis. Document new APIs. Document the initialization state of the bit strings allocated/declared by bit_alloc() and bit_decl(). Correct documentation for bitstr_size(). The original code comments indicate the size is in bytes, not "elements of bitstr_t". The new implementation follows this lead. Only hastd assumed "elements" rather than bytes and it has been corrected. etc/mtree/BSD.tests.dist: tests/sys/Makefile: tests/sys/sys/Makefile: tests/sys/sys/bitstring.c: Add tests for all existing and new functionality. include/bitstring.h Include all headers needed by sys/bitstring.h lib/libbluetooth/bluetooth.h: usr.sbin/bluetooth/hccontrol/le.c: Include bitstring.h instead of sys/bitstring.h. sbin/hastd/activemap.c: Correct usage of bitstr_size(). sys/dev/xen/blkback/blkback.c Use new bit_alloc. sys/kern/subr_unit.c: Remove hard-coded assumption that sizeof(bitstr_t) is 1. Get rid of unrb.busy, which caches the number of bits set in unrb.map. When INVARIANTS are disabled, nothing needs to know that information. callapse_unr can be adapted to use bit_ffs and bit_ffc instead. Eliminating unrb.busy saves memory, simplifies the code, and provides a slight speedup when INVARIANTS are disabled. sys/net/flowtable.c: Use the new kernel implementation of bit-alloc, instead of hacking the old libc-dependent macro. sys/sys/param.h Update __FreeBSD_version to indicate availability of new API Submitted by: gibbs, asomers Reviewed by: gibbs, ngie MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D6004	2016-05-04 22:34:11 +00:00
Pedro F. Giffuni	a4641f4eaa	sys/net*: minor spelling fixes. No functional change.	2016-05-03 18:05:43 +00:00
Bjoern A. Zeeb	46b0539ca4	Remove the most useful INET \|\| INET6 check leftover from whenever, doing nothing. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2016-05-03 16:01:53 +00:00
Randall Stewart	abb901c5d7	Complete the UDP tunneling of ICMP msgs to those protocols interested in having tunneled UDP and finding out about the ICMP (tested by Michael Tuexen with SCTP.. soon to be using this feature). Differential Revision: http://reviews.freebsd.org/D5875	2016-04-28 15:53:10 +00:00
Conrad Meyer	dcbee68850	radix_mpath: Don't derefence a NULL pointer in for loop iteration It seems rn_dupedkey may be NULL, because of the NULL check inside the loop. (Also, the rt gets assigned from rn_dupedkey and NULL checked at top of loop.) However, the for-loop update condition happens before the top-of-loop check and dereferences 'rt' unconditionally. Instead, NULL-check before dereferencing. If rn_dupedkey cannot in fact be NULL, or something else protects this, feel free to revert this and add an ASSERT of some kind instead. This was introduced in r191080 (2009) and moved around slightly in r293657. Reported by: Coverity CID: 1348482 Sponsored by: EMC / Isilon Storage Division	2016-04-26 20:27:17 +00:00
Pedro F. Giffuni	55e0987aea	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
Pedro F. Giffuni	d9c9c81c08	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
Pedro F. Giffuni	8dfea46460	Remove slightly used const values that can be replaced with nitems(). Suggested by: jhb	2016-04-21 15:38:28 +00:00
Bjoern A. Zeeb	29bda43fa4	Add more fields from struct ifnet needed during debugging a kernel panic. Move if_fib into the right place. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-04-20 21:04:39 +00:00
Conrad Meyer	856d8ddbb3	radix rn_inithead: Fix minor leak in low memory conditions R_Zalloc is essentially a malloc(M_NOWAIT) wrapper. It is possible that 'rnh' failed to allocate, but 'rmh' succeeds. In that case, we bail out of rn_inithead() but previously did not free 'rmh'. Introduced in r287073 (projects/routing) / MFP r294706. Reported by: Coverity CID: 1350258 Sponsored by: EMC / Isilon Storage Division	2016-04-20 02:01:45 +00:00
Conrad Meyer	5412ec6e3f	bpf_getdltlist: Don't overrun 'lst' 'lst' is allocated with 'n1' members. 'n' indexes 'lst'. So 'n == n1' is an invalid 'lst' index. This is a follow-up to r296009. Reported by: Coverity CID: 1352743 Sponsored by: EMC / Isilon Storage Division	2016-04-20 01:39:31 +00:00
Pedro F. Giffuni	02abd40029	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
Pedro F. Giffuni	155d72c498	sys/net* : for pointers replace 0 with NULL. Mostly cosmetical, no functional change. Found with devel/coccinelle.	2016-04-15 17:30:33 +00:00
Bjoern A. Zeeb	05fc416403	During if_vmove() we call if_detach_internal() which in turn calls the event handler notifying about interface departure and one of the consumers will detach if_bpf. There is no way for us to re-attach this easily as the DLT and hdrlen are only given on interface creation. Add a function to allow us to query the DLT and hdrlen from a current BPF attachment and after if_attach_internal() manually re-add the if_bpf attachment using these values. Found by panics triggered by nd6 packets running past BPF_MTAP() with no proper if_bpf pointer on the interface. Also add a basic DDB show function to investigate the if_bpf attachment of an interface. Reviewed by: gnn MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5896	2016-04-11 10:00:38 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
Ravi Pokala	729a4cff7e	Revert accidental submit of WIP as part of r297609 Pointyhat to: rpokala	2016-04-06 04:58:20 +00:00
Ravi Pokala	06152bf0e1	Storage Controller Interface driver - typo in unimplemented macro in scic_sds_controller_registers.h s/contoller/controller/ PR: 207336 Submitted by: Tony Narlock <tony @ git-pull.com>	2016-04-06 04:50:28 +00:00
John Baldwin	2f9b9f9c7f	Remove an unneeded check. CPUs with valid per-CPU data are not absent. Sponsored by: Netflix	2016-04-05 00:09:19 +00:00
Bjoern A. Zeeb	905197505e	Catch up with some more fields. I needed the bpf one lately. Sponsored by: The FreeBSD Foundation	2016-03-31 18:53:13 +00:00
Edward Tomasz Napierala	35030a5dd4	Remove some NULL checks for M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-29 13:56:59 +00:00
George V. Neville-Neil	cd4a821c2f	Add ethertype reserved for network testing MFC after: 2 weeks	2016-03-28 18:25:54 +00:00
Bjoern A. Zeeb	4f321dbd1c	Fix compile errors after r297225: - properly V_irtualise variable access unbreaking VIMAGE kernels. - remove the volatile from the function return type to make architecture using gcc happy [-Wreturn-type] "type qualifiers ignored on function return type" I am not entirely happy with this solution putting the u_int there but it will do for now.	2016-03-24 11:40:10 +00:00
George V. Neville-Neil	84cc0778d0	FreeBSD previously provided route caching for TCP (and UDP). Re-add route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted. Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306	2016-03-24 07:54:56 +00:00
Sepherosa Ziehau	1321c5029e	buf_ring/drbr: Add buf_ring_peek_clear_sc and use it in drbr_peek Unlike buf_ring_peek, it only supports single consumer mode, and it clears the cons_head if DEBUG_BUFRING/INVARIANTS is defined. The normal use case of drbr_peek for network drivers is: m = drbr_peek(br); err = hw_spec_encap(&m); /* could m_defrag/m_collapse / () if (err) { if (m == NULL) drbr_advance(br); else drbr_putback(br, m); /* break the loop / } drbr_advance(br); The race is: If hw_spec_encap() m_defrag or m_collapse the mbuf, i.e. the old mbuf was freed, or like the Hyper-V's network driver, that transmission- done does not even require the TX lock; then on the other CPU at the () time, the freed mbuf could be recycled and being drbr_enqueue even before the current CPU had the chance to call drbr_{advance,putback}. This triggers a panic in drbr_enqueue duplicated element check, if DEBUG_BUFRING/INVARIANTS is defined. Use buf_ring_peek_clear_sc() in drbr_peek() to fix the above race. This change is a NO-OP, if neither DEBUG_BUFRING nor INVARIANTS are defined. MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5416	2016-02-29 03:54:51 +00:00
Konstantin Belousov	70209aca16	In bpf_getdltlist(), do not call copyout(9) while holding bpf lock. Copy the data into temprorary malloced buffer and drop the lock for copyout. Reported, reviewed and tested by: cem Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-02-24 22:00:35 +00:00
Marcelo Araujo	d931334bd4	Fix regression introduced on 272446r. lagg(4) supports the protocol none, where it disables any traffic without disabling the lagg(4) interface itself. PR: 206921 Submitted by: Pushkar Kothavade <pushkarbk@gmail.com> Reviewed by: rpokala Approved by: bapt (mentor) MFC after: 3 weeks Sponsored by: gandi.net Differential Revision: https://reviews.freebsd.org/D5076	2016-02-19 06:35:53 +00:00
Devin Teske	41c0ec9a16	Merge SVN r295220 (bz) from projects/vnet/ Fix a panic that occurs when a vnet interface is unavailable at the time the vnet jail referencing said interface is stopped. Sponsored by: FIS Global, Inc.	2016-02-11 17:07:19 +00:00
Bjoern A. Zeeb	a5243af262	Code duplication but rib_head is special. Not found an easy way to go back and harmize the use cases among RIB, IPFW, PF yet but it's also not the scope of this work. Prevents instant panics on teardown and frees the FIB bits again. Sponsored by: The FreeBSD Foundation	2016-02-03 21:56:51 +00:00
Bjoern A. Zeeb	2414e86439	MfH @r295202 Expect to see panics in routing code at least now.	2016-02-03 11:49:51 +00:00
Gleb Smirnoff	8ec07310fa	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h	2016-02-01 17:41:21 +00:00
Gleb Smirnoff	d17d4c6b2a	Provide TCPSTAT_DEC() and TCPSTAT_FETCH() macros.	2016-01-27 00:20:07 +00:00
Marko Zec	ca7ba6a8fd	Prune a definition which is / was never used.	2016-01-25 20:35:15 +00:00
Alexander V. Chernikov	94017572ab	Fix flowtable part missed in r294706.	2016-01-25 09:31:32 +00:00
Alexander V. Chernikov	61eee0e202	MFP r287070,r287073: split radix implementation and route table structure. There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 does not use radix locking. On the other hand, routing structure do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not known anything about its consumers internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still uses the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.	2016-01-25 06:33:15 +00:00
Alexander V. Chernikov	809da2a3e0	Remove unused radix_mpath definitions.	2016-01-25 05:28:19 +00:00
Marcelo Araujo	d62edc5eb5	Add an IOCTL rr_limit to let users fine tuning the number of packets to be sent using roundrobin protocol and set a better granularity and distribution among the interfaces. Tuning the number of packages sent by interface can increase throughput and reduce unordered packets as well as reduce SACK. Example of usage: # ifconfig bge0 up # ifconfig bge1 up # ifconfig lagg0 create # ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \ 192.168.1.1 netmask 255.255.255.0 # ifconfig lagg0 rr_limit 500 Reviewed by: thompsa, glebius, adrian (old patch) Approved by: bapt (mentor) Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D540	2016-01-23 04:18:44 +00:00
Bjoern A. Zeeb	009e81b164	MFH @r294567	2016-01-22 15:11:40 +00:00
Bjoern A. Zeeb	1f12da0e82	Just checkpoint the WIP in order to be able to make the tree update easier. Note: this is currently not in a usable state as certain teardown parts are not called and the DOMAIN rework is missing. More to come soon and find its way to head. Obtained from: P4 //depot/user/bz/vimage/... Sponsored by: The FreeBSD Foundation	2016-01-22 15:00:01 +00:00
Alexander V. Chernikov	b7d076ed19	Clean up original route path selection logic a bit. NULL pointer dereference claimed by Coverity was possible if one (or several) next-hops for had their weights set to 0. CID: 1348482	2016-01-15 13:47:11 +00:00
Alexander V. Chernikov	fcbfdb37a1	Fix panic in IP redirect. Panic was introduced in r293466. Found by: Yamagi Burmeister <lists at yamagi.org>>	2016-01-14 16:31:00 +00:00
Alexander V. Chernikov	10e0e23528	Remove now-unused wrappers for various routing functions.	2016-01-14 08:54:44 +00:00
Alexander V. Chernikov	0eb64f4e44	Remove RTF_RNH_LOCKED support from rtalloc1_fib(). Last caller using it was eliminated in r293471. Sponsored by: Yandex LLC	2016-01-13 14:32:48 +00:00
Alexander V. Chernikov	59747033cd	Bring RADIX_MPATH support to new routing KPI to ease migration. Move actual rte selection process from rtalloc_mpath_fib() to the rt_path_selectrte() function. Add public rt_mpath_select() to use in fibX_lookup_ functions.	2016-01-11 08:45:28 +00:00
Alexander V. Chernikov	e5f3746abd	Do not rewrite all ro_flags.	2016-01-11 08:00:13 +00:00
Alexander V. Chernikov	64e9493420	Fix userland build broken by r293470. Pointy hat to: melifaro	2016-01-09 18:42:12 +00:00
Alexander V. Chernikov	36402a681f	Finish r275196: do not dereference rtentry in if_output() routines. The only piece of information that is required is rt_flags subset. In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags to check if this particular mbuf needs to be dropped (and what error should be returned). Note that if_loop() will always return EHOSTUNREACH for "reject" routes regardless of RTF_HOST flag existence. This is due to upcoming routing changes where RTF_HOST value won't be available as lookup result. All other functions require RTF_GATEWAY flag to check if they need to return EHOSTUNREACH instead of EHOSTDOWN error. There are 11 places where non-zero 'struct route' is passed to if_output(). For most of the callers (forwarding, bpf, arp) does not care about exact error value. In fact, the only place where this result is propagated is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()). Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary rte flags to ro_flags. Call this function in ip_output() after looking up/ verifying rte. Reviewed by: ae	2016-01-09 16:34:37 +00:00
Alexander V. Chernikov	ea8d14925c	Remove sys/eventhandler.h from net/route.h Reviewed by: ae	2016-01-09 09:34:39 +00:00
Alexander V. Chernikov	f2b2e77a41	(Temporarily) remove route_redirect_event eventhandler. Such handler should pass different set of variables, instead of directly providing 2 locked route entries. Given that it hasn't been really used since at least 2012, remove current code. Will re-add it after finishing most major routing-related changes. Discussed with: np	2016-01-09 06:26:40 +00:00
Alexander V. Chernikov	16703ea811	Please Coverity by removing unneccessary check (rt_key() is always set). Coverity CID: 1347797	2016-01-09 05:39:06 +00:00
Alexander V. Chernikov	048738b546	Do more fine-grained locking in rtrequest1_fib(). Last consumer using RTF_RNH_LOCKED flag was eliminated in r291643. Restrict passing RTF_RNH_LOCKED to rtrequest1_fib() and do better locking for RTM_ADD / RTM_DELETE cases.	2016-01-08 16:25:11 +00:00
Alexander V. Chernikov	9a1b64d5a0	Add rib_lookup_info() to provide API for retrieving individual route entries data in unified format. There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use rt_addrinfo structure to store most rte fields. If caller wants to retrieve key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficient-sized sockaddrs structures w/ ther pointers saved in passed rt_addrinfo. Convert: * lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor(). * rtsock pre-add/change route check. * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because 1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should not be multiple host routes for such hosts 2) if we have multiple routes we should inspect them (which is not done). 3) the entire idea of abusing KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose).	2016-01-04 15:03:20 +00:00
Alexander V. Chernikov	0d4df0290e	Handle IPV6_PATHMTU option by spliting ip6_getpmtu_ctl() from ip6_getpmtu(). Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to the caller. Currently, ip6_getpmtu() has 2 totally different use cases: 1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU and return it, w/o any reusability. 2) Actual ip6_output() data path where we (nearly) always use the provided route lookup data. If this data is not 'valid' we need to perform another lookup and save the result (which cannot be re-used by ip6_output()). Given that, handle 1) by calling separate function doing rte lookup itself. Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both ip6_getpmtu_ctl() and ip6_getpmtu(). For 2) instead of storing ref'ed rte, store mtu (the only needed data from the lookup result) inside newly-added ro_mtu field. 'struct route' was shrinked by 8(or 4 bytes) in r292978. Grow it again by 4 bytes. New ro_mtu field will be used in other places like ip/tcp_output (EMSGSIZE handling from output routines). Reviewed by: ae	2016-01-03 09:54:03 +00:00
Alexander V. Chernikov	6cdb18544d	Remove second EVENTHANDLER_REGISTER slipped in r292978. Describe the reason of doing unconditional M_PREPEND in ether_output().	2016-01-01 10:15:06 +00:00
Marcelo Araujo	25656def0d	Clean up unused-but-set-variable spotted by gcc4.9. Reviewed by: ngie Approved by: rodrigc (mentor) Differential Revision: https://reviews.freebsd.org/D4719	2015-12-31 07:03:41 +00:00
Alexander V. Chernikov	4fb3a8208c	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with is length) to prepend to mbuf. These two changes permits routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address change are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which ether returs error or fully-prepared link header. Add <arp\|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of there have ther header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102	2015-12-31 05:03:27 +00:00
Marcelo Araujo	2bfd3dfb9f	Wrap using #ifdef 'notyet' those variables and statements not yet implemented to lower the compiler warnings. It fix the case of unused-but-set-variable spotted by gcc4.9. Reviewed by: ngie, ae Approved by: bapt (mentor) Differential Revision: https://reviews.freebsd.org/D4720	2015-12-31 02:01:20 +00:00
Alexander V. Chernikov	a18742e938	Add SFF-8024 Extended Specification Compliance Submitted by: markb_mellanox.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4666	2015-12-28 09:26:07 +00:00
Bjoern A. Zeeb	f501e6f136	If vnets are torn down while ifconfig runs an ioctl to say, destroy an epair(4), we may hit if_detach_internal() without holding a lock and by the time we aquire it the interface might be gone. We should not panic() in this case as it is our fault for not holding the lock all the way. It is not ideal to return silently without error to user space, but other callers will all ignore the return values so do not change the entire KPI for little benefit for now. The ifp will be dealt with one way or another still. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4529	2015-12-22 15:03:45 +00:00
Bjoern A. Zeeb	616bc4f476	If bootverbose is enabled every vnet startup and virtual interface creation will print extra lines on the console. We are generally not interested in this (repeated) information for each VNET. Thus only print it for the default VNET. Virtual interfaces on the base system will remain printing information, but e.g. each loopback in each vnet will no longer cause a "bpf attached" line. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4531	2015-12-22 15:00:04 +00:00
Bjoern A. Zeeb	76d68eccbd	Simplify bringup order by removing a SYSINIT making it a static list initialization. Mfp4 @180384,180385: There is no need for a dedicated SYSINIT here. The list can be initialized statically. Sponsored by: CK Software GmbH Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4528	2015-12-22 14:57:04 +00:00
Steven Hartland	d6e82913c1	Revert r292275 & r292379 glebius has concerns about these changes so reverting those can be discussed and addressed. Sponsored by: Multiplay	2015-12-17 14:41:30 +00:00
Alexander V. Chernikov	427c2f4ef0	Provide additional lle data in IPv6 lltable dump used by ndp(8). Before the change, things like lle state were queried via SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump. This ioctl was added in 1999, probably to avoid touching rtsock code. This change maps SIOCGNBRINFO_IN6 data to standard rtsock dump the following way: expire (already) maps to rtm_rmx.rmx_expire isrouter -> rtm_flags & RTF_GATEWAY asked -> rtm_rmx.rmx_pksent state -> rtm_rmx.rmx_state (maps to rmx_weight via define) Reviewed by: ae	2015-12-16 10:14:16 +00:00
Alexander V. Chernikov	0792bcbb54	Convert if_stf(4) to new routing api.	2015-12-16 09:18:20 +00:00
Steven Hartland	52e53e2de0	Fix lagg failover due to missing notifications When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved. This results is slow failover, dropped packets and network outages for the lagg interface when the primary link goes down. We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire a ifnet_link_event which are now monitored by rip and nd6. Upon receiving these events each protocol trigger the relevant notifications: * inet4 => Gratuitous ARP * inet6 => Unsolicited Neighbour Announce This also fixes the carp IPv6 NA's that stopped working after r251584 which added the ipv6_route__llma route. The new behavour can be controlled using the sysctls: * net.link.ether.inet.arp_on_link * net.inet6.icmp6.nd6_on_link Also removed unused param from lagg_port_state and added descriptions for the sysctls while here. PR: 156226 MFC after: 1 month Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4111	2015-12-15 16:02:11 +00:00
Alexander V. Chernikov	6af272d88e	Fix PINNED routes handling. Before r291643, adding new interface prefix had the following logic: try_add: EEXIST && (PINNED) { try_del(w/o PINNED flag) if (OK) try_add(PINNED) } In r291643, deletion was performed w/ PINNED flag held which leaded to new interface prefixes (like ::1) overriding older ones. Fix this by requesting deletion w/o RTF_PINNED. PR: kern/205285 Submitted by: Fabian Keil <fk at fabiankeil.de>	2015-12-13 16:37:01 +00:00
Alexander V. Chernikov	12cb7521c2	Remove LLE read lock from IPv6 fast path. LLE structure is mostly unchanged during its lifecycle: there are only 2 things relevant for fast path lookup code: 1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK. 2) Some sort of feedback indicating that this particular entry is used so we send NS to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no. The latter is solved by the following changes: Special r_skip_req (introduced in D3688) value is used for fast path feedback. It is read lockless by fast path, but updated under req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0. After transitioning to STALE state, callout timer is armed to run each V_nd6_delay seconds to make sure that if packet was transmitted at the start of given interval, we would be able to switch to PROBE state in V_nd6_delay seconds as user expects. (in STALE state) timer is rescheduled until original V_nd6_gctimer expires keeping lle in STALE state (remaining timer value stored in lle_remtime). (in STALE state) timer is rescheduled if packet was transmitted less that V_nd6_delay seconds ago to make sure we transition to PROBE state exactly after V_n6_delay seconds. As a result, all packets towards lle in REACHABLE/STALE/PROBE states are handled by fast path without acquiring lle read lock. Differential Revision: https://reviews.freebsd.org/D3780	2015-12-13 07:39:49 +00:00
Alexander V. Chernikov	65ff3638df	Merge helper fib* functions used for basic lookups. Vast majority of rtalloc(9) users require only basic info from route table (e.g. "does the rtentry interface match with the interface I have?". "what is the MTU?", "Give me the IPv4 source address to use", etc..). Instead of hand-rolling lookups, checking if rtentry is up, valid, dealing with IPv6 mtu, finding "address" ifp (almost never done right), provide easy-to-use API hiding all the complexity and returning the needed info into small on-stack structure. This change also helps hiding route subsystem internals (locking, direct rtentry accesses). Additionaly, using this API improves lookup performance since rtentry is not locked. (This is safe, since all the rtentry changes happens under both radix WLOCK and rtentry WLOCK). Sponsored by: Yandex LLC	2015-12-08 10:50:03 +00:00
Alexander V. Chernikov	f8aee88f0b	Remove LLE read lock from IPv4 fast path. LLE structure is mostly unchanged during its lifecycle. To be more specific, there are 2 things relevant for fast path lookup code: 1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK. 2) Some sort of feedback indicating that this particular entry is used so we re-send arp request to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no. The latter is solved by the following changes: 1) introduce special r_skip_req field which is read lockless by fast path, but updated under (new) req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0. 2) introduce simple state machine: incomplete->reachable<->verify->deleted. Before that we implicitely had incomplete->reachable->deleted state machine, with V_arpt_keep between "reachable" and "deleted". Verification was performed in runtime 5 seconds before V_arpt_keep expire. This is changed to "change state to verify 5 seconds before V_arpt_keep, set r_skip_req to non-zero value and check it every second". If the value is zero - then send arp verification probe. These changes do not introduce any signifficant control plane overhead: typically lle callout timer would fire 1 time more each V_arpt_keep (1200s) for used lles and up to arp_maxtries (5) for dead lles. As a result, all packets towards "reachable" lle are handled by fast path without acquiring lle read lock. Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler might keep LLE lock for signifficant amount of time, which might not be feasible for fast path locking (e.g. having rmlock as ether AFDATA or lltable own lock). Differential Revision: https://reviews.freebsd.org/D3688	2015-12-05 09:50:37 +00:00
Alexander V. Chernikov	4b3dc89847	Move RTF_PINNED handling to generic route code. This eliminates last RTF_RNH_LOCKED rtrequest1_fib() user.	2015-12-02 08:17:31 +00:00
Enji Cooper	af5c99e53f	Fix LINT-NOIP kernels after r291467 rn is only used if INET or INET6 are defined Sponsored by: EMC / Isilon Storage Division	2015-12-01 05:59:53 +00:00
Alexander V. Chernikov	674e0823c1	Move flowtable rte checks to separate function.	2015-11-30 05:59:22 +00:00
Alexander V. Chernikov	e8b0643eee	Add new rt_foreach_fib_walk_del() function for deleting route entries by filter function instead of picking into routing table details in each consumer. Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED user). This simplifies future nexthops/mulitipath changes and rtrequest1_fib() locking refactoring. Actual changes: Add "rt_chain" field to permit rte grouping while doing batched delete from routing table (thus growing rte 200->208 on amd64). Add "rti_filter" / "rti_filterdata" / "rti_spare" fields to rt_addrinfo to pass filter function to various routing subsystems in standard way. Convert all rt_expunge() customers to new rt_addinfo-based api and eliminate rt_expunge().	2015-11-30 05:51:14 +00:00
Enji Cooper	766b4e4b5c	Fix building sys/modules/if_enc by adding missing headers X-MFC with: r291292, r291299 (if that ever happens) Pointyhat to: ae	2015-11-25 21:16:10 +00:00
Andrey V. Elsukov	03b7b4bf05	Fix the build.	2015-11-25 11:31:07 +00:00
Andrey V. Elsukov	ef91a9765d	Overhaul if_enc(4) and make it loadable in run-time. Use hhook(9) framework to achieve ability of loading and unloading if_enc(4) kernel module. INET and INET6 code on initialization registers two helper hooks points in the kernel. if_enc(4) module uses these helper hook points and registers its hooks. IPSEC code uses these hhook points to call helper hooks implemented in if_enc(4).	2015-11-25 07:31:59 +00:00
Fabien Thomas	d6d3f24890	Implement the sadb_x_policy_priority field as it is done in Linux: lower priority policies are inserted first. Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu> Reviewed by: ae Sponsored by: Stormshield	2015-11-17 14:39:33 +00:00
Alexander V. Chernikov	e4790abf19	Pass provided af instead of AF_UNSPEC to setwa_f callback.	2015-11-14 18:16:17 +00:00
Alexander V. Chernikov	8ad43f2d0a	Move iflladdr_event eventhandler invocation to if_setlladdr. Suggested by: glebius	2015-11-14 13:34:03 +00:00
Randall Stewart	7c4676ddee	This fixes several places where callout_stops return is examined. The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions. MFC after: 1 Month with associated callout change.	2015-11-13 22:51:35 +00:00
Alexander V. Chernikov	b13c5b5db2	Use lladdr_event to propagate gratiotus arp. Differential Revision: https://reviews.freebsd.org/D4019	2015-11-09 10:11:14 +00:00
Alexander V. Chernikov	ddd208f7ad	Unify setting lladdr for AF_INET[6].	2015-11-07 11:12:00 +00:00
Steven Hartland	c1be893c44	Add sysctl to control LACP strict compliance default Add net.link.lagg.lacp.default_strict_mode which defines the default value for LACP strict compliance for created lagg devices. Also: * Add lacp_strict option to ifconfig(8). * Fix lagg(4) creation examples. * Minor style(9) fix. MFC after: 1 week	2015-11-06 15:33:27 +00:00
George V. Neville-Neil	33872124a5	Replace the fastforward path with tryforward which does not require a sysctl and will always be on. The former split between default and fast forwarding is removed by this commit while preserving the ability to use all network stack features. Differential Revision: https://reviews.freebsd.org/D4042 Reviewed by: ae, melifaro, olivier, rwatson MFC after: 1 month Sponsored by: Rubicon Communications (Netgate)	2015-11-05 07:26:32 +00:00
Randall Stewart	d1a6f62c45	Fix three flowtable bugs, a) one lookup issue, b) a two cleaner issue. MFC after: 3 days Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D4014	2015-11-02 21:21:00 +00:00
Alexander V. Chernikov	bb3d23fd35	Fix lladdr change propagation for on vlans on top of it. Fix lladdr update when setting mac address manually. Fix lladdr_event for slave ports addition. MFC after: 4 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D4004	2015-11-01 19:59:04 +00:00
Bryan Drewery	2780ba06c7	Avoid passing an uninitialized 'i'. Currently nothing was depending on it anyhow. Coverity CID: 1331562	2015-10-29 18:58:18 +00:00
Andrey V. Elsukov	50bc87bc43	Check the size of data available in mbuf, before using them. PR: 202667 MFC after: 1 week	2015-10-28 17:55:37 +00:00
Kristof Provost	2602284308	pf: Fix compliation warning with gcc While fixing the PF_ANEQ() macro I messed up the parentheses, leading to compliation warnings with gcc. Spotted by: ian Pointy Hat: kp	2015-10-25 18:09:03 +00:00
Kristof Provost	7d7624233a	PF_ANEQ() macro will in most situations returns TRUE comparing two identical IPv4 packets (when it should return FALSE). It happens because PF_ANEQ() doesn't stop if first 32 bits of IPv4 packets are equal and starts to check next 3*32 bits (like for IPv6 packet). Those bits containt some garbage and in result PF_ANEQ() wrongly returns TRUE. Fix: Check if packet is of AF_INET type and if it is then compare only first 32 bits of data. PR: 204005 Submitted by: Miłosz Kaniewski	2015-10-25 13:14:53 +00:00
Ed Maste	40a02d00a5	if_tap: correct typo in sysctl description (Enably) Sponsored by: The FreeBSD Foundation	2015-10-21 19:56:16 +00:00
Alexander V. Chernikov	f221bcaa06	Remove several compat functions from pre-fib era.	2015-10-17 17:26:44 +00:00
Hiroki Sato	b7a581eaa6	Fix a panic when destroying a lagg interface. Differential Revision: https://reviews.freebsd.org/D3883	2015-10-16 01:16:01 +00:00
Kristof Provost	c110fc49da	pf: Fix TSO issues In certain configurations (mostly but not exclusively as a VM on Xen) pf produced packets with an invalid TCP checksum. The problem was that pf could only handle packets with a full checksum. The FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only addresses, length and protocol). Certain network interfaces expect to see the pseudo-header checksum, so they end up producing packets with invalid checksums. To fix this stop calculating the full checksum and teach pf to only update TCP checksums if TSO is disabled or the change affects the pseudo-header checksum. PR: 154428, 193579, 198868 Reviewed by: sbruno MFC after: 1 week Relnotes: yes Sponsored by: RootBSD Differential Revision: https://reviews.freebsd.org/D3779	2015-10-14 16:21:41 +00:00
Hiroki Sato	023d10cbc7	Fix a bug that caused reinitialization failure of MAC addresses on the lagg interface when removing the primary port. PR: 201916 Differential Revision: https://reviews.freebsd.org/D3301	2015-10-07 06:32:34 +00:00
Marcelo Araujo	973532fc7d	Remove per complete the fec aggregation protocol. The remove began with revision r271733. NOTE: This patch must never be merge to 10-Stable Reviewed by: glebius Approved by: bapt (mentor) Relnotes: Yes Sponsored by: EuroBSDCon Sweden. Differential Revision: D3786	2015-10-04 08:00:29 +00:00
Hiroki Sato	f1aaad0cd9	Add IFCAP_LINKSTATE support.	2015-10-03 09:15:23 +00:00
Andrey V. Elsukov	1a6fb597b0	Always detach encap handler when reconfiguring tunnel. Reported by: hrs MFC after: 1 week	2015-10-03 03:57:58 +00:00
Alexander V. Chernikov	1558cb2448	Eliminate nd6_nud_hint() and its TCP bindings. Initially function was introduced in r53541 (KAME initial commit) to "provide hints from upper layer protocols that indicate a connection is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability Confirmation). However, it was converted to do nothing (e.g. just return) in r122922 (tcp_hostcache implementation) back in 2003. Some defines were moved to tcp_var.h in r169541. Then, it was broken (for non-corner cases) by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So, right now this code is broken and has no "real" base users. Differential Revision: https://reviews.freebsd.org/D3699	2015-09-27 05:29:34 +00:00
Alexander V. Chernikov	1fe201c322	Simplify the way of attaching IPv6 link-layer header. Problem description: How do we currently perform layer 2 resolution and header imposition: For IPv4 we have the following chain: ip_output() -> (ether\|atm\|whatever)_output() -> arpresolve() Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data. For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr() We have ip6_ouput() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performes lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder). After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()). Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr() The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve() Error handling: Currently sending packet to non-existing la results in ip6_<output\|forward> -> nd6_output() -> nd6_output _lle() which returns 0. In new scenario packet is propagated to <ether\|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code). Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469	2015-09-16 14:26:28 +00:00
Andrey V. Elsukov	b71bed24a6	Use KASSERT for some checks, that are late to do. Discussed with: melifaro, glebius	2015-09-16 13:17:00 +00:00
Oleg Bulyzhin	3f70ebbf05	Remove superfluous m_freem(). MFC after: 1 month	2015-09-16 10:07:45 +00:00
Alexander V. Chernikov	59c180c35c	Unify loopback route switching: * prepare gateway before insertion * use RTM_CHANGE instead of explicit find/change route * Remove fib argument from ifa_switch_loopback_route added in r264887: if old ifp fib differes from new one, that the caller is doing something wrong * Make ifa_*_loopback_route call single ifa_maintain_loopback_route().	2015-09-16 06:23:15 +00:00
Alexander V. Chernikov	d3cdb71655	* Require explicitl lle unlink prior to calling llentry_delete(). This one slightly decreases time of holding afdata wlock. * While here, make nd6_free() return void. No one has used its return value since r186119.	2015-09-15 06:48:19 +00:00
Eric van Gyzen	17a036563d	Fix the handling of IPv6 On-Link Redirects. On receipt of a redirect message, install an interface route for the redirected destination. On removal of the corresponding Neighbor Cache entry, remove the interface route. This requires changes in rtredirect_fib() to cope with an AF_LINK address for the gateway and with the absence of RTF_GATEWAY. This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo Phase 2 test suite. Unrelated to the above, fix a recursion on the radix node head lock triggered by the Tahi Redirected to Alternate Router test cases. When I first wrote this patch in October 2012, all Section 2 (Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE, and 8-STABLE. cem@ recently rebased the 10.x patch onto head and reported that it passes Tahi. (Thanks!) These other test cases also passed in 2012: * the RTF_MODIFIED case, with IPv4 and IPv6 (using a RTF_HOST\|RTF_GATEWAY route for the destination) * the redirected-to-self case, with IPv4 and IPv6 * a valid IPv4 redirect All testing in 2012 was done with WITNESS and INVARIANTS. Tested by: EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015, Mark Kelley <mark_kelley@dell.com> in 2012, TC Telkamp <terence_telkamp@dell.com> in 2012 PR: 152791 Reviewed by: melifaro (current rev), bz (earlier rev) Approved by: kib (mentor) MFC after: 1 month Relnotes: yes Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D3602	2015-09-14 19:17:25 +00:00
Alexander V. Chernikov	3e7a2321e3	* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573	2015-09-14 16:48:19 +00:00
Hans Petter Selasky	d76d40126e	Update TSO limits to include all headers. To make driver programming easier the TSO limits are changed to reflect the values used in the BUSDMA tag a network adapter driver is using. The TCP/IP network stack will subtract space for all linklevel and protocol level headers and ensure that the full mbuf chain passed to the network adapter fits within the given limits. Implementation notes: If a network adapter driver needs to fixup the first mbuf in order to support VLAN tag insertion, the size of the VLAN tag should be subtracted from the TSO limit. Else not. Network adapters which typically inline the complete header mbuf could technically transmit one more segment. This patch does not implement a mechanism to recover the last segment for data transmission. It is believed when sufficiently large mbuf clusters are used, the segment limit will not be reached and recovering the last segment will not have any effect. The current TSO algorithm tries to send MTU-sized packets, where the MTU typically is 1500 bytes, which gives 1448 bytes of TCP data payload per packet for IPv4. That means if the TSO length limitiation is set to 65536 bytes, there will be a data payload remainder of (65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to recover total TSO length due to inlining mbuf header data will not have any effect, because adding or removing the ETH/IP/TCP headers to or from 324 bytes will not cause more or less TCP payload to be TSO'ed. Existing network adapter limits will be updated separately. Differential Revision: https://reviews.freebsd.org/D3458 Reviewed by: rmacklem MFC after: 2 weeks	2015-09-14 08:36:22 +00:00
Hiroki Sato	b1c250ff3f	- Remove GIF_{SEND,ACCEPT}_REVETHIP. - Simplify EADDRNOTAVAIL and EAFNOSUPPORT conditions. MFC after: 3 days	2015-09-10 05:59:39 +00:00
Alexander V. Chernikov	441f9243df	Constantify lookup key in ifa_ifwith* functions. Some places in our network stack already have const arguments (like if_output() routines and LLE functions). Code using ifa_ifwith (and similar functins) along with LLE/_output functions is currently bound to use tricks like __DECONST(). Provide a cleaner way by making sockaddr lookup key really constant. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3464	2015-09-05 05:33:20 +00:00
Hiroki Sato	18e199ad72	Fix a panic which was reproducible by an infinite loop of "ifconfig epair0 create && ifconfig epair0a destroy". This was caused by an uninitialized function pointer in softc->media.	2015-09-02 16:30:45 +00:00
Alexander V. Chernikov	3b0fd911fa	Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in alloc handler, based on flags.	2015-08-31 05:03:36 +00:00
Adrian Chadd	8f1111cf0b	Remove now unused (and #if 0'ed out) headers.	2015-08-29 04:33:31 +00:00
Adrian Chadd	e5562eb934	Replace the printf()s with optional rate limited debugging for RSS. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3471	2015-08-28 05:58:16 +00:00
Kristof Provost	64b3b4d611	pf: Remove support for 'scrub fragment crop\|drop-ovl' The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse users into making poor choices. It's also a fairly large amount of complex code, so just remove the support altogether. Users who have 'scrub fragment crop\|drop-ovl' in their pf configuration will be implicitly converted to 'scrub fragment reassemble'. Reviewed by: gnn, eri Relnotes: yes Differential Revision: https://reviews.freebsd.org/D3466	2015-08-27 21:27:47 +00:00
Luiz Otavio O Souza	c614d0a443	Fix the spelling of eri's name. Pointy hat to: loos MFC with: r287009	2015-08-24 23:40:36 +00:00
Luiz Otavio O Souza	0a70aaf8f5	Add ALTQ(9) support for the CoDel algorithm. CoDel is a parameterless queue discipline that handles variable bandwidth and RTT. It can be used as the single queue discipline on an interface or as a sub discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ. Differential Revision: https://reviews.freebsd.org/D3272 Reviewd by: rpaulo, gnn (previous version) Obtained from: pfSense Sponsored by: Rubicon Communications (Netgate)	2015-08-21 22:02:22 +00:00
Alexander V. Chernikov	5a2555160f	* Split allocation and table linking for lle's. Before that, the logic besides lle_create() was the following: return existing if found, create if not. This behaviour was error-prone since we had to deal with 'sudden' static<>dynamic lle changes. This commit fixes bunch of different issues like: - refcount leak when lle is converted to static. Simple check case: console 1: while true; do for i in `arp -an\|awk '$4~/incomp/{print$2}'\|tr -d '()'`; do arp -s $i 00:22:44:66:88:00 ; arp -d $i; done; done console 2: ping -f any-dead-host-in-L2 console 3: # watch for memory consumption: vmstat -m \| awk '$1~/lltable/{print$2}' - possible problems in arptimer() / nd6_timer() when dropping/reacquiring lock. New logic explicitly handles use-or-create cases in every lla_create user. Basically, most of the changes are purely mechanical. However, we explicitly avoid using existing lle's for interface/static LLE records. * While here, call lle_event handlers on all real table lle change. * Create lltable_free_entry() calling existing per-lltable lle_free_t callback for entry deletion	2015-08-20 12:05:17 +00:00
Hiren Panchasara	0e02b43a07	Make LAG LACP fast timeout tunable through IOCTL. Differential Revision: D3300 Submitted by: LN Sundararajan <lakshmi.n at msystechnologies> Reviewed by: wblock, smh, gnn, hiren, rpokala at panasas MFC after: 2 weeks Sponsored by: Panasas	2015-08-12 20:21:04 +00:00
Alexander V. Chernikov	0447c1367a	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
Alexander V. Chernikov	314294de5c	Store addresses instead of sockaddrs inside llentry. This permits us having all (not fully true yet) all the info needed in lookup process in first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not restrict any custom lltable consumers with long keys use the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
Alexander V. Chernikov	41cb42a633	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
Alexander V. Chernikov	2caee4be35	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
Alexander V. Chernikov	11cdad9873	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp\|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt agrument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update description from these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
Alexander V. Chernikov	4bdf0b6a9a	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2015-08-08 18:14:59 +00:00
Alexander V. Chernikov	e362cf0e9f	MFP r274553: * Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete lla_create now returns with LLE_EXCLUSIVE lock for lle. * Provide typedefs for new/existing lltable callbacks. Reviewed by: ae	2015-08-08 17:48:54 +00:00
Luiz Otavio O Souza	9224217213	Remove the mtx_sleep() from the kqueue f_event filter. The filter is called from the network hot path and must not sleep. The filter runs with the descriptor lock held and does not manipulates the buffers, so it is not necessary sleep when the hold buffer is in use. Just ignore the hold buffer contents when it is being copied to user space (when hold buffer in use is set). This fix the "Sleeping thread owns a non-sleepable lock" panic when the userland thread is too busy reading the packets from bpf(4). PR: 200323 MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-08-03 22:14:45 +00:00
Luiz Otavio O Souza	98fa5d858c	Add a KASSERT() to make sure we wont rotate the buffers twice (rotate the buffers while the hold buffer is in use). Suggested by: ed, ghelmer MFC with: r286142	2015-08-03 18:22:31 +00:00
John-Mark Gurney	bba6880eab	looks like all archs either have clang or cdefs included before.. drop this include as unnecessary.. Requested by: bde	2015-08-02 21:33:40 +00:00
John-Mark Gurney	70e47040b0	convert to C11's _Static_assert, and pull in sys/cdefs.h for compatibility w/ older non-C11 compilers... passed make tinerdbox.. Suggested by: imp	2015-08-02 00:15:52 +00:00
Luiz Otavio O Souza	f87e372ef2	Remove two unnecessary sleeps from the hot path in bpf(4). The first one never triggers because bpf_canfreebuf() can only be true for zero-copy buffers and zero-copy buffers are not read with read(2). The second also never triggers, because we check the free buffer before calling ROTATE_BUFFERS(). If the hold buffer is in use the free buffer will be NULL and there is nothing else to do besides drop the packet. If the free buffer isn't NULL the hold buffer _is_ free and it is safe to rotate the buffers. Update the comment in ROTATE_BUFFERS macro to match the logic described here. While here fix a few typos in comments. MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-31 21:43:27 +00:00
Luiz Otavio O Souza	faa693cdbe	Remove the sleep from the buffer allocation routine. The buffer must be allocated (or even changed) before the interface is set and thus, there is no need to verify if the buffer is in use. MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-31 20:25:54 +00:00
Luiz Otavio O Souza	4f42daa4a3	Do not allocate the buffers at opening of the descriptor, because once the buffer is allocated we are committed to a particular buffer method (BPF_BUFMODE_BUFFER in this case). If we are using zero-copy buffers, the userland program must register its buffers before set the interface. If we are using kernel memory buffers, we can allocate the buffer at the time that the interface is being set. This fix allows the usage of BIOCSETBUFMODE after r235746. Update the comments to reflect the recent changes. MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-31 20:02:12 +00:00
Andrey V. Elsukov	926381e108	Ansify if_stf.c	2015-07-31 09:04:22 +00:00
John-Mark Gurney	af024d3b23	temporarily fix build.. This isn't the final fix, and testing is still on going, but it has passed world for mips and powerpc... I know this has an extra semicolon, but this is the patch that is tested... Looks like better fix is to use _Static_assert...	2015-07-31 07:48:08 +00:00
John-Mark Gurney	817c7ed900	Clean up this header file... use CTASSERTs now that we have them... Replace a draft w/ RFC that's over 10 years old. Note that _AALG and _EALG do not need to match what the IKE daemons think they should be.. This is part of the KABI... I decided to renumber AESCTR, but since we've never had working AESCTR mode, I'm not really breaking anything.. and it shortens a loop by quite a bit.. remove SKIPJACK IPsec support... SKIPJACK never made it out of draft (in 1999), only has 80bit key, NIST recommended it stop being used after 2010, and setkey nor any of the IKE daemons I checked supported it... jmgurney/ipsecgcm: a357a33, c75808b, e008669, b27b6d6 Reviewed by: gnn (earlier version)	2015-07-31 00:23:21 +00:00
Andrey V. Elsukov	a5965d1513	Build if_stf(4) module only when both INET and INET6 support are enabled.	2015-07-30 10:26:43 +00:00
Luiz Otavio O Souza	8b15f615e0	Follow r256586 and rename the kernel version of the Free() macro to R_Free(). This matches the other macros and reduces the chances to clash with other headers. This also fixes the build of radix.c outside of the kernel environment. Reviewed by: glebius	2015-07-30 02:09:03 +00:00
Andrey V. Elsukov	10a0e0bf0a	Eliminate the use of m_copydata() in gif_encapcheck(). ip_encap already has inspected mbuf's data, at least an IP header. And it is safe to use mtod() and do direct access to needed fields. Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that mbuf has a packet header. Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also remove "martian filters" checks. According to RFC 4213 it is enough to verify that the source address is the address of the encapsulator, as configured on the decapsulator. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-07-29 14:07:43 +00:00
Andrey V. Elsukov	cc0a3c8ca4	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149	2015-07-29 08:12:05 +00:00
Marko Zec	22a9384098	Prevent null-pointer dereferencing. MFC after: 3 days	2015-07-20 08:21:51 +00:00
Mark Murray	3aa77530ca	* Address review (and add a bit myself). - Tweek man page. - Remove all mention of RANDOM_FORTUNA. If the system owner wants YARROW or DUMMY, they ask for it, otherwise they get FORTUNA. - Tidy up headers a bit. - Tidy up declarations a bit. - Make static in a couple of places where needed. - Move Yarrow/Fortuna SYSINIT/SYSUNINIT to randomdev.c, moving us towards a single file where the algorithm context is used. - Get rid of random__process_buffer() functions. They were only used in one place each, and are better subsumed into those places. - Remove _post_read() functions as they are stubs everywhere. - Assert against buffer size illegalities. - Clean up some silly code in the randomdev_read() routine. - Make the harvesting more consistent. - Make some requested argument name changes. - Tidy up and clarify a few comments. - Make some requested comment changes. - Make some requested macro changes. * NOTE: the thing calling itself a 'unit test' is not yet a proper unit test, but it helps me ensure things work. It may be a proper unit test at some time in the future, but for now please don't make any assumptions or hold any expectations. Differential Revision: https://reviews.freebsd.org/D2025 Approved by: so (/dev/random blanket)	2015-07-12 18:14:38 +00:00
Luigi Rizzo	847bf38369	Sync netmap sources with the version in our private tree. This commit contains large contributions from Giuseppe Lettieri and Stefano Garzarella, is partly supported by grants from Verisign and Cisco, and brings in the following: - fix zerocopy monitor ports and introduce copying monitor ports (the latter are lower performance but give access to all traffic in parallel with the application) - exclusive open mode, useful to implement solutions that recover from crashes of the main netmap client (suggested by Patrick Kelsey) - revised memory allocator in preparation for the 'passthrough mode' (ptnetmap) recently presented at bsdcan. ptnetmap is described in S. Garzarella, G. Lettieri, L. Rizzo; Virtual device passthrough for high speed VM networking, ACM/IEEE ANCS 2015, Oakland (CA) May 2015 http://info.iet.unipi.it/~luigi/research.html - fix rx CRC handing on ixl - add module dependencies for netmap when building drivers as modules - minor simplifications to device-specific routines (txsync, rxsync) - general code cleanup (remove unused variables, introduce macros to access rings and remove duplicate code, Applications do not need to be recompiled, unless of course they want to use the new features (monitors and exclusive open). Those willing to try this code on stable/10 can just update the sys/dev/netmap/, sys/net/netmap with the version in HEAD and apply the small patches to individual device drivers. MFC after: 1 month Sponsored by: (partly) Verisign, Cisco	2015-07-10 05:51:36 +00:00
Patrick Kelsey	e9617c305c	Fix if_loop so bpfwrite() can use it regardless of the state of bd_hdrcmplt. As if_loop does not use link-level headers, its behavior when used by bpfwrite() should be the same regardless of the state of bd_hdrcmplt. Without this change, libpcap (and other BPF users that work like it) fail when writing to loopback interfaces. Differential Revision: https://reviews.freebsd.org/D2989 Reviewed by: gnn, melifaro Approved by: jmallett (mentor) MFC after: 3 days	2015-07-06 02:12:49 +00:00
George V. Neville-Neil	987de84445	New AES modes for IPSec, user space components. Update setkey and libipsec to understand aes-gcm-16 as an encryption method. A partial commit of the work in review D2936. Submitted by: eri Reviewed by: jmg MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-03 20:09:14 +00:00
Mark Murray	d1b06863fb	Huge cleanup of random(4) code. * GENERAL - Update copyright. - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to ON, which means we want Fortuna - If there is no 'device random' in the kernel, there will be NO random(4) device in the kernel, and the KERN_ARND sysctl will return nothing. With RANDOM_DUMMY there will be a random(4) that always blocks. - Repair kern.arandom (KERN_ARND sysctl). The old version went through arc4random(9) and was a bit weird. - Adjust arc4random stirring a bit - the existing code looks a little suspect. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Redo read_random(9) so as to duplicate random(4)'s read internals. This makes it a first-class citizen rather than a hack. - Move stuff out of locked regions when it does not need to be there. - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind boot verbose. - Use SYSINIT to sequence the startup. - Fix init/deinit sysctl stuff. - Make relevant sysctls also tunables. - Add different harvesting "styles" to allow for different requirements (direct, queue, fast). - Add harvesting of FFS atime events. This needs to be checked for weighing down the FS code. - Add harvesting of slab allocator events. This needs to be checked for weighing down the allocator code. - Fix the random(9) manpage. - Loadable modules are not present for now. These will be re-engineered when the dust settles. - Use macros for locks. - Fix comments. * src/share/man/... - Update the man pages. * src/etc/... - The startup/shutdown work is done in D2924. * src/UPDATING - Add UPDATING announcement. * src/sys/dev/random/build.sh - Add copyright. - Add libz for unit tests. * src/sys/dev/random/dummy.c - Remove; no longer needed. Functionality incorporated into randomdev.. live_entropy_sources.c live_entropy_sources.h - Remove; content moved. - move content to randomdev.[ch] and optimise. * src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h - Remove; plugability is no longer used. Compile-time algorithm selection is the way to go. * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h - Add early (re)boot-time randomness caching. * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h - Remove; no longer needed. * src/sys/dev/random/uint128.h - Provide a fake uint128_t; if a real one ever arrived, we can use that instead. All that is needed here is N=0, N++, N==0, and some localised trickery is used to manufacture a 128-bit 0ULLL. * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h - Improve unit tests; previously the testing human needed clairvoyance; now the test will do a basic check of compressibility. Clairvoyant talent is still a good idea. - This is still a long way off a proper unit test. * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct fortuna_start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h - Improve messy union to just uint128_t. - Remove unneeded 'staic struct start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) - Fix some magic numbers elsewhere used as FAST and SLOW. Differential Revision: https://reviews.freebsd.org/D2025 Reviewed by: vsevolod,delphij,rwatson,trasz,jmg Approved by: so (delphij)	2015-06-30 17:00:45 +00:00

1 2 3 4 5 ...

3662 Commits