freebsd-nq

Author	SHA1	Message	Date
Alexander V. Chernikov	1fe201c322	Simplify the way of attaching IPv6 link-layer header. Problem description: How do we currently perform layer 2 resolution and header imposition: For IPv4 we have the following chain: ip_output() -> (ether\|atm\|whatever)_output() -> arpresolve() Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data. For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr() We have ip6_ouput() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performes lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder). After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again. As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()). Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr() The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve() Error handling: Currently sending packet to non-existing la results in ip6_<output\|forward> -> nd6_output() -> nd6_output _lle() which returns 0. In new scenario packet is propagated to <ether\|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0. (And EWOULDBLOCK is actually used by IB/TOE code). Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469	2015-09-16 14:26:28 +00:00
Andrey V. Elsukov	b71bed24a6	Use KASSERT for some checks, that are late to do. Discussed with: melifaro, glebius	2015-09-16 13:17:00 +00:00
Oleg Bulyzhin	3f70ebbf05	Remove superfluous m_freem(). MFC after: 1 month	2015-09-16 10:07:45 +00:00
Alexander V. Chernikov	59c180c35c	Unify loopback route switching: * prepare gateway before insertion * use RTM_CHANGE instead of explicit find/change route * Remove fib argument from ifa_switch_loopback_route added in r264887: if old ifp fib differes from new one, that the caller is doing something wrong * Make ifa_*_loopback_route call single ifa_maintain_loopback_route().	2015-09-16 06:23:15 +00:00
Alexander V. Chernikov	d3cdb71655	* Require explicitl lle unlink prior to calling llentry_delete(). This one slightly decreases time of holding afdata wlock. * While here, make nd6_free() return void. No one has used its return value since r186119.	2015-09-15 06:48:19 +00:00
Eric van Gyzen	17a036563d	Fix the handling of IPv6 On-Link Redirects. On receipt of a redirect message, install an interface route for the redirected destination. On removal of the corresponding Neighbor Cache entry, remove the interface route. This requires changes in rtredirect_fib() to cope with an AF_LINK address for the gateway and with the absence of RTF_GATEWAY. This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo Phase 2 test suite. Unrelated to the above, fix a recursion on the radix node head lock triggered by the Tahi Redirected to Alternate Router test cases. When I first wrote this patch in October 2012, all Section 2 (Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE, and 8-STABLE. cem@ recently rebased the 10.x patch onto head and reported that it passes Tahi. (Thanks!) These other test cases also passed in 2012: * the RTF_MODIFIED case, with IPv4 and IPv6 (using a RTF_HOST\|RTF_GATEWAY route for the destination) * the redirected-to-self case, with IPv4 and IPv6 * a valid IPv4 redirect All testing in 2012 was done with WITNESS and INVARIANTS. Tested by: EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015, Mark Kelley <mark_kelley@dell.com> in 2012, TC Telkamp <terence_telkamp@dell.com> in 2012 PR: 152791 Reviewed by: melifaro (current rev), bz (earlier rev) Approved by: kib (mentor) MFC after: 1 month Relnotes: yes Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D3602	2015-09-14 19:17:25 +00:00
Alexander V. Chernikov	3e7a2321e3	* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573	2015-09-14 16:48:19 +00:00
Hans Petter Selasky	d76d40126e	Update TSO limits to include all headers. To make driver programming easier the TSO limits are changed to reflect the values used in the BUSDMA tag a network adapter driver is using. The TCP/IP network stack will subtract space for all linklevel and protocol level headers and ensure that the full mbuf chain passed to the network adapter fits within the given limits. Implementation notes: If a network adapter driver needs to fixup the first mbuf in order to support VLAN tag insertion, the size of the VLAN tag should be subtracted from the TSO limit. Else not. Network adapters which typically inline the complete header mbuf could technically transmit one more segment. This patch does not implement a mechanism to recover the last segment for data transmission. It is believed when sufficiently large mbuf clusters are used, the segment limit will not be reached and recovering the last segment will not have any effect. The current TSO algorithm tries to send MTU-sized packets, where the MTU typically is 1500 bytes, which gives 1448 bytes of TCP data payload per packet for IPv4. That means if the TSO length limitiation is set to 65536 bytes, there will be a data payload remainder of (65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to recover total TSO length due to inlining mbuf header data will not have any effect, because adding or removing the ETH/IP/TCP headers to or from 324 bytes will not cause more or less TCP payload to be TSO'ed. Existing network adapter limits will be updated separately. Differential Revision: https://reviews.freebsd.org/D3458 Reviewed by: rmacklem MFC after: 2 weeks	2015-09-14 08:36:22 +00:00
Hiroki Sato	b1c250ff3f	- Remove GIF_{SEND,ACCEPT}_REVETHIP. - Simplify EADDRNOTAVAIL and EAFNOSUPPORT conditions. MFC after: 3 days	2015-09-10 05:59:39 +00:00
Alexander V. Chernikov	441f9243df	Constantify lookup key in ifa_ifwith* functions. Some places in our network stack already have const arguments (like if_output() routines and LLE functions). Code using ifa_ifwith (and similar functins) along with LLE/_output functions is currently bound to use tricks like __DECONST(). Provide a cleaner way by making sockaddr lookup key really constant. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3464	2015-09-05 05:33:20 +00:00
Hiroki Sato	18e199ad72	Fix a panic which was reproducible by an infinite loop of "ifconfig epair0 create && ifconfig epair0a destroy". This was caused by an uninitialized function pointer in softc->media.	2015-09-02 16:30:45 +00:00
Alexander V. Chernikov	3b0fd911fa	Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in alloc handler, based on flags.	2015-08-31 05:03:36 +00:00
Adrian Chadd	8f1111cf0b	Remove now unused (and #if 0'ed out) headers.	2015-08-29 04:33:31 +00:00
Adrian Chadd	e5562eb934	Replace the printf()s with optional rate limited debugging for RSS. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3471	2015-08-28 05:58:16 +00:00
Kristof Provost	64b3b4d611	pf: Remove support for 'scrub fragment crop\|drop-ovl' The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse users into making poor choices. It's also a fairly large amount of complex code, so just remove the support altogether. Users who have 'scrub fragment crop\|drop-ovl' in their pf configuration will be implicitly converted to 'scrub fragment reassemble'. Reviewed by: gnn, eri Relnotes: yes Differential Revision: https://reviews.freebsd.org/D3466	2015-08-27 21:27:47 +00:00
Luiz Otavio O Souza	c614d0a443	Fix the spelling of eri's name. Pointy hat to: loos MFC with: r287009	2015-08-24 23:40:36 +00:00
Luiz Otavio O Souza	0a70aaf8f5	Add ALTQ(9) support for the CoDel algorithm. CoDel is a parameterless queue discipline that handles variable bandwidth and RTT. It can be used as the single queue discipline on an interface or as a sub discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ. Differential Revision: https://reviews.freebsd.org/D3272 Reviewd by: rpaulo, gnn (previous version) Obtained from: pfSense Sponsored by: Rubicon Communications (Netgate)	2015-08-21 22:02:22 +00:00
Alexander V. Chernikov	5a2555160f	* Split allocation and table linking for lle's. Before that, the logic besides lle_create() was the following: return existing if found, create if not. This behaviour was error-prone since we had to deal with 'sudden' static<>dynamic lle changes. This commit fixes bunch of different issues like: - refcount leak when lle is converted to static. Simple check case: console 1: while true; do for i in `arp -an\|awk '$4~/incomp/{print$2}'\|tr -d '()'`; do arp -s $i 00:22:44:66:88:00 ; arp -d $i; done; done console 2: ping -f any-dead-host-in-L2 console 3: # watch for memory consumption: vmstat -m \| awk '$1~/lltable/{print$2}' - possible problems in arptimer() / nd6_timer() when dropping/reacquiring lock. New logic explicitly handles use-or-create cases in every lla_create user. Basically, most of the changes are purely mechanical. However, we explicitly avoid using existing lle's for interface/static LLE records. * While here, call lle_event handlers on all real table lle change. * Create lltable_free_entry() calling existing per-lltable lle_free_t callback for entry deletion	2015-08-20 12:05:17 +00:00
Hiren Panchasara	0e02b43a07	Make LAG LACP fast timeout tunable through IOCTL. Differential Revision: D3300 Submitted by: LN Sundararajan <lakshmi.n at msystechnologies> Reviewed by: wblock, smh, gnn, hiren, rpokala at panasas MFC after: 2 weeks Sponsored by: Panasas	2015-08-12 20:21:04 +00:00
Alexander V. Chernikov	0447c1367a	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
Alexander V. Chernikov	314294de5c	Store addresses instead of sockaddrs inside llentry. This permits us having all (not fully true yet) all the info needed in lookup process in first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not restrict any custom lltable consumers with long keys use the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
Alexander V. Chernikov	41cb42a633	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
Alexander V. Chernikov	2caee4be35	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
Alexander V. Chernikov	11cdad9873	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp\|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt agrument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update description from these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
Alexander V. Chernikov	4bdf0b6a9a	MFP r274295: * Move interface route cleanup to route.c:rt_flushifroutes() * Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2015-08-08 18:14:59 +00:00
Alexander V. Chernikov	e362cf0e9f	MFP r274553: * Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete lla_create now returns with LLE_EXCLUSIVE lock for lle. * Provide typedefs for new/existing lltable callbacks. Reviewed by: ae	2015-08-08 17:48:54 +00:00
Luiz Otavio O Souza	9224217213	Remove the mtx_sleep() from the kqueue f_event filter. The filter is called from the network hot path and must not sleep. The filter runs with the descriptor lock held and does not manipulates the buffers, so it is not necessary sleep when the hold buffer is in use. Just ignore the hold buffer contents when it is being copied to user space (when hold buffer in use is set). This fix the "Sleeping thread owns a non-sleepable lock" panic when the userland thread is too busy reading the packets from bpf(4). PR: 200323 MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-08-03 22:14:45 +00:00
Luiz Otavio O Souza	98fa5d858c	Add a KASSERT() to make sure we wont rotate the buffers twice (rotate the buffers while the hold buffer is in use). Suggested by: ed, ghelmer MFC with: r286142	2015-08-03 18:22:31 +00:00
John-Mark Gurney	bba6880eab	looks like all archs either have clang or cdefs included before.. drop this include as unnecessary.. Requested by: bde	2015-08-02 21:33:40 +00:00
John-Mark Gurney	70e47040b0	convert to C11's _Static_assert, and pull in sys/cdefs.h for compatibility w/ older non-C11 compilers... passed make tinerdbox.. Suggested by: imp	2015-08-02 00:15:52 +00:00
Luiz Otavio O Souza	f87e372ef2	Remove two unnecessary sleeps from the hot path in bpf(4). The first one never triggers because bpf_canfreebuf() can only be true for zero-copy buffers and zero-copy buffers are not read with read(2). The second also never triggers, because we check the free buffer before calling ROTATE_BUFFERS(). If the hold buffer is in use the free buffer will be NULL and there is nothing else to do besides drop the packet. If the free buffer isn't NULL the hold buffer _is_ free and it is safe to rotate the buffers. Update the comment in ROTATE_BUFFERS macro to match the logic described here. While here fix a few typos in comments. MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-31 21:43:27 +00:00
Luiz Otavio O Souza	faa693cdbe	Remove the sleep from the buffer allocation routine. The buffer must be allocated (or even changed) before the interface is set and thus, there is no need to verify if the buffer is in use. MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-31 20:25:54 +00:00
Luiz Otavio O Souza	4f42daa4a3	Do not allocate the buffers at opening of the descriptor, because once the buffer is allocated we are committed to a particular buffer method (BPF_BUFMODE_BUFFER in this case). If we are using zero-copy buffers, the userland program must register its buffers before set the interface. If we are using kernel memory buffers, we can allocate the buffer at the time that the interface is being set. This fix allows the usage of BIOCSETBUFMODE after r235746. Update the comments to reflect the recent changes. MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-31 20:02:12 +00:00
Andrey V. Elsukov	926381e108	Ansify if_stf.c	2015-07-31 09:04:22 +00:00
John-Mark Gurney	af024d3b23	temporarily fix build.. This isn't the final fix, and testing is still on going, but it has passed world for mips and powerpc... I know this has an extra semicolon, but this is the patch that is tested... Looks like better fix is to use _Static_assert...	2015-07-31 07:48:08 +00:00
John-Mark Gurney	817c7ed900	Clean up this header file... use CTASSERTs now that we have them... Replace a draft w/ RFC that's over 10 years old. Note that _AALG and _EALG do not need to match what the IKE daemons think they should be.. This is part of the KABI... I decided to renumber AESCTR, but since we've never had working AESCTR mode, I'm not really breaking anything.. and it shortens a loop by quite a bit.. remove SKIPJACK IPsec support... SKIPJACK never made it out of draft (in 1999), only has 80bit key, NIST recommended it stop being used after 2010, and setkey nor any of the IKE daemons I checked supported it... jmgurney/ipsecgcm: a357a33, c75808b, e008669, b27b6d6 Reviewed by: gnn (earlier version)	2015-07-31 00:23:21 +00:00
Andrey V. Elsukov	a5965d1513	Build if_stf(4) module only when both INET and INET6 support are enabled.	2015-07-30 10:26:43 +00:00
Luiz Otavio O Souza	8b15f615e0	Follow r256586 and rename the kernel version of the Free() macro to R_Free(). This matches the other macros and reduces the chances to clash with other headers. This also fixes the build of radix.c outside of the kernel environment. Reviewed by: glebius	2015-07-30 02:09:03 +00:00
Andrey V. Elsukov	10a0e0bf0a	Eliminate the use of m_copydata() in gif_encapcheck(). ip_encap already has inspected mbuf's data, at least an IP header. And it is safe to use mtod() and do direct access to needed fields. Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that mbuf has a packet header. Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also remove "martian filters" checks. According to RFC 4213 it is enough to verify that the source address is the address of the encapsulator, as configured on the decapsulator. Reviewed by: melifaro Obtained from: Yandex LLC Sponsored by: Yandex LLC	2015-07-29 14:07:43 +00:00
Andrey V. Elsukov	cc0a3c8ca4	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149	2015-07-29 08:12:05 +00:00
Marko Zec	22a9384098	Prevent null-pointer dereferencing. MFC after: 3 days	2015-07-20 08:21:51 +00:00
Mark Murray	3aa77530ca	* Address review (and add a bit myself). - Tweek man page. - Remove all mention of RANDOM_FORTUNA. If the system owner wants YARROW or DUMMY, they ask for it, otherwise they get FORTUNA. - Tidy up headers a bit. - Tidy up declarations a bit. - Make static in a couple of places where needed. - Move Yarrow/Fortuna SYSINIT/SYSUNINIT to randomdev.c, moving us towards a single file where the algorithm context is used. - Get rid of random__process_buffer() functions. They were only used in one place each, and are better subsumed into those places. - Remove _post_read() functions as they are stubs everywhere. - Assert against buffer size illegalities. - Clean up some silly code in the randomdev_read() routine. - Make the harvesting more consistent. - Make some requested argument name changes. - Tidy up and clarify a few comments. - Make some requested comment changes. - Make some requested macro changes. * NOTE: the thing calling itself a 'unit test' is not yet a proper unit test, but it helps me ensure things work. It may be a proper unit test at some time in the future, but for now please don't make any assumptions or hold any expectations. Differential Revision: https://reviews.freebsd.org/D2025 Approved by: so (/dev/random blanket)	2015-07-12 18:14:38 +00:00
Luigi Rizzo	847bf38369	Sync netmap sources with the version in our private tree. This commit contains large contributions from Giuseppe Lettieri and Stefano Garzarella, is partly supported by grants from Verisign and Cisco, and brings in the following: - fix zerocopy monitor ports and introduce copying monitor ports (the latter are lower performance but give access to all traffic in parallel with the application) - exclusive open mode, useful to implement solutions that recover from crashes of the main netmap client (suggested by Patrick Kelsey) - revised memory allocator in preparation for the 'passthrough mode' (ptnetmap) recently presented at bsdcan. ptnetmap is described in S. Garzarella, G. Lettieri, L. Rizzo; Virtual device passthrough for high speed VM networking, ACM/IEEE ANCS 2015, Oakland (CA) May 2015 http://info.iet.unipi.it/~luigi/research.html - fix rx CRC handing on ixl - add module dependencies for netmap when building drivers as modules - minor simplifications to device-specific routines (txsync, rxsync) - general code cleanup (remove unused variables, introduce macros to access rings and remove duplicate code, Applications do not need to be recompiled, unless of course they want to use the new features (monitors and exclusive open). Those willing to try this code on stable/10 can just update the sys/dev/netmap/, sys/net/netmap with the version in HEAD and apply the small patches to individual device drivers. MFC after: 1 month Sponsored by: (partly) Verisign, Cisco	2015-07-10 05:51:36 +00:00
Patrick Kelsey	e9617c305c	Fix if_loop so bpfwrite() can use it regardless of the state of bd_hdrcmplt. As if_loop does not use link-level headers, its behavior when used by bpfwrite() should be the same regardless of the state of bd_hdrcmplt. Without this change, libpcap (and other BPF users that work like it) fail when writing to loopback interfaces. Differential Revision: https://reviews.freebsd.org/D2989 Reviewed by: gnn, melifaro Approved by: jmallett (mentor) MFC after: 3 days	2015-07-06 02:12:49 +00:00
George V. Neville-Neil	987de84445	New AES modes for IPSec, user space components. Update setkey and libipsec to understand aes-gcm-16 as an encryption method. A partial commit of the work in review D2936. Submitted by: eri Reviewed by: jmg MFC after: 2 weeks Sponsored by: Rubicon Communications (Netgate)	2015-07-03 20:09:14 +00:00
Mark Murray	d1b06863fb	Huge cleanup of random(4) code. * GENERAL - Update copyright. - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to ON, which means we want Fortuna - If there is no 'device random' in the kernel, there will be NO random(4) device in the kernel, and the KERN_ARND sysctl will return nothing. With RANDOM_DUMMY there will be a random(4) that always blocks. - Repair kern.arandom (KERN_ARND sysctl). The old version went through arc4random(9) and was a bit weird. - Adjust arc4random stirring a bit - the existing code looks a little suspect. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Redo read_random(9) so as to duplicate random(4)'s read internals. This makes it a first-class citizen rather than a hack. - Move stuff out of locked regions when it does not need to be there. - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind boot verbose. - Use SYSINIT to sequence the startup. - Fix init/deinit sysctl stuff. - Make relevant sysctls also tunables. - Add different harvesting "styles" to allow for different requirements (direct, queue, fast). - Add harvesting of FFS atime events. This needs to be checked for weighing down the FS code. - Add harvesting of slab allocator events. This needs to be checked for weighing down the allocator code. - Fix the random(9) manpage. - Loadable modules are not present for now. These will be re-engineered when the dust settles. - Use macros for locks. - Fix comments. * src/share/man/... - Update the man pages. * src/etc/... - The startup/shutdown work is done in D2924. * src/UPDATING - Add UPDATING announcement. * src/sys/dev/random/build.sh - Add copyright. - Add libz for unit tests. * src/sys/dev/random/dummy.c - Remove; no longer needed. Functionality incorporated into randomdev.. live_entropy_sources.c live_entropy_sources.h - Remove; content moved. - move content to randomdev.[ch] and optimise. * src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h - Remove; plugability is no longer used. Compile-time algorithm selection is the way to go. * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h - Add early (re)boot-time randomness caching. * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h - Remove; no longer needed. * src/sys/dev/random/uint128.h - Provide a fake uint128_t; if a real one ever arrived, we can use that instead. All that is needed here is N=0, N++, N==0, and some localised trickery is used to manufacture a 128-bit 0ULLL. * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h - Improve unit tests; previously the testing human needed clairvoyance; now the test will do a basic check of compressibility. Clairvoyant talent is still a good idea. - This is still a long way off a proper unit test. * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct fortuna_start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h - Improve messy union to just uint128_t. - Remove unneeded 'staic struct start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) - Fix some magic numbers elsewhere used as FAST and SLOW. Differential Revision: https://reviews.freebsd.org/D2025 Reviewed by: vsevolod,delphij,rwatson,trasz,jmg Approved by: so (delphij)	2015-06-30 17:00:45 +00:00
Bjoern A. Zeeb	9656119da4	Another attempt to make this compile on more architectures after r284777.	2015-06-25 23:16:01 +00:00
Ermal Luçi	cd2bc2ef4e	Correct r284777 to use proper includes and remove dead code to unbreak kernel builds. Differential Revision: https://reviews.freebsd.org/D2847	2015-06-25 15:05:58 +00:00
Ermal Luçi	a5b789f65a	ALTQ FAIRQ discipline import from DragonFLY Differential Revision: https://reviews.freebsd.org/D2847 Reviewed by: glebius, wblock(manpage) Approved by: gnn(mentor) Obtained from: pfSense Sponsored by: Netgate	2015-06-24 19:16:41 +00:00
Dimitry Andric	966ab68df1	Fix endless recursion in sys/net/if.c's drbr_inuse_drv(), found by clang 3.7.0. Reviewed by: marcel	2015-06-23 18:48:41 +00:00
Kristof Provost	581e697036	Fix panic when adding vtnet interfaces to a bridge vtnet interfaces are always in promiscuous mode (at least if the VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host). if_promisc() on a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This confused the bridge code. Instead we now accept all enable/disable promiscuous commands (and always keep IFF_PROMISC set). There are also two issues with the if_bridge error handling. If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to disable promiscuous mode on the interface. That runs into an assert, because promiscuous mode was never set in the first place. (That's the panic reported in PR 200210.) We can only unset promiscuous mode if the interface actually is promiscuous. This goes against the reference counting done by if_promisc(), but only the first/last if_promic() calls can actually fail, so this is safe. A second issue is a double free of bif. It's already freed by bridge_delete_member(). PR: 200210 Differential Revision: https://reviews.freebsd.org/D2804 Reviewed by: philip (mentor)	2015-06-13 19:39:21 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Alexander V. Chernikov	5446b3f1d4	* Update SFF-8024 Identifier constants. * Fix SFF_8436_CC_EXT in SFF-8436 memory map. * Add SFF-8436/8636 bits (revision compliance/nominal bitrate). * Do some small style/type fixes.	2015-05-16 13:11:35 +00:00
Andrey V. Elsukov	c1b4f79dfa	Add an ability accept encapsulated packets from different sources by one gif(4) interface. Add new option "ignore_source" for gif(4) interface. When it is enabled, gif's encapcheck function requires match only for packet's destination address. Differential Revision: https://reviews.freebsd.org/D2004 Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2015-05-15 12:19:45 +00:00
Andrey V. Elsukov	eccfe69a5c	Add new socket ioctls SIOC[SG]TUNFIB to set FIB number of encapsulated packets on tunnel interfaces. Add support of these ioctls to gre(4), gif(4) and me(4) interfaces. For incoming packets M_SETFIB() should use if_fib value from ifnet structure, use proper value in gre(4) and me(4). Differential Revision: https://reviews.freebsd.org/D2462 No objection from: #network MFC after: 2 weeks Sponsored by: Yandex LLC	2015-05-12 07:37:27 +00:00
Hiroki Sato	1c27e6c39f	Fix a panic when VIMAGE is enabled. Spotted by: Nikos Vassiliadis	2015-05-12 03:35:45 +00:00
Andrey V. Elsukov	b347bc3b7a	Pass mtag argument into m_tag_locate() to continue the search from the last found mtag.	2015-05-06 14:02:57 +00:00
Gleb Smirnoff	a7dc945989	After r281643 an #ifdef IFT_FOO preprocessor directive returns false, since types became a enum C type. Some software uses such ifdefs to determine whether an operating systems supports certain interface type. Of course, such check is bogus. E.g. FreeBSD defines about 250 interface types, but supports only around 20. However, we need not upset such software so provide a set of defines. The current set was taken to suffice the dhcpd. Reported & tested by: Guy Yur <guyyur gmail.com>	2015-05-02 20:37:40 +00:00
Hiren Panchasara	a9467c3c45	Currently there is no easy way to specify net.isr.maxthreads = all cpus. We need to specify exact number of cpus in loader.conf which get annoying when you have mix of machines which don't have equal number of total cpus. I propose "-1" as that value. When loader.conf has net.isr.maxthreads = -1, netisr will use all available cpus. In collaboration with: davide Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D2318 MFC after: 2 weeks Sponsored by: Limelight Networks	2015-04-25 16:12:06 +00:00
Gleb Smirnoff	9f7d0f4830	Don't propagate SIOCSIFCAPS from a vlan(4) to its parent. This leads to quite unexpected result of toggling capabilities on the neighbour vlan(4) interfaces. Reviewed by: melifaro, np Differential Revision: https://reviews.freebsd.org/D2310 Sponsored by: Nginx, Inc.	2015-04-23 13:19:00 +00:00
Craig Rodrigues	d9db52256e	Move zlib.c from net to libkern. It is not network-specific code and would be better as part of libkern instead. Move zlib.h and zutil.h from net/ to sys/ Update includes to use sys/zlib.h and sys/zutil.h instead of net/ Submitted by: Steve Kiernan stevek@juniper.net Obtained from: Juniper Networks, Inc. GitHub Pull Request: https://github.com/freebsd/freebsd/pull/28 Relnotes: yes	2015-04-22 14:38:58 +00:00
Gleb Smirnoff	41c1a23326	Make IFMEDIA_DEBUG a kernel option. Sponsored by: Nginx, Inc.	2015-04-21 10:35:23 +00:00
Mark Johnston	b23cbbe6db	Move the definition of struct bpf_if to bpf.c. A couple of fields are still exposed via struct bpf_if_ext so that bpf_peers_present() can be inlined into its callers. However, this change eliminates some type duplication in the resulting CTF container, since otherwise ctfmerge(1) propagates the duplication through all types that contain a struct bpf_if. Differential Revision: https://reviews.freebsd.org/D2319 Reviewed by: melifaro, rpaulo	2015-04-20 22:08:11 +00:00
Alexander Motin	7144875388	Activate write-only optimization if bpf device opened with O_WRONLY. dhclient opens bpf as write-only to send packets. It never reads received packets from that descriptor, but processing them in kernel takes time. Especially much time takes packet timestamping on systems with expensive timecounter, such as bhyve guest, where network speed dropped in half. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2015-04-20 10:44:46 +00:00
Gleb Smirnoff	6456c04aae	Bring in if_types.h from projects/ifnet, where types are defined in enum.	2015-04-17 06:39:15 +00:00
Gleb Smirnoff	da8ae05d14	- Format copyright notices, VCS ids. - Run through unifdef(1).	2015-04-17 06:38:31 +00:00
Gleb Smirnoff	772e66a6fc	Move ALTQ from contrib to net/altq. The ALTQ code is for many years discontinued by its initial authors. In FreeBSD the code was already slightly edited during the pf(4) SMP project. It is about to be edited more in the projects/ifnet. Moving out of contrib also allows to remove several hacks to the make glue. Reviewed by: net@	2015-04-16 20:22:40 +00:00
Marcelo Araujo	546afaf83d	Remove duplicate header entry.	2015-04-16 02:44:37 +00:00
George V. Neville-Neil	6332e4cc38	Minor change to the macros to make sure that if an AF is passed that is neither AF_INET6 nor AF_INET that we don't touch random bits of memory. Differential Revision: https://reviews.freebsd.org/D2291	2015-04-15 14:46:45 +00:00
George V. Neville-Neil	3085e1216e	Document internal interface types which are specific to FreeBSD.	2015-04-14 15:21:20 +00:00
Gleb Smirnoff	4651df570c	Redo r274966. Instead of global all-interface all-vnet undocumented sysctl, use per-interface flag, and document it. Sponsored by: Nginx, Inc.	2015-04-10 09:50:13 +00:00
George V. Neville-Neil	51d4054eeb	Revert 281276 as unnecessary. Proper change to be committed to the base polling code in a subsequent commit. Pointed out by: glebius Sponsored by: Rubicon Communications (NetGate)	2015-04-09 14:44:30 +00:00
George V. Neville-Neil	8a7ad10169	Add support for a netisr polling tunable, which allows run time switching of device polling rather than having it only be controlled by the compile time option. Summary: Rubicon Communications (Netgate) Reviewers: #network, hiren Reviewed By: #network, hiren Subscribers: hiren Differential Revision: https://reviews.freebsd.org/D2258	2015-04-08 20:25:51 +00:00
Eric Joyner	eb7e25b22f	ifmedia changes: - Extend the number of available subtypes for Ethernet media by using some of the ifmedia word's option bits to help denote subtypes. As a result, the number of possible Ethernet subtype values increases from 31 to 511. - Use some of those new values to define new media types. - lacp_compose_key() recgonizes the new Ethernet media types added. (Change made as required by a comment in if_media.h) - New ioctl, SIOGIFXMEDIA, to handle getting the new extended media types. SIOCGIFMEDIA is retained for backwards compatibility. - Changes to ifconfig to allow it to handle the new extended media types. Submitted by: mike@karels.net (original), hselasky Reviewed by: jfvogel, gnn, hselasky Approved by: jfvogel (mentor), gnn (mentor) Differential Revision: http://reviews.freebsd.org/D1965	2015-04-07 21:31:17 +00:00
Andrey V. Elsukov	a4b65afcab	Fix a possible mbuf leak on interface departure. Reported by: Alexandre Martins	2015-03-26 23:40:22 +00:00
Gleb Smirnoff	c03044244f	Fix couple of fallouts from r280280. The first one is a simple typo, where counter was incremented on parent, instead of vlan(4) interface. The second is more complicated. Historically, in our stack the incoming packets are accounted in drivers, while incoming bytes for Ethernet drivers are accounted in ether_input_internal(). Thus, it should be removed from vlan(4) driver. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-03-25 16:01:46 +00:00
Gleb Smirnoff	b1828acf05	Make vlan_config() the signle point of validity checks. Sponsored by: Nginx, Inc.	2015-03-20 21:09:03 +00:00
Gleb Smirnoff	f941c31aae	In vlan_clone_match_ethervid(): - Use ifunit() instead of going through the interface list ourselves. - Remove unused parameter. - Move the most important comment above the function. Sponsored by: Nginx, Inc.	2015-03-20 20:42:58 +00:00
Gleb Smirnoff	67975c79be	Tiny comment fix.	2015-03-20 14:16:26 +00:00
Gleb Smirnoff	a58ea6b1cf	Now, when r272244 introduced counter(9) based counters for all interfaces, revert the r271538, which did that for vlan(4) only. No objections: melifaro Sponsored by: Nginx, Inc.	2015-03-20 14:05:17 +00:00
Gleb Smirnoff	3e8c6d74bb	Always lock the hash row of a source node when updating its 'states' counter. PR: 182401 Sponsored by: Nginx, Inc.	2015-03-17 12:19:28 +00:00
Andrey V. Elsukov	b57d97215e	Add if_input_default() method, that will be used for if_input initialization, when no input method specified before if_attach(). This prevents panics when if_input() method called directly e.g. from bpf(4) code. PR: 192426 Reviewed by: glebius MFC after: 1 week	2015-03-12 14:55:33 +00:00
Hans Petter Selasky	b7ba031ff7	Factor out mbuf hashing code from LAGG driver so that other network drivers can use it. This avoids some code duplication. Add missing default case to all switch statements while at it. Also move the hashing of the IPv6 flow field to layer 4 because the IPv6 flow field is constant on a per L4 connection basis and not on a per L3 network. Differential Revision: https://reviews.freebsd.org/D1987 Sponsored by: Mellanox Technologies MFC after: 1 month	2015-03-11 16:02:24 +00:00
Mark Johnston	aa14e9b7c9	Reimplement support for userland core dump compression using a new interface in kern_gzio.c. The old gzio interface was somewhat inflexible and has not worked properly since r272535: currently, the gzio functions are called with a range lock held on the output vnode, but kern_gzio.c does not pass the IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr() attempts to reacquire the range lock. Moreover, the new gzio interface can be used to implement kernel core compression. This change also modifies the kernel configuration options needed to enable userland core dump compression support: gzio is now an option rather than a device, and the COMPRESS_USER_CORES option is removed. Core dump compression is enabled using the kern.compress_user_cores sysctl/tunable. Differential Revision: https://reviews.freebsd.org/D1832 Reviewed by: rpaulo Discussed with: kib	2015-03-09 03:50:53 +00:00
Gleb Smirnoff	607e337454	Optimize SIOCGIFMEDIA handling removing malloc(9) and double traversal of the list. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-03-04 15:00:20 +00:00
Hiroki Sato	c92a456b55	Fix group membership of cloned interfaces when one is moved by if_vmove(). In if_vmove(), if_detach_internal() and if_attach_internal() were called in series to detach and reattach the interface. When detaching, if_delgroup() was called and the interface leaves all of the group membership. And then upon attachment, if_addgroup(ifp, IFG_ALL) was called and it joined only "all" group again. This had a problem. Normally, a cloned interface automatically joins a group whose name is ifc_name of the cloner in addition to "all" upon creation. However, if_vmove() removed the membership and did not restore upon attachment. Differential Revision: https://reviews.freebsd.org/D1859	2015-03-02 20:00:03 +00:00
Gleb Smirnoff	fec642add7	Hide struct ifmultiaddr under _KERNEL, too.	2015-02-27 01:15:23 +00:00
Xin LI	08e5736618	Handle SIOCSIFCAP by propogating the request to the parent interface. This allows adding an vlan interface into a bridge. Thanks for William Katsak <wkatsak cs rutgers edu> for testing and fixing an issue in my previous patch draft. MFC after: 2 weeks	2015-02-20 18:39:12 +00:00
Gleb Smirnoff	e072c794ad	Now that all users of _WANT_IFADDR are fixed, remove this crutch and hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-02-19 23:16:10 +00:00
Gleb Smirnoff	0324938a0f	- Improve INET/INET6 scope. - style(9) declarations. - Make couple of local functions static.	2015-02-16 23:50:53 +00:00
Gleb Smirnoff	8dc98c2a36	Toss declarations to fix regular build and NO_INET6 build.	2015-02-16 21:52:28 +00:00
Gleb Smirnoff	f0b0fe5b45	Commit a miss from r278843. Pointy hat to: glebius	2015-02-16 18:33:33 +00:00
Brad Davis	936bdf364d	Fix build. Approved by: gibbs	2015-02-16 18:06:24 +00:00
Gleb Smirnoff	6004805208	Missed from r278831.	2015-02-16 06:02:46 +00:00
Hiroki Sato	25792b116f	Fix a panic when tearing down a vnet on a VIMAGE-enabled kernel. There was a race that bridge_ifdetach() could be called via ifnet_departure event handler after vnet_bridge_uninit(). PR: 195859 Reported by: Danilo Egea Gondolfo	2015-02-14 18:15:14 +00:00
Will Andrews	0e5f55bb95	Improve the distribution of LAGG port traffic. I edited the original change to retain the use of arc4random() as a seed for the hashing as a very basic defense against intentional lagg port selection. The author's original commit message (edited slightly): sys/net/ieee8023ad_lacp.c sys/net/if_lagg.c In lagg_hashmbuf, use the FNV hash instead of the old hash32_buf. The hash32 family of functions operate one octet at a time, and when run on a string s of length n, their output is equivalent to : ----- i=n-1 \ n \ (n-i-1) 32 ( seed^ + / 33^ * s[i] ) % 2^ / ----- i=0 The problem is that the last five bytes of input don't get multiplied by sufficiently many powers of 33 to rollover 2^32. That means that changing the last few bytes (but obviously not the very last) of input will always change the value of the hash by a multiple of 33. In the case of lagg_hashmbuf() with ipv4 input, the last four bytes are the TCP or UDP port numbers. Since the output of lagg_hashmbuf is always taken modulo the port count, and 3 is a common port count for a lagg, that's bad. It means that the UDP or TCP source port will never affect which lagg member is selected on a 3-port lagg. At 10Gbps, I was not able to measure any difference in CPU consumption between the old and new hash. Submitted by: asomers (original commit) Reviewed by: emaste, glebius MFC after: 1 week Sponsored by: Spectra Logic MFSpectraBSD: 1001723 on 2013/08/28 (original) 1114258 on 2015/01/22 (edit)	2015-01-23 00:06:35 +00:00
Gleb Smirnoff	efc6c51ffa	Back out r276841, r276756, r276747, r276746. The change in r276747 is very very questionable, since it makes vimages more dependent on each other. But the reason for the backout is that it screwed up shutting down the pf purge threads, and now kernel immedially panics on pf module unload. Although module unloading isn't an advertised feature of pf, it is very important for development process. I'd like to not backout r276746, since in general it is good. But since it has introduced numerous build breakages, that later were addressed in r276841, r276756, r276747, I need to back it out as well. Better replay it in clean fashion from scratch.	2015-01-22 01:23:16 +00:00
Adrian Chadd	b2bdc62a95	Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific bits. The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future. * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc; * topelitz.[ch] is now in net/ rather than netinet/; * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.) * netinet/in_rss.[ch] now just contains the IPv4 specific methods; * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods. This should have no functional impact on anyone currently using the RSS support. Differential Revision: D1383 Reviewed by: gnn, jfv (intel driver bits)	2015-01-18 18:06:40 +00:00
Andrey V. Elsukov	504289ea5a	Fix condition and really sort ports. Also add comment describing the intent of this code. Reported by: sbruno MFC after: 1 week Sponsored by: Yandex LLC	2015-01-17 11:32:09 +00:00
Alexander V. Chernikov	29e0d65d7a	Eliminate SIOCGIFADDR handling in bpf. Quoting 19 years bpf.4 manual from bpf-1.2a1: " (SIOCGIFADDR is obsolete under BSD systems. SIOCGIFCONF should be used to query link-level addresses.) " * SIOCGIFADDR was not imported in NetBSD (bpf.c 1.36) and OpenBSD. * Last bits (e.g. manpage claiming SIOCGIFADDR exists) was cleaned from NetBSD via kern/21513 5 years ago, from OpenBSD via documentation/6352 5 years ago.	2015-01-16 10:09:28 +00:00
Andrey V. Elsukov	9c0265c6fd	Restore Ethernet-within-IP Encapsulation support that was broken after r273087. Move all checks from gif_output() into gif_transmit(). Previously they were checked always, because if_start always called gif_output. Now gif_transmit() can be called directly from if_bridge() code and we need do checks here. PR: 196646 MFC after: 1 week	2015-01-10 08:28:50 +00:00
Andrey V. Elsukov	3a9f9af803	Use if_name() macro instead of ifp->if_xname. MFH: 1 week	2015-01-10 03:29:17 +00:00
Andrey V. Elsukov	84d03ddad3	Fix an error introduced in r274246. Pass mtag argument into m_tag_locate() to continue the search from the last found mtag. X-MFC after: r274246	2015-01-10 03:26:46 +00:00
Andrey V. Elsukov	c26230adee	Move the recursion detection code into separate function gif_check_nesting(). Also make MTAG_GIF definition private to if_gif.c. MFC after: 1 week	2015-01-10 03:13:16 +00:00
Alexander V. Chernikov	ecf09f8321	Fix typo. Submitted by: Olivér Pintér	2015-01-09 20:29:13 +00:00
Alexander V. Chernikov	d63e657c04	* Deal with ARCNET L2 multicast mapping for IPv6 the same way as in IPv4: handle it in arc_output() instead of nd6_storelladdr(). * Remove IFT_ARCNET check from arpresolve() since arc_output() does not use arpresolve() to handle broadcast/multicast. This check was there since r84931. It looks like it was not used since r89099 (initial import of Arcnet support where multicast is handled separately). * Remove IFT_IEEE1394 case from nd6_storelladdr() since firewire_output() calles nd6_storelladdr() for unicast addresses only. * Remove IFT_ARCNET case from nd6_storelladdr() since arc_output() now handles multicast by itself. As a result, we have the following pattern: all non-ethernet-style media have their own multicast map handling inside their appropriate routines. On the other hand, arpresolve() (and nd6_storelladdr()) which meant to be 'generic' ones de-facto handles ethernet-only multicast maps. MFC after: 3 weeks	2015-01-09 12:56:51 +00:00
Xin LI	681ed54caa	MFV r276759: libpcap 1.6.2. MFC after: 1 month	2015-01-06 22:29:12 +00:00
Craig Rodrigues	8d665c6ba8	Reapply previous patch to fix build. PR: 194515	2015-01-06 16:47:02 +00:00
Craig Rodrigues	c75820c756	Merge: r258322 from projects/pf branch Split functions that initialize various pf parts into their vimage parts and global parts. Since global parts appeared to be only mutex initializations, just abandon them and use MTX_SYSINIT() instead. Kill my incorrect VNET_FOREACH() iterator and instead use correct approach with VNET_SYSINIT(). PR: 194515 Differential Revision: D1309 Submitted by: glebius, Nikos Vassiliadis <nvass@gmx.com> Reviewed by: trociny, zec, gnn	2015-01-06 08:39:06 +00:00
Alexander V. Chernikov	3a7498636a	* Allocate hash tables separately * Make llt_hash() callback more flexible * Default hash size and hashing method is now per-af * Move lltable allocation to separate function	2015-01-05 17:23:02 +00:00
Alexander V. Chernikov	b44a7d5d87	* Use unified code for deleting entry by sockaddr instead of per-af one. * Remove now unused llt_delete_addr callback.	2015-01-03 19:09:06 +00:00
Alexander V. Chernikov	20dd899505	* Hide lltable implementation details in if_llatbl_var.h * Make most of lltable_* methods 'normal' functions instead of inline * Add lltable_get_<af\|ifp>() functions to access given lltable fields * Temporarily resurrect nd6_lookup() function	2015-01-03 16:04:28 +00:00
Andrey V. Elsukov	f188f14d43	Extern declarations in C files loses compile-time checking that the functions' calls match their definitions. Move them to header files. Reviewed by: jilles (previous version)	2014-12-25 21:32:37 +00:00
Andrey V. Elsukov	06cd035ab6	Remove if_stf.h. It contains only one function declaration used by if_stf(4). Also make in_stf_protosw structure static.	2014-12-23 20:54:59 +00:00
Andrey V. Elsukov	132c449079	Remove in_gif.h and in6_gif.h files. They only contain function declarations used by gif(4). Instead declare these functions in C files. Also make some variables static.	2014-12-23 16:17:37 +00:00
John Baldwin	fd22444c4f	Provide a dead version of if_get_counter. Submitted by: glebius Reported by: np	2014-12-12 16:10:42 +00:00
Alexander V. Chernikov	ee7e9a4e17	* Do not assume lle has sockaddr key after struct lle: use llt_fill_sa_entry() llt method to store lle address in sa. * Eliminate L3_ADDR macro and either reference IPv4/IPv6 address directly from lle or use newly-created llt_fill_sa_entry(). * Do not store sockaddr inside arp/ndp lle anymore.	2014-12-09 00:48:08 +00:00
Alexander V. Chernikov	d82ed5051c	Simplify lle lookup/create api by using addresses instead of sockaddrs.	2014-12-08 23:23:53 +00:00
Alexander V. Chernikov	73b52ad896	Use llt_prepare_static_entry method to prepare valid per-af static entry.	2014-12-07 23:59:44 +00:00
Alexander V. Chernikov	0368226e65	* Retire abstract llentry_free() in favor of lltable_drop_entry_queue() and explicit calls to RTENTRY_FREE_LOCKED() * Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6. * Rename <lltable_\|llt>_delete function to _delete_addr() to note that this function is used to external callers. Make this function maintain its own locking. * Use lookup/unlink/clear call chain from internal callers instead of delete_addr. * Fix LLE_DELETED flag handling	2014-12-07 23:08:07 +00:00
Alexander V. Chernikov	721cd2e032	Do not enforce particular lle storage scheme: * move lltable allocation to per-domain callbacks. * make llentry_link/unlink functions overridable llt methods. * make hash table traversal another overridable llt method.	2014-12-07 17:32:06 +00:00
Alexander V. Chernikov	a743ccd468	* Add llt_clear_entry() callback which is able to do all lle cleanup including unlinking/freeing * Relax locking in lltable_prefix_free_af/lltable_free * Do not pass @llt to lle free callback: it is always NULL now. * Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding unlock/lock sequinces * Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc() to retrieve preferred source address from lle hold queue and pass it instead of lle. * Finally, make nd6_create() create and return unlocked lle * Separate defrtr handling code from nd6_free(): use nd6_check_del_defrtr() to check if we need to keep entry instead of performing GC, use nd6_check_recalc_defrtr() to perform actual recalc on lle removal. * Move isRouter handling from nd6_cache_lladdr() to separate nd6_check_router() * Add initial code to maintain lle runtime flags in sync.	2014-12-07 15:42:46 +00:00
Andrey V. Elsukov	2dfcd0ae9d	Remove unneded check. No need to do m_pullup to the size that we prepended. MFC after: 1 week Sponsored by: Yandex LLC	2014-12-02 05:41:03 +00:00
Hans Petter Selasky	c25290420e	Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but does no longer have any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies	2014-12-01 11:45:24 +00:00
Alexander V. Chernikov	ce313fdd71	* Unify lle table dump/prefix removal code. * Rename lla_XXX -> lltable_XXX_lle to reduce number of name prefixes used by lltable code.	2014-11-30 14:35:01 +00:00
Alexander V. Chernikov	5d14e4cd76	Provide rte_<get\|set> methods to access rtentry for external consumers.	2014-11-29 19:27:43 +00:00
Alexander V. Chernikov	1be1588acf	* Make ifa_add_loopback_route() prepare gw before insertion. * Temporarily move ifa_switch_loopback_route() implementation to route.c	2014-11-29 15:02:45 +00:00
Bjoern A. Zeeb	2c3774c183	After r275196 unbreak NOIP and NOINET kernels by hiding an otherwise unused varibale under the proper #ifdef.	2014-11-28 14:51:49 +00:00
Alexander V. Chernikov	1a3a2b6798	Fix build broken by r275195.	2014-11-27 23:10:03 +00:00
Alexander V. Chernikov	74860d4f7c	Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr - return lle flags IFF needed. Do not pass rte to arpresolve - pass is_gateway flag instead.	2014-11-27 23:06:25 +00:00
Alexander V. Chernikov	c69aeaad14	Do not try to copy header to @dst and than back to ethernet in case of pseudo_AF_HDRCMPLT: we copy media header from mbuf to 'struct sockaddr' @dst in bpf_movein, so mbuf already contains valid info.	2014-11-27 21:29:19 +00:00
Philip Paeps	894d1973f1	Add a sysctl `net.link.tap.deladdrs_on_close' to configure whether tap should delete configured addresses and routes when the interface is closed. Default is enabled (preserve current behaviour). MFC after: 1 week	2014-11-24 14:00:27 +00:00
Alexander V. Chernikov	acbc394dbe	Finish r274335#2: put RT_LOCK_DESTROY() back.	2014-11-23 17:47:12 +00:00
Alexander V. Chernikov	ec25679569	Do not try to unlock lle which is not locked. This is not a proper fix, proper one is on the way.	2014-11-23 17:45:49 +00:00
Alexander V. Chernikov	73d770287d	Do more fine-grained lltable locking: use table runtime lock as rare as we can.	2014-11-23 15:38:06 +00:00
Alexander V. Chernikov	9479029b1f	* Add lltable llt_hash callback * Move lltable items insertions/deletions to generic llt code.	2014-11-23 12:15:28 +00:00
Alexander V. Chernikov	7c066c18db	Use less-invasive approach for IF_AFDATA lock: convert into 2 locks: use rwlock accessible via external functions (IF_AFDATA_CFG_* -> if_afdata_cfg_()) for all control plane tasks use rmlock (IF_AFDATA_RUN_) for fast-path lookups.	2014-11-22 19:53:36 +00:00
Alexander V. Chernikov	27688dfe1d	Temporarily revert r274774.	2014-11-22 17:57:54 +00:00
Alexander V. Chernikov	2e47d2f953	Mark ifaddr/rtsock static entries RLLE_VALID.	2014-11-21 23:37:59 +00:00
Alexander V. Chernikov	9883e41b4b	Switch IF_AFDATA lock to rmlock	2014-11-21 02:28:56 +00:00
Alexander V. Chernikov	aca894e07b	Finish sync: remove if_faith.c	2014-11-21 01:27:27 +00:00
Alexander V. Chernikov	4d56c133fb	Sync to HEAD@r274766	2014-11-21 01:22:33 +00:00
Alexander V. Chernikov	f9723c7705	Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp we need to return. Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags fields for all relevant functions.	2014-11-20 22:41:59 +00:00
Alexander V. Chernikov	7f948f12f6	Finish r274175: do control plane MTU tracking. Update route MTU in case of ifnet MTU change. Add new RTF_FIXEDMTU to track explicitly specified MTU. Old behavior: ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU. User has to manually update all routes. ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU. However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu gets updated. New behavior: ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them. ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu. route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag. route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag. PR: 194238 MFC after: 1 month CR: D1125	2014-11-17 01:05:29 +00:00
Alexander V. Chernikov	df629abf3e	Rework LLE code locking: * struct llentry is now basically split into 2 pieces: all fields within 64 bytes (amd64) are now protected by both ifdata lock AND lle lock, e.g. you require both locks to be held exclusively for modification. All data necessary for fast path operations is kept here. Some fields were added: - r_l3addr - makes lookup key liev within first 64 bytes. - r_flags - flags, containing pre-compiled decision whether given lle contains usable data or not. Current the only flag is RLLE_VALID. - r_len - prepend data len, currently unused - r_kick - used to provide feedback to control plane (see below). All other fields are protected by lle lock. * Add simple state machine for ARP to handle "about to expire" case: Current model (for the fast path) is the following: - rlock afdata - find / rlock rte - runlock afdata - see if "expire time" is approaching (time_uptime + la->la_preempt > la->la_expire) - if true, call arprequest() and decrease la_preempt - store MAC and runlock rte New model (data plane): - rlock afdata - find rte - check if it can be used using r_* fields only - if true, store MAC - if r_kick field != 0 set it to 0. - runlock afdata New mode (control plane): - schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries) seconds instead of V_arpt_keep. - on first timer invocation change state from ARP_LLINFO_REACHABLE to ARP_LLINFO_VERIFY, sets r_kick to 1 and shedules next call in V_arpt_rexmit (default to 1 sec). - on subsequent timer invocations in ARP_LLINFO_VERIFY state, checks for r_kick value: reschedule if not changed, and send arprequest() if set to zero (e.g. entry was used). * Convert IPv4 path to use new single-lock approach. IPv6 bits to follow. * Slow down in_arpinput(): now valid reply will (in most cases) require acquiring afdata WLOCK twice. This is requirement for storing changed lle data. This change will be slightly optimized in future. * Provide explicit hash link/unlink functions for both ipv4/ipv6 code. This will probably be moved to generic lle code once we have per-AF hashing callback inside lltable. * Perform lle unlink on deletion immediately instead of delaying it to the timer routine. * Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the presence of lle reference used for safe callout calls.	2014-11-16 20:12:49 +00:00
Alexander V. Chernikov	98af5b3ad8	Finish r274335: * put RT_LOCK_DESTROY() back * remove unused RT_UNLOCK_COND macro	2014-11-16 18:44:46 +00:00
Alexander V. Chernikov	ac2cf5d37e	Revert r274585: rte lock is properly destroyed in uma dtor callback. Pointed by: glebius	2014-11-16 18:15:23 +00:00
Alexander V. Chernikov	206344ac05	Remove unused rt_endzero define. Remove rt_mtx from public rtentry version.	2014-11-16 15:31:49 +00:00
Alexander V. Chernikov	3cb04899de	Make witness happy: destroy rte lock before free. MFC after: 2 weeks	2014-11-16 14:56:31 +00:00
Alexander V. Chernikov	b4b1367ae4	* Move lle creation/deletion from lla_lookup to separate functions: lla_lookup(LLE_CREATE) -> lla_create lla_lookup(LLE_DELETE) -> lla_delete Assume lla_create to return LLE_EXCLUSIVE lock for lle. * Rework lla_rt_output to perform all lle changes under afdata WLOCK. * change arp_ifscrub() ackquire afdata WLOCK, the same as arp_ifinit().	2014-11-15 18:54:07 +00:00
Hans Petter Selasky	3c7c188c16	Fix some minor TSO issues: - Improve description of TSO limits. - Remove a not needed KASSERT() - Remove some not needed variable casts. Sponsored by: Mellanox Technologies Discussed with: lstewart @ MFC after: 1 week	2014-11-11 12:05:59 +00:00
Gleb Smirnoff	00f22c06e8	Move struct ether_vlan_header to ethernet.h, out of if_vlan_var.h, since this structure is protocol definition, not part of implementation.	2014-11-11 10:22:33 +00:00
Luigi Rizzo	0506889c15	return kernel-supplied error if available. Also fix field names in a comment.	2014-11-10 08:31:56 +00:00
Alexander V. Chernikov	f7bab8d0dd	Switch route radix to dual-lock model: use rmlock for data patch access, and config rwlock for conrol plane processing. Route table changes require bock locks held.	2014-11-10 00:07:06 +00:00
Alexander V. Chernikov	69d149adf5	Since we no longer return individual radix entries, it is not possible to do per-rte accounting. Remove rt_kpktsent.	2014-11-09 22:59:21 +00:00
Alexander V. Chernikov	36f34ac70b	Fix nd6_output_flush() prototype. Remove 'net/route_internal.h' header from stf.	2014-11-09 22:16:50 +00:00
Alexander V. Chernikov	603eaf792b	Renove faith(4) and faithd(8) from base. It looks like industry have chosen different (and more traditional) stateless/statuful NAT64 as translation mechanism. Last non-trivial commits to both faith(4) and faithd(8) happened more than 12 years ago, so I assume it is time to drop RFC3142 in FreeBSD. No objections from: net@	2014-11-09 21:33:01 +00:00
Alexander V. Chernikov	1f26a13f70	Remove net/route_internal header from if_disc and if_faith.	2014-11-09 16:58:36 +00:00
Alexander V. Chernikov	033074c440	Replace 'struct route ' if_output() argument with 'struct nhop_info '. Leave 'struct route' as is for legacy routing api users. Remove most of rtalloc_ign*-derived functions.	2014-11-09 16:33:04 +00:00
Gleb Smirnoff	1241937290	Remove remnants of if_ef(4).	2014-11-09 11:13:15 +00:00
Gleb Smirnoff	4ea05db88e	Use standard mtx(9), rwlock(9), sx(9) system initialization macros instead of doing initialization manually. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-09 11:11:08 +00:00
Alexander V. Chernikov	ea491b8afd	Remove unused fields from old radix_node_head.	2014-11-09 00:43:14 +00:00
Alexander V. Chernikov	55e5eda676	Separate radix and routing: use different structures for route and for other customers. Introduce new 'struct rib_head' for routing purposes and make all routing api use it.	2014-11-09 00:36:39 +00:00
Alexander V. Chernikov	a9413f6ca0	Sync to HEAD@r274297.	2014-11-08 18:13:35 +00:00
Alexander V. Chernikov	1398ffe5bc	Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users to use new rt_foreach_fib() instead of hand-rolling cycles.	2014-11-08 16:38:15 +00:00
Bjoern A. Zeeb	4dbd7c5dc4	After r274246 make the tree compile again. gcc requires variables to be initialised in two places. One of them is correctly used only under the same conditional though. For module builds properly check if the kernel supports INET or INET6, as otherwise various mips kernels without IPv6 support would fail to build.	2014-11-08 14:41:32 +00:00
Gleb Smirnoff	f4507b7166	ifindex_alloc_locked() never fails and doesn't have no-lock version, so change the prototype. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-08 07:23:01 +00:00
Alexander V. Chernikov	22b08fd8b7	Split radix implementation and system route table structure: use new 'struct radix_head' for radix.	2014-11-07 22:52:02 +00:00
Alexander V. Chernikov	389d731d64	Provide typedefs for radix functions.	2014-11-07 22:02:44 +00:00
Andrey V. Elsukov	f325335caf	Overhaul if_gre(4). Split it into two modules: if_gre(4) for GRE encapsulation and if_me(4) for minimal encapsulation within IP. gre(4) changes: * convert to if_transmit; * rework locking: protect access to softc with rmlock, protect from concurrent ioctls with sx lock; * correct interface accounting for outgoing datagramms (count only payload size); * implement generic support for using IPv6 as delivery header; * make implementation conform to the RFC 2784 and partially to RFC 2890; * add support for GRE checksums - calculate for outgoing datagramms and check for inconming datagramms; * add support for sending sequence number in GRE header; * remove support of cached routes. This fixes problem, when gre(4) doesn't work at system startup. But this also removes support for having tunnels with the same addresses for inner and outer header. * deprecate support for various GREXXX ioctls, that doesn't used in FreeBSD. Use our standard ioctls for tunnels. me(4): * implementation conform to RFC 2004; * use if_transmit; * use the same locking model as gre(4); PR: 164475 Differential Revision: D1023 No objections from: net@ Relnotes: yes Sponsored by: Yandex LLC	2014-11-07 19:13:19 +00:00
Gleb Smirnoff	833e8dc5ab	Remove struct arpcom. It is unused by most interface types, that allocate it, except Ethernet, where it carried ng_ether(4) pointer. For now carry the pointer in if_l2com directly. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-07 15:14:10 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Gleb Smirnoff	e6abef0918	Remove useless structure ifindex_entry. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-07 09:15:39 +00:00
Alexander V. Chernikov	043d919e33	Add new rib4/rib6 series of functions returning per-rte info packed on stack. Convert ng_netflow to use new routing API.	2014-11-07 02:04:48 +00:00
Alexander V. Chernikov	064b1bdb2d	Convert lle rtchecks to use new routing API. For inet/ case, this involves reverting r225947 which seem to be pretty strange commit and should be reverted in HEAD ad well.	2014-11-06 23:35:22 +00:00
Alexander V. Chernikov	57c3556b58	Fix build. Pointy hat to: melifaro	2014-11-06 17:50:35 +00:00
Alexander V. Chernikov	146a181f28	Finish r274118: remove useless fields from struct domain. Sponsored by: Yandex LLC	2014-11-06 14:39:04 +00:00
Alexander V. Chernikov	1a75e3b20f	Make checks for rt_mtu generic: Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking might be an option in some situation, it is not feasible to do MTU checks there: generic (or per-domain) routing code is perfectly capable of doing this. We currrently have 3 places where MTU is altered: 1) route addition. In this case domain overrides radix _addroute callback (in[6]_addroute) and all necessary checks/fixes are/can be done there. 2) route change (especially, GW change). In this case, there are no explicit per-domain calls, but one can override rte by setting ifa_rtrequest hook to domain handler (inet6 does this). 3) ifconfig ifaceX mtu YYYY In this case, we have no callbacks, but ip[6]_output performes runtime checks and decreases rt_mtu if necessary. Generally, the goals are to be able to handle all MTU changes in control plane, not in runtime part, and properly deal with increased interface MTU. This commit changes the following: * removes hooks setting MTU from drivers side * adds proper per-doman MTU checks for case 1) * adds generic MTU check for case 2) * The latter is done by using new dom_ifmtu callback since if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size. However, IPv6 mtu might be different from if_mtu one (e.g. default 1280) for some cases, so we need an abstract way to know maximum MTU size for given interface and domain. * moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies user-supplied data which must be checked. * removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to use this functions on new non-inserted rte. More changes will follow soon. MFC after: 1 month Sponsored by: Yandex LLC	2014-11-06 13:13:09 +00:00
Alexander V. Chernikov	69b74805d5	Convert gif and stf to use new routing api.	2014-11-04 18:48:13 +00:00
Alexander V. Chernikov	5c9ef37854	Sync to HEAD@r274095.	2014-11-04 18:22:33 +00:00
Alexander V. Chernikov	8c3cfe0be0	Hide 'struct rtentry' and all its macro inside new header: net/route_internal.h The goal is to make its opaque for all code except route/rtsock and proto domain _rmx.	2014-11-04 17:28:13 +00:00
Alexander V. Chernikov	a9ac00b76b	Convert in6p_lookup_mcast_ifp() to use new routing api. * Add special fib6_lookup_nh_ifp() to return rt_ifp instead of rt_ifa->ifa_ifp for that.	2014-11-04 17:05:24 +00:00
Alexander V. Chernikov	257480b8ab	Convert netinet6/ to use new routing API. * Remove &ifpp from ip6_output() in favor of ri->ri_nh_info * Provide different wrappers to in6_selectsrc: Currently it is used by 2 differenct type of customers: - socket-based one, which all are unsure about provided address scope and - in-kernel ones (ND code mostly), which don't have any sockets, options, crededentials, etc. So, we provide two different wrappers to in6_selectsrc() returning select source. * Make different versions of selectroute(): Currenly selectroute() is used in two scenarios: - SAS, via in6_selecsrc() -> in6_selectif() -> selectroute() - output, via in6_output -> wrapper -> selectroute() Provide different versions for each customer: - fib6_lookup_nh_basic()-based in6_selectif() which is capable of returning interface only, without MTU/NHOP/L2 calculations - full-blown fib6_selectroute() with cached route/multipath/ MTU/L2 * Stop using routing table for link-local address lookups * Add in6_ifawithifp_lla() to make for-us check faster for link-local * Add in6_splitscope / in6_setllascope for faster embed/deembed scopes	2014-11-04 15:39:56 +00:00
Hans Petter Selasky	f8ca61996e	Clarify TSO segment limit comment and remove two TABs to make lines a bit shorter. Sponsored by: Mellanox Technologies	2014-11-03 13:02:58 +00:00
Mark Murray	10cb24248a	This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random. This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources. The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people. The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway. Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to. My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise. My Nomex pants are on. Let the feedback commence! Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?) Approved by: so(des)	2014-10-30 21:21:53 +00:00
Konstantin Belousov	0a2c94b86e	Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:28:20 +00:00
Hans Petter Selasky	0e1152fcc2	The SYSCTL data pointers can come from userspace and must not be directly accessed. Although this will work on some platforms, it can throw an exception if the pointer is invalid and then panic the kernel. Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-28 12:00:39 +00:00
Alexander V. Chernikov	30514718e7	Convert several places inside netinet6/ to new api.	2014-10-25 22:53:08 +00:00
Alexander V. Chernikov	7b42f6fae2	* Convert TOE framework to use new routing api. * Add fib6_lookup_nh_ext(). * Rename union structures: nhop64_basic -> nhopu_basic, nhop64_extended -> nhopu_extended	2014-10-25 18:25:00 +00:00
Alexander V. Chernikov	9f65116cc1	* Increase nh_flags to be u16 thus reducing nhop payload to be 48 bytes * Use NHF_ namespace for all nhop flags * Rename nhop_data -> nhop_prepend * Rename fib4_lookup_nh_extended -> fib4_lookup_nh_ext * Add "flags" argument to fib4_lookup_nh_ext() to specify whether we want returned nh_ext structure to be refcounted or not.	2014-10-25 15:32:56 +00:00
Alexander V. Chernikov	b863adaaf3	Convert last piece of ip_forward to use new rouing api.	2014-10-24 22:00:25 +00:00
Andrey V. Elsukov	a663aa4ce8	Remove redundant check and m_pullup() call.	2014-10-24 13:34:22 +00:00
Alexander V. Chernikov	f50706648c	Add new fib4_lookup_nh_extended() which fills in nhop4_extended structure without doinf L2 resolve. It also requires freeing references by calling fib4_free_nh_ext(). Convert in_pcbladdr() to use it. Convert tcp_maxmtu() to use it.	2014-10-23 23:11:04 +00:00
Alexander V. Chernikov	2bb83c79f6	Rename ip_sendmbuf to fib4_sendmbuf() and move it to rt_nhops api. Convert IPv4 SAS to use new routing api.	2014-10-23 21:09:14 +00:00
Andrey V. Elsukov	61dc434406	Move if_get_counter initialization from if_attach into if_alloc. Also, initialize all counters before ifnet will become available in the system. This fixes possible access to uninitialized ifned fields. PR: 194550	2014-10-23 14:29:52 +00:00
Luigi Rizzo	11a5be0f60	since we cast a pointer, use the correct signedness (this was already in, and got lost in a recent update).	2014-10-22 18:55:36 +00:00
Bryan Venteicher	854f7e89e6	Use the size of the Ethernet address, not the entire header, when copying into forwarding entry. Reported by: Coverity CID: 1248849	2014-10-21 05:45:57 +00:00
Bryan Venteicher	007054f070	Add vxlan interface vxlan creates a virtual LAN by encapsulating the inner Ethernet frame in a UDP packet. This implementation is based on RFC7348. Currently, the IPv6 support is not fully compliant with the specification: we should be able to receive UPDv6 packets with a zero checksum, but we need to support RFC6935 first. Patches for this should come soon. Encapsulation protocols such as vxlan emphasize the need for the FreeBSD network stack to support batching, GRO, and GSO. Each frame has to make two trips through the network stack, and each frame will be at most MTU sized. Performance suffers accordingly. Some latest generation NICs have begun to support vxlan HW offloads that we should also take advantage of. VIMAGE support should also be added soon. Differential Revision: https://reviews.freebsd.org/D384 Reviewed by: gnn Relnotes: yes	2014-10-20 14:42:42 +00:00
Alexander V. Chernikov	b4e8f808bf	Switch IPv4 output path to use new routing api. The goals of the new API is to provide consumers with minimal needed information, but as fast as possible. So we provide full nexthop info copied into alighed on-cache structure instead of rte/ia pointers, their refcounts and locks. This does not provide solution for protecting from egress ifp destruction, but does not make it any worse. Current changes: nhops: Add fib4_lookup_prepend() function which stores either full L2+L3 prepend info (e.g. MAC header in case of plain IPv4) or L3 info with NH_FLAGS_L2_INCOMPLETE flag indicating that no valid L2 info exists and we have to take "slow" path. ip_output: Currently ip[ 46]_output consumers use 'struct route' for the following purposes: 1) double lookup avoidance(route caching) 2) plain route caching 3) get path MTU to be able to notify source. The former pattern is mostly used by various tunnels (gif, gre, stf). (Actually, gre is the only remaining, others were already converted. Their locking model did not scale good enogh to benefit from such caching, so we have (temporarily) removed it without any performance loss). Plain route caching used by SCTP is simply wrong and should be removed. Temporary break it for now just to be able to compile. Optimize path mtu reporting by providing it in new 'route_info' stucture. Minimize games with @ia locking/refcounting for route lookup: add special nhop[46]_extended structure to store more route attributes. Pointer to given structure can be passed to fib4_lookup_prepend() to indicate we want this info (we actually needs it for UDP and raw IP). ether_output: Provide light-weight ether_output2() call to deal with transmitting L2 frame (e.g. properly handle broadcast/simloop/bridge/ other L2 hooks before actually transmitting frame by if_transmit()). Add a hack based on new RT_NHOP ro_flag to distinguish which version should we call. Better way is probably to add a new "if_output_frame" driver callbacks. Next steps: * Convert ip_fastfwd part * Implement auto-growing array for per-radix nexthops * Implement LLE tracking for nexthop calculations to be able to immediately provide all necessary info in single route lookup for gateway routes * Switch radix locking scheme to runtime/cfg lock * Implement multipath support for rtsock * Implement "tracked nexthops" for tunnels (e.g. _proper_ nexthop caching) * Add IPv6 support for remaining parts (postponed not to interfere with user/ae/inet6 branch) * Consider adding "if_output_frame" driver call to ease logical frame pushing.	2014-10-19 21:07:35 +00:00
Alexander V. Chernikov	d74b9a2c6a	* Remove route caching in if_stf. * Copy necessary in6_ifa on stack instead of playing with refcounts.	2014-10-17 15:07:04 +00:00

... 2 3 4 5 6 ...

3608 Commits