freebsd-nq

Author	SHA1	Message	Date
Alexander Motin	b98b5ae8ec	Add interface reference counting to if_lagg. Using plain ifunit() looks like request for troubles. MFC after: 2 weeks	2017-04-21 13:45:01 +00:00
Jung-uk Kim	af1973281e	Use kmem_malloc() instead of malloc(9) for the native amd64 filter. r316767 broke the BPF JIT compiler for amd64 because malloc()'d space is no longer executable. Discussed with: kib, alc	2017-04-17 22:02:09 +00:00
Jung-uk Kim	c7ff2b13d1	Remove an unnecessary declaration missed in the previous commit.	2017-04-17 21:57:23 +00:00
Jung-uk Kim	e329e330d4	Move declarations for a machine-dependent function to the header file.	2017-04-17 21:51:26 +00:00
Patrick Kelsey	2f8c6c0a58	Fix userland tools that don't check the format of routing socket messages before accessing message fields that may not be present, removing dead/duplicate/misleading code along the way. Document the message format for each routing socket message in route.h. Fix a bug in usr.bin/netstat introduced in r287351 that resulted in pointer computation with essentially random 16-bit offsets and dereferencing of the results. Reviewed by: ae MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D10330	2017-04-16 19:17:10 +00:00
Andrey V. Elsukov	070f87e878	Inherit IPv6 checksum offloading flags to vlan interfaces. if_vlan(4) interfaces inherit IPv4 checksum offloading flags from the parent when VLAN_HWCSUM and VLAN_HWTAGGING flags are present on the parent interface. Do the same for IPv6 checksum offloading flags. Reported by: Harry Schmalzbauer Reviewed by: np, gnn MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10356	2017-04-11 19:23:25 +00:00
Andrey V. Elsukov	c00bf73076	Do not adjust interface MTU automatically. Leave this task to the system administrator. This restores the behavior that was prior to r274246. No objection from: #network MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10215	2017-04-11 08:56:18 +00:00
Patrick Kelsey	75580d5881	Fixed typo in comment found while reading commit email for fix of other typo in same comment. ned -> need MFC after: 3 days	2017-04-08 04:50:50 +00:00
Patrick Kelsey	59f35a8290	Fixed typo in comment. patckets -> packets MFC after: 3 days	2017-04-08 04:45:52 +00:00
Patrick Kelsey	68ce5a03a2	Fix typo in comment. logest -> longest MFC after: 3 days	2017-04-08 04:37:01 +00:00
Sean Bruno	60596476cf	Move pause frame counter out of struct if_ctx and into struct if_softc_ctx_t so that we can use it in iflib to detect pause frames. The igb(4) driver definitely used to use this in its old timer function and I see no reason to restrict it to that driver only. Sponsored by: Limelight Networks	2017-04-07 00:33:03 +00:00
Sean Bruno	ea351d3f14	Allow MSIX to be turned off by tuneable per interface, per driver. Sponsored by: Limelight Networks	2017-04-04 21:03:34 +00:00
Andrey V. Elsukov	88d950a650	Remove "IPFW static rules" rmlock. Make PFIL's lock global and use it for this purpose. This reduces the number of locks needed to acquire for each packet. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC No objection from: #network Differential Revision: https://reviews.freebsd.org/D10154	2017-04-03 13:35:04 +00:00
Sean Bruno	2b2fc97356	Don't call init functions directly from the timer/watchdog function. Enqueue this in the admin task now that it can process it. Submitted by: Matt Macy <mmacy@nextbsd.org> Sponsored by: Limelight Networks	2017-03-30 16:54:01 +00:00
Sean Bruno	5c1ff25517	Assert IFF_DRV_OACTIVE in iflib_timer() when the "hung" case is detected so that iflib's admin task can still process the reset directive and restore functionality. Sponsored by: Limelight Networks	2017-03-30 16:03:51 +00:00
Andrey V. Elsukov	af48c203d6	Make pfil's locking macros private. Obtained from: Yandex LLC MFC after: 1 week	2017-03-27 08:18:13 +00:00
Andrey V. Elsukov	52b8eb0b31	Declare module version. MFC after: 1 week	2017-03-27 07:56:41 +00:00
Ermal Luçi	4e950412ff	Correct handling of ALTQ with epair(4) interfaces but presenting that ALTQ(9) is supported. Approved by: ae MFC after: 2 weeks	2017-03-24 00:55:16 +00:00
Kristof Provost	2f8fb3a868	pf: Fix possible shutdown race Prevent possible races in the pf_unload() / pf_purge_thread() shutdown code. Lock the pf_purge_thread() with the new pf_end_lock to prevent these races. Use a shared/exclusive lock, as we need to also acquire another sx lock (VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload() to sleep, Pointed out by: eri, glebius, jhb Differential Revision: https://reviews.freebsd.org/D10026	2017-03-22 21:18:18 +00:00
Sean Bruno	5e88838850	Change casting to a uintptr_t to be compatible with non-x86 architectures. Submitted by: Matt Macy <mmacy@nextbsd.org> Reported by: rpokala Sponsored by: Limelight Networks	2017-03-14 22:25:07 +00:00
Sean Bruno	0a1b74a3d1	Fixup LINT by using uint64_t type as we do on all other calls to PNMB() Found with Jenkins. Reported by: lwshu Sponsored by: Limelight Networks	2017-03-14 15:08:56 +00:00
Sean Bruno	95246abb21	IFLIB updates - unconditionally enable BUS_DMA on non-x86 architectures - speed up rxd zeroing via customized function - support out of order updates to rxd's - add prefetching to hardware descriptor rings - only prefetch on 10G or faster hardware - add separate tx queue intr function - preliminary rework of NETMAP interfaces, WIP Submitted by: Matt Macy <mmacy@nextbsd.org> Sponsored by: Limelight Networks	2017-03-13 22:53:06 +00:00
Andrey V. Elsukov	250a8e2720	Ignore ifnet renaming in the bpf ifnet departure handler. PR: 213015 MFC after: 1 week	2017-03-13 09:04:10 +00:00
Andrey V. Elsukov	350e622703	Remove now unneeded cast.	2017-03-08 08:09:41 +00:00
Andrey V. Elsukov	22986c6740	Introduce the concept of IPsec security policies scope. Currently are defined three scopes: global, ifnet, and pcb. Generic security policies that IKE daemon can add via PF_KEY interface or an administrator creates with setkey(8) utility have GLOBAL scope. Such policies can be applied by the kernel to outgoing packets and checked against inbound packets after IPsec processing. Security policies created by if_ipsec(4) interfaces have IFNET scope. Such policies are applied to packets that are passed through if_ipsec(4) interface. And security policies created by application using setsockopt() IP_IPSEC_POLICY option have PCB scope. Such policies are applied to packets related to specific socket. Currently there is no way to list PCB policies via setkey(8) utility. Modify setkey(8) and libipsec(3) to be able to distinguish the scope of security policies in the `setkey -DP` listing. Add two optional flags: '-t' to list only policies related to virtual tunneling interfaces, i.e. policies with IFNET scope, and '-g' to list only policies with GLOBAL scope. By default policies from all scopes are listed. To implement this PF_KEY's sadb_x_policy structure was modified. sadb_x_policy_reserved field is used to pass the policy scope from the kernel to userland. SADB_SPDDUMP message extended to support filtering by scope: sadb_msg_satype field is used to specify bit mask of requested scopes. For IFNET policies the sadb_x_policy_priority field of struct sadb_x_policy is used to pass if_ipsec's interface if_index to the userland. For GLOBAL policies sadb_x_policy_priority is used only to manage order of security policies in the SPDB. For IFNET policies it is not used, so it can be used to keep if_index.
After this change the output of `setkey -DP` now looks like: # setkey -DPt 0.0.0.0/0[any] 0.0.0.0/0[any] any in ipsec esp/tunnel/87.250.242.144-87.250.242.145/unique:145 spid=7 seq=3 pid=58025 scope=ifnet ifname=ipsec0 refcnt=1 # setkey -DPg ::/0 ::/0 icmp6 135,0 out none spid=5 seq=1 pid=872 scope=global refcnt=1 No objection from: #network Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9805	2017-03-07 00:13:53 +00:00
Sean Bruno	d945ed6472	Make gtaskqueue compatible with drm-next such that they can be used with the linuxkpi tasklets. Submitted by: mmacy@nextbsd.org Reported by: hps	2017-03-01 18:37:35 +00:00
Warner Losh	607a4c520e	Back out r314471. In https://reviews.freebsd.org/D1858 it was clear that this shouldn't go in. I was unaware when I merged the pull request. I don't wish to upset the status quo, so backout per project practice. Pull Request: https://github.com/freebsd/freebsd/pull/92 Noted by: hrs@	2017-03-01 05:38:04 +00:00
Warner Losh	7d85b06ecf	Fix VNET - DAD detected duplicate IPv6 address Assign a hopefully unique, locally administered etheraddr. - for epairNa & epairNb Submitted by: Catalin <sslevil@users.noreply.github.com> Pull Request: https://github.com/freebsd/freebsd/pull/92	2017-03-01 04:47:22 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Gleb Smirnoff	efe3b0de14	Remove SVR4 (System V Release 4) binary compatibility support. UNIX System V Release 4 is operating system released in 1988. It ceased to exist in early 2000-s.	2017-02-28 05:14:42 +00:00
Jonathan T. Looney	e80039007a	Do some minimal work to better conform to the 802.3ad (LACP) standard. In particular, don't set the synchronized bit for the peer unless it truly appears to be synchronized to us. Also, don't set our own synchronized bit unless we have actually seen a remote system. Prior to this change, we were seeing some strange behavior, such as: 1. We send an advertisement with the Activity, Aggregation, and Default flags, followed by an advertisement with the Activity, Aggregation, Synchronization, and Default flags. However, we hadn't seen an advertisement from another peer and were still advertising the default (NULL) peer. A closer examination of the in-kernel data structures (using kgdb) showed that the system had added the default (NULL) peer as a valid aggregator for the segment. 2. We were receiving an advertisement from a peer that included the default (NULL) peer instead of including our system information. However, we responded with an advertisement that included the Synchronization flag for both our system and the peer. (Since the peer's advertisement did not include our system information, we shouldn't add the synchronization bit for the peer.) This patch corrects those two items. Reviewed by: smh MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9485	2017-02-26 00:19:02 +00:00
Pedro F. Giffuni	e099b90b80	sys: Replace zero with NULL for pointers. Found with: devel/coccinelle MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D9694	2017-02-22 02:35:59 +00:00
Jason A. Harmening	e2a8d17887	Bring back r313037, with fixes for mips: Implement get_pcpu() for amd64/sparc64/mips/powerpc, and use it to replace pcpu_find(curcpu) in MI code. Reviewed by: andreast, kan, lidl Tested by: lidl(mips, sparc64), andreast(powerpc) Differential Revision: https://reviews.freebsd.org/D9587	2017-02-19 02:03:09 +00:00
Xin LI	b86fcc147f	MFV r313759: license change for a few headers (4 clause BSD to 3 clause BSD). MFC after: 28 days X-MFC-with: r313695	2017-02-15 07:22:47 +00:00
Xin LI	ada6f083b9	MFV r313676: libpcap 1.8.1 MFC after: 1 month	2017-02-13 08:23:39 +00:00
Gleb Smirnoff	10addc1eb5	Last consumer of _WANT_RTENTRY gone.	2017-02-10 17:37:04 +00:00
Andrey V. Elsukov	fcf596178b	Merge projects/ipsec into head/. Small summary ------------- o Almost all IPsec related code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is build as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. The only one header file <netipsec/ipsec_support.h> should be included to declare all the needed things to work with IPsec. o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be more close to RFC. o PF_KEY SADB was reworked: - now all security associations stored in the single SPI namespace, and all SAs MUST have unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups in the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet. o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms. o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting. Reviewed by: gnn, wblock Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352	2017-02-06 08:49:57 +00:00
Sean Bruno	67af525c55	Delete duplicate break.	2017-02-04 18:25:09 +00:00
Jason A. Harmening	ad62ba6e96	Revert r313037 The switch to get_pcpu() in MI code seems to cause hangs on MIPS. Back out until we can get a better idea of what's happening there. Reported by: kan, lidl	2017-02-04 06:24:49 +00:00
Jason A. Harmening	65ed483615	Implement get_pcpu() for the remaining architectures and use it to replace pcpu_find(curcpu) in MI code.	2017-02-01 03:32:49 +00:00
Stephen J. Kiernan	d0b2cad1ca	Add the following set accessor functions for recently-added members of ifnet structure: if_gethwtsomax(), if_sethwtsomax() - if_hw_tsomax if_gethwtsomaxsegcount(), if_sethwtsomaxsegcount() - if_hw_tsomaxsegcount if_gethwtsomaxsegsize(), if_sethwtsomaxsegsize() - if_hw_tsomaxsegsize Update em and vnic drivers which had already been converted to use accessor functions for the other ifnet structure members. Reviewed by: erj Approved by: sjg (mentor) Obtained from: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D8544	2017-01-31 16:12:31 +00:00
Luiz Otavio O Souza	13157b2baf	Do not update the lagg link layer address when destroying a lagg clone. This would enqueue an event to send the gratuitous arp on a dying lagg interface without any physical ports attached to it. Apart from that, the taskqueue_drain() on lagg_clone_destroy() runs too late, when the ifp data structure is already freed. Fix that too. Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-30 03:04:33 +00:00
Luiz Otavio O Souza	d177868c16	The stf(4) interface name does not conform with the default naming convention for interfaces, because only one stf(4) interface can exist in the system. This disallow the use of unit numbers different than 0, however, it is possible to create the clone without specify the unit number (wildcard). In the wildcard case we must update the interface name before return. This fix an infinite recursion in pf code that keeps track of network interfaces and groups: 1 - a group for the cloned type of the interface is added (stf in this case); 2 - the system will now try to add an interface named stf (instead of stf0) to stf group; 3 - when pfi_kif_attach() tries to search for an already existing 'stf' interface, the 'stf' group is returned and thus the group is added as an interface of itself; This will now cause a crash at the first attempt to traverse the groups which the stf interface belongs (which loops over itself). Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-29 18:41:09 +00:00
Andriy Voskoboinyk	2bbd06fc33	Garbage collect IFT_IEEE80211 (but leave the define for possible reuse) This interface type ("a parent interface of wlanX") is not used since r287197 Reviewed by: adrian, glebius Differential Revision: https://reviews.freebsd.org/D9308	2017-01-28 17:08:40 +00:00
Sean Bruno	835809f99f	Fix i386 compile failure by moving needed closing parenthesis out of conditional block. Submitted by: hiren Reported by: cy	2017-01-28 15:44:14 +00:00
Dexuan Cui	6597559ea7	ifnet: move the new ifnet_event EVENTHANDLER_DECLARE to net/if_var.h Thank glebius for pointing this out: "The network stuff shall not be added to sys/eventhandler.h" Reviewed by: David_A_Bright_DELL.com, sephe, glebius Approved by: sephe (mentor) MFC after: 2 weeks Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D9345	2017-01-28 07:26:42 +00:00
Sean Bruno	e035717e57	IFLIB updates: We found routing performance dropped significantly when configuring FreeBSD as a router, we are applying the following changes in order to resolve those issues and hopefully perform better. - don't prefetch the flags array, we usually don't need it - prefetch the next cache line of each of the software descriptor arrays as well as the first cache line of each of the next four packets' mbufs and clusters - reduce max copy size to 63 bytes - convert rx soft descriptors from array of structures to a structure of arrays - update copyrights Submitted by: Matt Macy <mmacy@nextbsd.org>	2017-01-27 23:08:06 +00:00
Sean Bruno	96eeabefbe	Replace customized busmaster code with standardized setup call. Reported by: jhb	2017-01-27 22:30:27 +00:00
Sean Bruno	69b7fc3e67	Minor style annoyance. Submitted by: bde	2017-01-26 13:50:09 +00:00
Kristof Provost	ab5cda71df	bridge: Release the bridge lock when calling bridge_set_ifcap() This calls ioctl() handlers for the different interfaces in the bridge. These handlers expect to get called in an ioctl context where it's safe for them to sleep. We may not sleep with the bridge lock held. However, we still need to protect the interface list, to ensure it doesn't get changed while we iterate over it. Use BRIDGE_XLOCK(), which prevents bridge members from being removed. Adding bridge members is safe, because it uses LIST_INSERT_HEAD(). This caused panics when adding xen interfaces to a bridge. PR: 216304 Reviewed by: ae MFC after: 1 week Sponsored by: RootBSD Differential Revision: https://reviews.freebsd.org/D9290	2017-01-25 21:25:26 +00:00
Luiz Otavio O Souza	338e227ac0	After the in_control() changes in r257692, an existing address is (intentionally) deleted first and then completely added again (so all the events, announces and hooks are given a chance to run). This cause an issue with CARP where the existing CARP data structure is removed together with the last address for a given VHID, which will cause a subsequent fail when the address is later re-added. This change fixes this issue by adding a new flag to keep the CARP data structure when an address is not being removed. There was an additional issue with IPv6 CARP addresses, where the CARP data structure would never be removed after a change and lead to VHIDs which cannot be destroyed. Reviewed by: glebius Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2017-01-25 19:04:08 +00:00
Sean Bruno	f7ae9a84e3	Add error checking to the pci_find_cap(, PCIY_MSIX,) call that is returns success and a good value. Only then try to use it and set the MSIX_ENABLE bit. With the current em(4) driver we have observed failures in this case in a specific environment when pci_find_cap() would not return the assumed value, which meant we ended up writing to PCI register 2 (PCI_DEVICE_ID) which is read-only. PR: 216456 Submitted by: bz	2017-01-25 14:37:05 +00:00
Sean Bruno	bd84f70044	iflib: Add internal tracking of smp startup status to reliably figure out what methods are to be used to get gtaskqueue up and running. e1000: Calculating this pointer gives undefined behaviour when (last == -1) (it is before the buffer). The pointer is always followed. Panics occurred when it points to an unmapped page. Otherwise, the pointed-to garbage tends to not have the E1000_TXD_STAT_DD bit set in it, so in the broken case the loop was usually null and the function just returned, and this was accidentally correct. Submitted by: bde Reported by: Matt Macy <mmacy@nextbsd.org>	2017-01-24 16:05:42 +00:00
Sean Bruno	36fa5d5b64	Revert 312696 due to build tests.	2017-01-24 15:55:52 +00:00
Sean Bruno	562a3182f6	iflib: Add internal tracking of smp startup status to reliably figure out what methods are to be used to get gtaskqueue up and running. e1000: Calculating this pointer gives undefined behaviour when (last == -1) (it is before the buffer). The pointer is always followed. Panics occurred when it points to an unmapped page. Otherwise, the pointed-to garbage tends to not have the E1000_TXD_STAT_DD bit set in it, so in the broken case the loop was usually null and the function just returned, and this was accidentally correct. Submitted by: bde Reviewed by: Matt Macy <mmacy@nextbsd.org>	2017-01-24 14:48:32 +00:00
Dexuan Cui	92a6859b91	ifnet: introduce event handlers for ifup/ifdown events Hyper-V's NIC SR-IOV implementation needs a Hyper-V synthetic NIC and a VF NIC to work together, mainly to support seamless live migration. When the VF device becomes UP (or DOWN), the synthetic NIC driver needs to switch the data path from the synthetic NIC to the VF (or the opposite). So the synthetic NIC driver needs to know when a VF device is becoming UP or DOWN and hence the patch is made. Reviewed by: sephe Approved by: sephe (mentor) MFC after: 2 weeks Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8963	2017-01-24 09:19:46 +00:00
Ravi Pokala	d592868ebf	Eliminate misleading comments and dead code in lacp_port_create() Variables "fast" and "active" are both constant in lacp_port_create(), but comments misleadingly suggest that "fast" can be changed via ioctl. The constant values control the value of "lp->lp_state", so it too is constant, and the code for assigning different value to it is essentially dead. Remove both "fast" and "active", and set "lp->lp_state" unconditionally; that gets rid of the dead code and misleading comments. CID: 1305692 CID: 1305734 Reported by: asomers Reviewed by: asomers MFC after: 1 week Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D9302	2017-01-24 01:39:40 +00:00
Ryan Stone	7d309e8e40	Fix reference to free memory in ixgbe/if_media.c When ixgbe receives an interrupt indicating that a new optical module may have been inserted, it discards all of its current media types by calling ifmedia_removeall() and then creates a new set of media types for the supported media on the new module. However, ifmedia_removeall() was maintaining a pointer to whatever the current media type was before the call to ifmedia_removeall(). The result of this was that any attempt to read the current media type of the interface (e.g. via ifconfig) would return potentially garbage data from free memory (or if one were particularly unlucky on an architecture that does not malloc() from a direct map, page fault the kernel). Fix this by NULL'ing out the current media field in if_media.c, and have ixgbe update the current media type after recreating them. Submitted by: Matt Joras <matt.joras AT gmail DOT com> Reviewed by: sbruno, erj MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9164	2017-01-20 17:16:48 +00:00
Hans Petter Selasky	f3e7afe2d7	Implement kernel support for hardware rate limited sockets. - Add RATELIMIT kernel configuration keyword which must be set to enable the new functionality. - Add support for hardware driven, Receive Side Scaling, RSS aware, rate limited sendqueues and expose the functionality through the already established SO_MAX_PACING_RATE setsockopt(). The API support rates in the range from 1 to 4Gbytes/s which are suitable for regular TCP and UDP streams. The setsockopt(2) manual page has been updated. - Add rate limit function callback API to "struct ifnet" which supports the following operations: if_snd_tag_alloc(), if_snd_tag_modify(), if_snd_tag_query() and if_snd_tag_free(). - Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT flag, which tells if a network driver supports rate limiting or not. - This patch also adds support for rate limiting through VLAN and LAGG intermediate network devices. - How rate limiting works: 1) The userspace application calls setsockopt() after accepting or making a new connection to set the rate which is then stored in the socket structure in the kernel. Later on when packets are transmitted a check is made in the transmit path for rate changes. A rate change implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the destination network interface, which then sets up a custom sendqueue with the given rate limitation parameter. A "struct m_snd_tag" pointer is returned which serves as a "snd_tag" hint in the m_pkthdr for the subsequently transmitted mbufs. 2) When the network driver sees the "m->m_pkthdr.snd_tag" different from NULL, it will move the packets into a designated rate limited sendqueue given by the snd_tag pointer. It is up to the individual drivers how the rate limited traffic will be rate limited. 3) Route changes are detected by the NIC drivers in the ifp->if_transmit() routine when the ifnet pointer in the incoming snd_tag mismatches the one of the network interface. 
The network adapter frees the mbuf and returns EAGAIN which causes the ip_output() to release and clear the send tag. Upon next ip_output() a new "snd_tag" will be tried allocated. 4) When the PCB is detached the custom sendqueue will be released by a non-blocking ifp->if_snd_tag_free() call to the currently bound network interface. Reviewed by: wblock (manpages), adrian, gallatin, scottl (network) Differential Revision: https://reviews.freebsd.org/D3687 Sponsored by: Mellanox Technologies MFC after: 3 months	2017-01-18 13:31:17 +00:00
Sean Bruno	4ecb427a49	Fix hangs in a uniprocessor configuration (qemu, virtualbox, real hw). sys/net/iflib.c: Add ctx to filter_info and don't skip interrupt early on unless we're on an SMP system sys/kern/subr_gtaskqueue.c: Skip smp check if we're running UP Submitted by: Matt Macy <mmacy@nextbsd.org> Reported by: emaste bde	2017-01-15 00:50:10 +00:00
Sean Bruno	5b51fcfc7a	Remove unused mtx_held() macro.	2017-01-09 23:41:10 +00:00
Sepherosa Ziehau	cc5bb78be1	if: Defer the if_up until the ifnet.if_ioctl is called. This ensures the interface is initialized by the interface driver before it can be used by the rest of the system. Reviewed by: jhb, karels, gnn MFC after: 3 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8905	2017-01-06 05:10:49 +00:00
Adrian Chadd	f69a29b5f9	[net80211] add VHT media types in the media layer.	2017-01-05 04:49:23 +00:00
Sean Bruno	1248952a50	2017 IFLIB updates in preparation for commits to e1000 and ixgbe. - iflib - add checksum in place support (mmacy) - iflib - initialize IP for TSO (going to be needed for e1000) (mmacy) - iflib - move isc_txrx from shared context to softc context (mmacy) - iflib - Normalize checks in TXQ drainage. (shurd) - iflib - Fix queue capping checks (mmacy) - iflib - Fix invalid assert, em can need 2 sentinels (mmacy) - iflib - let the driver determine what capabilities are set and what tx csum flags are used (mmacy) - add INVARIANTS debugging hooks to gtaskqueue enqueue (mmacy) - update bnxt(4) to support the changes to iflib (shurd) Some other various, sundry updates. Slightly more verbose changelog: Submitted by: mmacy@nextbsd.org Reviewed by: shurd MFC after: Sponsored by: LimeLight Networks and Dell EMC Isilon	2017-01-02 00:56:33 +00:00
Alan Somers	8a73c85db3	Remove stray debugging code from r310180 Reported by: rstone Pointy hat to: asomers MFC after: 3 weeks X-MFC-with: 310180 Sponsored by: Spectra Logic Corp	2016-12-20 15:45:53 +00:00
Alan Somers	d9fa2d67eb	Fix panic during lagg destruction with simultaneous status check If you run "ifconfig lagg0 destroy" and "ifconfig lagg0" at the same time a page fault may result. The first process will destroy ifp->if_lagg in lagg_clone_destroy (called by if_clone_destroy). Then the second process will observe that ifp->if_lagg is NULL at the top of lagg_port_ioctl and goto fallback: where it will promptly dereference ifp->if_lagg anyway. The solution is to repeat the NULL check for ifp->if_lagg MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D8512	2016-12-16 22:39:30 +00:00
Adrian Chadd	fdbc9e6e82	[net80211] start laying down the foundation for 11ac support. This is a work in progress and some of this stuff may change; but hopefully I'm laying down enough stuff and space in fields to allow it to grow without another major recompile. We'll see! * Add a net80211 PHY type for VHT 2G and VHT 5G. Note - yes, VHT is supposed to be for 5GHZ, however some vendors (cough most of them) support some subset of VHT rate support in 2GHz. No - not 80MHz wide channels, but at least some MCS8-9 support, maybe some beamforming, and maybe some longer A-MPDU aggregates. I don't want to even think about MU-MIMO on 2GHz. * Add an ifmedia placeholder type for VHT rates. * Add channel flags for VHT, VHT20/40U/40D/80/80+80/160 * Add channel macros for the above * Add ieee80211_channel fields for the VHT information and flags, along with some padding (so this struct definitely grows.) * Add a phy type flag for VHT - 'v' * Bump the number of channels to a much higher amount - until we get something like the linux mac80211 chanctx abstraction (where the stack provides a current channel configuration via callbacks, versus the driver ever checking ic->ic_curchan or similar) we'll have to populate VHT+HT combinations. Eg, there'll likely be a full set of duplicate VHT20/40 channels to match HT channels. There will also be a full set of duplicate VHT80 channels - note that for VHT80, its assumed you're doing VHT40 as a base, so we don't need a duplicate of VHT80 + 20MHz only primary channels, only a duplicate of all the VHT40 combinations. I don't want to think about VHT80+80 or VHT160 for now - and I won't, as the current device I'm doing 11ac bringup on (QCA9880) only does VHT80. I'll likely revisit the channel configuration and scanning related stuff after I get VHT20/40 up. * Add vht flags and the basic MCS rate setup to ieee80211com, ieee80211vap and ieee80211_node in preparation for 11ac configuration. There is zero code that uses this right now. 
* Whilst here, add some more placeholders in case I need to extend out things by some uint32_t flag sized fields. Hopefully I won't! What I haven't yet done: * any of the code that uses this * any of the beamforming related fields * any of the MU-MIMO fields required for STA/AP operation * any of the IE fields in beacon frame / probe request/response handling and the calculations required for shifting beacon contents around when the TIM grows/shrinks This will require a full rebuild of net80211 related programs - ifconfig, hostapd, wpa_supplicant.	2016-12-16 04:43:31 +00:00
Luiz Otavio O Souza	8f1c8ade60	Fix the typos and style(9) in comment. MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)	2016-12-08 18:18:48 +00:00
Fabien Thomas	bf4356266d	IPsec RFC6479 support for replay window sizes up to 2^32 - 32 packets. Since the previous algorithm, based on bit shifting, does not scale with large replay windows, the algorithm used here is based on RFC 6479: IPsec Anti-Replay Algorithm without Bit Shifting. The replay window will be fast to be updated, but will cost as many bits in RAM as its size. The previous implementation did not provide a lock on the replay window, which may lead to replay issues. Reviewed by: ae Obtained from: emeric.poupon@stormshield.eu Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D8468	2016-11-25 14:44:49 +00:00
Sean Bruno	da69b8f9d1	iflib updates and fixes: - reset gen on down - initialize admin task statically - drain mp_ring on down - don't drop context lock on stop - reset error stats on down - fix typo in min_latency sysctl - return ENOBUFS from if_transmit if the driver isn't running or the link is down Submitted by: mmacy@nextbsd.org Reviewed by: shurd MFC after: 2 days Sponsored by: Isilon and Limelight Networks Differential Revision: https://reviews.freebsd.org/D8558	2016-11-18 04:19:21 +00:00
Mark Johnston	55dfce589c	Plug a lock leak in sysctl_ifmalist(). Fix style in the local variable declarations. PR: 214542 MFC after: 1 week	2016-11-15 19:23:48 +00:00
Ryan Stone	ab607f28e3	Don't read if_counters with if_addr_lock held Calling into an ifnet implementation with the if_addr_lock already held can cause a LOR and potentially a deadlock, as ifnet implementations typically can take the if_addr_lock after their own locks during configuration. Refactor a sysctl handler that was violating this to read if_counter data in a temporary buffer before the if_addr_lock is taken, and then copying the data in its final location later, when the if_addr_lock is held. PR: 194109 Reported by: Jean-Sebastien Pedron MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8498 Reviewed by: sbruno	2016-11-12 19:03:23 +00:00
Luigi Rizzo	844a6f0c53	Various fixes for ptnet/ptnetmap (passthrough of netmap ports). In detail: - use PCI_VENDOR and PCI_DEVICE ids from a publicly allocated range (thanks to RedHat) - export memory pool information through PCI registers - improve mechanism for configuring passthrough on different hypervisors Code is from Vincenzo Maffione as a follow up to his GSOC work.	2016-10-27 09:46:22 +00:00
Sepherosa Ziehau	14a31e99d7	hyperv/hn: Define empty packet filter. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8342	2016-10-27 04:55:19 +00:00
Bryan Drewery	921e5f5675	Remove excess CTLFLAG_VNET Sponsored by: Dell EMC Isilon	2016-10-26 23:40:07 +00:00
Sepherosa Ziehau	121e98e697	hyperv/hn: Fix RX filter settings. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8313	2016-10-24 05:10:35 +00:00
Sepherosa Ziehau	970ead008d	hyperv/hn: Add network change support. Currently the network change is simulated by link status changes. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8295	2016-10-21 08:02:05 +00:00
Jung-uk Kim	69d410eeb1	Implement BPF_MOD and BPF_XOR instructions. These two ALU instructions first appeared on Linux. Then, libpcap adopted and made them available since 1.6.2. Now more platforms including NetBSD have them in kernel. So do we.	2016-10-21 06:55:07 +00:00
Andrew Gallatin	eedb49598b	Clear mbuf hashtype on loopback when RSS is enabled. The hashtype on an outgoing mbuf reflects the correct hash on the transmit side of the connection. If this hash persists on loopback, the receiving RSS/PCBGROUP code will use it to look up the pcbgroup for the transmit side, which will often not match the pcbgroup for the receive side of the connection. This leads to TCP connections hanging, and dropping the SYN/ACK packet. This is essentially the same as having a hardware network card generate mbufs with an incorrect RSS hash. There are a number of places which can set the hash on transmit, so the simplest fix is to simply clear the hash at loopback time. Clearing the hash allows a new, correct hash to be calculated in software on the receive side. Reviewed by: jtl Discussed with: adrian Sponsored by: Netflix	2016-10-20 13:48:29 +00:00
Kevin Lo	b95d46da24	Fix typo in comment.	2016-10-19 02:24:57 +00:00
Luigi Rizzo	a2a7409151	remove stale and unused code from various files fix build on 32 bit platforms simplify logic in netmap_virt.h The commands (in net/netmap.h) to configure communication with the hypervisor may be revised soon. At the moment they are unused so this will not be a change of API.	2016-10-18 16:18:25 +00:00
Luigi Rizzo	6ad42d71b2	remove trailing whitespace. No code changes.	2016-10-18 15:41:57 +00:00
Sean Bruno	2fe66646bd	Set default capabilities at attach. ref: `6425f45e5f` Submitted by: mmacy@nextbsd.org	2016-10-18 14:02:45 +00:00
Sean Bruno	add6f7d069	When deciding whether or not to call tqg_attach_cpu(), reference rid directly. ref: `c9b47b468b` Submitted by: mmacy@nextbsd.org	2016-10-18 13:29:30 +00:00
Sean Bruno	8b2a1db901	Toggle v4/v6 rxcsum together Only re-init if driver is running ref: `106518e874` Submitted by: mmacy@nextbsd.org	2016-10-18 13:22:44 +00:00
Sean Bruno	aa3c5dd8a8	Fix misusage of CPU_FFS when binding queues to cpus ref: `922d0bdf22` Submitted by: mmacy@nextbsd.org	2016-10-18 13:12:19 +00:00
Luigi Rizzo	a82ab41168	add a missing header.	2016-10-16 18:27:41 +00:00
Luigi Rizzo	37e3a6d349	Import the current version of netmap, aligned with the one on github. This commit, long overdue, contains contributions in the last 2 years from Stefano Garzarella, Giuseppe Lettieri, Vincenzo Maffione, including: + fixes on monitor ports + the 'ptnet' virtual device driver, and ptnetmap backend, for high speed virtual passthrough on VMs (bhyve fixes in an upcoming commit) + improved emulated netmap mode + more robust error handling + removal of stale code + various fixes to code and documentation (some mixup between RX and TX parameters, and private and public variables) We also include an additional tool, nmreplay, which is functionally equivalent to tcpreplay but operating on netmap ports.	2016-10-16 14:13:32 +00:00
Sepherosa Ziehau	368bf0c2c6	ifnet: Use if_link_state snapshot to invoke ifnet_link_event So that everyone in this task has a consistent view of link state. Reviewed by: ae MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8214	2016-10-12 01:52:29 +00:00
Andrey V. Elsukov	199511bcdd	Make LLTABLE list lock private for if_llatbl.c Rename lock and macros to reflect that it protects V_lltables list.	2016-10-11 17:41:13 +00:00
Sepherosa Ziehau	65ca331080	hyperv/hn: Fix checksum offload settings The _correct_ way to identify the supported checksum offloading and TSO parameters is to query OID_TCP_OFFLOAD_HARDWARE_CAPABILITIES. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8088	2016-10-10 05:41:39 +00:00
Andrey V. Elsukov	abe95d87ee	Replace rw_init/rw_destroy with corresponding macros. Obtained from: Yandex LLC	2016-10-06 14:42:06 +00:00
Kevin Lo	c2b5ba7661	Remove an alias if_list, use if_link consistently. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D8075	2016-10-06 00:51:27 +00:00
Sepherosa Ziehau	1a3c881209	hyperv/hn: Add stubs for OFFLOAD_CURRENT_CONFIG and NETWORK_CHANGE status MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8068	2016-09-30 06:58:45 +00:00
Kevin Lo	decb239dff	Remove the compatibility macro if_addrlist. Since if_addrlist is used only for ipfilter(4), add a macro if_addrlist in ip_compat.h. Reviewed by: cy Differential Revision: https://reviews.freebsd.org/D8059	2016-09-29 05:37:45 +00:00
Kevin Lo	c7641cd18d	Remove ifa_list, use ifa_link (structure field) instead. While here, prefer if_addrhead (FreeBSD) to if_addrlist (BSD compat) naming for the interface address list in sctp_bsd_addr.c Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D8051	2016-09-28 13:29:11 +00:00
Kevin Lo	3ff511d316	Remove a comment about the size of the ifnet structure. Reviewed by: adrian Differential Revision: https://reviews.freebsd.org/D8036	2016-09-27 08:11:09 +00:00
Kristof Provost	f18598a43e	bridge: Fix fragment handling and memory leak Fragmented UDP and ICMP packets were corrupted if a firewall with a reassembling feature (like pf's scrub) was enabled on the bridge. This patch fixes the corrupted packet problem and the panic (triggered easily with low RAM) as explained in PR 185633. bridge_pfil and bridge_fragment relationship: bridge_pfil() receives (IN direction) packets and sends them to the firewall. The firewall can be configured to reassemble fragmented packets (like pf's scrubbing) into one mbuf chain. When bridge_pfil() needs to send this reassembled packet to the outgoing interface, it needs to re-fragment it by using bridge_fragment(). bridge_fragment() had to split this mbuf (using ip_fragment) first, then had to M_PREPEND each packet in the mbuf chain to add the Ethernet header. But M_PREPEND can sometimes create a new mbuf at the beginning of the mbuf chain, in which case the "main" pointer of this mbuf chain should be updated, and this case was totally forgotten. The original bridge_fragment code (Revision 158140, 2006 April 29) came from OpenBSD, and the call to bridge_enqueue was embedded. But on FreeBSD, bridge_enqueue() is done after bridge_fragment(), so the original OpenBSD code can't work as-is on FreeBSD. PR: 185633 Submitted by: Olivier Cochard-Labbé Differential Revision: https://reviews.freebsd.org/D7780	2016-09-24 07:09:43 +00:00
Kevin Lo	c3bef61e58	Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D7878	2016-09-15 07:41:48 +00:00
Sepherosa Ziehau	b349357819	hyperv/hn: Stringent RNDIS packet message length/offset check. While I'm here, use definition in net/rndis.h MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7782	2016-09-06 03:20:06 +00:00
Sepherosa Ziehau	a8197ee35e	net/rndis: Define RNDIS status message, which could be sent by device. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7757	2016-09-05 04:56:56 +00:00
Sepherosa Ziehau	772b86ba12	net/rndis: Define common message header for RNDIS messages. And avoid RNDIS_HEADER_OFFSET hardcoding. Reviewed by: hps MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7739	2016-09-02 05:57:13 +00:00
Sepherosa Ziehau	178228a10e	net/rndis: Add comment for rndis_comp_hdr Reviewed by: hps MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7738	2016-09-02 05:49:38 +00:00
Sepherosa Ziehau	46ebd74ce1	net/rndis: Define types for RNDIS pktinfo rm_type field. They are defined by NDIS spec, so the NDIS prefix. Reviewed by: hps MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7717	2016-09-01 07:17:06 +00:00
Sepherosa Ziehau	f320cbed5a	net/vlan: Shift for pri is 13 (pri mask 0xe000) not 1. Reviewed by: araujo, hps MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7710	2016-09-01 06:32:35 +00:00
Sepherosa Ziehau	6f67f21938	net/rndis: Define per-packet-info for RNDIS packet message MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7708	2016-09-01 05:40:13 +00:00
Sepherosa Ziehau	947175ca10	net/rndis: Add comment for rndis_set_parameter MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7705	2016-09-01 05:15:04 +00:00
Sepherosa Ziehau	1010113dad	net/rndis: Packet types are defined by NDIS; not RNDIS specific. Reviewed by: hps MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7681	2016-08-30 03:11:07 +00:00
Sepherosa Ziehau	8bb1a21b56	hyperv/hn: Move OIDs to net/rndis.h; they are standard NDIS OIDs. Actually all OIDs defined in net/rndis.h are standard NDIS OIDs. While I'm here, use the verbose macro name as in NDIS spec. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7679	2016-08-30 02:55:07 +00:00
Sepherosa Ziehau	77c4f5aa9d	hyperv/hn: Use vmbus xact for RNDIS set. And use new RNDIS set to configure NDIS offloading parameters. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7641	2016-08-26 05:18:27 +00:00
Sepherosa Ziehau	cc3d96db55	hyperv/hn: Use vmbus xact for RNDIS query. And switch MAC address query to use new RNDIS query function. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7639	2016-08-26 05:12:09 +00:00
Sepherosa Ziehau	550bbdbd27	hyperv/hn: Use vmbus xact for RNDIS initialize. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7624	2016-08-25 05:00:41 +00:00
Sepherosa Ziehau	6d79d63a7b	net/rndis: Fix RNDIS_STATUS_PENDING definition. While I'm here, sort the RNDIS status in ascending order. MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7594	2016-08-24 03:16:25 +00:00
Sepherosa Ziehau	48ef7b17f0	net/rndis: Add canonical RNDIS major/minor version as of today. Reviewed by: hps MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7593	2016-08-24 03:08:13 +00:00
Sepherosa Ziehau	1ba241d223	net: Split RNDIS protocol structs/macros out of dev/usb/net/if_urndisreg.h So that Hyper-V can leverage them instead of rolling its own definition. Discussed with: hps Reviewed by: hps MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7592	2016-08-23 02:54:06 +00:00
Andrey V. Elsukov	fdf95c0b81	Teach netisr_get_cpuid() to limit a given value to the range supported by netisr. Use netisr_get_cpuid() in netisr_select_cpuid() to limit the cpuid value returned by a protocol to be sure that it is not greater than nws_count. PR: 211836 Reviewed by: adrian MFC after: 3 days	2016-08-17 20:21:33 +00:00
Stephen Hurd	23ac9029f9	Update iflib to support more NIC designs - Move group task queue into kern/subr_gtaskqueue.c - Change intr_enable to return an int so it can be detected if it's not implemented - Allow different TX/RX queues per set to be different sizes - Don't split up TX mbufs before transmit - Allow a completion queue for TX as well as RX - Pass the RX budget to isc_rxd_available() to allow an earlier return and avoid multiple calls Submitted by: shurd Reviewed by: gallatin Approved by: scottl Differential Revision: https://reviews.freebsd.org/D7393	2016-08-12 21:29:44 +00:00
Adrian Chadd	eb81dc79e9	Extract out the various local definitions of ETHER_IS_BROADCAST() and turn them into a shared definition. Set M_MCAST/M_BCAST appropriately upon packet reception in net80211, just before they are delivered up to the ethernet stack. Submitted by: rstone	2016-08-07 03:48:33 +00:00
John Baldwin	f454e7ebf5	Add __printflike() to bus_describe_intr() to enable -Wformat checks. Fix a few places that were passing a raw string as the format to use a "%s" format string instead. MFC after: 2 months	2016-08-04 18:29:16 +00:00
Conrad Meyer	9809e9dc3a	rtentry: Initialize rt_mtx with MTX_NEW The "rtentry" zone does not use UMA_ZONE_ZINIT, so it is invalid to assume the mutex's memory will be zero. Without MTX_NEW, garbage backing memory may trigger the "re-initializing a mutex" assertion. PR: 200991 Submitted by: Chang-Hsien Tsai <luke.tw AT gmail.com>	2016-08-01 23:07:31 +00:00
Konstantin Belousov	584b675ed6	Hide the boottime and boottimebin globals, provide the getboottime(9) and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name. Issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface. Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on the leap second adjustment, but the commit is about the timekeeping code and the consumers were kept bug-to-bug compatible. Tested by: pho (as part of the bigger patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:08:59 +00:00
Gleb Smirnoff	32e0ade6c4	Partially revert r257696/r257713, which have an issue with writing to user controlled address. Restore the old code that emulated OSIOCGIFCONF in if.c. Noticed by: C Turt	2016-07-24 10:10:09 +00:00
Alexander Motin	84e633724f	Negotiate/disable TXCSUM_IPV6 same as TXCSUM.	2016-07-18 16:58:47 +00:00
Nathan Whitehorn	8c636a11dc	Remove assumptions in MI code that the BSP is CPU 0. MFC after: 2 weeks	2016-07-11 21:25:28 +00:00
Pedro F. Giffuni	a4bf2e2d49	ng_mppc(4): basic readability cleanups. In particular use __unreachable() to appease static analyzers. No functional change. CID: 1356591 MFC after: 3 days	2016-07-09 02:33:45 +00:00
Conrad Meyer	91d546a05d	iflib: Fix typo in 'iflib_rx_miss_bufs' sysctl name It looks like these sysctls were copy-pasted from netmap. Most were changed from 'ixl_' prefix to 'iflib_', but this one was missed. Fix the "can't re-use a leaf (ixl_rx_miss_bufs)!" warning. Reported by: dim@ and others Sponsored by: EMC / Isilon Storage Division	2016-07-08 17:04:21 +00:00
Nathan Whitehorn	6415e9aafb	Add variable declaration missing in r302372. Submitted by: andrew Approved by: re (gjb, kib)	2016-07-06 17:46:49 +00:00
Nathan Whitehorn	96c85efb4b	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
Bjoern A. Zeeb	a29c7aeb2e	Several device drivers call if_alloc() and then do further checks and will call if_free() in case of conflict, error, etc. if_free() however sets the VNET instance from the ifp->if_vnet, which was not yet initialized but would only be in if_attach(). Fix this by setting the curvnet from where we allocate the interface in if_alloc(). if_attach() will later overwrite this as needed. We do not set the home_vnet early on as we only want to prevent the if_free() panic but not change any of the other housekeeping, e.g., triggered through ifioctl()s. Reviewed by: brooks Approved by: re (gjb) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D7010	2016-06-29 05:21:25 +00:00
Bjoern A. Zeeb	a0429b5459	Update pf(4) and pflog(4) to survive basic VNET testing, which includes proper virtualisation, teardown, avoiding use-after-free, race conditions, no longer creating a thread per VNET (which could easily be a couple of thousand threads), gracefully ignoring global events (e.g., eventhandlers) on teardown, clearing various globally cached pointers and checking them before use. Reviewed by: kp Approved by: re (gjb) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6924	2016-06-23 21:34:38 +00:00
Navdeep Parhar	f22bfc72f8	Add spares to struct ifnet and socket for packet pacing and/or general use. Update comments regarding the spare fields in struct inpcb. Bump __FreeBSD_version for the changes to the size of the structures. Reviewed by: gnn@ Approved by: re@ (gjb@) Sponsored by: Chelsio Communications	2016-06-23 21:07:15 +00:00
Bjoern A. Zeeb	a97c790844	Add more fields to if_debug.c for ddb(4) 'show ifnet'; resort some fields to match the order in the struct. Especially needed if_pf_kif to do pf(4) VNET debugging. Approved by: re (marius) Obtained from: projects/vnet MFC after: 1 week Sponsored by: The FreeBSD Foundation	2016-06-22 12:53:10 +00:00
Bjoern A. Zeeb	d3f6f80f4b	After r302054 unloading a network interface driver on a kernel without VIMAGE support would dereference a NULL pointer unconditionally, leading to a panic. Wrap the entire VIMAGE related code with #ifdefs rather than just the decision making part to save an extra bit of resources. Reported by: np Sponsored by: The FreeBSD Foundation MFC After: 13 days Approved by: re (marius)	2016-06-22 11:45:30 +00:00
Bjoern A. Zeeb	89856f7e2d	Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather large and convoluted as the ordering requirements cannot be separated. Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC. Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet. For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along with (or before) the higher layer protocol being shut down. Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers. For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()). Provide or enhance helper functions to do proper cleanup at a protocol rather than at an interface level. 
Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747	2016-06-21 13:48:49 +00:00
Kristof Provost	3e248e0fb4	pf: Filter on and set vlan PCP values Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to filter on it. Reviewed by: allanjude, araujo Approved by: re (gjb) Obtained from: OpenBSD (mostly) Differential Revision: https://reviews.freebsd.org/D6786	2016-06-17 18:21:55 +00:00
Conrad Meyer	0d0338afc9	iflib: Improve cleanup on iflib_queues_alloc error path Fix some memory leaks. Some may remain. Reported by: Coverity Discussed with: mmacy CIDs: 1356036, 1356037, 1356038 Sponsored by: EMC / Isilon Storage Division	2016-06-07 20:26:00 +00:00
Conrad Meyer	16fb86ab35	iflib: Fix potential leak in iflib_if_transmit Due to an accidental mismatch between allocation and release in the slow path of iflib_if_transmit, if a caller passed 9-16 mbufs to the routine, the mbuf array would be leaked. Fix the mismatch by removing the magic numbers in favor of nitems() on the stack array. According to mmacy, this leak is unlikely. Reported by: Coverity Discussed with: mmacy CID: 1356040 Sponsored by: EMC / Isilon Storage Division	2016-06-07 19:49:08 +00:00
Pedro F. Giffuni	c3fb425204	ng_mppc(4): Bring netgraph(3) MPPC compression support. Support for compression has been available from July 2007 but it was never imported due to concerns with patents once held by STAC/HiFn. The issues have clearly been resolved so bring it in now. Special thanks to Brett Glass for preserving the code and pointing to documentation for the expiration case. Obtained from: mav (through Brett Glass) Relnotes: yes MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6739	2016-06-07 15:07:00 +00:00
Sepherosa Ziehau	36ad8372d4	net: Use M_HASHTYPE_OPAQUE_HASH if the mbuf flowid has hash properties Reviewed by: hps, erj, tuexen Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6688	2016-06-07 04:51:50 +00:00
Bjoern A. Zeeb	2d5ad99a0d	After tearing down the interface per-"domain" bits, set the data area to NULL to avoid it being mis-treated on a possible re-attach but also to get a clean NULL pointer dereference in case of errors due to unexpected race conditions elsewhere in the code, e.g., callouts. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 22:59:58 +00:00
Bjoern A. Zeeb	d117fd8003	Similarly to r301505 protect the removal of the ifa from the if_addrhead by a lock (as well as the check that the list is not empty). Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 16:23:02 +00:00
Bjoern A. Zeeb	f22d78c06e	In if_purgeaddrs() we cannot hold the lock over the entire loop due to called functions (as in other parts of the stack, leave a comment). Put around a lock the removal of the ifa from the list however to reduce the possible race with other places. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 13:17:25 +00:00
Bjoern A. Zeeb	b9dbac48f3	SYSINIT functions do not return a value; switch to void, remove the return value, and mark the unused argument __unused. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 13:01:57 +00:00
Bjoern A. Zeeb	80ae8d609a	Provide a public interface to rt_flushifroutes which takes the address family as an argument as well. This will be used to cleanup individual protocols during VNET teardown. Obtained from: projects/vnet Sponsored by: The FreeBSD Foundation	2016-06-06 12:49:47 +00:00
Bjoern A. Zeeb	e84ef07f02	Make the KASSERT message more helpful by also printing the ifp information which we are asserting. Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-06-06 10:13:48 +00:00
Marcelo Araujo	2ccbbd06d2	Add support for the priority code point (PCP), a 3-bit field which refers to the IEEE 802.1p class of service and maps to the frame priority level. Values in order of priority are: 1 (Background (lowest)), 0 (Best effort (default)), 2 (Excellent effort), 3 (Critical applications), 4 (Video, < 100ms latency), 5 (Video, < 10ms latency), 6 (Internetwork control) and 7 (Network control (highest)). Example of usage: root# ifconfig em0.1 create root# ifconfig em0.1 vlanpcp 3 Note: The review D801 includes the pf(4) part, but as discussed with kristof, we won't commit the pf(4) bits for now. Credit for the original code goes to rwatson. Differential Revision: https://reviews.freebsd.org/D801 Reviewed by: gnn, adrian, loos Discussed with: rwatson, glebius, kristof Tested by: many including Matthew Grooms <mgrooms__shrew.net> Obtained from: pfSense Relnotes: Yes	2016-06-06 09:51:58 +00:00
Bjoern A. Zeeb	484149def8	Introduce a per-VNET flag to enable/disable netisr processing on that VNET. Add accessor functions to toggle the state per VNET. The base system (vnet0) will always enable itself with the normal registration. We will share the registered protocol handlers in all VNETs minimising duplication and management. Upon disabling netisr processing for a VNET drain the netisr queue from packets for that VNET. Update netisr consumers to (de)register on a per-VNET start/teardown using VNET_SYS(UN)INIT functionality. The change should be transparent for non-VIMAGE kernels. Reviewed by: gnn (, hiren) Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6691	2016-06-03 13:57:10 +00:00
George V. Neville-Neil	6d76822688	This change re-adds L2 caching for TCP and UDP, as originally added in D4306 but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate. Submitted by: Mike Karels Differential Revision: https://reviews.freebsd.org/D6262	2016-06-02 17:51:29 +00:00
Bjoern A. Zeeb	c169d9fe07	In if_attachdomain1() there does not seem to be any reason to use TRYLOCK rather than just acquire the lock, so just do that. Reviewed by: markj Obtained from: projects/vnet MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6578	2016-05-28 08:32:15 +00:00
Nick Hibma	dbd2ee46b2	Change net.link.log_promisc_mode_change to a read-only tunable PR: 166255 Submitted by: eugen.grosbein.net Obtained from: hselasky MFC after: 3 days	2016-05-25 09:00:05 +00:00
Michael Tuexen	b5994a5c26	Allow an MTU of 65535 bytes to be set via TUN[SG]IFINFO. This requires changing the type on the mtu field in struct tuninfo from short to unsigned short. This is used, for example, by packetdrill to test with MTUs up to the maximum value. Differential Revision: 6452	2016-05-24 11:47:14 +00:00
Pedro F. Giffuni	efc457e1bc	sys/net: more spelling.	2016-05-19 16:28:05 +00:00
Michael Tuexen	683300d1d5	Allow writing IP packets of length TUNMRU no matter if TUNSIFHEAD is set or not.	2016-05-19 13:52:12 +00:00
Bjoern A. Zeeb	ad4e911678	Rather than having the if_vmove() code intermixed in the vnet_destroy() function in vnet.c move it to if.c where it logically belongs and put it under a VNET_SYSUNINIT() call. To not change the current behaviour make sure it runs first thing during teardown. In the future this will allow us more flexibility on changing the order on when we want to get rid of interfaces. Stop exporting if_vmove() and make it file static. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6438	2016-05-18 20:06:45 +00:00
Bjoern A. Zeeb	94081f88e8	Add a "vnet_state" field to struct vnet. This is set to the SI_SUB_* value before executing any VNET_SYSINIT or VNET_SYSUNINIT. While good for debugging, especially VNET teardown problems, by giving a chance to know at which level during teardown we are, it will also be used to detect a "stable state" (as in fully up and running) later on. Obtained from: projects/vnet Sponsored by: The FreeBSD Foundation	2016-05-18 15:50:52 +00:00
Scott Long	fc614c29c1	Activate the NO_64BIT_ATOMICS code for mips and powerpc	2016-05-18 15:45:12 +00:00
Scott Long	c7762913ac	Remove assertions that don't make sense for the data type.	2016-05-18 15:44:45 +00:00
Bjoern A. Zeeb	00e36a5c7c	Add a dummy VNET_SYSINIT that will make sure all VNETs started will always end on SI_SUB_VNET_DONE. Obtained from: projects/vnet Sponsored by: The FreeBSD Foundation	2016-05-18 15:25:19 +00:00
Bjoern A. Zeeb	5fa0728b7d	Split 'show vnets' into 'show vnet' and 'show all vnets'. While here adjust some db_printf format string. Document the two show commands in ddb.4. Sponsored by: The FreeBSD Foundation	2016-05-18 14:43:17 +00:00
Bjoern A. Zeeb	aaeb188af3	Make compile without INET or without IP support in the kernel by hiding variables and lro function calls behind appropriate #ifdefs. Also move the #includes for "opt_*" to the place where they should be.	2016-05-18 14:18:03 +00:00
Scott Long	4c7070db25	Import the 'iflib' API library for network drivers. From the author: "iflib is a library to eliminate the need for frequently duplicated device independent logic propagated (poorly) across many network drivers." Participation is purely optional. The IFLIB kernel config option is provided for drivers that want to transition between legacy and iflib modes of operation. ixl and ixgbe driver conversions will be committed shortly. We hope to see participation from the Broadcom and maybe Chelsio drivers in the near future. Submitted by: mmacy@nextbsd.org Reviewed by: gallatin Differential Revision: D5211	2016-05-18 04:35:58 +00:00
Eitan Adler	cef367e6a1	Don't repeat the the word 'the' (one manual change to fix grammar) Confirmed With: db Approved by: secteam (not really, but this is a comment typo fix)	2016-05-17 12:52:31 +00:00
Bjoern A. Zeeb	54d9f34ea3	Mark the unused arguments of various SYSINIT functions __unused. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-05-17 00:32:36 +00:00
Don Lewis	1ef3d54d20	When handling SIOCSIFNAME ensure that the new interface name is NUL terminated. Reject the rename attempt if the name is too long. MFC after: 1 week	2016-05-15 21:37:36 +00:00
John Baldwin	fdce57a042	Add an EARLY_AP_STARTUP option to start APs earlier during boot. Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot. This change moves SI_SUB_SMP up to just before software interrupt threads are created allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed. This last feature fixes a problem with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu. Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case. As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code. These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling). PR: kern/199321 Reviewed by: markj, gnn, kib Sponsored by: Netflix	2016-05-14 18:22:52 +00:00
Nick Hibma	6d07c1575b	Allow silencing of 'promiscuous mode enabled/disabled' messages. PR: 166255 Submitted by: eugen.grosbein.net Obtained from: eugen.grosbein.net MFC after: 1 week	2016-05-12 19:42:13 +00:00
Alan Somers	8907f744ff	Improve performance and functionality of the bitstring(3) API. Two new functions are provided, bit_ffs_at() and bit_ffc_at(), which allow for efficient searching of set or cleared bits starting from any bit offset within the bit string. Performance is improved by operating on longs instead of bytes and using ffsl() for searches within a long. ffsl() is a compiler builtin in both clang and gcc for most architectures, converting what was a brute force while loop search into a couple of instructions. All of the bitstring(3) API continues to be contained in the header file. Some of the functions are large enough that perhaps they should be uninlined and moved to a library, but that is beyond the scope of this commit. sys/sys/bitstring.h: Convert the majority of the existing bit string implementation from macros to inline functions. Properly protect the implementation from inadvertent macro expansion when included in a user's program by prefixing all private macros/functions and local variables with '_'. Add bit_ffs_at() and bit_ffc_at(). Implement bit_ffs() and bit_ffc() in terms of their "at" counterparts. Provide a kernel implementation of bit_alloc(), making the full API usable in the kernel. Improve code documentation. share/man/man3/bitstring.3: Add pre-existing API bit_ffc() to the synopsis. Document new APIs. Document the initialization state of the bit strings allocated/declared by bit_alloc() and bit_decl(). Correct documentation for bitstr_size(). The original code comments indicate the size is in bytes, not "elements of bitstr_t". The new implementation follows this lead. Only hastd assumed "elements" rather than bytes and it has been corrected. etc/mtree/BSD.tests.dist: tests/sys/Makefile: tests/sys/sys/Makefile: tests/sys/sys/bitstring.c: Add tests for all existing and new functionality. include/bitstring.h Include all headers needed by sys/bitstring.h lib/libbluetooth/bluetooth.h: usr.sbin/bluetooth/hccontrol/le.c: Include bitstring.h instead of sys/bitstring.h. sbin/hastd/activemap.c: Correct usage of bitstr_size(). sys/dev/xen/blkback/blkback.c Use new bit_alloc. sys/kern/subr_unit.c: Remove hard-coded assumption that sizeof(bitstr_t) is 1. Get rid of unrb.busy, which caches the number of bits set in unrb.map. When INVARIANTS are disabled, nothing needs to know that information. collapse_unr can be adapted to use bit_ffs and bit_ffc instead. Eliminating unrb.busy saves memory, simplifies the code, and provides a slight speedup when INVARIANTS are disabled. sys/net/flowtable.c: Use the new kernel implementation of bit_alloc, instead of hacking the old libc-dependent macro. sys/sys/param.h Update __FreeBSD_version to indicate availability of new API Submitted by: gibbs, asomers Reviewed by: gibbs, ngie MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D6004	2016-05-04 22:34:11 +00:00
Pedro F. Giffuni	a4641f4eaa	sys/net*: minor spelling fixes. No functional change.	2016-05-03 18:05:43 +00:00
Bjoern A. Zeeb	46b0539ca4	Remove the most useful INET || INET6 check leftover from whenever, doing nothing. MFC after: 1 week Sponsored by: The FreeBSD Foundation	2016-05-03 16:01:53 +00:00
Randall Stewart	abb901c5d7	Complete the UDP tunneling of ICMP msgs to those protocols interested in having tunneled UDP and finding out about the ICMP (tested by Michael Tuexen with SCTP.. soon to be using this feature). Differential Revision: http://reviews.freebsd.org/D5875	2016-04-28 15:53:10 +00:00
Conrad Meyer	dcbee68850	radix_mpath: Don't dereference a NULL pointer in for loop iteration. It seems rn_dupedkey may be NULL, because of the NULL check inside the loop. (Also, the rt gets assigned from rn_dupedkey and NULL checked at top of loop.) However, the for-loop update condition happens before the top-of-loop check and dereferences 'rt' unconditionally. Instead, NULL-check before dereferencing. If rn_dupedkey cannot in fact be NULL, or something else protects this, feel free to revert this and add an ASSERT of some kind instead. This was introduced in r191080 (2009) and moved around slightly in r293657. Reported by: Coverity CID: 1348482 Sponsored by: EMC / Isilon Storage Division	2016-04-26 20:27:17 +00:00
Pedro F. Giffuni	55e0987aea	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
Pedro F. Giffuni	d9c9c81c08	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
Pedro F. Giffuni	8dfea46460	Remove slightly used const values that can be replaced with nitems(). Suggested by: jhb	2016-04-21 15:38:28 +00:00
Bjoern A. Zeeb	29bda43fa4	Add more fields from struct ifnet needed during debugging a kernel panic. Move if_fib into the right place. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2016-04-20 21:04:39 +00:00
Conrad Meyer	856d8ddbb3	radix rn_inithead: Fix minor leak in low memory conditions R_Zalloc is essentially a malloc(M_NOWAIT) wrapper. It is possible that 'rnh' failed to allocate, but 'rmh' succeeds. In that case, we bail out of rn_inithead() but previously did not free 'rmh'. Introduced in r287073 (projects/routing) / MFP r294706. Reported by: Coverity CID: 1350258 Sponsored by: EMC / Isilon Storage Division	2016-04-20 02:01:45 +00:00
Conrad Meyer	5412ec6e3f	bpf_getdltlist: Don't overrun 'lst' 'lst' is allocated with 'n1' members. 'n' indexes 'lst'. So 'n == n1' is an invalid 'lst' index. This is a follow-up to r296009. Reported by: Coverity CID: 1352743 Sponsored by: EMC / Isilon Storage Division	2016-04-20 01:39:31 +00:00
Pedro F. Giffuni	02abd40029	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
Pedro F. Giffuni	155d72c498	sys/net* : for pointers replace 0 with NULL. Mostly cosmetical, no functional change. Found with devel/coccinelle.	2016-04-15 17:30:33 +00:00
Bjoern A. Zeeb	05fc416403	During if_vmove() we call if_detach_internal() which in turn calls the event handler notifying about interface departure and one of the consumers will detach if_bpf. There is no way for us to re-attach this easily as the DLT and hdrlen are only given on interface creation. Add a function to allow us to query the DLT and hdrlen from a current BPF attachment and after if_attach_internal() manually re-add the if_bpf attachment using these values. Found by panics triggered by nd6 packets running past BPF_MTAP() with no proper if_bpf pointer on the interface. Also add a basic DDB show function to investigate the if_bpf attachment of an interface. Reviewed by: gnn MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5896	2016-04-11 10:00:38 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
Ravi Pokala	729a4cff7e	Revert accidental submit of WIP as part of r297609 Pointyhat to: rpokala	2016-04-06 04:58:20 +00:00
Ravi Pokala	06152bf0e1	Storage Controller Interface driver - typo in unimplemented macro in scic_sds_controller_registers.h s/contoller/controller/ PR: 207336 Submitted by: Tony Narlock <tony @ git-pull.com>	2016-04-06 04:50:28 +00:00
John Baldwin	2f9b9f9c7f	Remove an unneeded check. CPUs with valid per-CPU data are not absent. Sponsored by: Netflix	2016-04-05 00:09:19 +00:00
Bjoern A. Zeeb	905197505e	Catch up with some more fields. I needed the bpf one lately. Sponsored by: The FreeBSD Foundation	2016-03-31 18:53:13 +00:00
Edward Tomasz Napierala	35030a5dd4	Remove some NULL checks for M_WAITOK allocations. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-29 13:56:59 +00:00
George V. Neville-Neil	cd4a821c2f	Add ethertype reserved for network testing MFC after: 2 weeks	2016-03-28 18:25:54 +00:00
Bjoern A. Zeeb	4f321dbd1c	Fix compile errors after r297225: - properly V_irtualise variable access unbreaking VIMAGE kernels. - remove the volatile from the function return type to make architecture using gcc happy [-Wreturn-type] "type qualifiers ignored on function return type" I am not entirely happy with this solution putting the u_int there but it will do for now.	2016-03-24 11:40:10 +00:00
George V. Neville-Neil	84cc0778d0	FreeBSD previously provided route caching for TCP (and UDP). Re-add route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted. Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306	2016-03-24 07:54:56 +00:00
Sepherosa Ziehau	1321c5029e	buf_ring/drbr: Add buf_ring_peek_clear_sc and use it in drbr_peek. Unlike buf_ring_peek, it only supports single consumer mode, and it clears the cons_head if DEBUG_BUFRING/INVARIANTS is defined. The normal use case of drbr_peek for network drivers is: m = drbr_peek(br); err = hw_spec_encap(&m); /* could m_defrag/m_collapse */ (*) if (err) { if (m == NULL) drbr_advance(br); else drbr_putback(br, m); /* break the loop */ } drbr_advance(br); The race is: If hw_spec_encap() m_defrag or m_collapse the mbuf, i.e. the old mbuf was freed, or like the Hyper-V's network driver, that transmission-done does not even require the TX lock; then on the other CPU at the (*) time, the freed mbuf could be recycled and being drbr_enqueue even before the current CPU had the chance to call drbr_{advance,putback}. This triggers a panic in drbr_enqueue duplicated element check, if DEBUG_BUFRING/INVARIANTS is defined. Use buf_ring_peek_clear_sc() in drbr_peek() to fix the above race. This change is a NO-OP, if neither DEBUG_BUFRING nor INVARIANTS are defined. MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5416	2016-02-29 03:54:51 +00:00
Konstantin Belousov	70209aca16	In bpf_getdltlist(), do not call copyout(9) while holding bpf lock. Copy the data into temporary malloced buffer and drop the lock for copyout. Reported, reviewed and tested by: cem Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-02-24 22:00:35 +00:00
Marcelo Araujo	d931334bd4	Fix regression introduced in r272446. lagg(4) supports the protocol none, where it disables any traffic without disabling the lagg(4) interface itself. PR: 206921 Submitted by: Pushkar Kothavade <pushkarbk@gmail.com> Reviewed by: rpokala Approved by: bapt (mentor) MFC after: 3 weeks Sponsored by: gandi.net Differential Revision: https://reviews.freebsd.org/D5076	2016-02-19 06:35:53 +00:00
Devin Teske	41c0ec9a16	Merge SVN r295220 (bz) from projects/vnet/ Fix a panic that occurs when a vnet interface is unavailable at the time the vnet jail referencing said interface is stopped. Sponsored by: FIS Global, Inc.	2016-02-11 17:07:19 +00:00
Bjoern A. Zeeb	a5243af262	Code duplication but rib_head is special. Not found an easy way to go back and harmonize the use cases among RIB, IPFW, PF yet but it's also not the scope of this work. Prevents instant panics on teardown and frees the FIB bits again. Sponsored by: The FreeBSD Foundation	2016-02-03 21:56:51 +00:00
Bjoern A. Zeeb	2414e86439	MfH @r295202 Expect to see panics in routing code at least now.	2016-02-03 11:49:51 +00:00
Gleb Smirnoff	8ec07310fa	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h	2016-02-01 17:41:21 +00:00
Gleb Smirnoff	d17d4c6b2a	Provide TCPSTAT_DEC() and TCPSTAT_FETCH() macros.	2016-01-27 00:20:07 +00:00
Marko Zec	ca7ba6a8fd	Prune a definition which is / was never used.	2016-01-25 20:35:15 +00:00
Alexander V. Chernikov	94017572ab	Fix flowtable part missed in r294706.	2016-01-25 09:31:32 +00:00
Alexander V. Chernikov	61eee0e202	MFP r287070,r287073: split radix implementation and route table structure. There are a number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, the first 3 don't have _any_ requirements and the first 2 do not use radix locking. On the other hand, the routing structure does have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not know anything about its consumers' internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still use the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.	2016-01-25 06:33:15 +00:00
Alexander V. Chernikov	809da2a3e0	Remove unused radix_mpath definitions.	2016-01-25 05:28:19 +00:00
Marcelo Araujo	d62edc5eb5	Add an IOCTL rr_limit to let users fine-tune the number of packets to be sent using the roundrobin protocol and set a better granularity and distribution among the interfaces. Tuning the number of packets sent per interface can increase throughput and reduce unordered packets as well as reduce SACK. Example of usage: # ifconfig bge0 up # ifconfig bge1 up # ifconfig lagg0 create # ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \ 192.168.1.1 netmask 255.255.255.0 # ifconfig lagg0 rr_limit 500 Reviewed by: thompsa, glebius, adrian (old patch) Approved by: bapt (mentor) Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D540	2016-01-23 04:18:44 +00:00
Bjoern A. Zeeb	009e81b164	MFH @r294567	2016-01-22 15:11:40 +00:00
Bjoern A. Zeeb	1f12da0e82	Just checkpoint the WIP in order to be able to make the tree update easier. Note: this is currently not in a usable state as certain teardown parts are not called and the DOMAIN rework is missing. More to come soon and find its way to head. Obtained from: P4 //depot/user/bz/vimage/... Sponsored by: The FreeBSD Foundation	2016-01-22 15:00:01 +00:00
Alexander V. Chernikov	b7d076ed19	Clean up original route path selection logic a bit. NULL pointer dereference claimed by Coverity was possible if one (or several) next-hops had their weights set to 0. CID: 1348482	2016-01-15 13:47:11 +00:00
Alexander V. Chernikov	fcbfdb37a1	Fix panic in IP redirect. Panic was introduced in r293466. Found by: Yamagi Burmeister <lists at yamagi.org>	2016-01-14 16:31:00 +00:00
Alexander V. Chernikov	10e0e23528	Remove now-unused wrappers for various routing functions.	2016-01-14 08:54:44 +00:00
Alexander V. Chernikov	0eb64f4e44	Remove RTF_RNH_LOCKED support from rtalloc1_fib(). Last caller using it was eliminated in r293471. Sponsored by: Yandex LLC	2016-01-13 14:32:48 +00:00
Alexander V. Chernikov	59747033cd	Bring RADIX_MPATH support to new routing KPI to ease migration. Move actual rte selection process from rtalloc_mpath_fib() to the rt_path_selectrte() function. Add public rt_mpath_select() to use in fibX_lookup_ functions.	2016-01-11 08:45:28 +00:00
Alexander V. Chernikov	e5f3746abd	Do not rewrite all ro_flags.	2016-01-11 08:00:13 +00:00
Alexander V. Chernikov	64e9493420	Fix userland build broken by r293470. Pointy hat to: melifaro	2016-01-09 18:42:12 +00:00
Alexander V. Chernikov	36402a681f	Finish r275196: do not dereference rtentry in if_output() routines. The only piece of information that is required is rt_flags subset. In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags to check if this particular mbuf needs to be dropped (and what error should be returned). Note that if_loop() will always return EHOSTUNREACH for "reject" routes regardless of RTF_HOST flag existence. This is due to upcoming routing changes where RTF_HOST value won't be available as lookup result. All other functions require RTF_GATEWAY flag to check if they need to return EHOSTUNREACH instead of EHOSTDOWN error. There are 11 places where non-zero 'struct route' is passed to if_output(). Most of the callers (forwarding, bpf, arp) do not care about the exact error value. In fact, the only place where this result is propagated is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()). Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary rte flags to ro_flags. Call this function in ip_output() after looking up/ verifying rte. Reviewed by: ae	2016-01-09 16:34:37 +00:00
Alexander V. Chernikov	ea8d14925c	Remove sys/eventhandler.h from net/route.h Reviewed by: ae	2016-01-09 09:34:39 +00:00
Alexander V. Chernikov	f2b2e77a41	(Temporarily) remove route_redirect_event eventhandler. Such handler should pass different set of variables, instead of directly providing 2 locked route entries. Given that it hasn't been really used since at least 2012, remove current code. Will re-add it after finishing most major routing-related changes. Discussed with: np	2016-01-09 06:26:40 +00:00
Alexander V. Chernikov	16703ea811	Please Coverity by removing unnecessary check (rt_key() is always set). Coverity CID: 1347797	2016-01-09 05:39:06 +00:00
Alexander V. Chernikov	048738b546	Do more fine-grained locking in rtrequest1_fib(). Last consumer using RTF_RNH_LOCKED flag was eliminated in r291643. Restrict passing RTF_RNH_LOCKED to rtrequest1_fib() and do better locking for RTM_ADD / RTM_DELETE cases.	2016-01-08 16:25:11 +00:00
Alexander V. Chernikov	9a1b64d5a0	Add rib_lookup_info() to provide API for retrieving individual route entries data in unified format. There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use rt_addrinfo structure to store most rte fields. If caller wants to retrieve key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficient-sized sockaddrs structures w/ their pointers saved in passed rt_addrinfo. Convert: * lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor()). * rtsock pre-add/change route check. * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because 1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should not be multiple host routes for such hosts 2) if we have multiple routes we should inspect them (which is not done). 3) the entire idea of abusing KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose).	2016-01-04 15:03:20 +00:00
Alexander V. Chernikov	0d4df0290e	Handle IPV6_PATHMTU option by splitting ip6_getpmtu_ctl() from ip6_getpmtu(). Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to the caller. Currently, ip6_getpmtu() has 2 totally different use cases: 1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU and return it, w/o any reusability. 2) Actual ip6_output() data path where we (nearly) always use the provided route lookup data. If this data is not 'valid' we need to perform another lookup and save the result (which cannot be re-used by ip6_output()). Given that, handle 1) by calling separate function doing rte lookup itself. Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both ip6_getpmtu_ctl() and ip6_getpmtu(). For 2) instead of storing ref'ed rte, store mtu (the only needed data from the lookup result) inside newly-added ro_mtu field. 'struct route' was shrunk by 8 (or 4) bytes in r292978. Grow it again by 4 bytes. New ro_mtu field will be used in other places like ip/tcp_output (EMSGSIZE handling from output routines). Reviewed by: ae	2016-01-03 09:54:03 +00:00
Alexander V. Chernikov	6cdb18544d	Remove second EVENTHANDLER_REGISTER slipped in r292978. Describe the reason of doing unconditional M_PREPEND in ether_output().	2016-01-01 10:15:06 +00:00
Marcelo Araujo	25656def0d	Clean up unused-but-set-variable spotted by gcc4.9. Reviewed by: ngie Approved by: rodrigc (mentor) Differential Revision: https://reviews.freebsd.org/D4719	2015-12-31 07:03:41 +00:00
Alexander V. Chernikov	4fb3a8208c	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with its length) to prepend to mbuf. These two changes permit routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address changes are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which either returns an error or a fully-prepared link header. Add <arp|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of these have their header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102	2015-12-31 05:03:27 +00:00
Marcelo Araujo	2bfd3dfb9f	Wrap using #ifdef 'notyet' those variables and statements not yet implemented to lower the compiler warnings. It fix the case of unused-but-set-variable spotted by gcc4.9. Reviewed by: ngie, ae Approved by: bapt (mentor) Differential Revision: https://reviews.freebsd.org/D4720	2015-12-31 02:01:20 +00:00
Alexander V. Chernikov	a18742e938	Add SFF-8024 Extended Specification Compliance Submitted by: markb_mellanox.com MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D4666	2015-12-28 09:26:07 +00:00
Bjoern A. Zeeb	f501e6f136	If vnets are torn down while ifconfig runs an ioctl to say, destroy an epair(4), we may hit if_detach_internal() without holding a lock and by the time we acquire it the interface might be gone. We should not panic() in this case as it is our fault for not holding the lock all the way. It is not ideal to return silently without error to user space, but other callers will all ignore the return values so do not change the entire KPI for little benefit for now. The ifp will be dealt with one way or another still. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4529	2015-12-22 15:03:45 +00:00
Bjoern A. Zeeb	616bc4f476	If bootverbose is enabled every vnet startup and virtual interface creation will print extra lines on the console. We are generally not interested in this (repeated) information for each VNET. Thus only print it for the default VNET. Virtual interfaces on the base system will remain printing information, but e.g. each loopback in each vnet will no longer cause a "bpf attached" line. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4531	2015-12-22 15:00:04 +00:00
Bjoern A. Zeeb	76d68eccbd	Simplify bringup order by removing a SYSINIT making it a static list initialization. Mfp4 @180384,180385: There is no need for a dedicated SYSINIT here. The list can be initialized statically. Sponsored by: CK Software GmbH Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4528	2015-12-22 14:57:04 +00:00
Steven Hartland	d6e82913c1	Revert r292275 & r292379 glebius has concerns about these changes so reverting those can be discussed and addressed. Sponsored by: Multiplay	2015-12-17 14:41:30 +00:00
Alexander V. Chernikov	427c2f4ef0	Provide additional lle data in IPv6 lltable dump used by ndp(8). Before the change, things like lle state were queried via SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump. This ioctl was added in 1999, probably to avoid touching rtsock code. This change maps SIOCGNBRINFO_IN6 data to standard rtsock dump the following way: expire (already) maps to rtm_rmx.rmx_expire isrouter -> rtm_flags & RTF_GATEWAY asked -> rtm_rmx.rmx_pksent state -> rtm_rmx.rmx_state (maps to rmx_weight via define) Reviewed by: ae	2015-12-16 10:14:16 +00:00
Alexander V. Chernikov	0792bcbb54	Convert if_stf(4) to new routing api.	2015-12-16 09:18:20 +00:00
Steven Hartland	52e53e2de0	Fix lagg failover due to missing notifications. When using lagg failover mode neither Gratuitous ARP (IPv4) nor Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved. This results in slow failover, dropped packets and network outages for the lagg interface when the primary link goes down. We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire a ifnet_link_event which are now monitored by rip and nd6. Upon receiving these events each protocol triggers the relevant notifications: * inet4 => Gratuitous ARP * inet6 => Unsolicited Neighbour Announce This also fixes the carp IPv6 NA's that stopped working after r251584 which added the ipv6_route__llma route. The new behaviour can be controlled using the sysctls: * net.link.ether.inet.arp_on_link * net.inet6.icmp6.nd6_on_link Also removed unused param from lagg_port_state and added descriptions for the sysctls while here. PR: 156226 MFC after: 1 month Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4111	2015-12-15 16:02:11 +00:00
Alexander V. Chernikov	6af272d88e	Fix PINNED routes handling. Before r291643, adding new interface prefix had the following logic: try_add: EEXIST && (PINNED) { try_del(w/o PINNED flag) if (OK) try_add(PINNED) } In r291643, deletion was performed w/ PINNED flag held which led to new interface prefixes (like ::1) overriding older ones. Fix this by requesting deletion w/o RTF_PINNED. PR: kern/205285 Submitted by: Fabian Keil <fk at fabiankeil.de>	2015-12-13 16:37:01 +00:00
Alexander V. Chernikov	12cb7521c2	Remove LLE read lock from IPv6 fast path. LLE structure is mostly unchanged during its lifecycle: there are only 2 things relevant for fast path lookup code: 1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK. 2) Some sort of feedback indicating that this particular entry is used so we send NS to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no. The latter is solved by the following changes: Special r_skip_req (introduced in D3688) value is used for fast path feedback. It is read lockless by fast path, but updated under req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0. After transitioning to STALE state, callout timer is armed to run each V_nd6_delay seconds to make sure that if packet was transmitted at the start of given interval, we would be able to switch to PROBE state in V_nd6_delay seconds as user expects. (in STALE state) timer is rescheduled until original V_nd6_gctimer expires keeping lle in STALE state (remaining timer value stored in lle_remtime). (in STALE state) timer is rescheduled if packet was transmitted less than V_nd6_delay seconds ago to make sure we transition to PROBE state exactly after V_nd6_delay seconds. As a result, all packets towards lle in REACHABLE/STALE/PROBE states are handled by fast path without acquiring lle read lock. Differential Revision: https://reviews.freebsd.org/D3780	2015-12-13 07:39:49 +00:00
Alexander V. Chernikov	65ff3638df	Merge helper fib* functions used for basic lookups. Vast majority of rtalloc(9) users require only basic info from route table (e.g. "does the rtentry interface match with the interface I have?", "what is the MTU?", "Give me the IPv4 source address to use", etc.). Instead of hand-rolling lookups, checking if rtentry is up, valid, dealing with IPv6 mtu, finding "address" ifp (almost never done right), provide easy-to-use API hiding all the complexity and returning the needed info into small on-stack structure. This change also helps hiding route subsystem internals (locking, direct rtentry accesses). Additionally, using this API improves lookup performance since rtentry is not locked. (This is safe, since all the rtentry changes happen under both radix WLOCK and rtentry WLOCK). Sponsored by: Yandex LLC	2015-12-08 10:50:03 +00:00
Alexander V. Chernikov	f8aee88f0b	Remove LLE read lock from IPv4 fast path. LLE structure is mostly unchanged during its lifecycle. To be more specific, there are 2 things relevant for fast path lookup code: 1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK. 2) Some sort of feedback indicating that this particular entry is used so we re-send arp request to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no. The latter is solved by the following changes: 1) introduce special r_skip_req field which is read lockless by fast path, but updated under (new) req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0. 2) introduce simple state machine: incomplete->reachable<->verify->deleted. Before that we implicitly had incomplete->reachable->deleted state machine, with V_arpt_keep between "reachable" and "deleted". Verification was performed in runtime 5 seconds before V_arpt_keep expire. This is changed to "change state to verify 5 seconds before V_arpt_keep, set r_skip_req to non-zero value and check it every second". If the value is zero - then send arp verification probe. These changes do not introduce any significant control plane overhead: typically lle callout timer would fire 1 time more each V_arpt_keep (1200s) for used lles and up to arp_maxtries (5) for dead lles. As a result, all packets towards "reachable" lle are handled by fast path without acquiring lle read lock. Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler might keep LLE lock for significant amount of time, which might not be feasible for fast path locking (e.g. having rmlock as ether AFDATA or lltable own lock). Differential Revision: https://reviews.freebsd.org/D3688	2015-12-05 09:50:37 +00:00
Alexander V. Chernikov	4b3dc89847	Move RTF_PINNED handling to generic route code. This eliminates last RTF_RNH_LOCKED rtrequest1_fib() user.	2015-12-02 08:17:31 +00:00
Enji Cooper	af5c99e53f	Fix LINT-NOIP kernels after r291467 rn is only used if INET or INET6 are defined Sponsored by: EMC / Isilon Storage Division	2015-12-01 05:59:53 +00:00
Alexander V. Chernikov	674e0823c1	Move flowtable rte checks to separate function.	2015-11-30 05:59:22 +00:00
Alexander V. Chernikov	e8b0643eee	Add new rt_foreach_fib_walk_del() function for deleting route entries by filter function instead of picking into routing table details in each consumer. Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED user). This simplifies future nexthops/multipath changes and rtrequest1_fib() locking refactoring. Actual changes: Add "rt_chain" field to permit rte grouping while doing batched delete from routing table (thus growing rte 200->208 on amd64). Add "rti_filter" / "rti_filterdata" / "rti_spare" fields to rt_addrinfo to pass filter function to various routing subsystems in standard way. Convert all rt_expunge() customers to new rt_addrinfo-based api and eliminate rt_expunge().	2015-11-30 05:51:14 +00:00
Enji Cooper	766b4e4b5c	Fix building sys/modules/if_enc by adding missing headers X-MFC with: r291292, r291299 (if that ever happens) Pointyhat to: ae	2015-11-25 21:16:10 +00:00
Andrey V. Elsukov	03b7b4bf05	Fix the build.	2015-11-25 11:31:07 +00:00
Andrey V. Elsukov	ef91a9765d	Overhaul if_enc(4) and make it loadable in run-time. Use hhook(9) framework to achieve ability of loading and unloading if_enc(4) kernel module. INET and INET6 code on initialization registers two helper hooks points in the kernel. if_enc(4) module uses these helper hook points and registers its hooks. IPSEC code uses these hhook points to call helper hooks implemented in if_enc(4).	2015-11-25 07:31:59 +00:00
Fabien Thomas	d6d3f24890	Implement the sadb_x_policy_priority field as it is done in Linux: lower priority policies are inserted first. Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu> Reviewed by: ae Sponsored by: Stormshield	2015-11-17 14:39:33 +00:00
Alexander V. Chernikov	e4790abf19	Pass provided af instead of AF_UNSPEC to setwa_f callback.	2015-11-14 18:16:17 +00:00
Alexander V. Chernikov	8ad43f2d0a	Move iflladdr_event eventhandler invocation to if_setlladdr. Suggested by: glebius	2015-11-14 13:34:03 +00:00
Randall Stewart	7c4676ddee	This fixes several places where callout_stop's return is examined. The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions. MFC after: 1 Month with associated callout change.	2015-11-13 22:51:35 +00:00
Alexander V. Chernikov	b13c5b5db2	Use lladdr_event to propagate gratuitous arp. Differential Revision: https://reviews.freebsd.org/D4019	2015-11-09 10:11:14 +00:00
Alexander V. Chernikov	ddd208f7ad	Unify setting lladdr for AF_INET[6].	2015-11-07 11:12:00 +00:00
Steven Hartland	c1be893c44	Add sysctl to control LACP strict compliance default Add net.link.lagg.lacp.default_strict_mode which defines the default value for LACP strict compliance for created lagg devices. Also: * Add lacp_strict option to ifconfig(8). * Fix lagg(4) creation examples. * Minor style(9) fix. MFC after: 1 week	2015-11-06 15:33:27 +00:00
George V. Neville-Neil	33872124a5	Replace the fastforward path with tryforward which does not require a sysctl and will always be on. The former split between default and fast forwarding is removed by this commit while preserving the ability to use all network stack features. Differential Revision: https://reviews.freebsd.org/D4042 Reviewed by: ae, melifaro, olivier, rwatson MFC after: 1 month Sponsored by: Rubicon Communications (Netgate)	2015-11-05 07:26:32 +00:00
Randall Stewart	d1a6f62c45	Fix three flowtable bugs, a) one lookup issue, b) two cleaner issues. MFC after: 3 days Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D4014	2015-11-02 21:21:00 +00:00

3922 Commits