freebsd-skq

Author	SHA1	Message	Date
Alexander V. Chernikov	6db47af467	Rename rt_msg1() to more handy rtsock_msg_mbuf(). (Just for history purposes: rt_msg2() was renamed to rtsock_msg_buffer() in r265019). Sponsored by: Yandex LLC MFC after: 1 month	2014-05-08 13:54:57 +00:00
Alexander V. Chernikov	3deb3649d5	Fix incorrect netmasks being passed via rtsock. Since radix has been ignoring sa_family in passed sockaddrs, no one ever has bothered filling valid sa_family in netmasks. Additionally, radix adjusts sa_len field in every netmask not to compare zero bytes at all. This leads us to rt_mask with sa_family of AF_UNSPEC (-1) and arbitrary sa_len field (0 for default route, for example). However, rtsock have been passing that rt_mask intact for ages, requiring all rtsock consumers to make ther own local hacks. We even have unfixed on in base: do `route -n monitor` in one window and issue `route -n get addr` for some directly-connected address. You will probably see the following: got message of size 304 on Thu May 8 15:06:06 2014 RTM_GET: Report Metrics: len 304, pid: 30493, seq 1, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA> 10.0.0.0 link#1 (255) ffff ffff ff em0:8.0.27.c5.29.d4 10.0.0.92 _________________^^^^^^^^^^^^^^^^^^ after the change: got message of size 312 on Thu May 8 15:44:07 2014 RTM_GET: Report Metrics: len 312, pid: 2895, seq 1, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA> 10.0.0.0 link#1 255.255.255.0 em0:8.0.27.c5.29.d4 10.0.0.92 _________________^^^^^^^^^^^^^^^^^^ Sponsored by: Yandex LLC MFC after: 1 month	2014-05-08 11:56:06 +00:00
Alexander V. Chernikov	c9f98940b9	Fix sysctl_ifmalist() broken in r265019. Reported by: Olivier Cochard-Labbé MFC with: r265019	2014-05-03 17:57:06 +00:00
Alexander V. Chernikov	972ed56a33	Remove additional fib checks from rtalloc1_fib. It looks like current consumers are either unaware of MRT (and uses RT_DEFAULT_FIB implicitly) or know what thay are doing, In latter case they will be either hit by KASSERT or ESCRH will be returned due to NULL rnh.	2014-05-03 16:38:05 +00:00
Alexander V. Chernikov	b980262e63	Pass radix head ptr along with rte to rtexpunge(). Rename rtexpunge to rt_expunge().	2014-05-03 16:28:54 +00:00
Alan Somers	f544a74870	Fix a panic caused by doing "ifconfig -am" while a lagg is being destroyed. The thread that is destroying the lagg has already set sc->sc_psc=NULL when the "ifconfig -am" thread gets to lacp_req(). It tries to dereference sc->sc_psc and panics. The solution is for lacp_req() to check the value of sc->sc_psc. If NULL, harmlessly return an lacp_opreq structure full of zeros. Full details in GNATS. PR: kern/189003 Reviewed by: timeout on freebsd-net@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-05-02 16:24:09 +00:00
Alexander V. Chernikov	32fb15e802	Fix rnh_walktree_from() function (patch from kern/174959). Require valid netmask to be passed since host route is always a leaf. PR: kern/174959 Submitted by: Keith Sklower MFC after: 2 weeks	2014-05-01 15:04:32 +00:00
Alexander V. Chernikov	d9437c0f46	Partially revert r265019 - allocating 512 bytes on stack can be too much for architectures like ARM. Always use rounded malloc instead. Discussed with: jmallett MFC after: 4 weeks	2014-04-29 19:48:11 +00:00
Alexander V. Chernikov	0fb9298db9	Move rt_setmetrics() from rtsock.c to route.c. All rtsock-initiated rte creation/modification are now performed in route.c holding radix tree write lock. This reduces the need for per-rte mutex. Sponsored by: Yandex LLC MFC after: 1 month	2014-04-29 19:14:42 +00:00
Alexander V. Chernikov	a713ee5cf7	Do not use senderr() in rtrequest1_fib_change(). Suggested by: glebius MFC after: 4 weeks	2014-04-29 12:52:36 +00:00
Alexander V. Chernikov	de46b2c650	Fix build Found by: ian Pointyhat to: me	2014-04-27 21:17:54 +00:00
Alexander V. Chernikov	f2e5eb368a	Improve memory allocation model for rt_msg2() rtsock messages: * memory is now allocated as early as possible, without holding locks. * sysctl users are now guaranteed to get a response (M_WAITOK buffer prealloc). * socket users are more likely to use on-stack buffer for replies. * standard kernel malloc/free functions are now used instead of radix wrappers. rt_msg2() has been renamed to rtsock_msg_buffer(). MFC after: 1 month	2014-04-27 17:41:18 +00:00
Alexander V. Chernikov	f1fcb55271	Remove useless zeroing of RTAX_DST on error. Cleanup a bit. MFC after: 1 month	2014-04-27 10:43:48 +00:00
Alexander V. Chernikov	92c227af54	Cleanup route_output() a bit. MFC after: 1 month	2014-04-27 10:20:37 +00:00
Alexander V. Chernikov	2277c5e5e2	Do not delay freeing rtm. Bandaid added in r227061 is not needed since r227061, MFC after: 1 month	2014-04-27 09:49:35 +00:00
Alexander V. Chernikov	f5d9a6964d	Move up fibnum to ensure it is always defined. Found by: ian MFC with: r264987	2014-04-27 02:20:09 +00:00
Alexander V. Chernikov	f59c6cb0fc	Remove useless `register' declarations. MFC after: 1 month	2014-04-26 22:42:21 +00:00
Alexander V. Chernikov	773aa0533b	Determine fibnum once in the beginning of route_output(). MFC after: 1 month	2014-04-26 22:32:04 +00:00
Alexander V. Chernikov	c77462dd64	Decouple RTM_CHANGE from RTM_GET handling in rtsock.c:route_output(). RTM_CHANGE is now handled inside route.c:rtrequest1_fib() as it should be. Note change change handler is a separate function rtrequest1_fib_change(). MFC after: 1 month	2014-04-26 21:03:41 +00:00
Alexander V. Chernikov	36d55f0f9d	Unify sa_equal() macro usage. MFC after: 2 weeks	2014-04-26 14:52:03 +00:00
Alan Somers	0cfee0c223	Fix subnet and default routes on different FIBs on the same subnet. These two bugs are closely related. The root cause is that ifa_ifwithnet does not consider FIBs when searching for an interface address. sys/net/if_var.h sys/net/if.c Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr. Those functions will only return an address whose interface fib equals the argument. sys/net/route.c Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib arguments. sys/netinet/in.c Update in_addprefix to consider the interface fib when adding prefixes. This will prevent it from not adding a subnet route when one already exists on a different fib. sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/netinet6/nd6.c Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet. In some cases it there wasn't a clear specific fib number to use. In others, I was unable to test those functions so I chose RT_DEFAULT_FIB to minimize divergence from current behavior. I will fix some of the latter changes along with PR kern/187553. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c tests/sys/netinet/Makefile Revert r263738. The udp_dontroute test was right all along. However, bugs kern/187550 and kern/187553 cancelled each other out when it came to this test. Because of kern/187553, ifa_ifwithnet searched the default fib instead of the requested one, but because of kern/187550, there was an applicable subnet route on the default fib. The new test added in r263738 doesn't work right, however. I can verify with dtrace that ifa_ifwithnet returned the wrong address before I applied this commit, but route(8) miraculously found the correct interface to use anyway. I don't know how. Clear expected failure messages for kern/187550 and kern/187552. PR: kern/187550 PR: kern/187552 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic	2014-04-24 23:56:56 +00:00
Alan Somers	0489b8916e	Fix host and network routes for new interfaces when net.add_addr_allfibs=0 sys/net/route.c In rtinit1, use the interface fib instead of the process fib. The latter wasn't very useful because ifconfig(8) is usually invoked with the default process fib. Changing ifconfig(8) to use setfib(2) would be redundant, because it already sets the interface fib. tests/sys/netinet/fibs_test.sh Clear the expected ATF failure sys/net/if.c Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib sys/netinet/in.c sys/net/if_var.h Add a fibnum argument to ifa_switch_loopback_route, a subroutine of in_scrubprefix. Pass it the interface fib. PR: kern/187549 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-04-24 17:23:16 +00:00
Martin Matuska	ecb47cf9c5	Backport from projects/pf r263908: De-virtualize UMA zone pf_mtag_z and move to global initialization part. The m_tag struct does not know about vnet context and the pf_mtag_free() callback is called unaware of current vnet. This causes a panic. MFC after: 1 week	2014-04-20 09:17:48 +00:00
John-Mark Gurney	de51fbd695	garbage collect something that hasn't been triggered in almost 5 years... the last consumer was removed a couple years ago...	2014-04-19 19:08:08 +00:00
Rick Macklem	209579aeac	For NFS mounts using rsize,wsize=65536 over TSO enabled network interfaces limited to 32 transmit segments, there are two known issues. The more serious one is that for an I/O of slightly less than 64K, the net device driver prepends an ethernet header, resulting in a TSO segment slightly larger than 64K. Since m_defrag() copies this into 33 mbuf clusters, the transmit fails with EFBIG. A tester indicated observing a similar failure using iSCSI. The second less critical problem is that the network device driver must copy the mbuf chain via m_defrag() (m_collapse() is not sufficient), resulting in measurable overhead. This patch reduces the default size of if_hw_tsomax slightly, so that the first issue is avoided. Fixing the second issue will require a way for the network device driver to inform tcp_output() that it is limited to 32 transmit segments. Reported and tested by: csforgeron@gmail.com, markus.gebert@hostpoint.ch MFC after: 2 weeks	2014-04-17 23:31:50 +00:00
Rick Macklem	bd9fd667c4	Vlan did not set the value of if_hw_tsomax, so when vlan was stacked on top of a network interface that set if_hw_tsomax, tcp_output() would see the default value instead of the value set by the network interface. This patch modifies vlan so that it sets if_hw_tsomax to the value of the parent interface. Reviewed by: glebius MFC after: 2 weeks	2014-04-15 21:48:35 +00:00
Rick Macklem	d092e11c6a	Fix build for non-INET that was broken by r264469. MFC after: 2 weeks	2014-04-15 13:28:54 +00:00
Rick Macklem	9387570f8a	Lagg did not set the value of if_hw_tsomax, so when lagg was stacked on top of network interfaces that set if_hw_tsomax, tcp_output() would see the default value instead of the value set by the network interface(s). This patch modifies lagg so that it sets if_hw_tsomax to the minimum of the value(s) for the underlying network interfaces. Reviewed by: glebius MFC after: 2 weeks	2014-04-14 20:34:48 +00:00
Bruce M Simpson	d565e5f206	In if_freemulti(), relax the paranoid KASSERT() on ifma->ifma_protospec. This KASSERT() existed as a sanity check that upper layers in the network stack (e.g. inet, inet6) had released their reference to the underlying driver's multicast memberships (ifmultiaddr{}). However it assumes the lifecycle of the driver membership corresponds to the lifecycle of the network layer membership. In the submitter's case, ieee80211_ioctl_updatemulti() attempts to reprogram the (parent, physical) ifnet{} memberships in response to a change in membership on the (child, virtual) VAP ifnet, using a batched update mechanism. These updates happen independently from the network layer, causing a "false negative" assertion failure. There are possibly other use cases where this KASSERT() may be triggered by other networking stack activity (e.g. where a nesting relationship exists between multiple ifnet{} instances). This suggests that further review of FreeBSD's approach to nested ifnet relationships is needed. MFC after: 6 weeks Submitted by: adrian@	2014-04-10 18:43:02 +00:00
Michael Tuexen	7f946da063	Call sctp_addr_change() from rt_addrmsg() instead of rt_newaddrmsg_fib(), since rt_addrmsg() gets also called from other functions. MFC after: 3 days	2014-04-07 21:28:21 +00:00
Martin Matuska	7e92ce7380	De-virtualize UMA zone pf_mtag_z and move to global initialization part. The m_tag struct does not know about vnet context and the pf_mtag_free() callback is called unaware of current vnet. This causes a panic. Reviewed by: Nikos Vassiliadis, trociny@	2014-03-29 09:05:25 +00:00
Martin Matuska	1709ccf9d3	Merge head up to r263906.	2014-03-29 08:39:53 +00:00
Martin Matuska	d318d97fb5	Merge from projects/pf r251993 (glebius@): De-vnet hash sizes and hash masks. Submitted by: Nikos Vassiliadis <nvass gmx.com> Reviewed by: trociny MFC after: 1 month	2014-03-25 06:55:53 +00:00
Navdeep Parhar	61b9f217c6	Add a shorter alias for if_data.ifi_oqdrops.	2014-03-20 02:23:52 +00:00
Julio Merino	b6d88aa39b	Include strings.h so that bpf_filter.c can be built in userland. This is to bring in a definition for bzero(3), which in turn allows the tests in tools/regression/bpf/ to build again.	2014-03-19 13:10:25 +00:00
Gleb Smirnoff	2d70c0ded1	When exporting ifnet via sysctl, add ifqueue(9) drop count to the ifi_oqdrops. This is a temporary workaround until ifqueue(9) vanishes. While here, remove the pointless ifi_vhid assignment. It has sense only when we are exporting ifaddrs, not ifnets. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-19 06:08:03 +00:00
Gleb Smirnoff	66dcee729c	Garbage collect long time obsoleted (or never used) stuff from routing API.	2014-03-15 06:49:32 +00:00
Robert Watson	7527624efa	Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets. (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs. (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid()to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed. (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context). (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead. Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement. No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers. Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge)	2014-03-15 00:57:50 +00:00
Gleb Smirnoff	45c203fce2	Remove AppleTalk support. AppleTalk was a network transport protocol for Apple Macintosh devices in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was a legacy protocol and primary networking protocol is TCP/IP. The last Mac OS X release to support AppleTalk happened in 2009. The same year routing equipment vendors (namely Cisco) end their support. Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 06:29:43 +00:00
Gleb Smirnoff	2c284d9395	Remove IPX support. IPX was a network transport protocol in Novell's NetWare network operating system from late 80s and then 90s. The NetWare itself switched to TCP/IP as default transport in 1998. Later, in this century the Novell Open Enterprise Server became successor of Novell NetWare. The last release that claimed to still support IPX was OES 2 in 2007. Routing equipment vendors (e.g. Cisco) discontinued support for IPX in 2011. Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 02:58:48 +00:00
Gleb Smirnoff	b245f96c44	Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in the r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed. o Remove the if_baudrate_pf crutch. o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) are by no means MD. And it is a bug that on amd64 we've got a 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second. This also removes quite a lot of COMPAT_FREEBSD32 code. o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes. o Give 32 bits to ifi_mtu and ifi_metric. o Give 64 bits to the rest of fields, since they are counters. __FreeBSD_version bumped. Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-13 03:42:24 +00:00
Gleb Smirnoff	256ea2aba9	The route code used to mtx_destroy() a locked mutex before rtentry free. Now, after r262763 it started to return locked mutexes to UMA. To fix that, conditionally unlock the mutex in the destructor. Tested by: "Sergey V. Dyatko" <sergey.dyatko@gmail.com>	2014-03-05 21:16:46 +00:00
Gleb Smirnoff	f788ab5f0b	Pacify gcc.	2014-03-05 02:35:15 +00:00
Gleb Smirnoff	5274e55eb3	Hide struct rtentry from userland.	2014-03-05 01:47:08 +00:00
Gleb Smirnoff	e3a7aa6f56	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
Gleb Smirnoff	fb3541ad15	Instead of playing games with casts simply add 3 more members to the structure pf_rule, that are used when the structure is passed via ioctl(). PR: 187074	2014-03-05 00:40:03 +00:00
George V. Neville-Neil	9b5f5ede41	Revert previous commit (262727) and bounce patch back to the submitter. Pointed out by: jhb	2014-03-04 23:55:04 +00:00
George V. Neville-Neil	596031c052	Naming consistency fix. The routing code defines RADIX_NODE_HEAD_LOCK as grabbing the write lock, but RADIX_NODE_HEAD_LOCK_ASSERT as checking the read lock. Submitted by: Vijay Singh <vijju.singh at gmail.com> MFC after: 1 month	2014-03-04 05:09:46 +00:00
John Baldwin	5b26ea5df3	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
Martin Matuska	5748b897da	Merge head up to r262222 (last merge was incomplete).	2014-02-19 22:02:15 +00:00

1 2 3 4 5 ...

3173 Commits