freebsd-skq

Author	SHA1	Message	Date
hrs	3297c817fa	Virtualize if_edsc(4).	2014-10-05 21:27:26 +00:00
hrs	70e14abeb8	Virtualize if_disc(4) cloner.	2014-10-05 19:46:52 +00:00
hrs	0826f2b25d	Virtualize if_bridge(4) cloner.	2014-10-05 19:43:37 +00:00
hrs	d278f7c187	Use printb() for boolean flags in ro_opts and actor_state for LACP.	2014-10-05 02:37:01 +00:00
hrs	2edb8f7e2f	- Move L2 addr configuration for the primary port to a taskqueue. This fixes LOR of softc rmlock in iflladdr_event handlers. - Call if_delmulti_ifma() after LACP_UNLOCK(). This fixes another LOR. - Fix a panic in lacp_transit_expire(). - Fix a panic in lagg_input() upon shutting down a port.	2014-10-05 02:34:21 +00:00
hrs	db53b4f174	Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for backward compatibility with old ifconfig(8).	2014-10-02 20:01:13 +00:00
hrs	d30b551ba7	Virtualize net.link.vlan.soft_pad.	2014-10-02 05:56:17 +00:00
hrs	667a2b7369	Virtualize lagg(4) cloner. This change fixes a panic when tearing down if_lagg(4) interfaces which were cloned in a vnet jail. Sysctl nodes which are dynamically generated for each cloned interface (net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift ifconfig(8) parameters have been added instead. Flags and per-interface statistics counters are displayed in "ifconfig -v". CR: D842	2014-10-01 21:37:32 +00:00
melifaro	e6ca9a3b21	Free radix mask entries on main radix destroy. This is temporary commit to be merged to 10. Other approach (like hash table) should be used to store different masks. PR: 194078 Submitted by: Rumen Telbizov MFC after: 3 days	2014-10-01 21:24:58 +00:00
melifaro	d8b683d70f	Remove lock init from radix.c. Radix has never managed its locking itself. The only consumer using radix with embeded rwlock is system routing table. Move per-AF lock inits there.	2014-10-01 14:39:06 +00:00
glebius	57bac09d3f	Fix off by one in lagg_port_destroy(). Reported by: "Max N. Boyarov" <zotrix bsd.by>	2014-10-01 11:23:54 +00:00
bz	aab771d812	Move the unconditional #include of net/ifq.h to the very end of file. This seems to allow us to pass a universe with either clang or gcc after r272244 (and r272260) and probably makes it easier to untabgle these chained #includes in the future.	2014-09-28 17:09:40 +00:00
bz	abef5517f6	Remove duplicate declaraton of the if_inc_counter() function after r272244. if_var.h has the expected on and if_var.h include ifq.h and thus we get duplicates. It seems only one cavium ethernet file actually includes ifq.h directly which might be another cleanup to be done but need to test first.	2014-09-28 15:38:21 +00:00
glebius	0f9d61b26b	- Remove empty wrappers ether_poll_[de]register_drv(). [1] - Move polling(9) declarations out of ifq.h back to if_var.h they are absolutely unrelated to queues. Submitted by: Mikhail <mp lenta.ru> [1]	2014-09-28 14:05:18 +00:00
glebius	2cb6078939	Finally, convert counters in struct ifnet to counter(9). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-28 08:57:07 +00:00
glebius	53d7fba29f	Convert to if_inc_counter() last remnantes of bare access to struct ifnet counters.	2014-09-28 07:43:38 +00:00
melifaro	7d70b89c51	Use underlying ports counters to get lagg statistics instead of per-packet accounting. This introduce user-visible changes like aggregating error counters. Reviewed by: asomers (prev.version), glebius CR: D781 MFC after: 2 weeks Sponsored by: Yandex LLC	2014-09-27 13:57:48 +00:00
glebius	58a4ee184a	Remove macros that hide access to struct ifnet fields.	2014-09-26 13:02:29 +00:00
glebius	f564c3e730	Make all lagg protocol methods live in lagg_protos, not in softc. All interfaces of a same protocol, use the same methods. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-26 12:54:24 +00:00
ae	530a56d2e5	Keep list of lagg ports sorted by if_index. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2014-09-26 12:42:06 +00:00
glebius	7f6197c96b	- Whitespace. - Remove caddr_t.	2014-09-26 12:35:58 +00:00
glebius	62993359de	- Provide lagg_proto_attach(), lagg_proto_detach(). - Make detach a protocol method in lagg_protos. - Simplify code to lookup protocols. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-26 11:01:04 +00:00
glebius	680ed8e05c	- When reconfiguring protocol on a lagg, first set it to LAGG_PROTO_NONE, then drop lock, run the attach routines, and then set it to specific proto. This removes tons of WITNESS warnings. - Make lagg protocol attach handlers not failing and allocate memory with M_WAITOK. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-26 08:42:32 +00:00
glebius	ee9b35f736	Make lagg protos a enum.	2014-09-26 08:12:12 +00:00
glebius	e30ec249f1	Make lagg protocols detach methods returning void. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-26 07:12:40 +00:00
hselasky	bdacf9ba4d	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. Reviewed by: adrian, rmacklem Sponsored by: Mellanox Technologies MFC after: 1 week	2014-09-22 08:27:27 +00:00
hrs	3eeeb7c9a3	Fix build.	2014-09-21 07:16:51 +00:00
hrs	ffad09823e	- Virtualize interface cloner for gre(4). This fixes a panic when destroying a vnet jail which has a gre(4) interface. - Make net.link.gre.max_nesting vnet-local.	2014-09-21 03:56:06 +00:00
hrs	05fa00b397	Virtualize interface cloner for gif(4). This fixes a panic when destroying a vnet jail which has a gif(4) interface.	2014-09-21 03:55:04 +00:00
hrs	5e2751fde9	Make net.add_addr_allfibs vnet-local.	2014-09-21 03:48:20 +00:00
glebius	f2cafe032f	Mechanically convert to if_inc_counter().	2014-09-19 10:39:58 +00:00
glebius	72f04611ec	Remove ifq_drops from struct ifqueue. Now queue drops are accounted in struct ifnet if_oqdrops. Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops is simply removed from them. There were no API to read this statistic. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-19 09:01:19 +00:00
glebius	b010d64973	Increase errors, not queue drops, in cases the module is supplied with a bad packet or if mbuf allocation failes.	2014-09-19 05:43:38 +00:00
glebius	de25153d59	Remove a bunch of methods that are superseded by if_inc_counter(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-18 16:17:20 +00:00
glebius	c2d27a81fe	While not too late rename 'ifnet_counter' to 'ift_counter'. One of the imporant moments that we discussed with Marcel and Anuranjan was that a converted driver should return false for 'grep ifnet if_driver.c' :) Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-18 14:47:13 +00:00
glebius	0917a065ca	Add a function to set if_get_counter method for an ifnet. To be used in the drivers that are already converted to "Juniper drvapi". This can be revisited in future.	2014-09-18 14:38:28 +00:00
glebius	bf71125f67	While not too late rename if_get_counter_compat() to if_get_counter_default(). The compat counters will go away, but the function will remain in its place, and in all places where it is going to be called. Discussed with: melifaro	2014-09-18 10:01:56 +00:00
glebius	f76e492f6d	Add if_inc_counter(), a generic method to update ifnet(9) counter w/o dereferencing the struct. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-09-18 09:54:57 +00:00
araujo	36c415243f	Revert r271735. The comment is absolutely correct, we do not support 802.1p priority tagging. I got confused with the packet tagged and packet to be tagged. Spotted by: glebius	2014-09-18 05:43:19 +00:00
araujo	80e51d9d11	Remove old comment, we already do 802.1q tagging. Phabric: D797 Reviewed by: kevlo Approved by: kevlo Sponsored by: QNAP Systems Inc.	2014-09-18 03:09:34 +00:00
araujo	d7a9c633d7	Add laggproto broadcast, it allows sends frames to all ports of the lagg(4) group and receives frames on any port of the lagg(4). Phabric: D549 Reviewed by: glebius, thompsa Approved by: glebius Obtained from: OpenBSD Sponsored by: QNAP Systems Inc.	2014-09-18 02:12:48 +00:00
melifaro	f0ab9ab876	* Fix if_omcast handling * Convert if_oerrors to pcpu. Suggested by: glebius MFC after: 2 weeks	2014-09-16 21:48:48 +00:00
hselasky	727760a4e4	Revert r271504. A new patch to solve this issue will be made. Suggested by: adrian @	2014-09-13 20:52:01 +00:00
melifaro	bf4280c0a8	Switch if_vlan(4) to rmlock. MFC after: 2 weeks	2014-09-13 18:41:24 +00:00
melifaro	a31da764ba	Switch if_vlan(4) to use counter(9) using new if_get_counter api.	2014-09-13 18:13:08 +00:00
hselasky	3d04a989df	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. MFC after: 1 week Sponsored by: Mellanox Technologies	2014-09-13 08:26:09 +00:00
asomers	081aa8a15c	Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and ifa_ifwithdstaddr. For the sake of backwards compatibility, the new arguments were added to new functions named ifa_ifwithnet_fib and ifa_ifwithdstaddr_fib, while the old functions became wrappers around the new ones that passed RT_ALL_FIBS for the fib argument. However, the backwards compatibility is not desired for FreeBSD 11, because there are numerous other incompatible changes to the ifnet(9) API. We therefore decided to remove it from head but leave it in place for stable/9 and stable/10. In addition, this commit adds the fib argument to ifa_ifwithbroadaddr for consistency's sake. sys/sys/param.h Increment __FreeBSD_version sys/net/if.c sys/net/if_var.h sys/net/route.c Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute. sys/net/route.c sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_options.c sys/netinet/ip_output.c sys/netinet6/nd6.c Fixup calls of modified functions. share/man/man9/ifnet.9 Document changed API. CR: https://reviews.freebsd.org/D458 MFC after: Never Sponsored by: Spectra Logic	2014-09-11 20:21:03 +00:00
adrian	b73995e058	Update the IPv4 input path to handle reassembled frames and incoming frames with no RSS hash. When doing RSS: * Create a new IPv4 netisr which expects the frames to have been verified; it just directly dispatches to the IPv4 input path. * Once IPv4 reassembly is done, re-calculate the RSS hash with the new IP and L3 header; then reinject it as appropriate. * Update the IPv4 netisr to be a CPU affinity netisr with the RSS hash function (rss_soft_m2cpuid) - this will do a software hash if the hardware doesn't provide one. NICs that don't implement hardware RSS hashing will now benefit from RSS distribution - it'll inject into the correct destination netisr. Note: the netisr distribution doesn't work out of the box - netisr doesn't query RSS for how many CPUs and the affinity setup. Yes, netisr likely shouldn't really be doing CPU stuff anymore and should be "some kind of 'thing' that is a workqueue that may or may not have any CPU affinity"; that's for a later commit. Differential Revision: https://reviews.freebsd.org/D527 Reviewed by: grehan	2014-09-09 04:18:20 +00:00
glebius	2e01608625	Clean up unused CSUM_FRAGMENT. Sponsored by: Nginx, Inc.	2014-09-03 08:30:18 +00:00
glebius	9dfcf3eeb1	Toss fields so that no padding field is required to achieve alignment.	2014-08-31 13:30:54 +00:00
glebius	833eb3c331	It is actually possible to have if_t a typedef to non-void type, and keep both converted to drvapi and non-converted drivers compilable. o Make if_t typedef to struct ifnet *. o Remove shim functions. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-08-31 12:48:13 +00:00
glebius	3b5ede57e9	Provide pointer from struct ifnet to struct netmap_adapter, instead of abusing spare field.	2014-08-31 11:33:19 +00:00
glebius	70b7c46209	o Remove struct if_data from struct ifnet. Now it is merely API structure for route(4) socket and ifmib(4) sysctl. o Move fields from if_data to ifnet, but keep all statistic counters separate, since they should disappear later. o Provide function if_data_copy() to fill if_data, utilize it in routing socket and ifmib handler. o Provide overridable ifnet(9) method to fetch counters. If no provided, if_get_counters_compat() would be used, that returns old counters. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-08-31 06:46:21 +00:00
glebius	73b170d619	Remove ability to write to struct if_data residing in struct ifnet via net.link.generic.IFMIB_IFDATA.*.IFDATA_GENERAL sysctl. Reasons for removal are: - No code in tree uses this possibility. - The documentation ifmib(4) doesn't say that such possibility exist. The example provided in manual page only reads data. - On many interfaces the feature simply doesn't work, since they do accounting in hardware, and overwrite if_data on tick. Sponsored by: Nginx, Inc.	2014-08-31 06:23:54 +00:00
melifaro	69a7dea554	* Add SIOCGI2C driver ioctl used to retrieve i2c info. * Convert ixgbe to use this ioctl * Convert ifconfig to use generic i2c handler for "ix" interfaces. Approved by: Eric Joyner (ixgbe part) MFC after: 2 weeks Sponsored by: Yandex LLC	2014-08-29 18:02:58 +00:00
melifaro	81b97334ef	* Add new net/sff8436.h containing constants used to access QSFP+ data via i2c inteface. These constants has been taken from SFF-8436 "QSFP+ 10 Gbs 4X PLUGGABLE TRANSCEIVER" standard rev 4.8. * Add support for printing QSFP+ information from 40G NICs such as Chelsio T5. This commit does not contain ioctl changes necessary for this functionality work, there will be another commit soon. Example: cxl1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,.....> ether 00:07:43:28:ad:08 nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> media: Ethernet 40Gbase-LR4 <full-duplex> status: active plugged: QSFP+ 40GBASE-LR4 (MPO Parallel Optic) vendor: OEM PN: OP-QSFP-40G-LR4 SN: 20140318001 DATE: 2014-03-18 module temperature: 64.06 C voltage: 3.26 Volts lane 1: RX: 0.47 mW (-3.21 dBm) TX: 2.78 mW (4.46 dBm) lane 2: RX: 0.20 mW (-6.94 dBm) TX: 2.80 mW (4.47 dBm) lane 3: RX: 0.18 mW (-7.38 dBm) TX: 2.79 mW (4.47 dBm) lane 4: RX: 0.90 mW (-0.45 dBm) TX: 2.80 mW (4.48 dBm) Tested on: Chelsio T5 Tested on: Mellanox/Huawei passive/active cables/transceivers. MFC after: 2 weeks Sponsored by: Yandex LLC	2014-08-21 17:54:42 +00:00
melifaro	3142dab2d4	* Use standard net/sff8472.h header for sff bits and offsets. * Convert sff_8472_id to 'const char *' to please clang. Pointed by: np	2014-08-16 21:53:44 +00:00
luigi	3ab69a246b	Update to the current version of netmap. Mostly bugfixes or features developed in the past 6 months, so this is a 10.1 candidate. Basically no user API changes (some bugfixes in sys/net/netmap_user.h). In detail: 1. netmap support for virtio-net, including in netmap mode. Under bhyve and with a netmap backend [2] we reach over 1Mpps with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode. 2. (kernel) add support for multiple memory allocators, so we can better partition physical and virtual interfaces giving access to separate users. The most visible effect is one additional argument to the various kernel functions to compute buffer addresses. All netmap-supported drivers are affected, but changes are mechanical and trivial 3. (kernel) simplify the prototype for txsync() and rxsync() driver methods. All netmap drivers affected, changes mostly mechanical. 4. add support for netmap-monitor ports. Think of it as a mirroring port on a physical switch: a netmap monitor port replicates traffic present on the main port. Restrictions apply. Drive carefully. 5. if_lem.c: support for various paravirtualization features, experimental and disabled by default. Most of these are described in our ANCS'13 paper [1]. Paravirtualized support in netmap mode is new, and beats the numbers in the paper by a large factor (under qemu-kvm, we measured gues-host throughput up to 10-12 Mpps). A lot of refactoring and additional documentation in the files in sys/dev/netmap, but apart from #2 and #3 above, almost nothing of this stuff is visible to other kernel parts. Example programs in tools/tools/netmap have been updated with bugfixes and to support more of the existing features. This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline. A lot of this code has been contributed by my colleagues at UNIPI, including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella. MFC after: 3 days.	2014-08-16 15:00:01 +00:00
royger	6a2fcceb9c	net: move interface removal notification up in if_detach_internal This is needed to prevent having interfaces with ifp->if_addr == NULL on bridge interfaces. Moving the notification event handlers up makes sure the interfaces are removed before doing any more cleanup. Sponsored by: Citrix Systems R&D Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D598 net/if.c - Move interface removal notification up in if_detach_internal.	2014-08-16 10:47:24 +00:00
kevlo	dd40fa7e62	Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb	2014-08-15 02:43:02 +00:00
glebius	7d0b571895	- Count global pf(4) statistics in counter(9). - Do not count global number of states and of src_nodes, use uma_zone_get_cur() to obtain values. - Struct pf_status becomes merely an ioctl API structure, and moves to netpfil/pf/pf.h with its constants. - V_pf_status is now of type struct pf_kstatus. Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net> Sponsored by: InnoGames GmbH	2014-08-14 18:57:46 +00:00
araujo	9abce0e567	- Remove unneeded include. Phabric: D563 Reviewed by: kevlo Approved by: kevlo	2014-08-11 03:04:16 +00:00
kevlo	7727a3c215	Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb	2014-08-08 01:57:15 +00:00
mav	1d4e2a0972	Improve locking of multicast addresses in VLAN and LAGG interfaces. This fixes several scenarios of reproducible panics, cause by races between multicast address changes and interface destruction. MFC after: 2 weeks	2014-08-04 00:58:12 +00:00
glebius	d32e428cc3	Garbage collect couple of unused fields from struct ifaddr: - ifa_claim_addr() unused since removal of NetAtalk - ifa_metric seems to be never utilized, always a copy of if_metric	2014-07-29 15:01:29 +00:00
kevlo	940bebdcf2	Deprecate m_act. Use m_nextpkt always.	2014-07-17 05:21:16 +00:00
hselasky	35b126e324	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
gjb	fc21f40567	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
hselasky	bd1ed65f0f	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
attilio	2802c525ad	- Modify vm_page_unwire() and vm_page_enqueue() to directly accept the queue where to enqueue pages that are going to be unwired. - Add stronger checks to the enqueue/dequeue for the pagequeues when adding and removing pages to them. Of course, for unmanaged pages the queue parameter of vm_page_unwire() will be ignored, just as the active parameter today. This makes adding new pagequeues quicker. This change effectively modifies the KPI. __FreeBSD_version will be, however, bumped just when the full cache of free pages will be evicted. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2014-06-16 18:15:27 +00:00
melifaro	e9f6263cd3	Improve logic besides net.bpf.optimize_writers. Direct bpf(4) consumers should now work fine with this tunable turned on. In fact, the only case when optimized_writers can change program behavior is direct bpf(4) consumer setting its read filter to catch-all one. MFC after: 2 weeks Sponsored by: Yandex LLC	2014-06-11 11:27:44 +00:00
luigi	55c24573f4	misc bugfixes: - stdio.h is needed for fprint() - make memsize uint32_t to avoid errors due to overflow - honor the *XPOLL flagg in NIOCREGIF requests - mmap fails wit MAP_FAILED, not NULL. MFC after: 3 days	2014-06-06 15:17:19 +00:00
luigi	d33a7e50b0	whitespace change: fix one comment, remove a stale one.	2014-06-06 15:15:27 +00:00
luigi	359dad8e6c	whitespace change: remove trailing whitespace	2014-06-05 21:12:41 +00:00
marcel	916c7006f5	Introduce a procedural interface to the ifnet structure. The new interface allows the ifnet structure to be defined as an opaque type in NIC drivers. This then allows the ifnet structure to be changed without a need to change or recompile NIC drivers. Put differently, NIC drivers can be written and compiled once and be used with different network stack implementations, provided of course that those network stack implementations have an API and ABI compatible interface. This commit introduces the 'if_t' type to replace 'struct ifnet ' as the type of a network interface. The 'if_t' type is defined as 'void ' to enable the compiler to perform type conversion to 'struct ifnet *' and vice versa where needed and without warnings. The functions that implement the API are the only functions that need to have an explicit cast. The MII code has been converted to use the driver API to avoid unnecessary code churn. Code churn comes from having to work with both converted and unconverted drivers in correlation with having callback functions that take an interface. By converting the MII code first, the callback functions can be defined so that the compiler will perform the typecasts automatically. As soon as all drivers have been converted, the if_t type can be redefined as needed and the API functions can be fix to not need an explicit cast. The immediate benefactors of this change are: 1. Juniper Networks - The network stack implementation in Junos is entirely different from FreeBSD's one and this change allows Juniper to build "stock" NIC drivers that can be used in combination with both the FreeBSD and Junos stacks. 2. FreeBSD - This change opens the door towards changing ifnet and implementing new features and optimizations in the network stack without it requiring a change in the many NIC drivers FreeBSD has. Submitted by: Anuranjan Shukla <anshukla@juniper.net> Reviewed by: glebius@ Obtained from: Juniper Networks, Inc.	2014-06-02 17:54:39 +00:00
asomers	7ca8bf0f2c	Fix unintended KBI change from r264905. Add _fib versions of ifa_ifwithnet() and ifa_ifwithdstaddr() The legacy functions will call the _fib() versions with RT_ALL_FIBS, preserving legacy behavior. sys/net/if_var.h sys/net/if.c Add legacy-compatible functions as described above. Ensure legacy behavior when RT_ALL_FIBS is passed as fibnum. sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/net/route.c sys/net/rtsock.c sys/netinet6/nd6.c Call with _fib() functions if we must use a specific fib, or the legacy functions otherwise. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c Improve the udp_dontroute test. The bug that this test exercises is that ifa_ifwithnet() will return the wrong address, if multiple interfaces have addresses on the same subnet but with different fibs. The previous version of the test only considered one possible failure mode: that ifa_ifwithnet_fib() might fail to find any suitable address at all. The new version also checks whether ifa_ifwithnet_fib() finds the correct address by checking where the ARP request goes. Reported by: bz, hrs Reviewed by: hrs MFC after: 1 week X-MFC-with: 264905 Sponsored by: Spectra Logic	2014-05-29 21:03:49 +00:00
grehan	db7a034f36	Bump bhyve allocation up to 20 bits to avoid birthday-paradox style address collisions when bhyve VMs are connected to the same broadcoast domain and are using pseudo-random allocations. Reviewed by: gnn MFC after: 1 week	2014-05-20 02:59:13 +00:00
melifaro	f13915719f	Rename rt_msg1() to more handy rtsock_msg_mbuf(). (Just for history purposes: rt_msg2() was renamed to rtsock_msg_buffer() in r265019). Sponsored by: Yandex LLC MFC after: 1 month	2014-05-08 13:54:57 +00:00
melifaro	4a170f05aa	Fix incorrect netmasks being passed via rtsock. Since radix has been ignoring sa_family in passed sockaddrs, no one ever has bothered filling valid sa_family in netmasks. Additionally, radix adjusts sa_len field in every netmask not to compare zero bytes at all. This leads us to rt_mask with sa_family of AF_UNSPEC (-1) and arbitrary sa_len field (0 for default route, for example). However, rtsock have been passing that rt_mask intact for ages, requiring all rtsock consumers to make ther own local hacks. We even have unfixed on in base: do `route -n monitor` in one window and issue `route -n get addr` for some directly-connected address. You will probably see the following: got message of size 304 on Thu May 8 15:06:06 2014 RTM_GET: Report Metrics: len 304, pid: 30493, seq 1, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA> 10.0.0.0 link#1 (255) ffff ffff ff em0:8.0.27.c5.29.d4 10.0.0.92 _________________^^^^^^^^^^^^^^^^^^ after the change: got message of size 312 on Thu May 8 15:44:07 2014 RTM_GET: Report Metrics: len 312, pid: 2895, seq 1, errno 0, flags:<UP,DONE,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA> 10.0.0.0 link#1 255.255.255.0 em0:8.0.27.c5.29.d4 10.0.0.92 _________________^^^^^^^^^^^^^^^^^^ Sponsored by: Yandex LLC MFC after: 1 month	2014-05-08 11:56:06 +00:00
melifaro	1f938512ab	Fix sysctl_ifmalist() broken in r265019. Reported by: Olivier Cochard-Labbé MFC with: r265019	2014-05-03 17:57:06 +00:00
melifaro	bb33a54f34	Remove additional fib checks from rtalloc1_fib. It looks like current consumers are either unaware of MRT (and uses RT_DEFAULT_FIB implicitly) or know what thay are doing, In latter case they will be either hit by KASSERT or ESCRH will be returned due to NULL rnh.	2014-05-03 16:38:05 +00:00
melifaro	a4407f98c0	Pass radix head ptr along with rte to rtexpunge(). Rename rtexpunge to rt_expunge().	2014-05-03 16:28:54 +00:00
asomers	cf37d83a59	Fix a panic caused by doing "ifconfig -am" while a lagg is being destroyed. The thread that is destroying the lagg has already set sc->sc_psc=NULL when the "ifconfig -am" thread gets to lacp_req(). It tries to dereference sc->sc_psc and panics. The solution is for lacp_req() to check the value of sc->sc_psc. If NULL, harmlessly return an lacp_opreq structure full of zeros. Full details in GNATS. PR: kern/189003 Reviewed by: timeout on freebsd-net@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-05-02 16:24:09 +00:00
melifaro	53cfe851be	Fix rnh_walktree_from() function (patch from kern/174959). Require valid netmask to be passed since host route is always a leaf. PR: kern/174959 Submitted by: Keith Sklower MFC after: 2 weeks	2014-05-01 15:04:32 +00:00
melifaro	e75a4a90b5	Partially revert r265019 - allocating 512 bytes on stack can be too much for architectures like ARM. Always use rounded malloc instead. Discussed with: jmallett MFC after: 4 weeks	2014-04-29 19:48:11 +00:00
melifaro	1883ddc524	Move rt_setmetrics() from rtsock.c to route.c. All rtsock-initiated rte creation/modification are now performed in route.c holding radix tree write lock. This reduces the need for per-rte mutex. Sponsored by: Yandex LLC MFC after: 1 month	2014-04-29 19:14:42 +00:00
melifaro	b1337c7d4c	Do not use senderr() in rtrequest1_fib_change(). Suggested by: glebius MFC after: 4 weeks	2014-04-29 12:52:36 +00:00
melifaro	03224963a1	Fix build Found by: ian Pointyhat to: me	2014-04-27 21:17:54 +00:00
melifaro	a153bd7770	Improve memory allocation model for rt_msg2() rtsock messages: * memory is now allocated as early as possible, without holding locks. * sysctl users are now guaranteed to get a response (M_WAITOK buffer prealloc). * socket users are more likely to use on-stack buffer for replies. * standard kernel malloc/free functions are now used instead of radix wrappers. rt_msg2() has been renamed to rtsock_msg_buffer(). MFC after: 1 month	2014-04-27 17:41:18 +00:00
melifaro	c5479b1c51	Remove useless zeroing of RTAX_DST on error. Cleanup a bit. MFC after: 1 month	2014-04-27 10:43:48 +00:00
melifaro	29b944e3ac	Cleanup route_output() a bit. MFC after: 1 month	2014-04-27 10:20:37 +00:00
melifaro	bf1b5f8b0c	Do not delay freeing rtm. Bandaid added in r227061 is not needed since r227061, MFC after: 1 month	2014-04-27 09:49:35 +00:00
melifaro	f51d6fcb64	Move up fibnum to ensure it is always defined. Found by: ian MFC with: r264987	2014-04-27 02:20:09 +00:00
melifaro	5416196308	Remove useless `register' declarations. MFC after: 1 month	2014-04-26 22:42:21 +00:00
melifaro	78166405b1	Determine fibnum once in the beginning of route_output(). MFC after: 1 month	2014-04-26 22:32:04 +00:00
melifaro	e815654815	Decouple RTM_CHANGE from RTM_GET handling in rtsock.c:route_output(). RTM_CHANGE is now handled inside route.c:rtrequest1_fib() as it should be. Note change change handler is a separate function rtrequest1_fib_change(). MFC after: 1 month	2014-04-26 21:03:41 +00:00
melifaro	7b860c446e	Unify sa_equal() macro usage. MFC after: 2 weeks	2014-04-26 14:52:03 +00:00
asomers	f8a34b6f49	Fix subnet and default routes on different FIBs on the same subnet. These two bugs are closely related. The root cause is that ifa_ifwithnet does not consider FIBs when searching for an interface address. sys/net/if_var.h sys/net/if.c Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr. Those functions will only return an address whose interface fib equals the argument. sys/net/route.c Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib arguments. sys/netinet/in.c Update in_addprefix to consider the interface fib when adding prefixes. This will prevent it from not adding a subnet route when one already exists on a different fib. sys/net/rtsock.c sys/netinet/in_pcb.c sys/netinet/ip_output.c sys/netinet/ip_options.c sys/netinet6/nd6.c Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet. In some cases it there wasn't a clear specific fib number to use. In others, I was unable to test those functions so I chose RT_DEFAULT_FIB to minimize divergence from current behavior. I will fix some of the latter changes along with PR kern/187553. tests/sys/netinet/fibs_test.sh tests/sys/netinet/udp_dontroute.c tests/sys/netinet/Makefile Revert r263738. The udp_dontroute test was right all along. However, bugs kern/187550 and kern/187553 cancelled each other out when it came to this test. Because of kern/187553, ifa_ifwithnet searched the default fib instead of the requested one, but because of kern/187550, there was an applicable subnet route on the default fib. The new test added in r263738 doesn't work right, however. I can verify with dtrace that ifa_ifwithnet returned the wrong address before I applied this commit, but route(8) miraculously found the correct interface to use anyway. I don't know how. Clear expected failure messages for kern/187550 and kern/187552. PR: kern/187550 PR: kern/187552 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic	2014-04-24 23:56:56 +00:00
asomers	6e7494c7e1	Fix host and network routes for new interfaces when net.add_addr_allfibs=0 sys/net/route.c In rtinit1, use the interface fib instead of the process fib. The latter wasn't very useful because ifconfig(8) is usually invoked with the default process fib. Changing ifconfig(8) to use setfib(2) would be redundant, because it already sets the interface fib. tests/sys/netinet/fibs_test.sh Clear the expected ATF failure sys/net/if.c Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib sys/netinet/in.c sys/net/if_var.h Add a fibnum argument to ifa_switch_loopback_route, a subroutine of in_scrubprefix. Pass it the interface fib. PR: kern/187549 Reviewed by: melifaro MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-04-24 17:23:16 +00:00
mm	532d55ab5f	Backport from projects/pf r263908: De-virtualize UMA zone pf_mtag_z and move to global initialization part. The m_tag struct does not know about vnet context and the pf_mtag_free() callback is called unaware of current vnet. This causes a panic. MFC after: 1 week	2014-04-20 09:17:48 +00:00
jmg	072b24d95e	garbage collect something that hasn't been triggered in almost 5 years... the last consumer was removed a couple years ago...	2014-04-19 19:08:08 +00:00
rmacklem	512a8b24e7	For NFS mounts using rsize,wsize=65536 over TSO enabled network interfaces limited to 32 transmit segments, there are two known issues. The more serious one is that for an I/O of slightly less than 64K, the net device driver prepends an ethernet header, resulting in a TSO segment slightly larger than 64K. Since m_defrag() copies this into 33 mbuf clusters, the transmit fails with EFBIG. A tester indicated observing a similar failure using iSCSI. The second less critical problem is that the network device driver must copy the mbuf chain via m_defrag() (m_collapse() is not sufficient), resulting in measurable overhead. This patch reduces the default size of if_hw_tsomax slightly, so that the first issue is avoided. Fixing the second issue will require a way for the network device driver to inform tcp_output() that it is limited to 32 transmit segments. Reported and tested by: csforgeron@gmail.com, markus.gebert@hostpoint.ch MFC after: 2 weeks	2014-04-17 23:31:50 +00:00
rmacklem	6067137dbd	Vlan did not set the value of if_hw_tsomax, so when vlan was stacked on top of a network interface that set if_hw_tsomax, tcp_output() would see the default value instead of the value set by the network interface. This patch modifies vlan so that it sets if_hw_tsomax to the value of the parent interface. Reviewed by: glebius MFC after: 2 weeks	2014-04-15 21:48:35 +00:00
rmacklem	dc2495c46c	Fix build for non-INET that was broken by r264469. MFC after: 2 weeks	2014-04-15 13:28:54 +00:00
rmacklem	ff97df6be2	Lagg did not set the value of if_hw_tsomax, so when lagg was stacked on top of network interfaces that set if_hw_tsomax, tcp_output() would see the default value instead of the value set by the network interface(s). This patch modifies lagg so that it sets if_hw_tsomax to the minimum of the value(s) for the underlying network interfaces. Reviewed by: glebius MFC after: 2 weeks	2014-04-14 20:34:48 +00:00
bms	f923a8498a	In if_freemulti(), relax the paranoid KASSERT() on ifma->ifma_protospec. This KASSERT() existed as a sanity check that upper layers in the network stack (e.g. inet, inet6) had released their reference to the underlying driver's multicast memberships (ifmultiaddr{}). However it assumes the lifecycle of the driver membership corresponds to the lifecycle of the network layer membership. In the submitter's case, ieee80211_ioctl_updatemulti() attempts to reprogram the (parent, physical) ifnet{} memberships in response to a change in membership on the (child, virtual) VAP ifnet, using a batched update mechanism. These updates happen independently from the network layer, causing a "false negative" assertion failure. There are possibly other use cases where this KASSERT() may be triggered by other networking stack activity (e.g. where a nesting relationship exists between multiple ifnet{} instances). This suggests that further review of FreeBSD's approach to nested ifnet relationships is needed. MFC after: 6 weeks Submitted by: adrian@	2014-04-10 18:43:02 +00:00
tuexen	90c4737aa0	Call sctp_addr_change() from rt_addrmsg() instead of rt_newaddrmsg_fib(), since rt_addrmsg() gets also called from other functions. MFC after: 3 days	2014-04-07 21:28:21 +00:00
mm	c4f653f608	Merge from projects/pf r251993 (glebius@): De-vnet hash sizes and hash masks. Submitted by: Nikos Vassiliadis <nvass gmx.com> Reviewed by: trociny MFC after: 1 month	2014-03-25 06:55:53 +00:00
np	26117c8d46	Add a shorter alias for if_data.ifi_oqdrops.	2014-03-20 02:23:52 +00:00
jmmv	ca228204dc	Include strings.h so that bpf_filter.c can be built in userland. This is to bring in a definition for bzero(3), which in turn allows the tests in tools/regression/bpf/ to build again.	2014-03-19 13:10:25 +00:00
glebius	6c64d03c91	When exporting ifnet via sysctl, add ifqueue(9) drop count to the ifi_oqdrops. This is a temporary workaround until ifqueue(9) vanishes. While here, remove the pointless ifi_vhid assignment. It has sense only when we are exporting ifaddrs, not ifnets. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-19 06:08:03 +00:00
glebius	8293a6c1cc	Garbage collect long time obsoleted (or never used) stuff from routing API.	2014-03-15 06:49:32 +00:00
rwatson	f411704afc	Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets. (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs. (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid()to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed. (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context). (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead. Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement. No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers. Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge)	2014-03-15 00:57:50 +00:00
glebius	80e85e32a5	Remove AppleTalk support. AppleTalk was a network transport protocol for Apple Macintosh devices in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was a legacy protocol and primary networking protocol is TCP/IP. The last Mac OS X release to support AppleTalk happened in 2009. The same year routing equipment vendors (namely Cisco) end their support. Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 06:29:43 +00:00
glebius	d494babace	Remove IPX support. IPX was a network transport protocol in Novell's NetWare network operating system from late 80s and then 90s. The NetWare itself switched to TCP/IP as default transport in 1998. Later, in this century the Novell Open Enterprise Server became successor of Novell NetWare. The last release that claimed to still support IPX was OES 2 in 2007. Routing equipment vendors (e.g. Cisco) discontinued support for IPX in 2011. Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.	2014-03-14 02:58:48 +00:00
glebius	b38edcd355	Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in the r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed. o Remove the if_baudrate_pf crutch. o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) are by no means MD. And it is a bug that on amd64 we've got a 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second. This also removes quite a lot of COMPAT_FREEBSD32 code. o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes. o Give 32 bits to ifi_mtu and ifi_metric. o Give 64 bits to the rest of fields, since they are counters. __FreeBSD_version bumped. Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-13 03:42:24 +00:00
glebius	dddd70a112	The route code used to mtx_destroy() a locked mutex before rtentry free. Now, after r262763 it started to return locked mutexes to UMA. To fix that, conditionally unlock the mutex in the destructor. Tested by: "Sergey V. Dyatko" <sergey.dyatko@gmail.com>	2014-03-05 21:16:46 +00:00
glebius	86b259a9f4	Pacify gcc.	2014-03-05 02:35:15 +00:00
glebius	2d3e25388b	Hide struct rtentry from userland.	2014-03-05 01:47:08 +00:00
glebius	8a3e4bbebb	- Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-05 01:17:47 +00:00
glebius	c23c087e5b	Instead of playing games with casts simply add 3 more members to the structure pf_rule, that are used when the structure is passed via ioctl(). PR: 187074	2014-03-05 00:40:03 +00:00
gnn	bb403aea9c	Revert previous commit (262727) and bounce patch back to the submitter. Pointed out by: jhb	2014-03-04 23:55:04 +00:00
gnn	6db7075105	Naming consistency fix. The routing code defines RADIX_NODE_HEAD_LOCK as grabbing the write lock, but RADIX_NODE_HEAD_LOCK_ASSERT as checking the read lock. Submitted by: Vijay Singh <vijju.singh at gmail.com> MFC after: 1 month	2014-03-04 05:09:46 +00:00
jhb	d9d6b88f18	Remove more constants related to static sysctl nodes. The MAXID constants were primarily used to size the sysctl name list macros that were removed in r254295. A few other constants either did not have an associated sysctl node, or the associated node used OID_AUTO instead. PR: ports/184525 (exp-run)	2014-02-25 18:44:33 +00:00
zec	e1e2a9a54e	V_irtualize rtsock refcounting, which reduces the chances for panics on teardown of vnets without active routing sockets while at least one routing socket is active elsewhere. Tested by: Vijay Singh MFC after: 3 days	2014-02-19 08:29:07 +00:00
glebius	d61db4cf95	Fix incorrect assertions.	2014-02-18 14:21:26 +00:00
glebius	c2818b350b	Add my copyright to flowtable.	2014-02-17 12:07:17 +00:00
glebius	23243e6842	Whitespace.	2014-02-17 12:02:44 +00:00
glebius	e48d1742c5	Bring copyright notice to standard style.	2014-02-17 12:01:50 +00:00
glebius	f62415c467	o Remove at compile time the HASH_ALL code, that was never tested and is unfinished. However, I've tested my version, it works okay. As before it is unfinished: timeout aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, twice less memory would be consumed by flows. o API to ip_output()/ip6_output() got even more thin: 1 liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do same flowtable_lookup_ipv6(). Stop copying data to on stack sockaddr structures, simply use key[] on stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-17 11:50:56 +00:00
adrian	c11e4e28da	Make sure that the flowtable flowid is only set to m_flowid if there isn't one already supplied. The previous flowtable code also did this. Reviewed by: glebius Sponsored by: Netflix, Inc.	2014-02-15 07:57:01 +00:00
luigi	51f5fa46d7	This new version of netmap brings you the following: - netmap pipes, providing bidirectional blocking I/O while moving 100+ Mpps between processes using shared memory channels (no mistake: over one hundred million. But mind you, i said moving not processing); - kqueue support (BHyVe needs it); - improved user library. Just the interface name lets you select a NIC, host port, VALE switch port, netmap pipe, and individual queues. The upcoming netmap-enabled libpcap will use this feature. - optional extra buffers associated to netmap ports, for applications that need to buffer data yet don't want to make copies. - segmentation offloading for the VALE switch, useful between VMs. and a number of bug fixes and performance improvements. My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial amount of work on these features so we owe them a big thanks. There are some external repositories that can be of interest: https://code.google.com/p/netmap our public repository for netmap/VALE code, including linux versions and other stuff that does not belong here, such as python bindings. https://code.google.com/p/netmap-libpcap a clone of the libpcap repository with netmap support. With this any libpcap client has access to most netmap feature with no recompilation. E.g. tcpdump can filter packets at 10-15 Mpps. https://code.google.com/p/netmap-ipfw a userspace version of ipfw+dummynet which uses netmap to send/receive packets. Speed is up in the 7-10 Mpps range per core for simple rulesets. Both netmap-libpcap and netmap-ipfw will be merged upstream at some point, but while this happens it is useful to have access to them. And yes, this code will be merged soon. It is infinitely better than the version currently in 10 and 9. MFC after: 3 days	2014-02-15 04:53:04 +00:00
glebius	959dc042be	Whenever flowtable lookup fails, we do route lookup and then try to insert flow entry. During the route lookup the critical section is exited. It may happen, that after route lookup we will be executed on an other CPU that already has such flowentry. Before this change we simply freed the flowentry and returned to ip_output() with failure. Actually there is nothing wrong with using previously allocated flow entry, updating it properly. Thus, make flowentry_insert() return the new either old fle, and make use of it. Count reuses as "collisions" and real inserts as "inserts". Reviewed by: adrian Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-14 10:56:26 +00:00
glebius	1ea1d562a3	Once pf became not covered by a single mutex, many counters in it became race prone. Some just gather statistics, but some are later used in different calculations. A real problem was the race provoked underflow of the states_cur counter on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this value is used in pf_state_expires() and any state created by this rule is immediately expired. Thus, make fields states_cur, states_tot and src_nodes of struct pf_rule be counter(9)s. Thanks to Dennis for providing me shell access to problematic box and his help with reproducing, debugging and investigating the problem. Thanks to: Dennis Yusupoff <dyr smartspb.net> Also reported by: dumbbell, pgj, Rambler Sponsored by: Nginx, Inc.	2014-02-14 10:05:21 +00:00
adrian	98cb90e335	Don't insert a flowtable entry if the lle isn't yet valid. Some of the collisions that are occuring are due to flowtable lookups that succeed but have an invalid lle - typically because the L2 adjacency lookup hasn't completed. This would lead to a follow-up insert which would then fail (ie, collision) and the code would fall through to doing a slow-path L2/L3 lookup in the netinet/netinet6 code. This patch simply aborts storing a new flowtable entry if the lle isn't yet valid. Whilst I'm here, add a new pcpu counter for the item so the number of failures can be tracked separately from generic "collisions." Reviewed by: glebius MFC after: 10 days Sponsored by: Netflix, Inc.	2014-02-14 00:05:09 +00:00
glebius	7942efb4e3	Remove unused FL_NOAUTO.	2014-02-13 05:19:09 +00:00
glebius	7d1964f9ec	o Axe non-pcpu flowtable implementation. It wasn't enabled or used, and probably is a leftover from first prototyping by Kip. The non-pcpu implementation used mutexes, so it doubtfully worked better than simple routing lookup. o Use UMA_ZONE_PCPU zone for pointers instead of [MAXCPU] arrays, use zpcpu_get() to access data in there. o Substitute own single list implementation with SLIST(). This has two functional side effects: - new flows go into head of a list, before they went to tail. - a bug when incorrect flow was deleted in flow cleaner is fixed. o Due to cache line alignment, there is no reason to keep different zones for IPv4 and IPv6 flows. Both consume one cache line, real size of allocation is equal. o Rely on that f_hash, f_rt, f_lle are stable during fle lifetime, remove useless volatile quilifiers. o More INET/INET6 splitting. Reviewed by: adrian Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-13 04:59:18 +00:00
trociny	42b6b03337	Fixup for r261590 (vnet sysctl handlers cleanup). Reviewed by: glebius	2014-02-09 08:13:17 +00:00
glebius	2c14a9960a	Remove ft_rtalloc and choose rtalloc function at compile time.	2014-02-08 22:12:00 +00:00
glebius	cabcac76fb	Spacing.	2014-02-08 22:10:53 +00:00
glebius	217b478e1f	Revert accidentially leaked changes in r261627.	2014-02-08 09:57:52 +00:00
glebius	02f3acc9c1	Remove never set flag FL_OVERWRITE. The only place where it was checked led to lock/critnest leak.	2014-02-08 09:56:26 +00:00
glebius	94b81dedab	Fix comment.	2014-02-07 22:30:42 +00:00
glebius	0a7a0fafcd	Remove unused defines.	2014-02-07 21:56:16 +00:00
glebius	9d7706f9f4	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip_output6() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which use quite useless in dtrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-07 15:18:23 +00:00
glebius	bec9d523c2	Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET in the sysctl_root(). Note: SYSCTL_VNET_* macros can be removed as well. All is needed to virtualize a sysctl oid is set CTLFLAG_VNET on it. But for now keep macros in place to avoid large code churn. Sponsored by: Nginx, Inc.	2014-02-07 13:47:33 +00:00
glebius	b86ac1fe33	Spacing.	2014-02-07 10:05:12 +00:00
melifaro	881c9e28bf	Simplify filling sockaddr_dl structure for if_resolvemulti() callback providers. link_init_sdl() function can be used to fill most of the parameters. Use caller stack instead of allocation / freing memory for each request. Do not drop support for extra-long (probably non-existing) link-layer protocols by introducing link_alloc_sdl() (used by if_resolvemulti() callback) and link_free_sdl() (used by caller). Since this change breaks KBI, MFC requires slightly different approach (link_init_sdl() auto-allocating buffer if necessary to handle cases with unmodified if_resolvemulti() callers). MFC after: 2 weeks	2014-01-18 23:24:51 +00:00
luigi	ba56cd1e18	forgot to update this file in 2607000	2014-01-17 04:38:58 +00:00
luigi	651494a5f1	use explicit casts with void* to compile when included by C++ code	2014-01-11 00:00:11 +00:00
melifaro	cd97f8bba8	Simplify inet alias handling code: if we're adding/removing alias which has the same prefix as some other alias on the same interface, use newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg(). This eliminates the following rtsock messages: Pinned RTM_ADD for prefix (for alias addition). Pinned RTM_DELETE for prefix (for alias withdrawal). Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24): before commit, addition: got message of size 116 on Fri Jan 10 14:13:15 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 got message of size 192 on Fri Jan 10 14:13:15 2014 RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff after commit, addition: got message of size 116 on Fri Jan 10 13:56:26 2014 RTM_NEWADDR: address being added to iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255 before commit, wihdrawal: got message of size 192 on Fri Jan 10 13:58:59 2014 RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED> locks: inits: sockaddrs: <DST,GATEWAY,NETMASK> 10.0.0.0 10.0.0.2 (255) ffff ffff ff got message of size 116 on Fri Jan 10 13:58:59 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 adter commit, withdrawal: got message of size 116 on Fri Jan 10 14:14:11 2014 RTM_DELADDR: address being removed from iface: len 116, metric 0, flags: sockaddrs: <NETMASK,IFP,IFA,BRD> 255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255 Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong (and requires some hacks to keep prefix in route table on RTM_DELETE). I've tested this change with quagga (no change) and bird (). bird alias handling is already broken in BSD sysdep code, so nothing changes here, too. I'm going to MFC this change if there will be no complains about behavior change. While here, fix some style(9) bugs introduced by r260488 (pointed by glebius and bde). Sponsored by: Yandex LLC MFC after: 4 weeks	2014-01-10 12:13:55 +00:00
melifaro	dfba7fd9ef	Split rt_newaddrmsg_fib() into two different functions. Adding/deleting interface addresses involves access to 3 different subsystems, int different parts of code. Each call can fail, so reporting successful operation by rtsock in the middle of the process error-prone. Further split routing notification API and actual rtsock calls via creating public-available rt_addrmsg() / rt_routemsg() functions with "private" rtsock_* backend. MFC after: 2 weeks	2014-01-09 18:13:25 +00:00
melifaro	6e726b4922	Constanly use RT_ALL_FIBS everywhere instead of -1. MFC after: 2 weeks	2014-01-08 23:09:02 +00:00
melifaro	58f7b15da9	Remove dead code. Reported by: Coverity Coverity CID: 1018057 MFC after: 2 weeks	2014-01-07 19:00:40 +00:00
melifaro	860ae05c24	Teach every SIOCGIFSTATUS provider to fill in ifs->ascii anyway. Remove old bits of data concat for 'ascii' field. Remove special SIOCGIFSTATUS handling from if.c (which Coverity yells at). Reported by: Coverity Coverity CID: 1147174 MFC after: 2 weeks	2014-01-07 15:59:33 +00:00
melifaro	9f8536f282	Partially fix IPv4 interface routes deletion in RADIX_MPATH. Noticed by: Nikolay Denev <ndenev at gmail.com> MFC after: 1 month	2014-01-06 22:36:20 +00:00
luigi	41068e3dad	It is 2014 and we have a new version of netmap. Most relevant features: - netmap emulation on any NIC, even those without native netmap support. On the ixgbe we have measured about 4Mpps/core/queue in this mode, which is still a lot more than with sockets/bpf. - seamless interconnection of VALE switch, NICs and host stack. If you disable accelerations on your NIC (say em0) ifconfig em0 -txcsum -txcsum you can use the VALE switch to connect the NIC and the host stack: vale-ctl -h valeXX:em0 allowing sharing the NIC with other netmap clients. - THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers instead of pointers/count as before). This was unavoidable to support, in the future, multiple threads operating on the same rings. Netmap clients require very small source code changes to compile again. On the plus side, the new API should be easier to understand and the internals are a lot simpler. The manual page has been updated extensively to reflect the current features and give some examples. This is the result of work of several people including Giuseppe Lettieri, Vincenzo Maffione, Michio Honda and myself, and has been financially supported by EU projects CHANGE and OPENLAB, from NetApp University Research Fund, NEC, and of course the Universita` di Pisa.	2014-01-06 12:53:15 +00:00
melifaro	f85abe9555	Change semantics for rnh_lookup() function: now it performs exact match search, regardless of netmask existance. This simplifies most of rnh_lookup() consumers. Fix panic triggered by deleting non-existent host route. PR: kern/185092 Submitted by: Nikolay Denev <ndenev at gmail.com> MFC after: 1 month	2014-01-04 22:25:26 +00:00
melifaro	4d725b0dec	Remove useless register variable modifiers. Do some more style(9). MFC after: 2 weeks	2014-01-03 14:33:25 +00:00
gnn	95e24fab3c	Convert #defines to enums so that the values are visible in the debugger. Requested by: gibbs MFC after: 2 weeks	2014-01-02 21:30:59 +00:00
scottl	4ba8fc2916	Multi-queue NIC drivers and multi-port lagg tend to use the same lower bits of the flowid as each other, resulting in a poor distribution of packets among queues in certain cases. Work around this by adding a set of sysctls for controlling a bit-shift on the flowid when doing multi-port aggrigation in lagg and lacp. By default, lagg/lacp will now use bits 16 and higher instead of 0 and higher. Reviewed by: max Obtained from: Netflix MFC after: 3 days	2013-12-30 01:32:17 +00:00
melifaro	97d5fc7904	Simplify contiguous mask checking. Suggested by: glebius MFC after: 2 weeks	2013-12-17 22:16:27 +00:00
luigi	eb4897aa4a	split netmap code according to functions: - netmap.c base code - netmap_freebsd.c FreeBSD-specific code - netmap_generic.c emulate netmap over standard drivers - netmap_mbq.c simple mbuf tailq - netmap_mem2.c memory management - netmap_vale.c VALE switch simplify devce-specific code	2013-12-15 08:37:24 +00:00
gnn	7a20ece098	Add constants for use in interrogating various fiber and copper connectors most often used with network interfaces. The SFF-8472 standard defines the information that can be retrieved from an optic or a copper cable plugged into a NIC, most often referred to as SFP+. Examples of values that can be read include the cable vendor's name, part number, date of manufacture as well as running data such as temperature, voltage and tx and rx power. Copious comments on how to use these values with an I2C interface are given in the header file itself. MFC after: 2 weeks	2013-11-27 20:20:02 +00:00
glebius	ec07f25797	Fix build.	2013-11-27 07:21:25 +00:00
pluknet	10cc4e7eb8	Fix macro name in comment.	2013-11-26 15:23:56 +00:00
avg	71889a5eff	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
rodrigc	7064342be9	In vnet_route_uninit(), free some memory that is allocated in vnet_route_init(). To reproduce the problem: (1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS, INVARIANTS. (2) Run this command in a loop: jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo see: http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021291.html This doesn't eliminate all the "Freed UMA keg was not empty" warning messages on the console, but it helps.	2013-11-25 20:33:33 +00:00
attilio	7ee4e910ce	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
glebius	c884926273	To support upcoming changes change internal API for source node handling: - Removed pf_remove_src_node(). - Introduce pf_unlink_src_node() and pf_unlink_src_node_locked(). These function do not proceed with freeing of a node, just disconnect it from storage. - New function pf_free_src_nodes() works on a list of previously disconnected nodes and frees them. - Utilize new API in pf_purge_expired_src_nodes(). In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de> Sponsored by: InnoGames GmbH Sponsored by: Nginx, Inc.	2013-11-22 19:16:34 +00:00
glebius	2d853e5460	Add missing 'extern'.	2013-11-22 19:02:22 +00:00
gnn	e2e505c220	Allow ethernet drivers to pass in packets connected via the nextpkt pointer. Handling packets in this way allows drivers to amortize work during packet reception. Submitted by: Vijay Singh Sponsored by: NetApp	2013-11-18 22:58:14 +00:00
gnn	bbfd12afaf	Clean up the macros to avoid using casts. Suggested by: bde and jhb	2013-11-15 16:03:32 +00:00
ae	445416e3f4	ANSIfy function defintions.	2013-11-15 12:12:50 +00:00
gnn	cf61e85a7a	Put in the correct bit shifting and add a type to prevent clang from complaining. While here fix up a grammar nit. Pointed out by: Sergey Kandaurov and bz@ respectively.	2013-11-14 21:57:37 +00:00
gnn	e48ac6d9c4	Shift our OUI correctly. Pointed out by: emaste	2013-11-14 20:07:17 +00:00
gnn	8018f37975	The FreeBSD Project now has its own, Ogranizationally Unique Identifier, assigned by the IEEE. This file includes documentation on how developers must carve up the space as well as an initial allocation for bhyve. Sponsored by: The FreeBSD Foundation	2013-11-14 19:53:35 +00:00
glebius	3c1f482e0e	Remove never used ioctls that originate from KAME. The proof of their zero usage was exp-run from misc/183538.	2013-11-11 05:39:42 +00:00
glebius	1dbe9493b0	Provide compat layer for OSIOCAIFADDR.	2013-11-06 19:46:20 +00:00
glebius	cb6df3f35c	Axe IFF_SMART. Fortunately this layering violating flag was never used, it was just declared.	2013-11-05 12:52:56 +00:00
glebius	3b6f8b896c	Drop support for historic ioctls and also undefine them, so that code that checks their presence via ifdef, won't use them. Bump __FreeBSD_version as safety measure.	2013-11-05 10:29:47 +00:00
glebius	592c1d7a8e	In complemence to ifa_add_loopback_route() and ifa_del_loopback_route() provide function ifa_switch_loopback_route() that will be used in case when an interface address used for a loopback route goes away, but we have another interface address with same address value and want to preserve loopback route. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-11-05 07:36:17 +00:00
glebius	bce78dfe17	Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by default from the very beginning. It was placed in wrong namespace net.link.ether, originally it had been at another wrong namespace. It was incorrectly documented at incorrect manual page arp(8). Since new-ARP commit, the tunable have been consulted only on route addition, and ignored on route deletion. Behaviour of a system with tunable turned off is not fully correct, and has no advantages comparing to normal behavior.	2013-11-05 07:32:09 +00:00
adrian	900b03bfd2	Restore the entropy gathering from the m_data pointer value, not the m_data payload. After talking with markm/bde, this is what markm actually intended.	2013-11-02 15:13:02 +00:00
luigi	41bc3f25be	update to the latest netmap snapshot. This includes the following: - use separate memory regions for VALE ports - locking fixes - some simplifications in the NIC-specific routines - performance improvements for the VALE switch - some new features in the pkt-gen test program - documentation updates There are small API changes that require programs to be recompiled (NETMAP_API has been bumped so you will detect old binaries at runtime). In particular: - struct netmap_slot now is 16 bytes to support an extra pointer, which may save one data copy when using VALE ports or VMs; - the struct netmap_if has two extra fields; MFC after: 3 days	2013-11-01 21:21:14 +00:00
adrian	42bc4565be	Convert the random entropy harvesting code to use a const void * pointer rather than just void . Then, as part of this, convert a couple of mbuf m->m_data accesses to mtod(m, const void ). Reviewed by: markm Approved by: security-officer (delphij) Sponsored by: Netflix, Inc.	2013-11-01 20:53:49 +00:00
andre	d480a68d35	Make struct ifnet readable and comprehensible again by grouping and ordering related variables, fields and locks next to each other. Add more comments to variables. Over time 'ifnet' has accumlated a lot of additional pointers and functionality in an unstructured way making it quite hard to read and understand while obfuscating relationships between fields and variables. Quantify the structure size and how bloated it has become. This is only a mechanical change in preparation for upcoming work to make ifnet opaque to drivers and to separate out the interface queuing. Sponsored by: The FreeBSD Foundation	2013-10-31 15:46:10 +00:00
andre	1dfe65e51e	Move all interface queue related structures, macros and definitions from net/if_var to it own new net/ifq.h. For now net/ifq.h is unconditionally included through net/if_var.h. This is a mechanical change in preparation to make struct ifnet and the individual interface queue mechanisms opaque. Discussed with: glebius Sponsored by: The FreeBSD Foundation	2013-10-29 17:48:08 +00:00
glebius	67835727b4	Style: s/SYS_EVENTHANDLER_H/_SYS_EVENTHANDLER_H_/g Submitted by: bde	2013-10-28 20:32:05 +00:00
glebius	c829949efa	- Make the prophecy from 1997 happen and remove if_var.h inclusion from if.h. - Remove unnecessary includes and declarations from if.h - Remove unnecessary includes and declarations from if_var.h [1] - Mark some declarations that are about to be removed in near future with comments, explaning why this declaration is still necessary. - Protect eventhandler declarations with #ifdef SYS_EVENTHANDLER_H. Obtained from: bdeBSD [1] Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 08:03:40 +00:00
glebius	9c54cb7e8e	Instead of putting ifnet declaration into eventhandler.h, move bpf(4) and vlan(4) related event declarations to bpf.h and if_vlan_var.h. To avoid dependency on eventhandler.h, protect these declarations with ifdef SYS_EVENTHANDLER_H. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:45:03 +00:00
glebius	f469ae1d45	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
glebius	e678c455ff	Provide forward declaration for struct ifnet. Consumers of this header don't need contents of struct.	2013-10-27 17:27:06 +00:00
glebius	36ce0e2c8d	Almost all if_clone consumers do not care about if_clone_event. Do not force them to include sys/eventhandler.h. Those who utilize EVENTHANDLER(9), will see the declaration.	2013-10-27 17:14:33 +00:00
glebius	e352aa585e	Move new pf includes to the pf directory. The pfvar.h remain in net, to avoid compatibility breakage for no sake. The future plan is to split most of non-kernel parts of pfvar.h into pf.h, and then make pfvar.h a kernel only include breaking compatibility. Discussed with: bz	2013-10-27 16:25:57 +00:00
glebius	4fe4e9732a	Start splitting pfvar.h into internal and external parts. - Provide pf_altq.h that has only stuff needed for ALTQ. - Start pf.h, that would have all constant values and eventually non-kernel structures. - Build ALTQ w/o pfvar.h, include if_var.h, that before came via pollution. - Build tcpdump w/o pfvar.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 18:59:58 +00:00
glebius	ff6e113f1b	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
glebius	bd250aba33	vnet.h needs to be included before raw_cb.h. Now it compiles due to pollution via if_var.h.	2013-10-25 19:49:03 +00:00
grehan	bdbd34c64b	Fix panic in the tap driver when a tap and vmnet interface were created after each other e.g. ifconfig tap0 ifconfig vmnet0 <panic> Appears to be a cut'n'paste error from the tap code to the vmnet code where the name string wasn't updated in the call to make_dev(). Reviewed by: glebius MFC after: 3 days	2013-10-24 22:21:31 +00:00
ae	af618a2528	Add a note that lacp_compose_key() should be updated, when new media types will be added. Submitted by: melifaro X-MFC after: r256689	2013-10-21 07:49:36 +00:00
ae	2a36c1745b	Use the same actor key for media types of the same speed. PR: 176097 MFC after: 2 weeks	2013-10-17 15:14:58 +00:00
melifaro	da935edd8d	Fix long-standing issue with incorrect radix mask calculation. Usual symptoms are messages like rn_delete: inconsistent annotation rn_addmask: mask impossibly already in tree or inability to flush/delete particular prefix in ipfw table. Changes: * Assume 32 bytes as maximum radix key length * Remove rn_init() * Statically allocate rn_ones/rn_zeroes * Make separate mask tree for each "normal" tree instead of system global one * Remove "optimization" on masks reusage and key zeroying * Change rn_addmask() arguments to accept tree pointer (no users in base) PR: kern/182851, kern/169206, kern/135476, kern/134531 Found by: Slawa Olhovchenkov <slw@zxy.spb.ru> MFC after: 2 weeks Reviewed by: glebius Sponsored by: Yandex LLC	2013-10-16 12:18:44 +00:00
melifaro	2a081ed941	Remove unused fields from radix_node_head. Sponsored by: Yandex LLC	2013-10-16 10:33:20 +00:00
glebius	26565df757	Rename Free() macro to R_Free(). This matches R_Malloc() and has much lower probability to clash with other headers. Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>	2013-10-16 04:59:59 +00:00
emax	d07bb38bb4	In the flowtable scanner, restart the scan at the last found position, not at position 0. Changes the scanner from O(N^2) to O(N). Submitted by: scottl Obtained from: Netflix, Inc MFC after: 3 weeks	2013-10-15 21:28:51 +00:00
glebius	790225cfbc	- Utilize counter(9) to accumulate statistics on interface addresses. Add four counters to struct ifaddr. This kills '+=' on a variables shared between processors for every packet. - Nuke struct if_data from struct ifaddr. - In ip_input() do not put a reference on ifaddr, instead update statistics right now in place and do IN_IFADDR_RUNLOCK(). These removes atomic(9) for every packet. [1] - To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in rtsock.c fill if_data fields using counter_u64_fetch(). - Accidentially fix bug in COMPAT_32 version of NET_RT_IFLISTL, which took if_data not from the ifaddr, but from ifaddr's ifnet. [2] Submitted by: melifaro [1], pluknet[2] Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 11:37:57 +00:00
glebius	bc71d67cbb	Push some defines under _KERNEL, improve styling and comments.	2013-10-15 10:43:26 +00:00
glebius	cb3115eac5	Remove ifa_mtx. It was used only in one place in kernel, and ifnet's ifaddr lock can substitute it there. Discussed with: melifaro, ae Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:41:22 +00:00
glebius	564d02b304	Remove ifa_init() and provide ifa_alloc() that will allocate and setup struct ifaddr internally. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:31:42 +00:00
glebius	1c87562bdb	Hide 'struct ifaddr' definition from userland. Two tools left that use it, namely ipftest(1) and ifmcstat(1). These sniff structure definition using _WANT_IFADDR define. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-15 10:19:24 +00:00
markm	a9d73bef64	MFC - tracking commit.	2013-10-09 21:03:34 +00:00
glebius	75528d8e36	There are some high performance NICs that count statistics in hardware, and there are ifnets, that do that via counter(9). Provide a flag that would skip cache line trashing '+=' operation in ether_input(). Sponsored by: Netflix Sponsored by: Nginx, Inc. Reviewed by: melifaro, adrian Approved by: re (marius)	2013-10-09 19:04:40 +00:00
markm	0643acd34d	Debug run. This now works, except that the "live" sources haven't been tested. With all sources turned on, this unlocks itself in a couple of seconds! That is no my box, and there is no guarantee that this will be the case everywhere. * Cut debug prints. * Use the same locks/mutexes all the way through. * Be a tad more conservative about entropy estimates.	2013-10-06 12:40:32 +00:00
markm	b28953010e	Snapshot. This passes the build test, but has not yet been finished or debugged. Contains: * Refactor the hardware RNG CPU instruction sources to feed into the software mixer. This is unfinished. The actual harvesting needs to be sorted out. Modified by me (see below). * Remove 'frac' parameter from random_harvest(). This was never used and adds extra code for no good reason. * Remove device write entropy harvesting. This provided a weak attack vector, was not very good at bootstrapping the device. To follow will be a replacement explicit reseed knob. * Separate out all the RANDOM_PURE sources into separate harvest entities. This adds some secuity in the case where more than one is present. * Review all the code and fix anything obviously messy or inconsistent. Address som review concerns while I'm here, like rename the pseudo-rng to 'dummy'. Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)	2013-10-04 06:55:06 +00:00
markm	4655fd3ead	MFC - tracking commit	2013-10-03 17:30:55 +00:00
glebius	515e096f72	Clear knlist before destroying it in tap(4) and tun(4). This fixes later crash, when a kqueue descriptor tries to dereference appropriate knotes. Approved by: re (kib)	2013-10-02 20:44:36 +00:00
glebius	6502d83a23	Fix a fallout from r241610. One enc interface must be created on startup. Pointy hat to: glebius Reported by: gavin Approved by: re (gjb)	2013-09-28 14:14:23 +00:00
glebius	3b39d8bc17	Clean up SIOCSIFDSTADDR usage from ifnet drivers. The ioctl itself is extremely outdated, and I doubt that it was ever used for ifnet drivers. It was used for AF_INET sockets in pre-FreeBSD time. Approved by: re (hrs) Sponsored by: Nginx, Inc.	2013-09-11 09:19:44 +00:00
des	21e6fd796b	Fix the length calculation for the final block of a sendfile(2) transmission which could be tricked into rounding up to the nearest page size, leaking up to a page of kernel memory. [13:11] In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR and SIOCSIFNETMASK at the socket layer rather than pass them on to the link layer without validation or credential checks. [SA-13:12] Prevent cross-mount hardlinks between different nullfs mounts of the same underlying filesystem. [SA-13:13] Security: CVE-2013-5666 Security: FreeBSD-SA-13:11.sendfile Security: CVE-2013-5691 Security: FreeBSD-SA-13:12.ifioctl Security: CVE-2013-5710 Security: FreeBSD-SA-13:13.nullfs Approved by: re	2013-09-10 10:05:59 +00:00
markm	42ac52f951	Bring in some behind-the-scenes development, mainly By Arthur Mesh, the rest by me. o Namespace cleanup; the Yarrow name is now restricted to where it really applies; this is in anticipation of being augmented or replaced by Fortuna in the future. Fortuna is mentioned, but behind #if logic, and is ignorable for now. o The harvest queue is pulled out into its own modules. o Entropy harvesting is emproved, both by being made more conservative, and by separating (a bit!) the sources. Available entropy crumbs are marginally improved. o Selection of sources is made clearer. With recent revelations, this will receive more work in the weeks and months to come. Submitted by: Arthur Mesh (partly) <arthurmesh@gmail.com>	2013-09-07 14:15:13 +00:00
davide	5545e24af3	Don't clear the unused SI_CHEAPCLONE flag in tap_create()/tuncreate(). Reviewed by: kib	2013-09-07 13:50:13 +00:00
markm	b41e1125b0	MFC	2013-09-07 07:58:29 +00:00
davide	966ef339de	Retire netisr.netisr_direct and netisr.netisr_direct_force sysctls. These were used to control/export dispatch policy but they're not anymore. This commit cannot be MFC'ed to 9 because old netstat(9) binary relies on such sysctl to work. On the other hand, there's no real reason to keep'em around in 10.	2013-09-06 21:02:43 +00:00
markm	ff7909302f	MFC	2013-08-30 11:38:34 +00:00
adrian	8f526008d4	Convert the if_lagg rwlock to an rmlock. We've been seeing lots of cache line contention (but not lock contention!) in our workloads between the various TX and RX threads going on. The write lock is only grabbed when configuration changes are made - which are infrequent. With this patch, the contention and cycles spent waiting for updates disappear. Sponsored by: Netflix, Inc.	2013-08-29 19:35:14 +00:00
alfred	14b1677551	Remove include opt_ofed.h since OFED is unifdef'd. Pointed out by: glebius	2013-08-27 16:45:00 +00:00
markm	8a3bb03c25	Snapshot; Do some running repairs on entropy harvesting. More needs to follow.	2013-08-26 18:35:21 +00:00
jhb	a437be7257	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
andre	11493fb48f	Remove unnecessary setup of the m->pkthdr.header pointer. Sponsored by: The FreeBSD Foundation	2013-08-25 09:41:37 +00:00
alfred	89900cc304	Remove the #ifdef OFED from the 20 byte mac in struct llentry. With this change it is now possible to build the entire infiniband stack as modules and load it dynamically including IP over IB.	2013-08-25 01:55:14 +00:00
andre	e3737c33e7	Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation	2013-08-24 19:51:18 +00:00
andre	aeda1bfef5	Whitespace, style cleanups, and improved comments.	2013-08-24 12:03:24 +00:00
andre	5ec35e1f72	ename PFIL_LIST_[UN]LOCK() to PFIL_HEADLIST_[UN]LOCK() to avoid confusion with the pfil_head chain locking macros.	2013-08-24 11:24:15 +00:00
andre	8851d5babc	Resolve the confusion between the head_list and the hook list. The linked list of pfil hooks is changed to "chain" and this term is applied consistently. The head_list remains with "list" term. Add KASSERT to vnet_pfil_uninit(). Update and extend comments. Reviewed by: eri (previous version)	2013-08-24 11:17:25 +00:00
andre	f0d721b842	Internalize pfil_hook_get(). There are no outside consumers of this API, it is only safe for internal use and even the pfil(9) man page says so in the BUGS section. Reviewed by: eri	2013-08-24 10:36:33 +00:00
andre	8095dac6a2	Convert one instance of pfil hook callback missed in r254769.	2013-08-24 10:30:20 +00:00
andre	5630e57e48	Introduce typedef for pfil hook callback function and replace all spelled out occurrences with it. Reviewed by: eri	2013-08-24 10:13:59 +00:00
bz	0473152dd5	After r241616 properly export ifi_baudrate_pf in the 32bit compat case. MFC after: 3 days	2013-08-20 14:35:17 +00:00
andre	7cc6cc696c	Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings. Consistently use it within IP, IPv6 and ethernet protocols. Discussed with: trociny, glebius	2013-08-19 13:27:32 +00:00
markj	f82b0dd694	Add a missing module version declaration to if_tun(4). PR: 181078 Submitted by: Brandon Gooch <jamesbrandongooch@gmail.com> MFC after: 1 week	2013-08-07 01:32:08 +00:00
hrs	b1001912a2	sin6 should be assigned before the loop.	2013-07-28 20:02:41 +00:00
hrs	ee26f7a256	- Relax the restriction on the member interfaces with LLAs. Two or more LLAs on the member interfaces are actually harmless when the parent interface does not have a LLA. - Add net.link.bridge.allow_llz_overlap. This is a knob to allow LLAs on a bridge and the member interfaces at the same time. The default is 0. Pointed out by: ume MFC after: 3 days	2013-07-28 19:49:39 +00:00
adrian	1467e47941	Break out the static, global LACP debug options into a per-lagg unit sysctl tree. * Create a net.link.lagg.X.lacp node * Add a debug node under that for tx_test and rx_test * Add lacp_strict_mode, defaulting to 1 tx_test and rx_test are still a bitmap of unit numbers for now. At some point it would be nice to create child nodes of the lagg bundle for each sub-interface, and then populate those with various knobs and statistics. Sponsored by: Netflix	2013-07-26 19:41:13 +00:00
adrian	6a6d8064b3	Fix typo. Sponsored by: Netflix	2013-07-25 19:10:23 +00:00
marcel	4ca16da195	Decouple the UUID generator from network interfaces by having MAC addresses added to the UUID generator using uuid_ether_add(). The UUID generator keeps an arbitrary number of MAC addresses, under the assumption that they are rarely removed (= uuid_ether_del()). This achieves the following: 1. It brings up closer to having the network stack as a loadable module. 2. It allows the UUID generator to filter MAC addresses for best results (= highest chance of uniqeness). 3. MAC addresses can come from anywhere, irrespactive of whether it's used for an interface or not. A side-effect of the change is that when no MAC addresses have been added, a random multicast MAC address is created once and re-used if needed. Previusly, when a random MAC address was needed, it was created for every call. Thus, a change in behaviour is introduced for when no MAC addresses exist. Obtained from: Juniper Networks, Inc.	2013-07-24 04:24:21 +00:00
rodrigc	7e3e1747c8	PR: 168520 170096 Submitted by: adrian, zec Fix multiple kernel panics when VIMAGE is enabled in the kernel. These fixes are based on patches submitted by Adrian Chadd and Marko Zec. (1) Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling device_attach(). This fixes multiple VIMAGE related kernel panics when trying to attach Bluetooth or USB Ethernet devices because curthread->td_vnet is NULL. (2) Set curthread->td_vnet in if_detach(). This fixes kernel panics when detaching networking interfaces, especially USB Ethernet devices. (3) Use VNET_DOMAIN_SET() in ng_btsocket.c (4) In ng_unref_node() set curthread->td_vnet. This fixes kernel panics when detaching Netgraph nodes.	2013-07-15 01:32:55 +00:00
adrian	e729c4bb92	Bring over some link aggregation / LACP protocol improvements and debugging additions. * Add some new tracing events to aid in debugging. * Add in a debugging mode to drop transmit and received frames, specifically to test whether seeing or hearing heartbeats correctly cause LACP to drop the port. * Add in (and make default) a strict LACP mode, which requires the heartbeat on a port to be heard before it's used. Sometimes vendor ports will hang but the link layer stays up, resulting in hung traffic. * Add logging the number of link status flaps, again to aid in debugging badly behaving switch ports. * Calculate the lagg interface port speed as the multiple of the configured ports, rather than the largest. Obtained from: Netflix MFC after: 2 weeks	2013-07-13 04:25:03 +00:00
hrs	ef82e58667	Add a leaf node CTL_NET.PF_ROUTE.0.AF.NET_RT_DUMP.0.FIB. This returns routing table with the specified FIB number, not td->td_proc->p_fibnum.	2013-07-12 12:36:12 +00:00
hrs	59e83b6413	- Drop GIF_ACCEPT_REVETHIP flag by default. - Add IFF_MONITOR support.	2013-07-12 12:18:07 +00:00
ae	89cf418196	Correct CTASSERT condition.	2013-07-09 15:10:27 +00:00
ae	705a50a053	Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU counters.	2013-07-09 09:50:15 +00:00
ae	027c687189	Add several macros to help migrate statistics structures to PCPU counters.	2013-07-09 09:37:21 +00:00
ae	1a36dfcc87	Prepare network statistics structures for migration to PCPU counters. Use uint64_t as type for all fields of structures. Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat, in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat, pfkeystat, pim6stat, pimstat, rip6stat, udpstat. Discussed with: arch@	2013-07-09 09:32:06 +00:00
cperciva	c6a6dc71e9	Fix typo: minmum -> minimum. Submitted by: @z3ndrag0n	2013-07-05 23:40:08 +00:00
hrs	5ede8e3214	Fix a compiler warning. MFC after: 1 week	2013-07-03 07:31:07 +00:00
hrs	50e0add9e4	- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal. To configure an autoconfigured link-local address (RFC 4862), the following rc.conf(5) configuration can be used: ifconfig_bridge0_ipv6="inet6 auto_linklocal" - if_bridge(4) now removes IPv6 addresses on a member interface to be added when the parent interface or one of the existing member interfaces has an IPv6 address. if_bridge(4) merges each link-local scope zone which the member interfaces form respectively, so it causes address scope violation. Removal of the IPv6 addresses prevents it. - if_lagg(4) now removes IPv6 addresses on a member interfaces unconditionally. - Set reasonable flags to non-IPv6-capable interfaces. [] Submitted by: rpaulo [] MFC after: 1 week	2013-07-02 16:58:15 +00:00
qingli	2160365ab5	Due to the routing related networking kernel redesign work in FBSD 8.0, interface routes have been returened to the applications without the RTF_GATEWAY bit. This incompatibility has caused some issues with Zebra, Qugga and the like. This patch provides the RTF_GATEWAY flag bit in returned interface routes so to behave similarly to pre 8.0 systems. Reviewed by: hrs Verified by: mackn at opendns dot com	2013-06-25 00:10:49 +00:00
delphij	d5f66cc889	Return ENETDOWN instead of ENOENT when all lagg(4) links are inactive when upper layer tries to transmit packet. This gives better feedback and meaningful errors for applications. MFC after: 2 weeks Reviewed by: thompsa	2013-06-17 19:31:03 +00:00
hrs	bba6e363a0	Return ENETDOWN when the parent interface is down. MFC after: 1 week	2013-06-16 04:40:02 +00:00
trociny	d7bd09411e	Properly set curvnet context in lagg_port_setlladdr() task handler. Reported by: Nikos Vassiliadis <nvass gmx.com> Submitted by: zec Tested by: Nikos Vassiliadis <nvass gmx.com> MFC after: 1 week	2013-06-07 10:27:50 +00:00
jhb	058cacb022	Fix build with both INET and INET6 disabled.	2013-06-04 20:40:16 +00:00
andre	b706ceb4ab	Allow drivers to specify a maximum TSO length in bytes if they are limited in the amount of data they can handle at once. Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to change the limit. The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything less wouldn't be very useful anymore. The upper limit is still at IP_MAXPACKET (65536 bytes). Raising it requires further auditing of the IPv4/v6 code path's as the length field in the IP header would overflow leading to confusion in firewalls and others packet handler on the real size of the packet. The placement into "struct ifnet" is a bit hackish but the best place that was found. When the stack/driver boundary is updated it should be handled in a better way. Submitted by: cperciva (earlier version) Reviewed by: cperciva Tested by: cperciva MFC after: 1 week (using spare struct members to preserve ABI)	2013-06-03 12:55:13 +00:00
luigi	f8c8cdb1f0	Bring in a number of new features, mostly implemented by Michio Honda: - the VALE switch now support up to 254 destinations per switch, unicast or broadcast (multicast goes to all ports). - we can attach hw interfaces and the host stack to a VALE switch, which means we will be able to use it more or less as a native bridge (minor tweaks still necessary). A 'vale-ctl' program is supplied in tools/tools/netmap to attach/detach ports the switch, and list current configuration. - the lookup function in the VALE switch can be reassigned to something else, similar to the pf hooks. This will enable attaching the firewall, or other processing functions (e.g. in-kernel openvswitch) directly on the netmap port. The internal API used by device drivers does not change. Userspace applications should be recompiled because we bump NETMAP_API as we now use some fields in the struct nmreq that were previously ignored -- otherwise, data structures are the same. Manpages will be committed separately.	2013-05-30 14:07:14 +00:00
luigi	93cb261dd8	clarify usage of NETMAP_BUF	2013-05-30 13:41:19 +00:00
ghelmer	37dcf710d1	While waiting for the bpf hold buffer to become idle, check the return value from mtx_sleep() and exit bpfread() on errors such as EINTR. Reviewed by: jhb	2013-05-23 21:33:10 +00:00
ed	c0a01b0858	Allow certain headers to be included more easily. Spotted by: http://hacks.owlfolio.org/header-survey/	2013-05-21 21:20:10 +00:00
melifaro	70dfcb99bb	Use separate function to update mbuf checksum flags instead of duplicating the same code in different places. MFC after: 2 weeks	2013-05-18 08:14:21 +00:00
melifaro	967c651cd7	Fix rte leak introduced in r248070. MFC after: 2 weeks	2013-05-18 07:10:22 +00:00
julian	329247aec2	Finally change the mbuf to have its own fib field instead of stealing 4 flag bits. This was supposed to happen in 8.0, and again in 2012.. MFC after: never	2013-05-16 16:20:17 +00:00
hrs	960e7e9fd4	Add IFF_MONITOR support to gre(4). Tested by: Chip Marshall MFC after: 1 week	2013-05-11 19:05:38 +00:00
andre	cc8c6e4d01	Back out r249318, r249320 and r249327 due to a heisenbug most likely related to a race condition in the ipi_hash_lock with the exact cause currently unknown but under investigation.	2013-05-06 16:42:18 +00:00
eadler	a5a9ec51d6	Correct a few sizeof()s Submitted by: swildner@DragonFlyBSD.org Reviewed by: alfred	2013-05-01 04:37:34 +00:00
luigi	ff43752d35	remove $Id$ (whitespace change)	2013-04-30 16:00:21 +00:00
glebius	b4bc270e8f	Add const qualifier to the dst parameter of the ifnet if_output method.	2013-04-26 12:50:32 +00:00
oleg	6ff116b3ca	Recover missing arp_ifinit() call. MFC after: 2 weeks	2013-04-18 20:13:33 +00:00
glebius	d9c22bdbc9	Switch lagg(4) statistics to counter(9). The lagg(4) is often used to bond high speed links, so basic per-packet += on statistics cause cache misses and statistics loss. Perfect solution would be to convert ifnet(9) to counters(9), but this requires much more work, and unfortunately ABI change, so temporarily patch lagg(4) manually. We store counters in the softc, and once per second push their values to legacy ifnet counters. Sponsored by: Nginx, Inc.	2013-04-15 13:00:42 +00:00
glebius	e79bb9704b	Fix build.	2013-04-10 08:09:25 +00:00
andre	306fddaf78	Change certain heavily used network related mutexes and rwlocks to reside on their own cache line to prevent false sharing with other nearby structures, especially for those in the .bss segment. NB: Those mutexes and rwlocks with variables next to them that get changed on every invocation do not benefit from their own cache line. Actually it may be net negative because two cache misses would be incurred in those cases.	2013-04-09 21:02:20 +00:00
ae	844d612b2a	Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats. MFC after: 1 week	2013-04-09 07:11:22 +00:00
markj	a688a70512	Ignore interface renames instead of removing the interface from the bridge group. Reviewed by: rstone Approved by: rstone (co-mentor) Sponsored by: Sandvine Incorporated MFC after: 1 week	2013-03-28 20:37:07 +00:00
glebius	82edd7c363	Remove __FreeBSD_version ifdefs.	2013-03-22 20:44:16 +00:00
ae	b3c4973a10	Fix style and comments.	2013-03-19 05:51:47 +00:00
glebius	b37af62b9e	Use m_get/m_gethdr instead of compat macros. Sponsored by: Nginx, Inc.	2013-03-15 12:55:30 +00:00
glebius	76306b3465	- Use m_getcl() instead of hand allocating. - Convert panic() to KASSERT. - Remove superfluous cleaning of mbuf fields after allocation. - Add comment on possible use of m_get2() here. Sponsored by: Nginx, Inc.	2013-03-15 12:52:59 +00:00
glebius	37a43650ed	Functions m_getm2() and m_get2() have different order of arguments, and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2(). Sorry for churn, but better now than later.	2013-03-12 13:42:47 +00:00
glebius	18f90896db	Reinitialize eh after pfil(9) processing. PR: 176764 Submitted by: adri	2013-03-11 12:06:57 +00:00
melifaro	fccbb24392	Fix long-standing issue with interface routes being unprotected: Use RTM_PINNED flag to mark route as immutable. Forbid deleting immutable routes without special rtrequest1_fib() flag. Adding interface address with prefix already in route table is handled by atomically deleting old prefix and adding interface one. Discussed with: andre, eri MFC after: 3 weeks	2013-03-08 20:33:50 +00:00
melifaro	2a77ba4103	Write lock is not required for find&compare operation. MFC after: 2 weeks	2013-03-05 13:38:45 +00:00
glebius	f8098d720c	Finish the r244185. This fixes ever growing counter of pfsync bad length packets, which was actually harmless. Note that peers with different version of head/ may grow this counter, but it is harmless - all pfsync data is processed. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc	2013-02-15 09:03:56 +00:00
glebius	a47c0295c5	Resolve source address selection in presense of CARP. Add a couple of helper functions: - carp_master() - boolean function which is true if an address is in the MASTER state. - ifa_preferred() - boolean function that compares two addresses, and is aware of CARP. Utilize ifa_preferred() in ifa_ifwithnet(). The previous version of patch also changed source address selection logic in jails using carp_master(), but we failed to negotiate this part with Bjoern. May be we will approach this problem again later. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc	2013-02-11 10:58:22 +00:00
rrs	75ad250e97	This fixes a out-of-order problem with several of the newer drivers. The basic problem was that the driver was pulling the mbuf off the drbr ring and then when sending with xmit(), encounting a full transmit ring. Thus the lower layer xmit() function would return an error, and the drivers would then append the data back on to the ring. For TCP this is a horrible scenario sure to bring on a fast-retransmit. The fix is to use drbr_peek() to pull the data pointer but not remove it from the ring. If it fails then we either call the new drbr_putback or drbr_advance method. Advance moves it forward (we do this sometimes when the xmit() function frees the mbuf). When we succeed we always call advance. The putback will always copy the mbuf back to the top of the ring. Note that the putback cannot be used with a drbr_dequeue() only with drbr_peek(). We most of the time, in putback, would not need to copy it back since most likey the mbuf is still the same, but sometimes xmit() functions will change the mbuf via a pullup or other call. So the optimial case for the single consumer is to always copy it back. If we ever do a multiple_consumer (for lagg?) we will need a test and atomic in the put back possibly a seperate putback_mc() in the ring buf. Reviewed by: jhb@freebsd.org, jlv@freebsd.org	2013-02-07 15:20:54 +00:00
glebius	7f832c3059	Retire struct sockaddr_inarp. Since ARP and routing are separated, "proxy only" entries don't have any meaning, thus we don't need additional field in sockaddr to pass SIN_PROXY flag. New kernel is binary compatible with old tools, since sizes of sockaddr_inarp and sockaddr_in match, and sa_family are filled with same value. The structure declaration is left for compatibility with third party software, but in tree code no longer use it. Reviewed by: ru, andre, net@	2013-01-31 08:55:21 +00:00
glebius	5e925b9e5a	route_output() always supplies info with RTAX_GATEWAY member that points to a sockaddr of AF_LINK family. Assert this instead of checking.	2013-01-29 21:44:22 +00:00
np	66b6d0e94e	Move lle_event to if_llatbl.h lle_event replaced arp_update_event after the ARP rewrite and ended up in if_ether.h simply because arp_update_event used to be there too. IPv6 neighbor discovery is going to grow lle_event support and this is a good time to move it to if_llatbl.h. The two in-tree consumers of this event - OFED and toecore - are not affected. Reviewed by: bz@	2013-01-25 23:58:21 +00:00
glebius	9f6a60e000	- Utilize m_get2(), accidentially fixing some signedness bugs. - Return EMSGSIZE in both cases if uio_resid is oversized or undersized. - No need to clear rcvif.	2013-01-24 14:29:31 +00:00
luigi	47486b84ac	leftover from r245579... flags for semi transparent mode and direct forwarding through a VALE switch	2013-01-23 03:49:48 +00:00
glebius	bc87b91f9e	If lagg(4) can't forward a packet due to underlying port problems, return much more meaningful ENETDOWN to the stack, instead of EBUSY.	2013-01-21 08:59:31 +00:00
glebius	cf8b6db820	- Add dashes before copyright notices. - Add $FreeBSD$. - Remove unused define.	2013-01-07 19:36:11 +00:00
peter	184da5d83d	Juggle some internal symbols from our antique zlib (that originally came in from kernel-pppd which is long gone) so that ZFS and DTRACE play nice. This is a horrible hack to get freefall to compile, and is in dire need of reconciliation. This antique zlib-1.04 code needs to go away.	2013-01-06 14:59:59 +00:00
ae	f89408c03e	Add an ability to set net.link.stf.permit_rfc1918 from the loader. MFC after: 2 weeks	2012-12-27 21:26:08 +00:00

... 4 5 6 7 8 ...

3489 Commits