freebsd-skq

Author	SHA1	Message	Date
Gleb Smirnoff	41a7572b26	Functions m_getm2() and m_get2() have different order of arguments, and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2(). Sorry for churn, but better now than later.	2013-03-12 13:42:47 +00:00
Gleb Smirnoff	129004c56f	Reinitialize eh after pfil(9) processing. PR: 176764 Submitted by: adri	2013-03-11 12:06:57 +00:00
Alexander V. Chernikov	3034f43f2f	Fix long-standing issue with interface routes being unprotected: Use RTM_PINNED flag to mark route as immutable. Forbid deleting immutable routes without special rtrequest1_fib() flag. Adding interface address with prefix already in route table is handled by atomically deleting old prefix and adding interface one. Discussed with: andre, eri MFC after: 3 weeks	2013-03-08 20:33:50 +00:00
Alexander V. Chernikov	14126522cf	Write lock is not required for find&compare operation. MFC after: 2 weeks	2013-03-05 13:38:45 +00:00
Gleb Smirnoff	e2a55a0021	Finish the r244185. This fixes ever growing counter of pfsync bad length packets, which was actually harmless. Note that peers with different version of head/ may grow this counter, but it is harmless - all pfsync data is processed. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc	2013-02-15 09:03:56 +00:00
Gleb Smirnoff	24421c1c32	Resolve source address selection in presence of CARP. Add a couple of helper functions: - carp_master() - boolean function which is true if an address is in the MASTER state. - ifa_preferred() - boolean function that compares two addresses, and is aware of CARP. Utilize ifa_preferred() in ifa_ifwithnet(). The previous version of patch also changed source address selection logic in jails using carp_master(), but we failed to negotiate this part with Bjoern. Maybe we will approach this problem again later. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc	2013-02-11 10:58:22 +00:00
Randall Stewart	ded5ea6a25	This fixes an out-of-order problem with several of the newer drivers. The basic problem was that the driver was pulling the mbuf off the drbr ring and then when sending with xmit(), encountering a full transmit ring. Thus the lower layer xmit() function would return an error, and the drivers would then append the data back on to the ring. For TCP this is a horrible scenario sure to bring on a fast-retransmit. The fix is to use drbr_peek() to pull the data pointer but not remove it from the ring. If it fails then we either call the new drbr_putback or drbr_advance method. Advance moves it forward (we do this sometimes when the xmit() function frees the mbuf). When we succeed we always call advance. The putback will always copy the mbuf back to the top of the ring. Note that the putback cannot be used with a drbr_dequeue(), only with drbr_peek(). Most of the time, putback would not need to copy it back, since most likely the mbuf is still the same, but sometimes xmit() functions will change the mbuf via a pullup or other call. So the optimal case for the single consumer is to always copy it back. If we ever do a multiple_consumer (for lagg?) we will need a test and atomic in the putback, possibly a separate putback_mc() in the ring buf. Reviewed by: jhb@freebsd.org, jlv@freebsd.org	2013-02-07 15:20:54 +00:00
Gleb Smirnoff	9711a168b9	Retire struct sockaddr_inarp. Since ARP and routing are separated, "proxy only" entries don't have any meaning, thus we don't need additional field in sockaddr to pass SIN_PROXY flag. New kernel is binary compatible with old tools, since sizes of sockaddr_inarp and sockaddr_in match, and sa_family are filled with same value. The structure declaration is left for compatibility with third party software, but in-tree code no longer uses it. Reviewed by: ru, andre, net@	2013-01-31 08:55:21 +00:00
Gleb Smirnoff	1910bfcba2	route_output() always supplies info with RTAX_GATEWAY member that points to a sockaddr of AF_LINK family. Assert this instead of checking.	2013-01-29 21:44:22 +00:00
Navdeep Parhar	4364ec0852	Move lle_event to if_llatbl.h lle_event replaced arp_update_event after the ARP rewrite and ended up in if_ether.h simply because arp_update_event used to be there too. IPv6 neighbor discovery is going to grow lle_event support and this is a good time to move it to if_llatbl.h. The two in-tree consumers of this event - OFED and toecore - are not affected. Reviewed by: bz@	2013-01-25 23:58:21 +00:00
Gleb Smirnoff	ed63043b21	- Utilize m_get2(), accidentally fixing some signedness bugs. - Return EMSGSIZE in both cases if uio_resid is oversized or undersized. - No need to clear rcvif.	2013-01-24 14:29:31 +00:00
Luigi Rizzo	01c039a19c	leftover from r245579... flags for semi transparent mode and direct forwarding through a VALE switch	2013-01-23 03:49:48 +00:00
Gleb Smirnoff	1d9797f128	If lagg(4) can't forward a packet due to underlying port problems, return much more meaningful ENETDOWN to the stack, instead of EBUSY.	2013-01-21 08:59:31 +00:00
Gleb Smirnoff	f6eef2c2d6	- Add dashes before copyright notices. - Add $FreeBSD$. - Remove unused define.	2013-01-07 19:36:11 +00:00
Peter Wemm	a116ec4b5e	Juggle some internal symbols from our antique zlib (that originally came in from kernel-pppd which is long gone) so that ZFS and DTRACE play nice. This is a horrible hack to get freefall to compile, and is in dire need of reconciliation. This antique zlib-1.04 code needs to go away.	2013-01-06 14:59:59 +00:00
Andrey V. Elsukov	e37e7917f3	Add an ability to set net.link.stf.permit_rfc1918 from the loader. MFC after: 2 weeks	2012-12-27 21:26:08 +00:00
Andrey V. Elsukov	51743c5f73	Add net.link.stf.permit_rfc1918 sysctl variable. It can be used to allow the use of private IPv4 addresses with stf(4). MFC after: 2 weeks	2012-12-27 20:59:22 +00:00
Kevin Lo	c7dada99bb	Fix typo in comment. Reviewed by: thompsa	2012-12-18 06:37:23 +00:00
Gleb Smirnoff	b1ec2940af	Fix problem in r238990. The LLE_LINKED flag should be tested prior to entering llentry_free(), and in case if we lose the race, we should simply perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread performing arptimer(), it will remove two references from the lle instead of one. Reported by: Ian FREISLICH <ianf clue.co.za>	2012-12-13 11:11:15 +00:00
Guy Helmer	3b3b91e736	Changes to resolve races in bpfread() and catchpacket() that, at worst, cause kernel panics. Add a flag to the bpf descriptor to indicate whether the hold buffer is in use. In bpfread(), set the "hold buffer in use" flag before dropping the descriptor lock during the call to bpf_uiomove(). Everywhere else the hold buffer is used or changed, wait while the hold buffer is in use by bpfread(). Add a KASSERT in bpfread() after re-acquiring the descriptor lock to assist uncovering any additional hold buffer races.	2012-12-10 16:14:44 +00:00
Hiroki Sato	0bebb5448b	- Move definition of V_deembed_scopeid to scope6_var.h. - Deembed scope id in L3 address in in6_lltable_dump(). - Simplify scope id recovery in rtsock routines. - Remove embedded scope id handling in ndp(8) and route(8) completely.	2012-12-05 19:45:24 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Hiroki Sato	5c9fa630f6	- Fix LOR in sa6_recoverscope() in rt_msg2()[1]. - Check V_deembed_scopeid before checking if sa_family == AF_INET6. - Fix scope id handling in route(8)[2] and ifconfig(8). Reported by: rpaulo[1], Mateusz Guzik[1], peter[2]	2012-12-04 17:12:23 +00:00
Alexander V. Chernikov	f079a0fa8c	Fix bpf_if structure leak introduced in r235745. Move all such structures to delayed-free lists and delete all matching on interface departure event. MFC after: 1 week	2012-12-02 21:43:37 +00:00
Pawel Jakub Dawidek	5ad9520341	- Use more appropriate loop (do { } while()) when generating ethernet address for bridge interface. - If we found a collision we can break the loop - only one collision is possible and one is exactly enough to need to regenerate. Obtained from: WHEEL Systems MFC after: 1 week	2012-11-29 08:06:23 +00:00
Andre Oppermann	da2299c5c7	Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability. Checksumming the IP header of fragments is no different from doing normal IP headers. Discussed with: yongari MFC after: 1 week	2012-11-27 19:31:49 +00:00
David Xu	ba60525b3f	Pass allocated unit number to make_dev, otherwise kernel panics later while cloning second tap. Reviewed by: kevlo,ed	2012-11-27 12:23:57 +00:00
Gleb Smirnoff	5e9a54290d	Better safe than sorry: reinitialize eh after ng_ether(4) and if_bridge(4) processing, since mbuf may be modified there. Submitted by: youngari	2012-11-27 06:35:26 +00:00
Gleb Smirnoff	97cce87f78	Re-initialize eh pointer after m_adj() Submitted by: Kohji Okuno <okuno.kohji jp.panasonic.com> Reviewed by: yongari	2012-11-26 19:45:01 +00:00
Adrian Chadd	d60ec817ea	Fix up a compile time warning if INET6 isn't defined.	2012-11-18 04:51:46 +00:00
Hiroki Sato	6bbfef9004	Fill sin6_scope_id in sockaddr_in6 before passing it from the kernel to userland via routing socket or sysctl. This eliminates the following KAME-specific sin6_scope_id handling routine from each userland utility: sin6.sin6_scope_id = ntohs(*(u_int16_t *)&sin6.sin6_addr.s6_addr[2]); This behavior can be controlled by net.inet6.ip6.deembed_scopeid. This is set to 1 by default (sin6_scope_id will be filled in the kernel). Reviewed by: bz	2012-11-17 20:19:00 +00:00
Guy Helmer	0e8a1cb3c9	Work around a race in bpfread() by validating the hold buffer pointer before freeing it. Otherwise, we can lose a buffer and cause a panic in catchpacket().	2012-11-06 21:07:04 +00:00
Andrey V. Elsukov	ffdbf9da3b	Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set. Suggested by: andre	2012-11-02 01:20:55 +00:00
Gleb Smirnoff	078468ede4	o Remove last argument to ip_fragment(), and obtain all needed information on checksums directly from mbuf flags. This simplifies code. o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in hardware. Some drivers may not announce CSUM_IP in their if_hwassist, although they try to do checksums if CSUM_IP is set on the mbuf. Example is em(4). o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP. After this change CSUM_DELAY_IP vanishes from the stack. Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>	2012-10-26 21:06:33 +00:00
Andrey V. Elsukov	c1de64a495	Remove the IPFIREWALL_FORWARD kernel option and make it possible to turn on the related functionality at runtime via the sysctl variable net.pfil.forward. It is turned off by default. Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks	2012-10-25 09:39:14 +00:00
Gleb Smirnoff	da1fc67f8a	Fix fallout from r240071. If destination interface lookup fails, we should broadcast a packet, not try to deliver it to NULL. Reported by: rpaulo	2012-10-24 18:33:44 +00:00
Gleb Smirnoff	8f134647ca	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before passing a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtracts header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>	2012-10-22 21:09:03 +00:00
Alexander V. Chernikov	4dab1a18a3	Make PFIL use per-VNET lock instead of per-AF lock. Since most used packet filters (ipfw and PF) use the same ruleset with the same lock for both AF_INET and AF_INET6 there is no need in more fine-grained locking. However, it is possible to request personal lock by specifying PFIL_FLAG_PRIVATE_LOCK flag in pfil_head structure (see pfil.9 for more details). Export PFIL lock via rw_lock(9)/rm_lock(9)-like API permitting pfil consumers to use this lock instead of own lock. This helps reduce locks on the main traffic path. pfil_assert() is currently not implemented due to absence of rm_assert(). Waiting for some kind of r234648 to be merged in HEAD. This change is part of bigger patch reducing routing locking. Sponsored by: Yandex LLC Reviewed by: glebius, ae OK'd by: silence on net@ MFC after: 3 weeks	2012-10-22 14:10:17 +00:00
Andre Oppermann	813ee737dd	Update to previous r241688 to use __func__ instead of spelled out function name in log(9) message. Suggested by: glebius	2012-10-19 10:07:55 +00:00
Andre Oppermann	15134be81b	Use LOG_WARNING level in in_attachdomain1() instead of printf(). Submitted by: vijju.singh-at-gmail.com	2012-10-18 14:08:26 +00:00
Andre Oppermann	c9b652e3e8	Mechanically remove the last stray remains of spl* calls from net/. They have been Noop's for a long time now.	2012-10-18 13:57:24 +00:00
Gleb Smirnoff	44e1d89044	Utilize new macro to initialize if_baudrate().	2012-10-18 09:57:56 +00:00
Gleb Smirnoff	3aaf0159dd	Fix VIMAGE build. Reported by: Nikolai Lifanov <lifanov mail.lifanov.com> Pointy hat to: glebius	2012-10-17 21:19:27 +00:00
Maksim Yevmenkin	608ae712d3	provide helper if_initbaudrate() to set if_baudrate and if_baudrate_pf. again, use ixgbe(4) as an example of how to use new helper function. Reviewed by: jhb MFC after: 1 week	2012-10-17 19:24:13 +00:00
Xin LI	6684e46988	Fix build.	2012-10-17 08:19:08 +00:00
Maksim Yevmenkin	e9bbb44e09	report total number of ports for each lagg(4) interface via net.link.lagg.X.count sysctl MFC after: 1 week	2012-10-16 22:43:14 +00:00
Maksim Yevmenkin	0fef97fea3	introduce concept of ifi_baudrate power factor. the idea is to work around the problem where high speed interfaces (such as ixgbe(4)) are not able to report real ifi_baudrate. basically, take a spare byte from struct if_data and use it to store ifi_baudrate power factor. in other words, real ifi_baudrate = ifi_baudrate * 10 ^ ifi_baudrate power factor. this should be backwards compatible with old binaries. use ixgbe(4) as an example on how drivers would set ifi_baudrate power factor Discussed with: kib, scottl, glebius MFC after: 1 week	2012-10-16 20:18:15 +00:00
Gleb Smirnoff	42a58907c3	Make the "struct if_clone" opaque to users of the cloning API. Users now use function calls: if_clone_simple() if_clone_advanced() to initialize a cloner, instead of macros that initialize if_clone structure. Discussed with: brooks, bz, 1 year ago	2012-10-16 13:37:54 +00:00
Kevin Lo	9823d52705	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
Kevin Lo	a10cee30c9	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
Gleb Smirnoff	21d172a3f1	A step in resolving mess with byte ordering for AF_INET. After this change: - All packets in NETISR_IP queue are in net byte order. - ip_input() is entered in net byte order and converts packet to host byte order right _after_ processing pfil(9) hooks. - ip_output() is entered in host byte order and converts packet to net byte order right _before_ processing pfil(9) hooks. - ip_fragment() accepts and emits packet in net byte order. - ip_forward(), ip_mloopback() use host byte order (untouched actually). - ip_fastforward() no longer modifies packet at all (except ip_ttl). - Swapping of byte order there and back removed from the following modules: pf(4), ipfw(4), enc(4), if_bridge(4). - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version - __FreeBSD_version bumped. - pfil(9) manual page updated. Reviewed by: ray, luigi, eri, melifaro Tested by: glebius (LE), ray (BE)	2012-10-06 10:02:11 +00:00
Xin LI	15752fa858	MFV: libpcap 1.3.0. MFC after: 4 weeks	2012-10-05 18:42:50 +00:00
Andrew Thompson	3e92ee8a53	Remove the M_NOWAIT from bridge_rtable_init as it isn't needed. The function return value is not even checked and could lead to a panic on a null sc_rthash. MFC after: 2 weeks	2012-10-04 07:40:55 +00:00
Ed Maste	104d9fc776	Cast through void * to silence compiler warning The base netmap pointer and offsets involved are provided by the kernel side of the netmap interface and will have appropriate alignment. Sponsored by: ADARA Networks MFC After: 2 weeks	2012-10-03 21:41:20 +00:00
John Baldwin	b3aa419331	Rename the module for 'device enc' to "if_enc" to avoid conflicting with the CAM "enc" peripheral (part of ses(4)). Previously the two modules used the same name, so only one was included in a linked kernel causing enc0 to not be created if you added IPSEC to GENERIC. The new module name follows the pattern of other network interfaces (e.g. "if_loop"). MFC after: 1 week	2012-10-02 12:25:30 +00:00
Gleb Smirnoff	063efed28c	The drbr(9) API appeared to be so unclear, that most drivers in tree used it incorrectly, which led to inaccurate overrated if_obytes accounting. The drbr(9) used to update ifnet stats on drbr_enqueue(), which is not accurate since enqueuing doesn't imply successful processing by driver. Neither does dequeuing. Most drivers also called drbr_stats_update() which did accounting again, leading to doubled if_obytes statistics. And in case of severe transmitting, when a packet could be several times enqueued and dequeued it could have been accounted several times. o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between ALTQ queueing or buf_ring(9) queueing. - It doesn't touch the buf_ring stats any more. - It doesn't touch ifnet stats anymore. - drbr_stats_update() no longer exists. o buf_ring(9) handles its stats itself: - It handles br_drops itself. - br_prod_bytes stats are dropped. Rationale: no one ever reads them but update of a common counter on every packet negatively affects performance due to excessive cache invalidation. - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since we no longer account bytes. o Drivers handle their stats themselves: if_obytes, if_omcasts. o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer use drbr_stats_update(), and update ifnet stats themselves. o bxe(4) was the most correct driver, it didn't call drbr_stats_update(), thus it was the only driver accurate under moderate load. Now it also maintains stats itself. o ixgbe(4) had already taken stats from hardware, so just - drop software stats updating. - take multicast packet count from hardware as well. o mxge(4) just no longer needs NO_SLOW_STATS define. o cxgb(4), cxgbe(4) need no change, since they obtain stats from hardware. Reviewed by: jfv, gnn	2012-09-28 18:28:27 +00:00
Gleb Smirnoff	80cd7c7596	- In the bridge_enqueue() do success/error accounting for each fragment, not only once. - In the GRAB_OUR_PACKETS() macro do increase if_ibytes.	2012-09-26 20:09:48 +00:00
Ed Maste	66d3579a1e	Correct misspelling in debug output.	2012-09-26 01:09:19 +00:00
Ed Maste	c11038e252	Revert part of an earlier patch attempt that snuck in with r240938.	2012-09-25 23:41:45 +00:00
Ed Maste	5a71e42350	Avoid INVARIANTS panic destroying an in-use tap(4) The requirement (implied by the KASSERT in tap_destroy) that the tap is closed isn't valid; destroy_dev will block in devdrn while other threads are in d_* functions. Note: if_tun had the same issue, addressed in SVN revisions r186391, r186483 and r186497. The use of the condvar there appears to be redundant with the functionality provided by destroy_dev. Sponsored by: ADARA Networks Reviewed by: dwhite MFC after: 2 weeks	2012-09-25 22:10:14 +00:00
Ed Maste	cf8f32025f	Remove an incorrect comment	2012-09-25 21:19:17 +00:00
Gleb Smirnoff	3b7d677b8f	Convert lagg(4) to use if_transmit instead of if_start. In collaboration with: thompsa, sbruno, fabient	2012-09-20 10:05:10 +00:00
Gleb Smirnoff	22c914789e	Utilize Jenkins hash with random seed for source nodes storage.	2012-09-20 06:52:05 +00:00
Gleb Smirnoff	7b11548469	Add missing break. Pointy hat to: glebius	2012-09-20 03:09:58 +00:00
Gleb Smirnoff	9ed8bbbdbe	Fix build, pass the pointy hat please.	2012-09-18 12:21:32 +00:00
Gleb Smirnoff	1d6139c0e4	Make ruleset anchors in pf(4) reentrant. We've got two problems here: 1) Ruleset parser uses a global variable for anchor stack. 2) When processing a wildcard anchor, matching anchors are marked. To fix the first one: o Allocate anchor processing stack on stack. To make this allocation as small as possible, following measures taken: - Maximum stack size reduced from 64 to 32. - The struct pf_anchor_stackframe trimmed by one pointer - parent. We can always obtain the parent via the rule pointer. - When pf_test_rule() calls pf_get_translation(), the former lends its stack to the latter, to avoid recursive allocation of 32 entries. The second one appeared more tricky. The code, that marks anchors was added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea is to enable the "quick" keyword on an anchor rule. The feature isn't documented anywhere. The most obscure part of the 1.516 was that code examines the "match" mark on a just processed child, which couldn't be put here by current frame. Since this wasn't documented even in the commit message and functionality of this is not clear to me, I decided to drop this examination for now. The rest of 1.516 is redone in a thread safe manner - the mark isn't put on the anchor itself, but on current stack frame. To avoid growing stack frame, we utilize LSB from the rule pointer, relying on kernel malloc(9) returning pointer aligned addresses. Discussed with: dhartmei	2012-09-18 10:54:56 +00:00
Gleb Smirnoff	9e8c4accee	- Add $FreeBSD$ to allow modifications to this file. - Move $OpenBSD$ to a more standard place.	2012-09-18 10:52:46 +00:00
Gleb Smirnoff	3b3a8eb937	o Create directory sys/netpfil, where all packet filters should reside, and move there ipfw(4) and pf(4). o Move most modified parts of pf out of contrib. Actual movements: sys/contrib/pf/net/*.c -> sys/netpfil/pf/ sys/contrib/pf/net/*.h -> sys/net/ contrib/pf/pfctl/*.c -> sbin/pfctl contrib/pf/pfctl/*.h -> sbin/pfctl contrib/pf/pfctl/pfctl.8 -> sbin/pfctl contrib/pf/pfctl/*.4 -> share/man/man4 contrib/pf/pfctl/*.5 -> share/man/man5 sys/netinet/ipfw -> sys/netpfil/ipfw The arguable movement is pf/net/*.h -> sys/net. There are future plans to refactor pf includes, so I decided not to break things twice. Not modified bits of pf left in contrib: authpf, ftp-proxy, tftp-proxy, pflogd. The ipfw(4) movement is planned to be merged to stable/9, to make head and stable match. Discussed with: bz, luigi	2012-09-14 11:51:49 +00:00
Gleb Smirnoff	d6d3f01e0a	Merge the projects/pf/head branch, that was worked on for last six months, into head. The most significant achievements in the new code: o Fine grained locking, thus much better performance. o Fixes to many problems in pf, that were specific to FreeBSD port. New code doesn't have that many ifdefs and much less OpenBSDisms, thus is more attractive to our developers. Those interested in details, can browse through SVN log of the projects/pf/head branch. And for reference, here is exact list of revisions merged: r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212. I'd like to thank people who participated in early testing: Tested by: Florian Smeets <flo freebsd.org> Tested by: Chekaluk Vitaly <artemrts ukr.net> Tested by: Ben Wilber <ben desync.com> Tested by: Ian FREISLICH <ianf cloudseed.co.za>	2012-09-08 06:41:54 +00:00
Alexander V. Chernikov	73c23f3ba1	Fix the build broken by r240099. Hide link_pfil_hook under _KERNEL macro. MFC after: 3 weeks	2012-09-04 22:17:33 +00:00
Alexander V. Chernikov	7d4317bd40	Introduce new link-layer PFIL hook V_link_pfil_hook. Merge ether_ipfw_chk() and part of bridge_pfil() into unified ipfw_check_frame() function called by PFIL. This change was suggested by rwatson? @ DevSummit. Remove ipfw headers from ether/bridge code since they are unneeded now. Note this change introduces some (temporary) performance penalty since PFIL read lock has to be acquired for every link-level packet. MFC after: 3 weeks	2012-09-04 19:43:26 +00:00
Gleb Smirnoff	62208ca5d2	- Move jenkins.h to jenkins_hash.c - Provide missing function that can do hashing of arbitrary sized buffer. - Refetch lookup3.c and do only minimal edits to it, so that diff between our jenkins_hash.c and lookup3.c is minimal. - Add declarations for jenkins_hash(), jenkins_hash32() to sys/hash.h. - Document these functions in hash(9) Obtained from: http://burtleburtle.net/bob/c/lookup3.c	2012-09-04 12:07:33 +00:00
Gleb Smirnoff	3582a9f6c6	Change bridge(4) to use if_transmit for forwarding packets to underlying interfaces instead of queueing. Tested by: ray	2012-09-03 10:08:20 +00:00
Gleb Smirnoff	3932d76033	In ifc_alloc_unit(): - In the !wildcard case, return ENOSPC instead of confusing EEXIST in case if ifc->ifc_maxunit reached. - Fix unit leak, that I've introduced in previous revision. Submitted by: Daan Vreeken <Daan vitsch.nl>	2012-08-30 12:18:45 +00:00
John Baldwin	b90dde2fbc	Fix a silly grammar bogon. Submitted by: Stephen McKay	2012-08-21 19:07:28 +00:00
John Baldwin	28cc4d37e6	Refine the changes made in r208212 to avoid bogus failures from if_delmulti() when clearing the configuration for a subinterface when the parent interface is being detached. The current code was still triggering an assertion in if_delmulti() due to the parent interface being partially detached. Fix this by not calling if_delmulti() at all if the parent interface is being detached. Warn if if_delmulti() fails when the parent is not being detached (but similar to 208212, still proceed with tearing down the vlan state). Tested by: ae@ MFC after: 1 month	2012-08-20 16:00:33 +00:00
John Baldwin	2541fcd953	Unexpand a couple of TAILQ_FOREACH()s.	2012-08-17 16:01:24 +00:00
Konstantin Belousov	1c771f9222	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitly for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks	2012-08-05 14:11:42 +00:00
Gleb Smirnoff	ea53792942	Fix races between in_lltable_prefix_free(), lla_lookup(), llentry_free() and arptimer(): o Use callout_init_rw() for lle timeout, this allows us safely disestablish them. - This allows us to simplify the arptimer() and make it race safe. o Consistently use ifp->if_afdata_lock to lock access to linked lists in the lle hashes. o Introduce new lle flag LLE_LINKED, which marks an entry that is attached to the hash. - Use LLE_LINKED to avoid double unlinking via consequent calls to llentry_free(). - Mark lle with LLE_DELETED via |= operation instead of =, so that other flags won't be lost. o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more consistent and provide more informative KASSERTs. The patch is a collaborative work of all submitters and myself. PR: kern/165863 Submitted by: Andrey Zonov <andrey zonov.org> Submitted by: Ryan Stone <rysto32 gmail.com> Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>	2012-08-02 13:57:49 +00:00
Gleb Smirnoff	b1d86af706	The llentry_update() is used only by flowtable and the latter always passes NULL pointer to it. Thus, code can be simplified and function renamed to llentry_alloc() to match rtalloc().	2012-08-02 13:20:44 +00:00
Gleb Smirnoff	b9aee262e5	Some more whitespace cleanup.	2012-08-01 09:00:26 +00:00
Gleb Smirnoff	ea50c13ebe	Some style(9) and whitespace changes. Together with: Andrey Zonov <andrey zonov.org>	2012-07-31 11:31:12 +00:00
Bjoern A. Zeeb	5e5c0e7980	Hardcode the loopback rx/tx checksum options for IPv6 to on without checking. This allows the FreeBSD 9.1 release process to move forward. Work around the problem that loopback connections to local addresses not on loopback interfaces and not on interfaces w/ IPv6 checksum offloading enabled would not work. A proper fix to allow us to disable the "checksum offload" on loopback for testing, measurements, ... as we allow for IPv4 needs to be put in place later. Reported by: tuexen, Matthew Seaman (m.seaman infracaninophile.co.uk) Reported by: Mike Andrews (mandrews bit0.com), kib, ... PR: kern/170070 MFC after: 1 day X-MFC after: re approval	2012-07-28 20:31:39 +00:00
Alexander V. Chernikov	99ab4b1297	Permit changing MTU in 6to4 relay. This behavior is recommended by RFC 4213 clause 3.2. Sometimes fragmentation is the least evil. For example, some Linux IPVS kernels forward ICMPv6 checksums to real servers incorrectly. Reviewed by: hrs(previous version) Approved by: kib(mentor) MFC after: 1 week	2012-07-15 17:44:27 +00:00
Ed Maste	bf7a35de2d	Simplify error case Submitted by: thompsa@	2012-07-10 20:59:35 +00:00
Ed Maste	683fa2b5d7	Plug potential mbuf leak when bridging fragments If an error occurs when transmitting one mbuf in a chain of fragments, free the subsequent fragments instead of leaking them. Sponsored by: ADARA Networks	2012-07-10 13:17:32 +00:00
Mikolaj Golub	7edc3d88eb	In epair_clone_destroy(), when destroying the second half, we have to switch to its vnet before calling ether_ifdetach(). Otherwise if the second half resides in a different vnet, if_detach() silently fails leaving a stale pointer in V_ifnet list, and the system crashes trying to access this pointer later. Another solution could be not to allow to destroy epair unless both ends are in the home vnet. Discussed with: bz Tested by: delphij	2012-07-09 20:38:18 +00:00
Ed Maste	21151865d5	Restore error handling lost in r191603 This was missed in the change from IFQ_ENQUEUE to if_transmit. Sponsored by: ADARA Networks	2012-07-09 14:16:49 +00:00
Ed Maste	d5fb967ab2	Implement SIOCGIFMEDIA for if_tap(4) Appease certain if_tap(4) consumers by providing simulated Ethernet media status. DragonFly commit 70d9a675bf5441cc854a843ead702d08928c37f3 Obtained from: DragonFly BSD	2012-07-06 23:17:30 +00:00
Gleb Smirnoff	bf9840512a	When ip_output()/ip6_output() is supplied a struct route *ro argument, it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning here: it may be supplied to provide route, and it may be supplied to store and return to caller the route that ip_output()/ip6_output() finds. In the latter case skipping FLOWTABLE lookup is pessimisation. The difference between struct route filled by FLOWTABLE and filled by rtalloc() family is that the former doesn't hold a reference on its rtentry. Reference is held by flow entry, and it is about to be released in future. Thus, route filled by FLOWTABLE shouldn't be passed to RTFREE() macro. - Introduce new flag for struct route/route_in6, that marks route not holding a reference on rtentry. - Introduce new macro RO_RTFREE() that cleans up a struct route depending on its kind. - All callers to ip_output()/ip6_output() that do supply non-NULL but empty route should use RO_RTFREE() to free results of lookup. - ip_output()/ip6_output() now do FLOWTABLE lookup always when ro->ro_rt == NULL. Tested by: tuexen (SCTP part)	2012-07-04 07:37:53 +00:00
Andrew Thompson	61587a8403	Add the same check as vlan(4) where we ignore the ifnet departure event if the interface is just being renamed. PR: kern/169557 Submitted by: Mark Johnston MFC after: 3 days	2012-06-30 19:09:02 +00:00
John Baldwin	304050dde0	Hold GIF_LOCK() for almost all of gif_start(). It is required to be held across in_gif_output() and in6_gif_output() anyway, and once it is held across those it might as well be held for the entire loop. This simplifies the code and removes the need for the custom IFF_GIF_WANTED flag (which belonged in the softc and not as an IFF_* flag anyway). Tested by: Vincent Hoffman vince unsane co uk	2012-06-29 15:21:34 +00:00
Navdeep Parhar	09fe63205c	- Updated TOE support in the kernel. - Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features. - iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon. Build-tested with make universe. 30s overview ============ What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface: # ifconfig -m | grep TOE Enable/disable TCP offload on an interface (just like any other ifnet capability): # ifconfig cxgbe0 toe # ifconfig cxgbe0 -toe Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat: # netstat -np tcp | grep toe # sockstat -46c | grep toe Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)	2012-06-19 07:34:13 +00:00
Randall Stewart	4013878828	Fix comment to better reflect how we are cheating and using the csum_data. Also fix style issues with the comments.	2012-06-12 13:31:32 +00:00
Randall Stewart	cef68c63ec	Note to self. Have morning coffee before committing things. There is no mac_addr in the mbuf for BSD.. cheat like we are supposed to and use the csum field since our friend the gif tunnel itself will never use offload.	2012-06-12 12:44:17 +00:00
Randall Stewart	6f17e3a31a	Oops, forgot to commit the flag.	2012-06-12 12:40:15 +00:00
Randall Stewart	776b728856	Allow a gif tunnel to be used with ALTq. Reviewed by: gnn	2012-06-12 10:44:09 +00:00
Andrew Thompson	08e348234c	Fix a panic I introduced in r234487, the bridge softc pointer is set to null early in the detach so rearrange things not to explode. Reported by: David Roffiaen, Gustau Perez Querol Tested by: David Roffiaen MFC after: 3 days	2012-06-11 20:12:13 +00:00
Alexander V. Chernikov	4fe83b8159	Fix typo introduced in r236559. Pointed by: bcr Approved by: kib(mentor)	2012-06-09 10:04:40 +00:00
Mikolaj Golub	62b1b42507	Sort includes. Submitted by: Daan Vreeken <pa4dan Bliksem.VEHosting.nl> MFC after: 3 days	2012-06-07 19:48:45 +00:00
Mikolaj Golub	fef68bb8dc	Add VIMAGE support to if_tap. PR: kern/152047, kern/158686 Submitted by: Daan Vreeken <pa4dan Bliksem.VEHosting.nl> MFC after: 1 week	2012-06-07 19:46:46 +00:00
Alexander V. Chernikov	784292f89a	Fix panic introduced by r235745. Panic occurs after first packet traverse renamed interface. Add several comments on locking Found by: avg Approved by: ae(mentor) Tested by: avg MFC after: 1 week	2012-06-04 12:36:58 +00:00
Michael Tuexen	a6cff10f2a	Separate SCTP checksum offloading for IPv4 and IPv6. While there: remove some trailing whitespaces. MFC after: 3 days X-MFC with: 236170	2012-05-30 20:56:07 +00:00
Jung-uk Kim	9b7d4a7f2d	Fix style(9) nits, reduce unnecessary type castings, etc., for bpf_setf().	2012-05-29 22:28:46 +00:00
Jung-uk Kim	8b04b48a7d	- Save the previous filter right before we set new one. - Reduce duplicate code and make it little easier to read. MFC after: 2 weeks	2012-05-29 22:21:53 +00:00
Jung-uk Kim	6f731135ac	Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor and reset statistics as it should. MFC after: 3 days	2012-05-29 18:44:53 +00:00
Alexander V. Chernikov	a86227d176	Fix BPF_JITTER code broken by r235746. Pointed by: jkim Reviewed by: jkim (except locking changes) Approved by: (mentor) MFC after: 2 weeks	2012-05-29 12:52:30 +00:00
Eygene Ryabinkin	f74d5a7a20	if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row Currently, 'ifconfig laggX down' does not remove members from this lagg(4) interface. So, 'service netif stop laggX' followed by 'service netif start laggX' will choke, because "stop" will leave interfaces attached to the laggX and ifconfig from the "start" will refuse to add already-existing interfaces. The real-world case is when I am bundling together my Ethernet and WiFi interfaces and using multiple profiles for accessing network in different places: system being booted up with one profile, but later this profile being exchanged to another one, followed by 'service netif restart' will not add WiFi interface back to the lagg: the "stop" action from 'service netif restart' will shut down my main WiFi interface, so wlan0 that exists in the lagg0 will be destroyed and purged from lagg0; the "start" action will try to re-add both interfaces, but since Ethernet one is already in lagg0, ifconfig will refuse to add the wlan0 from WiFi interface. Since adding the interface to the lagg(4) when it is already here should be an idempotent action: we're really not changing anything, so this fix doesn't change the semantics of interface addition. Approved by: thompsa Reviewed by: emaste MFC after: 1 week	2012-05-28 12:13:04 +00:00
Bjoern A. Zeeb	356ab07e2d	It turns out that too many drivers are not only parsing the L2/3/4 headers for TSO but also for generic checksum offloading. Ideally we would only have one common function shared amongst all drivers, and perhaps when updating them for IPv6 we should introduce that. Eventually we should provide the meta information along with mbufs to avoid (re-)parsing entirely. To not break IPv6 (checksums and offload) and to be able to MFC the changes without risking to hurt 3rd party drivers, duplicate the v4 framework, as other OSes have done as well. Introduce interface capability flags for TX/RX checksum offload with IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6 flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6 fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and add an alias for CSUM_DATA_VALID_IPV6. This pretty much brings IPv6 handling in line with IPv4. TSO is still handled in a different way and not via if_hwassist. Update ifconfig to allow (un)setting of the new capability flags. Update loopback to announce the new capabilities and if_hwassist flags. Individual driver updates will have to follow, as will SCTP. Reported by: gallatin, dim, .. Reviewed by: gallatin (glanced at?) MFC after: 3 days X-MFC with: r235961,235959,235958	2012-05-28 09:30:13 +00:00
Andrew Thompson	5fc4c149ab	Turn LACP debugging from a compile time option to a sysctl, it is very handy to be able to turn it on when negotiation to a switch misbehaves. Submitted by: Andrew Boyer MFC after: 3 days	2012-05-26 08:09:01 +00:00
Bjoern A. Zeeb	d4b93a67d9	MFp4 bz_ipv6_fast: Simple yet effective change enabling checksum "offload" on loopback for IPv6 to avoid expensive computations. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-25 02:21:17 +00:00
Alexander V. Chernikov	97aacec622	Make most BPF ioctls() SMP-safe. Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:21:00 +00:00
Alexander V. Chernikov	c7b0200eb5	Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter. Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl. This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock. Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:19:19 +00:00
Alexander V. Chernikov	afa85850e7	Fix old panic when BPF consumer attaches to destroying interface. 'flags' field is added to the end of bpf_if structure. Currently the only flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd() Problem can be easily triggered on SMP stable/[89] by the following command (sort of): 'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done' Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory, while interface is still UP and there can be routes via this interface. Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api. Convert bpfd rwlock back to mutex due to lack of performance gain (currently checking if packet matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches) Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:17:29 +00:00
Alexander V. Chernikov	6c74ff0ea6	Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@) Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@) Protect most of bpf_setf() by BPF global lock Add several forgotten assertions (thanks to adrian@) Document current locking model inside bpf.c Document EVENTHANDLER(9) usage inside BPF. Approved by: kib(mentor) Tested by: gnn MFC in: 4 weeks	2012-05-21 22:13:48 +00:00
Marcel Moolenaar	cf0d539f8b	Use the LLINDEX macro to access the link-level I/F index. This makes it possible to work with a different type for the sdl_index field -- it only requires a recompile. Obtained from: Juniper Networks, Inc.	2012-05-19 02:39:43 +00:00
Xin LI	47db53c31a	Sync DLTs with the latest pcap version. MFC after: 2 weeks	2012-05-14 05:10:41 +00:00
Alexander V. Chernikov	bdf942c3f0	Revert r234834 per luigi@ request. Cleaner solution (e.g. adding another header) should be done here. Original log: Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code. Requested by: luigi Approved by: kib(mentor)	2012-05-03 08:56:43 +00:00
Ed Maste	6107adc3f5	Relax restriction on direct tx to child ports Lagg(4) restricts the type of packet that may be sent directly to a child port, to avoid undesired output from accidental misconfiguration. Previously only ETHERTYPE_PAE was permitted. BPF writes to a lagg(4) child port are presumably intentional, so just allow them, while still blocking other packets that should take the aggregation path. PR: kern/138620 Approved by: thompsa@	2012-05-03 01:41:12 +00:00
Alexander V. Chernikov	7bd5e9b143	Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code. Approved by: ae(mentor) MFC after: 2 weeks	2012-04-30 10:22:23 +00:00
Alexander V. Chernikov	c2508034a2	Do not require radix write lock to be held while dumping route table via sysctl(4) interface. This permits router not to stop forwarding packets while route table is being written to user-supplied buffer. Reported by: Pawel Tyll <ptyll@nitronet.pl> Approved by: kib(mentor) MFC after: 1 week	2012-04-22 16:13:23 +00:00
Andrew Thompson	2885c19ebd	Move the interface media check to a taskqueue, some interfaces (usb) sleep during SIOCGIFMEDIA and we were holding locks.	2012-04-20 10:06:28 +00:00
Andrew Thompson	7702d4013b	Add linkstate to bridge(4), set the link to up when at least one underlying interface is up, otherwise the link is down. This, among other things, allows carp to work on a bridge. Prodded by: glebius Tested by: Alexander Lunev	2012-04-20 09:55:50 +00:00
Andrew Thompson	ddf3201009	Remove KASSERTS, they do not add any value here since the pointer is about to be dereferenced anyway.	2012-04-18 01:39:14 +00:00
Luigi Rizzo	d76bf4ff7b	A bit of cleanup in the names of fields of netmap-related structures. Use the name 'ring' instead of 'queue' in all fields. Bump NETMAP_API.	2012-04-13 16:03:07 +00:00
Luigi Rizzo	d5d42003f4	remove an unnecessary #define	2012-04-12 10:32:34 +00:00
Andrew Thompson	b517176ad9	Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets are discarded, this is an issue because lacp drops the lock which may allow network threads to access freed memory. Expand the lock coverage so the detach/attach happen atomically. Submitted by: Andrew Boyer (earlier version)	2012-04-12 01:07:17 +00:00
John Baldwin	19a3210a66	Add media types for 40G media that might be used with FreeBSD. Reviewed by: bz MFC after: 2 weeks	2012-04-10 13:59:35 +00:00
Alexander V. Chernikov	9431cc1696	Fix build broken by r233938. Pointed by: David Wolfskill <david@catwhisker.org> Approved by: kib (mentor) Pointy hat to: melifaro	2012-04-06 13:34:19 +00:00
Alexander V. Chernikov	51ec1eb70d	- Improve performance for writer-only BPF users. Linux and Solaris (at least OpenSolaris) have PF_PACKET socket families to send raw ethernet frames. The only FreeBSD interface that can be used to send raw frames is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff use BPF only to send data. This leads us to the situation when software like cdpd, being run on high-traffic-volume interface significantly reduces overall performance since we have to acquire additional locks for every packet. Here we add sysctl that changes BPF behavior in the following way: If program came and opens BPF socket without explicitly specifying read filter we assume it to be write-only and add it to special writer-only per-interface list. This makes bpf_peers_present() return 0, so no additional overhead is introduced. After filter is supplied, descriptor is added to original per-interface list permitting packets to be captured. Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of setting snap length. Fortunately, most programs explicitly set (even catch-all) filter after that. tcpdump(1) is a good example. So a bit hackish approach is taken: we upgrade descriptor only after second BIOCSETF is received. Sysctl is named net.bpf.optimize_writers and is turned off by default. - While here, document all sysctl variables in bpf.4 Sponsored by Yandex LLC Reviewed by: glebius (previous version) Reviewed by: silence on -net@ Approved by: (mentor) MFC after: 4 weeks	2012-04-06 06:55:21 +00:00
Alexander V. Chernikov	e4b3229aa5	- Improve BPF locking model. Interface locks and descriptor locks are converted from mutex(9) to rwlock(9). This greatly improves performance: in most common case we need to acquire 1 reader lock instead of 2 mutexes. - Remove filter(descriptor) (reader) lock in bpf_mtap[2] This was suggested by glebius@. We protect filter by requesting interface writer lock on filter change. - Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h without including rwlock stuff. However, this is a temporary solution, struct bpf_if should be made opaque for any external caller. Found by: Dmitrij Tejblum <tejblum@yandex-team.ru> Sponsored by: Yandex LLC Reviewed by: glebius (previous version) Reviewed by: silence on -net@ Approved by: (mentor) MFC after: 3 weeks	2012-04-06 06:53:58 +00:00
John Baldwin	02ed02af7b	Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD. The new [RW]LOCK macros are merged back to 8.x so should be suitable for new code in HEAD even if it is to be MFC'd.	2012-03-19 21:09:12 +00:00
Bjoern A. Zeeb	bfca216eb9	Hide kernel option ROUTETABLES evaluations in the implementation rather than the header file. With this also move RT_MAXFIBS and RT_NUMFIBS into the implementation to avoid further usage in other code. rt_numfibs is all that should be needed. This allows users to change the number of FIBs from 1..RT_MAXFIBS(16) dynamically using the tunable without the need to change the kernel config for the maximum anymore. This means that the multi-FIB feature is now fully available with GENERIC kernels. The kernel option ROUTETABLES can still be used to set the default numbers of FIBs in absence of the tunable. Ok.ed by: julian, hrs, melifaro MFC after: 2 weeks	2012-03-18 11:23:40 +00:00
Luigi Rizzo	a72505824c	- remove an extra parenthesis in a closing brace; - add the macro NETMAP_RING_FIRST_RESERVED() which returns the index of the first non-released buffer in the ring (this is useful for code that retains buffers for some time instead of processing them immediately)	2012-03-11 17:35:12 +00:00
Andrew Thompson	cd613b6351	Move the vlan buffer space into the union which also fixes an unused variable warning with !INET & !INET6. Spotted by: pluknet	2012-03-07 07:22:53 +00:00
Andrew Thompson	86f67641a9	Add the ability to set which packet layers are used for the load balance hash calculation.	2012-03-06 22:58:13 +00:00
Marko Zec	2db13e7575	Properly restore curvnet context when returning early from ether_input_internal(). This change only affects options VIMAGE kernel builds. PR: kern/165643 Submitted by: Vijay Singh MFC after: 3 days	2012-03-04 11:11:03 +00:00
Juli Mallett	9624d94701	o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands using the o32 ABI. This mostly follows nwhitehorn's lead in implementing COMPAT_FREEBSD32 on powerpc64. o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the 32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit ABIs use a 64-bit time_t. o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS with 32-bit compatibility, then, disable some code which assumes otherwise wrongly when built for MIPS. A more general macro to check in this case would seem like a good idea eventually. If someone adds support for using n32 userland with n64 kernels on MIPS, then they will have to add a variety of flags related to each piece of the ABI that can vary. That's probably the right time to generalize further. o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the freebsd32 compat code. Probably this should be generalized at some point. Reviewed by: gonzo	2012-03-03 08:19:18 +00:00
Andrew Thompson	70b23a4596	Use a more appropriate default for the maximum number of addresses in the bridge forwarding table. PR: docs/164564 Discussed with: brueffer	2012-02-29 20:58:21 +00:00
Luigi Rizzo	64ae02c365	A bunch of netmap fixes: USERSPACE: 1. add support for devices with different number of rx and tx queues; 2. add better support for zero-copy operation, adding an extra field to the netmap ring to indicate how many buffers we have already processed but not yet released (with help from Eddie Kohler); 3. The two changes above unfortunately require an API change, so while at it add a version field and some spares to the ioctl() argument to help detect mismatches. 4. update the manual page for the two changes above; 5. update sample applications in tools/tools/netmap KERNEL: 1. simplify the internal structures moving the global wait queues to the 'struct netmap_adapter'; 2. simplify the functions that map kring<->nic ring indexes 3. normalize device-specific code, helps maintenance; 4. start exploring the impact of micro-optimizations (prefetch etc.) in the ixgbe driver. Use 'legacy' descriptors on the tx ring and prefetch slots gives about 20% speedup at 900 MHz. Another 7-10% would come from removing the explicit calls to bus_dmamap* in the core (they are effectively NOPs in this case, but it takes expensive load of the per-buffer dma maps to figure out that they are all NULL). Rx performance not investigated. I am postponing the MFC so i can import a few more improvements before merging.	2012-02-27 19:05:01 +00:00
Andrew Thompson	8d45bd6e80	Only look for a usable MAC address for the bridge ID from ports within our bridge, this allows us to have more than one independent bridge in the same STP domain. PR: kern/164369 Submitted by: Nikos Vassiliadis (earlier version) MFC after: 2 weeks	2012-02-24 17:50:36 +00:00
Andrew Thompson	3122b9120c	Add a sysctl/tunable default value for the use_flowid sysctl in r232008.	2012-02-23 21:56:53 +00:00
Andrew Thompson	47190ea664	Indicate this function decrements the timer as well as testing for expiry.	2012-02-23 20:58:52 +00:00
Kip Macy	a93cda789a	When using flowtable, llentrys can outlive the interface with which they're associated, at which point the lle_tbl pointer points to freed memory and the llt_free pointer is no longer valid. Move the free pointer into the llentry itself and update the initialization sites. MFC after: 2 weeks	2012-02-23 18:21:37 +00:00
Andrew Thompson	2ad65e315d	Now that network interfaces advertise if they support linkstate notifications we do not need to perform a media ioctl every 15 seconds.	2012-02-23 06:26:16 +00:00
Andrew Thompson	4661f8627c	bstp_input() always consumes the packet so remove the mbuf handling dance around it. Obtained from: OpenBSD (r1.37)	2012-02-23 00:59:21 +00:00
Andrew Thompson	0bf97ae271	Using the flowid in the mbuf assumes the network card is giving a good hash for the traffic flow, this may not be the case giving poor traffic distribution. Add a sysctl which allows us to fall back to our own flow hash code. PR: kern/164901 Submitted by: Eugene Grosbein MFC after: 1 week	2012-02-22 22:01:30 +00:00
Bjoern A. Zeeb	9dba179d5e	IFC @231845 Sponsored by: Cisco Systems, Inc.	2012-02-17 00:27:48 +00:00
Tijl Coosemans	265f940acc	Change some headers such that lang/gcc* ports no longer patch them. The lang/gcc* ports patch headers where they think something is non-standard. These patched headers override the system headers which means you have to rebuild these ports whenever you do installworld to make sure they contain the latest changes.	2012-02-14 12:50:20 +00:00
Bjoern A. Zeeb	6d076ae8f7	Introduce a new NET_RT_IFLISTL API to query the address list. It works on extended and extensible structs if_msghdrl and ifa_msghdrl. This will allow us to extend both the msghdrl structs and eventually if_data in the future without breaking the ABI. Bump __FreeBSD_version to allow ports to more easily detect the new API. Reviewed by: glebius, brooks MFC after: 3 days	2012-02-11 06:02:16 +00:00
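
The writer-only behaviour described in commit 51ec1eb70d above can be illustrated with a small userspace model. This is only a sketch of the idea as the commit message states it, not the kernel code: the class names `BpfIf` and `BpfDescriptor`, the `set_filter()` method (standing in for the BIOCSETF ioctl) and the `optimize_writers` argument (standing in for the net.bpf.optimize_writers sysctl) are all hypothetical names chosen for illustration.

```python
class BpfIf:
    """Toy stand-in for the per-interface BPF attachment point (hypothetical)."""

    def __init__(self):
        self.readers = []  # descriptors that take part in packet capture
        self.writers = []  # write-only descriptors, ignored on the capture path

    def peers_present(self):
        # Models bpf_peers_present(): only capturing descriptors count.
        return len(self.readers) > 0


class BpfDescriptor:
    """Toy stand-in for an open BPF descriptor (hypothetical)."""

    def __init__(self, bif, optimize_writers):
        self.bif = bif
        self.setf_calls = 0
        # With the optimization enabled, a new descriptor is assumed to be
        # write-only until it proves otherwise.
        if optimize_writers:
            bif.writers.append(self)
        else:
            bif.readers.append(self)

    def set_filter(self):
        # Stand-in for BIOCSETF. The first call is assumed to be the
        # catch-all filter set by pcap_open_live(); only the second call
        # moves the descriptor to the capturing list.
        self.setf_calls += 1
        if self in self.bif.writers and self.setf_calls >= 2:
            self.bif.writers.remove(self)
            self.bif.readers.append(self)


if __name__ == "__main__":
    bif = BpfIf()
    d = BpfDescriptor(bif, optimize_writers=True)
    print(bif.peers_present())  # descriptor is on the writer-only list
    d.set_filter()
    print(bif.peers_present())  # still treated as write-only
    d.set_filter()
    print(bif.peers_present())  # now on the capturing list
```

In this model a descriptor that never sets a read filter, or sets one only once, adds no cost to the capture path, which is the effect the commit message attributes to the sysctl.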


3058 Commits