freebsd-dev

Author	SHA1	Message	Date
Jung-uk Kim	6f731135ac	Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor and reset statistics as it should. MFC after: 3 days	2012-05-29 18:44:53 +00:00
Alexander V. Chernikov	a86227d176	Fix BPF_JITTER code broken by r235746. Pointed by: jkim Reviewed by: jkim (except locking changes) Approved by: (mentor) MFC after: 2 weeks	2012-05-29 12:52:30 +00:00
Eygene Ryabinkin	f74d5a7a20	if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row Currently, 'ifconfig laggX down' does not remove members from this lagg(4) interface. So, 'service netif stop laggX' followed by 'service netif start laggX' will choke, because "stop" will leave interfaces attached to the laggX and ifconfig from the "start" will refuse to add already-existing interfaces. The real-world case is when I am bundling together my Ethernet and WiFi interfaces and using multiple profiles for accessing network in different places: system being booted up with one profile, but later this profile being exchanged to another one, followed by 'service netif restart' will not add WiFi interface back to the lagg: the "stop" action from 'service netif restart' will shut down my main WiFi interface, so wlan0 that exists in the lagg0 will be destroyed and purged from lagg0; the "start" action will try to re-add both interfaces, but since Ethernet one is already in lagg0, ifconfig will refuse to add the wlan0 from WiFi interface. Since adding the interface to the lagg(4) when it is already here should be an idempotent action: we're really not changing anything, so this fix doesn't change the semantics of interface addition. Approved by: thompsa Reviewed by: emaste MFC after: 1 week	2012-05-28 12:13:04 +00:00
Bjoern A. Zeeb	356ab07e2d	It turns out that too many drivers are not only parsing the L2/3/4 headers for TSO but also for generic checksum offloading. Ideally we would only have one common function shared amongst all drivers, and perhaps when updating them for IPv6 we should introduce that. Eventually we should provide the meta information along with mbufs to avoid (re-)parsing entirely. To not break IPv6 (checksums and offload) and to be able to MFC the changes without risking to hurt 3rd party drivers, duplicate the v4 framework, as other OSes have done as well. Introduce interface capability flags for TX/RX checksum offload with IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6 flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6 fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and add an alias for CSUM_DATA_VALID_IPV6. This pretty much brings IPv6 handling in line with IPv4. TSO is still handled in a different way and not via if_hwassist. Update ifconfig to allow (un)setting of the new capability flags. Update loopback to announce the new capabilities and if_hwassist flags. Individual driver updates will have to follow, as will SCTP. Reported by: gallatin, dim, .. Reviewed by: gallatin (glanced at?) MFC after: 3 days X-MFC with: r235961,235959,235958	2012-05-28 09:30:13 +00:00
Andrew Thompson	5fc4c149ab	Turn LACP debugging from a compile time option to a sysctl, it is very handy to be able to turn it on when negotiation to a switch misbehaves. Submitted by: Andrew Boyer MFC after: 3 days	2012-05-26 08:09:01 +00:00
Bjoern A. Zeeb	d4b93a67d9	MFp4 bz_ipv6_fast: Simple yet effective change enabling checksum "offload" on loopback for IPv6 to avoid expensive computations. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-25 02:21:17 +00:00
Alexander V. Chernikov	97aacec622	Make most BPF ioctls() SMP-safe. Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:21:00 +00:00
Alexander V. Chernikov	c7b0200eb5	Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter. Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl. This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock. Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:19:19 +00:00
Alexander V. Chernikov	afa85850e7	Fix old panic when BPF consumer attaches to destroying interface. 'flags' field is added to the end of bpf_if structure. Currently the only flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd() Problem can be easily triggered on SMP stable/[89] by the following command (sort of): 'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done' Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory, while interface is still UP and there can be routes via this interface. Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api. Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches) Approved by: kib(mentor) MFC in: 4 weeks	2012-05-21 22:17:29 +00:00
Alexander V. Chernikov	6c74ff0ea6	Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@) Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@) Protect most of bpf_setf() by BPF global lock Add several forgotten assertions (thanks to adrian@) Document current locking model inside bpf.c Document EVENTHANDLER(9) usage inside BPF. Approved by: kib(mentor) Tested by: gnn MFC in: 4 weeks	2012-05-21 22:13:48 +00:00
Marcel Moolenaar	cf0d539f8b	Use the LLINDEX macro to access the link-level I/F index. This makes it possible to work with a different type for the sdl_index field -- it only requires a recompile. Obtained from: Juniper Networks, Inc.	2012-05-19 02:39:43 +00:00
Xin LI	47db53c31a	Sync DLTs with the latest pcap version. MFC after: 2 weeks	2012-05-14 05:10:41 +00:00
Alexander V. Chernikov	bdf942c3f0	Revert r234834 per luigi@ request. Cleaner solution (e.g. adding another header) should be done here. Original log: Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code. Requested by: luigi Approved by: kib(mentor)	2012-05-03 08:56:43 +00:00
Ed Maste	6107adc3f5	Relax restriction on direct tx to child ports Lagg(4) restricts the type of packet that may be sent directly to a child port, to avoid undesired output from accidental misconfiguration. Previously only ETHERTYPE_PAE was permitted. BPF writes to a lagg(4) child port are presumably intentional, so just allow them, while still blocking other packets that should take the aggregation path. PR: kern/138620 Approved by: thompsa@	2012-05-03 01:41:12 +00:00
Alexander V. Chernikov	7bd5e9b143	Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code. Approved by: ae(mentor) MFC after: 2 weeks	2012-04-30 10:22:23 +00:00
Alexander V. Chernikov	c2508034a2	Do not require radix write lock to be held while dumping route table via sysctl(4) interface. This permits router not to stop forwarding packets while route table is being written to user-supplied buffer. Reported by: Pawel Tyll <ptyll@nitronet.pl> Approved by: kib(mentor) MFC after: 1 week	2012-04-22 16:13:23 +00:00
Andrew Thompson	2885c19ebd	Move the interface media check to a taskqueue, some interfaces (usb) sleep during SIOCGIFMEDIA and we were holding locks.	2012-04-20 10:06:28 +00:00
Andrew Thompson	7702d4013b	Add linkstate to bridge(4), set the link to up when at least one underlying interface is up, otherwise the link is down. This, among other things, allows carp to work on a bridge. Prodded by: glebius Tested by: Alexander Lunev	2012-04-20 09:55:50 +00:00
Andrew Thompson	ddf3201009	Remove KASSERTS, they do not add any value here since the pointer is about to be derefernced anyway.	2012-04-18 01:39:14 +00:00
Luigi Rizzo	d76bf4ff7b	A bit of cleanup in the names of fields of netmap-related structures. Use the name 'ring' instead of 'queue' in all fields. Bump NETMAP_API.	2012-04-13 16:03:07 +00:00
Luigi Rizzo	d5d42003f4	remove an unnecessary #define	2012-04-12 10:32:34 +00:00
Andrew Thompson	b517176ad9	Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets are discarded, this is an issue because lacp drops the lock which may allow network threads to access freed memory. Expand the lock coverage so the detach/attach happen atomically. Submitted by: Andrew Boyer (earlier version)	2012-04-12 01:07:17 +00:00
John Baldwin	19a3210a66	Add media types for 40G media that might be used with FreeBSD. Reviewed by: bz MFC after: 2 weeks	2012-04-10 13:59:35 +00:00
Alexander V. Chernikov	9431cc1696	Fix build broken by r233938. Pointed by: David Wolfskill <david@catwhisker.org> Approved by: kib (mentor) Pointy hat to: melifaro	2012-04-06 13:34:19 +00:00
Alexander V. Chernikov	51ec1eb70d	- Improve performace for writer-only BPF users. Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send raw ethernet frames. The only FreeBSD interface that can be used to send raw frames is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses BPF only to send data. This leads us to the situation when software like cdpd, being run on high-traffic-volume interface significantly reduces overall performance since we have to acquire additional locks for every packet. Here we add sysctl that changes BPF behavior in the following way: If program came and opens BPF socket without explicitly specifyin read filter we assume it to be write-only and add it to special writer-only per-interface list. This makes bpf_peers_present() return 0, so no additional overhead is introduced. After filter is supplied, descriptor is added to original per-interface list permitting packets to be captured. Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of setting snap length. Fortunately, most programs explicitly sets (event catch-all) filter after that. tcpdump(1) is a good example. So a bit hackis approach is taken: we upgrade description only after second BIOCSETF is received. Sysctl is named net.bpf.optimize_writers and is turned off by default. - While here, document all sysctl variables in bpf.4 Sponsored by Yandex LLC Reviewed by: glebius (previous version) Reviewed by: silence on -net@ Approved by: (mentor) MFC after: 4 weeks	2012-04-06 06:55:21 +00:00
Alexander V. Chernikov	e4b3229aa5	- Improve BPF locking model. Interface locks and descriptor locks are converted from mutex(9) to rwlock(9). This greately improves performance: in most common case we need to acquire 1 reader lock instead of 2 mutexes. - Remove filter(descriptor) (reader) lock in bpf_mtap[2] This was suggested by glebius@. We protect filter by requesting interface writer lock on filter change. - Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h without including rwlock stuff. However, this is is temporary solution, struct bpf_if should be made opaque for any external caller. Found by: Dmitrij Tejblum <tejblum@yandex-team.ru> Sponsored by: Yandex LLC Reviewed by: glebius (previous version) Reviewed by: silence on -net@ Approved by: (mentor) MFC after: 3 weeks	2012-04-06 06:53:58 +00:00
John Baldwin	02ed02af7b	Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD. The new [RW]LOCK macros are merged back to 8.x so should be suitable for new code in HEAD even if it is to be MFC'd.	2012-03-19 21:09:12 +00:00
Bjoern A. Zeeb	bfca216eb9	Hide kernel option ROUTETABLES evaluations in the implementation rather than the header file. With this also move RT_MAXFIBS and RT_NUMFIBS into the implemantion to avoid further usage in other code. rt_numfibs is all that should be needed. This allows users to change the number of FIBs from 1..RT_MAXFIBS(16) dynamically using the tunable without the need to change the kernel config for the maximum anymore. This means that thet multi-FIB feature is now fully available with GENERIC kernels. The kernel option ROUTETABLES can still be used to set the default numbers of FIBs in absence of the tunable. Ok.ed by: julian, hrs, melifaro MFC after: 2 weeks	2012-03-18 11:23:40 +00:00
Luigi Rizzo	a72505824c	- remove an extra parenthesis in a closing brace; - add the macro NETMAP_RING_FIRST_RESERVED() which returns the index of the first non-released buffer in the ring (this is useful for code that retains buffers for some time instead of processing them immediately)	2012-03-11 17:35:12 +00:00
Andrew Thompson	cd613b6351	Move the vlan buffer space into the union which also fixes an unused variable warning with !INET & !INET6. Spotted by: pluknet	2012-03-07 07:22:53 +00:00
Andrew Thompson	86f67641a9	Add the ability to set which packet layers are used for the load balance hash calculation.	2012-03-06 22:58:13 +00:00
Marko Zec	2db13e7575	Properly restore curvnet context when returning early from ether_input_internal(). This change only affects options VIMAGE kernel builds. PR: kern/165643 Submitted by: Vijay Singh MFC after: 3 days	2012-03-04 11:11:03 +00:00
Juli Mallett	9624d94701	o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands using the o32 ABI. This mostly follows nwhitehorn's lead in implementing COMPAT_FREEBSD32 on powerpc64. o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the 32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit ABIs use a 64-bit time_t. o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS with 32-bit compatibility, then, disable some code which assumes otherwise wrongly when built for MIPS. A more general macro to check in this case would seem like a good idea eventually. If someone adds support for using n32 userland with n64 kernels on MIPS, then they will have to add a variety of flags related to each piece of the ABI that can vary. That's probably the right time to generalize further. o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the freebsd32 compat code. Probably this should be generalized at some point. Reviewed by: gonzo	2012-03-03 08:19:18 +00:00
Andrew Thompson	70b23a4596	Use a more appropriate default for the maximum number of addresses in the bridge forwarding table. PR: docs/164564 Discussed with: brueffer	2012-02-29 20:58:21 +00:00
Luigi Rizzo	64ae02c365	A bunch of netmap fixes: USERSPACE: 1. add support for devices with different number of rx and tx queues; 2. add better support for zero-copy operation, adding an extra field to the netmap ring to indicate how many buffers we have already processed but not yet released (with help from Eddie Kohler); 3. The two changes above unfortunately require an API change, so while at it add a version field and some spares to the ioctl() argument to help detect mismatches. 4. update the manual page for the two changes above; 5. update sample applications in tools/tools/netmap KERNEL: 1. simplify the internal structures moving the global wait queues to the 'struct netmap_adapter'; 2. simplify the functions that map kring<->nic ring indexes 3. normalize device-specific code, helps mainteinance; 4. start exploring the impact of micro-optimizations (prefetch etc.) in the ixgbe driver. Use 'legacy' descriptors on the tx ring and prefetch slots gives about 20% speedup at 900 MHz. Another 7-10% would come from removing the explict calls to bus_dmamap* in the core (they are effectively NOPs in this case, but it takes expensive load of the per-buffer dma maps to figure out that they are all NULL. Rx performance not investigated. I am postponing the MFC so i can import a few more improvements before merging.	2012-02-27 19:05:01 +00:00
Andrew Thompson	8d45bd6e80	Only look for a usable MAC address for the bridge ID from ports within our bridge, this allows us to have more than one independent bridge in the same STP domain. PR: kern/164369 Submitted by: Nikos Vassiliadis (earlier version) MFC after: 2 weeks	2012-02-24 17:50:36 +00:00
Andrew Thompson	3122b9120c	Add a sysctl/tunable default value for the use_flowid sysctl in r232008.	2012-02-23 21:56:53 +00:00
Andrew Thompson	47190ea664	Indicate this function decrements the timer as well as testing for expiry.	2012-02-23 20:58:52 +00:00
Kip Macy	a93cda789a	When using flowtable llentrys can outlive the interface with which they're associated at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer valid. Move the free pointer in to the llentry itself and update the initalization sites. MFC after: 2 weeks	2012-02-23 18:21:37 +00:00
Andrew Thompson	2ad65e315d	Now that network interfaces advertise if they support linkstate notifications we do not need to perform a media ioctl every 15 seconds.	2012-02-23 06:26:16 +00:00
Andrew Thompson	4661f8627c	bstp_input() always consumes the packet so remove the mbuf handling dance around it. Obtained from: OpenBSD (r1.37)	2012-02-23 00:59:21 +00:00
Andrew Thompson	0bf97ae271	Using the flowid in the mbuf assumes the network card is giving a good hash for the traffic flow, this may not be the case giving poor traffic distribution. Add a sysctl which allows us to fall back to our own flow hash code. PR: kern/164901 Submitted by: Eugene Grosbein MFC after: 1 week	2012-02-22 22:01:30 +00:00
Bjoern A. Zeeb	9dba179d5e	IFC @231845 Sponsored by: Cisco Systems, Inc.	2012-02-17 00:27:48 +00:00
Tijl Coosemans	265f940acc	Change some headers such that lang/gcc* ports no longer patch them. The lang/gcc* ports patch headers where they think something is non-standard. These patched headers override the system headers which means you have to rebuild these ports whenever you do installworld to make sure they contain the latest changes.	2012-02-14 12:50:20 +00:00
Bjoern A. Zeeb	6d076ae8f7	Introduce a new NET_RT_IFLISTL API to query the address list. It works on extended and extensible structs if_msghdrl and ifa_msghdrl. This will allow us to extend both the msghdrl structs and eventually if_data in the future without breaking the ABI. Bump __FreeBSD_version to allow ports to more easily detect the new API. Reviewed by: glebius, brooks MFC after: 3 days	2012-02-11 06:02:16 +00:00
Bjoern A. Zeeb	e82cf13bfb	Backout changes from r228571. Remove if_data from struct ifa_msghdr again. While this breaks carp on HEAD temporary, it restores the upgrade path from stable, and head before 20111215. Reviewed by: glebius, brooks	2012-02-11 05:59:54 +00:00
Sergey Kandaurov	4ecf274be7	g/c last bit of old ipv6 prefix management. Reviewed by: bz Obtained from: NetBSD, net/if.h, rev 1.80	2012-02-08 22:05:26 +00:00
Luigi Rizzo	5819da83ce	- change the buffer size from a constant to a TUNABLE variable (hw.netmap.buf_size) so we can experiment with values different from 2048 which may give better cache performance. - rearrange the memory allocation code so it will be easier to replace it with a different implementation. The current code relies on a single large contiguous chunk of memory obtained through contigmalloc. The new implementation (not committed yet) uses multiple smaller chunks which are easier to fit in a fragmented address space.	2012-02-08 11:43:29 +00:00
Pawel Jakub Dawidek	50c8ec53f6	Allow to set if_bridge(4) sysctls from /boot/loader.conf. MFC after: 3 days	2012-02-07 14:50:33 +00:00
Gleb Smirnoff	e8aa8bdd64	Fix typo in r231010. Submitted by: linimon	2012-02-05 12:52:28 +00:00
Gleb Smirnoff	cad582b753	Better comment for ifa_init(), ifa_ref(), ifa_free().	2012-02-05 08:53:05 +00:00
Gleb Smirnoff	adc7231d5d	In ifa_init() initialize if_data.ifi_datalen. This would be required after upcoming changes from bz@. Discussed with: bz	2012-02-05 08:31:15 +00:00
Bjoern A. Zeeb	81d5d46b3c	Add multi-FIB IPv6 support to the core network stack supplementing the original IPv4 implementation from r178888: - Use RT_DEFAULT_FIB in the IPv4 implementation where noticed. - Use rtfib() KPI with explicit RT_DEFAULT_FIB where applicable in the NFS code. - Use the new in6_rt KPI in TCP, gif(4), and the IPv6 network stack where applicable. - Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally prevent multiple initializations of callouts in in6_inithead(). - Use wrapper functions where needed to preserve the current KPI to ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup. - Fix (related) comments (both technical or style). - Convert to rtinit() where applicable and only use custom loops where currently not possible otherwise. - Multicast group, most neighbor discovery address actions and faith(4) are locked to the default FIB. Individual IPv6 addresses will only appear in the default FIB, however redirect information and prefixes of connected subnets are automatically propagated to all FIBs by default (mimicking IPv4 behavior as closely as possible). Sponsored by: Cisco Systems, Inc.	2012-02-03 13:08:44 +00:00
Bjoern A. Zeeb	a84986256f	Move a comment from rtinit1() to the top of the file where dealing with the (maximum) number of FIBs trying to clarify that evetually FIBs should probably attached to domain(9) specific storage. [1] Add a comment on a limitimation on the rt_add_addr_allfibs option. Use RT_DEFAULT_FIB instead of 0 where applicable. Add empty line to functions without local variables per style. Put public yet unused in-tree function rtinit_fib() under BURN_BRIDGES to indicate that it might go away in the future. No functional change. Discussed with: julian [1] (clarification on what the original one meant) Sponsored by: Cisco Systems, Inc.	2012-02-03 12:25:14 +00:00
Bjoern A. Zeeb	b3dd077152	Minor optimization doing input validation with a possible early return before doing further work. Sponsored by: Cisco Systems, Inc.	2012-02-03 11:20:11 +00:00
Bjoern A. Zeeb	096f27864f	Fix FLOWTABLE IPv6 handling in route.c missed in r205066. While doing so, for consistency with the rtalloc_ign_fib(9) interface called, remove the "in_" prefix from rtalloc_ign_wrapper() no longer indicating that it would only handle the INET case. Sponsored by: Cisco Systems, Inc.	2012-02-03 10:17:34 +00:00
Bjoern A. Zeeb	b680a383a8	Allow for IPv6 to allocate (and in the VIMAGE case free) as many routing tables (FIBs) as IPv4. Prepare various general rt* functions for multi-FIB IPv6 handling in addition to already existing multi-FIB IPv4 cases. Sponsored by: Cisco Systems, Inc.	2012-02-03 09:23:55 +00:00
Bjoern A. Zeeb	556d81ddd7	Rather than putting magic 0s as FIB argument into the rt* calls, provide a macro RT_DEFAULT_FIB defined to 0 to more easily identify the cases tied to the default FIB. Sponsored by: Cisco Systems, Inc.	2012-02-03 09:06:24 +00:00
Kip Macy	0fe48d670f	A flowtable entry can continue referencing an llentry indefinitely if the entry is repeatedly referenced within its timeout window. This change clears the LLE_VALID flag when an llentry is removed from an interface's hash table and adds an extra check to the flowtable code for the LLE_VALID flag in llentry to avoid retaining and using a stale reference. Reviewed by: qingli@ MFC after: 2 weeks	2012-01-26 20:02:40 +00:00
Bjoern A. Zeeb	8d74af3668	Replace random ARIN direct assignment legacy IPs with proper RFC 5735 TEST-NET1 block for use in documentation and example code addresses. MFC after: 3 days	2012-01-24 15:20:31 +00:00
Eitan Adler	dde153da49	- Fix trivial typo Approved by: nwhitehorn MFC after: 3 days	2012-01-14 17:07:52 +00:00
Robert Watson	7983103ae6	Clarify throughout the vlan(4) code the difference between a "tag" (the 802.1q-defined 16-bit VID, CFI, and PCP field in host by order) and a VLAN ID (VID). Tags go in packets. VIDs identify VLANs. No functional change is intended, so this should be safe to MFC. Further cleanup with functional changes will be committed separately (for example, renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI). Reviewed by: bz Sponsored by: ADARA Networks, Inc. MFC after: 3 days	2012-01-12 18:39:37 +00:00
Lawrence Stewart	9a7e6bac47	Consumers of bpfdetach() expect it to remove all bpf_if structs from the bpf_iflist list which reference the specified ifnet. The existing implementation only removes the first matching bpf_if found in the list, effectively leaking list entries if an ifnet has been bpfattach()ed multiple times with different DLTs. Fix the leak by performing the detach logic in a loop, stopping when all bpf_if structs referencing the specified ifnet have been detached and removed from the bpf_iflist list. Whilst here, also: - Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never exist in the list with a NULL ifnet pointer. - Except when INVARIANTS is in the kernel config, silently ignore the case where no bpf_if referencing the specified ifnet is found, as it is harmless and does not require user attention. Reviewed by: csjp MFC after: 1 week	2012-01-10 00:48:29 +00:00
John Baldwin	fbcebf7f71	Convert the per-interface address list lock from a mutex to a reader/writer lock. Reviewed by: bz	2012-01-09 19:34:12 +00:00
Gleb Smirnoff	c82c34b4a9	Copy ifa->if_data to ifam->ifam_data. This was forgotten in r228571. Submitted by: bz	2012-01-08 17:11:53 +00:00
Gleb Smirnoff	94901d5e60	Move arprequest() declaration to if_ether.h.	2012-01-08 13:34:00 +00:00
Gleb Smirnoff	dcb39bd84a	Since r228571 CARP is no longer an interface.	2012-01-06 12:05:43 +00:00
John Baldwin	137f91e80f	Convert all users of IF_ADDR_LOCK to use new locking macros that specify either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks	2012-01-05 19:00:36 +00:00
John Baldwin	a2cb1d522b	Add new variants of the IF_ADDR_LOCK() macros used for protecting interface address lists that distinguish read locks from write locks. To preserve the KPI, the previous operations are mapped to the write lock macros. The lock is still kept as a mutex for now. Reviewed by: bz MFC after: 2 weeks	2012-01-05 18:35:49 +00:00
Robert Watson	5a39f779b2	Refine last comment. Submitted by: joeld Sponsored by: ADARA Networks, Inc. MFC after: 3 days	2012-01-05 11:42:34 +00:00
Robert Watson	15f6780ef4	Add comment to the VLAN code about its integration with VIMAGE: we see what the code is doing, we recognise the legitimacy of its goal, but we're not quite sure it's going about it the right way. More pondering is clearly required. Sponsored by: ADARA Networks, Inc. Discussed with: bz MFC after: 3 days	2012-01-05 11:24:22 +00:00
Lawrence Stewart	253a3814d4	Revert r228986 until it can be reworked to avoid panicing the kernel when the same interface is attached multiple times with different DLTs, as is done in net80211 for example. Reported by: adrian	2011-12-31 07:21:28 +00:00
Lawrence Stewart	0f89fc22f3	- Introduce the net.bpf.tscfg sysctl tree and associated code so as to make one aspect of time stamp configuration per interface rather than per BPF descriptor. Prior to this, the order in which BPF devices were opened and the per descriptor time stamp configuration settings could cause non-deterministic and unintended behaviour with respect to time stamping. With the new scheme, a BPF attached interface's tscfg sysctl entry can be set to "default", "none", "fast", "normal" or "external". Setting "default" means use the system default option (set with the net.bpf.tscfg.default sysctl), "none" means do not generate time stamps for tapped packets, "fast" means generate time stamps for tapped packets using a hz granularity system clock read, "normal" means generate time stamps for tapped packets using a full timecounter granularity system clock read and "external" (currently unimplemented) means use the time stamp provided with the packet from an underlying source. - Utilise the recently introduced sysclock_getsnapshot() and sysclock_snap2bintime() KPIs to ensure the system clock is only read once per packet, regardless of the number of BPF descriptors and time stamp formats requested. Use the per BPF attached interface time stamp configuration to control if sysclock_getsnapshot() is called and whether the system clock read is fast or normal. The per BPF descriptor time stamp configuration is then used to control how the system clock snapshot is converted to a bintime by sysclock_snap2bintime(). - Remove all FAST related BPF descriptor flag variants. Performing a "fast" read of the system clock is now controlled per BPF attached interface using the net.bpf.tscfg sysctl tree. - Update the bpf.4 man page. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ In collaboration with: Julien Ridoux (jridoux at unimelb edu au)	2011-12-30 08:57:58 +00:00
Pyun YongHyeon	1ad7a2570d	Update if_obytes and if_omcast after successful transmit. While I'm here update if_oerrors if parent interface of vlan is not up and running. Previously it updated collision counter and it was confusing to interprete it. PR: kern/163478 Reviewed by: glebius, jhb Tested by: Joe Holden < lists <> rewt dot org dot uk >	2011-12-29 18:40:58 +00:00
Gleb Smirnoff	7121247312	Provide ABI compatibility shim to enable configuring of addresses with ifconfig(8) prior to r228571. Requested by: brooks	2011-12-21 12:39:08 +00:00
Gleb Smirnoff	f08535f872	Restore a feature that was present in 5.x and 6.x, and was cleared in 7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP preemption, while it is running its bulk update. However, reimplement the feature in more elegant manner, that is partially inspired by newer OpenBSD: - Rename term "suppression" to "demotion", to match with OpenBSD. - Keep a global demotion factor, that can be raised by several conditions, for now these are: - interface goes down - carp(4) has problems with ip_output() or ip6_output() - pfsync performs bulk update - Unlike in OpenBSD the demotion factor isn't a counter, but is actual value added to advskew. The adjustment values for particular error conditions are also configurable, and their defaults are maximum advskew value, so a single failure bumps demotion to maximum. This is for POLA compatibility, and should satisfy most users. - Demotion factor is a writable sysctl, so user can do foot shooting, if he desires to.	2011-12-20 13:53:31 +00:00
Gleb Smirnoff	08b68b0e4c	A major overhaul of the CARP implementation. The ip_carp.c was started from scratch, copying needed functionality from the old implemenation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on a Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]	2011-12-16 12:16:56 +00:00
Gleb Smirnoff	f3909e37ff	Simplify rtrequest(RTM_ADD): ifa can't be NULL after rt_getifa_fib().	2011-12-15 12:49:10 +00:00
Brooks Davis	f26fa169e7	Remove the unused if_free_type() function. X-MFC after: never	2011-12-09 23:26:28 +00:00
Luigi Rizzo	506cc70cce	1. Fix the handling of link reset while in netmap more. A link reset now is completely transparent for the netmap client: even if the NIC resets its own ring (e.g. restarting from 0), the client will not see any change in the current rx/tx positions, because the driver will keep track of the offset between the two. 2. make the device-specific code more uniform across different drivers There were some inconsistencies in the implementation of the netmap support routines, now drivers have been aligned to a common code structure. 3. import netmap support for ixgbe . This is implemented as a very small patch for ixgbe.c (233 lines, 11 chunks, mostly comments: in total the patch has only 54 lines of new code) , as most of the code is in an external file sys/dev/netmap/ixgbe_netmap.h , following some initial comments from Jack Vogel about making changes less intrusive. (Note, i have emailed Jack multiple times asking if he had comments on this structure of the code; i got no reply so i assume he is fine with it). Support for other drivers (em, lem, re, igb) will come later. "ixgbe" is now the reference driver for netmap support. Both the external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should serve as a reference for other device drivers. Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap, the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz on an i7-860 with 4 cores and 82599 card. Haven't tried yet more aggressive optimizations such as adding 'prefetch' instructions in the time-critical parts of the code.	2011-12-05 12:06:53 +00:00
Lawrence Stewart	3e47c78798	Revert r227778 in preparation for committing reworked patches in its place.	2011-11-29 12:55:26 +00:00
John Baldwin	d9b1d61535	Change the if_vlan driver to use if_transmit for forwarding packets to the parent interface. This avoids the overhead of queueing a packet to an IFQ only to immediately dequeue it again. Suggested by: np Reviewed by: brooks MFC after: 1 month	2011-11-28 19:35:08 +00:00
Gleb Smirnoff	2e9fff5b18	- Use generic alloc_unr(9) allocator for if_clone, instead of hand-made. - When registering new cloner, check whether a cloner with same name already exist. - When allocating unit, also check with help of ifunit() whether such interface already exist or not. [1] PR: kern/162789 [1]	2011-11-28 14:44:59 +00:00
Gleb Smirnoff	c0ba290b5f	Improve logging: - don't hardcode function name - use LOG_DEBUG for such a debug message - print error value	2011-11-22 19:42:17 +00:00
Lawrence Stewart	b6f1c7db32	- When feed-forward clock support is compiled in, change the BPF header to contain both a regular timestamp obtained from the system clock and the current feed-forward ffcounter value. This enables new possibilities including comparison of timekeeping performance and timestamp correction during post processing. - Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping packets using the feedback or feed-forward system clock. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-21 04:17:24 +00:00
Luigi Rizzo	68b8534bdf	Bring in support for netmap, a framework for very efficient packet I/O from userspace, capable of line rate at 10G, see http://info.iet.unipi.it/~luigi/netmap/ At this time I am bringing in only the generic code (sys/dev/netmap/ plus two headers under sys/net/), and some sample applications in tools/tools/netmap. There is also a manpage in share/man/man4 [1] In order to make use of the framework you need to build a kernel with "device netmap", and patch individual drivers with the code that you can find in sys/dev/netmap/head.diff The file will go away as the relevant pieces are committed to the various device drivers, which should happen in a few days after talking to the driver maintainers. Netmap support is available at the moment for Intel 10G and 1G cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re"). I have partial patches for "bge" and am starting to work on "cxgbe". Hopefully changes are trivial enough so interested third parties can submit their patches. Interested people can contact me for advice on how to add netmap support to specific devices. CREDITS: Netmap has been developed by Luigi Rizzo and other collaborators at the Universita` di Pisa, and supported by EU project CHANGE (http://www.change-project.eu/) The code is distributed under a BSD Copyright. [1] In my opinion is a bad idea to have all manpage in one directory. We should place kernel documentation in the same dir that contains the code, which would make it much simpler to keep doc and code in sync, reduce the clutter in share/man/ and incidentally is the policy used for all of userspace code. Makefiles and doc tools can be trivially adjusted to find the manpages in the relevant subdirs.	2011-11-17 12:17:39 +00:00
Robert Millan	ea4d9a14f1	Remove a few bits of FreeBSD 2.x compatibility code. Approved by: kib (mentor)	2011-11-14 18:21:27 +00:00
Brooks Davis	4b22573a89	In r191367 the need for if_free_type() was removed and a new member if_alloctype was used to store the origional interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free(). MFC after: 1 Month	2011-11-11 22:57:52 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Max Laier	3ca1a2d6a0	Fix a use-after-free/redzone issue in the routing code. Reported by (repeatedly): Mike Tancsa Prodded by (repeatedly): bz Forgotten by (repeatedly): mlaier MFC after: 2 weeks	2011-11-03 18:33:30 +00:00
Gleb Smirnoff	a0af7c3edb	Add macro IF_DEQUEUE_ALL(ifq, m), that takes the entire mbuf chain off the queue. It can be utilized in queue processing to avoid multiple locking/unlocking.	2011-10-27 09:45:12 +00:00
Qing Li	46a70de2b0	The host-id/interface-id can have a specific value and is properly masked out when adding a prefix route through the "route" command. However, when deleting the route, simply changing the command keyword from "add" to "delete" does not work. The failoure is observed in both IPv4 and IPv6 route insertion. The patch makes the route command behavior consistent between the "add" and the "delete" operation. MFC after: 1 week	2011-10-25 00:34:39 +00:00
Ed Schouten	cf05e311ea	Add missing #includes. According to POSIX, these two header files should be able to be included by themselves, not depending on other headers. The <net/if.h> header uses struct sockaddr when __BSD_VISIBLE=1, while <netinet/tcp.h> uses integer datatypes (u_int32_t, u_short, etc). MFC after: 2 months	2011-10-21 12:58:34 +00:00
Ed Schouten	a185bd12f3	Get rid of D_PSEUDO. It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL. Nowadays we have a different interface for that; make_dev_p(). There's no need to keep it there. While there, remove an unneeded D_NEEDMINOR from the gpio driver. Discussed with: gonzo@ (gpio)	2011-10-18 08:09:44 +00:00
Bjoern A. Zeeb	528737fdfe	Pass the fibnum where we need filtering of the message on the rtsock allowing routing daemons to filter routing updates on an rtsock per FIB. Adjust raw_input() and split it into wrapper and a new function taking an optional callback argument even though we only have one consumer [1] to keep the hackish flags local to rtsock.c. PR: kern/134931 Submitted by: multiple (see PR) Suggested by: rwatson [1] Reviewed by: rwatson MFC after: 3 days	2011-09-28 13:48:36 +00:00
Kip Macy	1eeb6d97d0	Make KBI changes required for future MFCing of inpcb rtentry / llentry caching. Reviewed by: rwatson, bz Approved by: re (kib)	2011-09-20 20:27:26 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Andrew Thompson	0fe082e7d5	On the first loop for generating a bridge MAC address use the local hostid, this gives a good chance of keeping the same address over reboots. This is intended to help IPV6 and similar which generate their addresses from the mac. PR: kern/160300 Submitted by: mdodd Approved by: re (kib)	2011-09-04 22:06:32 +00:00
Bjoern A. Zeeb	3d07127c64	When adding IPv6 fwd support to ipfw in r225044 these two files were not committed. Initialize next_hop6 to align with the IPv4 code. PR: bin/117214 MFC after: 3 weeks X-MFC with: r225044 Approved by: re (kib)	2011-08-27 08:49:55 +00:00

1 2 3 4 5 ...

2903 Commits