Commit Graph

3466 Commits

Author SHA1 Message Date
Luigi Rizzo
847bf38369 Sync netmap sources with the version in our private tree.
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:

- fix zerocopy monitor ports and introduce copying monitor ports
  (the latter are lower performance but give access to all traffic
  in parallel with the application)

- exclusive open mode, useful to implement solutions that recover
  from crashes of the main netmap client (suggested by Patrick Kelsey)

- revised memory allocator in preparation for the 'passthrough mode'
  (ptnetmap) recently presented at bsdcan. ptnetmap is described in
        S. Garzarella, G. Lettieri, L. Rizzo;
        Virtual device passthrough for high speed VM networking,
        ACM/IEEE ANCS 2015, Oakland (CA) May 2015
        http://info.iet.unipi.it/~luigi/research.html

- fix rx CRC handing on ixl

- add module dependencies for netmap when building drivers as modules

- minor simplifications to device-specific routines (*txsync, *rxsync)

- general code cleanup (remove unused variables, introduce macros
  to access rings and remove duplicate code,

Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).

Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.

MFC after:	1 month
Sponsored by:	(partly) Verisign, Cisco
2015-07-10 05:51:36 +00:00
Patrick Kelsey
e9617c305c Fix if_loop so bpfwrite() can use it regardless of the state of
bd_hdrcmplt.  As if_loop does not use link-level headers, its behavior
when used by bpfwrite() should be the same regardless of the state of
bd_hdrcmplt.  Without this change, libpcap (and other BPF users that
work like it) fail when writing to loopback interfaces.

Differential Revision: https://reviews.freebsd.org/D2989
Reviewed by: gnn, melifaro
Approved by: jmallett (mentor)
MFC after: 3 days
2015-07-06 02:12:49 +00:00
George V. Neville-Neil
987de84445 New AES modes for IPSec, user space components.
Update setkey and libipsec to understand aes-gcm-16 as an
encryption method.

A partial commit of the work in review D2936.

Submitted by:	eri
Reviewed by:	jmg
MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-03 20:09:14 +00:00
Mark Murray
d1b06863fb Huge cleanup of random(4) code.
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
  neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
  random(4) device in the kernel, and the KERN_ARND sysctl will
  return nothing. With RANDOM_DUMMY there will be a random(4) that
  always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
  through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
  suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
  This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
  there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
  behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
  (direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
  weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
  weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
  when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
  selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
  that instead. All that is needed here is N=0, N++, N==0, and some
  localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
  now the test will do a basic check of compressibility. Clairvoyant
  talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
2015-06-30 17:00:45 +00:00
Bjoern A. Zeeb
9656119da4 Another attempt to make this compile on more architectures after r284777. 2015-06-25 23:16:01 +00:00
Ermal Luçi
cd2bc2ef4e Correct r284777 to use proper includes and remove dead code to unbreak kernel builds.
Differential Revision:	https://reviews.freebsd.org/D2847
2015-06-25 15:05:58 +00:00
Ermal Luçi
a5b789f65a ALTQ FAIRQ discipline import from DragonFLY
Differential Revision:  https://reviews.freebsd.org/D2847
Reviewed by:    glebius, wblock(manpage)
Approved by:    gnn(mentor)
Obtained from:  pfSense
Sponsored by:   Netgate
2015-06-24 19:16:41 +00:00
Dimitry Andric
966ab68df1 Fix endless recursion in sys/net/if.c's drbr_inuse_drv(), found by clang
3.7.0.

Reviewed by:	marcel
2015-06-23 18:48:41 +00:00
Kristof Provost
581e697036 Fix panic when adding vtnet interfaces to a bridge
vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host).  if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).

There are also two issues with the if_bridge error handling.

If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promic() calls can actually fail, so this is safe.

A second issue is a double free of bif. It's already freed by
bridge_delete_member().

PR:		200210
Differential Revision:	https://reviews.freebsd.org/D2804
Reviewed by:	philip (mentor)
2015-06-13 19:39:21 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Alexander V. Chernikov
5446b3f1d4 * Update SFF-8024 Identifier constants.
* Fix SFF_8436_CC_EXT in SFF-8436 memory map.
* Add SFF-8436/8636 bits (revision compliance/nominal bitrate).
* Do some small style/type fixes.
2015-05-16 13:11:35 +00:00
Andrey V. Elsukov
c1b4f79dfa Add an ability accept encapsulated packets from different sources by one
gif(4) interface. Add new option "ignore_source" for gif(4) interface.
When it is enabled, gif's encapcheck function requires match only for
packet's destination address.

Differential Revision:	https://reviews.freebsd.org/D2004
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-05-15 12:19:45 +00:00
Andrey V. Elsukov
eccfe69a5c Add new socket ioctls SIOC[SG]TUNFIB to set FIB number of encapsulated
packets on tunnel interfaces. Add support of these ioctls to gre(4),
gif(4) and me(4) interfaces. For incoming packets M_SETFIB() should use
if_fib value from ifnet structure, use proper value in gre(4) and me(4).

Differential Revision:	https://reviews.freebsd.org/D2462
No objection from:	#network
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-05-12 07:37:27 +00:00
Hiroki Sato
1c27e6c39f Fix a panic when VIMAGE is enabled.
Spotted by:	Nikos Vassiliadis
2015-05-12 03:35:45 +00:00
Andrey V. Elsukov
b347bc3b7a Pass mtag argument into m_tag_locate() to continue the search from
the last found mtag.
2015-05-06 14:02:57 +00:00
Gleb Smirnoff
a7dc945989 After r281643 an #ifdef IFT_FOO preprocessor directive returns false,
since types became a enum C type.  Some software uses such ifdefs to
determine whether an operating systems supports certain interface type.
Of course, such check is bogus. E.g. FreeBSD defines about 250 interface
types, but supports only around 20.
However, we need not upset such software so provide a set of defines. The
current set was taken to suffice the dhcpd.

Reported & tested by:	Guy Yur <guyyur gmail.com>
2015-05-02 20:37:40 +00:00
Hiren Panchasara
a9467c3c45 Currently there is no easy way to specify net.isr.maxthreads = all cpus. We need
to specify exact number of cpus in loader.conf which get annoying when you have
mix of machines which don't have equal number of total cpus. I propose "-1" as
that value. When loader.conf has net.isr.maxthreads = -1, netisr will use all
available cpus.

In collaboration with:	davide
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D2318
MFC after:	2 weeks
Sponsored by:	Limelight Networks
2015-04-25 16:12:06 +00:00
Gleb Smirnoff
9f7d0f4830 Don't propagate SIOCSIFCAPS from a vlan(4) to its parent. This leads to
quite unexpected result of toggling capabilities on the neighbour vlan(4)
interfaces.

Reviewed by:		melifaro, np
Differential Revision:	https://reviews.freebsd.org/D2310
Sponsored by:		Nginx, Inc.
2015-04-23 13:19:00 +00:00
Craig Rodrigues
d9db52256e Move zlib.c from net to libkern.
It is not network-specific code and would
be better as part of libkern instead.
Move zlib.h and zutil.h from net/ to sys/
Update includes to use sys/zlib.h and sys/zutil.h instead of net/

Submitted by:		Steve Kiernan stevek@juniper.net
Obtained from:		Juniper Networks, Inc.
GitHub Pull Request:	https://github.com/freebsd/freebsd/pull/28
Relnotes:		yes
2015-04-22 14:38:58 +00:00
Gleb Smirnoff
41c1a23326 Make IFMEDIA_DEBUG a kernel option.
Sponsored by:	Nginx, Inc.
2015-04-21 10:35:23 +00:00
Mark Johnston
b23cbbe6db Move the definition of struct bpf_if to bpf.c.
A couple of fields are still exposed via struct bpf_if_ext so that
bpf_peers_present() can be inlined into its callers. However, this change
eliminates some type duplication in the resulting CTF container, since
otherwise ctfmerge(1) propagates the duplication through all types that
contain a struct bpf_if.

Differential Revision:	https://reviews.freebsd.org/D2319
Reviewed by:	melifaro, rpaulo
2015-04-20 22:08:11 +00:00
Alexander Motin
7144875388 Activate write-only optimization if bpf device opened with O_WRONLY.
dhclient opens bpf as write-only to send packets. It never reads received
packets from that descriptor, but processing them in kernel takes time.
Especially much time takes packet timestamping on systems with expensive
timecounter, such as bhyve guest, where network speed dropped in half.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-04-20 10:44:46 +00:00
Gleb Smirnoff
6456c04aae Bring in if_types.h from projects/ifnet, where types are
defined in enum.
2015-04-17 06:39:15 +00:00
Gleb Smirnoff
da8ae05d14 - Format copyright notices, VCS ids.
- Run through unifdef(1).
2015-04-17 06:38:31 +00:00
Gleb Smirnoff
772e66a6fc Move ALTQ from contrib to net/altq. The ALTQ code is for many years
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.

Reviewed by:	net@
2015-04-16 20:22:40 +00:00
Marcelo Araujo
546afaf83d Remove duplicate header entry. 2015-04-16 02:44:37 +00:00
George V. Neville-Neil
6332e4cc38 Minor change to the macros to make sure that if an AF is passed that is neither AF_INET6 nor AF_INET that we don't touch random bits of memory.
Differential Revision:	https://reviews.freebsd.org/D2291
2015-04-15 14:46:45 +00:00
George V. Neville-Neil
3085e1216e Document internal interface types which are specific to FreeBSD. 2015-04-14 15:21:20 +00:00
Gleb Smirnoff
4651df570c Redo r274966. Instead of global all-interface all-vnet undocumented sysctl,
use per-interface flag, and document it.

Sponsored by:	Nginx, Inc.
2015-04-10 09:50:13 +00:00
George V. Neville-Neil
51d4054eeb Revert 281276 as unnecessary. Proper change to be committed
to the base polling code in a subsequent commit.

Pointed out by: glebius

Sponsored by:	Rubicon Communications (NetGate)
2015-04-09 14:44:30 +00:00
George V. Neville-Neil
8a7ad10169 Add support for a netisr polling tunable, which allows run time switching of
device polling rather than having it only be controlled by the compile
time option.

Summary: Rubicon Communications (Netgate)

Reviewers: #network, hiren

Reviewed By: #network, hiren

Subscribers: hiren

Differential Revision: https://reviews.freebsd.org/D2258
2015-04-08 20:25:51 +00:00
Eric Joyner
eb7e25b22f ifmedia changes:
- Extend the number of available subtypes for Ethernet media by using some
of the ifmedia word's option bits to help denote subtypes. As a result, the
number of possible Ethernet subtype values increases from 31 to 511.

- Use some of those new values to define new media types.

- lacp_compose_key() recgonizes the new Ethernet media types added.
  (Change made as required by a comment in if_media.h)

- New ioctl, SIOGIFXMEDIA, to handle getting the new extended media types.
  SIOCGIFMEDIA is retained for backwards compatibility.

- Changes to ifconfig to allow it to handle the new extended media types.

Submitted by:	mike@karels.net (original), hselasky
Reviewed by:	jfvogel, gnn, hselasky
Approved by:	jfvogel (mentor), gnn (mentor)
Differential Revision: http://reviews.freebsd.org/D1965
2015-04-07 21:31:17 +00:00
Andrey V. Elsukov
a4b65afcab Fix a possible mbuf leak on interface departure.
Reported by:	Alexandre Martins
2015-03-26 23:40:22 +00:00
Gleb Smirnoff
c03044244f Fix couple of fallouts from r280280. The first one is a simple typo,
where counter was incremented on parent, instead of vlan(4) interface.

The second is more complicated. Historically, in our stack the incoming
packets are accounted in drivers, while incoming bytes for Ethernet
drivers are accounted in ether_input_internal(). Thus, it should be
removed from vlan(4) driver.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-03-25 16:01:46 +00:00
Gleb Smirnoff
b1828acf05 Make vlan_config() the signle point of validity checks.
Sponsored by:	Nginx, Inc.
2015-03-20 21:09:03 +00:00
Gleb Smirnoff
f941c31aae In vlan_clone_match_ethervid():
- Use ifunit() instead of going through the interface list ourselves.
- Remove unused parameter.
- Move the most important comment above the function.

Sponsored by:	Nginx, Inc.
2015-03-20 20:42:58 +00:00
Gleb Smirnoff
67975c79be Tiny comment fix. 2015-03-20 14:16:26 +00:00
Gleb Smirnoff
a58ea6b1cf Now, when r272244 introduced counter(9) based counters for all interfaces,
revert the r271538, which did that for vlan(4) only.

No objections:	melifaro
Sponsored by:	Nginx, Inc.
2015-03-20 14:05:17 +00:00
Gleb Smirnoff
3e8c6d74bb Always lock the hash row of a source node when updating its 'states' counter.
PR:		182401
Sponsored by:	Nginx, Inc.
2015-03-17 12:19:28 +00:00
Andrey V. Elsukov
b57d97215e Add if_input_default() method, that will be used for if_input
initialization, when no input method specified before if_attach().

This prevents panics when if_input() method called directly e.g.
from bpf(4) code.

PR:		192426
Reviewed by:	glebius
MFC after:	1 week
2015-03-12 14:55:33 +00:00
Hans Petter Selasky
b7ba031ff7 Factor out mbuf hashing code from LAGG driver so that other network
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.

Differential Revision:	https://reviews.freebsd.org/D1987
Sponsored by:		Mellanox Technologies
MFC after:		1 month
2015-03-11 16:02:24 +00:00
Mark Johnston
aa14e9b7c9 Reimplement support for userland core dump compression using a new interface
in kern_gzio.c. The old gzio interface was somewhat inflexible and has not
worked properly since r272535: currently, the gzio functions are called with
a range lock held on the output vnode, but kern_gzio.c does not pass the
IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr()
attempts to reacquire the range lock. Moreover, the new gzio interface can
be used to implement kernel core compression.

This change also modifies the kernel configuration options needed to enable
userland core dump compression support: gzio is now an option rather than a
device, and the COMPRESS_USER_CORES option is removed. Core dump compression
is enabled using the kern.compress_user_cores sysctl/tunable.

Differential Revision:	https://reviews.freebsd.org/D1832
Reviewed by:	rpaulo
Discussed with:	kib
2015-03-09 03:50:53 +00:00
Gleb Smirnoff
607e337454 Optimize SIOCGIFMEDIA handling removing malloc(9) and double
traversal of the list.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-03-04 15:00:20 +00:00
Hiroki Sato
c92a456b55 Fix group membership of cloned interfaces when one is moved by
if_vmove().

In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface.  When
detaching, if_delgroup() was called and the interface leaves all of
the group membership.  And then upon attachment, if_addgroup(ifp,
IFG_ALL) was called and it joined only "all" group again.

This had a problem. Normally, a cloned interface automatically joins
a group whose name is ifc_name of the cloner in addition to "all"
upon creation.  However, if_vmove() removed the membership and did
not restore upon attachment.

Differential Revision:	https://reviews.freebsd.org/D1859
2015-03-02 20:00:03 +00:00
Gleb Smirnoff
fec642add7 Hide struct ifmultiaddr under _KERNEL, too. 2015-02-27 01:15:23 +00:00
Xin LI
08e5736618 Handle SIOCSIFCAP by propogating the request to the parent interface. This
allows adding an vlan interface into a bridge.

Thanks for William Katsak <wkatsak cs rutgers edu> for testing and fixing
an issue in my previous patch draft.

MFC after:	2 weeks
2015-02-20 18:39:12 +00:00
Gleb Smirnoff
e072c794ad Now that all users of _WANT_IFADDR are fixed, remove this crutch and
hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 23:16:10 +00:00
Gleb Smirnoff
0324938a0f - Improve INET/INET6 scope.
- style(9) declarations.
- Make couple of local functions static.
2015-02-16 23:50:53 +00:00
Gleb Smirnoff
8dc98c2a36 Toss declarations to fix regular build and NO_INET6 build. 2015-02-16 21:52:28 +00:00
Gleb Smirnoff
f0b0fe5b45 Commit a miss from r278843.
Pointy hat to:	glebius
2015-02-16 18:33:33 +00:00
Brad Davis
936bdf364d Fix build.
Approved by:	gibbs
2015-02-16 18:06:24 +00:00
Gleb Smirnoff
6004805208 Missed from r278831. 2015-02-16 06:02:46 +00:00
Hiroki Sato
25792b116f Fix a panic when tearing down a vnet on a VIMAGE-enabled kernel.
There was a race that bridge_ifdetach() could be called via
ifnet_departure event handler after vnet_bridge_uninit().

PR:		195859
Reported by:	Danilo Egea Gondolfo
2015-02-14 18:15:14 +00:00
Will Andrews
0e5f55bb95 Improve the distribution of LAGG port traffic.
I edited the original change to retain the use of arc4random() as a seed for
the hashing as a very basic defense against intentional lagg port selection.

The author's original commit message (edited slightly):

sys/net/ieee8023ad_lacp.c
sys/net/if_lagg.c
	In lagg_hashmbuf, use the FNV hash instead of the old
	hash32_buf.  The hash32 family of functions operate one octet
	at a time, and when run on a string s of length n, their output
	is equivalent to :

		   ----- i=n-1
		   \
	       n    \           (n-i-1)              32
	( seed^  +  /        33^        * s[i] ) % 2^
		   /
		   ----- i=0

	The problem is that the last five bytes of input don't get
	multiplied by sufficiently many powers of 33 to rollover 2^32.
	That means that changing the last few bytes (but obviously not
	the very last) of input will always change the value of the
	hash by a multiple of 33.  In the case of lagg_hashmbuf() with
	ipv4 input, the last four bytes are the TCP or UDP port
	numbers.  Since the output of lagg_hashmbuf is always taken
	modulo the port count, and 3 is a common port count for a lagg,
	that's bad.  It means that the UDP or TCP source port will
	never affect which lagg member is selected on a 3-port lagg.

	At 10Gbps, I was not able to measure any difference in CPU
	consumption between the old and new hash.

Submitted by:	asomers (original commit)
Reviewed by:	emaste, glebius
MFC after:	1 week
Sponsored by:	Spectra Logic
MFSpectraBSD:	1001723 on 2013/08/28 (original)
		1114258 on 2015/01/22 (edit)
2015-01-23 00:06:35 +00:00
Gleb Smirnoff
efc6c51ffa Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now kernel immedially panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.
2015-01-22 01:23:16 +00:00
Adrian Chadd
b2bdc62a95 Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific
bits.

The motivation here is to eventually teach netisr and potentially
other networking subsystems a bit more about how RSS work queues / buckets
are configured so things have a hope of auto-configuring in the future.

* net/rss_config.[ch] takes care of the generic bits for doing
  configuration, hash function selection, etc;
* topelitz.[ch] is now in net/ rather than netinet/;
* (and would be in libkern if it didn't directly include RSS_KEYSIZE;
  that's a later thing to fix up.)
* netinet/in_rss.[ch] now just contains the IPv4 specific methods;
* and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.

This should have no functional impact on anyone currently using
the RSS support.

Differential Revision:	D1383
Reviewed by:	gnn, jfv (intel driver bits)
2015-01-18 18:06:40 +00:00
Andrey V. Elsukov
504289ea5a Fix condition and really sort ports. Also add comment describing
the intent of this code.

Reported by:	sbruno
MFC after:	1 week
Sponsored by:	Yandex LLC
2015-01-17 11:32:09 +00:00
Alexander V. Chernikov
29e0d65d7a Eliminate SIOCGIFADDR handling in bpf.
Quoting 19 years bpf.4 manual from bpf-1.2a1:
"
(SIOCGIFADDR is obsolete under BSD systems.  SIOCGIFCONF should be
 used to query link-level addresses.)
"
* SIOCGIFADDR was not imported in NetBSD (bpf.c 1.36) and OpenBSD.
* Last bits (e.g. manpage claiming SIOCGIFADDR exists) was cleaned
  from NetBSD via kern/21513 5 years ago,
  from OpenBSD via documentation/6352 5 years ago.
2015-01-16 10:09:28 +00:00
Andrey V. Elsukov
9c0265c6fd Restore Ethernet-within-IP Encapsulation support that was broken after
r273087. Move all checks from gif_output() into gif_transmit(). Previously
they were checked always, because if_start always called gif_output.
Now gif_transmit() can be called directly from if_bridge() code and we need
do checks here.

PR:		196646
MFC after:	1 week
2015-01-10 08:28:50 +00:00
Andrey V. Elsukov
3a9f9af803 Use if_name() macro instead of ifp->if_xname.
MFH:		1 week
2015-01-10 03:29:17 +00:00
Andrey V. Elsukov
84d03ddad3 Fix an error introduced in r274246.
Pass mtag argument into m_tag_locate() to continue the search from
the last found mtag.

X-MFC after:	r274246
2015-01-10 03:26:46 +00:00
Andrey V. Elsukov
c26230adee Move the recursion detection code into separate function gif_check_nesting().
Also make MTAG_GIF definition private to if_gif.c.

MFC after:	1 week
2015-01-10 03:13:16 +00:00
Alexander V. Chernikov
ecf09f8321 Fix typo.
Submitted by:	Olivér Pintér
2015-01-09 20:29:13 +00:00
Alexander V. Chernikov
d63e657c04 * Deal with ARCNET L2 multicast mapping for IPv6 the same way as in IPv4:
handle it in arc_output() instead of nd6_storelladdr().
* Remove IFT_ARCNET check from arpresolve() since arc_output() does not
  use arpresolve() to handle broadcast/multicast. This check was there
  since r84931. It looks like it was not used since r89099 (initial
  import of Arcnet support where multicast is handled separately).
* Remove IFT_IEEE1394 case from nd6_storelladdr() since firewire_output()
  calles nd6_storelladdr() for unicast addresses only.
* Remove IFT_ARCNET case from nd6_storelladdr() since arc_output() now
  handles multicast by itself.

As a result, we have the following pattern: all non-ethernet-style
media have their own multicast map handling inside their appropriate
routines. On the other hand, arpresolve() (and nd6_storelladdr()) which
meant to be 'generic' ones de-facto handles ethernet-only multicast maps.

MFC after:	3 weeks
2015-01-09 12:56:51 +00:00
Xin LI
681ed54caa MFV r276759: libpcap 1.6.2.
MFC after:	1 month
2015-01-06 22:29:12 +00:00
Craig Rodrigues
8d665c6ba8 Reapply previous patch to fix build.
PR: 194515
2015-01-06 16:47:02 +00:00
Craig Rodrigues
c75820c756 Merge: r258322 from projects/pf branch
Split functions that initialize various pf parts into their
    vimage parts and global parts.
    Since global parts appeared to be only mutex initializations, just
    abandon them and use MTX_SYSINIT() instead.
    Kill my incorrect VNET_FOREACH() iterator and instead use correct
    approach with VNET_SYSINIT().

PR:			194515
Differential Revision:	D1309
Submitted by: 		glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: 		trociny, zec, gnn
2015-01-06 08:39:06 +00:00
Alexander V. Chernikov
3a7498636a * Allocate hash tables separately
* Make llt_hash() callback more flexible
* Default hash size and hashing method is now per-af
* Move lltable allocation to separate function
2015-01-05 17:23:02 +00:00
Alexander V. Chernikov
b44a7d5d87 * Use unified code for deleting entry by sockaddr instead of per-af one.
* Remove now unused llt_delete_addr callback.
2015-01-03 19:09:06 +00:00
Alexander V. Chernikov
20dd899505 * Hide lltable implementation details in if_llatbl_var.h
* Make most of lltable_* methods 'normal' functions instead of inline
* Add lltable_get_<af|ifp>() functions to access given lltable fields
* Temporarily resurrect nd6_lookup() function
2015-01-03 16:04:28 +00:00
Andrey V. Elsukov
f188f14d43 Extern declarations in C files loses compile-time checking that
the functions' calls match their definitions. Move them to header files.

Reviewed by:	jilles (previous version)
2014-12-25 21:32:37 +00:00
Andrey V. Elsukov
06cd035ab6 Remove if_stf.h. It contains only one function declaration used by if_stf(4).
Also make in_stf_protosw structure static.
2014-12-23 20:54:59 +00:00
Andrey V. Elsukov
132c449079 Remove in_gif.h and in6_gif.h files. They only contain function
declarations used by gif(4). Instead declare these functions in C files.
Also make some variables static.
2014-12-23 16:17:37 +00:00
John Baldwin
fd22444c4f Provide a dead version of if_get_counter.
Submitted by:	glebius
Reported by:	np
2014-12-12 16:10:42 +00:00
Alexander V. Chernikov
ee7e9a4e17 * Do not assume lle has sockaddr key after struct lle:
use llt_fill_sa_entry() llt method to store lle address in sa.
* Eliminate L3_ADDR macro and either reference IPv4/IPv6 address
   directly from lle or use newly-created llt_fill_sa_entry().
* Do not store sockaddr inside arp/ndp lle anymore.
2014-12-09 00:48:08 +00:00
Alexander V. Chernikov
d82ed5051c Simplify lle lookup/create api by using addresses instead of sockaddrs. 2014-12-08 23:23:53 +00:00
Alexander V. Chernikov
73b52ad896 Use llt_prepare_static_entry method to prepare valid per-af static entry. 2014-12-07 23:59:44 +00:00
Alexander V. Chernikov
0368226e65 * Retire abstract llentry_free() in favor of lltable_drop_entry_queue()
and explicit calls to RTENTRY_FREE_LOCKED()
* Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6.
* Rename <lltable_|llt>_delete function to _delete_addr() to note that
   this function is used to external callers. Make this function maintain
   its own locking.
* Use lookup/unlink/clear call chain from internal callers instead of
    delete_addr.
* Fix LLE_DELETED flag handling
2014-12-07 23:08:07 +00:00
Alexander V. Chernikov
721cd2e032 Do not enforce particular lle storage scheme:
* move lltable allocation to per-domain callbacks.
* make llentry_link/unlink functions overridable llt methods.
* make hash table traversal another overridable llt method.
2014-12-07 17:32:06 +00:00
Alexander V. Chernikov
a743ccd468 * Add llt_clear_entry() callback which is able to do all lle
cleanup including unlinking/freeing
* Relax locking in lltable_prefix_free_af/lltable_free
* Do not pass @llt to lle free callback: it is always NULL now.
* Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding
   unlock/lock sequinces
* Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc()
   to retrieve preferred source address from lle hold queue and pass it
   instead of lle.
* Finally, make nd6_create() create and return unlocked lle
* Separate defrtr handling code from nd6_free():
   use nd6_check_del_defrtr() to check if we need to keep entry instead of
    performing GC,
   use nd6_check_recalc_defrtr() to perform actual recalc on lle removal.
* Move isRouter handling from nd6_cache_lladdr() to separate
   nd6_check_router()
* Add initial code to maintain lle runtime flags in sync.
2014-12-07 15:42:46 +00:00
Andrey V. Elsukov
2dfcd0ae9d Remove unneded check. No need to do m_pullup to the size that we prepended.
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-12-02 05:41:03 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Alexander V. Chernikov
ce313fdd71 * Unify lle table dump/prefix removal code.
* Rename lla_XXX -> lltable_XXX_lle to reduce number of name prefixes
  used by lltable code.
2014-11-30 14:35:01 +00:00
Alexander V. Chernikov
5d14e4cd76 Provide rte_<get|set> methods to access rtentry for external consumers. 2014-11-29 19:27:43 +00:00
Alexander V. Chernikov
1be1588acf * Make ifa_add_loopback_route() prepare gw before insertion.
* Temporarily move ifa_switch_loopback_route() implementation to route.c
2014-11-29 15:02:45 +00:00
Bjoern A. Zeeb
2c3774c183 After r275196 unbreak NOIP and NOINET kernels by hiding an otherwise
unused varibale under the proper #ifdef.
2014-11-28 14:51:49 +00:00
Alexander V. Chernikov
1a3a2b6798 Fix build broken by r275195. 2014-11-27 23:10:03 +00:00
Alexander V. Chernikov
74860d4f7c Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr -
return lle flags IFF needed.
Do not pass rte to arpresolve - pass is_gateway flag instead.
2014-11-27 23:06:25 +00:00
Alexander V. Chernikov
c69aeaad14 Do not try to copy header to @dst and than back to ethernet in case of
pseudo_AF_HDRCMPLT:

we copy media header from mbuf to 'struct sockaddr' @dst in bpf_movein, so
mbuf already contains valid info.
2014-11-27 21:29:19 +00:00
Philip Paeps
894d1973f1 Add a sysctl `net.link.tap.deladdrs_on_close' to configure whether tap
should delete configured addresses and routes when the interface is
closed.  Default is enabled (preserve current behaviour).

MFC after:	1 week
2014-11-24 14:00:27 +00:00
Alexander V. Chernikov
acbc394dbe Finish r274335#2: put RT_LOCK_DESTROY() back. 2014-11-23 17:47:12 +00:00
Alexander V. Chernikov
ec25679569 Do not try to unlock lle which is not locked.
This is not a proper fix, proper one is on the way.
2014-11-23 17:45:49 +00:00
Alexander V. Chernikov
73d770287d Do more fine-grained lltable locking: use table runtime lock as rare
as we can.
2014-11-23 15:38:06 +00:00
Alexander V. Chernikov
9479029b1f * Add lltable llt_hash callback
* Move lltable items insertions/deletions to generic llt code.
2014-11-23 12:15:28 +00:00
Alexander V. Chernikov
7c066c18db Use less-invasive approach for IF_AFDATA lock: convert into 2 locks:
use rwlock accessible via external functions
    (IF_AFDATA_CFG_* -> if_afdata_cfg_*()) for all control plane tasks
  use rmlock (IF_AFDATA_RUN_*) for fast-path lookups.
2014-11-22 19:53:36 +00:00
Alexander V. Chernikov
27688dfe1d Temporarily revert r274774. 2014-11-22 17:57:54 +00:00
Alexander V. Chernikov
2e47d2f953 Mark ifaddr/rtsock static entries RLLE_VALID. 2014-11-21 23:37:59 +00:00
Alexander V. Chernikov
9883e41b4b Switch IF_AFDATA lock to rmlock 2014-11-21 02:28:56 +00:00
Alexander V. Chernikov
aca894e07b Finish sync: remove if_faith.c 2014-11-21 01:27:27 +00:00
Alexander V. Chernikov
4d56c133fb Sync to HEAD@r274766 2014-11-21 01:22:33 +00:00