Commit Graph

3461 Commits

Author SHA1 Message Date
smh
f50d421524 Add sysctl to control LACP strict compliance default
Add net.link.lagg.lacp.default_strict_mode which defines
the default value for LACP strict compliance for created
lagg devices.

Also:
* Add lacp_strict option to ifconfig(8).
* Fix lagg(4) creation examples.
* Minor style(9) fix.

MFC after:	1 week
2015-11-06 15:33:27 +00:00
gnn
00e8eec3f8 Replace the fastforward path with tryforward which does not require a
sysctl and will always be on. The former split between default and
fast forwarding is removed by this commit while preserving the ability
to use all network stack features.

Differential Revision:	https://reviews.freebsd.org/D4042
Reviewed by:	ae, melifaro, olivier, rwatson
MFC after:	1 month
Sponsored by:	Rubicon Communications (Netgate)
2015-11-05 07:26:32 +00:00
rrs
3b409054c4 Fix three flowtable bugs, a) one lookup issue, b) a two cleaner issue.
MFC after:	3 days
Sponsored by: Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D4014
2015-11-02 21:21:00 +00:00
melifaro
a23a95f981 Fix lladdr change propagation for on vlans on top of it.
Fix lladdr update when setting mac address manually.
Fix lladdr_event for slave ports addition.

MFC after:		4 weeks
Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D4004
2015-11-01 19:59:04 +00:00
bdrewery
722eb97ab8 Avoid passing an uninitialized 'i'. Currently nothing was depending on it
anyhow.

Coverity CID:	1331562
2015-10-29 18:58:18 +00:00
ae
f27a843176 Check the size of data available in mbuf, before using them.
PR:		202667
MFC after:	1 week
2015-10-28 17:55:37 +00:00
kp
3831c06c1b pf: Fix compliation warning with gcc
While fixing the PF_ANEQ() macro I messed up the parentheses, leading to
compliation warnings with gcc.

Spotted by:     ian
Pointy Hat:     kp
2015-10-25 18:09:03 +00:00
kp
7d9c2c6af4 PF_ANEQ() macro will in most situations returns TRUE comparing two identical
IPv4 packets (when it should return FALSE). It happens because PF_ANEQ() doesn't
stop if first 32 bits of IPv4 packets are equal and starts to check next 3*32
bits (like for IPv6 packet). Those bits containt some garbage and in result
PF_ANEQ() wrongly returns TRUE.

Fix: Check if packet is of AF_INET type and if it is then compare only first 32
bits of data.

PR:		204005
Submitted by:	Miłosz Kaniewski
2015-10-25 13:14:53 +00:00
emaste
51b2e7fecb if_tap: correct typo in sysctl description (Enably)
Sponsored by:	The FreeBSD Foundation
2015-10-21 19:56:16 +00:00
melifaro
45d403ed23 Remove several compat functions from pre-fib era. 2015-10-17 17:26:44 +00:00
hrs
30ae8a4cad Fix a panic when destroying a lagg interface.
Differential Revision:	https://reviews.freebsd.org/D3883
2015-10-16 01:16:01 +00:00
kp
40bca2754d pf: Fix TSO issues
In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.

The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.

To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.

PR:		154428, 193579, 198868
Reviewed by:	sbruno
MFC after:	1 week
Relnotes:	yes
Sponsored by:	RootBSD
Differential Revision:	https://reviews.freebsd.org/D3779
2015-10-14 16:21:41 +00:00
hrs
03da536d33 Fix a bug that caused reinitialization failure of MAC addresses on
the lagg interface when removing the primary port.

PR:			201916
Differential Revision:	https://reviews.freebsd.org/D3301
2015-10-07 06:32:34 +00:00
araujo
2bbb4de455 Remove per complete the fec aggregation protocol.
The remove began with revision r271733.

NOTE: This patch must never be merge to 10-Stable

Reviewed by:	glebius
Approved by:	bapt (mentor)
Relnotes:	Yes
Sponsored by:	EuroBSDCon Sweden.
Differential Revision:	D3786
2015-10-04 08:00:29 +00:00
hrs
d3ad52afc9 Add IFCAP_LINKSTATE support. 2015-10-03 09:15:23 +00:00
ae
6959f4d949 Always detach encap handler when reconfiguring tunnel.
Reported by:	hrs
MFC after:	1 week
2015-10-03 03:57:58 +00:00
melifaro
91b3356875 Eliminate nd6_nud_hint() and its TCP bindings.
Initially function was introduced in r53541 (KAME initial commit) to
  "provide hints from upper layer protocols that indicate a connection
  is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability
  Confirmation).
However, it was converted to do nothing (e.g. just return) in r122922
  (tcp_hostcache implementation) back in 2003. Some defines were moved
  to tcp_var.h in r169541. Then, it was broken (for non-corner cases)
  by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So,
  right now this code is broken and has no "real" base users.

Differential Revision:	https://reviews.freebsd.org/D3699
2015-09-27 05:29:34 +00:00
melifaro
493325342d Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
  ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
  to provide cached lle data.

For IPv6 situation is more complex:
  ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
    nd6_storelladdr()

We have ip6_ouput() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
  * checks if lle exists, creates it if needed (similar to arpresolve())
  * performes lle state transitions (similar to arpresolve())
  * calls nd6_output_ifp() which pushes packets to link output routine along
    with running SeND/MAC hooks regardless of lle state
    (e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
  which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
  of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
  interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
  * make all nd6_output() consumers use nd6_output_ifp() instead
  * rename nd6_output[_slow]() to nd6_resolve_[slow]()
  * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
    e.g. copy L2 address to buffer instead of pushing packet towards lower
    layers
  * Make all nd6_storelladdr() users use nd6_resolve()
  * eliminate nd6_storelladdr()

The resulting callchain is the following:
  ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
  -> nd6_output() -> nd6_output _lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
  nd6_resolve() which will return EWOULDBLOCK, and that result
  will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D1469
2015-09-16 14:26:28 +00:00
ae
179df1f956 Use KASSERT for some checks, that are late to do.
Discussed with:	melifaro, glebius
2015-09-16 13:17:00 +00:00
oleg
76836bdb9c Remove superfluous m_freem().
MFC after:	1 month
2015-09-16 10:07:45 +00:00
melifaro
44be0cdaa8 Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
  if old ifp fib differes from new one, that the caller
  is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
2015-09-16 06:23:15 +00:00
melifaro
b42c7af3b5 * Require explicitl lle unlink prior to calling llentry_delete().
This one slightly decreases time of holding afdata wlock.
* While here, make nd6_free() return void. No one has used its return value
  since r186119.
2015-09-15 06:48:19 +00:00
vangyzen
75d72d4482 Fix the handling of IPv6 On-Link Redirects.
On receipt of a redirect message, install an interface route for the
redirected destination.  On removal of the corresponding Neighbor Cache
entry, remove the interface route.

This requires changes in rtredirect_fib() to cope with an AF_LINK
address for the gateway and with the absence of RTF_GATEWAY.

This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo
Phase 2 test suite.

Unrelated to the above, fix a recursion on the radix node head lock
triggered by the Tahi Redirected to Alternate Router test cases.

When I first wrote this patch in October 2012, all Section 2
(Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE,
and 8-STABLE.  cem@ recently rebased the 10.x patch onto head and reported
that it passes Tahi.  (Thanks!)

These other test cases also passed in 2012:

* the RTF_MODIFIED case, with IPv4 and IPv6 (using a
  RTF_HOST|RTF_GATEWAY route for the destination)

* the redirected-to-self case, with IPv4 and IPv6

* a valid IPv4 redirect

All testing in 2012 was done with WITNESS and INVARIANTS.

Tested by:    EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015,
              Mark Kelley <mark_kelley@dell.com> in 2012,
              TC Telkamp <terence_telkamp@dell.com> in 2012
PR:           152791
Reviewed by:  melifaro (current rev), bz (earlier rev)
Approved by:  kib (mentor)
MFC after:    1 month
Relnotes:     yes
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D3602
2015-09-14 19:17:25 +00:00
melifaro
5ad1f2444d * Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
  more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3573
2015-09-14 16:48:19 +00:00
hselasky
ac0c211c77 Update TSO limits to include all headers.
To make driver programming easier the TSO limits are changed to
reflect the values used in the BUSDMA tag a network adapter driver is
using. The TCP/IP network stack will subtract space for all linklevel
and protocol level headers and ensure that the full mbuf chain passed
to the network adapter fits within the given limits.

Implementation notes:

If a network adapter driver needs to fixup the first mbuf in order to
support VLAN tag insertion, the size of the VLAN tag should be
subtracted from the TSO limit. Else not.

Network adapters which typically inline the complete header mbuf could
technically transmit one more segment. This patch does not implement a
mechanism to recover the last segment for data transmission. It is
believed when sufficiently large mbuf clusters are used, the segment
limit will not be reached and recovering the last segment will not
have any effect.

The current TSO algorithm tries to send MTU-sized packets, where the
MTU typically is 1500 bytes, which gives 1448 bytes of TCP data
payload per packet for IPv4. That means if the TSO length limitiation
is set to 65536 bytes, there will be a data payload remainder of
(65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to
recover total TSO length due to inlining mbuf header data will not
have any effect, because adding or removing the ETH/IP/TCP headers
to or from 324 bytes will not cause more or less TCP payload to be
TSO'ed.

Existing network adapter limits will be updated separately.

Differential Revision:	https://reviews.freebsd.org/D3458
Reviewed by:		rmacklem
MFC after:		2 weeks
2015-09-14 08:36:22 +00:00
hrs
099cf5ebd0 - Remove GIF_{SEND,ACCEPT}_REVETHIP.
- Simplify EADDRNOTAVAIL and EAFNOSUPPORT conditions.

MFC after:	3 days
2015-09-10 05:59:39 +00:00
melifaro
bbab608243 Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3464
2015-09-05 05:33:20 +00:00
hrs
37e142f0ee Fix a panic which was reproducible by an infinite loop of
"ifconfig epair0 create && ifconfig epair0a destroy".

This was caused by an uninitialized function pointer in
softc->media.
2015-09-02 16:30:45 +00:00
melifaro
3f9699bcfe Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in
alloc handler, based on flags.
2015-08-31 05:03:36 +00:00
adrian
a8162db5e6 Remove now unused (and #if 0'ed out) headers. 2015-08-29 04:33:31 +00:00
adrian
2d6b12b499 Replace the printf()s with optional rate limited debugging for RSS.
Submitted by:	Tiwei Bie <btw@mail.ustc.edu.cn>
Differential Revision:	https://reviews.freebsd.org/D3471
2015-08-28 05:58:16 +00:00
kp
2a1a59d8e1 pf: Remove support for 'scrub fragment crop|drop-ovl'
The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse
users into making poor choices.
It's also a fairly large amount of complex code, so just remove the support
altogether.

Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be
implicitly converted to 'scrub fragment reassemble'.

Reviewed by:	gnn, eri
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D3466
2015-08-27 21:27:47 +00:00
loos
c465848fcf Fix the spelling of eri's name.
Pointy hat to:	loos
MFC with:	r287009
2015-08-24 23:40:36 +00:00
loos
498601242d Add ALTQ(9) support for the CoDel algorithm.
CoDel is a parameterless queue discipline that handles variable bandwidth
and RTT.

It can be used as the single queue discipline on an interface or as a sub
discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ.

Differential Revision:	https://reviews.freebsd.org/D3272
Reviewd by:	rpaulo, gnn (previous version)
Obtained from:	pfSense
Sponsored by:	Rubicon Communications (Netgate)
2015-08-21 22:02:22 +00:00
melifaro
54b3b78856 * Split allocation and table linking for lle's.
Before that, the logic besides lle_create() was the following:
  return existing if found, create if not. This behaviour was error-prone
  since we had to deal with 'sudden' static<>dynamic lle changes.
  This commit fixes bunch of different issues like:
  - refcount leak when lle is converted to static.
    Simple check case:
    console 1:
    while true;
      do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
        do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
      done;
    done
   console 2:
    ping -f any-dead-host-in-L2
   console 3:
    # watch for memory consumption:
    vmstat -m | awk '$1~/lltable/{print$2}'
  - possible problems in arptimer() / nd6_timer() when dropping/reacquiring
   lock.
  New logic explicitly handles use-or-create cases in every lla_create
  user. Basically, most of the changes are purely mechanical. However,
  we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on all real table lle change.
* Create lltable_free_entry() calling existing per-lltable
  lle_free_t callback for entry deletion
2015-08-20 12:05:17 +00:00
hiren
99dda03ed4 Make LAG LACP fast timeout tunable through IOCTL.
Differential Revision:	D3300
Submitted by:		LN Sundararajan <lakshmi.n at msystechnologies>
Reviewed by:		wblock, smh, gnn, hiren, rpokala at panasas
MFC after:		2 weeks
Sponsored by:		Panasas
2015-08-12 20:21:04 +00:00
melifaro
0c24547a66 Use single 'lle_timer' callout in lltable instead of
two different names of the same timer.
2015-08-11 12:38:54 +00:00
melifaro
d8f92ce2cf Store addresses instead of sockaddrs inside llentry.
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).

Sponsored by:	Yandex LLC
2015-08-11 09:26:11 +00:00
melifaro
8e6b3a8d59 MFP r276712.
* Split lltable_init() into lltable_allocate_htbl() (alloc
  hash table with default callbacks) and lltable_link() (
  links any lltable to the list).
* Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field.
* Move lltable setup to separate functions in in[6]_domifattach.
2015-08-11 05:51:00 +00:00
melifaro
ba06112c24 Rename rt_foreach_fib() to rt_foreach_fib_walk().
Suggested by:	julian
2015-08-10 20:50:31 +00:00
melifaro
4f240a9c31 Partially merge r274887,r275334,r275577,r275578,r275586 to minimize
differences between projects/routing and HEAD.

This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.

* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
  format (code is not indented properly to minimize diff). Will be fixed
  in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
  format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.

* Prepare for fine-grained locking by separating lle unlink and deletion in
  lltable_free() and lltable_prefix_free().

* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
 access by external callers.

* Remove @llt agrument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
  in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
  Update description from these functions.
* Use unified lltable_free_entry() function instead of per-af one.

Reviewed by:	ae
2015-08-10 12:03:59 +00:00
melifaro
33c52eed18 MFP r274295:
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
  to use new rt_foreach_fib() instead of hand-rolling cycles.
2015-08-08 18:14:59 +00:00
melifaro
20bb5966e2 MFP r274553:
* Move lle creation/deletion from lla_lookup to separate functions:
  lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.

Reviewed by:	ae
2015-08-08 17:48:54 +00:00
loos
1511a8a27a Remove the mtx_sleep() from the kqueue f_event filter.
The filter is called from the network hot path and must not sleep.

The filter runs with the descriptor lock held and does not manipulates the
buffers, so it is not necessary sleep when the hold buffer is in use.

Just ignore the hold buffer contents when it is being copied to user space
(when hold buffer in use is set).

This fix the "Sleeping thread owns a non-sleepable lock" panic when the
userland thread is too busy reading the packets from bpf(4).

PR:		200323
MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-08-03 22:14:45 +00:00
loos
b6407759b0 Add a KASSERT() to make sure we wont rotate the buffers twice (rotate the
buffers while the hold buffer is in use).

Suggested by:	ed, ghelmer
MFC with:	r286142
2015-08-03 18:22:31 +00:00
jmg
296c3c4f2f looks like all archs either have clang or cdefs included before..
drop this include as unnecessary..

Requested by:	bde
2015-08-02 21:33:40 +00:00
jmg
9b301851a3 convert to C11's _Static_assert, and pull in sys/cdefs.h for
compatibility w/ older non-C11 compilers...

passed make tinerdbox..

Suggested by:	imp
2015-08-02 00:15:52 +00:00
loos
bf2e4c7805 Remove two unnecessary sleeps from the hot path in bpf(4).
The first one never triggers because bpf_canfreebuf() can only be true for
zero-copy buffers and zero-copy buffers are not read with read(2).

The second also never triggers, because we check the free buffer before
calling ROTATE_BUFFERS().  If the hold buffer is in use the free buffer
will be NULL and there is nothing else to do besides drop the packet.  If
the free buffer isn't NULL the hold buffer _is_ free and it is safe to
rotate the buffers.

Update the comment in ROTATE_BUFFERS macro to match the logic described
here.

While here fix a few typos in comments.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 21:43:27 +00:00
loos
89fd02fd8c Remove the sleep from the buffer allocation routine.
The buffer must be allocated (or even changed) before the interface is set
and thus, there is no need to verify if the buffer is in use.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 20:25:54 +00:00
loos
4244ee9e43 Do not allocate the buffers at opening of the descriptor, because once
the buffer is allocated we are committed to a particular buffer method
(BPF_BUFMODE_BUFFER in this case).

If we are using zero-copy buffers, the userland program must register its
buffers before set the interface.

If we are using kernel memory buffers, we can allocate the buffer at the
time that the interface is being set.

This fix allows the usage of BIOCSETBUFMODE after r235746.

Update the comments to reflect the recent changes.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 20:02:12 +00:00
ae
e9ba3ff9a6 Ansify if_stf.c 2015-07-31 09:04:22 +00:00
jmg
e1594f085c temporarily fix build.. This isn't the final fix, and testing is
still on going, but it has passed world for mips and powerpc...

I know this has an extra semicolon, but this is the patch that is
tested...

Looks like better fix is to use _Static_assert...
2015-07-31 07:48:08 +00:00
jmg
c00fae0f3e Clean up this header file...
use CTASSERTs now that we have them...

Replace a draft w/ RFC that's over 10 years old.

Note that _AALG and _EALG do not need to match what the IKE daemons
think they should be..  This is part of the KABI...  I decided to
renumber AESCTR, but since we've never had working AESCTR mode, I'm
not really breaking anything..  and it shortens a loop by quite
a bit..

remove SKIPJACK IPsec support...  SKIPJACK never made it out of draft
(in 1999), only has 80bit key, NIST recommended it stop being used
after 2010, and setkey nor any of the IKE daemons I checked supported
it...

jmgurney/ipsecgcm: a357a33, c75808b, e008669, b27b6d6

Reviewed by:	gnn (earlier version)
2015-07-31 00:23:21 +00:00
ae
6895fa8542 Build if_stf(4) module only when both INET and INET6 support are enabled. 2015-07-30 10:26:43 +00:00
loos
79c1792263 Follow r256586 and rename the kernel version of the Free() macro to
R_Free().  This matches the other macros and reduces the chances to clash
with other headers.

This also fixes the build of radix.c outside of the kernel environment.

Reviewed by:	glebius
2015-07-30 02:09:03 +00:00
ae
271b2043d8 Eliminate the use of m_copydata() in gif_encapcheck().
ip_encap already has inspected mbuf's data, at least an IP header.
And it is safe to use mtod() and do direct access to needed fields.
Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that
mbuf has a packet header.
Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also
remove "martian filters" checks. According to RFC 4213 it is enough to
verify that the source address is the address of the encapsulator, as
configured on the decapsulator.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 14:07:43 +00:00
ae
75425458ac Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
zec
c8374d535b Prevent null-pointer dereferencing.
MFC after:	3 days
2015-07-20 08:21:51 +00:00
markm
4637ff6821 * Address review (and add a bit myself).
- Tweek man page.
 - Remove all mention of RANDOM_FORTUNA. If the system owner wants YARROW or DUMMY, they ask for it, otherwise they get FORTUNA.
 - Tidy up headers a bit.
 - Tidy up declarations a bit.
 - Make static in a couple of places where needed.
 - Move Yarrow/Fortuna SYSINIT/SYSUNINIT to randomdev.c, moving us towards a single file where the algorithm context is used.
 - Get rid of random_*_process_buffer() functions. They were only used in one place each, and are better subsumed into those places.
 - Remove *_post_read() functions as they are stubs everywhere.
 - Assert against buffer size illegalities.
 - Clean up some silly code in the randomdev_read() routine.
 - Make the harvesting more consistent.
 - Make some requested argument name changes.
 - Tidy up and clarify a few comments.
 - Make some requested comment changes.
 - Make some requested macro changes.

* NOTE: the thing calling itself a 'unit test' is not yet a proper
  unit test, but it helps me ensure things work. It may be a proper
  unit test at some time in the future, but for now please don't make
  any assumptions or hold any expectations.

Differential Revision:	https://reviews.freebsd.org/D2025
Approved by:	so (/dev/random blanket)
2015-07-12 18:14:38 +00:00
luigi
c354cad8fd Sync netmap sources with the version in our private tree.
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:

- fix zerocopy monitor ports and introduce copying monitor ports
  (the latter are lower performance but give access to all traffic
  in parallel with the application)

- exclusive open mode, useful to implement solutions that recover
  from crashes of the main netmap client (suggested by Patrick Kelsey)

- revised memory allocator in preparation for the 'passthrough mode'
  (ptnetmap) recently presented at bsdcan. ptnetmap is described in
        S. Garzarella, G. Lettieri, L. Rizzo;
        Virtual device passthrough for high speed VM networking,
        ACM/IEEE ANCS 2015, Oakland (CA) May 2015
        http://info.iet.unipi.it/~luigi/research.html

- fix rx CRC handing on ixl

- add module dependencies for netmap when building drivers as modules

- minor simplifications to device-specific routines (*txsync, *rxsync)

- general code cleanup (remove unused variables, introduce macros
  to access rings and remove duplicate code,

Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).

Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.

MFC after:	1 month
Sponsored by:	(partly) Verisign, Cisco
2015-07-10 05:51:36 +00:00
pkelsey
43348356a7 Fix if_loop so bpfwrite() can use it regardless of the state of
bd_hdrcmplt.  As if_loop does not use link-level headers, its behavior
when used by bpfwrite() should be the same regardless of the state of
bd_hdrcmplt.  Without this change, libpcap (and other BPF users that
work like it) fail when writing to loopback interfaces.

Differential Revision: https://reviews.freebsd.org/D2989
Reviewed by: gnn, melifaro
Approved by: jmallett (mentor)
MFC after: 3 days
2015-07-06 02:12:49 +00:00
gnn
ea302f3ee6 New AES modes for IPSec, user space components.
Update setkey and libipsec to understand aes-gcm-16 as an
encryption method.

A partial commit of the work in review D2936.

Submitted by:	eri
Reviewed by:	jmg
MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-03 20:09:14 +00:00
markm
d586165577 Huge cleanup of random(4) code.
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
  neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
  random(4) device in the kernel, and the KERN_ARND sysctl will
  return nothing. With RANDOM_DUMMY there will be a random(4) that
  always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
  through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
  suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
  This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
  there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
  behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
  (direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
  weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
  weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
  when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
  selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
  that instead. All that is needed here is N=0, N++, N==0, and some
  localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
  now the test will do a basic check of compressibility. Clairvoyant
  talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
2015-06-30 17:00:45 +00:00
bz
336ced39f1 Another attempt to make this compile on more architectures after r284777. 2015-06-25 23:16:01 +00:00
eri
164a129722 Correct r284777 to use proper includes and remove dead code to unbreak kernel builds.
Differential Revision:	https://reviews.freebsd.org/D2847
2015-06-25 15:05:58 +00:00
eri
70cda65ad9 ALTQ FAIRQ discipline import from DragonFLY
Differential Revision:  https://reviews.freebsd.org/D2847
Reviewed by:    glebius, wblock(manpage)
Approved by:    gnn(mentor)
Obtained from:  pfSense
Sponsored by:   Netgate
2015-06-24 19:16:41 +00:00
dim
b620c7e63c Fix endless recursion in sys/net/if.c's drbr_inuse_drv(), found by clang
3.7.0.

Reviewed by:	marcel
2015-06-23 18:48:41 +00:00
kp
ca237f7117 Fix panic when adding vtnet interfaces to a bridge
vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host).  if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).

There are also two issues with the if_bridge error handling.

If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promic() calls can actually fail, so this is safe.

A second issue is a double free of bif. It's already freed by
bridge_delete_member().

PR:		200210
Differential Revision:	https://reviews.freebsd.org/D2804
Reviewed by:	philip (mentor)
2015-06-13 19:39:21 +00:00
jkim
318c4f97e6 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
melifaro
bd614cce8b * Update SFF-8024 Identifier constants.
* Fix SFF_8436_CC_EXT in SFF-8436 memory map.
* Add SFF-8436/8636 bits (revision compliance/nominal bitrate).
* Do some small style/type fixes.
2015-05-16 13:11:35 +00:00
ae
cbc4e577f0 Add an ability accept encapsulated packets from different sources by one
gif(4) interface. Add new option "ignore_source" for gif(4) interface.
When it is enabled, gif's encapcheck function requires match only for
packet's destination address.

Differential Revision:	https://reviews.freebsd.org/D2004
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-05-15 12:19:45 +00:00
ae
f154af3452 Add new socket ioctls SIOC[SG]TUNFIB to set FIB number of encapsulated
packets on tunnel interfaces. Add support of these ioctls to gre(4),
gif(4) and me(4) interfaces. For incoming packets M_SETFIB() should use
if_fib value from ifnet structure, use proper value in gre(4) and me(4).

Differential Revision:	https://reviews.freebsd.org/D2462
No objection from:	#network
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-05-12 07:37:27 +00:00
hrs
b1e416de14 Fix a panic when VIMAGE is enabled.
Spotted by:	Nikos Vassiliadis
2015-05-12 03:35:45 +00:00
ae
82077d8af2 Pass mtag argument into m_tag_locate() to continue the search from
the last found mtag.
2015-05-06 14:02:57 +00:00
glebius
e23deb0e60 After r281643 an #ifdef IFT_FOO preprocessor directive returns false,
since types became a enum C type.  Some software uses such ifdefs to
determine whether an operating systems supports certain interface type.
Of course, such check is bogus. E.g. FreeBSD defines about 250 interface
types, but supports only around 20.
However, we need not upset such software so provide a set of defines. The
current set was taken to suffice the dhcpd.

Reported & tested by:	Guy Yur <guyyur gmail.com>
2015-05-02 20:37:40 +00:00
hiren
c60c12803e Currently there is no easy way to specify net.isr.maxthreads = all cpus. We need
to specify exact number of cpus in loader.conf which get annoying when you have
mix of machines which don't have equal number of total cpus. I propose "-1" as
that value. When loader.conf has net.isr.maxthreads = -1, netisr will use all
available cpus.

In collaboration with:	davide
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D2318
MFC after:	2 weeks
Sponsored by:	Limelight Networks
2015-04-25 16:12:06 +00:00
glebius
0cc4e0b97e Don't propagate SIOCSIFCAPS from a vlan(4) to its parent. This leads to
quite unexpected result of toggling capabilities on the neighbour vlan(4)
interfaces.

Reviewed by:		melifaro, np
Differential Revision:	https://reviews.freebsd.org/D2310
Sponsored by:		Nginx, Inc.
2015-04-23 13:19:00 +00:00
rodrigc
7807fcddc4 Move zlib.c from net to libkern.
It is not network-specific code and would
be better as part of libkern instead.
Move zlib.h and zutil.h from net/ to sys/
Update includes to use sys/zlib.h and sys/zutil.h instead of net/

Submitted by:		Steve Kiernan stevek@juniper.net
Obtained from:		Juniper Networks, Inc.
GitHub Pull Request:	https://github.com/freebsd/freebsd/pull/28
Relnotes:		yes
2015-04-22 14:38:58 +00:00
glebius
5afcf58994 Make IFMEDIA_DEBUG a kernel option.
Sponsored by:	Nginx, Inc.
2015-04-21 10:35:23 +00:00
markj
639543ed86 Move the definition of struct bpf_if to bpf.c.
A couple of fields are still exposed via struct bpf_if_ext so that
bpf_peers_present() can be inlined into its callers. However, this change
eliminates some type duplication in the resulting CTF container, since
otherwise ctfmerge(1) propagates the duplication through all types that
contain a struct bpf_if.

Differential Revision:	https://reviews.freebsd.org/D2319
Reviewed by:	melifaro, rpaulo
2015-04-20 22:08:11 +00:00
mav
161ff946d4 Activate write-only optimization if bpf device opened with O_WRONLY.
dhclient opens bpf as write-only to send packets. It never reads received
packets from that descriptor, but processing them in kernel takes time.
Especially much time takes packet timestamping on systems with expensive
timecounter, such as bhyve guest, where network speed dropped in half.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-04-20 10:44:46 +00:00
glebius
5bf38b8e18 Bring in if_types.h from projects/ifnet, where types are
defined in enum.
2015-04-17 06:39:15 +00:00
glebius
c067f1844a - Format copyright notices, VCS ids.
- Run through unifdef(1).
2015-04-17 06:38:31 +00:00
glebius
a29f5e7ca8 Move ALTQ from contrib to net/altq. The ALTQ code is for many years
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.

Reviewed by:	net@
2015-04-16 20:22:40 +00:00
araujo
5ca111b13f Remove duplicate header entry. 2015-04-16 02:44:37 +00:00
gnn
5d97cb9c5e Minor change to the macros to make sure that if an AF is passed that is neither AF_INET6 nor AF_INET that we don't touch random bits of memory.
Differential Revision:	https://reviews.freebsd.org/D2291
2015-04-15 14:46:45 +00:00
gnn
cb4c6b9356 Document internal interface types which are specific to FreeBSD. 2015-04-14 15:21:20 +00:00
glebius
50ede929e1 Redo r274966. Instead of global all-interface all-vnet undocumented sysctl,
use per-interface flag, and document it.

Sponsored by:	Nginx, Inc.
2015-04-10 09:50:13 +00:00
gnn
a54f0e733c Revert 281276 as unnecessary. Proper change to be committed
to the base polling code in a subsequent commit.

Pointed out by: glebius

Sponsored by:	Rubicon Communications (NetGate)
2015-04-09 14:44:30 +00:00
gnn
468eb00d1f Add support for a netisr polling tunable, which allows run time switching of
device polling rather than having it only be controlled by the compile
time option.

Summary: Rubicon Communications (Netgate)

Reviewers: #network, hiren

Reviewed By: #network, hiren

Subscribers: hiren

Differential Revision: https://reviews.freebsd.org/D2258
2015-04-08 20:25:51 +00:00
erj
363cef70bb ifmedia changes:
- Extend the number of available subtypes for Ethernet media by using some
of the ifmedia word's option bits to help denote subtypes. As a result, the
number of possible Ethernet subtype values increases from 31 to 511.

- Use some of those new values to define new media types.

- lacp_compose_key() recgonizes the new Ethernet media types added.
  (Change made as required by a comment in if_media.h)

- New ioctl, SIOGIFXMEDIA, to handle getting the new extended media types.
  SIOCGIFMEDIA is retained for backwards compatibility.

- Changes to ifconfig to allow it to handle the new extended media types.

Submitted by:	mike@karels.net (original), hselasky
Reviewed by:	jfvogel, gnn, hselasky
Approved by:	jfvogel (mentor), gnn (mentor)
Differential Revision: http://reviews.freebsd.org/D1965
2015-04-07 21:31:17 +00:00
ae
cfc3df2b8f Fix a possible mbuf leak on interface departure.
Reported by:	Alexandre Martins
2015-03-26 23:40:22 +00:00
glebius
7f06483b1b Fix couple of fallouts from r280280. The first one is a simple typo,
where counter was incremented on parent, instead of vlan(4) interface.

The second is more complicated. Historically, in our stack the incoming
packets are accounted in drivers, while incoming bytes for Ethernet
drivers are accounted in ether_input_internal(). Thus, it should be
removed from vlan(4) driver.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-03-25 16:01:46 +00:00
glebius
f824cda4b8 Make vlan_config() the signle point of validity checks.
Sponsored by:	Nginx, Inc.
2015-03-20 21:09:03 +00:00
glebius
2cfbd6999b In vlan_clone_match_ethervid():
- Use ifunit() instead of going through the interface list ourselves.
- Remove unused parameter.
- Move the most important comment above the function.

Sponsored by:	Nginx, Inc.
2015-03-20 20:42:58 +00:00
glebius
0ed24917a8 Tiny comment fix. 2015-03-20 14:16:26 +00:00
glebius
754f126520 Now, when r272244 introduced counter(9) based counters for all interfaces,
revert the r271538, which did that for vlan(4) only.

No objections:	melifaro
Sponsored by:	Nginx, Inc.
2015-03-20 14:05:17 +00:00
glebius
d0d9f03f17 Always lock the hash row of a source node when updating its 'states' counter.
PR:		182401
Sponsored by:	Nginx, Inc.
2015-03-17 12:19:28 +00:00
ae
59dd2a1d45 Add if_input_default() method, that will be used for if_input
initialization, when no input method specified before if_attach().

This prevents panics when if_input() method called directly e.g.
from bpf(4) code.

PR:		192426
Reviewed by:	glebius
MFC after:	1 week
2015-03-12 14:55:33 +00:00
hselasky
de871211fe Factor out mbuf hashing code from LAGG driver so that other network
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.

Differential Revision:	https://reviews.freebsd.org/D1987
Sponsored by:		Mellanox Technologies
MFC after:		1 month
2015-03-11 16:02:24 +00:00