3297 Commits

Author SHA1 Message Date
Alexander V. Chernikov
ea491b8afd Remove unused fields from old radix_node_head. 2014-11-09 00:43:14 +00:00
Alexander V. Chernikov
55e5eda676 Separate radix and routing: use different structures for route and
for other customers.

Introduce new 'struct rib_head' for routing purposes and make
all routing api use it.
2014-11-09 00:36:39 +00:00
Alexander V. Chernikov
a9413f6ca0 Sync to HEAD@r274297. 2014-11-08 18:13:35 +00:00
Alexander V. Chernikov
1398ffe5bc Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
to use new rt_foreach_fib() instead of hand-rolling cycles.
2014-11-08 16:38:15 +00:00
Bjoern A. Zeeb
4dbd7c5dc4 After r274246 make the tree compile again.
gcc requires variables to be initialised in two places.  One of them
is correctly  used only under the same conditional though.

For module builds properly check if the kernel supports INET or INET6,
as otherwise various mips kernels without IPv6 support would fail to build.
2014-11-08 14:41:32 +00:00
Gleb Smirnoff
f4507b7166 ifindex_alloc_locked() never fails and doesn't have no-lock version,
so change the prototype.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-08 07:23:01 +00:00
Alexander V. Chernikov
22b08fd8b7 Split radix implementation and system route table structure:
use new 'struct radix_head' for radix.
2014-11-07 22:52:02 +00:00
Alexander V. Chernikov
389d731d64 Provide typedefs for radix functions. 2014-11-07 22:02:44 +00:00
Andrey V. Elsukov
f325335caf Overhaul if_gre(4).
Split it into two modules: if_gre(4) for GRE encapsulation and
if_me(4) for minimal encapsulation within IP.

gre(4) changes:
* convert to if_transmit;
* rework locking: protect access to softc with rmlock,
  protect from concurrent ioctls with sx lock;
* correct interface accounting for outgoing datagramms (count only payload size);
* implement generic support for using IPv6 as delivery header;
* make implementation conform to the RFC 2784 and partially to RFC 2890;
* add support for GRE checksums - calculate for outgoing datagramms and check
  for inconming datagramms;
* add support for sending sequence number in GRE header;
* remove support of cached routes. This fixes problem, when gre(4) doesn't
  work at system startup. But this also removes support for having tunnels with
  the same addresses for inner and outer header.
* deprecate support for various GREXXX ioctls, that doesn't used in FreeBSD.
  Use our standard ioctls for tunnels.

me(4):
* implementation conform to RFC 2004;
* use if_transmit;
* use the same locking model as gre(4);

PR:		164475
Differential Revision:	D1023
No objections from:	net@
Relnotes:	yes
Sponsored by:	Yandex LLC
2014-11-07 19:13:19 +00:00
Gleb Smirnoff
833e8dc5ab Remove struct arpcom. It is unused by most interface types, that allocate
it, except Ethernet, where it carried ng_ether(4) pointer.
For now carry the pointer in if_l2com directly.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-07 15:14:10 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Gleb Smirnoff
e6abef0918 Remove useless structure ifindex_entry.
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-07 09:15:39 +00:00
Alexander V. Chernikov
043d919e33 Add new rib4/rib6 series of functions returning per-rte info
packed on stack.
Convert ng_netflow to use new routing API.
2014-11-07 02:04:48 +00:00
Alexander V. Chernikov
064b1bdb2d Convert lle rtchecks to use new routing API.
For inet/ case, this involves reverting r225947
which seem to be pretty strange commit and should
be reverted in HEAD ad well.
2014-11-06 23:35:22 +00:00
Alexander V. Chernikov
57c3556b58 Fix build.
Pointy hat to:	melifaro
2014-11-06 17:50:35 +00:00
Alexander V. Chernikov
146a181f28 Finish r274118: remove useless fields from struct domain.
Sponsored by:	Yandex LLC
2014-11-06 14:39:04 +00:00
Alexander V. Chernikov
1a75e3b20f Make checks for rt_mtu generic:
Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currrently have 3 places where MTU is altered:

1) route addition.
 In this case domain overrides radix _addroute callback (in[6]_addroute)
 and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
 In this case, there are no explicit per-domain calls, but one can
 override rte by setting ifa_rtrequest hook to domain handler
 (inet6 does this).

3) ifconfig ifaceX mtu YYYY
 In this case, we have no callbacks, but ip[6]_output performes runtime
 checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
 control plane, not in runtime part, and properly deal with increased
 interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-doman MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
 if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size.
 However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
 for some cases, so we need an abstract way to know maximum MTU size
 for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
  user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
  use this functions on new non-inserted rte.

More changes will follow soon.

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-11-06 13:13:09 +00:00
Alexander V. Chernikov
69b74805d5 Convert gif and stf to use new routing api. 2014-11-04 18:48:13 +00:00
Alexander V. Chernikov
5c9ef37854 Sync to HEAD@r274095. 2014-11-04 18:22:33 +00:00
Alexander V. Chernikov
8c3cfe0be0 Hide 'struct rtentry' and all its macro inside new header:
net/route_internal.h
The goal is to make its opaque for all code except route/rtsock and
proto domain _rmx.
2014-11-04 17:28:13 +00:00
Alexander V. Chernikov
a9ac00b76b Convert in6p_lookup_mcast_ifp() to use new routing api.
* Add special fib6_lookup_nh_ifp() to return rt_ifp
  instead of rt_ifa->ifa_ifp for that.
2014-11-04 17:05:24 +00:00
Alexander V. Chernikov
257480b8ab Convert netinet6/ to use new routing API.
* Remove &ifpp from ip6_output() in favor of ri->ri_nh_info
* Provide different wrappers to in6_selectsrc:
  Currently it is used by 2 differenct type of customers:
  - socket-based one, which all are unsure about provided
   address scope and
  - in-kernel ones (ND code mostly), which don't have
    any sockets, options, crededentials, etc.
  So, we provide two different wrappers to in6_selectsrc()
  returning select source.
* Make different versions of selectroute():
  Currenly selectroute() is used in two scenarios:
  - SAS, via in6_selecsrc() -> in6_selectif() -> selectroute()
  - output, via in6_output -> wrapper -> selectroute()
  Provide different versions for each customer:
  - fib6_lookup_nh_basic()-based in6_selectif() which is
    capable of returning interface only, without MTU/NHOP/L2
    calculations
  - full-blown fib6_selectroute() with cached route/multipath/
    MTU/L2
* Stop using routing table for link-local address lookups
* Add in6_ifawithifp_lla() to make for-us check faster for link-local
* Add in6_splitscope / in6_setllascope for faster embed/deembed scopes
2014-11-04 15:39:56 +00:00
Hans Petter Selasky
f8ca61996e Clarify TSO segment limit comment and remove two TABs to make lines a
bit shorter.

Sponsored by:	Mellanox Technologies
2014-11-03 13:02:58 +00:00
Mark Murray
10cb24248a This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random.
This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources.

The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people.

The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway.

Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to.

My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise.

My Nomex pants are on. Let the feedback commence!

Reviewed by:	trasz,des(partial),imp(partial?),rwatson(partial?)
Approved by:	so(des)
2014-10-30 21:21:53 +00:00
Konstantin Belousov
0a2c94b86e Replace some calls to fuword() by fueword() with proper error checking.
Sponsored by:	The FreeBSD Foundation
Tested by:	pho
MFC after:	3 weeks
2014-10-28 15:28:20 +00:00
Hans Petter Selasky
0e1152fcc2 The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.

Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-28 12:00:39 +00:00
Alexander V. Chernikov
30514718e7 Convert several places inside netinet6/ to new api. 2014-10-25 22:53:08 +00:00
Alexander V. Chernikov
7b42f6fae2 * Convert TOE framework to use new routing api.
* Add fib6_lookup_nh_ext().
* Rename union structures:
  nhop64_basic -> nhopu_basic,
  nhop64_extended -> nhopu_extended
2014-10-25 18:25:00 +00:00
Alexander V. Chernikov
9f65116cc1 * Increase nh_flags to be u16 thus reducing nhop payload to be 48 bytes
* Use NHF_ namespace for all nhop flags
* Rename nhop_data -> nhop_prepend
* Rename fib4_lookup_nh_extended -> fib4_lookup_nh_ext
* Add "flags" argument to fib4_lookup_nh_ext() to specify whether we want
  returned nh_ext structure to be refcounted or not.
2014-10-25 15:32:56 +00:00
Alexander V. Chernikov
b863adaaf3 Convert last piece of ip_forward to use new rouing api. 2014-10-24 22:00:25 +00:00
Andrey V. Elsukov
a663aa4ce8 Remove redundant check and m_pullup() call. 2014-10-24 13:34:22 +00:00
Alexander V. Chernikov
f50706648c Add new fib4_lookup_nh_extended() which fills in nhop4_extended
structure without doinf L2 resolve. It also requires freeing
 references by calling fib4_free_nh_ext().

Convert in_pcbladdr() to use it.
Convert tcp_maxmtu() to use it.
2014-10-23 23:11:04 +00:00
Alexander V. Chernikov
2bb83c79f6 Rename ip_sendmbuf to fib4_sendmbuf() and move it to rt_nhops api.
Convert IPv4 SAS to use new routing api.
2014-10-23 21:09:14 +00:00
Andrey V. Elsukov
61dc434406 Move if_get_counter initialization from if_attach into if_alloc.
Also, initialize all counters before ifnet will become available in the system.
This fixes possible access to uninitialized ifned fields.

PR:		194550
2014-10-23 14:29:52 +00:00
Luigi Rizzo
11a5be0f60 since we cast a pointer, use the correct signedness
(this was already in, and got lost in a recent update).
2014-10-22 18:55:36 +00:00
Bryan Venteicher
854f7e89e6 Use the size of the Ethernet address, not the entire header, when
copying into forwarding entry.

Reported by:	Coverity
CID:		1248849
2014-10-21 05:45:57 +00:00
Bryan Venteicher
007054f070 Add vxlan interface
vxlan creates a virtual LAN by encapsulating the inner Ethernet frame in
a UDP packet. This implementation is based on RFC7348.

Currently, the IPv6 support is not fully compliant with the specification:
we should be able to receive UPDv6 packets with a zero checksum, but we
need to support RFC6935 first. Patches for this should come soon.

Encapsulation protocols such as vxlan emphasize the need for the FreeBSD
network stack to support batching, GRO, and GSO. Each frame has to make
two trips through the network stack, and each frame will be at most MTU
sized. Performance suffers accordingly.

Some latest generation NICs have begun to support vxlan HW offloads that
we should also take advantage of. VIMAGE support should also be added soon.

Differential Revision:	https://reviews.freebsd.org/D384
Reviewed by:	gnn
Relnotes:	yes
2014-10-20 14:42:42 +00:00
Alexander V. Chernikov
b4e8f808bf Switch IPv4 output path to use new routing api.
The goals of the new API is to provide consumers with minimal
  needed information, but as fast as possible. So we provide
  full nexthop info copied into alighed on-cache structure
  instead of rte/ia pointers, their refcounts and locks.
  This does not provide solution for protecting from egress
  ifp destruction, but does not make it any worse.

Current changes:

nhops:
Add fib4_lookup_prepend() function which stores either full
L2+L3 prepend info (e.g. MAC header in case of plain IPv4) or
L3 info with NH_FLAGS_L2_INCOMPLETE flag indicating that no valid L2
info exists and we have to take "slow" path.

ip_output:
Currently ip[ 46]_output consumers use 'struct route' for
the following purposes:
  1) double lookup avoidance(route caching)
  2) plain route caching
  3) get path MTU to be able to notify source.
The former pattern is mostly used by various tunnels
 (gif, gre, stf). (Actually, gre is the only remaining,
 others were already converted. Their locking model did
 not scale good enogh to benefit from such caching, so
 we have (temporarily) removed it without any performance
 loss).
Plain route caching used by SCTP is simply wrong and should be removed.
  Temporary break it for now just to be able to compile.
Optimize path mtu reporting by providing it in new 'route_info' stucture.

Minimize games with @ia locking/refcounting for route lookup:
  add special nhop[46]_extended structure to store more route attributes.
  Pointer to given structure can be passed to fib4_lookup_prepend() to indicate
  we want this info (we actually needs it for UDP and raw IP).

ether_output:
Provide light-weight ether_output2() call to deal with
transmitting L2 frame (e.g. properly handle broadcast/simloop/bridge/
  other L2 hooks before actually transmitting frame by if_transmit()).
Add a hack based on new RT_NHOP ro_flag to distinguish which version should
  we call. Better way is probably to add a new "if_output_frame" driver
  callbacks.

 Next steps:
* Convert ip_fastfwd part
* Implement auto-growing array for per-radix nexthops
* Implement LLE tracking for nexthop calculations to be able to
  immediately provide all necessary info in single route lookup
  for gateway routes
* Switch radix locking scheme to runtime/cfg lock
* Implement multipath support for rtsock
* Implement "tracked nexthops" for tunnels (e.g. _proper_
  nexthop caching)
* Add IPv6 support for remaining parts (postponed not to
   interfere with user/ae/inet6 branch)
* Consider adding "if_output_frame" driver call to
  ease logical frame pushing.
2014-10-19 21:07:35 +00:00
Alexander V. Chernikov
d74b9a2c6a * Remove route caching in if_stf.
* Copy necessary in6_ifa on stack instead of playing with refcounts.
2014-10-17 15:07:04 +00:00
Hiroki Sato
bf6d3f0c7c - Fix lladdr configuration which could prevent LACP mode from working.
- Fix LORs when a laggport interface has an IPv6 LLA.

PR:	194321
2014-10-17 09:08:44 +00:00
Andrey V. Elsukov
245c40e879 Add more ifdefs. SIOC*_IN6 are defined only with INET6.
MFC after:	1 month
Reported  by:	bz
2014-10-14 14:51:27 +00:00
Andrey V. Elsukov
138d56556c Move memset under ifdef INET6.
MFH:		1 month
Reported by:	bz
2014-10-14 14:41:06 +00:00
Andrey V. Elsukov
0b9f5f8a5f Overhaul if_gif(4):
o convert to if_transmit;
 o use rmlock to protect access to gif_softc;
 o use sx lock to protect from concurrent ioctls;
 o remove a lot of unneeded and duplicated code;
 o remove cached route support (it won't work with concurrent io);
 o style fixes.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2014-10-14 13:31:47 +00:00
Alexander V. Chernikov
9ae91cc416 Implement fib*_lookup_nh_basic to provide fast non-refcounted
way to determine egress ifp / mtu.
2014-10-12 11:22:25 +00:00
Hiroki Sato
3c3136b1dd Virtualize if_epair(4). An if_xname check for both "a" and "b" interfaces
is added to return EEXIST when only "b" interface exists---this can happen
when epair<N>b is moved to a vnet jail and then "ifconfig epair<N> create"
is invoked there.
2014-10-10 06:45:13 +00:00
Andrey V. Elsukov
5b7a43f546 When tunneling interface is going to insert mbuf into netisr queue after stripping
outer header, consider it as new packet and clear the protocols flags.

This fixes problems when IPSEC traffic goes through various tunnels and router
doesn't send ICMP/ICMPv6 errors.

PR:		174602
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-10-08 21:23:34 +00:00
Andrey V. Elsukov
9ef268219a Our packet filters use mbuf's rcvif pointer to determine incoming interface.
Change mbuf's rcvif to enc0 and restore it after pfil processing.

PR:		110959
Sponsored by:	Yandex LLC
2014-10-07 13:31:04 +00:00
Hiroki Sato
3b4b7de506 Virtualize if_edsc(4). 2014-10-05 21:27:26 +00:00
Hiroki Sato
d6f59204ef Virtualize if_disc(4) cloner. 2014-10-05 19:46:52 +00:00
Hiroki Sato
c51275260b Virtualize if_bridge(4) cloner. 2014-10-05 19:43:37 +00:00