3259 Commits

Author SHA1 Message Date
Alexander V. Chernikov
9f65116cc1 * Increase nh_flags to be u16 thus reducing nhop payload to be 48 bytes
* Use NHF_ namespace for all nhop flags
* Rename nhop_data -> nhop_prepend
* Rename fib4_lookup_nh_extended -> fib4_lookup_nh_ext
* Add "flags" argument to fib4_lookup_nh_ext() to specify whether we want
  returned nh_ext structure to be refcounted or not.
2014-10-25 15:32:56 +00:00
Alexander V. Chernikov
b863adaaf3 Convert last piece of ip_forward to use new rouing api. 2014-10-24 22:00:25 +00:00
Alexander V. Chernikov
f50706648c Add new fib4_lookup_nh_extended() which fills in nhop4_extended
structure without doinf L2 resolve. It also requires freeing
 references by calling fib4_free_nh_ext().

Convert in_pcbladdr() to use it.
Convert tcp_maxmtu() to use it.
2014-10-23 23:11:04 +00:00
Alexander V. Chernikov
2bb83c79f6 Rename ip_sendmbuf to fib4_sendmbuf() and move it to rt_nhops api.
Convert IPv4 SAS to use new routing api.
2014-10-23 21:09:14 +00:00
Alexander V. Chernikov
b4e8f808bf Switch IPv4 output path to use new routing api.
The goals of the new API is to provide consumers with minimal
  needed information, but as fast as possible. So we provide
  full nexthop info copied into alighed on-cache structure
  instead of rte/ia pointers, their refcounts and locks.
  This does not provide solution for protecting from egress
  ifp destruction, but does not make it any worse.

Current changes:

nhops:
Add fib4_lookup_prepend() function which stores either full
L2+L3 prepend info (e.g. MAC header in case of plain IPv4) or
L3 info with NH_FLAGS_L2_INCOMPLETE flag indicating that no valid L2
info exists and we have to take "slow" path.

ip_output:
Currently ip[ 46]_output consumers use 'struct route' for
the following purposes:
  1) double lookup avoidance(route caching)
  2) plain route caching
  3) get path MTU to be able to notify source.
The former pattern is mostly used by various tunnels
 (gif, gre, stf). (Actually, gre is the only remaining,
 others were already converted. Their locking model did
 not scale good enogh to benefit from such caching, so
 we have (temporarily) removed it without any performance
 loss).
Plain route caching used by SCTP is simply wrong and should be removed.
  Temporary break it for now just to be able to compile.
Optimize path mtu reporting by providing it in new 'route_info' stucture.

Minimize games with @ia locking/refcounting for route lookup:
  add special nhop[46]_extended structure to store more route attributes.
  Pointer to given structure can be passed to fib4_lookup_prepend() to indicate
  we want this info (we actually needs it for UDP and raw IP).

ether_output:
Provide light-weight ether_output2() call to deal with
transmitting L2 frame (e.g. properly handle broadcast/simloop/bridge/
  other L2 hooks before actually transmitting frame by if_transmit()).
Add a hack based on new RT_NHOP ro_flag to distinguish which version should
  we call. Better way is probably to add a new "if_output_frame" driver
  callbacks.

 Next steps:
* Convert ip_fastfwd part
* Implement auto-growing array for per-radix nexthops
* Implement LLE tracking for nexthop calculations to be able to
  immediately provide all necessary info in single route lookup
  for gateway routes
* Switch radix locking scheme to runtime/cfg lock
* Implement multipath support for rtsock
* Implement "tracked nexthops" for tunnels (e.g. _proper_
  nexthop caching)
* Add IPv6 support for remaining parts (postponed not to
   interfere with user/ae/inet6 branch)
* Consider adding "if_output_frame" driver call to
  ease logical frame pushing.
2014-10-19 21:07:35 +00:00
Alexander V. Chernikov
9ae91cc416 Implement fib*_lookup_nh_basic to provide fast non-refcounted
way to determine egress ifp / mtu.
2014-10-12 11:22:25 +00:00
Hiroki Sato
3c3136b1dd Virtualize if_epair(4). An if_xname check for both "a" and "b" interfaces
is added to return EEXIST when only "b" interface exists---this can happen
when epair<N>b is moved to a vnet jail and then "ifconfig epair<N> create"
is invoked there.
2014-10-10 06:45:13 +00:00
Andrey V. Elsukov
5b7a43f546 When tunneling interface is going to insert mbuf into netisr queue after stripping
outer header, consider it as new packet and clear the protocols flags.

This fixes problems when IPSEC traffic goes through various tunnels and router
doesn't send ICMP/ICMPv6 errors.

PR:		174602
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-10-08 21:23:34 +00:00
Andrey V. Elsukov
9ef268219a Our packet filters use mbuf's rcvif pointer to determine incoming interface.
Change mbuf's rcvif to enc0 and restore it after pfil processing.

PR:		110959
Sponsored by:	Yandex LLC
2014-10-07 13:31:04 +00:00
Hiroki Sato
3b4b7de506 Virtualize if_edsc(4). 2014-10-05 21:27:26 +00:00
Hiroki Sato
d6f59204ef Virtualize if_disc(4) cloner. 2014-10-05 19:46:52 +00:00
Hiroki Sato
c51275260b Virtualize if_bridge(4) cloner. 2014-10-05 19:43:37 +00:00
Hiroki Sato
7eb756fab1 Use printb() for boolean flags in ro_opts and actor_state for LACP. 2014-10-05 02:37:01 +00:00
Hiroki Sato
6d47816791 - Move L2 addr configuration for the primary port to a taskqueue. This fixes
LOR of softc rmlock in iflladdr_event handlers.

- Call if_delmulti_ifma() after LACP_UNLOCK().  This fixes another LOR.

- Fix a panic in lacp_transit_expire().

- Fix a panic in lagg_input() upon shutting down a port.
2014-10-05 02:34:21 +00:00
Hiroki Sato
9732189ca9 Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for
backward compatibility with old ifconfig(8).
2014-10-02 20:01:13 +00:00
Hiroki Sato
478e052062 Virtualize net.link.vlan.soft_pad. 2014-10-02 05:56:17 +00:00
Hiroki Sato
939a050ad9 Virtualize lagg(4) cloner. This change fixes a panic when tearing down
if_lagg(4) interfaces which were cloned in a vnet jail.

Sysctl nodes which are dynamically generated for each cloned interface
(net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift
ifconfig(8) parameters have been added instead.  Flags and per-interface
statistics counters are displayed in "ifconfig -v".

CR:	D842
2014-10-01 21:37:32 +00:00
Alexander V. Chernikov
8b1af054e8 Free radix mask entries on main radix destroy.
This is temporary commit to be merged to 10.
Other approach (like hash table) should be used
to store different masks.

PR:		194078
Submitted by:	Rumen Telbizov
MFC after:	3 days
2014-10-01 21:24:58 +00:00
Alexander V. Chernikov
31f0d081d8 Remove lock init from radix.c.
Radix has never managed its locking itself.
The only consumer using radix with embeded rwlock
is system routing table. Move per-AF lock inits there.
2014-10-01 14:39:06 +00:00
Gleb Smirnoff
dee826cec0 Fix off by one in lagg_port_destroy().
Reported by:	"Max N. Boyarov" <zotrix bsd.by>
2014-10-01 11:23:54 +00:00
Bjoern A. Zeeb
cbaac00901 Move the unconditional #include of net/ifq.h to the very end of file.
This seems to allow us to pass a universe with either clang or gcc
after r272244 (and r272260) and probably makes it easier to untabgle
these chained #includes in the future.
2014-09-28 17:09:40 +00:00
Bjoern A. Zeeb
0110795a35 Remove duplicate declaraton of the if_inc_counter() function after r272244.
if_var.h has the expected on and if_var.h include ifq.h and thus we get
duplicates.  It seems only one cavium ethernet file actually includes ifq.h
directly which might be another cleanup to be done but need to test first.
2014-09-28 15:38:21 +00:00
Gleb Smirnoff
bd071d4d19 - Remove empty wrappers ether_poll_[de]register_drv(). [1]
- Move polling(9) declarations out of ifq.h back to if_var.h
  they are absolutely unrelated to queues.

Submitted by:	Mikhail <mp lenta.ru> [1]
2014-09-28 14:05:18 +00:00
Gleb Smirnoff
112f50ffb2 Finally, convert counters in struct ifnet to counter(9).
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-28 08:57:07 +00:00
Gleb Smirnoff
2357543753 Convert to if_inc_counter() last remnantes of bare access to struct ifnet
counters.
2014-09-28 07:43:38 +00:00
Alexander V. Chernikov
7d6cc45c9b Use underlying ports counters to get lagg statistics instead of
per-packet accounting.
This introduce user-visible changes like aggregating error counters.

Reviewed by:	asomers (prev.version), glebius
CR:		D781
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-09-27 13:57:48 +00:00
Gleb Smirnoff
eade13f9d2 Remove macros that hide access to struct ifnet fields. 2014-09-26 13:02:29 +00:00
Gleb Smirnoff
38738d739a Make all lagg protocol methods live in lagg_protos, not in softc. All
interfaces of a same protocol, use the same methods.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 12:54:24 +00:00
Andrey V. Elsukov
30e5de489d Keep list of lagg ports sorted by if_index.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-09-26 12:42:06 +00:00
Gleb Smirnoff
6900d0d328 - Whitespace.
- Remove caddr_t.
2014-09-26 12:35:58 +00:00
Gleb Smirnoff
16ca790ead - Provide lagg_proto_attach(), lagg_proto_detach().
- Make detach a protocol method in lagg_protos.
- Simplify code to lookup protocols.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 11:01:04 +00:00
Gleb Smirnoff
09c7577ef3 - When reconfiguring protocol on a lagg, first set it to LAGG_PROTO_NONE,
then drop lock, run the attach routines, and then set it to specific
  proto. This removes tons of WITNESS warnings.
- Make lagg protocol attach handlers not failing and allocate memory
  with M_WAITOK.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 08:42:32 +00:00
Gleb Smirnoff
b5e094cfd7 Make lagg protos a enum. 2014-09-26 08:12:12 +00:00
Gleb Smirnoff
b1bbc5b3d1 Make lagg protocols detach methods returning void.
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-26 07:12:40 +00:00
Hans Petter Selasky
9fd573c39d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
Hiroki Sato
9f21b0b8b2 Fix build. 2014-09-21 07:16:51 +00:00
Hiroki Sato
89c58b73e0 - Virtualize interface cloner for gre(4). This fixes a panic when destroying
a vnet jail which has a gre(4) interface.

- Make net.link.gre.max_nesting vnet-local.
2014-09-21 03:56:06 +00:00
Hiroki Sato
a7f5886ec7 Virtualize interface cloner for gif(4). This fixes a panic when destroying
a vnet jail which has a gif(4) interface.
2014-09-21 03:55:04 +00:00
Hiroki Sato
ee0bd4b909 Make net.add_addr_allfibs vnet-local. 2014-09-21 03:48:20 +00:00
Gleb Smirnoff
3751dddb3e Mechanically convert to if_inc_counter(). 2014-09-19 10:39:58 +00:00
Gleb Smirnoff
56b61ca27a Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-19 09:01:19 +00:00
Gleb Smirnoff
a6f2696932 Increase errors, not queue drops, in cases the module is supplied
with a bad packet or if mbuf allocation failes.
2014-09-19 05:43:38 +00:00
Gleb Smirnoff
d2a707cdfa Remove a bunch of methods that are superseded by if_inc_counter().
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 16:17:20 +00:00
Gleb Smirnoff
1b7fb1d93f While not too late rename 'ifnet_counter' to 'ift_counter'. One of the
imporant moments that we discussed with Marcel and Anuranjan was that
a converted driver should return false for 'grep ifnet if_driver.c' :)

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 14:47:13 +00:00
Gleb Smirnoff
35853c2c60 Add a function to set if_get_counter method for an ifnet. To be used
in the drivers that are already converted to "Juniper drvapi". This
can be revisited in future.
2014-09-18 14:38:28 +00:00
Gleb Smirnoff
277e067a58 While not too late rename if_get_counter_compat() to if_get_counter_default().
The compat counters will go away, but the function will remain in its place,
and in all places where it is going to be called.

Discussed with:	melifaro
2014-09-18 10:01:56 +00:00
Gleb Smirnoff
0b7b006c7f Add if_inc_counter(), a generic method to update ifnet(9) counter
w/o dereferencing the struct.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 09:54:57 +00:00
Marcelo Araujo
5d99eb5926 Revert r271735. The comment is absolutely correct, we do not support 802.1p priority tagging. I got confused with the packet tagged and packet to be tagged.
Spotted by:	glebius
2014-09-18 05:43:19 +00:00
Marcelo Araujo
397bdf7cd5 Remove old comment, we already do 802.1q tagging.
Phabric:	D797
Reviewed by:	kevlo
Approved by:	kevlo
Sponsored by:	QNAP Systems Inc.
2014-09-18 03:09:34 +00:00
Marcelo Araujo
99cdd96163 Add laggproto broadcast, it allows sends frames to all ports of the lagg(4) group
and receives frames on any port of the lagg(4).

Phabric:	D549
Reviewed by:	glebius, thompsa
Approved by:	glebius
Obtained from:	OpenBSD
Sponsored by:	QNAP Systems Inc.
2014-09-18 02:12:48 +00:00