Commit Graph

492 Commits

Author SHA1 Message Date
n_hibma
6a72802152 Change net.link.log_promisc_mode_change to a read-only tunable
PR:		166255
Submitted by:	eugen.grosbein.net
Obtained from:	hselasky
MFC after:	3 days
2016-05-25 09:00:05 +00:00
bz
47d341d05f Rather than having the if_vmove() code intermixed in the vnet_destroy()
function in vnet.c move it to if.c where it logically belongs and put
it under a VNET_SYSUNINIT() call.
To not change the current behaviour make sure it runs first thing
during teardown. In the future this will allow us more flexibility
on changing the order on when we want to get rid of interfaces.

Stop exporting if_vmove() and make it file static.

Reviewed by:		gnn
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6438
2016-05-18 20:06:45 +00:00
scottl
596f2a146d Import the 'iflib' API library for network drivers. From the author:
"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional.  The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation.  ixl and ixgbe driver conversions will be committed
shortly.  We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by:   mmacy@nextbsd.org
Reviewed by:    gallatin
Differential Revision:  D5211
2016-05-18 04:35:58 +00:00
truckman
4d98e41134 When handling SIOCSIFNAME ensure that the new interface name is NUL
terminated.  Reject the rename attempt if the name is too long.

MFC after:	1 week
2016-05-15 21:37:36 +00:00
n_hibma
34530d661f Allow silencing of 'promiscuous mode enabled/disabled' messages.
PR:		166255
Submitted by:	eugen.grosbein.net
Obtained from:	eugen.grosbein.net
MFC after:	1 week
2016-05-12 19:42:13 +00:00
pfg
d9c9113377 sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
pfg
12232f8463 sys/net* : for pointers replace 0 with NULL.
Mostly cosmetical, no functional change.

Found with devel/coccinelle.
2016-04-15 17:30:33 +00:00
bz
146f54f44d During if_vmove() we call if_detach_internal() which in turn calls the event
handler notifying about interface departure and one of the consumers will
detach if_bpf.
There is no way for us to re-attach this easily as the DLT and hdrlen are
only given on interface creation.
Add a function to allow us to query the DLT and hdrlen from a current
BPF attachment and after if_attach_internal() manually re-add the if_bpf
attachment using these values.

Found by panics triggered by nd6 packets running past BPF_MTAP() with no
proper if_bpf pointer on the interface.

Also add a basic DDB show function to investigate the if_bpf attachment
of an interface.

Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5896
2016-04-11 10:00:38 +00:00
melifaro
93152c67c9 Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
bz
d5983c09f1 If vnets are torn down while ifconfig runs an ioctl to say, destroy an
epair(4), we may hit if_detach_internal() without holding a lock and by
the time we aquire it the interface might be gone.
We should not panic() in this case as it is our fault for not holding
the lock all the way. It is not ideal to return silently without error
to user space, but other callers will all ignore the return values so
do not change the entire KPI for little benefit for now.
The ifp will be dealt with one way or another still.

Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Reviewed by:		gnn
Differential Revision:	https://reviews.freebsd.org/D4529
2015-12-22 15:03:45 +00:00
smh
45d5617154 Revert r292275 & r292379
glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by:	Multiplay
2015-12-17 14:41:30 +00:00
smh
864cf18128 Fix lagg failover due to missing notifications
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR:		156226
MFC after:	1 month
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4111
2015-12-15 16:02:11 +00:00
ae
d81208c948 Overhaul if_enc(4) and make it loadable in run-time.
Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).
2015-11-25 07:31:59 +00:00
melifaro
c49cb93294 Move iflladdr_event eventhandler invocation to if_setlladdr.
Suggested by:	glebius
2015-11-14 13:34:03 +00:00
melifaro
a0ced91366 Use lladdr_event to propagate gratiotus arp.
Differential Revision:	https://reviews.freebsd.org/D4019
2015-11-09 10:11:14 +00:00
melifaro
a23a95f981 Fix lladdr change propagation for on vlans on top of it.
Fix lladdr update when setting mac address manually.
Fix lladdr_event for slave ports addition.

MFC after:		4 weeks
Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D4004
2015-11-01 19:59:04 +00:00
melifaro
44be0cdaa8 Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
  if old ifp fib differes from new one, that the caller
  is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
2015-09-16 06:23:15 +00:00
melifaro
bbab608243 Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3464
2015-09-05 05:33:20 +00:00
melifaro
33c52eed18 MFP r274295:
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
  to use new rt_foreach_fib() instead of hand-rolling cycles.
2015-08-08 18:14:59 +00:00
zec
c8374d535b Prevent null-pointer dereferencing.
MFC after:	3 days
2015-07-20 08:21:51 +00:00
dim
b620c7e63c Fix endless recursion in sys/net/if.c's drbr_inuse_drv(), found by clang
3.7.0.

Reviewed by:	marcel
2015-06-23 18:48:41 +00:00
erj
363cef70bb ifmedia changes:
- Extend the number of available subtypes for Ethernet media by using some
of the ifmedia word's option bits to help denote subtypes. As a result, the
number of possible Ethernet subtype values increases from 31 to 511.

- Use some of those new values to define new media types.

- lacp_compose_key() recgonizes the new Ethernet media types added.
  (Change made as required by a comment in if_media.h)

- New ioctl, SIOGIFXMEDIA, to handle getting the new extended media types.
  SIOCGIFMEDIA is retained for backwards compatibility.

- Changes to ifconfig to allow it to handle the new extended media types.

Submitted by:	mike@karels.net (original), hselasky
Reviewed by:	jfvogel, gnn, hselasky
Approved by:	jfvogel (mentor), gnn (mentor)
Differential Revision: http://reviews.freebsd.org/D1965
2015-04-07 21:31:17 +00:00
ae
59dd2a1d45 Add if_input_default() method, that will be used for if_input
initialization, when no input method specified before if_attach().

This prevents panics when if_input() method called directly e.g.
from bpf(4) code.

PR:		192426
Reviewed by:	glebius
MFC after:	1 week
2015-03-12 14:55:33 +00:00
hrs
1d30007287 Fix group membership of cloned interfaces when one is moved by
if_vmove().

In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface.  When
detaching, if_delgroup() was called and the interface leaves all of
the group membership.  And then upon attachment, if_addgroup(ifp,
IFG_ALL) was called and it joined only "all" group again.

This had a problem. Normally, a cloned interface automatically joins
a group whose name is ifc_name of the cloner in addition to "all"
upon creation.  However, if_vmove() removed the membership and did
not restore upon attachment.

Differential Revision:	https://reviews.freebsd.org/D1859
2015-03-02 20:00:03 +00:00
melifaro
f8d64c469a Finish r274175: do control plane MTU tracking.
Update route MTU in case of ifnet MTU change.
Add new RTF_FIXEDMTU to track explicitly specified MTU.

Old behavior:
ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU.
User has to manually update all routes.
ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU.
However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu
gets updated.

New behavior:
ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated
with new MTU unless RTF_FIXEDMTU flag set on them.
ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new
MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu.

route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag.
route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag.

PR:		194238
MFC after:	1 month
CR:		D1125
2014-11-17 01:05:29 +00:00
hselasky
4d1b5f70ee Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove a not needed KASSERT()
- Remove some not needed variable casts.

Sponsored by:	Mellanox Technologies
Discussed with:	lstewart @
MFC after:	1 week
2014-11-11 12:05:59 +00:00
glebius
83e84205ec Use standard mtx(9), rwlock(9), sx(9) system initialization macros
instead of doing initialization manually.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-09 11:11:08 +00:00
glebius
959c68aefc ifindex_alloc_locked() never fails and doesn't have no-lock version,
so change the prototype.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-08 07:23:01 +00:00
glebius
6306f79560 Remove struct arpcom. It is unused by most interface types, that allocate
it, except Ethernet, where it carried ng_ether(4) pointer.
For now carry the pointer in if_l2com directly.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-07 15:14:10 +00:00
glebius
ac67ed6c19 Remove useless structure ifindex_entry.
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-07 09:15:39 +00:00
melifaro
11af63037f Make checks for rt_mtu generic:
Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currrently have 3 places where MTU is altered:

1) route addition.
 In this case domain overrides radix _addroute callback (in[6]_addroute)
 and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
 In this case, there are no explicit per-domain calls, but one can
 override rte by setting ifa_rtrequest hook to domain handler
 (inet6 does this).

3) ifconfig ifaceX mtu YYYY
 In this case, we have no callbacks, but ip[6]_output performes runtime
 checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
 control plane, not in runtime part, and properly deal with increased
 interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-doman MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
 if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size.
 However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
 for some cases, so we need an abstract way to know maximum MTU size
 for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
  user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
  use this functions on new non-inserted rte.

More changes will follow soon.

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-11-06 13:13:09 +00:00
ae
2921b89f9b Move if_get_counter initialization from if_attach into if_alloc.
Also, initialize all counters before ifnet will become available in the system.
This fixes possible access to uninitialized ifned fields.

PR:		194550
2014-10-23 14:29:52 +00:00
glebius
2cb6078939 Finally, convert counters in struct ifnet to counter(9).
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-28 08:57:07 +00:00
hselasky
bdacf9ba4d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
glebius
72f04611ec Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-19 09:01:19 +00:00
glebius
de25153d59 Remove a bunch of methods that are superseded by if_inc_counter().
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 16:17:20 +00:00
glebius
c2d27a81fe While not too late rename 'ifnet_counter' to 'ift_counter'. One of the
imporant moments that we discussed with Marcel and Anuranjan was that
a converted driver should return false for 'grep ifnet if_driver.c' :)

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 14:47:13 +00:00
glebius
0917a065ca Add a function to set if_get_counter method for an ifnet. To be used
in the drivers that are already converted to "Juniper drvapi". This
can be revisited in future.
2014-09-18 14:38:28 +00:00
glebius
bf71125f67 While not too late rename if_get_counter_compat() to if_get_counter_default().
The compat counters will go away, but the function will remain in its place,
and in all places where it is going to be called.

Discussed with:	melifaro
2014-09-18 10:01:56 +00:00
glebius
f76e492f6d Add if_inc_counter(), a generic method to update ifnet(9) counter
w/o dereferencing the struct.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 09:54:57 +00:00
hselasky
727760a4e4 Revert r271504. A new patch to solve this issue will be made.
Suggested by:	adrian @
2014-09-13 20:52:01 +00:00
hselasky
3d04a989df Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-09-13 08:26:09 +00:00
asomers
081aa8a15c Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and
ifa_ifwithdstaddr. For the sake of backwards compatibility, the new
arguments were added to new functions named ifa_ifwithnet_fib and
ifa_ifwithdstaddr_fib, while the old functions became wrappers around the
new ones that passed RT_ALL_FIBS for the fib argument. However, the
backwards compatibility is not desired for FreeBSD 11, because there are
numerous other incompatible changes to the ifnet(9) API. We therefore
decided to remove it from head but leave it in place for stable/9 and
stable/10. In addition, this commit adds the fib argument to
ifa_ifwithbroadaddr for consistency's sake.

sys/sys/param.h
	Increment __FreeBSD_version

sys/net/if.c
sys/net/if_var.h
sys/net/route.c
	Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib
	versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute.

sys/net/route.c
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_options.c
sys/netinet/ip_output.c
sys/netinet6/nd6.c
	Fixup calls of modified functions.

share/man/man9/ifnet.9
	Document changed API.

CR:		https://reviews.freebsd.org/D458
MFC after:	Never
Sponsored by:	Spectra Logic
2014-09-11 20:21:03 +00:00
glebius
833eb3c331 It is actually possible to have if_t a typedef to non-void type,
and keep both converted to drvapi and non-converted drivers
compilable.

o Make if_t typedef to struct ifnet *.
o Remove shim functions.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-31 12:48:13 +00:00
glebius
70b7c46209 o Remove struct if_data from struct ifnet. Now it is merely API structure
for route(4) socket and ifmib(4) sysctl.
o Move fields from if_data to ifnet, but keep all statistic counters
  separate, since they should disappear later.
o Provide function if_data_copy() to fill if_data, utilize it in routing
  socket and ifmib handler.
o Provide overridable ifnet(9) method to fetch counters. If no provided,
  if_get_counters_compat() would be used, that returns old counters.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-31 06:46:21 +00:00
royger
6a2fcceb9c net: move interface removal notification up in if_detach_internal
This is needed to prevent having interfaces with ifp->if_addr == NULL
on bridge interfaces. Moving the notification event handlers up makes
sure the interfaces are removed before doing any more cleanup.

Sponsored by: Citrix Systems R&D
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D598

net/if.c
 - Move interface removal notification up in if_detach_internal.
2014-08-16 10:47:24 +00:00
glebius
d32e428cc3 Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
2014-07-29 15:01:29 +00:00
kevlo
940bebdcf2 Deprecate m_act. Use m_nextpkt always. 2014-07-17 05:21:16 +00:00
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00