Commit Graph

566 Commits

Author SHA1 Message Date
Brooks Davis
f8f65519d2 Fix a whitespace bug missed in refactoring prior to r331641.
MFC with:	r331641
2018-03-27 18:55:39 +00:00
Brooks Davis
86d2ef167a Fix access to ifru_buffer on freebsd32.
Make all kernel accesses to ifru_buffer go via access functions
which take the process ABI into account and use an appropriate union
to access members in the correct place in struct ifreq.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14846
2018-03-27 18:26:50 +00:00
Konstantin Belousov
f137973487 Allow to specify PCP on packets not belonging to any VLAN.
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be
considered as untagged, and only PCP and DEI values from the VLAN tag
are meaningful.  See for instance
https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html.

Make it possible to specify PCP value for outgoing packets on an
ethernet interface.  When PCP is supplied, the tag is appended, VLAN
id set to 0, and PCP is filled by the supplied value.  The code to do
VLAN tag encapsulation is refactored from the if_vlan.c and moved into
if_ethersubr.c.

Drivers might have issues with filtering VID 0 packets on
receive.  This bug should be fixed for each driver.

Reviewed by:	ae (previous version), hselasky, melifaro
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D14702
2018-03-27 15:29:32 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Eugene Grosbein
9f23a54e52 Allow a process to assign an IP address to local ppp interface
even if kernel routing table already has a route to the address in question
installed by some routing daemon (PR 223129).

Also, allow loopback route deletion when stopping a VIMAGE jail (PR 222647).

PR:			222647, 223129
Reviewed by:		gnn
Approved by:		avg (mentor), mav (mentor)
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D12747
2017-11-05 14:41:48 +00:00
Sepherosa Ziehau
0f3af0411d if: Add ioctls to get RSS key and hash type/function.
It will be needed by hn(4) to configure its RSS key and hash
type/function in the transparent VF mode in order to match VF's
RSS settings. The description of the transparent VF mode and
the RSS hash value issue are here:
https://svnweb.freebsd.org/base?view=revision&revision=322299
https://svnweb.freebsd.org/base?view=revision&revision=322485

These are generic enough to promise two independent IOCs instead
of abusing SIOCGDRVSPEC.

Setting RSS key and hash type/function is a different story,
which probably requires more discussion.

Comment about UDP_{IPV4,IPV6,IPV6_EX} were only in the patch
in the review request; these hash types are standardized now.

Reviewed by:	gallatin
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D12174
2017-09-05 05:28:52 +00:00
Ravi Pokala
ddae57504b Persistently store NIC's hardware MAC address, and add a way to retrive it
The MAC address reported by `ifconfig ${nic} ether' does not always match
the address in the hardware, as reported by the driver during attach. In
particular, NICs which are components of a lagg(4) interface all report the
same MAC.

When attaching, the NIC driver passes the MAC address it read from the
hardware as an argument to ether_ifattach(). Keep a second copy of it, and
create ioctl(SIOCGHWADDR) to return it. Teach `ifconfig' to report it along
with the active MAC address.

PR:		194386
Reviewed by:	glebius
MFC after:	1 week
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D10609
2017-05-10 22:13:47 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Gleb Smirnoff
efe3b0de14 Remove SVR4 (System V Release 4) binary compatibility support.
UNIX System V Release 4 is operating system released in 1988. It ceased
to exist in early 2000-s.
2017-02-28 05:14:42 +00:00
Stephen J. Kiernan
d0b2cad1ca Add the folowing set accessor functions for recently-added members of ifnet
structure:

if_gethwtsomax(), if_sethwtsomax()                 - if_hw_tsomax
if_gethwtsomaxsegcount(), if_sethwtsomaxsegcount() - if_hw_tsomaxsegcount
if_gethwtsomaxsegsize(), if_sethwtsomaxsegsize()   - if_hw_tsomaxsegsize

Update em and vnic drivers which had already been coverted to use accessor
functions for the other ifnet structure members.

Reviewed by:	erj
Approved by:	sjg (mentor)
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D8544
2017-01-31 16:12:31 +00:00
Andriy Voskoboinyk
2bbd06fc33 Garbage collect IFT_IEEE80211 (but leave the define for possible reuse)
This interface type ("a parent interface of wlanX") is not used since
r287197

Reviewed by:	adrian, glebius
Differential Revision:	https://reviews.freebsd.org/D9308
2017-01-28 17:08:40 +00:00
Dexuan Cui
6597559ea7 ifnet: move the new ifnet_event EVENTHANDLER_DECLARE to net/if_var.h
Thank glebius for pointing this out:
"The network stuff shall not be added to sys/eventhandler.h"

Reviewed by:	David_A_Bright_DELL.com, sephe, glebius
Approved by:	sephe (mentor)
MFC after:	2 weeks
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D9345
2017-01-28 07:26:42 +00:00
Luiz Otavio O Souza
338e227ac0 After the in_control() changes in r257692, an existing address is
(intentionally) deleted first and then completely added again (so all the
events, announces and hooks are given a chance to run).

This cause an issue with CARP where the existing CARP data structure is
removed together with the last address for a given VHID, which will cause
a subsequent fail when the address is later re-added.

This change fixes this issue by adding a new flag to keep the CARP data
structure when an address is not being removed.

There was an additional issue with IPv6 CARP addresses, where the CARP data
structure would never be removed after a change and lead to VHIDs which
cannot be destroyed.

Reviewed by:	glebius
Obtained from:	pfSense
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-01-25 19:04:08 +00:00
Dexuan Cui
92a6859b91 ifnet: introduce event handlers for ifup/ifdown events
Hyper-V's NIC SR-IOV implementation needs a Hyper-V synthetic NIC and
a VF NIC to work together, mainly to support seamless live migration.

When the VF device becomes UP (or DOWN), the synthetic NIC driver needs
to switch the data path from the synthetic NIC to the VF (or the opposite).

So the synthetic NIC driver needs to know when a VF device is becoming
UP or DOWN and hence the patch is made.

Reviewed by:	sephe
Approved by:	sephe (mentor)
MFC after:	2 weeks
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8963
2017-01-24 09:19:46 +00:00
Sepherosa Ziehau
cc5bb78be1 if: Defer the if_up until the ifnet.if_ioctl is called.
This ensures the interface is initialized by the interface driver
before it can be used by the rest of the system.

Reviewed by:	jhb, karels, gnn
MFC after:	3 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8905
2017-01-06 05:10:49 +00:00
Sepherosa Ziehau
368bf0c2c6 ifnet: Use if_link_state snapshot to invoke ifnet_link_event
So that everyone in this task have consistent view of link state.

Reviewed by:	ae
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8214
2016-10-12 01:52:29 +00:00
Gleb Smirnoff
32e0ade6c4 Partially revert r257696/r257713, which have an issue with writing to user
controlled address. Restore the old code that emulated OSIOCGIFCONF in if.c.

Noticed by:	C Turt
2016-07-24 10:10:09 +00:00
Bjoern A. Zeeb
a29c7aeb2e Several device drivers call if_alloc() and then do further checks and
will cal if_free() in case of conflict, error, ..
if_free() however sets the VNET instance from the ifp->if_vnet which
was not yet initialized but would only in if_attach(). Fix this by
setting the curvnet from where we allocate the interface in if_alloc().
if_attach() will later overwrite this as needed. We do not set the home_vnet
early on as we only want to prevent the if_free() panic but not change any
of the other housekeeping, e.g., triggered through ifioctl()s.

Reviewed by:	brooks
Approved by:	re (gjb)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D7010
2016-06-29 05:21:25 +00:00
Bjoern A. Zeeb
d3f6f80f4b After r302054 unloading an network interface driver on a kernel
without VIMAGE support would dereference a NULL point unconditionally
leading to a panic.  Wrap the entire VIMAGE related code with #ifdefs
rather than just the decision making part to save an extra bit of
resources.

Reported by:	np
Sponsored by:	The FreeBSD Foundation
MFC After:	13 days
Approved by:	re (marius)
2016-06-22 11:45:30 +00:00
Bjoern A. Zeeb
89856f7e2d Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
Bjoern A. Zeeb
2d5ad99a0d After tearing down the interface per-"domain" bits, set the data area
to NULL to avoid it being mis-treated on a possible re-attach but also
to get a clean NULL pointer derefence in case of errors due to
unexpected race conditions elsewhere in the code, e.g., callouts.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 22:59:58 +00:00
Bjoern A. Zeeb
d117fd8003 Similarly to r301505 protect the removal of the ifa from the if_addrhead
by a lock (as well as the check that the list is not empty).

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 16:23:02 +00:00
Bjoern A. Zeeb
f22d78c06e In if_purgeaddrs() we cannot hold the lock over the entire loop
due to called functions (as in other parts of the stack, leave a comment).
Put around a lock the removal of the ifa from the list however to
reduce the possible race with other places.

Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-06-06 13:17:25 +00:00
Bjoern A. Zeeb
c169d9fe07 In if_attachdomain1() there does not seem to be any reason
to use TRYLOCK rather than just acquire the lock, so just do that.

Reviewed by:		markj
Obtained from:		projects/vnet
MFC after:		2 weeks
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6578
2016-05-28 08:32:15 +00:00
Nick Hibma
dbd2ee46b2 Change net.link.log_promisc_mode_change to a read-only tunable
PR:		166255
Submitted by:	eugen.grosbein.net
Obtained from:	hselasky
MFC after:	3 days
2016-05-25 09:00:05 +00:00
Bjoern A. Zeeb
ad4e911678 Rather than having the if_vmove() code intermixed in the vnet_destroy()
function in vnet.c move it to if.c where it logically belongs and put
it under a VNET_SYSUNINIT() call.
To not change the current behaviour make sure it runs first thing
during teardown. In the future this will allow us more flexibility
on changing the order on when we want to get rid of interfaces.

Stop exporting if_vmove() and make it file static.

Reviewed by:		gnn
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6438
2016-05-18 20:06:45 +00:00
Scott Long
4c7070db25 Import the 'iflib' API library for network drivers. From the author:
"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional.  The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation.  ixl and ixgbe driver conversions will be committed
shortly.  We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by:   mmacy@nextbsd.org
Reviewed by:    gallatin
Differential Revision:  D5211
2016-05-18 04:35:58 +00:00
Don Lewis
1ef3d54d20 When handling SIOCSIFNAME ensure that the new interface name is NUL
terminated.  Reject the rename attempt if the name is too long.

MFC after:	1 week
2016-05-15 21:37:36 +00:00
Nick Hibma
6d07c1575b Allow silencing of 'promiscuous mode enabled/disabled' messages.
PR:		166255
Submitted by:	eugen.grosbein.net
Obtained from:	eugen.grosbein.net
MFC after:	1 week
2016-05-12 19:42:13 +00:00
Pedro F. Giffuni
a4641f4eaa sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
Pedro F. Giffuni
155d72c498 sys/net* : for pointers replace 0 with NULL.
Mostly cosmetical, no functional change.

Found with devel/coccinelle.
2016-04-15 17:30:33 +00:00
Bjoern A. Zeeb
05fc416403 During if_vmove() we call if_detach_internal() which in turn calls the event
handler notifying about interface departure and one of the consumers will
detach if_bpf.
There is no way for us to re-attach this easily as the DLT and hdrlen are
only given on interface creation.
Add a function to allow us to query the DLT and hdrlen from a current
BPF attachment and after if_attach_internal() manually re-add the if_bpf
attachment using these values.

Found by panics triggered by nd6 packets running past BPF_MTAP() with no
proper if_bpf pointer on the interface.

Also add a basic DDB show function to investigate the if_bpf attachment
of an interface.

Reviewed by:	gnn
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5896
2016-04-11 10:00:38 +00:00
Alexander V. Chernikov
4fb3a8208c Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
Bjoern A. Zeeb
f501e6f136 If vnets are torn down while ifconfig runs an ioctl to say, destroy an
epair(4), we may hit if_detach_internal() without holding a lock and by
the time we aquire it the interface might be gone.
We should not panic() in this case as it is our fault for not holding
the lock all the way. It is not ideal to return silently without error
to user space, but other callers will all ignore the return values so
do not change the entire KPI for little benefit for now.
The ifp will be dealt with one way or another still.

Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Reviewed by:		gnn
Differential Revision:	https://reviews.freebsd.org/D4529
2015-12-22 15:03:45 +00:00
Steven Hartland
d6e82913c1 Revert r292275 & r292379
glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by:	Multiplay
2015-12-17 14:41:30 +00:00
Steven Hartland
52e53e2de0 Fix lagg failover due to missing notifications
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR:		156226
MFC after:	1 month
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4111
2015-12-15 16:02:11 +00:00
Andrey V. Elsukov
ef91a9765d Overhaul if_enc(4) and make it loadable in run-time.
Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).
2015-11-25 07:31:59 +00:00
Alexander V. Chernikov
8ad43f2d0a Move iflladdr_event eventhandler invocation to if_setlladdr.
Suggested by:	glebius
2015-11-14 13:34:03 +00:00
Alexander V. Chernikov
b13c5b5db2 Use lladdr_event to propagate gratiotus arp.
Differential Revision:	https://reviews.freebsd.org/D4019
2015-11-09 10:11:14 +00:00
Alexander V. Chernikov
bb3d23fd35 Fix lladdr change propagation for on vlans on top of it.
Fix lladdr update when setting mac address manually.
Fix lladdr_event for slave ports addition.

MFC after:		4 weeks
Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D4004
2015-11-01 19:59:04 +00:00
Alexander V. Chernikov
59c180c35c Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
  if old ifp fib differes from new one, that the caller
  is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
2015-09-16 06:23:15 +00:00
Alexander V. Chernikov
441f9243df Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3464
2015-09-05 05:33:20 +00:00
Alexander V. Chernikov
4bdf0b6a9a MFP r274295:
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
  to use new rt_foreach_fib() instead of hand-rolling cycles.
2015-08-08 18:14:59 +00:00
Marko Zec
22a9384098 Prevent null-pointer dereferencing.
MFC after:	3 days
2015-07-20 08:21:51 +00:00
Dimitry Andric
966ab68df1 Fix endless recursion in sys/net/if.c's drbr_inuse_drv(), found by clang
3.7.0.

Reviewed by:	marcel
2015-06-23 18:48:41 +00:00
Eric Joyner
eb7e25b22f ifmedia changes:
- Extend the number of available subtypes for Ethernet media by using some
of the ifmedia word's option bits to help denote subtypes. As a result, the
number of possible Ethernet subtype values increases from 31 to 511.

- Use some of those new values to define new media types.

- lacp_compose_key() recgonizes the new Ethernet media types added.
  (Change made as required by a comment in if_media.h)

- New ioctl, SIOGIFXMEDIA, to handle getting the new extended media types.
  SIOCGIFMEDIA is retained for backwards compatibility.

- Changes to ifconfig to allow it to handle the new extended media types.

Submitted by:	mike@karels.net (original), hselasky
Reviewed by:	jfvogel, gnn, hselasky
Approved by:	jfvogel (mentor), gnn (mentor)
Differential Revision: http://reviews.freebsd.org/D1965
2015-04-07 21:31:17 +00:00
Andrey V. Elsukov
b57d97215e Add if_input_default() method, that will be used for if_input
initialization, when no input method specified before if_attach().

This prevents panics when if_input() method called directly e.g.
from bpf(4) code.

PR:		192426
Reviewed by:	glebius
MFC after:	1 week
2015-03-12 14:55:33 +00:00
Hiroki Sato
c92a456b55 Fix group membership of cloned interfaces when one is moved by
if_vmove().

In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface.  When
detaching, if_delgroup() was called and the interface leaves all of
the group membership.  And then upon attachment, if_addgroup(ifp,
IFG_ALL) was called and it joined only "all" group again.

This had a problem. Normally, a cloned interface automatically joins
a group whose name is ifc_name of the cloner in addition to "all"
upon creation.  However, if_vmove() removed the membership and did
not restore upon attachment.

Differential Revision:	https://reviews.freebsd.org/D1859
2015-03-02 20:00:03 +00:00
Alexander V. Chernikov
7f948f12f6 Finish r274175: do control plane MTU tracking.
Update route MTU in case of ifnet MTU change.
Add new RTF_FIXEDMTU to track explicitly specified MTU.

Old behavior:
ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU.
User has to manually update all routes.
ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU.
However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu
gets updated.

New behavior:
ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated
with new MTU unless RTF_FIXEDMTU flag set on them.
ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new
MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu.

route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag.
route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag.

PR:		194238
MFC after:	1 month
CR:		D1125
2014-11-17 01:05:29 +00:00
Hans Petter Selasky
3c7c188c16 Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove a not needed KASSERT()
- Remove some not needed variable casts.

Sponsored by:	Mellanox Technologies
Discussed with:	lstewart @
MFC after:	1 week
2014-11-11 12:05:59 +00:00
Gleb Smirnoff
4ea05db88e Use standard mtx(9), rwlock(9), sx(9) system initialization macros
instead of doing initialization manually.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-09 11:11:08 +00:00
Gleb Smirnoff
f4507b7166 ifindex_alloc_locked() never fails and doesn't have no-lock version,
so change the prototype.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-08 07:23:01 +00:00
Gleb Smirnoff
833e8dc5ab Remove struct arpcom. It is unused by most interface types, that allocate
it, except Ethernet, where it carried ng_ether(4) pointer.
For now carry the pointer in if_l2com directly.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-07 15:14:10 +00:00
Gleb Smirnoff
e6abef0918 Remove useless structure ifindex_entry.
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-07 09:15:39 +00:00
Alexander V. Chernikov
1a75e3b20f Make checks for rt_mtu generic:
Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currrently have 3 places where MTU is altered:

1) route addition.
 In this case domain overrides radix _addroute callback (in[6]_addroute)
 and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
 In this case, there are no explicit per-domain calls, but one can
 override rte by setting ifa_rtrequest hook to domain handler
 (inet6 does this).

3) ifconfig ifaceX mtu YYYY
 In this case, we have no callbacks, but ip[6]_output performes runtime
 checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
 control plane, not in runtime part, and properly deal with increased
 interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-doman MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
 if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size.
 However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
 for some cases, so we need an abstract way to know maximum MTU size
 for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
  user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
  use this functions on new non-inserted rte.

More changes will follow soon.

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-11-06 13:13:09 +00:00
Andrey V. Elsukov
61dc434406 Move if_get_counter initialization from if_attach into if_alloc.
Also, initialize all counters before ifnet will become available in the system.
This fixes possible access to uninitialized ifned fields.

PR:		194550
2014-10-23 14:29:52 +00:00
Gleb Smirnoff
112f50ffb2 Finally, convert counters in struct ifnet to counter(9).
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-28 08:57:07 +00:00
Hans Petter Selasky
9fd573c39d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
Gleb Smirnoff
56b61ca27a Remove ifq_drops from struct ifqueue. Now queue drops are accounted in
struct ifnet if_oqdrops.

Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops
is simply removed from them. There were no API to read this statistic.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-19 09:01:19 +00:00
Gleb Smirnoff
d2a707cdfa Remove a bunch of methods that are superseded by if_inc_counter().
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 16:17:20 +00:00
Gleb Smirnoff
1b7fb1d93f While not too late rename 'ifnet_counter' to 'ift_counter'. One of the
imporant moments that we discussed with Marcel and Anuranjan was that
a converted driver should return false for 'grep ifnet if_driver.c' :)

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 14:47:13 +00:00
Gleb Smirnoff
35853c2c60 Add a function to set if_get_counter method for an ifnet. To be used
in the drivers that are already converted to "Juniper drvapi". This
can be revisited in future.
2014-09-18 14:38:28 +00:00
Gleb Smirnoff
277e067a58 While not too late rename if_get_counter_compat() to if_get_counter_default().
The compat counters will go away, but the function will remain in its place,
and in all places where it is going to be called.

Discussed with:	melifaro
2014-09-18 10:01:56 +00:00
Gleb Smirnoff
0b7b006c7f Add if_inc_counter(), a generic method to update ifnet(9) counter
w/o dereferencing the struct.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 09:54:57 +00:00
Hans Petter Selasky
72f3100047 Revert r271504. A new patch to solve this issue will be made.
Suggested by:	adrian @
2014-09-13 20:52:01 +00:00
Hans Petter Selasky
eb93b77ae4 Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-09-13 08:26:09 +00:00
Alan Somers
4f8585e021 Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and
ifa_ifwithdstaddr. For the sake of backwards compatibility, the new
arguments were added to new functions named ifa_ifwithnet_fib and
ifa_ifwithdstaddr_fib, while the old functions became wrappers around the
new ones that passed RT_ALL_FIBS for the fib argument. However, the
backwards compatibility is not desired for FreeBSD 11, because there are
numerous other incompatible changes to the ifnet(9) API. We therefore
decided to remove it from head but leave it in place for stable/9 and
stable/10. In addition, this commit adds the fib argument to
ifa_ifwithbroadaddr for consistency's sake.

sys/sys/param.h
	Increment __FreeBSD_version

sys/net/if.c
sys/net/if_var.h
sys/net/route.c
	Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib
	versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute.

sys/net/route.c
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_options.c
sys/netinet/ip_output.c
sys/netinet6/nd6.c
	Fixup calls of modified functions.

share/man/man9/ifnet.9
	Document changed API.

CR:		https://reviews.freebsd.org/D458
MFC after:	Never
Sponsored by:	Spectra Logic
2014-09-11 20:21:03 +00:00
Gleb Smirnoff
09a8241fc9 It is actually possible to have if_t a typedef to non-void type,
and keep both converted to drvapi and non-converted drivers
compilable.

o Make if_t typedef to struct ifnet *.
o Remove shim functions.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-31 12:48:13 +00:00
Gleb Smirnoff
e6485f73de o Remove struct if_data from struct ifnet. Now it is merely API structure
for route(4) socket and ifmib(4) sysctl.
o Move fields from if_data to ifnet, but keep all statistic counters
  separate, since they should disappear later.
o Provide function if_data_copy() to fill if_data, utilize it in routing
  socket and ifmib handler.
o Provide overridable ifnet(9) method to fetch counters. If no provided,
  if_get_counters_compat() would be used, that returns old counters.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-31 06:46:21 +00:00
Roger Pau Monné
af371fc66a net: move interface removal notification up in if_detach_internal
This is needed to prevent having interfaces with ifp->if_addr == NULL
on bridge interfaces. Moving the notification event handlers up makes
sure the interfaces are removed before doing any more cleanup.

Sponsored by: Citrix Systems R&D
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D598

net/if.c
 - Move interface removal notification up in if_detach_internal.
2014-08-16 10:47:24 +00:00
Gleb Smirnoff
9753faf553 Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
2014-07-29 15:01:29 +00:00
Kevin Lo
c29a33213b Deprecate m_act. Use m_nextpkt always. 2014-07-17 05:21:16 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Marcel Moolenaar
62d76917b8 Introduce a procedural interface to the ifnet structure. The new
interface allows the ifnet structure to be defined as an opaque
type in NIC drivers.  This then allows the ifnet structure to be
changed without a need to change or recompile NIC drivers.

Put differently, NIC drivers can be written and compiled once and
be used with different network stack implementations, provided of
course that those network stack implementations have an API and
ABI compatible interface.

This commit introduces the 'if_t' type to replace 'struct ifnet *'
as the type of a network interface. The 'if_t' type is defined as
'void *' to enable the compiler to perform type conversion to
'struct ifnet *' and vice versa where needed and without warnings.
The functions that implement the API are the only functions that
need to have an explicit cast.

The MII code has been converted to use the driver API to avoid
unnecessary code churn. Code churn comes from having to work with
both converted and unconverted drivers in correlation with having
callback functions that take an interface. By converting the MII
code first, the callback functions can be defined so that the
compiler will perform the typecasts automatically.

As soon as all drivers have been converted, the if_t type can be
redefined as needed and the API functions can be fix to not need
an explicit cast.

The immediate benefactors of this change are:
1.  Juniper Networks - The network stack implementation in Junos
    is entirely different from FreeBSD's one and this change
    allows Juniper to build "stock" NIC drivers that can be used
    in combination with both the FreeBSD and Junos stacks.
2.  FreeBSD - This change opens the door towards changing ifnet
    and implementing new features and optimizations in the network
    stack without it requiring a change in the many NIC drivers
    FreeBSD has.

Submitted by:	Anuranjan Shukla <anshukla@juniper.net>
Reviewed by:	glebius@
Obtained from:	Juniper Networks, Inc.
2014-06-02 17:54:39 +00:00
Alan Somers
2f308a343f Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr()  The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
	Add legacy-compatible functions as described above.  Ensure legacy
	behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
	Call with _fib() functions if we must use a specific fib, or the
	legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
	Improve the udp_dontroute test.  The bug that this test exercises is
	that ifa_ifwithnet() will return the wrong address, if multiple
	interfaces have addresses on the same subnet but with different
	fibs.  The previous version of the test only considered one possible
	failure mode: that ifa_ifwithnet_fib() might fail to find any
	suitable address at all.  The new version also checks whether
	ifa_ifwithnet_fib() finds the correct address by checking where the
	ARP request goes.

Reported by:	bz, hrs
Reviewed by:	hrs
MFC after:	1 week
X-MFC-with:	264905
Sponsored by:	Spectra Logic
2014-05-29 21:03:49 +00:00
Alexander V. Chernikov
36d55f0f9d Unify sa_equal() macro usage.
MFC after:	2 weeks
2014-04-26 14:52:03 +00:00
Alan Somers
0cfee0c223 Fix subnet and default routes on different FIBs on the same subnet.
These two bugs are closely related.  The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
	Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr.  Those
	functions will only return an address whose interface fib equals the
	argument.

sys/net/route.c
	Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
	arguments.

sys/netinet/in.c
	Update in_addprefix to consider the interface fib when adding
	prefixes.  This will prevent it from not adding a subnet route when
	one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
	Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
	In some cases it there wasn't a clear specific fib number to use.
	In others, I was unable to test those functions so I chose
	RT_DEFAULT_FIB to minimize divergence from current behavior.  I will
	fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
	Revert r263738.  The udp_dontroute test was right all along.
	However, bugs kern/187550 and kern/187553 cancelled each other out
	when it came to this test.  Because of kern/187553, ifa_ifwithnet
	searched the default fib instead of the requested one, but because
	of kern/187550, there was an applicable subnet route on the default
	fib.  The new test added in r263738 doesn't work right, however.  I
	can verify with dtrace that ifa_ifwithnet returned the wrong address
	before I applied this commit, but route(8) miraculously found the
	correct interface to use anyway.  I don't know how.

	Clear expected failure messages for kern/187550 and kern/187552.

PR:		kern/187550
PR:		kern/187552
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic
2014-04-24 23:56:56 +00:00
Alan Somers
0489b8916e Fix host and network routes for new interfaces when net.add_addr_allfibs=0
sys/net/route.c
	In rtinit1, use the interface fib instead of the process fib.  The
	latter wasn't very useful because ifconfig(8) is usually invoked
	with the default process fib.  Changing ifconfig(8) to use setfib(2)
	would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
	Clear the expected ATF failure

sys/net/if.c
	Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
	Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
	in_scrubprefix.  Pass it the interface fib.

PR:		kern/187549
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-04-24 17:23:16 +00:00
Rick Macklem
209579aeac For NFS mounts using rsize,wsize=65536 over TSO enabled
network interfaces limited to 32 transmit segments, there
are two known issues.
The more serious one is that for an I/O of slightly less than 64K,
the net device driver prepends an ethernet header, resulting in a
TSO segment slightly larger than 64K. Since m_defrag() copies this
into 33 mbuf clusters, the transmit fails with EFBIG.
A tester indicated observing a similar failure using iSCSI.

The second less critical problem is that the network
device driver must copy the mbuf chain via m_defrag()
(m_collapse() is not sufficient), resulting in measurable overhead.

This patch reduces the default size of if_hw_tsomax
slightly, so that the first issue is avoided.
Fixing the second issue will require a way for the
network device driver to inform tcp_output() that it
is limited to 32 transmit segments.

Reported and tested by:	csforgeron@gmail.com, markus.gebert@hostpoint.ch
MFC after:	2 weeks
2014-04-17 23:31:50 +00:00
Bruce M Simpson
d565e5f206 In if_freemulti(), relax the paranoid KASSERT() on ifma->ifma_protospec.
This KASSERT() existed as a sanity check that upper layers in the network
stack (e.g. inet, inet6) had released their reference to the underlying
driver's multicast memberships (ifmultiaddr{}). However it assumes the
lifecycle of the driver membership corresponds to the lifecycle of the
network layer membership.

In the submitter's case, ieee80211_ioctl_updatemulti() attempts to
reprogram the (parent, physical) ifnet{} memberships in response
to a change in membership on the (child, virtual) VAP ifnet, using
a batched update mechanism. These updates happen independently from
the network layer, causing a "false negative" assertion failure.

There are possibly other use cases where this KASSERT() may be triggered
by other networking stack activity (e.g. where a nesting relationship
exists between multiple ifnet{} instances). This suggests that further
review of FreeBSD's approach to nested ifnet relationships is needed.

MFC after:	6 weeks
Submitted by:	adrian@
2014-04-10 18:43:02 +00:00
Alexander V. Chernikov
95fbe4d0cc Simplify filling sockaddr_dl structure for if_resolvemulti()
callback providers. link_init_sdl() function can be used to
fill most of the parameters. Use caller stack instead of
allocation / freing memory for each request. Do not drop support
for extra-long (probably non-existing) link-layer protocols by
introducing link_alloc_sdl() (used by if_resolvemulti() callback)
and link_free_sdl() (used by caller).
Since this change breaks KBI, MFC requires slightly different approach
(link_init_sdl() auto-allocating buffer if necessary to handle cases
 with unmodified if_resolvemulti() callers).

MFC after:	2 weeks
2014-01-18 23:24:51 +00:00
Alexander V. Chernikov
955a2deb52 Remove dead code.
Reported by:	Coverity
Coverity CID:	1018057
MFC after:	2 weeks
2014-01-07 19:00:40 +00:00
Alexander V. Chernikov
50da3e886d Teach every SIOCGIFSTATUS provider to fill in ifs->ascii anyway.
Remove old bits of data concat for 'ascii' field.
Remove special SIOCGIFSTATUS handling from if.c (which Coverity yells at).

Reported by:	Coverity
Coverity CID:	1147174
MFC after:	2 weeks
2014-01-07 15:59:33 +00:00
Gleb Smirnoff
555036b5f6 Remove never used ioctls that originate from KAME. The proof
of their zero usage was exp-run from misc/183538.
2013-11-11 05:39:42 +00:00
Gleb Smirnoff
af50ea380f Axe IFF_SMART. Fortunately this layering violating flag was never used,
it was just declared.
2013-11-05 12:52:56 +00:00
Gleb Smirnoff
5fb009bda7 Drop support for historic ioctls and also undefine them, so that code
that checks their presence via ifdef, won't use them.

Bump __FreeBSD_version as safety measure.
2013-11-05 10:29:47 +00:00
Gleb Smirnoff
9a6356bc31 In complemence to ifa_add_loopback_route() and ifa_del_loopback_route()
provide function ifa_switch_loopback_route() that will be used in case when
an interface address used for a loopback route goes away, but we have another
interface address with same address value and want to preserve loopback
route.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-11-05 07:36:17 +00:00
Gleb Smirnoff
7caf4ab7ac - Utilize counter(9) to accumulate statistics on interface addresses. Add
four counters to struct ifaddr. This kills '+=' on a variables shared
  between processors for every packet.
- Nuke struct if_data from struct ifaddr.
- In ip_input() do not put a reference on ifaddr, instead update statistics
  right now in place and do IN_IFADDR_RUNLOCK(). These removes atomic(9)
  for every packet. [1]
- To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in
  rtsock.c fill if_data fields using counter_u64_fetch().
- Accidentially fix bug in COMPAT_32 version of NET_RT_IFLISTL, which
  took if_data not from the ifaddr, but from ifaddr's ifnet. [2]

Submitted by:	melifaro [1], pluknet[2]
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 11:37:57 +00:00
Gleb Smirnoff
67420bda02 Remove ifa_mtx. It was used only in one place in kernel, and ifnet's
ifaddr lock can substitute it there.

Discussed with:	melifaro, ae
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:41:22 +00:00
Gleb Smirnoff
4675896098 Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:31:42 +00:00
Dag-Erling Smørgrav
1a05c762b9 Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory.  [13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks.  [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem.  [SA-13:13]

Security:	CVE-2013-5666
Security:	FreeBSD-SA-13:11.sendfile
Security:	CVE-2013-5691
Security:	FreeBSD-SA-13:12.ifioctl
Security:	CVE-2013-5710
Security:	FreeBSD-SA-13:13.nullfs
Approved by:	re
2013-09-10 10:05:59 +00:00
Craig Rodrigues
719fb72517 PR: 168520 170096
Submitted by: adrian, zec

Fix multiple kernel panics when VIMAGE is enabled in the kernel.
These fixes are based on patches submitted by Adrian Chadd and Marko Zec.

(1)  Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling
     device_attach().  This fixes multiple VIMAGE related kernel panics
     when trying to attach Bluetooth or USB Ethernet devices because
     curthread->td_vnet is NULL.

(2)  Set curthread->td_vnet in if_detach().  This fixes kernel panics when detaching networking
     interfaces, especially USB Ethernet devices.

(3)  Use VNET_DOMAIN_SET() in ng_btsocket.c

(4)  In ng_unref_node() set curthread->td_vnet.  This fixes kernel panics
     when detaching Netgraph nodes.
2013-07-15 01:32:55 +00:00
John Baldwin
8aa9937318 Fix build with both INET and INET6 disabled. 2013-06-04 20:40:16 +00:00
Andre Oppermann
3c914c547e Allow drivers to specify a maximum TSO length in bytes if they are
limited in the amount of data they can handle at once.

Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.

The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything
less wouldn't be very useful anymore.  The upper limit is still at
IP_MAXPACKET (65536 bytes).  Raising it requires further auditing of
the IPv4/v6 code path's as the length field in the IP header would
overflow leading to confusion in firewalls and others packet handler on
the real size of the packet.

The placement into "struct ifnet" is a bit hackish but the best place
that was found.  When the stack/driver boundary is updated it should
be handled in a better way.

Submitted by:	cperciva (earlier version)
Reviewed by:	cperciva
Tested by:	cperciva
MFC after:	1 week (using spare struct members to preserve ABI)
2013-06-03 12:55:13 +00:00
Andre Oppermann
f89d4c3acf Back out r249318, r249320 and r249327 due to a heisenbug most
likely related to a race condition in the ipi_hash_lock with
the exact cause currently unknown but under investigation.
2013-05-06 16:42:18 +00:00
Gleb Smirnoff
47e8d432d5 Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-26 12:50:32 +00:00
Andre Oppermann
e8b3186b6a Change certain heavily used network related mutexes and rwlocks to
reside on their own cache line to prevent false sharing with other
nearby structures, especially for those in the .bss segment.

NB: Those mutexes and rwlocks with variables next to them that get
changed on every invocation do not benefit from their own cache line.
Actually it may be net negative because two cache misses would be
incurred in those cases.
2013-04-09 21:02:20 +00:00
Alexander V. Chernikov
3034f43f2f Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.

Discussed with:	andre, eri
MFC after:	3 weeks
2013-03-08 20:33:50 +00:00