3111 Commits

Author SHA1 Message Date
hrs
d91e3630bb Fix a panic in SIOCSLAGG and SIOCGLAGGOPTS. This was caused by a
wrongly-MFC'd patch in r287723.

Pointy hat to:	hrs
2015-09-21 18:32:43 +00:00
hiren
c6c0cd4557 MFC r286700
Make LAG LACP fast timeout tunable through IOCTL.
2015-09-15 05:19:10 +00:00
hrs
349e17a73a MFC 287607:
- Remove GIF_{SEND,ACCEPT}_REVETHIP.
- Simplify EADDRNOTAVAIL and EAFNOSUPPORT conditions.
2015-09-13 01:35:40 +00:00
hrs
8bd36880a4 MFC 272159,272161,272386,272446,272547,272548,273210:
- Make lagg protos an enum.

- When reconfiguring protocol on a lagg, first set it to LAGG_PROTO_NONE,
  then drop lock, run the attach routines, and then set it to specific
  proto. This removes tons of WITNESS warnings.

- Make lagg protocol attach handlers unable to fail and allocate memory
  with M_WAITOK.

- Virtualize lagg(4) cloner.  This change fixes a panic when tearing down
  if_lagg(4) interfaces which were cloned in a vnet jail.

  Sysctl nodes which are dynamically generated for each cloned interface
  (net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift
  ifconfig(8) parameters have been added instead.  Flags and per-interface
  statistics counters are displayed in "ifconfig -v".

- Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for
  backward compatibility with old ifconfig(8).

- Move L2 addr configuration for the primary port to a taskqueue.  This fixes
  LOR of softc rmlock in iflladdr_event handlers.

- Call if_delmulti_ifma() after LACP_UNLOCK().  This fixes another LOR.

- Fix a panic in lacp_transit_expire().

- Fix a panic in lagg_input() upon shutting down a port.

- Use printb() for boolean flags in ro_opts and actor_state for LACP.

- Fix lladdr configuration which could prevent LACP mode from working.

- Fix LORs when a laggport interface has an IPv6 LLA.
2015-09-12 20:36:39 +00:00
hrs
72c1e2950c MFC r272889 and r287402:
- Virtualize if_epair(4).  An if_xname check for both "a" and "b" interfaces
  is added to return EEXIST when only "b" interface exists---this can happen
  when epair<N>b is moved to a vnet jail and then "ifconfig epair<N> create"
  is invoked there.

- Fix a panic which was reproducible by an infinite loop of
  "ifconfig epair0 create && ifconfig epair0a destroy".
  This was caused by an uninitialized function pointer in
  softc->media.
2015-09-09 08:52:39 +00:00
loos
47eb9e91e4 MFC r286260:
Remove the mtx_sleep() from the kqueue f_event filter.

  The filter is called from the network hot path and must not sleep.

  The filter runs with the descriptor lock held and does not manipulate the
  buffers, so it is not necessary to sleep when the hold buffer is in use.

  Just ignore the hold buffer contents when it is being copied to user space
  (when hold buffer in use is set).

  This fixes the "Sleeping thread owns a non-sleepable lock" panic when the
  userland thread is too busy reading the packets from bpf(4).

  PR:           200323
  Sponsored by: Rubicon Communications (Netgate)
2015-08-17 19:06:14 +00:00
loos
21e0cf5c4b MFC r286140:
Remove the sleep from the buffer allocation routine.

  The buffer must be allocated (or even changed) before the interface is set
  and thus, there is no need to verify if the buffer is in use.

MFC r286142:
  Remove two unnecessary sleeps from the hot path in bpf(4).

  The first one never triggers because bpf_canfreebuf() can only be true for
  zero-copy buffers and zero-copy buffers are not read with read(2).

  The second also never triggers, because we check the free buffer before
  calling ROTATE_BUFFERS().  If the hold buffer is in use the free buffer
  will be NULL and there is nothing else to do besides drop the packet.  If
  the free buffer isn't NULL the hold buffer _is_ free and it is safe to
  rotate the buffers.

  Update the comment in ROTATE_BUFFERS macro to match the logic described
  here.

  While here fix a few typos in comments.

MFC r286243:
  Add a KASSERT() to make sure we won't rotate the buffers twice (rotate the
  buffers while the hold buffer is in use).

  Sponsored by: Rubicon Communications (Netgate)
2015-08-17 18:43:39 +00:00
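The buffer rotation rule described in r286142 and r286243 can be sketched in user-space C. This is only an illustration under stated assumptions: struct toy_bpf_d and the toy_* functions are hypothetical stand-ins, not the kernel's bpf(4) structures or its ROTATE_BUFFERS macro, and assert() stands in for KASSERT().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the buffer fields of a bpf descriptor. */
struct toy_bpf_d {
	char		*sbuf;		/* store buffer: packets are appended here */
	char		*hbuf;		/* hold buffer: handed to the reader */
	char		*fbuf;		/* free buffer: NULL while the hold buffer is busy */
	size_t		 slen;		/* bytes used in the store buffer */
	size_t		 hlen;		/* bytes used in the hold buffer */
	int		 hbuf_in_use;	/* hold buffer is being copied to user space */
	unsigned long	 dropped;	/* packets dropped for lack of a free buffer */
};

/* Rotate store -> hold and free -> store; the caller has checked fbuf. */
static void
toy_rotate_buffers(struct toy_bpf_d *d)
{
	/* Stand-in for the KASSERT(): never rotate while the hold buffer is in use. */
	assert(d->hbuf_in_use == 0);
	assert(d->fbuf != NULL);

	d->hbuf = d->sbuf;
	d->hlen = d->slen;
	d->sbuf = d->fbuf;
	d->slen = 0;
	d->fbuf = NULL;
}

/*
 * Called when the store buffer cannot take another packet.
 * Returns 1 if the buffers were rotated, 0 if the packet must be dropped.
 */
static int
toy_store_full(struct toy_bpf_d *d)
{
	if (d->fbuf == NULL) {
		/* No free buffer: nothing to do besides dropping, and no sleeping. */
		d->dropped++;
		return (0);
	}
	toy_rotate_buffers(d);
	return (1);
}
```

The check on the free buffer happens before the rotation, so the rotation itself never has to wait for the hold buffer.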
loos
f69a7374f0 MFC r286139:
Do not allocate the buffers at opening of the descriptor, because once
  the buffer is allocated we are committed to a particular buffer method
  (BPF_BUFMODE_BUFFER in this case).

  If we are using zero-copy buffers, the userland program must register its
  buffers before setting the interface.

  If we are using kernel memory buffers, we can allocate the buffer at the
  time that the interface is being set.

  This fix allows the usage of BIOCSETBUFMODE after r235746.

  Update the comments to reflect the recent changes.

  Sponsored by: Rubicon Communications (Netgate)
2015-08-17 18:21:18 +00:00
melifaro
a91e3ef58a MFC r270064,r270068,r270069,r270115,r270129,r270287,r270822,r271014,
r271524,r273541,r282967,r283009,r283364.

Add support for reading i2c SFP/SFP+ data from NIC driver and
presenting most interesting fields via ifconfig -v.
This version supports Intel ixgbe driver only.

Tested on:      Cisco,Intel,Mellanox,ModuleTech,Molex transceivers

* Add new net/sff8436.h containing constants used to access
  QSFP+ data via i2c interface. These constants have been taken
  from SFF-8436 "QSFP+ 10 Gbs 4X PLUGGABLE TRANSCEIVER" standard
  rev 4.8.
* Add support for printing QSFP+ information from 40G NICs
  such as Chelsio T5.

Example:
cxl1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,.....>
        ether 00:07:43:28:ad:08
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet 40Gbase-LR4 <full-duplex>
        status: active
        plugged: QSFP+ 40GBASE-LR4 (MPO Parallel Optic)
        vendor: OEM PN: OP-QSFP-40G-LR4 SN: 20140318001 DATE: 2014-03-18
        module temperature: 64.06 C voltage: 3.26 Volts
        lane 1: RX: 0.47 mW (-3.21 dBm) TX: 2.78 mW (4.46 dBm)
        lane 2: RX: 0.20 mW (-6.94 dBm) TX: 2.80 mW (4.47 dBm)
        lane 3: RX: 0.18 mW (-7.38 dBm) TX: 2.79 mW (4.47 dBm)
        lane 4: RX: 0.90 mW (-0.45 dBm) TX: 2.80 mW (4.48 dBm)

Tested on:      Chelsio T5
Tested on:      Mellanox/Huawei passive/active cables/transceivers.

Sponsored by:   Yandex LLC
2015-08-15 17:52:55 +00:00
glebius
2a5be58b24 Merge 280169: always lock the hash row of a source node when updating
its 'states' counter.

PR:		182401
2015-07-28 09:13:55 +00:00
hrs
123cf5c769 MFC r279538:
Fix group membership of cloned interfaces when one is moved by
if_vmove().

In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface.  When
detaching, if_delgroup() was called and the interface leaves all of
the group membership.  And then upon attachment, if_addgroup(ifp,
IFG_ALL) was called and it joined only "all" group again.

This had a problem. Normally, a cloned interface automatically joins
a group whose name is ifc_name of the cloner in addition to "all"
upon creation.  However, if_vmove() removed the membership and did
not restore upon attachment.

Approved by:	re (gjb)
2015-07-23 19:57:47 +00:00
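The group membership problem described in r279538 can be modelled in a few lines of user-space C. This is a sketch under assumptions: struct toy_ifnet and the toy_* helpers are hypothetical and do not mirror the kernel's ifnet or if_addgroup() interfaces, and rejoining the cloner's group after reattachment is one plausible way to restore membership, not necessarily how the commit does it.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAXGROUPS 4

/* Hypothetical, simplified interface record. */
struct toy_ifnet {
	const char	*groups[TOY_MAXGROUPS];
	int		 ngroups;
	const char	*cloner_name;	/* NULL for interfaces that were not cloned */
};

static void
toy_addgroup(struct toy_ifnet *ifp, const char *name)
{
	if (ifp->ngroups < TOY_MAXGROUPS)
		ifp->groups[ifp->ngroups++] = name;
}

static int
toy_ingroup(const struct toy_ifnet *ifp, const char *name)
{
	for (int i = 0; i < ifp->ngroups; i++)
		if (strcmp(ifp->groups[i], name) == 0)
			return (1);
	return (0);
}

/* Model of moving an interface: detach, reattach, then restore the cloner group. */
static void
toy_vmove(struct toy_ifnet *ifp)
{
	ifp->ngroups = 0;			/* detach: leave every group */
	toy_addgroup(ifp, "all");		/* attach: rejoin only "all" */
	if (ifp->cloner_name != NULL)		/* assumed fix: rejoin the cloner's group */
		toy_addgroup(ifp, ifp->cloner_name);
}
```

Without the last step in toy_vmove(), a cloned interface would come back as a member of "all" only, which is the behaviour the commit message describes as the problem.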
pkelsey
bc26e9b8dc MFC r285190:
Fix if_loop so bpfwrite() can use it regardless of the state of
bd_hdrcmplt.  As if_loop does not use link-level headers, its behavior
when used by bpfwrite() should be the same regardless of the state of
bd_hdrcmplt.  Without this change, libpcap (and other BPF users that
work like it) fail when writing to loopback interfaces.

Approved by: re
2015-07-15 16:57:40 +00:00
kp
f1fbe0ce5b MFC r284348: Fix panic when adding vtnet interfaces to a bridge
vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host).  if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).

There are also two issues with the if_bridge error handling.

If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promisc() calls can actually fail, so this is safe.

A second issue is a double free of bif. It's already freed by
bridge_delete_member().

PR:         200210
2015-07-01 21:21:14 +00:00
kp
7d05cb134c Merge r278874, r278925, r278868
- Improve INET/INET6 scope.
- style(9) declarations.
- Make a couple of local functions static.
- Even more fixes to !INET and !INET6 kernels.
  In collaboration with pluknet
- Toss declarations to fix regular build and NO_INET6 build.

Differential Revision:	https://reviews.freebsd.org/D2823
Reviewed by:	gnn
2015-06-18 21:21:52 +00:00
kp
83b6287db4 Merge r278843, r278858
In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.

Roughly based on the OpenBSD work by Alexander Bluhm.

Differential Revision:	https://reviews.freebsd.org/D2816
Reviewed by:	gnn
2015-06-18 20:34:39 +00:00
kp
de79c168eb Merge r278831, r278834
Update the pf fragment handling code to more closely match recent OpenBSD.
That partially fixes IPv6 fragment handling.

Differential Revision:	https://reviews.freebsd.org/D2814
Reviewed by:	gnn
2015-06-18 20:28:52 +00:00
bryanv
dfb124acf0 MFC r273331, r273371, r275851:
- Add vxlan interface

 - Use the size of the Ethernet address, not the entire header, when
   copying into forwarding entry.

 - Prefix all the vxlan ifconfig commands so they are unique
2015-06-14 03:14:45 +00:00
ae
49fd76c05c MFC r282809:
Add new socket ioctls SIOC[SG]TUNFIB to set FIB number of encapsulated
  packets on tunnel interfaces. Add support of these ioctls to gre(4),
  gif(4) and me(4) interfaces. For incoming packets M_SETFIB() should use
  if_fib value from ifnet structure, use proper value in gre(4) and me(4).

  Differential Revision:	https://reviews.freebsd.org/D2462
2015-06-06 13:37:11 +00:00
ae
ad4eef6e15 MFC r276902,282536:
Pass mtag argument into m_tag_locate() to continue the search from
  the last found mtag.
2015-06-06 13:29:41 +00:00
ae
f1be259e6a MFC r276148:
Remove in_gif.h and in6_gif.h files. They only contain function
  declarations used by gif(4). Instead declare these functions in C files.
  Also make some variables static.

MFC r276215:
  Extern declarations in C files lose compile-time checking that
  the functions' calls match their definitions. Move them to header files.
2015-06-06 13:26:13 +00:00
ae
920800a21f MFC r274246:
Overhaul if_gre(4).

  Split it into two modules: if_gre(4) for GRE encapsulation and
  if_me(4) for minimal encapsulation within IP.

  gre(4) changes:
  * convert to if_transmit;
  * rework locking: protect access to softc with rmlock,
    protect from concurrent ioctls with sx lock;
  * correct interface accounting for outgoing datagrams (count only payload size);
  * implement generic support for using IPv6 as delivery header;
  * make implementation conform to the RFC 2784 and partially to RFC 2890;
  * add support for GRE checksums - calculate for outgoing datagrams and check
    for incoming datagrams;
  * add support for sending sequence number in GRE header;
  * remove support of cached routes. This fixes a problem where gre(4) doesn't
    work at system startup. But this also removes support for having tunnels with
    the same addresses for inner and outer header.
  * deprecate support for various GREXXX ioctls that are not used in FreeBSD.
    Use our standard ioctls for tunnels.

  me(4):
  * implementation conforms to RFC 2004;
  * use if_transmit;
  * use the same locking model as gre(4);

  PR:		164475

MFC r274289 (by bz):
  gcc requires variables to be initialised in two places.  One of them
  is correctly used only under the same conditional though.

  For module builds properly check if the kernel supports INET or INET6,
  as otherwise various mips kernels without IPv6 support would fail to build.

MFC r274964:
  Add ip_gre.h to ObsoleteFiles.inc.
2015-06-06 12:44:42 +00:00
ae
c84e575eec MFC r271918 (by hrs):
- Virtualize interface cloner for gre(4).  This fixes a panic when destroying
    a vnet jail which has a gre(4) interface.

  - Make net.link.gre.max_nesting vnet-local.
2015-06-05 08:10:08 +00:00
ae
8272d42d32 MFC r282965:
Add the ability to accept encapsulated packets from different sources by one
  gif(4) interface. Add new option "ignore_source" for gif(4) interface.
  When it is enabled, gif's encapcheck function requires match only for
  packet's destination address.

  Differential Revision:	https://reviews.freebsd.org/D2004
  Sponsored by:	Yandex LLC
2015-05-31 22:58:41 +00:00
erj
d390788aa7 MFC r281236 -- extended media types in if_media.h.
Approved by:	jfv (mentor)
2015-05-29 23:02:12 +00:00
hiren
c3228a95e4 MFC r281984:
Currently there is no easy way to specify net.isr.maxthreads = all cpus. We need
to specify the exact number of cpus in loader.conf, which gets annoying when you
have a mix of machines which don't have an equal number of total cpus. I propose
"-1" as that value. When loader.conf has net.isr.maxthreads = -1, netisr will use
all available cpus.

Sponsored by:	Limelight Networks
2015-05-13 08:04:50 +00:00
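The "-1 means all cpus" rule from r281984 amounts to a small normalisation step on the tunable. The sketch below is an assumption-laden illustration: toy_resolve_maxthreads() is a hypothetical helper, and the clamping of other out-of-range values is assumed here rather than taken from the commit message.

```c
/*
 * Hypothetical helper: turn the net.isr.maxthreads tunable into a thread count.
 * tunable is the value read from loader.conf, ncpus the number of cpus present.
 */
static int
toy_resolve_maxthreads(int tunable, int ncpus)
{
	if (tunable == -1)		/* described in the commit: use all cpus */
		return (ncpus);
	if (tunable < 1)		/* assumption: fall back to one thread */
		return (1);
	if (tunable > ncpus)		/* assumption: never exceed the cpu count */
		return (ncpus);
	return (tunable);
}
```

With this rule, the same loader.conf line can be shared by machines with different cpu counts.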
gnn
431f70d695 MFC: 281558
Minor change to the macros to make sure that if an AF is passed that is neither AF_INET6 nor AF_INET, we don't touch random bits of memory.
2015-05-09 19:43:48 +00:00
mav
0da2a8acde MFC r281765:
Activate write-only optimization if bpf device opened with O_WRONLY.

dhclient opens bpf as write-only to send packets. It never reads received
packets from that descriptor, but processing them in kernel takes time.
Packet timestamping is especially expensive on systems with a costly
timecounter, such as a bhyve guest, where network speed dropped by half.

Sponsored by:	iXsystems, Inc.
2015-05-04 19:33:51 +00:00
hiren
b09afc6f3f MFC r275358 r275483 r276982 - Removing M_FLOWID by hps@
r275358:
Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but no longer has any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

r275483:
Remove M_FLOWID from SCTP code.

r276982:
Remove no longer used "M_FLOWID" flag from mbuf.h and update the netisr
manpage.

Note: The FreeBSD version has been bumped.

Reviewed by:    hps, tuexen
Sponsored by:   Limelight Networks
2015-04-24 23:26:44 +00:00
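The transmit-side check described in r275358 can be illustrated with a user-space sketch. Everything prefixed toy_ or TOY_ is a hypothetical stand-in for the mbuf packet header and the M_HASHTYPE_XXX macros in "sys/mbuf.h"; the queue selection by modulo is an assumption made only to give the check something to decide.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hash type values. */
enum toy_hashtype {
	TOY_HASHTYPE_NONE = 0,		/* flowid is not valid */
	TOY_HASHTYPE_OPAQUE = 1		/* flowid is valid, no particular type */
};

/* Hypothetical, simplified packet header. */
struct toy_pkthdr {
	uint32_t	flowid;
	uint8_t		rsstype;
};

#define	TOY_HASHTYPE_GET(p)	((p)->rsstype)
#define	TOY_HASHTYPE_SET(p, v)	((p)->rsstype = (v))

/*
 * Pick a transmit queue.  The old style tested a separate flag before
 * trusting flowid; the new style asks whether the hash type is not NONE.
 */
static unsigned
toy_select_txq(const struct toy_pkthdr *p, unsigned nqueues, unsigned fallback)
{
	if (TOY_HASHTYPE_GET(p) != TOY_HASHTYPE_NONE)
		return (p->flowid % nqueues);
	return (fallback % nqueues);
}
```

A sender that has a usable flowid but no specific hash type would set the opaque type with TOY_HASHTYPE_SET(), which is the direct assignment the commit message refers to.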
ae
24836ef695 MFC r279920:
Add if_input_default() method, that will be used for if_input
  initialization, when no input method specified before if_attach().

  This prevents panics when if_input() method called directly e.g.
  from bpf(4) code.

  PR:		192426
2015-03-19 13:10:09 +00:00
luigi
6e901283bf sync with the version in head (r274338):
fix one comment, and return kernel-supplied error if available.
no API changes.
2015-02-14 19:18:56 +00:00
ae
efbb33d3cf MFC r277295:
Fix condition and really sort ports. Also add comment describing
  the intent of this code.
2015-01-25 16:35:03 +00:00
ae
45e30f880b MFC r276901:
Move the recursion detection code into separate function
  gif_check_nesting(). Also make MTAG_GIF definition private to if_gif.c.

MFC r276907:
  Restore Ethernet-within-IP Encapsulation support that was broken after
  r273087. Move all checks from gif_output() into gif_transmit(). Previously
  they were checked always, because if_start always called gif_output.
  Now gif_transmit() can be called directly from if_bridge() code and we need
  to do the checks here.

  PR:		196646
2015-01-17 11:43:13 +00:00
ae
7a82e24551 MFC r273087 (with modifications):
Overhaul if_gif(4):
   o convert to if_transmit;
   o use rmlock to protect access to gif_softc;
   o use sx lock to protect from concurrent ioctls;
   o remove a lot of unneeded and duplicated code;
   o remove cached route support (it won't work with concurrent io);
   o style fixes.

MFC r273090:
  Move memset under ifdef INET6.

MFC r273091:
  Add more ifdefs. SIOC*_IN6 are defined only with INET6.

MFC r273121:
  Add inet/inet6 to the dependency list. Without them if_gif is useless.

MFC r273209 by bz:
  After r273087,r273090,r273091,r273121 changes to gif(4) try to fix
  NOIP builds for real.

MFC r273587:
  Remove redundant check and m_pullup() call.
2014-12-23 16:33:44 +00:00
ae
57be9990bd Add if_inc_counter() and if_get_counter_default() functions that
access ifnet counters for code compatibility with FreeBSD 11.

This is direct commit to stable/10.

Discussed with:	glebius@, arch@
2014-12-23 09:39:40 +00:00
ae
9a4e55b147 MFC r271917 by hrs:
Virtualize interface cloner for gif(4).  This fixes a panic when destroying
  a vnet jail which has a gif(4) interface.
2014-12-22 17:54:26 +00:00
ae
3449c92ea5 MFC r258167:
ANSIfy function definitions.
2014-12-22 17:32:13 +00:00
ae
3e533b7379 MFC r275394:
Remove unneeded check. No need to do m_pullup to the size that we prepended.

Sponsored by:	Yandex LLC
2014-12-16 11:53:45 +00:00
hselasky
9fcf944d2a MFC r274376:
Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove an unneeded KASSERT().
- Remove some unneeded variable casts.

Sponsored by:	Mellanox Technologies
2014-11-19 09:03:12 +00:00
kib
e4b2ee7e2b Merge the fueword(9) and casueword(9). In particular,
MFC r273783:
Add fueword(9) and casueword(9) functions.
MFC note: ia64 is handled like arm, with NO_FUEWORD define.

MFC r273784:
Replace some calls to fuword() by fueword() with proper error checking.

MFC r273785:
Convert kern_umtx.c to use fueword() and casueword().
MFC note: the sys__umtx_lock and sys__umtx_unlock syscalls are not
converted, they are removed from HEAD, and not used.  The do_sem2*()
family is not yet merged to stable/10, corresponding chunk will be
merged after do_sem2* are committed.

MFC r273788 (by jkim):
Actually install casuword(9) to fix build.

MFC r273911:
Add type qualifier volatile to the base (userspace) address argument
of fuword(9) and suword(9).
2014-11-18 12:53:32 +00:00
hselasky
fa183f0174 MFC r271946 and r272595:
Improve the TCP segmentation offload (TSO) algorithm in general. This
change allows all HCAs from Mellanox Technologies to function properly
when TSO is enabled. See r271946 and r272595 for more details about
this commit.

Sponsored by:	Mellanox Technologies
2014-11-03 12:38:29 +00:00
ae
33d2961d9a MFC r272770:
When a tunneling interface is going to insert an mbuf into a netisr queue after stripping
  the outer header, consider it a new packet and clear the protocol flags.

  This fixes problems when IPSEC traffic goes through various tunnels and router
  doesn't send ICMP/ICMPv6 errors.

PR:		174602
Sponsored by:	Yandex LLC
2014-10-30 13:53:57 +00:00
hselasky
1d17f744c7 MFC r273733, r273740 and r273773:
The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.

Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.

Sponsored by:	Mellanox Technologies
2014-10-30 08:04:48 +00:00
hselasky
1f41d295fb MFC r263710, r273377, r273378, r273423 and r273455:
- De-vnet hash sizes and hash masks.
- Fix multiple issues related to arguments passed to SYSCTL macros.

Sponsored by:	Mellanox Technologies
2014-10-27 14:38:00 +00:00
glebius
9ea3e68626 Merge r272385 by melifaro from head:
Free radix mask entries on main radix destroy.
  This is a temporary commit to be merged to 10.
  Another approach (like a hash table) should be used
  to store different masks.

PR:             194078
2014-10-16 20:46:02 +00:00
ae
f7ad542948 MFC r272176:
Keep list of lagg ports sorted by if_index.
2014-10-07 07:52:47 +00:00
asomers
f906790c87 MFC r265232
Fix a panic caused by doing "ifconfig -am" while a lagg is being destroyed.
The thread that is destroying the lagg has already set sc->sc_psc=NULL when
the "ifconfig -am" thread gets to lacp_req().  It tries to dereference
sc->sc_psc and panics.  The solution is for lacp_req() to check the value of
sc->sc_psc.  If NULL, harmlessly return an lacp_opreq structure full of
zeros.  Full details in GNATS.

PR:	189003
2014-10-06 23:17:01 +00:00
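The fix described in r265232 is a NULL check before the dereference. A user-space sketch follows; the toy_* structures are hypothetical simplifications and their fields are invented for illustration, only the sc_psc pointer name is taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified request structure returned to the caller. */
struct toy_lacp_opreq {
	uint16_t	actor_prio;
	uint16_t	actor_key;
	uint16_t	partner_key;
};

/* Hypothetical LACP state hanging off the lagg softc. */
struct toy_lacp_softc {
	struct toy_lacp_opreq	cur;
};

struct toy_lagg_softc {
	struct toy_lacp_softc	*sc_psc;	/* NULL once teardown has started */
};

/* Fill in req; if the LACP state is already gone, return all zeros. */
static void
toy_lacp_req(const struct toy_lagg_softc *sc, struct toy_lacp_opreq *req)
{
	memset(req, 0, sizeof(*req));
	if (sc->sc_psc == NULL)
		return;			/* interface is being destroyed */
	*req = sc->sc_psc->cur;
}
```

Zeroing the request first means the early return and the normal path both hand back a fully initialised structure.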
glebius
3722b178a3 Merge r269998 from head:
- Count global pf(4) statistics in counter(9).
  - Do not count global number of states and of src_nodes,
    use uma_zone_get_cur() to obtain values.
  - Struct pf_status becomes merely an ioctl API structure,
    and moves to netpfil/pf/pf.h with its constants.
  - V_pf_status is now of type struct pf_kstatus.

  Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
  Sponsored by: InnoGames GmbH
2014-08-25 15:40:37 +00:00
np
c11c6b7951 Update a couple of header files that were missed in r270252. This is a
direct commit to stable/10.

Submitted by:	luigi
2014-08-21 19:42:03 +00:00
mav
0959ad1632 MFC r269492:
Improve locking of multicast addresses in VLAN and LAGG interfaces.

This fixes several scenarios of reproducible panics, caused by races
between multicast address changes and interface destruction.
2014-08-18 15:54:35 +00:00
kevlo
f112206e5a MFC r268787:
Deprecate m_act.  Use m_nextpkt always.
2014-07-24 06:02:03 +00:00