220 Commits

Author SHA1 Message Date
np
057d736604 Fix for iWARP servers that listen on INADDR_ANY.
The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to
represent an iWARP endpoint when the connection is over TCP. For
servers the current approach is to invoke create_listen callback for
each iWARP RNIC registered with the CM. This doesn't work too well for
INADDR_ANY because a listen on any TCP socket already notifies all
hardware TOEs/RNICs of the new listener. This patch fixes the server
side of things for FreeBSD. We've tried to keep all these modifications
in the iWARP/TCP specific parts of the OFED infrastructure as much as
possible.

Submitted by:	Krishnamraju Eraparaju @ Chelsio (with design inputs from Steve Wise)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D4801
2016-01-22 23:33:34 +00:00
melifaro
e4b451498b Finish r275196: do not dereference rtentry in if_output() routines.
The only piece of information that is required is rt_flags subset.

In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags
  to check if this particular mbuf needs to be dropped (and what
  error should be returned).
Note that if_loop() will always return EHOSTUNREACH for "reject" routes
  regardless of RTF_HOST flag existence. This is due to upcoming routing
  changes where RTF_HOST value won't be available as lookup result.

All other functions require RTF_GATEWAY flag to check if they need
  to return EHOSTUNREACH instead of EHOSTDOWN error.

There are 11 places where non-zero 'struct route' is passed to if_output().
Most of the callers (forwarding, bpf, arp) do not care about the exact
  error value. In fact, the only place where this result is propagated
  is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()).

Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and
  RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary
  rte flags to ro_flags. Call this function in ip_output() after looking up/
  verifying rte.

Reviewed by:	ae
2016-01-09 16:34:37 +00:00
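
The flag-copy step described in e4b451498b can be pictured with a small userspace sketch. Everything below is an assumption made for illustration: the `_sketch` types, the numeric flag values, and the helper `output_errno_sketch()` are hypothetical and are not the kernel's definitions. Only the idea comes from the commit message: copy the RTF_REJECT, RTF_BLACKHOLE and RTF_GATEWAY bits of the route entry into the 'struct route' flags so that output routines never touch the rtentry.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define RTF_GATEWAY	0x0002
#define RTF_REJECT	0x0008
#define RTF_BLACKHOLE	0x1000

#define RT_REJECT	0x0020
#define RT_BLACKHOLE	0x0040
#define RT_IS_GW	0x0080

/* Simplified stand-ins for the kernel structures. */
struct rtentry_sketch { unsigned int rt_flags; };
struct route_sketch { unsigned int ro_flags; };

/* Copy the subset of rte flags that output routines need into ro_flags. */
static inline void
rt_update_ro_flags_sketch(struct route_sketch *ro,
    const struct rtentry_sketch *rt)
{
	ro->ro_flags &= ~(unsigned int)(RT_REJECT | RT_BLACKHOLE | RT_IS_GW);
	if (rt->rt_flags & RTF_REJECT)
		ro->ro_flags |= RT_REJECT;
	if (rt->rt_flags & RTF_BLACKHOLE)
		ro->ro_flags |= RT_BLACKHOLE;
	if (rt->rt_flags & RTF_GATEWAY)
		ro->ro_flags |= RT_IS_GW;
}

/* An output routine can now pick its error without dereferencing the rte. */
static inline int
output_errno_sketch(const struct route_sketch *ro)
{
	return ((ro->ro_flags & RT_IS_GW) ? EHOSTUNREACH : EHOSTDOWN);
}
```

In this sketch the caller that performs the route lookup invokes `rt_update_ro_flags_sketch()` once, and later layers consult only `ro_flags`.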
glebius
e25e77f91d Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like
sbappendstream() does. Although M_NOTREADY may appear only on SOCK_STREAM
sockets, because sendfile(2) supports only those, there is a corner
case: an AF_UNIX/SOCK_STREAM socket still uses records for the sake
of control data, despite being a stream socket.

Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY,
similar to m_demote().
2016-01-08 19:03:20 +00:00
hselasky
3d1ba99d7b Remove unused file.
2016-01-07 09:40:19 +00:00
melifaro
93152c67c9 Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with its length)
  to prepend to mbuf.

These two changes permit routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address changes are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to a separate ether_resolve_addr() function which either
  returns an error or a fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of them have their header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
hselasky
55287fc50d Fix i386 build WITH_OFED=YES. Remove some redundant KASSERTs.
Suggested by:	kib, ian
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2015-12-04 18:20:55 +00:00
ngie
d838deab97 Fix scope of bridge_header and bridge_pcix_cap in mthca_reset(..)
They're only used in the __linux__ case

Differential Revision: https://reviews.freebsd.org/D4332
MFC after: 1 week
Reported by: cppcheck
Reviewed by: hselasky
Sponsored by: EMC / Isilon Storage Division
2015-12-04 09:01:58 +00:00
hselasky
4fe2405f36 Convert the mlxen driver to use the BUSDMA(9) APIs instead of
vtophys() when loading mbufs for transmission and reception. While at
it all pointer arithmetic and cast qualifier issues were fixed, mostly
related to transmission and reception.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
Differential Revision:	https://reviews.freebsd.org/D4284
2015-12-03 14:56:17 +00:00
hselasky
d01f6eada0 Updated the mlx4 and mlxen drivers to the latest version, v2.1.6:
- Added support for dumping the SFP EEPROM content to dmesg.
- Fixed handling of network interface capability IOCTLs.
- Fixed race when loading and unloading the mlxen driver by applying
  appropriate locking.
- Removed two unused C-files.

MFC after:	1 week
Submitted by:	Mark Bloch <markb@mellanox.com>
Sponsored by:	Mellanox Technologies
Differential Revision:	https://reviews.freebsd.org/D4283
2015-12-03 13:29:20 +00:00
hselasky
729fe703d7 Add some defines needed by the coming mlx5 infiniband support.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2015-11-24 12:11:56 +00:00
ngie
42f62a277c Don't leak work if __mlx4_register_vlan(..) fails in
mlx4_master_immediate_activate_vlan_qos(..)

MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D4203
Submitted by: Miles Olrich <miles.olrich@isilon.com>
Sponsored by: EMC / Isilon Storage Division
2015-11-19 01:08:16 +00:00
hselasky
109297af09 Fix integer to pointer of different size conversion warnings when
using GCC for 32-bit platforms. The integer size in this case is
hardcoded 64-bit while the pointer size is 32-bit.

Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2015-11-12 10:12:20 +00:00
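
The warning described in 109297af09 arises when a 64-bit integer is cast directly to a pointer on a platform with 32-bit pointers. One common way to make such conversions explicit is to go through `uintptr_t`; the sketch below shows that idiom. The helper names are hypothetical, and the commit message does not say which technique the actual change uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a 64-bit integer handle to a pointer. The intermediate
 * uintptr_t cast narrows the value to pointer width first, so the
 * compiler does not see an integer-to-pointer cast of different size.
 */
static inline void *
u64_to_ptr_sketch(uint64_t v)
{
	return ((void *)(uintptr_t)v);
}

/* Widen a pointer to a 64-bit integer, again via uintptr_t. */
static inline uint64_t
ptr_to_u64_sketch(const void *p)
{
	return ((uint64_t)(uintptr_t)p);
}
```

A round trip through both helpers yields the original pointer on 32-bit and 64-bit platforms alike, provided the integer originally came from a pointer.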
hselasky
ee5ff7b1cd Fix print formatting compile warnings for Sparc64 and PowerPC platforms.
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2015-11-12 09:56:25 +00:00
hselasky
97b71ce545 Finish process of moving the LinuxKPI module into the default kernel build.
- Move all files related to the LinuxKPI into sys/compat/linuxkpi and
  its subfolders.
- Update sys/conf/files and some Makefiles to use new file locations.
- Added description of COMPAT_LINUXKPI to sys/conf/NOTES which in turn
  adds the LinuxKPI to all LINT builds.
- The LinuxKPI can be added to the kernel by setting the
  COMPAT_LINUXKPI option. The OFED kernel option no longer builds the
  LinuxKPI into the kernel. This was done to keep the build rules for
  the LinuxKPI in sys/conf/files simple.
- Extend the LinuxKPI module to include support for USB by moving the
  Linux USB compat from usb.ko to linuxkpi.ko.
- Bump the FreeBSD_version.
- A universe kernel build has been done.

Reviewed by:	np @ (cxgb and cxgbe related changes only)
Sponsored by:	Mellanox Technologies
2015-10-29 08:28:39 +00:00
hselasky
e32aa14698 Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-27 12:21:15 +00:00
hselasky
a165c14ffd Add support for binding IRQs to CPUs in the LinuxKPI. The new function
added is for BSD only and does not exist in Linux.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-10-26 13:28:34 +00:00
hselasky
e0346e7915 Build fix for MIPS.
Sponsored by:	Mellanox Technologies
2015-10-26 09:34:43 +00:00
hselasky
e07514e0fe Build fix for non-i386 and non-amd64 platforms.
Sponsored by:	Mellanox Technologies
2015-10-23 14:52:05 +00:00
hselasky
1025a58857 Rename linuxapi[.ko] into linuxkpi[.ko], to reflect that it is a
kernel programming interface module, KPI, to avoid confusion with the
existing Linux userspace binary compatibility shims. Bump the
FreeBSD_version number.

Reviewed by:	np @
Suggested by:	dumbbell @
Sponsored by:	Mellanox Technologies
2015-10-22 09:50:45 +00:00
hselasky
06414daf3a Remove all comments deriving from Linux.
Minor rework of ilog2() function.

Suggested by:	emaste @
Sponsored by:	Mellanox Technologies
2015-10-21 09:37:34 +00:00
hselasky
093c8bb583 Remove all comments deriving from Linux. Style file for FreeBSD.
Suggested by:	emaste @
Sponsored by:	Mellanox Technologies
2015-10-21 08:51:49 +00:00
hselasky
306eaf5312 Reimplement header file, remove all comments deriving from Linux and
update copyright to 2-clause BSD.

Suggested by:	emaste @
Sponsored by:	Mellanox Technologies
2015-10-21 07:59:46 +00:00
hselasky
623abddf22 Move location of RCS keyword according to style.
Suggested by:	jhb @
Sponsored by:	Mellanox Technologies
2015-10-20 19:08:26 +00:00
hselasky
9464c2c2e1 Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-20 16:02:11 +00:00
hselasky
046ed6409c Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-20 15:28:02 +00:00
hselasky
a038e15ff5 Add missing dash to copyright clause.
Sponsored by:	Mellanox Technologies
2015-10-20 11:42:00 +00:00
hselasky
f8fd14ec87 Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-20 11:40:04 +00:00
hselasky
7c61600dca Merge LinuxKPI changes from DragonflyBSD:
- Remove redundant NBLONG macro and use BIT_WORD()
  and BIT_MASK() instead.
- Correctly define BIT_MASK() according to Linux and
  update all users of this macro.
- Add missing GENMASK() macro.
- Remove all comments deriving from Linux.

Sponsored by:	Mellanox Technologies
2015-10-20 09:13:35 +00:00
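
For readers unfamiliar with the macros named in 7c61600dca, here is a userspace sketch of the conventional Linux-style definitions. It is written from the usual semantics of these macros rather than copied from the LinuxKPI headers, and the `_sketch` helpers are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Index of the unsigned long that holds bit number nr. */
#define BIT_WORD(nr)	((nr) / BITS_PER_LONG)
/* Mask selecting bit nr within its word. */
#define BIT_MASK(nr)	(1UL << ((nr) % BITS_PER_LONG))
/* Mask with bits l..h (inclusive) set, h >= l. */
#define GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/* Set bit nr in a bitmap stored as an array of unsigned long. */
static inline void
set_bit_sketch(unsigned long *map, unsigned int nr)
{
	map[BIT_WORD(nr)] |= BIT_MASK(nr);
}

/* Return non-zero if bit nr is set in the bitmap. */
static inline int
test_bit_sketch(const unsigned long *map, unsigned int nr)
{
	return ((map[BIT_WORD(nr)] & BIT_MASK(nr)) != 0);
}
```

With these definitions, `BIT_MASK()` yields a mask for a single bit inside one word, and `BIT_WORD()` picks the word, which is why the pair can replace a separate per-word length macro when indexing bitmaps.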
hselasky
0f9392fd20 The returned value from vm_fault_disable_pagefaults() must be stored
and passed to vm_fault_enable_pagefaults(). Otherwise, possible recursion on
the state can be lost.

Sponsored by:	Mellanox Technologies
Suggested by:	kib @
2015-10-19 16:03:08 +00:00
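
The save-and-restore pattern that 0f9392fd20 refers to can be illustrated with a userspace model. This is only a sketch: the single `nofaulting_sketch` variable stands in for per-thread state, and the `_sketch` functions are hypothetical analogues, not the kernel's vm_fault_disable_pagefaults() and vm_fault_enable_pagefaults().

```c
#include <assert.h>

/* Models the per-thread "page faults disabled" state. */
static int nofaulting_sketch;

/* Disable faults and return the previous state so it can be restored. */
static int
disable_pagefaults_sketch(void)
{
	int save = nofaulting_sketch;

	nofaulting_sketch = 1;
	return (save);
}

/* Restore the state that was in effect before the matching disable. */
static void
enable_pagefaults_sketch(int save)
{
	nofaulting_sketch = save;
}

/*
 * Nested use: the inner enable must not re-enable faults while the
 * outer section is still active. Returns the state seen after the
 * inner section ends.
 */
static int
nested_section_sketch(void)
{
	int outer, inner, state_after_inner;

	outer = disable_pagefaults_sketch();
	inner = disable_pagefaults_sketch();
	enable_pagefaults_sketch(inner);
	state_after_inner = nofaulting_sketch;
	enable_pagefaults_sketch(outer);
	return (state_after_inner);
}
```

If the inner section discarded the saved value and simply cleared the state, the outer section would run with faults enabled, which is the lost recursion the commit message warns about.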
hselasky
12df20ee74 Merge LinuxKPI changes from DragonflyBSD:
- Redefine DIV_ROUND_UP as a function macro taking two arguments
  instead of none.
- Implement more Linux kernel functions related to various forms
  of DELAY() and basic mathematical operations.

Sponsored by:	Mellanox Technologies
2015-10-19 12:44:41 +00:00
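
As a reminder of what a two-argument DIV_ROUND_UP looks like, the sketch below uses the conventional Linux-style formula. It is an illustration written for this listing, not the text of the LinuxKPI header, and `chunks_needed_sketch()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Integer division of n by d, rounded up; d must be positive. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Number of fixed-size chunks needed to hold the given byte count. */
static inline size_t
chunks_needed_sketch(size_t bytes, size_t chunk)
{
	assert(chunk > 0);
	return (DIV_ROUND_UP(bytes, chunk));
}
```

Note that the formula can overflow when `n + d - 1` exceeds the range of the operand type, so callers with very large values need a wider type.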
hselasky
e0d62d0664 Merge LinuxKPI changes from DragonflyBSD:
- Implement more Linux kernel functions.

Sponsored by:	Mellanox Technologies
2015-10-19 12:33:09 +00:00
hselasky
fd834c513a Merge LinuxKPI changes from DragonflyBSD:
- Define the kref structure identical to the one found in Linux.
- Update clients referring inside the kref structure.
- Implement kref_sub() for FreeBSD.

Reviewed by:	np @
Sponsored by:	Mellanox Technologies
2015-10-19 12:26:38 +00:00
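
The kref_sub() operation mentioned in fd834c513a drops several references at once and calls a release function when the count reaches zero. The following userspace sketch shows that behaviour with C11 atomics. The structure layout and all `_sketch` names are assumptions for illustration and do not reproduce the LinuxKPI implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal reference counter, modeled on the Linux kref idea. */
struct kref_sketch {
	atomic_int refcount;
};

/* Counts release callbacks so the behaviour can be observed. */
static int releases_seen_sketch;

static void
count_release_sketch(struct kref_sketch *k)
{
	(void)k;
	releases_seen_sketch++;
}

static inline void
kref_init_sketch(struct kref_sketch *k)
{
	atomic_init(&k->refcount, 1);
}

static inline void
kref_get_sketch(struct kref_sketch *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/*
 * Drop 'count' references. If that brings the counter to zero, call
 * release() and return 1; otherwise return 0.
 */
static inline int
kref_sub_sketch(struct kref_sketch *k, unsigned int count,
    void (*release)(struct kref_sketch *))
{
	assert(release != NULL);
	/* atomic_fetch_sub() returns the value before the subtraction. */
	if (atomic_fetch_sub(&k->refcount, (int)count) == (int)count) {
		release(k);
		return (1);
	}
	return (0);
}
```

Dropping a single reference is then the special case `kref_sub_sketch(k, 1, release)`.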
hselasky
c2a6426cb9 Merge LinuxKPI changes from DragonflyBSD:
- Map more Linux compiler related defines to FreeBSD ones.

Sponsored by:	Mellanox Technologies
2015-10-19 12:08:06 +00:00
hselasky
a2fecac047 Map two more Linux error return codes to FreeBSD ones.
Sponsored by:	Mellanox Technologies
2015-10-19 12:04:20 +00:00
hselasky
5daa6ae96e Implement IS_ERR_OR_NULL() function.
Sponsored by:	Mellanox Technologies
2015-10-19 12:00:52 +00:00
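
IS_ERR_OR_NULL() from 5daa6ae96e belongs to the Linux convention of encoding small negative errno values in pointers. The sketch below models that convention in userspace; it follows the commonly documented Linux semantics (errors occupy the top 4095 addresses) and is not taken from the LinuxKPI source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO	4095

/* Encode a negative errno value as a pointer. */
static inline void *
ERR_PTR(long error)
{
	return ((void *)(intptr_t)error);
}

/* Decode the errno value from an error pointer. */
static inline long
PTR_ERR(const void *ptr)
{
	return ((long)(intptr_t)ptr);
}

/* True if ptr lies in the range reserved for encoded errors. */
static inline int
IS_ERR(const void *ptr)
{
	return ((uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO);
}

/* True if ptr is NULL or an encoded error. */
static inline int
IS_ERR_OR_NULL(const void *ptr)
{
	return (ptr == NULL || IS_ERR(ptr));
}
```

A caller that receives a pointer from such an API can test it with `IS_ERR_OR_NULL()` before dereferencing, and use `PTR_ERR()` to recover the error code when the pointer is not NULL.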
hselasky
c454e1ce47 Merge LinuxKPI changes from DragonflyBSD:
- Add more list related functions and macros.
- Update the hlist_for_each_entry() macro to take one less argument.

Sponsored by:	Mellanox Technologies
2015-10-19 11:57:33 +00:00
hselasky
eca5e168a9 Merge LinuxKPI changes from DragonflyBSD:
- Reimplement ktime header file to distinguish more from Linux.
- Add new time header file to handle time related Linux functions.

Sponsored by:	Mellanox Technologies
2015-10-19 11:46:48 +00:00
hselasky
5855be3ec6 Fix compile warning.
Sponsored by:	Mellanox Technologies
2015-10-19 11:29:50 +00:00
hselasky
c2db9fa30b Merge LinuxKPI changes from DragonflyBSD:
- Reimplement math64 header file to distinguish more from Linux.

Sponsored by:	Mellanox Technologies
2015-10-19 11:16:38 +00:00
hselasky
109f18ddeb Merge LinuxKPI changes from DragonflyBSD:
- Whitespace fixes.

Sponsored by:	Mellanox Technologies
2015-10-19 11:11:15 +00:00
hselasky
bed28ab60d Merge LinuxKPI changes from DragonflyBSD:
- Avoid using PAGE_MASK, because Linux defines it differently.
  Use (PAGE_SIZE - 1) instead.
- Add support for for_each_sg_page() and sg_page_iter_dma_address().

Sponsored by:	Mellanox Technologies
2015-10-19 11:09:51 +00:00
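
The PAGE_MASK note in bed28ab60d reflects a real difference in convention: FreeBSD defines PAGE_MASK as (PAGE_SIZE - 1), selecting the offset within a page, while Linux defines it as the complement, selecting the page base. The sketch below avoids the ambiguous name entirely. The 4096-byte page size and the `_SKETCH` names are assumptions for illustration.

```c
#include <assert.h>

/* Assumed page size for this sketch only. */
#define PAGE_SIZE_SKETCH	4096UL

/* Offset of addr within its page; unambiguous on both systems. */
static inline unsigned long
page_offset_sketch(unsigned long addr)
{
	return (addr & (PAGE_SIZE_SKETCH - 1));
}

/* Address of the start of the page containing addr. */
static inline unsigned long
page_base_sketch(unsigned long addr)
{
	return (addr & ~(PAGE_SIZE_SKETCH - 1));
}
```

Writing the mask out as (PAGE_SIZE - 1) means shared code behaves the same regardless of which definition of PAGE_MASK is in scope.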
hselasky
c58bf9145b Merge LinuxKPI changes from DragonflyBSD:
- Implement schedule_timeout().

Sponsored by:	Mellanox Technologies
2015-10-19 10:57:56 +00:00
hselasky
6760765480 Merge LinuxKPI changes from DragonflyBSD:
- Implement pagefault_disable() and pagefault_enable().

Sponsored by:	Mellanox Technologies
2015-10-19 10:56:32 +00:00
hselasky
74fb822e8a Merge LinuxKPI changes from DragonflyBSD:
- Added support for multiple new Linux functions.
- Properly implement DEFINE_WAIT() and init_waitqueue_head() macros.
- Removed FreeBSD specific __wait_queue_head structure definition.

Sponsored by:	Mellanox Technologies
2015-10-19 10:54:24 +00:00
hselasky
5bb43cdb31 Merge LinuxKPI changes from DragonflyBSD:
- Some minor whitespace fixes.
- Added support for two new Linux functions.

Sponsored by:	Mellanox Technologies
2015-10-19 10:49:15 +00:00
melifaro
1e639f75e6 Fix build broken by r287861.
Spotted by:	zb
2015-09-16 15:40:08 +00:00
melifaro
493325342d Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
  ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
  to provide cached lle data.

For IPv6 the situation is more complex:
  ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
    nd6_storelladdr()

We have ip6_output() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
  * checks if lle exists, creates it if needed (similar to arpresolve())
  * performs lle state transitions (similar to arpresolve())
  * calls nd6_output_ifp() which pushes packets to link output routine along
    with running SeND/MAC hooks regardless of lle state
    (e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
  which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
  of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
  interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
  * make all nd6_output() consumers use nd6_output_ifp() instead
  * rename nd6_output[_slow]() to nd6_resolve[_slow]()
  * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
    e.g. copy L2 address to buffer instead of pushing packet towards lower
    layers
  * Make all nd6_storelladdr() users use nd6_resolve()
  * eliminate nd6_storelladdr()

The resulting callchain is the following:
  ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending a packet to a non-existing la results in ip6_<output|forward>
  -> nd6_output() -> nd6_output_lle() which returns 0.
In the new scenario the packet is propagated to <ether|whatever>_output() ->
  nd6_resolve() which will return EWOULDBLOCK, and that result
  will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D1469
2015-09-16 14:26:28 +00:00
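
The error-handling paragraph of 493325342d describes a resolve routine that returns EWOULDBLOCK when the link-layer address is not yet known, with the link output routine converting that result to 0. The userspace sketch below models only that control flow. All names are hypothetical, and the real nd6_resolve() and ether_output() take different arguments and do considerably more.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Counts frames that made it past resolution, for observation. */
static int frames_sent_sketch;

/*
 * Copy the destination link-layer address into dst, in the style of
 * arpresolve(). Returns EWOULDBLOCK if the address is not yet known
 * (in the kernel the packet would be held until resolution finishes).
 */
static int
resolve_sketch(int have_lladdr, unsigned char dst[6])
{
	if (!have_lladdr)
		return (EWOULDBLOCK);
	memset(dst, 0xaa, 6);	/* placeholder address */
	return (0);
}

/* Link output routine: resolve first, then transmit. */
static int
link_output_sketch(int have_lladdr)
{
	unsigned char dst[6];
	int error;

	error = resolve_sketch(have_lladdr, dst);
	if (error != 0)
		return (error == EWOULDBLOCK ? 0 : error);
	frames_sent_sketch++;	/* stands in for building and sending */
	return (0);
}
```

From the caller's point of view both cases return 0, which matches the behaviour the commit message describes for the old call chain.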
markj
20a6072662 Ensure that the MAD agent's delayed taskqueue is completely stopped
before proceeding. Otherwise, nothing prevents it from running after the
MAD agent struct has been freed, and this results in a use-after-free
when the task's ta_pending count is incremented in the callout handler.

MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2015-09-15 23:56:31 +00:00
jhb
61b7ea7a3d Currently the Linux character device mmap handling only supports mmap
operations that map a single page that has an associated vm_page_t.
This does not permit mapping larger regions (such as a PCI memory
BAR) and it does not permit mapping addresses beyond the top of RAM
(such as a 64-bit BAR located above the top of RAM).

Instead of using a single OBJT_DEVICE object and passing the physaddr via
the offset as a hack, create a new sglist and OBJT_SG object for each
mmap request. The requested memory attribute is applied to the object
thus affecting all pages mapped by the request.

Reviewed by:	hselasky, np
MFC after:	1 week
Sponsored by:	Chelsio
Differential Revision:	https://reviews.freebsd.org/D3386
2015-09-03 18:27:39 +00:00
np
cf4bfabada Reinstate unify_tcp_port_space and associated code that was lost during
the last OFED update (r278886).

iWARP on FreeBSD is properly integrated with the network stack and the
iWARP drivers _never_ operate out of any private TCP port-space that is
invisible to the kernel.  Instead, an iWARP connection shows up as a TCP
socket (which is what it is) fully visible to the kernel and standard
tools like netstat, sockstat, etc.
2015-08-12 22:09:58 +00:00