Commit Graph

275 Commits

Author SHA1 Message Date
hselasky
64d6d868e8 Use hardware computed Toeplitz hash for incoming flowids
Use the Toeplitz hash value as source for the flowid. This makes the
hash value more suitable for so-called hash bucket algorithms which
are used in the FreeBSD's TCP/IP stack when RSS is enabled.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-03-15 15:53:37 +00:00
hselasky
659fbd3907 Fix witness panic in the ipoib_ioctl() function when unloading the
ipoib module.

The bpfdetach() function is trying to turn off promiscious mode on the
network interface it is attached to while holding a mutex. The fix
consists of ignoring any further calls to the ipoib_ioctl() function
when the network interface is going to be detached. The ipoib_ioctl()
function might sleep.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-03-15 15:47:26 +00:00
jhb
374ffbde40 Use SI_SUB_LAST instead of SI_SUB_SMP as the "catch-all" subsystem.
Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D5515
2016-03-11 23:18:06 +00:00
hselasky
2c58c1a4c4 Whitespace fixes.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-03-04 09:07:30 +00:00
hselasky
c9e96da515 LinuxKPI list updates:
- Add some new hlist macros.
- Update existing hlist macros removing the need for a temporary
  iteration variable.
- Properly define the RCU hlist macros to be SMP safe with regard
  to RCU.
- Safe list macro arguments by adding a pair of parentheses.
- Prefix the _list_add() and _list_splice() functions with "linux"
  to reflect they are LinuxKPI internal functions.

Obtained from:	Linux
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2016-01-26 15:12:31 +00:00
np
057d736604 Fix for iWARP servers that listen on INADDR_ANY.
The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to
represent an iWARP endpoint when the connection is over TCP. For
servers the current approach is to invoke create_listen callback for
each iWARP RNIC registered with the CM. This doesn't work too well for
INADDR_ANY because a listen on any TCP socket already notifies all
hardware TOEs/RNICs of the new listener. This patch fixes the server
side of things for FreeBSD. We've tried to keep all these modifications
in the iWARP/TCP specific parts of the OFED infrastructure as much as
possible.

Submitted by:	Krishnamraju Eraparaju @ Chelsio (with design inputs from Steve Wise)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D4801
2016-01-22 23:33:34 +00:00
melifaro
e4b451498b Finish r275196: do not dereference rtentry in if_output() routines.
The only piece of information that is required is rt_flags subset.

In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags
  to check if this particular mbuf needs to be dropped (and what
  error should be returned).
Note that if_loop() will always return EHOSTUNREACH for "reject" routes
  regardless of RTF_HOST flag existence. This is due to upcoming routing
  changes where RTF_HOST value won't be available as lookup result.

All other functions require RTF_GATEWAY flag to check if they need
  to return EHOSTUNREACH instead of EHOSTDOWN error.

There are 11 places where non-zero 'struct route' is passed to if_output().
For most of the callers (forwarding, bpf, arp) does not care about exact
  error value. In fact, the only place where this result is propagated
  is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()).

Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and
  RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary
  rte flags to ro_flags. Call this function in ip_output() after looking up/
  verifying rte.

Reviewed by:	ae
2016-01-09 16:34:37 +00:00
glebius
e25e77f91d Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like
sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM
sockets, due to sendfile(2) supporting only the latter, there is a corner
case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake
of control data, albeit being stream socket.

Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY,
similar to m_demote().
2016-01-08 19:03:20 +00:00
hselasky
3d1ba99d7b Remove unused file. 2016-01-07 09:40:19 +00:00
melifaro
93152c67c9 Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating
  various link headers for given interface. Right now there is support
  for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
  Other types are planned to support more complex calculation
  (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
  to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
  (like L2 header for route w/gateway) down to the stack eliminating the
  need for other lookups. It also brings us closer to more complex scenarios
  like transparently handling MPLS nexthops and tunnel interfaces.
  Last, but not least, it removes layering violation introduced by flowtable
  code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
  record. Interface link address change are handled by re-calculating
  headers for all lles based on if_lladdr event. After these changes,
  arpresolve()/nd6_resolve() returns full pre-calculated header for
  supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
  returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
  compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
  difference is that interface source mac has to be filled by OS for
  AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
  BPF and not pollute if_output() routines. Convert BPF to pass prepend data
  via new 'struct route' mechanism. Note that it does not change
  non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
  It is not needed for ethernet anymore. The only remaining FDDI user is
  dev/pdq mostly untouched since 2007. FDDI support was eliminated from
  OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing)
  rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
  header data from that lle.

Differential Revision:	https://reviews.freebsd.org/D4102
2015-12-31 05:03:27 +00:00
hselasky
55287fc50d Fix i386 build WITH_OFED=YES. Remove some redundant KASSERTs.
Suggested by:	kib, ian
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2015-12-04 18:20:55 +00:00
ngie
d838deab97 Fix scope of bridge_header and bridge_pcix_cap in mthca_reset(..)
They're only used in the __linux__ case

Differential Revision: https://reviews.freebsd.org/D4332
MFC after: 1 week
Reported by: cppcheck
Reviewed by: hselasky
Sponsored by: EMC / Isilon Storage Division
2015-12-04 09:01:58 +00:00
hselasky
4fe2405f36 Convert the mlxen driver to use the BUSDMA(9) APIs instead of
vtophys() when loading mbufs for transmission and reception. While at
it all pointer arithmetic and cast qualifier issues were fixed, mostly
related to transmission and reception.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
Differential Revision:	https://reviews.freebsd.org/D4284
2015-12-03 14:56:17 +00:00
hselasky
d01f6eada0 Updated the mlx4 and mlxen drivers to the latest version, v2.1.6:
- Added support for dumping the SFP EEPROM content to dmesg.
- Fixed handling of network interface capability IOCTLs.
- Fixed race when loading and unloading the mlxen driver by applying
  appropriate locking.
- Removed two unused C-files.

MFC after:	1 week
Submitted by:	Mark Bloch <markb@mellanox.com>
Sponsored by:	Mellanox Technologies
Differential Revision:	https://reviews.freebsd.org/D4283
2015-12-03 13:29:20 +00:00
hselasky
729fe703d7 Add some defines needed by the coming mlx5 infiniband support.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2015-11-24 12:11:56 +00:00
ngie
42f62a277c Don't leak work if __mlx4_register_vlan(..) fails in
mlx4_master_immediate_activate_vlan_qos(..)

MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D4203
Submitted by: Miles Olrich <miles.olrich@isilon.com>
Sponsored by: EMC / Isilon Storage Division
2015-11-19 01:08:16 +00:00
hselasky
109297af09 Fix integer to pointer of different size conversion warnings when
using GCC for 32-bit platforms. The integer size in this case is
hardcoded 64-bit while the pointer size is 32-bit.

Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2015-11-12 10:12:20 +00:00
hselasky
ee5ff7b1cd Fix print formatting compile warnings for Sparc64 and PowerPC platforms.
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
2015-11-12 09:56:25 +00:00
hselasky
97b71ce545 Finish process of moving the LinuxKPI module into the default kernel build.
- Move all files related to the LinuxKPI into sys/compat/linuxkpi and
  its subfolders.
- Update sys/conf/files and some Makefiles to use new file locations.
- Added description of COMPAT_LINUXKPI to sys/conf/NOTES which in turn
  adds the LinuxKPI to all LINT builds.
- The LinuxKPI can be added to the kernel by setting the
  COMPAT_LINUXKPI option. The OFED kernel option no longer builds the
  LinuxKPI into the kernel. This was done to keep the build rules for
  the LinuxKPI in sys/conf/files simple.
- Extend the LinuxKPI module to include support for USB by moving the
  Linux USB compat from usb.ko to linuxkpi.ko.
- Bump the FreeBSD_version.
- A universe kernel build has been done.

Reviewed by:	np @ (cxgb and cxgbe related changes only)
Sponsored by:	Mellanox Technologies
2015-10-29 08:28:39 +00:00
hselasky
e32aa14698 Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-27 12:21:15 +00:00
hselasky
a165c14ffd Add support for binding IRQs to CPUs in the LinuxKPI. The new function
added is for BSD only and does not exist in Linux.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-10-26 13:28:34 +00:00
hselasky
e0346e7915 Build fix for MIPS.
Sponsored by:	Mellanox Technologies
2015-10-26 09:34:43 +00:00
hselasky
e07514e0fe Build fix for non-i386 and non-amd64 platforms.
Sponsored by:	Mellanox Technologies
2015-10-23 14:52:05 +00:00
hselasky
1025a58857 Rename linuxapi[.ko] into linuxkpi[.ko], to reflect that it is a
kernel programming interface module, KPI, to avoid confusion with the
existing Linux userspace binary compatibility shims. Bump the
FreeBSD_version number.

Reviewed by:	np @
Suggested by:	dumbbell @
Sponsored by:	Mellanox Technologies
2015-10-22 09:50:45 +00:00
hselasky
06414daf3a Remove all comments deriving from Linux.
Minor rework of ilog2() function.

Suggested by:	emaste @
Sponsored by:	Mellanox Technologies
2015-10-21 09:37:34 +00:00
hselasky
093c8bb583 Remove all comments deriving from Linux. Style file for FreeBSD.
Suggested by:	emaste @
Sponsored by:	Mellanox Technologies
2015-10-21 08:51:49 +00:00
hselasky
306eaf5312 Reimplement header file, remove all comments deriving from Linux and
update copyright to 2-clause BSD.

Suggested by:	emaste @
Sponsored by:	Mellanox Technologies
2015-10-21 07:59:46 +00:00
hselasky
623abddf22 Move location of RCS keyword according to style.
Suggested by:	jhb @
Sponsored by:	Mellanox Technologies
2015-10-20 19:08:26 +00:00
hselasky
9464c2c2e1 Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-20 16:02:11 +00:00
hselasky
046ed6409c Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-20 15:28:02 +00:00
hselasky
a038e15ff5 Add missing dash to copyright clause.
Sponsored by:	Mellanox Technologies
2015-10-20 11:42:00 +00:00
hselasky
f8fd14ec87 Add missing FreeBSD RCS keyword and SVN properties.
Sponsored by:	Mellanox Technologies
2015-10-20 11:40:04 +00:00
hselasky
7c61600dca Merge LinuxKPI changes from DragonflyBSD:
- Remove redundant NBLONG macro and use BIT_WORD()
  and BIT_MASK() instead.
- Correctly define BIT_MASK() according to Linux and
  update all users of this macro.
- Add missing GENMASK() macro.
- Remove all comments deriving from Linux.

Sponsored by:	Mellanox Technologies
2015-10-20 09:13:35 +00:00
hselasky
0f9392fd20 The returned value from vm_fault_disable_pagefaults() must be stored
and passed to vm_fault_enable_pagefaults(). Else possible recursion on
the state can be lost.

Sponsored by:	Mellanox Technologies
Suggested by:	kib @
2015-10-19 16:03:08 +00:00
hselasky
12df20ee74 Merge LinuxKPI changes from DragonflyBSD:
- Redefine DIV_ROUND_UP as a function macro taking two arguments
  instead of none.
- Implement more Linux kernel functions related to various forms
  of DELAY() and basic mathematical operations.

Sponsored by:	Mellanox Technologies
2015-10-19 12:44:41 +00:00
hselasky
e0d62d0664 Merge LinuxKPI changes from DragonflyBSD:
- Implement more Linux kernel functions.

Sponsored by:	Mellanox Technologies
2015-10-19 12:33:09 +00:00
hselasky
fd834c513a Merge LinuxKPI changes from DragonflyBSD:
- Define the kref structure identical to the one found in Linux.
- Update clients referring inside the kref structure.
- Implement kref_sub() for FreeBSD.

Reviewed by:	np @
Sponsored by:	Mellanox Technologies
2015-10-19 12:26:38 +00:00
hselasky
c2a6426cb9 Merge LinuxKPI changes from DragonflyBSD:
- Map more Linux compiler related defines to FreeBSD ones.

Sponsored by:	Mellanox Technologies
2015-10-19 12:08:06 +00:00
hselasky
a2fecac047 Map two more Linux error return codes to FreeBSD ones.
Sponsored by:	Mellanox Technologies
2015-10-19 12:04:20 +00:00
hselasky
5daa6ae96e Implement IS_ERR_OR_NULL() function.
Sponsored by:	Mellanox Technologies
2015-10-19 12:00:52 +00:00
hselasky
c454e1ce47 Merge LinuxKPI changes from DragonflyBSD:
- Add more list related functions and macros.
- Update the hlist_for_each_entry() macro to take one less argument.

Sponsored by:	Mellanox Technologies
2015-10-19 11:57:33 +00:00
hselasky
eca5e168a9 Merge LinuxKPI changes from DragonflyBSD:
- Reimplement ktime header file to distinguish more from Linux.
- Add new time header file to handle time related Linux functions.

Sponsored by:	Mellanox Technologies
2015-10-19 11:46:48 +00:00
hselasky
5855be3ec6 Fix compile warning.
Sponsored by:	Mellanox Technologies
2015-10-19 11:29:50 +00:00
hselasky
c2db9fa30b Merge LinuxKPI changes from DragonflyBSD:
- Reimplement math64 header file to distinguish more from Linux.

Sponsored by:	Mellanox Technologies
2015-10-19 11:16:38 +00:00
hselasky
109f18ddeb Merge LinuxKPI changes from DragonflyBSD:
- Whitespace fixes.

Sponsored by:	Mellanox Technologies
2015-10-19 11:11:15 +00:00
hselasky
bed28ab60d Merge LinuxKPI changes from DragonflyBSD:
- Avoid using PAGE_MASK, because Linux defines it differently.
  Use (PAGE_SIZE - 1) instead.
- Add support for for_each_sg_page() and sg_page_iter_dma_address().

Sponsored by:	Mellanox Technologies
2015-10-19 11:09:51 +00:00
hselasky
c58bf9145b Merge LinuxKPI changes from DragonflyBSD:
- Implement schedule_timeout().

Sponsored by:	Mellanox Technologies
2015-10-19 10:57:56 +00:00
hselasky
6760765480 Merge LinuxKPI changes from DragonflyBSD:
- Implement pagefault_disable() and pagefault_enable().

Sponsored by:	Mellanox Technologies
2015-10-19 10:56:32 +00:00
hselasky
74fb822e8a Merge LinuxKPI changes from DragonflyBSD:
- Added support for multiple new Linux functions.
- Properly implement DEFINE_WAIT() and init_waitqueue_head() macros.
- Removed FreeBSD specific __wait_queue_head structure definition.

Sponsored by:	Mellanox Technologies
2015-10-19 10:54:24 +00:00
hselasky
5bb43cdb31 Merge LinuxKPI changes from DragonflyBSD:
- Some minor whitespace fixes.
- Added support for two new Linux functions.

Sponsored by:	Mellanox Technologies
2015-10-19 10:49:15 +00:00
melifaro
1e639f75e6 Fix build broken by r287861.
Spotted by:	zb
2015-09-16 15:40:08 +00:00
melifaro
493325342d Simplify the way of attaching IPv6 link-layer header.
Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
  ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
  to provide cached lle data.

For IPv6 situation is more complex:
  ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
    nd6_storelladdr()

We have ip6_ouput() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
  * checks if lle exists, creates it if needed (similar to arpresolve())
  * performes lle state transitions (similar to arpresolve())
  * calls nd6_output_ifp() which pushes packets to link output routine along
    with running SeND/MAC hooks regardless of lle state
    (e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
  which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
  of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
  interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
  * make all nd6_output() consumers use nd6_output_ifp() instead
  * rename nd6_output[_slow]() to nd6_resolve_[slow]()
  * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
    e.g. copy L2 address to buffer instead of pushing packet towards lower
    layers
  * Make all nd6_storelladdr() users use nd6_resolve()
  * eliminate nd6_storelladdr()

The resulting callchain is the following:
  ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
  -> nd6_output() -> nd6_output _lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
  nd6_resolve() which will return EWOULDBLOCK, and that result
  will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D1469
2015-09-16 14:26:28 +00:00
markj
20a6072662 Ensure that the MAD agent's delayed taskqueue is completely stopped
before proceeding. Otherwise, nothing prevents it from running after the
MAD agent struct has been been freed, and this results in a use-after-free
when the task's ta_pending count is incremented in the callout handler.

MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2015-09-15 23:56:31 +00:00
jhb
61b7ea7a3d Currently the Linux character device mmap handling only supports mmap
operations that map a single page that has an associated vm_page_t.
This does not permit mapping larger regions (such as a PCI memory
BAR) and it does not permit mapping addresses beyond the top of RAM
(such as a 64-bit BAR located above the top of RAM).

Instead of using a single OBJT_DEVICE object and passing the physaddr via
the offset as a hack, create a new sglist and OBJT_SG object for each
mmap request. The requested memory attribute is applied to the object
thus affecting all pages mapped by the request.

Reviewed by:	hselasky, np
MFC after:	1 week
Sponsored by:	Chelsio
Differential Revision:	https://reviews.freebsd.org/D3386
2015-09-03 18:27:39 +00:00
np
cf4bfabada Reinstate unify_tcp_port_space and associated code that was lost during
the last OFED update (r278886).

iWARP on FreeBSD is properly integrated with the network stack and the
iWARP drivers _never_ operate out of any private TCP port-space that is
invisible to the kernel.  Instead, an iWARP connection shows up as a TCP
socket (which is what it is) fully visible to the kernel and standard
tools like netstat, sockstat, etc.
2015-08-12 22:09:58 +00:00
markj
40035e133b ipv4_is_zeronet() and ipv4_is_loopback() expect an address in network
order, but IN_ZERONET and IN_LOOPBACK expect it in host order.

Submitted by:	Tao Liu <Tao.Liu@isilon.com>
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-08-07 18:30:11 +00:00
hselasky
8375b98de4 Avoid calling into the random subsystem before it is initialized.
Sponsored by:	Mellanox Technologies
2015-08-04 09:45:10 +00:00
markj
ca7adff0b8 ib mad: fix an incorrect use of list_for_each_entry
In tf_dequeue(), if we reach the end of the list without finding a
non-cancelled element, "tmp" will be a pointer into the list head, so the
tmp->canceled check is bogus. Use a flag instead.

Submitted by:	Tao Liu <Tao.Liu@isilon.com>
Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3244
2015-07-30 18:28:37 +00:00
hselasky
81b3f9b45b Fix broken implementation of "kvasprintf()" function by adding missing
kmalloc() call. Make function global instead of static inline to fix
compiler warnings about passing variable argument lists to inline
functions.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-07-03 11:16:20 +00:00
mjg
a5a3a94b02 fd: make rights a mandatory argument to fget_unlocked 2015-06-16 09:52:36 +00:00
mjg
d7bc9285a6 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
glebius
30a2ce2101 Add SIOCGI2C ioctl support to the driver. Would work only on ConnectX-3
with fresh firmware. The low level code is based on code provided by
Mellanox.

Thanks to Mellanox and their distributor Must (http://mustcompany.ru)
for providing hardware.

In collaboration with:	Andre Melkoumian <andre mellanox.com>
Reviewed by:		hselasky
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2015-05-27 13:42:28 +00:00
jkim
318c4f97e6 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
hselasky
b9ba488f00 Apply proper locking when iterating the multicast addresses and add a
missing check for NULL from a non-blocking "kzalloc()" function call.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
Found by:	glebius @
2015-05-12 11:52:34 +00:00
markj
9a6d755918 msecs_to_jiffies() is implemented using tvtohz(9), which always returns a
positive value since it adds the current tick to its result. This differs
from the behaviour in Linux, whose implementation does not add the extra
tick, so subtract the extra tick in the OFED compat layer implementation.
This addresses some incorrect handling of IB MAD timeouts, since some IB
code depends on msecs_to_jiffies(0) returning 0.

MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-05-10 22:21:00 +00:00
markj
ba9d713de7 find_next_bit() and find_next_zero_bit(): if the caller-specified offset
lies within the last block of the bit set and no bits are set beyond the
offset, terminate the search immediately instead of continuing as though
there are further blocks in the set and subsequently returning an incorrect
result.

MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-05-10 22:04:42 +00:00
markj
713ce9cc68 Don't drop the idr lock before verifying that the newly-inserted element
is present in the tree. Otherwise there exists a window during which the
element could be removed by another thread, triggering an incorrect
assertion failure.

Reviewed by:	jeff
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-05-02 00:26:38 +00:00
mjg
22da590f11 fd: remove filedesc argument from fdclose
Just accept a thread instead. This makes it consistent with fdalloc.

No functional changes.
2015-04-11 15:40:28 +00:00
hselasky
e18a30075e Fix variable casting:
- Jiffies or ticks in FreeBSD have integer type and are not long.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-03-27 19:08:11 +00:00
hselasky
694b75af4d Fixes for the LinuxAPI completion wrappers:
- make sure the timeout computations are always above zero by using
the existing "linux_timer_jiffies_until()" function. Negative timeouts
can result in undefined behaviour.
- declare all completion functions like external symbols and move the
code to the LinuxAPI kernel module.
- add a proper prefix to all LinuxAPI kernel functions to avoid
namespace collision with other parts of the FreeBSD kernel.
- clean up header file inclusions in the linux/completion.h, linux/in.h
and linux/fs.h header files.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-03-27 16:16:23 +00:00
hselasky
0a348b9a8c Add missing void pointer argument to SYSINIT() functions.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2015-03-18 10:50:10 +00:00
hselasky
47d10f1bcb Fix problems about 32-bit ticks wraparound and unsigned long
conversion:
- The linux compat API layer casts the ticks to unsigned long which
might cause problems when the ticks value is negative.
- Guard against already expired ticks values, by checking if the
passed expiry tick is already elapsed.
- While at it avoid referring the address of an inlined function.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2015-03-18 10:49:17 +00:00
hselasky
58cb7b5be4 Declare missing symbol and inline macro which is only used once.
MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
Submitted by:	glebius@
2015-03-18 08:46:08 +00:00
hselasky
de871211fe Factor out mbuf hashing code from LAGG driver so that other network
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.

Differential Revision:	https://reviews.freebsd.org/D1987
Sponsored by:		Mellanox Technologies
MFC after:		1 month
2015-03-11 16:02:24 +00:00
hselasky
e09330b6b5 Ensure setting promiscious mode when a network interface is up, is
always non-blocking by not locking a SX type of mutex.

Sponsored by:	Mellanox Technologies
MFC after:	3 days
2015-03-10 21:17:10 +00:00
hselasky
e5a77689c2 Define PTR_ALIGN() macro which will be needed coming Mellanox driver
releases.

Sponsored by:	Mellanox Technologies
MFC after:	3 days
2015-03-04 09:58:39 +00:00
hselasky
781f97cbbc Updates for the Mellanox ethernet driver
> List of fixes:
  * use correct format for GID printouts
  * double array indexing
  * spelling in printouts
  * void pointer arithmetic
  * allow more receive rings
  * correct maximum number of transmit rings
  * use "const" instead of "static" for constants
  * check for invalid VLAN tags
  * check for lack of IRQ resources
> Added more hardware specific defines
> Added more verbose printouts of firmware status codes

Sponsored by:	Mellanox Technologies
MFC after:	3 days
2015-03-04 09:30:03 +00:00
hselasky
403694c32d Macro fixes:
- Add missing "order_base_2()" macro.
- Fix BUILD_BUG_ON() macro.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2015-02-23 12:54:46 +00:00
bz
77d3bd281e Try to unbreak NOIP and NOINET6 LINT builds after r278886
by placing appropriate #ifdefs around otherwise unused variables
or sections with functions called which are not available without
IPv6 support in the kernel.
2015-02-19 11:48:00 +00:00
mjg
0a219ba739 filedesc: simplify fget_unlocked & friends
Introduce fget_fcntl which performs appropriate checks when needed.
This removes a branch from fget_unlocked.

Introduce fget_mmap dealing with cap_rights_to_vmprot conversion.
This removes a branch from _fget.

Modify fget_unlocked to pass sequence counter to interested callers so
that they can perform their own checks and make sure the result was
otained from stable & current state.

Reviewed by:	silence on -hackers
2015-02-17 23:54:06 +00:00
hselasky
5c330fb03f Fix compilation of LINT-NOINET kernel target after r278886.
Sponsored by:	Mellanox Technologies
MFC after:	1 month
2015-02-17 21:59:15 +00:00
glebius
557c602e55 Globally enable -fms-extensions when building kernel with gcc, and remove
this option from all modules that enable it theirselves.
  In C mode -fms-extensions option enables anonymous structs and unions,
allowing us to use this C11 feature in kernel. Of course, clang supports
it without any extra options.

Reviewed by:	dim
2015-02-17 19:27:14 +00:00
hselasky
9bb7e2d4a1 Fix compilation of the SDP driver and a compile warning after r278886.
Also fix the kernel build rule for mlx4_exp.c.
This fixes the LINT kernel target for amd64.

Sponsored by:	Mellanox Technologies
MFC after:	1 month
2015-02-17 10:00:15 +00:00
hselasky
3eb2cab314 Fix compilation when DEBUG is defined.
Sponsored by:	Mellanox Technologies
MFC after:	1 month
2015-02-17 08:57:36 +00:00
hselasky
5dbc43f9a5 Update the infiniband stack to Mellanox's OFED version 2.1.
Highlights:
 - Multiple verbs API updates
 - Support for RoCE, RDMA over ethernet

All hardware drivers depending on the common infiniband stack has been
updated aswell.

Discussed with:	np @
Sponsored by:	Mellanox Technologies
MFC after:	1 month
2015-02-17 08:40:27 +00:00
hselasky
b5f65ef867 Define standard formatting strings to print GIDs
in a separate header file.

Sponsored by:	Mellanox Technologies
MFC after:	3 days
2015-02-16 21:26:16 +00:00
hselasky
c3df62e094 The kasprintf() function cannot be inlined due to using a variable
number of arguments. Move it to a C-file in the linuxapi module to
make the function usable.

Sponsored by:	Mellanox Technologies
MFC after:	3 days
2015-02-16 21:22:56 +00:00
hselasky
c01ffad8fc The "frag_info" pointer is already pointing to an array index.
Don't index twice.

Sponsored by:	Mellanox Technologies
MFC after:	3 days
2015-02-16 17:05:59 +00:00
hselasky
ee4071c1c9 Add more functions to the Linux kernel compatibility layer. Add some
missing includes which are needed when the header files are not
included in a particular order.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2015-02-13 16:35:12 +00:00
np
90487f1be9 Fix bug in idr_pre_get where it doesn't handle 'need' correctly.
Obtained from:	Chelsio Communications' internal repository.
2015-02-02 23:41:43 +00:00
hselasky
c0aba3b50d Revert for r277213:
FreeBSD developers need more time to review patches in the surrounding
areas like the TCP stack which are using MPSAFE callouts to restore
distribution of callouts on multiple CPUs.

Bump the __FreeBSD_version instead of reverting it.

Suggested by:		kmacy, adrian, glebius and kib
Differential Revision:	https://reviews.freebsd.org/D1438
2015-01-22 11:12:42 +00:00
hselasky
551300d112 Add missing linuxapi module dependencies and always use the FreeBSD
"MODULE_VERSION" macro definition. Remove the redefinition of the
"MODULE_VERSION" macro from the Linux kernel compatibility API.

MFC after:	1 month
Reported by:	np@
Sponsored by:	Mellanox Technologies
2015-01-19 21:53:00 +00:00
hselasky
b4271ae0c2 Add more functions to the Linux kernel compatibility layer. Add some
missing includes which are needed when the header files are not
included in a particular order.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2015-01-19 20:39:48 +00:00
hselasky
b7da41a302 Start importing the basic OFED linux compatibility layer changes made
by dumbbell@ to be able to compile this layer as a dependency module.
Clean up some Makefiles and remove the no longer used OFED define.
Currently only i386 and amd64 targets are supported.

MFC after:		1 month
Sponsored by:		Mellanox Technologies
2015-01-17 16:36:39 +00:00
np
badd58cf3b Use parentheses instead of close proximity to ensure layer + 1 is evaluated
before the rest of the expression.
2015-01-16 02:20:24 +00:00
hselasky
a9eed96a6b Major callout subsystem cleanup and rewrite:
- Close a migration race where callout_reset() failed to set the
  CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by
  spinlocks.
- Switching the callout CPU number cannot always be done on a
  per-callout basis. See the updated timeout(9) manual page for more
  information.
- The timeout(9) manual page has been updated to reflect how all the
  functions inside the callout API are working. The manual page has
  been made function oriented to make it easier to deduce how each of
  the functions making up the callout API are working without having
  to first read the whole manual page. Group all functions into a
  handful of sections which should give a quick top-level overview
  when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality has been removed
  to reduce the complexity in the callout code and to avoid problems
  about atomically stopping callouts via callout_stop(). If someone
  needs it, it can be re-added. From my quick grep there are no
  CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been
  added. See the updated timeout(9) manual page for a complete
  description.
- Update the callout clients in the "kern/" folder to use the callout
  API properly, like cv_timedwait(). Previously there was some custom
  sleepqueue code in the callout subsystem, which has been removed,
  because we now allow callouts to be protected by spinlocks. This
  allows us to tear down the callout like done with regular mutexes,
  and a "td_slpmutex" has been added to "struct thread" to atomically
  teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and
  "SWT_SLEEPQTIMO" states can now be completely removed. Currently
  they are marked as available and will be cleaned up in a follow up
  commit.
- Bump the __FreeBSD_version to indicate kernel modules need
  recompilation.
- There has been several reports that this patch "seems to squash a
  serious bug leading to a callout timeout and panic".

Kernel build testing:	all architectures were built
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D1438
Sponsored by:		Mellanox Technologies
Reviewed by:		jhb, adrian, sbruno and emaste
2015-01-15 15:32:30 +00:00
hselasky
ced5b3dac1 Don't mask the IP-address when doing multicast IP over infiniband.
PR:		196631
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2015-01-09 06:39:07 +00:00
hselasky
51aafa4f2b Use the M_SIZE() macro when possible.
MFC after:	3 days
Suggested by:	rwatson@
2015-01-08 14:58:54 +00:00
hselasky
ff9d81bf5b Fixes and updates for the Linux compatibility layer:
- Remove unsupported "bus" field from "struct pci_dev".
- Fix logic inside "pci_enable_msix()" when the number of allocated
  interrupts are less than the number of available interrupts.
- Update header files included from "list.h".
- Ensure that "idr_destroy()" removes all entries before destroying
  the IDR root node(s).
- Set the "device->release" function so that we don't leak memory at
  device destruction.
- Use FreeBSD's "log()" function for certain debug printouts.
- Put parenthesis around arguments inside the min, max, min_t and max_t macros.
- Make sure we don't leak file descriptors by dropping the extra file
  reference counts done by the FreeBSD kernel when calling falloc()
  and fget_unlocked().

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-01-06 10:02:14 +00:00
hselasky
13d48a81ec Make sure callbacks being freed are not pending when the
"mlx4_en_deactivate_cq()" function returns.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-12-11 10:47:50 +00:00