334 Commits

Author SHA1 Message Date
hselasky
551300d112 Add missing linuxapi module dependencies and always use the FreeBSD
"MODULE_VERSION" macro definition. Remove the redefinition of the
"MODULE_VERSION" macro from the Linux kernel compatibility API.

MFC after:	1 month
Reported by:	np@
Sponsored by:	Mellanox Technologies
2015-01-19 21:53:00 +00:00
hselasky
b4271ae0c2 Add more functions to the Linux kernel compatibility layer. Add some
missing includes which are needed when the header files are not
included in a particular order.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2015-01-19 20:39:48 +00:00
hselasky
b7da41a302 Start importing the basic OFED linux compatibility layer changes made
by dumbbell@ to be able to compile this layer as a dependency module.
Clean up some Makefiles and remove the no longer used OFED define.
Currently only i386 and amd64 targets are supported.

MFC after:		1 month
Sponsored by:		Mellanox Technologies
2015-01-17 16:36:39 +00:00
np
badd58cf3b Use parentheses instead of close proximity to ensure layer + 1 is evaluated
before the rest of the expression.
2015-01-16 02:20:24 +00:00
hselasky
a9eed96a6b Major callout subsystem cleanup and rewrite:
- Close a migration race where callout_reset() failed to set the
  CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by
  spinlocks.
- Switching the callout CPU number cannot always be done on a
  per-callout basis. See the updated timeout(9) manual page for more
  information.
- The timeout(9) manual page has been updated to reflect how all the
  functions inside the callout API are working. The manual page has
  been made function oriented to make it easier to deduce how each of
  the functions making up the callout API are working without having
  to first read the whole manual page. Group all functions into a
  handful of sections which should give a quick top-level overview
  when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality has been removed
  to reduce the complexity in the callout code and to avoid problems
  about atomically stopping callouts via callout_stop(). If someone
  needs it, it can be re-added. From my quick grep there are no
  CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been
  added. See the updated timeout(9) manual page for a complete
  description.
- Update the callout clients in the "kern/" folder to use the callout
  API properly, like cv_timedwait(). Previously there was some custom
  sleepqueue code in the callout subsystem, which has been removed,
  because we now allow callouts to be protected by spinlocks. This
  allows us to tear down the callout like done with regular mutexes,
  and a "td_slpmutex" has been added to "struct thread" to atomically
  teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and
  "SWT_SLEEPQTIMO" states can now be completely removed. Currently
  they are marked as available and will be cleaned up in a follow up
  commit.
- Bump the __FreeBSD_version to indicate kernel modules need
  recompilation.
- There has been several reports that this patch "seems to squash a
  serious bug leading to a callout timeout and panic".

Kernel build testing:	all architectures were built
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D1438
Sponsored by:		Mellanox Technologies
Reviewed by:		jhb, adrian, sbruno and emaste
2015-01-15 15:32:30 +00:00
hselasky
ced5b3dac1 Don't mask the IP-address when doing multicast IP over infiniband.
PR:		196631
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2015-01-09 06:39:07 +00:00
hselasky
51aafa4f2b Use the M_SIZE() macro when possible.
MFC after:	3 days
Suggested by:	rwatson@
2015-01-08 14:58:54 +00:00
hselasky
ff9d81bf5b Fixes and updates for the Linux compatibility layer:
- Remove unsupported "bus" field from "struct pci_dev".
- Fix logic inside "pci_enable_msix()" when the number of allocated
  interrupts are less than the number of available interrupts.
- Update header files included from "list.h".
- Ensure that "idr_destroy()" removes all entries before destroying
  the IDR root node(s).
- Set the "device->release" function so that we don't leak memory at
  device destruction.
- Use FreeBSD's "log()" function for certain debug printouts.
- Put parenthesis around arguments inside the min, max, min_t and max_t macros.
- Make sure we don't leak file descriptors by dropping the extra file
  reference counts done by the FreeBSD kernel when calling falloc()
  and fget_unlocked().

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2015-01-06 10:02:14 +00:00
hselasky
13d48a81ec Make sure callbacks being freed are not pending when the
"mlx4_en_deactivate_cq()" function returns.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-12-11 10:47:50 +00:00
hselasky
07f79e99bf Move OFED init a bit earlier so that PXE boot works.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-12-09 08:56:01 +00:00
rodrigc
ea02e38701 Use CURVNET macros inside inet_get_local_port_range() function.
Without this fix, a kernel with VIMAGE + Infiniband will panic on bootup.

Certain necessary #include statements require LIST_HEAD.
Add these includes to ofed/include/linux/list.h, because
LIST_HEAD is specifically overridden in this file.

PR: 191468
Differential Revision: D1279
Reviewed by: hselasky
2014-12-08 07:25:59 +00:00
hselasky
12fec3618b Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
glebius
9cadf1b974 Merge from projects/sendfile: extend protocols API to support
sending not ready data:
o Add new flag to pru_send() flags - PRUS_NOTREADY.
o Add new protocol method pru_ready().

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-30 13:24:21 +00:00
melifaro
95c680b9a3 Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr -
return lle flags IFF needed.
Do not pass rte to arpresolve - pass is_gateway flag instead.
2014-11-27 23:06:25 +00:00
glebius
c0b38b545a In preparation of merging projects/sendfile, transform bare access to
sb_cc member of struct sockbuf to a couple of inline functions:

sbavail() and sbused()

Right now they are equal, but once notion of "not ready socket buffer data",
will be checked in, they are going to be different.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-12 09:57:15 +00:00
hselasky
b131051a77 Fix compile warning by removing unused variable.
MFC:		3 days
Sponsored by:	Mellanox Technologies
2014-10-30 16:57:56 +00:00
hselasky
9d712aa6b0 Update the network interface baudrate integer according to the actual
line rate.

Submitted by:	jhb @
MFC after:	1 week
2014-10-24 16:39:01 +00:00
hselasky
49c137f7be Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
hselasky
11046e8048 Update the OFED Linux compatibility layer and
Mellanox hardware driver(s):

- Properly name an inclusion guard
- Fix compile warnings regarding unsigned enums
- Add two new sysctl nodes
- Remove all empty linux header files
- Make an error printout more verbose
- Use "mod_delayed_work()" instead of
  cancelling and starting a timeout.
- Implement more Linux scatterlist
  functions.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-15 13:40:29 +00:00
hselasky
2c5c8aef35 Fix compile warning when compiling with GCC.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-07 10:04:25 +00:00
glebius
7864ab9953 Mechanically convert to if_inc_counter(). 2014-09-27 20:39:24 +00:00
hselasky
660eeebe34 Update code to use new network counter API.
Fix some minor compile warnings while at it.

Sponsored by:	Mellanox Technologies
Suggested by:	glebius@
MFC after:	1 week
2014-09-24 08:28:34 +00:00
hselasky
512a43f91c Hardware driver update from Mellanox Technologies, including:
- improved performance
 - better stability
 - new features
 - bugfixes

Supported HCAs:
 - ConnectX-2
 - ConnectX-3
 - ConnectX-3 Pro

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-23 12:37:01 +00:00
jhb
8f082668d0 Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
  various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
  for each file and then convert that to a kinfo_ofile structure rather than
  keeping a second, different set of code that directly manipulates
  type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision:	https://reviews.freebsd.org/D775
Reviewed by:	kib, glebius (earlier version)
2014-09-22 16:20:47 +00:00
hselasky
bdacf9ba4d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
hselasky
727760a4e4 Revert r271504. A new patch to solve this issue will be made.
Suggested by:	adrian @
2014-09-13 20:52:01 +00:00
hselasky
3d04a989df Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-09-13 08:26:09 +00:00
jhb
4cd91e9d81 Fix various issues with invalid file operations:
- Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
  and invfo_kqfilter() for use by file types that do not support the
  respective operations.  Home-grown versions of invfo_poll() were
  universally broken (they returned an errno value, invfo_poll()
  uses poll_no_poll() to return an appropriate event mask).  Home-grown
  ioctl routines also tended to return an incorrect errno (invfo_ioctl
  returns ENOTTY).
- Use the invfo_*() functions instead of local versions for
  unsupported file operations.
- Reorder fileops members to match the order in the structure definition
  to make it easier to spot missing members.
- Add several missing methods to linuxfileops used by the OFED shim
  layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat().  Most
  of these used invfo_*(), but a dummy fo_stat() implementation was
  added.
2014-09-12 21:29:10 +00:00
bz
5fa46aaa74 Forward declare struct kiocb, which is only used for an unsued function
argument but not actually defined anywhere.

This fixes the compile complaining about
"declaration of 'struct kiocb' will not be visible outside of this function".

MFC after:	2 weeks
X-MFC with:	whatever changed caused the breakage ;-)
2014-08-29 14:47:05 +00:00
hselasky
387f5b6c1a - Update the OFED Linux Emulation layer as a preparation for a
hardware driver update from Mellanox Technologies.
- Remove empty files from the OFED Linux Emulation layer.
- Fix compile warnings related to printf() and the "%lld" and "%llx"
format specifiers.
- Add some missing 2-clause BSD copyrights.
- Add "Mellanox Technologies, Ltd." to list of copyright holders.
- Add some new compatibility files.
- Fix order of uninit in the mlx4ib module to avoid crash at unload
using the new module_exit_order() function.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-08-27 13:21:53 +00:00
royger
925b20548e pci: make MSI(-X) enable and disable methods of the PCI bus
Make the functions pci_disable_msi, pci_enable_msi and pci_enable_msix
methods of the newbus PCI bus. This code should not include any
functional change.

Sponsored by: Citrix Systems R&D
Reviewed by: imp, jhb
Differential Revision: https://reviews.freebsd.org/D354

dev/pci/pci.c:
 - Convert the mentioned functions to newbus methods.
 - Fix the callers of the converted functions.

sys/dev/pci/pci_private.h:
dev/pci/pci_if.m:
 - Declare the new methods.

dev/pci/pcivar.h:
 - Add helpers to call the newbus methods.

ofed/include/linux/pci.h:
 - Add define to prevent the ofed version of pci_enable_msix from
   clashing with the FreeBSD native version.
2014-08-20 14:57:20 +00:00
hselasky
fa73754636 - Fix radix tree memory leakage when unloading modules using radix
trees. This happens because the logic inserting items into the radix
tree is allocating empty radix levels, when index zero does not
contain any items.
- Add proper error case handling, so that the radix tree does not end
up in a bad state, if memory cannot be allocated during insertion of
an item.
- Add check for inserting NULL items into the radix tree.
- Add check for radix tree getting too big.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-08-12 11:45:57 +00:00
hselasky
50128eff91 Fix OFED startup order: All SYSINIT()'s and modules should be loaded
prior to starting "/sbin/init" which will run all the "/etc/rc.d/xxx"
scripts. Else there can be a race configuring the interfaces via
"/etc/rc.conf".

MFC after:	4 weeks
Sponsored by:	Mellanox Technologies
2014-07-06 14:22:13 +00:00
hselasky
050052b698 Fix compile warning.
MFC after:	4 weeks
Sponsored by:	Mellanox Technologies
2014-07-06 14:20:47 +00:00
hselasky
8bc8d604f8 Fix some compile warnings.
MFC after:	4 weeks
Sponsored by:	Mellanox Technologies
2014-07-06 14:14:07 +00:00
hselasky
c676f789a4 Compile fixes:
Remove duplicate "debug_ktr.mask" sysctl definition.
Remove now unused variable from "kern_ktr.c".
This fixes build of "ktr" which was broken by r267961.

Let the default value for "vm_kmem_size_scale" be zero. It is setup
after that the sysctl has been initialized from "getenv()" in the
"kmeminit()" function to equal the "VM_KMEM_SIZE_MAX" value, if
zero. On Sparc64 the "VM_KMEM_SIZE_MAX" macro is not a constant. This
fixes build of Sparc64 which was broken by r267961.

Add a special macro to dynamically create SYSCTL root nodes, because
root nodes have a special parent. This fixes build of existing OFED
module and CANBUS module for pc98 which was broken by r267961.

Add missing "sysctl.h" includes to get the needed sysctl header file
declarations. This is needed after r267961.

MFC after:	2 weeks
2014-06-28 17:36:18 +00:00
hselasky
b019f75fb3 - Fix out of range shifting bug in bitops.h.
- Make code a bit easier to read by adding parenthesis.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-06-12 13:33:01 +00:00
imp
2118f42afd Use src.opts.mk in preference to bsd.own.mk except where we need stuff
from the latter.
2014-05-06 04:22:01 +00:00
bdrewery
6fcf6199a4 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
glebius
b38edcd355 Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data fixed machine independent size. The
  notion of data (packet counters, etc) are by no means MD. And it is a
  bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
  which at modern speeds overflow within a second.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bit for the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately the
  8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
melifaro
881c9e28bf Simplify filling sockaddr_dl structure for if_resolvemulti()
callback providers. link_init_sdl() function can be used to
fill most of the parameters. Use caller stack instead of
allocation / freing memory for each request. Do not drop support
for extra-long (probably non-existing) link-layer protocols by
introducing link_alloc_sdl() (used by if_resolvemulti() callback)
and link_free_sdl() (used by caller).
Since this change breaks KBI, MFC requires slightly different approach
(link_init_sdl() auto-allocating buffer if necessary to handle cases
 with unmodified if_resolvemulti() callers).

MFC after:	2 weeks
2014-01-18 23:24:51 +00:00
dim
076aa80e8f Similar to r260020, only use -fms-extensions with gcc, for all other
modules which require this flag to compile.  Use a GCC_MS_EXTENSIONS
variable, defined in kern.pre.mk, which can be used to easily supply the
flag (or not), depending on the compiler type.

MFC after:	3 days
2013-12-30 20:34:53 +00:00
alfred
eaa4015a95 Defer start/stop port to workqueues.
We need to do this because the Linux compat layer uses sx(9) for
mutex, however the lagg code uses rmlocks and calls into the mellanox
driver.  This causes deadlock due to sleeping while holding a rmlock.

Submitted by: Shahar Klein (shahark mellanox.com)
MFC After: 3 days.
2013-12-15 07:07:13 +00:00
eadler
44c01df173 Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this
shifts into the sign bit.  Instead use (1U << 31) which gets the
expected result.

This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.

A similar change was made in OpenBSD.

Discussed with:	-arch, rdivacky
Reviewed by:	cperciva
2013-11-30 22:17:27 +00:00
alfred
d38e51078d Fix creating a vlan over lagg over mlxen crash.
PR:		181931
Submitted by:	Shahar Klein (shahark mellanox.com)
2013-11-17 20:58:31 +00:00
alfred
4161aa5b4a Do not use a sleep lock when protecting the driver flags.
This was causing a locking issue with lagg

Submitted by:	odeds
2013-11-08 18:28:48 +00:00
alfred
9dd81144ca Fix for bad performance when mtu is increased.
Update the auto moderation behavior in the mlxen driver to match
the new LINUX OFED code.

Submitted by:	odeds
2013-11-08 18:26:28 +00:00
alfred
ac3a1c41ee Use explicit long cast to avoid overflow in bitopts.
This was causing problems with the buddy allocator inside of
ofed.

Submitted by: odeds
2013-11-08 18:20:19 +00:00
alfred
0b3e10d274 Fix API mismatch exposed by lagg.
When destroying a lagg the driver tries to restore the old mac and
fails due to API mismatch
2013-11-02 10:49:47 +00:00
glebius
ff6e113f1b The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00