Commit Graph

78 Commits

Author SHA1 Message Date
Dimitry Andric
cb53fc2dad Remove redundant declaration of cmclass in
sys/ofed/drivers/infiniband/core/ucm.c, to silence a gcc warning.

Approved by:	re (kib)
X-MFC-With:	r255932
2013-10-09 07:02:03 +00:00
Dimitry Andric
7ea82d6741 Give an unnamed union in sys/ofed/include/rdma/ib_verbs.h a name, to
silence a gcc warning.

Approved by:	re (gjb)
MFC after:      3 days
2013-10-07 16:54:29 +00:00
Alfred Perlstein
ef4f67f0b8 Fixed kernel crash when running devinfo
When calling to ib_uverbs_cleanup_ucontext, there is a call to
mutex_lock of xrcd_table_mutex, which was not initialized.
Added missing initialization for xrcd_table_mutex.

Submitted by: Orit Moskovich (oritm mellanox.com)

Approved by:	re
2013-10-01 15:43:23 +00:00
Alfred Perlstein
e18c176d9d Enable ib_dev.mmap function
Removed the ifdef linux from this function.
Added stub function for contiguous pages to avoid compilation
errors.

Submitted by:	Orit Moskovich (oritm mellanox.com)
Approved by:	re
2013-10-01 15:42:38 +00:00
Alfred Perlstein
ef7a7bb8ab Fixed 'Couldn't Create QP' issue when running rc_pingpong, uc_pingpong,
srq_pingpong IBverbs

Removed refrences using 'ifdef __linux__' to qpg functions and
related fields in struct
ib_qp_init_attr.

Submitted by: Orit Moskovich (oritm mellanox.com)

Approved by:	re
2013-10-01 15:38:29 +00:00
Alfred Perlstein
549e8f8b3f Fixed kernel crash when removing IPOIB_CM option from configuration file
Changed module init from module_init() to module_init_order() with
SI_ORDER_MIDDLE flag
Submitted by:	Orit Moskovich (oritm mellanox.com)
Approved by:	re
2013-10-01 15:36:51 +00:00
Alfred Perlstein
babfea95f9 Fix mis-merge of upstream fix.
We would accidentally make the string one byte too short.

Submitted by: Orit Moskovich (oritm mellanox.com)

Approved by:	re
2013-10-01 15:33:00 +00:00
Alfred Perlstein
c9f432b7ba Update OFED to Linux 3.7 and update Mellanox drivers.
Update the OFED Infiniband core to the version supplied in Linux
version 3.7.

The update to OFED is nearly all additional defines and functions
with the exception of the addition of additional parameters to
ib_register_device() and the reg_user_mr callback.

In addition the ibcore (Infiniband core) and ipoib (IP over Infiniband)
have both been made into completely loadable modules to facilitate
testing of the OFED stack in FreeBSD.

Finally the Mellanox Infiniband drivers are now updated to the
latest version shipping with Linux 3.7.

Submitted by: Mellanox FreeBSD driver team:
                Oded Shanoon (odeds mellanox.com),
                Meny Yossefi (menyy mellanox.com),
                Orit Moskovich (oritm mellanox.com)

Approved by: re
2013-09-29 00:35:03 +00:00
Pawel Jakub Dawidek
ab568de789 Handle cases where capability rights are not provided.
Reported by:	kib
2013-09-05 11:58:12 +00:00
Andre Oppermann
24c6ede6f0 Change m->pkthdr.header to m->pkthdr.PH_loc.ptr after r254804
to transiently store pointers to packet headers.

Sponsored by:	The FreeBSD Foundation
2013-08-25 09:45:26 +00:00
Navdeep Parhar
f336c6303e Fix implementation of sock_getname.
MFC after:	1 week
2013-08-23 18:54:27 +00:00
John Baldwin
c8e1113b37 Stop an ipoib interface before detaching it.
PR:		kern/181225
Submitted by:	Shahar Klein
Obtained from:	Mellanox
MFC after:	1 week
2013-08-20 18:08:06 +00:00
Andre Oppermann
86bd049144 Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with:	trociny, glebius
2013-08-19 13:27:32 +00:00
Gleb Smirnoff
ca04d21d5f Make sendfile() a method in the struct fileops. Currently only
vnode backed file descriptors have this method implemented.

Reviewed by:	kib
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2013-08-15 07:54:31 +00:00
Jeff Roberson
863c7e4562 - Reserve a special AF for SDP. The one we were incorrectly using before
was taken by another AF.

Sponsored by:	EMC / Isilon Storage Division
2013-08-09 03:26:17 +00:00
Jeff Roberson
3d71c6cf45 - Correctly handle various edge cases in sysfs emulation.
Sponsored by:	EMC / Isilon Storage Division
2013-08-09 03:24:48 +00:00
Jeff Roberson
ba397d0f16 - Use the correct type in the linux bitops emulation.
Submitted by:	Maxim Ignatenko <gelraen.ua@gmail.com>
2013-08-09 03:24:12 +00:00
Konstantin Belousov
449c2e92c9 Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain.  The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass.  This is not optimal, since it
could cause excessive page deactivation and freeing.  The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues.  Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-07 16:36:38 +00:00
Jeff Roberson
5df87b21d3 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
John Baldwin
c183dc9519 Add a missing prototype.
Pointy hat:	me
2013-07-29 20:48:10 +00:00
John Baldwin
575a6840de Various fixes to the mlxen(4) driver:
- Remove an incorrect assertion that can trigger when downing an interface.
- Stop the interface during detach to avoid panics when unloading the
  driver.
- A few locking fixes to be more consistent with other FreeBSD drivers:
  - Protect if_drv_flags with the driver lock, not atomic ops
  - Hold the driver lock when adjusting multicast state.
  - Hold the driver lock while adjusting if_capenable.

PR:		kern/180791 [1,2]
Submitted by:	Shakar Klein @ Mellanox [1,2]
MFC after:	3 days
2013-07-29 18:44:52 +00:00
John Baldwin
ba90c51af3 Avoid trashing IP fragments:
- Only enable UDP/TCP hardware checksums if CSUM_UDP or CSUM_TCP is set.
- Only enable IP hardware checksums if CSUM_IP is set.

PR:		kern/180430
Submitted by:	Meny Yossefi <menyy@mellanox.com>
MFC after:	1 week
2013-07-25 16:34:34 +00:00
Andriy Gapon
785797c341 rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST
Also directly call swapper() at the end of mi_startup instead of
relying on swapper being the last thing in sysinits order.

Rationale:

- "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage
- "scheduler" was misleading, the function swaps in the swapped out processes
- another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be
  invoked depending on its relative order with scheduler; this was not obvious
  and the bug actually used to exist

Reviewed by:	kib (ealier version)
MFC after:	14 days
2013-07-24 09:45:31 +00:00
John Baldwin
c232b2aebb Rework the previous fix for the IB vs Ethernet sysctl handler to be more
generic and apply to all sysfs attributes:
- Use sysctl_handle_string() instead of reimplementing it.
- Remove trailing newline from the current value before passing it to
  userland and append a newline to the new string value before passing it
  to the attribute's store function.
- Don't leak the temporary buffer if the first error check triggers.
- Revert earlier change to mlx4 port mode handler.

PR:		kern/174213
Submitted by:	Garrett Cooper
Reviewed by:	Shakar Klein @ Mellanox
MFC after:	1 week
2013-07-18 14:06:01 +00:00
John Baldwin
d356583dbc Remove check forbidding requests that would result in one port being set
to Ethernet and the subsequent port being set to IB.

Submitted by:	Shakar Klein @ Mellanox
Tested by:	Morgan Robertson <morganrobertson@gmail.com>
MFC after:	1 week
2013-07-17 13:41:54 +00:00
John Baldwin
335349690f Allow mlx4 devices to switch from Ethernet to Infiniband (and vice versa):
- Fix sysctl wrapper for sysfs attributes to properly handle new string
  values similar to sysctl_handle_string() (only copyin the user's
  supplied length and nul-terminate the string).
- Don't check for a trailing newline when evaluating the desired operating
  mode of a mlx4 device.

PR:		kern/179999
Submitted by:	Shahar Klein <shahark@mellanox.com>
MFC after:	1 week
2013-07-08 21:25:12 +00:00
John Baldwin
b2737f9e63 Store a reference to the vnode associated with a file descriptor in the
linux_file structure and use it instead of directly accessing td_fpop
when destroying the linux_file structure.  The td_fpop pointer is not
valid when a cdevpriv destructor is run, and the type-specific close
method has already been called, so f_vnode may not be valid (and the
vnode might have been recycled without our own reference).

Tested by:	Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
MFC after:	1 week
2013-06-11 15:37:07 +00:00
Eitan Adler
7a2b450ff8 Fxi a bunch of typos.
PR:	misc/174625
Submitted by:	Jeremy Chadwick <jdc@koitsu.org>
2013-05-10 16:41:26 +00:00
Xin LI
5ed3c8af83 According to the documentation, on Linux, cancel_delayed_work() does not
do drain (flush_workqueue() in Linux terms) but instead returns true if
the work was removed before it is run, or false otherwise.

Simulate this by removing the taskqueue_drain() and return the value
derived from taskqueue_cancel()'s return value.

This would solve a witness warning caused by calling taskqueue_drain()
with a non-sleepable lock held, like:

taskqueue_drain with the following non-sleepable locks held:
exclusive rw lle (lle) r = 0 (0xfffffe001450b410) locked @
/usr/src/sys/netinet/in.c:1484
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff848d4f7690
kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff848d4f7740
witness_warn() at witness_warn+0x4a8/frame 0xffffff848d4f7800
taskqueue_drain() at taskqueue_drain+0x3a/frame 0xffffff848d4f7840
set_timeout() at set_timeout+0x4a/frame 0xffffff848d4f7860
netevent_callback() at netevent_callback+0x16/frame 0xffffff848d4f7870
arpintr() at arpintr+0x9b5/frame 0xffffff848d4f7930

This do not affect kernel without OFED compiled in.

Reported by:	Garrett Cooper <yaneurabeya gmail com>
		(who also tested an earlier version of this patch,
		but bugs are mine)
MFC after:	2 weeks
2013-05-08 17:45:22 +00:00
Gleb Smirnoff
c46db3529c Add const qualifier to the dst parameter of the ifnet if_output method. 2013-04-27 08:11:48 +00:00
John Baldwin
e5fb562225 Check for SS_NBIO in the socket state field rather than socket buffer
flags.

Submitted by:	Vijay Singh
MFC after:	1 week
2013-04-03 20:31:10 +00:00
Attilio Rao
89f6b8632c Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
  sys/mutex.h in consumers directly to cater its inlining functions
  using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
  consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
  the compat layer because the name clash between FreeBSD and solaris
  versions must be avoided.
  At this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit.  Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
Alexander Motin
27eae7e9ad Add protective parentheses for macro argument, missed in r247671. 2013-03-02 22:41:06 +00:00
Alexander Motin
d4d29475e6 MFcalloutng:
Give OFED Linux wrapper own "expires" field instead of abusing callout's
c_time, which will change its type and units with calloutng commit.
2013-03-02 22:19:17 +00:00
Pawel Jakub Dawidek
2609222ab4 Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor
  has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
  should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
  cap_new(2), which limits capability rights of the given descriptor
  without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
  ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
  ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
  that can be used with the new cap_fcntls_limit(2) syscall and retrive
  them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
  heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
  recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
  backward API and ABI compatibility there are some incompatible changes
  that are described in detail below:

	CAP_CREATE old behaviour:
	- Allow for openat(2)+O_CREAT.
	- Allow for linkat(2).
	- Allow for symlinkat(2).
	CAP_CREATE new behaviour:
	- Allow for openat(2)+O_CREAT.

	Added CAP_LINKAT:
	- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
	- Allow to be target for renameat(2).

	Added CAP_SYMLINKAT:
	- Allow for symlinkat(2).

	Removed CAP_DELETE. Old behaviour:
	- Allow for unlinkat(2) when removing non-directory object.
	- Allow to be source for renameat(2).

	Removed CAP_RMDIR. Old behaviour:
	- Allow for unlinkat(2) when removing directory.

	Added CAP_RENAMEAT:
	- Required for source directory for the renameat(2) syscall.

	Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
	- Allow for unlinkat(2) on any object.
	- Required if target of renameat(2) exists and will be removed by this
	  call.

	Removed CAP_MAPEXEC.

	CAP_MMAP old behaviour:
	- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
	  PROT_WRITE.
	CAP_MMAP new behaviour:
	- Allow for mmap(2)+PROT_NONE.

	Added CAP_MMAP_R:
	- Allow for mmap(PROT_READ).
	Added CAP_MMAP_W:
	- Allow for mmap(PROT_WRITE).
	Added CAP_MMAP_X:
	- Allow for mmap(PROT_EXEC).
	Added CAP_MMAP_RW:
	- Allow for mmap(PROT_READ | PROT_WRITE).
	Added CAP_MMAP_RX:
	- Allow for mmap(PROT_READ | PROT_EXEC).
	Added CAP_MMAP_WX:
	- Allow for mmap(PROT_WRITE | PROT_EXEC).
	Added CAP_MMAP_RWX:
	- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

	Renamed CAP_MKDIR to CAP_MKDIRAT.
	Renamed CAP_MKFIFO to CAP_MKFIFOAT.
	Renamed CAP_MKNODE to CAP_MKNODEAT.

	CAP_READ old behaviour:
	- Allow pread(2).
	- Disallow read(2), readv(2) (if there is no CAP_SEEK).
	CAP_READ new behaviour:
	- Allow read(2), readv(2).
	- Disallow pread(2) (CAP_SEEK was also required).

	CAP_WRITE old behaviour:
	- Allow pwrite(2).
	- Disallow write(2), writev(2) (if there is no CAP_SEEK).
	CAP_WRITE new behaviour:
	- Allow write(2), writev(2).
	- Disallow pwrite(2) (CAP_SEEK was also required).

	Added convinient defines:

	#define	CAP_PREAD		(CAP_SEEK | CAP_READ)
	#define	CAP_PWRITE		(CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_R		(CAP_MMAP | CAP_SEEK | CAP_READ)
	#define	CAP_MMAP_W		(CAP_MMAP | CAP_SEEK | CAP_WRITE)
	#define	CAP_MMAP_X		(CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
	#define	CAP_MMAP_RW		(CAP_MMAP_R | CAP_MMAP_W)
	#define	CAP_MMAP_RX		(CAP_MMAP_R | CAP_MMAP_X)
	#define	CAP_MMAP_WX		(CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_MMAP_RWX		(CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
	#define	CAP_RECV		CAP_READ
	#define	CAP_SEND		CAP_WRITE

	#define	CAP_SOCK_CLIENT \
		(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
		 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
	#define	CAP_SOCK_SERVER \
		(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
		 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
		 CAP_SETSOCKOPT | CAP_SHUTDOWN)

	Added defines for backward API compatibility:

	#define	CAP_MAPEXEC		CAP_MMAP_X
	#define	CAP_DELETE		CAP_UNLINKAT
	#define	CAP_MKDIR		CAP_MKDIRAT
	#define	CAP_RMDIR		CAP_UNLINKAT
	#define	CAP_MKFIFO		CAP_MKFIFOAT
	#define	CAP_MKNOD		CAP_MKNODAT
	#define	CAP_SOCK_ALL		(CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by:	The FreeBSD Foundation
Reviewed by:	Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with:	rwatson, benl, jonathan
ABI compatibility discussed with:	kib
2013-03-02 00:53:12 +00:00
Xin LI
3b9ba32d88 Fix LINT build on amd64. 2013-02-09 04:13:45 +00:00
Randall Stewart
ded5ea6a25 This fixes a out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encounting
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue() only with drbr_peek(). We most
of the time, in putback, would not need to copy it
back since most likey the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimial case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will  need a test and atomic in the put back possibly
a seperate putback_mc() in the ring buf.

Reviewed by:	jhb@freebsd.org, jlv@freebsd.org
2013-02-07 15:20:54 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Dimitry Andric
3910bc633f Redo r242842, now actually fixing the warnings, as follows:
- In sys/ofed/drivers/infiniband/core/cma.c, an enum struct member is
  interpreted as an int, so cast it to an int.
- In sys/ofed/drivers/infiniband/core/ud_header.c, initialize the
  packet_length variable in ib_ud_header_init(), to prevent undefined
  behaviour.
- In sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c, call rdma_notify()
  with the correct enum type and value.
- In sys/ofed/include/linux/pci.h, change the PCI_DEVICE and PCI_VDEVICE
  macros to use C99 struct initializers, so additional members can be
  overridden.

Reviewed by:	delphij, Garrett Cooper <yanegomi@gmail.com>
MFC after:	1 week
2012-11-12 22:01:29 +00:00
Xin LI
5e2b8fb7ed Use %s when calling make_dev with a string pointer. This makes
clang happy.

MFC after:	2 weeks
2012-11-09 21:41:07 +00:00
Eitan Adler
db702c59cf remove duplicate semicolons where possible.
Approved by:	cperciva
MFC after:	1 week
2012-10-22 03:00:37 +00:00
John Baldwin
345a795543 Take advantage of if_baudrate_pf and calculate an effective baud rate on
all platforms (not just amd64) to compute an equivalent IB rate.
2012-10-18 15:44:27 +00:00
John Baldwin
989343faad Use if_initbaudrate(). 2012-10-18 15:43:19 +00:00
Gleb Smirnoff
063efed28c The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
  ALTQ queueing or buf_ring(9) queueing.
  - It doesn't touch the buf_ring stats any more.
  - It doesn't touch ifnet stats anymore.
  - drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
  - It handles br_drops itself.
  - br_prod_bytes stats are dropped. Rationale: no one ever
    reads them but update of a common counter on every packet
    negatively affects performance due to excessive cache
    invalidation.
  - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
    we no longer account bytes.

o Drivers handle their stats theirselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and  ixv(4) no longer
  use drbr_stats_update(), and update ifnet stats theirselves.

o bxe(4) was the most correct driver, it didn't call
  drbr_stats_update(), thus it was the only driver accurate under
  moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
  - drop software stats updating.
  - take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
  from hardware.

Reviewed by:	jfv, gnn
2012-09-28 18:28:27 +00:00
Gavin Atkinson
389c8bd51e Align the PCI Express #defines with the style used for the PCI-X
#defines.  This also has the advantage that it makes the names more
compact, iand also allows us to correct the non-uniform naming of
the PCIM_LINK_* defines, making them all consistent amongst themselves.

This is a mostly mechanical rename:
  s/PCIR_EXPRESS_/PCIER_/g
  s/PCIM_EXP_/PCIEM_/g
  s/PCIM_LINK_/PCIEM_LINK_/g

When this is MFC'd, #defines will be added for the old names to assist
out-of-tree drivers.

Discussed with:	jhb
MFC after:	1 week
2012-09-18 22:04:59 +00:00
Alexander V. Chernikov
f340037ea1 Remove unneeded ipfw headers introduced in r213447 from Infiniband code.
MFC after:     2 weeks
2012-09-04 10:56:30 +00:00
Hans Petter Selasky
07da61a6cc Streamline use of cdevpriv and correct some corner cases.
1) It is not useful to call "devfs_clear_cdevpriv()" from
"d_close" callbacks, hence for example read, write, ioctl and
so on might be sleeping at the time of "d_close" being called
and then then freed private data can still be accessed.
Examples: dtrace, linux_compat, ksyms (all fixed by this patch)

2) In sys/dev/drm* there are some cases in which memory will
be freed twice, if open fails, first by code in the open
routine, secondly by the cdevpriv destructor. Move registration
of the cdevpriv to the end of the drm open routines.

3) devfs_clear_cdevpriv() is not called if the "d_open" callback
registered cdevpriv data and the "d_open" callback function
returned an error. Fix this.

Discussed with:	phk
MFC after:	2 weeks
2012-08-15 16:19:39 +00:00
Konstantin Belousov
1c771f9222 After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
Navdeep Parhar
40a29cf009 Fix clang warning when compiling iw_cxgb.
Reported by:	rene, dim
2012-06-25 16:52:27 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00