124352 Commits

Author SHA1 Message Date
loos
650192202f Commit the struture changes for the padding of small packets on if_cpsw.
Should have been committed together with r312604.

MFC with:	r312604
2017-01-21 19:56:28 +00:00
loos
ffe2349eba Simplify the handling of small packets padding in cpsw:
- Pad small packets to 60 bytes and not 64 (exclude the CRC bytes);
 - Pad the packet using m_append(9), if the packet has enough space for
   padding, which is usually true, it will not be necessary append a newly
   allocated mbuf to the chain.

Suggested by:	yongari
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-01-21 19:49:39 +00:00
mav
33a69be661 Add initial support for CTL module unloading.
It is only a first step and not perfect, but better then nothing.
The main blocker is CAM target frontend, that can not be unloaded,
since CAM does not have mechanism to unregister periph driver now.

MFC after:	2 weeks
2017-01-21 19:38:26 +00:00
mjg
646081b560 vfs: __predict_false the need to handle F_HASLOCK
Also reorder the check with DTYPE_VNODE. Passed files are vnodes vast
majority of the time, so it is typically true.
2017-01-21 19:01:42 +00:00
mjg
826bdd63fa vfs: fix whitespace damage in r312600
While here wrap the previously overly long line so that it fits 80 chars.
2017-01-21 18:56:58 +00:00
mjg
1f77e9a20e vfs: refactor _vn_lock
Stop testing for LK_RETRY and error multiple times. Also postpone the
VI_DOOMED until after LK_RETRY was seen as it reads from the vnode.

No functional changes.
2017-01-21 18:38:16 +00:00
cem
d1e33f24bc Add remaining ELF compression definitions and structs
A follow-up to r300231.

Sponsored by:	Dell EMC Isilon
2017-01-21 17:39:10 +00:00
mjg
e90da0df92 vfs: hide the getvnode NULL mp message behind DIAGNOSTIC
Since crossmp vnode changes the message was being printed on each boot.

Reported by:	trasz
Discussed with:	kib
2017-01-21 16:59:50 +00:00
avos
9561ba5adb rtwn: enable LDPC support where possible
Tested with RTL8821AU, STA mode.
2017-01-21 15:03:58 +00:00
avos
8e0917e51f net80211: allow to configure LDPC support
Tested with RTL8821AU, STA mode (Tx support only)

Reviewed by:	adrian
Differential Revision:	https://reviews.freebsd.org/D9268
2017-01-21 14:19:06 +00:00
brooks
7d2bc72184 Enable TMPFS on MALTA so we can use it on minimalist disk images without
modules.

Sponsored by:	DARPA, AFRL
2017-01-21 09:08:27 +00:00
adrian
e61c94ce9d [ath] ensure both iv_ampdu_limit and iv_ampdu_rxmax is set.
A recent change enforced the VAP limit as well as the peer limit.
I now need to actually set iv_ampdu_limit or we don't transmit more
than 8K sized aggregates.

This restores the expected (suboptimal, but still much faster) behaviour.

Tested:

* AR9380, STA mode
2017-01-21 06:53:30 +00:00
kib
d52cc80daf Use SFENCE for ordering CLFLUSHOPT.
SDM states that CLFLUSHOPT instructions can be ordered with other
writes by SFENCE, heavier MFENCE is not required.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-20 19:08:44 +00:00
asomers
bd896633a0 Fix "camcontrol timestamp -s" with LTO-7 drives
The length of the scsi_set_timestamp_parameters struct was incorrect.  LTO-5
drives don't care, but LTO-7 drives do.

Reviewed by:	Sam Klopsch
MFC after:	2 weeks
Sponsored by:	Spectra Logic Corp
2017-01-20 17:54:24 +00:00
hselasky
6c205e2b1c Fix for race leading to endless timer interrupts related to
configtimer().

During normal operation "state->nextcallopt" will always be less than
or equal to "state->nextcall" and checking only "state->nextcallopt"
before calling "callout_process()" is sufficient. However when
"configtimer()" is called a race might happen requiring both of these
binary times to be checked.

Short description of race:

1) A configtimer() call will reset both "state->nextcall" and
"state->nextcallopt" to the same binary time.

2) If a "callout_reset()" call happens between "configtimer()" and the
next "callout_process()" call, "state->nextcallopt" will get updated
and "state->nextcall" will remain at the current time. Refer to logic
inside cpu_new_callout().

3) getnextcpuevent() only respects "state->nextcall" and returns this
value over and over again, even if it is in the past, until "now >=
state->nextcallopt" becomes true. Then these two time variables are
corrected by a "callout_process()" call and the situation goes back to
normal.

The problem manifests itself in different ways. The common factor is
the timer process(es) consume all CPU on one or more CPU cores for a
long time, blocking other kernel processes from getting execution
time. This can be seen by very high interrupt counts as displayed by
"vmstat -i | grep timer" right after boot.

When EARLY_AP_STARTUP was enabled in r310177 the likelyhood of hitting
this bug apparently increased.

Example output from "vmstat -i" before patch:
cpu0:timer                          7591         69
cpu9:timer                      39031773     358089
cpu4:timer                          9359         85
cpu3:timer                          9100         83
cpu2:timer                          9620         88

Example output from "vmstat -i" after patch:
cpu0:timer                          4242         34
cpu6:timer                          5531         44
cpu3:timer                          6450         52
cpu1:timer                          4545         36
cpu9:timer                          7153         58

Before the patch cpu9 in the example above, was spinning in a loop in
order to reach 39 million interrupts just a few seconds after
bootup. After the patch the timer interrupt counts are more or less
consistent.

Discussed with:		mav @
Reported by:		several people
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-20 17:40:31 +00:00
rstone
409b9a04ef Fix reference to free memory in ixgbe/if_media.c
When ixgbe receives an interrupt indicating that a new optical module
may have been inserted, it discards all of its current media types
by calling ifmedia_removeall() and then creates a new set of media
types for the supported media on the new module.  However,
ifmedia_removeall() was maintaining a pointer to whatever the
current media type was before the call to ifmedia_removealL().
The result of this was that any attempt to read the current media
type of the interface (e.g. via ifconfig) would return potentially
garbage data from free memory (or if one were particularly unlucky
on an architecture that does not malloc() from a direct map, page
fault the kernel).

Fix this by NULL'ing out the current media field in if_media.c,
and have ixgbe update the current media type after recreating
them.

Submitted by:	Matt Joras <matt.joras AT gmail DOT com>
Reviewed by:	sbruno, erj
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D9164
2017-01-20 17:16:48 +00:00
pfg
bb08bd9aa7 Addition of clang nullability qualifiers.
For consistency with the qualifiers added in r310977, define a new
qualifier _Null_unspecified which is also defined in clang 3.7+.

Add two new macros:
__NULLABILITY_PRAGMA_PUSH
__NULLABILITY_PRAGMA_POP

These are for use in headers when we want avoid noisy warnings if
some pointers are left without nullability annotations.

These are added with way ahead of their first use to teach the GCC
ports headers of their existance before their first use.
2017-01-20 15:56:40 +00:00
hselasky
195b63cfea Remove superfluous return statement.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-20 15:47:29 +00:00
hselasky
f0361cf78b Allow transmit packet bufring in software to be disabled.
- Add new sysctl node to control the transmit packet bufring.

- Add optimised version of the transmit routine which output packets
directly to the DMA ring instead of using bufring in case the transmit
lock is congested. This can reduce the number of taskswitches which in
turn influence the overall system CPU usage, depending on the
workload.

- Add " TX" suffix to debug name for transmit mutexes to silence some
witness warnings about aquiring duplicate locks having same name.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
Suggested by:		gallatin @
2017-01-20 15:45:21 +00:00
jpaetzel
a7aeec2d89 MFV 312436
6569 large file delete can starve out write ops

  illumos/illumos-gate@ff5177ee8b
  ff5177ee8b

  https://www.illumos.org/issues/6569
    The core issue I've found is that there is no throttle for how many
    deletes get assigned to one TXG. As a results when deleting large files
    we end up filling consecutive TXGs with deletes/frees, then write
    throttling other (more important) ops.

    There is an easy test case for this problem. Try deleting several
    large files (at least 1/2 TB) while you do write ops on the same
    pool. What we've seen is performance of these write ops (let's
    call it sideload I/O) would drop to zero.

    More specifically the problem is that dmu_free_long_range_impl()
    can/will fill up all of the dirty data in the pool "instantly",
    before many of the sideload ops can get in. So sideload
    performance will be impacted until all the files are freed.

    The solution we have tested at Nexenta (with positive results)
    creates a relatively simple throttle for how many "free" ops we let
    into one TXG.

    However this solution exposes other problems that should also be
    addressed. If we are to slow down freeing of data that means one
    has to wait even longer (assuming vnode ref count of 1) to get shell
    back after an rm or for NFS thread to finish the free-ing op.
    To avoid this the proposed solution is to call zfs_inactive() async
    for "large" files. Async freeing then begs for the reclaimed space
    to be accounted for in the zpool's "freeing" prop.

    The other issue with having a longer delete is the inability to
    export/unmount for a longer period of time. The proposed solution
    is to interrupt freeing of blocks when a fs is unmounted.

  Author: Alek Pinchuk <alek@nexenta.com>
  Reviewed by: Matt Ahrens <mahrens@delphix.com>
  Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
  Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
  Approved by: Dan McDonald <danmcd@omniti.com>

Reviewed by:	avg
Differential Revision:	D9008
2017-01-20 15:01:04 +00:00
emaste
feeebed4f4 ANSYfy kern_ktrace.c and remove archaic register keyword
Sponsored by:	The FreeBSD Foundation
2017-01-20 14:59:56 +00:00
mav
be27e37e53 Report disk addition errors on add or create subcommand.
MFC after:	1 week
2017-01-20 13:49:04 +00:00
avg
05e4e60349 don't abort writing of a core dump after EFAULT
It's possible to get EFAULT when writing a segment backed by a file
if the segment extends beyond the file.
The core dump could still be useful if we skip the rest of the segment
and proceed to other segements.
The skipped segment (or a portion of it) will be zero-filled.

While there, use 'const' to signify that core_write() only reads the
buffer and use __DECONST before calling vn_rdwr_inchunks() because it
can be used for both reading and writing.

Before the change:
kernel: Failed to write core file for process mmap_trunc_core (error 14)
kernel: pid 77718 (mmap_trunc_core), uid 1001: exited on signal 6

After the change:
kernel: Failed to fully fault in a core file segment at VA 0x800645000 with size 0x4000 to be written at offset 0x29000 for process mmap_trunc_core
kernel: pid 4901 (mmap_trunc_core), uid 1001: exited on signal 6 (core dumped)

Reviewed by:	julian, kib
Obtained from:	Panzura (older version of the change)
MFC after:	5 days
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D9233
2017-01-20 13:39:07 +00:00
avg
fa73e5b5c7 vmm_dev: work around a bogus error with gcc 6.3.0
The error is:
vmm_dev.c: In function 'alloc_memseg':
vmm_dev.c:261:11: error: null argument where non-null required (argument 1) [-Werror=nonnull]

Apparently, the gcc is unable to figure out that if a ternary operator
produced a non-NULL value once, then the operator with exactly the same
operands would produce the same value again.

MFC after:	1 week
2017-01-20 13:21:27 +00:00
hselasky
c49cd7b58d Make draining a sendqueue more robust.
Add own state variable to track if a sendqueue is stopped or not.
This will prevent traffic from entering the sendqueue while it is
being destroyed.

Update drain function to wait for traffic to be transmitted before
returning when the link state is active.

Add extra checks in transmit path for stopped SQ's.

While at it:
- Use likely() for a mbuf pointer check.
- Remove redundant IFF_DRV_RUNNING check.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-20 12:02:40 +00:00
hselasky
ebf4085d99 Add runtime support for modifying the SQ and RQ completion event
moderation mode. The presence of this feature is indicated through the
firmware capabilities.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-20 11:11:49 +00:00
hselasky
08255e2b1b Update firmware interface structures and definitions adding support
for new features and commands.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-01-20 10:47:32 +00:00
adrian
8d8d1a126a [net80211] allow for MCS16-23 to be statically configured.
Tested:

* AR9380, STA mode
2017-01-20 07:43:40 +00:00
ngie
8e431518c3 Use SRCTOP-relative paths to other directories instead of .CURDIR-relative ones
This simplifies pathing in make/displayed output

MFC after:    3 weeks
Sponsored by: Dell EMC Isilon
2017-01-20 05:45:07 +00:00
pfg
c33a77a05d mppc - Finish pluging NETGRAPH_MPPC_COMPRESSION.
There were several places where reference to compression were left
unfinished. Furthermore, KASSERTs contained references to MPPC_INVALID
which is not defined in the tree and therefore were sure to break with
INVARIANTS: comment them out.

Reported by:	Eugene Grosbein
PR:		216265
MFC after:	3 days
2017-01-20 00:02:11 +00:00
jkim
bad55a0870 Merge ACPICA 20170119. 2017-01-19 22:07:21 +00:00
scottl
bac013c4c0 Rework the debug print API. Event printing no longer gets special handling.
All of the printing from the tables file now has wrappers so that the
handling is cleaner and it's possible to print something out (say, during
development) without having to fight the global debug flags. This re-org
will also make it easier to have the tables be compiled out at build time
if desired.

Other than fixing some minor bugs, there are no user-visible changes from
this change

Sponsored by:	Netflix, Inc.
Differential Revision:	D9238
2017-01-19 21:47:50 +00:00
kib
6e2a82f4df Remove mistakenly merged field.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 20:03:26 +00:00
sbruno
15aceefbb6 Adjust gtaskqueue startup again so that we catch the !SMP case and
users that choose not to use EARLY_AP_STARTUP.

There is still an initialization issue/panic with !SMP and !EARLY_AP_STARTUP
that we have yet to resolve.

Submitted by:	bde
2017-01-19 19:58:08 +00:00
kib
49b653be7a Add mount option for tmpfs(5) to not use namecache.
The option "nonc" disables using of namecache for the created mount,
by default namecache is used.  The rationale for the option is that
namecache duplicates the information which is already kept in memory
by tmpfs.  Since it believed that namecache scales better than tmpfs,
or will scale better, do not enable the option by default.  On the
other hand, smaller machines may benefit from lesser namecache
pressure.

Discussed with:	mjg
Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-19 19:46:49 +00:00
kib
c30dfec101 Implement VOP_VPTOCNP() for tmpfs.
For directories, node->tn_spec.tn_dir.tn_parent pointer to the parent
is used.  For non-directories, the implementation is naive, all
directory nodes are scanned to find a dirent linking the specified
node.  This can be significantly improved by maintaining tn_parent for
all nodes, later.

Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-19 19:29:13 +00:00
kib
4b4202e092 VNON nodes cannot exist.
Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-19 19:25:42 +00:00
kib
65a8e18fa7 Refcount tmpfs nodes and mount structures.
On dotdot lookup and fhtovp operations, it is possible for the file
represented by tmpfs node to be removed after the thread calculated
the pointer.  In this case, tmpfs_alloc_vp() accesses freed memory.

Introduce the reference count on the nodes.  The allnodes list from
tmpfs mount owns 1 reference, and threads performing unlocked
operations on the node, add one transient reference.  Similarly, since
struct tmpfs_mount maintains the list where nodes are enlisted,
refcount it by one reference from struct mount and one reference from
each node on the list.  Both nodes and tmpfs_mounts are removed when
refcount goes to zero.

Note that this means that nodes and tmpfs_mounts might survive some
time after the node is deleted or tmpfs_unmount() finished.  The
tmpfs_alloc_vp() in these cases returns error either due to node
removal (tn_nlinks == 0) or because of insmntque1(9) error.

Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-01-19 19:15:21 +00:00
erj
a5b52954f7 e1000: Add support for Kaby Lake generation i219 (4) and i219 (5) devices
MFC after:	1 week
Sponsored by:	Intel Corporation
2017-01-19 18:52:38 +00:00
avg
96691e46d1 fix a thread preemption regression in schedulers introduced in r270423
Commit r270423 fixed a regression in sched_yield() that was introduced
in earlier changes.  Unfortunately, at the same time it introduced an
new regression.  The problem is that SWT_RELINQUISH (6), like all other
SWT_* constants and unlike SW_* flags, is not a bit flag.  So, (flags &
SWT_RELINQUISH) is true in cases where that was not really indended,
for example, with SWT_OWEPREEMPT (2) and SWT_REMOTEPREEMPT (11).

A straight forward fix would be to use (flags & SW_TYPE_MASK) ==
SWT_RELINQUISH, but my impression is that the switch types are designed
mostly for gathering statistics, not for influencing scheduling
decisions.

So, I decided that it would be better to check for SW_PREEMPT flag
instead.  That's also the same flag that was checked before r239157.
I double-checked how that flag is used and I am confident that the flag
is set only in the places where we really have the preemption:
- critical_exit + td_owepreempt
- sched_preempt in the ULE scheduler
- sched_preempt in the 4BSD scheduler

Reviewed by:	kib, mav
MFC after:	4 days
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D9230
2017-01-19 18:46:41 +00:00
kib
dbff4844a1 Make tmpfs directory cursor available outside tmpfs_subr.c.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 18:38:58 +00:00
hselasky
69cdfeb2a7 Fix problem with suspend and resume when using Skylake chipsets. Make
sure the XHCI controller is reset after halting it. The problem is
clearly a BIOS bug as the suspend and resume is failing without
loading the XHCI driver. The same happens when using Linux and the
XHCI driver is not loaded.

Submitted by:		Yanko Yankulov <yanko.yankulov@gmail.com>
PR:			216261
MFC after:		1 week
2017-01-19 18:33:27 +00:00
cem
c3bd748d53 ffs_vnops: Simplify extattr access
As suggested in r167010, use the structure type and macros to access and
modify UFS2 extended attributes.  Add assertions that pointers are
aligned in places where we now access the data through a structure
pointer, instead of character-by-character.

PR:		216127
Reported by:	dewayne at heuristicsystems.com.au
Reviewed by:	kib@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D9225
2017-01-19 16:46:05 +00:00
kib
022209a5df Rename tmpfs_mount member allnode_lock to include namespace prefix.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 16:01:36 +00:00
kib
ba16314dcf Protect macro argument.
Requested by:	hselasky
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 15:06:18 +00:00
loos
3b8977012f Handle the set capabilities ioctl, letting the hardware checksum be
disabled (Hi netmap!).

Only remove the CRC bytes from packets when the hardware tell us to do so.

Fixes the 'discard frame w/o leading ethernet header' issues.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-01-19 14:58:55 +00:00
kib
0df67f81eb Rework some tmpfs lock assertions.
Remove TMPFS_ASSERT_ELOCKED().  Its claims are already stated by other
asserts nearby and by VFS guarantees.
Change TMPFS_ASSERT_LOCKED() and one inlined place to use
ASSERT_VOP_(E)LOCKED() instead of hand-rolled imprecise asserts.

Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 14:49:55 +00:00
kib
e1ce156f26 Style fixes and comment updates.
Edit comments which explain no longer relevant details, and add
locking annotations to the struct tmpfs_node members.

Tested by:	pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 14:27:37 +00:00
loos
4879a81344 The port number and the to_port_en flag are valid only on SOP descriptor.
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-01-19 14:05:49 +00:00
kib
d899aacb44 Remove unused union member, fifos on tmpfs are implemented in common code.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-19 13:35:14 +00:00