Commit Graph

128295 Commits

Author SHA1 Message Date
markj
6e5d1bfd1d Remove a stale and incorrect comment.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-28 02:51:27 +00:00
markj
6795d68368 Remove workqueue items after updating the workqueue tail pointer.
When QUEUE_MACRO_DEBUG_TRASH is configured, the queue linkage fields
are trashed upon removal of the item, so be sure to only read them before
removing the item.

No functional change intended.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-28 02:48:37 +00:00
gonzo
e96647dde8 Fix MAC address detection regression introduced by r324184
To accomodate all variaties of Pi DTS files floating around
we look for MAC address property either in DTS node for
USB ethernet (if it exists) or at predefined path
".../usb/hub/ethernet".

After r324184 smsc_fdt_find_eth_node started to return node
with compatibility string "usb424,ec00" as an eth node.
In imported GNU dts files this node still does not have
MAC address related property, and therefor following check for
"mac-address" and "local-mac-address" fails.

To make this logic more robust do not just search for the node
but also make sure it has required property, so if node with
accepted compatibility string exists but doesn't have the
property we fall back to looking for hardoded path mentioned above.
2017-10-27 21:22:38 +00:00
tuexen
84d5fd2292 Fix parsing error when processing cmsg in SCTP send calls. Thei bug is
related to a signed/unsigned mismatch.
This should most likely fix the issue in sctp_sosend reported by
Dmitry Vyukov on the freebsd-hackers mailing list and found by
running syzkaller.
2017-10-27 19:27:05 +00:00
ian
0a072e9bda Actually release resources in detach() rather than just returning EBUSY.
This will enable use of 'devctl disable', allow creation of a module, etc.
2017-10-27 17:21:43 +00:00
markj
7dbd75884a Fix a lock leak in g_mirror_destroy().
g_mirror_destroy() is supposed to unlock the softc before indicating
success, but it wasn't doing so if the caller raced with another
thread destroying the mirror.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-27 17:05:14 +00:00
obrien
54594d8cf8 Update comment to match r177997 & r178036 changes. 2017-10-27 16:36:05 +00:00
imp
0c9db49957 nvd alias has caused some problems, revert it for the moment.
Sponsored by: Netflix
2017-10-27 14:57:38 +00:00
jhb
7db3e0e96c Rework pass through changes in r305485 to be safer.
Specifically, devices that do not support PCI-e FLR and were not
gracefully shutdown by the guest OS could continue to issue DMA
requests after the VM was terminated.  The changes in r305485 meant
that those DMA requests were completed against the host's memory which
could result in random memory corruption.  Instead, leave ppt devices
that are not attached to a VM disabled in the IOMMU and only restore
the devices to the host domain if the ppt(4) driver is detached from a
device.

As an added safety belt, disable busmastering for a pass-through device
when before adding it to the host domain during ppt(4) detach.

PR:		222937
Tested by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	grehan
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12661
2017-10-27 14:57:14 +00:00
jhb
0472a49b7f Discard the correct thread event reported for a ptrace stop.
When multiple threads wish to report a tracing event to a debugger,
both threads call ptracestop() and one thread will win the race to be
the reporting thread (p->p_xthread).  The debugger uses PT_LWPINFO
with the process ID to determine which thread / LWP is reporting an
event and the details of that event.  This event is cleared as a side
effect of the subsequent ptrace event that resumed the process
(PT_CONTINUE, PT_STEP, etc.).  However, ptrace() was clearing the
event identified by the LWP ID passed to the resume request even if
that wasn't the 'p_xthread'.  This could result in clearing an event
that had not yet been observed by the debugger and leaving the
existing event for 'p_thread' pending so that it was reported a second
time.

Specifically, if the debugger stopped due to a software breakpoint in
one thread, but then switched to another thread that was used to
resume (e.g. if the user switched to a different thread and issued a
step), the resume request (PT_STEP) cleared a pending event (if any)
for the thread being stepped.  However, the process immediately
stopped and the first thread reported it's breakpoint event a second
time.  The debugger decremented the PC for "both" breakpoint events
which resulted in the PC now pointing into the middle of an
instruction (on x86) and a SIGILL fault when the process was resumed a
second time.

To fix, always clear the pending event for 'p_xthread' when resuming a
process.  ptrace() still honors the requested LWP ID when enabling
single-stepping (PT_STEP) or setting a different PC (PT_CONTINUE).

Reported by:	GDB testsuite (gdb.threads/continue-pending-status.exp)
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12794
2017-10-27 03:16:19 +00:00
imp
2793077fcd We should be call adaerror() instead of cam_periph_error() always.
Sponsored by: Netflix
2017-10-26 22:53:55 +00:00
imp
86103edf8e Always send STANDBY IMMEDIATE when shutting down
To save SMART data and for a drive to understand that it's been nicely
shutdown, we need to send a STANDBY IMMEDIATE. Modify adaspindown to
use a local CCB on the stack. When we're panicing, used
xpt_polled_action rather than cam_periph_runccb so that we can SEND
IMMEDIATE after we've shutdown the scheduler.

Sponsored by: Netflix
Reviewed by: scottl@, gallatin@
Differential Revision: https://reviews.freebsd.org/D12799
2017-10-26 22:53:49 +00:00
imp
ccfbd308f9 Make time we wait for a power cycle tunable.
hw.ipmi.cycle_time is the time to wait for the power down phase of the
ipmi power cycle before falling back to either reboot or halt.

Sponsored by: Netflix
2017-10-26 22:53:02 +00:00
imp
609e0d0778 Various IPMI watchdog timer improvements
o Make hw.ipmi.on a tuneable
o Changes to keep shutdown from hanging indefinitately after the wd
  would normally have been disabled.
o Add support for setting pretimeout (which fires an interrupt
  some time before the actual watchdog expires)
o Allow refinement of the actions to take when the watchdog expires
o Allow special startup timeout to keep us from hanging in boot
  before watchdogd is started, but after we've loaded the kernel.

Obtained From: Netflix OCA Firmware
2017-10-26 22:52:51 +00:00
oshogbo
8380d411a7 Introduce cnvlist_name() and cnvlist_type() functions.
Those function can be used when we are iterating over nvlist to reduce
amount of extra variables we need to declare.

MFC after:	1 month
2017-10-26 20:44:42 +00:00
asomers
3edbac4ecd Fix aio_suspend in 32-bit emulation
An off-by-one error has been present since the system call was first present
in 185878.  It additionally became a memory corruption bug after change
324941.  The failure is actually revealed by our existing AIO tests.
However, apparently nobody's been running those in 32-bit emulation mode.

Reported by:	Coverity, cem
CID:		1382114
MFC after:	18 days
X-MFC-With:	324941
Sponsored by:	Spectra Logic Corp
2017-10-26 19:45:15 +00:00
jilles
6b71a8aec8 libnv: Fix strict-aliasing violation with cookie
In rS323851, some casts were adjusted in calls to nvlist_next() and
nvlist_get_pararr() in order to make scan-build happy. I think these changes
just confused scan-build into not reporting the strict-aliasing violation.

For example, nvlist_xdescriptors() is causing nvlist_next() to write to its
local variable nvp of type nvpair_t * using the lvalue *cookiep of type
void *, which is not allowed. Given the APIs of nvlist_next(),
nvlist_get_parent() and nvlist_get_pararr(), one possible fix is to create a
local void *cookie in nvlist_xdescriptors() and other places, and to convert
the value to nvpair_t * when necessary. This patch implements that fix.

Reviewed by:	oshogbo
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12760
2017-10-26 18:32:04 +00:00
imp
ab6d04b193 Add a 'place holder' arm struct efi_fb until a real one comes
along. This allows the arm efi boot loader to compile again.

Sponsored by: Netflix
2017-10-26 16:36:27 +00:00
trasz
2589d9ccaf Make gmountver(8) use direct dispatch.
MFC after:	2 weeks
2017-10-26 10:18:31 +00:00
truckman
df064ee4ac Fix Dummynet AQM packet marking function ecn_mark() and fq_codel /
fq_pie schedulers packet classification functions in layer2 (bridge mode).

Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie
schedulers packet classification functions (fq_codel_classify_flow()
and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP)
packet. However, this assumption is incorrect if ipfw/dummynet is
used to manage layer2 traffic (bridge mode) since mbuf will point
at L2 frame.  This patch solves this problem by identifying the
source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN
offset when converting an mbuf pointer to ip pointer if the traffic
is from layer2.  More specifically, in dummynet packet tagging
function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the
traffic is from layer2 and set to zero otherwise. Whenever an access
to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is
used instead of mtod(m, struct ip *) to correctly convert mbuf
pointer to ip pointer in both L2 and L3 traffic.

Submitted by:	lstewart
MFC after:	2 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D12506
2017-10-26 10:11:35 +00:00
trasz
094f4d7259 Make gmountver(8) use G_PF_ACCEPT_UNMAPPED.
MFC after:	2 weeks
2017-10-26 09:29:35 +00:00
kp
baa79ec703 Evaluate packet size after the firewall had its chance in the ip6 fast path
Defer the packet size check until after the firewall has had a look at it. This
means that the firewall now has the opportunity to (re-)fragment an oversized
packet.
This mirrors what the slow path does.

Reviewed by:	ae
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12779
2017-10-25 19:21:48 +00:00
manu
9d60c6e2a4 loader.efi: Make framebuffer commands available for arm64
Move framebuffer.{c,h} to sys/boot/efi/loader and add the efifb
related metadata and pass it to the kernel

Reviewed by:	imp, andrew
Differential Revision:	https://reviews.freebsd.org/D12757
2017-10-25 18:55:04 +00:00
sbruno
940e3c6c41 Enable i386 build of the Cavium LiquidIO driver (lio) module.
Submitted by:	pkanneganti@cavium.com (Prasad V Kanneganti)
MFC after:	1 week
Sponsored by:	Cavium Networks
Differential Revision:	https://reviews.freebsd.org/D12415
2017-10-25 17:49:17 +00:00
markj
668ea833d1 Make drain_output() use bufobj_wwait().
No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12790
2017-10-25 17:20:18 +00:00
imp
2502c4412c Implement IPMI support for RB_POWRECYCLE
Some BMCs support power cycling the chassis via the chassis control
command 2 subcommand 2 (ipmitool called it 'chassis power cycle').  If
the BMC supports the chassis device, register a shutdown_final handler
that sends the power cycle command if request and waits up to 10s for
it to take effect. To minimize stack strain, we preallocate a ipmi
request in the softc. At the moment, we're verbose about what we're
doing.

Sponsored by: Netflix
2017-10-25 15:30:53 +00:00
imp
369f9fa7ad Handle RB_POWERCYCLE in ada driver
Allow the disks to be spun down when doing a POWERCYCLE as well as
POWEROFF.

Sponsored by: Netflix
2017-10-25 15:30:48 +00:00
imp
a46386815a Handle RB_POWERCYCLE in the MI part of the kernel
Signal init with SIGWINCH in shutdown_nice for RB_POWERCYCLE.

Sponsored by: Netflix
2017-10-25 15:30:44 +00:00
imp
1b355cb406 Define RB_POWERCYCLE
RB_POWERCYCLE instructs the platform to power off and then power back
on a short time later, if that's possible. Otherwise, degrade to the
RB_POWEROFF behavior.

Sponsored by: Netflix
2017-10-25 15:30:20 +00:00
imp
05c5ce5637 Remove sys/boot/arm/at91 and ixp425
Remove at91 bootloader. It only worked on AT91RM9200, and only
specific boards that were all EOLd 10 years ago.

Remove ixp425. It doesn't build anymore and is for boards that were
EOLd 8 years ago.

Sponsored by: Netflix
2017-10-25 15:28:05 +00:00
imp
baf954a462 Move BINDIR definition to defs.mk, and override where it isn't /boot
(those files already do that so weren't changed).

Sponsored by: Netflix
2017-10-25 15:27:58 +00:00
imp
b113dc9515 Use BOOTDIR consistently. We need to include bsd.init.mk early to make
this happen. This will cause src.opts.mk to be included, so remove
that. This needs to propigate through the sys/boot tree.

Sponsored by: Netflix
2017-10-25 15:27:53 +00:00
tuexen
9799ec2be8 Fix a bug reported by Felix Weinrank using the libfuzzer on the
userland stack.

MFC after:	3 days
2017-10-25 09:12:22 +00:00
markj
a049a758b2 Add support for compressed kernel dumps.
When using a kernel built with the GZIO config option, dumpon -z can be
used to configure gzip compression using the in-kernel copy of zlib.
This is useful on systems with large amounts of RAM, which require a
correspondingly large dump device. Recovery of compressed dumps is also
faster since fewer bytes need to be copied from the dump device.

Because we have no way of knowing the final size of a compressed dump
until it is written, the kernel will always attempt to dump when
compression is configured, regardless of the dump device size. If the
dump is aborted because we run out of space, an error is reported on
the console.

savecore(8) is modified to handle compressed dumps and save them to
vmcore.<index>.gz, as it does when given the -z option.

A new rc.conf variable, dumpon_flags, is added. Its value is added to
the boot-time dumpon(8) invocation that occurs when a dump device is
configured in rc.conf.

Reviewed by:	cem (earlier version)
Discussed with:	def, rgrimes
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11723
2017-10-25 00:51:00 +00:00
shurd
1f73004f8d bnxt: add support for Flow control setting using sysctl
Created sysctl node dev.bnxt.0.fc with following options.

A. dev.bnxt.0.fc.autoneg
B. dev.bnxt.0.fc.rx
C. dev.bnxt.0.fc.tx

Description:-
dev.bnxt.0.fc: flow ctrl
dev.bnxt.0.fc.autoneg: Enable or Disable Autoneg Flow Ctrl: 0 / 1
dev.bnxt.0.fc.rx: Enable or Disable Rx Flow Ctrl: 0 / 1
dev.bnxt.0.fc.tx: Enable or Disable Tx Flow Ctrl: 0 / 1

Submitted by:	Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Reviewed by:	shurd, sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Broadcom Limited
Differential Revision:	https://reviews.freebsd.org/D12599
2017-10-24 21:18:50 +00:00
fsu
edb3be92a6 Fix physical block number overflow in different places.
Approved by:    pfg (mentor)
MFC after:      6 months
2017-10-24 20:10:08 +00:00
fsu
be8349452e Set doreallocblks sysctl value to zero by default because of
possibility of filesystem corruption.

Approved by:    pfg (mentor)
MFC after:      2 weeks
2017-10-24 19:16:25 +00:00
fsu
66c5b6ebb7 Do not free bufs in case of extents metadata blocks + remove unneeded asserts.
Approved by:    pfg (mentor)
MFC after:      6 months
2017-10-24 19:14:33 +00:00
alc
1728bf4480 Micro-optimize the handling of fictitious pages in vm_page_free_prep().
A fictitious page is always wired, so there is no point in trying to
remove one from the page queues.

Completely remove one inaccurate comment from vm_page_free_prep() and
correct another.

Reviewed by:	kib, markj
MFC after:	1 week
2017-10-24 17:14:53 +00:00
tuexen
ae1c6d7d9f Fix a bug in handling special ABORT chunks.
Thanks to Felix Weinrank for finding this issue using libfuzzer with
the userland stack.

MFC after:	3 days
2017-10-24 16:24:12 +00:00
avg
075c9b9645 iscsi_shutdown_post: do nothing if panic-ing
There is nothing that that routine should or could really do in that
context.

Reported by:	Ben RUBSON <ben.rubson@gmail.com>
MFC after:	1 week
2017-10-24 14:59:31 +00:00
tuexen
874f77fd25 Fix a locking issue found by running AFL on the userland stack.
Thanks to Felix Weinrank for reporting the issue.

MFC after:	3 days
2017-10-24 14:28:56 +00:00
ae
7059976537 Add IPv6 support for O_TCPDATALEN opcode.
PR:		222746
MFC after:	1 week
2017-10-24 08:39:05 +00:00
np
e1852a0edc cxgbe(4): Read the MPS buffer group map from the firmware as it could be
different from hardware defaults.  The congestion channel map, which is
still fixed, needs to be tracked separately now.  Change the congestion
setting for TOE rx queues to match the drivers on other OSes while here.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-10-24 05:41:48 +00:00
imp
4afc14666f Treat a 'current' value of 0 as unlimited as a failsfe.
When limiting I/O, a value of 0 makes no sense as a limit. No progress
can be made. Trade the possibility that someone might be doing
something clever to achieve ultra-low I/O limits vs the damage of not
ever making progress on an I/O in favor of making progress. Now the
machine won't be useless if this accidentally gets requested.

Sponsored by: Netflix
2017-10-24 02:25:42 +00:00
asomers
d4f614fd9c Remove artificial restriction on lio_listio's operation count
In r322258 I made p1003_1b.aio_listio_max a tunable. However, further
investigation shows that there was never any good reason for that limit to
exist in the first place. It's used in two completely different ways:

* To size a UMA zone, which globally limits the number of concurrent
  aio_suspend calls.

* To artifically limit the number of operations in a single lio_listio call.
  There doesn't seem to be any memory allocation associated with this limit.

This change does two things:

* Properly names aio_suspend's UMA zone, and sizes it based on a new constant.

* Eliminates the artifical restriction on lio_listio. Instead, lio_listio
  calls will now be limited by the more generous max_aio_queue_per_proc. The
  old p1003_1b.aio_listio_max is now an alias for
  vfs.aio.max_aio_queue_per_proc, so sysconf(3) will still work with
  _SC_AIO_LISTIO_MAX.

Reported by:	bde
Reviewed by:	jhb
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12120
2017-10-23 23:12:01 +00:00
asomers
010bb3a477 Fix the error message when creating a zpool on a too-small device
Don't check for SPA_MINDEVSIZE in vdev_geom_attach when opening by path.
It's redundant with the check in vdev_open, and failing to attach here
results in the wrong error message being printed.  However, still check for
it in some other situations:

* When opening by guids, so we don't get bogged down reading from slow
  devices like floppy drives.
* In vdev_geom_read_pool_label for the same reason, because we iterate over
  all providers.
* If the caller requests that we verify the guid, because then we'll have to
  read from the device before vdev_open verifies the size.

PR:		222227
Reported by:	Marie Helene Kvello-Aune <marieheleneka@gmail.com>
Reviewed by:	avg, mav
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D12531
2017-10-23 23:05:29 +00:00
shurd
8b51323439 Some cache related optimizations
1. prefetch 128 bytes of mbufs.
2. Re-order filling the pkt_info so cache stalls happen at the end
3. Define empty prefetch2cachelines() macro when the function isn't present.

Provides small performance improvments on some hardware

Reviewed by:	sbruno
Approved by:	sbruno (mentor)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12447
2017-10-23 20:50:08 +00:00
markj
a57e72e403 Remove resource_set_*() declarations from sys/bus.h.
The corresponding definitions were removed in r78135.

PR:		223189
Submitted by:	marc.priggemeyer@gmail.com
MFC after:	1 week
2017-10-23 16:02:48 +00:00
mjoras
d7ecc62316 Move clear_unrhdr to tmpfs_free_tmp.
Clearing the unr in tmpfs_unmount is not correct. In the case of
multiple references to the tmpfs mount (e.g. when there are lookup
threads using it) it will not be the one to finish tmpfs_free_tmp. In
those cases tmpfs_free_node_locked will be the final one to execute
tmpfs_free_tmp, and until then the unr must be valid.

Reported by:	pho
Approved/reviewed by:	rstone (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12749
2017-10-23 15:43:38 +00:00