Commit Graph

229593 Commits

Author SHA1 Message Date
Alexander Motin
ebd7699264 8909 8585 can cause a use-after-free kernel panic
illumos/illumos-gate@94ddd0900a

https://www.illumos.org/issues/8909:
There's a race condition that exists if `zil_free_lwb` races with either
`zil_commit_waiter_timeout` and/or `zil_lwb_flush_vdevs_done`.

Here's an example panic due to this bug:

> ::status
    debugging crash dump vmcore.0 (64-bit) from ip-10-110-205-40
    operating system: 5.11 dlpx-5.2.2.0_2017-12-04-17-28-32b6ba51fb (i86pc)
    image uuid: 4af0edfb-e58e-6ed8-cafc-d3e9167c7513
    panic message:
    BAD TRAP: type=e (#pf Page fault) rp=ffffff0010555970 addr=60 occurred in module "zfs" due to a NULL pointer dereference
    dump content: kernel pages only

> $c
    zio_shrink+0x12()
    zil_lwb_write_issue+0x30d(ffffff03dcd15cc0, ffffff03e0730e20)
    zil_commit_waiter_timeout+0xa2(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit_waiter+0xf3(ffffff03dcd15cc0, ffffff03d97ffcf8)
    zil_commit+0x80(ffffff03dcd15cc0, 9a9)
    zfs_write+0xc34(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    fop_write+0x5b(ffffff03dc38b140, ffffff0010555e60, 40, ffffff03e00fb758, 0)
    write+0x250(42, fffffd7ff4832000, 2000)
    sys_syscall+0x177()

If there's an outstanding lwb that's in `zil_commit_waiter_timeout`
waiting to timeout, waiting on it's waiter's CV, we must be sure not to
call `zil_free_lwb`. If we end up calling `zil_free_lwb`, then that LWB
may be freed and can result in a use-after-free situation where the
stale lwb pointer stored in the `zil_commit_waiter_t` structure of the
thread waiting on the waiter's CV is used.

A similar situation can occur if an lwb is issued to disk, and thus in
the `LWB_STATE_ISSUED` state, and `zil_free_lwb` is called while the
disk is servicing that lwb. In this situation, the lwb will be freed by
`zil_free_lwb`, which will result in a use-after-free situation when the
lwb's zio completes, and `zil_lwb_flush_vdevs_done` is called.

This race condition is prevented in `zil_close` by calling `zil_commit`
before `zil_free_lwb` is called, which will ensure all outstanding (i.e.
all lwb's in the `LWB_STATE_OPEN` and/or `LWB_STATE_ISSUED` states)
reach the `LWB_STATE_DONE` state before the lwb's are freed
(`zil_commit` will not return untill all the lwb's are
`LWB_STATE_DONE`).

Further, this race condition is prevented in `zil_sync` by only calling
`zil_free_lwb` for lwb's that do not have their `lwb_buf` pointer set.
All lwb's not in the `LWB_STATE_DONE` state will have a non-null value
for this pointer; the pointer is only cleared in
`zil_lwb_flush_vdevs_done`, at which point the lwb's state will be
changed to `LWB_STATE_DONE`.

This race is present in `zil_suspend`, leading to this bug.

At first glance, it would appear as though this would not be true
because `zil_suspend` will call `zil_commit`, just like `zil_close`, but
the problem is that `zil_suspend` will set the zilog's `zl_suspend`
field prior to calling `zil_commit`. Further, in `zil_commit`, if
`zl_suspend` is set, `zil_commit` will take a special branch of logic
and use `txg_wait_synced` instead of performing the normal `zil_commit`
logic.

This call to `txg_wait_synced` might be good enough for the data to
reach disk safely before it returns, but it does not ensure that all
outstanding lwb's reach the `LWB_STATE_DONE` state before it returns.
This is because, if there's an lwb "stuck" in
`zil_commit_waiter_timeout`, waiting for it's lwb to timeout, it will
maintain a non-null value for it's `lwb_buf` field and thus `zil_sync`
will not free that lwb. Thus, even though the lwb's data is already on
disk, the lwb will be left lingering, waiting on the CV, and will
eventually timeout and be issued to disk even though the write is
unnesseary.

So, after `zil_commit` is called from `zil_suspend`, we incorrectly
assume that there are not outstanding lwb's, and proceed to free all
lwb's found on the zilog's lwb list. As a result, we free the lwb that
will later be used `zil_commit_waiter_timeout`.

Reviewed by: John Kennedy <jwk404@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-01-21 23:12:38 +00:00
Alexander Motin
d252c97fc0 MFV r328225: 8603 rename zilog's "zl_writer_lock" to "zl_issuer_lock"
illumos/illumos-gate@cf07d3da99

https://www.illumos.org/issues/8603:
  To help make the ZIL's code more understandable, it was suggested that
  the zilog_t's "zl_writer_lock" field should be renamed to "zl_issuer_lock".

Reviewed by: C Fraire <cfraire@me.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-01-21 23:11:20 +00:00
Alexander Motin
d216de62d1 8603 rename zilog's "zl_writer_lock" to "zl_issuer_lock"
illumos/illumos-gate@cf07d3da99

https://www.illumos.org/issues/8603:
  To help make the ZIL's code more understandable, it was suggested that
  the zilog_t's "zl_writer_lock" field should be renamed to "zl_issuer_lock".
2018-01-21 23:04:25 +00:00
Alexander Motin
442814b7e7 MFV r328220: 8677 Open-Context Channel Programs
illumos/illumos-gate@a3b2868063

https://www.illumos.org/issues/8677
  We want to be able to run channel programs outside of synching context.
  This would greatly improve performance of channel program that just gather
  information, as we won't have to wait for synching context anymore.

  This feature should introduce the following:
  - A new command line flag in "zfs program" to specify our intention to
  run in open context.
  - A new flag/option within the channel program ioctl which selects the
  context.
  - Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
  - Documentation for the new feature in the manual pages.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
2018-01-21 23:02:05 +00:00
Poul-Henning Kamp
137a344c63 Rename rpi_pwm to bcm283x_pwm, and build it on armv[67] and arm64.
Truncate ratio if period is lowered.

Tested on Rpi2 and Rpi3.

Rpi3 requires DTB->DTS->edit->DTB hack
2018-01-21 21:27:41 +00:00
Eitan Adler
bdf16dd67c iconv: adding missing break
break is probably intended and correct,
but has no correctness implications due to is94 => is96

Reviewed by:	cem, jilles
Reported by:	swildner@DragonFlyBSD.org
MFC After:	1 week
2018-01-21 21:09:08 +00:00
Pedro F. Giffuni
3669918823 Define a new __alloc_size2 attribute to complement the exiting support.
At least on GCC7 calling __alloc_size(x) twice is not equivalent to
calling using the attribute once with two arguments. The later is the
documented use in GCC documentation so add a new alloc_size(n, x)
alternative to cover for the few places where it is used: basically:
calloc(3), reallocarray(3) and  mallocarray(9).

Submitted by:	Mark Millard
MFC after:	3 days
Reference:
http://docs.freebsd.org/cgi/mid.cgi?F227842D-6BE2-4680-82E7-07906AF61CD7
2018-01-21 20:27:47 +00:00
Alexander Motin
619fd3c317 8677 Open-Context Channel Programs
illumos/illumos-gate@a3b2868063

https://www.illumos.org/issues/8677
  We want to be able to run channel programs outside of synching context.
  This would greatly improve performance of channel program that just gather
  information, as we won't have to wait for synching context anymore.

  This feature should introduce the following:
  - A new command line flag in "zfs program" to specify our intention to
  run in open context.
  - A new flag/option within the channel program ioctl which selects the
  context.
  - Appropriate error handling whenever we try a channel program in
  open-context that contains zfs.sync* expressions.
  - Documentation for the new feature in the manual pages.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author:	Serapheim Dimitropoulos <serapheim@delphix.com>
2018-01-21 19:26:38 +00:00
Edward Tomasz Napierala
1ee5bed72a Add missing manufacturer/serial number string descriptors.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-01-21 17:31:31 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Andriy Gapon
e09304d8f3 zfs: no need to check that size of zfs_cmd_t is not greater than IOCPARM_MAX
Nowadays we do not pass zfs_cmd_t directly through the ioctl interface.
Instead a small zfs_iocparm_t object is passed and the command is
explicitly copied in and out.  So, the check has become irrelevant.

MFC after:	3 weeks
Sponsored by:	Panzura
2018-01-21 11:19:18 +00:00
Eitan Adler
7275057a5e limits(1): fix always true condition
Reviewed by:	imp
MFC After:	1 week
2018-01-21 08:48:26 +00:00
Kyle Evans
4f8f1c798e regex(3): Resolve issues with higher WARNS levels
libc is set for WARNS=2, but the incoming libregex will use WARNS=6.
Sprinkle some casts and (void)bc's to alleviate the warnings that come along
with the higher WARNS level.

These 'bc' parameters could be outright removed, but as of right now they
will be used in some parts of libregex land. Silence the warnings instead
rather than flip-flopping.
2018-01-21 04:57:29 +00:00
Eitan Adler
7e2c40ffa9 termcap: add xterm-termite
Obtained from:	DragonFly
MFC After:	1 week
2018-01-20 22:24:45 +00:00
Jean-Sébastien Pédron
90b0eb9b4a psm: Log syncmask[1], not syncmask[0] twice
MFC after:	1 week
2018-01-20 19:04:21 +00:00
Eitan Adler
5a9fdfb303 limits(1): add missing break
Reported by:	swildner@DragonFlyBSD.org
MFC After:	1 week
2018-01-20 18:27:00 +00:00
Konstantin Belousov
c398c14664 Use correct symbol name in r328202.
Sponsored by:	The FreeBSD Foundation
MFC after:	11 days
2018-01-20 18:05:14 +00:00
Konstantin Belousov
3a5e472e17 Use predefined symbol for the CR3.PCID mask.
Sponsored by:	The FreeBSD Foundation
MFC after:	11 days
2018-01-20 17:46:09 +00:00
Michal Meloun
f8759facd2 Convert extres/phy to kobj model.
Similarly as other extres pseudo-drivers, implement phy by using kobj model.
This detaches it from provider device, so single device driver can export
multiple different phys. Additionally, this  allows phy to be subclassed to
more specialized drivers, like is USB OTG phy, or PCIe phy with hot-plug
capability.

Tested by:	manu (previous version, on Allwinner board)
MFC after:	1 month
2018-01-20 17:02:17 +00:00
Li-Wen Hsu
08c80489cd Silence the gcc warning: 'op' may be used uninitialized in this function
Approved by:	kevans
2018-01-20 15:37:47 +00:00
Roger Pau Monné
50a53194f6 xen: fix IDT setup after PTI
On amd64 the IDT handler was not set correctly when using PTI.

While there also fix the selectors to SEL_KPL.

Obtained from:	kib
MFC with:	r328083
2018-01-20 14:59:37 +00:00
Emmanuel Vadot
2b541904af clk: Get new parent freq after set_freq
During set_freq a clknode might have reparent (using a better parent that
have a higher frequency for example), before refreshing the cache, re-get
the parent frequency.

Reviewed by:	mmel
2018-01-20 14:47:27 +00:00
Edward Tomasz Napierala
ad2c142415 Remove unused index.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-01-20 14:05:55 +00:00
Edward Tomasz Napierala
42e42a9aa4 Add missing SPDX tags; the rest of the license text is the same as in other
USB templates.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-01-20 14:03:55 +00:00
Edward Tomasz Napierala
83d02512b4 Add usb_template(4) to RPI-B kernel config. This is to support the USB OTG
functionality on Raspberry Pi 0.

Reviewed by:	hselasky@
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D13924
2018-01-20 14:00:07 +00:00
Edward Tomasz Napierala
8e06491a4e Add sysctls to control device side USB identifiers. This makes it
possible to change string and numeric vendor and product identifiers,
as well as anything else there might be to change for a particular
device side template, eg the MAC address.

Reviewed by:	hselasky@
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D13920
2018-01-20 13:58:34 +00:00
Konstantin Belousov
1c5196c3ed Assign map->header values to avoid boundary checks.
In several places, entry start and end field are checked, after
excluding the possibility that the entry is map->header.  By assigning
max and min values to the start and end fields of map->header in
vm_map_init, the explicit map->header checks become unnecessary.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	alc, kib, markj (previous version)
Tested by:	pho (previous version)
MFC after:	1 week
Differential Revision:  https://reviews.freebsd.org/D13735
2018-01-20 12:19:02 +00:00
Jean-Sébastien Pédron
b9b766ae47 psm: Don't try to detect trackpoint packets if the Elantech device has none
This fixes a panic when `EVDEV_SUPPORT` is enabled: if a trackpoint
packet was detected but there was no trackpoint, we still tried to emit an
evdev event even though the associated relative evdev device (`evdev_r`)
was not initialized.

PR:		225339
MFC after:	1 week
2018-01-20 11:21:22 +00:00
Jean-Sébastien Pédron
9402bd96df psm: Skip sync check when PSM_CONFIG_NOCHECKSYNC is set
In psmprobe(), we set the initial `syncmask` to the vendor default value
if the `PSM_CONFIG_NOCHECKSYNC` bit is unset. However, we currently only
set it for the Elantech touchpad later in psmattach(), thus `syncmask`
is always configured.

Now, we check `PSM_CONFIG_NOCHECKSYNC` and skip sync check if it is set.
This fixes Elantech touchpad support for units which have `hascrc` set.

To clarify that, when we log the `syncmask` and `syncbits` fields, also
mention if they are actually used.

Finally, when we set `PSM_CONFIG_NOCHECKSYNC`, clear `PSM_NEED_SYNCBITS`
flag.

PR:		225338
MFC after:	1 week
2018-01-20 11:02:18 +00:00
Kyle Evans
450c7a04f7 od(1): Fix mis-patch from r328188
od_test.sh got duplicated erroneously when it was added in r328188. Dedup.
2018-01-20 03:50:56 +00:00
Kyle Evans
6bb764934c od(1): Fix wrong output for some corner cases in multibyte locales.
Restore the original character to print if we used the look-ahead
buffer, but that didn't help -- we either got an illegal sequence
or still can't complete.

PR:		224552
Submitted by:	Yuri Pankov
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D13963
2018-01-20 02:49:32 +00:00
Landon J. Fuller
f248a99a0e bhnd_chipc(4): Fix leak of child device ivars by explicitly deleting
any children prior to detach.

With the newbus child deletion ordering changes introduced in r307518,
parent devices are now detached (and their driver set to NULL) prior to
detaching and deleting child devices; child-related bus methods (e.g.
BUS_CHILD_DETACHED, BUS_CHILD_DELETED) are no longer be dispatched to the
parent device driver after it returns 0 (success) from DEVICE_DETACH.

Sponsored by:   The FreeBSD Foundation
2018-01-20 01:55:34 +00:00
Landon J. Fuller
a225321fbc bhnd/bwn(4): Define a bhnd(4) softmodem device class for the v.90 modem
codec core, and mark the core as unpopulated on all BCM4306 bwn(4) devices.

Sponsored by:	The FreeBSD Foundation
2018-01-19 22:43:08 +00:00
Landon J. Fuller
72759b4824 bwn(4): Add missing BCM4306 PCI IDs.
Sponsored by:	The FreeBSD Foundation
2018-01-19 22:37:48 +00:00
Landon J. Fuller
aea6ca2486 bwn(4): Fix DMA translation lookup on devices limited to 30-bit host
addressing. The host addressing constraint does not apply to device address
space, and shouldn't be passed to bhnd_get_dma_translation() as the
maximum supported device address width.

Sponsored by:	The FreeBSD Foundation
2018-01-19 22:33:25 +00:00
Landon J. Fuller
711221efd8 bhndb_pci(4): Implement bridge support for CardBus-attached devices.
- Extend the probe method to accept devclasses that inherit from the pci
   devclass (e.g. cardbus).
 - Some BCM4306-based CardBus adapters appear to advertise 4K SPROM, but
   only the first 2K is mapped into BAR0. We can safely assume that the
   SPROM data fits within the first 2K of the SPROM, rather than rejecting
   the SPROM mapping as invalid.

Sponsored by:	The FreeBSD Foundation
2018-01-19 22:22:02 +00:00
Nathan Whitehorn
7790e46cf0 On AIM systems without a software-managed SLB, such as POWER9 systems using
either hardware segment tables or radix-tree-based page tables, do not try
to install SLB entries at trap boundaries.
2018-01-19 22:19:50 +00:00
Nathan Whitehorn
ad6b97e7ca Define PHYS_TO_DMAP() and DMAP_TO_PHYS() as panics on the architectures
(i386 and arm) that never implement them. This allows the removal of
#ifdef PHYS_TO_DMAP on code otherwise protected by a runtime check on
PMAP_HAS_DMAP. It also fixes the build on ARM and i386 after I forgot an
#ifdef in r328168.

Reported by:	Milan Obuch
Pointy hat to:	me
2018-01-19 22:17:13 +00:00
Konstantin Belousov
b4dfc9d7ad PTI: Trap if we returned to userspace with kernel (full) page table
still active.

Map userspace portion of VA in the PTI kernel-mode page table as
non-executable. This way, if we ever miss reloading ucr3 into %cr3 on
the return to usermode, the process traps instead of executing in
potentially vulnerable setup.  Catch the condition of such trap and
verify user-mode %cr3, which is saved by page fault handler.

I peek this trick in some article about Linux implementation.

Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
DIfferential revision:	https://reviews.freebsd.org/D13956
2018-01-19 22:10:29 +00:00
Landon J. Fuller
1c8edc1ed8 bhnd(4): fix a few bugs in pwrctl/fixed-clock device support.
- Do not panic on siba(4) detach when the bhnd(4) bus calls
   bhnd_get_pmu_info() on a PMU-less device.
 - Fix bhnd_pwrctl attach/detach on fixed-clock devices:
    - Treat bhnd_pwrctl_updateclk() as a no-op on fixed-clock devices.
    - Use bhnd_pwrctl_updateclk() to perform the appropriate clock
      transition on detach.

Sponsored by:	The FreeBSD Foundation
2018-01-19 21:58:48 +00:00
Landon J. Fuller
5ad00fa2cf bhnd_chipc(4): Fix the assignment of non-wildcard child unit numbers
introduced in r326102 and r326109; all chipc children should be added with
a wildcard unit (-1).

Sponsored by:	The FreeBSD Foundation
2018-01-19 21:36:28 +00:00
Kyle Evans
ca84c67cd0 dtc(1): Update to upstream ea3c233
Highlights of this update:
- /__local_fixups__ is now generated to be GPL dtc and libfdt compliant
- Compiling with -@ will now cause dtc to assign phandles to all labelled
  nodes
- /include/ and /incbin/ now handle absolute paths correctly
- The manpage now has information about overlays, including how to apply
  them and how to generate them
- Syntactic sugar for overlays is now supported, allowing an overlay DTS
  like:

=
/dts-v1/;
/plugin/;

&foo {
    foo,status = "okay";
};
=

to generate a fragment targetting <&foo>.
2018-01-19 21:20:24 +00:00
Andrey V. Elsukov
7e03e18104 Rename "index" variable to "idx" since gcc complains that it shadows
index(3) function declaration.

Reported by:	lwhsu
MFC after:	2 weeks
2018-01-19 20:33:47 +00:00
Scott Long
c35fca48b3 Fix compile errors in r328165
Reported by:	O. Hartmann
Sponsored by:	Netflix
2018-01-19 19:18:14 +00:00
Warner Losh
cec2b30645 Remove extra copy of bootinfo.c. It's a bit rotted copy of the one in
efi/loader.

Differential Review: https://reviews.freebsd.org/D13986
2018-01-19 19:09:17 +00:00
Nathan Whitehorn
9a8196ce19 Remove SFBUF_OPTIONAL_DIRECT_MAP and such hacks, replacing them across the
kernel by PHYS_TO_DMAP() as previously present on amd64, arm64, riscv, and
powerpc64. This introduces a new MI macro (PMAP_HAS_DMAP) that can be
evaluated at runtime to determine if the architecture has a direct map;
if it does not (or does) unconditionally and PMAP_HAS_DMAP is either 0 or
1, the compiler can remove the conditional logic.

As part of this, implement PHYS_TO_DMAP() on sparc64 and mips64, which had
similar things but spelled differently. 32-bit MIPS has a partial direct-map
that maps poorly to this concept and is unchanged.

Reviewed by:		kib
Suggestions from:	marius, alc, kib
Runtime tested on:	amd64, powerpc64, powerpc, mips64
2018-01-19 17:46:31 +00:00
Ed Maste
b3327f62f0 Enable KPTI by default on amd64 for non-AMD CPUs
Kernel Page Table Isolation (KPTI) was introduced in r328083 as a
mitigation for the 'Meltdown' vulnerability.  AMD CPUs are not affected,
per https://www.amd.com/en/corporate/speculative-execution:

    We believe AMD processors are not susceptible due to our use of
    privilege level protections within paging architecture and no
    mitigation is required.

Thus default KPTI to off for AMD CPUs, and to on for others.  This may
be refined later as we obtain more specific information on the sets of
CPUs that are and are not affected.

Submitted by:	Mitchell Horne
Reviewed by:	cem
Relnotes:	Yes
Security:	CVE-2017-5754
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D13971
2018-01-19 15:42:34 +00:00
Scott Long
19641ce893 Revert ABI breakage to CAM that came in with MMC/SD support in r320844.
Make it possible to retrieve mmc parameters via the XPT_GET_ADVINFO
call instead.  Convert camcontrol to the new scheme.

Reviewed by:	imp. kibab
Sponsored by:	Netflix
Differential Revision:	D13868
2018-01-19 15:32:27 +00:00
Pedro F. Giffuni
7326b4e68c libnv: Use mallocarray(9) for the nv_calloc. 2018-01-19 14:50:53 +00:00
Hans Petter Selasky
77e7601e6d Add new USB ID to U3G driver.
PR:		134299
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-01-19 13:06:36 +00:00