256770 Commits

Author SHA1 Message Date
scottl
b0d1efe0ae Copy and clear the reply descriptor atomically. This prevents concurrency
in the interrupt handlers (usually due to timeout/error recovery) from
seeing and processing the same descriptor twice.
2018-12-09 06:10:11 +00:00
scottl
050d2d6c93 Remove the mps driver from powerpc 32bit GENERIC, and don't build it and
mpr as a module for powerpc or mips.  An upcoming commit will cause these
drivers to rely on the presence of 64bit atomic operations.  Discussed
with jhibbits.
2018-12-09 06:06:06 +00:00
jhibbits
f6d390617b powerpc/SPE: Copy lower part of source register to target for efdabs/efdnabs/efdneg
MFC after:	1 week
MFC With:	r341751
2018-12-09 04:54:55 +00:00
jhibbits
2cbeab925f powerpc/SPE: Reload vector registers after efdabs/efdnabs/efdneg
While here, also style(9)-adjust indents around this code.
2018-12-09 04:13:14 +00:00
sobomax
b3480b6b3e Hook up ng_checksum(4) module and appropriate manpage to the build. The module
was added back in 2016, but has never been connected.

MFC after:	1 week
2018-12-09 02:58:53 +00:00
kib
ea29016877 Fix PAE boot.
With the introduction of M_EXEC support for kmem_malloc(), some kernel
mappings start having NX bit set in the paging structures early, for
PAE kernels on machines with NX support, i.e. practically on all
machines.  In particular, AP trampoline and initialization needs to
access pages which translations has NX bit set, before initializecpu()
is called.

Check for CPUID NX feature and enable EFER.NXE before we enable paging
in mp boot trampoline.  This allows the CPU to use the kernel page
table instead of generating page fault due to reserved bit set.

PR:	233819
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-12-08 22:12:57 +00:00
jchandra
5a913206e0 arm64: add ACPI based NUMA support
Use the newly defined SRAT/SLIT parsing APIs in arm64 to support
ACPI based NUMA.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D17943
2018-12-08 19:42:01 +00:00
jchandra
6fd503bd86 acpica: support parsing of arm64 affinity in acpi_pxm.c
ACPI SRAT table on arm64 uses GICC entries to provide CPU locality
information. These entries use an AcpiProcessorUid to identify the
CPU (unlike on x86 where the entries have an APIC ID).

Update acpi_pxm.c to extend the cpu_add/cpu_find/cpu_get_info
functions to handle AcpiProcessorUid. Use the updated functions
while parsing ACPI_SRAT_GICC_AFFINITY entry for arm64.

Also update sys/conf/files.arm64 to build acpi_pxm.c when ACPI is
enabled.

Reviewed by:	markj (previous version)
Differential Revision:	https://reviews.freebsd.org/D17942
2018-12-08 19:32:23 +00:00
jchandra
2d1461899d acpica : move SRAT/SLIT parsing to sys/dev/acpica
This moves the architecture independent parts of sys/x86/acpica/srat.c
to sys/dev/acpica/acpi_pxm.c, to be used later on arm64. The function
declarations are moved to sys/dev/acpica/acpivar.h

We also need to update sys/conf/files.{i386,amd64} to use the new file.
No functional changes.

Reviewed by:	markj, imp
Differential Revision:	https://reviews.freebsd.org/D17941
2018-12-08 19:10:58 +00:00
jchandra
acaf867c57 x86/acpica/srat.c: Add API for parsing proximity tables
The SLIT and SRAT ACPI tables needs to be parsed on arm64 as well, on
systems that use UEFI/ACPI firmware and support NUMA. To do this, we
need to move most of the logic of x86/acpica/srat.c to dev/acpica and
provide an API that architectures can use to parse and configure ACPI
NUMA information.

This commit adds the API in srat.c as a first step, without making any
functional changes. We will move the common code to sys/dev/acpica
as the next step.

The functions added are:
  * int acpi_pxm_init(int ncpus, vm_paddr_t maxphys) - to allocate and
    initialize data structures used
  * void acpi_pxm_parse_tables(void) - parse SRAT/SLIT, save the cpu and
    memory proximity information
  * void acpi_pxm_set_mem_locality(void) - use the saved data to set
    memory locality
  * void acpi_pxm_set_cpu_locality(void) - use the saved data to set cpu
    locality
  * void acpi_pxm_free(void) - free data structures allocated by init

On arm64, we do not have an cpu APIC id that can be used as index to
store CPU data, we need to use the Processor Uid. To help with this,
define internal functions cpu_add, cpu_find, cpu_get_info to store
and get CPU proximity information.

Reviewed by:	markj, jhb (previous version)
Differential Revision:	https://reviews.freebsd.org/D17940
2018-12-08 18:34:05 +00:00
mmel
73763781e7 Implement R_AARCH64_TLS_DTPMOD64 and A_AARCH64_TLS_DTPREL64 relocations.
Although these are slightly obsolete in favor of R_AARCH64_TLSDESC,
gcc -mtls-dialect=trad still use them.

Please note that definition of TLS_DTPMOD64 and TLS_DTPREL64 are incorrectly
exchanged in GNU binutils. TLS_DTPREL64 should be encoded to 1028 (as is
defined in ARM ELF ABI) but binutils encode it to 1029. And vice versa,
TLS_DTPMOD64 should be encoded to 1029 but binutils encode it to 1028.

While I'm in, add also R_AARCH64_NONE. It can be produced as result of linker
relaxation.

MFC after:	1 week
2018-12-08 14:58:17 +00:00
mjg
d21951d547 umtx: avoid umtxshm locking on object termination if possible
Sample build world result on tmpfs:
kern.ipc.umtx_terminate_notempty: 0
kern.ipc.umtx_terminate_empty: 2891815

Sponsored by:	The FreeBSD Foundation
2018-12-08 14:04:57 +00:00
vmaffione
43ef1e7712 tools: netmap: pkt-gen: check packet length against interface MTU
Validate the value of the -l argument (packet length) against the MTU of the netmap port.
In case the netmap port does not refer to a physical interface (e.g. VALE port or pipe), then
the netmap buffer size is used as MTU.
This change also sets a better default value for the -M option, so that pkt-gen uses
the largest possible fragments in case of multi-slot packets.

Differential Revision:	https://reviews.freebsd.org/D18436
2018-12-08 12:52:09 +00:00
jilles
40107bb258 sh(1): Remove -c string from set builtin documentation
Altering the -c string at run time does not make sense and is not possible.

MFC after:	1 week
2018-12-08 12:49:19 +00:00
mjg
bae6f9dc2d Remove proctree acquire from note_procstat_proc
It is not needed since r340482 ("proc: always store parent pid in p_oppid")

Sponsored by:	The FreeBSD Foundation
2018-12-08 11:38:39 +00:00
mjg
59185429c4 Fix a corner case in ID bitmap management.
If all IDs from trypid to pid_max were used as pids, the code would enter
a loop which would be infinite if none of the IDs could become free (e.g.
they all belong to processes which did not transitioned to zombie).

Fixes:	r341684 ("Manage process-related IDs with bitmaps")

Sponsored by:	The FreeBSD Foundation
2018-12-08 10:22:12 +00:00
mjg
c2763443b4 proc: postpone proc unlock until after reporting with kqueue
kqueue would always relock immediately afterwards.

While here drop the NULL check for list itself. The list is
always allocated.

Sponsored by:	The FreeBSD Foundation
2018-12-08 06:34:12 +00:00
mjg
af8321b07f proc: handle sdt exit probe before taking the proc lock
Sponsored by:	The FreeBSD Foundation
2018-12-08 06:31:43 +00:00
mjg
c0fc2aadad Provide SDT_PROBES_ENABLED macro.
Sponsored by:	The FreeBSD Foundation
2018-12-08 06:30:41 +00:00
mjg
4f7d169e9a amd64: stop re-reading curpc on subyte/suword
Originally read value is still safely kept. Re-reading code was there
for previous iterations which were partially shared with i386.

Sponsored by:	The FreeBSD Foundation
2018-12-08 04:53:08 +00:00
kib
fa48020659 Simplify kern_readlink_vp().
When we detected that the vnode is not symlink, return immediately.
This moves the readlink code out of else branch and unindents it.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-12-07 23:07:51 +00:00
kib
52dbc76ca2 Fix expression evaluation.
Braces were put in the wrong place, causing failing EAGAIN check to
return zero result.  Remove the problematic assignment from the
conditional expression at all.

While there, remove used once variable vp, and wrap too long line.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-12-07 23:05:12 +00:00
imp
b517a81b5f Even though they are reserved, cdw2 and cdw3 can be set via nvme-cli
(and soon nvmecontrol). Go ahead and copy them into rsvd2 and rsvd3.

Sponsored by: Netflix
2018-12-07 21:58:08 +00:00
imp
f3187c309c Add nda(4) cross reference to nvme(4) 2018-12-07 21:57:39 +00:00
mav
e289217df8 Make virtio-scsi pass SCSI Task Attributes to CTL.
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2018-12-07 20:55:29 +00:00
mav
c4b30791ff Fix several iov handling bugs in bhyve virtio-scsi backend.
- buf_to_iov() does not use buflen parameter, allowing out of bound read.
 - buf_to_iov() leaks memory if seek argument > 0.
 - iov_to_buf() doesn't need to reallocate buffer for every segment.
 - there is no point to use size_t for iov counts, int is more then enough.
 - some iov function arguments can be constified.
 - pci_vtscsi_request_handle() used truncate_iov() incorrectly, allowing
   getting out of buffer and possibly corrupting data.
 - pci_vtscsi_controlq_notify() written returned status at wrong offset.
 - pci_vtscsi_controlq_notify() leaked one buffer per event.

Reported by:	wg
Reviewed by:	araujo
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D18465
2018-12-07 20:30:00 +00:00
mav
cc0df1e404 Fill initid explicitly on requests.
Unfortunately ctl_scsi_zero_io() wipes that field, so it was always zero.
While there, targ_port is set by kernel, so user-space should not fill it.

MFC after:	1 week
2018-12-07 19:10:51 +00:00
emaste
2e02c23111 BSD.debug.dist: add newly added nvmecontrol directory 2018-12-07 16:52:52 +00:00
mjg
69fc5b56ce fd: use racct_set_unlocked
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:51:38 +00:00
mjg
74492ad8e4 racct: add RACCT_ENABLED macro and racct_set_unlocked
This allows to remove PROC_LOCK/UNLOCK pairs spread thorought the kernel
only used to appease racct_set.

Sponsored by:	The FreeBSD Foundation
2018-12-07 16:47:34 +00:00
mjg
76d3335601 fd: try do less work with the lock in dup
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:44:52 +00:00
mjg
5d5736ca94 vm: use fcmpset for vmspace reference counting
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:22:54 +00:00
mjg
10f6c42acc Replace hand-rolled unrefs if > 1 with refcount_release_if_not_last
Sponsored by:	The FreeBSD Foundation
2018-12-07 16:11:45 +00:00
mjg
6d80602477 refcount: remove a stale comment about conditional ref/unref routines
It was there for vfs-specific variants and was copied over when it should not.

Sponsored by:	The FreeBSD Foundation
2018-12-07 16:10:13 +00:00
avg
72442df1f5 acpi_MatchHid: use ACPI_MATCHHID_NOMATCH instead of FALSE
Binary representation of both is the same (zero), but
ACPI_MATCHHID_NOMATCH is better for consistency.

MFC after:	4 days
X-MFC with:	r339754
2018-12-07 16:05:39 +00:00
avg
ea228066e0 aibs: fix a typo in the probe method that was introduced in r339754
Because of that typo the driver would try to attach to every device
on acpi bus.  That disrupted acpi attachment of uart driver, at least.

MFC after:	4 days
X-MFC with:	r339754
2018-12-07 16:01:51 +00:00
markj
7a0ac26a7e Update the description of the address space layout on RISC-V.
This adds more detail and fixes some inaccuracies.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18463
2018-12-07 15:56:40 +00:00
markj
a61c5fb063 Rename sptbr to satp per v1.10 of the privileged architecture spec.
Add a subroutine for updating satp, for use when updating the
active pmap.  No functional change intended.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18462
2018-12-07 15:55:23 +00:00
markj
c2425a682c Let the cap_syslog capability inherit stdio descriptors.
Otherwise cap_openlog(LOG_PERROR) doesn't work.

Reviewed by:	oshogbo
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18457
2018-12-07 15:52:50 +00:00
kib
0f30bb9a10 Regen. 2018-12-07 15:19:00 +00:00
kib
48d91fd889 Add new file handle system calls.
Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2).  The
syscalls are provided for a NFS userspace server (nfs-ganesha).

Submitted by:	Jack Halford <jack@gandi.net>
Sponsored by:	Gandi.net
Tested by:	pho
Feedback from:	brooks, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D18359
2018-12-07 15:17:29 +00:00
mjg
73f834bfbb proc: when exiting move to zombproc before taking proctree
The kernel was already doing this prior to r329615. It was changed
to reduce contention on allproc. However, introduction of pidhash
locks and removal of proctree -> allproc ordering from fork thanks
to bitmaps fixed things enough to make this change pessimal.

waitpid takes proctree on each call and this change (now) causes
avoidable stalls if allproc is held.

Sponsored by:	The FreeBSD Foundation
2018-12-07 12:32:25 +00:00
mjg
6b8dd99954 Manage process-related IDs with bitmaps
Currently unique pid allocation on fork often requires a full walk of
process, group, session lists to make sure it is not used by anything.
This has a side effect of requiring proctree to be held along with allproc,
which adds more contention in poudriere -j 128.

The patch below implements trivial bitmaps which gets rid of the problem.
Dedicated lock is introduced to manage IDs.

While here a bug was discovered: all processes would inherit reap id from
the first process spawned by init. This had a side effect of keeping the
ID used and when allocation rolls over to the beginning it keeps being
skipped.

The patch is loosely based on initial work by mjoras@.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
2018-12-07 12:22:32 +00:00
mjg
a1ba958dc0 Annotate Giant drop/pickup macros with __predict_false
They are used in important places of the kernel with the lock not being held
majority of the time.

Sponsored by:	The FreeBSD Foundation
2018-12-07 12:06:03 +00:00
mjg
c54695512c unr64: use locked variant if not __LP64__
The current ifdefs are not sufficient to distinguish 32- and 64- bit
variants, which results e.g. in powerpc64 not using atomics.

While some 32-bit archs provide 64-bit atomics, there is no huge advantage
of using them on these platforms.

Reported by:	many
Suggested by:	jhb
Sponsored by:	The FreeBSD Foundation
2018-12-07 12:05:11 +00:00
avg
c1fb81f798 daprobedone: announce if a disk is write-protected
MFC after:	2 weeks
2018-12-07 12:02:31 +00:00
vmaffione
ab139270f3 netmap: remove dead code obsoleted by iflib
The iflib subsystem implements netmap support in a driver-independent
way (sys/net/iflib.c). We can therefore remove the headers that
used to implement netmap support for all the drivers now supported
by iflib (em, igb, ixl, ixgbe, lem).

MFC after:	1 week
2018-12-07 11:47:42 +00:00
mmel
4b670ed804 Fix cut&paste typo in atomic_fetchadd_64().
Reported by:	Jia-Shiun Li <jiashiun@gmail.com>
MFC after:	1 week
2018-12-07 11:10:27 +00:00
pjd
3c25eec2c3 Consider the following situation:
The sender has .not_terminated file. It gets disconnected. The last trail
file is then terminated without adding new data (this can happen for example
when auditd is being stopped on the sender). After reconnect the .not_terminated
was not renamed on the receiver as it should.

We were already handling similar situation where the sender crashed and the
.not_terminated trail file was renamed to .crash_recovery. Extend this case to
handle the situation above.
2018-12-07 03:13:36 +00:00
cem
42d84ef531 gmirror: Evaluate mirror components against newest metadata copy
Re-apply r341665 with format strings fixed.

If we happen to taste a stale mirror component first, don't reject valid,
newer components that have differing metadata from the stale component
(during STARTING).  Instead, update our view of the most recent metadata as
we taste components.

Like mediasize beforehand, remove some checks from g_mirror_check_metadata
which would evict valid components due to metadata that can change over a
mirror's lifetime.  g_mirror_check_metadata is invoked long before we check
genid/syncid and decide which component(s) are newest and whether or not we
have quorum.

Before checking if we can enter RUNNING (i.e., we have quorum) after a NEW
component is added, first remove any known stale or inconsistent disks from
the mirrorset, rather than removing them *after* deciding we have quorum.
Check if we have quorum after removing these components.

Additionally, add a knob, kern.geom.mirror.launch_mirror_before_timeout, to
force gmirrors to wait out the full timeout (kern.geom.mirror.timeout)
before transitioning from STARTING to RUNNING.  This is a kludge to help
ensure all eligible, boot-time available mirror components are tasted before
RUNNING a gmirror.

Add a basic test case for STARTING -> RUNNING startup behavior around stale
genids.

PR:		232671, 232835
Submitted by:	Cindy Yang <cyang AT isilon.com> (previous version)
Reviewed by:	markj (kernel portions)
Discussed with:	asomers, Cindy Yang
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D18062
2018-12-07 02:44:04 +00:00