Commit Graph

229918 Commits

Author SHA1 Message Date
Mark Johnston
1d3a1bcfac Dequeue wired pages lazily.
Previously, wiring a page would cause it to be removed from its page
queue. In the common case, unwiring causes it to be enqueued at the tail
of that page queue. This change modifies vm_page_wire() to not dequeue
the page, thus avoiding the highly contended page queue locks. Instead,
vm_page_unwire() takes care of requeuing the page as a single operation,
and the page daemon dequeues wired pages as they are encountered during
a queue scan to avoid needlessly revisiting them later. For pages in
PQ_ACTIVE we do even better, since a requeue is unnecessary.

The change improves scalability for some common workloads. For instance,
threads wiring pages into the buffer cache no longer need to modify
global page queues, and unwiring is usually done by the bufspace thread,
so concurrency is not as much of an issue. As another example, many
sysctl handlers wire the output buffer to avoid faults on copyout, and
since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid
modifying the page queue in this case.

The change also adds a block comment describing some properties of
struct vm_page's reference counters, and the busy lock.

Reviewed by:	jeff
Discussed with:	alc, kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D11943
2018-02-07 16:57:10 +00:00
Warner Losh
207efdb345 Add a note about why we have the conditional before including
bsd.compiler.mk. It's so fmake from older 9.x systems still
works (still a supported build config, and having the note here
will let us know when we can cull it more easily).

Also pull in a related change from include to sinclude from
arichardson@'s cross building work, as well as it's companion in
Makefile.inc1 with a note about why we do the odd thing there.

Submitted by: archardson
Differential Revision: https://reviews.freebsd.org/D14241
2018-02-07 16:28:26 +00:00
Ed Maste
4816408016 add retpoline compiler and linker feature flags
These features indicate that the compiler and linker support the
retpoline speculative execution vulnerability (CVE-2017-5715)
mitigation.

Reviewed by:	dim, imp
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D14228
2018-02-07 14:50:06 +00:00
Hans Petter Selasky
5b7cc89266 Fix implementation of ktime_add_ns() and ktime_sub_ns() in the LinuxKPI to
actually return the computed result instead of the input value.

This is a regression issue after r289572.

Found by:	gcc6
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-02-07 12:12:06 +00:00
Adrian Chadd
f6ede6300f [ath] Use the BSSID address logic for STA VAPs too.
For DWDS VAPs on ath(4) we need to ensure that the STA vap and hostap VAP
have different MAC addresses.  If the STA code path doesn't utilise the
address assign / reclaim path then it doesn't update the bitmap with which
address was allocated.

This should fix a bunch of corner issues I've been seeing with DWDS STA + AP
VAPs that I was working around with manual MAC address assignment.
2018-02-07 09:37:22 +00:00
Adrian Chadd
037fb51a2e [ar71xx] Fix the TL-wdr3600/tl-wdr4300 hints in the new world order.
Tested:

* tl-wdr4300
2018-02-07 09:35:47 +00:00
Kyle Evans
87fb7f5b58 if_awg: Skip emac reset if configured for internal PHY
On the OrangePi One at least, emac reset when an ethernet cable is not
plugged in seems to break ethernet. Soft reset will fail, even with
increasing the delay and retries to wait for up to 20 seconds. This can be
reproduced across at least two different OrangePi One's by simply leaving
ethernet cable unplugged when awg attaches. Whether it's plugged in or not
through u-boot process makes no difference.

Skipping the reset in this configuration doesn't seem to cause any problems,
tried across many many reboots with and without ethernet cable plugged in.

Tested on:	OrangePi One
Tested on:	Other boards (manu)
Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D13974
2018-02-07 01:54:13 +00:00
Warner Losh
c4b72d8b37 Keep a counter for number of requests completed with an error.
Sponsored by: Netflix
2018-02-06 23:21:08 +00:00
Conrad Meyer
6e876d695e fsync.2: Cross-reference fsync(1)
Reported by:	rpokala
Sponsored by:	Dell EMC Isilon
2018-02-06 23:12:47 +00:00
Warner Losh
3182fd63c4 Avoid find -s, use find | sort instead.
find -s was introduced to make the metalog more
deterministic. However, find -s is not portable. find | sort is
portable and accomplishes the same goals, even if it isn't
pedantically the same. TZS is the same before / after the change so
any fussy differences between the two are moot and there won't be
METALOG churn across this change.

Differential Revision: https://reviews.freebsd.org/D14231
2018-02-06 23:12:16 +00:00
Pedro F. Giffuni
7cbd6d338e {ext2|ufs}_readdir: Avoid setting negative ncookies.
ncookies cannot be negative or the allocator will fail. This should only
happen if a caller is very broken but we can still try to survive the
event.

We should probably also verify for uio_resid > MAXPHYS but in that case
it is not clear that just clipping the ncookies value is an adequate
response.

MFC after:	2 weeks
2018-02-06 22:38:19 +00:00
Ian Lepore
ce75945d3c Use const pointers for input data not modified by clock utility functions. 2018-02-06 22:17:01 +00:00
Gleb Smirnoff
d2be4a1e4f Use correct arithmetic to calculate how many pages we need for kegs
and hashes.  There is no functional change with current sizes.
2018-02-06 22:13:40 +00:00
Jeff Roberson
e2068d0bcd Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
Gleb Smirnoff
1616767dfc Improve DIAGNOSTIC printf. Report using a boot page every time regardless
of booted status.
2018-02-06 22:08:43 +00:00
Gleb Smirnoff
ae941b1b4e Fix boot_pages calculation for machines that don't have UMA_MD_SMALL_ALLOC.
o Call uma_startup1() after initializing kmem, vmem and domains.
o Include 8 eight VM startup pages into uma_startup_count() calculation.
o Account for vmem_startup() and vm_map_startup() preallocating pages.
o Account for extra two allocations done by kmem_init() and vmem_create().
o Hardcode the place of execution of vm_radix_reserve_kva(). Using SYSINIT
  allowed several other SYSINITs to sneak in before it, thus bumping
  requirement for amount of boot pages.
2018-02-06 22:06:59 +00:00
Mark Felder
330d62831f Refactor cleanvar to remove shell expansion vulnerability
If any process creates a directory named "-P" in /var/run or
/var/spool/lock it will cause the purgedir function to start to rm -r /.

Simplify a lot of complicated shell logic by leveraging find(1).

Reviewed by:	allanjude
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D13778
2018-02-06 21:35:41 +00:00
Scott Long
964107031b Cache the value of the request and reply frame size since it's used quite
a bit in the normal operation of the driver.  Covert it to represent bytes
instead of 32bit words.  Fix what I believe to be is a bug in this respect
with the Tri-mode cards.

Sponsored by:	Netflix
2018-02-06 21:01:38 +00:00
Mark Felder
1ce07411fa Fix firstboot fs mount logic
The firstboot logic has an error which causes the filesystem to be
mounted readonly even though root_rw_mount=YES. This fixes the error to
ensure that the root filesystem is mounted rw as expected after the run
of the firstboot scripts.

Reviewed by:	imp
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D14226
2018-02-06 20:12:05 +00:00
Bjoern A. Zeeb
6b9159b96b Remove a trailing whitspace. 2018-02-06 19:14:15 +00:00
Mark Johnston
4d2653522d Delete a declaration for a variable removed in r305362. 2018-02-06 17:26:11 +00:00
John Baldwin
a1d39c5344 Use a workaround to compile the crt init functions correctly with clang.
The MIPS assembly parser treats forward-declared local symbols as global
symbols.  This results in CALL16 relocations being used against local
(private) symbols which then fail to resolve when linking binaries.
Add .local to force the init and fini functions to be treated as local as
a workaround.

Submitted by:	sbruno
Sponsored by:	DARPA / AFRL
2018-02-06 17:01:10 +00:00
Mark Johnston
0d02f6c201 Simplify synchronization read error handling.
Since synchronization reads are performed by submitting a request to
the external mirror provider, we know that the request returns with an
error only when gmirror was unable to read a copy of the block from any
mirror. Thus, there is no need to retry the request from the
synchronization error handler.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-02-06 16:02:33 +00:00
Alexander Motin
62a09ee976 Fix queue length reporting in mps(4) and mpr(4).
Both drivers were found to report CAM bigger queue depth then they really
can handle.  It made them later under high load with many disks return
some of submitted requests back with CAM_REQUEUE_REQ status for later
resubmission.

Reviewed by:	scottl
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14215
2018-02-06 16:02:25 +00:00
Kenneth D. Merry
e2997a03b7 Diagnostic buffer fixes for the mps(4) and mpr(4) drivers.
In mp{r,s}_diag_register(), which is used to register diagnostic
buffers with the mp{r,s}(4) firmware, we allocate DMAable memory.

There were several issues here:
 o No checking of the bus_dmamap_load() return value.  If the load
   failed or got deferred, mp{r,s}_diag_register() continued on as if
   nothing had happened.  We now check the return value and bail
   out if it fails.

 o No waiting for a deferred load callback.  bus_dmamap_load()
   calls a supplied callback when the mapping is done.  This is
   generally done immediately, but it can be deferred.
   mp{r,s}_diag_register() did not check to see whether the callback
   was already done before proceeding on.  We now sleep until the
   callback is done if it is deferred.

 o No call to bus_dmamap_sync(... BUS_DMASYNC_PREREAD) after the
   memory is allocated and loaded.  This is necessary on some
   platforms to synchronize host memory that is going to be updated
   by a device.

Both drivers would also panic if the firmware was reinitialized while
a diagnostic buffer operation was in progress.  This fixes that problem
as well.  (The driver will reinitialize the firmware in various
circumstances, but the problem I ran into was that the firmware would
generate an IOC Fault due to a PCIe error.)

mp{r,s}var.h:
	Add a new structure, struct mpr_busdma_context, that is
	used for deferred busdma load callbacks.

	Add a prototype for mp{r,s}_memaddr_wait_cb().
mp{r,s}.c:
	Add a new busdma callback function, mp{r,s}_memaddr_wait_cb().
	This provides synchronization for callers that want to
	wait on a deferred bus_dmamap_load() callback.

mp{r,s}_user.c:
	In bus_dmamap_register(), add a call to bus_dmamap_sync()
	with the BUS_DMASYNC_PREREAD flag set after an allocation
	is loaded.

	Also, check the return value of bus_dmamap_load().  If it
	fails, bail out.  If it is EINPROGRESS, wait for the
	callback to happen.  We use an interruptible sleep (msleep
	with PCATCH) and let the callback clean things up if we get
	interrupted.

	In mpr_diag_read_buffer() and mps_diag_read_buffer(), call
	bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD) before copying
	the data out to make sure the data is in stable storage.

	In mp{r,s}_post_fw_diag_buffer() and
	mp{r,s}_release_fw_diag_buffer(), check the reply to see
	whether it is NULL.  It can be NULL (and the command non-NULL)
	if the controller gets reinitialized while we're waiting for
	the command to complete but the driver structures aren't
	reallocated.  The driver structures generally won't be
	reallocated unless there is a firmware upgrade that changes
	one of the IOCFacts.

	When freeing diagnostic buffers in mp{r,s}_diag_register()
	and mp{r,s}_diag_unregister(), zero/NULL out the buffer after
	freeing it.  This will prevent a duplicate free in some
	situations.

Sponsored by:	Spectra Logic
Reviewed by:	mav, scottl
MFC after:	1 week
Differential Revision:	D13453
2018-02-06 15:58:22 +00:00
Alex Richardson
95eff7c0c3 crossbuild: Make the CHECK_TIME variable work on Linux
Linux /usr/bin/find doesn't understand the -mtime -0s flag.
Instead create a temporary file and compare that file's mtime to
sys/sys/param.h to check whether the clock is correct.

Reviewed By:	jhb, imp
Approved By:	jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D14157
2018-02-06 15:41:45 +00:00
Alex Richardson
fb1df20368 Don't hardcode /usr/bin as the path for mktemp in build tools
It won't work e.g. when crossbuilding from Ubuntu Linux as mktemp is in
/bin there.

Reviewed By:	bdrewery
Approved By:	jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D13937
2018-02-06 15:41:35 +00:00
Alex Richardson
c3a6ea5ba6 Allow compiling usr.bin/find on Linux and Mac
When building FreeBSD the makefiles invoke find with various flags such as
`-s` that aren't supported in the native /usr/bin/find. To fix this I
build the FreeBSD version of find and use that when crossbuilding.

Inserting lots if #ifdefs in the code is rather ugly but I don't see a
better solution.

Reviewed By:	brooks (mentor)
Approved By:	jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D13306
2018-02-06 15:41:26 +00:00
Alex Richardson
e911aac76a Make mips_postboot_fixup work when building the kernel with clang+lld
The compiler/linker can align fake_preload anyway it would like. When
building the kernel with gcc+bfd this always happened to be a multiple of 8.
When I built the kernel with clang and linked with lld fake_preload
happened to only be aligned to 4 bytes which caused a an ADDRS trap because
the compiler will emit sd instructions to store to this buffer.

Reviewed By:	jhb, imp
Approved By:	jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D14018
2018-02-06 15:41:15 +00:00
Dmitry Marakasov
52d7a78f17 - Document new ${name}_limits rc.conf option
Approved by:	cy
MFC after:	2 weeks (along with 328331 which introduced this option)
Differential Revision:	https://reviews.freebsd.org/D14028
2018-02-06 15:30:17 +00:00
Kyle Evans
a123333f3a dtb/allwinner: Add sun7i-a20-lamobo-r1.dts (Banana Pi R1)
FreeBSD boots on this board, but the ethernet switch is not currently
supported, resulting in no ethernet.

A U-Boot port will be added once the ethernet switch is at least basically
supported, but we add its DTS to the build here to lower the barrier-to-boot
while work is underway.
2018-02-06 14:57:03 +00:00
Baptiste Daroussin
8134347f26 Remove libreadline from the source tree, all consumers but gdb
has been switched to libedit long ago, libreadline was built as an
internallib for a while and kept only for gdbtui which was broken using
libreadline.

Since gdb has been mostly deorbitted in all arches, gdbtui was only installed
on arm and sparc64, given it has been removed, gdb has been switched to use
libedit, no consumers are left for libreadline. Thus this removal
2018-02-06 12:22:42 +00:00
Baptiste Daroussin
0b0d729237 Commit forgotten change in gdb allowing to use libedit 2018-02-06 12:17:03 +00:00
Baptiste Daroussin
15c36a23a1 Switch to use libedit instead of readline 2018-02-06 12:12:44 +00:00
Baptiste Daroussin
a273973175 Remove gdbtui, it was already not installed on every arches
only installed on arm and sparc64.
It is the only bits that keeps us having libreadline in base
The rest of gdb can be switched to libedit and will be in another
commit
2018-02-06 11:54:20 +00:00
Adrian Chadd
2ba4bf8fa5 [arswitch] Implement the switch MAC address fetch API.
The placeholders are here for some future "set" MAC address API.

Tested:

* AR9340 switch
* AR8327 switch
2018-02-06 08:35:49 +00:00
Adrian Chadd
93e98f5f14 [etherswitchcfg] print the switch MAC address if provided. 2018-02-06 08:35:09 +00:00
Adrian Chadd
15bd1a867e [etherswitch] add initial support for potentially configuring and fetching the switch MAC address.
Switches that originate their own frames (eg obvious ones like Pause frames)
need a MAC address to use to send those frames from.

This API will hopefully begin to allow that to be configurable.
2018-02-06 08:34:50 +00:00
Scott Long
4b07a5606c Fix a case where a request frame can be composed that requires 2 or more
SGList elements, but there's only enough space in the request frame for
either 1 element or a chain frame pointer.  Previously, the code would
hit the wrong case, add the SGList element, but then fail to add the
chain frame due to lack of space.  Re-arrange the code to catch this case
earlier and handle it.

Sponsored by:	Netflix
2018-02-06 06:55:55 +00:00
Scott Long
99e7a4ad9e Return a C errno for cam_periph_acquire().
There's no compelling reason to return a cam_status type for this
function and doing so only creates confusion with normal C
coding practices. It's technically an API change, but the periph API
isn't widely used. No efffective change to operation.

Reviewed by:	imp, mav, ken
Sponsored by:	Netflix
Differential Revision:	D14063
2018-02-06 06:42:25 +00:00
Bryan Venteicher
87dda4ed5d Correct structure name used in bus_map_resource(9) example
Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14188
2018-02-06 04:28:21 +00:00
Gleb Smirnoff
f4bef67c9c Followup on r302393 by cperciva, improving calculation of boot pages required
for UMA startup.

o Introduce another stage of UMA startup, which is entered after
  vm_page_startup() finishes. After this stage we don't yet enable buckets,
  but we can ask VM for pages. Rename stages to meaningful names while here.
  New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS,
  BOOT_RUNNING.
  Enabling page alloc earlier allows us to dramatically reduce number of
  boot pages required. What is more important number of zones becomes
  consistent across different machines, as no MD allocations are done before
  the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use
  startup_alloc(), however that may change, so vm_page_startup() provides
  its need for early zones as argument.
o Introduce uma_startup_count() function, to avoid code duplication. The
  functions calculates sizes of zones zone and kegs zone, and calculates how
  many pages UMA will need to bootstrap.
  It counts not only of zone structures, but also of kegs, slabs and hashes.
o Hide uma_startup_foo() declarations from public file.
o Provide several DIAGNOSTIC printfs on boot_pages usage.
o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of
  mp_ncpus. Use resulting number not only in the size argument to zone_ctor()
  but also as args.size.

Reviewed by:		imp, gallatin (earlier version)
Differential Revision:	https://reviews.freebsd.org/D14054
2018-02-06 04:16:00 +00:00
Kirk McKusick
47806d1b93 Occasional cylinder-group check-hash errors were being reported on
systems running with a heavy filesystem load. Tracking down this
bug was elusive because there were actually two problems. Sometimes
the in-memory check hash was wrong and sometimes the check hash
computed when doing the read was wrong. The occurrence of either
error caused a check-hash mismatch to be reported.

The first error was that the check hash in the in-memory cylinder
group was incorrect. This error was caused by the following
sequence of events:

- We read a cylinder-group buffer and the check hash is valid.
- We update its cg_time and cg_old_time which makes the in-memory
  check-hash value invalid but we do not mark the cylinder group dirty.
- We do not make any other changes to the cylinder group, so we
  never mark it dirty, thus do not write it out, and hence never
  update the incorrect check hash for the in-memory buffer.
- Later, the buffer gets freed, but the page with the old incorrect
  check hash is still in the VM cache.
- Later, we read the cylinder group again, and the first page with
  the old check hash is still in the VM cache, but some other pages
  are not, so we have to do a read.
- The read does not actually get the first page from disk, but rather
  from the VM cache, resulting in the old check hash in the buffer.
- The value computed after doing the read does not match causing the
  error to be printed.

The fix for this problem is to only set cg_time and cg_old_time as
the cylinder group is being written to disk. This keeps the in-memory
check-hash valid unless the cylinder group has had other modifications
which will require it to be written with a new check hash calculated.
It also requires that the check hash be recalculated in the in-memory
cylinder group when it is marked clean after doing a background write.

The second problem was that the check hash computed at the end of the
read was incorrect because the calculation of the check hash on
completion of the read was being done too soon.

- When a read completes we had the following sequence:

  - bufdone()
  -- b_ckhashcalc (calculates check hash)
  -- bufdone_finish()
  --- vfs_vmio_iodone() (replaces bogus pages with the cached ones)

- When we are reading a buffer where one or more pages are already
  in memory (but not all pages, or we wouldn't be doing the read),
  the I/O is done with bogus_page mapped in for the pages that exist
  in the VM cache. This mapping is done to avoid corrupting the
  cached pages if there is any I/O overrun. The vfs_vmio_iodone()
  function is responsible for replacing the bogus_page(s) with the
  cached ones. But we were calculating the check hash before the
  bogus_page(s) were replaced. Hence, when we were calculating the
  check hash, we were partly reading from bogus_page, which means
  we calculated a bad check hash (e.g., because multiple pages have
  been mapped to bogus_page, so its contents are indeterminate).

The second fix is to move the check-hash calculation from bufdone()
to bufdone_finish() after the call to vfs_vmio_iodone() so that it
computes the check hash over the correct set of pages.

With these two changes, the occasional cylinder-group check-hash
errors are gone.

Submitted by: David Pfitzner <dpfitzner@netflix.com>
Reviewed by: kib
Tested by: David Pfitzner
2018-02-06 00:19:46 +00:00
Konstantin Belousov
5370a081a8 Move signal trampolines out of locore.s into separate source file.
Similar to other arches, the move makes the subject of locore.s only
the kernel startup.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-02-06 00:02:30 +00:00
Landon J. Fuller
d177c19903 bwn(4): migrate bwn(4) to the native bhnd(9) interface, and drop siba_bwn.
- Remove the shim interface that allowed bwn(4) to use either siba_bwn or
  bhnd(4), replacing all siba_bwn calls with their bhnd(4) bus equivalents.
- Drop the legay, now-unused siba_bwn bus driver.
- Clean up bhnd(4) board flag defines referenced by bwn(4).

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D13518
2018-02-05 23:38:15 +00:00
John Baldwin
15746ef43a Ignore relocation tables for non-memory-resident sections.
As a followup to r328101, ignore relocation tables for ELF object
sections that are not memory resident.  For modules loaded by the
loader, ignore relocation tables whose associated section was not
loaded by the loader (sh_addr is zero).  For modules loaded at runtime
via kldload(2), ignore relocation tables whose associated section is
not marked with SHF_ALLOC.

Reported by:	Mori Hiroki <yamori813@yahoo.co.jp>, adrian
Tested on:	mips, mips64
MFC after:	1 month
Sponsored by:	DARPA / AFRL
2018-02-05 23:35:33 +00:00
John Baldwin
d56b465cd5 Fix a typo. 2018-02-05 23:29:50 +00:00
John Baldwin
d722231bca Always give ELF brands a chance to veto a match.
If a brand provides a header_supported hook, check it when trying to
find a brand based on a matching interpreter as well as in the final
loop for the fallback brand. Previously a brand might reject a binary
via a header_supported hook in one of the earlier loops, but still be
chosen by one of these later loops.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D13945
2018-02-05 23:27:42 +00:00
Adrian Chadd
f6f7bec6f8 [arswitch] disable ARP copy-to-CPU port for AR9340 for now.
I'll have to go double check to see if it does indeed pass ARP frames between
switch ports with this disabled, but it seems required for the CPU port to see
ARP traffic.

I'll dig into this some more.
2018-02-05 20:37:29 +00:00
Adrian Chadd
45a7272a5b [arswitch] fix build breakage.
Apparently the last time I checked building this it didn't pick up that the
header had changed.
2018-02-05 20:30:53 +00:00