233792 Commits

Author SHA1 Message Date
Bruce Evans
49c871278a Fix high resolution kernel profiling just enough to not crash at boot
time, especially for SMP.  If configured, it turns itself on at boot
time for calibration, so is fragile even if never otherwise used.

Both types of kernel profiling were supposed to use a global spinlock
in the SMP case.  If hi-res profiling is configured (but not necessarily
used), this was supposed to be optimized by only using it when
necessary, and slightly more efficiently, in asm.  But it was not done
at all for mcount entry where it is necessary.  This caused crashes
in the SMP case when either type of profiling was enabled.  For mcount
exit, it only caused wrong times.  The times were wrongest with an
i8254 timer since using that requires exclusive access to the hardware.
The i8254 timer was too slow to use here 20 years ago and is much less
usable now, but it is the default for the SMP case since TSCs weren't
invariant when SMP was new.  Do the locking in all hi-res SMP cases for
simplicity.

Calibration uses special asms, and the clobber lists in these were sort
of inverted.  They contained the arg and return registers which are not
clobbered, but on amd64 they didn't contain the residue of the call-used
registers which may be clobbered (%r10 and %r11).  This usually caused
hangs at boot time.  This usually affected even the UP case.
2018-06-02 05:48:44 +00:00
Eitan Adler
66b3f031f0 top(1): const poison
top(1) has a number of issues with writing to const strings. Begin
helping this along by marking easy cases as const.
2018-06-02 04:37:37 +00:00
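A minimal sketch of what marking an "easy case" const can look like. The helper and table below are hypothetical and are not taken from top(1); they only illustrate the pattern of declaring read-only string parameters and tables const so the compiler rejects accidental writes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from top(1)): the function only reads the
 * string, so the parameter is const.  Callers can then pass string
 * literals without discarding a qualifier.
 */
static size_t
count_char(const char *s, char c)
{
	size_t n = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++)
		if (*s == c)
			n++;
	return (n);
}

/* Read-only table: both the pointers and the text they point to are const. */
static const char *const state_names[] = { "run", "sleep", "stop" };
```

With these declarations, an assignment such as `state_names[0][0] = 'R'` is diagnosed at compile time instead of faulting at run time.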
Bruce Evans
dbe3061729 Fix recent breakages of kernel profiling, mostly on i386 (high resolution
kernel profiling remains broken).

memmove() was broken using ALTENTRY().  ALTENTRY() is only different from
ENTRY() in the profiling case, and its use in that case was sort of
backwards.  The backwardness magically turned memmove() into memcpy()
instead of completely breaking it.  Only the high resolution parts of
profiling itself were broken.  Use ordinary ENTRY() for memmove().
Turn bcopy() into a tail call to memmove() to reduce complications.
This gives slightly different pessimizations and profiling lossage.
The pessimizations are minimized by not using a frame pointer for
bcopy().

Calls to profiling functions from exception trampolines were not
relocated.  This caused crashes on the first exception.  Fix this using
function pointers.

Addresses of exception handlers in trampolines were not relocated.  This
caused unknown offsets in the profiling data.  Relocate by abusing
setidt_disp as for pmc although this is slower than necessary and
requires namespace pollution.  pmc seems to be missing some relocations.
Stack traces and lots of other things in debuggers need similar relocations.

Most user addresses were misclassified as unknown kernel addresses and
then ignored.  Treat all unknown addresses as user. Now only user
addresses in the kernel text range are significantly misclassified (as
known kernel addresses).

The ibrs functions didn't preserve enough registers.  This is the only
recent breakage on amd64.  Although these functions are written in
asm, in the profiling case they call profiling functions which are
mostly for the C ABI, so they only have to save call-used registers.
They also have to save arg and return registers in some cases and
actually save them in all cases to reduce complications.  They end up
saving all registers except %ecx on i386 and %r10 and %r11 on amd64.
Saving these is only needed for 1 caller on each of amd64 and i386.
Save them there.  This is slightly simpler.

Remove saving %ecx in handle_ibrs_exit on i386.  Both handle_ibrs_entry
and handle_ibrs_exit use %ecx, but only the latter needed to or did
save it.  But saving it there doesn't work for the profiling case.

amd64 has more automatic saving of the most common scratch registers
%rax, %rcx and %rdx (its complications for %r10 are from unusual use
of %r10 by SYSCALL).  Thus profiling of handle_ibrs_exit_rs() was not
broken, and I didn't simplify the saving by moving the saving of these
registers from it to the caller.
2018-06-02 04:25:09 +00:00
Eitan Adler
0059e7102f top(1): clean up a bit
- remove unused defines
- use standard defines for STDOUT
- don't cast for memset
- avoid using (void) cast
2018-06-02 04:20:42 +00:00
Eitan Adler
cffee2bc5b top(1): help scan-build along a bit
Teach scan-build that some arrays are larger than zero, and thus not to
warn.
2018-06-02 04:08:52 +00:00
Eitan Adler
1978939544 top(1): Use uid_t for uid rather than 'int'
Remove unneeded define while here.
2018-06-02 03:54:50 +00:00
Eitan Adler
f4d9a8de00 top(1): Remove now-invalid NOTE 2018-06-02 03:33:02 +00:00
Eitan Adler
3798694c01 top(1): avoid casting malloc 2018-06-02 03:31:14 +00:00
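For context, a small sketch of the idiom: in C, the `void *` returned by malloc() converts implicitly to any object pointer type, so a cast adds nothing. The record type and function name here are hypothetical and do not come from top(1).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type used only for this illustration. */
struct entry {
	int	pid;
	long	rss;
};

static struct entry *
alloc_entries(size_t n)
{
	struct entry *p;

	assert(n > 0);
	/* No cast: void * converts implicitly to struct entry * in C. */
	p = malloc(n * sizeof(*p));
	return (p);	/* NULL on failure; the caller must free() it. */
}
```

Using `sizeof(*p)` rather than naming the type keeps the allocation correct if the pointer's type changes later.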
Eitan Adler
0b2f6ed144 top(1): Use standard boolean rather than homegrown alternative 2018-06-02 03:25:15 +00:00
Rick Macklem
dec8894b45 Fix the default number of threads for Flex File layout pNFS client I/O.
The intent was that the default would be based on number of CPUs, but the
code disabled using taskqueue() by default.
This code is only executed when mounting a NFSv4.1 server that supports the
Flexible File layout for pNFS and, since such servers are rare, this change
shouldn't result in a POLA violation.
(The FreeBSD pNFS server is still a project and the only other one that
 uses Flexible File layout is being developed by Primary Data and I don't
 know if they have even shipped any to customers yet.)
Found while testing the pNFS server.
2018-06-02 00:11:26 +00:00
Eitan Adler
220e4623aa top(1): remove two unneeded headers 2018-06-02 00:02:27 +00:00
Eitan Adler
f6234b51bf top(1): ansify, style(9), and nits
- Prefer using ANSI prototypes rather than K&R-style definitions
- Keep type on separate line from name of function
- Try to keep things const where possible. This will help get to WARNS=6
- switch to "bool" where it makes sense
2018-06-02 00:02:15 +00:00
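A hedged illustration of the conversion described above. The function is hypothetical, not code from top(1): the old K&R form appears only in a comment, and the ANSI form keeps the return type on its own line, uses const parameters, and returns bool.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Old K&R-style definition, shown only as a comment:
 *
 *	int
 *	is_idle(state, threshold)
 *		int state;
 *		int threshold;
 *	{ ... }
 */

/* ANSI definition: parameter types in the declarator, type on its own line. */
static bool
is_idle(const int state, const int threshold)
{
	assert(threshold >= 0);
	return (state < threshold);
}
```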
Mark Johnston
49a3710c89 Remove the "pass" variable from the page daemon control loop.
It serves little purpose after r308474 and r329882.  As a side
effect, the removal fixes a bug in r329882 which caused the
page daemon to periodically invoke lowmem handlers even in the
absence of memory pressure.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D15491
2018-06-02 00:01:07 +00:00
Konstantin Belousov
633d3b1c71 Only check for MAP_32BIT when available.
Reported by:	mmacy
Sponsored by:	The FreeBSD Foundation
MFC after:	10 days
2018-06-01 23:50:51 +00:00
Mark Johnston
3fb14f61e1 Avoid completing I/O when dumping core after a panic.
Filesystem or pager completion callbacks are generally non-functional
after a panic and may trigger deadlocks if invoked in this context
(e.g., by attempting to destroy a buffer mapping).  To avoid this
situation, short-circuit I/O completion in biodone().

Reviewed by:	imp
Discussed with:	mav
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D15592
2018-06-01 23:49:32 +00:00
Mark Johnston
2e7680c6bf Don't export _end on arm64 and riscv.
These platforms don't support brk() and sbrk(), which are the reason
for exporting _end in the first place.

MFC after:	1 week
2018-06-01 23:42:10 +00:00
Mark Johnston
e2c1730299 Remove an inaccuracy from mincore.2.
Super pages are supported on non-x86 architectures, so just remove the
incorrect note.  While here, change terminology to be consistent with
mmap.2.

MFC after:	1 week
2018-06-01 23:40:43 +00:00
Conrad Meyer
452bb88a9c at.man: Bump .Dd missed in r334502
Sponsored by:	Dell EMC Isilon
2018-06-01 22:57:19 +00:00
Conrad Meyer
3181398b92 Update other man pages to match leap second reality
Missed these in r334501; see justification there:

https://svnweb.freebsd.org/base?view=revision&revision=334501

Sponsored by:	Dell EMC Isilon
2018-06-01 22:37:59 +00:00
Conrad Meyer
2a1fb74048 touch.1: Update to conform to POSIX 2004
POSIX borrowed the "double leap second" bug from C89.  Double leap seconds can
never happen.  This mistake was present in at least POSIX 1997 and fixed by
POSIX 2004.  I can't find a copy of 2001 online to determine if the bug was
present in that revision.

While here, remove duplicate language between -d and -t.  A few other minor
enhancements and an igor (lint) bugfix.

Further reading:

2018 POSIX (documents -d):
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/touch.html

2004 POSIX (documents SS from 0-60):
http://pubs.opengroup.org/onlinepubs/009695399/utilities/touch.html

1997 POSIX/SUSv2 (historical interest, 0-61):
http://pubs.opengroup.org/onlinepubs/007908799/xcu/touch.html

More on this subject (start at "Unix system time and the POSIX standard")
https://www.ucolick.org/~sla/leapsecs/onlinebib.html

And: https://marc.info/?l=openbsd-tech&m=92682843416159&w=2

Reported by:	Vishal Sahu <vsahu AT isilon.com>
Sponsored by:	Dell EMC Isilon
2018-06-01 22:34:59 +00:00
Brooks Davis
0141ef6c07 Remove support for SYS_sys_exit in favor of SYS_exit.
SYS_exit has been defined in the repo since 1994 except for a brief
window when SYS_sys_exit was defined in 2000.
2018-06-01 22:09:27 +00:00
Alan Cox
60221a5701 Only a small subset of mmap(2)'s flags should be used in combination with
the flag MAP_GUARD.  Rather than enumerating the flags that are not
allowed, enumerate the flags that are allowed.  The list of allowed flags
is much shorter and less likely to change.  (As an aside, one of the
previously enumerated flags, MAP_PREFAULT, was not even a legal flag for
mmap(2).  However, because of an earlier check within kern_mmap(), this
misuse of MAP_PREFAULT was harmless.)

Reviewed by:	kib
MFC after:	10 days
2018-06-01 21:37:42 +00:00
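The allow-list approach can be sketched as follows. The flag names, bit values, and the contents of the allowed set are assumptions made for this example; they are not the real mmap(2) constants or the exact set the commit permits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the real mmap(2) values. */
#define	XMAP_FIXED	0x0010u
#define	XMAP_NOSYNC	0x0800u
#define	XMAP_GUARD	0x2000u
#define	XMAP_EXCL	0x4000u

/* Assumed allow-list: the only flags accepted together with XMAP_GUARD. */
#define	XMAP_GUARD_ALLOWED	(XMAP_GUARD | XMAP_FIXED | XMAP_EXCL)

static_assert((XMAP_GUARD_ALLOWED & XMAP_GUARD) != 0,
    "the allow-list must contain the guard flag itself");

static bool
guard_flags_ok(unsigned int flags)
{
	if ((flags & XMAP_GUARD) == 0)
		return (true);	/* Not a guard request; nothing to check. */
	/* Any bit outside the allow-list makes the request invalid. */
	return ((flags & ~XMAP_GUARD_ALLOWED) == 0);
}
```

Because the test masks out the allowed bits and rejects whatever remains, a flag added to the interface later is refused by default, which is the property the commit message points to.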
Justin Hibbits
3254c39f83 Increase powerpc64 KVA from ~7.25GB to 32GB
This will let us use much more KVA for ZFS ARC where needed.  This may be
increased in the future if memory requirements increase.

Discussed with:	nwhitehorn
2018-06-01 21:37:20 +00:00
Michael Tuexen
c14f9fe5ef Limit the retransmission timer for SYN-ACKs by TCPTV_REXMTMAX.
Use the same logic to handle the SYN-ACK retransmission when sent from
the syn cache code as when sent from the main code.

MFC after:	3 days
Sponsored by:	Netflix, Inc.
2018-06-01 21:24:27 +00:00
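A rough sketch of bounding a retransmission timeout, assuming the timeout is a base value scaled by a backoff multiplier. The constants and the function name are hypothetical stand-ins; the real kernel limits and code paths are not reproduced here.

```c
#include <assert.h>

/* Hypothetical limits in ticks; the real kernel values are not shown. */
#define	X_REXMTMIN	30
#define	X_REXMTMAX	64000

static int
x_rexmt_timeout(int base, int multiplier)
{
	long long t;

	assert(base >= 0 && multiplier >= 0);
	/* Widen before multiplying so the product cannot overflow int. */
	t = (long long)base * multiplier;
	if (t < X_REXMTMIN)
		t = X_REXMTMIN;
	if (t > X_REXMTMAX)
		t = X_REXMTMAX;
	return ((int)t);
}
```

Routing every caller through one helper like this is one way to get the "same logic" in both code paths that the commit describes.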
Alan Somers
a19dca2dfd audit(4): add tests for the fd audit class
The only syscalls in this class are rmdir, unlink, unlinkat, rename, and
renameat.  Also, set is_exclusive for all audit(4) tests, because they can
start and stop auditd.

Submitted by:	aniketp
MFC after:	2 weeks
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D15647
2018-06-01 21:24:10 +00:00
Piotr Pawel Stefaniak
0de58b3f46 indent(1): improve an error message
When producing a "[...] requires a parameter" error, provide the recognized
name of the option instead of the argument provided.
2018-06-01 20:45:35 +00:00
Michael Tuexen
badef00d58 Ensure net.inet.tcp.syncache.rexmtlimit is limited by TCP_MAXRXTSHIFT.
If the sysctl variable is set to a value larger than TCP_MAXRXTSHIFT+1,
the array tcp_syn_backoff[] is accessed out of bounds.

Discussed with: jtl@
MFC after:	3 days
Sponsored by:	Netflix, Inc.
2018-06-01 19:58:19 +00:00
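The class of bug described above, a tunable used as an array index without a bound, can be sketched like this. All names, the table contents, and the setter are hypothetical; only the relationship between the limit and the table size mirrors the commit message.

```c
#include <assert.h>
#include <errno.h>

#define	X_MAXRXTSHIFT	12	/* Hypothetical stand-in for the real bound. */

/* Illustrative backoff multipliers; X_MAXRXTSHIFT + 1 entries. */
static const int x_backoff[X_MAXRXTSHIFT + 1] =
    { 1, 1, 2, 4, 8, 16, 32, 64, 64, 64, 64, 64, 64 };

static_assert(sizeof(x_backoff) / sizeof(x_backoff[0]) == X_MAXRXTSHIFT + 1,
    "table size must match the bound");

static int x_rexmtlimit = 3;

/* Setter in the spirit of a sysctl handler: reject out-of-range values. */
static int
x_set_rexmtlimit(int newval)
{
	if (newval < 0 || newval > X_MAXRXTSHIFT + 1)
		return (EINVAL);
	x_rexmtlimit = newval;
	return (0);
}

/* Look up a multiplier; the index is clamped as a second line of defence. */
static int
x_backoff_for(int rxmits)
{
	if (rxmits < 0)
		rxmits = 0;
	if (rxmits > X_MAXRXTSHIFT)
		rxmits = X_MAXRXTSHIFT;
	return (x_backoff[rxmits]);
}
```

Validating the value when it is set keeps the check off the hot path; the clamp in the lookup is optional hardening.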
Piotr Pawel Stefaniak
1d01804309 indent(1): restore working -pcs
My previous indent(1) commit accidentally broke the -pcs option (which adds
space between function name and opening parenthesis in function calls) by
copying all but one of a few conditions in an if clause. Reinstate the
condition.

Add a regression test to lower the chances of breaking it again.

Correct a comment with description of what the option does.
2018-06-01 19:56:41 +00:00
Rick Macklem
9442a64e53 Add the BindConnectiontoSession operation to the NFSv4.1 server.
Under some fairly unusual circumstances, the Linux NFSv4.1 client is
doing a BindConnectiontoSession operation for TCP connections.
It is also used by the ESXi6.5 NFSv4.1 client.
This patch adds this operation to the NFSv4.1 server.

Reported by:	andreas.nagy@frequentis.com
Tested by:	andreas.nagy@frequentis.com
MFC after:	2 weeks
2018-06-01 19:47:41 +00:00
Warner Losh
16bc63ec75 Add PNP_INFO to aac
Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
2018-06-01 19:42:59 +00:00
Jonathan T. Looney
651a790808 Update the sysctl(9) manpage to indicate that <sys/param.h> is required
instead of <sys/types.h>.  (<sys/sysctl.h> includes NULL, which is defined
with <sys/param.h> and not <sys/types.h>.)

Sponsored by:	Netflix
2018-06-01 16:47:39 +00:00
Navdeep Parhar
c27fcc70cc cxgbe(4): Include full duplex mediaopt in media that can be reported as
active.  Always report full duplex in active media.

Sponsored by:	Chelsio Communications
2018-06-01 16:46:29 +00:00
Justin Hibbits
a608b7d313 Unbreak 32-bit binaries on powerpc64
Recently a change was made which broke loading 32-bit binaries on powerpc64,
with an assertion in ld-elf32.so.1:

ld-elf32.so.1: assert failed:
/usr/local/poudriere/jails/ppc64/usr/src/libexec/rtld-elf/rtld.c:390

It turns out Elf32_AuxInfo was broken for a very long time on powerpc64, as
it uses long and pointers, which are both 64 bits on powerpc64, but the
breakage only manifested with the recent work on auxargs.
2018-06-01 16:31:05 +00:00
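To illustrate why a 32-bit structure declared with long and pointer members changes size on a 64-bit host, here is a small sketch. Both struct definitions are hypothetical and simplified; they are not the actual Elf32_Auxinfo declaration or the fix that was committed.

```c
#include <assert.h>
#include <stdint.h>

/* Native layout: member sizes follow the host ABI (8 bytes each on LP64). */
struct aux_native {
	long	a_type;
	union {
		long	 a_val;
		void	*a_ptr;
	} a_un;
};

/* 32-bit layout written with fixed-width types: same size on any host. */
struct aux32 {
	uint32_t	a_type;
	union {
		uint32_t	a_val;
		uint32_t	a_ptr;	/* A 32-bit address stored as an integer. */
	} a_un;
};

static_assert(sizeof(struct aux32) == 8,
    "a 32-bit aux entry must occupy exactly 8 bytes");
```

On an LP64 host the native struct is twice as large as the fixed-width one, so a 32-bit process reading entries written in the native layout sees values at the wrong offsets.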
Alan Somers
26f5ecb775 audit(4): Add tests for the fw class of syscalls.
truncate and ftruncate are the only syscalls in this class, apart from
certain variations of open and openat, which will be handled in a different
file.

Submitted by:	aniketp
MFC after:	2 weeks
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D15640
2018-06-01 16:23:47 +00:00
Ed Maste
b8d908b71e ANSIfy sys/kern 2018-06-01 13:26:45 +00:00
Breno Leitao
48f64992f2 powerpc64: Avoid overwriting initrd area
Currently kexec loads an initrd file into the main memory but does not
mark that region as reserved, thus the area is not protected.

If any initrd/md file is loaded from kexec/petitboot, the region might become
corrupted/overwritten since FreeBSD does not know the region is 'reserved'.

This patch simply adds the initrd area as a reserved memory region.

Approved by: jhibbits
Differential Revision: https://reviews.freebsd.org/D15610
2018-06-01 12:43:13 +00:00
Hans Petter Selasky
57a865f808 Implement the __sg_alloc_table_from_pages() function based on the existing
sg_alloc_table_from_pages() function in the LinuxKPI.

This basically allows segments to have a limit, max_segment.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-01 12:09:07 +00:00
Hans Petter Selasky
6fad8d171a Implement radix_tree_iter_delete() in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-01 11:42:09 +00:00
Hans Petter Selasky
0a85496223 Improve high resolution timer support in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-01 11:33:14 +00:00
Hans Petter Selasky
f03ae7e802 Add more GFP macro definitions in the LinuxKPI.
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-06-01 11:14:59 +00:00
Piotr Pawel Stefaniak
b06c2eb7b1 indent(1): don't add unneeded space to function pointer declarations
If the current token is an opening parenthesis, it's either a function call
(or sizeof or offsetof) or a declaration. The former doesn't need a space
before the parenthesis.
2018-06-01 09:58:44 +00:00
Andriy Gapon
0a15ff37d6 call AcpiLeaveSleepStatePrep after re-enabling interrupts
I want to do this change because this call (actually,
AcpiHwLegacyWakePrep) does a memory allocation and ACPI namespace
evaluation.  Although it is not very likely to run into any trouble, it
is still not safe to make those calls with interrupts disabled.
witness(4) and malloc(9) do not currently check for a context with
interrupts disabled via intr_disable and we lack a facility for doing
that.  So, those unsafe operations fly under the radar.  But if
intr_disable in acpi_EnterSleepState was replaced with spinlock_enter
(which it probably should be), then witness and malloc would immediately
complain.

Also, AcpiLeaveSleepStatePrep is documented as called when interrupts
are enabled.  It used to require disabled interrupts, but that
requirement was changed a long time ago when support for _BFS and _GTS
was removed from ACPICA.

The ACPI wakeup sequence is very sensitive to changes. I consider this
change to be correct, but there can be fallouts from it.

What AcpiHwLegacyWakePrep essentially does is writing a value
corresponding to S0 into SLP_TYPx bits of PM1 Control Register(s).
According to ACPI specifications that write should be a NOP as SLP_EN
bit is not set.  But I see in some chipset specifications that the
hardware may ignore SLP_EN altogether and act on a change of SLP_TYPx
alone.

Also, there are a couple of accesses to ACPI hardware before the new
location of the call to AcpiLeaveSleepStatePrep.  One is to clear the
power button status and the other is to enable SCI.  So, the move may
affect the interaction between the OS and the ACPI platform.

I have not seen any regressions on my test system, but it's a desktop.

MFC after:	5 weeks
2018-06-01 09:44:23 +00:00
Piotr Pawel Stefaniak
3bbaa755f3 indent(1): don't indent typedef declarations as object declarations 2018-06-01 09:41:15 +00:00
Piotr Pawel Stefaniak
5f35ea69af indent(1): consider tab characters when forcing a newline after a comma 2018-06-01 09:32:42 +00:00
Edward Tomasz Napierala
e8a5d07df5 Set bDeviceClass properly for composite device (template 8). There should
be no functional change.

PR:		203289
Reviewed by:	hselasky@
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-06-01 09:17:20 +00:00
Piotr Pawel Stefaniak
9d4264fbdd indent(1): identifiers inside parentheses are not declarations
Also make lparen position calculation consider tab stops.

This improves function pointer typedef formatting.
2018-06-01 08:54:51 +00:00
Eitan Adler
937499dcb1 top(1): Display of TID when using 'H' flag
Some users prefer seeing the TID when viewing individual threads. This
makes sense as the PID will be the same for multiple entries. An attempt
was made to include both, but there is insufficient room. As such, show
only the TID.

While here, rename the header variables to be more understandable.

Discussed with:	mmacy
Reported on:	2009-10-07
2018-06-01 05:51:40 +00:00
Eitan Adler
91339aaf7b service(1): Improve manual page
* Sort options.
* Fix some typos.
* Use one Bd macro for code blocks instead of a bunch of Dl macros.
* Improve formatting.
* Clarify 'jail' argument

PR:		228552
Submitted by:	0mp
MFC After:	3 weeks
2018-06-01 04:14:16 +00:00
Alan Somers
8ec6562b6d audit(4): Add tests for the fr class of syscalls
readlink and readlinkat are the only syscalls in this class.  open and
openat are as well, but they'll be handled in a different file.  Also, tidy
up the copyright headers of recently added files in this area.

Submitted by:	aniketp
MFC after:	2 weeks
Sponsored by:	Google, Inc. (GSoC 2018)
Differential Revision:	https://reviews.freebsd.org/D15636
2018-06-01 01:37:07 +00:00
Navdeep Parhar
b9330ed7a2 cxgbe(4): Retire an old check. 2018-06-01 01:05:34 +00:00