137034 Commits

Author SHA1 Message Date
kib
ab87bfcd7a Fix leak of memory and file refs with sendmsg(2) over unix domain sockets.
When sendmsg(2) sucessfully internalized one SCM_RIGHTS control
message, but failed to process some other control message later, both
file references and filedescent memory needs to be freed. This was not
done, only mbuf chain was freed.

Noted, test case written, reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21000
2019-07-19 20:51:39 +00:00
dougm
b894f21a4a Define vm_map_entry_in_transition to handle an in-transition map
entry, combining code currently in vm_map_unwire and
vm_map_wire_locked into a single function, called by each of them for
entries in transition.

Discussed with: kib, markj
Reviewed by: alc
Approved by: kib, markj (mentors, implicit)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D20833
2019-07-19 20:47:35 +00:00
mav
f4934e6568 Add Accessible Max Address Configuration support to camcontrol.
AMA replaced HPA in ACS-3 specification.  It allows to limit size of the
disk alike to HPA, but declares inaccessible data as indeterminate.  One
of its practical use cases is to under-provision SATA SSDs for better
reliability and performance.

While there, fix HPA Security detection/reporting.

MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2019-07-19 19:15:08 +00:00
imp
7f1c525519 Keep track of the number of commands that exhaust their retry limit.
While we print failure messages on the console, sometimes logs are lost or
overwhelmed. Keeping a count of how many times we've failed retriable commands
helps get a magnitude of the problem.
2019-07-19 18:39:24 +00:00
imp
64b34948ed Keep track of the number of retried commands.
Retried commands can indicate a performance degredation of an nvme drive. Keep
track of the number of retries and report it out via sysctl, just like number of
commands an interrupts.
2019-07-19 18:39:18 +00:00
imp
77c0809f26 Remove pre-FreeBSD 7.0 compatibility. 2019-07-19 18:38:47 +00:00
imp
4d2be36042 Add comments about KERN_OPT here. 2019-07-19 17:48:29 +00:00
imp
37af220308 Use sysctl + CTLRWTUN for hw.nvme.verbose_cmd_dump.
Also convert it to a bool. While the rest of the driver isn't yet bool clean,
this will help.

Reviewed by: cem@
Differential Revision: https://reviews.freebsd.org/D20988
2019-07-19 00:32:56 +00:00
imp
9f9b80b338 Provide new tunable hw.nvme.verbose_cmd_dump
The nvme drive dumps only the most relevant details about a command when it
fails. However, there are times this is not sufficient (such as debugging weird
issues for a new drive with a vendor). Setting hw.nvme.verbose_cmd_dump=1
in loader.conf will enable more complete debugging information about each
command that fails.

Reviewed by: rpokala
Sponsored by: Netflix
Differential Version: https://reviews.freebsd.org/D20988
2019-07-18 21:58:51 +00:00
imp
38af09b891 Provide macros to extract the sub-fields of the CAP_LO and CAP_HI registers.
These macros make places where we extract these easier to read. The shift and
mask stuff is also a bit tedious and error prone. Start with the CAP_LO and
CAP_HI registers since their scope is somewhat constrained. This is style
chagne only, no functional changes.

Reviewed by: chuck
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20979
2019-07-18 15:41:10 +00:00
andrew
f90826df9f Rename arm64 macros in preperation for a script to generate them.
I have a script to generate most of the ID_AA64* macros from the Arm
XML source [1]. In preperation for using this we need to clean up the
macros to be in line with what the script will generate. This is the
first step, rename the macros to follow the names in said XML.

[1] https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools

MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20976
2019-07-18 13:58:04 +00:00
ian
964d4b7330 Fix a paste-o, set is212x = false for other chip types. Doh! 2019-07-18 01:37:00 +00:00
ian
db85d947ee Handle the PCF2127 RTC chip the same as PCF2129 when init'ing the chip.
This affects the detection of 24-hour vs AM/PM mode... the ampm bit is in a
different location on 2127 and 2129 chips compared to other nxp rtc chips.
I noticed the 2127 case wasn't being handled correctly when I accidentally
misconfiged my system by claiming my PCF2129 was a 2127.
2019-07-18 01:30:56 +00:00
mckusick
eb5503c168 The error reported in FS-14-UFS-3 can only happen on UFS/FFS
filesystems that have block pointers that are out-of-range for their
filesystem. These out-of-range block pointers are corrected by
fsck(8) so are only encountered when an unchecked filesystem is
mounted.

A new "untrusted" flag has been added to the generic mount interface
that can be set when mounting media of unknown provenance or integrity.
For example, a daemon that automounts a filesystem on a flash drive
when it is plugged into a system.

This commit adds a test to UFS/FFS that validates all block numbers
before using them. Because checking for out-of-range blocks adds
unnecessary overhead to normal operation, the tests are only done
when the filesystem is mounted as an "untrusted" filesystem.

Reported by:  Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as:  FS-14-UFS-3: Out of bounds read in write-2 (ffs_alloccg)
Reviewed by:  kib
Sponsored by: Netflix
2019-07-17 22:07:43 +00:00
kp
a81657d241 riscv: Return vm_paddr_t in pmap_early_vtophys()
We can't use a u_int to compute the physical address in
pmap_early_vtophys(). Our int is 32-bit, but the physical address is
64-bit. This works fine if everything lives in below 0x100000000, but as
soon as it doesn't this breaks.

MFC after:	1 week
Sponsored by:	Axiado
2019-07-17 21:25:26 +00:00
imp
9066677485 Remove now-obsolete comment. 2019-07-17 20:43:14 +00:00
asomers
f80b423863 F_READAHEAD: Fix r349248's overflow protection, broken by r349391
I accidentally broke the main point of r349248 when making stylistic changes
in r349391.  Restore the original behavior, and also fix an additional
overflow that was possible when uio->uio_resid was nearly SSIZE_MAX.

Reported by:	cem
Reviewed by:	bde
MFC after:	2 weeks
MFC-With:	349248
Sponsored by:	The FreeBSD Foundation
2019-07-17 17:01:07 +00:00
markj
15ba49a9d0 Fix FASTTRAPIOC_GETINSTR.
This ioctl is used when a breakpoint is encountered while disassembling
a symbol in the target process.  Since only one DTrace consumer can
toggle or enumerate fasttrap probes from a given process at time, this
ioctl does not appear to be used in practice.
2019-07-17 16:38:29 +00:00
sbruno
f33cafa731 I add the ability to accept the default pin widget configuration to help
with various laptops using hdaa(4) sound devices.  We don't seem to know
the "correct" configurations for these devices and the defaults are far
superiour, e.g. they work if you don't nuke the default configs.

PR:	200526
Differential Revision:	https://reviews.freebsd.org/D17772
2019-07-17 04:13:46 +00:00
mckusick
895b89d33b Style.
No change intended.
2019-07-16 23:39:39 +00:00
mckusick
654c8c5036 When a process attempts to allocate space on a full filesystem, a
filesystem full message is sent to the offending process or the
kernel log if the offending process cannot be identified.

To prevent an explotion of messages, the kernel ppsratecheck()
function is used to limit the messages to one per second. This
revision changes the variable that tracks the rate of these messages
from a systemwide limit to a per-filesystem limit by moving it from
a global variable to a variable in the ufsmount structure.

Suggested by: kib
Reviewed by:  kib
Sponsored by: Netflix
2019-07-16 23:12:27 +00:00
imp
1f59bd994d Assume that the timeout value from the capacity is 1-based
Neither the 1.3 or 1.4 standards say this number is 1's based, but adding 1
costs little and copes with those NVMe drives that report '0' in this field
cheaply. This is consistent with what the Linux driver does as well.
2019-07-16 22:55:30 +00:00
cy
92ee47be1e As of upstream fil.c CVS r1.53 (March 1, 2009), prior to the import of
ipfilter 5.1.2 into FreeBSD-10, the fix for, 2580062 from/to targets
should be able to use any interface name, moved frentry.fr_cksum to
prior to frentry.fr_func thereby making this code redundant. After
investigating whether this fix to move fr_cksum was correct and if it
broke anything, it has been determined that the fix is correct and this
code is redundant. We remove it here.

MFC after:	2 weeks
2019-07-16 19:00:42 +00:00
cy
0aaf4cae5d Refactor, removing one compare.
This changes the return code however the caller only tests for 0 and != 0.
One might ask then, why multiple return codes when the caller only tests
for 0 and != 0? From what I can tell, Darren probably passed various
return codes for sake of debugging. The debugging code is long gone
however we can still use the different return codes using DTrace FBT
traces. We can still determine why the compare failed by examining the
differences between the fr1 and fr2 frentry structs, which is a simple
test in DTrace. This allows reducing the number of tests, improving the
code while not affecting our ability to capture information for
diagnostic purposes.

MFC after:	1 week
2019-07-16 19:00:38 +00:00
tuexen
3e4e5c46ac Fix compilation on platforms using gcc.
When compiling RACK on platforms using gcc, a warning that tcp_outflags
is defined but not used is issued and terminates compilation on PPC64,
for example. So don't indicate that tcp_outflags is used.

Reviewed by:		rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D20971
2019-07-16 17:54:20 +00:00
vangyzen
18f6c17b71 Adds signal number format to kern.corefile
Add format capability to core file names to include signal
that generated the core. This can help various validation workflows
where all cores should not be considered equally (SIGQUIT is often
intentional and not an error unlike SIGSEGV or SIGBUS)

Submitted by:	David Leimbach (leimy2k@gmail.com)
Reviewed by:	markj
MFC after:	1 week
Relnotes:	sysctl kern.corefile can now include the signal number
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20970
2019-07-16 15:51:09 +00:00
markj
07347a7f8f Always use the software DBM bit for now.
r350004 added most of the machinery needed to support hardware DBM
management, but it did not intend to actually enable use of the hardware
DBM bit.

Reviewed by:	andrew
MFC with:	r350004
Sponsored by:	The FreeBSD Foundation
2019-07-16 15:41:09 +00:00
markj
079d038a05 Fix the arm64 page table entry attribute mask.
It did not include the DBM or contiguous bits.

Reported by:	andrew
Reviewed by:	andrew
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-16 15:38:01 +00:00
markj
5f4e2bae6a Propagate attribute changes during demotion.
After r349117 and r349122, some mapping attribute changes do not trigger
superpage demotion. However, pmap_demote_l2() was not updated to ensure
that the replacement L3 entries carry any attribute changes that
occurred since promotion.

Reported and tested by:	manu
Reviewed by:	alc
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20965
2019-07-16 14:40:49 +00:00
avg
eecfc22b8e bge: check that the bus is a pci bus before using it as such
This fixes the following panic on powerpc:
  pci_get_vendor failed for pcib1 on bus ofwbus0, error = 2

PR:		238730
Reported by:	Dennis Clarke <dclarke@blastwave.org>
Tested by:	Dennis Clarke <dclarke@blastwave.org>
MFC after:	2 weeks
2019-07-16 08:36:49 +00:00
jhibbits
462a300755 powerpc: Fix casueword(9) post-r349951
'=' asm constraint marks a variable as write-only.  Because of this, gcc
throws away the initialization of 'res', causing garbage to be returned if
the CAS was successful.  Use '+' to mark res as read/write, so that the
initialization stays in the generated asm.  Also, fix the reservation
clearing stwcx store index register in casueword32, and only do the dummy
store when needed, skip it if the real store has already succeeded.
2019-07-16 03:55:27 +00:00
alc
d567ba7b62 Revert r349973. Upon further reflection, I realized that the comment
deleted by r349973 is still valid on i386.  Restore it.

Discussed with:	   markj
2019-07-16 03:09:03 +00:00
jhb
bd67f2ec6b Add ptrace op PT_GET_SC_RET.
This ptrace operation returns a structure containing the error and
return values from the current system call.  It is only valid when a
thread is stopped during a system call exit (PL_FLAG_SCX is set).

The sr_error member holds the error value from the system call.  Note
that this error value is the native FreeBSD error value that has _not_
been translated to an ABI-specific error value similar to the values
logged to ktrace.

If sr_error is zero, then the return values of the system call will be
set in sr_retval[0] and sr_retval[1].

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D20901
2019-07-15 21:48:02 +00:00
ian
a108259927 In nxprtc(4), use the countdown timer for better timekeeping resolution
on PCx2129 chips too.

The datasheet for the PCx2129 chips says that there is only a watchdog
timer, no countdown timer.  It turns out the countdown timer hardware is
there and works just the same as it does on a PCx2127 chip, except that you
can't use it to trigger an interrupt or toggle an output pin.  We don't need
interrupts or output pins, we only need to read the timer register to get
sub-second resolution.  So start treating the 2129 chips the same as 2127.
2019-07-15 21:47:40 +00:00
ian
264fbd07fe Fix nxprtc(4) on systems that support i2c repeat-start correctly.
An obscure footnote in the datasheets for the PCx2127, PCx2129, and
PCF8523 rtc chips states that the chips do not support i2c repeat-start
operations.  When the driver was originally written and tested, the i2c
bus on that system also didn't support repeat-start and just quietly
turned repeat-start operations into a stop-then-start, making it appear
that the nxprtc driver was working properly.

The repeat-start situation only comes up on reads, so instead of using
the standard iicdev_readfrom(), use a local nxprtc_readfrom(), which is
just a cut-and-pasted copy of iicdev_readfrom(), modified to send two
separate start-data-stop sequences instead of using repeat-start.
2019-07-15 21:40:58 +00:00
jhb
895d57ec60 Don't pass error from syscallenter() to syscallret().
syscallret() doesn't use error anymore.  Fix a few other places to permit
removing the return value from syscallenter() entirely.
- Remove a duplicated assertion from arm's syscall().
- Use td_errno for amd64_syscall_ret_flush_l1d.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D2090
2019-07-15 21:25:16 +00:00
jhb
16bf0faaa2 Always set td_errno to the error value of a system call.
Early errors prior to a system call did not set td_errno.  This commit
sets td_errno for all errors during syscallenter().  As a result,
syscallret() can now always use td_errno without checking TDP_NERRNO.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D20898
2019-07-15 21:16:01 +00:00
tuexen
39695d1680 Don't free read control entries, which are still on the stream queue when
adding them the the read queue fails

MFC after:		1 week
2019-07-15 20:45:01 +00:00
kib
12d7e7a980 In do_sem2_wait(), balance umtx_key_get() with umtx_key_release() on retry.
Reported by:	ler
Bisected and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
2019-07-15 19:18:25 +00:00
markj
cdc0469c0e Implement software access and dirty bit management for arm64.
Previously the arm64 pmap did no reference or modification tracking;
all mappings were treated as referenced and all read-write mappings
were treated as dirty.  This change implements software management
of these attributes.

Dirty bit management is implemented to emulate ARMv8.1's optional
hardware dirty bit modifier management, following a suggestion from alc.
In particular, a mapping with ATTR_SW_DBM set is logically writeable and
is dirty if the ATTR_AP_RW_BIT bit is clear.  Mappings with
ATTR_AP_RW_BIT set are write-protected, and a write access will trigger
a permission fault.  pmap_fault() handles permission faults for such
mappings and marks the page dirty by clearing ATTR_AP_RW_BIT, thus
mapping the page read-write.

Reviewed by:	alc
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20907
2019-07-15 17:13:32 +00:00
markj
c99cb2e79e pmap_clear_modify() needs to clear PTE_W.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-07-15 15:45:33 +00:00
markj
c6d7448b91 Fix reference counting in pmap_ts_referenced() on RISC-V.
pmap_ts_referenced() does not necessarily clear the access bit from
all accessed mappings of a given page.  Thus, if a scan of the mappings
needs to be restarted, we should be careful to avoid double-counting
accessed mappings whose access bits were not cleared in a previous
attempt.

Reported by:	alc
Reviewed by:	alc
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20926
2019-07-15 15:43:15 +00:00
manu
0e4cff93ba Remove duplicated device firmware entry in generic arm kernel config added in r333191
Submitted by:	Daniel Engberg (daniel.engberg.lists@pyret.net)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D20680
2019-07-15 15:07:55 +00:00
tuexen
eb56d924ff Add support for MSG_EOR and MSG_EOF in sendmsg() for SCTP.
This is an FreeBSD extension, not covered by Posix.

This issue was found by running syzkaller.

MFC after:		1 week
2019-07-15 14:54:04 +00:00
tuexen
e972541a65 Fix socket state handling when freeing an SCTP endpoint.
This issue was found by runing syzkaller.

MFC after:		1 week
2019-07-15 14:52:52 +00:00
kib
163feb2e45 In do_lock_pi(), do not return prematurely.
If umtxq_check_susp() indicates an exit, we should clean the resources
before returning.  Do it by breaking out of the loop and relying on
post-loop cleanup.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
Differential revision:	https://reviews.freebsd.org/D20949
2019-07-15 08:39:52 +00:00
kib
97d74e089b Correctly check for casueword(9) success in do_set_ceiling().
After r349951, the return code must be checked instead of old == new
comparision.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days
Differential revision:	https://reviews.freebsd.org/D20949
2019-07-15 08:38:01 +00:00
tuexen
ead2ed8d3e Improve the input validation for l_linger.
When using the SOL_SOCKET level socket option SO_LINGER, the structure
struct linger is used as the option value. The component l_linger is of
type int, but internally copied to the field so_linger of the structure
struct socket. The type of so_linger is short, but it is assumed to be
non-negative and the value is used to compute ticks to be stored in a
variable of type int.

Therefore, perform input validation on l_linger similar to the one
performed by NetBSD and OpenBSD.

Thanks to syzkaller for making me aware of this issue.

Thanks to markj@ for pointing out that a similar check should be added
to so_linger_set().

Reviewed by:		markj@
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D20948
2019-07-14 21:44:18 +00:00
kib
4aa27a3b56 PR: 239143
Reported and tested by:	Wes Maag <jwmaag@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-07-14 21:08:54 +00:00
rrs
2bc1470fec This is the second in a number of patches needed to
get BBRv1 into the tree. This fixes the DSACK bug but
is also needed by BBR. We have yet to go two more
one will be for the pacing code (tcp_ratelimit.c) and
the second will be for the new updated LRO code that
allows a transport to know the arrival times of packets
and (tcp_lro.c). After that we should finally be able
to get BBRv1 into head.

Sponsored by:	Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D20908
2019-07-14 16:05:47 +00:00