The values to report can be set via LUN options. It can be useful for
testing, and also required for Drive Maintenance 2016 feature set.
MFC after: 2 weeks
CTL implements all defined feature sets except Drive Maintenance 2016,
which is not very applicable to such a virtual device, and implemented
only partially now. But may be it could be fixed later at least for
completeness.
MFC after: 2 weeks
if a demotion succeeds, then all of the 4KB page mappings within the
superpage-sized region must be valid, so there is no point in testing the
validity of the 4KB page mapping that is going to be write protected.
Deindent the nearby code.
Reviewed by: kib, markj
Tested by: pho (amd64, i386)
X-MFC after: r350004 (this change depends on arm64 dirty bit emulation)
Differential Revision: https://reviews.freebsd.org/D21027
The timeout field in the CAPS register is defined to be 8 bits, so its type was
uint8_t. We recently started adding 1 to it to cope with rogue devices that
listed 0 timeout time (which is impossible). However, in so doing, other devices
that list 0xff (for a 2 minute timeout) were broken when adding 1
overflowed. Widen the type to be uint32_t like its source register to avoid the
issue.
Reported by: bapt@
ATA sanitize is functionally identical to SCSI, just uses different
initiation commands and status reporting mechanism.
While there, make kernel better handle sanitize commands and statuses.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Added allocation retry loop in alloc_pvo_entry(), to wait for
memory to become available if the caller specifies the M_WAITOK flag.
Also, the loop in moa64_enter() was removed, as moea64_pvo_enter()
never returns ENOMEM. It is alloc_pvo_entry() memory allocation that
can fail and must be retried.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D21035
This commit is for the correct version. (The incorrect one had the order
of the last two entries reversed, due to my testing with copy_file_range
at 568 instead of 569.) This misordering should not have been a problem,
but is now fixed.
This patch adds support to the kernel for a Linux compatible
copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9).
This syscall/VOP can be used by the NFSv4.2 client to implement the
Copy operation against an NFSv4.2 server to do file copies locally on
the server.
The vn_generic_copy_file_range() function in this patch can be used
by the NFSv4.2 server to implement the Copy operation.
Fuse may also me able to use the VOP_COPY_FILE_RANGE() method.
vn_generic_copy_file_range() attempts to maintain holes in the output
file in the range to be copied, but may fail to do so if the input and
output files are on different file systems with different _PC_MIN_HOLE_SIZE
values.
Separate commits will be done for the generated syscall files and userland
changes. A commit for a compat32 syscall will be done later.
Reviewed by: kib, asomers (plus comments by brooks, jilles)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20584
Summary:
It turns out statistics accounting is very expensive in the pmap driver,
and doesn't seem necessary in the common case. Make this optional
behind a MOEA64_STATS #define, which one can set if they really need
statistics.
This saves ~7-8% on buildworld time on a POWER9.
Found by bdragon.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D20903
turnstile_{lock,unlock}() were added for use in epoch. turnstile_lock()
returned NULL to indicate that the calling thread had lost a race and
the turnstile was no longer associated with the given lock, or the lock
owner. However, reader-writer locks may not have a designated owner,
in which case turnstile_lock() would return NULL and
epoch_block_handler_preempt() would leak spinlocks as a result.
Apply a minimal fix: return the lock owner as a separate return value.
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21048
Now that we have a way to obtain entropy in capability mode
(getrandom(2)), libcap_random is obsolete. Remove it.
Bump __FreeBSD_version in case anything happens to use it, though I've
found no consumers.
Reviewed by: delphij, emaste, oshogbo
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21033
Commit text by Jake:
If a driver's IFDI_ATTACH_PRE function fails, the iflib_device_register
function will free the ctx pointer. However, it does not reset the
device softc pointer to NULL.
This will result in memory corruption as a future access to the now
invalid pointer will corrupt memory that is later allocated on top of
the same memory location.
The iflib_device_deregister function correctly resets the softc pointer
by using device_set_softc().
This clears up the invalid dangling pointer and prevents memory
corruption that could lead to a panic or undefined behavior if the
device's driver failed to attach.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, gallatin@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21003
The already-listed APMC0D0F ID belongs to the Ampere eMAG aarch64
platform, but ACPI support was not even built on aarch64.
Submitted by: Greg V <greg_unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D21059
of the TCP TS offset from taking the IP addresses and the TCP port
numbers into account to a version just taking only the IP addresses
into account. This works around broken middleboxes or endpoints.
The default is to keep the behaviour, which is also the behaviour
recommended in RFC 7323.
Reported by: devgs@ukr.net
Reviewed by: rrs@
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D20980
- Wrong order of casting and bit shift caused that enabling and disabling
queues didn't work properly for queues number larger than 32. Use literals
with right suffix instead.
- TX ring tail address was not updated during reinitiailzation of TX
structures. It could block sending traffic.
- Also remove unused variables 'eims' and 'active_queues'.
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: erj@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D20826
Previously only some of the ID register fields were 64 bit. To allow
for a script to generate these mark them all 64 bit. To allow for their
use in assembly we need to use the UINT64_C macro via a new UL macro
to stop the lines from being too long.
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D20977
After r343631 pfil hooks are invoked in net_epoch_preempt section,
this allows to avoid extra locking. Add NET_EPOCH_ASSER() assertion
to each ipfw_bpf_*tap*() call to require to be called from inside
epoch section.
Use NET_EPOCH_WAIT() in ipfw_clone_destroy() to wait until it becomes
safe to free() ifnet. And use on-stack ifnet pointer in each
ipfw_bpf_*tap*() call to avoid NULL pointer dereference in case when
V_*log_if global variable will become NULL during ipfw_bpf_*tap*() call.
Sponsored by: Yandex LLC
While for ATA disks resize is even more rare situation than for SCSI, it
may happen in case of HPA or AMA being used. Make ATA XPT report minor
IDENTIFY DATA change to upper layers with AC_GETDEV_CHANGED, and ada(4)
periph driver handle that event, recalculating all the disk properties and
signalling resize to GEOM. Since ATA has no mechanism of UNIT ATTENTIONs,
like SCSI, it has no way to detect that something has changed. That is why
this functionality depends on explicit reprobe via XPT_REPROBE_LUN call.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
oldpvo is never explicitly NULL'd by moea64_pvo_enter(), so don't check for
NULL to do anything, only check error.
PR: 239372
Reported by: Francis Little
In principle this should not matter as it's a union and they point to
the same memory location but based on the code above we should be
accessing .sata and not .ata.
Submitted by: arichardson
Reviewed by: scottl, imp
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D21002
o Add an experimental IOMMU support to xDMA framework
The BERI IOMMU device is the part of CHERI device-model project [1]. It
translates memory addresses for various BERI peripherals modelled in
software. It accepts FreeBSD/mips64 page directories format and manages
BERI TLB.
1. https://github.com/CTSRD-CHERI/device-model
Sponsored by: DARPA, AFRL
Summary:
Instead of searching for a PVO entry before adding, take advantage of
the fact that RB_INSERT() returns NULL if it inserts, and the existing entry if
an entry exists, without inserting a new entry. This saves an extra tree
traversal in the cases where the PVO does not exist.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D20944
There are some explicit comparisions of refcount_release(9) result
with 0/1, which are fine.
Reviewed by: markj, mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21014
we should test ATTR_SW_DBM, not ATTR_AP_RW, to determine whether to set
PGA_WRITEABLE. In effect, we are currently setting PGA_WRITEABLE based on
whether the dirty bit is preset, not whether the mapping is writeable.
Correct this mistake.
Reviewed by: markj
X-MFC with: r350004
Differential Revision: https://reviews.freebsd.org/D21013
fget_unlocked() and fhold().
On sufficiently large machine, f_count can be legitimately very large,
e.g. malicious code can dup same fd up to the per-process
filedescriptors limit, and then fork as much as it can.
On some smaller machine, I see
kern.maxfilesperproc: 939132
kern.maxprocperuid: 34203
which already overflows u_int. More, the malicious code can create
transient references by sending fds over unix sockets.
I realized that this check is missed after reading
https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html
Reviewed by: markj (previous version), mjg
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20947
where the page table entry was previously invalid. (Note that I did not
replace pmap_load_store() when it was followed by a TLB invalidation, even
if we are not using the return value from pmap_load_store().)
Correct an error in pmap_enter(). A test for determining when to set
PGA_WRITEABLE was always true, even if the mapping was read only.
In pmap_enter_l2(), when replacing an empty kernel page table page by a
superpage mapping, clear the old l2 entry and issue a TLB invalidation. My
reading of the ARM architecture manual leads me to believe that the TLB
could hold an intermediate entry referencing the empty kernel page table
page even though it contains no valid mappings.
Replace a couple direct uses of atomic_clear_64() by the new
pmap_clear_bits().
In a couple comments, replace the term "paging-structure caches", which is
an Intel-specific term for the caches that hold intermediate entries in the
page table, with wording that is more consistent with the ARM architecture
manual.
Reviewed by: markj
X-MFC after: r350004
Differential Revision: https://reviews.freebsd.org/D20998
features offered by the chips.
For 2127 and 2129 chips, fix the detection of when chip-init is needed. The
chip config needs to be reset whenever power was lost, but the logic was
wrong for 212x chips (it only worked for 8523). Now the "oscillator
stopped" bit rather than the power manager mode is used to detect startup
after powerfail.
For all chips, disable the clock output pin.
For chips that have a timestamp/tamper-monitor feature, turn off monitoring
of the timestamp trigger pin.
The 8523, 2127, and 2129 chips have a "power manager" feature that offers
several options. We've been using the default mode which enables
everything. Now the code sets the power manager options to
- direct-switch (when Vdd < Vbat, without extra threshold check)
- no battery monitor
- no external powerfail monitor
This reduces the current draw while running on battery from 1930nA to 880nA,
which should roughly double the lifespan of the battery under load.
Because battery checking is a nice thing to have, the code now does a check
at startup, and then once a day after that, instead of checking continuously
(but only actually reporting at startup). The battery check is now done by
setting the power manager back to default mode, sleeping briefly while it
makes a voltage measurement, then switching back to power-saving mode.
I would like to use the name vm_page_release() for a different purpose,
and vm_page_{import,release}() are local to vm_page.c.
Reviewed by: kib
MFC after: 1 week
EFSCFD (floating point single convert from double) emulation requires saving
the high word of the register, which uses SPE instructions. Enable the SPE
to avoid an SPV Unavailable exception.
MFC after: 1 week
Use 'struct bintime' instead of 'sbintime_t' to manage times in vPIT
to postpone rounding to final results rather than intermediate
results. In tests performed by Joyent, this reduced the error measured
by Linux guests by 59 ppm.
Smart OS bug: https://smartos.org/bugview/OS-6923
Submitted by: Patrick Mooney
Reviewed by: rgrimes
Obtained from: SmartOS / Joyent
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20335
Add HWCAP support for arm64.
defines are the same as in Linux and a userland program can use
elf_aux_info to get the data.
We only save the common denominator for all cores in case the
big and little cluster have different support (this is known to
exists even if we don't support those SoCs in FreeBSD)
Differential Revision: https://reviews.freebsd.org/D17137
When sendmsg(2) sucessfully internalized one SCM_RIGHTS control
message, but failed to process some other control message later, both
file references and filedescent memory needs to be freed. This was not
done, only mbuf chain was freed.
Noted, test case written, reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21000
entry, combining code currently in vm_map_unwire and
vm_map_wire_locked into a single function, called by each of them for
entries in transition.
Discussed with: kib, markj
Reviewed by: alc
Approved by: kib, markj (mentors, implicit)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D20833
AMA replaced HPA in ACS-3 specification. It allows to limit size of the
disk alike to HPA, but declares inaccessible data as indeterminate. One
of its practical use cases is to under-provision SATA SSDs for better
reliability and performance.
While there, fix HPA Security detection/reporting.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
While we print failure messages on the console, sometimes logs are lost or
overwhelmed. Keeping a count of how many times we've failed retriable commands
helps get a magnitude of the problem.
Retried commands can indicate a performance degredation of an nvme drive. Keep
track of the number of retries and report it out via sysctl, just like number of
commands an interrupts.
Also convert it to a bool. While the rest of the driver isn't yet bool clean,
this will help.
Reviewed by: cem@
Differential Revision: https://reviews.freebsd.org/D20988
The nvme drive dumps only the most relevant details about a command when it
fails. However, there are times this is not sufficient (such as debugging weird
issues for a new drive with a vendor). Setting hw.nvme.verbose_cmd_dump=1
in loader.conf will enable more complete debugging information about each
command that fails.
Reviewed by: rpokala
Sponsored by: Netflix
Differential Version: https://reviews.freebsd.org/D20988
FUSE file systems can optionally support interrupting outstanding
operations. However, the file system does not identify to the kernel at
mount time whether it's capable of doing that. Instead it signals its
noncapability by returning ENOSYS to the first FUSE_INTERRUPT operation it
receives. That's a problem for reliable signal delivery, because the kernel
must choose which thread should get a signal before it knows whether the
FUSE server can handle interrupts. The problem is even worse because the
FUSE protocol allows a file system to simply ignore all FUSE_INTERRUPT
operations.
Fix the signal delivery logic by making interruptibility an opt-in mount
option. This will require a corresponding change to libfuse, but not to
most file systems that link to libfuse.
Bump __FreeBSD_version due to the new mount option.
Sponsored by: The FreeBSD Foundation
These macros make places where we extract these easier to read. The shift and
mask stuff is also a bit tedious and error prone. Start with the CAP_LO and
CAP_HI registers since their scope is somewhat constrained. This is style
chagne only, no functional changes.
Reviewed by: chuck
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20979
fticket_wait_answer would spin if it received an unhandled signal whose
default disposition is to terminate. The reason is because msleep(9) would
return EINTR even for a masked signal. One reason is when the thread is
stopped, which happens for example during sigexit(). Fix this bug by
returning immediately if fticket_wait_answer ever gets interrupted a second
time, for any reason.
Sponsored by: The FreeBSD Foundation
This affects the detection of 24-hour vs AM/PM mode... the ampm bit is in a
different location on 2127 and 2129 chips compared to other nxp rtc chips.
I noticed the 2127 case wasn't being handled correctly when I accidentally
misconfiged my system by claiming my PCF2129 was a 2127.
1) Don't explicitly not mask SIGKILL. kern_sigprocmask won't allow it to be
masked, anyway.
2) Fix an infinite loop bug. If a process received both a maskable signal
lower than 9 (like SIGINT) and then received SIGKILL,
fticket_wait_answer would spin. msleep would immediately return EINTR,
but cursig would return SIGINT, so the sleep would get retried. Fix it
by explicitly checking whether SIGKILL has been received.
3) Abandon the sig_isfatal optimization introduced by r346357. That
optimization would cause fticket_wait_answer to return immediately,
without waiting for a response from the server, if the process were going
to exit anyway. However, it's vulnerable to a race:
1) fatal signal is received while fticket_wait_answer is sleeping.
2) fticket_wait_answer sends the FUSE_INTERRUPT operation.
3) fticket_wait_answer determines that the signal was fatal and returns
without waiting for a response.
4) Another thread changes the signal to non-fatal.
5) The first thread returns to userspace. Instead of exiting, the
process continues.
6) The application receives EINTR, wrongly believes that the operation
was successfully interrupted, and restarts it. This could cause
problems for non-idempotent operations like FUSE_RENAME.
Reported by: kib (the race part)
Sponsored by: The FreeBSD Foundation
filesystems that have block pointers that are out-of-range for their
filesystem. These out-of-range block pointers are corrected by
fsck(8) so are only encountered when an unchecked filesystem is
mounted.
A new "untrusted" flag has been added to the generic mount interface
that can be set when mounting media of unknown provenance or integrity.
For example, a daemon that automounts a filesystem on a flash drive
when it is plugged into a system.
This commit adds a test to UFS/FFS that validates all block numbers
before using them. Because checking for out-of-range blocks adds
unnecessary overhead to normal operation, the tests are only done
when the filesystem is mounted as an "untrusted" filesystem.
Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as: FS-14-UFS-3: Out of bounds read in write-2 (ffs_alloccg)
Reviewed by: kib
Sponsored by: Netflix
We can't use a u_int to compute the physical address in
pmap_early_vtophys(). Our int is 32-bit, but the physical address is
64-bit. This works fine if everything lives in below 0x100000000, but as
soon as it doesn't this breaks.
MFC after: 1 week
Sponsored by: Axiado
That change was intended to be cosmetic, but it inadvertenly caused
vnode_pager_setsize to discard cached indirect blocks and extended
attributes on UFS during truncation. The reason is because those blocks
have negative LBNs, which get sign-cast to positive VM indexes.
Reported by: kib
Sponsored by: The FreeBSD Foundation
I accidentally broke the main point of r349248 when making stylistic changes
in r349391. Restore the original behavior, and also fix an additional
overflow that was possible when uio->uio_resid was nearly SSIZE_MAX.
Reported by: cem
Reviewed by: bde
MFC after: 2 weeks
MFC-With: 349248
Sponsored by: The FreeBSD Foundation
This ioctl is used when a breakpoint is encountered while disassembling
a symbol in the target process. Since only one DTrace consumer can
toggle or enumerate fasttrap probes from a given process at time, this
ioctl does not appear to be used in practice.
with various laptops using hdaa(4) sound devices. We don't seem to know
the "correct" configurations for these devices and the defaults are far
superiour, e.g. they work if you don't nuke the default configs.
PR: 200526
Differential Revision: https://reviews.freebsd.org/D17772
filesystem full message is sent to the offending process or the
kernel log if the offending process cannot be identified.
To prevent an explotion of messages, the kernel ppsratecheck()
function is used to limit the messages to one per second. This
revision changes the variable that tracks the rate of these messages
from a systemwide limit to a per-filesystem limit by moving it from
a global variable to a variable in the ufsmount structure.
Suggested by: kib
Reviewed by: kib
Sponsored by: Netflix
Neither the 1.3 or 1.4 standards say this number is 1's based, but adding 1
costs little and copes with those NVMe drives that report '0' in this field
cheaply. This is consistent with what the Linux driver does as well.
ipfilter 5.1.2 into FreeBSD-10, the fix for, 2580062 from/to targets
should be able to use any interface name, moved frentry.fr_cksum to
prior to frentry.fr_func thereby making this code redundant. After
investigating whether this fix to move fr_cksum was correct and if it
broke anything, it has been determined that the fix is correct and this
code is redundant. We remove it here.
MFC after: 2 weeks
This changes the return code however the caller only tests for 0 and != 0.
One might ask then, why multiple return codes when the caller only tests
for 0 and != 0? From what I can tell, Darren probably passed various
return codes for sake of debugging. The debugging code is long gone
however we can still use the different return codes using DTrace FBT
traces. We can still determine why the compare failed by examining the
differences between the fr1 and fr2 frentry structs, which is a simple
test in DTrace. This allows reducing the number of tests, improving the
code while not affecting our ability to capture information for
diagnostic purposes.
MFC after: 1 week
When compiling RACK on platforms using gcc, a warning that tcp_outflags
is defined but not used is issued and terminates compilation on PPC64,
for example. So don't indicate that tcp_outflags is used.
Reviewed by: rrs@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D20971
Add format capability to core file names to include signal
that generated the core. This can help various validation workflows
where all cores should not be considered equally (SIGQUIT is often
intentional and not an error unlike SIGSEGV or SIGBUS)
Submitted by: David Leimbach (leimy2k@gmail.com)
Reviewed by: markj
MFC after: 1 week
Relnotes: sysctl kern.corefile can now include the signal number
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20970
r350004 added most of the machinery needed to support hardware DBM
management, but it did not intend to actually enable use of the hardware
DBM bit.
Reviewed by: andrew
MFC with: r350004
Sponsored by: The FreeBSD Foundation
After r349117 and r349122, some mapping attribute changes do not trigger
superpage demotion. However, pmap_demote_l2() was not updated to ensure
that the replacement L3 entries carry any attribute changes that
occurred since promotion.
Reported and tested by: manu
Reviewed by: alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20965
This fixes the following panic on powerpc:
pci_get_vendor failed for pcib1 on bus ofwbus0, error = 2
PR: 238730
Reported by: Dennis Clarke <dclarke@blastwave.org>
Tested by: Dennis Clarke <dclarke@blastwave.org>
MFC after: 2 weeks
'=' asm constraint marks a variable as write-only. Because of this, gcc
throws away the initialization of 'res', causing garbage to be returned if
the CAS was successful. Use '+' to mark res as read/write, so that the
initialization stays in the generated asm. Also, fix the reservation
clearing stwcx store index register in casueword32, and only do the dummy
store when needed, skip it if the real store has already succeeded.
This ptrace operation returns a structure containing the error and
return values from the current system call. It is only valid when a
thread is stopped during a system call exit (PL_FLAG_SCX is set).
The sr_error member holds the error value from the system call. Note
that this error value is the native FreeBSD error value that has _not_
been translated to an ABI-specific error value similar to the values
logged to ktrace.
If sr_error is zero, then the return values of the system call will be
set in sr_retval[0] and sr_retval[1].
Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D20901
on PCx2129 chips too.
The datasheet for the PCx2129 chips says that there is only a watchdog
timer, no countdown timer. It turns out the countdown timer hardware is
there and works just the same as it does on a PCx2127 chip, except that you
can't use it to trigger an interrupt or toggle an output pin. We don't need
interrupts or output pins, we only need to read the timer register to get
sub-second resolution. So start treating the 2129 chips the same as 2127.
An obscure footnote in the datasheets for the PCx2127, PCx2129, and
PCF8523 rtc chips states that the chips do not support i2c repeat-start
operations. When the driver was originally written and tested, the i2c
bus on that system also didn't support repeat-start and just quietly
turned repeat-start operations into a stop-then-start, making it appear
that the nxprtc driver was working properly.
The repeat-start situation only comes up on reads, so instead of using
the standard iicdev_readfrom(), use a local nxprtc_readfrom(), which is
just a cut-and-pasted copy of iicdev_readfrom(), modified to send two
separate start-data-stop sequences instead of using repeat-start.
syscallret() doesn't use error anymore. Fix a few other places to permit
removing the return value from syscallenter() entirely.
- Remove a duplicated assertion from arm's syscall().
- Use td_errno for amd64_syscall_ret_flush_l1d.
Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D2090
Early errors prior to a system call did not set td_errno. This commit
sets td_errno for all errors during syscallenter(). As a result,
syscallret() can now always use td_errno without checking TDP_NERRNO.
Reviewed by: kib
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D20898
Previously the arm64 pmap did no reference or modification tracking;
all mappings were treated as referenced and all read-write mappings
were treated as dirty. This change implements software management
of these attributes.
Dirty bit management is implemented to emulate ARMv8.1's optional
hardware dirty bit modifier management, following a suggestion from alc.
In particular, a mapping with ATTR_SW_DBM set is logically writeable and
is dirty if the ATTR_AP_RW_BIT bit is clear. Mappings with
ATTR_AP_RW_BIT set are write-protected, and a write access will trigger
a permission fault. pmap_fault() handles permission faults for such
mappings and marks the page dirty by clearing ATTR_AP_RW_BIT, thus
mapping the page read-write.
Reviewed by: alc
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20907
pmap_ts_referenced() does not necessarily clear the access bit from
all accessed mappings of a given page. Thus, if a scan of the mappings
needs to be restarted, we should be careful to avoid double-counting
accessed mappings whose access bits were not cleared in a previous
attempt.
Reported by: alc
Reviewed by: alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20926
If umtxq_check_susp() indicates an exit, we should clean the resources
before returning. Do it by breaking out of the loop and relying on
post-loop cleanup.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Differential revision: https://reviews.freebsd.org/D20949
After r349951, the return code must be checked instead of old == new
comparision.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Differential revision: https://reviews.freebsd.org/D20949
When using the SOL_SOCKET level socket option SO_LINGER, the structure
struct linger is used as the option value. The component l_linger is of
type int, but internally copied to the field so_linger of the structure
struct socket. The type of so_linger is short, but it is assumed to be
non-negative and the value is used to compute ticks to be stored in a
variable of type int.
Therefore, perform input validation on l_linger similar to the one
performed by NetBSD and OpenBSD.
Thanks to syzkaller for making me aware of this issue.
Thanks to markj@ for pointing out that a similar check should be added
to so_linger_set().
Reviewed by: markj@
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20948
get BBRv1 into the tree. This fixes the DSACK bug but
is also needed by BBR. We have yet to go two more
one will be for the pacing code (tcp_ratelimit.c) and
the second will be for the new updated LRO code that
allows a transport to know the arrival times of packets
and (tcp_lro.c). After that we should finally be able
to get BBRv1 into head.
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D20908
prior to its import into FreeBSD. This macro calculates the size to be
compared within the frentry structure. The ipfilter 4 version of the
macro calculated the compare size based upon the static size of the
frentry struct. Today it uses the ipfilter 5 method of calculating the
size based upon the new to ipfilter 5 fr_size value found in the
frentry struct itself.
No effective change in code is intended.
MFC after: 1 week
TLB entry. Specifically, at the start of pmap_enter_quick_locked(), we
would sometimes have a TLB entry for an invalid PTE, and we would need to
issue a TLB invalidation before exiting pmap_enter_quick_locked(). However,
we should never have a TLB entry for an invalid PTE. r349905 has addressed
the root cause of the problem, and so we no longer need this workaround.
X-MFC after: r349905
NetBSD and OpenBSD have libc wrapper functions for the ARM_SYNC_ICACHE and
ARM_DRAIN_WRITEBUF sysarch operations. This change adds compatible functions
to our library. This should make it easier for various upstream sources to
support *BSD operating systems with a single variation of cache maintence
code in tools like interpreters and JIT compilers.
I consider the argument types passed to arm_sync_icache() to be especially
unfortunate, but this is intended to match the other BSDs.
Differential Revision: https://reviews.freebsd.org/D20906
* Fix the kernel build with gcc by removing a redundant extern declaration
* In the tests, fix a printf format specifier that assumed LP64
Sponsored by: The FreeBSD Foundation
Accept an IEEE Extended Unique Identifier (EUI-64) from the command
line for each NVMe namespace. If one isn't provided, it will create one
based on the CRC16 of:
- the FreeBSD IEEE OUI
- PCI bus, device/slot, function values
- Namespace ID
Reviewed by: imp, araujo, jhb, rgrimes
Approved by: imp (mentor), jhb (maintainer)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19905
The only consumer of moea64_pvo_remove_from_page_locked() already has the
page in hand, so there is no need to search for the page while holding the
lock. Drop the wrapper, and rename _moea64_pvo_remove_from_page_locked().
Reported by: alc
Summary:
Since the 'page pv' lock is one of the most highly contended locks, we
need to try to do as much work outside of the lock as we can. The
moea64_pvo_remove_from_page() path is a low hanging fruit, where we can
do some heavy work (PHYS_TO_VM_PAGE()) outside of the lock if needed.
In one path, moea64_remove_all(), the PV lock is already held and can't
be swizzled, so we provide two ways to perform the locked operation, one
that can call PHYS_TO_VM_PAGE outside the lock, and one that calls with
the lock already held.
Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D20694
Summary:
If an illegal instruction is encountered on a process running on a
powerpc64 kernel it would attempt to sync the cache before retrying the
instruction "just in case". However, since curpmap is not set, when
moea64_sync_icache() attempts to lock the pmap, it's locking on a NULL pointer,
triggering a panic. Fix this by adding a (assumed unnecessary) fallback to
curthread's pmap in moea64_sync_icache().
Reported by: alfredo.junior_eldorado.org.br
Reviewed by: luporl, alfredo.junior_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D20911
The driver used to log any non-zero cause and when running with a single
line interrupt it would spam the console/logs with reports of interrupts
that are of no interest to anyone.
MFC after: 1 week
Sponsored by: Chelsio Communications
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the same cache line as containing the CAS word,
and not loop infinitely. Otherwise, rogue userspace livelocks the
kernel.
To fix the issue, change casueword(9) interface to return new value 1
indicating that either comparision or store failed, instead of relying
on the oldval == *oldvalp comparison. The primitive no longer retries
the operation if it failed spuriously. Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.
On x86, despite cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.
Reviewed by: andrew (arm64, previous version), markj
Tested by: pho
Reported by: https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D20772
hard-coded value. Don't allocate space for it from the kernel stack.
Account for prefix, suffix, and separator space in the name. This
takes the effective length up to 229 bytes on 13-current, and 37 bytes
on 12-stable. 37 bytes is enough to hold a full GUID string.
PR: 234134
MFC after: 1 week
Differential Revision: http://reviews.freebsd.org/D20924
- Check for ATTR_SW_MANAGED before anything else.
- Use pmap_pte_dirty() in pmap_remove_pages().
No functional change intended.
Reviewed by: alc
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
It is possible, that opcode at the ACTION_PTR() location is not real
action, but action modificator like "log", "tag" etc. In this case we
need to check for each opcode in the loop to find O_EXTERNAL_ACTION.
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Enable this for the NovAtel OEMv2 GPS receiver.
Not fixed: The receiver shows up as "<Interface 0>" in the device
tree, because that is literally what the descriptor-string is.
Reviewed by: hselasky@
The reason for this is that ipftest(8), which still works on FreeBSD-11,
fails to link to it, breaking stable/11 builds.
ipftest(8) was broken (segfault) sometime during the FreeBSD-12 cycle.
glebius@ suggested we disable building it until I can get around to
fixing it. Hence this was not caught in -current.
The intention is to fix ipftest(8) as it is used by the netbsd-tests
(imported by ngie@ many moons ago) for regression testing.
MFC after: immediately
if a retry to allocate swap space, after a larger allocation attempt failed, allocated a smaller set of free blocks
that ended on a 32- or 64-block boundary.
Add tests to detect this kind of failure-to-extend-at-boundary and prevent the associated accounting screwup.
Reported by: pho
Tested by: pho
Reviewed by: alc
Approved by: markj (mentor)
Discussed with: kib
Differential Revision: https://reviews.freebsd.org/D20893
Depending on system configuration, version, and architecture,
mds_handler might be dereferenced from doreti before
hw_mds_recalculate_boot() initialized it. Statically assign void
method to cover all cases.
Reported by: "Schuendehuette, Matthias (LDA IT PLM)" <matthias.schuendehuette@siemens.com>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
When a command is finished running, we must transition it from INQUEUE
to busy state. We were failing to do that, so we hit a panic when the
commands were freed. This only affects mpr, mps already did simmilar
things. Now both the polling and interrupt paths properly set BUSY as
appropriate.
Summary:
Running a 32-bit process on a 64-bit POWER CPU may still use all 64-bits
in calculations, while ignoring the upper 32 bits for addressing
storage. It so happens that some processes end up with r1 (SP) having
bit 31 set in some cases (33-bit address). Writing out to this 33-bit
address obviosly fails. Since the CPU ignores the upper bits, we should
as well.
sendsig() and cpu_fetch_syscall_args() appear to be the only functions
that actually rely on userspace register values for copy in/out, and
cpu_fetch_syscall_args() doesn't seem to be bitten in practice yet.
Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D20896
register values" of the architecture manual, an isb instruction should be
executed after updating ttbr0_el1 and before invalidating the TLB. The
lack of this instruction in pmap_activate() appears to be the reason why
andrew@ and I have observed an unexpected TLB entry for an invalid PTE on
entry to pmap_enter_quick_locked(). Thus, we should now be able to revert
the workaround committed in r349442.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20904
via an ioctl interface. Rules can be added or removed and stats and
counters can be zeroed out. As the ipfilter interprets these
instructions or operations they are stored in an integer called
addrem (add/remove). 1 is add, 2 is remove, and 3 is clear stats and
counters. Much of this is not documented. This commit documents these
operations by replacing simple integers with a self documenting
enum along with a few basic comments.
MFC after: 1 week
This device cannot cross a 4GB boundary with DMA. Removing the
boundary in r346386 resulted in low frequency memory corruption on
machines with isci(4) controllers.
Submitted by: gallatin@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20910
well as sets in some of the groundwork for committing BBR. The
hpts system is updated as well as some other needed utilities
for the entrance of BBR. This is actually part 1 of 3 more
needed commits which will finally complete with BBRv1 being
added as a new tcp stack.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D20834