Every other architecture defines this and this is required for
interrupts to work when using QEMU's PCI VirtIO devices (which all
report an interrupt line of 0) for two reasons.
Firstly, interrupt line 0 is wrong; they use one of 0x20-0x23 with the
lines being cycled across devices like normal. Moreover, RISC-V uses
INTRNG, whose IRQs are virtual as indices into its irq_map, so even if
we have the right interrupt line we still need to try and route the
interrupt in order to ultimately call into intr_map_irq and get back a
unique index into the map for the given line, otherwise we will use
whatever happens to be in irq_map[line] (which for QEMU where the line
is initialised to 0 results in using the first allocated interrupt,
namely the RTC on IRQ 11 at time of commit).
Note that pci_assign_interrupt will still do the wrong thing for INTRNG
when using a tunable, as it will bypass INTRNG entirely and use the
tunable's value as the index into irq_map, when it should instead
(indirectly) call intr_map_irq to allocate a new entry for the given
IRQ and treat the tunable as stating the physical line in use, which is
what one would expect. This, however, is a problem shared by all INTRNG
architectures, and not exclusive to RISC-V.
Reviewed by: kib
Approved by: kib
Differential Revision: https://reviews.freebsd.org/D26564
Without this patch, if a call to copy_file_range(2) specifies an input file
offset + len that would wrap around, EINVAL is returned.
I thought that was the Linux behaviour, but recent testing showed that
Linux accepts this case and does the copy_file_range() to EOF.
This patch changes the FreeBSD code to exhibit the same behaviour as
Linux for this case.
Reviewed by: asomers, kib
Differential Revision: https://reviews.freebsd.org/D26569
This appears to be a typo. The AdvSIMD field encodes support for
half-precision floating point SIMD instructions, which corresponds to
HWCAP_ASIMDHP, not HWCAP_ASIMDDP.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
ccr(4) uses software to handle GCM and CCM requests not supported by
the crypto engine (e.g. with only AAD and no payload). This change
adds a fallback for a few more requests such as those with more SGL
entries than can fit in a work request (this can happen for GCM when
decrypting a TLS record split across 15 or more packets).
Reported by: Chelsio QA
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26582
Rather than placing the epoch around the entire receive loop which
might call into rtwn_rx_frame() and USB and sleep, split the loop
into two[1] and leave us with one unlock/lock cycle as well.
PR: 249925
Reported by: thj, (rkoberman gmail.com)
Tested by: thj
Suggested by: adrian [1]
Reviewed by: adrian
MFC after: 3 days
Sponsored by: The FreeBSD Foundation (initially, paniced my iwl lab host)
Differential Revision: https://reviews.freebsd.org/D26554
This is mostly needed for a common arm64/amd64 iommu code.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26587
This function isn't ACPI dependent and we may use it on FDT systems
as well.
o Don't repeat the function declaration, include iommu.h instead.
Reviewed by: andrew, kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26584
This was introduced when I merged r361287 to OpenZFS and has been fixed
there already, commit 3f6bb6e43fd68e.
Reported by: swills
Reviewed by: allanjude, freqlabs, mmacy
LLVM 11 changed the meaning of '-O' from '-O2' to '-O1', which resulted
in debug kernels (with 'makeoptions DEBUG=-g') being built with inlining
disabled, causing severe performance hit.
The -O2 was already being used for building amd64, powerpc, and powerpcspe.
Discussed with: jrtc27, arichardson, bdragon, jhibbits
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26471
This avoids setting the association in an inconsistent
state, which could result in a use-after-free situation.
This can be triggered by a malicious peer, if the peer
can modify the cookie without the local endpoint recognizing
it.
Thanks to Ned Williamson for reporting the issue.
MFC after: 3 days
Bind the netmap tx queues to a special '0xff' scheduling class which
makes the firmware skip some processing related to rate limiting on the
outgoing traffic. Future firmwares will do this automatically.
MFC after: 1 week
Sponsored by: Chelsio Communications
- Only active netmap receive queues should be in the RSS lookup table.
- The RSS table should be restored for NIC operation when the last
active netmap queue is switched off, not the first one.
- Support repeated netmap ON/OFF on a subset of the queues. This works
whether the the queues being enabled and disabled are the only ones
active or not. Some kring indexes have to be reset in the driver for
the second case.
MFC after: 1 week
Sponsored by: Chelsio Communications
For interfaces that do not support SIOCGIFMEDIA (for which there are
quite a few) the only fallback is to query the interface for
if_data->ifi_link_state. While it's possible to get at if_data for an
interface via getifaddrs(3) or sysctl, both are heavy weight mechanisms.
SIOCGIFDATA is a simple ioctl to retrieve this fast with very little
resource use in comparison. This implementation mirrors that of other
similar ioctls in FreeBSD.
Submitted by: Roy Marples <roy@marples.name>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D26538
Until we can do proper /etc/rc output on both consoles in multicons
boot (or all of them if we ever generalize), report when we are
booting multicons. Also report the primary console. This will be a big
hint why output stops after this line (though some slow USB discovery
still happens after mountroot / init starts).
Reviewed by: scottl@, tsoome@
Differential Revision: https://reviews.freebsd.org/D26574
Use address of the pointer passed to kernel to determine whether the pointer
is a FDT block (physical address) or a module pointer (virtual kernel address).
This fragment was supposed to be committed before r366196, but I accidentally
skipped it in a patch series.
Reported by: bz
from 32 to 28) by shrinking some entries and reordering them.
Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26508
messages using DATA chunks. Don't use fsn_included when not being
sure that it is set to an appropriate value. If the default is
used, which is -1, this can result in SCTP associaitons not
making any user visible progress.
Thanks to Yutaka Takeda for reporting this issue for the the
userland stack in https://github.com/pion/sctp/issues/138.
MFC after: 3 days
We have a few shortcuts in the arm trap code to speed up obvious "must fail"
cases. In these situations, make sure that we fill in the "sig" and "code"
fields of the generated signal.
MFC after: 3 weeks
It was removed in r355289 but forgot to return it back when new u-boot booti
support was committed. Although booti is not the preferred method of
booting the kernel, it is very useful for the initial phase of porting
FreeBSD to a new platform or booting the kernel on various embedded boards
in an industrial environment.
Don't map same physical memory multiple times with different cache attributes.
This is explicitly stated as architectural undefined behavior, leading to
coherency issues sooner or later.
using an open_to_lock_owner4 when that lock_owner4 has already
been created by a previous open_to_lock_owner4. This caused the NFS server
to reply NFSERR_INVAL.
For NFSv4.0, this is an error, although the updated NFSv4.0 RFC7530 notes
that the correct error reply is NFSERR_BADSEQID (RFC3530 did not specify
what error to return).
For NFSv4.1, it is not obvious whether or not this is allowed by RFC5661,
but the NFSv4.1 server can handle this case without error.
This patch changes the NFSv4.1 (and NFSv4.2) server to handle multiple
uses of the same lock_owner in open_to_lock_owner so that it now correctly
interoperates with the Linux NFS client.
It also changes the error returned for NFSv4.0 to be NFSERR_BADSEQID.
Thanks go to Bjorn for diagnosing this and testing the patch.
He also provided a program that I could use to reproduce the problem.
Tested by: bj@cebitec.uni-bielefeld.de (Bjorn Fischer)
PR: 249567
Reported by: bj@cebitec.uni-bielefeld.de (Bjorn Fischer)
MFC after: 3 days
There may be additional 64-bit ABIs supported, so use a positive check rather
than a negative check.
Suggested by: imp
MFC after: 1 week
Sponsored by: Juniper Networks, Inc
I had failed to notice that sgsendccb() was using cam_periph_mapmem()
and thus was not passing down user pointers directly to drivers. In
practice this broke requests submitted from userland.
PR: 249395
Reported by: Trenton Schulz <trueos@norwegianrockcat.com>
Reviewed by: scottl
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D26550
Allow the necessary parts of systm.h to be visible in the _STANDALONE
environnment. Limit the reset to only being visible for _KERNEL
builds. Map KASSERT, etc to printf on failure in the bootloader until
we have more confidence things won't break and leave systems
unbootable. Eventually, this should map to a full panic in the
bootloader, but that also needs some enhancement to be more useful.
Reviewed by: tsoome, jhb
Differential Revision: https://reviews.freebsd.org/D26543
Original patch was against FreeBSD 12, and a test compile wasn't run against
head. md_tls_tcb_offset field was moved from mdthread to mdproc in the
meantime.
MFC after: 1 week
Sponsored by: Juniper Networks, Inc.
_KERNEL and _STANDALONE are different things. They cannot both be true
at the same time. If things that are normally visible only to _KERNEL
are needed for the _STANDALONE environment, you need to also make them
visible to _STANDALONE. Often times, this will be just a subset of the
required things for _KERNEL (eg global variables are but one example).
sys/cdefs.h is included by pretty much everything in both the loader
and the kernel, so is the ideal choke point.
A received control packet may cause the transmit queue to be flushed, in
which case ng_l2tp_seq_recv_nr() cancels the transmit timeout handler.
The handler checks to see if it was cancelled before doing anything, but
did so before acquiring the node lock, so a small race window could
cause ng_l2tp_seq_rack_timeout() to attempt to flush an empty queue,
ultimately causing a null pointer dereference.
PR: 241133
Reviewed by: bz, glebius, Lutz Donnerhacke
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D26548
Summary:
Two bugs:
* Elf32_Auxinfo is broken, using pointers in the union, which are 64-bits not
32.
* freebsd32_sysarch() doesn't update the 'user local' register when handling
MIPS_SET_TLS, leading to a NULL pointer dereference in the 32-bit
application.
Reviewed by: #mips, brooks
MFC after: 1 week
Sponsored by: Juniper Networks, Inc
Differential Revision: https://reviews.freebsd.org/D26556
In some cases, the syscon driver may be used by consumer requiring better
control about locking (ie. it may be used as registe file provider for clock
driver which needs locked access to multiple registers).
Add fine locking protocol methods together with bunch of helper functions
in syscon driver and implement this functionality in syscon_generic driver.
MFC after: 4 weeks
Syscon can also have child nodes that share a registration file with it.
To do this correctly, follow these steps:
- subclass syscon from simplebus and expose it if the node is also
"simple-bus" compatible.
- block simplebus probe for this compatible string, so it's priority
(bus pass) doesn't colide with syscon driver.
While I'm in, also block "syscon", "simple-mfd" for the same reason.
MFC after: 4 weeks
The fastpath in tcp_output tries to send out
full segments, and avoid sending partial segments by
comparing against the static t_maxseg variable.
That value does not consider tcp options like timestamps,
while the initial window calculation is using
the correct dynamic tcp_maxseg() function.
Due to this interaction, the last, full size segment
is considered too short and not sent out immediately.
Reviewed by: tuexen
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26478
Adjust ssthresh in after_idle to the maximum of
the prior ssthresh, or 3/4 of the prior cwnd. See
RFC2861 section 2 for an in depth explanation for
the rationale around this.
As newreno is the default "fall-through" reaction,
most tcp variants will benefit from this.
Reviewed by: tuexen
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D22438
designated initializers. This makes it easier to modify 'struct sysent'
layout.
Reviewed by: kevans
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26530
The hardware supports periods as long as 196 seconds[*] when using the
maximal prescaling of 72000 and maximum cycle count of 2^16.
But the code becomes incorrect when the period length approaches 1 second.
That's because of things like NS_PER_SEC / period.
[*] At the same time I must note that the KPI provides for maximum
period of about 4 seconds (2^32 nanoseconds).
MFC after: 2 weeks
If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
instructs the kernel to bypass the page cache for that file. This feature
is also known by libfuse's name: "direct_io".
However, when accessing a file via mmap, there is no possible way to bypass
the cache completely. This change fixes a deadlock that would happen when
an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
that a write couldn't possibly come from cache if direct_io were set.
Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
But allowing it is less likely to cause user complaints, and is more in
keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
"reduce", not "eliminate" cache effects.
PR: 247276
Reported by: trapexit@spawn.link
Reviewed by: cem
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D26485
We have (two versions) of MS() and SM() macros which we use throughout
the wireless code. Change all but three places (ath_hal, rtwn, and rsu)
to the newly provided _IEEE80211_MASKSHIFT() and _IEEE80211_SHIFTMASK()
macros. Also change one internal case using both _S and _M instead of
just _S away from _M (one of the reasons rtwn and rsu were not changed).
This was done semi-mechanically. No functional changes intended.
Requested by: gnn (D26091)
Reviewed by: adrian (pre line wrap)
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")
Differential Revision: https://reviews.freebsd.org/D26539
- We can exit the loop as soon as the filter check passes.
- The alignment check has already passed so there is no need to also run
it here.
Sponsored by: Innovate UK
We need to use a bounce buffer when the memory we are operating on is not
aligned to a cacheline, and not aligned to the maps alignment.
The former is to stop other threads from dirtying the cacheline while we
are performing DMA operations with it. The latter is to check memory
passed in by a driver is correctly aligned for the device.
Reviewed by: mmel
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D26496
This will ensure nothing modifies the cacheline while DMA is in progress
so we won't need to bounce the data.
Reviewed by: mmel
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D26495
_STANDALONE is only for the bootloader, not kernel modules. Remove it
from the build. This was harmless before, but sys/malloc.h now does
different things for the standalone environment, triggering the issue.
Use it to decide if we can skip cache management.
While here remove the DMAMAP_COULD_BOUNCE flag as it's unneeded.
Reviewed by: mmel
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D26494
Add helper functions to the arm64 busdma for common cases of checking if
we may need to bounce, and if we must bounce for a given address.
These will be expanded later as we handle cache-misaligned memory.
Reported by: mmel
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D26493
The ZSTD support for the boot loader will need to include files that
use the kernel's malloc interface. Create a standalone stub version
that's functional enough to allow this to work. There's some
limitations in this interface, and it's not quite a perfect
match. Specifically, M_WAITOK allocations can fail because there's
nothing that can be done we no memory is available.
It is only ever called for negative entries and for those it is
just a wrapper around cache_zap_negative_locked_vnode_kl which
always succeeds.
This also fixes a bug where cache_lookup_fallback should have been
calling cache_zap_locked_bucket instead. Note that in order to trigger
the bug NOCACHE must not be set, which currently only happens when
creating a new coredump (and then the coredump-to-be has to have a
negative entry).
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.
Suggested by: kevans
and current address space is already destroyed, so kern_execve()
terminates the process.
While there, clean up some internals of post_execve() inlined in init_main.
Reported by: Peter <pmc@citylink.dinoex.sub.org>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26525
We left these in the clean rule to avoid having stale files remain in
working trees, but enough time has now passed that it's no longer
relevant.
Discussed with: imp
The entire cache scan was a leftover from the old implementation.
It is incredibly wasteful in presence of several mount points and does not
win much even for single ones.
Similar to OPAL calls, switch to big endian to do calls to RTAS.
(Missed this one when I was doing the bulk commit of PowerPC64LE support.)
Sponsored by: Tag1 Consulting, Inc.
Due to enter_idle_powerx fabricating a MSR from scratch, it is necessary
for it to care about the endianness, so we don't accidentally switch
endian the first time we idle a thread.
Took about five seconds to spot after seeing an unmangled backtrace.
The hard bit was needing to temporarily set up a mutex to sort out the
logjam that happens when every thread simultaneously wakes up in the wrong
endian due to the panic IPI and panics, leaving what I can best describe as
"alphabet soup" on the console.
Luckily, I already had a patch sitting around to do that.
This brings POWER8 up to equivilence with POWER9 on PPC64LE.
Sponsored by: Tag1 Consulting, Inc.
OPAL unconditionally enters secondary CPUs with only HV and SF set.
I tried writing a secondary entry point instead, but OPAL rejected it
and I am unsure why, so I resorted to making the system reset interrupt
endian-flexible.
This means we take a slight performance hit on wakeup on LE, but it is
a good stopgap until we can figure out a reliable way to make OPAL enter
where we want it to.
It probably makes sense to have it around anyway, because I can imagine
scenarios where the cpu resets itself to BE and does a software reset.
Sponsored by: Tag1 Consulting, Inc.
More endian conversion.
* Install TCEs correctly (i.e. in big endian)
* Convert to big endian and back when setting up queue pages and IRQs.
Sponsored by: Tag1 Consulting, Inc.
Since OPAL runs in big endian, any data being passed back and forth
via memory instead of registers needs to be byteswapped.
From my notes during development:
"A good way to find candidates is to look for vtophys() in opal_call()
parameters. The memory being passed will be written into in BE."
Sponsored by: Tag1 Consulting, Inc.
It's much easier to implement this in an endian-independent way when we
don't also have to worry about masking half of the dword off.
Given that this code ran on a machine that ran a poudriere bulk with no
kernel oddities, I am relatively certain it is correctly implemented. ;)
This should be a minor performance boost on BE as well.
Sponsored by: Tag1 Consulting, Inc.
For a body of code that had its endian conversion bits written blind without
the ability to test, moea64 was VERY close to being correct.
There were only four instances where the existing code was getting it wrong.
Sponsored by: Tag1 Consulting, Inc.
This is slightly stripped down from GENERIC64, as PowerMac G5 machines
are incapable of running in LE mode (so we can skip the Mac drivers.)
While technically POWER6 and POWER7 have the hardware capability of running
in LE mode, they have a tendency to trap excessively when a load/store is
misaligned. (an extremely common occurrence in LE code, and one of the main
reasons I consider BE to be superior, as it turns potential security issues
into immediately obvious mangled numbers.)
Additionally, there was no mechanism to control what endian interrupts
are delivered in, so supporting LE operation on POWER6 and POWER7 involves
some really dirty tricks in the interrupt vectors that I would rather
avoid.
IBM drew the line in the sand at POWER8 some time around 2013, embracing
full support for LE in the platform, and making a push across the board
for LE code to target POWER8 as a minimum requirement. As such, usage of
LE kernels on POWER6 and POWER7 is practically nil, despite it being
technically possible to do.
The so-called "TRUELE" feature bit which is the baseline requirement for
needed for PowerPC64LE was introduced in POWER8.
Sponsored by: Tag1 Consulting, Inc.
When running without a hypervisor, we need to set the ILE bit in the LPCR
ourselves.
For the boot processor, handle it in powernv_attach() like we do for other
LPCR bits.
No change for the APs, as they will use the lpcr global to set up their own
LPCR when they do their own cpudep_ap_early_bootstrap() and pick up this
automatically.
Sponsored by: Tag1 Consulting, Inc.
OPAL runs in big endian, so we need to rfid into it to switch endian
atomically when branching to it, and we need to do the
RETURN_TO_NATIVE_ENDIAN dance when it returns to us.
Sponsored by: Tag1 Consulting, Inc.
Unlike virtio, which in legacy mode is guest endian, the hypervisor vscsi
interface operates in big endian, so we must convert back and forth in several
places.
These changes are enough to attach a rootdisk.
Sponsored by: Tag1 Consulting, Inc.
The TCG implementation of mtmsrd in qemu blindly copies the entire register
to the MSR, instead of the specific bit positions listed in the ISA.
This means that qemu will prematurely switch endian out from under the
running code instead of waiting for the rfid, causing an immediate trap
as it attempts to interpret the next instruction in the wrong endianness.
To work around this, ensure PSL_LE is still set before doing the mtmsrd.
In the future, we may wish to just turn off translation and unconditionally
use rfid to switch to the ofmsr instead of quasi-switching to the ofmsr.
Add a new platform option so this can be disabled. (And so that we can
conditonalize additional QEMU-specific hacks in the platform code.)
Sponsored by: Tag1 Consulting, Inc.