Otherwise tap(4) can be loaded by loader despite being compiled into the
kernel, causing a panic as things try to double-initialize.
PR: 220867
MFC after: 3 days
in r346631. VGLEnd() clears some state variables as it restores state,
but not all of them, so it still needs to clear a single state variable
to indicate that it has completed. Put this clearing back where it was
(at the start instead of the end) to avoid moving bugs in the signal
handling.
Drivers can now pass up numa domain information via the
mbuf numa domain field. This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.
Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.
Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20028
The disk_open() function searches for "the best partition" when slice and
partition information is not provided as part of the device name. As of
r345477 the slice and partition fields of a disk_devdesc are initialized to
D_SLICEWILD and D_PARTWILD; in the past they were initialized to -1, which
was sometimes interpreted as meaning 'wildcard' and sometimes as 'open the
raw partition' depending on the context. So as an unintended side effect of
r345477 it became basically impossible to ever open a disk or partition
without doing the 'best partition' search. One visible effect of that was
the inability to open the raw disk to read the partition table correctly in
zfs_probe_dev(), leading to failures to find the zfs pool unless it was on
the first partition.
Now instead of always initializing slice and partition to wildcards, the
disk_parsedev() function initializes them based on the presence of a
path/file name following the device. If there is any path or filename
following the ':' that ends the device name, then slice and partition are
initialized to D_SLICEWILD and D_PARTWILD. If there is nothing after the
':' then it is considered to be a request to open the raw device or
partition itself (not a file stored within it), and the fields are
initialized to D_SLICENONE and D_PARTNONE.
With this change in place, all the tests in src/tools/boot are succesful
again, including the recently-added cases of booting from a zfs pool on
a partition other than slice 1 of the device.
PR: 236981
Previously, a pid check was used to prevent open of the tun(4); this works,
but may not make the most sense as we don't prevent the owner process from
opening the tun device multiple times.
The potential race described near tun_pid should not be an issue: if a
tun(4) is to be handed off, its fd has to have been sent via control message
or some other mechanism that duplicates the fd to the receiving process so
that it may set the pid. Otherwise, the pid gets cleared when the original
process closes it and you have no effective handoff mechanism.
Close up another potential issue with handing a tun(4) off by not clobbering
state if the closer isn't the controller anymore. If we want some state to
be cleared, we should do that a little more surgically.
Additionally, nothing prevents a dying tun(4) from being "reopened" in the
middle of tun_destroy as soon as the mutex is unlocked, quickly leading to a
bad time. Return EBUSY if we're marked for destruction, as well, and the
consumer will need to deal with it. The associated character device will be
destroyed in short order.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20033
It seems that there should be a better way to handle this, but this seems to
be the more common approach and it should likely get replaced in all of the
places it happens... Basically, thread 1 is in the process of destroying the
tun/tap while thread 2 is executing one of the ioctls that requires the
tun/tap mutex and the mutex is destroyed before the ioctl handler can
acquire it.
This is only one of the races described/found in PR 233955.
PR: 233955
Reviewed by: ae
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20027
The <sys/pctrie.h> APIs expect a 64-bit DMA key.
This is fine as long as the DMA is less than or equal to 64 bits, which
is currently the case.
Sponsored by: Mellanox Technologies
From 7d8dc6544c
"The mcbin (and likely others) have a nonstandard uart clock. This means
that the earlycon programming will incorrectly set the baud rate if it is
specified. The way around this is to tell the kernel to continue using the
preprogrammed baud rate. This is done by setting the baud to 0."
Our drivers (uart_dev_ns8250) do respect zero, but SPCR would error. Let's
not error.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: mw, imp, bcran
Differential Revision: https://reviews.freebsd.org/D19914
ufs partition as p2, and put the zfs partition at p3, to test the ability
of the zfs probe code to find a zfs pool on something other than the first
partition.
Parse the R_MIPS_32 and R_MIPS_64 relocations. Both Elf_Rel and
Elf_Rela relocations are handled since O32 MIPS uses Elf_Rel while N64
uses Elf_Rela. Note that R_MIPS_32 is only handled for 32-bit mips
and R_MIPS_64 for 64-bit. N32 is untested.
Reviewed by: imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D19870
This is fairly similar to the AES-GCM support in ccr(4) in that it will
fall back to software for certain cases (requests with only AAD and
requests that are too large).
Tested by: cryptocheck, cryptotest.py
MFC after: 1 month
Sponsored by: Chelsio Communications
A request to encrypt an empty payload without any AAD is unusual, but
it is defined behavior. Removing this assertion removes a panic and
instead returns the correct tag for an empty buffer.
Reviewed by: cem, sef
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20043
To workaround limitations in the crypto engine, empty buffers are
handled by manually constructing the final length block as the payload
passed to the crypto engine and disabling the normal "final" handling.
For HMAC this length block should hold the length of a single block
since the hash is actually the hash of the IPAD digest, but for
"plain" SHA the length should be zero instead.
Reported by: NIST SHA1 test failure
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Add support for newer Thinkpad models with id LEN0268. Was tested on
Thinkpad T480 and ThinkPad X1 Yoga 2nd gen.
PR: 229120
Submitted by: Ali Abdallah <aliovx@gmail.com>
MFC after: 1 week
tools/boot/install-boot.sh was assuming that if a device was passed in,
it should operate on the current system and run efibootmgr etc. to
update the boot manager. However, rootgen.sh passes a md(4) device and
not a fixed disk.
Add a -u option to install-boot.sh to tell it to update the system
in-place and run efibootmgr etc.
Also, source install-boot.sh in rootgen.sh to allow it to find and
call make_esp_file etc. And pass the loader file to make_esp_file instead
of a directory name.
Reported by: ian
Reviewed by: ian,imp,tsoome
Differential Revision: https://reviews.freebsd.org/D19992
destroy_dev_sched_cb() is excessively asynchronous, and during media change
retaste new provider may appear sooner then device of the previous one get
destroyed.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
We may need the BSP to reboot, but we don't need any AP CPU that isn't the
panic thread. Any CPU landing in this routine during panic isn't the panic
thread, so we can just detect !BSP && panic and shut down the logical core.
The savings can be demonstrated in a bhyve guest with multiple cores; before
this change, N guest threads would spin at 100% CPU. After this change,
only one or two threads spin (depending on if the panicing CPU was the BSP
or not).
Konstantin points out that this may break any future patches which allow
switching ddb(4) CPUs after panic and examining CPU-local state that cannot
be inspected remotely. In the event that such a mechanism is incorporated,
this behavior could be made configurable by tunable/sysctl.
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20019
VGLMouseFreeze() now only defers mouse signals and leaves it to higher
levels to hide and unhide the mouse cursor if necessary. (It is never
necessary, but is done to simplify the implementation. It is slow and
flashes the cursor. It is still done for copying bitmaps and clearing.)
VGLMouseUnFreeze() now only undoes 1 level of freezing. Its old
optimization to reduce mouse redrawing is too hard to do with unhiding
in higher levels, and its undoing of multiple levels was a historical
mistake.
VGLMouseOverlap() determines if a region overlaps the (full) mouse region.
VGLMouseFreezeXY() is the freezing and a precise overlap check combined
for the special case of writing a single pixel. This is the single-pixel
case of the old VGLMouseFreeze() with cleanups.
Fixes:
- check in more cases that the application didn't pass an invalid VIDBUF
- check for errors from copying a bitmap to the shadow buffer
- freeze the mouse before writing to the shadow buffer in all cases. This
was not done for the case of writing a single pixel (there was a race)
- don't spell the #defined values for VGLMouseShown as 0, 1 or boolean.
As with mlx5en, the idea is to drop unwanted traffic as early
in receive as possible, before mbufs are allocated and anything
is passed up the stack. This can save considerable CPU time
when a machine is under a flooding style DOS attack.
The major change here is to remove the unneeded abstraction where
callers of rxd_frag_to_sd() get back a pointer to the mbuf ring, and
are responsible for NULL'ing that mbuf themselves. Now this happens
directly in rxd_frag_to_sd(), and it returns an mbuf. This allows us
to use the decision (and potentially mbuf) returned by the pfil
hooks. The driver can now recycle mbufs to avoid re-allocation when
packets are dropped.
Reviewed by: marius (shurd and erj also provided feedback)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19645
The mouse signal SIGUSR2 was not turned off for normal termination and
in some other cases. Thus mouse signals arriving after the frame
buffer was unmapped always caused fatal traps. The fatal traps occurred
about 1 time in 5 if the mouse was wiggled while vgl is ending.
The screen switch signal SIGUSR1 was turned off after clearing the
flag that it sets. Unlike the mouse signal, this signal is handled
synchronously, but VGLEnd() does screen clearing which does the
synchronous handling. This race is harder to lose. I think it can
get vgl into deadlocked state (waiting in the screen switch handler
with SIGUSR1 to leave that state already turned off).
Turn off the mouse cursor before clearing the screen in VGLEnd().
Otherwise, clearing is careful to not clear the mouse cursor. Undrawing
an active mouse cursor uses a lot of state, so is dangerous for abnormal
termination, but so is clearing. Clearing is slow and is usually not
needed, since the kernel also does it (not quite right).
This GRE-in-UDP encapsulation allows the UDP source port field to be
used as an entropy field for load-balancing of GRE traffic in transit
networks. Also most of multiqueue network cards are able distribute
incoming UDP datagrams to different NIC queues, while very little are
able do this for GRE packets.
When an administrator enables UDP encapsulation with command
`ifconfig gre0 udpencap`, the driver creates kernel socket, that binds
to tunnel source address and after udp_set_kernel_tunneling() starts
receiving of all UDP packets destined to 4754 port. Each kernel socket
maintains list of tunnels with different destination addresses. Thus
when several tunnels use the same source address, they all handled by
single socket. The IP[V6]_BINDANY socket option is used to be able bind
socket to source address even if it is not yet available in the system.
This may happen on system boot, when gre(4) interface is created before
source address become available. The encapsulation and sending of packets
is done directly from gre(4) into ip[6]_output() without using sockets.
Reviewed by: eugen
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D19921
points at the "latest" branch and one which points at the "quarterly"
branch. Install the "latest" version unless overridden via the newly
added PKGCONFBRANCH variable.
This does not change user-visible behaviour (assuming said vairable is
not set) but will make it easier to change the defaults in the future --
on stable branches we will want "latest" on x86 but "quarterly" elsewhere.
Discussed with: gjb
MFC after: 3 days
X-MFC: After MFCing this I'll make a direct commit to stable/* to
switch non-x86 architectures to "quarterly".
`xrange` is a pre-python 2.x compatible idiom. Use `range` instead. The values
being iterated over are sufficiently small that using range on python 2.x won't
be a noticeable issue.
MFC after: 2 months
Since D19668 was done, new users of the -n flag have surfaced. Parse
and ignore it on the command line until they can be updated.
Suggested by: rgrimes (in D19668).
Replace `except Environment, e:` with `except Environment as e` for
compatibility between python 2.x and python 3.x.
While here, fix a bad indentation change from r346620 by reindenting the code
properly.
MFC after: 2 months
From r346443:
"""
Replace hard tabs with four-character indentations, per PEP8.
This is being done to separate stylistic changes from the tests from functional
ones, as I accidentally introduced a bug to the tests when I used four-space
indentation locally.
No functional change.
"""
MFC after: 2 months
Discussed with: jhb
mtmsr and mtsr require context synchronizing instructions to follow. Without
a CSI, there's a chance for a machine check exception. This reportedly does
occur on a MPC750 (PowerMac G3).
Reported by: Mark Millard
r346307 inadvertently started installing FDT_DTS_FILE along with the kernel.
While this isn't necessarily bad, it was not intended or discussed and it
actively breaks some current setups that don't anticipate any .dtb being
installed when it's using static fdt. This change could be reconsidered down
the line, but it needs to be done with prior discussion.
Fix it by pushing FDT_DTS_FILE build down into the raw dtb.build.mk bits.
This technically allows modules building DTS to accidentally specify an
FDT_DTS_FILE that gets built but isn't otherwise useful (since it's not
installed), but I suspect this isn't a big deal and would get caught with
any kind of testing -- and perhaps this might end up useful in some other
way, for example by some module wanting to embed fdt in some other way than
our current/normal mechanism.
Reported by: Mori Hiroki <yamori813@yahoo.co.jp>
MFC after: 3 days
X-MFC-With: r346307
The CCM test vectors use a slightly different file format in that
there are global key-value pairs as well as section key-value pairs
that need to be used in each test. In addition, the sections can set
multiple key-value pairs in the section name. The CCM KAT parser
class is an iterator that returns a dictionary once per test where the
dictionary contains all of the relevant key-value pairs for a given
test (global, section name, section, test-specific).
Note that all of the CCM decrypt tests use nonce and tag lengths that
are not supported by OCF (OCF only supports a 12 byte nonce and 16
byte tag), so none of the decryption vectors are actually tested.
Reviewed by: ngie
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D19978
Pass in an explicit digest length to the Crypto constructor since it
was assuming only sessions with a MAC key would have a MAC. Passing
an explicit size allows us to test the full digest in HMAC tests as
well.
Reviewed by: cem
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D19884
This copes more gracefully when older version of the nist-kat package
are intalled that don't have newer test vectors such as CCM or plain
SHA.
If the nist-kat package is not installed at all, this still fails with
an error.
Reviewed by: cem
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20034
sbin/veriexec will ignore entries that have no hash anyway,
but loader needs to be explicitly told that such files are
ok to ignore (not verify).
We will report as Unverified depending on verbose level,
but with no reason - because we are not rejecting the file.
Reviewed by: imp, mindal_semihalf
Sponsored by: Juniper Networks
MFC After: 1 week
Differential Revision: https://reviews.freebsd.org//D20018
tun destruction will not continue until TUN_OPEN is cleared. There are brief
moments in tunclose where the mutex is dropped and we've already cleared
TUN_OPEN, so tun_destroy would be able to proceed while we're in the middle
of cleaning up the tun still. tun_destroy should be blocked until these
parts (address/route purges, mostly) are complete.
PR: 233955
MFC after: 2 weeks
If kern.random.initial_seeding.bypass_before_seeding is disabled, random(4)
and arc4random(9) will block indefinitely until enough entropy is available
to initially seed Fortuna.
It seems that zero flowids are perfectly valid, so avoid blocking on random
until initial seeding takes place.
Discussed with: bz (earlier revision)
Reviewed by: thj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20011
As mphyp_pte_unset() can also remove PTE entries, and as this can
happen in parallel with PTEs evicted by mphyp_pte_insert(), there
is a (rare) chance the PTE being evicted gets removed before
mphyp_pte_insert() is able to do so. Thus, the KASSERT should
check wether the result is H_SUCCESS or H_NOT_FOUND, to avoid
panics if the situation described above occurs.
More details about this issue can be found in PR 237470.
PR: 237470
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20012