transmit queues aswell as non-ratelimited ones.
Add the required structure bits in order to support a backpressure
indication with ratelimited connections aswell as non-ratelimited
ones. The backpressure indicator is a value between zero and 65535
inclusivly, indicating if the destination transmit queue is empty or
full respectivly. Applications can use this value as a decision point
for when to stop transmitting data to avoid endless ENOBUFS error
codes upon transmitting an mbuf. This indicator is also useful to
reduce the latency for ratelimited queues.
Reviewed by: gallatin, kib, gnn
Differential Revision: https://reviews.freebsd.org/D11518
Sponsored by: Mellanox Technologies
Turn on the required options in the ERL config file, and ensure
that the fbt module is listed as a dependency for mips in
the modules/dtrace/dtraceall/dtraceall.c file.
PR: 220346
Reviewed by: gnn, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D12227
Similar to r323195, but for amdsmn(4) driver (which borrowed some design).
Ignore hostbs that do not match our PCI device id criteria.
Sponsored by: Dell EMC Isilon
Some systems have hostbs that do not match our PCI device id criteria.
Detect and ignore these devices in probe.
PR: 218264
Sponsored by: Dell EMC Isilon
"zfs mount -o" passes a list of mount options directly to nmount(2) after
sanity checking them. In particular, zfs(8) will refuse to mount an already
existing file system unless "remount" is specified in the option list.
However, the "remount" option only exists in Illumos. FreeBSD's equivalent is
"update".
PR: 221985
Reviewed by: avg
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D12233
The sensor value is formatted similarly to previous models (same
bitfield sizes, same units), but must be read off of the internal
System Management Network (SMN) from the System Management Unit (SMU)
co-processor.
PR: 218264
Reported and tested by: Nils Beyer <nbe AT renzel.net>
Reviewed by: avg (no +1), mjoras, truckman
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12217
AMD Family 17h CPUs have an internal network used to communicate between
the host CPU and the PSP and SMU coprocessors. It exposes a simple
32-bit register space.
Reviewed by: avg (no +1), mjoras, truckman
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12217
It would be better to fix API consumers to not pass NULL there - most of them,
such as gmirror, already contain the neccessary checks - but this is easier
and much less error-prone.
One known user-visible result is that it fixes panic on a failed "graid label".
PR: 221846
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Now that CloudABI's sockets API has been changed to be addressless and
only connected socket instances are used (e.g., socket pairs), they have
become fairly similar to pipes. The only differences on CloudABI is that
socket pairs additionally support shutdown(), send() and recv().
To simplify the ABI, we've therefore decided to remove pipes as a
separate file descriptor type and just let pipe() return a socket pair
of type SOCK_STREAM. S_ISFIFO() and S_ISSOCK() are now defined
identically.
This helps to detect when UDP hash types can be supported.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12177
The conditional compiling in the review request is removed, since
these IOCTLs will be available in stable/10 and stable/11.
Reviewed by: gallatin
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12175
FreeBSD loader expects to have mmsz variable set by bootloader.
U-Boot behaviour is that if buffer size is not big enough to keep
whole memory map, assign the smallest correct buffer size to sz
and return error.
In other words U-Boot assumes that nobody will need mmsz value when buffer
is not filled with memory map, which is not true, so calculated pages value
was too big to allocate.
Solution: Simply assign default value to mmsz.
Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12194
Marvell Armada 80x0/70x0 SoC family uses same RTC IP as
Armada 38x. This patch adds necessary files and enable driver in
GENERIC config.
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12200
Marvell Armada 80x0/70x0 SoC family uses same RTC IP as Armada 38x.
This patch adds Armada 8k compatible to Marvell RTC driver.
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12186
Two modules with the same name cannot be loaded, so Marvell specific drivers
cannot have the same name as generic drivers.
Files with the same name, even in different folder overlaps their .o files.
Change armada38x/rtc.c to armada38x/armada38x_rtc.c fix it.
Preparation for adding this driver to GENERIC config for ARMv7
Marvell platforms.
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Semihalf
Differential Revision: https://reviews.freebsd.org/D12185
It will be needed by hn(4) to configure its RSS key and hash
type/function in the transparent VF mode in order to match VF's
RSS settings. The description of the transparent VF mode and
the RSS hash value issue are here:
https://svnweb.freebsd.org/base?view=revision&revision=322299https://svnweb.freebsd.org/base?view=revision&revision=322485
These are generic enough to promise two independent IOCs instead
of abusing SIOCGDRVSPEC.
Setting RSS key and hash type/function is a different story,
which probably requires more discussion.
Comment about UDP_{IPV4,IPV6,IPV6_EX} were only in the patch
in the review request; these hash types are standardized now.
Reviewed by: gallatin
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12174
superblocks created in revision 322297 only works on disks
with sector sizes up to 4K. This update allows the recovery
information to be created by newfs and used by fsck on disks
with sector sizes up to 64K. Note that FFS currently limits
filesystem to be mounted from disks with up to 8K sectors.
Expanding this limitation will be the subject of another
commit.
Reported by: Peter Holm
Reviewed with: kib
Fix from fallout introduced in r322348 that moved the cpus array to a
dynamic allocation without zeroing the area.
Reported by: mjg
MFC with: r322348
Reviewed by: mjg
Differential revision: https://reviews.freebsd.org/D12220
state to save power, so after writing the entry point address for a core to
the mailbox, use a dsb() to synchronize the execution pipeline to the data
written, then use an sev() to wake up the core.
Submitted by: Sylvain Garrigues <sylgar@gmail.com>
There is no big need to burn CPU if other side may be not there yet. For
example, the PLX hardware by default enables the NTB link up on reset, not
dependig on driver to do it. In case of Intel hardware this also reduces
race between MSI-X workaround negotiation and upper layers, using the same
scratchpad registers in different time.
MFC after: 12 days
t4_tom picks it up right away. This is less work than waiting for
the connection to be established before applying the setting.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
This creates conflicts with FreeBSD variations that may use it. The
usage of the flag M_TOOBIG is limited to iflib queue, thus using
one of M_PROTO flags is fine. There is no need to grab global flag.
Silence from: kmacy, sbruno (2 weeks)
In theory this allows to avoid one more expensive doorbell register read
later in some scenarios. But in practice it also significantly increases
packet rate on PLX hardware, that I can't explain yet, possibly work-
arounding some interrupt delays.
MFC after: 13 days
Sponsored by: iXsystems, Inc.
compatible string to check if the board is compatible with a given quirk.
It's possible this will be moved later, however as it's currently only used
by the MP code put it there.
So far the only instance of a quirk is when the list of CPUs may be
incorrect. This can happen on virtual machines with a hard coded
devicetree, but where the user may then set the number of CPUs as an
argument. This is the case on the ARM models so include the model specific
compat strings for these, including the spelling mistake found in some of
the OpenplatformPkg dtb files.
Sponsored by: DARPA, AFRL
used by the TOE hardware for fully offloaded connections. The knob
affects new connections only.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Record the file path for boot1.efi as the UEFI environemnt variable
FreeBSDBootVarGUID:Boot1Path. Record the device this came from as
FreeBSDBootVarGUID:Boot1Dev. While later stages of the boot may be
able to guess these values by retrieving UEFIGlobal:BootCurrent and
groveling through the correct UEFIGlobal:BootXXXX, this provides
certanty in the face of behavior from any part of the boot loader
chain that might "guess" what to do next. These env variables are
volatile and will disappear on reboot.
Sponsored by: Netflix
In the FreeBSD UEFI boot protocol, boot1.efi exits back to UEFI if it
can't boot the image for most reasons (so that further items in the
EFI boot manger list can be tried). Rename panic to efi_panic, make it
static and give it an extra status argument. Exit back to UEFI with
that status argument so the next loader can be tried.
Use malloc/free exclusively instead of mixing malloc/free and
AllocatePool/FreePool. The code is smaller.
Sponsored by: Netflix
Print the device that boot1.efi was loaded from. Print the path as
well (since it isn't included in DeviceHandle). Move block where we do
this earlier so all the block handle code is now together.
Sponsored by: Netflix
Make efichar.c routines available to libefi as well as
libefivar. Define LIBEFI when building so we can conditionally include
stand.h vs the normal userland stuff.
If both nvme and cam are compiled as modules, nvme cannot be kldloaded
otherwise.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The default value for arc_no_grow_shift may not be optimal when using
several GiB ARC. Expose it via sysctl allows users to tune it easily.
Also expose arc_grow_retry via sysctl for the same reason. The default value of
60s might, in case of intensive load, be too long.
Submitted by: Nikita Kozlov <nikita.kozlov@blade-group.com>
Reviewed by: mav, manu, bapt
MFC after: 2 weeks
Sponsored by: blade
Differential Revision: https://reviews.freebsd.org/D12144
It allows application driver get initial link state without racing with
hardware interrupts, thanks to the context rmlock held here.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Those events may be reported as soon as callback is registered, if the link
is enabled by hardware or some other application.
While there, clean link_is_up variable on link down event.
MFC after: 1 week
install after full initialization, and another to disable the TCB
cache (T6+). The latter works as a tunable only.
Note that debug_flags are for debugging only and should not be set
normally.
MFC after: 1 week
Sponsored by: Chelsio Communications
This driver supports both NTB-to-NTB and NTB-to-Root Port modes (though
the second with predictable complications on hot-plug and reboot events).
I tested it with PEX 8717 and PEX 8733 chips, but expect it should work
with many other compatible ones too. It supports up to two NT bridges
per chip, each of which can have up to 2 64-bit or 4 32-bit memory windows,
6 or 12 scratchpad registers and 16 doorbells. There are also 4 DMA engines
in those chips, but they are not yet supported.
While there, rename Intel NTB driver from generic ntb_hw(4) to more specific
ntb_hw_intel(4), so now it is on par with this new ntb_hw_plx(4) driver and
alike to Linux naming.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
In particular, MIPS now has COMPAT_FREEBSD32 for n64 kernels so this
cannot be ignored for n64. On the other hand, it is unneeded for o32
MIPS kernels as the issue is only present when using 64-bit registers,
so remove the workaround from o32 kernels.
Reviewed by: jmallett
Sponsored by: DARPA / AFRL
While I am here, declare struct md_ioctl32 as packed which allows
us to stop playing tricks with sizeof(md_ioctl32)+y as well as
simplifies md_pad handling. Both were necessary because of different
alignment preferences on amd64 vs i386.
MFC after: 4 weeks
The function return value is not used. Its argument is always
swap_total/PAGE_SIZE, so make it not take any arguments.
Submitted by: ota@j.email.ne.jp
PR: 221356
MFC after: 1 week
Now that all of the packaged software has been adjusted to either use
Flower (https://github.com/NuxiNL/flower) for making incoming/outgoing
network connections or can have connections injected, there is no longer
need to keep accept() around. It is now a lot easier to write networked
services that are address family independent, dual-stack, testable, etc.
Remove all of the bits related to accept(), but also to
getsockopt(SO_ACCEPTCONN).
Post-cold sleep instead of DELAY when waiting for firmware.
Convert softc mutex to an SX lock. Change all waits to sleeps
once interrupts are enabled (and it is safe to sleep).
Submitted by: Matt Macy <matt@mattmacy.io>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12101
These firmwares come from a pre-release snapshot. The final firmwares
in this Chelsio release cycle will likely be .61.0 or later and those
will be the next "long lived" firmwares in FreeBSD head and stable
branches. .59 is being provided in head (only) for wider test exposure.
Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications
illumos 4185 ("add new cryptographic checksums to ZFS: SHA-512,
Skein, Edon-R") was intentionally merged only partially in r289422,
without adding support for skein, sha512 and edonr on FreeBSD.
Support for skein and sha512 was added later on, but edonr is still not
implemented in FreeBSD.
Prior to this commit zfs(8) correctly rejected edonr, but with an error
message that claimed support:
fk@r500 ~ $zfs set checksum=edonr tank
cannot set property for 'tank': 'checksum' must be one of 'on | off | fletcher2 | fletcher4 | sha256 | sha512 | skein | edonr'
PR: 204055
Submitted by: Fabian Keil
Approved by: allanjude
Obtained from: ElectroBSD
MFC after: 1 week
This patch changes the way XPT_GDEV_TYPE works for NVMe. The current
ccb_getdev structure includes pointers to the NVMe Identify Controller
and Namespace structures, but these are kernel virtual addresses which
are not accessible from user space.
As an alternative, the patch changes the pointers into padding in
ccb_getdev and adds two new types to ccb_dev_advinfo to retrieve the
Identify Controller (CDAI_TYPE_NVME_CNTRL) and Namespace
(CDAI_TYPE_NVME_NS) data structures.
Reviewed By: rpokala, imp
Differential Revision: https://reviews.freebsd.org/D10466
Submitted by: Chuck Tuffli
Tuffli had submitted a more thorough patch that I was unaware of when
I did my work and this brings in the bits I missed from that patch.
PR: 220267
Submitted by: Chuck Tuffli
This adds support in pass(4) for data to be described with a
scatter-gather list (sglist) to augment the existing (single) virtual
address.
Differential Revision: https://reviews.freebsd.org/D11361
Submitted by: Chuck Tuffli
Reviewed by: imp@, scottl@, kenm@
Provided a better estimate for the number of transactions that can be
pending at one time. This will be number of queues * number of
trackers / 4, as suggested by Jim Harris. This gives a better estimate
of the number of transactions that CAM should queue before applying
back pressure. This should be revisted when we have real multi-queue
support in CAM and the upper layers of the I/O stack.
Sponsored by: Netflix
creating hardware VIs.
This fixes a bad race on systems with hw.cxgbe.num_vis > 1.
Reported by: olivier@
MFC after: 1 week
Sponsored by: Chelsio Communications
The actual cache line size has always been 64 bytes.
The 128 number arose as an optimization for Core 2 era Intel processors. By
default (configurable in BIOS), these CPUs would prefetch adjacent cache
lines unintelligently. Newer CPUs prefetch more intelligently.
The latest Core 2 era CPU was introduced in September 2008 (Xeon 7400
series, "Dunnington"). If you are still using one of these CPUs, especially
in a multi-socket configuration, consider locating the "adjacent cache line
prefetch" option in BIOS and disabling it.
Reported by: mjg
Reviewed by: np
Discussed with: jhb
Sponsored by: Dell EMC Isilon
Before r207410, the hold count of a page in a page queue was protected
by the queue lock, and, before laundering a page, the page daemon
removed managed writeable mappings of the page before releasing the
queue lock. This ensured that other threads could not concurrently
create transient writeable mappings using pmap_extract_and_hold() on a
user map, as is done for example by vmapbuf(). With that revision,
however, a race can allow the creation of such a mapping, meaning that
the page might be modified as it is being laundered, potentially
resulting in it being marked clean when its contents do not match
those given to the pager. Close the race by using the page lock to
synchronize the hold count check in vm_pageout_cluster() with the
removal of writeable managed mappings.
Reported by: alc
Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12084
according to the enabled interface capability bits. Also remove
some dead code, which tried to preserve already set contents of
E1000_WUC while that register is completely overwritten shortly
after in all cases.
FAT specification requires that for valid FAT, FAT cluster 0 has a
specific value derived from the BPB media descriptor. The lowest
(little-endian) byte must be equal to bpb.bpbMedia, other bits in the
cluster number must be all 1's. Implement the check to reduce the
chance of the randomly corrupted FAT to pass the mount attempt.
Submitted by: Siva Mahadevan <smahadevan@freebsdfoundation.org>
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12124
This fixes interrupt storms on hardware using legacy level-triggered
interrupts, since doorbell processing could take time after interrupt
handler completion, that triggered extra interrupts in a loop.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Since the doorbell bit is already set when interrupt handler is called,
the event was not propagated to upper layer. It was working normally
because present code was not using masking actively, but that is going
to change.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
The old code allowed calling vdrop() before insmntque() to place the vnode back
onto the freelist for later recycling. Some downstream consumers may rely on
this support. Normally insmntque() failing is fine since is uses vgone() and
immediately frees the vnode rather than attempting to add it to the freelist if
vdrop() were used instead.
Also assert that vhold() cannot be used on such a vnode.
Reviewed by: kib, cem, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12126
new use of the vm_object's lock to synchronize updates to a radix trie
mapping per-vm object page indices to on-disk swap blocks.
Fix a typo in a nearby comment.
Reviewed by: kib, markj
X-MFC with: r322913
Differential Revision: https://reviews.freebsd.org/D12134
vm_object page indices to on-disk swap space (r322913) has changed the
synchronization requirements for a couple swap pager functions. Whereas
before a read lock on the vm object sufficed because of the global mutex
on the hash table, a write lock on the vm object may now be required. In
particular, calls to vm_pager_page_unswapped() now require a write lock on
the vm_object. Consequently, vm_fault()'s fast path cannot call
vm_pager_page_unswapped(). The swap space will have to be released at a
later point.
Reviewed by: kib, markj
X-MFC with: r322913
Differential Revision: https://reviews.freebsd.org/D12134
This feature comes from the fact that we rely memory-backed md(4)
in our build process heavily. However, if the build goes haywire
the allocated resources (i.e. swap and memory-backed md(4)'s) need
to be purged. It is extremely useful to have ability to attach
arbitrary labels to each of the virtual disks so that they can
be identified and GC'ed if neecessary.
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D10457
There were two bugs related to the blackhole detection:
* The smalles size was tried more than two times.
* The restored MSS was not the original one, but the second
candidate.
MFC after: 1 week
Sponsored by: Netflix, Inc.
GCC only activates C11 keywords in C mode, not C++ mode. This means
that when targeting an older C++ standard, we cannot fall back to using
_Static_assert(). In this case, do define _Static_assert() as a macro
that uses a typedef'ed array.
Discussed in: r322875 commit thread
Reported by: Mark MIllard
MFC after: 1 month
Print the full conflicting oid path, and include the function name in the
warning so it is clear that the warnings are sysctl-related.
PR: 221853
Submitted by: Fabian Keil <fk AT fabiankeil.de> (earlier version)
Sponsored by: Dell EMC Isilon
Tx power values can easily fit into uint8_t + only 8 bits are written
to registers; values may overflow only in case if ROM contains
malformed data (but limit is checked anyway).
Tested with RTL8188CUS, dev.rtwn.1.debug=0x2000 (no changes).
commit it to make initiazation less chatty in the normal case, and more useful
and informative when real debugging is turned on.
Reviewed by: ken (earlier version)
Sponsored by: Netflix
Improve scheduler performance by flattening nonsensical topology layers
(layers with only one child don't serve any purpose).
This is especially relevant on non-AMD Zen systems after r322776. On my
dual core Intel laptop, this brings the kern.sched.topology_spec table down
from three levels to two.
Submitted by: jeff
Reviewed by: attilio
Sponsored by: Dell EMC Isilon
Use efi_devpath_match instead of device_paths_match. They are
functionally the same. Remove device_paths_match from boot1.c and call
efi_devpath_match instead.
Sponsored by: Netflix
Kill our own hand-rolled (and somewhat flawed) devpath_str in favor of
the recently added efi_devpath_str in libefi. This gives us much
better names at the expense of not being able to debug on EFI 1.2
machines (since the UEFI protocol efi_devpath_str depends on was added
in UEFI 2.0). However, this isn't the first thing that requires newer
than EFI 1.2, so it's quite possible that this doesn't change the
universe of machines we can EFI boot from. This will now give us the
full UEFI path, even for devices we don't yet know about. More
importantly, it gives us the full HD(...) part of the path, which is
sufficient by itself to locate disks that follow the rules (dd one
disk (but not partition) to another still needs the rest of the path
to disambiguate, but that isn't following the rules that require every
GPT table to have globally unique GUIDs for every partion).
This also has the side effect of shrinking boot1.efi by ~3k.
Sponsored by: Netflix
Add libefi to the list of libraries we'll link in. Move EFI table
definitions back to libefi so we don't have drift between the two
efi_main routines.
Sponsored by: Netflix
Cast ctxp to caddr_t to pass data as expected. While void * is a
universal type, char * isn't (and that's what caddr_t is defined as).
One could argue these prototypes should take void * rather than
caddr_t, but changing that is much more invasive.
Sponsored by: Netflix
Make the return type of efi_main uniform. Declare the Exit() function
as not returning. Move efi_main's declaration to the proper header.
Sponsored by: Netflix
Move the efi_main routine out of libefi into sys/boot/efi/loader.
Since boot1 has its own efi_main routine, this effectively prevents
boot1 from linking with libefi. By moving it out, we can share code
better (though though some refactoring with boot1's efi_main and
loader.efi's efi_main is definitely in order).
Sponsored by: Netflix
Simplify i386 trap().
- Use more relevant name 'signo' instead of 'i' for the local variable
which contains a signal number to send for the current exception.
- Eliminate two labels 'userout' and 'out' which point to the very end
of the trap() function. Instead use return directly.
- Re-indent the prot_fault_translation block by reducing if() nesting.
- Some more monor style changes.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The machdep.uprintf_signal sysctl replaced it in more convenient way,
not requiring recompilation to use and providing more information on
fault.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
blocks assigned to the object pages.
- The global swhash_mtx is removed, trie is synchronized by the
corresponding object lock.
- The swp_pager_meta_free_all() function used during object
termination is optimized by only looking at the trie instead of
having to search whole hash for the swap blocks owned by the object.
- On swap_pager_swapoff(), instead of iterating over the swhash,
global object list have to be inspected. There, we have to ensure
that we do see valid trie content if we see that the object type is
swap.
Sizing of the swblk zone is same as for swblock zone, each swblk maps
SWAP_META_PAGES pages.
Proposed by: alc
Reviewed by: alc, markj (previous version)
Tested by: alc, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D11435
The AIO job holds a reference on the associated file descriptor, so the
socket's count should already be > 0. This fixes a LOR with the socket
buffer lock after recent socket locking changes in HEAD.
Sponsored by: Chelsio Communications
recieve descriptors for the igb(4) class of devices. This will
allow a better definition for maximum going forward. Some igb(4)
devices support more than the default 4K.
Reported by: Jason (j@nitrology.com)
Sponsored by: Limelight Networks
o Automomous Power State Transition
o Host Memory Buffer
o Timestamp
o Keep Alive Timer
o Host Controlled Thermal Management
o Non-Operational Power State Config
Also note that feature codes 0x78-0x7f are reserved for the NVMe
Management Interface.
Sponsored by: Netflix
removal of the "blk" parameter from blst_meta_alloc() had the unintended
effect of generating an out-of-range allocation when the cursor reaches
the end of the tree if the number of managed blocks in the tree equals
the so-called "radix" (which in the blist code is not the standard notion
of what a radix is but rather the maximum number of leaves in a tree of
the current height.) In other words, only certain swap configurations
were affected, which is why earlier testing did not reveal the problem.
Submitted by: Doug Moore <dougm@rice.edu>
Reported by: pho, kib
Tested by: pho
X-MFC with: r322459
Differential Revision: https://reviews.freebsd.org/D12106
With Flower (CloudABI's network connection daemon) becoming more
complete, there is no longer any need for creating any unconnected
sockets. Socket pairs in combination with file descriptor passing is all
that is necessary, as that is what is used by Flower to pass network
connections from the public internet to listening processes.
Remove all of the kernel bits that were used to implement socket(),
listen(), bindat() and connectat(). In principle, accept() and
SO_ACCEPTCONN may also be removed, but there are still some consumers
left.
Obtained from: https://github.com/NuxiNL/cloudabi
MFC after: 1 month
- map the hard-coded frame buffer address above KERNBASE. Using the
physical address only worked because of larger mapping bugs.
The hard-coded frame buffer address only works on x86. Use messy ifdefs
to try to avoid warnings about unused code for other arches.
- remove the sysctl for reading and writing the table kernel console
attributes. Writing only worked for emergency output since normal
output uses unalterd copies.
- fix the test for the emergency console being usable
- explain why a hard-coded attribute is used very early. Emergency output
works on x86 even before the pcpu pointer is initialized.
vnode lock.
Caller of softdep_count_dependencies() may own a buffer lock, which
might conflict with the lock order.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Advertise this by changing the defaults to mostly red. If you don't like
this, change them (almost) back using:
vidcontrol -c charcolors,base=7,height=0
vidcontrol -c mousecolors,base=0[,height=15]
The (graphics mode only) mouse cursor colors were hard-coded to a black
border and lightwhite interior. Black for the border is the worst
possible default, since it is the same as the default black background
and not good for any dark background. Reversing this gives the better
default of X Windows. Coloring everything works better still. Now
the coloring defaults to a lightwhite border and red interior.
Coloring for the character cursor is more complicated and mode
dependent. The new coloring doesn't apply for hardware cursors. For
non-block cursors, it only applies in graphics mode. In text mode,
the cursor color was usually a hard-coded (dull)white for the background
only, unless the foreground was white when it was a hard-coded black
for the background only, unless the foreground was white and the
background was black it was reverse video. In graphics mode, it was
always reverse video for the block cursor. Reverse video is worse,
especially over cutmarking regions, since cutmarking still uses simple
reverse video (nothing better is possible in text mode) and double
reverse video for the cursor gives normal video. Now, graphics mode
uses the same algorithm as the best case for text mode in all cases
for graphics mode. The hard-coded sequence { white, black, } for the
background is now { red, white, blue, } where the first 2 colors can
be configured. The blue color at the end is a sentinel which prevents
reverse video being used in most cases but breaks the compatibility
setting for white on black and black on white characters. This will
be fixed later. The compatibility setting is most needed for mono modes.
The previous commit to syscons.c changed sc_cnterm() to be more careful.
It followed null pointers in some cases. But sc_cnterm() has been
unreachable for 15+ years since changes for multiple consoles turned
off calls to the the cnterm destructor for all console drivers. Before
them, it was only called at boot time. So no driver with an attached
console has ever been unloadable and not even the non-console destructors
have been tested much.
These files are compiled in userland too, so we can't use sys/systm.h
and rely on CTASSERT. Switch to using _Static_assert instead.
MFC After: 3 days
Sponsored by: Netflix
card has to do PCIe transactions to complete the reset process, but
can't do them, per the PCIe spec, unless bus mastering is enabled.
Submitted by: Kinjal Patel
PR: 22166
terminal state for kernel console output.
r56043 in 2000 added many complications to support dynamic selection
of the terminal emulator using modules and the ioctl CONS_SETTERM.
This was never completed. There are still no modules, but it is easy
to restore the scterm and dumb emulators at compile time. Then
boot-time configuration for the preferred one doesn't work right, but
CONS_SETTERM almost works after fixing this bug. CONS_SETTERM only
switches the emulator for the user state, leaving the kernel state(s)
still using the boot-time emulator. The fix is especially important
when switching from sc to scteken, since the scteken state has pointers
in it.
Rename kernel_console_ts to sc_kts.
The previous update to the driver to 3.2.12-k changed the VF's API version
to 1.2, but did not let the VF fall back to 1.1 or 1.0 versions. So, this
patch tries 1.2 first, then the older versions in succession if that fails.
This should allow the VF driver to negotiate 1.1 and work with older PF
drivers, such as the one used in Amazon's EC2 service.
PR: 220872
Submitted by: Jeb Cramer <jeb.j.cramer@intel.com>
MFC after: 1 week
Sponsored by: Intel Corporation
extreme outliers from dodgy drives. Adjust comments to reflect this,
and make sure that the number of latency buckets match in the two
places where it matters.
o Allow I/O scheduler to gather times on 32-bit systems. We do this by shifting
the sbintime_t over by 8 bits and truncating to 32-bits. This gives us 8.24
time. This is sufficient both in range (256 seconds is about 128x what current
users need) and precision (60ns easily meets the 1ms smallest bucket size
measurements). 64-bit systems are unchanged. Centralize all the time math so
it's easy to tweak tha range / precision tradeoffs in the future.
o While I'm here, the I/O scheduler should be using periph_data rather than
sim_data since it is operating on behalf of the periph.
Differential Review: https://reviews.freebsd.org/D12119
via soisdisconnected(), and in the earlier unlock earlier to avoid lock
recursion.
This fixes a situation when a socket on accept queue is reset before being
accepted.
Reported by: Jason Eggleston <jeggleston llnw.com>
A follow-up to r322836.
Warnings for the unused declaration were breaking some second tier
architectures, but did not show up in Clang on x86.
Reported by: markj (ddb.4), emaste (declaration)
Sponsored by: Dell EMC Isilon
When debugging a deadlock, it is useful to follow the full chain of locks as
far as possible.
Reviewed by: jhb
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12115
Not enabling FSGSBASE in %cr4 does not prevent reporting of the
feature by the CPUID instruction (blame Int*l). As result, kernels
which were run under monitors pretended that usermode cannot modify
TLS base without the syscall, while libc noted right combination of
capable CPU and the new kernel version, trying to use the WRFSBASE
instruction.
Really old hypervisors that cannot handle enablement of these features
in %cr4 would require the manual configuration, by setting the loader
tunable hw.cpu_stdext_disable=0x81
Reported by: lwhsu, mjoras
Sponsored by: The FreeBSD Foundation
MFC after: 18 days
The check for timestamps are too early to handle SYN-ACK correctly.
So move it down after the corresponing processing has been done.
PR: 216832
Obtained from: antonfb@hesiod.org
MFC after: 1 week
Remote DMA over Converged Ethernet, RoCE, for the ConnectX-4 series of
PCI express network cards.
There is currently no user-space support and this driver only supports
kernel side non-routable RoCE V1. The krping kernel module can be used
to test this driver. Full user-space support including RoCE V2 will be
added as part of the ongoing upgrade to ibcore from Linux 4.9. Otherwise
this driver is feature equivalent to mlx4ib(4). The mlx5ib(4) kernel
module will only be built when WITH_OFED=YES is specified.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
newvers.sh looks for a .vcs subdirectory (e.g. .git, .svn) to determine
which vcs info tool to run (e.g., git rev-parse, svn info).
(As of r308789 if a .vcs subdirectory is not found at ${TOPDIR} then
newvers.sh walks up successive parent directories, testing for the .vcs
subdirectory at each step. This is done in case the FreeBSD source is
built in a subdirectory as part of some larger project, but either way
newvers.sh still tests for the .vcs subdirectory.)
However, when using git worktree there is no .git subdirectory but
rather a plain text .git file which contains a reference to the main
working tree.
Change findvcs() to test that the .vcs entry exists, regardless of type.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Rather than repeatedly nesting loops, separate concerns with a single loop
per call stack level. Use a table to drive the recursive routine. Handle
missing topology layers more gracefully (infer a single unit).
Analyze some additional optional layers which may be present on e.g. AMD Zen
systems (groups, aka dies, per package; and cachegroups, aka CCXes, per
group).
Display that additional information in the boot-time topology information,
when it is relevent (non-one).
Reviewed by: markj@, mjoras@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12019
sysctls to display stats, stats polled every 2 seconds
Modify QLA_LOCK()/QLA_UNLOCK() to not sleep after acquiring mtx_lock
Add support to turn OFF/ON error recovery following heartbeat failure for
debug purposes.
Set default max values to 32 Tx/Rx/SDS rings
MFC after:5 days
The full system memory barrier around a TLB invalidation is stricter than
required. It needs to wait on accesses to main memory, with just the weaker
store variant before the invalidate. As such use the dsb istst, tlbi, dlb
ish sequence already used in pmap.
The tlbi instruction in this sequence is also unnecessarily using a
broadcast invalidate when it just needs to invalidate the local CPUs TLB.
Switch to a non-broadcast variant of this instruction.
Sponsored by: DARPA, AFRL
Right now, we enable the CR4.FSGSBASE bit on CPUs which support the
facility (Ivy and later), to allow usermode to read fs and gs bases
without syscalls. This bit also controls the write access to bases
from userspace, but WRFSBASE and WRGSBASE instructions currently
cannot be used, because return path from both exceptions or interrupts
overrides bases with the values from pcb.
Supporting the instructions is useful because this means that usermode
can implement green-threads completely in userspace without issuing
syscalls to change all of the machine context.
Support is implemented by saving the fs base and user gs base when
PCB_FULL_IRET flag is set. The flag is set on the context switch,
which potentially causes clobber of the bases due to activation of
another context, and when explicit modification of the user context by
a syscall or exception handler is performed. In particular, the patch
moves setting of the flag before syscalls change context.
The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on
entry from userspace can be considered a bug fixes on its own.
Reviewed by: jhb (previous version)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D12023
softdep_count_dependencies().
Buffer's b_dep list is protected by the SU mount lock. Owning the
buffer lock is not enough to guarantee the stability of the list.
Calculation of the UFS mount owning the workitems from the buffer must
be much more careful to not dereference the work item which might be
freed meantime. To get to ump, use the pointers chain which does not
involve workitems at all.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
When a security policy should match TCP connection with specific ports,
the SYN+ACK segment send by syncache_respond() is considered as forwarded
packet, because at this moment TCP connection does not have PCB structure,
and ip_output() is called without inpcb pointer. In this case SPIDX filled
for SP lookup will not contain TCP ports and security policy will not
be found. This can lead to unencrypted SYN+ACK on the wire.
This patch restores the old behavior, when ports will not be filled only
for forwarded packets.
Reported by: Dewayne Geraghty <dewayne.geraghty at heuristicsystems.com.au>
MFC after: 1 week
Deadlock condition:
The return value of TDQ_LOCKPTR(td) is the same for two threads.
1) The first thread signals a wakeup while keeping the rcu_read_lock().
This invokes sched_add() which in turn will try to lock TDQ_LOCK().
2) The second thread is calling synchronize_rcu() calling mi_switch() over
and over again trying to yield(). This prevents the first thread from running
and releasing the RCU reader lock.
Solution:
Release the thread lock while yielding to allow other threads to acquire the
lock pointed to by TDQ_LOCKPTR(td).
Found by: KrishnamRaju ErapaRaju <Krishna2@chelsio.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Currently several paths in the NFS client upgrade the shared vnode
lock to exclusive, which might cause temporal dropping of the lock.
This action appears to be fatal for nullfs mounts over NFS. If the
operation is performed over nullfs vnode, then bypassed down to NFS
VOP, and the lock is dropped, other thread might reclaim the upper
nullfs vnode. Since on reclaim the nullfs vnode lock and NFS vnode
lock are split, the original lock state of the nullfs vnode is not
restored. As result, VFS operations receive not locked vnode after a
VOP call.
Stop upgrading the vnode lock when we check the consistency or flush
buffers as result of detected inconsistency. Instead, allocate a new
lockmgr lock for each NFS node, which is locked exclusive instead of
the vnode lock upgrade. In other words, the other parallel
modification of the vnode are excluded by either vnode lock conflict
or exclusivity of the new lock when the vnode lock is shared.
Also revert r316529 because now the vnode cannot be reclaimed during
ncl_vinvalbuf().
In collaboration with: pho
Reviewed by: rmacklem
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12083
This mode allows other clean buffers to arrive while we flush the buf
lists for the vnode, which is fine for the targeted use. We only need
that all buffers existed at the time of the function start were
flushed. In fact, only one assert has to be relaxed.
In collaboration with: pho
Reviewed by: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D12083
- Use more relevant name 'signo' instead of 'i' for the local variable
which contains a signal number to send for the current exception.
- Eliminate two labels 'userout' and 'out' which point to the very end
of the trap() function. Instead use return directly.
- Re-indent the prot_fault_translation block by reducing if() nesting.
- Some more monor style changes.
Requested and reviewed by: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This information is normally available via acpi_perf, but in case it is not,
add support for fetching the information via MSRs on AMD family 17h (Zen)
processors. Zen uses a slightly different formula than previous generation
AMD CPUs.
This was inspired by, but does not fix, PR 221621.
Reported by: Sean P. R. <seanpr AT swbell.net>
Reviewed by: mjoras@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12082
There was already a per-vty defaults field, but it was useless since it was
only initialized when propagating the global settings and thus no different
from the current global settings and not per-vty. The global defaults field
was also invariant after boot time, but not quite so useless.
Fix this by adding a second selection bit the the control flags of the
relevant ioctl(). vidcontrol doesn't support this yet. Setting either
default propagates the change to the current setting for the same level
and then to all lower levels.
Improve the 3-way escape sequence used by termcap to control the cursor.
The "normal" (ve) case has always used reset, so the user could set
it to anything, but since the reset is to a global value this is not
very useful, especially since the "very visible" (vs) case doesn't
reset but inconsistently forces to a blinking block. Change vs to
first reset and then XOR the blinking bit so that it is predictably
different from ve.
attribute field is curs_attr. The base field holds user data translated
in a reversible way and is needed because current field holds this in
an irreversible way for efficiency.
Factor out some common code for the reversible translation. This is
slightly simpler now, and much easier to expand.
Translate the magic flags value -1 to a single control flag internally
up front so other flags can be trusted later. This can be used for the
relevant ioctl() too.
Remove CONS_CURSOR_FLAGS which contained all the control flags. It was
unused and not useful. After adding more flags, there will be tests on
a couple at a time but never on them all. This API should have used this
to disallow unknown flags.
Make sure that %eflags.D flag is cleared for hook.
Improve comments.
When #UD dtrace code checks for a registered hook before checking that
the exception was raised from kernel mode, we might run with the user
%ds, trapping on access. Exception entry from userspace automatically
load valid %ss, which we can use there instead.
Noted and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
redundant initializations.
Hard-code base = 0, height = (approx. 1/8 of the boot-time font height)
in all cases, and remove the BIOS/MD support for setting these values.
This asks for an underline cursor sized for the boot-time font instead
of various less hard-coded but worse values. I used that think that
the x86 BIOS always gave the same values as the above hard-coding, but
on 1 of my systems it gives the wrong value of base = 1.
The remaining BIOS fields are shift_state and bell_pitch. These are now
consistently not explicitly reinitialized to 0. All sc_get_bios_value()
functions except x86's are now empty, and the only useful thing that x86
returns is shift_state. This really belongs in atkbdc, but heavier
use of the BIOS to read the more useful typematic rate has been removed
there. fb still makes much heavier use of the BIOS.