FreeBSD developers need more time to review patches in the surrounding
areas like the TCP stack which are using MPSAFE callouts to restore
distribution of callouts on multiple CPUs.
Bump the __FreeBSD_version instead of reverting it.
Suggested by: kmacy, adrian, glebius and kib
Differential Revision: https://reviews.freebsd.org/D1438
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now the kernel immediately panics on pf module unload. Although
module unloading isn't an advertised feature of pf, it is very important for
the development process.
I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.
If the user sends an XPT_RESET_DEV CCB, make sure to reset the
Fibre Channel Command Reference Number if we're running on a FC
controller.
We send a SCSI Target Reset when we get this CCB, and as a result
need to reset the CRN to 1 on the next command.
isp_freebsd.c:
In the XPT_RESET_DEV implementation in isp_action(), reset
the CRN if we're on a FC controller.
Submitted by: ken
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 1112787 on 2015/01/15
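The CRN handling described above can be sketched as follows. This is a minimal illustration, not the isp(4) code: the `toy_target` structure and both function names are hypothetical, and the sketch assumes an 8-bit CRN that skips zero when it wraps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-target state; not the isp(4) data structure. */
struct toy_target {
	uint8_t next_crn;	/* CRN to place in the next command */
};

/* Return the CRN for the next command and advance the counter. */
static uint8_t
toy_next_crn(struct toy_target *t)
{
	uint8_t crn;

	assert(t != NULL);
	if (t->next_crn == 0)		/* assumption: 0 is never sent */
		t->next_crn = 1;
	crn = t->next_crn++;		/* wraps to 0 after 255 */
	return (crn);
}

/* Called after a target reset: the next command must carry CRN 1. */
static void
toy_reset_crn(struct toy_target *t)
{
	assert(t != NULL);
	t->next_crn = 1;
}
```

A caller would invoke `toy_reset_crn()` from its reset path and `toy_next_crn()` each time it builds a command.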
Fix SCSI status byte reporting on 4Gb and 8Gb Qlogic boards.
The newer boards don't have the response field that indicates
whether the SCSI status byte is present. You have to just look to
see whether it is non-zero.
The code was looking to see whether the sense length was valid
before propagating the SCSI status byte (and sense information) up
the stack. With a status like Reservation Conflict, there is no
sense information, only the SCSI status byte. So it wasn't getting
correctly returned.
isp.c:
In isp_intr(), if we are on a 2400 or 2500 type board and
get a response, look at the actual contents of the
SCSI status value and set the RQSF_GOT_STATUS flag
accordingly so that we return any SCSI status value we get. The
RQSF_GOT_SENSE flag will get set later on if there is
actual sense information returned.
Submitted by: ken
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 1112791 on 2015/01/15
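The status check described above might look roughly like this. The flag values and the function name are hypothetical stand-ins for the driver's RQSF_* handling; only the two SCSI status codes are standard values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real RQSF_* values live in the driver. */
#define TOY_GOT_STATUS	0x01u
#define TOY_GOT_SENSE	0x02u

/* Standard SCSI status codes. */
#define SCSI_STATUS_CHECK_COND		0x02
#define SCSI_STATUS_RESERV_CONFLICT	0x18

/*
 * Decide which parts of a response are valid.  The status byte is
 * reported whenever it is non-zero, independently of whether any
 * sense data accompanies it.
 */
static unsigned
toy_response_flags(uint8_t scsi_status, size_t sense_len)
{
	unsigned flags = 0;

	if (scsi_status != 0)
		flags |= TOY_GOT_STATUS;
	if (sense_len != 0)
		flags |= TOY_GOT_SENSE;
	return (flags);
}
```

With this shape, a Reservation Conflict with no sense data still yields `TOY_GOT_STATUS`, which is the case the commit fixes.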
the data the inline functions access together at the start of the bus_space
struct. The start-of part isn't so important, it's the grouping-together
that's the point: now all the most-accessed data should be in one cache line.
Suggested by: cognet
systems with more than 4GB of physical memory.
To remotely debug the system 'stealthy' which has a kernel
with this change installed and firewire properly configured:
% fwcontrol -m stealthy (or stealthy's firewire EUI64)
% kgdb kernel /dev/fwmem0.0
sys/dev/firewire/fwohci.c:
Rather than hard code the upper limit for hw based
automatic responses to remote DMA requests at 4GB,
program the hardware using Maxmem, the page number
one higher than the highest physical page detected
in the system.
While here, garbage collect more useless splfw()
calls.
Submitted by: gibbs
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 1110994 on 2015/01/06
asynchronous remote dma request (DMA request that the
hardware cannot automatically handle).
sys/dev/firewire/firewire.c
In fw_rcv(), add missing early return in the error
path for DMA requests to unregistered regions.
Submitted by: gibbs
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 1110993 on 2015/01/06
sys/boot/i386/libfirewire/firewire.c:
sys/dev/firewire/firewire.c:
Fix configuration ROM generation count wrapping logic
so that the generation count is never outside of
allowed limits (0x2 -> 0xF).
sys/dev/firewire/firewire.c:
In fw_xfer_unload(), xfer->fc may be NULL. Protect
against this before taking the fc lock.
Submitted by: gibbs
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 1110685 on 2015/01/05
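A minimal sketch of the generation count wrapping, assuming only the 0x2 to 0xF range stated above. The function name is hypothetical and this is not the firewire(4) code.

```c
#include <assert.h>

/* Advance the config ROM generation, staying within 0x2..0xF. */
static unsigned
toy_next_generation(unsigned gen)
{
	gen++;
	if (gen < 0x2 || gen > 0xF)
		gen = 0x2;	/* wrap back to the lowest allowed value */
	return (gen);
}
```

Any out-of-range input, including an uninitialized zero, is pulled back into the allowed range on the next increment.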
sys/dev/firewire/firewire.c:
In fw_xfer_unload() expand lock coverage so that
the test for FWXF_INQ doesn't race with it being
cleared in another thread.
Submitted by: gibbs
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 1110207 on 2015/01/02
sys/dev/firewire/firewire.c:
In fw_xfer_unload(), clear the FWXF_INQ flag on the
xfer under protection of the FW_GMTX, after the
xfer is removed from the tx/rx queue. Otherwise
it is possible for the xfer to be removed again
(corrupting the list or immediately panicking) from
another thread that has found this xfer in the
transaction label table.
Submitted by: gibbs
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 1110200 on 2015/01/02
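The locking pattern from the last three fw_xfer_unload() changes can be sketched in userland with a pthread mutex. All names here are hypothetical, and the queue manipulation itself is elided; the point is that the test and the clear of the flag share one lock hold.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_INQ	0x1u	/* hypothetical "still on a queue" flag */

struct toy_xfer {
	unsigned flag;
};

static pthread_mutex_t toy_gmtx = PTHREAD_MUTEX_INITIALIZER;

/*
 * Dequeue the xfer if it is still queued.  The flag is tested and
 * cleared under the same lock hold, so only one caller ever sees
 * TOY_INQ set.  Returns true if this caller performed the removal.
 */
static bool
toy_xfer_unload(struct toy_xfer *x)
{
	bool removed = false;

	if (x == NULL)		/* nothing to lock or unload */
		return (false);
	pthread_mutex_lock(&toy_gmtx);
	if (x->flag & TOY_INQ) {
		/* ... remove x from its queue here ... */
		x->flag &= ~TOY_INQ;
		removed = true;
	}
	pthread_mutex_unlock(&toy_gmtx);
	return (removed);
}
```

A second call on the same xfer returns false instead of removing it from the list again.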
ZFS already commits outstanding data every zfs_txg_timeout seconds, so these
syncs are unnecessarily intrusive.
Submitted by: gibbs
Sponsored by: Spectra Logic
MFSpectraBSD: 1105759 on 2014/12/11
Since allow_mounted is a FreeBSD-specific change, default to B_TRUE, then
locally check for the magic bit. Unconditionally check allow_mounted below.
Convert the setting of allow_mounted to an explicit boolean.
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 672578 (in part) on 2013/07/19
mostly a no-op since all currently-supported instances of these CPUs give
the number of SLB slots in the device tree, but keep it here as well just
in case.
instructions to call through pointers instead. In general, these are set
implicitly through relocation processing. One has to be set explicitly in
machdep.c, however, to fit one handler in the tiny (8 instruction) space
available.
Reviewed by: andreast
Differential revision: D1554
Tested on: UP and SMP G5, Cell, POWER5+
We obtain a stable copy and store it in local 'fde' variable. Storing another
copy (based on aforementioned variable) does not serve any purpose.
No functional changes.
The only potential in-tree consumer (_fdrop) special-cased it and returns 0
on its own instead of calling badfo_close.
Remove the special case since it is not needed and very unlikely to encounter
anyway.
No objections from: kib
This is primarily for developer/debugging use; it enables built-in tagged
tracking of refcounts inside ZFS. It can only be enabled from the loader,
since it modifies how in-core state is managed. Default remains disabled.
MFC after: 1 week
Sponsored by: Spectra Logic
as the cpu id on arm64 as it may use two cells. In its place we can use
the device id.
It is expected we will use the reg data on arm64 to enable cores so we
still need to read and store it even if it is not yet used.
Differential Revision: https://reviews.freebsd.org/D1555
Reviewed by: nwhitehorn
Sponsored by: The FreeBSD Foundation
commit 4d93914ae3db4a897ead4b. Some related drm infrastructure
changes are imported as needed.
Biggest update is the rewrite of the i915 gem io to more closely
follow the Linux model, although the mechanism used by the FreeBSD port is
different.
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
sequences, like those used to read the HIDs. This is both easier to read
and avoids a miscompilation by GCC in certain circumstances. Also avoid
double restoration of HID4 and HID5.
MFC after: 2 weeks
Fill in some formerly NULL members where the implementation function
exists. Add a dummy function that panics and use it as a placeholder
for things that are still unimplemented. Remove a few unused includes.
every operation to retrieve the bs_cookie value almost nothing actually uses.
The bus_space struct contains a private data pointer (poorly named bs_cookie,
now renamed to bs_privdata) which is used only by a few old armv4 xscale
implementations. The bus_space functions were all defined to take this
value as the first parameter instead of the bus_space_tag_t, requiring all
the inline macro and function expansions to dereference the tag to pass it
to another function, which never uses it. Now all the functions take the tag
as the first parameter and retrieve the privdata if they need it.
Also fix a couple bus_space_unmap() implementations that were calling
kva_free() instead of pmap_unmapdev().
Discussed with: cognet
Remove the unnecessary #ifdef _KERNEL, which did not differ in the true or
false cases. Actually set the value of to_free before using it.
MFC after: 1 week
Sponsored by: Spectra Logic
Define it as an atomic uint32_t. These increments happen infrequently
enough that the atomic overhead is not a problem, and since they're now
independent atomics, they won't contend with xpt_lock_buses().
This counter is useful as a means of cheaply identifying whether any changes
have been made to the CAM peripheral list. Userland programs have no guarantee
that the counter won't change on them while being returned or while processing
the information, so they must be written accordingly.
Discussed with: ken, mav (in general)
MFC after: 1 week
Sponsored by: Spectra Logic
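The generation-counter idea can be sketched with C11 atomics. The variable and function names are hypothetical, and this is a userland illustration rather than the CAM code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the peripheral-list generation counter. */
static _Atomic uint32_t toy_generation;

/* Writers bump the counter whenever the list changes. */
static void
toy_list_changed(void)
{
	atomic_fetch_add(&toy_generation, 1);
}

/* Readers take a snapshot without holding any lock. */
static uint32_t
toy_generation_read(void)
{
	return (atomic_load(&toy_generation));
}

/* A consumer compares a saved snapshot to detect changes cheaply. */
static bool
toy_list_changed_since(uint32_t snapshot)
{
	return (toy_generation_read() != snapshot);
}
```

As the commit message notes, a consumer has to tolerate the counter moving at any time, so a snapshot only says whether something changed, not what.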
This makes Mac OS X happy when it returns back from suspending.
o Switch notify state after data is transferred, but not before.
o Consider there is also Super Speed mode.
o Do not set stall bit on any pipes in device mode as Mac OS X does not
seem to support it.
In collaboration with: hselasky@
dsl_dir_transfer_space() is mostly called after dsl_dir_diduse_space(),
which already calls dmu_buf_will_dirty() for the same dbuf and tx, so
its duplicate call in those cases will change nothing, only spend time.
Skipping this call reduces by a factor of four the time spent in
dbuf_write_done() and its descendants, which update dataset statistics
with several congested lock
acquisitions. When rewriting 8K zvol blocks at 1GB/s rate, this reduces
CPU time spent inside dbuf_write_done(), according to profiling, from 45%
of 683K samples to 18% of 422K.
MFC after: 2 weeks
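As a quick check of the figures quoted above, the percentages can be converted to sample counts and compared. The function names are hypothetical; the numbers are the ones from the commit message.

```c
#include <assert.h>

/* Samples attributed to a function: a percentage of the total count. */
static double
toy_samples(double percent, double total_samples)
{
	return (percent / 100.0 * total_samples);
}

/* Ratio of samples before the change to samples after it. */
static double
toy_speedup(void)
{
	double before = toy_samples(45.0, 683e3);	/* 45% of 683K */
	double after = toy_samples(18.0, 422e3);	/* 18% of 422K */

	return (before / after);
}
```

The ratio comes out slightly above four, which matches the "four times" claim.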
The data in MODINFOMD_MODULEP is packed by the loader as a 4 byte type, but
the amd64 kernel expects a vm_paddr_t, which is of size 8 bytes. Fix this by
saving it as 8 bytes in the loader and retrieving it using the proper type
in the kernel.
Sponsored by: Citrix Systems R&D
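The width mismatch can be illustrated with a small userland sketch. The helper names are hypothetical and the buffer stands in for the metadata area; this is not the loader or kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store an address in a metadata buffer using `width` bytes (4 or 8). */
static void
toy_pack_addr(unsigned char *buf, uint64_t addr, size_t width)
{
	uint32_t a32 = (uint32_t)addr;

	assert(width == 4 || width == 8);
	if (width == 4)
		memcpy(buf, &a32, sizeof(a32));	/* only 4 bytes written */
	else
		memcpy(buf, &addr, sizeof(addr));
}

/* The consumer always reads a full 8-byte value. */
static uint64_t
toy_unpack_addr(const unsigned char *buf)
{
	uint64_t addr;

	memcpy(&addr, buf, sizeof(addr));
	return (addr);
}
```

When the producer writes 4 bytes and the consumer reads 8, half of the result comes from whatever bytes happened to follow in the buffer; writing 8 bytes makes the round trip exact.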
driver on Rockchip boards. It currently supports PIO mode;
DMA mode requires an external DMA controller to be used.
Submitted by: jmcneill
Approved by: stas (mentor)
in ofw_mem_regions(). This function is actually MI and should move to
dev/ofw at some point in the near future so that ARM and MIPS can use the
same code.
Prior to this change CLOCK_MONOTONIC could go backwards when the timecounter
hardware was changed via 'sysctl kern.timecounter.hardware'. This happened
because the vdso timehands update was missing the special treatment in
tc_windup() when changing timecounters.
Reviewed by: kib
"MODULE_VERSION" macro definition. Remove the redefinition of the
"MODULE_VERSION" macro from the Linux kernel compatibility API.
MFC after: 1 month
Reported by: np@
Sponsored by: Mellanox Technologies
in r277199. Acquire the necessary reference in delist_dev_locked()
and inform destroy_devl() about it using CDP_UNREF_DTR flag.
Fix some style nits, add asserts.
Discussed with: hselasky
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
These instructions are emitted by 'bus_space_read_region()' when accessing
MMIO regions.
Since MOVS can be used with a repeat prefix, start decoding the REPZ and
REPNZ prefixes. Also start decoding the segment override prefix since MOVS
allows overriding the source operand segment register.
Tested by: tychon
MFC after: 1 week
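Prefix decoding of this kind can be sketched as below. The structure and function names are hypothetical and the model is deliberately simplified: it recognizes only the repeat and segment override prefix bytes and ignores the operand-size, address-size, LOCK and REX prefixes that a real decoder also handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_prefixes {
	bool	repz;		/* 0xF3: REP/REPE/REPZ */
	bool	repnz;		/* 0xF2: REPNE/REPNZ */
	bool	segment_override;
	uint8_t	segment_prefix;	/* the override byte, if any */
};

/* Return true if `b` is one of the six segment override prefixes. */
static bool
toy_is_segment_prefix(uint8_t b)
{
	switch (b) {
	case 0x26:	/* ES */
	case 0x2E:	/* CS */
	case 0x36:	/* SS */
	case 0x3E:	/* DS */
	case 0x64:	/* FS */
	case 0x65:	/* GS */
		return (true);
	default:
		return (false);
	}
}

/*
 * Consume leading prefix bytes and return the offset of the first
 * non-prefix byte (the opcode in this simplified model).
 */
static size_t
toy_decode_prefixes(const uint8_t *inst, size_t len, struct toy_prefixes *p)
{
	size_t i;

	assert(inst != NULL && p != NULL);
	*p = (struct toy_prefixes){ 0 };
	for (i = 0; i < len; i++) {
		if (inst[i] == 0xF3)
			p->repz = true;
		else if (inst[i] == 0xF2)
			p->repnz = true;
		else if (toy_is_segment_prefix(inst[i])) {
			p->segment_override = true;
			p->segment_prefix = inst[i];
		} else
			break;
	}
	return (i);
}
```

For the byte sequence `F3 26 A5` (a repeated MOVS with an ES override), the function reports both prefixes and returns offset 2, where the MOVS opcode starts.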
PVO pool size. The default errs on the exceedingly large side, so absent
any intelligent automatic tuning, at least let the user set it to save
RAM on memory-constrained systems.
MFC after: 2 weeks
A number of entries that can be present in the spa config shouldn't be saved
to disk, so add a method to ensure this is the case. Without this, if the last
caller to vdev_config_generate requested stats, then they can end up in the
cache file.
Also only skip a non-writable pool in the cache file generation if it's
active. This prevents unavailable pools from incorrectly getting removed from
the cache file.
Tested by: delphij
MFC after: 2 weeks
Sponsored by: Multiplay
This doesn't actually change any behavior, because it just allows a 16-bit
read of the command register to return the correct value, and nothing
actually does a 16-bit read of that register.
After the ext2 variant of the "orlov allocator" was implemented,
the case for a negative or zero dirsize disappeared.
Drop the dead code and make dirsize unsigned given that it can't be
negative anyway.
CID: 1008669
MFC after: 1 week
bits.
The motivation here is to eventually teach netisr and potentially
other networking subsystems a bit more about how RSS work queues / buckets
are configured so things have a hope of auto-configuring in the future.
* net/rss_config.[ch] takes care of the generic bits for doing
configuration, hash function selection, etc;
* topelitz.[ch] is now in net/ rather than netinet/;
* (and would be in libkern if it didn't directly include RSS_KEYSIZE;
that's a later thing to fix up.)
* netinet/in_rss.[ch] now just contains the IPv4 specific methods;
* and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.
This should have no functional impact on anyone currently using
the RSS support.
Differential Revision: D1383
Reviewed by: gnn, jfv (intel driver bits)
attachment to the process. Note that the command is not intended to
be a security measure, rather it is an obfuscation feature,
implemented for parity with other operating systems.
Discussed with: jilles, rwatson
Man page fixes by: rwatson
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
There needs to be some more testing done before it is ready for all
platforms and architectures.
MFC after: 1 month
Sponsored by: Mellanox Technologies
Reported by: bz@
accidentally enable non-existent states.
This bug was triggered if ACPI advertises the presence of a C2 state
which we fail to parse via acpi_PkgGas due to our lack of support for
FFixedHW resources, and causes an immediate panic when an attempt is
made to enter the (NULL) state.
One affected platform is the EC2 c4.8xlarge VM instance type; there
may be others.
MFC after: 1 week
Thanks to: jkim, @_msw_
amd64. Until further we need some custom C-flags when building the
Linux compat API.
MFC after: 1 month
Sponsored by: Mellanox Technologies
Reported by: bz@
Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.
Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.
bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())
Differential Revision: https://reviews.freebsd.org/D1526
Reviewed by: grehan
MFC after: 2 weeks
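The 'nextrip' bookkeeping can be modelled in a few lines. Everything here is a hypothetical simplification: the real vmm(4) and bhyve(8) functions have different names and signatures, and the structures below carry only the fields needed for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified vcpu state. */
struct toy_vcpu {
	uint64_t nextrip;	/* where the vcpu resumes on the next run */
};

/* Read-only description of a VM exit, as seen by an exit handler. */
struct toy_vmexit {
	uint64_t rip;		/* address of the exiting instruction */
	int	 inst_length;	/* its length in bytes */
};

/* On exit, the default is to resume after the exiting instruction. */
static void
toy_record_exit(struct toy_vcpu *v, const struct toy_vmexit *e)
{
	v->nextrip = e->rip + (uint64_t)e->inst_length;
}

/* Restart: resume at the exiting instruction instead of after it. */
static void
toy_restart_instruction(struct toy_vcpu *v, const struct toy_vmexit *e)
{
	v->nextrip = e->rip;
}

/* Resume at an arbitrary address (models writing the guest RIP). */
static void
toy_set_guest_rip(struct toy_vcpu *v, uint64_t rip)
{
	v->nextrip = rip;
}
```

In this model an exit handler never writes to the exit record; it only asks for a restart or sets a new resume address.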
sdhci controllers, such as the one on a Raspberry Pi, mishandle the signal
timing in high speed signaling mode, but run just fine in standard mode
with the bus running at frequencies between 25-50MHz (which shouldn't work).
This is the solution adopted by U-Boot and other OSes (linux and *BSD)
for the timeouts on Raspberry Pi boards with certain SD cards. Some
research shows that this quirk is also used on a few other boards, so the
fix is a generic quirk instead of being in the RPi-specific driver code.
This change is based on information discovered by Michal Meloun.
the stack for secondary cores, the other two values are only used for zeroing
bss on the primary core. No need to store the size of the stack at the
top of the stack (seems to be a leftover instruction from some cut-n-paste).
AR9462/AR9565.
This and some upcoming changes to the HAL for these chips should
address some of the signal sensitivity reported by users.
Tested:
* AR9462 (WB222), STA mode
Obtained from: Linux ath9k
by dumbbell@ to be able to compile this layer as a dependency module.
Clean up some Makefiles and remove the no longer used OFED define.
Currently only i386 and amd64 targets are supported.
MFC after: 1 month
Sponsored by: Mellanox Technologies
Since the upstream for cddl code is now illumos not sun, mechanically
convert all sun #ifdef's to illumos #ifdef's which have been used in all
newer code for some time.
Also do a manual pass to correct the use of #ifdef comments as per style(9)
as well as a few uses of #if defined(__FreeBSD__) vs #ifndef illumos.
MFC after: 1 month
Sponsored by: Multiplay
Required when communicating to Mac OS X USB host stack.
o Also don't set stall bit to TX pipe in device mode as it seems Mac OS X
doesn't clear it as it should.
Discussed with: hselasky@
Use the proper types in parse_modmetadata for the p_start and p_end
parameters. This was causing problems in the ARM 32bit loader.
Sponsored by: Citrix Systems R&D
Reported and Tested by: ian
in prep for the next NF calibration pass.
Totally missing braces. Damn you C.
Submitted by: Sascha Wildner <swildner@dragonflybsd.org>
MFC after: 1 week
AR9565 (Aphrodite.) These need to use the MCI routines, not
the legacy 2-wire / 3-wire bluetooth coexistence methods.
Tested:
* AR9462 (WB222); STA mode
binary FPGA image that's in an include file in this directory, that
include file isn't actually used. It is only for certain Trump Cards
that we don't yet support. When support was anticipated for them, we
got permission to include the required FPGA image in our sources under
the BSDL, but didn't start actually including the file. This was done
to provide a public paper trail for this file.
MCI bluetooth coexistence method for WB222.
The rest of MCI requires a bunch more work, including adding a DMA buffer
for the MCI hardware to bounce messages in/out of and handling MCI
interrupts. But the more important part here is telling the HAL
the btcoex is enabled and MCI is in use so it configures the correct
initial bluetooth parameters in the wireless NIC and configures
things like bluetooth traffic weights and such.
So, this at least gets the HAL to do some of the right things in
configuring the initial bluetooth coexistence stuff, but doesn't
actually do full btcoex. That'll take.. some effort.
Tested:
* AR9462 (WB222), STA mode
tree's /chosen node to provide out-of-band header fields of the FDT. This
emulation is not perfect without corresponding changes to ofw_fdt_nextprop(),
but is enough to enable lookup by memory-map-parsing code.
MFC after: 1 week
handler. For roughly twenty years, the page fault handler has used the
same basic strategy: Fetch a fixed number of non-resident pages both ahead
and behind the virtual page that was faulted on. Over the years,
alternative strategies have been implemented for optimizing the handling
of random and sequential access patterns, but the only change to the
default strategy has been to increase the number of pages read ahead to 7
and behind to 8.
The problem with the default page clustering strategy becomes apparent
when you look at how it behaves on the code section of an executable or
shared library. (To simplify the following explanation, I'm going to
ignore the read that is performed to obtain the header and assume that no
pages are resident at the start of execution.) Suppose that we have a
code section consisting of 32 pages. Further, suppose that we access
pages 4, 28, and 16 in that order. Under the default page clustering
strategy, we page fault three times and perform three I/O operations,
because the first and second page faults only read a truncated cluster of
12 pages. In contrast, if we access pages 8, 24, and 16 in that order, we
only fault twice and perform two I/O operations, because the first and
second page faults read a full cluster of 16 pages. In general, truncated
clusters are more common than full clusters.
To address this problem, this revision changes the default page clustering
strategy to align the start of the cluster to a page offset within the vm
object that is a multiple of the cluster size. This results in many fewer
truncated clusters. Returning to our example, if we now access pages 4,
28, and 16 in that order, the cluster that is read to satisfy the page
fault on page 28 will now include page 16. So, the access to page 16 will
no longer page fault and perform an I/O operation.
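The 32-page example above can be reproduced with a small simulation. This is a sketch under simplifying assumptions, not the vm_fault code: the names are hypothetical, the object is exactly 32 pages, and each fault reads the whole cluster regardless of which pages are already resident.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NPAGES	32	/* pages in the example code section */
#define TOY_BEHIND	8	/* default read-behind */
#define TOY_AHEAD	7	/* default read-ahead */
#define TOY_CLUSTER	(TOY_BEHIND + 1 + TOY_AHEAD)	/* 16 pages */

/*
 * Count hard faults for an access sequence.  With `aligned` false the
 * cluster spans TOY_BEHIND pages before and TOY_AHEAD pages after the
 * faulting page, truncated at the object bounds.  With `aligned` true
 * the cluster starts at a multiple of TOY_CLUSTER.
 */
static int
toy_count_faults(const int *pages, size_t n, bool aligned)
{
	bool resident[TOY_NPAGES];
	int faults = 0;

	memset(resident, 0, sizeof(resident));
	for (size_t i = 0; i < n; i++) {
		int p = pages[i], first, last;

		assert(p >= 0 && p < TOY_NPAGES);
		if (resident[p])
			continue;	/* no fault, no I/O */
		faults++;
		if (aligned) {
			first = p - p % TOY_CLUSTER;
			last = first + TOY_CLUSTER - 1;
		} else {
			first = p - TOY_BEHIND;
			last = p + TOY_AHEAD;
		}
		if (first < 0)
			first = 0;
		if (last >= TOY_NPAGES)
			last = TOY_NPAGES - 1;
		for (int q = first; q <= last; q++)
			resident[q] = true;
	}
	return (faults);
}
```

Running the sequence 4, 28, 16 gives three faults with the old strategy and two with the aligned one, while 8, 24, 16 gives two faults even with the old strategy, as described in the text.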
Since the revised default page clustering strategy is typically reading
more pages at a time, we are likely to read a few more pages that are
never accessed. However, for the various programs that we looked at,
including clang, emacs, firefox, and openjdk, the reduction in the number
of page faults and I/O operations far outweighed the increase in the
number of pages that are never accessed. Moreover, the extra resident
pages allowed for many more superpage mappings. For example, if we look
at the execution of clang during a buildworld, the number of (hard) page
faults on the code section drops by 26%, the number of superpage mappings
increases by about 29,000, but the number of never accessed pages only
increases from 30.38% to 33.66%. Finally, this leads to a small but
measurable reduction in execution time.
In collaboration with: Emily Pettigrew <ejp1@rice.edu>
Differential Revision: https://reviews.freebsd.org/D1500
Reviewed by: jhb, kib
MFC after: 6 weeks
If we aggregated status sending with data move and got an error, allow status
to be updated and resent separately. Without this, a command may get stuck
without any status being sent at all.
MFC after: 2 weeks
Quoting the 19-year-old bpf.4 manual from bpf-1.2a1:
"
(SIOCGIFADDR is obsolete under BSD systems. SIOCGIFCONF should be
used to query link-level addresses.)
"
* SIOCGIFADDR was not imported in NetBSD (bpf.c 1.36) and OpenBSD.
* Last bits (e.g. manpage claiming SIOCGIFADDR exists) were cleaned
from NetBSD via kern/21513 5 years ago,
from OpenBSD via documentation/6352 5 years ago.
== SIG_DFL or SIG_IGN. Sloppy code does not fully initialize struct
sigaction for such cases, and being too demanding in the case of
default handler does not catch anything.
Reported and tested by: Alex Tutubalin <lexa@lexa.ru>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
resume sometimes (but not others). On powerup, other weird issues show
up (sometimes the card comes up, but with really bogus pci config
space stuff). There may be more, but given my experience of historical
fussiness, stick to what works and make more minimal changes to that.
Implement a subset of the multiboot specification in order to boot Xen
and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot
implementation is tailored to boot Xen and FreeBSD Dom0, and it will
most surely fail to boot any other multiboot compliant kernel.
In order to detect and boot the Xen microkernel, two new file formats
are added to the bootloader, multiboot and multiboot_obj. Multiboot
support must be tested before regular ELF support, since Xen is a
multiboot kernel that also uses ELF. After a multiboot kernel is
detected, all the other loaded kernels/modules are parsed by the
multiboot_obj format.
The layout of the loaded objects in memory is the following; first the
Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to
long mode by itself), after that the FreeBSD kernel is loaded as a RAW
file (Xen will parse and load it using its internal ELF loader), and
finally the metadata and the modules are loaded using the native
FreeBSD way. After everything is loaded we jump into Xen's entry point
using a small trampoline. The order of the multiboot modules passed to
Xen is the following, the first module is the RAW FreeBSD kernel, and
the second module is the metadata and the FreeBSD modules.
Since Xen will relocate the memory position of the second
multiboot module (the one that contains the metadata and native
FreeBSD modules), we need to stash the original modulep address inside
of the metadata itself in order to recalculate its position once
booted. This also means the metadata must come before the loaded
modules, so after loading the FreeBSD kernel a portion of memory is
reserved in order to place the metadata before booting.
In order to tell the loader to boot Xen and then the FreeBSD kernel the
following has to be added to the /boot/loader.conf file:
xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga"
xen_kernel="/boot/xen"
The first argument contains the command line that will be passed to the Xen
kernel, while the second argument is the path to the Xen kernel itself. This
can also be done manually from the loader command line, by for example
typing the following set of commands:
OK unload
OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga
OK load kernel
OK load zfs
OK load if_tap
OK load ...
OK boot
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D517
For the Forth bits:
Submitted by: Julien Grall <julien.grall AT citrix.com>
- Close a migration race where callout_reset() failed to set the
CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by
spinlocks.
- Switching the callout CPU number cannot always be done on a
per-callout basis. See the updated timeout(9) manual page for more
information.
- The timeout(9) manual page has been updated to reflect how all the
functions inside the callout API are working. The manual page has
been made function oriented to make it easier to deduce how each of
the functions making up the callout API are working without having
to first read the whole manual page. Group all functions into a
handful of sections which should give a quick top-level overview
when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality has been removed
to reduce the complexity in the callout code and to avoid problems
about atomically stopping callouts via callout_stop(). If someone
needs it, it can be re-added. From my quick grep there are no
CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been
added. See the updated timeout(9) manual page for a complete
description.
- Update the callout clients in the "kern/" folder to use the callout
API properly, like cv_timedwait(). Previously there was some custom
sleepqueue code in the callout subsystem, which has been removed,
because we now allow callouts to be protected by spinlocks. This
allows us to tear down the callout like done with regular mutexes,
and a "td_slpmutex" has been added to "struct thread" to atomically
teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and
"SWT_SLEEPQTIMO" states can now be completely removed. Currently
they are marked as available and will be cleaned up in a follow up
commit.
- Bump the __FreeBSD_version to indicate kernel modules need
recompilation.
- There have been several reports that this patch "seems to squash a
serious bug leading to a callout timeout and panic".
Kernel build testing: all architectures were built
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D1438
Sponsored by: Mellanox Technologies
Reviewed by: jhb, adrian, sbruno and emaste
While in theory this should have been a transparent change (and was for all
other drivers), cpsw(4) never used the proper accessor macros in a few
places but spelt the indirect m_hdr.mh_* out itself. Convert those to
use m_len and m_data and unbreak the driver build.
associating an optional PNP hint table with this module. In the
future, when these are added, these changes will silently ignore the
new type they would otherwise warn about. It will always be safe to
ignore this data. Get this into the builds today for some future
proofing.
MFC After: 3 days
only compile in those options in GENERIC that cannot be loaded as
modules. ufs is still included because many of its options aren't
present in the kernel module. There's some other exceptions documented
in the file. This is part of some work to get more things
automatically loading in the hopes of obsoleting GENERIC one day.
more generally make it easier to extend 'struct mbuf' in the future, make
a number of changes to the data structure:
- As we anticipate embedding mbuf headers within variable-size regions of
memory in the future, change the definitions of byte arrays embedded in
mbufs to be of size [0] rather than [MLEN] and [MHLEN]. In fact, the
cxgbe driver already uses 'struct mbuf' on the front of other storage
sizes, but we would like the global mbuf allocator to be able to do this
as well.
- Fold 'struct m_hdr' into 'struct mbuf' itself, eliminating a set of
macros that aliased 'mh_foo' field names to 'm_foo' names such as
'm_next'. These present a particular problem as we would like to add
new mbuf-header fields -- e.g., 'm_size' -- that, if similarly named via
macros, would introduce collisions with many other variable names in the
kernel.
- Rename 'struct m_ext' to 'struct struct_m_ext' so that we can add
compile-time assertions without bumping into the still-extant 'm_ext'
macro.
- Remove the MSIZE compile-time assertion for 'struct mbuf', but add new
assertions for alignment of embedded data arrays (64-bit alignment even
on 32-bit platforms), and for the sizes of the mbuf header, packet header,
and m_ext structure.
- Document that these assertions exist in comments in mbuf.h.
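The idea of a header placed in front of variable-size storage, guarded by compile-time layout checks, can be sketched as follows. The structure is a hypothetical miniature and not the real 'struct mbuf' layout, and it uses a C99 flexible array member where the commit uses arrays of size [0].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature header; not the real 'struct mbuf' layout. */
struct toy_mbuf {
	struct toy_mbuf	*m_next;
	int32_t		 m_len;
	uint32_t	 m_flags;
	/* Flexible array member: storage size is chosen at allocation. */
	_Alignas(8) char m_dat[];
};

/* Compile-time check in the spirit of the new assertions. */
static_assert(offsetof(struct toy_mbuf, m_dat) % 8 == 0,
    "embedded data must be 64-bit aligned");

/* Allocate a zeroed header followed by `size` bytes of storage. */
static struct toy_mbuf *
toy_mbuf_alloc(size_t size)
{
	struct toy_mbuf *m;

	m = calloc(1, sizeof(*m) + size);
	return (m);
}
```

Because the array has no fixed length, the same header type can sit in front of storage of any size, and the static assertion catches a layout change that would break the alignment of the embedded data.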
This change is not intended to cause (non-trivial) behavioural
differences, but is a precursor to further mbuf-allocator work.
Differential Revision: https://reviews.freebsd.org/D1483
Reviewed by: bz, gnn, np, glebius ("go ahead, I trust you")
Sponsored by: EMC / Isilon Storage Division
"delist_dev()" function. Make sure the character device structure
doesn't go away until the end of the "destroy_dev()" function due to
concurrently running cleanup code inside "devfs_populate()".
MFC after: 1 week
Reported by: dchagin@
This makes sure that file descriptors of opened directories will
actually get these capabilities. Without this change, bindat() and
connectat() don't seem to work for me.
MFC after: 2 weeks
Reviewed by: rwatson, pjd
go back through HASWELL, IVY_BRIDGE, IVY_BRIDGE_XEON and SANDY_BRIDGE
to straighten out all the missing PMCs. We also add a new pmc tool,
pmcstudy, which allows one to run the various formulas from
the documents "Using Intel Vtune Amplifier XE on XXX Generation platforms" for
IB/SB and Haswell. The tool also allows one to postulate one's own
formulas with any of the various PMCs. At some point I will enhance
this to work with Brendan Gregg's flame-graphs so we can flamegraph
various PMC interactions. Note the manual page also needs some
work (lots of work) but gnn has committed to help me with that ;-)
Reviewed by: gnn
MFC after: 1 month
Sponsored by: Netflix Inc.
The previous throttling implementation approached the problem from the wrong
side. It significantly limited useful delaying of TRIM requests and aggregation
potential, while doing little to control TRIM burstiness under heavy load.
With this change random 4K write benchmarks (probably the worst case for
TRIM) show me IOPS increase by 20%, average latency reduction by 30%, peak
TRIM bursts reduction by 3 times and same peak TRIM map size (memory usage).
Also the new logic does not force map size down so heavily, really allowing
to keep deleted data for 32 TXG or 30 seconds under moderate load. It was
practically impossible with old throttling logic, which pushed map down to
only 64 segments.
Reviewed by: smh
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
can suspend / resume and unload / load cbb and cardbus without errors
on my Lenovo T400, which wasn't possible before. Cards suspending
and resuming in the CardBus slot not yet tested.
o Enable memory cycles to the bridge early (as part of the new
cbb_pci_bridge_init). This fixes the Bad VCC errors which were
caused by the code accessing the device registers with this
cleared. The suspend / resume process clears it.
o Refactor suspend / resume into bus specific code (though the ISA
code is just stubbed). This isn't strictly necessary, but makes
the initialization code more uniform and should be more bulletproof
in the face of variant behavior among cardbus bridges.
o Fixup comments in the power-up sequence to reflect reality. These
comments were written for one regime of power-up, but not updated
as things were revised.
o Add a paranoid small delay (100ms) to cover noisy cards powering
down.
o Fix some debugging prints to be easier to grep from dmesg.
Sponsored by: Netflix
would be picked up for kernel builds, it isn't picked up for
old-fashioned builds. Without this option, PCI bus numbers are busted
for modules built iteratively.
VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping
track of pending exceptions for a vcpu. This in turn causes confusion because
some fields in 'struct vm_exception' like 'vcpuid' make sense only in the
ioctl context. It also makes it harder to add or remove structure fields.
Fix this by using 'struct vm_exception' only to communicate information
from userspace to vmm.ko when injecting an exception.
Also, add a field 'restart_instruction' to 'struct vm_exception'. This
field is set to '1' for exceptions where the faulting instruction is
restarted after the exception is handled.
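The split described above might look roughly like the following sketch. All
names here (demo_vm_exception, demo_vcpu, demo_inject_exception, nextrip) are
hypothetical, and the field list is an assumption rather than a copy of the
vmm headers. The ioctl argument is consumed once at injection time and the
pending state lives in a separate per-vcpu structure. Modelling "restart" as
resetting the next instruction pointer is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ioctl argument: only carries userspace -> module data. */
struct demo_vm_exception {
	int		vcpuid;
	int		vector;
	uint32_t	error_code;
	int		error_code_valid;
	int		restart_instruction;
};

/* Hypothetical per-vcpu state kept privately by the module. */
struct demo_vcpu {
	int		exc_pending;
	int		exc_vector;
	uint32_t	exc_errcode;
	int		exc_errcode_valid;
	uint64_t	rip;		/* address of the current instruction */
	uint64_t	nextrip;	/* where the guest resumes */
};

/* Copy the request into private state; returns 0 on success, -1 on error. */
static int
demo_inject_exception(struct demo_vcpu *vcpus, int nvcpus,
    const struct demo_vm_exception *exc)
{
	struct demo_vcpu *vcpu;

	if (exc->vcpuid < 0 || exc->vcpuid >= nvcpus)
		return (-1);
	if (exc->vector < 0 || exc->vector >= 32)
		return (-1);
	vcpu = &vcpus[exc->vcpuid];
	if (vcpu->exc_pending)
		return (-1);	/* this sketch allows one pending exception */

	vcpu->exc_vector = exc->vector;
	vcpu->exc_errcode_valid = exc->error_code_valid;
	vcpu->exc_errcode = exc->error_code_valid ? exc->error_code : 0;
	vcpu->exc_pending = 1;
	/* Assumed semantics: re-execute the faulting instruction afterwards. */
	if (exc->restart_instruction)
		vcpu->nextrip = vcpu->rip;
	return (0);
}
```

Note that 'vcpuid' is used only to select the target and is not stored in
the per-vcpu state, which is the point of keeping the two structures apart.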
MFC after: 1 week
simultaneously detaching kernel drivers on the same USB device we can
get stuck in the "usb_wait_pending_ref_locked()" function because the
conditions needed for allowing detach are not met. The "destroy_dev()"
function waits for all system calls involving the given character
device to return. Character device system calls may lock the USB
enumeration lock, which is also held when "destroy_dev()" is
called. This can sometimes lead to a deadlock not noticed by
WITNESS. The current solution is to ensure the calling thread is the
only one holding the USB enumeration lock and prevent other threads
from getting refs while a USB device detach is ongoing. This turned
out not to be sufficient. To solve this deadlock we could use
"destroy_dev_sched()" to schedule the device destruction in the
background, but then we don't know when it is safe to free() the
private data of the character device. Instead a callback function is
executed by the USB explore process to kill off any leftover USB
character devices synchronously after the USB device explore code is
finished and the USB enumeration lock is no longer locked. This makes
porting easier and also ensures that character devices eventually
go away after a USB device detach.
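The ordering described above can be modelled in a few lines. This is only a
single-threaded sketch with hypothetical names (demo_bus, demo_cdev,
demo_detach, demo_cleanup, demo_explore), and a plain flag stands in for the
USB enumeration lock. Detach queues the character device while the lock is
held, and the cleanup callback destroys the queued devices synchronously once
the lock has been dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical character device and bus state for illustration only. */
struct demo_cdev {
	int			 destroyed;
	struct demo_cdev	*next;
};

struct demo_bus {
	int			 enum_locked;	/* stands in for the lock */
	struct demo_cdev	*pending;	/* devices awaiting destruction */
};

/* Called with the enumeration lock held: queue the device, do not destroy. */
static void
demo_detach(struct demo_bus *bus, struct demo_cdev *dev)
{
	assert(bus->enum_locked);
	dev->next = bus->pending;
	bus->pending = dev;
}

/* Callback run after explore: destroy leftovers with the lock released. */
static int
demo_cleanup(struct demo_bus *bus)
{
	struct demo_cdev *dev;
	int n = 0;

	assert(!bus->enum_locked);
	while ((dev = bus->pending) != NULL) {
		bus->pending = dev->next;
		dev->next = NULL;
		dev->destroyed = 1;	/* stand-in for the blocking destroy */
		n++;
	}
	return (n);
}

/* One explore pass: detach under the lock, then clean up after unlocking. */
static int
demo_explore(struct demo_bus *bus, struct demo_cdev **gone, int ngone)
{
	int i;

	bus->enum_locked = 1;
	for (i = 0; i < ngone; i++)
		demo_detach(bus, gone[i]);
	bus->enum_locked = 0;
	return (demo_cleanup(bus));
}
```

Because the destroy step runs synchronously, the caller knows the private
data can be freed once demo_explore() returns, unlike with a scheduled
background destruction.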
While at it ensure that "flag_iserror" is only written when "priv_mtx"
is locked, which is protecting it.
MFC after: 5 days
function. Many existing clients don't understand POLLNVAL and instead
rely on an error code from the read(), write() or ioctl() system
call. Also make sure we wake up any client pollers before the cuse
server is closing, so they don't wait forever for an event.
Before this change, SIOCGIFADDR was handled the same way as
SIOCSIFADDR, which involves a full arp_ifinit, et al. These steps
should be unnecessary in the SIOCGIFADDR case.
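A minimal sketch of the intended separation, using hypothetical names
(demo_softc, demo_ioctl, DEMO_SIOCSIFADDR, DEMO_SIOCGIFADDR) rather than the
driver's real code: the "set" request performs the initialization work, while
the "get" request only reads the stored address. The init_calls counter is a
stand-in for arp_ifinit and related setup.

```c
#include <assert.h>

/* Hypothetical request codes; the real ioctl numbers are not used here. */
enum demo_cmd {
	DEMO_SIOCSIFADDR,
	DEMO_SIOCGIFADDR
};

/* Hypothetical driver state. */
struct demo_softc {
	unsigned int	addr;
	int		init_calls;	/* counts interface (re)initializations */
};

/* Returns 0 on success, -1 for an unknown request. */
static int
demo_ioctl(struct demo_softc *sc, enum demo_cmd cmd, unsigned int *addr)
{
	switch (cmd) {
	case DEMO_SIOCSIFADDR:
		/* Setting an address: store it and run the init path. */
		sc->addr = *addr;
		sc->init_calls++;
		return (0);
	case DEMO_SIOCGIFADDR:
		/* Getting an address: read-only, no init side effects. */
		*addr = sc->addr;
		return (0);
	}
	return (-1);
}
```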
Differential Revision: https://reviews.freebsd.org/D1508
Reviewed by: glebius
MFC after: 2 weeks
Instead of reusing the same reg parsing code, create one, common function
that puts reg contents to the resource list. Address cells and size cells
are passed rather than acquired here so that any bus can have different
default values.
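To make the idea concrete, here is a sketch of a shared reg parser that
takes the address and size cell counts as parameters. The names demo_range,
demo_cells_to_u64 and demo_reg_to_ranges are hypothetical and this is not
the function added by the commit. It also assumes the 32-bit cells have
already been converted to host byte order, and it fills a plain array
instead of a resource list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output entry; a real bus would add to its resource list. */
struct demo_range {
	uint64_t	base;
	uint64_t	size;
};

/* Combine up to two 32-bit cells (host byte order) into one value. */
static uint64_t
demo_cells_to_u64(const uint32_t *cells, int ncells)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < ncells; i++)
		v = (v << 32) | cells[i];
	return (v);
}

/*
 * Split a reg property into (base, size) pairs. acells and scells are
 * supplied by the caller so each bus can apply its own defaults.
 * Returns the number of entries written, or -1 on malformed input.
 */
static int
demo_reg_to_ranges(const uint32_t *reg, size_t ncells, int acells,
    int scells, struct demo_range *out, size_t maxout)
{
	size_t tuple, n, i;

	if (acells < 1 || acells > 2 || scells < 0 || scells > 2)
		return (-1);
	tuple = (size_t)(acells + scells);
	if (ncells % tuple != 0)
		return (-1);
	n = ncells / tuple;
	if (n > maxout)
		return (-1);
	for (i = 0; i < n; i++) {
		out[i].base = demo_cells_to_u64(reg + i * tuple, acells);
		out[i].size = demo_cells_to_u64(reg + i * tuple + acells,
		    scells);
	}
	return ((int)n);
}
```

A bus with two address cells and one size cell would call
demo_reg_to_ranges(reg, ncells, 2, 1, out, maxout), while another bus can
pass different counts to the same function.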
Obtained from: Semihalf
Reviewed by: andrew, ian, nwhitehorn
Sponsored by: The FreeBSD Foundation
driver name and NIC driver softc via the device(9) tree,
instead of going dirty through the ifnet(9) layer.
Differential Revision: D1506
Reviewed by: imp, jhb