page queue even when the allocation is not wired. It is
responsibility of the vm_page_grab() caller to ensure that the page
does not end on the vm_object queue but not on the pagedaemon queue,
which would effectively create unpageable unwired page.
In exec_map_first_page() and vm_imgact_hold_page(), activate the page
immediately after unbusying it, to avoid leak.
In the uiomove_object_page(), deactivate page before the object is
unlocked. There is no leak, since the page is deactivated after
uiomove_fromphys() finished. But allowing non-queued non-wired page
in the unlocked object queue makes it impossible to assert that leak
does not happen in other places.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
trees. This happens because the logic inserting items into the radix
tree is allocating empty radix levels, when index zero does not
contain any items.
- Add proper error case handling, so that the radix tree does not end
up in a bad state, if memory cannot be allocated during insertion of
an item.
- Add check for inserting NULL items into the radix tree.
- Add check for radix tree getting too big.
MFC after: 1 week
Sponsored by: Mellanox Technologies
r269533, which was tested before r269457 was committed, implicitely
relied on the Giant to protect the manipulations of the softdepmounts
list. Use softdep global lock consistently to guarantee the list
structure now.
Insert the new struct mount_softdeps into the softdepmounts only after
it is sufficiently initialized, to prevent softdep_speedup() from
accessing bare memory. Similarly, remove struct mount_softdeps for
the unmounted filesystem from the tailq before destroying structure
rwlock.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
instead of M_NOWAIT/M_WAITOK when deciding whether to sleep on page table
page allocation. (The same functions in the i386/xen and mips pmap
implementations already use PMAP_ENTER_NOSLEEP.)
X-MFC with: r269728
Sponsored by: EMC / Isilon Storage Division
Using unmapped IO is really beneficial when running inside of a VM,
since it avoids IPIs to other vCPUs in order to invalidate the
mappings.
This patch adds unmapped IO support to blkfront. The following tests
results have been obtained when running on a Xen host without HAP:
PVHVM
3165.84 real 6354.17 user 4483.32 sys
PVHVM with unmapped IO
2099.46 real 4624.52 user 2967.38 sys
This is because when running using shadow page tables TLB flushes and
range invalidations are much more expensive, so using unmapped IO
provides a very important performance boost.
Sponsored by: Citrix Systems R&D
Tested by: robak
MFC after: 1 week
PR: 191173
dev/xen/blkfront/blkfront.c:
- Add and announce support for unmapped IO.
options into kern.opts.mk and change all the places where we use
src.opts.mk to pull in the options. Conditionally define SYSDIR and
use SYSDIR/conf/kern.opts.mk instead of a CURDIR path. Replace all
instances of CURDIR/../../etc with STSDIR, but only in the affected
files.
As a special compatibility hack, include bsd.owm.mk at the top of
kern.opts.mk to allow the bare build of sys/modules to work on older
systems. If the defaults ever change between 9.x, 10.x and current for
these options, however, you'll wind up with the host OS' defaults
rather than the -current defaults. This hack will be removed when
we no longer need to support this build scenario.
Reviewed by: jhb
Differential Revision: https://phabric.freebsd.org/D529
Rename vt_generate_vga_palette() to vt_generate_cons_palette() and
change it to build a palette where the color index is the same than in
terminal escape codes, not the VGA index. That's what TCHAR_CREATE()
uses and passes to vt(4).
The main differences between both orders are:
o Blue and red are swapped (1 <-> 4)
o Yellow and cyan are swapped (3 <-> 6)
The problem remained unnoticed, because the RGB bit indexes passed to
vt_generate_vga_palette() were reversed. This inversion was cancelled
by the colors inversions in the generated palette. For instance, red
(0xff0000) and blue (0x0000ff) have bytes in opposite order, but were
swapped in the palette. But after changing the value of blue (see last
paragraph), the modified color was in fact the red one.
This commit includes a fix to creator_vt.c, submitted by Nathan
Whitehorn: fb_cmsize is set to 16. Before this, the generated palette
would be overwritte. This fixes colors on sparc64 with a Creator3D
adapter.
While here, tune the palette to better match console colors and improve
the readability (especially the dark blue).
Submitted by: nwhitehorn (fix to creator_vt.c)
MFC after: 1 week
pmap, unlike i386, and similar to i386/xen pv, does not tolerate
abandoned mappings for the freed pages.
Reported and tested by: dumbbell
Diagnosed and reviewed by: alc
Sponsored by: The FreeBSD Foundation
In several functions, vtbuf_putchar() in particular, the lock on vtbuf
is acquired twice:
1. once by the said functions;
2. once in vtbuf_dirty().
Now, vtbuf_dirty_locked() and vtbuf_dirty_cell_locked() allow to
acquire that lock only once.
This improves the input speed of vt(4). To measure the gain, a
50,000-lines file was displayed on the console using cat(1). The time
taken by cat(1) is reported below:
o On amd64, with vt_vga:
- before: 1.0"
- after: 0.5"
o On sparc64, with creator_vt:
- before: 13.6"
- after: 10.5"
MFC after: 1 week
vt_fb_attach() currently always returns 0, but it could return a code
defined in errno.h. However, it doesn't return a CN_* code. So checking
its return value against CN_DEAD (which is 0) is incorrect, and in this
case, a success becomes a failure.
The consequence was unimportant, because the caller (drm_fb_helper.c)
would only log an error message in this case. The console would still
work.
Approved by: nwhitehorn
The original commit was supposed to stop the ability to do raw frame
injection in monitor mode to arbitrary channels (whether supported
by regulatory or not) however it doesn't seem to have been followed
by any useful way of doing it.
Apparently AHDEMO is supposed to be that way, but it seems to require
too much fiddly things (disable scanning, set a garbage SSID, etc)
for it to actually be useful for spoofing things.
So for now let's just disable it and instead look to filter transmit
in the output path if the channel isn't allowed by regulatory.
That way monitor RX works fine but TX will be blocked.
I don't plan on MFC'ing this to -10 until the regulatory enforcement
bits are written.
The AR9380 and later chips have a 128KiB register window, so the register
read diag api needs changing.
The tools are about to be updated as well. No, they're not backwards
compatible.
If powersave is enabled and there are any transitions to network
or full sleep - even if they're pretty damned brief - eventually
something messes up somewhere and the bus glue between the AR9331
SoC and the AR9331 wifi stops working. It shows up as stuck DMA
and LOCAL_TIMEOUT interrupts.
Both ath9k and the reference driver does a full chip reset if things
get stuck.
So:
* teach the AR9330 HAL about the force_full_reset option I added a
couple of years ago;
* if the chip is currently in full-sleep, do a full-reset;
* if TX DMA and/or RX DMA are still enabled (eg, they did get
stuck during reset) then do a full-reset.
Tested:
* AR9331 SoC, STA mode
pmap_enter(PMAP_ENTER_NOSLEEP). The PGA_WRITEABLE flag can be set
when either the page is busied, or the owner object is locked.
Update comments, move all assertions about page state when
PGA_WRITEABLE flag is set, into new helper
vm_page_assert_pga_writeable().
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
mapping size (currently unused). The flags includes the fault access
bits, wired flag as PMAP_ENTER_WIRED, and a new flag
PMAP_ENTER_NOSLEEP to indicate that pmap should not sleep.
For powerpc aim both 32 and 64 bit, fix implementation to ensure that
the requested mapping is created when PMAP_ENTER_NOSLEEP is not
specified, in particular, wait for the available memory required to
proceed.
In collaboration with: alc
Tested by: nwhitehorn (ppc aim32 and booke)
Sponsored by: The FreeBSD Foundation and EMC / Isilon Storage Division
MFC after: 2 weeks
nullfs vnode shares vnode lock with lower vnode, this allows the
reclamation of nullfs directory vnode in null_lookup(). In this
situation, VOP must return ENOENT.
More, since after the reclamation, the locks of nullfs directory vnode
and lower vnode are no longer shared, the relock of the ldvp does not
restore the correct locking state of dvp, and leaks ldvp lock.
Correct this by unlocking ldvp and locking dvp.
Use cached value of dvp->v_mount.
Reported by: bdrewery
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Add the ACPI MCFG table to advertise the extended config memory window.
Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted
or moved in the guest's address space. The PCI extended config space is an
example of an immutable memory range.
Add emulation for the "movzw" instruction. This instruction is used by FreeBSD
to read a 16-bit extended config space register.
CR: https://phabric.freebsd.org/D505
Reviewed by: jhb, grehan
Requested by: tychon
After r269510 the IO APIC and ATPIC initialization is done at the same
order, which means atpic_init can be called before the IO APIC has
been initalized. In that case the ATPIC will take over the interrupt
sources, preventing the IO APIC from registering them.
Reported by: David Wolfskill <david@catwhisker.org>
Tested by: David Wolfskill <david@catwhisker.org>,
Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
Sponsored by: Citrix Systems R&D
have to adjust freeblk records to reflect the change to a full-size block.
For example, suppose we have a block made up of fragments 8-15 and
want to free its last two fragments. We are given a request that says:
FREEBLK ino=5, blkno=14, lbn=0, frags=2, oldfrags=0
where frags are the number of fragments to free and oldfrags are the
number of fragments to keep. To block align it, we have to change it to
have a valid full-size blkno, so it becomes:
FREEBLK ino=5, blkno=8, lbn=0, frags=2, oldfrags=6
Submitted by: Mikihito Takehara
Tested by: Mikihito Takehara
Reviewed by: Jeff Roberson
MFC after: 1 week
One problem is inferior(9) looping due to the process tree becoming a
graph instead of tree if the parent is traced by child. Another issue
is due to the use of p_oppid to restore the original parent/child
relationship, because real parent could already exited and its pid
reused (noted by mjg).
Add the function proc_realparent(9), which calculates the parent for
given process. It uses the flag P_TREE_FIRST_ORPHAN to detect the head
element of the p_orphan list and than stepping back to its container
to find the parent process. If the parent has already exited, the
init(8) is returned.
Move the P_ORPHAN and the new helper flag from the p_flag* to new
p_treeflag field of struct proc, which is protected by proctree lock
instead of proc lock, since the orphans relationship is managed under
the proctree_lock already.
The remaining uses of p_oppid in ptrace(PT_DETACH) and process
reapping are replaced by proc_realparent(9).
Phabric: D417
Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
swap pager. Swap pager uses a private mutex to protect swap metadata,
and does not rely on the vm object lock to ensure integrity of it.
Weaken the requirement for the vm object lock by only asserting locked
object in vm_pager_page_unswapped(), instead of locked exclusively.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
the right register bank for the framebuffer. Disable the assigned-addresses
path on SPARC since it is just a hack for IBM PPC systems and was neither
relevant for nor worked on SPARC anyway.
vm_phys_fictitious_to_vm_page should not be called directly, even when
operating on a range that has been registered using
vm_phys_fictitious_reg_range. PHYS_TO_VM_PAGE should be used instead
because on arches that use VM_PHYSSEG_DENSE the page might come
directly from vm_page_array.
Reported by: nwhitehorn
Tested by: nwhitehorn, David Mackay <davidm.jx8p@gmail.com>
Sponsored by: Citrix Systems R&D
sizeof(struct scsi_inquiry_data) of 256 bytes combined with off-by-one
error in the changed code gave total INQUIRY data length above 255 bytes,
that was maximal INQUIRY length in SPC-2. While SPC-3 increased the
maximal length to 64K, at least sg3_utils are still confused by that.
MFC after: 1 week
like EX and SRX. The install command uses pkgfs to extract a kernel,
zero or more modules and a root file system from the specified package
and boots the kernel. The name of the kernel, the list of modules and
the name of the root file system can be specified by putting a
file called "metatags in the package.
The package to use is given by an URL. The schemes supported are
tftp and file. For the file scheme, the disk is currently hardcoded
but that should really look for the package on all devices and
partititions.
Obtained from: Juniper Networks, Inc.
- Replace the global driver lock with a per-instance device lock.
- Use the per-instance device lock instead of Giant for the CAM sim lock.
- Add global locks to protect the adapter list and DPC queues.
- Use wakeup() and mtx_sleep() to wait for certain events like the
controller going idle rather than polling via timeouts passed to
tsleep().
- Use callout(9) instead of timeout(9).
- Mark the interrupt handler MPSAFE.
- Remove compat shims for FreeBSD versions older than 8.0.
Reviewed by: Steve Chang <ychang@highpoint-tech.com>
- Use the existing vbus locks instead of Giant for the CAM sim lock.
- Use callout(9) instead of timeout(9).
- Mark the interrupt handler MPSAFE.
- Don't attempt to pass data in the softc from probe() to attach().
- Remove compat shims for FreeBSD versions older than 8.0.
Reviewed by: Steve Chang <ychang@highpoint-tech.com>
- Use the existing vbus locks instead of Giant for the CAM sim lock.
- Use callout(9) instead of timeout(9).
- Mark the interrupt handler MPSAFE.
- Don't attempt to pass data in the softc from probe() to attach().
- Remove compat shims for FreeBSD versions older than 8.0.
Reviewed by: Steve Chang <ychang@highpoint-tech.com>
- Use callout(9) instead of timeout(9).
- Use the existing hba lock as the CAM sim lock instead of Giant.
- Mark interrupt handler MPSAFE.
- Reorder detach and destroy the hba lock in detach.
Reviewed by: Steve Chang <ychang@highpoint-tech.com>
device attachment on arm platforms. If this is defined, nexus attaches
early in BUS_PASS_BUS, and other busses and devices attach later, in the
pass number they are set up for. Without it defined, nexus attaches in
BUS_PASS_DEFAULT and thus so does everything else, which is status quo.
Arm platforms which use FDT data to enumerate devices have been relying
on devices being attached in the exact order they're listed in the dts
source file. That's one of things currently preventing us from using
vendor-supplied fdt data (because then we don't control the order of the
devices in the data). Multi-pass attachment can go a long way towards
solving that problem by ensuring things like clock and interrupt drivers
are attached before the more mundane devices that need them.
The long-term goal is to have all arm fdt-based platforms using multipass.
This option is a bridge to that, letting us enable it selectively as
platforms are converted and tested (the alternative being to just throw
a big switch and try to fight fires as they're reported).
- Cleanup some register reads and writes to use existing register
access macros.
- Ensure code which only applies to the control endpoint is not run
for other endpoints in the data transfer path.
MFC after: 3 days
handled by creator(4) (Sun Creator 3D, Elite 3D, etc.). This provides
vt(4) consoles on all devices currently supported by syscons on sparc64.
The driver should also be easily adaptable to support newer Sun framebuffers
such as the XVR-500 and higher.
Many thanks to dumbbell@ (Jean-Sebastien Pedron) for testing this remotely
during development.
ethernet class.
Note: This is untested as I do not have a device like this. That is
reflected in the MFC timeout.
PR: 192345
Submitted by: rozhuk.im gmail.com
MFC after: 4 weeks
we set MK_INET6_SUPPORT to no, not if we do define INET6.
This way we do not try to build IPv6 parts in if the kernel doesn't support
them.
This unbreaks several kernel configurations building modules but no INET6.
With the current implementation of managed fictitious ranges when
also using VM_PHYSSEG_DENSE, a user could try to register a
fictitious range that starts inside of vm_page_array, but then
overrruns it (because the end of the fictitious range is greater than
vm_page_array_size + first_page). This would result in PHYS_TO_VM_PAGE
returning unallocated pages from past the end of vm_page_array. The
same could happen if a user tried to register a segment that starts
outside of vm_page_array but ends inside of it.
In order to fix this, allow vm_phys_fictitious_{reg/unreg}_range to
use a set of pages from vm_page_array, and allocate the rest.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, alc
vm/vm_phys.c:
- Allow registering/unregistering fictitious ranges that overrun
vm_page_array.
The MD allocators were very common, however there were some minor
differencies. These differencies were all consolidated in the MI allocator,
under ifdefs. The defines from machine/vmparam.h turn on features required
for a particular machine. For details look in the comment in sys/sf_buf.h.
As result no MD code left in sys/*/*/vm_machdep.c. Some arches still have
machine/sf_buf.h, which is usually quite small.
Tested by: glebius (i386), tuexen (arm32), kevlo (arm32)
Reviewed by: kib
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
In vdev_get_stats, check that the vdev is not a hole before computing the
fragmentation. This fixes a panic when removing log device.
Illumos issue:
5049 panic when removing log device
Author: Alex Reece <alex@delphix.com>
MFC after: 2 weeks
Make the sysinit tool a build tool rather than building in with
/usr/bin/cc and running it from OBJDIR. (It will be moved to usr.bin
once a manpage is written and a few style cleanups are done.)
Split the makefile bits for Hans' kernel shim layer into their own
includable kshim.mk.
Move USB support into a .mk file so loaders can include it.
opt_inet6.h into kmod.mk by forcing almost everybody to eat the same
dogfood. While at it, consolidate the opt_bpf.h and opt_mroute.h
targets here too.
Replace a single soft updates thread with a thread per FFS-filesystem
mount point. The threads are associated with the bufdaemon process.
Reviewed by: kib
Tested by: Peter Holm and Scott Long
MFC after: 2 weeks
Sponsored by: Netflix
In linux EXT4_LINK_MAX is now 64000. We can't really do that
since i_nlink and va_nlink are signed so setting higher values
is likely to cause trouble.
This is a system limitation so set the EXT_LINK_MAX to
what the system can handle.
MFC after: 3 days
Also disable a couple of ACPI devices that are not usable under Dom0.
To this end a couple of booleans are added that allow disabling ACPI
specific devices.
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
x86/xen/xen_nexus.c:
- Return BUS_PROBE_SPECIFIC in the Xen Nexus attachement routine to
force the usage of the Xen Nexus.
- Attach the ACPI bus when running as Dom0.
dev/acpica/acpi_cpu.c:
dev/acpica/acpi_hpet.c:
dev/acpica/acpi_timer.c
- Add a variable that gates the addition of the devices.
x86/include/init.h:
- Declare variables that control the attachment of ACPI cpu, hpet and
timer devices.
Minor fixes to make the Xen Dom0 console work. This includes always
returning there's pending input in xencons_has_input, because on Dom0
there's no shared ring and we cannot test the indexes. The second
fix is to use the CONSOLEIO_read hypercall in order to read input
data from the Xen console.
Sponsored by: Citrix Systems R&D
dev/xen/console/xencons_ring.c:
- Always return true in xencons_has_input for Dom0.
- Implement Dom0 console support for xencons_handle_input.
Allow a privileged Xen guest (Dom0) to parse the MADT ACPI interrupt
overrides and register them with the interrupt subsystem.
Also add a Xen specific implementation for bus_config_intr that
registers interrupts on demand for all the vectors less than
FIRST_MSI_INT.
Sponsored by: Citrix Systems R&D
x86/xen/pvcpu_enum.c:
- Use helper functions from x86/acpica/madt.c in order to parse
interrupt overrides from the MADT.
- Walk the MADT and register any interrupt override with the
interrupt subsystem.
x86/xen/xen_nexus.c:
- Add a custom bus_config_intr method for Xen that intercepts calls
to configure unset interrupts and registers them on the fly (if the
vector is < FIRST_MSI_INT).
Split a portion of the code in madt_parse_interrupt_override to a
separate function, that is public and can be used from other code.
This will be needed by the Xen port, since FreeBSD needs to parse the
interrupt overrides and notify Xen about them.
This commit should not introduce any functional change.
Sponsored by: Citrix Systems R&D
Reviewed by: jhb, gibbs
x86/acpica/madt.c:
- Introduce madt_parse_interrupt_values() that parses the intr
information from ACPI and returns the triggering and the polarity.
This is a subset of the functionality that used to be part of
madt_parse_interrupt_override().
- Make madt_found_sci_override a global variable that can be used
from other files.
x86/include/acpica_machdep.h:
- Prototype of madt_parse_interrupt_values.
- Extern declaration of madt_found_sci_override.
Lower the quality of the MADT ACPI enumerator, so on Xen Dom0 we can
force the usage of the Xen mptable enumerator even when ACPI is
detected.
This is needed because Xen might restrict the number of vCPUs
available to Dom0, but the MADT ACPI table parsed in FreeBSD is the
native one (which enumerates all the CPUs available in the system).
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
x86/acpica/madt.c:
- Lower MADT enumerator quality to -50.
x86/xen/pvcpu_enum.c:
- Rise Xen PV enumerator to 0.
This change inserts the Xen interrupt subsystem (event channels)
initialization between the system interrupt initialization and the IO
APIC source registration.
This is needed when running on Dom0, that routes physical interrupts
on top of event channels, so that the interrupt sources found during
IO APIC initialization can be registered using the Xen interrupt
subsystem.
The resulting order in the SI_SUB_INTR stage is the following:
- System intr initialization
- Xen intr initalization
- IO APIC source registration
Sponsored by: Citrix Systems R&D
x86/x86/local_apic.c:
- Change order of apic_setup_io to be called after xen interrupt
subsystem is setup.
x86/xen/xen_intr.c:
- Init Xen event channels before apic_setup_io.
Add a new DDB command to dump all registered event channels.
Sponsored by: Citrix Systems R&D
x86/xen/xen_intr.c:
- Add a new xen_evtchn command to DDB in order to dump all
information related to event channels.
Mask all event channels during initialization. This is done so that we
don't receive spurious interrupts while dynamically registering new
event channels. There's a small window during registration where an
event channel can fire before we have attached a handler to it.
Sponsored by: Citrix Systems R&D
x86/xen/xen_intr.c:
- Mask all event channels on init.
This allows Dom0 to manage physical hardware, redirecting the
physical interrupts to event channels.
Sponsored by: Citrix Systems R&D
x86/xen/xen_intr.c:
- Expand struct xenisrc to hold the level and triggering of PIRQ
event channels.
- Implement missing methods in xen_intr_pirq_pic.
- Allow xen_intr_alloc_isrc to take a vector parameter that globally
identifies the interrupt. This is only used for PIRQs that are
bound to a specific hardware IRQ.
- Introduce xen_register_pirq used to register IO APIC legacy PIRQ
interrupts.
- Add support for the dynamic PIRQ EOI map, this shared memory is
modified by Xen (if it suppoorts that feature), and notifies the
guest if an EOI is needed or not. If it's not available fall back
to the old implementation using PHYSDEVOP_irq_status_query.
- Rename xen_intr_isrc_count to xen_intr_auto_vector_count and
replace it's usages.
- Align static variables by name.
xen/xen_intr.h:
- Add prototype for xen_register_pirq.
This allows to avoid extra network traffic when copying files on NTFS iSCSI
disks within one storage host by drag'n'dropping them in Windows Explorer
of Windows 8/2012. It should also accelerate Hyper-V VM operations, etc.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
r262867 was described as fixing socket buffer checks for SOCK_SEQPACKET,
but also changed one of the SOCK_DGRAM code paths to use the new
sbappendaddr_nospacecheck_locked() function. This lead to SOCK_DGRAM
bypassing socket buffer limits.
bridges in strange ways, either rendering them unable to detect
insertion and removal events, or possibly unable to read from the
device behind the bridge.
This fixes at least one laptop, a Toshiba Tecra M5 with a Texas
Instruments PCxx12 (d=0x8039 v=0c104c) bridge. The very similar
Tecra M9 has the same bridge, but worked fine without this change.
The bridge chip has no I/O port BAR, and there is nothing in the spec
to suggest I/O decoding should be enabled; however enabling it fixes
the issue. Add an XXX comment to this effect.
Discussed with: jhb, imp
MFC after: 2 weeks
We continue to use pmap_enter() for that. For unwiring virtual pages, we
now use pmap_unwire(), which unwires a range of virtual addresses instead
of a single virtual page.
Sponsored by: EMC / Isilon Storage Division
NRSACK extension. The default will still be off, since it
it not an RFC (yet).
Changing the sysctl name will be in a separate commit.
MFC after: 1 week
Encode CPU's number by XOR'ing the CPU ID against the 64-bit cpu_ticks().
Reviewed by: mav, gibbs
Differential Revision: https://phabric.freebsd.org/D521
MFC after: 2 weeks
It could be claimed that two things were reasonable protected by
Giant. One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx. Another is vfsconf.vfc_refcount, which is
now updated with atomics.
Note that vfc_refcount still has the same races now as it has under
the Giant, the unload of filesystem modules can happen while the
module is still in use.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
option for controlling ECN on future associations and get the
status on current associations.
A simialar pattern will be used for controlling SCTP extensions in
upcoming commits.
pmap_unwire() and vm_object_unwire().
Retire vm_fault_{un,}wire(), since they are no longer used.
(See r268327 and r269134 for the motivation behind this change.)
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
In dnode_children_t, use C99's "[]" idiom for declaring the variable
sized array dnc_children at the end of the structure.
This prevents the compiler from mistakenly optimizing away accesses
beyond the array's defined size.
Illumos issue:
5038 Remove "old-style" flexible array usage in ZFS.
Author: Justin T. Gibbs <justing@spectralogic.com>
MFC after: 2 weeks
writes to files for read-only file systems. Since there are already
checks in nandfs_setattr that return an error, this moves detection of
the error earlier.
provides support for a variety of low-end graphics hardware (SBus adapters,
Mach64, QEMU's framebuffer, XVR-100). A driver for at least the Creator3D
cards will have to be present before this can become the default console
driver.
To test vt(4) on sparc64, set kern.vty=vt at the loader prompt.
Reorganize struct sge_iq. Make the iq entry size a compile time
constant. While here, eliminate RX_FL_ESIZE and use EQ_ESIZE directly.
MFC after: 2 weeks
This prevents recursion of vdev_queue_io_done as per r265321 but
using a different method as recommended on the openzfs list.
We now use zio_interrupt(zio) and return ZIO_PIPELINE_STOP instead
of returning ZIO_PIPELINE_CONTINUE from vdev_*_io_start methods.
zio_vdev_io_start now ASSERTS the that vdev_op_io_start returns
ZIO_PIPELINE_STOP to ensure future changes don't reintroduce
ZIO_PIPELINE_CONTINUE returns.
Cleanup flow in vdev_geom_io_start while I'm here.
Also fix some cases not using SET_ERROR(..)
MFC after: 2 weeks
X-MFC-With: r265321
don't need any #ifdef stuff to use atomic_load/store_64() elsewhere in
the kernel. For armv4 the atomics are trivial to implement for kernel
code (just disable interrupts), less so for user mode, so this only has
the kernel mode implementations for now.
nanouptime() instead of getnanouptime(). nanouptime(9) provides more
precise result at expense of being slower.
In r269223, gethrtime() is used as creation time of dbuf, which in turn
acts as portion of lookup key to maintain AVL invariant where there can
not be duplicate items. Before this change, gethrtime() have preferred
better execution time by sacrificing precision, which may lead to panic
on busy systems with:
panic: avl_find() succeeded inside avl_add()
Reported by: allanjude, mav
PR: kern/192284
MFC after: 11 days
X-MFC-with: r269223
value shared across multiple cores is with atomic_load_64() and
atomic_store_64(), because the normal 64-bit load/store instructions
are not atomic on 32-bit arm. Luckily the ldrexd/strexd instructions
that are atomic are fairly cheap on armv6. Because it's fairly simple
to do, this implements all the ops for 64-bit, not just load/store.
Reviewed by: andrew, cognet
The rss_key[] array in netinet/in_rss.c has the bytes in incorrect
order. This results in the RSS test vectors in the Microsft RSS spec
and Intel NIC specs giving incorrect results, and making it difficult
to verify correct hash operation when RSS functionality is added to
new NICs.
CR: https://phabric.freebsd.org/D516
Reviewed by: adrian
We have functions nested within functions, and places where we start a
function then never end it, we just jump to the middle of something else.
We tried to express this with nested ENTRY()/END() macros (which result
in .fnstart and .fnend directives), but it turns out there's no way to
express that nesting in ARM EHABI unwind info, and newer tools treat
multiple .fnstart directives without an intervening .fnend as an error.
These changes introduce two new macros, EENTRY() and EEND(). EENTRY()
creates a global label you can call/jump to just like ENTRY(), but it
doesn't emit a .fnstart. EEND() is a no-op that just documents the
conceptual endpoint that matches up with the same-named EENTRY().
This is based on patches submitted by Stepan Dyatkovskiy, but I made some
changes and added the EEND() stuff, so blame any problems on me.
Submitted by: Stepan Dyatkovskiy <stpworld@narod.ru>
(4 in operation), 4GB ram (3.5 usable) ARM machine.
Support covers device drivers for:
- Serial Peripheral Interface (SPI)
- Chrome Embedded Controller (EC) - SPI-based version
- XHCI and USB 3.0 dual-role device PHY
Also:
- Add support for Exynos5420 in Pad module
- Move power-related functions to separate driver --
Power Management Unit (PMU)
- Enable XHCI for Chromebook1
Special thanks to grehan@ for hardware, and to
hselasky@ for r269139.
by the combination of r268591 and r269134: When we attempt to add the
wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing.
(They only set the wired attribute on newly created mappings.)
Tested by: andreast
compressed tarball, aka package. The file system assumes that the
files are layed-out in the same order as needed to allow for the
package to be streamed. As such, it does not read an entire package
into memory first.
Some properties of the file system:
o Files that start with '+' are silently skipped. These are found
in FreeBSD package files.
o Files smaller than or equal to 4KB will be cached in memory and
as such allow for some flexibility in accessing files out of
order.
o Files with the .tgz suffix are assumed to be (sub-)packages and
signal the end for a directory scan.
Obtained from: Juniper Networks, Inc.
left some of the decisions based on its counterpart, SA_CCB_BUFFER_IO
being random. As a result, propagation of the residual information
for the SPACE command was broken, so the number of filemarks
encountered during a SPACE operation was miscalculated. Consequently,
systems relying on properly tracked filemark counters (like Bacula)
fell apart.
The change also removes a switch/case in sadone() which r256843
degraded to a single remaining case label.
PR: 192285
Approved by: ken
MFC after: 2 weeks
It overflows witness.
Shorten the names of some nfs mutexes.
Reported and tested by: pho
No objections from: rmacklem, mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
In the mmcsd layer use this value to populate disk->d_ident. Also set
disk->d_descr to the full set of card identification info (includes vendor,
model, manufacturing date, etc).
Previously there was a race condition between the address addition
and associating it with the CARP which resulted in the interface
MAC, instead of the CARP MAC, being used for a brief amount of time.
This caused "is using my IP address" warnings as well as data being
sent to the wrong machine due to incorrect ARP entries being recorded
by other devices on the network.
pmap_unwire(), the call to MOEA64_PVO_TO_PTE() must be performed before
any changes are made to the PVO. Otherwise, MOEA64_PVO_TO_PTE() will
panic.
Reported by: andreast
* Implements Start Stop Unit for SATA direct-attach devices in IR mode to avoid
data corruption.
* Use CAM_DEV_NOT_THERE instead of CAM_SEL_TIMEOUT and CAM_TID_INVALID
Obtained from: LSI
MFC after: 2 weeks
features. If bootverbose is enabled, a detailed list is provided;
otherwise, a single-line summary is displayed.
- Add read-only sysctls for optional VT-x capabilities used by bhyve
under a new hw.vmm.vmx.cap node. Move a few exiting sysctls that
indicate the presence of optional capabilities under this node.
CR: https://phabric.freebsd.org/D498
Reviewed by: grehan, neel
MFC after: 1 week
framebuffer drivers. This lets ofwfb work with xf86-video-scfb and makes
the driver much more generic and less PCI-centric. This changes some
user-visible behavior and will require updates to the xorg-server port
on PowerPC when using ATI graphics cards.
some parts of the checks are in fact redundand in the surrounding
code, and it is more clear what the conditions are by direct testing
of the flags. Two of the three macros were only used in assertions.
In vnlru_free(), all relevant parts of vholdl() were already inlined,
except the increment of v_holdcnt itself. Do not call vholdl() to do
the increment as well, this allows to make assertions in
vholdl()/vhold() more strict.
In v_incr_usecount(), call vholdl() before incrementing other ref
counters. The change is no-op, but it makes less surprising to see
the vnode state in debugger if interrupted inside v_incr_usecount().
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Increase default ARC buf_hash_table size. When typical block size is small,
the hash table could be too small, which would lead to long hash chains and
limit performance for cached reads.
A new loader tunable, vfs.zfs.arc_average_blocksize, have been added which
allows users to override the default assumption of average (typical) block
size. Old default was 65536 (64 KiB) and new default is 8192 (8 KiB).
Illumos issue:
5034 ARC's buf_hash_table is too small
MFC after: 2 weeks
Change dn->dn_dbufs from linked list to AVL tree.
Illumos issues:
4873 zvol unmap calls can take a very long time for larger datasets
MFC after: 2 weeks
code behave more like it is on Solaris.
Reported by: avg
Reviewed by: avg, mav (but bugs are mine)
Differential Revision: https://phabric.freebsd.org/D457
moved from the stack into the tag structure. In retrospect that was a bad
idea, because nothing protects that array from concurrent access by
multiple threads.
This change moves the array to the map structure (actually it's allocated
following the structure, but all in a single malloc() call).
This also establishes a "sane" limit of 4096 segments per map. This is
mostly to prevent trying to allocate all of memory if someone accidentally
uses a tag with nsegments set to BUS_SPACE_UNRESTRICTED. If there's ever
a genuine need for more than 4096, don't hesitate to increase this (or
maybe make it tunable).
Reviewed by: cognet
triggers a need to bounce due to cacheline alignment. These buffers
are always aligned to cacheline boundaries, and even when the DMA operation
starts at an offset within the buffer or doesn't extend to the end of the
buffer, it's safe to flush the complete cachelines that were only partially
involved in the DMA. This is because there's a very strict rule on these
types of buffers that there will not be concurrent access by the CPU and
one or more DMA transfers within the buffer.
Reviewed by: cognet
functions, it has evolved to make a variety of decisions about whether
the DMA needs to bounce, so rename it to must_bounce(). Rewrite it to
perform checks outside of the ancestor loop if they're based on information
that's wholly contained within the original tag. Now the loop only checks
exclusion zones in ancestor tags.
Also, add a new function, might_bounce() which does a fast inline check
of flags within the tag and map to quickly eliminate the need to call
the more expensive must_bounce() for each page in the DMA operation.
Within the mapping loops, use map->pagesneeded != 0 as a proxy for all
the various checks on whether bouncing might be required. If no pages
were reserved for bouncing during the checks before the mapping loop,
then there's no need to re-check any of the conditions that can lead
to bouncing -- all those checks already decided there would be no bouncing.
Reviewed by: cognet
exclusion zones and phsyical memory. The phys_avail[i] entries are the
address of the first byte of ram in the region, and phys_avail[i+1]
entries are the address of the first byte of ram in the next region
(i.e., they're not included in the region that starts at phys_avail[i]).
Reviewed by: cognet
unchanging values in the phys_avail array, so do the comparisons just once
at tag creation time and set a flag to remember the result.
Reviewed by: cognet
DMA on arm can bounce for several reasons, and _bus_dma_can_bounce() only
checks for the lowaddr/highaddr exclusion ranges in the dma tag, so now
it's named exclusion_bounce(). The other reasons for bouncing are checked
by the new functions alignment_bounce() and cacheline_bounce().
Reviewed by: cognet
postponing it to zfs_vget(). zfs_root() returned vnode with the
default value of v_hash, which caused inconsistent v_hash value when
root vnode was obtained from zfs_vget().
Nullfs allocated two upper vnodes for the root zfs vnode due to
different hashes, causing consistency problems.
Reported and tested by: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
place for the NFS-based PXE loader. Information like rootpath
or rootip aren't that useful for TFTP and the gateway IP is
typically already printed by the firmware.
2. Only set boot.nfsroot.* environment variables for NFS. This
makes it possible for the OS to work either way by checking
for the presence or absence of environment variables.
3. Set boot.netif.server when using TFTP so that the OS can fetch
files as well. A typical use case for this is network-based
installations with the installation process implemented on
top of FreeBSD.
4. The pxelinux loader has a set of alternative names it tries
for configuration files. Make it easier to do something
similar in Forth by providing the IP address as a 32-bit hex
number in the pxeboot.ip variable and the MAC address with
dashes in the pxeboot.hwaddr environment variable.
Obtained from: Juniper Networks, Inc.
particular, allow loaders to define the name of the RC script the
interpreter needs to use. Use this new-found control to have the
PXE loader (when compiled with TFTP support and not NFS support)
read from ${bootfile}.4th, where ${bootfile} is the name of the
file fetched by the PXE firmware.
The normal startup process involves reading the following files:
1. /boot/boot.4th
2. /boot/loader.rc or alternatively /boot/boot.conf
When these come from a FreeBSD-defined file system, this is all
good. But when we boot over the network, subdirectories and fixed
file names are often painful to administrators and there's really
no way for them to change the behaviour of the loader.
Obtained from: Juniper Networks, Inc.
be able to claim I know how the UART code works."
* Just return 115200 as the current baud rate. I should cache it in the
device struct and return that but I'm lazy right now.
* don't error out on other ioctl settings for now, just silently ignore them.
* remove some code that was copied from the 8250 driver that isn't needed
any longer.
Tested:
* AR9331, Carambola-2 board.
so it really should not be under "optional inet". The fact that uipc_accf.c
lives under kern/ lends some weight to making it a "standard" file.
Moving kern/uipc_accf.c from "optional inet" to "standard" eliminates the
need for #ifdef INET in kern/uipc_socket.c.
Also, this meant the net.inet.accf.unloadable sysctl needed to move, as
net.inet does not exist without networking compiled in (as it lives in
netinet/in_proto.c.) The new sysctl has been named net.accf.unloadable.
In order to support existing accept filter sysctls, the net.inet.accf node
has been added netinet/in_proto.c.
Submitted by: Steve Kiernan <stevek@juniper.net>
Obtained from: Juniper Networks, Inc.
underlying physical pages are mapped by the pmap. If, for example, the
application has performed an mprotect(..., PROT_NONE) on any part of the
wired region, then those pages will no longer be mapped by the pmap.
So, using the pmap to lookup the wired pages in order to unwire them
doesn't always work, and when it doesn't work wired pages are leaked.
To avoid the leak, introduce and use a new function vm_object_unwire()
that locates the wired pages by traversing the object and its backing
objects.
At the same time, switch from using pmap_change_wiring() to the recently
introduced function pmap_unwire() for unwiring the region's mappings.
pmap_unwire() is faster, because it operates a range of virtual addresses
rather than a single virtual page at a time. Moreover, by operating on
a range, it is superpage friendly. It doesn't waste time performing
unnecessary demotions.
Reported by: markj
Reviewed by: kib
Tested by: pho, jmg (arm)
Sponsored by: EMC / Isilon Storage Division
doesn't have support for the Z8530. Embedded PowerPC platforms
typically don't. Fail when the device class we actually need is
not present.
Obtained from: Juniper Networks, Inc.
rebased to point to the allocated memory, but for architectures
that have non-zero relocation addends, the address comparison
happens on the "unfinalized" address.
After the addend is taken into account, call elf_relocaddr() to
make sure we rebase properly.
is the indication that draining got interrupted due to a revoke(2) and
that tty_drain() is to be called again for draining to complete. If the
device is flagged as gone, then waiting/draining is not possible. Only
return ERESTART when waiting is still possible.
Obtained from: Juniper Networks, Inc.
Unlike disk devices ZVOLs process all requests synchronously. That makes
impossible sending multiple requests to them from single thread. From the
other side ZVOLs have real d_read/d_write methods, which unlike d_strategy
can handle uio scatter/gather and have no strict I/O size limitations.
So, if ZVOL in "dev" mode is detected, use of d_read/d_write methods instead
of d_strategy allows to avoid pointless splitting of large requests into
MAXPHYS (128K) sized chunks.
MFC after: 1 week
Import Illumos changes to address the following Illumos issues:
4976 zfs should only avoid writing to a failing non-redundant
top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature
is enabled
4984 device selection should use fragmentation metric
MFC after: 2 weeks
This also fixes a few minor violations of the SD protocol, such as running
the bus at high speed during the card identification sequence.
The sdcard_init() routine now probes for SDHC cards so that later read
requests can make needed adjustments between block and byte offsets based
on card type.
There is a new MCI_readblocks() function that takes block number and block
count parameters instead of byte-offset values. Using this routine, boot
loader code can load a kernel from any location on an SDHC or standard SD.
The old MCI_read() interface remains unchanged so that existing customized
boot loader code will still keep working without changes. Using this
routine, boot loaders can load a kernel from anywhere in the first 4GB of
an SDHC card (or of course any location on a standard SD card).
A new sdcard_use4wire() routine allows boot loaders to request 4-bit
transfers; it should be called after sdcard_init(). The sdcard_init()
routine no longer assumes the hardware is 4-wire capable and by default
sets things up for 1-bit transfers. (4-wire mode is unreliable on
at91rm9200, works on later SoCs.)
PR: 155894
Submitted by: me. years ago.
forever in vm_handle_hlt().
This is usually not an issue as long as one of the other vcpus properly resets
or powers off the virtual machine. However, if the bhyve(8) process is killed
with a signal the halted vcpu cannot be woken up because it's sleep cannot be
interrupted.
Fix this by waking up periodically and returning from vm_handle_hlt() if
TDF_ASTPENDING is set.
Reported by: Leon Dang
Sponsored by: Nahanni Systems
This commit does not add error returns to minidumpsys() or
textdump_dumpsys(); those can also be added later.
Submitted by: Conrad Meyer (EMC / Isilon storage division)
instead of at the beginning. This allows an intra process signal
to be sent to the oldest thread with the signal unmasked - which,
if it still exists, is the main thread. This mimics behavior
found in Linux and Solaris.
link_elf_obj: symbol icl_pdu_new_bhs undefined
PR: 192031
Submitted by: Nils Beyer (earlier version)
MFC after: 3 days
Sponsored by: FreeBSD Foundation
been transferred from zio_compress_data to its caller. Therefore, passing
the 'minblocksize' down will be a no-op.
Eliminate the parameter to reduce diff against upstream.
MFC after: 2 weeks
It is not possible to PUSH a 32-bit operand on the stack in 64-bit mode. The
default operand size for PUSH is 64-bits and the operand size override prefix
changes that to 16-bits.
vm_copy_setup() can return '1' if it encounters a fault when walking the
guest page tables. This is a guest issue and is now handled properly by
resuming the guest to handle the fault.
(I'm committing this on behalf of my colleagues in the Storage team
at Chelsio).
Submitted by: Sreenivasa Honnur <shonnur at chelsio dot com>
Sponsored by: Chelsio Communications.
PF_LINK, and multicast/broadcast flag should always be dropped because
the outer protocol uses unicast even when the inner address is not for
unicast. It had been broken since r236951 when gif_output() started to
use IFQ_HANDOFF().
and tmpfs object cannot shadow. In other words, tmpfs vm object is
always at the bottom of the shadow chain.
Reported and tested by: bdrewery
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
HSD131 erratum in [1]) at a considerable rate. So filter these (default),
unless logging is enabled. Unfortunately, there really is no better way to
reasonably implement suppressing these errors than to just skipping them
in mca_log(). Given that they are reported for bank 0, they'd need to be
masked in MSR_MC0_CTL. However, P6 family processors require that register
to be set to either all 0s or all 1s, disabling way more than the one error
in question when using all 0s there. Alternatively, it could be masked for
the corresponding CMCI, but that still wouldn't keep the periodic scanner
from detecting these spurious errors. Apart from that, register contents of
MSR_MC0_CTL{,2} don't seem to be publicly documented, neither in the Intel
Architectures Developer's Manual nor in the Haswell datasheets.
Note that while HSD131 actually is only about C0-stepping as of revision
014 of the Intel desktop 4th generation processor family specification
update, these corrected errors also have been observed with D0-stepping
aka "Haswell Refresh".
1: http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf
Reviewed by: jhb
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH
involves updating the corresponding page tables followed by accesses to the
pages in question. This sequence is subject to the situation exactly described
in the "AMD64 Architecture Programmer's Manual Volume 2: System Programming"
rev. 3.23, "7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore,
issuing the INVLPG right after modifying the PTE bits is crucial (see also
r269050).
For the amd64 PMAP code, the order of instructions was already correct. The
above fact still is worth documenting, though.
1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
Reviewed by: alc
Sponsored by: Bally Wulff Games & Entertainment GmbH
corresponding page tables followed by accesses to the pages in question.
This sequence is subject to the situation exactly described in the "AMD64
Architecture Programmer's Manual Volume 2: System Programming" rev. 3.23,
"7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore, issuing
the INVLPG right after modifying the PTE bits is crucial.
For pmap_copy_page(), this has been broken in r124956 and later on carried
over to pmap_copy_pages() derived from the former, while all other places
in the i386 PMAP code use the correct order of instructions in this regard.
Fixing the latter breakage solves the problem of data corruption seen with
unmapped I/O enabled when running at least bare metal on AMD R-268D APUs.
However, this might also fix similar corruption reported for virtualized
environments.
- In pmap_copy_pages(), correctly set the cache bits on the source page being
copied. This change is thought to be a NOP for the real world, though. [2]
1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
Submitted by: kib [2]
Reviewed by: alc, kib
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH
The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.
A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.
custom free routine (rxb_free) in the driver. Fail MOD_UNLOAD with
EBUSY if any such cluster has been handed up to the kernel but hasn't
been freed yet. This prevents a panic later when the cluster finally
needs to be freed but rxb_free is gone from the kernel.
MFC after: 1 week
Setting PSE together with PAE or in long mode just makes the PSE bit
completely ignored, so don't set it.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
the assumption that consumers would respect bio_completed and/or
bio_resid to detect short reads. This assumption proved false and
file corruption was the result.
Create as many bios as we need to satisfy the original request.
Check the cached chunk every time we need to do I/O to increase the
hit rate.
Obtained from: junipre Networks, Inc.
MFC after: 1 week
This change is a bit ugly, but so is the coupling between the i915
driver and syscons. It isn't worth developing a more elegant solution
only to support the legacy syscons console.
The hcreate(3) implementation and related functions we inherited
from NetBSD used to free() the key value, something that is not
supported by the standard implementation.
This would cause a segmentation fault when attempting to run
the examples from the opengroup and linux manpages. NetBSD
has added non-standard calls to provide the previous
behaviour but hdestroy is not very commonly used so at this
time it seems excessive to bring those to FreeBSD.
Bump the __FreeBSD_version as this is an ABI change.
Reference:
http://bugs.dragonflybsd.org/issues/1398
MFC after: 2 weeks
If RSS is enabled, ixgbe(4) will query the RSS API for the types of hashes
which should be used. It'll then only enable hashes that are exposed via
the RSS layer.
This way it won't try to do things like enable UDP hashing if RSS explicitly
states that it isn't supported in lookups.
Tested:
* 82599EB ixgbe(4) NIC
A mix of fragmented and non-fragmented UDP in a single stream will end up
being hashed differently, resulting in out-of-order behaviour in the receive
path.
This was done in the linux e1000 driver in 2011.
Discussed with: jfv
by the stack.
Right now the stack isn't really setup for RSS with 4-tuple UDP hashing
for either IPv4 and IPv6.
The specifics:
* The UDP init path udp_init() and udplite_init() specify the hash as
2-tuple, so the PCBGROUPS code only tries a 2-tuple check;
* The PCBGROUPS and RSS code doesn't know about the UDP hash types
just yet, so they're never treated as valid hashes.
* For correctness, 4-tuple can't be enabled in the general case because
UDP datagrams can be more fragmented than IP datagrams may be.
Strictly speaking, TCP datagrams may also be fragmented and this could
cause issues with PCBGROUPS/RSS until the IP defragment path grows some
code to re-calculate the RSS hash.
I'll follow this commit up with awareness of the UDP 4-tuple for those
who wish to configure it, but for now it'll stay disabled.
No drivers (yet) know to use this function when RSS is enabled.