commit 8bd88585ed8e3f7def0d780a1bc30d96fe642b9c
Rework atse_rx_cycles handling: count packets instead of fills, and use the
limit only when polling, not when in interrupt mode. Otherwise, we may
stop reading the FIFO midpacket and clear the event mask even though the
FIFO still has data to read, which could stall receive when a large packet
arrives. Add a comment about races in the Altera FIFO interface: we may
need to do a little more work to handle races than we are.
commit 20b39086cc612f8874dc9e6ef4c0c2eb777ba92a
Use 'sizeof(data)' rather than '4' when checking an mbuf bound, as is the
case for adjusting length/etc.
commit e18953174a265f40e9ba60d76af7d288927f5382
Break out atse_intr() into two separate routines, one for each of the two
interrupt sources: receive and transmit.
commit 6deedb43246ab3f9f597918361831fbab7fac4ce
For the RX interrupt, take interest only in ALMOSTEMPTY and OVERFLOW.
For the TX interrupt, take interest only in ALMOSTFULL and UNDERFLOW.
Perform TX atse_start_locked() once rather than twice in TX interrupt
handling -- and only if !FULL, rather than unconditionally.
commit 12601972ba08d4380201a74f5b967bdaeb23092c
Experimentation suggests that the Altera Triple-Speed Ethernet documentation
is incorrect and bits in the event and interrupt-enable registers are not
irrationally rearranged relative to the status register.
commit 3cff2ffad769289fce3a728152e7be09405385d8
Substantially rework interrupt handling in the atse(4) driver:
- Introduce a new macro ATSE_TX_PENDING() which checks whether there is
any pending data to transmit, either in an in-progress packet or in
the TX queue.
- Introduce new ATSE_RX_STATUS_READ() and ATSE_TX_STAUTS_WRITE() macros
that query the FIFO status registers rather than event registers,
offering level- rather than edge-triggered FIFO conditions.
- For RX, interrupt only on full/overflow/underflow; for TX, interrupt
only on empty/overflow/underflow.
- Add new ATSE_RX_INTR_READ() and ATSE_RX_INTR_WRITE() macros useful for
debugging interrupt behaviour.
- Add a debug.atse_intr_debug_enable sysctl that causes various pieces
of FIFO state to be printed out on each RX or TX interrupt. This is
disabled by default but good to turn on if the interface appears to
wedge. Also print debugging information when polling.
- In the watchdog handler, do receive, not just transmit, processing, to
ensure that the rx, not just tx, queue is being handled -- and, in
particular, will be drained such that interrupts can resume.
- Rework both atse_rx_intr() and atse_tx_intr() to eliminate many race
conditions, and add comments on why various things are in various
orders. Interactions between modifications to the event and interrupt
masks are quite subtle indeed, and we must actively check for a number
of races (e.g., event mask cleared; packet arrives; interrupts enabled).
We also now use the status registers rather than event registers for
FIFO status checks to avoid other races; we continue to use event
registers for underflow/overflow.
With this change, interrupt-driven operation of atse appears (for the
time being) robust.
commit 3393bbff5c68a4e61699f9b4a62af5d2a5f918f8
atse: Fix build after 3cff2ffa
Obtained from: cheribsd
Submitted by: rwatson, emaste
Sponsored by: DARPA/AFRL
MFC after: 3 days
fibs. Use the mbuf's or the socket's fib instead of RT_ALL_FIBS. Fixes PR
187553. Also fixes netperf's UDP_STREAM test on a nondefault fib.
sys/netinet/ip_output.c
In ip_output, lookup the source address using the mbuf's fib instead
of RT_ALL_FIBS.
sys/netinet/in_pcb.c
in in_pcbladdr, lookup the source address using the socket's fib,
because we don't seem to have the mbuf fib. They should be the same,
though.
tests/sys/net/fibs_test.sh
Clear the expected failure on udp_dontroute.
PR: 187553
CR: https://reviews.freebsd.org/D772
MFC after: 3 weeks
Sponsored by: Spectra Logic
This fixes a bug which resulted in a warning on the userland
stack, when compiled on Windows.
Thanks to Peter Kasting from Google for reporting the issue and
provinding a potential fix.
MFC after: 3 days
towards blind SYN/RST spoofed attack.
Originally our stack used in-window checks for incoming SYN/RST
as proposed by RFC793. Later, circa 2003 the RST attack was
mitigated using the technique described in P. Watson
"Slipping in the window" paper [1].
After that, the checks were only relaxed for the sake of
compatibility with some buggy TCP stacks. First, r192912
introduced the vulnerability, just fixed by aforementioned SA.
Second, r167310 had slightly relaxed the default RST checks,
instead of utilizing net.inet.tcp.insecure_rst sysctl.
In 2010 a new technique for mitigation of these attacks was
proposed in RFC5961 [2]. The idea is to send a "challenge ACK"
packet to the peer, to verify that packet arrived isn't spoofed.
If peer receives challenge ACK it should regenerate its RST or
SYN with correct sequence number. This should not only protect
against attacks, but also improve communication with broken
stacks, so authors of reverted r167310 and r192912 won't be
disappointed.
[1] http://bandwidthco.com/whitepapers/netforensics/tcpip/TCP Reset Attacks.pdf
[2] http://www.rfc-editor.org/rfc/rfc5961.txt
Changes made:
o Revert r167310.
o Implement "challenge ACK" protection as specificed in RFC5961
against RST attack. On by default.
- Carefully preserve r138098, which handles empty window edge
case, not described by the RFC.
- Update net.inet.tcp.insecure_rst description.
o Implement "challenge ACK" protection as specificed in RFC5961
against SYN attack. On by default.
- Provide net.inet.tcp.insecure_syn sysctl, to turn off
RFC5961 protection.
The changes were tested at Netflix. The tested box didn't show
any anomalies compared to control box, except slightly increased
number of TCP connection in LAST_ACK state.
Reviewed by: rrs
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
in order to improve user-friendliness when a system has multiple disks
encrypted using the same passphrase.
When examining a new GELI provider, the most recently used passphrase
will be attempted before prompting for a passphrase; and whenever a
passphrase is entered, it is cached for later reference. When the root
disk is mounted, the cached passphrase is zeroed (triggered by the
"mountroot" event), in order to minimize the possibility of leakage
of passphrases. (After root is mounted, the "taste and prompt for
passphrases on the console" code path is disabled, so there is no
potential for a passphrase to be stored after the zeroing takes place.)
This behaviour can be disabled by setting kern.geom.eli.boot_passcache=0.
Reviewed by: pjd, dteske, allanjude
MFC after: 7 days
POSIX compliance and to improve compatibility with Linux and NetBSD
The issue was identified with lib/libc/sys/t_access:access_inval from
NetBSD
Update the manpage accordingly
PR: 181155
Reviewed by: jilles (code), jmmv (code), wblock (manpage), wollman (code)
MFC after: 4 weeks
Phabric: D678 (code), D786 (manpage)
Sponsored by: EMC / Isilon Storage Division
The flowdirector feature shares on-chip memory with other things
such as the RX buffers. In theory it should be configured in a way
that doesn't interfere with the rest of operation. In practice,
the RX buffer calculation didn't take the flow-director allocation
into account and there'd be overlap. This lead to various garbage
frames being received containing what looks like internal NIC state.
What _I_ saw was traffic ending up in the wrong RX queues.
If I was doing a UDP traffic test with only one NIC ring receiving
traffic, everything is fine. If I fired up a second UDP stream
which came in on another ring, there'd be a few percent of traffic
from both rings ending up in the wrong ring. Ie, the RSS hash would
indicate it was supposed to come in ring X, but it'd come in ring Y.
However, when the allocation was fixed up, the developers at Verisign
still saw traffic stalls.
The flowdirector feature ends up fiddling with the NIC to do various
attempts at load balancing connections by populating flow table rules
based on sampled traffic. It's likely that all of that has to be
carefully reviewed and made less "magic".
So for now the flow director feature is disabled (which fixes both
what I was seeing and what they were seeing) until it's all much
more debugged and verified.
Tested:
* (me) 82599EB 2x10G NIC, RSS UDP testing.
* (verisign) not sure on the NIC (but likely 82599), 100k-200k/sec TCP
transaction tests.
Submitted by: Marc De La Gueronniere <mdelagueronniere@verisign.com>
MFC after: 1 week
Sponsored by: Verisign, Inc.
fmp->buf at the free point is already part of the chain being freed,
so double-freeing is counter-productive.
Submitted by: Marc De La Gueronniere <mdelagueronniere@verisign.com>
MFC after: 1 week
Sponsored by: Verisign, Inc.
This allows the NIC to drop frames on the receive queue and not
cause the MAC to block on receiving to _any_ queue.
Tested:
igb0@pci0:5:0:0: class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
Discussed with: Eric Joyner <eric.joyner@intel.com>
MFC after: 1 week
Sponsored by: Norse Corp, Inc.
- Fail with EINVAL if an invalid protection mask is passed to mmap().
- Fail with EINVAL if an unknown flag is passed to mmap().
- Fail with EINVAL if both MAP_PRIVATE and MAP_SHARED are passed to mmap().
- Require one of either MAP_PRIVATE or MAP_SHARED for non-anonymous
mappings.
Reviewed by: alc, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D698
Eliminate an exclusive object lock acquisition and release on the expected
execution path.
Do page zeroing before the object lock is acquired rather than during the
time that the object lock is held.
Use vm_pager_free_nonreq() to eliminate duplicated code.
Reviewed by: kib
MFC after: 6 weeks
Sponsored by: EMC / Isilon Storage Division
The suspend/resume of event channels is already handled by the xen_intr_pic.
If those methods are set on the PIRQ PIC they are just called twice, which
breaks proper resume. This fix restores migration of FreeBSD guests to a
working state.
Sponsored by: Citrix Systems R&D
by ffs and ext2fs. Remove duplicated call to vm_page_zero_invalid(),
done by VOP and by vm_pager_getpages(). Use vm_pager_free_nonreq().
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 6 weeks (after r271596)
Without clustering support we any way have only one group of permanently
active ports, but that gives us one more supported VMWare feature. ;)
Solaris' Comstar also reports it even when only one port is present.
This change fixes transient performance drops in some of my benchmarks,
vanishing as soon as I am trying to collect any stats from the scheduler.
It looks like reordered access to those variables sometimes caused loss of
IPI_PREEMPT, that delayed thread execution until some later interrupt.
MFC after: 3 days
spaces, rather than a split address, we actually can't check for being within
the kernel's address range. Instead, do what other backtraces do, and use
trapexit()/asttrapexit() as the stack sentinel.
MFC after: 3 weeks
In the fdt data we've written for ourselves, the interrupt properties
for GIC interrupts have just been a bare interrupt number. In standard
data that conforms to the published bindings, GIC interrupt properties
contain 3-tuples that describe the interrupt as shared vs private, the
interrupt number within the shared/private address space, and configuration
info such as level vs edge triggered.
The new gic_decode_fdt() function parses both types of data, based on the
#interrupt-cells property. Previously, each platform implemented a decode
routine and put a pointer to it into fdt_pic_table. Now they can just
list this function in their table instead if they use arm/gic.c.
path through the NFS clients' getpages functions.
Introduce vm_pager_free_nonreq(). This function can be used to eliminate
code that is duplicated in many getpages functions. Also, in contrast to
the code that currently appears in those getpages functions,
vm_pager_free_nonreq() avoids acquiring an exclusive object lock in one
case.
Reviewed by: kib
MFC after: 6 weeks
Sponsored by: EMC / Isilon Storage Division
The code had references to both intr_offset and intr_parent variable names
as referring to the parent interrupt node. The intr_parent variable
wasn't actually defined anywhere, but the only references to it were as
an argument to a macro that didn't use that argument in expansion, so
the undefined variable accidentally didn't cause an error.
The intr_parent name makes more sense in context, so change all occurrances
of intr_offset to intr_parent.
* vfs.zfs.vdev.async_write_active_min_dirty_percent
* vfs.zfs.vdev.async_write_active_max_dirty_percent
Added validation of min / max for ZFS sysctl
* vfs.zfs.dirty_data_max_percent
MFC after: 3 days
devq_openings counter lost its meaning after allocation queues has gone.
held counter is still meaningful, but problematic to update due to separate
locking of CCB allocation and queuing.
To fix that replace devq_openings counter with allocated counter. held is
now calculated on request as difference between number of allocated, queued
and active CCBs.
MFC after: 1 month
The changes should not modify the generated code.
The pager->pgo_putpages() method takes int flags as its fourth
argument, while vnode_pager_putpages() used boolean_t (which is
typedef'ed to int). The flags are from VM_PAGER_* namespace, while
vnode_pager_putpages() passed TRUE and OBJPC_SYNC to VOP_PUTPAGES(),
which both are numerically equal to VM_PAGER_PUT_SYNC.
Noted and reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Prevent saturattion of the bus by constant polling which in
extreme cases can cause interface lockup. This makes FreeBSD
match similar case in the executive.
Correctly report hole at end of file.
When asked to find a hole, the DMU sees that there are no holes in the
object, and returns ESRCH. The ZPL interprets this as "no holes before
the end of the file", and therefore inserts the "virtual hole" at the
end of the file. Because DMU and ZPL have different ideas of where the
end of an object/file is, we will end up returning the end of file,
which is generally larger, instead of returning the end of object.
The fix is to handle the "virtual hole" in the DMU. If no hole is found,
the DMU will return a hole at the end of the file, rather than an error.
Illumos issue:
5139 SEEK_HOLE failed to report a hole at end of file
MFC after: 1 week
In zil_claim, don't issue warning if we get EBUSY (inconsistent) when
opening an objset, instead, ignore it silently.
Illumos issue:
5140 message about "%recv could not be opened" is printed when booting after crash
MFC after: 1 week
Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to
limit how many blocks can be free'ed before a new transaction group is
created. The default is no limit (infinite), but we should probably have
a lower default, e.g. 100,000.
With this limit, we can guard against the case where ZFS could run out of
memory when destroying large numbers of blocks in a single transaction
group, as the entire DDT needs to be brought into memory.
Illumos issue:
5138 add tunable for maximum number of blocks freed in one txg
MFC after: 2 weeks
Enforce 4K as smallest indirect block size (previously the smallest
indirect block size was 1K but that was never used).
This makes some space estimates more accurate and uses less memory
for some data structures.
Illumos issue:
5141 zfs minimum indirect block size is 4K
MFC after: 2 weeks
It allows to bypass range checks between UNMAP and READ/WRITE commands,
which may introduce additional delays while waiting for UNMAP parameters.
READ and WRITE commands are always processed in safe order since their
range checks are almost free.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Before this change UNMAP completely blocked other I/Os while running.
Now it blocks only colliding ones, slowing down others only due to ZFS
locks collisions.
Sponsored by: iXsystems, Inc.
many thanks for their continued support of FreeBSD.
While I'm there, also implement a new build knob, WITHOUT_HYPERV to
disable building and installing of the HyperV utilities when necessary.
The HyperV utilities are only built for i386 and amd64 targets.
This is a stable/10 candidate for inclusion with 10.1-RELEASE.
Submitted by: Wei Hu <weh microsoft com>
MFC after: 1 week
Huawei. It might appear as if the firmware is allocating memory blocks
according to the USB transfer size and if there is initially a lot of
data, like at the answering machine prompt, it simply dies without any
apparent reason. The simple workaround for this is to force a zero
length packet at hardware level after every 512 bytes of data. This
will force the other side to use smaller memory blocks aswell.
MFC after: 1 week
- Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
and invfo_kqfilter() for use by file types that do not support the
respective operations. Home-grown versions of invfo_poll() were
universally broken (they returned an errno value, invfo_poll()
uses poll_no_poll() to return an appropriate event mask). Home-grown
ioctl routines also tended to return an incorrect errno (invfo_ioctl
returns ENOTTY).
- Use the invfo_*() functions instead of local versions for
unsupported file operations.
- Reorder fileops members to match the order in the structure definition
to make it easier to spot missing members.
- Add several missing methods to linuxfileops used by the OFED shim
layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat(). Most
of these used invfo_*(), but a dummy fo_stat() implementation was
added.
instead of breaking out of the loop and then immediately checking the loop
index so that if it was broken out of the proper value can be returned.
While here, use nitems().
nexus_alloc_resource() and don't set a bushandle.
nexus_activate_resource() will set a proper bushandle.
- Implement a proper nexus_release_resource().
- Fix ixppcib_activate_resource() to call rman_activate_resource()
before creating a mapping for the resource.
Tested by: jmg
by explicitly moving it out of the interrupt shadow. The hypervisor is done
"executing" the HLT and by definition this moves the vcpu out of the
1-instruction interrupt shadow.
Prior to this change the interrupt would be held pending because the VMCS
guest-interruptibility-state would indicate that "blocking by STI" was in
effect. This resulted in an unnecessary round trip into the guest before
the pending interrupt could be injected.
Reviewed by: grehan
ifa_ifwithdstaddr. For the sake of backwards compatibility, the new
arguments were added to new functions named ifa_ifwithnet_fib and
ifa_ifwithdstaddr_fib, while the old functions became wrappers around the
new ones that passed RT_ALL_FIBS for the fib argument. However, the
backwards compatibility is not desired for FreeBSD 11, because there are
numerous other incompatible changes to the ifnet(9) API. We therefore
decided to remove it from head but leave it in place for stable/9 and
stable/10. In addition, this commit adds the fib argument to
ifa_ifwithbroadaddr for consistency's sake.
sys/sys/param.h
Increment __FreeBSD_version
sys/net/if.c
sys/net/if_var.h
sys/net/route.c
Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib
versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute.
sys/net/route.c
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_options.c
sys/netinet/ip_output.c
sys/netinet6/nd6.c
Fixup calls of modified functions.
share/man/man9/ifnet.9
Document changed API.
CR: https://reviews.freebsd.org/D458
MFC after: Never
Sponsored by: Spectra Logic
Add in6ifa_ifwithaddr() function. It is similar to ifa_ifwithaddr,
but does fast lookup in the hash of inet6 addresses.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
* new macro to remove magic number - IPV6_ADDR_SCOPES_COUNT;
* sa6_checkzone() - this function checks sockaddr_in6 structure
for correctness of sin6_scope_id. It also can fill correct
value sometimes.
* in6_getscopezone() - this function returns scope zone id for
specified interface and scope.
* in6_getlinkifnet() - this function returns struct ifnet for
corresponding zone id of link-local scope.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
* use IN6_IS_ADDR_XXX() macro instead of hardcoded values;
* for multicast addresses just return scope value, the only exception
is addresses with 0x0F scope value (RFC 4291 p2.7.0);
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
an mbuf's storage (internal or external).
Add a new M_SIZE() mbuf macro that returns the size of an mbuf's
storage (internal or external).
These contrast with m_data and m_len, which are with respect to data
in the buffer, rather than the buffer itself.
Rewrite M_LEADINGSPACE() and M_TRAILINGSPACE() in terms of M_START()
and M_SIZE().
This is done as we currently have many instances of using mbuf flags
to generate pointers or lengths for internal storage in header and
regular mbufs, as well as to external storage. Rather than replicate
this logic throughout the network stack, centralising the
implementation will make it easier for us to refine mbuf storage.
This should also help reduce bugs by limiting the amount of
mbuf-type-specific pointer arithmetic. Followup changes will
propagate use of the macros throughout the stack.
M_SIZE() conflicts with one macro in the Chelsio driver; rename that
macro in a slightly unsatisfying way to eliminate the collision.
MFC after: 3 days
Obtained from: jeff (with enhancements)
Sponsored by: EMC / Isilon Storage Division
Reviewed by: bz, glebius, np
Differential Revision: https://reviews.freebsd.org/D753
AP startup and AP resume (it was already used for BSP startup and BSP
resume).
- Split code to do one-time probing of cache properties out of
initializecpu() and into initializecpucache(). This is called once on
the BSP during boot.
- Move enable_sse() into initializecpu().
- Call initializecpu() for AP startup instead of enable_sse() and
manually frobbing MSR_EFER to enable PG_NX.
- Call initializecpu() when an AP resumes. In theory this will now
properly re-enable PG_NX in MSR_EFER when resuming a PAE kernel on
APs.
the local APIC in initializecpu() and re-enables it if the APIC code
decides to use the local APIC after all. Rework this workaround
slightly so that initializecpu() won't re-disable the local APIC if
it is called after the APIC code re-enables the local APIC.
None of existing STEC devices need UNMAP or even support it well, having
many limitations and even hanging sometimes executing those commands.
New devices that may use UNMAP going to be released under HGST name.
MFC after: 3 days
bootloader. Implement the following routines:
pcibios-device-count count the number of instances of a devid
pcibios-read-config read pci config space
pcibios-write-config write pci config space
pcibios-find-devclass find the nth device with a given devclass
pcibios-find-device find the nth device with a given devid
pcibios-locator convert bus device function ti pcibios locator
These commands are thin wrappers over their PCI BIOS 2.1 counterparts. More
informaiton, such as it is, can be found in the standard.
Export a nunmber of pcibios.X variables into the environment to report
what the PCI IDENTIFY command returned.
Also implmenet a new command line primitive (pci-device-count), but don't
include it by default just yet, since it depends on the recently added
words and any errors here can render a system unbootable.
This is intended to allow the boot loader to do special things based
on the hardware it finds. This could be have special settings that are
optimized for the specific cards, or even loading special drivers. It
goes without saying that writing to pci config space should not be
done without a just cause and a sound mind.
Sponsored by: Netflix
A non-global IPv6 address can be used in more than one zone of the same
scope. This zone index is used to identify to which zone a non-global
address belongs.
Also we can have many foreign hosts with equal non-global addresses,
but from different zones. So, they can have different metrics in the
host cache.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
* Return ENETDOWN when interface specified by ipi6_ifindex is not
enabled for IPv6 use.
* Return EADDRNOTAVAIL when ipi6_ifindex specifies an interface, but the
address ipi6_addr is not available for use on that interface.
* Return EINVAL when ipi6_addr is multicast address.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
kern.cam.ctl.iscsi.ping_timeout to 0. This fixes interoperability with
some initiators that don't properly support NOP-Ins, namely iPXE/gPXE.
Submitted by: Chen Wen <pokkys@gmail.com>
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
reboot/halt/debug.
o Add support for most key combinations supported by syscons(4).
Reviewed by: dumbbell, emaste (prev revision of D747)
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
device drivers with calls to the centralised m_print() implementation.
While the formatting and output details differ a little, the content
is essentially the same, and it is unlikely anyone has used this
debugging output in some time.
This change reduces awareness of mbuf cluster allocation (and,
especially, the M_EXT flag) outside of the mbuf allocator, which will
make it easier to refine the external storage mechanism without
disrupting drivers in the future.
Style bugs are preserved.
Reviewed by: bz, glebius
MFC after: 3 days
Sponsored by: EMC / Isilon Storage Division
isn't being allocated for the last of the requested pages, because a
reservation won't fit in the gap between allocated pages, then the
reservation structure shouldn't be initialized.
While I'm here, improve the nearby comments.
Reported by: jeff, pho
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
The nmdm code enforces a number between the 'nmdm' and 'A|B' portions
of the device name. This is then used as a unit number, and sprintf'd
back into the tty name. If leading zeros were used in the name,
the created device name is different than the string used for the
clone-open (e.g. /dev/nmdm0001A will result in /dev/nmdm1A).
Since unit numbers are no longer required with the updated tty
code, there seems to be no reason to force the string to be a
number. The fix is to allow an arbitrary string between
'nmdm' and 'A|B', within the constraints of devfs names. This allows
all existing user of numeric strings to continue to work, and also
allows more meaningful names to be used, such as bhyve VM names.
Tested on amd64, i386 and ppc64.
Reported by: Dave Smith
PR: 192281
Reviewed by: neel, glebius
Phabric: D729
MFC after: 3 days
At this moment it works only for files and ZVOLs in device mode since BIOs
have no respective respective cache control flags (DPO/FUA).
MFC after: 1 month
Sponsored by: iXsystems, Inc.
It affects the IPv6 source address selection algorithm (RFC 6724)
and allows override the last rule ("longest matching prefix") for
choosing among equivalent addresses. The address with `prefer_source'
will be preferred source address.
Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
This doesn't include the same kind of userland overriding that the IPv4
path has; nor does it yet know about 2-tuple versus 4-tuple hashing.
That'll come later.
Differential Revision: https://reviews.freebsd.org/D527
Reviewed by: grehan
with no RSS hash.
When doing RSS:
* Create a new IPv4 netisr which expects the frames to have been verified;
it just directly dispatches to the IPv4 input path.
* Once IPv4 reassembly is done, re-calculate the RSS hash with the new
IP and L3 header; then reinject it as appropriate.
* Update the IPv4 netisr to be a CPU affinity netisr with the RSS hash
function (rss_soft_m2cpuid) - this will do a software hash if the
hardware doesn't provide one.
NICs that don't implement hardware RSS hashing will now benefit from RSS
distribution - it'll inject into the correct destination netisr.
Note: the netisr distribution doesn't work out of the box - netisr doesn't
query RSS for how many CPUs and the affinity setup. Yes, netisr likely
shouldn't really be doing CPU stuff anymore and should be "some kind of
'thing' that is a workqueue that may or may not have any CPU affinity";
that's for a later commit.
Differential Revision: https://reviews.freebsd.org/D527
Reviewed by: grehan
and egress.
* rss_mbuf_software_hash_v4 - look at the IPv4 mbuf to fetch the IPv4 details
+ direction to calculate a hash.
* rss_proto_software_hash_v4 - hash the given source/destination IPv4 address,
port and direction.
* rss_soft_m2cpuid - map the given mbuf to an RSS CPU ("bucket" for now)
These functions are intended to be used by the stack to support
the following:
* Not all NICs do RSS hashing, so we should support some way of doing
a hash in software;
* The NIC / driver may not hash frames the way we want (eg UDP 4-tuple
hashing when the stack is only doing 2-tuple hashing for UDP); so we
may need to re-hash frames;
* .. same with IPv4 fragments - they will need to be re-hashed after
reassembly;
* .. and same with things like IP tunneling and such;
* The transmit path for things like UDP, RAW and ICMP don't currently
have any RSS information attached to them - so they'll need an
RSS calculation performed before transmit.
TODO:
* Counters! Everywhere!
* Add a debug mode that software hashes received frames and compares them
to the hardware hash provided by the hardware to ensure they match.
The IPv6 part of this is missing - I'm going to do some re-juggling of
where various parts of the RSS framework live before I add the IPv6
code (read: the IPv6 code is going to go into netinet6/in6_rss.[ch],
rather than living here.)
Note: This API is still fluid. Please keep that in mind.
Differential Revision: https://reviews.freebsd.org/D527
Reviewed by: grehan
information as part of recvmsg().
This is primarily used for debugging/verification of the various
processing paths in the IP, PCB and driver layers.
Unfortunately the current implementation of the control message path
results in a ~10% or so drop in UDP frame throughput when it's used.
Differential Revision: https://reviews.freebsd.org/D527
Reviewed by: grehan
overriding an existing flowid/flowtype field in the outbound mbuf with
the inp_flowid/inp_flowtype details.
The upcoming RSS UDP support calculates a valid RSS value for outbound
mbufs and since it may change per send, it doesn't cache it in the inpcb.
So overriding it here would be wrong.
Differential Revision: https://reviews.freebsd.org/D527
Reviewed by: grehan
u-boot env into the loader(8) env (which also gets them into the kernel
env). You can import selected variables or the whole environment. Each
u-boot var=value becomes uboot.var=value in the loader env. You can also
use 'ubenv show' to display uboot vars without importing them.
This fixes a panic in the i915 driver when one uses debug.kdb.enter=1
under vt(4).
PR: 193269
Reported by: emaste@
Submitted by: avg@
MFC after: 3 days
This fixes a bug where scroll lock would not work for tty #0 when using
vt_vga's textmode. The reason was that this window is created with a
static 256x100 buffer, larger than the real size of 80x25.
Now, in vt_change_font() and vt_compute_drawable_area(), we still
perform operations even of the window has no font loaded (this is the
case in textmode here vw->vw_font == NULL). One of these operation
resizes the buffer accordingly.
In vt_compute_drawable_area(), we take the terminal size as is (ie.
80x25) for the drawable area.
The font argument to vt_set_border() is removed (it was never used) and
the code now uses the computed drawable area instead of re-doing its own
calculation.
Reported by: Harald Schmalzbauer <h.schmalzbauer_omnilan.de>
Tested by: Harald Schmalzbauer <h.schmalzbauer_omnilan.de>
MFC after: 3 days
The rules turn out to be:
* for non-aggregation session TX queues - it's either sent or not sent.
* for aggregation session TX queues - if nframes=1, then the status reflects
the completed transmission.
* however, for nframes > 1, then this is just a status reflecting what
the initial transmission did. The compressed BA (immediate or delayed)
may not have yet been received, so the actual frame status is in the
compressed BA updates.
Whilst here, I fiddled with debugging and formatting a bit.
There's also RTS attempts (what the atheros chips call "short retries")
which weren't being logged and they aren't yet being used in the rate
control statistics updates. For now, at least log them.
TODO:
* This still isn't 100% correct! So I have to tinker with this some more.
(The failures aren't always failures..)
* Extend the rate control API in net80211 so it can take both short and
long retry counts.
Tested:
* Intel 5100, STA mode
The (eventual) intention is to create MIB counters for transmitted
frame completion to count how many packets with each status are
transmitted.
Note the difference between A-MPDU and non A-MPDU status.
Obtained from: Linux iwlwifi/dvm driver
va == pa map.
I'm not sure the code would work if we are not running from the identity
map as the ARM core may attempt to read the next instruction from an
invalid memory location.
In dnode_sync(), do dnode_increase_indirection() before processing
the dn_next_nblkptr.
Illumos issue:
5117 space map reallocation can cause corruption
MFC after: 3 days
eliminiates some warnings when building in userland.
Thanks to Patrick Laimbock for reporting this issue.
Remove also some unnecessary casts.
There should be no functional change.
MFC after: 1 week
boards.
This is just intended to split the common config entries out, further
cleanup is expected.
Reviewed by: ian@, rpaulo@ (earlier version)
Differential Revision: https://reviews.freebsd.org/D731
For controllers with only one port (like PCIe or M.2 SSDs) interrupt can
come from only one source, and skipping read saves few percents of CPU time.
MFC after: 1 month
H/W donated by: I/O Switch
It is the compaction bitmask, with the highest bit defining if compact
format of the xsave area is used at all.
Adjust the definition of struct xstate_hdr, provide define for bit 63.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
an entry in the xref list if one doesn't already exist for the given handle.
On a system that uses phandle properties, the init-time scan of the tree
which builds the xref list will pre-create entries for every xref handle
that exists in the data. On systems where the xref and node handles are
synonymous there is no phandle property in referenced nodes, and the xref
list will initialize to an empty state. In the latter case, we still need
to be able to associate a device_t with an xref handle, so we create list
entries on the fly as needed. Since the node and xref handles are
synonymous, we have all the info needed to create a list entry at device
registration time.
The downside to this change is that it basically allows on the fly creation
of xref handles as synonyms of node handles, and the association of a
device_t with them. Whether this is a bug or a feature is in the eye of
the beholder, I guess.
o Unmagic 'configuration done' bit
o Move probe() to place before attach() for better navigation
o Use bus_read_n instead of bus_space_read_n functions
Pointed out by: andrew
Sponsored by: DARPA, AFRL
%eax report.
Print the XSAVE features 0xd/1 in the boot banner. The printcpuinfo()
is executed late enough so that XSAVE is already enabled.
There is no known to me off the shelf hardware that implements any
feature bits except XSAVEOPT, the list is taken from SDM rev. 50. The
banner printing will allow us to note the hardware arrival.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
resume that is a superset of a pcb. Move the FPU state out of the pcb and
into this new structure. As part of this, move the FPU resume code on
amd64 into a C function. This allows resumectx() to still operate only on
a pcb and more closely mirrors the i386 code.
Reviewed by: kib (earlier version)
for the node. The default routine returns the untranslated handle, which
is sometimes useful, but sometimes you really need to know there's no
entry in the xref<->node<->device translation table.
be usable as the default timer in place of the physical timer.
We are guaranteed to have access to the virtual timer, but when running
under a hypervisor may not have access to the physical.
Differential Revision: https://reviews.freebsd.org/D588
mbuf-initialisation logic that is best left to centralised mbuf utility
code rather than scattered around the kernel.
MFC after: 3 days
Sponsored by: EMC / Isilon Storage Division
imply a cluster is attached; it could also refer to some other sort of
external storage (e.g., an sf_buf).
MFC after: 3 days
Sponsored by: EMC / Isilon Storage Division
I gave up to update list of Marvell chips that require this quirk.
The final nail was growing number of PCIe/M.2 SSDs where Marvell chips
have PCI IDs of different vendors.
MFC after: 1 week
H/W donated by: I/O Switch
I've looked at the GCC sources and I now understand what's going wrong.
THe C11 keywords are simply nonexistent when using C++ mode. They are
marked as C-only in the parser. This is absolutely impractical for
multiple reasons:
- The C11 keywords do not conflict with C++ naming rules. They all start
with _[A-Z]. There is no reason to make them C-only.
- It makes it practically impossible for people to use these keywords in
C header files and expect them to work from within C++ sources.
As I said in my previous commit message: GCC is by far the weirdest
compiler that I've ever used.
Incredibly weird: GCC 4.7/4.9 do support the _Noreturn and _Thread_local
keywords, but not during bootstrapping. GCC is by far the weirdest
compiler that I've ever used.
Reported by: andreast@
The legacy USB circuit tends to give trouble on MacBook.
While the original report covered MacBook, extend the fix
preemptively for the newer MacBookPro too.
PR: 191693
Reviewed by: emaste
MFC after: 5 days
driver when it hits a mbuf/iov buffer, it mallocs and copies the data
for processing.. This improves perf by ~8-10% on my machine...
I have thoughts of fixing AES-NI so that it can better handle segmented
buffers, which should help improve IPSEC performance, but that is for
the future...
PCI IDs into quirks, which mostly fit (though you'd get no argument
from me that AHCI_Q_SATA1_UNIT0 is oddly specific). Set these quirks
in the PCI attachment. Make some shared functions public so that PCI
and possibly other bus attachments can use them.
The split isn't perfect yet, but it is functional. The split will be
perfected as other bus attachments for AHCI are written.
Sponsored by: Netflix
Reviewed by: kan, mav
Differential Revision: https://reviews.freebsd.org/D699
imgp->interpreted to a bitmask instead of, functionally, a bool. Each
imgactivator now requires its own flag in interpreted to indicate whether
or not it has already examined argv[0].
Change imgp->interpreted to an unsigned char to add one extra bit for
future use.
With this change, one can execute a shell script from a 64bit host native
make and still get the binmisc image activator to fire for the script
interpreter. Prior to this, execution would fail.
Phabric: https://reviews.freebsd.org/D696
Reviewed by: jhb@
MFC after: 4 weeks
full vendor tree. This worked great until it was time to update, but
now it is time to update. Hit the rest button by removing this branch
and re-adding it by a full copy of whatever is in the vendor tree.
packets targeting a listening socket. Permit to reduce TCP input
processing starvation in context of high SYN load (e.g. short-lived TCP
connections or SYN flood).
Submitted by: Julien Charbon <jcharbon@verisign.com>
Reviewed by: adrian, hiren, jhb, Mike Bentkofsky
Current busdma code for unmapped bios will not properly align the segment
size, causing corruption on blkfront devices. Revert the commit until
busdma code is fixed.
Reported by: mav
MFC after: 1 day
try to collapse adjacent pieces using m_catpkt(). In best case
scenario it copies data and frees mbufs, making mbuf exhaustion
attack harder.
Suggested by: Jonathan Looney <jonlooney gmail.com>
Security: Hardens against remote mbuf exhaustion attack.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
few "general purpose registers" whose values control chip behavior in ways
that have nothing to do with IO pin mux control. Define a simple API that
other soc-specific code can use to read and write the registers, and provide
the imx51 implementation of them.
and into the TSC probe routine.
- Initialize cpu_exthigh once in finishidentcpu() which is called
before printcpuinfo() (and matches the behavior on amd64).
<machine/md_var.h>.
- Move some CPU-related variables out of i386/i386/identcpu.c to
initcpu.c to match amd64.
- Move the declaration of has_f00f_hack out of identcpu.c to machdep.c.
- Remove a misleading comment from i386/i386/initcpu.c (locore zeros
the BSS before it calls identify_cpu()) and remove explicit zero
assignments to reduce the diff with amd64.
set bo_bsize on a bufobj.
This is a slight modification of the patch provided.
PR: 193146
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by: EMC Isilon Storage Division
- miibus fixes as suggested by Yonghyeon Pyun.
- enable VLAN MTU support.
- fix a few WITNESS complaints in cgem_attach().
- have cgem_attach() properly init the ifnet struct before calling
mii_attach() to fix panic when using e1000phy.
- fix ethernet address changing.
- fix transmit queue overflow handling.
- tweak receive queue handling to reduce receive overflows.
- bring out MAC statistic counters to sysctls.
- add e1000phy to config file.
- implement receive hang work-around described in reference guide.
- change device name from if_cgem to cgem to be consistent with other
interfaces.
Submitted by: Thomas Skibo <ThomasSkibo@sbcglobal.net>
Reviewed by: wkoszek, Yonghyeon PYUN <pyunyh@gmail.com>
As GCC also gained support for the C11 keywords over time, we can patch
up <sys/cdefs.h> to not define these anymore. This has the advantage
that error messages for static assertions are printed natively and that
_Alignas() will work with even a type outside of C11 mode.
All C11 keywords are supported with GCC 4.7 and higher, with the
exception of _Thread_local and _Generic. These are only supported as of
GCC 4.9.