* Force the arge0 interface to not use a PHY for speed negotiation
for now. It'd be nice to do it, but right now the RGMII interface
to the switch needs to stay at 1000/full in order to match what
the switch side of the port is programmed as.
So until that's all sorted out, disconnect arge0 from the PHY
and leave it at fixed at 1000/full.
I noticed this when I tried using a busted ethernet cable that
forced the PHY to negotiate 100/full. The switch was fine and
it negotiated to 100/full, but then arge0 saw the link update
and set the speed to 100/full when the switch side of that
hook up was set to 1000/full. Tsk.
* When using argemdio, the mdio device resets and initialises
the MAC, /not/ the arge_attach (or, as I discovered, arge_init.)
So arge1 wasn't being fully initialised and thus no traffic
would ever flow.
So until I tidy up that mess, just create an argemdio bus for
arge1. It's totally fine; it won't do anything or find anything
attached to it.
Tested:
* AP135 reference board - both arge0 and arge1 now work.
to initialize mbuf's fibnum. Uninitialized fibnum value can lead to
panic in the routing code. Currently we use only RT_DEFAULT_FIB value
for initialization.
Differential Revision: https://reviews.freebsd.org/D1998
Reviewed by: hrs (previous version)
Sponsored by: Yandex LLC
is the case, depending on the options, in some of the ARM hardware
simulators. In these cases we don't get an interrupt so will need to
schedule the task to write more data to the uart.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
This new function can be used by other drivers to reserve the use of GPIO
pins.
Anyway, the use of ofw_gpiobus_parse_gpios() is preferred when possible.
Requested by: Michal Meloun
- fix warning about comparison of 'uint8_t v_tpr >= 0' always being true.
- fix error triggered by an empty clobber list in the inline assembly for
"clgi" and "stgi"
- fix error when compiling "vmload %rax", "vmrun %rax" and "vmsave %rax". The
gcc assembler does not like the explicit operand "%rax" while the clang
assembler requires specifying the operand "%rax". Fix this by encoding the
instructions using the ".byte" directive.
Reported by: julian
MFC after: 1 week
number of other subsystems, so you probably don't want _SPARE2..
ktr needs an overhaul to really only compile in the ones you want,
we've long passed the 31 bits it provides..
Sponsored by: transip.nl
if_vmove().
In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface. When
detaching, if_delgroup() was called and the interface leaves all of
the group membership. And then upon attachment, if_addgroup(ifp,
IFG_ALL) was called and it joined only "all" group again.
This had a problem. Normally, a cloned interface automatically joins
a group whose name is ifc_name of the cloner in addition to "all"
upon creation. However, if_vmove() removed the membership and did
not restore upon attachment.
Differential Revision: https://reviews.freebsd.org/D1859
SCSI-2 devices.
Some older tape devices claim to be SCSI-2, but actually do support
long position information. (Long position information includes
the current file mark.) For example, the COMPAQ SuperDLT1.
So we now only disable the check on SCSI-1 and older devices.
sys/cam/scsi/scsi_sa.c:
In saregister(), only disable fetching long position
information on SCSI-1 and older drives. Update the
comment to explain why.
Confirmed by: dvl
Sponsored by: Spectra Logic
MFC after: 3 weeks
draft-ietf-6man-enhanced-dad-13.
This basically adds a random nonce option (RFC 3971) to NS messages
for DAD probe to detect a looped back packet. This looped back packet
prevented DAD on some pseudo-interfaces which aggregates multiple L2 links
such as lagg(4).
The length of the nonce is set to 6 bytes. This algorithm can be disabled by
setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis.
Reported by: hiren
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D1835
This is a QCA9558 SoC (2ghz 3x3) with an atheros 11ac PCIe 5GHz 3x3
NIC and an AR8327 gigabit ethernet switch.
TODO:
* The AR8327 gigabit switch support bugfixes are forthcoming.
* 11ac support and 11ac NIC support
This is enough to bring up the basic SoC support.
What works thus far:
* The mips74k core, pll setup, and UART (or else well, stuff would
be really difficult..)
* both USB 2.0 EHCI controllers
* on-board 2GHz 3x3 wifi (the other variant has 2GHz/5GHz wifi on-chip);
* arge0 - not yet sure why arge1 isn't firing off interrupts and thus
handling traffic, but I will soon figure it out and fix it here.
Tested:
* AP135 reference design, QCA9558 SoC, pretending to be an 11n
2GHz AP.
TODO:
* There's an interrupt mux hooking up devices to IP2 and IP3 - but it's
not a read-and-clear or write-to-clear register. So, trying to use it
naively like I have been ends up with massive interrupt storms.
For now the things that share those interrupts can just take them as
shared interrupts and try to play nice.
* There's two PCIe root complexes /and/ one of them can actually be
a PCIe device endpoint. Yes, you heard right. I have to teach the
AR724x PCIe bridge code to handle multiple instances with multiple
memory/irq regions, and then there'll be RC support, but EP support
isn't on my TODO list.
* I'm not sure why arge1 isn't up and running. I'll go figure that
out soon and fix it here.
Thankyou to Qualcomm Atheros for providing me with hardware and
an abundance of documentation about these things.
the CPU nexus.
* Add ahb as a possible bus attachment
* Lay a comment down to remind me or whoever else ends up trying
to debug why the EEPROM isn't mapped in as to what's going on.
There's two EHCI controllers in the QCA955x SoCs - they have different
interrupts available via various demux registers, but they both tie to
IP3.
So for now, allow them to be sharable so they can hang off of IP3.
error cases. Calling brelse() with a NULL pointer is not allowed,
so only call brelse() when the bp is non-NULL.
Reported by: Maxime Villard (reported as uninitialized variable)
Fix an extremely subtle concurrency bug triggered by running on 32-thread
POWER8 systems. During thread switch, there was a very small window when
the stack pointer was set to the stack pointer of the outgoing thread, but
after the lock on that thread had already been released.
If, during that window, the outgoing thread were rescheduled on another CPU
and begin execution and an exception were taken on the original CPU, the
trap handler and the outgoing thread would simultaneously execute on the same
stack, causing memory corruption. Fix this by making sure to release the
old thread only after cpu_switch() is done with its stack.
MFC after: 2 weeks
I noticed that openwrt/linux does this, citing "instability", so
until they figure out why I'm going to disable it here as well.
Tested:
* QCA AP135 - QCA955x SoC + AR8327 switch.
So, it turns out that the AR8327 has 7 ports internally:
* GMAC0 / external (CPU) MAC0
* GMAC1 / port1 -> GMAC5 / port5: external switch port PHYs
* GMAC6 / external (CPU) MAC1
Now, depending upon how things are wired up, the second CPU port (MAC1)
can be wired to either the switch (port6), or through port5's PHY, bypassing
the GMAC+switch entirely. Ie, it can pretend to be a boring PHY, saving
system designers from having to include a separate PHY for a "WAN" port.
Here's the rub - the AP135 board (QCA955x SoC) hooks up arge0 to
the second CPU port on the AR8327, but it's hooked up as RGMII.
So, in order to hook it up to the rest of the switch, it isn't configured
as a separate PHY - OpenWRT has it setup as connected via RGMII to
GMAC6 and (I'm guessing) it's set to be a WAN port by configuring up
port-based VLANs or something.
Thus, with a port mask of 0x3f, GMAC6 was never allowed to receive traffic
from any other port. It could transmit fine, but not receive anything.
So, now it works enough for me to continue doing board bootstrapping.
Note, this isn't enough to make the QCA955x + AR8327 work - there's
a bunch of uncommitted work to both the platform SoC (interrupt handling,
ethernet, etc) and the ethernet switch (register access space, setup, etc)
that needs to happen. However, this particular change is also relevant to
other SoCs, like the AR934x and AR7161, both of which can be glued to
this switch.
Tested:
* AP135 development board
TODO:
* Figure out whether I can somehow abuse another port mode to have this
be a pass-through PHY, or whether I should just create some more boot
time hints to explicitly set up port-based isolation so this works
in a more useful way by default.
The main purpose of this feature is to be able to unload a KMS driver.
When going back from the current vt(4) backend to the previous backend,
the previous backend is reinitialized with the special VDF_DOWNGRADE
flag set. Then the current driver is terminated with the new "vd_fini"
callback.
In the case of vt_fb and vt_vga, this allows the former to pass the
vgapci device vt_fb used to vt_vga so the device can be rePOSTed.
Differential Revision: https://reviews.freebsd.org/D687
expected to return the data in the memory location pointed at by target
after the operation. The FreeBSD atomic functions previously used return
either 0 or 1 to indicate if the comparison succeeded or not respectively.
With this change these functions only support ARMv6 and later are supported
by these functions.
Sponsored by: ABT Systems Ltd
There's a lot more to come - the QCA955x has a bunch more GPIO MUX
configuration, reminiscent of what the ARM chips let you do - but
it'll have to come later.
Change the numeric value of IPI_STOP_HARD so it doesn't occupy a valid IPI
slot. This can be done because IPI_STOP_HARD is actually delivered via NMI.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D1983
Pass all SR-IOV configuration to the kernel using an nvlist. The
main benefit that this offers is flexibility. It allows a driver
to accept any number of parameters of any type supported by the
SR-IOV configuration infrastructure with having to make any
changes outside of the driver.
It also offers the user very fine-grained control over the
configuration of the VFs -- if they want, they can have different
configuration applied to every VF.
Differential Revision: https://reviews.freebsd.org/D82
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Add a function that validates that the user-provided SR-IOV
configuration is valid. This includes basic checks that the
structure of the configuration is correct (e.g. all required
configuration nodes are present) as well as validating against
a configuration schema.
The schema validation consists of:
- Ensuring that all required config parameters are present.
- If the schema defines a default value for a parameter,
adding the default value if the parameter is not set.
- Ensuring that no parameters are specified in the config
that are not defined in the schema.
- Ensuring that have the correct type defined in the schema.
- Ensuring that no configuration nodes are present for devices
that do not exist. For example, if 2 VFs are configured,
then we validate that a node called VF-5 does not exist.
Differential Revision: https://reviews.freebsd.org/D81
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
When creating VFs, we must size each SR-IOV BAR on the PF and
allocate a configuous I/O memory window large enough for every VF.
However, the window only needs to be aligned to a boundary equal
to the size of the window for a single VF.
When a VF attempts to allocate an I/O memory resource, we must
intercept the request in the pci driver and pass it off to the
SR-IOV code, which will allocate the correct window from the
pre-allocated memory space for the PF.
Inform the pci driver about the size and address of the BARs on
the VF when the VF is created. This is required by pciconf -b and
bhyve.
Differential Revision: https://reviews.freebsd.org/D78
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
The SR-IOV standard requires VFs to read all-ones when the VID
and DID registers are read. The VMM (hypervisor) is required to
emulate them instead. Make pci_read_config() do this emulation.
Change pci_user.c to use pci_read_config() to read config space
registers instead of going directly to the pcib so that the
emulated VID/DID registers work correctly on VFs. This is
required both for pciconf and bhyve PCI passthrough.
Differential Revision: https://reviews.freebsd.org/D77
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Implement the interace to create SR-IOV Virtual Functions (VFs).
When a driver registers that they support SR-IOV by calling
pci_setup_iov(), the SR-IOV code creates a new node in /dev/iov
for that device. An ioctl can be invoked on that device to
create VFs and have the driver initialize them.
At this point, allocating memory I/O windows (BARs) is not
supported.
Differential Revision: https://reviews.freebsd.org/D76
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Allow the ppt driver to attach to devices that were hinted to be
passthrough devices by the PCI code creating them with a driver
name of "ppt".
Add a tunable that allows the IOMMU to be forced to be used. With
SR-IOV passthrough devices the VFs may be created after vmm.ko is
loaded. The current code will not initialize the IOMMU in that
case, meaning that the passthrough devices can't actually be used.
Differential Revision: https://reviews.freebsd.org/D73
Reviewed by: neel
MFC after: 1 month
Sponsored by: Sandvine Inc.
Refactor PCI resource allocation code to allow a request for a
memory-mapped I/O window that is a multiple of a requested size.
This is needed by the SR-IOV code because the VF BARs are all
allocated contiguously. We can't just allocate a resource that is
a multiple of a single VF BAR because the size of an allocation
implies its alignment requirement.
Differential Revision: https://reviews.freebsd.org/D71
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
Refactor creation of PCI devices into helper methods that can be
used by the VF creation code.
Differential Revision: https://reviews.freebsd.org/D67
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
watchdog.c does an #ifdef DDB but does not #include "opt_ddb.h".
Fixing this turned up a missing include file.
MFC after: 1 week
X-MFC-With: r261495, r279410
When sendfile_getobj() is called on a DTYPE_SHM file, it never
initializes error, which is eventually returned to the caller.
Differential Revision: https://reviews.freebsd.org/D1989
Reviewed by: kib
Reported by: Brainy Code Scanner, by Maxime Villard.
property for devices that doesn't descend directly from gpiobus.
The parser supports multiple pins, different GPIO controllers and can use
arbitrary names for the property (to match the many linux variants:
cd-gpios, power-gpios, wp-gpios, etc.).
Pass the driver name on ofw_gpiobus_add_fdt_child(). Update gpioled to
match.
An usage example of ofw_gpiobus_parse_gpios() will follow soon.
x2APIC mode is detected and enabled. Current theory is that switching
the APIC mode while an IPI is in flight might be the issue.
Postpone switching to x2APIC mode until we are guaranteed that all
starting IPIs are already send and aknowledged. Use aps_ready signal
as an indication that the BSP is done with us.
Tested by: adrian
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
Do not ever return doomed vnode from lookup. This could happen, if
not checked, since dvp is relocked in the 'looking up ourselves' case.
In the other case, since dvp is relocked, mount point might go away
while fdesc_allocvp() is called. Prevent the situation by doing
vfs_busy() before unlocking dvp. Reuse the vn_vget_ino_gen() helper.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
I2C real-time clock (RTC).
The DS3231 has an integrated temperature-compensated crystal oscillator
(TXCO) and crystal.
DS3231 has a temperature sensor, an independent 32kHz output (which can be
turned on and off by the driver) and another output that can be used as
interrupt for alarms or as a second square-wave output, which frequency and
operation mode can be set by driver sysctl(8) knobs.
Differential Revision: https://reviews.freebsd.org/D1016
Reviewed by: ian, rpaulo
Tested on: Raspberry pi model B
The current GSO implementation in netback is broken and causes errors on the
guest tx path. While this is fixed disable GSO in order to have a working
netback.
Sponsored by: Citrix Systems R&D
Discussed with: gibbs
The optimization made in r239940 is valid for struct mbuf's current structure
and size in FreeBSD, but hardcodes assumptions about sizes of struct mbuf,
which are unfortunately broken if additional data is added to the beginning of
struct mbuf
X-MFC note (discussed with rwatson):
This change requires the MPKTHSIZE definition, which is only available after
head@r277203 and will not be MFCed as it breaks mbuf(9) KPI.
A direct commit to stable/10 and merges to other branches to add the necessary
definitions to work with the code as-is will be done to facilitate this MFC
PR: 194314
MFC after: 2 weeks
Approved/Reviewed by: erj, jfv
Sponsored by: EMC / Isilon Storage Division
currently a spin lock. Apparently, the only reason for this is that
umtx_thread_exit() is called under the process spinlock, which put the
requirement on the umtx_lock. Note that the witness static order list
is wrong for the umtx_lock, umtx_lock is explicitely before any thread
lock, so it is also before sleepq locks.
Change umtx_lock to be the sleepable mutex. For the reason above, the
calls to umtx_thread_exit() are moved from thread_exit() earlier in
each caller, when the process spin lock is not yet taken.
Discussed with: jhb
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
from the git tree. This merges a lot that we're not using, but there's
too many files to be selective and have a hope of catching everything.
If there are conflicts with the rest of the tree, we'll resolve them
on a case by case basis.
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
This will override the resource allocation of simplebus, and also
merge the resource allocation code which was in xlp_pci.c.
With this change the SoC devices that does not have proper PCI
resources will be on the FDT simplebus. We can remove
sys/mips/nlm/dev/cfi_pci_xlp.c and sys/mips/nlm/dev/uart_pci_xlp.c
tracking.
It is important to subtract the residual from the requested
transfer size to see how much data was actually transferred. With
tape drives in particular, it is common to request more data than is
returned.
Also, add I/O latency tracking for CAM requests issued by
cam_periph_runccb().
If the caller supplies a struct devstat, and the I/O is a SCSI or
ATA I/O, we will track the elapsed time to provide I/O latency
statistics for the request.
sys/cam/scsi/cam_periph.c:
In cam_periph_runccb(), subtract the residual when reporting I/O
totals to devstat(9) for SCSI and ATA passthrough requests.
In cam_periph_runccb(), grab the I/O start time and supply
the start time to devstat_end_transaction() so that it can
calculate the elapsed I/O time.
Sponsored by: Spectra Logic
MFC after: 1 week
consistently. This also matches the per-cpu pointer declaration
anyway.
This changes the tweak we give to the load from -32..31 to be 0..31
which seems more inline with the rest of the code (- rnd and the -=
64). It should also provide the randomness we need, and may fix a
signedness bug in the old code (it isn't clear that the effect was
intentional as opposed to sloppy, and the right shift of a signed
value is undefined to boot).
This stores sched_balance() behavior when it used random().
Differential Revision: https://reviews.freebsd.org/D1981
Provide sys/dev/fdt/simplebus.h with the class declaration so that it
is possible to subclass FDT simplebus.
Differential Revision: https://reviews.freebsd.org/D1886
Reviewed by: nwhitehorn, imp
prevent errors from yanking devices out from under filesystems. Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.
Sponsored by: EMC / Isilon Storage Division
Submitted by: Conrad Meyer
MFC after: 1 week
jail's creation parameters. This allows the kernel version to be reliably
spoofed within the jail whether examined directly with sysctl or
indirectly with the uname -r and -K options.
The values can only be set at jail creation time, to eliminate the need
for any locking when accessing the values via sysctl.
The overridden values are inherited by nested jails (unless the config for
the nested jails also overrides the values).
There is no sanity or range checking, other than disallowing an empty
release string or a zero release date, by design. The system
administrator is trusted to set sane values. Setting values that are
newer than the actual running kernel will likely cause compatibility
problems.
Differential Revision: https://reviews.freebsd.org/D1948
Relnotes: yes
Summary:
For new eMMC chips, we must signal controller HC capability in OP_COND command.
Reviewers: imp, ian
Reviewed By: ian
Differential Revision: https://reviews.freebsd.org/D1920
the cache line flush in the LAPIC page, keep direct map page covering
LAPIC mapped uncached.
To have the (incomplete) check for the LAPIC range in
pmap_invalidate_cache_range() working, lapic_paddr must be initialized
in x2APIC mode too.
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
With the patch applied the number of instruction events is 1% less and
number of mispredicted branch events is 5% less under multistream TCP
traffic load close to line rate.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
configured as 11b.
This came up when debugging other issues surrounding scanning and
channel modes.
What's going on:
* The VAP comes up as an 11b VAP, but on an 11n capable NIC;
* .. it announces HTINFO and MCS rates;
* The AP thinks it's an 11n capable device and transmits 11n frames
to the STA;
* But the STA is in 11b mode, and thus doesn't receive/ACK the frames.
It didn't happen for the ath(4) devices as the AR5416/AR9300 HALs
unconditionally enable MCS frame reception, even if the channel
mode is not 11n. But the Intel NICs are configured in 11b/11a/11g
modes when doing those, even if 11n is enabled and available.
So, don't announce 11n capabilities if the VAP isn't on an 11n
channel when sending management assocation request / reassociation
request frames.
TODO:
* Lots more testing - 11n should be "upgraded" after association,
and I just want to make sure I haven't broken 11n upgrade.
I shouldn't have - this is only happening for /sending/ association
requests, which APs aren't doing.
Tested:
* ath(4) APs (AR9331, AR7161+AR9280, AR934x)
* AR5416, STA mode
* Intel 5100, STA mode
PR: kern/196290
we need randomness in ULE. This removes random() call from the
rebalance interval code.
Submitted by: Harrison Grundy
Differential Revision: https://reviews.freebsd.org/D1968
Handling some interrupts in XLP (like PCIe and SATA) involves writing to
vendor specific registers as part of interrupt acknowledgement.
This was earlier done with xlp_establish_intr(), but a better solution
is to provide a function xlp_set_bus_ack() that can be used with
cpu_establish_hardintr(). This will allow platform initialization code to
setup these ACKs without changing the standrard drivers.
CDAI_FLAG_NONE advanced information CCB flag.
Support for the flag was merged to stable/10 in r279329, and the
__FreeBSD_version in stable/10 was bumped to 1001510.
Check for that version in the mps(4) and mpr(4) drivers when determining
whether to use the flag.
Sponsored by: Spectra Logic
MFC after: 3 days
r278854 introduced a race in the event channel handling code. We must make
sure that the pending bit is cleared before executing the filter, or else we
might miss other events that would be injected after the filter has ran but
before the pending bit is cleared.
While there also mask event channels while FreeBSD executes the ithread
bound to that event channel. This refrains Xen from injecting more
interrupts while the ithread has not finished it's work.
Sponsored by: Citrix Systems R&D
Reported by: sbruno, robak
Tested by: robak
level-triggered interrupt does not broadcast the EOI message to all
APICs in the system. Instead, interrupt handler must follow LAPIC EOI
with IOAPIC EOI. For modern IOAPICs, the later is done by writing to
EOIR register. Otherwise, Intel provided Linux with a trick of
temporary switching the pin config to edge and then back to level.
Detect presence of EOIR register by reading IO-APIC version. The
summary table in the comments was taken from the Linux kernel. For
Intel, newer IO-APICs are only briefly documented as part of the
ICH/PCH datasheet. According to the BKDG and chipset documentation,
AMD LAPICs do not provide EOI suppression, althought IO-APICs do
declare version 0x21 and implement EOIR.
The trick to temporary switch pin to edge mode to clear IRR was tested
on modern chipset, by pretending that EOIR is not present, i.e. by
forcing io_haseoi to zero.
Tunable hw.lapic_eoi_suppression disables the optimization.
Reviewed by: neel
Tested by: pho
Review: https://reviews.freebsd.org/D1943
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
Gather all the IRQ definitions to interrupt.h. Earlier these were in xlp.h
and pic.h. Update the definition of XLP_IRQ_IS_PICINTR to check for last
irq as well.
declares support for it. Newer versions of Xen works fine with x2APIC
code, but e.g. Xen 4.2 delivers GPF on the LAPIC MSR write, despite
x2APIC mode being known to hypervisor.
Discussed with: royger
Sponsored by: The FreeBSD Foundation
to its previous, unowned state. This avoids compounding an existing
problem of inconsistent ownership.
Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com>
Obtained from: Dell Inc.
PR: 198914
MFC after: 1 week
is empty, look up the umtx_pi and disown it if the current thread owns it.
This can happen if a signal or timeout removed the last waiter from
the queue, but there is still a thread in do_lock_pi() holding a reference
on the umtx_pi. The unlocking thread might not own the umtx_pi in this case,
but if it does, it must disown it to keep the ownership consistent between
the umtx_pi and the umutex.
Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com>
with advice from: Elliott Rabe and Jim Muchow, also at Dell Inc.
Obtained from: Dell Inc.
PR: 198914
of packets. When the data payload length excluding any headers, of an
outgoing IPv4 packet exceeds PAGE_SIZE bytes, a special case in
ip_fragment() can kick in to optimise the outgoing payload(s). The
code which was added in r98849 as part of zero copy socket support
assumes that the beginning of any MTU sized payload is aligned to
where a MBUF's "m_data" pointer points. This is not always the case
and can sometimes cause large IPv4 packets, as part of ping replies,
to be split more than needed.
Instead of iterating the MBUFs to figure out how much data is in the
current chain, use the value already in the "m_pkthdr.len" field of
the first MBUF in the chain.
Reviewed by: ken @
Differential Revision: https://reviews.freebsd.org/D1893
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
dates.
- Changed all of the PCI device strings from LSI to Avago Technologies (LSI).
- Added a sysctl variable to control how StartStopUnit behavior works. User can
select to spin down disks based on if disk is SSD or HDD.
- Inquiry data is required to tell if a disk will support SSU at shutdown or
not. Due to the addition of mpssas_async, which gets Advanced Info but not
Inquiry data, the setting of supports_SSU was moved to the
mpssas_scsiio_complete function, which snoops for any Inquiry commands. And,
since disks are shutdown as a target and not a LUN, this process was
simplified by basing it on targets and not LUNs.
- Added a sysctl variable that sets the amount of time to retry after sending a
failed SATA ID command. This helps with some bad disks and large disks that
require a lot of time to spin up. Part of this change was to add a callout to
handle timeouts with the SATA ID command. The callout function is called
mpssas_ata_id_timeout(). (Fixes PR 191348)
- Changed the way resets work by allowing I/O to continue to devices that are
not currently under a reset condition. This uses devq's instead of simq's and
makes use of the MPSSAS_TARGET_INRESET flag. This change also adds a function
called mpssas_prepare_tm().
- Some changes were made to reduce code duplication when getting a SAS address
for a SATA disk.
- Fixed some formatting and whitespace.
- Bump version of mps driver to 20.00.00.00-fbsd
PR: 191348
Reviewed by: ken, scottl
Approved by: ken, scottl
MFC after: 2 weeks
this change is to improve concurrency:
- Drop global state stored in the shadow overflow page table (and all other
global state)
- Remove all global locks
- Use per-PTE lock bits to allow parallel page insertion
- Reconstruct state when requested for evicted PTEs instead of buffering
it during overflow
This drops total wall time for make buildworld on a 32-thread POWER8 system
by a factor of two and system time by a factor of three, providing performance
20% better than similarly clocked Core i7 Xeons per-core. Performance on
smaller SMP systems, where PMAP lock contention was not as much of an issue,
is nearly unchanged.
Tested on: POWER8, POWER5+, G5 UP, G5 SMP (64-bit and 32-bit kernels)
Merged from: user/nwhitehorn/ppc64-pmap-rework
Looked over by: jhibbits, andreast
MFC after: 3 months
Relnotes: yes
Sponsored by: FreeBSD Foundation
It is disabled by default but users can set IFCAP_TXCSUM on the
netmap ifnet (ifconfig ncxl0 txcsum) to override netmap and force
the hardware to calculate and insert proper IP and L4 checksums in
outbound frames.
MFC after: 2 weeks
Previous __alignment(4) allowed compiler to assume that operations are
performed on aligned region. On ARM processor, this led to alignment fault
as shown below:
trapframe: 0xda9e5b10
FSR=00000001, FAR=a67b680e, spsr=60000113
r0 =00000000, r1 =00000068, r2 =0000007c, r3 =00000000
r4 =a67b6826, r5 =a67b680e, r6 =00000014, r7 =00000068
r8 =00000068, r9 =da9e5bd0, r10=00000011, r11=da9e5c10
r12=da9e5be0, ssp=da9e5b60, slr=a054f164, pc =a054f2cc
<...>
udp_input+0x264: ldmia r5, {r0-r3, r6}
udp_input+0x268: stmia r12, {r0-r3, r6}
This was due to instructions which do not support unaligned access,
whereas for __alignment(2) compiler replaced ldmia/stmia with some
logically equivalent memcpy operations.
In fact, the assumption that 'struct ip' is always 4-byte aligned
is definitely false, as we have no impact on data alignment of packet
stream received.
Another possible solution would be to explicitely perform memcpy()
on objects of 'struct ip' type, which, however, would suffer from
performance drop, and be merely a problem hiding.
Please, note that this has nothing to do with
ARM32_DISABLE_ALIGNMENT_FAULTS option, but is related strictly to
compiler behaviour.
Submitted by: Wojciech Macek <wma@semihalf.com>
Reviewed by: glebius, ian
Obtained from: Semihalf
code.
Resurrect the state field in the struct secpolicy, it has
IPSEC_SPSTATE_ALIVE value when security policy linked in the chain,
and IPSEC_SPSTATE_DEAD value in all other cases. This field protects
from trying to unlink one security policy several times from the different
threads.
Take additional reference in the key_flush_spd() to be sure that policy
won't be freed from the different thread while we are sending SPDEXPIRE message.
Add KEY_FREESP() call to the key_unlink() to release additional reference
that we take when use key_getsp*() functions.
Differential Revision: https://reviews.freebsd.org/D1914
Tested by: Emeric POUPON <emeric.poupon at stormshield dot eu>
Reviewed by: hrs
Sponsored by: Yandex LLC
when re-enumerating a FULL speed device. Else the wrong max packet
setting might be used when trying to re-enumerate a FULL speed device.
MFC after: 3 days
Preliminary tests indicate 32 Mpps on tx, 24 Mpps on rx
with source and receiver on two different ports of the same 40G card.
Optimizations are likely possible.
The code follows closely the one for ixgbe so i do not
expect stability issues.
Hardware kindly supplied by Intel.
Reviewed by: Jack Vogel
MFC after: 1 week