Update of common code to provide a query on the MAC_SPOOFING_TX and
CHANGE_MAC privileges instead of the deprecated MAC_SPOOFING privilege.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4436
from 1500 to 1496 bytes. The MTU should remain at 1500, extending the
frame size as per IEEE 802.3. Adding IFCAP_VLAN_MTU to the
if_capabilities field in the smsc driver solves the problem. The
datasheet for the LAN9512 chip, section 3.2.3 states that the chip
supports the extended frame.
Submitted by: rpp@ci.com.au
MFC after: 1 week
PR: 205050
include the following list of changes:
- Added eswitch ACL table management
Introduce API for managing ACL table.
This API include the following features:
1) vlan filter - for VST/VGT+ support.
2) spoofcheck.
3) robust functionality to allow/drop general untagged/tagged traffic.
4) support for both ingress and egress ACL types.
- Added loopback filter to the vacl table.
- Added multicast list set in the vPort context
- Added promiscuous mode set in the vPort context
- Set the vlan list in vPort context
1) Check caps if VLAN list is not longer than FW supports
2) Set MODIFY_NIC_VPORT_CONTEXT command
- Changed MLX5_EEPROM_MAX_BYTES from 48 to 32 so that a single EEPROM
reading cannot cross the 128-byte boundary. Previously reading the
MCIA register was done in batches of 48 bytes. The third reading
would then by-pass the 127th byte, which means that part of the low
page and part of the high page would be read at the same time, which
created a bug:
1st: 0-47 bytes
2nd: 48-95 bytes
3rd: 96-143 bytes
MFC after: 1 week
Sponsored by: Mellanox Technologies
Differential Revision: https://reviews.freebsd.org/D4411
driver. This includes binding all interrupt and worker threads
according to the RSS configuration, setting up correct Toeplitz
hashing keys as given by RSS and setting the correct mbuf
hashtype for all received traffic.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Differential Revision: https://reviews.freebsd.org/D4410
completion events can be moderated in the same way like RX completion
events. Expose this functionality by a sysctl variable.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Differential Revision: https://reviews.freebsd.org/D4409
Set the port MTU and then query it and report if any problems instead.
MFC after: 1 week
Submitted by: Shahar Klein <shahark@mellanox.com>
Sponsored by: Mellanox Technologies
Differential Revision: https://reviews.freebsd.org/D4408
TLV routines use 'uint8_t *', NVRAM code uses caddr_t. Just cast to
required type to fix the warning.
Required to build with -Werror=pointer-signg.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4391
The header is not present on FreeBSD, but exists on OmniOS where sfxge
common code is used as well.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4390
It is better do not mix TxQ creation and receive event flags since only
checksum flags are applicable to TxQ.
Also it will allow to add a new TxQ creation specific flags.
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4389
When writing the MUM firmware the chunk size must be equal to the erase
size.
Submitted by: Laurence Evans <levans at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4388
Use flag on vadapter alloc when reported as a supported capability.
Use the slow device reset only when the capability is missing.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4387
iscsi's shutdown_pre_sync prio was SHUTDOWN_PRI_FIRST which caused it to
run before other high priority handlers such as filesystems e.g. ZFS.
This meant the iscsi sessions where removed before the ZFS geom consumer
was closed, resulting in a panic from g_access calls on debug kernels
due to negative acr.
Instead use the same as the old iscsi_initiator SHUTDOWN_PRI_DEFAULT-1
which allows it to run before dashutdown etc but after filesystems.
MFC after: 2 weeks
Sponsored by: Multiplay
The erase size is reported by the nvram info command.
Submitted by: Paul Fox <pfox at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4386
- Restore R92C_TXDW4_HWSEQ_EN bit - it is used by non-8188EU chips.
- Fix DRVRATE bit usage.
Tested with:
- RTL8188EU, STA mode.
- RTL8188CUS, STA mode.
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4352
Required to build with -Werror=unused-but-set-variable.
Keep it under #if 0 as a reminder for parse error processing.
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
The SFL9122 "Huntington" controller was never built.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Submitted by: Artem V. Andreev <Artem.Andreev at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4355
If, for example, a VF is configured to use a 1500 byte MTU, but the port
it is attached to is set to 9000 bytes, overlength frames can be received
by the VF. As Huntington scatters by default, these overlength packets
would be scattered across several descriptors, with all except the last
having the CONT bit set.
To avoid this, disable scatter when creating RXQs if the firmware
supports doing so, which all recent versions do. Then we only get
a single descriptor from an overlength frame. This will have the CONT
bit set to indicate it was truncated, so we can discard it.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4354
Submitted by: Paul Fox <pfox at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4353
camdd(8) utility.
CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl. User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.
While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists. This allows user applications to have more
flexibility in their data handling operations.
Kernel memory for data transferred via the queued interface is
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out. This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.
The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS. The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.
There are some things things would be good to add:
1. Come up with a way to do unmapped I/O on multiple buffers.
Currently the unmapped I/O interface operates on a struct bio,
which includes only one address and length. It would be nice
to be able to send an unmapped scatter/gather list down to
busdma. This would allow eliminating the copy we currently do
for data.
2. Add an ioctl to list currently outstanding CCBs in the various
queues.
3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
that.
4. Test physical address support. Virtual pointers and scatter
gather lists have been tested, but I have not yet tested
physical addresses or scatter/gather lists.
5. Investigate multiple queue support. At the moment there is one
queue of commands per pass(4) device. If multiple processes
open the device, they will submit I/O into the same queue and
get events for the same completions. This is probably the right
model for most applications, but it is something that could be
changed later on.
Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.
This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.
It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and ouptut devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.
It can also copy data to and from regular files, block devices, tape
devices, pipes, stdin, and stdout. It does not support queueing
multiple commands to any of those targets, since it uses the standard
read(2)/write(2)/writev(2)/readv(2) system calls.
The I/O is done by two threads, one for the reader and one for the
writer. The reader thread sends completed read requests to the
writer thread in strictly sequential order, even if they complete
out of order. That could be modified later on for random I/O patterns
or slightly out of order I/O.
camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
the pass(4) driver and also to send request notifications internally.
For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR)
per CAM CCB on the reading side, and a scatter/gather list
(CAM_DATA_SG) on the writing side. In addition to testing both
interfaces, this makes any potential reblocking of I/O easier. No
data is copied between the reader and the writer, but rather the
reader's buffers are split into multiple I/O requests or combined
into a single I/O request depending on the input and output blocksize.
For the file I/O path, camdd(8) also uses a single buffer (read(2),
write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
(readv(2), writev(2), preadv(2), pwritev(2)) on writes.
Things that would be nice to do for camdd(8) eventually:
1. Add support for I/O pattern generation. Patterns like all
zeros, all ones, LBA-based patterns, random patterns, etc. Right
Now you can always use /dev/zero, /dev/random, etc.
2. Add support for a "sink" mode, so we do only reads with no
writes. Right now, you can use /dev/null.
3. Add support for automatic queue depth probing, so that we can
figure out the right queue depth on the input and output side
for maximum throughput. At the moment it defaults to 6.
4. Add support for SATA device passthrough I/O.
5. Add support for random LBAs and/or lengths on the input and
output sides.
6. Track average per-I/O latency and busy time. The busy time
and latency could also feed in to the automatic queue depth
determination.
sys/cam/scsi/scsi_pass.h:
Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
and fetch asynchronous CAM CCBs respectively.
Although these ioctls do not have a declared argument, they
both take a union ccb pointer. If we declare a size here,
the ioctl code in sys/kern/sys_generic.c will malloc and free
a buffer for either the CCB or the CCB pointer (depending on
how it is declared). Since we have to keep a copy of the
CCB (which is fairly large) anyway, having the ioctl malloc
and free a CCB for each call is wasteful.
sys/cam/scsi/scsi_pass.c:
Add asynchronous CCB support.
Add two new ioctls, CAMIOQUEUE and CAMIOGET.
CAMIOQUEUE adds a CCB to the incoming queue. The CCB is
executed immediately (and moved to the active queue) if it
is an immediate CCB, but otherwise it will be executed
in passstart() when a CCB is available from the transport layer.
When CCBs are completed (because they are immediate or
passdone() if they are queued), they are put on the done
queue.
If we get the final close on the device before all pending
I/O is complete, all active I/O is moved to the abandoned
queue and we increment the peripheral reference count so
that the peripheral driver instance doesn't go away before
all pending I/O is done.
The new passcreatezone() function is called on the first
call to the CAMIOQUEUE ioctl on a given device to allocate
the UMA zones for I/O requests and S/G list buffers. This
may be good to move off to a taskqueue at some point.
The new passmemsetup() function allocates memory and
scatter/gather lists to hold the user's data, and copies
in any data that needs to be written. For virtual pointers
(CAM_DATA_VADDR), the kernel buffer is malloced from the
new pass(4) driver malloc bucket. For virtual
scatter/gather lists (CAM_DATA_SG), buffers are allocated
from a new per-pass(9) UMA zone in MAXPHYS-sized chunks.
Physical pointers are passed in unchanged. We have support
for up to 16 scatter/gather segments (for the user and
kernel S/G lists) in the default struct pass_io_req, so
requests with longer S/G lists require an extra kernel malloc.
The new passcopysglist() function copies a user scatter/gather
list to a kernel scatter/gather list. The number of elements
in each list may be different, but (obviously) the amount of data
stored has to be identical.
The new passmemdone() function copies data out for the
CAM_DATA_VADDR and CAM_DATA_SG cases.
The new passiocleanup() function restores data pointers in
user CCBs and frees memory.
Add new functions to support kqueue(2)/kevent(2):
passreadfilt() tells kevent whether or not the done
queue is empty.
passkqfilter() adds a knote to our list.
passreadfiltdetach() removes a knote from our list.
Add a new function, passpoll(), for poll(2)/select(2)
to use.
Add devstat(9) support for the queued CCB path.
sys/cam/ata/ata_da.c:
Add support for the BIO_VLIST bio type.
sys/cam/cam_ccb.h:
Add a new enumeration for the xflags field in the CCB header.
(This doesn't change the CCB header, just adds an enumeration to
use.)
sys/cam/cam_xpt.c:
Add a new function, xpt_setup_ccb_flags(), that allows specifying
CCB flags.
sys/cam/cam_xpt.h:
Add a prototype for xpt_setup_ccb_flags().
sys/cam/scsi/scsi_da.c:
Add support for BIO_VLIST.
sys/dev/md/md.c:
Add BIO_VLIST support to md(4).
sys/geom/geom_disk.c:
Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size
limiting code in g_disk_start() a bit.
sys/kern/subr_bus_dma.c:
Change _bus_dmamap_load_vlist() to take a starting offset and
length.
Add a new function, _bus_dmamap_load_pages(), that will load a list
of physical pages starting at an offset.
Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios.
Allow unmapped I/O to start at an offset.
sys/kern/subr_uio.c:
Add two new functions, physcopyin_vlist() and physcopyout_vlist().
sys/pc98/include/bus.h:
Guard kernel-only parts of the pc98 machine/bus.h header with
#ifdef _KERNEL.
This allows userland programs to include <machine/bus.h> to get the
definition of bus_addr_t and bus_size_t.
sys/sys/bio.h:
Add a new bio flag, BIO_VLIST.
sys/sys/uio.h:
Add prototypes for physcopyin_vlist() and physcopyout_vlist().
share/man/man4/pass.4:
Document the CAMIOQUEUE and CAMIOGET ioctls.
usr.sbin/Makefile:
Add camdd.
usr.sbin/camdd/Makefile:
Add a makefile for camdd(8).
usr.sbin/camdd/camdd.8:
Man page for camdd(8).
usr.sbin/camdd/camdd.c:
The new camdd(8) utility.
Sponsored by: Spectra Logic
MFC after: 1 week
Note that the MW allocation still must be BAR *aligned*. So, this only
loosens the constraints on MW allocation slightly. BAR-aligned does not
play well with large (GB+) BAR sizes.
Going forward, if anyone cares about if_ntb on very large BARs, I
suggest they add functionality to allocate a smaller window than the BAR
size, and set the BAR range to cover a window much larger than the
allocated window. This will require negotiating a window offset and
limit for protocol traffic. None of this is implemented in this
revision.
Sponsored by: EMC / Isilon Storage Division
typically memory mapped bus, for example on the AMD Opteron A1100 the AHCI
device is mapped in the CPUs address space, and not through a PCI
controller.
Further work is needed for this to work with ACPI as this is expected to be
common on ARMv8 servers.
Reviewed by: mav, mmel
Obtained from: mmel, ABT Systems Ltd
Relnotes: yes
Sponsored by: SoftIron Inc
Differential Revision: https://reviews.freebsd.org/D4269
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4331
It seems the EEE made RX MAC enter LPI(Low Power Idle) mode such
that dwc(4) was not able to receive packets. Ideally dwc(4) should
be able to use EEE to save power during periods of low link
utilization(i.e. gating off clock). Due to lack of dwc(4)
datasheet it's not easy to take required steps for EEE on LPI
enter/exit events. Disabling EEE in PHY seems to be easy
workaround until dwc(4) supports EEE.
Updating EEE advertisement register on RTL8211F seems to have no
effect until reprogramming MII_ANAR, MII_100T2CR and MII_BMCR
with auto-negotiation. It's not clear whether it's related with
mii_phy_reset()'s BMCR_ISO handling for RTL8211F though.
It seems rgephy_reset() needs careful investigation for newer
RealTek PHYs.
Ganbold submitted working version based on NetBSD change and
tested lots of changes I made. Thanks a lot!
Submitted by: ganbold (initial version)
In collaboration with: ganbold
Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port. This change allows additional non-netmap
interfaces to be configured on each port. Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.
Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).
T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).
One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are not accounted to the netmap interface.
The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.
The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats. There is currently no way to
clear the stats of an individual VI.
Reviewed by: np
MFC after: 1 month
Sponsored by: Chelsio
the firmware because the original flowc sent by t4_tom changed the
firmware-internal state of the tid. For now I'm setting MAX_DSL to 8K
and PDU accordingly. This will be tidied up with the next firmware
update that will handle multiple flowc's correctly.
After an MC reboot, a VF driver may reset before the PF driver has
finished bringing everything back up. This includes the VFs EVB port.
MC_CMD_VADAPTOR_ALLOC is the first MCDI call after an MC reboot to
require the EVB port, so if it fails with MC_CMD_ERR_NO_EVB_PORT,
retry the command a few times after waiting a while.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4333
Make link control privilege visible to OS driver to guard updates to
flow control and PHY advertised capabilities.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4330
It allows manftest to program them.
Submitted by: Paul Fox <pfox at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4329
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4328
Common code should infer other privileges from Admin privilege to
support firmware that pre-dates introduction of specific privilege
flags.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4327
Submitted by: Artem V. Andreev <Artem.Andreev at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4308
It is really observed in the case of VLAN over sfxge interface.
Also this change makes total value equal to 35 which is default assumed
by the kernel for if_hw_tsomaxsegcount.
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4319
IPv4/IPv6 checksum offloading and VLAN tag insertion/stripping.
Since uether doesn't provide a way to announce driver specific offload
capabilities to upper stack, checksum offloading support needs more work
and will be done in the future.
Special thanks to Hayes Wang from RealTek who gave input.
tested on the Broadwell-Xeon with a hacked up version of pmcstudy -T. I still need
to circle back and add in to pmcstudy all the new tests from the Broadwell Vtune
guide (for the hacked up version I just made it so I could run the -T option). The
Skylake CPU is not yet available (even though Intel is advertising it .. imagine that).
The Skylake PMC's will need to be tested once we can get a sample skylake CPU :-)
Sponsored by: Netflix Inc.
This change will fix kernel panic with uninitialized (zeroed)
RXON structure.
Tested with Intel 3945BG, IBSS mode.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4304
The synth programming here requires the real centre frequency,
which for HT20 channels is the normal channel, but HT40 is
/not/ the primary channel. Everything else was using 'freq',
which is the correct centre frequency, but the hornet config
was using 'ichan' to do the lookup which was also the primary
channel.
So, modify the HAL call that does the mapping to take a frequency
in MHz and return the channel number.
Tested:
* Carambola 2, AR9331, tested both HT/20 and HT/40 operation.
On ARM, we must ensure proper interdevice write ordering.
The AHCI interrupt status register must be updated in HW before
registers in interrupt controller.
Unfortunately, only way how we can do it is readback.
Discussed with: mav
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D4240
Using a typedef for common code return types (rather than "int")
allows the Prefast static analyser to understand when a function
has been successful (and thus when its postconditions must hold).
This greatly reduces then number of false positives reported by
prefast for error paths in common code functions.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Dynamic config partitions on boards that support RFID are divided into
a number of segments, each formatted like a partition, with header,
trailer and end tags. The first segment is the current active
configuration.
The segments are initialised by manftest and each contain a different
configuration e.g. firmware variant. The firmware can be instructed
via RFID to copy a segment over the first segment, hence changing the
active configuration. This allows ops to change the configuration of
a board prior to shipment using RFID.
Changes to the dynamic config may need to be written to all segments (in
particular firmware versions written by manftest) or just the first
segment (changes to the active configuration). See SF-111324-SW.
If only the first segment is written the code still needs to be aware of
the possible presence of subsequent segments as writing to a segment may
cause its size to increase, which would overwrite the subsequent
segments and invalidate them.
Boards that do not support RFID will only have one segment in their
dynamic config partition.
Submitted by: Paul Fox <pfox at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4302
This probe/attaches correctly in my local branch and now displays
a useful message:
ath0: <Qualcomm Atheros QCA953x> at mem 0x18100000-0x1811ffff irq 0 on nexus0
...
ath0: AR9530 mac 1280.0 RF5110 phy 0.0
If the VPD is corrupt and contains an 'RV' keyword before the
END tag, then this function could return without setting the
return code to report the error.
Found by prefast.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Keep prefast happy by returning the initial queue index
from falconsiena_tx_qcreate(). No change in behaviour, as
etxo_qcreate already zeros *addedp before the call.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
PIO is not yet supported in the FreeBSD driver.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Adjust external port mapping table to distinguish Pavia from Monza.
Now the presence of any 40G mode implies at least 2 outputs per
external port. So Pavia 4x10G ports are now mapped to 1,2,3,4;
Monza 4x10G ports map to 1,1,2,2 as before.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
GCC 4.2.1 used on FreeBSD 8 and 9 branches does not like unnamed
union member in the structure. It is not strictly required in head,
but nice to have to minimize difference with out-of-tree driver.
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
- Make scan aborted by event restart immediately and infinitely.
- Improve handling of some loop events from firmware.
- Remove loop down timer, adding its functionality to scanner thread.
- Some more unification and simplification.
I added MYBEACON support a while ago to listen to beacons that are only
for your configured BSSID. For AR9380 and later NICs this results in
a lot less chip wakeups in station mode as it then only shows you beacons
that are destined to you.
However in IBSS mode you really do want to hear all beacons so you can do
IBSS merges. Oops.
So only use MYBEACON for STA + not-scanning, and just use BEACON for
the other modes it used to use BEACON for.
This doesn't completely fix IBSS merges though - there are still some
conditions to chase down and fix.
And expose vm_memattr_t of current mapping to consumers (as well as the
ability to change it to one of UC, WB, WC).
After short discussion with: jhb (but no review)
Sponsored by: EMC / Isilon Storage Division
- Add error handling for urtwn_(r88e_)read_rom() and
urtwn_efuse_*() functions.
- Remove code duplication between urtwn_efuse_read() and
urtwn_r88e_read_rom().
- Merge r88e_rom and (r92c_)rom structures
(only one of them can be used at the same time).
- Other minor fixes / improvements.
Tested with RTL8188EU, STA mode
(URTWN_DEBUG + USB_DEBUG, hw.usb.urtwn.debug=3, no visual differences).
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4253
Adds a new tunable, ntb.hw.b2b_mw_idx, which specifies the offset (from the
total number of memory windows) to use for register access on hardware with
the SDOORBELL_LOCKUP errata. The default is -1, i.e., the last memory
window.
We map BARs before the b2b_mw_idx is selected, so map them all as memory
windows initially. The register memory window should not be write-combined,
so we explicitly disable WC on the selected MW later.
This introduces a layer of abstraction between consumer memory window
indices, which exclude any exclusive errata-workaround BARs, and internal
memory window indices, which include such BARs. An internal routine,
ntb_user_mw_to_idx(), converts the former to the latter. Public APIs have
been updated to use this instead of assuming the exclusive workaround BAR is
the last available MW.
Sponsored by: EMC / Isilon Storage Division
This should be a big no-op pass; and reduces the size of if_ath.c.
I'm hopefully soon going to take a whack at the USB support for ath(4)
and this'll require some reuse of the busdma memory code.
- In the iSCSI CPL handlers, do not rely on the ulpcb/icl_conn when in
the middle of assembling a PDU. This is so we don't have to grab
various locks and evaluate the kernel state of the connection multiple
times. Instead, the state is evaluated once after the entire PDU is
received. This requires another ULP specific item in toepcb (ulpcb2).
- If there is data in the so_rcv sockbuf of a connection in iSCSI ULP
mode it must be from before the connection got promoted to ULP mode.
Convert the contents of the sockbuf to PDUs and deliver them to ICL.
Do this before delivering the PDUs received on the "normal" ULP path.
- The receive path within ICL is allowed to sleep so it's not
appropriate to deliver PDUs to ICL from the driver's ithread, or from
any other thread with any mutex held. Use worker threads (created
back in r285650 but unused till now) to dispatch received PDUs to ICL.
Assign a worker thread to each connection. For now everything goes to
the first thread.
- Prevent various bad races that are possible when more than one of
a) rx ithread, b) worker thread, and c) icl_conn_close are active at
the same time.
It is perhaps preferable to have a separate limit for send instead of
reusing the receive limit. I'll discuss with trasz@ and mav@ before
pulling this into head.
bridges. Currently this includes information about what resources a
bridge decodes on the upstream side for use by downstream devices including
bus numbers, I/O port resources, and memory resources. Windows and bus
ranges are enumerated for both PCI-PCI bridges and PCI-CardBus bridges.
To simplify the implementation, all enumeration is done by reading the
appropriate config space registers directly rather than querying the
bridge driver in the kernel via new ioctls. This does result in a few
limitations.
First, an unimplemented window in a PCI-PCI bridge cannot be accurately
detected as accurate detection requires writing to the window base
register. That is not safe for pciconf(8). Instead, this assumes that
any window where both the base and limit read as all zeroes is
unimplemented.
Second, the PCI-PCI bridge driver in a tree has a few quirks for
PCI-PCI bridges that use subtractive decoding but do not indicate that
via the progif config register. The list of quirks is duplicated in
pciconf's source.
Reviewed by: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4171
Hacks to enable target mode there complicated code, while didn't really
work. And for outdated hardware fixing it is not really interesting.
Initiator mode tested with Qlogic 1080 adapter is still working fine.
While later firmware always registers for RSCN requests, older one does
it only in initiator mode. But in target mode there RSCN can be the only
way to detect gone intiator.
For those chips we are not receiving login events, adding initiators
based on ATIO requests. But there is no port ID in that structure, so
in fabric mode we have to explicitly fetch it from firmware to be able
to do normal scan after that.
Lower the payload data (IP) portion of the MTU from 0x10000 to
IP_MAXPACKET (0xFFFF) to avoid panicing the IP stack.
Sponsored by: EMC / Isilon Storage Division
This feature is disabled by default. To enable it, tune
hw.if_ntb.enable_xeon_watchdog to non-zero.
If enabled, writes an unused NTB register every second to demonstrate to
a hardware watchdog that the NTB device is still alive. Most machines
with NTB will not need this -- you know who you are.
Sponsored by: EMC / Isilon Storage Division
This change simplifies and unifies port adding/updating for loop and
fabric scanners. It also fixes problems with scanning restarts due to
concurrent port databases changes. It also fixes many cosmetic issues.
RX buffers from number of received packets.
Differential Revision: https://reviews.freebsd.org/D4178
Submitted by: Drew Gallatin <gallatin@freebsd.org>
Sponsored by: Mellanox Technologies
MFC after: 3 days
Setting sysctl dev....conf.hw_lro may fail if the net device lro is
turned off. Due to the nature of our sysctl handler we need to set the
values back to 0 and issue an error.
Differential Revision: https://reviews.freebsd.org/D4177
Submitted by: Shahar Klein <shahark@mellanox.com>
Sponsored by: Mellanox Technologies
MFC after: 3 days
Discard the unused rx_free_q. Instead, reuse inputed packets by putting
them back on the *pend* queue after reinitialization.
If tx or rx handlers are unavailable, free mbufs rather than leaking
them.
With this change, if_ntb can receive more than 100
(NTB_QP_DEF_NUM_ENTRIES) packets.
Sponsored by: EMC / Isilon Storage Division
32-bit BARs can only address memory mapped in the low 32 bits of
physical RAM. Expose this as a 'plimit' out parameter from
ntb_mw_get_range().
Fix if_ntb to allocate memory within this limit.
Sponsored by: EMC / Isilon Storage Division
Sometimes they'll read spurious values (observed: 0xc on Broadwell-DE),
failing link negotiation.
Discussed with: Dave Jiang, Allen Hubbe
Sponsored by: EMC / Isilon Storage Division
The tunable 'hw.ntb.enable_writecombine' may be set to zero to
administratively disable write combining the mapped NTB region.
Sponsored by: EMC / Isilon Storage Division
Use bus_space_write instead of (non-volatile) C pointer writes via an
iowrite32() shim in the same places as the Dual BSD/GPL Linux driver.
Update some types to fixed 32-bit sizes.
Sponsored by: EMC / Isilon Storage Division
Current Xen resume code clears all pending bitmap IPIs on resume, which is
not correct. Instead re-inject bitmap IPI vectors on resume to all CPUs in
order to acknowledge any pending bitmap IPIs.
Sponsored by: Citrix Systems R&D
MFC after: 2 weeks
Modern cards in most cases operate abstract port handles, that have no
any relation to real loop IDs. Leave loopid used only where it really
goes about local loop IDs.
While there, fix few more cases where LUNs were still printed in decimal.
It turns out on a 16550 w/ a 25MHz SoC reference clock you get a little
over 3% error at 115200 baud, which causes this to fail.
Just .. cope. Things cope these days.
Default to 30 (3.0%) as before, but allow UART_DEV_TOLERANCE_PCT to be
set at build time to change that.
- icl_cxgbei_conn is _the_ per-connection softc for iSCSI. Retire
iscsi_socket by removing all unneeded fields and moving the rest to
icl_cxgbei_conn.
- Update pdu_queue to use the new t4_push_pdus and associated mbufq in
the TOE driver. Throw away all the callbacks registered during MOD_LOAD
as t4_push_pdus doesn't need them.
- Use the mbuf allocated for the BHS header to store icl_cxgbei_pdu as
well. This eliminates the custom zone for the PDUs and reduces the
number of allocations on the fast path. For each PDU, the old code
used to allocate an icl_cxgbei_pdu, an mbuf for the BHS, and tags for
the BHS and data mbufs. The new code allocates just one mbuf per PDU.
This is convenient for another reason -- it allows t4_tom to deal with
mbufs (which it understands) instead of having to call into the iSCSI
driver.
- Remove the socket upcalls, calls to ICL_DEBUG and ICL_WARN, and all
code within ICL_KERNEL_PROXY. None of this stuff is actually used by
cxgbei, it's probably leftover copy/paste from icl_soft.
- Fold various icl_foo into icl_cxgbei_foo if the cxgbei implementation
was simply a call to the other function.
- Remove set_tcb_field and use t4_set_tcb_field that's already available
in the base driver.
- Fix connection handoff to not assume that there is only one T4/T5
adapter in the system and that's the one handling all offloaded
connections. Walk the list of adapters and match tp->t_tod with the
adapter's toedev instead. This allows multiple TOE devices of multiple
types to coexist.
- Fix connection teardown by not reaching for the inp via the toepcb but
via the socket instead. If the tid is dead in the hardware then the
inp has already been unhooked from the toepcb by t4_tom.
- Add more CTRs. The ones on the normal fast path are disabled by
default to avoid flooding the log.
- Refine pdu_append_data.
- Other miscellaneous changes.
Switch PCI register reads from using magic numbers to using the names
defined in pcireg.h
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4185
Using other values causes VMXNET3_CMD_ENABLE to fail. The Linux
driver also enforces this restriction.
Reviewed by: bryanv
MFC after: 1 week
Sponsored by: Norse
Differential Revision: https://reviews.freebsd.org/D4139
accident with RTL8168G and later chips when the interface actually was
brought up. This is due to the fact that with these MAC variants, RXDV
gate needs be disabled for WOL to work. So do just that in re_setwol()
when IFCAP_WOL is requested.
Reported and tested by: dhw
MFC after: 3 days
The code tracks a counter which is the number of events until the next
sample. On context switch in, it loads the saved counter. On context
switch out, it tries to calculate a new saved counter.
Problems:
1. The saved counter was shared by all threads in a process. However, this
means that all threads would be initially loaded with the same saved
counter. However, that could result in sampling more often than once every
X number of events.
2. The calculation to determine a new saved counter was backwards. It
added when it should have subtracted, and subtracted when it should have
added. Assume a single-threaded process with a reload count of 1000 events.
Assuming the counter on context switch in was 100 and the counter on context
switch out was 50 (meaning the thread has "consumed" 50 more events), the
code would calculate a new saved counter of 150 (instead of the proper 50).
Fix:
1. As soon as the saved counter is used to initialize a monitor for a
thread on context switch in, set the saved counter to the reload count.
That way, subsequent threads to use the saved counter will get the full
reload count, assuring we sample at least once every X number of events
(across all threads).
2. Change the calculation of the saved counter. Due to the change to the
saved counter in #1, we simply need to add (modulo the reload count) the
remaining counter time we retrieve from the CPU when a thread is context
switched out.
Differential Revision: https://reviews.freebsd.org/D4122
Approved by: gnn (mentor)
MFC after: 1 month
Sponsored by: Juniper Networks
On my own tests I see no effect from this change, but I also can't
reproduce the reported problem in general.
PR: 127391
PR: 204554
Submitted by: satz@iranger.com
MFC after: 2 weeks
Changes to the code to gather user stacks:
* Delay setting pmc_cpumask until we actually have the stack.
* When recording user stack traces, only walk the portion of the ring
that should have samples for us.
Sponsored by: Juniper Networks
Approved by: gnn (mentor)
MFC after: 1 month
Currently, there is a single pm_stalled flag that tracks whether a
performance monitor was "stalled" due to insufficent ring buffer
space for samples. However, because the same performance monitor
can run on multiple processes or threads at the same time, a single
pm_stalled flag that impacts them all seems insufficient.
In particular, you can hit corner cases where the code fails to stop
performance monitors during a context switch out, because it thinks
the performance monitor is already stopped. However, in reality,
it may be that only the monitor running on a different CPU was stalled.
This patch attempts to fix that behavior by tracking on a per-CPU basis
whether a PM desires to run and whether it is "stalled". This lets the
code make better decisions about when to stop PMs and when to try to
restart them. Ideally, we should avoid the case where the code fails
to stop a PM during a context switch out.
Sponsored by: Juniper Networks
Reviewed by: jhb
Approved by: gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D4124
There is no need for the upstream and downstream addresses to be
different for the NTB configs. Go to using a single set of address. It
is still possible to configure them differently using module parameter
override however (CEM: tunable).
Authored by: Dave Jiang <dave.jiang@intel.com>
Reviewed by: Allen Hubbe <Allen.Hubbe@emc.com>
Reviewed by: Jon Mason <jdmason@kudzu.us>
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Remove the use of sbuf_data on drained sbufs from the debug sysctls:
* ixl_sysctl_hw_res_alloc
* ixl_sysctl_switch_config
This prevents a kernel panic when accessing these values under a kernel
compiled with INVARIANTS.
Sponsored by: Multiplay
Order of operations issue with the QP Num and MW count, which would
result in the receive buffer pointer being invalid if there are more
than 1 MW. Corrected with parenthesis to enforce the proper order of
operations.
Reported by: John I. Kading <John.Kading@gd-ms.com>
Reported by: Conrad Meyer <cem@FreeBSD.org>
Authored by: Jon Mason <jdmason@kudzu.us>
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Because it can sleep drainking link work callout(s). Linux (dual
BSD/GPL driver) does something very similar.
At the same time, switch the NTB CTX lock to a non-spin mutex, because
the taskqueue_swi lock can't be taken after a spin mutex.
Suggested by: Witness
Sponsored by: EMC / Isilon Storage Division
In ntb_poll_link, we are intentionally writing the link bit, which is
absent from db_valid_mask. Don't panic on a kassert when we do so.
The Linux version of this (dual BSD/GPL) driver has the db_valid_mask
assertions in callers of db_iowrite() rather than db_iowrite() itself;
it skips the assertions in the equivalent of ntb_poll_link(). Rather
than duplicating the assertions in every caller, add a db_iowrite_raw()
that doesn't check and use it from ntb_poll_link().
Suggested by: kassert_panic
Sponsored by: EMC / Isilon Storage Division
from Mellanox Technologies. The current driver supports ethernet
speeds up to and including 100 GBit/s. Infiniband support will be
done later.
The code added is not compiled by default, which will be done by a
separate commit.
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
the TOE for LAN operation. It is possible to set this to other values
(cluster for networks with little loss and really tight RTTs, and wan
for relatively large RTTs and/or lossy networks) depending on the
environment in which the TOE is being used.
None of this affects plain NIC operation in any way.
MFC after: 1 week
- Split urtwn_tx_start() into urtwn_tx_data() and urtwn_tx_start()
(the last will be used for beacon updates / raw xmit path).
- Remove unneeded code from _urtwn_getbuf().
- Use CCK11 for data frames in 11b mode.
- Send EAPOL frames at 1 Mbps.
- Reduce code duplication in urtwn_tx_data().
- Fix sequence numbering.
- Add IEEE80211_RADIOTAP_F_WEP flag for encrypted frames.
- Check URTWN_RUNNING flag under lock.
Tested with RTL8188EU, STA mode.
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4017
Right now the only way to force a cold reset is:
* The HAL itself detects it's needed, or
* The sysctl, setting all resets to be cold.
Trouble is, cold resets take quite a bit longer than warm resets.
However, there are situations where a cold reset would be nice.
Specifically, after a stuck beacon, BB/MAC hang, stuck calibration results,
etc.
The vendor HAL has a separate method to set the reset reason (which is
how HAL_RESET_BBPANIC gets set) which informs the HAL during the reset path
why it occured. This is almost but not quite the same; I may eventually
unify both approaches in the future.
This commit just extends HAL_RESET_TYPE to include both status (eg BBPANIC)
and type (eg do COLD.) None of the HAL code uses it yet though; that'll
come later.
It also is a big no-op in each HAL - I need to go teach each of the HALs
about cold/warm reset through this path.
Using unmapped IO is really beneficial when running inside of a VM,
since it avoids IPIs to other vCPUs in order to invalidate the
mappings.
This patch adds unmapped IO support to blkfront. The following tests
results have been obtained when running on a Xen host without HAP:
PVHVM
3165.84 real 6354.17 user 4483.32 sys
PVHVM with unmapped IO
2099.46 real 4624.52 user 2967.38 sys
This is because when running using shadow page tables TLB flushes and
range invalidations are much more expensive, so using unmapped IO
provides a very important performance boost.
Sponsored by: Citrix Systems R&D
MFC after: 2 weeks
X-MFC-with: r290610
dev/xen/blkfront/blkfront.c:
- Add and announce support for unmapped IO.
before their initial configuration is done, it turns out that r281337
has the inverse effect on some older chips. Moreover, as with newer
chips before, two chips seemingly identical according to their MAC
revisions may behave differently in this regard, with most working
but a few not, making changes extremely hard to test.
Closer inspection of the corresponding Linux code suggests that RX
and TX should only be enabled after their initial configuration with
RTL8168G and later chips, i. e. RTL8106E{,US}, RTL8107E, as well as
RTL8168{EP,G,GU,H}, so limit the new code path to these. [1]
- Distinguish between RTL8168H and RTL8107E, with the latter being the
10/100-Mbit/s-only variant of the former.
- For MAC variants that can only do Fast Ethernet at a maximum, ensure
that we don't advertise Gigabit Ethernet speed.
- In re_stop(), do the inverse of re_init_locked() and enable RXDV
gate on RTL8168G and later chips again, matching what Linux does.
PR: 203422 [1]
MFC after: 1 week
- Filter out unneeded frames in STA mode.
- Implement ic_promisc() call.
Tested with RTL8188EU, STA and MONITOR modes.
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D3999
learned that the Power8 and the PS3 have a mix of OFW and FDT. Both have AIM
defined. But currently they are not affected. They have no I2C devices under
OFW.
This version was tested on a Quad G5 and build tested for armv6*.
Discussed with nwhitehorn@
Reviewed by: ian@
32-bits aligned. Merge the two bounce buffers into a single one. Some
rough tests showed that the DWC OTG throughput on RPI2 increased by
10% after this patch.
MFC after: 1 week