- icl_cxgbei_conn is _the_ per-connection softc for iSCSI. Retire
iscsi_socket by removing all unneeded fields and moving the rest to
icl_cxgbei_conn.
- Update pdu_queue to use the new t4_push_pdus and associated mbufq in
the TOE driver. Throw away all the callbacks registered during MOD_LOAD
as t4_push_pdus doesn't need them.
- Use the mbuf allocated for the BHS header to store icl_cxgbei_pdu as
well. This eliminates the custom zone for the PDUs and reduces the
number of allocations on the fast path. For each PDU, the old code
used to allocate an icl_cxgbei_pdu, an mbuf for the BHS, and tags for
the BHS and data mbufs. The new code allocates just one mbuf per PDU.
This is convenient for another reason -- it allows t4_tom to deal with
mbufs (which it understands) instead of having to call into the iSCSI
driver.
- Remove the socket upcalls, calls to ICL_DEBUG and ICL_WARN, and all
code within ICL_KERNEL_PROXY. None of this stuff is actually used by
cxgbei, it's probably leftover copy/paste from icl_soft.
- Fold various icl_foo into icl_cxgbei_foo if the cxgbei implementation
was simply a call to the other function.
- Remove set_tcb_field and use t4_set_tcb_field that's already available
in the base driver.
- Fix connection handoff to not assume that there is only one T4/T5
adapter in the system and that's the one handling all offloaded
connections. Walk the list of adapters and match tp->t_tod with the
adapter's toedev instead. This allows multiple TOE devices of multiple
types to coexist.
- Fix connection teardown by not reaching for the inp via the toepcb but
via the socket instead. If the tid is dead in the hardware then the
inp has already been unhooked from the toepcb by t4_tom.
- Add more CTRs. The ones on the normal fast path are disabled by
default to avoid flooding the log.
- Refine pdu_append_data.
- Other miscellaneous changes.
Switch PCI register reads from using magic numbers to using the names
defined in pcireg.h
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4185
Using other values causes VMXNET3_CMD_ENABLE to fail. The Linux
driver also enforces this restriction.
Reviewed by: bryanv
MFC after: 1 week
Sponsored by: Norse
Differential Revision: https://reviews.freebsd.org/D4139
accident with RTL8168G and later chips when the interface actually was
brought up. This is due to the fact that with these MAC variants, RXDV
gate needs be disabled for WOL to work. So do just that in re_setwol()
when IFCAP_WOL is requested.
Reported and tested by: dhw
MFC after: 3 days
The code tracks a counter which is the number of events until the next
sample. On context switch in, it loads the saved counter. On context
switch out, it tries to calculate a new saved counter.
Problems:
1. The saved counter was shared by all threads in a process. However, this
means that all threads would be initially loaded with the same saved
counter. However, that could result in sampling more often than once every
X number of events.
2. The calculation to determine a new saved counter was backwards. It
added when it should have subtracted, and subtracted when it should have
added. Assume a single-threaded process with a reload count of 1000 events.
Assuming the counter on context switch in was 100 and the counter on context
switch out was 50 (meaning the thread has "consumed" 50 more events), the
code would calculate a new saved counter of 150 (instead of the proper 50).
Fix:
1. As soon as the saved counter is used to initialize a monitor for a
thread on context switch in, set the saved counter to the reload count.
That way, subsequent threads to use the saved counter will get the full
reload count, assuring we sample at least once every X number of events
(across all threads).
2. Change the calculation of the saved counter. Due to the change to the
saved counter in #1, we simply need to add (modulo the reload count) the
remaining counter time we retrieve from the CPU when a thread is context
switched out.
Differential Revision: https://reviews.freebsd.org/D4122
Approved by: gnn (mentor)
MFC after: 1 month
Sponsored by: Juniper Networks
On my own tests I see no effect from this change, but I also can't
reproduce the reported problem in general.
PR: 127391
PR: 204554
Submitted by: satz@iranger.com
MFC after: 2 weeks
Changes to the code to gather user stacks:
* Delay setting pmc_cpumask until we actually have the stack.
* When recording user stack traces, only walk the portion of the ring
that should have samples for us.
Sponsored by: Juniper Networks
Approved by: gnn (mentor)
MFC after: 1 month
Currently, there is a single pm_stalled flag that tracks whether a
performance monitor was "stalled" due to insufficent ring buffer
space for samples. However, because the same performance monitor
can run on multiple processes or threads at the same time, a single
pm_stalled flag that impacts them all seems insufficient.
In particular, you can hit corner cases where the code fails to stop
performance monitors during a context switch out, because it thinks
the performance monitor is already stopped. However, in reality,
it may be that only the monitor running on a different CPU was stalled.
This patch attempts to fix that behavior by tracking on a per-CPU basis
whether a PM desires to run and whether it is "stalled". This lets the
code make better decisions about when to stop PMs and when to try to
restart them. Ideally, we should avoid the case where the code fails
to stop a PM during a context switch out.
Sponsored by: Juniper Networks
Reviewed by: jhb
Approved by: gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D4124
There is no need for the upstream and downstream addresses to be
different for the NTB configs. Go to using a single set of address. It
is still possible to configure them differently using module parameter
override however (CEM: tunable).
Authored by: Dave Jiang <dave.jiang@intel.com>
Reviewed by: Allen Hubbe <Allen.Hubbe@emc.com>
Reviewed by: Jon Mason <jdmason@kudzu.us>
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Remove the use of sbuf_data on drained sbufs from the debug sysctls:
* ixl_sysctl_hw_res_alloc
* ixl_sysctl_switch_config
This prevents a kernel panic when accessing these values under a kernel
compiled with INVARIANTS.
Sponsored by: Multiplay
Order of operations issue with the QP Num and MW count, which would
result in the receive buffer pointer being invalid if there are more
than 1 MW. Corrected with parenthesis to enforce the proper order of
operations.
Reported by: John I. Kading <John.Kading@gd-ms.com>
Reported by: Conrad Meyer <cem@FreeBSD.org>
Authored by: Jon Mason <jdmason@kudzu.us>
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Because it can sleep drainking link work callout(s). Linux (dual
BSD/GPL driver) does something very similar.
At the same time, switch the NTB CTX lock to a non-spin mutex, because
the taskqueue_swi lock can't be taken after a spin mutex.
Suggested by: Witness
Sponsored by: EMC / Isilon Storage Division
In ntb_poll_link, we are intentionally writing the link bit, which is
absent from db_valid_mask. Don't panic on a kassert when we do so.
The Linux version of this (dual BSD/GPL) driver has the db_valid_mask
assertions in callers of db_iowrite() rather than db_iowrite() itself;
it skips the assertions in the equivalent of ntb_poll_link(). Rather
than duplicating the assertions in every caller, add a db_iowrite_raw()
that doesn't check and use it from ntb_poll_link().
Suggested by: kassert_panic
Sponsored by: EMC / Isilon Storage Division
from Mellanox Technologies. The current driver supports ethernet
speeds up to and including 100 GBit/s. Infiniband support will be
done later.
The code added is not compiled by default, which will be done by a
separate commit.
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
the TOE for LAN operation. It is possible to set this to other values
(cluster for networks with little loss and really tight RTTs, and wan
for relatively large RTTs and/or lossy networks) depending on the
environment in which the TOE is being used.
None of this affects plain NIC operation in any way.
MFC after: 1 week
- Split urtwn_tx_start() into urtwn_tx_data() and urtwn_tx_start()
(the last will be used for beacon updates / raw xmit path).
- Remove unneeded code from _urtwn_getbuf().
- Use CCK11 for data frames in 11b mode.
- Send EAPOL frames at 1 Mbps.
- Reduce code duplication in urtwn_tx_data().
- Fix sequence numbering.
- Add IEEE80211_RADIOTAP_F_WEP flag for encrypted frames.
- Check URTWN_RUNNING flag under lock.
Tested with RTL8188EU, STA mode.
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4017
Right now the only way to force a cold reset is:
* The HAL itself detects it's needed, or
* The sysctl, setting all resets to be cold.
Trouble is, cold resets take quite a bit longer than warm resets.
However, there are situations where a cold reset would be nice.
Specifically, after a stuck beacon, BB/MAC hang, stuck calibration results,
etc.
The vendor HAL has a separate method to set the reset reason (which is
how HAL_RESET_BBPANIC gets set) which informs the HAL during the reset path
why it occured. This is almost but not quite the same; I may eventually
unify both approaches in the future.
This commit just extends HAL_RESET_TYPE to include both status (eg BBPANIC)
and type (eg do COLD.) None of the HAL code uses it yet though; that'll
come later.
It also is a big no-op in each HAL - I need to go teach each of the HALs
about cold/warm reset through this path.
Using unmapped IO is really beneficial when running inside of a VM,
since it avoids IPIs to other vCPUs in order to invalidate the
mappings.
This patch adds unmapped IO support to blkfront. The following tests
results have been obtained when running on a Xen host without HAP:
PVHVM
3165.84 real 6354.17 user 4483.32 sys
PVHVM with unmapped IO
2099.46 real 4624.52 user 2967.38 sys
This is because when running using shadow page tables TLB flushes and
range invalidations are much more expensive, so using unmapped IO
provides a very important performance boost.
Sponsored by: Citrix Systems R&D
MFC after: 2 weeks
X-MFC-with: r290610
dev/xen/blkfront/blkfront.c:
- Add and announce support for unmapped IO.
before their initial configuration is done, it turns out that r281337
has the inverse effect on some older chips. Moreover, as with newer
chips before, two chips seemingly identical according to their MAC
revisions may behave differently in this regard, with most working
but a few not, making changes extremely hard to test.
Closer inspection of the corresponding Linux code suggests that RX
and TX should only be enabled after their initial configuration with
RTL8168G and later chips, i. e. RTL8106E{,US}, RTL8107E, as well as
RTL8168{EP,G,GU,H}, so limit the new code path to these. [1]
- Distinguish between RTL8168H and RTL8107E, with the latter being the
10/100-Mbit/s-only variant of the former.
- For MAC variants that can only do Fast Ethernet at a maximum, ensure
that we don't advertise Gigabit Ethernet speed.
- In re_stop(), do the inverse of re_init_locked() and enable RXDV
gate on RTL8168G and later chips again, matching what Linux does.
PR: 203422 [1]
MFC after: 1 week
- Filter out unneeded frames in STA mode.
- Implement ic_promisc() call.
Tested with RTL8188EU, STA and MONITOR modes.
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D3999
learned that the Power8 and the PS3 have a mix of OFW and FDT. Both have AIM
defined. But currently they are not affected. They have no I2C devices under
OFW.
This version was tested on a Quad G5 and build tested for armv6*.
Discussed with nwhitehorn@
Reviewed by: ian@
32-bits aligned. Merge the two bounce buffers into a single one. Some
rough tests showed that the DWC OTG throughput on RPI2 increased by
10% after this patch.
MFC after: 1 week
The mbuf length fields must be set before m_adj() is called else
m_adj() will not always adjust the mbuf and an unaligned read
exception can trigger inside the network stack. This can happen on
platforms where unaligned reads are not supported. Adjust a length
check to include the 2-byte ethernet alignment while at it.
MFC after: 3 days
- Drop TSF initialization; device can discover it without our help.
- Do not touch R92C_BCN_CTRL_EN_BCN bit in STA mode.
- Add 'static' keyword for function definition.
Tested with RTL8188EU, STA mode.
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D3801
- Fix mbuf leaks in iwn_raw_xmit() and iwn_xmit_task()
(regression since r288178).
- Check IWN_FLAG_RUNNING flag under lock.
- Remove m->m_pkthdr.rcvif initialization (fixed in r283994).
- Enclose some values in return statements into parentheses.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4069
attributes when replying to a TLP from a Root Port. As a workaround,
disable No Snoop and Relaxed Ordering in the Root Port of each T5 adapter
during attach so that CPU-initiated requests do not contain these flags.
Note that this affects CPU-initiated requests to all devices under this
root port.
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio
PCI-Express capability registers (that is, PCI config registers in the
standard PCI config space belonging to the PCI-Express capability
register set).
Note that all of the current PCI-e registers are either 16 or 32-bits,
so only widths of 2 or 4 bytes are supported.
Reviewed by: imp
MFC after: 1 week
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D4088
Since r288350, ic_wme_task() is called via ieee80211_runtask(),
so, any additional deferring from the driver side is not needed.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4072
- There is no reason to have a special case for iSCSI in t4_rcvd. Either there
is data in the socket buffer (from when the connection was plain TOE, before
being promoted to ulp_mode iSCSI) and sbused _should_ be taken into account,
or sbused is 0 and doesn't affect the calculation of rx_credits.
- write_tx_wr doesn't need special handling for iSCSI either. Its caller should
specify the ulp_submode.
- Replace t4_ulp_push_frames with t4_push_pdus that can deal with PDUs in an
mbufq hanging off the toepcb. This eliminates the "backwards" calls from
t4_tom's tx into the iSCSI driver.
- The iSCSI driver installs a handler for RX_ISCSI_DDP already and the iSCSI
handler for RX_DATA_DDP is identical to the one for RX_ISCSI_DDP. Take
advantage of this to eliminate the last remaining "backwards" call from
do_rx_data_ddp into the iSCSI driver.
- Eliminate the CXGBE_ISCSI_MBUF_TAG abomination.
- For tx, it makes no sense to allocate an mbuf tag just to stash 2 bits worth
of information. Use a spare byte from the mbuf header instead.
- For rx, the per-connection ulpcb is a more natural place to keep information
about the PDU currently being assembled.
to transmit the buffer.
ath_tx_start() may manipulate/reallocate the mbuf as part of the DMA
code, so we can't expect the mbuf can be returned back to the caller.
Now, the net80211 ifnet work changed the semantics slightly so
if an error is returned here, the mbuf/reference is freed by the
caller (here, it's net80211.)
So, once we reach ath_tx_start(), we never return failure. If we fail
then we still return OK and we free the mbuf/noderef ourselves, and
we increment OERRORS.
Certain invalid operations trigger hardware error conditions. Error
conditions that only halt one channel can be detected and recovered by
resetting the channel. Error conditions that halt the whole device are
generally not recoverable.
Add a sysctl to inject channel-fatal HW errors,
'dev.ioat.<N>.force_hw_error=1'.
When a halt due to a channel error is detected, ioat(4) blocks new
operations from being queued on the channel, completes any outstanding
operations with an error status, and resets the channel before allowing
new operations to be queued again.
Update ioat.4 to document error recovery; document blockfill introduced
in r290021 while we are here; document ioat_put_dmaengine() added in
r289907; document DMA_NO_WAIT added in r289982.
Sponsored by: EMC / Isilon Storage Division
Fixes race condition observed under following circumstances:
1) I/O split on 128KB boundary with Intel NVMe controller.
Current Intel controllers produce better latency when
I/Os do not span a 128KB boundary - even if the I/O size
itself is less than 128KB.
2) Per-CPU I/O queues are enabled.
3) Child I/Os are submitted on different submission queues.
4) Interrupts for child I/O completions occur almost
simultaneously.
5) ithread for child I/O A increments bio_inbed, then
immediately is preempted (rendezvous IPI, higher priority
interrupt).
6) ithread for child I/O B increments bio_inbed, then completes
parent bio since all children are now completed.
7) parent bio is freed, and immediately reallocated for a VFS
or gpart bio (including setting bio_children to 1 and
clearing bio_driver1).
8) ithread for child I/O A resumes processing. bio_children
for what it thinks is the parent bio is set to 1, so it
thinks it needs to complete the parent bio.
Result is either calling a NULL callback function, or double freeing
the bio to its uma zone.
PR: 203746
Reported by: Drew Gallatin <gallatin@netflix.com>,
Marc Goroff <mgoroff@quorum.net>
Tested by: Drew Gallatin <gallatin@netflix.com>
MFC after: 3 days
Sponsored by: Intel
status registers for every interrupt. Check a common host channel
status interrupt register first, then conditionally read the
individual host channel status registers.
Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after: 1 week
- Move all files related to the LinuxKPI into sys/compat/linuxkpi and
its subfolders.
- Update sys/conf/files and some Makefiles to use new file locations.
- Added description of COMPAT_LINUXKPI to sys/conf/NOTES which in turn
adds the LinuxKPI to all LINT builds.
- The LinuxKPI can be added to the kernel by setting the
COMPAT_LINUXKPI option. The OFED kernel option no longer builds the
LinuxKPI into the kernel. This was done to keep the build rules for
the LinuxKPI in sys/conf/files simple.
- Extend the LinuxKPI module to include support for USB by moving the
Linux USB compat from usb.ko to linuxkpi.ko.
- Bump the FreeBSD_version.
- A universe kernel build has been done.
Reviewed by: np @ (cxgb and cxgbe related changes only)
Sponsored by: Mellanox Technologies
AMD64 pmap assumes ranges will be in the DMAP, which isn't necessarily
true for NTB memory windows (especially 64-bit BARs).
Suggested by: pmap_change_attr_locked -> kassert_panic
Sponsored by: EMC / Isilon Storage Division
Allows DMA from/to arbitrary KVA or physical address. /dev/ioat_test
must be enabled by root and is only R/W root, so this is approximately
as dangerous as /dev/mem and /dev/kmem.
Sponsored by: EMC / Isilon Storage Division
For the most of chips (except anscient ones) port handlers have no relation
to port IDs. In such situation old code scanning first 125 handlers was
quite naive. Instead of doing that, send to chip single request to get full
list of port handlers available on specific virtual port and scan only them.
Old code had problems with case of several virtual ports enabled, when port
handlers allocated from global address space could easily go above 125.
This change was successfully tested on 23xx, 24xx and 25xx chips in loop
mode with 4 virtual initiator ports, each seing 50 virtual target ports.
* refactor out the rx filter and operating mode code into a separate
method.
* add some comments about what's left with setting the operating mode
based on what carl9170 does.
* comment out some init from otus_init_mac() - it's no longer needed as
it's always init'ed now.
* add debugging and a missing return around a failure to call m_get2() -
during monitor mode operation I found RXing of frames > 2k, which
fails allocation. I'm sure they're valid (it's configuring 11n RX and
receiving 11n frames even though the driver doesn't "do" 11n)
and may be A-MSDU; but allocations fail and we should handle that
gracefully.
Tested:
* UB82 reference NIC (AR9170 + AR9104 2x2 dual band NIC); STA and
monitor mode operation.
The IOAT hardware supports writing a 64-bit pattern to some destination
buffer. The same limitations on buffer length apply as for copy
operations. Throughput is a bit higher (probably because fill does not
have to spend bandwidth reading from a source in memory).
Support for testing Block Fill has been added to ioatcontrol(8) and the
ioat_test device. ioatcontrol(8) accepts the '-f' flag, which tests
Block Fill. (If the flag is omitted, the tool tests copy by default.)
The '-V' flag, in conjunction with '-f', verifies that buffers are
filled in the expected pattern.
Tested on: Broadwell DE (Xeon D-1500)
Sponsored by: EMC / Isilon Storage Division
Add generic hw descriptor struct and generic control flags struct, in
preparation for other kinds of IOAT operation.
Sponsored by: EMC / Isilon Storage Division
Now on 24xx and above chips it is really possible to simulate several
virtual FC ports with single physical one. For example, it allows to
configure several targets in ctl.conf, assign each of them to separate
virtual port, and let user to control access to them with switch zoning.
I still doubt that all problems are solved there, but at now it passes
at least basic tests.
The new load_ma implementation can cause dereferences when used with
certain drivers, back it out until the reason is found:
Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 03
fault virtual address = 0x30
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff808a2d22
stack pointer = 0x28:0xfffffe07cc737710
frame pointer = 0x28:0xfffffe07cc737790
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 13 (g_down)
trap number = 12
panic: page fault
cpuid = 11
KDB: stack backtrace:
#0 0xffffffff80641647 at kdb_backtrace+0x67
#1 0xffffffff80606762 at vpanic+0x182
#2 0xffffffff806067e3 at panic+0x43
#3 0xffffffff8084eef1 at trap_fatal+0x351
#4 0xffffffff8084f0e4 at trap_pfault+0x1e4
#5 0xffffffff8084e82f at trap+0x4bf
#6 0xffffffff80830d57 at calltrap+0x8
#7 0xffffffff8063beab at _bus_dmamap_load_ccb+0x1fb
#8 0xffffffff8063bc51 at bus_dmamap_load_ccb+0x91
#9 0xffffffff8042dcad at ata_dmaload+0x11d
#10 0xffffffff8042df7e at ata_begin_transaction+0x7e
#11 0xffffffff8042c18e at ataaction+0x9ce
#12 0xffffffff802a220f at xpt_run_devq+0x5bf
#13 0xffffffff802a17ad at xpt_action_default+0x94d
#14 0xffffffff802c0024 at adastart+0x8b4
#15 0xffffffff802a2e93 at xpt_run_allocq+0x193
#16 0xffffffff802c0735 at adastrategy+0xf5
#17 0xffffffff80554206 at g_disk_start+0x426
Uptime: 2m29s
Add a new flag for DMA operations, DMA_NO_WAIT. It behaves much like
other NOWAIT flags -- if queueing an operation would sleep, abort and
return NULL instead.
When growing the internal descriptor ring, the memory allocation is
performed outside of all locks. A lock-protected flag is used to avoid
duplicated work. Threads that cannot sleep and attempt to queue
operations when the descriptor ring is full allocate a larger ring with
M_NOWAIT, or bail if that fails.
ioat_reserve_space() could become an external API if is important to
callers that they have room for a sequence of operations, or that those
operations succeed each other directly in the hardware ring.
This patch splits the internal head index (->head) from the hardware's
head-of-chain (DMACOUNT) register (->hw_head). In the future, for
simplicity's sake, we could drop the 'ring' array entirely and just use
a linked list (with head and tail pointers rather than indices).
Suggested by: Witness
Sponsored by: EMC / Isilon Storage Division
Assertion used here was invalid. If current thread helds any of locks,
we never want to recurse on them.
Obtained from: Semihalf
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D3903
Add e6000sw driver supporting Marvell 88E6352, 88E6172, 88E6176 switches.
It needs to be attached to mdio interface, exporting SMI access
functionality. e6000sw supports port-based VLAN configuration, per-port
media changing, accessing PHY and switch registers.
e6000sw attaches miibuses and PHY drivers as children. Instead of typical
tick as callout, kthread-based tick is used. This combined with SX locks
allows MDIO read/write calls to sleep. It is expected, because this
hardware requires long delays in SMI read/write procedures, which can not
be handled by busy-waiting.
Reviewed by: adrian
Obtained from: Semihalf
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D3902
This commit introduces support for etherswitch devices that utilize SMI as
a way of accessing its registers. SMI register is located in address space
of mge -- access to it was exported through MDIO interface.
Attachment functions were enhanced so as to ensure proper initialisation
in both cases: 1) PHYs attached directly to mge, 2) PHYs attached to
switch device and switch attached to mge. Attachment of etherswitch device
depends on dts entry with compatible="mrvl,sw" property. If none is found,
typical PHY attachment procedure follows.
In case of switch attached, PHYs' status and configuration is accessible
via etherswitchcfg, and ifconfig shows always-up, non-configurable mge
interfaces.
Due to the fact that there may be simultaneous accessess to SMI
registers (e.g. from PHY attached to one of mge instances and switch
to the other), SMI access interlock was added. It is SX lock,
because sleep ability is necessary -- busy-waiting would result
in poor performance due to long delays required by hardware.
Underlying switch driver is obliged to use sleepable locks as well.
Reviewed by: adrian
Obtained from: Semihalf
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D3900
We need to reset the chancmp and chainaddr MMIO registers to bring the
device back to a working state.
Name the chanerr bits while we're here.
Sponsored by: EMC / Isilon Storage Division
We only need to borrow a mutex for the drain sleep and the 0->1
transition, so just reuse an existing one for now.
The wchan is arbitrary. Using refcount itself would have required
__DEVOLATILE(), so use the lock's address instead.
Different uses are tagged by kind, although we only do anything with
that information in INVARIANTS builds.
Sponsored by: EMC / Isilon Storage Division
Callers should have acquired this lock when they invoked ioat_acquire()
before issuing operations. Assert it is held.
Sponsored by: EMC / Isilon Storage Division
This is still the worst possible way to allocate memory if it will ever
be under pressure, but at least it won't deadlock.
Suggested by: WITNESS
Sponsored by: EMC / Isilon Storage Division
Pull out the timer callout delay into IOAT_INTR_TIMO and shorten it
considerably (5s -> 100ms). Single operations do not take 5-10 seconds
and when interrupts aren't working, waiting 100ms sucks a lot less than
5s.
Sponsored by: EMC / Isilon Storage Division
Now 24xx and above chips support full 8-byte LUN address space.
Older FC chips may support up to 16K LUNs when firmware allows.
Tested in both initiator and target modes for 23xx, 24xx and 25xx.
This change allows to decode respective functions in isp(4) in target mode
and pass them through CAM to CTL. Unfortunately neither CAM nor isp(4)
support returning response info for those task management functions now.
On the other side I just have no initiator to test this functionality.
Using unmapped IO is really beneficial when running inside of a VM,
since it avoids IPIs to other vCPUs in order to invalidate the
mappings.
This patch adds unmapped IO support to blkfront. The following tests
results have been obtained when running on a Xen host without HAP:
PVHVM
3165.84 real 6354.17 user 4483.32 sys
PVHVM with unmapped IO
2099.46 real 4624.52 user 2967.38 sys
This is because when running using shadow page tables TLB flushes and
range invalidations are much more expensive, so using unmapped IO
provides a very important performance boost.
Sponsored by: Citrix Systems R&D
MFC after: 2 weeks
X-MFC-with: r289834
FC port database code already notifies CAM about all devices. Additional
full scan is just a waste of time, that by definition won't find anything
that is not present in port database.
registers (they are used for different purposes).
- Wrap R92C_MSR modifications into urtwn_set_mode().
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D3838
* Add a comment about the parameters I should support, stolen shamelessly
from iwn(4);
* Implement the rate bit for the raw transmit path;
* Print out the host-order versions of each of the transmit bits, so
I have a hope in heck of debugging why things are going wrong.
This still doesn't fix 5GHz in the office but that's likely due to a lot
of other configuration parameters being 2GHz-specific. That'll come next.
Tested:
* AR9170 + AR9103 (2/5GHz) 2x2, 5GHz association
Replace custom Linux-like logging with a thin shim around
device_printf(), when the softc is available.
In ioat_test, shim around printf(9) instead.
Sponsored by: EMC / Isilon Storage Division
This should export all of the same information as the Linux ntb_hw_intel
debugfs info file, but with a bit more structure, in the sysctl tree
rooted at 'dev.ntb_hw.<N>.debug_info'.
Raw registers are marked as OPAQUE because reading them on some hardware
revisions may cause a hard lockup (NTB errata). They can be read with
'sysctl -x dev.ntb_hw.<N>.debug_info.registers'. On Xeon platforms,
some additional registers are available under 'registers.xeon_stats' and
'registers.xeon_hw_err'. They are exported as big-endian values so that
the 'sysctl -x' output is legible.
Shrink the feature mask to 32 bits so we can use the %b formatter in
'debug_info.features'.
Sponsored by: EMC / Isilon Storage Division
Don't run the selftest until after we've enabled bus mastering, or the
DMA engine can't copy anything for our test.
Create the ioat_test device on attach, if so tuned. Destroy the
ioat_test device on teardown.
Replace deprecated 'CALLOUT_MPSAFE' with correct '1' in callout_init().
Sponsored by: EMC / Isilon Storage Division
The test logic now preallocates memory before running the test.
The buffer size is now configurable. Post-copy verification is
configurable. The number of copies to chain into one transaction (one
interrupt) is configurable.
A 'duration' mode is added, which repeats the test until the duration
has elapsed, reporting the B/s and transactions completed.
ioatcontrol.8 has been updated to document the new arguments.
Initial limits (on this particular Broadwell-DE) (and when the
interrupts are working) seem to be: 256 interrupts/sec or ~6 GB/s,
whichever limit is more restrictive.
Unfortunately, it seems the interrupt-reset handling on Broadwell isn't
working as intended. That will be fixed in a later commit.
Sponsored by: EMC / Isilon Storage Division
The FDT bindings for eeprom parts don't include any metadata about the
device other than the part name encoded in the compatible property.
Instead, a driver is required to have a compiled-in table of information
about the various parts (page size, device capacity, addressing scheme). So
much for FDT being an abstract description of hardware characteristics, huh?
In addition to the FDT-specific changes, this also switches to using the
newer iicbus_transfer_excl() mechanism which holds bus ownership for the
duration of the transfer. Previously this code held the bus across all
the transfers needed to complete the user's IO request, which could be
up to 128KB of data which might occupy the bus for 10-20 seconds. Now the
bus will be released and re-aquired between every page-sized (8-256 byte)
transfer, making this driver a much nicer citizen on the i2c bus.
The hint-based configuration mechanism is still in place for non-FDT systems.
Michal Meloun contributed some of the code for these changes.
while holding exclusive ownership of the bus. This is the routine most
slave drivers should use unless they have a need to acquire and hold the
bus across a series of related operations that involves multiple transfers.
various reasons while executing user commands. After these commands are
completed, the pages backing the relocation regions are unheld.
Since relocation regions do not have to be page aligned, the code in
validate_exec_list() allocates 2 extra page pointers in the array of
held pages populated by vm_fault_quick_hold_pages(). However, the cleanup
code that unheld the pages always assumed that only the buffer size /
PAGE_SIZE pages were used. This meant that non-page aligned buffers would
not unheld the last 1 or 2 pages in the list. Fix this by saving the
number of held pages returned by vm_fault_quick_hold_pages() for each
relocation region and using this count during cleanup.
Reviewed by: dumbbell, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3965
represented in 7-bits format in DT files, but system expect it in 8-bit
format. Also, fix two drivers that locally hack around this bug.
Submitted by: Michal Meloun <meloun@miracle.cz>
r289587 broke LINT-NOIP kernels because the lro and queued local variables
are defined but not used. Add preprocessor guards around them.
Reported by: emaste
Sponsored by: Citrix Systems R&D
xen/hypervisor.h:
- Remove unused helpers: MULTI_update_va_mapping, is_initial_xendomain,
is_running_on_xen
- Remove unused define CONFIG_X86_PAE
- Remove unused variable xen_start_info: note that it's used inpcifront
which is not built at all
- Remove forward declaration of HYPERVISOR_crash
xen/xen-os.h:
- Remove unused define CONFIG_X86_PAE
- Drop unused helpers: test_and_clear_bit, clear_bit,
force_evtchn_callback
- Implement a generic version (based on ofed/include/linux/bitops.h) of
set_bit and test_bit and prefix them by xen_ to avoid any use by other
code than Xen. Note that It would be worth to investigate a generic
implementation in FreeBSD.
- Replace barrier() by __compiler_membar()
- Replace cpu_relax() by cpu_spinwait(): it's exactly the same as rep;nop
= pause
xen/xen_intr.h:
- Move the prototype of xen_intr_handle_upcall in it: Use by all the
platform
x86/xen/xen_intr.c:
- Use BITSET* for the enabledbits: Avoid to use custom helpers
- test_bit/set_bit has been renamed to xen_test_bit/xen_set_bit
- Don't export the variable xen_intr_pcpu
dev/xen/blkback/blkback.c:
- Fix the string format when XBB_DEBUG is enabled: host_addr is typed
uint64_t
dev/xen/balloon/balloon.c:
- Remove set but not used variable
- Use the correct type for frame_list: xen_pfn_t represents the frame
number on any architecture
dev/xen/control/control.c:
- Return BUS_PROBE_WILDCARD in xs_probe: Returning 0 in a probe callback
means the driver can handle this device. If by any chance xenstore is the
first driver, every new device with the driver is unset will use
xenstore.
dev/xen/grant-table/grant_table.c:
- Remove unused cmpxchg
- Drop unused include opt_pmap.h: Doesn't exist on ARM64 and it doesn't
contain anything required for the code on x86
dev/xen/netfront/netfront.c:
- Use the correct type for rx_pfn_array: xen_pfn_t represents the frame
number on any architecture
dev/xen/netback/netback.c:
- Use the correct type for gmfn: xen_pfn_t represents the frame number on
any architecture
dev/xen/xenstore/xenstore.c:
- Return BUS_PROBE_WILDCARD in xctrl_probe: Returning 0 in a probe callback
means the driver can handle this device. If by any chance xenstore is the
first driver, every new device with the driver is unset will use xenstore.
Note that with the changes, x86/include/xen/xen-os.h doesn't contain anymore
arch-specific code. Although, a new series will add some helpers that differ
between x86 and ARM64, so I've kept the headers for now.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3921
Sponsored by: Citrix Systems R&D
I messed up when doing the reset_vlans method - setting vid[0] = 1 here
was making it 'hidden' from configuration (as it needed ETHERSWITCH_VID_VALID
as well) and so there was no way to configure vlangroup0.
In per-port VLAN mode, vlangroup0 is for the CPU port (port0).
Now, it normally wouldn't really matter - the CPU port thus sees
all other ports. However there are two CPU ports on the AR8327 and
so port0 (arge0) was seeing all traffic on port6 (arge1).
If you thus tried to use arge1/port6 for anything (eg a WAN port)
in a bridge group then things would very upset very quickly.
Whilst here, add a comment to remind myself that yes, it'd be nice
if we could specify a boot-time switch config.
Tested:
* AP135 reference platform w/ AR8327N switch
If the bus is detached and deleted by a call to device_delete_child() or
device_delete_children() on a device higher in the tree, I²C children
were already detached and deleted. So the device_t pointer stored in sc
points to freed memory: we must not try to delete it again.
By using device_delete_children(), we let subr_bus.c figure out if there
are children to take care of.
While here, make sure iicbus_detach() and iicoc_detach() call
device_delete_children() too, to be safe.
Reviewed by: jhb, imp
Approved by: jhb, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3926
Add ntb_q_idx_t so it is more clear which struct members are of the same
type (some bogus uint64_ts snuck in that should have been unsigned int).
Add tx_err_no_buf and s/ENOMEM/EBUSY/ in tx_enqueue to match Linux.
Sponsored by: EMC / Isilon Storage Division
A plain 32 bit integer will overflow for values over 4GiB.
Change the plain integer size to the appropriate size type in
ntb_set_mw. Change the type of the size parameter and two local
variables used for size.
Even if there is no overflow, a size of zero is invalid here.
Authored by: Allen Hubbe
Reported by: Juyoung Jung
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
It was possible for a synchronous update of the RX index in the error
case to get ahead of the asynchronous RX index update in the normal
case. Change the RX processing to preserve an RX completion order.
There were two error cases. First, if a buffer is not present to
receive data, there would be no queue entry to preserve the RX
completion order. Instead of dropping the RX frame, leave the RX frame
in the ring. Schedule RX processing when RX entries are enqueued, in
case there are RX frames waiting in the ring to be received.
Second, if a buffer is too small to receive data, drop the frame in the
ring, mark the RX entry as done, and indicate the error in the RX entry
length. Check for a negative length in the receive callback in
ntb_netdev, and count occurrences as rx_length_errors.
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Mechanically replace "SOC" with "ATOM" to match Linux. No functional
change. Original Linux commit log follows:
Instead of using the platform code names, use the correct platform names
to identify the respective Intel NTB hardware.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Prints driver name to indicate what is being loaded.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Benchmarking showed a significant performance increase with the MTU size
to 64k instead of 16k. Change the driver default to 64k.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Throw away the result of the peer SPAD read. The peer will write our
local SPAD and we need to keep the locally read SPAD value to check if
the remote side is up.
Sponsored by: EMC / Isilon Storage Division
Add module parameters for the addresses to be used in B2B topology.
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Reset the link stats when the link goes down. In particular, the TX and
RX index and count must be reset, or else the TX side will be sending
packets to the RX side where the RX side is not expecting them. Reset
all the stats, to be consistent.
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
We skip actually bringing up Rootport/Transparent configurations, so
most of this doesn't apply. Original Linux commit log:
Link training should be enabled in the driver probe for root port mode.
We should not have to wait for transport to be loaded for this to
happen. Otherwise the ntb device will not show up on the transparent
bridge side of the link.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
It is just a trivial wrapper around ntb_mw_set_trans().
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
The failure path for allocating rx grant refs should not try to free tx
grant refs because tx grant refs were allocated after that. Also fix the
error path for xen_net_read_mac.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3891
Sponsored by: Citrix Systems R&D
This is redundant because ether_ifattach will set that field.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3918
Sponsored by: Citrix Systems R&D
We're way beyond FreeBSD 7 at this point.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3892
Sponsored by: Citrix Systems R&D
Multiqueue feature will make the number of queues dynamic, so XN_LOCK_INIT
won't be that useful. Remove the macro and call mtx_init directly.
XN_LOCK_DESTROY is just dead code.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3890
Sponsored by: Citrix Systems R&D
Rename it with netfront_ prefix and purge a bunch of unused fields.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3889
Sponsored by: Citrix Systems R&D
Currently neither Linux nor FreeBSD netback supports page flipping. NetBSD
still supports that. It is not sure how many people actually use page
flipping, but page flipping is supposed to be slower than copying nowadays.
It will also shatter frontend / backend address space.
Overall this feature is more of a burden than a benefit.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3888
Sponsored by: Citrix Systems R&D
- Define the kref structure identical to the one found in Linux.
- Update clients referring inside the kref structure.
- Implement kref_sub() for FreeBSD.
Reviewed by: np @
Sponsored by: Mellanox Technologies
* Use the correct malloc type for node allocation - M_80211_NODE - so
the default node free method in net80211 will work correctly.
* Fix otus_node_alloc() to suit FreeBSD's net80211.
* .. and actually call otus_node_alloc() so there's space for the
per-node tx statistics. Otherwise, well, it will be scribbling over
random memory.
Tested:
* AR9170, STA mode
The monitor mode stuff is from the openbsd driver, but it doesn't
100% work. It doesn't seem to get all frames for all BSSes.
However, it's enough to at start debugging things. That 0xffffffff
write is /I think/ the RX filter, but I am still not 100% sure about
it all.
Then, whilst here, use the lowest rate for EAPOL frames. This is just
generally a good thing to do.
This commit adds support for MDIO present in the ThunderX SoC.
From the FDT point of view it is compatible with "octeon-3860-mdio"
however only C22 mode is used.
The code also implements lmac_if interface functions.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
- The driver consists of three main componens: PF, VF, BGX
- Requires appropriate entries in DTS and MDIO driver
- Supports only FDT configuration
- Multiple Tx queues and single Rx queue supported
- No RSS, HW checksum and TSO support
- No more than 8 queues per-IF (only one Queue Set per IF)
- HW statistics enabled
- Works in all available MAC modes (1,10,20,40G)
- Style converted to BSD according to style(9)
- The code brings lmac_if interface used by the BGX driver to
update its logical MACs state.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
This import brings following components of the Linux driver:
- Thunder BGX (programmable MAC)
- Physical Function driver
- Virtual Function driver
- Headers
Revision: 1.0
Obtained from: Cavium
License information: Cavium provided these files under BSD license
This is the last e26a5843 patch. The general thrust of the rewrite was
to move more responsibility for Memory Window and Doorbell interrupt
management from the ntb_hw driver to if_ntb.
A number of APIs have been added, removed, or replaced. The old
DB callback mechanism has been excised. Instead, callers (if_ntb) are
responsible for configuring MWs and handling their interrupts more
directly.
This adds a tunable, hw.ntb.max_mw_size, allowing users to limit the
size of memory windows used by if_ntb (identical to the Linux modparam
of the same name).
Despite attempts to keep mechanical name changes to separate commits,
some have snuck in here. At least the driver should be much more
similar to the latest Linux one now -- making porting fixes easier.
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
No functional change. Part of the huge rewrite (e26a5843).
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Move all Xeon secondary register setup to the setup_b2b_mw routine. We
use subroutines to make it a bit less wordy than the Linux version.
Adds a new tunable, 'hw.ntb.b2b_mw_share'. By default, it is off
(zero). If both sides enable it (any non-zero value), the NTB driver
attempts to use only half of a memory window for remote register MMIO
access.
This is still part of the large Linux rewrite (e26a5843).
Authored by: Allen Hubbe
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
This Linux commit was more or less a rewrite. Unfortunately, the commit
log does not give a lot of context for the rewrite. I have tried to
faithfully follow the changes made upstream, including matching function
names where possible, while churning the FreeBSD driver as little as
possible.
This is the bulk of the rewrite. There are two groups of changes to
follow in separate commits: fleshing out the rest of the changes to
xeon_setup_b2b_mw(), and some changes to if_ntb.
Yes, this is a big patch (3 files changed, 416 insertions(+), 237
deletions(-)), but the Linux patch was 13 files changed, 2,589
additions(+) and 2,195 deletions(-).
Original Linux commit log:
Change ntb_hw_intel to use the new NTB hardware abstraction layer.
Split ntb_transport into its own driver. Change it to use the new NTB
hardware abstraction layer.
Authored by: Allen Hubbe
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Some interrupt-related function names changed to match Linux.
No functional change. Still part of the huge e26a5843 rewrite in Linux.
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
No functional change.
Still part of the huge e26a5843 rewrite. I'm trying to make it less of
a complete rewrite in the FreeBSD version of the driver. Still, it
helps if our names match Linux.
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
On the Haswell platform, a split BAR option to allow creation of 2 32bit
BARs (4 and 5) from the 64bit BAR 4. Adding support for this new option.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
This is a follow-up to r289208: "Xeon Errata Workaround."
Add logic to support a variable number of memory windows and doorbell
callbacks. This was added to the Linux driver in the "Xeon Errata
Workaround" commit, but I skipped it because it didn't look neccessary
at the time. It is needed for future Haswell split-BAR support, so
bring it in now.
A new tunable was added for if_ntb, 'hw.ntb.max_num_clients'. By
default, it is set to zero -- infer the number of clients from the
number of memory windows available from the hardware. Any other
positive value can specify a different number of clients, limited by the
number of doorbell callbacks available (4 under MSI-X, or 15 (Xeon) or
34 (SoC) under legacy INTx).
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
This patch adds support for the BCM57765[2] card reader function included in
Broadcom's BCM57766 ethernet/sd3.0 controller. This controller is commonly
found in laptops and Apple hardware (MBP, iMac, etc).
The BCM57765 chipset is almost fully compatible with the SD3.0 spec, but
does not support deriving a frequency below 781KHz from its default base
clock via the standard SD3.0-configured 10-bit clock divisor.
If such a divisor is set, card identification (which requires a 400KHz
clock frequency) will time out[1].
As a work-around, I've made use of an undocumented device-specific clock
control register to switch the controller to a 63MHz clock source when
targeting clock speeds below 781KHz; the clock source is likewise switched
back to the 200MHz clock when targeting speeds greater than 781KHz.
Additionally, this patch fixes a small sdhci_pci bug; the
sdhci_pci_softc->quirks flag was not copied to the sdhci_slot, resulting in
`quirk` behavior not being applied by sdhci.c.
[1] A number of Linux/FreeBSD users have noted that bringing up the chipsets'
associated ethernet interface will allow SD cards to enumerate (slowly).
This is a controller implementation side-effect triggered by the ethernet
driver's reading of the hardware statistics registers.
[2] This may also fix card detection when using the BCM57785 chipset, but I
don't have access to the BCM57785 chipset and can't verify.
I actually snagged some BCM57785 hardware recently (2012 Retina MacBook Pro)
and can confirm that this also fixes card enumeration with the BCM57785
chipset; with the patch, I can boot off of the internal sdcard reader.
PR: kern/203385
Submitted by: Landon Fuller <landon@landonf.org>
Pull out read of PPD and platform detection logic to new functions,
ntb_detect_xeon(), ntb_detect_soc(). No functional change -- mostly
this is just shuffling the code to more closely match the Linux driver.
Linux commit log:
To simplify some of the platform detection code. Move the platform
detection to a function to be called earlier.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
The doorbell registers (and associated mask) are 16-bit on Xeon but
64-bit on SoC. Abstract IO access to doorbell registers with
'db_ioread' and 'db_iowrite' (names and idea borrowed from the dual
BSD/GPL Linux driver).
Sponsored by: EMC / Isilon Storage Division
Original Linux commit log:
The NTB translate register must have the value to be BAR size aligned.
This alignment check make sure that the DMA memory allocated has the
proper alignment. Another requirement for NTB to function properly with
memory window BAR size greater or equal to 4M is to use the CMA feature
in 3.16 kernel with the appropriate CONFIG_CMA_ALIGNMENT and
CONFIG_CMA_SIZE_MBYTES set.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
The detection of an uneven number of queues on the given memory windows
was not correct. The mw_num is zero based and the mod should be
division to spread them evenly over the mw's.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Remap MSI-X messages over available slots rather than falling back to
legacy INTx when fewer MSI-X slots are available than were requested.
N.B. the Linux driver does *not* do this.
To aid in testing, a tunable 'hw.ntb.force_remap_mode' has been added.
It defaults to off (0). When the tunable is enabled and sufficient
slots were available, the driver restricts the number of slots by one
and remaps the MSI-X messages over the remaining slots.
In case this is actually not okay (as I don't yet have access to this
hardware to test), a tunable 'hw.ntb.prefer_intx_to_remap' has been
added. It defaults to off (0). When the tunable is enabled and fewer
slots are available than requested, fall back to legacy INTx mode rather
than attempting to remap MSI-X messages.
Suggested by: jhb
Reviewed by: jhb (earlier version)
Sponsored by: EMC / Isilon Storage Division
Consumers that registered on this bit would never see a callback and it
is likely a mistake.
This does not affect if_ntb, which limits itself to a single doorbell
callback.
The names don't line up 100% with Linux. Our routines are named
ntb_setup_interrupts, ntb_setup_xeon_msix, ntb_setup_soc_msix, and
ntb_setup_legacy_interrupt. Linux SNB = FreeBSD Xeon; Linux BWD =
FreeBSD SOC. Original Linux commit log:
This is an cleanup effort to make ntb_setup_msix() more readable - use
ntb_setup_bwd_msix() to init MSI-Xs on BWD hardware and
ntb_setup_snb_msix() - on SNB hardware.
Function ntb_setup_snb_msix() also initializes MSI-Xs the way it should
has been done - looping pci_enable_msix() until success or failure.
Authored by: Alexander Gordeev
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Provide a better event interface between the client and transport.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
according to the Cortex-A8 TRM r3p2 section 3.2.49.
The A8 list differs from the "ARM-v7 common" list, given the A8
was an earlier model.
There is still more work to be done for other Cortex-Ax version as
andrew points out, but I am just trying to fix A8 for now for teaching.
MFC after: 2 weeks
Sponsored by: DARPA/AFRL
Obtained from: Cambridge/L41
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D3876
Enable Snoop from Primary to Secondary side on BAR23 and BAR45 on all
TLPs. Previously, Snoop was only enabled from Secondary to Primary
side. This can have a performance improvement on some workloads.
Also, make the code more obvious about how the link is being enabled.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Add a comment describing the necessary ordering of modifications to the
NTB Limit and Base registers.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
A WARN_ON is being hit in ntb_qp_link_work due to the NTB transport link
being down while the ntb qp link is still active. This is caused by the
transport link being brought down prior to the qp link worker thread
being terminated. To correct this, shutdown the qp's prior to bringing
the transport link down. Also, only call the qp worker thread if it is
in interrupt context, otherwise call the function directly.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
The Xeon NTB-RP setup, the transparent side does not get a link up/down
interrupt. Since the presence of a NTB device on the transparent side
means that we have a NTB link up, we can work around the lack of an
interrupt by simply calling the link up function to notify the upper
layers.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Modifications to the 14th bit of the B2BDOORBELL register will not be
mirrored to the remote system due to a hardware issue. To get around
the issue, shrink the number of available doorbell bits by 1. The max
number of doorbells was being used as a way to referencing the Link
Doorbell bit. Since this would no longer work, the driver must now
explicitly reference that bit.
This does not affect the xeon_errata_workaround case, as it is not using
the b2bdoorbell register.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
NTB-RP is not a supported configuration on BWD hardware. Remove the
code attempting to set it up.
Authored by: Jon Mason
Obtained from: Linux (dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
This commit does not actually add NTB-RP support. Mostly it serves to
shuffle code around to match the Linux driver. Original Linux commit
log follows:
Add support for Non-Transparent Bridge connected to a PCI-E Root Port on
the remote system (also known as NTB-RP mode). This allows for a NTB
enabled system to be connected to a non-NTB enabled system/slot.
Modifications to the registers and BARs/MWs on the Secondary side by the
remote system are reflected into registers on the Primary side for the
local system. Similarly, modifications of registers and BARs/MWs on
Primary side by the local system are reflected into registers on the
Secondary side for the Remote System. This allows communication between
the 2 sides via these registers and BARs/MWs.
Note: there is not a fix for the Xeon Errata (that was already worked
around in NTB-B2B mode) for NTB-RP mode. Due to this limitation, NTB-RP
will not work on the Secondary side with the Xeon Errata workaround
enabled. To get around this, disable the workaround via the
xeon_errata_workaround=0 modparm. However, this can cause the hang
described in the errata.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
Many variable names in the NTB driver refer to the primary or secondary
side. However, these variables will be used to access the reverse case
when in NTB-RP mode. Make these names more generic in anticipation of
NTB-RP support.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
The BWD NTB device will drop the link if an error is encountered on the
point-to-point PCI bridge. The link will stay down until all errors are
cleared and the link is re-established. On link down, check to see if
the error is detected, if so do the necessary housekeeping to try and
recover from the error and reestablish the link.
There is a potential race between the 2 NTB devices recovering at the
same time. If the times are synchronized, the link will not recover and
the driver will be stuck in this loop forever. Add a random interval to
the recovery time to prevent this race.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
There is a Xeon hardware errata related to writes to SDOORBELL or B2BDOORBELL
in conjunction with inbound access to NTB MMIO Space, which may hang the
system. To workaround this issue, use one of the memory windows to access the
interrupt and scratch pad registers on the remote system. This bypasses the
issue, but removes one of the memory windows from use by the transport. This
reduction of MWs necessitates adding some logic to determine the number of
available MWs.
Since some NTB usage methodologies may have unidirectional traffic, the ability
to disable the workaround via modparm has been added.
See BF113 in
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-c5500-c3500-spec-update.pdf
See BT119 in
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
Due to ambiguous documentation, the USD/DSD identification is backward
when compared to the setting in BIOS. Correct the bits to match the
BIOS setting.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
The NTB Xeon hardware has 16 scratch pad registers and 16 back-to-back
scratch pad registers. Correct the #define to represent this and update
the variable names to reflect their usage.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
This doesn't free the mbuf upon error; the driver ic_raw_xmit method is still
doing that.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3774
Move error handling into ieee80211_parent_xmitpkt() instead of spreading it
between functions.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3772
* Create ieee80211_free_mbuf() which frees a list of mbufs.
* Use it in the fragment transmit path and ath / uath transmit paths.
* Call it in xmit_pkt() if the transmission fails; otherwise fragments
may be leaked.
This should be a big no-op.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3769
The system will appear to lockup for long periods of time due to the NTB
driver spending too much time in memcpy. Avoid this by reducing the
number of packets that can be serviced on a given interrupt.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
The ring logic of the NTB receive buffer/transmit memory window requires
there to be at least 2 payload sized allotments. For the minimal size
case, split the buffer into two and set the transport_mtu to the
appropriate size.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
If the NTB link toggles, the driver could stop receiving due to the
tx_index not being set to 0 on the transmitting size on a link-up event.
This is due to the driver expecting the incoming data to start at the
beginning of the receive buffer and not at a random place.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
Each link-up will allocate a new NTB receive buffer when the NTB
properties are negotiated with the remote system. These allocations did
not check for existing buffers and thus did not free them. Now, the
driver will check for an existing buffer and free it if not of the
correct size, before trying to alloc a new one.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
64bit BAR sizes are permissible with an NTB device. To support them
various modifications and clean-ups were required, most significantly
using 2 32bit scratch pad registers for each BAR.
Also, modify the driver to allow more than 2 Memory Windows.
Authored by: Jon Mason
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
->remote_rx_info and ->rx_info are struct ntb_rx_info pointers. If we
add sizeof(struct ntb_rx_info) then it goes too far.
Authored by: Dan Carpenter
Obtained from: Linux
Sponsored by: EMC / Isilon Storage Division
This change fixes some amount of -Wsign-conversion and -Wconversion warnings
and sets correct sizes for some variables (as a result, some loop counters
were touched too).
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3763
This is roughly the iw_cxgbe equivalent of
be13b2dff8
-----------------
RDMA/cxgb4: Connect_request_upcall fixes
When processing an MPA Start Request, if the listening endpoint is
DEAD, then abort the connection.
If the IWCM returns an error, then we must abort the connection and
release resources. Also abort_connection() should not post a CLOSE
event, so clean that up too.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-----------------
Submitted by: Krishnamraju Eraparaju at chelsio dot com.
This was off because the net80211 aggregation code was using the same
state pointers for both fast frames and ampdu tx support which led to some
pretty unfortunate panic-y behaviour.
Now that net80211 doesn't panic, let's flip this back on.
It doesn't (yet) do the horrific sounding thing of A-MPDU aggregates
of fast frames; that'll come next. It's a pre-requisite to supporting
AMSDU + AMPDU anyway, which actually speeds things up quite considerably
(think packing lots of little ACK frames into a single AMSDU.)
Tested:
* QCA955x SoC, AP mode
* AR5416, STA mode
* AR9170, STA mode (with local fast frame patches)
these functions are thin wrappers around calling the hardware-layer driver,
but some of them do sanity checks and return an error. Since the hardware
layer can only return IIC_Exxxxx status values, the iicbus helper functions
must also adhere to that, so that drivers at higher layers can assume that
any non-zero status value is an IIC_Exxxx value that provides details about
what happened at the hardware layer (sometimes those details are important
for certain slave drivers).
errno values that are at least vaguely equivelent. Also add a new status
value, IIC_ERESOURCE, to indicate a failure to acquire memory or other
required resources to complete a transaction.
The IIC_Exxxxxx values are supposed to communicate low-level details of the
i2c transaction status between the lowest-layer hardware driver and
higher-layer bus protocol and device drivers for slave devices on the bus.
Most of those slave drivers just return all status values from the lower
layers directly to their callers, resulting in crazy error reporting from a
user's point of view (things like timeouts being reported as "no such
process"). Now there's a helper function to make it easier to start
cleaning up all those drivers.
Make it clearer what each one means in the comments that define them.
IIC_BUSBSY was used in many places to mean two different things, either
"someone else has reserved the bus so you have to wait until they're done"
or "the signal level on the bus was not in the state I expected before/after
issuing some command".
Now IIC_BUSERR is used consistantly to refer to protocol/signaling errors,
and IIC_BUSBSY refers to ownership/reservation of the bus.
perform a stop operation on the bus if there was an error, otherwise the
bus will remain hung forever. Consistantly use 'if (error != 0)' style in
the function.
The current Xen console driver is crashing very quickly when using it on
an ARM guest. This is because the console lock is recursive and it may
lead to recursion on the tty lock and/or corrupt the ring pointer.
Furthermore, the console lock is not always taken where it should be and has
to be released too early because of the way the console has been designed.
Over the years, code has been modified to support various new features but
the driver has not been reworked.
This new driver has been rewritten with the idea of only having a small set
of specific function to write either via the shared ring or the hypercall
interface.
Note that HVM support has been left aside for now because it requires
additional features which are not yet supported. A follow-up patch will be
sent with HVM guest support.
List of items that may be good to have but not mandatory:
- Avoid to flush for each character written when using the tty
- Support multiple consoles
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3698
Sponsored by: Citrix Systems R&D
Use direct dispatch into the destination hardware ring instead of using
a staging queue.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3757
Pull the latest headers for Xen which allow us to add support for ARM and
use new features in FreeBSD.
This is a verbatim copy of the xen/include/public so every headers which
don't exits anymore in the Xen repositories have been dropped.
Note the interface version hasn't been bumped, it will be done in a
follow-up. Although, it requires fix in the code to get it compiled:
- sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so
drop it.
- {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on
xen/interface/event_channel.h, so include it.
- {amd64,i386}/{amd64,i386}/support.S: It's not neccessary to include
machine/intr_machdep.h. This is also fixing build compilation with the
new headers.
- dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segmenthas
been dropped. So directly use struct blkif_request_segment
Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if
__XEN_INTERFACE_VERSION__ is not set. This is allow us to catch any file
where xen/xen-os.h is not correctly included.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3805
Sponsored by: Citrix Systems R&D
'rdrand' instruction may occasionally not return random numbers, in
spite of looping attempts to do so. The reusult is a KASSERT/panic.
Reluctantly accept this state-of-affairs, but make a noise about it.
if this 'noise' spams the console, it may be time to discontinue
using that source.
This is written in a general way to account for /any/ source that
might not supply random numbers when required.
Submitted by: jkh (report and slightly different fix)
Approved by: so (/dev/random blanket)
* Remove obsolete drm_agp_*_memory() prototypes.
* Fix comment in drm_fops.c (outisde -> outside).
* Fix some formatting issues in drm_stub.c (spaces -> tabs).
* Add missing case statement (gen == 3) in intel_gpu_reset().
* Restore pci_enable_busmaster() call in the init path (fixes gpu hang on i945GM).
* Replace M_WAITOK with M_NOWAIT when the return value of malloc is checked (may be incorrect).
Submitted by: <s3erios@gmail.com>
Reviewed by: dumbbell
Approved by: dumbbell
Differential Revision: https://reviews.freebsd.org/D3413
Now run(4) fetches parameters from ic->ic_wme.wme_params array, which is never initialized
(and can be safely removed). This patch replaces &ic->ic_wme.wme_params with
&ic->ic_wme.wme_chanParams.cap_wmeParams (contains parameters for local station;
used by other drivers with WME support).
Tested:
* me: STA: run0: MAC/BBP RT5390 (rev 0x0502), RF RT5370 (MIMO 1T1R), address 38:83:45:11:78:ae
Now device will use retry limit, which is set via 'ifconfig <interface>
maxretry <number>'.
Tested:
* Tested on WUSB54GC, STA mode.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3689
The MAC can be fetched from the key struct.
I added the ndis updates to make it compile.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3657
This diff includes:
* Transmitter Addresses, Keys and TKIP MIC addition to the Security Key Table.
* Proper SEC Control Registers initialization and maintenance.
* Additional flags and values in TX descriptor, which are required for encryption support.
* Error checking in RX path.
Tested:
* Tested on WUSB54GC, STA (WEP, TKIP, CCMP), HOSTAP (CCMP) and IBSS (CCMP, WPA-None) modes.
* rum0: MAC/BBP RT2573 (rev 0x2573a), RF RT2528, STA mode (CCMP+TKIP)
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3640
Note: I manually had to merge this; I merged in the "put beacon_offsets
into vap" commit before this.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3628
Don't override the NIC MAC address with an overridden MAC address for
a VAP.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3625
Tested:
* rum0: MAC/BBP RT2573 (rev 0x2573a), RF RT2528, STA mode
Note: haven't tested AP mode yet; will do once the rest of the
AP mode / power save commits are in.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3624
Move the mbuf free responsibility to the caller of the hardware xmit
function, not the hardware xmit function itself.
Submitted by: <s3erios@gmail.com>
Differential Revision: https://reviews.freebsd.org/D3621