This should export all of the same information as the Linux ntb_hw_intel
debugfs info file, but with a bit more structure, in the sysctl tree
rooted at 'dev.ntb_hw.<N>.debug_info'.
Raw registers are marked as OPAQUE because reading them on some hardware
revisions may cause a hard lockup (NTB errata). They can be read with
'sysctl -x dev.ntb_hw.<N>.debug_info.registers'. On Xeon platforms,
some additional registers are available under 'registers.xeon_stats' and
'registers.xeon_hw_err'. They are exported as big-endian values so that
the 'sysctl -x' output is legible.
Shrink the feature mask to 32 bits so we can use the %b formatter in
'debug_info.features'.
Sponsored by: EMC / Isilon Storage Division
linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
linux64.ko. While here, add support for linux64 binaries to systrace.
- Update NOPROTO entries in amd64/linux/syscalls.master to match the
main table to fix systrace build.
- Add a special case for union l_semun arguments to the systrace
generation.
- The systrace_linux32 module now only builds the systrace_linux32.ko.
module on amd64.
- Add a new systrace_linux module that builds on both i386 and amd64.
For i386 it builds the existing systrace_linux.ko. For amd64 it
builds a systrace_linux.ko for 64-bit binaries.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D3954
linux: fix handling of out-of-bounds syscall attempts
Due to an off by one the code would read an entry past the table, as
opposed to the last entry which contains the nosys handler.
Don't run the selftest until after we've enabled bus mastering, or the
DMA engine can't copy anything for our test.
Create the ioat_test device on attach, if so tuned. Destroy the
ioat_test device on teardown.
Replace deprecated 'CALLOUT_MPSAFE' with correct '1' in callout_init().
Sponsored by: EMC / Isilon Storage Division
Similar to r286787 for x86, this treats userspace buffers the same as unmapped buffers and no longer borrows the UVA for sync operations.
Submitted by: Svatopluk Kraus <onwahe@gmail.com> (earlier revision)
Tested by: Svatopluk Kraus
Differential Revision: https://reviews.freebsd.org/D3869
It turns out that it is pretty easy to make CloudABI work on ARM64. We
essentially only need to copy over the sysvec from AMD64 and ensure that
we use ARM64 specific registers.
As there is an overlap between function argument and return registers,
we do need to extend cloudabi64_schedtail() to only set its values if
we're actually forking. Not when we're creating a new thread.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3917
For CloudABI we need to initialize the registers of new threads
differently based on whether the thread got created through a fork or
through simple thread creation.
Add a flag, TDP_FORKING, that is set by do_fork() and cleared by
fork_exit(). This can be tested against in schedtail.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3973
In order to make it easier to support CloudABI on ARM64, move out all of
the bits from the AMD64 cloudabi_sysvec.c into a new file
cloudabi_module.c that would otherwise remain identical. This reduces
the AMD64 specific code to just ~160 lines.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3974
This is an AR9331 part based on the AP121 reference design but with
32MB RAM. Yes, it has 4MB flash and it has no USB, so clever hacks
are required to get it up and working.
But boot/work it does.
This part seems to work bug-free with single byte TX/RX buffer alignment.
This drops the CPU requirement to bridge 100mbit iperf from 100% CPU
to ~ 50% CPU.
Tested:
* AP121 (AR9330) SoC, highly magic netbooted kernel + USB rootfs
due to 4mb flash, 16mb RAM; doing bridging between arge0 and arge1.
Notes:
* Yes, I likely can also turn this on for the AR934x SoC family now.
But since hardware design apparently follows similar branching
strategies to software design, I'll go and make sure all the AR934x's
that made it out into shipping products work before I flip it on.
The test logic now preallocates memory before running the test.
The buffer size is now configurable. Post-copy verification is
configurable. The number of copies to chain into one transaction (one
interrupt) is configurable.
A 'duration' mode is added, which repeats the test until the duration
has elapsed, reporting the B/s and transactions completed.
ioatcontrol.8 has been updated to document the new arguments.
Initial limits (on this particular Broadwell-DE) (and when the
interrupts are working) seem to be: 256 interrupts/sec or ~6 GB/s,
whichever limit is more restrictive.
Unfortunately, it seems the interrupt-reset handling on Broadwell isn't
working as intended. That will be fixed in a later commit.
Sponsored by: EMC / Isilon Storage Division
The FDT bindings for eeprom parts don't include any metadata about the
device other than the part name encoded in the compatible property.
Instead, a driver is required to have a compiled-in table of information
about the various parts (page size, device capacity, addressing scheme). So
much for FDT being an abstract description of hardware characteristics, huh?
In addition to the FDT-specific changes, this also switches to using the
newer iicbus_transfer_excl() mechanism which holds bus ownership for the
duration of the transfer. Previously this code held the bus across all
the transfers needed to complete the user's IO request, which could be
up to 128KB of data which might occupy the bus for 10-20 seconds. Now the
bus will be released and re-aquired between every page-sized (8-256 byte)
transfer, making this driver a much nicer citizen on the i2c bus.
The hint-based configuration mechanism is still in place for non-FDT systems.
Michal Meloun contributed some of the code for these changes.
while holding exclusive ownership of the bus. This is the routine most
slave drivers should use unless they have a need to acquire and hold the
bus across a series of related operations that involves multiple transfers.
various reasons while executing user commands. After these commands are
completed, the pages backing the relocation regions are unheld.
Since relocation regions do not have to be page aligned, the code in
validate_exec_list() allocates 2 extra page pointers in the array of
held pages populated by vm_fault_quick_hold_pages(). However, the cleanup
code that unheld the pages always assumed that only the buffer size /
PAGE_SIZE pages were used. This meant that non-page aligned buffers would
not unheld the last 1 or 2 pages in the list. Fix this by saving the
number of held pages returned by vm_fault_quick_hold_pages() for each
relocation region and using this count during cleanup.
Reviewed by: dumbbell, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3965
offset within the buffer to align the L3 headers we know the buffer itself
was allocated and sized on cacheline boundaries and we don't need to
preserve partitial cachelines at the start and end of the buffer when
doing busdma sync operations.
represented in 7-bits format in DT files, but system expect it in 8-bit
format. Also, fix two drivers that locally hack around this bug.
Submitted by: Michal Meloun <meloun@miracle.cz>
to copying in some code from the armv4 busdma, and adapting a few variable
and flag names to match the surrounding mips code.
Instead of keeping a local cache of prealloced busdma_map structs on a
mutex-protected list, set up an uma zone to cache them.
Instead of all memory allocations using M_DEVBUF, use new categories
M_BUSDMA for allocations of metadata (tags, maps, segment tracking lists),
and M_BOUNCE for bounce pages.
When buffers are allocated out of the busdma_bufalloc zones the alignment
and size of the buffers is known, and the code can skip doing any "partial
cacheline flush" logic to preserve data that may be adjacent to the DMA
buffer but contain non-DMA data.
Reviewed by: adrian, imp
and implement support for VM_MEMATTR_UNCACHEABLE. This will be used in
upcoming changes to support BUS_DMA_COHERENT in bus_dmamem_alloc().
Reviewed by: adrian, imp
the not-SMP case. This is safe because arm_irq_next_cpu() will return
the cpuid of the current/only core in the not-SMP case.
Submitted by: Bartosz Szczepanek @ semihalf
r289587 broke LINT-NOIP kernels because the lro and queued local variables
are defined but not used. Add preprocessor guards around them.
Reported by: emaste
Sponsored by: Citrix Systems R&D
xen/hypervisor.h:
- Remove unused helpers: MULTI_update_va_mapping, is_initial_xendomain,
is_running_on_xen
- Remove unused define CONFIG_X86_PAE
- Remove unused variable xen_start_info: note that it's used inpcifront
which is not built at all
- Remove forward declaration of HYPERVISOR_crash
xen/xen-os.h:
- Remove unused define CONFIG_X86_PAE
- Drop unused helpers: test_and_clear_bit, clear_bit,
force_evtchn_callback
- Implement a generic version (based on ofed/include/linux/bitops.h) of
set_bit and test_bit and prefix them by xen_ to avoid any use by other
code than Xen. Note that It would be worth to investigate a generic
implementation in FreeBSD.
- Replace barrier() by __compiler_membar()
- Replace cpu_relax() by cpu_spinwait(): it's exactly the same as rep;nop
= pause
xen/xen_intr.h:
- Move the prototype of xen_intr_handle_upcall in it: Use by all the
platform
x86/xen/xen_intr.c:
- Use BITSET* for the enabledbits: Avoid to use custom helpers
- test_bit/set_bit has been renamed to xen_test_bit/xen_set_bit
- Don't export the variable xen_intr_pcpu
dev/xen/blkback/blkback.c:
- Fix the string format when XBB_DEBUG is enabled: host_addr is typed
uint64_t
dev/xen/balloon/balloon.c:
- Remove set but not used variable
- Use the correct type for frame_list: xen_pfn_t represents the frame
number on any architecture
dev/xen/control/control.c:
- Return BUS_PROBE_WILDCARD in xs_probe: Returning 0 in a probe callback
means the driver can handle this device. If by any chance xenstore is the
first driver, every new device with the driver is unset will use
xenstore.
dev/xen/grant-table/grant_table.c:
- Remove unused cmpxchg
- Drop unused include opt_pmap.h: Doesn't exist on ARM64 and it doesn't
contain anything required for the code on x86
dev/xen/netfront/netfront.c:
- Use the correct type for rx_pfn_array: xen_pfn_t represents the frame
number on any architecture
dev/xen/netback/netback.c:
- Use the correct type for gmfn: xen_pfn_t represents the frame number on
any architecture
dev/xen/xenstore/xenstore.c:
- Return BUS_PROBE_WILDCARD in xctrl_probe: Returning 0 in a probe callback
means the driver can handle this device. If by any chance xenstore is the
first driver, every new device with the driver is unset will use xenstore.
Note that with the changes, x86/include/xen/xen-os.h doesn't contain anymore
arch-specific code. Although, a new series will add some helpers that differ
between x86 and ARM64, so I've kept the headers for now.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3921
Sponsored by: Citrix Systems R&D
amd64 and i386 platform code contain very similar xen/xen-os.h
The only differences are:
- Functions/variables/types which were unused in i386/xen/xen-os.h:
* xen_xchg
* __xchg_dummy
* __xg
* __xchg
* atomic_t
* atomic_inc
* rdtscll
The functions/variables/types unused in xen-os.h can be dropped and there
is no more differences betwen amd64 and i386.
The new header is placed in x86/include/xen and each platform will have
dummy headers include x86/xen/*.h. This is to be able to include
machine/xen/*.h in the PV drivers.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3880
Sponsored by: Citrix Systems R&D
This was triggering when using it as an AP bridge rather than an ethernet
bridge.
The code is unclear but it works; I'll fix it to be clearer and test
performance at a later stage.
The existing code meets the "alignment" requirement for the l3 payload
by offsetting the mbuf by uint64_t and then calling an rx fixup routine
to copy the frame backwards by 2 bytes. This DWORD aligns the
L3 payload so tcp, etc doesn't panic on unaligned access.
This is .. slow.
For arge MACs that support 1 byte TX/RX address alignment, we can do
the "other" hack: offset the RX address of the mbuf so the L3 payload
again is hopefully DWORD aligned.
This is much cheaper - since TX/RX is both 1 byte align ready (thanks
to the previous commit) there's no bounce buffering going on and there
is no rx fixup copying.
This gets bridging performance up from 180mbit/sec -> 410mbit/sec.
There's around 10% of CPU cycles spent in _bus_dmamap_sync(); I'll
investigate that later.
Tested:
* QCA955x SoC (AP135 reference board), bridging arge0/arge1
by programming the switch to have two vlangroups in dot1q mode:
# ifconfig bridge0 inet 192.168.2.20/24
# etherswitchcfg config vlan_mode dot1q
# etherswitchcfg vlangroup0 members 0,1,2,3,4
# etherswitchcfg vlangroup1 vlan 2 members 5,6
# etherswitchcfg port5 pvid 2
# etherswitchcfg port6 pvid 2
# ifconfig arge1 up
# ifconfig bridge0 addm arge1
I messed up when doing the reset_vlans method - setting vid[0] = 1 here
was making it 'hidden' from configuration (as it needed ETHERSWITCH_VID_VALID
as well) and so there was no way to configure vlangroup0.
In per-port VLAN mode, vlangroup0 is for the CPU port (port0).
Now, it normally wouldn't really matter - the CPU port thus sees
all other ports. However there are two CPU ports on the AR8327 and
so port0 (arge0) was seeing all traffic on port6 (arge1).
If you thus tried to use arge1/port6 for anything (eg a WAN port)
in a bridge group then things would very upset very quickly.
Whilst here, add a comment to remind myself that yes, it'd be nice
if we could specify a boot-time switch config.
Tested:
* AP135 reference platform w/ AR8327N switch
r289660:
Do not allow to execute ptrace(PT_TRACE_ME) when the process is
already traced.
Do not allow to execute ptrace(PT_TRACE_ME) when there is no parent
which can trace the process, i.e. when the parent is already init.
Note that after the PT_TRACE_ME request the process is unkillable and
non-continuable until a debugger is attached, or parent is killed, the
later clears P_TRACED state. Since init clearly would not debug the
caller, and cannot be killed, disallow creation of unkillable
processes.
Reviewed by: jhb, pho
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D3908
When establishing the locking state for several lock types (including
blockable mutexes and sx) failed, locking primitives try to spin while
the owner thread is running. The spinning loop performs the test for
running condition by dereferencing the owner->td_state field of the
owner thread. If the owner thread exited while spinner was put off
the processor, it is harmless to access reused struct thread owner,
since in some near future the current processor would notice the owner
change and make appropriate progress. But it could be that the page
which carried the freed struct thread was unmapped, then we fault
(this cannot happen on amd64).
For now, disallowing free of the struct thread seems to be good
enough, and tests which create a lot of threads once, did not
demonstrated regressions.
Reviewed by: jhb, pho
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D3908
If the bus is detached and deleted by a call to device_delete_child() or
device_delete_children() on a device higher in the tree, I²C children
were already detached and deleted. So the device_t pointer stored in sc
points to freed memory: we must not try to delete it again.
By using device_delete_children(), we let subr_bus.c figure out if there
are children to take care of.
While here, make sure iicbus_detach() and iicoc_detach() call
device_delete_children() too, to be safe.
Reviewed by: jhb, imp
Approved by: jhb, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3926
Add ntb_q_idx_t so it is more clear which struct members are of the same
type (some bogus uint64_ts snuck in that should have been unsigned int).
Add tx_err_no_buf and s/ENOMEM/EBUSY/ in tx_enqueue to match Linux.
Sponsored by: EMC / Isilon Storage Division
A plain 32 bit integer will overflow for values over 4GiB.
Change the plain integer size to the appropriate size type in
ntb_set_mw. Change the type of the size parameter and two local
variables used for size.
Even if there is no overflow, a size of zero is invalid here.
Authored by: Allen Hubbe
Reported by: Juyoung Jung
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
It was possible for a synchronous update of the RX index in the error
case to get ahead of the asynchronous RX index update in the normal
case. Change the RX processing to preserve an RX completion order.
There were two error cases. First, if a buffer is not present to
receive data, there would be no queue entry to preserve the RX
completion order. Instead of dropping the RX frame, leave the RX frame
in the ring. Schedule RX processing when RX entries are enqueued, in
case there are RX frames waiting in the ring to be received.
Second, if a buffer is too small to receive data, drop the frame in the
ring, mark the RX entry as done, and indicate the error in the RX entry
length. Check for a negative length in the receive callback in
ntb_netdev, and count occurrences as rx_length_errors.
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Mechanically replace "SOC" with "ATOM" to match Linux. No functional
change. Original Linux commit log follows:
Instead of using the platform code names, use the correct platform names
to identify the respective Intel NTB hardware.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Prints driver name to indicate what is being loaded.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Normally this routine is supposed to loop until the PIC returns a "no more
interrupts pending" indication. I had commented that out to do just one
interrupt per invokation to do some timing tests.
Spotted by: Svata Kraus
Pointy Hat: ian
- Remove redundant NBLONG macro and use BIT_WORD()
and BIT_MASK() instead.
- Correctly define BIT_MASK() according to Linux and
update all users of this macro.
- Add missing GENMASK() macro.
- Remove all comments deriving from Linux.
Sponsored by: Mellanox Technologies
Benchmarking showed a significant performance increase with the MTU size
to 64k instead of 16k. Change the driver default to 64k.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Throw away the result of the peer SPAD read. The peer will write our
local SPAD and we need to keep the locally read SPAD value to check if
the remote side is up.
Sponsored by: EMC / Isilon Storage Division
Add module parameters for the addresses to be used in B2B topology.
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Reset the link stats when the link goes down. In particular, the TX and
RX index and count must be reset, or else the TX side will be sending
packets to the RX side where the RX side is not expecting them. Reset
all the stats, to be consistent.
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
We skip actually bringing up Rootport/Transparent configurations, so
most of this doesn't apply. Original Linux commit log:
Link training should be enabled in the driver probe for root port mode.
We should not have to wait for transport to be loaded for this to
happen. Otherwise the ntb device will not show up on the transparent
bridge side of the link.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
The bits in the aux control register vary based on the processor type. In
the past we've always just set the 'smp' and "broadcast tlb/cache ops' bits,
which worked fine for the first few SoCs we supported. Now that we support
most of the cortex-a series processors, it's important to get the right bits
set based on the processor type.
Submitted by: Svatopluk Kraus <onwahe@gmail.com>
It is just a trivial wrapper around ntb_mw_set_trans().
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
The failure path for allocating rx grant refs should not try to free tx
grant refs because tx grant refs were allocated after that. Also fix the
error path for xen_net_read_mac.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3891
Sponsored by: Citrix Systems R&D
This is redundant because ether_ifattach will set that field.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3918
Sponsored by: Citrix Systems R&D
We're way beyond FreeBSD 7 at this point.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3892
Sponsored by: Citrix Systems R&D
Multiqueue feature will make the number of queues dynamic, so XN_LOCK_INIT
won't be that useful. Remove the macro and call mtx_init directly.
XN_LOCK_DESTROY is just dead code.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3890
Sponsored by: Citrix Systems R&D
Rename it with netfront_ prefix and purge a bunch of unused fields.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3889
Sponsored by: Citrix Systems R&D
Currently neither Linux nor FreeBSD netback supports page flipping. NetBSD
still supports that. It is not sure how many people actually use page
flipping, but page flipping is supposed to be slower than copying nowadays.
It will also shatter frontend / backend address space.
Overall this feature is more of a burden than a benefit.
Submitted by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3888
Sponsored by: Citrix Systems R&D
boot on an SoC that places physical memory at an address past where three
levels of page tables can access in an identity mapping.
Submitted by: Wojciech Macek <wma@semihalf.com>,
Patrick Wildt <patrick@bitrig.org>
Differential Revision: https://reviews.freebsd.org/D3885 (partial)
Differential Revision: https://reviews.freebsd.org/D3744
- Redefine DIV_ROUND_UP as a function macro taking two arguments
instead of none.
- Implement more Linux kernel functions related to various forms
of DELAY() and basic mathematical operations.
Sponsored by: Mellanox Technologies
- Define the kref structure identical to the one found in Linux.
- Update clients referring inside the kref structure.
- Implement kref_sub() for FreeBSD.
Reviewed by: np @
Sponsored by: Mellanox Technologies
- Reimplement ktime header file to distinguish more from Linux.
- Add new time header file to handle time related Linux functions.
Sponsored by: Mellanox Technologies
- Avoid using PAGE_MASK, because Linux defines it differently.
Use (PAGE_SIZE - 1) instead.
- Add support for for_each_sg_page() and sg_page_iter_dma_address().
Sponsored by: Mellanox Technologies
- Added support for multiple new Linux functions.
- Properly implement DEFINE_WAIT() and init_waitqueue_head() macros.
- Removed FreeBSD specific __wait_queue_head structure definition.
Sponsored by: Mellanox Technologies
* Use the correct malloc type for node allocation - M_80211_NODE - so
the default node free method in net80211 will work correctly.
* Fix otus_node_alloc() to suit FreeBSD's net80211.
* .. and actually call otus_node_alloc() so there's space for the
per-node tx statistics. Otherwise, well, it will be scribbling over
random memory.
Tested:
* AR9170, STA mode
The monitor mode stuff is from the openbsd driver, but it doesn't
100% work. It doesn't seem to get all frames for all BSSes.
However, it's enough to at start debugging things. That 0xffffffff
write is /I think/ the RX filter, but I am still not 100% sure about
it all.
Then, whilst here, use the lowest rate for EAPOL frames. This is just
generally a good thing to do.
This commit adds support for MDIO present in the ThunderX SoC.
From the FDT point of view it is compatible with "octeon-3860-mdio"
however only C22 mode is used.
The code also implements lmac_if interface functions.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
- The driver consists of three main componens: PF, VF, BGX
- Requires appropriate entries in DTS and MDIO driver
- Supports only FDT configuration
- Multiple Tx queues and single Rx queue supported
- No RSS, HW checksum and TSO support
- No more than 8 queues per-IF (only one Queue Set per IF)
- HW statistics enabled
- Works in all available MAC modes (1,10,20,40G)
- Style converted to BSD according to style(9)
- The code brings lmac_if interface used by the BGX driver to
update its logical MACs state.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
This import brings following components of the Linux driver:
- Thunder BGX (programmable MAC)
- Physical Function driver
- Virtual Function driver
- Headers
Revision: 1.0
Obtained from: Cavium
License information: Cavium provided these files under BSD license
The interrupts-extended property is a list of controller-specific
interrupt tuples for more than one controller. The decode routine of
every PIC gets called in the pre-INTRNG code (nexus doesn't know which
device instance belongs to which fdt node), so the GIC code has to
check each FDT node it is asked to decode to ensure it is the owner.
Because in the pre-INTRNG world there can only be one instance of a GIC,
it's safe to cache the results of a positive lookup in a static variable
to avoid the expensive lookups on subsequent calls.
Submitted by: Svatopluk Kraus <onwahe@gmail.com>
Differential Revision: https://reviews.freebsd.org/D2345
This is the last e26a5843 patch. The general thrust of the rewrite was
to move more responsibility for Memory Window and Doorbell interrupt
management from the ntb_hw driver to if_ntb.
A number of APIs have been added, removed, or replaced. The old
DB callback mechanism has been excised. Instead, callers (if_ntb) are
responsible for configuring MWs and handling their interrupts more
directly.
This adds a tunable, hw.ntb.max_mw_size, allowing users to limit the
size of memory windows used by if_ntb (identical to the Linux modparam
of the same name).
Despite attempts to keep mechanical name changes to separate commits,
some have snuck in here. At least the driver should be much more
similar to the latest Linux one now -- making porting fixes easier.
Authored by: Allen Hubbe
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
No functional change. Part of the huge rewrite (e26a5843).
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Move all Xeon secondary register setup to the setup_b2b_mw routine. We
use subroutines to make it a bit less wordy than the Linux version.
Adds a new tunable, 'hw.ntb.b2b_mw_share'. By default, it is off
(zero). If both sides enable it (any non-zero value), the NTB driver
attempts to use only half of a memory window for remote register MMIO
access.
This is still part of the large Linux rewrite (e26a5843).
Authored by: Allen Hubbe
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
This Linux commit was more or less a rewrite. Unfortunately, the commit
log does not give a lot of context for the rewrite. I have tried to
faithfully follow the changes made upstream, including matching function
names where possible, while churning the FreeBSD driver as little as
possible.
This is the bulk of the rewrite. There are two groups of changes to
follow in separate commits: fleshing out the rest of the changes to
xeon_setup_b2b_mw(), and some changes to if_ntb.
Yes, this is a big patch (3 files changed, 416 insertions(+), 237
deletions(-)), but the Linux patch was 13 files changed, 2,589
additions(+) and 2,195 deletions(-).
Original Linux commit log:
Change ntb_hw_intel to use the new NTB hardware abstraction layer.
Split ntb_transport into its own driver. Change it to use the new NTB
hardware abstraction layer.
Authored by: Allen Hubbe
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Some interrupt-related function names changed to match Linux.
No functional change. Still part of the huge e26a5843 rewrite in Linux.
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
No functional change.
Still part of the huge e26a5843 rewrite. I'm trying to make it less of
a complete rewrite in the FreeBSD version of the driver. Still, it
helps if our names match Linux.
Obtained from: Linux (e26a5843) (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
interrupt controller.
The latter is required for INTRNG, because of the hardware erratum
workaround installed by the linux folks into the imx6 FDT data, which remaps
an ethernet interrupt to the gpio device. In the non-INTRNG world we
intercept the call to map the interrupt and map it back to the ethernet
hardware (because we don't need linux's workaround), but in the INTRNG world
we lose the hookpoint where that remapping was happening, but we gain the
ability to work the way linux does by having the gpio driver dispatch the
interrupt.
and armv6 architecures. The primary enhancement over the old design is
support for hierarchical interrupt controllers (such as a gpio driver
which can receive interrupts from a root PIC and act as a PIC itself for
clients interested in handling a change of gpio pin state as an
interrupt). The new code also provides an infrastructure for mapping
interrupts described in metadata in the form of a "controller reference
plus interrupt number" tuple into the simple "0-n" flat numeric space
understood by rman and the bus resource mechanisms.
Use of the new code is enabled by setting the ARM_INTRNG option, and by
making a few simple changes to the platform's support code. In addition
each existing PIC driver needs changes to be ready for INTRNG; this commit
contains the changes for the arm/gic driver, which most armv6 SoCs use, but
it does not enable the new code yet on any platform.
This project has been many years in the making, starting as a GSoC project
by Jakub Klama (jceel@) in 2012. That didn't get committed right away and
the source base evolved out from under it to some degree. In 2014 I rebased
the diffs to then -current and did some enhancements in the area of mapping
interrupt numbers and storing associated fdt data, then the project went
cold again for a while. Eventually Svata Kraus took that work in progress
and did another big round of work on it, removing most of the remaining
rough edges. Finally I took that and made one more pass through it, mostly
disabling the "INTR_SOLO" feature for now, pending further design
discussions on how to most efficiently dispatch a pending interrupt through
more than one layer of PIC. The current code with the INTR_SOLO feature
disabled uses approximate 100 extra cpu cycles for each cascaded PIC the
interrupt has to be passed to, so what's left to do is about efficiency, not
correct operation.
Differential Revision: https://reviews.freebsd.org/D2047
5561 support root pools on EFI/GPT partitioned disks
5125 update zpool/libzfs to manage bootable whole disk pools (EFI/GPT labeled disks)
Reviewed by: Jean McCormack <jean.mccormack@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
illumos/illumos-gate@1a902ef862
This is NOP changes for FreeBSD.
the name the function will have when the new ARM_INTRNG code is integrated,
and doing this rename first will make it easier to toggle the new interrupt
handling code on/off with a config option for debugging.
in through the stack pointer, however this may have been misaligned
causing some userland applications to crash. A workaround was committed in
r284707 where userland would check if the aux vector was passed using the
old or new ABI and adjust the stack if needed. As 4 months have passed it
is time to move to the new ABI, with the expectation the compat code in csu
and the runtime linker to be removed in the future.
Sponsored by: ABT Systems Ltd
down state.
Regression appeared in r287789, where the "prefix has no corresponding
installed route" case was forgotten. Additionally, lltable_delete_addr()
was called with incorrect byte order (default is network for lltable code).
While here, improve comments on given cases and byte order.
PR: 203573
Submitted by: phk
in vm_pageout_fallback_object_lock() and vm_pageout_page_lock(). The
check for the m->queue == queue assumes that the page does belong to a
queue.
Modify the 'unchanged' calculation bu dereferencing the marker tailq
pointers, which is known to belong to the queue. Since for a page m
linked to the queue, m->queue must be equal to the queue index, assert
this instead of checking.
In collaboration with: alc
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 2 weeks
value is defined as a config option the definition is emitted into
opt_global.h which is force-included into everything. In addition, the
symbol is emitted by the genassym mechanism, but that by its nature reduces
the value to a 0xnnnnnnnn number. When compiling a .S file you end up
with two different definitions of the macro (they evaluate to the same
number, but the text is different, upsetting the compiler).
Nothing has changed about this code for a while but the compile error is
new, so this must be fallout from the clang 3.7 update or something.
The early ethernet MACs (I think AR71xx and AR913x) require that both
TX and RX require 4-byte alignment for all packets.
The later MACs have started relaxing the requirements.
For now, the 1-byte TX and 1-byte RX alignment requirements are only for
the QCA955x SoCs. I'll add in the relaxed requirements as I review the
datasheets and do testing.
* Add a hardware flags field and 1-byte / 4-byte TX/RX alignment.
* .. defaulting to 4-byte TX and 4-byte RX alignment.
* Only enforce the TX alignment fixup if the hardware requires a 4-byte
TX alignment. This avoids a call to m_defrag().
* Add counters for various situations for further debugging.
* Set the 1-byte and 4-byte busdma alignment requirement when
the tag is created.
This improves the straight bridging performance from 130mbit/sec
to 180mbit/sec, purely by removing the need for TX path bounce buffers.
The main performance issue is the RX alignment requirement and any RX
bounce buffering that's occuring. (In a local test, removing the RX
fixup path and just aligning buffers raises the performance to above
400mbit/sec.
In theory it's a no-op for SoCs before the QCA955x.
Tested:
* QCA9558 SoC in AP135 board, using software bridging between arge0/arge1.
an extra argument to specify the number of 1GiB pages to map. This should
be a nop as we are only mapping a single page, but when we move to use an
extra level of page tables we will be able to map a second block, e.g. if
the kernel was loaded over a 1GiB boundary.
atapicd(4) has been removed since r249083, and if a system has more than one
optical drive, it will likely be /dev/cd1
Update mount.conf(8) to reflect the change in behavior
MFC after: never
Sponsored by: EMC / Isilon Storage Division
an error.
Most of these do a 'mkdir -p' or 'install -d' before installing, but add
the trailing / here for consistency with the userland install.
MFC after: 2 weeks
X-MFC-With: r289391
Sponsored by: EMC / Isilon Storage Division
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@45818ee124
This is only a partial merge of respective ZFS infrastructure changes.
At this moment FreeBSD kernel has no those crypto algorithms, so the
parts of the code to enable them are commented out. When they are
implemented, it will be trivial to plug them in.
MACHINE_CPUARCH expands to aarch64 for arm64, whereas MACHINE always
corresponds to the directory name under sys/ that contains the sources
for that architecture.
deletions. Ability to do deletions is a strong indication that this
optimization will not help performance. It will only generate extra write
traffic. These devices are typically flash based and have a limited number of
write cycles. In addition, making the file contiguous in LBA space doesn't
improve the access times from flash devices because they have no seek time.
Reviewed by: mckusick@
bug of installing 'realtek' and 'intel_iwn' as files rather then as
a 'LICENSE' file in their directories.
Also add obsolete entries for the older names and names that existed in head
for a period of time.
Suggested by: jmg
X-MFC-With: r289391
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
On the Haswell platform, a split BAR option to allow creation of 2 32bit
BARs (4 and 5) from the 64bit BAR 4. Adding support for this new option.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
This is a follow-up to r289208: "Xeon Errata Workaround."
Add logic to support a variable number of memory windows and doorbell
callbacks. This was added to the Linux driver in the "Xeon Errata
Workaround" commit, but I skipped it because it didn't look neccessary
at the time. It is needed for future Haswell split-BAR support, so
bring it in now.
A new tunable was added for if_ntb, 'hw.ntb.max_num_clients'. By
default, it is set to zero -- infer the number of clients from the
number of memory windows available from the hardware. Any other
positive value can specify a different number of clients, limited by the
number of doorbell callbacks available (4 under MSI-X, or 15 (Xeon) or
34 (SoC) under legacy INTx).
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
While trying to get multithreading working for CloudABI on aarch64, I
noticed that compare-and-exchange operations in kernelspace would always
fail. It turns out that we don't properly set the return value to 0 when
the compare and exchange succeeds.
Approved by: andrew
Differential Revision: https://reviews.freebsd.org/D3899
casuword(9) and others, use LDRT and STRT instructions to access
memory with the privileges of userspace. If the *RT instruction
faults on the kernel address, then additional checks must be done to
not confuse the VM system with invalid kernel-mode faults.
Put ARM on line with other FreeBSD architectures and disallow usermode
buffers which intersect with the kernel address space in advance,
before any accesses are performed. In other words, vm_fault(9) is no
longer called when e.g. suword(9) stores to invalid (i.e. not
userspace) address.
Also, switch ARM to use fueword(9) and casueword(9).
Note: there is a pending patch in D3617, which adds the special
processing for faults from LDRT and STRT. The addition of the
processing is useful for potential other uses of the instructions and
for completeness, but standard userspace accessors are better served
by not allowing such faults beforehand.
Reviewed by: andrew
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3816
MFC after: 2 weeks
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@9c3fd1216f
For more info, see:
- slides http://www.slideshare.net/MatthewAhrens/openzfs-send-and-receive
- video https://www.youtube.com/watch?v=iY44jPMvxog
- manpage changes (for zfs resume -s and zfs send -t)
- upcoming talk at the OpenZFS Developer Summit
The TL;DR is:
Use "zfs receive -s" to save the partially received state on failure.
On failure, get the receive token with "zfs get receive_resume_token <fs>"
Resume the send with "zfs send -t <token_value>"
Relnotes: yes
This patch adds support for the BCM57765[2] card reader function included in
Broadcom's BCM57766 ethernet/sd3.0 controller. This controller is commonly
found in laptops and Apple hardware (MBP, iMac, etc).
The BCM57765 chipset is almost fully compatible with the SD3.0 spec, but
does not support deriving a frequency below 781KHz from its default base
clock via the standard SD3.0-configured 10-bit clock divisor.
If such a divisor is set, card identification (which requires a 400KHz
clock frequency) will time out[1].
As a work-around, I've made use of an undocumented device-specific clock
control register to switch the controller to a 63MHz clock source when
targeting clock speeds below 781KHz; the clock source is likewise switched
back to the 200MHz clock when targeting speeds greater than 781KHz.
Additionally, this patch fixes a small sdhci_pci bug; the
sdhci_pci_softc->quirks flag was not copied to the sdhci_slot, resulting in
`quirk` behavior not being applied by sdhci.c.
[1] A number of Linux/FreeBSD users have noted that bringing up the chipsets'
associated ethernet interface will allow SD cards to enumerate (slowly).
This is a controller implementation side-effect triggered by the ethernet
driver's reading of the hardware statistics registers.
[2] This may also fix card detection when using the BCM57785 chipset, but I
don't have access to the BCM57785 chipset and can't verify.
I actually snagged some BCM57785 hardware recently (2012 Retina MacBook Pro)
and can confirm that this also fixes card enumeration with the BCM57785
chipset; with the patch, I can boot off of the internal sdcard reader.
PR: kern/203385
Submitted by: Landon Fuller <landon@landonf.org>
HWPMC depends on pmu.c even if device pmu is not specified.
Would be great if we could just automatically enabled "device pmu"
if we try to compile in HWPMC.
Also several old kernel cnfigurations seem to have HWPMC enabled but are
pre-FDT and thus fail. So make pmu.c depend on fdt in case of hwpmc as
well.
MFC after: 2 weeks
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D3877
Pull out read of PPD and platform detection logic to new functions,
ntb_detect_xeon(), ntb_detect_soc(). No functional change -- mostly
this is just shuffling the code to more closely match the Linux driver.
Linux commit log:
To simplify some of the platform detection code. Move the platform
detection to a function to be called earlier.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
The doorbell registers (and associated mask) are 16-bit on Xeon but
64-bit on SoC. Abstract IO access to doorbell registers with
'db_ioread' and 'db_iowrite' (names and idea borrowed from the dual
BSD/GPL Linux driver).
Sponsored by: EMC / Isilon Storage Division
Original Linux commit log:
The NTB translate register must have the value to be BAR size aligned.
This alignment check make sure that the DMA memory allocated has the
proper alignment. Another requirement for NTB to function properly with
memory window BAR size greater or equal to 4M is to use the CMA feature
in 3.16 kernel with the appropriate CONFIG_CMA_ALIGNMENT and
CONFIG_CMA_SIZE_MBYTES set.
Authored by: Dave Jiang
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
The detection of an uneven number of queues on the given memory windows
was not correct. The mw_num is zero based and the mod should be
division to spread them evenly over the mw's.
Authored by: Jon Mason
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Remap MSI-X messages over available slots rather than falling back to
legacy INTx when fewer MSI-X slots are available than were requested.
N.B. the Linux driver does *not* do this.
To aid in testing, a tunable 'hw.ntb.force_remap_mode' has been added.
It defaults to off (0). When the tunable is enabled and sufficient
slots were available, the driver restricts the number of slots by one
and remaps the MSI-X messages over the remaining slots.
In case this is actually not okay (as I don't yet have access to this
hardware to test), a tunable 'hw.ntb.prefer_intx_to_remap' has been
added. It defaults to off (0). When the tunable is enabled and fewer
slots are available than requested, fall back to legacy INTx mode rather
than attempting to remap MSI-X messages.
Suggested by: jhb
Reviewed by: jhb (earlier version)
Sponsored by: EMC / Isilon Storage Division
Consumers that registered on this bit would never see a callback and it
is likely a mistake.
This does not affect if_ntb, which limits itself to a single doorbell
callback.
The names don't line up 100% with Linux. Our routines are named
ntb_setup_interrupts, ntb_setup_xeon_msix, ntb_setup_soc_msix, and
ntb_setup_legacy_interrupt. Linux SNB = FreeBSD Xeon; Linux BWD =
FreeBSD SOC. Original Linux commit log:
This is an cleanup effort to make ntb_setup_msix() more readable - use
ntb_setup_bwd_msix() to init MSI-Xs on BWD hardware and
ntb_setup_snb_msix() - on SNB hardware.
Function ntb_setup_snb_msix() also initializes MSI-Xs the way it should
has been done - looping pci_enable_msix() until success or failure.
Authored by: Alexander Gordeev
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division