mostly by adjustments to debugging printf() format specifiers. For high
numbered LUNs, also switch to printing them in hex as per SAM-5.
MFC after: 2 weeks
from net/if_var to it own new net/ifq.h.
For now net/ifq.h is unconditionally included through net/if_var.h.
This is a mechanical change in preparation to make struct ifnet and
the individual interface queue mechanisms opaque.
Discussed with: glebius
Sponsored by: The FreeBSD Foundation
the upper 32-bits of the LUN, if possible, into the target_lun field as
passed directly from the REPORT LUNs response. This allows extended LUN
support to work for all LUNs with zeros in the lower 32-bits, which covers
most addressing modes without breaking KBI. Behavior for drivers not
setting PIM_EXTLUNS is unchanged. No user-facing interfaces are modified.
Extended LUNs are stored with swizzled 16-bit word order so that, for
devices implementing LUN addressing (like SCSI-2), the numerical
representation of the LUN is identical with and without PIM_EXTLUNS. Thus
setting PIM_EXTLUNS keeps most behavior, and user-facing LUN IDs, unchanged.
This follows the strategy used in Solaris. A macro (CAM_EXTLUN_BYTE_SWIZZLE)
is provided to transform a lun_id_t into a uint64_t ordered for the wire.
This is the second part of work for full 64-bit extended LUN support and is
designed to a bridge for stable/10 to the final 64-bit LUN code. The
third and final part will involve widening lun_id_t to 64 bits and will
not be MFCed. This third part will break the KBI but will keep the KPI
unchanged so that all drivers that will care about this can be updated now
and not require code changes between HEAD and stable/10.
Reviewed by: scottl
MFC after: 2 weeks
resist easy conversion since they implement a great deal of their attach
logic inside probe(). Some of this could be fixed by moving it to attach(),
but some requires something more subtle than BUS_PROBE_NOWILDCARD.
return BUS_PROBE_NOWILDCARD from their probe routines to avoid claiming
wildcard devices on their parent bus. Do a sweep through the MIPS tree.
MFC after: 2 weeks
Will fix RPI-B kernel build failure since it adds missing
armv6_idcache_wbinv_all which was previously taken from cpufunc_asm_pj4b.S.
Reviewed by: gber
r235816 triggered kernel panic or hang after warm boot.
Don't blindly restore BCE_EMAC_MODE media configuration in
bce_reset(). If driver is about to shutdown it will invoke
bce_reset() which in turn results in restoring BCE_EMAC_MODE
media configuration. This operation seems to confuse controller
firmware.
Reported by: Paul Herman (herman <> cleverbridge dot com)
Tested by: sbruno, Paul Herman (herman <> cleverbridge dot com)
RTL8168GU has two variants(GMII and MII) but it uses the same chip
revision id. Driver checks PCI device id of controller and
sets internal capability flag(i.e. jumbo frame and link speed down
in WOL).
H/W donated by: RealTek Semiconductor Corp.
I don't have a copy of data sheet so I'm not sure exact PHY model
name. Vendor's web page indicates RTL8251 is latest PHY so I used
the name. This PHY is used with RTL8168G, RTL8168GU and RTL8411B.
the rate is 11n, rather than whether the channel is 11n.
This correctly allows the PLCP lookup code to return the legacy rates
even on an 11n channel.
PR: kern/183430
Use values of the correct defines to determine statement's result.
ARM_ARCH_ symbols are always defined, hence only values are relevant.
Reviewed by: cognet
Sheeva PJ4Bv6 - based chips were only prototypes for V7 class Armada
SoC family. Current in-tree support for PJ4Bv6 will not work and also
there should be no platforms in active use that would incorporate that
CPU revision.
Loading kernel to 0xf00000 has no practical reason.
Starting it from the u-boot's highest possible end address
(2MB counting from 0x0) makes more sense.
Tested by: kevlo
Depending on u-boot's flavor some boards have their SoC registers
base address configured to 0xD0000000 and other to 0xF1000000.
U-boot is passing currently set value via CP15 register.
In order to create proper mapping for SoC registers and allow further
successful initialization it is necessary to replace fdt_immr_pa with
the real value and eventually fix-up device tree blob.
Tested by: kevlo
Armada XP initialization flow requires SoC registers to be
mapped very early in order to configure Snoop Filter for SMP.
Additional mapping in locore.S is redundant as proper mapping is
made in pmap_devmap_bootstrap() prior to calling cpu_setup() which
configures the Snoop Filter.
For secondaru CPUs it is better to pass VA of the SoC
registers defined in MV_BASE and PA consistent with the value
in the Device Tree.
Tested by: kevlo
The ng_create_one() and ng_mkpeer() functions in network.subr are
now not used anywhere, but I left them, since they can be useful
in future in netgraph scripting.
Submitted by: pluknet
1.3 of Intelб╝ Virtualization Technology for Directed I/O Architecture
Specification. The Extended Context and PASIDs from the rev. 2.2 are
not supported, but I am not aware of any released hardware which
implements them. Code does not use queued invalidation, see comments
for the reason, and does not provide interrupt remapping services.
Code implements the management of the guest address space per domain
and allows to establish and tear down arbitrary mappings, but not
partial unmapping. The superpages are created as needed, but not
promoted. Faults are recorded, fault records could be obtained
programmatically, and printed on the console.
Implement the busdma(9) using DMARs. This busdma backend avoids
bouncing and provides security against misbehaving hardware and driver
bad programming, preventing leaks and corruption of the memory by wild
DMA accesses.
By default, the implementation is compiled into amd64 GENERIC kernel
but disabled; to enable, set hw.dmar.enable=1 loader tunable. Code is
written to work on i386, but testing there was low priority, and
driver is not enabled in GENERIC. Even with the DMAR turned on,
individual devices could be directed to use the bounce busdma with the
hw.busdma.pci<domain>:<bus>:<device>:<function>.bounce=1 tunable. If
DMARs are capable of the pass-through translations, it is used,
otherwise, an identity-mapping page table is constructed.
The driver was tested on Xeon 5400/5500 chipset legacy machine,
Haswell desktop and E5 SandyBridge dual-socket boxes, with ahci(4),
ata(4), bce(4), ehci(4), mfi(4), uhci(4), xhci(4) devices. It also
works with em(4) and igb(4), but there some fixes are needed for
drivers, which are not committed yet. Intel GPUs do not work with
DMAR (yet).
Many thanks to John Baldwin, who explained me the newbus integration;
Peter Holm, who did all testing and helped me to discover and
understand several incredible bugs; and to Jim Harris for the access
to the EDS and BWG and for listening when I have to explain my
findings to somebody.
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
it had no hooks. It has abused ifnet's if_afdata slot and actually
abused every subsystem it touched.
lagg(4) is a proper trunking solution at ifnet(9) layer.
ng_one2many(4) is a proper trunking solution in netgraph(4).
from if.h.
- Remove unnecessary includes and declarations from if.h
- Remove unnecessary includes and declarations from if_var.h [1]
- Mark some declarations that are about to be removed in near
future with comments, explaning why this declaration is still
necessary.
- Protect eventhandler declarations with #ifdef SYS_EVENTHANDLER_H.
Obtained from: bdeBSD [1]
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
bpf(4) and vlan(4) related event declarations to bpf.h and
if_vlan_var.h. To avoid dependency on eventhandler.h, protect
these declarations with ifdef SYS_EVENTHANDLER_H.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
slightly unnerving.
In file included from ioctl.c:48:
/var/tmp/home/sbruno/bsd/head/tmp/usr/include/dev/lmc/if_lmc.h:939:13:
warning: no previous extern declaration for non-static variable 'ssi_cables'
[-Wmissing-variable-declarations]
const char *ssi_cables[] =
busdma implementations to coexist. Copy busdma_machdep.c to
busdma_bounce.c, which is still a single implementation of the busdma
interface on x86 for now. The busdma_machdep.c only contains common
and dispatch code.
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
vm_pages. Provide trivial implementation which forwards the load to
_bus_dmamap_load_phys() page by page. Right now all architectures use
bus_dmamap_load_ma_triv().
Tested by: pho (as part of the functional patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Original log:
Make sure pd2 has a pointer to the icmp header in the payload; fixes
panic seen with some some icmp types in icmp error message payloads.
Obtained from: OpenBSD
Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type
in one port of the state key, using the type to determine which
side should be the id, and which should be the type. Also:
- Handle ICMP6 messages which are typically sent to multicast
addresses but recieve unicast replies, by doing fallthrough lookups
against the correct multicast address. - Clear up some mistaken
assumptions in the PF code:
- Not all ICMP packets have an icmp_id, so simulate
one based on other data if we can, otherwise set it to 0.
- Don't modify the icmp id field in NAT unless it's echo
- Use the full range of possible id's when NATing icmp6 echoy
Difference with OpenBSD version:
- C99ify the new code
- WITHOUT_INET6 safe
Reviewed by: glebius
Obtained from: OpenBSD
In report_progress(), use nitems(progress_track) instead of manually
hard-coding array size. Wrap long line.
In blk_write(), code verifies that ptr and pa cannot be non-zero
simultaneously. The later check for the page-alignment of the ptr
argument never triggers due to pa != 0 always implying ptr == NULL. I
believe that the intent was to ensure that physicall address passed is
page-aligned, since the address is (temporary) mapped for the duration
of the page write.
Clear the progress_track.visited fields when starting minidump. If
minidump is restarted or taken second time during the system lifetime,
progress is not printed otherwise, making operator suspectible to the
dump status.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
in net, to avoid compatibility breakage for no sake.
The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.
Discussed with: bz
the code executed in the context of debugger, do not be ashamed to
inform loudly about the re-entry. Also, print the backtrace before
obliterating current stack with longjmp, allowing the operator to see
a place which caused the bug.
The change should make it less mysterious debugging the ddb itself.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The only remaining user was the code that allocates bounce pages for armv4
busdma. It's not clear why bounce pages would need uncached memory, but
if that ever changes, kmem_alloc_attr() would be the way to get it.
really need it. That would be almost everywhere it was included. Add
it in a couple files that really do need it and were previously getting
it by accident via another header.
included by vm/pmap.h, which is a prerequisite for arm/machine/pmap.h
so there's no reason to ever include it directly.
Thanks to alc@ for pointing this out.
of the address space downwards, and then returning the lowest mapped
device address from initarm_lastaddr(). This adds over 500MB of kva
space compared to the old way of hardcoding the end address as 0xE0000000.
Also, pre-map most of the SoC's common memory-mapped devices using 1MB
section mappings so that all device access uses just a few TLB entries.
Graphics devices aren't mapped this way yet, but probably should be.
To provide this new functionality without pasting identical code into
multiple imxNN_machdep.c files, rework the imx machdep code so that
things common to the whole family of SoCs are in a new imx_machdep.c file.
The rewritten imxNN_machdep.c files contain just things specific to an
individual SoC.
previous KVA allocations (which the PMAP lazily invalidates) in TLB0 could
shadow device maps in TLB1. Add a big block comment about some of the
caveats with this approach.
device tree into r3. Rather than worrying about mapping that tree, reserving
its space in the global physical memory space, etc., just copy it to some
memory after the kernel.
- Provide pf_altq.h that has only stuff needed for ALTQ.
- Start pf.h, that would have all constant values and
eventually non-kernel structures.
- Build ALTQ w/o pfvar.h, include if_var.h, that before
came via pollution.
- Build tcpdump w/o pfvar.h.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
interface. Make MII drivers forget about 'struct ifnet'.
Later plan is to provide an administrative downcall from ifnet
layer into drivers, to inform them about administrative status
change. If someone thinks that processing MII events for an
administratively down interface is a big problem, then drivers
would turn MII processing off.
The following MII drivers do evil things, like strcmp() on
driver name, so they still need knowledge of ifnet and thus
include if_var.h. They all need to be fixed:
sys/dev/mii/brgphy.c
sys/dev/mii/e1000phy.c
sys/dev/mii/ip1000phy.c
sys/dev/mii/jmphy.c
sys/dev/mii/nsphy.c
sys/dev/mii/rgephy.c
sys/dev/mii/truephy.c
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
/chosen, following the list of allowed console properties in ePAPR. Also
do not require that stdin be defined and equal to stdout: stdin is
nonstandard (for ePAPR) and console in an unexpected place is after all
better than no console.
- Remove explicit requirement that the SOC registers be found except as an
optimization (although the MPC85XX LAW drivers still require they be found
externally, which should change).
- Remove magic CCSRBAR_VA value.
- Allow bus_machdep.c's early-boot code to handle non 1:1 mappings and
systems not in real-mode or global 1:1 maps in early boot.
- Allow pmap_mapdev() on Book-E to reissue previous addresses if the
area is already mapped. Additionally have it check all mappings, not
just the CCSR area.
This allows the console on e500 systems to actually work on systems where
the boot loader was not kind enough to set up a 1:1 mapping before starting
the kernel.
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
All Armada XP chips should be affected. It is necessary to handle
busy interrupt/indication by enabling busy-detect property in DTS.
Tested by: kevlo
Approved by: cognet (mentor)
When using DW UART with BUSY detection it is necessary to wait
until all serial transfers are finished before manipulating the
line control. LCR will not be affected when UART is busy.
In addition, if Divisor Latch Access Bit is being set in order to
modify UART divisors:
1. We will get BUSY interrupt if interrupts are enabled.
2. Because LCR will not be affected the THR and (even worse) IER
contents will be corrupted. This will lead to console hang.
Approved by: cognet (mentor)
Use it universally. Book-E traps may also need revisiting due to the
introduction of fixed-offset traps and the deprecation of IVORs in POWER
ISA 2.06, but that's very much an issue for another day.
is slightly more complicated and is left unimplemented for now. Also
prevent pmap_mapdev() from mapping over the kernel and KVA regions if
devices happen to have high physical addresses.
MFC after: 2 weeks
* Remove the unused sdt cdev.
* Don't bother keeping a list of probes in struct sdt_prov; it's not needed.
* Invoke sdt_load and sdt_unload from the module handler instead of
registering separate SYSINITs.
* Keep to within 80 columns.
* Check for errors from dtrace_unregister().
the code was trying to save the stack pointer rather than the frame pointer,
and the arguments to copyout(9) were reversed, so nothing ended up being
saved on the stack. This would cause process crashes when the pid provider
was being used to instrument calls of a function starting with this
instruction.
Reported by: symbolics@gmx.com
Tested by: symbolics@gmx.com (earlier version)
MFC after: 2 weeks
This is "STA invalid". I saw it during some 4965 testing (kern/183260)
and I still have no idea what is causing it.
Obtained from: Linux drivers/net/wireless/iwlegacy
Some firmware versions seem to get very unhappy if they're sent btcoex
commands when they don't actually have bluetooth hardware in them.
So, disable sending them those commands.
Tested:
* 5100 (which has bluetooth, no problems)
* 4965 (which doesn't have bluetooth, but didn't seem to crash)
* 6200 (no bluetooth, seems to get unhappy being sent bluetooth commands.)
index lookups.
* My recent(ish) change to iwn(4) and the net80211 rate control API to
support 11n rates broke the link quality table use. So, until I or
someone else decides to fix it, let's just disable it for now.
* Teach iwn_tx_data_raw() to use the iwn_rate_to_plcp() function.
* Eliminate two uses of the net80211 rate index lookup functions - they
are only for legacy rates and they're not needed here.
This fixes some invalid looking rate control TX issues that showed up
on my 4965 but it doesn't fix the two TX hangs I've noticed. Those look
like DMA related issues.
Tested:
* 4965, STA mode
* 5100, STA mode
associates compat strings with arbitrary values that mean something to
the driver. This is handy for drivers that support several variations
of similar hardware and need to know which one matched.
Reviewed by: imp, jmg, nwhitehorn
negative diff that should improve reliability somewhat. There should be
no differences in behavior -- please report any that crop up. This has been
tested on ARM and PPC systems.
Tested by: ray
board.
This is another AR9331 board similar to the Carambola2. It has different
ethernet and LED wiring though.
They make a variety of boards that mostly differ on the amount of RAM/flash
available. Alfa Networks graciously donated a handful of 64MB RAM/16MB flash
boards so I can finish off 802.11s support for the AR93xx chips and do up
a tech demonstration with it.
This is enough to bring up the board.
Tested:
* Alfa networks UB Hornet board - 64MB ram, 16MB flash version.
Thankyou to Alfa Networks for the development boards!
Sponsored by: Alfa Networks (hardware only)
crazy readings occasionally. One wild reading should not be enough to
trigger a shutdown, so instead wait for several concerning readings in
a row.
PR: powerpc/180593
Submitted by: Julio Merino
MFC after: 1 week
created after each other e.g.
ifconfig tap0
ifconfig vmnet0
<panic>
Appears to be a cut'n'paste error from the tap code to the vmnet
code where the name string wasn't updated in the call to make_dev().
Reviewed by: glebius
MFC after: 3 days
referenced by pointer, making it non-static should not have even the
negligible impact on the existing code.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
from a management frame transmission.
This bug is a bit loopy, so here goes.
The underlying cause is pretty easy to understand - the node isn't
referenced before passing into the callout, so if the node is deleted
before the callout fires, it'll dereference free'd memory.
The code path however is slightly more convoluted.
The functions _say_ mgt_tx - ie management transmit - which is partially
true. Yes, that callback is attached to the mbuf for some management
frames. However, it's only for frames relating to scanning and
authentication attempts. It helpfully drives the VAP state back to
"SCAN" if the transmission fails _OR_ (as I subsequently found out!)
if the transmission succeeds but the state machine doesn't make progress
towards being authenticated and active.
Now, the code itself isn't terribly clear about this.
It _looks_ like it's just handling the transmit failure case.
However, when you look at what goes on in the transmit success case, it's
moving the VAP state back to SCAN if it hasn't changed state since
the time the callback was scheduled. Ie, if it's in ASSOC or AUTH still,
it'll go back to SCAN. But if it has transitioned to the RUN state,
the comparison will fail and it'll not transition things back to the
SCAN state.
So, to fix this, I decided to leave everything the way it is and merely
fix the locking and remove the node reference.
The _better_ fix would be to turn this callout into a "assoc/auth request"
timeout callback and make the callout locked, thus eliminating all races.
However, until all the drivers have been fixed so that transmit completions
occur outside of any locking that's going on, it's going to be impossible
to do this without introducing LORs. So, I leave some of the evilness
in there.
Tested:
* AR5212, ath(4), STA mode
* 5100 and 4965 wifi, iwn(4), STA mode
BUS_PROBE_GENERIC and not BUS_PROBE_SPECIFIC (0) so the OFW SPI bus can
attach when enabled. Export the spibus devclass_t and driver_t
declarations.
Submitted by: ray
Approved by: adrian (mentor)
buffer. While here, make the code flow somewhat nicer.
Thanks to mav@ for tracking it down.
Tested by: mav
MFC after: 3 days
Sponsored by: FreeBSD Foundation
- Replace ordered_tag_count counter with single flag;
- From da remove outstanding_cmds counter, duplicating pending_ccbs list;
- From da_softc remove unused links field.
di_extsize is the EA size and as such it should be unsigned.
Adjust related types for consistency.
Reviewed by: mckusick (previous version)
MFC after: 3 weeks
Change 221534 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/01/27 16:05:30
FreeBSD/mips stores page-table entries in a near-identical format
to MIPS TLB entries -- only it overrides certain "reserved" bits
in the MIPS-defined EntryLo register to hold software-defined bits
(swbits) to avoid significantly increasing the page table memory
footprint. On n32 and n64, these bits were (a) colliding with
MIPS64r2 physical memory extensions and (b) being improperly
cleared.
Attempt to fix both of these problems by pushing swbits further
along 64-bit EntryLo registers into the reserved space, and
improving consistency between C-based and assembly-based clearing
of swbits -- in particular, to use the same definition. This
should stop swbits from leaking into TLB entries -- while ignored
by most current MIPS hardware, this would cause a problem with
(much) larger physical memory sizes, and also leads to confusing
hardware-level tracing as physical addresses contain unexpected
(and inconsistent) higher bits.
Discussed with: imp, jmallett
Change 1187301 by brooks@brooks_zenith on 2013/10/23 14:40:10
Loop back the initial commit of 221534 to HEAD. Correct its
implementation for mips32.
MFC after: 3 days
Sponsored by: DARPA/AFRL
- ofw_bus_map_intr()
Maps an (iparent, IRQ) tuple to a system-global interrupt number in some
platform dependent way. This is meant to be implemented as a replacement
for [FDT_]MAP_IRQ() that is an MI interface that knows about the bus
hierarchy.
- ofw_bus_config_intr()
Configures an interrupt (previously mapped) based on firmware sense flags.
This replaces manual interpretation of the sense field in bus drivers and
will, in a follow-up, allow that interpretation to be redirected to the PIC
drivers where it belongs. This will eventually replace the tables in
/sys/dev/fdt/fdt_ARCH.c
The PowerPC/AIM code has been converted to use these globally, with an
implementation in terms of MAP_IRQ() and powerpc_config_intr(), assuming
OpenPIC, at the bus root in nexus(4). The ofw_bus_config_intr() will shortly
be integrated into pic_if.m and bounced through nexus into the PIC tree.
FDT integration will happen significantly later due to larger testing
requirements. This patch in general also lays the groundwork for the removal
of /sys/dev/fdt/fdt_ARCH.c and machine/fdt.h.
information.
The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.
The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.
Within the locality calculation additional knowledge about the underlying vdev
is considered such as; is the device backing the vdev a rotating media device.
This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.
The following are results from a setup with 3 Way Mirror with 2 x HD's and
1 x SSD from a basic test running multiple parrallel dd's.
With pre-fetch disabled (vfs.zfs.prefetch_disable=1):
== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s
With pre-fetch enabled (vfs.zfs.prefetch_disable=0):
== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s
In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.
The following additional sysctls where added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc
These changes where based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487
Reviewed by: gibbs, mav, will
MFC after: 2 weeks
Sponsored by: Multiplay
Change 221669 by bz@bz_zenith on 2013/02/01 12:26:04
Run the initialization for polling earlier along with INTRs
so that we can put network interface into polling mode by default
if DEVICE_POLLING is compiled in and no interrupts are available.
MFC after: 3 days
Sponsored by: DARPA/AFRL
- The new allocator won't return coherent memory for any size > PAGE_SIZE,
so don't assume we have coherent memory, and explicitely use
bus_dmamap_sync().
Change 221767 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/02/05 14:18:53
When printing out information on a TLB MOD exception for a user
process (e.g., an attempt to write to a read-only page), report
it as a "write" in the console message, rather than "unknown".
Change 221768 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/02/05 14:28:00
Fix post-compile but pre-commit typo in last changeset.
MFC after: 3 days
Sponsored by: DARPA/AFRL
and OF_searchencprop(). I thought about using the element size parameter
to OF_getprop_alloc() to do endian-switching automatically, but it breaks
use with structs and a *lot* of FDT code (which can hopefully be moved to
these new APIs).
MFC after: 2 weeks
Change 231031 by brooks@brooks_zenith on 2013/07/11 16:22:08
Turn the unused and uncompilable MIPS_DISABLE_L1_CACHE define in
cache.c into an option and when set force I- and D-cache line
sizes to 0 (the latter part might be better as a tunable).
Fix some casts in an #if 0'd bit of code which attempts to
disable L1 cache ops when the cache is coherent.
Sponsored by: DARPA/AFRL
Change 228019 by bz@bz_zenith on 2013/04/23 13:55:30
Add kernel side support for large TLB on BERI/CHERI.
Modelled similar to NLM
MFC after: 3 days
Sponsored by: DAPRA/AFRL
Change 221534 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/01/27 16:05:30
FreeBSD/mips stores page-table entries in a near-identical format
to MIPS TLB entries -- only it overrides certain "reserved" bits
in the MIPS-defined EntryLo register to hold software-defined bits
(swbits) to avoid significantly increasing the page table memory
footprint. On n32 and n64, these bits were (a) colliding with
MIPS64r2 physical memory extensions and (b) being improperly
cleared.
Attempt to fix both of these problems by pushing swbits further
along 64-bit EntryLo registers into the reserved space, and
improving consistency between C-based and assembly-based clearing
of swbits -- in particular, to use the same definition. This
should stop swbits from leaking into TLB entries -- while ignored
by most current MIPS hardware, this would cause a problem with
(much) larger physical memory sizes, and also leads to confusing
hardware-level tracing as physical addresses contain unexpected
(and inconsistent) higher bits.
Discussed with: imp, jmallett
MFC after: 3 days
Sponsored by: DARPA/AFRL
by encode-int. Specifically, it takes a set of 32-bit cell values and
changes them to host byte order. Most non-string instances of OF_getprop()
should be using this function, which is a no-op on big-endian platforms.
segments thinking it received only one segment. This causes it to enable
the delay the ACK for 100ms to wait for another segment which may never
come because all the data was received already.
Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes
us further away from acking every other packet; b) it introduces additional
delay in responding to the sender. The latter is especially bad because it
is in the nature of LRO to aggregated all segments of a burst with no more
coming until an ACK is sent back.
Change the delayed ACK logic to detect LRO segments by being larger than
the MSS for this connection and issuing an immediate ACK for them to keep
the ACK clock ticking without interruption.
Reported by: julian, cperciva
Tested by: cperciva
Reviewed by: lstewart
MFC after: 3 days
Switch the majority of device configuration to FDT from hints.
Add BERI_*_BASE configs to reduce duplication in the MDROOT and SDROOT
kernels.
Add NFS and GSSAPI support by default.
MFC after: 3 days
Sponsored by: DARPA/AFRL
230523, 1123614
Implement a driver for Robert Norton's PIC as an FDT interrupt
controller. Devices whose interrupt-parent property points to a beripic
device will have their interrupt allocation, activation , and setup
operations routed through the IC rather than down the traditional bus
hierarchy.
This driver largely abstracts the underlying CPU away allowing the
PIC to be implemented on CPU's other than BERI. Due to insufficient
abstractions a small amount of MIPS specific code is currently required
in fdt_mips.c and to implement counters.
MFC after: 3 days
Sponsored by: DARPA/AFRL
- Use bus reference phandles in place of FDT offsets as IRQ domain keys
- Unify the identical macio/fdt/mambo OpenPIC drivers into one
- Be more forgiving (following ePAPR) about what we need from the device
tree to identify an OpenPIC
- Correctly map all IRQs into an interrupt domain
- Set IRQ_*_CONFORM for interrupts on an unknown PIC type instead of
failing attachment for that device.
zio_compress_data(..) when compressing l2arc buffers.
This eliminates l2arc I/O errors, which resulted in very poor performance on
vdev's configured with block size greater than 512b due to compression
assuming a smaller min block size than the vdev supports.
MFC after: 2 days
cam_periph_acquire() can return error if periph already invalidated, but
that may be unacceptable and cause deadlock if the invalidated periph can't
be destroyed without "executing" the scheduled request.
Coverity CID: 1109822
MFC after: 2 months
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.
The defined now safety requirements are:
- caller should not hold any locks and should be reenterable;
- callee should not depend on GEOM dual-threaded concurency semantics;
- on the way down, if request is unmapped while callee doesn't support it,
the context should be sleepable;
- kernel thread stack usage should be below 50%.
To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
- G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
- G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
- G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
- G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe. If any of requirements are not met, request is queued to
g_up or g_down thread same as before.
Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).
To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.
This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).
Sponsored by: iXsystems, Inc.
MFC after: 2 months
This shuld have been a problem since r230587. Not exactly sure why it
was not detected the last weeks with the tinderbox. I would assume
r255744 is what started to cause it.
MFC after: 1 week
the cfi(4) driver. It remained in the tree longer than would be ideal
due to the time required to bring cfi(4) to feature parity.
Sponsored by: DARPA/AFRL
MFC after: 3 days
Implement support for interrupt-parent nodes in simplebus. The current
implementation requires that device declarations have an interrupt-parent
node and that it point to a device that has registered itself as a
interrupt controller in fdt_ic_list_head and implements the fdt_ic
interface.
Sponsored by: DARPA/AFRL
user. Kqueue now saves the ucred of the allocating thread, to
correctly decrement the counter on close.
Under some specific and not real-world use scenario for kqueue, it is
possible for the kqueues to consume memory proportional to the square
of the number of the filedescriptors available to the process. Limit
allows administrator to prevent the abuse.
This is kernel-mode side of the change, with the user-mode enabling
commit following.
Reported and tested by: pho
Discussed with: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
already. Also, according to the specs, GDRST register is not in the
power well, so the forcewake for reset status read is excessive for
this reason.
Use plain register read for waiting of the reset completion
notification, to avoid gt_lock recursion. Linux upstream did the
similar change, but their code was already restructured.
Reported by: ray
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
reduce lock congestion and improve SMP scalability of the SCSI/ATA stack,
preparing the ground for the coming next GEOM direct dispatch support.
Replace big per-SIM locks with bunch of smaller ones:
- per-LUN locks to protect device and peripheral drivers state;
- per-target locks to protect list of LUNs on target;
- per-bus locks to protect reference counting;
- per-send queue locks to protect queue of CCBs to be sent;
- per-done queue locks to protect queue of completed CCBs;
- remaining per-SIM locks now protect only HBA driver internals.
While holding LUN lock it is allowed (while not recommended for performance
reasons) to take SIM lock. The opposite acquisition order is forbidden.
All the other locks are leaf locks, that can be taken anywhere, but should
not be cascaded. Many functions, such as: xpt_action(), xpt_done(),
xpt_async(), xpt_create_path(), etc. are no longer require (but allow) SIM
lock to be held.
To keep compatibility and solve cases where SIM lock can't be dropped, all
xpt_async() calls in addition to xpt_done() calls are queued to completion
threads for async processing in clean environment without SIM lock held.
Instead of single CAM SWI thread, used for commands completion processing
before, use multiple (depending on number of CPUs) threads. Load balanced
between them using "hash" of the device B:T:L address.
HBA drivers that can drop SIM lock during completion processing and have
sufficient number of completion threads to efficiently scale to multiple
CPUs can use new function xpt_done_direct() to avoid extra context switch.
Make ahci(4) driver to use this mechanism depending on hardware setup.
Sponsored by: iXsystems, Inc.
MFC after: 2 months
Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodonne() when unmapping
temporary mapped buffer. That fixes double unmap if biodone() called twice
for the same BIO (but with different done methods).
Move mapping removal before calling bio_done() method. I believe that it is
very wrong to do anything to BIO after reporting completion. kib@ thinks
it was done for some forgotten now case when bio_done() method needed mapped
buffer. But 1) if BIO was sent as unmapped, then IMO done() should be called
in the same way; 2) IMO there is no guatantee that buffer will be mapped at
this point at all, for example, if all underlying stack supports unmapped
I/O, so bio_done() handler can not expect that.
the register based on the argument index rather than relying on the fields
in struct reg to be in the right order. This assumption is incorrect on
FreeBSD and generally led to bogus argument values for the sixth argument
of PID and USDT probes; the first five are passed directly to dtrace_probe()
via the fasttrap trap handler and so were correctly handled.
MFC after: 2 weeks
single kernel-wide soft update lock can be replaced with a
per-filesystem soft-updates lock. This per-filesystem lock will
allow each filesystem to have its own soft-updates flushing thread
rather than being limited to a single soft-updates flushing thread
for the entire kernel.
Move soft update variables out of the ufsmount structure and into
their own mount_softdeps structure referenced by ufsmount field
um_softdep. Eventually the per-filesystem lock will be in this
structure. For now there is simply a pointer to the kernel-wide
soft updates lock.
Change all instances of ACQUIRE_LOCK and FREE_LOCK to pass the lock
pointer in the mount_softdeps structure instead of a pointer to the
kernel-wide soft-updates lock.
Replace the five hash tables used by soft updates with per-filesystem
copies of these tables allocated in the mount_softdeps structure.
Several functions that flush dependencies when too many are allocated
in the kernel used to operate across all filesystems. They are now
parameterized to flush dependencies from a specified filesystem.
For now, we stick with the round-robin flushing strategy when the
kernel as a whole has too many dependencies allocated.
While there are many lines of changes, there should be no functional
change in the operation of soft updates.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
from the kernel build until it is ready.
sys/conf/files:
Remove the entry for xen/evtchn/evtchn_dev.c so it is not included
in any kernel builds.
Noticed by: smh
Add KASSERTS that soft dependency functions only get called
for filesystems running with soft dependencies. Calling these
functions when soft updates are not compiled into the system
become panic's.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
Ensure that softdep_unmount() and softdep_setup_sbupdate()
only get called for filesystems running with soft dependencies.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
Freescale SoCs including the i.MX series. This also works for the newer
SoCs with the ENET gigabit controller, but doesn't use any of the new
hardware features other than enabling gigabit speed.