than required for handling MAXPHYS and report the resulting maximum
I/O size to CAM instead of implicitly limiting it to DFLTPHYS.
- Move the variables of sym_action2() out of nested scope as required
by style(9) and remove extraneous curly braces.
- Replace a magic value for PCIR_COMMAND with the appropriate macro.
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
Tested with a HBA donated by wilko.
MFC after: 3 days
longer uses the active and inactive paging queues. Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.
Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero. However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed. The new
pmap_pv_reclaim() doesn't even try.
MFC after: 5 weeks
These aren't strictly needed at the moment as we're not doing APSM
and forcing the NIC in and out of network sleep. But, they don't hurt.
Tested:
* AR9280 (mini-PCIe)
Obtained from: Qualcomm Atheros, Linux ath9k
* Now that ah_configPCIE is called for both power on and suspend/resume,
make sure the right bit(s) are cleared and set when suspending and
resuming. Specifically:
+ force disable/enable the PCIe PHY upon suspend/resume;
+ reprogram the PCIe WAR register when resuming and upon power-on.
* Add a recipe which powers down any PCIe PHY hardware inside the AR5416
(which is the PCI variant) to save on power. I have (currently) no way
to test exactly how much power is saved, if any.
Tested on:
* AR5416 cardbus - although unfortunately pccard/cbb/cardbus currently
detaches the NIC upon suspend, I don't think it's a proper test case.
* AR5418 PCIe attached to expresscard - since we're not doing PCIe APSM,
it's also not likely a full/good test case.
In both instances I went through a handful of suspend/resume cycles and
ensured that the STA vap reassociated correctly.
TODO:
* Setup a laptop to simply sit in a suspend/resume loop, making sure that
the NIC always correctly comes back;
* Start doing suspend/resume tests with actual traffic going on in the
background, as I bet this process is all quite racy at the present;
* Test adhoc/hostap mode, just to be completely sure it's working correctly;
* See if I can jury rig an external power source to an AR5416 to test out
whether ah_disablePCIE() works.
Obtained from: Qualcomm Atheros
* Add some other WAR bits (very usefully described too) in preparation for
porting over some suspend/resume fixes from ath9k/Atheros.
Obtained from: Qualcomm Atheros
Use M_ZERO with malloc rather than calling bzero() ourselves.
Change if () panic() checks to KASSERT()s as they are only
catching invariants in code flow but not dependent on network
input/output.
Move initial assigments indirecting pointers after the lock
has been aquired.
Passing layer boundries, reset M_PROTOFLAGS.
Remove a NULL assignment before free.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Properly protect the inp read access when handling the control code.
In the past this was expensive but given the rlock it's not so much
anymore.
Spotted while: optimizing udp6
Discussed with: rwatson (a few months ago)
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
PMP ports such as PMP configuration or SEMB should be exposed or hidden.
These ports were always hidden before as useless and sometimes promatic.
But with updated ses driver supporting SEMB it is no longer so straight.
Keep ports hidden by default to avoid probe request ttimeouts if SEP is
not connected to PMP's SEMB via I2C, that is very often situation.
queue the packet for LRO and tell the driver to directly pass it on.
This avoids re-assembly and later re-fragmentation problems when
forwarding.
It's not the best solution but the simplest and most effective for
the moment.
Should have been done: ages ago
Discussed with and by: many
MFC after: 3 days
process exit. Instead use CAM's standard reference counting to prevent
periph going away until process won't complete. I think that sleep in
single CAM SWI thread is not a good idea and may lead to deadlocks if
daemon process waits for some command completion. Combined with recent
patch avoiding use of CAM SWI for ATA it just causes panics because of
sleeps prohibited in interrupt thread context.
This combination doesn't make sense, unit numbers should be hardwired
only in context of a known driver. The wildcard devices should have
wildcard unit numbers.
Reviewed by: jhb
MFC after: 2 weeks
not to disable the PCIe PHY in prepration for reset.
Extend the enablepci method to have a "poweroff" flag, which if equal
to true means the hardware is about to go to sleep.
Add TSO6 and LRO/IPv6 support.
Fix the module Makefile to at least properly inlcude opt_inet6.h
and allow builds without INET or INET6.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Allow LRO to work on IPv6 as well.
Fix the module Makefile to at least properly inlcude opt_inet6.h
and allow builds without INET or INET6.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Factor out Hop-By-Hop option processing. It's still not heavily used,
it reduces the footprint of ip6_input() and makes ip6_input() more
readable.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Add code to handle pre-checked TCP checksums as indicated by mbuf
flags to save the entire computation for validation if not needed.
In the IPv6 TCP output path only compute the pseudo-header checksum,
set the checksum offset in the mbuf field along the appropriate flag
as done in IPv4.
In tcp_respond() just initialize the IPv6 payload length to 0 as
ip6_output() will properly set it.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Simple yet effective change enabling checksum "offload" on loopback
for IPv6 to avoid expensive computations.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Defer checksum calulations on UDP6 output and respect the mbuf
flags set by NICs having done checksum validation for us already,
thus saving the computing time in the input path as well.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Add support for delayed checksum calculations in the IPv6
output path. We currently cannot offload to the card if we
add extension headers (which incl. fragmentation).
Fix two SCTP offload support copy&paste bugs: calculate
checksums if fragmenting and no need to flag IPv4 header
checksums in the IPv6 forwarding path.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
* Flesh out the pcie disable method for 11n chips, as they were defaulting
to the AR5212 (empty) PCIe disable method.
* Add accessor macros for the HAL PCIe enable/disable calls.
* Call disable on ath_suspend()
* Call enable on ath_resume()
NOTE:
* This has nothing to do with the NIC sleep/run state - the NIC still
will stay in network-run state rather than supporting network-sleep
state. This is preparation work for supporting correct suspend/resume
WARs for the 11n PCIe NICs.
TODO:
* It may be feasible at this point to keep the chip powered down during
initial probe/attach and only power it up upon the first configure/reset
pass. This however would require correct (for values of "correct")
tracking of the NIC power configuration state from the driver and that
just isn't attempted at the moment.
Tested:
* AR9280 on my Lenovo T60, but with no suspend/resume pass (yet).
Hide the ip6aux functions. The only one referenced outside ip6_input.c
is not compiled in yet (__notyet__) in route6.c (r235954). We do have
accessor functions that should be used.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
X-MFC: KPI?
Simplify the code removing a return from an earlier else case,
not differing from the default function return called now.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
We currently nowhere set IP6A_SWAP making the entire check useless
with the current code. Keep around but do not compile in.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
No need to hold the (expensive) rt lock over (expensive) logging.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Factor out the tcp_hc_getmtu() call. As the comments say it
applies to both v4 and v6, so only write it once making it easier
to read the protocol family specifc code.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Significantly update tcp_lro for mostly two things:
1) introduce basic support for IPv6 without extension headers.
2) try hard to also get the incremental checksum updates right,
especially also in the IPv4 case for the IP and TCP header.
Move variables around for better locality, factor things out into
functions, allow checksum updates to be compiled out, ...
Leave a few comments on further things to look at in the future,
though that is not the full list.
Update drivers with appropriate #includes as needed for IPv6 data
type in LRO.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
in_cksum.h required ip.h to be included for struct ip. To be
able to use some general checksum functions like in_addword()
in a non-IPv4 context, limit the (also exported to user space)
IPv4 specific functions to the times, when the ip.h header is
present and IPVERSION is defined (to 4).
We should consider more general checksum (updating) functions
to also allow easier incremental checksum updates in the L3/4
stack and firewalls, as well as ponder further requirements by
certain NIC drivers needing slightly different pseudo values
in offloading cases. Thinking in terms of a better "library".
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
1. Define all registers. These definitions are needed to support
the FCM driver for direct-connect NAND.
2. Repurpose lbc_read_reg() and lbc_write_reg() for use by localbus
attached device drivers. Use bus_space functions directly in the
lbc driver itself.
3. Be smarter about programming LAWs and mapping memory. The ranges
defined in the FDT are per bank (= chip select) and since we can
have up to 8 banks, we could easily use more than 8 LAWs or TLB
enrties when per-bank memory ranges need multiple LAWs or TLBs
due to alignment or size constraints.
We now combine all memory ranges into the fewest possible set of
contiguous regions and program the hardware for that. Thus, a
cleverly written FDT with 8 devices may still only need 1 LAW or
1 TLB entry. Note that the memory ranges can be assigned randomly
to the banks. We sort as we build to handle that.
4. Support the FCM when programming the OR register. This is mostly
for documention purposes as we do not have a way to define the
mode for a bank.
5. Remove Semihalf-ism: do not define DEBUG (only to undefine it
again).
FDT does not define all ranges possible for a particular node (e.g.
PCI).
While here, only update the trgt_mem and trgt_io pointers if there's
no error. This avoids that we knowingly write an invalid target (= -1).
for variables that live in the boot page.
o Add bp_trace (yes, it's in the boot page) that gets zeroed before we
try to wake a core and to which the core being woken can write markers
so that we know where the core was in case it doesn't wake up. The
boot code does not yet write markers (too follow).
o Disable the boot page translation to allow the last 4K page to be used
for whatever we please. It would get mapped otherwise.
o Fix kernstart in the case of SMP. The start argument is typically page
aligned due to the alignment requirements that come with having a boot
page. The point of using trunc_page is that we get the actual load
address given that the entry point is immediately following the ELF
headers. In the SMP case this ended up exactly 4K after the load
address. Hence subtracting 1 from start.
exceptions early enough during boot that the kernel will do ithe same.
Use lwsync only when compiling for LP64 and revert to the more proven isync
when compiling for ILP32. Note that in the end (i.e. between revision 222198
and this change) ILP32 changed from using sync to using isync. As per Nathan
the isync is needed to make sure I/O accesses are properly serialized with
locks and isync tends to be more effecient than sync.
While here, undefine __ATOMIC_ACQ and __ATOMIC_REL at the end of the file
so as not to leak their definitions.
Discussed with: nwhitehorn
Introduce a (for now copied stripped down) in6_cksum_pseudo()
function. We should be able to use this from in6_cksum() but
we should also ponder possible MD specific improvements.
It takes an extra csum argument to allow for easy checks as
will be done by the upper layer protocol input paths.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Optimize in6_cksum(), re-ordering work and limiting variable
initialization, removing a bzero() for mostly re-initialized
struct values, making use of the newly introduced in6_getscope(),
as well as converting an if/panic to a KASSERT().
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Without it, it fails to create labels for filesystems resized by
growfs(8).
PR: kern/165962
Submitted by: Olivier Cochard-Labbe <olivier at cochard dot me>
Introduce in6_getscope() to allow more effective checksum
computations without the need to copy the address to clear the
scope.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists or
PMAP1/PADDR1.
Style fix to pmap_protect().
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information\for disk elements into the
CAM device database.
Sponsored by: Spectra Logic Corporation
Sponsored by: iXsystems, Inc.
Submitted by: gibbs, will, mav
- Add low-level support for SATA Enclosure Management Bridge (SEMB)
devices -- SATA equivalents of the SCSI SES/SAF-TE devices.
- Add some utility functions for SCSI SAF-TE devices access.
Sponsored by: iXsystems, Inc.
low memory situation. I've observed a situation where per-CPU
allocations were disabled while there were enough free cached pages.
Basically, cnt.v_free_count was sitting stable at a value lower
than cnt.v_free_min and that caused massive performance drop.
Reviewed by: alc
MFC after: 1 week
processor objects. Instead of forcing the new-bus CPU objects to use
a unit number equal to pc_cpuid, adjust acpi_pcpu_get_id() to honor the
MADT IDs by default. As with the previous change, setting
debug.acpi.cpu_unordered to 1 in the loader will revert to the old
behavior.
Tested by: jimharris
MFC after: 1 month
and deactivating PCI resources. Previously, if a device had more than
48 MSI interrupts, then activating message 48 (which has a rid == PCIR_BIOS)
would incorrectly try to enable the PCI ROM BAR.
Tested by: Olivier Cinquin ocinquin uci edu
MFC after: 3 days
negotiate with each other on the TLP payload size so blindly
forcing the size to 128 can cause a completion error which in turn
will stop device.
Reported by: Geans Pin < geanspin <> broadcom dot com >
MFC after: 5 days
configured down. Formerly, IPMI communication was lost whenever the
interface was not up. The reason was that the BCE_EMAC_MODE
register was not configured with the correct media settings. There
are two parts to the fix.
First, resetting the chip in bce_reset() causes the BCE_EMAC_MODE
register to be initialized to a default value that does not
necessarily correspond to the actual media settings. The fix
implemented here is a bit of a hack. Ideally, at the end of
bce_reset() we would poll the PHY to determine the negotiated media,
and then we would set the BCE_EMAC_MODE register accordingly. That
is difficult, since the PHY is abstracted behind the MII layer and is
not supposed to be queried directly from the MAC driver. Instead,
we read the BCE_EMAC_MODE register at the beginning of bce_reset()
and then restore its media bits to their original values before
returning. If IPMI is up and running, then the link is already
established and the BCE_EMAC_MODE register is already set appropriately
when bce_reset() is called. If IPMI is not running, no harm is
done by preserving the BCE_EMAC_MODE settings. The driver will set
the register properly once the interface is configured up and link
is established.
Second, bce_miibus_statchg() is sometimes called when the link is
down. In that case, the reported media settings are invalid.
Formerly, the driver used them anyway to setup the BCE_EMAC_MODE
register. We now avoid changing any MAC registers unless link is
active and the reported media settings are valid.
Submitted by: jdp
Tested by: jdp
MFC after: 5 days
I'll have to leave this high for now, until I've done some significant
surgery with how ath_bufs (and descriptors) are handled.
This should significantly cut down on the opportunities for a full TX
queue hanging traffic. I'll continue making things work though; I'm
mostly doing this for users. :)
* If the first call succeeded but failed to transmit, a timer would
reschedule it via bar_timeout(). Unfortunately bar_timeout() didn't
check the return value from the ieee80211_send_bar() reattempt and
if that failed (eg the driver ic_raw_xmit() failed), it would never
re-arm the timer.
* If BARPEND is cleared (which ieee80211_send_bar() will do if it can't
TX), then re-arming the timer isn't enough - once bar_timeout() occurs,
it'll see BARPEND is 0 and not run through the rest of the routine.
So when rearming the timer, also set that flag.
* If the TX wasn't occuring, bar_tx_complete() wouldn't be called and the
driver callback wouldn't be called either. So the driver had no idea
that the BAR TX attempt had failed. In the ath(4) case, TX would stay
paused.
(There's no callback to indicate that BAR TX had failed or not;
only a "BAR TX was attempted". That's a separate, later problem.)
So call the driver callback (ic_bar_response()) before the ADDBA session
is torn down, so it has a chance of being notified that things didn't
quite go to plan.
I've verified that yes, this does suspend traffic for ath(4), retry BAR
TX even if the driver is failing ic_raw_xmit(), and then eventually giving
up and sending a DELBA. I'll address the "out of ath_buf" issue in ath(4)
in a subsequent commit - this commit just fixes the edge case where any
driver is (way) out of internal buffers/descriptors and fails frame TX.
PR: kern/168170
Reviewed by: bschmidt
MFC after: 1 month
works with new generations of GPUs (IronLake, SandyBridge and
supposedly IvyBridge).
The driver is not connected to the build yet.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
operations required by GEMified i915.ko. It also attaches to SandyBridge
and IvyBridge CPU northbridges now.
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
vn_rlimit_fsize takes uio->uio_offset and uio->uio_resid into account
when determining whether given write would exceed RLIMIT_FSIZE.
When APPEND flag is specified, ZFS updates uio->uio_offset to point to the
end of file.
But this happens after a call to vn_rlimit_fsize, so vn_rlimit_fsize check
can be rendered ineffective by thread that opens some file with O_APPEND
and lseeks below RLIMIT_FSIZE before calling write.
Submitted by: Mateusz Guzik <mjguzik at gmail dot com>
MFC after: 2 weeks
into partitions.
Partitions are created based on data in dts file which are
extracted and interpreted by slicer.
Obtained from: Semihalf
Supported by: FreeBSD Foundation, Juniper Networks
this is a VNET-kernel or not. gcc used to put the static symbol into
the symbol table, clang does not. This fixes the 'netstat: no namelist'
error seen on clang+VNET systems.
In PHYS_TO_VM_PAGE() when VM_PHYSSEG_DENSE is set the check if we are past
the end of vm_page_array was incorrect causing it to return NULL. This
value is then used in vm_phys_add_page causing a data abort.
Reviewed by: alc, kib, imp
Tested by: stas
I've come across a weird scenario in net80211 where two TX streams will
happily attempt to setup an aggregation session together.
If we're very lucky, it happens concurrently on separate CPUs and the
total lack of locking in the net80211 aggregation code causes this stuff
to race. Badly.
So >1 call would occur to the ath(4) addba start, but only one call would
complete to addba complete or timeout. The TID would thus stay paused.
The real fix is to implement some proper per-node (or maybe per-TID)
locking in net80211, which then could be leveraged by the ath(4) TX
aggregation code.
Whilst I'm at it, shuffle around the debugging messages a bit.
I like to keep people on their toes.
queued internally. This works around issue in the isci HAL where it cannot
accept new I/O to a device after a resetting->ready state transition until
the completion context has unwound.
This issue was found by submitting non-tagged CCBs through pass(4) interface
to a SATA disk with an extremely small timeout value (5ms). This would trigger
internal resets with I/O in the isci(4) internal queues.
The small timeout value had not been intentional (and original reporter has
since changed his test to use 5sec instead), but it did uncover this corner
case that would result in a hung disk.
Sponsored by: Intel
Reported and tested by: Ravi Pokala <rpokala at panasas dot com>
Reviewed by: scottl (earlier version)
MFC after: 1 week
Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl.
This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock.
Approved by: kib(mentor)
MFC in: 4 weeks
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'
Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.
Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)
Approved by: kib(mentor)
MFC in: 4 weeks
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock
Add several forgotten assertions (thanks to adrian@)
Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.
Approved by: kib(mentor)
Tested by: gnn
MFC in: 4 weeks
frequency in the at91_pmc_clock_init rather than passing it in. Allow
for frequencies >= 21MHz by rounding to the nearest 500Hz (Idea from
Ian Lapore whose company uses a similar arrangement in their product).
at91_pmc_clock_init() is now nearly independent of the rest of the pmc
driver (which means we may be able to call it much earlier in boot
soon to eliminate the master clock config file requirement for printf
to work during early boot and also eliminate some interdependencies
with the device ordering which requires pmc to be the first device
added).
to this pmap.c. This new r/w lock is used primarily to synchronize access
to the PV lists. However, it will be used in a somewhat unconventional
way. As finer-grained PV list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.
Reviewed by: kib
X-MFC after: r235598
The generic ELF loading code maps the kernel into low memory
by subtracting KERN_BASE. So the copyin/copyout/readin functions
are always called with low addresses. This code finds the largest
DRAM block from the U-Boot memory map and adds that base to
the addresses.
In particular, this fixes ubldr on AM3358, which has DRAM
mapped to 0x80000000 at power-on.
disabled or the system is in shutdown procedure.
This should fix the problem which kernel never response to the sleep
button press events after the message `suspend request ignored (not
ready yet)'.
MFC after: 3 days
range operations like pmap_remove() and pmap_protect() as well as allowing
simple operations like pmap_extract() not to involve any global state.
This substantially reduces lock coverages for the global table lock and
improves concurrency.
casted to structs by getting rid of these buffers entirely. In r169832, it
was tried to paper over this issue by 32-bit aligning the buffers. Depending
on compiler optimizations that still was insufficient for 64-bit architectures
with strong alignment requirements though.
While at it, add comments regarding the total lack of locking in this area.
Tested by: bz
Reviewed by: bz (slightly earlier version), yongari (earlier version)
MFC after: 1 week