and deactivating PCI resources. Previously, if a device had more than
48 MSI interrupts, then activating message 48 (which has a rid == PCIR_BIOS)
would incorrectly try to enable the PCI ROM BAR.
Tested by: Olivier Cinquin ocinquin uci edu
MFC after: 3 days
negotiate with each other on the TLP payload size so blindly
forcing the size to 128 can cause a completion error which in turn
will stop device.
Reported by: Geans Pin < geanspin <> broadcom dot com >
MFC after: 5 days
configured down. Formerly, IPMI communication was lost whenever the
interface was not up. The reason was that the BCE_EMAC_MODE
register was not configured with the correct media settings. There
are two parts to the fix.
First, resetting the chip in bce_reset() causes the BCE_EMAC_MODE
register to be initialized to a default value that does not
necessarily correspond to the actual media settings. The fix
implemented here is a bit of a hack. Ideally, at the end of
bce_reset() we would poll the PHY to determine the negotiated media,
and then we would set the BCE_EMAC_MODE register accordingly. That
is difficult, since the PHY is abstracted behind the MII layer and is
not supposed to be queried directly from the MAC driver. Instead,
we read the BCE_EMAC_MODE register at the beginning of bce_reset()
and then restore its media bits to their original values before
returning. If IPMI is up and running, then the link is already
established and the BCE_EMAC_MODE register is already set appropriately
when bce_reset() is called. If IPMI is not running, no harm is
done by preserving the BCE_EMAC_MODE settings. The driver will set
the register properly once the interface is configured up and link
is established.
Second, bce_miibus_statchg() is sometimes called when the link is
down. In that case, the reported media settings are invalid.
Formerly, the driver used them anyway to setup the BCE_EMAC_MODE
register. We now avoid changing any MAC registers unless link is
active and the reported media settings are valid.
Submitted by: jdp
Tested by: jdp
MFC after: 5 days
I'll have to leave this high for now, until I've done some significant
surgery with how ath_bufs (and descriptors) are handled.
This should significantly cut down on the opportunities for a full TX
queue hanging traffic. I'll continue making things work though; I'm
mostly doing this for users. :)
* If the first call succeeded but failed to transmit, a timer would
reschedule it via bar_timeout(). Unfortunately bar_timeout() didn't
check the return value from the ieee80211_send_bar() reattempt and
if that failed (eg the driver ic_raw_xmit() failed), it would never
re-arm the timer.
* If BARPEND is cleared (which ieee80211_send_bar() will do if it can't
TX), then re-arming the timer isn't enough - once bar_timeout() occurs,
it'll see BARPEND is 0 and not run through the rest of the routine.
So when rearming the timer, also set that flag.
* If the TX wasn't occuring, bar_tx_complete() wouldn't be called and the
driver callback wouldn't be called either. So the driver had no idea
that the BAR TX attempt had failed. In the ath(4) case, TX would stay
paused.
(There's no callback to indicate that BAR TX had failed or not;
only a "BAR TX was attempted". That's a separate, later problem.)
So call the driver callback (ic_bar_response()) before the ADDBA session
is torn down, so it has a chance of being notified that things didn't
quite go to plan.
I've verified that yes, this does suspend traffic for ath(4), retry BAR
TX even if the driver is failing ic_raw_xmit(), and then eventually giving
up and sending a DELBA. I'll address the "out of ath_buf" issue in ath(4)
in a subsequent commit - this commit just fixes the edge case where any
driver is (way) out of internal buffers/descriptors and fails frame TX.
PR: kern/168170
Reviewed by: bschmidt
MFC after: 1 month
works with new generations of GPUs (IronLake, SandyBridge and
supposedly IvyBridge).
The driver is not connected to the build yet.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
operations required by GEMified i915.ko. It also attaches to SandyBridge
and IvyBridge CPU northbridges now.
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
vn_rlimit_fsize takes uio->uio_offset and uio->uio_resid into account
when determining whether given write would exceed RLIMIT_FSIZE.
When APPEND flag is specified, ZFS updates uio->uio_offset to point to the
end of file.
But this happens after a call to vn_rlimit_fsize, so vn_rlimit_fsize check
can be rendered ineffective by thread that opens some file with O_APPEND
and lseeks below RLIMIT_FSIZE before calling write.
Submitted by: Mateusz Guzik <mjguzik at gmail dot com>
MFC after: 2 weeks
into partitions.
Partitions are created based on data in dts file which are
extracted and interpreted by slicer.
Obtained from: Semihalf
Supported by: FreeBSD Foundation, Juniper Networks
this is a VNET-kernel or not. gcc used to put the static symbol into
the symbol table, clang does not. This fixes the 'netstat: no namelist'
error seen on clang+VNET systems.
In PHYS_TO_VM_PAGE() when VM_PHYSSEG_DENSE is set the check if we are past
the end of vm_page_array was incorrect causing it to return NULL. This
value is then used in vm_phys_add_page causing a data abort.
Reviewed by: alc, kib, imp
Tested by: stas
I've come across a weird scenario in net80211 where two TX streams will
happily attempt to setup an aggregation session together.
If we're very lucky, it happens concurrently on separate CPUs and the
total lack of locking in the net80211 aggregation code causes this stuff
to race. Badly.
So >1 call would occur to the ath(4) addba start, but only one call would
complete to addba complete or timeout. The TID would thus stay paused.
The real fix is to implement some proper per-node (or maybe per-TID)
locking in net80211, which then could be leveraged by the ath(4) TX
aggregation code.
Whilst I'm at it, shuffle around the debugging messages a bit.
I like to keep people on their toes.
queued internally. This works around issue in the isci HAL where it cannot
accept new I/O to a device after a resetting->ready state transition until
the completion context has unwound.
This issue was found by submitting non-tagged CCBs through pass(4) interface
to a SATA disk with an extremely small timeout value (5ms). This would trigger
internal resets with I/O in the isci(4) internal queues.
The small timeout value had not been intentional (and original reporter has
since changed his test to use 5sec instead), but it did uncover this corner
case that would result in a hung disk.
Sponsored by: Intel
Reported and tested by: Ravi Pokala <rpokala at panasas dot com>
Reviewed by: scottl (earlier version)
MFC after: 1 week
Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl.
This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock.
Approved by: kib(mentor)
MFC in: 4 weeks
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'
Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.
Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)
Approved by: kib(mentor)
MFC in: 4 weeks
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock
Add several forgotten assertions (thanks to adrian@)
Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.
Approved by: kib(mentor)
Tested by: gnn
MFC in: 4 weeks
frequency in the at91_pmc_clock_init rather than passing it in. Allow
for frequencies >= 21MHz by rounding to the nearest 500Hz (Idea from
Ian Lapore whose company uses a similar arrangement in their product).
at91_pmc_clock_init() is now nearly independent of the rest of the pmc
driver (which means we may be able to call it much earlier in boot
soon to eliminate the master clock config file requirement for printf
to work during early boot and also eliminate some interdependencies
with the device ordering which requires pmc to be the first device
added).
to this pmap.c. This new r/w lock is used primarily to synchronize access
to the PV lists. However, it will be used in a somewhat unconventional
way. As finer-grained PV list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.
Reviewed by: kib
X-MFC after: r235598
The generic ELF loading code maps the kernel into low memory
by subtracting KERN_BASE. So the copyin/copyout/readin functions
are always called with low addresses. This code finds the largest
DRAM block from the U-Boot memory map and adds that base to
the addresses.
In particular, this fixes ubldr on AM3358, which has DRAM
mapped to 0x80000000 at power-on.
disabled or the system is in shutdown procedure.
This should fix the problem which kernel never response to the sleep
button press events after the message `suspend request ignored (not
ready yet)'.
MFC after: 3 days
range operations like pmap_remove() and pmap_protect() as well as allowing
simple operations like pmap_extract() not to involve any global state.
This substantially reduces lock coverages for the global table lock and
improves concurrency.
casted to structs by getting rid of these buffers entirely. In r169832, it
was tried to paper over this issue by 32-bit aligning the buffers. Depending
on compiler optimizations that still was insufficient for 64-bit architectures
with strong alignment requirements though.
While at it, add comments regarding the total lack of locking in this area.
Tested by: bz
Reviewed by: bz (slightly earlier version), yongari (earlier version)
MFC after: 1 week
There's some TX path TDMA code in if_ath_tx.c which should be migrated
out, but first I should likely try and verify/fix/repair the TDMA support
in 9.x and -HEAD.
* migrate the rx processing out into if_ath_rx.c
* migrate the TSF functions into if_ath_tsf.h, as inlines
This is in prepration for supporting the EDMA RX routines, required to
support the AR93xx series NICs.
TODO:
* ath_start() shouldn't be private, but it's called as part of
the RX path. I should likely migrate ath_rx_tasklet() back into
if_ath.c and then return this to be 'static'. The RX code really
shouldn't need to see TX routines (and vice versa.)
* ath_beacon_* should be in if_ath_beacon.[ch].
* ath_tdma_* should be in if_ath_tdma.[ch] ...
The configuration is:
* RGMII, both ports
* arge0 - connected to PHY4 as a dedicated port (CPU port)
* arge1 - connected to the switch ports
I've verified this on my routerstation pro board.
Most part is merged from amd64.
- i386/acpica/acpi_wakecode.S
Replaced with amd64 code (from realmode to paging enabling code).
- i386/acpica/acpi_wakeup.c
Replaced with amd64 code (except for wakeup_pagetables stuff).
- i386/include/pcb.h
- i386/i386/genassym.c
Added PCB new members (CR0, CR2, CR4, DS, ED, FS, SS, GDT, IDT, LDT
and TR) needed for suspend/resume, not for context switch.
- i386/i386/swtch.s
Added suspendctx() and resumectx().
Note that savectx() was not changed and used for suspending (while
amd64 code uses it).
BSP and AP execute the same sequence, suspendctx(), acpi_wakecode()
and resumectx() for suspend/resume (in case of UP system also).
- i386/i386/apic_vector.s
Added cpususpend().
- i386/i386/mp_machdep.c
- i386/include/smp.h
Added cpususpend_handler().
- i386/include/apicvar.h
- kern/subr_smp.c
- sys/smp.h
Added IPI_SUSPEND and suspend_cpus().
- i386/i386/initcpu.c
- i386/i386/machdep.c
- i386/include/md_var.h
- pc98/pc98/machdep.c
Moved initializecpu() declarations to md_var.h.
MFC after: 3 days
vm_pager_object_lookup() already referenced the object.
Note that there is no in-tree consumers of cdev_pager_lookup(). The
only known user of the function is i915 gem driver, which is not yet
imported. This should make the KPI change minor.
Submitted by: avg
MFC after: 1 week
longer uses the active and inactive paging queues. Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.
Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero. However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed. The new
pmap_pv_reclaim() doesn't even try.
Reviewed by: kib
MFC after: 6 weeks
to freebsd-fs@, where the setfacl of an NFSv4 acl would fail.
This was caused by the VOP_ACLCHECK() call for ZFS replying
EOPNOTSUPP. After discussion with rwatson@, it was determined
that a call to VOP_ACLCHECK() before doing VOP_SETACL() is not
required. This patch fixes the problem by deleting the
VOP_ACLCHECK() call.
Tested by: Andrew Leonard (previous version)
MFC after: 1 week
Supermicro LCD screen modules seem to not support accessing reports through
the control pipes, but working fine with the interrupt pipes.
Sponsored by: iXsystems, Inc.
MFC after: 1 week
The NAND Flash environment consists of several distinct components:
- NAND framework (drivers harness for NAND controllers and NAND chips)
- NAND simulator (NANDsim)
- NAND file system (NAND FS)
- Companion tools and utilities
- Documentation (manual pages)
This work is still experimental. Please use with caution.
Obtained from: Semihalf
Supported by: FreeBSD Foundation, Juniper Networks
The code previously assumed that copyin/copyout did no
address translation and that the device tree blob could
be manipulated in-place (with only a few adjustments for
the ELF loader offset). This isn't possible on all platforms,
so the revised code uses copyout() to copy the device tree
blob into a heap-allocated buffer and then updates the
device tree with copyout(). This isn't ideal, since it
bloats the loader memory usage, but seems the only feasible
approach (short of rewriting all of the fdt manipulation
routines).
8.x code:
- If the lock cannot be acquired immediately unlocks 'bar' vnode
and then locks both vnodes in order.
- wrong vnode type panics from cache_enter_time after calls by
ext2_lookup.
The fix merges the fixes from ufs/ufs_lookup.c.
Submitted by: Mateusz Guzik
Approved by: jhb@ (mentor)
Reviewed by: kib@
MFC after: 1 week
Entries with zero inode number are considered placeholders by libc and
UFS. Fix remaining uses of VOP_READDIR in kernel: vop_stdvptocnp,
unionfs.
Sponsored by: Google Summer of Code 2011
add some more BAR debugging logic.
* Change the definition of ath_debug and ath_softc.sc_debug from
int to uint64_t;
* Change the relevant sysctls;
* Add a new BAR TX debugging field;
* Use this in if_ath_tx.
This has been tested by using the sysctl program, which happily allows
for fields > 32 bits to be configured.
Although I _should_ handle the other errors in various ways (specifically
errors like FILT), treating them as having transmitted successfully
is completely wrong. Here, they'd be counted as successful and the BAW
would be advanced.. but the RX side wouldn't have received them.
The specific errors I've been seeing here are HAL_TXERR_FILT.
This patch does fix the issue - I've tested it using -i 0.001 pings
(enough to start aggregation) and now the behaviour is correct:
* The RX side never sees a "moved window" error, and
* The TX side sends BARs as needed, with the RX side correctly handling
them.
PR: kern/167902
compatible with the sched provider implemented by Solaris and its open-
source derivatives. Full documentation of the sched provider can be found
on Oracle's DTrace wiki pages.
Note that for compatibility with scripts originally written for Solaris,
serveral probes are defined that will never fire. These probes are defined
to fire when Solaris-specific features perform certain actions. As these
features are not present in FreeBSD, the probes can never fire.
Also, I have added a two probes that are not defined in Solaris, lend-pri
and load-change. These probes have been added to make it possible to
collect schedgraph data with DTrace.
Finally, a few probes are defined in Solaris to take a cpuinfo_t *
argument. As it was not immediately clear to me how to translate that to
FreeBSD, currently those probes are passed NULL in place of a cpuinfo_t *.
Sponsored by: Sandvine Incorporated
MFC after: 2 weeks
used, when the code should actually protect the tested
variable with a mutex. Since the tsleep()s had a 10sec
timeout, the race would have only delayed the allocation
of a new clientid for a client. The sleeps will also
rarely occur, since having a callback in progress when
a client acquires a new clientid, is unlikely.
in practice, since having a callback in progress when
a fresh clientid is being acquired by a client is unlikely.
MFC after: 1 month
depending upon the bootloader initialising it.
The aim is to eventually support a full switch set and reinitialisation
rather than relying on a consistent bootloader setup.
Remove the port flood config from arswitch.c, it's not yet used and
it's totally incorrect.
Whilst I'm here, also add in a comment describing why the full switch
reset is disabled.
Obtained from: Linux (OpenWRT) - Values
which carries fictitous managed pages. In particular, the consumers of
the new object type can remove all mappings of the device page with
pmap_remove_all().
The range of physical addresses used for fake page allocation shall be
registered with vm_phys_fictitious_reg_range() interface to allow the
PHYS_TO_VM_PAGE() to work in pmap.
Most likely, only i386 and amd64 pmaps can handle fictitious managed
pages right now.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 1 month
for allocation of fictitious pages, for which PHYS_TO_VM_PAGE()
returns proper fictitious vm_page_t. The range should be de-registered
after consumer stopped using it.
De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate
over registered ranges.
A hash container might be developed instead of range registration
interface, and fake pages could be put automatically into the hash,
were PHYS_TO_VM_PAGE() could look them up later. This should be
considered before the MFC of the commit is done.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 1 month
size for the AR7240.
* Include SM/MS macros, thanks to ath_hal(4).
* This field is for normal packets, VLAN and other headers are added to
this by the switch device.
* Set the MTU to 1536, to match what is done in Linux. Use the SM
macro to write this field.
Obtained from: Atheros (AR7240 datasheet), Linux OpenWRT (MTU default)
vm_page into new interface vm_page_initfake(). Handle the case of fake
page re-initialization with changed memattr.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 1 month
This is to silence warnings that result from different definitions of
uint64_t on different architectures, specifically i386 and sparc64.
MFC after: 1 month
all integrated and on-board peripherals except the DataFlash (at91_spi(4)
and at45d(4) still need to be unb0rken) and NAND Flash (missing NAND
framework) are working.
AFAICT, this makes FreeBSD the first operating system besides Nut/OS
supporting Ethernut 5 out of tree.
* Add the i2c bitbang bus;
* Add the etherswitch/rtl8366rb drivers;
* "fix" the USB GPIO configuration so USB actually works.
Submitted by: Stefan Bethke <stb@lassitu.de>
a taskqueue.
This gives a 16% performance improvement under high load on slow systems,
especially when vr shares an interrupt with another device, which is
common with the Alix x86 boards.
Contrary to the other devices, I left the interrupt processing for loop
in because there was no significant difference in performance and this
should avoid enqueuing more taskqueues unnecessarily.
We also decided to move the vr_start_locked() call inside the for loop
because we found out that it helps performance since TCP ACKs now have a
chance to go out quicker.
Reviewed by: yongari (older version, same idea)
Discussed with: yongari, jhb
to allow drivers to handle request completion directly without passing
them to the CAM SWI thread removing extra context switch.
Modify all ATA/SATA drivers to use them.
Reviewed by: gibbs, ken
MFC after: 2 weeks
memory mapped pages being written back on an NFS mount.
Since any thread can call VOP_PUTPAGES() to write back a
dirty page, the credentials of that thread may not have
write access to the file on an NFS server. (Often the uid
is 0, which may be mapped to "nobody" in the NFS server.)
Although there is no completely correct fix for this
(NFS servers check access on every write RPC instead of at
open/mmap time), this patch avoids the common cases by
holding onto a credential that recently opened the file
for writing and uses that credential for the write RPCs
being done by VOP_PUTPAGES() for both NFS clients.
Tested by: Joel Ray Holveck (joelh at juniper.net)
PR: kern/165923
Reviewed by: kib
MFC after: 2 weeks
This way with the new zfsloader there is no need to explicitly set zfs
root filesystem either via vfs.root.mountfrom or fstab.
It should be automatically picked up from currdev which is by default
is set from bootfs.
Tested by: Florian Wagner <florian@wagner-flo.net> (x86)
MFC after: 1 month
In zfs loader zfs device name format now is "zfs:pool/fs",
fully qualified file path is "zfs:pool/fs:/path/to/file"
loader allows accessing files from various pools and filesystems as well
as changing currdev to a different pool/filesystem.
zfsboot accepts kernel/loader name in a format pool:fs:path/to/file or,
as before, pool:path/to/file; in the latter case a default filesystem
is used (pool root or bootfs). zfsboot passes guids of the selected
pool and dataset to zfsloader to be used as its defaults.
zfs support should be architecture independent and is provided
in a separate library, but architectures wishing to use this zfs support
still have to provide some glue code and their devdesc should be
compatible with zfs_devdesc.
arch_zfs_probe method is used to discover all disk devices that may
be part of ZFS pool(s).
libi386 unconditionally includes zfs support, but some zfs-specific
functions are stubbed out as weak symbols. The strong definitions
are provided in libzfsboot.
This change mean that the size of i386_devspec becomes larger
to match zfs_devspec.
Backward-compatibility shims are provided for recently added sparc64
zfs boot support. Currently that architecture still works the old
way and does not support the new features.
TODO:
- clear up pool root filesystem vs pool bootfs filesystem distinction
- update sparc64 support
- set vfs.root.mountfrom based on currdev (for zfs)
Mid-future TODO:
- loader sub-menu for selecting alternative boot environment
Distant future TODO:
- support accessing snapshots, using a snapshot as readonly root
Reviewed by: marius (sparc64),
Gavin Mu <gavin.mu@gmail.com> (sparc64)
Tested by: Florian Wagner <florian@wagner-flo.net> (x86),
marius (sparc64)
No objections: fs@, hackers@
MFC after: 1 month
* Add in the AR724x support. It probes the same as an AR8216/AR8316, so
just add in a hint to force the probe success rather than auto-detecting
it.
* Add in the missing entries from conf/files, lacking in the previous
commit.
The register values and CPU port / mirror port initialisation value was
obtained from Linux OpenWRT ag71xx_ar7240.c.
The DELAY(1000) to let things settle is my local workaround. For some
reason, PHY4 doesn't seem to probe very reliably without it. It's quite
possible that we're missing some MDIO bus initialisation code in if_arge
for the AR724x case. As I dislike DELAY() workarounds in general, it's
definitely worth trying to figure out why this is the case.
Tested on: AP93 (AR7240) reference design
Obtained from: Linux OpenWRT
The AP93 has:
* AR7240 - mips24k processor with integrated 10/100 switch and
various other peripherals;
* AR9283 - 2x2 2.4GHz 802.11n (with calibration data in flash);
* 64MB RAM;
* 16MB SPI flash.
The switch code detects as an AR8216 at the present moment, which isn't
_entirely_ strictly true. However, the MII/MDIO routing in AP93.hints
works - the arge0 MAC connects to PHY4 in the switch, but via the
switch internal MDIO bus. The switch connects to arge0's MDIO bus,
but only to export the switch registers.
Thanks to stb and ray for the switch work, and ray for helping determine
what the correct switch hints should be for this thing.
PAE to insta-panic on startup. Remove one unused variable that was
commented out.
Reviewed by: ambrisko@
Obtained from: jhb@ peter@ bz@ and countless others during BSDCAN
MFC after: 3 days
This is designed to support the very basic ethernet switch chip behaviour,
specifically:
* accessing switch register space;
* accessing per-PHY registers (for switches that actually expose PHYs);
* basic vlan group support, which applies for the rtl8366 driver but not
for the atheros switches.
This also includes initial support for:
* rtl8366rb support - which is a 10/100/1000 switch which supports
vlan groups;
* Initial Atheros AR8316 switch support - which is a 10/100/1000 switch
which supports an alternate vlan configuration (so the vlan group
methods are stubbed.)
The general idea here is that the switch driver may speak to a variety of
backend busses (mdio, i2c, spi, whatever) and expose:
* If applicable, one or more MDIO busses which ethernet interfaces can
then attach PHYs to via miiproxy/mdioproxy;
* exposes miibusses, one for each port at the moment, so ..
* .. a PHY can be exposed on each miibus, for each switch port, with all
of the existing MII/ifnet framework.
However:
* The ifnet is manually created for now, and it isn't linked into the
interface list, nor can you (currently) send/receive frames on this ifnet.
At some point in the future there may be _some_ support for this, for
switches with a multi-port, isolated mode.
* I'm still in the process of sorting out correct(er) locking.
TODO:
* ray's switch code in zrouter (zrouter.org) includes a much more developed
newbus API that covers the various switch methods, as well as a
capability API so drivers, the switch layer and the userland utility
can properly control the subset of supported features.
The plan is to sort that out later, once the rest of ray's switch drivers
are brought over and extended to export MII busses and PHYs.
Submitted by: Stefan Bethke <stb@lassitu.de>
Reviewed by: ray
# This doesn't implement the full Linux boot ABI for arm yet.
# since there's no ATAGs list passed in for r2, and r0 has
# boot options rather than 0 as specified in the standard.
# Commited code to the tree won't touch any of this anyway, but
# future code may be able to use this.
failed while write to some other succeeded. Instead mark disk as failed.
- Make RAID1E less aggressive in failing disks to avoid volume breakage.
MFC after: 2 weeks
deterministically handled after the corresponding PHY drivers when
loaded as modules. Otherwise, when these MAC/PHY driver pairs are
compiled into a single module probing the PHY driver may fail. This
makes r151438 and r226154 actually work. [1]
Reported and tested by: yongari (fxp(4))
- Use DEVMETHOD_END.
- Use NULL instead of 0 for pointers.
Submitted by: jhb [1]
MFC after: 3 days
ports. This currently is a nop, but will soon be used to allow
support for multiple boards to be built into one kernel (starting with
AT91RM9200 and expanding out from there).
There are two aspects to the sequential access optimization: (1) read ahead
of pages that are expected to be accessed in the near future and (2) unmap
and cache behind of pages that are not expected to be accessed again. This
revision changes both aspects.
The read ahead optimization is now more effective. It starts with the same
initial read window as before, but arithmetically grows the window on
sequential page faults. This can yield increased read bandwidth. For
example, on one of my machines, a program using mmap() to read a file that
is several times larger than the machine's physical memory takes about 17%
less time to complete.
The unmap and cache behind optimization is now more selectively applied.
The read ahead window must grow to its maximum size before unmap and cache
behind is performed. This significantly reduces the number of times that
pages are unmapped and cached only to be reactivated a short time later.
The unmap and cache behind optimization now clears each page's referenced
flag. Previously, in the case of dirty pages, if the containing file was
still mapped at the time that the page daemon examined the dirty pages,
they would be reactivated.
From a stylistic standpoint, this revision also cleanly separates the
implementation of the read ahead and unmap/cache behind optimizations.
Glanced at: kib
MFC after: 2 weeks
ataraid(4) previously was present there and having GEOM RAID is convinient.
Unlike other classes GEOM RAID can be set up from BIOS before install and
users are expecting it to be detected automatically.
2703 add mechanism to report ZFS send progress
If the zfs send command is used with the -v flag, the amount of bytes
transmitted is reported in per second updates.
References:
https://www.illumos.org/issues/2703
Obtained from: illumos (issue #2703)
MFC after: 2 weeks
as possible when using more than one igb(4) adapter. This
means that queues will not be bound to the same CPUs if
there are more CPUs availble.
This is only applicable to a system that has multiple interfaces.
Obtained from: Yahoo! Inc.
MFC after: 3 days
Place the arguments at a fixed offset of 0x800 withing the argument area
(of size 0x1000). Allow variable size extended arguments first of which
should be a size of the extended arguments (including the size
parameter).
Consolidate all related definitions in a new i386/common/bootargs.h header.
Many thanks to jhb and bde for their guidance and reviews.
Reviewed by: jhb, bde
Approved by: jhb
MFC after: 1 month
controller to perform MDIO type accesses to a remote transceiver
using message pages defined through MRBE(multirate backplane
ethernet). It's used in blade systems(e.g Dell Blade m610) which
are connected to pass-through blades rather than traditional
switches.
This change directly manipulates firmware's mailboxes to control
remote PHY such that it does not use mii(4). Alternatively, as
David said, it could be implemented in brgphy(4) by creating a fake
PHY and let brgphy(4) do necessary mii accesses and bce(4) can
implement mailbox accesses based on the type of brgphy(4)'s mii
accesses. Personally, I think it would make brgphy(4) hard to
maintain since it would have to access many bce(4) registers in
brgphy(4). Given that there are users who are suffering from lack
of remote PHY support, it would be better to get working system
rather than waiting for complete/perfect implementation.
Tested by: Jan Winter ( jan.winter <> kantarmedia dot de )
Reviewed by: davidch (initial version)
MFC after: 2 weeks
TX and RX PCU stop/drain routines have been thoroughly debugged.
It's also very likely that I should add hooks back up to the
interface glue (if_ath_pci / if_ath_ahb) to do any relevant
bus flushes that are required. A WMAC DDR flush may be required
for the AR9130 SoC.
some of the IPI mechanisms used by the common MIPS SMP code so we could use
the multicast IPI facilities, on GXemul as well as on several real hardware
platforms, and the ability to have multiple hard IPI types.
- DTrace scripts to check for errors, performance, ...
they serve mostly as examples of what you can do with the static probe;s
with moderate load the scripts may be overwhelmed, excessive lock-tracing
may influence program behavior (see the last design decission)
Design decissions:
- use "linuxulator" as the provider for the native bitsize; add the
bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
- Add probes only for locks which are acquired in one function and released
in another function. Locks which are aquired and released in the same
function should be easy to pair in the code, inter-function
locking is more easy to verify in DTrace.
- Probes for locks should be fired after locking and before releasing to
prevent races (to provide data/function stability in DTrace, see the
man-page of "dtrace -v ..." and the corresponding DTrace docs).
result in INQUIRY VPD 0x81 to SATA devices to return only 63 bytes of data
instead of 64 during SCSI/ATA translation.
Sponsored by: Intel
Approved by: scottl
MFC after: 1 week
I/O port addresses. Even if we do, this is hardly the place to mask
interrupts. It's not clear that this was at all needed. The code came
with CVS revision 1.2 of nexus.c when interrupt support was first added.
What is known is that ia64 has always been designed around the IOSAPIC,
and that doing I/O like this prevents Altix from booting.
them to cleanup and goto out when acknowledging the LD's. Check
for failure on malloc. Remove a couple of extra lines and remove
the spurious return.
Prompted by: Petr Lampa
MFC after: 3 days
Use MADT to match ACPI Processor objects to CPUs. MADT and DSDT/SSDTs may
list CPUs in different orders, especially for disabled logical cores. Now
we match ACPI IDs from the MADT with Processor objects, strictly order CPUs
accordingly, and ignore disabled cores. This prevents us from executing
methods for other CPUs, e. g., _PSS for disabled logical core, which may not
exist. Unfortunately, it is known that there are a few systems with buggy
BIOSes that do not have unique ACPI IDs for MADT and Processor objects. To
work around these problems, 'debug.acpi.cpu_unordered' tunable is added.
Set this to a non-zero value to restore the old behavior.
Many thanks to jhb for pointing me to the right direction and the manual
page change.
Reported by: Harris, James R (james dot r dot harris at intel dot com)
Tested by: Harris, James R (james dot r dot harris at intel dot com)
Reviewed by: jhb
MFC after: 1 month
list CPUs in different orders, especially for disabled logical cores. Now
we match ACPI IDs from the MADT with Processor objects, strictly order CPUs
accordingly, and ignore disabled cores. This prevents us from executing
methods for other CPUs, e. g., _PSS for disabled logical core, which may not
exist. Unfortunately, it is known that there are a few systems with buggy
BIOSes that do not have unique ACPI IDs for MADT and Processor objects. To
work around these problems
ThunderBolt cannot read sector >= 2^32 or 2^21
with supplied patch.
Second the bigger change, fix RAID operation on ThunderBolt base
card such as physically removing a disk from a RAID and replacing
it. The current situation is the RAID firmware effectively hangs
waiting for an acknowledgement from the driver. This is due to
the firmware support of the driver actually accessing the RAID
from under the firmware. This is an interesting feature that
the FreeBSD driver does not use. However, when the firmare
detects the driver has attached it then expects the driver will
synchronize LD's with the firmware. If the driver does not sync.
then the management part of the firmware will hang waiting for
it so a pulled driver will listed as still there.
The fix for this problem isn't extremely difficult. However,
figuring out why some of the code was the way it was and then
redoing it was involved. Not have a spec. made it harder to
try to figure out. The existing driver would send a
MFI_DCMD_LD_MAP_GET_INFO command in write mode to acknowledge
a LD state change. In read mode it gets the RAID map from the
firmware. The FreeBSD driver doesn't do that currently. It
could be added in the future with the appropriate structures.
To simplify things, get the current LD state and then build
the MFI_DCMD_LD_MAP_GET_INFO/write command so that it sends
an acknowledgement for each LD. The map would probably state
which LD's changed so then the driver could probably just
acknowledge the LD's that changed versus all. This doesn't seem
to be a problem. When a MFI_DCMD_LD_MAP_GET_INFO/write command
is sent to the firmware, it will complete later when a change
to the LD's happen. So it is very much like an AEN command
returning when something happened. When the
MFI_DCMD_LD_MAP_GET_INFO/write command completes, we refire the
sync'ing of the LD state. This needs to be done in as an event
so that MFI_DCMD_LD_GET_LIST can wait for that command to
complete before issuing the MFI_DCMD_LD_MAP_GET_INFO/write.
The prior code didn't use the call-back function and tried
to intercept the MFI_DCMD_LD_MAP_GET_INFO/write command when
processing an interrupt. This added a bunch of code complexity
to the interrupt handler. Using the call-back that is done
for other commands got rid of this need. So the interrupt
handler is greatly simplified. It seems that even commands
that shouldn't be acknowledged end up in the interrupt handler.
To deal with this, code was added to check to see if a command
is in the busy queue or not. This might have contributed to the
interrupt storm happening without MSI enabled on these cards.
Note that MFI_DCMD_LD_MAP_GET_INFO/read returns right away.
It would be interesting to see what other complexity could
be removed from the ThunderBolt driver that really isn't
needed in our mode of operation. Letting the RAID firmware
do all of the I/O to disks is a lot faster since it can
use its caches. It greatly simplifies what the driver has
to do and potential bugs if the driver and firmware are
not in sync.
Simplify the aen_abort/cm_map_abort and put it in the softc
versus in the command structure.
This should get merged to 9 before the driver is merged to
8.
PR: 167226
Submitted by: Petr Lampa
MFC after: 3 days
- Use isync/lwsync unconditionally for acquire/release. Use of isync
guarantees a complete memory barrier, which is important for serialization
of bus space accesses with mutexes on multi-processor systems.
- Go back to using sync as the I/O memory barrier, which solves the same
problem as above with respect to mutex release using lwsync, while not
penalizing non-I/O operations like a return to sync on the atomic release
operations would.
- Place an acquisition barrier around thread lock acquisition in
cpu_switchin().
This seems to break at least my test board here (AR71xx + AR8316 switch
PHY). Since I do have a whole sleuth of "normal" PHY boards (with
an AR71xx on a normal PHY port), I'll do some further testing with those
to determine whether this is a general issue, or whether it's limited
to the behaviour of the "fake" dedicated PHY port mode on these atheros
switches.
intr_bind() on x86.
This has been requested by jhb and I strongly disagree with this,
but as long as he is the x86 and interrupt subsystem maintainer I will
follow his directives.
The disagreement cames from what we should really consider as a
public KPI. IMHO, if we really need a selection between the kernel
functions, we may need an explicit protection like _KERNEL_KPI, which
defines which subset of the kernel function might really be considered
as part of the KPI (for thirdy part modules) and which not.
As long as we don't have this mechanism I just consider any possible
function as usable by thirdy part code, thus intr_bind() included.
MFC after: 1 week
is running on other cpu, the CALLOUT_PENDING flag is temporarily
cleared. Then, callout_stop() on this, in fact active, callout fails
because CALLOUT_PENDING is not set, and callout_stop() returns 0.
Now, in sleepq_check_timeout(), the failed callout_stop() causes the
sleepq code to execute mi_switch() without even setting the wmesg,
since the switch-out is supposed to be transient. In fact, the thread
is put off the CPU for full timeout interval, instead of being put on
runq immediately. Until timeout fires, the process is unkillable for
obvious reasons.
Fix this by marking the migrating callouts with CALLOUT_DFRMIGRATION
flag. The flag is cleared by callout_stop_safe() when the function
detects a migration, besides returning the success. The softclock()
rechecks the flag for migrating callout and cancels its execution if
the flag was cleared meantime.
PR: misc/166340
Reported, debugging traces provided and tested by:
Christian Esken <christian.esken trivago com>
Reviewed by: avg, jhb
MFC after: 1 week
Cleaner solution (e.g. adding another header) should be done here.
Original log:
Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.
Requested by: luigi
Approved by: kib(mentor)
Lagg(4) restricts the type of packet that may be sent directly to a child
port, to avoid undesired output from accidental misconfiguration.
Previously only ETHERTYPE_PAE was permitted.
BPF writes to a lagg(4) child port are presumably intentional, so just
allow them, while still blocking other packets that should take the
aggregation path.
PR: kern/138620
Approved by: thompsa@
another process is in open() or stat() for the device node, then
close() from the owning process does not result in cdevsw close
method call. This fixes the pemanent "Device busy" seen.
Changed the sndstat_lock from mutex to sx. This allows to extend
the region covered by the lock, to include the uiomove() call in
sndstat_read() and bufptr increment. This fixes the "panic:
sbuf_put_byte called with finished or corrupt sbuf" seen.
In collaboration with: kib
MFC after: 1 week
code and which had only stub implementations or no implementation on all
platforms. Makes gxemul compile.
Hinted by: rwatson
MFC after: 3 weeks
X-MFC by: rwatson:
if the accounting log file is atomically replaced with a new file
(such as during log rotation).
- Simplify accounting log rotation a bit. There is no need to re-run
accton(8) after renaming the new log file to it's real name.
PR: kern/167321
Tested by: Jeremy Chadwick