By setting dev.netmap.fwd=1 (or enabling the feature with a per-ring flag),
packets are forwarded between the NIC and the host stack unless the
netmap client clears the NS_FORWARD flag on the individual descriptors.
This feature greatly simplifies applications where some traffic
(think of ARP, control traffic, ssh sessions...) must be processed
by the host stack, whereas the bulk is handled by the netmap process
which simply (un)marks packets that should not be forwarded.
The default is chosen so that now a netmap receiver operates
in a mode very similar to bpf.
Of course there is no free lunch: traffic to/from the host stack
still operates at OS speed (or less, as there is one extra copy in
one direction).
HOWEVER, since traffic goes to the user process before being
reinjected, and reinjection occurs in a user context, you get some
form of livelock protection for free.
Ext2fs uses unsigned fields in its dinode struct.
FreeBSD can have negative values in some of those
fields and the inode is meant to interact with the
system so we have never respected the unsigned
nature of most of those fields.
Block numbers and the NFS generation number do
not need to be signed so redefine them as
unsigned to better match the on-disk information.
MFC after: 1 week
Add a missing 0 to the mask for byte0 of C_SIZE.
The previous mask (0xc) worked except that the last 0-1536K of the disk
could not be accessed since we were shifting the (wrong) bits we did
mask off the right edge.
Testing with fsx has revealed problems and in order to
hunt the bugs properly we need reduce the complexity.
This seems to help but is not a complete solution.
MFC after: 3 days
logic (refer to [1] for associated discussion). snd_cwnd and snd_wnd are
unsigned long and on 64 bit hosts, min() will truncate them to 32 bits and could
therefore potentially corrupt the result (although under normal operation,
neither variable should legitmately exceed 32 bits).
[1] http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034297.html
Submitted by: jhb
MFC after: 1 week
generating binary diffs.
- Constify a few strings used in the driver.
- Style changes to make the driver compile with default clang settings.
Approved by: HighPoint Technologies
MFC after: 3 days
gcc handles -symbolic by passing -Bsymbolic through to ld. clang ignores
-symbolic and thus invokes ld without -Bsymbolic which leads to some symbols
not being properly linked in loader.efi. Fix this by using -Wl,-Bsymbolic which
passes -Bsymbolic to ld in both the gcc and clang cases.
Approved by: rpaulo
This is easily possible now that the TX is protected by a single
lock, rather than a per-TXQ (and thus per-TID) lock.
Only set CLRDMASK if none of the destinations are filtered.
This likely will need some tuning when it comes time to do UASPD/PS-POLL
TX, however at that point it should be manually set anyway.
Tested:
* AR9280, STA mode
TODO:
* More thorough testing in AP mode
* test other chipsets, just to be safe/sure.
that 'smp_started != 0'.
This is required because the VT-x initialization calls smp_rendezvous()
to set the CR4_VMXE bit on all the cpus.
With this change we can preload vmm.ko from the loader.
Reported by: alfred@, sbruno@
Obtained from: NetApp
arch_zfs_probe method is supposed to only probe for ZFS vdevs, but it can
not expect that ZFS data is in a ready state yet.
So, move some code from sparc64_zfs_probe to main to meet the constraints.
Reported by: Chris Ross <cross+freebsd@distal.com>
Tested by: Chris Ross <cross+freebsd@distal.com>
MFC after: 4 days
'bhyve' was developed by grehan@ and myself at NetApp (thanks!).
Special thanks to Peter Snyder, Joe Caradonna and Michael Dexter for their
support and encouragement.
Obtained from: NetApp
Make umass return an error code if SCSI sense retrieval request
has failed. Make sure scsi_error_action honors SF_NO_RETRY and
SF_NO_RECOVERY in all cases, even if it cannot parse sense bytes.
Reviewed by: hselasky (umass), scottl (cam)
interrupt counts and names, by making the names into an array of fixed-length
strings that can be directly indexed. This eliminates extra memory accesses
on every interrupt to increment the counts.
As a side effect, it also fixes a bug that would corrupt the names data
if a name was longer than MAXCOMLEN, which led to incorrect vmstat -i output.
Approved by: cognet (mentor)
to the old one's nfs.nfsrv.async.
Please note that by enabling this option (default is disabled), the system
could potentionally have silent data corruption if the server crashes
before write is committed to non-volatile storage, as the client side have
no way to tell if the data is already written.
Submitted by: rmacklem
MFC after: 2 weeks
two upcoming features:
semi-transparent mode:
when a device is opened in this mode, the
user program will be able to mark slots that must be forwarded
to the "other" side (i.e. from NIC to host stack, or viceversa),
and the forwarding will occur automatically at the next netmap syscall.
This saves the need to open another file descriptor and do
the forwarding manually.
direct-forwarding mode:
when operating with a VALE port, the user can specify in the slot
the actual destination port, overriding the forwarding decision
made by a lookup of the destination MAC. This can be useful to
implement packet dispatchers.
No API changes will be introduced.
No new functionality in this patch yet.
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7). The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.
MFC after: 2 weeks
tunable_mbinit() where it is next to where it is used later.
Change the sysinit level of tunable_mbinit() from SI_SUB_TUNABLES
to SI_SUB_KMEM after the VM is running. This allows to use better
methods to determine the effectively available physical and virtual
memory available to the kernel.
Update comments.
In a second step it can be merged into mbuf_init().
a process has an auditid/preselection masks specified, and
is jailed, include the zonename (jailname) token as a
part of the audit record.
Reviewed by: pjd
MFC after: 2 weeks
chip hangs.
* Always do a reset in ath_bmiss_proc(), regardless of whether the
hardware is "hung" or not. Specifically, for spectral scan, there's
likely a whole bunch of potential hangs that we don't (yet) recognise
in the HAL. So to avoid staying RX deaf persisting until the station
disassociates, just do a no-loss reset.
* Set sc_beacons=1 in STA mode. During a reset, the beacon programming
isn't done. (It's likely I need to set sc_syncbeacons during a hang
reset, but I digress.) Thus after a reset, there's no beacon timer
programming to send a BMISS interrupt if beacons aren't heard ..
thus if the AP disappears, you won't get notified and you'll have to
reset your interface.
This hasn't yet fixed all of the hangs that I've seen when debugging
spectral scan, but it's certainly reduced the hang frequency and it
should improve general STA stability in very noisy environments.
Tested:
* AR9280, STA mode, spectral scan off/on
PR: kern/175227
when an interface is going down.
Right now it's quite possible (but very unlikely!) that ath_reset()
or similar is called, leading to a beacon config call, in parallel with
the last VAP being destroyed.
This likely should be fixed by making sure the bmiss/bstuck/watchdog
taskqueues are canceled whenever the last VAP is destroyed.
in r245004. Although the report was for noatime option which is
non-functional for the nullfs, other standard options like nosuid or
noexec are useful with it.
Reported by: Dewayne Geraghty <dewayne.geraghty@heuristicsystems.com.au>
MFC after: 3 days
by returning an error of EINTR rather than EACCES.
- While here, bring back some (but not all) of the NFS RPC statistics lost
when krpc was committed.
Reviewed by: rmacklem
MFC after: 1 week
serial devices (such as printer ports). This allows ppc devices attached
to puc to correctly setup an interrupt handler and work.
Tested by: Andre Albsmeier Andre.Albsmeier@siemens.com
MFC after: 1 week
When maxusers was unrestricted and maxfiles was allowed to autotune
much higher the result was that ncallout which was based on maxfiles
and maxproc grew much higher than was needed.
To fix this clip autotuning to the same number we would get with
the old maxusers algorithm which would stop scaling at 384
maxusers.
Growing ncalout higher is not likely to be needed since most consumers
of timeout(9) are gone and any higher value for ncallout causes the
callwheel hashes to be much larger than will even be needed for
most applications.
MFC after: 1 month
Reviewed by: mav
if_start().
This removes the overlapping data path TX from occuring, which
solves quite a number of the potential TX queue races in ath(4).
It doesn't fix the net80211 layer TX queue races and it doesn't
fix the raw TX path yet, but it's an important step towards this.
This hasn't dropped the TX performance in my testing; primarily
because now the TX path can quickly queue frames and continue
along processing.
This involves a few rather deep changes:
* Use the ath_buf as a queue placeholder for now, as we need to be
able to support queuing a list of mbufs (ie, when transmitting
fragments) and m_nextpkt can't be used here (because it's what is
joining the fragments together)
* if_transmit() now simply allocates the ath_buf and queues it to
a driver TX staging queue.
* TX is now moved into a taskqueue function.
* The TX taskqueue function now dequeues and transmits frames.
* Fragments are handled correctly here - as the current API passes
the fragment list as one mbuf list (joined with m_nextpkt) through
to the driver if_transmit().
* For the couple of places where ath_start() may be called (mostly
from net80211 when starting the VAP up again), just reimplement
it using the new enqueue and taskqueue methods.
What I don't like (about this work and the TX code in general):
* I'm using the same lock for the staging TX queue management and the
actual TX. This isn't required; I'm just being slack.
* I haven't yet moved TX to a separate taskqueue (but the taskqueue is
created); it's easy enough to do this later if necessary. I just need
to make sure it's a higher priority queue, so TX has the same
behaviour as it used to (where it would preempt existing RX..)
* I need to re-review the TX path a little more and make sure that
ieee80211_node_*() functions aren't called within the TX lock.
When queueing, I should just push failed frames into a queue and
when I'm wrapping up the TX code, unlock the TX lock and
call ieee80211_node_free() on each.
* It would be nice if I could hold the TX lock for the entire
TX and TX completion, rather than this release/re-acquire behaviour.
But that requires that I shuffle around the TX completion code
to handle actual ath_buf free and net80211 callback/free outside
of the TX lock. That's one of my next projects.
* the ic_raw_xmit() path doesn't use this yet - so it still has
sequencing problems with parallel, overlapping calls to the
data path. I'll fix this later.
Tested:
* Hostap - AR9280, AR9220
* STA - AR5212, AR9280, AR5416
save queue code.
Instead, use if_transmit() directly - and handle the cases where frame
transmission fails.
I don't necessarily like this and I think at this point the M_ENCAP check,
node freeing upon fail and the actual if_transmit() call should be done
in methods in ieee80211_freebsd.c, but I digress slightly..
This removes one of the last few uses of if_start() and the ifnet
if_snd queue. The last major offender is ieee80211_output.c, where
ieee80211_start() implements if_start() and uses the ifnet queue
directly.
(There's a couple of gotchas here, where the if_start pointer is
compared to ieee80211_start(), but that's a later problem.)
CISS_MAX_LOGICAL to 107
Submitter wanted to increase the number of logical disks supported by ciss(4)
by simply raising the CISS_MAX_LOGICAL value even higher. Instead, consult
the documentation for the raid controller (OPENCISS) and poke the controller
bits to ask it for how many logical/physical disks it can handle.
Revert svn R242089 that raised CISS_MAX_LOGICAL to 64 for all controllers.
For older controllers that don't support this mechanism, fallback to the old
value of 16 logical disks. Tested on P420, P410, P400 and 6i model ciss(4)
controllers.
This should will be MFC'd back to stable/9 stable/8 and stable/7 after the MFC
period.
PR: kern/151564
Reviewed by: scottl@freebsd.org
MFC after: 2 weeks
This is the Compressed Local IPv6 table on the chip. To save space, the
chip uses an index into this table instead of a full IPv6 address in
some of its hardware data structures.
For now the driver fills this table with all the local IPv6 addresses
that it sees at the time the table is initialized. I'll improve this
later so that the table is updated whenever new IPv6 addresses are
configured or existing ones deleted.
MFC after: 1 week
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.
ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID
to shutdown gracefully. To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.
MFC after: 2 weeks
as clean on shutdown and move that action from shutdown_pre_sync stage to
shutdown_post_sync to avoid extra flapping.
ZFS tends to not close devices on shutdown, that doesn't allow GEOM RAID
to shutdown gracefully. To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.
PR: kern/113957
MFC after: 2 weeks
- Teach find_best_mtu_idx() to deal with IPv6 endpoints.
- Install correct protosw in offloaded TCP/IPv6 sockets when DDP is
enabled.
- Move set_tcp_ddp_ulp_mode to t4_tom.c so that t4_tom.h can be included
without having to drag in t4_msg.h too. This was bothering the iWARP
driver for some reason.
MFC after: 1 week
- Add full support for IPv6 addresses.
- Read the size of the L2 table during attach. Do not assume that PCIe
physical function 4 of the card has all of the table to itself.
- Use FNV instead of Jenkins to hash L3 addresses and drop the private
copy of jhash.h from the driver.
MFC after: 1 week
ARM EABI syscall calling convention.
The current ABI encodes the syscall number in the instruction. This causes
issues with the thumb mode as it only has 8 bits to encode this value and
we have too many system calls and by using a register will simplify the
code to get the syscall number in the kernel.
With the ARM EABI we reuse the Linux calling convention by storing the
value in r7. Because of this we use both methods to encode the syscall
number in this function.
Set the v_hash for a new vnode in the getnewvnode() to the value
calculated based on the vnode structure address. Filesystems using
vfs_hash_insert() override the v_hash using the standard formula of
(inode_number + mnt_hashseed). For other filesystems, the
initialization allows the vfs_hash_index() to provide useful hash too.
Suggested, reviewed and tested by: peter
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
padding. On the amd64 kernel with INVARIANTS turned off, size of the
struct vnode is reduced from 496 to 472 bytes, saving 24 bytes of
memory and KVA per vnode.
Noted and reviewed by: peter
Tested by: pho
Sponsored by: The FreeBSD Foundation
existing nullfs vnode by the lower vnode is only 16 slots. Since the
default mode for the nullfs is to cache the vnodes, hash has extremely
huge chains.
Size the nullfs hashtbl based on the current value of
desiredvnodes. Use vfs_hash_index() to calculate the hash bucket for a
given vnode.
Pointy hat to: kib
Diagnosed and reviewed by: peter
Tested by: peter, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
pre-masked hash for the given vnode. The function assumes that
vp->v_hash is initialized by the filesystem vnode instantiation
function. At the moment, it is only done if filesystem uses
vfs_hash_insert().
Reviewed by: peter
Tested by: peter, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
unsupported metadata types like Intel Smart Response to not corrupt them.
- Improve setting of these things during metadata writing to protect from
incapable BIOS'es and other implementations.
on Raspberry Pi.
o convert mmap address to physical.
o add FBIOGTYPE ioctl handler - allow to get screen resolution by new
xf86-video-scfb driver.
Originally designed for "Efika MX" project.
Sponsored by: FreeBSD Foundation
Implement an FDT attachment for altera_avgen(4).
Portions of the changeset updating DTS and device.hints will be merged
separately.
Sponsored by: DARPA, AFRL
Rework altera_avgen(4) to cleanly(ish) separate nexus bus
attachment from the driver itself. This should allow us to
plug in an fdt attachment more easily.
Sponsored by: DARPA, AFRL
Start restructuring of altera_avgen(4) so that it can have an FDT
attachment -- this requires first properly breaking out the current
nexus attachment from the driver implementation.
Sponsored by: DARPA, AFRL
Write FDT attachment for the Terasic MTL (multitouch LCD) driver.
Exploit the fact that FDT allows multiple memory ranges to be
assigned to a device, giving us a cleaner description than
device.hints does.
Portions of this changeset that remove mtl from BERI device.hints and
add to DTS will be merged separately.
Sponsored by: DARPA, AFRL
Add an Intel StrataFlash (isf) driver FDT attachment.
Portions of the original changeset hooking up FDT use for BERI will be
merged separately.
Sponsored by: DARPA, AFRL
disks should be rebuilt. Our rebuild code is same time disk-centric. To
handle this situation properly check all disks for RBLD flags, and if no
disk specified try rebuild/resync all of them except newly inserted.
Windows driver uses such migration when it creates new arrays. While GEOM
RAID has no mechanism to implement migration in general case, this specifc
case still can be handled easily via degraded RAID1 creation followed by
regular rebuild.
rather than a constant so that VM_KMEM_SIZE_MAX will scale automatically
with the kernel address space size. This is particularly important for
MIPS because the same definition is used by both 32- and 64-bit kernels.
Tested by: jchandra
unmerged BERI DTS files) to head:
Use the OFW compatible string "mips,mips4k" rather than
"mips4k,cp0" for interrupt control using MIPS4k CP0.
Suggested by: thompsa
Implement a MIPS FDT PIC decode routine to use when no PIC has been
configured, which assumes a cascade back to the nexus bus (e.g.,
the on-board CP0 interrupt management parts on the MIPS). If the
soc bus in a MIPS DTS file is declared as "mips4k,cp0"-compatible,
then this will be enabled. This is sufficient to allow IRQs to be
configured on BERI.
Sponsored by: DARPA, AFRL
This prevents quad igb card on high core machines, without any nmbcluster or
igb queue tuning wedging the boot process if all nics are configured.
Reviewed by: jfv
Approved by: pjd (mentor)
MFC after: 1 week
Provided a bus_space implementation for FDT, modelled on
bus_space_generic, but with a local version of the map address
routine that does a P->V translation, as is the case with NLM's
similar routine for XLP. It's not clear to me that this is the
right solution -- possibly this belongs in simplebus -- however,
it is sufficient to get the DE4 LED driver working.
Sponsored by: DARPA, AFRL
In a sign of weakness, replicate the MIPS bus_space_generic.c to
produce a new FDT version, which will perform necessary address
space translation for bus_space -- the solution used in NLM's MIPS
FDT support, but possibly not quite the right thing. This is
inconsistent with regular I/O via the nexus and the generic
bus_space, which instead perform translation via pmap_mapdev()
when a resource is activated. However, it will work while I
attempt to identify what the right way to reconcile possible
approaches.
(Another approach might be to make simplebus use Nexus's activate
routine instead of a generic one?)
Sponsored by: DARPA, AFRL
Add code so that the BERI boot process can ask the kernel linker for
DTB blobs that may have been left for it by the boot loader, as done
on PowerPC and ARM. This will require both a more mature boot
loader, and more mature boot loader argument passing mechanism,
than currently supported on BERI.
Sponsored by: DARPA, AFRL
Initialise Openfirmware/FDT code earlier in the FreeBSD/beri boot,
so that the results will be available for configuring the console
UART (eventually).
Suggested by: thompsa
Sponsored by: DARPA, AFRL
It is alike to RAID1, but with dedicating master and recovery disks and
providing manual control over synchronization. It allows to use recovery
disk as snapshot of the master disk from the time of the last sync.
This implementation is not functionaly complete comparing to Windows,
but it is better then silent conversion to RAID1 on first boot.
to avoid sending extra READ CAPACITY requests by dastart(). Schedule periph
again on reprobe completion, or otherwise it may stuck indefinitely long.
This should fix USB explore thread hanging on device unplug, waiting for
periph destruction.
Reported by: hselasky
on the fast data path) and use them instead of frobbing the adapter lock
and busy flag directly.
Other changes made while reworking all slow operations:
- Wait for the reply to a filter request (add/delete). This guarantees
that the operation is complete by the time the ioctl returns.
- Tidy up the tid_info structure.
- Do not allow the tx queue size to be set to something that's not a
power of 2.
MFC after: 1 week
ASUS P8Z77-V board reports _AC2, _AC3 and _AC4 setpoints as 0C. With active
cooling already automatically set to _AC2, that still caused driver to print
two useless lines about temperature above _AC3 and _AC4 every ten seconds.
Three setponts of 0C is probably a board bug, but the same spam could happen
also in correct case if system is runnign not with the lowest cooling level.
the underlying zap_count() to return no errors. However, it is possible
that the pool reaches to such a state where zap_count would return error,
leading to panics when a pool is imported.
This commit changes the ddt_zap_count to return error returned from
zap_count and handle the error appropriately. With this change, it's now
possible to let zpool rollback damaged transaction groups and import the
pool.
Obtained from: ZFS on Linux github (e8fd45a0f9)
MFC after: 1 month
false. It is right. Delete it because on the next line we catch all
'negative' cases with the test > 2, since 'negative' numbers are just
really big unsigned numbers and we do an identical action.
get back the leased write reference from the lower vnode. There is no
other path which can correct v_writecount on the lowervp.
Reported by: flo
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
and da_default_timeout where their current hardcoded values matched the current
default value for said tunables.
PR: kern/169976
Reviewed by: pjd (mentor)
Approved by: mav
DISKFLAG_CANDELETE. While this change makes this layer consistent
other layers such as UFS and ZFS BIO_DELETE support may not notice
any change made manually via these device sysctls until the device
is reopened via a mount.
Also corrected var order in dadeletemethodsysctl
PR: kern/169801
Reviewed by: pjd (mentor)
Approved by: mav
MFC after: 2 weeks
Factor out USB mouse and keyboard detection logic.
Reject USB keyboards which have mouse alike HID items
in their HID descriptors.
Submitted by: Matthew W
MFC after: 1 week
Currently we use interface indeces as zone IDs for link-local and
interface-local scopes, and since we don't have any tool to configure
zone IDs, there is no need to acquire the afdata lock several times per
packet only to read if_index value.
So, now in6_setscope reads zone IDs for interface-local, link-local and
global scopes without a lock.
Sponsored by: Yandex LLC
MFC after: 2 weeks
resources are partitioned.
- Reduce the number of virtual interfaces reserved for PF4. This leaves
spare room in the source MAC table and allows the driver to setup
filters that rewrite the source MAC address.
- Reduce the number of filters and use the freed up space for the CLIP
(Compressed Local IPv6 addresses) table. This is a prerequisite for
IPv6 TOE support which will follow separately in a series of commits.
MFC after: 1 week
SYNs (or SYN/ACK replies) are dropped due to network congestion, then the
remote end of the connection may act as if options such as window scaling
are enabled but the local end will think they are not. This can result in
very slow data transfers in the case of window scaling disagreements.
The old behavior can be obtained by setting the
net.inet.tcp.rexmit_drop_options sysctl to a non-zero value.
Reviewed by: net@
MFC after: 2 weeks
It stops treating the address on the interface as special by source
address selection rule even when the interface is outgoing interface.
This is desired in some situation.
Requested by: hrs
Reviewed by: IHANet folks including hrs
MFC after: 1 week
Previously CTL would leave individual LUNs enabled in the target
driver, whether or not the port as a whole was enabled. It would
also leave the wildcard LUN enabled indefinitely.
This change means that CTL will enable and disable any active LUNs,
as well as the wildcard LUN, when enabling and disabling a port.
Also, fix a bug that could crop up due to an uninitialized CCB
type.
ctl.c: Before calling ctl_frontend_online(), run through
the LUN list and enable all active LUNs.
After calling ctl_frontend_offline(), run through
the LUN list and disble all active LUNs.
scsi_ctl.c: Before bringing a port online, allocate the
wildcard peripheral for that bus. And after taking
a port offline, invalidate the wildcard peripheral
for that bus.
Make sure that we hold the SIM lock around all
calls to xpt_action() and other transport layer
interfaces that require it.
Use CAM_SIM_{LOCK|UNLOCK} consistently to acquire
and release the SIM lock.
Update a number of outdated comments. Some of
these should have been fixed long ago.
Actually do LUN disbables now. The newer drivers
in the tree work correctly for this as far as I
know.
Initialize the CCB type to CTLFE_CCB_DEFAULT to
avoid a panic due to uninitialized memory.
Submitted by: Chuck Tuffli (partially)
MFC after: 1 week
in devfs if a particular race condition is hit in the device pager
code.
This was a side effect of change 227530 which changed the device
pager interface to call a new destructor routine for the cdev.
That destructor routine, old_dev_pager_dtor(), takes a VM object
handle.
The object handle is cast to a struct cdev *, and passed into
dev_rel().
That works in most cases, except the case in cdev_pager_allocate()
where there is a race condition between two threads allocating an
object backed by the same device. The loser of the race
deallocates its object at the end of the function.
The problem is that before inserting the object into the
dev_pager_object_list, the object's handle is changed from the
struct cdev pointer to the object's own address. This is to avoid
conflicts with the winner of the race, which already inserted an
object in the list with a handle that is a pointer to the same cdev
structure.
The object is then passed to vm_object_deallocate(), and eventually
makes its way down to old_dev_pager_dtor(). That function passes
the handle pointer (which is actually a VM object, not a struct
cdev as usual) into dev_rel(). dev_rel() decrements the reference
count in the assumed struct cdev (which happens to be 0), and
that triggers the assertion in dev_rel() that the reference count
is greater than or equal to 0.
The fix is to add a cdev pointer to the VM object, and use that
pointer when calling the cdev_pg_dtor() routine.
vm_object.h: Add a struct cdev pointer to the VM object
structure.
device_pager.c: In cdev_pager_allocate(), populate the new cdev
pointer.
In dev_pager_dealloc(), use the new cdev pointer
when calling the object's cdev_pg_dtor() routine.
Reviewed by: kib
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
This should not matter much when running on bare metal but it makes the guest
more friendly when running inside a virtual machine.
Discussed with: jhb
Obtained from: NetApp
and embeds it into address. Inside the kernel we keep addresses with
embedded zone id only for two scopes: link-local and interface-local.
For other scopes this function is nop in most cases. To reduce an
overhead of locking, first check that address is capable for embedding.
Also, handle the loopback address before acquire the lock.
Sponsored by: Yandex LLC
MFC after: 1 week
This is intended to support reporting FFT results during active channel
scans, for users who would like to fiddle around with writing applications
that do both FFT visualisation _and_ AP scanning.
* add a new ioctl to enable/trigger spectral scan at channel change/reset;
* set do_spectral consistently if it's enabled, so a channel set/reset
will carry forth the correct PHY error configuration so frames
are actually received;
* for NICs that don't do spectral scan, don't bother checking the
spectral scan state on channel change/reset.
Tested:
* AR9280 - STA and scanning;
* AR5416 - STA, ensured that the SS code doesn't panic
r238966
Bump up the heap size to 1MB. With a few kernel modules, libstand
zalloc and userboot seem to want to use ~600KB of heap space, which
results in a segfault when malloc fails in bhyveload.
r241180
Clarify comment about default number of FICL dictionary cells.
r241153
Allow the number of FICL dictionary cells to be overridden.
Loading a 7.3 ISO with userboot/amd64 takes up 10035 cells,
overflowing the long-standing default of 10000.
Bump userboot's value up to 15000 cells.
Reviewed by: dteske (r238966,241180)
Obtained from: NetApp
- Missing PTE_SYNC in pmap_kremove caused memory corruption
in userland applications
- Fix lack of cache flushes when using special PTEs for zeroing or
copying pages. If there are dirty lines for destination memory
and page later remapped as a non-cached region actual content
might be overwritten by these dirty lines when cache eviction
happens as a result of applying cache eviction policy or because
of wbinv_all call.
- icache sync for new mapping for userland applications.
Tested by: gber
This change was originally intended to account for test kthreads under
the nvmecontrol process, but jhb indicated it may not be safe to
associate kthreads with userland processes and this could have
unintended consequences.
I did not observe any problems with this change, but my testing didn't
exhaust the kinds of corner cases that could cause problems. It is not
that important to account for these test threads under nvmecontrol, so I
am just reverting this change for now.
On a related note, the part of this patch for <= 7.x fails compilation
so reverting this fixes that too.
Suggested by: jhb
This patch will save CPU time when the XHCI interrupt is
shared with other devices.
Only check event rings when interrupt bits are set.
Otherwise would indicate hiding possible hardware fault(s).
Tested by: sos @
Submitted by: sos @
MFC after: 1 week
It was plagued with style errors and the offsets had been lost.
While here took the time to update the fields according to the
latest ext4 documentation.
Reviewed by: bde
MFC after: 3 days
l2_wbinv_range function implementation causes function
fail to flush caches for chip with RTL number 0x7. I failed
to find official PL310 revision with this RTL number
so further research on this matter required.
Use file name hash as a tree key, handle duplicate keys. Both VOP_LOOKUP
and VOP_READDIR operations utilize same tree for search. Directory
entry offset (cookie) is either file name hash or incremental id in case
of hash collisions (duplicate-cookies). Keep sorted per directory list
of duplicate-cookie entries to facilitate cookie number allocation.
Don't fail if previous VOP_READDIR() offset is no longer valid, start
with next dirent instead. Other file system handle it similarly.
Workaround race prone tn_readdir_last[pn] fields update.
Add tmpfs_dir_destroy() to free all dirents.
Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from
tmpfs_dir_getdents() instead of hard coded -1.
Mark directory traversal routines static as they are no longer
used outside of tmpfs_subr.c
* Mikrotik RouterBoard 433AH have PCI slot 18 wired to INT0 on the PCI Bus.
This is different from e.g. Atheros PB42 and Ubiquiti boards.
* Check for hint hint.pcib.0.baseslot=X, where X is number of base slot;
* If hint not supplied print a warning and use default AR71XX_PCI_BASE_SLOT;
PR: kern/174978
Approved by: adrian (mentor)
in from kernel-pppd which is long gone) so that ZFS and DTRACE play nice.
This is a horrible hack to get freefall to compile, and is in dire need
of reconciliation. This antique zlib-1.04 code needs to go away.
During the early days of bhyve it did not support instruction emulation
which necessitated the use of x2apic to access the local apic. This is no
longer the case and the dependency on x2apic has gone away.
The x2apic patches can be considered independently of bhyve and will be
merged into head via projects/x2apic.
Discussed with: grehan
If the data frame transmission failures, it may have a node reference
that needs cleaning up.
If the frame is marked as M_ENCAP then it should treat recvif as a node
reference and clear it.
Now - since the mbuf has been freed by calling if_transmit() (even on
failure), the mbuf has to be treated as invalid. Hence why the ifp is
used.
If if_transmit() fails, the node ref may need freeing.
This is based on the same logic used by the ageq, which the mesh code
(re) uses for frames which need to be staged before transmitting.
It also does the same thing - if M_ENCAP is set on the mbuf, it treats
the recvif pointer as a node reference and derefs it.
TX stalls in this driver, I've also had some
time to evaluate the effectiveness of different
watchdog strategies.
This is the latest attempt, which consolidates
all of the watchdog logic in one place and
consistently detects TX stalls and resets within
a couple of seconds.
the guest to execute real or unpaged protected mode code - bhyve relies on
this feature to execute the AP bootstrap code.
Get rid of the hack that allowed bhyve to support SMP guests on processors
that do not have the "unrestricted guest" capability. This hack was entirely
FreeBSD-specific and would not work with any other guest OS.
Instead, limit the number of vcpus to 1 when executing on processors without
"unrestricted guest" capability.
Suggested by: grehan
Obtained from: NetApp
the free nullfs vnodes, switching nullfs behaviour to pre-r240285.
The option is mostly intended as the last-resort when higher pressure
on the vnode cache due to doubling of the vnode counts is not
desirable.
Note that disabling the cache costs more than 2x wall time in the
metadata-hungry scenarious. The default is "cache".
Tested and benchmarked by: pho (previous version)
MFC after: 2 weeks
__func__ and add some missing ones.
- Remove a stale comment.
- Remove unused NUM_ELEMENTS macro.
- Remove extra empty lines.
- Use DEVMETHOD_END.
- Use NULL rather than 0 for pointers.
MFC after: 3 days
- Replace incorrect function names in printf(9) strings with __func__.
- Make xctrl_shutdown_reasons table const.
- Use nitems() rather than rolling an own version.
- Use DEVMETHOD_END.
- Use NULL rather than 0 for pointers.
MFC after: 3 days
This includes the HAL routines to setup, enable/activate/disable spectral
scan and configure the relevant registers.
This still requires driver interaction to enable spectral scan reporting.
Specifically:
* call ah_spectralConfigure() to configure and enable spectral scan;
* .. there's currently no way to disable spectral scan... that will have
to follow.
* call ah_spectralStart() to force start a spectral report;
* call ah_spectralStop() to force stop an active spectral report.
The spectral scan results appear as PHY errors (type 0x5 on the AR9280,
same as radar) but with the spectral scan bit set (0x10 in the last byte
of the frame) identifying it as a spectral report rather than a radar
FFT report.
Caveats:
* It's likely quite difficult to run spectral _and_ radar at the same
time. Enabling spectral scan disables the radar thresholds but
leaves radar enabled. Thus, the driver (for now) needs to ensure
that only one or the other is enabled.
* .. it needs testing on HT40 mode.
Tested:
* AR9280 in STA mode, HT/20 only
TODO:
* Test on AR9285, AR9287;
* Test in both HT20 and HT40 modes;
* .. all the driver glue.
Obtained from: Qualcomm Atheros
FDT headers can't be included if the kernel is compiled without
FDT support, due to dependence on generated kobj headers. BERI
supports both FDT and non-FDT kernels.
Spotted by: bz
(as used in AM335x SoC for BeagleBone).
Among other things:
* Watchdog reset doesn't hang the driver.
* Disconnecting cable doesn't hang the driver.
* ifconfig up/down doesn't hang the driver
* Out-of-memory no longer panics the driver.
Known issues:
* Doesn't have good support for fragmented packets
(calls m_defrag() on TX, assumes RX packets are never fragmented)
* Promisc and allmulti still unimplimented
* addmulti and delmulti still unimplemented
* TX queue still stalls (but watchdog now consistently recovers in ~5s)
* No sysctl monitoring
* Only supports port0
* No switch configuration support
* Not tested on anything but BeagleBone
Committed from: BeagleBone
mount, which means that is must not be called while the snaplock is
owned. The vfs_write_resume(9) does call the function as the
VFS_SUSP_CLEAN() method, which is too early and falls into the region
still protected by snaplock.
Add yet another flag for the vfs_write_resume_flags() to avoid calling
suspension cleanup handler after the suspend is lifted, and use it in
the ffs_snapshot() call to vfs_write_resume.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
- Add pl310.disable tunable to disable L2 cache altogether. In
order to make sure that it's 100% disabled we use cache event
counters for cache line eviction and read allocate events
and panic if any of these counters increased. This is purely
for debugging purpose
- Direct access DEBUG_CTRL and CTRL might be unavailable in
unsecure mode, so use platform-specific functions for
these registers
- Replace #if 1 with proper erratum numbers
- Add erratum 753970 workaround
- Remove wait function for atomic operations
- Protect cache operations with spin mutex in order to prevent race condition
- Disable instruction cache prefetch and make sure data cache
prefetch is enabled in OMAP4-specific intialization
Interrupts must be disabled while handling a partial cache line flush,
as otherwise the interrupt handling code may modify data in the non-DMA
part of the cache line while we have it stashed away in the temporary
stack buffer, then we end up restoring a stale value.
PR: 160431
Submitted by: Ian Lepore
perhaps due to an interrupt configuration problem, do not try to free
device ivars that have not yet have been allocated.
MFC after: 1 week
Reviewed by: gonzo
Sponsored by: DARPA, AFRL
* Finish adding the HAL capability to announce whether a NIC supports
spectral scan or not;
* Add spectral scan methods to the HAL structure;
* Add HAL_SPECTRAL_PARAM for configuration of the spectral scan logic.
The capability ID and HAL_SPECTRAL_PARAM struct are from Qualcomm
Atheros.
the write start, by adding a variation of the vfs_write_resume(9)
which accepts flags.
Use the new function to prevent a deadlock between parallel suspension
and snapshotting a UFS mount. The ffs_snapshot() code performed
vfs_write_resume() followed by vn_start_write() while owning the
snaplock. If the suspension intervene between resume and
vn_start_write(), the deadlock occured after the suspending thread
tried to lock the snaplock, most typically during the write in the
ffs_copyonwrite().
Reported and tested by: Andreas Longwitz <longwitz@incore.de>
Reviewed by: mckusick
MFC after: 2 weeks
X-MFC-note: make the vfs_write_resume(9) function a macro after the MFC,
in HEAD
- Remove unused br_prod_bufs member
- Fixup r241037: buf_ring pads br_prod_* and br_cons_* members at 128
bytes, assuming a fixed cache line size for all the architectures.
However, the above mentioned revision broke the padding.
Use explicit padding to the CACHE_LINE_SIZE on the members that
mark the initial new padded sections. Of course, the padding is not
important for performance reasons in the DEBUG_BUFRING case, leaving
br_cons members to share the cache line with br_lock.
- Fixup r244732: by removing incorrectly added membar in
buf_ring_dequeue_sc() where surrounding locking shoud be enough.
- Drastically reduce the number of membar used (pratically reverting
r244732) by switching rmb() in buf_ring_dequeue_mc() and wmb() in
buf_ring_enqueue() to be complete barriers. This, along with
br_prod_bufs departure, should fix ordering issues as explained in
the provided comments.
This patch is not targeted for MFC.
Sponsored by: EMC / Isilon storage division
Reviewed by: glebius
- Add my copyright to files I've touched a lot this year.
- Add dash in front of all copyright notices according to style(9).
- Move $OpenBSD$ down below copyright notices.
- Remove extra line between cdefs.h and __FBSDID.
Basically it's replica of VersatilePB code which is replica of XBox FB
code. All of them are linear framebuffers and should have common bits
moved to reusable framework.
- Disable interrupt when updating compare value in order to
make this operation atomical
- Increase minimum period for event timer. Systimer on BCM2835
is compare timer, so if minimum period is too small it might
be less then fraction of time between "read current value" and
"set compare timer" operations. It means that when timer is armed
actual counter value is more then compare value and it will take
whole cycle (~32sec for 1MHz timer) to fire interrupt.
Submitted by: Daisuke Aoyama <aoyama at peach.ne.jp>
signal bug_ring ownership. However, instructions can be reordered
around members write leading to stale values for ie. br_prod_bufs.
Use correct memory barriers to ensure proper ordering of the
ownership tokens updates.
Sponsored by: EMC / Isilon storage division
MFC after: 2 weeks
typically do not handle the SYNCHRONIZE_CACHE command - they either
return an error or the firmware enters a reset loop.
Reviewed by: hselasky
Approved by: rstone (co-mentor)
MFC after: 2 weeks
As pointed out by hselasky@, USB_IF_CSI is the wrong macro here since we want
to declare the device's interface class, subclass and protocol, not class,
subclass and driver info.
Follow-up to r244704.
PR: kern/174707
Approved by: glebius
MFC after: 1 week
case. There is no point in optimizing further the code and use a TRUE
litteral for a path that does heavyweight stuff anyway (like lock acq),
at the price of obfuscated code.
Use the appropriate check where necessary and remove a macro.
Sponsored by: EMC / Isilon storage division
MFC after: 3 days
to the current demotion factor instead of assigning it.
This allows external scripts to control demotion factor together
with kernel in a raceless manner.
all interested parties in case if interface flag IFF_UP has changed.
However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.
This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.
P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of pretended to be supported.
- Use spinlock_enter()/spinlock_exit() to prevent a thread holding a
debug lock from being preempted to prevent other threads waiting
on that lock from starvation.
- Handle the possibility of CPU migration in between the fetch of curcpu
and the call to spinlock_enter() by saving curcpu in a local variable.
- Use memory barriers to prevent reordering of loads and stores of the
data protected by the lock outside of the critical section
- Eliminate false sharing of the locks by moving them into the structures
that they protect and aligning them to a cacheline boundary.
- Record the owning thread in the lock to make debugging future problems
easier.
Reviewed by: rpaulo (initial version)
MFC after: 2 weeks
'"'. Mangling is only done for label names read from file system
metadata. Encoding resembles URL encoding. For example, the space
character becomes %20.
Help by: kib
Discussed with: imp, kib, pjd
interrupt context can still be idlethread. At that point, without the
panic condition, it can still happen that idlethread then will try to
acquire some locks to carry on some operations.
Skip the idlethread check on block/sleep lock operations when KDB is
active.
Reported by: jh
Tested by: jh
MFC after: 1 week
processing. For if_transmit() style hardware drivers (which none publicly
exist yet, for wireless) they will need to still implement if_start()
but only to re-start the TX queue.
allocate a map or mapping resources. That seems to imply that any memory
allocations it does must use M_NOWAIT and check for NULL.
Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>
elements to the USB audio softc structure. This fixes a double CPU
fault when attaching USB audio devices in 10-current for i386 at
least.
MFC after: 1 week
Use a boundary of zero, hence a PAGE_SIZE boundary
is implied by all memory allocations.
Background:
Busdma has problems to allocate more than PAGE_SIZE
bytes when the boundary is PAGE_SIZE bytes too.
Initially it was thought that a boundary of PAGE_SIZE
bytes will only affect loading of DMA memory, so that
segments get split correctly, but it also affects
allocation of DMA'able memory.
Solution:
USB can detect big segments and split them as required
by the USB code.
MFC after: 1 week
Reported by: gonzo
When kern_yield() was introduced with the possibility to specify
a new priority, the behaviour changed by not lowering priority at all
in the consumers, making the yielding mechanism highly ineffective for
high priority kthreads like bufdaemon, syncer, vlrudaemon, etc.
There are no evidences that consumers could bear with such change in
semantic and this situation could finally lead to bugs similar to the
ones fixed in r244240.
Re-specify userland pri for kthreads involved.
Tested by: pho
Reviewed by: kib, mdf
MFC after: 1 week
previous names, 'ptag' and 'pmap' -- p stands for packet.
This change reduces the difference between the code in stable/9
and head, and also helps using the same ixgbe_netmap.h on both branches.
Approved by: Jack Vogel
through the USB API and/or busdma.
The following assumptions have been made:
umass - buffers passed from CAM/SCSI layer are OK
network - mbufs are OK.
Some other nits while at it.
MFC after: 1 week
Suggested by: imp
DMA data does not reside next to non DMA data. This
might cause more memory to be allocated, but solves
problems on platforms using manual cache
synchronization.
Add a convenience function to get the buffer only
from a USB transfer's page cache structure.
MFC after: 1 week
Suggested by: imp
Unfortunately 5720S uses 5709S PHY id so add a hack to detect 5720S
PHY by checking parent device name. 5720S PHY does not support 2500SX.
Tested by: Geans Pin < geanspin <> broadcom dot com >
We also try to make better use of the fs flags instead of
trying adapt the code according to the fs structures. In
the case of subsecond timestamps and birthtime we now
check that the feature is explicitly enabled: previously
we only checked that the reserved space was available and
silently wrote them.
This approach is much safer, especially if the filesystem
happens to use embedded inodes or support EAs.
Discussed with: Zheng Liu
MFC after: 5 days
- Use the new architecture-agnostic buffer pool manager that uses uma(9)
to manage a set of power-of-2 sized buffers for bus_dmamem_alloc().
- Create pools of buffers backed by both regular and uncacheable memory,
and use them to handle regular versus BUS_DMA_COHERENT allocations.
- Use uma(9) to manage a pool of bus_dmamap structs instead of local code
to manage a static list of 500 items (it took 3300 maps to get to
multi-user mode, so the static pool wasn't much of an optimization).
- Small BUS_DMA_COHERENT allocations no longer waste an entire page per
allocation, or set pages to uncached when they contain data other than
DMA buffers. There's no longer a need for drivers to work around the
inefficiency by allocing large buffers then sub-dividing them.
- Because we know the alignment and padding of buffers allocated by
bus_dmamem_alloc() (whether coherent or regular memory, and whether
obtained from the pool allocator or directly from the kernel) we
can avoid doing partial cacheline flushes on them.
- Add a fast-out to _bus_dma_could_bounce() (and some comments about
what the routine really does because the old misplaced comment was wrong).
- Everywhere the dma tag alignment is used, the interpretation is that
an alignment of 1 means no special alignment. If the tag is created
with an alignment argument of zero, store it in the tag as one, and
remove all the code scattered around that changed 0->1 at point of use.
- Remove stack-allocated arrays of segments, use a local array of two
segments within the tag struct, or dynamically allocate an array at first
use if nsegments > 2. On an arm system I tested, only 5 of 97 tags used
more than two segments. On my x86 desktop it was only 7 of 111 tags.
Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>
manage a set of power-of-2 sized buffers for bus_dmamem_alloc().
This allows the caller to provide the back-end allocator uma allocator,
allowing full control of the memory pages backing the pool. For
convenience, it provides an optional builtin allocator that provides pages
allocated with the VM_MEMATTR_UNCACHEABLE attribute, for managing pools of
DMA buffers for BUS_DMA_COHERENT or BUS_DMA_NOCACHE.
This also allows the caller to specify a minimum alignment, and it ensures
that all buffers start on a boundary and have a length that's a multiple of
that value, to avoid using buffers that trigger partial cache line flushes.
Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>
only returns name, but also vnode of corefile to use.
This simplifies the code and closes few races, especially in %I handling.
Reviewed by: kib
Obtained from: WHEEL Systems
- Use this new format to automatically handle syscalls and VOPs. This
changes the earlier format but is still human readable.
Sponsored by: EMC / Isilon Storage Division