The T3 ASIC can provide an incoming packet's timestamp instead of its RSS hash.
The timestamp is just a counter running off the card's clock. With a 175MHz
clock an increment represents ~5.7ns and the 32 bit value wraps around in ~25s.
# sysctl -d dev.cxgbc.0.pkt_timestamp
dev.cxgbc.0.pkt_timestamp: provide packet timestamp instead of connection hash
# sysctl -d dev.cxgbc.0.core_clock
dev.cxgbc.0.core_clock: core clock frequency (in KHz)
# sysctl dev.cxgbc.0.core_clock
dev.cxgbc.0.core_clock: 175000
flags to specify M_WAITOK/M_NOWAIT. M_WAITOK allows devctl to sleep for
the memory allocation.
As Warner noted, allowing the functions to sleep might cause
reordering of the queued notifications.
Reviewed by: imp, jh
MFC after: 3 weeks
Clang generates the following warnings when building subr_usbd.c:
| subr_usbd.c:598:13: warning: promoted type 'int' of K&R function
| parameter is not compatible with the parameter type 'uint8_t' (aka
| 'unsigned char') declared in a previous prototype
| subr_usbd.c:627:13: warning: promoted type 'int' of K&R function
| parameter is not compatible with the parameter type 'uint8_t' (aka
| 'unsigned char') declared in a previous prototype
| subr_usbd.c:649:13: warning: promoted type 'int' of K&R function
| parameter is not compatible with the parameter type 'uint8_t' (aka
| 'unsigned char') declared in a previous prototype
Instead of just ANSIfying these three prototypes, do it for the entire
file.
Spotted by: clang
side-effect of purging more than the requested translation. While
this is not a problem in general, it invalidates the assumption made
during constructing the trapframe on entry into the kernel in SMP
configurations. The assumption is that only the first store to the
stack will possibly cause a TLB miss. Since the ptc.g purges the
translation caches of all CPUs in the coherency domain, a ptc.g
executed on one CPU can cause a purge on another CPU that is
currently running the critical code that saves the state to the
trapframe. This can cause an unexpected TLB miss and with interrupt
collection disabled this means an unexpected data nested TLB fault.
A data nested TLB fault will not save any context, nor provide a
way for software to determine what caused the TLB miss nor where
it occured. Careful construction of the kernel entry and exit code
allows us to handle a TLB miss in precisely orchastrated points
and thereby avoiding the need to wire the kernel stack, but the
unexpected TLB miss caused by the ptc.g instructution resulted in
an unrecoverable condition and resulting in machine checks.
The solution to this problem is to synchronize the kernel entry
on all CPUs with the use of the ptc.g instruction on a single CPU
by implementing a bare-bones readers-writer lock that allows N
readers (= N CPUs entering the kernel) and 1 writer (= execution
of the ptc.g instruction on some CPU). This solution wins over
a rendez-vous approach by not interrupting CPUs with an IPI.
This problem has not been observed on the Montecito.
PR: ia64/147772
MFC after: 6 days
a real (inline) function or applying void casting for all its consumers.
In most of places, the "return value" is not checked nor assigned, which
causes too many warnings for some smart compilers, i.e., clang.
Found by: clang
Remove unneeded rxtx handler, make que handler generic.
Do not allocate header mbufs in rx ring if not doing hdr split.
Release the lock in rxeof call to stack.
MFC for 8.1 asap
via %s
Most of the cases looked harmless, but this is done for the sake of
correctness. In one case it even allowed to drop an intermediate buffer.
Found by: clang
MFC after: 2 week
Apparently it's bad when we first have an ANSI prototype in function
declaration, but then use K&R in its defintion.
Complaint from: clang
MFC after: 2 weeks
per-buf flag to catch if a buf is double-counted in the free count.
This code was useful to debug an instance where a local patch at Isilon
was incorrectly managing numfreebufs for a new buf state.
Reviewed by: jeff
Approved by: zml (mentor)
CPU_FOREACH(i) iterates over the CPU IDs of all available CPUs. The
CPU_FIRST() and CPU_NEXT(i) macros can also be used to iterate over
available CPU IDs. CPU_NEXT(i) wraps around to CPU_FIRST() rather than
returning some sort of terminator.
Requested by: rwatson
Reviewed by: attilio
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)
shared resources defaults beyond absolute minimums.
The new values are chosen mostly by magic. They are still fairly
small and will need increasing for large installations (especially
SHMMAX). However, they are now enough to e.g. start PostgreSQL
installations with ~~300 users and nearly 512 MB of shared buffers.
Reviewed by: A short discussion on hackers@
a) There was a case where a ICMP message could cause
us to return leaving a stuck lock on an stcb.
b) The iterator needed some tweaks to fix its lock
ordering.
c) The ITERATOR_LOCK is no longer needed in the freeing
of a stcb. Now that the timer based one is gone we don't
have a multiple resume situation. Add to that that there
was somewhere a path out of the freeing of an assoc that
did NOT release the iterator_lock.. it was time to clean
this old code up and in the process fix the lock bug.
MFC after: 1 week
in particular, do not handle deferred DMA map load operations at all.
Any error, and especially EINPROGRESS, is treated as a hard error and
typically abort the current operation. The fact that the busdma code
queues the load operation for when resources (i.e. bounce buffers in
this particular case) are available makes this especially problematic.
Bounce buffering, unlike what the PR synopsis would suggest, works
fine.
While on the subject, properly implement swi_vm().
PR: 147502
MFC after: 1 week
802.11 duplicate detection. Upon looking at the standard, we discover
that 802.11-2007 says:
"A receiving QoS STA is also required to keep only the most recent
cache entry per<Address 2, TID, sequence-number> triple, storing only
the most recently received fragment number for that triple. A receiving
STA may omit tuples obtained from broadcast/multicast or ATIM frames
from the cache."
To fix this, we just disable duplicate detection for multicast/broadcast
frames.
Reviewed by: sam
MFC after: 4 weeks
Obtained from: DragonFly
in Linux emulation layer. Linux seems to only require that pos is
page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter
checking is too strict to allow some Linux binaries to run. tsMuxeR is
one example of such a binary.
Discussed with: jhb
MFC after: 1 week
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.
Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.
Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.
Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)
Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.
ARM:
Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.
PowerPC/AIM:
moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.
Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().
The last parameter to moea*_clear_bit() is never used. Eliminate it.
PowerPC/BookE:
Simplify mmu_booke_page_exists_quick()'s control flow.
Reviewed by: kib@
to obtain both trap frame and opaque argument submitted on registrction.
After kernel and all drivers get used to it, legacy hack can be removed.
Reviewed by: jhb@
This avoids errors or __DECONST() from places with higher WARNS levels.
Adjust a local cache variable in ipcs to const as well
to compile in the new world order.
Suggested by: jhb
Reviewed by: jhb, kib, brueffer (man)
support FreeBSD.
1) Timeout ioctl command timeouts.
Do not reset the controller if ioctl command completed
successfully.
2) Remove G66_WORKAROUND code (this bug never shipped).
3) Remove unnecessary interrupt lock (intr_lock).
4) Timeout firmware handshake for PChip reset (don't wait forever).
5) Handle interrupts inline.
6) Unmask command interrupt ONLY when adding a command to the pending
queue.
7) Mask command interrupt ONLY after removing the last command from
the pending queue.
8) Remove TW_OSLI_DEFERRED_INTR_USED code.
9) Replace controller "state" with separate data fields to avoid races:
TW_CLI_CTLR_STATE_ACTIVE ctlr->active
TW_CLI_CTLR_STATE_INTR_ENABLED ctlr->interrupts_enabled
TW_CLI_CTLR_STATE_INTERNAL_REQ_BUSY ctlr->internal_req_busy
TW_CLI_CTLR_STATE_GET_MORE_AENS ctlr->get_more_aens
TW_CLI_CTLR_STATE_RESET_IN_PROGRESS ctlr->reset_in_progress
TW_CLI_CTLR_STATE_RESET_PHASE1_IN_PROGRESS ctlr->reset_phase1_in_progress
10) Fix "req" leak in twa_action() when simq is frozen and req is NOT
null.
11) Replace softc "state" with separate data fields to avoid races:
TW_OSLI_CTLR_STATE_OPEN sc->open
TW_OSLI_CTLR_STATE_SIMQ_FROZEN sc->simq_frozen
12) Fix reference to TW_OSLI_REQ_FLAGS_IN_PROGRESS in
tw_osl_complete_passthru()
13) Use correct CAM status values.
Change CAM_REQ_CMP_ERR to CAM_REQ_INVALID.
Remove use of CAM_RELEASE_SIMQ for physical data addresses.
14) Do not freeze/ release the simq with non I/O commands.
When it is appropriate to temporarily freeze the simq with an I/O
command use:
xpt_freeze_simq(sim, 1);
ccb->ccb_h.status |= CAM_RELEASE_SIMQ;
otherwise use:
xpt_freeze_simq(sim, 1);
xpt_release_simq(sim, 1);
Submitted by: Tom Couch <tom.couch lsi.com>
PR: kern/147695
MFC after: 3 days
sctp_inpcb:
1) Make sure not to remove the flag on the PCB until
after the close() caller is back in control with the
lock. Otherwise a quickly freeing assoc could kill the
inpcb and cause a panic.
2) Make sure all calls to log_closing have not released
the locks before calling the log function, we don't
want the logging function to crash us due to a freed
inpcb.
3) Make sure that when we get to the end, we release all
locks (after removing them from view) and as long as
we are NOT the inp-kill timer removing the inp, call
the callout_drain() function so a racing timer won't
later call in and cause a racing crash.
MFC after: 1 week
directory entry. Use the new function in devfs_fqpn(), devfs_lookupx()
and devfs_vptocnp() instead of manually resolving the parent entry.
Reviewed by: kib
passing through. Modifications are restricted to a subset of C language
operations on unsigned integers of 8, 16, 32 or 64 bit size.
These are: set to new value (=), addition (+=), subtraction (-=),
multiplication (*=), division (/=), negation (= -), bitwise AND (&=),
bitwise OR (|=), bitwise eXclusive OR (^=), shift left (<<=),
shift right (>>=). Several operations are all applied to a packet
sequentially in order they were specified by user.
Submitted by: Maxim Ignatenko <gelraen.ua at gmail.com>
Vadim Goncharov <vadimnuclight at tpu.ru>
Discussed with: net@
Approved by: mav (mentor)
MFC after: 1 month
the case of immediate unconfigure after configure. Hold the periph an
extra count while we have the task to create sysctl context outstanding
so that the periph doesn't go away unexpectedly.
Sponsored by: Panasas
Reviewed by: scsi@
MFC after: 1 month
from the buffer pages to buffer. Combine the code to set buffer
dirty range (previously in vfs_setdirty()) and to clean the pages
(vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now
the vm object lock is acquired only once.
Drain the VPO_BUSY bit of the buffer pages before setting valid
and clean bits in vfs_clean_pages_dirty_buf() with new helper
vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not
busy.
In vfs_busy_pages(), move the wait for draining of VPO_BUSY before
the dirtyness handling, to follow the structure of
vfs_clean_pages_dirty_buf().
Reported and tested by: pho
Suggested and reviewed by: alc
MFC after: 2 weeks
the interrupt followed by a brief delay if it is not currently masked
before moving the interrupt.
- Move the icu_lock out of ioapic_program_intpin() and into callers. This
closes a race where ioapic_program_intpin() could use a stale value of
the masked state to compute the masked bit in the register.
Reviewed by: mav
MFC after: 2 weeks
better devices. This can be disabled on a per-device basis using quirks as
well.
This also handles the case where there is actually no connected LUN 0
(which can definitely be the case for storage arrays).
Reviewed by: scsi@
MFC after: 1 month
for REPORT and SET TARGET PORT GROUP commands (foundations for future work).
Regularize opcodes to be upper case hex.
Pick *one* of tab or space after #define (tab) and stick with that.
MFC after: 2 weeks
1) Only use both mapping arrays when NR sack is off. This
way we can hold off moving the cumack (not the best but
workable) when NR-sack is on.
2) We must make sure to just return on the move of the
bit to the NR array if the cum-ack as already went
past the TSN. This prevents marking a bit behind the
array and hitting the invariant code that panic's us.
MFC after: 1 week
- Re-enable TSO. This was broken previously due to CSUM_TSO clearing the
CSUM_TCP flag, so our checksum flags were incorrectly set going to the
netback driver. That was fixed in r206844 in tcp_output.c, so we can
turn TSO back on here.
- Fix the way transmit slots are calculated, so that we can't overfill
the ring.
- Avoid sending packets with more fragments/segments than netback can
handle. The Linux netback code can only handle packets of
MAX_SKB_FRAGS, which turns out to be 18 on machines with 4K pages. We
can easily generate packets with 32 or so fragments with TSO turned on.
Right now the solution is just to drop the packets (since netback
doesn't seem to handle it gracefully), but we should come up with a way
to allow a driver to tell the TCP stack the maximum number of fragments
it can handle in a single packet.
- Fix the way the consumer is tracked in the receive path. It could get
out of sync fairly easily.
- Use standard Xen ring macros to make it clearer how netfront is using
the rings.
- Get rid of Linux-ish negative errno return values.
- Added more documentation to the driver.
- Refactored code to make it easier to read.
- Some other minor fixes.
Reviewed by: gibbs
Reviewed by: gibbs
Sponsored by: Spectra Logic
MFC after: 7 days
We were only paying attention to the nr-mapping-array. Which
seems to make sense on the surface, by definition things
up to the cum-ack should be deliverable thus in the nr-mapping-array.
However (there is always a gotcha) thats not true when it
comes to large messages. The stack may hold the message
while re-assembling it not not deliver it based on several
thresholds. If that happens (which it would for smaller
large messages) then the cum-ack is figured wrong. We
now properly use both arrays in the cum-ack calculation.
MFC after: 1 week.
the timer. This is done by considering the locks
we will destroy and if they are contended we consider
it the same as a reference count being up. Fixing this
appears to cleanup another crash that was appearing with
all the timers where the socket buf lock got corrupted.
2) Fix the sysctl code to take a lot more care when looking
at INP's that are in the GONE or ALLGONE state.
MFC after: 1 week
of apitesters.. Basically we end up with attempting
to destroy a lock thats contended on. A cookie echo
arrives at the same time that the close is happening.
The close gets the lock but the cookie echo has already
passed the check for the gone flag and is then locked
waiting on the create lock.. when we go to destroy it
bam. For now we do the timer destroy for all calls
to close.. We can probably optimize this later so that
we check whats being contended on and if there is contention
then do the timer thing. but this is probably safest since
the inp has been removed from all lists and references and
only the timer can find it.. once the locks are released all
other places will instantly see the GONE flag and bail (thats
what the change in sctp_input is one place that was lacking
the bail code).
MFC after: 1 week
held by checking the create and inp locks as well.
2) Fix a bug in that when a socket is closed an INIT-ACK
is returned, we do NOT unlock the locked_tcb unless its
different (an unlikely scenario). If we blindly unlock as
we were doing before we can end up unlocking the actual
stcb thats about to be sent down to the free function which
requires the lock be held.
MFC after: 1 week
was setup to do an abortive close an association that was
in the accept_queue could get stuck and never freed. Now
we properly start the kill timer on the socket and turn
off the flag (same thing we do for the graceful close method).
MFC after: 1 week
corruption bug where if an ATA command is issued before DMA is started,
data will become available to the controller before it knows what to do
with it. This results in either data corruption or a controller crash.
This patch remedies the problem by adopting the workaround employed
by Linux and Darwin: starting the DMA engine prior to sending the ATA
command.
Observer on: Xserve G5
Reviewed by: mav
MFC after: 1 week
the page is managed.
Don't set the machine-independent layer's dirty field for the page being
mapped in init_pte_prot(). (The dirty field is only supposed to set when
a mapping is removed or write-protected and the page was managed and
modified.)
Determine whether or not to perform dirty bit emulation based on whether
or not the page is managed, i.e., pageable, not based on whether the page
is being mapped into the kernel address space. Nearly all of the kernel
address space consists of unmanaged pages, so this has neglible impact on
the overhead of dirty bit emulation for the kernel address space. However,
there can also exist unmanaged pages in the user address space. Previously,
dirty bit emulation was unnecessarily performed on these pages.
Tested by: jchandra@
buffers it should also reinitialize RX descriptors otherwise some
stale data could be passed to controller. This could end up with
mbuf double free or unexpected NULL pointer dereference in upper
stack. To fix the issue, save loaded buffer's length and
reinitialize RX descriptors with the saved value whenever bge(4)
reuses the loaded RX buffers.
While I'm here, increase the number of RX buffers to 512 from 256.
This simplifies RX buffer handling as well as giving more RX
buffers. Controller supports just fixed number of RX buffers
(i.e. 512) and bge(4) used to rely on hope that our CPU is fast
enough to keep up with the controller. With this change, bge(4)
will use 1MB for RX buffers but I don't think it would cause
problems in these days.
Reported by: marcel
Tested by: marcel
1) Fix the alignment of a comment.
2) Fix a BUG where we were NOT paying attention
to the RESEND marking on retransmitting control
chunks.. and worse we were not decrementing the
retran count that could cause us to loop forever.
3) Add in the valdiate_no_lock function on invariants
so that we will really check all ways out to be sure
a lock does not slip out locked.
MFC after: 1 week.
1) Makes it so that the INVARIANT function validate nolocks is
available anywhere.
2) Fixes a BUG where a close has been done on a collision socket
and the cookie processing would return leaving a lock held.
MFC after: 1 week
fix. On Apple OpenPICs, the low/high bit of the interrupt sense is only
respected for interrupt 0. We currently erroneously program all OpenPIC
interrupts level high instead of level low by default, which only matters
for some G5 systems where the SATA controllers use IRQ 0.
This change is a quick fix that will be reverted once the effect of
changing the default interrupt sense on embedded systems is known.
MFC after: 3 days
are respected. This fixes loading the Apple onboard audio driver
(snd_ai2s) as a module after boot, which would previously cause a panic.
PR: powerpc/146888
MFC after: 5 days
context from in-kernel execution of padlock instructions and to handle
spurious FPUDNA exceptions that sometime are raised when doing padlock
calculations.
Globally mark crypto(9) kthread as using FPU.
Reviewed by: pjd
Hardware provided by: Sentex Communications
Tested by: pho
PR: amd64/135014
MFC after: 1 month
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches. There is also a
facility to allow the kernel thread to use pcb save_area.
Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.
KPI discussed with: fabient
Tested by: pho, fabient
Hardware provided by: Sentex Communications
MFC after: 1 month
I don't know why- but it occurred to me in looking at the second sleep
is that all I want is a pause- not an actual sleep. So do that instead.
MFC after: 2 weeks
fails to allocate MIPS page table pages. The current usage of VM_WAIT in
case of vm_phys_alloc_contig() failure is not correct, because:
"There is no guarantee that any of the available free (or cached) pages
after the VM_WAIT will fall within the range of suitable physical
addresses. Every time this function sleeps and a single page is freed
(or cached) by someone else, this function will be reawakened. With
a little bad luck, you could spin indefinitely."
We also add low and high parameters to vm_contig_grow_cache() and
vm_contig_launder() so that we restrict vm_contig_launder() to the range
of pages we are interested in.
Reported by: alc
Reviewed by: alc
Approved by: rrs (mentor)
GCC forwards the -N flag directly to ld. This flag is not documented and
not supported by (for example) Clang. Just use -Wl,-N.
Submitted by: Pawel Worach
file to be of maximum size.
- Add special handling required for SMI/VTOC8 disklabel partcode, i.e. avoid
overwriting the label when writing the bootstrap code to the partition
starting at 0 and install it to all partitions when the -i option is omitted
just like geom_sunlabel(4) and sunlabel(8) do by default.
- Add missing prototypes.
- Add const where applicable.
Reviewed by: marcel
MFC after: 3 days
cover the initial read by dqopen(). Assert that vnode is locked in
dqopen(). Remove VFS_LOCK_GIANT() from dqopen(), since quotaon() keeps
Giant locked if needed around the call.
different mount points, e.g. the nullfs vnode and the covered vnode
from the lower filesystem. In this case, existing assertion in
vop_rename_pre() may be triggered.
Check for vnode locks equiality instead of the vnodes itself to
not trip over the situation.
Submitted by: Mikolaj Golub <to.my.trociny@gmail.com>
Tested by: pho
MFC after: 2 weeks
doing bidirectional stress traffic on 82598.
Also a couple bug fixes from Michael Tuexen, thank you!!
Add a workaround into the header so that 8 REL can use
the driver (adds local copy of ALTQ fix).
MFC: in a few days
o fdtbus(4) - the main abstract bus driver for all FDT-compliant systems. This
is a direct replacement for the many incompatible bus drivers grouping
integrated peripherals on embedded platforms (like obio(4), ocpbus(4) etc.)
o simplebus(4) - bus driver representing ePAPR style 'simple-bus' node, which
is an umbrella device for most of the integrated peripherals on a typical
system-on-chip device.
o Other components (common routines library, PCI node processing helper
functions)
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
fields that is always included in PCPU_MD_FIELDS. The macro is empty for
non-XEN kernels. This avoids duplicating non-XEN per-CPU fields in two
places. While here, remove several unused fields from the XEN-specific
structure.
Reviewed by: kmacy, gibbs
MFC after: 1 month
Use it to allow to tune sem_nsem_max at runtime, only when sem.ko
module is present in kernel.
Requested and tested by: amdmi3
Reviewed by: jhb
MFC after: 3 days
mapping but not changing the physical page being mapped, the wrong flags
were being inspected in order to determine whether or not to flush the
instruction cache. The effect of looking at the wrong flags was that the
instruction cache was never being flushed.
Reviewed by: marcel
The fix is a partial import and merge of OpenSolaris onnv revisions
8227:f7d7be9b1f56. and 9292:e112194b5b73
Approved by: pjd, delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6798298)
MFC after: 3 days
When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.
When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.
Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.
There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
If disk was missing on pool load or import and on next pool load or import
it was present, resilver wasn't started automatically and ZFS reported all disks
as ONLINE and healthy. Then, when another disk died, pool became unaccessible,
because if it was 2-way mirror or RAIDZ1 two vdevs were out of sync.
To fix the problem, start resilver automatically on pool load or import.
Obtained from: OpenSolaris
MFC after: 3 days
When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.
When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.
Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.
The remaining, unmerged portions of r175404
Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally
compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel
option DIAGNOSTIC still disables inlining of certain pmap functions.)
Eliminate dead code from pmap_enter(). This code implemented an assertion.
On i386, an equivalent check is already implemented. However, on amd64,
a small change is required to implement an equivalent check.
Eliminate \n from a nearby panic string.
Use KASSERT() to reimplement pmap_copy()'s two assertions.
Merge portions of r177659
To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.
The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.
r208609
Defer freeing any page table pages in pmap_remove_all() until after the
page queues lock is released. This may reduce the amount of time that the
page queues lock is held by pmap_remove_all().
r208645
When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.
When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.
Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.
There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.
Tidy up the variable declarations at the start of pmap_enter().
I removed too many lines and a wrong pointer was accidentally passed down.
Tested by: Scott Allendorf (scott-allendorf at uiowa dot edu), kib
MFC after: 3 days
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.
When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.
Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.
There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.
Tidy up the variable declarations at the start of pmap_enter().
- Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.
ta_func may free the task structure, so no references to its members
are valid after the handler has been called. Using a per-queue member
and having waits longer than strictly necessary was suggested by jhb.
Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by: zml, jhb
o Let OFW_INIT() and OF_init() return status value.
o Provide helper routines for 'compatible' property handling.
o Only compile OF and OFW code, which is relevant in FDT scenario.
o Other minor cosmetics
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
- use correct size (512) while reading a gang block
- skip holes while reading child blocks
- advance buffer pointer while reading child blocks
PR: 144214
MFC after: 10 days
will be called automatically by 'timer1clock()'.
Do profiling as often as possible by running it as the same frequency as
'timer1hz'. The statistics clock is run as close to 128Hz as possible.
Pointed out by: mav@
file via NFS. Specifically, to satisfy close-to-open-consistency, the NFS
client always performs at least one RPC on a file during an open(2) to see
if the file has changed. Normally this RPC is an ACCESS or GETATTR RPC
that is forced by flushing a file's attribute cache during nfs_open() and
then requesting new attributes. However, if the file is noticed to be
stale during nfs_open(), the only recourse is to fail the open(2) call
with ESTALE. On the other hand, if the ACCESS or GETATTR RPC is sent
during nfs_lookup(), then the NFS client can fall back to a LOOKUP RPC to
obtain the new file handle in the case that a file has been replaced.
This change causes the NFS client to flush the attribute cache during
nfs_lookup() when validating a name cache hit if the attributes fetched
during nfs_lookup() can be reused in nfs_open(). This allows the client
to open a replaced file via the new file handle the first time that it
notices a replaced file rather than failing with ESTALE in some cases.
Reviewed by: rmacklem, bde
Reviewed by: mohans (older version)
MFC after: 1 week
set but be cleared before the call to sodisconnect(). In this case,
ENOTCONN is returned: suppress this error rather than returning it to
userspace so that close() doesn't report an error improperly.
PR: kern/144061
Reported by: Matt Reimer <mreimer at vpop.net>,
Nikolay Denev <ndenev at gmail.com>,
Mikolaj Golub <to.my.trociny at gmail.com>
MFC after: 3 days
the jail(8) command. [10:04]
Fix a one-NUL-byte buffer overflow in libopie. [10:05]
Correctly sanity-check a buffer length in nfs mount. [10:06]
Approved by: so (cperciva)
Approved by: re (kensmith)
Security: FreeBSD-SA-10:04.jail
Security: FreeBSD-SA-10:05.opie
Security: FreeBSD-SA-10:06.nfsclient
whole bus (XPT_SCAN_BUS) and a single lun on that bus (XPT_SCAN_LUN).
It's less resource comsumptive than scanning a whole bus when the
caller knows only one target has changes.
Reviewed by: scsi@
Sponsored by: Panasas
MFC after: 1 month
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.
Assert that the page is managed in pmap_is_referenced().
On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.
Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.
Correct a spelling error in vm_page_dontneed().
Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.
Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.
Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().
Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.
Reviewed by: kib
o DB-88F5182
o DB-88F5281
o DB-88F6281
o DB-78100
o SheevaPlug
This also includes device tree bindings definitions for some newly introduced
nodes (mpp, gpio).
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
The driver is stub. It just creates device entry and feeds
reassembled packets from hardware into it.
If in future we would port wsmouse(4) from NetBSD, or make
sysmouse(4) to support absolute motion events, then the driver
can be extended to act as system mouse. Meanwhile, it just
presents a /dev/uep0, that can be utilized by X driver, that
I am going to commit to ports tree soon.
The name for the driver is chosen to be the same as in NetBSD,
however, due to different USB stacks this driver isn't a port.
o This is disabled by default for now, and can be enabled using WITH_FDT at
build time.
o Tested with ARM and PowerPC.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
the arguments array instead of array itself. ia64 syscall arguments are
readily available in the frame, point args to it, do not do unnecessary
bcopy. Still reserve the array in syscall_args for ia32 emulation.
Suggested and reviewed by: marcel
MFC after: 1 month
Make sure not to requeue freed mbuf in sge_start_locked(). This
should fix NULL pointer dereference panic.
Reported by: Nikolay Denev <ndenev <> gmail dot com>
Submitted by: jhb
Implement an optional delay to the ddb reset/reboot command.
This allows textdumps to be run automatically with unattended reboots
after a resonable timeout, while still permitting an administrator to
break into debugger if attached to the console at the time of the
event for further debugging. Cap the maximum delay at 1 week to avoid
highly accidental results, and default to 15s in case of problems
parsing the timeout value.
Move hex2dec helper function from db_thread.c to db_command.c to make
it generally available and prefix it with a "db_" to avoid namespace
collisions.
Reviewed by: rwatson
MFC after: 4 weeks
Improve IPsec flow distribution for better netisr parallelism.
Instead of using the pointer that would have the last bits masked in a %
statement in netisr_select_cpuid() to select the queue, use the SPI.
Reviewed by: rwatson
MFC after: 4 weeks
APIC interrupt that fires when a threshold of corrected machine check
events is reached. CMCI also includes a count of events when reporting
corrected errors in the bank's status register. Note that individual
banks may or may not support CMCI. If they do, each bank includes its own
threshold register that determines when the interrupt fires. Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl). The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.
Tested by: Sailaja Bangaru sbappana at yahoo com
MFC after: 1 month
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().
Reviewed by: kib (an earlier version)
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
on exit, that is done once in thread_exit() and the second time in
proc_reap(), by clearing td_incruntime.
Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg()
instead of ruxagg_locked() and use it from thread_exit().
Diagnosed and tested by: neel
MFC after: 3 days
Intention of this commit is to let us take a full advantage
of libusb(8) ported to Linux. This decreases a possibility of getting
any collisions within ioctl() "command" space, especially with
relation to LINUX_SNDCTL_SEQ... stuff.
Basically, we provide commands, that will be mapped in the kernel
to correct ones and forward those to the USB layer. Port enabling
functionality brought with this patch is here:
http://www.freebsd.org/cgi/query-pr.cgi?pr=146895
Bump __FreeBSD_version to catch, since which version installing a
port makes sense.
This patch should bring no regressions. So far, only i386 is tested.
Tested by: thompsa@
Reviewed by: thompsa@
OKed by: netchild@
The arcstats.l2_write_bytes_written kstat counter introduced
in r205231 was duplicite with vendor's arcstats.l2_write_bytes counter
imported in r208373 (OpenSolaris revision 8582:df9361868dbe)
Approved by: pjd, delphij (mentor)
MFC after: 3 days
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.
Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().
The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.
Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().
Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.
The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.
Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month
device, make sure we have no real HPET device entry with same ID.
As side effect, it potentially allows several HPETs to be attached.
Use first of them for timecounting, rest (if ever present) could later
be used as event sources.
memory on a platform. Tested on the Sibyte with 256MB and 1GB memory
configurations.
- Replace vtophys() with MIPS_KSEG0_TO_PHYS() to convert a page table
page's virtual address to physical. We can safely do this because
page table pages are allocated out of KSEG0.
- Add an assertion to verify that when a page table page is freed it
contains all zeroes. We can now use it after allocation without
zeroing it.
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
the idle process is now shared across all idle threads.
MFC after: 1 month
locate a high memory area for the heap using the SMAP.
- Read the number of hard drive devices from the BIOS instead of hardcoding
a limit of 128. Some BIOSes duplicate disk devices once you get beyond
the maximum drive number.
MFC after: 1 month
fd_set bits in select(2). It seems that historical behaviour is to not
reporting exception on EOF, and several applications are broken.
Reported by: Yoshihiko Sarumaru <ysarumaru gmail com>
Discussed with: bde
PR: ports/140934
MFC after: 2 weeks
- Adds re-partitioning TLB per core for enabled threads.
- Adds hardware thread id to cpuid mapping
- updates rge driver packet distribution and message ring handling
threads to be started based on hardware thread id.
- remove unused early debugging code to set control registers.
- coding style fixes
Approved by: rrs (mentor)
where running ofwdump could cause hangs by forcing all secondary CPUs
into a busy wait with interrupts off during the call.
Following section 8.4 of the Open Firmware PowerPC processor binding,
the firmware is free to overwrite the system interrupt handlers during
OF calls, restoring the OS handlers on exit. On single CPU systems, this
process is invisible to the operating system. On multiple CPU systems,
taking any exception on a secondary CPU while an OF call is in progress
ends with that exception vectored into OF, resulting in a slow movement
of the entire system into firmware context and a machine hang.
MFC after: 3 days
hook it up to ada(4) also. While at it, rename *ad_firmware_geom_adjust()
to *ata_disk_firmware_geom_adjust() etc now that these are no longer
limited to ad(4).
Reviewed by: mav
MFC after: 3 days