The only current purpose of the pvh lock was explained there
On Wed, Jan 09, 2013 at 11:46:13PM -0600, Alan Cox wrote:
> Let me lay out one example for you in detail. Suppose that we have
> three processors and two of these processors are actively using the same
> pmap. Now, one of the two processors sharing the pmap performs a
> pmap_remove(). Suppose that one of the removed mappings is to a
> physical page P. Moreover, suppose that the other processor sharing
> that pmap has this mapping cached with write access in its TLB. Here's
> where the trouble might begin. As you might expect, the processor
> performing the pmap_remove() will acquire the fine-grained lock on the
> PV list for page P before destroying the mapping to page P. Moreover,
> this processor will ensure that the vm_page's dirty field is updated
> before releasing that PV list lock. However, the TLB shootdown for this
> mapping may not be initiated until after the PV list lock is released.
> The processor performing the pmap_remove() is not problematic, because
> the code being executed by that processor won't presume that the mapping
> is destroyed until the TLB shootdown has completed and pmap_remove() has
> returned. However, the other processor sharing the pmap could be
> problematic. Specifically, suppose that the third processor is
> executing the page daemon and concurrently trying to reclaim page P.
> This processor performs a pmap_remove_all() on page P in preparation for
> reclaiming the page. At this instant, the PV list for page P may
> already be empty but our second processor still has a stale TLB entry
> mapping page P. So, changes might still occur to the page after the
> page daemon believes that all mappings have been destroyed. (If the PV
> entry had still existed, then the pmap lock would have ensured that the
> TLB shootdown completed before the pmap_remove_all() finished.) Note,
> however, the page daemon will know that the page is dirty. It can't
> possibly mistake a dirty page for a clean one. However, without the
> current pvh global locking, I don't think anything is stopping the page
> daemon from starting the laundering process before the TLB shootdown has
> completed.
>
> I believe that a similar example could be constructed with a clean page
> P' and a stale read-only TLB entry. In this case, the page P' could be
> "cached" in the cache/free queues and recycled before the stale TLB
> entry is flushed.
TLBs for addresses with updated PTEs are always flushed before pmap
lock is unlocked. On the other hand, amd64 pmap code does not always
flushes TLBs before PV list locks are unlocked, if previously PTEs
were cleared and PV entries removed.
To handle the situations where a thread might notice empty PV list but
third thread still having access to the page due to TLB invalidation
not finished yet, introduce delayed invalidation. Comparing with the
pvh_global_lock, DI does not block entered thread when
pmap_remove_all() or pmap_remove_write() (callers of
pmap_delayed_invl_wait()) are executed in parallel. But _invl_wait()
callers are blocked until all previously noted DI blocks are leaved,
thus ensuring that neccessary TLB invalidations were performed before
returning from pmap_remove_all() or pmap_remove_write().
See comments for detailed description of the mechanism, and also for
the explanations why several pmap methods, most important
pmap_enter(), do not need DI protection.
Reviewed by: alc, jhb (turnstile KPI usage)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5747
get_token(..) returns int32_t, not enum tok, and in many cases tests for items
not in enum tok (e.g. '('). Make the typing consistent with get_token, which
includes a domino effect of changing enum tok to int32_t.
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
* Log the per-completion status out if requested
* If we get a PHY failure, the retrycnt is set to 0 and ack=0, so
the logic was incorrect. So, for ack=0, ensure we don't log
a retrycnt of 0 (or rate control breaks) or a negative retrycnt
(or rate control also breaks.)
Tested:
* BCM4321 (11abgn N-PHY), BCM4312 (LP-PHY)
The RSB controller speaks a simplified two wire protocol at speeds up to
20MHz. It is used on sun8i and sun9i family SoCs to communicate with
power management ICs.
RSB isn't really I2C or SMBus, but the driver exposes an iicbus interface
to simplify power management IC drivers (which may need to support both
RSB and I2C connectivity).
- Save errno
- Set errno to 0
- Call strtoul
- Test errno (optional, but many calls to strtoul did this afterwards)
Some of the code was setting errno = 0 after calling strtoul, not setting
errno = 0, or setting errno to saved_errno after the call, but before the
test. These all have unwanted behavioral side-effects, depending on the
initial value of errno and whether or not the input to strtoul was correct
or incorrect.
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
* Ensure we set 20MHz wide channels (hard-coded) for PHY-N.
* Change the core rese tto take a flag saying "gmode" vesus uint32_t
flags. This is important for BCMA support where the "gmode" bit
is different.
* Refactor out the mac-phy clock reset routine (usde by PHY-N).
Tested:
* BCM4321 (PHY-N), BCM4312 (PHY-LP)
TODO:
* Checkpoint test on PHY-G hardware, just to check.
How errno is saved before and restored after strtoul calls needs a rethink
MFC after: 1 week
Reported by: gcc 5.x
Sponsored by: EMC / Isilon Storage Division
For the multihomed case, ifp be used after being freed. NULL the value
after freeing it and avoid getting into the branch without reassigning
a new value.
CID: 272671
Obtained from: NetBSD
MFC after: 2 weeks
I meant to use nitems, not sizeof(..) with the destination buffer. Using sizeof(..)
on a pointer will always truncate the output in the destination buffer incorrectly
Pointyhat to: ngie
MFC after: 1 week
X-MFC with: r299764
Sponsored by: EMC / Isilon Storage Division
rc cannot be negative -- that was already tested for earlier on in
the function
MFC after: 1 week
Reported by: clang, gcc
Sponsored by: EMC / Isilon Storage Division
parse_context, parse_user_security: test for validity of results from
parse_ascii(..) with by casting to int32_t and comparing to -1; comparing
unsigned types to negative values will always be false.
Reported by: clang, Coverity
CID: 1011432, 1011433
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
Increase the size of `string` by 1 to account for the '\0' terminator. In the event
that `str` doesn't contain any non-alpha chars, i would be set to MAXSTR, and
the subsequent strlcpy call would overflow by a character.
Remove unnecessary `string[i] = '\0'` -- this is already handled by strlcpy.
MFC after: 1 week
Reported by: clang
Sponsored by: EMC / Isilon Storage Division
Technically this is a no-op, but mute the clang warning in case the malloc call
above for fstring ever changes in the future
Reported by: clang
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
This is a no-op as the malloc above set the size of the buffer to the size used
below, but this keeps things consistent in case the malloc call changes somehow.
MFC after: 1 week
Reported by: clang
Sponsored by: EMC / Isilon Storage Division
This isn't compiled in yet; so some code here duplicates what
is in the existing code. I'll migrate it all out in subsequent
commits.
Obtained from: b43 (definitions), bcm-v4 specifications website
This will eventually live in sys/dev/bhnd/, but I won't use that until
we migrate the whole driver over.
So, this'll live here for now.
Obtained from: Linux b43 (definitions)
When a file is opened write-only and a partial block was written,
buffered I/O would try and read the whole block in. This would
result in a hung thread, since there was no open (fuse filehandle)
that allowed reading. This patch avoids the problem by forcing
DIRECT_IO for this case.
It also sets DIRECT_IO when the file system specifies the FN_DIRECTIO
flag in its reply to the open.
Tested by: nishida@asusa.net, freebsd@moosefs.com
PR: 194293, 206238
MFC after: 2 weeks
bwn_sqrt() is in the PHY-LP code but is also needed by the upcoming
PHY-N support.
The other two routines are used by the PHY-N code.
The next commit will introduce it into the compile and pull bwn_sqrt()
out of the PHY-LP source.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.
This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed). This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP. It also permits all CPUs to be available for
handling interrupts before any devices are probed.
This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot. Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.
However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system. In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU. Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.
Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code. This removes the need to treat the single-CPU boot environment
as a special case.
As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP). This will allow the option to be turned off
if need be during initial testing. I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0. Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.
These changes have only been tested on x86. Other platform maintainers
are encouraged to port their architectures over as well. The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).
PR: kern/199321
Reviewed by: markj, gnn, kib
Sponsored by: Netflix
again hopefully.
Rather than blindly removing a supposedly unused variable as reported by
the Clang Static Analyzer, inspect the code and hide them with proper
#ifdefs as they are used in certain conditional parts of the code.