current lock classes KPI it was really difficult because there was no
way to pass an rmtracker object to the lock/unlock routines. In order
to accomplish the task, modify the aforementioned functions so that
they can return (or pass as argument) an uinptr_t, which is in the rm
case used to hold a pointer to struct rm_priotracker for current
thread. As an added bonus, this fixes rm_sleep() in the rm shared
case, which right now can communicate priotracker structure between
lc_unlock()/lc_lock().
Suggested by: jhb
Reviewed by: jhb
Approved by: re (delphij)
in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.
The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.
This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
The SDM (June 2013) tables on these are rather confusing. Yes, they
assign the same name (BR_MISP_RETIRED.ALL_BRANCHES) to two codes
(C5H/00H and C5H/04H.) The latter however is the PEBS version.
So, to make it easier to see the difference - and yes, we can use
both without having to actually enable the PEBS specific bits! -
just rename the PEBS one to _PS so there's no clashing.
Tested:
* Sandy bridge
bridge Xeon.
Summary: These are PEBS events but they're also available as normal
counter/sample events. The source table (Table 19-2) lists the
base versions (LOAD, STLB_MISS, SPLIT, ALL) but it says they must
be qualified with other values. This particular commit fleshes
out those umask values.
Source:
* Linux; SDM June 2013, Volume 3B, Table 19-2 and 18-21.
Tested:
* Sandy Bridge (non-Xeon)
kld_unload event handler which gets invoked after a linker file has been
successfully unloaded. The kld_unload and kld_load event handlers are now
invoked with the shared linker lock held, while kld_unload_try is invoked
with the lock exclusively held.
Convert hwpmc(4) to use these event handlers instead of having
kern_kldload() and kern_kldunload() invoke hwpmc(4) hooks whenever files are
loaded or unloaded. This has no functional effect, but simplifes the linker
code somewhat.
Reviewed by: jhb
* Add in MEM_LOAD_UOPS_LLC_HIT_RETIRED for both sandy bridge and sandy
bridge Xeon. Right now it only is enabled for Sandy Bridge.
* D2/0F is actually a combination rather than a separate counter, so
just flip that on for the CPU types that support it.
There's an errata for using this on SB Xeon hardware - I've documented
it in kern/181346.
Tested:
* Sandy Bridge
* Sandy Bridge Xeon
Sponsored by: Netflix, Inc.
versions than the one in base (dim@ mentioned he tried on 4.7.3 and 4.8.1)
do not whine about it, so, at some point this workaround will be reverted.
Reported by: ache
Discussed with: dim
those of some non-architectural core events. This is not a problem in the
general case as long as there's an 1:1 mapping between the two, but there
are few exceptions. For example, 3CH_01H on Nehalem/Westmere represents
both unhalted-reference-cycles and CPU_CLK_UNHALTED.REF_P.
CPU_CLK_UNHALTED.REF_P on the aforementioned architectures does not measure
reference (i.e. bus) but TSC, so there's the need to disambiguate.
In order to avoid the namespace collision rename all the architectural
events in a way they cannot be ambigous and refactor the architectural
events handling function to reflect this change.
While here, per Jim Harris request, rename
iap_architectural_event_is_unsupported() to iap_event_is_architectural().
Discussed with: jimharris
Reviewed by: jimharris, gnn
at least if FreeBSD is ran under VirtualBox. In order to avoid the leakage,
properly deallocate structures in case CPU claims that hw performance
monitoring counters are not supported.
Reported by: hiren
0x3C: /* Per Intel document 325462-045US 01/2013. */
Add manpage to document all the goodness that is available in this
processor model.
Submitted by: hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by: jimharris, sbruno
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.
The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho
case 0x3E: /* Per Intel document 325462-045US 01/2013. */
Add manpage to document all the goodness that is available in this
processor model.
No support for uncore events at this time.
Submitted by: hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by: davide, jimharris, sbruno
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
executed. This means past the point where userret() is generally
executed.
Skip the td_pinned check if a callchain tracing is currently happening
and add a more robust check to pmc_capture_user_callchain() in order to
catch td_pinned leak past ast() in hwpmc case.
Reported and tested by: fabient
MFC after: 1 week
X-MFC: r240246
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.
Conducted and reviewed by: attilio
Tested by: pho
(Model 0x2D /* Per Intel document 253669-044US 08/2012. */)
Add manpage to document all the goodness that is available in this
processor model.
No support for uncore events at this time.
Submitted by: hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by: jimharris@ fabient@
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
trap checks (eg. printtrap()).
Generally this check is not needed anymore, as there is not a legitimate
case where curthread != NULL, after pcpu 0 area has been properly
initialized.
Reviewed by: bde, jhb
MFC after: 1 week
Due to some differences in MSRs between Xeon Sandy Bridge and Core Sandy
Bridge (Model 0x2A), wrmsr() may generate in a GP# fault exception and so a
panic of the machine.
Approved by: gnn (mentor)
MFC after: 3 days
New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).
Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.
Sponsored by: NETASQ
MFC after: 1 month
- Do not cover error returned by pmc_core_initialize with the
result of pmc_uncore_initialize, fail right away.
- Give a user something to report instead failing silently
Reported by: Alexandr Kovalenko <never@nevermind.kiev.ua>
- Replace MIPS24K-specific code with more generic framework that will
make adding new CPU support easier
- Add MIPS24K support for new framework
- Limit backtrace depth to 1 for stability reasons and add option
HWPMC_MIPS_BACKTRACE to override this limitation
processors with CPUID signature 06_1AH, 06_1EH, and 06_1FH.
Refuse to allocate them on unsupported model.
Submitted by: Davide Italiano <davide.italiano@gmail.com>
MFC after: 1 month
This is a bit hackish and should be made more generic (ie, support more than
two hard-coded performance counter+config register pairs) so it can be used
for mips74k and other chips.
All this does is process the initial interrupt event. It doesn't (yet) handle
callgraph events, so even if you route the exception/interrupt to this routine
and flip the bit on, it will hang and crash pmc unless you disable callgraph
support when you enable a sample based PMC.
As the underlying block is 4KB if the PMC throughput is low the measurement
will be reported on the next tick. pmcstat(8) use the modified flush API to
reclaim current buffer before displaying next top.
MFC after: 1 month
* Add the interrupt bit in the configuration register
* Correctly set the counter register for the sampling overflow
interrupt. The interrupt is asserted when bit 31 is set.
So set the overflow value at 0x80000000 and subtract the
programmed value as appropriate.
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.
Reviewed by: rwatson
Approved by: re (bz)
kernel for FreeBSD 9.0:
Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.
Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.
In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.
Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.
Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).
Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.
The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN
while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.
Some technical notes:
- This commit may be considered an ABI nop for all the architectures
different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
accessed avoiding migration, because the size of cpuset_t should be
considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
primirally done in order to leave some more space in userland to cope
with KBI extensions). If you need to access kernel cpuset_t from the
userland please refer to example in this patch on how to do that
correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now
The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.
Tested by: pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by: jeff, jhb, sbruno
a PMC. It was possible that we could have turned a bit on but
never cleared it.
Extend the calls to rdmsr() to all necessary functions, not
just those which previously caused a panic.
Pointed out by: jhb@
MFC after: 1 week
All of the necessary wrmsr calls are now preceded by a rdmsr
and we leave the reserved bits alone.
Document the bits in the relevant registers for future reference.
Tested by: mdf
MFC after: 1 week
This bug would cause allocating sample-mode PMCs to fail with ENOMEM after allocating several counting-mode PMCs.
Approved by: jkoshy (mentor)
MFC after: 2 weeks
domain clock, 8 programmable PMC.
- Westmere based CPU (Xeon 5600, Corei7 980X) support.
- New man pages with events list for core and uncore.
- Updated Corei7 events with Intel 253669-033US December 2009 doc.
There is some removed events in the documentation, they have been
kept in the code but documented in the man page as obsolete.
- Offcore response events can be setup with rsp token.
Sponsored by: NETASQ
pmc_flush_logfile is now non-blocking and just ask the kernel
to shutdown the file. From that point, no more data is
accepted by the log thread and when the last buffer is flushed
the file is closed.
This will remove a deadlock between pmcstat asking for
flush while it cannot flush the pipe itself.
MFC after: 3 days
* Correct a group of typos: for Core2 programmable events, check
user supplied umask values against the correct event descriptor
field.
Submitted by: Ryan Stone <rysto32 at gmail dot com>
This brings hwpmc(4) support for 2nd and 3rd generation XScale cores.
Right now it's enabled by default to make sure we test this a bit.
When the time comes it can be disabled by default.
Tested on Gateworks boards.
A man page is coming.
Obtained from: //depot/user/rpaulo/xscalepmc/...
- fix a system deadlock on process exit when the sample buffer
is full (pmclog_loop blocked in fo_write) and pmcstat exit.
Reviewed by: jkoshy
MFC after: 3 weeks