11804 Commits

Author SHA1 Message Date
davidxu
6616b254f2 - According to specification, SI_USER code should only be generated by
standard kill(). On other systems, SI_LWP is generated by lwp_kill().
  This will allow conforming applications to differentiate between
  signals generated by standard events and those generated by other
  implementation events in a manner compatible with existing practice.
- Bump __FreeBSD_version
2010-08-24 07:22:24 +00:00
imp
50d4a3193c This should really be MACHINE not MACHINE_ARCH, and is this Makefile even used? 2010-08-23 06:22:35 +00:00
brian
89b2d8bbb4 uio_resid isn't updated by VOP_READDIR for nfs filesystems. Use
the uio_offset adjustment instead to calculate a correct *len.

Without this change, we run off the end of the directory data
we're reading and panic horribly for nfs filesystems.

MFC after:	1 week
2010-08-23 05:33:31 +00:00
rpaulo
dda24289cb Call the systrace_probe_func() when the error value.
Sponsored by:	The FreeBSD Foundation
2010-08-22 11:30:49 +00:00
rpaulo
ea11ba6788 Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwaston [1]
2010-08-22 11:18:57 +00:00
rpaulo
6f62630bc2 Bump KDTRACE_THREAD_ZERO and use M_ZERO as a malloc flag instead of
calling bzero.

Sponsored by:	The FreeBSD Foundation
2010-08-22 11:09:53 +00:00
rpaulo
a34abf7c98 Fix style issues.
Sponsored by:	The FreeBSD Foundation
2010-08-22 11:08:18 +00:00
davidxu
84d25462c9 make sure thread lock is locked. 2010-08-20 23:51:34 +00:00
jhb
d4890c88b0 Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created.  Use them to implement macros that
otherwise manipulated the flags directly.  Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe.  This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by:	kib
MFC after:	3 days
2010-08-20 19:46:50 +00:00
davidxu
89f466d2b2 If thread set a TDP_WAKEUP for itself, clears the flag and returns EINTR
immediately, this is used for implementing reliable pthread cancellation.
2010-08-20 04:28:30 +00:00
jhb
d02cab2556 Remove unused KTRACE includes. 2010-08-19 16:41:27 +00:00
jhb
2f662f7a9c There isn't really a need to hold the ktrace mutex just to read the value
of p_traceflag that is stored in the kinfo_proc structure.  It is still
racey even with the lock and the code will read a consistent snapshot of
the flag without the lock.
2010-08-19 16:40:30 +00:00
jhb
faa167a723 Fix a whitespace nit and remove a questioning comment. STAILQ_CONCAT()
does require the STAILQ the existing list is being added to to already
be initialized (it is CONCAT() vs MOVE()).
2010-08-19 16:38:58 +00:00
jhb
d64a4df941 Keep the process locked when calling ktrops() or ktrsetchildren() instead
of dropping the lock only to immediately reacquire it.
2010-08-17 21:34:19 +00:00
kib
d9f088a03e Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by:	marius (sparc64)
MFC after:	1 month
2010-08-17 08:55:45 +00:00
pjd
120209c66c Simplify taskqueue_drain() by using proved macros. 2010-08-13 19:20:35 +00:00
gibbs
f5039e4d7d Allow interrupt driven config hooks to be registered from config hook callbacks.
Interrupt driven configuration hooks serve two purposes: they are a
mechanism for registering for a callback that is invoked once interrupt
services are available, and they hold off root device selection so long
as any configuration hooks are still active.  Before this change, it was
not possible to safely register additional hooks from the context of a
configuration hook callback.  The need for this feature arises when
interrupts are required to discover new devices (e.g. access to the XenStore
to find para-virtualized devices) which in turn also require the ability
to hold off root device selection until some lengthy, interrupt driven,
configuration task has completed (e.g. Xen front/back device driver
negotiation).

More specifically, the mutex protecting the list of active configuration
hooks is never held during a callback, and static information is used
to ensure proper ordering and only a single callback to each hook even
when faced with registration or removal of a hook during an active run.

Sponsored by:	Spectra Logic Corporation
MFC after:      1 week.
2010-08-12 19:50:40 +00:00
gibbs
6b6ab892f9 Properly indent a continue statement. No functional changes. 2010-08-12 19:26:27 +00:00
jkim
f9341f06d7 Add the half of time-of-day clock resolution when we adjust system time from
time-of-day clock or vice versa.  For x86 systems, RTC resolution is one
second and we used to lose up to one second whenever we initialize system
time from RTC or write system time back to RTC.  With this change, margin
of error per conversion is roughly between -0.5 and +0.5 second rather
than between -1 and 0 second.  Note that it does not take care of errors
from getnanotime(9) (which is up to 1/hz second) or CLOCK_GETTIME() latency.
These are just too expensive to correct and it is not worthy of the cost.
2010-08-12 17:17:05 +00:00
jkim
1974c6514b Provide description for 'machdep.disable_rtc_set' sysctl. Clean up style(9)
nits.  Remove a redundant return statement and an unnecessary variable.
2010-08-12 16:13:24 +00:00
kib
ade28bdd40 The buffers b_vflags field is not always properly protected by
bufobj lock. If b_bufobj is not NULL, then bufobj lock should be
held when manipulating the flags. Not doing this sometimes leaves
BV_BKGRDINPROG to be erronously set, causing softdep' getdirtybuf() to
stuck indefinitely in "getbuf" sleep, waiting for background write to
finish which is not actually performed.

Add BO_LOCK() in the cases where it was missed.

In collaboration with:	pho
Tested by:	bz
Reviewed by:	jeff
MFC after:	1 month
2010-08-12 08:36:23 +00:00
mdf
0737955344 Rework memguard(9) to reserve significantly more KVA to detect
use-after-free over a longer time.  Also release the backing pages of
a guarded allocation at free(9) time to reduce the overhead of using
memguard(9).  Allow setting and varying the malloc type at run-time.
Add knobs to allow:

 - randomly guarding memory
 - adding un-backed KVA guard pages to detect underflow and overflow
 - a lower limit on the size of allocations that are guarded

Reviewed by:    alc
Reviewed by:    brueffer, Ulrich Spörlein <uqs spoerlein net> (man page)
Silence from:   -arch
Approved by:    zml (mentor)
MFC after:      1 month
2010-08-11 22:10:37 +00:00
ivoras
9cecaf1c60 Fix (hopefully) the spelling of "queuing."
Submitted by:	bf1783 at gmail com
2010-08-09 23:32:37 +00:00
ivoras
191f678b27 Bumping the read-ahead count once more, to value equivalent to 512 KiB on
most system, based on benchmark results on a low-end fibre channel SAN
under VMWare:

vfs.read_max		read performance
8  (historical default)	83 MB/s
16 (recent bump)	131 MB/s
32 (this version)	152 MB/s
64			157 MB/s

(results are +/- 3 MB/s)

As read-ahead is heuristic, based on past IO requests, it shouldn't be
problematic. The new default is still smaller then in other OSes.
2010-08-09 22:56:10 +00:00
ivoras
fa067e3c30 Elaborate on how hirunningspace was chosen. 2010-08-09 22:22:46 +00:00
gavin
dbc7cd5ae9 Add descriptions to a handful of sysctl nodes.
PR:		kern/148580
Submitted by:	Galimov Albert <wtfcrap mail.ru>
MFC after:	1 week
2010-08-09 14:48:31 +00:00
attilio
307b2c04a2 The r208165 fixed a bug related to unsigned integer overflowing for the
number of CPUs detection.
However, that was not mention at all, the problem was not reported, the
patch has not been MFCed and the fix is mostly improper.

Fix the original overflow (caused when 32 CPUs must be detected) by
just using a different mathematical computation (it also makes more
explicit the size of operands involved, which is good in the moment
waiting for a more complete support for a large number of CPUs).

PR:		kern/148698
Submitted by:	Joe Landers <jlanders at vmware dot com>
Tested by:	gianni
MFC after:	10 days
2010-08-09 00:23:57 +00:00
jamie
4e0690ba81 Back out r210974. Any convenience of not typing "persist" is outweighed
by the possibility of unintended partially-formed jails.
2010-08-08 23:22:55 +00:00
ivoras
252207bbfb To help with sequential read UFS performance on modern systems, increase
the vfs.read_max default. For most systems this means going from 128 KiB
to 256 KiB, which is still very conservative and lower than what most
other operating systems use, but as a sane default should not
interfere much with existing systems.

For systems with RAID volumes and/or virtualization envirnments, where
read performance is very important, increasing this sysctl tunable to 32
or even more will demonstratively yield additional performance benefits.

If MAXPHYS ever gets bumped up, it will probably be a good idea to slave
read_max to it.
2010-08-07 18:30:10 +00:00
tuexen
542f657a7f Fix a bug where MSG_TRUNC was not returned in all necessary cases for
SOCK_DGRAM socket. MSG_TRUNC was only returned when some mbufs could
not be copied to the application. If some data was left in the last
mbuf, it was correctly discarded, but MSG_TRUNC was not set.

Reviewed by: bz
MFC after: 3 weeks
2010-08-07 17:57:58 +00:00
jamie
37e8c8fb79 Implicitly make a new jail persistent if it's set not to attach.
MFC after:	3 days
2010-08-06 22:04:18 +00:00
jhb
19ddbf5c38 Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
csjp
1e529a8eb9 Add Xen to the list of virtual vendors. In the non PV (HVM) case this fixes
the virtualization detection successfully disabling the clflush instruction.
This fixes insta-panics for XEN hvm users when the hw.clflush_disable
tunable is -1 or 0 (-1 by default).

Discussed with:	jhb
2010-08-06 15:04:40 +00:00
kib
7c864123d4 Add "show cdev" ddb command.
In collaboration with:	pho
MFC after:	1 month
2010-08-06 09:44:01 +00:00
kib
ba7ee96f4a Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created
cdev will never be destroyed. Propagate the flag to devfs vnodes as
VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a
thread reference on such nodes.

In collaboration with:	pho
MFC after:	1 month
2010-08-06 09:42:15 +00:00
alc
b6ec5a5f0a In order for MAXVNODES_MAX to be an "int" on powerpc and sparc, we must
cast PAGE_SIZE to an "int".  (Powerpc and sparc, unlike the other
architectures, define PAGE_SIZE as a "long".)

Submitted by:	Andreas Tobler
2010-08-04 05:09:02 +00:00
alc
329f9f0435 Update the "desiredvnodes" calculation. In particular, make the part of
the calculation that is based on the kernel's heap size more conservative.
Hopefully, this will eliminate the need for MAXVNODES_MAX, but for the
time being set MAXVNODES_MAX to a large value.

Reviewed by:	jhb@
MFC after:	6 weeks
2010-08-02 21:33:36 +00:00
rpaulo
1c3476a3fa Bump the witness pendlist to 768 to accomodate the increased number of
spinlocks.
2010-07-29 16:13:26 +00:00
mdf
6857471cf3 Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size.  The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class.  This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused.  At this point inspection or memguard(9)
can be used to catch the offending code.

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by:    -arch (mostly silence)
Reviewed by:    zml
Approved by:    zml (mentor)
2010-07-28 15:36:12 +00:00
alc
55426fcc55 The interpreter name should no longer be treated as a buffer that can be
overwritten.  (This change should have been included in r210545.)

Submitted by:	kib
2010-07-28 04:47:40 +00:00
alc
256c63de28 Introduce exec_alloc_args(). The objective being to encapsulate the
details of the string buffer allocation in one place.

Eliminate the portion of the string buffer that was dedicated to storing
the interpreter name.  The pointer to the interpreter name can simply be
made to point to the appropriate argument string.

Reviewed by:	kib
2010-07-27 17:31:03 +00:00
alc
02c0473d35 Change the order in which the file name, arguments, environment, and
shell command are stored in exec*()'s demand-paged string buffer.  For
a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces
the number of global TLB shootdowns by 31%.  It also eliminates about
330k page faults on the kernel address space.

Change exec_shell_imgact() to use "args->begin_argv" consistently as
the start of the argument and environment strings.  Previously, it
would sometimes use "args->buf", which is the start of the overall
buffer, but no longer the start of the argument and environment
strings.  While I'm here, eliminate unnecessary passing of "&length"
to copystr(), where we don't actually care about the length of the
copied string.

Clean up the initialization of the exec map.  In particular, use the
correct size for an entry, and express that size in the same way that
is used when an entry is allocated.  The old size was one page too
large.  (This discrepancy originated in 2004 when I rewrote
exec_map_first_page() to use sf_buf_alloc() instead of the exec map
for mapping the first page of the executable.)

Reviewed by:	kib
2010-07-25 17:43:38 +00:00
alc
0c709bf109 Eliminate a little bit of duplicated code. 2010-07-23 18:58:27 +00:00
avg
b44b5ccee0 completely ignore zero-sized elf sections in modules of elf object type (amd64)
Current code doesn't check size of elf sections and may perform needless
actions of zero-sized memory allocation and similar.
The bigger issue is that alignment requirement of a zero-sized section
gets effectively applied to the next section if it has smaller alignment
requirement.  But other tools, like gdb and consequently kgdb,
completely ignore zero-sized sections and thus may map symbols to
addresses differently.

Zero-sized sections are not typical in general.
Their typical (only, even) cause in FreeBSD modules is inline assembly that
creates custom sections which is found in pcpu.h and vnet.h.  Mere inclusion
of one of those header files produces a custom section in elf output.
If there is no actual use for the section in a given module, then the
section remains empty.

Better solution is to avoid creating zero-sized sections altogether,
which is in plans.

Preloaded modules are handled in boot code (load_elf_obj.c), while
dynamically loaded modules are handled by kernel (link_elf_obj.c).

Based on code by:	np
MFC after:		3 weeks
2010-07-23 17:07:51 +00:00
avg
0152f7748b cpufreq: allocate long-lived buffer for handling of sysctl requests
At present the cpufreq sysctl handler for current level setting would
allocate and deallocate a temporary buffer of 24KB even to handle a
read-only query.  This puts unnecessary load on memory subsystem when
current level is checked frequently, e.g. when the likes of powerd
and system monitoring software are running.
Change the strategy to allocating a long-lived buffer for handling the
requests.

Reviewed by:	njl
MFC after:	2 weeks
2010-07-23 16:46:42 +00:00
ivoras
dd4be6368d Make lorunningspace catch up with hirunningspace.
While there, add comment about the magic numbers.

Prodded by:	alc
2010-07-23 12:30:29 +00:00
mdf
e8106ea76c Remove unused variable that snuck in during development.
Approved by:    zml (mentor)
2010-07-22 17:23:43 +00:00
mdf
fa23fa820a Fix taskqueue_drain(9) to not have false negatives. For threaded
taskqueues, more than one task can be running simultaneously.

Also make taskqueue_run(9) static to the file, since there are no
consumers in the base kernel and the function signature needs to change
with this fix.

Remove mention of taskqueue_run(9) and taskqueue_run_fast(9) from the
taskqueue(9) man page.

Reviewed by:    jhb
Approved by:    zml (mentor)
2010-07-22 16:41:09 +00:00
kib
9ac2754b6d When compat32 binary asks for the value of hw.machine_arch, report the
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by:	jhb, nwhitehorn
MFC after:	2 weeks
2010-07-22 09:13:49 +00:00
trasz
a5239fb269 Remove spurious '/*-' marks and fix some other style problems.
Submitted by:	bde@
2010-07-22 05:42:29 +00:00