11874 Commits

Author SHA1 Message Date
jamie
37e8c8fb79 Implicitly make a new jail persistent if it's set not to attach.
MFC after:	3 days
2010-08-06 22:04:18 +00:00
jhb
19ddbf5c38 Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid.  Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead.  This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by:	peter, sbruno
Reviewed by:	rookie
Obtained from:	Yahoo! (x86)
MFC after:	1 month
2010-08-06 15:36:59 +00:00
csjp
1e529a8eb9 Add Xen to the list of virtual vendors. In the non-PV (HVM) case this fixes
virtualization detection, so that the clflush instruction is successfully disabled.
This fixes insta-panics for XEN hvm users when the hw.clflush_disable
tunable is -1 or 0 (-1 by default).

Discussed with:	jhb
2010-08-06 15:04:40 +00:00
kib
7c864123d4 Add "show cdev" ddb command.
In collaboration with:	pho
MFC after:	1 month
2010-08-06 09:44:01 +00:00
kib
ba7ee96f4a Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created
cdev will never be destroyed. Propagate the flag to devfs vnodes as
VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a
thread reference on such nodes.

In collaboration with:	pho
MFC after:	1 month
2010-08-06 09:42:15 +00:00
alc
b6ec5a5f0a In order for MAXVNODES_MAX to be an "int" on powerpc and sparc, we must
cast PAGE_SIZE to an "int".  (Powerpc and sparc, unlike the other
architectures, define PAGE_SIZE as a "long".)

Submitted by:	Andreas Tobler
2010-08-04 05:09:02 +00:00
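
The type issue described in b6ec5a5f0a can be illustrated with a small userland sketch. It is not the kernel change itself: PAGE_SIZE_AS_LONG and the multiplier 512 are hypothetical stand-ins, and the actual MAXVNODES_MAX expression is not shown in the log. The sketch only demonstrates that an arithmetic expression involving a "long" constant has type "long" unless the constant is cast to "int" first.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a platform that defines PAGE_SIZE as a long. */
#define PAGE_SIZE_AS_LONG 4096L

/* Report the static type of an expression (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x), int: "int", long: "long", default: "other")

/* int * long is promoted to long by the usual arithmetic conversions. */
static const char *
type_without_cast(void)
{
	return (TYPE_NAME(512 * PAGE_SIZE_AS_LONG));
}

/* Casting the page size to int keeps the whole expression an int. */
static const char *
type_with_cast(void)
{
	return (TYPE_NAME(512 * (int)PAGE_SIZE_AS_LONG));
}

/* Small self-check tying the two helpers together. */
static void
check_types(void)
{
	assert(strcmp(type_without_cast(), "long") == 0);
	assert(strcmp(type_with_cast(), "int") == 0);
}
```

Calling check_types() passes on any C11 compiler, since the result depends only on the declared types and not on their sizes.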
alc
329f9f0435 Update the "desiredvnodes" calculation. In particular, make the part of
the calculation that is based on the kernel's heap size more conservative.
Hopefully, this will eliminate the need for MAXVNODES_MAX, but for the
time being set MAXVNODES_MAX to a large value.

Reviewed by:	jhb@
MFC after:	6 weeks
2010-08-02 21:33:36 +00:00
rpaulo
1c3476a3fa Bump the witness pendlist to 768 to accommodate the increased number of
spinlocks.
2010-07-29 16:13:26 +00:00
mdf
6857471cf3 Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma
zones for each malloc bucket size.  The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class.  This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused.  At this point inspection or memguard(9)
can be used to catch the offending code.

Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.

This code is based on work by Ron Steinke at Isilon Systems.

Reviewed by:    -arch (mostly silence)
Reviewed by:    zml
Approved by:    zml (mentor)
2010-07-28 15:36:12 +00:00
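
The narrowing technique described in 6857471cf3 (vary the hash function, note which hash class was corrupted each time, and intersect the results) can be sketched as follows. This is a userland illustration under stated assumptions, not the kernel code: the hash family zone_of(), the constants NTYPES and NZONES, and the bitmask representation are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NTYPES 16	/* hypothetical number of malloc types */
#define NZONES 4	/* hypothetical number of zones per bucket size */

/*
 * Hypothetical hash family: each round looks at a different pair of bits
 * of the type index, so two rounds together identify a type uniquely.
 */
static unsigned
zone_of(unsigned type, unsigned round)
{
	return ((type >> (2 * round)) & (NZONES - 1));
}

/*
 * Keep only the candidate types that hash to the zone observed to be
 * corrupted in this round.  Candidates are a bitmask, one bit per type.
 */
static uint32_t
narrow(uint32_t candidates, unsigned round, unsigned corrupted_zone)
{
	uint32_t keep = 0;
	unsigned t;

	assert(corrupted_zone < NZONES);
	for (t = 0; t < NTYPES; t++) {
		if (zone_of(t, round) == corrupted_zone)
			keep |= (uint32_t)1 << t;
	}
	return (candidates & keep);
}
```

Starting from a mask with all 16 types set, calling narrow() once per round with the zone seen corrupted in that round leaves a single bit set after two rounds with this particular hash family. A real hash function would not separate types this neatly and might need more rounds.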
alc
55426fcc55 The interpreter name should no longer be treated as a buffer that can be
overwritten.  (This change should have been included in r210545.)

Submitted by:	kib
2010-07-28 04:47:40 +00:00
alc
256c63de28 Introduce exec_alloc_args(). The objective being to encapsulate the
details of the string buffer allocation in one place.

Eliminate the portion of the string buffer that was dedicated to storing
the interpreter name.  The pointer to the interpreter name can simply be
made to point to the appropriate argument string.

Reviewed by:	kib
2010-07-27 17:31:03 +00:00
alc
02c0473d35 Change the order in which the file name, arguments, environment, and
shell command are stored in exec*()'s demand-paged string buffer.  For
a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces
the number of global TLB shootdowns by 31%.  It also eliminates about
330k page faults on the kernel address space.

Change exec_shell_imgact() to use "args->begin_argv" consistently as
the start of the argument and environment strings.  Previously, it
would sometimes use "args->buf", which is the start of the overall
buffer, but no longer the start of the argument and environment
strings.  While I'm here, eliminate unnecessary passing of "&length"
to copystr(), where we don't actually care about the length of the
copied string.

Clean up the initialization of the exec map.  In particular, use the
correct size for an entry, and express that size in the same way that
is used when an entry is allocated.  The old size was one page too
large.  (This discrepancy originated in 2004 when I rewrote
exec_map_first_page() to use sf_buf_alloc() instead of the exec map
for mapping the first page of the executable.)

Reviewed by:	kib
2010-07-25 17:43:38 +00:00
alc
0c709bf109 Eliminate a little bit of duplicated code.
2010-07-23 18:58:27 +00:00
avg
b44b5ccee0 completely ignore zero-sized elf sections in modules of elf object type (amd64)
Current code doesn't check size of elf sections and may perform needless
actions of zero-sized memory allocation and similar.
The bigger issue is that alignment requirement of a zero-sized section
gets effectively applied to the next section if it has smaller alignment
requirement.  But other tools, like gdb and consequently kgdb,
completely ignore zero-sized sections and thus may map symbols to
addresses differently.

Zero-sized sections are not typical in general.
Their typical (only, even) cause in FreeBSD modules is inline assembly that
creates custom sections which is found in pcpu.h and vnet.h.  Mere inclusion
of one of those header files produces a custom section in elf output.
If there is no actual use for the section in a given module, then the
section remains empty.

A better solution is to avoid creating zero-sized sections altogether,
which is planned.

Preloaded modules are handled in boot code (load_elf_obj.c), while
dynamically loaded modules are handled by kernel (link_elf_obj.c).

Based on code by:	np
MFC after:		3 weeks
2010-07-23 17:07:51 +00:00
avg
0152f7748b cpufreq: allocate long-lived buffer for handling of sysctl requests
At present the cpufreq sysctl handler for current level setting would
allocate and deallocate a temporary buffer of 24KB even to handle a
read-only query.  This puts unnecessary load on memory subsystem when
current level is checked frequently, e.g. when the likes of powerd
and system monitoring software are running.
Change the strategy to allocating a long-lived buffer for handling the
requests.

Reviewed by:	njl
MFC after:	2 weeks
2010-07-23 16:46:42 +00:00
ivoras
dd4be6368d Make lorunningspace catch up with hirunningspace.
While there, add comment about the magic numbers.

Prodded by:	alc
2010-07-23 12:30:29 +00:00
mdf
e8106ea76c Remove unused variable that snuck in during development.
Approved by:    zml (mentor)
2010-07-22 17:23:43 +00:00
mdf
fa23fa820a Fix taskqueue_drain(9) to not have false negatives. For threaded
taskqueues, more than one task can be running simultaneously.

Also make taskqueue_run(9) static to the file, since there are no
consumers in the base kernel and the function signature needs to change
with this fix.

Remove mention of taskqueue_run(9) and taskqueue_run_fast(9) from the
taskqueue(9) man page.

Reviewed by:    jhb
Approved by:    zml (mentor)
2010-07-22 16:41:09 +00:00
kib
9ac2754b6d When a compat32 binary asks for the value of hw.machine_arch, report the
name of the 32-bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by:	jhb, nwhitehorn
MFC after:	2 weeks
2010-07-22 09:13:49 +00:00
trasz
a5239fb269 Remove spurious '/*-' marks and fix some other style problems.
Submitted by:	bde@
2010-07-22 05:42:29 +00:00
mav
f7b270cbd0 Use proper sysctl type (quad) for et_frequency. It fixes output on sparc64.
2010-07-21 12:23:49 +00:00
attilio
800d46f6e4 Defaulting to KTR_GEN is probably not the right decision when KTR_MASK
is not defined at all, because KTR_GEN is still a valid class and some
traces may fit in. Default to 0, instead, and block any tracing.

Since this is a POLA violation (some third-party code, even if
that may be a questionable choice, could rely on that feature), the
possibility of an MFC should be carefully evaluated.

Sponsored by:	Sandvine Incorporated
2010-07-21 10:14:04 +00:00
mav
0ea74c96a2 Fix several un-/signedness bugs of r210290 and r210293. Add one more check.
2010-07-20 15:48:29 +00:00
ivoras
d9b793e64d Fix expression style.
Prodded by: jhb
2010-07-20 13:59:51 +00:00
mav
1021ed9c1f Extend the timer driver API to also report the minimal and maximal supported
period lengths. Make the MI wrapper code validate periods in requests. Make the
kernel clock management code honor these hardware limitations when choosing hz,
stathz and profhz values.
2010-07-20 10:58:56 +00:00
davidxu
cdb7adc908 Fix function name in error messages.
2010-07-20 02:23:12 +00:00
trasz
3e54021797 Revert r210225 - turns out I was wrong; the "/*-" is not a license-only
thing; it's also used to indicate that the comment should not be automatically
rewrapped.

Explained by:	cperciva@
2010-07-18 20:57:53 +00:00
trasz
935237a66a The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright
occurrences from sys/sys/ and sys/kern/.
2010-07-18 20:23:10 +00:00
trasz
dd1ffe6ba1 Remove outdated comment and move part of it into more applicable place.
2010-07-18 19:29:12 +00:00
ivoras
56cd1257b0 In keeping with the Age-of-the-fruitbat theme, scale up hirunningspace on
machines which can clearly afford the memory.

This is a somewhat conservative version of the patch - more fine tuning may be
necessary.

Idea from: Thread on hackers@
Discussed with: alc
2010-07-18 10:15:33 +00:00
jhb
96d598c33f Retire td_syscalls now that it is no longer needed.
2010-07-15 20:24:37 +00:00
ivoras
3fb9f87a34 A cosmetic change - don't output empty <flags>.
2010-07-15 13:46:30 +00:00
mav
bd622e7c20 Rename timeevents.c to kern_clocksource.c.
Suggested by:	jhb@
2010-07-14 18:43:27 +00:00
jhb
fb1e0aa66f - Document layout of KTR_STRUCT payload in a comment.
- Simplify ktrstruct() calling convention by having ktrstruct() use
  strlen() rather than requiring the caller to hand-code the length of
  constant strings.

MFC after:	1 month
2010-07-14 17:38:01 +00:00
mav
b8b00841c9 Move timeevents.c to MI code, as it is not x86-specific. I already have
it working on Marvell ARM SoCs, and it would be nice to unify timer code
between more platforms.
2010-07-14 13:31:27 +00:00
cperciva
14d1adbf2c Correctly copy the M_RDONLY flag when duplicating a reference
to an mbuf external buffer.

Approved by:	so (cperciva)
Approved by:	re (kensmith)
Security:	FreeBSD-SA-10:07.mbuf
2010-07-13 02:45:17 +00:00
jkim
06b6c2769b Use type-specific inline function imax() instead of deprecated macro MAX().
Prodded by:	bde
2010-07-12 15:32:45 +00:00
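
One commonly cited reason to prefer an inline function over a MAX() macro, which may or may not be the motivation behind 06b6c2769b, is that a macro can evaluate its arguments more than once. The sketch below is userland code: imax_sketch() is a local stand-in written for this example rather than the kernel's imax(), and MAX_SKETCH() is a conventional definition assumed for illustration.

```c
#include <assert.h>

/* Conventional macro form: the larger argument is evaluated twice. */
#define MAX_SKETCH(a, b) (((a) > (b)) ? (a) : (b))

/* Type-specific inline form: each argument is evaluated exactly once. */
static inline int
imax_sketch(int a, int b)
{
	return (a > b ? a : b);
}

static int calls;

/* Side-effecting helper so that repeated evaluation becomes visible. */
static int
next_value(void)
{
	return (++calls);
}

/* Number of times next_value() runs when passed through the macro. */
static int
count_calls_macro(void)
{
	calls = 0;
	(void)MAX_SKETCH(next_value(), 0);
	return (calls);
}

/* Number of times next_value() runs when passed to the inline function. */
static int
count_calls_inline(void)
{
	int r;

	calls = 0;
	r = imax_sketch(next_value(), 0);
	assert(r == 1);
	return (calls);
}
```

With these definitions count_calls_macro() returns 2 and count_calls_inline() returns 1.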
alc
db4ca9f5c2 Change the implementation of vm_hold_free_pages() so that it performs at
most one call to pmap_qremove(), and thus one TLB shootdown, instead of one
call and TLB shootdown per page.

Simplify the interface to vm_hold_free_pages().

MFC after:	3 weeks
2010-07-11 20:11:44 +00:00
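
The effect of db4ca9f5c2 (one unmapping call for the whole range instead of one per page) can be modeled with a counter. This is a toy model under stated assumptions: qremove_sketch() is a hypothetical stand-in that only counts invocations, and none of the real pmap_qremove() or vm_hold_free_pages() logic is reproduced.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096UL	/* hypothetical page size */

static int shootdowns;

/* Stand-in for an unmapping call; each call models one TLB shootdown. */
static void
qremove_sketch(unsigned long va, int count)
{
	assert(count > 0);
	(void)va;
	shootdowns++;
}

/* Old shape: one call, and thus one modeled shootdown, per page. */
static int
free_pages_per_page(unsigned long va, int npages)
{
	int i;

	shootdowns = 0;
	for (i = 0; i < npages; i++)
		qremove_sketch(va + (unsigned long)i * PAGE_SIZE_SKETCH, 1);
	return (shootdowns);
}

/* New shape: at most one call for the whole range. */
static int
free_pages_batched(unsigned long va, int npages)
{
	shootdowns = 0;
	if (npages > 0)
		qremove_sketch(va, npages);
	return (shootdowns);
}
```

For an eight-page range the per-page variant counts eight modeled shootdowns and the batched variant counts one.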
mav
d760bd51fb Remove interval validation from cpu_tick_calibrate(). As I found, the check
was needed in a preliminary version of the patch, where the number of CPU ticks
was divided by exactly 16 seconds. The final code instead uses the real interval
duration, so a precise interval should not be important. At the same time, aliasing
issues around the second boundary cause false positives, periodically logging
useless "t_delta ... too long/short" messages when HZ is set below 256.
2010-07-11 16:47:45 +00:00
alc
7c09dc242c Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently,
the maintenance of vm_pageout_deficit can be localized to just two places:
vm_page_alloc() and vm_pageout_scan().

This change also corrects an off-by-one error in the maintenance of
vm_pageout_deficit.  Historically, the buffer cache functions, allocbuf()
and vm_hold_load_pages(), have not taken into account that vm_page_alloc()
already increments vm_pageout_deficit by one.

Reviewed by:	kib
2010-07-09 19:38:30 +00:00
jhb
f338f6d0f8 Accidentally committed an older version of this comment rather than the
final one.
2010-07-09 13:59:53 +00:00
jhb
7e3b216a37 Refine a comment.
Reviewed by:	bde
2010-07-09 13:53:25 +00:00
jh
d171161918 Remove redundant high >= 0.
Reported by:	rstone
2010-07-09 10:57:55 +00:00
jkim
93b88a93da Implement optional 'precision' for numbers. Previously, it was parsed but
ignored.  Some third-party modules (e.g., ACPICA) prefer this format over
zero padding flag '0'.
2010-07-08 22:13:23 +00:00
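
The difference between a precision and the zero padding flag mentioned in 93b88a93da can be shown with the standard C library. This sketch uses userland snprintf(3), not the kernel's own formatter, so it illustrates the standard semantics only; the helper names are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Zero padding flag: pad the whole field to 8 characters with zeros. */
static void
format_zero_flag(char *buf, size_t len, unsigned int v)
{
	assert(len > 0);
	snprintf(buf, len, "%08x", v);
}

/* Precision: print at least 8 digits, adding leading zeros as needed. */
static void
format_precision(char *buf, size_t len, unsigned int v)
{
	assert(len > 0);
	snprintf(buf, len, "%.8x", v);
}

/* Width plus precision: 4 digits minimum, space-padded to 10 columns. */
static void
format_width_and_precision(char *buf, size_t len, unsigned int v)
{
	assert(len > 0);
	snprintf(buf, len, "%10.4x", v);
}
```

For the value 0x2a the first two helpers both produce "0000002a". The third produces "002a" right-aligned in a ten-character field, which the zero flag alone cannot express.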
jhb
1f4cf66ed2 - Various style and whitespace fixes.
- Make sugid_coredump and kern_logsigexit private to kern_sig.c.

Submitted by:	bde (partially)
MFC after:	1 month
2010-07-08 19:15:26 +00:00
jh
f673b7098a Assert that low and high are >= 0. The allocator doesn't support the
negative range.
2010-07-08 16:53:19 +00:00
attilio
865de58a04 - Simplify logic in handling ticks wrap-up
- Fix a bug where thread may be in sleeping state but the wchan won't
  be set, leading to an empty container for sleepq_type(). [0]

Sponsored by:		Sandvine Incorporated
[0] Submitted by:	Bryan Venteicher
			<bryanv at daemoninthecloset dot org>
MFC after:		3 days
X-MFC:			209577
2010-07-07 12:00:11 +00:00
kib
15d16124c2 In revoke(), verify that VCHR vnode indeed belongs to devfs.
Found and tested by:	pho
MFC after:	1 week
2010-07-06 18:20:49 +00:00
ed
1075ceb3e2 Fix a race condition, where a TTY could be destroyed twice.
There are special cases where tty_rel_free() can be called twice in a
row, namely when closing and revoking the TTY at the same moment. Only
call destroy_dev_sched_cb() once.

Reported by:	Jeremie Le Hen
MFC after:	1 week
2010-07-06 08:56:34 +00:00
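
The "only call it once" fix described in 1075ceb3e2 follows a common pattern: record that destruction has been scheduled and return early on a repeated call. The sketch below shows that pattern only. The structure, field, and function names are hypothetical, the log does not say how the real tty_rel_free() tracks this state, and the sketch assumes the caller already serializes access (for example by holding a lock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for the per-TTY state. */
struct tty_sketch {
	bool destroy_scheduled;	/* set once destruction has been queued */
	int destroy_calls;	/* counts how often destruction was queued */
};

/* Stand-in for scheduling the asynchronous destroy callback. */
static void
schedule_destroy_sketch(struct tty_sketch *tp)
{
	tp->destroy_calls++;
}

/*
 * May be called twice in a row (close and revoke at the same moment);
 * only the first call schedules destruction.
 */
static void
rel_free_sketch(struct tty_sketch *tp)
{
	assert(tp != NULL);
	if (tp->destroy_scheduled)
		return;
	tp->destroy_scheduled = true;
	schedule_destroy_sketch(tp);
}
```

Calling rel_free_sketch() twice on the same structure leaves destroy_calls at 1.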
kib
15a394fbba Add the ability for the allocflag argument of the vm_page_grab() to
specify the increment of vm_pageout_deficit when sleeping due to page
shortage. Then, in allocbuf(), the code to allocate pages when extending
vmio buffer can be replaced by a call to vm_page_grab().

Suggested and reviewed by:	alc
MFC after:	2 weeks
2010-07-05 21:13:32 +00:00