Commit Graph

12141 Commits

Author SHA1 Message Date
John Baldwin
f83e8b25c1 Fix a race in the SMP rendezvous code. Specifically, the write by the
last CPU to to finish the rendezvous action may become visible to
different CPUs at different times.  As a result, the CPU that initiated
the rendezvous may exit the rendezvous and drop the lock allowing another
rendezvous to be initiated on the same CPU or a different CPU.  In that
case the exit sentinel may be cleared before all CPUs have noticed causing
those CPUs to hang forever.

Workaround this by using a generation count to notice when this race
occurs and to exit the rendezvous in that case.

The problem was independently diagnosted by mlaier@ and avg@ as well.

Submitted by:	neel
Reviewed by:	avg, mlaier
Obtained from:	NetApp
MFC after:	1 week
2011-05-17 16:39:08 +00:00
Poul-Henning Kamp
384bf94c48 Use memset() instead of bzero() and memcpy() instead of bcopy(), there
is no relevant difference for sbufs, and it increases portability of
the source code.

Split the actual initialization of the sbuf into a separate local
function, so that certain static code checkers can understand
what sbuf_new() does, thus eliminating on silly annoyance of
MISRA compliance testing.

Contributed by:		An anonymous company in the last business I
			expected sbufs to invade.
2011-05-17 11:04:50 +00:00
Poul-Henning Kamp
eb05ee7a71 Don't expect PAGE_SIZE to exist on all platforms (It is a pretty arbitrary
choice of default size in the first place)

Reverse the order of arguments to the internal static sbuf_put_byte()
function to match everything else in this file.

Move sbuf_putc_func() inside the kernel version of sbuf_vprintf
where it belongs.

sbuf_putc() incorrectly used sbuf_putc_func() which supress NUL
characters, it should use sbuf_put_byte().

Make sbuf_finish() return -1 on error.

Minor stylistic nits fixed.
2011-05-17 06:36:32 +00:00
Attilio Rao
d59dd76c22 Merge r221278 from largeSMP project:
idle_cpus_mask is just used in sched_4bsd, thus make it private for it.

Tested by:	several
2011-05-16 23:20:12 +00:00
Poul-Henning Kamp
71c2bc5c6b Change the length quantities of sbufs to be ssize_t rather than int.
Constify a couple of arguments.
2011-05-16 16:18:40 +00:00
Andriy Gapon
dd7498ae03 better integrate cyclic module with clocksource/eventtimer subsystem
Now in the case when one-shot timers are used cyclic events should fire
closer to theier scheduled times.  As the cyclic is currently used only
to drive DTrace profile provider, this is the area where the change
makes a difference.

Reviewed by:	mav (earlier version, a while ago)
X-MFC after:	clocksource/eventtimer subsystem
2011-05-16 15:29:59 +00:00
Matthew D Fleming
fa2c76c975 Correctly use INOUT for the offset/len parameters to vop_allocate. As
far as I can tell this is for documentation only at the moment.
2011-05-13 14:29:28 +00:00
Alexander Motin
167aee3895 Refactor Xen PV code to use new event timers subsystem. That uses one-shot
Xen timer and time counter to provide one-shot and periodic time events.

On my tests this reduces idle interruts rate down to about 30Hz, and accor-
ding to Xen VM Manager reduces host CPU load by three times comparing to
the previous periodic 100Hz clock. Also now, when needed, it is possible to
increase HZ rate without useless CPU burning during idle periods.

Now only ia64 and some ARMs left not migrated to the new event timers.
2011-05-13 12:39:37 +00:00
Matthew D Fleming
3d08a76bbc Use a name instead of a magic number for kern_yield(9) when the priority
should not change.  Fetch the td_user_pri under the thread lock.  This
is probably not necessary but a magic number also seems preferable to
knowing the implementation details here.

Requested by:	Jason Behmer < jason DOT behmer AT isilon DOT com >
2011-05-13 05:27:58 +00:00
Stanislav Sedov
ff6f41a472 - Do no try to drop a NULL filedesc pointer. 2011-05-12 10:56:33 +00:00
Stanislav Sedov
0daf62d9f5 - Commit work from libprocstat project. These patches add support for runtime
file and processes information retrieval from the running kernel via sysctl
  in the form of new library, libprocstat.  The library also supports KVM backend
  for analyzing memory crash dumps.  Both procstat(1) and fstat(1) utilities have
  been modified to take advantage of the library (as the bonus point the fstat(1)
  utility no longer need superuser privileges to operate), and the procstat(1)
  utility is now able to display information from memory dumps as well.

  The newly introduced fuser(1) utility also uses this library and able to operate
  via sysctl and kvm backends.

  The library is by no means complete (e.g. KVM backend is missing vnode name
  resolution routines, and there're no manpages for the library itself) so I
  plan to improve it further.  I'm commiting it so it will get wider exposure
  and review.

  We won't be able to MFC this work as it relies on changes in HEAD, which
  was introduced some time ago, that break kernel ABI.  OTOH we may be able
  to merge the library with KVM backend if we really need it there.

Discussed with:	rwatson
2011-05-12 10:11:39 +00:00
Jaakko Heinonen
852bee75b7 To avoid duplicated warning, move WITNESS_WARN() added in r221597 to the
branch which doesn't call malloc(9).

Suggested by:	kib
2011-05-07 17:59:07 +00:00
Jaakko Heinonen
816c203937 Add WITNESS_WARN() to getenv() to explicitly note that the function may
sleep. This helps to expose bugs when the requested environment variable
doesn't exist.
2011-05-07 11:10:58 +00:00
Andrey V. Elsukov
b50a7799b8 Add make_dev_alias_p() function. It is similar to make_dev_alias(),
but it may return an error like make_dev_p() does.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
2011-05-03 18:54:18 +00:00
Edward Tomasz Napierala
a7ad07bff3 Change the way rctl interfaces with jails by introducing prison_racct
structure, which acts as a proxy between them.  This makes jail rules
persistent, i.e. they can be added before jail gets created, and they
don't disappear when the jail gets destroyed.
2011-05-03 07:32:58 +00:00
John Baldwin
85ee63c923 Add a new bus method, BUS_ADJUST_RESOURCE() that is intended to be a
wrapper around rman_adjust_resource().  Include a generic implementation,
bus_generic_adjust_resource() which passes the request up to the parent
bus.  There is currently no default implementation.  A
bus_adjust_resource() wrapper is provided for use in drivers.
2011-04-29 21:36:45 +00:00
John Baldwin
bb82622c3e Extend the rman(9) API to support altering an existing resource.
Specifically, these changes allow a resource to back a relocatable and
resizable resource such as the I/O window decoders in PCI-PCI bridges.
- rman_adjust_resource() can adjust the start and end address of an
  existing resource.  It only succeeds if the newly requested address
  space is already free.  It also supports shrinking a resource in
  which case the freed space will be marked unallocated in the rman.
- rman_first_free_region() and rman_last_free_region() return the
  start and end addresses for the first or last unallocated region in
  an rman, respectively.  This can be used to determine by how much
  the resource backing an rman must be adjusted to accomodate an
  allocation request that does not fit into the existing rman.

While here, document the rm_start and rm_end fields in struct rman,
rman_is_region_manager(), the bound argument to
rman_reserve_resource_bound(), and rman_init_from_resource().
2011-04-29 20:05:19 +00:00
John Baldwin
b67d11bbcc Change rman_manage_region() to actually honor the rm_start and rm_end
constraints on the rman and reject attempts to manage a region that is out
of range.
- Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of
  ~0ul).
- To preserve existing behavior, change rman_init() to set rm_start and
  rm_end to allow managing the full range (0 to ~0ul) if they are not set by
  the caller when rman_init() is called.
2011-04-29 18:41:21 +00:00
Attilio Rao
2be767e069 Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by:	Sandvine Incorporated
Discussed with:	emaste, des
MFC after:	2 weeks
2011-04-28 16:02:05 +00:00
Ryan Stone
60dd73b78b If the 4BSD scheduler tries to schedule a thread that has been pinned or
bound to an AP before SMP has started, the system will panic when we try
to touch per-CPU state for that AP because that state has not been
initialized yet.  Fix this in the same way as ULE: place all threads in
the global run queue before SMP has started.

Reviewed by:	jhb
MFC after:	1 month
2011-04-26 20:34:30 +00:00
Konstantin Belousov
b2ad91f26b Implement the delayed task execution extension to the taskqueue
mechanism. The caller may specify a timeout in ticks after which the
task will be scheduled.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	jeff, jhb
MFC after:	1 month
2011-04-26 11:39:56 +00:00
Jeff Roberson
5bd186a65a - Catch up to falloc() changes.
- PHOLD() before using a task structure on the stack.
 - Fix a LOR between the sleepq lock and thread lock in _intr_drain().
2011-04-26 07:30:52 +00:00
Rick Macklem
c65c068a5f Fix a LOR in vfs_busy() where, after msleeping, it would lock
the mutexes in the wrong order for the case where the
MBF_MNTLSTLOCK is set. I believe this did have the
potential for deadlock. For example, if multiple nfsd threads
called vfs_busyfs(), which calls vfs_busy() with MBF_MNTLSTLOCK.
Thanks go to pho for catching this during his testing.

Tested by:	pho
Submitted by:	kib
MFC after:	2 weeks
2011-04-23 11:22:48 +00:00
Jaakko Heinonen
1b0fe69dc9 Utilize vfs_sanitizeopts() in vfs_mergeopts() to merge options. Because
vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is
no more need to add "noro" in vfs_donmount() to cancel "ro".

This also fixes a problem of canceling options beginning with "no".
For example, "noatime" didn't cancel "nonoatime". Thus it was possible
that both "noatime" and "nonoatime" were active at the same time.

Reviewed by:	bde
2011-04-22 07:26:09 +00:00
Matthew D Fleming
1ce4508f6d Allow VOP_ALLOCATE to be iterative, and have kern_posix_fallocate(9)
drive looping and potentially yielding.

Requested by:	kib
2011-04-19 16:36:24 +00:00
Matthew D Fleming
5d253e418f Fix a copy/paste whitespace error. 2011-04-18 16:40:47 +00:00
Matthew D Fleming
7323776b01 Regen. 2011-04-18 16:32:47 +00:00
Matthew D Fleming
d91f88f7f3 Add the posix_fallocate(2) syscall. The default implementation in
vop_stdallocate() is filesystem agnostic and will run as slow as a
read/write loop in userspace; however, it serves to correctly
implement the functionality for filesystems that do not implement a
VOP_ALLOCATE.

Note that __FreeBSD_version was already bumped today to 900036 for any
ports which would like to use this function.

Also reserve space in the syscall table for posix_fadvise(2).

Reviewed by:	-arch (previous version)
2011-04-18 16:32:22 +00:00
Jilles Tjoelker
6100955206 ktrace: Log the code for all signals (PSIG events).
The code provides information on how the signal was generated.

Formerly, the code was only logged for traps, much like only signal handlers
for traps received a meaningful si_code before FreeBSD 7.0.

In rare cases, no information is available and 0 is still logged.

MFC after:	1 week
2011-04-17 14:38:11 +00:00
Dmitry Chagin
fa2835d296 Remove malloc(9) return value checks when M_WAITOK is used.
MFC after:	2 Week
2011-04-16 16:20:51 +00:00
Gleb Smirnoff
443301e296 Revert r194662, since it breaks ng_ksocket(4) and may break
other socket consumers with alternate sb_upcall.

PR:		kern/154676
Submitted by:	Arnaud Lacombe <lacombar gmail.com>
MFC after:	7 days
2011-04-14 14:54:22 +00:00
Sergey Kandaurov
ced9253e4e Remove stale M_ZOMBIE malloc type.
This type is unused since embedding p_ru into struct proc.

MFC after:	1 week
2011-04-14 14:25:47 +00:00
Gavin Atkinson
0f4d3c921d Add a new DDB command, "show rmans", which will show the address and brief
details of each rman header, but not the contents of all rman structures
in the system.  This is especially useful on platforms where some rmans
have many thousands of entries in rmans, making scrolling through the
output of "show all rman" impractical.  Individual rmans can then be viewed
including their contents with "show rman 0xaddr" as usual.

Reviewed by:	jhb
2011-04-13 19:10:56 +00:00
Sergey Kandaurov
6bed196c35 Staticize malloc types.
Approved by:	lstewart
MFC after:	1 week
2011-04-13 11:28:46 +00:00
Lawrence Stewart
891b8ed467 Use the full and proper company name for Swinburne University of Technology
throughout the source tree.

Requested by:	Grenville Armitage, Director of CAIA at Swinburne University of
			Technology
MFC after:	3 days
2011-04-12 08:13:18 +00:00
Edward Tomasz Napierala
415896e3b1 Rename a misnamed structure field (hr_loginclass), and reorder priv(9)
constants to match the order and naming of syscalls.  No functional changes.
2011-04-10 18:35:43 +00:00
Konstantin Belousov
2cfd5d3c91 Some callers of proc_reparent() already have the parent process locked.
Detect the situation and avoid process lock recursion.

Reported by:	Fabian Keil <freebsd-listen fabiankeil de>
2011-04-10 17:07:02 +00:00
Attilio Rao
1283e9cd60 Reintroduce the fix already discussed in r216805 (please check its history
for a detailed explanation of the problems).

The only difference with the previous fix is in Solution2:
CPUBLOCK is no longer set when exiting from callout_reset_*() functions,
which avoid the deadlock (leading to r217161).
There is no need to CPUBLOCK there because the running-and-migrating
assumption is strong enough to avoid problems there.
Furthermore add a better !SMP compliancy (leading to shrinked code and
structures) and facility macros/functions.

Tested by:	gianni, pho, dim
MFC after:	3 weeks
2011-04-08 18:48:57 +00:00
Edward Tomasz Napierala
722581d9e6 Add RACCT_NOFILE accounting.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-06 19:13:04 +00:00
Edward Tomasz Napierala
b1fb5f9c8d Style fix.
Submitted by:	jhb@
2011-04-06 19:08:50 +00:00
Edward Tomasz Napierala
3bcf74459f Add accounting for SysV-related resources.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-06 18:11:24 +00:00
John Baldwin
e806d352d2 Fix several places to ignore processes that are not yet fully constructed.
MFC after:	1 week
2011-04-06 17:47:22 +00:00
Edward Tomasz Napierala
8caddd81e2 Add ucred pointer to the SysV-related memory structures. This is required
for racct.

Note that after this commit, ipcs(1) needs to be rebuilt.  Otherwise, it will
fail with "ipcs: sysctlbyname: kern.ipc.msqids: Cannot allocate memory".

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-06 16:59:54 +00:00
Edward Tomasz Napierala
1ba5ad4210 Add accounting for most of the memory-related resources.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-05 20:23:59 +00:00
Edward Tomasz Napierala
c98fe0a557 Add missing stubs. 2011-04-05 19:50:34 +00:00
Sergey Kandaurov
443db69597 Remove malloc type M_NETADDR unused since splitting into vfs_subr.c
and vfs_export.c.

MFC after:	1 week
2011-04-04 16:23:01 +00:00
Edward Tomasz Napierala
6fd8c2bd8a Add accounting for RACCT_NPTS. 2011-04-02 15:02:42 +00:00
Konstantin Belousov
1fe80828e7 After the r219999 is merged to stable/8, rename fallocf(9) to falloc(9)
and remove the falloc() version that lacks flag argument. This is done
to reduce the KPI bloat.

Requested by:	jhb
X-MFC-note:	do not
2011-04-01 13:28:34 +00:00
Konstantin Belousov
7332c129e0 Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.
In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
  corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
  on amd64.

The changes are hidden under COMPAT_43.

MFC after:   1 month
2011-04-01 11:16:29 +00:00
Edward Tomasz Napierala
58c77a9d53 Enable accounting for RACCT_NPROC and RACCT_NTHR.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-03-31 19:22:11 +00:00