12602 Commits

Author SHA1 Message Date
mckusick
7901256b30 Export vinactive() from kern/vfs_subr.c (i.e., make it no longer
static and declare its prototype in sys/vnode.h) so that it can be
called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c)
instead of the body of vinactive() being cut and pasted into
process_deferred_inactive().

Reviewed by: kib
MFC after:   2 weeks
2012-04-11 23:01:11 +00:00
jhb
294ae9574d Allow device_busy() and device_unbusy() to be invoked while a device is
being attached.  This is implemented by adding a new DS_ATTACHING state
while a device's DEVICE_ATTACH() method is being invoked.  A driver is
required to not fail an attach of a busy device.  The device's state will
be promoted to DS_BUSY rather than DS_ACTIVE() if the device was marked
busy during DEVICE_ATTACH().

Reviewed by:	kib
MFC after:	1 week
2012-04-11 20:57:41 +00:00
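
The state promotion described above can be sketched in user-space C. This is only an illustration under stated assumptions: the SK_* state names, struct sk_dev, and the three helper functions are hypothetical stand-ins and are not the kernel's newbus code.

```c
#include <assert.h>

/* Hypothetical, simplified device states for this sketch only. */
enum sk_state { SK_NOTPRESENT, SK_ATTACHING, SK_ACTIVE, SK_BUSY };

struct sk_dev {
	enum sk_state state;
	int busy;		/* outstanding busy references */
};

/* Mark the device busy; permitted while attaching or once attached. */
void
sk_dev_busy(struct sk_dev *d)
{
	assert(d->state >= SK_ATTACHING);
	d->busy++;
	if (d->state == SK_ACTIVE)
		d->state = SK_BUSY;
}

/* Drop one busy reference; an attached device goes back to active at zero. */
void
sk_dev_unbusy(struct sk_dev *d)
{
	assert(d->busy > 0);
	d->busy--;
	if (d->busy == 0 && d->state == SK_BUSY)
		d->state = SK_ACTIVE;
}

/* After a successful attach, promote according to the busy count. */
void
sk_dev_attach_done(struct sk_dev *d)
{
	assert(d->state == SK_ATTACHING);
	d->state = (d->busy > 0) ? SK_BUSY : SK_ACTIVE;
}
```

A device marked busy during attach ends up in the busy state once attach completes, and drops to the active state when the last busy reference is released.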
eadler
2a42c5c4e9 Return EBADF instead of EMFILE from dup2 when the second argument is
outside the range of valid file descriptors

PR:		kern/164970
Submitted by:	Peter Jeremy <peterjeremy@acm.org>
Reviewed by:	jilles
Approved by:	cperciva
MFC after:	1 week
2012-04-11 14:08:09 +00:00
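
A small user-space check of the behavior described above might look like the following. It is a sketch, assuming that the RLIMIT_NOFILE soft limit is one past the largest valid descriptor; the function name dup2_out_of_range_errno is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Call dup2() with a target descriptor equal to the RLIMIT_NOFILE soft
 * limit and return the resulting errno. Returns 0 if dup2() unexpectedly
 * succeeds and -1 if the setup itself fails.
 */
int
dup2_out_of_range_errno(void)
{
	struct rlimit rl;
	int fds[2], newfd, err;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || pipe(fds) != 0)
		return (-1);
	assert(fds[0] >= 0 && fds[1] >= 0);
	/* Clamp so the value fits in an int even for very large limits. */
	newfd = (rl.rlim_cur >= (rlim_t)INT_MAX) ? INT_MAX : (int)rl.rlim_cur;
	errno = 0;
	err = (dup2(fds[0], newfd) == -1) ? errno : 0;
	close(fds[0]);
	close(fds[1]);
	return (err);
}
```

On a kernel with this change the function is expected to return EBADF rather than EMFILE.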
jilles
4360dc9ca8 Remove unused and wrong SA_PROC internal signal property.
The SA_PROC signal property indicated whether each signal number is directed
at a specific thread or at the process in general. However, that depends on
how the signal was generated and not on the signal number. SA_PROC was not
used.
2012-04-09 21:58:58 +00:00
mav
e1ffe54fb7 Microoptimize cpu_search().
According to profiling, cpu_search() now takes 6% of CPU time on hackbench,
with its millions of context switches per second, instead of 8% before.
2012-04-09 18:24:58 +00:00
gleb
fb452e77b0 Add vfs_getopt_size. Support human readable file system options in tmpfs.
Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs.

Discussed with:	delphij
MFC after:	2 weeks
2012-04-07 15:27:34 +00:00
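
Human-readable size options such as "512k" or "4g" can be parsed roughly as sketched below. This is not the kernel's vfs_getopt_size(); the function parse_size, its accepted suffixes, and its error codes are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a decimal number with an optional k/m/g/t suffix (powers of 1024)
 * into *out. Returns 0 on success, or EINVAL / ERANGE on failure.
 */
int
parse_size(const char *s, uint64_t *out)
{
	unsigned long long v;
	char *end;
	int shift = 0;

	assert(s != NULL && out != NULL);
	if (*s < '0' || *s > '9')	/* reject empty, signed, or padded input */
		return (EINVAL);
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno == ERANGE)
		return (ERANGE);
	switch (*end) {
	case 't': case 'T': shift = 40; end++; break;
	case 'g': case 'G': shift = 30; end++; break;
	case 'm': case 'M': shift = 20; end++; break;
	case 'k': case 'K': shift = 10; end++; break;
	default: break;
	}
	if (*end != '\0')		/* trailing garbage after the suffix */
		return (EINVAL);
	if (v > (UINT64_MAX >> shift))	/* scaling would overflow 64 bits */
		return (ERANGE);
	*out = (uint64_t)v << shift;
	return (0);
}
```

For example, parse_size("4k", &v) stores 4096 in v, while "12x" is rejected with EINVAL.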
melifaro
8b1d10268c - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greatly improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is a temporary solution;
struct bpf_if should be made opaque for any external caller.

Found by:       Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by:   Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      3 weeks
2012-04-06 06:53:58 +00:00
jhb
5829de48d9 Add new ktrace records for the start and end of VM faults. This gives
a pair of records similar to syscall entry and return that a user can
use to determine how long page faults take.  The new ktrace records are
enabled via the 'p' trace type, and are enabled in the default set of
trace points.

Reviewed by:	kib
MFC after:	2 weeks
2012-04-05 17:13:14 +00:00
davidxu
cc55f4943b In sem_post, the field _has_waiters is no longer used, because some
applications destroy the semaphore after sem_wait returns. Just enter the
kernel to wake up sleeping threads, and only update _has_waiters if
it is safe. While here, check if the value exceeds SEM_VALUE_MAX and
return EOVERFLOW if this is true.
2012-04-05 03:05:02 +00:00
davidxu
8c31e244f2 umtx operation UMTX_OP_MUTEX_WAKE has a side effect: it accesses
a mutex after a thread has unlocked it, and it even writes data to the mutex
memory to clear the contention bit. There is a race in which other threads
can lock it, unlock it, and then destroy it, so it should not write
data to the mutex memory if there isn't any waiter.
The new operation UMTX_OP_MUTEX_WAKE2 tries to fix the problem. It
requires the thread library to clear the lock word entirely, then
call the WAKE2 operation to check if there is any waiter in the kernel
and try to wake up a thread; if necessary, the contention bit is set again
by the operation. This also mitigates the chance that other threads find
the contention bit and try to enter the kernel to compete with each other
to wake up a sleeping thread, which is unnecessary. With this change, the
mutex owner no longer holds the mutex until it reaches a point
where the kernel umtx queue is locked; it releases the mutex as soon as
possible.
Performance is improved when the mutex is heavily contested.  On an Intel
i3-2310M, the runtime of a benchmark program is reduced from 26.87 seconds
to 2.39 seconds, which is even better than UMTX_OP_MUTEX_WAKE, which is
deprecated now. http://people.freebsd.org/~davidxu/bench/mutex_perf.c
2012-04-05 02:24:08 +00:00
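
The unlock order described above (clear the lock word entirely, then ask the kernel to wake a waiter) can be sketched as follows. Everything here is a hypothetical stand-in: SK_CONTESTED, sk_kernel_wake(), and sk_mutex_unlock() are illustrative names, the wake function only counts calls instead of entering the kernel, and entering the kernel only when the contention bit was set is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

#define SK_UNOWNED	0u
#define SK_CONTESTED	0x80000000u	/* hypothetical contention bit */

int sk_wake_calls;			/* counts simulated kernel entries */

/* Stand-in for the kernel wake operation; a real one would be a system call. */
void
sk_kernel_wake(atomic_uint *word)
{
	(void)word;
	sk_wake_calls++;
}

/*
 * Release the mutex by clearing the whole lock word in one atomic step,
 * then enter the (simulated) kernel only if the contention bit was set.
 * The lock word is not written again by the unlocking thread.
 */
void
sk_mutex_unlock(atomic_uint *word)
{
	unsigned int old;

	old = atomic_exchange(word, SK_UNOWNED);
	assert((old & ~SK_CONTESTED) != 0);	/* the mutex must have been owned */
	if (old & SK_CONTESTED)
		sk_kernel_wake(word);
}
```

Because the word is fully cleared before the wake call, another thread can acquire the mutex immediately, which matches the "releases the mutex as soon as possible" behavior described in the message.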
np
307ef13f94 - Remove redundant call to pr_ctloutput from code that handles SO_SETFIB.
- Add a check for errors during copyin while here.

Reviewed by:	julian, bz
MFC after:	2 weeks
2012-04-03 18:38:00 +00:00
kib
ff6239a557 When a process exits, not only the children shall be reparented to
init, but also the orphans shall be removed from the orphan list,
because the list header is destroyed.

Reported and tested by:	pho
MFC after:	3 days
2012-04-02 19:35:36 +00:00
kib
9ad701f91f Add helper function to remove the process from the orphans list and
use it instead of inlined code.

Tested by:	pho
MFC after:	3 days
2012-04-02 19:34:56 +00:00
jhb
506e2f15b9 Export some more useful info about shared memory objects to userland
via procstat(1) and fstat(1):
- Change shm file descriptors to track the pathname they are associated
  with and add a shm_path() method to copy the path out to a caller-supplied
  buffer.
- Use the fo_stat() method of shared memory objects and shm_path() to
  export the path, mode, and size of a shared memory object via
  struct kinfo_file.
- Add a struct shmstat to the libprocstat(3) interface along with a
  procstat_get_shm_info() to export the mode and size of a shared memory
  object.
- Change procstat to always print out the path for a given object if it
  is valid.
- Teach fstat about shared memory objects and to display their path,
  mode, and size.

MFC after:	2 weeks
2012-04-01 18:22:48 +00:00
davidxu
42d5de0c66 Remove stale comments. 2012-03-31 06:48:41 +00:00
davidxu
0bd3403eb7 Remove trailing semicolon, it is a typo. 2012-03-30 12:57:14 +00:00
davidxu
febc18f31b Fix COMPAT_FREEBSD32 build.
Submitted by: Andreas Tobler < andreast at fgznet dot ch >
2012-03-30 09:03:53 +00:00
davidxu
f7f769bc6d Remove trailing space. 2012-03-30 05:49:32 +00:00
davidxu
5faf75d34c Merge umtxq_sleep and umtxq_nanosleep into a single function by using
an abs_timeout structure which describes timeout info.
2012-03-30 05:40:26 +00:00
davidxu
362bad78ca Reduce code size by creating common timed sleeping function. 2012-03-29 02:46:43 +00:00
fabient
5edfb77dd3 Add software PMC support.
New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).

Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.

Sponsored by: NETASQ
MFC after:	1 month
2012-03-28 20:58:30 +00:00
rstone
0ee65aa24e Instead of only iterating over the set of known SDT probes when sdt.ko is
loaded and unloaded, also have sdt.ko register callbacks with kern_sdt.c
that will be called when a newly loaded KLD module adds more probes or
a module with probes is unloaded.

This fixes two issues: first, if a module with SDT probes was loaded after
sdt.ko was loaded, those new probes would not be available in DTrace.
Second, if a module with SDT probes was unloaded while sdt.ko was loaded,
the kernel would panic the next time DTrace had cause to try and do
anything with the no-longer-existent probes.

This makes it possible to create SDT probes in KLD modules, although there
are still two caveats: first, any SDT probes in a KLD module must be part
of a DTrace provider that is defined in that module.  At present DTrace
only destroys probes when the provider is destroyed, so you can still
panic the system if a KLD module creates new probes in a provider from a
different module (including the kernel) and then unloads the first module.

Second, the system will panic if you unload a module containing SDT probes
while there is an active D script that has enabled those probes.

MFC after:	1 month
2012-03-27 15:07:43 +00:00
melifaro
fd561480db - Add knlist_init_rw_reader() function to kqueue(9).
Function acquired reader lock if needed.
Assert check for reader or writer lock (RA_LOCKED / RA_UNLOCKED)
- While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues

Reviewed by:    glebius
Approved by:    ae(mentor)

MFC after:      2 weeks
2012-03-26 09:34:17 +00:00
trociny
0079b1f6c5 Add a sysctl to set and retrieve binary osreldate of another process.
Suggested by:	kib
Reviewed by:	kib
MFC after:	2 weeks
2012-03-23 20:05:41 +00:00
ae
bb8b607479 Correct debug message. 2012-03-22 09:29:07 +00:00
alc
e02fd6b842 Handle spurious page faults that may occur in no-fault sections of the
kernel.

When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB.  In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB.  This is exactly as
recommended in AMD's documentation.  In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry.  In short, this saves us from
having to perform potentially costly TLB flushes.  In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry.  Usually, such spurious page faults are handled
by vm_fault() without incident.  However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault().  This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.

In collaboration with:	kib
Tested by:		gibbs (an earlier version)

I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.

MFC after:	1 week
2012-03-22 04:52:51 +00:00
ae
f0e7ec67c0 Acquire modules lock before calling module_getname() in the KLD_DEBUG case.
MFC after:	1 week
2012-03-21 09:48:32 +00:00
eadler
169b46c915 - Clean up timestamps in msgbuf code. The timestamps should now be
inserted after the priority token thus cleaning up the output.
- Remove the needless double internal do_add_char function.
- Resolve a possible deadlock if interrupts are disabled and getnanotime
is called

Reviewed by:	bde, kmacy, avg, sbruno (various versions)
Approved by:	cperciva
MFC after:	2 weeks
2012-03-19 00:36:32 +00:00
jh
683a986c03 Cast wallclock.tv_sec to uint64_t to avoid overflow in the calculation.
PR:		kern/161552
Reviewed by:	trasz
Tested by:	Nikos Vassiliadis
MFC after:	1 week
2012-03-18 19:13:32 +00:00
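
The effect of widening before the arithmetic can be shown with a small sketch. The actual expression changed by the commit is not shown in the message, so the multiplication by 1000000 and the function name sk_sec_to_usec are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a 32-bit seconds value to microseconds. The cast widens the
 * operand to 64 bits before the multiplication, so the product cannot
 * overflow a 32-bit intermediate.
 */
uint64_t
sk_sec_to_usec(int32_t sec)
{
	assert(sec >= 0);
	return ((uint64_t)sec * 1000000);
}
```

Without the cast, a 32-bit multiplication would overflow for any value above about 2147 seconds.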
davide
cd0c342e57 Add rudimentary profiling of the hash table used in the umtx code to
hold active lock queues.

Reviewed by:	attilio
Approved by:	davidxu, gnn (mentor)
MFC after:	3 weeks
2012-03-16 20:32:11 +00:00
tuexen
b8b34b6ecf Fix bugs which can result in a panic when a non-SCTP socket is
used with an sctp_ system-call which expects an SCTP socket.

MFC after: 3 days.
2012-03-15 14:13:38 +00:00
ae
894c8dc15b Add CTLFLAG_TUN to the sysctl definition and fix style.
Pointed by:	Garrett Cooper
MFC after:	2 weeks
2012-03-15 06:01:21 +00:00
ae
9be115302d Add debug.kld_debug loader tunable.
MFC after:	2 weeks
2012-03-15 05:11:29 +00:00
jh
59d9d84ca4 Add an assert for proctree_lock to proc_to_reap().
Discussed with:	kib
MFC after:	1 week
2012-03-14 15:52:23 +00:00
kib
6e85340add Lock the process around manipulations with p_flag.
Reported and reviewed by:	jh
MFC after:	3 days
2012-03-13 22:00:46 +00:00
adrian
f2bb6a85d7 Add module load/unload stubs. 2012-03-13 20:27:48 +00:00
mav
5b5fc4e585 Add kern.eventtimer.activetick tunable/sysctl, specifying whether each
hardclock() tick should be run on every active CPU, or on only one.

In my tests, avoiding extra interrupts because of this on 8-CPU Core i7
system with HZ=10000 saves about 2% of performance. At this moment option
implemented only for global timers, as reprogramming per-CPU timers is
too expensive now to be compensated by this benefit, especially since we
still have to regularly run hardclock() on at least one active CPU to
update system uptime. For global timer it is quite trivial: timer runs
always, but we just skip IPIs to other CPUs when possible.

Option is enabled by default now, keeping previous behavior, as periodic
hardclock() calls are still used at least to implement setitimer(2) with
ITIMER_VIRTUAL and ITIMER_PROF arguments. But since default schedulers don't
depend on it since r232917, we are much more free to experiment with it.

MFC after:	1 month
2012-03-13 10:21:08 +00:00
mav
ffaa080e67 Rewrite thread CPU usage percentage math to not depend on periodic calls
with HZ rate through the sched_tick() calls from hardclock().

Potentially it can be used to improve precision, but now it is just minus
one more reason to call hardclock() for every HZ tick on every active CPU.
SCHED_4BSD never used sched_tick(), but keep it in place for now, as at
least SCHED_FBFS existing in patches out of the tree depends on it.

MFC after:	1 month
2012-03-13 08:18:54 +00:00
pho
e35bb21f2c Always call fdrop(). 2012-03-12 11:56:57 +00:00
kib
4e790f9b2b ELF image can have several PT_NOTE program headers. Look for the ELF
brand note in each header, instead of using only first one.

Reviewed by:	kan
Tested by:	andrew (arm), flo (sparc64)
MFC after:	3 weeks
2012-03-11 19:38:49 +00:00
kib
8adabb0356 Remove fifo.h. The only used function declaration from the header is
migrated to sys/vnode.h.

Submitted by:	gianni
2012-03-11 12:19:58 +00:00
mav
4be9351f8b Revert r175376 and tune cpufreq(4) frequency comparison logic instead.
Instead of using 25MHz equality threshold, look for the nearest value when
handling dev.cpu.0.freq sysctl and for exact match when it is expected.

ACPI may report extra level with frequency 1MHz above the nominal to
control Intel Turbo Boost operation. It is not a bug, but feature:
dev.cpu.0.freq_levels: 2934/106000 2933/95000 2800/82000 ...
In this case value 2933 means 2.93GHz, but 2934 means 3.2-3.6GHz.

I've found that my Core i7-870 based system has Intel Turbo Boost disabled
by default and without this change it was absolutely invisible and hard
to control.

MFC after:	2 weeks
2012-03-10 18:56:16 +00:00
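
The nearest-value lookup described above can be sketched as a simple linear search. This is not the cpufreq(4) code; sk_nearest_level is a hypothetical helper, and the sample levels used in the usage note are the frequencies from the sysctl output quoted in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Return the index of the level closest to want_mhz.
 * On a tie, the earlier entry in the array is kept.
 */
size_t
sk_nearest_level(const int *levels, size_t n, int want_mhz)
{
	size_t i, best = 0;

	assert(levels != NULL && n > 0);
	for (i = 1; i < n; i++) {
		if (abs(levels[i] - want_mhz) < abs(levels[best] - want_mhz))
			best = i;
	}
	return (best);
}
```

With levels {2934, 2933, 2800}, a request for 2933 selects index 1 and a request for 2934 selects index 0, so the two adjacent levels stay distinguishable, which a 25MHz equality threshold would not allow.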
mav
1324baa4eb Idle ticks optimization:
- Pass the number of events to the statclock() and profclock() functions,
  as is already done for hardclock(), to avoid calling them many times
  in a loop.
- Rename them into statclock_cnt() and profclock_cnt().
- Turn statclock() and profclock() into compatibility wrappers,
  still needed for arm.
- Rename hardclock_anycpu() into hardclock_cnt() for unification.

MFC after:	1 week
2012-03-10 14:57:21 +00:00
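
The counted-call-plus-wrapper pattern from the list above can be sketched like this. The sk_ names and the tick counter are hypothetical; a real statclock would charge time to threads rather than bump a single counter.

```c
#include <assert.h>

long sk_stat_ticks;	/* hypothetical accumulated statistics ticks */

/* Account for cnt ticks in one call instead of looping cnt times. */
void
sk_statclock_cnt(int cnt, int usermode)
{
	assert(cnt > 0);
	(void)usermode;	/* a real implementation would charge user or system time */
	sk_stat_ticks += cnt;
}

/* Compatibility wrapper that preserves the old single-tick interface. */
void
sk_statclock(int usermode)
{
	sk_statclock_cnt(1, usermode);
}
```

A caller that slept through five ticks makes one sk_statclock_cnt(5, ...) call, while older callers keep using the one-tick wrapper unchanged.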
trasz
a0d48d6f11 Remove useless thread_{lock,unlock}() in raccd. 2012-03-10 14:38:49 +00:00
jmallett
d25fa497f7 Export intrcnt correctly when running under 32-bit compatibility.
Reviewed by:	gonzo, nwhitehorn
2012-03-09 22:30:54 +00:00
pho
c84e05a07c Perform the parameter validation before assigning it to a signed int
variable. This fixes the problem seen with readdir(3) fuzzing.

Submitted by:	bde
MFC after:	1 week
2012-03-09 21:31:12 +00:00
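
Validating a value before narrowing it to a signed int can be sketched as follows. The helper sk_checked_count and its EINVAL return are assumptions for illustration; the message does not show the code that was changed.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Check an unsigned count against INT_MAX before storing it in an int.
 * Assigning first and checking afterwards could let a huge value wrap
 * to a negative number and slip past a later range test.
 */
int
sk_checked_count(size_t count, int *out)
{
	assert(out != NULL);
	if (count > INT_MAX)
		return (EINVAL);
	*out = (int)count;
	return (0);
}
```

The output variable is left untouched when the check fails, so callers never see a truncated value.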
mav
d6e827162d Make kern.sched.idlespinthresh default value adaptive depending on HZ.
Otherwise with HZ above 8000 CPU may never skip timer ticks on idle.
2012-03-09 19:09:08 +00:00
mav
fb50c869a4 Be more polite when setting state->nextevent inside cpu_new_callout().
Hardclock is not the only one that wakes an idle CPU since kdtrace cyclic addition.

MFC after:	2 weeks
2012-03-09 07:30:48 +00:00
kib
5abd2bb7cb Decommission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which
allows a filesystem to request VFS to not allow MNTK_ASYNC.

MFC after:	1 week
2012-03-09 00:12:05 +00:00
pho
81cae127b0 Free up allocated memory used by posix_fadvise(2). 2012-03-08 20:34:13 +00:00