Commit Graph

13705 Commits

Author SHA1 Message Date
bjk
7280c1da39 Check for mismatched vref()/vdrop()
Assert that the hold count has not fallen below the use count, a situation
that would only happen when a vref() (or similar) is erroneously paired
with a vdrop().  This situation has not been observed in the wild, but
could be helpful for someone implementing a new filesystem.

Reviewed by:	kib
Approved by:	hrs (mentor)
2014-05-21 03:11:27 +00:00
kib
235f4b1083 When exec_new_vmspace() decides that current vmspace cannot be reused
on execve(2), it calls vmspace_exec(), which frees the current
vmspace.  The thread executing an exec syscall gets new vmspace
assigned, and old vmspace is freed if only referenced by the current
process.  The free operation includes pmap_release(), which
de-constructs the paging structures used by hardware.

If the calling process is multithreaded, other threads are suspended
in the thread_suspend_check(), and need to be unsuspended and run to
be able to exit on successfull exec.  Now, since the old vmspace is
destroyed, paging structures are invalid, threads are resumed on the
non-existent pmaps (page tables), which leads to triple fault on x86.

To fix, postpone the free of old vmspace until the threads are resumed
and exited.  To avoid modifications to all image activators all of
which use exec_new_vmspace(), memoize the current (old) vmspace in
kern_execve(), and notify it about the need to call vmspace_free()
with a thread-private flag TDP_EXECVMSPC.

http://bugs.debian.org/743141

Reported by:	Ivo De Decker <ivo.dedecker@ugent.be> through secteam
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2014-05-20 09:19:35 +00:00
truckman
56f5be6dbc Slightly restructure the final loop in rman_reserve_resource_bound().
Replace with the existing loop termination test with a similar
condition from the nested "if" that may terminate the loop a bit
sooner, but still not too early.   This condition can then be removed
from the nested "if".  Relocate an operator to be style(9) compliant.

MFC after:	3 days
2014-05-19 04:44:27 +00:00
trasz
c857f75919 Initialize loginclass mutex using MTX_SYSINIT instead of using SI_SUB_CPU.
Suggested by:	rwatson@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-05-14 09:03:02 +00:00
truckman
c965763705 Be even more paranoid about overflow.
Requested by:	ache
2014-05-12 20:22:42 +00:00
truckman
a5f5868e49 Nuke a couple of unnecessary assigments. Nothing uses the values of rstart
and rend after this point.

MFC after:	1 week
2014-05-12 17:56:52 +00:00
jilles
3009646c96 accept(),accept4(): Don't set *addrlen = 0 on [ECONNABORTED].
If the underlying protocol reported an error (e.g. because a connection was
closed while waiting in the queue), this error was also indicated by
returning a zero-length address. For all other kinds of errors (e.g.
[EAGAIN], [ENFILE], [EMFILE]), *addrlen is unmodified and there are
successful cases where a zero-length address is returned (e.g. a connection
from an unbound Unix-domain socket), so this error indication is not
reliable.

As reported in Austin Group bug #836, modifying *addrlen on error may cause
subtle bugs if applications retry the call without resetting *addrlen.
2014-05-11 21:21:14 +00:00
cperciva
e9e7ccdea5 In cf_get_method, when we don't already know what clock speed the CPU is
running at, guess the nearest value instead of looking for a value within
25 MHz of the observed frequency.

Prior to this change, if a system booted with Intel Turbo Boost enabled,
the dev.cpu.0.freq sysctl is nonfunctional, since the ACPI-reported
frequency for Turbo Boost states does not match the actual clock frequency
(and thus no levels are within 25 MHz of the observed frequency) and the
current performance level is read before a new level is set.

MFC after:	3 days
Relnotes:	Bug fix in power management on CPUs with Intel Turbo Boost
2014-05-11 10:32:58 +00:00
adrian
65fc060c58 Add in support to optionally pin the swi threads.
Under enough load, the swi's can actually be preempted and migrated
to other currently free cores.  When doing RSS experiments, this lead
to the per-CPU TCP timers not lining up any more with the RX CPU said
flows were ending up on, leading to increased lock contention.

Since there was a little pushback on flipping them on by default,
I've left the default at "don't pin."

The other less obvious problem here is that the default swi
is also the same as the destination swi for CPU #0.  So if one
pins the swi on CPU #0, there's no default floating swi.

A nice future project would be to create a separate swi for
the "default" floating swi, as well as per-CPU swis that are
(optionally) pinned.

Tested:

* parallel TCP tests (2 x 1g unfortunately for now);
  CPU: Intel(R) Xeon(R) CPU E5-2650

Note:

This is based on some initial investigation into RSS/TCP stack lock
contention on FreeBSD-HEAD whilst at Netflix in January 2014.
2014-05-10 00:53:36 +00:00
truckman
c163eb4b72 Avoid unsigned integer overflow which can cause
rman_reserve_resource_bound() to return incorrect results.

Continue the initial search until the first viable region is found.

Add a comment to explain the search termination test.

PR:		kern/188534
Reviewed by:	jhb (previous version)
MFC after:	1 week
2014-05-05 15:59:31 +00:00
mjg
1b83ce1524 Request a non-exiting process in sysctl_kern_proc_{o,}filedesc
This fixes a race with exit1 freeing p_textvp.

Suggested by:	kib
MFC after:	1 week
2014-05-02 21:55:09 +00:00
brueffer
09b97bcfaf Free resources in an error case.
CID:		1018947
Found with:	Coverity Prevent(tm)
MFC after:	1 week
2014-05-02 21:34:17 +00:00
rwatson
7ca3cc4389 Garbage collect mtxpool_lockbuilder, the mutex pool historically used
for lockmgr and sx interlocks, but unused since optimised versions of
those sleep locks were introduced.  This will save a (quite) small
amount of memory in all kernel configurations.  The sleep mutex pool is
retained as it is used for 'struct bio' and several other consumers.

Discussed with:	jhb
MFC after:	3 days
2014-05-02 07:57:40 +00:00
mjg
7ec4134d19 Ignore the error from pipespace_new when creating a pipe.
It can fail if pipe map is exhausted (as a result of too many pipes created),
but it is not fatal and could be provoked by unprivileged users. The only
consequence is worse performance with given pipe.

Reported by:	ivoras
Suggested by:	kib
MFC after:	1 week
2014-05-02 00:52:13 +00:00
brooks
f9a10360bc Fix a 2038 bug.
If time_t is 64-bit (i.e. isn't 32-bit) allow any value of year, not
just years less than 2038.

Don't bother fixing the underflow in the case of years before 1903.

MFC after:	1 week
Sponsored by:	DARPA, AFRL
2014-05-01 22:28:14 +00:00
marius
6678ece656 Given that as of r258002 the last external user is gone, make sched_lock
static.
2014-04-29 20:51:57 +00:00
grehan
455d465e40 Bump WITNESS_PENDLIST by MAXCPU to account for the
pmap pvlist locks which are scaled by MAXCPU.

This allows an amd64 system to boot with MAXCPU set
to 256, which is currently FreeBSD's hard limit without
x2apic support.

Compile-tested for other arch's.

PR:	185831
Discussed with:		jhb
MFC after:	3 weeks
2014-04-29 17:22:29 +00:00
brooks
2e62cc3f35 Revert r263754, re-adding support for hw.bus.devctl_disable. Breaking
old devd's and thus hosts that get IP addresses from DHCP was too much
of a POLA violation.

The sysctl may be removed again after r263758 has been merged to at
least stable/9 and stable/10, and releases have been cut from those
branches.

Discussed with:	mjg
Reported by:	theraven, rwatson
2014-04-28 20:38:08 +00:00
scottl
62a64f0d2b Retire smp_active. It was racey and caused demonstrated problems with
the cpufreq code.  Replace its use with smp_started.  There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.

Discussed with: peter, jhb
MFC after: 3 days
Obtained from: Netflix
2014-04-26 20:27:54 +00:00
bdrewery
9c2b40ff13 Fix grammar error and trailing newline.
Submitted by:	danfe
MFC after:	3 days
2014-04-23 02:21:17 +00:00
ian
40b5add506 Fix a comment typo; conversion tables are for leap years, not leap seconds. 2014-04-20 13:37:22 +00:00
kib
201ea63020 Fix typo.
MFC after:	3 days
2014-04-17 18:13:23 +00:00
np
bf77eba909 Do not set M_BESTFIT if a strategy has already been provided. This
fixes problems when using M_FIRSTFIT.

Reviewed by:	jeff@
MFC after:	1 week
2014-04-16 21:39:43 +00:00
mav
5058ec1a0c Fix VIRTUAL and PROF interval timers for short intervals, broken at r247903.
Due to the way those timers are implemented, we can't handle very short
intervals.  In addition to that mentioned patch caused math overflows
for short intervals.  To avoid that round those intervals to 1 tick.

PR:		kern/187668
MFC after:	1 week
2014-04-16 18:37:46 +00:00
brueffer
e3e2951f67 Refine r264422: set buf to NULL only when we don't allocate memory,
and free buf unconditionally.

Requested by:	kib
MFC after:	1 week
2014-04-14 21:02:20 +00:00
brueffer
e539c25cad Free buf after usage.
CID:		1199377
Found with:	Coverity Prevent(tm)
MFC after:	1 week
2014-04-13 21:23:15 +00:00
davide
aeb4d19339 Hide internal details of sbintime_t implementation wrapping INT64_MAX into
SBT_MAX, to make it more robust in case internal type representation will
change in the future. All the consumers were migrated to SBT_MAX and
every new consumer (if any) should from now use this interface.

Requested by:	bapt, jmg, Ryan Lortie (implictly)
Reviewed by:	mav, bde
2014-04-12 23:29:29 +00:00
bdrewery
b749ffee6a Use proper MFSNAMELEN for fs type.
MFC after:	2 weeks
Reviewed by:	rodrigc
Also spotted by:ambrisko
2014-04-12 21:39:17 +00:00
davidxu
89ffe95c09 Add kqueue support for devctl.
Reviewed by:	kib,mjg
2014-04-10 02:30:51 +00:00
sbruno
d798d00aab sys/kern/imgact_binmisc.c -- free the right pointer mask vs magic
sys/sys/imagact_binmisc.h -- cleanup white space tabs vs spaces
                          -- remove stray " in comment

Submitted by:	jmallett@
2014-04-08 22:12:01 +00:00
sbruno
168c6df708 Add Stacey Son's binary activation patches that allow remapping of
execution to a emumation program via parsing of ELF header information.

With this kernel module and userland tool, poudriere is able to build
ports packages via the QEMU userland tools (or another emulator program)
in a different architecture chroot, e.g. TARGET=mips TARGET_ARCH=mips

I'm not connecting this to GENERIC for obvious reasons, but this should
allow the kernel module to be built by default and enable the building
of the userland tool (which automatically loads the kernel module).

Submitted by:	sson@
Reviewed by:	jhb@
2014-04-08 20:10:22 +00:00
ray
4fc5a92a24 Do not fill screen, while muted.
Sponsored by:	The FreeBSD Foundation
2014-04-07 22:37:13 +00:00
ed
fe8b776487 Thinko: don't forget to apply 'howto' in case init(8) isn't running. 2014-04-07 21:18:12 +00:00
ed
22932578aa Clean up shutdown_nice(). Just send the right signal to init(8).
Right now, init(8) cannot distinguish between an ACPI power button press
or a Ctrl+Alt+Del sequence on the keyboard. This is because
shutdown_nice() sends SIGINT to init(8) unconditionally, but later
modifies the arguments to reboot(2) to force a certain behaviour.

Instead of doing this, patch up the code to just forward the appropriate
signal to userspace. SIGUSR1 and SIGUSR2 can already be used to halt the
system.

While there, move waittime to the function where it's used; kern_reboot().
2014-04-07 21:11:29 +00:00
ed
87c17a9c66 Implement kqueue(2) for procdesc(4).
kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that
behaves the same, but operates on a procdesc(4) instead. Only implement
NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also
returns the exit status of the process, meaning that we can now obtain
this value, even if pdwait4(2) is still unimplemented.

Notes:

- Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will
  be used on totally different descriptor types, this should not clash.

- Let procdesc_kqops_event() reuse the same structure as filt_proc().
  The only difference is that procdesc_kqops_event() should also be able
  to deal with the case where the process was already terminated after
  registration. Simply test this when hint == 0.

- Fix some style(9) issues in filt_proc() to keep it consistent with the
  newly added procdesc_kqops_event().

- Save the exit status of the process in pd->pd_xstat, as we cannot pick
  up the proctree_lock from within procdesc_kqops_event().

Discussed on:	arch@
Reviewed by:	kib@
2014-04-07 18:10:49 +00:00
ed
2443e8f2ac Fix a typo. The function name is pdfork; not pfork. 2014-04-06 20:20:07 +00:00
ed
b97204b786 Nit: fix locking of p->p_state in procdesc_close().
According to <sys/proc.h>, this field needs to be locked with either the
p_mtx or the p_slock. In this case the damage was quite small. Instead
of being reaped, the process would just be reparented to init, so it
could be reaped from there.
2014-04-06 20:00:42 +00:00
kib
e62844041e Use realloc(9) instead of doing the reallocation inline.
Submitted by:	bde
MFC after:	1 week
2014-04-05 20:44:52 +00:00
dchagin
e0cab82fb2 Prevent alq from panic when the invalid alq_file path specified.
MFC after:	1 week
2014-04-05 16:54:47 +00:00
kib
e965005b68 When KN_INFLUX is set on the knote due to kqueue_register() or
kqueue_scan() unlocking the kqueue to call f_event, knote() or
knote_fork() should not skip the knote.  The knote is not going to
disappear during the influx time, and the mutual exclusion between
scan and knote() is ensured by both code pathes taking knlist lock.
The race appears since knlist lock is before kq lock, so KN_INFLUX
must be set, kq lock must be dropped and only then knlist lock can be
taken.  The window between kq unlock and knlist lock causes lost
events.

Add a flag KN_SCAN to indicate that KN_INFLUX is set in a manner safe
for the knote(), and check for it to ignore KN_INFLUX in the knote*()
as needed.  Also, in knote(), remove the lockless check for the
KN_INFLUX flag, which could also result in the lost notification.

Reported and tested by:	Kohji Okuno <okuno.kohji@jp.panasonic.com>
Discussed with:	jmg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-04-05 14:09:16 +00:00
emaste
18755bc3d6 Initialise m_pkthdr via bzero instead of explicitly zeroing each member
Sponsored by:	The FreeBSD Foundation
2014-04-04 21:09:06 +00:00
davidxu
80a7722c32 Fix SIGIO delivery. Use fsetown() to handle file descriptor owner
ioctl and use pgsigio() to send SIGIO.

Submitted by:	truckman
Reviewed by:	mjg
2014-04-04 12:31:13 +00:00
mjg
439611d0ad Garbage collect fdavail.
It rarely returns an error and fdallocn handles the failure of fdalloc
just fine.
2014-04-04 05:07:36 +00:00
ian
0e63af86f9 Fix build breakage. Apparently all ARM configs build kern_et.c, but only a
few of them also build kern_clocksource.c.  That strikes me as insane, but
maybe there's a good reason for it.  Until I figure that out, un-break
the build by not referencing functions in kern_clocksource if NO_EVENTTIMERS
is defined.
2014-04-02 17:34:17 +00:00
ian
2b2f1d5e5c Add support for event timers whose clock frequency can change while running. 2014-04-02 15:56:11 +00:00
mjg
028af9e58b Document a known problem with handling the process intended to receive
SIGIO in /dev/devctl.

Suggested by:	adrian
MFC after:	6 days
2014-03-25 23:30:35 +00:00
mjg
6131eec5de Remove long obsolete sysctl hw.bus.devctl_disable.
Suggested by:	imp
Relnotes:	yes
2014-03-25 23:19:45 +00:00
mjg
47c4497087 Remove lockless check in devopen, while correct it does not make much sense.
Suggested by:	imp
MFC after:	6 days
2014-03-25 23:13:46 +00:00
mjg
ff55131045 Make /dev/devctl mpsafe.
MFC after:	1 week
2014-03-25 03:28:58 +00:00
emax
4d523456c7 change defaule permissions on /dev/devstat. while i'm here remove
D_NEEDGIANT flag

Submitted by:	jhb
Reviewed by:	jhb, scottl, rwatson, delphij, phk
MFC after:	1 week
2014-03-24 18:13:41 +00:00