Commit Graph

13760 Commits

Author SHA1 Message Date
dchagin
538f396887 To allow to run the interpreter itself add a new ELF branding type.
Allow Linux ABI to run ELF interpreter.

MFC after:	3 days
2014-05-31 15:01:51 +00:00
glebius
999343aa9c Whitespace only. 2014-05-30 08:22:58 +00:00
markj
4c818572b7 Commit the rest of the changes that were intended to be part of r266826.
X-MFC-with:	r266826
2014-05-29 01:42:22 +00:00
truckman
6c648960e7 Initialize r_flags the same way in all cases using a sanitized copy of
flags that has several bits cleared. The RF_WANTED and RF_FIRSTSHARE
bits are invalid in this context, and we want to defer setting RF_ACTIVE
in r_flags until later.  This should make rman_get_flags() return
the correct answer in all cases.

Add a KASSERT() to catch callers which incorrectly pass the RF_WANTED
or RF_FIRSTSHARE flags.

Do a strict equality check on the share type bits of flags.  In
particular, do an equality check on RF_PREFETCHABLE.  The previous
code would allow one type of mismatch of RF_PREFETCHABLE but disallow
the other type of mismatch.  Also, ignore the the RF_ALIGNMENT_MASK
bits since alignment validity should be handled by the amask check.
This field contains an integer value, but previous code did a strange
bitwise comparison on it.

Leave the original value of flags unmolested as a minor debug aid.

Change the start+amask overflow check to a KASSERT() since it is just
meant to catch a highly unlikely programming error in the caller.

Reviewed by:	jhb
MFC after:	1 month
2014-05-28 16:57:17 +00:00
adrian
578ca42efc Add a new taskqueue setup method that takes a cpuid to pin the
taskqueue worker thread(s) to.

For now it isn't a taskqueue/taskthread error to fail to pin
to the given cpuid.

Thanks to rpaulo@, kib@ and jhb@ for feedback.

Tested:

* igb(4), with local RSS patches to pin taskqueues.

TODO:

* ask the doc team for help in documenting the new API call.
* add a taskqueue_start_threads_cpuset() method which takes
  a cpuset_t - but this may require a bunch of surgery to
  bring cpuset_t into scope.
2014-05-24 20:37:15 +00:00
bjk
7280c1da39 Check for mismatched vref()/vdrop()
Assert that the hold count has not fallen below the use count, a situation
that would only happen when a vref() (or similar) is erroneously paired
with a vdrop().  This situation has not been observed in the wild, but
could be helpful for someone implementing a new filesystem.

Reviewed by:	kib
Approved by:	hrs (mentor)
2014-05-21 03:11:27 +00:00
kib
235f4b1083 When exec_new_vmspace() decides that current vmspace cannot be reused
on execve(2), it calls vmspace_exec(), which frees the current
vmspace.  The thread executing an exec syscall gets new vmspace
assigned, and old vmspace is freed if only referenced by the current
process.  The free operation includes pmap_release(), which
de-constructs the paging structures used by hardware.

If the calling process is multithreaded, other threads are suspended
in the thread_suspend_check(), and need to be unsuspended and run to
be able to exit on successfull exec.  Now, since the old vmspace is
destroyed, paging structures are invalid, threads are resumed on the
non-existent pmaps (page tables), which leads to triple fault on x86.

To fix, postpone the free of old vmspace until the threads are resumed
and exited.  To avoid modifications to all image activators all of
which use exec_new_vmspace(), memoize the current (old) vmspace in
kern_execve(), and notify it about the need to call vmspace_free()
with a thread-private flag TDP_EXECVMSPC.

http://bugs.debian.org/743141

Reported by:	Ivo De Decker <ivo.dedecker@ugent.be> through secteam
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2014-05-20 09:19:35 +00:00
truckman
56f5be6dbc Slightly restructure the final loop in rman_reserve_resource_bound().
Replace with the existing loop termination test with a similar
condition from the nested "if" that may terminate the loop a bit
sooner, but still not too early.   This condition can then be removed
from the nested "if".  Relocate an operator to be style(9) compliant.

MFC after:	3 days
2014-05-19 04:44:27 +00:00
trasz
c857f75919 Initialize loginclass mutex using MTX_SYSINIT instead of using SI_SUB_CPU.
Suggested by:	rwatson@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2014-05-14 09:03:02 +00:00
truckman
c965763705 Be even more paranoid about overflow.
Requested by:	ache
2014-05-12 20:22:42 +00:00
truckman
a5f5868e49 Nuke a couple of unnecessary assigments. Nothing uses the values of rstart
and rend after this point.

MFC after:	1 week
2014-05-12 17:56:52 +00:00
jilles
3009646c96 accept(),accept4(): Don't set *addrlen = 0 on [ECONNABORTED].
If the underlying protocol reported an error (e.g. because a connection was
closed while waiting in the queue), this error was also indicated by
returning a zero-length address. For all other kinds of errors (e.g.
[EAGAIN], [ENFILE], [EMFILE]), *addrlen is unmodified and there are
successful cases where a zero-length address is returned (e.g. a connection
from an unbound Unix-domain socket), so this error indication is not
reliable.

As reported in Austin Group bug #836, modifying *addrlen on error may cause
subtle bugs if applications retry the call without resetting *addrlen.
2014-05-11 21:21:14 +00:00
cperciva
e9e7ccdea5 In cf_get_method, when we don't already know what clock speed the CPU is
running at, guess the nearest value instead of looking for a value within
25 MHz of the observed frequency.

Prior to this change, if a system booted with Intel Turbo Boost enabled,
the dev.cpu.0.freq sysctl is nonfunctional, since the ACPI-reported
frequency for Turbo Boost states does not match the actual clock frequency
(and thus no levels are within 25 MHz of the observed frequency) and the
current performance level is read before a new level is set.

MFC after:	3 days
Relnotes:	Bug fix in power management on CPUs with Intel Turbo Boost
2014-05-11 10:32:58 +00:00
adrian
65fc060c58 Add in support to optionally pin the swi threads.
Under enough load, the swi's can actually be preempted and migrated
to other currently free cores.  When doing RSS experiments, this lead
to the per-CPU TCP timers not lining up any more with the RX CPU said
flows were ending up on, leading to increased lock contention.

Since there was a little pushback on flipping them on by default,
I've left the default at "don't pin."

The other less obvious problem here is that the default swi
is also the same as the destination swi for CPU #0.  So if one
pins the swi on CPU #0, there's no default floating swi.

A nice future project would be to create a separate swi for
the "default" floating swi, as well as per-CPU swis that are
(optionally) pinned.

Tested:

* parallel TCP tests (2 x 1g unfortunately for now);
  CPU: Intel(R) Xeon(R) CPU E5-2650

Note:

This is based on some initial investigation into RSS/TCP stack lock
contention on FreeBSD-HEAD whilst at Netflix in January 2014.
2014-05-10 00:53:36 +00:00
truckman
c163eb4b72 Avoid unsigned integer overflow which can cause
rman_reserve_resource_bound() to return incorrect results.

Continue the initial search until the first viable region is found.

Add a comment to explain the search termination test.

PR:		kern/188534
Reviewed by:	jhb (previous version)
MFC after:	1 week
2014-05-05 15:59:31 +00:00
mjg
1b83ce1524 Request a non-exiting process in sysctl_kern_proc_{o,}filedesc
This fixes a race with exit1 freeing p_textvp.

Suggested by:	kib
MFC after:	1 week
2014-05-02 21:55:09 +00:00
brueffer
09b97bcfaf Free resources in an error case.
CID:		1018947
Found with:	Coverity Prevent(tm)
MFC after:	1 week
2014-05-02 21:34:17 +00:00
rwatson
7ca3cc4389 Garbage collect mtxpool_lockbuilder, the mutex pool historically used
for lockmgr and sx interlocks, but unused since optimised versions of
those sleep locks were introduced.  This will save a (quite) small
amount of memory in all kernel configurations.  The sleep mutex pool is
retained as it is used for 'struct bio' and several other consumers.

Discussed with:	jhb
MFC after:	3 days
2014-05-02 07:57:40 +00:00
mjg
7ec4134d19 Ignore the error from pipespace_new when creating a pipe.
It can fail if pipe map is exhausted (as a result of too many pipes created),
but it is not fatal and could be provoked by unprivileged users. The only
consequence is worse performance with given pipe.

Reported by:	ivoras
Suggested by:	kib
MFC after:	1 week
2014-05-02 00:52:13 +00:00
brooks
f9a10360bc Fix a 2038 bug.
If time_t is 64-bit (i.e. isn't 32-bit) allow any value of year, not
just years less than 2038.

Don't bother fixing the underflow in the case of years before 1903.

MFC after:	1 week
Sponsored by:	DARPA, AFRL
2014-05-01 22:28:14 +00:00
marius
6678ece656 Given that as of r258002 the last external user is gone, make sched_lock
static.
2014-04-29 20:51:57 +00:00
grehan
455d465e40 Bump WITNESS_PENDLIST by MAXCPU to account for the
pmap pvlist locks which are scaled by MAXCPU.

This allows an amd64 system to boot with MAXCPU set
to 256, which is currently FreeBSD's hard limit without
x2apic support.

Compile-tested for other arch's.

PR:	185831
Discussed with:		jhb
MFC after:	3 weeks
2014-04-29 17:22:29 +00:00
brooks
2e62cc3f35 Revert r263754, re-adding support for hw.bus.devctl_disable. Breaking
old devd's and thus hosts that get IP addresses from DHCP was too much
of a POLA violation.

The sysctl may be removed again after r263758 has been merged to at
least stable/9 and stable/10, and releases have been cut from those
branches.

Discussed with:	mjg
Reported by:	theraven, rwatson
2014-04-28 20:38:08 +00:00
scottl
62a64f0d2b Retire smp_active. It was racey and caused demonstrated problems with
the cpufreq code.  Replace its use with smp_started.  There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.

Discussed with: peter, jhb
MFC after: 3 days
Obtained from: Netflix
2014-04-26 20:27:54 +00:00
bdrewery
9c2b40ff13 Fix grammar error and trailing newline.
Submitted by:	danfe
MFC after:	3 days
2014-04-23 02:21:17 +00:00
ian
40b5add506 Fix a comment typo; conversion tables are for leap years, not leap seconds. 2014-04-20 13:37:22 +00:00
kib
201ea63020 Fix typo.
MFC after:	3 days
2014-04-17 18:13:23 +00:00
np
bf77eba909 Do not set M_BESTFIT if a strategy has already been provided. This
fixes problems when using M_FIRSTFIT.

Reviewed by:	jeff@
MFC after:	1 week
2014-04-16 21:39:43 +00:00
mav
5058ec1a0c Fix VIRTUAL and PROF interval timers for short intervals, broken at r247903.
Due to the way those timers are implemented, we can't handle very short
intervals.  In addition to that mentioned patch caused math overflows
for short intervals.  To avoid that round those intervals to 1 tick.

PR:		kern/187668
MFC after:	1 week
2014-04-16 18:37:46 +00:00
brueffer
e3e2951f67 Refine r264422: set buf to NULL only when we don't allocate memory,
and free buf unconditionally.

Requested by:	kib
MFC after:	1 week
2014-04-14 21:02:20 +00:00
brueffer
e539c25cad Free buf after usage.
CID:		1199377
Found with:	Coverity Prevent(tm)
MFC after:	1 week
2014-04-13 21:23:15 +00:00
davide
aeb4d19339 Hide internal details of sbintime_t implementation wrapping INT64_MAX into
SBT_MAX, to make it more robust in case internal type representation will
change in the future. All the consumers were migrated to SBT_MAX and
every new consumer (if any) should from now use this interface.

Requested by:	bapt, jmg, Ryan Lortie (implictly)
Reviewed by:	mav, bde
2014-04-12 23:29:29 +00:00
bdrewery
b749ffee6a Use proper MFSNAMELEN for fs type.
MFC after:	2 weeks
Reviewed by:	rodrigc
Also spotted by:ambrisko
2014-04-12 21:39:17 +00:00
davidxu
89ffe95c09 Add kqueue support for devctl.
Reviewed by:	kib,mjg
2014-04-10 02:30:51 +00:00
sbruno
d798d00aab sys/kern/imgact_binmisc.c -- free the right pointer mask vs magic
sys/sys/imagact_binmisc.h -- cleanup white space tabs vs spaces
                          -- remove stray " in comment

Submitted by:	jmallett@
2014-04-08 22:12:01 +00:00
sbruno
168c6df708 Add Stacey Son's binary activation patches that allow remapping of
execution to a emumation program via parsing of ELF header information.

With this kernel module and userland tool, poudriere is able to build
ports packages via the QEMU userland tools (or another emulator program)
in a different architecture chroot, e.g. TARGET=mips TARGET_ARCH=mips

I'm not connecting this to GENERIC for obvious reasons, but this should
allow the kernel module to be built by default and enable the building
of the userland tool (which automatically loads the kernel module).

Submitted by:	sson@
Reviewed by:	jhb@
2014-04-08 20:10:22 +00:00
ray
4fc5a92a24 Do not fill screen, while muted.
Sponsored by:	The FreeBSD Foundation
2014-04-07 22:37:13 +00:00
ed
fe8b776487 Thinko: don't forget to apply 'howto' in case init(8) isn't running. 2014-04-07 21:18:12 +00:00
ed
22932578aa Clean up shutdown_nice(). Just send the right signal to init(8).
Right now, init(8) cannot distinguish between an ACPI power button press
or a Ctrl+Alt+Del sequence on the keyboard. This is because
shutdown_nice() sends SIGINT to init(8) unconditionally, but later
modifies the arguments to reboot(2) to force a certain behaviour.

Instead of doing this, patch up the code to just forward the appropriate
signal to userspace. SIGUSR1 and SIGUSR2 can already be used to halt the
system.

While there, move waittime to the function where it's used; kern_reboot().
2014-04-07 21:11:29 +00:00
ed
87c17a9c66 Implement kqueue(2) for procdesc(4).
kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that
behaves the same, but operates on a procdesc(4) instead. Only implement
NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also
returns the exit status of the process, meaning that we can now obtain
this value, even if pdwait4(2) is still unimplemented.

Notes:

- Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will
  be used on totally different descriptor types, this should not clash.

- Let procdesc_kqops_event() reuse the same structure as filt_proc().
  The only difference is that procdesc_kqops_event() should also be able
  to deal with the case where the process was already terminated after
  registration. Simply test this when hint == 0.

- Fix some style(9) issues in filt_proc() to keep it consistent with the
  newly added procdesc_kqops_event().

- Save the exit status of the process in pd->pd_xstat, as we cannot pick
  up the proctree_lock from within procdesc_kqops_event().

Discussed on:	arch@
Reviewed by:	kib@
2014-04-07 18:10:49 +00:00
ed
2443e8f2ac Fix a typo. The function name is pdfork; not pfork. 2014-04-06 20:20:07 +00:00
ed
b97204b786 Nit: fix locking of p->p_state in procdesc_close().
According to <sys/proc.h>, this field needs to be locked with either the
p_mtx or the p_slock. In this case the damage was quite small. Instead
of being reaped, the process would just be reparented to init, so it
could be reaped from there.
2014-04-06 20:00:42 +00:00
kib
e62844041e Use realloc(9) instead of doing the reallocation inline.
Submitted by:	bde
MFC after:	1 week
2014-04-05 20:44:52 +00:00
dchagin
e0cab82fb2 Prevent alq from panic when the invalid alq_file path specified.
MFC after:	1 week
2014-04-05 16:54:47 +00:00
kib
e965005b68 When KN_INFLUX is set on the knote due to kqueue_register() or
kqueue_scan() unlocking the kqueue to call f_event, knote() or
knote_fork() should not skip the knote.  The knote is not going to
disappear during the influx time, and the mutual exclusion between
scan and knote() is ensured by both code pathes taking knlist lock.
The race appears since knlist lock is before kq lock, so KN_INFLUX
must be set, kq lock must be dropped and only then knlist lock can be
taken.  The window between kq unlock and knlist lock causes lost
events.

Add a flag KN_SCAN to indicate that KN_INFLUX is set in a manner safe
for the knote(), and check for it to ignore KN_INFLUX in the knote*()
as needed.  Also, in knote(), remove the lockless check for the
KN_INFLUX flag, which could also result in the lost notification.

Reported and tested by:	Kohji Okuno <okuno.kohji@jp.panasonic.com>
Discussed with:	jmg
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-04-05 14:09:16 +00:00
emaste
18755bc3d6 Initialise m_pkthdr via bzero instead of explicitly zeroing each member
Sponsored by:	The FreeBSD Foundation
2014-04-04 21:09:06 +00:00
davidxu
80a7722c32 Fix SIGIO delivery. Use fsetown() to handle file descriptor owner
ioctl and use pgsigio() to send SIGIO.

Submitted by:	truckman
Reviewed by:	mjg
2014-04-04 12:31:13 +00:00
mjg
439611d0ad Garbage collect fdavail.
It rarely returns an error and fdallocn handles the failure of fdalloc
just fine.
2014-04-04 05:07:36 +00:00
ian
0e63af86f9 Fix build breakage. Apparently all ARM configs build kern_et.c, but only a
few of them also build kern_clocksource.c.  That strikes me as insane, but
maybe there's a good reason for it.  Until I figure that out, un-break
the build by not referencing functions in kern_clocksource if NO_EVENTTIMERS
is defined.
2014-04-02 17:34:17 +00:00
ian
2b2f1d5e5c Add support for event timers whose clock frequency can change while running. 2014-04-02 15:56:11 +00:00
mjg
028af9e58b Document a known problem with handling the process intended to receive
SIGIO in /dev/devctl.

Suggested by:	adrian
MFC after:	6 days
2014-03-25 23:30:35 +00:00
mjg
6131eec5de Remove long obsolete sysctl hw.bus.devctl_disable.
Suggested by:	imp
Relnotes:	yes
2014-03-25 23:19:45 +00:00
mjg
47c4497087 Remove lockless check in devopen, while correct it does not make much sense.
Suggested by:	imp
MFC after:	6 days
2014-03-25 23:13:46 +00:00
mjg
ff55131045 Make /dev/devctl mpsafe.
MFC after:	1 week
2014-03-25 03:28:58 +00:00
emax
4d523456c7 change defaule permissions on /dev/devstat. while i'm here remove
D_NEEDGIANT flag

Submitted by:	jhb
Reviewed by:	jhb, scottl, rwatson, delphij, phk
MFC after:	1 week
2014-03-24 18:13:41 +00:00
neel
a2b3b7f5fa Don't lose track of the KTR entries copied from 'ktr_buf_init[]' to the
dynamically allocated 'ktr_buf[]'.

The memcpy arranges 'ktr_buf[]' such that the latest KTR entry is at
'KTR_BOOT_ENTRIES - 1'.
2014-03-22 22:35:57 +00:00
bdrewery
6fcf6199a4 Rename global cnt to vm_cnt to avoid shadowing.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from:	arch@
Sponsored by:	EMC / Isilon Storage Division
2014-03-22 10:26:09 +00:00
mjg
df8e97fc8b Mark the following sysctls as MPSAFE:
kern.file
kern.proc.filedesc
kern.proc.ofiledesc

MFC after:	7 days
2014-03-21 19:12:05 +00:00
kib
24c4e4a548 Fix two issues with /dev/mem access on amd64, both causing kernel page
faults.

First, for accesses to direct map region should check for the limit by
which direct map is instantiated.

Second, for accesses to the kernel map, success returned from the
kernacc(9) does not guarantee that consequent attempt to read or write
to the checked address succeed, since other thread might invalidate
the address meantime.  Add a new thread private flag TDP_DEVMEMIO,
which instructs vm_fault() to return error when fault happens on the
MAP_ENTRY_NOFAULT entry, instead of panicing.  The trap handler would
then see a page fault from access, and recover in normal way, making
/dev/mem access safer.

Remove GIANT_REQUIRED from the amd64 memrw(), since it is not needed
and having Giant locked does not solve issues for amd64.

Note that at least the second issue exists on other architectures, and
requires similar patching for md code.

Reported and tested by:	clusteradm (gjb, sbruno)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-03-21 14:25:09 +00:00
mjg
103a66d7d0 Take filedesc lock only for reading when allocating new fdtable.
Code populating the table does this already.

MFC after:	1 week
2014-03-21 01:34:19 +00:00
attilio
26e1531d4b Fix comments.
Sponsored by:	EMC / Isilon Storage Division
2014-03-19 12:45:40 +00:00
kib
b236080eb1 Make the array pointed to by AT_PAGESIZES auxv properly aligned.
Also, remove the expression which calculated the location of the
strings for a new image and grown over the time to be
non-comprehensible.  Instead, calculate the offsets by steps, which
also makes fixing the alignments much cleaner.

Reported and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-03-19 12:35:04 +00:00
attilio
060e6c4c4b Fix GENERIC build. 2014-03-19 00:38:27 +00:00
attilio
f931c33558 Regen per r263318.
Sponsored by:	EMC / Isilon storage division
2014-03-18 21:34:11 +00:00
attilio
25d02685fb Remove dead code from umtx support:
- Retire long time unused (basically always unused) sys__umtx_lock()
  and sys__umtx_unlock() syscalls
- struct umtx and their supporting definitions
- UMUTEX_ERROR_CHECK flag
- Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall

__FreeBSD_version is not bumped yet because it is expected that further
breakages to the umtx interface will follow up in the next days.
However there will be a final bump when necessary.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jhb
2014-03-18 21:32:03 +00:00
rwatson
33fdc14c0c Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after:	3 weeks
2014-03-16 10:55:57 +00:00
jmg
b66f059b49 change td_retval into a union w/ off_t, with defines to mask the
change...  This eliminates a cast, and also forces td_retval
(often 2 32-bit registers) to be aligned so that off_t's can be
stored there on arches with strict alignment requirements like
armeb (AVILA)...  On i386, this doesn't change alignment, and on
amd64 it doesn't either, as register_t is already 64bits...

This will also prevent future breakage due to people adding additional
fields to the struct...

This gets AVILA booting a bit farther...

Reviewed by:	bde
2014-03-16 00:53:40 +00:00
glebius
80e85e32a5 Remove AppleTalk support.
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.

Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 06:29:43 +00:00
glebius
d494babace Remove IPX support.
IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.

Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 02:58:48 +00:00
bdrewery
fd2362b1d8 Combine similar code from vprintf(9) and log(9).
MFC after:	2 weeks
2014-03-14 01:17:11 +00:00
asomers
b56bd6b02a Replace 4.4BSD Lite's unix domain socket backpressure hack with a cleaner
mechanism, based on the new SB_STOP sockbuf flag.  The old hack dynamically
changed the sending sockbuf's high water mark whenever adding or removing
data from the receiving sockbuf.  It worked for stream sockets, but it never
worked for SOCK_SEQPACKET sockets because of their atomic nature.  If the
sockbuf was partially full, it might return EMSGSIZE instead of blocking.

The new solution is based on DragonFlyBSD's fix from commit
3a6117bbe0ed6a87605c1e43e12a1438d8844380 on 2008-05-27.  It adds an SB_STOP
flag to sockbufs.  Whenever uipc_send surpasses the socket's size limit, it
sets SB_STOP on the sending sockbuf.  sbspace() will then return 0 for that
sockbuf, causing sosend_generic and friends to block.  uipc_rcvd will
likewise clear SB_STOP.  There are two fringe benefits: uipc_{send,rcvd} no
longer need to call chgsbsize() on every send and receive because they don't
change the sockbuf's high water mark.  Also, uipc_sense no longer needs to
acquire the UIPC linkage lock, because it's simpler to compute the
st_blksizes.

There is one drawback: since sbspace() will only ever return 0 or the
maximum, sosend_generic will allow the sockbuf to exceed its nominal maximum
size by at most one packet of size less than the max.  I don't think that's
a serious problem.  In fact, I'm not even positive that FreeBSD guarantees a
socket will always stay within its nominal size limit.

sys/sys/sockbuf.h
	Add the SB_STOP flag and adjust sbspace()

sys/sys/unpcb.h
	Delete the obsolete unp_cc and unp_mbcnt fields from struct unpcb.

sys/kern/uipc_usrreq.c
	Adjust uipc_rcvd, uipc_send, and uipc_sense to use the SB_STOP
	backpressure mechanism.  Removing obsolete unpcb fields from
	db_show_unpcb.

tests/sys/kern/unix_seqpacket_test.c
	Clear expected failures from ATF.

Obtained from:	DragonFly BSD
PR:		kern/185812
Reviewed by:	silence from freebsd-net@ and rwatson@
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-03-13 18:42:12 +00:00
kib
801489b5e7 Use correct types for sizeof() in the calculations for the malloc(9) sizes [1].
While there, remove unneeded checks for failed allocations with M_WAITOK flag.

Submitted by:	Conrad Meyer <cemeyer@uw.edu> [1]
MFC after:	1 week
2014-03-12 10:25:26 +00:00
kib
f7d0f51921 The auio structure is only initialized when the vnode is symlink,
avoid reading from it otherwise.

Submitted by:	Conrad Meyer <cemeyer@uw.edu>
MFC after:	1 week
2014-03-12 10:23:51 +00:00
jeff
216dedc4bf - Make runq_steal_from more aggressive. Previously it would examine only
a single priority queue.  If that queue had a thread or threads which
   could not be migrated we would fail to steal load.  This could cause
   starvation in situations where cores are idle.

Submitted by:	Doug Kilpatrick <dkilpatrick@isilon.com>
Tested by:	pho
Reviewed by:	mav
Sponsored by:	EMC / Isilon Storage Division
2014-03-08 00:35:06 +00:00
asomers
e2a82966a6 Partial revert of change 262914. I screwed up subversion syntax with
perforce syntax and committed some unrelated files.  Only devd files
should've been committed.

Reported by: 	imp
Pointy hat to:	asomers
MFC after:	3 weeks
X-MFC-With:	r262914
2014-03-07 23:40:36 +00:00
asomers
94889417b9 sbin/devd/devd.8
sbin/devd/devd.cc
	Add a -q flag to devd that will suppress syslog logging at
	LOG_NOTICE or below.

Requested by:	ian@ and imp@
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-03-07 23:30:48 +00:00
asomers
2a6c6c59a2 Fix PR kern/185813 "SOCK_SEQPACKET AF_UNIX sockets with asymmetrical
buffers drop packets".  It was caused by a check for the space available
in a sockbuf, but it was checking the wrong sockbuf.

sys/sys/sockbuf.h
sys/kern/uipc_sockbuf.c
    Add sbappendaddr_nospacecheck_locked(), which is just like
    sbappendaddr_locked but doesn't validate the receiving socket's
    space.  Factor out common code into sbappendaddr_locked_internal().
    We shouldn't simply make sbappendaddr_locked check the space and
    then call sbappendaddr_nospacecheck_locked, because that would cause
    the O(n) function m_length to be called twice.

sys/kern/uipc_usrreq.c
    Use sbappendaddr_nospacecheck_locked for SOCK_SEQPACKET sockets,
    because the receiving sockbuf's size limit is irrelevant.

tests/sys/kern/unix_seqpacket_test.c
    Now that 185813 is fixed, pipe_128k_8k fails intermittently due to
    185812.  Make it fail every time by adding a usleep after starting
    the writer thread and before starting the reader thread in
    test_pipe.  That gives the writer time to fill up its send buffer.
    Also, clear the expected failure message due to 185813.  It actually
    said "185812", but that was a typo.

PR:		kern/185813
Reviewed by:	silence from freebsd-net@ and rwatson@
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-03-06 20:24:15 +00:00
dim
e42ec49846 Merge from head up to r262415. 2014-02-23 23:33:11 +00:00
dim
b1b5b8cb84 On sparc64, VM_KMEM_SIZE_SCALE is not a constant expression, so it
cannot be tested in a CTASSERT().
2014-02-23 17:37:24 +00:00
bdrewery
e6b4042303 Fix style of comment blocks.
Reported by:	peter
Approved by:	bapt (mentor, implicit)
X-MFC with:	r262006
2014-02-22 04:28:49 +00:00
markj
19ec16208e Print a backtrace if the SDT(9) stub gets called so that there's at least
some hope of figuring out how it happened.

Suggested by:	rstone
MFC after:	1 week
2014-02-22 01:41:45 +00:00
mjg
1c3ca2a367 Fix a race between kern_proc_{o,}filedesc_out and fdescfree leading
to use-after-free.

fdescfree proceeds to free file pointers once fd_refcnt reaches 0, but
kern_proc_{o,}filedesc_out only checked for hold count.

MFC after:	3 days
2014-02-21 22:29:09 +00:00
bdrewery
0db3f6b736 Fix M_FILEDESC leak in fdgrowtable() introduced in r244510.
fdgrowtable() now only reallocates fd_map when necessary.

This fixes fdgrowtable() to use the same logic as fdescfree() for
when to free the fd_map. The logic in fdescfree() is intended to
not free the initial static allocation, however the fd_map grows
at a slower rate than the table does. The table is intended to hold
20 fd, but its initial map has many more slots than 20.  The slot
sizing causes NDSLOTS(20) through NDSLOTS(63) to be 1 which matches
NDSLOTS(20), so fdescfree() was assuming that the fd_map was still
the initial allocation and not freeing it.

This partially reverts r244510 by reintroducing some of the logic
it removed in fdgrowtable().

Reviewed by:	mjg
Approved by:	bapt (mentor)
MFC after:	2 weeks
2014-02-17 00:00:39 +00:00
bdrewery
d8cb95cb17 Remove redundant memcpy of fd_ofiles in fdgrowtable() added in r247602
Discussed with:	mjg
Approved by:	bapt (mentor)
MFC after:	2 weeks
2014-02-16 23:10:46 +00:00
adrian
6558baa4b2 Include the CPU id in the per-CPU timer swi thread descriptions.
Original patch by:	jhb
2014-02-14 23:19:51 +00:00
pluknet
5d77eeed31 Preserve one character space for a trailing '\0'.
Found by:	Ivan Klymenko via cppcheck
Discussed with: ae
MFC after:	1 week
2014-02-14 20:54:03 +00:00
brueffer
26f5cbe9dc Fix a bug in be_uuid_dec(); it called le16dec() instead of be16dec(),
probably due to copy+pasting le_uuid_dec().

PR:		146588
Submitted by:	Erwin Rol <erwin at erwinrol.com>
Reviewed by:	marcel
MFC after:	1 week
2014-02-13 22:24:36 +00:00
ian
4ca4e5e369 Rework the EARLY_PRINTF mechanism. Instead of defining a special eprintf()
routine, now a platform can provide a pointer to an early_putc() routine
which is used instead of cn_putc().  Control can be handed off from early
printf support to standard console support by NULLing out the pointer
during standard console init.

This leverages all the existing error reporting that uses printf calls,
such as panic() which can now be usefully employed even in early
platform init code (useful at least to those who maintain that code and
build kernels with EARLY_PRINTF defined).

Reviewed by:	imp, eadler
2014-02-12 00:53:38 +00:00
jhb
57d1391321 Expose OBJT_MGTDEVICE VM objects used for GEM/TTM with drm2 as an
explicit object type.

Reviewed by:	kib
MFC after:	1 week
2014-02-11 21:57:37 +00:00
glebius
45bf1cc683 Create two public UMA_ZONE_PCPU zones: 64 bit sized and pointer sized.
Sponsored by:	Nginx, Inc.
2014-02-10 19:59:46 +00:00
glebius
bec9d523c2 Remove identical vnet sysctl handlers, and handle CTLFLAG_VNET
in the sysctl_root().

Note: SYSCTL_VNET_* macros can be removed as well. All is
  needed to virtualize a sysctl oid is set CTLFLAG_VNET on it.
  But for now keep macros in place to avoid large code churn.

Sponsored by:	Nginx, Inc.
2014-02-07 13:47:33 +00:00
jhb
94d685456e Drop the 3rd clause from all 3 clause BSD licenses where I am the sole
holder to convert them to 2 clause BSD licenses.

MFC after:	1 week
2014-02-05 18:13:27 +00:00
nwhitehorn
a7d50b7d03 ULE works on Book-E since r258002, so remove statements to the contrary. 2014-02-01 20:46:35 +00:00
jamie
64b15ec174 Back out r261266 pending security buy-in.
r261266:
  Add a jail parameter, allow.kmem, which lets jailed processes access
  /dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
  This in conjunction with changing the drm driver's permission check from
  PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.
2014-01-31 17:39:51 +00:00
kib
f0cb8e7d88 The posix_madvise(3) and posix_fadvise(2) should return error on
failure, same as posix_fallocate(2).

Noted by:	Bob Bishop <rb@gid.co.uk>
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-01-30 18:04:39 +00:00
jamie
223bb594b0 Add a jail parameter, allow.kmem, which lets jailed processes access
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.

Submitted by:	netchild
MFC after:	1 week
2014-01-29 13:41:13 +00:00
jmg
17e2456463 fix spelling of lock_initialized.. jhb approved..
MFC after:	1 week
2014-01-28 17:27:54 +00:00
csjp
800e6812e2 Allow sigwait(2) in capabilities mode.
It's common for multi-threaded processes to create a thread for
the purpose of synchronously processing signals. Allow such processes to
utilize a capabilities sandbox.

Discussed with:	rwatson, pjd
MFC after:	2 weeks
2014-01-28 01:49:49 +00:00
rmh
a40185286e Accept O_CLOEXEC in shm_open().
Reviewed by:	jilles, jhb
MFC after:	1 week
2014-01-24 21:05:07 +00:00
kib
05b9ae7031 The posix_fallocate(2) syscall should return error number on error,
without modifying errno.

Reported and tested by:	Gennady Proskurin <gpr@mail.ru>
Reviewed by:	mdf
PR:	standards/186028
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-01-23 17:24:26 +00:00