
14256 Commits

Author SHA1 Message Date
Dmitry Chagin
1aa90eca33 In preparation for switching the Linuxulator to use native 1:1
threads, refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart that takes a targettd
parameter, allowing the caller (the new Linuxulator) to specify the target thread directly.

The Linuxulator temporarily uses the first thread in the process.

Move linux_sched_rr_get_interval() to the MI part.

Differential Revision:	https://reviews.freebsd.org/D1032
Reviewed by:	trasz
2015-05-24 14:39:26 +00:00
Dmitry Chagin
09baafb471 In preparation for switching the Linuxulator to use native 1:1
threads, introduce kern_thr_alloc(), which will be used later in
linux_clone().

Differential Revision:	https://reviews.freebsd.org/D1029
Reviewed by:	trasz
2015-05-24 14:37:45 +00:00
Dmitry Chagin
95be6d2b1f In preparation for switching the Linuxulator to use native 1:1
threads, split sys_thr_exit() into sys_thr_exit() and kern_thr_exit().
The latter will be used later in the linux_exit() system call.

Differential Revision:	https://reviews.freebsd.org/D1028
Reviewed by:	trasz
2015-05-24 14:36:33 +00:00
Konstantin Belousov
3077f938b4 If a thread requested not to stop on a non-boundary, then not only
stopping signals but also all forms of single-threading should obey the request.
Otherwise, the thread might sleep interruptibly while owning some
resources, and the single-threading thread could try to access them.
An example is owning a vnode lock while dumping core.

Submitted by:	Conrad Meyer
Review:	https://reviews.freebsd.org/D2612
Tested by:	pho
MFC after:	1 week
2015-05-23 19:09:04 +00:00
Warner Losh
ee960398a0 Fix typo in symbol name. It helps to hit save in all your buffers
before committing.
2015-05-22 21:10:14 +00:00
Warner Losh
d36eec691a Export the eflags field from the ELF header. This allows better
discrimination between different subarch binaries, at least for MIPS
and ARM. ARM is implemented; MIPS is still TBD and so is not currently
exported. aarch64 does not export this because aarch64 binaries use
different tags and flags than ARM.

Differential Revision: https://reviews.freebsd.org/D2611
2015-05-22 20:50:35 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years in head.  However, it is still routinely misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
John Baldwin
21119d0641 Expand ktr_mask to be a 64-bit unsigned integer.
The mask does not really need to be updated with atomic operations and
the downside of losing races during transitions is not great (it is
not marked volatile, so those races are pretty wide open as it is).

Differential Revision:	https://reviews.freebsd.org/D2595
Reviewed by:	emaste, neel, rpaulo
MFC after:	2 weeks
2015-05-22 11:09:41 +00:00
John Baldwin
c209e3e2e6 Only reparent a traced process to its old parent if the tracing process is
not the old parent. Otherwise, proc_reap() will leave the zombie in place
resulting in the process' status being returned twice to its parent.

Add test cases for PT_TRACE_ME and PT_ATTACH which are fixed by
this change.

Differential Revision:	https://reviews.freebsd.org/D2594
Reviewed by:	kib
MFC after:	2 weeks
2015-05-22 11:04:54 +00:00
John Baldwin
c636f94bd2 Revert r282971. It depends on condvar consumers not destroying condvars
until all threads sleeping on a condvar have resumed execution after being
awakened.  However, there are cases where that guarantee is very hard to
provide.
2015-05-21 16:43:26 +00:00
Pedro F. Giffuni
cd508278c1 ddb: finish converting boolean values.
The replacement started at r283088 was necessarily incomplete without
replacing boolean_t with bool.  This also involved cleaning some type
mismatches and ansifying old C function declarations.

Pointed out by:	bde
Discussed with:	bde, ian, jhb
2015-05-21 15:16:18 +00:00
Mariusz Zaborski
963bc7a03f Fix memory leak.
Approved by:	pjd (mentor)
2015-05-20 17:48:22 +00:00
Mariusz Zaborski
1acd888ff5 Style.
Approved by:	pjd (mentor)
2015-05-20 17:47:01 +00:00
Mariusz Zaborski
823870acb8 Always use the nv_free function.
Approved by:	pjd (mentor)
2015-05-20 17:44:58 +00:00
Alan Somers
7a9c38e681 Properly null-terminate strings in a kernel dump header. A version string
longer than 192 bytes will cause the version field of a dump header to
overflow. strncpy(3) does not null-terminate it, so savecore(8) will print a
corrupted info file. Using strlcpy(3) fixes the bug.

Differential Revision:	https://reviews.freebsd.org/D2560
Reviewed by:		markj
MFC after:		3 weeks
Sponsored by:		Spectra Logic
2015-05-19 16:23:47 +00:00
Mateusz Guzik
747c0dd67c fd: fix imbalanced fdp unlock in F_SETLK and F_GETLK
MFC after:	3 days
2015-05-18 14:27:04 +00:00
Mateusz Guzik
c3293b83c4 Tidy up sys_umask a little bit
Consistently use the saved fdp pointer, as it cannot change. If it could change,
the code would already be incorrect.

No functional changes.
2015-05-18 13:43:33 +00:00
John Baldwin
5c894ee2ef Previously, cv_waiters was only updated by cv_signal or cv_wait. If a
thread awakened due to a timeout, then cv_waiters was not decremented.
If INT_MAX threads timed out on a cv without an intervening cv_broadcast,
then cv_waiters could overflow. To fix this, have each sleeping thread
decrement cv_waiters when it resumes.

Note that previously cv_waiters was protected by the sleepq chain lock.
However, that lock is not held when threads resume from sleep. In
addition, the interlock is also not always reacquired after resuming
(cv_wait_unlock), nor is it always held by callers of cv_signal() or
cv_broadcast(). Instead, use atomic ops to update cv_waiters. Since
the sleepq chain lock is still held on every increment, it should
still be safe to compare cv_waiters against zero while holding the
lock in the wakeup routines, as the only way to lose the race would
result in extra calls to sleepq_signal() or sleepq_broadcast().

Differential Revision:	https://reviews.freebsd.org/D2427
Reviewed by:	benno
Reported by:	benno (wrap of cv_waiters in the field)
MFC after:	2 weeks
2015-05-15 13:50:37 +00:00
Konstantin Belousov
100ac78be1 On amd64, make proc0 pmap initialization slightly more correct. In
particular, switch to the proc0 pmap to have expected %cr3 and PCID
for the thread0 during initialization, and the up to date pm_active
mask.

pmap_pinit0() should be done after proc0->p_vmspace is assigned so
that the amd64 pmap_activate() finds the correct curproc pmap.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-05-15 08:30:29 +00:00
Konstantin Belousov
84cdea97e5 Right now, the process' p_boundary_count counter is decremented by the
suspended thread itself, on the return path from
thread_suspend_check().  A consequence is that a return from
thread_single_end(SINGLE_BOUNDARY) may leave p_boundary_count
non-zero; it might even be equal to the thread count.

Now, assume that we have two threads in the process, both calling
execve(2).  Suppose that the first thread won the race to be the
suspension thread, and that afterward its exec failed for any reason.
After the first thread did thread_single_end(SINGLE_BOUNDARY), the second
thread becomes the process suspension thread and checks
p_boundary_count.  The non-zero value of the count allows the
suspension loop to finish without actually suspending some threads.
In other words, we enter exec code with some threads not suspended.

Fix this by decrementing p_boundary_count in the
thread_single_end()->thread_unsuspend_one() when marking the thread
as runnable.  This way, a return from thread_single_end() guarantees
that the counter is cleared.  We do not care whether the unsuspended
thread has a chance to run.

Add some asserts to ensure the state of the process when single
boundary suspension is lifted.  Also make thread_unsuspend_one()
static.

In collaboration with:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-15 07:54:31 +00:00
Jonathan Anderson
60aa2c85fa Allow sizeof(cpuset_t) to be queried in capability mode.
This allows functions that retrieve and inspect pthread_attr_t objects to
work correctly: querying the cpuset_t size is part of querying CPU
affinity information, which is part of creating a complete pthread_attr_t.

Approved by: rwatson (mentor)
Reviewed by: pjd
Sponsored by: NSERC
2015-05-14 15:14:03 +00:00
Edward Tomasz Napierala
ba8f0eb8fc Build GENERIC with RACCT/RCTL support by default. Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.

Differential Revision:	https://reviews.freebsd.org/D2407
Reviewed by:	emaste@, wblock@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-05-14 14:03:55 +00:00
Konstantin Belousov
7b445033ff On exec, single-threading must be enforced before arguments space is
allocated from exec_map.  If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptibly waiting for the map space.  Then the thread which won
the race for the space allocation cannot single-thread the process,
causing a deadlock.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-10 09:00:40 +00:00
Konstantin Belousov
44ec2b63c5 The vmem callback to reclaim kmem arena address space on low or
fragmented conditions currently just wakes up the pagedaemon.  The
kmem arena is significantly smaller than the total available physical
memory, which means that there are loads where kmem arena space could
be exhausted while many pages are still available.  The
woken-up pagedaemon sees vm_pages_needed != 0, verifies the condition
vm_paging_needed(), which is false, clears the pass and goes back to
sleep, calling neither uma_reclaim() nor the lowmem handler.

To handle low kmem arena conditions, create an additional pagedaemon
thread, which calls uma_reclaim() directly.  The thread sleeps on the
dedicated channel and kmem_reclaim() wakes the thread in addition to
the pagedaemon.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-09 20:08:36 +00:00
Konstantin Belousov
ac437c0754 Do not return from thread_single(SINGLE_BOUNDARY) until all stopped
threads are guaranteed to be removed from the processors.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-09 18:32:13 +00:00
Andrey V. Elsukov
089bb672c4 m_dup() is supposed to give a writable copy of an mbuf chain. It uses
m_dup_pkthdr(), that uses M_COPYFLAGS mask to copy m_flags field.
If the original mbuf chain has the M_RDONLY flag, its copy will have it too.
Reset this flag explicitly.

MFC after:	2 weeks
2015-05-07 18:35:01 +00:00
Mateusz Guzik
edf1796d3e Fix up panics when fork fails due to hitting the proc limit
The function clearing credentials on failure asserts that the process is a
zombie, which is not true when fork fails.

Changing creds to NULL is unnecessary, but is still being done for
consistency with other code.

Pointy hat: mjg
Reported by: pho
2015-05-06 21:03:19 +00:00
Ian Lepore
28315e27a7 Implement a mechanism for making changes in the kernel<->driver PPS
interface without breaking ABI or API compatibility with existing drivers.

The existing data structures used to communicate between the kernel and
driver portions of PPS processing contain no spare/padding fields and no
flags field or other straightforward mechanism for communicating changes
in the structures or behaviors of the code.  This makes it difficult to
MFC new features added to the PPS facility.  ABI compatibility is
important; out-of-tree drivers in module form are known to exist.  (Note
that the existing api_version field in the pps_params structure must
contain the value mandated by RFC 2783 and any RFCs that come along after.)

These changes introduce a pair of abi-version fields which are filled in
by the driver and the kernel respectively to indicate the interface
version.  The driver sets its version field before calling the new
pps_init_abi() function.  That lets the kernel know how much of the
pps_state structure the driver understands, so it can avoid using
newer fields at the end of the structure when the driver's ABI version
is lower.  The kernel fills in its version field during the init
call, letting the driver know what features and data the kernel supports.

To implement the new version information in a way that is backwards
compatible with code from before these changes, the high bit of the
lightly-used 'kcmode' field is repurposed as a flag bit that indicates the
driver is aware of the abi versioning scheme.  Basically if this bit is
clear that indicates a "version 0" driver and if it is set the driver_abi
field indicates the version.

These changes also move the recently-added 'mtx' field of pps_state from
the middle to the end of the structure, and make the kernel code that uses
this field conditional on the driver being abi version 1 or higher.  It
changes the only driver currently supplying the mtx field, usb_serial, to
use pps_init_abi().

Reviewed by:	hselasky@
2015-05-04 17:59:39 +00:00
Mariusz Zaborski
a523fa069f nv_malloc can fail in userland.
Add check to prevent a NULL pointer dereference.

Pointed out by:	mjg
Approved by:	pjd (mentor)
2015-05-02 18:12:34 +00:00
Mariusz Zaborski
da20d06f84 Remove duplicated code by using a macro template for the nvlist_add_.* functions.
Approved by:	pjd (mentor)
2015-05-02 18:10:45 +00:00
Mariusz Zaborski
169c153b59 Introduce the NV_FLAG_NO_UNIQUE flag. When set, it allows multiple
values to be stored under the same key in an nvlist.

Approved by:	pjd (mentor)
Obtained from:	WHEEL Systems (http://www.wheelsystems.com)

Update man page.

Reviewed by:	AllanJude
Approved by:	pjd (mentor)
2015-05-02 18:03:47 +00:00
Mariusz Zaborski
bd1da0a002 Approved, except for the use of RESTORE_ERRNO() to set errno.
Change the nvlist_recv() function to take an additional argument that
specifies the flags expected on the received nvlist. Receiving an nvlist with
a different set of flags than the ones we expect might lead to undefined
behaviour, which could be dangerous.

Update consumers of this and related functions and update the tests.

Approved by:	pjd (mentor)

Update man page for nvlist_unpack, nvlist_recv, nvlist_xfer, cap_recv_nvlist
and cap_xfer_nvlist.

Reviewed by:	AllanJude
Approved by:	pjd (mentor)
2015-05-02 17:45:52 +00:00
Bjoern A. Zeeb
44aa151e1b Fix an off-by-one bug in string/array handling which led to a memory overwrite
and follow-up assertion failures on at least ARM after r282257,
with nvp_magic being 0x6e7600:
Assertion failed: ((nvp)->nvp_magic == 0x6e7670), function nvpair_name, file .../subr_nvpair.c, line 713.

Sponsored by:	DARPA/AFRL
2015-05-02 08:31:16 +00:00
Mark Johnston
9ad64f27be Remove a stale reference to the stop_scheduler_on_panic tunable, which
itself was removed in r243515.

MFC after:	1 week
2015-05-02 00:27:58 +00:00
Mariusz Zaborski
24f93ee714 Add the nvlist_flags() function, which returns an nvlist's public flags.
Approved by:	pjd (mentor)
2015-05-01 17:50:24 +00:00
Mariusz Zaborski
b7be86aca6 Mark local function as static as a result of removing recursion.
Approved by:	pjd (mentor)
2015-04-30 20:50:42 +00:00
Mariusz Zaborski
45fd5ced7b Rename macros to use the ERRNO prefix. Add the ERRNO_SET macro. Now
ERRNO_{RESTORE/SAVE} must be used together; an additional variable is not
needed. Always use the ERRNO_{SAVE/RESTORE/SET} macros.

Approved by:	pjd (mentor)
2015-04-30 20:47:33 +00:00
John Baldwin
ed95805e90 Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
Mariusz Zaborski
be73922fcd Save errno so that close() does not override it.
Approved by:	pjd (mentor)
2015-04-29 22:59:44 +00:00
Mariusz Zaborski
0bb5e6ef80 Remove the nvlist_.*[fv] functions.
Those functions are problematic, because there is no way to report
memory allocation problems without complicating the API, so we can
either abort or potentially return invalid results, neither of which is
acceptable.

In most cases the caller knows the size of the name, so it can allocate
a buffer on the stack and use snprintf(3) to prepare the name.

After some discussion, the conclusion was to remove those functions,
which also simplifies the API.

Discussed with: pjd, rstone
Approved by:	pjd (mentor)
2015-04-29 22:57:04 +00:00
Mariusz Zaborski
3cfb71c186 Remove recursion from descriptor-related functions.
Approved by:	pjd (mentor)
2015-04-29 22:15:02 +00:00
Mariusz Zaborski
003e3ea15b Always use the nv_malloc macro instead of malloc(3).
Approved by:	pjd (mentor)
2015-04-29 21:54:34 +00:00
Mariusz Zaborski
906289c2ec Style fixes.
Approved by:	pjd (mentor)
2015-04-29 21:50:04 +00:00
Edward Tomasz Napierala
4b5c9cf62f Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

Differential Revision:	https://reviews.freebsd.org/D2369
Reviewed by:	kib@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-04-29 10:23:02 +00:00
Edward Tomasz Napierala
aa32b5e076 Make setproctitle(3) work in Capsicum capability mode. This allows
ctld(8) child processes to indicate the initiator address and name in
their titles, similar to what iscsid(8) child processes do.

PR:		181352
Differential Revision:	https://reviews.freebsd.org/D2363
Reviewed by:	rwatson@, mjg@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-27 11:18:16 +00:00
Konstantin Belousov
9a2c85350a Partially revert r255986: do not call VOP_FSYNC() when helping
bufdaemon in getnewbuf(); use buf_flush() instead.  The difference is that
bufdaemon uses TRYLOCK to get buffer locks, which allows calls to
getnewbuf() while another buffer is locked.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-27 11:13:19 +00:00
Konstantin Belousov
f16f8610bb Fix locking for oshmctl() and shmsys().
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-27 11:12:51 +00:00
Mateusz Guzik
8d0a4ab212 fd: plug an always overwritten initialization in fdalloc
2015-04-26 17:27:55 +00:00
Mateusz Guzik
203322f966 Consistently use p instead of td->td_proc in create_thread
No functional changes.
2015-04-26 17:22:59 +00:00
Rick Macklem
7cfdc2a7bc MAXBSIZE defines both the largest UFS block size and the
largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger reads/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
are fixed as well.

Differential Revision:	https://reviews.freebsd.org/D2330
Reviewed by:	mav, kib
MFC after:	2 weeks
2015-04-25 00:52:01 +00:00