13817 Commits

Author SHA1 Message Date
Gleb Smirnoff
1967edba02 Provide m_catpkt(), a wrapper around m_cat() that deals with M_PKTHDR mbufs.
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-04 09:07:14 +00:00
Mateusz Guzik
2570cdd605 Plug a hypothetical use after free in sysctl kern.proc.groups.
MFC after:	1 week
2014-09-04 01:21:33 +00:00
Benno Rice
c079e1c018 Add KASSERTs to catch the case where a developer may have forgotten to
set bo_bsize on a bufobj.

This is a slight modification of the patch provided.

PR:		193146
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by:	EMC Isilon Storage Division
2014-09-04 00:10:06 +00:00
Konstantin Belousov
624bf9e134 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-03 08:40:16 +00:00
Konstantin Belousov
8626a0ddc6 Retire thread_unthread(), it has only one caller. Update comment in
the block of code before the previous call to thread_unthread().

Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-03 08:35:42 +00:00
Konstantin Belousov
fd229b5b75 Right now, thread_single(SINGLE_EXIT) returns after the p_numthreads
reaches 1. The p_numthreads counter is decremented in thread_exit() by
a call to thread_unlink(). This means that the exiting threads may
still execute on other CPUs when thread_single(SINGLE_EXIT) returns.
As a result, vmspace could be destroyed while paging structures are
still used on other CPUs by exiting threads.

Delay the return from thread_single(SINGLE_EXIT) until all threads are
really destroyed by thread_stash() after the last switch out. The
p_exitthreads counter already provides the required mechanism, move
the wait from the thread_wait() (which is called from wait(2) code)
into thread_single().

Reported by:	many (as "panic: pmap active <addr>")
Reviewed by:	alc, jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-03 08:18:07 +00:00
Gleb Smirnoff
5b5477d762 Fix dereference after NULL check.
CID:		1234607
Sponsored by:	Nginx, Inc.
2014-09-03 08:14:07 +00:00
Mateusz Guzik
9152087ea7 Fix up proc_realparent to always return correct process.
Prior to the change it would always return initproc for non-traced processes.

This fixes ps apparently always returning 1 as ppid.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-09-03 06:25:34 +00:00
Alan Cox
01a8fb7db5 Automatically prefault a limited number of mappings to resident pages in
shmat(2), just like mmap(2).

MFC after:	5 days
Sponsored by:	EMC / Isilon Storage Division
2014-08-31 17:38:41 +00:00
Mateusz Guzik
6662ce5aab Add missing proctree locking to fill_kinfo_proc consumers.
This fixes r270444.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-08-30 03:10:55 +00:00
Andreas Tobler
5be725d7e8 Rename shm_dict_init to shm_init to fix a compiler warning.
Reviewed by:	jhb
2014-08-29 21:50:32 +00:00
John Baldwin
610a2b3c45 Use a unit number allocator to provide suitable st_dev and st_ino values
for POSIX shared memory descriptors.  The implementation is similar to
that used for pipes.

MFC after:	1 week
2014-08-29 18:18:29 +00:00
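The idea of handing out small unique numbers for st_dev and st_ino values can be illustrated with a minimal userland sketch. This is not the kernel's allocator and not the code from the commit above; `struct unit_pool`, `unit_alloc()`, `unit_free()` and `UNIT_MAX` are hypothetical names chosen for illustration, and the bitmap approach is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define UNIT_MAX 64	/* hypothetical pool size for this sketch */

/* One bit per unit number; a set bit means the number is in use. */
struct unit_pool {
	uint64_t used;
};

/* Return the lowest free unit number, or -1 if the pool is exhausted. */
int
unit_alloc(struct unit_pool *p)
{
	for (int i = 0; i < UNIT_MAX; i++) {
		if ((p->used & (UINT64_C(1) << i)) == 0) {
			p->used |= UINT64_C(1) << i;
			return (i);
		}
	}
	return (-1);
}

/* Release a unit number so that a later allocation can reuse it. */
void
unit_free(struct unit_pool *p, int unit)
{
	assert(unit >= 0 && unit < UNIT_MAX);
	assert((p->used & (UINT64_C(1) << unit)) != 0);
	p->used &= ~(UINT64_C(1) << unit);
}
```

A descriptor would take a number from the pool when it is created and return it when it is destroyed, so numbers stay unique among live objects while being reused over time.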
Konstantin Belousov
575e02d94f Add function and wrapper to switch lockmgr and vnode lock back to
auto-promotion of shared to exclusive.

Tested by:	hrs, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-29 09:02:01 +00:00
Mateusz Guzik
8b04bbef31 Return real parent pid in kinfo (used by e.g. ps)
Add a separate field which exports tracer pid and add a new keyword
("tracer") for ps to display it.

This is a follow up to r270444.

Reviewed by:	kib
MFC after:	1 week
Relnotes:	yes
2014-08-28 08:41:11 +00:00
Jean-Sébastien Pédron
3e206539a1 vt(4): Add cngrab() and cnungrab() callbacks
They are used when a panic occurs or when entering a DDB session for
instance.

cngrab() forces a vt-switch to the console window, no matter if the
original window is another terminal or an X session. However, cnungrab()
doesn't vt-switch back to the original window currently.

MFC after:	1 week
2014-08-27 10:04:10 +00:00
Gleb Smirnoff
e86447ca44 - Remove socket file operations declaration from sys/file.h.
- Make them static in sys_socket.c.
- Provide generic invfo_truncate() instead of soo_truncate().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-26 14:44:08 +00:00
Mateusz Guzik
037755fd15 Fix up races with f_seqcount handling.
It was possible that the kernel would overwrite user-supplied hint.

Abuse vnode lock for this purpose.

In collaboration with: kib
MFC after:	1 week
2014-08-26 08:17:22 +00:00
Konstantin Belousov
c83655f334 Revert the handling of all siginfo sa_flags except SA_SIGINFO to the
pre-r270321.  Namely, the flags are preserved for SIG_DFL and SIG_IGN
dispositions.

Requested and reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-24 16:37:50 +00:00
Mateusz Guzik
8cc11167fb Plug a memory leak in case of failed lookups in capability mode.
Put common cnp cleanup into one function and use it for this purpose.

MFC after:	1 week
2014-08-24 12:51:12 +00:00
Mateusz Guzik
ce8daaadbd Use refcount_init in sigacts_alloc.
This change is a no-op, but fixes up an inconsistency introduced with
r268634.

MFC after:	3 days
2014-08-24 09:24:37 +00:00
Mateusz Guzik
abd386bafe Fix getppid for traced processes.
Traced processes always have the tracer set as the parent.
Utilize proc_realparent to obtain the right process when needed.

Reviewed by:	kib
MFC after:	1 week
2014-08-24 09:04:09 +00:00
Mateusz Guzik
a661bebe26 Properly reparent traced processes when the tracer dies.
Previously they were unconditionally reparented to init. In effect,
it was possible that the tracee was never returned to its original parent.

Reviewed by:	kib
MFC after:	1 week
2014-08-24 09:02:16 +00:00
Alexander Motin
2e7d7bb294 Restore pre-r239157 handling of sched_yield(), when thread time slice was
aborted, allowing other threads to run.  Without this change thread is just
rescheduled again, that was illustrated by provided test tool.

PR:		192926
Submitted by:	eric@vangyzen.net
MFC after:	2 weeks
2014-08-23 17:31:56 +00:00
Konstantin Belousov
fbb6eca60f In do_lock_pi(), do not override error from umtxq_sleep_pi() when
doing suspend check.  This restores the pre-r251684 behaviour, to
retry once after the signal is detected.

PR:	kern/192918
Submitted by:	Elliott Rabe, Dell Inc., Eric van Gyzen <eric@vangyzen.net>
Obtained from:	Dell Inc.
MFC after:	1 week
2014-08-22 18:42:14 +00:00
Konstantin Belousov
350ae56373 Ensure that sigaction flags for a signal whose disposition is reset to
ignored or default, are not leaking.  Apparently, there exists code
which relies on SA_SIGINFO not reported for SIG_DFL or SIG_IGN.

In kern_sigaction, ignore flags when resetting.  Encapsulate the flag
and disposition testing into helper sigact_flag_test().

On exec, and when delivering signal with SA_RESETHAND flag set,
signals are reset automatically.  Use new helper sigdflt(), which
removes duplicated code and corrects all flag bits for the signal.

For proc0, set sigintr bit for all ignored signals.  Ignored signals
are consumed in tdsendsignal() and not delivered to the victim thread
at all.

Reported and tested by:	royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-22 08:19:08 +00:00
Konstantin Belousov
2d86417410 Check the validity of struct sigaction sa_flags value, reject unknown
flags.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-22 07:52:47 +00:00
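Rejecting unknown sa_flags bits, as described in the commit above, amounts to a mask test. The sketch below is a userland illustration only, not the kernel change itself; `check_sa_flags()` and `KNOWN_SA_FLAGS` are hypothetical names, and the set of accepted flags is assumed to be the standard POSIX ones.

```c
#define _XOPEN_SOURCE 700	/* expose the XSI sa_flags constants */
#include <errno.h>
#include <signal.h>

/* Flags accepted by this sketch; all are standard POSIX sa_flags bits. */
#define KNOWN_SA_FLAGS ((unsigned int)(SA_ONSTACK | SA_RESTART | \
    SA_RESETHAND | SA_NOCLDSTOP | SA_NODEFER | SA_NOCLDWAIT | SA_SIGINFO))

/* Return 0 if every set bit is a known flag, EINVAL otherwise. */
int
check_sa_flags(int flags)
{
	if (((unsigned int)flags & ~KNOWN_SA_FLAGS) != 0)
		return (EINVAL);
	return (0);
}
```

Any bit outside the mask makes the whole value invalid, so a caller passing a flag the system does not understand gets an error instead of having the bit silently ignored.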
Hiroki Sato
ed063112f4 Fix a panic which occurs in a VIMAGE-enabled kernel after r270158, and
separate socket_hhook_register() part and put it into VNET_SYS{,UN}INIT()
handler.

Discussed with:	marcel
2014-08-22 05:03:30 +00:00
Marcel Moolenaar
4ec7371233 For vendors like Juniper, extensibility for sockets is important. A
good example is socket options that aren't necessarily generic.  To
this end, OSD is added to the socket structure and hooks are defined
for key operations on sockets.  These are:
o   soalloc() and sodealloc()
o   Get and set socket options
o   Socket related kevent filters.

One aspect about hhook that appears to be not fully baked is the return
semantics (the return value from the hook is ignored in hhook_run_hooks()
at the time of commit).  To support return values, the socket_hhook_data
structure contains a 'status' field to hold return values.

Submitted by:	Anuranjan Shukla <anshukla@juniper.net>
Obtained from:	Juniper Networks, Inc.
2014-08-18 23:45:40 +00:00
Warner Losh
817dc00433 Expand the elf brandelf infrastructure to give access to the whole ELF
header (Elf_Ehdr) to determine if a particular interpreter wants to
accept it or not. Use this mechanism to filter EABI arm on OABI arm
kernels, and vice versa. This method could also be used to implement
OABI on EABI arm kernels, if desired, or to allow a single mips kernel
to run o32, n32 and n64 binaries.

Differential Revision: https://reviews.freebsd.org/D609
2014-08-18 02:44:56 +00:00
Edward Tomasz Napierala
3914ddf8a7 Bring in the new automounter, similar to what's provided in most other
UNIX systems, e.g. MacOS X and Solaris.  It uses Sun-compatible map format,
has proper kernel support, and LDAP integration.

There are still a few outstanding problems; they will be fixed shortly.

Reviewed by:	allanjude@, emaste@, kib@, wblock@ (earlier versions)
Phabric:	D523
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2014-08-17 09:44:42 +00:00
Mark Johnston
ba78d6b7a1 Correct the order of arguments passed to LIST_INSERT_AFTER().
Reviewed by:	kib
X-MFC-With:	r269656
2014-08-15 15:42:58 +00:00
Xin LI
7001d850bb Add a new loader tunable, vm.kmem_zmax which allows a system administrator
to limit the maximum allocation size that malloc(9) would consider using
the UMA cache allocator as backend.

Suggested by:	alfred
MFC after:	2 weeks
2014-08-14 05:31:39 +00:00
Xin LI
bda06553fd Re-instate UMA cached backend for 4K - 64K allocations. New consumers
like geli(4) use malloc(9) to allocate temporary buffers that get
freed shortly afterwards, causing frequent TLB shootdowns as observed in an
hwpmc-supported flame graph.

Discussed with:	jeff, alfred
MFC after:	1 week
2014-08-14 05:13:24 +00:00
Konstantin Belousov
70978c93b8 If vm_page_grab() allocates a new page, the page is not inserted into
page queue even when the allocation is not wired.  It is
responsibility of the vm_page_grab() caller to ensure that the page
does not end on the vm_object queue but not on the pagedaemon queue,
which would effectively create unpageable unwired page.

In exec_map_first_page() and vm_imgact_hold_page(), activate the page
immediately after unbusying it, to avoid leak.

In the uiomove_object_page(), deactivate page before the object is
unlocked.  There is no leak, since the page is deactivated after
uiomove_fromphys() finished.  But allowing non-queued non-wired page
in the unlocked object queue makes it impossible to assert that leak
does not happen in other places.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-13 05:44:08 +00:00
Gleb Smirnoff
cd1692fa5d Move KASSERT into locked region.
Submitted by:	kib
2014-08-11 15:06:07 +00:00
Gleb Smirnoff
eaf78ad3f7 Use M_WAITOK in sf_buf_init().
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-11 13:12:18 +00:00
Gleb Smirnoff
818d40d033 Provide sf_buf_ref() to optimize refcounting of already allocated
sendfile(2) buffers.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-11 12:59:55 +00:00
Bjoern A. Zeeb
e346f8c452 Split up sys_ktimer_getoverrun() into a sys_ and a kern_ variant
and export the kern_ version needed by an upcoming linuxolator change.

MFC after:	3 days
Sponsored by:	DARPA,AFRL
2014-08-07 16:49:50 +00:00
Andrey V. Elsukov
3e40097976 Temporary revert r269661, it looks like the patch isn't complete.
2014-08-07 14:32:28 +00:00
Andrey V. Elsukov
c60e497af9 Use cpuset_setithread() to apply cpu mask to taskq threads.
Sponsored by:	Yandex LLC
2014-08-07 10:23:50 +00:00
Konstantin Belousov
d735998057 Correct the problems with the ptrace(2) making the debuggee an orphan.
One problem is inferior(9) looping due to the process tree becoming a
graph instead of tree if the parent is traced by child. Another issue
is due to the use of p_oppid to restore the original parent/child
relationship, because real parent could already exited and its pid
reused (noted by mjg).

Add the function proc_realparent(9), which calculates the parent for
given process. It uses the flag P_TREE_FIRST_ORPHAN to detect the head
element of the p_orphan list and then steps back to its container
to find the parent process. If the parent has already exited, the
init(8) is returned.

Move the P_ORPHAN and the new helper flag from the p_flag* to new
p_treeflag field of struct proc, which is protected by proctree lock
instead of proc lock, since the orphans relationship is managed under
the proctree_lock already.

The remaining uses of p_oppid in ptrace(PT_DETACH) and process
reaping are replaced by proc_realparent(9).

Phabric:	D417
Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-07 05:47:53 +00:00
Gleb Smirnoff
c8d2ffd6a7 Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c
The MD allocators were very similar; however, there were some minor
differences. These differences were all consolidated in the MI allocator,
under ifdefs. The defines from machine/vmparam.h turn on features required
for a particular machine. For details look in the comment in sys/sf_buf.h.

As a result, no MD code is left in sys/*/*/vm_machdep.c. Some arches still have
machine/sf_buf.h, which is usually quite small.

Tested by:	glebius (i386), tuexen (arm32), kevlo (arm32)
Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-05 09:44:10 +00:00
Kirk McKusick
5f9500c358 Add support for multi-threading of soft updates.
Replace a single soft updates thread with a thread per FFS-filesystem
mount point. The threads are associated with the bufdaemon process.

Reviewed by:  kib
Tested by:    Peter Holm and Scott Long
MFC after:    2 weeks
Sponsored by: Netflix
2014-08-04 22:03:58 +00:00
Davide Italiano
4295aa9240 Fix an overflow in getsockopt(). optval isn't big enough to hold
sbintime_t.
Re-introduce r255030 behaviour capping socket timeouts to INT_32
if they're too large.

CR:	https://phabric.freebsd.org/D433
Reported by:	demon
Reviewed by:	bde [1], jhb [2]
MFC after:	2 weeks
2014-08-04 05:40:51 +00:00
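Capping an oversized value so that it fits a 32-bit field, as the commit above describes for socket timeouts, can be sketched as a saturating conversion. This is an illustration under assumptions, not the committed code: `clamp_to_int32()` is a hypothetical helper, and which quantity the kernel actually caps is not stated in the message.

```c
#include <stdint.h>

/* Saturate a 64-bit value to the range of a 32-bit signed integer. */
int32_t
clamp_to_int32(int64_t v)
{
	if (v > INT32_MAX)
		return (INT32_MAX);
	if (v < INT32_MIN)
		return (INT32_MIN);
	return ((int32_t)v);
}
```

Values already in range pass through unchanged; anything larger is pinned to the maximum instead of being truncated or written past the end of a smaller buffer.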
Peter Wemm
6dde7ecb5d Partial revert of r262867.
r262867 was described as fixing socket buffer checks for SOCK_SEQPACKET,
but also changed one of the SOCK_DGRAM code paths to use the new
sbappendaddr_nospacecheck_locked() function.  This led to SOCK_DGRAM
bypassing socket buffer limits.
2014-08-03 22:37:21 +00:00
Sergey Kandaurov
bcdd3bceb6 vn_path_to_global_path: update comment.
2014-08-03 07:59:19 +00:00
Warner Losh
146cbf6fa2 Make the witness lock limit an option.
2014-08-03 05:00:43 +00:00
Konstantin Belousov
168f4ee0a8 Remove Giant acquisition from the mount and unmount paths.
It could be claimed that two things were reasonably protected by
Giant.  One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx.  Another is vfsconf.vfc_refcount, which is
now updated with atomics.

Note that vfc_refcount still has the same races now as it has under
the Giant, the unload of filesystem modules can happen while the
module is still in use.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-03 03:27:54 +00:00
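Updating a reference count with atomics instead of under a global lock, as the commit above does for vfc_refcount, can be sketched in userland with C11 atomics. The kernel uses its own atomic primitives; `struct fs_type`, `fs_type_ref()` and `fs_type_unref()` here are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a structure that carries a reference count. */
struct fs_type {
	atomic_int refcount;
};

/* Take a reference; returns the count after the increment. */
int
fs_type_ref(struct fs_type *t)
{
	return (atomic_fetch_add(&t->refcount, 1) + 1);
}

/* Drop a reference; returns the count after the decrement. */
int
fs_type_unref(struct fs_type *t)
{
	int old = atomic_fetch_sub(&t->refcount, 1);

	assert(old > 0);	/* dropping below zero would be a caller bug */
	return (old - 1);
}
```

Atomic updates keep the counter itself consistent without a lock, but, as the commit message notes, they do not by themselves prevent the object from being torn down while it is still in use.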
Rui Paulo
551a78956c In the shm_open() and shm_unlink() syscalls, export the path to KTR.
MFC after:	1 week
2014-08-01 23:29:04 +00:00
Konstantin Belousov
634012b917 Remove one-time use macros which check for the vnode lifecycle. More,
some parts of the checks are in fact redundant in the surrounding
code, and it is more clear what the conditions are by direct testing
of the flags.  Two of the three macros were only used in assertions.

In vnlru_free(), all relevant parts of vholdl() were already inlined,
except the increment of v_holdcnt itself.  Do not call vholdl() to do
the increment as well, this allows to make assertions in
vholdl()/vhold() more strict.

In v_incr_usecount(), call vholdl() before incrementing other ref
counters.  The change is no-op, but it makes less surprising to see
the vnode state in debugger if interrupted inside v_incr_usecount().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-29 16:42:34 +00:00