Commit Graph

12534 Commits

Author SHA1 Message Date
kib
0c3998cc9e Currently, the debugger attached to the process executing vfork() does
not get syscall exit notification until the child performed exec of
exit.  Swap the order of doing ptracestop() and waiting for P_PPWAIT
clearing, by postponing the wait into syscallret after ptracestop()
notification is done.

Reported, tested and reviewed by:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	 2 weeks
2012-02-27 21:10:10 +00:00
jhb
4110bb206b Clear the a device's description string anytime it's driver changes.
Descriptions  are specific to drivers and we don't change drivers on attached
devices.  This fixes a few places where we were not clearing the description
when detaching a driver (e.g. with device_attach() failed).  While here, fix
a few other nits:
- Remove spurious call to remove a device's driver from
  devclass_driver_deleted().  device_detach() removes it already.
- Fix a typo.
2012-02-27 16:08:18 +00:00
davidxu
96aacc2279 Follow changes made in revision 232144, pass absolute timeout to kernel,
this eliminates a clock_gettime() syscall.
2012-02-27 13:38:52 +00:00
mav
8a35bc6c4f Rework CPU load balancing in SCHED_ULE:
- In sched_pickcpu() be more careful taking previous CPU on SMT systems.
Do it only if all other logical CPUs of that physical one are idle to avoid
extra resource sharing.
 - In sched_pickcpu() change general logic of CPU selection. First
look for idle CPU, sharing last level cache with previously used one,
skipping SMT CPU groups. If none found, search all CPUs for the least loaded
one, where the thread with its priority can run now. If none found, search
just for the least loaded CPU.
 - Make cpu_search() compare lowest/highest CPU load when comparing CPU
groups with equal load. That allows to differentiate 1+1 and 2+0 loads.
 - Make cpu_search() to prefer specified (previous) CPU or group if load
is equal. This improves cache affinity for more complicated topologies.
 - Randomize CPU selection if above factors are equal. Previous code tend
to prefer CPUs with lower IDs, causing unneeded collisions.
 - Rework periodic balancer in sched_balance_group(). With cpu_search()
more intelligent now, make balansing process flat, removing recursion
over the topology tree. That fixes double swap problem and makes load
distribution more even and predictable.

All together this gives 10-15% performance improvement in many tests on
CPUs with SMT, such as Core i7, for number of threads is less then number
of logical CPUs. In some tests it also gives positive effect to systems
without SMT.

Reviewed by:	jeff
Tested by:	flo, hackers@
MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2012-02-27 10:31:54 +00:00
phk
c1bac816f5 Also call the low-level driver if ->c_iflag & (IXON|IXOFF|IXANY) changes.
Uftdi(4) examines (c_iflag & (IXON|IXOFF)) to control hw XON-XOFF support.
This is obviously no good, if changes to those bits are not communicated
down the stack.
2012-02-26 20:56:49 +00:00
alc
351ef48158 Fix typo.
MFC after:	1 week
2012-02-26 19:10:14 +00:00
mm
d974ef7be1 Analogous to r232059, add a parameter for the ZFS file system:
allow.mount.zfs:
	allow mounting the zfs filesystem inside a jail

This way the permssions for mounting all current VFCF_JAIL filesystems
inside a jail are controlled wia allow.mount.* jail parameters.

Update sysctl descriptions.
Update jail(8) and zfs(8) manpages.

TODO:	document the connection of allow.mount.* and VFCF_JAIL for kernel
	developers

MFC after:	10 days
2012-02-26 16:30:39 +00:00
jilles
06d1ac3238 Fix fchmod() and fchown() on fifos.
The new fifo implementation in r232055 broke fchmod() and fchown() on fifos.
Postfix needs this.

Submitted by:	gianni
Reported by:	dougb
2012-02-26 15:14:29 +00:00
trociny
a9902044d1 Add sysctl to retrieve or set umask of another process.
Submitted by:	Dmitry Banschikov <me ubique spb ru>
Discussed with:	kib, rwatson
Reviewed by:	kib
MFC after:	2 weeks
2012-02-26 14:25:48 +00:00
kib
79e65f67d4 Add SO_PROTOCOL/SO_PROTOTYPE socket SOL_SOCKET-level option to get the
socket protocol number.  This is useful since the socket type can
be implemented by different protocols in the same protocol family,
e.g. SOCK_STREAM may be provided by both TCP and SCTP.

Submitted by:	Jukka A. Ukkonen <jau iki fi>
PR:	  kern/162352
Discussed with:	bz
Reviewed by:	glebius
MFC after:	2 weeks
2012-02-26 13:55:43 +00:00
kib
18b7fd6ba3 Remove apparently redundand checks for socket so_proto being non-NULL
from sosetopt() and sogetopt().  No exposed sockets may have so_proto
invalid.

Discussed with:	bz, rwatson
Reviewed by:	glebius
MFC after:	2 weeks
2012-02-26 13:51:05 +00:00
maxim
07e034748a o Reduce chances for integer overflow.
o More verbose sysctl description added.

MFC after:	2 weeks
Sponsored by:	Nginx, Inc.
2012-02-25 12:06:40 +00:00
trociny
87f7f0cfe8 When detaching an unix domain socket, uipc_detach() checks
unp->unp_vnode pointer to detect if there is a vnode associated with
(binded to) this socket and does necessary cleanup if there is.

The issue is that after forced unmount this check may be too late as
the unp_vnode is reclaimed and the reference is stale.

To fix this provide a helper function that is called on a socket vnode
reclamation to do necessary cleanup.

Pointed by:	kib
Reviewed by:	kib
MFC after:	2 weeks
2012-02-25 10:15:41 +00:00
davidxu
61033245ae In revision 231989, we pass a 16-bit clock ID into kernel, however
according to POSIX document, the clock ID may be dynamically allocated,
it unlikely will be in 64K forever. To make it future compatible, we
pack all timeout information into a new structure called _umtx_time, and
use fourth argument as a size indication, a zero means it is old code
using timespec as timeout value, but the new structure also includes flags
and a clock ID, so the size argument is different than before, and it is
non-zero. With this change, it is possible that a thread can sleep
on any supported clock, though current kernel code does not have such a
POSIX clock driver system.
2012-02-25 02:12:17 +00:00
kib
a0f40cddfb Restore the return statement erronously removed in the r232048.
Submitted by:	cognet
Pointy hat to:	kib (reuse the one I already got today)
MFC after:	13 days
2012-02-24 11:02:35 +00:00
mm
4825085ea4 To improve control over the use of mount(8) inside a jail(8), introduce
a new jail parameter node with the following parameters:

allow.mount.devfs:
	allow mounting the devfs filesystem inside a jail

allow.mount.nullfs:
	allow mounting the nullfs filesystem inside a jail

Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.

Reviewed by:	jamie
Suggested by:	pjd
MFC after:	2 weeks
2012-02-23 18:51:24 +00:00
kmacy
a5c27d1ee0 merge pipe and fifo implementations
Also reviewed by: jhb, jilles (initial revision)
Tested by: pho, jilles

Submitted by:	gianni
Reviewed by:	bde
2012-02-23 18:37:30 +00:00
brueffer
8b76671a80 Catch up with r195837 (2.5 years ago) which renamed net_add_domain() to domain_add().
PR:		165424
Submitted by:	Lachlan Kang
MFC after:	1 week
2012-02-23 17:47:19 +00:00
kib
bfe47eb1df Allow the parent to gather the exit status of the children reparented
to the debugger.  When reparenting for debugging, keep the child in
the new orphan list of old parent.  When looping over the children in
kern_wait(), iterate over both children list and orphan list to search
for the process by pid.

Submitted by:	Dmitry Mikulin <dmitrym juniper.net>
MFC after:	2 weeks
2012-02-23 11:50:23 +00:00
davidxu
79308ead48 Fix typo. 2012-02-22 07:34:23 +00:00
davidxu
d177303078 Use unused fourth argument of umtx_op to pass flags to kernel for operation
UMTX_OP_WAIT. Upper 16bits is enough to hold a clock id, and lower
16bits is used to pass flags. The change saves a clock_gettime() syscall
from libthr.
2012-02-22 03:22:49 +00:00
trociny
05b0baa9d7 unp_connect() may use a shared lock on the vnode to fetch the socket.
Suggested by:	jhb
Reviewed by:	jhb, kib, rwatson
MFC after:	2 weeks
2012-02-21 19:40:13 +00:00
kib
80ae8fe82c Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
delphij
3e8e765569 Revert r231923 for now. Further work is needed to make sure that the
behavior is consistent.
2012-02-20 09:32:32 +00:00
delphij
c86b0fb582 Use uprintf instead of printf for the reason why a kernel module can not
be loaded.  This way, the administrator can get response immediately from
the shell session rather than relying on dmesg.

MFC after:	1 month
2012-02-20 01:05:17 +00:00
alc
d953613252 Close a race due to dropping of the map lock between creating a map entry
for a shared mapping and marking the entry for inheritance.

Reviewed by:	kib
X-MFC after:	r231526
2012-02-19 00:28:49 +00:00
kib
abd1094f17 Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.

Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.

Do not map the whole executable into KVA at all to copy it out into
usermode.  Directly use vn_rdwr() for the case of not page aligned
binary.

There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.

Reviewed by:   alc
MFC after:     2 weeks
2012-02-17 23:47:16 +00:00
bz
dcdb23291f Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by:	Cisco Systems, Inc.
Reviewed by:	melifaro (basically)
MFC after:	10 days
2012-02-17 02:39:58 +00:00
eadler
c7937266c4 Add a timestamp to the msgbuf output in order to determine when when
messages were printed.

This can be enabled with the kern.msgbuf_show_timestamp sysctl

PR:		kern/161553
Reviewed by:	avg
Submitted by:	Arnaud Lacombe <lacombar@gmail.com>
Approved by:	cperciva
MFC after:	1 month
2012-02-16 05:11:35 +00:00
kib
4658c8a871 The PTRACESTOP() macro is used only once. Inline the only use and remove
the macro.

MFC after:	1 week
2012-02-11 14:49:25 +00:00
ed
ec8ff22f84 Remove unneeded newline. It fits in 80 columns now.
Pointed out by:	jh
2012-02-10 14:55:47 +00:00
ed
2b4f7a9e8a Merge si_name and __si_namebuf.
The si_name pointer always points to the __si_namebuf member inside the
same object. Remove it and rename __si_namebuf to si_name.
2012-02-10 12:40:50 +00:00
kevlo
c935e4e242 Add a missing break. This bug was introduced in r228856. 2012-02-10 06:30:52 +00:00
kib
956d09353b Mark the automatically attached child with PL_FLAG_CHILD in struct
lwpinfo flags, for PT_FOLLOWFORK auto-attachment.

In collaboration with:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	 1 week
2012-02-10 00:02:13 +00:00
mm
1626913ed1 Add support for mounting devfs inside jails.
A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for
mounting devfs inside jails. A value of -1 disables mounting devfs in
jails, a value of zero means no restrictions. Nested jails can only
have mounting devfs disabled or inherit parent's enforcement as jails are
not allowed to view or manipulate devfs(8) rules.

Utilizes new functions introduced in r231265.

Reviewed by:	jamie
MFC after:	1 month
2012-02-09 10:22:08 +00:00
kib
5515381ae2 Unbreak detection of the async mode for clustered writes after r231075.
Submitted by:	bde
MFC after:	12 days
2012-02-08 15:07:19 +00:00
pjd
bf6a1ce070 Allow to set kern.ipc.shmmax from /boot/loader.conf.
MFC after:	1 week
2012-02-08 09:18:22 +00:00
ed
3a052ed6cd Fix whitespace inconsistencies in TTY code. 2012-02-06 18:15:46 +00:00
jhb
1a6892cbe7 Rename cache_lookup_times() to cache_lookup() and retire the old API and
ABI stub for cache_lookup().
2012-02-06 17:00:28 +00:00
kib
52c17430bc Current implementations of sync(2) and syncer vnode fsync() VOP uses
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return.  Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by:	mckusick
Tested by:	scottl
MFC after:	2 weeks
2012-02-06 11:04:36 +00:00
kevlo
4c30dca03c - Use uint8_t for the variable x and spell the size of the variable
as sizeof(x)
- Capitalized comment
- Parentheses around return value

Requested by:	bde
2012-02-06 06:03:16 +00:00
mm
8a4156df5d Analogous to r230407 a separate path buffer in vfs_mount.c is required
for r230129. Fixes a out of bounds write to fspath.

MFC after:	10 days
2012-02-05 10:59:50 +00:00
davidxu
c4909ace45 Add 32-bit compat code for AIO kevent flags introduced in revision 230857. 2012-02-05 04:49:31 +00:00
rstone
d657a808e1 Whenever a new kernel thread is spawned, explicitly clear any CPU affinity
set on the new thread.  This prevents the thread from inadvertently
inheriting affinity from a random sibling.

Submitted by:	attilio
Tested by:	pho
MFC after:	1 week
2012-02-04 16:49:29 +00:00
hrs
43b8619604 Fix input validation in SO_SETFIB.
Reviewed by:	bz
MFC after:	1 day
2012-02-04 15:00:26 +00:00
kib
a7d7065053 Add kqueue support to /dev/klog.
Submitted by:	Mateusz Guzik <mjguzik gmail com>
PR:	  kern/156423
MFC after:	1 weeks
2012-02-01 14:34:52 +00:00
davidxu
e9edcfe7c1 If multiple threads call kevent() to get AIO events on same kqueue fd,
it is possible that a single AIO event will be reported to multiple
threads, it is not threading friendly, and the existing API can not
control this behavior.
Allocate a kevent flags field sigev_notify_kevent_flags for AIO event
notification in sigevent, and allow user to pass EV_CLEAR, EV_DISPATCH
or EV_ONESHOT to AIO kernel code, user can control whether the event
should be cleared once it is retrieved by a thread. This change should
be comptaible with existing application, because the field should have
already been zero-filled, and no additional action will be taken by
kernel.

PR:	kern/156567
2012-02-01 02:53:06 +00:00
kib
e1d70baef7 A debugger which requested PT_FOLLOW_FORK should get the notification
about new child not only when doing PT_TO_SCX, but also for PT_CONTINUE.
If TDB_FORK flag is set, always issue a stop, the same as is done for
TDB_EXEC.

Reported by:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	1 week
2012-01-30 20:00:29 +00:00
jhb
d372b841c6 Refine the implementation of POSIX_FADV_NOREUSE for the read(2) case such
that instead of using direct I/O it allows read-ahead similar to
POSIX_FADV_NORMAL, but invokes VOP_ADVISE(POSIX_FADV_DONTNEED) after the
read(2) has completed to purge just-read data.  The write(2) path continues
to use direct I/O for POSIX_FADV_NOREUSE for now.  Note that NOREUSE works
optimally if an application reads and writes full fs blocks.
2012-01-30 19:35:15 +00:00
ambrisko
2e6fa96915 When detaching an AIO or LIO requests grab the lock and tell knlist_remove
that we have the lock now.  This cleans up a locking panic ASSERT when
knlist_empty is called without a lock when INVARIANTS etc. are turned.

Reviewed by:	kib jhb
MFC after:	1 week
2012-01-30 19:19:22 +00:00