Commit Graph

12518 Commits

Author SHA1 Message Date
David Xu
f911d9fa4d Fix typo. 2012-02-22 07:34:23 +00:00
David Xu
b13a8fa78f Use unused fourth argument of umtx_op to pass flags to kernel for operation
UMTX_OP_WAIT. Upper 16bits is enough to hold a clock id, and lower
16bits is used to pass flags. The change saves a clock_gettime() syscall
from libthr.
2012-02-22 03:22:49 +00:00
Mikolaj Golub
a95852edf3 unp_connect() may use a shared lock on the vnode to fetch the socket.
Suggested by:	jhb
Reviewed by:	jhb, kib, rwatson
MFC after:	2 weeks
2012-02-21 19:40:13 +00:00
Konstantin Belousov
526d0bd547 Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
Xin LI
fcdd3d322b Revert r231923 for now. Further work is needed to make sure that the
behavior is consistent.
2012-02-20 09:32:32 +00:00
Xin LI
5bfbb59851 Use uprintf instead of printf for the reason why a kernel module can not
be loaded.  This way, the administrator can get response immediately from
the shell session rather than relying on dmesg.

MFC after:	1 month
2012-02-20 01:05:17 +00:00
Alan Cox
7dc0ace10e Close a race due to dropping of the map lock between creating a map entry
for a shared mapping and marking the entry for inheritance.

Reviewed by:	kib
X-MFC after:	r231526
2012-02-19 00:28:49 +00:00
Konstantin Belousov
3494f31ad2 Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.

Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.

Do not map the whole executable into KVA at all to copy it out into
usermode.  Directly use vn_rdwr() for the case of not page aligned
binary.

There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.

Reviewed by:   alc
MFC after:     2 weeks
2012-02-17 23:47:16 +00:00
Bjoern A. Zeeb
9dba179d5e IFC @231845
Sponsored by:	Cisco Systems, Inc.
2012-02-17 00:27:48 +00:00
Eitan Adler
f17a6f1b17 Add a timestamp to the msgbuf output in order to determine when when
messages were printed.

This can be enabled with the kern.msgbuf_show_timestamp sysctl

PR:		kern/161553
Reviewed by:	avg
Submitted by:	Arnaud Lacombe <lacombar@gmail.com>
Approved by:	cperciva
MFC after:	1 month
2012-02-16 05:11:35 +00:00
Konstantin Belousov
343b391f20 The PTRACESTOP() macro is used only once. Inline the only use and remove
the macro.

MFC after:	1 week
2012-02-11 14:49:25 +00:00
Ed Schouten
852b05c5b5 Remove unneeded newline. It fits in 80 columns now.
Pointed out by:	jh
2012-02-10 14:55:47 +00:00
Ed Schouten
8fac9b7b7d Merge si_name and __si_namebuf.
The si_name pointer always points to the __si_namebuf member inside the
same object. Remove it and rename __si_namebuf to si_name.
2012-02-10 12:40:50 +00:00
Kevin Lo
de02885a7b Add a missing break. This bug was introduced in r228856. 2012-02-10 06:30:52 +00:00
Konstantin Belousov
db3273398b Mark the automatically attached child with PL_FLAG_CHILD in struct
lwpinfo flags, for PT_FOLLOWFORK auto-attachment.

In collaboration with:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	 1 week
2012-02-10 00:02:13 +00:00
Martin Matuska
0cc207a6f5 Add support for mounting devfs inside jails.
A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for
mounting devfs inside jails. A value of -1 disables mounting devfs in
jails, a value of zero means no restrictions. Nested jails can only
have mounting devfs disabled or inherit parent's enforcement as jails are
not allowed to view or manipulate devfs(8) rules.

Utilizes new functions introduced in r231265.

Reviewed by:	jamie
MFC after:	1 month
2012-02-09 10:22:08 +00:00
Konstantin Belousov
9cfb2326bc Unbreak detection of the async mode for clustered writes after r231075.
Submitted by:	bde
MFC after:	12 days
2012-02-08 15:07:19 +00:00
Pawel Jakub Dawidek
12075c0936 Allow to set kern.ipc.shmmax from /boot/loader.conf.
MFC after:	1 week
2012-02-08 09:18:22 +00:00
Ed Schouten
cd864a19a5 Fix whitespace inconsistencies in TTY code. 2012-02-06 18:15:46 +00:00
John Baldwin
bf40d24a3f Rename cache_lookup_times() to cache_lookup() and retire the old API and
ABI stub for cache_lookup().
2012-02-06 17:00:28 +00:00
Konstantin Belousov
c480f781ea Current implementations of sync(2) and syncer vnode fsync() VOP uses
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return.  Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by:	mckusick
Tested by:	scottl
MFC after:	2 weeks
2012-02-06 11:04:36 +00:00
Kevin Lo
0ca2381d9b - Use uint8_t for the variable x and spell the size of the variable
as sizeof(x)
- Capitalized comment
- Parentheses around return value

Requested by:	bde
2012-02-06 06:03:16 +00:00
Martin Matuska
a91d2201f9 Analogous to r230407 a separate path buffer in vfs_mount.c is required
for r230129. Fixes a out of bounds write to fspath.

MFC after:	10 days
2012-02-05 10:59:50 +00:00
David Xu
d56e058a79 Add 32-bit compat code for AIO kevent flags introduced in revision 230857. 2012-02-05 04:49:31 +00:00
Ryan Stone
312ac3a23a Whenever a new kernel thread is spawned, explicitly clear any CPU affinity
set on the new thread.  This prevents the thread from inadvertently
inheriting affinity from a random sibling.

Submitted by:	attilio
Tested by:	pho
MFC after:	1 week
2012-02-04 16:49:29 +00:00
Hiroki Sato
cf8b832511 Fix input validation in SO_SETFIB.
Reviewed by:	bz
MFC after:	1 day
2012-02-04 15:00:26 +00:00
Bjoern A. Zeeb
ee799639e8 Add SO_SETFIB option support on PF_INET6 sockets and allow inheriting the
FIB number from the process, as set by setfib(2), on socket creation.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 11:00:53 +00:00
Konstantin Belousov
6af519cf18 Add kqueue support to /dev/klog.
Submitted by:	Mateusz Guzik <mjguzik gmail com>
PR:	  kern/156423
MFC after:	1 weeks
2012-02-01 14:34:52 +00:00
David Xu
fde809356a If multiple threads call kevent() to get AIO events on same kqueue fd,
it is possible that a single AIO event will be reported to multiple
threads, it is not threading friendly, and the existing API can not
control this behavior.
Allocate a kevent flags field sigev_notify_kevent_flags for AIO event
notification in sigevent, and allow user to pass EV_CLEAR, EV_DISPATCH
or EV_ONESHOT to AIO kernel code, user can control whether the event
should be cleared once it is retrieved by a thread. This change should
be comptaible with existing application, because the field should have
already been zero-filled, and no additional action will be taken by
kernel.

PR:	kern/156567
2012-02-01 02:53:06 +00:00
Konstantin Belousov
6ad1ff09cc A debugger which requested PT_FOLLOW_FORK should get the notification
about new child not only when doing PT_TO_SCX, but also for PT_CONTINUE.
If TDB_FORK flag is set, always issue a stop, the same as is done for
TDB_EXEC.

Reported by:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	1 week
2012-01-30 20:00:29 +00:00
John Baldwin
2bd3e4c2c2 Refine the implementation of POSIX_FADV_NOREUSE for the read(2) case such
that instead of using direct I/O it allows read-ahead similar to
POSIX_FADV_NORMAL, but invokes VOP_ADVISE(POSIX_FADV_DONTNEED) after the
read(2) has completed to purge just-read data.  The write(2) path continues
to use direct I/O for POSIX_FADV_NOREUSE for now.  Note that NOREUSE works
optimally if an application reads and writes full fs blocks.
2012-01-30 19:35:15 +00:00
Doug Ambrisko
8e9fc27818 When detaching an AIO or LIO requests grab the lock and tell knlist_remove
that we have the lock now.  This cleans up a locking panic ASSERT when
knlist_empty is called without a lock when INVARIANTS etc. are turned.

Reviewed by:	kib jhb
MFC after:	1 week
2012-01-30 19:19:22 +00:00
Konstantin Belousov
62c625fdd2 Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64bit
and 32bit ABIs. Also try to enable nxstacks for PAE/i386 when supported,
and some variants of powerpc32.

MFC after:	2 months (if ever)
2012-01-30 07:56:00 +00:00
Attilio Rao
5d7380f8e3 Avoid to check the same cache line/variable from all the locking
primitives by breaking stop_scheduler into a per-thread variable.
Also, store the new td_stopsched very close to td_*locks members as
they will be accessed mostly in the same codepaths as td_stopsched and
this results in avoiding a further cache-line pollution, possibly.

STOP_SCHEDULER() was pondered to use a new 'thread' argument, in order to
take advantage of already cached curthread, but in the end there should
not really be a performance benefit, while introducing a KPI breakage.

In collabouration with:	flo
Reviewed by:	avg
MFC after:	3 months (or never)
X-MFC:		r228424
2012-01-28 14:00:21 +00:00
Gleb Smirnoff
94fce84763 Fix size check, that prevents getting negative after casting
to a signed type

Reviewed by:	bde
2012-01-27 08:58:58 +00:00
Kenneth D. Merry
7e949c467c Xen netback driver rewrite.
share/man/man4/Makefile,
share/man/man4/xnb.4,
sys/dev/xen/netback/netback.c,
sys/dev/xen/netback/netback_unit_tests.c:

	Rewrote the netback driver for xen to attach properly via newbus
	and work properly in both HVM and PVM mode (only HVM is tested).
	Works with the in-tree FreeBSD netfront driver or the Windows
	netfront driver from SuSE.  Has not been extensively tested with
	a Linux netfront driver.  Does not implement LRO, TSO, or
	polling.  Includes unit tests that may be run through sysctl
	after compiling with XNB_DEBUG defined.

sys/dev/xen/blkback/blkback.c,
sys/xen/interface/io/netif.h:

	Comment elaboration.

sys/kern/uipc_mbuf.c:

	Fix page fault in kernel mode when calling m_print() on a
	null mbuf.  Since m_print() is only used for debugging, there
	are no performance concerns for extra error checking code.

sys/kern/subr_scanf.c:

	Add the "hh" and "ll" width specifiers from C99 to scanf().
	A few callers were already using "ll" even though scanf()
	was handling it as "l".

Submitted by:	Alan Somers <alans@spectralogic.com>
Submitted by:	John Suykerbuyk <johns@spectralogic.com>
Sponsored by:	Spectra Logic
MFC after:	1 week
Reviewed by:	ken
2012-01-26 16:35:09 +00:00
Gleb Smirnoff
434ea137cc Although aio_nbytes is size_t, later is is signed to
casted types: to ssize_t in filesystem code and to
int in buf code, thus supplying a negative argument
leads to kernel panic later. To fix that check user
supplied argument in the beginning of syscall.

Submitted by:	Maxim Dounin <mdounin mdounin.ru>, maxim@
2012-01-26 11:59:48 +00:00
Konstantin Belousov
abc942b56c When doing vflush(WRITECLOSE), clean vnode pages.
Unmounts do vfs_msync() before calling VFS_UNMOUNT(), but there is
still a race allowing a process to dirty pages after msync
finished. Remounts rw->ro just left dirty pages in system.

Reviewed by:	alc, tegge (long time ago)
Tested by:	pho
MFC after:	2 weeks
2012-01-25 20:54:09 +00:00
Konstantin Belousov
d5210589b7 Fix remaining calls to cache_enter() in both NFS clients to provide
appropriate timestamps.  Restore the assertions which verify that
NCF_TS is set when timestamp is asked for.

Reviewed by:  jhb (previous version)
MFC after:    2 weeks
2012-01-25 20:48:20 +00:00
Mikolaj Golub
45efc9b4aa Fix CTL flags in the declarations of KERN_PROC_ENV, AUXV and
PS_STRINGS sysctls: they are read only.

MFC after:	1 week
2012-01-25 20:15:58 +00:00
Konstantin Belousov
7a7e609a32 Apparently, both nfs clients do not use cache_enter_time()
consistently, creating some namecache entries without NCF_TS flag.
This causes panic due to failed assertion.

As a temporal relief, remove the assert. Return epoch timestamp for
the entries without timestamp if asked.

While there, consolidate the code which returns timestamps, into a
helper cache_out_ts().

Discussed with:	 jhb
MFC after: 2 weeks
2012-01-23 17:09:23 +00:00
Gleb Smirnoff
93a1b4c4cf Convert panic()s to KASSERT()s. This is an optimisation for
hashdestroy() since in absence of INVARIANTS a compiler
will drop the entire for() cycle.
2012-01-23 16:31:46 +00:00
Mikolaj Golub
8854fe3915 Change kern.proc.rlimit sysctl to:
- retrive only one, specified limit for a process, not the whole
  array, as it was previously (the sysctl has been added recently and
  has not been backported to stable yet, so this change is ok);

- allow to set a resource limit for another process.

Submitted by:	Andrey Zonov <andrey at zonov.org>
Discussed with:	kib
Reviewed by:	kib
MFC after:	2 weeks
2012-01-22 20:25:00 +00:00
Pawel Jakub Dawidek
9b9a01792d TDF_* flags should be used with td_flags field and TDP_* flags should be used
with td_pflags field. Correct two places where it was not the case.

Discussed with:	kib
MFC after:	1 week
2012-01-22 11:01:36 +00:00
Konstantin Belousov
c2b396f294 Remove the nc_time and nc_ticks elements from struct namecache, and
provide struct namecache_ts which is the old struct namecache. Only
allocate struct namecache_ts if non-null struct timespec *tsp was
passed to cache_enter_time, otherwise use struct namecache.

Change struct namecache allocation and deallocation macros into static
functions, since logic becomes somewhat twisty.  Provide accessor for
the nc_name member of struct namecache to hide difference between
struct namecache and namecache_ts.

The aim of the change is to not waste 20 bytes per small namecache
entry.

Reviewed by:	 jhb
MFC after: 2 weeks
X-MFC-note:  after r230394
2012-01-22 01:11:06 +00:00
Martin Matuska
6dfe0a3dc2 Use separate buffer for global path to avoid overflow of path buffer.
Reviewed by:	jamie@
MFC after:	3 weeks
2012-01-21 00:06:21 +00:00
John Baldwin
5aefb4cbbf Close a race in NFS lookup processing that could result in stale name cache
entries on one client when a directory was renamed on another client.  The
root cause for the stale entry being trusted is that each per-vnode nfsnode
structure has a single 'n_ctime' timestamp used to validate positive name
cache entries.  However, if there are multiple entries for a single vnode,
they all share a single timestamp.  To fix this, extend the name cache
to allow filesystems to optionally store a timestamp value in each name
cache entry.  The NFS clients now fetch the timestamp associated with
each name cache entry and use that to validate cache hits instead of the
timestamps previously stored in the nfsnode.  Another part of the fix is
that the NFS clients now use timestamps from the post-op attributes of
RPCs when adding name cache entries rather than pulling the timestamps out
of the file's attribute cache.  The latter is subject to races with other
lookups updating the attribute cache concurrently.  Some more details:
- Add a variant of nfsm_postop_attr() to the old NFS client that can return
  a vattr structure with a copy of the post-op attributes.
- Handle lookups of "." as a special case in the NFS clients since the name
  cache does not store name cache entries for ".", so we cannot get a
  useful timestamp.  It didn't really make much sense to recheck the
  attributes on the the directory to validate the namecache hit for "."
  anyway.
- ABI compat shims for the name cache routines are present in this commit
  so that it is safe to MFC.

MFC after:	2 weeks
2012-01-20 20:02:01 +00:00
Konstantin Belousov
2974cc36f7 Use shared lock for the executable vnode in the exec path after the
VV_TEXT changes are handled. Assert that vnode is exclusively locked at
the places that modify VV_TEXT.

Discussed with:	alc
MFC after:	3 weeks
2012-01-19 23:03:31 +00:00
Alan Cox
1dfab8025e Explain why it is safe to unlock the vnode.
Requested by:	kib
2012-01-17 16:20:50 +00:00
Kirk McKusick
cc672d3599 Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks
2012-01-17 01:08:01 +00:00