12507 Commits

Author SHA1 Message Date
bz
dcdb23291f Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by:	Cisco Systems, Inc.
Reviewed by:	melifaro (basically)
MFC after:	10 days
2012-02-17 02:39:58 +00:00
eadler
c7937266c4 Add a timestamp to the msgbuf output in order to determine when when
messages were printed.

This can be enabled with the kern.msgbuf_show_timestamp sysctl

PR:		kern/161553
Reviewed by:	avg
Submitted by:	Arnaud Lacombe <lacombar@gmail.com>
Approved by:	cperciva
MFC after:	1 month
2012-02-16 05:11:35 +00:00
kib
4658c8a871 The PTRACESTOP() macro is used only once. Inline the only use and remove
the macro.

MFC after:	1 week
2012-02-11 14:49:25 +00:00
ed
ec8ff22f84 Remove unneeded newline. It fits in 80 columns now.
Pointed out by:	jh
2012-02-10 14:55:47 +00:00
ed
2b4f7a9e8a Merge si_name and __si_namebuf.
The si_name pointer always points to the __si_namebuf member inside the
same object. Remove it and rename __si_namebuf to si_name.
2012-02-10 12:40:50 +00:00
kevlo
c935e4e242 Add a missing break. This bug was introduced in r228856. 2012-02-10 06:30:52 +00:00
kib
956d09353b Mark the automatically attached child with PL_FLAG_CHILD in struct
lwpinfo flags, for PT_FOLLOWFORK auto-attachment.

In collaboration with:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	 1 week
2012-02-10 00:02:13 +00:00
mm
1626913ed1 Add support for mounting devfs inside jails.
A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for
mounting devfs inside jails. A value of -1 disables mounting devfs in
jails, a value of zero means no restrictions. Nested jails can only
have mounting devfs disabled or inherit parent's enforcement as jails are
not allowed to view or manipulate devfs(8) rules.

Utilizes new functions introduced in r231265.

Reviewed by:	jamie
MFC after:	1 month
2012-02-09 10:22:08 +00:00
kib
5515381ae2 Unbreak detection of the async mode for clustered writes after r231075.
Submitted by:	bde
MFC after:	12 days
2012-02-08 15:07:19 +00:00
pjd
bf6a1ce070 Allow to set kern.ipc.shmmax from /boot/loader.conf.
MFC after:	1 week
2012-02-08 09:18:22 +00:00
ed
3a052ed6cd Fix whitespace inconsistencies in TTY code. 2012-02-06 18:15:46 +00:00
jhb
1a6892cbe7 Rename cache_lookup_times() to cache_lookup() and retire the old API and
ABI stub for cache_lookup().
2012-02-06 17:00:28 +00:00
kib
52c17430bc Current implementations of sync(2) and syncer vnode fsync() VOP uses
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return.  Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by:	mckusick
Tested by:	scottl
MFC after:	2 weeks
2012-02-06 11:04:36 +00:00
kevlo
4c30dca03c - Use uint8_t for the variable x and spell the size of the variable
as sizeof(x)
- Capitalized comment
- Parentheses around return value

Requested by:	bde
2012-02-06 06:03:16 +00:00
mm
8a4156df5d Analogous to r230407 a separate path buffer in vfs_mount.c is required
for r230129. Fixes a out of bounds write to fspath.

MFC after:	10 days
2012-02-05 10:59:50 +00:00
davidxu
c4909ace45 Add 32-bit compat code for AIO kevent flags introduced in revision 230857. 2012-02-05 04:49:31 +00:00
rstone
d657a808e1 Whenever a new kernel thread is spawned, explicitly clear any CPU affinity
set on the new thread.  This prevents the thread from inadvertently
inheriting affinity from a random sibling.

Submitted by:	attilio
Tested by:	pho
MFC after:	1 week
2012-02-04 16:49:29 +00:00
hrs
43b8619604 Fix input validation in SO_SETFIB.
Reviewed by:	bz
MFC after:	1 day
2012-02-04 15:00:26 +00:00
kib
a7d7065053 Add kqueue support to /dev/klog.
Submitted by:	Mateusz Guzik <mjguzik gmail com>
PR:	  kern/156423
MFC after:	1 weeks
2012-02-01 14:34:52 +00:00
davidxu
e9edcfe7c1 If multiple threads call kevent() to get AIO events on same kqueue fd,
it is possible that a single AIO event will be reported to multiple
threads, it is not threading friendly, and the existing API can not
control this behavior.
Allocate a kevent flags field sigev_notify_kevent_flags for AIO event
notification in sigevent, and allow user to pass EV_CLEAR, EV_DISPATCH
or EV_ONESHOT to AIO kernel code, user can control whether the event
should be cleared once it is retrieved by a thread. This change should
be comptaible with existing application, because the field should have
already been zero-filled, and no additional action will be taken by
kernel.

PR:	kern/156567
2012-02-01 02:53:06 +00:00
kib
e1d70baef7 A debugger which requested PT_FOLLOW_FORK should get the notification
about new child not only when doing PT_TO_SCX, but also for PT_CONTINUE.
If TDB_FORK flag is set, always issue a stop, the same as is done for
TDB_EXEC.

Reported by:	Dmitry Mikulin <dmitrym juniper net>
MFC after:	1 week
2012-01-30 20:00:29 +00:00
jhb
d372b841c6 Refine the implementation of POSIX_FADV_NOREUSE for the read(2) case such
that instead of using direct I/O it allows read-ahead similar to
POSIX_FADV_NORMAL, but invokes VOP_ADVISE(POSIX_FADV_DONTNEED) after the
read(2) has completed to purge just-read data.  The write(2) path continues
to use direct I/O for POSIX_FADV_NOREUSE for now.  Note that NOREUSE works
optimally if an application reads and writes full fs blocks.
2012-01-30 19:35:15 +00:00
ambrisko
2e6fa96915 When detaching an AIO or LIO requests grab the lock and tell knlist_remove
that we have the lock now.  This cleans up a locking panic ASSERT when
knlist_empty is called without a lock when INVARIANTS etc. are turned.

Reviewed by:	kib jhb
MFC after:	1 week
2012-01-30 19:19:22 +00:00
kib
6392be1eb8 Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64bit
and 32bit ABIs. Also try to enable nxstacks for PAE/i386 when supported,
and some variants of powerpc32.

MFC after:	2 months (if ever)
2012-01-30 07:56:00 +00:00
attilio
1521eb4479 Avoid to check the same cache line/variable from all the locking
primitives by breaking stop_scheduler into a per-thread variable.
Also, store the new td_stopsched very close to td_*locks members as
they will be accessed mostly in the same codepaths as td_stopsched and
this results in avoiding a further cache-line pollution, possibly.

STOP_SCHEDULER() was pondered to use a new 'thread' argument, in order to
take advantage of already cached curthread, but in the end there should
not really be a performance benefit, while introducing a KPI breakage.

In collabouration with:	flo
Reviewed by:	avg
MFC after:	3 months (or never)
X-MFC:		r228424
2012-01-28 14:00:21 +00:00
glebius
ac538c830d Fix size check, that prevents getting negative after casting
to a signed type

Reviewed by:	bde
2012-01-27 08:58:58 +00:00
ken
7f685c218a Xen netback driver rewrite.
share/man/man4/Makefile,
share/man/man4/xnb.4,
sys/dev/xen/netback/netback.c,
sys/dev/xen/netback/netback_unit_tests.c:

	Rewrote the netback driver for xen to attach properly via newbus
	and work properly in both HVM and PVM mode (only HVM is tested).
	Works with the in-tree FreeBSD netfront driver or the Windows
	netfront driver from SuSE.  Has not been extensively tested with
	a Linux netfront driver.  Does not implement LRO, TSO, or
	polling.  Includes unit tests that may be run through sysctl
	after compiling with XNB_DEBUG defined.

sys/dev/xen/blkback/blkback.c,
sys/xen/interface/io/netif.h:

	Comment elaboration.

sys/kern/uipc_mbuf.c:

	Fix page fault in kernel mode when calling m_print() on a
	null mbuf.  Since m_print() is only used for debugging, there
	are no performance concerns for extra error checking code.

sys/kern/subr_scanf.c:

	Add the "hh" and "ll" width specifiers from C99 to scanf().
	A few callers were already using "ll" even though scanf()
	was handling it as "l".

Submitted by:	Alan Somers <alans@spectralogic.com>
Submitted by:	John Suykerbuyk <johns@spectralogic.com>
Sponsored by:	Spectra Logic
MFC after:	1 week
Reviewed by:	ken
2012-01-26 16:35:09 +00:00
glebius
7900947bc5 Although aio_nbytes is size_t, later is is signed to
casted types: to ssize_t in filesystem code and to
int in buf code, thus supplying a negative argument
leads to kernel panic later. To fix that check user
supplied argument in the beginning of syscall.

Submitted by:	Maxim Dounin <mdounin mdounin.ru>, maxim@
2012-01-26 11:59:48 +00:00
kib
cc993a6b75 When doing vflush(WRITECLOSE), clean vnode pages.
Unmounts do vfs_msync() before calling VFS_UNMOUNT(), but there is
still a race allowing a process to dirty pages after msync
finished. Remounts rw->ro just left dirty pages in system.

Reviewed by:	alc, tegge (long time ago)
Tested by:	pho
MFC after:	2 weeks
2012-01-25 20:54:09 +00:00
kib
6f4618881e Fix remaining calls to cache_enter() in both NFS clients to provide
appropriate timestamps.  Restore the assertions which verify that
NCF_TS is set when timestamp is asked for.

Reviewed by:  jhb (previous version)
MFC after:    2 weeks
2012-01-25 20:48:20 +00:00
trociny
e59865bdd0 Fix CTL flags in the declarations of KERN_PROC_ENV, AUXV and
PS_STRINGS sysctls: they are read only.

MFC after:	1 week
2012-01-25 20:15:58 +00:00
kib
947cf51cc2 Apparently, both nfs clients do not use cache_enter_time()
consistently, creating some namecache entries without NCF_TS flag.
This causes panic due to failed assertion.

As a temporal relief, remove the assert. Return epoch timestamp for
the entries without timestamp if asked.

While there, consolidate the code which returns timestamps, into a
helper cache_out_ts().

Discussed with:	 jhb
MFC after: 2 weeks
2012-01-23 17:09:23 +00:00
glebius
18321230d6 Convert panic()s to KASSERT()s. This is an optimisation for
hashdestroy() since in absence of INVARIANTS a compiler
will drop the entire for() cycle.
2012-01-23 16:31:46 +00:00
trociny
fcd1c36656 Change kern.proc.rlimit sysctl to:
- retrive only one, specified limit for a process, not the whole
  array, as it was previously (the sysctl has been added recently and
  has not been backported to stable yet, so this change is ok);

- allow to set a resource limit for another process.

Submitted by:	Andrey Zonov <andrey at zonov.org>
Discussed with:	kib
Reviewed by:	kib
MFC after:	2 weeks
2012-01-22 20:25:00 +00:00
pjd
32d21832e3 TDF_* flags should be used with td_flags field and TDP_* flags should be used
with td_pflags field. Correct two places where it was not the case.

Discussed with:	kib
MFC after:	1 week
2012-01-22 11:01:36 +00:00
kib
fb6370fb86 Remove the nc_time and nc_ticks elements from struct namecache, and
provide struct namecache_ts which is the old struct namecache. Only
allocate struct namecache_ts if non-null struct timespec *tsp was
passed to cache_enter_time, otherwise use struct namecache.

Change struct namecache allocation and deallocation macros into static
functions, since logic becomes somewhat twisty.  Provide accessor for
the nc_name member of struct namecache to hide difference between
struct namecache and namecache_ts.

The aim of the change is to not waste 20 bytes per small namecache
entry.

Reviewed by:	 jhb
MFC after: 2 weeks
X-MFC-note:  after r230394
2012-01-22 01:11:06 +00:00
mm
ada0b70d26 Use separate buffer for global path to avoid overflow of path buffer.
Reviewed by:	jamie@
MFC after:	3 weeks
2012-01-21 00:06:21 +00:00
jhb
f75e35e4d7 Close a race in NFS lookup processing that could result in stale name cache
entries on one client when a directory was renamed on another client.  The
root cause for the stale entry being trusted is that each per-vnode nfsnode
structure has a single 'n_ctime' timestamp used to validate positive name
cache entries.  However, if there are multiple entries for a single vnode,
they all share a single timestamp.  To fix this, extend the name cache
to allow filesystems to optionally store a timestamp value in each name
cache entry.  The NFS clients now fetch the timestamp associated with
each name cache entry and use that to validate cache hits instead of the
timestamps previously stored in the nfsnode.  Another part of the fix is
that the NFS clients now use timestamps from the post-op attributes of
RPCs when adding name cache entries rather than pulling the timestamps out
of the file's attribute cache.  The latter is subject to races with other
lookups updating the attribute cache concurrently.  Some more details:
- Add a variant of nfsm_postop_attr() to the old NFS client that can return
  a vattr structure with a copy of the post-op attributes.
- Handle lookups of "." as a special case in the NFS clients since the name
  cache does not store name cache entries for ".", so we cannot get a
  useful timestamp.  It didn't really make much sense to recheck the
  attributes on the the directory to validate the namecache hit for "."
  anyway.
- ABI compat shims for the name cache routines are present in this commit
  so that it is safe to MFC.

MFC after:	2 weeks
2012-01-20 20:02:01 +00:00
kib
35f031ce17 Use shared lock for the executable vnode in the exec path after the
VV_TEXT changes are handled. Assert that vnode is exclusively locked at
the places that modify VV_TEXT.

Discussed with:	alc
MFC after:	3 weeks
2012-01-19 23:03:31 +00:00
alc
091f2726d5 Explain why it is safe to unlock the vnode.
Requested by:	kib
2012-01-17 16:20:50 +00:00
mckusick
af2e331939 Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks
2012-01-17 01:08:01 +00:00
alc
5210c69a89 Improve abstraction. Eliminate direct access by elf*_load_section()
to an OBJT_VNODE-specific field of the vm object.  The same
information can be just as easily obtained from the struct vattr that
is in struct image_params if the latter is passed to
elf*_load_section().  Moreover, by replacing the vmspace and vm
object parameters to elf*_load_section() with a struct image_params
parameter, we actually reduce the size of the object code.

In collaboration with:	kib
2012-01-17 00:27:32 +00:00
pluknet
b3d0f7050a Be pedantic and change // comment to C-style one.
Noticed by:		Bruce Evans
2012-01-16 20:42:56 +00:00
kevlo
c0b68d117e Fix a style bug
Spotted by:	avg
2012-01-16 14:54:48 +00:00
davidxu
0483748480 Eliminate branch and insert an explicit reader memory barrier to ensure
that waiter bit is set before reading semaphore count.
2012-01-16 04:39:10 +00:00
trociny
d4e71152bd Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want
to read strings completely to know the actual size.

As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.

Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted procces stack.

Suggested by:	kib
MFC after:	3 days
2012-01-15 18:47:24 +00:00
mm
2621071309 Fix missing in r230129:
kern_jail.c: initialize fullpath_disabled to zero
vfs_cache.c: add missing dot in comment

Reported by:	kib
MFC after:	1 month
2012-01-15 18:08:15 +00:00
uqs
d61d88a310 Convert files to UTF-8 2012-01-15 13:23:18 +00:00
mm
9f44ed5ca8 Introduce vn_path_to_global_path()
This function updates path string to vnode's full global path and checks
the size of the new path string against the pathlen argument.

In vfs_domount(), sys_unmount() and kern_jail_set() this new function
is used to update the supplied path argument to the respective global path.

Unbreaks jailed zfs(8) with enforce_statfs set to 1.

Reviewed by:	kib
MFC after:	1 month
2012-01-15 12:08:20 +00:00
eadler
e07bec5a9c - Fix undefined behavior when device_get_name is null
- Make error message more informative

PR:		kern/149800
Submitted by:	olgeni
Approved by:	cperciva
MFC after:	1 week
2012-01-15 07:09:18 +00:00