Commit Graph

13106 Commits

Author SHA1 Message Date
Alexander Motin
1af19ee4a2 Add support for good old 8192Hz profiling clock to software PMC.
Reviewed by:	fabient
2013-02-26 18:13:42 +00:00
Attilio Rao
590f9303e5 Merge from vmobj-rwlock branch:
Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h.

Sponsored by:	EMC / Isilon storage division
Tested by:	pho
Reviewed by:	alc
2013-02-26 01:00:11 +00:00
Pawel Jakub Dawidek
1d59211b2e Style.
Suggested by:	kib
2013-02-25 20:51:29 +00:00
Pawel Jakub Dawidek
893365e42d After r237012, the fdgrowtable() doesn't drop the filedesc lock anymore,
so update a stale comment.

Reviewed by:	kib, keramida
2013-02-25 20:50:08 +00:00
John Baldwin
593efaf9f7 Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep().  In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing.  Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag.  Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed.  For now, only the NFS clients are set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
  the request as being on behalf of the thread performing the lookup
  (cnp_thread) rather than using a NULL thread pointer.  This causes
  NFS to properly handle signals during this VOP on an interruptible
  mount.

PR:		kern/176179
Reported by:	Russell Cattelan (sigdeferstop() recursion)
Reviewed by:	kib
MFC after:	1 month
2013-02-21 19:02:50 +00:00
Jamie Gritton
ffc72591b1 Don't worry if a module is already loaded when looking for a fstype to mount
(possible in a race condition).

Reviewed by:	kib
MFC after:	1 week
2013-02-21 02:41:37 +00:00
John Baldwin
353374b525 Fix a few typos. 2013-02-19 16:35:27 +00:00
Pawel Jakub Dawidek
b2e054b0d4 Update the comment: we do show the backtrace of misbehaving thread. 2013-02-17 21:37:32 +00:00
Pawel Jakub Dawidek
f0ad2ecb9c Style. 2013-02-17 11:56:36 +00:00
Pawel Jakub Dawidek
8e1d51ab40 - Require CAP_FSYNC capability right when opening a file with O_SYNC or O_FSYNC
flags.
- While here simplify check for locking flags.

Sponsored by:	The FreeBSD Foundation
2013-02-17 11:53:51 +00:00
Pawel Jakub Dawidek
11b0cfe3cd Remove redundant parenthesis. 2013-02-17 11:49:21 +00:00
Pawel Jakub Dawidek
49549b1894 Remove redundant space. 2013-02-17 11:48:16 +00:00
Pawel Jakub Dawidek
6c08be2b88 Add break to the default case. 2013-02-17 11:47:58 +00:00
Pawel Jakub Dawidek
4881a5950e Don't treat pointers as booleans. 2013-02-17 11:47:30 +00:00
Pawel Jakub Dawidek
de26549841 Remove redundant parenthesis. 2013-02-17 11:47:01 +00:00
Kirk McKusick
2bc1a1fe5c Add barrier write capability to the VFS buffer interface. A barrier
write is a disk write request that tells the disk that the buffer
being written must be committed to the media along with any writes
that preceeded it before any future blocks may be written to the drive.

Barrier writes are provided by adding the functions bbarrierwrite
(bwrite with barrier) and babarrierwrite (bawrite with barrier).

Following a bbarrierwrite the client knows that the requested buffer
is on the media. It does not ensure that buffers written before that
buffer are on the media. It only ensure that buffers written before
that buffer will get to the media before any buffers written after
that buffer. A flush command must be sent to the disk to ensure that
all earlier written buffers are on the media.

Reviewed by: kib
Tested by:   Peter Holm
2013-02-16 14:51:30 +00:00
Ian Lepore
a1137de941 Add PPS_CANWAIT support for time_pps_fetch(). This adds support for all three
blocking modes described in section 3.4.3 of RFC 2783, allowing the caller
to retrieve the most recent values without blocking, to block for a specified
time, or to block forever.

Reviewed by:	discussion on hackers@
2013-02-15 18:30:32 +00:00
Sergey Kandaurov
d7ffa24831 vn_io_faults_cnt:
- use u_long consistently
- use SYSCTL_ULONG to match the type of variable

Reviewed by:	kib
MFC after:	1 week
2013-02-15 14:22:05 +00:00
Sergey Kandaurov
ab15d8039e Add support of passing SCM_BINTIME ancillary data object for PF_LOCAL
sockets.

PR:		kern/175883
Submitted by:	Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Discussed with:	glebius, phk
MFC after:	2 weeks
2013-02-15 13:00:20 +00:00
Ian Lepore
74938cbb7f Make the F_READAHEAD option to fcntl(2) work as documented: a value of zero
now disables read-ahead.  It used to effectively restore the system default
readahead hueristic if it had been changed; a negative value now restores
the default.

Reviewed by:	kib
2013-02-13 15:09:16 +00:00
Konstantin Belousov
dd0b4fb6d5 Reform the busdma API so that new types may be added without modifying
every architecture's busdma_machdep.c.  It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code.  The MD busdma is then given a chance to do any final processing
in the complete() callback.

The cam changes unify the bus_dmamap_load* handling in cam drivers.

The arm and mips implementations are updated to track virtual
addresses for sync().  Previously this was done in a type specific
way.  Now it is done in a generic way by recording the list of
virtuals in the map.

Submitted by:	jeff (sponsored by EMC/Isilon)
Reviewed by:	kan (previous version), scottl,
	mjacob (isp(4), no objections for target mode changes)
Discussed with:	     ian (arm changes)
Tested by:	marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
	amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
2013-02-12 16:57:20 +00:00
Marius Strobl
18716f9f4b Update comments to reflect r246689. 2013-02-11 23:05:10 +00:00
Marius Strobl
bdc5f0172e Make SYSCTL_{LONG,QUAD,ULONG,UQUAD}(9) work as advertised and also handle
constant values.

Reviewed by:	kib
MFC after:	3 days
2013-02-11 21:50:00 +00:00
Konstantin Belousov
2871baa49a Remove the ia64-specific code fragment, which effect is more cleanly
done by the call to trans_prot() function a line before.

Discussed with:	Oliver Pinter <oliver.pntr@gmail.com>
MFC after:	1 week
2013-02-10 20:08:33 +00:00
Andriy Gapon
c43b08dc6c ktr: correctly handle possible wrap-around in the boot buffer
Older entries should be 'before' newer entries in the new buffer too
and there should be no zero-filled gap between them.

Pointed out by:	jhb
MFC after:	3 days
X-MFC with:	r246282
2013-02-08 07:29:07 +00:00
Konstantin Belousov
888d4d4f86 When vforked child is traced, the debugging events are not generated
until child performs exec().  The behaviour is reasonable when a
debugger is the real parent, because the parent is stopped until
exec(), and sending a debugging event to the debugger would deadlock
both parent and child.

On the other hand, when debugger is not the parent of the vforked
child, not sending debugging signals makes it impossible to debug
across vfork.

Fix the issue by declining generating debug signals only when vfork()
was done and child called ptrace(PT_TRACEME).  Set a new process flag
P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is
set, which indicates that the process was created with vfork() and
still did not execed. Check P_PPTRACE from issignal(), instead of
refusing the trace outright for the P_PPWAIT case.  The scope of
P_PPTRACE is exactly contained in the scope of P_PPWAIT.

Found and tested by:  zont
Reviewed by:	pluknet
MFC after:	2 weeks
2013-02-07 15:34:22 +00:00
Konstantin Belousov
2ca4998342 Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.

For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.

Noted and reviewed by:	jilles
Discussed with:	bde
MFC after:	2 weeks
2013-02-07 14:53:33 +00:00
Neel Natu
dae3dc73f6 If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.

The text used above is verbatim from r195249 and the code should now be
in line with the intent of that commit.
2013-02-07 06:48:47 +00:00
Pawel Jakub Dawidek
fbda3d5dae Audit sockaddr argument for bind(2), connect(2), accept(2), sendto(2) and
recvfrom(2) syscalls.

Sponsored by:	The FreeBSD Foundation
2013-02-07 00:36:00 +00:00
Pawel Jakub Dawidek
82b316b377 Minor style tweaks. 2013-02-07 00:27:11 +00:00
John Baldwin
a120a7a3cd Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary.  However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly.  Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks).  The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland.  This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by:	kib
MFC after:	1 month
2013-02-06 17:06:51 +00:00
Sergey Kandaurov
23c053d6a2 Prezero the acl structure which is to be copied to usermode, to avoid
leakage of the previous content of padding and unitialized fields.

Reported by:	Ilia Noskov <noskov@nic.ru>
Reviewed by:	kib
MFC after:	1 week
2013-02-06 15:18:46 +00:00
Sergey Kandaurov
51dc4fea4c Remove reference to the rlist code from comments, and fix a typo visible
in the resulted change.

Reviewed by:	kib
MFC after:	1 week
2013-02-05 20:08:33 +00:00
Andriy Gapon
c8199bc955 ktr: prevent possible footshooting with KTR_ENTRIES and KTR_BOOT_ENTRIES
Suggested by:	adrian
MFC after:	14 days
X-MFC with:	r246282
2013-02-04 21:58:57 +00:00
Andriy Gapon
f85ed12497 ktr: copy content from the early static buffer if KTR_ENTRIES !=
KTR_BOOT_ENTRIES

Reported by:	glebius, jhb
Pointyhat to:	avg
MFC after:	14 days
X-MFC with:	r246282
2013-02-04 21:50:55 +00:00
Marius Strobl
94bfd5b1a0 Try to improve r242655 take III: move these SYSCTLs describing the kernel
map, which is defined and initialized in vm/vm_kern.c, to the latter.

Submitted by:	alc
2013-02-04 09:35:48 +00:00
Marius Strobl
e8cbe54bc4 Further improve r242655 and supply VM_{MIN,MAX}_KERNEL_ADDRESS as constant
values to SYSCTL_ULONG(9) where possible.

Submitted by:	bde
2013-02-03 21:43:55 +00:00
Andriy Gapon
36b7dde416 allow for large KTR_ENTRIES values by allocating ktr_buf using malloc(9)
Only during very early boot, before malloc(9) is functional (SI_SUB_KMEM),
the static ktr_buf_init is used.  Size of the static buffer is determined
by a new kernel option KTR_BOOT_ENTRIES.  Its default value is 1024.

This commit builds on top of r243046.

Reviewed by:	alc
MFC after:	17 days
2013-02-03 09:57:39 +00:00
Andriy Gapon
8eede5c4d9 fix some fat-fingering in r246246
Submitted by:	mjg
Pointyhat to:	avg
MFC after:	5 days
X-MFC with:	r246246
2013-02-02 14:19:50 +00:00
Andriy Gapon
bfdcb3bcba print compiler version in the kernel banner
And provide kernel compiler version as a sysctl as well.
This is useful while we have gcc and clang cohabitation.
This could be even more useful when we have support
for external toolchains.

In cooperation with:	mjg
MFC after:		13 days
2013-02-02 11:58:35 +00:00
Grzegorz Bernacki
2d7d16429c Get time of next event from other cores only if SMP is already started.
Reviewed by: mav
Obtained from: Semihalf
2013-02-01 11:39:03 +00:00
Pawel Jakub Dawidek
4bbe7b0c20 Now that MPSAFE flag is gone, we can arrange code a bit better. 2013-01-31 22:20:05 +00:00
Pawel Jakub Dawidek
b108953c6f Remove leftover label after Giant removal from VFS. 2013-01-31 22:15:41 +00:00
Pawel Jakub Dawidek
a2c496ebb9 Remove label that was accidentally moved during Giant removal from VFS. 2013-01-31 22:14:16 +00:00
Pawel Jakub Dawidek
9e2677fd6d Simplify code a bit. This is leftover after Giant removal from VFS. 2013-01-31 22:12:48 +00:00
Konstantin Belousov
538375d42e The case of pid == WAIT_MYPGRP for the kern_wait() is already handled
in kern_wait6(), which is called by kern_wait().  Remove the redundand
check, introduced in r243136, and add a comment noting this, to make
the code less confusing.

The blank lines are added to properly delineate the scope of the
preceeding comments.

Noted by:	"Jukka A. Ukkonen" <jau@iki.fi>
MFC after:	1 week
2013-01-30 13:14:15 +00:00
John Baldwin
a8df530ddc Mark 'ticks', 'time_second', and 'time_uptime' as volatile to prevent the
compiler from caching their values in tight loops.

Reviewed by:	bde
MFC after:	1 week
2013-01-28 19:38:13 +00:00
Gleb Smirnoff
29110f87a6 - Move large functions m_getjcl() and m_get2() to kern/uipc_mbuf.c
- style(9) fixes to mbuf.h

Reviewed by:	bde
2013-01-24 09:29:41 +00:00
John Baldwin
75d774e36a Fix a typo. 2013-01-23 14:37:05 +00:00
Andre Oppermann
371407162b Move the mbuf memory limit calculations from init_param2() to
tunable_mbinit() where it is next to where it is used later.

Change the sysinit level of tunable_mbinit() from SI_SUB_TUNABLES
to SI_SUB_KMEM after the VM is running.  This allows to use better
methods to determine the effectively available physical and virtual
memory available to the kernel.

Update comments.

In a second step it can be merged into mbuf_init().
2013-01-17 21:28:31 +00:00
Alfred Perlstein
17ebe960a6 Do not autotune ncallout to be greater than 18508.
When maxusers was unrestricted and maxfiles was allowed to autotune
much higher the result was that ncallout which was based on maxfiles
and maxproc grew much higher than was needed.

To fix this clip autotuning to the same number we would get with
the old maxusers algorithm which would stop scaling at 384
maxusers.

Growing ncalout higher is not likely to be needed since most consumers
of timeout(9) are gone and any higher value for ncallout causes the
callwheel hashes to be much larger than will even be needed for
most applications.

MFC after:      1 month

Reviewed by:	mav
2013-01-15 19:26:17 +00:00
Andrey Zonov
0165f660c1 - Detect when we are in KVM.
Silence on:	emulation
Approved by:	kib (mentor)
MFC after:	1 week
2013-01-15 14:05:59 +00:00
Konstantin Belousov
10b4bb0b33 Add a trivial comment to record the proper commit log for r245407:
Set the v_hash for a new vnode in the getnewvnode() to the value
calculated based on the vnode structure address.  Filesystems using
vfs_hash_insert() override the v_hash using the standard formula of
(inode_number + mnt_hashseed).  For other filesystems, the
initialization allows the vfs_hash_index() to provide useful hash too.

Suggested, reviewed and tested by:	peter
Sponsored by:	The FreeBSD Foundation
MFC after:	5 days
2013-01-14 05:52:23 +00:00
Konstantin Belousov
a41df84820 diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 7c243b6..0bdaf36 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -279,6 +279,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW,
 #define VSHOULDFREE(vp) (!((vp)->v_iflag & VI_FREE) && !(vp)->v_holdcnt)
 #define VSHOULDBUSY(vp) (((vp)->v_iflag & VI_FREE) && (vp)->v_holdcnt)

+static int vnsz2log;

 /*
  * Initialize the vnode management data structures.
@@ -293,6 +294,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW,
 static void
 vntblinit(void *dummy __unused)
 {
+	u_int i;
 	int physvnodes, virtvnodes;

 	/*
@@ -332,6 +334,9 @@ vntblinit(void *dummy __unused)
 	syncer_maxdelay = syncer_mask + 1;
 	mtx_init(&sync_mtx, "Syncer mtx", NULL, MTX_DEF);
 	cv_init(&sync_wakeup, "syncer");
+	for (i = 1; i <= sizeof(struct vnode); i <<= 1)
+		vnsz2log++;
+	vnsz2log--;
 }
 SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_FIRST, vntblinit, NULL);

@@ -1067,6 +1072,14 @@ alloc:
 	}
 	rangelock_init(&vp->v_rl);

+	/*
+	 * For the filesystems which do not use vfs_hash_insert(),
+	 * still initialize v_hash to have vfs_hash_index() useful.
+	 * E.g., nullfs uses vfs_hash_index() on the lower vnode for
+	 * its own hashing.
+	 */
+	vp->v_hash = (uintptr_t)vp >> vnsz2log;
+
 	*vpp = vp;
 	return (0);
 }
2013-01-14 05:42:54 +00:00
Konstantin Belousov
f6af8e375c Add exported vfs_hash_index() function, which calculates the canonical
pre-masked hash for the given vnode.  The function assumes that
vp->v_hash is initialized by the filesystem vnode instantiation
function.  At the moment, it is only done if filesystem uses
vfs_hash_insert().

Reviewed by:	peter
Tested by:	peter, pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	5 days
2013-01-14 05:41:40 +00:00
Konstantin Belousov
7b982bc831 Rename vfs_hash_index() to vfs_hash_bucket().
Reviewed by:	peter
Tested by:	peter, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	5 days
2013-01-14 05:40:21 +00:00
Konstantin Belousov
ddd6b3fc33 Add flags argument to vfs_write_resume() and remove
vfs_write_resume_flags().

Sponsored by:	The FreeBSD Foundation
2013-01-11 06:08:32 +00:00
Mateusz Guzik
43287e2753 lockmgr: unlock interlock (if requested) when dealing with upgrade/downgrade
requests for LK_NOSHARE locks, just like for shared locks.

PR:		kern/174969
Reviewed by:	attilio
MFC after:	1 week
2013-01-06 21:47:59 +00:00
Konstantin Belousov
a545089ed5 Protect the p->p_pgrp dereference with the process lock.
MFC after:	3 days
2013-01-06 15:10:10 +00:00
Neel Natu
a09580e7a3 Teach the kernel to recognize that it is executing inside a bhyve virtual
machine.

Obtained from:	NetApp
2013-01-05 19:18:50 +00:00
Benjamin Kaduk
5e9723e271 Fix some minor inaccuracies introduced in r243251.
Also correct the comment in kern_synch.c which was the source of the
problematic text.

Reviewed by:	kib (previous version)
Approved by:	hrs (mentor)
2013-01-05 00:23:26 +00:00
David Xu
eea8d86d4d Revert revision 244760 because strncpy pads trailing space with zero,
this prevents kernel data from being leaked.

Noticed by: Joerg Sonnenberger &lt; joerg at britannica dot bec dot de &gt;
2013-01-04 11:11:12 +00:00
Konstantin Belousov
d1c5e3f8b0 Remove the deprecated MNT_VNODE_FOREACH interface. Use the
MNT_VNODE_FOREACH_ALL instead.
2013-01-03 19:02:52 +00:00
Konstantin Belousov
f99cb34c4f The process_deferred_inactive() function locks the vnodes of the ufs
mount, which means that is must not be called while the snaplock is
owned.  The vfs_write_resume(9) does call the function as the
VFS_SUSP_CLEAN() method, which is too early and falls into the region
still protected by snaplock.

Add yet another flag for the vfs_write_resume_flags() to avoid calling
suspension cleanup handler after the suspend is lifted, and use it in
the ffs_snapshot() call to vfs_write_resume.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-01-01 16:14:48 +00:00
Konstantin Belousov
91e9474552 Make it possible to atomically resume writes on the mount and account
the write start, by adding a variation of the vfs_write_resume(9)
which accepts flags.

Use the new function to prevent a deadlock between parallel suspension
and snapshotting a UFS mount.  The ffs_snapshot() code performed
vfs_write_resume() followed by vn_start_write() while owning the
snaplock.  If the suspension intervene between resume and
vn_start_write(), the deadlock occured after the suspending thread
tried to lock the snaplock, most typically during the write in the
ffs_copyonwrite().

Reported and tested by:	Andreas Longwitz <longwitz@incore.de>
Reviewed by:	mckusick
MFC after:	2 weeks
X-MFC-note:	make the vfs_write_resume(9) function a macro after the MFC,
	in HEAD
2012-12-28 23:08:30 +00:00
Oleksandr Tymoshenko
7fc3ae51f3 Fix build on ARM (and probably other platforms) 2012-12-28 06:52:53 +00:00
David Xu
9d4bf0db7c Use strlcpy to NULL-terminate error message even if user provided a short
buffer.
2012-12-28 02:43:33 +00:00
Attilio Rao
c92c859b7b Fixup r244240: mp_ncpus will be 1 also in the !SMP and smp_disabled=1
case. There is no point in optimizing further the code and use a TRUE
litteral for a path that does heavyweight stuff anyway (like lock acq),
at the price of obfuscated code.

Use the appropriate check where necessary and remove a macro.

Sponsored by:	EMC / Isilon storage division
MFC after:	3 days
2012-12-26 15:20:32 +00:00
Konstantin Belousov
ad9789f6db Do not force a writer to the devfs file to drain the buffer writes.
Requested and tested by:	Ian Lepore <freebsd@damnhippie.dyndns.org>
MFC after:	2 weeks
2012-12-23 22:43:27 +00:00
Jaakko Heinonen
b1e1f725e7 Reject spaces and double quotation marks in device names. devctl(4)
and devd(8) can't handle names with such characters properly.

PR:		bin/144736, kern/161912
Discussed with:	imp, kib, pjd
2012-12-22 13:33:28 +00:00
Attilio Rao
cd2fe4e632 Fixup r240424: On entering KDB backends, the hijacked thread to run
interrupt context can still be idlethread. At that point, without the
panic condition, it can still happen that idlethread then will try to
acquire some locks to carry on some operations.

Skip the idlethread check on block/sleep lock operations when KDB is
active.

Reported by:	jh
Tested by:	jh
MFC after:	1 week
2012-12-22 09:37:34 +00:00
Attilio Rao
b1308d72c2 Fixup r218424: uio_yield() was scaling directly to userland priority.
When kern_yield() was introduced with the possibility to specify
a new priority, the behaviour changed by not lowering priority at all
in the consumers, making the yielding mechanism highly ineffective for
high priority kthreads like bufdaemon, syncer, vlrudaemon, etc.
There are no evidences that consumers could bear with such change in
semantic and this situation could finally lead to bugs similar to the
ones fixed in r244240.
Re-specify userland pri for kthreads involved.

Tested by:	pho
Reviewed by:	kib, mdf
MFC after:	1 week
2012-12-21 13:14:12 +00:00
Dag-Erling Smørgrav
b5471c918f Rewrite fdgrowtable() so common mortals can actually understand what
it does and how, and add comments describing the data structures and
explaining how they are managed.
2012-12-20 20:18:27 +00:00
Olivier Houchard
05d9035003 Create an architecture-agnostic buffer pool manager that uses uma(9) to
manage a set of power-of-2 sized buffers for bus_dmamem_alloc().

This allows the caller to provide the back-end allocator uma allocator,
allowing full control of the memory pages backing the pool.  For
convenience, it provides an optional builtin allocator that provides pages
allocated with the VM_MEMATTR_UNCACHEABLE attribute, for managing pools of
DMA buffers for BUS_DMA_COHERENT or BUS_DMA_NOCACHE.

This also allows the caller to specify a minimum alignment, and it ensures
that all buffers start on a boundary and have a length that's a multiple of
that value, to avoid using buffers that trigger partial cache line flushes.

Submitted by:	Ian Lepore <freebsd@damnhippie.dyndns.org>
2012-12-20 00:34:54 +00:00
Pawel Jakub Dawidek
c345faea5a Replace expand_name() function with corefile_open() function, which not
only returns name, but also vnode of corefile to use.

This simplifies the code and closes few races, especially in %I handling.

Reviewed by:	kib
Obtained from:	WHEEL Systems
2012-12-19 23:59:48 +00:00
Pawel Jakub Dawidek
22a5d85aa9 Use correct file permissions when looking for available core file if
kern.corefile contains %I.

Obtained from:	WHEEL Systems
2012-12-19 23:40:02 +00:00
Jeff Roberson
4c44811c9d - Add new machine parsable KTR macros for timing events.
- Use this new format to automatically handle syscalls and VOPs.  This
   changes the earlier format but is still human readable.

Sponsored by:	EMC / Isilon Storage Division
2012-12-19 20:10:00 +00:00
Jeff Roberson
5b39d5c739 - Correctly handle EWOULDBLOCK in quiesce_cpus
Discussed with:	mav
2012-12-19 20:08:06 +00:00
Pawel Jakub Dawidek
07a8e07896 The 'flags' argument can be modified in vn_open_cred(), so we need to
set it for every loop interation.

Pointed out by:	kib
2012-12-19 12:14:08 +00:00
Pawel Jakub Dawidek
cc58032c44 Do not audit paths we try when kern.corefile contains %I.
Obtained from:	WHEEL Systems
2012-12-19 12:12:53 +00:00
Pawel Jakub Dawidek
29146f1a7a Style cleanups. 2012-12-19 12:10:14 +00:00
Pawel Jakub Dawidek
086053a370 The expand_name() function isn't called with the process lock held anymore,
so we can safely use malloc(M_WAITOK) now.

Pointed out by:	kib
2012-12-19 12:00:09 +00:00
Mateusz Guzik
af3c786c47 prison_racct_detach can be called for not fully initialized jail, so make it check that the jail has racct before doing anything
PR:		kern/174436
Reviewed by:	trasz
MFC after:	3 days
2012-12-18 18:34:36 +00:00
Andrey Zonov
5eb0d2838c - Add sysctl to allow unprivileged users to call mlock(2)-family system
calls and turn it on.
- Do not allow to call them inside jail. [1]

Pointed out by:	trasz [1]
Reviewed by:	avg
Approved by:	kib (mentor)
MFC after:	1 week
2012-12-18 07:36:45 +00:00
Pawel Jakub Dawidek
f06f465db7 Minor style tweaks.
Obtained from:	WHEEL Systems
2012-12-17 10:51:22 +00:00
Pawel Jakub Dawidek
c52ff61196 Better variables naming in expand_name() to be more consistent with coredump().
Obtained from:	WHEEL Systems
2012-12-17 10:48:10 +00:00
Pawel Jakub Dawidek
dd57ce87eb Move expand_name() after process lock is released.
This fixed panic where we hold mutex (process lock) and try to obtain sleepable
lock (vnode lock in expand_name()). The panic could occur when %I was used
in kern.corefile.

Additionally we avoid expand_name() overhead when coredumps are disabled.

Obtained from:	WHEEL Systems
2012-12-16 14:53:27 +00:00
Pawel Jakub Dawidek
2ce1b32df2 Don't add audit record when coredumps are disabled or name cannot be expanded.
Discussed with:	rwatson
Obtained from:	WHEEL Systems
2012-12-16 14:24:59 +00:00
Pawel Jakub Dawidek
7e73ee85ab Make the check easier to read.
Obtained from:	WHEEL Systems
2012-12-16 14:14:18 +00:00
Pawel Jakub Dawidek
b039f8c2aa Use 'cred' variable.
Obtained from:	WHEEL Systems
2012-12-16 13:56:38 +00:00
Konstantin Belousov
14df601e47 When mnt_vnode_next_active iterator cannot lock the next vnode and
yields, specify the user priority for the yield.  Otherwise, a
higher-priority (kernel) thread could fall into the priority-inversion
with the thread owning the mutex lock.

On single-processor machines or UP kernels, do not loop adaptively
when the next vnode cannot be locked, instead yield unconditionally.

Restructure the iteration initializer and the iterator to remove code
duplication.  Put the code to fetch and lock a vnode next to the
current marker, into the mnt_vnode_next_active() function, and use it
instead of repeating the loop.

Reported by:	hrs, rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2012-12-15 02:04:46 +00:00
Konstantin Belousov
d4015944e7 Remove a special case for XEN, which is erronous and makes vfork(2)
behaviour to differ from the documented, only on XEN.  If there are
any issues with XEN pmap left, they should be fixed in pmap.

MFC after:	2 weeks
2012-12-15 02:02:11 +00:00
Rick Macklem
f1c4014cd5 The group list for a non-default export entry (a host/subnet one)
was being copied from the wrong place. This patch fixes that.
This could cause access failures for mapped users, when the group
permissions were needed.

PR:		147998
Submitted by:	Christopher Key (cjk32 at cam.ac.uk)
MFC after:	2 weeks
2012-12-14 21:49:06 +00:00
Alfred Perlstein
15d32bd543 Cleanup more of the kassert_panic.
fix compile warnings on !amd64 and NULL derefs that would happen
if kassert_panic() would return.
2012-12-11 07:08:14 +00:00
Alfred Perlstein
c2c5ede903 Fix WITNESS when INVARIANT_SUPPORT is defined.
This fixes tinderbox breakage from r244105.

Pointed out by: adrian
2012-12-11 05:59:16 +00:00
Alfred Perlstein
6b6bd3b704 Switch the hardwired WITNESS panics to kassert_panic.
This is an ongoing effort to provide runtime debug information
useful in the field that does not panic existing installations.

This gives us the flexibility needed when shipping images to a
potentially large audience with WITNESS enabled without worrying
about formerly non-fatal LORs hurting a release.

Sponsored by: iXsystems
2012-12-11 01:23:50 +00:00
Alfred Perlstein
d3bfafb4f6 back out half of 244098.
kern.bootfile needs to be rw for installkernel.

Pointed out by: kib, flo
2012-12-11 00:10:20 +00:00
Alfred Perlstein
a94053ba39 allow KASSERT to enter KDB. 2012-12-10 23:11:26 +00:00
Alfred Perlstein
d06cadae1e make sysctls kern.{bootfile,conftxt} read-only
MFC after:	1 month
2012-12-10 23:09:55 +00:00
Konstantin Belousov
686ffcaceb Do not yield while owning a mutex. The Giant reacquire in the
kern_yield() is problematic than.

The owned mutex is the mount interlock, and it is in fact not needed
to guarantee the stability of the mount list of active vnodes, so fix
the the issue by only taking the mount interlock for MNT_REF and
MNT_REL operations.

While there, augment the unconditional yield by some amount of
spinning [1].

Reported and tested by:	pho
Reviewed by:	attilio
Submitted by:	attilio [1]
MFC after:	3 days
2012-12-10 20:44:09 +00:00