14037 Commits

Author SHA1 Message Date
kib
c1b643c449 Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount.
Since VFS does not/cannot stop writes, sync might run indefinitely, or
be a wrong thing to do at all.  E. g. NFS ignores VFS_SYNC() for
forced unmounts, since non-responding server does not allow sync to
finish.  On the other hand, filesystems can and do stop writes using
fs-specific facilities, and should already fully flush caches in
VFS_UNMOUNT() due to the race.

Adjust msdosfs tp sync in unmount for forced call, to accomodate the
new behaviour.  Note that it is still racy, since writes are not
stopped.

Discussed with:	avg, bjk, mckusick
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2014-12-09 10:00:47 +00:00
kib
86353785aa Apply chunk forgotten in r275620. Remove local variable for real.
CID:	1257462
Sponsored by:	The FreeBSD Foundation
2014-12-09 09:36:28 +00:00
kib
1a8d4344d0 Add functions syncer_suspend() and syncer_resume(), which are supposed
to be called before suspension and after resume, correspondingly.  The
syncer_suspend() ensures that all filesystems dirty data and metadata
are saved to the permanent storage, and stops kernel threads which
might modify filesystems.  The syncer_resume() restores stopped
threads.

For now, only syncer is stopped.  This is needed, because each sync
loop causes superblock updates for UFS.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:48:57 +00:00
kib
0957112e3c When getnewbuf_reuse_bp() is called to reclaim some (clean) buffer,
the vnode owning the buffer is not locked.  More, it cannot be locked
safely, since getnewbuf_reuse_bp() is called from newbuf(), and some
other vnode is already locked, for which reused buffer will be
reassigned.

As the consequence, reclamation of the owning vnode could go in
parallel, in particular, the call to vnode_destroy_vobject(), which
deallocates the vm object and zeroes the v_bufobj->bo_object.  Note
that the pages wired by the buffer are left wired and can be safely
freed by the vfs_vmio_release() without the need for the vm object
lock.  Also, seeing stale pointer to the v_object is safe due to vm
object type stability.

Check for bo_bufobj != NULL and cache the value in local variable to
avoid trying to lock NULL vm object.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:42:34 +00:00
kib
6ad3754659 Do some refactoring and minor cleanups of the thread_single() code in
preparation for the global stop commit.

Move the code to weed suspended or sleeping threads into the
appropriate state, into the helper weed_inhib().  Current code already
has deep nesting and hard to follow [1].

Add currently useless helper remain_for_mode(), which returns the
count of threads which are allowed to run, according to the
single-threading mode.

In thread_single_end(), do not save curthread into local variable, it
is unused after, except to find curproc.

Remove stray empty line.

Requested by:	avg [1]
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:27:43 +00:00
kib
042604e2ce Thread waiting for the vfork(2)-ed child to exec or exit, must allow
for the suspension.

Currently, the loop performs uninterruptible cv_wait(9) call, which
prevents suspension until child allows further execution of parent.
If child is stopped, suspension or single-threading is delayed
indefinitely.

Create a helper thread_suspend_check_needed() to identify the need for
a call to thread_suspend_check().  It is required since call to the
thread_suspend_check() cannot be safely done while owning the child
(p2) process lock.  Only when suspension is needed, drop p2 lock and
call thread_suspend_check().  Perform wait for cv with timeout, in
case suspend is requested after wait started; I do not see a better
way to interrupt the wait.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:18:05 +00:00
kib
efcd39f691 When process is exiting, check for suspension regardless of
multithreaded status of the process.

The stopped state must be cleared before P_WEXIT is set.  A stop
signal delivered just before first PROC_LOCK() block in exit1(9) would
put the process into pending stop with P_WEXIT set or assertion
triggered.  Also recheck for the suspension after failed
thread_single(9) call, since process lock could be dropped.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:02:02 +00:00
avg
15bfe2d262 remove opensolaris cyclic code, replace with high-precision callouts
In the old days callout(9) had 1 tick precision and that was inadequate
for some uses, e.g. DTrace profile module, so we had to emulate cyclic
API and behavior.  Now we can directly use callout(9) in the very few
places where cyclic was used.

Differential Revision:	https://reviews.freebsd.org/D1161
Reviewed by:	gnn, jhb, markj
MFC after:	2 weeks
2014-12-07 11:21:41 +00:00
imp
46a8e0560d Const poison in a few places to ensure we don't modify things
through the module data pointer.
2014-12-03 22:14:13 +00:00
jhb
1e8b1cd510 Revert device_getenv_int() for now as it duplicates resource_int_value().
We should perhaps implement a device_getenv_*() and device_setenv_*() API
as a convenience wrapper on top of resource_*_value() and resource_set_*().
2014-12-03 15:29:53 +00:00
kib
1941346c9d Disable recursion for the process spinlock.
Tested by:	pho
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2014-12-01 17:36:10 +00:00
gibbs
36e4267804 Remove trailing whitespace. 2014-11-30 19:32:00 +00:00
glebius
3dd3d6d9ff Merge from projects/sendfile:
Provide pru_ready for AF_LOCAL sockets.  Local sockets sendsdata directly
to the receive buffer of the peer, thus pru_ready also works on the peer
socket.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-30 13:40:58 +00:00
glebius
9cadf1b974 Merge from projects/sendfile: extend protocols API to support
sending not ready data:
o Add new flag to pru_send() flags - PRUS_NOTREADY.
o Add new protocol method pru_ready().

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-30 13:24:21 +00:00
glebius
25da94eb3e Merge from projects/sendfile:
o Introduce a notion of "not ready" mbufs in socket buffers.  These
mbufs are now being populated by some I/O in background and are
referenced outside.  This forces following implications:
- An mbuf which is "not ready" can't be taken out of the buffer.
- An mbuf that is behind a "not ready" in the queue neither.
- If sockbet buffer is flushed, then "not ready" mbufs shouln't be
  freed.

o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc.
  The sb_ccc stands for ""claimed character count", or "committed
  character count".  And the sb_acc is "available character count".
  Consumers of socket buffer API shouldn't already access them directly,
  but use sbused() and sbavail() respectively.
o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones
  with M_BLOCKED.
o New field sb_fnrdy points to the first not ready mbuf, to avoid linear
  search.
o New function sbready() is provided to activate certain amount of mbufs
  in a socket buffer.

A special note on SCTP:
  SCTP has its own sockbufs.  Unfortunately, FreeBSD stack doesn't yet
allow protocol specific sockbufs.  Thus, SCTP does some hacks to make
itself compatible with FreeBSD: it manages sockbufs on its own, but keeps
sb_cc updated to inform the stack of amount of data in them.  The new
notion of "not ready" data isn't supported by SCTP.  Instead, only a
mechanical substitute is done: s/sb_cc/sb_ccc/.
  A proper solution would be to take away struct sockbuf from struct
socket and allow protocols to implement their own socket buffers, like
SCTP already does.  This was discussed with rrs@.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-30 12:52:33 +00:00
glebius
176ae2299c - Move sbcheck() declaration under SOCKBUF_DEBUG.
- Improve SOCKBUF_DEBUG macros.
- Improve sbcheck().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-30 11:22:39 +00:00
glebius
843bdaf93c Make sballoc() and sbfree() functions. Ideally, they could be marked
as static, but unfortunately Infiniband (ab)uses them.

Sponsored by:	Nginx, Inc.
2014-11-30 11:02:07 +00:00
imp
d737995628 The current limit of 100k for the linker hints file is getting a bit
crowded as we now are at about 70k. Bump the limit to 1MB instead
which is still quite a reasonable limit and allows for future growth
of this file and possible future expansion to additional data.

MFC After: 2 weeks
2014-11-29 17:29:30 +00:00
kib
9127a1f190 Remove lock recursion for the pipe pair mutex, and disable the
recursion on mutex initialization.

The only places where the recursive acquire is performed are read and
write filters, since knlist, which uses the pipe pair mutex as lock,
is locked when filter is called.

The recursion was added in r93296, and consistent locking for
kn_fop->f_event() introduced in r133741.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2014-11-29 17:18:20 +00:00
kib
1642e3809f Assert the state of the process lock and sigact mutex in
kern_sigprocmask() and reschedule_signals().

Discussed with:	rea
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-28 10:20:00 +00:00
hselasky
f4e178e921 Style changes:
- Move two IOCTL related defines to the top of the C-file
- Add more comments describing the recently added IOCTL small size and
small align macros
2014-11-28 09:32:07 +00:00
alfred
1413b742e9 Make igb and ixgbe check tunables at probe time.
This allows one to make a kernel module to tune the
number of queues before the driver loads.

This is needed so that a module at SI_SUB_CPU can set
tunables for these drivers to take.  Otherwise getenv
is called too early by the TUNABLE macros.

Reviewed by: smh
Phabric: https://reviews.freebsd.org/D1149
2014-11-26 20:19:36 +00:00
kib
11cee2ecf7 The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
  the process, and interlocking with process mutex and thread lock.
  The main reason of this is that turnstile locks are after thread
  locks, so you e.g. cannot unlock blockable mutex (think process
  mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
  from the clock interrupt context.  Replace the p_slock by p_itimmtx
  and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason.  Replace the p_slock
  by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting.  Need for the spinlock there is subtle,
  my understanding is that spinlock blocks context switching for the
  current thread, which prevents td_runtime and similar fields from
  changing (updates are done at the mi_switch()).  Replace the p_slock
  by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-26 14:10:00 +00:00
kib
4501dadd00 Fix SA_SIGINFO | SA_RESETHAND handling. The sysent' sv_sendsig()
method needs pre-reset state of the ps_siginfo to correctly construct
signal frame.

Move sigdflt() call after the sv_sendsig() invocation in postsig().
Simultaneously extract common code from trapsignal() and postsig()
into new helper postsig_done().

Submitted by:	rea
MFC after:	1 week
2014-11-26 14:09:04 +00:00
jhb
5ffe4e5562 Add a bus_get_domain() wrapper around BUS_GET_DOMAIN(). Use this to add
a new per-device '%domain' sysctl node that returns the NUMA domain a
device is associated with if it is associated with one.

Note that this API is still a WIP and might change before 11.0 actually
ships.

Differential Revision:	https://reviews.freebsd.org/D930
Reviewed by:	kib, adrian
2014-11-24 19:55:45 +00:00
jhb
c5e82d754f Properly initialize the capability rights for vnodes exported to procstat
that aren't for file descriptors (cwd, jdir, tracevp, etc.).

Submitted by:	Mikhail <mp@lenta.ru>
2014-11-24 18:34:11 +00:00
glebius
b4ef8e602d Merge from projects/sendfile:
o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but
  doesn't sleep. It returns immediately, and will execute the I/O done handler
  function that must be supplied as argument.
o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager.
o Extend pagertab to support pgo_getpages_async method, and implement this
  method for vnode_pager.

Reviewed by:	kib
Tested by:	pho
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-23 12:01:52 +00:00
mjg
a4324a5514 ifdef RACCT ui_racct_foreach and struct uidinfo's ui_racct
Change racct_ create and destroy to macros evaluating to nothing without RACCT
so that their callers passing ui_racct don't have to be ifdefed.
2014-11-23 08:25:44 +00:00
mjg
e31a493d7e filedesc: plug a test for impossible condition in fgetvp_rights 2014-11-23 00:12:27 +00:00
kib
0a8ab23540 The size value should be asserted when it is known.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
2014-11-22 18:15:02 +00:00
jhb
1671ac9155 Improve support for XSAVE with debuggers.
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
  to match what Linux does in that 1) it dumps the entire XSAVE area
  including the fxsave state, and 2) it stashes a copy of the current
  xsave mask in the unused padding between the fxsave state and the
  xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
  only the extra portion. This avoids having to always make two
  ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
  XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
  and the current XSAVE mask (%xcr0).

Differential Revision:	https://reviews.freebsd.org/D1193
Reviewed by:	kib
MFC after:	2 weeks
2014-11-21 20:53:17 +00:00
glebius
79b6f9c70a Do not allocate zero-length mbuf in sosend_generic().
Found by:	pho
Sponsored by:	Nginx, Inc.
2014-11-19 14:27:38 +00:00
zbb
aee9f2a5a3 Stop using early_putc immediately after configuring console with cninit()
Early UART should be released right after system console initialization is
completed. Otherwise, after cninit() both early and system console coexist
what may lead to various issues (i.a. writing to unmapped early
UART address). This cannot be done in cninit_finish() since it can be
called late at the end of MI configuration.

Obtained from:   Semihalf
Reviewed by:     andrew
Sponsored by:    The FreeBSD Foundation
2014-11-19 14:23:29 +00:00
imp
e1fec13f7c opt_global.h is included automatically in the build. No need to
explicitly include it in these places.

Sponsored by: Netflix
2014-11-18 17:06:56 +00:00
jmg
39fa4746e5 prevent doing filter ops locking for staticly compiled filter ops...
This significantly reduces lock contention when adding/removing knotes
on busy multi-kq system...  Next step is to cache these references per
kq.. i.e. kq refs it once and keeps a local ref count so that the same
refs don't get accessed by many cpus...

only allocate a knote when we might use it...

Add a new flag, _FORCEONESHOT..  This allows a thread to force the
delivery of another event in a safe manner, say waking up an idle http
connection to force it to be reaped...

If we are _DISABLE'ing a knote, don't bother to call f_event on it, it's
disabled, so won't be delivered anyways..

Tested by:	adrian
2014-11-16 01:18:41 +00:00
glebius
d69e7f6f82 - Use NULL to compare a pointer.
- Use KASSERT() instead of panic.
- Remove useless 'continue', no need to restart cycle here.

Sponsored by:	Nginx, Inc.
2014-11-14 15:44:19 +00:00
glebius
de0e48fa9a Merge from projects/sendfile:
Use sbcut_locked() instead of manually editing a sockbuf.

Sponsored by:	Nginx, Inc.
2014-11-14 15:33:40 +00:00
kib
d97b2c5d8d In vfs_write_suspend_umnt(), if suspension cannot be established, do
not forget to restore write ops count when returning the error.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-14 11:31:10 +00:00
glebius
3bc8a689c1 There should not be zero length mbufs in socket buffers. The code comes
from r1451, and thus can't be explained.  A patch with explicit panic()
here survived all tests.

Tested by:	pho
Sponsored by:	Nginx, Inc.
2014-11-14 06:02:29 +00:00
jkim
66032933f9 Correct a typo to fix chown(2). It was broken since r274476.
Pointy hat to:	kib
X-MFC-With:	r274476
2014-11-13 23:51:13 +00:00
mjg
077a8b14ec filedesc: fixup fdinit to lock fdp and preapare files conditinally
Not all consumers providing fdp to copy from want files.

Perhaps these functions should be reorganized to better express the outcome.

This fixes up panics after r273895 .

Reported by:	markj
2014-11-13 21:15:09 +00:00
kib
de34fc931d Fix assertion, &uc->uc_busy is never zero, the intent is to test the
uc_busy value, and not its address [1].

Remove the single use of the macro, write KASSERT() explicitely in the
code of umtxq_sleep_pi().

Submitted by:	Eric van Gyzen <eric@vangyzen.net> [1]
MFC after:	1 week
2014-11-13 18:51:09 +00:00
kib
b4ef709604 Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 18:01:51 +00:00
kib
6cedba80db Do not try to dereference thread pointer when the value is not a pointer.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 17:44:35 +00:00
kib
e257542e11 Remove fossil. It has been present in 4.4Lite2, but its use was
removed for some time.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-13 17:43:37 +00:00
dchagin
c0a51053a4 Regen for r274462. 2014-11-13 05:28:06 +00:00
dchagin
162012051b Add the ppoll() system call.
Export kern_poll() needed by an upcoming Linuxulator change.

Differential Revision:	https://reviews.freebsd.org/D1133
Reviewed by:	kib, wblock
MFC after:	1 month
2014-11-13 05:26:14 +00:00
kib
ff19294d91 For posix_fallocate(2) and posix_fadvise(2), return ESPIPE when
underlying file does not have DFLAG_SEEKABLE set [1].

For posix_fallocate(2), simplify error handling logic.  Do return when
fp is not yet referenced.

Noted by:	bde [1]
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-11-12 17:31:38 +00:00
glebius
834e6d1d30 Merge from projects/sendfile:
- Use KASSERT()s instead of panic().
- Use sbavail() instead of sb_cc.

Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2014-11-12 10:17:46 +00:00
glebius
c0b38b545a In preparation of merging projects/sendfile, transform bare access to
sb_cc member of struct sockbuf to a couple of inline functions:

sbavail() and sbused()

Right now they are equal, but once notion of "not ready socket buffer data",
will be checked in, they are going to be different.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-11-12 09:57:15 +00:00