
12736 Commits

Author SHA1 Message Date
trociny
3ef0ae6cd1 Fix KASSERT message.
MFC after:	3 days
2012-07-03 19:08:02 +00:00
kib
53224f018a Extend the KPI to lock and unlock the f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular lseek() and
variants of getdirentries().

Ensure that on 32-bit architectures f_offset, which is a 64-bit quantity,
is always read and written under mtxpool protection. This fixes an
apparently easy-to-trigger race in which parallel lseek()s, or lseek()
racing with read/write, could corrupt the file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by:	pho
No objections from:	jhb
MFC after:    3 weeks
2012-07-02 21:01:03 +00:00
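Why the lock matters on 32-bit platforms: a 64-bit load or store there may be split into two 32-bit accesses, so an unlocked reader can observe half of an old offset and half of a new one. The following is a minimal user-space sketch of the remedy, assuming a pthread mutex stands in for the kernel's mtxpool lock; `struct file_off`, `offset_get()` and `offset_set()` are hypothetical names, not the KPI added by this commit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for struct file; only the offset matters here. */
struct file_off {
	pthread_mutex_t	mtx;		/* plays the role of the mtxpool lock */
	int64_t		f_offset;	/* 64-bit: may tear on 32-bit CPUs */
};

/* Read the whole 64-bit offset atomically with respect to writers. */
static int64_t
offset_get(struct file_off *fp)
{
	int64_t off;

	pthread_mutex_lock(&fp->mtx);
	off = fp->f_offset;
	pthread_mutex_unlock(&fp->mtx);
	return (off);
}

/* Publish a new offset; a locked reader never sees a half-written value. */
static void
offset_set(struct file_off *fp, int64_t off)
{
	assert(off >= 0);
	pthread_mutex_lock(&fp->mtx);
	fp->f_offset = off;
	pthread_mutex_unlock(&fp->mtx);
}
```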
jhb
ab100847da Honor db_pager_quit in 'show uma' and 'show malloc'.
MFC after:	1 month
2012-07-02 16:14:52 +00:00
imp
492254ade0 Remove an old hack I noticed years ago, but never committed. 2012-06-28 07:33:43 +00:00
alc
c5e6daff9d Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.
2012-06-27 03:45:25 +00:00
kevlo
8473fac955 Correct sizeof usage
Obtained from:	DragonFly
2012-06-25 05:41:16 +00:00
kib
c763bb1500 Move the code dealing with the shared page from kern_exec.c
into a dedicated source file, kern_sharedpage.c.

MFC after:	  29 days
2012-06-23 10:15:23 +00:00
kib
497817697c Stop updating struct vdso_timehands from an event handler executed
in a task scheduled from tc_windup(). Do it directly from tc_windup() in
interrupt context [1].

Establish a permanent mapping of the shared page into the kernel
address space, avoiding the potential need to sleep waiting for
allocation of an sf buffer during the vdso_timehands update. As a
consequence, the shared_page_write_start() and shared_page_write_end()
functions are no longer needed.

Guess and memorize the pointers to the native host and compat32 sysentvec
during initialization, to avoid the need to acquire the shared_page_alloc_sx
lock during the update.

In tc_fill_vdso_timehands(), do not loop waiting for the timehands
generation to stabilize, since vdso_timehands is written in the same
interrupt context that wrote timehands.

Requested by:	  mav [1]
MFC after:	  29 days
2012-06-23 09:33:06 +00:00
kib
7b36a08108 Implement a mechanism to export some kernel timekeeping data to
usermode using the shared page. The structures and functions have the vdso
prefix to indicate the intended future location of the code.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of the in-kernel struct
timehands. Usermode can read the structure locklessly.
A compatibility export for 32-bit processes on a 64-bit host is also
provided. The kernel also tells usermode which timecounter is currently
in use, so that libc can fall back to the syscall if the configured
timecounter is unknown to usermode code.

The shared data updates are initiated both from tc_windup(), where
a fast task is queued to do the update, and from the sysctl handlers that
change the timecounter. A manual override switch,
kern.timecounter.fast_gettime, allows the mechanism to be turned off.

Only x86 architectures export the real algorithm data, and only
for the TSC timecounter. The HPET counter page could be exported as well,
but I prefer not to further glue the kernel and libc ABI together until a
proper vdso-based solution is developed.

Minimal stubs necessary for non-x86 architectures to still compile
are provided.

Discussed with:	bde
Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 07:06:40 +00:00
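Lockless reading of versioned data is commonly done with a generation counter (a seqlock): the writer zeroes the generation, updates the fields, then publishes a new non-zero generation, and a reader retries until it sees the same non-zero generation before and after copying the fields. A sketch in C11 atomics, assuming a hypothetical two-field `struct th_sketch` rather than the real `struct vdso_timehands` layout, and assuming a single writer:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified shared record (not the real vdso layout). */
struct th_sketch {
	_Atomic uint32_t gen;	/* 0 means "update in progress" */
	_Atomic uint64_t scale;
	_Atomic uint64_t offset;
};

/* Single writer: invalidate, update the fields, publish a new generation. */
static void
th_update(struct th_sketch *th, uint64_t scale, uint64_t offset)
{
	uint32_t next = atomic_load_explicit(&th->gen, memory_order_relaxed) + 1;

	if (next == 0)		/* skip 0, which readers treat as invalid */
		next = 1;
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
	atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
	atomic_store_explicit(&th->gen, next, memory_order_release);
}

/* Lockless reader: retry until a stable, valid generation is observed. */
static void
th_read(struct th_sketch *th, uint64_t *scale, uint64_t *offset)
{
	uint32_t g1, g2;

	do {
		g1 = atomic_load_explicit(&th->gen, memory_order_acquire);
		*scale = atomic_load_explicit(&th->scale, memory_order_relaxed);
		*offset = atomic_load_explicit(&th->offset, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		g2 = atomic_load_explicit(&th->gen, memory_order_relaxed);
	} while (g1 == 0 || g1 != g2);
	assert(g1 == g2);
}
```

Only the reader ever loops; the writer never waits, which fits a design where a single update path refreshes the shared copy.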
kib
4109c3e1ac Enhance the shared page chunk allocator.
Do not rely on the busy state of the page from which we allocate the
chunk to protect the allocator state. Use a statically allocated sx lock
instead.

Provide a more flexible KPI. In particular, allow allocating a chunk
without providing initial data, and allow writes into an existing
allocation. Allow getting an sf buf which temporarily maps the chunk, to
permit sequential updates to the shared page content without unmapping in
between.

Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 06:39:28 +00:00
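A sketch of a chunk allocator whose state is guarded by its own statically allocated lock rather than by the page's busy state, as the commit describes. It assumes a 4096-byte page, a simple bump allocator, and a pthread mutex in place of the sx lock; `sp_alloc()` and `sp_write()` are hypothetical and do not reproduce the kernel's interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SP_PAGE_SIZE	4096	/* assumed page size for this sketch */

/* Hypothetical model of one page carved into chunks. */
static unsigned char	sp_page[SP_PAGE_SIZE];
static size_t		sp_free_off;	/* first unallocated byte */
static pthread_mutex_t	sp_lock = PTHREAD_MUTEX_INITIALIZER; /* "sx lock" */

/* Reserve 'size' bytes aligned to 'align'; returns the offset or -1. */
static int
sp_alloc(size_t size, size_t align)
{
	size_t off;
	int res = -1;

	assert(align != 0 && (align & (align - 1)) == 0);
	pthread_mutex_lock(&sp_lock);
	off = (sp_free_off + align - 1) & ~(align - 1);
	if (off <= SP_PAGE_SIZE && size <= SP_PAGE_SIZE - off) {
		sp_free_off = off + size;
		res = (int)off;
	}
	pthread_mutex_unlock(&sp_lock);
	return (res);
}

/* Write into an existing allocation; initial data is optional. */
static void
sp_write(int base, const void *data, size_t size)
{
	assert(base >= 0 && (size_t)base + size <= SP_PAGE_SIZE);
	memcpy(sp_page + base, data, size);
}
```

Because allocation takes no data argument, a chunk can be reserved first and filled or rewritten later with `sp_write()`.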
kib
df9f3d2faa Fix locking for f_offset, vn_read() and vn_write() cases only, for now.
It seems that the intended locking protocol for the struct file f_offset field
was as follows: f_offset should always be changed under the vnode lock
(except that fcntl(2) and lseek(2) did not follow the rules). Since
read(2) uses a shared vnode lock, the FOFFSET_LOCKED block is additionally
taken to serialize shared vnode lock owners.

This was broken first by enabling shared locks on writes, then by the
fadvise changes, which moved the f_offset assignment out from under the vnode
lock, and last by vn_io_fault() doing chunked i/o. Moreover, because
uio_offset is not yet valid in vn_io_fault(), the range lock for reads was
taken on the wrong region.

Change the locking for f_offset to always use FOFFSET_LOCKED block,
which is placed before rangelocks in the lock order.

Extract the foffset_lock() and foffset_unlock() functions, which implement
the FOFFSET_LOCKED lock, and consistently lock f_offset with it in
vn_io_fault() for both reads and writes, even if the MNTK_NO_IOPF flag is
not set for the vnode's mount. Indicate with the FOF_OFFSET flag that
f_offset is already valid for vn_read() and vn_write() calls from
vn_io_fault(), and assert that all callers of vn_read() and vn_write()
follow this protocol.

Extract the get_advice() function to calculate the POSIX_FADV_XXX value
for the i/o region, and use it where appropriate.

Reviewed by:	jhb
Tested by:	pho
MFC after:	2 weeks
2012-06-21 09:19:41 +00:00
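A lock like the FOFFSET_LOCKED block can be built from a flag bit and a mutex: a thread that finds the flag set records that it is waiting and sleeps, and the unlocker clears the flag and wakes waiters. A user-space sketch under that assumption, using a pthread condition variable in place of the kernel's msleep()/wakeup(); the flag values, `struct file_lk`, and the `_sketch` function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define FOFFSET_LOCKED		0x1	/* hypothetical flag values */
#define FOFFSET_LOCK_WAITING	0x2

struct file_lk {
	pthread_mutex_t	mtx;	/* protects flags and f_offset */
	pthread_cond_t	cv;	/* waiters for FOFFSET_LOCKED */
	int		flags;
	int64_t		f_offset;
};

/* Acquire the offset lock and return the current offset. */
static int64_t
foffset_lock_sketch(struct file_lk *fp)
{
	int64_t off;

	pthread_mutex_lock(&fp->mtx);
	while (fp->flags & FOFFSET_LOCKED) {
		fp->flags |= FOFFSET_LOCK_WAITING;
		pthread_cond_wait(&fp->cv, &fp->mtx);
	}
	fp->flags |= FOFFSET_LOCKED;
	off = fp->f_offset;
	pthread_mutex_unlock(&fp->mtx);
	return (off);
}

/* Store the new offset, release the lock, and wake any waiters. */
static void
foffset_unlock_sketch(struct file_lk *fp, int64_t off)
{
	pthread_mutex_lock(&fp->mtx);
	assert(fp->flags & FOFFSET_LOCKED);
	fp->f_offset = off;
	if (fp->flags & FOFFSET_LOCK_WAITING)
		pthread_cond_broadcast(&fp->cv);
	fp->flags &= ~(FOFFSET_LOCKED | FOFFSET_LOCK_WAITING);
	pthread_mutex_unlock(&fp->mtx);
}
```

The lock is held across the whole i/o, so no other thread can read or move the offset between the lock and unlock calls. That is why it can sit above range locks in the lock order.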
pjd
8f9f9f3c91 Check the proper flag (PDF_DAEMON, not PD_DAEMON) when deciding whether
the process should be killed.

This fixes killing a pdfork(2)ed process on the last close of the corresponding
process descriptor.

Reviewed by:	rwatson
MFC after:	1 month
2012-06-19 22:23:59 +00:00
pjd
81ad62d5c5 The falloc() function obtains two references to the newly created 'fp'.
On success we have to drop one after procdesc_finit(); on failure we have
to close the allocated slot with fdclose(), which also drops one reference
for us, and drop the remaining reference with fdrop().

Without this change, closing a process descriptor didn't result in killing
the pdfork(2)ed child.

Reviewed by:	rwatson
MFC after:	1 month
2012-06-19 22:21:59 +00:00
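The reference accounting in this fix can be modelled with a counter: falloc() hands back two references (one owned by the descriptor table, one by the caller), so every path must drop exactly the caller's share. A hypothetical sketch; `struct fobj` and the `_sketch` functions only mimic the roles of falloc(), fdclose() and fdrop() as described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical file object with a reference count and a table slot. */
struct fobj {
	int	refs;
	bool	installed;	/* true while a descriptor slot points at it */
};

/* Role of falloc(): one reference for the table, one for the caller. */
static void
falloc_sketch(struct fobj *fp)
{
	fp->refs = 2;
	fp->installed = true;
}

/* Role of fdclose(): clear the slot and drop the table's reference. */
static void
fdclose_sketch(struct fobj *fp)
{
	assert(fp->installed && fp->refs > 0);
	fp->installed = false;
	fp->refs--;
}

/* Role of fdrop(): drop one reference; returns true when it was the last. */
static bool
fdrop_sketch(struct fobj *fp)
{
	assert(fp->refs > 0);
	return (--fp->refs == 0);
}
```

On success only the caller's reference is dropped, leaving the table's reference alive. On failure both are dropped and the object is released, so a later close of the descriptor is what finally frees the object and kills the child.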
jhb
571562fffb Further refine the implementation of POSIX_FADV_NOREUSE.
First, extend the changes in r230782 to better handle the common case
of using NOREUSE with sequential reads.  A NOREUSE file descriptor
will now track the last implicit DONTNEED request it made as a result
of a NOREUSE read.  If a subsequent NOREUSE read is adjacent to the
previous range, it will apply the DONTNEED request to the entire range
of both the previous read and the current read.  The effect is that
each read of a file accessed sequentially will apply the DONTNEED
request to the entire range that has been read.  This allows NOREUSE
to properly handle misaligned reads by flushing each buffer to cache
once it has been completely read.

Second, apply the same changes made to read(2) by r230782 and this
change to writes.  This provides much better performance in the
sequential write case as it allows writes to still be clustered.  It
also provides much better performance for misaligned writes.  It does
mean that NOREUSE will be generally ineffective for non-sequential
writes as the current implementation relies on a future NOREUSE
write's implicit DONTNEED request to flush the dirty buffer from the
current write.

MFC after:	2 weeks
2012-06-19 18:42:24 +00:00
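The adjacency tracking described above can be sketched as a small piece of per-descriptor state. This is an illustrative model, not the kernel code: `struct noreuse_state` and `noreuse_range()` are hypothetical, ranges are half-open byte ranges, and the function only computes the range to which the implicit DONTNEED would be applied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-descriptor state: the last implicit DONTNEED range. */
struct noreuse_state {
	int64_t	last_start;	/* inclusive */
	int64_t	last_end;	/* exclusive; equal to last_start if empty */
};

/*
 * After a NOREUSE read or write of [start, end), compute the range for
 * the implicit DONTNEED.  If the new i/o begins where the previous one
 * ended, cover both, so a buffer straddling the boundary is released
 * once it has been fully consumed.
 */
static void
noreuse_range(struct noreuse_state *st, int64_t start, int64_t end,
    int64_t *dn_start, int64_t *dn_end)
{
	assert(start <= end);
	if (st->last_end != st->last_start && start == st->last_end)
		start = st->last_start;	/* adjacent: extend previous range */
	st->last_start = start;
	st->last_end = end;
	*dn_start = start;
	*dn_end = end;
}
```

For a 1000-byte read at offset 0 followed by a 1500-byte read at offset 1000, the second call yields [0, 2500), so the range keeps growing while access stays sequential; a non-adjacent access starts a new range.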
pho
85a3f61dca In tty_makedev() the following construction:
dev = make_dev_cred();
dev->si_drv1 = tp;

leaves a small window where the newly created device may be opened
and si_drv1 is NULL.

As this is a very rare situation, using a lock to close the window
seems overkill. Instead, just wait for si_drv1 to be assigned.

Suggested by:	kib
MFC after:	1 week
2012-06-18 07:34:38 +00:00
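Waiting for the assignment instead of locking amounts to a publish/poll pattern: the creator stores the pointer once, and the open path polls until it is non-NULL. A user-space sketch with C11 atomics, assuming `sched_yield()` as a stand-in for whatever short pause the kernel uses; `struct cdev_sketch` and `wait_for_drv1()` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical device: si_drv1 is published after creation returns. */
struct cdev_sketch {
	_Atomic(void *) si_drv1;
};

/* Open path: poll, yielding the CPU, until the creator publishes si_drv1. */
static void *
wait_for_drv1(struct cdev_sketch *dev)
{
	void *tp;

	while ((tp = atomic_load_explicit(&dev->si_drv1,
	    memory_order_acquire)) == NULL)
		sched_yield();
	assert(tp != NULL);
	return (tp);
}
```

Since the window is tiny and rarely hit, the poll almost never iterates, and the common path pays only one atomic load.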
pjd
0118c86062 Don't check for a race with close on advisory unlock (there is nothing smart
we can do when such a race occurs). This saves a lock/unlock cycle of the filedesc
lock on every advisory unlock operation.

MFC after:	1 month
2012-06-17 21:04:22 +00:00
pjd
32ff81e94f Extend the comment about checking for a race with close to explain why
it is done and why we don't return an error in such a case.

Discussed with:	kib
MFC after:	1 month
2012-06-17 16:59:37 +00:00
pjd
9a81d01ee0 If the VOP_ADVLOCK() call or earlier checks failed, don't check for a race
with close, because even if there was a race there is nothing to unlock.

Discussed with:	kib
MFC after:	1 month
2012-06-17 16:32:32 +00:00
davide
163c370e14 The variable 'error' in sys_poll() is initialized to zero in its declaration,
but it is always overwritten by the subsequent copyin(), making that
initialization useless. Remove it.
As an added bonus, this fixes a style(9) bug.

Discussed with:		kib
Approved by:		gnn (mentor)
MFC after:		3 days
2012-06-17 13:03:50 +00:00
pjd
9719a38d39 Revert r237073. 'td' can be NULL here.
MFC after:	1 month
2012-06-16 12:56:36 +00:00
pjd
144a7f643e One more attempt to format prototypes according to style(9), which
hopefully recovers from the "worse than useless" state.

Reported by:	bde
MFC after:	1 month
2012-06-15 10:00:29 +00:00
pjd
2ede2f9ae2 Update comment.
MFC after:	1 month
2012-06-14 17:32:58 +00:00
pjd
c2fe03ba67 Remove fdtofp() function and use fget_locked(), which works exactly the same.
MFC after:	1 month
2012-06-14 16:25:10 +00:00
pjd
0984458a79 Assert that the filedesc lock is being held when the fdunwrap() function
is called.

MFC after:	1 month
2012-06-14 16:23:16 +00:00
pjd
f84f6132c8 Simplify the code by making more use of the fdtofp() function.
MFC after:	1 month
2012-06-14 15:37:15 +00:00
pjd
4a9c37500e - Assert that the filedesc lock is being held when fdisused() is called.
- Fix whitespace.

MFC after:	1 month
2012-06-14 15:35:14 +00:00
pjd
7b02ff9171 Style fixes and assertions improvements.
MFC after:	1 month
2012-06-14 15:34:10 +00:00
pjd
32b7d4b149 Assert that the filedesc lock is not held when closef() is called.
MFC after:	1 month
2012-06-14 15:26:23 +00:00
pjd
e1c12932a7 Style fixes.
Reported by:	bde
MFC after:	1 month
2012-06-14 15:21:57 +00:00
pjd
2014b8defb Remove code duplication from fdclosexec(), which was the cause of the bug
fixed in r237065.

MFC after:	1 month
2012-06-14 12:43:37 +00:00
pjd
6634e42976 When we are closing capabilities during exec, we want to call mq_fdclose()
on the underlying object and not on the capability itself.

A similar bug was fixed in r236853.

MFC after:	1 month
2012-06-14 12:41:21 +00:00
pjd
841890f62a Style.
MFC after:	1 month
2012-06-14 12:37:41 +00:00
pjd
0ca632f7e9 When checking whether a file descriptor number is valid, explicitly check for 'fd'
being less than 0 instead of using the cast-to-unsigned hack.

Today's commit was brought to you by the letters 'B', 'D' and 'E' :)
2012-06-13 22:12:10 +00:00
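For a non-negative table size the two validity checks give the same answer: converting a negative `int` to `unsigned int` yields a very large value, which fails the upper-bound comparison. The explicit form simply states the intent. Both functions below are hypothetical illustrations, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Old idiom: a negative fd wraps to a huge unsigned value and fails. */
static bool
fd_valid_cast(int fd, int nfiles)
{
	assert(nfiles >= 0);
	return ((unsigned int)fd < (unsigned int)nfiles);
}

/* Explicit idiom: the same result, with the intent visible. */
static bool
fd_valid_explicit(int fd, int nfiles)
{
	assert(nfiles >= 0);
	return (fd >= 0 && fd < nfiles);
}
```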
pjd
0123f7ed5a Now that dupfdopen() doesn't depend on finstall() being called earlier,
indx will never be -1 on error, as none of dupfdopen(), finstall() and
kern_capwrap() modifies it on error; more importantly, none of those
functions installs a file at the indx descriptor and leaves it there on error.

Leave an assert to prove my words.

MFC after:	1 month
2012-06-13 21:38:07 +00:00
pjd
f695b590b4 Allocate the descriptor number in dupfdopen() itself instead of depending on
the caller using finstall().
This saves us a filedesc lock/unlock cycle and an fhold()/fdrop() cycle, and closes
a race between finstall() and dupfdopen().

MFC after:	1 month
2012-06-13 21:32:35 +00:00
pjd
f7e18321ef - Remove nfp variable that is not really needed.
- Update comment.
- Style nits.

MFC after:	1 month
2012-06-13 21:22:35 +00:00
pjd
219cd5caaa Remove duplicated code.
MFC after:	1 month
2012-06-13 21:15:01 +00:00
pjd
5d3532ce69 Add missing {.
MFC after:	1 month
2012-06-13 21:13:18 +00:00
pjd
c745de62f2 Style.
MFC after:	1 month
2012-06-13 21:11:58 +00:00
pjd
54a86dc320 There is no need to set td->td_retval[0] to -1 on error.
Confirmed by:	jhb
MFC after:	1 month
2012-06-13 21:10:00 +00:00
pjd
b836448bf3 There is only one caller of the dupfdopen() function, so we can simplify
it a bit:
- We can assert that only ENODEV and ENXIO errors are passed instead of
  handling other errors.
- The caller always calls finstall() for the indx descriptor, so we can assume
  it is set. Actually, the filedesc lock is dropped between finstall() and
  dupfdopen(), so there is a window for another thread to close the
  indx descriptor, but that window will be closed in the next commit.

Reviewed by:	mjg
MFC after:	1 month
2012-06-13 19:00:29 +00:00
mjg
29bd2f6d46 Remove 'low' argument from fd_last_used().
This function is static and the only caller always passes 0 as low.

While here, update the note about return values in the comment.

Reviewed by:	pjd
Approved by:	trasz (mentor)
MFC after:	1 month
2012-06-13 17:18:16 +00:00
mjg
1ca4c8cbf9 Re-apply reverted parts of r236935 by pjd with some changes.
If fdalloc() decides to grow the fdtable, it does so once and at most doubles
the size. This may still not be enough for a sufficiently large fd. Use fd
in the calculation of the new size in order to fix this.

When growing the table, fd is already equal to the first free descriptor >= minfd,
and fdgrowtable() no longer drops the filedesc lock. As a result, there
is no need to retry either the allocation or the lookup.

Fix the description of fd_first_free to note all return values.

In co-operation with:	pjd
Approved by:	trasz (mentor)
MFC after:	1 month
2012-06-13 17:12:53 +00:00
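The sizing problem can be illustrated with a hypothetical helper. Doubling once fails when the requested fd lies beyond twice the current size, so the new size must also account for fd itself. This sketch ignores any rounding or other adjustments the kernel may apply and is not the actual fdalloc() code; `fdtable_new_size()` and its `maxfd` cap are assumptions.

```c
#include <assert.h>

/*
 * Hypothetical sizing rule: take the larger of "double the table" and
 * "just big enough for fd", capped by the per-process limit maxfd.
 */
static int
fdtable_new_size(int nfiles, int fd, int maxfd)
{
	int nsize = nfiles * 2;

	assert(fd >= nfiles && fd < maxfd);
	if (nsize < fd + 1)
		nsize = fd + 1;		/* doubling alone would be too small */
	if (nsize > maxfd)
		nsize = maxfd;
	return (nsize);
}
```

Growing to 40 slots for fd 25 in a 20-slot table is enough, but fd 500 in the same table needs 501 slots. A single doubling step would leave fd still out of range.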
pjd
bcf3f4263d Revert part of the r236935 for now, until I figure out why it doesn't
work properly.

Reported by:	davidxu
2012-06-12 10:25:11 +00:00
pjd
ea4cd345da fdgrowtable() no longer drops the filedesc lock, so it is enough to
retry finding a free file descriptor only once after fdgrowtable().

Spotted by:	pluknet
MFC after:	1 month
2012-06-11 22:05:26 +00:00
pjd
b7902b949c Use consistent way of checking if descriptor number is valid.
MFC after:	1 month
2012-06-11 20:17:20 +00:00
pjd
00ef5a8d82 Be consistent with whitespace.
MFC after:	1 month
2012-06-11 20:01:50 +00:00
pjd
d698b8f852 Remove code duplicated in kern_close() and do_dup() and use the closefp()
function introduced a minute ago.

This code duplication was responsible for the bug fixed in r236853.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 20:00:44 +00:00
pjd
c8465e01a1 Introduce the closefp() function, which we will be able to use to eliminate
code duplication in kern_close() and do_dup().

This is committed separately from the actual removal of the duplicated
code, as the combined diff was very hard to read.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:57:31 +00:00
pjd
cab8c2dc3a Merge two ifs into one to make the code almost identical to the code in
kern_close().

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:53:41 +00:00