Commit Graph

12734 Commits

Author SHA1 Message Date
Kevin Lo
c61325d009 Correct sizeof usage
Obtained from:	DragonFly
2012-06-25 05:41:16 +00:00
Konstantin Belousov
a665ed986c Move the code dealing with shared page into a dedicated
kern_sharedpage.c source file from kern_exec.c.

MFC after:	  29 days
2012-06-23 10:15:23 +00:00
Konstantin Belousov
21c295ef88 Stop updating the struct vdso_timehands from even handler executed in
the scheduled task from tc_windup(). Do it directly from tc_windup in
interrupt context [1].

Establish the permanent mapping of the shared page into the kernel
address space, avoiding the potential need to sleep waiting for
allocation of sf buffer during vdso_timehands update. As a
consequence, shared_page_write_start() and shared_page_write_end()
functions are not needed anymore.

Guess and memorize the pointers to native host and compat32 sysentvec
during initialization, to avoid the need to get shared_page_alloc_sx
lock during the update.

In tc_fill_vdso_timehands(), do not loop waiting for timehands
generation to stabilize, since vdso_timehands is written in the same
interrupt context which wrote timehands.

Requested by:	  mav [1]
MFC after:	  29 days
2012-06-23 09:33:06 +00:00
Konstantin Belousov
aea810386d Implement mechanism to export some kernel timekeeping data to
usermode, using shared page.  The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.

The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows to turn off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.

Minimal stubs neccessary for non-x86 architectures to still compile
are provided.

Discussed with:	bde
Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 07:06:40 +00:00
Konstantin Belousov
a9d8437c6d Enchance the shared page chunk allocator.
Do not rely on the busy state of the page from which we allocate the
chunk, to protect allocator state. Use statically allocated sx lock
instead.

Provide more flexible KPI. In particular, allow to allocate chunk
without providing initial data, and allow writes into existing
allocation. Allow to get an sf buf which temporary maps the chunk, to
allow sequential updates to shared page content without unmapping in
between.

Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 06:39:28 +00:00
Konstantin Belousov
854c3ce7ac Fix locking for f_offset, vn_read() and vn_write() cases only, for now.
It seems that intended locking protocol for struct file f_offset field
was as follows: f_offset should always be changed under the vnode lock
(except fcntl(2) and lseek(2) did not followed the rules). Since
read(2) uses shared vnode lock, FOFFSET_LOCKED block is additionally
taken to serialize shared vnode lock owners.

This was broken first by enabling shared lock on writes, then by
fadvise changes, which moved f_offset assigned from under vnode lock,
and last by vn_io_fault() doing chunked i/o. More, due to uio_offset
not yet valid in vn_io_fault(), the range lock for reads was taken on
the wrong region.

Change the locking for f_offset to always use FOFFSET_LOCKED block,
which is placed before rangelocks in the lock order.

Extract foffset_lock() and foffset_unlock() functions which implements
FOFFSET_LOCKED lock, and consistently lock f_offset with it in the
vn_io_fault() both for reads and writes, even if MNTK_NO_IOPF flag is
not set for the vnode mount. Indicate that f_offset is already valid
for vn_read() and vn_write() calls from vn_io_fault() with FOF_OFFSET
flag, and assert that all callers of vn_read() and vn_write() follow
this protocol.

Extract get_advice() function to calculate the POSIX_FADV_XXX value
for the i/o region, and use it were appropriate.

Reviewed by:	jhb
Tested by:	pho
MFC after:	2 weeks
2012-06-21 09:19:41 +00:00
Pawel Jakub Dawidek
53e1646325 Check proper flag (PDF_DAEMON, not PD_DAEMON) when deciding if the process
should be killed or not.

This fixes killing pdfork(2)ed process on last close of the corresponding
process descriptor.

Reviewed by:	rwatson
MFC after:	1 month
2012-06-19 22:23:59 +00:00
Pawel Jakub Dawidek
0a7007b98f The falloc() function obtains two references to newly created 'fp'.
On success we have to drop one after procdesc_finit() and on failure
we have to close allocated slot with fdclose(), which also drops one
reference for us and drop the remaining reference with fdrop().

Without this change closing process descriptor didn't result in killing
pdfork(2)ed child.

Reviewed by:	rwatson
MFC after:	1 month
2012-06-19 22:21:59 +00:00
John Baldwin
cd4ecf3cd2 Further refine the implementation of POSIX_FADV_NOREUSE.
First, extend the changes in r230782 to better handle the common case
of using NOREUSE with sequential reads.  A NOREUSE file descriptor
will now track the last implicit DONTNEED request it made as a result
of a NOREUSE read.  If a subsequent NOREUSE read is adjacent to the
previous range, it will apply the DONTNEED request to the entire range
of both the previous read and the current read.  The effect is that
each read of a file accessed sequentially will apply the DONTNEED
request to the entire range that has been read.  This allows NOREUSE
to properly handle misaligned reads by flushing each buffer to cache
once it has been completely read.

Second, apply the same changes made to read(2) by r230782 and this
change to writes.  This provides much better performance in the
sequential write case as it allows writes to still be clustered.  It
also provides much better performance for misaligned writes.  It does
mean that NOREUSE will be generally ineffective for non-sequential
writes as the current implementation relies on a future NOREUSE
write's implicit DONTNEED request to flush the dirty buffer from the
current write.

MFC after:	2 weeks
2012-06-19 18:42:24 +00:00
Peter Holm
e84a11e7ff In tty_makedev() the following construction:
dev = make_dev_cred();
dev->si_drv1 = tp;

leaves a small window where the newly created device may be opened
and si_drv1 is NULL.

As this is a vary rare situation, using a lock to close the window
seems overkill. Instead just wait for the assignment of si_drv1.

Suggested by:	kib
MFC after:	1 week
2012-06-18 07:34:38 +00:00
Pawel Jakub Dawidek
d99e1d5fd6 Don't check for race with close on advisory unlock (there is nothing smart we
can do when such a race occurs). This saves lock/unlock cycle for the filedesc
lock for every advisory unlock operation.

MFC after:	1 month
2012-06-17 21:04:22 +00:00
Pawel Jakub Dawidek
604a7c2f00 Extend the comment about checking for a race with close to explain why
it is done and why we don't return an error in such case.

Discussed with:	kib
MFC after:	1 month
2012-06-17 16:59:37 +00:00
Pawel Jakub Dawidek
fd6049b186 If VOP_ADVLOCK() call or earlier checks failed don't check for a race with
close, because even if we had a race there is nothing to unlock.

Discussed with:	kib
MFC after:	1 month
2012-06-17 16:32:32 +00:00
Davide Italiano
68cefd64ad The variable 'error' in sys_poll() is initialized in declaration to value
zero but in any case is overwritten by successive copyin(), making the
previous initialization useless. Remove this.
As an added bonus this fixes a style(9) bug.

Discussed with:		kib
Approved by:		gnn (mentor)
MFC after:		3 days
2012-06-17 13:03:50 +00:00
Pawel Jakub Dawidek
cff2dcd10d Revert r237073. 'td' can be NULL here.
MFC after:	1 month
2012-06-16 12:56:36 +00:00
Pawel Jakub Dawidek
3cde71cb25 One more attempt to make prototypes formated according to style(9), which
holefully recovers from the "worse than useless" state.

Reported by:	bde
MFC after:	1 month
2012-06-15 10:00:29 +00:00
Pawel Jakub Dawidek
a79de683f5 Update comment.
MFC after:	1 month
2012-06-14 17:32:58 +00:00
Pawel Jakub Dawidek
19a8f6748e Remove fdtofp() function and use fget_locked(), which works exactly the same.
MFC after:	1 month
2012-06-14 16:25:10 +00:00
Pawel Jakub Dawidek
b7fc69ca89 Assert that the filedesc lock is being held when the fdunwrap() function
is called.

MFC after:	1 month
2012-06-14 16:23:16 +00:00
Pawel Jakub Dawidek
1a94dc8581 Simplify the code by making more use of the fdtofp() function.
MFC after:	1 month
2012-06-14 15:37:15 +00:00
Pawel Jakub Dawidek
215aeba939 - Assert that the filedesc lock is being held when fdisused() is called.
- Fix white spaces.

MFC after:	1 month
2012-06-14 15:35:14 +00:00
Pawel Jakub Dawidek
7aef754274 Style fixes and assertions improvements.
MFC after:	1 month
2012-06-14 15:34:10 +00:00
Pawel Jakub Dawidek
8d169d9ff0 Assert that the filedesc lock is not held when closef() is called.
MFC after:	1 month
2012-06-14 15:26:23 +00:00
Pawel Jakub Dawidek
eb273c01f3 Style fixes.
Reported by:	bde
MFC after:	1 month
2012-06-14 15:21:57 +00:00
Pawel Jakub Dawidek
c7e9a659ca Remove code duplication from fdclosexec(), which was the reason of the bug
fixed in r237065.

MFC after:	1 month
2012-06-14 12:43:37 +00:00
Pawel Jakub Dawidek
8f59e9fddc When we are closing capabilities during exec, we want to call mq_fdclose()
on the underlying object and not on the capability itself.

Similar bug was fixed in r236853.

MFC after:	1 month
2012-06-14 12:41:21 +00:00
Pawel Jakub Dawidek
5570ae7d87 Style.
MFC after:	1 month
2012-06-14 12:37:41 +00:00
Pawel Jakub Dawidek
620216725a When checking if file descriptor number is valid, explicitely check for 'fd'
being less than 0 instead of using cast-to-unsigned hack.

Today's commit was brought to you by the letters 'B', 'D' and 'E' :)
2012-06-13 22:12:10 +00:00
Pawel Jakub Dawidek
7080f124d2 Now that dupfdopen() doesn't depend on finstall() being called earlier,
indx will never be -1 on error, as none of dupfdopen(), finstall() and
kern_capwrap() modifies it on error, but what is more important none of
those functions install and leave file at indx descriptor on error.

Leave an assert to prove my words.

MFC after:	1 month
2012-06-13 21:38:07 +00:00
Pawel Jakub Dawidek
3812dcd3de Allocate descriptor number in dupfdopen() itself instead of depending on
the caller using finstall().
This saves us the filedesc lock/unlock cycle, fhold()/fdrop() cycle and closes
a race between finstall() and dupfdopen().

MFC after:	1 month
2012-06-13 21:32:35 +00:00
Pawel Jakub Dawidek
7f35af0110 - Remove nfp variable that is not really needed.
- Update comment.
- Style nits.

MFC after:	1 month
2012-06-13 21:22:35 +00:00
Pawel Jakub Dawidek
c64dd3bae1 Remove duplicated code.
MFC after:	1 month
2012-06-13 21:15:01 +00:00
Pawel Jakub Dawidek
81424ab705 Add missing {.
MFC after:	1 month
2012-06-13 21:13:18 +00:00
Pawel Jakub Dawidek
85c1550d63 Style.
MFC after:	1 month
2012-06-13 21:11:58 +00:00
Pawel Jakub Dawidek
baf946221d There is no need to set td->td_retval[0] to -1 on error.
Confirmed by:	jhb
MFC after:	1 month
2012-06-13 21:10:00 +00:00
Pawel Jakub Dawidek
6195bfebcc There is only one caller of the dupfdopen() function, so we can simplify
it a bit:
- We can assert that only ENODEV and ENXIO errors are passed instead of
  handling other errors.
- The caller always call finstall() for indx descriptor, so we can assume
  it is set. Actually the filedesc lock is dropped between finstall() and
  dupfdopen(), so there is a window there for another thread to close the
  indx descriptor, but it will be closed in next commit.

Reviewed by:	mjg
MFC after:	1 month
2012-06-13 19:00:29 +00:00
Mateusz Guzik
2ca63f0a90 Remove 'low' argument from fd_last_used().
This function is static and the only caller always passes 0 as low.

While here update note about return values in comment.

Reviewed by:	pjd
Approved by:	trasz (mentor)
MFC after:	1 month
2012-06-13 17:18:16 +00:00
Mateusz Guzik
02efb9a8b1 Re-apply reverted parts of r236935 by pjd with some changes.
If fdalloc() decides to grow fdtable it does it once and at most doubles
the size. This still may be not enough for sufficiently large fd. Use fd
in calculations of new size in order to fix this.

When growing the table, fd is already equal to first free descriptor >= minfd,
also fdgrowtable() no longer drops the filedesc lock. As a result of this there
is no need to retry allocation nor lookup.

Fix description of fd_first_free to note all return values.

In co-operation with:	pjd
Approved by:	trasz (mentor)
MFC after:	1 month
2012-06-13 17:12:53 +00:00
Pawel Jakub Dawidek
faf0db351d Revert part of the r236935 for now, until I figure out why it doesn't
work properly.

Reported by:	davidxu
2012-06-12 10:25:11 +00:00
Pawel Jakub Dawidek
039dc89f0d fdgrowtable() no longer drops the filedesc lock so it is enough to
retry finding free file descriptor only once after fdgrowtable().

Spotted by:	pluknet
MFC after:	1 month
2012-06-11 22:05:26 +00:00
Pawel Jakub Dawidek
d3ec30e525 Use consistent way of checking if descriptor number is valid.
MFC after:	1 month
2012-06-11 20:17:20 +00:00
Pawel Jakub Dawidek
fd45a47ba6 Be consistent with white spaces.
MFC after:	1 month
2012-06-11 20:01:50 +00:00
Pawel Jakub Dawidek
19d9c0e11e Remove code duplicated in kern_close() and do_dup() and use closefp() function
introduced a minute ago.

This code duplication was responsible for the bug fixed in r236853.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 20:00:44 +00:00
Pawel Jakub Dawidek
642db963ab Introduce closefp() function that we will be able to use to eliminate
code duplication in kern_close() and do_dup().

This is committed separately from the actual removal of the duplicated
code, as the combined diff was very hard to read.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:57:31 +00:00
Pawel Jakub Dawidek
129c87eb7d Merge two ifs into one to make the code almost identical to the code in
kern_close().

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:53:41 +00:00
Pawel Jakub Dawidek
d327cee241 Move the code around a bit to move two parts of code duplicated from
kern_close() close together.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:51:27 +00:00
Pawel Jakub Dawidek
8b40793150 Now that fdgrowtable() doesn't drop the filedesc lock we don't need to
check if descriptor changed from under us. Replace the check with an
assert.

Discussed with:	kib
Tested by:	pho
MFC after:	1 month
2012-06-11 19:48:55 +00:00
Mitsuru IWASAKI
c1b0dc80b5 Another fixe for r236772.
- Adjust correct cpuset (stopped_cpus/suspended_cpus) for
  cpu_spinwait() in generic_stop_cpus().
2012-06-11 18:47:26 +00:00
Pawel Jakub Dawidek
f3cd980557 Style fixes and simplifications.
MFC after:	1 month
2012-06-11 16:08:03 +00:00
Pawel Jakub Dawidek
effb6326a1 Remove redundant include.
MFC after:	1 month
2012-06-10 20:24:01 +00:00