Commit Graph

12754 Commits

Author SHA1 Message Date
Konstantin Belousov
481af8b933 Cosmetics: define FREEBSD32_MINUSER and AOUT32_MINUSER for struct
sysentvec .sv_minuser. Also improve style.

Submitted by:	Oliver Pinter <oliver.pinter@gmail.com>
MFC after:	1 week
2012-07-22 13:41:45 +00:00
Konstantin Belousov
a53cab2c6c (Incomplete) fixes for symbols visibility issues and style in fcntl.h.
Append '__' prefix to the tag of struct oflock, and put it under BSD
namespace. Structure is needed both by libc and kernel, thus cannot be
hidden under #ifdef _KERNEL.

Move a set of non-standard F_* and O_* constants into BSD namespace.
SUSv4 explicitely allows implemenation to pollute F_* and O_* names
after fcntl.h is included, but it costs us nothing to adhere
to the specification if exact POSIX compliance level is requested by
user code.

Change some spaces after #define to tabs.

Noted by and discussed with:	     bde
MFC after:   1 week
2012-07-21 13:02:11 +00:00
Konstantin Belousov
eb3d975443 Remove line which was accidentally kept in r238614.
Submitted by:	pjd
Pointy hat to:	kib
MFC after:	1 week
2012-07-19 20:38:03 +00:00
Konstantin Belousov
d1ae5c8337 Fix several reads beyond the mapped first page of the binary in the
ELF parser. Specifically, do not allow note reader and interpreter
path comparision in the brandelf code to read past end of the page.
This may happen if specially crafter ELF image is activated.

Submitted by:	Lukasz Wojcik <lukasz.wojcik zoho com>
MFC after:	3 days
2012-07-19 11:15:53 +00:00
Konstantin Belousov
49d02b13bc Implement F_DUPFD_CLOEXEC command for fcntl(2), specified by SUSv4.
PR:	  standards/169962
Submitted by:	Jukka A. Ukkonen <jau iki fi>
MFC after:	1 week
2012-07-19 10:22:54 +00:00
George V. Neville-Neil
57d025c338 Add support for walltimestamp in DTrace.
Submitted by:	Fabian Keil
MFC after:	2 weeks
2012-07-16 20:17:19 +00:00
Gabor Pali
599fc82b06 - Add support for displaying process stack memory regions.
Approved by:	rwatson
MFC after:	3 days
2012-07-16 09:38:19 +00:00
Matthew D Fleming
f806cdcf99 Fix a bug with memguard(9) on 32-bit architectures without a
VM_KMEM_MAX_SIZE.

The code was not taking into account the size of the kernel_map, which
the kmem_map is allocated from, so it could produce a sub-map size too
large to fit.  The simplest solution is to ignore VM_KMEM_MAX entirely
and base the memguard map's size off the kernel_map's size, since this
is always relevant and always smaller.

Found by:	Justin Hibbits
2012-07-15 20:29:48 +00:00
John Baldwin
2919668490 Make the interval timings for EVFILT_TIMER more accurate. tvtohz() always
adds an extra tick to account for the current partial clock tick.  However,
that is not appropriate for a repeating timer when the exact tvtohz() value
should be used for subsequent intervals.  Fix repeating callouts for
EVFILT_TIMER by subtracting 1 tick from the tvtohz() result similar to the
fix used in realitexpire() for interval timers.

While here, update a few comments to note that if the EVFILT_TIMER code
were to move out of kern_event.c, it should move to kern_time.c (where the
interval timer code it mimics lives) rather than kern_timeout.c.

MFC after:	1 month
2012-07-13 13:24:33 +00:00
Konstantin Belousov
92a9c65b06 Fix build for kernels with dtrace hooks.
MFC after:	1 month
2012-07-11 18:50:50 +00:00
George V. Neville-Neil
3fac94ba94 Initial commit of an I/O provider for DTrace on FreeBSD.
These probes are most useful when looking into the structures
they provide, which are listed in io.d.  For example:

dtrace -n 'io:genunix::start { printf("%d\n", args[0]->bio_bcount); }'

Note that the I/O systems in FreeBSD and Solaris/Illumos are sufficiently
different that there is not a 1:1 mapping from scripts that work
with one to the other.
MFC after:	1 month
2012-07-11 16:27:02 +00:00
David Xu
7ce60f6013 Always clear p_xthread if current thread no longer needs it, in theory, if
debugger exited without calling ptrace(PT_DETACH), there is a time window
that the p_xthread may be pointing to non-existing thread, in practical,
this is not a problem because child process soon will be killed by parent
process.
2012-07-10 05:45:13 +00:00
David Xu
5985d61556 If you have pressed CTRL+Z and a process is suspended, then you use gdb
to attach to the process, it is surprising that the process is resumed
without inputting any gdb commands, however ptrace manual said:
  The tracing process will see the newly-traced process stop and may
  then control it as if it had been traced all along.
But the current code does not work in this way, unless traced process
received a signal later, it will continue to run as a background task.
To fix this problem, just send signal SIGSTOP to the traced process after
we resumed it, this works like that you are attaching to a running process,
it is not perfect but better than nothing.
2012-07-09 09:24:46 +00:00
Mateusz Guzik
4fd85c4b5d Follow-up commit to r238220:
Pass only FEXEC (instead of FREAD|FEXEC) in fgetvp_exec. _fget has to check for
!FWRITE anyway and may as well know about FREAD.

Make _fget code a bit more readable by converting permission checking from if()
to switch(). Assert that correct permission flags are passed.

In collaboration with:	kib
Approved by:	trasz (mentor)
MFC after:	6 days
X-MFC: with r238220
2012-07-09 05:39:31 +00:00
Mateusz Guzik
28a7f60741 Unbreak handling of descriptors opened with O_EXEC by fexecve(2).
While here return EBADF for descriptors opened for writing (previously it was ETXTBSY).

Add fgetvp_exec function which performs appropriate checks.

PR:		kern/169651
In collaboration with:	kib
Approved by:	trasz (mentor)
MFC after:	1 week
2012-07-08 00:51:38 +00:00
Mikolaj Golub
e71a7957bd Fix KASSERT message.
MFC after:	3 days
2012-07-03 19:08:02 +00:00
Konstantin Belousov
c5c1199c83 Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is 64bit quantity,
always read and written under the mtxpool protection. This fixes
apparently easy to trigger race when parallel lseek()s or lseek() and
read/write could destroy file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by:	pho
No objections from:	jhb
MFC after:    3 weeks
2012-07-02 21:01:03 +00:00
John Baldwin
687c94aac9 Honor db_pager_quit in 'show uma' and 'show malloc'.
MFC after:	1 month
2012-07-02 16:14:52 +00:00
Warner Losh
f4dc9a4015 Remove an old hack I noticed years ago, but never committed. 2012-06-28 07:33:43 +00:00
Alan Cox
e30df26e7b Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.
2012-06-27 03:45:25 +00:00
Kevin Lo
c61325d009 Correct sizeof usage
Obtained from:	DragonFly
2012-06-25 05:41:16 +00:00
Konstantin Belousov
a665ed986c Move the code dealing with shared page into a dedicated
kern_sharedpage.c source file from kern_exec.c.

MFC after:	  29 days
2012-06-23 10:15:23 +00:00
Konstantin Belousov
21c295ef88 Stop updating the struct vdso_timehands from even handler executed in
the scheduled task from tc_windup(). Do it directly from tc_windup in
interrupt context [1].

Establish the permanent mapping of the shared page into the kernel
address space, avoiding the potential need to sleep waiting for
allocation of sf buffer during vdso_timehands update. As a
consequence, shared_page_write_start() and shared_page_write_end()
functions are not needed anymore.

Guess and memorize the pointers to native host and compat32 sysentvec
during initialization, to avoid the need to get shared_page_alloc_sx
lock during the update.

In tc_fill_vdso_timehands(), do not loop waiting for timehands
generation to stabilize, since vdso_timehands is written in the same
interrupt context which wrote timehands.

Requested by:	  mav [1]
MFC after:	  29 days
2012-06-23 09:33:06 +00:00
Konstantin Belousov
aea810386d Implement mechanism to export some kernel timekeeping data to
usermode, using shared page.  The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.

The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows to turn off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.

Minimal stubs neccessary for non-x86 architectures to still compile
are provided.

Discussed with:	bde
Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 07:06:40 +00:00
Konstantin Belousov
a9d8437c6d Enchance the shared page chunk allocator.
Do not rely on the busy state of the page from which we allocate the
chunk, to protect allocator state. Use statically allocated sx lock
instead.

Provide more flexible KPI. In particular, allow to allocate chunk
without providing initial data, and allow writes into existing
allocation. Allow to get an sf buf which temporary maps the chunk, to
allow sequential updates to shared page content without unmapping in
between.

Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 06:39:28 +00:00
Konstantin Belousov
854c3ce7ac Fix locking for f_offset, vn_read() and vn_write() cases only, for now.
It seems that intended locking protocol for struct file f_offset field
was as follows: f_offset should always be changed under the vnode lock
(except fcntl(2) and lseek(2) did not followed the rules). Since
read(2) uses shared vnode lock, FOFFSET_LOCKED block is additionally
taken to serialize shared vnode lock owners.

This was broken first by enabling shared lock on writes, then by
fadvise changes, which moved f_offset assigned from under vnode lock,
and last by vn_io_fault() doing chunked i/o. More, due to uio_offset
not yet valid in vn_io_fault(), the range lock for reads was taken on
the wrong region.

Change the locking for f_offset to always use FOFFSET_LOCKED block,
which is placed before rangelocks in the lock order.

Extract foffset_lock() and foffset_unlock() functions which implements
FOFFSET_LOCKED lock, and consistently lock f_offset with it in the
vn_io_fault() both for reads and writes, even if MNTK_NO_IOPF flag is
not set for the vnode mount. Indicate that f_offset is already valid
for vn_read() and vn_write() calls from vn_io_fault() with FOF_OFFSET
flag, and assert that all callers of vn_read() and vn_write() follow
this protocol.

Extract get_advice() function to calculate the POSIX_FADV_XXX value
for the i/o region, and use it were appropriate.

Reviewed by:	jhb
Tested by:	pho
MFC after:	2 weeks
2012-06-21 09:19:41 +00:00
Pawel Jakub Dawidek
53e1646325 Check proper flag (PDF_DAEMON, not PD_DAEMON) when deciding if the process
should be killed or not.

This fixes killing pdfork(2)ed process on last close of the corresponding
process descriptor.

Reviewed by:	rwatson
MFC after:	1 month
2012-06-19 22:23:59 +00:00
Pawel Jakub Dawidek
0a7007b98f The falloc() function obtains two references to newly created 'fp'.
On success we have to drop one after procdesc_finit() and on failure
we have to close allocated slot with fdclose(), which also drops one
reference for us and drop the remaining reference with fdrop().

Without this change closing process descriptor didn't result in killing
pdfork(2)ed child.

Reviewed by:	rwatson
MFC after:	1 month
2012-06-19 22:21:59 +00:00
John Baldwin
cd4ecf3cd2 Further refine the implementation of POSIX_FADV_NOREUSE.
First, extend the changes in r230782 to better handle the common case
of using NOREUSE with sequential reads.  A NOREUSE file descriptor
will now track the last implicit DONTNEED request it made as a result
of a NOREUSE read.  If a subsequent NOREUSE read is adjacent to the
previous range, it will apply the DONTNEED request to the entire range
of both the previous read and the current read.  The effect is that
each read of a file accessed sequentially will apply the DONTNEED
request to the entire range that has been read.  This allows NOREUSE
to properly handle misaligned reads by flushing each buffer to cache
once it has been completely read.

Second, apply the same changes made to read(2) by r230782 and this
change to writes.  This provides much better performance in the
sequential write case as it allows writes to still be clustered.  It
also provides much better performance for misaligned writes.  It does
mean that NOREUSE will be generally ineffective for non-sequential
writes as the current implementation relies on a future NOREUSE
write's implicit DONTNEED request to flush the dirty buffer from the
current write.

MFC after:	2 weeks
2012-06-19 18:42:24 +00:00
Peter Holm
e84a11e7ff In tty_makedev() the following construction:
dev = make_dev_cred();
dev->si_drv1 = tp;

leaves a small window where the newly created device may be opened
and si_drv1 is NULL.

As this is a vary rare situation, using a lock to close the window
seems overkill. Instead just wait for the assignment of si_drv1.

Suggested by:	kib
MFC after:	1 week
2012-06-18 07:34:38 +00:00
Pawel Jakub Dawidek
d99e1d5fd6 Don't check for race with close on advisory unlock (there is nothing smart we
can do when such a race occurs). This saves lock/unlock cycle for the filedesc
lock for every advisory unlock operation.

MFC after:	1 month
2012-06-17 21:04:22 +00:00
Pawel Jakub Dawidek
604a7c2f00 Extend the comment about checking for a race with close to explain why
it is done and why we don't return an error in such case.

Discussed with:	kib
MFC after:	1 month
2012-06-17 16:59:37 +00:00
Pawel Jakub Dawidek
fd6049b186 If VOP_ADVLOCK() call or earlier checks failed don't check for a race with
close, because even if we had a race there is nothing to unlock.

Discussed with:	kib
MFC after:	1 month
2012-06-17 16:32:32 +00:00
Davide Italiano
68cefd64ad The variable 'error' in sys_poll() is initialized in declaration to value
zero but in any case is overwritten by successive copyin(), making the
previous initialization useless. Remove this.
As an added bonus this fixes a style(9) bug.

Discussed with:		kib
Approved by:		gnn (mentor)
MFC after:		3 days
2012-06-17 13:03:50 +00:00
Pawel Jakub Dawidek
cff2dcd10d Revert r237073. 'td' can be NULL here.
MFC after:	1 month
2012-06-16 12:56:36 +00:00
Pawel Jakub Dawidek
3cde71cb25 One more attempt to make prototypes formated according to style(9), which
holefully recovers from the "worse than useless" state.

Reported by:	bde
MFC after:	1 month
2012-06-15 10:00:29 +00:00
Pawel Jakub Dawidek
a79de683f5 Update comment.
MFC after:	1 month
2012-06-14 17:32:58 +00:00
Pawel Jakub Dawidek
19a8f6748e Remove fdtofp() function and use fget_locked(), which works exactly the same.
MFC after:	1 month
2012-06-14 16:25:10 +00:00
Pawel Jakub Dawidek
b7fc69ca89 Assert that the filedesc lock is being held when the fdunwrap() function
is called.

MFC after:	1 month
2012-06-14 16:23:16 +00:00
Pawel Jakub Dawidek
1a94dc8581 Simplify the code by making more use of the fdtofp() function.
MFC after:	1 month
2012-06-14 15:37:15 +00:00
Pawel Jakub Dawidek
215aeba939 - Assert that the filedesc lock is being held when fdisused() is called.
- Fix white spaces.

MFC after:	1 month
2012-06-14 15:35:14 +00:00
Pawel Jakub Dawidek
7aef754274 Style fixes and assertions improvements.
MFC after:	1 month
2012-06-14 15:34:10 +00:00
Pawel Jakub Dawidek
8d169d9ff0 Assert that the filedesc lock is not held when closef() is called.
MFC after:	1 month
2012-06-14 15:26:23 +00:00
Pawel Jakub Dawidek
eb273c01f3 Style fixes.
Reported by:	bde
MFC after:	1 month
2012-06-14 15:21:57 +00:00
Pawel Jakub Dawidek
c7e9a659ca Remove code duplication from fdclosexec(), which was the reason of the bug
fixed in r237065.

MFC after:	1 month
2012-06-14 12:43:37 +00:00
Pawel Jakub Dawidek
8f59e9fddc When we are closing capabilities during exec, we want to call mq_fdclose()
on the underlying object and not on the capability itself.

Similar bug was fixed in r236853.

MFC after:	1 month
2012-06-14 12:41:21 +00:00
Pawel Jakub Dawidek
5570ae7d87 Style.
MFC after:	1 month
2012-06-14 12:37:41 +00:00
Pawel Jakub Dawidek
620216725a When checking if file descriptor number is valid, explicitely check for 'fd'
being less than 0 instead of using cast-to-unsigned hack.

Today's commit was brought to you by the letters 'B', 'D' and 'E' :)
2012-06-13 22:12:10 +00:00
Pawel Jakub Dawidek
7080f124d2 Now that dupfdopen() doesn't depend on finstall() being called earlier,
indx will never be -1 on error, as none of dupfdopen(), finstall() and
kern_capwrap() modifies it on error, but what is more important none of
those functions install and leave file at indx descriptor on error.

Leave an assert to prove my words.

MFC after:	1 month
2012-06-13 21:38:07 +00:00
Pawel Jakub Dawidek
3812dcd3de Allocate descriptor number in dupfdopen() itself instead of depending on
the caller using finstall().
This saves us the filedesc lock/unlock cycle, fhold()/fdrop() cycle and closes
a race between finstall() and dupfdopen().

MFC after:	1 month
2012-06-13 21:32:35 +00:00