12950 Commits

Author SHA1 Message Date
andre
e3d1886535 Tidy up somaxconn (accept queue limit) and related functions
and move it together into one place.
2012-10-20 10:51:32 +00:00
andre
b521bc0b76 Move socket UMA zone initialization functionality together into
one place.
2012-10-19 12:16:29 +00:00
andre
a00941fc1d Move UMA socket zone initialization from uipc_domain.c to uipc_socket.c
into one place next to its other related functions to avoid confusion.
2012-10-19 10:15:32 +00:00
andre
0a3cd6fb1a Remove unnecessary includes from sosend_copyin() and fix
a couple of style issues.
2012-10-18 21:04:30 +00:00
andre
bd86ceebe0 Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
zero copy specialized sosend_copyin() helper function.
2012-10-18 20:22:17 +00:00
attilio
65d8b7120d Disconnect non-MPSAFE SMBFS from the build in preparation for dropping
GIANT from VFS. In addition, disconnect also netsmb, which is a base
requirement for SMBFS.

In the while SMBFS regular users can use FUSE interface and smbnetfs
port to work with their SMBFS partitions.

Also, there are ongoing efforts by vendor to support in-kernel smbfs,
so there are good chances that it will get relinked once properly locked.

This is not targeted for MFC.
2012-10-18 12:04:56 +00:00
attilio
85c1a64cec Disconnect non-MPSAFE NTFS from the build in preparation for dropping
GIANT from VFS. This code is particulary broken and fragile and other
in-kernel implementations around, found in other operating systems,
don't really seem clean and solid enough to be imported at all.
If someone wants to reconsider in-kernel NTFS implementation for
inclusion again, a fair effort for completely fixing and cleaning it
up is expected.

In the while NTFS regular users can use FUSE interface and ntfs-3g
port to work with their NTFS partitions.

This is not targeted for MFC.
2012-10-17 11:30:00 +00:00
attilio
3f4806915e Disconnect non-MPSAFE NWFS from the build in preparation for dropping
GIANT from VFS. In addition, disconnect also netncp, which is a base
requirement for NWFS.

In the possibility of a future maintenance of the code and later
readd to the FreeBSD base, maybe we should think about a better location
for netncp. I'm not entirely sure the / top location is actually right,
however I will let network people to comment on that more specifically.

This is not targeted for MFC.
2012-10-17 11:16:17 +00:00
attilio
efcca33ac5 Disconnect non-MPSAFE PORTALFS from the build in preparation for dropping
GIANT from VFS.

This is not targeted for MFC.
2012-10-16 09:59:10 +00:00
attilio
b66f8a3fde Disconnect non-MPSAFE HPFS from the build in preparation for dropping
GIANT from VFS.

This is not targeted for MFC.
2012-10-16 09:55:31 +00:00
kib
867cb9c7c5 Acquire the rangelock for truncate(2) as well.
Reported and reviewed by:	avg
Tested by:	pho
MFC after:	1 week
2012-10-15 18:15:18 +00:00
kib
cc37fedcd0 Add a KPI to allow to reserve some amount of space in the numvnodes
counter, without actually allocating the vnodes. The supposed use of
the getnewvnode_reserve(9) is to reclaim enough free vnodes while the
code still does not hold any resources that might be needed during the
reclamation, and to consume the slack later for getnewvnode() calls
made from the innards. After the critical block is finished, the
caller shall free any reserve left, by getnewvnode_drop_reserve(9).

Reviewed by:	avg
Tested by:	pho
MFC after:	1 week
2012-10-14 19:43:37 +00:00
mav
4a7b4c2440 panic() with reasonable message instead of returning zero frequency causing
division by zero later if event timer's minimal period is above one second.
For now it is just a theoretical possibility.

Found by:	Clang Static Analyzer
2012-10-10 19:46:46 +00:00
attilio
6997194551 Add an unified macro to deny ability from the compiler to reorder
instruction loads/stores at its will.
The macro __compiler_membar() is currently supported for both gcc and
clang, but kernel compilation will fail otherwise.

Reviewed by:	bde, kib
Discussed with:	dim, theraven
MFC after:	2 weeks
2012-10-09 14:32:30 +00:00
avg
7e3795ec58 cngetc: use cpu_spinwait to ease the cncheckc loop a tiny bit
Reviewed by:	julian
MFC after:	10 days
2012-10-06 19:50:23 +00:00
avg
457ded9997 ktrace/kern_exec: check p_tracecred instead of p_cred
.. when deciding whether to continue tracing across suid/sgid exec.
Otherwise if root ktrace-d an unprivileged process and the processed
exec-ed a suid program, then tracing didn't continue across exec.

Reviewed by:	bde, kib
MFC after:	22 days
2012-10-06 19:23:44 +00:00
ed
b7b93e5288 Fix faulty error code handling in read(2) on TTYs.
When performing a non-blocking read(2), on a TTY while no data is
available, we should return EAGAIN. But if there's a modem disconnect,
we should return 0. Right now we only return 0 when doing a blocking
read, which is wrong.

MFC after:	1 month
2012-10-03 13:51:03 +00:00
wollman
b94c59684c Fix spelling of the function name in two assertion messages. 2012-10-02 18:38:05 +00:00
eadler
833f88ba45 Provide a generic way to disable devices at boot time
PR:		kern/119202
Requested by:	peterj
Reviewed by:	sbruno, jhb
Approved by:	cperciva
MFC after:	1 week
2012-10-02 03:33:41 +00:00
pjd
ef5782071f - Enforce CAP_MKFIFO on mkfifoat(2), not on mknodat(2). Without this change
mkfifoat(2) was not restricted.
- Introduce CAP_MKNOD and enforce it on mknodat(2).

Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-10-01 05:43:24 +00:00
kib
8f845e475e Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by:	pho (previous version)
MFC after:	2 weeks
2012-09-28 11:25:02 +00:00
mdf
394f27b845 Fix up kernel sources to be ready for a 64-bit ino_t.
Original code by:	Gleb Kurtsou
2012-09-27 23:30:49 +00:00
pjd
3646977f28 Revert r240931, as the previous comment was actually in sync with POSIX.
I have to note that POSIX is simply stupid in how it describes O_EXEC/fexecve
and friends. Yes, not only inconsistent, but stupid.

In the open(2) description, O_RDONLY flag is described as:

	O_RDONLY	Open for reading only.

Taken from:

	http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html

Note "for reading only". Not "for reading or executing"!

In the fexecve(2) description you can find:

	The fexecve() function shall fail if:

	[EBADF]
		The fd argument is not a valid file descriptor open for executing.

Taken from:

	http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html

As you can see the function shall fail if the file was not open with O_EXEC!

And yet, if you look closer you can find this mess in the exec.html:

	Since execute permission is checked by fexecve(), the file description
	fd need not have been opened with the O_EXEC flag.

Yes, O_EXEC flag doesn't have to be specified after all. You can open a file
with O_RDONLY and you still be able to fexecve(2) it.
2012-09-27 16:43:23 +00:00
trociny
d677fb07e1 Kernel and modules have "set_vnet" linker set, where virtualized
global variables are placed. When a module is loaded by link_elf
linker its variables from "set_vnet" linker set are copied to the
kernel "set_vnet" ("modspace") and all references to these variables
inside the module are relocated accordingly.

The issue is when a module is loaded that has references to global
variables from another, previously loaded module: these references are
not relocated so an invalid address is used when the module tries to
access the variable. The example is V_layer3_chain, defined in ipfw
module and accessed from ipfw_nat.

The same issue is with DPCPU variables, which use "set_pcpu" linker
set.

Fix this making the link_elf linker on a module load recognize
"external" DPCPU/VNET variables defined in the previously loaded
modules and relocate them accordingly. For this set_pcpu_list and
set_vnet_list are used, where the addresses of modules' "set_pcpu" and
"set_vnet" linker sets are stored.

Note, archs that use link_elf_obj (amd64) were not affected by this
issue.

Reviewed by:	jhb, julian, zec (initial version)
MFC after:	1 month
2012-09-27 14:55:15 +00:00
kib
0c5a309cd8 Make the updates of the tid ring buffer' head and tail pointers
explicit by moving them into separate statements from the buffer
element accesses.

Requested by:	jhb
MFC after:	3 days
2012-09-26 09:25:11 +00:00
pjd
68930741f2 Fix freebsd32_kmq_timedreceive() and freebsd32_kmq_timedsend() to use
getmq_read() and getmq_write() respectively, just like sys_kmq_timedreceive()
and sys_kmq_timedsend().

Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 22:15:59 +00:00
pjd
b79f4497f4 vn_write() always expects FOF_OFFSET flag, which is asserted at the begining,
so there is no need to check for it.

Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 21:31:17 +00:00
pjd
8e4b011dc5 We cannot open file for reading and executing (O_RDONLY | O_EXEC).
Well, in theory we can pass those two flags, because O_RDONLY is 0,
but we won't be able to read from a descriptor opened with O_EXEC.

Update the comment.

Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 21:11:40 +00:00
pjd
1d5d62ac36 Require CAP_DELETE on directory descriptor for unlinkat(2).
Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 21:00:36 +00:00
pjd
4816885ff1 Require CAP_CREATE on directory descriptor for symlinkat(2).
Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 20:59:12 +00:00
pjd
5ca6c6bd61 Require CAP_CREATE on directory descriptor for linkat(2).
Sponsored by:	FreeBSD Foundation
MFC after:	2 weeks
2012-09-25 20:58:15 +00:00
pjd
76c124139f O_EXEC flag is not part of the O_ACCMODE mask, check it separately.
If O_EXEC is provided don't require CAP_READ/CAP_WRITE, as O_EXEC
is mutually exclusive to O_RDONLY/O_WRONLY/O_RDWR.

Without this change CAP_FEXECVE capability right is not enforced.

Sponsored by:	FreeBSD Foundation
MFC after:	3 days
2012-09-25 20:48:49 +00:00
gnn
dd583c889b Change the module name for the I/O provider to "kernel" from
"genunix"  This will requires us to modify externally created
DTrace scripts but makes logical sense for FreeBSD.

Requested by: rpaulo
MFC after:	2 weeks
2012-09-25 19:16:28 +00:00
jhb
cabd8b8de7 Add optional entropy harvesting for software interrupts in swi_sched()
as controlled by kern.random.sys.harvest.swi.  SWI harvesting feeds into
the interrupt FIFO and each event is estimated as providing a single bit of
entropy.

Reviewed by:	markm, obrien
MFC after:	2 weeks
2012-09-25 14:55:46 +00:00
kib
a46afff790 Do not skip two elements of the tid_buffer when reusing the buffer
slot. This eventually results in exhaustion of the tid space, causing
new threads get tid -1 as identifier.

The bad effect of having the thread id equal to -1 is that
UMTX_OP_UMUTEX_WAIT returns EFAULT for a lock owned by such thread,
because casuword cannot distinguish between literal value -1 read from
the address and -1 returned as an indication of faulted
access. _thr_umutex_lock() helper from libthr does not check for
errors from _umtx_op_err(2), causing an infinite loop in
mutex_lock_sleep().

We observed the JVM processes hanging and consuming enormous amount of
system time on machines with approximately 100 days uptime.

Reported by:	Mykola Dzham <freebsd levsha org ua>
MFC after:	1 week
2012-09-22 12:17:09 +00:00
eadler
8600cbb5b6 Correct double "the the"
Approved by:	cperciva
MFC after:	3 days
2012-09-14 21:28:56 +00:00
avg
304f0afa4d sched_ule: fix inverted condition in reporting of priority lending via ktr
Reviewed by:	kan
MFC after:	1 week
2012-09-14 19:55:28 +00:00
attilio
d63ec4c4ce Remove all the checks on curthread != NULL with the exception of some MD
trap checks (eg. printtrap()).

Generally this check is not needed anymore, as there is not a legitimate
case where curthread != NULL, after pcpu 0 area has been properly
initialized.

Reviewed by:	bde, jhb
MFC after:	1 week
2012-09-13 22:26:22 +00:00
jhb
1e4174de1a Ignore stop and continue signals sent to an exiting process. Stop signals
set p_xstat to the signal that triggered the stop, but p_xstat is also
used to hold the exit status of an exiting process.  Without this change,
a stop signal that arrived after a process was marked P_WEXIT but before
it was marked a zombie would overwrite the exit status with the stop signal
number.

Reviewed by:	kib
MFC after:	1 week
2012-09-13 15:51:18 +00:00
attilio
a634dfc320 Improve check coverage about idle threads.
Idle threads are not allowed to acquire any lock but spinlocks.
Deny any attempt to do so by panicing at the locking operation
when INVARIANTS is on. Then, remove the check on blocking on a
turnstile.
The check in sleepqueues is left because they are not allowed to use
tsleep() either which could happen still.

Reviewed by:	bde, jhb, kib
MFC after:	1 week
2012-09-12 22:10:53 +00:00
attilio
ac29dac65b Tweak the commit message in case of panic for sleeping from threads
with TDP_NOSLEEPING on.

The current message has no informations on the thread and wchan
involed, which may be useful in case where dumps have mangled dwarf
informations.

Reported by:    kib
Reviewed by:	bde, jhb, kib
MFC after:	1 week
2012-09-12 22:05:54 +00:00
kib
5a86a6849a Add a facility for vgone() to inform the set of subscribed mounts
about vnode reclamation. Typical use is for the bypass mounts like
nullfs to get a notification about lower vnode going away.

Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument
lowervp which is reclaimed. It is possible to register several
reclamation event listeners, to correctly handle the case of several
nullfs mounts over the same directory.

For the filesystem not having nullfs mounts over it, the overhead
added is a single mount interlock lock/unlock in the vnode reclamation
path.

In collaboration with:	pho
MFC after:	3 weeks
2012-09-09 19:17:15 +00:00
kib
01b848f0ee Add MNTK_LOOKUP_EXCL_DOTDOT struct mount flag, which specifies to the
lookup code that dotdot lookups shall override any shared lock
requests with the exclusive one. The flag is useful for filesystems
which sometimes need to upgrade shared lock to exclusive inside the
VOP_LOOKUP or later, which cannot be done safely for dotdot, due to
dvp also locked and causing LOR.

In collaboration with:	    pho
MFC after:	3 weeks
2012-09-09 19:11:52 +00:00
attilio
7f498fac4a Move the checks for td_pinned, td_critnest, TDP_NOFAULTING and
TDP_NOSLEEPING leaking from syscallret() to userret() so that also
trap handling is covered. Also, the check on td_locks is not duplicated
between the two functions.

Reported by:	avg
Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:35:15 +00:00
attilio
1e61b37b3f Move PT_UPDATED_FLUSH() before td_locks check in order to have more
coverage also in the XEN case.

Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:29:53 +00:00
attilio
8dece93b14 userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by:	kib
MFC after:	1 week
2012-09-08 18:27:11 +00:00
glebius
b0ab73e8f6 Supply the pr_ctloutput method for local datagram sockets,
so that setsockopt() and getsockopt() work on them.

This makes 'tools/regression/sockets/unix_cmsg -t dgram'
more successful.
2012-09-07 21:06:54 +00:00
jhb
55089b858a A few whitespace and comment fixes. 2012-09-07 15:10:46 +00:00
ray
f2d384bbd0 Style fixes.
Suggested by:   mdf
Approved by:	adrian (menthor)
2012-09-04 23:16:55 +00:00
ray
da392b2a99 Add missing braces.
Approved by:	bschmidt (while mentor offline)
Pointed by:     gcooper
Pointy hat to:  ray
2012-09-03 09:46:46 +00:00