Commit Graph

13760 Commits

Author SHA1 Message Date
kib
c8b684225c Correct the problem reported by test16 from
tools/regression/file/flock/flock.c, which completes the fix in
r192685.  When the lock was stolen from us, retry the whole lock
sequence in kernel, instead of returning EINTR to usermode and hoping
that application would handle it correctly by restarting the lock
acquire.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-08 08:10:15 +00:00
truckman
9f3ff514ca Declaration whitespace changes for style(9).
MFC after:	1 week
2014-07-07 22:02:39 +00:00
mjg
8a494537a4 Don't call crdup nor uifind under vnode lock.
A locked vnode can get into the way of satisyfing malloc with M_WATOK.

This is a fixup to r268087.

Suggested by:	kib
MFC after:	1 week
2014-07-07 14:03:30 +00:00
marcel
9f28abd980 Remove ia64.
This includes:
o   All directories named *ia64*
o   All files named *ia64*
o   All ia64-specific code guarded by __ia64__
o   All ia64-specific makefile logic
o   Mention of ia64 in comments and documentation

This excludes:
o   Everything under contrib/
o   Everything under crypto/
o   sys/xen/interface
o   sys/sys/elf_common.h

Discussed at: BSDcan
2014-07-07 00:27:09 +00:00
hselasky
9d5fb7b715 When getting the initial value of numeric tunables use the
getenv_xxx() functions instead of strtoq(), because the getenv_xxx()
functions include wrappers for various postfixes like G/M/K, which
strtoq() doesn't do.
2014-07-05 06:12:48 +00:00
kib
fd64376ccf Micro-manage clang to get the expected inlining for cpu_search().
Mark cpu_search_lowest/cpu_search_highest/cpu_search_both as noinline,
while cpu_search() gets always_inline.  With the attributes set,
cpu_search() is inlined in wrappers, and if()s with constant
conditionals are optimized.

On some tests on many-core machine, the hwpmc reported samples for
cpu_search*() are reduced from 25% total to 9%.

Submitted by:	"Rang, Anton" <anton.rang@isilon.com>
MFC after:	1 week
2014-07-03 11:06:27 +00:00
marcel
603a2512a3 Drop KTR records when we're in the debugger so that the debugger isn't
changing or overwriting the trace buffer. When KTR is enabled for things
like traps or pmap functions, the amount of logging can be substantial.
2014-07-02 22:13:07 +00:00
emaste
833365590a Fix typos in VTY constant names from r268158 2014-07-02 14:47:48 +00:00
emaste
9825a4c806 Prefer vt(4) for UEFI boot
The UEFI framebuffer driver vt_efifb requires vt(4), so add a mechanism
for the startup routine to set the preferred console.  This change is
ugly because console init happens very early in the boot, making a
cleaner interface difficult.  This change is intended only to facilitate
the sc(4) / vt(4) transition, and can be reverted once vt(4) is the
default.
2014-07-02 13:24:21 +00:00
mjg
3820ac174b Plug gcc warning after r268074 about unitialized newsigacts
Reported by:	Gary Jennejohn <gljennjohn gmail.com>
2014-07-02 05:45:40 +00:00
mjg
24f1528d0b Don't call crcopysafe or uifind unnecessarily in execve.
MFC after:	1 week
2014-07-01 09:21:32 +00:00
mjg
cd12aa0ab7 Perform a lockless check in sigacts_shared.
It is used only during execve (i.e. singlethreaded), so there is no fear
of returning 'not shared' which soon becomes 'shared'.

While here reorganize the code a little to avoid proc lock/unlock in
shared case.

MFC after:	1 week
2014-07-01 06:29:15 +00:00
adrian
b39e0d405b If we're doing RSS then ensure that the callwheel swi's are CPU pinned. 2014-06-30 04:25:51 +00:00
hselasky
c676f789a4 Compile fixes:
Remove duplicate "debug_ktr.mask" sysctl definition.
Remove now unused variable from "kern_ktr.c".
This fixes build of "ktr" which was broken by r267961.

Let the default value for "vm_kmem_size_scale" be zero. It is setup
after that the sysctl has been initialized from "getenv()" in the
"kmeminit()" function to equal the "VM_KMEM_SIZE_MAX" value, if
zero. On Sparc64 the "VM_KMEM_SIZE_MAX" macro is not a constant. This
fixes build of Sparc64 which was broken by r267961.

Add a special macro to dynamically create SYSCTL root nodes, because
root nodes have a special parent. This fixes build of existing OFED
module and CANBUS module for pc98 which was broken by r267961.

Add missing "sysctl.h" includes to get the needed sysctl header file
declarations. This is needed after r267961.

MFC after:	2 weeks
2014-06-28 17:36:18 +00:00
mjg
3bf95dde7c Call fdcloseexec right after fdunshare.
No functional changes.

MFC after:	1 week
2014-06-28 05:51:45 +00:00
mjg
0954f0fb37 Make fdunshare accept only td parameter.
Proc had to match the thread anyway and 2 parameters were inconsistent
with the rest.

MFC after:	1 week
2014-06-28 05:41:53 +00:00
mjg
16033c0793 Make sure to always clear p_fd for process getting rid of its filetable.
Filetable can be shared with other processes. Previous code failed to
clear the pointer for all but the last process getting rid of the table.
This is mostly cosmetics.

Get rid of 'This should happen earlier' comment. Clearing the pointer in
this place is fine as consumers can reliably check for files availability
by inspecting fd_refcnt and vnodes availabity by NULL-checking them.

MFC after:	1 week
2014-06-28 05:18:03 +00:00
hselasky
db24749e6c Fix regression issue after r267961. Handle special string case for
SYSCTLs like previously.

MFC after:	2 weeks
Reported by:	several people
2014-06-28 03:59:04 +00:00
hselasky
35b126e324 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
gjb
fc21f40567 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
marius
15fe259895 In order to get vt(4) a bit closer to the feature set provided by sc(4),
implement options TERMINAL_{KERN,NORM}_ATTR. These are aliased to
SC_{KERNEL_CONS,NORM}_ATTR and like these latter, allow to change the
default colors of normal and kernel text respectively.
Note on the naming: Although affecting the output of vt(4), technically
kern/subr_terminal.c is primarily concerned with changing default colors
so it would be inconsistent to term these options VT_{KERN,NORM}_ATTR.
Actually, if the architecture and abstraction of terminal+teken+vt would
be perfect, dev/vt/* wouldn't be touched by this commit at all.

Reviewed by:	emaste
MFC after:	3 days
Sponsored by:	Bally Wulff Games & Entertainment GmbH
2014-06-27 19:57:57 +00:00
emaste
75aefa2c4a Add CTLFLAG_NOFETCH flag; console vty code runs before tunable fetch
Also remove redundant "" assignment for string in BSS.

Submitted by:	hselasky@
2014-06-27 19:07:35 +00:00
emaste
2b75e7bbda Use a common tunable to choose between vt(4)/sc(4)
With this change and previous work from ray@ it will be possible to put
both in GENERIC, and have one enabled by default, but allow the other to
be selected via the loader.

(The previous implementation had separate kern.vt.disable and
hw.syscons.disable tunables, and would panic if both drivers were
compiled in and neither was explicitly disabled.)

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2014-06-27 17:50:33 +00:00
hselasky
bd1ed65f0f Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
mjg
21b16efdd1 Check lower bound of cmsg_len.
If passed cm->cmsg_len was below cmsghdr size the experssion:
datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data;

would give negative result. However, in practice it would not
result in a crash because the kernel would try to obtain garbage fds
for given process and would error out with EBADF.

PR:		124908
Submitted by:	campbell mumble.net (modified a little)
MFC after:	1 week
2014-06-27 05:04:36 +00:00
pjd
64b5b5018d Remove duplicated includes.
Submitted by:	Mariusz Zaborski <oshogbo@FreeBSD.org>
2014-06-26 13:57:44 +00:00
attilio
195f74af68 sysctl subsystem uses sxlocks so avoid to setup dynamic sysctl nodes
before sleepinit() has been fully executed in the SLEEPQUEUE_PROFILING
case.

Sponsored by:	EMC / Isilon storage division
2014-06-24 15:16:55 +00:00
mjg
dc769e3b99 Tidy up fd-related functions called by do_execve
o assert in each one that fdp is not shared
o remove unnecessary NULL checks - all userspace processes have fdtables
and kernel processes cannot execve
o remove comments about the danger of fd_ofiles getting reallocated - fdtable
is not shared and fd_ofiles could be only reallocated if new fd was about to be
added, but if that was possible the code would already be buggy as setugidsafety
work could be undone

MFC after:	1 week
2014-06-23 01:28:18 +00:00
mjg
202339afcf Don't take filedesc lock in fdunshare().
We can read refcnt safely and only care if it is equal to 1.

If it could suddenly change from 1 to something bigger the code would be
buggy even in the previous form and transitions from > 1 to 1 are equally racy
and harmless (we copy even though there is no need).

MFC after:	1 week
2014-06-22 21:37:27 +00:00
melifaro
3f08add0c8 Permit changing cpu mask for cpu set 1 in presence of drivers
binding their threads to particular CPU.

Changing ithread cpu mask is now performed by special cpuset_setithread().
It creates additional cpuset root group on first bind invocation.

No objection:	jhb
Tested by:	hiren
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-06-22 11:32:23 +00:00
mjg
d74326bc91 fd: replace fd_nfiles with fd_lastfile where appropriate
fd_lastfile is guaranteed to be the biggest open fd, so when the intent
is to iterate over active fds or lookup one, there is no point in looking
beyond that limit.

Few places are left unpatched for now.

MFC after:	1 week
2014-06-22 01:31:55 +00:00
mjg
38cd838637 do_dup: plug redundant adjustment of fd_lastfile
By that time it was already set by fdalloc, or was there in the first place
if fd is replaced.

MFC after:	1 week
2014-06-22 00:53:33 +00:00
kib
f5cc3af6c8 In msdosfs_setattr(), add a check for result of the utimes(2)
permissions test, forgotten in r164033.

Refactor the permission checks for utimes(2) into vnode helper
function vn_utimes_perm(9), and simplify its code comparing with the
UFS origin, by writing the call to VOP_ACCESSX only once.  Use the
helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).

Reported by:	bde
Reviewed by:	bde, trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-17 07:11:00 +00:00
dchagin
dd6bed9dd2 Revert r266925 as it can lead to instant panic at fexecve():
To allow to run the interpreter itself add a new ELF branding type.

Pointed out by:	kib, mjg
2014-06-17 05:29:18 +00:00
attilio
2802c525ad - Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
  adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI.  __FreeBSD_version will be,
however, bumped just when the full cache of free pages will be
evicted.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2014-06-16 18:15:27 +00:00
kib
cde29dbd9f Use vn_io_fault for the writes from core dumping code. Recursing into
VM due to copyin(9) faulting while VFS locks are held is
deadlock-prone there in the same way as for the write(2) syscall.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-06-15 04:51:53 +00:00
mav
9e5655843e Implement simple direct-mapped cache for popular filesystem identifiers to
avoid congestion on global mountlist_mtx mutex in vfs_busyfs(), while
traversing through the list of mount points.

This change significantly improves NFS server scalability, since it had
to do this translation for every request, and the global lock becomes quite
congested.

This code is more optimized for relatively small number of mount points.
On systems with hundreds of active mount points this simple cache may have
many collisions.  But the original traversal code in that case should also
behave much worse, so we are not loosing much.

Reviewed by:	attilio
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-12 12:43:48 +00:00
mav
60cbac5944 Remove unneeded mountlist_mtx acquisition from sync_fsync().
All struct mount fields accessed by sync_fsync() are protected by MNT_MTX.
2014-06-11 12:56:49 +00:00
mav
1be587d505 Move root_mount_hold() functionality to separate mutex.
It has nothing to share with mutex protecting list of mounted file systems.
2014-06-11 08:14:08 +00:00
kib
b5eae21087 Devolatile as needed.
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2014-06-09 09:10:31 +00:00
kib
e15884d6df Change the nblock mutex, protecting the needsbuffer buffer deficit
flags, to rwlock.  Lock it in read mode when used from subroutines
called from buffer release code paths.

The needsbuffer is now updated using atomics, while read lock of
nblock prevents loosing the wakeups from bufspacewakeup() and
bufcountadd() in getnewbuf_bufd_help().

In several interesting loads, needsbuffer flags are never set, while
buffers are reused quickly.  This causes brelse() and bqrelse() from
different threads to content on the nblock.  Now they take nblock in
read mode, together with needsbuffer not needing an update, allowing
higher parallelism.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-06-09 03:38:03 +00:00
alc
2f6e275d72 Refresh a comment. The VM_STACK option was eliminated in r43209.
Sponsored by:	EMC / Isilon Storage Division
2014-06-09 00:15:16 +00:00
mav
6106e186e6 Remove extra branching from r267232.
MFC after:	2 weeks
2014-06-08 19:01:37 +00:00
mav
14c8389ad0 Use atomics to modify numvnodes variable.
This allows to mostly avoid lock usage in getnewvnode_[drop_]reserve(),
that reduces number of global vnode_free_list_mtx mutex acquisitions
from 4 to 2 per NFS request on ZFS, improving SMP scalability.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-08 15:38:40 +00:00
kib
00a919ea8f Remove write-only local variable.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-08 10:56:25 +00:00
kib
321a65896d Initialize the pbuf counter for directio using SYSINIT, instead of
using a direct hook called from kern_vfs_bio_buffer_alloc().
Mark ffs_rawread.c as requiring both ffs and directio options to be
compiled into the kernel.  Add ffs_rawread.c to the list of ufs.ko
module' sources.

In addition to stopping breaking the layering violation, it also
allows to link kernel when FFS is configured as module and DIRECTIO is
enabled.

One consequence of the change is that ffs_rawread.o is always linked
into the module regardless of the DIRECTIO option.  This is similar to
the option QUOTA and ufs_quota.c.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-08 10:55:06 +00:00
jilles
5e8a5eb68c ktrace: Use designated initializers for the data_lengths array.
In the .o file, this only changes some line numbers (head amd64) because
element 0 is no longer explicitly initialized.

This should make bugs like FreeBSD-SA-14:12.ktrace less likely.

Discussed with:	des
MFC after:	1 week
2014-06-06 14:49:00 +00:00
davide
e9bd39eb77 Convert functions to the new-style format.
Submitted by:	Vijay Singh <vijju.singh@gmail.com> via -hackers
2014-06-05 03:46:46 +00:00
marcel
916c7006f5 Introduce a procedural interface to the ifnet structure. The new
interface allows the ifnet structure to be defined as an opaque
type in NIC drivers.  This then allows the ifnet structure to be
changed without a need to change or recompile NIC drivers.

Put differently, NIC drivers can be written and compiled once and
be used with different network stack implementations, provided of
course that those network stack implementations have an API and
ABI compatible interface.

This commit introduces the 'if_t' type to replace 'struct ifnet *'
as the type of a network interface. The 'if_t' type is defined as
'void *' to enable the compiler to perform type conversion to
'struct ifnet *' and vice versa where needed and without warnings.
The functions that implement the API are the only functions that
need to have an explicit cast.

The MII code has been converted to use the driver API to avoid
unnecessary code churn. Code churn comes from having to work with
both converted and unconverted drivers in correlation with having
callback functions that take an interface. By converting the MII
code first, the callback functions can be defined so that the
compiler will perform the typecasts automatically.

As soon as all drivers have been converted, the if_t type can be
redefined as needed and the API functions can be fix to not need
an explicit cast.

The immediate benefactors of this change are:
1.  Juniper Networks - The network stack implementation in Junos
    is entirely different from FreeBSD's one and this change
    allows Juniper to build "stock" NIC drivers that can be used
    in combination with both the FreeBSD and Junos stacks.
2.  FreeBSD - This change opens the door towards changing ifnet
    and implementing new features and optimizations in the network
    stack without it requiring a change in the many NIC drivers
    FreeBSD has.

Submitted by:	Anuranjan Shukla <anshukla@juniper.net>
Reviewed by:	glebius@
Obtained from:	Juniper Networks, Inc.
2014-06-02 17:54:39 +00:00
adrian
e3480f19d0 Pin the right thread.
This _was_ right, a last minute suggestion and not enough testing makes
Adrian a bad boy.

Tested:

* igb(4) with RSS patches, by hand verifying each igb(4) taskqueue
  tid from procstat -ka using cpuset -g -t <tid>.
2014-06-01 04:11:05 +00:00