13852 Commits

Author SHA1 Message Date
Konstantin Belousov
3760e341ca Change the calculation of the kinfo_vmentry field kve_private_resident
to reflect its name.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-15 19:49:00 +00:00
Mateusz Guzik
965d08605f Plug p_pptr null test in do_execve. It is always true. 2014-07-14 22:40:46 +00:00
Mateusz Guzik
c959c23740 Manage struct sigacts refcnt with atomics instead of a mutex.
MFC after:	1 week
2014-07-14 21:12:59 +00:00
Konstantin Belousov
895b3782c6 Extract the code to put a filesystem into the suspended state (at the
unmount time) in the helper vfs_write_suspend_umnt().  Use it instead
of two inline copies in FFS.

Fix the bug in the FFS unmount, when suspension failed, the ufs
extattrs were not reinitialized.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:10:00 +00:00
Konstantin Belousov
57ef02ff0f In kern_linkat(), avoid passing doomed vnode to the VOP.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:41:13 +00:00
Konstantin Belousov
a69452162a Generalize vn_get_ino() to allow filesystems to use custom vnode
producer, instead of hard-coding VFS_VGET().  New function, which
takes callback, is called vn_get_ino_gen(), standard callback for
vn_get_ino() is provided.

Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the
uses of vn_get_ino_gen().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:34:54 +00:00
Kevin Lo
cb7df69b7e Make bind(2) and connect(2) return EAFNOSUPPORT for AF_UNIX on wrong
address family.

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191586 for the
original discussion.

Reviewed by:	terry
2014-07-14 06:00:01 +00:00
Mateusz Guzik
8bedd5d782 Clear nonblock and async on devctl close instaed of open.
This is a purely cosmetic change.
2014-07-12 15:35:04 +00:00
Gleb Smirnoff
1fbe6a82f4 Improve reference counting of EXT_SFBUF pages attached to mbufs.
o Do not use UMA refcount zone. The problem with this zone is that
  several refcounting words (16 on amd64) share the same cache line,
  and issueing atomic(9) updates on them creates cache line contention.
  Also, allocating and freeing them is extra CPU cycles.
  Instead, refcount the page directly via vm_page_wire() and the sfbuf
  via sf_buf_alloc(sf_buf_page(sf)) [1].

o Call refcounting/freeing function for EXT_SFBUF via direct function
  call, instead of function pointer. This removes barrier for CPU
  branch predictor.

o Do not cleanup the mbuf to be freed in mb_free_ext(), merely to
  satisfy assertion in mb_dtor_mbuf(). Remove the assertion from
  mb_dtor_mbuf(). Use bcopy() instead of manual assignments to
  copy m_ext in mb_dupcl().

[1] This has some problems for now. Using sf_buf_alloc() merely to
    increase refcount is expensive, and is broken on sparc64. To be
    fixed.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-07-11 19:40:50 +00:00
Gleb Smirnoff
fcc34a238c Fix style bug: rename the refcount field of m_ext to ext_cnt, to match
other members.

Sponsored by:	Nginx, Inc.
2014-07-11 14:34:29 +00:00
Gleb Smirnoff
15c28f87b8 All mbuf external free functions never fail, so let them be void.
Sponsored by:	Nginx, Inc.
2014-07-11 13:58:48 +00:00
Mateusz Guzik
88f98985aa Eliminate plim and vtmp local vars in exit1.
No functional changes.

MFC after:	1 week
2014-07-10 22:54:38 +00:00
Mateusz Guzik
30d58d6b39 Don't make a temporary copy of fixed sysctl strings. 2014-07-10 21:46:57 +00:00
Mateusz Guzik
b23c40d7b1 Don't zero fd_nfiles during fdp destruction.
Code trying to take a look has to check fd_refcnt and it is 0 by that time.

This is a follow up to r268505, without this the code would leak memory for
tables bigger than the default.

MFC after:	1 week
2014-07-10 21:05:45 +00:00
Mateusz Guzik
e518baf8f9 Avoid relocking filedesc lock when closing fds during fdp destruction.
Don't call bzero nor fdunused from fdfree for such cases. It would do
unnecessary work and complain that the lock is not taken.

MFC after:	1 week
2014-07-10 20:59:54 +00:00
Pietro Cerutti
7150b86bfe Implement Short/Small String Optimization in SBUF(9) and change lengths and
positions in the API from ssize_t and int to size_t.

CR:		D388
Approved by:	des, bapt
2014-07-10 13:08:51 +00:00
Konstantin Belousov
479fcb4e32 Unconditionally initialize addr to handle the case of changed map
timestamp while the map is unlocked.

Reported by:	bz
Sponsored by:	The FreeBSD Foundation
MFC after:	6 days
2014-07-10 11:20:24 +00:00
Konstantin Belousov
a91831a261 Current code in sysctl proc.vmmap, which intent is to calculate the
amount of resident pages, in fact calculates the amount of installed
pte entries in the region.  Resident pages which were not soft-faulted
yet are not counted.

Calculate the amount of resident pages by looking in the objects chain
backing the region.

Add a knob to disable the residency calculation at all.  For large
sparce regions, either previous or updated algorithm runs for too long
time, while several introspection tools do not need the (advisory) RSS
value at all.

PR:	kern/188911
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-09 19:11:57 +00:00
Xin LI
2827952eb4 Don't leave the padding between the msg header and the cmsg data,
and the padding after the cmsg data un-initialized.

Submitted by:	tuexen
Security:	CVE-2014-3952
Security:	FreeBSD-SA-14:17.kmem
2014-07-08 21:54:23 +00:00
Konstantin Belousov
3bcc218f46 Correct the problem reported by test16 from
tools/regression/file/flock/flock.c, which completes the fix in
r192685.  When the lock was stolen from us, retry the whole lock
sequence in kernel, instead of returning EINTR to usermode and hoping
that application would handle it correctly by restarting the lock
acquire.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-08 08:10:15 +00:00
Don Lewis
626a79752f Declaration whitespace changes for style(9).
MFC after:	1 week
2014-07-07 22:02:39 +00:00
Mateusz Guzik
5e2554b7f8 Don't call crdup nor uifind under vnode lock.
A locked vnode can get into the way of satisyfing malloc with M_WATOK.

This is a fixup to r268087.

Suggested by:	kib
MFC after:	1 week
2014-07-07 14:03:30 +00:00
Marcel Moolenaar
e7d939bda2 Remove ia64.
This includes:
o   All directories named *ia64*
o   All files named *ia64*
o   All ia64-specific code guarded by __ia64__
o   All ia64-specific makefile logic
o   Mention of ia64 in comments and documentation

This excludes:
o   Everything under contrib/
o   Everything under crypto/
o   sys/xen/interface
o   sys/sys/elf_common.h

Discussed at: BSDcan
2014-07-07 00:27:09 +00:00
Hans Petter Selasky
604bf9d37e When getting the initial value of numeric tunables use the
getenv_xxx() functions instead of strtoq(), because the getenv_xxx()
functions include wrappers for various postfixes like G/M/K, which
strtoq() doesn't do.
2014-07-05 06:12:48 +00:00
Konstantin Belousov
2499a5ccef Micro-manage clang to get the expected inlining for cpu_search().
Mark cpu_search_lowest/cpu_search_highest/cpu_search_both as noinline,
while cpu_search() gets always_inline.  With the attributes set,
cpu_search() is inlined in wrappers, and if()s with constant
conditionals are optimized.

On some tests on many-core machine, the hwpmc reported samples for
cpu_search*() are reduced from 25% total to 9%.

Submitted by:	"Rang, Anton" <anton.rang@isilon.com>
MFC after:	1 week
2014-07-03 11:06:27 +00:00
Marcel Moolenaar
054b57a740 Drop KTR records when we're in the debugger so that the debugger isn't
changing or overwriting the trace buffer. When KTR is enabled for things
like traps or pmap functions, the amount of logging can be substantial.
2014-07-02 22:13:07 +00:00
Ed Maste
969d3cc28b Fix typos in VTY constant names from r268158 2014-07-02 14:47:48 +00:00
Ed Maste
018147eef9 Prefer vt(4) for UEFI boot
The UEFI framebuffer driver vt_efifb requires vt(4), so add a mechanism
for the startup routine to set the preferred console.  This change is
ugly because console init happens very early in the boot, making a
cleaner interface difficult.  This change is intended only to facilitate
the sc(4) / vt(4) transition, and can be reverted once vt(4) is the
default.
2014-07-02 13:24:21 +00:00
Mateusz Guzik
a6bad85e8e Plug gcc warning after r268074 about unitialized newsigacts
Reported by:	Gary Jennejohn <gljennjohn gmail.com>
2014-07-02 05:45:40 +00:00
Mateusz Guzik
350d51816e Don't call crcopysafe or uifind unnecessarily in execve.
MFC after:	1 week
2014-07-01 09:21:32 +00:00
Mateusz Guzik
d00c8ea429 Perform a lockless check in sigacts_shared.
It is used only during execve (i.e. singlethreaded), so there is no fear
of returning 'not shared' which soon becomes 'shared'.

While here reorganize the code a little to avoid proc lock/unlock in
shared case.

MFC after:	1 week
2014-07-01 06:29:15 +00:00
Adrian Chadd
c445c3c7f6 If we're doing RSS then ensure that the callwheel swi's are CPU pinned. 2014-06-30 04:25:51 +00:00
Hans Petter Selasky
4813ad54f8 Compile fixes:
Remove duplicate "debug_ktr.mask" sysctl definition.
Remove now unused variable from "kern_ktr.c".
This fixes build of "ktr" which was broken by r267961.

Let the default value for "vm_kmem_size_scale" be zero. It is setup
after that the sysctl has been initialized from "getenv()" in the
"kmeminit()" function to equal the "VM_KMEM_SIZE_MAX" value, if
zero. On Sparc64 the "VM_KMEM_SIZE_MAX" macro is not a constant. This
fixes build of Sparc64 which was broken by r267961.

Add a special macro to dynamically create SYSCTL root nodes, because
root nodes have a special parent. This fixes build of existing OFED
module and CANBUS module for pc98 which was broken by r267961.

Add missing "sysctl.h" includes to get the needed sysctl header file
declarations. This is needed after r267961.

MFC after:	2 weeks
2014-06-28 17:36:18 +00:00
Mateusz Guzik
b0bc0cadbe Call fdcloseexec right after fdunshare.
No functional changes.

MFC after:	1 week
2014-06-28 05:51:45 +00:00
Mateusz Guzik
b9d32c36fa Make fdunshare accept only td parameter.
Proc had to match the thread anyway and 2 parameters were inconsistent
with the rest.

MFC after:	1 week
2014-06-28 05:41:53 +00:00
Mateusz Guzik
35778d7aa9 Make sure to always clear p_fd for process getting rid of its filetable.
Filetable can be shared with other processes. Previous code failed to
clear the pointer for all but the last process getting rid of the table.
This is mostly cosmetics.

Get rid of 'This should happen earlier' comment. Clearing the pointer in
this place is fine as consumers can reliably check for files availability
by inspecting fd_refcnt and vnodes availabity by NULL-checking them.

MFC after:	1 week
2014-06-28 05:18:03 +00:00
Hans Petter Selasky
6a3287f889 Fix regression issue after r267961. Handle special string case for
SYSCTLs like previously.

MFC after:	2 weeks
Reported by:	several people
2014-06-28 03:59:04 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Marius Strobl
7344ee184b In order to get vt(4) a bit closer to the feature set provided by sc(4),
implement options TERMINAL_{KERN,NORM}_ATTR. These are aliased to
SC_{KERNEL_CONS,NORM}_ATTR and like these latter, allow to change the
default colors of normal and kernel text respectively.
Note on the naming: Although affecting the output of vt(4), technically
kern/subr_terminal.c is primarily concerned with changing default colors
so it would be inconsistent to term these options VT_{KERN,NORM}_ATTR.
Actually, if the architecture and abstraction of terminal+teken+vt would
be perfect, dev/vt/* wouldn't be touched by this commit at all.

Reviewed by:	emaste
MFC after:	3 days
Sponsored by:	Bally Wulff Games & Entertainment GmbH
2014-06-27 19:57:57 +00:00
Ed Maste
6ac6c9d5f4 Add CTLFLAG_NOFETCH flag; console vty code runs before tunable fetch
Also remove redundant "" assignment for string in BSS.

Submitted by:	hselasky@
2014-06-27 19:07:35 +00:00
Ed Maste
59644098f8 Use a common tunable to choose between vt(4)/sc(4)
With this change and previous work from ray@ it will be possible to put
both in GENERIC, and have one enabled by default, but allow the other to
be selected via the loader.

(The previous implementation had separate kern.vt.disable and
hw.syscons.disable tunables, and would panic if both drivers were
compiled in and neither was explicitly disabled.)

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2014-06-27 17:50:33 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Mateusz Guzik
de966666a2 Check lower bound of cmsg_len.
If passed cm->cmsg_len was below cmsghdr size the experssion:
datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data;

would give negative result. However, in practice it would not
result in a crash because the kernel would try to obtain garbage fds
for given process and would error out with EBADF.

PR:		124908
Submitted by:	campbell mumble.net (modified a little)
MFC after:	1 week
2014-06-27 05:04:36 +00:00
Pawel Jakub Dawidek
e16406c7ba Remove duplicated includes.
Submitted by:	Mariusz Zaborski <oshogbo@FreeBSD.org>
2014-06-26 13:57:44 +00:00
Attilio Rao
e989086b1d sysctl subsystem uses sxlocks so avoid to setup dynamic sysctl nodes
before sleepinit() has been fully executed in the SLEEPQUEUE_PROFILING
case.

Sponsored by:	EMC / Isilon storage division
2014-06-24 15:16:55 +00:00
Mateusz Guzik
450570a55e Tidy up fd-related functions called by do_execve
o assert in each one that fdp is not shared
o remove unnecessary NULL checks - all userspace processes have fdtables
and kernel processes cannot execve
o remove comments about the danger of fd_ofiles getting reallocated - fdtable
is not shared and fd_ofiles could be only reallocated if new fd was about to be
added, but if that was possible the code would already be buggy as setugidsafety
work could be undone

MFC after:	1 week
2014-06-23 01:28:18 +00:00
Mateusz Guzik
158627616c Don't take filedesc lock in fdunshare().
We can read refcnt safely and only care if it is equal to 1.

If it could suddenly change from 1 to something bigger the code would be
buggy even in the previous form and transitions from > 1 to 1 are equally racy
and harmless (we copy even though there is no need).

MFC after:	1 week
2014-06-22 21:37:27 +00:00
Alexander V. Chernikov
811985398d Permit changing cpu mask for cpu set 1 in presence of drivers
binding their threads to particular CPU.

Changing ithread cpu mask is now performed by special cpuset_setithread().
It creates additional cpuset root group on first bind invocation.

No objection:	jhb
Tested by:	hiren
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-06-22 11:32:23 +00:00
Mateusz Guzik
adf87ab01c fd: replace fd_nfiles with fd_lastfile where appropriate
fd_lastfile is guaranteed to be the biggest open fd, so when the intent
is to iterate over active fds or lookup one, there is no point in looking
beyond that limit.

Few places are left unpatched for now.

MFC after:	1 week
2014-06-22 01:31:55 +00:00