Commit Graph

13835 Commits

Author SHA1 Message Date
Xin LI
bda06553fd Re-instate UMA cached backend for 4K - 64K allocations. New consumers
like geli(4) uses malloc(9) to allocate temporary buffers that gets
free'ed shortly, causing frequent TLB shootdown as observed in hwpmc
supported flame graph.

Discussed with:	jeff, alfred
MFC after:	1 week
2014-08-14 05:13:24 +00:00
Konstantin Belousov
70978c93b8 If vm_page_grab() allocates a new page, the page is not inserted into
page queue even when the allocation is not wired.  It is
responsibility of the vm_page_grab() caller to ensure that the page
does not end on the vm_object queue but not on the pagedaemon queue,
which would effectively create unpageable unwired page.

In exec_map_first_page() and vm_imgact_hold_page(), activate the page
immediately after unbusying it, to avoid leak.

In the uiomove_object_page(), deactivate page before the object is
unlocked.  There is no leak, since the page is deactivated after
uiomove_fromphys() finished.  But allowing non-queued non-wired page
in the unlocked object queue makes it impossible to assert that leak
does not happen in other places.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-13 05:44:08 +00:00
Gleb Smirnoff
cd1692fa5d Move KASSERT into locked region.
Submitted by:	kib
2014-08-11 15:06:07 +00:00
Gleb Smirnoff
eaf78ad3f7 Use M_WAITOK in sf_buf_init().
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-11 13:12:18 +00:00
Gleb Smirnoff
818d40d033 Provide sf_buf_ref() to optimize refcounting of already allocated
sendfile(2) buffers.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-11 12:59:55 +00:00
Bjoern A. Zeeb
e346f8c452 Split up sys_ktimer_getoverrun() into a sys_ and a kern_ variant
and export the kern_ version needed by an upcoming linuxolator change.

MFC after:	3 days
Sponsored by:	DARPA,AFRL
2014-08-07 16:49:50 +00:00
Andrey V. Elsukov
3e40097976 Temporary revert r269661, it looks like the patch isn't complete. 2014-08-07 14:32:28 +00:00
Andrey V. Elsukov
c60e497af9 Use cpuset_setithread() to apply cpu mask to taskq threads.
Sponsored by:	Yandex LLC
2014-08-07 10:23:50 +00:00
Konstantin Belousov
d735998057 Correct the problems with the ptrace(2) making the debuggee an orphan.
One problem is inferior(9) looping due to the process tree becoming a
graph instead of tree if the parent is traced by child. Another issue
is due to the use of p_oppid to restore the original parent/child
relationship, because real parent could already exited and its pid
reused (noted by mjg).

Add the function proc_realparent(9), which calculates the parent for
given process. It uses the flag P_TREE_FIRST_ORPHAN to detect the head
element of the p_orphan list and than stepping back to its container
to find the parent process. If the parent has already exited, the
init(8) is returned.

Move the P_ORPHAN and the new helper flag from the p_flag* to new
p_treeflag field of struct proc, which is protected by proctree lock
instead of proc lock, since the orphans relationship is managed under
the proctree_lock already.

The remaining uses of p_oppid in ptrace(PT_DETACH) and process
reapping are replaced by proc_realparent(9).

Phabric:	D417
Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-07 05:47:53 +00:00
Gleb Smirnoff
c8d2ffd6a7 Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c
The MD allocators were very common, however there were some minor
differencies. These differencies were all consolidated in the MI allocator,
under ifdefs. The defines from machine/vmparam.h turn on features required
for a particular machine. For details look in the comment in sys/sf_buf.h.

As result no MD code left in sys/*/*/vm_machdep.c. Some arches still have
machine/sf_buf.h, which is usually quite small.

Tested by:	glebius (i386), tuexen (arm32), kevlo (arm32)
Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-05 09:44:10 +00:00
Kirk McKusick
5f9500c358 Add support for multi-threading of soft updates.
Replace a single soft updates thread with a thread per FFS-filesystem
mount point. The threads are associated with the bufdaemon process.

Reviewed by:  kib
Tested by:    Peter Holm and Scott Long
MFC after:    2 weeks
Sponsored by: Netflix
2014-08-04 22:03:58 +00:00
Davide Italiano
4295aa9240 Fix an overflow in getsockopt(). optval isn't big enough to hold
sbintime_t.
Re-introduce r255030 behaviour capping socket timeouts to INT_32
if they're too large.

CR:	https://phabric.freebsd.org/D433
Reported by:	demon
Reviewed by:	bde [1], jhb [2]
MFC after:	2 weeks
2014-08-04 05:40:51 +00:00
Peter Wemm
6dde7ecb5d Partial revert of r262867.
r262867 was described as fixing socket buffer checks for SOCK_SEQPACKET,
but also changed one of the SOCK_DGRAM code paths to use the new
sbappendaddr_nospacecheck_locked() function.  This lead to SOCK_DGRAM
bypassing socket buffer limits.
2014-08-03 22:37:21 +00:00
Sergey Kandaurov
bcdd3bceb6 vn_path_to_global_path: update comment. 2014-08-03 07:59:19 +00:00
Warner Losh
146cbf6fa2 Make the witness lock limit an option. 2014-08-03 05:00:43 +00:00
Konstantin Belousov
168f4ee0a8 Remove Giant acquisition from the mount and unmount pathes.
It could be claimed that two things were reasonable protected by
Giant.  One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx.  Another is vfsconf.vfc_refcount, which is
now updated with atomics.

Note that vfc_refcount still has the same races now as it has under
the Giant, the unload of filesystem modules can happen while the
module is still in use.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-03 03:27:54 +00:00
Rui Paulo
551a78956c In the shm_open() and shm_unlink() syscalls, export the path to KTR.
MFC after:	1 week
2014-08-01 23:29:04 +00:00
Konstantin Belousov
634012b917 Remove one-time use macros which check for the vnode lifecycle. More,
some parts of the checks are in fact redundand in the surrounding
code, and it is more clear what the conditions are by direct testing
of the flags.  Two of the three macros were only used in assertions.

In vnlru_free(), all relevant parts of vholdl() were already inlined,
except the increment of v_holdcnt itself.  Do not call vholdl() to do
the increment as well, this allows to make assertions in
vholdl()/vhold() more strict.

In v_incr_usecount(), call vholdl() before incrementing other ref
counters.  The change is no-op, but it makes less surprising to see
the vnode state in debugger if interrupted inside v_incr_usecount().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-29 16:42:34 +00:00
Konstantin Belousov
d3a3b8b038 Simplify the expression, by removing redundand calculation.
Noted by:	"O'Connor, Daniel" <Daniel.O'Connor@emc.com>
MFC after:	3 days
2014-07-29 01:46:31 +00:00
Konstantin Belousov
5d9b4508fd For md(4), posix shm(3) and tmpfs(5), free swap space used by paged in
dirty page, which is written by the process.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-28 14:27:05 +00:00
Pietro Cerutti
adecd05bf0 Unbreak the ABI by reverting r268494 until the compat shims are provided 2014-07-28 07:20:22 +00:00
Marcel Moolenaar
1e0a021e3d The accept filter code is not specific to the FreeBSD IPv4 network stack,
so it really should not be under "optional inet". The fact that uipc_accf.c
lives under kern/ lends some weight to making it a "standard" file.

Moving kern/uipc_accf.c from "optional inet" to "standard" eliminates the
need for #ifdef INET in kern/uipc_socket.c.

Also, this meant the net.inet.accf.unloadable sysctl needed to move, as
net.inet does not exist without networking compiled in (as it lives in
netinet/in_proto.c.) The new sysctl has been named net.accf.unloadable.

In order to support existing accept filter sysctls, the net.inet.accf node
has been added netinet/in_proto.c.

Submitted by:	Steve Kiernan <stevek@juniper.net>
Obtained from:	Juniper Networks, Inc.
2014-07-26 19:27:34 +00:00
Marcel Moolenaar
be836fab6c Don't return ERESTART when the device is gone. In ttydev_leave() ERESTART
is the indication that draining got interrupted due to a revoke(2) and
that tty_drain() is to be called again for draining to complete. If the
device is flagged as gone, then waiting/draining is not possible. Only
return ERESTART when waiting is still possible.

Obtained from:	Juniper Networks, Inc.
2014-07-26 15:46:41 +00:00
Gavin Atkinson
f6b4f5ca21 Add error return to dumpsys(), and use it in doadump().
This commit does not add error returns to minidumpsys() or
textdump_dumpsys(); those can also be added later.

Submitted by:	Conrad Meyer (EMC / Isilon storage division)
2014-07-25 23:52:53 +00:00
Daniel Eischen
66d8df9dfc Insert new threads at the end of the thread list in the process
instead of at the beginning.  This allows an intra process signal
to be sent to the oldest thread with the signal unmasked - which,
if it still exists, is the main thread.  This mimics behavior
found in Linux and Solaris.
2014-07-25 20:21:02 +00:00
Mateusz Guzik
a1bf811596 Prepare fget_unlocked for reading fd table only once.
Some capsicum functions accept fdp + fd and lookup fde based on that.
Add variants which accept fde.

Reviewed by:	pjd
MFC after:	1 week
2014-07-23 19:33:49 +00:00
Mateusz Guzik
6a1cf96b4a Cosmetic changes to unp_internalize
Don't throw away the result of fget_unlocked.
Move fdp increment to for loop to make it consistent with similar code
elsewhere.

MFC after:	1 week
2014-07-23 18:04:52 +00:00
Gleb Smirnoff
c71b4037ff Use assignment instead of bcopy.
Submitted by:	jmg
2014-07-18 14:59:35 +00:00
Baptiste Daroussin
42e62eca52 Extend kqueue's EVFILT_TIMER by adding precision unit flags support
Define the precision macros as bits sets to conform with XNU equivalent.
Test fflags passed for EVFILT_TIMER and return EINVAL in case an invalid flag
is passed.

Phabric:	https://phabric.freebsd.org/D421
Reviewed by:	kib
2014-07-18 14:27:04 +00:00
Kevin Lo
c29a33213b Deprecate m_act. Use m_nextpkt always. 2014-07-17 05:21:16 +00:00
Don Lewis
d3a6879421 Nuke the never-used RF_TIMESHARE feature, reducing the complexity of the
code.  The consensus on arch@ is that this feature might have been useful
in the distant past, but is now just unnecessary bloat.

The int_rman_activate_resource() and int_rman_deactivate_resource()
functions become trivial, so manually inline them.

The special deferred handling of RF_ACTIVE is no longer needed in
reserve_resource_bound(), so eliminate the associated code at the
end of the function.

These changes reduce the object file size by more than 500 bytes on i386.

Update the rman.9 man page to reflect the removal of the RF_TIMESHARE
feature.

MFC after:	2 weeks
2014-07-16 22:18:19 +00:00
Konstantin Belousov
65589a29f4 Check for the cross-device cross-link attempt in the VFS, instead of
forcing filesystem VOP_LINK() methods to repeat the code.  In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.

Note that NFS server already performs this check before calling
VOP_LINK().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-16 14:04:46 +00:00
Konstantin Belousov
a62eb1398a Followup to r268466.
- Move the code to calculate resident count into separate function.
  It reduces the indent level and makes the operation of
  vmmap_skip_res_cnt tunable more clear.
- Optimize the calculation of the resident page count for map entry.
  Skip directly to the next lowest available index and page among the
  whole shadow chain.
- Restore the use of pmap_incore(9), only to verify that current
  mapping is indeed superpage.
- Note the issue with the invalid pages.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-15 19:57:03 +00:00
Konstantin Belousov
3760e341ca Change the calculation of the kinfo_vmentry field kve_private_resident
to reflect its name.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-15 19:49:00 +00:00
Mateusz Guzik
965d08605f Plug p_pptr null test in do_execve. It is always true. 2014-07-14 22:40:46 +00:00
Mateusz Guzik
c959c23740 Manage struct sigacts refcnt with atomics instead of a mutex.
MFC after:	1 week
2014-07-14 21:12:59 +00:00
Konstantin Belousov
895b3782c6 Extract the code to put a filesystem into the suspended state (at the
unmount time) in the helper vfs_write_suspend_umnt().  Use it instead
of two inline copies in FFS.

Fix the bug in the FFS unmount, when suspension failed, the ufs
extattrs were not reinitialized.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:10:00 +00:00
Konstantin Belousov
57ef02ff0f In kern_linkat(), avoid passing doomed vnode to the VOP.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:41:13 +00:00
Konstantin Belousov
a69452162a Generalize vn_get_ino() to allow filesystems to use custom vnode
producer, instead of hard-coding VFS_VGET().  New function, which
takes callback, is called vn_get_ino_gen(), standard callback for
vn_get_ino() is provided.

Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the
uses of vn_get_ino_gen().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:34:54 +00:00
Kevin Lo
cb7df69b7e Make bind(2) and connect(2) return EAFNOSUPPORT for AF_UNIX on wrong
address family.

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191586 for the
original discussion.

Reviewed by:	terry
2014-07-14 06:00:01 +00:00
Mateusz Guzik
8bedd5d782 Clear nonblock and async on devctl close instaed of open.
This is a purely cosmetic change.
2014-07-12 15:35:04 +00:00
Gleb Smirnoff
1fbe6a82f4 Improve reference counting of EXT_SFBUF pages attached to mbufs.
o Do not use UMA refcount zone. The problem with this zone is that
  several refcounting words (16 on amd64) share the same cache line,
  and issueing atomic(9) updates on them creates cache line contention.
  Also, allocating and freeing them is extra CPU cycles.
  Instead, refcount the page directly via vm_page_wire() and the sfbuf
  via sf_buf_alloc(sf_buf_page(sf)) [1].

o Call refcounting/freeing function for EXT_SFBUF via direct function
  call, instead of function pointer. This removes barrier for CPU
  branch predictor.

o Do not cleanup the mbuf to be freed in mb_free_ext(), merely to
  satisfy assertion in mb_dtor_mbuf(). Remove the assertion from
  mb_dtor_mbuf(). Use bcopy() instead of manual assignments to
  copy m_ext in mb_dupcl().

[1] This has some problems for now. Using sf_buf_alloc() merely to
    increase refcount is expensive, and is broken on sparc64. To be
    fixed.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-07-11 19:40:50 +00:00
Gleb Smirnoff
fcc34a238c Fix style bug: rename the refcount field of m_ext to ext_cnt, to match
other members.

Sponsored by:	Nginx, Inc.
2014-07-11 14:34:29 +00:00
Gleb Smirnoff
15c28f87b8 All mbuf external free functions never fail, so let them be void.
Sponsored by:	Nginx, Inc.
2014-07-11 13:58:48 +00:00
Mateusz Guzik
88f98985aa Eliminate plim and vtmp local vars in exit1.
No functional changes.

MFC after:	1 week
2014-07-10 22:54:38 +00:00
Mateusz Guzik
30d58d6b39 Don't make a temporary copy of fixed sysctl strings. 2014-07-10 21:46:57 +00:00
Mateusz Guzik
b23c40d7b1 Don't zero fd_nfiles during fdp destruction.
Code trying to take a look has to check fd_refcnt and it is 0 by that time.

This is a follow up to r268505, without this the code would leak memory for
tables bigger than the default.

MFC after:	1 week
2014-07-10 21:05:45 +00:00
Mateusz Guzik
e518baf8f9 Avoid relocking filedesc lock when closing fds during fdp destruction.
Don't call bzero nor fdunused from fdfree for such cases. It would do
unnecessary work and complain that the lock is not taken.

MFC after:	1 week
2014-07-10 20:59:54 +00:00
Pietro Cerutti
7150b86bfe Implement Short/Small String Optimization in SBUF(9) and change lengths and
positions in the API from ssize_t and int to size_t.

CR:		D388
Approved by:	des, bapt
2014-07-10 13:08:51 +00:00
Konstantin Belousov
479fcb4e32 Unconditionally initialize addr to handle the case of changed map
timestamp while the map is unlocked.

Reported by:	bz
Sponsored by:	The FreeBSD Foundation
MFC after:	6 days
2014-07-10 11:20:24 +00:00
Konstantin Belousov
a91831a261 Current code in sysctl proc.vmmap, which intent is to calculate the
amount of resident pages, in fact calculates the amount of installed
pte entries in the region.  Resident pages which were not soft-faulted
yet are not counted.

Calculate the amount of resident pages by looking in the objects chain
backing the region.

Add a knob to disable the residency calculation at all.  For large
sparce regions, either previous or updated algorithm runs for too long
time, while several introspection tools do not need the (advisory) RSS
value at all.

PR:	kern/188911
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-09 19:11:57 +00:00
Xin LI
2827952eb4 Don't leave the padding between the msg header and the cmsg data,
and the padding after the cmsg data un-initialized.

Submitted by:	tuexen
Security:	CVE-2014-3952
Security:	FreeBSD-SA-14:17.kmem
2014-07-08 21:54:23 +00:00
Konstantin Belousov
3bcc218f46 Correct the problem reported by test16 from
tools/regression/file/flock/flock.c, which completes the fix in
r192685.  When the lock was stolen from us, retry the whole lock
sequence in kernel, instead of returning EINTR to usermode and hoping
that application would handle it correctly by restarting the lock
acquire.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-08 08:10:15 +00:00
Don Lewis
626a79752f Declaration whitespace changes for style(9).
MFC after:	1 week
2014-07-07 22:02:39 +00:00
Mateusz Guzik
5e2554b7f8 Don't call crdup nor uifind under vnode lock.
A locked vnode can get into the way of satisyfing malloc with M_WATOK.

This is a fixup to r268087.

Suggested by:	kib
MFC after:	1 week
2014-07-07 14:03:30 +00:00
Marcel Moolenaar
e7d939bda2 Remove ia64.
This includes:
o   All directories named *ia64*
o   All files named *ia64*
o   All ia64-specific code guarded by __ia64__
o   All ia64-specific makefile logic
o   Mention of ia64 in comments and documentation

This excludes:
o   Everything under contrib/
o   Everything under crypto/
o   sys/xen/interface
o   sys/sys/elf_common.h

Discussed at: BSDcan
2014-07-07 00:27:09 +00:00
Hans Petter Selasky
604bf9d37e When getting the initial value of numeric tunables use the
getenv_xxx() functions instead of strtoq(), because the getenv_xxx()
functions include wrappers for various postfixes like G/M/K, which
strtoq() doesn't do.
2014-07-05 06:12:48 +00:00
Konstantin Belousov
2499a5ccef Micro-manage clang to get the expected inlining for cpu_search().
Mark cpu_search_lowest/cpu_search_highest/cpu_search_both as noinline,
while cpu_search() gets always_inline.  With the attributes set,
cpu_search() is inlined in wrappers, and if()s with constant
conditionals are optimized.

On some tests on many-core machine, the hwpmc reported samples for
cpu_search*() are reduced from 25% total to 9%.

Submitted by:	"Rang, Anton" <anton.rang@isilon.com>
MFC after:	1 week
2014-07-03 11:06:27 +00:00
Marcel Moolenaar
054b57a740 Drop KTR records when we're in the debugger so that the debugger isn't
changing or overwriting the trace buffer. When KTR is enabled for things
like traps or pmap functions, the amount of logging can be substantial.
2014-07-02 22:13:07 +00:00
Ed Maste
969d3cc28b Fix typos in VTY constant names from r268158 2014-07-02 14:47:48 +00:00
Ed Maste
018147eef9 Prefer vt(4) for UEFI boot
The UEFI framebuffer driver vt_efifb requires vt(4), so add a mechanism
for the startup routine to set the preferred console.  This change is
ugly because console init happens very early in the boot, making a
cleaner interface difficult.  This change is intended only to facilitate
the sc(4) / vt(4) transition, and can be reverted once vt(4) is the
default.
2014-07-02 13:24:21 +00:00
Mateusz Guzik
a6bad85e8e Plug gcc warning after r268074 about unitialized newsigacts
Reported by:	Gary Jennejohn <gljennjohn gmail.com>
2014-07-02 05:45:40 +00:00
Mateusz Guzik
350d51816e Don't call crcopysafe or uifind unnecessarily in execve.
MFC after:	1 week
2014-07-01 09:21:32 +00:00
Mateusz Guzik
d00c8ea429 Perform a lockless check in sigacts_shared.
It is used only during execve (i.e. singlethreaded), so there is no fear
of returning 'not shared' which soon becomes 'shared'.

While here reorganize the code a little to avoid proc lock/unlock in
shared case.

MFC after:	1 week
2014-07-01 06:29:15 +00:00
Adrian Chadd
c445c3c7f6 If we're doing RSS then ensure that the callwheel swi's are CPU pinned. 2014-06-30 04:25:51 +00:00
Hans Petter Selasky
4813ad54f8 Compile fixes:
Remove duplicate "debug_ktr.mask" sysctl definition.
Remove now unused variable from "kern_ktr.c".
This fixes build of "ktr" which was broken by r267961.

Let the default value for "vm_kmem_size_scale" be zero. It is setup
after that the sysctl has been initialized from "getenv()" in the
"kmeminit()" function to equal the "VM_KMEM_SIZE_MAX" value, if
zero. On Sparc64 the "VM_KMEM_SIZE_MAX" macro is not a constant. This
fixes build of Sparc64 which was broken by r267961.

Add a special macro to dynamically create SYSCTL root nodes, because
root nodes have a special parent. This fixes build of existing OFED
module and CANBUS module for pc98 which was broken by r267961.

Add missing "sysctl.h" includes to get the needed sysctl header file
declarations. This is needed after r267961.

MFC after:	2 weeks
2014-06-28 17:36:18 +00:00
Mateusz Guzik
b0bc0cadbe Call fdcloseexec right after fdunshare.
No functional changes.

MFC after:	1 week
2014-06-28 05:51:45 +00:00
Mateusz Guzik
b9d32c36fa Make fdunshare accept only td parameter.
Proc had to match the thread anyway and 2 parameters were inconsistent
with the rest.

MFC after:	1 week
2014-06-28 05:41:53 +00:00
Mateusz Guzik
35778d7aa9 Make sure to always clear p_fd for process getting rid of its filetable.
Filetable can be shared with other processes. Previous code failed to
clear the pointer for all but the last process getting rid of the table.
This is mostly cosmetics.

Get rid of 'This should happen earlier' comment. Clearing the pointer in
this place is fine as consumers can reliably check for files availability
by inspecting fd_refcnt and vnodes availabity by NULL-checking them.

MFC after:	1 week
2014-06-28 05:18:03 +00:00
Hans Petter Selasky
6a3287f889 Fix regression issue after r267961. Handle special string case for
SYSCTLs like previously.

MFC after:	2 weeks
Reported by:	several people
2014-06-28 03:59:04 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Marius Strobl
7344ee184b In order to get vt(4) a bit closer to the feature set provided by sc(4),
implement options TERMINAL_{KERN,NORM}_ATTR. These are aliased to
SC_{KERNEL_CONS,NORM}_ATTR and like these latter, allow to change the
default colors of normal and kernel text respectively.
Note on the naming: Although affecting the output of vt(4), technically
kern/subr_terminal.c is primarily concerned with changing default colors
so it would be inconsistent to term these options VT_{KERN,NORM}_ATTR.
Actually, if the architecture and abstraction of terminal+teken+vt would
be perfect, dev/vt/* wouldn't be touched by this commit at all.

Reviewed by:	emaste
MFC after:	3 days
Sponsored by:	Bally Wulff Games & Entertainment GmbH
2014-06-27 19:57:57 +00:00
Ed Maste
6ac6c9d5f4 Add CTLFLAG_NOFETCH flag; console vty code runs before tunable fetch
Also remove redundant "" assignment for string in BSS.

Submitted by:	hselasky@
2014-06-27 19:07:35 +00:00
Ed Maste
59644098f8 Use a common tunable to choose between vt(4)/sc(4)
With this change and previous work from ray@ it will be possible to put
both in GENERIC, and have one enabled by default, but allow the other to
be selected via the loader.

(The previous implementation had separate kern.vt.disable and
hw.syscons.disable tunables, and would panic if both drivers were
compiled in and neither was explicitly disabled.)

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2014-06-27 17:50:33 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Mateusz Guzik
de966666a2 Check lower bound of cmsg_len.
If passed cm->cmsg_len was below cmsghdr size the experssion:
datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data;

would give negative result. However, in practice it would not
result in a crash because the kernel would try to obtain garbage fds
for given process and would error out with EBADF.

PR:		124908
Submitted by:	campbell mumble.net (modified a little)
MFC after:	1 week
2014-06-27 05:04:36 +00:00
Pawel Jakub Dawidek
e16406c7ba Remove duplicated includes.
Submitted by:	Mariusz Zaborski <oshogbo@FreeBSD.org>
2014-06-26 13:57:44 +00:00
Attilio Rao
e989086b1d sysctl subsystem uses sxlocks so avoid to setup dynamic sysctl nodes
before sleepinit() has been fully executed in the SLEEPQUEUE_PROFILING
case.

Sponsored by:	EMC / Isilon storage division
2014-06-24 15:16:55 +00:00
Mateusz Guzik
450570a55e Tidy up fd-related functions called by do_execve
o assert in each one that fdp is not shared
o remove unnecessary NULL checks - all userspace processes have fdtables
and kernel processes cannot execve
o remove comments about the danger of fd_ofiles getting reallocated - fdtable
is not shared and fd_ofiles could be only reallocated if new fd was about to be
added, but if that was possible the code would already be buggy as setugidsafety
work could be undone

MFC after:	1 week
2014-06-23 01:28:18 +00:00
Mateusz Guzik
158627616c Don't take filedesc lock in fdunshare().
We can read refcnt safely and only care if it is equal to 1.

If it could suddenly change from 1 to something bigger the code would be
buggy even in the previous form and transitions from > 1 to 1 are equally racy
and harmless (we copy even though there is no need).

MFC after:	1 week
2014-06-22 21:37:27 +00:00
Alexander V. Chernikov
811985398d Permit changing cpu mask for cpu set 1 in presence of drivers
binding their threads to particular CPU.

Changing ithread cpu mask is now performed by special cpuset_setithread().
It creates additional cpuset root group on first bind invocation.

No objection:	jhb
Tested by:	hiren
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-06-22 11:32:23 +00:00
Mateusz Guzik
adf87ab01c fd: replace fd_nfiles with fd_lastfile where appropriate
fd_lastfile is guaranteed to be the biggest open fd, so when the intent
is to iterate over active fds or lookup one, there is no point in looking
beyond that limit.

Few places are left unpatched for now.

MFC after:	1 week
2014-06-22 01:31:55 +00:00
Mateusz Guzik
0f0b852c73 do_dup: plug redundant adjustment of fd_lastfile
By that time it was already set by fdalloc, or was there in the first place
if fd is replaced.

MFC after:	1 week
2014-06-22 00:53:33 +00:00
Konstantin Belousov
7b81a399a4 In msdosfs_setattr(), add a check for result of the utimes(2)
permissions test, forgotten in r164033.

Refactor the permission checks for utimes(2) into vnode helper
function vn_utimes_perm(9), and simplify its code comparing with the
UFS origin, by writing the call to VOP_ACCESSX only once.  Use the
helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).

Reported by:	bde
Reviewed by:	bde, trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-17 07:11:00 +00:00
Dmitry Chagin
2dedc1281a Revert r266925 as it can lead to instant panic at fexecve():
To allow to run the interpreter itself add a new ELF branding type.

Pointed out by:	kib, mjg
2014-06-17 05:29:18 +00:00
Attilio Rao
3ae10f7477 - Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
  adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI.  __FreeBSD_version will be,
however, bumped just when the full cache of free pages will be
evicted.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
Tested by:	pho
2014-06-16 18:15:27 +00:00
Konstantin Belousov
2e501b0a9e Use vn_io_fault for the writes from core dumping code. Recursing into
VM due to copyin(9) faulting while VFS locks are held is
deadlock-prone there in the same way as for the write(2) syscall.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-06-15 04:51:53 +00:00
Alexander Motin
781c93d405 Implement simple direct-mapped cache for popular filesystem identifiers to
avoid congestion on global mountlist_mtx mutex in vfs_busyfs(), while
traversing through the list of mount points.

This change significantly improves NFS server scalability, since it had
to do this translation for every request, and the global lock becomes quite
congested.

This code is more optimized for relatively small number of mount points.
On systems with hundreds of active mount points this simple cache may have
many collisions.  But the original traversal code in that case should also
behave much worse, so we are not loosing much.

Reviewed by:	attilio
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-12 12:43:48 +00:00
Alexander Motin
4f655310bf Remove unneeded mountlist_mtx acquisition from sync_fsync().
All struct mount fields accessed by sync_fsync() are protected by MNT_MTX.
2014-06-11 12:56:49 +00:00
Alexander Motin
eb6d6216c4 Move root_mount_hold() functionality to separate mutex.
It has nothing to share with mutex protecting list of mounted file systems.
2014-06-11 08:14:08 +00:00
Konstantin Belousov
a19c5d3716 Devolatile as needed.
Sponsored by:	The FreeBSD Foundation
MFC after:	13 days
2014-06-09 09:10:31 +00:00
Konstantin Belousov
7f82c6c17f Change the nblock mutex, protecting the needsbuffer buffer deficit
flags, to rwlock.  Lock it in read mode when used from subroutines
called from buffer release code paths.

The needsbuffer is now updated using atomics, while read lock of
nblock prevents loosing the wakeups from bufspacewakeup() and
bufcountadd() in getnewbuf_bufd_help().

In several interesting loads, needsbuffer flags are never set, while
buffers are reused quickly.  This causes brelse() and bqrelse() from
different threads to content on the nblock.  Now they take nblock in
read mode, together with needsbuffer not needing an update, allowing
higher parallelism.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-06-09 03:38:03 +00:00
Alan Cox
78960940fe Refresh a comment. The VM_STACK option was eliminated in r43209.
Sponsored by:	EMC / Isilon Storage Division
2014-06-09 00:15:16 +00:00
Alexander Motin
3345d73ca8 Remove extra branching from r267232.
MFC after:	2 weeks
2014-06-08 19:01:37 +00:00
Alexander Motin
590d636321 Use atomics to modify numvnodes variable.
This allows to mostly avoid lock usage in getnewvnode_[drop_]reserve(),
that reduces number of global vnode_free_list_mtx mutex acquisitions
from 4 to 2 per NFS request on ZFS, improving SMP scalability.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-08 15:38:40 +00:00
Konstantin Belousov
a288c757d4 Remove write-only local variable.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-08 10:56:25 +00:00
Konstantin Belousov
23f6698fbd Initialize the pbuf counter for directio using SYSINIT, instead of
using a direct hook called from kern_vfs_bio_buffer_alloc().
Mark ffs_rawread.c as requiring both ffs and directio options to be
compiled into the kernel.  Add ffs_rawread.c to the list of ufs.ko
module' sources.

In addition to stopping breaking the layering violation, it also
allows to link kernel when FFS is configured as module and DIRECTIO is
enabled.

One consequence of the change is that ffs_rawread.o is always linked
into the module regardless of the DIRECTIO option.  This is similar to
the option QUOTA and ufs_quota.c.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-06-08 10:55:06 +00:00
Jilles Tjoelker
093e059c7d ktrace: Use designated initializers for the data_lengths array.
In the .o file, this only changes some line numbers (head amd64) because
element 0 is no longer explicitly initialized.

This should make bugs like FreeBSD-SA-14:12.ktrace less likely.

Discussed with:	des
MFC after:	1 week
2014-06-06 14:49:00 +00:00
Davide Italiano
e392e44c27 Convert functions to the new-style format.
Submitted by:	Vijay Singh <vijju.singh@gmail.com> via -hackers
2014-06-05 03:46:46 +00:00