Commit Graph

13896 Commits

Author SHA1 Message Date
Gleb Smirnoff
bd071d4d19 - Remove empty wrappers ether_poll_[de]register_drv(). [1]
- Move polling(9) declarations out of ifq.h back to if_var.h
  they are absolutely unrelated to queues.

Submitted by:	Mikhail <mp lenta.ru> [1]
2014-09-28 14:05:18 +00:00
Mateusz Guzik
0c4a09a378 Make do_dup() static and move relevant macros to kern_descrip.c
No functional changes.
2014-09-26 19:48:47 +00:00
John Baldwin
c1d67516d9 Don't panic if a resource is allocated twice. Instead, print a warning and
fail the allocation request.  Allocations of "reserved" resources such as
PCI BARs already fail the request instead of panic'ing in this case.

MFC after:	1 week
2014-09-26 18:37:49 +00:00
Konstantin Belousov
f69261f2f9 Fix fcntl(2) compat32 after r270691. The copyin and copyout of the
struct flock are done in the sys_fcntl(), which mean that compat32 used
direct access to userland pointers.

Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which
performs neccessary userland memory accesses, and use it from both
native and compat32 fcntl syscalls.

Reported by:	jhibbits
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2014-09-25 21:07:19 +00:00
Konstantin Belousov
2c6fbcbed6 In kern_linkat() and kern_renameat(), do not call namei(9) while
holding a write reference on the filesystem.  Try to get write
reference in unblocked way after all vnodes are resolved; if failed,
drop all locks and retry after waiting for suspension end.

The VFS_UNMOUNT() methods for UFS and tmpfs try to establish
suspension on unmount, while covered vnode is locked by VFS, which
prevents namei() from stepping over the mount point.  The thread doing
namei() sleeps on the covered vnode lock, owning the write ref.

Reported by:	bdrewery
Tested by:	bdrewery (previous version), pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-25 20:42:25 +00:00
Justin Hibbits
a1c1634858 Stage one of multipass suspend/resume
Summary:
Add the beginnings of multipass suspend/resume, by introducing
BUS_SUSPEND_CHILD/BUS_RESUME_CHILD, and move the PCI driver to this.

Reviewers: jhb

Reviewed By: jhb

Differential Revision: https://reviews.freebsd.org/D590
2014-09-23 02:56:40 +00:00
John Baldwin
9696feebe2 Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
  various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
  for each file and then convert that to a kinfo_ofile structure rather than
  keeping a second, different set of code that directly manipulates
  type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision:	https://reviews.freebsd.org/D775
Reviewed by:	kib, glebius (earlier version)
2014-09-22 16:20:47 +00:00
John Baldwin
fadf3fb98c Convert from timeout(9) to callout(9). 2014-09-22 14:27:26 +00:00
Hans Petter Selasky
9fd573c39d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
Sean Bruno
7c51714e0a svn revisions r269964 and r269963 seemed to have impaired small memory
footprint systems(32M/64M) and didn't leave enough free memory to load modules
when it was setting up page tables that for sizes that are never used on
these smallish boards.

Set kmem_zmax to PAGE_SIZE on these smaller systems (< 128M) to keep this
from happening. Verified on mips32 h/w.

PR:             193465
Submitted by:   delphij
Reviewed by:    adrian
2014-09-22 05:07:22 +00:00
Alexander Motin
ae9e9b4fda Reprase r271616 comments.
Submitted by:	alc
MFC after:	1 month
2014-09-17 17:43:32 +00:00
Adrian Chadd
066da8050b Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather
than u_char.

Migrate post_filter to use an int for a CPU rather than u_char.

Change intr_event_bind() to use an int for CPU rather than u_char.

It touches the ppc, sparc64, arm and mips machdep code but it should
(hah!) be a no-op.

Tested:

* i386, AMD64 laptops

Reviewed by:	jhb
2014-09-17 17:33:22 +00:00
Adrian Chadd
7f7528fc79 Modify cpuset_setithread() to take a CPU ID as an integer, not a char.
We're going to end up having > 254 CPUs at some point.
2014-09-16 01:21:47 +00:00
Enji Cooper
257597a434 Validate the mode argument in access, eaccess, and faccessat for optional
POSIX compliance and to improve compatibility with Linux and NetBSD

The issue was identified with lib/libc/sys/t_access:access_inval from
NetBSD

Update the manpage accordingly

PR: 181155
Reviewed by: jilles (code), jmmv (code), wblock (manpage), wollman (code)
MFC after: 4 weeks
Phabric: D678 (code), D786 (manpage)
Sponsored by: EMC / Isilon Storage Division
2014-09-16 00:56:47 +00:00
Alexander Motin
7965496958 Add comments describing r271604 change.
MFC after:	3 days
2014-09-15 11:17:36 +00:00
Alexander Motin
7e9b58eaaa Add couple memory barries to serialize tdq_cpu_idle and tdq_load accesses.
This change fixes transient performance drops in some of my benchmarks,
vanishing as soon as I am trying to collect any stats from the scheduler.
It looks like reordered access to those variables sometimes caused loss of
IPI_PREEMPT, that delayed thread execution until some later interrupt.

MFC after:	3 days
2014-09-14 22:13:19 +00:00
Alexander V. Chernikov
c1d9ecf2be Fix error handling in cpuset_setithread() introduced in r267716.
Noted by:	kib
MFC after:	1 week
2014-09-13 13:46:16 +00:00
John Baldwin
2d69d0dcc2 Fix various issues with invalid file operations:
- Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
  and invfo_kqfilter() for use by file types that do not support the
  respective operations.  Home-grown versions of invfo_poll() were
  universally broken (they returned an errno value, invfo_poll()
  uses poll_no_poll() to return an appropriate event mask).  Home-grown
  ioctl routines also tended to return an incorrect errno (invfo_ioctl
  returns ENOTTY).
- Use the invfo_*() functions instead of local versions for
  unsupported file operations.
- Reorder fileops members to match the order in the structure definition
  to make it easier to spot missing members.
- Add several missing methods to linuxfileops used by the OFED shim
  layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat().  Most
  of these used invfo_*(), but a dummy fo_stat() implementation was
  added.
2014-09-12 21:29:10 +00:00
John Baldwin
cd550b9b52 Tweak pipe_truncate() to more closely match pipe_chown() and pipe_chmod()
by checking PIPE_NAMED and using invfo_truncate() for unnamed pipes.
2014-09-12 21:20:36 +00:00
John Baldwin
0ed667f6e5 Simplify vntype_to_kinfo() by returning when the desired value is found
instead of breaking out of the loop and then immediately checking the loop
index so that if it was broken out of the proper value can be returned.

While here, use nitems().
2014-09-12 20:56:09 +00:00
Gleb Smirnoff
27ad26d8c7 Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES(). 2014-09-10 12:36:41 +00:00
Edward Tomasz Napierala
f514b97b7d Avoid unlocking unlocked mutex in RCTL jail code. Specific test case
is attached to PR.

PR:		193457
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2014-09-09 16:05:33 +00:00
Hiroki Sato
714266373c - Make hhook_run_socket() vnet-aware instead of adding CURVNET_SET() around
the function calls.
- Fix a memory leak and stats in the case that hhook_run_socket() fails
  in soalloc().

PR:	193265
2014-09-08 09:04:22 +00:00
Jean-Sébastien Pédron
ffea80b445 pause_sbt(): Take the cold path (ie. use DELAY()) if KDB is active
This fixes a panic in the i915 driver when one uses debug.kdb.enter=1
under vt(4).

PR:		193269
Reported by:	emaste@
Submitted by:	avg@
MFC after:	3 days
2014-09-08 08:44:50 +00:00
Gleb Smirnoff
9e739a5a05 Fix for r271182.
Submitted by:	mjg
Pointy hat to:	me, submitter and everyone who urged me to commit
2014-09-07 05:44:14 +00:00
Mateusz Guzik
64196a9996 Plug unnecessary fp assignments in kern_fcntl.
No functional changes.
2014-09-05 23:56:25 +00:00
Gleb Smirnoff
d9257d8b57 Set vnet context before accessing V_socket_hhh[].
Submitted by:	"Hiroo Ono (小野寛生)" <hiroo.ono+freebsd gmail.com>
2014-09-05 19:50:18 +00:00
Sean Bruno
65f20a89f1 Allow multiple image activators to run on the same execution by changing
imgp->interpreted to a bitmask instead of, functionally, a bool. Each
imgactivator now requires its own flag in interpreted to indicate whether
or not it has already examined argv[0].

Change imgp->interpreted to an unsigned char to add one extra bit for
future use.

With this change, one can execute a shell script from a 64bit host native
make and still get the binmisc image activator to fire for the script
interpreter.  Prior to this, execution would fail.

Phabric:	https://reviews.freebsd.org/D696
Reviewed by:	jhb@
MFC after:	4 weeks
2014-09-04 21:31:25 +00:00
Gleb Smirnoff
7ee2d05890 Change a very strange code in m_demote() to simple assertion.
Sponsored by:	Nginx, Inc.
2014-09-04 19:27:30 +00:00
Gleb Smirnoff
1967edba02 Provide m_catpkt(), a wrapper around m_cat() that deals with M_PKTHDR mbufs.
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-04 09:07:14 +00:00
Mateusz Guzik
2570cdd605 Plug a hypothetical use after free in sysctl kern.proc.groups.
MFC after:	1 week
2014-09-04 01:21:33 +00:00
Benno Rice
c079e1c018 Add KASSERTs to catch the case where a developer may have forgotten to
set bo_bsize on a bufobj.

This is a slight modification of the patch provided.

PR:		193146
Submitted by:	Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by:	EMC Isilon Storage Division
2014-09-04 00:10:06 +00:00
Konstantin Belousov
624bf9e134 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-03 08:40:16 +00:00
Konstantin Belousov
8626a0ddc6 Retire thread_unthread(), it has only one caller. Update comment in
the block of code before the previous call to thread_unthread().

Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-03 08:35:42 +00:00
Konstantin Belousov
fd229b5b75 Right now, thread_single(SINGLE_EXIT) returns after the p_numthreads
reaches 1. The p_numthreads counter is decremented in thread_exit() by
a call to thread_unlink(). This means that the exiting threads may
still execute on other CPUs when thread_single(SINGLE_EXIT) returns.
As result, vmspace could be destroyed while paging structures are
still used on other CPUs by exiting threads.

Delay the return from thread_single(SINGLE_EXIT) until all threads are
really destroyed by thread_stash() after the last switch out. The
p_exitthreads counter already provides the required mechanism, move
the wait from the thread_wait() (which is called from wait(2) code)
into thread_single().

Reported by:	many (as "panic: pmap active <addr>")
Reviewed by:	alc, jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-09-03 08:18:07 +00:00
Gleb Smirnoff
5b5477d762 Fix dereference after NULL check.
CID:		1234607
Sponsored by:	Nginx, Inc.
2014-09-03 08:14:07 +00:00
Mateusz Guzik
9152087ea7 Fix up proc_realparent to always return correct process.
Prior to the change it would always return initproc for non-traced processes.

This fixes ps apparently always returning 1 as ppid.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-09-03 06:25:34 +00:00
Alan Cox
01a8fb7db5 Automatically prefault a limited number of mappings to resident pages in
shmat(2), just like mmap(2).

MFC after:	5 days
Sponsored by:	EMC / Isilon Storage Division
2014-08-31 17:38:41 +00:00
Mateusz Guzik
6662ce5aab Add missing proctree locking to fill_kinfo_proc consumers.
This fixes r270444.

Pointy hat:	mjg
Reported by:	many
MFC after:	1 week
2014-08-30 03:10:55 +00:00
Andreas Tobler
5be725d7e8 Rename shm_dict_init to shm_init to fix a compiler warning.
Reviewed by:	jhb
2014-08-29 21:50:32 +00:00
John Baldwin
610a2b3c45 Use a unit number allocator to provide suitable st_dev and st_ino values
for POSIX shared memory descriptors.  The implementation is similar to
that used for pipes.

MFC after:	1 week
2014-08-29 18:18:29 +00:00
Konstantin Belousov
575e02d94f Add function and wrapper to switch lockmgr and vnode lock back to
auto-promotion of shared to exclusive.

Tested by:	hrs, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-29 09:02:01 +00:00
Mateusz Guzik
8b04bbef31 Return real parent pid in kinfo (used by e.g. ps)
Add a separate field which exports tracer pid and add a new keyword
("tracer") for ps to display it.

This is a follow up to r270444.

Reviewed by:	kib
MFC after:	1 week
Relnotes:	yes
2014-08-28 08:41:11 +00:00
Jean-Sébastien Pédron
3e206539a1 vt(4): Add cngrab() and cnungrab() callbacks
They are used when a panic occurs or when entering a DDB session for
instance.

cngrab() forces a vt-switch to the console window, no matter if the
original window is another terminal or an X session. However, cnungrab()
doesn't vt-switch back to the original window currently.

MFC after:	1 week
2014-08-27 10:04:10 +00:00
Gleb Smirnoff
e86447ca44 - Remove socket file operations declaration from sys/file.h.
- Make them static in sys_socket.c.
- Provide generic invfo_truncate() instead of soo_truncate().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-26 14:44:08 +00:00
Mateusz Guzik
037755fd15 Fix up races with f_seqcount handling.
It was possible that the kernel would overwrite user-supplied hint.

Abuse vnode lock for this purpose.

In collaboration with: kib
MFC after:	1 week
2014-08-26 08:17:22 +00:00
Konstantin Belousov
c83655f334 Revert the handling of all siginfo sa_flags except SA_SIGINFO to the
pre-r270321.  Namely, the flags are preserved for SIG_DFL and SIG_IGN
dispositions.

Requested and reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-24 16:37:50 +00:00
Mateusz Guzik
8cc11167fb Plug a memory leak in case of failed lookups in capability mode.
Put common cnp cleanup into one function and use it for this purpose.

MFC after:	1 week
2014-08-24 12:51:12 +00:00
Mateusz Guzik
ce8daaadbd Use refcount_init in sigacts_alloc.
This change is a no-op, but fixes up an inconsistency introduced with
r268634.

MFC after:	3 days
2014-08-24 09:24:37 +00:00
Mateusz Guzik
abd386bafe Fix getppid for traced processes.
Traced processes always have the tracer set as the parent.
Utilize proc_realparent to obtain the right process when needed.

Reviewed by:	kib
MFC after:	1 week
2014-08-24 09:04:09 +00:00
Mateusz Guzik
a661bebe26 Properly reparent traced processes when the tracer dies.
Previously they were uncoditionally reparented to init. In effect
it was possible that tracee was never returned to original parent.

Reviewed by:	kib
MFC after:	1 week
2014-08-24 09:02:16 +00:00
Alexander Motin
2e7d7bb294 Restore pre-r239157 handling of sched_yield(), when thread time slice was
aborted, allowing other threads to run.  Without this change thread is just
rescheduled again, that was illustrated by provided test tool.

PR:		192926
Submitted by:	eric@vangyzen.net
MFC after:	2 weeks
2014-08-23 17:31:56 +00:00
Konstantin Belousov
fbb6eca60f In do_lock_pi(), do not override error from umtxq_sleep_pi() when
doing suspend check.  This restores the pre-r251684 behaviour, to
retry once after the signal is detected.

PR:	kern/192918
Submitted by:	Elliott Rabe, Dell Inc., Eric van Gyzen <eric@vangyzen.net>
Obtained from:	Dell Inc.
MFC after:	1 week
2014-08-22 18:42:14 +00:00
Konstantin Belousov
350ae56373 Ensure that sigaction flags for signal, which disposition is reset to
ignored or default, are not leaking.  Apparently, there exists code
which relies on SA_SIGINFO not reported for SIG_DFL or SIG_IGN.

In kern_sigaction, ignore flags when resetting.  Encapsulate the flag
and disposition testing into helper sigact_flag_test().

On exec, and when delivering signal with SA_RESETHAND flag set,
signals are reset automatically.  Use new helper sigdflt(), which
removes duplicated code and corrects all flag bits for the signal.

For proc0, set sigintr bit for all ignored signals.  Ignored signals
are consumed in tdsendsignal() and not delivered to the victim thread
at all.

Reported and tested by:	royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-22 08:19:08 +00:00
Konstantin Belousov
2d86417410 Check the validity of struct sigaction sa_flags value, reject unknown
flags.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-22 07:52:47 +00:00
Hiroki Sato
ed063112f4 Fix a panic which occurs in a VIMAGE-enabled kernel after r270158, and
separate socket_hhook_register() part and put it into VNET_SYS{,UN}INIT()
handler.

Discussed with:	marcel
2014-08-22 05:03:30 +00:00
Marcel Moolenaar
4ec7371233 For vendors like Juniper, extensibility for sockets is important. A
good example is socket options that aren't necessarily generic.  To
this end, OSD is added to the socket structure and hooks are defined
for key operations on sockets.  These are:
o   soalloc() and sodealloc()
o   Get and set socket options
o   Socket related kevent filters.

One aspect about hhook that appears to be not fully baked is the return
semantics (the return value from the hook is ignored in hhook_run_hooks()
at the time of commit).  To support return values, the socket_hhook_data
structure contains a 'status' field to hold return values.

Submitted by:	Anuranjan Shukla <anshukla@juniper.net>
Obtained from:	Juniper Networks, Inc.
2014-08-18 23:45:40 +00:00
Warner Losh
817dc00433 Expand the elf brandelf infrastructure to give access to the whole ELF
header (Elf_Ehdr) to determine if a particular interpretor wants to
accept it or not. Use this mechanism to filter EABI arm on OABI arm
kernels, and vice versa. This method could also be used to implement
OABI on EABI arm kernels, if desired, or to allow a single mips kernel
to run o32, n32 and n64 binaries.

Differential Revision: https://reviews.freebsd.org/D609
2014-08-18 02:44:56 +00:00
Edward Tomasz Napierala
3914ddf8a7 Bring in the new automounter, similar to what's provided in most other
UNIX systems, eg. MacOS X and Solaris.  It uses Sun-compatible map format,
has proper kernel support, and LDAP integration.

There are still a few outstanding problems; they will be fixed shortly.

Reviewed by:	allanjude@, emaste@, kib@, wblock@ (earlier versions)
Phabric:	D523
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2014-08-17 09:44:42 +00:00
Mark Johnston
ba78d6b7a1 Correct the order of arguments passed to LIST_INSERT_AFTER().
Reviewed by:	kib
X-MFC-With:	r269656
2014-08-15 15:42:58 +00:00
Xin LI
7001d850bb Add a new loader tunable, vm.kmem_zmax which allows a system administrator
to limit the maximum allocation size that malloc(9) would consider using
the UMA cache allocator as backend.

Suggested by:	alfred
MFC after:	2 weeks
2014-08-14 05:31:39 +00:00
Xin LI
bda06553fd Re-instate UMA cached backend for 4K - 64K allocations. New consumers
like geli(4) uses malloc(9) to allocate temporary buffers that gets
free'ed shortly, causing frequent TLB shootdown as observed in hwpmc
supported flame graph.

Discussed with:	jeff, alfred
MFC after:	1 week
2014-08-14 05:13:24 +00:00
Konstantin Belousov
70978c93b8 If vm_page_grab() allocates a new page, the page is not inserted into
page queue even when the allocation is not wired.  It is
responsibility of the vm_page_grab() caller to ensure that the page
does not end on the vm_object queue but not on the pagedaemon queue,
which would effectively create unpageable unwired page.

In exec_map_first_page() and vm_imgact_hold_page(), activate the page
immediately after unbusying it, to avoid leak.

In the uiomove_object_page(), deactivate page before the object is
unlocked.  There is no leak, since the page is deactivated after
uiomove_fromphys() finished.  But allowing non-queued non-wired page
in the unlocked object queue makes it impossible to assert that leak
does not happen in other places.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-08-13 05:44:08 +00:00
Gleb Smirnoff
cd1692fa5d Move KASSERT into locked region.
Submitted by:	kib
2014-08-11 15:06:07 +00:00
Gleb Smirnoff
eaf78ad3f7 Use M_WAITOK in sf_buf_init().
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-11 13:12:18 +00:00
Gleb Smirnoff
818d40d033 Provide sf_buf_ref() to optimize refcounting of already allocated
sendfile(2) buffers.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-11 12:59:55 +00:00
Bjoern A. Zeeb
e346f8c452 Split up sys_ktimer_getoverrun() into a sys_ and a kern_ variant
and export the kern_ version needed by an upcoming linuxolator change.

MFC after:	3 days
Sponsored by:	DARPA,AFRL
2014-08-07 16:49:50 +00:00
Andrey V. Elsukov
3e40097976 Temporary revert r269661, it looks like the patch isn't complete. 2014-08-07 14:32:28 +00:00
Andrey V. Elsukov
c60e497af9 Use cpuset_setithread() to apply cpu mask to taskq threads.
Sponsored by:	Yandex LLC
2014-08-07 10:23:50 +00:00
Konstantin Belousov
d735998057 Correct the problems with the ptrace(2) making the debuggee an orphan.
One problem is inferior(9) looping due to the process tree becoming a
graph instead of tree if the parent is traced by child. Another issue
is due to the use of p_oppid to restore the original parent/child
relationship, because real parent could already exited and its pid
reused (noted by mjg).

Add the function proc_realparent(9), which calculates the parent for
given process. It uses the flag P_TREE_FIRST_ORPHAN to detect the head
element of the p_orphan list and than stepping back to its container
to find the parent process. If the parent has already exited, the
init(8) is returned.

Move the P_ORPHAN and the new helper flag from the p_flag* to new
p_treeflag field of struct proc, which is protected by proctree lock
instead of proc lock, since the orphans relationship is managed under
the proctree_lock already.

The remaining uses of p_oppid in ptrace(PT_DETACH) and process
reapping are replaced by proc_realparent(9).

Phabric:	D417
Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-07 05:47:53 +00:00
Gleb Smirnoff
c8d2ffd6a7 Merge all MD sf_buf allocators into one MI, residing in kern/subr_sfbuf.c
The MD allocators were very common, however there were some minor
differencies. These differencies were all consolidated in the MI allocator,
under ifdefs. The defines from machine/vmparam.h turn on features required
for a particular machine. For details look in the comment in sys/sf_buf.h.

As result no MD code left in sys/*/*/vm_machdep.c. Some arches still have
machine/sf_buf.h, which is usually quite small.

Tested by:	glebius (i386), tuexen (arm32), kevlo (arm32)
Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-08-05 09:44:10 +00:00
Kirk McKusick
5f9500c358 Add support for multi-threading of soft updates.
Replace a single soft updates thread with a thread per FFS-filesystem
mount point. The threads are associated with the bufdaemon process.

Reviewed by:  kib
Tested by:    Peter Holm and Scott Long
MFC after:    2 weeks
Sponsored by: Netflix
2014-08-04 22:03:58 +00:00
Davide Italiano
4295aa9240 Fix an overflow in getsockopt(). optval isn't big enough to hold
sbintime_t.
Re-introduce r255030 behaviour capping socket timeouts to INT_32
if they're too large.

CR:	https://phabric.freebsd.org/D433
Reported by:	demon
Reviewed by:	bde [1], jhb [2]
MFC after:	2 weeks
2014-08-04 05:40:51 +00:00
Peter Wemm
6dde7ecb5d Partial revert of r262867.
r262867 was described as fixing socket buffer checks for SOCK_SEQPACKET,
but also changed one of the SOCK_DGRAM code paths to use the new
sbappendaddr_nospacecheck_locked() function.  This lead to SOCK_DGRAM
bypassing socket buffer limits.
2014-08-03 22:37:21 +00:00
Sergey Kandaurov
bcdd3bceb6 vn_path_to_global_path: update comment. 2014-08-03 07:59:19 +00:00
Warner Losh
146cbf6fa2 Make the witness lock limit an option. 2014-08-03 05:00:43 +00:00
Konstantin Belousov
168f4ee0a8 Remove Giant acquisition from the mount and unmount pathes.
It could be claimed that two things were reasonable protected by
Giant.  One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx.  Another is vfsconf.vfc_refcount, which is
now updated with atomics.

Note that vfc_refcount still has the same races now as it has under
the Giant, the unload of filesystem modules can happen while the
module is still in use.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-08-03 03:27:54 +00:00
Rui Paulo
551a78956c In the shm_open() and shm_unlink() syscalls, export the path to KTR.
MFC after:	1 week
2014-08-01 23:29:04 +00:00
Konstantin Belousov
634012b917 Remove one-time use macros which check for the vnode lifecycle. More,
some parts of the checks are in fact redundand in the surrounding
code, and it is more clear what the conditions are by direct testing
of the flags.  Two of the three macros were only used in assertions.

In vnlru_free(), all relevant parts of vholdl() were already inlined,
except the increment of v_holdcnt itself.  Do not call vholdl() to do
the increment as well, this allows to make assertions in
vholdl()/vhold() more strict.

In v_incr_usecount(), call vholdl() before incrementing other ref
counters.  The change is no-op, but it makes less surprising to see
the vnode state in debugger if interrupted inside v_incr_usecount().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-29 16:42:34 +00:00
Konstantin Belousov
d3a3b8b038 Simplify the expression, by removing redundand calculation.
Noted by:	"O'Connor, Daniel" <Daniel.O'Connor@emc.com>
MFC after:	3 days
2014-07-29 01:46:31 +00:00
Konstantin Belousov
5d9b4508fd For md(4), posix shm(3) and tmpfs(5), free swap space used by paged in
dirty page, which is written by the process.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-28 14:27:05 +00:00
Pietro Cerutti
adecd05bf0 Unbreak the ABI by reverting r268494 until the compat shims are provided 2014-07-28 07:20:22 +00:00
Marcel Moolenaar
1e0a021e3d The accept filter code is not specific to the FreeBSD IPv4 network stack,
so it really should not be under "optional inet". The fact that uipc_accf.c
lives under kern/ lends some weight to making it a "standard" file.

Moving kern/uipc_accf.c from "optional inet" to "standard" eliminates the
need for #ifdef INET in kern/uipc_socket.c.

Also, this meant the net.inet.accf.unloadable sysctl needed to move, as
net.inet does not exist without networking compiled in (as it lives in
netinet/in_proto.c.) The new sysctl has been named net.accf.unloadable.

In order to support existing accept filter sysctls, the net.inet.accf node
has been added netinet/in_proto.c.

Submitted by:	Steve Kiernan <stevek@juniper.net>
Obtained from:	Juniper Networks, Inc.
2014-07-26 19:27:34 +00:00
Marcel Moolenaar
be836fab6c Don't return ERESTART when the device is gone. In ttydev_leave() ERESTART
is the indication that draining got interrupted due to a revoke(2) and
that tty_drain() is to be called again for draining to complete. If the
device is flagged as gone, then waiting/draining is not possible. Only
return ERESTART when waiting is still possible.

Obtained from:	Juniper Networks, Inc.
2014-07-26 15:46:41 +00:00
Gavin Atkinson
f6b4f5ca21 Add error return to dumpsys(), and use it in doadump().
This commit does not add error returns to minidumpsys() or
textdump_dumpsys(); those can also be added later.

Submitted by:	Conrad Meyer (EMC / Isilon storage division)
2014-07-25 23:52:53 +00:00
Daniel Eischen
66d8df9dfc Insert new threads at the end of the thread list in the process
instead of at the beginning.  This allows an intra process signal
to be sent to the oldest thread with the signal unmasked - which,
if it still exists, is the main thread.  This mimics behavior
found in Linux and Solaris.
2014-07-25 20:21:02 +00:00
Mateusz Guzik
a1bf811596 Prepare fget_unlocked for reading fd table only once.
Some capsicum functions accept fdp + fd and lookup fde based on that.
Add variants which accept fde.

Reviewed by:	pjd
MFC after:	1 week
2014-07-23 19:33:49 +00:00
Mateusz Guzik
6a1cf96b4a Cosmetic changes to unp_internalize
Don't throw away the result of fget_unlocked.
Move fdp increment to for loop to make it consistent with similar code
elsewhere.

MFC after:	1 week
2014-07-23 18:04:52 +00:00
Gleb Smirnoff
c71b4037ff Use assignment instead of bcopy.
Submitted by:	jmg
2014-07-18 14:59:35 +00:00
Baptiste Daroussin
42e62eca52 Extend kqueue's EVFILT_TIMER by adding precision unit flags support
Define the precision macros as bits sets to conform with XNU equivalent.
Test fflags passed for EVFILT_TIMER and return EINVAL in case an invalid flag
is passed.

Phabric:	https://phabric.freebsd.org/D421
Reviewed by:	kib
2014-07-18 14:27:04 +00:00
Kevin Lo
c29a33213b Deprecate m_act. Use m_nextpkt always. 2014-07-17 05:21:16 +00:00
Don Lewis
d3a6879421 Nuke the never-used RF_TIMESHARE feature, reducing the complexity of the
code.  The consensus on arch@ is that this feature might have been useful
in the distant past, but is now just unnecessary bloat.

The int_rman_activate_resource() and int_rman_deactivate_resource()
functions become trivial, so manually inline them.

The special deferred handling of RF_ACTIVE is no longer needed in
reserve_resource_bound(), so eliminate the associated code at the
end of the function.

These changes reduce the object file size by more than 500 bytes on i386.

Update the rman.9 man page to reflect the removal of the RF_TIMESHARE
feature.

MFC after:	2 weeks
2014-07-16 22:18:19 +00:00
Konstantin Belousov
65589a29f4 Check for the cross-device cross-link attempt in the VFS, instead of
forcing filesystem VOP_LINK() methods to repeat the code.  In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.

Note that NFS server already performs this check before calling
VOP_LINK().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-16 14:04:46 +00:00
Konstantin Belousov
a62eb1398a Followup to r268466.
- Move the code to calculate resident count into separate function.
  It reduces the indent level and makes the operation of
  vmmap_skip_res_cnt tunable more clear.
- Optimize the calculation of the resident page count for map entry.
  Skip directly to the next lowest available index and page among the
  whole shadow chain.
- Restore the use of pmap_incore(9), only to verify that current
  mapping is indeed superpage.
- Note the issue with the invalid pages.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-15 19:57:03 +00:00
Konstantin Belousov
3760e341ca Change the calculation of the kinfo_vmentry field kve_private_resident
to reflect its name.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-15 19:49:00 +00:00
Mateusz Guzik
965d08605f Plug p_pptr null test in do_execve. It is always true. 2014-07-14 22:40:46 +00:00
Mateusz Guzik
c959c23740 Manage struct sigacts refcnt with atomics instead of a mutex.
MFC after:	1 week
2014-07-14 21:12:59 +00:00
Konstantin Belousov
895b3782c6 Extract the code to put a filesystem into the suspended state (at the
unmount time) in the helper vfs_write_suspend_umnt().  Use it instead
of two inline copies in FFS.

Fix the bug in the FFS unmount, when suspension failed, the ufs
extattrs were not reinitialized.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 09:10:00 +00:00
Konstantin Belousov
57ef02ff0f In kern_linkat(), avoid passing doomed vnode to the VOP.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:41:13 +00:00
Konstantin Belousov
a69452162a Generalize vn_get_ino() to allow filesystems to use custom vnode
producer, instead of hard-coding VFS_VGET().  New function, which
takes callback, is called vn_get_ino_gen(), standard callback for
vn_get_ino() is provided.

Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the
uses of vn_get_ino_gen().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-14 08:34:54 +00:00