Commit Graph

14234 Commits

Author SHA1 Message Date
Konstantin Belousov
7b445033ff On exec, single-threading must be enforced before arguments space is
allocated from exec_map.  If many threads try to perform execve(2) in
parallel, the exec map is exhausted and some threads sleep
uninterruptible waiting for the map space.  Then, the thread which won
the race for the space allocation, cannot single-thread the process,
causing deadlock.

Reported and tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-10 09:00:40 +00:00
Konstantin Belousov
44ec2b63c5 The vmem callback to reclaim kmem arena address space on low or
fragmented conditions currently just wakes up the pagedaemon.  The
kmem arena is significantly smaller then the total available physical
memory, which means that there are loads where kmem arena space could
be exhausted, while there is a lot of pages available still.  The
woken up pagedaemon sees vm_pages_needed != 0, verifies the condition
vm_paging_needed() which is false, clears the pass and returns back to
sleep, not calling neither uma_reclaim() nor lowmem handler.

To handle low kmem arena conditions, create additional pagedaemon
thread which calls uma_reclaim() directly.  The thread sleeps on the
dedicated channel and kmem_reclaim() wakes the thread in addition to
the pagedaemon.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-05-09 20:08:36 +00:00
Konstantin Belousov
ac437c0754 Do not return from thread_single(SINGLE_BOUNDARY) until all stopped
thread are guarenteed to be removed from the processors.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-05-09 18:32:13 +00:00
Andrey V. Elsukov
089bb672c4 m_dup() is supposed to give a writable copy of an mbuf chain. It uses
m_dup_pkthdr(), that uses M_COPYFLAGS mask to copy m_flags field.
If original mbuf chain has M_RDONLY flag, its copy also will have it.
Reset this flag explicitly.

MFC after:	2 weeks
2015-05-07 18:35:01 +00:00
Mateusz Guzik
edf1796d3e Fix up panics when fork fails due to hitting proc limit
The function clearning credentials on failure asserts the process is a
zombie, which is not true when fork fails.

Changing creds to NULL is unnecessary, but is still being done for
consistency with other code.

Pointy hat: mjg
Reported by: pho
2015-05-06 21:03:19 +00:00
Ian Lepore
28315e27a7 Implement a mechanism for making changes in the kernel<->driver PPS
interface without breaking ABI or API compatibility with existing drivers.

The existing data structures used to communicate between the kernel and
driver portions of PPS processing contain no spare/padding fields and no
flags field or other straightforward mechanism for communicating changes
in the structures or behaviors of the code.  This makes it difficult to
MFC new features added to the PPS facility.  ABI compatibility is
important; out-of-tree drivers in module form are known to exist.  (Note
that the existing api_version field in the pps_params structure must
contain the value mandated by RFC 2783 and any RFCs that come along after.)

These changes introduce a pair of abi-version fields which are filled in
by the driver and the kernel respectively to indicate the interface
version.  The driver sets its version field before calling the new
pps_init_abi() function.  That lets the kernel know how much of the
pps_state structure is understood by the driver and it can avoid using
newer fields at the end of the structure that it knows about if the driver
is a lower version.  The kernel fills in its version field during the init
call, letting the driver know what features and data the kernel supports.

To implement the new version information in a way that is backwards
compatible with code from before these changes, the high bit of the
lightly-used 'kcmode' field is repurposed as a flag bit that indicates the
driver is aware of the abi versioning scheme.  Basically if this bit is
clear that indicates a "version 0" driver and if it is set the driver_abi
field indicates the version.

These changes also move the recently-added 'mtx' field of pps_state from
the middle to the end of the structure, and make the kernel code that uses
this field conditional on the driver being abi version 1 or higher.  It
changes the only driver currently supplying the mtx field, usb_serial, to
use pps_init_abi().

Reviewed by:	hselasky@
2015-05-04 17:59:39 +00:00
Mariusz Zaborski
a523fa069f nv_malloc can fail in userland.
Add check to prevent a NULL pointer dereference.

Pointed out by:	mjg
Approved by:	pjd (mentor)
2015-05-02 18:12:34 +00:00
Mariusz Zaborski
da20d06f84 Remove duplicated code using macro template for the nvlist_add_.* functions.
Approved by:	pjd (mentor)
2015-05-02 18:10:45 +00:00
Mariusz Zaborski
169c153b59 Introduce the NV_FLAG_NO_UNIQUE flag. When set, it allows to store
multiple values using the same key in a nvlist.

Approved by:	pjd (mentor)
Obtained from:	WHEEL Systems (http://www.wheelsystems.com)

Update man page.

Reviewed by:	AllanJude
Approved by:	pjd (mentor)
2015-05-02 18:03:47 +00:00
Mariusz Zaborski
bd1da0a002 Approved, oprócz użycie RESTORE_ERRNO() do ustawiania errno.
Change the nvlist_recv() function to take additional argument that
specifies flags expected on the received nvlist. Receiving a nvlist with
different set of flags than the ones we expect might lead to undefined
behaviour, which might be potentially dangerous.

Update consumers of this and related functions and update the tests.

Approved by:	pjd (mentor)

Update man page for nvlist_unpack, nvlist_recv, nvlist_xfer, cap_recv_nvlist
and cap_xfer_nvlist.

Reviewed by:	AllanJude
Approved by:	pjd (mentor)
2015-05-02 17:45:52 +00:00
Bjoern A. Zeeb
44aa151e1b Fix an off-by-one bug in string/array handling which lead to memory overwrite
and follow-up assertion errors on at least ARM after r282257,
with nvp_magic being 0x6e7600:
Assertion failed: ((nvp)->nvp_magic == 0x6e7670), function nvpair_name, file .../subr_nvpair.c, line 713.

Sponsored by:	DARPA/AFRL
2015-05-02 08:31:16 +00:00
Mark Johnston
9ad64f27be Remove a stale reference to the stop_scheduler_on_panic tunable, which
itself was removed in r243515.

MFC after:	1 week
2015-05-02 00:27:58 +00:00
Mariusz Zaborski
24f93ee714 Add nvlist_flags() function, which returns nvlist's public flags.
Approved by:	pjd (mentor)
2015-05-01 17:50:24 +00:00
Mariusz Zaborski
b7be86aca6 Mark local function as static as a result of removing recursion.
Approved by:	pjd (mentor)
2015-04-30 20:50:42 +00:00
Mariusz Zaborski
45fd5ced7b Rename macros to use prefix ERRNO. Add macro ERRNO_SET. Now
ERRNO_{RESTORE/SAVE} must by used together, additional variable is not
needed. Always use ERRNO_{SAVE/RESTORE/SET} macros.

Approved by:	pjd (mentor)
2015-04-30 20:47:33 +00:00
John Baldwin
ed95805e90 Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
Mariusz Zaborski
be73922fcd Save errno from close override.
Approved by:	pjd (mentor)
2015-04-29 22:59:44 +00:00
Mariusz Zaborski
0bb5e6ef80 Remove the nvlist_.*[fv] functions.
Those functions are problematic, because there is no way to report
memory allocation problems without complicating the API, so we can
either abort or potentially return invalid results. None of which is
acceptable.

In most cases the caller knows the size of the name, so he can allocate
buffer on the stack and use snprintf(3) to prepare the name.

After some discussion the conclusion is to removed those functions,
which also simplifies the API.

Discussed with: pjd, rstone
Approved by:	pjd (mentor)
2015-04-29 22:57:04 +00:00
Mariusz Zaborski
3cfb71c186 Remove recursion from descriptor-related functions.
Approved by:	pjd (mentor)
2015-04-29 22:15:02 +00:00
Mariusz Zaborski
003e3ea15b Always use the nv_malloc macro instead of malloc(3).
Approved by:	pjd (mentor)
2015-04-29 21:54:34 +00:00
Mariusz Zaborski
906289c2ec Style fixes.
Approved by:	pjd (mentor)
2015-04-29 21:50:04 +00:00
Edward Tomasz Napierala
4b5c9cf62f Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

Differential Revision:	https://reviews.freebsd.org/D2369
Reviewed by:	kib@
MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-04-29 10:23:02 +00:00
Edward Tomasz Napierala
aa32b5e076 Make setproctitle(3) work in Capsicum capability mode. This makes
ctld(8) child processes to indicate initiator address and name in
their titles, similar to what iscsid(8) child processes do.

PR:		181352
Differential Revision:	https://reviews.freebsd.org/D2363
Reviewed by:	rwatson@, mjg@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-27 11:18:16 +00:00
Konstantin Belousov
9a2c85350a Partially revert r255986: do not call VOP_FSYNC() when helping
bufdaemon in getnewbuf(), do use buf_flush().  The difference is that
bufdaemon uses TRYLOCK to get buffer locks, which allows calls to
getnewbuf() while another buffer is locked.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-27 11:13:19 +00:00
Konstantin Belousov
f16f8610bb Fix locking for oshmctl() and shmsys().
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-27 11:12:51 +00:00
Mateusz Guzik
8d0a4ab212 fd: plug an always overwritten initialization in fdalloc 2015-04-26 17:27:55 +00:00
Mateusz Guzik
203322f966 Consistently use p instead of td->td_proc in create_thread
No functional changes.
2015-04-26 17:22:59 +00:00
Rick Macklem
7cfdc2a7bc MAXBSIZE defines both the largest UFS block size and the
largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger read/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
is fixed as well.

Differential Revision:	https://reviews.freebsd.org/D2330
Reviewed by:	mav, kib
MFC after:	2 weeks
2015-04-25 00:52:01 +00:00
Konstantin Belousov
b9062c934a Use correct length for sparse uiomove(). It must be the clipped to
the page size, len is the total transfer length, which may be larger
than zero_region.

Reported and tested by:	clusteradm (gjb)
Sponsored by:	The FreeBSD Foundation
X-MFC-With:	r281442
2015-04-24 22:05:12 +00:00
Mark Johnston
da10a60340 Make vpanic() externally visible so that it can be called as part of the
DTrace panic() action.

Differential Revision:	https://reviews.freebsd.org/D2349
Reviewed by:	avg
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2015-04-24 03:17:21 +00:00
Konstantin Belousov
fe8a824ca6 Handle incorrect ELF images specifying size for PT_GNU_STACK not being
multiple of page size.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2015-04-23 11:27:21 +00:00
Alexander Motin
f743d981f3 Make AIO to not allocate pbufs for unmapped I/O like r281825.
While there, make few more performance optimizations.

On 40-core system doing many 512-byte AIO reads from array of raw SSDs
this change removes lock congestions inside pbuf allocator and devfs,
and bottleneck on single AIO completion taskqueue thread.  It improves
peak AIO performance from ~600K to ~1.3M IOPS.

MFC after:	2 weeks
2015-04-22 18:11:34 +00:00
Craig Rodrigues
d9db52256e Move zlib.c from net to libkern.
It is not network-specific code and would
be better as part of libkern instead.
Move zlib.h and zutil.h from net/ to sys/
Update includes to use sys/zlib.h and sys/zutil.h instead of net/

Submitted by:		Steve Kiernan stevek@juniper.net
Obtained from:		Juniper Networks, Inc.
GitHub Pull Request:	https://github.com/freebsd/freebsd/pull/28
Relnotes:		yes
2015-04-22 14:38:58 +00:00
Craig Rodrigues
d5fec48956 Support file verification in MAC.
* Add VCREAT flag to indicate when a new file is being created
* Add VVERIFY to indicate verification is required
* Both VCREAT and VVERIFY are only passed on the MAC method vnode_check_open
  and are removed from the accmode after
* Add O_VERIFY flag to rtld open of objects
* Add 'v' flag to __sflags to set O_VERIFY flag.

Submitted by:		Steve Kiernan <stevek@juniper.net>
Obtained from:		Juniper Networks, Inc.
GitHub Pull Request:	https://github.com/freebsd/freebsd/pull/27
Relnotes:		yes
2015-04-22 01:54:25 +00:00
Edward Tomasz Napierala
6289b482ec Modify kern___getcwd() to take max pathlen limit as an additional
argument.  This will be used for the Linux emulation layer - for Linux,
PATH_MAX is 4096 and not 1024.

Differential Revision:	https://reviews.freebsd.org/D2335
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-21 13:55:24 +00:00
Alexander Motin
869fd29a7b Rewrite physio() to not allocate pbufs for unmapped I/O.
pbufs is a limited resource, and their allocator is not SMP-scalable.
So instead of always allocating pbuf to immediately convert it to bio,
allocate bio just here.  If buffer needs kernel mapping, then pbuf is
still allocated, but used only as a source of KVA and storage for a list
of held pages.

On 40-core system doing many 512-byte reads from user level to array of
raw SSDs this change removes huge lock congestion inside pbuf allocator.
It improves peak performance from ~300K to ~1.2M IOPS.  On my previous
24-core system this problem also existed, but was less serious.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-21 10:55:53 +00:00
Eric van Gyzen
c207ff9319 Always send log(9) messages to the message buffer.
It is truer to the semantics of logging for messages to *always*
go to the message buffer, where they can eventually be collected
and, in fact, be put into a log file.

This restores the behavior prior to r70239, which seems to have
changed it inadvertently.

Submitted by:	Eric Badger <eric@badgerio.us>
Reviewed by:	jhb
Approved by:	kib (mentor)
Obtained from:	Dell Inc.
MFC after:	1 week
2015-04-20 20:03:26 +00:00
Konstantin Belousov
8103a8f608 Regen. 2015-04-18 21:50:53 +00:00
Konstantin Belousov
0538aafc41 The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and
pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x
kernels which required padding before the off_t parameter.  The
fcntl(2) contains compatibility code to handle kernels before the
struct flock was changed during the 8.x CURRENT development.  The
shims were reasonable to allow easier revert to the older kernel at
that time.

Now, two or three major releases later, shims do not serve any
purpose.  Such old kernels cannot handle current libc, so revert the
compatibility code.

Make padded syscalls support conditional under the COMPAT6 config
option.  For COMPAT32, the syscalls were under COMPAT6 already.

Remove WITHOUT_SYSCALL_COMPAT build option, which only purpose was to
(partially) disable the removed shims.

Reviewed by:	jhb, imp (previous versions)
Discussed with:	peter
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-18 21:50:13 +00:00
Mark Johnston
38563f7c92 Remove unimplemented sched provider probes.
They were added for compatibility with the sched provider in Solaris and
illumos, but our sched provider is already incompatible since it uses native
types, so there isn't much point in keeping them around.

Differential Revision:	https://reviews.freebsd.org/D2167
Reviewed by:	rpaulo
2015-04-18 20:36:58 +00:00
Konstantin Belousov
ad8b1d857d Initialize td_sel in the thread_init(). Struct thread is not zeroed
on the initial allocation, but seltdinit() assumes that td_sel is NULL
or a valid pointer.  Note that thread_fini()/seltdfini() also relies
on this, but correctly resets td_sel to NULL.

Submitted by:	luke.tw@gmail.com
PR:	199518
MFC after:	1 week
2015-04-18 17:21:12 +00:00
Kirk McKusick
f351915514 More accurately collect name-cache statistics in sysctl functions
sysctl_debug_hashstat_nchash() and sysctl_debug_hashstat_rawnchash().
These changes are in preparation for allowing changes in the size
of the vnode hash tables driven by increases and decreases in the
maximum number of vnodes in the system.

Reviewed by: kib@
Phabric:     D2265
2015-04-18 00:59:03 +00:00
Devin Teske
43d4f8c4c6 Add "GELI Passphrase:" prompt to boot loader.
A new loader.conf(5) option of geom_eli_passphrase_prompt="YES" will now
allow you to enter your geli(8) root-mount credentials prior to invoking
the kernel.

See check-password.4th(8) for details.

Differential Revision:	https://reviews.freebsd.org/D2105
Reviewed by:	imp, kmoore
Discussed on:	-current
MFC after:	3 days
X-MFC-to:	stable/10
Relnotes:	yes
2015-04-16 20:53:15 +00:00
Rick Macklem
dda11d4ab9 File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-15 20:16:31 +00:00
Neel Natu
1a688aa53e Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.
Device probe value of BUS_PROBE_NOWILDCARD should be treated specially only
if the device has a fixed devclass. Otherwise it should be interpreted just
as if the driver doesn't want to claim the device.

Prior to this change a device that was not claimed explicitly by its driver
would remain "attached" to the driver that returned BUS_PROBE_NOWILDCARD.
This would bump up the reference on 'driver->refs' and its 'dev->ops' would
point to the 'driver->ops'. When the driver is subsequently unloaded the
'dev->ops->cls' is left pointing to freed memory.

This fixes an easily reproducible #GP fault caused by loading and unloading
vmm.ko multiple times.

Differential Revision:	https://reviews.freebsd.org/D2294
Reviewed by:	imp, jhb
Discussed with:	rstone
Reported by:	Leon Dang (ldang@nahannisys.com)
MFC after:	2 weeks
2015-04-15 16:22:05 +00:00
Edward Tomasz Napierala
1c73bcab8e Rewrite linprocfs_domtab() as a wrapper around kern_getfsstat(). This
adds missing jail and MAC checks.

Differential Revision:	https://reviews.freebsd.org/D2193
Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-04-15 09:13:11 +00:00
Konstantin Belousov
316b384343 Implement support for binary to requesting specific stack size for the
initial thread.  It is read by the ELF image activator as the virtual
size of the PT_GNU_STACK program header entry, and can be specified by
the linker option -z stack-size in newer binutils.

The soft RLIMIT_STACK is auto-increased if possible, to satisfy the
binary' request.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-04-15 08:13:53 +00:00
George V. Neville-Neil
63bf240482 When a kernel has DEVICE_POLLING turned on but no drivers have
the capability do not try to take the mutex at all.

Replaces misbegotten attempt from reverted commit 281276

Pointed out by: glebius
Sponsored by: Rubicon Communications (Netgate)
Differential Revision:	https://reviews.freebsd.org/D2262
2015-04-14 14:22:34 +00:00
Randall Stewart
b132edb56f Fix my stupid restoral of old code.. must be c_iflags now.
Thanks jhb for catching my stupidity...
MFC after:	3 days
2015-04-14 00:02:39 +00:00
Randall Stewart
07a2df5d83 Restore the two lines accidentally deleted that allow CALLOUT_DIRECT to be
specifed in the flags.

Thanks Mark Johnston for noticing this ;-o

MFC after:	3 days
2015-04-13 23:06:13 +00:00