Commit Graph

14946 Commits

Author SHA1 Message Date
andrew
120a010498 Add an interface to handle interrupt controllers that have a contiguous
range of interrupts they pass to a second controller driver to handle.
The parent driver is expected to detect when one of these interrupts has
been triggered and call intr_child_irq_handler to pass the interrupt to
a child. The children controllers are then expected to manage the range
by allocating interrupts as needed.

This will initially be used by the ARM GICv3 driver, but is is expected to
be useful for other driver where this type of allocation applies.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6436
2016-06-03 10:13:18 +00:00
mjg
2175f886b3 taskqueue: plug a leak in _taskqueue_create
While here make some style fixes and postpone the sprintf so that it is
only done when the function can no longer fail.

CID:	1356041
2016-06-02 15:52:34 +00:00
mjg
25669dd1d9 Microoptimize locking primitives by avoiding unnecessary atomic ops.
Inline version of primitives do an atomic op and if it fails they fallback to
actual primitives, which immediately retry the atomic op.

The obvious optimisation is to check if the lock is free and only then proceed
to do an atomic op.

Reviewed by:	jhb, vangyzen
2016-06-01 18:32:20 +00:00
bz
fac944a70a The pr_destroy field does not allow us to run the teardown code in a
specific order.  VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.

Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.

Reviewed by:		gnn, tuexen (sctp), jhb (kernel.h)
Obtained from:		projects/vnet
MFC after:		2 weeks
X-MFC:			do not remove pr_destroy
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6652
2016-06-01 10:14:04 +00:00
glebius
a0c5a05de8 Fix kernel stack disclosures in the Linux and 4.3BSD compat layers.
Submitted by:	CTurt
Security:	SA-16:20
Security:	SA-16:21
2016-05-31 16:56:30 +00:00
trasz
661ae93171 Cosmetics - add missing space after ellipses in shutdown messages.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-31 15:27:33 +00:00
jamie
f9e8138137 Mark jail(2), and the sysctls that it (and only it) uses as deprecated.
jail(8) has long used jail_set(2), and those sysctl only cause confusion.
2016-05-30 05:21:24 +00:00
mjg
3cca53e0a8 fd: provide a common exit point for unlock in kern_dup
While here assert dropped filedesc lock on return from closefp.
2016-05-27 17:00:15 +00:00
mjg
4a70fa3997 exec: get rid of one vnode lock/unlock pair in do_execve
The lock was temporarily dropped for vrele calls, but they can be
postponed to a point where the lock is not held in the first place.

While here shuffle other code not needing the lock.
2016-05-27 15:03:38 +00:00
bdrewery
01a0751bcd exec: Provide execpath in imgp for the process_exec hook.
This was previously set after the hook and only if auxargs were present.
Now always provide it if possible.

MFC after:	2 weeks
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D6546
2016-05-26 23:19:39 +00:00
bdrewery
ecf7a0caee exec: Add credential change information into imgp for process_exec hook.
This allows an EVENTHANDLER(process_exec) hook to see if the new image
will cause credentials to change whether due to setgid/setuid or because
of POSIX saved-id semantics.

This adds 3 new fields into image_params:
  struct ucred *newcred		Non-null if the credentials will change.
  bool credential_setid		True if the new image is setuid or setgid.

This will pre-determine the new credentials before invoking the image
activators, where the process_exec hook is called.  The new credentials
will be installed into the process in the same place as before, after
image activators are done handling the image.

MFC after:	2 weeks
Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D6544
2016-05-26 23:18:54 +00:00
cem
444253bba5 crypto routines: Hint minimum buffer sizes to the compiler
Use the C99 'static' keyword to hint to the compiler IVs and output digest
sizes.  The keyword informs the compiler of the minimum valid size for a given
array.  Obviously not every pointer can be validated (i.e., the compiler can
produce false negative but not false positive reports).

No functional change.  No ABI change.

Sponsored by:	EMC / Isilon Storage Division
2016-05-26 19:29:29 +00:00
hselasky
d09154a9e8 Add support for boolean sysctl's.
Because the size of bool can be implementation defined, make a bool
sysctl handler which handle bools. Userspace sees the bools like
unsigned 8-bit integers. Values are filtered to either 1 or 0 upon
read and write, similar to what a compiler would do.

Requested by:	kmacy @
Sponsored by:	Mellanox Technologies
2016-05-26 08:41:55 +00:00
ian
8d8c35656e Include machine/acle-compat.h in cdefs.h on arm if the compiler doesn't
have ACLE support built in.  The ACLE (ARM C Language Extensions) defines
a set of standardized symbols which indicate the architecture version and
features available.  ACLE support is built in to modern compilers (both
clang and gcc), but absent from gcc prior to 4.4.

ARM (the company) provides the acle-compat.h header file to define the
right symbols for older versions of gcc.  Basically, acle-compat.h does
for arm about the same thing cdefs.h does for freebsd: defines
standardized macros that work no matter which compiler you use.  If ARM
hadn't provided this file we would have ended up with a big #ifdef __arm__
section in cdefs.h with our own compatibility shims.

Remove #include <machine/acle-compat.h> from the zillion other places (an
ever-growing list) that it appears.  Since style(9) requires sys/types.h
or sys/param.h early in the include list, and both of those lead to
including cdefs.h, only a couple special cases still need to include
acle-compat.h directly.

Loves it:     imp
2016-05-25 19:44:26 +00:00
kib
e2164fa4cf Silence false LOR report due to the taskqueue mutex and kqueue lock
named the same.

Reported by:	Doug Luce <doug@freebsd.con.com>
Sponsored by:	The FreeBSD Foundation
2016-05-24 21:13:33 +00:00
jhb
f913b0e3d5 Return the correct status when a partially completed request is cancelled.
After the previous changes to fix requests on blocking sockets to complete
across multiple operations, an edge case exists where a request can be
cancelled after it has partially completed.  POSIX doesn't appear to
dictate exactly how to handle this case, but in general I feel that
aio_cancel() should arrange to cancel any request it can, but that any
partially completed requests should return a partial completion rather
than ECANCELED.  To that end, fix the socket AIO cancellation routine to
return a short read/write if a partially completed request is cancelled
rather than ECANCELED.

Sponsored by:	Chelsio Communications
2016-05-24 21:09:05 +00:00
andrew
0304f4942c Limit calling pmc_hook to when the interrupt comes while running userspace.
We may enable interrupts from within the callback, e.g. in a data abort
during copyin. If we receive an interrupt at that time pmc_hook will be
called again and, as it is handling userspace stack tracing, will hit a
KASSERT as it checks if the trapframe is from userland.

With this I can run hwpmc with intrng on a ThunderX and have it trace all
CPUs.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-05-24 12:06:56 +00:00
jhb
2889382edd Don't prematurely return short completions on blocking sockets.
Always requeue an AIO job at the head of the socket buffer's queue if
sosend() or soreceive() returns EWOULDBLOCK on a blocking socket.
Previously, requests were only requeued if they returned EWOULDBLOCK
and completed no data.  Now after a partial completion on a blocking
socket the request is queued and the remaining request is retried when
the socket is ready.  This allows writes larger than the currently
available space on a blocking socket to fully complete.  Reads on a
blocking socket that satifsy the low watermark can still return a short
read (same as read()).

In order to track previously completed data, the internal 'status'
field of the AIO job is used to store the amount of previously
computed data.

Non-blocking sockets continue to return short completions for both
reads and writes.

Add a test for a "large" AIO write on a blocking socket that writes
twice the socket buffer size to a UNIX domain socket.

Sponsored by:	Chelsio Communications
2016-05-24 03:13:27 +00:00
asomers
69a8601e63 Fix build of kern/subr_unit.c, broken by r300539
Reported by:	peter
Pointyhat to:	asomers
Sponsored by:	Spectra Logic Corp
2016-05-24 00:14:58 +00:00
asomers
d14be2b60f Add bit_count to the bitstring(3) api
Add a bit_count function, which efficiently counts the number of bits set in
a bitstring.

sys/sys/bitstring.h
tests/sys/sys/bitstring_test.c
share/man/man3/bitstring.3
	Add bit_alloc

sys/kern/subr_unit.c
	Use bit_count instead of a naive counting loop in check_unrhdr, used
	when INVARIANTS are enabled. The userland test runs about 6x faster
	in a generic build, or 8.5x faster when built for Nehalem, which has
	the POPCNT instruction.

sys/sys/param.h
	Bump __FreeBSD_version due to the addition of bit_alloc

UPDATING
	Add a note about the ABI incompatibility of the bitstring(3)
	changes, as suggested by lidl.

Suggested by:	gibbs
Reviewed by:	gibbs, ngie
MFC after:	9 days
X-MFC-With:	299090, 300538
Relnotes:	yes
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D6255
2016-05-23 20:29:18 +00:00
andrew
57b4892772 Add the needed hwpmc hooks to subr_intr.c. This is needed for the correct
operation of hwpmc on, for example, arm64 with intrng.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-05-23 15:26:35 +00:00
hselasky
5a9f6f8a54 Use DELAY() instead of _sleep() when SCHEDULER_STOPPED() is set inside
pause_sbt(). This allows pause() to continue working during a panic()
which is not invoking KDB. This is useful when debugging graphics
drivers using the LinuxKPI.

Obtained from:	kmacy @
MFC after:	1 week
2016-05-23 10:31:54 +00:00
bapt
8bfdfde95e Fix typo introduced by me (not the submitter) when fixing typos 2016-05-22 13:10:48 +00:00
bapt
58e5acb1f5 Fix typos in the comments
Submitted by:	cipherwraith666@gmail.com (via github)
2016-05-22 13:04:45 +00:00
avg
0dbe97e8b2 fix loss of taskqueue wakeups (introduced in r300113)
Submitted by:	kmacy
Tested by:	dchagin
2016-05-21 14:51:49 +00:00
jhb
ca492fdb2d Add sglist functions for working with arrays of VM pages.
sglist_count_vmpages() determines the number of segments required for
a buffer described by an array of VM pages. sglist_append_vmpages()
adds the segments described by such a buffer to an sglist.  The latter
function is largely pulled from sglist_append_bio(), and
sglist_append_bio() now uses sglist_append_vmpages().

Reviewed by:	kib
Sponsored by:	Chelsio Communications
2016-05-20 23:28:43 +00:00
jhb
ed372eaa9f Consistently set status to -1 when completing an AIO request with an error.
Sponsored by:	Chelsio Communications
2016-05-20 19:46:25 +00:00
jhb
b4b2ae5652 Add new bus methods for mapping resources.
Add a pair of bus methods that can be used to "map" resources for direct
CPU access using bus_space(9).  bus_map_resource() creates a mapping and
bus_unmap_resource() releases a previously created mapping.  Mappings are
described by 'struct resource_map' object.  Pointers to these objects can
be passed as the first argument to the bus_space wrapper API used for bus
resources.

Drivers that wish to map all of a resource using default settings
(for example, using uncacheable memory attributes) do not need to change.
However, drivers that wish to use non-default settings can now do so
without jumping through hoops.

First, an RF_UNMAPPED flag is added to request that a resource is not
implicitly mapped with the default settings when it is activated.  This
permits other activation steps (such as enabling I/O or memory decoding
in a device's PCI command register) to be taken without creating a
mapping.  Right now the AGP drivers don't set RF_ACTIVE to avoid using
up a large amount of KVA to map the AGP aperture on 32-bit platforms.
Once RF_UNMAPPED is supported on all platforms that support AGP this
can be changed to using RF_UNMAPPED with RF_ACTIVE instead.

Second, bus_map_resource accepts an optional structure that defines
additional settings for a given mapping.

For example, a driver can now request to map only a subset of a resource
instead of the entire range.  The AGP driver could also use this to only
map the first page of the aperture (IIRC, it calls pmap_mapdev() directly
to map the first page currently).  I will also eventually change the
PCI-PCI bridge driver to request mappings of the subset of the I/O window
resource on its parent side to create mappings for child devices rather
than passing child resources directly up to nexus to be mapped.  This
also permits bridges that do address translation to request suitable
mappings from a resource on the "upper" side of the bus when mapping
resources on the "lower" side of the bus.

Another attribute that can be specified is an alternate memory attribute
for memory-mapped resources.  This can be used to request a
Write-Combining mapping of a PCI BAR in an MI fashion.  (Currently the
drivers that do this call pmap_change_attr() directly for x86 only.)

Note that this commit only adds the MI framework.  Each platform needs
to add support for handling RF_UNMAPPED and thew new
bus_map/unmap_resource methods.  Generally speaking, any drivers that
are calling rman_set_bustag() and rman_set_bushandle() need to be
updated.

Discussed on:	arch
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D5237
2016-05-20 17:57:47 +00:00
markj
a09fd6097b Move IPv6 malloc tag definitions into the IPv6 code. 2016-05-20 04:45:08 +00:00
scottl
ae68a36f74 Adjust the creation of tq_name so it can be freed correctly
Reviewed by:	jhb, allanjude
Differential Revision:	D6454
2016-05-19 17:14:24 +00:00
ken
7eeed3c838 Add support for managing Shingled Magnetic Recording (SMR) drives.
This change includes support for SCSI SMR drives (which conform to the
Zoned Block Commands or ZBC spec) and ATA SMR drives (which conform to
the Zoned ATA Command Set or ZAC spec) behind SAS expanders.

This includes full management support through the GEOM BIO interface, and
through a new userland utility, zonectl(8), and through camcontrol(8).

This is now ready for filesystems to use to detect and manage zoned drives.
(There is no work in progress that I know of to use this for ZFS or UFS, if
anyone is interested, let me know and I may have some suggestions.)

Also, improve ATA command passthrough and dispatch support, both via ATA
and ATA passthrough over SCSI.

Also, add support to camcontrol(8) for the ATA Extended Power Conditions
feature set.  You can now manage ATA device power states, and set various
idle time thresholds for a drive to enter lower power states.

Note that this change cannot be MFCed in full, because it depends on
changes to the struct bio API that break compatilibity.  In order to
avoid breaking the stable API, only changes that don't touch or depend on
the struct bio changes can be merged.  For example, the camcontrol(8)
changes don't depend on the new bio API, but zonectl(8) and the probe
changes to the da(4) and ada(4) drivers do depend on it.

Also note that the SMR changes have not yet been tested with an actual
SCSI ZBC device, or a SCSI to ATA translation layer (SAT) that supports
ZBC to ZAC translation.  I have not yet gotten a suitable drive or SAT
layer, so any testing help would be appreciated.  These changes have been
tested with Seagate Host Aware SATA drives attached to both SAS and SATA
controllers.  Also, I do not have any SATA Host Managed devices, and I
suspect that it may take additional (hopefully minor) changes to support
them.

Thanks to Seagate for supplying the test hardware and answering questions.

sbin/camcontrol/Makefile:
	Add epc.c and zone.c.

sbin/camcontrol/camcontrol.8:
	Document the zone and epc subcommands.

sbin/camcontrol/camcontrol.c:
	Add the zone and epc subcommands.

	Add auxiliary register support to build_ata_cmd().  Make sure to
	set the CAM_ATAIO_NEEDRESULT, CAM_ATAIO_DMA, and CAM_ATAIO_FPDMA
	flags as appropriate for ATA commands.

	Add a new get_ata_status() function to parse ATA result from SCSI
	sense descriptors (for ATA passthrough over SCSI) and ATA I/O
	requests.

sbin/camcontrol/camcontrol.h:
	Update the build_ata_cmd() prototype

	Add get_ata_status(), zone(), and epc().

sbin/camcontrol/epc.c:
	Support for ATA Extended Power Conditions features.  This includes
	support for all features documented in the ACS-4 Revision 12
	specification from t13.org (dated February 18, 2016).

	The EPC feature set allows putting a drive into a power power mode
	immediately, or setting timeouts so that the drive will
	automatically enter progressively lower power states after various
	idle times.

sbin/camcontrol/fwdownload.c:
	Update the firmware download code for the new build_ata_cmd()
	arguments.

sbin/camcontrol/zone.c:
	Implement support for Shingled Magnetic Recording (SMR) drives
	via SCSI Zoned Block Commands (ZBC) and ATA Zoned Device ATA
	Command Set (ZAC).

	These specs were developed in concert, and are functionally
	identical.  The primary differences are due to SCSI and ATA
	differences.  (SCSI is big endian, ATA is little endian, for
	example.)

	This includes support for all commands defined in the ZBC and
	ZAC specs.

sys/cam/ata/ata_all.c:
	Decode a number of additional ATA command names in ata_op_string().

	Add a new CCB building function, ata_read_log().

	Add ata_zac_mgmt_in() and ata_zac_mgmt_out() CCB building
	functions.  These support both DMA and NCQ encapsulation.

sys/cam/ata/ata_all.h:
	Add prototypes for ata_read_log(), ata_zac_mgmt_out(), and
	ata_zac_mgmt_in().

sys/cam/ata/ata_da.c:
	Revamp the ada(4) driver to support zoned devices.

	Add four new probe states to gather information needed for zone
	support.

	Add a new adasetflags() function to avoid duplication of large
	blocks of flag setting between the async handler and register
	functions.

	Add new sysctl variables that describe zone support and paramters.

	Add support for the new BIO_ZONE bio, and all of its subcommands:
	DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
	DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

sys/cam/scsi/scsi_all.c:
	Add command descriptions for the ZBC IN/OUT commands.

	Add descriptions for ZBC Host Managed devices.

	Add a new function, scsi_ata_pass() to do ATA passthrough over
	SCSI.  This will eventually replace scsi_ata_pass_16() -- it
	can create the 12, 16, and 32-byte variants of the ATA
	PASS-THROUGH command, and supports setting all of the
	registers defined as of SAT-4, Revision 5 (March 11, 2016).

	Change scsi_ata_identify() to use scsi_ata_pass() instead of
	scsi_ata_pass_16().

	Add a new scsi_ata_read_log() function to facilitate reading
	ATA logs via SCSI.

sys/cam/scsi/scsi_all.h:
	Add the new ATA PASS-THROUGH(32) command CDB.  Add extended and
	variable CDB opcodes.

	Add Zoned Block Device Characteristics VPD page.

	Add ATA Return SCSI sense descriptor.

	Add prototypes for scsi_ata_read_log() and scsi_ata_pass().

sys/cam/scsi/scsi_da.c:
	Revamp the da(4) driver to support zoned devices.

	Add five new probe states, four of which are needed for ATA
	devices.

	Add five new sysctl variables that describe zone support and
	parameters.

	The da(4) driver supports SCSI ZBC devices, as well as ATA ZAC
	devices when they are attached via a SCSI to ATA Translation (SAT)
	layer.  Since ZBC -> ZAC translation is a new feature in the T10
	SAT-4 spec, most SATA drives will be supported via ATA commands
	sent via the SCSI ATA PASS-THROUGH command.  The da(4) driver will
	prefer the ZBC interface, if it is available, for performance
	reasons, but will use the ATA PASS-THROUGH interface to the ZAC
	command set if the SAT layer doesn't support translation yet.
	As I mentioned above, ZBC command support is untested.

	Add support for the new BIO_ZONE bio, and all of its subcommands:
	DISK_ZONE_OPEN, DISK_ZONE_CLOSE, DISK_ZONE_FINISH, DISK_ZONE_RWP,
	DISK_ZONE_REPORT_ZONES, and DISK_ZONE_GET_PARAMS.

	Add scsi_zbc_in() and scsi_zbc_out() CCB building functions.

	Add scsi_ata_zac_mgmt_out() and scsi_ata_zac_mgmt_in() CCB/CDB
	building functions.  Note that these have return values, unlike
	almost all other CCB building functions in CAM.  The reason is
	that they can fail, depending upon the particular combination
	of input parameters.  The primary failure case is if the user
	wants NCQ, but fails to specify additional CDB storage.  NCQ
	requires using the 32-byte version of the SCSI ATA PASS-THROUGH
	command, and the current CAM CDB size is 16 bytes.

sys/cam/scsi/scsi_da.h:
	Add ZBC IN and ZBC OUT CDBs and opcodes.

	Add SCSI Report Zones data structures.

	Add scsi_zbc_in(), scsi_zbc_out(), scsi_ata_zac_mgmt_out(), and
	scsi_ata_zac_mgmt_in() prototypes.

sys/dev/ahci/ahci.c:
	Fix SEND / RECEIVE FPDMA QUEUED in the ahci(4) driver.

	ahci_setup_fis() previously set the top bits of the sector count
	register in the FIS to 0 for FPDMA commands.  This is okay for
	read and write, because the PRIO field is in the only thing in
	those bits, and we don't implement that further up the stack.

	But, for SEND and RECEIVE FPDMA QUEUED, the subcommand is in that
	byte, so it needs to be transmitted to the drive.

	In ahci_setup_fis(), always set the the top 8 bits of the
	sector count register.  We need it in both the standard
	and NCQ / FPDMA cases.

sys/geom/eli/g_eli.c:
	Pass BIO_ZONE commands through the GELI class.

sys/geom/geom.h:
	Add g_io_zonecmd() prototype.

sys/geom/geom_dev.c:
	Add new DIOCZONECMD ioctl, which allows sending zone commands to
	disks.

sys/geom/geom_disk.c:
	Add support for BIO_ZONE commands.

sys/geom/geom_disk.h:
	Add a new flag, DISKFLAG_CANZONE, that indicates that a given
	GEOM disk client can handle BIO_ZONE commands.

sys/geom/geom_io.c:
	Add a new function, g_io_zonecmd(), that handles execution of
	BIO_ZONE commands.

	Add permissions check for BIO_ZONE commands.

	Add command decoding for BIO_ZONE commands.

sys/geom/geom_subr.c:
	Add DDB command decoding for BIO_ZONE commands.

sys/kern/subr_devstat.c:
	Record statistics for REPORT ZONES commands.  Note that the
	number of bytes transferred for REPORT ZONES won't quite match
	what is received from the harware.  This is because we're
	necessarily counting bytes coming from the da(4) / ada(4) drivers,
	which are using the disk_zone.h interface to communicate up
	the stack.  The structure sizes it uses are slightly different
	than the SCSI and ATA structure sizes.

sys/sys/ata.h:
	Add many bit and structure definitions for ZAC, NCQ, and EPC
	command support.

sys/sys/bio.h:
	Convert the bio_cmd field to a straight enumeration.  This will
	yield more space for additional commands in the future.  After
	change r297955 and other related changes, this is now possible.
	Converting to an enumeration will also prevent use as a bitmask
	in the future.

sys/sys/disk.h:
	Define the DIOCZONECMD ioctl.

sys/sys/disk_zone.h:
	Add a new API for managing zoned disks.  This is very close to
	the SCSI ZBC and ATA ZAC standards, but uses integers in native
	byte order instead of big endian (SCSI) or little endian (ATA)
	byte arrays.

	This is intended to offer to the complete feature set of the ZBC
	and ZAC disk management without requiring the application developer
	to include SCSI or ATA headers.  We also use one set of headers
	for ioctl consumers and kernel bio-level consumers.

sys/sys/param.h:
	Bump __FreeBSD_version for sys/bio.h command changes, and inclusion
	of SMR support.

usr.sbin/Makefile:
	Add the zonectl utility.

usr.sbin/diskinfo/diskinfo.c
	Add disk zoning capability to the 'diskinfo -v' output.

usr.sbin/zonectl/Makefile:
	Add zonectl makefile.

usr.sbin/zonectl/zonectl.8
	zonectl(8) man page.

usr.sbin/zonectl/zonectl.c
	The zonectl(8) utility.  This allows managing SCSI or ATA zoned
	disks via the disk_zone.h API.  You can report zones, reset write
	pointers, get parameters, etc.

Sponsored by:	Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D6147
Reviewed by:	wblock (documentation)
2016-05-19 14:08:36 +00:00
glebius
047e60c94a The SA-16:19 wouldn't have happened if the sockargs() had properly typed
argument for length.  While here make it static and convert to ANSI C.

Reviewed by:	C Turt
2016-05-18 22:05:50 +00:00
rpokala
cd3729c2d6 Fix misleading comments in bus_if.m
While looking at r300073, I noticed these incorrect comments in the context
of the diff.

Reviewed by:	imp, jhb
Differential Revision:	https://reviews.freebsd.org/D6431
2016-05-18 16:25:34 +00:00
andrew
5755bcaba9 Return the struct intr_pic pointer from intr_pic_register. This will be
needed in later changes where we may not be able to lock the pic list lock
to perform a lookup, e.g. from within interrupt context.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-05-18 15:05:44 +00:00
kib
54336eef75 Ensure that ftruncate(2) is performed synchronously when file is
opened in O_SYNC mode, at least for UFS.  This also handles
truncation, done due to the O_SYNC | O_TRUNC flags combination to
open(2), in synchronous way.

Noted by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-05-18 12:03:57 +00:00
scottl
596f2a146d Import the 'iflib' API library for network drivers. From the author:
"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional.  The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation.  ixl and ixgbe driver conversions will be committed
shortly.  We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by:   mmacy@nextbsd.org
Reviewed by:    gallatin
Differential Revision:  D5211
2016-05-18 04:35:58 +00:00
markj
e3564c4833 Do not acquire the thread lock in hardclock_cnt() unless needed.
This function only sets thread flags if a SIGPROF or SIGVTALRM timer
has fired, which is almost never the case.

MFC after:	2 weeks
2016-05-18 03:55:54 +00:00
markj
d82dbf711e Micro-optimize sleepq_broadcast().
- Avoid a conditional branch on the return value of sleepq_resume_thread()
  by ORing its return value into the boolean wakeup_swapper. This is
  consistent with other sleepqueue functions which just pass this return
  value to their caller.
- sleepq_resume_thread() unconditionally removes the thread from its queue,
  so there's no need to maintain a pointer to the next element in the queue.

MFC after:	2 weeks
2016-05-18 03:50:21 +00:00
markj
601af01d4d Remove the MUTEX_DEBUG kernel option.
It has no counterpart among the other lock primitives and has been a
no-op for years. Mutex consistency checks are generally done whenver
INVARIANTS is enabled.
2016-05-18 03:34:02 +00:00
markj
706498b426 Guard the lockstat:::thread-spin probe with KDTRACE_HOOKS.
X-MFC-With:	r300103
2016-05-18 03:23:07 +00:00
markj
39c9d57c9c lockstat:::thread-spin should only fire after spinning for the lock.
MFC after:	1 week
2016-05-18 03:21:21 +00:00
glebius
30f7e6f500 Add a comment and KASSERT that a M_NOFREE mbuf has always EXT_EXTREF ext.
Submitted by:	kmacy
2016-05-17 23:15:16 +00:00
imp
f675e6a4ec Don't forget to quote \ characters with \. 2016-05-17 22:52:42 +00:00
glebius
5e838e04e3 Validate that user supplied control message length is not negative.
Submitted by:	C Turt <cturt hardenedbsd.org>
Security:	SA-16:19
Security:	CVE-2016-1887
2016-05-17 22:28:53 +00:00
jhb
eab5a1b232 Document the formatting requirements of location and pnpinfo strings.
devd requires location and pnpinfo strings generated by bus drivers
to be formatted as a list of name=value keypairs.  Non-conforming
bus drivers cause devd to mis-parse device events for these buses.

Note that this documents the desired requirements.  devctl_safe_quote()
doesn't yet escape backslash characters, and devd doesn't handle escaped
characters in quoted values.

Differential Revision:	https://reviews.freebsd.org/D6252
2016-05-17 19:34:07 +00:00
kib
8da898f26c Add implementation of robust mutexes, hopefully close enough to the
intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.

A robust mutex is guaranteed to be cleared by the system upon either
thread or process owner termination while the mutex is held.  The next
mutex locker is then notified about inconsistent mutex state and can
execute (or abandon) corrective actions.

The patch mostly consists of small changes here and there, adding
neccessary checks for the inconsistent and abandoned conditions into
existing paths.  Additionally, the thread exit handler was extended to
iterate over the userspace-maintained list of owned robust mutexes,
unlocking and marking as terminated each of them.

The list of owned robust mutexes cannot be maintained atomically
synchronous with the mutex lock state (it is possible in kernel, but
is too expensive).  Instead, for the duration of lock or unlock
operation, the current mutex is remembered in a special slot that is
also checked by the kernel at thread termination.

Kernel must be aware about the per-thread location of the heads of
robust mutex lists and the current active mutex slot.  When a thread
touches a robust mutex for the first time, a new umtx op syscall is
issued which informs about location of lists heads.

The umtx sleep queues for PP and PI mutexes are split between
non-robust and robust.

Somewhat unrelated changes in the patch:
1. Style.
2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared
   pi mutexes.
3. Removal of the userspace struct pthread_mutex m_owner field.
4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
   the lifetime of the shared mutex associated with a vnode' page.

Reviewed by:	jilles (previous version, supposedly the objection was fixed)
Discussed with:	brooks, Martin Simmons <martin@lispworks.com> (some aspects)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2016-05-17 09:56:22 +00:00
andrew
f1a8e245ae Introduce MSI and MSI-X support to intrng. This adds a new msi device
interface with 5 methods to mirror the 5 MSI/MSI-X methods in the pcib
interface. The pcib driver will need to perform a device specific lookup
to find the MSI controller and pass this to intrng as the xref. Intrng
will finally find the controller and have it handle the requested operation.

Obtained from:	ABT Systems Ltd
MFH:		yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5985
2016-05-16 09:11:40 +00:00
avg
594a717030 vfs_read_dirent: increment ncookies after adding a cookie
It seems that at present vfs_read_dirent() is used only with filesystems
that do not support cookies, so the bug never manifested itself.

MFC after:	1 week
2016-05-16 07:31:11 +00:00
avg
65bf368323 dounmount: do not call mountcheckdirs() for mounts with MNT_IGNORE
This is a bit hackish, but the flag is currently set only for ZFS
snapshots mounted under .zfs.  mountcheckdirs() can change cdir/rdir
references to a covered vnode.  But for the said snapshots the covered
vnode is really ephemeral and it must never be accessed (except
for a few specific cases).

To do:	consider removing mountcheckdirs() entirely

MFC after:	5 days
2016-05-16 07:23:24 +00:00
jhb
bcc5b0c55d Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
trasz
fe66d1db50 Stop hiding errors that result in failure to mount /dev. Otherwise,
missing /dev directory makes one end up with a completely deaf (init
without stdout/stderr) system with no hints on the console, unless
you've booted up with bootverbose.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-12 07:38:10 +00:00
cem
493fc2338c subr_vmem: Fix double-free in error case of vmem_create
If vmem_init() fails, 'vm' is already destroyed and freed.  Don't free it
again.

Reported by:	Coverity
CID:		1042110
Sponsored by:	EMC / Isilon Storage Division
2016-05-11 23:16:11 +00:00
kib
8f11289452 Add vfs_hash_ref(9) function, which finds a vnode by the hash value
and returns it referenced.

The function is similar to vfs_hash_get(9), but unlike the later,
returned vnode is not locked.  This operation cannot be requested with
the vget(9) flags.

Reviewed and tested by:	rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-05-11 06:32:22 +00:00
kib
5938890173 Style: wrap long lines.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-05-11 06:27:00 +00:00
jhb
6bae79f884 Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
an enum for the second parameter that indicates the type of cpuset to
request.  Currently two valus are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
   the device when DEVICE_NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to 'all_cpus'
by default.  The idea is that INTR_CPUS should always return a valid set.

Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).

The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled.  They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.

Compared to the r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
2016-05-09 20:50:21 +00:00
andrew
ce5d941306 Check malloc succeeded in pic_create, with M_NOWAIT it may return NULL.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-05-09 12:24:39 +00:00
mjg
4376e44d3a fd: assert dropped filedesc lock in fdcloseexec 2016-05-08 03:26:12 +00:00
skra
8d92017b7d Set correct size to the size member of struct intr_map_data when
initialized. As the size member is not used at the present,
it did not break anything.
2016-05-06 08:54:00 +00:00
emaste
d0cc83784b Add explicit cast to fix mips and powerpc build after r299090
Sponsored by:	The FreeBSD Foundation
2016-05-05 15:21:33 +00:00
skra
ee35e13013 INTRNG - redefine struct intr_map_data to avoid headers pollution. Each
struct associated with some type defined in enum intr_map_data_type
must have struct intr_map_data on the top of its own definition now.
When such structs are used, correct type and size must be filled in.

There are three such structs defined in sys/intr.h now. Their
definitions should be moved to corresponding headers by follow-up
commits.

While this change was propagated to all INTRNG like PICs,
pic_map_intr() method implementations were corrected on some places.
For this specific method, it's ensured by a caller that the 'data'
argument passed to this method is never NULL. Also, the return error
values were standardized there.
2016-05-05 13:31:19 +00:00
skra
9326629705 Remove superfluous check. The pic_dev member of struct pic
is never NULL on PIC found by pic_lookup().
2016-05-05 13:23:38 +00:00
adrian
689d1f5128 s/struct device */device_t/g
Submitted by:	kmacy
2016-05-04 23:31:52 +00:00
asomers
09b44517ca Improve performance and functionality of the bitstring(3) api
Two new functions are provided, bit_ffs_at() and bit_ffc_at(), which allow
for efficient searching of set or cleared bits starting from any bit offset
within the bit string.

Performance is improved by operating on longs instead of bytes and using
ffsl() for searches within a long. ffsl() is a compiler builtin in both
clang and gcc for most architectures, converting what was a brute force
while loop search into a couple of instructions.

All of the bitstring(3) API continues to be contained in the header file.
Some of the functions are large enough that perhaps they should be uninlined
and moved to a library, but that is beyond the scope of this commit.

sys/sys/bitstring.h:
        Convert the majority of the existing bit string implementation from
        macros to inline functions.

        Properly protect the implementation from inadvertant macro expansion
        when included in a user's program by prefixing all private
        macros/functions and local variables with '_'.

        Add bit_ffs_at() and bit_ffc_at(). Implement bit_ffs() and
        bit_ffc() in terms of their "at" counterparts.

        Provide a kernel implementation of bit_alloc(), making the full API
        usable in the kernel.

        Improve code documenation.

share/man/man3/bitstring.3:
        Add pre-exisiting API bit_ffc() to the synopsis.

        Document new APIs.

        Document the initialization state of the bit strings
        allocated/declared by bit_alloc() and bit_decl().

        Correct documentation for bitstr_size(). The original code comments
        indicate the size is in bytes, not "elements of bitstr_t". The new
        implementation follows this lead. Only hastd assumed "elements"
        rather than bytes and it has been corrected.

etc/mtree/BSD.tests.dist:
tests/sys/Makefile:
tests/sys/sys/Makefile:
tests/sys/sys/bitstring.c:
        Add tests for all existing and new functionality.

include/bitstring.h
	Include all headers needed by sys/bitstring.h

lib/libbluetooth/bluetooth.h:
usr.sbin/bluetooth/hccontrol/le.c:
        Include bitstring.h instead of sys/bitstring.h.

sbin/hastd/activemap.c:
        Correct usage of bitstr_size().

sys/dev/xen/blkback/blkback.c
        Use new bit_alloc.

sys/kern/subr_unit.c:
        Remove hard-coded assumption that sizeof(bitstr_t) is 1.  Get rid of
        unrb.busy, which caches the number of bits set in unrb.map.  When
        INVARIANTS are disabled, nothing needs to know that information.
        callapse_unr can be adapted to use bit_ffs and bit_ffc instead.
        Eliminating unrb.busy saves memory, simplifies the code, and
        provides a slight speedup when INVARIANTS are disabled.

sys/net/flowtable.c:
        Use the new kernel implementation of bit-alloc, instead of hacking
        the old libc-dependent macro.

sys/sys/param.h
        Update __FreeBSD_version to indicate availability of new API

Submitted by:   gibbs, asomers
Reviewed by:    gibbs, ngie
MFC after:      4 weeks
Sponsored by:   Spectra Logic Corp
Differential Revision:  https://reviews.freebsd.org/D6004
2016-05-04 22:34:11 +00:00
royger
9610d15b51 rtc: fix inverted resolution check
The current code in clock_register checks if the newly added clock has a
resolution value higher than the current one in order to make it the
default, which is wrong. Clocks with a lower resolution value should be
better than ones with a higher resolution value, in fact with the current
code FreeBSD is always selecting the worse clock.

Reviewed by:		kib jhb jkim
Sponsored by:		Citrix Systems R&D
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D6185
2016-05-04 13:48:59 +00:00
sephe
24e41b4fb5 kern: Factor out function to convert hash flags to malloc(9) flags
Suggested by:	jhb
Reviewed by:	jhb, kib
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6184
2016-05-04 03:07:52 +00:00
kib
838f7d1304 Add EVFILT_VNODE open, read and close notifications.
While there, order EVFILT_VNODE notes descriptions alphabetically.

Based on submission, and tested by:	Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after:	2 weeks
2016-05-03 15:17:43 +00:00
sephe
50eba76e30 kern: Add phashinit_flags(), which allows malloc(M_NOWAIT)
It will be used for the upcoming LRO hash table initialization.
And probably will be useful in other cases, when M_WAITOK can't
be used.

Reviewed by:	jhb, kib
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6138
2016-05-03 07:17:13 +00:00
jhb
c71e075efb Revert bus_get_cpus() for now.
I really thought I had run this through the tinderbox before committing,
but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.
2016-05-03 01:17:40 +00:00
jhb
2da46e01a0 Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
an enum for the second parameter that indicates the type of cpuset to
request.  Currently two valus are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
   the device when DEVICE_NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to 'all_cpus'
by default.  The idea is that INTR_CPUS should always return a valid set.

Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).

The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled.  They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.

Reviewed by:	wblock (manpage)
Differential Revision:	https://reviews.freebsd.org/D5519
2016-05-02 18:00:38 +00:00
kib
9eb6f0fde4 Issue NOTE_EXTEND when a directory entry is added to or removed from
the monitored directory as the result of rename(2) operation.  The
renames staying in the directory are not reported.

Submitted by:	Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after:	2 weeks
2016-05-02 13:18:17 +00:00
kib
5bc5df1663 Fix reporting of NOTE_LINK when directory link count changes due to
rename removing or adding subdirectory entry.

Discussed with and tested by:	Vladimir Kondratyev <wulf@cicgroup.ru>
NetBSD PR:	48958 (http://gnats.netbsd.org/48958)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2016-05-02 13:13:32 +00:00
pfg
28823d0656 sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
pfg
d791a14b72 sys/kern: spelling fixes.
Mostly on comments but affects some debug messages.

MFC after: 2 weeks
2016-04-29 21:54:28 +00:00
asomers
effae8a6ec Automate the subr_unit test.
Build and install the subr_unit test program originally written by phk, and
run it with the other ATF tests.

tests/sys/kern/Makefile
	* Build and install the subr_unit test as a plain test

sys/kern/subr_unit.c
	* Reduce the default number of repetitions from 100 to 1, and add a
	  command-line parser to override it.
	* Don't be so noisy by default
	* Fix an include problem for the test build

Reviewed by:	ngie
MFC after:	4 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D6038
2016-04-29 21:11:31 +00:00
jhb
1a67ba1558 Expose soaio_enqueue().
This can be used by protocol-specific AIO handlers to queue work to the
socket AIO daemon pool.

Sponsored by:	Chelsio Communications
2016-04-29 20:12:45 +00:00
jhb
ab5f9da057 Introduce a new protocol hook pru_aio_queue.
This allows a protocol to claim individual AIO requests instead of using
the default socket AIO handling.

Sponsored by:	Chelsio Communications
2016-04-29 20:11:09 +00:00
pfg
e84bb9d29d bufs: make B_DIRTY and B_PERSISTENT flags available
It appears these flags were related to ext2fs but are completely
unused nowadays. Retire them.

Suggested by: mckusick
2016-04-29 16:32:28 +00:00
mmel
ee7bd1cecf INTRNG: Define 'INTR_IRQ_INVALID' constant and use it consistently
as error indicator.
2016-04-28 12:04:12 +00:00
mmel
1e1149563f GPIO: Add support for gpio pin interrupts.
Add new function gpio_alloc_intr_resource(), which allows an allocation
of interrupt resource associated to given gpio pin. It also allows to
specify interrupt configuration.

Note: This functionality is dependent on INTRNG, and must be
implemented in each GPIO controller.
2016-04-28 12:03:22 +00:00
jhb
9e4bb0297c Add a bus_null_rescan() method that always fails with an error.
Use this in place of kobj_error_method to disable BUS_RESCAN() on
PCI drivers that do not use the "standard" scanning algorithm.
2016-04-27 17:49:42 +00:00
jhb
eb8279b760 Add 'devctl delete' that calls device_delete_child().
'devctl delete' can be used to delete a device that is no longer present.
As an anti-foot-shooting measure, 'delete' will not delete a device
unless it's parent bus says it is no longer present.  This can be
overridden by passing the force ('-f') flag.

Note that this command should be used with care.  If a device is deleted
that is actually present it can't be resurrected unless the parent bus
device's driver supports rescans.

Differential Revision:	https://reviews.freebsd.org/D6019
2016-04-27 16:33:17 +00:00
jhb
e05c6840a1 Add a new rescan method to the bus interface.
The BUS_RESCAN() method rescans a single bus device checking for devices
that have been added or removed from the bus.  A new 'rescan' command is
added to devctl(8) to trigger a rescan.

Differential Revision:	https://reviews.freebsd.org/D6016
2016-04-27 16:29:03 +00:00
jamie
c39e007779 Delay revmoing the last jail reference in prison_proc_free, and instead
put it off into the pr_task.  This is similar to prison_free, and in fact
uses the same task even though they do something slightly different.

This resolves a LOR between the process lock and allprison_lock, which
came about in r298565.

PR:		48471
2016-04-27 02:25:21 +00:00
cem
555ff4cf7f posix4_mib: Don't overrun facility_initialized array
The facility_initialized and facility arrays are the same size and were
intended to be indexed the same.  I believe this mismatch was just a
typo/braino in r208731.

Reported by:	Coverity
CID:		1017430
Sponsored by:	EMC / Isilon Storage Division
2016-04-27 00:10:32 +00:00
cem
1e1ff456ba subr_mbpool: Don't free bogus pointer in error paths
An mbpool is allocated with a contiguous array of mbpages.  Freeing an
individual mbpage has never been valid.  Don't do it.

This bug has been present since this code was introduced in r117624 (2003).

Reported by:	Coverity
CID:		1009687
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 23:58:55 +00:00
jamie
6043e5f56b Use crcopysafe in jail_attach. 2016-04-26 21:19:12 +00:00
cem
973e983535 osd(9): Change array pointer to array pointer type from void*
This is a minor follow-up to r297422, prompted by a Coverity warning.  (It's
not a real defect, just a code smell.)  OSD slot array reservations are an
array of pointers (void **) but were cast to void* and back unnecessarily.
Keep the correct type from reservation to use.

osd.9 is updated to match, along with a few trivial igor fixes.

Reported by:	Coverity
CID:		1353811
Sponsored by:	EMC / Isilon Storage Division
2016-04-26 19:57:35 +00:00
jamie
bfa6d165c2 Redo the changes to the SYSV IPC sysctl functions from r298585, so they
don't (mis)use sbufs.

PR:		48471
2016-04-26 18:17:44 +00:00
pfg
fc01419148 sys: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
2016-04-26 15:38:17 +00:00
br
a2fb4c593f Add support for RISC-V. 2016-04-26 12:29:47 +00:00
br
778cc5a811 Move arm's devmap to some generic place, so it can be used
by other architectures.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D6091
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
2016-04-26 11:53:37 +00:00
jamie
69e1bbc965 Fix the logic in r298585: shm_prison_cansee returns an errno, so is
the opposite of a boolean.

PR:		48471
2016-04-25 22:30:10 +00:00
jamie
3f9624c2d6 Encapsulate SYSV IPC objects in jails. Define per-module parameters
sysvmsg, sysvsem, and sysvshm, with the following bahavior:

inherit: allow full access to the IPC primitives.  This is the same as
the current setup with allow.sysvipc is on.  Jails and the base system
can see (and moduly) each other's objects, which is generally considered
a bad thing (though may be useful in some circumstances).

disable: all no access, same as the current setup with allow.sysvipc off.

new: A jail may see use the IPC objects that it has created.  It also
gets its own IPC key namespace, so different jails may have their own
objects using the same key value.  The parent jail (or base system) can
see the jail's IPC objects, but not its keys.

PR:		48471
Submitted by:	based on work by kikuchan98@gmail.com
MFC after:	5 days
2016-04-25 17:06:50 +00:00
jamie
ff6105ece0 Use the new PR_METHOD_REMOVE to clean up jail handling in POSIX
message queues.
2016-04-25 04:36:54 +00:00
jamie
f29978c971 Pass the current/new jail to PR_METHOD_CHECK, which pushes the call
until after the jail is found or created.  This requires unlocking the
jail for the call and re-locking it afterward, but that works because
nothing in the jail has been changed yet, and other processes won't
change the important fields as long as allprison_lock remains held.

Keep better track of name vs namelc in kern_jail_set.  Name should
always be the hierarchical name (relative to the caller), and namelc
the last component.

PR:		48471
MFC after:	5 days
2016-04-25 04:27:58 +00:00
jamie
2003e1acf2 Add a new jail OSD method, PR_METHOD_REMOVE. It's called when a jail is
removed from the user perspective, i.e. when the last pr_uref goes away,
even though the jail mail still exist in the dying state.  It will also
be called if either PR_METHOD_CREATE or PR_METHOD_SET fail.

PR:		48471
MFC after:	 5 days
2016-04-25 04:24:00 +00:00
jamie
b67e183366 Remove the PR_REMOVE flag, which was meant as a temporary marker for
a jail that might be seen mid-removal.  It hasn't been doing the right
thing since at least the ability to resurrect dying jails, and such
resurrection also makes it unnecessary.
2016-04-25 03:58:08 +00:00
pfg
729533413f sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
trasz
9411c8c253 Get rid of rctl_lock; use racct_lock where appropriate. The fast paths
already required both of them, so having a separate rctl_lock didn't
buy us anything.

Reviewed by:	mjg@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5914
2016-04-21 16:22:52 +00:00
pfg
fc65edc1cd Remove slightly used const values that can be replaced with nitems().
Suggested by:	jhb
2016-04-21 15:38:28 +00:00