Commit Graph

14327 Commits

Author SHA1 Message Date
Konstantin Belousov
9889bbac23 Mutex memory is not zeroed, add MTX_NEW.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-06 14:09:00 +00:00
Mark Johnston
947401dd50 Move the comment describing namei(9) back to namei()'s definition.
MFC after:	3 days
2015-07-05 22:56:41 +00:00
Mark Johnston
8bbd1f25b1 Remove a stale descriptive comment for gbincore().
The splay trees referenced in the comment were converted to
path-compressed tries in r250551.

MFC after:	3 days
2015-07-05 22:44:41 +00:00
Mark Johnston
5f34e93c58 Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT.
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.

Reviewed by:	kib, mjg
Differential Revision:	https://reviews.freebsd.org/D2937
2015-07-05 22:37:33 +00:00
Mateusz Guzik
f131759f54 fd: make 'rights' a manadatory argument to fget* functions 2015-07-05 19:05:16 +00:00
Mariusz Zaborski
54f98da930 Move the nvlist source and private includes from sys/kern to seperate
directory sys/contrib/libnv.

The goal of this operation is to NOT install header files which shouldn't
be used outside the nvlist library.

Approved by:	pjd (mentor)
2015-07-04 16:33:37 +00:00
Mateusz Guzik
9ca30b0e06 vfs: use shared vnode locking when looking up ".." in vop_stdvptocnp
Briefly discussed with: kib
2015-07-04 15:46:39 +00:00
Mateusz Guzik
dba0bec2bb fd: de-k&r-ify functions + some whitespace fixes
No functional changes.
2015-07-04 15:42:03 +00:00
Mateusz Guzik
ee5f66f820 sysctl: get rid of sysctl_lock/unlock
Inline their contents into the only consumer.
2015-07-04 14:44:39 +00:00
Mateusz Guzik
d5fc115a1a sysctl: remove a debugging printf which crept in with r285125 2015-07-04 07:01:43 +00:00
Mateusz Guzik
b8633775a8 sysctl: switch sysctllock to a sleepable rmlock
The lock is almost never taken for writing.
2015-07-04 06:54:15 +00:00
Mateusz Guzik
e2f5418e73 sysvshm: fix up some whitespace issues and spurious initialisation 2015-07-02 19:14:30 +00:00
Mateusz Guzik
77a26248a3 sysvshm: don't lock proc when calculating attach_va
vm_daddr is constant and RLIMIT_DATA can be obtained from thread's copy of
rlimits.
2015-07-02 19:03:44 +00:00
Mateusz Guzik
0be3a191a4 sysvshm: fix shmrealloc
The code was supposed to initialize new segs in newsegs array, but used the old
pointer.
2015-07-02 19:00:22 +00:00
Konstantin Belousov
1965f86c72 Vnode is not referenced by the vfs_domount() at the point where
asserts are made.  Remove them, since we might dereference freed
memory.  Leaked locks are asserted by the syscall return code anyway.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-02 14:31:47 +00:00
Navdeep Parhar
9523d1bfc3 Fix leak in tcp_lro_rx. Simply clearing M_PKTHDR isn't enough, any tags
hanging off the header need to be freed too.

Differential Revision:	https://reviews.freebsd.org/D2708
Reviewed by:	ae@, hiren@
2015-06-30 17:19:58 +00:00
Mark Murray
d1b06863fb Huge cleanup of random(4) code.
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
  neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
  random(4) device in the kernel, and the KERN_ARND sysctl will
  return nothing. With RANDOM_DUMMY there will be a random(4) that
  always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
  through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
  suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
  This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
  there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
  behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
  (direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
  weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
  weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
  when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
  selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
  that instead. All that is needed here is N=0, N++, N==0, and some
  localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
  now the test will do a basic check of compressibility. Clairvoyant
  talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
  it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
  functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
2015-06-30 17:00:45 +00:00
Konstantin Belousov
6ef120027f Do not calculate the stack's bottom address twice.
Submitted by:	Olivц╘r Pintц╘r
Review:	https://reviews.freebsd.org/D2953
MFC after:	1 week
2015-06-30 15:22:47 +00:00
Mark Murray
6687b6720b Ansify another function. This is the last in the file, I hope. 2015-06-28 10:51:08 +00:00
Mark Murray
7233d3094d ANSIfy the only function that uses K&R definition in this file. 2015-06-28 09:44:58 +00:00
Konstantin Belousov
b2c3df842b Handle errors from background write of the cylinder group blocks.
First, on the write error, bufdone() call from ffs_backgroundwrite()
panics because pbrelvp() cleared bp->b_bufobj, while brelse() would
try to re-dirty the copy of the cg buffer.  Handle this by setting
B_INVAL for the case of BIO_ERROR.

Second, we must re-dirty the real buffer containing the cylinder group
block data when background write failed.  Real cg buffer was already
marked clean in ffs_bufwrite(). After the BV_BKGRDINPROG flag is
cleared on the real cg buffer in ffs_backgroundwrite(), buffer scan
may reuse the buffer at any moment. The result is lost write, and if
the write error was only transient, we get corrupted bitmaps.

We cannot re-dirty the original cg buffer in the
ffs_backgroundwritedone(), since the context is not sleepable,
preventing us from sleeping for origbp' lock.  Add BV_BKGDERR flag
(protected by the buffer object lock), which is converted into delayed
write by brelse(), bqrelse() and buffer scan.

In collaboration with:	Conrad Meyer <cse.cem@gmail.com>
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation (kib),
	  EMC/Isilon storage division (Conrad)
MFC after:	2 weeks
2015-06-27 09:44:14 +00:00
Adrian Chadd
5bbb2169d2 Un-static cpuset_which() - it's useful in other contexts, such as some
CPU set operations in my upcoming NUMA work.

Tested/compiled:

* i386 (run)
* amd64 (run)
* mips (run)
* mips64 (run)
* armv6 (built)

Sponsored by:	Norse Corp, Inc.
2015-06-26 04:14:05 +00:00
Mateusz Guzik
7150ce743a rlimit: deduplicate code in chg* functions 2015-06-25 00:15:37 +00:00
Sean Bruno
4e83b32a80 At the suggestion of jhb, replace atomic_set/clear calls with use of
exclusive locks in the enable/disable interpreter path.

Tested with WITNESS/INVARIANTS on and off.

Reviewed by:	sson davide
2015-06-24 15:52:26 +00:00
John-Mark Gurney
1977bd233a zero this struct as it depends upon it...
Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D2890
2015-06-23 18:40:20 +00:00
Konstantin Belousov
b05c401ff6 Only take previous buffer queue lock (olock) when needed for REMFREE
in binsfree().

Submitted by:	Conrad Meyer
Sponsored by:	EMC / Isilon Storage Division
Review:	https://reviews.freebsd.org/D2882
MFC after:	1 week
2015-06-23 06:12:14 +00:00
Sean Bruno
945afa7c25 Make imgact_binmisc_exec() static.
Submitted by:	kib
Reviewed by:	sson
2015-06-22 17:04:24 +00:00
Sean Bruno
602ec83516 Remove uneeded NULL check since malloc the malloc is now M_WAITOK
Submitted by:	mjg
2015-06-19 20:35:17 +00:00
Sean Bruno
e0ae213f63 Must have one of either M_WAITOK or M_NOWAIT, read the man page bruno.
Submitted by:	mjg
2015-06-19 19:57:39 +00:00
Sean Bruno
a7647ec444 Feedback from commit r284535
davide:  imgact_binmisc_clear_entry() needs to use atomic ops to remove
the enable bit.

kib:  M_NOWAIT is not warranted and comment is invalid.
2015-06-19 18:57:36 +00:00
Sean Bruno
5f98711d51 This change replaces the mutex with a sx lock for the interpreter list to
avoid the problem of holding a non-sleep lock during a page fault as
reported by witness. It also uses atomics where possible to avoid having
to acquire the exclusive lock. In addition, it consistently uses
memset()/memcpy() instead of bzero()/bcopy().

Differential Revision:	https://reviews.freebsd.org/D1971
Submitted by:	sson
Reviewed by:	jhb
2015-06-18 02:04:20 +00:00
Bjoern A. Zeeb
af10bf055f Initialise pr_enforce_statfs from the "default" sysctl value and
not from the compile time constant.  The sysctl value is seeded
from the compile time constant.

MFC after:	2 weeks
2015-06-17 13:15:54 +00:00
Konstantin Belousov
1eabd96728 vfs_msync(), called from syncer vnode fsync VOP, only iterates over
the active vnode list for the given mount point, with the assumption
that vnodes with dirty pages are active.  This is enforced by
vinactive() doing vm_object_page_clean() pass over the vnode pages.

The issue is, if vinactive() cannot be called during vput() due to the
vnode being only shared-locked, we might end up with the dirty pages
for the vnode on the free list.  Such vnode is invisible to syncer,
and pages are only cleaned on the vnode reactivation.  In other words,
the race results in the broken guarantee that user data, written
through the mmap(2), is written to the disk not later than in 30
seconds after the write.

Fix this by keeping the vnode which is freed but still owing
inactivation, on the active list.  When syncer loops find such vnode,
it is deactivated and cleaned by the final vput() call.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-06-17 04:46:58 +00:00
Pedro F. Giffuni
708694f704 Use nitems() macro instead of __arraycount() 2015-06-16 20:19:00 +00:00
Mateusz Guzik
4da8456f0a Replace struct filedesc argument in getvnode with struct thread
This is is a step towards removal of spurious arguments.
2015-06-16 13:09:18 +00:00
Mateusz Guzik
9ef8328d52 fd: make rights a mandatory argument to fget_unlocked 2015-06-16 09:52:36 +00:00
Mateusz Guzik
80f3623f2f fd: don't unnecessary copy capabilities in _fget 2015-06-16 09:08:30 +00:00
Mateusz Guzik
cedab3c72c fd: reduce excessive zeroing on fd close
fde_file as NULL is already an indicator of an unused fd. All other
fields are populated when fp is installed.
2015-06-14 14:10:05 +00:00
Mateusz Guzik
ea31808c3b fd: move out actual fp installation to _finstall
Use it in fd passing functions as the first step towards fd code cleanup.
2015-06-14 14:08:52 +00:00
Jeremie Le Hen
3768a5dfb5 nit: Rename racct_alloc_resource to racct_adjust_resource.
This is more accurate as the amount can be negative.

MFC after:	2 weeks
2015-06-14 08:33:14 +00:00
Gleb Smirnoff
093c7f396d Make KPI of vm_pager_get_pages() more strict: if a pager changes a page
in the requested array, then it is responsible for disposition of previous
page and is responsible for updating the entry in the requested array.
Now consumers of KPI do not need to re-lookup the pages after call to
vm_pager_get_pages().

Reviewed by:	kib
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-06-12 11:32:20 +00:00
Andriy Gapon
076dd8eb2e several lockstat improvements
0. For spin events report time spent spinning, not a loop count.
While loop count is much easier and cheaper to obtain it is hard
to reason about the reported numbers, espcially for adaptive locks
where both spinning and sleeping can happen.
So, it's better to compare apples and apples.

1. Teach lockstat about FreeBSD rw locks.
This is done in part by changing the corresponding probes
and in part by changing what probes lockstat should expect.

2. Teach lockstat that rw locks are adaptive and can spin on FreeBSD.

3. Report lock acquisition events for successful rw try-lock operations.

4. Teach lockstat about FreeBSD sx locks.
Reporting of events for those locks completely mirrors
rw locks.

5. Report spin and block events before acquisition event.
This is behavior documented for the upstream, so it makes sense to stick
to it.  Note that because of FreeBSD adaptive lock implementations
both the spin and block events may be reported for the same acquisition
while the upstream reports only one of them.

Differential Revision:	https://reviews.freebsd.org/D2727
Reviewed by:	markj
MFC after:	17 days
Relnotes:	yes
Sponsored by:	ClusterHQ
2015-06-12 10:01:24 +00:00
Mateusz Guzik
3331a33a42 ussreq: use saved fdp pointer insted of td->td_proc->p_fd
No functional changes.
2015-06-12 06:28:22 +00:00
Konstantin Belousov
529c97886b Tweaks for r284178:
Do not include machine/atomic.h explicitely, the header is already included
by sys/systm.h.

Force inlining of tc_getgen() and tc_setgen().  The functions are used
more than once, which causes compilers with non-aggressive inlining
policies to generate calls.

Suggested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-06-11 04:41:54 +00:00
Mateusz Guzik
21de5aea6c Fixup the build after r284215.
Submitted by:	Ivan Klymenko <fidaj ukr.net> [slighly modified]
2015-06-10 12:39:01 +00:00
Mateusz Guzik
f6f6d24062 Implement lockless resource limits.
Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
2015-06-10 10:48:12 +00:00
Mateusz Guzik
4ea6a9a28f Generalised support for copy-on-write structures shared by threads.
Thread credentials are maintained as follows: each thread has a pointer to
creds and a reference on them. The pointer is compared with proc's creds on
userspace<->kernel boundary and updated if needed.

This patch introduces a counter which can be compared instead, so that more
structures can use this scheme without adding more comparisons on the boundary.
2015-06-10 10:43:59 +00:00
Mateusz Guzik
3b3eb22ab6 fd: remove fdesc_mtx 2015-06-10 09:40:07 +00:00
Mateusz Guzik
153cc61b54 fd: use atomics to manage fd_refcnt and fd_holcnt
This gets rid of fdesc_mtx.
2015-06-10 09:34:50 +00:00
Kenneth D. Merry
5672fac935 Add support for reading MAM attributes to camcontrol(8) and libcam(3).
MAM is Medium Auxiliary Memory and is most commonly found as flash
chips on tapes.

This includes support for reading attributes and decoding most
known attributes, but does not yet include support for writing
attributes or reporting attributes in XML format.

libsbuf/Makefile:
	Add subr_prf.c for the new sbuf_hexdump() function.  This
	function is essentially the same function.

libsbuf/Symbol.map:
	Add a new shared library minor version, and include the
	sbuf_hexdump() function.

libsbuf/Version.def:
	Add version 1.4 of the libsbuf library.

libutil/hexdump.3:
	Document sbuf_hexdump() alongside hexdump(3), since it is
	essentially the same function.

camcontrol/Makefile:
	Add attrib.c.

camcontrol/attrib.c:
	Implementation of READ ATTRIBUTE support for camcontrol(8).

camcontrol/camcontrol.8:
	Document the new 'camcontrol attrib' subcommand.

camcontrol/camcontrol.c:
	Add the new 'camcontrol attrib' subcommand.

camcontrol/camcontrol.h:
	Add a function prototype for scsiattrib().

share/man/man9/sbuf.9:
	Document the existence of sbuf_hexdump() and point users to
	the hexdump(3) man page for more details.

sys/cam/scsi/scsi_all.c:
	Add a table of known attributes, text descriptions and
	handler functions.

	Add a new scsi_attrib_sbuf() function along with a number
	of other related functions that help decode attributes.

	scsi_attrib_ascii_sbuf() decodes ASCII format attributes.

	scsi_attrib_int_sbuf() decodes binary format attributes, and
	will pass them off to scsi_attrib_hexdump_sbuf() if they're
	bigger than 8 bytes.

	scsi_attrib_vendser_sbuf() decodes the vendor and drive
	serial number attribute.

	scsi_attrib_volcoh_sbuf() decodes the Volume Coherency
	Information attribute that LTFS writes out.

sys/cam/scsi/scsi_all.h:
	Add a number of attribute-related structure definitions and
	other defines.

	Add function prototypes for all of the functions added in
	scsi_all.c.

sys/kern/subr_prf.c:
	Add a new function, sbuf_hexdump().  This is the same as
	the existing hexdump(9) function, except that it puts the
	result in an sbuf.

	This also changes subr_prf.c so that it can be compiled in
	userland for includsion in libsbuf.

	We should work to change this so that the kernel hexdump
	implementation is a wrapper around sbuf_hexdump() with a
	statically allocated sbuf with a drain.  That will require
	a drain function that goes to the kernel printf() buffer
	that can take a non-NUL terminated string as input.
	That is because an sbuf isn't NUL-terminated until it is
	finished, and we don't want to finish it while we're still
	using it.

	We should also work to consolidate the userland hexdump and
	kernel hexdump implemenatations, which are currently
	separate.  This would also mean making applications that
	currently link in libutil link in libsbuf.

sys/sys/sbuf.h:
	Add the prototype for sbuf_hexdump(), and add another copy
	of the hexdump flag values if they aren't already defined.

	Ideally the flags should be defined in one place but the
	implemenation makes it difficult to do properly.  (See
	above.)

Sponsored by:	Spectra Logic Corporation
MFC after:	1 week
2015-06-09 21:39:38 +00:00