13494 Commits

Author SHA1 Message Date
Mark Johnston
7b77e1fe0f Specify SDT probe argument types in the probe definition itself rather than
using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API
to allow probes with dynamically-translated types.

There is no functional change.

MFC after:	2 weeks
2013-08-15 04:08:55 +00:00
Mark Johnston
12ede07ab8 Use kld_{load,unload} instead of mod_{load,unload} for the linker file load
and unload event handlers added in r254266.

Reported by:	jhb
X-MFC with:	r254266
2013-08-14 00:42:21 +00:00
Jeff Roberson
99de9af2a6 - Disable quantum caches on the kmem_arena. This can make fragmentation
worse on small KVA systems.  I had intended to only enable it for
   debugging.

Sponsored by:	EMC / Isilon Storage Division
2013-08-13 22:41:24 +00:00
Jeff Roberson
8441d1e842 - Add a statically allocated memguard arena since it is needed very early
on.
 - Pass the appropriate flags to vmem_xalloc() when allocating space for
   the arena from kmem_arena.

Sponsored by:	EMC / Isilon Storage Division
2013-08-13 22:40:43 +00:00
John Baldwin
e05bf4cf95 Some small cleanups to the fixes in r180340:
- Set NOTE_TRACKERR before running filt_proc().  If the knote did not
  have NOTE_FORK set in fflags when registered, then the TRACKERR event
  could miss being posted.
- Don't pass the pid in to filt_proc() for NOTE_FORK events.  The special
  handling for pids is done knote_fork() directly and no longer in
  filt_proc().

MFC after:	2 weeks
2013-08-13 18:45:58 +00:00
Gleb Smirnoff
90c35c1939 - Minor style(9) fix.
- Bring a comment up to date.
2013-08-13 13:40:31 +00:00
Mark Johnston
8776669b53 FreeBSD's DTrace implementation has a few problems with respect to handling
probes declared in a kernel module when that module is unloaded. In
particular,

* Unloading a module with active SDT probes will cause a panic. [1]
* A module's (FBT/SDT) probes aren't destroyed when the module is unloaded;
  trying to use them after the fact will generally cause a panic.

This change fixes both problems by porting the DTrace module load/unload
handlers from illumos and registering them with the corresponding
EVENTHANDLER(9) handlers. This allows the DTrace framework to destroy all
probes defined in a module when that module is unloaded, and to prevent a
module unload from proceeding if some of its probes are active. The latter
problem has already been fixed for FBT probes by checking lf->nenabled in
kern_kldunload(), but moving the check into the DTrace framework generalizes
it to all kernel providers and also fixes a race in the current
implementation (since a probe may be activated between the check and the
call to linker_file_unload()).

Additionally, the SDT implementation has been reworked to define SDT
providers/probes/argtypes in linker sets rather than using SYSINIT/SYSUNINIT
to create and destroy SDT probes when a module is loaded or unloaded. This
simplifies things quite a bit since it means that pretty much all of the SDT
code can live in sdt.ko, and since it becomes easier to integrate SDT with
the DTrace framework. Furthermore, this allows FreeBSD to be quite flexible
in that SDT providers spanning multiple modules can be created on the fly
when a module is loaded; at the moment it looks like illumos' SDT
implementation requires all SDT probes to be statically defined in a single
kernel table.

PR:		166927, 166926, 166928
Reported by:	davide [1]
Reviewed by:	avg, trociny (earlier version)
MFC after:	1 month
2013-08-13 03:10:39 +00:00
Mark Johnston
9c6139e411 Remove some unused fields from struct linker_file. They were added in
r172862 for use by the DTrace SDT framework but don't seem to have ever
been used.

MFC after:	2 weeks
2013-08-13 03:09:00 +00:00
Mark Johnston
c9b645b50b Add event handlers for module load and unload events. The load handlers are
called after the module has been loaded, and the unload handlers are called
before the module is unloaded. Moreover, the module unload handlers may
return an error to prevent the unload from proceeding.

Reviewed by:	avg
MFC after:	2 weeks
2013-08-13 03:07:49 +00:00
Konstantin Belousov
2f7c18600c The r254167 moved initialization of the sleepqueues before the witness
is operational.  init_sleepqueues() initializes 256 mutexes, which,
due to witness still being cold, started to overflow the pending_locks
array.

As stated in the reported panic message, increase WITNESS_PENDLIST
from 768 to 1024, which provides space for additional 256 locks.

Reported by:	many
Tested by:	rakuco, bdrewery
2013-08-10 21:42:14 +00:00
Olivier Houchard
662423aeaf Don't call sleepinit() from proc0_init(), make it a SYSINIT instead.
vmem needs the sleepq locks to be initialized when free'ing kva, so we want it
called as early as possible.
2013-08-09 23:13:52 +00:00
Olivier Houchard
e137643ef3 Instead of just trying to do it for arm, make sure vm_kmem_size is properly
aligned in kmeminit(), where it'll work for any arch.

Suggested by:	alc
2013-08-09 22:30:54 +00:00
Attilio Rao
e946b94934 On all the architectures, avoid to preallocate the physical memory
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid to pre-allocate
the KVA for such nodes.

In order to do so make the operations derived from vm_radix_insert()
to fail and handle all the deriving failure of those.

vm_radix-wise introduce a new function called vm_radix_replace(),
which can replace a leaf node, already present, with a new one,
and take into account the possibility, during vm_radix_insert()
allocation, that the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed
vm_radix_insert() will start from scratch again.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc (older version)
Reviewed by:	jeff
Tested by:	pho, scottl
2013-08-09 11:28:55 +00:00
Attilio Rao
ac6b769be9 Give mutex(9) the ability to recurse on a per-instance basis.
Now the MTX_RECURSE flag can be passed to the mtx_*_flag() calls.
This helps in cases we want to narrow down to specific calls the
possibility to recurse for some locks.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff, alc
Tested by:	pho
2013-08-09 11:24:29 +00:00
Attilio Rao
c7aebda8a1 The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
  and vm_page_grab are being executed.  This will be very helpful
  once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff, kib
Tested by:	gavin, bapt (older version)
Tested by:	pho, scottl
2013-08-09 11:11:11 +00:00
Edward Tomasz Napierala
8ddc3590cc Don't dereference null pointer should acl_alloc() be passed M_NOWAIT
and allocation failed.  Nothing in the tree passed M_NOWAIT.

Obtained from:	mjg
MFC after:	1 month
2013-08-09 08:40:31 +00:00
Scott Long
f510415d84 Add a helpful message that can help point to why a sysctl tree removal failed
Obtained from:	Netflix
MFC after:	3 days
2013-08-09 01:04:44 +00:00
Ryan Stone
08a42caa50 Allow drivers to return BUS_PROBE_NOWILDCARD from their attach routine to
match devices where the driver class was fixed but the unit number was
wildcarded.  This better matches the documented behaviour in
DEVICE_PROBE(9).

Reviewed by:	imp
2013-08-08 19:30:49 +00:00
John Baldwin
5b596f0f5f Don't emit a spurious EVFILT_PROC event with no fflags set on process exit
if NOTE_EXIT is not being monitored.  The rationale is that a listener
should only get an event for exit() if they registered interest via
NOTE_EXIT.  This matches the behavior on OS X.
- Don't save the exit status on process exit unless NOTE_EXIT is being
  monitored.
- Add an internal EV_DROP flag that requests kqueue_scan() to free the
  knote without signalling it to userland and use this when a process
  exits but the fflags in the knote is zero.

Reviewed by:	jmg
MFC after:	1 month
2013-08-07 19:56:35 +00:00
Kevin Lo
3de1bd9502 Remove unsigned comparison < 0
Found by:	LLVM
Reviewed by:	luigi
2013-08-07 07:22:56 +00:00
Jeff Roberson
5df87b21d3 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
Konstantin Belousov
456597e7bd Do not override the ENOENT error for the empty path, or EFAULT errors
from copyins, with the relative lookup check.

Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-08-05 19:42:03 +00:00
Attilio Rao
be99683637 Revert r253939:
We cannot busy a page before doing pagefaults.
Infact, it can deadlock against vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.

Before this patch is reinserted we need to break this ordering.

Sponsored by:	EMC / Isilon storage division
Reported by:	kib
2013-08-05 08:55:35 +00:00
Attilio Rao
3b6714cacb The page hold mechanism is fast but it has couple of fallouts:
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues of few pages

Try to avoid it as much as possible with the long-term target to
completely remove it.
Use the soft-busy mechanism to protect page content accesses during
short-term operations (like uiomove_fromphys()).

After this change only vm_fault_quick_hold_pages() is still using the
hold mechanism for page content access.
There is an additional complexity there as the quick path cannot
immediately access the page object to busy the page and the slow path
cannot however busy more than one page a time (to avoid deadlocks).

Fixing such primitive can bring to complete removal of the page hold
mechanism.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff
Tested by:	pho
2013-08-04 21:07:24 +00:00
Attilio Rao
878a788734 Remove unnecessary soft busy of the page before to do vn_rdwr() in
kern_sendfile() which is unnecessary.
The page is already wired so it will not be subjected to pagefault.
The content cannot be effectively protected as it is full of races
already.
Multiple accesses to the same indexes are serialized through vn_rdwr().

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc, jeff
Tested by:	pho
2013-08-04 15:56:19 +00:00
Marcel Moolenaar
90aa031bf1 Add a tunable for the default timeout. 2013-08-03 04:25:25 +00:00
Gleb Smirnoff
977c7043eb Remove extra zeroing after M_ZERO allocation. 2013-08-02 13:06:49 +00:00
Konstantin Belousov
1f3ad93be7 Remove unused malloc type.
Requested by:	alc
MFC after:	1 week
2013-08-01 12:55:41 +00:00
Ian Lepore
6cbd933b37 Changes to allow using BOOTP_NFSROOT and mounting an nfs root filesystem
other than the one specified by the BOOTP server.  This configures NFS
using the BOOTP protocol while also respecting other root-path options such
as setting vfs.root.mountfrom in the environment or using the RB_DFLTROOT
boot option.  It allows you to override the root path provided by the
server, or to supply a root path when the server provides IP configuration
but no root path info.

This maintains the historical BOOTP_NFSROOT behavior of panicking on a
failure to mount the root path provided by the server, unless you've
provided an alternative via the ROOTDEVNAME kernel option or by setting
vfs.root.mountfrom.  The behavior of panicking when given no other options
is preserved because it amounts to a bit of a retry loop that could
eventually recover from a transient network or server problem.

The user can now override the root path from loader(8) even if the
kernel is compiled with BOOTP_NFSROOT.  If vfs.root.mountfrom is set in
the environment it is used unconditionally -- it always overrides the
BOOTP info.  If it begins with [old]nfs: then the BOOTP code uses it
instead of the server-provided info.  If it specifies some other
filesystem then the bootp code will not panic like it used to and the code
in vfs_mountroot.c will invoke the right filesystem to do the mount.

If the kernel is compiled with the ROOTDEVNAME option, then that name is
used by the BOOTP code if either
      * The server doesn't provide a pathname.
      * The boothowto flags include RB_DFLTROOT.
The latter allows the user to compile in alternate path in ROOTDEVNAME
such as ufs:/dev/da0s1a and boot from that path by setting
boot_dftlroot=1 in loader(8) or using the '-r' option in boot(8).

The one thing not provided here is automatic failover from a
server-provided path to a compiled-in one without the user manually
requesting that.  The code just isn't currently structured in a way that
makes that possible with a lot of rewrite.  I think the ability to set
vfs.root.mountfrom and to use ROOTDEVNAME automatically when the server
doesn't provide a name covers the most common needs.

A set of patches submitted by Lars Eggert provided the part I couldn't
figure out by myself when I tried to do this last year; many thanks.

Reviewed by:	rodrigc
2013-07-31 19:14:00 +00:00
Scott Long
fcd9ff2c67 Another fix for r253823; retain the default of 1 readahead block for sendfile.
Submitted by:	glebius
Obtained from:	Netflix
MFC after:	3 days
2013-07-31 15:55:01 +00:00
Scott Long
de925dd31f Fix r253823. Some WIP patches snuck in.
Submitted by:	zont
2013-07-30 23:50:09 +00:00
Scott Long
fc4a5f052b Create a knob, kern.ipc.sfreadahead, that allows one to tune the amount of
readahead that sendfile() will do.  Default remains the same.

Obtained from:	Netflix
MFC after:	3 days
2013-07-30 23:26:05 +00:00
Konstantin Belousov
2c38cc792e When creation of the v_pollinfo raced and our instance of vpollinfo
must be destroyed, knlist_clear() and seldrain() calls could be
avoided, since vpollinfo was not used.  More, the knlist_clear()
calling protocol requires the knlist locked, which is not true at the
call site.

Split the destruction into the helper destroy_vpollinfo_free(), and
call it when raced, instead of destroy_vpollinfo().

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:   3 days
2013-07-28 06:59:29 +00:00
John Baldwin
5e3a17c0b9 Use VMFS_OPTIMAL_SPACE instead of VMFS_ALIGNED_SPACE in shm_map(). 2013-07-24 20:34:25 +00:00
Marcel Moolenaar
ba7f4cdc97 Further restrict the MAC addresses that we use for UUID generation
to those that are universally administered. While it is possible to
add locally administered MAC addresses, it's unclear whether those
are (expected) to be more unique than random multicast MAC addresses
or not.

With many U-Boot configurations assigning fixed and non-official MAC
addresses to ethernet ports and without setting the 'X' flag, this
change may have very little value in the embedded (development)
space. Uniqueness of the universally administered addresses is non-
existent on the (H/W) bench and questionable under the (S/W) desk.
In short: this change is aimed at production environments...
2013-07-24 18:13:43 +00:00
Marcel Moolenaar
8ff6ca1e08 In uuid_ether_add(), avoid false positives due to the limited type
used to hold the sum of the bytes of the MAC address. While here,
rename the variable that holds the sum from 'c' to 'sum'.

Pointed out by: thompsa@
2013-07-24 16:22:27 +00:00
Andriy Gapon
785797c341 rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST
Also directly call swapper() at the end of mi_startup instead of
relying on swapper being the last thing in sysinits order.

Rationale:

- "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage
- "scheduler" was misleading, the function swaps in the swapped out processes
- another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be
  invoked depending on its relative order with scheduler; this was not obvious
  and the bug actually used to exist

Reviewed by:	kib (ealier version)
MFC after:	14 days
2013-07-24 09:45:31 +00:00
Gleb Smirnoff
9e3cc17647 Remove unused argument from vmem_add1().
Reviewed by:	jeff
2013-07-24 08:02:56 +00:00
Marcel Moolenaar
ef1f916971 Decouple the UUID generator from network interfaces by having MAC
addresses added to the UUID generator using uuid_ether_add(). The
UUID generator keeps an arbitrary number of MAC addresses, under
the assumption that they are rarely removed (= uuid_ether_del()).
This achieves the following:
1.  It brings up closer to having the network stack as a loadable
    module.
2.  It allows the UUID generator to filter MAC addresses for best
    results (= highest chance of uniqeness).
3.  MAC addresses can come from anywhere, irrespactive of whether
    it's used for an interface or not.

A side-effect of the change is that when no MAC addresses have been
added, a random multicast MAC address is created once and re-used if
needed. Previusly, when a random MAC address was needed, it was
created for every call. Thus, a change in behaviour is introduced
for when no MAC addresses exist.

Obtained from:	Juniper Networks, Inc.
2013-07-24 04:24:21 +00:00
Gleb Smirnoff
e28a647db6 Revert r249590 and in case if mp_ncpus isn't initialized use MAXCPU. This
allows us to init counter zone at early stage of boot.

Reviewed by:	kib
Tested by:	Lytochkin Boris <lytboris gmail.com>
2013-07-23 11:16:40 +00:00
Mateusz Guzik
b1051d92fe Remove cr_prison NULL check from proc_to_reap.
Userspace processes always have a prison.

MFC after:	2 weeks
2013-07-22 02:07:15 +00:00
Mateusz Guzik
462314b3f9 Remove duplicate assertion from tdsendsignal.
MFC after:	2 weeks
2013-07-22 00:44:37 +00:00
Konstantin Belousov
643ee87175 Implement compat32 wrappers for the ktimer_* syscalls.
Reported, reviewed and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:43:52 +00:00
Konstantin Belousov
058262c69a Wrap kmq_notify(2) for compat32 to properly consume struct sigevent32
argument.

Reviewed and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:40:30 +00:00
Konstantin Belousov
9731998997 Move the convert_sigevent32() utility function into freebsd32_misc.c
for consumption outside the vfs_aio.c.

For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods,
also copy in the sigev_value, since librt event pumping loop compares
note generation number with the value passed through sigev_value.

Tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-21 19:33:48 +00:00
Konstantin Belousov
d31e4b3a58 id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2).
Reported and tested by:	Petr Salinger <Petr.Salinger@seznam.cz>
PR:	threads/180652
Sponsored by:	The FreeBSD Foundation
2013-07-20 13:39:41 +00:00
John Baldwin
ff74a3fa6b Be more aggressive in using superpages in all mappings of objects:
- Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for
  vm_map_find() that will try to alter the alignment of a mapping to match
  any existing superpage mappings of the object being mapped.  If no
  suitable address range is found with the necessary alignment,
  vm_map_find() will fall back to using the simple first-fit strategy
  (VMFS_ANY_SPACE).
- Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to
  use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.

Reviewed by:	alc (earlier version)
MFC after:	2 weeks
2013-07-19 19:06:15 +00:00
Konstantin Belousov
a8b0523ae7 Clear the vnode knotes before destroying vpollinfo.
Reported and tested by:	Patrick Lamaiziere <patfbsd@davenulle.org>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2013-07-17 10:56:21 +00:00
Gleb Smirnoff
59b9c4f289 Nuke mbstat. It wasn't used for mbuf statistics since FreeBSD 5.
Now that r253351 moved sendfile() stats to a separate struct, the
last field used in mbstat is m_mcfail, which is updated, but never
read or obtained from userland.
2013-07-15 12:18:36 +00:00
Andrey V. Elsukov
05d1f5bce0 Introduce new structure sfstat for collecting sendfile's statistics
and remove corresponding fields from struct mbstat. Use PCPU counters
and SFSTAT_INC() macro for update these statistics.

Discussed with:	glebius
2013-07-15 06:16:57 +00:00