Commit Graph

201 Commits

Author SHA1 Message Date
Mark Murray
10cb24248a This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random.
This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources.

The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people.

The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway.

Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to.

My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise.

My Nomex pants are on. Let the feedback commence!

Reviewed by:	trasz,des(partial),imp(partial?),rwatson(partial?)
Approved by:	so(des)
2014-10-30 21:21:53 +00:00
Adrian Chadd
3fe93b946f Convert a missed u_char cpu -> int cpu.
This was caught by a gcc build.

Reported by:	luigi
Sponsored by:	Norse Corp, Inc.
2014-10-19 04:38:02 +00:00
Konstantin Belousov
b76278407d Add kernel option KSTACK_USAGE_PROF to sample the stack depth on
interrupts and report the largest value seen as sysctl
debug.max_kstack_used.  Useful to estimate how close the kernel stack
size is to overflow.

In collaboration with:	Larry Baird <lab@gta.com>
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2014-10-04 18:38:14 +00:00
Adrian Chadd
066da8050b Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather
than u_char.

Migrate post_filter to use an int for a CPU rather than u_char.

Change intr_event_bind() to use an int for CPU rather than u_char.

It touches the ppc, sparc64, arm and mips machdep code but it should
(hah!) be a no-op.

Tested:

* i386, AMD64 laptops

Reviewed by:	jhb
2014-09-17 17:33:22 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Alexander V. Chernikov
811985398d Permit changing cpu mask for cpu set 1 in presence of drivers
binding their threads to particular CPU.

Changing ithread cpu mask is now performed by special cpuset_setithread().
It creates additional cpuset root group on first bind invocation.

No objection:	jhb
Tested by:	hiren
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-06-22 11:32:23 +00:00
Mark Murray
f02e47dc1e Snapshot. This passes the build test, but has not yet been finished or debugged.
Contains:

* Refactor the hardware RNG CPU instruction sources to feed into
the software mixer. This is unfinished. The actual harvesting needs
to be sorted out. Modified by me (see below).

* Remove 'frac' parameter from random_harvest(). This was never
used and adds extra code for no good reason.

* Remove device write entropy harvesting. This provided a weak
attack vector, was not very good at bootstrapping the device. To
follow will be a replacement explicit reseed knob.

* Separate out all the RANDOM_PURE sources into separate harvest
entities. This adds some secuity in the case where more than one
is present.

* Review all the code and fix anything obviously messy or inconsistent.
Address som review concerns while I'm here, like rename the pseudo-rng
to 'dummy'.

Submitted by:	Arthur Mesh <arthurmesh@gmail.com> (the first item)
2013-10-04 06:55:06 +00:00
Mark Murray
c495c93567 Snapshot; Do some running repairs on entropy harvesting. More needs to follow. 2013-08-26 18:35:21 +00:00
Alfred Perlstein
3eebd44d0c The change in r236456 (atomic_store_rel not locked) exposed a bug
in the ithread code where we could lose ithread interrupts if
intr_event_schedule_thread() is called while the ithread is already
running.  Effectively memory writes could be ordered incorrectly
such that the unatomic write of 0 to ithd->it_need (in ithread_loop)
could be delayed until after it was set to be triggered again by
intr_event_schedule_thread().

This was particularly a big problem for CAM because CAM optimizes
scheduling of ithreads such that it only signals camisr() when it
queues to an empty queue.  This means that additional completion
events would not unstick CAM and quickly lead to a complete lockup
of the CAM stack.

To fix this use atomics in all places where ithd->it_need is accessed.

Submitted by: delphij, mav
Obtained from: TrueOS, iXsystems
MFC After: 1 week
2013-07-04 05:53:05 +00:00
Konstantin Belousov
b887a1555c If filter of the interrupt event is not null, print it, in addition to
the handler address.  Add a mark to distinguish between filter and
handler.

Note that the arguments for both filter and handler are same.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	jhb
MFC after:	1 week
2013-04-05 14:30:51 +00:00
Davide Italiano
dbd2e1677f MFcalloutng (r244355):
Make loadavg calculation callout direct. There are several reasons for it:
 - it is very simple and doesn't worth context switch to SWI;
 - since SWI is no longer used here, we can remove twelve years old hack,
excluding this SWI from from the loadavg statistics;
 - it fixes problem when eventtimer (HPET) shares interrupt with some other
device, and that interrupt thread counted as permanent loadavg of 1; now
loadavg accounted before that interrupt thread is scheduled.

Sponsored by:	Google Summer of Code 2012, iXsystems inc.
Tested by:	flo, marius, ian, Fabian Keil, markj
2013-03-04 11:22:19 +00:00
Neel Natu
dae3dc73f6 If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.

The text used above is verbatim from r195249 and the code should now be
in line with the intent of that commit.
2013-02-07 06:48:47 +00:00
John Baldwin
d95dca1d08 Add optional entropy harvesting for software interrupts in swi_sched()
as controlled by kern.random.sys.harvest.swi.  SWI harvesting feeds into
the interrupt FIFO and each event is estimated as providing a single bit of
entropy.

Reviewed by:	markm, obrien
MFC after:	2 weeks
2012-09-25 14:55:46 +00:00
Alexander Kabaev
c9516c94b4 Do not add handler to event handlers list until ithread is created.
In rare event when fast and ithread interrupts share the same vector
and the fast handler was registered first, we can end up trying to
schedule the ithread that is not created yet. The kernel built with
INVARIANTS then triggers an assertion.

Change the order to create the ithread first and only then add the
handler that needs it to the interrupt event handlers list.

Reviewed by: jhb
2012-08-06 16:37:43 +00:00
Juli Mallett
85729c2c44 Export intrcnt correctly when running under 32-bit compatibility.
Reviewed by:	gonzo, nwhitehorn
2012-03-09 22:30:54 +00:00
John Baldwin
44ad547522 Add a new sched_clear_name() method to the scheduler interface to clear
the cached name used for KTR_SCHED traces when a thread's name changes.
This way KTR_SCHED traces (and thus schedgraph) will notice when a thread's
name changes, most commonly via execve().

MFC after:	2 weeks
2012-03-08 19:41:05 +00:00
Sergey Kandaurov
037f43d3ef Be pedantic and change // comment to C-style one.
Noticed by:		Bruce Evans
2012-01-16 20:42:56 +00:00
Ed Schouten
dc15eac046 Use strchr() and strrchr().
It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use it.

For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.
2012-01-02 12:12:10 +00:00
Attilio Rao
521ea19d1c - Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
  tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
  This area can be widely improved by applying the same to other
  architectures and likely finding an unified approach among them and
  move the whole code to be MI. More work in this area is expected to
  happen fairly soon.

No MFC is previewed for this patch.

Tested by:	pluknet
Reviewed by:	jhb
Approved by:	re (kib)
2011-07-18 15:19:40 +00:00
Jeff Roberson
5bd186a65a - Catch up to falloc() changes.
- PHOLD() before using a task structure on the stack.
 - Fix a LOR between the sleepq lock and thread lock in _intr_drain().
2011-04-26 07:30:52 +00:00
Jeff Roberson
e4cd31dd3c - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
John Baldwin
d330520523 - Retire some unused ithread priorities: PI_TTYHIGH, PI_TAPE, and
PI_DISKLOW.  While here, rename PI_TTYLOW to PI_TTY.
- Add a macro PI_SWI() that takes a SWI_* constant as an argument and
  returns the suitable thread priority.
2011-01-11 22:15:30 +00:00
Alexander Motin
1f255bd340 Store interrupt trap frame into struct thread. It allows interrupt handler
to obtain both trap frame and opaque argument submitted on registrction.
After kernel and all drivers get used to it, legacy hack can be removed.

Reviewed by:	jhb@
2010-06-10 16:14:05 +00:00
Andriy Gapon
89fc20cc5e KASSERT that return value of interrupt filter complies with contract
For example a return value of zero could lead to a stuck level-triggered
interrupt line.

Reviewed by:	jhb (for INTR_FILTER case)
MFC after:	3 weeks
2010-01-27 09:59:08 +00:00
Attilio Rao
1b9d701fee Split P_NOLOAD into a per-thread flag (TDF_NOLOAD).
This improvements aims for avoiding further cache-misses in scheduler
specific functions which need to keep track of average thread running
time and further locking in places setting for this flag.

Reported by:	jeff (originally), kris (currently)
Reviewed by:	jhb
Tested by:	Giuseppe Cocomazzi <sbudella at email dot it>
2009-11-03 16:46:52 +00:00
John Baldwin
d0c9a29169 Use language more closely resembling English in a panic message.
Pointy hat to:	jhb
Submitted by:	pluknet
2009-10-15 18:51:19 +00:00
John Baldwin
37b8ef16cd Add a facility for associating optional descriptions with active interrupt
handlers.  This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
  a description with an active interrupt handler setup by BUS_SETUP_INTR.
  It has a default method (bus_generic_describe_intr()) which simply passes
  the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
  printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
  an interrupt handler and copy the name passed to intr_event_add_handler()
  into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
  to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
  the nexus(4) driver supply a custom bus_describe_intr method that invokes
  a new intr_describe() MD routine which in turn looks up the associated
  interrupt event and invokes intr_event_describe_handler().

Requested by:	many
Reviewed by:	scottl
MFC after:	2 weeks
2009-10-15 14:54:35 +00:00
John Baldwin
cebc7fb16c Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
  to a specific CPU to return an error value instead of void, thus allowing
  it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
  destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
  on the first interrupt in a group.  Moving the first interrupt in a group
  moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
  intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
  interrupts.  Previously, binding an interrupt to a CPU only performed a
  privilege check if the interrupt had an interrupt thread.  Interrupts
  without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
  cpuset mask for the associated interrupt thread.

Approved by:	re (kib)
2009-07-01 17:20:07 +00:00
John Baldwin
9bd55acf5e Return errors from intr_event_bind() to the caller of intr_set_affinity().
Specifically, if a non-root user attempts to bind an interrupt the request
will now report failure with EPERM rather than silently failing with a
successful return code.

MFC after:	1 week
2009-06-25 18:35:19 +00:00
Robert Watson
e84bcd8494 Binding interrupts to a CPU consists of two parts: setting up CPU
affinity for the interrupt thread, and requesting that underlying
hardware direct interrupts to the CPU.  For software interrupt
threads, implement a no-op interrupt event binder that returns
success, so that the interrupt management code will just set the
ithread's affinity and succeed.

Reviewed by:	jhb
MFC after:	1 week
2009-05-18 14:02:55 +00:00
David E. O'Brien
b386eb9139 style(9) 2008-09-23 14:25:56 +00:00
John Baldwin
37e9511fcb Expose a new public routine intr_event_execute_handlers() which executes
all the non-filter handlers attached to an interrupt event.  This can be
used by device drivers which multiplex their interrupt onto the interrupt
handlers for child devices.
2008-09-15 22:19:44 +00:00
Kip Macy
6205924afd Submit a band-aid for interrupt set up race.
MFC after:	1 month
2008-08-22 23:24:53 +00:00
Kip Macy
2976b312a1 revert change from local tree 2008-07-18 07:07:57 +00:00
Kip Macy
4af83c8cff import vendor fixes to cxgb 2008-07-18 06:12:31 +00:00
Bjoern A. Zeeb
04a58b9d5f Remove an unneeded error variable to make clear that if reaching
the end of the function we never return an error.
2008-06-29 18:26:07 +00:00
Jeff Roberson
8df78c41d6 - Make SCHED_STATS more generic by adding a wrapper to create the
variables and sysctl nodes.
 - In reset walk the children of kern_sched_stats and reset the counters
   via the oid_arg1 pointer.  This allows us to add arbitrary counters to
   the tree and still reset them properly.
 - Define a set of switch types to be passed with flags to mi_switch().
   These types are named SWT_*.  These types correspond to SCHED_STATS
   counters and are automatically handled in this way.
 - Make the new SWT_ types more specific than the older switch stats.
   There are now stats for idle switches, remote idle wakeups, remote
   preemption ithreads idling, etc.
 - Add switch statistics for ULE's pickcpu algorithm.  These stats include
   how much migration there is, how often affinity was successful, how
   often threads were migrated to the local cpu on wakeup, etc.

Sponsored by:	Nokia
2008-04-17 04:20:10 +00:00
Jeff Roberson
9b33b154b5 - Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number.  Ignore the irq# for soft intrs.
 - Add support to cpuset for binding hardware interrupts.  This has the
   side effect of binding any ithread associated with the hard interrupt.
   As per restrictions imposed by MD code we can only bind interrupts to
   a single cpu presently.  Interrupts can be 'unbound' by binding them
   to all cpus.

Reviewed by:	jhb
Sponsored by:	Nokia
2008-04-11 03:26:41 +00:00
John Baldwin
8aa9e82e67 Move INTR_FILTER from opt_global.h to its own header. 2008-04-05 20:13:15 +00:00
John Baldwin
1ee1b68792 Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
  'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
  Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
  scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
  Also, don't bother unmasking the interrupt in the post_filter case if
  we never masked it.  The INTR_FILTER case had been doing this by having
  arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
  They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
  both the 'post_filter' and 'post_ithread' hooks to match what the
  non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
  'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
  designed to do even in 5.x).

Glanced at by:	piso
Reviewed by:	marius
Requested by:	marius [1], [5]
Tested on:	amd64, i386, arm, sparc64
2008-04-05 19:58:30 +00:00
Jeff Roberson
e4b1aa6210 - Fix a mis-merge that crept in during the softclock changes.
Spotted by:	jhb
2008-04-04 01:03:23 +00:00
Jeff Roberson
8d809d5061 Implement per-cpu callout threads, wheels, and locks.
- Move callout thread creation from kern_intr.c to kern_timeout.c
 - Call callout_tick() on every processor via hardclock_cpu() rather than
   inspecting callout internal details in kern_clock.c.
 - Remove callout implementation details from callout.h
 - Package up all of the global variables into a per-cpu callout structure.
 - Start one thread per-cpu.  Threads are not strictly bound.  They prefer
   to execute on the native cpu but may migrate temporarily if interrupts
   are starving callout processing.
 - Run all callouts by default in the thread for cpu0 to maintain current
   ordering and concurrency guarantees.  Many consumers may not properly
   handle concurrent execution.
 - The new callout_reset_on() api allows specifying a particular cpu to
   execute the callout on.  This may migrate a callout to a new cpu.
   callout_reset() schedules on the last assigned cpu while
   callout_reset_curcpu() schedules on the current cpu.

Reviewed by:	phk
Sponsored by:	Nokia
2008-04-02 11:20:30 +00:00
John Baldwin
6d2d1c044f Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
  and collapse down to one intr_event_create() routine.  The disable and
  eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
  routines since to trim one extra indirection.

Compiled on:	{arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on:	{amd64,i386} x {FILTER, !FILTER}
2008-03-17 22:42:01 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
John Baldwin
eaf86d1678 Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
  code wishes to bind an interrupt source to an individual CPU.  The MD
  code may reject the binding with an error.  If an assign_cpu function
  is not provided, then the kernel assumes the platform does not support
  binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
  event is bound to a CPU.  Only shared ithreads are bound.  We currently
  leave private ithreads for drivers using filters + ithreads in the
  INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
  a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
  PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
  an interrupt source and binds its interrupt event to the specified CPU.
  MI code can currently (ab)use this by doing:

	intr_bind(rman_get_start(irq_res), cpu);

  however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
  where the implementation in the x86 nexus(4) driver would end up calling
  intr_bind() internally.

Requested by:	kmacy, gallatin, jeff
Tested on:	{amd64, i386} x {regular, INTR_FILTER}
2008-03-14 19:41:48 +00:00
Jeff Roberson
6617724c5f Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential.  Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
2008-03-12 10:12:01 +00:00
Julian Elischer
539976ffdf fix typo in code normally not compiled in. 2007-10-29 20:45:31 +00:00
Julian Elischer
3c1ffc320f Fix typo in code obviously not being compiled on any of my machines.
found by: rdivacky@
2007-10-28 23:11:57 +00:00