Commit Graph

351 Commits

Author SHA1 Message Date
Mark Johnston
8675f5f776 Only check the blessings table for known LORs.
Previously we would check for blessings before marking a given lock
pair as reversed, so each "reversed" lock acquisition would require
a linear scan of the table.  Instead, check the table after marking
the pair as reversed but before generating a report.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21135
2019-08-02 18:01:47 +00:00
Mark Johnston
49c3e8c8d1 Enable witness(4) blessings.
witness has long had a facility to "bless" designated lock pairs.  Lock
order reversals between a pair of blessed locks are not reported upon.
We have a number of long-standing false positive LOR reports; start
marking well-understood LORs as blessed.

This change hides reports about UFS vnode locks and the UFS dirhash
lock, and UFS vnode locks and buffer locks, since those are the two that
I observe most often.  In the long term it would be preferable to be
able to limit blessings to a specific site where a lock is acquired,
and/or extend witness to understand why some lock order reversals are
valid (for example, if code paths with conflicting lock orders are
serialized by a third lock), but in the meantime the false positives
frequently confuse users and generate bug reports.

Reviewed by:	cem, kib, mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21039
2019-07-30 17:09:58 +00:00
Andrey V. Elsukov
2317067c31 Remove bpf interface lock, it is no longer exist. 2019-05-14 10:21:28 +00:00
Matt Macy
9e58ff6ff9 convert inpcbinfo hash and info rwlocks to epoch + mutex
- Convert inpcbinfo info & hash locks to epoch for read and mutex for write
- Garbage collect code that handled INP_INFO_TRY_RLOCK failures as
  INP_INFO_RLOCK which can no longer fail

When running 64 netperfs sending minimal sized packets on a 2x8x2 reduces
unhalted core cycles samples in rwlock rlock/runlock in udp_send from 51% to
3%.

Overall packet throughput rate limited by CPU affinity and NIC driver design
choices.

On the receiver unhalted core cycles samples in in_pcblookup_hash went from
13% to to 1.6%

Tested by LLNW and pho@

Reviewed by: jtl
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15686
2018-06-19 01:54:00 +00:00
Andrey V. Elsukov
20efcfc602 Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9).
Using of rwlock with multiqueue NICs for IP forwarding on high pps
produces high lock contention and inefficient. Rmlock fits better for
such workloads.

Reviewed by:	melifaro, olivier
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D15789
2018-06-16 08:26:23 +00:00
Matt Macy
552b3e1798 witness/hwpmc: fix locking order for pmc locks 2018-05-28 23:14:38 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Stephen Hurd
f3e1324b41 Separate list manipulation locking from state change in multicast
Multicast incorrectly calls in to drivers with a mutex held causing drivers
to have to go through all manner of contortions to use a non sleepable lock.
Serialize multicast updates instead.

Submitted by:	mmacy <mmacy@mattmacy.io>
Reviewed by:	shurd, sbruno
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14969
2018-05-02 19:36:29 +00:00
Mark Johnston
5cd29d0f3c Improve VM page queue scalability.
Currently both the page lock and a page queue lock must be held in
order to enqueue, dequeue or requeue a page in a given page queue.
The queue locks are a scalability bottleneck in many workloads. This
change reduces page queue lock contention by batching queue operations.
To detangle the page and page queue locks, per-CPU batch queues are
used to reference pages with pending queue operations. The requested
operation is encoded in the page's aflags field with the page lock
held, after which the page is enqueued for a deferred batch operation.
Page queue scans are similarly optimized to minimize the amount of
work performed with a page queue lock held.

Reviewed by:	kib, jeff (previous versions)
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14893
2018-04-24 21:15:54 +00:00
Konstantin Belousov
d86c1f0dc1 i386 4/4G split.
The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.

By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap.  The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.

There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.

copyout(9) was rewritten to use vm_fault_quick_hold().  An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls.  The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline.  If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.

The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done.  The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging.  I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.

Tested by: pho
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D14633
2018-04-13 20:30:49 +00:00
Stephen Hurd
f422673e10 Make BPF global lock an SX
This allows NIC drivers to sleep on polling config operations.

Submitted by:	Matthew Macy <mmacy@mattmacy.io>
Reviewed by:	shurd
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14982
2018-04-10 19:42:50 +00:00
Jeff Roberson
9a4b4cd3bc Start witness much earlier in boot so that we can shrink the pend list and
make it more immune to further change.

Reviewed by:	markj, imp (Part of D14707)
Sponsored by:	Netflix, Dell/EMC Isilon
2018-03-22 19:11:43 +00:00
Jonathan T. Looney
2529f56ed3 Add the "TCP Blackbox Recorder" which we discussed at the developer
summits at BSDCan and BSDCam in 2017.

The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.

It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.

You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.

This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.

There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.

Reviewed by:	gnn (previous version)
Obtained from:	Netflix, Inc.
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D11085
2018-03-22 09:40:08 +00:00
Jeff Roberson
e2068d0bcd Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
Alexander Kabaev
151ba7933a Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
Mateusz Guzik
5132933a08 Bump WITNESS_PENDLIST to accomodate sleepq chain bump.
Reported by:	ngie
2017-10-23 01:00:35 +00:00
Conrad Meyer
f41b85a63c ddb(4): Add 'show badstacks' command to show witness badstacks
Add a DDB command that mirrors sysctl debug.witness.badstacks.

Reapply r323935 after fixing trivial deficiency.  I forgot to compile with
WITNESS enabled.  Thanks emaste@ for fixing the build while I was asleep.

Reported by:	rstone
Reviewed by:	rstone (previous version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12468
2017-09-23 17:48:49 +00:00
Ed Maste
4c087f8a83 Revert r323935 as it broke the build
subr_witness.c:2577:4: error: use of undeclared identifier 'req'
                        req->oldidx = 0;
                        ^
2017-09-23 12:35:46 +00:00
Conrad Meyer
6fec2d2cce ddb(4): Add 'show badstacks' command to show witness badstacks
Add a DDB command that mirrors sysctl debug.witness.badstacks.

Reported by:	rstone
Reviewed by:	rstone
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12468
2017-09-22 20:01:12 +00:00
Mark Johnston
526b5fe16c Amend r321884 to check the refcount and update the class with w_mtx held.
Reviewed by:	jhb
X-MFC with:	r321884
2017-08-01 23:14:38 +00:00
Mark Johnston
57688b6e4e Fix a witness assertion that fires when a lock type's class changes.
When all instances of a lock type are destroyed (for example, after a
module unload), the corresponding witness entry remains associated with
that lock type. In this case, we shouldn't panic if a new instance of the
lock type is created and its lock class does not match that recorded in the
witness entry.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11788
2017-08-01 17:50:28 +00:00
Mark Johnston
69d2418faa Make witness_warn() always print to the console.
witness_warn() either breaks into the debugger or panics the system, so its
output should go to the console regardless of the witness(4) output channel
configuration.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-02-05 02:27:04 +00:00
Mark Johnston
ac91917211 Fix WITNESS hints for pagequeue locks.
MFC after:	1 week
2016-10-29 20:01:48 +00:00
Mateusz Guzik
1d2541fd1a cache: get rid of the global lock
Add a table of vnode locks and use them along with bucketlocks to provide
concurrent modification support. The approach taken is to preserve the
current behaviour of the namecache and just lock all relevant parts before
any changes are made.

Lookups still require the relevant bucket to be locked.

Discussed with:		kib
Tested by:	pho
2016-09-23 04:45:11 +00:00
Mateusz Guzik
a27815330c cache: improve scalability by introducing bucket locks
An array of bucket locks is added.

All modifications still require the global cache_lock to be held for
writing. However, most readers only need the relevant bucket lock and in
effect can run concurrently to the writer as long as they use a
different lock. See the added comment for more details.

This is an intermediate step towards removal of the global lock.

Reviewed by:	kib
Tested by:	pho
2016-09-10 16:29:53 +00:00
Bruce Evans
d350ce61cf Less-quick fix for locking fixes in r172250. r172250 added a second
syscons spinlock for the output routine alone.  It is better to extend
the coverage of the first syscons spinlock added in r162285.  2 locks
might work with complicated juggling, but no juggling was done.  What
the 2 locks actually did was to cover some of the missing locking in
each other and deadlock less often against each other than a single
lock with larger coverage would against itself.  Races are preferable
to deadlocks here, but 2 locks are still worse since they are harder
to understand and fix.

Prefer deadlocks to races and merge the second lock into the first one.

Extend the scope of the spinlocking to all of sc_cnputc() instead of
just the sc_puts() part.  This further prefers deadlocks to races.

Extend the kdb_active hack from sc_puts() internals for the second lock
to all spinlocking.  This reduces deadlocks much more than the other
changes increases them.  The s/p,10* test in ddb gets much further now.
Hide this detail in the SC_VIDEO_LOCK() macro.  Add namespace pollution
in 1 nested #include and reduce namespace pollution in other nested
#includes to pay for this.

Move the first lock higher in the witness order.  The second lock was
unnaturally low and the first lock was unnaturally high.  The second
lock had to be above "sleepq chain" and/or "callout" to avoid spurious
LORs for visual bells in sc_puts().  Other console driver locks are
already even higher (but not adjacent like they should be) except when
they are missing from the table.  Audio bells also benefit from the
syscons lock being high so that audio mutexes have chance of being
lower.  Otherwise, console drviver locks should be as low as possible.
Non-spurious LORs now occur if the bell code calls printf() or is
interrupted (perhaps by an NMI) and the interrupt handler calls
printf().  Previous commits turned off many bells in console i/o but
missed ones done by the teken layer.
2016-08-25 13:46:52 +00:00
Pedro F. Giffuni
e3043798aa sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
Pedro F. Giffuni
8dfea46460 Remove slightly used const values that can be replaced with nitems().
Suggested by:	jhb
2016-04-21 15:38:28 +00:00
Pedro F. Giffuni
02abd40029 kernel: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep,

Discussed in:	freebsd-current
2016-04-19 23:48:27 +00:00
Mark Johnston
1155462e3a The buffer passed to an sbuf drain callback is not necessarily
null-terminated, so don't assume that it is.

Reported by:	pho
X-MFC-With:	r291059
2015-11-23 18:45:35 +00:00
Mark Johnston
150b709793 Remove a commented-out debug print.
MFC after:	1 week
2015-11-19 05:58:51 +00:00
Mark Johnston
1b9254f885 Add support for a configurable output channel to witness(4).
This is useful in environments where system configuration is performed by
automated interaction with the system console, since unexpected witness
output makes such automation difficult. With this change, the new
debug.witness.output_channel sysctl allows one to specify that witness
output is to be printed to the kernel log (using log(9)) rather than the
console.

Reviewed by:	cem, jhb
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4183
2015-11-19 05:56:59 +00:00
Marius Strobl
7815d3948c o Revert the other functional half of r239864, i. e. the merge of r134227
from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus()
  but also for the MD cache invalidation, TLB demapping and remote register
  reading IPIs due to the following reasons:
  - The cross-IPI SMP deadlock x86 otherwise is subject to can't happen on
    sparc64. That's because on sparc64, spin locks don't disable interrupts
    completely but only raise the processor interrupt level to PIL_TICK. This
    means that IPIs still get delivered and direct dispatch IPIs such as the
    cache invalidation etc. IPIs in question are still executed.
  - In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an
    IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action().
    Consequently, smp_ipi_mtx may be locked for an extended amount of time as
    queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are
    scheduled via a soft interrupt. Moreover, given that this soft interrupt
    is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action()
    on a target may be interrupted by f. e. a tick interrupt at PIL_TICK, in
    turn leading to the target in question trying to send an IPI by itself
    while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a
    deadlock.
o As mentioned in the commit message of r245850, on least some sun4u platforms
  concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the
  reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers,
  i. e. ipi_{all_but_self,cpu,selected}().
o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started
  instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually
  are started, when smp_started is no longer 0.
o In all MD and MI IPI helpers, check for smp_started == 1 rather than for
  smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs
  trying to be delivered to APs that in fact aren't up and running, yet.
  While at it, move setting of the cpu_ipi_{selected,single}() pointers to
  the appropriate delivery functions from mp_init() to cpu_mp_start() where
  it's better suited and allows to get rid of the global isjbus variable.
o Given that now concurrent IPI delivery no longer is possible, also nuke
  the delays before completely disabling interrupts again in the CPU-specific
  cross-trap delivery functions, previously giving other CPUs a window for
  sending IPIs on their part. Actually, we now should be able to entirely get
  rid of completely disabling interrupts in these functions. Such a change
  needs more testing, though.
o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not
  necessary for correctness, this avoids page faults when accessing the stack
  of a foreign CPU as {s,}tick now is locked into the TLBs as part of static
  kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as
  possible, avoiding jitter.

PR:		201245
MFC after:	3 days
2015-07-24 15:13:21 +00:00
Mark Johnston
620711e033 Fix an incorrect assertion in witness.
The number of available lock list entries for a thread is LOCK_CHILDCOUNT,
and each entry can record up to LOCK_NCHILDREN locks. When iterating over
the locks held by a thread, a bound on the loop index is therefore given
by LOCK_CHILDCOUNT * LOCK_NCHILDREN; WITNESS_COUNT is an unrelated
constant.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D2974
2015-07-07 19:29:18 +00:00
Mark Johnston
8b84d791a0 witness: don't warn about matrix inconsistencies without holding the mutex
Lock order checking is done without the witness mutex held, so multiple
threads that are racing to establish a new lock order may read matrix
entries that are in an inconsistent state. Don't print a warning in this
case, but instead just redo the check after taking the witness lock.

Differential Revision:	https://reviews.freebsd.org/D2713
Reviewed by:	jhb
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2015-06-07 18:59:47 +00:00
Pedro F. Giffuni
cd508278c1 ddb: finish converting boolean values.
The replacement started at r283088 was necessarily incomplete without
replacing boolean_t with bool.  This also involved cleaning some type
mismatches and ansifying old C function declarations.

Pointed out by:	bde
Discussed with:	bde, ian, jhb
2015-05-21 15:16:18 +00:00
Konstantin Belousov
13dad10871 The umtx_lock mutex is used by top-half of the kernel, but is
currently a spin lock.  Apparently, the only reason for this is that
umtx_thread_exit() is called under the process spinlock, which put the
requirement on the umtx_lock.  Note that the witness static order list
is wrong for the umtx_lock, umtx_lock is explicitely before any thread
lock, so it is also before sleepq locks.

Change umtx_lock to be the sleepable mutex.  For the reason above, the
calls to umtx_thread_exit() are moved from thread_exit() earlier in
each caller, when the process spin lock is not yet taken.

Discussed with:	jhb
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-02-28 04:19:02 +00:00
Mark Johnston
4fd6ca7275 Fix a typo from r189544, which replaced unp_global_rwlock with unp_list_lock
and unp_link_rwlock.

MFC after:	3 days
2014-10-20 20:21:40 +00:00
Marcel Moolenaar
ddbe5b951f Fix nits in previous commit:
1.  Remove initializer for badstack_sbuf_size; it gets set unconditionally.
2.  Remove meaningless comment.
3.  Group witness_count and its sysctl together.
4.  Fix spacing in for statements (space after for and within condition).
5.  Change *all* M_NOWAIT usages in witness_initialize() to M_WAITOK; not
    just those that were newly introduced -- the allocation is assumed to
    succeed for all allocations.
6.  Avoid using uint8_t as the base type in sizeof() expressions; Use the
    variable name (w_rmatrix) as much as possible.

Pointed out by: jhb@ (thanks!)
2014-10-11 16:34:01 +00:00
Marcel Moolenaar
90a5222f14 Turn WITNESS_COUNT into a tunable and sysctl. This allows adjusting
the value without recompiling the kernel.  This is useful when
recompiling is not possible as an immediate solution. When we run out
of witness objects, witness is completely disabled. Not having an
immediate solution can therefore be problematic.

Submitted by:	Sreekanth Rupavatharam <rupavath@juniper.net>
Obtained from:	Juniper Networks, Inc.
2014-10-11 02:02:58 +00:00
Warner Losh
146cbf6fa2 Make the witness lock limit an option. 2014-08-03 05:00:43 +00:00
Marcel Moolenaar
e7d939bda2 Remove ia64.
This includes:
o   All directories named *ia64*
o   All files named *ia64*
o   All ia64-specific code guarded by __ia64__
o   All ia64-specific makefile logic
o   Mention of ia64 in comments and documentation

This excludes:
o   Everything under contrib/
o   Everything under crypto/
o   sys/xen/interface
o   sys/sys/elf_common.h

Discussed at: BSDcan
2014-07-07 00:27:09 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Peter Grehan
d6cd193e5e Bump WITNESS_PENDLIST by MAXCPU to account for the
pmap pvlist locks which are scaled by MAXCPU.

This allows an amd64 system to boot with MAXCPU set
to 256, which is currently FreeBSD's hard limit without
x2apic support.

Compile-tested for other arch's.

PR:	185831
Discussed with:		jhb
MFC after:	3 weeks
2014-04-29 17:22:29 +00:00
Gleb Smirnoff
45c203fce2 Remove AppleTalk support.
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.

Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 06:29:43 +00:00
Neel Natu
84cc772fe5 Bump up WITNESS_COUNT from 1024 to 1536 so there are sufficient entries for
WITNESS to actually work.

Reviewed by:	jhb@
2014-01-20 01:59:35 +00:00
Dimitry Andric
951d674203 In sys/kern/subr_witness.c, remove static function
witness_lock_order_key_empty(), which is unused since r181695.

MFC after:	3 days
2013-12-25 16:58:14 +00:00