Under enough load, the swi's can actually be preempted and migrated
to other currently free cores. When doing RSS experiments, this lead
to the per-CPU TCP timers not lining up any more with the RX CPU said
flows were ending up on, leading to increased lock contention.
Since there was a little pushback on flipping them on by default,
I've left the default at "don't pin."
The other less obvious problem here is that the default swi
is also the same as the destination swi for CPU #0. So if one
pins the swi on CPU #0, there's no default floating swi.
A nice future project would be to create a separate swi for
the "default" floating swi, as well as per-CPU swis that are
(optionally) pinned.
Tested:
* parallel TCP tests (2 x 1g unfortunately for now);
CPU: Intel(R) Xeon(R) CPU E5-2650
Note:
This is based on some initial investigation into RSS/TCP stack lock
contention on FreeBSD-HEAD whilst at Netflix in January 2014.
rman_reserve_resource_bound() to return incorrect results.
Continue the initial search until the first viable region is found.
Add a comment to explain the search termination test.
PR: kern/188534
Reviewed by: jhb (previous version)
MFC after: 1 week
for lockmgr and sx interlocks, but unused since optimised versions of
those sleep locks were introduced. This will save a (quite) small
amount of memory in all kernel configurations. The sleep mutex pool is
retained as it is used for 'struct bio' and several other consumers.
Discussed with: jhb
MFC after: 3 days
It can fail if pipe map is exhausted (as a result of too many pipes created),
but it is not fatal and could be provoked by unprivileged users. The only
consequence is worse performance with given pipe.
Reported by: ivoras
Suggested by: kib
MFC after: 1 week
If time_t is 64-bit (i.e. isn't 32-bit) allow any value of year, not
just years less than 2038.
Don't bother fixing the underflow in the case of years before 1903.
MFC after: 1 week
Sponsored by: DARPA, AFRL
pmap pvlist locks which are scaled by MAXCPU.
This allows an amd64 system to boot with MAXCPU set
to 256, which is currently FreeBSD's hard limit without
x2apic support.
Compile-tested for other arch's.
PR: 185831
Discussed with: jhb
MFC after: 3 weeks
old devd's and thus hosts that get IP addresses from DHCP was too much
of a POLA violation.
The sysctl may be removed again after r263758 has been merged to at
least stable/9 and stable/10, and releases have been cut from those
branches.
Discussed with: mjg
Reported by: theraven, rwatson
the cpufreq code. Replace its use with smp_started. There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.
Discussed with: peter, jhb
MFC after: 3 days
Obtained from: Netflix
Due to the way those timers are implemented, we can't handle very short
intervals. In addition to that mentioned patch caused math overflows
for short intervals. To avoid that round those intervals to 1 tick.
PR: kern/187668
MFC after: 1 week
SBT_MAX, to make it more robust in case internal type representation will
change in the future. All the consumers were migrated to SBT_MAX and
every new consumer (if any) should from now use this interface.
Requested by: bapt, jmg, Ryan Lortie (implictly)
Reviewed by: mav, bde
execution to a emumation program via parsing of ELF header information.
With this kernel module and userland tool, poudriere is able to build
ports packages via the QEMU userland tools (or another emulator program)
in a different architecture chroot, e.g. TARGET=mips TARGET_ARCH=mips
I'm not connecting this to GENERIC for obvious reasons, but this should
allow the kernel module to be built by default and enable the building
of the userland tool (which automatically loads the kernel module).
Submitted by: sson@
Reviewed by: jhb@
Right now, init(8) cannot distinguish between an ACPI power button press
or a Ctrl+Alt+Del sequence on the keyboard. This is because
shutdown_nice() sends SIGINT to init(8) unconditionally, but later
modifies the arguments to reboot(2) to force a certain behaviour.
Instead of doing this, patch up the code to just forward the appropriate
signal to userspace. SIGUSR1 and SIGUSR2 can already be used to halt the
system.
While there, move waittime to the function where it's used; kern_reboot().
kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that
behaves the same, but operates on a procdesc(4) instead. Only implement
NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also
returns the exit status of the process, meaning that we can now obtain
this value, even if pdwait4(2) is still unimplemented.
Notes:
- Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will
be used on totally different descriptor types, this should not clash.
- Let procdesc_kqops_event() reuse the same structure as filt_proc().
The only difference is that procdesc_kqops_event() should also be able
to deal with the case where the process was already terminated after
registration. Simply test this when hint == 0.
- Fix some style(9) issues in filt_proc() to keep it consistent with the
newly added procdesc_kqops_event().
- Save the exit status of the process in pd->pd_xstat, as we cannot pick
up the proctree_lock from within procdesc_kqops_event().
Discussed on: arch@
Reviewed by: kib@
According to <sys/proc.h>, this field needs to be locked with either the
p_mtx or the p_slock. In this case the damage was quite small. Instead
of being reaped, the process would just be reparented to init, so it
could be reaped from there.
kqueue_scan() unlocking the kqueue to call f_event, knote() or
knote_fork() should not skip the knote. The knote is not going to
disappear during the influx time, and the mutual exclusion between
scan and knote() is ensured by both code pathes taking knlist lock.
The race appears since knlist lock is before kq lock, so KN_INFLUX
must be set, kq lock must be dropped and only then knlist lock can be
taken. The window between kq unlock and knlist lock causes lost
events.
Add a flag KN_SCAN to indicate that KN_INFLUX is set in a manner safe
for the knote(), and check for it to ignore KN_INFLUX in the knote*()
as needed. Also, in knote(), remove the lockless check for the
KN_INFLUX flag, which could also result in the lost notification.
Reported and tested by: Kohji Okuno <okuno.kohji@jp.panasonic.com>
Discussed with: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
few of them also build kern_clocksource.c. That strikes me as insane, but
maybe there's a good reason for it. Until I figure that out, un-break
the build by not referencing functions in kern_clocksource if NO_EVENTTIMERS
is defined.
To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.
Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.
Exp-run revealed no ports using it directly.
No objection from: arch@
Sponsored by: EMC / Isilon Storage Division
faults.
First, for accesses to direct map region should check for the limit by
which direct map is instantiated.
Second, for accesses to the kernel map, success returned from the
kernacc(9) does not guarantee that consequent attempt to read or write
to the checked address succeed, since other thread might invalidate
the address meantime. Add a new thread private flag TDP_DEVMEMIO,
which instructs vm_fault() to return error when fault happens on the
MAP_ENTRY_NOFAULT entry, instead of panicing. The trap handler would
then see a page fault from access, and recover in normal way, making
/dev/mem access safer.
Remove GIANT_REQUIRED from the amd64 memrw(), since it is not needed
and having Giant locked does not solve issues for amd64.
Note that at least the second issue exists on other architectures, and
requires similar patching for md code.
Reported and tested by: clusteradm (gjb, sbruno)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Also, remove the expression which calculated the location of the
strings for a new image and grown over the time to be
non-comprehensible. Instead, calculate the offsets by steps, which
also makes fixing the alignments much cleaner.
Reported and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week