Each plaform performs virtual memory split between kernel and user space
and assigns kernel certain amount of memory space. However, is is sometimes
reasonable to change the default values. Such situation may happen on
systems where the demand for kernel buffers is high, many devices occupying
memory etc. This of course comes with the cost of decreasing user space
memory range so shall be used with care. Most embedded systems will not
suffer from this limtation but rather take advantage of this potential
since default behavior is left unchanged.
Submitted by: Wojciech Macek <wma@semihalf.com>
Reviewed by: imp
Obtained from: Semihalf
translation. In particular, despite IO-APICs only take 8bit apic id,
IR translation structures accept 32bit APIC Id, which allows x2APIC
mode to function properly. Extend msi_cpu of struct msi_intrsrc and
io_cpu of ioapic_intsrc to full int from one byte.
KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid
bringing all dmar headers into interrupt code. The non-PCI(e) devices
which generate message interrupts on FSB require special handling. The
HPET FSB interrupts are remapped, while DMAR interrupts are not.
For each msi and ioapic interrupt source, the iommu cookie is added,
which is in fact index of the IRE (interrupt remap entry) in the IR
table. Cookie is made at the source allocation time, and then used at
the map time to fill both IRE and device registers. The MSI
address/data registers and IO-APIC redirection registers are
programmed with the special values which are recognized by IR and used
to restore the IRE index, to find proper delivery mode and target.
Map all MSI interrupts in the block when msi_map() is called.
Since an interrupt source setup and dismantle code are done in the
non-sleepable context, flushing interrupt entries cache in the IR
hardware, which is done async and ideally waits for the interrupt,
requires busy-wait for queue to drain. The dmar_qi_wait_for_seq() is
modified to take a boolean argument requesting busy-wait for the
written sequence number instead of waiting for interrupt.
Some interrupts are configured before IR is initialized, e.g. ACPI
SCI. Add intr_reprogram() function to reprogram all already
configured interrupts, and call it immediately before an IR unit is
enabled. There is still a small window after the IO-APIC redirection
entry is reprogrammed with cookie but before the unit is enabled, but
to fix this properly, IR must be started much earlier.
Add workarounds for 5500 and X58 northbridges, some revisions of which
have severe flaws in handling IR. Use the same identification methods
as employed by Linux.
Review: https://reviews.freebsd.org/D1892
Reviewed by: neel
Discussed with: jhb
Tested by: glebius, pho (previous versions)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
- Split the driver into independent pf and vf loadables. This is
in preparation for SRIOV support which will be following shortly.
This also allows us to keep a seperate revision control over the
two parts, making for easier sustaining.
- Make the TX/RX code a shared/seperated file, in the old code base
the ixv code would miss fixes that went into ixgbe, this model
will eliminate that problem.
- The driver loadables will now match the device names, something that
has been requested for some time.
- Rather than a modules/ixgbe there is now modules/ix and modules/ixv
- It will also be possible to make your static kernel with only one
or the other for streamlined installs, or both.
Enjoy!
Submitted by: jfv and erj
any defaults or user specified actions on the command line. This would
be useful for specifying features that are always broken or that
cannot make sense on a specific architecture, like ACPI on pc98 or
EISA on !i386 (!x86 usage of EISA is broken and there's no supported
hardware that could have it in any event). Any items in
BROKEN_OPTIONS are forced to "no" regardless of other settings.
Clients are expected change BROKEN_OPTIONS with +=. It will not
be unset, so other parts of the build system can have visibility
into the options that are broken on this platform, though this
should be very rare.
Differential Revision: https://reviews.freebsd.org/D2009
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.
Differential Revision: https://reviews.freebsd.org/D1987
Sponsored by: Mellanox Technologies
MFC after: 1 month
in kern_gzio.c. The old gzio interface was somewhat inflexible and has not
worked properly since r272535: currently, the gzio functions are called with
a range lock held on the output vnode, but kern_gzio.c does not pass the
IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr()
attempts to reacquire the range lock. Moreover, the new gzio interface can
be used to implement kernel core compression.
This change also modifies the kernel configuration options needed to enable
userland core dump compression support: gzio is now an option rather than a
device, and the COMPRESS_USER_CORES option is removed. Core dump compression
is enabled using the kern.compress_user_cores sysctl/tunable.
Differential Revision: https://reviews.freebsd.org/D1832
Reviewed by: rpaulo
Discussed with: kib
executables. The goal here, not yet accomplished, is to let the e500 kernel
run under QEMU by setting KERNBASE to something that fits in low memory and
then having the kernel relocate itself at runtime.
Implement the interace to create SR-IOV Virtual Functions (VFs).
When a driver registers that they support SR-IOV by calling
pci_setup_iov(), the SR-IOV code creates a new node in /dev/iov
for that device. An ioctl can be invoked on that device to
create VFs and have the driver initialize them.
At this point, allocating memory I/O windows (BARs) is not
supported.
Differential Revision: https://reviews.freebsd.org/D76
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Sandvine Inc.
I2C real-time clock (RTC).
The DS3231 has an integrated temperature-compensated crystal oscillator
(TXCO) and crystal.
DS3231 has a temperature sensor, an independent 32kHz output (which can be
turned on and off by the driver) and another output that can be used as
interrupt for alarms or as a second square-wave output, which frequency and
operation mode can be set by driver sysctl(8) knobs.
Differential Revision: https://reviews.freebsd.org/D1016
Reviewed by: ian, rpaulo
Tested on: Raspberry pi model B
dtrace is able to display a stack trace similar to the one below.
# dtrace -p 603 -n 'tcp:kernel::receive { stack(); }'
0 70 :receive
kernel`ip_input+0x140
kernel`netisr_dispatch_src+0xb8
kernel`ether_demux+0x1c4
kernel`ether_nh_input+0x3a8
kernel`netisr_dispatch_src+0xb8
kernel`ether_input+0x60
kernel`cpsw_intr_rx+0xac
kernel`intr_event_execute_handlers+0x128
kernel`ithread_loop+0xb4
kernel`fork_exit+0x84
kernel`swi_exit
kernel`swi_exit
Tested by: gnn
Sponsored by: ABT Systems Ltd
KERNBUILDDIR. Come up with some sensible defaults (though listing them
in kmod.mk may be unwise -- we have no easy way to know what are the
best sensible defaults for everything so we just catch the big stuff).
Append SRCS.${opt} for each option in KERN_OPTS to SRCS to allow easy
conditional compilation. Append any notion of KERN_OPTS_EXTRA to the
list of kernel opts.
Differential Revision: https://reviews.freebsd.org/D1530
this option from all modules that enable it theirselves.
In C mode -fms-extensions option enables anonymous structs and unions,
allowing us to use this C11 feature in kernel. Of course, clang supports
it without any extra options.
Reviewed by: dim
used by other places that expect to unwind the stack, e.g. dtrace and
stack(9).
As I have written most of this code I'm changing the license to the
standard FreeBSD license. I have received approval from the other
developers who have changed any of the affected code.
Approved by: ian, imp, rpaulo, eadler (all license change)
Highlights:
- Multiple verbs API updates
- Support for RoCE, RDMA over ethernet
All hardware drivers depending on the common infiniband stack has been
updated aswell.
Discussed with: np @
Sponsored by: Mellanox Technologies
MFC after: 1 month
has been removed and the driver has been greatly simplified and
optimised for FreeBSD. The driver is currently not built by default.
Requested by: Bruce Simpson <bms@fastmail.net>
This port failed to gain traction and probably only a couple Wii consoles
ran FreeBSD all the way to single user mode with an md(4). IPC
support was never implemented, so it was impossible to use any peripheral
Any further development, if any, will happen at https://github.com/rpaulo/wii.
Discussed with: nathanw (a long time ago), jhibbits
root with BSD.root.mtree, so it often times will not exist. Rather
than force the latter for an installkernel, just create the directory
with a comment about why.
Submitted by: Guy Yur
I discovered this while working on llvm/lld and realized export-dynamic
only supported --. Although upstream will eventually grow to support
both - and --, switch this in our build system, because GNU ld supports
both modes, and because there's some hope lld will become the default linker
for FreeBSD in the future.
Discussed with: emaste, rdivacky
in kernel config files..
put VERBOSE_SYSINIT in it's own option header so the one file,
init_main.c, can use it instead of requiring an entire kernel recompile
to change one file..
The C standard undefines behavior when signed integers overflow. The
compiler toolchain has become more adept at detecting this and taking
advantage of faster undefined behavior. At the current time this has the
unfortunate effect of the clock stopping after 24 days of uptime.
clang makes no distinction between -fwrapv and -fno-strict-overflow. gcc
does treat them differently but -fwrapv is mature in gcc and is the
behavior are actually expecting.
Obtained from: kib
KVM clock shares the same data structures between the guest and the host
as Xen so it makes sense to just have a single copy of this code.
Differential Revision: https://reviews.freebsd.org/D1429
Reviewed by: royger (eariler version)
MFC after: 1 month
allocations if only one element should be allocated per page
cache. Make one allocation per element compile time configurable. Fix
a comment while at it.
Suggested by: ian @
MFC after: 1 week
it processes its own ELF relocations and can be loaded and run in place at
any physical/virtual address.
NB: This requires an updated loader to boot!
Relnotes: yes
__attribute__((format(...))), and the -fformat-extensions flag was
removed, introduce a new macro in bsd.sys.mk to choose the right variant
of compile flag for the used compiler, and use it.
Also add something similar to kern.mk, since including bsd.sys.mk from
that file will anger Warner. :-)
Note that bsd.sys.mk does not support the MK_FORMAT_EXTENSIONS knob used
in kern.mk, since that knob is only available in kern.opts.mk, not in
src.opts.mk. We might want to add it later, to more easily support
external compilers for building world (in particular, sys/boot).
bits.
The motivation here is to eventually teach netisr and potentially
other networking subsystems a bit more about how RSS work queues / buckets
are configured so things have a hope of auto-configuring in the future.
* net/rss_config.[ch] takes care of the generic bits for doing
configuration, hash function selection, etc;
* topelitz.[ch] is now in net/ rather than netinet/;
* (and would be in libkern if it didn't directly include RSS_KEYSIZE;
that's a later thing to fix up.)
* netinet/in_rss.[ch] now just contains the IPv4 specific methods;
* and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.
This should have no functional impact on anyone currently using
the RSS support.
Differential Revision: D1383
Reviewed by: gnn, jfv (intel driver bits)
amd64. Until further we need some custom C-flags when building the
Linux compat API.
MFC after: 1 month
Sponsored by: Mellanox Technologies
Reported by: bz@
by dumbbell@ to be able to compile this layer as a dependency module.
Clean up some Makefiles and remove the no longer used OFED define.
Currently only i386 and amd64 targets are supported.
MFC after: 1 month
Sponsored by: Mellanox Technologies
code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly
identical and simply redefine a number of constants and helper subroutines;
a generic implementation will make it easier to implement features around
kernel core dumps. This change does not alter any minidump code and should
have no functional impact.
PR: 193873
Differential Revision: https://reviews.freebsd.org/D904
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Reviewed by: jhibbits (earlier version)
Sponsored by: EMC / Isilon Storage Division
get the tail part of the path. We can now build kernels the
old-fashioned way on FreeBSD 9.x and 10.x on at least amd64 using
clang 3.3, 3.4 or gcc 4.2.1 (though with the latter you need
WITHOUT_MODULES="aesni vmm cxgbe" due to various issues with
gcc 4.2.1).
of the scan API.
The eventual aim is to have 'ieee80211_scan.c' have the net80211 and
driver facing scan API to start, finish and continue doing scanning
while 'ieee80211_swscan.c' implements the software scanner that
runs the scan task, handles probe request/reply bits, configures
the VAP off-channel, changes channel and does the scanning bits.
For NICs that do no scanning at all, the existing code is needed.
ath(4) and most of the other NICs (dumb USB ones in particular)
do little to no scan offload - it's all done in software.
Some NICs may do single channel at a time scanning; I haven't really
checked them out in detail.
iwn(4), the upcoming 7260 driver stuff, the new Qualcomm Atheros
11ac chipsets and the Atheros mobile/USB full-offload chips all
have complete scan engines in firmware. We don't have to drive
any of it at all - the firmware just needs to be told what to scan,
when to scan, how long to scan. It'll take care of going off
channel, pausing TX/RX appropriately, sending sleep notification
to the AP, sending probe requests and handling probe responses.
It'll do passive/active scan itself. It's almost completely
transparent to the network stack - all we see are scan notifications
when it finishes scanning each channel and beacons/probe responses
when it does its thing. Once it's done we get a final notification
that the scan is complete, with some scan results in the message.
The iwn(4) NICs handle doing active scanning too as an option
and will handle waiting appropriately on 5GHz passive channels
before active scanning.
There's some more refactoring, tidying up and lock assertions to
sprinkle around to tidy this whole thing up before I turn swscan.c
into another set of ic methods to override by the driver or
alternate scan module. So in theory this is all one big no-op
commit. In theory.
Tested:
* iwn(4) 5200, STA mode
* ath(4) 6205, STA mode
* ath(4) - various NICs, AP mode
has support for the .codeXX directives). However, it is desirable, for
a time, to allow kernels to be built with clang 3.4. Historically, it
has been advantageous to allow stable X-1 to build kernels the old
way (so long as the impact of doing so is small), and this restores
that ability.
Also, centralize the addition of ${ASM_CFLAGS.${.IMPSRC}}, place it in
kern.mk rather than kern.pre.mk so that all modules can benefit, and
give the same treatment to CFLAGS in kern.mk as well.
building with gcc 4.2
This has been requested several times over the past few months by several
people (including me), because gcc 4.2 just gets it wrong too often. It's
causing us to litter the code with lots of bogus initializers just to
squelch the warnings. We still have clang and coverity telling us about
uninitialized variables, and they do so more accurately.
mostly paves the way for the new pmap code, and shouldn't result in any
noticible behavior differences.
Submitted by: Svatopluk Kraus <onwahe@gmail.com>,
Michal Meloun <meloun@miracle.cz
CWARNFALGS.$file centrally so we don't have to have it in all the
places. Remove a few warning flags that are no longer needed.
Also, always use -Wno-unknown-pragma to (hopefully temporarily) work
around #pragma ident in debug.h in the opensolaris code. Remove some
stale warning suppression that's no longer necessary.
roughly 10 years, and the driver has not enjoyed any significant maintenance
since long before that. Despite well-meaning efforts from a number of
people, myself included, it never made the jump to 64-bit and was relegated
to the back-corners of i386. Now its frailty is hampering forward progress
with Clang. Any renewed engineering efforts are of course welcome and can
happen outside of the tree. No MFC of this is planned.
When we started compiling the kernel with -march=armv7 the compiler
started emitting new types of relocation info which are incompatible with
the shared-lib file format used by .ko modules. This workaround prevents
the compiler from emitting the instruction sequences that require the
new relocs. This amounts to using an undocumented internal compiler
flag, so this is just a temporary workaround while we look for a good fix.
PR: 196407
the place where the C dialect is selected. Have a fairly long list
of newly requires warning suppression for clang 3.5.0, also
centralized in kern.mk. Survive the fallout of the removal of
bsd.sys.mk from bsd.kmod.mk.
raft of new warnings that appear to be on by default in clang 3.5.0.
Fix RPI-B build issues with new clang not liking the ability to pass
arbitrary flags to as, since some flags are more arbitrary (and thus
verboten) than others.
These warnings should be actually fixed in the code, but this is a
band-aide to get things (almost) building again.
a) Front load as much work as possible in if_transmit, before any driver
lock or software queue has to get involved.
b) Replace buf_ring with a brand new mp_ring (multiproducer ring). This
is specifically for the tx multiqueue model where one of the if_transmit
producer threads becomes the consumer and other producers carry on as
usual. mp_ring is implemented as standalone code and it should be
possible to use it in any driver with tx multiqueue. It also has:
- the ability to enqueue/dequeue multiple items. This might become
significant if packet batching is ever implemented.
- an abdication mechanism to allow a thread to give up writing tx
descriptors and have another if_transmit thread take over. A thread
that's writing tx descriptors can end up doing so for an unbounded
time period if a) there are other if_transmit threads continuously
feeding the sofware queue, and b) the chip keeps up with whatever the
thread is throwing at it.
- accurate statistics about interesting events even when the stats come
at the expense of additional branches/conditional code.
The NIC txq lock is uncontested on the fast path at this point. I've
left it there for synchronization with the control events (interface
up/down, modload/unload).
c) Add support for "type 1" coalescing work request in the normal NIC tx
path. This work request is optimized for frames with a single item in
the DMA gather list. These are very common when forwarding packets.
Note that netmap tx in cxgbe already uses these "type 1" work requests.
d) Do not request automatic cidx updates every 32 descriptors. Instead,
request updates via bits in individual work requests (still every 32
descriptors approximately). Also, request an automatic final update
when the queue idles after activity. This means NIC tx reclaim is still
performed lazily but it will catch up quickly as soon as the queue
idles. This seems to be the best middle ground and I'll probably do
something similar for netmap tx as well.
e) Implement a faster tx path for WRQs (used by TOE tx and control
queues, _not_ by the normal NIC tx). Allow work requests to be written
directly to the hardware descriptor ring if room is available. I will
convert t4_tom and iw_cxgbe modules to this faster style gradually.
MFC after: 2 months
code alongside the existing implementation and quickly toggle between
the two implementations when testing. Once the new code is past its
teething stage we can remove this option.
initially set up the MMU. Some day they may also be useful as part of
suspend/resume handling, when we get better at power management.
Submitted by: Svatopluk Kraus <onwahe@gmail.com>,
Michal Meloun <meloun@miracle.cz
mechanism defined for armv7 (and also present on some armv6 chips including
the arm1176 used on rpi). The information is parsed into a global cpuinfo
structure, which will be used by (upcoming) new cache and tlb maintenance
code to handle cpu-specific variations of the maintence sequences.
Submitted by: Svatopluk Kraus <onwahe@gmail.com>,
Michal Meloun <meloun@miracle.cz
images into the kernel as currently included into iwn6000g2{a,b}fw.ko
and delete the old files, missed in r254199 and r259135 respectively.
MFC after: 3 days
which means that the NFSCLIENT and NFSSERVER
kernel options will no longer work. This commit
only removes the kernel components. Removal of
unused code in the user utilities will be done
later. This commit does not include an addition
to UPDATING, but that will be committed in a
few minutes.
Discussed on: freebsd-fs
the orphaned descendants. Base of the API is modelled after the same
feature from the DragonFlyBSD.
Requested by: bapt
Reviewed by: jilles (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
for counter mode), and AES-GCM. Both of these modes have been added to
the aesni module.
Included is a set of tests to validate that the software and aesni
module calculate the correct values. These use the NIST KAT test
vectors. To run the test, you will need to install a soon to be
committed port, nist-kat that will install the vectors. Using a port
is necessary as the test vectors are around 25MB.
All the man pages were updated. I have added a new man page, crypto.7,
which includes a description of how to use each mode. All the new modes
and some other AES modes are present. It would be good for someone
else to go through and document the other modes.
A new ioctl was added to support AEAD modes which AES-GCM is one of them.
Without this ioctl, it is not possible to test AEAD modes from userland.
Add a timing safe bcmp for use to compare MACs. Previously we were using
bcmp which could leak timing info and result in the ability to forge
messages.
Add a minor optimization to the aesni module so that single segment
mbufs don't get copied and instead are updated in place. The aesni
module needs to be updated to support blocked IO so segmented mbufs
don't have to be copied.
We require that the IV be specified for all calls for both GCM and ICM.
This is to ensure proper use of these functions.
Obtained from: p4: //depot/projects/opencrypto
Relnotes: yes
Sponsored by: FreeBSD Foundation
Sponsored by: NetGate
Mave the grant table code into the dev/xen folder in preparation for turning
it into a device using the newbus interface. This is just code motion, no
functional changes.
Sponsored by: Citrix Systems R&D
When running as a Xen PVH Dom0 we need to add custom buses that override
some of the functionality present in the ACPI PCI Bus and the PCI Bus. We
currently override the ACPI PCI Bus, but not the PCI Bus, so add a new
override for the PCI Bus and share the generic functions between them.
Reported by: David P. Discher <dpd@dpdtech.com>
Sponsored by: Citrix Systems R&D
conf/files.amd64:
- Add the new files.
x86/xen/xen_pci_bus.c:
- Generic file that contains the PCI overrides so they can be used by the
several PCI specific buses.
xen/xen_pci.h:
- Prototypes for the generic overried functions.
dev/xen/pci/xen_pci.c:
- Xen specific override for the PCI bus.
dev/xen/pci/xen_acpi_pci.c:
- Xen specific override for the ACPI PCI bus.
to automatically set the armv6 option when MACHINE_ARCH is armv6. That
allows replacing ever-growing lists of cpu names as options to compile
a given file with the using either "optional armv6" or "optional !armv6".
partitions. Several utilities still use this interface and require
additional information since gpart was activated than before. This
allows fsck of a UFS partition without having to specify it is UFS,
per historic behavior.
have chosen different (and more traditional) stateless/statuful
NAT64 as translation mechanism. Last non-trivial commits to both
faith(4) and faithd(8) happened more than 12 years ago, so I assume
it is time to drop RFC3142 in FreeBSD.
No objections from: net@
Split it into two modules: if_gre(4) for GRE encapsulation and
if_me(4) for minimal encapsulation within IP.
gre(4) changes:
* convert to if_transmit;
* rework locking: protect access to softc with rmlock,
protect from concurrent ioctls with sx lock;
* correct interface accounting for outgoing datagramms (count only payload size);
* implement generic support for using IPv6 as delivery header;
* make implementation conform to the RFC 2784 and partially to RFC 2890;
* add support for GRE checksums - calculate for outgoing datagramms and check
for inconming datagramms;
* add support for sending sequence number in GRE header;
* remove support of cached routes. This fixes problem, when gre(4) doesn't
work at system startup. But this also removes support for having tunnels with
the same addresses for inner and outer header.
* deprecate support for various GREXXX ioctls, that doesn't used in FreeBSD.
Use our standard ioctls for tunnels.
me(4):
* implementation conform to RFC 2004;
* use if_transmit;
* use the same locking model as gre(4);
PR: 164475
Differential Revision: D1023
No objections from: net@
Relnotes: yes
Sponsored by: Yandex LLC
problems than it solves. SYSDIR is already defined almost always and
can be used instead. Working around the one case where it isn't is
much easier than working around the fact that @ may not exist in 18
other places.
Differential Revision: https://reviews.freebsd.org/D1100
875. This intersects with the agp_i810.c, which supports all Intels
from i810 to Core i5/7. Both agp_intel.c and agp_i810.c are compiled
into kernel when device agp is specified in config, and agp_i810
attach seems to be selected by chance due to linking order.
Strip support for 810 and later from agp_intel.c. Since 440-class
chipsets do not support any long-mode capable CPUs, remove agp_intel.c
from amd64 kernel file list. Note that agp_intel.c is not compiled
into agp.ko on amd64 already.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources.
The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people.
The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway.
Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to.
My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise.
My Nomex pants are on. Let the feedback commence!
Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?)
Approved by: so(des)
Multipass device attachment was tested on many arm platforms by users and
only success was reported on the arm@ mailing list. This is just the
long-delayed followup of making it the default.
Multipass attachment is necessary when using vendor-supplied FDT data,
because our devices may need to be attached in a different order than they
are described in the FDT data.
This device is only attached to priviledged domains, and allows the
toolstack to interact with Xen. The two functions of the privcmd
interface is to allow the execution of hypercalls from user-space, and
the mapping of foreign domain memory.
Sponsored by: Citrix Systems R&D
i386/include/xen/hypercall.h:
amd64/include/xen/hypercall.h:
- Introduce a function to make generic hypercalls into Xen.
xen/interface/xen.h:
xen/interface/memory.h:
- Import the new hypercall XENMEM_add_to_physmap_range used by
auto-translated guests to map memory from foreign domains.
dev/xen/privcmd/privcmd.c:
- This device has the following functions:
- Allow user-space applications to make hypercalls into Xen.
- Allow user-space applications to map memory from foreign domains,
this is accomplished using the newly introduced hypercall
(XENMEM_add_to_physmap_range).
xen/privcmd.h:
- Public ioctl interface for the privcmd device.
x86/xen/hvm.c:
- Remove declaration of hypercall_page, now it's declared in
hypercall.h.
conf/files:
- Add the privcmd device to the build process.
The user-space event channel device is used by applications to receive
and send event channel interrupts. This device is based on the Linux
evtchn device.
Sponsored by: Citrix Systems R&D
xen/evtchn/evtchn_dev.c:
- Remove the old event channel device, which was already disabled in
the build system.
dev/xen/evtchn/evtchn_dev.c:
- Import a new event channel device based on the one present in
Linux.
- This device allows the following operations:
- Bind VIRQ event channels (ioctl).
- Bind regular event channels (ioctl).
- Create and bind new event channels (ioctl).
- Unbind event channels (ioctl).
- Send notifications to event channels (ioctl).
- Reset the device shared memory ring (ioctl).
- Unmask event channels (write).
- Receive event channel upcalls (read).
- The new code is MP safe, and can be used concurrently.
conf/files:
- Add the new device to the build system.
vxlan creates a virtual LAN by encapsulating the inner Ethernet frame in
a UDP packet. This implementation is based on RFC7348.
Currently, the IPv6 support is not fully compliant with the specification:
we should be able to receive UPDv6 packets with a zero checksum, but we
need to support RFC6935 first. Patches for this should come soon.
Encapsulation protocols such as vxlan emphasize the need for the FreeBSD
network stack to support batching, GRO, and GSO. Each frame has to make
two trips through the network stack, and each frame will be at most MTU
sized. Performance suffers accordingly.
Some latest generation NICs have begun to support vxlan HW offloads that
we should also take advantage of. VIMAGE support should also be added soon.
Differential Revision: https://reviews.freebsd.org/D384
Reviewed by: gnn
Relnotes: yes
vestige of a time when we needed to do that, but it is all handled by
beforedepend now. When we depend on the symlink, bmake will cause the
file to be rebuilt always.
With this change, dtrace.ko doesn't rebuild every time through a
KERNFAST run.
Sponsored by: Netfix
For compatibility, 'device windtunnel' is still supported, but one should use
'device adm1030' instead, and this has been updated in GENERIC and NOTES.
The powerpc support was the only supported architecture not prepending the elf format name
with "-freebsd" in base this change makes it consistent with other architectures.
On newer version of binutils the powerpc format is also prepended with "-freebsd".
Also modify the kernel ldscripts in that regards.
As a result it is now possible cross build the kernel on powerpc using newer binutils
Differential Revision: https://reviews.freebsd.org/D926
Differential Revision: https://reviews.freebsd.org/D928
syscalls themselves are tightly coupled with the network stack and
therefore should not be in the generic socket code.
The following four syscalls have been marked as NOSTD so they can be
dynamically registered in sctp_syscalls_init() function:
sys_sctp_peeloff
sys_sctp_generic_sendmsg
sys_sctp_generic_sendmsg_iov
sys_sctp_generic_recvmsg
The syscalls are also set up to be dynamically registered when COMPAT32
option is configured.
As a side effect of moving the SCTP syscalls, getsock_cap needs to be
made available outside of the uipc_syscalls.c source file. A proper
prototype has been added to the sys/socketvar.h header file.
API tests from the SCTP reference implementation have been run to ensure
compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout)
Submitted by: Steve Kiernan <stevek@juniper.net>
Reviewed by: tuexen, rrs
Obtained from: Juniper Networks, Inc.
interrupts and report the largest value seen as sysctl
debug.max_kstack_used. Useful to estimate how close the kernel stack
size is to overflow.
In collaboration with: Larry Baird <lab@gta.com>
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
the oabi is still in the tree, but it is expected this will be removed
as developers work on surrounding code.
With this commit the ARM EABI is the only supported supported ABI by
FreeBSD on ARMa 32-bit processors.
X-MFC after: never
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D876
This device is used by the user-space daemon that runs xenstore
(xenstored). It allows xenstored to map the xenstore memory page, and
reports the event channel xenstore is using.
Sponsored by: Citrix Systems R&D
dev/xen/xenstore/xenstored_dev.c:
- Add the xenstored character device that's used to map the xenstore
memory into user-space, and to report the event channel used by
xenstore.
conf/files:
- Add the device to the build process.
Move xenstore related devices (xenstore.c and xenstore_dev.c) from
xen/xenstore to dev/xen/xenstore. This is just code motion, no
functional changes.
Sponsored by: Citrix Systems R&D
This patch adds support for MSI interrupts when running on Xen. Apart
from adding the Xen related code needed in order to register MSI
interrupts this patch also makes the msi_init function a hook in
init_ops, so different MSI implementations can have different
initialization functions.
Sponsored by: Citrix Systems R&D
xen/interface/physdev.h:
- Add the MAP_PIRQ_TYPE_MULTI_MSI to map multi-vector MSI to the Xen
public interface.
x86/include/init.h:
- Add a hook for setting custom msi_init methods.
amd64/amd64/machdep.c:
i386/i386/machdep.c:
- Set the default msi_init hook to point to the native MSI
initialization method.
x86/xen/pv.c:
- Set the Xen MSI init hook when running as a Xen guest.
x86/x86/local_apic.c:
- Call the msi_init hook instead of directly calling msi_init.
xen/xen_intr.h:
x86/xen/xen_intr.c:
- Introduce support for registering/releasing MSI interrupts with
Xen.
- The MSI interrupts will use the same PIC as the IO APIC interrupts.
xen/xen_msi.h:
x86/xen/xen_msi.c:
- Introduce a Xen MSI implementation.
x86/xen/xen_nexus.c:
- Overwrite the default MSI hooks in the Xen Nexus to use the Xen MSI
implementation.
x86/xen/xen_pci.c:
- Introduce a Xen specific PCI bus that inherits from the ACPI PCI
bus and overwrites the native MSI methods.
- This is needed because when running under Xen the MSI messages used
to configure MSI interrupts on PCI devices are written by Xen
itself.
dev/acpica/acpi_pci.c:
- Lower the quality of the ACPI PCI bus so the newly introduced Xen
PCI bus can take over when needed.
conf/files.i386:
conf/files.amd64:
- Add the newly created files to the build process.
- Use bus_*() instead of bus_space_*().
- Use device_printf().
- Remove unused global variables and the extra warning suppression
they required.
- Use callout() instead of timeout().
Reviewed by: se
For now restrict it to amd64. Other architectures might be
re-added later once tested.
Remove the drivers from the global NOTES and files files and move
them to the amd64 specifics.
Remove the drivers from the i386 modules build and only leave the
amd64 version.
Rather than depending on "inet" depend on "pci" and make sure that
ixl(4) and ixlv(4) can be compiled independently [2]. This also
allows the drivers to build properly on IPv4-only or IPv6-only
kernels.
PR: 193824 [2]
Reviewed by: eric.joyner intel.com
MFC after: 3 days
References:
[1] http://lists.freebsd.org/pipermail/svn-src-all/2014-August/090470.html
for amd64/linux32. Fix the entirely bogus (untested) version from
r161310 for i386/linux using the same shared code in compat/linux.
It is unclear to me if we could support more clock mappings but
the current set allows me to successfully run commercial
32bit linux software under linuxolator on amd64.
Reviewed by: jhb
Differential Revision: D784
MFC after: 3 days
Sponsored by: DARPA, AFRL
The flowdirector feature shares on-chip memory with other things
such as the RX buffers. In theory it should be configured in a way
that doesn't interfere with the rest of operation. In practice,
the RX buffer calculation didn't take the flow-director allocation
into account and there'd be overlap. This lead to various garbage
frames being received containing what looks like internal NIC state.
What _I_ saw was traffic ending up in the wrong RX queues.
If I was doing a UDP traffic test with only one NIC ring receiving
traffic, everything is fine. If I fired up a second UDP stream
which came in on another ring, there'd be a few percent of traffic
from both rings ending up in the wrong ring. Ie, the RSS hash would
indicate it was supposed to come in ring X, but it'd come in ring Y.
However, when the allocation was fixed up, the developers at Verisign
still saw traffic stalls.
The flowdirector feature ends up fiddling with the NIC to do various
attempts at load balancing connections by populating flow table rules
based on sampled traffic. It's likely that all of that has to be
carefully reviewed and made less "magic".
So for now the flow director feature is disabled (which fixes both
what I was seeing and what they were seeing) until it's all much
more debugged and verified.
Tested:
* (me) 82599EB 2x10G NIC, RSS UDP testing.
* (verisign) not sure on the NIC (but likely 82599), 100k-200k/sec TCP
transaction tests.
Submitted by: Marc De La Gueronniere <mdelagueronniere@verisign.com>
MFC after: 1 week
Sponsored by: Verisign, Inc.
many thanks for their continued support of FreeBSD.
While I'm there, also implement a new build knob, WITHOUT_HYPERV to
disable building and installing of the HyperV utilities when necessary.
The HyperV utilities are only built for i386 and amd64 targets.
This is a stable/10 candidate for inclusion with 10.1-RELEASE.
Submitted by: Wei Hu <weh microsoft com>
MFC after: 1 week
PCI IDs into quirks, which mostly fit (though you'd get no argument
from me that AHCI_Q_SATA1_UNIT0 is oddly specific). Set these quirks
in the PCI attachment. Make some shared functions public so that PCI
and possibly other bus attachments can use them.
The split isn't perfect yet, but it is functional. The split will be
perfected as other bus attachments for AHCI are written.
Sponsored by: Netflix
Reviewed by: kan, mav
Differential Revision: https://reviews.freebsd.org/D699
friendliness. This should restore old-fashioned kernel building in a
cross environment, though this has only had limited testing.
Sponsored by: Netflix
in the clocks=<...> properties of their FDT data. The clock properties
consist of 2-cell tuples, each containing a clock device node reference and
a clock number. A clock device driver can register itself as providing
this interface, then other drivers can turn the FDT clock node reference
into the corresponding device_t so that they can use the interface to query
and manipulate their clocks.
This provides convenience functions to enable or disable all the clocks
listed in the properties for a device, so most drivers will be able to
manage their clocks with a single call to fdt_clock_enable_all(dev).
This is the last major change in given branch.
Kernel changes:
* Use 64-bytes structures to hold multi-value variables.
* Use shared array to hold values from all tables (assume
each table algo is capable of holding 32-byte variables).
* Add some placeholders to support per-table value arrays in future.
* Use simple eventhandler-style API to ease the process of adding new
table items. Currently table addition may required multiple UH drops/
acquires which is quite tricky due to atomic table modificatio/swap
support, shared array resize, etc. Deal with it by calling special
notifier capable of rolling back state before actually performing
swap/resize operations. Original operation then restarts itself after
acquiring UH lock.
* Bump all objhash users default values to at least 64
* Fix custom hashing inside objhash.
Userland changes:
* Add support for dumping shared value array via "vlist" internal cmd.
* Some small print/fill_flags dixes to support u32 values.
* valtype is now bitmask of
<skipto|pipe|fib|nat|dscp|tag|divert|netgraph|limit|ipv4|ipv6>.
New values can hold distinct values for each of this types.
* Provide special "legacy" type which assumes all values are the same.
* More helpers/docs following..
Some examples:
3:41 [1] zfscurr0# ipfw table mimimi create valtype skipto,limit,ipv4,ipv6
3:41 [1] zfscurr0# ipfw table mimimi info
+++ table(mimimi), set(0) +++
kindex: 2, type: addr
references: 0, valtype: skipto,limit,ipv4,ipv6
algorithm: addr:radix
items: 0, size: 296
3:42 [1] zfscurr0# ipfw table mimimi add 10.0.0.5 3000,10,10.0.0.1,2a02:978:2::1
added: 10.0.0.5/32 3000,10,10.0.0.1,2a02:978:2::1
3:42 [1] zfscurr0# ipfw table mimimi list
+++ table(mimimi), set(0) +++
10.0.0.5/32 3000,0,10.0.0.1,2a02:978:2::1
- It was decided to change the driver name to if_ixl for FreeBSD
- This release adds the VF Driver to the tree, it can be built into
the kernel or as the if_ixlv module
- The VF driver is independent for the first time, this will be
desireable when full SRIOV capability is added to the OS.
- Thanks to my new coworker Eric Joyner for his superb work in
both the core and vf driver code.
Enjoy everyone!
Submitted by: jack.vogel@intel.com and eric.joyner@intel.com
MFC after: 3 days (hoping to make 10.1)
UNIX systems, eg. MacOS X and Solaris. It uses Sun-compatible map format,
has proper kernel support, and LDAP integration.
There are still a few outstanding problems; they will be fixed shortly.
Reviewed by: allanjude@, emaste@, kib@, wblock@ (earlier versions)
Phabric: D523
MFC after: 2 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.
Basically no user API changes (some bugfixes in sys/net/netmap_user.h).
In detail:
1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.
2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial
3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.
4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.
5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured gues-host throughput up to 10-12 Mpps).
A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.
Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.
This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.
A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.
MFC after: 3 days.
cutover is, but we need better tools to cope with inline tuning per
compiler version than we have. This is a quick bandaid until such
tools are around.
socket options. This includes managing the correspoing stat counters.
Add the SCTP_DETAILED_STR_STATS kernel option to control per policy
counters on every stream. The default is off and only an aggregated
counter is available. This is sufficient for the RTCWeb usecase.
MFC after: 1 week
options into kern.opts.mk and change all the places where we use
src.opts.mk to pull in the options. Conditionally define SYSDIR and
use SYSDIR/conf/kern.opts.mk instead of a CURDIR path. Replace all
instances of CURDIR/../../etc with STSDIR, but only in the affected
files.
As a special compatibility hack, include bsd.owm.mk at the top of
kern.opts.mk to allow the bare build of sys/modules to work on older
systems. If the defaults ever change between 9.x, 10.x and current for
these options, however, you'll wind up with the host OS' defaults
rather than the -current defaults. This hack will be removed when
we no longer need to support this build scenario.
Reviewed by: jhb
Differential Revision: https://phabric.freebsd.org/D529
device attachment on arm platforms. If this is defined, nexus attaches
early in BUS_PASS_BUS, and other busses and devices attach later, in the
pass number they are set up for. Without it defined, nexus attaches in
BUS_PASS_DEFAULT and thus so does everything else, which is status quo.
Arm platforms which use FDT data to enumerate devices have been relying
on devices being attached in the exact order they're listed in the dts
source file. That's one of things currently preventing us from using
vendor-supplied fdt data (because then we don't control the order of the
devices in the data). Multi-pass attachment can go a long way towards
solving that problem by ensuring things like clock and interrupt drivers
are attached before the more mundane devices that need them.
The long-term goal is to have all arm fdt-based platforms using multipass.
This option is a bridge to that, letting us enable it selectively as
platforms are converted and tested (the alternative being to just throw
a big switch and try to fight fires as they're reported).
handled by creator(4) (Sun Creator 3D, Elite 3D, etc.). This provides
vt(4) consoles on all devices currently supported by syscons on sparc64.
The driver should also be easily adaptable to support newer Sun framebuffers
such as the XVR-500 and higher.
Many thanks to dumbbell@ (Jean-Sebastien Pedron) for testing this remotely
during development.
The MD allocators were very common, however there were some minor
differencies. These differencies were all consolidated in the MI allocator,
under ifdefs. The defines from machine/vmparam.h turn on features required
for a particular machine. For details look in the comment in sys/sf_buf.h.
As result no MD code left in sys/*/*/vm_machdep.c. Some arches still have
machine/sf_buf.h, which is usually quite small.
Tested by: glebius (i386), tuexen (arm32), kevlo (arm32)
Reviewed by: kib
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
opt_inet6.h into kmod.mk by forcing almost everybody to eat the same
dogfood. While at it, consolidate the opt_bpf.h and opt_mroute.h
targets here too.
provides support for a variety of low-end graphics hardware (SBus adapters,
Mach64, QEMU's framebuffer, XVR-100). A driver for at least the Creator3D
cards will have to be present before this can become the default console
driver.
To test vt(4) on sparc64, set kern.vty=vt at the loader prompt.
* Rewrite interface tables to use interface indexes
Kernel changes:
* Add generic interface tracking API:
- ipfw_iface_ref (must call unlocked, performs lazy init if needed, allocates
state & bumps ref)
- ipfw_iface_add_ntfy(UH_WLOCK+WLOCK, links comsumer & runs its callback to
update ifindex)
- ipfw_iface_del_ntfy(UH_WLOCK+WLOCK, unlinks consumer)
- ipfw_iface_unref(unlocked, drops reference)
Additionally, consumer callbacks are called in interface withdrawal/departure.
* Rewrite interface tables to use iface tracking API. Currently tables are
implemented the following way:
runtime data is stored as sorted array of {ifidx, val} for existing interfaces
full data is stored inside namedobj instance (chained hashed table).
* Add IP_FW_XIFLIST opcode to dump status of tracked interfaces
* Pass @chain ptr to most non-locked algorithm callbacks:
(prepare_add, prepare_del, flush_entry ..). This may be needed for better
interaction of given algorithm an other ipfw subsystems
* Add optional "change_ti" algorithm handler to permit updating of
cached table_info pointer (happens in case of table_max resize)
* Fix small bug in ipfw_list_tables()
* Add badd (insert into sorted array) and bdel (remove from sorted array) funcs
Userland changes:
* Add "iflist" cmd to print status of currently tracked interface
* Add stringnum_cmp for better interface/table names sorting
so it really should not be under "optional inet". The fact that uipc_accf.c
lives under kern/ lends some weight to making it a "standard" file.
Moving kern/uipc_accf.c from "optional inet" to "standard" eliminates the
need for #ifdef INET in kern/uipc_socket.c.
Also, this meant the net.inet.accf.unloadable sysctl needed to move, as
net.inet does not exist without networking compiled in (as it lives in
netinet/in_proto.c.) The new sysctl has been named net.accf.unloadable.
In order to support existing accept filter sysctls, the net.inet.accf node
has been added netinet/in_proto.c.
Submitted by: Steve Kiernan <stevek@juniper.net>
Obtained from: Juniper Networks, Inc.
completely silenced. Make sure these warnings appear again, so there is
some incentive to fix them, but do not error out the whole kernel build
for them.
Noticed by: steven@pyro.eu.org
PR: 191867
MFC after: 3 days
This allows to clone VMs and move them between LUNs inside one storage
host without generating extra network traffic to the initiator and back,
and without being limited by network bandwidth.
LUNs participating in copy operation should have UNIQUE NAA or EUI IDs set.
For LUNs without these IDs VMWare will use traditional copy operations.
Beware: the above LUN IDs explicitly set to values non-unique from the VM
cluster point of view may cause data corruption if wrong LUN is addressed!
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This includes:
o All directories named *ia64*
o All files named *ia64*
o All ia64-specific code guarded by __ia64__
o All ia64-specific makefile logic
o Mention of ia64 in comments and documentation
This excludes:
o Everything under contrib/
o Everything under crypto/
o sys/xen/interface
o sys/sys/elf_common.h
Discussed at: BSDcan
console's ability to enter the debugger.... rwatson forgot to document
this when he changed it back in 2011... There is more docs to write
about this, but at least fix this for now...
Reviewed by: emaste
MFC after: 1 week
Both vt(4) and ofwfb(4) need a lot of love to be usable on sparc64 and even
then the performance of ofwfb(4) would suck compared to hardware accelerated
drivers like creator(4) and machfb(4).
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.
MFC after: 1 month
implement options TERMINAL_{KERN,NORM}_ATTR. These are aliased to
SC_{KERNEL_CONS,NORM}_ATTR and like these latter, allow to change the
default colors of normal and kernel text respectively.
Note on the naming: Although affecting the output of vt(4), technically
kern/subr_terminal.c is primarily concerned with changing default colors
so it would be inconsistent to term these options VT_{KERN,NORM}_ATTR.
Actually, if the architecture and abstraction of terminal+teken+vt would
be perfect, dev/vt/* wouldn't be touched by this commit at all.
Reviewed by: emaste
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH
configs. Switch the BERI_NETFPGA_MDROOT to 64bit by default.
Give we have working interrupts also cleanup the extra polling CFLAGS from
the module Makefile.
MFC after: 2 weeks