Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.
This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed). This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP. It also permits all CPUs to be available for
handling interrupts before any devices are probed.
This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot. Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.
However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system. In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU. Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.
Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code. This removes the need to treat the single-CPU boot environment
as a special case.
As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP). This will allow the option to be turned off
if need be during initial testing. I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0. Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.
These changes have only been tested on x86. Other platform maintainers
are encouraged to port their architectures over as well. The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).
PR: kern/199321
Reviewed by: markj, gnn, kib
Sponsored by: Netflix
These firmwares were obtained from the "Chelsio T5/T4 Unified Wire
v2.12.0.3 for Linux" release. Changes since 1.14.4.0 (which is the
firmware in -STABLE branches) are in the "Release Notes" accompanying
the Unified Wire release and are copy-pasted here as well.
22.1. T5 Firmware
+++++++++++++++++++++++++++++++++
Version : 1.15.37.0
Date : 04/27/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where the default ingress
queue was ignored.
- Fixed an issue where adapter failed to load fw by adjusting DRAM frequency.
- Fixed an issue in watchdog which was causing VM bring-up failure after reboot.
- Fixed 40G link failures with some switches when auto-negotiation enabled.
- Fixed to improve on link bring-up time.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where bogus d3hot bits were set causing traffic stall.
- Fixed an issue where sometimes adapter was not seen after reboot.
- Fixed an issue where iWARP was crashing in conjunction with traffic management.
- Fixed an issue where link failed to come up after removing twinax cable and
inserting optical module.
ETH
- Fixed a link flap issue on T580-CR.
OFLD
- Fixed a potential iSCSI data corruption issue by disabling RxFragEn flag.
FOiSCSI
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP was not been provisioned yet and driver tried to
use given inerface.
- Fixed an issue where fw was sending ENETUNREACH event for normal tcp
disconnection.
DCBX
- Fixed an issue where iscsi tlv is sent incorrectly to host. (DCBX CEE)
- Fixed an issue where apply bit set for APP id was affecting the ETS and PFC
settings.(DCBX IEEE)
- Fixed an issue where app priority values are not handled correctly in fw.
(DCBX IEEE)
- Fixed an issue where enable/disable dcbx can cause crash. (DCBX CEE,DCBX IEEE)
FOFCoE
- Removed BB6 support.
ENHANCEMENTS
------------
BASE:
- Added new interface to program DCA settings in SGE contexts; allow 32-byte
IQE size
- Added PTP interface fw_ptp_ts to support PTP Frequeny and Offset adjustment.
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
OFLD:
- WR opcode is returned to host in cqe error response.
22.2. T4 Firmware
+++++++++++++++++
Version : 1.15.37.0
Date : 04/27/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where default ingress queue
was ignored.
- Fixed an issue in watchdog which was causing VM bring-up failure after reboot.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where iWARP was crashing in conjunction with traffic management.
FOiSCSI:
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP had not been provisioned yet and driver tried to
use given inerface.
DCBX
- Fixed an issue where iscsi tlv is sent incorrectly to host.(DCBX CEE)
- Fixed an issue where enable/disable dcbx can cause crash in firmware.(DCBX CEE)
FOiSCSI
- Fixes an issue where fw was sending ENETUNREACH event for normal tcp
disconnection.
FOFCoE
- Removed BB6 support.
ENHANCEMENTS
------------
BASE:
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
================================================================================
Obtained from: Chelsio Communications
MFC after: 6 weeks
Relnotes: yes
Sponsored by: Chelsio Communications
gpiokey driver implements functional subset of gpiokeys device-tree bindings:
https://www.kernel.org/doc/Documentation/devicetree/bindings/input/gpio-keys.txt
It acts as a virtual keyboard, so keys are visible through kbdmux(4)
Driver maps linux scancodes for most common keys to FreeBSD scancodes and
also extends spec by introducing freebsd,code property to specify
FreeBSD-native scancodes.
Reviewed by: mmel, jmcneill
Differential Revision: https://reviews.freebsd.org/D6279
legacy siba sentry5 cpu glue.
The siba_cc code is the hard-coded chipcommon bits for the sentry s5,
which will eventually be replaced with the more flexible bhnd sipa/cc
code.
bwn, etc uses siba_bwn, which doesn't use siba or siba_cc to do anything.
Turns out that ye olde siba.c is /just/ the siba mips code (used by
the initial SENTRY5 port. However, I don't think it was ever
finished enough to be useful, and I do have this nagging feeling
that we'll eventually replace it with the bhnd code.
But, since bhnd(4) introduced siba.c too, we ended up with a
source file name clash, and that broke the SENTRY5 build.
It /looks/ like this is the only place siba.c / device siba is
used.
Trim the leading directory of a firmware source file from the resulting
target object file name so the object file is stored in the object
directory. Previously, using 'FIRMWS= /path/to/fw.bin:fw.bin' would
store the generated 'fw.bin.fwo' file in the /path/to directory. Now
it stores it in the object directory of the kernel module being built.
Reviewed by: bdrewery
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D6285
This adds support for the NVRAM handling and the basic SPROM
hardware used on siba(4) and bcma(4) devices, including:
* SPROM directly attached to the PCI core, accessible via PCI configuration
space.
* SPROM attached to later ChipCommon cores.
* SPROM variables vended from the parent SoC bus (e.g. via a directly-attached
flash device).
Additional improvements to the NVRAM/SPROM interface will
be required, but this changeset stands alone as working
checkpoint.
Submitted by: Landon Fuller <landonf@landonf.org>
Reviewed by: Michael Zhilin <mizkha@gmail.com> (Broadcom MIPS support)
Differential Revision: https://reviews.freebsd.org/D6196
PCI-express HotPlug support is implemented via bits in the slot
registers of the PCI-express capability of the downstream port along
with an interrupt that triggers when bits in the slot status register
change.
This is implemented for FreeBSD by adding HotPlug support to the
PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges
representing downstream ports on HotPlug slots. The PCI-PCI bridge
driver registers an interrupt handler to receive HotPlug events. It
also uses the slot registers to determine the current HotPlug state
and drive an internal HotPlug state machine. For simplicty of
implementation, the PCI-PCI bridge device detaches and deletes the
child PCI device when a card is removed from a slot and creates and
attaches a PCI child device when a card is inserted into the slot.
The PCI-PCI bridge driver provides a bus_child_present which claims
that child devices are present on HotPlug-capable slots only when a
card is inserted. Rather than requiring a timeout in the RC for
config accesses to not-present children, the pcib_read/write_config
methods fail all requests when a card is not present (or not yet
ready).
These changes include support for various optional HotPlug
capabilities such as a power controller, mechanical latch,
electro-mechanical interlock, indicators, and an attention button.
It also includes support for devices which require waiting for
command completion events before initiating a subsequent HotPlug
command. However, it has only been tested on ExpressCard systems
which support surprise removal and have none of these optional
capabilities.
PCI-express HotPlug support is conditional on the PCI_HP option
which is enabled by default on arm64, x86, and powerpc.
Reviewed by: adrian, imp, vangyzen (older versions)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D6136
Clocks, GPIO, UART, SD card / eMMC, USB, watchdog, and ethernet are
supported. Note that the A83T contains two clusters of four Cortex-A7
CPUs, and only CPUs in first cluster are started for now.
Tested on a Sinovoip Banana Pi BPI-M3.
This is an initial work in progress to use the replacement bhnd
bus code for devices which support it.
* Add manpage updates for bhnd, bhndb, siba
* Add kernel options for bhnd, bhndbus, etc
* Add initial support in if_bwn_pci / if_bwn_mac for using bhnd
as the bus transport for suppoted NICs
* if_bwn_pci will eventually be the PCI bus glue to interface to bwn,
which will use the right backend bus to attach to, versus direct
nexus/bhnd attachments (as found in embedded broadcom devices.)
The PCI glue defaults to probing at a lower level than the bwn glue,
so bwn should still attach as per normal without a boot time tunable set.
It's also not fully fleshed out - the bwn probe/attach code needs to be
broken out into platform and bus specific things (just like ath, ath_pci,
ath_ahb) before we can shift the driver over to using this.
Tested:
* BCM4311, STA mode
* BCM4312, STA mode
Submitted by: Landon Fuller <landonf@landonf.org>
Differential Revision: https://reviews.freebsd.org/D6191
* Break out the 'g' phy code;
* Break out the debugging bits into a separate source file, since
some debugging prints are done in the phy code;
* Make some more chip methods in if_bwn.c public.
This brings the size of if_bwn.c down to 6,805 lines which is now
approaching managable.
This (and eventually migrating the other PHY code out) is in preparation
for adding the 11n PHY. No, the 11ac PHY (for the BCM4260 softmac part) isn't
yet open source, so we can't grow that. Yet.
This trims ~3,700 lines of code from if_bwn.c, bringing it down to a slightly
less crazy sounding 10,446 lines of code.
implementations. Early in the boot the kernel will use an approximate,
however after the timer has been probed it will switch to a more accurate
implementation.
Reviewed by: manu
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D5762
There is no need to clear all the DDR memory (we only need to clear
BSS section).
I was playing with non-default version of hardware (the bitfile
synthesized for 4-level page memory system) and clearing was helpful,
but then realized support for 4-level page system is untested/broken
in both RocketCore and lowRISC.
The additional regex replacements are actully required due to an
elfcopy bug which is now fixed (by r298361), not a Clang/GCC issue.
Sponsored by: The FreeBSD Foundation
Newest CLANG objcpy uses different name parsing.
Modify regexp to match (i.e. avoid substitution
of "/" or "-" with "_").
Obtained from: Semihalf
Sponsored by: Juniper Networks
Reviewed by: hselasky, zbb
Differential Revision: https://reviews.freebsd.org/D5873
fdt_static_dtb.S dependency in sys/conf/files is currently set as:
$S/boot/fdt/dts/${MACHINE}/${FDT_DTS_FILE}
This is wrong, as what fdt_static_dtb.S actually uses is the DTB file
produced from the FDT_DTS_FILE.
In addition it also makes using DTS files stored in $S/gnu/dts/${MACHINE}/
impossible.
So, change the dependency to "fdt_dtb_file", which seems to be the right
option here anyway.
Approved by: adrian (mentor)
Sponsored by: Smartcom - Bulgaria AD
Differential Revision: https://reviews.freebsd.org/D5963
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: jhb, kib, sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5910
It allows implementing loadable kernel modules with new actions and
without needing to modify kernel headers and ipfw(8). The module
registers its action handler and keyword string, that will be used
as action name. Using generic syntax user can add rules with this
action. Also ipfw(8) can be easily modified to extend basic syntax
for external actions, that become a part base system.
Sample modules will coming soon.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
as before. The common scheduling bits have moved from inline code in
each of the CAM periph drivers into a library that implements the
default scheduling.
In addition, a number of rate-limiting and I/O preference options can
be enabled by adding CAM_IOSCHED_NETFLIX to your config file. A number
of extra stats are also maintained. CAM_IOSCHED_NETFLIX isn't on by
default because it uses a separate BIO_READ and BIO_WRITE queue, so
doesn't honor BIO_ORDERED between these two types of operations. We
already didn't honor it for BIO_DELETE, and we don't depend on
BIO_ORDERED between reads and writes anywhere in the system (it is
currently used with BIO_FLUSH in ZFS to make sure some writes are
complete before others start and as a poor-man's soft dependency in
one place in UFS where we won't be issuing READs until after the
operation completes). However, out of an abundance of caution, it
isn't enabled by default.
Plus, this also brings in NCQ TRIM support for those SSDs that support
it. A black list is also provided for known rogues that use NCQ trim
as an excuse to corrupt the drive. It was difficult to separate out
into a separate commit.
This code has run in production at Netflix for over a year now.
Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D4609
VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in
the virtual memory system. DEVICE_NUMA is used to enable affinity
reporting for devices such as bus_get_domain().
MAXMEMDOM must still be set to a value greater than for any NUMA support
to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is
enabled and the system supports NUMA.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5782
This optimization attempts to utylize as wide as possible register store instructions to zero large buffers.
The implementation, if possible, will use 'dc zva' to zero buffer by cache lines.
Speedup: 60x faster memory zeroing
Submitted by: Dominik Ermel <der@semihalf.com>
Obtained from: Semihalf
Sponsored by: Cavium
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5726
separate driver. Add support for activating clock and hwreset resources
for these devices when the EXT_RESOURCES option is present.
Reviewed by: andrew, mmel, Emmanuel Vadot <manu@bidouilliste.com>
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D5749
PowerPC has real Open Firmware and does not necessarily need FDT.
Make ofwpci.c only PCI dependent.
Pointed out by: emaste
Reviewed by: nwhitehorn
Obtained from: Semihalf
Import portions of the PowerPC OF PCI implementation into new file
"ofwpci.c", common for other platforms. The files ofw_pci.c and ofw_pci.h
from sys/powerpc/ofw no longer exist. All required declarations are moved
to sys/dev/ofw/ofwpci.h. This creates a new ofw_pci_write_ivar() function
and modifies some others methods. Most functions contain existing ppc
implementations in the majority unchanged. Now there is no need to have
multiple identical copies of methods for various architectures.
Requested by: jhibbits
Reviewed by: jhibbits, marius
Submitted by: Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Annapurna Labs
Differential Revision: https://reviews.freebsd.org/D4879
In r296926 the -P <path> option was added to kbdcontrol, which enables
this change for a simplified compile-time default keymap build process.
PR: 193865
Reviewed by: Oliver Pinter
Tested by: Oliver Pinter
MFC After: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5708
Make it a device option to be included in the kernel configs that request this file.
Reported by: mmel
Suggested by: mmel
Reviewed by: mmel
Differential Revision: https://reviews.freebsd.org/D5699
support frameworks (i.e. clk/regulators/tsensors/fuses...).
It provides simple unified consumers interface for manipulations with
phy (USB/SATA/PCIe) resources.
support frameworks(i.e. clk/reset/phy/tsensors/fuses...).
The framework is still far from perfect and probably doesn't have stable
interface yet, but we want to start testing it on more real boards and
different architectures.
Keymap header files have historically been generated using the build
host's /usr/sbin/kbdcontrol and using the host's keymap files.
However, that introduces an issue when building a kernel to use vt(4)
on a system using sc(4), or vice versa: kbdcontrol searches for keymap
files in the /usr/share subdirectory appropriate for the host, not the
target.
With this change the build searches both the and sc keymap directories
from the source tree.
PR: 193865
Submitted by: Harald Schmalzbauer
The inclusion of .MAKE.DEPENDFILE (.depend) has special logic in make
to ignore stale/missing dependencies. bmake 20160220 added a '.dinclude'
directive that uses the special logic for .depend when including the file.
This fixes a build error when a file is moved or deleted that exists in a
.depend.OBJ file. This happened in r292782 when sha512c.c "moved" and an
incremental build of lib/libmd would fail with:
make: don't know how to make /usr/src/lib/libcrypt/../libmd/sha512c.c. Stop
Now this will just be seen as a stale dependency and cause a rebuild:
make: /usr/obj/usr/src/lib/libmd/.depend.sha512c.o, 13: ignoring stale .depend for /usr/src/lib/libcrypt/../libmd/sha512c.c
--- sha512c.o ---
...
This rebuild will only be done once since the .depend.sha512c.o will
be updated on the build with the -MF flags.
This also removes -MP being passed for the .depend.OBJ generation (which
would create fake targets for system headers) since the logic is no
longer needed to protect from missing files.
Sponsored by: EMC / Isilon Storage Division
This was a regression in r295985.
bsd.dep.mk adds to SRCS for dtrace probes, yacc grammars and some
others.
The code that is moving is planned to be removed once FAST_DEPEND is
default (and the only option) though since FAST_DEPEND doesn't use this.
Pointyhat to: bdrewery
Sponsored by: EMC / Isilon Storage Division
improve cancellation robustness.
Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subystem now exports library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.
A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.
The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.
Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order. Socket requests will not block indefinitely
permitting timely cancellation of all requests.
Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels. The VFS_AIO
kernel option and aio.ko module are gone.
Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon. This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system. To protect against this, AIO
requests are only permitted for known "safe" files by default. AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_usafe sysctl to a non-zero value. The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.
Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default. aio_mlock() is also enabled by default.
Reviewed by: cem, jilles
Discussed with: kib (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5289
These firmwares were obtained from the beta "Chelsio T5/T4 Unified Wire
v2.12.0.2 for Linux" release. Changes since last release are listed in the
"Release Notes" accompanying the beta release and are copy-pasted here as well.
The plan is to have only GA'd firmwares in any -STABLE FreeBSD branch so I'll
MFC this (after 2 months) only if it ends up in a GA release.
================================================================================
================================================================================
22.1. T5 Firmware
+++++++++++++++++++++++++++++++++
Version : 1.15.28.0
Date : 02/29/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where the default ingress
queue was ignored.
- Fixed an issue where adapter failed to load fw by adjusting DRAM frequency.
- Fixed an issue in watchdog which was causing VM bring-up failure after
reboot.
- Fixed 40G link failures with some switches when auto-negotiation enabled.
- Fixed to improve on link bring-up time.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where bogus d3hot bits were set causing traffic stall.
- Fixed an issue where sometimes adapter was not seen after reboot.
- Fixed an issue where iWARP was crashing in conjunction with traffic
management.
- Fixed an issue where link failed to come up after removing twinax cable and
inserting optical module.
OFLD
- Fixed a potential iSCSI data corruption issue by disabling RxFragEn flag.
FOiSCSI
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP was not been provisioned yet and driver tried to
use given inerface.
ENHANCEMENTS
------------
BASE:
- Added new interface to program DCA settings in SGE contexts; allow 32-byte
IQE size
- Added PTP interface fw_ptp_ts to support PTP Frequeny and Offset adjustment.
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
OFLD:
- WR opcode is returned to host in cqe error response.
================================================================================
================================================================================
22.2. T4 Firmware
+++++++++++++++++
Version : 1.15.28.0
Date : 02/29/2016
================================================================================
FIXES
-----
BASE:
- Fixed an issue in FW_RSS_VI_CONFIG_CMD handling where default ingress queue
was ignored.
- Fixed an issue in watchdog which was causing VM bring-up failure after
reboot.
- Per port buffer groups size doubled to improve performance.
- Fixed an issue where iWARP was crashing in conjunction with traffic
management.
FOiSCSI:
- Fixed an issue in recovery path where connection was getting closed before
recovery processing was done.
- Fixed an issue in TCP port reuse.
- Fixed an issue in recovery path when large number (>64) of iSCSI connections
were in use.
- Returned ENETUNREACH if IP had not been provisioned yet and driver tried to
use given inerface.
ENHANCEMENTS
------------
BASE:
- Added MPS raw interface.
ETH:
- New mailbox command FW_DCB_IEEE_CMD api added for IEEE dcbx.
================================================================================
Obtained from: Chelsio Communications
MFC after: 2 months
Sponsored by: Chelsio Communications
different methods to start the secondary cores in a kernel built for
multiple SoCs, e.g. with the Allwinner A20 and A31.
Sponsored by: ABT systems Ltd
Differential Revision: https://reviews.freebsd.org/D5466
This allows 'make analyze' or 'make OBJ.clang-analyzer' to run the
Clang static analyzer and present results on stdout.
Obtained from: NetBSD (CVS Rev. 1.3)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5449
will be needed when we bring in further support for these SoCs.
Submitted by: Emmanuel Vadot <manu@bidouilliste.com>
Differential Revision: https://reviews.freebsd.org/D5340
This will generate dependencies rather than depending on the previous behavior
of depending on the guessed OBJS: *.h dependecies or a user running
'make depend'.
Experimentation showed that depending only on headers was not enough and
prone to .ORDER errors. Downstream users may also have added
dependencies into beforedepend or afterdepend targets. The safest way to
ensure dependencies are generated before build is to run 'make depend'
beforehand rather than just depending on DPSRCS+SRCS.
Note that the OBJS_DEPEND_GUESS mechanism (a.k.a .if !exists(.depend) then
foo.o: *.h) is still useful as it improves incremental builds with missing
.depend.* files and allows 'make foo.o' to usually work, while this
'beforebuild: depend' ensures that the build will always find all dependencies.
The 'make foo.o' case has no means of a 'beforebuild' hook.
This also removes several hacks in the DIRDEPS_BUILD:
- NO_INSTALL_INCLUDES is no longer needed as it mostly was to work around
.ORDER problems with building the needed headers early.
- DIRDEPS_BUILD: It is no longer necesarry to track "local dependencies" in
Makefile.depend.
These were only in Makefile.depend for 'clean builds' since nothing would
generate the files due to skipping 'make depend' and early dependency
bugs that have been fixed, such as adding headers into SRCS for the
OBJS_DEPEND_GUESS mechanism. Normally if a .depend file does not exist then
a dependency is added by bsd.lib.mk/bsd.prog.mk from OBJS: *.h. However,
meta.autodep.mk creates a .depend file from created meta files and inserts
that into Makefile.depend. It also only tracks *.[ch] files though which can
miss some dependencies that are hooked into 'make depend'. This .depend
that is created then breaks incremental builds due to the !exists(.depend)
checks for OBJS_DEPEND_GUESS. The goal was to skip 'make depend' yet it only
really works the first time. After that files are not generated as expected,
which r288966 tried to address but was using buildfiles: rather than
beforebuild: and was reverted in r291725. As noted previously,
depending only on headers in beforebuild: would create .ORDER errors
in some cases.
meta.autodep.mk is still used to generate Makefile.depend though via:
gendirdeps: Makefile.depend
.END: gendirdeps
This commit allows removing all of the "local dependencies" in
Makefile.depend which cuts down on churn and removes some of the
arch-dependent Makefile.depend files.
The "local dependencies" were also problematic for bootstrapping.
Sponsored by: EMC / Isilon Storage Division
FAST_DEPEND is intended to be the "skip 'make depend' and mkdep"
feature. Since DIRDEPS_BUILD does this already with some of its own
hacks, and filemon doesn't need this, and nofilemon does, teach it how
to handle each of these cases.
In meta+filemon mode filemon will handle dependencies itself via the
meta mode logic in bmake. We still want to set MK_FAST_DEPEND=yes to
enable some logic that indicates that 'make depend' is skipped in the
traditional sense. The actual .depend.* files will be skipped.
When nofilemon is set though we still need to track and generate dependencies.
Sponsored by: EMC / Isilon Storage Division
The .depend file will still be generated if _EXTRADEPEND is used. The target
is kept with a dependency on DPSRCS though so that 'make depend' will generate
all files.
Sponsored by: EMC / Isilon Storage Division
Rather than depend on .depend not existing, check the actual
.depend.OBJ file that will be used for that object. If it doesn't
exist then use the guessed dependencies.
FAST_DEPEND may never have a .depend file. Not having one means all of the
previous logic would over-depend all object files on all headers which is not
what we wanted. It also means that if a .depend is generated before a build
is done for _EXTRADEPEND (such as for PROG or LIB) then all of these
dependencies would not be used since the .depend wasn't generated from mkdep
and the real .depend.* files are not generated until the build.
Sponsored by: EMC / Isilon Storage Division
The 'cleanilinks' target is kept since it may still be useful as added in
r200178, though never documented.
Sponsored by: EMC / Isilon Storage Division
Tested on Spike simulator with 2 and 16 cores (tlb enabled),
so set MAXCPU to 16 at this time.
This uses FDT data to get information about CPUs
(code based on arm64 mp_machdep).
Invalidate entire TLB cache as it is the only way yet.
Sponsored by: DARPA, AFRL
Sponsored by: HEIF5
* provided OFW interface for pci_host_generic (for handling devices which are present in DTS under the PCI node)
* removed support for internal PCI from arm64/cavium
* cleaned up and made most of the code common
Obtained from: Semihalf
Sponsored by: Cavium
Approved by: cognet (mentor)
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D5261
Split heartbeat, shutdown and timesync out of utils code
and name them properly.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: adrian, sephe, Hongjiang Zhang <honzhan microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5216
and geom_uncompress(4):
1. mkuzip(8):
- Proper support for eliminating all-zero blocks when compressing an
image. This feature is already supported by the geom_uzip(4) module
and CLOOP format in general, so it's just a matter of making mkuzip(8)
match. It should be noted, however that this feature while it sounds
great, results in very slight improvement in the overall compression
ratio, since compressing default 16k all-zero block produces only 39
bytes compressed output block, which is 99.8% compression ratio. With
typical average compression ratio of amd64 binaries and data being
around 60-70% the difference between 99.8% and 100.0% is not that
great further diluted by the ratio of number of zero blocks in the
uncompressed image to the overall number of blocks being less than
0.5 (typically). However, this may be important from performance
standpoint, so that kernel are not spinning its wheels decompressing
those empty blocks every time this zero region is read. It could also
be important when you create huge image mostly filled with zero
blocks for testing purposes.
- New feature allowing to de-duplicate output image. It turns out that
if you twist CLOOP format a bit you can do that as well. And unlike
zero-blocks elimination, this gives a noticeable improvement in the
overall compression ratio, reducing output image by something like
3-4% on my test UFS2 3GB image consisting of full FreeBSD base system
plus some of the packages (openjdk, apache etc), about 2.3GB worth of
file data (800+MB compressed). The only caveat is that images created
with this feature "on" would not work on older versions of FeeBSDxi
kernel, hence it's turned off by default.
- provide options to control both features and document them in manual
page.
- merge in all relevant LZMA compression support from the mkulzma(8),
add new option to select between both.
- switch license from ad-hoc beerware into standard 2-clause BSD.
2. geom_uzip(4):
- implement support for de-duplicated images;
- optimize some code paths to handle "all-zero" blocks without reading
any compressed data;
- beef up manual page to explain that geom_uzip(4) is not limited only
to md(4) images. The compressed data can be written to the block
device and accessed directly via magic of GEOM(4) and devfs(4),
including to mount root fs from a compressed drive.
- convert debug log code from being compiled in conditionally into
being present all the time and provide two sysctls to turn it on or
off. Due to intended use of the module, it can be used in
environments where there may not be a luxury to put new kernel with
debug code enabled. Having those options handy allows debug issues
without as much problem by just having access to serial console or
network shell access to a box/appliance. The resulting additional
CPU cycles are just few int comparisons and branches, and those are
minuscule when compared to data decompression which is the main
feature of the module.
- hopefully improve robustness and resiliency of the geom_uzip(4) by
performing some of the data validation / range checking on the TOC
entries and rejecting to attach to an image if those checks fail.
- merge in all relevant LZMA decompression support from the
geom_uncompress(4), enable automatically when appropriate format is
indicated in the header.
- move compilation work into its own worker thread so that it does not
clog g_up. This allows multiple instances work in parallel utilizing
smp cores.
- document new knobs in the manual page.
Reviewed by: adrian
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D5333
- Add URTWN_WITHOUT_UCODE option (will disable any firmware specific code
when set).
- Do not exclude the driver from build when MK_SOURCELESS_UCODE is set
(URTWN_WITHOUT_UCODE will be enforced unconditionally).
- Do not abort initialization when firmware cannot be loaded;
behave like the URTWN_WITHOUT_UCODE option was set.
- Drop some unused variables from urtwn_softc structure.
Tested with RTL8188EU and RTL8188CUS in HOSTAP and STA modes.
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4849
Extract common code from PowerPC's ofw_pci
Import portions of the PowerPC OF PCI implementation into
new file "ofw_pci.c", common for other platforms. The files ofw_pci.c and
ofw_pci.h from sys/powerpc/ofw no longer exist. All required declarations
are moved to sys/dev/ofw/ofw_pci.h.
This creates a new ofw_pci_write_ivar() function and modifies
ofw_pci_nranges(), ofw_pci_read_ivar(), ofw_pci_route_interrupt()
methods.
Most functions contain existing ppc implementations in the majority
unchanged. Now there is no need to have multiple identical copies
of methods for various architectures.
Submitted by: Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Annapurna Labs
Reviewed by: jhibbits, mmel
Differential Revision: https://reviews.freebsd.org/D4879
This needs to return to the drawing board as it breaks both
PowerPC and Sparc64 build.
Pointed out by: jhibbits
This will speed up some tree-walks with FAST_DEPEND which otherwise
would include length(SRCS) .depend files.
This also uses a trick suggested by sjg@ to still read them in when
specifying _V_READ_DEPEND=1 in the env/make args.
Sponsored by: EMC / Isilon Storage Division
Import portions of the PowerPC OF PCI implementation into
new file "ofw_pci.c", common for other platforms. The files ofw_pci.c and
ofw_pci.h from sys/powerpc/ofw no longer exist. All required declarations
are moved to sys/dev/ofw/ofw_pci.h.
This creates a new ofw_pci_write_ivar() function and modifies
ofw_pci_nranges(), ofw_pci_read_ivar(), ofw_pci_route_interrupt() methods.
Most functions contain existing ppc implementations in the majority
unchanged. Now there is no need to have multiple identical copies
of methods for various architectures.
Submitted by: Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Annapurna Labs
Reviewed by: jhibbits, mmel
Differential Revision: https://reviews.freebsd.org/D4879
configuration from the FDT data, then set the pins into the requested
state. As part of this the gpio controller now reports the correct number
of pins instead of returning the number of bank * 32.
To allow for a future consolidated kernel we add the SOC_ALLWINNER_A10 and
SOC_ALLWINNER_A20 kernel options. These need to be set as appropriate for
the SoC the kernel will boot on.
Submitted by: Emmanuel Vadot <manu@bidouilliste.com>
Differential Revision: https://reviews.freebsd.org/D5177
Some chip revisions don't have their external PCIe buses
behind the internal bridge. Add support for FDT-configurable
PEMs but keep ability for PCIe enumeration.
Reviewed by: andrew, wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5285
This was originall done by kan@.
Submitted by: Stanislav Galabov <sgalabov@gmail.com>
Reviewed by: kan
Differential Revision: https://reviews.freebsd.org/D5184
This allows skipping 'make depend' or running 'make clean all' without
getting a flip-flopping dependency due to the exists() just below.
Otherwise an error is encountered, such as:
fatal error: 'machine/endian.h' file not found.
Sponsored by: EMC / Isilon Storage Division
a mips big-endian board.
This is (hopefully! ish!) a temporary change until a slightly better way
can be found to express this without a config option.
Tested:
* BUFFALO WZR-HP-G300NH 1stGen (by submitter)
Submitted by: Mori Hiroki <yamori813@yahoo.co.jp>
This revision does the following renames:
CPU_MIPS24KC -> CPU_MIPS24K
CPU_MIPS74KC -> CPU_MIPS74K
CPU_MIPS1004KC -> CPU_MIPS1004K
It also adds the following new CPU_MIPSxxx options:
CPU_MIPS24KE, CPU_MIPS34K, CPU_MIPS1074K, CPU_INTERAPTIV, CPU_PROAPTIV
CPU_MIPSxxxxKC is limiting and possibly misleading as it implies the
MIPSxxxxK CPU has no FPU.
It would be better if the CPUs are named after their standard functionalities
only and the presence or absence of FPU can then be controlled via the
CPU_HAVEFPU option.
I will send out another dependent revision that moves MIPS 32 r2 and r3
CPUs to use the EHB instruction for clearing hazards instead of NOP/SSNOP.
Submitted by: Stanislav Galabov <sgalabov@gmail.com>
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D5077
MD_ROOT_SIZE and embed_mfs.sh were basically retired as part of
https://reviews.freebsd.org/D2903 .
However, when building a kernel with 'options MD_ROOT_SIZE' specified, this
results in a non-working MFS, as within sys/dev/md/md.c we fall within the
wrong # ifdef.
This patch implements the following:
* Allow kernels to be built without the MD_ROOT_SIZE option, which results
in a kernel built as per D2903.
* Allow kernels to be built with the MD_ROOT_SIZE option, which results
in a kernel built similarly to the pre-D2903 way, with the following
differences:
* The MFS is now put in a separate section within the kernel (oldmfs,
so it differs from the mfs section introduced by D2903).
* embed_mfs.sh is changed, so it looks up the oldmfs section within the
kernel, gets its size and offset, sees if the MFS will fit within the
allocated oldmfs section and only if all is well does a dd of the MFS
image into the kernel.
Submitted by: Stanislav Galabov <sgalabov@gmail.com>
Reviewed by: brooks, imp
Differential Revision: https://reviews.freebsd.org/D5093
This is the final step required allowing to compile and to run RISC-V
kernel and userland from HEAD.
RISC-V is a completely open ISA that is freely available to academia
and industry.
Thanks to all the people involved! Special thanks to Andrew Turner,
David Chisnall, Ed Maste, Konstantin Belousov, John Baldwin and
Arun Thomas for their help.
Thanks to Robert Watson for organizing this project.
This project sponsored by UK Higher Education Innovation Fund (HEIF5) and
DARPA CTSRD project at the University of Cambridge Computer Laboratory.
FreeBSD/RISC-V project home: https://wiki.freebsd.org/riscv
Reviewed by: andrew, emaste, kib
Relnotes: Yes
Sponsored by: DARPA, AFRL
Sponsored by: HEIF5
Differential Revision: https://reviews.freebsd.org/D4982
Provide an easy to use framework for ARM64 DDB disassembler.
This commit does not contain full list of instruction opcodes.
Obtained from: Semihalf
Sponsored by: Cavium
Approved by: cognet (mentor)
Reviewed by: zbb, andrew, cognet
Differential revision: https://reviews.freebsd.org/D5114
Some firmware revisions provide different DTB tree that include
odd MDIO placement in the tree.
This commit adds support for 2 new buses:
- MRML bridge (PCIB subordinate)
- MDIO nexus (MRML subordinate)
This allows for the correct MDIO attachment with both - new and old
firmware.
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5070
- Separate FDT and general PCIe driver parts
- Drop some irrelevant printfs that cannot be displayed in
FDT attach
- Move ranges parsing to FDT portion of PCIe code
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5067
Allows for using hardware watchpoints for 1, 2, 4, 8 byte long addresses.
The default configuration of watchpoint is RW but code allows to select
RO or WO and X.
Since debugging registers are per-CPU (CP14) the watchpoint is set on
the CPU that was lucky (or not) to enter DDB.
HW breakpoints are used to perform single step in KDB.
When HW breakpoint is enabled all watchpoints are temporary disabled
to avoid recursive abort on both watchpoint and breakpoint.
In case of branch, the breakpoint is set to both - next instruction
and possible branch address. This requires at least 2 breakpoints
supported in the CPU however this is a must for ARMv6/v7 CPUs.
Reviewed by: imp
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Juniper Networks Inc.
Differential Revision: https://reviews.freebsd.org/D4037
support frameworks (i.e. regulators/phy/tsensors/fuses...).
It provides simple unified consumers interface for manipulations with
on-chip resets.
Reviewed by: ian, imp (paritaly)
support frameworks(i.e. reset/regulators/phy/tsensors/fuses...).
The clock framework significantly simplifies handling of complex clock
structures found in modern SoCs. It provides the unified consumers
interface, holds and manages actual clock topology, frequency and gating.
It's tested on three different ARM boards (Nvidia Tegra TK1, Inforce 6410 and
Odroid XU2) and on one MIPS board (Creator Ci20) by kan@.
The framework is still far from perfect and probably doesn't have stable
interface yet, but we want to start testing it on more real boards and
different architectures.
Reviewed by: ian, kan (earlier version)
separate file. Claim my copyright.
- Provide more comments, better function and structure names.
- Sort out unneeded includes from resulting two files.
No functional changes.
use of fdt_fixup_table on PowerPC and ARM. As such we can remove it from
other architectures as it's unneeded.
Reviewed by: nwhitehorn
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D5013
The htree dir_index is perhaps one of the most characteristic
features of the linux ext3 implementation. It was removed
in r281670, due to repeated bug reports.
Damjan Jovanic detected and fixed three bugs and did some
stress testing by building Apache OpenOffice on top of it
so it is now in good shape to bring back.
Differential Revision: https://reviews.freebsd.org/D5007
Submitted by: Damjan Jovanovic
Reviewed by: pfg
Tested by: pho
Relnotes: Yes
MFC after: 2 months (only 10.x)
This commit introduces initial support for Marvell Armada38x platform.
Changes:
- Add common DTS files for Armada38x SoCs and DTS file for A388-GP
- Add ARMADA38X kernel configuration
- Add option SOC_MV_ARMADA38X and set MV_PCI_PORTS
- Add list of files to compile
- Implement get_tclk(), get_sar_value(), cpu_reset() functions
- Add CPU ID and SoC numbers
- Correct ifdefs in arm/mv/timer.c
Reviewed by: ian, imp
Obtained from: Semihalf
Sponsored by: Stormshield
Submitted by: Michal Stanek <mst@semihalf.com>
Differential revision: https://reviews.freebsd.org/D4210
This fixes .depend.genassym.o not being included. genassym.o depends on
all of the system headers and when rebuilt regenerates assym.s which
lists offsets for critial .S files to utilize. By having a struct in a
system header change its offsets and not have generassym.o be rebuilt,
this would lead to panics.
The flaw in the initial commit was seeing ${OBJS} in ${SYSTEM_OBJS} and
assuming it had all of ${SRCS} in it. This is not the case though. The
older mkdep code splits out all of the various SRC lists for generating
the .depend file. It also includes ${GEN_CFILES}, which had genassym.c
and was the only significant file lacking from ${SYSTEM_OBJS} upon inspection,
since it is not linked in. Rather than duplicate the likely
soon-to-be-removed mkdep lists, just add genassym.o to the DEPENDOBJS
list. Using ${SRCS} as bsd.dep.mk does would be nice but there are many
files in the build that are only added to ${OBJS} and not ${SRCS}, such
as bf_enc.o derived from bf_enc.S for i386.
Sponsored by: EMC / Isilon Storage Division
Reported by: dhw (several panics on current@)
Pointyhat to: bdrewery
The .MAKEFLAGS check inside of the .for loop is extremely slow for some
reason. Just moving it out of the loop trimmed -V lookup time from 11
seconds to 1 second in the kernel obj directory.
Sponsored by: EMC / Isilon Storage Division
Add support for Huntington MCDI licensing interface to common code.
Ported from Linux net driver IOCTL functions with restructuring for
initial support for V3 licensing API.
Submitted by: Richard Houldsworth <rhouldsworth at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4918
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: delphij, royger, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4676
The upcoming GELI support in the loader reuses parts of this code
Some ifdefs are added, and some code is moved outside of existing ifdefs
The HMAC parts of GELI are broken out into their own file, to separate
them from the kernel crypto/openssl dependant parts that are replaced
in the boot code.
Passed the GELI regression suite (tools/regression/geom/eli)
Files=20 Tests=14996
Result: PASS
Reviewed by: pjd, delphij
MFC after: 1 week
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D4699
It seems that `options GZIP` and `options ZFS` collide because they both
define inconsistent definitions for inflate, etc
Fixing this will require upgrading zlib in the kernel, as suggested in
r245102.
Pointyhat to: ngie
Reported by: bz
Sponsored by: EMC / Isilon Storage Division
KERNCONF when "make tinderbox" is run
This will help ensure that "options ZFS" will not be accidentally
regressed, as the current LINT configuration tests the zfs module
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
cperciva's libmd implementation is 5-30% faster
The same was done for SHA256 previously in r263218
cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation
Extend sbin/md5 to create sha384(1)
Chase dependancies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h}
Reviewed by: cperciva, des, delphij
Approved by: secteam, bapt (mentor)
MFC after: 2 weeks
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3929
The mdio driver interface is generally useful for devices that require
MDIO without the full MII bus interface. This lifts the driver/interface
out of etherswitch(4), and adds a mdio(4) man page.
Submitted by: Landon Fuller <landon@landonf.org>
Differential Revision: https://reviews.freebsd.org/D4606
TFO is disabled by default in the kernel build. See the top comment
in sys/netinet/tcp_fastopen.c for implementation particulars.
Reviewed by: gnn, jch, stas
MFC after: 3 days
Sponsored by: Verisign, Inc.
Differential Revision: https://reviews.freebsd.org/D4350
ports.
The sys/mips/rt305x/ code currently has these hard-coded with a comment
to make them configurable; this is the first step towards that.
Submitted by: Stanislav Galabov <sgalabov@gmail.com>
Add support for two new devices: X552 SFP+ 10 GbE, and the single port
version of X550T.
Submitted by: erj
Reviewed by: gnn
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D4186
into a new function that other platforms can share.
This creates a new ofw_reg_to_paddr() function (in a new ofw_subr.c file)
that contains most of the existing ppc implementation, mostly unchanged.
The ppc code now calls the new MI code from the MD code, then creates a
ppc-specific bus_space mapping from the results. The new arm implementation
does the same in an arm-specific way.
This also moves the declaration of OF_decode_addr() from ofw_machdep.h to
openfirm.h, except on sparc64 which uses a different function signature.
This will help all FDT platforms to set up early console access using
OF_decode_addr().
The ci20 port (by kan@) is going to reuse almost all of the intrng code
since the SoC in question looks suspiciously like someone took an ARM
SoC design and replaced the ARM core with a MIPS core.
* migrate out the code;
* rename ARM_ -> INTR_;
* rename arm_ -> intr_;
* move the interrupt flush routine from intr.c / intrng.c into
arm/machdep_intr.c - removing the code duplication and removing
the ARM specific bits from here.
Thanks to the Star Wars: The Force Awakens premiere line for allowing
me a couple hours of quiet time to finish the universe builds.
Tested:
* make universe
TODO:
* The structure definitions in subr_intr.c still includes machine/intr.h
which requires one duplicates all of the intrng definitions in
the platform code (which kan has done, and I think we don't have to.)
Instead I should break out the generic things (function declarations,
common intr structures, etc) into a separate header.
* Kan has requested I make the PIC based IPI stuff optional.
Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
I have?". "what is the MTU?", "Give me the IPv4 source address to use",
etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
dealing with IPv6 mtu, finding "address" ifp (almost never done right),
provide easy-to-use API hiding all the complexity and returning the
needed info into small on-stack structure.
This change also helps hiding route subsystem internals (locking, direct
rtentry accesses).
Additionaly, using this API improves lookup performance since rtentry is not
locked.
(This is safe, since all the rtentry changes happens under both radix WLOCK
and rtentry WLOCK).
Sponsored by: Yandex LLC
clock_gettime(2) on ARMv7 and ARMv8 systems which have architectural
generic timer hardware. It is similar how the RDTSC timer is used in
userspace on x86.
Fix a permission problem where generic timer access from EL0 (or
userspace on v7) was not properly initialized on APs.
For ARMv7, mark the stack non-executable. The shared page is added for
all arms (including ARMv8 64bit), and the signal trampoline code is
moved to the page.
Reviewed by: andrew
Discussed with: emaste, mmel
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4209
One reason the kernel does not build reproducibly is that it includes
a timestamp in the version string. SOURCE_DATE_EPOCH provides a standard
method to address this: it should be set to the last modification time
of the source, and build processes use the specified timestamp instead
of the "current" date and time.
This change uses SOURCE_DATE_EPOCH if it is set; how it gets set needs
to be addressed elsewhere.
Reviewed by: bapt
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
typically memory mapped bus, for example on the AMD Opteron A1100 the AHCI
device is mapped in the CPUs address space, and not through a PCI
controller.
Further work is needed for this to work with ACPI as this is expected to be
common on ARMv8 servers.
Reviewed by: mav, mmel
Obtained from: mmel, ABT Systems Ltd
Relnotes: yes
Sponsored by: SoftIron Inc
Differential Revision: https://reviews.freebsd.org/D4269
changes. Use the list of MFILES found by find to identify the set of
possible auto-generated files and add the intersection of this set and
SRCS to CLEANFILES.
Submitted by: imp (previous version), sbruno
Differential Revision: https://reviews.freebsd.org/D4336
IPv4/IPv6 checksum offloading and VLAN tag insertion/stripping.
Since uether doesn't provide a way to announce driver specific offload
capabilities to upper stack, checksum offloading support needs more work
and will be done in the future.
Special thanks to Hayes Wang from RealTek who gave input.
Because of how osreldate.h was being built with newvers.sh, which always
spat out a vers.c dependent on SVN or git, the meta mode build was
considering osreldate.h to depend on the current git or SVN index. This
would lead to entire tree rebuilds when modifying git's index. There's
no reason to be generating vers.c here so just skip it.
While here, in mk-osreldate.sh rename PARAM_H to proper PARAMFILE (which
newvers.sh already has a default for) and remove unneeded export.
Sponsored by: EMC / Isilon Storage Division
Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).
This should be a big no-op pass; and reduces the size of if_ath.c.
I'm hopefully soon going to take a whack at the USB support for ath(4)
and this'll require some reuse of the busdma memory code.
default and add a manual page for mlx5en. The mlx5 module contains
shared code for both infiniband and ethernet. The mlx5en module
contains specific code for ethernet functionality only. A mlx5ib
module is in the works for infiniband support.
Supported hardware:
- ConnectX-4: 10/20/25/40/50/56/100Gb/s speeds.
- ConnectX-4 LX: 10/25/40/50Gb/s speeds (low power consumption)
Refer to the mlx5en(4) manual page for a comprehensive list.
The team porting the mlx5 driver(s) to FreeBSD:
- Hans Petter Selasky <hselasky@freebsd.org>
- Oded Shanoon <odeds@mellanox.com>
- Meny Yossefi <menyy@mellanox.com>
- Shany Michaely <shanim@mellanox.com>
- Shahar Klein <shahark@mellanox.com>
- Daria Genzel <dariaz@mellanox.com>
- Mark Bloch <markb@mellanox.com>
Differential Revision: https://reviews.freebsd.org/D4163
Submitted by: Mark Block <markb@mellanox.com>
Sponsored by: Mellanox Technologies
Reviewed by: gnn @
MFC after: 3 days
It turns out on a 16550 w/ a 25MHz SoC reference clock you get a little
over 3% error at 115200 baud, which causes this to fail.
Just .. cope. Things cope these days.
Default to 30 (3.0%) as before, but allow UART_DEV_TOLERANCE_PCT to be
set at build time to change that.
QorIQ SoCs (e5500 core, P5 family) have 2 BARs for local access windows, while
MPC85XX, and P1/P2 families use only a single BAR register.
This also adds the QORIQ_DPAA option, mutually exclusive to MPC85XX, to handle
this difference.
Obtained from: Semihalf
Sponsored by: Alex Perez/Inertial Computing
as of r288992 use it to manage the CCNT.
Use the CNNT for get_cyclecount() instead of binuptime() when device pmu
is compiled in; if it fails to attach, fall back to the former method.
Enable by default for the BeagleBoneBlack configuration.
Optained from: Cambridge/L41
Sponsored by: DARPA/AFRL
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D3837
ccache is mostly beneficial for frequent builds where -DNO_CLEAN is not
used to achieve a safe pseudo-incremental build. This is explained in
more detail upstream [1] [2]. It incurs about a 20%-28% hit to populate the
cache, but with a full cache saves 30-50% in build times. When combined with
the WITH_FAST_DEPEND feature it saves up to 65% since ccache does cache the
resulting dependency file, which it does not do when using mkdep(1)/'CC
-E'. Stats are provided at the end of this message.
This removes the need to modify /etc/make.conf with the CC:= and CXX:=
lines which conflicted with external compiler support [3] (causing the
bootstrap compiler to not be built which lead to obscure failures [4]),
incorrectly invoked ccache in various stages, required CCACHE_CPP2 to avoid
Clang errors with parenthesis, and did not work with META_MODE.
The option name was picked to match the existing option in ports. This
feature is available for both in-src and out-of-src builds that use
/usr/share/mk.
Linking, assembly compiles, and pre-processing avoid using ccache since it is
only overhead. ccache does nothing special in these modes, although there is
no harm in calling it for them.
CCACHE_COMPILERCHECK is set to 'content' when using the in-tree bootstrap
compiler to hash the content of the compiler binary to determine if it
should be a cache miss. For external compilers the 'mtime' option is used
as it is more efficient and likely to be correct. Future work may optimize the
'content' check using the same checks as whether a bootstrap compiler is needed
to be built.
The CCACHE_CPP2 pessimization is currently default in our devel/ccache
port due to Clang requiring it. Clang's -Wparentheses-equality,
-Wtautological-compare, and -Wself-assign warnings do not mix well with
compiling already-pre-processed code that may have expanded macros that
trigger the warnings. GCC has so far not had this issue so it is allowed to
disable the CCACHE_CPP2 default in our port.
Sharing a cache between multiple checkouts, or systems, is explained in
the ccache manual. Sharing a cache over NFS would likely not be worth
it, but syncing cache directories between systems may be useful for an
organization. There is also a memcached backend available [5]. Due to using
an object directory outside of the source directory though you will need to
ensure that both are in the same prefix and all users use the same layout. A
possible working layout is as follows:
Source: /some/prefix/src1
Source: /some/prefix/src2
Source: /some/prefix/src3
Objdir: /some/prefix/obj
Environment: CCACHE_BASEDIR='${SRCTOP:H}' MAKEOBJDIRPREFIX='${SRCTOP:H}/obj'
This will use src*/../obj as the MAKEOBJDIRPREFIX and tells ccache to replace
all absolute paths to be relative. Using something like this is required due
to -I and -o flags containing both SRC and OBJDIR absolute paths that ccache
adds into its hash for the object without CCACHE_BASEDIR.
distcc can be hooked into by setting CCACHE_PREFIX=/usr/local/bin/distcc.
I have not personally tested this and assume it will not mix well with
using the bootstrap compiler.
The cache from buildworld can be reused in a subdir by first running
'make buildenv' (from r290424).
Note that the cache is currently different depending on whether -j is
used or not due to ccache enabling -fdiagnostics-color automatically if
stderr is a TTY, which bmake only does if not using -j.
The system I used for testing was:
WITNESS
Build options: -j20 WITH_LLDB=yes WITH_DEBUG_FILES=yes WITH_CCACHE_BUILD=yes
DISK: ZFS 3-way mirror with very slow disks using SSD l2arc/log.
The arc was fully populated with src tree files and ccache objects.
RAM: 76GiB
CPU: Intel(R) Xeon(R) CPU L5520 @2.27GHz
2 package(s) x 4 core(s) x 2 SMT threads = hw.ncpu=16
The WITH_FAST_DEPEND feature was used for comparison here as well to show
the dramatic time savings with a full cache.
buildworld:
x buildworld-before
+ buildworld-ccache-empty
* buildworld-ccache-full
% buildworld-ccache-full-fastdep
# buildworld-fastdep
+-------------------------------------------------------------------------------+
|% * # +|
|% * # +|
|% * # xxx +|
| |A |
| A|
| A |
|A |
| A |
+-------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 3 3744.13 3794.31 3752.25 3763.5633 26.935139
+ 3 4519 4525.04 4520.73 4521.59 3.1104823
Difference at 95.0% confidence
758.027 +/- 43.4565
20.1412% +/- 1.15466%
(Student's t, pooled s = 19.1726)
* 3 1823.08 1827.2 1825.62 1825.3 2.0785572
Difference at 95.0% confidence
-1938.26 +/- 43.298
-51.5007% +/- 1.15045%
(Student's t, pooled s = 19.1026)
% 3 1266.96 1279.37 1270.47 1272.2667 6.3971113
Difference at 95.0% confidence
-2491.3 +/- 44.3704
-66.1952% +/- 1.17895%
(Student's t, pooled s = 19.5758)
# 3 3153.34 3155.16 3154.2 3154.2333 0.91045776
Difference at 95.0% confidence
-609.33 +/- 43.1943
-16.1902% +/- 1.1477%
(Student's t, pooled s = 19.0569)
buildkernel:
x buildkernel-before
+ buildkernel-ccache-empty
* buildkernel-ccache-empty-fastdep
% buildkernel-ccache-full
# buildkernel-ccache-full-fastdep
@ buildkernel-fastdep
+-------------------------------------------------------------------------------+
|# @ % * |
|# @ % * x + |
|# @ % * xx ++|
| MA |
| MA|
| A |
| A |
|A |
| A |
+-------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 3 571.57 573.94 571.79 572.43333 1.3094401
+ 3 727.97 731.91 728.06 729.31333 2.2492295
Difference at 95.0% confidence
156.88 +/- 4.17129
27.4058% +/- 0.728695%
(Student's t, pooled s = 1.84034)
* 3 527.1 528.29 528.08 527.82333 0.63516402
Difference at 95.0% confidence
-44.61 +/- 2.33254
-7.79305% +/- 0.407478%
(Student's t, pooled s = 1.02909)
% 3 400.4 401.05 400.62 400.69 0.3306055
Difference at 95.0% confidence
-171.743 +/- 2.16453
-30.0023% +/- 0.378128%
(Student's t, pooled s = 0.954969)
# 3 201.94 203.34 202.28 202.52 0.73020545
Difference at 95.0% confidence
-369.913 +/- 2.40293
-64.6212% +/- 0.419774%
(Student's t, pooled s = 1.06015)
@ 3 369.12 370.57 369.3 369.66333 0.79033748
Difference at 95.0% confidence
-202.77 +/- 2.45131
-35.4225% +/- 0.428227%
(Student's t, pooled s = 1.0815)
[1] https://ccache.samba.org/performance.html
[2] http://www.mail-archive.com/ccache@lists.samba.org/msg00576.html
[3] https://reviews.freebsd.org/D3484
[5] https://github.com/jrosdahl/ccache/pull/30
PR: 182944 [4]
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
Relnotes: yes
This is because the .meta files generated from filemon already contain a
list of all files read to generate the object.
X-MFC-With: r290433
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
This is especially noticeable in the kernel obj directory since it
includes so many files.
X-MFC-With: r290433
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
driver. This is taken from the MAC at boot, but can be overridden with
'options AT91_MACB_USE_RMII'.
Switch to macb for HL201 and SAM9G20EK boards. It now works both
places. Also start to sneak up on FDT for the SAM9G20EK board, but
leave disabled due to issues with MMC that haven't been resolved.
Add early debug support for the SAM9G20EK since that is required
for FDT to work presently on these SoC.
This speeds up buildworld by 16% on my system and buildkernel by 35%.
Rather than calling mkdep(1), which is just a wrapper around 'cc -E',
use the modern -MD -MT -MF flags to gather and generate dependencies during
compilation. This flag was introduced in GCC "a long time ago", in GCC 3.0,
and is also supported by Clang. (It appears that ICC also supports this but I
do not have access to test it). This avoids running the preprocessor *twice*
for every build, in both 'make depend' and 'make all'. This is especially
noticeable when using ccache since it does not cache preprocessor results from
mkdep(1) / 'cc -E', but still speeds up compilation with the -MD flags.
For 'make depend' a tree-walk is still done to ensure that all DPSRCS
are generated when expected, and that beforedepend/afterdepend and
_EXTRADEPEND are all still respected. In time this may change but for now
I've been conservative. The time for a tree-walk with -j combined with
SUBDIR_PARALLEL is not significant. For example, it takes about 9 seconds
with -j15 to walk all of src/ for 'make depend' now on my system.
A .depend file is still generated with the various rules that apply to
the final target, or custom rules. Otherwise there are now
per-built-object-file .depend files, such as .depend.filename.o. These
are included directly by make rather than populating .depend with a loop
and .depend lines, which only added overhead to the now almost-NOP 'make
depend' phase.
Before this I experimented with having mkdep(1) called in parallel per-file.
While this improved the kernel and lib/libc 'make depend' phase, it resulted
in slower build times overall.
The -M flags are removed from CFLAGS when linking since they have no effect.
Enabling this by default, for src or out-of-src, can be done once more testing
has been done, such as a ports exp-run, and with more compilers.
The system I used for testing was:
WITNESS
Build options: -j20 WITH_LLDB=yes WITH_DEBUG_FILES=yes WITH_FAST_DEPEND=yes
DISK: ZFS 3-way mirror with very slow disks using SSD l2arc/log.
The arc was fully populated with src tree files.
RAM: 76GiB
CPU: Intel(R) Xeon(R) CPU L5520 @2.27GHz
2 package(s) x 4 core(s) x 2 SMT threads = hw.ncpu=16
buildworld:
x buildworld-before
+ buildworld-fastdep
+-------------------------------------------------------------------------------+
|+ |
|+ |
|+ xx x|
| |_MA___||
|A |
+-------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 3 3744.13 3794.31 3752.25 3763.5633 26.935139
+ 3 3153.34 3155.16 3154.2 3154.2333 0.91045776
Difference at 95.0% confidence
-609.33 +/- 43.1943
-16.1902% +/- 1.1477%
(Student's t, pooled s = 19.0569)
buildkernel:
x buildkernel-before
+ buildkernel-fastdep
+-------------------------------------------------------------------------------+
|+ x |
|++ xx|
| A||
|A| |
+-------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 3 571.57 573.94 571.79 572.43333 1.3094401
+ 3 369.12 370.57 369.3 369.66333 0.79033748
Difference at 95.0% confidence
-202.77 +/- 2.45131
-35.4225% +/- 0.428227%
(Student's t, pooled s = 1.0815)
Sponsored by: EMC / Isilon Storage Division
MFC after: 3 weeks
Relnotes: yes
wrong value in the comparison, leading to incorrectly setting the new
value.
This has been observed in the ZFS code. Without this we can lose track of
the reference count in a zrlock object.
We should move to use the generic atomic functions, however as this has
been observed I would prefer to have this working, then move to the generic
functions.
PR: 204037
Sponsored by: ABT Systems Ltd
- Move all files related to the LinuxKPI into sys/compat/linuxkpi and
its subfolders.
- Update sys/conf/files and some Makefiles to use new file locations.
- Added description of COMPAT_LINUXKPI to sys/conf/NOTES which in turn
adds the LinuxKPI to all LINT builds.
- The LinuxKPI can be added to the kernel by setting the
COMPAT_LINUXKPI option. The OFED kernel option no longer builds the
LinuxKPI into the kernel. This was done to keep the build rules for
the LinuxKPI in sys/conf/files simple.
- Extend the LinuxKPI module to include support for USB by moving the
Linux USB compat from usb.ko to linuxkpi.ko.
- Bump the FreeBSD_version.
- A universe kernel build has been done.
Reviewed by: np @ (cxgb and cxgbe related changes only)
Sponsored by: Mellanox Technologies
This commit introduces support for etherswitch devices that utilize SMI as
a way of accessing its registers. SMI register is located in address space
of mge -- access to it was exported through MDIO interface.
Attachment functions were enhanced so as to ensure proper initialisation
in both cases: 1) PHYs attached directly to mge, 2) PHYs attached to
switch device and switch attached to mge. Attachment of etherswitch device
depends on dts entry with compatible="mrvl,sw" property. If none is found,
typical PHY attachment procedure follows.
In case of switch attached, PHYs' status and configuration is accessible
via etherswitchcfg, and ifconfig shows always-up, non-configurable mge
interfaces.
Due to the fact that there may be simultaneous accessess to SMI
registers (e.g. from PHY attached to one of mge instances and switch
to the other), SMI access interlock was added. It is SX lock,
because sleep ability is necessary -- busy-waiting would result
in poor performance due to long delays required by hardware.
Underlying switch driver is obliged to use sleepable locks as well.
Reviewed by: adrian
Obtained from: Semihalf
Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Differential revision: https://reviews.freebsd.org/D3900
It turns out that it is pretty easy to make CloudABI work on ARM64. We
essentially only need to copy over the sysvec from AMD64 and ensure that
we use ARM64 specific registers.
As there is an overlap between function argument and return registers,
we do need to extend cloudabi64_schedtail() to only set its values if
we're actually forking. Not when we're creating a new thread.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3917
In order to make it easier to support CloudABI on ARM64, move out all of
the bits from the AMD64 cloudabi_sysvec.c into a new file
cloudabi_module.c that would otherwise remain identical. This reduces
the AMD64 specific code to just ~160 lines.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3974
to copying in some code from the armv4 busdma, and adapting a few variable
and flag names to match the surrounding mips code.
Instead of keeping a local cache of prealloced busdma_map structs on a
mutex-protected list, set up an uma zone to cache them.
Instead of all memory allocations using M_DEVBUF, use new categories
M_BUSDMA for allocations of metadata (tags, maps, segment tracking lists),
and M_BOUNCE for bounce pages.
When buffers are allocated out of the busdma_bufalloc zones the alignment
and size of the buffers is known, and the code can skip doing any "partial
cacheline flush" logic to preserve data that may be adjacent to the DMA
buffer but contain non-DMA data.
Reviewed by: adrian, imp
This commit adds support for MDIO present in the ThunderX SoC.
From the FDT point of view it is compatible with "octeon-3860-mdio"
however only C22 mode is used.
The code also implements lmac_if interface functions.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
- The driver consists of three main componens: PF, VF, BGX
- Requires appropriate entries in DTS and MDIO driver
- Supports only FDT configuration
- Multiple Tx queues and single Rx queue supported
- No RSS, HW checksum and TSO support
- No more than 8 queues per-IF (only one Queue Set per IF)
- HW statistics enabled
- Works in all available MAC modes (1,10,20,40G)
- Style converted to BSD according to style(9)
- The code brings lmac_if interface used by the BGX driver to
update its logical MACs state.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
and armv6 architecures. The primary enhancement over the old design is
support for hierarchical interrupt controllers (such as a gpio driver
which can receive interrupts from a root PIC and act as a PIC itself for
clients interested in handling a change of gpio pin state as an
interrupt). The new code also provides an infrastructure for mapping
interrupts described in metadata in the form of a "controller reference
plus interrupt number" tuple into the simple "0-n" flat numeric space
understood by rman and the bus resource mechanisms.
Use of the new code is enabled by setting the ARM_INTRNG option, and by
making a few simple changes to the platform's support code. In addition
each existing PIC driver needs changes to be ready for INTRNG; this commit
contains the changes for the arm/gic driver, which most armv6 SoCs use, but
it does not enable the new code yet on any platform.
This project has been many years in the making, starting as a GSoC project
by Jakub Klama (jceel@) in 2012. That didn't get committed right away and
the source base evolved out from under it to some degree. In 2014 I rebased
the diffs to then -current and did some enhancements in the area of mapping
interrupt numbers and storing associated fdt data, then the project went
cold again for a while. Eventually Svata Kraus took that work in progress
and did another big round of work on it, removing most of the remaining
rough edges. Finally I took that and made one more pass through it, mostly
disabling the "INTR_SOLO" feature for now, pending further design
discussions on how to most efficiently dispatch a pending interrupt through
more than one layer of PIC. The current code with the INTR_SOLO feature
disabled uses approximate 100 extra cpu cycles for each cascaded PIC the
interrupt has to be passed to, so what's left to do is about efficiency, not
correct operation.
Differential Revision: https://reviews.freebsd.org/D2047
an error.
Most of these do a 'mkdir -p' or 'install -d' before installing, but add
the trailing / here for consistency with the userland install.
MFC after: 2 weeks
X-MFC-With: r289391
Sponsored by: EMC / Isilon Storage Division
HWPMC depends on pmu.c even if device pmu is not specified.
Would be great if we could just automatically enabled "device pmu"
if we try to compile in HWPMC.
Also several old kernel cnfigurations seem to have HWPMC enabled but are
pre-FDT and thus fail. So make pmu.c depend on fdt in case of hwpmc as
well.
MFC after: 2 weeks
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D3877
To me that seems broken as certain interrupts will never be handled
properly. I'll re-open D3877 and we can seek a better solution and
try again. For now go back to that state and avoid compile time errors.
Would be great if we could just automatically enabled "device pmu"
if we try to compile in HWPMC.
MFC after: 2 weeks
Sponsored by: DARPA/AFRL
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D3877
packets and/or state transitions from each TCP socket. That would help with
narrowing down certain problems we see in the field that are hard to reproduce
without understanding the history of how we got into a certain state. This
change provides just that.
It saves copies of the last N packets in a list in the tcpcb. When the tcpcb is
destroyed, the list is freed. I thought this was likely to be more
performance-friendly than saving copies of the tcpcb. Plus, with the packets,
you should be able to reverse-engineer what happened to the tcpcb.
To enable the feature, you will need to compile a kernel with the TCPPCAP
option. Even then, the feature defaults to being deactivated. You can activate
it by setting a positive value for the number of captured packets. You can do
that on either a global basis or on a per-socket basis (via a setsockopt call).
There is no way to get the packets out of the kernel other than using kmem or
getting a coredump. I thought that would help some of the legal/privacy concerns
regarding such a feature. However, it should be possible to add a future effort
to export them in PCAP format.
I tested this at low scale, and found that there were no mbuf leaks and the peak
mbuf usage appeared to be unchanged with and without the feature.
The main performance concern I can envision is the number of mbufs that would be
used on systems with a large number of sockets. If you save five packets per
direction per socket and have 3,000 sockets, that will consume at least 30,000
mbufs just to keep these packets. I tried to reduce the concerns associated with
this by limiting the number of clusters (not mbufs) that could be used for this
feature. Again, in my testing, that appears to work correctly.
Differential Revision: D3100
Submitted by: Jonathan Looney <jlooney at juniper dot net>
Reviewed by: gnn, hiren