Tested on Qemu/KVM, VirtualBox, and BHyVe.
Currently built as modules-only on i386/amd64. Man pages not yet hooked
up, pending review.
Submitted by: Bryan Venteicher bryanv at daemoninthecloset dot org
Reviewed by: bz
MFC after: 4 weeks or so
based on Solarflare SFC9000 family controllers. The driver supports jumbo
frames, transmit/receive checksum offload, TCP Segmentation Offload (TSO),
Large Receive Offload (LRO), VLAN checksum offload, VLAN TSO, and Receive Side
Scaling (RSS) using MSI-X interrupts.
This work was sponsored by Solarflare Communications, Inc.
My sincere thanks to Ben Hutchings for doing a lot of the hard work!
Sponsored by: Solarflare Communications, Inc.
MFC after: 3 weeks
results in the HAL being built without HAL debugging/diagnostic support,
the module building process needs to be somehow taught to not build AR5416+
NICs if AH_SUPPORT_AR5416 isn't defined in opt_ah.h .
This implies that users who are building the driver do so with
KERNBUILDDIR set to the compile/CONFIG directory so the various
opt_* sources can be pulled in.
compiled into the kernel.
Do not try to build the module in case of no INET support but
keep #error calls for now in case we would compile it into the
kernel.
This should fix an issue where the module would fail to enable
IPv6 support from the rc framework, but also other INET and INET6
parts being silently compiled out without giving a warning in the
module case.
While here garbage collect unneeded opt_*.h includes.
opt_ipdn.h is not used anywhere but we need to leave the DUMMYNET
entry in options for conditional inclusion in kernel so keep the
file with the same name.
Reported by: pluknet
Reviewed by: plunket, jhb
MFC After: 3 days
replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel
configuration files. Besides duplicating functionality, amd(4), which
previously also supported the AMD Am53C974, unlike esp(4) is no longer
maintained and has accumulated enough bit rot over time to always cause
a panic during boot as long as at least one target is attached to it
(see PR 124667).
PR: 124667
Obtained from: NetBSD (based on)
MFC after: 3 days
take advantage of it instead of duplicating it. This reduces the size of
the i386 GENERIC kernel by about 4k. The only potential in-tree user left
unconverted is xe(4), which generally should be changed to use miibus(4)
instead of implementing PHY handling on its own, as otherwise it makes not
much sense to add a dependency on miibus(4)/mii_bitbang(4) to xe(4) just
for the MII bitbang'ing code. The common MII bitbang'ing code also is
useful in the embedded space for using GPIO pins to implement MII access.
- Based on lessons learnt with dc(4) (see r185750), add bus barriers to the
MII bitbang read and write functions of the other drivers converted in
order to ensure the intended ordering. Given that register access via an
index register as well as register bank/window switching is subject to the
same problem, also add bus barriers to the respective functions of smc(4),
tl(4) and xl(4).
- Sprinkle some const.
Thanks to the following testers:
Andrew Bliznak (nge(4)), nwhitehorn@ (bm(4)), yongari@ (sis(4) and ste(4))
Thanks to Hans-Joerg Sirtl for supplying hardware to test stge(4).
Reviewed by: yongari (subset of drivers)
Obtained from: NetBSD (partially)
drivers that only ever attach to a particular MAC driver, i.e. inphy(4),
ruephy(4) and xlphy(4), to the directory where the respective MAC driver
lives and only compile it into the kernel when the latter is also there,
also removing it from miibus.ko and moving it into the module of the
respective MAC driver.
- While at it, rename exphy.c, which comes from NetBSD where the MAC driver
it corresponds to also is named ex(4) instead of xl(4) but that in FreeBSD
actually identifies itself as xlphy(4), and its function names accordingly
for consistency.
- Additionally while at it, fix some minor style issues like whitespace
in the register headers and add multi-inclusion protection to inphyreg.h.
thanks for their contiued support to FreeBSD.
This is version 10.80.00.003 from codeset 10.2.1 [1]
Obtained from: LSI http://kb.lsi.com/Download16574.aspx [1]
build the ip_fw_pfil.c hooks and ipfw even in case of no-ip under the
assumption that the private L2 hook (which hopefully eventually will be a
pfil hook as well) can still be useful.
Allow building the module without inet as well.
Glanced at by: jhb
MFC after: 3 days
build it with and without INET/INET6 support.
Submitted by: Alexander V. Chernikov <melifaro at yandex-team.ru> [1]
Tested by: Alexander V. Chernikov <melifaro at yandex-team.ru> [1]
Approved by: re (bz)
MFC after: 2 weeks
options defined in the kernel config. This more closely matches the
behavior of other modules which inherit configuration settings from the
kernel configuration during a kernel + modules build.
Reviewed by: luigi
Approved by: re (kib)
MFC after: 1 week
improvements:
(1) Implement new model in previously missed at91 UART driver
(2) Move BREAK_TO_DEBUGGER and ALT_BREAK_TO_DEBUGGER from opt_comconsole.h
to opt_kdb.h (spotted by np)
(3) Garbage collect now-unused opt_comconsole.h
MFC after: 3 weeks
Approved by: re (bz)
(1) opt_capsicum.h is no longer required in ffs_alloc.c, so remove the
#include.
(2) portalfs depends on opt_capsicum.h, so have the Makefile generate one
if required.
These affect only modules built without a kernel (i.e, not buildkernel,
but yes buildworld if the dubious MODULES_WITH_WORLD is used).
Approved by: re (bz)
Sponsored by: Google Inc
the NFS subsystems use five of the rpcsec_gss/kgssapi entry points,
but since it was not obvious which others might be useful, all
nineteen were included. Basically the nineteen entry points are
set in a structure called rpc_gss_entries and inline functions
defined in sys/rpc/rpcsec_gss.h check for the entry points being
non-NULL and then call them. A default value is returned otherwise.
Requested by rwatson.
Reviewed by: jhb
MFC after: 2 weeks
cloned from the old NFS client, plus additions for NFSv4. A
review of this code is in progress, however it was felt by the
reviewer that it could go in now, before code slush. Any changes
required by the review can be committed as bug fixes later.
This is in no way a complete DFS/radar detection implementation!
It merely creates an abstracted interface which allows for future
development of the DFS radar detection code.
Note: Net80211 already handles the bulk of the DFS machinery,
all we need to do here is figure out that a radar event has occured
and inform it as such. It then drives the DFS state engine for us.
The "null" DFS radar detection module is included by default;
it doesn't require a device line.
This commit:
* Adds a simple abstracted layer for radar detection state -
sys/dev/ath/ath_dfs/;
* Implements a null DFS module which doesn't do anything;
(ie, implements the exact behaviour at the moment);
* Adds hooks to the ath driver to process received radar events
and gives the DFS module a chance to determine whether
a radar has been detected.
Obtained from: Atheros
filters working. (All other filters - switch without L2 info rewrite,
steer, and drop - were already fully-functional).
Some contrived examples of "switch" filters with L2 rewriting:
# cxgbetool t4nex0 iport 0 dport 80 action switch vlan +9 eport 3
Intercept all packets received on physical port 0 with TCP port 80 as
destination, insert a vlan tag with VID 9, and send them out of port 3.
# cxgbetool t4nex0 sip 192.168.1.1/32 ivlan 5 action switch \
vlan =9 smac aa:bb:cc:dd:ee:ff eport 0
Intercept all packets (received on any port) with source IP address
192.168.1.1 and VLAN id 5, rewrite the VLAN id to 9, rewrite source mac
to aa:bb:cc:dd:ee:ff, and send it out of port 0.
MFC after: 1 week
Some files keep the SUN4V tags as a code reference, for the future,
if any rewamped sun4v support wants to be added again.
Reviewed by: marius
Tested by: sbruno
Approved by: re
Reference code that shows how to get a packet's timestamp out of
cxgbe(4). Disabled by default because we don't have a standard way
today to pass this information up the stack.
The timestamp is 60 bits wide and each increment represents 1 tick of
the T4's core clock. As an example, the timestamp granularity is ~4.4ns
for this card:
# sysctl dev.t4nex.0.core_clock
dev.t4nex.0.core_clock: 228125
MFC after: 1 week
(reporting IFM_LOOP based on BMCR_LOOP is left in place though as
it might provide useful for debugging). For most mii(4) drivers it
was unclear whether the PHYs driven by them actually support
loopback or not. Moreover, typically loopback mode also needs to
be activated on the MAC, which none of the Ethernet drivers using
mii(4) implements. Given that loopback media has no real use (and
obviously hardly had a chance to actually work) besides for driver
development (which just loopback mode should be sufficient for
though, i.e one doesn't necessary need support for loopback media)
support for it is just dropped as both NetBSD and OpenBSD already
did quite some time ago.
- Let mii_phy_add_media() also announce the support of IFM_NONE.
- Restructure the PHY entry points to use a structure of entry points
instead of discrete function pointers, and extend this to include
a "reset" entry point. Make sure any PHY-specific reset routine is
always used, and provide one for lxtphy(4) which disables MII
interrupts (as is done for a few other PHYs we have drivers for).
This includes changing NIC drivers which previously just called the
generic mii_phy_reset() to now actually call the PHY-specific reset
routine, which might be crucial in some cases. While at it, the
redundant checks in these NIC drivers for mii->mii_instance not being
zero before calling the reset routines were removed because as soon
as one PHY driver attaches mii->mii_instance is incremented and we
hardly can end up in their media change callbacks etc if no PHY driver
has attached as mii_attach() would have failed in that case and not
attach a miibus(4) instance.
Consequently, NIC drivers now no longer should call mii_phy_reset()
directly, so it was removed from EXPORT_SYMS.
- Add a mii_phy_dev_attach() as a companion helper to mii_phy_dev_probe().
The purpose of that function is to perform the common steps to attach
a PHY driver instance and to hook it up to the miibus(4) instance and to
optionally also handle the probing, addition and initialization of the
supported media. So all a PHY driver without any special requirements
has to do in its bus attach method is to call mii_phy_dev_attach()
along with PHY-specific MIIF_* flags, a pointer to its PHY functions
and the add_media set to one. All PHY drivers were updated to take
advantage of mii_phy_dev_attach() as appropriate. Along with these
changes the capability mask was added to the mii_softc structure so
PHY drivers taking advantage of mii_phy_dev_attach() but still
handling media on their own do not need to fiddle with the MII attach
arguments anyway.
- Keep track of the PHY offset in the mii_softc structure. This is done
for compatibility with NetBSD/OpenBSD.
- Keep track of the PHY's OUI, model and revision in the mii_softc
structure. Several PHY drivers require this information also after
attaching and previously had to wrap their own softc around mii_softc.
NetBSD/OpenBSD also keep track of the model and revision on their
mii_softc structure. All PHY drivers were updated to take advantage
as appropriate.
- Convert the mebers of the MII data structure to unsigned where
appropriate. This is partly inspired by NetBSD/OpenBSD.
- According to IEEE 802.3-2002 the bits actually have to be reversed
when mapping an OUI to the MII ID registers. All PHY drivers and
miidevs where changed as necessary. Actually this now again allows to
largely share miidevs with NetBSD, which fixed this problem already
9 years ago. Consequently miidevs was synced as far as possible.
- Add MIIF_NOMANPAUSE and mii_phy_flowstatus() calls to drivers that
weren't explicitly converted to support flow control before. It's
unclear whether flow control actually works with these but typically
it should and their net behavior should be more correct with these
changes in place than without if the MAC driver sets MIIF_DOPAUSE.
Obtained from: NetBSD (partially)
Reviewed by: yongari (earlier version), silence on arch@ and net@
- 77115: Implement support for O_DIRECT.
- 98425: Fix a performance issue introduced in 70131 that was causing
reads before writes even when writing full blocks.
- 98658: Rename the BALLOC flags from B_* to BA_* to avoid confusion with
the struct buf B_ flags.
- 100344: Merge the BA_ and IO_ flags so so that they may both be used in
the same flags word. This merger is possible by assigning the IO_ flags
to the low sixteen bits and the BA_ flags the high sixteen bits.
- 105422: Fix a file-rewrite performance case.
- 129545: Implement IO_INVAL in VOP_WRITE() by marking the buffer as
"no cache".
- Readd the DOINGASYNC() macro and use it to control asynchronous writes.
Change i-node updates to honor DOINGASYNC() instead of always being
synchronous.
- Use a PRIV_VFS_RETAINSUGID check instead of checking cr_uid against 0
directly when deciding whether or not to clear suid and sgid bits.
Submitted by: Pedro F. Giffuni giffunip at yahoo
The AR9130 is an AR9160/AR5416 family WMAC which is glued directly
to the AR913x SoC peripheral bus (APB) rather than via a PCI/PCIe
bridge.
The specifics:
* A new build option is required to use the AR9130 - AH_SUPPORT_AR9130.
This is needed due to the different location the RTC registers live
with this chip; hopefully this will be undone in the future.
This does currently mean that enabling this option will break non-AR9130
builds, so don't enable it unless you're specifically building an image
for the AR913x SoC.
* Add the new probe, attach, EEPROM and PLL methods specific to Howl.
* Add a work-around to ah_eeprom_v14.c which disables some of the checks
for endian-ness and magic in the EEPROM image if an eepromdata block
is provided. This'll be fixed at a later stage by porting the ath9k
probe code and making sure it doesn't break in other setups (which
my previous attempt at this did.)
* Sprinkle Howl modifications throughput the interrupt path - it doesn't
implement the SYNC interrupt registers, so ignore those.
* Sprinkle Howl chip powerup/down throughout the reset path; the RTC methods
were
* Sprinkle some other Howl workarounds in the reset path.
* Hard-code an alternative setup for the AR_CFG register for Howl, that
sets up things suitable for Big-Endian MIPS (which is the only platform
this chip is glued to.)
This has been tested on the AR913x based TP-Link WR-1043nd mode, in
legacy, HT/20 and HT/40 modes.
Caveats:
* 2ghz has only been tested. I've not seen any 5ghz radios glued to this
chipset so I can't test it.
* AR5416_INTERRUPT_MITIGATION is not supported on the AR9130. At least,
it isn't implemented in ath9k. Please don't enable this.
* This hasn't been tested in MBSS mode or in RX/TX block-aggregation mode.
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
- To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
- Add some more details to UPDATING.
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.
Reviewed by: jhb
MFC after: 2 weeks
create reasonably large cache for the keys that is filled when
needed. The previous version was problematic for very large providers
(hundreds of terabytes or serval petabytes). Every terabyte of data
needs around 256kB for keys. Make the default cache limit big enough
to fit all the keys needed for 4TB providers, which will eat at most
1MB of memory.
MFC after: 2 weeks
In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
on amd64.
The changes are hidden under COMPAT_43.
MFC after: 1 month
Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.
Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.
Support for such popular metadata formats is now implemented:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.
Such RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.
For any all of these RAID levels and metadata formats this class supports
full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks. For Intel and Promise formats there is support multiple
volumes per disk set.
Look graid(8) manual page for additional details.
Co-authored by: imp
Sponsored by: Cisco Systems, Inc. and iXsystems, Inc.
Add systrace_linux32 and systrace_freebsd32 modules which provide
support for tracing compat system calls in addition to native system
call tracing provided by systrace module.
Provided that all the systrace modules are loaded now you can select
what syscalls to trace in the following manner:
syscall::xxx:yyy - work on all system calls that match the specification
syscall:freebsd:xxx:yyy - only native system calls
syscall:linux32:xxx:yyy - linux32 compat system calls
syscall:freebsd32:xxx:yyy - freebsd32 compat system calls on amd64
PR: kern/152822
Submitted by: Artem Belevich <fbsdlist@src.cx>
Reviewed by: jhb (earlier version)
MFC after: 3 weeks
Linux ath9k.
The ath9k ar9002_hw_init_cal() isn't entirely clear about what
is supposed to be called for what chipsets, so I'm ignoring the
rest of it and just porting the AR9285 init cal path as-is and
leaving the rest alone. Subsequent commits may also tidy up the
Merlin (AR9285) and other chipset support.
Obtained from: Linux ath9k
generally tidy up the TX power programming code.
Enforce that the TX power offset for Merlin is -5 dBm, rather than
any other value programmable in the EEPROM. This requires some
further code to be ported over from ath9k, so until that is done
and tested, fail to attach NICs whose TX power offset isn't -5
dBm.
This improves both legacy and HT transmission on my merlin board.
It allows for stable MCS TX up to MCS15.
Specifics:
* Refactor out a bunch of the TX power calibration code -
setting/obtaining the power detector / gain boundaries,
programming the PDADC
* Take the -5 dBm TX power offset into account on Merlin -
"0" in the per-rate TX power register means -5 dBm, not
0 dBm
* When doing OLC
* Enforce min (0) and max (AR5416_MAX_RATE_POWER) when fiddling
with the TX power, to avoid the TX power values from wrapping
when low.
* Implement the 1 dBm cck power offset when doing OLC
* Implement temperature compensation for 2.4ghz mode when doing OLC
* Implement an AR9280 specific TX power calibration routine which
includes the OLC twiddles, leaving the earlier chipset path
(AR5416, AR9160) alone
Whilst here, use these refactored routines for the AR9285 TX power
calibration/programming code and enforce correct overflow/underflow
handling when fiddling with TX power values.
Obtained from: linux ath9k
Few new things available from now on:
- Data deduplication.
- Triple parity RAIDZ (RAIDZ3).
- zfs diff.
- zpool split.
- Snapshot holds.
- zpool import -F. Allows to rewind corrupted pool to earlier
transaction group.
- Possibility to import pool in read-only mode.
MFC after: 1 month
The AR5416 and later TX descriptors have new fields for supporting
11n bits (eg 20/40mhz mode, short/long GI) and enabling/disabling
RTS/CTS protection per rate.
These functions will be responsible for initialising the TX descriptors
for the AR5416 and later chips for both HT and legacy frames.
Beacon frames will remain using the non-11n TX descriptor setup for now;
Linux ath9k does much the same.
Note that these functions aren't yet used anywhere; a few more framework
changes are needed before all of the right rate information is available
for TX.
algorithm described in the paper "Improved coexistence and loss tolerance for
delay based TCP congestion control" by Hayes and Armitage. It is implemented as
a kernel module compatible with the recently committed modular congestion
control framework.
CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide
tolerance to non-congestion related packet loss and improvements to coexistence
with loss-based congestion control algorithms. A key idea in improving
coexistence with loss-based congestion control algorithms is the use of a shadow
window, which attempts to track how NewReno's congestion window (cwnd) would
evolve. At the next packet loss congestion event, CHD uses the shadow window to
correct cwnd in a way that reduces the amount of unfairness CHD experiences when
competing with loss-based algorithms.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months
algorithm based on the paper "A strategy for fair coexistence of loss and
delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and
Baker. It is implemented as a kernel module compatible with the recently
committed modular congestion control framework.
HD uses a probabilistic approach to reacting to delay-based congestion. The
probability of reducing cwnd is zero when the queuing delay is very small,
increasing to a maximum at a set threshold, then back down to zero again when
the queuing delay is high. Normal operation keeps the queuing delay below the
set threshold. However, since loss-based congestion control algorithms push the
queuing delay high when probing for bandwidth, having the probability of
reducing cwnd drop back to zero for high delays allows HD to compete with
loss-based algorithms.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months
based on the paper "TCP Vegas: end to end congestion avoidance on a global
internet" by Brakmo and Peterson. It is implemented as a kernel module
compatible with the recently committed modular congestion control framework.
VEGAS uses network delay as a congestion indicator and unlike regular loss-based
algorithms, attempts to keep the network operating with stable queuing delays
and no congestion losses. By keeping network buffers used along the path within
a set range, queuing delays are kept low while maintaining high throughput.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months
There's two reasons for this:
* the raw and non-raw TX path shares a lot of duplicate code which should be
refactored;
* the 11n-ready chip TX path needs a little reworking.
Khelp/Hhook KPIs to hook into the TCP stack and maintain a per-connection, low
noise estimate of the instantaneous RTT. ERTT's implementation is robust even in
the face of delayed acknowledgements and/or TSO being in use for a connection.
A high quality, low noise RTT estimate is a requirement for applications such as
delay-based congestion control, for which we will be importing some algorithm
implementations shortly.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months
sys/dev/ath/ath_hal/ar5416/ is getting very crowded and further
commits will make it even more crowded. Now is a good time to
shuffle these files out before any more extensive work is done
on them.
Create an ar9003 directory whilst I'm here; ar9003 specific
chipset code will eventually live there.
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init(). Consistently define mem_range_softc from
mem.c for all platforms. Add missing #include guards for machine/memdev.h
and sys/memrange.h. Clean up some nearby style(9) nits.
MFC after: 1 month