CTL is a disk and processor device emulation subsystem originally written
for Copan Systems under Linux starting in 2003. It has been shipping in
Copan (now SGI) products since 2005.
It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
(who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
available under a BSD-style license. The intent behind the agreement was
that Spectra would work to get CTL into the FreeBSD tree.
Some CTL features:
- Disk and processor device emulation.
- Tagged queueing
- SCSI task attribute support (ordered, head of queue, simple tags)
- SCSI implicit command ordering support. (e.g. if a read follows a mode
select, the read will be blocked until the mode select completes.)
- Full task management support (abort, LUN reset, target reset, etc.)
- Support for multiple ports
- Support for multiple simultaneous initiators
- Support for multiple simultaneous backing stores
- Persistent reservation support
- Mode sense/select support
- Error injection support
- High Availability support (1)
- All I/O handled in-kernel, no userland context switch overhead.
(1) HA Support is just an API stub, and needs much more to be fully
functional.
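The implicit command ordering feature listed above can be pictured as a lookup keyed by (earlier command, later command). The following is a minimal sketch, not CTL's actual table: the command class names, the `Action` values and the conservative default are all assumptions made for illustration. Only the "read after mode select blocks" rule comes from the text.

```python
from enum import Enum


class Action(Enum):
    PASS = "pass"    # the new command may run alongside the pending one
    BLOCK = "block"  # the new command waits until the pending one completes


# Hypothetical command classes. The mode_select/read entry mirrors the
# example in the feature list; the read/read entry is an assumption.
SER_TABLE = {
    ("mode_select", "read"): Action.BLOCK,
    ("read", "read"): Action.PASS,
}


def check_serialization(pending, new_cmd):
    """Return BLOCK if any pending command forces new_cmd to wait."""
    for earlier in pending:
        # Pairs missing from the table default to BLOCK (assumed to be
        # the conservative choice for this sketch).
        if SER_TABLE.get((earlier, new_cmd), Action.BLOCK) is Action.BLOCK:
            return Action.BLOCK
    return Action.PASS
```

With this sketch, `check_serialization(["mode_select"], "read")` yields `Action.BLOCK`, matching the mode select example.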
ctl.c: The core of CTL. Command handlers and processing,
character driver, and HA support are here.
ctl.h: Basic function declarations and data structures.
ctl_backend.c,
ctl_backend.h: The basic CTL backend API.
ctl_backend_block.c,
ctl_backend_block.h: The block and file backend. This allows for using
a disk or a file as the backing store for a LUN.
Multiple threads are started to do I/O to the
backing device, primarily because the VFS API
requires that to get any concurrency.
ctl_backend_ramdisk.c: A "fake" ramdisk backend. It only allocates a
small amount of memory to act as a source and sink
for reads and writes from an initiator. Therefore
it cannot be used for any real data, but it can be
used to test for throughput. It can also be used
to test initiators' support for extremely large LUNs.
ctl_cmd_table.c: This is a table with all 256 possible SCSI opcodes,
and command handler functions defined for supported
opcodes.
ctl_debug.h: Debugging support.
ctl_error.c,
ctl_error.h: CTL-specific wrappers around the CAM sense building
functions.
ctl_frontend.c,
ctl_frontend.h: These files define the basic CTL frontend port API.
ctl_frontend_cam_sim.c: This is a CTL frontend port that is also a CAM SIM.
This frontend allows for using CTL without any
target-capable hardware. So any LUNs you create in
CTL are visible in CAM via this port.
ctl_frontend_internal.c,
ctl_frontend_internal.h:
This is a frontend port written for Copan to do
some system-specific tasks that required sending
commands into CTL from inside the kernel. This
isn't entirely relevant to FreeBSD in general,
but can perhaps be repurposed.
ctl_ha.h: This is a stubbed-out High Availability API. Much
more is needed for full HA support. See the
comments in the header and the description of what
is needed in the README.ctl.txt file for more
details.
ctl_io.h: This defines most of the core CTL I/O structures.
union ctl_io is conceptually very similar to CAM's
union ccb.
ctl_ioctl.h: This defines all ioctls available through the CTL
character device, and the data structures needed
for those ioctls.
ctl_mem_pool.c,
ctl_mem_pool.h: Generic memory pool implementation used by the
internal frontend.
ctl_private.h: Private data structures (e.g. CTL softc) and
function prototypes. This also includes the SCSI
vendor and product names used by CTL.
ctl_scsi_all.c,
ctl_scsi_all.h: CTL wrappers around CAM sense printing functions.
ctl_ser_table.c: Command serialization table. This defines what
happens when one type of command is followed by
another type of command.
ctl_util.c,
ctl_util.h: CTL utility functions, primarily designed to be
used from userland. See ctladm for the primary
consumer of these functions. These include CDB
building functions.
scsi_ctl.c: CAM target peripheral driver and CTL frontend port.
This is the path into CTL for commands from
target-capable hardware/SIMs.
README.ctl.txt: CTL code features, roadmap, to-do list.
usr.sbin/Makefile: Add ctladm.
ctladm/Makefile,
ctladm/ctladm.8,
ctladm/ctladm.c,
ctladm/ctladm.h,
ctladm/util.c: ctladm(8) is the CTL management utility.
It fills a role similar to camcontrol(8).
It allows configuring LUNs, issuing commands,
injecting errors and various other control
functions.
usr.bin/Makefile: Add ctlstat.
ctlstat/Makefile,
ctlstat/ctlstat.8,
ctlstat/ctlstat.c: ctlstat(8) fills a role similar to iostat(8).
It reports I/O statistics for CTL.
sys/conf/files: Add CTL files.
sys/conf/NOTES: Add device ctl.
sys/cam/scsi_all.h: To conform to more recent specs, the inquiry CDB
length field is now 2 bytes long.
Add several mode page definitions for CTL.
sys/cam/scsi_all.c: Handle the new 2 byte inquiry length.
sys/dev/ciss/ciss.c,
sys/dev/ata/atapi-cam.c,
sys/cam/scsi/scsi_targ_bh.c,
scsi_target/scsi_cmds.c,
mlxcontrol/interface.c: Update for 2 byte inquiry length field.
scsi_da.h: Add versions of the format and rigid disk pages
that are in a more reasonable format for CTL.
amd64/conf/GENERIC,
i386/conf/GENERIC,
ia64/conf/GENERIC,
sparc64/conf/GENERIC: Add device ctl.
i386/conf/PAE: The CTL frontend SIM at least does not compile
cleanly on PAE.
Sponsored by: Copan Systems, SGI and Spectra Logic
MFC after: 1 month
Implement ffcounter, a monotonically increasing cumulative counter on top of the
active timecounter. Provide low-level functions to read the ffcounter and
convert it to absolute time or a time interval in seconds using the current
ffclock estimates, which track the drift of the oscillator. Add a ring of
fftimehands to track passing of time on each kernel tick and pick up updates of
ffclock estimates.
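The conversion described above can be sketched with a simple linear model: a counter delta is multiplied by the estimated period of the oscillator. This is only an illustration under assumptions; the class and field names below are hypothetical and are not the kernel's data structures, and exact fractions stand in for the kernel's fixed-point arithmetic.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class FFClockEstimate:
    """Hypothetical snapshot of the clock estimates (names are assumed)."""
    update_ffcount: int    # counter value when the estimate was computed
    update_time: Fraction  # absolute time in seconds at that counter value
    period: Fraction       # estimated seconds per counter increment


def ffcounter_to_abstime(est, ffcount):
    """Absolute time: anchor time plus elapsed counter ticks times period."""
    return est.update_time + (ffcount - est.update_ffcount) * est.period


def ffcounter_to_interval(est, ffcount_delta):
    """Time interval in seconds covered by a counter difference."""
    return ffcount_delta * est.period
```

Because the counter only ever increases, two reads can always be subtracted to get a non-negative delta before converting it to seconds.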
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.
For more information, see http://www.synclab.org/radclock/
Submitted by: Julien Ridoux (jridoux at unimelb edu au)
allocator with UMA backed jumbo allocator by default. Previously
ti(4) used the sf_buf(9) interface for jumbo buffers, but it is
currently broken such that enabling jumbo frames causes an instant panic.
Due to the nature of sf_buf(9), it relies heavily on VM changes, and
it seems ti(4) did not receive much attention from VM gurus. I
don't understand the VM magic and its implications in the driver either.
Switching to a UMA backed jumbo allocator like other network drivers
will make jumbo frames work on ti(4).
While I'm here, fully allocate all RX buffers. This means ti(4) now
uses 512 RX buffers and 1024 mini RX buffers.
To use the sf_buf(9) interface for jumbo buffers, introduce a new
'options TI_SF_BUF_JUMBO'. If it is proven that sf_buf(9) is better
for jumbo buffers, interested developers can fix the issue in
the future.
ti(4) still needs more bus_dma(9) cleanups and should use a separate
DMA tag/map for each ring (standard, jumbo, mini, command, event,
etc.) but it should work on all platforms except PAE.
Special thanks to Jay[1] who provided complete remote debugging
access.
Tested by: Jay Borkenhagen <jayb <> braeburn dot org > [1]
all the architectures.
The option allows mounting non-MPSAFE filesystems. Without it, the
kernel will refuse to mount a non-MPSAFE filesystem.
This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.
No MFC is expected for this patch.
Tested by: gianni
Reviewed by: kib
replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel
configuration files. Besides duplicating functionality, amd(4), which
previously also supported the AMD Am53C974, unlike esp(4) is no longer
maintained and has accumulated enough bit rot over time to always cause
a panic during boot as long as at least one target is attached to it
(see PR 124667).
PR: 124667
Obtained from: NetBSD (based on)
MFC after: 3 days
take advantage of it instead of duplicating it. This reduces the size of
the i386 GENERIC kernel by about 4k. The only potential in-tree user left
unconverted is xe(4), which generally should be changed to use miibus(4)
instead of implementing PHY handling on its own, as otherwise it makes little
sense to add a dependency on miibus(4)/mii_bitbang(4) to xe(4) just
for the MII bitbang'ing code. The common MII bitbang'ing code also is
useful in the embedded space for using GPIO pins to implement MII access.
- Based on lessons learnt with dc(4) (see r185750), add bus barriers to the
MII bitbang read and write functions of the other drivers converted in
order to ensure the intended ordering. Given that register access via an
index register as well as register bank/window switching is subject to the
same problem, also add bus barriers to the respective functions of smc(4),
tl(4) and xl(4).
- Sprinkle some const.
Thanks to the following testers:
Andrew Bliznak (nge(4)), nwhitehorn@ (bm(4)), yongari@ (sis(4) and ste(4))
Thanks to Hans-Joerg Sirtl for supplying hardware to test stge(4).
Reviewed by: yongari (subset of drivers)
Obtained from: NetBSD (partially)
- Remove the SHOW_BUSYBUFS option and use a tunable to selectively
enable/disable the output. It defaults to printing nothing (value 0)
but can be set to print (value 1) or to print verbosely (value 2).
- Improve the information printed: until now the actual struct buf
object or vnode referenced by the shutdown process was not reported;
only the related struct bufobj object was printed, which is not
really helpful.
- Add more verbosity about the state of the struct buf lock and the
vnode information, with the latter to be activated separately by the
sysctl.
Sponsored by: Sandvine Incorporated
Reviewed by: emaste, kib
Approved by: re (ksmith)
MFC after: 10 days
A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.
New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.
When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.
This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.
Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
This is done per request/suggestion from John Baldwin
who introduced the option. Trying to resume normal
system operation after a panic is very unpredictable
and dangerous. It will become even more dangerous
when we allow a thread in panic(9) to penetrate all
lock contexts.
I understand that the only purpose of this option was
for testing scenarios potentially resulting in panic.
Suggested by: jhb
Reviewed by: attilio, jhb
X-MFC-After: never
Approved by: re (kib)
This patch is going to help in cases like mips flavours where you
want a more granular support on MAXCPU.
No MFC is planned for this patch.
Tested by: pluknet
Approved by: re (kib)
option that is highly recommended to be adjusted in too much
documentation while doing nothing in FreeBSD since r2729 (rev 1.1).
ipcs(1) needs to be recompiled as it is accessing _KERNEL private
variables.
Reviewed by: jhb (before comment change on linux code)
Sponsored by: Sandvine Incorporated
This option will enable Capsicum capabilities, which provide a fine-grained
mask on operations that can be performed on file descriptors.
Approved by: mentor (rwatson), re (Capsicum blanket ok)
Sponsored by: Google Inc
to do with global namespaces) and CAPABILITIES (which has to do with
constraining file descriptors). Just in case, and because it's a better
name anyway, let's move CAPABILITIES out of the way.
Also, change opt_capabilities.h to opt_capsicum.h; for now, this will
only hold CAPABILITY_MODE, but it will probably also hold the new
CAPABILITIES (implying constrained file descriptors) in the future.
Approved by: rwatson
Sponsored by: Google UK Ltd
This introduces all the underlying support for making this possible (via
the function cpusetobj_strscan()) and keeps ktr_cpumask exported. sparc64
implements its own assembly primitives for tracing events and needs to
properly check it. Anyway the sparc64 logic is not implemented yet due
to lack of knowledge (by me) and time (by marius), but it is just a
matter of using ktr_cpumask when possible.
Tested and fixed by: pluknet
Reviewed by: marius
device in /dev/ create a symbolic link with an adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
- To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
- Add some more details to UPDATING.
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them accordingly (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).
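The renaming rule above can be sketched as a small mapping function. This is a simplified illustration, not how the kernel assigns names: it assumes no tunables override the numbering, that only former ATA devices are present (in practice daY, cdY and saY numbers are shared with other CAM devices), and that the caller knows the detection order. The function name is hypothetical.

```python
import re

# Legacy prefix -> CAM prefix, as listed in the text above.
PREFIX_MAP = {"ad": "ada", "acd": "cd", "afd": "da", "ast": "sa"}


def map_legacy_names(detected):
    """Map legacy ATA device names to CAM-style names.

    `detected` lists legacy names in detection order. Each CAM device
    type gets its own sequential counter starting at 0.
    """
    counters = {}
    result = {}
    for name in detected:
        m = re.fullmatch(r"(acd|afd|ast|ad)(\d+)", name)
        if m is None:
            raise ValueError(f"not a legacy ATA device name: {name}")
        new_prefix = PREFIX_MAP[m.group(1)]
        unit = counters.get(new_prefix, 0)
        counters[new_prefix] = unit + 1
        result[name] = f"{new_prefix}{unit}"
    return result
```

Note that the old unit number is discarded: under these assumptions `ad4` and `ad6` become `ada0` and `ada1` if they are the first two disks detected, which is why /etc/fstab entries need checking.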
ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load the geom_raid kernel module and use the graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.
on the set of rules it maintains and the current resource usage. It also
provides a userland API to manage that ruleset.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)
and per-loginclass resource accounting information, to be used by the new
resource limits code. It's connected to the build, but the code that
actually calls the new functions will come later.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.
The following popular metadata formats are now supported:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.
The following RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.
For all of these RAID levels and metadata formats this class supports
the full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disk tracking,
hot-spare disks. For Intel and Promise formats there is support for multiple
volumes per disk set.
See the graid(8) manual page for additional details.
Co-authored by: imp
Sponsored by: Cisco Systems, Inc. and iXsystems, Inc.
compiled conditionally on options CAPABILITIES:
Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a
subject (typically a process) is in capability mode.
Add two new system calls, cap_enter(2) and cap_getmode(2), which allow
setting and querying (but never clearing) the flag.
Export the capability mode flag via process information sysctls.
Sponsored by: Google, Inc.
Reviewed by: anderson
Discussed with: benl, kris, pjd
Obtained from: Capsicum Project
MFC after: 3 months
The controller is commonly found on DM&P Vortex86 x86 SoC. The
driver supports all hardware features except flow control. The
flow control was intentionally disabled due to a silicon bug.
DM&P Electronics, Inc. provided all necessary information, including
a sample board to write the driver, and answered many questions I had.
Many thanks for their support of FreeBSD.
H/W donated by: DM&P Electronics, Inc.
has device mem in it almost everywhere, we get warnings about
duplicated device almost everywhere. Comment it out, with a note
about why, so that we don't get those warnings.
Keep three lines disabled which I am unsure if they had been used at all.
This will allow us to seek testers and possibly bring it all back.
Discussed with: rwatson
MFC after: 7 weeks
As des noted, the section on SCTP would benefit from
a rewrite by a native speaker (which I am not).
Any volunteers?
Approved by: des (mentor)
MFC after: 1 week
zones for each malloc bucket size. The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class. This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused. At this point inspection or memguard(9)
can be used to catch the offending code.
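The narrowing-down procedure described above can be sketched as a set intersection. This is a toy model under assumptions: the salted CRC32 hash, the function names and the way a corrupted class is reported are all hypothetical stand-ins, not the kernel's hash function or tooling.

```python
import zlib


def hash_class(type_name, salt, nzones):
    """Assign a malloc type to one of nzones classes; salt varies the hash."""
    return zlib.crc32(f"{salt}:{type_name}".encode()) % nzones


def suspects_after_runs(types, corrupted_by_salt, nzones):
    """Intersect, over all runs, the types hashed into the corrupted class.

    corrupted_by_salt maps the salt used in a run to the hash class that
    was observed to be corrupted in that run.
    """
    suspects = set(types)
    for salt, bad_class in corrupted_by_salt.items():
        suspects &= {t for t in types
                     if hash_class(t, salt, nzones) == bad_class}
    return suspects
```

Each run with a different salt shuffles the types into different classes, so innocent types that shared a class with the offender in one run tend to drop out in the next. The misused type always stays in the result, because it is in the corrupted class of every run.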
Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.
This code is based on work by Ron Steinke at Isilon Systems.
Reviewed by: -arch (mostly silence)
Reviewed by: zml
Approved by: zml (mentor)
passing through. Modifications are restricted to a subset of C language
operations on unsigned integers of 8, 16, 32 or 64 bit size.
These are: set to new value (=), addition (+=), subtraction (-=),
multiplication (*=), division (/=), negation (= -), bitwise AND (&=),
bitwise OR (|=), bitwise eXclusive OR (^=), shift left (<<=),
shift right (>>=). Several operations are all applied to a packet
sequentially in order they were specified by user.
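The operation list above can be modelled as a sequence of (operator, operand) pairs applied to a fixed-width unsigned field, with wraparound after every step. This is a sketch under assumptions: the function name and the string spelling of the operators are illustrative, negation is assumed to negate the current field value and ignore its operand, and operands are assumed to be non-negative.

```python
import operator

# Operator spelling follows the list in the text; "=-" is negation.
OPS = {
    "=": lambda cur, val: val,
    "+=": operator.add,
    "-=": operator.sub,
    "*=": operator.mul,
    "/=": operator.floordiv,  # raises ZeroDivisionError for operand 0
    "=-": lambda cur, val: -cur,
    "&=": operator.and_,
    "|=": operator.or_,
    "^=": operator.xor,
    "<<=": operator.lshift,
    ">>=": operator.rshift,
}


def apply_ops(value, width_bits, ops):
    """Apply ops in the order given to an unsigned field of width_bits."""
    if width_bits not in (8, 16, 32, 64):
        raise ValueError("width must be 8, 16, 32 or 64 bits")
    mask = (1 << width_bits) - 1
    value &= mask
    for op, operand in ops:
        # Masking after every step emulates C unsigned wraparound.
        value = OPS[op](value, operand) & mask
    return value
```

For example, `apply_ops(0xFF, 8, [("+=", 1)])` wraps around to 0, as an 8-bit unsigned addition would in C.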
Submitted by: Maxim Ignatenko <gelraen.ua at gmail.com>
Vadim Goncharov <vadimnuclight at tpu.ru>
Discussed with: net@
Approved by: mav (mentor)
MFC after: 1 month
The driver is a stub. It just creates a device entry and feeds
reassembled packets from the hardware into it.
If in the future we port wsmouse(4) from NetBSD, or make
sysmouse(4) support absolute motion events, then the driver
can be extended to act as a system mouse. Meanwhile, it just
presents a /dev/uep0 that can be used by an X driver, which
I am going to commit to the ports tree soon.
The name for the driver is chosen to be the same as in NetBSD,
however, due to different USB stacks this driver isn't a port.
driver for CAM ATA subsystem. This driver supports same hardware as
atamarvell, ataadaptec and atamvsata drivers from ata(4), but provides
many additional features, such as NCQ, PMP, etc.
that generates a fatal bus trap. Normally, the chips are set up to do
128 byte DMA bursts, but when on this CPU, they can only safely do
4-byte DMA bursts due to this bug. Details of the exact nature of the
bug are sketchy, but some can be found at
https://forum.openwrt.org/viewtopic.php?pid=70060 on pages 4, 5 and 6.
There's a small performance penalty associated with this workaround,
so it is only enabled when needed on the Atheros AR71xx platforms.
Unfortunately, this condition is impossible to detect at runtime
without MIPS specific ifdefs. Rather than cast an overly-broad net
like Linux/OpenWRT does (which enables this workaround all the time on
MIPS32 platforms), we put this option in the kernel for just the
affected machines. Sam didn't like this aspect of the patch when he
reviewed it, and I'd love to hear sane proposals on how to fix it :)
Reviewed by: sam@
This driver was written by Alexander Pohoyda and greatly enhanced
by Nikolay Denev. I don't have this hardware but the driver was
tested by Nikolay Denev and xclin.
Because SiS didn't release a data sheet for this controller, programming
information came from the Linux driver and OpenSolaris. Unlike other open
source drivers for SiS190/191, sge(4) takes full advantage of TX/RX
checksum offloading and does not require an additional copy operation in
the RX handler.
The controller seems to have advanced offloading features like VLAN
hardware tag insertion/stripping, TCP segmentation offload (TSO) as
well as jumbo frame support but these features are not available
yet. Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw>
who sent a fix for receiving VLAN oversized frames.
from standard 3G wireless units by supplying a raw IP/IPv6 endpoint rather than
using PPP over serial. uhsoctl(1) is used to initiate and close the WAN
connection.
Obtained from: Fredrik Lindberg <fli@shapeshifter.se>
While the name is pretentious, a good explanation of its targets is
reported in this 17 months old presentation e-mail:
http://lists.freebsd.org/pipermail/freebsd-arch/2008-August/008452.html
In order to implement it, the sq_type field in sleepqueues is now mandatory
and no longer compiled only with the INVARIANTS option. Additionally, a new
sleepqueue function, sleepq_type(), is added, returning the type of the
sleepqueue linked to a wchan.
Three new sysctls are added in order to configure the thread:
debug.deadlkres.slptime_threshold
debug.deadlkres.blktime_threshold
debug.deadlkres.sleepfreq
representing the thresholds for sleep and block time that will lead to
a deadlock match (when exceeded), while sleepfreq represents the
number of seconds between two consecutive runs of the thread.
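The threshold logic behind these sysctls can be sketched as a periodic scan over thread states. This is a simplified model under assumptions: the `ThreadState` record, the state names and the function are hypothetical, times are plain seconds, and the sketch only reports suspects rather than reproducing what the kernel thread does on a match.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Hypothetical per-thread record used only by this sketch."""
    name: str
    state: str  # "sleeping", "blocked" or "running"
    since: int  # time in seconds when the thread entered this state


def find_suspects(threads, now, slptime_threshold, blktime_threshold):
    """Return names of threads whose sleep or block time exceeds its limit."""
    suspects = []
    for td in threads:
        elapsed = now - td.since
        if td.state == "sleeping" and elapsed > slptime_threshold:
            suspects.append(td.name)
        elif td.state == "blocked" and elapsed > blktime_threshold:
            suspects.append(td.name)
    return suspects
```

In this model, a scan like `find_suspects(...)` would be repeated every sleepfreq seconds.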
In order to enable the deadlock resolver thread recompile your kernel
with the option DEADLKRES.
Reviewed by: jeff
Tested by: pho, Giovanni Trematerra
Sponsored by: Nokia Incorporated, Sandvine Incorporated
MFC after: 2 weeks
Introduce ATA_CAM kernel option, turning ata(4) controller drivers into
cam(4) interface modules. When enabled, this option deprecates all ata(4)
peripheral drivers (ad, acd, ...) and interfaces and allows cam(4) drivers
(ada, cd, ...) and interfaces to be natively used instead.
As a side effect of this, ata(4) mode setting code was completely rewritten
to make the controller API more strict and permit the above change. While doing
this, SATA revision was separated from PATA mode. It allows DMA-incapable
SATA devices to operate and makes hw.ata.atapi_dma tunable work again.
Also allow ata(4) controller drivers (except some specific or broken ones)
to handle larger data transfers. Previous constraint of 64K was artificial
and is not really required by PCI ATA BM specification or hardware.
Submitted by: nwhitehorn (powerpc part)
Right now syscons(4) uses a cons25-style terminal emulator. The
disadvantages of that are:
- Little compatibility with embedded devices with serial interfaces.
- Bad bandwidth efficiency, mainly because of the lack of scrolling
regions.
- A very hard transition path to support for modern character sets like
UTF-8.
Our terminal emulation library, libteken, has been supporting
xterm-style terminal emulation for months, so flip the switch and make
everyone use an xterm-style console driver.
I still have to enable this on i386. Right now pc98 and i386 share the
same /etc/ttys file. I'm not going to switch pc98, because it uses its
own Kanji-capable cons25 emulator.
IMPORTANT: What to do if things go wrong (i.e. graphical artifacts):
- Run the application inside script(1), try to reduce the problem and
send me the log file.
- In the mean time, you can run `vidcontrol -T cons25' and `export
TERM=cons25' so you can run applications the same way you did before.
You can also build your kernel with `options TEKEN_CONS25' to make all
virtual terminals use the cons25 emulator by default.
Discussed on: current@
splitting in bce(4) instead of (ab)using ZERO_COPY_SOCKETS that was not
propagated into if_bce.c anyway. It is disabled by default.
Approved by: davidch
MFC after: 3 days
x86emu to this new module.
This changeset also brings a fix for bugs introduced with the initial
x86emu commit, which prevented the user from using some display modes or
caused instant reboots during mode switch.
Submitted by: paradox <ddkprog yahoo com>
driver still, it generally works well for most people most of the
time. It is still too green for GENERIC, however.
Submitted by: many (latest being kwm@)
MFC after: 2 days (before RC1 if possible)
things a bit:
- use dpcpu data to track the ifps with packets queued up,
- per-cpu locking and driver flags
- along with .nh_drainedcpu and NETISR_POLICY_CPU.
- Put the mbufs in flight reference count, preventing interfaces
from going away, under INVARIANTS as this is a general problem
of the stack and should be solved in if.c/netisr but still good
to verify the internal queuing logic.
- Permit changing the MTU to virtually everything like we do for loopback.
Hook epair(4) up to the build.
Approved by: re (kib)
net80211 wireless stack. This work is based on the March 2009 D3.0 draft
standard. This standard is expected to become final next year.
This includes two main net80211 modules, ieee80211_mesh.c
which deals with peer link management, link metric calculation,
routing table control and mesh configuration and ieee80211_hwmp.c
which deals with the actual routing process on the mesh network.
HWMP is the routing protocol mandated by the mesh standard, but
others, such as RA-OLSR, can be implemented.
Authentication and encryption are not implemented.
There are several scripts under tools/tools/net80211/scripts that can be
used to test different mesh network topologies and they also teach you
how to set up a mesh vap (for the impatient: ifconfig wlan0 create
wlandev ... wlanmode mesh).
A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled
by default on GENERIC kernels for i386, amd64, sparc64 and pc98.
Drivers that support mesh networks right now are: ath, ral and mwl.
More information at: http://wiki.freebsd.org/WifiMesh
Please note that this work is experimental. Also, please note that
bridging a mesh vap with another network interface is not yet supported.
Many thanks to the FreeBSD Foundation for sponsoring this project and to
Sam Leffler for his support.
Also, I would like to thank Gateworks Corporation for sending me a
Cambria board which was used during the development of this project.
Reviewed by: sam
Approved by: re (kensmith)
Obtained from: projects/mesh11s
require COMPAT_FREEBSD7. Also, explicitly note in NOTES that any version
of COMPAT_FREEBSD<n> effectively requires the corresponding options for newer binaries (i.e.
COMPAT_FREEBSD<n+1>, etc.). While this has been true in practice
previously, it used to compile ok before the commit earlier this week.
Discussed with: peter
Approved by: re (kensmith)
DP83065 Saturn Gigabit Ethernet controllers. These are the successors
of the Sun GEM controllers and still have a similar but extended transmit
logic. As such this driver is based on gem(4).
Thanks to marcel@ for providing a Sun Quad GigaSwift Ethernet UTP (QGE)
card which was vital for getting this driver to work on architectures
not using Open Firmware.
Approved by: re (kib)
MFC after: 2 weeks
Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry
Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele
(julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense
team, and all people who used / tried the NAT-T patch for years and
reported bugs, patches, etc...
X-MFC: never
Reviewed by: bz
Approved by: gnn(mentor)
Obtained from: NETASQ
controller. These controllers are also known as L1C(AR8131) and
L2C(AR8132) respectively. These controllers resemble the first
generation controller L1, but the use of a different descriptor format
and new register mappings over the L1 register space require a new
driver. There are a couple of registers I still don't understand
but the driver seems to have no critical issues for performance and
stability. Currently alc(4) supports the following hardware
features.
o MSI
o TCP Segmentation offload
o Hardware VLAN tag insertion/stripping
o Tx/Rx interrupt moderation
o Hardware statistics counters (dev.alc.%d.stats)
o Jumbo frame
o WOL
AR8131/AR8132 also supports Tx checksum offloading but I disabled
it due to stability issues. I'm not sure this comes from broken
sample boards or hardware bugs. If you know your controller works
without problems you can still enable it. The controller has a
silicon bug for Rx checksum offloading, so the feature was not
implemented.
I'd like to say big thanks to Atheros. Atheros kindly sent sample
boards to me and answered several questions I had.
HW donated by: Atheros Communications, Inc.
with OpenBSD (and BSD/OS originally). We can't easily make it a SOL_SOCKET option
as there is no more space for more SOL_SOCKET options, but this option also
fits better as an IP socket option, it seems.
- Implement this functionality also for IPv6 and RAW IP sockets.
- Always compile it in (don't use additional kernel options).
- Remove sysctl to turn this functionality on and off.
- Introduce a new privilege, PRIV_NETINET_BINDANY, which allows the use of this
functionality (currently only unjailed root can use it).
Discussed with: julian, adrian, jhb, rwatson, kmacy
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.
Additionally, implement adaptive spinning for sx locks held in shared mode.
The spinning limit can be handled through sysctls so that it can be tuned
until the code reaches the release, after which time they should
probably be dropped.
This change has been made necessary by recent benchmarks, where it
improves concurrency of workloads in the presence of high contention
(e.g. ZFS).
KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.
Requested by: jeff, kmacy
Reviewed by: jeff
Tested by: pho
includes support for NFSv4. The subsystem can optionally be linked
into the kernel using the two options:
NFSCL - the client
NFSD - the server
It is also built as three modules:
nfscl - the client
nfsd - the server
nfscommon - functions shared by the client and server
Approved by: kib (mentor)
get a quick snapshot of the kernel's symbol table including the symbols
from any loaded modules (the symbols are all merged into one symbol
table). Unlike other implementations, this ksyms driver maps
memory in the process memory space to store the snapshot at the time
/dev/ksyms is opened. It also checks whether the process already has
a snapshot open and won't allow it to open /dev/ksyms again until it
closes the first one. This prevents kernel and process memory from being
exhausted. Note that /dev/ksyms is used by the lockstat(1) command.
Reviewed by: gallatin kib (freebsd-arch)
Approved by: gnn (mentor)
kernel option.
This also permits tuning of the option per virtual network stack, as
well as separately per inet, inet6.
The kernel option is left for a transition period, marked deprecated,
and will be removed soon.
Initially requested by: phk (1 year 1 day ago)
MFC after: 4 weeks
as well as providing stateful load balancing when used with RADIX_MPATH.
- Currently compiled in to i386 and amd64 but disabled by default, it can be enabled at
runtime with 'sysctl net.inet.flowtable.enable=1'.
- Embedded users can remove it entirely from the kernel by adding 'nooption FLOWTABLE' to
their kernel config files.
- A minimal hookup will be added to ip_output in a subsequent commit. I would like to see
more review before bringing in changes that require more churn.
Supported by: Bitgravity Inc.
naming of the partitions (GEOM_PART_EBR_COMPAT). When
compatibility is enabled, changes to the partitioning are
disallowed.
Remove the device name aliasing added previously to provide
backward compatibility, but which in practice doesn't give
us anything.
Enable compatibility on amd64 and i386.
driver in Linux 2.6. uscanner was just a simple wrapper around a fifo and
contained no logic; the default interface is now libusb (supported by sane).
Reviewed by: HPS
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.
Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:
if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd
Drivers that were already disabled because of tty changes:
if_ppp
if_sl
Discussed on: arch@
very well maintained and point user to sysutils/fusefs-ntfs, which
at the time of this writing seems to be a better alternative.
Suggested by: luigi
MFC after: 2 weeks
The teken library already supports UTF-8 handling and xterm emulation,
but we have reasons to disable this right now. Because we should make it
easy and interesting for people to experiment with these features, allow
them to be set in kernel configuration files.
Before this commit we had a flag called `TEKEN_CONS25' to enable
cons25-style emulation. I'm calling it the opposite now, `TEKEN_XTERM',
because we want to enable it in kernel configuration files explicitly.
Requested by: kib
applications to specify a non-local IP address when bind()'ing a socket
to a local endpoint.
This allows applications to spoof the client IP address of connections
if (obviously!) they somehow are able to receive the traffic normally
destined to said clients.
This patch doesn't include any changes to ipfw or the bridging code to
redirect the client traffic through the PCB checks so TCP gets a shot
at it. The normal behaviour is that packets with a non-local destination
IP address are not handled locally. This can be dealt with using some IPFW
hackery; modifications to IPFW to make this less hacky will occur in
subsequent commits.
Thanks to Julian Elischer and others at Ironport. This work was approved
and donated before Cisco acquired them.
Obtained from: Julian Elischer and others
MFC after: 2 weeks
module. These files cause manual interaction when building
ports/audio/aureal-kmod which provides a usable i386-only driver (it requires
linking against some Linux object files distributed by a vendor which went
bankrupt back in 2000).
MFC after: 1 week
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code.
The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.
Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:
- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improve/refactor the code, and
provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
me maintain that branch before the svn conversion
controller. The controller is also known as L1E(AR8121) and
L2E(AR8113/AR8114). Unlike its predecessor Attansic L1,
AR8121/AR8113/AR8114 uses completely different Rx logic such that
it requires a separate driver. A datasheet for AR81xx is not available
to open source driver writers but it shares a large part of the Tx and
PHY logic of L1. I still don't understand the meaning of some registers
and some MAC statistics counters but the driver seems to
have no critical issues for performance and stability.
The AR81xx requires a copy operation to pass received frames to the upper
stack such that ale(4) consumes more CPU cycles than other
controllers. Working around a couple of known silicon bugs also costs
additional CPU cycles. However, if you have a fast
CPU you can still saturate the link.
Currently ale(4) supports the following hardware features.
- MSI.
- TCP Segmentation offload.
- Hardware VLAN tag insertion/stripping with checksum offload.
- Tx TCP/UDP checksum offload and Rx IP/TCP/UDP checksum offload.
- Tx/Rx interrupt moderation.
- Hardware statistics counters.
- Jumbo frame.
- WOL.
AR81xx PCIe ethernet controllers are mainly found on ASUS EeePC or
P5Q series of ASUS motherboards. Special thanks to Jeremy Chadwick
who sent the hardware to me. Without his donation writing a driver
for AR81xx would never have been possible. Big thanks to all people
who reported feedback or tested patches.
HW donated by: koitsu
Tested by: bsam, Joao Barros <joao.barros <> gmail DOT com >
Jan Henrik Sylvester <me <> janh DOT de >
Ivan Brawley < ivan <> brawley DOT id DOT au >,
CURRENT ML
Because the TTY hooks interface was not finished when I imported the
MPSAFE TTY layer, I had to disconnect the snp(4) driver. This snp(4)
implementation has been sitting in my P4 branch for some time now.
Unfortunately it still doesn't use the same error handling as snp(4)
(returning codes through FIONREAD), but it should already be usable.
I'm committing this to SVN, hoping someone else could polish off its
rough edges. It's always better than having a broken driver sitting in
the tree.
compiled into the main AMR driver. It's code that is nice to have but not
required for normal operation, and it is reported to cause problems for some
people.
Driver supports PCI devices with class 8 and subclass 5 according to
SD Host Controller Specification.
Update NOTES, enable module and static build.
Enable related mmc and mmcsd modules build.
Discussed on: mobile@, current@
This was located in the ubsa driver, but should be moved into a separate
driver:
- 3G modems provide multiple serial ports to allow AT commands while the PPP
connection is up.
- 3G modems do not provide baud rate or other serial port settings.
- Huawei cards need specific initialisation.
- ubsa is for Belkin adapters, a Linuxy choice for another device like 3G.
Speeds achieved here with a weak signal are at best ~40kb/s (UMTS). No spooky
STALLED messages either.
Next: Move over all entries for Sierra and Novatel cards once I have found
testers, and implemented serial port enumeration for Sierra (or rather have
Andrea Guzzo do it). They list all endpoints in 1 iface instead of 4 ifaces.
Submitted by: aguzzo@anywi.com
MFC after: 3 weeks
we ran into in the past where places hidden by TCP_SIGNATURE were
missed.
It is possible to turn it on now that FAST_IPSEC (now known as IPSEC)
is enabled for LINT and the default and only IPsec implementation.
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:
- Improved driver model:
The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.
If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.
- Improved hotplugging:
With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).
The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.
- Improved performance:
One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.
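The two-step destruction described under "Improved hotplugging" above can
be sketched in plain user-space C roughly as follows. This is only an
illustrative model, not the TTY layer's KPI: struct model_tty,
model_tty_enter(), model_tty_leave(), model_tty_abandon() and the free_cb
callback are hypothetical names, and all locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a TTY; not the real struct tty. */
struct model_tty {
	int	users;		/* threads currently inside the TTY */
	bool	gone;		/* step 1 done: driver abandoned the TTY */
	bool	freed;		/* step 2 done: driver callback has run */
	void	(*free_cb)(struct model_tty *);	/* driver cleanup routine */
};

/* Step 2: runs only once the TTY is abandoned and no thread is left. */
void
model_tty_try_free(struct model_tty *tp)
{
	if (tp->gone && tp->users == 0 && !tp->freed) {
		tp->freed = true;
		if (tp->free_cb != NULL)
			tp->free_cb(tp);	/* e.g. release a unit number */
	}
}

/* A thread enters the TTY; this fails once the device is gone. */
bool
model_tty_enter(struct model_tty *tp)
{
	if (tp->gone)
		return (false);
	tp->users++;
	return (true);
}

/* A thread leaves the TTY; the last one out triggers the cleanup. */
void
model_tty_leave(struct model_tty *tp)
{
	assert(tp->users > 0);
	tp->users--;
	model_tty_try_free(tp);
}

/* Step 1: the driver abandons the TTY (e.g. hardware was unplugged). */
void
model_tty_abandon(struct model_tty *tp)
{
	tp->gone = true;
	model_tty_try_free(tp);
}
```

In this model, abandoning a TTY that still has a thread inside only marks
it as gone; the driver's cleanup callback runs when that thread leaves.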
Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.
Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan
As clearly mentioned on the mailing lists, there is a list of drivers
that have not been ported to the MPSAFE TTY layer yet. Remove them from
the kernel configuration files. This means people can now still use
these drivers if they explicitly put them in their kernel configuration
file, which is good.
People should keep in mind that after August 10, these drivers will not
work anymore. Even though owners of the hardware are capable of getting
these drivers working again, I will see if I can at least get them to a
compilable state (if time permits).
MPSAFE patches on current@ and stable@. This driver also has a fundamental
issue in that it sleeps when sending commands to the card including in the
if_init/if_start routines (which can be called from interrupt context). As
such, the driver shouldn't be working reliably even on 4.x.
and stable@. It also is a driver for an older non-802.11 wireless PC card
that is quite slow in comparison to say, wi(4). I know Warner wants this
driver axed as well.
NET_NEEDS_GIANT. netatm has been disconnected from the build for ten
months in HEAD/RELENG_7. Specifics:
- netatm include files
- netatm command line management tools
- libatm
- ATM parts in rescue and sysinstall
- sample configuration files and documents
- kernel support as a module or in NOTES
- netgraph wrapper nodes for netatm
- ctags data for netatm.
- netatm-specific device drivers.
MFC after: 3 weeks
Reviewed by: bz
Discussed with: bms, bz, harti
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
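As a rough illustration of the array change just described, here is a
small user-space sketch. It is not the kernel code: the model_ names, the
family numbers and the contents of struct model_fib_head are made up for
the example; only the rt_tables[Y][X] shape and the 16-table limit come
from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define	MODEL_NFIBS	16	/* first commit is limited to 16 tables */
#define	MODEL_AF_MAX	4	/* hypothetical number of families */
#define	MODEL_AF_INET	2	/* hypothetical index used for IPv4 */

/* Stand-in for a per-family FIB head structure. */
struct model_fib_head {
	int	fibnum;
	int	family;
};

/* Was rt_tables[X]; now rt_tables[Y][X], with Y the FIB number. */
struct model_fib_head *model_rt_tables[MODEL_NFIBS][MODEL_AF_MAX];

/* FIB-aware lookup: only IPv4 may use a row other than 0. */
struct model_fib_head *
model_table_for(int fibnum, int family)
{
	assert(fibnum >= 0 && fibnum < MODEL_NFIBS);
	assert(family >= 0 && family < MODEL_AF_MAX);
	if (family != MODEL_AF_INET)
		fibnum = 0;
	return (model_rt_tables[fibnum][family]);
}

/* Old-style view: FIB-unaware code sees row 0 as a plain 1-D array. */
struct model_fib_head **
model_legacy_tables(void)
{
	return (model_rt_tables[0]);
}
```

Because row 0 has the same layout as the old one dimensional array, code
using model_legacy_tables() needs no knowledge of the other rows.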
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
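The relationship between the three kinds of entry points can be modelled
as below. This is a hedged sketch only: the sk_ functions merely record
which row a lookup would use, their signatures are invented for the
example and do not match the real rtalloc(), in_rtalloc() or
rtalloc_fib(), and the family numbers are hypothetical.

```c
#include <assert.h>

#define	SK_NFIBS	16	/* limit taken from the text above */
#define	SK_AF_INET	2	/* hypothetical IPv4 family number */
#define	SK_AF_OTHER	7	/* hypothetical non-IPv4 family */

/* Records which row a lookup ended up using; stands in for a route. */
struct sk_result {
	int	row;
	int	family;
};

/* Legacy entry point, in the role of rtalloc(): always the first row. */
struct sk_result
sk_rtalloc(int family)
{
	struct sk_result r = { 0, family };

	return (r);
}

/* IPv4-only entry point, in the role of in_rtalloc(): caller picks a row. */
struct sk_result
sk_in_rtalloc(unsigned int fibnum)
{
	struct sk_result r = { (int)fibnum, SK_AF_INET };

	assert(fibnum < SK_NFIBS);
	return (r);
}

/* Protocol-independent entry point, in the role of rtalloc_fib(). */
struct sk_result
sk_rtalloc_fib(int family, unsigned int fibnum)
{
	if (family != SK_AF_INET)
		return (sk_rtalloc(family));	/* forced to row 0 */
	return (sk_in_rtalloc(fibnum));
}
```

A caller that does not know the address family can always use the
protocol-independent wrapper; non-IPv4 lookups silently land in row 0.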
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
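The inheritance and override rules from cases 1 to 3 above can be
summarised in a small user-space model. Everything here is hypothetical
(the pm_ structures and functions are invented for illustration and are
not kernel interfaces); it only mirrors the order of precedence described
in the list.

```c
#include <assert.h>
#include <stddef.h>

struct pm_proc   { int fib; };	/* per-process default FIB */
struct pm_socket { int fib; };	/* FIB used by a socket/PCB */
struct pm_packet { int fib; };	/* FIB a packet will be routed with */

/* fork: the child process inherits the parent's FIB. */
struct pm_proc
pm_fork(const struct pm_proc *parent)
{
	struct pm_proc child = { parent->fib };

	return (child);
}

/* Socket creation: the socket inherits the FIB of its process. */
struct pm_socket
pm_socket_create(const struct pm_proc *p)
{
	struct pm_socket so = { p->fib };

	return (so);
}

/* Socket option: change the FIB of one socket only. */
void
pm_socket_setfib(struct pm_socket *so, int fib)
{
	assert(fib >= 0);
	so->fib = fib;
}

/* Case 1 (so != NULL, locally generated) or case 2 (forwarded, FIB 0). */
struct pm_packet
pm_packet_init(const struct pm_socket *so)
{
	struct pm_packet m = { so != NULL ? so->fib : 0 };

	return (m);
}

/* Case 3: a packet classifier overrides whatever was set before. */
void
pm_classify(struct pm_packet *m, int fib)
{
	assert(fib >= 0);
	m->fib = fib;
}
```

For instance, a packet sent from a socket created by a process running
under FIB 3 starts out with FIB 3, a forwarded packet starts with FIB 0,
and either can still be reassigned by the classifier.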
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
to profile outgoing packets for a number of mbuf chain
related parameters
e.g. number of mbufs, wasted space.
probably will do with further work later.
Reviewed by: various
Note this includes changes to all drivers and moves some device firmware
loading to use firmware(9) and a separate module (e.g. ral). Also there
no longer are separate wlan_scan* modules; this functionality is now
bundled into the wlan module.
Supported by: Hobnob and Marvell
Reviewed by: many
Obtained from: Atheros (some bits)
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.
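The reply de-multiplexing mentioned in the first highlight above can be
pictured with the following user-space sketch. It assumes replies are
matched to waiting threads by the RPC transaction ID (xid); the dm_ names,
the fixed-size table and the integer "waiter" handle are hypothetical
simplifications, and the locking a real thread-safe client needs is
omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	DM_MAXPENDING	8	/* hypothetical table size */

/* One outstanding request: which thread waits for which transaction. */
struct dm_pending {
	uint32_t xid;		/* RPC transaction ID of the request */
	int	 waiter;	/* stand-in for the sleeping thread */
	int	 in_use;
};

struct dm_client {
	struct dm_pending slots[DM_MAXPENDING];
};

/* Register a request before sending it; returns -1 if the table is full. */
int
dm_register(struct dm_client *cl, uint32_t xid, int waiter)
{
	assert(cl != NULL);
	for (size_t i = 0; i < DM_MAXPENDING; i++) {
		if (!cl->slots[i].in_use) {
			cl->slots[i].xid = xid;
			cl->slots[i].waiter = waiter;
			cl->slots[i].in_use = 1;
			return (0);
		}
	}
	return (-1);
}

/*
 * Called for each reply read from the shared socket: returns the waiter
 * whose xid matches and retires its slot, or -1 for an unexpected reply.
 */
int
dm_deliver(struct dm_client *cl, uint32_t xid)
{
	assert(cl != NULL);
	for (size_t i = 0; i < DM_MAXPENDING; i++) {
		if (cl->slots[i].in_use && cl->slots[i].xid == xid) {
			cl->slots[i].in_use = 0;
			return (cl->slots[i].waiter);
		}
	}
	return (-1);
}
```

Replies may arrive in any order; each one is handed to the thread that
registered the matching xid, and a duplicate or stray reply is dropped.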
Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks
and t3_push_frames).
- Import latest changes to cxgb_main.c and cxgb_sge.c from toestack p4 branch
- make driver local copy of tcp_subr.c and tcp_usrreq.c and override tcp_usrreqs so
TOE can also function on versions with unmodified TCP
- add cxgb back to the build
o Disklabels can have between 8 and 20 partitions (inclusive).
o No device special file is created for the raw partition.
o Switch ia64 to use this backend.
o No support for boot code yet.
INCLUDE_CONFIG_FILE. Point the user at what config(8) actually does,
and how one can fetch the actual configuration file.
Reported by: many
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).
Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.
Update stack(9) man page.
Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)
Currently, Giant is not too heavily contended, so it is ok to treat it
like any other mutex.
Please don't forget to update your own custom kernel config files.
Approved by: cognet, marcel (maintainers of arches where option is
not enabled at the moment)
to gem_attach() as the former access softc members not yet initialized
at that time and gem_reset() actually is enough to stop the chip. [1]
o Revise the use of gem_bitwait(); add bus_barrier() calls before calling
gem_bitwait() to ensure the respective bit has been written before we
start polling on it and poll for the right bits to change, e.g. even
though we only reset RX we have to actually wait for both GEM_RESET_RX
and GEM_RESET_TX to clear. Add some additional gem_bitwait() calls in
places we've been missing them according to the GEM documentation.
Along with this some excessive DELAYs, which probably only were added
because of bugs in gem_bitwait() and its use in the first place, as
well as half of a gem_bitwait() reimplementation in gem_reset_tx()
were removed.
o Add gem_reset_rxdma() and use it to deal with GEM_MAC_RX_OVERFLOW errors
more gracefully as unlike gem_init_locked() it resets the RX DMA engine
only, causing no link loss and the FIFOs not to be cleared. Also use it
to deal with GEM_INTR_RX_TAG_ERR errors, which previously were unhandled.
This was based on information obtained from the Linux GEM and OpenSolaris
ERI drivers.
o Turn on workarounds for silicon bugs in the Apple GMAC variants.
This was based on information obtained from the Darwin GMAC and Linux GEM
drivers.
o Turn on "infinite" (i.e. maximum 31 * 64 bytes in length) DMA bursts.
This greatly improves especially RX performance.
o Optimize the RX path, this consists of:
- kicking the receiver as soon as we've a spare descriptor in gem_rint()
again instead of just once after all the ready ones have been handled;
- kicking the receiver the right way, i.e. as outlined in the GEM
documentation in batches of 4 and by pointing it to the descriptor
after the last valid one;
- calling gem_rint() before gem_tint() in gem_intr() as gem_tint() may
take quite a while;
- doubling the size of the RX ring to 256 descriptors.
Overall the RX performance of a GEM in a 1GHz Sun Fire V210 was improved
from ~100Mbit/s to ~850Mbit/s.
o In gem_add_rxbuf() don't assign the newly allocated mbuf to rxs_mbuf
before calling bus_dmamap_load_mbuf_sg(); if bus_dmamap_load_mbuf_sg()
fails we'll free the newly allocated mbuf and be unable to recycle the
previous one, ending up with a NULL pointer dereference instead.
o In gem_init_locked() honor the return value of gem_meminit().
o Simplify gem_ringsize() and don't return garbage in the default case.
Based on OpenBSD.
o Don't turn on MAC control, MIF and PCS interrupts unless GEM_DEBUG is
defined as we don't need/use these interrupts for operation.
o In gem_start_locked() sync the DMA maps of the descriptor rings before
every kick of the transmitter and not just once after enqueuing all
packets as the NIC might instantly start transmitting after we kicked
it the first time.
o Keep state of the link state and use it to enable or disable the MAC
in gem_mii_statchg() accordingly as well as to return early from
gem_start_locked() in case the link is down. [3]
o Initialize the maximum frame size to a sane value.
o In gem_mii_statchg() enable carrier extension if appropriate.
o Increment if_ierrors in case of a GEM_MAC_RX_OVERFLOW error and in
gem_eint(). [3]
o Handle IFF_ALLMULTI correctly; don't set it if we've turned promiscuous
group mode on and don't clear the flag if we've disabled promiscuous
group mode (these were mostly NOPs though). [2]
o Let gem_eint() also report GEM_INTR_PERR errors.
o Move setting sc_variant from gem_pci_probe() to gem_pci_attach() as
device probe methods are not supposed to touch the softc.
o Collapse sc_inited and sc_pci into bits for sc_flags.
o Add CTASSERTs ensuring that GEM_NRXDESC and GEM_NTXDESC are set to
legal values.
o Correctly set up for 802.3x flow control, though #ifdef out the code
that actually enables it as this needs more testing and mainly a proper
framework to support it.
o Correct and add some conversions from hard-coded function names to
__func__ which were borked or forgotten in if_gem.c rev. 1.42.
o Use PCIR_BAR instead of a homegrown macro.
o Replace sc_enaddr[6] with sc_enaddr[ETHER_ADDR_LEN].
o In gem_pci_attach() in case attaching fails release the resources in
the opposite order they were allocated.
o Make gem_reset() static to if_gem.c as it's not needed outside that
module.
o Remove the GEM_GIGABIT flag and the associated code; GEM_GIGABIT was
never set and the associated code was in the wrong place.
o Remove sc_mif_config; it was only used to cache the contents of the
respective register within gem_attach().
o Remove the #ifdef'ed out NetBSD/OpenBSD code for establishing a suspend
hook as it will never be used on FreeBSD.
o Also probe Apple Intrepid 2 GMAC and Apple Shasta GMAC, add support for
Apple K2 GMAC. Based on OpenBSD.
o Add support for Sun GBE/P cards, or in other words actually add support
for cards based on GEM to gem(4). This mainly consists of adding support
for the TBI of these chips. Along with this the PHY selection code was
rewritten to hardcode the PHY number for certain configurations as for
example the PHY of the on-board ERI of Blade 1000 shows up twice causing
no link as the second incarnation is isolated.
These changes were ported from OpenBSD with some additional improvements
and modulo some bugs.
o Add code to if_gem_pci.c allowing to read the MAC-address from the VPD on
systems without Open Firmware.
This is an improved version of my variant of the respective code in
if_hme_pci.c
o Now that gem(4) is MI enable it for all archs.
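To illustrate the gem_bitwait() item above (resetting RX only, yet waiting
for both reset bits to clear), here is a user-space model with a simulated
register. It is a sketch under stated assumptions, not driver code: the
gw_ names, the bit values, the poll limit and the fake register are all
hypothetical, and the bus barrier and delays of a real driver appear only
as comments.

```c
#include <assert.h>
#include <stdint.h>

#define	GW_RESET_TX	0x1u	/* hypothetical bit values */
#define	GW_RESET_RX	0x2u

/* Simulated device register; the "hardware" clears it after some reads. */
struct gw_reg {
	uint32_t value;
	int	 reads_until_clear;	/* 0 models hardware that is stuck */
};

uint32_t
gw_reg_read(struct gw_reg *r)
{
	if (r->reads_until_clear > 0 && --r->reads_until_clear == 0)
		r->value = 0;
	return (r->value);
}

/*
 * Poll until every bit in 'clr' reads as zero and every bit in 'set'
 * reads as one.  Returns 1 on success and 0 on timeout.
 */
int
gw_bitwait(struct gw_reg *r, uint32_t clr, uint32_t set, int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		uint32_t v = gw_reg_read(r);

		if ((v & clr) == 0 && (v & set) == set)
			return (1);
		/* A real driver would delay briefly between polls. */
	}
	return (0);
}

/* Reset RX only, but wait for both reset bits to clear. */
int
gw_reset_rx(struct gw_reg *r, int hw_latency)
{
	r->value = GW_RESET_RX;			/* write the reset bit */
	r->reads_until_clear = hw_latency;
	/* A real driver would issue a bus barrier here before polling. */
	return (gw_bitwait(r, GW_RESET_RX | GW_RESET_TX, 0, 100));
}
```

With a simulated latency of a few reads the wait succeeds; with hardware
that never clears the bit it gives up after the poll limit and reports
failure.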
Pointed out by: yongari [1]
Suggested by: rwatson [2], yongari [3]
Tested on: i386 (GEM), powerpc (GMACs by marcel and yongari),
sparc64 (ERI and GEM)
Reviewed by: yongari
Approved by: re (kensmith)
providers with limited physical storage and add physical storage as
needed.
Submitted by: Ivan Voras
Sponsored by: Google Summer of Code 2006
Approved by: re (kensmith)
commented out until I can re-test them on all our architectures. I
had re@ approval to commit this a long time ago, but that's before we
were this close to the branch.
Approved by: re@
other changes too).
(without any real order)
1. Use device_get_nameunit for mutex naming
2. Add timer for low-latency playback
3. Move most mixer controls from sysctls to mixer(8) controls.
This is the largest part of this patch.
4. Add analog/digital switch (as a temporary sysctl)
5. Get back support for low-bitrate playback (with help of (2))
6. Change locking for exclusive I/O. Writing to non-PTR register
is almost safe and does not need to be ordered with PTR operations.
7. Disable MIDI until we get it to detach properly and fix memory
management problems.
8. Enable multichannel playback by default. It is as stable as
single-channel mode. Multichannel recording is still an
experimental feature.
9. Multichannel options can be changed by loader tunables.
10. Add a way to disable card from a loader tunable.
11. Add new PCI IDs.
12. Debugger settings are loader tunables now.
14. Remove some unused variables.
15. Mark pcm sub-devices MPSAFE.
16. Partially revert (bus_setup_intr -> snd_setup_intr) since it needs
to be done independently
Submitted by: Yuriy Tsibizov (driver maintainer)
Approved by: re (bmah)
Also rename the related functions in a similar way.
There are no functional changes.
For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.
With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.
The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.
Discussed at: BSDCan 2007
Best new name suggested by: rwatson
Reviewed by: rwatson
Approved by: re (bmah)
included man pages on how to use it. This code is still somewhat experimental
but has been successfully tested on a number of targets. Many thanks to
Danny for contributing this.
Approved by: re
- Add custom .c wrappers for the firmware, rather than the standard
firmware(9) generated firmware objects to work around toolchain
problems on ia64 involving linking objects produced by
ld -b -binary into the kernel.
- Move from using Myricom's ".dat" firmware blobs to using Myricom's
zlib compressed ".h" firmware header files. This is done to
facilitate the custom wrappers, and saves a fair amount of wired
memory in the case where the firmware is built in, or preloaded.
- Fix two compile issues in mxge which only appear on non-i386/amd64.
Reviewed by: mlaier, mav (earlier version with just zlib support)
Glanced at by: sam
Approved by: re (kensmith)
NET_NEEDS_GIANT, which will shortly be removed. This is done in a
way that it may be easily reattached to the build before 7.1 if
appropriate locking is added. Specifics:
- Don't install netatm include files
- Disconnect netatm command line management tools
- Don't build libatm
- Don't include ATM parts in rescue or sysinstall
- Don't install sample configuration files and documents
- Don't build kernel support as a module or in NOTES
- Don't build netgraph wrapper nodes for netatm
This removes the last remaining consumer of NET_NEEDS_GIANT.
Reviewed by: harti
Discussed with: bz, bms
Approved by: re (kensmith)
This commit includes only the kernel files, the rest of the files
will follow in a second commit.
Reviewed by: bz
Approved by: re
Supported by: Secure Computing
both 6.x and 7.x. This is based on feedback in this thread
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=81818+0+current/freebsd-stable
and my use of it on 6.x.
MFC after: 3 days
- Update the warning about UNION filesystem. It is now actively maintained,
although there are still some issues being resolved.
Reviewed by: freebsd-stable@, kris, bmah
Approved by: re (bmah)
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
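The pairing rule that any binary buddy system relies on can be shown with a
small userland sketch. This is not the allocator's code: the helper names
buddy_of() and merged_start() are hypothetical, and blocks are identified here
by page-frame index rather than by any kernel structure. A free block of
2^order pages is assumed to be aligned to its own size; its buddy is found by
flipping the bit that corresponds to the block size, and two free buddies
merge into one block of the next order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: page-frame index of the buddy of the block of
 * 2^order pages that starts at pfn.
 */
static uint64_t
buddy_of(uint64_t pfn, unsigned order)
{
	/* The block must be aligned to its own size. */
	assert(order < 64 && (pfn & ((UINT64_C(1) << order) - 1)) == 0);
	return (pfn ^ (UINT64_C(1) << order));
}

/*
 * Hypothetical helper: start of the block of 2^(order+1) pages formed
 * when the block at pfn is merged with its buddy.
 */
static uint64_t
merged_start(uint64_t pfn, unsigned order)
{
	assert(order < 64);
	return (pfn & ~(UINT64_C(1) << order));
}
```

For example, the order-2 block at page 12 has its buddy at page 8, and the
two merge into an order-3 block starting at page 8.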
tunnels, and was not MPSAFE. The code can be easily restored in the
event that someone with an IPX over IP tunnel configuration can work
with me to test patches.
This removes one of five remaining consumers of NET_NEEDS_GIANT.
Approved by: re (kensmith)
o major overhaul of the way channels are handled: channels are now
fully enumerated and uniquely identify the operating characteristics;
these changes are visible to user applications which require changes
o make scanning support independent of the state machine to enable
background scanning and roaming
o move scanning support into loadable modules based on the operating
mode to enable different policies and reduce the memory footprint
on systems w/ constrained resources
o add background scanning in station mode (no support for adhoc/ibss
mode yet)
o significantly speed up sta mode scanning with a variety of techniques
o add roaming support when background scanning is supported; for now
we use a simple algorithm to trigger a roam: we threshold the rssi
and tx rate, if either drops too low we try to roam to a new ap
o add tx fragmentation support
o add first cut at 802.11n support: this code works with forthcoming
drivers but is incomplete; it's included now to establish a baseline
for other drivers to be developed and for user applications
o adjust max_linkhdr et al. to reflect 802.11 requirements; this eliminates
prepending mbufs for traffic generated locally
o add support for Atheros protocol extensions; mainly the fast frames
encapsulation (note this can be used with any card that can tx+rx
large frames correctly)
o add sta support for ap's that beacon both WPA1+2 support
o change all data types from bsd-style to posix-style
o propagate noise floor data from drivers to net80211 and on to user apps
o correct various issues in the sta mode state machine related to handling
authentication and association failures
o enable the addition of sta mode power save support for drivers that need
net80211 support (not in this commit)
o remove old WI compatibility ioctls (wicontrol is officially dead)
o change the data structures returned for get sta info and get scan
results so future additions will not break user apps
o fixed tx rate is now maintained internally as an ieee rate and not an
index into the rate set; this needs to be extended to deal with
multi-mode operation
o add extended channel specifications to radiotap to enable 11n sniffing
Drivers:
o ath: add support for bg scanning, tx fragmentation, fast frames,
dynamic turbo (lightly tested), 11n (sniffing only and needs
new hal)
o awi: compile tested only
o ndis: lightly tested
o ipw: lightly tested
o iwi: add support for bg scanning (well tested but may have some
rough edges)
o ral, ural, rum: add support for bg scanning, calibrate rssi data
o wi: lightly tested
This work is based on contributions by Atheros, kmacy, sephe, thompsa,
mlaier, kevlo, and others. Much of the scanning work was supported by
Atheros. The 11n work was supported by Marvell.
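The roaming trigger mentioned in the list above (threshold the rssi and the
tx rate, and try to roam if either drops too low) can be sketched as follows.
This is only an illustration: the structure, field names, and threshold
values are hypothetical and are not the names used by net80211.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface roaming thresholds. */
struct roam_params_sketch {
	int rssi_min;	/* try to roam when rssi falls below this */
	int rate_min;	/* try to roam when the tx rate falls below this */
};

/* Returns true when either measurement has dropped below its threshold. */
static bool
should_roam(const struct roam_params_sketch *rp, int rssi, int txrate)
{
	assert(rp != NULL);
	return (rssi < rp->rssi_min || txrate < rp->rate_min);
}
```

With thresholds of 14 for rssi and 24 for rate, a station at rssi 10 would
try to roam regardless of its tx rate, while one at rssi 20 and rate 48
would stay put.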
The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.
The lagg(4) driver provides link aggregation, failover and fault tolerance.
Discussed on: current@
Linux SCSI SG passthrough device API. The intention is to allow for both
running of Linux apps that want to talk to /dev/sg* nodes, and to facilitate
porting of apps from Linux to FreeBSD. As such, both native and linuxolator
entry points and definitions are provided.
Caveats:
- This does not support the procfs and sysfs nodes that the Linux SG
driver provides. Some Linux apps may rely on these for operation,
others may only use them for informational purposes.
- More ioctls need to be implemented.
- Linux uses a naming scheme of "sg[a-z]" for devices, while FreeBSD uses a
scheme of "sg[0-9]". Devfs aliases (symlinks) are automatically created
to link the two together. However, tools like camcontrol only see the
native names.
- Some operations were originally designed to return byte counts or other
data directly as the syscall return value. The linuxolator doesn't appear
to support this well, so this driver just punts for these cases.
Now that the driver is in place, others are welcome to add missing
functionality. Thanks to Roman Divacky for pushing this work along.
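The naming caveat above can be illustrated with a small sketch that derives
a Linux-style alias from a FreeBSD unit number. The function name
sg_linux_alias() is hypothetical, and the treatment of units 26 and above
(no alias is produced) is an assumption of this sketch, not a statement
about what the driver does.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the "sg[a-z]" alias for the native device
 * "sg<unit>" into buf.  Returns 0 on success, -1 if the unit has no
 * single-letter alias (an assumption of this sketch) or buf is too small.
 */
static int
sg_linux_alias(unsigned unit, char *buf, size_t len)
{
	int n;

	assert(buf != NULL);
	if (unit >= 26)
		return (-1);
	n = snprintf(buf, len, "sg%c", 'a' + (int)unit);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

Under these assumptions unit 0 maps to "sga" and unit 3 maps to "sgd".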
When the linux port changes were imported which split the
target command list to be separate from the initiator command
list and the handle format changed to encode a type in the handle
the implications to the function isp_handle_index (which only
the NetBSD/OpenBSD/FreeBSD ports use) were overlooked.
The fault is twofold: first, the index into the DMA maps
in isp_pci is wrong because a target command handle with
the type bit left in place caused a bad index (and panic)
into dma map. Secondly, the assumption of the array
of DMA maps in either PCS or SBUS attachment structures is
that there is a linear mapping between handle index and
DMA map index. This can no longer be true if there are
overlapping index spaces for initiator mode and target
mode commands.
These changes bandaid around the problem by forcing us
to not have simultaneous dual roles and doing the appropriate
masking to make sure things are indexed correctly. A longer
term fix is being developed.
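The indexing problem described above can be sketched in userland C. The bit
layout, the constants, and the function name below are all hypothetical and
are not the isp(4) definitions; the sketch only shows why the type bits have
to be masked off before a handle is used as a DMA map index, and why
initiator and target handles can then collide on the same index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handle layout: two type bits above a 14-bit index. */
#define SK_HDL_TYPE_MASK	0xC000u
#define SK_HDL_INDEX_MASK	0x3FFFu
#define SK_HDL_TYPE_INIT	0x4000u	/* initiator mode command */
#define SK_HDL_TYPE_TGT		0x8000u	/* target mode command */
#define SK_NMAPS		256u	/* hypothetical size of the DMA map array */

/* Strip the type bits so the result is safe to use as a DMA map index. */
static unsigned
sk_handle_index(uint32_t handle)
{
	unsigned idx = handle & SK_HDL_INDEX_MASK;

	assert(idx < SK_NMAPS);
	return (idx);
}
```

Without the mask, a target handle such as SK_HDL_TYPE_TGT | 5 would index far
past the end of the array. With the mask, it and SK_HDL_TYPE_INIT | 5 both
yield index 5, which is the overlap that rules out simultaneous dual roles
while one linear map array is shared.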
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to those of rwlocks. This patch
also adds support for exclusive locks using the same algorithm as mutexes.
A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given lock's behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET which are all identical in nature
to the similar flags for mutexes.
Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.
The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.
The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.
The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.
MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd
imitating an Ethernet device, so vlan(4) and if_bridge(4) can be
attached to it for testing and benchmarking purposes. Its source
can be an introduction to the anatomy of a network interface driver
due to its simplicity as well as to a bunch of comments in it.
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguisher at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
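The active/passive behaviour described above can be modelled with a short
userland sketch: the head of a list is the active path, a request failing
with EIO or ENXIO drops it and promotes the next entry, and newly tasted
paths are appended at the tail. The structure and function names are
hypothetical, the fixed-size array is a simplification, and none of this is
the GEOM class's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MP_MAXPATHS	8	/* arbitrary limit for this sketch */

struct mp_sketch {
	int	paths[MP_MAXPATHS];	/* path ids; paths[0] is the active one */
	size_t	npaths;
};

/* A newly tasted matching disk goes to the tail of the list. */
static int
mp_add(struct mp_sketch *mp, int id)
{
	assert(mp != NULL);
	if (mp->npaths == MP_MAXPATHS)
		return (-1);
	mp->paths[mp->npaths++] = id;
	return (0);
}

/*
 * Called with the error of a completed request.  Returns 0 if the active
 * path is kept, 1 if it was dropped and the request should be retried on
 * the new active path, and -1 if no paths remain.
 */
static int
mp_io_done(struct mp_sketch *mp, int error)
{
	size_t i;

	assert(mp != NULL);
	if (error != EIO && error != ENXIO)
		return (0);
	if (mp->npaths == 0)
		return (-1);
	for (i = 1; i < mp->npaths; i++)
		mp->paths[i - 1] = mp->paths[i];
	mp->npaths--;
	return (mp->npaths > 0 ? 1 : -1);
}
```

A path that goes away and later comes back is simply re-added by mp_add(),
so it rejoins at the tail and becomes active again only after the paths
ahead of it have failed.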
Make PIM dynamically loadable by using encap_attach_func().
PIM may now be loaded into a GENERIC kernel.
Tested with: ports/net/pimdd && tcpreplay && wireshark
Reviewed by: Pavlin Radoslavov
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE and GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).
The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
NOTES though, as ofw_syscons(4) doesn't properly interface with
syscons(4) regarding loading the font specified with SC_DFLT_FONT,
causing a kernel with both options SC_OFWFB and SC_NO_MODE_CHANGE
to not link.
lookup early. This has some performance implications and should not be
enabled by default, but might help greatly in certain setups. After some
more testing this could be turned into a sysctl.
Tested by: avatar
LOR ids: 17, 24, 32, 46, 191 (conceptual)
MFC after: 6 weeks
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need attaboys too:
****
peterlei@cisco.com tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0
So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.
I will see about committing some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)
There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a Mac but that's another beast too..
If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)
Reviewed by: gnn
Approved by: gnn