For a slightly thorough explaination, please refer to
[1] http://people.freebsd.org/~ariff/SOUND_4.TXT.html .
Summary of changes includes:
1 Volume Per-Channel (vpc). Provides private / standalone volume control
unique per-stream pcm channel without touching master volume / pcm.
Applications can directly use SNDCTL_DSP_[GET|SET][PLAY|REC]VOL, or for
backwards compatibility, SOUND_MIXER_PCM through the opened dsp device
instead of /dev/mixer. Special "bypass" mode is enabled through
/dev/mixer which will automatically detect if the adjustment is made
through /dev/mixer and forward its request to this private volume
controller. Changes to this volume object will not interfere with
other channels.
Requirements:
- SNDCTL_DSP_[GET|SET][PLAY|REC]_VOL are newer ioctls (OSSv4) which
require specific application modifications (preferred).
- No modifications required for using bypass mode, so applications
like mplayer or xmms should work out of the box.
Kernel hints:
- hint.pcm.%d.vpc (0 = disable vpc).
Kernel sysctls:
- hw.snd.vpc_mixer_bypass (default: 1). Enable or disable /dev/mixer
bypass mode.
- hw.snd.vpc_autoreset (default: 1). By default, closing/opening
/dev/dsp will reset the volume back to 0 db gain/attenuation.
Setting this to 0 will preserve its settings across device
closing/opening.
- hw.snd.vpc_reset (default: 0). Panic/reset button to reset all
volume settings back to 0 db.
- hw.snd.vpc_0db (default: 45). 0 db relative to linear mixer value.
2 High quality fixed-point Bandlimited SINC sampling rate converter,
based on Julius O'Smith's Digital Audio Resampling -
http://ccrma.stanford.edu/~jos/resample/. It includes a filter design
script written in awk (the clumsiest joke I've ever written)
- 100% 32bit fixed-point, 64bit accumulator.
- Possibly among the fastest (if not fastest) of its kind.
- Resampling quality is tunable, either runtime or during kernel
compilation (FEEDER_RATE_PRESETS).
- Quality can be further customized during kernel compilation by
defining FEEDER_RATE_PRESETS in /etc/make.conf.
Kernel sysctls:
- hw.snd.feeder_rate_quality.
0 - Zero-order Hold (ZOH). Fastest, bad quality.
1 - Linear Interpolation (LINEAR). Slightly slower than ZOH,
better quality but still does not eliminate aliasing.
2 - (and above) - Sinc Interpolation(SINC). Best quality. SINC
quality always start from 2 and above.
Rough quality comparisons:
- http://people.freebsd.org/~ariff/z_comparison/
3 Bit-perfect mode. Bypasses all feeder/dsp effects. Pure sound will be
directly fed into the hardware.
4 Parametric (compile time) Software Equalizer (Bass/Treble mixer). Can
be customized by defining FEEDER_EQ_PRESETS in /etc/make.conf.
5 Transparent/Adaptive Virtual Channel. Now you don't have to disable
vchans in order to make digital format pass through. It also makes
vchans more dynamic by choosing a better format/rate among all the
concurrent streams, which means that dev.pcm.X.play.vchanformat/rate
becomes sort of optional.
6 Exclusive Stream, with special open() mode O_EXCL. This will "mute"
other concurrent vchan streams and only allow a single channel with
O_EXCL set to keep producing sound.
Other Changes:
* most feeder_* stuffs are compilable in userland. Let's not
speculate whether we should go all out for it (save that for
FreeBSD 16.0-RELEASE).
* kobj signature fixups, thanks to Andriy Gapon <avg@freebsd.org>
* pull out channel mixing logic out of vchan.c and create its own
feeder_mixer for world justice.
* various refactoring here and there, for good or bad.
* activation of few more OSSv4 ioctls() (see [1] above).
* opt_snd.h for possible compile time configuration:
(mostly for debugging purposes, don't try these at home)
SND_DEBUG
SND_DIAGNOSTIC
SND_FEEDER_MULTIFORMAT
SND_FEEDER_FULL_MULTIFORMAT
SND_FEEDER_RATE_HP
SND_PCM_64
SND_OLDSTEREO
Manual page updates are on the way.
Tested by: joel, Olivier SMEDTS <olivier at gid0 d org>, too many
unsung / unnamed heroes.
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
with OpenBSD (and BSD/OS originally). We can't easly do it SOL_SOCKET option
as there is no more space for more SOL_SOCKET options, but this option also
fits better as an IP socket option, it seems.
- Implement this functionality also for IPv6 and RAW IP sockets.
- Always compile it in (don't use additional kernel options).
- Remove sysctl to turn this functionality on and off.
- Introduce new privilege - PRIV_NETINET_BINDANY, which allows to use this
functionality (currently only unjail root can use it).
Discussed with: julian, adrian, jhb, rwatson, kmacy
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.
Additively implements adaptive spininning for sx held in shared mode.
The spinning limit can be handled through sysctls in order to be tuned
while the code doesn't reach the release, after which time they should
be dropped probabilly.
This change has made been necessary by recent benchmarks where it does
improve concurrency of workloads in presence of high contention
(ie. ZFS).
KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.
Requested by: jeff, kmacy
Reviewed by: jeff
Tested by: pho
includes support for NFSv4. The subsystem can optionally be linked
into the kernel using the two options:
NFSCL - the client
NFSD - the server
It is also built as three modules:
nfscl - the client
nfsd - the server
nfscommon - functions shared by the client and server
Approved by: kib (mentor)
Broadcom BCM43xx chipsets. This driver uses the v3 firmware that
needs to be fetched separately. A port will be committed to create
the bwi firmware module.
The driver matches the following chips: Broadcom BCM4301, BCM4307,
BCM4306, BCM4309, BCM4311, BCM4312, BCM4318, BCM4319
The driver works for 802.11b and 802.11g.
Limitations:
This doesn't support the 802.11a or 802.11n portion of radios.
Some BCM4306 and BCM4309 cards don't work with Channel 1, 2 or 3.
Documenation for this firmware is reverse engineered from
http://bcm.sipsolutions.net/
V4 of the firmware is needed for 11a or 11n support
http://bcm-v4.sipsolutions.net/
Firmware needs to be fetched from a third party, port to be committed
# I've tested this with a BCM4319 mini-pci and a BCM4318 CardBus card, and
# not connected it to the build until the firmware port is committed.
Obtained from: DragonFlyBSD, //depot/projects/vap
Reviewed by: sam@, thompsa@
as well as providing stateful load balancing when used with RADIX_MPATH.
- Currently compiled in to i386 and amd64 but disabled by default, it can be enabled at
runtime with 'sysctl net.inet.flowtable.enable=1'.
- Embedded users can remove it entirely from the kernel by adding 'nooption FLOWTABLE' to
their kernel config files.
- A minimal hookup will be added to ip_output in a subsequent commit. I would like to see
more review before bringing in changes that require more churn.
Supported by: Bitgravity Inc.
naming of the partitions (GEOM_PART_EBR_COMPAT). When
compatibility is enabled, changes to the partitioning are
disallowed.
Remove the device name aliasing added previously to provide
backward compatibility, but which in practice doesn't give
us anything.
Enable compatibility on amd64 and i386.
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.
Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:
if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd
Drivers that were already disabled because of tty changes:
if_ppp
if_sl
Discussed on: arch@
o add CFI_SUPPORT_STRATAFLASH compile option to enable support
o add new ioctls to get/set the factory and user/oem segments of the PR
and to get/set Protection Lock Register that fuses the user segment
o add #defines for bits in the status register
o update cfi_wait_ready to take an offset so it can be used to wait for
PR write completion and replace constants w/ symbolic names
Note: writing the user segment isn't correct; committing now to get review.
Sponsored by: Carlson Wireless
Reviewed by: imp, Chris Anderson
o remove HAL_CHANNEL; convert the hal to use net80211 channels; this
mostly involves mechanical changes to variable names and channel
attribute macros
o gut HAL_CHANNEL_PRIVATE as most of the contents are now redundant
with the net80211 channel available
o change api for ath_hal_init_channels: no more reglass id's, no more outdoor
indication (was a noop), anM contents
o add ath_hal_getchannels to have the hal construct a channel list without
altering runtime state; this is used to retrieve the calibration list for
the device in ath_getradiocaps
o add ath_hal_set_channels to take a channel list and regulatory data from
above and construct internal state to match (maps frequencies for 900MHz
cards, setup for CTL lookups, etc)
o compact the private channel table: we keep one private channel
per frequency instead of one per HAL_CHANNEL; this gives a big
space savings and potentially improves ani and calibration by
sharing state (to be seen; didn't see anything in testing); a new config
option AH_MAXCHAN controls the table size (default to 96 which
was chosen to be ~3x the largest expected size)
o shrink ani state and change to mirror private channel table (one entry per
frequency indexed by ic_devdata)
o move ani state flags to private channel state
o remove country codes; use net80211 definitions instead
o remove GSM regulatory support; it's no longer needed now that we
pass in channel lists from above
o consolidate ADHOC_NO_11A attribute with DISALLOW_ADHOC_11A
o simplify initial channel list construction based on the EEPROM contents;
we preserve country code support for now but may want to just fallback
to a WWR sku and dispatch the discovered country code up to user space
so the channel list can be constructed using the master regdomain tables
o defer to net80211 for max antenna gain
o eliminate sorting of internal channel table; now that we use ic_devdata
as an index, table lookups are O(1)
o remove internal copy of the country code; the public one is sufficient
o remove AH_SUPPORT_11D conditional compilation; we always support 11d
o remove ath_hal_ispublicsafetysku; not needed any more
o remove ath_hal_isgsmsku; no more GSM stuff
o move Conformance Test Limit (CTL) state from private channel to a lookup
using per-band pointers cached in the private state block
o remove regulatory class id support; was unused and belongs in net80211
o fix channel list construction to set IEEE80211_CHAN_NOADHOC,
IEEE80211_CHAN_NOHOSTAP, and IEEE80211_CHAN_4MSXMIT
o remove private channel flags CHANNEL_DFS and CHANNEL_4MS_LIMIT; these are
now set in the constructed net80211 channel
o store CHANNEL_NFCREQUIRED (Noise Floor Required) channel attribute in one
of the driver-private flag bits of the net80211 channel
o move 900MHz frequency mapping into the hal; the mapped frequency is stored
in the private channel and used throughout the hal (no more mapping in the
driver and/or net80211)
o remove ath_hal_mhz2ieee; it's no longer needed as net80211 does the
calculation and available in the net80211 channel
o change noise floor calibration logic to work with compacted private channel
table setup; this may require revisiting as we no longer can distinguish
channel attributes (e.g. 11b vs 11g vs turbo) but since the data is used
only to calculate status data we can live with it for now
o change ah_getChipPowerLimits internal method to operate on a single channel
instead of all channels in the private channel table
o add ath_hal_gethwchannel to map a net80211 channel to a h/w frequency
(always the same except for 900MHz channels)
o add HAL_EEBADREG and HAL_EEBADCC status codes to better identify regulatory
problems
o remove CTRY_DEBUG and CTRY_DEFAULT enum's; these come from net80211 now
o change ath_hal_getwirelessmodes to really return wireless modes supported
by the hardware (was previously applying regulatory constraints)
o return channel interference status with IEEE80211_CHANSTATE_CWINT (should
change to a callback so hal api's can take const pointers)
o remove some #define's no longer needed with the inclusion of
<net80211/_ieee80211.h>
Sponsored by: Carlson Wireless
The teken library already supports UTF-8 handling and xterm emulation,
but we have reasons to disable this right now. Because we should make it
easy and interesting for people to experiment with these features, allow
them to be set in kernel configuration files.
Before this commit we had a flag called `TEKEN_CONS25' to enable
cons25-style emulation. I'm calling it the opposite now, `TEKEN_XTERM',
because we want to enable it in kernel configuration files explicitly.
Requested by: kib
by the new kernel option COMPAT_ROUTE_FLAGS for binary backward
compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO.
RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is
always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS
is defined.
applications to specify a non-local IP address when bind()'ing a socket
to a local endpoint.
This allows applications to spoof the client IP address of connections
if (obviously!) they somehow are able to receive the traffic normally
destined to said clients.
This patch doesn't include any changes to ipfw or the bridging code to
redirect the client traffic through the PCB checks so TCP gets a shot
at it. The normal behaviour is that packets with a non-local destination
IP address are not handled locally. This can be dealth with some IPFW hackery;
modifications to IPFW to make this less hacky will occur in subsequent
commmits.
Thanks to Julian Elischer and others at Ironport. This work was approved
and donated before Cisco acquired them.
Obtained from: Julian Elischer and others
MFC after: 2 weeks
o add net80211 support for a tdma vap that is built on top of the
existing adhoc-demo support
o add tdma scheduling of frame transmission to the ath driver; it's
conceivable other devices might be capable of this too in which case
they can make use of the 802.11 protocol additions etc.
o add minor bits to user tools that need to know: ifconfig to setup and
configure, new statistics in athstats, and new debug mask bits
While the architecture can support >2 slots in a TDMA BSS the current
design is intended (and tested) for only 2 slots.
Sponsored by: Intel
o add support to byte swap EHCI descriptor contents; the IXP435
has dual-EHCI controllers integral but descriptor contents are
in big-endian format; this support is configured with the
USB_EHCI_BIG_ENDIAN_DESC option and enabled with EHCI_SCFLG_BIGEDESC
o clean up EHCI USBMODE register setup during init; add #defines for
bit values
o split debug support out into a new file and enable use through ddb
o while here remove a bunch of lingering netbsd-isms
Reviewed by: imp
container structures, depending on VIMAGE_GLOBALS compile time option.
Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out. Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.
Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively
#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif
Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.
Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs. This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.
Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.
De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps. PF virtualization will be done
separately, most probably after next PF import.
Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw. Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.
Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
module; the ath module now brings in the hal support. Kernel
config files are almost backwards compatible; supplying
device ath_hal
gives you the same chip support that the binary hal did but you
must also include
options AH_SUPPORT_AR5416
to enable the extended format descriptors used by 11n parts.
It is now possible to control the chip support included in a
build by specifying exactly which chips are to be supported
in the config file; consult ath_hal(4) for information.
I have fixed the reported problems - if you still have trouble with it, please
contact me with as much detail as possible so that I can track down any other
issues as quickly as possible.
fix the problems a few people have noticed with the new code. People who want
to continue testing the new code or who need RPCSEC_GSS support should use
the new option NFS_NEWRPC to select it.
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.
The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.
To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.
As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.
Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.
The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.
Sponsored by: Isilon Systems
MFC after: 1 month
cable tuning. This has helped in some installations for hardware
deployed by a former employer. Made optional because the lists aren't
full of complaints about these cards... even when they were wildly
popular.
Reviewed by: attilio@, jhb@, trhodes@ (all an older version of the patch)
(1) Abstract interpreter vnode labeling in execve(2) and mac_execve(2)
so that the general exec code isn't aware of the details of
allocating, copying, and freeing labels, rather, simply passes in
a void pointer to start and stop functions that will be used by
the framework. This change will be MFC'd.
(2) Introduce a new flags field to the MAC_POLICY_SET(9) interface
allowing policies to declare which types of objects require label
allocation, initialization, and destruction, and define a set of
flags covering various supported object types (MPC_OBJECT_PROC,
MPC_OBJECT_VNODE, MPC_OBJECT_INPCB, ...). This change reduces the
overhead of compiling the MAC Framework into the kernel if policies
aren't loaded, or if policies require labels on only a small number
or even no object types. Each time a policy is loaded or unloaded,
we recalculate a mask of labeled object types across all policies
present in the system. Eliminate MAC_ALWAYS_LABEL_MBUF option as it
is no longer required.
MFC after: 1 week ((1) only)
Reviewed by: csjp
Obtained from: TrustedBSD Project
Sponsored by: Apple, Inc.
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:
- Improved driver model:
The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.
If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.
- Improved hotplugging:
With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).
The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.
- Improved performance:
One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.
Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.
Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan
In order to CATER this, DDB buffered output can be choosen at compile
time through the option DDB_BUFR_SIZE=nbytes where nbytes choose the size
of the buffer (suggested size is 128 bytes), which should be manually
specified in any interested config file.
Sponsored by: Nokia
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
parts relied on the now removed NET_NEEDS_GIANT.
Most of I4B has been disconnected from the build
since July 2007 in HEAD/RELENG_7.
This is what was removed:
- configuration in /etc/isdn
- examples
- man pages
- kernel configuration
- sys/i4b (drivers, layers, include files)
- user space tools
- i4b support from ppp
- further documentation
Discussed with: rwatson, re
NET_NEEDS_GIANT. netatm has been disconnected from the build for ten
months in HEAD/RELENG_7. Specifics:
- netatm include files
- netatm command line management tools
- libatm
- ATM parts in rescue and sysinstall
- sample configuration files and documents
- kernel support as a module or in NOTES
- netgraph wrapper nodes for netatm
- ctags data for netatm.
- netatm-specific device drivers.
MFC after: 3 weeks
Reviewed by: bz
Discussed with: bms, bz, harti
- KDTRACE_HOOKS for the shim layer of hooks which separate BSD licensed
code from CDDL code.
- DDB_CTF for the code that parses the CTF (compact C type format)
data for use by the DTrace Function Boundary Trace
provider and (possibly) ddb if we plan to do that.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from anay to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
to profile outoing packets for a number of mbuf chain
related parameters
e.g. number of mbufs, wasted space.
probably will do with further work later.
Reviewed by: various
Note this includes changes to all drivers and moves some device firmware
loading to use firmware(9) and a separate module (e.g. ral). Also there
no longer are separate wlan_scan* modules; this functionality is now
bundled into the wlan module.
Supported by: Hobnob and Marvell
Reviewed by: many
Obtained from: Atheros (some bits)
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
is disallowed. For example,
route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2
The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"
Multiple default routes can also be inserted. Here is the netstat
output:
default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0
When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,
route delete default
would fail and trigger the following error message:
"route: writing to routing socket: No such process"
"delete net default: not in table"
On the other hand,
route delete default 10.2.5.2
would be successful: "delete net default: gateway 10.2.5.2"
One does not have to specify a gateway if there is only a single
route for a particular destination.
I need to perform more testings on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options RADIX_MPATH" in the kernel configuration
to enable this feature.
Reviewed by: robert, sam, gnn, julian, kmacy
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
not have VTOC information about the partitions, it will be created.
This is because the VTOC information is used for the partition type
and FreeBSD's sunlabel(8) does not create nor use VTOC information.
For this purpose, new tags have been added to support FreeBSD's
partition types.
overridden at compile-time using kernel options of the same names.
Rather than doing a compile-time CTASSERT of buffer sizes being
even multiples of block sizes, just adjust them at boottime, as
the failure mode is more user-friendly.
MFC after: 2 months
PR: 119993
Suggested by: Scot Hetzel <swhetzel at gmail dot com>
the ABI when enabled. There is no longer an embedded lock_profile_object
in each lock. Instead a list of lock_profile_objects is kept per-thread
for each lock it may own. The cnt_hold statistic is now always 0 to
facilitate this.
- Support shared locking by tracking individual lock instances and
statistics in the per-thread per-instance lock_profile_object.
- Make the lock profiling hash table a per-cpu singly linked list with a
per-cpu static lock_prof allocator. This removes the need for an array
of spinlocks and reduces cache contention between cores.
- Use a seperate hash for spinlocks and other locks so that only a
critical_enter() is required and not a spinlock_enter() to modify the
per-cpu tables.
- Count time spent spinning in the lock statistics.
- Remove the LOCK_PROFILE_SHARED option as it is always supported now.
- Specifically drop and release the scheduler locks in both schedulers
since we track owners now.
In collaboration with: Kip Macy
Sponsored by: Nokia
o Disklabels can have between 8 and 20 partitions (inclusive).
o No device special file is created for the raw partition.
o Switch ia64 to use this backend.
o No support for boot code yet.
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).
Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.
Update stack(9) man page.
Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)
Currently, Giant is not too much contented so that it is ok to treact it
like any other mutexes.
Please don't forget to update your own custom config kernel files.
Approved by: cognet, marcel (maintainers of arches where option is
not enabled at the moment)
providers with limited physical storage and add physical storage as
needed.
Submitted by: Ivan Voras
Sponsored by: Google Summer of Code 2006
Approved by: re (kensmith)
other changes too).
(without any real order)
1. Use device_get_nameunit for mutex naming
2. Add timer for low-latency playback
3. Move most mixer controls from sysctls to mixer(8) controls.
This is a largest part of this patch.
4. Add analog/digital switch (as a temporary sysctl)
5. Get back support for low-bitrate playback (with help of (2))
6. Change locking for exclusive I/O. Writing to non-PTR register
is almost safe and does not need to be ordered with PTR operations.
7. Disable MIDI until we get it to detach properly and fix memory
managment problems.
8. Enable multichannel playback by default. It is as stable as
single-channel mode. Multichannel recording is still an
experimental feature.
9. Multichannel options can be changed by loader tunables.
10. Add a way to disable card from a loader tunable.
11. Add new PCI IDs.
12. Debugger settings are loader tunables now.
14. Remove some unused variables.
15. Mark pcm sub-devices MPSAFE.
16. Partially revert (bus_setup_intr -> snd_setup_intr) since it need
to be done independently
Submitted by: Yuriy Tsibizov (driver maintainer)
Approved by: re (bmah)
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.
While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.
Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)
Also rename the related functions in a similar way.
There are no functional changes.
For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.
With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.
The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.
Discussed at: BSDCan 2007
Best new name suggested by: rwatson
Reviewed by: rwatson
Approved by: re (bmah)
included man pages on how to use it. This code is still somewhat experimental
but has been successfully tested on a number of targets. Many thanks to
Danny for contributing this.
Approved by: re
NET_NEEDS_GIANT, which will shortly be removed. This is done in a
away that it may be easily reattached to the build before 7.1 if
appropriate locking is added. Specifics:
- Don't install netatm include files
- Disconnect netatm command line management tools
- Don't build libatm
- Don't include ATM parts in rescue or sysinstall
- Don't install sample configuration files and documents
- Don't build kernel support as a module or in NOTES
- Don't build netgraph wrapper nodes for netatm
This removes the last remaining consumer of NET_NEEDS_GIANT.
Reviewed by: harti
Discussed with: bz, bms
Approved by: re (kensmith)
- CMT_PF states added (w/sysctl to turn the PF version on)
- sctp_input.c had a missing incr of cookie case when the
auth was bad. This meant a free was called without an
increment to refcnt, added increment like rest of code.
- There was a case, unlikely, when the scope of the destination
changed (this is a TSNH case). In that case, it would not free
the alloc'ed asoc (in sctp_input.c).
- When listed addresses found a colliding cookie/Init, then
the collided upon tcb was not unlocked in sctp_pcb.c
- Add error checking on arguments of sctp_sendx(3) to prevent it from
referencing a NULL pointer.
- Fix an error return of sctp_sendx(3), it was returing
ENOMEM not -1.
- Get assoc id was changed to use the sanctified socket api
method for getting a assoc id (PEER_ADDR_INFO instead of
PEER_ADDR_PARAMS).
- Fix it so a peeled off socket will get a proper error return
if it trys to send to a different address then it is connected to.
- Fix so that select_a_stream can avoid an endless loop that
could hang a caller.
- time_entered (state set time) was not being set in all cases
to the time we went established.
Approved by: re(ken smith)
- Adjust lock_profiling stubs semantic in the hard functions in order to be
more accurate and trustable
- Disable shared paths for lock_profiling. Actually, lock_profiling has a
subtle race which makes results caming from shared paths not completely
trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for
re-enabling this paths, but is currently intended for developing use only.
- Use homogeneous names for automatic variables in hard functions regarding
lock_profiling
- Style fixes
- Add a CTASSERT for some flags building
Discussed with: kmacy, kris
Approved by: jeff (mentor)
Approved by: re
the 7.0 timeframe.
This is needed because I4B is not locked and NET_NEEDS_GIANT goes away.
The plan is to lock I4B and bring everything back for 7.1.
Approved by: re (kensmith)
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
Please note that, this is currently considered as an
experimental feature so there could be some rough
edges. Consult http://wiki.freebsd.org/TMPFS for
more information.
For now, connect tmpfs to build on i386 and amd64
architectures only. Please let us know if you have
success with other platforms.
This work was developed by Julio M. Merino Vidal
for NetBSD as a SoC project; Rohit Jalan ported it
from NetBSD to FreeBSD. Howard Su and Glen Leeder
are worked on it to continue this effort.
Obtained from: NetBSD via p4
Submitted by: Howard Su (with some minor changes)
Approved by: re (kensmith)
- Fix so VRF's will clean themselves up when no references are around.
- Allow sctp_ifa to be passed into inpcb_bind, addr_mgmt_ep_sa to bypass
normal validation checks.
- turn auto-asconf off for subset bound sockets
- Moves all logging to use KTR. This gets rid of most
of the logging #ifdef's with a few exceptions reducing
the number of config options for SCTP.
tunnels, and was not MPSAFE. The code can be easily restored in the
event that someone with an IPX over IP tunnel configuration can work
with me to test patches.
This removes one of five remaining consumers of NET_NEEDS_GIANT.
Approved by: re (kensmith)
is expanded, size of expansion was not taken int consideration.
- Fix so vtag hash is 1 bigger so that it modulo's out
correctly, avoids a panic when restart with right modulo happens.
- do not dereference stcb when control->do_not_ref_stcb is set
- Fix up packet logging to not often use a lock and also to
add to options.
- Fix some logging option duplication in the sctputil.h
The entire code is wrapperd in #ifdef ... #endif so it won't harm
the actual implementation, but developers are encouraged to test it.
For arm, ia64, ppc, sparc64 and sun4v some work is still
needed, thus arch maintainers are encouraged to bring their arch on par
with respect to i386 and amd64.
Approved by: re (implicit?)
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to that rwlocks. This patch
also adds support for exclusive locks using the same algorithm as mutexes.
A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given locks behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature
to the similar flags for mutexes.
Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.
The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.
The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.
The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.
MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
- Move updates to the lock hash to after the lock is released for spin mutexes,
sleep mutexes, and sx locks
- Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
statistics when an acquisition is contended. This reduces the overhead of
LOCK_PROFILING to increasing system time by 20%-25% which on
"make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
of wall-clock time. Contrast this to enabling lock profiling without
LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
Make PIM dynamically loadable by using encap_attach_func().
PIM may now be loaded into a GENERIC kernel.
Tested with: ports/net/pimdd && tcpreplay && wireshark
Reviewed by: Pavlin Radoslavov
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE anf GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).
The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
the newly added DEV_EISA. This is done so that these back-ends can
be compiled on platforms not providing in{b,w,l}()/out{b,w,l}() and
friends (but may wish to use them together with bus front-ends other
than the EISA one).
NOTES though, as ofw_syscons(4) doesn't properly interface with
syscons(4) regarding loading the font specified with SC_DFLT_FONT,
causing a kernel with both options SC_OFWFB and SC_NO_MODE_CHANGE
to not link.
lookup early. This has some performance implications and should not be
enabled by default, but might help greatly in certain setups. After some
more testing this could be turned into a sysctl.
Tested by: avatar
LOR ids: 17, 24, 32, 46, 191 (conceptual)
MFC after: 6 weeks
by default for sun4v where it is absolutely required.
This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.
Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.
way I intended due to licensing restrictions. I had intended
that it would be defaulted on, with opt-out possible for
companies that don't accept the CDDL. The FreeBSD GENERIC
kernel has to be entirely BSD licensed, so the only alternative
would have been to make KDTRACE an opt-in option. That isn't
a design I favour.
wait (time waited to acquire) and hold times for *all* kernel locks. If
the architecture has a system synchronized TSC, the profiling code will
use that - thereby minimizing profiling overhead. Large chunks of profiling
code have been moved out of line, the overhead measured on the T1 for when
it is compiled in but not enabled is < 1%.
Approved by: scottl (standing in for mentor rwatson)
Reviewed by: des and jhb
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.comtuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0
So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.
I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)
There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..
If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)
Reviewed by: gnn
Approved by: gnn
other problems while labels were first being added to various kernel
objects. They have outlived their usefulness.
MFC after: 1 month
Suggested by: Christopher dot Vance at SPARTA dot com
Obtained from: TrustedBSD Project
in older versions of FreeBSD. This option is pointless as it is needed in just
about every interesting usage of forward that I have ever seen. It doesn't make
the system any safer and just wastes huge amounts of develper time
when the system doesn't behave as expected when code is moved from
4.x to 6.x It doesn't make
the system any safer and just wastes huge amounts of develper time
when the system doesn't behave as expected when code is moved from
4.x to 6.x or 7.x
Reviewed by: glebius
MFC after: 1 week
and pc98 MD files. Remove nodevice and nooption lines specific
to sio(4) from ia64, powerpc and sparc64 NOTES. There were no
such lines for arm yet.
sio(4) is usable on less than half the platforms, not counting
a future mips platform. Its presence in MI files is therefore
increasingly becoming a burden.
the sysctls. This saves a lot of space in the resulting kernel which is
important for embedded systems. This change was done in a ABI compatible
way. The pointer is still there, it just points to an empty string instead
of the description.
MFC After: 3 days
encryption. There are two functions, a bpf tap which has a basic header with
the SPI number which our current tcpdump knows how to display, and handoff to
pfil(9) for packet filtering.
Obtained from: OpenBSD
Based on: kern/94829
No objections: arch, net
MFC after: 1 month
in 1999, and there are changes to the sysctl names compared to PR,
according to that discussion. The description is in sys/conf/NOTES.
Lines in the GENERIC files are added in commented-out form.
I'll attach the test script I've used to PR.
PR: kern/14584
Submitted by: babkin
I picked it up again. The scheduler is forked from ULE, but the
algorithm to detect an interactive process is almost completely
different with ULE, it comes from Linux paper "Understanding the
Linux 2.6.8.1 CPU Scheduler", although I still use same word
"score" as a priority boost in ULE scheduler.
Briefly, the scheduler has following characteristic:
1. Timesharing process's nice value is seriously respected,
timeslice and interaction detecting algorithm are based
on nice value.
2. per-cpu scheduling queue and load balancing.
3. O(1) scheduling.
4. Some cpu affinity code in wakeup path.
5. Support POSIX SCHED_FIFO and SCHED_RR.
Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler
uses 256 priority queues. Unlike ULE which using pull and push, the
scheduelr uses pull method, the main reason is to let relative idle
cpu do the work, but current the whole scheduler is protected by the
big sched_lock, so the benefit is not visible, it really can be worse
than nothing because all other cpu are locked out when we are doing
balancing work, which the 4BSD scheduelr does not have this problem.
The scheduler does not support hyperthreading very well, in fact,
the scheduler does not make the difference between physical CPU and
logical CPU, this should be improved in feature. The scheduler has
priority inversion problem on MP machine, it is not good for
realtime scheduling, it can cause realtime process starving.
As a result, it seems the MySQL super-smack runs better on my
Pentium-D machine when using libthr, despite on UP or SMP kernel.
for CBUS-PNP cards there) by default, as there are no amd64 and sparc64
machines with ISA slots and which therefore could make use of this code
known to exist. For sparc64 this additionally allows to get rid of the
compat shims for in{b,w,l}()/out{b,w,l}() etc and the associated hacks.
OK'ed by: imp, peter
- Linux ioctl support, with the other Linux changes MegaCli
will run if you mount linprocfs & linsysfs then set
sysctl compat.linux.osrelease=2.6.12 or similar. This works
on i386. It should work on amd64 but not well tested yet.
StoreLib may or may not work. Remember to kldload mfi_linux.
- Add in AEN (Async Event Notification) support so we can
get messages from the firmware when something happens.
Not all messages are in defined in event detail. Use
event_log to try to figure out what happened.
- Try to implement something like SIGIO for StoreLib. Since
mrmonitor doesn't work right I can't fully test it. StoreLib
works best with the rh9 base. In theory mrmonitor isn't
needed due to native driver support of AEN :-)
Now we can configure and monitor the RAID better.
Submitted by: IronPort Systems.
When porting FreeBSD to a new platform, one of the more useful things to do is
get mi_startup() to let you know which SYSINIT it's up to. Most people tend to
whack a printf in the SYSINIT loop to print the address of the function it's
about to call. Going one better, jhb made a version that uses DDB to look up
the name of the function and print that instead. This version is essentially
his with the addition of some ifdeffery to make it optional and to allow it to
work (although using only the function address, not the symbol) if you forgot
to enable DDB.
All the cool bits by: jhb
Approved by: scottl, rink, cognet, imp
the linux module, since it is not cross-platform
- move linprocfs from "files" and "options" to architecture specific files,
since it only makes sense to build this for those architectures, where we
also have a linuxolator
- disable the build of the linuxolator on our tier-2 architecture "Alpha":
* we don't have a linux_base port which supports Alpha and at the
same time is not outdated/obsoleted upstream/in a good condition/
currently working
* the upcomming new default linux base port is based upon Fedora
Core 3 (security support via http://www.fedoralegacy.org), which
isn't available for Alpha (like the current default linux base
port which is based upon Red Hat 8)
* nobody answered my request for testing it ~1 month ago on
current@ and alpha@ (it doesn't surprises me, see above)
* a SoC student wouldn't have to waste time on something which
nobody is willing to test
This does not remove the alpha specific MD files of the linuxolator yet.
Discussed on: arch (mostly silence)
Spiritual support by: scottl
o Properly use rman(9) to manage resources. This eliminates the
need to puc-specific hacks to rman. It also allows devinfo(8)
to be used to find out the specific assignment of resources to
serial/parallel ports.
o Compress the PCI device "database" by optimizing for the common
case and to use a procedural interface to handle the exceptions.
The procedural interface also generalizes the need to setup the
hardware (program chipsets, program clock frequencies).
o Eliminate the need for PUC_FASTINTR. Serdev devices are fast by
default and non-serdev devices are handled by the bus.
o Use the serdev I/F to collect interrupt status and to handle
interrupts across ports in priority order.
o Sync the PCI device configuration to include devices found in
NetBSD and not yet merged to FreeBSD.
o Add support for Quatech 2, 4 and 8 port UARTs.
o Add support for a couple dozen Timedia serial cards as found
in Linux.
This allows one to change the behavior of the driver pre-boot.
NOTE: This patch was made for DragonFly BSD by Sepherosa Ziehau.
PR: kern/94833
Submitted by: Devon H. O'Dell
Obtained from: DragonFly
MFC after: 1 month
end for isa(4).
o Add a seperate bus frontend for acpi(4) and allow ISA DMA for
it when ISA is configured in the kernel. This allows acpi(4)
attachments in non-ISA configurations, as is possible for ia64.
o Add a seperate bus frontend for pci(4) and detect known single
port parallel cards.
o Merge PC98 specific changes under pc98/cbus into the MI driver.
The changes are minor enough for conditional compilation and
in this form invites better abstraction.
o Have ppc(4) usabled on all platforms, now that ISA specifics
are untangled enough.
enabled by default in NETSMB and smbfs.ko.
With the most of modern SMB providers requiring encryption by
default, there is little sense left in keeping the crypto part
of NETSMB optional at the build time.
This will also return smbfs.ko to its former properties users
are rather accustomed to.
Discussed with: freebsd-stable, re (scottl)
Not objected by: bp, tjr (silence)
MFC after: 5 days
into a separate module. Accordingly, convert the option into a device
named similarly.
Note for MFC: Perhaps the option should stay in RELENG_6 for POLA reasons.
Suggested by: scottl
Reviewed by: cokane
MFC after: 5 days
option. We always build audit_syscalls.c so that the system call
stubs can return ENOSYS rather than the system call code
generating SIGSYS for the system calls. We are not yet ready to
add AUDIT to LINT, as the prototypes for system call arguments
won't be there until after the system calls for audit are added.
Much work from: wsalamon
Obtained from: TrustedBSD Project