an inordinate amount of synchronous console output that is fairly
undesirable on slower serial console. It's easily hit by accident
when frobbing other sysctls late at night.
Users should move to the new geom_vinum implementation instead.
The refcount logic which is being added to devices to enable safe module
unloading and the buf/vm work also in progress would require a major rework
of the (old)-vinum code to comply with the new semantics.
The actual source files will not be removed until I have coordinated with
the geomvinum people if they need any bits repo-copied etc.
FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c). This
is a possible MT5 candidate.
remaining consumers to have the count passed as an option. This is
i4b, pc98/wdc, and coda.
Bump configvers.h from 500013 to 600000.
Remove heuristics that tried to parse "device ed5" as 5 units of the ed
device. This broke things like the snd_emu10k1 device, which required
quotes to make it parse right. The no-longer-needed quotes have been
removed from NOTES, GENERIC etc. eg, I've removed the quotes from:
device snd_maestro
device "snd_maestro3"
device snd_mss
I believe everything will still compile and work after this.
will cause the network stack to operate without the Giant lock by
default. This change has the potential to improve performance by
increasing parallelism and decreasing latency in network processing.
Due to the potential exposure of existing or new bugs, the following
compatibility functionality is maintained:
- It is still possible to disable Giant-free operation by setting
debug.mpsafenet to 0 in loader.conf.
- Add "options NET_WITH_GIANT", which will restore the default value of
debug.mpsafenet to 0, and is intended for use on systems compiled with
known unsafe components, or where a more conservative configuration is
desired.
- Add a new declaration, NET_NEEDS_GIANT("componentname"), which permits
kernel components to declare dependence on Giant over the network
stack. If the declaration is made by a preloaded module or a compiled
in component, the disposition of debug.mpsafenet will be set to 0 and
a warning concerning performance degraded operation printed to the
console. If it is declared by a loadable kernel module after boot, a
warning is displayed but the disposition cannot be changed. This is
implemented by defining a new SYSINIT() value, SI_SUB_SETTINGS, which
is intended for the processing of configuration choices after tunables
are read in and the console is available to generate errors, but
before much else gets going.
This compatibility behavior will go away when we've finished the last
of the locking work and are confident that operation is correct.
compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.
If no hooks are connected the entire packet filter hooks section and related
activities are jumped over. This removes any performance impact if no hooks
are active.
Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
and preserves the ipfw ABI. The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.
However there are many changes how ipfw is and its add-on's are handled:
In general ipfw is now called through the PFIL_HOOKS and most associated
magic, that was in ip_input() or ip_output() previously, is now done in
ipfw_check_[in|out]() in the ipfw PFIL handler.
IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to
be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
reassembly. If not, or all fragments arrived and the packet is complete,
divert_packet is called directly. For 'tee' no reassembly attempt is made
and a copy of the packet is sent to the divert socket unmodified. The
original packet continues its way through ip_input/output().
ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet
with the new destination sockaddr_in. A check if the new destination is a
local IP address is made and the m_flags are set appropriately. ip_input()
and ip_output() have some more work to do here. For ip_input() the m_flags
are checked and a packet for us is directly sent to the 'ours' section for
further processing. Destination changes on the input path are only tagged
and the 'srcrt' flag to ip_forward() is set to disable destination checks
and ICMP replies at this stage. The tag is going to be handled on output.
ip_output() again checks for m_flags and the 'ours' tag. If found, the
packet will be dropped back to the IP netisr where it is going to be picked
up by ip_input() again and the directly sent to the 'ours' section. When
only the destination changes, the route's 'dst' is overwritten with the
new destination from the forward m_tag. Then it jumps back at the route
lookup again and skips the firewall check because it has been marked with
M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with
'option IPFIREWALL_FORWARD' to enable it.
DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for
a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will
then inject it back into ip_input/ip_output() after it has served its time.
Dummynet packets are tagged and will continue from the next rule when they
hit the ipfw PFIL handlers again after re-injection.
BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.
More detailed changes to the code:
conf/files
Add netinet/ip_fw_pfil.c.
conf/options
Add IPFIREWALL_FORWARD option.
modules/ipfw/Makefile
Add ip_fw_pfil.c.
net/bridge.c
Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw
is still directly invoked to handle layer2 headers and packets would
get a double ipfw when run through PFIL_HOOKS as well.
netinet/ip_divert.c
Removed divert_clone() function. It is no longer used.
netinet/ip_dummynet.[ch]
Neither the route 'ro' nor the destination 'dst' need to be stored
while in dummynet transit. Structure members and associated macros
are removed.
netinet/ip_fastfwd.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.
netinet/ip_fw.h
Removed 'ro' and 'dst' from struct ip_fw_args.
netinet/ip_fw2.c
(Re)moved some global variables and the module handling.
netinet/ip_fw_pfil.c
New file containing the ipfw PFIL handlers and module initialization.
netinet/ip_input.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code. ip_forward() does not longer require
the 'next_hop' struct sockaddr_in argument. Disable early checks
if 'srcrt' is set.
netinet/ip_output.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.
netinet/ip_var.h
Add ip_reass() as general function. (Used from ipfw PFIL handlers
for IPDIVERT.)
netinet/raw_ip.c
Directly check if ipfw and dummynet control pointers are active.
netinet/tcp_input.c
Rework the 'ipfw forward' to local code to work with the new way of
forward tags.
netinet/tcp_sack.c
Remove include 'opt_ipfw.h' which is not needed here.
sys/mbuf.h
Remove m_claim_next() macro which was exclusively for ipfw 'forward'
and is no longer needed.
Approved by: re (scottl)
have already done this, so I have styled the patch on their work:
1) introduce a ip_newid() static inline function that checks
the sysctl and then decides if it should return a sequential
or random IP ID.
2) named the sysctl net.inet.ip.random_id
3) IPv6 flow IDs and fragment IDs are now always random.
Flow IDs and frag IDs are significantly less common in the
IPv6 world (ie. rarely generated per-packet), so there should
be smaller performance concerns.
The sysctl defaults to 0 (sequential IP IDs).
Reviewed by: andre, silby, mlaier, ume
Based on: NetBSD
MFC after: 2 months
an adaptive fashion when adaptive mutexes are enabled. The theory
behind non-adaptive Giant is that Giant will be held for long periods
of time, and therefore spinning waiting on it is wasteful. However,
in MySQL benchmarks which are relatively Giant-free, running Giant
adaptive makes an observable difference on SMP (5% transaction rate
improvement). As such, make adaptive behavior on Giant an option so
it can be more widely benchmarked.
NO_ADAPTIVE_MUTEXES. This option has been enabled by default on amd64 for
quite some time, and has been extensively tested on i386 and sparc64. It
shows measurable performance gains in many circumstances, and few negative
effects. It would be nice in t he future if adaptive mutexes actually went
to sleep after a certain amount of spinning, but that will require quite a
bit more testing.
o Rename WITNESS_DDB to WITNESS_KDB. In the new world order KDB is the
acronym to use for debugging related code. The DDB option is used
to enable the DDB debugger backend only.
o Likewise, rename DDB_TRACE to KDB_TRACE, rename DDB_UNATTENDED to
KDB_UNATTENDED and rename SC_HISTORY_DDBKEY to SC_HISTORY_KDBKEY.
o Remove DDB_NOKLDSYM. The new DDB backend supports pre-linker symbol
lookups as well as KLD symbol lookups at the same time.
o Remove GDB_REMOTE_CHAT. The GDB protocol hacks to allow this are
FreeBSD specific. At the same time, the GDB protocol has packets
for console output.
in particular not without removing the options they replace or in the
proper location in this file. The purpose of this commit is to make it
possible to commit changes in parts without causing massive build
breakages. At least, that's the intend. I have no idea if it actually
works out as I hope...
bootp -> BOOTP
bootp.nfsroot -> BOOTP_NFSROOT
bootp.nfsv3 -> BOOTP_NFSV3
bootp.compat -> BOOTP_COMPAT
bootp.wired_to -> BOOTP_WIRED_TO
- i.e. back out the previous commit. It's already possible to
pxeboot(8) with a GENERIC kernel.
Pointed out by: dwmalone
BOOTP -> bootp
BOOTP_NFSROOT -> bootp.nfsroot
BOOTP_NFSV3 -> bootp.nfsv3
BOOTP_COMPAT -> bootp.compat
BOOTP_WIRED_TO -> bootp.wired_to
This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:
bootp="YES"
bootp.nfsroot="YES"
bootp.nfsv3="YES"
bootp.wired_to="bge1"
or even setting the variables manually from the OK prompt.
FAT32 filesystems to be mounted, subject to some fairly serious limitations.
This works by extending the internal pseudo-inode-numbers generated from
the file's starting cluster number to 64-bits, then creating a table
mapping these into arbitrary 32-bit inode numbers, which can fit in
struct dirent's d_fileno and struct vattr's va_fileid fields. The mappings
do not persist across unmounts or reboots, so it's not possible to export
these filesystems through NFS. The mapping table may grow to be rather
large, and may grow large enough to exhaust kernel memory on filesystems
with millions of files.
Don't enable this option unless you understand the consequences.
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switch to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.
This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.
Approved by: scottl (with his re@ hat)
This class is used for detecting volume labels on file systems:
UFS, MSDOSFS (FAT12, FAT16, FAT32) and ISO9660.
It also provide native labelization (there is no need for file system).
g_label_ufs.c is based on geom_vol_ffs from Gordon Tetlow.
g_label_msdos.c and g_label_iso9660.c are probably hacks, I just found
where volume labels are stored and I use those offsets here,
but with this class it should be easy to do it as it should be done by
someone who know how.
Implementing volume labels detection for other file systems also should
be trivial.
New providers are created in those directories:
/dev/ufs/ (UFS1, UFS2)
/dev/msdosfs/ (FAT12, FAT16, FAT32)
/dev/iso9660/ (ISO9660)
/dev/label/ (native labels, configured with glabel(8))
Manual page cleanups and some comments inside were submitted by
Simon L. Nielsen, who was, as always, very helpful. Thanks!
hash tables used in the sleep queue and turnstile code. Each option adds
a sysctl tree under debug containing the maximum depth of any bucket in
the hash table as well as a separate node for each bucket (or chain)
containing the current depth and maximum depth for that bucket.
originated on RELENG_4 and was ported to -CURRENT.
The scoreboarding code was obtained from OpenBSD, and many
of the remaining changes were inspired by OpenBSD, but not
taken directly from there.
You can enable/disable sack using net.inet.tcp.do_sack. You can
also limit the number of sack holes that all senders can have in
the scoreboard with net.inet.tcp.sackhole_limit.
Reviewed by: gnn
Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.
__FreeBSD_version bump will follow.
Tested-by: (i386)LINT
- Connect geom_stripe and geom_nop modules to the build.
- Connect STRIPE and NOP classes to the LINT build.
- Disconnect gconcat(8) from the build.
Supported by: Wheel - Open Technologies - http://www.wheel.pl
an option. Note that this option doesn't follow the normal USB_ or
Uxxx_ convention. That's because it is this way in the upstream
provider and I didn't want to change that.
returns okay when HW probe fails. This happens when comconsole flag is
set but VGA console is used instead.
Back out requested by: bde (He will be looking at other solutions from scratch)
synchronization protecting against dynamic load and unload of MAC
policies, and instead simply blocks load and unload. In a static
configuration, this allows you to avoid the synchronization costs
associated with introducing dynamicism.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
- Define option FORCECONSPEED to force the serial console to
be CONSPEED. I've run into a lot of boards in which
the detect for prior speed doesn't work and ends up with
broken console since it is at the wrong speed.
- If a serial port is marked as a console, but console=vidconsole
and if the serial ports doesn't exist it will be probed and
attached at a 8250 chip. Then writes to that will freeze the
system.
- Add an option flags 0x400000 to mark this as a potential
comconsole in-case the one flaged with 0x10 does not exist
in the system.
This makes it easier to deploy on systems with one or two serial ports.
Obtained from: IronPort
to awaken all waiters when a contested mutex is released instead of just
the highest priority waiter. If the various threads are awakened in
sequence then each thread may acquire and release the lock in question
without contention resulting in fewer expensive unlock and lock
operations. This old behavior of waking just the highest priority is
still used if this option is specified. Making the algorithm conditional
on a kernel option will allows us to benchmark both cases later and
determine which one should be used by default.
Requested by: tanimura-san
mini-layer. I don't have time to bing it forward into the GEOM world, and
no one else has stepped forward to claim it. It'll be in the Attic for safe
keeping for now.
ethernet (tested) and FDDI (not tested). The main use for this is on ADSL (or
other ATM) connections where bridged ethernet is used, PPPoE being a prime
example.
There is no manual page as yet, I will write one shortly.
Reviewed by: harti
generic watchdoc(9) interface.
Make watchdogd(8) perform as watchdog(8) as well, and make it
possible to specify a check command to run, timeout and sleep
periods.
Update watchdog(4) to talk about the generic interface and add
new watchdog(8) page.
pf/pflog/pfsync as modules. Do not list them in NOTES or modules/Makefile
(i.e. do not connect it to any (automatic) builds - yet).
Approved by: bms(mentor)
also prints the actual numerical value of the symbol in question.
Users of addr2line(1) will be less proficient in hex arithmetic as a
consequence.
This amongst other things means that traceback lines change from:
siointr1(c4016800,c073bda0,0,c06b699c,69f) at siointr1+0xc5
to
siointr1(c4016800,c073bda0,0,c06b699c,69f) at 0xc062b0bd = siointr1+0xc5
I made this an option to avoid bikesheds.
~
~
~
This is the first of two commits; bringing in the kernel support first.
This can be enabled by compiling a kernel with options TCP_SIGNATURE
and FAST_IPSEC.
For the uninitiated, this is a TCP option which provides for a means of
authenticating TCP sessions which came into being before IPSEC. It is
still relevant today, however, as it is used by many commercial router
vendors, particularly with BGP, and as such has become a requirement for
interconnect at many major Internet points of presence.
Several parts of the TCP and IP headers, including the segment payload,
are digested with MD5, including a shared secret. The PF_KEY interface
is used to manage the secrets using security associations in the SADB.
There is a limitation here in that as there is no way to map a TCP flow
per-port back to an SPI without polluting tcpcb or using the SPD; the
code to do the latter is unstable at this time. Therefore this code only
supports per-host keying granularity.
Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6),
TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective
users of this feature, this will not pose any problem.
This implementation is output-only; that is, the option is honoured when
responding to a host initiating a TCP session, but no effort is made
[yet] to authenticate inbound traffic. This is, however, sufficient to
interwork with Cisco equipment.
Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with
local patches. Patches for tcpdump to validate TCP-MD5 sessions are also
available from me upon request.
Sponsored by: sentex.net
lots of old interfaces, and digi now supports all cards that dgb
supported. The author of the driver says that this is no longer
necessary.
Approved by: babkin@
Requested by: jhb
Initialize the real mode stack. This is needed at least for the return
address from the lcall.
Requested by: takawata
Fix style bugs in acpi_wakecode.S
Requested by: bde
Remove the kernel option now that we have the tunable.
dcons(4): very simple console and gdb port driver
dcons_crom(4): FireWire attachment
dconschat(8): User interface to dcons
Tested with: i386, i386-PAE, and sparc64.
enable strict checks of the AML. Our default behavior will be to relax
checks to work on as many platforms as possible. Also clean up and document
other ACPI options while I'm here.
Second (PPS) timing interface. The support is non-optional and by
default uses the DCD line signal as the pulse input. A compile-time
option (UART_PPS_ON_CTS) can be used to have uart(4) use the CTS line
signal.
Include <sys/timepps.h> in uart_bus.h to avoid having to add the
inclusion of that header in all source files.
Reviewed by: phk
FIDs to be 128-bits wide and adds support for realms.
Add a new CODA_COMPAT_5 option, which requests support for the old
Coda 5.x interface instead of the new one.
Create a new coda5.ko module that supports the 5.x interface, and make
the existing coda.ko module use the new 6.x interface. These modules
cannot both be loaded at the same time.
Obtained from: Jan Harkes & the coda-6.0.2 distribution,
NetBSD (drochner) (CODA_COMPAT_5 option).
Restructure the way ATA/ATAPI commands are processed, use a common
ata_request structure for both. This centralises the way requests
are handled so locking is much easier to handle.
The driver is now layered much more cleanly to seperate the lowlevel
HW access so it can be tailored to specific controllers without touching
the upper layers. This is needed to support some of the newer
semi-intelligent ATA controllers showing up.
The top level drivers (disk, ATAPI devices) are more or less still
the same with just corrections to use the new interface.
Pull ATA out from under Gaint now that locking can be done in a sane way.
Add support for a the National Geode SC1100. Thanks to Soekris engineering
for sponsoring a Soekris 4801 to make this support.
Fixed alot of small bugs in the chipset code for various chips now
we are around in that corner anyways.
found only many tv-cards.
We currently use more ore less evil hacks (slow_msp_audio sysctl) to
configure the various variants of these chips in order to have
stereo autodetection work. Nevertheless, this doesn't always work
even though it _should_, according to the specs.
This is, for example, the case for some popular Hauppauge models sold
sold in Germany.
However, the Linux driver always worked for me and others. Looking at
the sourcecode you will find that the linux-driver uses a very much
enhanced approach to program the various msp34xx chipset variants,
which is also found in the specs for these chips.
This is a port of the Linux MSP34xx code, written by Gerd Knorr
<kraxel@bytesex.org>, who agreed to re-release his code under a
BSD license for this port.
A new config option "BKTR_NEW_MSP34XX_DRIVER" is added, which is required
to enable the new driver. Otherwise the old code is used.
The msp34xx.c file is diff-reduced to the linux-driver to make later
modifications easier, thus it doesn't follow style(9) in most cases.
Approved by: roger (committing this, no time to test/review),
keichii (code review)
to such devices. If a device fails due to this commit, add:
options DA_OLD_QUIRKS
to the kernel config and recompile. Then send the output of "camcontrol
inquiry da0" to scsi@freebsd.org so the quirk can be re-enabled.
large to huge amounts of small or medium sized receive buffers. The problem
with these situations is that they eat up the available DMA address space
very quickly when using mbufs or even mbuf clusters. Additionally this
facility provides a direct mapping between 32-bit integers and these buffers.
This is needed for devices originally designed for 32-bit systems. Ususally
the virtual address of the buffer is used as a handle to find the buffer as
soon as it is returned by the card. This does not work for 64-bit machines
and hence this mapping is needed.
This commit has two pieces. One half is the watchdog kernel code which lives
primarily in hardclock() in sys/kern/kern_clock.c. The other half is a userland
daemon which, when run, will keep the watchdog from firing while the userland
is intact and functioning.
Approved by: jeff (mentor)
redundant paths to the same device.
This class reacts to a label in the first sector of the device,
which is created the following way:
# "0123456789abcdef012345..."
# "<----magic-----><-id-...>
echo "GEOM::FOX someid" | dd of=/dev/da0 conv=sync
NB: Since the fact that multiple disk devices are in fact the same
device is not known to GEOM, the geom taste/spoil process cannot
fully catch all corner cases and this module can therefore be
confused if you do the right wrong things.
NB: The disk level drivers need to do the right thing for this to
be useful, and that is not by definition currently the case.
of the infrastructure for the gamma driver which was removed a while back.
The DRM_LINUX option is removed because the handler is now provided by the
linux compat code itself.