inserted a few to the new files.. but I falied to
add the #include <sys/cdef.h>
Which causes a compile error.. sorry about that... got it
now :-)
Approved by:gnn
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.comtuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0
So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.
I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)
There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..
If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)
Reviewed by: gnn
Approved by: gnn
o Fix the packet statistics
o Make sure we set the FD bit when in full duplex
o Improve TX side efficency by eliminating a data copy for
unfragmented mbufs (the hardware can't do s/g).
o Minor busdma pedantry
o better comments in some places, more XXX in others
o Minor style nits.
This solves a problem I was seeing where I'd get no ethernet when not
booting with a NFS root. Well, unless I unplugged the cable and
plugged it back in first so I'd get the same up down up messages I get
for NFS root...
Thanks to sam and scottl for suggestions on making this driver more
efficient through better use of approrpiate APIs.
the ORDERED tag. This recoups significant performance gains for many
arrays.
The default is still to send out the ORDERED tag periodically.
Reviewed by: scsi (justin+timeout)
to do the userland to kernel copying in sosend_generic() and sosend_dgram().
sosend_copyin() is retained for ZERO_COPY_SOCKETS which are not yet supported
by m_uiotombuf().
Benchmaring shows significant improvements (95% confidence):
66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO)
65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO)
(Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver
DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back
at 1000Base-TX full duplex.)
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 month
mbuf clusters. Add a flags parameter to accept M_PKTHDR and M_EOR mbuf
chain flags. Provide compatibility macro for m_getm() calling m_getm2()
with M_PKTHDR set.
Rewrite m_uiotombuf() to use m_getm2() for mbuf allocation and do the
uiomove() in a tight loop over the mbuf chain. Add a flags parameter to
accept mbuf flags to be passed to m_getm2(). Adjust all callers for the
extra parameter.
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 month
to get the physical address doesn't work for all values of KVA_PAGES,
while masking 8 MSBs works for all values of KVA_PAGES that are
multiple of 4 for non-PAE and 8 for PAE. (This leaves us limited
with 12MB for non-PAE kernels and 14MB for PAE kernels.)
To get things right, we'd need to subtract the KERNBASE from the
virtual address (but KERNBASE is not easy to figure out from here),
or have physical addresses set properly in the ELF headers.
Discussed with: jhb
VM pages into mbufs as it can -- up to the free send socket buffer space.
The outer loop then drops the whole mbuf chain into the send socket buffer,
calls tcp_output() on it and then waits until 50% of the socket buffer are
free again to repeat the cycle. This way tcp_output() gets the full amount
of data to work with and can issue up to 64K sends for TSO to chop up in
the network adapter without using any CPU cycles. Thus it gets very efficient
especially with the readahead the VM and I/O system do.
The previous sendfile(2) code simply looped over the file, turned each 4K
page into an mbuf and sent it off. This had the effect that TSO could only
generate 2 packets per send instead of up to 44 at its maximum of 64K.
Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of
sleeping on mbuf allocation failures.
Benchmarking shows significant improvements (95% confidence):
45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO)
83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO)
(Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver
DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back
at 1000Base-TX full duplex.)
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 month
longjmp to the default context. As result, "alltrace" command may
be prematurely terminated (without error message). This is happens,
for instance, when system is low on memory and referenced page in
kernel-mode thread stack is swapped out.
Protect "alltrace" against termination on trap by setting temporary
kdb_jmpbuf context.
Submitted by: Peter Holm
device (kind) specific unit field to the common field. This change
allows a future version of libefi to work without requiring anything
more than what is defined in struct devdesc and as such makes it
possible to compile said version of libefi for different platforms
without requiring that those platforms have identical derivatives
of struct devdesc.
as we have no use for that info. Instead let this function return the
keyboard ID and verify at its invocation in sunkbd_configure() that we're
talking to a Sun type 4/5/6 keyboard, i.e. a keyboard supported by this
driver.
- Add an option SUNKBD_EMULATE_ATKBD whose code is based on the respective
code in ukbd(4) and like UKBD_EMULATE_ATSCANCODE causes this driver to
emit AT keyboard/KB_101 compatible scan codes in K_RAW mode as assumed by
kbdmux(4). Unlike UKBD_EMULATE_ATSCANCODE, SUNKBD_EMULATE_ATKBD also
triggers the use of AT keyboard maps and thus allows to use the map files
in share/syscons/keymaps with this driver at the cost of an additional
translation (in ukbd(4) this just is the way of operation).
- Implement an option SUNKBD_DFLT_KEYMAP, which like the equivalent options
of the other keyboard drivers allows to specify the default in-kernel
keyboard map. For obvious reasons this made to only work when also using
SUNKBD_EMULATE_ATKBD.
- Implement sunkbd_check(), sunkbd_check_char() and sunkbd_clear_state(),
which are also required for interoperability with kbdmux(4).
- Implement K_CODE mode and FreeBSD keypad compose.
- As a minor hack define KBD_DFLT_KEYMAP also in the !SUNKBD_EMULATE_ATKBD
case so we can obtain fkey_tab from <dev/kbd/kbdtables.h> rather than
having to duplicate it and #ifdef some more code.
- Don't use the TX-buffer for writing the two command bytes for setting the
keyboard LEDs as this consequently requires a hardware FIFO that is at
least two bytes in depth, which the NMOS-variant of the Zilog SCCs doesn't
have. Thus use an inlined version of uart_putc() to consecutively write
the command bytes (a cleaner approach would be to do this via the soft
interrupt handler but that variant wouldn't work while in ddb(4)). [1]
- Fix some minor style(9) bugs.
PR: 90316 [1]
Reviewed by: marcel [1]
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.
BIO_READ/BIO_WRITE is sent to vnode-backed provider (BIO_DELETE or
BIO_FLUSH).
Reported by: ceri
Add support for BIO_FLUSH to vnode-backed md(4) devices based on
VOP_FSYNC().
we won't be able to exit from the thread.
Function g_eli_cpu_is_disabled() stoled from kern_pmc.c.
PR: 104669
Reported by: Nikolay Mirin <nik@optim.com.ru>
MFC after: 1 week
- Test the mac_type rather than if_hwassist (since ifp doesn't exist yet)
to determine if the adapter supports TSO and thus to change the sizes
for the bus_dma tag.
Reviewed by: glebius
- Do not modify mnt_flag without mount interlock held.
- Do not touch MNT_ASYNC flag, as this can lead to a race with nmount(2).
Pointed out by: tegge
Reviewed by: tegge
RSTP provides faster spanning tree convergence, the protocol will exchange
information with neighboring switches to quickly transition to forwarding
without creating loops. The code will default to RSTP mode but will downgrade
any port connected to a legacy STP network so is fully backward compatible.
Reviewed by: syrinx
Tested by: syrinx
a lock to prevent interspersed strings written from different CPUs
at the same time.
To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.
String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.
Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.
A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.
Reviewed by: scottl, jhb
MFC after: 2 weeks
- Add FS_GJOURNAL flag which enables gjournal support on a file system.
- Add cg_unrefs field to the cylinder group structure which holds
number of unreferenced (orphaned) inodes in the given cylinder group.
- Add fs_unrefs field to the super block structure which holds
total number of unreferenced (orphaned) inodes.
- When file or a directory is orphaned (last reference is removed, but
object is still open), increase fs_unrefs and cg_unrefs fields,
which is a hint for fsck in which cylinder groups looks for such
(orphaned) objects.
- When file is last closed, decrease {fs,cg}_unrefs fields.
- Add VV_DELETED vnode flag which points at orphaned objects.
Sponsored by: home.pl
support enabled.
Add mnt_gjprovider field which keeps gjournal provider's name on which
file system is placed on. This allows to not place file system on gjournal
directly and allows gjournal class to pair gjournal provider with file
system.
Sponsored by: home.pl
journaling and can be tought about marking file system as clean before
doing journal switch, which easly allows to add journaling to file
systems that don't have this feature.
Sponsored by: home.pl
This bug results in data corruption with NFS/TCP. Writes are silently dropped
on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires).
Reviewed by: ups@
during detach() similar to other NIC drivers rather than allocating them
during init() and freeing them during stop():
- Move creation of tx bus_dma tag amd maps and tx_buffer_area from
em_setup_transmit_structures() to em_allocate_transmit_structures().
- Call em_allocate_xxx_structures() in em_attach().
- Only call em_free_xxx_structures() in em_detach().
- Change em_setup_xxx_structures() to free any existing tx or rx buffers
and in the case of rx repopulate the ring with newer buffers.
Reviewed by: jfv
the EOP descriptor in the first descriptor of the packet. And then
in em_txeof() search for DD bits set only in the EOP descriptors,
embedding the cleanup of all packet's descriptors into inner loop.
This change is important for future chips, where DD bit is going
to be set only on the EOP descriptors.
Submitted by: jfv
Details:
o if_em.c changes:
- Added several new PCI ids.
- Check em_check_phy_reset_block() before doing SIOCSIFMEDIA ioctl.
- Don't touch TARC registers, they are now handled in shared
code in if_em_hw.c.
- Move RDH and RDT setting to the end of
em_initialize_receive_unit().
- Declare em_read_pcie_cap_reg(), now empty.
o if_em_hw.c dropped in from vendor, then restored rev. 1.15.
o if_em_hw.h dropped in from vendor, then modified:
- Added RX overrun interrupt flag to interrupt enable mask.
- Remove declarations of em_io_read(), em_io_write().
Approved by: jfv
the CAM_NEW_TRAN_CODE that has been in the tree for some years now.
This first step consists solely of adding to or correcting
CAM_NEW_TRAN_CODE pieces in the kernel source tree such
that a both a GENERIC (at least on i386) and a LINT build
with CAM_NEW_TRAN_CODE as an option will compile correctly
and run (at least with some the h/w I have).
After a short settle time, the other pieces (making
CAM_NEW_TRAN_CODE the default and updating libcam
and camcontrol) will be brought in.
This will be an incompatible change in that the size of structures
related to XPT_PATH_INQ and XPT_{GET,SET}_TRAN_SETTINGS change
in both size and content. However, basic system operation and
basic system utilities work well enough with this change.
Reviewed by: freebsd-scsi and specific stakeholders
argument in parentheses so these macros are safe to use and invocations
with an expression as the argument like __bswap32_const(42 << 23 | 13)
work as expected. Additionally, mask all the individually shifted bytes
as appropriate so the bytes which exceed the width of the respective
__bswapN_const() macro in invocations like __bswap16_const(0xdead600d)
are ignored like it's the case with the corresponding __bswapN_var()
function.
MFC after: 3 days
only those bars that had addresses assigned by the BIOS and where the
bridges were properly programmed. Now even unprogrammed ones work.
This was needed for sun4v. We still only implement up to 2GB memory
ranges, even for 64-bit bars. PCI standards at least through 2.2 say
that this is the max (or 1GB is, I only know it is < 32bits).
o Always define pci_addr_t as uint64_t. A pci address is always 64-bits,
but some hosts can't address all of them.
o Preserve the upper half of the 64-bit word during resource probing.
o Test to make sure that 64-bit values can fit in a u_long (true on some
platforms, but not others). Don't use those that can't.
o minor pedantry about data sizes.
o Better bridge resource reporting in bootverbose case.
o Minor formatting changes to cope with different data types on different
platforms.
Submitted by: jmg, with many changes by me to fully support 64-bit
addresses.
Though it is named after overclocking tool for ASUS motherboards,
it is not capable to change clock ratio or CPU core voltage.
This driver exports Templature, Power output voltage, Fan RPM under
dev.acpi_aiboost.0.*.
Descriptions for these values are set to sysctl describe, which can be
get by sysctl -d.
in #ifdef __NO_STRICT_ALIGNMENT rather than #ifdef __i386__. This
means that amd64 now also uses the optimized code. [1]
While at it, fix a nearby style(9) bug.
- Remove the hw.dc_quick SYSCTL, which allowed to turn off the above
mentioned optimization, as like the equivalent and already removed
- In dc_setcfg() suppress printing a warning when forcing the receiver
and transceiver to idle state times out for chips where the status
bits in question just never change (observed in detail with DM9102A)
and therefore the warning would be highly likely false positive. [2]
- In dc_ifmedia_sts() add a missing DC_UNLOCK().
Tested by: Hans-Joerg Sirtl on amd64 [1]
PR: 82681 [2]
Obtained from: NetBSD tlp(4) [2]
MFC after: 1 week
in #ifdef __NO_STRICT_ALIGNMENT rather than #if defined(__i386__) ||
defined(__amd64__). Currently this change is cosmetic only though.
While at it, fix a nearby style(9) bug and remove a no longer used
header.
are no longer limited to a virtual address space of 16 megabytes,
only mask high two bits of a virtual address. This allows to load
larger kernels (up to 1 gigabyte). Not masking addresses at all
was a bad idea on machines with less than >3G of memory -- kernels
are linked at 0xc0xxxxxx, and that would attempt to load a kernel
at above 3G. By masking only two highest bits we stay within the
safe limits while still allowing to boot larger kernels.
(This is a safer reimplmentation of sys/boot/i386/boot2/boot.2.c
rev. 1.71.)
Prodded by: jhb
Tested by: nyan (pc98)
dynamic nature (if no native aio code is available, the linux part
returns ENOSYS because of missing requisites) should be solved differently
than it is.
All this will be done in P4.
Not included in this commit is a backout of the changes to the native aio
code (removing static in some places). Those changes (and some more) will
also be needed when the reworked linux aio stuff will reenter the tree.
Requested by: rwatson
Discussed with: rwatson
- Pay respect to net.isr.direct: use netisr_dispatch() instead of ip_input()
Reviewed by: glebius, rwatson
- purge_flow_set():
- Do not leak memory while purging queues which are not bound to pipe.
- style(9) cleanup
MFC after: 2 months
not completely decided at config time. Just don't default to using
the TSC if there are multiple active CPUs. Also, don't default to
using the TSC if it is broken. SMP ifdefs are still used to disallow
using perfmon since perfmon is always broken if SMP is just configured.
This only helps much for SMP kernels running on 1 CPU. The overheads
for using the i8254 cputime clock were a bit too high on 486/33's, and
now on multi-GHz CPUs they are usually in the 99-99.9% range. Switching
from the old default of an i8254 clock to the TSC works poorly because
the overheads are not recalibrated.
Use the same condition for declaring perfmon stuff as for using it.
Call vfs_setdirty_locked_object() from vfs_busy_pages() instead of
vfs_setdirty(), thereby eliminating a second acquisition and release
of the same vm object lock.
queues lock to BIO_READ operations. Recent changes to the implementation
of the per-page flags have eliminated the need for the page queues lock
in the other cases.
- Don't use a frame pointer. Our callers need a frame pointer, but we
could only use one to support things that aren't supported. (These
things are:
- profiling of profiling
- debugging of profiling. The core ENTRY() macro doesn't support
forcing a frame pointer for debugging, so don't do more here.)
- Ensure that we are in the text section and have normal alignment.
- Use the normal syntax for `.type'.
Fixed a syntax error for the (!__KERNEL && !__GNUCLIKE_ASM) case in
rev.1.36. Apparently, this case has never been reached even by lint.
Submitted by: stefanf
{amd64,i386}/include/profile.h:
In case the above case is actually reached, break it properly by
providing null support that will fail at link time instead of a stub
that gives wrong (null) profiling at runtime.
Rename MAX_SAMPLE_RATES macro to OSS_MAX_SAMPLE_RATES. The old
macro clashed with those used in other applications and libaries
(ex: RtAudio). 4Front responded by updating their spec, so we
will follow suit.
Submitted by: ryanb
Noticed by: pointyhat/kris
was only used in the GUPROF case, so the messes to get its i386 prerequisites
included shouldn't have been needed.
Fixed some style bugs. Quote #error contents, and don't repeat an #error
directive on amd64.
this used to be slightly cleaner than using ifdefs in a few places to
support both a.out and elf, but using it now just causes messes and
unportabilities. It seems to be impossible to implement the elf
HIDENAME() portably in cpp (since token pasting of "." and <name> is
invalid).
*/prof_machdep.c:
- Removed all uses of CNAME(). CNAME() is easy enough to use in pure
asm code, but using it in inline asm requires messy quoting. The
core pure asm code has been hacked on more and all uses of CNAME() in
it have already gone away. Just assume the elf convention here too.
- Removed now-uneeded include of <machine/asmacros.h>.
- Removed the workaround for a namespace conflict with this include.
new device support, and it is hoped a more stable driver for 6.2. RELEASE.
This checkin was discussed and approved today by RE, scottl, jhb, and pdeuskar
profiling is configured but high resolution profiling is not configured.
Only functions in *.[Ss] called the stub, so efficiency was not
significantly affected.
net.inet.ip.dummynet.curr_time
net.inet.ip.dummynet.searches
net.inet.ip.dummynet.search_steps
to SYSCTL_LONG nodes. It will prevent frequent wrap around on 64bit archs.
- Implement simple mechanics for dummynet(4) internal time correction.
Under certain circumstances (system high load, dummynet lock contention, etc)
dummynet's tick counter can be significantly slower than it should be.
(I've observed up to 25% difference on one of my production servers).
Since this counter used for packet scheduling, it's accuracy is vital for
precise bandwidth limitation.
Introduce new sysctl nodes:
net.inet.ip.dummynet.
tick_lost - number of ticks coalesced by taskqueue thread.
tick_adjustment - number of time corrections done.
tick_diff - adjusted vs non-adjusted tick counter difference
tick_delta - last vs 'standard' tick differnece (usec).
tick_delta_sum - accumulated (and not corrected yet) time
difference (usec).
Reviewed by: glebius
MFC after: 2 month
The 'nooption' kernel config entry has to be used to turn KSE off now.
This isn't my preferred way of dealing with this, but I'll defer to
scottl's experience with the io/mem kernel option change and the grief
experienced over that.
Submitted by: scottl@
except sun4v.
This change makes the transition from a default to an option more
transparent and is an attempt to head off all the compliants that are
likely from people who don't read UPDATING, based on experience with
the io/mem change.
Submitted by: scottl@
i386). Use -mprofiler-epilogue again, and don't use -finstrument-functions.
The former has been fixed for arches that implement high-res profiling,
and the latter has been useless for kernel profiling since gcc-3.4
when it started forcing -fno-inline. -fno-inline gives a kernel with
performance characteristics too different from a normal kernel to be
worth profiling, by turning off inlining of all the little optimized
functions in headers. This interacts especially badly with FreeBSD's
use of "static inline" for all inlines in headers, by creating many
separate copies of the little functions, so not inlining tends to
increase cache pressure where it should reduce it, and (since gprof(1)
doesn't understand the copies) the statistics for the little functions
are hard to interpret even if you want them.
to twice unlock the vnode. Check that ni_vp and ni_dvp are different before
doing second unlock.
Reviewed by: rwatson
Approved by: pjd (mentor)
MFC after: 1 week
counters of allocs/frees/use for each malloc type to calculating InUse,
MemUse, and Requests as displayed by the userspace vmstat -m. This is
more useful when debugging malloc(9)-related memory leaks, where the
count of allocs/frees may not usefully reflect that current memory
allocation (i.e., when highly variable size allocations occur with the
same malloc type, such as with contigmalloc).
MFC after: 3 days
Limitations observed by: scottl
keyboard is attached only after the system has already booted.
If USB keyboard is also present, and there's no kbdmux(4), the problem
has been hiding itself because as soon as we get to multi-user, the
USB keyboard becomes an active keyboard (see devd.conf), thus marking
atkbd inactive and letting the old code initialize the keyboard.
With kbdmux(4), or if there's no USB keyboard, the atkbd keyboard is
always active, whether it's physically attached or not, thus it never
initialized itself properly on a physical attach.
To fix this, move block that initialized the keyboard on attach upper
so it doesn't depend on the (KBD_IS_ACTIVE(kbd) && KBD_IS_BUSY(kbd))
condition. Also move KBD_FOUND_DEVICE() a few lines upper so that
KDSETLED and KDSETREPEAT that follow it propagate to the controller.
MFC after: 3 days
: revision 1.27
: date: 2000/05/28 12:43:24; author: ache; state: Exp; lines: +3 -2
: Manipulate with AltGR Led (really CapsLock Led) only in K_XLATE mode, because
: all other modes not set ALKED flag and it means that CapsLock always turned
: off for them.
: Real bug example is X11 which never turn on CapsLock with Russian keyboard.
:
: PR: 18651
: Submitted by: "Mike E. Matsnev" <mike@po.cs.msu.su>
MFC after: 3 days