for use thoughout the tcp subsystem.
It is IPv4 and IPv6 aware creates a line in the following format:
"TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags <RST>"
A "\n" is not included at the end. The caller is supposed to add
further information after the standard tcp log header.
The function returns a NUL terminated string which the caller has
to free(s, M_TCPLOG) after use. All memory allocation is done
with M_NOWAIT and the return value may be NULL in memory shortage
situations.
Either struct in_conninfo || (struct tcphdr && (struct ip || struct
ip6_hdr) have to be supplied.
Due to ip[6].h header inclusion limitations and ordering issues the
struct ip and struct ip6_hdr parameters have to be casted and passed
as void * pointers.
tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
void *ip6hdr)
Usage example:
struct ip *ip;
char *tcplog;
if (tcplog = tcp_log_addrs(NULL, th, (void *)ip, NULL)) {
log(LOG_DEBUG, "%s; %s: Connection attempt to closed port\n",
tcplog, __func__);
free(s, M_TCPLOG);
}
lock and unlock conditionally, not just set the flag on it conditionally.
In practice, this bug couldn't manifest, as in the current revision of
the code, no callers pass a NULL rep.
CID: 1416
Found with: Coverity Prevent(tm)
While ng_fec called the ioctl to let interfaces in the bundle know
the list of multicast addresses had changed, it never actually
updated that list on the interfaces in the bundle. Consequently,
the multicast filters could be programmed incorrectly.
if_lagg does this correctly, by maintaining a list of addresses
that it has added to interfaces in the bundle. This commit basically
takes the if_lagg code and adds it to ng_fec.
A version of this patch for RELENG_6 has fixed some problems with
IPv6 ND over ng_fec. This is probably the problem in PR 107523.
PR: 107523
Tested by: Rob Gallagher <robert.gallagher@heanet.ie>
Obtained from: if_lagg
MFC after: 3 weeks
function calls are no more generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption
about vop naming conventions. This restores pre/post-condition calls.
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
speedup and will be more useful after each gains a spinlock in the
impending thread_lock() commit.
- Move initialization and asserts into init/fini routines. fini routines
are only needed in the INVARIANTS case for now.
Submitted by: Attilio Rao <attilio@FreeBSD.org>
Tested by: kris, jeff
specified in RFC4620. A new flag for icmp6_nodeinfo was added to enable the
feature.
- Also cleaned up the code so that the semantics of the icmp6_nodeinfo
flags is clearer (i.e., defined specific macro names instead of using
hard-coded values).
Approved by: gnn (mentor)
MFC after: 1 week
- Fixed RTOinfo for bounding.
- Fixed connect() to return ECONNREFUSED when an ABORT is received.
- Added comments to direct Static Analysis not to look at some things
it does not understand (comments are /* sa_ignore XXXXX */)
- Bind when colliding was broken, missing not_found = 1 before
checking to see if the port was in use caused endless bind loop.
- Cookie life needs to be in milliseconds to conform to socket api.
- Cookie life is not supposed to change if its 0, On the assoc
level set we changed it to 0 opps.
- Two more static analysis issues identified by the cisco
tool. Null checks needed.
- An issue for sendfile(). Need to validate the correct
input argument.
- When sending failed due to a no route to host, we leaked
the mbuf chain failing to call m_freem().
- Fix #ifdef issue for getting hash block len when HAVE_SHA2 is NOT defined
Reviewed by: gnn
defined. This restores the old behavior, and eliminates the
dependency on the kernconf.tmpl when INCLUDE_CONFIG_FILE isn't
included in the kernel config. There were many people in the terminal
room that had almost, but not quite, up-to-date config files that this
helps. I don't know if this is the result of skew among the cvsup
servers, or some other more subtle problem. However, this fix should
work for any config of recent vintage (I tested with the latest, and
one before the recent changes, and eye-balled the intermediate
versions).
Reviewed by: the terminal room crew
adapter list still capable, but only PCI-E adapters are now enabled.
The user can enable older PCI-X or PCI adapters using ifconfig.
Secondly, Arthur Hartwig pointed out my MSI change was not working
correctly, changed to something that now does. Thanks Arthur.
There was also a fundamental bug in the 82575 MSIX code, the MSIX
registers had to be mapped, opps :)
Rubber-stamped by: Pdeuskar
the power_nodriver tunable is off. pci_cfg_save() already checks the
tunable internally, and no other callers of pci_cfg_save() check the
tunable.
Reviewed by: imp
- Updated firmware to latest release (v3.4.8) to fix TSO + jumbo frame lockup
- Added MSI (hw.bce.msi_enable) and TSO (hw.bce.tso_enable) sysctls
- Fixed kernel panic when MSI is used and module is unloaded
- Added several new debug routines
- Removed slack space for RX/TX chains since it only covers sloppy coding
- Fixed a potential problem when programming jumbo MTU size in hardware
- Various other comment changes
MFC after: 4 weeks
because on at least my dc based cards there's garbage in there. The
recent changes in the resource code appears to have unmasked this
problem... At least dc now probes/attaches better than it did before.
Also, we no longer need to write to the cfg for the other registers.
different versions of FreeBSD source tree.
Old config(8) can now be used unless you want to use INCLUDE_CONFIG_FILE
option.
Approved by: imp
Reviewed by: imp
other than repo copied tcp_subr.c into tcp_timewait.c#1.284:
tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()
tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset()
tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop()
tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()
This is a mechanical move with appropriate renames and making
them static if used only locally.
The tcp_tw_2msl_scan() cleanup function is still run from the
tcp_slowtimo() in tcp_timer.c.
value in the mbuf with the result of the calculation. Previously,
if we chose to return an ICMP message, the quoted UDP checksum bytes
would be different to what was sent.
PR: 112471
Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz>
MFC after: 3 weeks
legacy codepath match the 82575, without this we were seeing bridging
fail on 82546 adapters. Secondly, I have limited TSO to PCI Express
adapters, I meant to do this and it got dropped in the earlier delta.
Next, I am dropping in the latest shared code from our development
team, consensus was that this should be done frequently, so I am :)
Approved by: pdeuskar
exists and contains the 'C' flag.
o The partition label can be the empty string. It's how labels are
cleared.
o When an action fails, lower permissions when they were raised
in order to allow the action. A failed action will not result
in any uncommitted changes.
o Allow the flags paremeter to be present but empty. It's the
equivalent of not being present.
processes under 64-bit kernels). Previously, each 32-bit process overwrote
its resource limits at exec() time. The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process. To fix this, don't
ovewrite the resource limits during exec(). Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit. We then use this when querying and
setting resource limits. Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children. However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.
MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).
MFC after: 1 week
SIGCHLD/kevent(2) notification of process termination and wait(). Now
we no longer drop locks between sending the notification and marking
the process as a zombie. Previously, if another process attempted to do
a wait() with W_NOHANG after receiving a SIGCHLD or kevent and locked
the process while the exiting thread was in cpu_exit(), then wait() would
fail to find the process, which is quite astonishing to the process
calling wait().
MFC after: 3 days
option value so that unrecognized options are ignored as specified in RFC2711.
(packets containing an MLD router alert option are passed to the upper layer
as before).
Approved by: gnn (mentor), ume (mentor)
functions from their origininal place to their own files.
TCP Reassembly from tcp_input.c -> tcp_reass.c
TCP Timewait from tcp_subr.c -> tcp_timewait.c
is caused by my latest changes to config(8). You're supposed to install new
config(8) in order to prevent yourself from seeing a warning about old
version of that tool.
You should configure the kernel with a new config(8) then.
Oked by: rwatson, cognet (mentor)
This change will let us to have full configuration of a running kernel
available in sysctl:
sysctl -b kern.conftxt
The same configuration is also contained within the kernel image. It can be
obtained with:
config -x <kernelfile>
Current functionality lets you to quickly recover kernel configuration, by
simply redirecting output from commands presented above and starting kernel
build procedure. "include" statements are also honored, which means options
and devices from included files are also included.
Please note that comments from configuration files are not preserved by
default. In order to preserve them, you can use -C flag for config(8). This
will bring configuration file and included files literally; however,
redirection to a file no longer works directly.
This commit was followed by discussion, that took place on freebsd-current@.
For more details, look here:
http://lists.freebsd.org/pipermail/freebsd-current/2007-March/069994.htmlhttp://lists.freebsd.org/pipermail/freebsd-current/2007-May/071844.html
Development of this patch took place in Perforce, hierarchy:
//depot/user/wkoszek/wkoszek_kconftxt/
Support from: freebsd-current@ (links above)
Reviewed by: imp@
Approved by: imp@
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.
passed zero as exit signal.
GCC 4.2 changes the kernel data segment layout not to have 0
in that memory location. This code ran by luck before and now
the luck has run out.
- All printf that was surrounded by #ifdef SCTP_DEBUG moves to
a macro that does all of this. This removes all printfs from
the code and makes the code more portable and easier to
read.
- Static Analysis (cisco) - found a few bugs, but mostly we
add checks for NULL pointers and such to make the tool
happy. We now pass the Cisco SA tools checks except for
where it does not understand tailq/lists. We still need
to look at the coverity tools output too (this is like
the cisco SA tool) and see if it wants us to fix any other
items. Hopefully this will be the last major churn in the
code other than bug fixes.
This patch does the following:
- Remove un-necessary code that is not even compiling into the driver
under TW_OSL_NON_DMA_MEM_ALLOC_PER_REQUEST defines.
- Remove bundled firmware image and associated "files" entry for tw_cl_fwimg.c
- Remove bundled firmware flashing routines. We now have tw_update userspace
FreeBSD controller flash utility.
- Fix driver crash on load due to shared interrupt.
- Fix 2 lock leaks for Giant lock.
- Fix CCB leak.
- Add support for 9650SE controllers.
Many thanks to 3Ware/AMCC for continuing to support FreeBSD.
time workaround for problems with 82571 adapters and LAAs, one port
getting reset can cause the other to have its RAR[0] also reset,
thus overwriting an LAA. This fix works around it by also keeping
the address in the last array member.
The other bug is specific to the new 575 adapter, its transmit code
logic in handling hwassists was too crude, it broken when doing
bridges. I am much happier with the new logic,we may want to change
the legacy path at some point to something similar.
Reviewed by: pdeuskar
Approved by: pdeuskar
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
starting any of the APs up rather than doing it while starting up the
APs. This step is now where we honor MAXCPU.
MFC after: 1 week
1) adding the thread to the sleepq via sleepq_add() before dropping the
lock, and 2) dropping the sleepq lock around calls to lc_unlock() for
sleepable locks (i.e. locks that use sleepq's in their implementation).
- Split the intr_table_lock into an sx lock used for most things, and a
spin lock to protect intrcnt_index. Originally I had this as a spin lock
so interrupt code could use it to lookup sources. However, we don't
actually do that because it would add a lot of overhead to interrupts,
and if we ever do support removing interrupt sources, we can use other
means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
determine if a source is enabled or not. This allows us to notice when
a source is no longer in use. When that happens, we now invoke a new
PIC method (pic_disable_intr()) to inform the PIC driver that the
source is no longer in use. The I/O APIC driver frees the APIC IDT
vector when this happens. The MSI driver no longer needs to have a
hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
complement apic_enable_vector() and use it in the I/O APIC and MSI code
when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
IRQ to its irq_rman. The MSI code uses this when it creates new
interrupt sources to let the nexus know about newly valid IRQs.
Previously the msi_alloc() and msix_alloc() passed some extra stuff
back to the nexus methods which then added the IRQs. This approach is
a bit cleaner.
- Change the MSI sx lock to a mutex. If we need to create new sources,
drop the lock, create the required number of sources, then get the lock
and try the allocation again.
119373: o Remove the query verb, along with the request and response
parameters.
o Add the version and output parameters.
119390: [APM,GPT] Properly clear deleted entries.
119394: o Make the alias the standard and use the '!' to prefix
literal partition types.
o Treat schemes and partition types as case insensitive.
119462: [GPT] Fix a page fault caused when modifying a partition entry
without a new partition type.
stack will process from 50 to 15. As this is a sysctl variable it
can be tuned up or down at the user/administrator's whim.
Submitted by: itojun
MFC after: 1 day
to the coverity tool.. may even be the same one.. not sure).
- A bug in the way sctp_abort() and friends were
setting the IP_CLOSE flag.. and NOT passing the
last argument as a (,1)... so that things would
get freed..
- Update to latest (1.4.17) firmware.
- Use the new MXGEFW_CMD_UNALIGNED_TEST (added in firmare 1.4.16) to
have the firmware tell us if the PCIe chipset supports aligned PCIe
completions.
- Hard to maintain, and frequently out of date whitelist of PCIe
chipsets known to produce aligned completions removed, as it has been
replaced in its role of selecting the correct firmware to run by the
use of MXGEFW_CMD_UNALIGNED_TEST.
- Break the dma test out of mxge_reset() and into its own function
(mxge_dma_test()) so it can be used by both the normal DMA test, and
to run the unaligned test.
- Improved support for enabling ECRCs
Sponsored by: Myricom Inc.
- PR-SCTP would ignore FWD-TSN's above a rwnd's worth
of TSN's (1 byte msgs).. this left the peer hopelessly
out of sync.. or an attacker. So now we abort the assoc.
- New IFN hash, also rename hashes to match addr/ifn now
that the vrf has multiple.
- Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default
as defined in the Socket API ID.
- Export MTU information via sysctl.
- Vrf's need table id's. This is default for
BSD, but may be other things later when BSD
fully supports VRFs.
- Additional stream reset bug (caught by cisco dev-test).
- Additional validations for the address in sending a message (socket api).
-------- and -----
- Fix association notifications not to give the active open
side false notifications.
- Fix so sendfile and SENDALL will work properly (missing
flag to say socket sender is done).
- Fix Bug that prevented COOKIES from being retransmitted.
- Break out connectx into helper sub-models so that iox routines can
reuse the helpers.
- When an address is added during system init (non-dynamic mode) make
sure that the "defer use" flag is not set.
** its compiling on XR now :-D **
Reviewed by: gnn
and in_setsockaddr(), containing only stale comments on why they
exist, remove them and initialize the protosw for UDP to directly
reference in_setpeeraddr() and in_setsockaddr().
The entire code is wrapperd in #ifdef ... #endif so it won't harm
the actual implementation, but developers are encouraged to test it.
For arm, ia64, ppc, sparc64 and sun4v some work is still
needed, thus arch maintainers are encouraged to bring their arch on par
with respect to i386 and amd64.
Approved by: re (implicit?)
o push much of the i386 and amd64 MD interrupt handling code
(intr_machdep.c::intr_execute_handlers()) into MI code
(kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
(intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
and make them part of 'struct intr_event', passing them as arguments to
kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
with filter and ithread functions.
Approved by: re (implicit?)
and change it to a void function.
We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late
late segments arriving for such a connection is handled directly in the TW
code.
and show up with different names: first try to open provider using
remembered name and compare its ident, if equal, this is our provider,
if not equal or there is no provider with such name, find provider with
remembered ident and don't care about the name.
- Locks were not being unlocked when an invalid size chunk is
sent in.
- When a notification comes in, we cannot use it to look up
the fragment interleave stream information since its not
on a stream.
Seems to work on RELENG_4 through -current and also on sparc64
now. There may still be some issues with the auto attach/detach
code to sort out.
MFC after: 3 days
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory. The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used. Previously,
only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.
This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.
Discussed with: kmacy, marius, Nathan Whitehorn
PR: 112194
DIOCGFLUSH - Flush write cache (sends BIO_FLUSH).
DIOCGDELETE - Delete data (mark as unused) (sends BIO_DELETE).
DIOCGIDENT - Get provider's uniqe and fixed identifier (asks for
GEOM::ident attribute).
First two are self-explanatory, but the last one might not be. Here are
properties of provider's ident:
- ident value is preserved between reboots,
- provider can be detached/attached and ident is preserved,
- provider's name can change - ident can't,
- ident value should not be based on on-disk metadata; in other words
copying whole data from one disk to another should not yield the same
ident for the other disk,
- there could be more than one provider with the same ident, but only if
they point at exactly the same physical storage, this is the case for
multipathing for example,
- GEOM classes that consumes single providers and provide single providers,
like geli, gbde, should just attach class name to the ident of the
underlying provider,
- ident is an ASCII string (is printable),
- ident is optional and applications can't relay on its presence.
The main purpose for this is that application and remember provider's ident
and once it tries to open provider by its name again, it may compare idents
to be sure this is the right provider. If it is not (idents don't match),
then it can open provider by its ident.
OK'ed by: phk
- Make wlan_amrr depend on wlan, so that it can find various symbols in
wlan module if wlan is not compiled into kernel.
Approved by: sam (mentor)
Tested by: kevlo
- http://www.intel.com/design/chipsets/specupdt/245051.htm
AC97 Soft Audio and Soft Modem Master Abort Errata
Issue:
Use of either soft audio or soft modem on an Intel® 82443MX PCISet
based platform running a 100 MHz Processor System Bus and an AC97 codec
may result in failures. The system continues to function normally while
the AC97 hardware may not resume and may require a cold-boot to
recover. As a result of the failure, the Master Abort Status bit will
be set in the audio or modem function PCI header space.
Workaround:
Force uncacheable DMA on both BDL and pcm buffers.
Tested by: Emil Holmstr|m <emil@linux.se>
- Remove explicit call to pmap_change_attr(), since we now have proper
and functional definition of BUS_DMA_NOCACHE.
- Enable PCI(e) bus snooping for non i386/amd64 as an alternative for
uncacheable DMA.
- Codecs changes:
* Analag Device -> Analog Devices, AD1988.
* New codec: VIA VT1708 and VT1709, Realtek ALC262, ALC861-VD and
ALC885.
* Various fixups for Conexant Waikiki, fix recording (read: microphone)
on various Analog Devices codecs due to vendor BIOS mess, various
quirks for several ASUS laptops/boards.
- Fix connection list handling, closely following the specification to
handle range of nids.
- Basic Jack sense polling infrastructure for possible hardwares with
broken unsolicited response interrupt.
Ideas/Submitted/Tested by: Andriy Gapon <avg@icyb.net.ua>,
#freebsd-azalia, many.
state tcp_debug, tcp_debx. Acquire and drop as required in tcp_trace().
Move to ANSI C function header, correct prototype types so that short TCP
state is no longer promoted to int unnecessarily.
Add comments.
MFC after: 3 weeks
Updated copyright date to 2007.
Tested with BCM5706 A3.
Added ID for BCM5708 B2.
Removed unused driver version string.
Modified BCE_PRINTF macro to automatically fill-in the sc pointer.
Fixed a kernel panic when the driver was loaded as a module from the
command-line because the MII bus pointer was null (i.e. the MII bus
hadn't been enumerated yet).
Added fix proposed by Vladimir Ivanov <wawa@yandex-team.ru> to prevent
driver state corruption when releasing the lock during the ISR in
bce_rx_intr() to send packets up the stack.
Added new TX chain and register read sysctl interfaces for debugging.
Cleaned up formatting for various other debug routines.
Added a new statistic maintained by firmware which tracks the number
of received packets dropped because no receive buffers are available.
correct network drivers with respect to busmaster DMA, go over it
with at duster to make other aspects of it a role model:
Eliminate the pci specific softc, it serves no rational purpose.
Use convenience resource allocation/deallocation functions to save
code and errorhandling.
Switch from bus_space_{read|write}_%u() to bus_{read|write}_%u()
functions and forget about tags and handles, the resource will know
about those, should they be needed. This also eliminates a number
of inconsistently named local variables.
it was full and a collision occured, then we would leave
a inp locked. Also fixes a missing inp unlock if IPSEC was
on and it failed during the attach. Bug found by Weongyo Jeong.
as UF_OPENING. Disable closing of that entries. This should fix the crashes
caused by devfs_open() (and fifo_open()) dereferencing struct file * by
index, while the filedescriptor is closed by parallel thread.
Idea by: tegge
Reviewed by: tegge (previous version of patch)
Tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 3 weeks
in comments for .c and .h files respectively. Jack may want to clean up
style or other aspects once he's up and about again, but this gets the
kernel compiling.
shared code infrastructure that is family specific and
modular. There is also support for our latest gigabit
nic, the 82575 that is MSI/X and multiqueue capable.
The new shared code changes some interfaces to the core
code but testing at Intel has been going on for months,
it is fairly stable.
I have attempted to be careful in retaining any fixes that
CURRENT had and we did not, I apologize in advance if any
thing gets clobbered, I'm sure I'll hear about it :)
Approved by pdeuskar
on each socket buffer with the socket buffer's mutex. This sleep lock is
used to serialize I/O on sockets in order to prevent I/O interlacing.
This change replaces the custom sleep lock with an sx(9) lock, which
results in marginally better performance, better handling of contention
during simultaneous socket I/O across multiple threads, and a cleaner
separation between the different layers of locking in socket buffers.
Specifically, the socket buffer mutex is now solely responsible for
serializing simultaneous operation on the socket buffer data structure,
and not for I/O serialization.
While here, fix two historic bugs:
(1) a bug allowing I/O to be occasionally interlaced during long I/O
operations (discovere by Isilon).
(2) a bug in which failed non-blocking acquisition of the socket buffer
I/O serialization lock might be ignored (discovered by sam).
SCTP portion of this patch submitted by rrs.
- Simplify the amount of work that has be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indicies to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.
The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI psuedo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.
MFC after: 1 month
set/clear it but would not do it. Now we will.
- Moved to latest socket api for extended sndrcv info struct.
- Moved to support all new levels of fragment interleave (0-2).
- Codenomicon security test updates - length checks and such.
- Bug in stream reset (2 actually).
- setpeerprimary could unlock a null pointer, fixed.
- Added a flag in the pcb so netstat can see if we are listening easier.
Obtained from: (some of the Listen changes from Weongyo Jeong)
pointers. A structure is more readable and less error-prone. It
also avoids problems when a function pointer doesn't have the
same width as a void pointer.
functions with CPUs they apply to only, otherwise default to the
plain C functions. This is modeled in a way so that f.e. a Cheetah
version of these functions can be inserted easily.
Not because I admit they are technically wrong and not because of bug
reports (I receive nothing). But because I surprisingly meets so
strong opposition and resistance so lost any desire to continue that.
Anyone who interested in POSIX can dig out what changes and how
through cvs diffs.
the UPA_IMR2 resource is also shared with/a subset of the Schizo PCI
bus B CSR bank. I'm not entirely sure how this previously managed to
escape testing...
consistent with the naming of other structure field members, and
reducing improper grep matches. Clean up and comment structure
fields in structure definition.
sc->mii_anegticks according to whether the respective BGE chip
supports Fast Ethernet only or also Gigabit Ethernet.
- At least the BGE chips I've tested with wedge when isolating them
so document this as the reason for setting MIIF_NOISOLATE and
remove the unused (and partially even #ifdef'ed out) isolation
related code. Add code that panics if we encounter a non-zero MII
instance as generally there's no way a PHY requiring MIIF_NOISOLATE
can be handled gracefully in a multi-PHY configuration (it's ok for
the internal PHY of single-PHY-only-NIC to not support isolation
though).
- Additionally set MIIF_NOLOOP as loopback doesn't seem to work
either and remove the #ifdef'ed out code for adding respective
media. The MIIF_NOLOOP flag currently triggers nothing but
hopefully will be respected by mii_phy_setmedia() later on.
Reviewed by: jkim, yongari
MFC after: 1 month
Blade 2500, Fire V210 and probably some other sparc64 machines.
These chips are typically not fitted with an EEPROM which means
that we have to obtain the MAC address via OFW and that some chip
tests will just always fail.
These changes are based on the respective code found in OpenBSD
with some additional info obtained from OpenSolaris and some style
suggestions by jkim@. They also have the desired side-effect of
respecting the 'local-mac-address?' system configuration variable
for the affected BGEs.
- In bge_attach() factor out calling bge_release_resources() before
going to the fail label into the fail label as well as replace a
magic 6 with ETHER_ADDR_LEN.
Reviewed by: yongari (before style changes), jkim
- Wake up DMA engine after adding a new receive buffer.
- Skip buffers which have unknown state after error.
- More rigid error detection.
MFC after: 1 week
as some combinations of chipset, controller and target do not behave
correctly when DMA is enabled for other commands.
PR: kern/103602
MFC after: 2 weeks
were never freed, but the big ring was freed twice.
-Don't supply rx hw csums for frames which are padded beyond the
length specified in the ip header. If the padding is non-zero,
the hw csum will be incorrect for such frames.
Sponsored by: Myricom
non-mapped data as possible at once and not page-by-page. Which this change we
combain I/Os, but also saves many VM_OBJECT_UNLOCK()/VM_OBJECT_LOCK()
operations.
Simple 'fsx -l 33554432 -o 524288 -N 10000 /tank/fsx' test shows ~23%
performance increase.
This workaround the problem in Parallels/VMWare where the emulated drivers are
slower, especially with ATA_FLUSHCACHE. The problem appears much more
frequently with ZFS which use it a lot more.
Approved: sos, pjd
- vm_page_undirty() is enough (instead of vm_page_set_validclean()), but it has
to be called before we write the data in case someone makes page dirty after
our write, but before our vm_page_undirty() call.
- Always dmu_write, not matter if uiomove() succeeded, because it could
partially be ok and we would lose some changes.
All good ideas from: ups
In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, next check for changed v_mountedhere or mnt_gen counter
would be successfull. In the second case, the unmount shall be allowed.
Submitted by: sobomax
MFC after: 2 weeks
- Fix for a bug where a close would not wait for all (directio)
dirty buffers to drain. The nfsnode was not marked NMODIFIED
when there were directio dirtied buffers pending, causing this.
- No reason to vhold/vrele the vp when enqueueing DirectIO requests
for the nfsiods. The vnode can't really go way since the close
has to wait for these requests to drain.
MFC after: 1 week
Submitted by: mohans
specific request and thus should first try to be allocated from the
sys_resource pool. This avoids using the sys_resource pool for wildcard
requests that have bounded ranges coming from cbb(4) and Host-PCI pcib(4)
drivers.
Tested by: Andrea Bittau <a.bittau of cs.ucl.ac.uk fame>
Sleuthing by: Andrea Bittau as well
that the MSI mapping window is fixed at 0xfee00000 and the capability
does not include two more dwords used to program the address. Supporting
this mostly results in quieting spurious warnings during boot about
non-default MSI mapping windows.
- HT 2.00b also added a new HT capability type, so support that in pciconf.
MFC after: 3 days
Tested by: jmg
It seems that valid pause frames(Tx flow control) cause GMAC to hang
such that it resulted in watchdog timeout. As a work around don't
flush Rx MAC FIFO if we've received pause frames.
Tested by: Harald Schmalzbauer (h DOT schmalzbauer AT omnisec DOT de)
Under certain circumtances, if TSO is active, Yukon II generates
corrupted IP packets. All corrupted IP packets I noticed were the the
last segmented packet in a TSO request. The corrupted packet resulted
in retransmission of the damaged packet which in turn decreased network
performance dramatically.
Unfortunately it seems that there is no way to workaround this bug
as TSO is completely handled in hardware. Disable TSO until we find a
working workaround or a new silicon revision that doesn't have this
hardware bug.
fault. The previous method zero'd out the page tables, invalidated the
TLB, and then entered a spin loop. The idea was that the instruction after
the TLB invalidate would result in a page fault and the page fault and
subsequent double fault wouldn't be able to determine the physical page
for their fault handlers' first instruction. This stopped working when
PGE (PG_G PTE/PDE bit) support was added as a TLB invalidate via %cr3
reload doesn't clear TLB entries with PG_G set. Thus, the CPU was still
able to map the virtual address for the spin loop and happily performed
its infinite loop.
The triple fault now uses a much more deterministic sledge-hammer approach
to generate a triple fault. First, the IDT descriptor is set to point to
an empty IDT, so any interrupts (including a double fault) will instantly
fault. Second, we trigger a int 3 breakpoint to force an interrupt and
kick off a triple fault.
MFC after: 3 days