Unlike for TX interrupts, the ST201 does not provide any mechanism to
suppress RX interrupts. ste(4) can generate more than 70k RX
interrupts under heavy RX traffic, and these excessive interrupts
leave the system unable to do other useful work. This was probably
the major reason why polling support code was introduced to ste(4).
The STE_COUNTDOWN register provides a programmable counter that
will generate an interrupt upon its expiration. We program
STE_DMACTL register to use 3.2us clock rate to drive the counter
register. Whenever ste(4) serves an RX interrupt, the driver rearms
the timer to expire after STE_IM_RX_TIMER_DEFAULT time and disables
further generation of RX interrupts. This trick seems to work well
and ste(4) generates fewer than 8k RX interrupts even under a 64-byte
UDP torture test. Combined with TX interrupts, the total number of
interrupts is less than 10k, which looks reasonable for a heavily
loaded controller.
The default RX interrupt moderation time is 150us. Users can change
the value at any time with the dev.ste.%d.int_rx_mod sysctl node.
Setting it to 0 effectively disables the RX interrupt moderation
feature. Now that we have both TX and RX interrupt moderation code,
remove the loop in the interrupt handler, which resulted in
sub-optimal performance as well as more register accesses.
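The conversion from a moderation time to countdown ticks could look roughly like the sketch below. Only the 3.2us clock and the 0-means-disabled convention come from the text above; the helper name and the 16-bit clamp are assumptions made for illustration and are not taken from the driver.

```c
#include <stdint.h>

/* Countdown clock period from the text: 3.2us, kept as 32/10 to stay in integers. */
#define IM_CLOCK_NUM	32u
#define IM_CLOCK_DEN	10u

/*
 * Hypothetical helper: convert a moderation time in microseconds to
 * countdown ticks, rounding down.  The 16-bit clamp is an assumption
 * about the counter width.  A result of 0 means "moderation disabled".
 */
static uint16_t
im_usec_to_ticks(uint32_t usec)
{
	uint64_t ticks = ((uint64_t)usec * IM_CLOCK_DEN) / IM_CLOCK_NUM;

	return (ticks > UINT16_MAX ? UINT16_MAX : (uint16_t)ticks);
}
```

With the default of 150us this yields 46 ticks (about 147us), since the result is rounded down to a whole tick.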
M5229 appears to be once again fixed. If this happens to return
we probably should disable ATAPI DMA in ataacerlabs(4) instead
just like the Linux libATA does.
in intr_execute_handlers(). If we managed to get here without an
associated interrupt controller we have way bigger problems.
While at it predict stray vector interrupts as false as they are
rather unlikely.
- Don't blindly call the clear function of an interrupt controller
when adding a handler in inthand_add() as interrupt controllers
like the one driven by upa(4) are auto-clearing and thus provide
NULL instead.
Server Return mode, where not all packets would be visible to the load
balancer or gateway.
This commit should be reverted when we merge future pf versions. The
benefit it would provide is that this version does not break any existing
public interface and thus won't be a problem if we want to MFC it to
earlier FreeBSD releases.
Discussed with: mlaier
Obtained from: OpenBSD
Sponsored by: iXsystems, Inc.
MFC after: 1 month
This brings hwpmc(4) support for 2nd and 3rd generation XScale cores.
Right now it's enabled by default to make sure we test this a bit.
When the time comes it can be disabled by default.
Tested on Gateworks boards.
A man page is coming.
Obtained from: //depot/user/rpaulo/xscalepmc/...
a bit of a detour we can just iterate through the banks array instead
of having to calculate every offset. This change is inspired by the
powerpc version of this function.
- Add support for the JBus to EBus bridges which hang off of nexus(4).
to PCIe bridges.
- Add support for talking the PROM mappings over to the kernel IOTSB
just like we do with the kernel TSB in order to allow OFW drivers
to continue to work.
- Change some members, parameters and variables to unsigned where
more appropriate.
enable IDE I/O" bit which prevents data access traps with revision
0xc8 in Fire-based machines when pci(4) enables PCIM_CMD_PORTEN.
- Like for sun4v also don't add the PCI side of host-PCIe bridges to
the bus on sun4u as they don't have configuration space implemented
there either.
of the interrupt handler in intr_fast() as the handler might clobber
it (no in-tree handler currently does but an upcoming one will).
While at it, tidy the register usage in the interrupt counting code.
transmitted frames. So request an interrupt for every 16th frame. Due
to a hardware limitation we can't suppress the interrupt entirely as
the driver has to check the TX status register. The TX status
register can store up to 31 TX status entries, so the driver can't
send more than 31 frames without reading the TX status register.
With this change the controller does not generate a TX completion
interrupt for every frame, so reclaim transmitted frames in
ste_tick().
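A minimal sketch of the bookkeeping this implies, assuming a simple pair of counters. The structure and function names are hypothetical and do not mirror the driver's softc; only the numbers 16 and 31 come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_INTR_INTERVAL	16u	/* from the text: every 16th frame */
#define TX_STATUS_SLOTS		31u	/* from the text: TX status depth */

/* Hypothetical bookkeeping, not the driver's softc. */
struct tx_state {
	uint32_t queued;	/* frames queued so far */
	uint32_t unread;	/* frames whose TX status was not read yet */
};

/* True if one more frame fits without overflowing the TX status register. */
static bool
tx_can_queue(const struct tx_state *st)
{
	return (st->unread < TX_STATUS_SLOTS);
}

/* Account for one queued frame; true if it should request an interrupt. */
static bool
tx_queue_frame(struct tx_state *st)
{
	st->queued++;
	st->unread++;
	return (st->queued % TX_INTR_INTERVAL == 0);
}

/* Call after the TX status register has been drained. */
static void
tx_status_drained(struct tx_state *st)
{
	st->unread = 0;
}
```

Queueing 31 frames back to back requests a single interrupt (on the 16th frame) and then refuses further frames until the status register has been read.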
reformatting to avoid unnecessary line breaks, small block
restructuring to avoid unnecessary nesting, replace macros
with function calls, etc.
As a side effect of code restructuring, this commit fixes one bug:
previously, if a realloc() failed, memory was leaked. Now, the
realloc is not there anymore, as we first count how much memory
we need and then do a single malloc.
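The count-first, allocate-once pattern can be sketched as follows. This is a generic illustration, not the code touched by this commit: join_strings() and its arguments are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical example: join n strings with a separator character.
 * Pass 1 counts the bytes needed; pass 2 fills a single allocation,
 * so there is no realloc() whose failure could leak the old buffer.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static char *
join_strings(const char *const *parts, size_t n, char sep)
{
	size_t i, len, total = 1;	/* 1 for the terminating NUL */
	char *buf, *p;

	for (i = 0; i < n; i++)
		total += strlen(parts[i]) + (i + 1 < n ? 1 : 0);

	buf = malloc(total);
	if (buf == NULL)
		return (NULL);

	p = buf;
	for (i = 0; i < n; i++) {
		len = strlen(parts[i]);
		memcpy(p, parts[i], len);
		p += len;
		if (i + 1 < n)
			*p++ = sep;
	}
	*p = '\0';
	return (buf);
}
```

Because there is exactly one allocation, the only failure path is a NULL return with nothing left to free.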
used to return success without respect to the result.
While I'm here use mii_mediachg() in ste_init_locked(), which allows
the driver to use the currently configured media. ste_ifmedia_upd() is
supposed to be called whenever the user changes the current media settings.
o Let RX filter handler program promiscuous/multicast filter as
well as broadcasting.
o Remove unnecessary register access.
o Simplify ioctl handler and have set_rxfilter to handle
IFF_PROMISC and IFF_ALLMULTI change instead of directly
programming the controller.
o Removed unnecessary error variable reinitialization in ioctl
handler.
o Add IFF_DRV_RUNNING check before programming multicast filter.
o Configure maximum allowed frame length before enabling MAC.
The datasheet doesn't specify the exact ordering of the programming
sequence but it looks more natural to set the maximum allowed frame
length before enabling the controller.
1ms. Since we switched to memory register mapping make sure to
flush PCI posted writes by reading the register again.
While I'm here add additional delays in the loop while the driver waits
for the completion of the reset.
The frequencies are in MHz (i.e. a value of 1000 represents 1GHz). The
frequencies are rounded to the nearest whole MHz.
While here, rename and re-type bus_frequency, processor_frequency and
itc_frequency to bus_freq, cpu_freq and itc_freq and make them static.
As unsigned integers, the hw.freq.cpu sysctl can more easily be made
generic (across all architectures) making porting easier.
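Rounding a frequency in Hz to the nearest whole MHz, as described above, could be done along these lines. The helper name is hypothetical; the kernel code may compute the value differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert Hz to MHz, rounding to the nearest
 * whole MHz by adding half a MHz before the integer division.
 */
static unsigned int
hz_to_mhz(uint64_t hz)
{
	return ((unsigned int)((hz + 500000u) / 1000000u));
}
```

For example, 1000000000 Hz maps to 1000, and 999500000 Hz also rounds up to 1000.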
MFC after: 3 days
If ste(4) encounters a TX underrun or excessive collisions, the TX MAC
of the controller is stalled, so the driver should wake it up again. A
TX underrun also requires increasing the TX threshold value to minimize
further TX underruns. Previously ste(4) reset the controller
to recover from TX underruns, excessive collisions and reclaim
errors. However, the datasheet says only a TX underrun requires
resetting the entire controller. So implement ste_restart_tx(), which
restarts the TX MAC, and do not perform a full reset except in the TX
underrun case.
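The recovery policy described above can be sketched as a small decision function. All names and the threshold step and limit are made up for illustration; the real values live in the driver and the datasheet.

```c
#include <stdint.h>

/* Hypothetical error and recovery codes, for illustration only. */
enum tx_error {
	TX_ERR_UNDERRUN,
	TX_ERR_EXCESS_COLL,
	TX_ERR_RECLAIM
};

enum tx_recovery {
	TX_RESTART_MAC,		/* cheap: just restart the stalled TX MAC */
	TX_FULL_RESET		/* expensive: reset the whole controller */
};

#define TX_THRESH_STEP	64u	/* hypothetical increment */
#define TX_THRESH_MAX	2048u	/* hypothetical upper limit */

/*
 * Only a TX underrun raises the TX threshold and asks for a full
 * reset; every other TX error just restarts the TX MAC.
 */
static enum tx_recovery
tx_recover(enum tx_error err, uint16_t *tx_thresh)
{
	if (err == TX_ERR_UNDERRUN) {
		if (*tx_thresh + TX_THRESH_STEP <= TX_THRESH_MAX)
			*tx_thresh += TX_THRESH_STEP;
		else
			*tx_thresh = TX_THRESH_MAX;
		return (TX_FULL_RESET);
	}
	return (TX_RESTART_MAC);
}
```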
Now ste(4) uses CSR_READ_2 instead of CSR_READ_1 to read the
STE_TX_STATUS register. This way ste(4) also reads the frame id
value and we can write the same value back to the STE_TX_FRAMEID
register instead of overwriting it with 0. The datasheet was wrong
about the write back of STE_TX_STATUS, so add some comments on why we
do so.
Also always invoke ste_txeoc() after ste_txeof() in ste_poll(), as
not reading the TX status register can stall the TX MAC.
and are found in sun4u and sun4v machines based on the Fire ASIC.
- Initialize the configuration space of the PCI to EBus variant the
same way as OpenSolaris does.
- Change INTMAP_VEC() to take an INO as its second argument rather
than an INR. The former is what I actually intended with this
macro and how it's currently used.
is that the JBus to EBus bridges share the interrupt controller of a
sibling JBus to PCIe bridge (at least as far as the OFW device tree
is concerned, in reality they are part of the same chip) so we have to
probe and attach the latter first. That happens to be also the case
due to the fact that the JBus to PCIe bridges appear first in the OFW
device tree but it doesn't hurt to ensure the right order.
receiving incoming traffic, try harder to gracefully stop active
DMA cycles and then stop the MACs. This is what the datasheet
recommends and it seems to work reliably. Resetting the controller while
active DMAs are in progress is a bad thing as we can't predict how
the DMAs touch allocated TX/RX buffers. This change ensures the
controller is stopped before attempting to release allocated TX/RX buffers.
Also update MAC statistics which could have been updated during the
wait time of MAC stop.
While I'm here remove unnecessary controller resets in various
locations. ste(4) no longer relies on a hard controller reset to stop
the controller, and resetting the controller also clears all configured
settings, which makes it hard to implement WOL in the near future.
Now resetting the controller is performed in ste_init_locked().
with SSM MLDv2 by default.
This is current practice and complies with RFC 4604, as well as being
required by production IPv6 networks in Japan.
The behaviour may be disabled by setting the net.inet6.mld.use_allow
sysctl/tunable to 0.
Requested by: Hideki Yamamoto
MFC after: 1 week
interrupt. If we want to use link state change interrupt ste(4)
should also implement auto-negotiation complete handler as well as
various PHY access handling. Now link state change is handled by
mii(4) polling so it will automatically update link state UP/DOWN
events which in turn make ste(4) usable with lagg(4).
r199559 added a private timer to drive the watchdog, and the timer is
also used to drive the MAC statistics update. Because the MAC statistics
update is called whenever a statistics counter gets near full, it
drove the watchdog timer too fast, such that it caused false watchdog
timeouts under heavy TX traffic conditions.
Fix the regression by separating ste_stats_update() from driving the
watchdog timer and introduce a new function ste_tick() that handles
periodic jobs such as driving the watchdog, the MAC statistics update
and the link state check.
While I'm here clear the armed watchdog timer in ste_stop().
link state and PHY related information.
Remove the ste_link and ste_one_phy variables of the softc as they are
not used anymore.
While I'm here add IFF_DRV_RUNNING check in ste_start_locked().
and remove all O(N) sequences from kernel critical sections in ipfw.
In detail:
1. introduce an IPFW_UH_LOCK to arbitrate requests from
the upper half of the kernel. Some things, such as 'ipfw show',
can be done holding this lock in read mode, whereas insert and
delete require IPFW_UH_WLOCK.
2. introduce a mapping structure to keep rules together. This replaces
the 'next' chain currently used in ipfw rules. At the moment
the map is a simple array (sorted by rule number and then rule_id),
so we can find a rule quickly instead of having to scan the list.
This reduces many expensive lookups from O(N) to O(log N).
3. when an expensive operation (such as insert or delete) is done
by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
without blocking the bottom half of the kernel, then acquire
IPFW_WLOCK and quickly update pointers to the map and related info.
After dropping IPFW_LOCK we can then continue the cleanup protected
by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
is only blocked for O(1).
4. do not pass pointers to rules through dummynet, netgraph, divert etc,
but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
We validate the slot index (in the array of #2) with chain_id,
and if successful do a O(1) dereference; otherwise, we can find
the rule in O(log N) through <rulenum, rule_id>
None of the above changes the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t.
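Points 2 and 4 can be sketched as follows: a binary search over a map sorted by <rulenum, rule_id>, plus a slot lookup that is only trusted when the chain id still matches. The structures below are simplified stand-ins with hypothetical names, not the real ipfw data structures, and they ignore locking entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins, not the real ipfw structures. */
struct rule {
	uint32_t rulenum;
	uint32_t rule_id;
};

struct chain {
	struct rule **map;	/* sorted by rulenum, then rule_id */
	size_t n_rules;
	uint32_t id;		/* assumed to change whenever map is replaced */
};

/*
 * Binary search, O(log N): index of the first rule whose
 * <rulenum, rule_id> is >= the key, or n_rules if there is none.
 */
static size_t
find_rule(const struct chain *ch, uint32_t rulenum, uint32_t rule_id)
{
	size_t lo = 0, hi = ch->n_rules;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct rule *r = ch->map[mid];

		if (r->rulenum < rulenum ||
		    (r->rulenum == rulenum && r->rule_id < rule_id))
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo);
}

/*
 * Resolve a <slot, chain_id, rulenum, rule_id> tuple: trust the slot
 * in O(1) if the chain id still matches, otherwise search again.
 */
static size_t
resolve_slot(const struct chain *ch, size_t slot, uint32_t chain_id,
    uint32_t rulenum, uint32_t rule_id)
{
	if (chain_id == ch->id && slot < ch->n_rules)
		return (slot);
	return (find_rule(ch, rulenum, rule_id));
}
```

The stale-slot path is what keeps a reinjected packet safe after the map has been swapped: the old index is never dereferenced unless the chain id proves the map is unchanged.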
Operation costs now are as follows:
Function                                Old     Now       Planned
-------------------------------------------------------------------
+ skipto X, non cached                  O(N)    O(log N)
+ skipto X, cached                      O(1)    O(1)
XXX dynamic rule lookup                 O(1)    O(log N)  O(1)
+ skipto tablearg                       O(N)    O(1)
+ reinject, non cached                  O(N)    O(log N)
+ reinject, cached                      O(1)    O(1)
+ kernel blocked during setsockopt()    O(N)    O(1)
-------------------------------------------------------------------
The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI
Supported by: Valeria Paoli
MFC after: 1 month
o Sorted includes and added missing header files.
o Added basic endianness support. In theory ste(4) should work on
any architecture.
o Remove the use of contigmalloc(9), contigfree(9) and vtophys(9).
o Added 8 byte alignment limitation of TX/RX descriptor.
o Added 1 byte alignment requirement for TX/RX buffers.
o ste(4) controllers do not support DAC. Limit the DMA address space
to 32bit addresses.
o Added spare DMA map to gracefully recover from DMA map failure.
o Removed dead code for checking STE_RXSTAT_DMADONE bit. The bit
was already checked in each iteration of loop so it can't be true.
o Added second argument count to ste_rxeof(). It is used to limit
number of iterations done in RX handler. ATM polling is the only
consumer.
o Removed ste_rxeoc() which was added to address an RX stuck issue
(cvs rev 1.66). Unlike TX descriptors, ST201 supports chaining
descriptors to form a ring for RX descriptors. If RX descriptor
chaining is not supported it's possible for the controller to stop
receiving incoming frames once the controller passes the end of the RX
descriptors, which in turn requires the driver to post new RX
descriptors to receive more frames. For TX descriptors, which
do not support chaining, we do manual chaining in the
driver by concatenating new descriptors to the end of the previous
TX chain.
Maybe the workaround was borrowed from other drivers that do
not support RX descriptor chaining, which is not valid for ST201
controllers. I still have no idea how this addressed the RX stuck
issue and I can't reproduce the RX stuck issue on a DFE-550TX
controller.
o Removed hw.ste_rxsyncs sysctl as the workaround was removed.
o TX/RX side bus_dmamap_load_mbuf_sg(9) support.
o Reimplemented optimized ste_encap().
o Simplified TX logic of ste_start_locked().
o Added comments for TFD/RFD requirements.
o Increased number of RX descriptors to 128 from 64. 128 gave much
better performance than 64 under high network loads.
the leading underscores since they are now implemented.
- Implement the tcpi_rto and tcpi_last_data_recv fields in the tcp_info
structure.
Reviewed by: rwatson
MFC after: 2 weeks
+ in many places, replace &V_layer3_chain with a local
variable chain;
+ bring the counter of rules and static_len within ip_fw_chain
replacing static variables;
+ remove some spurious comments and extern declaration;
+ document which lock protects certain data structures
This device only appears on the ACPI bus, so isn't caught by the current
entry for it in the uart(4) ISA attachment.
PR: kern/140172
Reviewed by: jhb, marcel
Approved by: ed (mentor)
MFC after: 2 weeks
causes additional MSI messages to be sent if several ports ask for
attention at the same time. The time window before clearing is not
important, as these interrupts are level triggered by the interrupt source.
flag. Besides providing redundant information, the need to update both
vnode and object flags causes more acquisitions of the vnode interlock.
OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.
Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for
vnode-backed vm objects.
Suggested and reviewed by: alc
Tested by: pho
MFC after: 3 weeks
* Read the pci capability register to identify AGP 3 support
* Add missing smaller aperture sizes for AGP3 chips.
* Fix the aperture size calculation on AGP2 chips.
All sizes between 32M and 256M reported as 256M.
* Add \n to error string.
This all seems to get the CLE266 EPIA-M board agp working properly, now
back to work on drm.
MFC after: 2 weeks
Quite contrary to the VT6130 datasheet, which says it supports up to 8K
jumbo frames, VT6130 does not seem to send jumbo frames that are
larger than 4K in length. Trying to send a frame that is larger
than 4K causes the TX MAC to hang.
Even though it's possible to allow 4K jumbo frames for VT6130, I
think it's meaningless to allow 4K jumbo frames. I'm not sure whether
VT6132 also has the same limitation but I guess it uses the same MAC as
VT6130.
controller will split the jumbo frame into multiple RX buffers.
However it seems the hardware always DMAs the split frames to an
8 byte boundary. Only the first part of the fragment
can have 4 byte alignment and subsequent buffers should be 8 byte
aligned. Change the RX buffer alignment requirement to 8 bytes from
4 bytes.
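As a generic illustration of what an 8 byte alignment requirement means for a buffer address, consider the helpers below. They are hypothetical and only show the arithmetic; a driver would normally express the requirement through its DMA setup rather than by checking addresses by hand.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_ALIGN	8u	/* from the text: 8 byte alignment */

/* True if addr is a multiple of RX_BUF_ALIGN (a power of two). */
static bool
rx_buf_is_aligned(uintptr_t addr)
{
	return ((addr & (RX_BUF_ALIGN - 1)) == 0);
}

/* Round addr up to the next RX_BUF_ALIGN boundary. */
static uintptr_t
rx_buf_align_up(uintptr_t addr)
{
	return ((addr + (RX_BUF_ALIGN - 1)) & ~(uintptr_t)(RX_BUF_ALIGN - 1));
}
```

An address such as 0x1004 satisfies the old 4 byte requirement but not the new one; rounding it up gives 0x1008.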
This commit changes two things, which improve access to TTYs
in exceptional conditions. The problem was that when you ran
jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the
node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if
you want to attach to screens quickly, use ssh(1), etc.
The fixes:
- Cache the cdev_priv of the controlling TTY in struct session. Change
devfs_access() to compare against the cdev_priv instead of the vnode.
This allows you to bypass UNIX permissions, even across different
mounts of devfs.
- Extend devfs_prison_check() to unconditionally expose the device node
of the controlling TTY, even if normal prison nesting rules normally
don't allow this. This actually allows you to interact with this
device node.
To be honest, I'm not really happy with this solution. We now have to
store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp).
In an ideal world, we should just get rid of the latter two and only use
s_ttyp, but this makes certain pieces of code very impractical (e.g.
devfs, kern_exit.c).
Reported by: Many people
Just like a similar change we made to the TTY code about half a year
ago, make these strings look similar.
Suggested by: Jille Timmermans <jille@quis.cx>
This tunable allows one to enable (1) or disable (0) gestures like tap
and tap-hold on Synaptics TouchPad when the Extended mode isn't enabled
(ie. "hw.psm.synaptics_support" not set).
By default, the value is -1 in order to keep the current behaviour of
not enabling/disabling gestures explicitly.
PR: kern/139272
Submitted by: David Horn <dhorn2000 AT gmail DOT com>
Reviewed by: David Horn <dhorn2000 AT gmail DOT com>
target one. Since r184058, linux_do_tkill() calls tdsignal() instead of
kill(), without checking for validity of supplied signal number. Prevent
panic when supplied signal is 0 by finishing work after checks.
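The ordering "validate, run the checks, then stop early for signal 0" can be sketched like this. Everything here is a hypothetical stand-in: the function names, the signal limit and the permission flag do not come from the Linuxulator code.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NSIG	64	/* hypothetical upper bound on signal numbers */

/* Stub standing in for the real delivery path; counts deliveries. */
static int delivered;

static void
deliver_signal(int sig)
{
	(void)sig;
	delivered++;
}

/*
 * Hypothetical helper: validate first, then treat signal 0 as a pure
 * existence/permission probe that never reaches the delivery code.
 */
static int
send_thread_signal(int sig, bool may_signal)
{
	if (sig < 0 || sig > SKETCH_NSIG)
		return (EINVAL);
	if (!may_signal)
		return (EPERM);
	if (sig == 0)
		return (0);	/* all checks done, nothing to deliver */
	deliver_signal(sig);
	return (0);
}
```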
Found and tested by: scf
MFC after: 3 days
value is obtained by dividing it by 256, not by 2550; also,
one second is 10^9 nanoseconds, not 1800000000 nanoseconds.
- Due to rounding error, setting watchdog to a really small
timeout (<1 sec) was turning the watchdog off. It should
set the watchdog to a small timeout instead.
- Implemented error checking in ipmi_wd_event(), as required
by watchdog(9).
PR: kern/130512
Submitted by: Dmitrij Tejblum
- Additionally, check that the timeout value is within the
supported range, and if it's too large, act as required by
watchdog(9).
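The rounding problem described above (a sub-second timeout silently becoming 0, i.e. "off") is avoided by rounding up instead of truncating. A minimal sketch, assuming a timeout given in nanoseconds and a device that counts whole seconds; the helper name, the upper limit and the EINVAL return are placeholders, not what ipmi(4) or watchdog(9) actually specify.

```c
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ull	/* one second is 10^9 ns */
#define WD_MAX_SEC	6553u		/* hypothetical device limit */

/*
 * Hypothetical helper: convert a timeout in nanoseconds to whole
 * seconds, rounding up so that any non-zero timeout stays non-zero.
 * Returns 0 on success or EINVAL if the timeout is out of range.
 */
static int
wd_nsec_to_sec(uint64_t nsec, uint32_t *sec)
{
	uint64_t s = nsec / NSEC_PER_SEC + (nsec % NSEC_PER_SEC != 0);

	if (s > WD_MAX_SEC)
		return (EINVAL);
	*sec = (uint32_t)s;
	return (0);
}
```

Only an input of exactly 0 maps to 0 seconds here, so disabling the watchdog remains an explicit request rather than a rounding accident.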
MFC after: 3 days
similar to pflog(4).
To use the feature, just put the 'log' options on rules
you are interested in, e.g.
ipfw add 5000 count log ....
and run
tcpdump -ni ipfw0 ...
net.inet.ip.fw.verbose=0 enables logging to ipfw0,
net.inet.ip.fw.verbose=1 sends logging to syslog as before.
More features can be added, similar to pflog(4), to store in
the MAC header metadata such as rule numbers and actions.
Manpage to come once features are settled.
perform a function such as ejecting a 3G autoinstaller disk. The eventhandler
system properly tracks threads and is safe to unload, so remove the
setting/clearing of a function pointer in the kernel by u3g(4), which included a
tsleep for safety.
threads are executing the eventhandler, sleep in this case to make it safe for
module unload. If the runcount was up then an entry would have been marked
EHE_DEAD_PRIORITY so use this as a trigger to do the wakeup in
eventhandler_prune_list().
Reviewed by: jhb
controllers. TX/RX interrupt mitigation is controlled by the
VGE_TXSUPPTHR and VGE_RXSUPPTHR registers. These registers suppress
generation of interrupts until the frame counter reaches the
programmed value. VT61xx also supports an interrupt hold off timer
register. While this interrupt hold off timer is active all interrupts
are disabled until the timer reaches 0. The timer value is
reloaded whenever the VGE_ISR register is written. The timer
resolution is about 20us.
Previously vge(4) used a single shot timer to reduce Tx completion
interrupts. This required VGE_CRS1 register access in the Tx
start/completion handler to rearm a new timeout value, and it did not
show satisfactory results (more than 50k interrupts under load). Rx
interrupts were not moderated at all, such that vge(4) used to
generate too many interrupts, which in turn made polling(4) the better
approach under high network load.
This change activates all interrupt moderation mechanisms, and the
initial values were tuned to generate fewer than 8k interrupts per
second. That number of interrupts wouldn't add additional packet
latencies compared to polling(4). These interrupt parameters can
be changed with sysctl.
dev.vge.%d.int_holdoff
dev.vge.%d.rx_coal_pkt
dev.vge.%d.tx_coal_pkt
The interface has to be brought down and up again before a change
takes effect.
With interrupt moderation there is no more need to loop in
interrupt handler. This loop always added one more register access.
While I'm here remove dead code which tried to implement subset of
interrupt moderation.
to list them all in the Makefile for the module,
otherwise it won't load due to missing symbols.
The problem only affected head with ipfw built as a module.
Reported by David Horn
ethernet controller was recognized. VIA consistently uses the
"Velocity" family name for gigabit ethernet controllers. For fast
ethernet controllers they use the "Rhine" family name (vr(4) controllers)
and vr(4) already shows "Rhine" in its probe message.
tagged frames so add checksum offloading capabilities. Also add
missing VLAN hardware tagging control in ioctl handler and let
upper stack know current VLAN capabilities.
This is SMBus controller found in Intel Platform Controller Hub (PCH),
which is a general name that refers to Intel 5 Series chipsets and
3400 Series chipsets.
Submitted by: Dmitry S. Luhtionov <mitya@cabletv.dp.ua>
MFC after: 3 days
- move global variables around to reduce the scope and make them
static if possible;
- add an ipfw_ prefix to all public functions to prevent conflicts
(the same should be done for variables);
- try to pack variable declarations in a uniform way across files;
- clarify some comments;
- remove some misspelling of names (#define V_foo VNET(bar)) that
slipped in due to cut&paste
- remove duplicate static variables in different files;
MFC after: 1 month