Recently the AP in my Merced box seems to have grown a habit
of getting unexpected interrupts, such as redundant wake-ups
and legacy interrupts that require an INTA cycle.
While here, replace DELAY(0) with cpu_spinwait() so that it's
clear what we're doing as well as enable the code to take
advantage of cpu_spinwait() when it gets implemented.
Approved by: re (blanket)
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.
While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.
Further simplify assembly code by dealing with ASTs from C.
Approved by: re (blanket)
vm_object_terminate() on a device-backed object at the same time that
another processor, call it Pa, is performing dev_pager_alloc() on the
same device. The problem is that vm_pager_object_lookup() should not be
allowed to return a doomed object, i.e., an object with OBJ_DEAD set,
but it does. In detail, the unfortunate sequence of events is: Pt in
vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD
on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls
vm_pager_object_lookup(), which returns the doomed object. Next, Pa
calls vm_object_reference(), which requires the doomed object's lock, so
Pa waits for Pt to release the doomed object's lock. Pt proceeds to the
point in vm_object_terminate() where it releases the doomed object's
lock. Pa is now able to complete vm_object_reference() because it can
now complete the acquisition of the doomed object's lock. So, now the
doomed object has a reference count of one! Pa releases dev_pager_sx
and returns the doomed object from dev_pager_alloc(). Pt now acquires
dev_pager_mtx, removes the doomed object from dev_pager_object_list,
releases dev_pager_mtx, and finally calls uma_zfree with the doomed
object. However, the doomed object is still in use by Pa.
Repeating my key point, vm_pager_object_lookup() must not return a
doomed object. Moreover, the test for the object's state, i.e.,
doomed or not, and the increment of the object's reference count
should be carried out atomically.
Reviewed by: kib
Approved by: re (kensmith)
MFC after: 3 weeks
us to do the data serializations once after writing multiple
region registers, as is done in pmap_switch(). All existing
calls to ia64_set_rr() are followed with calls to ia64_srlz_d().
Approved by: re (blanket)
Also rename the related functions in a similar way.
There are no functional changes.
For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.
With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.
The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.
Discussed at: BSDCan 2007
Best new name suggested by: rwatson
Reviewed by: rwatson
Approved by: re (bmah)
sector, instead of failing the whole mount if it is garbage. Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).
This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD. 1.92
also failed the whole mount for the non-garbage magic value 0xffffffff
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary. Now we handle the magic value as a special case of
fixing up all out of bounds values.
Also fix up the estimated next free cluster number when there is no
fsinfo sector. We were using 0, but CLUST_FIRST is safer.
Approved by: re (kensmith)
instead of per IOMMU, so we no longer need to program all of them
identically in systems having multiple IOMMUs. This continues the
rototilling of the nexus(4) done about 5 months ago, which amongst
others changed nexus(4) and the drivers for host-to-foo bridges
to provide bus_get_dma_tag methods, allowing to handle DMA tags in
a hierarchical way and to link them with devices.
This still doesn't move the silicon bug workarounds for Sabre (and
in the uncommitted schizo(4) for Tomatillo) bridges into special
bus_dma_tag_create() and bus_dmamap_sync() methods though, as w/o
fully newbus'ified bus_dma_tag_create() and bus_dma_tag_destroy()
this still requires too much hackery, i.e. per-child parent DMA
tags in the parent driver.
- Let the host-to-foo drivers supply the maximum physical address
of the IOMMU accompanying the bridges. Previously iommu(4) hard-
coded an upper limit of 16GB, which actually only applies to the
IOMMUs of the Hummingbird and Sabre bridges. The Psycho variants
as well as the U2S in fact can can translate to up to 2TB, i.e.
translate to 41-bit physical addresses. According to the recently
available Tomatillo documentation these bridges even translate to
43-bit physical addresses and hints at the Schizo bridges doing
43 bits as well.
This fixes the issue the FreeBSD 6.0 todo list item "Max RAM on
sparc64" was refering to and pretty much obsoletes the lack of
support for bounce buffers on sparc64.
Thanks to Nathan Whitehorn for pointing me at the Tomatillo manual.
Approved by: re (kensmith)
requiring DC_TX_ALIGN or DC_TX_COALESCE, which was previously done
in dc_start_locked(), into dc_encap().
o In dc_encap():
- If m_defrag() fails just drop the packet like other NIC drivers
do. This should only happen when there's a mbuf shortage, in which
case it was possible to end up with an IFQ full of packets which
couldn't be processed as they couldn't be defragmented as they
were taking up all the mbufs themselves. This includes adjusting
dc_start_locked() to not trying to prepend the mbuf (chain) if
dc_encap() has freed it.
- Likewise, if bus_dmamap_load_mbuf() fails as dc_dma_map_txbuf()
failed, free the mbuf possibly allocated by the above call to
m_defrag() and drop the packet.
o In dc_txeof():
- Don't clear IFF_DRV_OACTIVE unless there are at least 6 free TX
descriptors. Further down the road dc_encap() will bail if there
are only 5 or fewer free TX descriptors, causing dc_start_locked()
to abort and prepend the dequeued mbuf again so it makes no sense
to pretend we could process mbufs again when in fact we won't.
While at it replace this magic 5 with a macro DC_TX_LIST_RSVD.
- Just always assign idx to sc->dc_cdata.dc_tx_cons; it doesn't
make much sense to exclude the idx == sc->dc_cdata.dc_tx_cons
case.
o In dc_dma_map_txbuf() there's no need to set sc->dc_cdata.dc_tx_err
to error if the latter is != 0, bus_dmamap_load_mbuf() already
returns the same error value in that case anyway.
o For less overhead, convert to use bus_dmamap_load_mbuf_sg() for
loading RX buffers.
o Remove some banal and/or outdated comments.
Approved by: re (kensmith)
MFC after: 1 week
to clear RL_TDESC_VLANCTL_TAG). This fixes sending packets in the
native VLAN when running both tagged and an untagged VLAN over the
same trunk and descriptors are recycled.
Approved by: re (kensmith)
MFC after: 1 week
d_mmap methods. prep_cdevsw() already installs the shims that
acquire/drop Giant for the methods of a driver that specified the
D_NEEDGIANT flag.
Reviewed by: alc
Approved by: re (kensmith)
- If the path cost is calculated when the link is down, set a pending flag so
it is calculated again when it comes back up.
- To not use 00:00:00:00:00:00 as the bridge id, all interfaces are scanned and
the lowest number wins. All zeros is too low.
Approved by: re (rwatson)
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add instruction-serialization after writing to cr.pta.
Delay enabling interrupts until after we setup the clocks and after
we program the task priority register.
Approved by: re (blanket)
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to the region registers and
add instruction-serialization after writing to cr.pta.
Approved by: re (blanket)
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to cr.tpr.
Approved by: re (blanket)
tdq_group structure. Hyper-threaded cores won't really benefit from
seperate locks anyway.
- Seperate out the migration case from sched_switch to simplify the main
switch code. We only migrate here if called via sched_bind().
- When preempted place the preempted thread back in the same queue at
the head.
- Improve the cpu group and topology infrastructure.
Tested by: many on current@
Approved by: re
message explained why the size is 1 sector, but the code used a
size of 1 cluster.
I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache. Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.
Approved by: re (kensmith)
- Copy before testing a pointer. This closes a race window.
- Use msleep with the node interlock instead of tsleep.
- Do proper locking around access to tn_vpstate.
- Assert vnode VOP lock for dir_{atta,de}tach to capture
inconsistent locking.
Suggested by: kib
Submitted by: delphij
Reviewed by: Howard Su
Approved by: re (tmpfs blanket)
cpu_start_mp(). This is after we have read the cpuid registers to
calculate the hyperthreading_cpus value for the sysctl that enables or
disables hyperthread cores. Change mp_topology() to use that information
rather than trying to do it itself.
This solves the problem of ULE being incorrectly told that dual core
Athlon64 X2 or Operton cpus are hyperthreading cores. At the very least,
we now have a single piece of code to identify hyperthreading.
Obtained from: jhb
Approved by: re (kensmith)
64bit counters are needed to simplify traffic accounting and
reduce system load at the big PPP concentrators.
Approved by: re (rwatson), glebius (mentor)
Till now node's transmit path was completely unprotected
and so wasn't thread safe in multilink mode. It's receive path was
declared as WRITER as the simpliest protection method but it
reduces performance when compression or encryption enabled.
Approved by: re (rwatson), glebius (mentor)
communicate with another private port.
All unicast/broadcast/multicast layer2 traffic is blocked so it works much the
same way as using firewall rules but scales better and is generally easier as
firewall packages usually do not allow ARP blocking.
An example usage would be having a number of customers on separate vlans
bridged with a server network. All the vlans are marked private, they can all
communicate with the server network unhindered, but can not exchange any
traffic whatsoever with each other.
Approved by: re (rwatson)
be in ticks "for algorithm stability" when originally committed, it turns
out that it has a significant impact in timing out connections. When we
changed HZ from 100 to 1000, this had a big effect on reducing the time
before dropping connections.
To demonstrate, boot with kern.hz=100. ssh to a box on local ethernet
and establish a reliable round-trip-time (ie: type a few commands).
Then unplug the ethernet and press a key. Time how long it takes to
drop the connection.
The old behavior (with hz=100) caused the connection to typically drop
between 90 and 110 seconds of getting no response.
Now boot with kern.hz=1000 (default). The same test causes the ssh session
to drop after just 9-10 seconds. This is a big deal on a wifi connection.
With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30.
Note how it behaves the same as when HZ was 100. Also, note that when
booting with hz=100, net.inet.tcp.rexmit_min *used* to be 30.
This commit changes TCPTV_MIN to be scaled with hz. rexmit_min should
always be about 30. If you set hz to Really Slow(TM), there is a safety
feature to prevent a value of 0 being used.
This may be revised in the future, but for the time being, it restores the
old, pre-hz=1000 behavior, which is significantly less annoying.
As a workaround, to avoid rebooting or rebuilding a kernel, you can run
"sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30"
to /etc/sysctl.conf. This is safe to run from 6.0 onwards.
Approved by: re (rwatson)
Reviewed by: andre, silby
that could cause panics and corruption under moderate load. Many thanks
to Matt Reimer, Tom McDonald, and the rest of the guys at VPOP.net for
their help in identifying and testing this.
Approved by: re
only USB 1.1 speeds available, but this shouldn't hurt. Now that we have
working usb support for this board, this is a natural followup.
Approved by: re (kensmith)
7 months. You must have JP6 in the 1-2 position to supply power to the
USB devices, but I've used uftdi, uplcom and umass successfully. If you
have it in 2-3, then nothing will show up. Also, if you have the FQPA
packaging for the AT91RM9200 (like the KN9202 boards have), you will get
the following message
uhub0: device problem (IOERROR), disabling port 2
due to a hardware erratum. It is safe to ignore as it is about pins that
aren't brought out on the FQPA package and aren't proeprly terminated either.
Alas, there's no register to read to tell the FQPA from the BGA versions.
Submitted by: Daan Vreeken
Approved by: re (kensmith)
revision 1.66
date: 2007/07/31 06:23:26; author: marcel; state: Exp; lines: +2 -2
Fix backward compatibility of the "old" (i.e. FreeBSD6) lseek
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.
Based on a patch from: grehan@
Approved by: re (blanket)
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.
Based on a patch from: grehan@
Approved by: re (rwatson)
errors (especially when jumbo frames are enabled or in low memory systems)
because the RX chain was corrupted when an mbuf was mapped to an unexpected
number of buffers.
- Fixed a problem that would cause kernel panics when an excessively
fragmented TX mbuf couldn't be defragmented and was released by
bce_tx_encap().
Approved by: re(hrs)
MFC after: 7 days
bucket pointer. The virtual mapping may not be present in the
translation cache. This will result in a nested TLB fault at
a place we don't handle (and don't want to handle).
o Make sure there's a stop after the rfi instruction, otherwise
its behaviour is undefined.
o Make sure we switch back to virtual addressing before doing
a rfi. Behaviour is undefined otherwise.
Approved by: re (blanket)
(INTR_FILTER). This includes:
o Save a pointer to the sapic structure and IRQ for every vector,
so that we can quickly EOI, mask and unmask the interrupt.
o Add locking to the sapic code now that we can reprogram a
sapic on multiple CPUs at the same time.
o Use u_int for the vector and IRQ. We only have 256 vectors, so
using a 64-bit type for it is rather excessive.
o Properly handle concurrent registration of a handler for the
same vector.
Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.
Approved by: re (blacket)
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.
Approved by: re (blanket)
ports to the lagg interface.
- Use the MTU from the first interface as the lagg MTU, all extra interfaces
must be the same.
This fixes using a lagg interface for a vlan or enabling jumbo frames, etc.
Approved by: re (kensmith)
MFC After: 3 days
the fast or safe/slow method is in use. Fast remains at 1000, slow is
now at 850 (always preferred to TSC). Since the HPET has proven slower
than ACPI-fast on some systems, drop its quality to 900. In the future,
it is hoped that HPET performance will improve as it is the main
timer Intel supports. HPET may move back to 2000 in -current once RELENG_7
is branched to ensure that it gets tested.
Approved by: re
<netinet/tcp_fsm.h> is included into any compilation unit that needs
tcpstates[]. Also remove incorrect extern declarations and TCPDEBUG
conditionals. This allows kernels both with and without TCPDEBUG to
build, and unbreaks the tinderbox.
Approved by: re (rwatson)
pc98 motherboards do not provide us with the correct day of week
either. Ignore the day of week when setting the clock here too.
Approved by: re (bmah)
Requested from: nyan
MFC after: 3 weeks
the duration of the function. The device we would otherwise
have left in an useless state may just as well be the low-level
console. When booting verbose, we do need it addressable if we
want to avoid a MCA.
Approved by: re (kensmith)