immediately after the kernel map has been sized, and is
the optimal place for the autosizing of memory allocations
which occur within the kernel map to occur.
Suggested by: bde
without Giant held.
A quick outline of the locking strategy:
Since all IOMMUs are synchronized, there is a single lock, iommu_mtx,
which protects the hardware registers (where needed) and the global and
per-IOMMU software states. As soon as the IOMMUs are divorced, each struct
iommu_state will have its own mutex (and the remaining global state
will be moved into the struct).
The dvma rman has its own internal mutex; the TSB slots may only be
accessed by the owner of the corresponding resource, so neither needs
extra protection.
Since there is a second access path to maps via LRU queues, the consumer-
provided locking is not sufficient; therefore, each map which is on a
queue is additionally protected by iommu_mtx (in part, there is one
member which only the map owner may access). Each map on a queue may
be accessed and removed from or repositioned in a queue in any context as
long as the lock is held; only the owner may insert a map.
To reduce lock contention, some bus_dma functions remove the map from
the queue temporarily (on behalf of the map owner) for some operations and
reinsert it when they are done. Shorter operations and operations which are
not done on behalf of the lock owner are completely covered by the lock.
To facilitate the locking, reorganize the streaming buffer handling;
while being there, fix an old oversight which would cause the streaming
buffer to always be flushed, regardless of whether streaming was enabled
in the TSB entry. The streaming buffer is still disabled for now, since
there are a number of drivers which lack critical bus_dmamp_sync() calls.
Additional testing by: jake
series, the 8139C+ has a descriptor-based DMA mechanism, and its
performance is actually pretty respectable. Note: the 8139D chip does
not support C+ mode. Only the 8139C+ and 8169 gigE chips support C+ mode.
Supported features:
- RX and TX checksum offload
- hardware VLAN tag insertion/extraction
- TX interrupt moderation using the 8139's on-board timer
Everything should be properly busdma'ed and endian-independent, so
things should work ok on non-x86 platforms. Unfortunately, my call
for testers on this code was met with deafening silence, and I don't
have access to any non-x86 FreeBSD boxes at the moment, so this is
speculation.
The device detection code has been cleaned up a little as well
(thanks to Michal Mertl) for the patches.
There are also updates to the rl(4) man page (which I accidentally
checked in before when I updated the dc(4) man page. Oops.)
Todo: finish support for the 8169 gigabit ethernet chip. This
mainly requires writing an rlgphy driver to handle the 8169's built-in
PHY. This will have to wait until I actually get my hands on an 8169
card for testing though. (I still can't find a source for one in the
U.S. Suggestions/pointers welcome.)
- MN-110 10/100 USB ethernet (ADMtek Pegasus II, if_aue)
- MN-120 10/100 cardbus (ADMtek Centaur-C, if_dc)
- MN-130 10/100 PCI (ADMtek Centaur-P, if_dc)
Also update dc(4) man page to mention support for MN-120 and MN-130.
* Always use polled mode. The intr approach did not work for many
controllers and required the hw.acpi.ec.event_driven workaround.
* Only use an edge (not level) triggered GPE handler
* Add sc->ec_mtx for locking operations to a single EC. There were
many race conditions earlier between an SCI event and EcRead/Write.
* Use 1 ms as the global lock timeout
* Only acquire global lock if _GLK != 0
* Update EcWaitEvent to use an incremental backoff delay in its
poll loop. Wait 50 ms max instead of 10. Most ECs respond
in < 5 us (50 us when heavily loaded). However, some time out
occasionally even with a 10 ms timeout. For delays past 1 ms, use
msleep instead of DELAY to give SCI interrupts a chance to occur.
* Add EcCommand to send a command and wait for the appropriate event.
* The hw.acpi.ec.event_driven tunable is no longer applicable and
has been removed.
Ideas from: Linux
bus_dma_tag_create. We need to be sure that our packets are
kept in-sequence (that's how ATM is supposed to work) and
therefor use BUS_DMA_NOWAIT in all calls to bus_dmamap_load.
For memory allocated with bus_dmamem_alloc the use of anything
other than NULL arguments for the locking is anyway bogus because
this memory never should need bouncing and hence the load should never
be defered.
Allow the receipt of OAM and RM cells on raw connections. Caveat: it seems
that RM cells are still processed by the hardware even when we open the
connection as UBR.
register, present only on 3c90xB and later NICs. This meant that you could
not use a 1500 byte MTU with VLANs on original 3c905/3c900 cards (boomerang
chipset). The boomerang chip does support large frames though, just not
in the same way: you can set the 'allow large frames' bit in the MAC
control register to receive frames up to 4K in size.
Changes:
- Set the 'allow large frames' bit for boomerang chips and increase
the packet size register for cyclone and later chips. This allows
us to use IFCAP_VLAN_MTU on all supported xl(4) NICs.
- Actually set the IFCAP_VLAN_MTU flag in the capabilities word
in xl_attach().
- Change the method used to detect older boomerang chips. My 3c575C
cardbus NIC was being incorrectly identified as 3c90x chip instead
of 3c90xB because the capabilities word in its EEPROM reports
a bizzare value. In addition to checking for the supportsNoTxLength
bit, also check for the absence of the supportsLargePackets bit.
Both of these cases denote a 3c90xB chip.
- Make RX and TX checksums configurable via the SIOCSIFCAP ioctl.
- Avoid an unecessary le32toh() in xl_rxeof(): we already have the
received frame size in the lower 16 bits of rxstat, no need to
read it again.
Tested with 3c905-TX, 3c900-TPO, 3c980C and 3c575C NICs.
on the implied sign extension. The single unified VADDR() macro was
not able to avoid sign extending the VM_MAXUSER_ADDRESS/USRSTACK values.
Be explicit about UVADDR() (positive address space) and KVADDR()
(kernel negative address space) to make mistakes show up more
spectacularly.
Increase user VM space from 1/2TB (512GB) to 128TB.
corresponding release code. This was preventing the use of more than
1/2TB of user VM. I also spent a week staring at this code only to
eventually find that I'd mistakenly typed a P as an R.
rather than a non-existing pte. There is code elsewhere in i386/amd64
pmap that neglects to handle the large page cases because it knows that
it will see PG_PS in the returned "pte".
- Use atomic ops to update the bigpipe count
- Make the bigpipe count sysctl readable
- Remove a duplicate comparison in an if statement
- Comment two SYSCTLs.
extra trailing space.
- Don't bother probing a generic ISA bus device if isab0 already exists.
Some BIOSes place an ACPI psuedo-device with the HID of a generic ISA bus
device under the PCI-ISA bridge device. This is not the best solution
but will work for now. The isa bus driver only allows for one ISA bus
anyways.
having the PCI-ISA bridge driver depend on both pci and isa.
- Have the PCI-EISA bridge driver depend on both pci and eisa as well.
- Make acpi_isab.c depend on acpi and isa.
Submitted by: Marius Strobl <marius@alchemy.franken.de> (1,2)
extracted from received frames, both in the IFCAP_VLAN_HWTAGGING case
and not. (Some drivers may already do this masking internally, but
doing it here doesn't hurt and insures consistency.)
- In vlan_ioctl(), don't let the user set a VLAN ID value with anything
besides the VLID bits set, otherwise we will have trouble matching
an interface in vlan_input() later.
PR: kern/46405
ACPI nodes with the plug and play ID's defined for a "Generic ISA Bus
Device" as defined in section 10.7 of the ACPI 2.0 specification. This
gives machines like the Libretto that contain a fake ISA bus that is not
connected via a PCI-ISA bridge an ISA bus for ISA devices to attach to.
Tested by: markm
than the shortcircuited version I had been using, which only worked
properly on i386 & amd64.
Also, change an autoscale constant to account for the more correct
kmem_map size.
Problem noticed by: mux
- Factor out code common to all ISA bridge drivers attach methods into a
isab_attach() function.
- Rename the PCI-ISA bridge driver's attach function to pci_isab_attach()
and have it call isab_attach().
support matching a list of addr/mask pairs so one can write
more efficient rulesets which were not possible before e.g.
add 100 skipto 1000 not src-ip 10.0.0.0/8,127.0.0.1/8,192.168.0.0/16
The change is fully backward compatible.
ipfw2 and manpage commit to follow.
MFC after: 3 days
- Limit the total number of pipes so that we do not
exhaust all vm objects in the kernel map. When
this limit is reached, a ratelimited message will
be printed to the console.
- Put a soft limit on the amount of memory consumable
by pipes. Once the limit has been reached, all new
pipes will be limited to 4K in size, rather than the
default of 16K.
- Put a limit on the number of pages that may be used
for high speed page flipping in order to reduce the
amount of wired memory. Pipe writes that occur
while this limit is exceeded will fall back to
non-page flipping mode.
The above values are auto-tuned in subr_param.c and
are scaled to take into account both the size of
physical memory and the size of the kernel map.
These limits help to reduce the "kernel resources exhausted"
panics that could be caused by opening a large
number of pipes. (Pipes alone are no longer able
to exhaust all resources, but other kernel memory hogs
in league with pipes may still be able to do so.)
PR: 53627
Ideas / comments from: hsu, tjr, dillon@apollo.backplane.com
MFC after: 1 week
the bulk out buffer size to 16 bytes. The bulk out endpoint descriptor
reports 32 bytes, but if you use this value, data will get dropped.
Reviewed/approved by: scottl
- Change vm_pageout_object_deactivate_pages()'s first parameter from a
vm_map_t to a pmap_t.
- Change vm_pageout_object_deactivate_pages()'s and
vm_pageout_map_deactivate_pages()'s last parameter from a vm_pindex_t
to a long. Since the number of pages in an address space doesn't
require 64 bits on an i386, vm_pindex_t is overkill.
to have this driver working on sparc64. It still needs to be made
endian-clean before it can work there.
Special thanks to dragonk@evilcode.net for sending me a dc(4) card so
that I was able to do this work.
Many cheers to all the people that tested this change, thanks to them,
this change shouldn't break anything :-).
Tested by: marcel (i386 and ia64), ru (i386), wilko (alpha),
mbr (i386), wpaul (i386) and
Will Saxon <WillS@housing.ufl.edu> (i386)
reset them only if they were previously in use. Unconditionally
resetting the registers wipes them out frequently, which interferes
with their use for kernel debugging.
While I'm here, be less verbose in the associated comment of a
neighboring function.
Noticed by: bde
insertion and extraction) has revealed two bugs:
- In vlan_start(), we're supposed to check the underlying interface to
see if it has the IFCAP_VLAN_HWTAGGING cabability set and, if so, set
things up for the VLAN_OUTPUT_TAG() routine. However the code checks
ifp->if_capabilities, which is the vlan pseudo-interface's capabilities
when it should be checking p->if_capabilities, which relates to the
underlying physical interface. Change ifp->if_capabilities to
p->if_capabilities so this works.
- In vlan_input(), we have to extract the 16-bit tag value from the
received frame and use it to figure out which vlan interface gets
the frame. The code that we use to track down the desired vlan
pseudo-interface is:
for (ifv = LIST_FIRST(&ifv_list); ifv != NULL;
ifv = LIST_NEXT(ifv, ifv_list))
if (ifp == ifv->ifv_p && tag == ifv->ifv_tag)
break;
The problem is that 'tag' is not computed consistently. In the case
where the interface supports hardware VLAN tag extraction and calls
VLAN_INPUT_TAG(), we do this:
tag = *(u_int*)(mtag+1);
But in the software emulation case, we do this
tag = EVL_VLANOFTAG(ntohs(evl->evl_tag));
The problem here is the EVL_VLANOFTAG() macro is only ever applied
in this one case. It's never applied to ifv->ifv_tag or anwhere else.
We must be consistent: either it's applied everywhere or nowhere.
To see how this can be a problem, do something like
ifconfig vlan0 vlan 12345 vlandev foo0 and observe the results.
I'm not quite sure what the right thing is to do here. Neither the
vlan(4) nor ifconfig(8) man pages suggest which way to go. For now,
I've removed this use of EVL_VLANOFTAG() so that the tag will match
correctly in all cases. I will not get upset if somebody makes a
compelling argument for using EVL_VLANOFTAG() everywhere instead,
as long as the use is consistent.
tested for playback.
* modify device name strings for ich chips to better conform with their
common names.
* remove superflous 'AC97 controller' from nforce device names.
MFC after: 1 week
to get a stacktrace. This does not work even with M_NOWAIT when we
have WITNESS and is generally a bad idea (pointed out by bde@). We
allocate an 8K heap for use by the unwinder when ddb is active. A
stack trace roughly takes up half of that in any case, so we have
some room for complex unwind situations. We don't want to waste too
much space though. Due to the nature of unwinding, we don't worry
too much about fragmentation or performance of unwinding while in
the debugger. For now we have our own heap management, but we may
be able to leverage from existing code at some later time.
While here:
o Make sure we actually free the unwind environment after unwinding.
This fixes a memory leak.
o Replace Doug's license with mine in unwind.c and unwind.h. Both
files don't have much, if any, of Doug's code left since the EPC
syscall overhaul and the import of the unwinder.
o Remove dead code.
o Replace M_NOWAIT with M_WAITOK for all remaining malloc() calls.
notice another typo in the same line. This typo makes libthr unuseable,
but it's effects where counter-balanced by the extra semicolon, which
made libthr remarkably useable for the past several months.
Should work with both regular and fast ipsec (mutually exclusive).
See manpage for more details.
Submitted by: Ari Suutari (ari.suutari@syncrontech.com)
Revised by: sam
MFC after: 1 week
- Associate logical CPUs on the same physical core with the same kseq.
- Adjust code that assumed there would only be one running thread in any
kseq.
- Wrap the HTT code with a ULE_HTT_EXPERIMENTAL ifdef. This is a start
towards HyperThreading support but it isn't quite there yet.
BUS_DMA_NOWAIT flag, since the code can't handle this.
- Use NULL, NULL for the lockfunc and lockfuncarg parameters of
bus_dma_tag_create() since deferred loads can't happen now.
as the target process' pid, it may exist if the process forked before leaving
the pgrp.
Thix fixes a panic that happens when calling setpgid to make a process
re-enter the pgrp with the same pgid as its pid if the pgrp still exists.
forced to do slightly bogus power state manipulation. However, this
is one of those features that is preventing further progress, so mark
them as BURN_BIRDGES like I did for the drivers in sys/dev/...
This, like the other change, are a no-op unless you have BURN_BRIDGES
in your kernel.
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.
This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.
be delivered to that thread, regardless of whether it
has it masked or not.
Previously, if the targeted thread had the signal masked,
it would be put on the processes' siglist. If
another thread has the signal umasked or unmasks it before
the target, then the thread it was intended for would never
receive it.
This patch attempts to solve the problem by requiring callers
of tdsignal() to say whether the signal is for the thread or
for the process. If it is for the process, then normal processing
occurs and any thread that has it unmasked can receive it.
But if it is destined for a specific thread, it is put on
that thread's pending list regardless of whether it is currently
masked or not.
The new behaviour still needs more work, though. If the signal
is reposted for some reason it is always posted back to the
thread that handled it because the information regarding the
target of the signal has been lost by then.
Reviewed by: jdp, jeff, bde (style)
However, they are presently necessary due to bigger bogusness in the
pci bus layer not doing the right thing on suspend/resume or on
initial device probe. This is exactly the sort of thing that the
BURN_BRIDGES option was invented for. Mark all of them as
BURN_BRIDGES. As soon as I have the powerstate stuff properly
integrated into the pci bus code, I intend to remove all these
workarounds.
locks held by each thread.
- Fix a bug in the original BSD/OS code where a contested lock was not
properly handed off from the old thread to the new thread when a
contested lock with more than one blocked thread was transferred from
one thread to another.
- Don't use an atomic operation to write the MTX_CONTESTED value to
mtx_lock in the aforementioned special case. The memory barriers and
exclusion provided by sched_lock are sufficient.
Spotted by: alc (2)
disabled.
- Change the apm driver to match the acpi driver's behavior by checking to
see if the device is disabled in the identify routine instead of in the
probe routine. This way if the device is disabled it is never created.
Note that a few places (ips(4), Alpha SMP) used "disable" instead of
"disabled" for their hint names, and these hints must be changed to
"disabled". If this is a big problem, resource_disabled() can always be
changed to honor both names.
careful to call all map_load calls with BUS_DMA_NOWAIT because we
really don't want some PDUs to wait while others go out - ATM guarantees
the ordering of cells and also of PDUs (within one VC, that is). With
BUS_DMA_NOWAIT bus_dmamap_load should never return EINPROGRESS.
Make the tag used for transmission buffers one larger than the maximum
AAL5 PDU (65535). This is needed, because all PDU sizes need to be round
up to multiple of four for the card and PDUs that are just below the
maximum size will be rounded up to 65536
real SATA disks now that I can test it.
Add support for the SiI 3112 SATA chip using memory mapped I/O.
Update the support for the SiI 0680 to use the memio interface as well.
Sponsored by: David Leimbach <leimy2k@mac.com> (3112 based controller)
Sponsored by: FreeBSD Systems (www.FreeBSDsystems.com) (SATA disks)