Commit Graph

1086 Commits

Author SHA1 Message Date
marius
d60b8a3096 Make the PCI code aware of PCI domains (aka PCI segments) so we can
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.

Suggested by:	jhb
Reviewed by:	grehan, jhb, marcel
Approved by:	re (kensmith), jhb (PCI maintainer hat)
2007-09-30 11:05:18 +00:00
marius
4894db0d57 o Revert the part of if_gem.c rev. 1.35 which added a call to gem_stop()
to gem_attach() as the former access softc members not yet initialized
  at that time and gem_reset() actually is enough to stop the chip. [1]
o Revise the use of gem_bitwait(); add bus_barrier() calls before calling
  gem_bitwait() to ensure the respective bit has been written before we
  starting polling on it and poll for the right bits to change, f.e. even
  though we only reset RX we have to actually wait for both GEM_RESET_RX
  and GEM_RESET_TX to clear. Add some additional gem_bitwait() calls in
  places we've been missing them according to the GEM documentation.
  Along with this some excessive DELAYs, which probably only were added
  because of bugs in gem_bitwait() and its use in the first place, as
  well as as have of an gem_bitwait() reimplementation in gem_reset_tx()
  were removed.
o Add gem_reset_rxdma() and use it to deal with GEM_MAC_RX_OVERFLOW errors
  more gracefully as unlike gem_init_locked() it resets the RX DMA engine
  only, causing no link loss and the FIFOs not to be cleared. Also use it
  deal with GEM_INTR_RX_TAG_ERR errors, with previously were unhandled.
  This was based on information obtained from the Linux GEM and OpenSolaris
  ERI drivers.
o Turn on workarounds for silicon bugs in the Apple GMAC variants.
  This was based on information obtained from the Darwin GMAC and Linux GEM
  drivers.
o Turn on "infinite" (i.e. maximum 31 * 64 bytes in length) DMA bursts.
  This greatly improves especially RX performance.
o Optimize the RX path, this consists of:
  - kicking the receiver as soon as we've a spare descriptor in gem_rint()
    again instead of just once after all the ready ones have been handled;
  - kicking the receiver the right way, i.e. as outlined in the GEM
    documentation in batches of 4 and by pointing it to the descriptor
    after the last valid one;
  - calling gem_rint() before gem_tint() in gem_intr() as gem_tint() may
    take quite a while;
  - doubling the size of the RX ring to 256 descriptors.
  Overall the RX performance of a GEM in a 1GHz Sun Fire V210 was improved
  from ~100Mbit/s to ~850Mbit/s.
o In gem_add_rxbuf() don't assign the newly allocated mbuf to rxs_mbuf
  before calling bus_dmamap_load_mbuf_sg(), if bus_dmamap_load_mbuf_sg()
  fails we'll free the newly allocated mbuf, unable to recycle the
  previous one but a NULL pointer dereference instead.
o In gem_init_locked() honor the return value of gem_meminit().
o Simplify gem_ringsize() and dont' return garbage in the default case.
  Based on OpenBSD.
o Don't turn on MAC control, MIF and PCS interrupts unless GEM_DEBUG is
  defined as we don't need/use these interrupts for operation.
o In gem_start_locked() sync the DMA maps of the descriptor rings before
  every kick of the transmitter and not just once after enqueuing all
  packets as the NIC might instantly start transmitting after we kicked
  it the first time.
o Keep state of the link state and use it to enable or disable the MAC
  in gem_mii_statchg() accordingly as well as to return early from
  gem_start_locked() in case the link is down. [3]
o Initialize the maximum frame size to a sane value.
o In gem_mii_statchg() enable carrier extension if appropriate.
o Increment if_ierrors in case of an GEM_MAC_RX_OVERFLOW error and in
  gem_eint(). [3]
o Handle IFF_ALLMULTI correctly; don't set it if we've turned promiscuous
  group mode on and don't clear the flag if we've disabled promiscuous
  group mode (these were mostly NOPs though). [2]
o Let gem_eint() also report GEM_INTR_PERR errors.
o Move setting sc_variant from gem_pci_probe() to gem_pci_attach() as
  device probe methods are not supposed to touch the softc.
o Collapse sc_inited and sc_pci into bits for sc_flags.
o Add CTASSERTs ensuring that GEM_NRXDESC and GEM_NTXDESC are set to
  legal values.
o Correctly set up for 802.3x flow control, though #ifdef out the code
  that actually enables it as this needs more testing and mainly a proper
  framework to support it.
o Correct and add some conversions from hard-coded functions names to
  __func__ which were borked or forgotten in if_gem.c rev. 1.42.
o Use PCIR_BAR instead of a homegrown macro.
o Replace sc_enaddr[6] with sc_enaddr[ETHER_ADDR_LEN].
o In gem_pci_attach() in case attaching fails release the resources in
  the opposite order they were allocated.
o Make gem_reset() static to if_gem.c as it's not needed outside that
  module.
o Remove the GEM_GIGABIT flag and the associated code; GEM_GIGABIT was
  never set and the associated code was in the wrong place.
o Remove sc_mif_config; it was only used to cache the contents of the
  respective register within gem_attach().
o Remove the #ifdef'ed out NetBSD/OpenBSD code for establishing a suspend
  hook as it will never be used on FreeBSD.
o Also probe Apple Intrepid 2 GMAC and Apple Shasta GMAC, add support for
  Apple K2 GMAC. Based on OpenBSD.
o Add support for Sun GBE/P cards, or in other words actually add support
  for cards based on GEM to gem(4). This mainly consists of adding support
  for the TBI of these chips. Along with this the PHY selection code was
  rewritten to hardcode the PHY number for certain configurations as for
  example the PHY of the on-board ERI of Blade 1000 shows up twice causing
  no link as the second incarnation is isolated.
  These changes were ported from OpenBSD with some additional improvements
  and modulo some bugs.
o Add code to if_gem_pci.c allowing to read the MAC-address from the VPD on
  systems without Open Firmware.
  This is an improved version of my variant of the respective code in
  if_hme_pci.c
o Now that gem(4) is MI enable it for all archs.

Pointed out by:	yongari [1]
Suggested by:	rwatson [2], yongari [3]
Tested on:	i386 (GEM), powerpc (GMACs by marcel and yongari),
		sparc64 (ERI and GEM)
Reviewed by:	yongari
Approved by:	re (kensmith)
2007-09-26 21:14:18 +00:00
brueffer
26461bf019 Use the correct expanded name for SCTP.
PR:		116496
Submitted by:	koitsu
Reviewed by:	rrs
Approved by:	re (kensmith)
2007-09-26 20:05:07 +00:00
alc
d1bce06c64 Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq.  Instead, they are kept in a separate per-object
splay tree of cached pages.  However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock.  Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE).  The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held.  Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case.  Cached pages
are reclaimed far, far more often than they are reactivated.  Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated.  Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page.  Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
   Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
2007-09-25 06:25:06 +00:00
alc
20b10da706 It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren.  They should
be.  This changes fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc().  I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago.  This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)
2007-09-15 18:47:02 +00:00
marcel
b031fef0fe Revamp the interrupt handling in support of INTR_FILTER. This includes:
o  Revamp the PIC I/F to only abstract the PIC hardware. The
   resource handling has been moved to nexus, where it belongs.
o  Include EOI and MASK+EOI methods to the PIC I/F in support of
   INTR_FILTER.
o  With the allocation of interrupt resources and setup of
   interrupt handlers in the common platform code we can delay
   talking to the PIC hardware after enumeration of all devices.
   Introduce a call to powerpc_intr_enable() in configure_final()
   to achieve that and have powerpc_setup_intr() only program the
   PIC when !cold.
o  As a consequence of the above, remove all early_attach() glue
   from the OpenPIC and Heathrow PIC drivers and have them
   register themselves when they're found during enumeration.
o  Decouple the interrupt vector from the interrupt request line.
   Allocate vectors increasingly so that they can be used for
   the intrcnt index as well. Extend the Heathrow PIC driver to
   translate between IRQ and vector. The OpenPIC driver already
   has the support for vectors in hardware.

Approved by: re (blanket)
2007-08-11 19:25:32 +00:00
marcel
8bda20dd9d Re-enable external interrupts for faults, traps and syscalls.
Approved by: re (blanket)
2007-08-08 01:19:12 +00:00
marcel
d6e4edefa7 Eliminate <machine/interruptvar.h> as it has only a single
prototype. In the future that prototype will not be needed
at all anyway, but for now it's moved to intr_machdep.h.

Approved by: re (blanket)
2007-08-07 23:33:35 +00:00
marcel
7e2354fa52 Remove redundant prototype.
Approved by: re (blanket)
2007-08-07 18:40:02 +00:00
marcel
2cb62192de Add prototype for trap().
Approved by: re (blanket)
2007-08-07 18:39:28 +00:00
marcel
4f54c1e251 Fix backward compatibility of the "old" (i.e. FreeBSD6) lseek
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.

Based on a patch from: grehan@
Approved by: re (rwatson)
2007-07-31 06:23:26 +00:00
marcel
3a7ad72651 Cast the arguments to atomic_*_ptr() when mapping it to atomic_*_32()
This is a minimal fix.

Approved by: re (kensmith)
2007-07-10 04:40:00 +00:00
yongari
07622d6318 Reimplement bus_dmamap_load with bus_dmamap_load_buffer.
Previously it didn't honor parent dma tag's restrictions such that
an invalid dma segment could be passed to device. The driver for the
device may panic in sanity check routine for the dma segment or may
produce unexpected results. I have no idea how it could ever have
worked before.

Reviewed by:	grehan
Tested by:	gad
Approved by:	re (hrs)
2007-06-22 03:57:36 +00:00
yongari
9429895767 Honor maxsegsz of less than a page size in a DMA tag. Previously it
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.

Reviewed by:	scottl (long a ago)
Approved by:	re (hrs)
2007-06-22 03:54:53 +00:00
alc
a8415c5a0d Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist.  First and
foremost, this allocator is required to support the implementation of
superpages.  As a side effect, it enables a more robust implementation
of contigmalloc(9).  Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages.  Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space.  The performance benefits vary.  In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring.  The reason is that
superpages have much the same effect.  The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages.  I hope this is temporary.  On i386, this is
a slight pessimization.  However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects.  I speculate
that this is true in general of machines with a direct map.

Approved by:	re
2007-06-16 04:57:06 +00:00
delphij
c990e91fd1 Enable SCTP by default for GENERIC kernels in order to give it
more exposure.  The current state of SCTP implementation is
considered to be ready for 32-bit platforms, but still need some
work/testing on 64-bit platforms.

Approved by:	re (kensmith)
Discussed with:	rrs
2007-06-14 17:14:27 +00:00
marcel
0d73aaee3a Enable GEOM_PART_MBR by default. On ia64 this replaces GEOM_MBR. 2007-06-13 05:07:42 +00:00
marcel
75588c5a15 Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
2007-06-09 21:55:17 +00:00
rwatson
4f2da72e8b Enable AUDIT by default in the GENERIC kernel, allowing security event
auditing to be turned on without a kernel recompile, just an rc.conf
option.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2007-06-08 20:29:07 +00:00
marcel
024859acd2 Sync with other platforms: add kluge to use contigmalloc when the
alignment is larger than the size and print a diagnostic when we
didn't satisfy the alignment.
2007-06-08 04:46:50 +00:00
grehan
806e204058 Fix the compile. Band-aid until it is worked out how to use the context
switch api on ppc.
2007-06-06 06:01:56 +00:00
jeff
be3241715a - Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:57:32 +00:00
attilio
e333d0ff0e Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
attilio
7dd8ed88a9 Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
piso
42dfc78150 In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb
2007-05-31 19:25:35 +00:00
alc
1dfb7ec904 Eliminate some unused definitions that came from NetBSD. 2007-05-28 21:04:22 +00:00
marcel
9983edfa3a Don't initialize the decrementer before initclocks() is called.
Use cpu_initclocks() for that as it assures that relevant locks
have been initialized.
2007-05-27 21:05:35 +00:00
alc
a530caef2a Eliminate an unused definition. 2007-05-27 20:34:26 +00:00
kan
4c2d706212 Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binares in linuxulator.

This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.
2007-05-22 02:22:58 +00:00
jeff
e1996cb960 - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
alc
b34f6f7ab1 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
grehan
d1c72876f0 Add ofw bus methods to the ppc nexus driver. This will be used in future
EFIKA platform support.

PR:	111522
Submitted by:	Andrew Turner, andrew at fubar geek nz
2007-04-20 03:24:59 +00:00
pjd
f4e110ebf2 Remove trailing '.' for consistency! 2007-04-10 21:40:13 +00:00
pjd
b159725895 Add UFS_GJOURNAL options to the GENERIC kernel.
Approved by:	re (kensmith)
2007-04-10 16:49:41 +00:00
rwatson
c2d2fa5e1e If nooption SMP on powerpc, also nooption ADAPTIVE_SX, which depends on
SMP and is now in the global NOTES.
2007-04-01 11:10:16 +00:00
marcel
32bc24cb16 Add bge(4).
Fix a white-space nit while I'm here.
2007-04-01 06:24:19 +00:00
marcel
b8a1b7a931 When writing to PCI configuration registers, don't immediately
read the same register back. It can cause hangs or machine
checks in certain cases. One particular case is with bge(4)
when a reset is initiated for the controller.

MFC after: 1 month
2007-04-01 06:15:53 +00:00
marcel
8c8fca2743 Remove unused file. 2007-04-01 00:41:01 +00:00
alc
b03ddb707b Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file.  Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb
2007-03-11 05:54:29 +00:00
mohans
a332cb00d5 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
2007-03-09 04:02:38 +00:00
piso
b9dfa1fd20 Update openpic to support the new bus_setup_intr() syntax.
Reviewed by: marcel
2007-03-07 11:42:14 +00:00
piso
a02f7880b0 Make pswitch_intr() returns interrupt handling status. 2007-03-02 15:13:17 +00:00
piso
363e58ec6b Catch up with bus_setup_intr() modification and garbage collect a
reference to INTR_FAST.
2007-02-25 15:04:08 +00:00
piso
6a2ffa86e5 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
marcel
dd00b51193 The table of known CPU models ends with an entry that has a version
of 0, not with an entry that has an empty CPU name.

Submitted by: Andrew Turner (andrew@fubar.geek.nz)
2007-02-18 17:40:09 +00:00
kevlo
7afc90147a Remove the cast to caddr_t for sfp, they're not needed.
Reviewed by: marcel
2007-02-12 08:59:33 +00:00
brooks
beaea8e48e Include GEOM_LABEL in GENERIC. It's very useful and not well publicized
enough.

Approved by:	pjd
2007-02-09 19:03:18 +00:00
marcel
0245423ad8 Evolve the ctlreq interface added to geom_gpt into a generic
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE anf GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).

The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
2007-02-07 18:55:31 +00:00
marcel
db6667954e Remove stale header.
MFC after: 3 days
2007-01-26 04:58:31 +00:00
marcel
62b15d9b50 Propagate the CPU model to the hw.model sysctl. 2007-01-14 21:45:05 +00:00