o The function is defined unconditionally but depends on SPR_SVR,
which is defined conditionally.
o spr.h defines mfspr() and mtspr(), which is no worse to use.
by the parent for interrupt resources. This corrects parsing of
the interrupts property.
With parsing of the property fixed, add all interrupts to the
resource list. Bump the max. number of interrupts from 5 to 6
as scc(4) attached to macio(4) has 6 interrupts (3 per channel).
Submitted by: Nathan Whitehorn <nathanw@uchicago.edu>
- detect number of LAWs in run time and initalize accordingly
- introduce decode windows target IDs used in MPC8572
- other minor updates
Obtained from: Freescale, Semihalf
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.
Sponsored by: Nokia
for better structure.
Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.
In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.
Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>
<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.
Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.
Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c. Remove references to it
elsewhere.
Remove a lot of unnecessary <sys/clock.h> includes.
Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
the fact that we have a 1:1 mapping by virtue of the BATs.
Eliminate the now unused moea_rkva_alloc(), moea_pa_map() and
moea_pa_unmap() functions.
Pointed out by: grehan.
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.
Reviewed by: jhb
Sponsored by: Nokia
so that all implemented variants have proper prototypes. The 8-bit,
16-bit and 64-bit variants are not implemented.
This really fixes the current build breakages caused by type casting
and struct aliasing rules.
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).
Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.
The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()
the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.
Drop entirely timer2 on pc98, it is not used anywhere at all.
Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.
Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.
This eliminate the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.
Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.
In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.
Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.
This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines since to trim one extra indirection.
Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}
will have a special section, named .PPC.EMB.apuinfo, which will
tell GDB that a BookE processor is targeted and which will
result in GDB using a different register definition. In order
to support remote GDB for BookE, we need the GDB stub in the
kernel look for that section and use the BookE definitions.
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.
MFC after: 1 month
Discussed with: imp, rink
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:
intr_bind(rman_get_start(irq_res), cpu);
however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.
Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}
might be currently programmed into the registers.
Underlying firmware (U-Boot) would typically program MAC address into the
first unit only, and others are left uninitialized. It is now possible to
retrieve and program MAC address for all units properly, provided they were
passed on in the bootinfo metadata.
Reviewed by: imp, marcel
Approved by: cognet (mentor)
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
set a default name. If the IRQ is added as a consequence of
configurating the IRQ without there ever being a handler
assigned to it, we will not have a name. This breaks the
fragile intrcnt/intrnames logic.
in*() and out*() primitives should not be used, other than by
ISA drivers. In this case they were used for memory-mapped I/O
and were not even used in the spirit of the primitives.
It so happens that U-Boot disables the D-cache when booting
an ELF image, so this change makes sure we run with the
D-cache enabled from now on. It shows too...
While here, remove the duplicate definition of the hw.model
sysctl.
variable is set. On my Mac Mini this puts the CPU in NAP mode when
the kernel is idle and, any technical or environmental reasons
aside, avoids that I have to listen to the fan all day :-)
used in the kernel only (by virtue of checking for _KERNEL),
ports like lsof (part of gtop) cheat. It sets _KERNEL, but does
not set either AIM or E500. As such, PCPU_MD_FIELDS didn't get
defined and the build broke.
The catch-all is to define PCPU_MD_FIELDS with a dummy integer
when at the end of line we ended up without a definition for it.
it's probed first. The PowerPC platform code deals with everything.
As such, probe devices in order of their location in the memory map.
o Refactor the ocpbus_alloc_resource for readability and make sure we
set the RID in the resource as per the new convention.
- Even for the PCI Express host controller we need to use bus 0
for configuration space accesses to devices directly on the
host controller's bus.
- Pass the maximum number of slots to pci_ocp_init() because the
caller knows how many slots the bus has. Previously a PCI or
PCI-X bus underneath a PCI Express host controller would not
be enumerated properly.
o Pull the interrupt routing logic out of pci_ocp_init() and into
its own function. The logic is not quite right and is expected
to be a bit more complex.
o Fix/add support for PCI domains. The PCI domain is the unit
number as per other PCI host controller drivers. As such, we
can use logical bus numbers again and don't have to guarantee
globally unique bus numbers. Remove pci_ocp_busnr. Return the
highest bus number ito the caller of pci_ocp_init() now that
we don't have a global variable anymore.
o BAR programming fixes:
- Non-type0 headers have at most 1 BAR, not 0.
- First write ~0 to the BAR in question and then read back its
size.
Obtained from: Juniper Networks (mostly)
The kernel config file is KERNCONF=MPC85XX, so the usual procedure applies:
1. make buildworld TARGET_ARCH=powerpc
2. make buildkernel TARGET_ARCH=powerpc TARGET_CPUTYPE=e500 KERNCONF=MPC85XX
This default config uses kernel-level FPU emulation. For the soft-float world
approach:
1. make buildworld TARGET_ARCH=powerpc TARGET_CPUTYPE=e500
2. disable FPU_EMU option in sys/powerpc/conf/MPC85XX
3. make buildkernel TARGET_ARCH=powerpc TARGET_CPUTYPE=e500 KERNCONF=MPC85XX
Approved by: cognet (mentor)
MFp4: e500
The PQ3 is a high performance integrated communications processing system
based on the e500 core, which is an embedded RISC processor that implements
the 32-bit Book E definition of the PowerPC architecture. For details refer
to: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC8555E
This port was tested and successfully run on the following members of the PQ3
family: MPC8533, MPC8541, MPC8548, MPC8555.
The following major integrated peripherals are supported:
* On-chip peripherals bus
* OpenPIC interrupt controller
* UART
* Ethernet (TSEC)
* Host/PCI bridge
* QUICC engine (SCC functionality)
This commit brings the main functionality and will be followed by individual
drivers that are logically separate from this base.
Approved by: cognet (mentor)
Obtained from: Juniper, Semihalf
MFp4: e500
Rework of this area is a pre-requirement for importing e500 support (and
other PowerPC core variations in the future). Mainly the following
headers are refactored so that we can cover for low-level differences between
various machines within PowerPC architecture:
<machine/pcpu.h>
<machine/pcb.h>
<machine/kdb.h>
<machine/hid.h>
<machine/frame.h>
Areas which use the above are adjusted and cleaned up.
Credits for this rework go to marcel@
Approved by: cognet (mentor)
MFp4: e500
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a seperate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.
Sponsored by: Nokia
variations (e500 currently), this provides a gcc-level FPU emulation and is an
alternative approach to the recently introduced kernel-level emulation
(FPU_EMU).
Approved by: cognet (mentor)
MFp4: e500
can run on processors that don't have a FPU. This is typically the
case for Book E processors. While a tuned system will probably want
to use soft-float (or use a processor that has a FPU if the usage is
FP intensive enough), allowing hard-float on FPU-less systems gives
great portability and flexibility.
Obtained from: NetBSD
the PIC also informs the platform at which IRQ level it can start
assigning IPIs, since this can depend on the number of IRQs
supported for external interrupts.
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.
Assign approximate why values to all current consumers of the
kdb_enter() interface.
a pointer to struct bus_space. The structure contains function
pointers that do the actual bus space access.
The reason for this change is that previously all bus space
accesses were little endian (i.e. had an explicit byte-swap
for multi-byte accesses), because all busses on Macs are little
endian.
The upcoming support for Book E, and in particular the E500
core, requires support for big-endian busses because all
embedded peripherals are in the native byte-order.
With this change, there's no distinction between I/O port
space and memory mapped I/O. PowerPC doesn't have I/O port
space. Busses assign tags based on the byte-order only.
For that purpose, two global structures exist (bs_be_tag and
bs_le_tag), of which the address can be taken to get a valid
tag.
Obtained from: Juniper, Semihalf
processors (it's the PowerPC Operating Environment Architecture).
AIM designates the processors made by the Apple-IBM-Motorola
alliance and those we typically support.
While here, remove the NetBSD option IPKDB. It's not an option
used by us. Also, PPC_HAVE_FPU is not used by us either. Remove
that too.
Obtained from: Juniper, Semihalf
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).
Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.
Update stack(9) man page.
Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)
cast as uint32_t which is defined as unsigned int. gcc doesn't want to
consider that there might not be much difference between an int and
a long on a 32 bit architecture.
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheudler might see it as having a such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a schduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Lefler for getting to the bottom of this problem.
Reviewed by: jhb, jeff, silby
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().
Reviewed by: tegge
MFC after: 6 weeks
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().
i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().
PR: ia64/118024
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.
frequency from OpenFirmware moved out and into a routine that is called
from cpu_startup().
This allows correct reporting of the CPU clockspeed when printing out
CPU information at boot time.
Reported by: numerous
Reviewed by: marcel
MFC after: 1 day
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.
As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).
In collaboration with: Peter Holm
Reviewed by: jhb
kern/sched_ule.c - Add __powerpc__ to the list of supported architectures
powerpc/conf/GENERIC - Swap SCHED_4BSD with SCHED_ULE
powerpc/powerpc/genassym.c - Export TD_LOCK field of thread struct
powerpc/powerpc/swtch.S - Handle new 3rd parameter to cpu_switch() by
updating the old thread's lock. Note: uniprocessor-only, will require
modification for MP support.
powerpc/powerpc/vm_machdep.c - Set 3rd param of cpu_switch to mutex of
old thread's lock, making the call a no-op.
Reviewed by: marcel, jeffr (slightly older version)
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.
Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)
to gem_attach() as the former access softc members not yet initialized
at that time and gem_reset() actually is enough to stop the chip. [1]
o Revise the use of gem_bitwait(); add bus_barrier() calls before calling
gem_bitwait() to ensure the respective bit has been written before we
starting polling on it and poll for the right bits to change, f.e. even
though we only reset RX we have to actually wait for both GEM_RESET_RX
and GEM_RESET_TX to clear. Add some additional gem_bitwait() calls in
places we've been missing them according to the GEM documentation.
Along with this some excessive DELAYs, which probably only were added
because of bugs in gem_bitwait() and its use in the first place, as
well as as have of an gem_bitwait() reimplementation in gem_reset_tx()
were removed.
o Add gem_reset_rxdma() and use it to deal with GEM_MAC_RX_OVERFLOW errors
more gracefully as unlike gem_init_locked() it resets the RX DMA engine
only, causing no link loss and the FIFOs not to be cleared. Also use it
deal with GEM_INTR_RX_TAG_ERR errors, with previously were unhandled.
This was based on information obtained from the Linux GEM and OpenSolaris
ERI drivers.
o Turn on workarounds for silicon bugs in the Apple GMAC variants.
This was based on information obtained from the Darwin GMAC and Linux GEM
drivers.
o Turn on "infinite" (i.e. maximum 31 * 64 bytes in length) DMA bursts.
This greatly improves especially RX performance.
o Optimize the RX path, this consists of:
- kicking the receiver as soon as we've a spare descriptor in gem_rint()
again instead of just once after all the ready ones have been handled;
- kicking the receiver the right way, i.e. as outlined in the GEM
documentation in batches of 4 and by pointing it to the descriptor
after the last valid one;
- calling gem_rint() before gem_tint() in gem_intr() as gem_tint() may
take quite a while;
- doubling the size of the RX ring to 256 descriptors.
Overall the RX performance of a GEM in a 1GHz Sun Fire V210 was improved
from ~100Mbit/s to ~850Mbit/s.
o In gem_add_rxbuf() don't assign the newly allocated mbuf to rxs_mbuf
before calling bus_dmamap_load_mbuf_sg(), if bus_dmamap_load_mbuf_sg()
fails we'll free the newly allocated mbuf, unable to recycle the
previous one but a NULL pointer dereference instead.
o In gem_init_locked() honor the return value of gem_meminit().
o Simplify gem_ringsize() and dont' return garbage in the default case.
Based on OpenBSD.
o Don't turn on MAC control, MIF and PCS interrupts unless GEM_DEBUG is
defined as we don't need/use these interrupts for operation.
o In gem_start_locked() sync the DMA maps of the descriptor rings before
every kick of the transmitter and not just once after enqueuing all
packets as the NIC might instantly start transmitting after we kicked
it the first time.
o Keep state of the link state and use it to enable or disable the MAC
in gem_mii_statchg() accordingly as well as to return early from
gem_start_locked() in case the link is down. [3]
o Initialize the maximum frame size to a sane value.
o In gem_mii_statchg() enable carrier extension if appropriate.
o Increment if_ierrors in case of an GEM_MAC_RX_OVERFLOW error and in
gem_eint(). [3]
o Handle IFF_ALLMULTI correctly; don't set it if we've turned promiscuous
group mode on and don't clear the flag if we've disabled promiscuous
group mode (these were mostly NOPs though). [2]
o Let gem_eint() also report GEM_INTR_PERR errors.
o Move setting sc_variant from gem_pci_probe() to gem_pci_attach() as
device probe methods are not supposed to touch the softc.
o Collapse sc_inited and sc_pci into bits for sc_flags.
o Add CTASSERTs ensuring that GEM_NRXDESC and GEM_NTXDESC are set to
legal values.
o Correctly set up for 802.3x flow control, though #ifdef out the code
that actually enables it as this needs more testing and mainly a proper
framework to support it.
o Correct and add some conversions from hard-coded functions names to
__func__ which were borked or forgotten in if_gem.c rev. 1.42.
o Use PCIR_BAR instead of a homegrown macro.
o Replace sc_enaddr[6] with sc_enaddr[ETHER_ADDR_LEN].
o In gem_pci_attach() in case attaching fails release the resources in
the opposite order they were allocated.
o Make gem_reset() static to if_gem.c as it's not needed outside that
module.
o Remove the GEM_GIGABIT flag and the associated code; GEM_GIGABIT was
never set and the associated code was in the wrong place.
o Remove sc_mif_config; it was only used to cache the contents of the
respective register within gem_attach().
o Remove the #ifdef'ed out NetBSD/OpenBSD code for establishing a suspend
hook as it will never be used on FreeBSD.
o Also probe Apple Intrepid 2 GMAC and Apple Shasta GMAC, add support for
Apple K2 GMAC. Based on OpenBSD.
o Add support for Sun GBE/P cards, or in other words actually add support
for cards based on GEM to gem(4). This mainly consists of adding support
for the TBI of these chips. Along with this the PHY selection code was
rewritten to hardcode the PHY number for certain configurations as for
example the PHY of the on-board ERI of Blade 1000 shows up twice causing
no link as the second incarnation is isolated.
These changes were ported from OpenBSD with some additional improvements
and modulo some bugs.
o Add code to if_gem_pci.c allowing to read the MAC-address from the VPD on
systems without Open Firmware.
This is an improved version of my variant of the respective code in
if_hme_pci.c
o Now that gem(4) is MI enable it for all archs.
Pointed out by: yongari [1]
Suggested by: rwatson [2], yongari [3]
Tested on: i386 (GEM), powerpc (GMACs by marcel and yongari),
sparc64 (ERI and GEM)
Reviewed by: yongari
Approved by: re (kensmith)
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren. They should
be. This changes fixes that.
It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc(). I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago. This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().
Approved by: re (kensmith)
o Revamp the PIC I/F to only abstract the PIC hardware. The
resource handling has been moved to nexus, where it belongs.
o Include EOI and MASK+EOI methods to the PIC I/F in support of
INTR_FILTER.
o With the allocation of interrupt resources and setup of
interrupt handlers in the common platform code we can delay
talking to the PIC hardware after enumeration of all devices.
Introduce a call to powerpc_intr_enable() in configure_final()
to achieve that and have powerpc_setup_intr() only program the
PIC when !cold.
o As a consequence of the above, remove all early_attach() glue
from the OpenPIC and Heathrow PIC drivers and have them
register themselves when they're found during enumeration.
o Decouple the interrupt vector from the interrupt request line.
Allocate vectors increasingly so that they can be used for
the intrcnt index as well. Extend the Heathrow PIC driver to
translate between IRQ and vector. The OpenPIC driver already
has the support for vectors in hardware.
Approved by: re (blanket)
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.
Based on a patch from: grehan@
Approved by: re (rwatson)
Previously it didn't honor parent dma tag's restrictions such that
an invalid dma segment could be passed to device. The driver for the
device may panic in sanity check routine for the dma segment or may
produce unexpected results. I have no idea how it could ever have
worked before.
Reviewed by: grehan
Tested by: gad
Approved by: re (hrs)
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.
Reviewed by: scottl (long a ago)
Approved by: re (hrs)
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
more exposure. The current state of SCTP implementation is
considered to be ready for 32-bit platforms, but still need some
work/testing on 64-bit platforms.
Approved by: re (kensmith)
Discussed with: rrs
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.
Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.
Reviewed by: alc, bde
Approved by: jeff (mentor)
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.
Discussed with: jhb
same way it was enabled for Linux binares in linuxulator.
This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory. The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used. Previously,
only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.
This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.
Discussed with: kmacy, marius, Nathan Whitehorn
PR: 112194
read the same register back. It can cause hangs or machine
checks in certain cases. One particular case is with bge(4)
when a reset is initiated for the controller.
MFC after: 1 month
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.