current context in the IPI_STOP handler so that we can get accurate stack
traces of threads on other CPUs on these two archs like we do now on i386
and amd64.
Tested on: alpha, sparc64
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)
Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)
amd64, and is a factor of 3 less than the value previously auto-sized on
a 12GB machine, which would cause an overflow in calculations involving the
maxbcache int, causing bufinit() to loop forever at boot.
Reviewed by: mlaier, peter
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.
2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.
3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.
4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.
5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.
6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.
Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
variable and returns the previous value of the variable.
Tested on: i386, alpha, sparc64, arm (cognet)
Reviewed by: arch@
Submitted by: cognet (arm)
MFC after: 1 week
This kernel config briefly describes some of the major MAC policies
available on FreeBSD. The hope is that this will raise the awareness
about MAC and get more people interested.
Discussed with: scottl
it to __MINSIGSTKSZ. Define MINSIGSTKSZ in <sys/signal.h>.
This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN
in <pthread.h> (soon <limits.h>) without having to include the whole
<sys/signal.h> header.
Discussed with: bde
archs and enable splash(4) by default (the non-working screen savers
either don't compile or just have no effect when loaded, i.e. don't
cause harm).
MFC after: 1 week
in the arm __swp() and sparc64 casa() and casax() functions is actually
being used as an input and output and not just the value of the register
that points to the memory location. This was the underlying source of
the mbuf refcount problems on sparc64 a while back. For arm this should be
a nop because __swp() has a constraint to clobber all memory which can
probably be removed now.
Reviewed by: alc, cognet
MFC after: 1 week
variables rather than void * variables. This makes it easier and simpler
to get asm constraints and volatile keywords correct.
MFC after: 3 days
Tested on: i386, alpha, sparc64
Compiled on: ia64, powerpc, amd64
Kernel toolchain busted on: arm
address, writting non-canonical address can cause kernel a panic,
by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring
only canonical values get written to the registers.
Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)
- Let creator_bitblt() return ENODEV as it's not implemented (missed
in sys/dev/fb/creator.c rev. 1.6).
- As a speed optimization inline the creator_ras_wait() etc. helper
functions and also cache setting the font increment, font width
and plane mask. [1]
- I got the meaning of V_DISPLAY_BLANK wrong, it's blank like turn
off and not blank like turn on and clear the screen. So move
clearing the screen to creator_clear() were it hopefully belongs.
- Properly implement V_DISPLAY_BLANK, V_DISPLAY_STAND_BY and
V_DISPLAY_SUSPEND. This makes blank_saver.ko and green_saver.ko
work. [1]
- Change the order of operations in creator_fill_rect(), i.e. write
y before x and cy before cx. This fixes drawing the top part of
the border with Elite3D cards when switching from Xorg to a VTY.
- Move setting the chip configuration we use and invalidating the
cache variables to creator_set_mode() and set the V_ADP_MODECHANGE
flag. This causes creator_set_mode() to be called when the X server
shuts down which fixes the screen corruption caused most of the
time by Xorg not restoring the original configuration present at
startup.
Inspired by/based on: Xorg [1]
Approved by: re (scottl)
psm(4), ukbd(4), ums(4) and usb(4) on by default. Modulo some nits with
the most annoying one probably being USB keyboards no longer working at
the OFW boot prompt after halting FreeBSD these drivers work fine on
sparc64 including X and there's nothing left that I'd consider a show-
stopper. I.e. graphical consoles on sun4u machines should either work
out of the box or by plugging in a card that is supported by either
creator(4) or machfb(4). The exception obviously are SBus-only machines
without UPA slots like some Ultra 1 (but which also still lack support
in other areas) and certain Exx0 (but which probably are mainly used
with serial consoles anyway). I'll try to add a cgsix(4) for these later
as Sun CG6 cards are probably the most common SBus framebuffer cards in
sun4u machines. I however don't see much sense in adding drivers for the
dozen of SBus framebuffers that were destined for sparc v8 machines.
The rest of the USB drivers aren't enabled as I'm only aware of ukbd(4)
and ums(4) as well as ohci(4) working with the on-board ALI M5237 and
Sun PCIO-2 controllers. Aue(4) definitely doesn't work on sparc64, yet.
Thanks to:
- Jake for the initial work on syscons(4) on sparc64 and creator(4).
- Marcel for uart(4) and especially for its support for the SCCs which
are only used on sparc64 so far. In various regards it wouldn't have
been possible to enable syscons(4) by default on sparc64, yet, without
uart(4).
- All that tested patches.
Ok'ed by: scottl (RE hat), tmm
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.
Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.
Remove stale comments from pmap_init().
Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.
Tested by: cognet, kensmith, marcel
- Implement sampling modes and logging support in hwpmc(4).
- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.
- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).
- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.
- bug fixes & documentation.
mutex instead of a MTX_DEF one in order to defer preemption while
reading the date and time registers. If we don't manage to read them
within the time slot where we are guaranteed that no updates occur we
might actually read them during an update in which case the output is
undefined.
the number of registered adapters instead of determining again whether
stdout is a supported card (and which might have failed to attach and
register).
- Drop creator_set_mode() and move the relevant parts to creator_fill_rect()
and creator_putc() respectively. This is a bit cleaner than having to
make sure that creator_set_mode() was called before creator_fill_rect()
or creator_putc() are used and matches better what Xorg does.
- Fix a bug in the handling of the FBIOSCURSOR IOCTL; the code was meant
to return ENODEV for all invocations expect when used to disable the
cursor and not just when used for enabling the cursor.
- In case the adapter is the OFW stdout move its OFW cursor to the start
of the last line on halt so OFW output doesn't get intermixed with what
FreeBSD left on the screen. With hindsight this is what the faking of a
hardware cursor which was removed in the last revision really was about,
i.e. to keep the OFW updated about the current cursor position. The new
approach however is simpler while producing the same result and doesn't
cause the first letter of the OFW output to be turned into a blank and
a newline.
- Add variable names to the prototypes of creator_cursor_*() which were
added in the last revision and list them alphabetically in order to match
the style of this file.
for the SYS_RES_IOPORT -> SYS_RES_MEMORY transition again. While it
was helpful to not need to change all of the affected drivers in a
single pass together with ebus(4) we probably shouldn't start into
6.0 with such a hack.
This requires some of the modules of affected drivers to be rebuilt,
namely: auxio(4), snd_audiocs(4) and puc(4).
by default, yet.
- Replace "graphics cards" with "framebuffers" in the description
of creator(4) in order to make it uniform with the description of
machfb(4) and the latter occur both on-board and as add-on cards.
- Use register macros instead of magic values in the code. [1]
- Check the return values of OF_getprop() and other stuff that actually
can fail.
- Let the unimplemented video driver methods return ENODEV rather
than 0 so other code isn't tricked into thinking a certain operation
was successfull. In case of e.g. the video driver creator_ioctl()
this caused vidcontrol(1) to return random garbage information.
Remove the TODO macros in the unimplemented video driver methods
which did a printf("%s: unimplemented\n", __func__). Under certain
circumstances these managed to invoke a printf() when a low-level
console device wasn't attached, yet, causing a Fast Data Access MMU
Miss. These macros were only really usefull for development anyway.
- Set the struct video_adapter and struct video_info va_flags and
vi_flags etc. as appropriate.
- In creator_configure() don't rely on hitting the node which is the
chosen console device first when searching the OFW tree for adapters
compatible with this driver. Instead just check whether the chosen
console device is a viable target for this driver. Targets that are
not the console (including additional cards in multi-head configs)
will be attached through creator_upa_attach(). I think this how the
code in creator_configure() was actually meant to work.
Honour the VIO_PROBE_ONLY flag and don't initialise and register the
console device twice when creator_configure() is called a second time
during sc_probe_unit().
Let creator_configure() return the number of the found adapters,
i.e. 1 in case probing succeeds, as it's expected. The return values
of video adapter configure functions however currently aren't checked
so this doesn't make a difference at the moment.
- In creator_upa_attach() don't rely on probing and attaching the
adapter which is the console first, in case there are multiple
adpaters and one of them is the console this could lead into using
the video adapter unit 0 twice.
- Make the check for DACs with inverted cursor control a bit more
precise and actually honour that information when turning the cursor
on or off. Add a helper function creator_cursor_enable() for this
in order to keep code duplication low. [1]
- Don't bother with faking a hardware cursor in case a device is the
console. Apparently this was meant to start kernel output right after
where the firmware left. In general this isn't worth the fuzz and
also had no real effect as creator_set_mode() did clear the screen
in any case, not just in case a device was not the console.
- Implement creator_fill_rect() and use it to actually blank the
display in creator_blank_display() when the mode is V_DISPLAY_BLANK,
moving blanking the display out of creator_set_mode(). Use it also
to implement creator_set_border() so the border can be re-drawn
when switching to a VTY from X, exiting X, etc. (which leaves us
with a black border most of the time).
- Implement the video driver creator_ioctl(), moving the implementation
of the IOCTL interface from the fbN CDEV version of creator_ioctl()
into the video driver version and use the latter to implement the
former. Use fb_commonioctl() to handle most of the FBIO IOCTLs.
This gives programs like vidcontrol(1) which use the video driver
creator_ioctl() a chance of working.
Implement turning off the cursor via the FBIOSCURSOR IOCTL, which
Xorg uses to in order to inform the OS that it's taking over the
cursor. In creator_putm() check whether the cursor is enabled and
(re-)install it if necessary, moving installing the cursor out of
creator_init() and into a helper function creator_cursor_install().
This fixes the missing mouse pointer when switching to a VTY from X,
exiting X, etc.
- Some clean-up (remove unused/useless code, etc.).
o sparc64/creator/creator_upa.c / sparc64/sparc64/sc_machdep.c:
- Attach syscons(4) as an own pseudo-device on the nexus rather than
directly in creator_upa_attach(), similiar to attaching syscons(4)
as a pseudo-device on isa(4) on other archs. This makes it a whole
lot easier to do the right thing in multi-head configs, especially
with different types of graphics adapters. [2]
- Set SC_AUTODETECT_KBD by default so USB keyboards work out of the
box. [2]
Based on/obtained from: Xorg 'ffb' driver [1]
Based on/obtained from: FreeBSD/powerpc [2]
Use bus_generic_probe() and add a bus_add_child() interface method to
allow device drivers to use the identify method to add themselves if
need be (e.g. syscons(4)).
- Use FBSDID.
consist of the expected number of address and size cells (we can't use
dynamic arrays here because at the point in the boot process when this
code is used malloc() doesn't work, yet). This fixes a Fast Data Access
MMU Miss when uart(4) (erroneously) calls OF_decode_addr() to decode
the address of PS/2 keyboards. PS/2 keyboards use a different and also
undocumented scheme at the first parent node than mapping at 'ranges'
properties. It's however not worth implementing that other scheme and
actually also fits atkbdc(4) better to just start at the first parent
node of PS/2 keyboards which is the 8042 controller (I have atkbdc(4)
working that way).
- Use FBSDID.
MFC after: 1 month
- Add locking.
- Account for if the MC146818_NO_CENT_ADJUST flag is set we don't need
to check wheter year < POSIX_BASE_YEAR.
- Add some comments about mapping the day of week from the range the
generic clock code uses to the range the chip uses and which I meant
to add in the initial version.
- Minor clean-up, use __func__ instead of hardcoded function names in
error strings.
o in the rtc(4) front-end additionally:
- Don't leak resources in case mc146818_attach() fails.
- Account for ebus(4) defaulting to SYS_RES_MEMORY for the memory
resources since ebus.c rev. 1.22.
- Add support for storing the century in MK48TXX_WDAY_CB on MK48Txx with
extended registers when the MK48TXX_NO_CENT_ADJUST flag is set (and which
is termed somewhat confusing as it actually means don't manually adjust
the century in the driver).
- Add the MI part of interfacing the watchdog functionality of MK48Txx with
extended registers with watchdog(9). This is inspired by the SunOS/Solaris
drivers for the 'eeprom' devices also having watchdog support. I actually
expected this to work out of the box on Sun Exx00 machines with 'eeprom'
devices which have a 'watchdog-enable' property. On terminal count of the
the watchdog timer however only the MK48TXX_FLAGS_WDF bit rises but the
reset signal and the interrupt respectively (depending on whether the
MK48TXX_WDOG_WDS bit of the chip and the MK48TXX_WDOG_ENABLE_WDS flag
of the driver respectively is set) goes nowhere. Apparently passing the
reset signal on to the WDR line of the CPUs has to be enabled somewhere
else but we don't have documentation for the Exx00 specific controllers.
I decided to commit this nevertheless so it can be enabled in the eeprom(4)
front-end later in e.g. 6.0-STABLE without breaking the API. Besides the
Exx00 the watchdog part of the MK48Txx should also work on E250 and E450.
Possibly also without extra fiddling on these machines but I haven't
found someone willing to give it a try on such a machine so far.
- Use uintXX_t instead of u_intXX_t, use __func__ instead of hardcoded
function names in error strings.
eeprom_ebus_attach() and eeprom_sbus_attach() into eeprom_attach()
respectively. Since the introduction of the ofw_bus interface some
time ago and now that ebus(4) also uses SYS_RES_MEMORY for the
memory resources since ebus.c rev. 1.22 there is no longer a
need to have separate front-ends for ebus(4), fhc(4) and sbus(4).
- Fail gracefully instead of panicing when the model can't be
determined.
- Don't leak resources when mk48txx_attach() fails.
- Use FBSDID.
compatibility with ISA devices while in fact all known EBus devices
actually use memory space turned out to be not a good idea as so far
there is only the 'rtc' device known to show up either on an EBus or
ISA bus but not on any of the other busses used on sparc64. However
there are quite a couple of them that show up on either EBus, FireHose
or SBus. In order to save extra code in the respective drivers switch
ebus(4) to actually use SYS_RES_MEMORY for the memory resources of
its children. At least for transition still accept SYS_RES_IOPORT
and silently change it to SYS_RES_MEMORY. [1]
- In ebus_probe() use ofw_bus_get_name() instead of re-implementing it
via ofw_bus_get_node() and OF_getprop().
- Remove some unused variables.
- Use FBSDID.
Discussed with: tmm (some time ago)
the iteration variable as the RID when adding the respective resource
to the child via bus_set_resource(). In case a device has both I/O
and memory resources this generates gaps in the newbus resources of
the child, e.g. its first memory resource might end up as RID 1.
To solve this mimic resource_list_add_next() via resource_list_find()
and bus_set_resource(); we can't just use resource_list_add_next()
here as this would circumvent the limit checks in isa_set_resource()
of the common ISA code.
This however is more or less a theoretical problem so far as all known
ISA devices on sparc64 soley use I/O space.
- Just use bus_generic_rl_release_resource() for isa_release_resource()
instead of re-implementing the former.
- Improve some comments to better reflect reality, minor clean-up and
simplifications, return NULL instead of 0 were appropriate.
front-end and the LSI64854 and NCR53C9x code in case one of these
functions fails. Add detach functions to these parts and make esp(4)
detachable.
- Revert rev. 1.7 of esp_sbus.c, since rev. 1.34 of sbus.c the clockfreq
IVAR defaults to the per-child values.
- Merge ncr53c9x.c rev. 1.111 from NetBSD (partial):
On reset, clear state flags and the msgout queue.
In NetBSD code to notify the upper layer (i.e. CAM in FreeBSD) on reset
was also added with this revision. This is believed to be not necessary
in FreeBSD and was not merged.
This makes ncr53c9x.c to be in sync with NetBSD up to rev. 1.114.
- Conditionalize the LSI64854 support on sbus(4) only instead of sbus(4)
and esp(4) as it's also required for the 'dma', 'espdma' and 'ledma'
busses/devices as well as the 'SUNW,bpp' device (printer port) which
all hang off of sbus(4).
- Add a driver for the 'dma', 'espdma' and 'ledma' (pseudo-)busses/
devices. These busses and devices actually represent the LSI64854 DMA
engines for the ESP SCSI and LANCE Ethernet controllers found on the
SBus of Ultra 1 and SBus add-on cards. With 'espdma' and 'ledma' the
'esp' and 'le' devices hang off of the respective DMA bus instead of
directly from the SBus. The 'dma' devices are either also used in this
manner or on some add-on cards also as a companion device to an 'esp'
device which also hangs off directly from the SBus. With the latter
variant it's a bit tricky to glue the DMA engine to the core logic of
the respective 'esp' device. With rev. 1.35 of sbus.c we are however
guaranteed that such a 'dma' device is probed before the respective
'esp' device which simplifies things a lot. [1]
- In the esp(4) SBus front-end read the part-unique ID code of Fast-SCSI
capable chips the right way. This fixes erroneously detecting some
chips as FAS366 when in fact they are not. Add explicit checks for the
FAS100A, FAS216 and FAS236 variants instead treating all of these as
ESP200. That way we can correctly set the respective Fast-SCSI config
bits instead of driving them out of specs. This includes adding the
FAS100A and FAS236 variants to the NCR53C9x core code. We probably
still subsume some chip variants as ESP200 while in fact they are
another variant which however shouldn't really matter as this will
only happen when these chips are driven at 25MHz or less which implies
not being able to run Fast-SCSI. [3]
- Add a workaround to the NCR53C9x interrupt handler which ignores the
stray interrupt generated by FAS100A when doing path inquiry during
boot and which otherwiese would trigger a panic.
- Add support for the 'esp' devices hanging off of a 'dma' or 'espdma'
busses or which are companions of 'dma' devices to esp(4). In case of
the variants that hang off of a DMA device this is a bit hackish as
esp(4) then directly uses the softc of the respective parent to talk
to the DMA engine. It might make sense to add an interface for this
in order to implement this in a cleaner way however it's not yet clear
how the requirements for the LANCE Ethernet controllers are and the
hack works for now. [2]
This effectively adds support for the onboard SCSI controller in
Ultra 1 as well as most of the ESP-based SBus add-on cards to esp(4).
With this the code for supporting the Performance Technologies SBS430
SBus SCSI add-on cards is also largely in place the remaining bits
were however omitted as it's unclear from the NetBSD how to couple
the DMA engine and the core logic together for these cards.
Obtained from: OpenBSD [1]
Obtained from: NetBSD [2]
Clue from: BSD/OS [3]
Reviewed by: scottl (earlier version)
Tested with: FSBE/S add-on card (FAS236), SSHA add-on card (ESP100A),
Ultra 1 (onboard FAS100A), Ultra 2 (onboard FAS366)
device and which also applies to the children. This is very usefull for
drivers for the various subordinate busses so they don't need to fiddle
with the OFW node of their parent themselves. As SBus busses hang of the
nexus and we don't use the ofw_bus interface for nexus devices, yet, this
would also require special knowledge about this in the drivers for the
SBus children which these shouldn't need to have.
This includes switching to use an unshifted IGN in the sc_ign member of
the sbus(4) softc internally.
- For SBus child devices where there are variants that are actually split
split into two SBus devices (as opposed to the first half of the device
being a SBus device and the second half hanging off of the first one)
like 'auxio' and 'SUNW,fdtwo' or 'dma' and 'esp' probe the SBus device
which is a prerequisite to the driver attaching to the second one with
a lower order. This saves us from dealing with different probe orders
in the respective device drivers which generally is more hackish.
- Remove a stale comment about the 'specials' array above the attaching
of the child devices. This is a remnant of the NetBSD/sparc origin of
this code. There the 'specials' array is also used to probe certain
devices which are prerequisites to others first. Why NetBSD soley
relies on the devices having the expected order in the OFW tree on
sparc64 isn't clear to me, as far as I can tell OFW doesn't guaranteed
such things.
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.
into _bus.h to help with name space polution from including all of bus.h.
In a few days, I'll commit changes to the MI code to take advantage of thse
sepration (after I've made sure that these changes don't break anything in
the main tree, I've tested in my trees, but you never know...).
Suggested by: bde (in 2002 or 2003 I think)
Reviewed in principle by: jhb
- Merge lsi64854.c rev. 1.25 from NetBSD: nuke trailing whitespace.
- Update NetBSD RCS IDs according to what was actually already merged.
- Remove dv_name from the lsi64854_softc and use device_printf() instead.
- Use __func__ instead of hardcoded function names in error messages.
- Use ulmin() instead of min() for comparing the DMA sizes as the values
involved actually are represented by 64bit unsigned instead of 32bit
unsigned. As far as I can't tell this doesn't make a difference in
practice though.
- Some style(9) fixes (mainly indentation).
- Remove unnecessary braces.
at their old location in sys/dev/esp after they were repo-copied to
sys/sparc64/sbus at rev. 1.1:
sys/dev/esp/lsi64854.c rev. 1.2
sys/dev/esp/lsi64854var.h rev. 1.2
Add some style(9) touch ups; style(9) states that new code should follow
these conventions and, well, this is a new driver.
Tested on: i386, sparc64
Reviewed by: scottl
with the attaching of the children done in the bus attach function like
it's supposed to be.
- In the bus probe nomatch function print the resources of the children
like it's done in the other sparc64 specific bus drivers.
- For the clock frequency IVAR use the per-child values and fall back to
the bus default in case a child doesn't have the respective property
instead of always using the bus default so a child driver doesn't need
to obtain the per-child value itself (see also the commit message of
sys/dev/esp/esp_sbus.c rev. 1.7).
- Add support for pass-through allocations. The comment preceding
sbus_alloc_resource() wasn't quite correct, we need to support pass-
through allocations for the 'espdma' and 'ledma' (pseudo-)busses which
hang off of the SBus in Ultra 1 machines. There can also be actual
bridges like the SBus-to-PCMCIA bridge on the SBus and the XBox (SBus
extension box) probably also involves one.
- Use auto-generated typedefs for the prototypes of the device interface
functions.
- Style(9) fixes (mainly don't use function calls in initializers).
- Use __func__ instead of hardcoded function names in error messages.
- Try to make error messages sound uniform.
- Try to keep the code within 80 columns.
- Correct some typos.
- Correct some function declarations to match their prototypes.
- Remove unused headers, macros and variables.
- Remove a bzero() superfluous due to allocating with M_ZERO.
- Use FBSDID.
Don't use atomic ops to increment interrupt stats.
On sparc64 this reduces delay until tick interrupts are service by 1/10th
on average. In turn this reduces the clock drift caused by these delays
so there's less drift which has to be compensated in tick_hardclock().
This includes switching from atomically incrementing the global cnt.v_intr
to the asm equivalent of PCPU_LAZY_INC(cnt.v_intr) in exception.S
- Correct some comments to match the registers actually used.
- Correct some format specifiers, interrupt levels passed in are u_int.
- Use FBSDID.
Ok'ed by: jhb
- Fix NULL pointer dereferences caused when an ithread or a handler is
NULL which happens when a stray interrupt triggers after the respective
device interrupt was torn down.
- Remove the critical section around INTR_FAST handlers which actually
was a nested critical section. Both tl0_intr() and tl1_intr() already
enter a critical section for calling intr_execute_handlers().
MFC after: 3 days
call tick_stop() again after tick_init() as tick interrupts already
have been disabled as part of tick_init().
- In spinlock_enter() replace the magic value for PIL TICK with the
respective macro.
- Use FBSDID.
SpitFire erratum #54) which can cause writes to the TICK_CMPR register
to fail. This seems to fix the dying clocks problem reported by jhb@
and kris@. [1]
- In tick_start() don't reset the tick counter of the boot processor to
zero. It's initially reset in _start() and afterwards but _before_
tick_start() is called on the BSP the APs synchronise with the tick
counter of the BSP in mp_startup(). Resetting the tick counter of the
BSP in tick_start() probably also was the cause of problems seen when
using the CPU tick counter as timecounter on SMP machines.
Not resetting the tick counter of the BSP in mp_startup() makes the
tick counters and tick interrupts between the BSP and APs be pretty
much in sync as it's supposed to be. This also means there's no longer
a real reason to have separate tick_start() and tick_start_ap() so
merge them and zap tick_start_ap(). This is also a first step in
simplifying the interface to the tick counters in preparation to use
alternate clock hardware where available.
- Switch to the algorithm used on FreeBSD/ia64 for updating the tick
interrupt register and which compensates the clock drift caused by
varying delays between when the tick interrupts actually trigger and
when they are serviced. Not compensating the clock drift mainly hurts
interactive performance especially when using WITNESS. [2]
For further information about the algorithm also see the commit log
of sys/ia64/ia64/interrupt.c rev. 1.38.
On sparc64 the sysctls for monitoring the behaviour of the tick
interrupts are machdep.tick.adjust_edges, machdep.tick.adjust_excess,
machdep.tick.adjust_missed and machdep.tick.adjust_ticks.
- In tick_init() just use tick_stop() for stopping the tick interrupts
until a proper handler is set up later. This also stops the system
tick interrupt on USIII systems earlier.
- In tick_start() check for a rough upper limit of HZ.
- Some minor changes, e.g. use FBSDID, remove unused headers, etc.
Info obtained from: Linux [1]
Ok'ed by: marcel [2]
Additional testing by: kris (earlier version of the workaround), jhb
X-MFC after: 3 days [1]
disabling interrupts before updating the saved pil in the thread. If we
save the value first then it can be clobbered if an interrupt comes in
and the interrupt handler tries to acquire a spin lock.
Submitted by: marius
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.
Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.
This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).
Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
FreeBSD based on aue(4) it was picked by OpenBSD, then from OpenBSD ported
to NetBSD and finally NetBSD version merged with original one goes into
FreeBSD.
Obtained from: http://www.gank.org/freebsd/cdce/
NetBSD
OpenBSD
inevitable component in Sun Exx00 machines and provides serial ports,
NVRAM and TOD amongst others which are handled by uart(4) and eeprom(4)
respectively). This driver currently only prints out information about
the chassis on attach and allows to blink the 'Cycling' LED (which is
duplicated on the front panel) of the clock board just like fhc(4) does
for the other boards. The device name for the LED is /dev/led/clockboard.
Obtained from: OpenBSD
Tested by: joerg
bus_generic_rl_release_resource() for the bus_release_resource() method
instead of a local copy.
- Correctly handle pass-through allocations in fhc_alloc_resource().
- In case the board model can't be determined just print "unknown model"
so the physical slot number is reported in any case.
- Add support for blinking the 'Cycling' LED of boards on a fhc(4) hanging
of off the nexus (i.e. all boards except the clock board) via led(4).
All boards have at least 3 controllable status LEDs, 'Power', 'Failure'
and 'Cycling'. While the 'Cycling' LED is suitable for signaling from
the OS the others are better off being controlled by the firmware.
The device name for the 'Cycling' LED of each board is /dev/led/boardX
where X is the physical slot number of the board. [1]
Obtained from: OpenBSD [1]
Tested by: joerg [1]
bus_generic_rl_release_resource() for the bus_release_resource() method
instead of a local copy.
- Correctly handle pass-through allocations in central_alloc_resource().
This is mentioned in the Handbook but it is not as obvious to new
users why bpf is needed compared to the other largely self-explanatory
items in GENERIC.
PR: conf/40855
MFC after: 1 week
sys/bus_dma.h instead of being copied in every single arch. This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date. Add an MD <machine/bus_dma.h> for ever arch for consistency and to
allow for future MD augmentation of the API. sparc64 makes heavy use of
this right now due to its different bus_dma implemenation.
Change fhc(4) to use IRQ numbers instead of RIDs for allocating the
IRQs of children. This works similar to e.g. sbus(4), i.e. add the
IRQ resources as fully specified to the resource lists of the children,
allocate them like normal. When establishing the interrupt search the
interrupt maps of the children for a matching INO to determine which
map we need to write the fully specified interrupt number to and to
enable the mapping (before the RID was used to indicate which interrupt
map to use).
- dev/puc/puc.c:
Revert rev. 1.38, with the above change fhc(4) no longer needs special
treatment for allocating IRQs.
Thanks to: joerg for providing access to an E3500
- Use FBSDID.
- Remove unused macro.
- Use auto-generated typedefs for the prototypes of the bus and device
interface functions.
- Terminate the output of device_printf(9) with a newline char.
- Honour the return values of malloc(), OF_getprop(), etc.
- Use __func__ instead of hardcoded function names.
- Print the physical slot number and the board model on attach.
MFC after: 1 month
- Use FBSDID.
- Remove an unused include.
- Use auto-generated typedefs for the prototypes of the device interface
functions.
- Terminate the output of device_printf(9) with a newline char.
- Honour the return value of malloc(3).
MFC after: 1 month
call vector which was added in rev. 1.52. This change was done way before
sparc64 switched to a 64-bit time_t so all binaries are expected to have
been recompiled by now.
aid for ABI breakages caused by system call changes. These changes were
done way before sparc64 switched to a 64-bit time_t so all binaries are
expected to have been recompiled by now.
place.
This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.
By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.
Submitted by: netchild
Reviewed by: various developers on arch@, some time ago
for this are the on-board SCCs and UARTs that use a shared IRQ. [1]
- Rework the interrupt counting code to account for shared interrupts. [1]
- In case ithread_add_handler() failed in inthand_add() just return with
the error code instead of setting up a non-fast handler regardless or
setting up a non-fast handler instead of a fast handler. I can't think
of a situation where the former behaviour would do the right thing.
Reviewed by: marcel [1]
Based on: sys/i386/i386/intr_machdep.c [1]
- Use FBSDID.
- Use uintXX_t instead of u_intXX_t.
- Be consistent with white-space.
- Mark some globals as static.
- Add a missing prototype.
- Remove a unused variable.
- etc.
loaded, the tick interrupt enabled and a handler that resets the tick
counter on every tick interrupt. While this isn't documented this can
cause DELAY() to wait for a value the tick counter will not reach when
used in early boot, i.e. before cpu_initclocks() is called, depending
on when in the cycle DELAY() is called, the delay value and the value
the tick compare register is set to. The excessive use of DELAY() in
uart(4) when probing Sun keyboards seems to always manage to trigger
this, resulting in a hang during boot.
Disable the tick interrupt in tick_init(), which is called early in
sparc64_init(), until the interrupt is enabled again in tick_start(),
called by cpu_initclocks(), with our own handler. This fixes the hang
during probing Sun keyboards on AXi boards and Ultra 10, with other
machines like Ultra 5 probably being affected but not tested.
Additional testing by: Matthias Muthmann
MFC after: 1 week
for nodes hanging off of Central (untested), FireHose (untested) and
PCI (tested) busses.
- Add an additional parameter to OF_decode_addr() which specifies the
index of the register bank to decode.
These should allow to eventually add support for the Z8530 hanging off of
FireHose to uart(4) and to write support for PCI-based graphics adapters.
Suggested by: tmm (back in '03)
uses the i8237 without trying to emulate the PC architecture move
the register definitions for the i8237 chip into the central include
file for the chip, except for the PC98 case which is magic.
Add new isa_dmatc() function which tells us as cheaply as possible
if the terminal count has been reached for a given channel.
o Disable ofw_console(4), sab(4) and zs(4).
sab(4) and zs(4) are disabled because the hardware controlled by
them is handled by uart(4)+puc(4) and the latter combination is
functionally complete and up to date.
ofw_console(4) is disabled because it doesn't claim the device it
controls (through OFW) and thus interferes with puc(4)+uart(4),
which has sufficient knowledge to extract the necessary information
from OFW to setup the console. Put differently, ofw_console(4) is
not a proper device driver and can only do harm. Its functionality
is completely handled by uart(4).
This commit makes uart(4) the default driver for serial ports.
MFC after: 2 weeks
bridge in the device tree which lacks the mandatory (also by the OFW PCI
bus binding spec) "reg" property. Change the code to just ignore nodes
missing the "reg" property instead of panicing when encountering such a
node. Also ignore nodes without a "name" property (guaranteed by the OFW
PCI bus binding spec). This brings the behaviour of the MD OFW PCI code
regarding such incomplete nodes in line with the EBus and the SBus code.
Tested by: Cyril Tikhomiroff <tikho@anor.net>
MFC after: 1 month
now use a pool mutex to manage the reference counts. This fixes races
resulting in use-after-free.
Tested by: kris, David Cornejo dave at dogwood dot com
Reported by: bmilekic's MemGuard
MFC after: 1 week
won't exist for EBus. Just fail the allocation by returning NULL.
Now drivers that are MI can try resources that the driver knows may
be used by the device.
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.
Remove a bogus comment from pmap_enter_quick().
Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.
Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)
These devices should be probed first because they are at fixed
locations and cannot be turned off. ISA PNP devices, on the other
hand, can be turned off and often can be flexible in the resources
they use. Probe them last, as always.
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.
This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been that.
Tested on: alpha, amd64, i386, ia64, sparc64
clock found on the ISA bus (some USIIe, USIIi and USIIIi models) and
EBus (USIII models) instead of a MK48Txx clock.
Testet by: Matthew T. Lager" <freebsd@trinetworks.com> on Sun Fire V100,
Xavier Beaudouin <kiwi@oav.net> on Netra X1 (initial version)
- The claim in the commit log of rev. 1.11 of dev/uart/uart_cpu_sparc64.c
etc. that UARTs are the only relevant ISA devices on sparc64 turned out
to be false. While there are sparc64 models where UARTs are the only
devices on the ISA bus there are in fact also low-cost models where all
devices traditionally found on the EBus are hooked up to the ISA bus.
There are also models that use a mix between EBus and ISA devices with
things like an AT keyboard controller and other rather interesting
devices that we might want to support in the futute hook up to the ISA
bus.
In order to not need to add sparc64 specific device_identify methods to
all of the respective ISA drivers and also not add OFW specific code to
the common ISA code make the sparc64 ISA bus code fake up PnP devices so
most ISA drivers probe their devices without further changes.
Unfortunately Sun doesn't adhere to the ISA bindings defined in IEEE
1275-1994 for the properties of most of the ISA devices which would
allow to obtain the vendor and logical IDs from their properties. So we
we just use a simple table which maps the name properties to PnP IDs.
This could be done in a more sophisticated way but I courrently don't
see the need for this. [1]
- Add the children with fully mapped and specified resources (in the OFW
sense) similar to what is done in the EBus code for the IRQ resources
of the children as adjusting the resources and the resource list entries
respectively in isa_alloc_resource() as done perviously causes trouble
with drivers which use rman_get_start(), pass-through or allocate and
release resources multiple times, etc.
Adjusting the resources might be better off in a bus_activate_resource
method but the common ISA code currently doesn't allow for an
isa_activate_resource(). [2]
With this change:
- ppbus(4) and lpt(4) attach and work (modulo ECP mode, which requires
real ISADMA code but it currently only consists of stubs on sparc64).
- atkbdc(4) and atkbdc(4) attach, no further testing done.
- fdc(4) itself attaches but causes a hang while attaching fd0 also
when is DMA disabled, further work in fdc(4) is required here as e.g.
fd0 uses the address of fd1 on sparc64 (not sure if sparc64 supports
more than one floppy drive at all).
All of these drivers previously caused panics in the sparc64 ISA code.
- Minor changes, e.g. use __FBSDID, remove a dupe word in a comment and
declare one global variable which isn't used outside of isa.c static.
o dev/uart/uart_cpu_sparc64.c and modules/uart/Makefile:
- Remove the code for registering the UARTs on the ISA bus from the
sparc64 uart_cpu_identify() again and rely on probing them via PnP.
Original idea by: tmm [1]
No objections by: tmm [1], [2]
I have in mind for the genclock interface):
- Recognize the MK48T18 as well (differs from the MK48T08 only in
packaging options and voltages).
- Allow MD code to provide functions for reading/writing NVRAM/RTC
locations.
If passed NULL, the old behaviour using bus_space_{read,write}_1() is
used. Otherwise, all access to the chip goes via the MD functions.
This is necessary for mvmeppc boards where the mk48txx NVRAM/RTC is
not directly addressable.
- Cleanup MI mk48txx(4) todclock driver:
- Prepare mk48txxvar.h and leave only register definitions in
mk48txxreg.h.
- Define struct mk48txx_softc as usual devices and allocate necessary
members in it.
- Change mk48txx_attach() to only take a device_t.
o While converting the sparc64 eeprom driver to the above changes:
- Remove some dead code and stale comments.
- Use the NVRAM size provided by the mk48txx driver instead of hardcoding
it as suggested by a comment.
- Add a comment about why it doesn't make much sense to read the hostid
directly from the NVRAM except for displaying it when attaching.
- Don't print the hostid if it reads all zero because it's stored
elsewhere.
Change the spelling of the "catch" option to be consistent with the new
options. Implement the "no wait" option. An implementation of the "CPU
private" for i386 will be committed at a later date.
need for most calls to vm_page_busy(). Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.
This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set. Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition. Typically,
this is when it is undergoing I/O.
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.
MFC after: 1 month
Inspired by: kris
on UltraSPARC workstations. The driver is based on OpenBSD's SBus
cs4231 driver and heavily modified to incorporate into sound(4)
infrastructure. Due to the lack of APCDMA documentation, the DMA
code of SBus cs4231 came from OpenBSD's driver.
The driver runs without Giant lock and supports both SBus and EBus
based CS4231 audio controller. Special thanks to marius for providing
feedbacks during the driver writing. His feedback made it possible
to write hiccup free playback code under high system loads.
Approved by: jake (mentor)
Reviewed by: marius (initial version)
Tested by: marius, kwm, Julian C. Dunn(jdunn AT opentrend DOT net)
be used to announce various system activity.
The auxio device provides auxiliary I/O functions and is found on various
SBus/EBus UltraSPARC models. At present, only front panel LED is
controlled by this driver.
Approved by: jake (mentor)
Reviewed by: joerg
Tested by: joerg
MP machines (hopefully). CPU timers are OK on UP machines but we
don't keep the timers in sync on MP machines so if the CPU's timer
is chosen as the primary timecounter it's possible for time to
not be monotonically increasing because different CPU's counters
may be used at different times. But the CPU's counters are otherwise
one of the higher quality counters available. So, on UP machines
we'll use a relatively high quality value but on MP machines we'll
use a quality that should prevent the CPU's counters from being chosen.
Requested by: green (who did the first version of the patch)
Reviewed by: marius, green
MFC after: 1 week
the structure space had been obtained from malloc() its contents is
random garbage. The choice of value being set is part of a larger effort
to solve some timecounter issues on MP machines (while working on that
we noticed this problem).
Noticed by: marius
Reviewed by: marius, green
MFC after: 3 days
longer than 'normal'. The cause is still being tracked down but
in the meantime there are machines where raising IPI_RETRIES does
help - it's not just a case of the machine staying locked up longer
and then panic-ing anyway. Several helpful folks on sparc64@ tried
a patch that helped figure out what to raise this number to.
Discussed on: sparc64@
MFC after: 3 days
MAXWIN to the register window manipulation functions - rwindow_load()
calls rwindow_save() so this one addition should take care of both.
This should help find places that pcb_nsaved doesn't get initialized
properly.
Suggested by: jake
for the new thread. The rest of the fields in the pcb wind up being
written to before they're read as a normal part of the pcb usage but
this field may be read upon return to userland, having it be uninitialized
garbage is bad.
Submitted by: Andrew Belashov (bel at orel dot ru)
Reviewed by: jake
MFC after: 3 days
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.
MFC after: 1 month
Tested on: i386, alpha, sparc64
example the maximum segment size is 64K while the boundary is set
to 8K due to controller limitations. It is impossible to NOT cross
the boundary for any segment size that's larger than the boundary.
So, once we inherited the boundary from the parent tag, make sure
to reduce the maximum segment size to the boundary if it was larger.
MT5 candidate.
and was propagated to nearly every platform. The boundary of the child needs
to consider the boundary of the parent and pick the minimum of the two, not
the maximum. However, if either is 0 then pick the appropriate one.
This bug was exposed by a recent change to ATA, which should now be fixed by
this change. The alignment and maxsegsz tag attributes likely also need
a similar review in the near future.
This is a MT5 candidate.
Reviewed by: marcel
Submitted by: sos (in part)
It can be switched back once 5.3 is tested and released. Also turn on
PREEMPTION as many of the stability problems with it have been fixed.
MT5: 3 days.
but with slightly cleaned up interfaces.
The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.
The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.
Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.
Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.
The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.
A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.
Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.
Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.
MFC after: 3 days
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.
MFC after: 2 days
these two reasons:
1. On ia64 a function pointer does not hold the address of the first
instruction of a functions implementation. It holds the address
of a function descriptor. Hence the user(), btrap(), eintr() and
bintr() prototypes are wrong for getting the actual code address.
2. The logic forces interrupt, trap and exception entry points to
be layed-out contiguously. This can not be achieved on ia64 and is
generally just bad programming.
The MCOUNT_FROMPC_USER macro is used to set the frompc argument to
some kernel address which represents any frompc that falls outside
the kernel text range. The macro can expand to ~0U to bail out in
that case.
The MCOUNT_FROMPC_INTR macro is used to set the frompc argument to
some kernel address to represent a call to a trap or interrupt
handler. This to avoid that the trap or interrupt handler appear to
be called from everywhere in the call graph. The macro can expand
to ~0U to prevent adjusting frompc. Note that the argument is selfpc,
not frompc.
This commit defines the macros on all architectures equivalently to
the original code in sys/libkern/mcount.c. People can take it from
here...
Compile-tested on: alpha, amd64, i386, ia64 and sparc64
Boot-tested on: i386
compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.
If no hooks are connected the entire packet filter hooks section and related
activities are jumped over. This removes any performance impact if no hooks
are active.
Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
- Add some commented out NICs from i386 GENERIC. Most of them look like they
would work but I'm not sure if they are endian-clean and can't test. There
was a report that sk(4) works on sparc64 but it doesn't look like it would
because it doesn't use busdma.
- Improve some of the descriptions of sparc64 specific devices.
There's no functional change, i.e. no added or deleted uncommented devices or
options, in this commit.
- Chase the split of pcm(4). This unbreaks LINT compiles.
- sc(4) basically works and a lot of its options should be supported.
- Add the creator and ofw_console drivers.
- vinum(4) should work, at least its module was turned on for sparc64 a while
ago.
- Don't build sio(4). Its EBus front-end was removed a while ago and the ISA
one hardly works. Use uart(4) instead, it's not perfect yet but works much
better.
variable. If set to "true" OF_getetheraddr() will now return the unique
MAC address stored in the "local-mac-address" property of the device's
OFW node if present and the host address/system default MAC address if
the node doesn't doesn't have such a property. If set to "false" the
host address will be returned for all devices like before this change.
This brings the behaviour of device drivers for NICs with OFW support/
FCode, i.e. dc(4) for on-board DM9102A on Sun machines, gem(4) and hme(4),
regarding "local-mac-address?" in line with NetBSD and Solaris.
The man pages of the respective drivers will be updated separately to
reflect this change.
- Remove OF_getetheraddr2() which was used as a stopgap in dc(4). Its
functionality is now part of OF_getetheraddr().
subset ("compatible", "device_type", "model" and "name") of the standard
properties in drivers for devices on Open Firmware supported busses. The
standard properties "reg", "interrupts" und "address" are not covered by
this interface because they are only of interest in the respective bridge
code. There's a remaining standard property "status" which is unclear how
to support properly but which also isn't used in FreeBSD at present.
This ofw_bus kobj-interface allows to replace the various (ebus_get_node(),
ofw_pci_get_node(), etc.) and partially inconsistent (central_get_type()
vs. sbus_get_device_type(), etc.) existing IVAR ones with a common one.
This in turn allows to simplify and remove code-duplication in drivers for
devices that can hang off of more than one OFW supported bus.
- Convert the sparc64 Central, EBus, FHC, PCI and SBus bus drivers and the
drivers for their children to use the ofw_bus kobj-interface. The IVAR-
interfaces of the Central, EBus and FHC are entirely replaced by this. The
PCI bus driver used its own kobj-interface and now also uses the ofw_bus
one. The IVARs special to the SBus, e.g. for retrieving the burst size,
remain.
Beware: this causes an ABI-breakage for modules of drivers which used the
IVAR-interfaces, i.e. esp(4), hme(4), isp(4) and uart(4), which need to be
recompiled.
The style-inconsistencies introduced in some of the bus drivers will be
fixed by tmm@ in a generic clean-up of the respective drivers later (he
requested to add the changes in the "new" style).
- Convert the powerpc MacIO bus driver and the drivers for its children to
use the ofw_bus kobj-interface. This invloves removing the IVARs related
to the "reg" property which were unused and a leftover from the NetBSD
origini of the code. There's no ABI-breakage caused by this because none
of these driver are currently built as modules.
There are other powerpc bus drivers which can be converted to the ofw_bus
kobj-interface, e.g. the PCI bus driver, which should be done together
with converting powerpc to use the OFW PCI code from sparc64.
- Make the SBus and FHC front-end of zs(4) and the sparc64 eeprom(4) take
advantage of the ofw_bus kobj-interface and simplify them a bit.
Reviewed by: grehan, tmm
Approved by: re (scottl)
Discussed with: tmm
Tested with: Sun AX1105, AXe, Ultra 2, Ultra 60; PPC cross-build on i386
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.
Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc
Implement the protection check required by the pmap_extract_and_hold()
specification.
Remove the acquisition and release of Giant from pmap_extract_and_hold() and
pmap_protect().
Many thanks to Ken Smith for resolving a sparc64-specific initialization
problem in my original patch.
Tested by: kensmith@
being defined, define and use a new MD macro, cpu_spinwait(). It only
expands to something on i386 and amd64, so the compiled code should be
identical.
Name of the macro found by: jhb
Reviewed by: jhb
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.
those architectures without pmap locking.
- Eliminate the acquisition and release of Giant from vm_map_protect().
(Translation: mprotect(2) runs to completion without touching Giant on
alpha, amd64, i386 and ia64.)
dereference curthread. It is called only from critical_{enter,exit}(),
which already dereferences curthread. This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.
Head nodding: jhb, bmilekic
the thread ID and call db_trace_thread().
Since arm has all the logic in db_stack_trace_cmd(), rename the
new DB_COMMAND function to db_stack_trace to avoid conflicts on
arm.
While here, have db_stack_trace parse its own arguments so that
we can use a more natural radix for IDs. If the ID is not a thread
ID, or more precisely when no thread exists with the ID, try if
there's a process with that ID and return the first thread in it.
This makes it easier to print stack traces from the ps output.
requested by: rwatson@
tested on: amd64, i386, ia64
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.
The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.
With this change, ia64 has support for breakpoints.
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Remove implementation of Debugger().
o Check kdb_active instead of db_active.
o Call kdb_trap() according to the new world order.
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.
in which multiple (presumably different) debugger backends can be
configured and which provides basic services to those backends.
Besides providing services to backends, it also serves as the single
point of contact for any and all code that wants to make use of the
debugger functions, such as entering the debugger or handling of the
alternate break sequence. For this purpose, the frontend has been
made non-optional.
All debugger requests are forwarded or handed over to the current
backend, if applicable. Selection of the current backend is done by
the debug.kdb.current sysctl. A list of configured backends can be
obtained with the debug.kdb.available sysctl. One can enter the
debugger by writing to the debug.kdb.enter sysctl.
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switch to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.
This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.
Approved by: scottl (with his re@ hat)
than a a stack-limited list. This removes the artifical limit on s/g list
size.
cvs: ----------------------------------------------------------------------
Ultra2 users may want to set OFWCONS_POLL_HZ to a value of '20'.
I have left default value at '4' as higher values can consume a more
than is acceptable amount of CPU, and we don't have a consensus yet
what is an optimal value.
Submitted by: Pyun YongHyeon <yongari@kt-is.co.kr>
its primary use is for the FEPS/FAS366 SCSI found in Sun Ultra 1e and 2
machines. Once the pci front-end is ported, this driver can replace the
amd(4) driver.
The code as-is is fairly stable. I've disabled tagged-queueing until I can
figure out a corruption bug related to it. I'm importing it now so that
people with these machines can (finally) stop netbooting and report bugs
before 5.3.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
struct vmspace is freed from cpu_sched_exit() to pmap_release().
This has the advantage of being able to rely on MI code to decide
when a free should occur, instead of having to inspect the reference
count ourselves.
At the same time, turn the per-CPU vmspace pointer into a pmap pointer,
so that pmap_release() can deal with pmaps exclusively.
Reviewed (and embrassing bug spotted) by: jake
to <sys/gmon.h>. Cleaned them up a little by not attempting to ifdef
for incomplete and out of date support for GUPROF in userland, as in
the sparc64 version.