1057 Commits

Author SHA1 Message Date
nyan
2ae09c9ace Implement the bus_space_map() function to allocate resources and initialize
a bus_handle, but currently it does only initializing a bus_handle.
2003-09-23 08:22:34 +00:00
marcel
db44a810da Fix the last remaining problem encountered by KSE: apparently it is
not guaranteed that the RSE writes the NaT collection immediately,
sort of atomically, to the backing store when it writes the register
immediately prior to the NaT collection point. This means that we
cannot assume that the low 9 bits of the backingstore pointer do not
point to the NaT collection. This is rather a surprise and I don't
know at this time if it's a bug in the Merced or that it's actually
a valid condition of the architecture. A quick scan over the sources
does not indicate that we depend on the false assumption elsewhere,
but it's something to keep in mind.

The fix is to write the saved contents of the ar.rnat register to
the backingstore prior to entering the loop that copies the dirty
registers from the kernel stack to the user stack.
2003-09-20 20:34:58 +00:00
marcel
10c85423c8 Move uma_small_alloc() and uma_small_free() to uma_machdep.c. These
functions reference UMA internals from <vm/uma_int.h>, which makes
them highly unwanted in non-UMA specific files.

While here, prune the includes in pmap.c and use __FBSDID(). Move
the includes above the descriptive comment.

The copyright of uma_machdep.c is assigned to the project and can
be reassigned to the foundation if and when when such is preferrable.
2003-09-20 19:27:48 +00:00
marcel
901f7b2cb7 Fix the most significant KSE breakage caused by not restoring the
restart instruction bits in the PSR. As such, we were returning
from interrupt to the instruction in the bundle that caused us
to enter the kernel, only now we're returning to a completely
different bundle.

While close here: add two KASSERTs to make sure that we restore
sync contexts only when entered the kernel through a syscall and
restore an async context only when entered the kernel through an
interrupt, trap or fault.

While not exactly here, but close enough: use suword64() when we
copy the dirty registers from the kernel stack to the user stack.
The code was intended to be be replaced shortly after being added,
but that was a couple of weeks ago. I might as well avoid that it
is a source for panics until it's replaced.
2003-09-19 22:51:26 +00:00
marcel
77b076f87c Revamp trap(): make it more explicit which kinds of traps/faults we
can get (or not) and what we do with them. This fixes the behaviour
for NaT consumption and speculation faults in that we now don't panic
for user faults.

Remove the dopanic label and move the code to a function. This makes
it easier in the simulator to set a breakpoint.

While here, remove the special handling of the old break-based syscall
path and move it to where we handle the break vector. While here,
reserve a new break immediate for KSE. We currently use the old break-
based syscall to deal with restoring async contexts. However, it has
the side-effect of also setting the signal mask and callong ast() on
the way out. The new break immediate simply restores the context and
returns without calling ast().
2003-09-19 22:41:52 +00:00
marcel
758f95abc6 Change TRAPF_USERMODE and CLOCKF_USERMODE to not test for CPL == 3,
but for CPL != 0. For some reason yet unknown it is possible for the
CPL to be 2. This would previously be counted as kernel mode, which
resulted in nasty panics. By changing the test it is now treated as
user mode, which is more correct. We still need to figure out how it
is possible that the privilege level can be 2 (or 1 for that matter),
because it's not used by us. We only use 3 (user mode) and 0 (kernel
mode).
2003-09-19 07:48:22 +00:00
marcel
4ab0c8936b Include "opt_kstack_pages.h". We export KSTACK_PAGES to assembly and
better have the right value.
2003-09-19 00:37:41 +00:00
alc
76fcb264a0 Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from:	tegge
2003-09-12 07:07:49 +00:00
marcel
6dce2e8fe5 Rewrite the SAPIC initialization to always program the RTEs with what
we think is the correct trigger mode and polarity. This allows us to
implement BUS_CONFIG_INTR() as an update of the RTE in question.
Consequently, we can trust the RTE when we enable an interrupt and
avoids that we need to know about the trigger mode and polarity at
that time.
2003-09-10 22:49:38 +00:00
jhb
5fcbea6e19 Move the definitions for ACPI MADT table entries not present in the ACPICA
distribution to a MI header so it can be shared with other architectures.
2003-09-10 06:32:27 +00:00
marcel
bdbd90ef2b Introduce IA64_ID_PAGE_{MASK|SHIFT|SIZE} and LOG2_ID_PAGE_SIZE. The
latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn
determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE.

The constants are used instead of the literal hardcoding (in its
various forms) of the size of the direct mappings created in region
6 and 7. The default and probably only workable size is still 256M,
but for kicks we use 128M for LINT.
2003-09-09 05:59:09 +00:00
alc
a81d9ad0b9 Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address.  Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by:	tegge
2003-09-08 02:45:03 +00:00
wpaul
ce0ede96f1 Take the support for the 8139C+/8169/8169S/8110S chips out of the
rl(4) driver and put it in a new re(4) driver. The re(4) driver shares
the if_rlreg.h file with rl(4) but is a separate module. (Ultimately
I may change this. For now, it's convenient.)

rl(4) has been modified so that it will never attach to an 8139C+
chip, leaving it to re(4) instead. Only re(4) has the PCI IDs to
match the 8169/8169S/8110S gigE chips. if_re.c contains the same
basic code that was originally bolted onto if_rl.c, with the
following updates:

- Added support for jumbo frames. Currently, there seems to be
  a limit of approximately 6200 bytes for jumbo frames on transmit.
  (This was determined via experimentation.) The 8169S/8110S chips
  apparently are limited to 7.5K frames on transmit. This may require
  some more work, though the framework to handle jumbo frames on RX
  is in place: the re_rxeof() routine will gather up frames than span
  multiple 2K clusters into a single mbuf list.

- Fixed bug in re_txeof(): if we reap some of the TX buffers,
  but there are still some pending, re-arm the timer before exiting
  re_txeof() so that another timeout interrupt will be generated, just
  in case re_start() doesn't do it for us.

- Handle the 'link state changed' interrupt

- Fix a detach bug. If re(4) is loaded as a module, and you do
  tcpdump -i re0, then you do 'kldunload if_re,' the system will
  panic after a few seconds. This happens because ether_ifdetach()
  ends up calling the BPF detach code, which notices the interface
  is in promiscuous mode and tries to switch promisc mode off while
  detaching the BPF listner. This ultimately results in a call
  to re_ioctl() (due to SIOCSIFFLAGS), which in turn calls re_init()
  to handle the IFF_PROMISC flag change. Unfortunately, calling re_init()
  here turns the chip back on and restarts the 1-second timeout loop
  that drives re_tick(). By the time the timeout fires, if_re.ko
  has been unloaded, which results in a call to invalid code and
  blows up the system.

  To fix this, I cleared the IFF_UP flag before calling ether_ifdetach(),
  which stops the ioctl routine from trying to reset the chip.

- Modified comments in re_rxeof() relating to the difference in
  RX descriptor status bit layout between the 8139C+ and the gigE
  chips. The layout is different because the frame length field
  was expanded from 12 bits to 13, and they got rid of one of the
  status bits to make room.

- Add diagnostic code (re_diag()) to test for the case where a user
  has installed a broken 32-bit 8169 PCI NIC in a 64-bit slot. Some
  NICs have the REQ64# and ACK64# lines connected even though the
  board is 32-bit only (in this case, they should be pulled high).
  This fools the chip into doing 64-bit DMA transfers even though
  there is no 64-bit data path. To detect this, re_diag() puts the
  chip into digital loopback mode and sets the receiver to promiscuous
  mode, then initiates a single 64-byte packet transmission. The
  frame is echoed back to the host, and if the frame contents are
  intact, we know DMA is working correctly, otherwise we complain
  loudly on the console and abort the device attach. (At the moment,
  I don't know of any way to work around the problem other than
  physically modifying the board, so until/unless I can think of a
  software workaround, this will have do to.)

- Created re(4) man page

- Modified rlphy.c to allow re(4) to attach as well as rl(4).

Note that this code works for the sample 8169/Marvell 88E1000 NIC
that I have, but probably won't work for the 8169S/8110S chips.
RealTek has sent me some sample NICs, but they haven't arrived yet.
I will probably need to add an rlgphy driver to handle the on-board
PHY in the 8169S/8110S (it needs special DSP initialization).
2003-09-08 02:11:25 +00:00
marcel
ae81d949d1 Untangle the code in this file to improve understandability. Both
ia64_count_cpus() and ia64_probe_sapics() called a single function
to do the the actual work. The difference in behaviour was handled
in that function and was further complicated by adding bootverbose
related code. As such, even the simplest of changes was hard to
comprehend.

Untangling has been done by increasing code duplication and using
a more naive style of coding. FWIW, the object file is slightly
smaller than before, so things aren't as bad as it may seem.

Triggered by: a simple fix on the P4 branch that never got merged.
2003-09-07 23:09:08 +00:00
alc
e443888faa MFamd64/i386
Add necessary page locking to pmap_mincore().
2003-09-07 20:02:38 +00:00
marcel
f7a2b4ef59 MFp4: Revamped GENERIC (and hints). This is some much more pleasant
to look at...
2003-09-07 06:39:51 +00:00
marcel
d45904e351 Replace sio(4) with uart(4). Remove the sio(4) hints and only add
those hints used by uart(4) for the determination of the serial
console in the absence of the HCDP table.
2003-09-07 05:47:10 +00:00
marcel
b7c5acb1f2 Fix a place where I forgot to change the code that checks whether
we return to kernel or userland. This triggered a panic in a KSE
application when TDF_USTATCLOCK was set in the case userland was
interrupted, but we never called ast() on our way out. As such,
we called ast() at some other time. Unfortunately, TDF_USTATCLOCK
handling assumes running in the interrupt thread. This was not
the case anymore.

To avoid making the same mistake later, interrupt() now returns
to its caller whether we interrupted userland or not. This avoids
that we have to duplicate the check in assembly, where it's bound
to fall off the scope. Now we simply check the return value and
call ast() if appropriate.

Run into this: davidxu
2003-09-05 22:50:10 +00:00
marcel
0d2e39083f Use pmap_steal_memory() for the msgbuf instead of trying to squeeze
it in the last chunk (phys_avail block). The last chunk very often is
not larger than one or two pages, resulting in a msgbuf that's too
small to hold a complete verbose boot.
Note that pmap_steal_memory() will bzero the memory it "allocates".
Consequently, ia64 will never preserve previous msgbufs. This is not
a noticable difference in practice. If the msgbuf could be reused,
it was invariably too small to have anything preserved anyway.
2003-09-01 07:06:57 +00:00
marcel
f11904081f Use direct mapped KVA for the sf_buf allocator, as made possible
by the previous commit. While here, fix a typo, reformat comments
and fix a long line.

Tested with: ftpd
2003-09-01 00:12:27 +00:00
alc
8b0114def1 Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy
sockets into machine-dependent files.  The rationale for this
migration is illustrated by the modified amd64 allocator.  It uses the
amd64's direct map to avoid emphemeral mappings in the kernel's
address space.  On an SMP, the emphemeral mappings result in an IPI
for TLB shootdown for each transmitted page.  Yuck.

Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.
2003-08-29 20:04:10 +00:00
njl
638644189e Minor style cleanups. 2003-08-28 16:30:31 +00:00
marcel
d6144c3ba3 Change LOG2_PAGE_SIZE from 14 to 15 bits. This will cause the CTASSERT
in vm_page.h to be reached and thus slightly increases the overall
coverage of LINT on ia64.
2003-08-25 20:02:18 +00:00
marcel
4270721f5e Add the bits for a LINT kernel. It has been verified to compile. We
may need to polish this.
2003-08-23 21:47:33 +00:00
marcel
0a0d516cca Remove PAGE_SIZE_4K, PAGE_SIZE_8K and PAGE_SIZE_16K and replace them
with LOG2_PAGE_SIZE. A single option is better to LINT than multiple
mutual exclusive ones.
2003-08-23 03:39:55 +00:00
marcel
3d03264769 Remove unused inclusion of opt_acpi.h 2003-08-23 00:07:52 +00:00
jhb
bcd3074e53 Regen. 2003-08-21 14:16:41 +00:00
jhb
c33688e027 Swap sigaction/sigreturn since they are in the wrong order.
Noticed indirectly by:	peter
2003-08-21 14:16:00 +00:00
marcel
dd5e41ad29 Undo the mistake made in revision 1.77 of trap.c and which was the
ultimate trigger for the follow-up fixes in revisions 1.78, 1.80,
1.81 and 1.82 of trap.c. I was simply too pre-occupied with the
gateway page and how it blurs kernel space with user space and
vice versa that I couldn't see that it was all a load of bollocks.

It's not the IP address that matters, it's the privilege level that
counts. We never run in user space with lifted permissions and we
sure can not run in kernel space without it. Sure, the gateway page
is the exception, but not if you look at the privilege level. It's
user space if you run with user permissions and kernel space otherwise.

So, we're back to looking at the privilege level like it should be.
There's no other way.

Pointy hat: marcel
2003-08-20 05:30:35 +00:00
gordon
2456eb188f Fixup the ELF branding information to point to the new home of rtld. 2003-08-17 08:08:38 +00:00
marcel
4194d813c1 In vm_thread_swap{in|out}(), remove the alpha specific conditional
compilation and replace it with a call to cpu_thread_swap{in|out}().
This allows us to add similar code on ia64 without cluttering the
code even more.
2003-08-16 23:15:15 +00:00
marcel
c1d4b42a69 Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.

ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).

alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.

powerpc: MD prototypes left in cpu.h. Comment added.

Suggested by: bde
Tested with: make universe (pc98 incomplete)
2003-08-16 16:57:57 +00:00
marcel
0cde071e2f Fix a range check bug. Don't left-shift the integer argument 'data'.
Sign extension happens after the shift, not before so that boundary
cases like 0x40000000 will not be caught properly.
Instead, right shift ndirty. It is guaranteed to be a multiple of 8.
While here, do some manual code motion and code commoning.

Range check bug pointed out by: iedowse
2003-08-16 01:49:38 +00:00
marcel
cae6951d00 Fix the generation of coredumps. We did not take the dirty registers
that were on the kernel stack into account. For now we write them
out to the register stack of the process before creating the dump.
This however is not the final solution. The problem is that we may
invalidate the coredump by overwriting vital information due to an
invalid backing store pointer. Instead we need to write the dirty
registers to an unused region of VM which will result in a seperate
segment in the coredump. For now we can at least get to all the
registers from a coredump.
2003-08-15 05:52:48 +00:00
marcel
5fbc98d240 Add an instruction group break after the move to application register
and the move to control register to avoid dependency violations when
these functions are used. Note that explicit data and instruction
serialization also need to be in a subsequent instruction group.
This too requires that we have an igrp break here.
2003-08-15 05:46:33 +00:00
marcel
b37a3e34cd Introduce two machine specific ptrace(2) requests: PT_GETKSTACK and
PT_SETKSTACK. These requests allow the tracing process to access the
dirty registers of the traced process that are on the kernel stack.

Note that there's currently no way to access the rnat register for
those dirty registers that are not (yet) covered by a nat collection
point. The interface for this is still being slept on.

Also note that implied by these requests is the division of work:
The tracing process has to keep track of where registers are spilled
and is responsible to figure out where the NaT bit of the stacked
registers are at any time during the execution of the traced process.
The kernel provides the interfaces but will not abstract the fact
that the register stack can be split. This model does not follow
the approach taken in Linux where PT_PEEK and PT_POKE deals with
this automagically.
2003-08-15 05:40:59 +00:00
marcel
ec1e7cccba Don't use VM_MIN_KERNEL_ADDRESS to check if the faulting address is
in user space or kernel space. VM_MIN_KERNEL_ADDRESS starts after the
gateway page, which means that improper memory accesses to the gateway
page while in user mode would panic the kernel. Use VM_MAX_ADDRESS
instead. It ends before the gateway page. The difference between
VM_MIN_KERNEL_ADDRESS and VM_MAX_ADDRESS is exactly the gateway page.
2003-08-13 03:20:10 +00:00
marcel
ad2c7b6428 Put an instruction group break between the move to ar.rnat and the
move to ar.rsc. The RSE must be in enforced lazy mode when writing
to RSE modifyable registers. In this case we restore the RSE NaT
collection register ar.rnat. I have seen 2 general exception faults
on pluto1 now that indicate that the move to ar.rsc has already
happened prior to the move to ar.rnat, meaning that the RSE is not
in enforced lazy mode anymore. The ia64 dependency and instruction
ordering rules seem to allow having both registers written to in
the same instruction group, provided ar.rsc is written to later than
ar.rnat (based on the ordering semantics). It appears that we may
be pushing our luck. For now, put them in seperate cycles (by means
of the instruction group break). If we ever get a general exception
fault on the move to ar.rnat again, we have definite proof that
something else is fishy.
2003-08-13 02:49:50 +00:00
imp
3bc162cfa3 Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's
copyrighted files.

Approved by: Matt Dillon
2003-08-12 23:24:05 +00:00
marcel
cb02f3b09a Extend identifycpu():
o  Differentiate between CPU family and CPU model. There are multiple
   Itanium 2 models and it's nice to differentiate between them.
o  Seperately export the CPU family and CPU model with sysctl.
o  Merced is the only model in the Itanium family.
o  Add Madison to the Itanium 2 family. We already knew about McKinley.
o  Print the CPU family between parenthesis, like we do with the i386
   CPU class.

My prototype now identifies itself as:
	CPU: Merced (800.03-Mhz Itanium)

pluto1 and pluto2 will eventually identify themselves as:
	CPU: McKinley (900.00-Mhz Itanium 2)
2003-08-12 08:10:16 +00:00
marcel
eb75f40ad2 Cleanup prototypes in cpu.h, including fswintrberr and any references
to it. Sort the remaining prototypes in cpu.h.

No functional change.
2003-08-12 03:51:53 +00:00
marcel
fd356a4423 Cleanup and style(9) fixes. No functional change. 2003-08-11 21:25:19 +00:00
marcel
37e1a74113 o move cpu_reset() from vm_machdep.c to machdep.c.
o reorder cpu_boot(), cpu_halt() and identifycpu().

No functional change.
2003-08-10 21:33:07 +00:00
marcel
07045a57cf Now that we can ignore up to 8KB of dirty registers, remove the RSE
magic from exec_setregs(). In set_mcontext() we now also don't have
to worry that we entered the kernel with more that 512 bytes of
dirty registers on the kernel stack. Note that we cannot make any
assumptions anymore WRT to NaT collection points in exec_setregs(),
so we have to deal with them now.
2003-08-10 08:04:21 +00:00
marcel
c39e23c83d MFi386 1.422 & 1.423: lock page queues in pmap_insert_entry(). 2003-08-08 00:30:26 +00:00
jhb
37641f86f1 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
marcel
139a0b455d Better define the flags in the mcontext_t and properly set the flags
when we create contexts. The meaning of the flags are documented in
<machine/ucontext.h>. I only list them here to help browsing the
commit logs:
	_MC_FLAGS_ASYNC_CONTEXT
	_MC_FLAGS_HIGHFP_VALID
	_MC_FLAGS_KSE_SET_MBOX
	_MC_FLAGS_RETURN_VALID
	_MC_FLAGS_SCRATCH_VALID

Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)
2003-08-07 07:52:39 +00:00
marcel
897a96b736 o Fix cut-n-paste whitespace corruption in previous commit
o  For trap-based upcalls the argument (the kse_mailbox) to
   the UTS must be written onto the kernel stack, not the
   user stack. While here, deal with the fact that we may
   be at a NaT collection point.
2003-08-07 07:40:19 +00:00
marcel
023428afce In cpu_set_upcall_kse(), create the upcall according to the entry
path into the kernel. Normally it's due to a syscall, but one can
also be created as the result of a clock interrupt (for example).
This now even more looks like exec_setregs().

While here, add an assert that we don't expect more than 8KB of
dirty registers on the kernel stack.
2003-08-06 23:28:19 +00:00
marcel
f8309da488 o In revision 1.45 of exception.S we changed exception_restore to
unconditionally restore ar.k7 (kernel memory stack) and ar.k6
   (kernel register stack). I don't know what I was smoking then,
   but if you unconditionally restore ar.k6, you also want to
   compute its value unconditionally. By having the computation
   predicated and dependent on whether we return to user mode, we
   would end up writing junk (= invalid value for ar.bspstore) if
   we would return to kernel mode. But the whole point of the
   unconditional restoration was that there is a grey area where
   we still need to have ar.k6 restored. If we restore with a junk
   value, we would end up wedging the machine on the next interrupt.
   So, unconditionally calculate the value we unconditionally write
   to ar.k6.

o  The previous braino was found while making the following change:
   We used to clear the lower 9 bits of the value we write to ar.k6.
   The meaning being that we know that the kernel register stack is
   at least 512 byte aligned and simply clearing the lower 9 bits
   allows us to return to a context of which we don't have dirty
   registers on the kernel stack, even though the context that
   entered the kernel does have dirty registers on the kernel stack.
   By masking-off the lower bits, we correctly obtain the base of
   the register stack without having to worry that we didn't actually
   reached the base while unwinding it.
   The change is to mask off the lower 13 bits, knowing that the
   kernel register stack is always 8KB aligned. The advantage is that
   we don't have to worry anymore if there's more than 512 bytes of
   dirty registers on the kernel stack. A situation that frequently
   occurs. In exec_setregs() in machdep.c:1.147 or older, we had to
   deal with that situation by copying the active portion of the
   register stack down in multiples of 512 bytes. Now that we mask off
   the lower 13 bits we don't have to do that at all. Contemporary
   IPF processors have a register file that can hold up to 96 stacked
   registers (=784 bytes [incl. 2 NaT collections]). With no indication
   that register files grow beyond a couple of hundred registers, we
   should not have to worry about it anymore... and yes, 640KB is
   enough for everybody :-)
   This change helps setcontext(2) and cpu_set_upcall_kse() in that
   they can return to completely different contexts without having to
   mess with the kernel stack. Of course exec_setregs() doesn't need
   to do that anymore as well.
2003-08-06 21:32:38 +00:00