freebsd-dev

Author	SHA1	Message	Date
marcel	8b065d9ba6	Swap the syscall caller frame info (i.e. the return pointer and frame marker) and the syscall stub frame info in the trap frame. Previously we stored the stub frame info in (rp,pfs) and the caller frame info in (iip,cfm). This ends up being suboptimal for the following reasons: 1. When we create a new context, such as for an execve(2), we had to set the (rp,pfs) pair for the entry point when using the syscall path out of the kernel but we need to set the (iip,cfm) pair when we take the interrupt way out. This is mostly just an inconsistency from the kernel's point of view, but an ugly irregularity from gdb(1)'s point of view. 2. The getcontext(2) and setcontext(2) syscalls had to swap the (rp,pfs) and (iip,cfm) pairs to make the context compatible with one created purely in userland. Swapping the (rp,pfs) and (iip,cfm) pairs is visible to signal handlers that actually peek at the mcontext_t and to gdb(1). Since this change is made for gdb(1) and we don't care about signal handlers that peek at the mcontext_t because we're still a tier 2 platform, this ABI breakage is academic at this moment in time. Note that there was no real reason to save the caller frame info in (iip,cfm) and the stub frame info in (rp,pfs).	2003-10-03 03:50:29 +00:00
marcel	243eb29986	Drop any and all support for varargs. There's no history to worry about because we're still tier 2 and our current compiler, as well as future compilers will not support varargs. This is mostly a no-op in practice, because <sys/varargs.h> should already cause compile failures.	2003-09-28 05:34:07 +00:00
phk	a8a211223e	Set cn_name, not cn_dev	2003-09-26 10:37:16 +00:00
peter	420ccff7be	Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 hierarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.	2003-09-25 01:10:26 +00:00
nyan	3306aedb3e	Implement the bus_space_map() function to allocate resources and initialize a bus_handle, but currently it only initializes a bus_handle.	2003-09-23 08:22:34 +00:00
marcel	3cd5131d2b	Fix the last remaining problem encountered by KSE: apparently it is not guaranteed that the RSE writes the NaT collection immediately, sort of atomically, to the backing store when it writes the register immediately prior to the NaT collection point. This means that we cannot assume that the low 9 bits of the backingstore pointer do not point to the NaT collection. This is rather a surprise and I don't know at this time if it's a bug in the Merced or that it's actually a valid condition of the architecture. A quick scan over the sources does not indicate that we depend on the false assumption elsewhere, but it's something to keep in mind. The fix is to write the saved contents of the ar.rnat register to the backingstore prior to entering the loop that copies the dirty registers from the kernel stack to the user stack.	2003-09-20 20:34:58 +00:00
marcel	3c8a86653d	Move uma_small_alloc() and uma_small_free() to uma_machdep.c. These functions reference UMA internals from <vm/uma_int.h>, which makes them highly unwanted in non-UMA specific files. While here, prune the includes in pmap.c and use __FBSDID(). Move the includes above the descriptive comment. The copyright of uma_machdep.c is assigned to the project and can be reassigned to the foundation if and when such is preferable.	2003-09-20 19:27:48 +00:00
marcel	9716337ec6	Fix the most significant KSE breakage caused by not restoring the restart instruction bits in the PSR. As such, we were returning from interrupt to the instruction in the bundle that caused us to enter the kernel, only now we're returning to a completely different bundle. While close here: add two KASSERTs to make sure that we restore sync contexts only when we entered the kernel through a syscall and restore an async context only when we entered the kernel through an interrupt, trap or fault. While not exactly here, but close enough: use suword64() when we copy the dirty registers from the kernel stack to the user stack. The code was intended to be replaced shortly after being added, but that was a couple of weeks ago. I might as well avoid that it is a source for panics until it's replaced.	2003-09-19 22:51:26 +00:00
marcel	a55f3823b7	Revamp trap(): make it more explicit which kinds of traps/faults we can get (or not) and what we do with them. This fixes the behaviour for NaT consumption and speculation faults in that we now don't panic for user faults. Remove the dopanic label and move the code to a function. This makes it easier in the simulator to set a breakpoint. While here, remove the special handling of the old break-based syscall path and move it to where we handle the break vector. While here, reserve a new break immediate for KSE. We currently use the old break-based syscall to deal with restoring async contexts. However, it has the side-effect of also setting the signal mask and calling ast() on the way out. The new break immediate simply restores the context and returns without calling ast().	2003-09-19 22:41:52 +00:00
marcel	1b3cfdf8e6	Change TRAPF_USERMODE and CLOCKF_USERMODE to not test for CPL == 3, but for CPL != 0. For some reason yet unknown it is possible for the CPL to be 2. This would previously be counted as kernel mode, which resulted in nasty panics. By changing the test it is now treated as user mode, which is more correct. We still need to figure out how it is possible that the privilege level can be 2 (or 1 for that matter), because it's not used by us. We only use 3 (user mode) and 0 (kernel mode).	2003-09-19 07:48:22 +00:00
marcel	43cc2482fd	Include "opt_kstack_pages.h". We export KSTACK_PAGES to assembly and better have the right value.	2003-09-19 00:37:41 +00:00
alc	cf7c7842ae	Add a new parameter to pmap_extract_and_hold() that is needed to eliminate Giant from vmapbuf(). Idea from: tegge	2003-09-12 07:07:49 +00:00
marcel	42a246d920	Rewrite the SAPIC initialization to always program the RTEs with what we think is the correct trigger mode and polarity. This allows us to implement BUS_CONFIG_INTR() as an update of the RTE in question. Consequently, we can trust the RTE when we enable an interrupt and avoids that we need to know about the trigger mode and polarity at that time.	2003-09-10 22:49:38 +00:00
jhb	0678b02a19	Move the definitions for ACPI MADT table entries not present in the ACPICA distribution to a MI header so it can be shared with other architectures.	2003-09-10 06:32:27 +00:00
marcel	c41bc1d5c3	Introduce IA64_ID_PAGE_{MASK|SHIFT|SIZE} and LOG2_ID_PAGE_SIZE. The latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE. The constants are used instead of the literal hardcoding (in its various forms) of the size of the direct mappings created in region 6 and 7. The default and probably only workable size is still 256M, but for kicks we use 128M for LINT.	2003-09-09 05:59:09 +00:00
alc	e7c2643436	Introduce a new pmap function, pmap_extract_and_hold(). This function atomically extracts and holds the physical page that is associated with the given pmap and virtual address. Such a function is needed to make the memory mapping optimizations used by, for example, pipes and raw disk I/O MP-safe. Reviewed by: tegge	2003-09-08 02:45:03 +00:00
wpaul	5e79307cb8	Take the support for the 8139C+/8169/8169S/8110S chips out of the rl(4) driver and put it in a new re(4) driver. The re(4) driver shares the if_rlreg.h file with rl(4) but is a separate module. (Ultimately I may change this. For now, it's convenient.) rl(4) has been modified so that it will never attach to an 8139C+ chip, leaving it to re(4) instead. Only re(4) has the PCI IDs to match the 8169/8169S/8110S gigE chips. if_re.c contains the same basic code that was originally bolted onto if_rl.c, with the following updates: - Added support for jumbo frames. Currently, there seems to be a limit of approximately 6200 bytes for jumbo frames on transmit. (This was determined via experimentation.) The 8169S/8110S chips apparently are limited to 7.5K frames on transmit. This may require some more work, though the framework to handle jumbo frames on RX is in place: the re_rxeof() routine will gather up frames that span multiple 2K clusters into a single mbuf list. - Fixed bug in re_txeof(): if we reap some of the TX buffers, but there are still some pending, re-arm the timer before exiting re_txeof() so that another timeout interrupt will be generated, just in case re_start() doesn't do it for us. - Handle the 'link state changed' interrupt - Fix a detach bug. If re(4) is loaded as a module, and you do tcpdump -i re0, then you do 'kldunload if_re,' the system will panic after a few seconds. This happens because ether_ifdetach() ends up calling the BPF detach code, which notices the interface is in promiscuous mode and tries to switch promisc mode off while detaching the BPF listener. This ultimately results in a call to re_ioctl() (due to SIOCSIFFLAGS), which in turn calls re_init() to handle the IFF_PROMISC flag change. Unfortunately, calling re_init() here turns the chip back on and restarts the 1-second timeout loop that drives re_tick(). 
By the time the timeout fires, if_re.ko has been unloaded, which results in a call to invalid code and blows up the system. To fix this, I cleared the IFF_UP flag before calling ether_ifdetach(), which stops the ioctl routine from trying to reset the chip. - Modified comments in re_rxeof() relating to the difference in RX descriptor status bit layout between the 8139C+ and the gigE chips. The layout is different because the frame length field was expanded from 12 bits to 13, and they got rid of one of the status bits to make room. - Add diagnostic code (re_diag()) to test for the case where a user has installed a broken 32-bit 8169 PCI NIC in a 64-bit slot. Some NICs have the REQ64# and ACK64# lines connected even though the board is 32-bit only (in this case, they should be pulled high). This fools the chip into doing 64-bit DMA transfers even though there is no 64-bit data path. To detect this, re_diag() puts the chip into digital loopback mode and sets the receiver to promiscuous mode, then initiates a single 64-byte packet transmission. The frame is echoed back to the host, and if the frame contents are intact, we know DMA is working correctly, otherwise we complain loudly on the console and abort the device attach. (At the moment, I don't know of any way to work around the problem other than physically modifying the board, so until/unless I can think of a software workaround, this will have to do.) - Created re(4) man page - Modified rlphy.c to allow re(4) to attach as well as rl(4). Note that this code works for the sample 8169/Marvell 88E1000 NIC that I have, but probably won't work for the 8169S/8110S chips. RealTek has sent me some sample NICs, but they haven't arrived yet. I will probably need to add an rlgphy driver to handle the on-board PHY in the 8169S/8110S (it needs special DSP initialization).	2003-09-08 02:11:25 +00:00
marcel	273d98a6c6	Untangle the code in this file to improve understandability. Both ia64_count_cpus() and ia64_probe_sapics() called a single function to do the actual work. The difference in behaviour was handled in that function and was further complicated by adding bootverbose related code. As such, even the simplest of changes was hard to comprehend. Untangling has been done by increasing code duplication and using a more naive style of coding. FWIW, the object file is slightly smaller than before, so things aren't as bad as it may seem. Triggered by: a simple fix on the P4 branch that never got merged.	2003-09-07 23:09:08 +00:00
alc	2bc0aef39f	MFamd64/i386 Add necessary page locking to pmap_mincore().	2003-09-07 20:02:38 +00:00
marcel	38911e608f	MFp4: Revamped GENERIC (and hints). This is so much more pleasant to look at...	2003-09-07 06:39:51 +00:00
marcel	6f76fc4fad	Replace sio(4) with uart(4). Remove the sio(4) hints and only add those hints used by uart(4) for the determination of the serial console in the absence of the HCDP table.	2003-09-07 05:47:10 +00:00
marcel	5b6604fa58	Fix a place where I forgot to change the code that checks whether we return to kernel or userland. This triggered a panic in a KSE application when TDF_USTATCLOCK was set in the case userland was interrupted, but we never called ast() on our way out. As such, we called ast() at some other time. Unfortunately, TDF_USTATCLOCK handling assumes running in the interrupt thread. This was not the case anymore. To avoid making the same mistake later, interrupt() now returns to its caller whether we interrupted userland or not. This avoids that we have to duplicate the check in assembly, where it's bound to fall off the scope. Now we simply check the return value and call ast() if appropriate. Run into this: davidxu	2003-09-05 22:50:10 +00:00
marcel	436dce9447	Use pmap_steal_memory() for the msgbuf instead of trying to squeeze it in the last chunk (phys_avail block). The last chunk very often is not larger than one or two pages, resulting in a msgbuf that's too small to hold a complete verbose boot. Note that pmap_steal_memory() will bzero the memory it "allocates". Consequently, ia64 will never preserve previous msgbufs. This is not a noticeable difference in practice. If the msgbuf could be reused, it was invariably too small to have anything preserved anyway.	2003-09-01 07:06:57 +00:00
marcel	2745ee0477	Use direct mapped KVA for the sf_buf allocator, as made possible by the previous commit. While here, fix a typo, reformat comments and fix a long line. Tested with: ftpd	2003-09-01 00:12:27 +00:00
alc	9187e573c5	Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy sockets into machine-dependent files. The rationale for this migration is illustrated by the modified amd64 allocator. It uses the amd64's direct map to avoid ephemeral mappings in the kernel's address space. On an SMP, the ephemeral mappings result in an IPI for TLB shootdown for each transmitted page. Yuck. Maintainers of other 64-bit platforms with direct maps should be able to use the amd64 allocator as a reference implementation.	2003-08-29 20:04:10 +00:00
njl	4d86abccfd	Minor style cleanups.	2003-08-28 16:30:31 +00:00
marcel	8818babd0f	Change LOG2_PAGE_SIZE from 14 to 15 bits. This will cause the CTASSERT in vm_page.h to be reached and thus slightly increases the overall coverage of LINT on ia64.	2003-08-25 20:02:18 +00:00
marcel	f905b2ddd4	Add the bits for a LINT kernel. It has been verified to compile. We may need to polish this.	2003-08-23 21:47:33 +00:00
marcel	2de2941ca7	Remove PAGE_SIZE_4K, PAGE_SIZE_8K and PAGE_SIZE_16K and replace them with LOG2_PAGE_SIZE. A single option is better to LINT than multiple mutually exclusive ones.	2003-08-23 03:39:55 +00:00
marcel	9070e01349	Remove unused inclusion of opt_acpi.h	2003-08-23 00:07:52 +00:00
jhb	305c2afb1d	Regen.	2003-08-21 14:16:41 +00:00
jhb	f709d1df90	Swap sigaction/sigreturn since they are in the wrong order. Noticed indirectly by: peter	2003-08-21 14:16:00 +00:00
marcel	505e5b0cd4	Undo the mistake made in revision 1.77 of trap.c and which was the ultimate trigger for the follow-up fixes in revisions 1.78, 1.80, 1.81 and 1.82 of trap.c. I was simply too pre-occupied with the gateway page and how it blurs kernel space with user space and vice versa that I couldn't see that it was all a load of bollocks. It's not the IP address that matters, it's the privilege level that counts. We never run in user space with lifted permissions and we sure can not run in kernel space without it. Sure, the gateway page is the exception, but not if you look at the privilege level. It's user space if you run with user permissions and kernel space otherwise. So, we're back to looking at the privilege level like it should be. There's no other way. Pointy hat: marcel	2003-08-20 05:30:35 +00:00
gordon	7ee368a275	Fixup the ELF branding information to point to the new home of rtld.	2003-08-17 08:08:38 +00:00
marcel	46db143ec2	In vm_thread_swap{in|out}(), remove the alpha specific conditional compilation and replace it with a call to cpu_thread_swap{in|out}(). This allows us to add similar code on ia64 without cluttering the code even more.	2003-08-16 23:15:15 +00:00
marcel	40fca14e9d	Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to cpu.h. This affects db_command.c and kern_shutdown.c. ia64: move all MD prototypes from cpu.h to md_var.h. This affects madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory(). It's not used (vm_machdep.c). alpha: the MD prototypes have been left in cpu.h with a comment that they should be there. Moving them is left for later. It was expected that the impact would be significant enough to be done in a separate commit. powerpc: MD prototypes left in cpu.h. Comment added. Suggested by: bde Tested with: make universe (pc98 incomplete)	2003-08-16 16:57:57 +00:00
marcel	364e7e366c	Fix a range check bug. Don't left-shift the integer argument 'data'. Sign extension happens after the shift, not before so that boundary cases like 0x40000000 will not be caught properly. Instead, right shift ndirty. It is guaranteed to be a multiple of 8. While here, do some manual code motion and code commoning. Range check bug pointed out by: iedowse	2003-08-16 01:49:38 +00:00
marcel	bfb7c59a86	Fix the generation of coredumps. We did not take the dirty registers that were on the kernel stack into account. For now we write them out to the register stack of the process before creating the dump. This however is not the final solution. The problem is that we may invalidate the coredump by overwriting vital information due to an invalid backing store pointer. Instead we need to write the dirty registers to an unused region of VM which will result in a separate segment in the coredump. For now we can at least get to all the registers from a coredump.	2003-08-15 05:52:48 +00:00
marcel	ecc29cd470	Add an instruction group break after the move to application register and the move to control register to avoid dependency violations when these functions are used. Note that explicit data and instruction serialization also need to be in a subsequent instruction group. This too requires that we have an igrp break here.	2003-08-15 05:46:33 +00:00
marcel	087df6e6d1	Introduce two machine specific ptrace(2) requests: PT_GETKSTACK and PT_SETKSTACK. These requests allow the tracing process to access the dirty registers of the traced process that are on the kernel stack. Note that there's currently no way to access the rnat register for those dirty registers that are not (yet) covered by a nat collection point. The interface for this is still being slept on. Also note that implied by these requests is the division of work: The tracing process has to keep track of where registers are spilled and is responsible to figure out where the NaT bit of the stacked registers are at any time during the execution of the traced process. The kernel provides the interfaces but will not abstract the fact that the register stack can be split. This model does not follow the approach taken in Linux where PT_PEEK and PT_POKE deals with this automagically.	2003-08-15 05:40:59 +00:00
marcel	005e7e4571	Don't use VM_MIN_KERNEL_ADDRESS to check if the faulting address is in user space or kernel space. VM_MIN_KERNEL_ADDRESS starts after the gateway page, which means that improper memory accesses to the gateway page while in user mode would panic the kernel. Use VM_MAX_ADDRESS instead. It ends before the gateway page. The difference between VM_MIN_KERNEL_ADDRESS and VM_MAX_ADDRESS is exactly the gateway page.	2003-08-13 03:20:10 +00:00
marcel	01ca0e2ae2	Put an instruction group break between the move to ar.rnat and the move to ar.rsc. The RSE must be in enforced lazy mode when writing to RSE modifiable registers. In this case we restore the RSE NaT collection register ar.rnat. I have seen 2 general exception faults on pluto1 now that indicate that the move to ar.rsc has already happened prior to the move to ar.rnat, meaning that the RSE is not in enforced lazy mode anymore. The ia64 dependency and instruction ordering rules seem to allow having both registers written to in the same instruction group, provided ar.rsc is written to later than ar.rnat (based on the ordering semantics). It appears that we may be pushing our luck. For now, put them in separate cycles (by means of the instruction group break). If we ever get a general exception fault on the move to ar.rnat again, we have definite proof that something else is fishy.	2003-08-13 02:49:50 +00:00
imp	5d883de6e0	Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's copyrighted files. Approved by: Matt Dillon	2003-08-12 23:24:05 +00:00
marcel	dcfbeee896	Extend identifycpu(): o Differentiate between CPU family and CPU model. There are multiple Itanium 2 models and it's nice to differentiate between them. o Separately export the CPU family and CPU model with sysctl. o Merced is the only model in the Itanium family. o Add Madison to the Itanium 2 family. We already knew about McKinley. o Print the CPU family in parentheses, like we do with the i386 CPU class. My prototype now identifies itself as: CPU: Merced (800.03-Mhz Itanium) pluto1 and pluto2 will eventually identify themselves as: CPU: McKinley (900.00-Mhz Itanium 2)	2003-08-12 08:10:16 +00:00
marcel	80dc534d2e	Cleanup prototypes in cpu.h, including fswintrberr and any references to it. Sort the remaining prototypes in cpu.h. No functional change.	2003-08-12 03:51:53 +00:00
marcel	72055e1933	Cleanup and style(9) fixes. No functional change.	2003-08-11 21:25:19 +00:00
marcel	86952f61f4	o move cpu_reset() from vm_machdep.c to machdep.c. o reorder cpu_boot(), cpu_halt() and identifycpu(). No functional change.	2003-08-10 21:33:07 +00:00
marcel	a4f7cd1a33	Now that we can ignore up to 8KB of dirty registers, remove the RSE magic from exec_setregs(). In set_mcontext() we now also don't have to worry that we entered the kernel with more than 512 bytes of dirty registers on the kernel stack. Note that we cannot make any assumptions anymore WRT to NaT collection points in exec_setregs(), so we have to deal with them now.	2003-08-10 08:04:21 +00:00
marcel	884f89576e	MFi386 1.422 & 1.423: lock page queues in pmap_insert_entry().	2003-08-08 00:30:26 +00:00
jhb	e78286124c	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
marcel	a43546505d	Better define the flags in the mcontext_t and properly set the flags when we create contexts. The meaning of the flags are documented in <machine/ucontext.h>. I only list them here to help browsing the commit logs: _MC_FLAGS_ASYNC_CONTEXT _MC_FLAGS_HIGHFP_VALID _MC_FLAGS_KSE_SET_MBOX _MC_FLAGS_RETURN_VALID _MC_FLAGS_SCRATCH_VALID Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)	2003-08-07 07:52:39 +00:00
marcel	2bae7b1e21	o Fix cut-n-paste whitespace corruption in previous commit o For trap-based upcalls the argument (the kse_mailbox) to the UTS must be written onto the kernel stack, not the user stack. While here, deal with the fact that we may be at a NaT collection point.	2003-08-07 07:40:19 +00:00
marcel	92fb32b428	In cpu_set_upcall_kse(), create the upcall according to the entry path into the kernel. Normally it's due to a syscall, but one can also be created as the result of a clock interrupt (for example). This now even more looks like exec_setregs(). While here, add an assert that we don't expect more than 8KB of dirty registers on the kernel stack.	2003-08-06 23:28:19 +00:00
marcel	bb0c32f2d8	o In revision 1.45 of exception.S we changed exception_restore to unconditionally restore ar.k7 (kernel memory stack) and ar.k6 (kernel register stack). I don't know what I was smoking then, but if you unconditionally restore ar.k6, you also want to compute its value unconditionally. By having the computation predicated and dependent on whether we return to user mode, we would end up writing junk (= invalid value for ar.bspstore) if we would return to kernel mode. But the whole point of the unconditional restoration was that there is a grey area where we still need to have ar.k6 restored. If we restore with a junk value, we would end up wedging the machine on the next interrupt. So, unconditionally calculate the value we unconditionally write to ar.k6. o The previous braino was found while making the following change: We used to clear the lower 9 bits of the value we write to ar.k6. The meaning being that we know that the kernel register stack is at least 512 byte aligned and simply clearing the lower 9 bits allows us to return to a context of which we don't have dirty registers on the kernel stack, even though the context that entered the kernel does have dirty registers on the kernel stack. By masking-off the lower bits, we correctly obtain the base of the register stack without having to worry that we didn't actually reach the base while unwinding it. The change is to mask off the lower 13 bits, knowing that the kernel register stack is always 8KB aligned. The advantage is that we don't have to worry anymore if there's more than 512 bytes of dirty registers on the kernel stack. A situation that frequently occurs. In exec_setregs() in machdep.c:1.147 or older, we had to deal with that situation by copying the active portion of the register stack down in multiples of 512 bytes. Now that we mask off the lower 13 bits we don't have to do that at all. 
Contemporary IPF processors have a register file that can hold up to 96 stacked registers (=784 bytes [incl. 2 NaT collections]). With no indication that register files grow beyond a couple of hundred registers, we should not have to worry about it anymore... and yes, 640KB is enough for everybody :-) This change helps setcontext(2) and cpu_set_upcall_kse() in that they can return to completely different contexts without having to mess with the kernel stack. Of course exec_setregs() doesn't need to do that anymore as well.	2003-08-06 21:32:38 +00:00
marcel	1a16963d08	o Put the syscall return registers in the context. Not only do we need this for swapcontext(), KSE upcalls initiated from ast() also need to save them so that we properly return the syscall results after having had a context switch. Note that we don't use r11 in the kernel. However, the runtime specification has defined r8-r11 as return registers, so we put r11 in the context as well. I think deischen@ was trying to tell me that we should save the return registers before. I just wasn't ready for it :-) o The EPC syscall code has 2 return registers and 2 frame markers to save. The first (rp/pfs) belongs to the syscall stub itself. The second (iip/cfm) belongs to the caller of the syscall stub. We want to put the second in the context (note that iip and cfm relate to interrupts. They are only being misused by the syscall code, but are not part of a regular context). This way, when the context is switched to again, we return to the caller of setcontext(2) as one would expect. o Deal with dirty registers on the kernel stack. The getcontext() syscall will flush the RSE, so we don't expect any dirty registers in that case. However, in thread_userret() we also need to save the context in certain cases. When that happens, we are sure that there are dirty registers on the kernel stack. This implementation simply copies the registers, one at a time, from the kernel stack to the user stack. NAT collections are not dealt with. Hence we don't preserve NaT bits. A better solution needs to be found at some later time. We also don't deal with this in all cases in set_mcontext. No temporary solution is implemented because it's not a showstopper. The problem is that we need to ignore the dirty registers and we automatically do that for at most 62 registers. When there are more than 62 dirty registers we have a memory "leak". This commit is fundamental for KSE support.	2003-08-05 18:52:02 +00:00
marcel	753746a93b	Fix logic bug in the previous commit. Any region less than 5 is a user space region. Hence, we need to test if 5 is greater than the region; not greater equal. This bug caused us to call ast() while interrupting kernel mode.	2003-08-04 22:00:48 +00:00
jhb	98b95fa83b	- Since td_critnest is now initialized in MI code, it doesn't have to be set in cpu_critical_fork_exit() anymore. - As far as I can tell, cpu_thread_link() has never been used, not even when it was originally added, so remove it.	2003-08-04 20:32:45 +00:00
marcel	a9b84efaa9	Cleanup the clock code. This includes: o Remove alpha specific timer code (mc146818A) and compiled-out calibration of said timer. o Remove i386 inherited timer code (i8253) and related acquire and release functions. o Move sysbeep() from clock.c to machdep.c and have it return ENODEV. Console beeps should be implemented using ACPI or if no such device is described, using the sound driver. o Move the sysctls related to adjkerntz, disable_rtc_set and wall_cmos_clock from machdep.c to clock.c, where the variables are. o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't bother faking a stathz that's 1/8 of that. Keep it simple: hz defaults to HZ and stathz equals hz. This is also how it's done for sparc64. o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj) to calculate ITC skew and corrections. On average, we adjust the ITC match register once every ~1500 interrupts for a duration of 2 consecutive interrupts. This is to correct the non-deterministic behaviour of the ITC interrupt (there's a delay between the match and the raising of the interrupt). o Add 4 debugging sysctls to monitor clock behaviour. Those are debug.clock_adjust_edges, debug.clock_adjust_excess, debug.clock_adjust_lost and debug.clock_adjust_ticks. The first counts the individual adjustment cycles (when the skew first crosses the threshold), the second counts the number of times the adjustment was excessive (any non-zero value is to be considered a bug), the third counts lost clock interrupts and the last counts the number of interrupts for which we applied an adjustment (debug.clock_adjust_ticks / debug.clock_adjust_edges gives the average duration of an individual adjustment -- should be ~2). While here, remove some nearby (trivial) left-overs from alpha and other cleanups.	2003-08-04 05:13:18 +00:00
marcel	3704dfdebb	Fix handling of external interrupts: we weren't calling ast() when interrupting user mode. The net effect of this bug is that a clock interrupt does not cause rescheduling and processes are not preempted. It only takes a "while (1);" to render the machine useless. This bug was introduced by the context changes and EPC syscall code. Handling of ASTs was moved to C for clarity and ease of maintenance, but was not added for the external interrupt case. This needs to be revisited. We now have calls to do_ast() in trap(), break_syscall() and ivt_External_Interrupt(). A single call in exception_restore covers these 3 places without duplication. This is where we handled ASTs prior to the overhaul, except that the meat has been moved to do_ast(), a C function. This was the goal to begin with. Pointy hat: marcel	2003-08-04 00:08:39 +00:00
obrien	05880247b0	Style sync.	2003-08-03 07:50:19 +00:00
marcel	373458bac2	Don't use uint64_t. Use unsigned long instead. One is supposed to use ucontext_t without having to include headers other than <ucontext.h>.	2003-08-02 01:12:31 +00:00
marcel	632f89ec09	Write the preserved registers to (and read them from) struct reg and struct fpreg.	2003-08-01 07:21:34 +00:00
bmilekic	e7a849e42d	Make sure that when the PV ENTRY zone is created in pmap, that it's created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In the i386 case in particular, the pmap code would hook a special page allocation routine that allocated from kernel_map and not kmem_map, and so when/if the pageout daemon drained the zones, it could actually push out slabs from the PV ENTRY zone but call UMA's default page_free, which resulted in pages allocated from kernel_map being freed to kmem_map; bad. kmem_free() ignores the return value of the vm_map_delete and just returns. I'm not sure what the exact repercussions could be, but it doesn't look good. In the PAE case on i386, we also set-up a zone in pmap, so be conservative for now and make that zone also ZONE_NOFREE and ZONE_VM. Do this for the pmap zones for the other archs too, although in some cases it may not be entirely necessary. We'd rather be safe than sorry at this point. Perhaps all UMA_ZONE_VM zones should by default be also UMA_ZONE_NOFREE? May fix some of silby's crashes on the PV ENTRY zone.	2003-07-31 03:39:51 +00:00
peter	67d9dee027	Deal with 'options KSTACK_PAGES' being a global option.	2003-07-31 01:31:32 +00:00
peter	840d1823b6	Cosmetic: fix some disorder of #include "opt_...." files	2003-07-31 01:29:09 +00:00
peter	056626ca51	Remove leftover relic of pmap_new_thread() etc.	2003-07-31 01:28:41 +00:00
mux	f5326def16	- Introduce a new busdma flag BUS_DMA_ZERO to request for zero'ed memory in bus_dmamem_alloc(). This is possible now that contigmalloc() supports the M_ZERO flag. - Remove the locking of Giant around calls to contigmalloc() since contigmalloc() now grabs Giant itself.	2003-07-27 13:52:10 +00:00
marcel	a81b699337	Remove prototype of ia64_pa_access(). The function has been moved to mem.c where it's been made static.	2003-07-26 10:13:30 +00:00
marcel	1073a0689c	Avoid using __aligned(16). Instead define the jmp_buf in terms of long doubles. This gives us 16-byte alignment. Add a CTASSERT for the size of the jmp_buf to detect ABI breakages.	2003-07-26 08:03:43 +00:00
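The jmp_buf approach in commit 1073a0689c can be illustrated with a minimal sketch. It assumes a C11 compiler, uses `_Static_assert` as a stand-in for the kernel's CTASSERT, and the type name and the 512-byte size are hypothetical placeholders rather than the real ia64 values. It also assumes a target where sizeof(long double) divides that size evenly; the resulting alignment is whatever the ABI gives long double (16 bytes on ia64 LP64 according to the log).

```c
#include <stddef.h>

/* Hypothetical buffer size in bytes; not the real ia64 jmp_buf size. */
#define JMPBUF_BYTES_SKETCH 512

/*
 * Declaring the storage as an array of long double makes the type
 * inherit long double's alignment without any compiler-specific
 * attribute such as __aligned(16).
 */
typedef struct {
    long double buf[JMPBUF_BYTES_SKETCH / sizeof(long double)];
} jmp_buf_sketch;

/* Compile-time checks: fail the build if the layout ever changes. */
_Static_assert(sizeof(jmp_buf_sketch) == JMPBUF_BYTES_SKETCH,
    "jmp_buf_sketch size changed: ABI breakage");
_Static_assert(_Alignof(jmp_buf_sketch) == _Alignof(long double),
    "jmp_buf_sketch must be aligned like long double");
```

Because the checks run at compile time, an accidental size change breaks the build instead of silently breaking binaries that embed the buffer.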
marcel	e5dec7e8fa	Unbreak ia64 builds now -Werror is enabled again. Avoid obsolete memory operand construct.	2003-07-26 07:23:25 +00:00
marcel	5b786e1bdc	Revert previous commit. We don't use setjmp()/longjmp() for context switching anymore, so there's no need to save and restore GP. This change breaks threaded applications linked against libc_r. Pull the tier 2 card again: relink. This will link against libthr instead.	2003-07-25 22:36:48 +00:00
alc	7fa838affc	MFi386 revision 1.416 Add vm object locking to pmap_prefault(). Note: powerpc and sparc64 do not implement this function.	2003-07-25 18:58:39 +00:00
marcel	e9cb751c39	Remove __aligned(16) from the definition of struct _ia64_fpreg. It's a non-standard construct. Instead, redefine struct _ia64_fpreg as a union and put a long double in it. On ia64 and for LP64, this is defined by the ABI to have 16-byte alignment. For ILP32 a long double has 4-byte alignment, but we don't support ILP32. Note that the in-memory image of a long double does not match the in-memory image of spilled FP registers. This means that one cannot use the fpr_flt field to interpret the bits. For this reason we continue to use an aggregate type.	2003-07-25 08:02:24 +00:00
marcel	8cdcaf359b	Remove INVARIANT* and WITNESS. This makes the simulator much more pleasant to use.	2003-07-25 07:52:20 +00:00
marcel	7cd9b64674	Move ia64_pa_access() from machdep.c to mem.c and declare it static. It's only used in mem.c and cannot accidentally be used elsewhere this way.	2003-07-25 05:37:13 +00:00
marcel	806441a3b4	Disable the single-step trap on a debug related trap, including of course the single-step trap itself.	2003-07-25 00:11:14 +00:00
marcel	b1fac7bd73	We sloppily created an array for the high FP registers (f32-f127), but this just created a weird inconsistency when porting gdb(1). Instead, we name each high FP register separately, like we do for all the other registers.	2003-07-23 03:08:34 +00:00
marcel	5b339aaee3	Rename thread_siginfo to cpu_thread_siginfo.	2003-07-15 04:43:33 +00:00
marcel	75ba4246d9	Enable the high FP registers when we call the FPSWA handler and disable them again afterwards. This fixes a disabled FP fault while in the FPSWA handler. While here, merge the FP fault and FP trap handling code to reduce code duplication. Where code was different, it was not sure it should be. Trigger case: ports/math/atlas	2003-07-13 04:08:16 +00:00
marcel	20ffeaeb15	Add logic to trace across/over a trapframe. We have ABI markers in our unwind information for functions that are entry points into the kernel. When stepping to the next frame, the unwinder will let us know when such a marker was encountered. We use this to stop the current unwind session, query the trapframe and restart a new unwind session based on the new trapframe. The implementation is a bit sloppy, but at this time there are bigger fish to fry.	2003-07-12 04:35:09 +00:00
marcel	c8858f5f0e	Add a body directive before the first instruction in epc_syscall(). This results in a zero length prologue and a body that covers the whole function. This is more correct.	2003-07-11 08:52:48 +00:00
marcel	e902530657	Remove a gratuitous align directive after the endp directive for IVT entries.	2003-07-11 08:49:26 +00:00
marcel	2dfcef817e	Don't call malloc() and free() while in the debugger and unwinding to get a stacktrace. This does not work even with M_NOWAIT when we have WITNESS and is generally a bad idea (pointed out by bde@). We allocate an 8K heap for use by the unwinder when ddb is active. A stack trace roughly takes up half of that in any case, so we have some room for complex unwind situations. We don't want to waste too much space though. Due to the nature of unwinding, we don't worry too much about fragmentation or performance of unwinding while in the debugger. For now we have our own heap management, but we may be able to leverage from existing code at some later time. While here: o Make sure we actually free the unwind environment after unwinding. This fixes a memory leak. o Replace Doug's license with mine in unwind.c and unwind.h. Both files don't have much, if any, of Doug's code left since the EPC syscall overhaul and the import of the unwinder. o Remove dead code. o Replace M_NOWAIT with M_WAITOK for all remaining malloc() calls.	2003-07-05 23:21:58 +00:00
alc	dd2f3bbb2f	Background: pmap_object_init_pt() premaps the pages of an object in order to avoid the overhead of later page faults. In general, it implements two cases: one for vnode-backed objects and one for device-backed objects. Only the device-backed case is really machine-dependent, belonging in the pmap. This commit moves the vnode-backed case into the (relatively) new function vm_map_pmap_enter(). On amd64 and i386, this commit only amounts to code rearrangement. On alpha and ia64, the new machine independent (MI) implementation of the vnode case is smaller and more efficient than their pmap-based implementations. (The MI implementation takes advantage of the fact that objects in -CURRENT are ordered collections of pages.) On sparc64, pmap_object_init_pt() hadn't (yet) been implemented.	2003-07-03 20:18:02 +00:00
ru	c779e7b55e	The .s files were repo-copied to .S files. Approved by: marcel Repocopied by: joe	2003-07-02 12:57:07 +00:00
marcel	3294aa6de7	The use of SYSINIT requires the inclusion of <sys/kernel.h>	2003-07-02 01:22:29 +00:00
mux	ff4a533f2f	Make this even closer to other busdma backends.	2003-07-01 21:21:45 +00:00
mux	3c322bc439	Sync bounce pages support with the alpha backend. More precisely: o use a mutex to protect the bounce pages structure. o use a SYSINIT function to initialize the bounce pages structures and thus avoid a race condition in alloc_bounce_pages(). o add support for the BUS_DMA_NOWAIT flag in bus_dmamap_load(). o remove obsolete splhigh()/splx() calls. o remove printf() about incorrect locking in busdma_swi() and sync busdma_swi() with the one of the alpha backend. o use __FBSDID.	2003-07-01 18:08:05 +00:00
mux	a50d8901a5	Honor the boundary of the busdma tag when allocating bounce pages. This was fixed in revision 1.5 of alpha/alpha/busdma_machdep.c and was never fixed in other busdma backends using bounce pages.	2003-07-01 16:54:54 +00:00
scottl	2fdb52b864	Mega busdma API commit. Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg. Lockfunc allows a driver to provide a function for managing its locking semantics while using busdma. At the moment, this is used for the asynchronous busdma_swi and callback mechanism. Two lockfunc implementations are provided: busdma_lock_mutex() performs standard mutex operations on the mutex that is specified from lockfuncarg. dflt_lock() is a panic implementation and is defaulted to when NULL, NULL are passed to bus_dma_tag_create(). The only time that NULL, NULL should ever be used is when the driver ensures that bus_dmamap_load() will not be deferred. Drivers that do not provide their own locking can pass busdma_lock_mutex,&Giant args in order to preserve the former behaviour. sparc64 and powerpc do not provide real busdma_swi functions, so this is largely a noop on those platforms. The busdma_swi on ia64 is not properly locked yet, so warnings will be emitted on this platform when busdma callback deferrals happen. If anyone gets panics or warnings from dflt_lock() being called, please let me know right away. Reviewed by: tmm, gibbs	2003-07-01 15:52:06 +00:00
alc	7f87019628	- Export pmap_enter_quick() to the MI VM. This will permit the implementation of a largely MI pmap_object_init_pt() for vnode-backed objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64 and powerpc. - Correct a mismatch between pmap_object_init_pt()'s prototype and its various implementations. (I plan to keep pmap_object_init_pt() as the MD hook for device-backed objects on i386 and amd64.) - Correct an error in ia64's pmap_enter_quick() and adjust its interface to match the other versions. Discussed with: marcel	2003-06-29 21:20:04 +00:00
alc	0566856e9f	- Remove the calls to pmap_install() from pmap_object_init_pt(); they are redundant. Discussed with: marcel - MFi386: Add vm object locking to pmap_object_init_pt().	2003-06-29 06:10:32 +00:00
marcel	cc29aafb98	Implement cpu_set_upcall_kse(). Elementary testing shows that this function behaves correctly in principle, but is not expected to be 100% complete. In any case, with this commit we have KSE ported enough to start runtime testing with threaded applications and fix whatever bugs or omissions we encounter. Yay!	2003-06-28 09:22:25 +00:00
davidxu	1dcde6fa83	Add a machine-dependent function thread_siginfo. SA signal code will use the function to construct a siginfo structure and use the result to export to userland. Reviewed by: julian	2003-06-28 06:34:08 +00:00
scottl	870f77fed0	Do the first and mostly mechanical step of adding mutex support to the bus_dma async callback scheme. Note that sparc64 does not seem to do async callbacks. Note that ia64 callbacks might not be MPSAFE at the moment. Note that powerpc doesn't seem to do async callbacks due to the implementation being incomplete. Reviewed by: mostly silence on arch@	2003-06-27 08:31:48 +00:00
marcel	40004f4ec1	Add TLS related relocation.	2003-06-19 06:51:43 +00:00
alc	23534b3723	Fix a performance bug in all of the various implementations of uma_small_alloc(): They always zeroed the page regardless of what the caller requested.	2003-06-18 02:57:38 +00:00
davidxu	95b64acdb5	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.	2003-06-15 00:31:24 +00:00
alc	c1ed791c1f	Migrate the thread stack management functions from the machine-dependent to the machine-independent parts of the VM. At the same time, this introduces vm object locking for the non-i386 platforms. Two details: 1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The different machine-dependent implementations used various combinations of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set KSTACK_GUARD_PAGES to 0. 2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In 5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed to vm_page_alloc() or vm_page_grab().	2003-06-14 23:23:55 +00:00
alc	89789483ab	Move the _new_altkstack() and _dispose_altkstack() functions out of the various pmap implementations into the machine-independent vm. They were all identical.	2003-06-14 06:20:25 +00:00
marcel	6239a6d8dd	Remove kernel event tracing. The overhead is significant when running under ski.	2003-06-14 00:01:24 +00:00
marcel	0b9048a1fc	Make sure pcpu->pc_pcb is pointing to a 16-byte aligned address. The PCB contains FP registers, whose alignment must be 16 bytes at least. Since the PCB pointed to by pc_pcb is immediately after the PCPU itself, round-up the size of the PCPU to a multiple of 16 bytes. The PCPU is page aligned. This fixes a misalignment trap caused by stopping a CPU in a SMP kernel, such as is done when entering the debugger. Reported by: Alan Robinson <alan.robinson@fujitsu-siemens.com>	2003-06-12 00:15:18 +00:00
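The round-up described in commit 0b9048a1fc amounts to the following arithmetic, shown here as a minimal sketch. The macro and function names are hypothetical and do not come from the FreeBSD sources; the sketch only assumes that the PCB starts right after a page-aligned PCPU, as the log states.

```c
#include <stddef.h>

/* Round x up to the next multiple of align; align must be a power of two. */
#define ROUNDUP_POW2_SKETCH(x, align) \
    (((x) + ((size_t)(align) - 1)) & ~((size_t)(align) - 1))

/*
 * Hypothetical helper: byte offset of the PCB from the start of a
 * page-aligned PCPU of the given size. Since the base is page aligned,
 * a 16-byte-aligned offset yields a 16-byte-aligned PCB address.
 */
static size_t
pcb_offset_sketch(size_t pcpu_size)
{
    return ROUNDUP_POW2_SKETCH(pcpu_size, 16);
}
```

For instance, a PCPU of 1000 bytes would place the PCB at offset 1008, while a PCPU that is already a multiple of 16 bytes is left unchanged.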
peter	a7852fc239	GC unused cpu_wait() function	2003-06-11 05:20:33 +00:00
jmallett	abb298a21c	Note that scbus is required for SCSI, not just "required" in general. Submitted by: Edward Kaplan (tmbg37 on IRC) Reviewed by: rwatson (in principle)	2003-06-08 02:03:02 +00:00
marcel	03e44e9dc5	pmap_find_vhpt() has been observed to return a NULL pointer when the caller assumes this to not happen by means of performing an indirection without checking the return value. Add KASSERTs to force a kernel with INVARIANTS to panic. This is a short-term measure. The pmap code is scheduled to be overhauled.	2003-06-07 04:17:39 +00:00
marcel	4723cff4cd	If we get a fault in the gateway page, which would happen if we try to deliver a signal and the RSE backing store has been exhausted or the backing store pointer has been clobbered, we need to make sure we call userret() and do_ast() when we exit from trap(). Not adjusting the local variable 'user' in this case will prevent the faulty process from being terminated and we end up in an infinite fault repetition. Faulty process provided by: bento	2003-06-07 04:10:07 +00:00
marcel	b304025b63	Use TRAPF_USERMODE() to replace an equivalent check in trap(). While here, amend the related comment.	2003-06-06 23:44:05 +00:00
marcel	95bb800d5e	Have TRAPF_USERMODE() take into account that the gateway page is not always kernel space. It should be treated as user space when run with user privileges (which is the case for the signal trampolines). This fixes its only use in a KASSERT in subr_trap.c.	2003-06-06 23:27:18 +00:00
marcel	cc08ace0c3	Fix the dreaded double counting that was present on alpha as well and got fixed two weeks after the ia64 version was copied from the alpha version (see rev 1.32 of sys/alpha/alpha/mem.c). As such, we were missing the same continue as on alpha. While here, add a default case for the device minor switch and do some general style(9) cleanups. WARNING: this file still has bugs. When reading from region 6 or region 7, we don't validate the physical address. One can trivially cause a machine check by trying to read from address 0xFFFFFFFFFFFFFFF0 or something that uses the unimplemented physical address bits. Reported by: Alan Robinson <alan.robinson@fujitsu-siemens.com>	2003-06-04 21:56:10 +00:00
marcel	f763a40070	Change the second (and last) argument of cpu_set_upcall(). Previously we were passing in a void* representing the PCB of the parent thread. Now we pass a pointer to the parent thread itself. The prime reason for this change is to allow cpu_set_upcall() to copy (parts of) the trapframe instead of having it done in MI code in each caller of cpu_set_upcall(). Copying the trapframe cannot always be done with a simple bcopy() or may not always be optimal that way. On ia64 specifically the trapframe contains information that is specific to an entry into the kernel and can only be used by the corresponding exit from the kernel. A trapframe copied verbatim from another frame is in most cases useless without some additional normalization. Note that this change removes the assignment to td->td_frame in some implementations of cpu_set_upcall(). The assignment is redundant. A previous call to cpu_thread_setup() already did the exact same assignment. An added benefit of removing the redundant assignment is that we can now change td_pcb without nasty side-effects. This change officially marks the ability on ia64 for 1:1 threading. Not tested on: amd64, powerpc Compile & boot tested on: alpha, sparc64 Functionally tested on: i386, ia64	2003-06-04 21:13:21 +00:00
marcel	6a6f6a30fc	Improve set_mcontext: o Don't copy psr verbatim from the user supplied context. Only allow userland to change the processor settings that are part of the user mask.	2003-06-01 23:22:56 +00:00
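The psr handling described in commit 6a6f6a30fc is a masked merge, sketched below. The function name is hypothetical, and the mask value is a placeholder chosen for illustration only; the actual set of user-changeable PSR bits should be taken from the architecture manual, not from this sketch.

```c
#include <stdint.h>

/* Placeholder for the user-mask bits of the PSR; illustrative value only. */
#define PSR_USER_MASK_SKETCH UINT64_C(0x3e)

/*
 * Hypothetical helper: build the PSR to install from the kernel's
 * current value and a user-supplied context. Only bits inside the
 * user mask are taken from userland; every other bit keeps the
 * kernel's value, so a crafted context cannot change privileged
 * processor settings.
 */
static uint64_t
merge_psr_sketch(uint64_t kernel_psr, uint64_t user_psr)
{
    return (kernel_psr & ~PSR_USER_MASK_SKETCH) |
        (user_psr & PSR_USER_MASK_SKETCH);
}
```

Copying the value verbatim would correspond to returning `user_psr` directly, which is what the commit stops doing.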
marcel	803ad569a5	Improve on cpu_set_upcall: o Use pcb and tf for the new pcb and the new trapframe and use pcb0 for the old (current) pcb. The mix of pcb, pcb2 and tf was slightly confusing. o Don't define td->td_frame here. It has already been set previously by cpu_thread_setup. Add a KASSERT to make sure pcb and tf are both non-NULL. o Make sure the number of dirty registers is 0 for the new thread. There are no user registers on the backing store because we haven't entered userland yet.	2003-06-01 23:19:21 +00:00
marcel	e3a4b69afc	Implement cpu_thread_setup(). This is mostly the same as on i386, except for the fact that trapframes have a size recorded in it that we set here too. We need this for proper thread setup. Pointed out by: mtm	2003-06-01 08:29:43 +00:00
marcel	67399b06da	Now that we have the signal trampolines in the gateway page and the gateway page is considered kernel space, we can panic when we should only SIGSEGV. Hence, add the additional constraint that for page faults we also require running with kernel privileges. The gateway page is the only kernel code running with user privileges, so this is a correct way to exclude the gateway page from kernel land. We do not currently exclude the gateway page for other faults as it is not always the right way to do it. Further tuning will happen on a case by case basis.	2003-05-31 21:21:35 +00:00
marcel	fbba60b432	Implement cpu_set_upcall(). Required by libthr and used by thr_create(2). This implementation is so far only compile tested. But since this is also the last of the functions required to support libthr, we're now functionally complete (for some weird definition of functionally; and complete). Runtime testing can commence.	2003-05-31 21:14:25 +00:00
marcel	7600163ffe	Implement set_mcontext() and get_mcontext(). Just as for sendsig() and sigreturn(), we cheat and assume the preserved registers are still on-chip and unmodified. This is actually the case, but more by accident than by design. We need to use unwinding eventually or explicitly compile the kernel in a way that the compiler steers clear from using the preserved registers completely.	2003-05-31 21:07:08 +00:00
marcel	768ef19f61	Make the regset pointers const pointers for the context restore functions. This works better with set_mcontext() and is more precise in general.	2003-05-31 21:02:18 +00:00
marcel	5548639b36	Some ia32 related finetuning for the EPC syscall path: o The SDM states that flushing the RSE in the cycle prior to the call to ia32 code yields the best performance. We don't really care too much about performance here, but we do the same anyway. I'm being paranoid and conservative here. o Only initialize the ia32 state registers, not the registers used as scratch by the ia32 engine. This saves a couple of loads from the trapframe, but also helps debugging: we don't clobber useful debugging data (engineering hints :-) o Make sure all general registers constituting ia32 state have been initialized. If there's nothing useful to be loaded from the trapframe, clear the register. This avoids accidentally leaking NaT bits. o Make sure we set ar.k6 prior to clobbering ar.bspstore and also set ar.k7 prior to setting sp. This fixes a race seen for ia64 native code as well (and previously fixed too).	2003-05-31 20:57:26 +00:00
marcel	fa03c95a7a	Make sure we have all the dirty registers in user frames on the backing store before we discard them. It is possible that we enter the kernel (due to an execve in this case) with a lot of dirty user registers and that the RSE has only partially spilled them (to make room for new frames). We cannot move the backing store pointer down (to discard user registers) when not all of the user registers are on the backing store. So, we flush the register stack IFF this happens. Unconditionally doing the flush is too costly, because the condition in which we need to flush is very rare. This change appears to fix the SIGSEGV that sometimes happens for newly executed processes and so far also appears to fix the last of the corruption. It is possible, although not likely, that this change prevents some other bug from happening, even though it is itself not a fix. Hence the uncertainty. We'll know in a couple of months I guess :-)	2003-05-31 20:42:35 +00:00
hmp	643f3c2b62	Rename BUS_DMAMEM_NOSYNC to BUS_DMA_COHERENT. The current name is confusing, because it indicates to the client that a bus_dmamap_sync() operation is not necessary when the flag is specified, which is wrong. The main purpose of this flag is to hint the underlying architecture that DMA memory should be mapped in a coherent way, but the architecture can ignore it. But if the architecture does support coherent mapping of memory, then it makes bus_dmamap_sync() calls cheap. This flag is the same as the one in NetBSD's Bus DMA. Reviewed by: gibbs, scottl, des (implicitly) Approved by: re@ (jhb)	2003-05-30 20:40:33 +00:00
marcel	ae709691e7	Move the sysctls of the misalignment handler to where they belong and use OID_AUTO instead of fixed IDs. Approved by: re@ (blanket)	2003-05-29 06:30:36 +00:00
marcel	fdd8b4a738	Fix what I think is a cut-n-paste bug: use OID_AUTO for the print_usertrap sysctl instead of CPU_UNALIGNED_PRINT. The latter is used already. Approved by: re@ (blanket)	2003-05-29 05:09:15 +00:00
marcel	b7bdcd6651	A flushrs must be the first in an instruction group. Approved by: re@ (blanket)	2003-05-27 07:10:58 +00:00
scottl	e2411ba2b8	Bring back bus_dmasync_op_t. It is now a typedef to an int, though the BUS_DMASYNC_ definitions remain as before. This does not change the ABI, and reverts the API to be a bit more compatible and flexible. This has survived a full 'make universe'. Approved by: re (bmah)	2003-05-27 04:59:59 +00:00
marcel	be94a372e6	Have the unwinder allocate memory with M_NOWAIT. The unwinder is used by DDB and we cannot know in advance whether it's safe to sleep. It often enough isn't. We may want to pre-allocate space to cover the most common cases without having to use malloc at all, but that requires some analysis. We leave that for later. Approved by: re@ (blanket)	2003-05-27 01:15:16 +00:00
marcel	59fed3f8eb	Fix fu{byte|word} and su{byte|word}: o If the address was not within user space we jumped to fusufault where we would clear pcb_onfault and return 0. There are two bugs here: 1. We never got to the point where we assigned the address of pcb_onfault to r15, which means that we would clobber some random memory location, including I/O space or ROM. 2. We're supposed to return -1 on error. o Make sure we have proper memory ordering for setting pcb_onfault, doing the memory access to user space and clearing pcb_onfault. For the fu* family of functions this means that we need a mf instruction, because we don't have acquire semantics on stores and release semantics on loads (hence st;ld cannot be ordered without intermediate mf). While here, implement casuptr() so that we are a (small) step closer to supporting libthr and deobfuscate the non-implementation of {f|s}uswintr. Approved by: re@ (blanket)	2003-05-27 01:00:12 +00:00
marcel	d43543ffea	Revision 1.99 of this file changed the allocation request from VM_ALLOC_INTERRUPT to VM_ALLOC_SYSTEM. There was no mention of this in commit log as it was considered harmless. Guess what: it does harm. WITNESS showed that we can not safely grab the page queue lock in vm_page_alloc() in all cases as we may have to sleep on it. Revert the request to VM_ALLOC_INTERRUPT to circumvent this. We panic if vm_page_alloc returns 0. I'm not entirely happy about this, but we have bigger fish to fry. Approved by: re@ (blanket)	2003-05-26 22:54:18 +00:00
marcel	37196ccbff	Now that we define user mode as any IP address that isn't in the kernel's VA regions, we cannot limit the use of break-based syscalls to user mode only. The signal trampolines are in the gateway page, which is mapped into the process address space in region 5 and thus is kernel space. We don't special case the gateway page here. Allow break-based syscalls from anywhere in the kernel VA space. Approved by: re@ (blanket)	2003-05-25 01:01:28 +00:00
marcel	253115005e	Fix a source of instability specific to an EPC userland. We return to userland with interrupts disabled until we restore PSR. However, it has been observed that interrupts do actually happen before they are enabled again. This is a bit surprising and I don't know yet what's going on exactly. Nevertheless, the code was not crafted carefully enough to allow interrupts to happen and we could clobber the kernel stack of another thread when interrupts did happen. This is what happens: we restore the (memory) stack pointer (sp) and the register stack base prior to restoring ar.k6 and ar.k7. This is not a problem if interrupts don't happen between setting sp/ar.bspstore and ar.k6/ar.k7. Alas, interrupts can happen. Since sp/ar.bspstore already point to the userland stacks, we need to switch to the kernel stack in interrupt. However, ar.k6 and ar.k7 have not been set, which means that we were switching to some unrelated kstack and happily clobbered the trapframe present there if the thread to which the kstack belonged was in kernel mode or otherwise we could have our trapframe clobbered if that other thread enters the kernel. Nasty either way. We now carefully restore ar.k6 prior to restoring ar.bspstore and likewise for ar.k7 and sp. All we need is the guarantee that an interrupt does not clobber ar.k6 or ar.k7 before we're back in userland. That has been achieved by restoring ar.k6/ar.k7 unconditionally (see exception.s). While here, remove the disabling of interrupts on EPC entry. It was added as a way to "resolve" the crashes until it was understood what was going on. I think I achieved the latter, so we can remove the patch. Note that setting up a trapframe with interrupts enabled has its own share of corner cases, but it's better to properly fix those than to keep a mostly wrong patch around because we're afraid to remove it... Approved by: re@ (blanket)	2003-05-24 22:53:10 +00:00
marcel	6d0ee0d770	Be more careful how we restore interrupts. Don't rewrite most of the PSR only to achieve setting PSR.i back to its previous value. It makes it impossible to change any of the 30+ other unrelated bits when done between intr_disable() and intr_restore(). That's bad. Instead have intr_disable() return 1 when interrupts were previously enabled and 0 otherwise and only enable interrupts in intr_restore() when given a non-0 value. This change specifically disallows using intr_restore() to disable interrupts. The reason is simple: interrupts only need to be restored after they are being disabled, which means that intr_restore() is called with interrupts disabled and we only need to enable them if they were previously enabled. This change does not fix any bugs, other than that it bugged me... Approved by: re@ (blanket)	2003-05-24 21:44:24 +00:00
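The intr_disable()/intr_restore() contract described in commit 6d0ee0d770 can be modelled in plain C as a minimal sketch. The `psr_i` variable is a hypothetical stand-in for the PSR.i bit, and the `_sketch` function names are illustrative; the real routines manipulate the processor status register directly.

```c
#include <stdbool.h>

/* Hypothetical model of the PSR.i (interrupt enable) bit. */
static bool psr_i = true;

/* Disable interrupts; return 1 if they were enabled before, else 0. */
static int
intr_disable_sketch(void)
{
    int was_enabled = psr_i ? 1 : 0;

    psr_i = false;
    return was_enabled;
}

/*
 * Re-enable interrupts only when they were enabled before the matching
 * intr_disable_sketch() call. A zero argument leaves the state alone,
 * so this function can never be used to disable interrupts, and no
 * other PSR bits are touched.
 */
static void
intr_restore_sketch(int was_enabled)
{
    if (was_enabled != 0)
        psr_i = true;
}
```

Nested use falls out naturally: an inner disable/restore pair sees 0 and leaves interrupts off, and only the outermost restore turns them back on.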
marcel	2301f98ad2	Consistently use the same metric to differentiate between kernel mode and user mode. We need to take into account that the EPC syscall path introduces a grey area in which one can argue either way, including a third: neither. We now use the region in which the IP address lies. Regions 5, 6 and 7 are kernel VA regions and if the IP lies in any of those regions we assume we're in kernel mode. Hence, we can be in kernel mode even if we're not on the kernel stack and/or have user privileges. There're gremlins living in the twilight zone :-) For the EPC syscall path this particularly means that the process leaves user mode the moment it calls into the gateway page. This makes the most sense because from a process' point of view the call represents a request to the kernel for some service and that service has been performed if the call returns. With the metric we picked, this also means that we're back in user mode IFF the call returns. Approved by: re@ (blanket)	2003-05-24 21:16:19 +00:00
marcel	c4e56d4da9	Unconditionally restore ar.k7 (memory stack) and ar.k6 (register stack) when returning from an interrupt. Both registers are used on interrupt to switch to the right kernel stack, but other than that they are not used. This means we only have to make sure they contain proper values while in user mode. As such, we conditionally restored these registers based on whether we returned to userland or not. A nice property of conditionally restoring ar.k6 and ar.k7 is that it introduces two invariants: ar.k6 always points to the bottom of the kernel stack and ar.k7 always points to the top of the kernel stack (immediately below the PCB we have there). However, the EPC syscall path introduces an irregularity: there's no "thin red line" between user and kernel. There's a grey area that's a couple of instructions wide. Any interruption in that grey area is bound to see an inconsistent state. One such state is that we're in kernel space for all practical purposes, but we still need to have ar.k6 and ar.k7 restored as if we're in userland. Thus: restore ar.k6 and ar.k7 unconditionally at the cost of losing a valuable invariant. Both registers now hold the extent of the usable portion of the kernel stack at any interrupt nesting, which when in userland means the bottom and the top of the kstack.	2003-05-24 20:51:55 +00:00
marcel	0f7b725a60	Fix an alpha inheritance bug: On alpha, PAL is involved in context management and after wiring the CPU (in alpha_init()) a context switch was performed to tell PAL about the context. This was bogusly brought over to ia64 where it introduced bugs, because we restored the context from a mostly uninitialized PCB. The cleanup constitutes: o Remove the unused arguments from ia64_init(). o Don't return from ia64_init(), but instead call mi_startup() directly. This reduces the amount of muckery in assembly and also allows for the next bullet: o Save our current context prior to calling mi_startup(). The reason for this is that many threads are created from thread0 by cloning the PCB. By saving our context in the PCB, we have something sane to clone. It also ensures that a cloned thread that does not alter the context in any way will return to the saved context, where we're ready for the eventuality with a nice, user unfriendly panic(). The cleanup fixes at least the following bugs: o Entering mi_startup() with the RSE in enforced lazy mode. o Re-execution of ia64_init() in certain "lab" conditions. While here, add proper unwind directives to __start() so that the unwinder knows it has reached the bottom of the (call) stack. Approved by: re@ (blanket)	2003-05-24 00:17:34 +00:00
marcel	8d4f097d8b	Fix a (new) source of instability: When interrupting a kernel context, we don't need to switch stacks (memory nor register). As such, we were also not restoring the register stack pointer (ar.bspstore). This, however, fails to be valid in 1 situation: when we interrupt a register stack switch as is being done in restorectx(). The problem is that restorectx() needs to have ar.bsp == ar.bspstore before it can assign the new value to ar.bspstore. This is achieved by doing a loadrs prior to assigning to ar.bspstore. If we take an interrupt in between the loadrs and the assignment and we don't make sure we restore the ar.bspstore prior to returning from the interrupt, we switch stacks with possibly non-zero dirty registers, which means that the new frame pointer (ar.bsp) will be invalid. So, instead of jumping over the restoration of the register frame pointer and related registers, we conditionalize it based on whether we return to kernel context or user context. A future performance tweak is possible by only restoring ar.bspstore when returning to kernel mode and when the RSE is in enforced lazy mode. One cannot assume ar.bsp == ar.bspstore if the RSE is not in enforced lazy mode anyway. While here (well, not quite) don't unconditionally assign to ar.bspstore in exception_save. Only do that when we actually switch stacks. It can only harm us to do it unconditionally. Approved by: re@ (blanket)	2003-05-23 23:55:31 +00:00
marcel	fd06ce647b	In swapctx(), put the RSE in enforced lazy mode before we flush the register stack. There's nothing really wrong with flushing before putting the RSE in enforced lazy mode, provided you don't depend on ar.bspstore being equal to ar.bsp when the RSE has been put in enforced lazy mode. The small window between the flush and setting the RSE may be sufficient to have the RSE eagerly increase the dirty region (and hence cause ar.bspstore != ar.bsp) or have an interrupt that may even get the laziest RSE to do something. Anyway: we don't depend on ar.bspstore being equal to ar.bsp, so nothing was and is broken. But the code was non-intuitive and easily confuses. This is a source of future bugs. Note: the advantage of not depending on ar.bspstore is that there's some resilience against an interrupted flushrs. Clobbering is limited to stacked register contents only, not to RSE address clobbering. Approved: re@ (blanket)	2003-05-23 23:16:43 +00:00
marcel	8c20c58be6	o Fix a definite bogon: the dirty bit fault, instruction access fault and data access fault install the PTE in question into the VHPT table. However, a post-increment was missing and we wrote the raw PTE data into the pagesize/access key field. This leaves a corrupt VHPT entry. o While here, remove the explicit cache purge. Insertion into the translation implicitly purges any overlapping entries. o Make sure there's a cycle break between the itc and the rfi. o Whitespace fixes.	2003-05-20 06:57:20 +00:00
marcel	abdd0836ad	Rename the "IA64 ITC" counter to "ITC" counter. We don't call the "TSC" counter on i386 "I386 TSC". Approved by: re@ (blanket)	2003-05-20 06:51:20 +00:00
marcel	96d6ed0590	Prevent corruption of the VHPT collision chain by protecting it with a mutex. The only volatile chain operations are insertion and deletion but since updating an existing PTE also updates the VHPT entry itself, and we have the VHPT mutex in both other cases, we also lock when we update an existing PTE even though no chain operation is involved. Note that we perform the insertion and deletion carefully enough that we don't need to lock traversals. If we need to lock traversals, we also need to lock from the exception handler, which we can't without creating a trapframe. We're now able to withstand a -j8 buildworld. More work is needed to withstand Murphy fields. In other words: we still have a bogon... Approved by: re@ (blanket)	2003-05-20 02:52:41 +00:00
kan	dd3a4fe537	sys/sys/limits.h: - Fix visibility test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)' is always wrong because __FOO_VISIBLE is always defined (to 0 for invisibility). sys/<arch>/include/limits.h sys/<arch>/include/_limits.h: - Style fixes. Submitted by: bde Reviewed by: bsdmike Approved by: re (scottl)	2003-05-19 20:29:07 +00:00
marcel	0ab69419df	Turn pmap_install_pte() into a critical section. We better not get interrupted while writing into the VHPT table. While here, make sure memory accesses are properly ordered. Tag invalidation must happen first so that the hardware VHPT walker will not be able to match this entry while we're updating it and we have to make sure the new tag gets written only after the PTE is completely updated. Approved by: re (blanket)	2003-05-19 08:02:36 +00:00
marcel	0b8ca3a8d4	Unconditionally set pcb_current_pmap. WIP versions of the code previously committed cleared pcb_current_pmap prior to changing the region registers, but that was removed before committing. Since we don't normally (at all?) pass a NULL pointer, the bug was mostly harmless. Fix it while I'm here... I'm here because we need to have data serialization after writing to the region registers. Not doing so was likely the cause of the hangs we were experiencing. General exceptions in cpu_switch may also be caused by the lack of serialization. Approved by: re (blanket)	2003-05-19 06:05:30 +00:00
marcel	f58bb7d65a	pmap_install() needs to be atomic WRT to context switching. Protect switching user regions (region 0-4) with schedlock. Avoid unnecessary recursion on schedlock by moving the core functionality to another function (pmap_switch()) where we assert schedlock is held. Turn pmap_install() into a wrapper that grabs schedlock. This minimizes the number of callsites that need to be changed. Since we already have schedlock in cpu_switch() and cpu_throw(), have them call pmap_switch() directly. These were also the only two calls to pmap_install() outside pmap.c, so make pmap_install() static and remove its prototype from pmap.h Approved by: re (blanket)	2003-05-19 04:16:30 +00:00
marcel	8045e66201	Remove unused files. cpu_switch() and cpu_throw(), normally in swtch.s, can be found in machdep.c. Approved: re@	2003-05-17 04:55:04 +00:00
marcel	2c3af6b0c7	Revamp of the syscall path, exception and context handling. The prime objectives are: o Implement a syscall path based on the epc instruction (see sys/ia64/ia64/syscall.s). o Revisit the places where we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h). Secondary objectives: o Remove the requirement to use contigmalloc for kernel stacks. o Better handling of the high FP registers for SMP systems. o Switch to the new cpu_switch() and cpu_throw() semantics. o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx) Many files are affected by this change. Functionally it boils down to: o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss. o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm. o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger. o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw(). o The high FP registers are either in the PCB or on some CPU. Context switching for them is done lazily. This affects trap(). o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as of yet still unimplemented.
Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use. Besides more efficient context saving and restoring, which of course yields a noticeable speedup, this also fixes the dreaded SMP bootup problem as a side-effect. The details of which are still not fully understood. This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be noticed as an improvement if it's noticed at all. Approved by: re@ (jhb)	2003-05-16 21:26:42 +00:00
marcel	4742b3abf1	o In pmap_install, don't prevent switching the pmap if we're switching to kernel_pmap. The pmap is not special enough. o Clear the active bit on the pmap we're switching out. o Fix some nearby style(9) bugs. Approved by: re@	2003-05-16 07:57:44 +00:00
marcel	e18b3f977b	Indent a comment. This makes 1.100. Still approved by: re@ (blanket)	2003-05-16 07:05:08 +00:00
marcel	f6ab86d828	Turn pmap_growkernel() into a critical section. While here, initialize kernel_vm_end in pmap_bootstrap. Don't delay the initialization until we need to grow the kernel VM space. This BTW happens twice before we enter either single- or multi-user mode. Don't adjust kernel_vm_end while growing based on whether the KPT contains a non-NULL entry. We trust kernel_vm_end to be correct and we make sure it's still correct after growing. Define virtual_avail and virtual_end in terms of VM_MIN_KERNEL_ADDRESS and VM_MAX_KERNEL_ADDRESS (resp). Don't hardcode region knowledge.	2003-05-16 07:03:15 +00:00
marcel	6a36805952	Revamp the RID allocation code: o Limit the size of the region ID map to 64KB. This gives a bitmap that is large enough to keep track of 2^19 numbers. The minimal map size is 32KB. The reason we limit the map size is that processor models may have implemented a 24-bit region ID, which would give a 2MB bitmap while the maximum number of allocations is always less than PID_MAX*5, which is less than 2^19. o Allocate all region IDs up-front. The slight downside of reserving more RIDs than a process needs (3 for ia64 native and 1 for ia32) is preferable over the call to pmap_ensure_rid() where RIDs are allocated on demand. On SMP systems this may lead to a race condition. o When allocating a region ID, don't use arc4random(). We're not interested in randomness or uniform distribution across the spectrum. We only need uniqueness. Random numbers may easily collide when the number of allocated RIDs is high, creating a possibly unbounded retry rate.	2003-05-16 06:40:40 +00:00
marcel	d3715e0039	Move the conditional definition of KSTACK_MAX_PAGES up ahead where it's more visible. Approved by: re@ (blanket)	2003-05-16 06:17:34 +00:00
marcel	7a98b54102	This file creates register sets based on the runtime specification. The advantage of using register sets is that you don't focus on each register separately, but instead introduce a level of abstraction. This reduces the chance of errors, and also simplifies the code. The register sets form the basis of everything register. The sets in this file are: struct _special contains all of the control related registers, such as instruction pointer and stack pointer. It also contains interrupt specific registers like the faulting address. The set is roughly split in 3 groups. The first contains the registers that define a context or thread. This is the only group that the kernel needs to switch threads. The second group contains registers needed, in addition to the first group, to switch userland threads. This group contains the thread pointer and the FP control register. The third group contains those registers we need for exception handling and are used on top of the first two groups. struct _callee_saved, struct _callee_saved_fp These sets contain the preserved registers, including the NaT after spilling. The general registers (including branch registers) are separated from the FP registers for ptrace(2). struct _caller_saved, struct _caller_saved_fp These sets contain the scratch registers based on SDM 2.1. This means that both ar.csd and ar.ccd are included here, even though they contain ia32 segment register descriptions. We keep separate NaT bits for scratch and preserved registers, because they are never saved/restored at the same time. struct _high_fp The upper 96 FP registers that can be enabled/disabled separately on the CPU from the lower 32 FP registers. Due to the size of this set, we treat them specially, even though they are defined as scratch registers.	2003-05-15 08:36:03 +00:00
marcel	26cfa9fe2e	This file contains elementary context related functions used to save and restore "sets" of registers in various places. The restorectx and swapctx functions are used by cpu_switch() and deal with the special registers, as well as the preserved registers. The callee_saved functions are used to save and restore the preserved registers (integer and floating-point). They are useful for signal delivery and ptrace support. The save_high_fp and restore_high_fp functions are used to "load" and "unload" to and from the CPU as part of lazy context switching. The ia32 specific context functions have been kept with the ia32 code. Approved by: re@ (blanket)	2003-05-15 08:08:32 +00:00
marcel	b176577e02	This file contains the code that implements the syscall path based on the epc instruction. The epc instruction, given the permissions of the page in which the epc is located, allows the privilege level to be increased with little or no overhead. The previous privilege level is recorded in the current frame marker and is restored by a regular (function) return. Since the epc instruction has to live in a page with non-standard properties, we hardwire a "gateway" page in the address space. The address of the gateway page is exported to userland in ar.k7. This allows us to rewire the page without breaking the ABI. The syscall stubs in libc are regular function calls that slightly differ from the normal runtime. The difference is mostly to simplify the stubs themselves by moving some of the logic to the kernel. The libc stubs call into the gateway page (offset 0), from where the kernel trampolines to the code that sets up a minimal trapframe and arranges to execute from the kernel stack. The way back is basically the same. The kernel returns to the gateway page, whereby privilege is dropped, and jumps back to the syscall stub. Only the special registers are saved in the trapframe. None of the scratch registers are preserved and since the kernel follows the same runtime model, none of the preserved registers are saved. Future enhancements can include the implementation of lightweight syscalls, where kernel functions are performed without setting up a trapframe. Good candidates are the *context syscalls for example. Now that there's a gateway page from which code can be executed in a non-privileged context, we also have the ideal place to put the signal trampolines. By moving the signal trampolines from the user stack to the gateway page, we open up the doors to unexecutable stacks. The gateway page contains signal trampolines for both the "legacy" break-based syscall code and the new and improved epc-based syscall code.
Approved: re@ (blanket)	2003-05-15 07:51:22 +00:00
jhb	f0272107fb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)	2003-05-13 20:36:02 +00:00
kan	7ff03aee33	Style fixes. Remove DBL_DIG, DBL_MIN, DBL_MAX and their FLT_ counterparts, they were marked for deprecation ever since SUSv1 at least. Only define ULLONG_MIN/MAX and LLONG_MAX if long long type is supported. Restore a lost comment in MI _limits.h file and remove it from sys/limits.h where it does not belong.	2003-05-04 22:13:04 +00:00
marcel	9bab15512c	Fix c99 victim: the accepted character '0 must now be typed as '0'.	2003-05-03 23:05:16 +00:00
marcel	4359b7238f	Option KADB does not exist. It came from alpha, where it still exists.	2003-05-02 20:34:15 +00:00
marcel	80cd0e6d1f	Kill MID_MACHINE, it's a.out specific, the only platform that supports it is i386. All of the other platforms should remove it too. -- peter@	2003-04-30 23:16:33 +00:00
jhb	48eb8eab8c	Range check the syscall number before looking it up in the syscallnames[] array. Submitted by: pho	2003-04-30 17:59:27 +00:00
kan	d7b605c280	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>	2003-04-29 13:36:06 +00:00
marcel	fc30bfe794	Revamp the newbus functions: o do not use the in* and out* functions. These functions are used by legacy drivers and thus must have ia32 compatible behaviour. Hence, they need to have fences. Using these functions for newbus would then pessimize performance. o remove the conditional compilation of PIO and/or MEMIO support. It's a PITA without having any significant benefit. We always support them both. Since there are no I/O ports on ia64 (they are simulated by the chipset by translating memory mapped I/O to predefined uncacheable memory regions) the only difference between PIO and MEMIO is in the address calculation. There should be enough ILP that can be exploited here that making these computations compile-time conditional is not worth it. We now also don't use the read* and write* functions. o Add the missing *_8 variants. They were missing, although not missed. It's for completeness. o Do not add the fences that were present in the low-level support functions here. We're using uncacheable memory, which means that accesses are in program order. Change the barrier implementation to not only do a memory fence, but also an acceptance fence. This should more reliably synchronize drivers with the hardware. The memory fence enforces ordering, but does not imply visibility (ie the access does not necessarily have happened). This is what the acceptance deals with. cpufunc.h cleanup: o Remove the low-level memory mapped I/O support functions. They are not used. Keep the low-level I/O port access functions for legacy drivers and add fences to ensure ia32 compatibility. o Remove the syscons specific functions now that we have moved the proper definitions where they belong. o Replace the ia64_port_address() and ia64_memory_address() functions with macros. There's a bigger chance inline functions get inlined when there aren't function calls, and the calculations are simple enough to do with macros.
Replace the one reference to ia64_memory address in mp_machdep.c to use the macro.	2003-04-29 09:50:03 +00:00
jhb	7d7a41d0f4	- Push down Giant into the sysarch() calls that still need Giant. - Standardize on EINVAL rather than EOPNOTSUPP if the sysarch op value is invalid.	2003-04-25 20:04:02 +00:00
jhb	21de071f52	Regen.	2003-04-25 15:59:44 +00:00
jhb	e493fd3faa	Oops, the thr_* and jail_attach() syscall entries should be NOPROTO rather than STD.	2003-04-25 15:59:18 +00:00
deischen	4b37b6e450	Add an argument to get_mcontext() which specifies whether the syscall return values should be cleared. The system calls getcontext() and swapcontext() want to return 0 on success but these contexts can be switched to at a later time so the return values need to be cleared in the saved register sets. Other callers of get_mcontext() would normally want the context without clearing the return values. Remove the i386-specific context saving from the KSE code. get_mcontext() is not i386-specific any more. Fix a bad pointer in the alpha get_mcontext() code. The context was being bcopy()'d from &td->tf_frame, but tf_frame is itself a pointer, so the thread was being copied instead. Spotted by jake. Glanced at by: jake Reviewed by: bde (months ago)	2003-04-25 01:50:30 +00:00
jhb	c30db0c24f	Regen.	2003-04-24 20:50:57 +00:00
jhb	91308d46a4	Fix the thr_create() entry by adding a trailing \. Also, sync up the MP safe flag for thr_* with the main table.	2003-04-24 20:49:46 +00:00
kan	b073b4daca	Add a new sys/limits.h file which in turn depends on machine/_limits.h to get actual constant values. This is in preparation for machine/limits.h retirement. Discussed on: standards@ Submitted by: Craig Rodrigues <rodrigc@attbi.com> (*) Modified by: kan	2003-04-23 21:41:59 +00:00
jhb	f3d7052aea	- Replace inline implementations of sigprocmask() with calls to kern_sigprocmask() in the various binary compatibility emulators. - Replace calls to sigsuspend(), sigaltstack(), sigaction(), and sigprocmask() that used the stackgap with calls to the corresponding kern_sig*() functions instead without using the stackgap.	2003-04-22 18:23:49 +00:00
davidxu	c3b8b61056	Remove the single-threading detection code. This code really should be replaced by thread_user_enter(), but currently we don't want to enable this in trap.	2003-04-22 03:17:41 +00:00
marcel	efc0c38e40	Don't use the tpa instruction to implement pmap_kextract. The tpa instruction requires that a translation is present in the TC. This may trigger a TLB miss and a subsequent call to vm_fault(). This implementation is deliberately non-inline for debugging and profiling purposes. Partial or full inlining should eventually be done. Valuable insights by: jake	2003-04-22 01:48:43 +00:00
simokawa	8674e42fa6	Add FireWire drivers to GENERIC.	2003-04-21 16:44:05 +00:00
jhb	27fa8b59bd	Use the proc lock to protect p_singlethread and a P_WEXIT test. This fixes a couple of potential KSE panics on non-i386 arch's that weren't holding the proc lock when calling thread_exit().	2003-04-18 20:20:00 +00:00
marcel	3df9a5196e	Add the EHCI host controller.	2003-04-16 01:29:08 +00:00
mux	f81b8b1670	I deserve a big pointy hat for having missed all those references to bus_dmasync_op_t in my last commit.	2003-04-10 23:50:06 +00:00
mux	41c3ac60b2	Change the operation parameter of bus_dmamap_sync() from an enum to an int and redefine the BUS_DMASYNC_* constants as flags. This allows us to specify several operations in one call to bus_dmamap_sync() as in NetBSD.	2003-04-10 23:03:33 +00:00
mike	ee5efe23ec	o In struct prison, add an allprison linked list of prisons (protected by allprison_mtx), a unique prison/jail identifier field, two path fields (pr_path for reporting and pr_root vnode instance) to store the chroot() point of each jail. o Add jail_attach(2) to allow a process to bind to an existing jail. o Add change_root() to perform the chroot operation on a specified vnode. o Generalize change_dir() to accept a vnode, and move namei() calls to callers of change_dir(). o Add a new sysctl (security.jail.list) which is a group of struct xprison instances that represent a snapshot of active jails. Reviewed by: rwatson, tjr	2003-04-09 02:55:18 +00:00
des	93c2d21808	Introduce an M_ASSERTPKTHDR() macro which performs the very common task of asserting that an mbuf has a packet header. Use it instead of hand- rolled versions wherever applicable. Submitted by: Hiten Pandya <hiten@unixdaemons.com>	2003-04-08 14:25:47 +00:00
marcel	af39be00aa	Remove COMPAT_FREEBSD4. It's impossible because FreeBSD 4 does not run on ia64 at all.	2003-04-08 08:32:00 +00:00
marcel	6baa93cddc	Remove the 32KB VHPT section from the kernel image. We don't really use it because we allocate a VHPT based on the size of the physical memory and even if the allocated VHPT is 32KB, we don't use the in- image section for it. Since the VHPT must be naturally aligned, we save 48K on average (due to alignment). Consequently, we start off with the VHPT disabled (it is assumed the VHPT is disabled because the EFI loader runs without memory address translation and thus has no need to setup the VHPT). It's probably a good idea to explicitly disable the VHPT if we make the use of the VHPT optional.	2003-04-06 21:31:26 +00:00
marcel	2b120d8952	Also set the access bit in the PTE when we get a data dirty bit fault. This avoids an immediate access bit fault when we serviced the dirty bit fault in case the access bit is unset. This typically happens for newly allocated memory that's being zeroed and thus very common.	2003-04-06 05:55:36 +00:00
marcel	35b1bbbb4c	Include <geom/geom_disk.h> and stop including <sys/disk.h>. The former gives us 'struct disk'.	2003-04-05 21:14:05 +00:00
des	bf10676408	Define ovbcopy() as a macro which expands to the equivalent bcopy() call, to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's bcopy() doesn't handle overlap like ours. Remove all implementations of ovbcopy(). Previously, bzero was a function pointer on i386, to save a jmp to bzero_vector. Get rid of this microoptimization as it only confuses things, adds machine-dependent code to an MD header, and doesn't really save all that much. This commit does not add my pagezero() / pagecopy() code.	2003-04-04 17:29:55 +00:00
phk	ee395e078c	Use bioq_flush() to drain a bio queue with a specific error code. Retain the mistake of not updating the devstat API for now. Spell bioq_disksort() consistently with the remaining bioq_*(). #include <geom/geom_disk.h> where this is more appropriate.	2003-04-01 15:06:26 +00:00
jeff	c44b6b488c	- Add thr and umtx system calls.	2003-04-01 01:15:56 +00:00
jeff	fde71359bc	- Define a new md function 'casuptr'. This atomically compares and sets a pointer that is in user space. It will be used as the basic primitive for a kernel supported user space lock implementation. - Implement this function in x86's support.s - Provide stubs that return -1 in all other architectures. Implementations will follow along shortly. Reviewed by: jake	2003-04-01 00:18:55 +00:00
jeff	3e36051ca6	- Add a placeholder for sigwait	2003-03-31 23:36:40 +00:00
jeff	3946316f71	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.	2003-03-31 22:49:17 +00:00
jeff	e81bb84595	- Change trapsignal() to accept a thread and not a proc. - Change all consumers to pass in a thread. Right now this does not cause any functional changes but it will be important later when signals can be delivered to specific threads.	2003-03-31 22:02:38 +00:00
jeff	ff354db2d8	- Use sigexit() instead of twiddling the signal mask, catch, ignore, and action bits to allow SIGILL to work as expected. This brings this file in line with other architectures.	2003-03-31 21:40:47 +00:00
das	3fda8d8ac7	Correct LDBL_* constants based on values from i386.	2003-03-27 20:38:22 +00:00
jake	a780914035	- Add vm_paddr_t, a physical address type. This is required for systems where physical addresses are larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long. Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms. Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)	2003-03-25 00:07:06 +00:00
ru	5f25fa151f	Remove bitrot associated with `maxusers'. Submitted by: bde	2003-03-22 14:18:23 +00:00
mux	09448722ff	Use atomic operations to increment and decrement the refcount in busdma tags. There are currently no tags shared across different drivers so this isn't needed at the moment, but it will be required when we'll have a proper newbus method to get the parent busdma tag.	2003-03-20 19:45:26 +00:00
jake	4f8dd3b959	Made the prototypes for pmap_kenter and pmap_kremove MD. These functions are machine dependent because they are not required to update the tlb when mappings are added or removed, and doing so is machine dependent. In addition, an implementation may require that pages mapped with pmap_kenter have a backing vm_page_t, which is not necessarily true of all physical pages, and so may choose to pass the vm_page_t to pmap_kenter instead of the physical address in order to make this requirement clear.	2003-03-16 04:16:03 +00:00
mux	90f311efb8	Bah, get it right this time and add sys/lock.h before sys/mutex.h.	2003-03-14 13:30:31 +00:00
mux	62daf20ba8	Oops, add missing includes. Pass me the pointy hat. Reported by: jake	2003-03-14 00:04:37 +00:00
mux	f834cef113	Grab Giant around calls to contigmalloc() and contigfree() so that drivers converted to be MP safe don't have to deal with it.	2003-03-13 17:18:48 +00:00
mux	35953f3b7d	Memory allocated with contigmalloc() should be freed with contigfree(), not with free().	2003-03-13 17:10:54 +00:00
marcel	86ab2164e6	Fix two rounds of breakages and cleanup. Remove the sccdebug sysctl while I'm here and garbage collect dead code (ssc_clone). Define d_maxsize as DFLTPHYS for now because that's what it will be if we don't define it.	2003-03-10 01:58:31 +00:00
phk	d7d1ad0eb0	Centralize the devstat handling for all GEOM disk device drivers in geom_disk.c. As a side effect this makes a lot of #include <sys/devicestat.h> lines not needed and some biofinish() calls can be reduced to biodone() again.	2003-03-08 08:01:31 +00:00

1211 Commits
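
Note on commit 6a36805952 (RID allocation): the message argues for a fixed-size bitmap and a deterministic search instead of arc4random(). The following is a minimal sketch of that idea only, written in Python rather than kernel C. The class name RidAllocator, its methods, and the next_hint scan are hypothetical illustrations and are not taken from the FreeBSD pmap code. Only the 64KB map size and the resulting 2^19 IDs come from the commit message.

```python
# Hypothetical sketch of a first-free bitmap allocator for region IDs.
# Names and structure are illustrative; this is not the FreeBSD implementation.

RID_MAP_BYTES = 64 * 1024          # map size cap stated in the commit message
RID_COUNT = RID_MAP_BYTES * 8      # one bit per region ID, i.e. 2**19 IDs


class RidAllocator:
    """Hands out unique IDs by scanning a bitmap; no randomness involved."""

    def __init__(self, count=RID_COUNT):
        self.count = count
        self.bitmap = bytearray((count + 7) // 8)
        self.next_hint = 0         # where the next scan starts (assumption)

    def _test(self, rid):
        return bool(self.bitmap[rid >> 3] & (1 << (rid & 7)))

    def _set(self, rid):
        self.bitmap[rid >> 3] |= 1 << (rid & 7)

    def _clear(self, rid):
        self.bitmap[rid >> 3] &= ~(1 << (rid & 7)) & 0xFF

    def alloc(self):
        # At most one pass over the map: the retry count is bounded,
        # unlike drawing random numbers until one happens to be free.
        for offset in range(self.count):
            rid = (self.next_hint + offset) % self.count
            if not self._test(rid):
                self._set(rid)
                self.next_hint = (rid + 1) % self.count
                return rid
        raise RuntimeError("out of region IDs")

    def free(self, rid):
        if not self._test(rid):
            raise ValueError("region ID %d is not allocated" % rid)
        self._clear(rid)
```

The point of the sketch is the bounded scan: a full map is detected after one pass, whereas a random pick can collide repeatedly once most IDs are taken.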