freebsd-skq

Author	SHA1	Message	Date
marcel	a524773fc0	Stop linking against a direct-mapped virtual address and instead use the PBVM. This eliminates the implied hardcoding of the physical address at which the kernel needs to be loaded. Using the PBVM makes it possible to load the kernel irrespective of the physical memory organization and allows us to replicate kernel text on NUMA machines. While here, reduce the direct-mapped page size to the kernel's page size so that we can support memory attributes better.	2011-04-30 20:49:00 +00:00
jhb	08955ceac0	Change rman_manage_region() to actually honor the rm_start and rm_end constraints on the rman and reject attempts to manage a region that is out of range. - Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of ~0ul). - To preserve existing behavior, change rman_init() to set rm_start and rm_end to allow managing the full range (0 to ~0ul) if they are not set by the caller when rman_init() is called.	2011-04-29 18:41:21 +00:00
attilio	d685681d59	Add the watchdogs patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let watchdog fire, fi they take too long. I implemented the stubs this way because I really want wdog_kern_* KPI to not be dependant by SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also avoid to call them when not necessary (avoiding not-volountary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks	2011-04-28 16:02:05 +00:00
rmacklem	66b402e198	This patch changes head so that the default NFS client is now the new NFS client (which I guess is no longer experimental). The fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs". Although mounts via fstype "nfs" will usually work without userland changes, an updated mount_nfs(8) binary is needed for kernels built with "options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and mount(8) binaries are needed to do mounts for fstype "oldnfs". The GENERIC kernel configs have been changed to use options NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER. For kernels being used on diskless NFS root systems, "options NFSCL" must be in the kernel config. Discussed on freebsd-fs@.	2011-04-27 17:51:51 +00:00
marcel	f3caa733d6	Remove prototypes of non-existent functions.	2011-04-25 22:38:09 +00:00
mav	512a6cd715	Switch the GENERIC kernels for all architectures to the new CAM-based ATA stack. It means that all legacy ATA drivers are disabled and replaced by respective CAM drivers. If you are using ATA device names in /etc/fstab or other places, make sure to update them respectively (adX -> adaY, acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential numbers for each type in order of detection, unless configured otherwise with tunables, see cam(4)). ataraid(4) functionality is now supported by the RAID GEOM class. To use it you can load geom_raid kernel module and use graid(8) tool for management. Instead of /dev/arX device names, use /dev/raid/rX.	2011-04-24 08:58:58 +00:00
marcel	09f8bacb54	Use the new arch_loadaddr I/F to align ELF objects to PBVM page boundaries. For good measure, align all other objects to cache lines boundaries. Use the new arch_loadseg I/F to keep track of kernel text and data so that we can wire as much of it as is possible. It is the responsibility of the kernel to link critical (read IVT related) code and data at the front of the respective segment so that it's covered by TRs before the kernel has a chance to add more translations. Use a better way of determining whether we're loading a legacy kernel or not. We can't check for the presence of the PBVM page table, because we may have unloaded that kernel and loaded an older (legacy) kernel after that. Simply use the latest load address for it.	2011-04-03 23:49:20 +00:00
kib	7c2eaa21fe	Add support for executing the FreeBSD 1/i386 a.out binaries on amd64. In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64. The changes are hidden under COMPAT_43. MFC after: 1 month	2011-04-01 11:16:29 +00:00
alc	411b69f8a0	Eliminate an unused definition. Reviewed by: marcel	2011-03-26 20:40:33 +00:00
marcel	cd817c0d10	Fix switching to physical mode as part of calling into EFI runtime services or PAL procedures. The new implementation is based on specific functions that are known to be called in certain scenarios only. This in particular fixes the PAL call to obtain information about translation registers. In general, the new implementation does not bank on virtual addresses being direct-mapped and will work when the kernel uses PBVM. When new scenarios need to be supported, new functions are added if the existing functions cannot be changed to handle the new scenario. If a single generic implementation is possible, it will become clear in due time. While here, change bootinfo to a pointer type in anticipation of future development.	2011-03-21 18:20:53 +00:00
marcel	e347771c19	Change region 4 to be part of the kernel. This serves 2 purposes: 1. The PBVM is in region 4, so if we want to make use of it, we need region 4 freed up. 2. Region 4 and above cannot be represented by an off_t by virtue of that type being signed. This is problematic for truss(1), ktrace(1) and other such programs.	2011-03-21 01:09:50 +00:00
bz	c41eae2d13	For now remove options FLOWTABLE from the remaining GENERIC kernel configurations and make it opt-in for those who want it. LINT will still build it. While it may be a perfect win in some scenarios, it still troubles users (see PRs) in general cases. In addition we are still allocating resources even if disabled by sysctl and still leak arp/nd6 entries in case of interface destruction. Discussed with: qingli (2010-11-24, just never executed) Discussed with: juli (OCTEON1) PR: kern/148018, kern/155604, kern/144917, kern/146792 MFC after: 2 weeks	2011-03-19 15:50:34 +00:00
marcel	2fa9bafa3d	o Move the IVT and supporting functions to the front of the text segment so that it's always mapped by the loader. o Change the alternate fault handlers to account for PBVM. Since currently the region is handled by the VHPT, no alternate faults will be generated for it.	2011-03-18 22:45:43 +00:00
marcel	1e701a660a	Remove inclusion of unneeded bootinfo.h header.	2011-03-18 22:33:19 +00:00
marcel	a49f210540	Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though it's not used anywhere in the source tree.	2011-03-18 15:36:28 +00:00
marcel	8e0b0a2284	MFaltix: Add support for Pre-Boot Virtual Memory (PBVM) to the loader. PBVM allows us to link the kernel at a fixed virtual address without having to make any assumptions about the physical memory layout. On the SGI Altix 350 for example, there's no usuable physical memory below 192GB. Also, the PBVM allows us to control better where we're going to physically load the kernel and its modules so that we can make sure we load the kernel in memory that's close to the BSP. The PBVM is managed by a simple page table. The minimum size of the page table is 4KB (EFI page size) and the maximum is currently set to 1MB. A page in the PBVM is 64KB, as that's the maximum alignment one can specify in a linker script. The bottom line is that PBVM is between 64KB and 8GB in size. The loader maps the PBVM page table at a fixed virtual address and using a single translations. The PBVM itself is also mapped using a single translation for a maximum of 32MB. While here, increase the heap in the EFI loader from 512KB to 2MB and set the stage for supporting relocatable modules.	2011-03-16 03:53:18 +00:00
mdf	f225a1fc1c	Mostly revert r219468, as I had misremembered the C standard regarding the size of an extern array. Keep one change from strncpy to strlcpy.	2011-03-11 18:56:55 +00:00
mdf	675238071f	Use MAXPATHLEN rather than the size of an extern array when copying the kernel name. Also consistenly use strlcpy(). Suggested by: Warner Losh	2011-03-10 22:56:00 +00:00
dchagin	69b8756d3d	Extend struct sysvec with new method sv_schedtail, which is used for an explicit process at fork trampoline path instead of eventhadler(schedtail) invocation for each child process. Remove eventhandler(schedtail) code and change linux ABI to use newly added sysvec method. While here replace explicit comparing of module sysentvec structure with the newly created process sysentvec to detect the linux ABI. Discussed with: kib MFC after: 2 Week	2011-03-08 19:01:45 +00:00
alc	2f4da8e71e	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
marcel	78c6e82cfc	Comment-out FLOWTABLE. It causes a kernel panic due to a misaligned memory access related to an IPv6 route update. PR: kern/148018	2011-02-06 22:18:37 +00:00
mdf	b291e9a365	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week	2011-02-02 16:35:10 +00:00
pluknet	5f536fc1d3	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe	2011-01-21 10:26:26 +00:00
jkim	c94e91a673	Remove empty dev_mem_md_init() stubs.	2011-01-17 23:06:47 +00:00
jkim	ea861abf2a	Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set(). Compile sys/dev/mem/memutil.c for all supported platforms and remove now unnecessary dev_mem_md_init(). Consistently define mem_range_softc from mem.c for all platforms. Add missing #include guards for machine/memdev.h and sys/memrange.h. Clean up some nearby style(9) nits. MFC after: 1 month	2011-01-17 22:58:28 +00:00
jhb	c17f46e472	Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes. Reviewed by: bde	2011-01-11 13:59:06 +00:00
kib	4f8260e700	Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out. Comments wording and reviewed by: alc	2011-01-09 12:50:44 +00:00
das	404cc61926	The highest-precision floating point type on ia64 has 64 bits of precision, so DECIMAL_DIG should be 21, as on i386/amd64.	2011-01-09 06:05:02 +00:00
tijl	89281909e1	On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and corresponding macros) are different from 32 bit. [1] Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX. Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition for (u)intmax_t. Do this on all architectures for consistency. Suggested by: bde [1] Approved by: kib (mentor)	2011-01-08 12:43:05 +00:00
tijl	af03e997ba	Fix types of some values in machine/_limits.h. On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int. However, lacking integer suffixes for types smaller than int, their type should correspond to that of an object of type unsigned char (or short) when used in an expression with objects of type int. In that case unsigned char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and USHRT_MAX should also be int. Where MIN/MAX constants implicitly have the correct type the suffix has been removed. While here, correct some comments. Reviewed by: bde Approved by: kib (mentor)	2011-01-08 11:13:34 +00:00
kib	ed862725de	Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the initial stack protection set by the kernel image activator.	2011-01-07 14:22:34 +00:00
brucec	6e3faf1602	Revert r216134. This checkin broke platforms where bus_space are macros: they need to be a single statement, and do { } while (0) doesn't work in this situation so revert until a solution can be devised.	2010-12-03 07:09:23 +00:00
brucec	dc1c4b9270	Disallow passing in a count of zero bytes to the bus_space(9) functions. Passing a count of zero on i386 and amd64 for [I386\|AMD64]_BUS_SPACE_MEM causes a crash/hang since the 'loop' instruction decrements the counter before checking if it's zero. PR: kern/80980 Discussed with: jhb	2010-12-02 22:19:30 +00:00
alc	de534f6887	phys_avail[] is correctly defined as an array of vm_paddr_t's in machdep.c. Use that same type, and not vm_offset_t, in this include file.	2010-12-01 05:52:27 +00:00
jhb	acd72eb169	- Remove <machine/mutex.h>. Most of the headers were empty, and the contents of the ones that were not empty were stale and unused. - Now that <machine/mutex.h> no longer exists, there is no need to allow it to override various helper macros in <sys/mutex.h>. - Rename various helper macros for low-level operations on mutexes to live in the _mtx_* or __mtx_* namespaces. While here, change the names to more closely match the real API functions they are backing. - Drop support for including <sys/mutex.h> in assembly source files. Suggested by: bde (1, 2)	2010-11-09 20:46:41 +00:00
jhb	c016e5df49	Remove unused includes of <sys/mutex.h> and <machine/mutex.h>.	2010-11-09 20:41:10 +00:00
jkim	36651dc0f0	Reduce diff between platforms and fix style(9) bugs.	2010-11-09 00:14:39 +00:00
jhb	45c0759920	Adjust the order of operations in spinlock_enter() and spinlock_exit() to work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers. In cooperation with: bde MFC after: 1 month	2010-11-05 13:42:58 +00:00
neel	11129fcf49	Fix bogus error message from bus_dmamem_alloc() about incorrect alignment. The check for alignment should be made against the physical address and not the virtual address that maps it. Sponsored by: NetApp Submitted by: Will McGovern (will at netapp dot com) Reviewed by: mjacob, jhb	2010-09-29 21:53:11 +00:00
imp	425d541f16	Remove clauses 3 and 4, per changes to NetBSD versions of these files.	2010-09-25 04:41:42 +00:00
avg	c9fe8ad7f0	bus_add_child: change type of order parameter to u_int This reflects actual type used to store and compare child device orders. Change is mostly done via a Coccinelle (soon to be devel/coccinelle) semantic patch. Verified by LINT+modules kernel builds. Followup to: r212213 MFC after: 10 days	2010-09-10 11:19:03 +00:00
jhb	d02cab2556	Remove unused KTRACE includes.	2010-08-19 16:41:27 +00:00
kib	d9f088a03e	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month	2010-08-17 08:55:45 +00:00
kib	8e1e89f01b	Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well. MFC after: 1 week	2010-08-07 11:57:13 +00:00
jhb	19ddbf5c38	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
jhb	007376c918	Mark the __curthread() functions as __pure2 and remove the volatile keyword from the inline assembly. This allows the compiler to cache invocations of curthread since it's value does not change within a thread context. Submitted by: zec (i386) MFC after: 1 week	2010-07-29 18:44:10 +00:00
mdf	6857471cf3	Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma zones for each malloc bucket size. The purpose is to isolate different malloc types into hash classes, so that any buffer overruns or use-after-free will usually only affect memory from malloc types in that hash class. This is purely a debugging tool; by varying the hash function and tracking which hash class was corrupted, the intersection of the hash classes from each instance will point to a single malloc type that is being misused. At this point inspection or memguard(9) can be used to catch the offending code. Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files. The suggestion to have this on by default came from Kostik Belousov on -arch. This code is based on work by Ron Steinke at Isilon Systems. Reviewed by: -arch (mostly silence) Reviewed by: zml Approved by: zml (mentor)	2010-07-28 15:36:12 +00:00
jhb	f27c8b35e2	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfulling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
kib	9ac2754b6d	When compat32 binary asks for the value of hw.machine_arch, report the name of 32bit sibling architecture instead of the host one. Do the same for hw.machine on amd64. Add a safety belt debug.adaptive_machine_arch sysctl, to turn the substitution off. Reviewed by: jhb, nwhitehorn MFC after: 2 weeks	2010-07-22 09:13:49 +00:00
marcel	d729021fb8	Add acpi_find_table() -- a convenience function for looking up an ACPI table given the signature.	2010-07-07 20:07:33 +00:00
marcel	6268aba2f1	Remove pointless BOOTP conditional.	2010-07-07 19:34:48 +00:00
marcel	4f88b67ea9	Provide more examples for error injection.	2010-07-06 23:13:21 +00:00
marcel	94d3d209e1	Allocate and setup an interrupt vector for corrected machine checks. For now, just print when we get the interrupt, but eventually we need to collect the details and provide a more useful report.	2010-07-03 20:19:20 +00:00
marcel	f9988e6402	When compiling with profiling, we define PROF for userspace and GPROF for the kernel.	2010-07-01 00:30:35 +00:00
marcel	88c7d17a4c	While functions are ideally aligned to a 32-byte boundary, don't assume this to be the case.	2010-06-30 22:29:02 +00:00
jhb	de324e256c	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.	2010-06-30 18:03:42 +00:00
marcel	bd74f111f8	The ptc.g operation for the Mckinley and Madison processors has the side-effect of purging more than the requested translation. While this is not a problem in general, it invalidates the assumption made during constructing the trapframe on entry into the kernel in SMP configurations. The assumption is that only the first store to the stack will possibly cause a TLB miss. Since the ptc.g purges the translation caches of all CPUs in the coherency domain, a ptc.g executed on one CPU can cause a purge on another CPU that is currently running the critical code that saves the state to the trapframe. This can cause an unexpected TLB miss and with interrupt collection disabled this means an unexpected data nested TLB fault. A data nested TLB fault will not save any context, nor provide a way for software to determine what caused the TLB miss nor where it occured. Careful construction of the kernel entry and exit code allows us to handle a TLB miss in precisely orchastrated points and thereby avoiding the need to wire the kernel stack, but the unexpected TLB miss caused by the ptc.g instructution resulted in an unrecoverable condition and resulting in machine checks. The solution to this problem is to synchronize the kernel entry on all CPUs with the use of the ptc.g instruction on a single CPU by implementing a bare-bones readers-writer lock that allows N readers (= N CPUs entering the kernel) and 1 writer (= execution of the ptc.g instruction on some CPU). This solution wins over a rendez-vous approach by not interrupting CPUs with an IPI. This problem has not been observed on the Montecito. PR: ia64/147772 MFC after: 6 days	2010-06-12 01:45:29 +00:00
alc	6a3535c3fa	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)	2010-06-11 15:49:39 +00:00
marcel	a502e20888	Bump MAX_BPAGES from 256 to 1024. It seems that a few drivers, bge(4) in particular, do not handle deferred DMA map load operations at all. Any error, and especially EINPROGRESS, is treated as a hard error and typically abort the current operation. The fact that the busdma code queues the load operation for when resources (i.e. bounce buffers in this particular case) are available makes this especially problematic. Bounce buffering, unlike what the PR synopsis would suggest, works fine. While on the subject, properly implement swi_vm(). PR: 147502 MFC after: 1 week	2010-06-11 03:00:32 +00:00
alc	7c212e010d	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exits_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@	2010-06-10 16:56:35 +00:00
alc	5c8460ba2e	Simplify the inner loop of get_pv_entry(): While iterating over the page's pv list, there is no point in checking whether or not the pv list is empty, wait instead until the loop completes.	2010-05-30 20:31:12 +00:00
alc	841f6f3ff0	Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.	2010-05-29 18:26:44 +00:00
alc	3f1d4b057c	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib	2010-05-26 18:00:44 +00:00
kib	83ca04e61c	Change ia64' struct syscall_args definition so that args is a pointer to the arguments array instead of array itself. ia64 syscall arguments are readily available in the frame, point args to it, do not do unnecessary bcopy. Still reserve the array in syscall_args for ia32 emulation. Suggested and reviewed by: marcel MFC after: 1 month	2010-05-24 17:24:14 +00:00
alc	32b13ee957	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
kib	4208ccbe79	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
jhb	cf780ce267	- Adjust the whitespace for the lines that output fields in 'show pcpu' in DDB so that all the fields line up. - Print out the tid of the per-CPU idlethread instead of the pid since the idle process is now shared across all idle threads. MFC after: 1 month	2010-05-21 17:17:56 +00:00
marcel	b651ec8311	Switch to C99 exact-width types.	2010-05-19 00:23:10 +00:00
alc	f6c07c5b87	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.	2010-05-16 23:45:10 +00:00
alc	40b44f9713	Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
kmacy	1dc1263413	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
alc	df2257e061	MFamd64/i386 r207205 Clearing a page table entry's accessed bit and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified, so don't do it. Moreover, on ia64, don't set the page's dirty field unless pmap_protect() is removing write access.	2010-04-29 15:47:31 +00:00
attilio	6dfd3f3030	- Extract the IODEV_PIO interface from ia64 and make it MI. In the end, it does help fixing /dev/io usage from multithreaded processes. - On i386 and amd64 the old behaviour is kept but multithreaded processes must use the new interface in order to work well. - Support for the other architectures is greatly improved, where necessary, by the necessity to define very small things now. Manpage update will happen shortly. Sponsored by: Sandvine Incorporated PR: threads/116181 Reviewed by: emaste, marcel MFC after: 3 weeks	2010-04-28 15:38:01 +00:00
kib	e20b2d597f	Style: use #define<TAB> instead of #define<SPACE>. Noted by: bde, pluknet gmail com MFC after: 11 days	2010-04-27 09:48:43 +00:00
alc	0a905b1db9	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!	2010-04-24 17:32:52 +00:00
kib	e91c695f77	Move the constants specifying the size of struct kinfo_proc into machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. Submitted by: pluknet Reviewed by: imp, jhb, nwhitehorn MFC after: 2 weeks	2010-04-24 12:49:52 +00:00
thompsa	bd3f3db8dd	Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had the illusion of a tunable setting but was always turned on regardless. MFC after: 1 week	2010-04-22 21:31:34 +00:00
marcel	7cced2abf4	Populate the sysctl tree with any MCA records we collected. The sequence number is used as the name of a sysctl node, under which we add the MCA records using the CPU id as the leaf name. Add the hw.mca.inject sysctl to provide a way to inject MC errors and trigger machine checks. PR: ia64/113102	2010-04-13 22:20:12 +00:00
marcel	3ac1019f84	Change the (generic) argument to ia64_store_mca_state() from the cpuid to the struct pcpu of the CPU. We casting between pointer types only then.	2010-04-13 15:55:18 +00:00
marcel	b6a532016f	o s/u_int64_t/uint64_t/g o style(9) fixes.	2010-04-13 15:51:25 +00:00
marcel	af4d5feefb	Sync up to SDM 2.2.	2010-04-13 03:10:38 +00:00
marcel	35f33c0a98	Bring up-to-date: o Switch to ITANIUM2 has the cpu. This has absolutely no effect on the code, but makes for a better example. o Drop COMPAT_FREEBSD6. We're tier 2, so you're supposed to run 8-stable or newer. o Add PREEMPTION. It works now. o Remove HWPMC_HOOKS. We don't have support for hwpmc yet. o Add a bunch of new devices: atapist, hptiop, amr, ips, twa, igb, ixgbe, ae, age, alc, ale, bce, bfe, et, jme, msk, nge, sk, ste, stge, tx, vge, axe, rue, udav, fwip, and all USB serial. o Remove "legacy" devices: le, vx, dc, pcn, rl, sis. Make sure to the module list is a superset of what goes into GENERIC.	2010-03-27 06:53:11 +00:00
marcel	bf41341894	Implement interrupt to CPU binding. Assign interrupts to CPUs in a round-robin fashion, starting with the highest priority interrupt on the highest-numbered CPU and cycling downwards.	2010-03-27 05:40:50 +00:00
marcel	8b9600e577	Remove nx_pcibus from the nexus resource. Nexus is not involved with PCI busses. Remove nexus_read_ivar() and nexus_write_ivar() to give default behaviour. Remove <machine/nexusvar.h> as well, because there's nothing in it that's being used.	2010-03-27 03:15:34 +00:00
marcel	fad010c732	Rename disable_intr() to ia64_disable_intr() and rename enable_intr() to ia64_enable_intr(). This reduces confusion with intr_disable() and intr_restore(). Have configure_final() call ia64_finalize_intr() instead of enable_intr() in preparation of adding support for binding interrupts to all CPUs.	2010-03-26 21:22:02 +00:00
marcel	a8e6531463	Only use the interval timer for clock interrupts on the BSP and have the BSP use IPIs to trigger clock interrupts on the APs. This allows us to run on hardware configurations for which the ITC has non-uniform frequencies across CPUs. While here, change the clock XIV to type IPI so as to protect the interrupt delivery against CPU re-balancing once that's implemented.	2010-03-26 02:29:15 +00:00
nwhitehorn	7de0bbaff5	Fix the ia64 build. Pointy hat to: me	2010-03-26 00:53:13 +00:00
nwhitehorn	d63c82a6ac	Change the arguments of exec_setregs() so that it receives a pointer to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms. Reviewed by: jhb	2010-03-25 14:24:00 +00:00
marcel	e8d4382fb5	o Remove the pmap argument to pmap_invalidate_all() as it's not used other than in a potentially dangerous KASSERT. o Hand-inline pmap_remove_page() as it's only called from 1 place and the abstraction that pmap_remove_page() provides is not enough to warrant the obfuscation. Eliminate the dangerous KASSERT in the process. o In pmap_remove_pte(), remove the KASSERT for pmap being the current one as it's not safe in the face of CPU migration.	2010-03-22 18:24:42 +00:00
marcel	a52af8cfd6	Drop the pmap argument to pmap_invalidate_page(). It's not used other than in a KASSERT. The KASSERT is broken in that it's done outside the critical section and as such isn't protected against CPU migration. Improve pmap_invalidate_page() as follows: o calculate vhpt_ofs inside the critical region for exactly the same reason. o calculate the tag outside the FOREACH loop, as it's loop-invariant. This is more efficient. o Replace the test and set with an atomic cmpset operation because we are changing other CPU's VHPT tables and this avoids invalidating after the entry got modified. Not necessarily a problem, but better safe than sorry.	2010-03-22 04:24:19 +00:00
marcel	faf8fb6e5b	With preemption, the high FP registers may get enabled by cpu_switch() before we grab the mutex. Don't assert that they must be disabled at that point. We pretty much bypass all logic in that case anyway and leave immediately, so there's no harm.	2010-03-22 04:01:45 +00:00
marcel	1d18613088	Fix interrupt handling by extending the critical region so that preemption doesn't happen until after all pending interrupt have been services. While here again, simplify the EOI handling by doing it after we call the XIV-specific handlers, rather than in each of them. The original thought was that we may want to do an EOI first and the actual IPI handling next, but that's mostly a micro-optimization.	2010-03-22 03:55:18 +00:00
marcel	6b762ecd8b	Disable interrupts when calling into SAL for PCI configuration cycles. This serves 2 purposes: 1. It prevents preemption and CPU migration while running SAL code. 2. It reduces the chance of stack overflows: we're supposed to enter SAL with at least 16KB of either memory- or register stack space, which we can't do without switching to a different stack.	2010-03-22 03:06:11 +00:00
marcel	d5582bd601	Define curthread as an inline function that loads the thread pointer directly from r13, the pcpu pointer. This guarantees correct behaviour when the thread migrates to a different CPU.	2010-03-22 02:01:33 +00:00
marcel	db34b4a5eb	Print MD fields in the pcpu to aid debugging.	2010-03-21 22:39:11 +00:00
marcel	8ed51dd270	Don't include <machine/_regset.h> when _MACHINE_REGSET_H_ in defined. This is not for multiple inclusion purposes, because _regset.h already handles this, but to enable inclusion of the MD header by cross-tools on non-ia64 installations. The cross-tool can include _regset.h itself before including MD headers that depend on it.	2010-03-21 22:33:09 +00:00
marcel	964c1781bd	Don't check for boot_verbose in the environment. The loader does that already and sets RB_VERBOSE. The loader has always done it.	2010-03-20 04:22:22 +00:00
marcel	a16f0fefdc	Revamp the interrupt code based on the previous commit: o Introduce XIV, eXternal Interrupt Vector, to differentiate from the interrupts vectors that are offsets in the IVT (Interrupt Vector Table). There's a vector for external interrupts, which are based on the XIVs. o Keep track of allocated and reserved XIVs so that we can assign XIVs without hardcoding anything. When XIVs are allocated, an interrupt handler and a class is specified for the XIV. Classes are: 1. architecture-defined: XIV 15 is returned when no external interrupt are pending, 2. platform-defined: SAL reports which XIV is used to wakeup an AP (typically 0xFF, but it's 0x12 for the Altix 350). 3. inter-processor interrupts: allocated for SMP support and non-redirectable. 4. device interrupts (i.e. IRQs): allocated when devices are discovered and are redirectable. o Rewrite the central interrupt handler to call the per-XIV interrupt handler and rename it to ia64_handle_intr(). Move the per-XIV handler implementation to the file where we have the XIV allocation/reservation. Clock interrupt handling is moved to clock.c. IPI handling is moved to mp_machdep.c. o Drop support for the Intel 8259A because it was broken. When XIV 0 is received, the CPU should initiate an INTA cycle to obtain the interrupt vector of the 8259-based interrupt. In these cases the interrupt controller we should be talking to WRT to masking on signalling EOI is the 8259 and not the I/O SAPIC. This requires adriver for the Intel 8259A which isn't available for ia64. Thus stop pretending to support ExtINTs and instead panic() so that if we come across hardware that has an Intel 8259A, so have something real to work with. o With XIVs for IPIs dynamically allocatedi and also based on priority, define the IPI_* symbols as variables rather than constants. The variable holds the XIV allocated for the IPI. o IPI_STOP_HARD delivers a NMI if possible. Otherwise the XIV assigned to IPI_STOP is delivered.	2010-03-17 00:37:15 +00:00
marcel	38cee4c837	Have cpu_throw() loop on blocked_lock as well. This bug has existed a long time and has gone unnoticed just as long, because I kept using sched_4bsd (due to sched_ule not working with preemption), but GENERIC had sched_ule by default -- including SMP. While here, remove unused inclusion of <machine/clock.h>, remove totally bogus inclusion of <i386/include/specialreg.h>.	2010-03-15 16:53:09 +00:00
ed	6cb70302d9	Remove COMPAT_43TTY from stock kernel configuration files. COMPAT_43TTY enables the sgtty interface. Even though its exposure has only been removed in FreeBSD 8.0, it wasn't used by anything in the base system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your ports/packages are less than two years old, they will prefer termios over sgtty.	2010-03-13 09:21:00 +00:00
nwhitehorn	4f924bab25	Accidentally committed test code. Remove it. Big pointy hat: me	2010-03-11 14:54:54 +00:00
nwhitehorn	142a4d2993	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb	2010-03-11 14:49:06 +00:00
marcel	1fe700c954	Remove inclusion of <i386/include/psl.h> While here move inclusion of <sys/lock.h> in a better place.	2010-03-09 02:08:02 +00:00
marcel	09b202ca29	Remove support for SYS_RES_DRQ.	2010-03-09 02:05:01 +00:00
joel	2e980c4bcf	The NetBSD Foundation has granted permission to remove clause 3 and 4 from the software. Obtained from: NetBSD	2010-03-03 17:55:51 +00:00
marcel	2600409816	Interrupt related cleanups: o Assign vectors based on priority, because vectors have implied priority in hardware. o Use unordered memory accesses to the I/O sapic and use the acceptance form of the mf instruction. o Remove the sapicreg.h and sapicvar.h headers. All definitions in sapicreg.h are private to sapic.c and all definitions in sapicvar.h are either private or interface functions. Move the interface functions to intr.h. o Hide the definition of struct sapic.	2010-02-27 18:55:43 +00:00
marcel	27371be823	Prefer I-units and M-units for nop instructions. This works around McKinley flaws. It also avoids using the F-unit in the kernel for no reason.	2010-02-22 01:23:41 +00:00
marcel	61400e541a	Normalize nop instructions: Only use 0 for the immediate operand.	2010-02-21 23:41:59 +00:00
marcel	710af9deaa	Remove pm_active from struct pmap as it serves no purpose. MFC after: 1 week	2010-02-21 23:10:13 +00:00
attilio	b9f41eb470	Adjust style (following the already existing rules) for the newly introduced option DEADLKRES. Reported by: danfe, julian, avg	2010-02-15 23:44:48 +00:00
marcel	fa7b6a95c4	Some code cleanups: o s/u_int32_t/uint32_t/g o Add multiple-inclusion protection. o Break long lines.	2010-02-14 17:03:20 +00:00
marcel	d64c132e24	Some code churn: o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used by calling either bus_space_map() or pmap_mapdev(). o Implement bus_space_map() in terms of pmap_mapdev() and implement bus_space_unmap() in terms of pmap_unmapdev(). o Have ia64_pib hold the uncached virtual address of the processor interrupt block throughout the kernel's life and access the elements of the PIB through this structure pointer. This is a non-functional change with the exception of using ia64_ld1() and ia64_st8() to write to the PIB. We were still using assignments, for which the compiler generates semaphore reads -- which cause undefined behaviour for uncacheable memory. Note also that the memory barriers in ipi_send() are critical for proper functioning. With all the mapping of uncached memory done by pmap_mapdev(), we can keep track of the translations and wire them in the CPU. This then eliminates the need to reserve a whole region for uncached I/O and it eliminates translation traps for device I/O accesses.	2010-02-14 16:56:24 +00:00
attilio	184538e270	Add the options DEADLKRES (introducing the deadlock resolver thread) in the 'debugging' section of any HEAD kernel and enable for the mainstream ones, excluding the embedded architectures. It may, of course, enabled on a case-by-case basis. Sponsored by: Sandvine Incorporated Requested by: emaste Discussed with: kib	2010-02-10 16:30:04 +00:00
marcel	a85ee262b8	Fix single-stepping when the kernel was entered through the EPC syscall path. When the taken branch leaves the kernel and enters the process, we still need to execute the instruction at that address. Don't raise SIGTRAP when we branch into the process, but enable single-stepping instead.	2010-02-06 20:46:14 +00:00
marcel	973b5fa5f2	In pci_cfgregread() and pci_cfgregwrite(), validate the arguments and check that the alignment matches the width of the read or write.	2010-01-28 04:50:09 +00:00
marcel	0e7685cf23	In cpu_switch(), use an atomic operation to set the td_lock of the old thread to the mutex that's passed. Pointed out by: attilio, jhb	2010-01-27 02:32:07 +00:00
marcel	52263e30e2	Remove cpu_boot() and call efi_reset_system() directly from cpu_reset().	2010-01-23 23:16:50 +00:00
marcel	47afb8a1a7	Add ioctl requests to /dev/io on ia64 for reading and writing EFI variables. The primary reason for this is that it allows sysinstall(8) to add a boot menu item for the newly installed FreeBSD image.	2010-01-14 02:48:39 +00:00
marcel	109b8dd039	Fix previous commitr:. efi_var_set() was copied from efi_var_get(), but wasn't actually changed.	2010-01-14 02:38:46 +00:00
marcel	82fc1e77df	Add wrappers for the RT Variable Services. While here, translate the EFI status into a standard errno value and change efi_set_time() to return a standard error. MFC after: 1 week	2010-01-14 02:14:21 +00:00
marcel	ef030a7c4e	Use io(4) for I/O port access on ia64, rather than through sysarch(2). I/O port access is implemented on Itanium by reading and writing to a special region in memory. To hide details and avoid misaligned memory accesses, a process did I/O port reads and writes by making a MD system call. There's one fatal problem with this approach: unprivileged access was not being prevented. /dev/io serves that purpose on amd64/i386, so employ it on ia64 as well. Use an ioctl for doing the actual I/O and remove the sysarch(2) interface. Backward compatibility is not being considered. The sysarch(2) approach was added to support X11, but support for FreeBSD/ia64 was never fully implemented in X11. Thus, nothing gets broken that didn't need more work to begin with. MFC after: 1 week	2010-01-11 18:10:13 +00:00
imp	80a1a3fce5	Add INCLUDE_CONFIG_FILE in GENERIC on all non-embedded platforms. # This is the resolution of removing it from DEFAULTS... MFC after: 5 days	2010-01-10 17:44:22 +00:00
bz	ad608e4e42	In sys/<arch>/conf/Makefile set TARGET to <arch>. That allows sys/conf/makeLINT.mk to only do certain things for certain architectures. Note that neither arm nor mips have the Makefile there, thus essentially not (yet) supporting LINT. This would enable them do add special treatment to sys/conf/makeLINT.mk as well chosing one of the many configurations as LINT. This is a hack of doing this and keeping it in a separate commit will allow us to more easily identify and back it out. Discussed on/with: arch, jhb (as part of the LINT-VIMAGE thread) MFC after: 1 month	2010-01-08 18:57:31 +00:00
imp	699b88787b	Revert 200594. This file isn't intended for these sorts of things.	2010-01-04 21:30:04 +00:00
brooks	3071fcfc73	Add vlan(4) to all GENERIC kernels. MFC after: 1 week	2010-01-03 20:40:54 +00:00
marcel	816aa79e51	Change BUS_SPACE_MAXADDR from 2^32-1 to 2^64-1. 2^32-1 is representative for its origin, more than for its accuracy. MFC after: 1 week	2010-01-02 00:37:00 +00:00
marcel	33f49fd7d2	Revamp bus_space access functions: o Optimize for memory mapped I/O by making all I/O port acceses function calls and marking the test for the IA64_BUS_SPACE_IO tag with __predict_false(). Implement the I/O port access functions in a new file, called bus_machdep.c. o Change the bus_space_handle_t for memory mapped I/O to the virtual address rather than the physical address. This eliminates the PA->VA translation for every I/O access. The handle for I/O port access is still the port number. o Move inb(), outb(), inw(), outw(), inl(), outl(), and their string variants from cpufunc.h and define them in bus.h. On ia64 these are not CPU functions at all. In bus.h they are merely aliases for the new I/O port access functions defined in bus_machdep.h. o Handle the ACPI resource bug in nexus_set_resource(). There we can do it once so that we don't have to worry about it whenever we need to write to an I/O port that is really a memory mapped address. The upshot of this change is that the KBI is better defined and that I/O port access always involves a function call, allowing us to change the actual implementation without breaking the KBI. For memory mapped I/O the virtual address is abstracted, so that we can change the VA->PA mapping in the kernel without causing an KBI breakage. The exception at this time is for bus_space_map() and bus_space_unmap(). MFC after: 1 week.	2009-12-30 18:15:25 +00:00
rnoland	3dc3ad8568	Update d_mmap() to accept vm_ooffset_t and vm_memattr_t. This replaces d_mmap() with the d_mmap2() implementation and also changes the type of offset to vm_ooffset_t. Purge d_mmap2(). All driver modules will need to be rebuilt since D_VERSION is also bumped. Reviewed by: jhb@ MFC after: Not in this lifetime...	2009-12-29 21:51:28 +00:00
antoine	bfd388c026	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month	2009-12-28 22:56:30 +00:00
marcel	dfcc7c385f	Use unordered memory loads and stores for the in* and out* family of functions.	2009-12-26 22:22:09 +00:00
marcel	6c6778bb48	Export the bus, cpu and itc frequencies under the hw.freq sysctl node. The frequencies are in MHz (i.e. a value of 1000 represents 1GHz). The frequencies are rounded to the nearest whole MHz. While here, rename and re-type bus_frequency, processor_frequency and itc_frequency to bus_freq, cpu_freq and itc_freq and make them static. As unsigned integers, the hw.freq.cpu sysctl can more easily be made generic (across all architectures) making porting easier. MFC after: 3 days	2009-12-23 04:48:42 +00:00
marcel	84986e81a5	Add a bit definition for invalid timestamp in the record header.	2009-12-23 04:39:05 +00:00
dougb	38047fc578	Add INCLUDE_CONFIG_FILE, and a note in comments about how to also include the comments with CONFIGARGS	2009-12-16 02:17:43 +00:00
marcel	2b1df305a4	In exception_save, write-back ar.rnat after switching the backing- store. Writing to ar.bspstore is defined to leave ar.rnat undefined. PR: ia64/120315 MFC after: 3 days	2009-12-08 00:44:23 +00:00
marcel	2a4cc74b50	Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id excluded, as it's used by MI code) and mode the sysctl variables from pcpu_stats to pcpu_md. Adjust all references accordingly. While nearby, change the PCPU sysctl tree so that they match the CPU device sysctl tree -- they are now children of a static node called "machdep.cpu" and are named only with their cpu ID.	2009-12-07 06:41:27 +00:00
marcel	faf626e11c	Allocate the VHPT for each CPU in cpu_mp_start(), rather than allocating MAXCPU VHPTs up-front. This allows us to max-out MAXCPU without memory waste -- MAXCPU is now 32 for SMP kernels. This change also eliminates the VHPT scaling based in the total memory in the system. It's the workload that determines the best size of the VHPT. The workload can be affected by the amount of memory, but not necessarily. For example, there's no performance difference between VHPT sizes of 256KB, 512KB and 1MB when building the LINT kernel. This was observed with a system that has 8GB of memory. By default the kernel will allocate a 1MB VHPT. The user can tune the system with the "machdep.vhpt.log2size" tunable.	2009-12-07 00:54:02 +00:00
marcel	5ccb87e2cc	Make sure bus space accesses use unorder memory loads and stores. Memory accesses are posted in program order by virtue of the uncacheable memory attribute. Since GCC, by default, adds acquire and release semantics to volatile memory loads and stores, we need to use inline assembly to guarantee it. With inline assembly, we don't need volatile pointers anymore. Itanium does not support semaphore instructions to uncacheable memory.	2009-12-03 04:06:48 +00:00
marcel	3797d9ecfd	Move the sysctl related fields to the end of the structure and make them conditional upon _KERNEL. libkvm includes <sys/pcpu.h> and <sys/sysctl.h> does not expose the structure definitions to userland.	2009-11-29 20:17:50 +00:00
marcel	e128830dea	Eliminate teh use of MAXCPU in static arrays of interrupt counters by adding statistics counters to the PCPU structure. Export the counters through sysctl by giving each PCPU structure its own sysctl context. While here, fix cnt.v_intr by not just having it count clock interrupts, but every interrupt and add more counters for each interrupt source.	2009-11-28 21:01:15 +00:00
alc	dcb93e6c95	Simplify the invocation of vm_fault(). Specifically, eliminate the flag VM_FAULT_DIRTY. The information provided by this flag can be trivially inferred by vm_fault(). Discussed with: kib	2009-11-27 20:24:11 +00:00
marcel	0ec75e125c	Improve upon revision 196196 by removing the newly added comment in the wrong place and instead add a KASSERT in the right place.	2009-11-24 01:35:21 +00:00
marcel	f2027e763f	Revert previous commit. The problem was not related to overrunning the kernel stack at all. The new USB stack simply caused a change in timing that triggered a firmware bug more often. The addition of PRINTF_BUFR_SIZE apparently triggered the same firmware bug even more reliably. But even with KSTACK_PAGES=5, one instance of the firmware bug remained: booting with a CD inserted. This problem was run into by accident after installing Debian and having to boot FreeBSD to fixup the GPT partitioning (Thanks... not). After bumping KSTACK_PAGES to 5, it was pretty unbelievable that the stack was still being too small. After updating the firmware we could boot with a CD inserted and KSTACK_PAGES could be lowered back to 4 pages without problems. Note: It is believed to be a timing related firmware bug, because the machine check information showed access to the serial console on one CPU and access to the EHCI HCD on the other CPU. Since both are devices on the management unit and thus virtualized in some way, any execution trace that does not include concurrent access to the BMC from both CPUs is fine. Note also that it's not understood exactly how increasing the kernel stack avoided hitting the firmware bug. A change in page faults does change timing, but it's not known if that's what's happening here. In any case: the problem is being monitored. Reverting back to 4 pages for the kernel stack is preferred, because it makes it easier to switch to 16K pages (double the page size) without wasting too much memory by not being able to half the number of pages...	2009-11-23 21:09:23 +00:00
marcel	b65660166a	No need to include opt_kstack_pages.h, because KSTACK_PAGES is already defined through genassym.c	2009-11-20 07:40:02 +00:00
marcel	bbdd2d54f5	Add a seatbelt to the Nested TLB Fault handler to give us a chance to panic when we have an unexpected TLB fault while interrupt collection is disabled. Use a token rather than the actual address of the restart point to avoid the need for the movl instruction. The token is arbitrary. For the drummers: it's based on a single paradiddle.	2009-11-20 03:14:54 +00:00
marcel	e19c5f654c	opt_* headers are included using the quoted form.	2009-11-19 01:27:22 +00:00
kib	3cf53f181e	Extract the code that records syscall results in the frame into MD function cpu_set_syscall_retval(). Suggested by: marcel Reviewed by: marcel, davidxu PowerPC, ARM, ia64 changes: marcel Sparc64 tested and reviewed by: marius, also sunv reviewed MIPS tested by: gonzo MFC after: 1 month	2009-11-10 11:43:07 +00:00
marcel	943e1b107a	Reimplement the lazy FP context switching: o Move all code into a single file for easier maintenance. o Use a single global lock to avoid having to handle either multiple locks or race conditions. o Make sure to disable the high FP registers after saving or dropping them. o use msleep() to wait for the other CPU to save the high FP registers. This change fixes the high FP inconsistency panics. A single global lock typically serializes too much, which may be noticable when a lot of threads use the high FP registers, but in that case it's probably better to switch the high FP context synchronuously. Put differently: cpu_switch() should switch the high FP registers if the incoming and outgoing threads both use the high FP registers.	2009-10-31 22:27:31 +00:00
kib	ce081b037e	In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:47:58 +00:00
marcel	b15a68a1be	Add PRINTF_BUFR_SIZE=128, since we have SMP by default. While here, fix tabulation.	2009-10-24 20:35:34 +00:00
marcel	db77e6c4a7	A 32KB kernel stack is not quite enough. The new USB stack is a bit more stack hungry as compared to the old one that my RX2660 gets a machine check and spontaneously reboots at the time the USB DVD drive is found and attached to CAM as a mass storage device. This doesn't happen always, but definitely varies per kernel build. Likewise when using a 128-byte printf buffer. The additional 128 bytes that printf needs seems to be enough to have the memory stack and register stack collide and causing a machine check. Thus: Bump KSTACK_PAGES from 4 to 5.	2009-10-24 20:28:42 +00:00

1 2 3 4 5 ...

1919 Commits