freebsd-skq

Author	SHA1	Message	Date
Ed Schouten	6bfa9a2d66	Replace all calls to minor() with dev2unit(). After I removed all the unit2minor()/minor2unit() calls from the kernel yesterday, I realised calling minor() everywhere is quite confusing. Character devices now only have the ability to store a unit number, not a minor number. Remove the confusion by using dev2unit() everywhere. This commit could also be considered as a bug fix. A lot of drivers call minor(), while they should actually be calling dev2unit(). In -CURRENT this isn't a problem, but it turns out we never had any problem reports related to that issue in the past. I suspect not many people connect more than 256 pieces of the same hardware. Reviewed by: kib	2008-09-27 08:51:18 +00:00
Konstantin Belousov	a8d403e102	Change the static struct sysentvec and struct Elf_Brandinfo initializers to the C99 style. At least, it is easier to read sysent definitions that way, and search for the actual instances of sigcode etc. Explicitely initialize sysentvec.sv_maxssiz that was missed in most sysvecs. No objection from: jhb MFC after: 1 month	2008-09-24 10:14:37 +00:00
Stanislav Sedov	4f55a298ce	- Recognize SAVE and OSXSAVE extended processor features. Approved by: kib (mentor) MFC after: 1 month	2008-09-18 18:51:32 +00:00
Joseph Koshy	d0d0192f83	Correct a callchain capture bug on the i386. On the i386 architecture, the processor only saves the current value of `%esp' on stack if a privilege switch is necessary when entering the interrupt handler. Thus, `frame->tf_esp' is only valid for an entry from user mode. For interrupts taken in kernel mode, we need to determine the top-of-stack for the interrupted kernel procedure by adding the appropriate offset to the current frame pointer. Reported by: kris, Fabien Thomas Tested by: Fabien Thomas <fabien.thomas at netasq dot com>	2008-09-15 06:47:52 +00:00
John Baldwin	3591fea8b0	Add a 'hw.pci.mcfg' tunable. It can be set to 0 to disable memory-mapped PCI config access.	2008-09-11 21:42:11 +00:00
John Baldwin	289f40c67b	Update the comments above the 0xcf9 register reset attempt to match the code. We only attempt a single reset using this method (a "hard" reset), and we use two writes to ensure there is a 0 -> 1 transition in bit 2 to force a reset. MFC after: 1 week	2008-09-11 18:33:57 +00:00
John Baldwin	2d10570afe	Some K8 chipsets don't expose all of the PCI devices on bus 0 via PCIe memory-mapped config access. Add a workaround for these systems by checking the first function of each slot on bus 0 using both the memory-mapped config access and the older type 1 I/O port config access. If we find a slot that is only visible via the type 1 I/O port config access, we flag that slot. Future PCI config transactions to flagged slots on bus 0 use type 1 I/O port config access rather than memory mapped config access.	2008-09-10 18:06:08 +00:00
Konstantin Belousov	3bd5e467b2	The pcb_gs32p should be per-cpu, not per-thread pointer. This is location in GDT where the segment descriptor from pcb_gs32sd is copied, and the location is in GDT local to CPU. Noted and reviewed by: peter MFC after: 1 week	2008-09-08 09:59:05 +00:00
Konstantin Belousov	1772c3e9bb	Provide private per-CPU GDTs on amd64. This is required at least for the linux CB_GS32BIT to work. Noted by: nox Reviewed by: peter MFC after: 1 week	2008-09-08 09:55:51 +00:00
Konstantin Belousov	7b1608fde1	In linux_set_thread_area(), mark pcb as PCB_GS32BIT. This was missed when r180992 was committed. Reviewed by: peter MFC after: 1 week	2008-09-08 09:09:23 +00:00
Konstantin Belousov	575a30d883	Fix inconsistencies in the comments. MFC after: 1 week	2008-09-08 08:58:29 +00:00
Konstantin Belousov	9dee707cf0	Segment registers are stored in the uc_mcontext member of the struct l_ucontext. To restore the registers content, trampoline needs to dereference uc_mcontext instead of taking some undefined values from l_ucontext. Submitted by: Dmitry Chagin <dchagin@> MFC after: 1 week	2008-09-07 16:39:21 +00:00
Konstantin Belousov	f98c3ea74e	- When executing FreeBSD/amd64 binaries from FreeBSD/i386 or Linux/i386 processes, clear PCB_32BIT and PCB_GS32BIT bits [1]. - Reread the fs and gs bases from the msr unconditionally, not believing the values in pcb_fsbase and pcb_gsbase, since usermode may reload segment registers, invalidating the cache. [2]. Both problems resulted in the wrong fs base, causing wrong tls pointer be dereferenced in the usermode. Reported and tested by: Vyacheslav Bocharov <adeepv at gmail com> [1] Reported by: Bernd Walter <ticsoat cicely7 cicely de>, Artem Belevich <fbsdlist at src cx>[2] Reviewed by: peter MFC after: 3 days	2008-09-02 17:52:11 +00:00
Jung-uk Kim	a2b12e3b23	Move empty filter handling to MI source. MFC after: 3 days	2008-08-26 21:06:31 +00:00
Jung-uk Kim	f471e5690e	Fix a typo in copyrights.	2008-08-25 20:43:13 +00:00
John Baldwin	ad86a65e32	Adjust the handling the various timer frequencies when using the lapic timer. Previously, the various divisors were fixed which meant that while it gave somewhat reasonable stathz, etc. at hz=1000, it went off the rails with any other hz value. With these changes, we now pick a lapic timer hz based on the value of hz. If hz is >= 1500, then the lapic timer runs at hz. If 1500 hz >= 750, we run the lapic timer at hz * 2. If hz < 750, we run at hz * 4. We compute a divider at runtime to make stathz run as close to 128 as we can since stathz really wants to be run at something close to that frequency. Profiling just runs on every clock tick. So some examples: With hz = 100, the lapic timer now runs at 400 instead of 2000. stathz will be 133, and profhz = 400. With hz = 1000 (default), the lapic timer is still at 2000 (as it is now), stathz is at 133 (as it is now), and profhz will be 2000 (previously 666). MFC after: 2 weeks	2008-08-23 12:35:43 +00:00
John Baldwin	d320e05ca5	Extend the support for PCI-e memory mapped configuration space access: - Rename pciereg_cfgopen() to pcie_cfgregopen() and expose it to the rest of the kernel. It now also accepts parameters via function arguments rather than global variables. - Add a notion of minimum and maximum bus numbers and reject requests for an out of range bus. - Add more range checks on slot/func/reg/bytes parameters to the cfg reg read/write routines. Don't panic on any invalid parameters, just fail the request (writes do nothing, reads return -1). This matches the behavior of the other cfg mechanisms. - Port the memory mapped configuration space access to amd64. On amd64 we simply use the direct map (via pmap_mapdev()) for the memory mapped window. - During acpi_attach() just after loading the ACPI tables, check for a MCFG table. If it exists, call pciereg_cfgopen() on each subtable (memory mapped window). For now we only support windows for domain 0 that start with bus 0. This removes the need for more chipset-specific quirks in the MD code. - Remove the chipset-specific quirks for the Intel 5000P/V/Z chipsets since these machines should all have MCFG tables via ACPI. - Updated pci_cfgregopen() to DTRT if ACPI had invoked pcie_cfgregopen() earlier. MFC after: 2 weeks	2008-08-22 02:14:23 +00:00
Ed Schouten	bc093719ca	Integrate the new MPSAFE TTY layer to the FreeBSD operating system. The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following: - Improved driver model: The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver. - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly. - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters. Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING. Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan	2008-08-20 08:31:58 +00:00
John Baldwin	70d12a18f2	Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports already define _KERNEL to get to this and I'm about to add hooks to libkvm to access per-CPU data. MFC after: 1 week	2008-08-19 19:53:52 +00:00
Jung-uk Kim	69e08c86a5	Correctly check unsignedness of all BPF_LD\|BPF_IND instructions. This is roughly from sys/net/bpf_filter.c r1.12 and r1.14.	2008-08-18 19:14:26 +00:00
Jung-uk Kim	3bfea8682f	- Make these files compilable on user land. - Update copyrights and fix style(9).	2008-08-18 18:59:33 +00:00
Konstantin Belousov	8ad85ff260	The doreti_iret_fault code is always called with gs base MSR containing kernel gs base, because %rip is adjusted only on kernel-mode trap caused by iretq execution. On the other hand, the stack contains (hardware part of) trap frame from the usermode. As a consequence, checking for frame mode and doing swapgs causes the kernel to enter trap() with usermode gs base. Remove the check for mode and conditional swapgs, we already have right gs base in the MSR. Submitted by: Nate Eldredge <neldredge math ucsd edu> MFC after: 3 days	2008-08-18 08:47:27 +00:00
Bjoern A. Zeeb	603724d3ab	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
Jung-uk Kim	8c4d5bbc6f	Use int32_t/int16_t instead of int/short as sys/net/bpf_filter.c does.	2008-08-13 19:52:00 +00:00
Jung-uk Kim	f40611e24f	- Remove unnecessary jump instruction(s) when offset(s) is/are zero(s). - Constantly use conditional jumps for unsigned integers.	2008-08-13 19:25:09 +00:00
Jung-uk Kim	095130bf72	Update copyrights and fix style(9).	2008-08-12 21:31:31 +00:00
Jung-uk Kim	059485d074	Replace all stack usages with registers and remove unused macros.	2008-08-12 20:10:45 +00:00
John Baldwin	e80531c27f	Decode some more "exotic" instructions including: fxsave, fxrstor, ldmxcsr, stmxcsr, clflush, lfence, mfence, sfence, syscall, sysret, sysenter, sysexit, pause, monitor, mwait, and swapgs (amd64 only). MFC after: 1 week	2008-08-11 20:19:42 +00:00
Alan Cox	b09485a336	Intel describes the behavior of their processors as "undefined" if two or more mappings to the same physical page have different memory types, i.e., PAT settings. Consequently, if pmap_change_attr() is applied to a virtual address range within the kernel map, then the corresponding ranges of the direct map also need to be changed. Enhance pmap_change_attr() to handle this case automatically. Add a comment describing what pmap_change_attr() does. Discussed with: jhb	2008-08-09 05:46:13 +00:00
Stanislav Sedov	e085f869d5	- Add cpuctl(4) pseudo-device driver to provide access to some low-level features of CPUs like reading/writing machine-specific registers, retrieving cpuid data, and updating microcode. - Add cpucontrol(8) utility, that provides userland access to the features of cpuctl(4). - Add subsequent manpages. The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX is created for each cpu present in the systems. The pseudo-device minor number corresponds to the cpu number in the system. The cpuctl(4) pseudo- device allows a number of ioctl to be preformed, namely RDMSR/WRMSR/CPUID and UPDATE. The first pair alows the caller to read/write machine-specific registers from the correspondent CPU. cpuid data could be retrieved using the CPUID call, and microcode updates are applied via UPDATE. The permissions are inforced based on the pseudo-device file permissions. RDMSR/CPUID will be allowed when the caller has read access to the device node, while WRMSR/UPDATE will be granted only when the node is opened for writing. There're also a number of priv(9) checks. The cpucontrol(8) utility is intened to provide userland access to the cpuctl(4) device features. The utility also allows one to apply cpu microcode updates. Currently only Intel and AMD cpus are supported and were tested. Approved by: kib Reviewed by: rpaulo, cokane, Peter Jeremy MFC after: 1 month	2008-08-08 16:26:53 +00:00
Alan Cox	517abd0e4e	Introduce pmap_change_attr_locked().	2008-08-07 04:56:29 +00:00
Alan Cox	494c177e81	Make pmap_kenter_attr() static.	2008-08-04 08:04:09 +00:00
Ed Schouten	200d80cd74	Disconnect drivers that haven't been ported to MPSAFE TTY yet. As clearly mentioned on the mailing lists, there is a list of drivers that have not been ported to the MPSAFE TTY layer yet. Remove them from the kernel configuration files. This means people can now still use these drivers if they explicitly put them in their kernel configuration file, which is good. People should keep in mind that after August 10, these drivers will not work anymore. Even though owners of the hardware are capable of getting these drivers working again, I will see if I can at least get them to a compilable state (if time permits).	2008-08-03 10:32:17 +00:00
Alan Cox	75accfd97d	Enhance pmap_mapdev_attr(). Take advantage of recent enhancements to pmap_change_attr() in order to use the direct map for any cache mode, not just write-back mode. It is worth noting that this change also eliminates a situation in which we have two mappings to the same physical memory with different cache modes. Submitted by: Magesh Dhasayyan (with some changes by me) Discussed with: jhb	2008-08-02 03:43:54 +00:00
Alan Cox	67cbc11594	Enhance pmap_change_attr() with the ability to demote 1GB page mappings.	2008-08-01 04:55:38 +00:00
Alan Cox	ba65f767c0	Enhance pmap_change_attr(). Specifically, avoid 2MB page demotions, cache mode changes, and cache and TLB invalidation when some or all of the specified range is already mapped with the specified cache mode. Submitted by: Magesh Dhasayyan	2008-07-31 22:45:28 +00:00
Alan Cox	c1695335d1	Eliminate recomputation of the PDE by pmap_pde_attr().	2008-07-31 04:42:42 +00:00
Jack F Vogel	859ff640f3	Add igb to the default kernel MFC after:ASAP	2008-07-30 22:27:38 +00:00
Konstantin Belousov	8f4a1f3a83	Bring back the save/restore of the %ds, %es, %fs and %gs registers for the 32bit images on amd64. Change the semantic of the PCB_32BIT pcb flag to request the context switch code to operate on the segment registers. Its previous meaning of saving or restoring the %gs base offset is assigned to the new PCB_GS32BIT flag. FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit emulation sets PCB_32BIT \| PCB_GS32BIT. Reviewed by: peter MFC after: 2 weeks	2008-07-30 11:30:55 +00:00
Alan Cox	b0c139d336	Don't allow pmap_change_attr() to be applied to the recursive mapping.	2008-07-28 04:59:48 +00:00
Alan Cox	a8bb29e5d2	Add a check for 1GB page mappings to pmap_change_attr() so that it fails gracefully. (On K10 family processors the direct map is implemented using 1GB page mappings.)	2008-07-28 03:58:49 +00:00
Alan Cox	35db2ce0dc	Style fixes to several function definitions.	2008-07-27 18:18:50 +00:00
Alan Cox	91842e53a9	Enhance pmap_change_attr(). Use pmap_demote_pde() to demote a 2MB page mapping to 4KB page mappings when the specified attribute change only applies to a portion of the 2MB page. Previously, in such cases, pmap_change_attr() gave up and returned an error. Submitted by: Magesh Dhasayyan	2008-07-27 17:32:36 +00:00
Alan Cox	9a8f043722	Increase the ceiling on the size of the buffer map.	2008-07-19 23:42:38 +00:00
Alan Cox	59a23cacd4	Correct an error in pmap_change_attr()'s initial loop that verifies that the given range of addresses are mapped. Previously, the loop was testing the same address every time. Submitted by: Magesh Dhasayyan	2008-07-18 22:05:51 +00:00
Alan Cox	53d13c6030	Simplify pmap_extract()'s control flow, making it more like the related functions pmap_extract_and_hold() and pmap_kextract().	2008-07-18 20:07:50 +00:00
Alan Cox	36e6513df5	Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is specified when appropriate. Reviewed by: scottl	2008-07-15 03:34:49 +00:00
Alan Cox	cfcbf8c6fd	Handle a race between pmap_kextract() and pmap_promote_pde(). This race caused ZFS to crash when restoring a snapshot with superpage promotion enabled. Reported by: kris	2008-07-13 18:19:53 +00:00
Ed Schouten	f4d811f0b2	Make uart(4) the default serial port driver on i386 and amd64. The uart(4) driver has the advantage of supporting a wider variety of hardware on a greater amount of platforms. This driver has already been the standard on platforms such as ia64, powerpc and sparc64. I've decided not to change anything on pc98. I'd rather let people from the pc98 team look at this. Approved by: philip (mentor), marcel	2008-07-13 07:20:14 +00:00
Alan Cox	8bfadfd616	Refine the changes made in SVN rev 180430. Specifically, instantiate a new page table page only if the 2MB page mapping has been used. Also, refactor some assertions.	2008-07-12 21:24:42 +00:00
Alan Cox	85a0a1be91	In order to apply pmap_demote_pde() to a page directory entry (PDE) from the direct map, the PDE must have PG_M and PG_A preset. Noticed by: Magesh Dhasayyan	2008-07-12 18:43:57 +00:00
Alan Cox	e1cb4a353c	Extend pmap_demote_pde() to include the ability to instantiate a new page table page where none existed before.	2008-07-10 16:22:24 +00:00
Peter Wemm	401989b00b	Band-aid a problem with 32 bit selector setup. Initialize %ds, %es, and %fs during CPU startup. Otherwise a garbage value could leak to a 32-bit process if a process migrated to a different CPU after exec and the new CPU had never exec'd a 32-bit process. A more complete fix is needed, but this mitigates the most frequent manifestations. Obtained from: ups	2008-07-09 19:44:37 +00:00
Alan Cox	bb7964b205	Fix lines that are too long in pmap_growkernel() by substituting shorter but equivalent expressions.	2008-07-09 06:04:10 +00:00
Alan Cox	8136b7265f	Eliminate pmap_growkernel()'s dependence on create_pagetables() preallocating page directory pages from VM_MIN_KERNEL_ADDRESS through the end of the kernel's bss. Specifically, the dependence was in pmap_growkernel()'s one- time initialization of kernel_vm_end, not in its main body. (I could not, however, resist the urge to optimize the main body.) Reduce the number of preallocated page directory pages to just those needed to support NKPT page table pages. (In fact, this allows me to revert a couple of my earlier changes to create_pagetables().)	2008-07-08 22:59:17 +00:00
Alan Cox	6d2005e71a	Rev 180333, ``Change create_pagetables() and pmap_init() so that many fewer page table pages have to be preallocated ...'', violates an assumption made by minidumpsys(): kernel_vm_end is the highest virtual address that has ever been used by the kernel. Now, however, the kernel code, data, and bss may reside at addresses beyond kernel_vm_end. This revision modifies the upper bound on minidumpsys()'s two page table traversals to account for this possibility.	2008-07-08 04:00:22 +00:00
Xin LI	dbd47f1592	Add HWPMC_HOOKS to GENERIC kernels, this makes hwpmc.ko work out of the box.	2008-07-07 22:55:11 +00:00
Alan Cox	cc82a18b88	In FreeBSD 7.0 and beyond, pmap_growkernel() should pass VM_ALLOC_INTERRUPT to vm_page_alloc() instead of VM_ALLOC_SYSTEM. VM_ALLOC_SYSTEM was the logical choice before FreeBSD 7.0 because VM_ALLOC_INTERRUPT could not reclaim a cached page. Simply put, there was no ordering between VM_ALLOC_INTERRUPT and VM_ALLOC_SYSTEM as to which "dug deeper" into the cache and free queues. Now, there is; VM_ALLOC_INTERRUPT dominates VM_ALLOC_SYSTEM. While I'm here, teach pmap_growkernel() to request a prezeroed page. MFC after: 1 week	2008-07-07 17:25:09 +00:00
Alan Cox	4a7c66163b	Change create_pagetables() and pmap_init() so that many fewer page table pages have to be preallocated by create_pagetables().	2008-07-06 22:36:28 +00:00
Alan Cox	13e0058451	Increase the kernel map's size to 7GB, making room for a kmem map of size greater than 4GB. (Auto-sizing will set the ceiling on the kmem map size to 4.2GB.)	2008-07-05 20:44:55 +00:00
Alan Cox	0cbeb44158	Eliminate an unused declaration. (In fact, the declaration is bogus because the variable is defined static to pmap.c on i386.) Found by: CScout	2008-07-04 17:36:12 +00:00
Alan Cox	db0a9105b1	Increase the ceiling on the kmem map's size to 3.6GB. Also, define the ceiling as a fraction of the kernel map's size rather than an absolute quantity. Thus, scaling of the kmem map's size will be automatic with changes to the kernel map's size.	2008-07-03 04:53:14 +00:00
Alan Cox	c4a6405c88	Eliminate an unnecessary static variable: nkpt.	2008-07-02 05:41:23 +00:00
Alan Cox	17e2138882	Document the layout of the address space, borrowing heavily from http://lists.freebsd.org/pipermail/freebsd-amd64/2005-July/005578.html	2008-06-30 03:14:39 +00:00
Alan Cox	67ce249ac9	Compute NKPDPE from NKPT. This reduces the number of knobs that must be turned in order to change the size of the kernel virtual address space.	2008-06-30 02:35:55 +00:00
Alan Cox	ce3cb38836	Strictly speaking, the definition of VM_MAX_KERNEL_ADDRESS is wrong. However, in practice, the error (currently) makes no difference because the computation performed by KVADDR() hides the error. This revision fixes the error. Also, eliminate a (now) unused definition.	2008-06-29 19:13:27 +00:00
Alan Cox	f4f491d095	Increase the size of the kernel virtual address space to 6GB. Until the maximum size of the kmem map can be greater than 4GB, there is little point in making the kernel virtual address space larger than 6GB. Tested by: kris@	2008-06-29 18:35:00 +00:00
Ed Schouten	721351876c	Remove the unused major/minor numbers from iodev and memdev. Now that st_rdev is being automatically generated by the kernel, there is no need to define static major/minor numbers for the iodev and memdev. We still need the minor numbers for the memdev, however, to distinguish between /dev/mem and /dev/kmem. Approved by: philip (mentor)	2008-06-25 07:45:31 +00:00
Jung-uk Kim	b86977a5ab	Emit opcodes closer to GNU as(1) generated codes and micro-optimize.	2008-06-24 20:12:12 +00:00
Jung-uk Kim	292f013c88	Rehash and clean up BPF JIT compiler macros to match AT&T notations.	2008-06-23 23:09:52 +00:00
Alan Cox	bd4328d3a6	Ensure that KERNBASE is no less than the virtual address -2GB.	2008-06-23 15:22:53 +00:00
Alan Cox	0ee1368a96	Prepare for a larger kernel virtual address space. Specifically, once KERNBASE and VM_MIN_KERNEL_ADDRESS are no longer the same, the physical memory allocated during bootstrap will be offset from the low-end of the kernel's page table.	2008-06-21 19:19:09 +00:00
Alan Cox	948c5cc27e	Make preparations for increasing the size of the kernel virtual address space on the amd64 architecture. The amd64 architecture requires kernel code and global variables to reside in the highest 2GB of the 64-bit virtual address space. Thus, KERNBASE cannot change. However, KERNBASE is sometimes used as the start of the kernel virtual address space. Henceforth, VM_MIN_KERNEL_ADDRESS should be used instead. Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same address, there should be no visible effect from this change (yet). That said, kris@ has tested crash dumps under the full patch that increases the kernel virtual address space on amd64 to 6GB. Tested by: kris@	2008-06-20 20:59:31 +00:00
Xin LI	4d52a57549	Add et(4), a port of DragonFly's Agere ET1310 10/100/Gigabit Ethernet device driver, written by sephe@ Obtained from: DragonFly Sponsored by: iXsystems MFC after: 2 weeks	2008-06-20 19:28:33 +00:00
Alan Cox	293ab7c941	Make preparations for increasing the size of the kernel virtual address space on the amd64 architecture. The amd64 architecture requires kernel code and global variables to reside in the highest 2GB of the 64-bit virtual address space. Thus, KERNBASE cannot change. However, KERNBASE is sometimes used as the start of the kernel virtual address space. Henceforth, VM_MIN_KERNEL_ADDRESS should be used instead. Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same address, there should be no visible effect from this change (yet).	2008-06-20 05:22:09 +00:00
Alan Cox	f9a4e9e4a9	Tweak the promotion test in pmap_promote_pde(). Specifically, test PG_A before PG_M. This sometimes prevents unnecessary removal of write access from a PTE. Overall, the net result is fewer demotions and promotion failures.	2008-06-13 19:33:56 +00:00
Alan Cox	9d1b7fa31f	Reverse the direction of pmap_promote_pde()'s traversal over the specified page table page. The direction of the traversal can matter if pmap_promote_pde() has to remove write access (PG_RW) from a PTE that hasn't been modified (PG_M). In general, if there are two or more such PTEs to choose among, it is better to write protect the one nearer the high end of the page table page rather than the low end. This is because most programs access memory in an ascending direction. The net result of this change is a sometimes significant reduction in the number of failed promotion attempts and the number of pages that are write protected by pmap_promote_pde().	2008-06-12 05:18:09 +00:00
Alan Cox	71e26e2c0e	Correct an error in pmap_promote_pde() that may result in an errant promotion within the kernel's address space. Specifically, pmap_promote_pde() is only called when the page table page (PTP) that is referenced by the given PDE has a full "use count", i.e., its wire_count is 512. Although this guarantees for a user address space that all 512 PTEs in the PTP hold valid mappings, the same is not true of the kernel's address space. A kernel PTP always has a use count of 512 regardless of the state of the PTEs. Therefore, pmap_promote_pde() should not assume (or assert) that the first PTE in the PTP is valid.	2008-06-01 07:36:59 +00:00
Pyun YongHyeon	20f99a5be4	Add jme(4) to the list of drivers supported by GENERIC kernel.	2008-05-27 02:22:32 +00:00
Bjoern A. Zeeb	2e598474fa	Remove ISDN4BSD (I4B) from HEAD as it is not MPSAFE and parts relied on the now removed NET_NEEDS_GIANT. Most of I4B has been disconnected from the build since July 2007 in HEAD/RELENG_7. This is what was removed: - configuration in /etc/isdn - examples - man pages - kernel configuration - sys/i4b (drivers, layers, include files) - user space tools - i4b support from ppp - further documentation Discussed with: rwatson, re	2008-05-26 10:40:09 +00:00
John Birrell	367f3ce5e6	Add the DTrace hooks for exception handling (Function boundary trace -fbt- provider), cyclic clock and syscalls.	2008-05-24 06:32:26 +00:00
Alan Cox	d1fdd63483	The VM system no longer uses setPQL2(). Remove it and its helpers.	2008-05-23 04:03:54 +00:00
Pyun YongHyeon	83a17b90eb	Add age(4) to the list of drivers supported by GENERIC kernel.	2008-05-19 02:30:27 +00:00
Alan Cox	1ec1304bdb	Retire pmap_addr_hint(). It is no longer used.	2008-05-18 04:16:57 +00:00
Remko Lodder	6e535f6e5b	Resort the if_ti driver to match the PCI Network cards instead of placing it under the mii devices list. PR: kern/123147 Submitted by: gavin Approved by: imp (mentor, implicit) MFC after: 3 days	2008-05-17 23:50:00 +00:00
Attilio Rao	13d4b2b0bc	Removed unused assembly offsets for structures digging.	2008-05-16 13:23:47 +00:00
Roman Divacky	7c0cc5f941	Regen. Approved by: kib (mentor)	2008-05-13 20:02:26 +00:00
Roman Divacky	4732e446fb	Implement robust futexes. Most of the code is modelled after what Linux does. This is because robust futexes are mostly userspace thing which we cannot alter. Two syscalls maintain pointer to userspace list and when process exits a routine walks this list waking up processes sleeping on futexes from that list. Reviewed by: kib (mentor) MFC after: 1 month	2008-05-13 20:01:27 +00:00
Alan Cox	ef4d480ced	Correct an error in pmap_align_superpage(). Specifically, correctly handle the case where the mapping is greater than a superpage in size but the alignment of the physical pages spans a superpage boundary.	2008-05-11 20:33:47 +00:00
Alan Cox	d3249b142b	Introduce pmap_align_superpage(). It increases the starting virtual address of the given mapping if a different alignment might result in more superpage mappings.	2008-05-09 16:48:07 +00:00
Sam Leffler	6c26723b19	enable IEEE80211_DEBUG and IEEE80211_AMPDU_AGE by default	2008-05-03 17:05:38 +00:00
Sam Leffler	3971d07be7	Intel 4965 wireless driver (derived from openbsd driver of the same name)	2008-04-29 21:36:17 +00:00
Alan Cox	26b77ff3b1	Always use PG_PS_FRAME to extract the physical address of a 2/4MB page from a PDE.	2008-04-25 16:00:39 +00:00
Jeff Roberson	6c47aaae12	- Add an integer argument to idle to indicate how likely we are to wake from idle over the next tick. - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are suspended in cpu specific states. This function can fail and cause the scheduler to fall back to another mechanism (ipi). - Implement support for mwait in cpu_idle() on i386/amd64 machines that support it. mwait is a higher performance way to synchronize cpus as compared to hlt & ipis. - Allow selecting the idle routine by name via sysctl machdep.idle. This replaces machdep.cpu_idle_hlt. Only idle routines supported by the current machine are permitted. Sponsored by: Nokia	2008-04-25 05:18:50 +00:00
Roman Divacky	a6d043e30d	Implement linux_truncate64() syscall. Tested by: Aline de Freitas <aline@riseup.net> Approved by: kib (mentor)	2008-04-23 15:56:33 +00:00
Poul-Henning Kamp	9b4a8ab7ba	Now that all platforms use genclock, shuffle things around slightly for better structure. Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait. In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only. Parts of kernel code does however care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h> <sys/time.h> on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof. Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references. Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere. Remove a lot of unnecessary <sys/clock.h> includes. Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.	2008-04-22 19:38:30 +00:00
Sam Leffler	b032f27c36	Multi-bss (aka vap) support for 802.11 devices. Note this includes changes to all drivers and moves some device firmware loading to use firmware(9) and a separate module (e.g. ral). Also there no longer are separate wlan_scan* modules; this functionality is now bundled into the wlan module. Supported by: Hobnob and Marvell Reviewed by: many Obtained from: Atheros (some bits)	2008-04-20 20:35:46 +00:00
Sam Leffler	f446360711	move awi to the Attic; it will not make the jump to the new world order Reviewed by: imp	2008-04-20 19:20:39 +00:00
Peter Wemm	3bf67daaf3	Put in a real isa_irq_pending() stub in order to remove two lines of dmesg noise from sio per unit. sio likes to probe if interrupts are configured correctly by looking at the pending bits of the atpic in order to put a non-fatal warning on the console. I think I'd rather read the pending bits from the apics, but I'm not sure its worth the hassle.	2008-04-19 07:25:57 +00:00
Jeff Roberson	66247efa5a	- Add inlines for the monitor and mwait instructions. Sponsored by: Nokia	2008-04-18 05:47:56 +00:00
Jung-uk Kim	01c3b1b200	Regenerate.	2008-04-16 19:27:36 +00:00
Jung-uk Kim	26833f3f9a	Add stubs for syscalls introduced in Linux 2.6.17 kernel. Some GNU libc version started using them before 2.6.17 was officially out. MFC after: 3 days	2008-04-16 19:25:39 +00:00
Warner Losh	917ac33d4e	This file is unused on amd64.	2008-04-15 02:10:14 +00:00
Poul-Henning Kamp	36bff1ebfb	Convert amd64 and i386 to share the atrtc device driver.	2008-04-14 08:00:00 +00:00
Rui Paulo	6f15a9e57a	Connect k8temp(4) to the build.	2008-04-12 14:20:22 +00:00
Jeff Roberson	9b33b154b5	- Add the interrupt vector number to intr_event_create so MI code can lookup hard interrupt events by number. Ignore the irq# for soft intrs. - Add support to cpuset for binding hardware interrupts. This has the side effect of binding any ithread associated with the hard interrupt. As per restrictions imposed by MD code we can only bind interrupts to a single cpu presently. Interrupts can be 'unbound' by binding them to all cpus. Reviewed by: jhb Sponsored by: Nokia	2008-04-11 03:26:41 +00:00
Alan Cox	f4d2c7f13e	Correct pmap_copy()'s method for extracting the physical address of a 2/4MB page from a PDE. Specifically, change it to use PG_PS_FRAME, not PG_FRAME, to extract the physical address of a 2/4MB page from a PDE. Change the last argument passed to pmap_pv_insert_pde() from a vm_page_t representing the first 4KB page of a 2/4MB page to the vm_paddr_t of the 2/4MB page. This avoids an otherwise unnecessary conversion from a vm_paddr_t to a vm_page_t in pmap_copy().	2008-04-10 16:04:50 +00:00
Konstantin Belousov	50ad4fc65c	Regenerate	2008-04-08 09:51:19 +00:00
Konstantin Belousov	48b05c3f82	Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho	2008-04-08 09:45:49 +00:00
Alan Cox	109d493230	Update pmap_page_wired_mappings() so that it counts 2/4MB page mappings.	2008-04-07 07:38:02 +00:00
John Baldwin	1ee1b68792	Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt code. - Rename the intr_event 'eoi', 'disable', and 'enable' hooks to 'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric. Also, add a comment describe what the MI code expects them to do. - On amd64, i386, and powerpc this is effectively a NOP. - On arm, don't bother masking the interrupt unless the ithread is scheduled in the non-INTR_FILTER case to match what INTR_FILTER did. Also, don't bother unmasking the interrupt in the post_filter case if we never masked it. The INTR_FILTER case had been doing this by having arm_unmask_irq for the post_filter (formerly 'eoi') hook. - On ia64, stray interrupts are now masked for the non-INTR_FILTER case. They were already masked in the INTR_FILTER case. - On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for both the 'post_filter' and 'post_ithread' hooks to match what the non-INTR_FILTER code did. - On sun4v, retire the ithread wrapper hack by using an appropriate 'post_ithread' hook instead (it's what 'post_ithread'/'enable' was designed to do even in 5.x). Glanced at by: piso Reviewed by: marius Requested by: marius [1], [5] Tested on: amd64, i386, arm, sparc64	2008-04-05 19:58:30 +00:00
Alan Cox	2addc03d04	Eliminate an unnecessary test and its misleading comment from pmap_enter().	2008-04-04 18:00:22 +00:00
Alan Cox	bc8a0d87bd	Optimize pmap_pml4e() and pmap_pdpe() based upon two observations: The given pmap is never NULL, and therefore pmap_pml4e() can never return NULL. The pervasive use of these inline functions throughout the pmap makes these simple changes worthwhile.	2008-04-02 04:39:47 +00:00
Paul Saab	6e7534b8c8	Add support to mincore for detecting whether a page is part of a "super" page or not. Reviewed by: alc, ups	2008-03-28 04:29:27 +00:00
Doug Rabson	fa9d9930ca	Add kernel module support for nfslockd and krpc. Use the module system to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k' option to rpc.lockd and make kernel NLM the default. A user can still force the use of the old user NLM by building a kernel without NFSLOCKD and/or removing the nfslockd.ko module.	2008-03-27 11:54:20 +00:00
John Birrell	e483943791	When building a kernel module, define MAXCPU the same as SMP so that modules work with and without SMP.	2008-03-27 05:03:26 +00:00
Poul-Henning Kamp	dad3b6c6fd	Back in the good old days, PC's had random pieces of rock for frequency generation and what frequency the generated was anyones guess. In general the 32.768kHz RTC clock x-tal was the best, because that was a regular wrist-watch Xtal, whereas the X-tal generating the ISA bus frequency was much lower quality, often costing as much as several cents a piece, so it made good sense to check the ISA bus frequency against the RTC clock. The other relevant property of those machines, is that they typically had no more than 16MB RAM. These days, CPU chips croak if their clocks are not tightly within specs and all necessary frequencies are derived from the master crystal by means if PLL's. Considering that it takes on average 1.5 second to calibrate the frequency of the i8254 counter, that more likely than not, we will not actually use the result of the calibration, and as the final clincher, we seldom use the i8254 for anything besides BEL in syscons anyway, it has become time to drop the calibration code. If you need to tell the system what frequency your i8254 runs, you can do so from the loader using hw.i8254.freq or using the sysctl kern.timecounter.tc.i8254.frequency.	2008-03-26 22:12:00 +00:00
Poul-Henning Kamp	3a995824f6	Eliminate unnecessary #includes	2008-03-26 20:26:12 +00:00
Poul-Henning Kamp	e465985885	The "free-lance" timer in the i8254 is only used for the speaker these days, so de-generalize the acquire_timer/release_timer api to just deal with speakers. The new (optional) MD functions are: timer_spkr_acquire() timer_spkr_release() and timer_spkr_setfreq() the last of which configures the timer to generate a tone of a given frequency, in Hz instead of 1/1193182th of seconds. Drop entirely timer2 on pc98, it is not used anywhere at all. Move sysbeep() to kern/tty_cons.c and use the timer_spkr() if they exist, and do nothing otherwise. Remove prototypes and empty acquire-/release-timer() and sysbeep() functions from the non-beeping archs. This eliminate the need for the speaker driver to know about i8254frequency at all. In theory this makes the speaker driver MI, contingent on the timer_spkr_() functions existing but the driver does not know this yet and still attaches to the ISA bus. Syscons is more tricky, in one function, sc_tone(), it knows the hz and things are just fine. In the other function, sc_bell() it seems to get the period from the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode the 1193182 and leave it at that. It's probably not important. Change a few other sysbeep() uses which obviously knew that the argument was in terms of i8254 frequency, and leave alone those that look like people thought sysbeep() took frequency in hertz. This eliminates the knowledge of i8254_freq from all but the actual clock.c code and the prof_machdep.c on amd64 and i386, where I think it would be smart to ask for help from the timecounters anyway [TBD].	2008-03-26 20:09:21 +00:00
Poul-Henning Kamp	ebfbcd612a	Rename timer0_max_count to i8254_max_count. Rename timer0_real_max_count to i8254_real_max_count and make it static. Rename timer_freq to i8254_freq and make it a loader tunable.	2008-03-26 15:03:24 +00:00
Poul-Henning Kamp	f168bfa529	The RTC related pscnt and psdiv variables have no business being public.	2008-03-26 13:25:27 +00:00
Jung-uk Kim	cb7d38abf2	Belatedly add BPF_JITTER in NOTES for supported architectures.	2008-03-24 22:23:22 +00:00
Peter Wemm	f001eabf3a	First pass at (possibly futile) microoptimizing of cpu_switch. Results are mixed. Some pure context switch microbenchmarks show up to 29% improvement. Pipe based context switch microbenchmarks show up to 7% improvement. Real world tests are far less impressive as they are dominated more by actual work than switch overheads, but depending on the machine in question, workload, kernel options, phase of moon, etc, a few percent gain might be seen. Summary of changes: - don't reload MSR_[FG]SBASE registers when context switching between non-threaded userland apps. These typically cost 120 clock cycles each on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no faster on this. - The above change only helps unthreaded userland apps that tend to use the same value for gsbase. Threaded apps will get no benefit from this. - reorder things like accessing the pcb to be in memory order, to give prefetching a better chance of working. Operations are now in increasing memory address order, rather than reverse or random. - Push some lesser used code out of the main code paths. Hopefully allowing better code density in cache lines. This is probably futile. - (part 2 of previous item) Reorder code so that branches have a more realistic static branch prediction hint. Both Intel and AMD cpus default to predicting branches to lower memory addresses as being taken, and to higher memory addresses as not being taken. This is overridden by the limited dynamic branch prediction subsystem. A trip through userland might overflow this. - Futule attempt at spreading the use of the results of previous operations in new operations. Hopefully this will allow the cpus to execute in parallel better. - stop wasting 16 bytes at the top of kernel stack, below the PCB. - Never load the userland fs/gsbase registers for kthreads, but preserve curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!) Microbenchmarking this code seems to be really sensitive to things like scheduling luck, timing, cache behavior, tlb behavior, kernel options, other random code changes, etc. While it doesn't help heavy userland workloads much, it does help high context switch loads a little, and should help those that involve switching via kthreads a bit more. A special thanks to Kris for the testing and reality checks, and Jeff for tormenting me into doing this. :) This is still work-in-progress.	2008-03-23 23:09:06 +00:00
Alan Cox	58680920e9	Correct an error in pmap_mincore() when applied to a 2MB page mapping: Use PG_PS_FRAME, not PG_FRAME, to obtain the physical address of the 2MB physical page from the PDE.	2008-03-23 23:04:09 +00:00
Peter Wemm	22c0c6e9d3	Export TDP_KTHREAD to asm files.	2008-03-23 22:46:37 +00:00
Peter Wemm	6c73bb3557	Move pcb_flags to make trivially better use of cache lines.	2008-03-23 22:45:51 +00:00
Peter Wemm	3d60169ef4	Protect the setting of the fsbase/gsbase MSR registers and the pcb_[fg]sbase values with a critical section, like the rest of the kernel.	2008-03-23 22:44:56 +00:00
Alan Cox	702006ff76	To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set. The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set. However, these changes enable me to remove a work-around from pmap_promote_pde(), the superpage promotion procedure. (Note: The AMD processors that we have tested, including the latest, the Phenom, still exhibit the historical behavior.) Acknowledgments: After I observed the problem, Stephan (ups) was instrumental in characterizing the exact behavior of Intel's recent TLBs. Tested by: Peter Holm	2008-03-23 20:38:01 +00:00
Konstantin Belousov	3f7905d29c	Prevent the overflow in the calculation of the next page directory. The overflow causes the wraparound with consequent corruption of the (almost) whole address space mapping. As Alan noted, pmap_copy() does not require the wrap-around checks because it cannot be applied to the kernel's pmap. The checks there are included for consistency. Reported and tested by: kris (i386/pmap.c:pmap_remove() part) Reviewed by: alc MFC after: 1 week	2008-03-23 07:07:27 +00:00
John Baldwin	eb2b0540e5	Explicitly use spinlock_enter/exit rather than locking the icu_lock spin lock in the 8259A drivers as these drivers are only used on UP systems. This slightly reduces the penalty of an SMP kernel (such as GENERIC) on a UP x86 machine.	2008-03-20 21:53:27 +00:00
John Baldwin	dcc8106854	Implement a BUS_BIND_INTR() method in the bus interface to bind an IRQ resource to a CPU. The default method is to pass the request up to the parent similar to BUS_CONFIG_INTR() so that all busses don't have to explicitly implement bus_bind_intr. A bus_bind_intr(9) wrapper routine similar to bus_setup/teardown_intr() is added for device drivers to use. Unbinding an interrupt is done by binding it to NOCPU. The IRQ resource must be allocated, but it can happen in any order with respect to bus_setup_intr(). Currently it is only supported on amd64 and i386 via nexus(4) methods that simply call the intr_bind() routine. Tested by: gallatin	2008-03-20 21:24:32 +00:00
John Baldwin	6d2d1c044f	Simplify the interrupt code a bit: - Always include the ie_disable and ie_eoi methods in 'struct intr_event' and collapse down to one intr_event_create() routine. The disable and eoi hooks simply aren't used currently in the !INTR_FILTER case. - Expand 'disab' to 'disable' in a few places. - Use function casts for arm and i386:intr_eoi_src() instead of wrapper routines since to trim one extra indirection. Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER} Tested on: {amd64,i386} x {FILTER, !FILTER}	2008-03-17 22:42:01 +00:00
Pawel Jakub Dawidek	6eb4157ffc	Implement atomic_fetchadd_long() for all architectures and document it. Reviewed by: attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)	2008-03-16 21:20:50 +00:00
Roman Divacky	d8653dd986	Regen.	2008-03-16 16:29:37 +00:00
Roman Divacky	5dfb688191	Implement sched_setaffinity and get_setaffinity using real cpu affinity setting primitives. Reviewed by: jeff Approved by: kib (mentor)	2008-03-16 16:27:44 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
John Baldwin	eaf86d1678	Add preliminary support for binding interrupts to CPUs: - Add a new intr_event method ie_assign_cpu() that is invoked when the MI code wishes to bind an interrupt source to an individual CPU. The MD code may reject the binding with an error. If an assign_cpu function is not provided, then the kernel assumes the platform does not support binding interrupts to CPUs and fails all requests to do so. - Bind ithreads to CPUs on their next execution loop once an interrupt event is bound to a CPU. Only shared ithreads are bound. We currently leave private ithreads for drivers using filters + ithreads in the INTR_FILTER case unbound. - A new intr_event_bind() routine is used to bind an interrupt event to a CPU. - Implement binding on amd64 and i386 by way of the existing pic_assign_cpu PIC method. - For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up an interrupt source and binds its interrupt event to the specified CPU. MI code can currently (ab)use this by doing: intr_bind(rman_get_start(irq_res), cpu); however, I plan to add a truly MI interface (probably a bus_bind_intr(9)) where the implementation in the x86 nexus(4) driver would end up calling intr_bind() internally. Requested by: kmacy, gallatin, jeff Tested on: {amd64, i386} x {regular, INTR_FILTER}	2008-03-14 19:41:48 +00:00
John Baldwin	c9107e85d9	Fix a silly bogon which prevented all the CPUs that are tagged as interrupt receivers from being given interrupts if any CPUs in the system were not tagged as interrupt receivers that I introduced when switching the x86 interrupt code to track CPUs via FreeBSD CPU IDs rather than local APIC IDs. In practice this only affects systems with Hyperthreading (though disabling HTT in the BIOS would workaround the issue) as that is the only case currently where one can have CPUs that aren't tagged as interrupt receivers. On a Dell SC1425 test box with 2 x Xeon w/ HTT (so 4 logical CPUs of which 2 were interrupt receivers) the result was that all device interrupts were sent to CPU 0. MFC after: 1 week Pointy hat to: jhb	2008-03-14 03:44:42 +00:00
John Baldwin	5217af301c	Rework how the nexus(4) device works on x86 to better handle the idea of different "platforms" on x86 machines. The existing code already handles having two platforms: ACPI and legacy. However, the existing approach was rather hardcoded and difficult to extend. These changes take the approach that each x86 hardware platform should provide its own nexus(4) driver (it can inherit most of its behavior from the default legacy nexus(4) driver) which is responsible for probing for the platform and performing appropriate platform-specific setup during attach (such as adding a platform-specific bus device). This does mean changing the x86 platform busses to no longer use an identify routine for probing, but to move that logic into their matching nexus(4) driver instead. - Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the legacy platform. It's probe routine now returns BUS_PROBE_GENERIC so it can be overriden. - Expose a nexus_init_resources() routine which initializes the various resource managers so that subclassed nexus(4) drivers can invoke it from their attach routine. - The legacy nexus(4) driver explicitly adds a legacy0 device in its attach routine. - The ACPI driver no longer contains an new-bus identify method. Instead it exposes a public function (acpi_identify()) which is a probe routine that the MD nexus(4) drivers can use to probe for ACPI. All of the probe logic in acpi_probe() is now moved into acpi_identify() and acpi_probe() is just a stub. - On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via acpi_identify() and claims the nexus0 device if the probe succeeds. It then explicitly adds an acpi0 device in its attach routine. - The legacy(4) driver no longer knows anything about the acpi0 device. - On ia64 if acpi_identify() fails you basically end up with no devices. This matches the previous behavior where the old acpi_identify() would fail to add an acpi0 device again leaving you with no devices. Discussed with: imp Silence on: arch@	2008-03-13 20:39:04 +00:00
Konstantin Belousov	22eca0bf45	Since version 4.3, gcc changed its behaviour concerning the i386/amd64 ABI and the direction flag, that is it now assumes that the direction flag is cleared at the entry of a function and it doesn't clear once more if needed. This new behaviour conforms to the i386/amd64 ABI. Modify the signal handler frame setup code to clear the DF {e,r}flags bit on the amd64/i386 for the signal handlers. jhb@ noted that it might break old apps if they assumed DF == 1 would be preserved in the signal handlers, but that such apps should be rare and that older versions of gcc would not generate such apps. Submitted by: Aurelien Jarno <aurelien aurel32 net> PR: 121422 Reviewed by: jhb MFC after: 2 weeks	2008-03-13 10:54:38 +00:00
John Baldwin	391664b110	The variable MTRR registers actually have variable-sized PhysBase and PhysMask fields based on the number of physical address bits supported by the current CPU. The old code assumed 36 bits on i386 and 40 bits on amd64. In truth, all Intel CPUs up until recently used 36 bits (a newer Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits. In at least one case (the new Intel CPU) having the size of the mask field wrong resulted in writing questionable values into the MTRR registers on the application processors (BSP as well if you modify the MTRRs via memcontrol or running X, etc.). The result of the questionable physmask was that all of memory was apparently treated as uncached rather than write-back resulting in a very significant performance hit. Fix this by constructing a run-time mask for the PhysBase and PhysMask fields based on the number of physical address bits supported by the CPU. All 64-bit capable CPUs provide a count of PA bits supported via the 0x80000008 extended CPUID feature, so use that if it is available. If that feature is not available, then assume 36 PA bits. While I'm here, expand the (now-unused) macros for the PhysBase and PhysMask fields to the current largest possible value (52 PA bits). MFC after: 1 week PR: i386/120516 Reported by: Nokia	2008-03-12 22:09:19 +00:00
John Baldwin	f15a9cd288	Minimize diffs with i686_mem.c: - A few whitespace changes I missed in the style(9) changes. - Move M_MEMDESC to mem.c.	2008-03-12 21:43:50 +00:00
Jeff Roberson	6617724c5f	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
John Baldwin	1b085fde87	Style(9) these files. No changes in the compiled code. (Verified by diff'ing objdump -d output).	2008-03-11 21:41:36 +00:00
John Baldwin	336d8e5536	Add constants for the various fields in MTRR registers. MFC after: 1 week Verified by: md5(1)	2008-03-11 20:10:37 +00:00
John Baldwin	463e0f91cb	Probe CPUs after the PCI hierarchy on i386, amd64, and ia64. This allows the cpufreq drivers to reliably use properties of PCI devices for quirks, etc. - For the legacy drivers, add CPU devices via an identify routine in the CPU driver itself rather than in the legacy driver's attach routine. - Add CPU devices after Host-PCI bridges in the acpi bus driver. - Change the ichss(4) driver to use pci_find_bsf() to locate the ICH and check its device ID rather than having a bogus PCI attachment that only checked for the ID in probe and always failed. As a side effect, you can now kldload ichss after boot. - Fix the ichss(4) driver to use the correct device_t for the ICH (and not for ichss0) when doing PCI config space operations to enable SpeedStep. MFC after: 2 weeks Reviewed by: njl, Andriy Gapon avg of icyb.net.ua	2008-03-10 22:18:07 +00:00
Jeff Roberson	32c9d3a767	- Rather than repeating the same preemption code everywhere call the scheduler specific sched_preempt() routine.	2008-03-10 01:32:48 +00:00
Rink Springer	2e7328e7cc	Import uslcom(4) from OpenBSD - this is a driver for Silicon Laboratories CP2101/CP2102 based USB serial adapters. Reviewed by: imp, emaste Obtained from: OpenBSD MFC after: 2 weeks	2008-03-05 14:13:30 +00:00
Alan Cox	0116b8b321	Add support for automatic promotion of 4KB page mappings to 2MB page mappings. Automatic promotion can be enabled by setting the tunable "vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic promotion is disabled. (Expect this to change.) Reviewed by: ups Tested by: kris, Peter Holm	2008-03-04 18:50:15 +00:00
Jeff Roberson	81aa71755b	- Remove the old smp cpu topology specification with a new, more flexible tree structure that encodes the level of cache sharing and other properties. - Provide several convenience functions for creating one and two level cpu trees as well as a default flat topology. The system now always has some topology. - On i386 and amd64 create a seperate level in the hierarchy for HTT and multi-core cpus. This will allow the scheduler to intelligently load balance non-uniform cores. Presently we don't detect what level of the cache hierarchy is shared at each level in the topology. - Add a mechanism for testing common topologies that have more information than the MD code is able to provide via the kern.smp.topology tunable. This should be considered a debugging tool only and not a stable api. Sponsored by: Nokia	2008-03-02 07:58:42 +00:00
Ruslan Ermilov	58eefce0e6	Eliminate whitespace diffs to the i386 version.	2008-02-19 06:30:49 +00:00
Scott Long	7bbd40c57e	Teach the dump and minidump code to respect the maxioszie attribute of the disk; the hard-coded assumption of 64K doesn't work in all cases.	2008-02-15 06:26:25 +00:00
Scott Long	54f8dbc48f	If busdma is being used to realign dynamic buffers and the alignment is set to PAGE_SIZE or less, the bounce page counting logic was flawed and wouldn't reserve any pages. Adjust to be correct. Review of other architectures is forthcoming. Submitted by: Joseph Golio	2008-02-12 16:24:30 +00:00
Jung-uk Kim	865df544c6	Fix Linux mmap with MAP_GROWSDOWN flag. Reported by: Andriy Gapon (avg at icyb dot net dot ua) Tested by: Andriy Gapon (avg at icyb dot net dot ua) Pointyhat: me MFC after: 3 days	2008-02-11 19:35:03 +00:00
Scott Long	593c873471	Remove the rr232x driver. It has been superceded by the hptrr driver.	2008-02-03 07:07:30 +00:00
David Schultz	2cb2359632	Add a few more CPUID feature bits while here. We don't support these features yet.	2008-02-02 23:17:27 +00:00
David Schultz	67f6aa5ccf	SSE4 CPUID bits	2008-02-02 22:40:17 +00:00
John Baldwin	7157eae462	For no good reason I had assumed that ACPI table headers would be page aligned (or at least not cross a page boundary). However, it turns out that on at least one machine one table header does cross a page boundary. This caused problems with the MADT early probe as it uses the crash dump map to load ACPI tables by loading the RSDT/XSDT into pages 1 ... N and loading the header of each ACPI table header into page 0 looking for the MADT. However, if a table header crossed a page boundary, then page 1 would get trashed resulting in a panic. Fix this by reserving the first 2 pages for ACPI table headers (headers are less than a page in size, so 2 pages will be sufficient) and use pages 2 .. N for the RSDT and XSDT. Note: amd64 should probably be simplified to just use pmap_mapbios() for all these tables which will use the direct map and not need the crash dump hack. MFC after: 5 days Tested on: i386 Reported by: Pete French petefrench of ticketswitch.com	2008-01-31 16:51:43 +00:00
Alexander Motin	2a57ca33c7	Move GET_STACK_USAGE from MI header to i386/amd64 MD ones. Somebody who can, please feel free to implement it for other archs or copy this one if it suits.	2008-01-31 08:24:27 +00:00
Ruslan Ermilov	007b1b7bae	Add a wrapper function that bound checks writes to the dump device.	2008-01-28 19:04:07 +00:00
John Baldwin	c05655bfda	Use cpu_spinwait() (i.e., "pause") when spinning on rdtsc during DELAY(). MFC after: 1 week	2008-01-17 18:59:38 +00:00
Alan Cox	6634dbbde4	Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel option DIAGNOSTIC still disables inlining of certain pmap functions.) Eliminate dead code from pmap_enter(). This code implemented an assertion. On i386, an equivalent check is already implemented. However, on amd64, a small change is required to implement an equivalent check. Eliminate \n from a nearby panic string. Use KASSERT() to reimplement pmap_copy()'s two assertions.	2008-01-17 18:25:52 +00:00
Bruce Evans	a4b679d859	Translate from the i386. All FP constants and operations are evaluated in the range and precision of their type(s) on amd64, but FLT_EVAL_METHOD said that they were evalated in the "interesting" (buggy) i387 methods. float_t was broken compatibly with FLT_EVAL_METHOD. These definitions seem to be broken on powerpc and possibly on arm. float_t is float on powerpc with gcc [-notraditional] according to glibc, and FLT_EVAL_METHOD is marked with XXX on arm.	2008-01-17 13:12:46 +00:00
Alan Cox	dd9d15f294	Make pmap_is_prefaultable() more TLB friendly. Specifically, make it use the kernel's direct map instead of the pmap's recursive mapping to access the lowest level in the page table. The direct map is preferable for two reasons: (1) The TLB is more likely to hold the required direct mapping because pmap_enter() has already used the direct map to access a nearby PTE and (2) loading a direct mapping into the TLB involves walking only 2 or 3 levels of the page table instead of 4.	2008-01-14 21:25:06 +00:00
Bruce Evans	31e30d75d5	Fix fpset() to not trap if there is a currently unmasked exception. Unmasked exceptions (which can be fixed up using fpset() before they trap) are very rare, especially on amd64 since SSE exceptions trap synchronously, but I want to merge the faster amd64 implementations of fpset*() back to i386 without introducing the bug on i386. The i386 implementation has always avoided the trap automatically by changing things using load/store of the FP environment, but this is very slow. Most changes only affect the control word, so they can usually be done much more efficiently, and amd64 has always done this, but loading the control word can trap. This version use the fast method only in the usual case where it will not trap. This only costs a couple of integer instructions (including one branch which I haven't optimized carefully yet) in the usual case, but bloats the inlines a lot. The inlines were already a bit too large to handle both the FPU and SSE.	2008-01-11 17:11:32 +00:00
Bruce Evans	548868b38d	Fix some style bugs: - fix a previous style fix: shifts should be in the correct direction even if they are null. - restore a comment about namespace pollution from floatingpoint.h 1.12 and update it. - remove unused namespace pollution FP_*REG. - improve some comments. - sort macro definitions for entry points. - don't use underscores for macro args.	2008-01-11 14:11:46 +00:00
Bruce Evans	0714d1a223	Simplify the ifdefs: - fix this to compile with C++ by casting ints to enums in a few places and by using the correct parameter type for _fpsetprec(). Remove __cplusplus ifdefs which disabled the buggy code. - remove __CC_SUPPORTS___INLINE ifdefs. `__inline' vs `inline', and either of these #defined away, are supposed to be handled by very old ifdefs in <sys/cdefs.h>. Thus the __CC_SUPPORTS___INLINE macro is not needed here (or anywhere else that it used). It is less needed here than in most places, since this file is userland-only and userland is far from supporting INTEL_COMPILER. The __CC_SUPPORTS___INLINE__ macro which was used here is even less needed. It is to support spelling `inline' as `__inline__' instead of the usual spelling `__inline'. Fix some style bugs that I missed in the previous commit (remove unused asms and sort more variables).	2008-01-09 15:03:03 +00:00
Bruce Evans	a2de358449	Fix some style bugs (mainly, use explicit shifts when accessing bit-fields even if the shift count happens to be 0, sort declarations, and spell __inline normally).	2008-01-09 13:35:31 +00:00
Bruce Evans	fe26672a8f	Improve some comments.	2008-01-09 10:42:47 +00:00
Alan Cox	fa093ee242	Convert a PMAP_DIAGNOSTIC to a KASSERT.	2008-01-08 08:30:30 +00:00
John Baldwin	5965c4b71c	Add COMPAT_FREEBSD7 and enable it in configs that have COMPAT_FREEBSD6.	2008-01-07 21:40:11 +00:00
Alan Cox	5cccf58676	Shrink the size of struct vm_page on amd64 and i386 by eliminating pv_list_count from struct md_page. Ever since Peter rewrote the pv entry allocator for amd64 and i386 pv_list_count has been correctly maintained but otherwise unused.	2008-01-06 18:51:04 +00:00
Alan Cox	eb2a051720	Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion. Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.	2008-01-03 07:34:34 +00:00
Alan Cox	86f1449310	Provide a legitimate pindex to vm_page_alloc() in pmap_growkernel() instead of writing apologetic comments. As it turns out, I need every kernel page table page to have a legitimate pindex to support superpage promotion on kernel memory. Correct a nearby style error: Pointers should be compared to NULL.	2008-01-02 08:54:39 +00:00
Rui Paulo	d9aa6eb4fe	Add asmc(4). Requested by: njl (mentor)	2007-12-28 22:50:04 +00:00
Alan Cox	b8e7fc24fe	Add configuration knobs for the superpage reservation system. Initially, the reservation will only be enabled on amd64.	2007-12-27 16:45:39 +00:00
Robert Watson	3de213cc00	Add a new 'why' argument to kdb_enter(), and a set of constants to use for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run. Assign approximate why values to all current consumers of the kdb_enter() interface.	2007-12-25 17:52:02 +00:00
Scott Long	b063a42270	Add the 'hptrr' driver for supporting the following Highpoint RocketRAID cards: o RocketRAID 172x series o RocketRAID 174x series o RocketRAID 2210 o RocketRAID 222x series o RocketRAID 2240 o RocketRAID 230x series o RocketRAID 231x series o RocketRAID 232x series o RocketRAID 2340 o RocketRAID 2522 Many thanks to Highpoint for their continued support of FreeBSD. Submitted by: Highpoint	2007-12-15 00:56:17 +00:00
Rui Paulo	319b564536	Disallow the legacy USB circuit to generate an SMI# via an ICH register (MacBooks only). This allows MacBooks to boot in SMP mode without any trick and solves the timer problems with HZ=1000. MFC after: 1 week Reviewed by: njl (mentor), jhb Approved by: njl (mentor), jhb	2007-12-12 20:24:06 +00:00
Alan Cox	dbfb54ffea	Eliminate compilation warnings due to the use of non-static inlines through the introduction and use of the __gnu89_inline attribute. Submitted by: bde (i386) MFC after: 3 days	2007-12-09 21:00:36 +00:00
Alan Cox	7501865c53	Use 1GB virtual pages to implement the direct map on architectures that support this feature. Wrap a nearby line that is too long. MFC after: 6 weeks	2007-12-08 21:48:27 +00:00
Alan Cox	4ad863249b	Recognize architectural support for 1GB virtual pages. MFC after: 6 weeks	2007-12-08 21:13:01 +00:00
Joseph Koshy	d07f36b075	Kernel and hwpmc(4) support for callchain capture. Sponsored by: FreeBSD Foundation and Google Inc.	2007-12-07 08:20:17 +00:00
Konstantin Belousov	d24031dd0c	Fix the ABI change of the signal delivered on the access to the page with insufficient protection mode. For the i386 and amd64, create the tunable, machdep.prot_fault_translation, with the following behaviour: 0 = autodetect the signal to be delivered on KERN_PROTECTION_FAILURE from vm_fault based on the ELF OSABI note: no note or __FreeBSD_version < 700004 - SIGBUS/BUS_PAGE_FAULT note, and __FreeBSD_version >= 700004 - SIGSEGV/SEGV_ACCERR 1 = always SIGBUS/BUS_PAGE_FAULT 2 = always SIGSEGV/SEGV_ACCERR This would do mostly automatic correction of ABI breakage, with the exception of the untaged binaries for 7-CURRENT/RELENG_7 before the note is fixed. For them, sysctl would allow to run the binary with manual settings. Discussed with: portmgr (kris) PR: kern/118304 MFC after: 3 days	2007-12-04 12:33:03 +00:00
Alan Cox	491bc4fe00	Style change: Use NULL rather than 0 where appropriate.	2007-12-04 08:17:04 +00:00
Robert Watson	3c90d1ea74	Break out stack(9) from ddb(4): - Introduce per-architecture stack_machdep.c to hold stack_save(9). - Introduce per-architecture machine/stack.h to capture any common definitions required between db_trace.c and stack_machdep.c. - Add new kernel option "options STACK"; we will build in stack(9) if it is defined, or also if "options DDB" is defined to provide compatibility with existing users of stack(9). Add new stack_save_td(9) function, which allows the capture of a stacktrace of another thread rather than the current thread, which the existing stack_save(9) was limited to. It requires that the thread be neither swapped out nor running, which is the responsibility of the consumer to enforce. Update stack(9) man page. Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)	2007-12-02 20:40:35 +00:00
Poul-Henning Kamp	d31fc8ce59	Remove XRPU driver, after asking all the users.	2007-12-01 20:07:45 +00:00
Alan Cox	58041e4b9c	Improve get_pv_entry()'s handling of low-memory conditions. After page allocation fails and pv entries are reclaimed, there may be an unused pv entry in a pv chunk that survived the reclamation. However, previously, after reclamation, get_pv_entry() did not look for an unused pv entry in a surviving pv chunk; it simply retried the page allocation. Now, it does look for an unused pv entry before retrying the page allocation. Note: This only applies to RELENG_7. Earlier branches use a different pv entry allocator. MFC after: 6 weeks	2007-11-30 07:14:42 +00:00
Bruce Evans	d5c90663b2	Don't use plain "ret" instructions at targets of jump instructions, since the branch caches on at least Athlon XP through Athlon 64 CPU's don't understand such instructions and guarantee a cache miss taking at least 10 cycles. Use the documented workaround "ret $0" instead ("nop; ret" also works, but "ret $0" is probably faster on old CPUs). Normal code (even asm code) doesn't branch to "ret", since there is usually some cleanup to do, but the __mcount, .mcount and .mexitcount entry points were optimized too well to have the minimum number of instructions (3 instructions each if profiling is not enabled) and they did this. I didn't see a significant number of cache misses for .mexitcount, but for the shared "ret" for __mcount and .mcount I observed cache misses costing 26 cycles each. For a send(2) syscall that makes about 70 function calls, the cost of these cache misses alone increased the syscall time from about 4000 cycles to about 7000 cycles. 4000 is for a profiling (GUPROF) kernel with profiling disabled; after this fix, configuring profiling only costs about 600 cycles in the 4000, which is consistent with almost perfect branch prediction in the mcounting calls.	2007-11-29 02:01:21 +00:00
Bruce Evans	7e7c8806bf	Remove entry points for -finstrument functions since they are currently unused except to obfuscate disassemblies. -mprofiler-epilogue is currently with gcc-4 (it does too little), but -finstrument-functions is broken in a different way (it does too much). amd64 version: meger whitespace fixes from i386 version.	2007-11-29 01:15:03 +00:00
Alan Cox	b3e2a63fa6	Account for pv entry pages in the total number of wired pages. (Note: pv entry pages have always been included in the total number of wired pages on i386 just not amd64.) MFC after: 6 weeks	2007-11-28 22:41:14 +00:00
John Baldwin	98bbce55fa	Adjust the code to probe for the PCI config mechanism to use. - On amd64, just assume type #1 is always used. PCI 2.0 mandated deprecated type #2 and required type #1 for all future bridges which was well before amd64 existed. - For i386, ignore whatever value was in 0xcf8 before testing for type #1 and instead rely on the other tests to determine if type #1 works. Some newer machines leave garbage in 0xcf8 during boot and as a result the kernel doesn't find PCI at all (which greatly confuses ACPI which expects PCI to exist when PCI busses are in the namespace). MFC after: 3 days Discussed with: scottl	2007-11-28 22:20:08 +00:00
Attilio Rao	573c6b82df	Make ADAPTIVE_GIANT as the default in the kernel and remove the option. Currently, Giant is not too much contented so that it is ok to treact it like any other mutexes. Please don't forget to update your own custom config kernel files. Approved by: cognet, marcel (maintainers of arches where option is not enabled at the moment)	2007-11-28 05:50:45 +00:00
John Baldwin	23d34db956	Remove the 'needbounce' variable from the _bus_dmamap_load_buffer() routine. It is not needed as the existing tests for segment coalescing already handle bounced addresses and it prevents legal segment coalescing in certain edge cases. MFC after: 1 week Reviewed by: scottl	2007-11-27 17:28:12 +00:00
Joseph Koshy	4c8e514bdc	MFP4: Add assembly language symbols used by hwpmc(4)'s callchain capture.	2007-11-23 03:03:30 +00:00
Scott Long	8611774e5e	Extend critical section coverage in the low-level interrupt handlers to include the ithread scheduling step. Without this, a preemption might occur in between the interrupt getting masked and the ithread getting scheduled. Since the interrupt handler runs in the context of curthread, the scheudler might see it as having a such a low priority on a busy system that it doesn't get to run for a _long_ time, leaving the interrupt stranded in a disabled state. The only way that the preemption can happen is by a fast/filter handler triggering a schduling event earlier in the handler, so this problem can only happen for cases where an interrupt is being shared by both a fast/filter handler and an ithread handler. Unfortunately, it seems to be common for this sharing to happen with network and USB devices, for example. This fixes many of the mysterious TCP session timeouts and NIC watchdogs that were being reported. Many thanks to Sam Lefler for getting to the bottom of this problem. Reviewed by: jhb, jeff, silby	2007-11-21 04:03:51 +00:00
Alan Cox	59677d3c0e	Prevent the leakage of wired pages in the following circumstances: First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated. Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the pages beyond the EOF are unmapped and freed. However, when the file is mlock(2)ed, the pages beyond the EOF are unmapped but not freed because they have a non-zero wire count. This can be a mistake. Specifically, it is a mistake if the sole reason why the pages are wired is because of wired, managed mappings. Previously, unmapping the pages destroys these wired, managed mappings, but does not reduce the pages' wire count. Consequently, when the file is unmapped, the pages are not unwired because the wired mapping has been destroyed. Moreover, when the vm object is finally destroyed, the pages are leaked because they are still wired. The fix is to reduce the pages' wired count by the number of wired, managed mappings destroyed. To do this, I introduce a new pmap function pmap_page_wired_mappings() that returns the number of managed mappings to the given physical page that are wired, and I use this function in vm_object_page_remove(). Reviewed by: tegge MFC after: 6 weeks	2007-11-17 22:52:29 +00:00
John Baldwin	185250da23	Add support for cross double fault frames in stack traces: - Populate the register values for the trapframe put on the stack by the double fault handler. - Teach DDB's trace routine to treat a double fault like other trap frames. MFC after: 3 days	2007-11-15 22:00:57 +00:00
Marcel Moolenaar	0c3967e7fe	o Rename cpu_thread_setup() to cpu_thread_alloc() to better communicate that it relates to (is called by) thread_alloc() o Add cpu_thread_free() which is called from thread_free() to counter-act cpu_thread_alloc(). i386: Have cpu_thread_free() call cpu_thread_clean() to preserve behaviour. ia64: Have cpu_thread_free() call mtx_destroy() for the mutex initialized in cpu_thread_alloc(). PR: ia64/118024	2007-11-14 20:21:54 +00:00
Julian Elischer	e01eafef2a	A bunch more files that should probably print out a thread name instead of a process name.	2007-11-14 06:51:33 +00:00
Julian Elischer	431f890614	generally we are interested in what thread did something as opposed to what process. Since threads by default have teh name of the process unless over-written with more useful information, just print the thread name instead.	2007-11-14 06:21:24 +00:00
Benjamin Close	037347714a	Link wpi(4) into the build. This includes: o mtree (for legal/intel_wpi) o manpage for i386/amd64 archs o module for i386/amd64 archs o NOTES for i386/amd64 archs Approved by: mlaier (comentor)	2007-11-08 22:09:37 +00:00
Alan Cox	605385f843	Add comments explaining why all stores updating a non-kernel page table must be globally performed before calling any of the TLB invalidation functions. With one exception, on amd64, this requirement was already met. Fix this one case. Also, as a clarification, change an existing atomic op into a release. (Suggested by: jhb) Reported and reviewed by: ups MFC after: 3 days	2007-11-05 18:13:34 +00:00
Konstantin Belousov	89b57fcf01	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb	2007-11-05 11:36:16 +00:00
Alan Cox	6afd4b92f7	Eliminate spurious "Approaching the limit on PV entries, ..." warnings. Specifically, whenever vm_page_alloc(9) returned NULL to get_pv_entry(), we issued a warning regardless of the number of pv entries in use. (Note: The older pv entry allocator in RELENG_6 does not have this problem.) Reported by: Jeremy Chadwick Eliminate the direct call to pagedaemon_wakeup() by get_pv_entry(). This was a holdover from earlier times when the page daemon was responsible for the reclamation of pv entries. MFC after: 5 days	2007-11-03 05:15:26 +00:00
Peter Wemm	c8b14fa8f0	Move nvram out of DEFAULTS. There really isn't a lot of justification for consuming the memory. The module works just fine in the unlikely case that this is needed. It can still be compiled into a custom kernel.	2007-10-29 22:19:08 +00:00
John Baldwin	8518d50a63	- Add constants for the different memory types in the SMAP table. - Use the SMAP types and constants from <machine/pc/bios.h> in the boot code rather than duplicating it.	2007-10-28 21:23:49 +00:00
John Baldwin	54a3fb6f8f	Don't test the APIC flag in the cpuid features for amd64 to see if a local APIC is present or not. All amd64 CPUs have a local APIC and some BIOSen don't set the CPUID_APIC flag. MFC after: 1 week	2007-10-27 13:34:53 +00:00
Peter Wemm	d556638404	Split /dev/nvram driver out of isa/clock.c for i386 and amd64. I have not refactored it to be a generic device. Instead of being part of the standard kernel, there is now a 'nvram' device for i386/amd64. It is in DEFAULTS like io and mem, and can be turned off with 'nodevice nvram'. This matches the previous behavior when it was first committed.	2007-10-26 03:23:54 +00:00
Warner Losh	47e87d5ad0	Ooops. Put back Invariants and witness Submitted by: csjp	2007-10-26 02:35:42 +00:00
Warner Losh	97816f8e3d	Add usb serial devices by default. I'm tired of telling people how to do this that should know better :-).	2007-10-26 02:20:29 +00:00
John Baldwin	e0f5da6d08	Update copyright attribution. MFC after: 3 days	2007-10-24 21:16:22 +00:00
Ken Smith	95b55771b2	Switch over to ULE as the default scheduler for amd64 and i386 architectures.	2007-10-19 12:30:33 +00:00
Alexander Leidinger	9f05d312b3	Backout sensors framework. Requested by: phk Discussed on: cvs-all	2007-10-15 20:00:24 +00:00
Alexander Leidinger	989500bf1a	Import it(4) and lm(4), supporting most popular Super I/O Hardware Monitors. Submitted by: Constantine A. Murenin <cnst@FreeBSD.org> Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors) Mentored by: syrinx Tested by: many OKed by: kensmith Obtained from: OpenBSD (parts)	2007-10-14 10:55:50 +00:00
Marius Strobl	55aaf894e8	Make the PCI code aware of PCI domains (aka PCI segments) so we can support machines having multiple independently numbered PCI domains and don't support reenumeration without ambiguity amongst the devices as seen by the OS and represented by PCI location strings. This includes introducing a function pci_find_dbsf(9) which works like pci_find_bsf(9) but additionally takes a domain number argument and limiting pci_find_bsf(9) to only search devices in domain 0 (the only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order to no longer report false positives when searching for siblings and dupe devices in the same domain respectively. Along with this change the sole host-PCI bridge driver converted to actually make use of PCI domain support is uninorth(4), the others continue to use domain 0 only for now and need to be converted as appropriate later on. Note that this means that the format of the location strings as used by pciconf(8) has been changed and that consumers of <sys/pciio.h> potentially need to be recompiled. Suggested by: jhb Reviewed by: grehan, jhb, marcel Approved by: re (kensmith), jhb (PCI maintainer hat)	2007-09-30 11:05:18 +00:00
Christian Brueffer	4fabde5686	Use the correct expanded name for SCTP. PR: 116496 Submitted by: koitsu Reviewed by: rrs Approved by: re (kensmith)	2007-09-26 20:05:07 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Attilio Rao	c8790f5d09	Fix some entries in the locks static table of witness. In particular: - smp_tlb_mtx is no longer used, so it is axed. - smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in the table, however, has been the source of a false positive LOR reporting with the dt_lock. However, smp rendezvous lock would have had sched_lock there for older lock, so it wasn't still a leaf lock. - allpmaps is only used in ia32 architecture, so it is inserted in the appropriate stub. Addictionally: - kse_zombie_lock is no longer present, so its definition is axed out. - zombie_lock doesn't need to have an exported symbol, so just let's it be declared as static. Tested by: kris Approved by: jeff (mentor) Approved by: re	2007-09-20 20:38:43 +00:00
Konstantin Belousov	96a2b63525	Fill in cr2 in the signal context from ksi->ksi_addr. Together with the sys/i386/i386/trap.c rev. 1.306 it fixes the PR. Submitted by: rdivacky Suggested by: jhb Sponsored by: Google Summer of Code 2007 PR: kern/77710 Approved by: re (kensmith)	2007-09-20 13:46:26 +00:00
David Malone	3ab8526963	The kernel version of Linux statfs64 is actually supposed to take 3 arguments, but we had forgotten the second argument. Also make the Linux statfs64 struct depend on the architecture because it has an extra 4 bytes padding on amd64 compared to i386. The three argument fix is from David Taylor, the struct statfs64 stuff is my fault. With this patch I can install i386 Linux matlab on an amd64 machine. Submitted by: David Taylor <davidt_at_yadt.co.uk> Approved by: re (kensmith)	2007-09-18 19:50:33 +00:00
Peter Wemm	8bff6a112b	Fix an undefined symbol that as/ld neglected to flag as a problem. It was used in assembler code in such a way that no unresolved relocation records were generated, so ld didn't flag the problem. You can see this with an 'nm' of the kernel. There will be 'U MAXCPU' on SMP systems. The impact of this is that the intrcount/intrnames arrays do not have the intended amount of space reserved. This could lead to interesting problems due to the arrays being present in the middle of kernel code. An overflow would be rather interesting as executable code would be used as per-cpu incrementing interrupt counters. This fixes it for now by exporting MAXCPU to the assembler. A better fix might be to define these data structures in C - they're only referenced in the kernel from C code these days anyway. Approved by: re (kensmith)	2007-09-17 21:55:28 +00:00
Jeff Roberson	b61ce5b0e6	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)	2007-09-17 05:31:39 +00:00
Alan Cox	6bce07ae73	It has been observed on the mailing lists that the different categories of pages don't sum to anywhere near the total number of pages on amd64. This is for the most part because uma_small_alloc() pages have never been counted as wired pages, like their kmem_malloc() brethren. They should be. This changes fixes that. It is no longer necessary for the page queues lock to be held to free pages allocated by uma_small_alloc(). I removed the acquisition and release of the page queues lock from uma_small_free() on amd64 and ia64 weeks ago. This patch updates the other architectures that have uma_small_alloc() and uma_small_free(). Approved by: re (kensmith)	2007-09-15 18:47:02 +00:00
Attilio Rao	4486adc51f	Currently the LO_NOPROFILE flag (which is masked on upper level code by per-primitive macros like MTX_NOPROFILE, SX_NOPROFILE or RW_NOPROFILE) is not really honoured. In particular lock_profile_obtain_lock_failure() and lock_profile_obtain_lock_success() are naked respect this flag. The bug leads to locks marked with no-profiling to be profiled as well. In the case of the clock_lock, used by the timer i8254 this leads to unpredictable behaviour both on amd64 and ia32 (double faults panic, sudden reboots, etc.). The amd64 clock_lock is also not marked as not profilable as it should be. Fix these bugs adding proper checks in the lock profiling code and at clock_lock initialization time. i8254 bug pointed out by: kris Tested by: matteo, Giuseppe Cocomazzi <sbudella at libero dot it> Approved by: jeff (mentor) Approved by: re	2007-09-14 01:12:39 +00:00
Attilio Rao	0b2e598c14	This is a follow-up, cleaning-up commit about recent changes involving topology foo functions. Working at the patch for topology problems in ia32/amd64 evicted some problems regarding functions ordering in the SI_SUB_CPU family of SYSINIT'ed subsystems. In order to avoid problems with new modified to involved functions, a correct ordering is not semantically specified for SI_SUB_CPU functions (for a larger view of the issue please visit: http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html ) Discussed with: peter Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org> Approved by: jeff Approved by: re	2007-09-11 22:54:09 +00:00
Konstantin Belousov	0e6ed4feab	Regenerate. Approved by: re (kensmith)	2007-08-28 12:36:23 +00:00
Konstantin Belousov	b6e645c90f	Implement fake linux sched_getaffinity() syscall to enable java to work with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets native scheduler affinity syscalls. Submitted by: rdivacky Reviewed by: jkim Sponsored by: Google Summer of Code 2007 Approved by: re (kensmith)	2007-08-28 12:26:35 +00:00
Joseph Koshy	ea49750231	Assign sizes to assembly language support functions. Approved by: re (kensmith)	2007-08-22 05:06:14 +00:00
Joseph Koshy	298889efcb	Define an END() macro for use in i386 and amd64 assembly code, akin to the one available on the ia64, sparc64, and sun4v architectures. Approved by: re (kensmith)	2007-08-22 04:26:07 +00:00
Alan Cox	8beae25391	In general, when we map a page into the kernel's address space, we no longer create a pv entry for that mapping. (The two exceptions are mappings into the kernel's exec and pipe submaps.) Consequently, there is no reason for get_pv_entry() to dig deep into the free page queues, i.e., use VM_ALLOC_SYSTEM, by default. This revision changes get_pv_entry() to use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to reclaim pv entries. Approved by: re (kensmith)	2007-08-21 04:59:34 +00:00
Dag-Erling Smørgrav	83d18f2283	Add a driver for the on-die digital thermal sensor found on Intel Core and newer CPUs (including Core 2 and Core / Core 2 based Xeons). The driver attaches to each cpu device and creates a sysctl node in that device's sysctl context (dev.cpu.N.temperature). When invoked, the handler binds to the appropriate CPU to ensure a correct reading. Submitted by: Rui Paulo <rpaulo@fnop.net> Sponsored by: Google Summer of Code 2007 Tested by: des, marcus, Constantine A. Murenin, Ian FREISLICH Approved by: re (kensmith) MFC after: 3 weeks	2007-08-15 19:26:03 +00:00
Peter Wemm	b7778ae08f	Move mp_topology() from apic_init(i386) and apic_setup_local(amd64) to cpu_start_mp(). This is after we have read the cpuid registers to calculate the hyperthreading_cpus value for the sysctl that enables or disables hyperthread cores. Change mp_topology() to use that information rather than trying to do it itself. This solves the problem of ULE being incorrectly told that dual core Athlon64 X2 or Operton cpus are hyperthreading cores. At the very least, we now have a single piece of code to identify hyperthreading. Obtained from: jhb Approved by: re (kensmith)	2007-08-02 21:17:58 +00:00
John Baldwin	de016534a8	If the trap number stored in the trapframe is corrupted into a negative value, then we would use a negative index into the trap_msg[] array resulting in a nested page fault. Make the 'type' variable holding the trap number unsigned to avoid this. MFC after: 2 weeks Approved by: re (rwatson)	2007-07-26 15:32:55 +00:00
David Malone	6d8617d42a	If clock_ct_to_ts fails to convert time time from the real time clock, print a one line error message. Add some comments on not being able to trust the day of week field (I'll act on these comments in a follow up commit). Approved by: re MFC after: 3 weeks	2007-07-23 09:42:32 +00:00
Jeff Roberson	40380a6a6b	- Optimize the amd64 cpu_switch() TD_LOCK blocking and releasing to require fewer blocking loops. - Don't use atomic ops with 4BSD or on UP. - Only use the blocking loop if ULE is compiled in. - Use the correct memory barrier. Discussed with: attilio, jhb, ssouhlal Tested by: current@ Approved by: re	2007-07-17 22:36:56 +00:00
John Baldwin	59d8f3ff08	Fix a couple of issues with the stack limit for 32-bit processes on 64-bit kernels exposed by the recent fixes to resource limits for 32-bit processes on 64-bit kernels: - Let ABIs expose their maximum stack size via a new pointer in sysentvec and use that in preference to maxssiz during exec() rather than always using maxssiz for all processses. - Apply the ABI's limit fixup to the previous stack size when adjusting RLIMIT_STACK to determine if the existing mapping for the stack needs to be grown or shrunk (as well as how much it should be grown or shrunk). Approved by: re (kensmith)	2007-07-12 18:01:31 +00:00
Peter Wemm	79d5bdcca5	Don't add the 'pad' argument to the mmap/truncate/etc syscalls. Submitted by: kensmith Approved by: re (kensmith)	2007-07-04 23:06:43 +00:00
Bjoern A. Zeeb	118043c6b1	Temporary disconnect i4bing, i4bisppp and i4bipr from the build for the 7.0 timeframe. This is needed because I4B is not locked and NET_NEEDS_GIANT goes away. The plan is to lock I4B and bring everything back for 7.1. Approved by: re (kensmith)	2007-07-04 00:18:39 +00:00
Nate Lawson	a1ec53930b	Revert previous commit, retaining cpufreq. Approved by: re (implicitly)	2007-07-01 22:19:20 +00:00
Nate Lawson	a7b811a620	Add cpufreq(4) to GENERIC. It does not change the frequency by default, so systems should be relatively unaffected. Users can then simply enable powerd(8) in rc.conf to take advantage of it. Approved by: re	2007-07-01 21:47:45 +00:00
Alan Cox	ba4b85e482	Pages that do belong to an object and page queue can now be freed without holding the page queues lock. Thus, the page table pages released by pmap_remove() and pmap_remove_pages() can be freed after the page queues lock is released. Approved by: re (kensmith)	2007-07-01 07:08:26 +00:00
Matt Jacob	7fc02735f4	Check for pte being NULL in return from pmap_pte_pde- unlikely or even impossible, but it's better ot have a panic and a quiesced gcc4.2.	2007-06-17 04:27:45 +00:00
Matt Jacob	27705ac087	Initialize lastaddr to zero to make gcc4.2 happy.	2007-06-17 04:21:58 +00:00
Peter Wemm	5915fb72fb	Prototype (but functional) Linux-ish /dev/nvram interface to the extra 114 bytes of cmos ram in the PC clock chip. The big difference between this and the Linux version is that we do not recalculate the checksums for bytes 16..31. We use this at work when cloning identical machines - we can copy the bios settings as well. Reading /dev/nvram gives 114 bytes of data but you can seek/read/write whichever bytes you like. Yes, this is a "foot, gun, fire!" type of device.	2007-06-15 22:58:14 +00:00
Xin LI	a2346f7c3c	Enable SCTP by default for GENERIC kernels in order to give it more exposure. The current state of SCTP implementation is considered to be ready for 32-bit platforms, but still need some work/testing on 64-bit platforms. Approved by: re (kensmith) Discussed with: rrs	2007-06-14 17:14:27 +00:00
Pyun YongHyeon	b5f0caf909	Add nfe(4) to the list of drivers supported by GENERIC kernel. While I'm here comment out nve(4) as nfe(4) will take over. Approved by: re	2007-06-12 02:24:30 +00:00
Matt Jacob	f2114f3bcd	Check against maxsegsz being zero in bus_dma_tag_create and return EINVAL if it is. Reviewed by: scott long	2007-06-11 17:57:24 +00:00
Andrew Thompson	ed3247cea7	Add wlan_scan_ap and wlan_scan_sta to platforms that include wlan.	2007-06-11 08:26:40 +00:00
Marcel Moolenaar	2b39bb4f4f	Use default options for default partitioning schemes, rather than making the relevant files standard. This avoids duplication and makes it easier to override/disable unwanted schemes. Since ARM doesn't have a DEFAULTS configuration file, leave the source files for the BSD and MBR partitioning schemes in files.arm for now.	2007-06-11 00:38:06 +00:00

... 3 4 5 6 7 ...

5360 Commits