freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	faccac2d45	Correct a critical accounting error in pmap_demote_pde(). Specifically, when pmap_demote_pde() allocates a page table page to implement a user-space demotion, it must increment the pmap's resident page count. Not doing so, can lead to an underflow during address space termination that causes pmap_remove() to exit prematurely, before it has destroyed all of the mappings within the specified range. The ultimate effect or symptom of this error is an assertion failure in vm_page_free_toq() because the page being freed is still mapped. This error is only possible when superpage promotion is enabled. Thus, it only affects FreeBSD versions greater than 7.2. Tested by: pho, alc Reviewed by: alc Approved by: re (rwatson) MFC after: 1 week	2009-08-17 13:27:55 +00:00
John Baldwin	21157ad3b1	Adjust the handling of the local APIC PMC interrupt vector: - Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc() routines in the local APIC code that the hwpmc(4) driver can use to manage the local APIC PMC interrupt vector. - Do not enable the local APIC PMC interrupt vector by default when HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly enables the interrupt when it is succesfully initialized and disables the interrupt when it is unloaded. This avoids enabling the interrupt on unsupported CPUs which may result in spurious NMIs. Reported by: rnoland Reviewed by: jkoshy Approved by: re (kib) MFC after: 2 weeks	2009-08-14 21:05:08 +00:00
Attilio Rao	dc6fbf6545	* Completely Remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using a NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Adds an new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when disponible. In other cases it just does match a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64 architectures, while on the other has a normal IPI_STOP effect. It is responsibility of maintainers to eventually implement an hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). That is specular to stop_cpu() but it does use the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and comments adding This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)	2009-08-13 17:09:45 +00:00
Ed Schouten	61fb73de41	Make the MacBook3,1 boot again. Approved by: re (kib)	2009-08-02 11:26:23 +00:00
Rui Paulo	1f93ae9453	Refine the MacBook hack to only match early models that have Intel ICH. Discussed with: kjim Approved by: re (kib)	2009-07-27 13:51:55 +00:00
John Baldwin	013818111a	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks	2009-07-24 13:50:29 +00:00
Konstantin Belousov	206a336872	When the page caching attributes are changed, after new mapping is established, OS shall flush the caches on all processors that may have used the mapping previously. This operation is not needed if processors support self-snooping. If not, but clflush instruction is implemented on the CPU, series of the clflush can be used on the mapping region. Otherwise, we have to flush the whole cache. The later operation is very expensive, and AMD-made CPUs do not have self-snooping. Implement cache flush for remapped region by using clflush for amd64, when supported by CPU. Proposed and reviewed by: alc Approved by: re (kensmith)	2009-07-22 14:32:38 +00:00
Alan Cox	9861cbc6ca	Change the handling of fictitious pages by pmap_page_set_memattr() on amd64 and i386. Essentially, fictitious pages provide a mechanism for creating aliases for either normal or device-backed pages. Therefore, pmap_page_set_memattr() on a fictitious page needn't update the direct map or flush the cache. Such actions are the responsibility of the "primary" instance of the page or the device driver that "owns" the physical address. For example, these actions are already performed by pmap_mapdev(). The device pager needn't restore the memory attributes on a fictitious page before releasing it. It's now pointless. Add pmap_page_set_memattr() to the Xen pmap. Approved by: re (kib)	2009-07-19 21:40:19 +00:00
Alan Cox	13de722155	An addendum to r195649, "Add support to the virtual memory system for configuring machine-dependent memory attributes...": Don't set the memory attribute for a "real" page that is allocated to a device object in vm_page_alloc(). It is a pointless act, because the device pager replaces this "real" page with a "fake" page and sets the memory attribute on that "fake" page. Eliminate pointless code from pmap_cache_bits() on amd64. Employ the "Self Snoop" feature supported by some x86 processors to avoid cache flushes in the pmap. Approved by: re (kib)	2009-07-18 01:50:05 +00:00
Jung-uk Kim	e8c4d3e407	Match PCI Express root bridge _HID directly instead of relying on _CID. Reviewed by: jhb Approved by: re (kib)	2009-07-13 21:36:31 +00:00
Alan Cox	3153e878dd	Add support to the virtual memory system for configuring machine- dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)	2009-07-12 23:31:20 +00:00
Rui Paulo	59aa14a91d	Implementation of the upcoming Wireless Mesh standard, 802.11s, on the net80211 wireless stack. This work is based on the March 2009 D3.0 draft standard. This standard is expected to become final next year. This includes two main net80211 modules, ieee80211_mesh.c which deals with peer link management, link metric calculation, routing table control and mesh configuration and ieee80211_hwmp.c which deals with the actually routing process on the mesh network. HWMP is the mandatory routing protocol on by the mesh standard, but others, such as RA-OLSR, can be implemented. Authentication and encryption are not implemented. There are several scripts under tools/tools/net80211/scripts that can be used to test different mesh network topologies and they also teach you how to setup a mesh vap (for the impatient: ifconfig wlan0 create wlandev ... wlanmode mesh). A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled by default on GENERIC kernels for i386, amd64, sparc64 and pc98. Drivers that support mesh networks right now are: ath, ral and mwl. More information at: http://wiki.freebsd.org/WifiMesh Please note that this work is experimental. Also, please note that bridging a mesh vap with another network interface is not yet supported. Many thanks to the FreeBSD Foundation for sponsoring this project and to Sam Leffler for his support. Also, I would like to thank Gateworks Corporation for sending me a Cambria board which was used during the development of this project. Reviewed by: sam Approved by: re (kensmith) Obtained from: projects/mesh11s	2009-07-11 15:02:45 +00:00
Konstantin Belousov	d77e2734a1	When amd64 CPU cannot load segment descriptor during trap return to usermode, it generates GPF, that is mirrored to user mode as SIGSEGV. The offending register in mcontext should contain the value loading of which generated the GPF, and it is so on i386. On amd64, we currently report segment descriptor in tf_err, while segment register contains the corrected value loaded by trap handler. Fix the issue by behaving like i386, reloading segment register in trap frame after signal frame is pushed onto user stack. Noted and tested by: pho Approved by: re (kensmith)	2009-07-10 10:29:16 +00:00
Konstantin Belousov	a2622e5dc2	Restore the segment registers and segment base MSRs for amd64 syscall return path only when neither thread was context switched while executing syscall code nor syscall explicitely modified LDT or MSRs. Save segment registers in trap handlers before interrupts are enabled, to not allow context switches to happen before registers are saved. Use separated byte in pcb for indication of fast/full return, since pcb_flags are not synchronized with context switches. The change puts back syscall microbenchmark numbers that were slowed down after commit of the support for LDT on amd64. Reviewed by: jeff Tested (and tested, and tested ...) by: pho Approved by: re (kensmith)	2009-07-09 09:34:11 +00:00
Alan Cox	133898afd5	When pmap_change_attr() changes the PAT setting on a kernel mapping, it has to simultaneously change the PAT setting for the same pages within the direct map region. This may require the demotion of a 2MB page mapping and the allocation of a page table page. This revision gives the highest possible priority (VM_ALLOC_INTERRUPT) to this page allocation, so that pmap_change_attr() is less likely to fail. (In general, kernel page table page allocations have the highest priority, so this is not creating a new precedent.) (Demotion of 1GB page mappings within the direct map already specifies VM_ALLOC_INTERRUPT to vm_page_alloc(), so only pmap_demote_pde() must be changed.) Approved by: re (kib)	2009-07-06 18:43:42 +00:00
John Baldwin	0d0a6650d7	After the per-CPU IDT changes, the IDT vector of an interrupt could change when the interrupt was moved from one CPU to another. If the interrupt was enabled, then the old IDT vector needs to be disabled and the new IDT vector needs to be enabled. This was mostly masked prior to the recent MSI changes since in the older code almost all allocated IDT vectors were already enabled and the enabled vectors on the BSP during boot covered enough of the IDT range. However, after the MSI changes, MSI interrupts that were allocated but not enabled (e.g. DRM with MSI) during boot could result in an allocated IDT vector that wasn't enabled. The round-robin at the end of boot could place another interrupt at the same IDT vector without enabling the IDT vector causing trap 30 faults. Fix this by explicitly disabling/enabling the old and new IDT vectors for enabled interrupt sources when moving an interrupt between CPUs via the pic_assign_cpu() method. While here, fix a bug in my earlier changes so that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu() fails to allocate a new IDT vector and returns ENOSPC. Approved by: re (kensmith)	2009-07-06 18:23:00 +00:00
John Baldwin	f7d7cd0c76	MFi386: Add a 'show idt' command to DDB to display the non-default function pointers in the interrupt descriptor table. Approved by: re (kensmith)	2009-07-06 18:10:27 +00:00
Sam Leffler	8c393fd1f0	Cleanup ALIGNED_POINTER: o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v) o define as "1" on amd64 and i386 where there is no restriction o make the type returned consistent with ALIGN o remove _ALIGNED_POINTER o make associated comments consistent Reviewed by: bde, imp, marcel Approved by: re (kensmith)	2009-07-05 17:45:48 +00:00
Ed Schouten	89fe4c0a2b	Enable POSIX semaphores on all non-embedded architectures by default. More applications (including Firefox) seem to depend on this nowadays, so not having this enabled by default is a bad idea. Proposed by: miwi Patch by: Florian Smeets <flo kasimir com> Approved by: re (kib)	2009-07-02 18:24:37 +00:00
John Baldwin	cebc7fb16c	Improve the handling of cpuset with interrupts. - For x86, change the interrupt source method to assign an interrupt source to a specific CPU to return an error value instead of void, thus allowing it to fail. - If moving an interrupt to a CPU fails due to a lack of IDT vectors in the destination CPU, fail the request with ENOSPC rather than panicing. - For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used on the first interrupt in a group. Moving the first interrupt in a group moves the entire group. - Use the icu_lock to protect intr_next_cpu() on x86 instead of the intr_table_lock to fix a LOR introduced in the last set of MSI changes. - Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with interrupts. Previously, binding an interrupt to a CPU only performed a privilege check if the interrupt had an interrupt thread. Interrupts without a thread could be bound by non-root users as a result. - If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. Approved by: re (kib)	2009-07-01 17:20:07 +00:00
Doug Rabson	259d14ed88	Don't include rpcv2.h - it has been removed. Submitted by: ed@ Approved by: re	2009-07-01 07:34:28 +00:00
Andriy Gapon	462fab84b8	remove unused/unneeded extern declarations This should result in no changes to compiled code. Reviewed by: alc Approved by: re (kib) MFC after: 1 day	2009-06-30 11:16:32 +00:00
Robert Watson	ad8dacbb91	Catch missed AUDIT_ARG() -> AUDIT_ARG_CMD() on amd64. Submitted by: Florian Smeets <flo at kasimir.com> Approved by: re (kib) (implicit) MFC after: 1 week	2009-06-27 15:03:50 +00:00
Robert Watson	14961ba789	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week	2009-06-27 13:58:44 +00:00
Alan Cox	5797795f5a	Correct the #endif comment. Noticed by: jmallett Approved by: re (kib)	2009-06-26 16:22:24 +00:00
Alan Cox	e999111ae7	This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this changes adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb	2009-06-26 04:47:43 +00:00
John Baldwin	4e9dba6322	Fix kernels compiled without SMP support. Make intr_next_cpu() available for UP kernels but as a stub that always returns the single CPU's local APIC ID. Reported by: kib	2009-06-25 20:35:46 +00:00
John Baldwin	b4805f449c	- Restore the behavior of pre-allocating IDT vectors for MSI interrupts. This is mostly important for the multiple MSI message case where the IDT vectors for the entire group need to be allocated together. This also restores the assumptions made by the PCI bus code that it could invoke PCIB_MAP_MSI() once MSI vectors were allocated. - To avoid whiplash with CPU assignments, change the way that CPUs are assigned to interrupt sources on activation. Instead of assigning the CPU via pic_assign_cpu() before calling enable_intr(), allow the different interrupt source drivers to ask the MD interrupt code which CPU to use when they allocate an IDT vector. I/O APIC interrupt pins do this in their pic_enable_intr() routines giving the same behavior as before. MSI sources do it when the IDT vectors are allocated during msi_alloc() and msix_alloc(). - Change the intr_table_lock from an sx lock to a mutex. Tested by: rnoland	2009-06-25 18:13:46 +00:00
John Baldwin	7af55bd450	Whitespace fix.	2009-06-24 19:16:48 +00:00
Alexander Motin	9f23a6caa4	Make algorithm a bit more bulletproof.	2009-06-23 23:16:37 +00:00
Jeff Roberson	50c202c592	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas	2009-06-23 22:42:39 +00:00
Alexander Motin	9e9be26906	Fix variable name.	2009-06-23 22:08:25 +00:00
Alexander Motin	96c5d068d8	Rework r193814: While general idea of patch was good, it was not working properly due the way it was implemented. When we are using same timer interrupt for several of hard/prof/stat purposes we should not send several IPIs same time to other CPUs. Sending several IPIs same time leads to terrible accounting/profiling results due to strong synchronization effect, when the second interrupt handler accounts processing of the first one. Interlink timer events in a such way, that no more then one IPI is sent for any original timer interrupt.	2009-06-23 21:45:33 +00:00
Alan Cox	0f6766f3da	Eliminate dead code. These definitions should have been deleted with the introduction of i686_mem.c in r45405. Merge adjacent #ifdef _KERNEL/#endif blocks.	2009-06-22 04:21:02 +00:00
Paul Saab	c2b6d60d26	I have several machines where the following warning is printed: warning: no time-of-day clock registered, system time will not be set accurately Provide hints to atrtc on amd64 since it's not being described in ACPI on some systems. Reviewed by: jhb	2009-06-15 21:55:29 +00:00
Alexander Motin	bb74c2db4d	Forbid multi-vector MSI interrupt vectors migration to another CPU once allocated. MSI have strict vectors allocation requirements, which are not satisfied now during reallocation. This is not the best possible solution, but better then just broken, as it was. No objections: current@, arch@, jhb@	2009-06-15 13:47:49 +00:00
Alan Cox	387aabc513	Long, long ago in r27464 special case code for mapping device-backed memory with 4MB pages was added to pmap_object_init_pt(). This code assumes that the pages of a OBJT_DEVICE object are always physically contiguous. Unfortunately, this is not always the case. For example, jhb@ informs me that the recently introduced /dev/ksyms driver creates a OBJT_DEVICE object that violates this assumption. Thus, this revision modifies pmap_object_init_pt() to abort the mapping if the OBJT_DEVICE object's pages are not physically contiguous. This revision also changes some inconsistent if not buggy behavior. For example, the i386 version aborts if the first 4MB virtual page that would be mapped is already valid. However, it incorrectly replaces any subsequent 4MB virtual page mappings that it encounters, potentially leaking a page table page. The amd64 version has a bug of my own creation. It potentially busies the wrong page and always an insufficent number of pages if it blocks allocating a page table page. To my knowledge, there have been no reports of these bugs, hence, their persistance. I suspect that the existing restrictions that pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would choose to map, for example, that the first page must be aligned on a 2 or 4MB physical boundary and that the size of the mapping must be a multiple of the large page size, were enough to avoid triggering the bug for drivers like ksyms. However, one side effect of testing the OBJT_DEVICE object's pages for physical contiguity is that a dubious difference between pmap_object_init_pt() and the standard path for mapping devices pages, i.e., vm_fault(), has been eliminated. Previously, pmap_object_init_pt() would only instantiate the first PG_FICTITOUS page being mapped because it never examined the rest. Now, however, pmap_object_init_pt() uses the new function vm_object_populate() to instantiate them all (in order to support testing their physical contiguity). These pages need to be instantiated for the mechanism that I have prototyped for automatically maintaining the consistency of the PAT settings across multiple mappings, particularly, amd64's direct mapping, to work. (Translation: This change is also being made to support jhb@'s work on the Nvidia feature requests.) Discussed with: jhb@	2009-06-14 19:51:43 +00:00
Ed Schouten	c2919fd8ff	Enable PRINTF_BUFR_SIZE on i386 and amd64 by default. In the past there have been some reports of PRINTF_BUFR_SIZE not functioning correctly. Instead of having garbled console messages, we should just see whether the issues are still there and analyze them. Approved by: re	2009-06-14 18:01:35 +00:00
Pyun YongHyeon	d68875eb7e	Add alc(4), a driver for Atheros AR8131/AR8132 PCIe ethernet controller. These controllers are also known as L1C(AR8131) and L2C(AR8132) respectively. These controllers resembles the first generation controller L1 but usage of different descriptor format and new register mappings over L1 register space requires a new driver. There are a couple of registers I still don't understand but the driver seems to have no critical issues for performance and stability. Currently alc(4) supports the following hardware features. o MSI o TCP Segmentation offload o Hardware VLAN tag insertion/stripping o Tx/Rx interrupt moderation o Hardware statistics counters(dev.alc.%d.stats) o Jumbo frame o WOL AR8131/AR8132 also supports Tx checksum offloading but I disabled it due to stability issues. I'm not sure this comes from broken sample boards or hardware bugs. If you know your controller works without problems you can still enable it. The controller has a silicon bug for Rx checksum offloading, so the feature was not implemented. I'd like to say big thanks to Atheros. Atheros kindly sent sample boards to me and answered several questions I had. HW donated by: Atheros Communications, Inc.	2009-06-10 02:07:58 +00:00
Kip Macy	98fda6ac58	opt in to flowtable on i386/amd64	2009-06-09 21:58:14 +00:00
Kip Macy	c804d618eb	remove flowtable from DEFAULTS	2009-06-09 20:26:52 +00:00
Bjoern A. Zeeb	78c7c6adbc	Unbreak the build for amd64 after r193814 using correct variable names.	2009-06-09 09:47:02 +00:00
Ariff Abdullah	b65cb1db3c	When using i8254 as the only kernel timer source: - Interpolate stat/prof clock using clkintr() in a similar fashion to local APIC timer, since statclock usually run slower. - Liberate hardclockintr() from taking the burden of handling both stat and prof clock interrupt. Instead, send IPIs within clkintr() to handle those.	2009-06-09 07:26:52 +00:00
Ariff Abdullah	867cdecd40	Move C1E workaround into its own idle function. Previous workaround works only during initial booting process, while there are laptops/BIOSes that tend to act 'smarter' by force enabling C1E if the main power adapter being pulled out, rendering previous workaround ineffective. Given the fact that we still rely on local APIC to drive timer interrupt, this workaround should keep all Turion (probably Phenom too) X\d+ alive whether its on battery power or not. URL: http://lists.freebsd.org/pipermail/freebsd-acpi/2008-April/004858.html http://lists.freebsd.org/pipermail/freebsd-acpi/2008-May/004888.html Tested by: Peter Jeremy <peterjeremy at optushome d com d au>	2009-06-09 04:17:36 +00:00
Jung-uk Kim	230bb4d90d	Rewrite OsdSynch.c to reflect the latest ACPICA more closely: - Implement ACPI semaphore (ACPI_SEMAPHORE) with condvar(9) and mutex(9). - Implement ACPI mutex (ACPI_MUTEX) with mutex(9). - Implement ACPI lock (ACPI_SPINLOCK) with spin mutex(9).	2009-06-08 20:07:16 +00:00
Ed Schouten	5942207fb4	Revert my change; reintroduce __gnu89_inline. It turns out our compiler in stable/7 can't build this code anymore. Even though my opinion is that those people should just run `make kernel-toolchain' before building a kernel, I am willing to wait and commit this after we've branched stable/8. Requested by: rwatson	2009-06-08 18:23:43 +00:00
Ed Schouten	032e3d1d19	Remove __gnu89_inline. Now that we use C99 almost everywhere, just use C99-style in the pmap code. Since the pmap code is the only consumer of __gnu89_inline, remove it from cdefs.h as well. Because the flag was only introduced 17 months ago, I don't expect any problems. Reviewed by: alc	2009-06-08 17:27:25 +00:00
Alan Cox	3cfc28b0a0	Now that amd64's kernel map is 512GB (SVN rev 192216), there is no reason to cap its buffer map at 1GB. MFC after: 6 weeks	2009-06-08 16:43:40 +00:00
Konstantin Belousov	12be48a216	Put intrcnt, eintrcnt, intrnames and eintrnames into the .data section. Noted by: "Tseng, Kuo-Lang" <kuo-lang.tseng intel com>, bde MFC after: 3 days	2009-06-05 20:23:29 +00:00
Jung-uk Kim	129d3046ef	Import ACPICA 20090521.	2009-06-05 18:44:36 +00:00
Robert Watson	bd875f5f13	Remove MAC kernel config files and add "options MAC" to GENERIC, with the goal of shipping 8.0 with MAC support in the default kernel. No policies will be compiled in or enabled by default, but it will now be possible to load them at boot or runtime without a kernel recompile. While the framework is not believed to impose measurable overhead when no policies are loaded (a result of optimization over the past few months in HEAD), we'll continue to benchmark and optimize as the release approaches. Please keep an eye out for performance or functionality regressions that could be a result of this change. Approved by: re (kensmith) Obtained from: TrustedBSD Project	2009-06-02 18:31:08 +00:00
Dmitry Chagin	f8cd0af232	Implement accept4 syscall. Approved by: kib (mentor) MFC after: 1 month	2009-06-01 20:48:39 +00:00
Robert Watson	33dd50646e	Regenerate generated syscall files following changes to struct sysent in r193234.	2009-06-01 16:14:38 +00:00
Jamie Gritton	76ca6f88da	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)	2009-05-29 21:27:12 +00:00
John Baldwin	515c5b1ede	Don't bother reading the initial value of the machine check banks during startup on Pentium 4 CPUs. This wasn't safe to do on APs during AP startup, was of limited value, and won't be used for future processors.	2009-05-20 16:11:22 +00:00
John Baldwin	dfc77ef51f	- Add a tunable 'hw.mca.enabled' that can be used to enable/disable the machine check code. Disable it by default for now. - When computing the mask of bits that determines a non-restartable event during a machine check exception, or-in the overflow flag rather than replacing the other flags. PR: i386/134586 [2] Submitted by: Andi Kleen andi-fbsd firstfloor.org	2009-05-18 21:50:06 +00:00
John Baldwin	d3da228f37	Add a read-only sysctl hw.pci.mcfg to mirror the tunable by the same name. MFC after: 1 week	2009-05-18 21:47:32 +00:00
John Baldwin	8aba835b8e	Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend using 128 byte alignment for locks. (See IA-32 SDM Vol 3A 7.11.6.7)	2009-05-18 19:33:59 +00:00
Marcel Moolenaar	dbb95048da	Add cpu_flush_dcache() for use after non-DMA based I/O so that a possible future I-cache coherency operation can succeed. On ARM for example the L1 cache can be (is) virtually mapped, which means that any I/O that uses temporary mappings will not see the I-cache made coherent. On ia64 a similar behaviour has been observed. By flushing the D-cache, execution of binaries backed by md(4) and/or NFS work reliably. For Book-E (powerpc), execution over NFS exhibits SIGILL once in a while as well, though cpu_flush_dcache() hasn't been implemented yet. Doing an explicit D-cache flush as part of the non-DMA based I/O read operation eliminates the need to do it as part of the I-cache coherency operation itself and as such avoids pessimizing the DMA-based I/O read operations for which D-cache are already flushed/invalidated. It also allows future optimizations whereby the bcopy() followed by the D-cache flush can be integrated in a single operation, which could be implemented using on-chips DMA engines, by-passing the D-cache altogether.	2009-05-18 18:37:18 +00:00
Kip Macy	b522d2c99b	correct range in comment pointed out by alc	2009-05-16 22:08:00 +00:00
Kip Macy	e127902229	update vm map comment pointed out by Larry Rosenman	2009-05-16 22:00:13 +00:00
Kip Macy	b6d82b1ae9	Increase default kernel map to 512GB I briefly discussed this with alc. It could lead to problems for greater than 64GB. However, that seems unlikely in practice.	2009-05-16 20:57:08 +00:00
Dmitry Chagin	3933bde22e	Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and SOCK_NONBLOCK flags, that allow to save fcntl() calls. Implement a variation of the socket() syscall which takes a flags in addition to the type argument. Approved by: kib (mentor) MFC after: 1 month	2009-05-16 18:48:41 +00:00
John Baldwin	76dae09449	Trim the default set of device hints on i386 and amd64: - Remove vga0 and the disabled uart2/uart3 hints from both platforms. - Remove hints for ISA adv0, bt0, aha0, aic0, ed0, cs0, sn0, ie0, fe0, and le0 from i386. All these hints were marked 'disabled' and thus already did not work "out of the box". Discussed with: imp	2009-05-14 21:53:35 +00:00
Attilio Rao	120b18d86f	FreeBSD right now support 32 CPUs on all the architectures at least. With the arrival of 128+ cores it is necessary to handle more than that. One of the first thing to change is the support for cpumask_t that needs to handle more than 32 bits masking (which happens now). Some places, however, still assume that cpumask_t is a 32 bits mask. Fix that situation by using always correctly cpumask_t when needed. While here, remove the part under STOP_NMI for the Xen support as it is broken in any case. Additively make ipi_nmi_pending as static. Reviewed by: jhb, kmacy Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-05-14 17:43:00 +00:00
John Baldwin	9dc0b3d54f	Implement simple machine check support for amd64 and i386. - For CPUs that only support MCE (the machine check exception) but not MCA (i.e. Pentium), all this does is print out the value of the machine check registers and then panic when a machine check exception occurs. - For CPUs that support MCA (the machine check architecture), the support is a bit more involved. - First, there is limited support for decoding the CPU-independent MCA error codes in the kernel, and the kernel uses this to output a short description of any machine check events that occur. - When a machine check exception occurs, all of the MCx banks on the current CPU are scanned and any events are reported to the console before panic'ing. - To catch events for correctable errors, a periodic timer kicks off a task which scans the MCx banks on all CPUs. The frequency of these checks is controlled via the "hw.mca.interval" sysctl. - Userland can request an immediate scan of the MCx banks by writing a non-zero value to "hw.mca.force_scan". - If any correctable events are encountered, the appropriate details are stored in a 'struct mca_record' (defined in <machine/mca.h>). The "hw.mca.count" is a count of such records and each record may be queried via the "hw.mca.records" tree by specifying the record index (0 .. count - 1) as the next name in the MIB similar to using PIDs with the kern.proc.* sysctls. The idea is to export machine check events to userland for more detailed processing. - The periodic timer and hw.mca sysctls are only present if the CPU supports MCA. Discussed with: emaste (briefly) MFC after: 1 month	2009-05-13 17:53:04 +00:00
Alan Cox	07a7b85e94	Correct a rare use-after-free error in pmap_copy(). This error was introduced in amd64 revision 1.540 and i386 revision 1.547. However, it had no harmful effects until after a recent change, r189698, on amd64. (In other words, the error is harmless in RELENG_7.) The error is triggered by the failure to allocate a pv entry for the one and only mapping in a page table page. I am addressing the error by changing pmap_copy() to abort if either pv entry allocation or page table page allocation fails. This is appropriate because the creation of mappings by pmap_copy() is optional. They are a (possible) optimization, and not a requirement. Correct a nearby whitespace error in the i386 pmap_copy(). Crash reported by: jeff@ MFC after: 6 weeks	2009-05-13 07:42:53 +00:00
Dmitry Chagin	03cc95d21a	Translate l_timeval arg to native struct timeval in linux_setsockopt()/linux_getsockopt() for SO_RCVTIMEO, SO_SNDTIMEO opts as l_timeval has MD members. Remove bogus __packed attribute from l_timeval struct on __amd64__. PR: kern/134276 Submitted by: Thomas Mueller <tmueller sysgo com> Approved by: kib (mentor) MFC after: 2 weeks	2009-05-11 13:50:42 +00:00
Dmitry Chagin	8d30f381ef	Do not export AT_CLKTCK when emulating Linux kernel prior to 2.4.0, as it has appeared in the 2.4.0-rc7 first time. Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK), glibc falls back to the hard-coded CLK_TCK value when aux entry is not present. Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value. For older applications/libc's which depends on hard-coded CLK_TCK value user should set compat.linux.osrelease less than 2.4.0. Approved by: kib (mentor)	2009-05-10 18:43:43 +00:00
Dmitry Chagin	1ca16454b3	Rework r189362, r191883. The frequency of the statistics clock is given by stathz. Use stathz if it is available, otherwise use hz. Pointed out by: bde Approved by: kib (mentor)	2009-05-10 18:16:07 +00:00
Jun Kuriyama	b3b17597ea	- Use "device\t" and "options \t" for consistency.	2009-05-10 00:00:25 +00:00
Jamie Gritton	7ae27ff49f	Move the per-prison Linux MIB from a private one-off pointer to the new OSD-based jail extensions. This allows the Linux MIB to accessed via jail_set and jail_get, and serves as a demonstration of adding jail support to a module. Reviewed by: dchagin, kib Approved by: bz (mentor)	2009-05-07 18:36:47 +00:00
Dmitry Chagin	13f20d7e86	To avoid excessive code duplication move MI definitions to the MI header file. As it is defined in Linux. Approved by: kib (mentor) MFC after: 1 month	2009-05-07 09:39:20 +00:00
Doug Rabson	ad5c667f35	Disable adaptive mutexes and rwlocks for XENHVM.	2009-05-06 17:52:38 +00:00
Doug Rabson	8480241102	Fix XENHVM build.	2009-05-06 17:48:39 +00:00
Alexander Motin	614dd4f83c	Do not try to initialize LAPIC timer if we are not going to use it. It solves assertion, when kernel built with INVARIANTS configured to use i8254 timer.	2009-05-05 01:13:20 +00:00
Jung-uk Kim	4ef853cc7f	Unlock the largest standard CPUID on Intel CPUs for both amd64 and i386 and fix SMP topology detection. On i386, we extend it to cover Core, Core 2, and Core i7 processors, not just Pentium 4 family, and move it to better place. On amd64, all supported Intel CPUs should have this MSR.	2009-05-04 18:05:27 +00:00
Alexander Motin	1703f2b424	Rename statclock_disable variable to atrtcclock_disable that it actually is, and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock controlling it. Setting it to 0 disables using RTC clock as stat-/ profclock sources. Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254 hardclock, when LAPIC and RTC clocks are disabled. This allows to reduce global interrupt rate of idle system down to about 100 interrupts per core, permitting C3 and deeper C-states provide maximum CPU power efficiency.	2009-05-03 17:47:21 +00:00
Alexander Motin	6a3a164d6e	Add support for using i8254 and rtc timers as event sources for amd64 SMP system. Redistribute hard-/stat-/profclock events to other CPUs using IPIs.	2009-05-02 12:20:43 +00:00
Dmitry Chagin	d789bfd562	Move extern variable definitions to the header file. Approved by: kib (mentor) MFC after: 1 month	2009-05-02 10:06:49 +00:00
Alexander Motin	58a2bb4996	Add resume methods to i8254 and atrtc devices.	2009-05-01 21:43:04 +00:00
Alexander Motin	2f369c9496	Small addition to r191720. Restore previous behaviour for the case of unknown interrupt. Invocation of IRQ -1 crashes my system on resume. Returning 0, as it was, is not perfect also, but at least not so dangerous.	2009-05-01 20:53:37 +00:00
Sam Leffler	dcad868984	o add uath o sort usb wireless drivers	2009-05-01 17:20:16 +00:00
Alexander Motin	1ecff35a6b	Use value -1 instead of 0 for marking unused APIC vectors. This fixes IRQ0 routing on LAPIC-enabled systems. Add hint.apic.0.clock tunable. Setting it 0 disables using LAPIC timers as hard-/stat-/profclock sources falling back to using i8254 and rtc timers. On modern CPUs LAPIC is a part of CPU core which is shutting down when CPU enters C3 or deeper power state. It makes no problems for interrupt processing, as chipset wakes up CPU on interrupt triggering. But entering C3 state kills LAPIC timer and freezes system time, making C3 and deeper states practically unusable. Using i8254 timer allows to avoid this problem. By using i8254 timer my T7700 C2D CPU with UP kernel successfully enters C3 state, saving more then a Watt of total idle power (>10%) in addition to all other power-saving techniques. This technique is not working for SMP yet, as only one CPU receives timer interrupts. But I think that problem could be fixed by forwarding interrupts to other CPUs with IPI.	2009-05-01 17:05:49 +00:00
Dmitry Chagin	79262bf1f0	Reimplement futexes. Old implemention used Giant to protect the kernel data structures, but at the same time called malloc(M_WAITOK), that could cause the calling thread to sleep and lost Giant protection. User-visible result was the missed wakeup. New implementation uses one sx lock per futex. The sx protects the futex structures and allows to sleep while copyin or copyout are performed. Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation is requested and either caller specified futexes are equial or second futex already exists. This is acceptable since the situation can only occur from the application error, and glibc falls back to old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error. Approved by: kib (mentor) MFC after: 1 month	2009-05-01 15:36:02 +00:00
Jung-uk Kim	788399cbd9	- Fix divide-by-zero panic when SMP kernel is used on UP system[1]. - Avoid possible divide-by-zero panic on SMP system when the CPUID is disabled, unsupported, or buggy. Submitted by: pluknet (pluknet at gmail dot com)[1]	2009-04-30 22:10:04 +00:00
Jeff Roberson	82fcb0f192	- Add support for cpuid leaf 0xb. This allows us to determine the topology of nehalem/corei7 based systems. - Remove the cpu_cores/cpu_logical detection from identcpu. - Describe the layout of the system in cpu_mp_announce(). Sponsored by: Nokia	2009-04-29 06:54:40 +00:00
John Baldwin	10395e0714	Reduce the number of bounce zones (and thus the number of bounce pages used in some cases): - Ignore DMA tag boundaries when allocating bounce pages. The boundaries don't determine whether or not parts of a DMA request bounce. Instead, they are just used to carve up segments. - Allow tags with sub-page alignment to share bounce pages since bounce pages are always page aligned. Reviewed by: scottl (amd64) MFC after: 1 month	2009-04-23 20:24:19 +00:00
John Baldwin	125f11d360	Adjust the way we number CPUs on x86 so that we attempt to "group" all logical CPUs in a package. We do this by numbering the non-boot CPUs by starting with the first CPU whose APIC ID is after the boot CPU and wrapping back around to APIC ID 0 if needed rather than always starting at APIC ID 0. While here, adjust the cpu_mp_announce() routine to list CPUs based on the mapping established by assign_cpu_ids() rather than making assumptions about the algorithm assign_cpu_ids() uses. MFC after: 1 month	2009-04-22 21:40:37 +00:00
Robert Watson	9725389e1e	Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing a fair number of static data structures, making this an unlikely option to try to change without also changing source code. [1] Change default cache line size on ia64, sparc64, and sun4v to 128 bytes, as this was what rtld-elf was already using on those platforms. [2] Suggested by: bde [1], jhb [2] MFC after: 2 weeks	2009-04-20 12:59:23 +00:00
Robert Watson	22037b2d2c	Add description and cautionary note regarding CACHE_LINE_SIZE. MFC after: 2 weeks Suggested by: alc	2009-04-19 21:26:36 +00:00
Robert Watson	a93fa8f2bb	For each architecture, define CACHE_LINE_SHIFT and a derived CACHE_LINE_SIZE constant. These constants are intended to over-estimate the cache line size, and be used at compile-time when a run-time tuning alternative isn't appropriate or available. Defaults for all architectures are 64 bytes, except powerpc where it is 128 bytes (used on G5 systems). MFC after: 2 weeks Discussed on: arch@	2009-04-19 20:19:13 +00:00
Kip Macy	34b07340ff	- Import infrastructure for caching flows as a means of accelerating L3 and L2 lookups as well as providing stateful load balancing when used with RADIX_MPATH. - Currently compiled in to i386 and amd64 but disabled by default, it can be enabled at runtime with 'sysctl net.inet.flowtable.enable=1'. - Embedded users can remove it entirely from the kernel by adding 'nooption FLOWTABLE' to their kernel config files. - A minimal hookup will be added to ip_output in a subsequent commit. I would like to see more review before bringing in changes that require more churn. Supported by: Bitgravity Inc.	2009-04-19 00:16:04 +00:00
John Baldwin	842f11bef6	Restore bus DMA bounce pages to an offset of 0 when they are released by a tag that has BUS_DMA_KEEP_PG_OFFSET set. Otherwise the page could be reused with a non-zero offset by a tag that doesn't have BUS_DMA_KEEP_PG_OFFSET leading to data corruption. Sleuthing by: avg Reviewed by: scottl	2009-04-17 13:22:18 +00:00
Marcel Moolenaar	6ad9a99f21	Add a compat option to the EBR scheme that controls the naming of the partitions (GEOM_PART_EBR_COMPAT). When compatibility is enabled, changes to the partitioning are disallowed. Remove the device name aliasing added previously to provide backward compatibility, but which in practice doesn't give us anything. Enable compatibility on amd64 and i386.	2009-04-15 22:38:22 +00:00
Jung-uk Kim	cebe9dc98a	A simple rewrite of biossmap.c: - Do not iterate int 15h, function e820h twice. Instead, we use STAILQ to store each return buffer and copy all at once. - Export optional extended attributes defined in ACPI 3.0 as separate metadata. Currently, there are only two bits defined in the specification. For example, if the descriptor has extended attributes and it is not enabled, it has to be ignored by OS. We may implement it in the kernel later if it is necessary and proven correct in reality. - Check return buffer size strictly as suggested in ACPI 3.0. Reviewed by: jhb	2009-04-15 17:31:22 +00:00
Konstantin Belousov	3feb57a0a8	The bus_dmamap_load_uio(9) shall use pmap of the thread recorded in the uio_td to extract pages from, instead of unconditionally use kernel pmap. Submitted by: Jason Harmening <jason.harmening gmail com> (amd64 version) PR: amd64/133592 Reviewed by: scottl (original patch), jhb MFC after: 2 weeks	2009-04-13 19:20:32 +00:00
Ed Schouten	e1048f7678	Simplify in/out functions (for i386 and AMD64). Remove a hack to generate more efficient code for port numbers below 0x100, which has been obsolete for at least ten years, because GCC has an asm constraint to specify that. Submitted by: Christoph Mallon <christoph mallon gmx de>	2009-04-11 14:01:01 +00:00
Jack F Vogel	b698ab40c8	Add ixgbe to the GENERIC amd64 kernel in place of the older ixgb driver. I will add to other architectures after this one proves trouble free. MFC after: 2 weeks	2009-04-10 00:40:48 +00:00
Ed Schouten	2c97d32a81	Also remove the unused __word_swap_int*() macros. Submitted by: Christoph Mallon <christoph.mallon@gmx.de>	2009-04-08 19:10:20 +00:00

1 2 3 4 5 ...

5443 Commits