freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	760faf9fdd	Fix signal delivery for the iBCS2 binaries. The iBCS2 sysvec uses current FreeBSD signal trampoline, but does not specifies sv_sigcode_base, since shared page is not mapped. This results in the zero %eip for the signal frame. Fall back to calculating %eip as offset from the psstrings when sv_sigcode_base is not initialized. Reported by: Rich Naill <rich@enterprisesystems.net> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-08 16:57:55 +00:00
Konstantin Belousov	6f8a44a5dd	Add bits for the AMD features from CPUID function 0x80000001 ECX, described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf. Partially based on the patch submitted by: Dmitry Luhtionov <dmitryluhtionov@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-08 16:32:30 +00:00
Alan Cox	c70af4875e	As of r257209, all architectures have defined VM_KMEM_SIZE_SCALE. In other words, every architecture is now auto-sizing the kmem arena. This revision changes kmeminit() so that the definition of VM_KMEM_SIZE_SCALE becomes mandatory and the definition of VM_KMEM_SIZE becomes optional. Replace or eliminate all existing definitions of VM_KMEM_SIZE. With auto-sizing enabled, VM_KMEM_SIZE effectively became an alternate spelling for VM_KMEM_SIZE_MIN on most architectures. Use VM_KMEM_SIZE_MIN for clarity. Change kmeminit() so that the effect of defining VM_KMEM_SIZE is similar to that of setting the tunable vm.kmem_size. Whereas the macros VM_KMEM_SIZE_{MAX,MIN,SCALE} have had the same effect as the tunables vm.kmem_size_{max,min,scale}, the effects of VM_KMEM_SIZE and vm.kmem_size have been distinct. In particular, whereas VM_KMEM_SIZE was overridden by VM_KMEM_SIZE_{MAX,MIN,SCALE} and vm.kmem_size_{max,min,scale}, vm.kmem_size was not. Remedy this inconsistency. Now, VM_KMEM_SIZE can be used to set the size of the kmem arena at compile-time without that value being overridden by auto-sizing. Update the nearby comments to reflect the kmem submap being replaced by the kmem arena. Stop duplicating the auto-sizing formula in every machine- dependent vmparam.h and place it in kmeminit() where auto-sizing takes place. Reviewed by: kib (an earlier version) Sponsored by: EMC / Isilon Storage Division	2013-11-08 16:25:00 +00:00
Mark Johnston	57170f49f2	Remove references to an unused fasttrap probe hook, and remove the corresponding x86 trap type. Userland DTrace probes are currently handled by the other fasttrap hooks (dtrace_pid_probe_ptr and dtrace_return_probe_ptr). Discussed with: rpaulo	2013-10-31 02:35:00 +00:00
Gleb Smirnoff	66e01d73cd	- Provide necessary includes. - Remove unnecessary includes. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-29 11:17:49 +00:00
Konstantin Belousov	86be9f0dd5	Import the driver for VT-d DMAR hardware, as specified in the revision 1.3 of Intelб╝ Virtualization Technology for Directed I/O Architecture Specification. The Extended Context and PASIDs from the rev. 2.2 are not supported, but I am not aware of any released hardware which implements them. Code does not use queued invalidation, see comments for the reason, and does not provide interrupt remapping services. Code implements the management of the guest address space per domain and allows to establish and tear down arbitrary mappings, but not partial unmapping. The superpages are created as needed, but not promoted. Faults are recorded, fault records could be obtained programmatically, and printed on the console. Implement the busdma(9) using DMARs. This busdma backend avoids bouncing and provides security against misbehaving hardware and driver bad programming, preventing leaks and corruption of the memory by wild DMA accesses. By default, the implementation is compiled into amd64 GENERIC kernel but disabled; to enable, set hw.dmar.enable=1 loader tunable. Code is written to work on i386, but testing there was low priority, and driver is not enabled in GENERIC. Even with the DMAR turned on, individual devices could be directed to use the bounce busdma with the hw.busdma.pci<domain>:<bus>:<device>:<function>.bounce=1 tunable. If DMARs are capable of the pass-through translations, it is used, otherwise, an identity-mapping page table is constructed. The driver was tested on Xeon 5400/5500 chipset legacy machine, Haswell desktop and E5 SandyBridge dual-socket boxes, with ahci(4), ata(4), bce(4), ehci(4), mfi(4), uhci(4), xhci(4) devices. It also works with em(4) and igb(4), but there some fixes are needed for drivers, which are not committed yet. Intel GPUs do not work with DMAR (yet). Many thanks to John Baldwin, who explained me the newbus integration; Peter Holm, who did all testing and helped me to discover and understand several incredible bugs; and to Jim Harris for the access to the EDS and BWG and for listening when I have to explain my findings to somebody. Sponsored by: The FreeBSD Foundation MFC after: 1 month	2013-10-28 13:33:29 +00:00
Glen Barber	6b48eebec6	Document XENHVM and xenpci are mutually inclusive. Submitted by: gibbs Approved by: re (delphij) Sponsored by: The FreeBSD Foundation	2013-10-11 19:40:28 +00:00
Dimitry Andric	e38c5308a2	Remove redundant declarations of szsigcode and sigcode in sys/i386/ibcs2/ibcs2_sysvec.c, to silence two gcc warnings. Approved by: re (gjb) MFC after: 3 days	2013-10-07 16:57:48 +00:00
Dimitry Andric	d359802e10	Remove redundant declaration of force_evtchn_callback() in the i386-specific xen-os.h, to silence a gcc warning. Approved by: re (gjb) MFC after: 3 days	2013-10-07 16:53:26 +00:00
Justin T. Gibbs	5fdd34ee20	Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id field. Perform vcpu enumeration for Xen PV and HVM environments and convert all Xen drivers to use vcpu_id instead of a hard coded assumption of the mapping algorithm (acpi or apic ID) in use. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) amd64/include/pcpu.h: i386/include/pcpu.h: Add vcpu_id to the amd64 and i386 pcpu structures. dev/xen/timer/timer.c x86/xen/xen_intr.c Use new vcpu_id instead of assuming acpi_id == vcpu_id. i386/xen/mp_machdep.c: i386/xen/mptable.c x86/xen/hvm.c: Perform Xen HVM and Xen full PV vcpu_id mapping. x86/xen/hvm.c: x86/acpica/madt.c Change SYSINIT ordering of acpi CPU enumeration so that it is guaranteed to be available at the time of Xen HVM vcpu id mapping.	2013-10-05 23:11:01 +00:00
John-Mark Gurney	29904f46d6	add aesni module to i386 and amd64 NOTES... Approved by: re (gjb)	2013-10-04 17:21:01 +00:00
Konstantin Belousov	ad43b98491	Free both KVA and backing pages when freeing TSS memory. Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (marius)	2013-09-23 20:14:15 +00:00
Gleb Smirnoff	255c1caae3	- Create kern.ipc.sendfile namespace, and put the new "readhead" OID there as "kern.ipc.sendfile.readahead". - Push all nsfbuf related tunables into MD code. Don't move them to new namespace in favor of POLA. Reviewed by: scottl Approved by: re (gjb)	2013-09-22 13:36:52 +00:00
Justin T. Gibbs	9298484319	Fix compilation of the i386 PAE kernel config. sys/i386/include/xen/xenvar.h: Provide vtomach() when PAE is defined. Approved by: re (blanket Xen)	2013-09-22 00:54:22 +00:00
Justin T. Gibbs	566a5f5020	Merge Xen PVHVM support into the GENERIC kernel config for both amd64 and i386. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: - Introduce two new CPU hooks for initialization and resume purposes. This allows us to get rid of the XENHVM ifdefs in mp_machdep, and also sets some hooks into common code that can be used by other hypervisor implementations. sys/amd64/conf/XENHVM: sys/i386/conf/XENHVM: - Remove these configs now that GENERIC has builtin support for Xen HVM. sys/kern/subr_smp.c: - Make sure there are no pending IPIs when suspending a system. sys/x86/xen/hvm.c: - Add cpu init and resume vectors that are called from mp_machdep using the new hooks. - Only clear the vcpu_info mapping data on resume. It is already clear for the BSP on a cold boot and is set correctly as APs are started. - Gate xen_hvm_init_cpu only to systems running under Xen. sys/x86/xen/xen_intr.c: - Gate the setup of event channels only to systems running under Xen.	2013-09-20 22:59:22 +00:00
David Christensen	4e4007688c	Substantial rewrite of bxe(4) to add support for the BCM57712 and BCM578XX controllers. Approved by: re MFC after: 4 weeks	2013-09-20 20:18:49 +00:00
Justin T. Gibbs	428b7ca290	Add support for suspend/resume/migration operations when running as a Xen PVHVM guest. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: - Make sure that are no MMU related IPIs pending on migration. - Reset pending IPI_BITMAP on resume. - Init vcpu_info on resume. sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: sys/x86/acpica/acpi_wakeup.c: sys/x86/x86/intr_machdep.c: sys/x86/isa/atpic.c: sys/x86/x86/io_apic.c: sys/x86/x86/local_apic.c: - Add a "suspend_cancelled" parameter to pic_resume(). For the Xen PIC, restoration of interrupt services differs between the aborted suspend and normal resume cases, so we must provide this information. sys/dev/acpica/acpi_timer.c: sys/dev/xen/timer/timer.c: sys/timetc.h: - Don't swap out "suspend safe" timers across a suspend/resume cycle. This includes the Xen PV and ACPI timers. sys/dev/xen/control/control.c: - Perform proper suspend/resume process for PVHVM: - Suspend all APs before going into suspension, this allows us to reset the vcpu_info on resume for each AP. - Reset shared info page and callback on resume. sys/dev/xen/timer/timer.c: - Implement suspend/resume support for the PV timer. Since FreeBSD doesn't perform a per-cpu resume of the timer, we need to call smp_rendezvous in order to correctly resume the timer on each CPU. sys/dev/xen/xenpci/xenpci.c: - Don't reset the PCI interrupt on each suspend/resume. sys/kern/subr_smp.c: - When suspending a PVHVM domain make sure there are no MMU IPIs in-flight, or we will get a lockup on resume due to the fact that pending event channels are not carried over on migration. - Implement a generic version of restart_cpus that can be used by suspended and stopped cpus. sys/x86/xen/hvm.c: - Implement resume support for the hypercall page and shared info. - Clear vcpu_info so it can be reset by APs when resuming from suspension. sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/x86/xen/xen_intr.c: - Support UP kernel configurations. sys/x86/xen/xen_intr.c: - Properly rebind per-cpus VIRQs and IPIs on resume.	2013-09-20 05:06:03 +00:00
Justin T. Gibbs	e96ca45522	sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: Set PCPU apic_id and acpi_id fields in a fasion compatible with both UP and SMP configurations. Suggested by: jhb Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks	2013-09-20 04:35:09 +00:00
Alan Cox	deb179bb4c	The pmap function pmap_clear_reference() is no longer used. Remove it. pmap_clear_reference() has had exactly one caller in the kernel for several years, more precisely, since FreeBSD 8. Now, that call no longer exists. Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division	2013-09-20 04:30:18 +00:00
Justin T. Gibbs	8a21c7fbe8	sys/i386/xen_mp_machdep.c: Set a 'fake' acpi_id for the i386 PV port, it is needed in order to use VIRQs or IPI event channels. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks	2013-09-19 14:41:10 +00:00
Pawel Jakub Dawidek	3fded357af	Fix panic in ktrcapfail() when no capability rights are passed. While here, correct all consumers to pass NULL instead of 0 as we pass capability rights as pointers now, not uint64_t. Reported by: Daniel Peyrolon Tested by: Daniel Peyrolon Approved by: re (marius)	2013-09-18 19:26:08 +00:00
Roman Divacky	69d912af45	Regen. Approved by: re (delphij)	2013-09-18 18:49:26 +00:00
Roman Divacky	b12698e1a1	Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)	2013-09-18 18:48:33 +00:00
Roman Divacky	70ccaaf58e	Regen. Approved by: re (delphij)	2013-09-18 17:58:03 +00:00
Roman Divacky	253c75c0de	Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>	2013-09-18 17:56:04 +00:00
Bryan Venteicher	03c6abfd1c	Add vmx(4) to i386 and amd64 GENERIC Approved by: re (gjb)	2013-09-17 01:54:13 +00:00
Konstantin Belousov	06646d663a	Merge the change r255607 from amd64 to i386. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)	2013-09-16 19:58:37 +00:00
Alan Cox	87ee6303e5	Prior to r254304, we only began scanning the active page queue when the amount of free memory was close to the point at which we would begin reclaiming pages. Now, we continuously scan the active page queue, regardless of the amount of free memory. Consequently, we are continuously calling pmap_ts_referenced() on active pages. Prior to this change, pmap_ts_referenced() would always demote superpage mappings in order to obtain finer-grained reference information. This made sense because we were coming under memory pressure and would soon have to begin reclaiming pages. Now, however, with continuous scanning of the active page queue, these demotions are taking a toll on performance. To address this problem, I have replaced the demotion with a heuristic for periodically clearing the reference flag on superpage mappings. Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division	2013-09-11 17:23:42 +00:00
John Baldwin	edb572a38c	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00
Justin T. Gibbs	e44af46e4c	Implement PV IPIs for PVHVM guests and further converge PV and HVM IPI implmementations. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Submitted by: gibbs (misc cleanup, table driven config) Reviewed by: gibbs MFC after: 2 weeks sys/amd64/include/cpufunc.h: sys/amd64/amd64/pmap.c: Move invltlb_globpcid() into cpufunc.h so that it can be used by the Xen HVM version of tlb shootdown IPI handlers. sys/x86/xen/xen_intr.c: sys/xen/xen_intr.h: Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(), and remove the ipi vector parameter. This api allocates an event channel port that can be used for ipi services, but knows nothing of the actual ipi for which that port will be used. Removing the unused argument and cleaning up the comments surrounding its declaration helps clarify its actual role. sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: Implement a generic framework for amd64 and i386 that allows the implementation of certain CPU management functions to be selected at runtime. Currently this is only used for the ipi send function, which we optimize for Xen when running on a Xen hypervisor, but can easily be expanded to support more operations. sys/x86/xen/hvm.c: Implement Xen PV IPI handlers and operations, replacing native send IPI. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: sys/i386/include/smp.h: Remove NR_VIRQS and NR_IPIS from FreeBSD headers. NR_VIRQS is defined already for us in the xen interface files. NR_IPIS is only needed in one file per Xen platform and is easily inferred by the IPI vector table that is defined in those files. sys/i386/xen/mp_machdep.c: Restructure to more closely match the HVM implementation by performing table driven IPI setup.	2013-09-06 22:17:02 +00:00
Bryan Venteicher	ddb4ffd0c6	Add vmx device to the i386 and amd64 NOTES files	2013-09-06 20:24:21 +00:00
Gleb Smirnoff	2ee9b44cae	Fix build with gcc. Move sf_buf_alloc()/sf_buf_free() declarations to MD headers.	2013-09-06 17:44:13 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
Justin T. Gibbs	081f835212	Better conformance to style(9) and organizational cleanup. No functional changes. sys/i386/xen/mp_machdep.c: Remove extra newlines. Group externs, forward delarations, local types, and pcpu data. Wrap at 80 columns. Use parens in return statements. Tab indent members of array initializers. MFC after: 2 weeks	2013-09-02 22:22:56 +00:00
Justin T. Gibbs	9f40021f28	Introduce a new, HVM compatible, paravirtualized timer driver for Xen. Use this new driver for both PV and HVM instances. This driver requires a Xen hypervisor that supports vector callbacks, VCPUOP hypercalls, and reports that it has a "safe PV clock". New timer driver: Submitted by: will Sponsored by: Spectra Logic Corporation PV port to new driver, and bug fixes: Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D sys/dev/xen/timer/timer.c: - Register a PV timer device driver which (currently) implements device_{identify,probe,attach} and stubs device_detach. The detach routine requires functionality not provided by timecounters(4). The suspend and resume routines need additional work (due to Xen requiring that the hypercalls be executed on the target VCPU), and aren't needed for our purposes. - Make sure there can only be one device instance of this driver, and that it only registers one eventtimers(4) and one timecounters(4) device interface. Make both interfaces use PCPU data as needed. - Match, with a few style cleanups & API differences, the Xen versions of the "fetch time" functions. - Document the magic scale_delta() better for the i386 version. - When registering the event timer, bind a separate event channel for the timer VIRQ to the device's event timer interrupt handler for each active VCPU. Describe each interrupt as "xen_et:c%d", so they can be identified per CPU in "vmstat -i" or "show intrcnt" in KDB. - When scheduling a timer into the hypervisor, try up to 60 times if the hypervisor rejects the time as being in the past. In the common case, this retry shouldn't happen, and if it does, it should only happen once. This is because the event timer advertises a minimum period of 100usec, which is only less than the usual hypercall round trip time about 1 out of every 100 tries. (Unlike other similar drivers, this one actually checks whether the hypervisor accepted the singleshot timer set hypercall.) - Implement a RTC PV clock based on the hypervisor wallclock. sys/conf/files: - Add dev/xen/timer/timer.c if the kernel configuration includes either the XEN or XENHVM options. sys/conf/files.i386: sys/i386/include/xen/xen_clock_util.h: sys/i386/xen/clock.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/xen_rtc.c: - Remove previous PV timer used in i386 XEN PV kernels, the new timer introduced in this change is used instead (so we share the same code between PVHVM and PV). MFC after: 2 weeks	2013-08-29 23:11:58 +00:00
Justin T. Gibbs	76acc41fb7	Implement vector callback for PVHVM and unify event channel implementations Re-structure Xen HVM support so that: - Xen is detected and hypercalls can be performed very early in system startup. - Xen interrupt services are implemented using FreeBSD's native interrupt delivery infrastructure. - the Xen interrupt service implementation is shared between PV and HVM guests. - Xen interrupt handlers can optionally use a filter handler in order to avoid the overhead of dispatch to an interrupt thread. - interrupt load can be distributed among all available CPUs. - the overhead of accessing the emulated local and I/O apics on HVM is removed for event channel port events. - a similar optimization can eventually, and fairly easily, be used to optimize MSI. Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure, and misc Xen cleanups: Sponsored by: Spectra Logic Corporation Unification of PV & HVM interrupt infrastructure, bug fixes, and misc Xen cleanups: Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D sys/x86/x86/local_apic.c: sys/amd64/include/apicvar.h: sys/i386/include/apicvar.h: sys/amd64/amd64/apic_vector.S: sys/i386/i386/apic_vector.s: sys/amd64/amd64/machdep.c: sys/i386/i386/machdep.c: sys/i386/xen/exception.s: sys/x86/include/segments.h: Reserve IDT vector 0x93 for the Xen event channel upcall interrupt handler. On Hypervisors that support the direct vector callback feature, we can request that this vector be called directly by an injected HVM interrupt event, instead of a simulated PCI interrupt on the Xen platform PCI device. This avoids all of the overhead of dealing with the emulated I/O APIC and local APIC. It also means that the Hypervisor can inject these events on any CPU, allowing upcalls for different ports to be handled in parallel. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: Map Xen per-vcpu area during AP startup. sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: Increase the FreeBSD IRQ vector table to include space for event channel interrupt sources. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: Remove Xen HVM per-cpu variable data. These fields are now allocated via the dynamic per-cpu scheme. See xen_intr.c for details. sys/amd64/include/xen/hypercall.h: sys/dev/xen/blkback/blkback.c: sys/i386/include/xen/xenvar.h: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/xen/gnttab.c: Prefer FreeBSD primatives to Linux ones in Xen support code. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: sys/dev/xen/balloon/balloon.c: sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/console/xencons_ring.c: sys/dev/xen/control/control.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/xenpci/xenpci.c: sys/i386/i386/machdep.c: sys/i386/include/pmap.h: sys/i386/include/xen/xenfunc.h: sys/i386/isa/npx.c: sys/i386/xen/clock.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/xen_rtc.c: sys/xen/evtchn/evtchn_dev.c: sys/xen/features.c: sys/xen/gnttab.c: sys/xen/gnttab.h: sys/xen/hvm.h: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_if.m: sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusvar.h: sys/xen/xenstore/xenstore.c: sys/xen/xenstore/xenstore_dev.c: sys/xen/xenstore/xenstorevar.h: Pull common Xen OS support functions/settings into xen/xen-os.h. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: Remove constants, macros, and functions unused in FreeBSD's Xen support. sys/xen/xen-os.h: sys/i386/xen/xen_machdep.c: sys/x86/xen/hvm.c: Introduce new functions xen_domain(), xen_pv_domain(), and xen_hvm_domain(). These are used in favor of #ifdefs so that FreeBSD can dynamically detect and adapt to the presence of a hypervisor. The goal is to have an HVM optimized GENERIC, but more is necessary before this is possible. sys/amd64/amd64/machdep.c: sys/dev/xen/xenpci/xenpcivar.h: sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/sys/kernel.h: Refactor magic ioport, Hypercall table and Hypervisor shared information page setup, and move it to a dedicated HVM support module. HVM mode initialization is now triggered during the SI_SUB_HYPERVISOR phase of system startup. This currently occurs just after the kernel VM is fully setup which is just enough infrastructure to allow the hypercall table and shared info page to be properly mapped. sys/xen/hvm.h: sys/x86/xen/hvm.c: Add definitions and a method for configuring Hypervisor event delievery via a direct vector callback. sys/amd64/include/xen/xen-os.h: sys/x86/xen/hvm.c: sys/conf/files: sys/conf/files.amd64: sys/conf/files.i386: Adjust kernel build to reflect the refactoring of early Xen startup code and Xen interrupt services. sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: sys/dev/xen/control/control.c: sys/dev/xen/evtchn/evtchn_dev.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/xen/xenstore/xenstore.c: sys/xen/evtchn/evtchn_dev.c: sys/dev/xen/console/console.c: sys/dev/xen/console/xencons_ring.c Adjust drivers to use new xen_intr_*() API. sys/dev/xen/blkback/blkback.c: Since blkback defers all event handling to a taskqueue, convert this task queue to a "fast" taskqueue, and schedule it via an interrupt filter. This avoids an unnecessary ithread context switch. sys/xen/xenstore/xenstore.c: The xenstore driver is MPSAFE. Indicate as much when registering its interrupt handler. sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbusvar.h: Remove unused event channel APIs. sys/xen/evtchn.h: Remove all kernel Xen interrupt service API definitions from this file. It is now only used for structure and ioctl definitions related to the event channel userland device driver. Update the definitions in this file to match those from NetBSD. Implementing this interface will be necessary for Dom0 support. sys/xen/evtchn/evtchnvar.h: Add a header file for implemenation internal APIs related to managing event channels event delivery. This is used to allow, for example, the event channel userland device driver to access low-level routines that typical kernel consumers of event channel services should never access. sys/xen/interface/event_channel.h: sys/xen/xen_intr.h: Standardize on the evtchn_port_t type for referring to an event channel port id. In order to prevent low-level event channel APIs from leaking to kernel consumers who should not have access to this data, the type is defined twice: Once in the Xen provided event_channel.h, and again in xen/xen_intr.h. The double declaration is protected by __XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared twice within a given compilation unit. sys/xen/xen_intr.h: sys/xen/evtchn/evtchn.c: sys/x86/xen/xen_intr.c: sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/xenpci/xenpcivar.h: New implementation of Xen interrupt services. This is similar in many respects to the i386 PV implementation with the exception that events for bound to event channel ports (i.e. not IPI, virtual IRQ, or physical IRQ) are further optimized to avoid mask/unmask operations that aren't necessary for these edge triggered events. Stubs exist for supporting physical IRQ binding, but will need additional work before this implementation can be fully shared between PV and HVM. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: sys/i386/xen/mp_machdep.c sys/x86/xen/hvm.c: Add support for placing vcpu_info into an arbritary memory page instead of using HYPERVISOR_shared_info->vcpu_info. This allows the creation of domains with more than 32 vcpus. sys/i386/i386/machdep.c: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/exception.s: Add support for new event channle implementation.	2013-08-29 19:52:18 +00:00
Alan Cox	51321f7c31	Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2). Since our malloc(3) makes heavy use of madvise(2), this change can have a measureable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%. Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise(). Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division	2013-08-29 15:49:05 +00:00
Justin T. Gibbs	8ac6a7aa17	Rename definition of HYPERVISOR_VIRT_START to avoid conflict with upstream Xen definition found in xen/interface/arch-x86/xen-x86_32.h. Submitted by: Roger Pau Monné Reviewed by: gibbs MFC after: 2 weeks	2013-08-22 20:07:06 +00:00
Konstantin Belousov	e68c64f0ba	Revert r254501. Instead, reuse the type stability of the struct pmap which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE zone. Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release(). Suggested and reviewed by: alc (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-22 18:12:24 +00:00
David E. O'Brien	46be218dce	The PADLOCK_RNG and RDRAND_RNG kernel options are now devices. Thus "device padlock_rng" and "device rdrand_rng" should be used instead of "options PADLOCK_RNG" & "options RDRAND_RNG". Requested by: so@ (des) Submitted by: obrien, arthurmesh@gmail.com Obtained from: Juniper Networks	2013-08-21 22:43:29 +00:00
Jung-uk Kim	1533b9f714	Reimplement atomic operations on PDEs and PTEs in pmap.h. This change significantly reduces duplicate code and make it easier to read. Reviewed by: alc, bde	2013-08-21 22:40:29 +00:00
Jung-uk Kim	5188b5f3c2	Implement atomic_cmpset_64() and atomic_swap_64() for i386.	2013-08-21 22:30:11 +00:00
Jung-uk Kim	3264fd707a	Reimplement atomic_load_acq_64() and atomic_store_rel_64() for i386. These functions are now real functions rather than function pointers. Supposedly, it is faster for modern processors. Suggested by: bde	2013-08-21 22:27:42 +00:00
Jung-uk Kim	d36eb3f1c4	Remove empty lines before return statements for style consistency.	2013-08-21 22:05:58 +00:00
Jung-uk Kim	8a1ee2d346	Implement atomic_swap() and atomic_testandset(). Reviewed by: arch, bde, jilles, kib	2013-08-21 22:03:06 +00:00
Jung-uk Kim	da255e4c7f	- Remove the "a" constraint from main output operand for atomic_cmpset(). - Use "+" modifier for the "expect" because it is also an output (unused).	2013-08-21 21:30:06 +00:00
Jung-uk Kim	fe94be3da7	Use '+' modifier for a memory operand that is both an input and an output. It was actually done in r86301 but reverted in r150182 because GCC 3.x was not able to handle it for a memory operand. Apparently, this problem was fixed in GCC 4.1+ and several contrib sources already rely on this feature.	2013-08-21 21:14:16 +00:00
Jung-uk Kim	c1c84ce1bf	Remove bogus labels. No functional change.	2013-08-21 20:49:46 +00:00
Jung-uk Kim	ee93d1173a	Use consistent style. No functional change.	2013-08-21 20:43:50 +00:00
Pawel Jakub Dawidek	417ffc66fa	Add process descriptors support to the GENERIC kernel. It is already being used by the tools in base systems and with sandboxing more and more tools the usage should only increase. Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org> Sponsored by: Google Summer of Code 2013 MFC after: 1 month	2013-08-18 10:21:29 +00:00
Jilles Tjoelker	0f3a4d8051	libc: Access _logname_valid more efficiently. The variable _logname_valid is not exported via the version script; therefore, change C and i386/amd64 assembler code to remove indirection (which allowed interposition). This makes the code slightly smaller and faster. Also, remove #define PIC_GOT from i386/amd64 in !PIC mode. Without PIC, there is no place containing the address of each variable, so there is no possible definition for PIC_GOT.	2013-08-17 19:24:58 +00:00
Jung-uk Kim	5772203b17	Simplify check for CMPXCHG8B instruction. Note CMPXCHG8B instruction is always available for Rise mP6 processors although it is not set by CPUID.	2013-08-15 21:09:05 +00:00
Jung-uk Kim	38da30b419	Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these two files were functionally identical.	2013-08-13 22:05:10 +00:00
Jung-uk Kim	3bd12ca8f1	Tidy up global locks for ACPICA. There is no functional change.	2013-08-13 21:34:03 +00:00
Konstantin Belousov	c325e866f4	Different consumers of the struct vm_page abuse pageq member to keep additional information, when the page is guaranteed to not belong to a paging queue. Usually, this results in a lot of type casts which make reasoning about the code correctness harder. Sometimes m->object is used instead of pageq, which could cause real and confusing bugs if non-NULL m->object is leaked. See r141955 and r253140 for examples. Change the pageq member into a union containing explicitly-typed members. Use them instead of type-punning or abusing m->object in x86 pmaps, uma and vm_page_alloc_contig(). Requested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-10 17:36:42 +00:00
Attilio Rao	e946b94934	On all the architectures, avoid to preallocate the physical memory for nodes used in vm_radix. On architectures supporting direct mapping, also avoid to pre-allocate the KVA for such nodes. In order to do so make the operations derived from vm_radix_insert() to fail and handle all the deriving failure of those. vm_radix-wise introduce a new function called vm_radix_replace(), which can replace a leaf node, already present, with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if operations in vm_radix_insert() recursed vm_radix_insert() will start from scratch again. Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl	2013-08-09 11:28:55 +00:00
Attilio Rao	c7aebda8a1	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
Andriy Gapon	9ba0691bdd	follow up to r254051 - update powerpc/GENERIC64 as well, suggested by mdf - update comments so that they make sense after the change, suggested by jhb X-MFC after: never (change specific to head)	2013-08-09 08:11:09 +00:00
Andriy Gapon	818d282e7b	enable KDB_TRACE in GENERICs KDB_TRACE is not an alternative to DDB/etc, they are complementary. So I do not see any reason to not enable KDB_TRACE by default. X-MFC after: never (change specific to head)	2013-08-07 08:03:50 +00:00
Jeff Roberson	5df87b21d3	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
Jeff Roberson	2c0b86b48f	- Introduce a specific function, pmap_remove_kernel_pde, for removing huge pages in the kernel's address space. This works around several asserts from pmap_demote_pde_locked that did not apply and gave false warnings. Discovered by: pho Reviewed by: alc Sponsored by: EMC / Isilon Storage Division	2013-08-05 00:28:03 +00:00
David E. O'Brien	0e6a0799a9	Back out r253779 & r253786.	2013-07-31 17:21:18 +00:00
David E. O'Brien	99ff83da74	Decouple yarrow from random(4) device. * Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option. The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow. * random(4) device doesn't really depend on rijndael-. Yarrow, however, does. Add random_adaptors.[ch] which is basically a store of random_adaptor's. random_adaptor is basically an adapter that plugs in to random(4). random_adaptor can only be plugged in to random(4) very early in bootup. Unplugging random_adaptor from random(4) is not supported, and is probably a bad idea anyway, due to potential loss of entropy pools. We currently have 3 random_adaptors: + yarrow + rdrand (ivy.c) + nehemeiah * Remove platform dependent logic from probe.c, and move it into corresponding registration routines of each random_adaptor provider. probe.c doesn't do anything other than picking a specific random_adaptor from a list of registered ones. * If the kernel doesn't have any random_adaptor adapters present then the creation of /dev/random is postponed until next random_adaptor is kldload'ed. * Fix randomdev_soft.c to refer to its own random_adaptor, instead of a system wide one. Submitted by: arthurmesh@gmail.com, obrien Obtained from: Juniper Networks Reviewed by: obrien	2013-07-29 20:26:27 +00:00
Andriy Gapon	a29cc9a34b	Revert r253748,253749 This WIP should not have been committed yet. Pointyhat to: avg	2013-07-28 18:44:17 +00:00
Andriy Gapon	366d8bfb7b	put contents of cpu.h under _KERNEL no userland-serviceable parts inside MFC after: 20 days	2013-07-28 18:32:27 +00:00
Andriy Gapon	a69e8d609e	x86: detect mwait capabilities and extensions, when present Reviewed by: kib (earlier amd64-only version) MFC after: 2 weeks	2013-07-28 17:54:42 +00:00
Jeff Roberson	2f84c08eee	- Use kmem_malloc rather than kmem_alloc() for GDT/LDT/tss allocations etc. This eliminates some unusual uses of that API in favor of more typical uses of kmem_malloc(). Discussed with: kib/alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-07-26 19:06:14 +00:00
Andrey V. Elsukov	dbd4437b06	Include sys/systm.h after sys/param.h. Suggested by: pluknet	2013-07-15 15:40:57 +00:00
Gleb Smirnoff	59b9c4f289	Nuke mbstat. It wasn't used for mbuf statistics since FreeBSD 5. Now that r253351 moved sendfile() stats to a separate struct, the last field used in mbstat is m_mcfail, which is updated, but never read or obtained from userland.	2013-07-15 12:18:36 +00:00
Andrey V. Elsukov	05d1f5bce0	Introduce new structure sfstat for collecting sendfile's statistics and remove corresponding fields from struct mbstat. Use PCPU counters and SFSTAT_INC() macro for update these statistics. Discussed with: glebius	2013-07-15 06:16:57 +00:00
Konstantin Belousov	3c901a9040	Create a proper stack frame for i386 version of bcopy(), despite the function is leaf. The frame allows ddb to not loose the direct caller of bcopy() in backtrace. Other functions from support.s would benefit from the same change as well, but for now bcopy() is the most frequent offender. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-13 19:42:52 +00:00
Konstantin Belousov	30dac21d0a	Explicitely panic instead of possibly doing undefined things when ptelist KVA is exhausted. Currently this cannot happen, the added panic serves as assert. Discussed with: alc Sponsored by: The FreeBSD Foundation	2013-07-11 05:15:30 +00:00
Konstantin Belousov	3fb25770a9	MFamd64 r253140: Clear m->object for the page taken from the delayed free list in pmap_pv_reclaim(). Noted by: alc	2013-07-11 05:10:36 +00:00
Xin LI	1fdeb1651c	Import HighPoint DC Series Data Center HBA (DC7280 and R750) driver. This driver works for FreeBSD/i386 and FreeBSD/amd64 platforms. Many thanks to HighPoint for providing this driver. MFC after: 1 day	2013-07-06 07:49:41 +00:00
Konstantin Belousov	70a7dd5d5b	Fix issues with zeroing and fetching the counters, on x86 and ppc64. Issues were noted by Bruce Evans and are present on all architectures. On i386, a counter fetch should use atomic read of 64bit value, otherwise carry from the increment on other CPU could be lost for the given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not available on the machine, it cannot be SMP and it is enough to disable preemption around read to avoid the split read. On x86 the counter increment is not atomic on purpose, which makes it possible for the store of the incremented result to override just zeroed per-cpu slot. The effect would be a counter going off by arbitrary value after zeroing. Perform the counter zeroing on the same processor which does the increments, making the operations mutually exclusive. On i386, same as for the fetching, if the cmpxchg8b is not available, machine is not SMP and we disable preemption for zeroing. PowerPC64 is treated the same as amd64. For other architectures, the changes made to allow the compilation to succeed, without fixing the issues with zeroing or fetching. It should be possible to handle them by using the 64bit loads and stores atomic WRT preemption (assuming the architectures also converted from using critical sections to proper asm). If architecture does not provide the facility, using global (spin) mutex would be non-optimal but working solution. Noted by: bde Sponsored by: The FreeBSD Foundation	2013-07-01 02:48:27 +00:00
Jung-uk Kim	b1ddd13145	Move definitions required by userland applications out of acpica_machdep.h.	2013-06-27 00:22:40 +00:00
Konstantin Belousov	c788f92509	Some clarifications and updates for the comments, mostly retrieved from Bruce Evans. Trim the trailing spaces. MFC after: 1 week	2013-06-19 05:05:16 +00:00
Justin T. Gibbs	7efb630573	Adjust i386 Xen PV support for updated Xen interface files. sys/i386/include/xen/xenvar.h: sys/i386/xen/xen_machdep.c: sys/xen/interface/foreign/structs.py: sys/xen/evtchn/evtchn.c: MAX_VIRT_CPUS => XEN_LEGACY_MAX_VCPUS Submitted by: Roger Pau Monné Reviewed by: gibbs	2013-06-17 01:43:07 +00:00
Justin T. Gibbs	a8f6ac0573	Upgrade Xen interface headers to Xen 4.2.1. Move FreeBSD from interface version 0x00030204 to 0x00030208. Updates are required to our grant table implementation before we can bump this further. sys/xen/hvm.h: Replace the implementation of hvm_get_parameter(), formerly located in sys/xen/interface/hvm/params.h. Linux has a similar file which primarily stores this function. sys/xen/xenstore/xenstore.c: Include new xen/hvm.h header file to get hvm_get_parameter(). sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: Correctly protect function definition and variables from being included into assembly files in xen-os.h Xen memory barriers are now prefixed with "xen_" to avoid conflicts with OS native primatives. Define Xen memory barriers in terms of the native FreeBSD primatives. Sponsored by: Spectra Logic Corporation Reviewed by: Roger Pau Monné Tested by: Roger Pau Monné Obtained from: Roger Pau Monné (bug fixes)	2013-06-14 23:43:44 +00:00
Jeff Roberson	17a2737732	- Add a BIT_FFS() macro and use it to replace cpusetffs_obj() Discussed with: attilio Sponsored by: EMC / Isilon Storage Division	2013-06-13 20:46:03 +00:00
Konstantin Belousov	9138579845	Assert that interrupts are enabled in the trap handlers on x86 before calling generic code to deliver signals. Discussed with: bde Tested by: pho MFC after: 1 week	2013-06-03 17:40:05 +00:00
Konstantin Belousov	07d46f9c18	MFamd64: when printing the trap information, show the %esp value. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-06-03 04:19:21 +00:00
Konstantin Belousov	cb5bfd1240	Use slightly more idiomatic expression to get the address of array. Tested by: dim, pgj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-05-27 18:39:39 +00:00
Konstantin Belousov	6806ce6ec8	When handling an exception from the attempt from loading the faulting context on return from the trap handler, re-enable the interrupts on i386 and amd64. The trap return path have to disable interrupts since the sequence of loading the machine state is not atomic. The trap() function which transfers the control to the special handler would enable the interrupt, but an iret loads the previous eflags with PSL_I clear. Then, the special handler calls trap() on its own, which now sees the original eflags with PSL_I set and does not enable interrupts. The end result is that signal delivery and process exiting code could be executed with interrupts disabled, which is generally wrong and triggers several assertions. For amd64, the interrupts are enabled conditionally based on PSL_I in the eflags of the outer frame, as it is already done for doreti_iret_fault. For i386, the interrupts are enabled unconditionally, the ast loop could have opened a window with interrupts enabled just before the iret anyway. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-05-27 18:26:08 +00:00
Achim Leubner	dce93cd06d	Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver. Approved by: scottl (mentor)	2013-05-24 09:22:43 +00:00
Attilio Rao	9af6d512f5	o Relax locking assertions for vm_page_find_least() o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock o Use all the mechanisms above to make vm_map_pmap_enter() to work mostl of the times only with readlocks. Sponsored by: EMC / Isilon storage division Reviewed by: alc	2013-05-21 20:38:19 +00:00
Marcel Moolenaar	cb34ed4434	Add basic support for FDT to i386 & amd64. This change includes: 1. Common headers for fdt.h and ofw_machdep.h under x86/include with indirections under i386/include and amd64/include. 2. New modinfo for loader provided FDT blob. 3. Common x86_init_fdt() called from hammer_time() on amd64 and init386() on i386. 4. Split-off FDT specific low-level console functions from FDT bus methods for the uart(4) driver. The low-level console logic has been moved to uart_cpu_fdt.c and is used for arm, mips & powerpc only. The FDT bus methods are shared across all architectures. 5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the fdt_pic_table[] arrays. Both are empty right now. FDT addresses are I/O ports on x86. Since the core FDT code does not handle different address spaces, adding support for both I/O ports and memory addresses requires some thought and discussion. It may be better to use a compile-time option that controls this. Obtained from: Juniper Networks, Inc.	2013-05-21 03:05:49 +00:00
Ed Schouten	31ffd8039d	Improve readability of static assertions for OFFSET_* macros. Instead of doing all sorts of weird casting of constants to pointer-pointers, simply use the standard C offsetof() macro to obtain the offset of the respective fields in the structures.	2013-05-13 21:47:17 +00:00
Peter Wemm	dda759d344	Tidy up some CVS workarounds.	2013-05-12 01:53:47 +00:00
Attilio Rao	941646f5ec	Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in order to match the MAXCPU concept. The change should also be useful for consolidation and consistency. Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc	2013-05-07 22:46:24 +00:00
Tijl Coosemans	c67f5b54d9	Remove redundant definitions of _ALIGN and _ALIGNBYTES.	2013-04-21 11:12:44 +00:00
Gabor Kovesdan	a8b5c2a0aa	- Correct spelling in comments Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:56:11 +00:00
Gabor Kovesdan	f0d0985ee9	- Correct mispellings of word miscellaneous Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:43:46 +00:00
Konstantin Belousov	fcb29b9210	Fix the name of the pcb member in the comments. Submitted by: Oliver Pinter <oliver.pntr@gmail.com> MFC after: 3 days	2013-04-13 15:20:33 +00:00
Edward Tomasz Napierala	8ed9860914	Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE' and kern.cam.ctl.disable tunable; those were introduced as a workaround to make it possible to boot GENERIC on low memory machines. With ctl(4) being built as a module and automatically loaded by ctladm(8), this makes CTL work out of the box. Reviewed by: ken Sponsored by: FreeBSD Foundation	2013-04-12 16:25:03 +00:00
Konstantin Belousov	706c56e4a9	Pass the segmented address of the counter, based on %fs, i.e. offset from the pcpu[0] to the counter base, instead of the linear address.	2013-04-09 17:55:39 +00:00
Gleb Smirnoff	4e76af6a41	Merge from projects/counters: counter(9). Introduce counter(9) API, that implements fast and raceless counters, provided (but not limited to) for gathering of statistical data. See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html for more details. In collaboration with: kib Reviewed by: luigi Tested by: ae, ray Sponsored by: Nginx, Inc.	2013-04-08 19:40:53 +00:00
Gleb Smirnoff	17dece86fe	Merge from projects/counters: Pad struct pcpu so that its size is denominator of PAGE_SIZE. This is done to reduce memory waste in UMA_PCPU_ZONE zones. Sponsored by: Nginx, Inc.	2013-04-08 19:19:10 +00:00
Alexander Motin	45f6d66569	Remove all legacy ATA code parts, not used since options ATA_CAM enabled in most kernels before FreeBSD 9.0. Remove such modules and respective kernel options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam. Remove the atacontrol utility and some man pages. Remove useless now options ATA_CAM. No objections: current@, stable@ MFC after: never	2013-04-04 07:12:24 +00:00
Konstantin Belousov	d4e9009cc8	Fix the VM_BCACHE_SIZE_MAX definition on i386 to match the maximal buffer map size, auto-tuned on the 4GB machine. Having the maxbcache bigger than the buffer map causes the transient bio map sizing logic to assume that there is enough KVA to use approximately 90MB (buffer map is sized to 110MB, and maxbcache is 200MB). The increase in the KVA usage caused other big KVA consumers, like nvidia.ko, to fail the initialization. Change the definition for both PAE and non-PAE cases, since PAE is even more KVA-starved. Reported and tested by: David Wolfskill Discussed with: alc Sponsored by: The FreeBSD Foundation	2013-03-27 10:52:18 +00:00
Konstantin Belousov	ee75e7de7b	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks	2013-03-19 14:13:12 +00:00
Attilio Rao	774d251d99	Sync back vmcontention branch into HEAD: Replace the per-object resident and cached pages splay tree with a path-compressed multi-digit radix trie. Along with this, switch also the x86-specific handling of idle page tables to using the radix trie. This change is supposed to do the following: - Allowing the acquisition of read locking for lookup operations of the resident/cached pages collections as the per-vm_page_t splay iterators are now removed. - Increase the scalability of the operations on the page collections. The radix trie does rely on the consumers locking to ensure atomicity of its operations. In order to avoid deadlocks the bisection nodes are pre-allocated in the UMA zone. This can be done safely because the algorithm needs at maximum one new node per insert which means the maximum number of the desired nodes is the number of available physical frames themselves. However, not all the times a new bisection node is really needed. The radix trie implements path-compression because UFS indirect blocks can lead to several objects with a very sparse trie, increasing the number of levels to usually scan. It also helps in the nodes pre-fetching by introducing the single node per-insert property. This code is not generalized (yet) because of the possible loss of performance by having much of the sizes in play configurable. However, efforts to make this code more general and then reusable in further different consumers might be really done. The only KPI change is the removal of the function vm_page_splay() which is now reaped. The only KBI change, instead, is the removal of the left/right iterators from struct vm_page, which are now reaped. Further technical notes broken into mealpieces can be retrieved from the svn branch: http://svn.freebsd.org/base/user/attilio/vmcontention/ Sponsored by: EMC / Isilon storage division In collaboration with: alc, jeff Tested by: flo, pho, jhb, davide Tested by: ian (arm) Tested by: andreast (powerpc)	2013-03-18 00:25:02 +00:00
Konstantin Belousov	e8a4a618cf	Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks	2013-03-14 20:18:12 +00:00
Alan Cox	9f585991ba	The kernel pmap is statically allocated, so there is really no need to explicitly initialize its pm_root field to zero. Sponsored by: EMC / Isilon Storage Division	2013-03-10 21:07:44 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Bryan Venteicher	0cfbcf8c7b	Remove the virtio dependency entry for the VirtIO device drivers. This will prevent the kernel from linking if the device driver are included without the virtio module. Remove pci and scbus for the same reason. Also explain the relationship and necessity of the virtio and virtio_pci modules. Currently in FreeBSD, we only support VirtIO PCI, but it could be replaced with a different interface (like MMIO) and the device (network, block, etc) will still function. Requested by: luigi Approved by: grehan (mentor) MFC after: 3 days	2013-03-06 07:17:53 +00:00
Kenneth D. Merry	3a45b4781a	Re-enable CTL in GENERIC on i386 and amd64, but turn on the CTL disable tunable by default. This will allow GENERIC configurations to boot on small memory boxes, but not require end users who want to use CTL to recompile their kernel. They can simply set kern.cam.ctl.disable=0 in loader.conf. The eventual solution to the memory usage problem is to change the way CTL allocates memory to be more configurable, but this should fix things for small memory situations in the mean time. UPDATING: Explain the change in the CTL configuration, and how users can enable CTL if they would like to use it. sys/conf/options: Add a new option, CTL_DISABLE, that prevents CTL from initializing. ctl.c: If CTL_DISABLE is turned on, don't initialize. i386/conf/GENERIC, amd64/conf/GENERIC: Re-enable device ctl, and add the CTL_DISABLE option.	2013-03-04 21:18:45 +00:00
Attilio Rao	03e78eac37	Fix-up r247622 by also renaming pv_list iterator into the xen pmap verbatim copy. Sponsored by: EMC / Isilon storage division Reported by: tinderbox	2013-03-03 01:02:57 +00:00
Attilio Rao	b38d37f7b5	Merge from vmc-playground branch: Rename the pv_entry_t iterator from pv_list to pv_next. Besides being more correct technically (as the name seems to suggest this is a list while it is an iterator), it will also be needed by vm_radix work to avoid a nameclash on macro expansions. Sponsored by: EMC / Isilon storage division Reviewed by: alc, jeff Tested by: flo, pho, jhb, davide	2013-03-02 14:19:08 +00:00
Adrian Chadd	fe138cc2af	Disable the ctl driver in GENERIC. It unfortunately steals a fair chunk of RAM at startup even if it's not actively used, which prevents FreeBSD VMs of 128MB from successfully booting and running.	2013-03-02 08:12:41 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
Alexander Motin	fdc5dd2d2f	MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.	2013-02-28 13:46:03 +00:00
Davide Italiano	acccf7d8b4	MFcalloutng: When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days. The commmit is not targeted for MFC.	2013-02-28 10:46:54 +00:00
Attilio Rao	dc1558d1cd	Merge from vmobj-rwlock: VM_OBJECT_LOCKED() macro is only used to implement a custom version of lock assertions right now (which likely spread out thanks to copy and paste). Remove it and implement actual assertions. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2013-02-27 18:12:13 +00:00
Warner Losh	ebf4319503	Locking for todr got pushed down into inittodr and the client libraries it calls (although some might not be doing it right). We are serialized right now by giant as well. This means the splsoftclock are now an anachronism that has no benefit, even marking where locking needs to happen. Remove them.	2013-02-21 07:16:40 +00:00
Konstantin Belousov	31a53cd036	Convert machine/elf.h, machine/frame.h, machine/sigframe.h, machine/signal.h and machine/ucontext.h into common x86 includes, copying from amd64 and merging with i386. Kernel-only compat definitions are kept in the i386/include/sigframe.h and i386/include/signal.h, to reduce amd64 kernel namespace pollution. The amd64 compat uses its own definitions so far. The _MACHINE_ELF_WANT_32BIT definition is to allow the sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions on the amd64 compile host. The same hack could be usefully abused by other code too.	2013-02-20 17:39:52 +00:00
Jung-uk Kim	00a54dfb1c	Consistently use round_page(x) rather than roundup(x, PAGE_SIZE). There is no functional change.	2013-02-15 22:43:08 +00:00
Andriy Gapon	1a89ca4cf5	cpususpend_handler: mark AP as resumed only after fully setting up lapic Reviewed by: jhb Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>, KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp> MFC after: 12 days	2013-02-02 12:04:32 +00:00
Andriy Gapon	548b201607	x86 suspend/resume: suspend pics and pseudo-pics in reverse order - change 'pics' from STAILQ to TAILQ - ensure that Local APIC is always first in 'pics' Reviewed by: jhb Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>, KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp> MFC after: 12 days	2013-02-02 12:02:42 +00:00
Eitan Adler	4752ed3d7f	Remove support for plip from the GENERIC kernel as no systems in the last 10 years require this support. Discussed with: db Discussed with: kib Reviewed by: imp Reviewed by: jhb Reviewed by: -hackers Approved by: cperciva (mentor)	2013-02-01 20:17:11 +00:00
Andre Oppermann	8291b48244	Remove unused VM_MAX_AUTOTUNE_NMBCLUSTERS define.	2013-02-01 14:16:37 +00:00
John Baldwin	d825ce0a5d	Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h by moving bits that are MI out into headers in compat/linux. Reviewed by: Chagin Dmitry dmitry \| gmail MFC after: 2 weeks	2013-01-29 18:41:30 +00:00
John Baldwin	fb709557a3	Don't assume that all Linux TCP-level socket options are identical to FreeBSD TCP-level socket options (only the first two are). Instead, using a mapping function and fail unsupported options as we do for other socket option levels. MFC after: 2 weeks	2013-01-23 21:44:48 +00:00
John Baldwin	b5821c6f0e	Fix build with SMP disabled.` Reported by: bf	2013-01-19 01:18:22 +00:00
John Baldwin	f876ffeae3	Don't attempt to use clflush on the local APIC register window. Various CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on Pentium-M, and trashing of the local APIC registers on a VIA C7). The local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't necessary anyway. MFC after: 2 weeks	2013-01-17 21:32:25 +00:00
Bryan Venteicher	ae366ffcbd	Add VirtIO to the i386 and amd64 GENERIC kernels This also removes the kludge from r239009 that covered only the network driver. Reviewed by: grehan Approved by: grehan (mentor) MFC after: 1 week	2013-01-13 07:14:16 +00:00
Konstantin Belousov	0dcbedfa61	Enable the UFS quotas for big-iron GENERIC kernels. Discussed with: mckusick MFC after: 2 weeks	2013-01-03 19:03:41 +00:00
Dag-Erling Smørgrav	36fca20f10	As discussed on -current last October, remove the firewire drivers from GENERIC.	2013-01-03 14:30:24 +00:00
Marius Strobl	9355088a94	Fix !INVARIANTS && !SMP build. MFC after: 3 days	2013-01-03 01:09:50 +00:00
Jim Harris	f2fcc434ee	Revert r243960 based on feedback regarding keeping x86 headers unified (mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@). Alternate implementation will be made in a separate commit.	2012-12-13 21:27:20 +00:00
Jim Harris	71a30c4436	Add amd64 implementations for 8-byte bus_space routines. Submitted by: Carl Delsey <carl.r.delsey@intel.com> Discussed with: jhb, rwatson Reviewed by: jimharris MFC after: 1 week	2012-12-06 22:33:31 +00:00
Konstantin Belousov	349438a243	Print the frame addresses for the backtraces on i386 and amd64. It allows both to inspect the frame sizes and to manually peek into the frames from ddb, if needed. Reviewed by: dim MFC after: 2 weeks	2012-12-03 22:16:51 +00:00
Jung-uk Kim	7609e73ca0	Remove duplicate code. Reduce diff between amd64 and i386.	2012-12-01 00:56:19 +00:00
Jung-uk Kim	8c2b353ead	Use volatile keywords properly.	2012-11-30 20:15:01 +00:00
Jung-uk Kim	231ac244f8	Tidy up inline assembly. No functional change.	2012-11-30 00:59:37 +00:00
Dimitry Andric	bdea742cf7	Fix a minor warning in sys/i386/xen/clock.c. MFC after: 3 days	2012-11-12 20:50:11 +00:00
Alfred Perlstein	79f62ed690	Allow maxusers to scale on machines with large address space. Some hooks are added to clamp down maxusers and nmbclusters for small address space systems. VM_MAX_AUTOTUNE_MAXUSERS - the max maxusers that will be autotuned based on physical memory. VM_MAX_AUTOTUNE_NMBCLUSTERS - max nmbclusters based on physical memory. These are set to the old values on i386 to preserve the clamping that was being done to all arches. Another macro VM_AUTOTUNE_NMBCLUSTERS is provided to allow an override for the calculation on a MD basis. Currently no arch defines this. Reviewed by: peter MFC after: 2 weeks	2012-11-10 02:08:40 +00:00
Attilio Rao	cfedf924d3	Rework the known rwlock to benefit about staying on their own cache line in order to avoid manual frobbing but using struct rwlock_padalign. Reviewed by: alc, jimharris	2012-11-03 23:03:14 +00:00
Konstantin Belousov	140dedb81c	The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks	2012-11-02 13:56:36 +00:00
Konstantin Belousov	9065aa6497	Add missed sched_pin(). Submitted by: Svatopluk Kraus <onwahe@gmail.com> Reviewed by: alc MFC after: 3 days	2012-10-24 18:21:22 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Eitan Adler	a8de37b024	This isn't functionally identical. In some cases a hint to disable unit 0 would in fact disable all units. This reverts r241856 Approved by: cperciva (implicit)	2012-10-22 13:06:09 +00:00
Eitan Adler	1611e2c0d1	The 'testing memory' patch gets printed too many times Approved by: cperciva (implicit)	2012-10-22 11:57:26 +00:00
Eitan Adler	76b7512247	Now that device disabling is generic, remove extraneous code from the device drivers that used to provide this feature. Reviewed by: des Approved by: cperciva MFC after: 1 week	2012-10-22 03:41:14 +00:00
Eitan Adler	267cc84937	Explain the upcoming delay by printing a message when the kernel is about to begin testing memory. Reviewed by: dteske, adri Approved by: cperciva MFC after: 1 week	2012-10-22 03:16:39 +00:00
Konstantin Belousov	737ced3ecf	MFamd64: add machdep.uprintf_signal. MFC after: 1 week	2012-10-14 17:09:50 +00:00
Andriy Gapon	851dbc07af	pciereg_cfg*: use assembly to access the mem-mapped cfg space AMD BKDG for CPU families 10h and later requires that the memory mapped config is always read into or written from al/ax/eax register. Discussed with: kib, alc Reviewed by: kib (earlier version) MFC after: 25 days	2012-10-14 10:13:50 +00:00
Alan Cox	c1a16a1fb2	Replace all uses of the vm page queues lock by a new R/W lock. Unfortunately, this lock cannot be defined as static under Xen because it is (ab)used to serialize queued page table changes. Tested by: sbruno	2012-10-12 23:26:00 +00:00
Alan Cox	0bec9f73db	MFi386 r241356 Add several asserts. MFC after: 3 days	2012-10-10 17:15:34 +00:00
Kevin Lo	9823d52705	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
Attilio Rao	3a4730256a	Add an unified macro to deny ability from the compiler to reorder instruction loads/stores at its will. The macro __compiler_membar() is currently supported for both gcc and clang, but kernel compilation will fail otherwise. Reviewed by: bde, kib Discussed with: dim, theraven MFC after: 2 weeks	2012-10-09 14:32:30 +00:00
Attilio Rao	af2bdacafb	Reverts r234074,234105,234564,234723,234989,235231-235232 and part of r234247. Use, instead, the static intializer introduced in r239923 for x86 and sparc64 intr_cpus, unwinding the code to the initial version. Reviewed by: marius	2012-10-09 12:22:43 +00:00
Kevin Lo	a10cee30c9	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
Konstantin Belousov	4e4458327a	Add several asserts to i386 pmap, which mostly state that pv entry shall have corresponding pte. Reviewed by: alc Tested by: pho MFC after: 3 days	2012-10-08 18:33:08 +00:00
Alan Cox	e1de0706a0	In a few places, like the implementation of ptrace(), a thread may call upon pmap_enter() to create a mapping within a different address space, i.e., not the thread's own address space. On i386, this entails the creation of a temporary mapping to the affected page table page (PTP). In general, pmap_enter() will read from this PTP, allocate a PV entry, and write to this PTP. The trouble comes when the system is short of memory. In order to allocate a new PV entry, an older PV entry has to be reclaimed. Reclaiming a PV entry involves destroying a mapping, which requires access to the affected PTP. Thus, the PTP mapped at the beginning of pmap_enter() is no longer mapped at the end of pmap_enter(), which leads to pmap_enter() modifying the wrong PTP. To address this problem, pmap_pv_reclaim() is changed to use an alternate method of mapping PTPs. Update a related comment. Reported by: pho Diagnosed by: kib MFC after: 5 days	2012-10-08 16:57:05 +00:00
Kenneth D. Merry	25aae1bed3	Add the mps(4) driver to the i386 GENERIC config file. LSI has tested it on i386 and verified that it works. Submitted by: Harald Schmalzbauer, John Baldwin, Kashyap Desai MFC after: 3 days	2012-10-01 21:42:32 +00:00
Kevin Lo	954c5baed9	Add missing header needed by free(9). Spotted by: David Wolfskill <david at catwhisker dot org>	2012-09-30 15:42:20 +00:00
Kevin Lo	b5db12bfb5	Free result of device_get_children(9).	2012-09-30 09:21:10 +00:00
John Baldwin	960b5a7080	- Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific bits under #ifdef _KERNEL but leave definitions for various structures defined by standards ($PIR table, SMAP entries, etc.) available to userland. - Consolidate duplicate SMBIOS table structure definitions in ipmi(4) and smbios(4) in <machine/pc/bios.h> and make them available to userland. MFC after: 2 weeks	2012-09-28 11:59:32 +00:00
Alan Cox	e4b8a2fc5a	Eliminate a stale comment. It describes another use case for the pmap in Mach that doesn't exist in FreeBSD.	2012-09-28 05:30:59 +00:00
Dimitry Andric	f99157cced	After r205013, amd64 and i386 CPU family and model IDs were printed out in hexadecimal, but without any 0x prefix, which can be very misleading. MFC after: 3 days	2012-09-21 10:31:19 +00:00
Jim Harris	eb85d44f06	Integrate nvme(4) and nvd(4) into the amd64 and i386 builds. Sponsored by: Intel	2012-09-17 19:26:33 +00:00
Eitan Adler	582212fa04	s/teh/the/g Approved by: cperciva MFC after: 3 days	2012-09-14 21:59:55 +00:00
Konstantin Belousov	c5e3d0ab11	Rename the IVY_RNG option to RDRAND_RNG. Based on submission by: Arthur Mesh <arthurmesh@gmail.com> MFC after: 2 weeks	2012-09-13 10:12:16 +00:00
Alan Cox	7336315b0a	Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(), pmap_unmapdev()'s own direct efforts to destroy the page table entries are redundant, so eliminate them. Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS. Setting PTE_W on MIPS is inconsistent with the implementation of this function on other architectures. Moreover, PTE_W should not be set, unless the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}() doesn't do. MFC after: 10 days	2012-09-10 16:11:29 +00:00
Attilio Rao	324e57150d	userret() already checks for td_locks when INVARIANTS is enabled, so there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week	2012-09-08 18:27:11 +00:00
Konstantin Belousov	ef9461ba0e	Add support for new Intel on-CPU Bull Mountain random number generator, found on IvyBridge and supposedly later CPUs, accessible with RDRAND instruction. From the Intel whitepapers and articles about Bull Mountain, it seems that we do not need to perform post-processing of RDRAND results, like AES-encryption of the data with random IV and keys, which was done for Padlock. Intel claims that sanitization is performed in hardware. Make both Padlock and Bull Mountain random generators support code covered by kernel config options, for the benefit of people who prefer minimal kernels. Also add the tunables to disable hardware generator even if detected. Reviewed by: markm, secteam (simon) Tested by: bapt, Michael Moll <kvedulv@kvedulv.de> MFC after: 3 weeks	2012-09-05 13:18:51 +00:00
Alan Cox	d8f9ed32c5	Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the comment describing them. Both the function names and the comment had grown stale. Quite some time has passed since these pmap implementations last used the page's hold count to track the number of valid mapping within a page table page. Also, returning TRUE from pmap_unwire_ptp() rather than _pmap_unwire_ptp() eliminates a few instructions from callers like pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used directly by a conditional statement.	2012-09-05 06:02:54 +00:00
Xin LI	0807ad7422	Add hpt27xx to GENERIC kernel for amd64 and i386 systems. MFC after: 2 weeks	2012-09-04 21:02:57 +00:00
John Baldwin	778eefa40d	Fix duplicate entries for mwl(4): - Move mwlfw from {amd64,i386}/conf/NOTES to sys/conf/NOTES (mwl(4) is already present in sys/conf/NOTES). - Remove duplicate mwl(4) entries from {amd64,i386}/conf/NOTES. - While here, add a description to the sfxge line in amd64/conf/NOTES.	2012-09-04 19:19:36 +00:00
Dimitry Andric	7b4806d69f	Remove the argument-less .align directive in sys/i386/bios/smapi_bios.S. Specifying no argument is undocumented in the gas manual, and clang's integrated assembler refuses to parse it. Also, removing it causes no change at all in the resulting object file. MFC after: 1 week	2012-08-29 18:22:52 +00:00
John Baldwin	4c1044491b	Fix misspelled "Infiniband". Submitted by: gcooper MFC after: 3 days	2012-08-28 11:34:09 +00:00
Dag-Erling Smørgrav	ae7f84a9a4	Parly revert r239255: reinstate a default maxswzone on i386, where KVA is scarce, but set it slightly higher so we can handle 8 GB of swap.	2012-08-27 13:22:27 +00:00
Glen Barber	67944c4572	Grammar fix: s/NIC's/NICs/ MFC after: 3 days	2012-08-26 01:21:02 +00:00
Dag-Erling Smørgrav	e2082935f0	As discussed on -current, remove the hardcoded default maxswzone. MFC after: 3 weeks	2012-08-14 17:01:21 +00:00
John Baldwin	6e4ac34b07	Remove the deassert INIT IPI from the IPI startup sequence for APs. It is not listed in the boot sequence in the MP specification (1.4), and it is explicitly ignored on modern CPUs. It was only ever required when bootstrapping systems with external APICs (that is, SMP machines with 486s), which FreeBSD has never supported (and never will). While here, tidy some comments and remove some banal ones.	2012-08-13 18:52:51 +00:00
John Baldwin	a4284ef768	Add a 10 millisecond delay after sending the initial INIT IPI. This matches the algorithm in the MP specification (1.4). Previously we were sending out the deassert INIT IPI immediately after the initial INIT IPI was sent.	2012-08-13 16:33:22 +00:00
Colin Percival	347c7fd7bf	Build modules along with the XENHVM kernels. No objections from: freebsd-xen mailing list MFC after: 1 week	2012-08-13 07:36:57 +00:00
Alan Cox	bab30462fb	Eliminate an unnecessary acquisition and release of the page queues lock from pmap_pte(). PT_SET_MA() is not a queued mapping update, but instead an immediate mapping update, so the page queues lock is not required here. Reviewed by: cperciva	2012-08-10 05:47:04 +00:00
Konstantin Belousov	0220d04fe3	Add lfence(). MFC after: 1 week	2012-08-01 17:24:53 +00:00
John Baldwin	bb07b39fa1	Regen.	2012-07-30 20:45:17 +00:00
John Baldwin	e6e0554a98	The linux_lstat() system call accepts a pointer to a 'struct l_stat', not a 'struct ostat'.	2012-07-30 20:44:45 +00:00
Konstantin Belousov	a42fa0af44	Change (unused) prototype for stmxcsr() to match reality. Noted by: jhb MFC after: 1 week	2012-07-30 19:26:02 +00:00
Konstantin Belousov	e93d0cbef1	MFamd64 r238623: Introduce curpcb magic variable. Requested and reviewed by: bde MFC after: 3 weeks	2012-07-26 09:11:37 +00:00
Konstantin Belousov	1e39a4bcee	MFCamd64 r238598: Provide siginfo.si_code for floating point errors when error occurs using the SSE math processor. MFC after: 3 weeks	2012-07-21 21:52:48 +00:00
Konstantin Belousov	dad46c5594	MFamd64 r238668: Stop clearing x87 exceptions in the #MF handler. Requested by: bde MFC after: 1 week	2012-07-21 21:49:05 +00:00
Konstantin Belousov	4596f12de7	MFamd64 r238597: Add stmxcsr. MFC after: 3 weeks	2012-07-21 21:39:23 +00:00
Konstantin Belousov	1a3f27687e	MFamd64 r238669: Force clean FPU state in PCB user FPU save area for PT_I386_{GET,SET}XMMREGS. Reported by: bde MFC after: 1 week	2012-07-21 21:39:02 +00:00
John Baldwin	d706ec297a	Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h> on x86 and use that to implement stop_emulating() in the fpu/npx code. Reimplement start_emulating() in the non-XEN case by using load_cr0() and rcr0() instead of the 'lmsw' and 'smsw' instructions. Intel explicitly discourages the use of 'lmsw' and 'smsw' on 80386 and later processors in the description of these instructions in Volume 2 of the ADM. Reviewed by: kib MFC after: 1 month	2012-07-09 20:55:39 +00:00
John Baldwin	5355f65974	Partially revert r217515 so that the mem_range_softc variable is always present on x86 kernels. This fixes the build of kernels that include 'device acpi' but do not include 'device mem'. MFC after: 1 month	2012-07-09 20:42:08 +00:00
Christian Brueffer	87e5ba0b54	Fix XEN build, broken in r237924. Reported by: gcooper Pointy hat: brueffer	2012-07-02 14:03:19 +00:00
Christian Brueffer	593a9dc9b8	Replace an unreachable panic() in vm86_getptr (been there for 13 years) with a KASSERT() behind the functions's only consumer. Suggested by: kib Reviewed by: kib CID: 4494 Found with: Coverity Prevent(tm) MFC after: 2 weeks	2012-07-01 12:59:00 +00:00
Alan Cox	e30df26e7b	Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme.	2012-06-27 03:45:25 +00:00
Konstantin Belousov	a30facd9c7	Commit changes missed from r237435. Properly calculate the signal trampoline addresses after the shared page is enabled. Handle FreeBSD ABIs without shared page support too. Reported and tested by: David Wolfskill <david catwhisker org> (previous version) Pointy hat to: kib MFC after: 1 month	2012-06-22 16:05:56 +00:00
Konstantin Belousov	d69ae4126b	Enable shared page on i386, now it has a use for vdso_timehands. MFC after: 1 month	2012-06-22 07:16:29 +00:00
Konstantin Belousov	aea810386d	Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs neccessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month	2012-06-22 07:06:40 +00:00
Konstantin Belousov	232aa31fb9	Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to timekeeping information. MFC after: 1 week	2012-06-22 06:38:31 +00:00
Navdeep Parhar	09fe63205c	- Updated TOE support in the kernel. - Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features. - iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon. Build-tested with make universe. 30s overview ============ What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface: # ifconfig -m \| grep TOE Enable/disable TCP offload on an interface (just like any other ifnet capability): # ifconfig cxgbe0 toe # ifconfig cxgbe0 -toe Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat: # netstat -np tcp \| grep toe # sockstat -46c \| grep toe Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)	2012-06-19 07:34:13 +00:00
Alan Cox	6031c68de4	The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer. Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer. As an added bonus, tidy up some nearby comments concerning page flags. Reviewed by: kib MFC after: 6 weeks	2012-06-16 18:56:19 +00:00
Adrian Chadd	83567110bd	Oops - use the actual 11n enable option.	2012-06-15 15:32:16 +00:00

... 2 3 4 5 6 ...

12657 Commits