freebsd-nq

Author	SHA1	Message	Date
Justin T. Gibbs	e96ca45522	sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: Set PCPU apic_id and acpi_id fields in a fasion compatible with both UP and SMP configurations. Suggested by: jhb Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks	2013-09-20 04:35:09 +00:00
Alan Cox	deb179bb4c	The pmap function pmap_clear_reference() is no longer used. Remove it. pmap_clear_reference() has had exactly one caller in the kernel for several years, more precisely, since FreeBSD 8. Now, that call no longer exists. Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division	2013-09-20 04:30:18 +00:00
Justin T. Gibbs	8a21c7fbe8	sys/i386/xen_mp_machdep.c: Set a 'fake' acpi_id for the i386 PV port, it is needed in order to use VIRQs or IPI event channels. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks	2013-09-19 14:41:10 +00:00
Pawel Jakub Dawidek	3fded357af	Fix panic in ktrcapfail() when no capability rights are passed. While here, correct all consumers to pass NULL instead of 0 as we pass capability rights as pointers now, not uint64_t. Reported by: Daniel Peyrolon Tested by: Daniel Peyrolon Approved by: re (marius)	2013-09-18 19:26:08 +00:00
Roman Divacky	69d912af45	Regen. Approved by: re (delphij)	2013-09-18 18:49:26 +00:00
Roman Divacky	b12698e1a1	Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)	2013-09-18 18:48:33 +00:00
Roman Divacky	70ccaaf58e	Regen. Approved by: re (delphij)	2013-09-18 17:58:03 +00:00
Roman Divacky	253c75c0de	Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>	2013-09-18 17:56:04 +00:00
Bryan Venteicher	03c6abfd1c	Add vmx(4) to i386 and amd64 GENERIC Approved by: re (gjb)	2013-09-17 01:54:13 +00:00
Konstantin Belousov	06646d663a	Merge the change r255607 from amd64 to i386. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)	2013-09-16 19:58:37 +00:00
Alan Cox	87ee6303e5	Prior to r254304, we only began scanning the active page queue when the amount of free memory was close to the point at which we would begin reclaiming pages. Now, we continuously scan the active page queue, regardless of the amount of free memory. Consequently, we are continuously calling pmap_ts_referenced() on active pages. Prior to this change, pmap_ts_referenced() would always demote superpage mappings in order to obtain finer-grained reference information. This made sense because we were coming under memory pressure and would soon have to begin reclaiming pages. Now, however, with continuous scanning of the active page queue, these demotions are taking a toll on performance. To address this problem, I have replaced the demotion with a heuristic for periodically clearing the reference flag on superpage mappings. Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division	2013-09-11 17:23:42 +00:00
John Baldwin	edb572a38c	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)	2013-09-09 18:11:59 +00:00
Justin T. Gibbs	e44af46e4c	Implement PV IPIs for PVHVM guests and further converge PV and HVM IPI implmementations. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Submitted by: gibbs (misc cleanup, table driven config) Reviewed by: gibbs MFC after: 2 weeks sys/amd64/include/cpufunc.h: sys/amd64/amd64/pmap.c: Move invltlb_globpcid() into cpufunc.h so that it can be used by the Xen HVM version of tlb shootdown IPI handlers. sys/x86/xen/xen_intr.c: sys/xen/xen_intr.h: Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(), and remove the ipi vector parameter. This api allocates an event channel port that can be used for ipi services, but knows nothing of the actual ipi for which that port will be used. Removing the unused argument and cleaning up the comments surrounding its declaration helps clarify its actual role. sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: Implement a generic framework for amd64 and i386 that allows the implementation of certain CPU management functions to be selected at runtime. Currently this is only used for the ipi send function, which we optimize for Xen when running on a Xen hypervisor, but can easily be expanded to support more operations. sys/x86/xen/hvm.c: Implement Xen PV IPI handlers and operations, replacing native send IPI. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: sys/i386/include/smp.h: Remove NR_VIRQS and NR_IPIS from FreeBSD headers. NR_VIRQS is defined already for us in the xen interface files. NR_IPIS is only needed in one file per Xen platform and is easily inferred by the IPI vector table that is defined in those files. sys/i386/xen/mp_machdep.c: Restructure to more closely match the HVM implementation by performing table driven IPI setup.	2013-09-06 22:17:02 +00:00
Bryan Venteicher	ddb4ffd0c6	Add vmx device to the i386 and amd64 NOTES files	2013-09-06 20:24:21 +00:00
Gleb Smirnoff	2ee9b44cae	Fix build with gcc. Move sf_buf_alloc()/sf_buf_free() declarations to MD headers.	2013-09-06 17:44:13 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
Justin T. Gibbs	081f835212	Better conformance to style(9) and organizational cleanup. No functional changes. sys/i386/xen/mp_machdep.c: Remove extra newlines. Group externs, forward delarations, local types, and pcpu data. Wrap at 80 columns. Use parens in return statements. Tab indent members of array initializers. MFC after: 2 weeks	2013-09-02 22:22:56 +00:00
Justin T. Gibbs	9f40021f28	Introduce a new, HVM compatible, paravirtualized timer driver for Xen. Use this new driver for both PV and HVM instances. This driver requires a Xen hypervisor that supports vector callbacks, VCPUOP hypercalls, and reports that it has a "safe PV clock". New timer driver: Submitted by: will Sponsored by: Spectra Logic Corporation PV port to new driver, and bug fixes: Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D sys/dev/xen/timer/timer.c: - Register a PV timer device driver which (currently) implements device_{identify,probe,attach} and stubs device_detach. The detach routine requires functionality not provided by timecounters(4). The suspend and resume routines need additional work (due to Xen requiring that the hypercalls be executed on the target VCPU), and aren't needed for our purposes. - Make sure there can only be one device instance of this driver, and that it only registers one eventtimers(4) and one timecounters(4) device interface. Make both interfaces use PCPU data as needed. - Match, with a few style cleanups & API differences, the Xen versions of the "fetch time" functions. - Document the magic scale_delta() better for the i386 version. - When registering the event timer, bind a separate event channel for the timer VIRQ to the device's event timer interrupt handler for each active VCPU. Describe each interrupt as "xen_et:c%d", so they can be identified per CPU in "vmstat -i" or "show intrcnt" in KDB. - When scheduling a timer into the hypervisor, try up to 60 times if the hypervisor rejects the time as being in the past. In the common case, this retry shouldn't happen, and if it does, it should only happen once. This is because the event timer advertises a minimum period of 100usec, which is only less than the usual hypercall round trip time about 1 out of every 100 tries. (Unlike other similar drivers, this one actually checks whether the hypervisor accepted the singleshot timer set hypercall.) - Implement a RTC PV clock based on the hypervisor wallclock. sys/conf/files: - Add dev/xen/timer/timer.c if the kernel configuration includes either the XEN or XENHVM options. sys/conf/files.i386: sys/i386/include/xen/xen_clock_util.h: sys/i386/xen/clock.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/xen_rtc.c: - Remove previous PV timer used in i386 XEN PV kernels, the new timer introduced in this change is used instead (so we share the same code between PVHVM and PV). MFC after: 2 weeks	2013-08-29 23:11:58 +00:00
Justin T. Gibbs	76acc41fb7	Implement vector callback for PVHVM and unify event channel implementations Re-structure Xen HVM support so that: - Xen is detected and hypercalls can be performed very early in system startup. - Xen interrupt services are implemented using FreeBSD's native interrupt delivery infrastructure. - the Xen interrupt service implementation is shared between PV and HVM guests. - Xen interrupt handlers can optionally use a filter handler in order to avoid the overhead of dispatch to an interrupt thread. - interrupt load can be distributed among all available CPUs. - the overhead of accessing the emulated local and I/O apics on HVM is removed for event channel port events. - a similar optimization can eventually, and fairly easily, be used to optimize MSI. Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure, and misc Xen cleanups: Sponsored by: Spectra Logic Corporation Unification of PV & HVM interrupt infrastructure, bug fixes, and misc Xen cleanups: Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D sys/x86/x86/local_apic.c: sys/amd64/include/apicvar.h: sys/i386/include/apicvar.h: sys/amd64/amd64/apic_vector.S: sys/i386/i386/apic_vector.s: sys/amd64/amd64/machdep.c: sys/i386/i386/machdep.c: sys/i386/xen/exception.s: sys/x86/include/segments.h: Reserve IDT vector 0x93 for the Xen event channel upcall interrupt handler. On Hypervisors that support the direct vector callback feature, we can request that this vector be called directly by an injected HVM interrupt event, instead of a simulated PCI interrupt on the Xen platform PCI device. This avoids all of the overhead of dealing with the emulated I/O APIC and local APIC. It also means that the Hypervisor can inject these events on any CPU, allowing upcalls for different ports to be handled in parallel. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: Map Xen per-vcpu area during AP startup. sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: Increase the FreeBSD IRQ vector table to include space for event channel interrupt sources. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: Remove Xen HVM per-cpu variable data. These fields are now allocated via the dynamic per-cpu scheme. See xen_intr.c for details. sys/amd64/include/xen/hypercall.h: sys/dev/xen/blkback/blkback.c: sys/i386/include/xen/xenvar.h: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/xen/gnttab.c: Prefer FreeBSD primatives to Linux ones in Xen support code. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: sys/dev/xen/balloon/balloon.c: sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/console/xencons_ring.c: sys/dev/xen/control/control.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/xenpci/xenpci.c: sys/i386/i386/machdep.c: sys/i386/include/pmap.h: sys/i386/include/xen/xenfunc.h: sys/i386/isa/npx.c: sys/i386/xen/clock.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/xen_rtc.c: sys/xen/evtchn/evtchn_dev.c: sys/xen/features.c: sys/xen/gnttab.c: sys/xen/gnttab.h: sys/xen/hvm.h: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_if.m: sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusvar.h: sys/xen/xenstore/xenstore.c: sys/xen/xenstore/xenstore_dev.c: sys/xen/xenstore/xenstorevar.h: Pull common Xen OS support functions/settings into xen/xen-os.h. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: Remove constants, macros, and functions unused in FreeBSD's Xen support. sys/xen/xen-os.h: sys/i386/xen/xen_machdep.c: sys/x86/xen/hvm.c: Introduce new functions xen_domain(), xen_pv_domain(), and xen_hvm_domain(). These are used in favor of #ifdefs so that FreeBSD can dynamically detect and adapt to the presence of a hypervisor. The goal is to have an HVM optimized GENERIC, but more is necessary before this is possible. sys/amd64/amd64/machdep.c: sys/dev/xen/xenpci/xenpcivar.h: sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/sys/kernel.h: Refactor magic ioport, Hypercall table and Hypervisor shared information page setup, and move it to a dedicated HVM support module. HVM mode initialization is now triggered during the SI_SUB_HYPERVISOR phase of system startup. This currently occurs just after the kernel VM is fully setup which is just enough infrastructure to allow the hypercall table and shared info page to be properly mapped. sys/xen/hvm.h: sys/x86/xen/hvm.c: Add definitions and a method for configuring Hypervisor event delievery via a direct vector callback. sys/amd64/include/xen/xen-os.h: sys/x86/xen/hvm.c: sys/conf/files: sys/conf/files.amd64: sys/conf/files.i386: Adjust kernel build to reflect the refactoring of early Xen startup code and Xen interrupt services. sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: sys/dev/xen/control/control.c: sys/dev/xen/evtchn/evtchn_dev.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/xen/xenstore/xenstore.c: sys/xen/evtchn/evtchn_dev.c: sys/dev/xen/console/console.c: sys/dev/xen/console/xencons_ring.c Adjust drivers to use new xen_intr_*() API. sys/dev/xen/blkback/blkback.c: Since blkback defers all event handling to a taskqueue, convert this task queue to a "fast" taskqueue, and schedule it via an interrupt filter. This avoids an unnecessary ithread context switch. sys/xen/xenstore/xenstore.c: The xenstore driver is MPSAFE. Indicate as much when registering its interrupt handler. sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbusvar.h: Remove unused event channel APIs. sys/xen/evtchn.h: Remove all kernel Xen interrupt service API definitions from this file. It is now only used for structure and ioctl definitions related to the event channel userland device driver. Update the definitions in this file to match those from NetBSD. Implementing this interface will be necessary for Dom0 support. sys/xen/evtchn/evtchnvar.h: Add a header file for implemenation internal APIs related to managing event channels event delivery. This is used to allow, for example, the event channel userland device driver to access low-level routines that typical kernel consumers of event channel services should never access. sys/xen/interface/event_channel.h: sys/xen/xen_intr.h: Standardize on the evtchn_port_t type for referring to an event channel port id. In order to prevent low-level event channel APIs from leaking to kernel consumers who should not have access to this data, the type is defined twice: Once in the Xen provided event_channel.h, and again in xen/xen_intr.h. The double declaration is protected by __XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared twice within a given compilation unit. sys/xen/xen_intr.h: sys/xen/evtchn/evtchn.c: sys/x86/xen/xen_intr.c: sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/xenpci/xenpcivar.h: New implementation of Xen interrupt services. This is similar in many respects to the i386 PV implementation with the exception that events for bound to event channel ports (i.e. not IPI, virtual IRQ, or physical IRQ) are further optimized to avoid mask/unmask operations that aren't necessary for these edge triggered events. Stubs exist for supporting physical IRQ binding, but will need additional work before this implementation can be fully shared between PV and HVM. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: sys/i386/xen/mp_machdep.c sys/x86/xen/hvm.c: Add support for placing vcpu_info into an arbritary memory page instead of using HYPERVISOR_shared_info->vcpu_info. This allows the creation of domains with more than 32 vcpus. sys/i386/i386/machdep.c: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/exception.s: Add support for new event channle implementation.	2013-08-29 19:52:18 +00:00
Alan Cox	51321f7c31	Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2). Since our malloc(3) makes heavy use of madvise(2), this change can have a measureable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%. Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise(). Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division	2013-08-29 15:49:05 +00:00
Justin T. Gibbs	8ac6a7aa17	Rename definition of HYPERVISOR_VIRT_START to avoid conflict with upstream Xen definition found in xen/interface/arch-x86/xen-x86_32.h. Submitted by: Roger Pau Monné Reviewed by: gibbs MFC after: 2 weeks	2013-08-22 20:07:06 +00:00
Konstantin Belousov	e68c64f0ba	Revert r254501. Instead, reuse the type stability of the struct pmap which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE zone. Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release(). Suggested and reviewed by: alc (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-22 18:12:24 +00:00
David E. O'Brien	46be218dce	The PADLOCK_RNG and RDRAND_RNG kernel options are now devices. Thus "device padlock_rng" and "device rdrand_rng" should be used instead of "options PADLOCK_RNG" & "options RDRAND_RNG". Requested by: so@ (des) Submitted by: obrien, arthurmesh@gmail.com Obtained from: Juniper Networks	2013-08-21 22:43:29 +00:00
Jung-uk Kim	1533b9f714	Reimplement atomic operations on PDEs and PTEs in pmap.h. This change significantly reduces duplicate code and make it easier to read. Reviewed by: alc, bde	2013-08-21 22:40:29 +00:00
Jung-uk Kim	5188b5f3c2	Implement atomic_cmpset_64() and atomic_swap_64() for i386.	2013-08-21 22:30:11 +00:00
Jung-uk Kim	3264fd707a	Reimplement atomic_load_acq_64() and atomic_store_rel_64() for i386. These functions are now real functions rather than function pointers. Supposedly, it is faster for modern processors. Suggested by: bde	2013-08-21 22:27:42 +00:00
Jung-uk Kim	d36eb3f1c4	Remove empty lines before return statements for style consistency.	2013-08-21 22:05:58 +00:00
Jung-uk Kim	8a1ee2d346	Implement atomic_swap() and atomic_testandset(). Reviewed by: arch, bde, jilles, kib	2013-08-21 22:03:06 +00:00
Jung-uk Kim	da255e4c7f	- Remove the "a" constraint from main output operand for atomic_cmpset(). - Use "+" modifier for the "expect" because it is also an output (unused).	2013-08-21 21:30:06 +00:00
Jung-uk Kim	fe94be3da7	Use '+' modifier for a memory operand that is both an input and an output. It was actually done in r86301 but reverted in r150182 because GCC 3.x was not able to handle it for a memory operand. Apparently, this problem was fixed in GCC 4.1+ and several contrib sources already rely on this feature.	2013-08-21 21:14:16 +00:00
Jung-uk Kim	c1c84ce1bf	Remove bogus labels. No functional change.	2013-08-21 20:49:46 +00:00
Jung-uk Kim	ee93d1173a	Use consistent style. No functional change.	2013-08-21 20:43:50 +00:00
Pawel Jakub Dawidek	417ffc66fa	Add process descriptors support to the GENERIC kernel. It is already being used by the tools in base systems and with sandboxing more and more tools the usage should only increase. Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org> Sponsored by: Google Summer of Code 2013 MFC after: 1 month	2013-08-18 10:21:29 +00:00
Jilles Tjoelker	0f3a4d8051	libc: Access _logname_valid more efficiently. The variable _logname_valid is not exported via the version script; therefore, change C and i386/amd64 assembler code to remove indirection (which allowed interposition). This makes the code slightly smaller and faster. Also, remove #define PIC_GOT from i386/amd64 in !PIC mode. Without PIC, there is no place containing the address of each variable, so there is no possible definition for PIC_GOT.	2013-08-17 19:24:58 +00:00
Jung-uk Kim	5772203b17	Simplify check for CMPXCHG8B instruction. Note CMPXCHG8B instruction is always available for Rise mP6 processors although it is not set by CPUID.	2013-08-15 21:09:05 +00:00
Jung-uk Kim	38da30b419	Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these two files were functionally identical.	2013-08-13 22:05:10 +00:00
Jung-uk Kim	3bd12ca8f1	Tidy up global locks for ACPICA. There is no functional change.	2013-08-13 21:34:03 +00:00
Konstantin Belousov	c325e866f4	Different consumers of the struct vm_page abuse pageq member to keep additional information, when the page is guaranteed to not belong to a paging queue. Usually, this results in a lot of type casts which make reasoning about the code correctness harder. Sometimes m->object is used instead of pageq, which could cause real and confusing bugs if non-NULL m->object is leaked. See r141955 and r253140 for examples. Change the pageq member into a union containing explicitly-typed members. Use them instead of type-punning or abusing m->object in x86 pmaps, uma and vm_page_alloc_contig(). Requested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-10 17:36:42 +00:00
Attilio Rao	e946b94934	On all the architectures, avoid to preallocate the physical memory for nodes used in vm_radix. On architectures supporting direct mapping, also avoid to pre-allocate the KVA for such nodes. In order to do so make the operations derived from vm_radix_insert() to fail and handle all the deriving failure of those. vm_radix-wise introduce a new function called vm_radix_replace(), which can replace a leaf node, already present, with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if operations in vm_radix_insert() recursed vm_radix_insert() will start from scratch again. Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl	2013-08-09 11:28:55 +00:00
Attilio Rao	c7aebda8a1	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
Andriy Gapon	9ba0691bdd	follow up to r254051 - update powerpc/GENERIC64 as well, suggested by mdf - update comments so that they make sense after the change, suggested by jhb X-MFC after: never (change specific to head)	2013-08-09 08:11:09 +00:00
Andriy Gapon	818d282e7b	enable KDB_TRACE in GENERICs KDB_TRACE is not an alternative to DDB/etc, they are complementary. So I do not see any reason to not enable KDB_TRACE by default. X-MFC after: never (change specific to head)	2013-08-07 08:03:50 +00:00
Jeff Roberson	5df87b21d3	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
Jeff Roberson	2c0b86b48f	- Introduce a specific function, pmap_remove_kernel_pde, for removing huge pages in the kernel's address space. This works around several asserts from pmap_demote_pde_locked that did not apply and gave false warnings. Discovered by: pho Reviewed by: alc Sponsored by: EMC / Isilon Storage Division	2013-08-05 00:28:03 +00:00
David E. O'Brien	0e6a0799a9	Back out r253779 & r253786.	2013-07-31 17:21:18 +00:00
David E. O'Brien	99ff83da74	Decouple yarrow from random(4) device. * Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option. The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow. * random(4) device doesn't really depend on rijndael-. Yarrow, however, does. Add random_adaptors.[ch] which is basically a store of random_adaptor's. random_adaptor is basically an adapter that plugs in to random(4). random_adaptor can only be plugged in to random(4) very early in bootup. Unplugging random_adaptor from random(4) is not supported, and is probably a bad idea anyway, due to potential loss of entropy pools. We currently have 3 random_adaptors: + yarrow + rdrand (ivy.c) + nehemeiah * Remove platform dependent logic from probe.c, and move it into corresponding registration routines of each random_adaptor provider. probe.c doesn't do anything other than picking a specific random_adaptor from a list of registered ones. * If the kernel doesn't have any random_adaptor adapters present then the creation of /dev/random is postponed until next random_adaptor is kldload'ed. * Fix randomdev_soft.c to refer to its own random_adaptor, instead of a system wide one. Submitted by: arthurmesh@gmail.com, obrien Obtained from: Juniper Networks Reviewed by: obrien	2013-07-29 20:26:27 +00:00
Andriy Gapon	a29cc9a34b	Revert r253748,253749 This WIP should not have been committed yet. Pointyhat to: avg	2013-07-28 18:44:17 +00:00
Andriy Gapon	366d8bfb7b	put contents of cpu.h under _KERNEL no userland-serviceable parts inside MFC after: 20 days	2013-07-28 18:32:27 +00:00
Andriy Gapon	a69e8d609e	x86: detect mwait capabilities and extensions, when present Reviewed by: kib (earlier amd64-only version) MFC after: 2 weeks	2013-07-28 17:54:42 +00:00
Jeff Roberson	2f84c08eee	- Use kmem_malloc rather than kmem_alloc() for GDT/LDT/tss allocations etc. This eliminates some unusual uses of that API in favor of more typical uses of kmem_malloc(). Discussed with: kib/alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-07-26 19:06:14 +00:00
Andrey V. Elsukov	dbd4437b06	Include sys/systm.h after sys/param.h. Suggested by: pluknet	2013-07-15 15:40:57 +00:00
Gleb Smirnoff	59b9c4f289	Nuke mbstat. It wasn't used for mbuf statistics since FreeBSD 5. Now that r253351 moved sendfile() stats to a separate struct, the last field used in mbstat is m_mcfail, which is updated, but never read or obtained from userland.	2013-07-15 12:18:36 +00:00
Andrey V. Elsukov	05d1f5bce0	Introduce new structure sfstat for collecting sendfile's statistics and remove corresponding fields from struct mbstat. Use PCPU counters and SFSTAT_INC() macro for update these statistics. Discussed with: glebius	2013-07-15 06:16:57 +00:00
Konstantin Belousov	3c901a9040	Create a proper stack frame for i386 version of bcopy(), despite the function is leaf. The frame allows ddb to not loose the direct caller of bcopy() in backtrace. Other functions from support.s would benefit from the same change as well, but for now bcopy() is the most frequent offender. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-13 19:42:52 +00:00
Konstantin Belousov	30dac21d0a	Explicitely panic instead of possibly doing undefined things when ptelist KVA is exhausted. Currently this cannot happen, the added panic serves as assert. Discussed with: alc Sponsored by: The FreeBSD Foundation	2013-07-11 05:15:30 +00:00
Konstantin Belousov	3fb25770a9	MFamd64 r253140: Clear m->object for the page taken from the delayed free list in pmap_pv_reclaim(). Noted by: alc	2013-07-11 05:10:36 +00:00
Xin LI	1fdeb1651c	Import HighPoint DC Series Data Center HBA (DC7280 and R750) driver. This driver works for FreeBSD/i386 and FreeBSD/amd64 platforms. Many thanks to HighPoint for providing this driver. MFC after: 1 day	2013-07-06 07:49:41 +00:00
Konstantin Belousov	70a7dd5d5b	Fix issues with zeroing and fetching the counters, on x86 and ppc64. Issues were noted by Bruce Evans and are present on all architectures. On i386, a counter fetch should use atomic read of 64bit value, otherwise carry from the increment on other CPU could be lost for the given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not available on the machine, it cannot be SMP and it is enough to disable preemption around read to avoid the split read. On x86 the counter increment is not atomic on purpose, which makes it possible for the store of the incremented result to override just zeroed per-cpu slot. The effect would be a counter going off by arbitrary value after zeroing. Perform the counter zeroing on the same processor which does the increments, making the operations mutually exclusive. On i386, same as for the fetching, if the cmpxchg8b is not available, machine is not SMP and we disable preemption for zeroing. PowerPC64 is treated the same as amd64. For other architectures, the changes made to allow the compilation to succeed, without fixing the issues with zeroing or fetching. It should be possible to handle them by using the 64bit loads and stores atomic WRT preemption (assuming the architectures also converted from using critical sections to proper asm). If architecture does not provide the facility, using global (spin) mutex would be non-optimal but working solution. Noted by: bde Sponsored by: The FreeBSD Foundation	2013-07-01 02:48:27 +00:00
Jung-uk Kim	b1ddd13145	Move definitions required by userland applications out of acpica_machdep.h.	2013-06-27 00:22:40 +00:00
Konstantin Belousov	c788f92509	Some clarifications and updates for the comments, mostly retrieved from Bruce Evans. Trim the trailing spaces. MFC after: 1 week	2013-06-19 05:05:16 +00:00
Justin T. Gibbs	7efb630573	Adjust i386 Xen PV support for updated Xen interface files. sys/i386/include/xen/xenvar.h: sys/i386/xen/xen_machdep.c: sys/xen/interface/foreign/structs.py: sys/xen/evtchn/evtchn.c: MAX_VIRT_CPUS => XEN_LEGACY_MAX_VCPUS Submitted by: Roger Pau Monné Reviewed by: gibbs	2013-06-17 01:43:07 +00:00
Justin T. Gibbs	a8f6ac0573	Upgrade Xen interface headers to Xen 4.2.1. Move FreeBSD from interface version 0x00030204 to 0x00030208. Updates are required to our grant table implementation before we can bump this further. sys/xen/hvm.h: Replace the implementation of hvm_get_parameter(), formerly located in sys/xen/interface/hvm/params.h. Linux has a similar file which primarily stores this function. sys/xen/xenstore/xenstore.c: Include new xen/hvm.h header file to get hvm_get_parameter(). sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: Correctly protect function definition and variables from being included into assembly files in xen-os.h Xen memory barriers are now prefixed with "xen_" to avoid conflicts with OS native primatives. Define Xen memory barriers in terms of the native FreeBSD primatives. Sponsored by: Spectra Logic Corporation Reviewed by: Roger Pau Monné Tested by: Roger Pau Monné Obtained from: Roger Pau Monné (bug fixes)	2013-06-14 23:43:44 +00:00
Jeff Roberson	17a2737732	- Add a BIT_FFS() macro and use it to replace cpusetffs_obj() Discussed with: attilio Sponsored by: EMC / Isilon Storage Division	2013-06-13 20:46:03 +00:00
Konstantin Belousov	9138579845	Assert that interrupts are enabled in the trap handlers on x86 before calling generic code to deliver signals. Discussed with: bde Tested by: pho MFC after: 1 week	2013-06-03 17:40:05 +00:00
Konstantin Belousov	07d46f9c18	MFamd64: when printing the trap information, show the %esp value. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-06-03 04:19:21 +00:00
Konstantin Belousov	cb5bfd1240	Use slightly more idiomatic expression to get the address of array. Tested by: dim, pgj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-05-27 18:39:39 +00:00
Konstantin Belousov	6806ce6ec8	When handling an exception from the attempt from loading the faulting context on return from the trap handler, re-enable the interrupts on i386 and amd64. The trap return path have to disable interrupts since the sequence of loading the machine state is not atomic. The trap() function which transfers the control to the special handler would enable the interrupt, but an iret loads the previous eflags with PSL_I clear. Then, the special handler calls trap() on its own, which now sees the original eflags with PSL_I set and does not enable interrupts. The end result is that signal delivery and process exiting code could be executed with interrupts disabled, which is generally wrong and triggers several assertions. For amd64, the interrupts are enabled conditionally based on PSL_I in the eflags of the outer frame, as it is already done for doreti_iret_fault. For i386, the interrupts are enabled unconditionally, the ast loop could have opened a window with interrupts enabled just before the iret anyway. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-05-27 18:26:08 +00:00
Achim Leubner	dce93cd06d	Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver. Approved by: scottl (mentor)	2013-05-24 09:22:43 +00:00
Attilio Rao	9af6d512f5	o Relax locking assertions for vm_page_find_least() o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock o Use all the mechanisms above to make vm_map_pmap_enter() to work mostl of the times only with readlocks. Sponsored by: EMC / Isilon storage division Reviewed by: alc	2013-05-21 20:38:19 +00:00
Marcel Moolenaar	cb34ed4434	Add basic support for FDT to i386 & amd64. This change includes: 1. Common headers for fdt.h and ofw_machdep.h under x86/include with indirections under i386/include and amd64/include. 2. New modinfo for loader provided FDT blob. 3. Common x86_init_fdt() called from hammer_time() on amd64 and init386() on i386. 4. Split-off FDT specific low-level console functions from FDT bus methods for the uart(4) driver. The low-level console logic has been moved to uart_cpu_fdt.c and is used for arm, mips & powerpc only. The FDT bus methods are shared across all architectures. 5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the fdt_pic_table[] arrays. Both are empty right now. FDT addresses are I/O ports on x86. Since the core FDT code does not handle different address spaces, adding support for both I/O ports and memory addresses requires some thought and discussion. It may be better to use a compile-time option that controls this. Obtained from: Juniper Networks, Inc.	2013-05-21 03:05:49 +00:00
Ed Schouten	31ffd8039d	Improve readability of static assertions for OFFSET_* macros. Instead of doing all sorts of weird casting of constants to pointer-pointers, simply use the standard C offsetof() macro to obtain the offset of the respective fields in the structures.	2013-05-13 21:47:17 +00:00
Peter Wemm	dda759d344	Tidy up some CVS workarounds.	2013-05-12 01:53:47 +00:00
Attilio Rao	941646f5ec	Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in order to match the MAXCPU concept. The change should also be useful for consolidation and consistency. Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc	2013-05-07 22:46:24 +00:00
Tijl Coosemans	c67f5b54d9	Remove redundant definitions of _ALIGN and _ALIGNBYTES.	2013-04-21 11:12:44 +00:00
Gabor Kovesdan	a8b5c2a0aa	- Correct spelling in comments Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:56:11 +00:00
Gabor Kovesdan	f0d0985ee9	- Correct mispellings of word miscellaneous Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:43:46 +00:00
Konstantin Belousov	fcb29b9210	Fix the name of the pcb member in the comments. Submitted by: Oliver Pinter <oliver.pntr@gmail.com> MFC after: 3 days	2013-04-13 15:20:33 +00:00
Edward Tomasz Napierala	8ed9860914	Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE' and kern.cam.ctl.disable tunable; those were introduced as a workaround to make it possible to boot GENERIC on low memory machines. With ctl(4) being built as a module and automatically loaded by ctladm(8), this makes CTL work out of the box. Reviewed by: ken Sponsored by: FreeBSD Foundation	2013-04-12 16:25:03 +00:00
Konstantin Belousov	706c56e4a9	Pass the segmented address of the counter, based on %fs, i.e. offset from the pcpu[0] to the counter base, instead of the linear address.	2013-04-09 17:55:39 +00:00
Gleb Smirnoff	4e76af6a41	Merge from projects/counters: counter(9). Introduce counter(9) API, that implements fast and raceless counters, provided (but not limited to) for gathering of statistical data. See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html for more details. In collaboration with: kib Reviewed by: luigi Tested by: ae, ray Sponsored by: Nginx, Inc.	2013-04-08 19:40:53 +00:00
Gleb Smirnoff	17dece86fe	Merge from projects/counters: Pad struct pcpu so that its size is denominator of PAGE_SIZE. This is done to reduce memory waste in UMA_PCPU_ZONE zones. Sponsored by: Nginx, Inc.	2013-04-08 19:19:10 +00:00
Alexander Motin	45f6d66569	Remove all legacy ATA code parts, not used since options ATA_CAM enabled in most kernels before FreeBSD 9.0. Remove such modules and respective kernel options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam. Remove the atacontrol utility and some man pages. Remove useless now options ATA_CAM. No objections: current@, stable@ MFC after: never	2013-04-04 07:12:24 +00:00
Konstantin Belousov	d4e9009cc8	Fix the VM_BCACHE_SIZE_MAX definition on i386 to match the maximal buffer map size, auto-tuned on the 4GB machine. Having the maxbcache bigger than the buffer map causes the transient bio map sizing logic to assume that there is enough KVA to use approximately 90MB (buffer map is sized to 110MB, and maxbcache is 200MB). The increase in the KVA usage caused other big KVA consumers, like nvidia.ko, to fail the initialization. Change the definition for both PAE and non-PAE cases, since PAE is even more KVA-starved. Reported and tested by: David Wolfskill Discussed with: alc Sponsored by: The FreeBSD Foundation	2013-03-27 10:52:18 +00:00
Konstantin Belousov	ee75e7de7b	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks	2013-03-19 14:13:12 +00:00
Attilio Rao	774d251d99	Sync back vmcontention branch into HEAD: Replace the per-object resident and cached pages splay tree with a path-compressed multi-digit radix trie. Along with this, switch also the x86-specific handling of idle page tables to using the radix trie. This change is supposed to do the following: - Allowing the acquisition of read locking for lookup operations of the resident/cached pages collections as the per-vm_page_t splay iterators are now removed. - Increase the scalability of the operations on the page collections. The radix trie does rely on the consumers locking to ensure atomicity of its operations. In order to avoid deadlocks the bisection nodes are pre-allocated in the UMA zone. This can be done safely because the algorithm needs at maximum one new node per insert which means the maximum number of the desired nodes is the number of available physical frames themselves. However, not all the times a new bisection node is really needed. The radix trie implements path-compression because UFS indirect blocks can lead to several objects with a very sparse trie, increasing the number of levels to usually scan. It also helps in the nodes pre-fetching by introducing the single node per-insert property. This code is not generalized (yet) because of the possible loss of performance by having much of the sizes in play configurable. However, efforts to make this code more general and then reusable in further different consumers might be really done. The only KPI change is the removal of the function vm_page_splay() which is now reaped. The only KBI change, instead, is the removal of the left/right iterators from struct vm_page, which are now reaped. Further technical notes broken into mealpieces can be retrieved from the svn branch: http://svn.freebsd.org/base/user/attilio/vmcontention/ Sponsored by: EMC / Isilon storage division In collaboration with: alc, jeff Tested by: flo, pho, jhb, davide Tested by: ian (arm) Tested by: andreast (powerpc)	2013-03-18 00:25:02 +00:00
Konstantin Belousov	e8a4a618cf	Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks	2013-03-14 20:18:12 +00:00
Alan Cox	9f585991ba	The kernel pmap is statically allocated, so there is really no need to explicitly initialize its pm_root field to zero. Sponsored by: EMC / Isilon Storage Division	2013-03-10 21:07:44 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Bryan Venteicher	0cfbcf8c7b	Remove the virtio dependency entry for the VirtIO device drivers. This will prevent the kernel from linking if the device driver are included without the virtio module. Remove pci and scbus for the same reason. Also explain the relationship and necessity of the virtio and virtio_pci modules. Currently in FreeBSD, we only support VirtIO PCI, but it could be replaced with a different interface (like MMIO) and the device (network, block, etc) will still function. Requested by: luigi Approved by: grehan (mentor) MFC after: 3 days	2013-03-06 07:17:53 +00:00
Kenneth D. Merry	3a45b4781a	Re-enable CTL in GENERIC on i386 and amd64, but turn on the CTL disable tunable by default. This will allow GENERIC configurations to boot on small memory boxes, but not require end users who want to use CTL to recompile their kernel. They can simply set kern.cam.ctl.disable=0 in loader.conf. The eventual solution to the memory usage problem is to change the way CTL allocates memory to be more configurable, but this should fix things for small memory situations in the mean time. UPDATING: Explain the change in the CTL configuration, and how users can enable CTL if they would like to use it. sys/conf/options: Add a new option, CTL_DISABLE, that prevents CTL from initializing. ctl.c: If CTL_DISABLE is turned on, don't initialize. i386/conf/GENERIC, amd64/conf/GENERIC: Re-enable device ctl, and add the CTL_DISABLE option.	2013-03-04 21:18:45 +00:00
Attilio Rao	03e78eac37	Fix-up r247622 by also renaming pv_list iterator into the xen pmap verbatim copy. Sponsored by: EMC / Isilon storage division Reported by: tinderbox	2013-03-03 01:02:57 +00:00
Attilio Rao	b38d37f7b5	Merge from vmc-playground branch: Rename the pv_entry_t iterator from pv_list to pv_next. Besides being more correct technically (as the name seems to suggest this is a list while it is an iterator), it will also be needed by vm_radix work to avoid a nameclash on macro expansions. Sponsored by: EMC / Isilon storage division Reviewed by: alc, jeff Tested by: flo, pho, jhb, davide	2013-03-02 14:19:08 +00:00
Adrian Chadd	fe138cc2af	Disable the ctl driver in GENERIC. It unfortunately steals a fair chunk of RAM at startup even if it's not actively used, which prevents FreeBSD VMs of 128MB from successfully booting and running.	2013-03-02 08:12:41 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
Alexander Motin	fdc5dd2d2f	MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.	2013-02-28 13:46:03 +00:00
Davide Italiano	acccf7d8b4	MFcalloutng: When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days. The commmit is not targeted for MFC.	2013-02-28 10:46:54 +00:00
Attilio Rao	dc1558d1cd	Merge from vmobj-rwlock: VM_OBJECT_LOCKED() macro is only used to implement a custom version of lock assertions right now (which likely spread out thanks to copy and paste). Remove it and implement actual assertions. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2013-02-27 18:12:13 +00:00
Warner Losh	ebf4319503	Locking for todr got pushed down into inittodr and the client libraries it calls (although some might not be doing it right). We are serialized right now by giant as well. This means the splsoftclock are now an anachronism that has no benefit, even marking where locking needs to happen. Remove them.	2013-02-21 07:16:40 +00:00
Konstantin Belousov	31a53cd036	Convert machine/elf.h, machine/frame.h, machine/sigframe.h, machine/signal.h and machine/ucontext.h into common x86 includes, copying from amd64 and merging with i386. Kernel-only compat definitions are kept in the i386/include/sigframe.h and i386/include/signal.h, to reduce amd64 kernel namespace pollution. The amd64 compat uses its own definitions so far. The _MACHINE_ELF_WANT_32BIT definition is to allow the sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions on the amd64 compile host. The same hack could be usefully abused by other code too.	2013-02-20 17:39:52 +00:00
Jung-uk Kim	00a54dfb1c	Consistently use round_page(x) rather than roundup(x, PAGE_SIZE). There is no functional change.	2013-02-15 22:43:08 +00:00

1 2 3 4 5 ...

12540 Commits