freebsd-dev

Author	SHA1	Message	Date
Ken Smith	7042aba738	Add a warning about why sbp(4) is commented out so that curious folks are forewarned they might wind up with a hole in their foot if they decide to give it a try. Suggested by: dougb	2011-10-19 21:55:20 +00:00
Ken Smith	4c0ba9b742	Comment out the sbp(4) driver for architectures that support it. As part of the 8.0-RELEASE cycle this was done in stable/8 (r199112) but was left alone in head so people could work on fixing an issue that caused boot failure on some motherboards. Apparently nobody has worked on it and we are getting reports of boot failure with the 9.0 test builds. So this time I'll comment out the driver in head (still hoping someone will work on it) and MFC to stable/9. Submitted by: Alberto Villa <avilla at FreeBSD dot org>	2011-10-18 13:45:16 +00:00
Dag-Erling Smørgrav	a417d4a46b	Trace attempts to call restricted MD syscalls.	2011-10-18 07:39:27 +00:00
Konstantin Belousov	6bfe4c78c8	Remove unused define. MFC after: 1 month	2011-10-07 16:09:44 +00:00
Xin LI	db1fda10b4	Add the 9750 SATA+SAS 6Gb/s RAID controller card driver, tws(4). Many thanks for their contiued support to FreeBSD. This is version 10.80.00.003 from codeset 10.2.1 [1] Obtained from: LSI http://kb.lsi.com/Download16574.aspx [1]	2011-10-04 21:40:25 +00:00
Konstantin Belousov	c06f5f6cea	Do not allow the kernel to access usermode pages without installed fault handler. Panic immediately in such situation, on i386 and amd64. Reviewed by: avg, jhb MFC after: 1 week	2011-10-03 17:01:31 +00:00
Attilio Rao	8d79dfca55	Add some improvements in the idle table callbacks: - Replace instances of manual assembly instruction "hlt" call with halt() function calling. - In cpu_idle_mwait() avoid races in check to sched_runnable() using the same pattern used in cpu_idle_hlt() with the 'hlt' instruction. - Add comments explaining the logic behind the pattern used in cpu_idle_hlt() and other idle callbacks. In collabouration with: jhb, mav Reviewed by: adri, kib MFC after: 3 weeks	2011-10-03 14:23:00 +00:00
Neel Natu	40f31c0d0a	Kernel configuration for a bhyve guest.	2011-09-26 07:05:40 +00:00
Kip Macy	9eca9361f9	Auto-generated code from sys_ prefixing makesyscalls.sh change Approved by: re(bz)	2011-09-16 14:04:14 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Peter Grehan	fab4c373af	IFC @ r225592 sys/dev/bvm/bvm_console.c - move up to the new alt-break order.	2011-09-15 22:14:35 +00:00
Konstantin Belousov	20aee906b4	Put amd64_syscall() prototype in md_var.h. Requested by: jhb Reviewed by: alc, jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-15 09:54:07 +00:00
Konstantin Belousov	7a1c55c380	Microoptimize the return path for the fast syscalls on amd64. Arrange the code to have the fall-through path to follow the likely target. Do not use intermediate register to reload user %rsp. Proposed by: alc Reviewed by: alc, jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-15 09:53:04 +00:00
Konstantin Belousov	e4505da615	The jump target shall be after the padding, not into it. Reported by: alc Approved by: re (bz) MFC after: 2 weeks	2011-09-11 18:00:46 +00:00
Christian Brueffer	b48f7c4c8d	Fix a zyd(4) comment typo that was copy+pasted into most kernel config files. PR: 160276 Submitted by: MATSUMIYA Ryo <matsumiya@mma.club.uec.ac.jp> Approved by: re (kib) MFC after: 1 week	2011-09-11 17:39:51 +00:00
Konstantin Belousov	8bd1142b52	Perform amd64-specific microoptimizations for native syscall entry sequence. The effect is ~1% on the microbenchmark. In particular, do not restore registers which are preserved by the C calling sequence. Align the jump target. Avoid unneeded memory accesses by calculating some data in syscall entry trampoline. Reviewed by: jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-11 16:08:10 +00:00
Konstantin Belousov	26ccf4f10f	Inline the syscallenter() and syscallret(). This reduces the time measured by the syscall entry speed microbenchmarks by ~10% on amd64. Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-11 16:05:09 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
John Baldwin	3a3ba1b069	Enable the puc(4) driver on amd64 and i386 in GENERIC. This allows devices supported by puc(4) to work "out of the box" since puc.ko does not work "out of the box". Reviewed by: marcel Approved by: re (kib) MFC after: 1 week	2011-08-26 21:22:34 +00:00
John Baldwin	cee0b197de	Make NKPT a kernel option on amd64 so that it can be set to a non-default value from kernel config files. Reviewed by: alc Approved by: re (kib) MFC after: 1 week	2011-08-26 17:08:22 +00:00
Bjoern A. Zeeb	61bc18a327	In HEAD when doing no further checkes there is no reason use the temporary variable and check with if as TUNABLE_*_FETCH do not alter values unless successfully found the tunable. Reported by: jhb, bde MFC after: 3 days X-MFC with: r224516 Approved by: re (kib)	2011-08-20 19:21:46 +00:00
Robert Watson	a9d2f8d84f	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
Konstantin Belousov	d98d0ce27a	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)	2011-08-09 21:01:36 +00:00
Rick Macklem	88c037e26a	Change all the sample kernel configurations to use NFSCL, NFSD instead of NFSCLIENT, NFSSERVER since NFSCL and NFSD are now the defaults. The client change is needed for diskless configurations, so that the root mount works for fstype nfs. Reported by seanbru at yahoo-inc.com for i386/XEN. Approved by: re (hrs)	2011-08-07 20:16:46 +00:00
Bjoern A. Zeeb	0a5f264d60	Introduce a tunable to disable the time consuming parts of bootup memtesting, which can easily save seconds to minutes of boot time. The tunable name is kept general to allow reusing the code in alternate frameworks. Requested by: many Discussed on: arch (a while a go) Obtained from: Sandvine Incorporated Reviewed by: sbruno Approved by: re (kib) MFC after: 2 weeks	2011-07-30 13:33:05 +00:00
Attilio Rao	786ef92b7b	Bump MAXCPU for amd64, ia64 and XLP mips appropriately. From now on, default values for FreeBSD will be 64 maxiumum supported CPUs on amd64 and ia64 and 128 for XLP. All the other architectures seem already capped appropriately (with the exception of sparc64 which needs further support on jalapeno flavour). Bump __FreeBSD_version in order to reflect KBI/KPI brekage introduced during the infrastructure cleanup for supporting MAXCPU > 32. This covers cpumask_t retiral too. The switch is considered completed at the present time, so for whatever bug you may experience that is reconducible to that area, please report immediately. Requested by: marcel, jchandra Tested by: pluknet, sbruno Approved by: re (kib)	2011-07-19 13:00:30 +00:00
Attilio Rao	68b739cd6f	Add the possibility to specify from kernel configs MAXCPU value. This patch is going to help in cases like mips flavours where you want a more granular support on MAXCPU. No MFC is previewed for this patch. Tested by: pluknet Approved by: re (kib)	2011-07-19 00:37:24 +00:00
Peter Grehan	bd2228ab3e	IFC @ r224187	2011-07-18 22:00:21 +00:00
Attilio Rao	521ea19d1c	- Remove the eintrcnt/eintrnames usage and introduce the concept of sintrcnt/sintrnames which are symbols containing the size of the 2 tables. - For amd64/i386 remove the storage of intr* stuff from assembly files. This area can be widely improved by applying the same to other architectures and likely finding an unified approach among them and move the whole code to be MI. More work in this area is expected to happen fairly soon. No MFC is previewed for this patch. Tested by: pluknet Reviewed by: jhb Approved by: re (kib)	2011-07-18 15:19:40 +00:00
Neel Natu	14ddf164ba	Get rid of redundant initialization of 'dmask'. It was being re-initialized shortly afterwards.	2011-07-06 21:40:48 +00:00
Jung-uk Kim	f0b28f005e	Correct cpu_monitor() and cpu_mwait() for amd64. These instructions take %rcx as "extensions" in long mode. If any unused bit is set in %rcx, these instructions cause general protection fault. Fix style nits and synchronize i386 with amd64.	2011-07-05 18:42:10 +00:00
Attilio Rao	470107b2f1	MFC	2011-07-04 11:13:00 +00:00
Alan Cox	80788b2a27	When iterating over a paging queue, explicitly check for PG_MARKER, instead of relying on zeroed memory being interpreted as an empty PV list. Reviewed by: kib	2011-07-02 23:42:04 +00:00
Peter Grehan	23300944db	IFC @ r223696 to pick up dfr's userboot	2011-06-30 17:37:42 +00:00
Jonathan Anderson	12bc222e57	Add some checks to ensure that Capsicum is behaving correctly, and add some more explicit comments about what's going on and what future maintainers need to do when e.g. adding a new operation to a sys_machdep.c. Approved by: mentor(rwatson), re(bz)	2011-06-30 10:56:02 +00:00
Attilio Rao	7b744f6b01	MFC	2011-06-30 10:19:43 +00:00
Alan Cox	6bbee8e28a	Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib	2011-06-29 16:40:41 +00:00
Jonathan Anderson	24c1c3bf71	We may split today's CAPABILITIES into CAPABILITY_MODE (which has to do with global namespaces) and CAPABILITIES (which has to do with constraining file descriptors). Just in case, and because it's a better name anyway, let's move CAPABILITIES out of the way. Also, change opt_capabilities.h to opt_capsicum.h; for now, this will only hold CAPABILITY_MODE, but it will probably also hold the new CAPABILITIES (implying constrained file descriptors) in the future. Approved by: rwatson Sponsored by: Google UK Ltd	2011-06-29 13:03:05 +00:00
Peter Grehan	a5615c9044	IFC @ r222830	2011-06-28 06:26:03 +00:00
Attilio Rao	6b6603b30e	Remove the pc_cpumask usage from amd64. Reviewed by: alc Tested by: pluknet	2011-06-26 21:36:53 +00:00
Attilio Rao	de138ec703	MFC	2011-06-24 16:35:40 +00:00
John Baldwin	1368987ae4	Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to the x86 tree. The $PIR code is still only enabled on i386 and not amd64. While here, make the qpi(4) driver on conditional on 'device pci'.	2011-06-22 21:04:13 +00:00
Attilio Rao	9b571ec6b3	MFC	2011-06-22 19:42:32 +00:00
John Baldwin	e8f40e32eb	Oops, missed these in 223424. Reported by: jkim	2011-06-22 18:48:07 +00:00
John Baldwin	3bf59bd14f	Use uintXX_t instead of u_intXX_t.	2011-06-22 17:55:16 +00:00
John Baldwin	38d7a61ba4	Add a helper routine to conditionally modify the start address of a resource allocation from an x86 Host-PCI bridge driver so that it can be reused by the ACPI Host-PCI bridge driver (and eventually the MPTable Host-PCI bridge driver) instead of duplicating the same logic. Note that this means that hw.acpi.host_mem_start is now replaced with the hw.pci.host_mem_start tunable that was already used in the non-ACPI case. This also removes hw.acpi.host_mem_start on ia64 where it was not applicable (the implementation was very x86-specific). While here, adjust the logic to apply the new start address on any "wildcard" allocation even if that allocation comes from a subset of the allowable address range. Reviewed by: imp (1)	2011-06-22 16:15:15 +00:00
Attilio Rao	2e4cefa632	Remove the usage of pc_other_cpus from amd64. Tested by: pluknet	2011-06-21 09:19:38 +00:00
Konstantin Belousov	1c23d0f727	Fix vfork. Add comments.	2011-06-18 12:13:28 +00:00
Hans Petter Selasky	144b716627	Enable USB 3.0 support by default in i386 and amd64 GENERIC kernels. Discussed with: joel @ and thompsa @ MFC after: 7 days	2011-06-14 20:30:49 +00:00
Joel Dahl	701b698b6f	Enable sound support by default on i386 and amd64. The generic sound driver has been added, along with enough device-specific drivers to support the most common audio chipsets. We've discussed enabling it from time to time over the years and we've received numerous requests from users, so we decided that shipping 9.0 with working audio by default would be the best thing to do. Bug reports should be sent to the multimedia@ mailing list, as usual. Approved by: mav No objection: re	2011-06-11 09:08:46 +00:00
John Baldwin	049dc0d1ff	Implement BUS_ADJUST_RESOURCE() for the x86 drivers that sit between the Host-PCI bridge drivers and nexus.	2011-06-10 12:30:16 +00:00
Andriy Gapon	234dab4a82	remove code for dynamic offlining/onlining of CPUs on x86 The code has definitely been broken for SCHED_ULE, which is a default scheduler. It may have been broken for SCHED_4BSD in more subtle ways, e.g. with manually configured CPU affinities and for interrupt devilery purposes. We still provide a way to disable individual CPUs or all hyperthreading "twin" CPUs before SMP startup. See the UPDATING entry for details. Interaction between building CPU topology and disabling CPUs still remains fuzzy: topology is first built using all availble CPUs and then the disabled CPUs should be "subtracted" from it. That doesn't work well if the resulting topology becomes non-uniform. This work is done in cooperation with Attilio Rao who in addition to reviewing also provided parts of code. PR: kern/145385 Discussed with: gcooper, ambrisko, mdf, sbruno Reviewed by: attilio Tested by: pho, pluknet X-MFC after: never	2011-06-08 08:12:15 +00:00
Attilio Rao	74e4245e3f	Bring back the number of CPU to 32.	2011-06-07 08:05:23 +00:00
Attilio Rao	81c02539f1	MFC	2011-06-06 21:38:39 +00:00
Andriy Gapon	ecee337a8c	don't use cpuid level 4 in x86 cpu topology detection if it's not supported This regression was introduced in r213323. There are probably no Intel cpus that support amd64 mode, but do not support cpuid level 4, but it's better to keep i386 and amd64 versions of this code in sync. Discovered by: pho Tested by: pho MFC after: 2 weeks	2011-06-06 14:23:13 +00:00
John Baldwin	8b28761278	Some tweaks to the CPUID support: - Don't always pass the cpuid request to the current CPU as some nodes we will emulate purely in software. - Pass in the APIC ID of the virtual CPU so we can return the proper APIC ID. - Always report a completely flat topology with no SMT or multicore. - Report the CPUID2_HV feature and implement support for the 0x40000000 CPUID level. - Use existing constants from <machine/specialreg.h> when possible and use cpu_feature2 when checking for VMX support.	2011-06-02 14:04:07 +00:00
John Baldwin	b3996dd47c	Add a 'show vmcs' DDB command to dump state about the current CPU's current VMCS.	2011-06-02 13:49:19 +00:00
Kevin Lo	a92e80be3f	Bring back r222275. runfw(4) will statically link in rt2870.fw.uu to the kernel, though I have MODULES_OVERRIDE="" in GENERIC. Spotted by: thompsa	2011-05-25 10:04:13 +00:00
Kevin Lo	6d5ee6cd7f	run(4) needs firmware loaded to work	2011-05-25 04:46:48 +00:00
Peter Grehan	87c3644c64	IFC @ r222256	2011-05-24 15:39:34 +00:00
Attilio Rao	d955f0fccf	Revert a patch that involountary sneaked in while I was MFCing.	2011-05-23 23:51:01 +00:00
Attilio Rao	a9ff18a210	MFC	2011-05-23 01:17:30 +00:00
Neel Natu	ad54f37429	Fix a long standing bug in VMXCTX_GUEST_RESTORE(). There was an assumption by the "callers" of this macro that on "return" the %rsp will be pointing to the 'vmxctx'. The macro was not doing this and thus when trying to restore host state on an error from "vmlaunch" or "vmresume" we were treating the memory locations on the host stack as 'struct vmxctx'. This led to all sorts of weird bugs like double faults or invalid instruction faults. This bug is exposed by the -O2 option used to compile the kernel module. With the -O2 flag the compiler will optimize the following piece of code: int loopstart = 1; ... if (loopstart) { loopstart = 0; vmx_launch(); } else vmx_resume(); into this: vmx_launch(); Since vmx_launch() and vmx_resume() are declared to be __dead2 functions the compiler is free to do this. The compiler has no way to know that the functions return indirectly through vmx_setjmp(). This optimization in turn leads us to trigger the bug in VMXCTX_GUEST_RESTORE(). With this change we can boot a 8.1 guest on a 9.0 host. Reported by: jhb@	2011-05-20 03:23:09 +00:00
Neel Natu	3caf3beb5c	Avoid unnecessary sign extension when promoted to a 64-bit integer. This was benign because the interruption info field is a 32-bit quantity and the hardware guarantees that the upper 32-bits are all zeros. But it did make reading the objdump output very confusing.	2011-05-20 02:08:05 +00:00
Peter Grehan	1f3025e133	Changes to allow the GENERIC+bhye kernel built from this branch to run as a 1/2 CPU guest on an 8.1 bhyve host. bhyve/inout.c inout.h fbsdrun.c - Rather than exiting on accesses to unhandled i/o ports, emulate hardware by returning -1 on reads and ignoring writes to unhandled ports. Support the previous mode by allowing a 'strict' parameter to be set from the command line. The 8.1 guest kernel was vastly cut down from GENERIC and had no ISA devices. Booting GENERIC exposes a massive amount of random touching of i/o ports (hello syscons/vga/atkbdc). bhyve/consport.c dev/bvm/bvm_console.c - implement a simplistic signature for the bvm console by returning 'bv' for an inw on the port. Also, set the priority of the console to CN_REMOTE if the signature was returned. This works better in an environment where multiple consoles are in the kernel (hello syscons) bhyve/rtc.c - return 0 for the access to RTC_EQUIPMENT (yes, you syscons) amd64/vmm/x86.c x86.h - hide a bunch more CPUID leaf 1 bits from the guest to prevent cpufreq drivers from probing. The next step will be to move CPUID handling completely into user-space. This will allow the full spectrum of changes from presenting a lowest-common-denominator CPU type/feature set, to exposing (almost) everything that the host can support. Reviewed by: neel Obtained from: NetApp	2011-05-19 21:53:25 +00:00
Attilio Rao	5f6b159db7	MFC	2011-05-18 16:01:29 +00:00
Jung-uk Kim	2b052e43be	Update CPUID bits to reflect AMD Bulldozer and Intel Sandy Bridge features. Note AMD dropped SSE5 extensions in order to avoid ISA overlap with Intel AVX instructions. The SSE5 bit was recycled as XOP extended instruction bit, CVT16 was deprecated in favor of F16C (half-precision float conversion instructions for AVX), and the remaining FMA4 (4-operand FMA instructions) gained a separate CPUID bit. Replace non-existent references with today's CPUID specifications.	2011-05-17 22:36:16 +00:00
John Baldwin	e22b232b0e	Enable handling of 1GB pages in the direct map since HEAD supports those. Submitted by: neel	2011-05-15 02:09:12 +00:00
John Baldwin	34a6b2d627	First cut at porting the kernel portions of 221828 and 221905 from the BHyVe reference branch to HEAD.	2011-05-14 20:35:01 +00:00
Peter Grehan	6c4c7d0f96	bhyve import part 2 of 2, guest kernel changes. This branch is now considered frozen: future bhyve development will take place in a branch off -CURRENT. sys/dev/bvm/bvm_console.c sys/dev/bvm/bvm_dbg.c - simple console driver/gdb debug port used for bringup. supported by user-space bhyve executable sys/conf/options.amd64 sys/amd64/amd64/minidump_machdep.c - allow NKPT to be set in the kernel config file sys/amd64/conf/GENERIC - mptable config options; bhyve user-space executable creates an mptable with number of CPUs, and optional vendor extension - add bvm console/debug - set NKPT to 512 to allow loading of large RAM disks from the loader - include kdb/gdb sys/amd64/amd64/local_apic.c sys/amd64/amd64/apic_vector.S sys/amd64/include/specialreg.h - if x2apic mode available, use MSRs to access the local APIC, otherwise fall back to 'classic' MMIO mode sys/amd64/amd64/mp_machdep.c - support AP spinup on CPU models that don't have real-mode support by overwriting the real-mode page with a message that supplies the bhyve user-space executable with enough information to start the AP directly in 64-bit mode. sys/amd64/amd64/vm_machdep.c - insert pause statements into cpu shutdown busy-wait loops sys/dev/blackhole/blackhole.c sys/modules/blackhole/Makefile - boot-time loadable module that claims all PCI bus/slot/funcs specified in an env var that are to be used for PCI passthrough sys/amd64/amd64/intr_machdep.c - allow round-robin assignment of device interrupts to CPUs to be disabled from the loader sys/amd64/include/bus.h - convert string ins/outs instructions to loops of individual in/out since bhyve doesn't support these yet sys/kern/subr_bus.c - if the device was no created with a fixed devclass, then remove it's association with the devclass it was associated with during probe. Otherwise, new drivers do not get a chance to probe/attach since the device will stay married to the first driver that it probed successfully but failed to attach. Sponsored by: NetApp, Inc.	2011-05-14 18:37:24 +00:00
Attilio Rao	b2aa562e7b	MFC	2011-05-13 20:58:48 +00:00
Matthew D Fleming	cfb00e5aa7	Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk. Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware. Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment). Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit. Requested by: alc MFC after: 1 week MFC with: r221853	2011-05-13 19:35:01 +00:00
Peter Grehan	366f60834f	Import of bhyve hypervisor and utilities, part 1. vmm.ko - kernel module for VT-x, VT-d and hypervisor control bhyve - user-space sequencer and i/o emulation vmmctl - dump of hypervisor register state libvmm - front-end to vmm.ko chardev interface bhyve was designed and implemented by Neel Natu. Thanks to the following folk from NetApp who helped to make this available: Joe CaraDonna Peter Snyder Jeff Heller Sandeep Mann Steve Miller Brian Pawlowski	2011-05-13 04:54:01 +00:00
Attilio Rao	ef607a6aa3	MFC	2011-05-12 14:01:40 +00:00
Dmitry Chagin	98cde5eede	Remove wrong comment. MFC after: 1 week.	2011-05-11 17:57:15 +00:00
Jung-uk Kim	00c885e181	Add SC_PIXEL_MODE to GENERIC for amd64 and i386. Requested by: many	2011-05-10 16:44:16 +00:00
Attilio Rao	bd55ede060	MFC	2011-05-09 18:53:13 +00:00
Jung-uk Kim	65e7d70b09	Implement boot-time TSC synchronization test for SMP. This test is executed when the user has indicated that the system has synchronized TSCs or it has P-state invariant TSCs. For the former case, we may clear the tunable if it fails the test to prevent accidental foot-shooting. For the latter case, we may set it if it passes the test to notify the user that it may be usable.	2011-05-09 17:34:00 +00:00
Attilio Rao	aa8b9e0706	MFC	2011-05-06 22:45:33 +00:00
Andriy Gapon	fdf30d59a6	prepare code that does topology detection for amd cpus for bulldozer This also introduces a new detection path for family 10h and newer pre-bulldozer cpus, pre-10h hardware should not be affected. Tested by: Gary Jennejohn <gljennjohn@googlemail.com> (with pre-10h hardware) MFC after: 2 weeks	2011-05-06 13:51:54 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architecture. cpumask_t on the other side is implemented as an array of longs, and easilly extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, needs to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primirally done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to came soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Attilio Rao	8c0ef2464e	Revert md_assert_preempt() introduction. Discussed with: jeff, jhb	2011-05-04 20:29:40 +00:00
Attilio Rao	94ebcddde3	MFC	2011-05-03 18:57:46 +00:00
John Baldwin	6162795be0	Enable the new PCI-PCI bridge driver on amd64 and i386 by default. It can be disabled via 'nooptions NEW_PCIB'.	2011-05-03 18:23:11 +00:00
John Baldwin	83c41143ca	Reimplement how PCI-PCI bridges manage their I/O windows. Previously the driver would verify that requests for child devices were confined to any existing I/O windows, but the driver relied on the firmware to initialize the windows and would never grow the windows for new requests. Now the driver actively manages the I/O windows. This is implemented by allocating a bus resource for each I/O window from the parent PCI bus and suballocating that resource to child devices. The suballocations are managed by creating an rman for each I/O window. The suballocated resources are mapped by passing the bus_activate_resource() call up to the parent PCI bus. Windows are grown when needed by using bus_adjust_resource() to adjust the resource allocated from the parent PCI bus. If the adjust request succeeds, the window is adjusted and the suballocation request for the child device is retried. When growing a window, the rman_first_free_region() and rman_last_free_region() routines are used to determine if the front or end of the existing I/O window is free. From using that, the smallest ranges that need to be added to either the front or back of the window are computed. The driver will first try to grow the window in whichever direction requires the smallest growth first followed by the other direction if that fails. Subtractive bridges will first attempt to satisfy requests for child resources from I/O windows (including attempts to grow the windows). If that fails, the request is passed up to the parent PCI bus directly however. The PCI-PCI bridge driver will try to use firmware-assigned ranges for child BARs first and only allocate a "fresh" range if that specific range cannot be accommodated in the I/O window. This allows systems where the firmware assigns resources during boot but later wipes the I/O windows (some ACPI BIOSen are known to do this) to "rediscover" the original I/O window ranges. The ACPI Host-PCI bridge driver has been adjusted to correctly honor hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge makes a wildcard request for an I/O window range. The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option is enabled. This is a transition aide to allow platforms that do not yet support bus_activate_resource() and bus_adjust_resource() in their Host-PCI bridge drivers (and possibly other drivers as needed) to use the old driver for now. Once all platforms support the new driver, the kernel option and old driver will be removed. PR: kern/143874 kern/149306 Tested by: mav	2011-05-03 17:37:24 +00:00
Attilio Rao	7be8a2de4f	MFC @ r221324	2011-05-02 14:23:36 +00:00
John Baldwin	d2c9344ff9	Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver, generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge drivers.	2011-05-02 14:13:12 +00:00
Bernhard Schmidt	d1f25d5dcb	Add the remaining wireless drivers. Discussed with: joel	2011-05-01 13:26:34 +00:00
Attilio Rao	f1edea81ac	Add the function md_assert_nopreempt(), which is a very consistent function on the possibility of a thread to not preempt. As this function is very tied to x86 (interrupts disabled checkings) it is not intended to be used in MI code.	2011-04-30 23:12:37 +00:00
Kevin Lo	5aaea65247	Add urtw(4)	2011-04-29 06:36:39 +00:00
Jung-uk Kim	c34e9dbee1	Define "Hypervisor Present" bit. This bit is used by several hypervisors to identify CPUs running under emulation. Currently QEMU-KVM, Xen-HVM, VMware, and MS Hyper-V are known to set this bit. MFC after: 3 days	2011-04-28 22:23:39 +00:00
Attilio Rao	2be767e069	Add the watchdogs patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let watchdog fire, fi they take too long. I implemented the stubs this way because I really want wdog_kern_* KPI to not be dependant by SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also avoid to call them when not necessary (avoiding not-volountary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks	2011-04-28 16:02:05 +00:00
Rick Macklem	4309e17add	This patch changes head so that the default NFS client is now the new NFS client (which I guess is no longer experimental). The fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs". Although mounts via fstype "nfs" will usually work without userland changes, an updated mount_nfs(8) binary is needed for kernels built with "options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and mount(8) binaries are needed to do mounts for fstype "oldnfs". The GENERIC kernel configs have been changed to use options NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER. For kernels being used on diskless NFS root systems, "options NFSCL" must be in the kernel config. Discussed on freebsd-fs@.	2011-04-27 17:51:51 +00:00
Alexander Motin	0d307e0905	- Add shim to simplify migration to the CAM-based ATA. For each new adaX device in /dev/ create symbolic link with adY name, trying to mimic old ATA numbering. Imitation is not complete, but should be enough in most cases to mount file systems without touching /etc/fstab. - To know what behavior to mimic, restore ATA_STATIC_ID option in cases where it was present before. - Add some more details to UPDATING.	2011-04-26 17:01:49 +00:00
Maxim Sobolev	f30bc1f3b5	With the typical memory size of the system in tenth of gigabytes counting memory being dumped in 16MB increments is somewhat silly. Especially if the dump fails and everything you've got for debugging is screen filled with numbers in 16 decrements... Replace that with percentage-based progress with max 10 updates all fitting into one line. Collapse other very "useful" piece of crash information (total ram) into the same line to save some more space. MFC after: 1 week	2011-04-26 16:14:55 +00:00
Rick Macklem	7c208ed659	Fix the experimental NFS client so that it does not bogusly set the f_flags field of "struct statfs". This had the interesting effect of making the NFSv4 mounts "disappear" after r221014, since NFSMNT_NFSV4 and MNT_IGNORE became the same bit. Move the files used for a diskless NFS root from sys/nfsclient to sys/nfs in preparation for them to be used by both NFS clients. Also, move the declaration of the three global data structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c so that they are defined when either client uses them. Reviewed by: jhb MFC after: 2 weeks	2011-04-25 22:22:51 +00:00
Alexander Motin	97b53e3634	Switch the GENERIC kernels for all architectures to the new CAM-based ATA stack. It means that all legacy ATA drivers are disabled and replaced by respective CAM drivers. If you are using ATA device names in /etc/fstab or other places, make sure to update them respectively (adX -> adaY, acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential numbers for each type in order of detection, unless configured otherwise with tunables, see cam(4)). ataraid(4) functionality is now supported by the RAID GEOM class. To use it you can load geom_raid kernel module and use graid(8) tool for management. Instead of /dev/arX device names, use /dev/raid/rX.	2011-04-24 08:58:58 +00:00
Konstantin Belousov	3136faa59d	Make pmap_invalidate_cache_range() available for consumption on amd64. Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU cache for the set of pages, which are not neccessary mapped. Since its supposed use is to prepare the move of the pages ownership to a device that does not snoop all CPU accesses to the main memory (read GPU in GMCH), do not rely on CPU self-snoop feature. amd64 implementation takes advantage of the direct map. On i386, extract the helper pmap_flush_page() from pmap_page_set_memattr(), and use it to make a temporary mapping of the flushed page. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2011-04-18 21:24:42 +00:00
Jung-uk Kim	0e72764232	Add a function rdtsc32() to read lower 32 bits from TSC and discard upper 32 bits. Some times compiler inserts unnecessary instructions to preserve unused upper 32 bits even when it is casted to a 32-bit value. It reduces such compiler mistakes where every cycle counts.	2011-04-14 16:53:32 +00:00
Jung-uk Kim	4854ae249c	Consistently use __volatile as the rest of this file.	2011-04-14 16:19:41 +00:00
Jung-uk Kim	f5ac47f44c	Prefer C99 standard integers to reduce diff from i386 version.	2011-04-14 16:14:35 +00:00
Jung-uk Kim	a7817c7ae5	Reduce errors in effective frequency calculation.	2011-04-12 23:49:07 +00:00
Jung-uk Kim	b9e4376214	Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and MPERF MSRs are available. It was disabled in r216443. Remove the earlier hack to subtract 0.5% from the calibrated frequency as DELAY(9) is little bit more reliable now.	2011-04-12 23:04:01 +00:00
Jung-uk Kim	dd3e254ebd	Add forgotten declarations for tsc_perf_stat from the previous commit.	2011-04-12 22:22:01 +00:00
Jung-uk Kim	155094d77a	Probe capability to find effective frequency. When the TSC is P-state invariant, APERF/MPERF ratio can be used to find effective frequency.	2011-04-12 22:15:46 +00:00
Jung-uk Kim	3731174954	Add definitions for CPUID instruction 6, ECX information.	2011-04-12 22:12:23 +00:00
Konstantin Belousov	2140d9c83b	Remove setting of PCB_FULL_IRET at the places where we are going to call update_gdt_{f,g}sbase. The functions set the flag when td == curthread, and sysarch is always called with curthread. Reviewed by: jhb, jkim MFC after: 1 week	2011-04-08 21:27:31 +00:00
Konstantin Belousov	5ab73cbcba	Disable local interrupts before testing the PCB_FULL_IRET flag. Thread might be preempted after testing, which causes the flag to be cleared. If ast was not delivered, we will do sysret with potentially wrong fs/gs bases. Reviewed by: jhb, jkim MFC after: 1 week (together with r220430, r220452)	2011-04-08 21:26:50 +00:00
Ryan Stone	7d6a0bf373	Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi and machdep.kdb_on_nmi. Approved by: emaste (mentor) MFC after: 1 week	2011-04-08 14:39:41 +00:00
John Baldwin	13fb631aff	Fix a bug in the previous change to restore the fast path for syscall return. The ast() function may cause a context switch in which case PCB_FULL_IRET would be set in the pcb. However, the code was not rechecking the flag after ast() returned and would not properly restore the FSBASE and GSBASE MSRs. To fix, recheck the PCB_FULL_IRET flag after ast() returns. While here, trim an instruction (and memory access) from the doreti path and fix a typo in a comment. MFC after: 1 week	2011-04-08 13:33:57 +00:00
John Baldwin	ff265077cf	Catch up to PCB_FULL_IRET becoming a pcb flag rather than a full field. MFC after: 3 days	2011-04-08 13:30:48 +00:00
Jung-uk Kim	3453537fa5	Use atomic load & store for TSC frequency. It may be overkill for amd64 but safer for i386 because it can be easily over 4 GHz now. More worse, it can be easily changed by user with 'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly). Note it is intentionally not used in performance critical paths to avoid performance regression (but we should, in theory). Alternatively, we may add "virtual TSC" with lower frequency if maximum frequency overflows 32 bits (and ignore possible incoherency as we do now).	2011-04-07 23:28:28 +00:00
John Baldwin	615d2dffa3	pcb_flags is an int, so use testl rather than testq. Pointy hat to: jhb Submitted by: jkim MFC after: 1 week	2011-04-07 23:13:22 +00:00
John Baldwin	1438b4ced1	If a system call does not request a full interrupt return, use a fast path via the sysretq instruction to return from the system call. This was removed in 190620 and not quite fully restored in 195486. This resolves most of the performance regression in system call microbenchmarks between 7 and 8 on amd64. Reviewed by: kib MFC after: 1 week	2011-04-07 21:32:25 +00:00
Jung-uk Kim	efd393d539	Remove stale checks for RDTSC support. amd64 must have TSC support anyway.	2011-04-07 21:29:34 +00:00
Konstantin Belousov	7332c129e0	Add support for executing the FreeBSD 1/i386 a.out binaries on amd64. In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64. The changes are hidden under COMPAT_43. MFC after: 1 month	2011-04-01 11:16:29 +00:00
Andriy Gapon	a930718af1	Revert r220032:linux compat: add SO_PASSCRED option with basic handling I have not properly thought through the commit. After r220031 (linux compat: improve and fix sendmsg/recvmsg compatibility) the basic handling for SO_PASSCRED is not sufficient as it breaks recvmsg functionality for SCM_CREDS messages because now we would need to handle sockcred data in addition to cmsgcred. And that is not implemented yet. Pointyhat to: avg	2011-03-31 08:14:51 +00:00
Adrian Chadd	dba9c85977	Break out the ath PCI logic into a separate device/module. Introduce the AHB glue for Atheros embedded systems. Right now it's hard-coded for the AR9130 chip whose support isn't yet in this HAL; it'll be added in a subsequent commit. Kernel configuration files now need both 'ath' and 'ath_pci' devices; both modules need to be loaded for the ath device to work.	2011-03-31 08:07:13 +00:00
Edward Tomasz Napierala	72bcfc9693	Revert part of r220137, committed by mistake - RACCT is _not_ supposed to be enabled in GENERIC.	2011-03-29 18:16:49 +00:00
Edward Tomasz Napierala	097055e26d	Add racct. It's an API to keep per-process, per-jail, per-loginclass and per-loginclass resource accounting information, to be used by the new resource limits code. It's connected to the build, but the code that actually calls the new functions will come later. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-29 17:47:25 +00:00
Alan Cox	1c675a3bc3	The new binutils has correctly redefined MAXPAGESIZE on amd64 as 0x200000 instead of 0x100000. As a side effect, an amd64 kernel now loads at physical address 0x200000 instead of 0x100000. This is probably for the best because it avoids the use of a 2MB page mapping for the first 1MB of the kernel that also spans the fixed MTRRs. However, getmemsize() still thinks that the kernel loads at 0x100000, and so the physical memory between 0x100000 and 0x200000 is lost. Fix this problem by replacing the hard-wired constant in getmemsize() by a symbol "kernphys" that is defined by the linker script. In collaboration with: kib	2011-03-28 06:35:17 +00:00
Alan Cox	63740078e0	Amd64 doesn't have a lazypmap ipi.	2011-03-27 16:18:51 +00:00
Andriy Gapon	01a9e1a11b	linux compat: add SO_PASSCRED option with basic handling This seems to have been a part of a bigger patch by dchagin that either haven't been committed or committed partially. Submitted by: dchagin, nox MFC after: 2 weeks	2011-03-26 11:25:36 +00:00
Andriy Gapon	931f0826ea	linux compat: add non-dummy capget and capset system calls, regenerate And drop dummy definitions for those system calls. This may transiently break the build. PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 10:59:24 +00:00
Andriy Gapon	1f4ec5a3ba	linux compat: add non-dummy capget and capset system calls PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 10:51:56 +00:00
Dmitry Chagin	acface683e	Export the correct AT_PLATFORM value. Since signal trampolines are copied to the shared page do not need to leave place on the stack for it. Forgotten in the previous commit. MFC after: 1 Week	2011-03-26 09:25:35 +00:00
Alan Cox	1587dfd730	Move an external declaration to the appropriate header file.	2011-03-26 06:21:05 +00:00
Jung-uk Kim	cd45fec044	Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta CPUs. These CPUs need explicit MSR configuration to expose ceratin CPU capabilities (e.g., CMPXCHG8B) to work around compatibility issues with ancient software. Unfortunately, Rise mP6 does not set the CX8 bit in CPUID and there is no MSR to expose the feature although all mP6 processors are capable of CMPXCHG8B according to datasheets I found from the Net. Clean up and simplify VIA PadLock detection while I am in the neighborhood.	2011-03-26 02:02:07 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb	d2b74735b8	For now remove options FLOWTABLE from the remaining GENERIC kernel configurations and make it opt-in for those who want it. LINT will still build it. While it may be a perfect win in some scenarios, it still troubles users (see PRs) in general cases. In addition we are still allocating resources even if disabled by sysctl and still leak arp/nd6 entries in case of interface destruction. Discussed with: qingli (2010-11-24, just never executed) Discussed with: juli (OCTEON1) PR: kern/148018, kern/155604, kern/144917, kern/146792 MFC after: 2 weeks	2011-03-19 15:50:34 +00:00
Jung-uk Kim	38b8542ca9	Deprecate tsc_present as the last of its real consumers finally disappeared.	2011-03-15 17:19:52 +00:00
David Christensen	dd46ab31de	- Initial release of bxe(4) to support Broadcom NetXtreme II 10GbE. (BCM57710, BCM57711, BCM57711E) MFC after: One month	2011-03-14 22:42:41 +00:00
Dmitry Chagin	8f1e49a638	Enable shared page use for amd64/linux32 and i386/linux binaries. Move signal trampoline code from the top of the stack to the shared page. MFC after: 2 Weeks	2011-03-13 14:58:02 +00:00
Andriy Gapon	d549ef5638	add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls Regenerate system call and systrace support files. PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (earlier version) MFC after: 3 weeks	2011-03-12 08:58:19 +00:00
Andriy Gapon	56ede1074e	add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls This commits makes necessary changes in syscall/sysent generation infrastructure. PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (ealier version) MFC after: 3 weeks	2011-03-12 08:51:43 +00:00
Andriy Gapon	136882cf92	amd64/NOTES: use a greater number in KSTACK_PAGES example This is a minor cosmetic change - the users are more likely to want to increase (rather than decrease) default kernel stack size, which is already 4 pages on amd64. MFC after: 4 days	2011-03-11 19:21:42 +00:00
Matthew D Fleming	c77715ef6c	Mostly revert r219468, as I had misremembered the C standard regarding the size of an extern array. Keep one change from strncpy to strlcpy.	2011-03-11 18:56:55 +00:00
Jung-uk Kim	79422085d4	Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as a CPU ticker. Note tsc_present does not change by this tunable.	2011-03-11 00:44:32 +00:00
Matthew D Fleming	cd67ac41ae	Use MAXPATHLEN rather than the size of an extern array when copying the kernel name. Also consistenly use strlcpy(). Suggested by: Warner Losh	2011-03-10 22:56:00 +00:00
Jung-uk Kim	bc34c87e81	Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because it is almost always used with tsc_freq any way.	2011-03-10 20:02:58 +00:00
Julian Elischer	a8066a9d3b	Add a small change to the comment in the GENRIC config files that include udbp Submitted by: Chris Forgron, cforgeron at acsi dot ca MFC after: 1 week	2011-03-09 17:15:11 +00:00
Dmitry Chagin	e5d81ef1b5	Extend struct sysvec with new method sv_schedtail, which is used for an explicit process at fork trampoline path instead of eventhadler(schedtail) invocation for each child process. Remove eventhandler(schedtail) code and change linux ABI to use newly added sysvec method. While here replace explicit comparing of module sysentvec structure with the newly created process sysentvec to detect the linux ABI. Discussed with: kib MFC after: 2 Week	2011-03-08 19:01:45 +00:00
Dmitry Chagin	372f5e052f	Remove dead code. MFC after: 1 Week	2011-03-07 08:12:07 +00:00
Alan Cox	cb25117d54	Make a change to the implementation of the direct map to improve performance on processors that support 1 GB pages. Specifically, if the end of physical memory is not aligned to a 1 GB page boundary, then map the residual physical memory with multiple 2 MB page mappings rather than a single 1 GB page mapping. When a 1 GB page mapping is used for this residual memory, access to the memory is slower than when multiple 2 MB page mappings are used. (I suspect that the reason for this slowdown is that the TLB is actually being loaded with 4 KB page mappings for the residual memory.) X-MFC after: r214425	2011-03-02 00:24:07 +00:00
Robert Watson	74b5505e5d	Continue to introduce Capsicum capability mode: White list sysarch calls allowed in capability mode; arguably, there should be some link between the capability mode model and the privilege model here. Sysarch is a morass similar to ioctl, in many senses. Submitted by: anderson Discussed with: benl, kris, pjd Sponsored by: Google, Inc. Obtained from: Capsicum Project MFC after: 3 months	2011-03-01 13:35:48 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Alan Cox	e6ffa21488	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
Dmitry Chagin	dc4f0a9e11	To avoid excessive code duplication create wrapper for fill regs from stack frame. Change the trap() code to use newly created function instead of explicit regs assignment.	2011-02-16 17:50:21 +00:00
Dmitry Chagin	09d6cb0a23	For realtime signals fill the sigval value.	2011-02-15 21:46:36 +00:00
Dmitry Chagin	fde6316272	Sort include files in the alphabetical order.	2011-02-13 19:07:48 +00:00
Dmitry Chagin	222198ab0b	Move linux_clone(), linux_fork(), linux_vfork() to a MI path.	2011-02-12 18:17:12 +00:00
Dmitry Chagin	c8d6845e9e	In preparation for moving linux_clone() to a MI path introduce linux_set_upcall_kse().	2011-02-12 16:33:00 +00:00
Dmitry Chagin	2c7660ba3e	In preparation for moving linux_clone () to a MI path move the TLS code in a separate function. Use function parameter instead of direct using register.	2011-02-12 15:50:21 +00:00
Dmitry Chagin	9bd9b52478	Regen for r218610.	2011-02-12 15:36:25 +00:00
Dmitry Chagin	f91ea2518b	The fourth argument of linux_clone is a pointer to the TLS. Change clone syscall definition to match actual linux one.	2011-02-12 15:33:25 +00:00
Konstantin Belousov	6f9ec5aab0	Clear the padding when returning context to the usermode, for MI ucontext_t and x86 MD parts. Kernel allocates the structures on the stack, and not clearing reserved fields and paddings causes leakage. Noted and discussed with: bde MFC after: 2 weeks	2011-02-05 15:10:27 +00:00
Matthew D Fleming	08b163fa51	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week	2011-02-02 16:35:10 +00:00
Dmitry Chagin	77192fddeb	Regen for r218101. MFC after: 1 Month.	2011-01-30 20:38:26 +00:00
Dmitry Chagin	8d73c2bfd1	Change linux futex syscall definition to match actual linux one. MFC after: 1 Month.	2011-01-30 20:31:43 +00:00
Dmitry Chagin	9adaae9403	The kern_wait() code already removes the SIGCHLD signal for the waited process. Removing other SIGCHLD signals is not needed and may cause problems. Pointed out by: jilles MFC after: 1 Month.	2011-01-30 18:17:38 +00:00
Dmitry Chagin	572fb2e33e	My style(9) bug. Pointed out by: kib MFC after: 1 Month.	2011-01-29 07:22:33 +00:00
Dmitry Chagin	adc7ece00a	Implement a variation of the linux_common_wait() which should be used by linuxolator itself. Move linux_wait4() to MD path as it requires native struct rusage translation to struct l_rusage on linux32/amd64. MFC after: 1 Month.	2011-01-28 18:47:07 +00:00
Dmitry Chagin	53c74fc607	To avoid excessive code duplication move struct rusage translation to a separate function. MFC after: 1 Month.	2011-01-28 18:28:06 +00:00
Konstantin Belousov	77185f473b	linux_sigreturn() loads the struct trapframe from l_sigcontext members, thus making a signed extension of 32 bit register context. If the register is not touched in usermode between return from signal and next syscall entry, the sign-extension part of 64bit register is not cleared, causing linux32_fetch_syscall_args() to read wrong values. Use unsigned type for the registers in the linux sigcontext. Reported by: Jacob Frelinger <jacob.frelinger duke edu>, arundel In collaboration with: dchagin MFC after: 1 week	2011-01-27 21:45:38 +00:00
Dmitry Chagin	a5c1afadeb	Add macro to test the sv_flags of any process. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures. MFC after: 1 month	2011-01-26 20:03:58 +00:00
Matthew D Fleming	f89f7ada8d	Set td_kstack_pages for thread0. This was already being done for most architectures, but i386 and amd64 were missing it. Submitted by: Mohd Fahadullah <mfahadullah AT isilon DOT com>	2011-01-26 17:06:13 +00:00
Sergey Kandaurov	4053b05b91	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe	2011-01-21 10:26:26 +00:00
Konstantin Belousov	de64ee1a30	Use CTLFLAG_RDTUN for read-only sysctl that exports tunable. Reminded by: pjd MFC after: 6 days	2011-01-19 21:35:48 +00:00
Konstantin Belousov	9e52b8b629	Make the length of the LDT a loader tunable, machdep.max_ldt_segment, and export it with read-only sysctl. Remove unused defines. Reviewed by: jhb (previous version) MFC after: 1 week	2011-01-18 23:00:22 +00:00
Konstantin Belousov	a05c98a099	Use malloc(9) instead of kmem_alloc(9) for temporal copy of the user-supplied descriptor array. Noted and reviewed by: jhb (previous version) MFC after: 1 week	2011-01-18 22:56:10 +00:00
John Baldwin	6bd823f334	- Remove some always-true checks (checking for unsigned < 0). - Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT when descriptors are provided. Specifically, allow the 'start == 0' and 'num == 0' special case used to free all LDT entries that previously failed with EINVAL. Submitted by: clang via rdivacky (some of 1) Reviewed by: kib	2011-01-18 16:43:01 +00:00
Jung-uk Kim	2fea643112	Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set(). Compile sys/dev/mem/memutil.c for all supported platforms and remove now unnecessary dev_mem_md_init(). Consistently define mem_range_softc from mem.c for all platforms. Add missing #include guards for machine/memdev.h and sys/memrange.h. Clean up some nearby style(9) nits. MFC after: 1 month	2011-01-17 22:58:28 +00:00
Jung-uk Kim	df74996c3d	Avoid preemption while manipulating CRs and MTRRs. Tested by: ariff	2011-01-17 17:30:35 +00:00
Jung-uk Kim	bdbf2db5b2	Remove redundant, bogus, and even harmful uses of setting TS bit in CR0. It is done from fpstate_drop() when it is really necessary. Reviewed by: kib MFC after: 1 week	2011-01-14 21:09:01 +00:00
Matthew D Fleming	240577c2a7	Fix up a few more sysctl(9) mis-typing found in various LINT builds.	2011-01-13 18:20:27 +00:00
John Baldwin	072e9838e2	If an interrupt on an I/O APIC is moved to a different CPU after it has started to execute, it seems that the corresponding ISR bit in the "old" local APIC can be cleared. This causes the local APIC interrupt routine to fail to find an interrupt to service. Rather than panic'ing in this case, simply return from the interrupt without sending an EOI to the local APIC. If there are any other pending interrupts in other ISR registers, the local APIC will assert a new interrupt. Tested by: steve	2011-01-13 17:00:22 +00:00
Matthew D Fleming	fbbb13f962	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the kernel changes.	2011-01-12 19:54:19 +00:00
Konstantin Belousov	50a57dfbec	Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out. Comments wording and reviewed by: alc	2011-01-09 12:50:44 +00:00
Tijl Coosemans	d22e78d6b9	Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98 headers with stubs. Approved by: kib (mentor)	2011-01-08 18:09:48 +00:00
Konstantin Belousov	6297a3d843	Create shared (readonly) page. Each ABI may specify the use of page by setting SV_SHP flag and providing pointer to the vm object and mapping address. Provide simple allocator to carve space in the page, tailored to put the code with alignment restrictions. Enable shared page use for amd64, both native and 32bit FreeBSD binaries. Page is private mapped at the top of the user address space, moving a start of the stack one page down. Move signal trampoline code from the top of the stack to the shared page. Reviewed by: alc	2011-01-08 16:13:44 +00:00
Tijl Coosemans	a56e818f29	On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and corresponding macros) are different from 32 bit. [1] Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX. Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition for (u)intmax_t. Do this on all architectures for consistency. Suggested by: bde [1] Approved by: kib (mentor)	2011-01-08 12:43:05 +00:00
Tijl Coosemans	9858863cd4	Fix types of some values in machine/_limits.h. On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int. However, lacking integer suffixes for types smaller than int, their type should correspond to that of an object of type unsigned char (or short) when used in an expression with objects of type int. In that case unsigned char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and USHRT_MAX should also be int. Where MIN/MAX constants implicitly have the correct type the suffix has been removed. While here, correct some comments. Reviewed by: bde Approved by: kib (mentor)	2011-01-08 11:13:34 +00:00
Konstantin Belousov	39198f15ee	Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the initial stack protection set by the kernel image activator.	2011-01-07 14:22:34 +00:00
Jung-uk Kim	50e3cec377	Increase size of pcb_flags to four bytes. Requested by: bde, jhb	2010-12-22 19:57:03 +00:00
Jung-uk Kim	e6c006d96a	Improve PCB flags handling and make it more robust. Add two new functions for manipulating pcb_flags. These inline functions are very similar to atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK prefix for SMP. Add comments about the rationale[1]. Use these functions wherever possible. Although there are some places where it is not strictly necessary (e.g., a PCB is copied to create a new PCB), it is done across the board for sake of consistency. Turn pcb_full_iret into a PCB flag as it is safe now. Move rarely used fields before pcb_flags and reduce size of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in the neighborhood. Reviewed by: kib Submitted by: kib[1] MFC after: 2 months	2010-12-22 00:18:42 +00:00
Tijl Coosemans	81bd5041a2	Merge amd64 and i386 bus.h and move the resulting header to x86. Replace the original amd64 and i386 headers with stubs. Rename (AMD64\|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere. Reviewed by: imp (previous version), jhb Approved by: kib (mentor)	2010-12-20 16:39:43 +00:00
Konstantin Belousov	7222d2fbee	Inform a compiler which asm statements in the x86 implementation of atomics change eflags. Reviewed by: jhb MFC after: 2 weeks	2010-12-18 16:41:11 +00:00
Jung-uk Kim	e1c9d39ebe	Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This function always returned the nominal frequency instead of current frequency because we use RDTSC instruction to calculate difference in CPU ticks, which is supposedly constant for the case. Now we support cpu_get_nominal_mhz() for the case, instead. Note it should be just enough for most usage cases because cpu_est_clockrate() is often times abused to find maximum frequency of the processor.	2010-12-14 20:07:51 +00:00
Robert Watson	9c9f06e60d	Add options NO_ADAPTIVE_SX to the XENHVM kernel configuration, matching its similar disabling of adaptive mutexes and rwlocks. The existing comment on why this is the case also applies to sx locks. MFC after: 3 days Discussed with: attilio	2010-12-13 12:15:46 +00:00
Konstantin Belousov	60c7c84e85	In fpudna()/npxdna(), mark FPU context initialized and optionally mark user FPU context initialized, if current context is user context. It was reversed in r215865, by inadequate change of this code fragment to a call to fpuuserinited()/npxuserinited(). The issue is only relevant for in-kernel users of FPU. Reported by: Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net> Tested by: Mike Tancsa MFC after: 3 days	2010-12-12 16:16:39 +00:00
Robert Watson	996177338f	Derive the XENHVM kernel from GENERIC, adding only the options required to support PV drivers (such as xenpci), and non-adptive locking (along with a comment about why). This change eliminates the synchronisation problem between GENERIC and XENHVM, which had become severely rotted in HEAD, and in 8-STABLE included non-production kernel debugging features such as WITNESS. However, it comes at the cost of enabling devices and options that may not be present under Xen (such as random ethernet cards). For now, opt for a simpler kernel configuration file rather than using nooptions/ nodevice to enumerate and eliminate them. This leads to a somewhat larger XENHVM kernel. This is an MFC candidate for 8-STABLE before 8.2, in order to provide a production-worthy XENHVM kernel configuration for amd64. Discussed with: gibbs, cperciva Reported by: Piete Brooks <Piete.Brooks at cl.cam.ac.uk> Sponsored by: DARPA, AFRL MFC after: 3 days	2010-12-10 22:22:01 +00:00
Colin Percival	91ff9dc058	Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c (which are identical) with a single x86/x86/busdma_machdep.c.	2010-12-09 06:41:50 +00:00
Jung-uk Kim	71e0b05797	Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC. Remove a confusing comment about converting to MHz as we never did.	2010-12-08 23:40:41 +00:00
Colin Percival	af60888734	On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192, while on i386 we have MAX_BPAGES=512. Implement this difference via '#ifdef __i386__'. With this commit, the i386 and amd64 busdma_machdep.c files become identical; they will soon be replaced by a single file under sys/x86.	2010-12-08 20:20:10 +00:00
Colin Percival	ec195da48a	MFi386 r1.94: If XEN, make pmap_kextract = pmap_kextract_ma. This is a no-op currently, since FreeBSD/amd64 doesn't have (paravirtualized) Xen support, but if/when that support is ever added we'll want this, and until then it's harmless.	2010-12-08 19:52:04 +00:00
Colin Percival	81261a5a6d	MFi386 r1.81, r1.82, r1.84: Reorganize code to reduce cache pressure and branch mispredictions. No objections from: scottl	2010-12-08 19:42:21 +00:00
Jung-uk Kim	dd7d207dcb	Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86. Discussed with: avg	2010-12-08 00:09:24 +00:00
Jung-uk Kim	7214d5d75b	Remove stale comments about P-state invariant TSC and fix style(9) nits.	2010-12-07 22:43:25 +00:00
Jung-uk Kim	1bcc28295b	Do not register a event handler for CPU freqency changes when it is found P-state invariant. This is continuation of r216274.	2010-12-07 22:34:51 +00:00
Jung-uk Kim	4a9c4056dc	Now the P-state invariant TSC is probed early enough, do not register event handlers for CPU freqency changes when it is found P-state invariant. Adjust a comment about non-existent tsc_freq_max() while I am here.	2010-12-07 22:23:26 +00:00
Jung-uk Kim	78a661bbaa	Probe P-state invariant TSC from rightful place.	2010-12-07 22:12:02 +00:00
Konstantin Belousov	1b3c32568a	Update some comments related to use of amd64 full context switch. In exec_linux_setregs(), use locally cached pointer to pcb to set pcb_full_iret. In set_regs(), note that full return is needed when code that sets segment registers is enabled. MFC after: 1 week	2010-12-07 12:44:33 +00:00
Konstantin Belousov	0f0170e66a	Retire write-only PCB_FULLCTX pcb flag on amd64. Reminded by: Petr Salinger <Petr.Salinger seznam cz> Tested by: pho MFC after: 1 week	2010-12-07 12:17:43 +00:00
Konstantin Belousov	3e0ddb6781	Do not leak %rdx value in the previous image to the new image after execve(2). Note that ia32 binaries already handle this properly, since ia32_setregs() resets td_retval[1], but not exec_setregs(). We still do not conform to the amd64 ABI specification, since %rsp on the image startup is not aligned to 16 bytes. PR: amd64/124134 Discussed with: Petr Salinger <Petr.Salinger seznam cz> (who convinced me that there is indeed several bugs) MFC after: 1 week	2010-12-06 15:15:27 +00:00
Jung-uk Kim	2f7ab7e85d	Revert r216161. It is not necessary because we zero-fill BSS anyway. Requested by: jhb	2010-12-03 22:27:51 +00:00
Jung-uk Kim	b14fe63392	Explicitly initialize TSC frequency. To calibrate TSC frequency, we use DELAY(9) and it may use TSC in turn if TSC frequency is non-zero. MFC after: 3 days	2010-12-03 21:54:10 +00:00
Jung-uk Kim	e391a266ed	Do not change CPU ticker frequency if TSC is P-state invariant. Note this change was meant to be committed with r184102 (and its subsequent MFCs) but it fell off somehow. Pointyhat to: jkim MFC after: 3 days	2010-12-03 21:06:30 +00:00
Rebecca Cran	c90f7d9b44	Revert r216134. This checkin broke platforms where bus_space are macros: they need to be a single statement, and do { } while (0) doesn't work in this situation so revert until a solution can be devised.	2010-12-03 07:09:23 +00:00
Rebecca Cran	15b4888a24	Disallow passing in a count of zero bytes to the bus_space(9) functions. Passing a count of zero on i386 and amd64 for [I386\|AMD64]_BUS_SPACE_MEM causes a crash/hang since the 'loop' instruction decrements the counter before checking if it's zero. PR: kern/80980 Discussed with: jhb	2010-12-02 22:19:30 +00:00
Konstantin Belousov	c6fb218c3c	Calling fill_fpregs() for curthread is legitimate, and ELF coredump does this. Reported and tested by: pho MFC after: 5 days	2010-11-28 17:56:34 +00:00
Alan Cox	686b00d691	Make the size of the direct map easily configurable. Changing NDMPML4E now suffices. Increase the size of the direct map to 1TB. An earler version of this patch was tested by sbruno@.	2010-11-26 19:36:26 +00:00
Konstantin Belousov	5c6eb03790	Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs() functions, they are unused. Remove 'user' from npxgetuserregs() etc. names. For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU context storage. This eliminates the need for ugly copying with overwrite of the newly added and reserved fields in ucontext on i386 to satisfy alignment requirements for fpusave() and fpurstor(). pc98 version was copied from i386. Suggested and reviewed by: bde Tested by: pho (i386 and amd64) MFC after: 1 week	2010-11-26 14:50:42 +00:00
Tijl Coosemans	ce4ec51dbe	Merge amd64/i386 _align.h by aligning on the size of register_t (copied from powerpc). Reviewed by: imp, jhb Approved by: kib (mentor)	2010-11-26 10:59:20 +00:00
Ulrich Spörlein	02604cd4f4	Remove kernel support for BB profiling, now that kernbb(8) is gone, too. PR: bin/83558 Reviewed by: jkim	2010-11-26 08:11:43 +00:00
Dimitry Andric	1496505287	Apply the same fix as in r215823 to sys/amd64/amd64/fpu.c: use unambiguous inline assembly to load a float variable.	2010-11-25 22:19:40 +00:00
Dimitry Andric	cfe92f33bc	Change ambiguous (or invalid, depending on how strict you want to be :) assembly instruction "movw %rcx,2(%rax)" to "movw %cx,2(%rax)", since the intent was to move 16 bits of data, in this case. Found by: clang Reviewed by: kib	2010-11-24 18:35:11 +00:00
Jung-uk Kim	d2d0fda841	Remove a stale tunable introduced in r215703.	2010-11-23 17:28:23 +00:00
Jung-uk Kim	42ca4a29de	Reinitialize PAT MSR via pmap_init_pat() while resuming. This function does better job since r215703 and it is safer now.	2010-11-23 16:12:35 +00:00
Andriy Gapon	9b984feb3d	specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX CPUID.6 is defined as Thermal and Power Management Leaf by both Intel and AMD. Reviewed by: jhb MFC after: 7 days	2010-11-23 13:55:30 +00:00
Jung-uk Kim	7dd052c1d9	- Disable caches and flush caches/TLBs when we update PAT as we do for MTRR. Flushing TLBs is required to ensure cache coherency according to the AMD64 architecture manual. Flushing caches is only required when changing from a cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or UC-). Since this function is only used once per processor during startup, there is no need to take any shortcuts. - Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC. Program 5 as WP (from default WT) and 6 as WC (from default UC-). Leave 4 and 7 at the default of WB and UC. This is to avoid transition from a cacheable memory type to an uncacheable type to minimize possible cache incoherency. Since we perform flushing caches and TLBs now, this change may not be necessary any more but we do not want to take any chances. - Remove Apple hardware specific quirks. With the above changes, it seems this hack is no longer needed. - Improve pmap_cache_bits() with an array to map PAT memory type to index. This array is initialized early from pmap_init_pat(), so that we do not need to handle special cases in the function any more. Now this function is identical on both amd64 and i386. Reviewed by: jhb Tested by: RM (reuf_m at hotmail dot com) Ryszard Czekaj (rychoo at freeshell dot net) army.of.root (army dot of dot root at googlemail dot com) MFC after: 3 days	2010-11-22 19:52:44 +00:00
Andriy Gapon	b43d292565	specialreg.h: add definitions for MPERF/APERF pair of MSRs These MSRs can be used to determine actual (average) performance as compared to a maximum defined performance. Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both Intel and AMD processors. MFC after: 5 days	2010-11-19 15:07:36 +00:00
Andriy Gapon	7af7c7624a	specialreg.h: add AMD-specific "Hardware Configuration Register" MSR It seems that this MSR has been available in a range of AMD processors families for quite a while now. Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in the i386 version. Note2: perhaps some additional name component is needed to distinguish AMD-specific MSRs. MFC after: 5 days	2010-11-19 15:00:20 +00:00
Andriy Gapon	8fd6d51347	specialreg.h: add definition for AMD Core Performance Boost bit This bit indicates availability of the feature. MFC after: 4 days	2010-11-19 14:46:17 +00:00
Jung-uk Kim	816b3bd1b0	Restore CR0 after MTRR initialization for correctness sakes. There will be no noticeable change because we enable caches before we enter here for both BSP and AP cases. Remove another pointless optimization for CR4.PGE bit while I am here.	2010-11-16 23:26:02 +00:00
Jung-uk Kim	50083a5624	Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this code but probably it only worked by chance because modifying CR4.PGE bit causes invlidation of entire TLBs. Since these are very rare events, this micro-optimization seems useless. Reviewed by: jhb	2010-11-16 22:44:58 +00:00
Konstantin Belousov	7022f954c3	Do not use __FreeBSD_version prefix for the special osrel version. The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix. Reported and tested by: swell.k gmail com MFC after: 1 week	2010-11-14 21:59:11 +00:00
Konstantin Belousov	94bce4535d	Use symbolic names instead of hardcoding values for magic p_osrel constants. MFC after: 1 week	2010-11-14 18:24:12 +00:00
Jung-uk Kim	19da400c64	Move identical copies of apm_bios.h to sys/x86/include, replace them with stubs, and adjust PC98 stub accordingly. Reviewed by: imp, nyan	2010-11-11 19:36:21 +00:00
Andriy Gapon	290e14f881	amd64: introduce minidump version 2 After KVA space was increased to 512GB on amd64 it became impractical to use PTEs as entries in the minidump map of dumped pages, because size of that map alone would already be 1GB. Instead, we now use PDEs as page map entries and employ two stage lookup in libkvm: virtual address -> PDE -> PTE -> physical address. PTEs are now dumped as regular pages. Fixed page map size now is 2MB. libkvm keeps support for accessing amd64 minidumps of version 1. Support for 1GB pages is added. Many thanks to Alan Cox for his guidance, numerous reviews, suggestions, enhancments and corrections. Reviewed by: alc [kernel part] MFC after: 15 days	2010-11-11 18:35:28 +00:00
Jung-uk Kim	93a8847473	Make APM emulation look more closer to its origin. Use device_get_softc(9) instead of hardcoding acpi(4) unit number as we have device_t for it.	2010-11-10 18:50:12 +00:00
Jung-uk Kim	7c2bf852d7	Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new file acpi_apm.c, and place it on sys/x86/acpica.	2010-11-10 01:29:56 +00:00
John Baldwin	961135ead8	- Remove <machine/mutex.h>. Most of the headers were empty, and the contents of the ones that were not empty were stale and unused. - Now that <machine/mutex.h> no longer exists, there is no need to allow it to override various helper macros in <sys/mutex.h>. - Rename various helper macros for low-level operations on mutexes to live in the _mtx_* or __mtx_* namespaces. While here, change the names to more closely match the real API functions they are backing. - Drop support for including <sys/mutex.h> in assembly source files. Suggested by: bde (1, 2)	2010-11-09 20:46:41 +00:00
Attilio Rao	fcb250f392	Move the mptable.h under x86/include/. Sponsored by: Sandvine Incorporated MFC after: 14 days	2010-11-09 20:28:09 +00:00
Jung-uk Kim	cedd86cafa	Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home.	2010-11-09 00:27:18 +00:00
Jung-uk Kim	2473325fa8	Reduce diff between platforms and fix style(9) bugs.	2010-11-09 00:14:39 +00:00
John Baldwin	13e25cb7a5	Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is identical on both platforms.	2010-11-08 20:57:02 +00:00
John Baldwin	f67b4bd367	A few small style and whitespace fixes.	2010-11-08 20:05:22 +00:00
Alan Cox	d9a799683c	Don't call pmap_demote_DMAP() on MTRR entries from the BIOS that are marked as "bogus". Reported by: Jia-Shiun Li	2010-11-07 21:48:49 +00:00
John Baldwin	0108cce0a4	Adjust the order of operations in spinlock_enter() and spinlock_exit() to work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers. In cooperation with: bde MFC after: 1 month	2010-11-05 13:42:58 +00:00
Andriy Gapon	3b50d59fef	x86 topo_probe: do not probe smp topology if only one cpu is visible This could lead to a division by zero if hardware is multi-core and/or multi-threaded, but for some (quite unusual) reason FreeBSD sees only one logical processor. This could happen, for example, if neither MADT nor MP Table are presented by BIOS. Also: - assert in topo_probe_0x4 that BSP is accounted for - neither cpu_cores nor cpu_logical should be zero after successful probing, so either being zero is an indication of failed probing Reported by: vwe, Dan Allen <danallen46@airwired.net> Tested by: Dan Allen <danallen46@airwired.net> MFC after: 3 days	2010-11-04 08:51:45 +00:00
John Baldwin	32c3d3b6e6	Move <machine/apicreg.h> to <x86/apicreg.h>.	2010-11-01 18:18:46 +00:00
John Baldwin	5ecdb3c46b	Move the <machine/mca.h> header to <x86/mca.h>.	2010-11-01 17:40:35 +00:00
Alan Cox	2eeee67ce8	Add another safety belt to pmap_demote_DMAP().	2010-10-30 23:49:37 +00:00
Alan Cox	59fb2d9b04	Don't demote in pmap_demote_DMAP() if the specified length is zero.	2010-10-30 17:21:32 +00:00
Attilio Rao	ba2a27351b	Merge nexus.c from amd64 and i386 to x86 subtree. Sponsored by: Sandvine Incorporated Tested by: gianni	2010-10-28 16:31:39 +00:00
John Baldwin	89d84a4055	Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[] is only populated for multiple-CPU machines. This also matches what the code does when SMP is not enabled. PR: bin/151616 Tested by: "Damian S. Kolodziejczyk" damkol \| gmail Submitted by: avg MFC after: 1 week	2010-10-28 13:44:19 +00:00
Attilio Rao	a3da97926d	Merge the mptable support from MD bits to x86 subtree. Sponsored by: Sandvine Incorporated Discussed with: jhb	2010-10-28 07:58:06 +00:00
Alan Cox	92ababa777	[1] According to the x86 architectural specifications, no virtual-to- physical page mapping should span two or more MTRRs of different types. Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can ensure that the direct map region doesn't have such a mapping. [2] Fix a couple of nearby style errors in amd64_mrset(). [3] Re-enable the use of 1GB page mappings for implementing the direct map. (See also r197580 and r213897.) Tested by: kib@ on a Westmere-family processor [3] MFC after: 3 weeks	2010-10-27 16:46:37 +00:00
Attilio Rao	256439c972	Merge dump_machdep.c i386/amd64 under the x86 subtree. Sponsored by: Sandvine Incorporated Tested by: gianni	2010-10-26 12:46:26 +00:00
John Baldwin	0689bdcc19	Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned by intr_disable(). Requested by: bde	2010-10-25 15:31:13 +00:00
John Baldwin	c6390f7ac5	Use intr_disable() and intr_restore() instead of frobbing the flags register directly to disable interrupts. Reviewed by: bde (earlier version) MFC after: 2 weeks	2010-10-25 15:28:03 +00:00
Alan Cox	353b642ced	Update pmap_extract() to handle 1GB page mappings. Some device drivers use pmap_extract() rather than pmap_kextract() on direct map addresses. Thus, pmap_extract() needs to be able to deal with 1GB page mappings if we are to use 1GB page mappings for the direct map. (See r197580.)	2010-10-15 15:23:34 +00:00
Jung-uk Kim	56b11f84a7	Remove trailing ", " from `sysctl machdep.idle_available' output.	2010-10-12 20:53:12 +00:00
Konstantin Belousov	78ae4338a2	Add macro DECLARE_MODULE_TIED to denote a module as requiring the kernel of exactly the same __FreeBSD_version as the headers module was compiled against. Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules use kernel interfaces that the Release Engineering Team feel are not stable enough to guarantee they will not change during the life cycle of a STABLE branch. In particular, the layout of struct sysentvec is declared to be not part of the STABLE KBI. Discussed with: bz, rwatson Approved by: re (bz, kensmith) MFC after: 2 weeks	2010-10-12 09:18:17 +00:00
Konstantin Belousov	b3b4bec7e6	Regen.	2010-10-08 07:19:05 +00:00
Konstantin Belousov	5d2a6a61b4	Fix typo. Submitted by: arundel MFC after: 3 days	2010-10-08 07:18:44 +00:00
Konstantin Belousov	3f506a78ce	Display PCID capability of CPU and add CPUID define for it. MFC after: 1 week	2010-10-05 15:31:56 +00:00
Konstantin Belousov	2d5db3709b	The makectx() function, used by kdb_trap() to reconstruct pcb from trap frame when trap initiated kdb entry, incorrectly calculated the value of %rsp for trapped thread. According to Intel(R) 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1, rev. 035, 6.14.2 64-Bit Mode Stack Frame, "64-bit mode ... pushes SS:RSP unconditionally, rather than only on a CPL change." Even assuming the conditional push of the %ss:%rsp, the calculation was still wrong because sizeof(tf_ss) + sizeof(tf_rsp) == 16 on amd64. Always use the tf_rsp from trap frame. The change supposedly fixes stepping when using kgdb backend for kdb. Submitted by: Zhouyi Zhou <zhouzhouyi gmail com> PR: amd64/151167 Reviewed by: avg MFC after: 1 week	2010-10-03 13:52:17 +00:00
Andriy Gapon	d443a96ffb	i386 and amd64 mp_machdep: improve topology detection for Intel CPUs This patch is significantly based on previous work by jkim. List of changes: - added comments that describe topology uniformity assumption - added reference to Intel Processor Topology Enumeration article - documented a few global variables that describe topology - retired weirdly set and used logical_cpus variable - changed fallback code for mp_ncpus > 0 case, so that CPUs are treated as being different packages rather than cores in a single package - moved AMD-specific code to topo_probe_amd [jkim] - in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT and core masks and match APIC IDs against those masks [started by jkim] - in topo_probe_0x4() drop code for double-checking topology parameters by looking at L1 cache properties [jkim] - in topo_probe_0xb() add fallback path to topo_probe_0x4() as prescribed by Intel [jkim] Still to do: - prepare for upcoming AMD CPUs by using new mechanism of uniform topology description [pointed by jkim] - probe cache topology in addition to CPU topology and probably use that for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have the same CPU topology, but Athlon cores do not share L2 cache while Core2's do (no L3 cache in both cases) - think of supporting non-uniform topologies if they are ever implemented for platforms in question - think how to better described old HTT vs new HTT distinction, HTT vs SMT can be confusing as SMT is a generic term - more robust code for marking CPUs as "logical" and/or "hyperthreaded", use HTT mask instead of modulo operation - correct support for halting logical and/or hyperthreaded CPUs, let scheduler know that it shouldn't schedule any threads on those CPUs PR: kern/145385 (related) In collaboration with: jkim Tested by: Sergey Kandaurov <pluknet@gmail.com>, Jeremy Chadwick <freebsd@jdc.parodius.com>, Chip Camden <sterling@camdensoftware.com>, Steve Wills <steve@mouf.net>, Olivier Smedts <olivier@gid0.org>, Florian Smeets <flo@smeets.im> MFC after: 1 month	2010-10-01 10:32:54 +00:00
Neel Natu	5c1a8dc028	Fix bogus error message from bus_dmamem_alloc() about incorrect alignment. The check for alignment should be made against the physical address and not the virtual address that maps it. Sponsored by: NetApp Submitted by: Will McGovern (will at netapp dot com) Reviewed by: mjacob, jhb	2010-09-29 21:53:11 +00:00
David Xu	295fbd498e	Now userland POSIX semaphore is based on umtx. The kernel module is only used to support binary compatible, if want to run old binary, you need to kldload the module.	2010-09-24 09:04:16 +00:00
Norikatsu Shigemura	cbf4dac64f	Add support 'device tpm' for amd64. Add tpm(4)'s default setting to /boot/defaults/loader.conf. Add 'device tpm' to NOTES for amd64 and i386. Discussed with: takawata Approved by: imp (mentor)	2010-09-19 14:40:37 +00:00
Andriy Gapon	0b750af1b1	amd64: reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory KVA space is abundant on amd64, so there is no reason to limit kernel map size to a fraction of available physical memory. In fact, it could be larger than physical memory. This should help with memory auto-tuning for ZFS and shouldn't affect other workloads. This should reduce number of circumstances for "kmem_map too small" panics, but probably won't eliminate them entirely due to potential kmem fragmentation. In fact, you might want/need to limit maximum ARC size after this commit if you need to resrve more memory for applications. This change was discussed on arch@ and nobody said "don't do it". MFC after: 6 weeks	2010-09-17 07:36:32 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generating at full rate of hz + stathz to fullfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this options is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generating. As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Kenneth D. Merry	d3c7b9a08a	MFp4 (//depot/projects/mps/...) Bring in a driver for the LSI Logic MPT2 6Gb SAS controllers. This driver supports basic I/O, and works with SAS and SATA drives and expanders. Basic error recovery works (i.e. timeouts and aborts) as well. Integrated RAID isn't supported yet, and there are some known bugs. So this isn't ready for production use, but is certainly ready for testing and additional development. For the moment, new commits to this driver should go into the FreeBSD Perforce repository first (//depot/projects/mps/...) and then get merged into -current once they've been vetted. This has only been added to the amd64 GENERIC, since that is the only architecture I have tested this driver with. Submitted by: scottl Discussed with: imp, gibbs, will Sponsored by: Yahoo, Spectra Logic Corporation	2010-09-10 15:03:56 +00:00
Andriy Gapon	3d844eddb7	bus_add_child: change type of order parameter to u_int This reflects actual type used to store and compare child device orders. Change is mostly done via a Coccinelle (soon to be devel/coccinelle) semantic patch. Verified by LINT+modules kernel builds. Followup to: r212213 MFC after: 10 days	2010-09-10 11:19:03 +00:00
Roman Divacky	27d4fea6c5	Change the parameter passed to the inline assembly to u_short as we are dealing with 16bit segment registers. Change mov to movw. Approved by: rpaulo (mentor) Reviewed by: kib, rink	2010-09-03 14:25:17 +00:00
Jung-uk Kim	305c5c0acb	Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use these values in the function.	2010-08-30 21:19:42 +00:00
Rui Paulo	cba3269417	Register an interrupt vector for DTrace return probes. There is some code missing in lapic to make sure that we don't overwrite this entry, but this will be done on a sequent commit. Sponsored by: The FreeBSD Foundation	2010-08-28 08:03:29 +00:00
Rui Paulo	0bc1991a4a	Call the necessary DTrace function pointers when we have different kinds of traps. Sponsored by: The FreeBSD Foundation	2010-08-25 09:10:32 +00:00
Rui Paulo	8a8d8fa3d1	Add two DTrace trap type values. Used by fasttrap. Sponsored by: The FreeBSD Foundation	2010-08-24 13:13:24 +00:00
Attilio Rao	67a94de261	Revert part of the r211149 as I erroneously ported the logical_cpus from Yahoo! patchset as a mask (and according manipulating variables) while it is actually a CPU count. Submitted by: neel MFC after: 1 month X-MFC: 211149	2010-08-19 22:37:43 +00:00
John Baldwin	8c7a92bd4a	Remove unused KTRACE includes.	2010-08-19 16:41:27 +00:00
Pietro Cerutti	e0e08e6a60	- The iMac9,1 needs the PAT workaround as well Approved by: cognet	2010-08-17 12:17:24 +00:00
Konstantin Belousov	ee235befcb	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month	2010-08-17 08:55:45 +00:00
Jung-uk Kim	0405a5efe7	Reset switchtime to zero rather than the current CPU ticker (TSC) value. It is more appropriate in this context because TSC MSR is reset to zero when the CPU is restarted from S3 and above. Move acpi_resync_clock() back to where it was before r211202. It does not make a difference any more.	2010-08-13 22:08:42 +00:00
Attilio Rao	3742bd96fe	Revert r211176: As long as interrupts are disabled and there is not explicit call to sched_add() there can't be any preemption there, thus the calls may be consistent. Reported by: kib, jhb	2010-08-12 13:46:43 +00:00
Jung-uk Kim	a1004d0abf	Reset switchtime and switchticks after resynchronizing the system clock. This should fix weird runtime problem after resume on amd64. It also fixes "calcru: runtime went backwards" warnings with bootverbose.	2010-08-12 00:20:46 +00:00
John Baldwin	60c7b36b7a	Update various places that store or manipulate CPU masks to use cpumask_t instead of int or u_int. Since cpumask_t is currently u_int on all platforms this should just be a cosmetic change.	2010-08-11 23:22:53 +00:00
Attilio Rao	807ef45666	IPI handlers may run generally with interrupts disabled because they are served via an interrupt gate. However, that doesn't explicitly prevent preemption and thread migration thus scheduler pinning may be necessary in some handlers. Fix that. Tested by: gianni MFC after: 1 month	2010-08-11 10:51:27 +00:00
Attilio Rao	7cd8b4cd42	Fix a typo due to a stale version of the patch. Reported by: gianni, rdivacky MFC after: 1 month X-MFC: 211149	2010-08-10 18:29:39 +00:00
Attilio Rao	4c967b618d	Fix some places that may use cpumask_t while they still use 'int' types. While there, also fix some places assuming cpu type is 'int' while u_int is really meant. Note: this will also fix some possible races in per-cpu data accessings to be addressed in further commits. In collabouration with: Yahoo! Incorporated (via sbruno and peter) Tested by: gianni MFC after: 1 month	2010-08-10 16:14:10 +00:00
Attilio Rao	d35534bf42	Simplify the logic for handling ipi_selected() and ipi_cpu() in the amd64/i386 case. Reviewed by: jhb Tested by: gianni MFC after: 1 month X-MFC: 210939	2010-08-09 20:25:06 +00:00
David Malone	ee04083c8a	Don't pass sizeof(u_int) to an argument of SYSCLT_PROC that ends up not being used.	2010-08-08 20:34:53 +00:00
Konstantin Belousov	1757d9699d	Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well. MFC after: 1 week	2010-08-07 11:57:13 +00:00
Bernhard Schmidt	5ec432ed82	Fix whitespace nits. PR: conf/148989 Submitted by: pluknet <pluknet at gmail.com> MFC after: 3 days	2010-08-06 18:46:27 +00:00
Jung-uk Kim	64299552b9	Remove unnecessary casting and simplify code. We are not there yet. ;-)	2010-08-06 17:21:32 +00:00
Jung-uk Kim	05db09e056	Correct argument order of acpi_restorecpu(), which was forgotten in r210804.	2010-08-06 15:59:00 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
John Baldwin	e2865ebbc2	Change the MPTable and $PIR PCI-PCI bridge drivers to inherit from the generic PCI-PCI bridge driver and only override specific methods. This should fix suspend/resume of PCI-PCI bridges using these drivers.	2010-08-05 17:48:37 +00:00
Jung-uk Kim	aa9928df7a	Remove an unnecessary register load.	2010-08-03 16:08:58 +00:00
Jung-uk Kim	3ab42a25a9	savectx() has not been used for fork(2) for about 15 years. [1] Do not clobber FPU thread's PCB as it is more harmful. When we resume CPU, unconditionally reload FPU state. Pointed out by: bde [1]	2010-08-03 15:32:08 +00:00
Jung-uk Kim	6305bb243c	Rearrange struct pcb. r177532 (CVS r1.64 of pcb.h) moved pcb_flags to make better use of cache lines by placing it before pcb_save (now pcb_user_save), which is moved to the end of pcb since r210777.	2010-08-02 18:12:30 +00:00
Jung-uk Kim	a2d2c83668	- Merge savectx2() with savectx() and struct xpcb with struct pcb. [1] savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs). Thus, saving additional information does not hurt and it may be even beneficial. Unfortunately, struct pcb has grown larger to accommodate more data. Move 512-byte long pcb_user_save to the end of struct pcb while I am here. - savectx() now saves FPU state unconditionally and copy it to the PCB of FPU thread if necessary. This gives panic dump and kdb a chance to take a look at the current FPU state even if the FPU is "supposedly" not used. - Resuming CPU now unconditionally reinitializes FPU. If the saved FPU state was irrelevant, it could be in an unknown state. Suggested by: bde [1]	2010-08-02 17:35:00 +00:00
John Baldwin	7134e39042	Tweak the logic to disable CLFLUSH in virtual environments to work around problems with flushing the local APIC register range so that it checks vm_guest directly. Reviewed by: kib, alc MFC after: 2 weeks	2010-08-02 17:01:23 +00:00
Xin LI	16430b12a3	In rdmsr_safe, use zero extend (by doing a 32-bit movl over eax to itself) instead of a sign extend. Discussed with: stas MFC after: 1 month	2010-07-30 21:39:28 +00:00
Xin LI	a3bc0a4e5c	Improve cputemp(4) driver wrt newer Intel processors, especially Xeon 5500/5600 series: - Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place of Tj(max) when a sane value is available, as documented in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane value we mean 70C - 100C for now); - Print the probe results when booting verbose; - Replace cpu_mask with cpu_stepping; - Use CPUID_* macros instead of rolling our own. Approved by: rpaulo MFC after: 1 month	2010-07-29 19:08:22 +00:00
John Baldwin	536af0d751	Mark the __curthread() functions as __pure2 and remove the volatile keyword from the inline assembly. This allows the compiler to cache invocations of curthread since it's value does not change within a thread context. Submitted by: zec (i386) MFC after: 1 week	2010-07-29 18:44:10 +00:00
Jung-uk Kim	9727ca6a77	Fix another fallout from r208833. savectx() is used to save CPU context for crash dump (dumppcb) and kdb (stoppcbs). For both cases, there cannot have a valid pointer in pcb_save. This should restore the previous behaviour.	2010-07-29 16:49:20 +00:00
Jung-uk Kim	39381048f0	Rename PCB_USER_FPU to PCB_USERFPU not to clash with a macro from fpu.h.	2010-07-29 16:41:21 +00:00
John Baldwin	a955c461ad	The corrected error count field is dependent on CMCI, not TES. MFC after: 1 week	2010-07-28 21:52:09 +00:00
Matthew D Fleming	d7854da193	Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma zones for each malloc bucket size. The purpose is to isolate different malloc types into hash classes, so that any buffer overruns or use-after-free will usually only affect memory from malloc types in that hash class. This is purely a debugging tool; by varying the hash function and tracking which hash class was corrupted, the intersection of the hash classes from each instance will point to a single malloc type that is being misused. At this point inspection or memguard(9) can be used to catch the offending code. Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files. The suggestion to have this on by default came from Kostik Belousov on -arch. This code is based on work by Ron Steinke at Isilon Systems. Reviewed by: -arch (mostly silence) Reviewed by: zml Approved by: zml (mentor)	2010-07-28 15:36:12 +00:00
Alan Cox	a14a949872	The interpreter name should no longer be treated as a buffer that can be overwritten. (This change should have been included in r210545.) Submitted by: kib	2010-07-28 04:47:40 +00:00
John Baldwin	a3870a1826	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfulling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
Jung-uk Kim	172754036a	Simplify fldcw() macro. There is no reason to use pointer here. No object file change after this commit (verified with md5).	2010-07-26 23:20:55 +00:00
Jung-uk Kim	07c86dcf54	Add missing ldmxcsr() prototype for lint case.	2010-07-26 23:02:18 +00:00
Jung-uk Kim	30402401a7	Reduce diff against fenv.h: Mark all inline asms as volatile for safety. No object file change after this commit (verified with md5).	2010-07-26 22:16:36 +00:00
Jung-uk Kim	2e50fa36a5	FNSTSW instruction can use AX register as an operand. Obtained from: fenv.h	2010-07-26 21:24:52 +00:00
Jung-uk Kim	9bfb10b154	Re-implement FPU suspend/resume for amd64. This removes superfluous uses of critical_enter(9) and critical_exit(9) by fpugetregs() and fpusetregs(). Also, we do not touch PCB flags any more. MFC after: 1 month	2010-07-26 19:53:09 +00:00
Konstantin Belousov	970eba46d5	Remove unneeded includes. Submitted by: alc MFC after: 1 week	2010-07-26 14:38:51 +00:00
Konstantin Belousov	d48dda1244	Regen	2010-07-23 21:31:03 +00:00
Konstantin Belousov	0b53d1569e	Remove the linux_exec_copyin_args(), freebsd32_exec_copyin_args() may server as well. COMPAT_FREEBSD32 is a prerequisite for COMPAT_LINUX32. Reviewed by: alc MFC after: 3 weeks	2010-07-23 21:30:33 +00:00
Alan Cox	69a8f9e3d1	Eliminate a little bit of duplicated code.	2010-07-23 18:58:27 +00:00
Konstantin Belousov	87d45a0392	When compat32 binary asks for the value of hw.machine_arch, report the name of 32bit sibling architecture instead of the host one. Do the same for hw.machine on amd64. Add a safety belt debug.adaptive_machine_arch sysctl, to turn the substitution off. Reviewed by: jhb, nwhitehorn MFC after: 2 weeks	2010-07-22 09:13:49 +00:00
Alexander Motin	060d7431b5	Add hints for i8254 timer on i386 and amd64. Some people report about systems with PnP/ACPI not reporting i8254 timer. In some cases it can be fatal, as i8254 can be the only available time counter hardware. From other side we are now heavily depend on i8254 timer and till the last time it's init/usage was completely hardcoded. So this change just restores previous behavior in more regular fashion.	2010-07-16 23:21:46 +00:00
Alexander Motin	fcc06be1b2	Move functions declaration to MI code, following implementation.	2010-07-15 17:49:35 +00:00
Alan Cox	7db39bcb05	Optimize pmap_remove()'s handling of PG_G mappings. Specifically, instead of calling pmap_invalidate_page() for each PG_G mapping, call pmap_invalidate_range() for each range of PG_G mappings. In addition, eliminate a redundant call to pmap_invalidate_page(). Both pmap_remove_pte() and pmap_remove_page() called pmap_invalidate_page() when the mapping had the PG_G attribute. Now, only pmap_remove_page() calls pmap_invalidate_page(). Altogether, these changes eliminate 53% of the TLB shootdowns for a "buildworld" on a ZFS file system. On FFS, the reduction is 3%. MFC after: 6 weeks	2010-07-15 16:25:51 +00:00
Bernhard Schmidt	774f94f14c	- Update 6000 firmware to 9.221.4.1 - Add 6050 firmware MFC after: 2 weeks	2010-07-15 11:26:07 +00:00
Warner Losh	1003cfe94d	Remove obsolete undef of COPY_SIGCODE. It appears to have not been used in FreeBSD in quite some time (maybe since before 4.4-lite :) Submitted by: bde	2010-07-13 15:06:13 +00:00
Jung-uk Kim	588697d478	Move i386-inherited logic of building ACPI headers for acpi_wakeup.c into better places and remove intermediate makefile and shell scripts. This makes parallel kernel build little bit safer for amd64.	2010-07-12 21:08:35 +00:00
John Baldwin	05f81eb67d	Remove a dead test. We already exclude NMI traps from this code in an earlier condition. MFC after: 1 week	2010-07-12 20:45:37 +00:00
Konstantin Belousov	aaa95ccb50	When switching the thread from the processor, store %dr7 content into the pcb before disabling watchpoints. Otherwise, when the thread is restored on a processor, watchpoints are still disabled. Submitted by: Tijl Coosemans <tijl coosemans org> (I would be much happier if Tijl commited this himself) MFC after: 1 week	2010-07-12 19:59:15 +00:00
Alan Cox	8155e5d561	Reduce the number of global TLB shootdowns generated by pmap_qenter(). Specifically, teach pmap_qenter() to recognize the case when it is being asked to replace a mapping with the very same mapping and not generate a shootdown. Unfortunately, the buffer cache commonly passes an entire buffer to pmap_qenter() when only a subset of the mappings are changing. For the extension of buffers in allocbuf() this was resulting in unnecessary shootdowns. The addition of new pages to the end of the buffer need not and did not trigger a shootdown, but overwriting the initial mappings with the very same mappings was seen as a change that necessitated a shootdown. With this change, that is no longer so. For a "buildworld" on amd64, this change eliminates 14-15% of the pmap_invalidate_range() shootdowns, and about 4% of the overall shootdowns. MFC after: 3 weeks	2010-07-10 18:22:44 +00:00
Konstantin Belousov	2680dac9e1	For both i386 and amd64 pmap, - change the type of pm_active to cpumask_t, which it is; - in pmap_remove_pages(), compare with PCPU(curpmap), instead of dereferencing the long chain of pointers [1]. For amd64 pmap, remove the unneeded checks for validity of curpmap in pmap_activate(), since curpmap should be always valid after r209789. Submitted by: alc [1] Reviewed by: alc MFC after: 3 weeks	2010-07-09 20:05:56 +00:00
Alan Cox	77cb6e6f8d	Correctly maintain the per-cpu field "curpmap" on amd64 just like we do on i386. The consequences of not doing so on amd64 became apparent with the introduction of the COUNT_IPIS and COUNT_XINVLTLB_HITS options. Specifically, single-threaded applications were generating unnecessary IPIs to shoot-down the TLB on other processors. However, this is clearly nonsensical because a single-threaded application is only running on the current processor. The reason that this happens is that pmap_activate() is unable to properly update the old pmap's field "pm_active" without the correct "curpmap". So, in effect, stale bits in "pm_active" were leading pmap_protect(), pmap_remove(), pmap_remove_pages(), etc. to flush the TLB contents on some arbitrary processor that wasn't even running the same application. Reviewed by: kib MFC after: 3 weeks	2010-07-08 03:35:00 +00:00
Rui Paulo	80599de862	Fix style issues with the previous commit, namely use-tab-instead-of-space and don't use underscores in macro variables. Pointed out by: bde	2010-07-07 12:08:58 +00:00
Kevin Lo	054d87d697	Add the u3g(4) driver. I can't find any reason why it's not here.	2010-07-07 09:23:46 +00:00
Rui Paulo	8923c96ee1	Introduce USD_{SET,GET}{BASE,LIMIT}. These help setting up the user segment descriptor hi and lo values. Idea from Solaris. Reviewed by: kib	2010-07-06 16:56:27 +00:00
Alexander Motin	af565edaaa	Revert r209638. After commit, there appeared to be more people who liked previous name of stray interrupt counters, then responded to the list.	2010-07-02 17:22:15 +00:00
Alexander Motin	b7bc6aa726	Make stray irq counters have format alike to other counters. Unified format makes string processing (for example by `systat -vm`) easier.	2010-07-01 21:58:46 +00:00
John Baldwin	fc0de8f0b6	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.	2010-06-30 18:03:42 +00:00
Konstantin Belousov	13cedde2cb	Regenerate	2010-06-28 18:17:21 +00:00
Konstantin Belousov	595473a587	Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64 ABI specifies the DF should be zero, and newer compilers do not clear DF before using DF-sensitive instructions. The DF clearing for signal handlers was done some time ago. MFC after: 1 week	2010-06-23 20:44:07 +00:00
Konstantin Belousov	692add74d8	Fix bugs on pc98, use npxgetuserregs() instead of npxgetregs() for get_fpcontext(), and npxsetuserregs() for set_fpcontext). Also, note that usercontext is not initialized anymore in fpstate_drop(). Systematically replace references to npxgetregs() and npxsetregs() by npxgetuserregs() and npxsetuserregs() in comments. Noted by: bde	2010-06-23 12:17:13 +00:00
Alexander Motin	25eb1b8c15	Some style fixes for r209371. Submitted by: jhb@	2010-06-22 16:20:10 +00:00
Alexander Motin	875b8844be	Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing best possible drivers by machine independent code and for operating them to supply kernel with hardclock(), statclock() and profclock() events in unified fashion on various hardware. Infrastructure provides support for both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but one-shot mode use planned for later, as part of tickless kernel project. For this moment infrastructure used on i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support into the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may be not exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because same hardware used also as time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited by powers of 2. Depending on hardware capabilities, drivers preferred in following orders, either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. User may explicitly specify wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If requested driver is unavailable or unoperational, system will try to replace it. If no more timers available or "NONE" specified for second, system will operate using only one timer, multiplying it's frequency by few times and uing respective dividers to honor hz, stathz and profhz values, set during initial setup.	2010-06-20 21:33:29 +00:00
Konstantin Belousov	9e3e64e797	Only enable kdtrace hook in the LINT on the architectures that implement it.	2010-06-18 18:51:09 +00:00
Konstantin Belousov	b376ebac6b	In the ia32_{get,set}_fpcontext(), use fpu{get,set}userregs instead of fpu{get,set}regs. Noted by: bde MFC after: 1 month	2010-06-17 12:35:17 +00:00
Alexander Motin	d364638110	Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64. This information can be very valuable for CPU sleep-time (and respectively idle power consumption) optimization. Add counters for timer-related IPIs. Reviewed by: jhb@ (previous version)	2010-06-17 11:54:49 +00:00
John Baldwin	61d3f0bab2	Restore the machine check register banks on resume. For banks being monitored via CMCI, reset the interrupt threshold to 1 on resume. Reviewed by: jkim MFC after: 2 weeks	2010-06-15 18:51:41 +00:00
Konstantin Belousov	07c809237a	Remove two obsoleted comments, add a note about 32bit compatibility. MFC after: 1 month	2010-06-15 18:16:04 +00:00
Konstantin Belousov	4a23ecc77e	Rename CRITSECT_ASSERT to CRITICAL_ASSERT. Suggested by: jhb MFC after: 1 month	2010-06-15 14:59:35 +00:00
Konstantin Belousov	997534958e	Use critical sections instead of disabling local interrupts to ensure the consistency between PCPU fpcurthread and the state of the FPU. Explicitely assert that the calling conventions for fpudrop() are adhered too. In cpu_thread_exit(), add missed critical section entrance. Reviewed by: bde Tested by: pho MFC after: 1 month	2010-06-15 09:19:33 +00:00
Jung-uk Kim	c977e11721	Fix ACPI suspend/resume on amd64, which was broken since r208833. We need actual storage for FPU state to save and restore.	2010-06-14 20:08:26 +00:00
Alexander Motin	2f9fc3899b	Fix bug introduced in SVN rev 194985. When calling pic_assign_cpu() for pre-bound IRQs during boot, submit there LAPIC ID, same as in other places, not CPU ID.	2010-06-14 07:38:53 +00:00
Kenneth D. Merry	7c049a853c	MFC 199549, 199997, 204158, 207673, and 208901. Bring in a number of netfront changes: r199549 \| jhb Remove commented out reference to if_watchdog and an assignment of zero to if_timer. Reviewed by: scottl r199997 \| gibbs Add media ioctl support and link notifications so that devd will attempt to run dhclient on a netfront (xn) device that is setup for DHCP in /etc/rc.conf. PR: kern/136251 (fixed differently than the submitted patch) r204158 \| kmacy - make printf conditional - fix witness warnings by making configuration lock a mutex r207673 \| joel Switch to our preferred 2-clause BSD license. Approved by: kmacy r208901 \| ken A number of netfront fixes and stability improvements: - Re-enable TSO. This was broken previously due to CSUM_TSO clearing the CSUM_TCP flag, so our checksum flags were incorrectly set going to the netback driver. That was fixed in r206844 in tcp_output.c, so we can turn TSO back on here. - Fix the way transmit slots are calculated, so that we can't overfill the ring. - Avoid sending packets with more fragments/segments than netback can handle. The Linux netback code can only handle packets of MAX_SKB_FRAGS, which turns out to be 18 on machines with 4K pages. We can easily generate packets with 32 or so fragments with TSO turned on. Right now the solution is just to drop the packets (since netback doesn't seem to handle it gracefully), but we should come up with a way to allow a driver to tell the TCP stack the maximum number of fragments it can handle in a single packet. - Fix the way the consumer is tracked in the receive path. It could get out of sync fairly easily. - Use standard Xen ring macros to make it clearer how netfront is using the rings. - Get rid of Linux-ish negative errno return values. - Added more documentation to the driver. - Refactored code to make it easier to read. - Some other minor fixes. Reviewed by: gibbs Sponsored by: Spectra Logic Approved by: re (bz)	2010-06-11 19:17:36 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
Alan Cox	9124d0d6a3	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)	2010-06-11 15:49:39 +00:00
Alexander Kabaev	60743cbd22	Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requests in Linux emulation layer. Linux seems to only require that pos is page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter checking is too strict to allow some Linux binaries to run. tsMuxeR is one example of such a binary. Discussed with: jhb MFC after: 1 week	2010-06-10 17:59:47 +00:00
Alan Cox	ce18658792	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exits_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@	2010-06-10 16:56:35 +00:00
John Baldwin	b9cd2f771a	Move the MD support for PCI message signalled interrupts to the x86 tree as it is identical for i386 and amd64.	2010-06-08 18:36:03 +00:00
John Baldwin	2465e30f0c	Move the machine check support code to the x86 tree since it is identical on i386 and amd64. Requested by: alc	2010-06-08 18:04:07 +00:00
John Baldwin	53a908cb07	Move the I/O APIC code to the x86 tree since it is identical on i386 and amd64.	2010-06-08 17:51:21 +00:00
John Baldwin	bfc7a4fc48	- Use a bit more care when moving I/O APIC interrupts between CPUs. Mask the interrupt followed by a brief delay if it is not currently masked before moving the interrupt. - Move the icu_lock out of ioapic_program_intpin() and into callers. This closes a race where ioapic_program_intpin() could use a stale value of the masked state to compute the masked bit in the register. Reviewed by: mav MFC after: 2 weeks	2010-06-08 17:08:13 +00:00
Konstantin Belousov	4f24f88ebb	Style-compilant order of declarations. Noted by: bde MFC after: 1 month	2010-06-06 16:13:50 +00:00
Konstantin Belousov	6cf9a08d2c	Introduce the x86 kernel interfaces to allow kernel code to use FPU/SSE hardware. Caller should provide a save area that is chained into the stack of the areas; pcb save_area for usermode FPU state is on top. The pcb now contains a pointer to the current FPU saved area, used during FPUDNA handling and context switches. There is also a facility to allow the kernel thread to use pcb save_area. Change the dreaded warnings "npxdna in kernel mode!" into the panics when FPU usage is not registered. KPI discussed with: fabient Tested by: pho, fabient Hardware provided by: Sentex Communications MFC after: 1 month	2010-06-05 15:59:59 +00:00
Attilio Rao	875e0aa40d	MFC r207329, r208716: - Extract the IODEV_PIO interface from ia64 and make it MI. - On i386 and amd64 the old behaviour is kept but multithreaded processes must use the new interface in order to work well. - Support for the other architectures is greatly improved. Sponsored by: Sandvine Incorporated Approved by: re (kib, bz)	2010-06-01 21:19:58 +00:00
Alan Cox	b2830a9649	Eliminate a stale comment.	2010-05-31 06:06:10 +00:00
Alan Cox	72dc3eb65b	Simplify the inner loop of pmap_collect(): While iterating over the page's pv list, there is no point in checking whether or not the pv list is empty. Instead, wait until the loop completes.	2010-05-30 18:48:41 +00:00
Alan Cox	8f0d5d3b9f	When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call. When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called. Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator. There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this. Tidy up the variable declarations at the start of pmap_enter().	2010-05-29 17:10:45 +00:00
John Baldwin	0c86af8162	Defer initializing machine checks for the boot CPU until the local APIC is fully configured. MFC after: 1 month	2010-05-28 17:50:24 +00:00
Alan Cox	52d8ba372e	Defer freeing any page table pages in pmap_remove_all() until after the page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all().	2010-05-28 06:49:57 +00:00
Alan Cox	c46b90e90a	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib	2010-05-26 18:00:44 +00:00
John Baldwin	58ccad7ddc	Add support for corrected machine check interrupts. CMCI is a new local APIC interrupt that fires when a threshold of corrected machine check events is reached. CMCI also includes a count of events when reporting corrected errors in the bank's status register. Note that individual banks may or may not support CMCI. If they do, each bank includes its own threshold register that determines when the interrupt fires. Currently the code uses a very simple strategy where it doubles the threshold on each interrupt until it succeeds in throttling the interrupt to occur only once a minute (this interval can be tuned via sysctl). The threshold is also adjusted on each hourly poll which will lower the threshold once events stop occurring. Tested by: Sailaja Bangaru sbappana at yahoo com MFC after: 1 month	2010-05-24 15:45:05 +00:00
Alan Cox	567e51e18c	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
Alexander Motin	dbd55f3ff0	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. Same code with minor variations duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.	2010-05-24 11:40:49 +00:00
Konstantin Belousov	afe1a68827	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
Alexander Motin	fa1ed4bd1a	Unify local_apic.c for x86 archs,	2010-05-23 17:45:01 +00:00
John Baldwin	e826ef1ec4	- Adjust the whitespace for the lines that output fields in 'show pcpu' in DDB so that all the fields line up. - Print out the tid of the per-CPU idlethread instead of the pid since the idle process is now shared across all idle threads. MFC after: 1 month	2010-05-21 17:17:56 +00:00
Poul-Henning Kamp	065b12a703	Rename an argument from "exp" to "expect" since the former makes FlexeLint uneasy, in case anybody think it might be exp(3) in libm. This also makes it consistent with other archs.	2010-05-20 06:18:03 +00:00
John Baldwin	3b642a049b	Add constants for the optional EOI suppression support in local APICs and EOI registers in I/O APICs.	2010-05-19 19:52:41 +00:00
Konstantin Belousov	3c5636cdf1	MFC r207958: Route all returns from the interrupts and faults through the doreti_iret labeled iretq instruction. MFC r208026: Do not use .extern.	2010-05-19 09:32:59 +00:00
Konstantin Belousov	c89c4842bc	MFC r207957: Remove unneeded overrides of the segment registers.	2010-05-19 09:30:41 +00:00
Alan Cox	9ab6032f73	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.	2010-05-16 23:45:10 +00:00
Konstantin Belousov	7565f3e837	Do not use .extern, it is not strictly needed with gas and it is custom to omit it. Requested by: bde MFC after: 6 days	2010-05-13 09:59:10 +00:00
Konstantin Belousov	6a440fc3e0	Route all returns from the interrupts and faults through the doreti_iret labeled iretq instruction. Suppose that multithreaded process executes two threads, currently scheduled on different processors. Let assume that thread A executes using %cs or %ss pointing into the descriptor from LDT. If IPI comes which handler does not return by jump to doreti, and meantime thread B invalidates descriptor pointed to by %cs or %ss, then iretq from IPI handler could fault. Routing the return by doreti_iret allows kernel to catch the situation and recover from it by sending signal to the usermode. Tested by: pho MFC after: 1 week	2010-05-12 10:29:35 +00:00
Konstantin Belousov	99b1ff2ac4	Remove unneeded overrides of the segment registers in the inner trap frame upon segment register load fault. The doreti procedure does not load segment registers when returning to the kernel frame, and current values in the segment descriptor cache already allow the kernel mode to run, not modified by faulted loaded. Suggested by: bde Tested by: pho MFC after: 1 week	2010-05-12 10:29:06 +00:00
Konstantin Belousov	eb77a08756	MFC r207676: Add definitions for Intel AESNI CPUID bits and print the capabilities on boot.	2010-05-12 09:34:10 +00:00
Alan Cox	3c4a24406b	Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
Konstantin Belousov	19effccdee	MFC r204051 (by imp): n64 has a different size for KINFO_PROC_SIZE. Approved by: imp MFC r207152: Move the constants specifying the size of struct kinfo_proc into machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. MFC r207269: Style: use #define<TAB> instead of #define<SPACE>.	2010-05-08 18:54:47 +00:00
Konstantin Belousov	5d3f8936f8	MFC r207463: Remove debugging code that was not used once since commit.	2010-05-08 12:40:38 +00:00
Alan Cox	7024db1d40	Push down the page queues lock inside of vm_page_free_toq() and pmap_page_is_mapped() in preparation for removing page queues locking around calls to vm_page_free(). Setting aside the assertion that calls pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page queues lock just long enough to actually add or remove the page from the paging queues. Update vm_page_unhold() to reflect the above change.	2010-05-06 16:39:43 +00:00
Konstantin Belousov	d4b0face18	MFC r207570: Style and comment adjustements.	2010-05-06 04:57:10 +00:00
Konstantin Belousov	db8fd40e9f	Add definitions for Intel AESNI CPUID bits and print the capabilities on boot. Hardware provided by: Sentex Communications MFC after: 1 week	2010-05-05 21:07:47 +00:00
Joel Dahl	8e0ad55abb	Switch to our preferred 2-clause BSD license. Approved by: kmacy	2010-05-05 20:39:02 +00:00
Konstantin Belousov	bf6f1b56c5	Style and comment adjustements. Suggested and reviewed by: bde MFC after: 3 days	2010-05-03 14:30:49 +00:00
Warner Losh	16a79e54bc	Revert 207494: it was only for testing purposes.	2010-05-02 06:24:17 +00:00
Warner Losh	f459061b4c	Move to the new way of specifying compat options. The backs out the FOO = BAR form, in favor of listing the mapping in a separate file for more compatibility with older versions of config.	2010-05-02 06:20:42 +00:00
Konstantin Belousov	6a5baa54fd	Remove debugging code that was not used once since commit. Suggested by: bde MFC after: 1 week	2010-05-01 13:15:35 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Andrew Thompson	ad65806013	MFC r207077 Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had the illusion of a tunable setting but was always turned on regardless.	2010-04-29 22:44:04 +00:00
Attilio Rao	d8b878873e	- Extract the IODEV_PIO interface from ia64 and make it MI. In the end, it does help fixing /dev/io usage from multithreaded processes. - On i386 and amd64 the old behaviour is kept but multithreaded processes must use the new interface in order to work well. - Support for the other architectures is greatly improved, where necessary, by the necessity to define very small things now. Manpage update will happen shortly. Sponsored by: Sandvine Incorporated PR: threads/116181 Reviewed by: emaste, marcel MFC after: 3 weeks	2010-04-28 15:38:01 +00:00
Konstantin Belousov	ea5e5dda7b	MFC r206992: As was done in r155238 for i386 and in r155239 for amd64, clear the carry flag for ia32 binary executed on amd64 host in get_mcontext().	2010-04-27 10:50:09 +00:00
Konstantin Belousov	8bac98182a	Style: use #define<TAB> instead of #define<SPACE>. Noted by: bde, pluknet gmail com MFC after: 11 days	2010-04-27 09:48:43 +00:00
Pyun YongHyeon	ffb1296f2c	MFC r206625: Add driver for Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet. This driver was written by Alexander Pohoyda and greatly enhanced by Nikolay Denev. I don't have these hardwares but this driver was tested by Nikolay Denev and xclin. Because SiS didn't release data sheet for this controller, programming information came from Linux driver and OpenSolaris. Unlike other open source driver for SiS190/191, sge(4) takes full advantage of TX/RX checksum offloading and does not require additional copy operation in RX handler. The controller seems to have advanced offloading features like VLAN hardware tag insertion/stripping, TCP segmentation offload(TSO) as well as jumbo frame support but these features are not available yet. Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw> who sent fix for receiving VLAN oversized frames.	2010-04-26 17:03:56 +00:00
Kip Macy	cd4d97b439	missed pv access before pmap lock	2010-04-25 23:51:05 +00:00
Kip Macy	c28d264aa0	Incremental reduction of delta with head_page_lock_2 branch - replace modification of pmap resident_count with pmap_resident_count_{inc,dec} - the pv list is protected by the pmap lock, but in several cases we are relying on the vm page queue mutex, move pv_va read under the pmap lock	2010-04-25 23:18:02 +00:00
Andrew Thompson	f6a9d95ba3	Set USB_DEBUG like the other platforms, I had turned it off to test the build before committing r207077. Spotted by: marius	2010-04-25 22:01:32 +00:00
Alan Cox	0d2e1c3e39	Clearing a page table entry's accessed bit (PG_A) and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified. In contrast to pmap_remove() or pmap_remove_all(), the mapping is not being destroyed, so the notion that the page was accessed is not lost. Moreover, clearing the page table entry's accessed bit and setting the page's PG_REFERENCED flag can throw off the page daemon's activity count calculation. Finally, in my tests, I found that 15% of the atomic memory operations being performed by pmap_protect() were only to clear PG_A, and not change protection. This could, by itself, be fixed, but I don't see the point given the above argument. Remove a comment from pmap_protect_pde() that is no longer meaningful after the above change.	2010-04-25 20:40:45 +00:00
Kip Macy	81b47f28bc	apply style(9) changes applied to head_page_lock_2 requested by: kib@	2010-04-24 21:17:07 +00:00

... 6 7 8 9 10 ...

6349 Commits