freebsd-skq

Author	SHA1	Message	Date
Jung-uk Kim	2b052e43be	Update CPUID bits to reflect AMD Bulldozer and Intel Sandy Bridge features. Note AMD dropped SSE5 extensions in order to avoid ISA overlap with Intel AVX instructions. The SSE5 bit was recycled as XOP extended instruction bit, CVT16 was deprecated in favor of F16C (half-precision float conversion instructions for AVX), and the remaining FMA4 (4-operand FMA instructions) gained a separate CPUID bit. Replace non-existent references with today's CPUID specifications.	2011-05-17 22:36:16 +00:00
Attilio Rao	b2aa562e7b	MFC	2011-05-13 20:58:48 +00:00
Matthew D Fleming	cfb00e5aa7	Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk. Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware. Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment). Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit. Requested by: alc MFC after: 1 week MFC with: r221853	2011-05-13 19:35:01 +00:00
Attilio Rao	ef607a6aa3	MFC	2011-05-12 14:01:40 +00:00
Dmitry Chagin	98cde5eede	Remove wrong comment. MFC after: 1 week.	2011-05-11 17:57:15 +00:00
Jung-uk Kim	00c885e181	Add SC_PIXEL_MODE to GENERIC for amd64 and i386. Requested by: many	2011-05-10 16:44:16 +00:00
Attilio Rao	bd55ede060	MFC	2011-05-09 18:53:13 +00:00
Jung-uk Kim	65e7d70b09	Implement boot-time TSC synchronization test for SMP. This test is executed when the user has indicated that the system has synchronized TSCs or it has P-state invariant TSCs. For the former case, we may clear the tunable if it fails the test to prevent accidental foot-shooting. For the latter case, we may set it if it passes the test to notify the user that it may be usable.	2011-05-09 17:34:00 +00:00
Attilio Rao	aa8b9e0706	MFC	2011-05-06 22:45:33 +00:00
Andriy Gapon	fdf30d59a6	prepare code that does topology detection for amd cpus for bulldozer This also introduces a new detection path for family 10h and newer pre-bulldozer cpus, pre-10h hardware should not be affected. Tested by: Gary Jennejohn <gljennjohn@googlemail.com> (with pre-10h hardware) MFC after: 2 weeks	2011-05-06 13:51:54 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t on the other side is implemented as an array of longs, and easily extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Attilio Rao	8c0ef2464e	Revert md_assert_preempt() introduction. Discussed with: jeff, jhb	2011-05-04 20:29:40 +00:00
Attilio Rao	94ebcddde3	MFC	2011-05-03 18:57:46 +00:00
John Baldwin	6162795be0	Enable the new PCI-PCI bridge driver on amd64 and i386 by default. It can be disabled via 'nooptions NEW_PCIB'.	2011-05-03 18:23:11 +00:00
John Baldwin	83c41143ca	Reimplement how PCI-PCI bridges manage their I/O windows. Previously the driver would verify that requests for child devices were confined to any existing I/O windows, but the driver relied on the firmware to initialize the windows and would never grow the windows for new requests. Now the driver actively manages the I/O windows. This is implemented by allocating a bus resource for each I/O window from the parent PCI bus and suballocating that resource to child devices. The suballocations are managed by creating an rman for each I/O window. The suballocated resources are mapped by passing the bus_activate_resource() call up to the parent PCI bus. Windows are grown when needed by using bus_adjust_resource() to adjust the resource allocated from the parent PCI bus. If the adjust request succeeds, the window is adjusted and the suballocation request for the child device is retried. When growing a window, the rman_first_free_region() and rman_last_free_region() routines are used to determine if the front or end of the existing I/O window is free. From using that, the smallest ranges that need to be added to either the front or back of the window are computed. The driver will first try to grow the window in whichever direction requires the smallest growth first followed by the other direction if that fails. Subtractive bridges will first attempt to satisfy requests for child resources from I/O windows (including attempts to grow the windows). If that fails, the request is passed up to the parent PCI bus directly however. The PCI-PCI bridge driver will try to use firmware-assigned ranges for child BARs first and only allocate a "fresh" range if that specific range cannot be accommodated in the I/O window. This allows systems where the firmware assigns resources during boot but later wipes the I/O windows (some ACPI BIOSen are known to do this) to "rediscover" the original I/O window ranges. The ACPI Host-PCI bridge driver has been adjusted to correctly honor hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge makes a wildcard request for an I/O window range. The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option is enabled. This is a transition aid to allow platforms that do not yet support bus_activate_resource() and bus_adjust_resource() in their Host-PCI bridge drivers (and possibly other drivers as needed) to use the old driver for now. Once all platforms support the new driver, the kernel option and old driver will be removed. PR: kern/143874 kern/149306 Tested by: mav	2011-05-03 17:37:24 +00:00
Attilio Rao	7be8a2de4f	MFC @ r221324	2011-05-02 14:23:36 +00:00
John Baldwin	d2c9344ff9	Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver, generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge drivers.	2011-05-02 14:13:12 +00:00
Bernhard Schmidt	d1f25d5dcb	Add the remaining wireless drivers. Discussed with: joel	2011-05-01 13:26:34 +00:00
Attilio Rao	f1edea81ac	Add the function md_assert_nopreempt(), which is a very consistent function on the possibility of a thread to not preempt. As this function is very tied to x86 (interrupts disabled checkings) it is not intended to be used in MI code.	2011-04-30 23:12:37 +00:00
Kevin Lo	5aaea65247	Add urtw(4)	2011-04-29 06:36:39 +00:00
Jung-uk Kim	c34e9dbee1	Define "Hypervisor Present" bit. This bit is used by several hypervisors to identify CPUs running under emulation. Currently QEMU-KVM, Xen-HVM, VMware, and MS Hyper-V are known to set this bit. MFC after: 3 days	2011-04-28 22:23:39 +00:00
Attilio Rao	2be767e069	Add the watchdogs patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let watchdog fire, if they take too long. I implemented the stubs this way because I really want wdog_kern_* KPI to not be dependent on SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also avoid calling them when not necessary (avoiding involuntary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks	2011-04-28 16:02:05 +00:00
Rick Macklem	4309e17add	This patch changes head so that the default NFS client is now the new NFS client (which I guess is no longer experimental). The fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs". Although mounts via fstype "nfs" will usually work without userland changes, an updated mount_nfs(8) binary is needed for kernels built with "options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and mount(8) binaries are needed to do mounts for fstype "oldnfs". The GENERIC kernel configs have been changed to use options NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER. For kernels being used on diskless NFS root systems, "options NFSCL" must be in the kernel config. Discussed on freebsd-fs@.	2011-04-27 17:51:51 +00:00
Alexander Motin	0d307e0905	- Add shim to simplify migration to the CAM-based ATA. For each new adaX device in /dev/ create symbolic link with adY name, trying to mimic old ATA numbering. Imitation is not complete, but should be enough in most cases to mount file systems without touching /etc/fstab. - To know what behavior to mimic, restore ATA_STATIC_ID option in cases where it was present before. - Add some more details to UPDATING.	2011-04-26 17:01:49 +00:00
Maxim Sobolev	f30bc1f3b5	With the typical memory size of the system in tens of gigabytes counting memory being dumped in 16MB increments is somewhat silly. Especially if the dump fails and everything you've got for debugging is screen filled with numbers in 16 decrements... Replace that with percentage-based progress with max 10 updates all fitting into one line. Collapse other very "useful" piece of crash information (total ram) into the same line to save some more space. MFC after: 1 week	2011-04-26 16:14:55 +00:00
Rick Macklem	7c208ed659	Fix the experimental NFS client so that it does not bogusly set the f_flags field of "struct statfs". This had the interesting effect of making the NFSv4 mounts "disappear" after r221014, since NFSMNT_NFSV4 and MNT_IGNORE became the same bit. Move the files used for a diskless NFS root from sys/nfsclient to sys/nfs in preparation for them to be used by both NFS clients. Also, move the declaration of the three global data structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c so that they are defined when either client uses them. Reviewed by: jhb MFC after: 2 weeks	2011-04-25 22:22:51 +00:00
Alexander Motin	97b53e3634	Switch the GENERIC kernels for all architectures to the new CAM-based ATA stack. It means that all legacy ATA drivers are disabled and replaced by respective CAM drivers. If you are using ATA device names in /etc/fstab or other places, make sure to update them respectively (adX -> adaY, acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential numbers for each type in order of detection, unless configured otherwise with tunables, see cam(4)). ataraid(4) functionality is now supported by the RAID GEOM class. To use it you can load geom_raid kernel module and use graid(8) tool for management. Instead of /dev/arX device names, use /dev/raid/rX.	2011-04-24 08:58:58 +00:00
Konstantin Belousov	3136faa59d	Make pmap_invalidate_cache_range() available for consumption on amd64. Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU cache for the set of pages, which are not necessarily mapped. Since its supposed use is to prepare the move of the pages ownership to a device that does not snoop all CPU accesses to the main memory (read GPU in GMCH), do not rely on CPU self-snoop feature. amd64 implementation takes advantage of the direct map. On i386, extract the helper pmap_flush_page() from pmap_page_set_memattr(), and use it to make a temporary mapping of the flushed page. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2011-04-18 21:24:42 +00:00
Jung-uk Kim	0e72764232	Add a function rdtsc32() to read lower 32 bits from TSC and discard upper 32 bits. Sometimes the compiler inserts unnecessary instructions to preserve unused upper 32 bits even when it is cast to a 32-bit value. It reduces such compiler mistakes where every cycle counts.	2011-04-14 16:53:32 +00:00
Jung-uk Kim	4854ae249c	Consistently use __volatile as the rest of this file.	2011-04-14 16:19:41 +00:00
Jung-uk Kim	f5ac47f44c	Prefer C99 standard integers to reduce diff from i386 version.	2011-04-14 16:14:35 +00:00
Jung-uk Kim	a7817c7ae5	Reduce errors in effective frequency calculation.	2011-04-12 23:49:07 +00:00
Jung-uk Kim	b9e4376214	Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and MPERF MSRs are available. It was disabled in r216443. Remove the earlier hack to subtract 0.5% from the calibrated frequency as DELAY(9) is little bit more reliable now.	2011-04-12 23:04:01 +00:00
Jung-uk Kim	dd3e254ebd	Add forgotten declarations for tsc_perf_stat from the previous commit.	2011-04-12 22:22:01 +00:00
Jung-uk Kim	155094d77a	Probe capability to find effective frequency. When the TSC is P-state invariant, APERF/MPERF ratio can be used to find effective frequency.	2011-04-12 22:15:46 +00:00
Jung-uk Kim	3731174954	Add definitions for CPUID instruction 6, ECX information.	2011-04-12 22:12:23 +00:00
Konstantin Belousov	2140d9c83b	Remove setting of PCB_FULL_IRET at the places where we are going to call update_gdt_{f,g}sbase. The functions set the flag when td == curthread, and sysarch is always called with curthread. Reviewed by: jhb, jkim MFC after: 1 week	2011-04-08 21:27:31 +00:00
Konstantin Belousov	5ab73cbcba	Disable local interrupts before testing the PCB_FULL_IRET flag. Thread might be preempted after testing, which causes the flag to be cleared. If ast was not delivered, we will do sysret with potentially wrong fs/gs bases. Reviewed by: jhb, jkim MFC after: 1 week (together with r220430, r220452)	2011-04-08 21:26:50 +00:00
Ryan Stone	7d6a0bf373	Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi and machdep.kdb_on_nmi. Approved by: emaste (mentor) MFC after: 1 week	2011-04-08 14:39:41 +00:00
John Baldwin	13fb631aff	Fix a bug in the previous change to restore the fast path for syscall return. The ast() function may cause a context switch in which case PCB_FULL_IRET would be set in the pcb. However, the code was not rechecking the flag after ast() returned and would not properly restore the FSBASE and GSBASE MSRs. To fix, recheck the PCB_FULL_IRET flag after ast() returns. While here, trim an instruction (and memory access) from the doreti path and fix a typo in a comment. MFC after: 1 week	2011-04-08 13:33:57 +00:00
John Baldwin	ff265077cf	Catch up to PCB_FULL_IRET becoming a pcb flag rather than a full field. MFC after: 3 days	2011-04-08 13:30:48 +00:00
Jung-uk Kim	3453537fa5	Use atomic load & store for TSC frequency. It may be overkill for amd64 but safer for i386 because it can be easily over 4 GHz now. Worse, it can be easily changed by user with 'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly). Note it is intentionally not used in performance critical paths to avoid performance regression (but we should, in theory). Alternatively, we may add "virtual TSC" with lower frequency if maximum frequency overflows 32 bits (and ignore possible incoherency as we do now).	2011-04-07 23:28:28 +00:00
John Baldwin	615d2dffa3	pcb_flags is an int, so use testl rather than testq. Pointy hat to: jhb Submitted by: jkim MFC after: 1 week	2011-04-07 23:13:22 +00:00
John Baldwin	1438b4ced1	If a system call does not request a full interrupt return, use a fast path via the sysretq instruction to return from the system call. This was removed in 190620 and not quite fully restored in 195486. This resolves most of the performance regression in system call microbenchmarks between 7 and 8 on amd64. Reviewed by: kib MFC after: 1 week	2011-04-07 21:32:25 +00:00
Jung-uk Kim	efd393d539	Remove stale checks for RDTSC support. amd64 must have TSC support anyway.	2011-04-07 21:29:34 +00:00
Konstantin Belousov	7332c129e0	Add support for executing the FreeBSD 1/i386 a.out binaries on amd64. In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64. The changes are hidden under COMPAT_43. MFC after: 1 month	2011-04-01 11:16:29 +00:00
Andriy Gapon	a930718af1	Revert r220032:linux compat: add SO_PASSCRED option with basic handling I have not properly thought through the commit. After r220031 (linux compat: improve and fix sendmsg/recvmsg compatibility) the basic handling for SO_PASSCRED is not sufficient as it breaks recvmsg functionality for SCM_CREDS messages because now we would need to handle sockcred data in addition to cmsgcred. And that is not implemented yet. Pointyhat to: avg	2011-03-31 08:14:51 +00:00
Adrian Chadd	dba9c85977	Break out the ath PCI logic into a separate device/module. Introduce the AHB glue for Atheros embedded systems. Right now it's hard-coded for the AR9130 chip whose support isn't yet in this HAL; it'll be added in a subsequent commit. Kernel configuration files now need both 'ath' and 'ath_pci' devices; both modules need to be loaded for the ath device to work.	2011-03-31 08:07:13 +00:00
Edward Tomasz Napierala	72bcfc9693	Revert part of r220137, committed by mistake - RACCT is _not_ supposed to be enabled in GENERIC.	2011-03-29 18:16:49 +00:00
Edward Tomasz Napierala	097055e26d	Add racct. It's an API to keep per-process, per-jail and per-loginclass resource accounting information, to be used by the new resource limits code. It's connected to the build, but the code that actually calls the new functions will come later. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-29 17:47:25 +00:00
Alan Cox	1c675a3bc3	The new binutils has correctly redefined MAXPAGESIZE on amd64 as 0x200000 instead of 0x100000. As a side effect, an amd64 kernel now loads at physical address 0x200000 instead of 0x100000. This is probably for the best because it avoids the use of a 2MB page mapping for the first 1MB of the kernel that also spans the fixed MTRRs. However, getmemsize() still thinks that the kernel loads at 0x100000, and so the physical memory between 0x100000 and 0x200000 is lost. Fix this problem by replacing the hard-wired constant in getmemsize() by a symbol "kernphys" that is defined by the linker script. In collaboration with: kib	2011-03-28 06:35:17 +00:00
Alan Cox	63740078e0	Amd64 doesn't have a lazypmap ipi.	2011-03-27 16:18:51 +00:00
Andriy Gapon	01a9e1a11b	linux compat: add SO_PASSCRED option with basic handling This seems to have been a part of a bigger patch by dchagin that either hasn't been committed or was committed partially. Submitted by: dchagin, nox MFC after: 2 weeks	2011-03-26 11:25:36 +00:00
Andriy Gapon	931f0826ea	linux compat: add non-dummy capget and capset system calls, regenerate And drop dummy definitions for those system calls. This may transiently break the build. PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 10:59:24 +00:00
Andriy Gapon	1f4ec5a3ba	linux compat: add non-dummy capget and capset system calls PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 10:51:56 +00:00
Dmitry Chagin	acface683e	Export the correct AT_PLATFORM value. Since signal trampolines are copied to the shared page do not need to leave place on the stack for it. Forgotten in the previous commit. MFC after: 1 Week	2011-03-26 09:25:35 +00:00
Alan Cox	1587dfd730	Move an external declaration to the appropriate header file.	2011-03-26 06:21:05 +00:00
Jung-uk Kim	cd45fec044	Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta CPUs. These CPUs need explicit MSR configuration to expose certain CPU capabilities (e.g., CMPXCHG8B) to work around compatibility issues with ancient software. Unfortunately, Rise mP6 does not set the CX8 bit in CPUID and there is no MSR to expose the feature although all mP6 processors are capable of CMPXCHG8B according to datasheets I found from the Net. Clean up and simplify VIA PadLock detection while I am in the neighborhood.	2011-03-26 02:02:07 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb	d2b74735b8	For now remove options FLOWTABLE from the remaining GENERIC kernel configurations and make it opt-in for those who want it. LINT will still build it. While it may be a perfect win in some scenarios, it still troubles users (see PRs) in general cases. In addition we are still allocating resources even if disabled by sysctl and still leak arp/nd6 entries in case of interface destruction. Discussed with: qingli (2010-11-24, just never executed) Discussed with: juli (OCTEON1) PR: kern/148018, kern/155604, kern/144917, kern/146792 MFC after: 2 weeks	2011-03-19 15:50:34 +00:00
Jung-uk Kim	38b8542ca9	Deprecate tsc_present as the last of its real consumers finally disappeared.	2011-03-15 17:19:52 +00:00
David Christensen	dd46ab31de	- Initial release of bxe(4) to support Broadcom NetXtreme II 10GbE. (BCM57710, BCM57711, BCM57711E) MFC after: One month	2011-03-14 22:42:41 +00:00
Dmitry Chagin	8f1e49a638	Enable shared page use for amd64/linux32 and i386/linux binaries. Move signal trampoline code from the top of the stack to the shared page. MFC after: 2 Weeks	2011-03-13 14:58:02 +00:00
Andriy Gapon	d549ef5638	add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls Regenerate system call and systrace support files. PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (earlier version) MFC after: 3 weeks	2011-03-12 08:58:19 +00:00
Andriy Gapon	56ede1074e	add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls This commit makes necessary changes in syscall/sysent generation infrastructure. PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (earlier version) MFC after: 3 weeks	2011-03-12 08:51:43 +00:00
Andriy Gapon	136882cf92	amd64/NOTES: use a greater number in KSTACK_PAGES example This is a minor cosmetic change - the users are more likely to want to increase (rather than decrease) default kernel stack size, which is already 4 pages on amd64. MFC after: 4 days	2011-03-11 19:21:42 +00:00
Matthew D Fleming	c77715ef6c	Mostly revert r219468, as I had misremembered the C standard regarding the size of an extern array. Keep one change from strncpy to strlcpy.	2011-03-11 18:56:55 +00:00
Jung-uk Kim	79422085d4	Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as a CPU ticker. Note tsc_present does not change by this tunable.	2011-03-11 00:44:32 +00:00
Matthew D Fleming	cd67ac41ae	Use MAXPATHLEN rather than the size of an extern array when copying the kernel name. Also consistently use strlcpy(). Suggested by: Warner Losh	2011-03-10 22:56:00 +00:00
Jung-uk Kim	bc34c87e81	Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because it is almost always used with tsc_freq any way.	2011-03-10 20:02:58 +00:00
Julian Elischer	a8066a9d3b	Add a small change to the comment in the GENERIC config files that include udbp Submitted by: Chris Forgron, cforgeron at acsi dot ca MFC after: 1 week	2011-03-09 17:15:11 +00:00
Dmitry Chagin	e5d81ef1b5	Extend struct sysvec with new method sv_schedtail, which is used for an explicit process at fork trampoline path instead of eventhandler(schedtail) invocation for each child process. Remove eventhandler(schedtail) code and change linux ABI to use newly added sysvec method. While here replace explicit comparing of module sysentvec structure with the newly created process sysentvec to detect the linux ABI. Discussed with: kib MFC after: 2 Week	2011-03-08 19:01:45 +00:00
Dmitry Chagin	372f5e052f	Remove dead code. MFC after: 1 Week	2011-03-07 08:12:07 +00:00
Alan Cox	cb25117d54	Make a change to the implementation of the direct map to improve performance on processors that support 1 GB pages. Specifically, if the end of physical memory is not aligned to a 1 GB page boundary, then map the residual physical memory with multiple 2 MB page mappings rather than a single 1 GB page mapping. When a 1 GB page mapping is used for this residual memory, access to the memory is slower than when multiple 2 MB page mappings are used. (I suspect that the reason for this slowdown is that the TLB is actually being loaded with 4 KB page mappings for the residual memory.) X-MFC after: r214425	2011-03-02 00:24:07 +00:00
Robert Watson	74b5505e5d	Continue to introduce Capsicum capability mode: White list sysarch calls allowed in capability mode; arguably, there should be some link between the capability mode model and the privilege model here. Sysarch is a morass similar to ioctl, in many senses. Submitted by: anderson Discussed with: benl, kris, pjd Sponsored by: Google, Inc. Obtained from: Capsicum Project MFC after: 3 months	2011-03-01 13:35:48 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Alan Cox	e6ffa21488	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
Dmitry Chagin	dc4f0a9e11	To avoid excessive code duplication create wrapper for fill regs from stack frame. Change the trap() code to use newly created function instead of explicit regs assignment.	2011-02-16 17:50:21 +00:00
Dmitry Chagin	09d6cb0a23	For realtime signals fill the sigval value.	2011-02-15 21:46:36 +00:00
Dmitry Chagin	fde6316272	Sort include files in the alphabetical order.	2011-02-13 19:07:48 +00:00
Dmitry Chagin	222198ab0b	Move linux_clone(), linux_fork(), linux_vfork() to a MI path.	2011-02-12 18:17:12 +00:00
Dmitry Chagin	c8d6845e9e	In preparation for moving linux_clone() to a MI path introduce linux_set_upcall_kse().	2011-02-12 16:33:00 +00:00
Dmitry Chagin	2c7660ba3e	In preparation for moving linux_clone () to a MI path move the TLS code in a separate function. Use function parameter instead of direct using register.	2011-02-12 15:50:21 +00:00
Dmitry Chagin	9bd9b52478	Regen for r218610.	2011-02-12 15:36:25 +00:00
Dmitry Chagin	f91ea2518b	The fourth argument of linux_clone is a pointer to the TLS. Change clone syscall definition to match actual linux one.	2011-02-12 15:33:25 +00:00
Konstantin Belousov	6f9ec5aab0	Clear the padding when returning context to the usermode, for MI ucontext_t and x86 MD parts. Kernel allocates the structures on the stack, and not clearing reserved fields and paddings causes leakage. Noted and discussed with: bde MFC after: 2 weeks	2011-02-05 15:10:27 +00:00
Matthew D Fleming	08b163fa51	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week	2011-02-02 16:35:10 +00:00
Dmitry Chagin	77192fddeb	Regen for r218101. MFC after: 1 Month.	2011-01-30 20:38:26 +00:00
Dmitry Chagin	8d73c2bfd1	Change linux futex syscall definition to match actual linux one. MFC after: 1 Month.	2011-01-30 20:31:43 +00:00
Dmitry Chagin	9adaae9403	The kern_wait() code already removes the SIGCHLD signal for the waited process. Removing other SIGCHLD signals is not needed and may cause problems. Pointed out by: jilles MFC after: 1 Month.	2011-01-30 18:17:38 +00:00
Dmitry Chagin	572fb2e33e	My style(9) bug. Pointed out by: kib MFC after: 1 Month.	2011-01-29 07:22:33 +00:00
Dmitry Chagin	adc7ece00a	Implement a variation of the linux_common_wait() which should be used by linuxolator itself. Move linux_wait4() to MD path as it requires native struct rusage translation to struct l_rusage on linux32/amd64. MFC after: 1 Month.	2011-01-28 18:47:07 +00:00
Dmitry Chagin	53c74fc607	To avoid excessive code duplication move struct rusage translation to a separate function. MFC after: 1 Month.	2011-01-28 18:28:06 +00:00
Konstantin Belousov	77185f473b	linux_sigreturn() loads the struct trapframe from l_sigcontext members, thus making a signed extension of 32 bit register context. If the register is not touched in usermode between return from signal and next syscall entry, the sign-extension part of 64bit register is not cleared, causing linux32_fetch_syscall_args() to read wrong values. Use unsigned type for the registers in the linux sigcontext. Reported by: Jacob Frelinger <jacob.frelinger duke edu>, arundel In collaboration with: dchagin MFC after: 1 week	2011-01-27 21:45:38 +00:00
Dmitry Chagin	a5c1afadeb	Add macro to test the sv_flags of any process. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures. MFC after: 1 month	2011-01-26 20:03:58 +00:00
Matthew D Fleming	f89f7ada8d	Set td_kstack_pages for thread0. This was already being done for most architectures, but i386 and amd64 were missing it. Submitted by: Mohd Fahadullah <mfahadullah AT isilon DOT com>	2011-01-26 17:06:13 +00:00
Sergey Kandaurov	4053b05b91	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe	2011-01-21 10:26:26 +00:00
Konstantin Belousov	de64ee1a30	Use CTLFLAG_RDTUN for read-only sysctl that exports tunable. Reminded by: pjd MFC after: 6 days	2011-01-19 21:35:48 +00:00
Konstantin Belousov	9e52b8b629	Make the length of the LDT a loader tunable, machdep.max_ldt_segment, and export it with read-only sysctl. Remove unused defines. Reviewed by: jhb (previous version) MFC after: 1 week	2011-01-18 23:00:22 +00:00
Konstantin Belousov	a05c98a099	Use malloc(9) instead of kmem_alloc(9) for temporary copy of the user-supplied descriptor array. Noted and reviewed by: jhb (previous version) MFC after: 1 week	2011-01-18 22:56:10 +00:00
John Baldwin	6bd823f334	- Remove some always-true checks (checking for unsigned < 0). - Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT when descriptors are provided. Specifically, allow the 'start == 0' and 'num == 0' special case used to free all LDT entries that previously failed with EINVAL. Submitted by: clang via rdivacky (some of 1) Reviewed by: kib	2011-01-18 16:43:01 +00:00
Jung-uk Kim	2fea643112	Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set(). Compile sys/dev/mem/memutil.c for all supported platforms and remove now unnecessary dev_mem_md_init(). Consistently define mem_range_softc from mem.c for all platforms. Add missing #include guards for machine/memdev.h and sys/memrange.h. Clean up some nearby style(9) nits. MFC after: 1 month	2011-01-17 22:58:28 +00:00
Jung-uk Kim	df74996c3d	Avoid preemption while manipulating CRs and MTRRs. Tested by: ariff	2011-01-17 17:30:35 +00:00
Jung-uk Kim	bdbf2db5b2	Remove redundant, bogus, and even harmful uses of setting TS bit in CR0. It is done from fpstate_drop() when it is really necessary. Reviewed by: kib MFC after: 1 week	2011-01-14 21:09:01 +00:00
Matthew D Fleming	240577c2a7	Fix up a few more sysctl(9) mis-typing found in various LINT builds.	2011-01-13 18:20:27 +00:00
John Baldwin	072e9838e2	If an interrupt on an I/O APIC is moved to a different CPU after it has started to execute, it seems that the corresponding ISR bit in the "old" local APIC can be cleared. This causes the local APIC interrupt routine to fail to find an interrupt to service. Rather than panic'ing in this case, simply return from the interrupt without sending an EOI to the local APIC. If there are any other pending interrupts in other ISR registers, the local APIC will assert a new interrupt. Tested by: steve	2011-01-13 17:00:22 +00:00
Matthew D Fleming	fbbb13f962	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the kernel changes.	2011-01-12 19:54:19 +00:00
Konstantin Belousov	50a57dfbec	Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out. Comments wording and reviewed by: alc	2011-01-09 12:50:44 +00:00
Tijl Coosemans	d22e78d6b9	Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98 headers with stubs. Approved by: kib (mentor)	2011-01-08 18:09:48 +00:00
Konstantin Belousov	6297a3d843	Create shared (readonly) page. Each ABI may specify the use of page by setting SV_SHP flag and providing pointer to the vm object and mapping address. Provide simple allocator to carve space in the page, tailored to put the code with alignment restrictions. Enable shared page use for amd64, both native and 32bit FreeBSD binaries. Page is private mapped at the top of the user address space, moving a start of the stack one page down. Move signal trampoline code from the top of the stack to the shared page. Reviewed by: alc	2011-01-08 16:13:44 +00:00
Tijl Coosemans	a56e818f29	On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and corresponding macros) are different from 32 bit. [1] Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX. Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition for (u)intmax_t. Do this on all architectures for consistency. Suggested by: bde [1] Approved by: kib (mentor)	2011-01-08 12:43:05 +00:00
Tijl Coosemans	9858863cd4	Fix types of some values in machine/_limits.h. On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int. However, lacking integer suffixes for types smaller than int, their type should correspond to that of an object of type unsigned char (or short) when used in an expression with objects of type int. In that case unsigned char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and USHRT_MAX should also be int. Where MIN/MAX constants implicitly have the correct type the suffix has been removed. While here, correct some comments. Reviewed by: bde Approved by: kib (mentor)	2011-01-08 11:13:34 +00:00
Konstantin Belousov	39198f15ee	Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the initial stack protection set by the kernel image activator.	2011-01-07 14:22:34 +00:00
Jung-uk Kim	50e3cec377	Increase size of pcb_flags to four bytes. Requested by: bde, jhb	2010-12-22 19:57:03 +00:00
Jung-uk Kim	e6c006d96a	Improve PCB flags handling and make it more robust. Add two new functions for manipulating pcb_flags. These inline functions are very similar to atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK prefix for SMP. Add comments about the rationale[1]. Use these functions wherever possible. Although there are some places where it is not strictly necessary (e.g., a PCB is copied to create a new PCB), it is done across the board for sake of consistency. Turn pcb_full_iret into a PCB flag as it is safe now. Move rarely used fields before pcb_flags and reduce size of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in the neighborhood. Reviewed by: kib Submitted by: kib[1] MFC after: 2 months	2010-12-22 00:18:42 +00:00
Tijl Coosemans	81bd5041a2	Merge amd64 and i386 bus.h and move the resulting header to x86. Replace the original amd64 and i386 headers with stubs. Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere. Reviewed by: imp (previous version), jhb Approved by: kib (mentor)	2010-12-20 16:39:43 +00:00
Konstantin Belousov	7222d2fbee	Inform a compiler which asm statements in the x86 implementation of atomics change eflags. Reviewed by: jhb MFC after: 2 weeks	2010-12-18 16:41:11 +00:00
Jung-uk Kim	e1c9d39ebe	Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This function always returned the nominal frequency instead of current frequency because we use RDTSC instruction to calculate difference in CPU ticks, which is supposedly constant for the case. Now we support cpu_get_nominal_mhz() for the case, instead. Note it should be just enough for most usage cases because cpu_est_clockrate() is often times abused to find maximum frequency of the processor.	2010-12-14 20:07:51 +00:00
Robert Watson	9c9f06e60d	Add options NO_ADAPTIVE_SX to the XENHVM kernel configuration, matching its similar disabling of adaptive mutexes and rwlocks. The existing comment on why this is the case also applies to sx locks. MFC after: 3 days Discussed with: attilio	2010-12-13 12:15:46 +00:00
Konstantin Belousov	60c7c84e85	In fpudna()/npxdna(), mark FPU context initialized and optionally mark user FPU context initialized, if current context is user context. It was reversed in r215865, by inadequate change of this code fragment to a call to fpuuserinited()/npxuserinited(). The issue is only relevant for in-kernel users of FPU. Reported by: Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net> Tested by: Mike Tancsa MFC after: 3 days	2010-12-12 16:16:39 +00:00
Robert Watson	996177338f	Derive the XENHVM kernel from GENERIC, adding only the options required to support PV drivers (such as xenpci), and non-adaptive locking (along with a comment about why). This change eliminates the synchronisation problem between GENERIC and XENHVM, which had become severely rotted in HEAD, and in 8-STABLE included non-production kernel debugging features such as WITNESS. However, it comes at the cost of enabling devices and options that may not be present under Xen (such as random ethernet cards). For now, opt for a simpler kernel configuration file rather than using nooptions/ nodevice to enumerate and eliminate them. This leads to a somewhat larger XENHVM kernel. This is an MFC candidate for 8-STABLE before 8.2, in order to provide a production-worthy XENHVM kernel configuration for amd64. Discussed with: gibbs, cperciva Reported by: Piete Brooks <Piete.Brooks at cl.cam.ac.uk> Sponsored by: DARPA, AFRL MFC after: 3 days	2010-12-10 22:22:01 +00:00
Colin Percival	91ff9dc058	Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c (which are identical) with a single x86/x86/busdma_machdep.c.	2010-12-09 06:41:50 +00:00
Jung-uk Kim	71e0b05797	Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC. Remove a confusing comment about converting to MHz as we never did.	2010-12-08 23:40:41 +00:00
Colin Percival	af60888734	On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192, while on i386 we have MAX_BPAGES=512. Implement this difference via '#ifdef __i386__'. With this commit, the i386 and amd64 busdma_machdep.c files become identical; they will soon be replaced by a single file under sys/x86.	2010-12-08 20:20:10 +00:00
Colin Percival	ec195da48a	MFi386 r1.94: If XEN, make pmap_kextract = pmap_kextract_ma. This is a no-op currently, since FreeBSD/amd64 doesn't have (paravirtualized) Xen support, but if/when that support is ever added we'll want this, and until then it's harmless.	2010-12-08 19:52:04 +00:00
Colin Percival	81261a5a6d	MFi386 r1.81, r1.82, r1.84: Reorganize code to reduce cache pressure and branch mispredictions. No objections from: scottl	2010-12-08 19:42:21 +00:00
Jung-uk Kim	dd7d207dcb	Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86. Discussed with: avg	2010-12-08 00:09:24 +00:00
Jung-uk Kim	7214d5d75b	Remove stale comments about P-state invariant TSC and fix style(9) nits.	2010-12-07 22:43:25 +00:00
Jung-uk Kim	1bcc28295b	Do not register an event handler for CPU frequency changes when it is found P-state invariant. This is continuation of r216274.	2010-12-07 22:34:51 +00:00
Jung-uk Kim	4a9c4056dc	Now the P-state invariant TSC is probed early enough, do not register event handlers for CPU frequency changes when it is found P-state invariant. Adjust a comment about non-existent tsc_freq_max() while I am here.	2010-12-07 22:23:26 +00:00
Jung-uk Kim	78a661bbaa	Probe P-state invariant TSC from rightful place.	2010-12-07 22:12:02 +00:00
Konstantin Belousov	1b3c32568a	Update some comments related to use of amd64 full context switch. In exec_linux_setregs(), use locally cached pointer to pcb to set pcb_full_iret. In set_regs(), note that full return is needed when code that sets segment registers is enabled. MFC after: 1 week	2010-12-07 12:44:33 +00:00
Konstantin Belousov	0f0170e66a	Retire write-only PCB_FULLCTX pcb flag on amd64. Reminded by: Petr Salinger <Petr.Salinger seznam cz> Tested by: pho MFC after: 1 week	2010-12-07 12:17:43 +00:00
Konstantin Belousov	3e0ddb6781	Do not leak %rdx value in the previous image to the new image after execve(2). Note that ia32 binaries already handle this properly, since ia32_setregs() resets td_retval[1], but not exec_setregs(). We still do not conform to the amd64 ABI specification, since %rsp on the image startup is not aligned to 16 bytes. PR: amd64/124134 Discussed with: Petr Salinger <Petr.Salinger seznam cz> (who convinced me that there are indeed several bugs) MFC after: 1 week	2010-12-06 15:15:27 +00:00
Jung-uk Kim	2f7ab7e85d	Revert r216161. It is not necessary because we zero-fill BSS anyway. Requested by: jhb	2010-12-03 22:27:51 +00:00
Jung-uk Kim	b14fe63392	Explicitly initialize TSC frequency. To calibrate TSC frequency, we use DELAY(9) and it may use TSC in turn if TSC frequency is non-zero. MFC after: 3 days	2010-12-03 21:54:10 +00:00
Jung-uk Kim	e391a266ed	Do not change CPU ticker frequency if TSC is P-state invariant. Note this change was meant to be committed with r184102 (and its subsequent MFCs) but it fell off somehow. Pointyhat to: jkim MFC after: 3 days	2010-12-03 21:06:30 +00:00
Rebecca Cran	c90f7d9b44	Revert r216134. This checkin broke platforms where bus_space are macros: they need to be a single statement, and do { } while (0) doesn't work in this situation so revert until a solution can be devised.	2010-12-03 07:09:23 +00:00
Rebecca Cran	15b4888a24	Disallow passing in a count of zero bytes to the bus_space(9) functions. Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM causes a crash/hang since the 'loop' instruction decrements the counter before checking if it's zero. PR: kern/80980 Discussed with: jhb	2010-12-02 22:19:30 +00:00
Konstantin Belousov	c6fb218c3c	Calling fill_fpregs() for curthread is legitimate, and ELF coredump does this. Reported and tested by: pho MFC after: 5 days	2010-11-28 17:56:34 +00:00
Alan Cox	686b00d691	Make the size of the direct map easily configurable. Changing NDMPML4E now suffices. Increase the size of the direct map to 1TB. An earlier version of this patch was tested by sbruno@.	2010-11-26 19:36:26 +00:00
Konstantin Belousov	5c6eb03790	Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs() functions, they are unused. Remove 'user' from npxgetuserregs() etc. names. For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU context storage. This eliminates the need for ugly copying with overwrite of the newly added and reserved fields in ucontext on i386 to satisfy alignment requirements for fpusave() and fpurstor(). pc98 version was copied from i386. Suggested and reviewed by: bde Tested by: pho (i386 and amd64) MFC after: 1 week	2010-11-26 14:50:42 +00:00
Tijl Coosemans	ce4ec51dbe	Merge amd64/i386 _align.h by aligning on the size of register_t (copied from powerpc). Reviewed by: imp, jhb Approved by: kib (mentor)	2010-11-26 10:59:20 +00:00
Ulrich Spörlein	02604cd4f4	Remove kernel support for BB profiling, now that kernbb(8) is gone, too. PR: bin/83558 Reviewed by: jkim	2010-11-26 08:11:43 +00:00
Dimitry Andric	1496505287	Apply the same fix as in r215823 to sys/amd64/amd64/fpu.c: use unambiguous inline assembly to load a float variable.	2010-11-25 22:19:40 +00:00
Dimitry Andric	cfe92f33bc	Change ambiguous (or invalid, depending on how strict you want to be :) assembly instruction "movw %rcx,2(%rax)" to "movw %cx,2(%rax)", since the intent was to move 16 bits of data, in this case. Found by: clang Reviewed by: kib	2010-11-24 18:35:11 +00:00
Jung-uk Kim	d2d0fda841	Remove a stale tunable introduced in r215703.	2010-11-23 17:28:23 +00:00
Jung-uk Kim	42ca4a29de	Reinitialize PAT MSR via pmap_init_pat() while resuming. This function does better job since r215703 and it is safer now.	2010-11-23 16:12:35 +00:00
Andriy Gapon	9b984feb3d	specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX CPUID.6 is defined as Thermal and Power Management Leaf by both Intel and AMD. Reviewed by: jhb MFC after: 7 days	2010-11-23 13:55:30 +00:00
Jung-uk Kim	7dd052c1d9	- Disable caches and flush caches/TLBs when we update PAT as we do for MTRR. Flushing TLBs is required to ensure cache coherency according to the AMD64 architecture manual. Flushing caches is only required when changing from a cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or UC-). Since this function is only used once per processor during startup, there is no need to take any shortcuts. - Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC. Program 5 as WP (from default WT) and 6 as WC (from default UC-). Leave 4 and 7 at the default of WB and UC. This is to avoid transition from a cacheable memory type to an uncacheable type to minimize possible cache incoherency. Since we perform flushing caches and TLBs now, this change may not be necessary any more but we do not want to take any chances. - Remove Apple hardware specific quirks. With the above changes, it seems this hack is no longer needed. - Improve pmap_cache_bits() with an array to map PAT memory type to index. This array is initialized early from pmap_init_pat(), so that we do not need to handle special cases in the function any more. Now this function is identical on both amd64 and i386. Reviewed by: jhb Tested by: RM (reuf_m at hotmail dot com) Ryszard Czekaj (rychoo at freeshell dot net) army.of.root (army dot of dot root at googlemail dot com) MFC after: 3 days	2010-11-22 19:52:44 +00:00
Andriy Gapon	b43d292565	specialreg.h: add definitions for MPERF/APERF pair of MSRs These MSRs can be used to determine actual (average) performance as compared to a maximum defined performance. Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both Intel and AMD processors. MFC after: 5 days	2010-11-19 15:07:36 +00:00
Andriy Gapon	7af7c7624a	specialreg.h: add AMD-specific "Hardware Configuration Register" MSR It seems that this MSR has been available in a range of AMD processors families for quite a while now. Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in the i386 version. Note2: perhaps some additional name component is needed to distinguish AMD-specific MSRs. MFC after: 5 days	2010-11-19 15:00:20 +00:00
Andriy Gapon	8fd6d51347	specialreg.h: add definition for AMD Core Performance Boost bit This bit indicates availability of the feature. MFC after: 4 days	2010-11-19 14:46:17 +00:00
Jung-uk Kim	816b3bd1b0	Restore CR0 after MTRR initialization for correctness' sake. There will be no noticeable change because we enable caches before we enter here for both BSP and AP cases. Remove another pointless optimization for CR4.PGE bit while I am here.	2010-11-16 23:26:02 +00:00
Jung-uk Kim	50083a5624	Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this code but probably it only worked by chance because modifying CR4.PGE bit causes invalidation of entire TLBs. Since these are very rare events, this micro-optimization seems useless. Reviewed by: jhb	2010-11-16 22:44:58 +00:00
Konstantin Belousov	7022f954c3	Do not use __FreeBSD_version prefix for the special osrel version. The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix. Reported and tested by: swell.k gmail com MFC after: 1 week	2010-11-14 21:59:11 +00:00
Konstantin Belousov	94bce4535d	Use symbolic names instead of hardcoding values for magic p_osrel constants. MFC after: 1 week	2010-11-14 18:24:12 +00:00
Jung-uk Kim	19da400c64	Move identical copies of apm_bios.h to sys/x86/include, replace them with stubs, and adjust PC98 stub accordingly. Reviewed by: imp, nyan	2010-11-11 19:36:21 +00:00
Andriy Gapon	290e14f881	amd64: introduce minidump version 2 After KVA space was increased to 512GB on amd64 it became impractical to use PTEs as entries in the minidump map of dumped pages, because size of that map alone would already be 1GB. Instead, we now use PDEs as page map entries and employ two stage lookup in libkvm: virtual address -> PDE -> PTE -> physical address. PTEs are now dumped as regular pages. Fixed page map size now is 2MB. libkvm keeps support for accessing amd64 minidumps of version 1. Support for 1GB pages is added. Many thanks to Alan Cox for his guidance, numerous reviews, suggestions, enhancements and corrections. Reviewed by: alc [kernel part] MFC after: 15 days	2010-11-11 18:35:28 +00:00
Jung-uk Kim	93a8847473	Make APM emulation look closer to its origin. Use device_get_softc(9) instead of hardcoding acpi(4) unit number as we have device_t for it.	2010-11-10 18:50:12 +00:00
Jung-uk Kim	7c2bf852d7	Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new file acpi_apm.c, and place it on sys/x86/acpica.	2010-11-10 01:29:56 +00:00
John Baldwin	961135ead8	- Remove <machine/mutex.h>. Most of the headers were empty, and the contents of the ones that were not empty were stale and unused. - Now that <machine/mutex.h> no longer exists, there is no need to allow it to override various helper macros in <sys/mutex.h>. - Rename various helper macros for low-level operations on mutexes to live in the _mtx_* or __mtx_* namespaces. While here, change the names to more closely match the real API functions they are backing. - Drop support for including <sys/mutex.h> in assembly source files. Suggested by: bde (1, 2)	2010-11-09 20:46:41 +00:00
Attilio Rao	fcb250f392	Move the mptable.h under x86/include/. Sponsored by: Sandvine Incorporated MFC after: 14 days	2010-11-09 20:28:09 +00:00
Jung-uk Kim	cedd86cafa	Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home.	2010-11-09 00:27:18 +00:00
Jung-uk Kim	2473325fa8	Reduce diff between platforms and fix style(9) bugs.	2010-11-09 00:14:39 +00:00
John Baldwin	13e25cb7a5	Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is identical on both platforms.	2010-11-08 20:57:02 +00:00
John Baldwin	f67b4bd367	A few small style and whitespace fixes.	2010-11-08 20:05:22 +00:00
Alan Cox	d9a799683c	Don't call pmap_demote_DMAP() on MTRR entries from the BIOS that are marked as "bogus". Reported by: Jia-Shiun Li	2010-11-07 21:48:49 +00:00
John Baldwin	0108cce0a4	Adjust the order of operations in spinlock_enter() and spinlock_exit() to work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers. In cooperation with: bde MFC after: 1 month	2010-11-05 13:42:58 +00:00
Andriy Gapon	3b50d59fef	x86 topo_probe: do not probe smp topology if only one cpu is visible This could lead to a division by zero if hardware is multi-core and/or multi-threaded, but for some (quite unusual) reason FreeBSD sees only one logical processor. This could happen, for example, if neither MADT nor MP Table are presented by BIOS. Also: - assert in topo_probe_0x4 that BSP is accounted for - neither cpu_cores nor cpu_logical should be zero after successful probing, so either being zero is an indication of failed probing Reported by: vwe, Dan Allen <danallen46@airwired.net> Tested by: Dan Allen <danallen46@airwired.net> MFC after: 3 days	2010-11-04 08:51:45 +00:00
John Baldwin	32c3d3b6e6	Move <machine/apicreg.h> to <x86/apicreg.h>.	2010-11-01 18:18:46 +00:00
John Baldwin	5ecdb3c46b	Move the <machine/mca.h> header to <x86/mca.h>.	2010-11-01 17:40:35 +00:00
Alan Cox	2eeee67ce8	Add another safety belt to pmap_demote_DMAP().	2010-10-30 23:49:37 +00:00
Alan Cox	59fb2d9b04	Don't demote in pmap_demote_DMAP() if the specified length is zero.	2010-10-30 17:21:32 +00:00
Attilio Rao	ba2a27351b	Merge nexus.c from amd64 and i386 to x86 subtree. Sponsored by: Sandvine Incorporated Tested by: gianni	2010-10-28 16:31:39 +00:00
John Baldwin	89d84a4055	Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[] is only populated for multiple-CPU machines. This also matches what the code does when SMP is not enabled. PR: bin/151616 Tested by: "Damian S. Kolodziejczyk" damkol | gmail Submitted by: avg MFC after: 1 week	2010-10-28 13:44:19 +00:00
Attilio Rao	a3da97926d	Merge the mptable support from MD bits to x86 subtree. Sponsored by: Sandvine Incorporated Discussed with: jhb	2010-10-28 07:58:06 +00:00
Alan Cox	92ababa777	[1] According to the x86 architectural specifications, no virtual-to- physical page mapping should span two or more MTRRs of different types. Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can ensure that the direct map region doesn't have such a mapping. [2] Fix a couple of nearby style errors in amd64_mrset(). [3] Re-enable the use of 1GB page mappings for implementing the direct map. (See also r197580 and r213897.) Tested by: kib@ on a Westmere-family processor [3] MFC after: 3 weeks	2010-10-27 16:46:37 +00:00
Attilio Rao	256439c972	Merge dump_machdep.c i386/amd64 under the x86 subtree. Sponsored by: Sandvine Incorporated Tested by: gianni	2010-10-26 12:46:26 +00:00
John Baldwin	0689bdcc19	Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned by intr_disable(). Requested by: bde	2010-10-25 15:31:13 +00:00
John Baldwin	c6390f7ac5	Use intr_disable() and intr_restore() instead of frobbing the flags register directly to disable interrupts. Reviewed by: bde (earlier version) MFC after: 2 weeks	2010-10-25 15:28:03 +00:00
Alan Cox	353b642ced	Update pmap_extract() to handle 1GB page mappings. Some device drivers use pmap_extract() rather than pmap_kextract() on direct map addresses. Thus, pmap_extract() needs to be able to deal with 1GB page mappings if we are to use 1GB page mappings for the direct map. (See r197580.)	2010-10-15 15:23:34 +00:00
Jung-uk Kim	56b11f84a7	Remove trailing ", " from `sysctl machdep.idle_available' output.	2010-10-12 20:53:12 +00:00
Konstantin Belousov	78ae4338a2	Add macro DECLARE_MODULE_TIED to denote a module as requiring the kernel of exactly the same __FreeBSD_version as the headers module was compiled against. Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules use kernel interfaces that the Release Engineering Team feel are not stable enough to guarantee they will not change during the life cycle of a STABLE branch. In particular, the layout of struct sysentvec is declared to be not part of the STABLE KBI. Discussed with: bz, rwatson Approved by: re (bz, kensmith) MFC after: 2 weeks	2010-10-12 09:18:17 +00:00
Konstantin Belousov	b3b4bec7e6	Regen.	2010-10-08 07:19:05 +00:00
Konstantin Belousov	5d2a6a61b4	Fix typo. Submitted by: arundel MFC after: 3 days	2010-10-08 07:18:44 +00:00
Konstantin Belousov	3f506a78ce	Display PCID capability of CPU and add CPUID define for it. MFC after: 1 week	2010-10-05 15:31:56 +00:00
Konstantin Belousov	2d5db3709b	The makectx() function, used by kdb_trap() to reconstruct pcb from trap frame when trap initiated kdb entry, incorrectly calculated the value of %rsp for trapped thread. According to Intel(R) 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1, rev. 035, 6.14.2 64-Bit Mode Stack Frame, "64-bit mode ... pushes SS:RSP unconditionally, rather than only on a CPL change." Even assuming the conditional push of the %ss:%rsp, the calculation was still wrong because sizeof(tf_ss) + sizeof(tf_rsp) == 16 on amd64. Always use the tf_rsp from trap frame. The change supposedly fixes stepping when using kgdb backend for kdb. Submitted by: Zhouyi Zhou <zhouzhouyi gmail com> PR: amd64/151167 Reviewed by: avg MFC after: 1 week	2010-10-03 13:52:17 +00:00
Andriy Gapon	d443a96ffb	i386 and amd64 mp_machdep: improve topology detection for Intel CPUs This patch is significantly based on previous work by jkim. List of changes: - added comments that describe topology uniformity assumption - added reference to Intel Processor Topology Enumeration article - documented a few global variables that describe topology - retired weirdly set and used logical_cpus variable - changed fallback code for mp_ncpus > 0 case, so that CPUs are treated as being different packages rather than cores in a single package - moved AMD-specific code to topo_probe_amd [jkim] - in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT and core masks and match APIC IDs against those masks [started by jkim] - in topo_probe_0x4() drop code for double-checking topology parameters by looking at L1 cache properties [jkim] - in topo_probe_0xb() add fallback path to topo_probe_0x4() as prescribed by Intel [jkim] Still to do: - prepare for upcoming AMD CPUs by using new mechanism of uniform topology description [pointed by jkim] - probe cache topology in addition to CPU topology and probably use that for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have the same CPU topology, but Athlon cores do not share L2 cache while Core2's do (no L3 cache in both cases) - think of supporting non-uniform topologies if they are ever implemented for platforms in question - think how to better describe old HTT vs new HTT distinction, HTT vs SMT can be confusing as SMT is a generic term - more robust code for marking CPUs as "logical" and/or "hyperthreaded", use HTT mask instead of modulo operation - correct support for halting logical and/or hyperthreaded CPUs, let scheduler know that it shouldn't schedule any threads on those CPUs PR: kern/145385 (related) In collaboration with: jkim Tested by: Sergey Kandaurov <pluknet@gmail.com>, Jeremy Chadwick <freebsd@jdc.parodius.com>, Chip Camden <sterling@camdensoftware.com>, Steve Wills <steve@mouf.net>, Olivier Smedts <olivier@gid0.org>, Florian Smeets <flo@smeets.im> MFC after: 1 month	2010-10-01 10:32:54 +00:00
Neel Natu	5c1a8dc028	Fix bogus error message from bus_dmamem_alloc() about incorrect alignment. The check for alignment should be made against the physical address and not the virtual address that maps it. Sponsored by: NetApp Submitted by: Will McGovern (will at netapp dot com) Reviewed by: mjacob, jhb	2010-09-29 21:53:11 +00:00
David Xu	295fbd498e	Now userland POSIX semaphore is based on umtx. The kernel module is only used for binary compatibility; if you want to run old binaries, you need to kldload the module.	2010-09-24 09:04:16 +00:00
Norikatsu Shigemura	cbf4dac64f	Add support 'device tpm' for amd64. Add tpm(4)'s default setting to /boot/defaults/loader.conf. Add 'device tpm' to NOTES for amd64 and i386. Discussed with: takawata Approved by: imp (mentor)	2010-09-19 14:40:37 +00:00
Andriy Gapon	0b750af1b1	amd64: reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory KVA space is abundant on amd64, so there is no reason to limit kernel map size to a fraction of available physical memory. In fact, it could be larger than physical memory. This should help with memory auto-tuning for ZFS and shouldn't affect other workloads. This should reduce number of circumstances for "kmem_map too small" panics, but probably won't eliminate them entirely due to potential kmem fragmentation. In fact, you might want/need to limit maximum ARC size after this commit if you need to reserve more memory for applications. This change was discussed on arch@ and nobody said "don't do it". MFC after: 6 weeks	2010-09-17 07:36:32 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generated at full rate of hz + stathz to fulfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this option is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generated. As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instructions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerpc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Kenneth D. Merry	d3c7b9a08a	MFp4 (//depot/projects/mps/...) Bring in a driver for the LSI Logic MPT2 6Gb SAS controllers. This driver supports basic I/O, and works with SAS and SATA drives and expanders. Basic error recovery works (i.e. timeouts and aborts) as well. Integrated RAID isn't supported yet, and there are some known bugs. So this isn't ready for production use, but is certainly ready for testing and additional development. For the moment, new commits to this driver should go into the FreeBSD Perforce repository first (//depot/projects/mps/...) and then get merged into -current once they've been vetted. This has only been added to the amd64 GENERIC, since that is the only architecture I have tested this driver with. Submitted by: scottl Discussed with: imp, gibbs, will Sponsored by: Yahoo, Spectra Logic Corporation	2010-09-10 15:03:56 +00:00
Andriy Gapon	3d844eddb7	bus_add_child: change type of order parameter to u_int This reflects actual type used to store and compare child device orders. Change is mostly done via a Coccinelle (soon to be devel/coccinelle) semantic patch. Verified by LINT+modules kernel builds. Followup to: r212213 MFC after: 10 days	2010-09-10 11:19:03 +00:00
Roman Divacky	27d4fea6c5	Change the parameter passed to the inline assembly to u_short as we are dealing with 16bit segment registers. Change mov to movw. Approved by: rpaulo (mentor) Reviewed by: kib, rink	2010-09-03 14:25:17 +00:00
Jung-uk Kim	305c5c0acb	Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use these values in the function.	2010-08-30 21:19:42 +00:00
Rui Paulo	cba3269417	Register an interrupt vector for DTrace return probes. There is some code missing in lapic to make sure that we don't overwrite this entry, but this will be done on a subsequent commit. Sponsored by: The FreeBSD Foundation	2010-08-28 08:03:29 +00:00
Rui Paulo	0bc1991a4a	Call the necessary DTrace function pointers when we have different kinds of traps. Sponsored by: The FreeBSD Foundation	2010-08-25 09:10:32 +00:00

5992 Commits