freebsd-dev

Author	SHA1	Message	Date
Attilio Rao	447274a88b	MFC	2011-05-15 15:47:16 +00:00
Henrik Brix Andersen	149d1c897e	Add I2C bus driver for the AMD Geode LX series CS5536 Companion Device. Reviewed by: jhb (newbus bits only), adrian	2011-05-15 14:01:23 +00:00
Attilio Rao	b2aa562e7b	MFC	2011-05-13 20:58:48 +00:00
Matthew D Fleming	cfb00e5aa7	Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk. Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware. Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment). Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit. Requested by: alc MFC after: 1 week MFC with: r221853	2011-05-13 19:35:01 +00:00
Attilio Rao	739e31f6d7	MFC	2011-05-13 15:20:57 +00:00
Alexander Motin	167aee3895	Refactor Xen PV code to use the new event timers subsystem. It uses a one-shot Xen timer and a time counter to provide one-shot and periodic time events. In my tests this reduces the idle interrupt rate to about 30Hz, and according to Xen VM Manager it reduces host CPU load by a factor of three compared to the previous periodic 100Hz clock. Also now, when needed, it is possible to increase HZ rate without useless CPU burning during idle periods. Now only ia64 and some ARMs remain to be migrated to the new event timers.	2011-05-13 12:39:37 +00:00
Attilio Rao	ef607a6aa3	MFC	2011-05-12 14:01:40 +00:00
Jung-uk Kim	00c885e181	Add SC_PIXEL_MODE to GENERIC for amd64 and i386. Requested by: many	2011-05-10 16:44:16 +00:00
Attilio Rao	bd55ede060	MFC	2011-05-09 18:53:13 +00:00
Jung-uk Kim	65e7d70b09	Implement boot-time TSC synchronization test for SMP. This test is executed when the user has indicated that the system has synchronized TSCs or it has P-state invariant TSCs. For the former case, we may clear the tunable if it fails the test to prevent accidental foot-shooting. For the latter case, we may set it if it passes the test to notify the user that it may be usable.	2011-05-09 17:34:00 +00:00
Attilio Rao	b9f714be9f	MFC	2011-05-07 23:34:14 +00:00
Alexander Motin	d96fd07637	Don't use MWAIT for short sleeps under XEN, as it was before r212541. This fixes panic during boot in PV mode on Xen 3.2.	2011-05-07 12:27:25 +00:00
Attilio Rao	aa8b9e0706	MFC	2011-05-06 22:45:33 +00:00
Andriy Gapon	fdf30d59a6	prepare code that does topology detection for amd cpus for bulldozer This also introduces a new detection path for family 10h and newer pre-bulldozer cpus, pre-10h hardware should not be affected. Tested by: Gary Jennejohn <gljennjohn@googlemail.com> (with pre-10h hardware) MFC after: 2 weeks	2011-05-06 13:51:54 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for more than 32 CPUs (the limit today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t, on the other hand, is implemented as an array of longs, and is easily extensible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures other than amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - the size of cpuset_t objects differs between kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from userland please refer to the example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Attilio Rao	8c0ef2464e	Revert md_assert_preempt() introduction. Discussed with: jeff, jhb	2011-05-04 20:29:40 +00:00
Attilio Rao	94ebcddde3	MFC	2011-05-03 18:57:46 +00:00
John Baldwin	6162795be0	Enable the new PCI-PCI bridge driver on amd64 and i386 by default. It can be disabled via 'nooptions NEW_PCIB'.	2011-05-03 18:23:11 +00:00
John Baldwin	83c41143ca	Reimplement how PCI-PCI bridges manage their I/O windows. Previously the driver would verify that requests for child devices were confined to any existing I/O windows, but the driver relied on the firmware to initialize the windows and would never grow the windows for new requests. Now the driver actively manages the I/O windows. This is implemented by allocating a bus resource for each I/O window from the parent PCI bus and suballocating that resource to child devices. The suballocations are managed by creating an rman for each I/O window. The suballocated resources are mapped by passing the bus_activate_resource() call up to the parent PCI bus. Windows are grown when needed by using bus_adjust_resource() to adjust the resource allocated from the parent PCI bus. If the adjust request succeeds, the window is adjusted and the suballocation request for the child device is retried. When growing a window, the rman_first_free_region() and rman_last_free_region() routines are used to determine if the front or end of the existing I/O window is free. Using that information, the smallest ranges that need to be added to either the front or back of the window are computed. The driver will first try to grow the window in whichever direction requires the smallest growth, followed by the other direction if that fails. Subtractive bridges will first attempt to satisfy requests for child resources from I/O windows (including attempts to grow the windows). If that fails, however, the request is passed directly up to the parent PCI bus. The PCI-PCI bridge driver will try to use firmware-assigned ranges for child BARs first and only allocate a "fresh" range if that specific range cannot be accommodated in the I/O window. This allows systems where the firmware assigns resources during boot but later wipes the I/O windows (some ACPI BIOSen are known to do this) to "rediscover" the original I/O window ranges. The ACPI Host-PCI bridge driver has been adjusted to correctly honor hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge makes a wildcard request for an I/O window range. The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option is enabled. This is a transition aid to allow platforms that do not yet support bus_activate_resource() and bus_adjust_resource() in their Host-PCI bridge drivers (and possibly other drivers as needed) to use the old driver for now. Once all platforms support the new driver, the kernel option and old driver will be removed. PR: kern/143874 kern/149306 Tested by: mav	2011-05-03 17:37:24 +00:00
Attilio Rao	171c7d9bf6	MFC	2011-05-02 22:03:30 +00:00
Bernhard Schmidt	13c98eb780	All PCI-based wireless drivers seem to be explicitly removed from the PAE kernel config; do the same for those added to GENERIC recently.	2011-05-02 16:51:02 +00:00
Attilio Rao	7be8a2de4f	MFC @ r221324	2011-05-02 14:23:36 +00:00
John Baldwin	d2c9344ff9	Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver, generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge drivers.	2011-05-02 14:13:12 +00:00
Attilio Rao	ef6146b9a3	- Merge a fix fixup for the last lazyfix removal - Sync xen with i386 about the ipi_send_cpu() usage	2011-05-02 13:56:47 +00:00
Bernhard Schmidt	d1f25d5dcb	Add the remaining wireless drivers. Discussed with: joel	2011-05-01 13:26:34 +00:00
Attilio Rao	a4823f2d0c	Remove unused typedef.	2011-05-01 00:08:13 +00:00
Attilio Rao	f1edea81ac	Add the function md_assert_nopreempt(), which asserts that the current thread cannot be preempted. As this function is closely tied to x86 (it checks whether interrupts are disabled), it is not intended to be used in MI code.	2011-04-30 23:12:37 +00:00
Attilio Rao	9734077245	Remove the support for lazy cr3 switching from i386. amd64 has already this micro-optimization removed. Submitted by: kib	2011-04-30 23:02:17 +00:00
Kevin Lo	5aaea65247	Add urtw(4)	2011-04-29 06:36:39 +00:00
Jung-uk Kim	c34e9dbee1	Define "Hypervisor Present" bit. This bit is used by several hypervisors to identify CPUs running under emulation. Currently QEMU-KVM, Xen-HVM, VMware, and MS Hyper-V are known to set this bit. MFC after: 3 days	2011-04-28 22:23:39 +00:00
Attilio Rao	2be767e069	Add watchdog patting during (shutdown-time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let the watchdog fire if they take too long. I implemented the stubs this way because I really want the wdog_kern_* KPI not to be dependent on SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also to avoid calling them when not necessary (avoiding involuntary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks	2011-04-28 16:02:05 +00:00
Rick Macklem	4309e17add	This patch changes head so that the default NFS client is now the new NFS client (which I guess is no longer experimental). The fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs". Although mounts via fstype "nfs" will usually work without userland changes, an updated mount_nfs(8) binary is needed for kernels built with "options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and mount(8) binaries are needed to do mounts for fstype "oldnfs". The GENERIC kernel configs have been changed to use options NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER. For kernels being used on diskless NFS root systems, "options NFSCL" must be in the kernel config. Discussed on freebsd-fs@.	2011-04-27 17:51:51 +00:00
Alexander Motin	0d307e0905	- Add shim to simplify migration to the CAM-based ATA. For each new adaX device in /dev/, create a symbolic link with an adY name, trying to mimic the old ATA numbering. The imitation is not complete, but should be enough in most cases to mount file systems without touching /etc/fstab. - To know what behavior to mimic, restore the ATA_STATIC_ID option in cases where it was present before. - Add some more details to UPDATING.	2011-04-26 17:01:49 +00:00
Rick Macklem	7c208ed659	Fix the experimental NFS client so that it does not bogusly set the f_flags field of "struct statfs". This had the interesting effect of making the NFSv4 mounts "disappear" after r221014, since NFSMNT_NFSV4 and MNT_IGNORE became the same bit. Move the files used for a diskless NFS root from sys/nfsclient to sys/nfs in preparation for them to be used by both NFS clients. Also, move the declaration of the three global data structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c so that they are defined when either client uses them. Reviewed by: jhb MFC after: 2 weeks	2011-04-25 22:22:51 +00:00
Alexander Motin	97b53e3634	Switch the GENERIC kernels for all architectures to the new CAM-based ATA stack. It means that all legacy ATA drivers are disabled and replaced by respective CAM drivers. If you are using ATA device names in /etc/fstab or other places, make sure to update them accordingly (adX -> adaY, acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential numbers for each type in order of detection, unless configured otherwise with tunables, see cam(4)). ataraid(4) functionality is now supported by the RAID GEOM class. To use it you can load the geom_raid kernel module and use the graid(8) tool for management. Instead of /dev/arX device names, use /dev/raid/rX.	2011-04-24 08:58:58 +00:00
Jung-uk Kim	51268821a9	Do not invoke resume event handlers if suspend was successful. Pointy hat to: jkim	2011-04-19 16:30:17 +00:00
Jung-uk Kim	ba40504144	Add suspend/resume event handlers for apm(4) as well.	2011-04-19 16:20:55 +00:00
Konstantin Belousov	3136faa59d	Make pmap_invalidate_cache_range() available for consumption on amd64. Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU cache for the set of pages, which are not necessarily mapped. Since its supposed use is to prepare the move of the pages' ownership to a device that does not snoop all CPU accesses to the main memory (read GPU in GMCH), do not rely on the CPU self-snoop feature. The amd64 implementation takes advantage of the direct map. On i386, extract the helper pmap_flush_page() from pmap_page_set_memattr(), and use it to make a temporary mapping of the flushed page. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2011-04-18 21:24:42 +00:00
Jung-uk Kim	0e72764232	Add a function rdtsc32() to read the lower 32 bits of the TSC and discard the upper 32 bits. Sometimes the compiler inserts unnecessary instructions to preserve the unused upper 32 bits even when the result is cast to a 32-bit value. This avoids such compiler mistakes where every cycle counts.	2011-04-14 16:53:32 +00:00
Jung-uk Kim	4854ae249c	Consistently use __volatile as the rest of this file.	2011-04-14 16:19:41 +00:00
Jung-uk Kim	f54c13ea44	Consistently use C99 standard integers as the rest of this file.	2011-04-14 16:02:52 +00:00
Jung-uk Kim	a7817c7ae5	Reduce errors in effective frequency calculation.	2011-04-12 23:49:07 +00:00
Jung-uk Kim	b9e4376214	Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and MPERF MSRs are available. It was disabled in r216443. Remove the earlier hack to subtract 0.5% from the calibrated frequency as DELAY(9) is a little more reliable now.	2011-04-12 23:04:01 +00:00
Jung-uk Kim	dd3e254ebd	Add forgotten declarations for tsc_perf_stat from the previous commit.	2011-04-12 22:22:01 +00:00
Jung-uk Kim	155094d77a	Probe capability to find effective frequency. When the TSC is P-state invariant, APERF/MPERF ratio can be used to find effective frequency.	2011-04-12 22:15:46 +00:00
Jung-uk Kim	3731174954	Add definitions for CPUID instruction 6, ECX information.	2011-04-12 22:12:23 +00:00
Ryan Stone	7d6a0bf373	Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi and machdep.kdb_on_nmi. Approved by: emaste (mentor) MFC after: 1 week	2011-04-08 14:39:41 +00:00
Jung-uk Kim	3453537fa5	Use atomic load & store for TSC frequency. It may be overkill for amd64 but safer for i386 because it can easily be over 4 GHz now. Worse, it can easily be changed by the user with the 'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly). Note it is intentionally not used in performance-critical paths to avoid performance regressions (but we should, in theory). Alternatively, we may add a "virtual TSC" with lower frequency if the maximum frequency overflows 32 bits (and ignore possible incoherency as we do now).	2011-04-07 23:28:28 +00:00
Jung-uk Kim	d521c6b9c4	Implement atomic_load_acq_64(9) and atomic_store_rel_64(9) for i386. These functions are implemented with the CMPXCHG8B instruction where it is available, i.e., all Pentium-class and later processors. Note this instruction is also used for atomic_store_rel_64() because a simple XCHG-like instruction for 64-bit memory access does not exist, unfortunately. If the processor lacks the instruction, i.e., 80486-class CPUs, two 32-bit loads/stores are performed with interrupts temporarily disabled, assuming it does not support SMP. Although this assumption may be a little naive, it is true in reality. This implementation is inspired by Linux.	2011-04-06 23:59:59 +00:00
Edward Tomasz Napierala	1ba5ad4210	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-05 20:23:59 +00:00
Jung-uk Kim	57af65d401	Use cpu_ticks() for get_cyclecount(9) rather than checking existence of TSC at run-time on i386. cpu_ticks() is set to use RDTSC early enough on i386 where it is available. Otherwise, cpu_ticks() is driven by the current timecounter hardware as binuptime(9) does. This also avoids unnecessary namespace pollution from <machine/cputypes.h>.	2011-04-04 22:56:33 +00:00
Andriy Gapon	a930718af1	Revert r220032:linux compat: add SO_PASSCRED option with basic handling I have not properly thought through the commit. After r220031 (linux compat: improve and fix sendmsg/recvmsg compatibility) the basic handling for SO_PASSCRED is not sufficient as it breaks recvmsg functionality for SCM_CREDS messages because now we would need to handle sockcred data in addition to cmsgcred. And that is not implemented yet. Pointyhat to: avg	2011-03-31 08:14:51 +00:00
Adrian Chadd	dba9c85977	Break out the ath PCI logic into a separate device/module. Introduce the AHB glue for Atheros embedded systems. Right now it's hard-coded for the AR9130 chip whose support isn't yet in this HAL; it'll be added in a subsequent commit. Kernel configuration files now need both 'ath' and 'ath_pci' devices; both modules need to be loaded for the ath device to work.	2011-03-31 08:07:13 +00:00
Andriy Gapon	01a9e1a11b	linux compat: add SO_PASSCRED option with basic handling This seems to have been a part of a bigger patch by dchagin that either hasn't been committed or was committed only partially. Submitted by: dchagin, nox MFC after: 2 weeks	2011-03-26 11:25:36 +00:00
Andriy Gapon	931f0826ea	linux compat: add non-dummy capget and capset system calls, regenerate And drop dummy definitions for those system calls. This may transiently break the build. PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 10:59:24 +00:00
Andriy Gapon	1f4ec5a3ba	linux compat: add non-dummy capget and capset system calls PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 10:51:56 +00:00
Dmitry Chagin	acface683e	Export the correct AT_PLATFORM value. Since signal trampolines are copied to the shared page, there is no need to leave room for them on the stack. Forgotten in the previous commit. MFC after: 1 Week	2011-03-26 09:25:35 +00:00
Jung-uk Kim	cd45fec044	Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta CPUs. These CPUs need explicit MSR configuration to expose certain CPU capabilities (e.g., CMPXCHG8B) to work around compatibility issues with ancient software. Unfortunately, Rise mP6 does not set the CX8 bit in CPUID and there is no MSR to expose the feature although all mP6 processors are capable of CMPXCHG8B according to datasheets I found on the Net. Clean up and simplify VIA PadLock detection while I am in the neighborhood.	2011-03-26 02:02:07 +00:00
Alan Cox	e9a3f7852d	Modestly increase the maximum allowed size of the kmem map on i386. Also, express this new maximum as a fraction of the kernel's address space size rather than a constant so that increasing KVA_PAGES will automatically increase this maximum. As a side-effect of this change, kern.maxvnodes will automatically increase by a proportional amount. While I'm here ensure that this change doesn't result in an unintended increase in maxpipekva on i386. Calculate maxpipekva based upon the size of the kernel address space and the amount of physical memory instead of the size of the kmem map. The memory backing pipes is not allocated from the kmem map. It is allocated from its own submap of the kernel map. In short, it has no real connection to the kmem map. (In fact, the commit messages for the maxpipekva auto-sizing talk about using the kernel map size, cf. r117325 and r117391, even though the implementation actually used the kmem map size.) Although the calculation is now done differently, the resulting value for maxpipekva should remain almost the same on i386. However, on amd64, the value will be reduced by 2/3. This is intentional. The recent change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had the unnecessary side-effect of increasing maxpipekva. This change is effectively restoring maxpipekva on amd64 to its prior value. Eliminate init_param3() since it is no longer used.	2011-03-23 16:38:29 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb	d2b74735b8	For now remove options FLOWTABLE from the remaining GENERIC kernel configurations and make it opt-in for those who want it. LINT will still build it. While it may be a perfect win in some scenarios, it still troubles users (see PRs) in general cases. In addition we are still allocating resources even if disabled by sysctl and still leak arp/nd6 entries in case of interface destruction. Discussed with: qingli (2010-11-24, just never executed) Discussed with: juli (OCTEON1) PR: kern/148018, kern/155604, kern/144917, kern/146792 MFC after: 2 weeks	2011-03-19 15:50:34 +00:00
Jung-uk Kim	2ffa4044e9	Rework r219679. Always check CPU class at run-time to make it predictable. Unfortunately, it pulls in <machine/cputypes.h> but it is small enough and namespace pollution is minimal, I hope. Pointed out by: bde Pointy hat: jkim	2011-03-16 16:09:08 +00:00
Jung-uk Kim	1f5cdd5a99	Partially revert r219672. After r198295, the kernel needs to seed randomness as soon as possible for the stack protector. However, the dummy timecounter does not have enough entropy and we don't need to sacrifice Pentium-class and later CPUs. Pointed out by: Maxim Dounin (mdounin at mdounin dot ru)	2011-03-15 21:45:10 +00:00
Jung-uk Kim	b2b9331c44	Remove tsc_present from this file, really.	2011-03-15 18:09:29 +00:00
Jung-uk Kim	38b8542ca9	Deprecate tsc_present as the last of its real consumers finally disappeared.	2011-03-15 17:19:52 +00:00
Jung-uk Kim	d8ea2a492e	Unconditionally use binuptime(9) for get_cyclecount(9) on i386. Since this function is almost exclusively used for random harvesting, there is no need for micro-optimization. Adjust the manual page accordingly.	2011-03-15 17:14:26 +00:00
Jung-uk Kim	eb14346a8e	Make get_cyclecount(9) a little more useful where binuptime(9) is used.	2011-03-14 23:30:14 +00:00
David Christensen	dd46ab31de	- Initial release of bxe(4) to support Broadcom NetXtreme II 10GbE. (BCM57710, BCM57711, BCM57711E) MFC after: One month	2011-03-14 22:42:41 +00:00
Dmitry Chagin	8f1e49a638	Enable shared page use for amd64/linux32 and i386/linux binaries. Move signal trampoline code from the top of the stack to the shared page. MFC after: 2 Weeks	2011-03-13 14:58:02 +00:00
Andriy Gapon	d549ef5638	add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls Regenerate system call and systrace support files. PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (earlier version) MFC after: 3 weeks	2011-03-12 08:58:19 +00:00
Andriy Gapon	56ede1074e	add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls This commit makes necessary changes in syscall/sysent generation infrastructure. PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (earlier version) MFC after: 3 weeks	2011-03-12 08:51:43 +00:00
Jung-uk Kim	79422085d4	Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as a CPU ticker. Note tsc_present does not change by this tunable.	2011-03-11 00:44:32 +00:00
Jung-uk Kim	cf0d2bb216	Detect NSC/AMD Geode SC1100 properly, not just Stepping 0. Although it is unclear whether the "TSC stops ticking with HLT instruction" problem is present with other steppings, it is limited to Stepping 0 for now.	2011-03-10 22:20:11 +00:00
Jung-uk Kim	bc34c87e81	Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because tsc_is_broken is almost always used together with tsc_freq anyway.	2011-03-10 20:02:58 +00:00
Julian Elischer	a8066a9d3b	Add a small change to the comment in the GENERIC config files that include udbp Submitted by: Chris Forgron, cforgeron at acsi dot ca MFC after: 1 week	2011-03-09 17:15:11 +00:00
Dmitry Chagin	e5d81ef1b5	Extend struct sysvec with new method sv_schedtail, which is called explicitly on the fork trampoline path instead of invoking eventhandler(schedtail) for each child process. Remove eventhandler(schedtail) code and change linux ABI to use newly added sysvec method. While here replace explicit comparing of module sysentvec structure with the newly created process sysentvec to detect the linux ABI. Discussed with: kib MFC after: 2 Week	2011-03-08 19:01:45 +00:00
Robert Watson	74b5505e5d	Continue to introduce Capsicum capability mode: White list sysarch calls allowed in capability mode; arguably, there should be some link between the capability mode model and the privilege model here. Sysarch is a morass similar to ioctl, in many senses. Submitted by: anderson Discussed with: benl, kris, pjd Sponsored by: Google, Inc. Obtained from: Capsicum Project MFC after: 3 months	2011-03-01 13:35:48 +00:00
John Baldwin	4cef260f42	Fix whitespace nit.	2011-02-22 14:58:14 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Alan Cox	e6ffa21488	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
Dmitry Chagin	dc4f0a9e11	To avoid excessive code duplication create wrapper for fill regs from stack frame. Change the trap() code to use newly created function instead of explicit regs assignment.	2011-02-16 17:50:21 +00:00
Dmitry Chagin	09d6cb0a23	For realtime signals fill the sigval value.	2011-02-15 21:46:36 +00:00
Dmitry Chagin	fde6316272	Sort include files in the alphabetical order.	2011-02-13 19:07:48 +00:00
Dmitry Chagin	222198ab0b	Move linux_clone(), linux_fork(), linux_vfork() to a MI path.	2011-02-12 18:17:12 +00:00
Dmitry Chagin	c8d6845e9e	In preparation for moving linux_clone() to a MI path introduce linux_set_upcall_kse().	2011-02-12 16:33:00 +00:00
Dmitry Chagin	2c7660ba3e	In preparation for moving linux_clone() to a MI path, move the TLS code into a separate function. Use a function parameter instead of using the register directly.	2011-02-12 15:50:21 +00:00
Dmitry Chagin	9bd9b52478	Regen for r218610.	2011-02-12 15:36:25 +00:00
Dmitry Chagin	f91ea2518b	The fourth argument of linux_clone is a pointer to the TLS. Change clone syscall definition to match actual linux one.	2011-02-12 15:33:25 +00:00
Alan Cox	02e5228ca0	Setting VV_TEXT here is redundant. It is already set by do_execve(). Reviewed by: kib	2011-02-09 18:45:33 +00:00
Konstantin Belousov	b17ef03604	Fix linking of the kernel without device npx. MFC after: 2 weeks	2011-02-05 15:37:10 +00:00
Konstantin Belousov	6f9ec5aab0	Clear the padding when returning context to the usermode, for MI ucontext_t and x86 MD parts. Kernel allocates the structures on the stack, and not clearing reserved fields and paddings causes leakage. Noted and discussed with: bde MFC after: 2 weeks	2011-02-05 15:10:27 +00:00
Matthew D Fleming	08b163fa51	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week	2011-02-02 16:35:10 +00:00
Dmitry Chagin	77192fddeb	Regen for r218101. MFC after: 1 Month.	2011-01-30 20:38:26 +00:00
Dmitry Chagin	8d73c2bfd1	Change linux futex syscall definition to match actual linux one. MFC after: 1 Month.	2011-01-30 20:31:43 +00:00
Dmitry Chagin	9adaae9403	The kern_wait() code already removes the SIGCHLD signal for the waited process. Removing other SIGCHLD signals is not needed and may cause problems. Pointed out by: jilles MFC after: 1 Month.	2011-01-30 18:17:38 +00:00
Dmitry Chagin	adc7ece00a	Implement a variation of the linux_common_wait() which should be used by linuxolator itself. Move linux_wait4() to MD path as it requires native struct rusage translation to struct l_rusage on linux32/amd64. MFC after: 1 Month.	2011-01-28 18:47:07 +00:00
Dmitry Chagin	a5c1afadeb	Add macro to test the sv_flags of any process. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures. MFC after: 1 month	2011-01-26 20:03:58 +00:00
Matthew D Fleming	f89f7ada8d	Set td_kstack_pages for thread0. This was already being done for most architectures, but i386 and amd64 were missing it. Submitted by: Mohd Fahadullah <mfahadullah AT isilon DOT com>	2011-01-26 17:06:13 +00:00
Sergey Kandaurov	4053b05b91	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe	2011-01-21 10:26:26 +00:00
Jung-uk Kim	fd240d6d9f	Fix yet another fallout from r208833. VM86 BIOS call may cause page fault when FPU is in use. Reported by: Marc UBM Bocklet (ubm dot freebsd at googlemail dot com) Tested by: b. f. (bf1783 at googlemail dot com) MFC after: 3 days	2011-01-19 17:09:07 +00:00
Konstantin Belousov	55aabb7fd1	For architectures not using a direct map, and requiring a real KVA page for sf buf allocation, use wakeup() instead of wakeup_one() to notify sf buffer waiters about a free buffer. sf_buf_alloc() calls msleep(PCATCH) when the SFB_CATCH flag was given, and for simultaneous wakeup and signal delivery, msleep() returns EINTR/ERESTART even though the thread was selected for wakeup_one(). As a result, we lose a wakeup, and some other waiter will not be woken up. Reported and tested by: az Reviewed by: alc, jhb MFC after: 1 week	2011-01-18 21:57:02 +00:00
John Baldwin	6bd823f334	- Remove some always-true checks (checking for unsigned < 0). - Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT when descriptors are provided. Specifically, allow the 'start == 0' and 'num == 0' special case used to free all LDT entries that previously failed with EINVAL. Submitted by: clang via rdivacky (some of 1) Reviewed by: kib	2011-01-18 16:43:01 +00:00
Jung-uk Kim	2fea643112	Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set(). Compile sys/dev/mem/memutil.c for all supported platforms and remove now unnecessary dev_mem_md_init(). Consistently define mem_range_softc from mem.c for all platforms. Add missing #include guards for machine/memdev.h and sys/memrange.h. Clean up some nearby style(9) nits. MFC after: 1 month	2011-01-17 22:58:28 +00:00
Jung-uk Kim	df74996c3d	Avoid preemption while manipulating CRs and MTRRs. Tested by: ariff	2011-01-17 17:30:35 +00:00
John Baldwin	072e9838e2	If an interrupt on an I/O APIC is moved to a different CPU after it has started to execute, it seems that the corresponding ISR bit in the "old" local APIC can be cleared. This causes the local APIC interrupt routine to fail to find an interrupt to service. Rather than panic'ing in this case, simply return from the interrupt without sending an EOI to the local APIC. If there are any other pending interrupts in other ISR registers, the local APIC will assert a new interrupt. Tested by: steve	2011-01-13 17:00:22 +00:00
Konstantin Belousov	50a57dfbec	Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out. Comments wording and reviewed by: alc	2011-01-09 12:50:44 +00:00
Tijl Coosemans	d22e78d6b9	Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98 headers with stubs. Approved by: kib (mentor)	2011-01-08 18:09:48 +00:00
Tijl Coosemans	a56e818f29	On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and corresponding macros) are different from 32 bit. [1] Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX. Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition for (u)intmax_t. Do this on all architectures for consistency. Suggested by: bde [1] Approved by: kib (mentor)	2011-01-08 12:43:05 +00:00
Tijl Coosemans	d942996baf	On 32 bit architectures define (u)int64_t as (unsigned) long long instead of (unsigned) int __attribute__((__mode__(__DI__))). This aligns better with macros such as (U)INT64_C, (U)INT64_MAX, etc. which assume (u)int64_t has type (unsigned) long long. The mode attribute was used because long long wasn't standardised until C99. Nowadays compilers should support long long and use of the mode attribute is discouraged according to GCC Internals documentation. The type definition has to be marked with __extension__ to support compilation with "-std=c89 -pedantic". Discussed with: bde Approved by: kib (mentor)	2011-01-08 11:47:55 +00:00
Tijl Coosemans	9858863cd4	Fix types of some values in machine/_limits.h. On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int. However, lacking integer suffixes for types smaller than int, their type should correspond to that of an object of type unsigned char (or short) when used in an expression with objects of type int. In that case unsigned char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and USHRT_MAX should also be int. Where MIN/MAX constants implicitly have the correct type the suffix has been removed. While here, correct some comments. Reviewed by: bde Approved by: kib (mentor)	2011-01-08 11:13:34 +00:00
Tijl Coosemans	911127a0d6	Remove unused support for 64 bit long on 32 bit architectures. It was used mainly to discover and fix some 64-bit portability problems before 64-bit arches were widely available. Discussed with: bde Approved by: kib (mentor)	2011-01-07 22:57:31 +00:00
Konstantin Belousov	39198f15ee	Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the initial stack protection set by the kernel image activator.	2011-01-07 14:22:34 +00:00
John Baldwin	c305730dc0	Remove bogus usage of INTR_FAST. "Fast" interrupts are now indicated by registering a filter handler rather than a threaded handler. Also remove a bogus use of INTR_MPSAFE for a filter.	2011-01-06 21:08:06 +00:00
Colin Percival	b5e61aab00	Spell CRITICAL_ASSERT correctly. Submitted by: jhb MFC with: r216944	2011-01-04 16:29:07 +00:00
Colin Percival	aa829cef6a	Add hamfisted locking to the Xen/PV pmap code: Only allow one thread to be in {pmap_pinit, pmap_copy, pmap_release} at a time. This reduces the rate of panics when running 'make index' from ~0.6/hour to ~0.02/hour (p < 10^-30). At a later date this locking will be removed, and for this reason, it is wrapped in #ifdef HAMFISTED_LOCKING; this temporary hack is being put in place with the intention of shipping somewhat-stable Xen bits in FreeBSD 8.2-RELEASE. PR: kern/153672 MFC after: 3 days	2011-01-04 15:55:15 +00:00
Robert Watson	2913e88c91	Make "options XENHVM" compile for i386, not just amd64 -- a largely mechanical change. This opens the door for using PV device drivers under Xen HVM on i386, as well as more general harmonisation of i386 and amd64 Xen support in FreeBSD. Reviewed by: cperciva MFC after: 3 weeks	2011-01-04 14:49:54 +00:00
Colin Percival	d01b2dad73	Adjust the critical section protecting _xen_flush_queue to cover the entire range where the page mapping request queue needs to be atomically examined and modified. Oddly, while this doesn't seem to affect the overall rate of panics (running 'make index' on EC2 t1.micro instances, there are 0.6 +/- 0.1 panics per hour, both before and after this change), it eliminates vm_fault from panic backtraces, leaving only backtraces going through vmspace_fork.	2011-01-04 00:16:38 +00:00
Colin Percival	aaaf607148	Make i386_set_ldt work on i386/XEN, step 5/5. When cleaning up a thread, reset its LDT to the default LDT. Note: Casting the LDT pointer to an int and storing it in pc_currentldt is wildly bogus, but is harmless since pc_currentldt is a write-only variable. MFC after: 3 days	2010-12-31 17:42:25 +00:00
Colin Percival	698cc19d6b	Make i386_set_ldt work on i386/XEN, step 4/5. Use xen_update_descriptor to update the LDT rather than bcopy. Under Xen, pages used for holding LDTs must be read-only, so we can't make the change ourselves. The obvious alternative of "remap the page read-write, make the change, then map it read-only again" doesn't work since Xen won't allow an LDT page to be remapped as R/W. An arguably better solution is used by NetBSD: They don't modify LDTs in-place at all, but instead copy the entire LDT, modify the new version, then atomically swap. MFC after: 3 days	2010-12-31 17:41:14 +00:00
Colin Percival	de187b8df2	Make i386_set_ldt work on i386/XEN, step 3/5. Synchronize reality with comment: The user_ldt_alloc function is supposed to return with dt_lock held. Due to broken locking in i386/xen/pmap.c, we drop dt_lock during the call to pmap_map_readonly and then pick it up again; this can be removed once the Xen pmap locking is fixed. MFC after: 3 days	2010-12-31 17:40:30 +00:00
Colin Percival	90b7d33458	Make i386_set_ldt work on i386/XEN, step 2/5. Don't map physical to machine page numbers in pte_load_store, since it uses PT_SET_VA (which takes a physical page number and converts it to a machine page number). MFC after: 3 days	2010-12-31 17:39:58 +00:00
Colin Percival	d262f2dcfc	Make i386_set_ldt work on i386/XEN, step 1/5. Lock the vm page queue mutex around calls to pte_store. As with many other uses of the vm page queue mutex in i386/xen/pmap.c, this is bogus and needs to be replaced at some future date by a spin lock dedicated to protecting the queue of pending xen page mapping hypervisor calls. (But for now, bogus locking is better than a panic.) MFC after: 3 days	2010-12-31 17:39:31 +00:00
Pyun YongHyeon	2608aefc0b	Add driver for DM&P Vortex86 RDC R6040 Fast Ethernet. The controller is commonly found on DM&P Vortex86 x86 SoC. The driver supports all hardware features except flow control. The flow control was intentionally disabled due to silicon bug. DM&P Electronics, Inc. provided all necessary information including sample board to write driver and answered many questions I had. Many thanks for their support of FreeBSD. H/W donated by: DM&P Electronics, Inc.	2010-12-31 00:21:41 +00:00
Warner Losh	714cf6c0df	Revert r216777, per jhb@	2010-12-28 22:45:29 +00:00
Warner Losh	1977f3f168	Comment out npx and isa from NOTES file. We don't need them here since DEFAULTS already pulls them in.	2010-12-28 21:22:08 +00:00
Warner Losh	78b92d19e0	Remove mem, io, isa and npx since they are duplicative of the entries in DEFAULTS. Saves 8 lines of warnings when we build XBOX.	2010-12-28 21:20:58 +00:00
Colin Percival	4a416f8375	Remove a "not strictly correct" (and panic-inducing) workaround for a bug which doesn't seem to exist. PR: kern/141328 MFC after: 3 days	2010-12-28 14:36:32 +00:00
Colin Percival	76c9650713	Build the modules which can be built. The excluded modules fall into two categories: Those which can't build with PAE because they attempt to cast a pointer to a bus_addr_t (mostly scsi drivers); and those which can't be built with XEN because they conflict with something in xen-os.h (e.g., in cxgb there is a conflicting definition of test_and_clear_bit). MFC after: 1 week	2010-12-27 23:59:27 +00:00
Colin Percival	8ea0b3bb2f	Lock the vm page queue mutex in pmap_pte_release around the call to PMAP_SET_VA; this fixes a mutex-not-held panic when a process which called mlock(2) exits, and parallels a change made in pmap_pte 10 months ago (svn r204160). Note: The locking in this code is utterly broken. We should not be using the VM page queue mutex to protect the queue of pending Xen page mapping hypervisor calls. Even if it made sense to do so, this commit and r204160 introduce LORs between the vm page queue mutex and PMAP2mutex. (However, a possible deadlock is better than a guaranteed panic, and this change will hopefully make life easier for whoever fixes the Xen pmap locking in the future.) PR: kern/140313 MFC after: 3 days	2010-12-26 13:05:43 +00:00
Tijl Coosemans	81bd5041a2	Merge amd64 and i386 bus.h and move the resulting header to x86. Replace the original amd64 and i386 headers with stubs. Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere. Reviewed by: imp (previous version), jhb Approved by: kib (mentor)	2010-12-20 16:39:43 +00:00
Alan Cox	9d555e459c	Redo some parts of r216333, specifically, the locking changes to pmap_extract_and_hold(), and undo the rest. In particular, I forgot that PG_PS and PG_PTE_PAT are the same bit.	2010-12-19 07:31:56 +00:00
Konstantin Belousov	7222d2fbee	Inform a compiler which asm statements in the x86 implementation of atomics change eflags. Reviewed by: jhb MFC after: 2 weeks	2010-12-18 16:41:11 +00:00
Konstantin Belousov	a9b31c256e	In pmap_extract(), unlock pmap lock earlier. The calculation does not need the lock when operating on local variables. Reviewed by: alc	2010-12-18 11:31:32 +00:00
Jung-uk Kim	e1c9d39ebe	Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This function always returned the nominal frequency instead of current frequency because we use RDTSC instruction to calculate difference in CPU ticks, which is supposedly constant for the case. Now we support cpu_get_nominal_mhz() for the case, instead. Note it should be just enough for most usage cases because cpu_est_clockrate() is often times abused to find maximum frequency of the processor.	2010-12-14 20:07:51 +00:00
Konstantin Belousov	60c7c84e85	In fpudna()/npxdna(), mark FPU context initialized and optionally mark user FPU context initialized, if current context is user context. It was reversed in r215865, by inadequate change of this code fragment to a call to fpuuserinited()/npxuserinited(). The issue is only relevant for in-kernel users of FPU. Reported by: Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net> Tested by: Mike Tancsa MFC after: 3 days	2010-12-12 16:16:39 +00:00
Colin Percival	20d1a304b3	Reduce the Xen timecounter from 1GHz to 2^-9 GHz, thereby increasing the timecounter period from 2^32 ns (~4.3s) to 2^41 ns (~36m39s). Some time sharing systems can skip clock interrupts for a few seconds when under load (e.g., if we've recently used more than our fair share of CPU and someone else wants a burst of CPU) and we were losing time in quanta of 2^32 ns due to timecounter wrapping. Increasing the timecounter period up to 2^41 ns is definitely overkill, but we still have microsecond timecounter precision, and anyone using paravirtualized hardware when they need submicrosecond timing is crazy.	2010-12-11 22:33:33 +00:00
Colin Percival	0f30ed5bc6	Make the machdep.independent_wallclock sysctl do what it says on the box.	2010-12-11 20:12:42 +00:00
Alan Cox	d1cf854b5d	When r207410 eliminated the acquisition and release of the page queues lock from pmap_extract_and_hold(), it didn't take into account that pmap_pte_quick() sometimes requires the page queues lock to be held. This change reimplements pmap_extract_and_hold() such that it no longer uses pmap_pte_quick(), and thus never requires the page queues lock. For consistency, adopt the same idiom as used by the new implementation of pmap_extract_and_hold() in pmap_extract() and pmap_mincore(). It also happens to make these functions shorter. Fix a style error in pmap_pte(). Reviewed by: kib@	2010-12-09 20:16:00 +00:00
Colin Percival	91ff9dc058	Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c (which are identical) with a single x86/x86/busdma_machdep.c.	2010-12-09 06:41:50 +00:00
Jung-uk Kim	71e0b05797	Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC. Remove a confusing comment about converting to MHz as we never did.	2010-12-08 23:40:41 +00:00
Colin Percival	af60888734	On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192, while on i386 we have MAX_BPAGES=512. Implement this difference via '#ifdef __i386__'. With this commit, the i386 and amd64 busdma_machdep.c files become identical; they will soon be replaced by a single file under sys/x86.	2010-12-08 20:20:10 +00:00
Jung-uk Kim	dd7d207dcb	Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86. Discussed with: avg	2010-12-08 00:09:24 +00:00
Jung-uk Kim	61d14101dd	Use int for 'tsc_present' instead of u_int. It is just a boolean.	2010-12-07 23:19:49 +00:00
Jung-uk Kim	7214d5d75b	Remove stale comments about P-state invariant TSC and fix style(9) nits.	2010-12-07 22:43:25 +00:00
Jung-uk Kim	1bcc28295b	Do not register an event handler for CPU frequency changes when it is found P-state invariant. This is continuation of r216274.	2010-12-07 22:34:51 +00:00
Jung-uk Kim	4a9c4056dc	Now the P-state invariant TSC is probed early enough, do not register event handlers for CPU frequency changes when it is found P-state invariant. Adjust a comment about non-existent tsc_freq_max() while I am here.	2010-12-07 22:23:26 +00:00
Jung-uk Kim	78a661bbaa	Probe P-state invariant TSC from rightful place.	2010-12-07 22:12:02 +00:00
Colin Percival	716d203d6b	MFamd64 r204214: Enforce stronger alignment semantics (require that the end of segments be aligned, not just the start of segments) in order to allow Xen's blkfront driver to operate correctly. PR: kern/152818 MFC after: 3 days	2010-12-05 03:20:55 +00:00
Colin Percival	a39dc31fca	Remove gratuitous i386/amd64 inconsistency in favour of the less verbose version of declaring a variable initialized to zero.	2010-12-04 23:36:40 +00:00
Colin Percival	5c5590862f	Remove unnecessary #includes which seem to have been accidentally added as part of CVS r1.76 (in January 2006).	2010-12-04 23:24:35 +00:00
Jung-uk Kim	2f7ab7e85d	Revert r216161. It is not necessary because we zero-fill BSS anyway. Requested by: jhb	2010-12-03 22:27:51 +00:00
Jung-uk Kim	b14fe63392	Explicitly initialize TSC frequency. To calibrate TSC frequency, we use DELAY(9) and it may use TSC in turn if TSC frequency is non-zero. MFC after: 3 days	2010-12-03 21:54:10 +00:00
Jung-uk Kim	e391a266ed	Do not change CPU ticker frequency if TSC is P-state invariant. Note this change was meant to be committed with r184102 (and its subsequent MFCs) but it fell off somehow. Pointyhat to: jkim MFC after: 3 days	2010-12-03 21:06:30 +00:00
Rebecca Cran	c90f7d9b44	Revert r216134. This checkin broke platforms where bus_space are macros: they need to be a single statement, and do { } while (0) doesn't work in this situation so revert until a solution can be devised.	2010-12-03 07:09:23 +00:00
Rebecca Cran	15b4888a24	Disallow passing in a count of zero bytes to the bus_space(9) functions. Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM causes a crash/hang since the 'loop' instruction decrements the counter before checking if it's zero. PR: kern/80980 Discussed with: jhb	2010-12-02 22:19:30 +00:00
Colin Percival	d42446149f	Fix bug introduced by r194784: Under XEN, the page(s) allocated to dpcpu for CPU #0 weren't being properly reserved. Under VM pressure this would cause problems when the dpcpu structures were overwritten by arbitrary data; the most common symptom was a panic when netisr attempted to lock a mutex. For some reason the XEN code keeps track of the start of available memory in the variables 'first', 'physfree', and 'init_first'; as far as I can tell, we always have first == physfree == init_first * PAGE_SIZE. The earlier commit adjusted 'first' (which, on !XEN, is the only variable which tracks this value) but not the other two variables. Exercise for reader: Eliminate two of these three variables.	2010-11-29 06:50:30 +00:00
Konstantin Belousov	c6fb218c3c	Calling fill_fpregs() for curthread is legitimate, and ELF coredump does this. Reported and tested by: pho MFC after: 5 days	2010-11-28 17:56:34 +00:00
Konstantin Belousov	5c6eb03790	Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs() functions, they are unused. Remove 'user' from npxgetuserregs() etc. names. For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU context storage. This eliminates the need for ugly copying with overwrite of the newly added and reserved fields in ucontext on i386 to satisfy alignment requirements for fpusave() and fpurstor(). pc98 version was copied from i386. Suggested and reviewed by: bde Tested by: pho (i386 and amd64) MFC after: 1 week	2010-11-26 14:50:42 +00:00
Tijl Coosemans	ce4ec51dbe	Merge amd64/i386 _align.h by aligning on the size of register_t (copied from powerpc). Reviewed by: imp, jhb Approved by: kib (mentor)	2010-11-26 10:59:20 +00:00
Ulrich Spörlein	02604cd4f4	Remove kernel support for BB profiling, now that kernbb(8) is gone, too. PR: bin/83558 Reviewed by: jkim	2010-11-26 08:11:43 +00:00
Colin Percival	5c0ab2fa8b	Revert r215819 and fix the bug properly. In pmap_qremove, paging table updates were being queued by pmap_kremove, but the queue wasn't being flushed; as a result, the updates didn't happen until after the call to pmap_invalidate_range, and old entries could stick around in the TLB. Adding a PT_UPDATES_FLUSH() call immediately before pmap_invalidate_range ensures that after the invalidation the TLB will be repopulated with the correct new entries. Thanks to: kib, avg, alc	2010-11-25 22:06:07 +00:00
Dimitry Andric	079d7e43ca	Use unambiguous inline assembly to load a float variable. GNU as silently converts 'fld' to 'flds', without taking the actual variable type into account (!), but clang's integrated assembler rightfully complains about it. Discussed with: cperciva	2010-11-25 18:14:18 +00:00
John Baldwin	9d76324839	Add device IDs for two more ServerWorks Host-PCI bridges so that we can read their starting PCI bus number for older systems that do not support ACPI (or have a broken _BBN method). PR: kern/148108 MFC after: 1 week	2010-11-25 15:42:33 +00:00
Colin Percival	98702b3990	Work around paging bug. Somehow we seem to be ending up with entries in the TLB which don't correspond to ptes with PG_V set; prior to this commit I'm sometimes getting the wrong data when pages are loaded into the buffer cache (they're being loaded, but the missing TLB invalidation is causing the wrong data to be visible).	2010-11-25 15:41:34 +00:00
Colin Percival	1a3b2b87de	Rename HYPERVISOR_multicall (which performs the multicall hypercall) to _HYPERVISOR_multicall, and create a new HYPERVISOR_multicall function which invokes _HYPERVISOR_multicall and checks that the individual hypercalls all succeeded.	2010-11-25 15:05:21 +00:00
Colin Percival	0bd7a92067	Remove vestigial debugging code which, in fork-heavy workloads, can cause a 30x slowdown.	2010-11-25 04:45:31 +00:00
Jung-uk Kim	d2d0fda841	Remove a stale tunable introduced in r215703.	2010-11-23 17:28:23 +00:00
Andriy Gapon	9b984feb3d	specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX CPUID.6 is defined as Thermal and Power Management Leaf by both Intel and AMD. Reviewed by: jhb MFC after: 7 days	2010-11-23 13:55:30 +00:00
Jung-uk Kim	7dd052c1d9	- Disable caches and flush caches/TLBs when we update PAT as we do for MTRR. Flushing TLBs is required to ensure cache coherency according to the AMD64 architecture manual. Flushing caches is only required when changing from a cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or UC-). Since this function is only used once per processor during startup, there is no need to take any shortcuts. - Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC. Program 5 as WP (from default WT) and 6 as WC (from default UC-). Leave 4 and 7 at the default of WB and UC. This is to avoid transition from a cacheable memory type to an uncacheable type to minimize possible cache incoherency. Since we perform flushing caches and TLBs now, this change may not be necessary any more but we do not want to take any chances. - Remove Apple hardware specific quirks. With the above changes, it seems this hack is no longer needed. - Improve pmap_cache_bits() with an array to map PAT memory type to index. This array is initialized early from pmap_init_pat(), so that we do not need to handle special cases in the function any more. Now this function is identical on both amd64 and i386. Reviewed by: jhb Tested by: RM (reuf_m at hotmail dot com) Ryszard Czekaj (rychoo at freeshell dot net) army.of.root (army dot of dot root at googlemail dot com) MFC after: 3 days	2010-11-22 19:52:44 +00:00
Colin Percival	c3f128981e	In xen_get_timecount, return the full ns-precision time rather than rounding to 1/HZ precision. I have no idea why the rounding was introduced in the first place, but it makes FreeBSD unhappy.	2010-11-22 09:04:29 +00:00
Colin Percival	61381fcf2d	Unifdef XEN. This file is only compiled with the XEN kernel option set, and the !XEN bits get in the way of understanding the code.	2010-11-20 21:36:12 +00:00
Colin Percival	8cdabbaf32	Add VTOM(va) macro as xpmap_ptom(VTOP(va)) to convert to machine addresses. Clean up the code by converting xpmap_ptom(VTOP(...)) to VTOM(...) and converting xpmap_ptom(VM_PAGE_TO_PHYS(...)) to VM_PAGE_TO_MACH(...). In a few places we take advantage of the fact that xpmap_ptom can commute with setting PG_* flags. This commit should have no net effect save to improve the readability of this code.	2010-11-20 20:04:29 +00:00
Colin Percival	ad520892d7	Make pmap_release consistent with pmap_pinit with respect to unpinning pages. The pinning of NPGPTD pages is #if 0ed out in pmap_pinit (I'm not quite sure why...) and this commit adds a corresponding #if 0 in pmap_release to avoid unpinning those pages. Some versions of Xen seem to silently ignore requests to unpin pages which were never pinned in the first place, but some return an error (causing FreeBSD to panic) prior to this commit.	2010-11-19 15:12:19 +00:00
Andriy Gapon	b43d292565	specialreg.h: add definitions for MPERF/APERF pair of MSRs These MSRs can be used to determine actual (average) performance as compared to a maximum defined performance. Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both Intel and AMD processors. MFC after: 5 days	2010-11-19 15:07:36 +00:00
Andriy Gapon	7af7c7624a	specialreg.h: add AMD-specific "Hardware Configuration Register" MSR It seems that this MSR has been available in a range of AMD processors families for quite a while now. Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in the i386 version. Note2: perhaps some additional name component is needed to distinguish AMD-specific MSRs. MFC after: 5 days	2010-11-19 15:00:20 +00:00
Andriy Gapon	8fd6d51347	specialreg.h: add definition for AMD Core Performance Boost bit This bit indicates availability of the feature. MFC after: 4 days	2010-11-19 14:46:17 +00:00
Colin Percival	f4c884f95a	Make pmap_release match pmap_pinit by invoking pmap_qremove(pmap->pm_pdpt) to match pmap_pinit's pmap_qenter(pmap->pm_pdpt) call in the case of PAE.	2010-11-18 21:29:43 +00:00
Colin Percival	f86f965ef8	Don't KASSERT in pmap_release that xpmap_ptom(VM_PAGE_TO_PHYS(m)) == (pmap->pm_pdpt[i] & PG_FRAME) for i = NPGPTD, since pmap->pm_pdpt[i] is only initialized for 0 <= i < NPGPTD. This fixes an inevitable panic with XEN && PAE && INVARIANTS when pmap_release is called (e.g., when /sbin/init is launched).	2010-11-18 21:02:40 +00:00
Jung-uk Kim	816b3bd1b0	Restore CR0 after MTRR initialization for correctness' sake. There will be no noticeable change because we enable caches before we enter here for both BSP and AP cases. Remove another pointless optimization for CR4.PGE bit while I am here.	2010-11-16 23:26:02 +00:00
Jung-uk Kim	50083a5624	Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this code but probably it only worked by chance because modifying CR4.PGE bit causes invalidation of entire TLBs. Since these are very rare events, this micro-optimization seems useless. Reviewed by: jhb	2010-11-16 22:44:58 +00:00
Konstantin Belousov	7022f954c3	Do not use __FreeBSD_version prefix for the special osrel version. The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix. Reported and tested by: swell.k gmail com MFC after: 1 week	2010-11-14 21:59:11 +00:00
Konstantin Belousov	94bce4535d	Use symbolic names instead of hardcoding values for magic p_osrel constants. MFC after: 1 week	2010-11-14 18:24:12 +00:00
Jung-uk Kim	a3c464fb3c	MFamd64: (based on) r209957 Move logic of building ACPI headers for acpi_wakeup.c into better places, remove intermediate makefile and shell script, and reduce diff between i386 and amd64.	2010-11-12 20:55:14 +00:00
Jung-uk Kim	19da400c64	Move identical copies of apm_bios.h to sys/x86/include, replace them with stubs, and adjust PC98 stub accordingly. Reviewed by: imp, nyan	2010-11-11 19:36:21 +00:00
Jung-uk Kim	926ad40ff9	Add compat shim for apm(4) to translate APM BIOS function numbers from i386 to PC98-specific ones. Any binaries using apm ioctl(4) commands but built for i386 should also work on PC98 now. Reviewed by: imp, nyan	2010-11-11 19:20:33 +00:00
Jung-uk Kim	93a8847473	Make APM emulation look closer to its origin. Use device_get_softc(9) instead of hardcoding acpi(4) unit number as we have device_t for it.	2010-11-10 18:50:12 +00:00
Jung-uk Kim	7c2bf852d7	Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new file acpi_apm.c, and place it on sys/x86/acpica.	2010-11-10 01:29:56 +00:00
John Baldwin	961135ead8	- Remove <machine/mutex.h>. Most of the headers were empty, and the contents of the ones that were not empty were stale and unused. - Now that <machine/mutex.h> no longer exists, there is no need to allow it to override various helper macros in <sys/mutex.h>. - Rename various helper macros for low-level operations on mutexes to live in the _mtx_* or __mtx_* namespaces. While here, change the names to more closely match the real API functions they are backing. - Drop support for including <sys/mutex.h> in assembly source files. Suggested by: bde (1, 2)	2010-11-09 20:46:41 +00:00
Attilio Rao	fcb250f392	Move the mptable.h under x86/include/. Sponsored by: Sandvine Incorporated MFC after: 14 days	2010-11-09 20:28:09 +00:00
Jung-uk Kim	cedd86cafa	Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home.	2010-11-09 00:27:18 +00:00
Jung-uk Kim	2473325fa8	Reduce diff between platforms and fix style(9) bugs.	2010-11-09 00:14:39 +00:00
John Baldwin	13e25cb7a5	Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is identical on both platforms.	2010-11-08 20:57:02 +00:00
John Baldwin	c5b0b5fc6b	Sync the APIC startup sequence with amd64: - Register APIC enumerators at SI_SUB_TUNABLES - 1 instead of SI_SUB_CPU - 1. - Probe CPUs at SI_SUB_TUNABLES - 1. This allows i386 to set a truly accurate mp_maxid value rather than always setting it to MAXCPU - 1.	2010-11-08 20:35:09 +00:00
John Baldwin	6c3c2b8b87	Remove stub symbols for APIC-related functions when 'device apic' is not included in a kernel config. These stubs had existed previously so that acpi.ko could always include the MADT parsing code and still link with a kernel that did not include 'device apic'.	2010-11-08 20:32:35 +00:00
John Baldwin	f67b4bd367	A few small style and whitespace fixes.	2010-11-08 20:05:22 +00:00
Alan Cox	228a253795	Eliminate a possible race between pmap_pinit() and pmap_kenter_pde() on superpage promotion or demotion. Micro-optimize pmap_kenter_pde(). Reviewed by: kib, jhb (an earlier version) MFC after: 1 week	2010-11-07 18:42:37 +00:00
John Baldwin	0108cce0a4	Adjust the order of operations in spinlock_enter() and spinlock_exit() to work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers. In cooperation with: bde MFC after: 1 month	2010-11-05 13:42:58 +00:00
Andriy Gapon	3b50d59fef	x86 topo_probe: do not probe smp topology if only one cpu is visible This could lead to a division by zero if hardware is multi-core and/or multi-threaded, but for some (quite unusual) reason FreeBSD sees only one logical processor. This could happen, for example, if neither MADT nor MP Table are presented by BIOS. Also: - assert in topo_probe_0x4 that BSP is accounted for - neither cpu_cores nor cpu_logical should be zero after successful probing, so either being zero is an indication of failed probing Reported by: vwe, Dan Allen <danallen46@airwired.net> Tested by: Dan Allen <danallen46@airwired.net> MFC after: 3 days	2010-11-04 08:51:45 +00:00
John Baldwin	239da85bbc	Further tweaks to the ram_attach() routine: - Use > 2^32 - 1 instead of >= when checking for memory regions above 4G. - Skip SMAP entries > 4G on i386 rather than breaking out of the loop since SMAP entries are not guaranteed to be in order. - Remove 'i' and loop over 'rid' directly in the dump_avail[] case. - Only check for 4G regions in the dump_avail[] case on i386 if PAE is enabled since vm_paddr_t is 32-bit in the !PAE case. Submitted by: alc	2010-11-02 17:56:16 +00:00
John Baldwin	32c3d3b6e6	Move <machine/apicreg.h> to <x86/apicreg.h>.	2010-11-01 18:18:46 +00:00
John Baldwin	5ecdb3c46b	Move the <machine/mca.h> header to <x86/mca.h>.	2010-11-01 17:40:35 +00:00
Attilio Rao	ba2a27351b	Merge nexus.c from amd64 and i386 to x86 subtree. Sponsored by: Sandvine Incorporated Tested by: gianni	2010-10-28 16:31:39 +00:00
John Baldwin	89d84a4055	Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[] is only populated for multiple-CPU machines. This also matches what the code does when SMP is not enabled. PR: bin/151616 Tested by: "Damian S. Kolodziejczyk" damkol | gmail Submitted by: avg MFC after: 1 week	2010-10-28 13:44:19 +00:00
Attilio Rao	a3da97926d	Merge the mptable support from MD bits to x86 subtree. Sponsored by: Sandvine Incorporated Discussed with: jhb	2010-10-28 07:58:06 +00:00
Attilio Rao	256439c972	Merge dump_machdep.c i386/amd64 under the x86 subtree. Sponsored by: Sandvine Incorporated Tested by: gianni	2010-10-26 12:46:26 +00:00
John Baldwin	0689bdcc19	Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned by intr_disable(). Requested by: bde	2010-10-25 15:31:13 +00:00
John Baldwin	c6390f7ac5	Use intr_disable() and intr_restore() instead of frobbing the flags register directly to disable interrupts. Reviewed by: bde (earlier version) MFC after: 2 weeks	2010-10-25 15:28:03 +00:00
Justin T. Gibbs	ff662b5c98	Improve the Xen para-virtualized device infrastructure of FreeBSD: o Add support for backend devices (e.g. blkback) o Implement extensions to the Xen para-virtualized block API to allow for larger and more outstanding I/Os. o Import a completely rewritten block back driver with support for fronting I/O to both raw devices and files. o General cleanup and documentation of the XenBus and XenStore support code. o Robustness and performance updates for the block front driver. o Fixes to the netfront driver. Sponsored by: Spectra Logic Corporation sys/xen/xenbus/init.txt: Deleted: This file explains the Linux method for XenBus device enumeration and thus does not apply to FreeBSD's NewBus approach. sys/xen/xenbus/xenbus_probe_backend.c: Deleted: Linux version of backend XenBus service routines. It was never ported to FreeBSD. See xenbusb.c, xenbusb_if.m, xenbusb_front.c xenbusb_back.c for details of FreeBSD's XenBus support. sys/xen/xenbus/xenbusvar.h: sys/xen/xenbus/xenbus_xs.c: sys/xen/xenbus/xenbus_comms.c: sys/xen/xenbus/xenbus_comms.h: sys/xen/xenstore/xenstorevar.h: sys/xen/xenstore/xenstore.c: Split XenStore into its own tree. XenBus is a software layer built on top of XenStore. The old arrangement and the naming of some structures and functions blurred these lines making it difficult to discern what services are provided by which layer and at what times these services are available (e.g. during system startup and shutdown). sys/xen/xenbus/xenbus_client.c: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_probe.c: sys/xen/xenbus/xenbusb.c: sys/xen/xenbus/xenbusb.h: Split up XenBus code into methods available for use by client drivers (xenbus.c) and code used by the XenBus "bus code" to enumerate, attach, detach, and service bus drivers. sys/xen/reboot.c: sys/dev/xen/control/control.c: Add a XenBus front driver for handling shutdown, reboot, suspend, and resume events published in the XenStore. 
Move all PV suspend/reboot support from reboot.c into this driver. sys/xen/blkif.h: New file from Xen vendor with macros and structures used by a block back driver to service requests from a VM running a different ABI (e.g. amd64 back with i386 front). sys/conf/files: Adjust kernel build spec for new XenBus/XenStore layout and added Xen functionality. sys/dev/xen/balloon/balloon.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/blkfront/blkfront.c: sys/xen/xenbus/... sys/xen/xenstore/... o Rename XenStore APIs and structures from xenbus_* to xs_. o Adjust to use of M_XENBUS and M_XENSTORE malloc types for allocation of objects returned by these APIs. o Adjust for changes in the bus interface for Xen drivers. sys/xen/xenbus/... sys/xen/xenstore/... Add Doxygen comments for these interfaces and the code that implements them. sys/dev/xen/blkback/blkback.c: o Rewrite the Block Back driver to attach properly via newbus, operate correctly in both PV and HVM mode regardless of domain (e.g. can be in a DOM other than 0), and to deal with the latest metadata available in XenStore for block devices. o Allow users to specify a file as a backend to blkback, in addition to character devices. Use the namei lookup of the backend path to automatically configure, based on file type, the appropriate backend method. The current implementation is limited to a single outstanding I/O at a time to file backed storage. sys/dev/xen/blkback/blkback.c: sys/xen/interface/io/blkif.h: sys/xen/blkif.h: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: Extend the Xen blkif API: Negotiable request size and number of requests. This change extends the information recorded in the XenStore allowing block front/back devices to negotiate for optimal I/O parameters. This has been achieved without sacrificing backward compatibility with drivers that are unaware of these protocol enhancements. 
The extensions center around the connection protocol which now includes these additions: o The back-end device publishes its maximum supported values for request I/O size, the number of page segments that can be associated with a request, the maximum number of requests that can be concurrently active, and the maximum number of pages that can be in the shared request ring. These values are published before the back-end enters the XenbusStateInitWait state. o The front-end waits for the back-end to enter either the InitWait or Initialize state. At this point, the front end limits its own capabilities to the lesser of the values it finds published by the backend, its own maximums, or, should any back-end data be missing in the store, the values supported by the original protocol. It then initializes its internal data structures including allocation of the shared ring, publishes its maximum capabilities to the XenStore and transitions to the Initialized state. o The back-end waits for the front-end to enter the Initialized state. At this point, the back end limits its own capabilities to the lesser of the values it finds published by the frontend, its own maximums, or, should any front-end data be missing in the store, the values supported by the original protocol. It then initializes its internal data structures, attaches to the shared ring and transitions to the Connected state. o The front-end waits for the back-end to enter the Connected state, transitions itself to the connected state, and can commence I/O. Although an updated front-end driver must be aware of the back-end's InitWait state, the back-end has been coded such that it can tolerate a front-end that skips this step and transitions directly to the Initialized state without waiting for the back-end. sys/xen/interface/io/blkif.h: o Increase BLKIF_MAX_SEGMENTS_PER_REQUEST to 255. This is the maximum number possible without changing the blkif request header structure (nr_segs is a uint8_t). 
o Add two new constants: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, and BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK. These respectively indicate the number of segments that can fit in the first ring-buffer entry of a request, and for each subsequent (sg element only) ring-buffer entry associated with the "header" ring-buffer entry of the request. o Add the blkif_request_segment_t typedef for segment elements. o Add the BLKRING_GET_SG_REQUEST() macro which wraps the RING_GET_REQUEST() macro and returns a properly cast pointer to an array of blkif_request_segment_ts. o Add the BLKIF_SEGS_TO_BLOCKS() macro which calculates the number of ring entries that will be consumed by a blkif request with the given number of segments. sys/xen/blkif.h: o Update for changes in interface/io/blkif.h macros. o Update the BLKIF_MAX_RING_REQUESTS() macro to take the ring size as an argument to allow this calculation on multi-page rings. o Add a companion macro to BLKIF_MAX_RING_REQUESTS(), BLKIF_RING_PAGES(). This macro determines the number of ring pages required in order to support a ring with the supplied number of request blocks. sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: o Negotiate with the other-end with the following limits: Request Size: MAXPHYS Max Segments: (MAXPHYS/PAGE_SIZE) + 1 Max Requests: 256 Max Ring Pages: Sufficient to support Max Requests with Max Segments. o Dynamically allocate request pools and segments-per-request. o Update ring allocation/attachment code to support a multi-page shared ring. o Update routines that access the shared ring to handle multi-block requests. sys/dev/xen/blkfront/blkfront.c: o Track blkfront allocations in a blkfront driver specific malloc pool. o Strip out XenStore transaction retry logic in the connection code. Transactions only need to be used when the update to multiple XenStore nodes must be atomic. That is not the case here. 
o Fully disable blkif_resume() until it can be fixed properly (it didn't work before this change). o Destroy bus-dma objects during device instance tear-down. o Properly handle backend devices with power-of-2 sector sizes larger than 512b. sys/dev/xen/blkback/blkback.c: Advertise support for and implement the BLKIF_OP_WRITE_BARRIER and BLKIF_OP_FLUSH_DISKCACHE blkif opcodes using BIO_FLUSH and the BIO_ORDERED attribute of bios. sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: Fix various bugs in blkfront. o gnttab_alloc_grant_references() returns 0 for success and non-zero for failure. The check for < 0 is a leftover Linuxism. o When we negotiate with blkback and have to reduce some of our capabilities, print out the original and reduced capability before changing the local capability. So the user now gets the correct information. o Fix blkif_restart_queue_callback() formatting. Make sure we hold the mutex in that function before calling xb_startio(). o Fix a couple of KASSERT()s. o Fix a check in the xb_remove_ macro to be a little more specific. sys/xen/gnttab.h: sys/xen/gnttab.c: Define GNTTAB_LIST_END publicly as GRANT_REF_INVALID. sys/dev/xen/netfront/netfront.c: Use GRANT_REF_INVALID instead of driver private definitions of the same constant. sys/xen/gnttab.h: sys/xen/gnttab.c: Add the gnttab_end_foreign_access_references() API. This API allows a client to batch the release of an array of grant references, instead of coding a private for loop. The implementation takes advantage of this batching to reduce lock overhead to one acquisition and release per-batch instead of per-freed grant reference. While here, reduce the duration the gnttab_list_lock is held during gnttab_free_grant_references() operations. The search to find the tail of the incoming free list does not rely on global state and so can be performed without holding the lock. 
sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/evtchn/evtchn.c: sys/xen/xen_intr.h: o Implement the bind_interdomain_evtchn_to_irqhandler API for HVM mode. This allows an HVM domain to serve back end devices to other domains. This API is already implemented for PV mode. o Synchronize the API between HVM and PV. sys/dev/xen/xenpci/xenpci.c: o Scan the full region of CPUID space in which the Xen VMM interface may be implemented. On systems using SuSE as a Dom0 where the Viridian API is also exported, the VMM interface is above the region we used to search. o Pass through bus_alloc_resource() calls so that XenBus drivers attaching on an HVM system can allocate unused physical address space from the nexus. The block back driver makes use of this facility. sys/i386/xen/xen_machdep.c: Use the correct type for accessing the statically mapped xenstore metadata. sys/xen/interface/hvm/params.h: sys/xen/xenstore/xenstore.c: Move hvm_get_parameter() to the correct global header file instead of as a private method to the XenStore. sys/xen/interface/io/protocols.h: Sync with vendor. sys/xen/interface/io/ring.h: Add macro for calculating the number of ring pages needed for an N deep ring. To avoid duplication within the macros, create and use the new __RING_HEADER_SIZE() macro. This macro calculates the size of the ring bookkeeping struct (producer/consumer indexes, etc.) that resides at the head of the ring. Add the __RING_PAGES() macro which calculates the number of shared ring pages required to support a ring with the given number of requests. These APIs are used to support the multi-page ring version of the Xen block API. sys/xen/interface/io/xenbus.h: Add Comments. sys/xen/xenbus/... o Refactor the FreeBSD XenBus support code to allow for both front and backend device attachments. o Make use of new config_intr_hook capabilities to allow front and back devices to be probed/attached in parallel. 
o Fix bugs in probe/attach state machine that could cause the system to hang when confronted with a failure either in the local domain or in a remote domain to which one of our driver instances is attaching. o Publish all required state to the XenStore on device detach and failure. The majority of the missing functionality was for serving as a back end since the typical "hot-plug" scripts in Dom0 don't handle the case of cleaning up for a "service domain" that is not itself. o Add dynamic sysctl nodes exposing the generic ivars of XenBus devices. o Add doxygen style comments to the majority of the code. o Cleanup types, formatting, etc. sys/xen/xenbus/xenbusb.c: Common code used by both front and back XenBus busses. sys/xen/xenbus/xenbusb_if.m: Method definitions for a XenBus bus. sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusb_back.c: XenBus bus specialization for front and back devices. MFC after: 1 month	2010-10-19 20:53:30 +00:00
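The negotiation rule and the multi-block request accounting described in the commit above can be sketched in userland C. This is a minimal sketch, not the blkfront/blkback code: negotiate_limit() and segs_to_blocks() are hypothetical helpers, and the per-block segment counts (11 in the header block, 14 in each following segment block) are assumed values standing in for BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK and BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; the real constants live in sys/xen/interface/io/blkif.h. */
#define SEGS_PER_HEADER_BLOCK	11u
#define SEGS_PER_SEGMENT_BLOCK	14u

static uint32_t
min_u32(uint32_t a, uint32_t b)
{
	return (a < b ? a : b);
}

/*
 * Hypothetical helper: limit one capability to the lesser of our own
 * maximum and the value the peer published in the XenStore.  If the peer
 * published nothing, fall back to the original (legacy) protocol limit.
 */
static uint32_t
negotiate_limit(uint32_t own_max, bool peer_published, uint32_t peer_max,
    uint32_t legacy_limit)
{
	if (!peer_published)
		return (min_u32(own_max, legacy_limit));
	return (min_u32(own_max, peer_max));
}

/*
 * Number of ring entries consumed by a request with nsegs segments: the
 * header block carries the first SEGS_PER_HEADER_BLOCK segments and each
 * additional (sg-only) block carries up to SEGS_PER_SEGMENT_BLOCK more.
 */
static uint32_t
segs_to_blocks(uint32_t nsegs)
{
	if (nsegs <= SEGS_PER_HEADER_BLOCK)
		return (1);
	return (1 + (nsegs - SEGS_PER_HEADER_BLOCK +
	    SEGS_PER_SEGMENT_BLOCK - 1) / SEGS_PER_SEGMENT_BLOCK);
}
```

Each side would apply negotiate_limit() once per published capability (request size, segments, requests, ring pages). With the assumed constants, a maximum-size 255-segment request occupies 19 ring entries.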
Jung-uk Kim	56b11f84a7	Remove trailing ", " from `sysctl machdep.idle_available' output.	2010-10-12 20:53:12 +00:00
Konstantin Belousov	78ae4338a2	Add macro DECLARE_MODULE_TIED to denote a module as requiring the kernel of exactly the same __FreeBSD_version as the headers the module was compiled against. Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules use kernel interfaces that the Release Engineering Team feel are not stable enough to guarantee they will not change during the life cycle of a STABLE branch. In particular, the layout of struct sysentvec is declared to be not part of the STABLE KBI. Discussed with: bz, rwatson Approved by: re (bz, kensmith) MFC after: 2 weeks	2010-10-12 09:18:17 +00:00
Alan Cox	fb4c8540b2	Initialize KPTmap in locore so that vm86.c can call vtophys() (or really pmap_kextract()) before pmap_bootstrap() is called. Document the set of pmap functions that may be called before pmap_bootstrap() is called. Tested by: bde@ Reviewed by: kib@ Discussed with: jhb@ MFC after: 6 weeks	2010-10-05 17:06:51 +00:00
Konstantin Belousov	3f506a78ce	Display PCID capability of CPU and add CPUID define for it. MFC after: 1 week	2010-10-05 15:31:56 +00:00
Andriy Gapon	d443a96ffb	i386 and amd64 mp_machdep: improve topology detection for Intel CPUs This patch is significantly based on previous work by jkim. List of changes: - added comments that describe topology uniformity assumption - added reference to Intel Processor Topology Enumeration article - documented a few global variables that describe topology - retired weirdly set and used logical_cpus variable - changed fallback code for mp_ncpus > 0 case, so that CPUs are treated as being different packages rather than cores in a single package - moved AMD-specific code to topo_probe_amd [jkim] - in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT and core masks and match APIC IDs against those masks [started by jkim] - in topo_probe_0x4() drop code for double-checking topology parameters by looking at L1 cache properties [jkim] - in topo_probe_0xb() add fallback path to topo_probe_0x4() as prescribed by Intel [jkim] Still to do: - prepare for upcoming AMD CPUs by using new mechanism of uniform topology description [pointed by jkim] - probe cache topology in addition to CPU topology and probably use that for scheduler affinity topology; e.g. 
Core2 Duo and Athlon II X2 have the same CPU topology, but Athlon cores do not share L2 cache while Core2's do (no L3 cache in both cases) - think of supporting non-uniform topologies if they are ever implemented for platforms in question - think about how to better describe the old HTT vs new HTT distinction, HTT vs SMT can be confusing as SMT is a generic term - more robust code for marking CPUs as "logical" and/or "hyperthreaded", use HTT mask instead of modulo operation - correct support for halting logical and/or hyperthreaded CPUs, let scheduler know that it shouldn't schedule any threads on those CPUs PR: kern/145385 (related) In collaboration with: jkim Tested by: Sergey Kandaurov <pluknet@gmail.com>, Jeremy Chadwick <freebsd@jdc.parodius.com>, Chip Camden <sterling@camdensoftware.com>, Steve Wills <steve@mouf.net>, Olivier Smedts <olivier@gid0.org>, Florian Smeets <flo@smeets.im> MFC after: 1 month	2010-10-01 10:32:54 +00:00
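The "derive SMT and core masks and match APIC IDs against those masks" step in topo_probe_0x4() can be illustrated with a small sketch of the Intel-described procedure. This is an assumption-labeled illustration, not the mp_machdep.c code: mask_width() and decode_apic_id() are hypothetical helpers, and the two counts (logical CPUs per package, cores per package) are passed in rather than read from CPUID.

```c
#include <stdint.h>

/* Smallest w such that (1 << w) >= n, i.e. ceil(log2(n)); 0 for n <= 1. */
static unsigned
mask_width(unsigned n)
{
	unsigned w = 0;

	while ((1u << w) < n)
		w++;
	return (w);
}

struct topo_ids {
	unsigned thread;	/* SMT ID within the core */
	unsigned core;		/* core ID within the package */
	unsigned package;	/* package ID */
};

/*
 * Hypothetical helper: split an APIC ID into thread/core/package fields.
 * On real hardware logical_per_pkg comes from CPUID leaf 1 and
 * cores_per_pkg from CPUID leaf 4; here both are plain arguments.
 */
static struct topo_ids
decode_apic_id(uint32_t apic_id, unsigned logical_per_pkg,
    unsigned cores_per_pkg)
{
	unsigned smt_width = mask_width(logical_per_pkg / cores_per_pkg);
	unsigned core_width = mask_width(cores_per_pkg);
	struct topo_ids ids;

	ids.thread = apic_id & ((1u << smt_width) - 1);
	ids.core = (apic_id >> smt_width) & ((1u << core_width) - 1);
	ids.package = apic_id >> (smt_width + core_width);
	return (ids);
}
```

For example, with 4 logical CPUs and 2 cores per package, APIC ID 5 (binary 101) decodes to thread 1, core 0, package 1.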
Neel Natu	5c1a8dc028	Fix bogus error message from bus_dmamem_alloc() about incorrect alignment. The check for alignment should be made against the physical address and not the virtual address that maps it. Sponsored by: NetApp Submitted by: Will McGovern (will at netapp dot com) Reviewed by: mjacob, jhb	2010-09-29 21:53:11 +00:00
David Xu	8d2a935e45	Remove a redundant instruction for casuword.	2010-09-29 02:36:58 +00:00
John Baldwin	1691b62202	Rewrite the i386 memory probe: - Check for SMAP data from the loader first. If it exists, don't bother doing any VM86 calls at all. This will be more friendly for non-BIOS boot environments such as EFI, etc. - Move the base memory setup into a new basemem_setup() routine instead of duplicating it. - Simplify the XEN case by removing all of the VM86/SMAP parsing code rather than just jumping over it. - Adjust some comments to better explain the code flow. MFC after: 2 weeks	2010-09-27 19:36:15 +00:00
David Xu	295fbd498e	Userland POSIX semaphores are now based on umtx. The kernel module is only needed for binary compatibility; to run old binaries, you need to kldload the module.	2010-09-24 09:04:16 +00:00
Norikatsu Shigemura	cbf4dac64f	Add support for 'device tpm' on amd64. Add tpm(4)'s default setting to /boot/defaults/loader.conf. Add 'device tpm' to NOTES for amd64 and i386. Discussed with: takawata Approved by: imp (mentor)	2010-09-19 14:40:37 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When the CPU is busy, interrupts are generated at the full rate of hz + stathz to fulfill scheduler and timekeeping requirements. But when the CPU is idle, only the minimum set of interrupts (down to 8 interrupts per second per CPU now) needed to handle scheduled callouts is executed. This significantly increases idle CPU sleep time, increasing the effect of static power-saving technologies. It should also reduce host CPU load on virtualized systems when the guest system is idle. There is a set of tunables, also available as writable sysctls, to control the desired event timer subsystem behavior: kern.eventtimer.timer - allows choosing the event timer hardware to use. On x86 there are up to 4 different kinds of timers. Depending on whether the chosen timer is per-CPU, the behavior of other options differs slightly. kern.eventtimer.periodic - allows choosing periodic or one-shot operation mode. In periodic mode, the current timer hardware is taken as the only source of time for time events. This mode is quite similar to the previous kernel behavior. One-shot mode instead uses the currently selected time counter hardware to schedule all needed events one by one and programs the timer to generate an interrupt exactly at the specified time. The default value depends on the chosen timer's capabilities, but one-shot mode is preferred unless the other is forced by the user or hardware. kern.eventtimer.singlemul - in periodic mode, specifies how many times higher the timer frequency should be, so as not to strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU receive every timer interrupt independently of whether it is busy or not. By default this option is disabled. 
If the chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generated. Since this patch modifies cpu_idle() on some platforms, I have also refactored the one on x86. It now makes use of the MONITOR/MWAIT instructions (if supported) under a high sleep/wakeup rate, as a fast alternative to other methods. It allows the SMP scheduler to wake up sleeping CPUs much faster without using an IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerpc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Andriy Gapon	3d844eddb7	bus_add_child: change type of order parameter to u_int This reflects actual type used to store and compare child device orders. Change is mostly done via a Coccinelle (soon to be devel/coccinelle) semantic patch. Verified by LINT+modules kernel builds. Followup to: r212213 MFC after: 10 days	2010-09-10 11:19:03 +00:00
Roman Divacky	27d4fea6c5	Change the parameter passed to the inline assembly to u_short as we are dealing with 16bit segment registers. Change mov to movw. Approved by: rpaulo (mentor) Reviewed by: kib, rink	2010-09-03 14:25:17 +00:00
Rui Paulo	cba3269417	Register an interrupt vector for DTrace return probes. There is some code missing in lapic to make sure that we don't overwrite this entry, but this will be done in a subsequent commit. Sponsored by: The FreeBSD Foundation	2010-08-28 08:03:29 +00:00
Rui Paulo	6bf9fb35e5	Sync DTrace bits with amd64 and fix the build. Sponsored by: The FreeBSD Foundation	2010-08-26 11:22:12 +00:00
Jung-uk Kim	db1cea00ad	Increase maximum number of page table entries per VM86 context from 8 to 24 pages, yet again. Now we can allocate a whole segment, which is required for shadowing option ROM images, for example.	2010-08-25 21:13:23 +00:00
Rui Paulo	0bc1991a4a	Call the necessary DTrace function pointers when we have different kinds of traps. Sponsored by: The FreeBSD Foundation	2010-08-25 09:10:32 +00:00
Rui Paulo	8a8d8fa3d1	Add two DTrace trap type values. Used by fasttrap. Sponsored by: The FreeBSD Foundation	2010-08-24 13:13:24 +00:00
Attilio Rao	67a94de261	Revert part of r211149, as I erroneously ported logical_cpus from the Yahoo! patchset as a mask (and manipulated the related variables accordingly) while it is actually a CPU count. Submitted by: neel MFC after: 1 month X-MFC: 211149	2010-08-19 22:37:43 +00:00
John Baldwin	8c7a92bd4a	Remove unused KTRACE includes.	2010-08-19 16:41:27 +00:00
Rui Paulo	187278cadc	For every instance of '.if ${CC} == "foo"' or '.if ${CC} != "foo"' in Makefiles or .mk files, use ${CC:T:Mfoo} instead, so only the basename of the compiler command (excluding any arguments) is considered. This allows you to use, for example, CC="/nondefault/path/clang -xxx", and still have the various tests in bsd.*.mk identify your compiler as clang correctly. ICC if cases were also changed. Submitted by: Dimitry Andric <dimitry at andric.com>	2010-08-17 20:39:28 +00:00
Pietro Cerutti	e0e08e6a60	- The iMac9,1 needs the PAT workaround as well Approved by: cognet	2010-08-17 12:17:24 +00:00
Konstantin Belousov	ee235befcb	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month	2010-08-17 08:55:45 +00:00
Attilio Rao	3742bd96fe	Revert r211176: As long as interrupts are disabled and there is no explicit call to sched_add() there can't be any preemption there, thus the calls may be consistent. Reported by: kib, jhb	2010-08-12 13:46:43 +00:00
John Baldwin	60c7b36b7a	Update various places that store or manipulate CPU masks to use cpumask_t instead of int or u_int. Since cpumask_t is currently u_int on all platforms this should just be a cosmetic change.	2010-08-11 23:22:53 +00:00
Attilio Rao	807ef45666	IPI handlers may run generally with interrupts disabled because they are served via an interrupt gate. However, that doesn't explicitly prevent preemption and thread migration thus scheduler pinning may be necessary in some handlers. Fix that. Tested by: gianni MFC after: 1 month	2010-08-11 10:51:27 +00:00
Attilio Rao	7cd8b4cd42	Fix a typo due to a stale version of the patch. Reported by: gianni, rdivacky MFC after: 1 month X-MFC: 211149	2010-08-10 18:29:39 +00:00
Attilio Rao	4c967b618d	Fix some places that may use cpumask_t while they still use 'int' types. While there, also fix some places assuming cpu type is 'int' while u_int is really meant. Note: this will also fix some possible races in per-cpu data accessings to be addressed in further commits. In collaboration with: Yahoo! Incorporated (via sbruno and peter) Tested by: gianni MFC after: 1 month	2010-08-10 16:14:10 +00:00
Attilio Rao	d35534bf42	Simplify the logic for handling ipi_selected() and ipi_cpu() in the amd64/i386 case. Reviewed by: jhb Tested by: gianni MFC after: 1 month X-MFC: 210939	2010-08-09 20:25:06 +00:00
David Malone	ee04083c8a	Don't pass sizeof(u_int) to an argument of SYSCTL_PROC that ends up not being used.	2010-08-08 20:34:53 +00:00
Bernhard Schmidt	5ec432ed82	Fix whitespace nits. PR: conf/148989 Submitted by: pluknet <pluknet at gmail.com> MFC after: 3 days	2010-08-06 18:46:27 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
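The difference between the mask-based ipi_selected() and the new single-target ipi_cpu() can be illustrated with a userland mock. The mock_* functions, MOCK_MAXCPU and the delivered[] array are hypothetical stand-ins, not the kernel implementation; the sketch only shows that ipi_cpu() takes a cpuid directly, whereas ipi_selected() needs a mask to be built and scanned, which becomes more expensive once CPU masks move from cpumask_t to cpuset_t.

```c
typedef unsigned int cpumask_t;	/* currently u_int, per the log above */

#define MOCK_MAXCPU	32

/* Mock "hardware": the last IPI delivered to each CPU (0 = none). */
static int delivered[MOCK_MAXCPU];

static void
mock_send(unsigned cpu, int ipi)
{
	delivered[cpu] = ipi;
}

/* Mask-based API: every bit in the mask has to be examined. */
static void
mock_ipi_selected(cpumask_t cpus, int ipi)
{
	for (unsigned cpu = 0; cpu < MOCK_MAXCPU; cpu++)
		if (cpus & ((cpumask_t)1 << cpu))
			mock_send(cpu, ipi);
}

/* Single-target API: no mask is built or scanned. */
static void
mock_ipi_cpu(unsigned cpu, int ipi)
{
	mock_send(cpu, ipi);
}
```

A call such as mock_ipi_selected((cpumask_t)1 << 3, ipi) has the same effect as mock_ipi_cpu(3, ipi), which is the pattern the commit replaces.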
Jung-uk Kim	439f3d8b81	Implement a simple native VM86 backend for X86BIOS. Now i386 uses native VM86 calls instead of the real mode emulator as a backend. VM86 has been proven reliable for a very long time and it is actually a few times faster than emulation. Increase maximum number of page table entries per VM86 context from 3 to 8 pages. It was (ridiculously) low and insufficient for the new VM86 backend, which shares one context globally. Slightly rearrange and clean up the emulator backend to accommodate the new code. The only visible change here is the stack size, which is decreased from 64K to 4K bytes to sync with VM86. Actually, it seems there is no need for a big stack in real mode. MFC after: 1 month	2010-08-05 18:48:30 +00:00
John Baldwin	e2865ebbc2	Change the MPTable and $PIR PCI-PCI bridge drivers to inherit from the generic PCI-PCI bridge driver and only override specific methods. This should fix suspend/resume of PCI-PCI bridges using these drivers.	2010-08-05 17:48:37 +00:00
John Baldwin	7134e39042	Tweak the logic to disable CLFLUSH in virtual environments to work around problems with flushing the local APIC register range so that it checks vm_guest directly. Reviewed by: kib, alc MFC after: 2 weeks	2010-08-02 17:01:23 +00:00
Xin LI	a3bc0a4e5c	Improve cputemp(4) driver wrt newer Intel processors, especially Xeon 5500/5600 series: - Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place of Tj(max) when a sane value is available, as documented in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane value we mean 70C - 100C for now); - Print the probe results when booting verbose; - Replace cpu_mask with cpu_stepping; - Use CPUID_* macros instead of rolling our own. Approved by: rpaulo MFC after: 1 month	2010-07-29 19:08:22 +00:00
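The Tj(target) selection described above can be sketched as follows. This is a hedged illustration, not the driver code: pick_tjmax() and core_temp() are hypothetical helpers, the MSR values are passed in rather than read with rdmsr, and fallback_tjmax stands in for whatever per-model Tj(max) the driver would otherwise use. It assumes the usual Intel layout: Tj(target) in bits 23:16 of IA32_TEMPERATURE_TARGET, and a digital readout (degrees below Tj) in bits 22:16 of IA32_THERM_STATUS.

```c
#include <stdint.h>

/*
 * Use Tj(target) only when it falls in the 70C..100C range the commit
 * treats as sane; otherwise keep the fallback Tj(max).
 */
static int
pick_tjmax(uint64_t temp_target_msr, int fallback_tjmax)
{
	int target = (int)((temp_target_msr >> 16) & 0xff);

	if (target >= 70 && target <= 100)
		return (target);
	return (fallback_tjmax);
}

/* The digital readout counts degrees below Tj, so subtract it. */
static int
core_temp(uint64_t therm_status_msr, int tjmax)
{
	return (tjmax - (int)((therm_status_msr >> 16) & 0x7f));
}
```

For instance, a target field of 0x5a (90C) is accepted, while a field of 20C is rejected as insane and the fallback is used instead.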
John Baldwin	536af0d751	Mark the __curthread() functions as __pure2 and remove the volatile keyword from the inline assembly. This allows the compiler to cache invocations of curthread since its value does not change within a thread context. Submitted by: zec (i386) MFC after: 1 week	2010-07-29 18:44:10 +00:00
Jung-uk Kim	994ce54d01	MFamd64: r210615 Fix another fallout from r208833. savectx() is used to save CPU context for crash dump (dumppcb) and kdb (stoppcbs). For both cases, we cannot have a valid pointer in pcb_save. This should restore the previous behaviour.	2010-07-29 17:00:41 +00:00
John Baldwin	a955c461ad	The corrected error count field is dependent on CMCI, not TES. MFC after: 1 week	2010-07-28 21:52:09 +00:00
Matthew D Fleming	d7854da193	Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma zones for each malloc bucket size. The purpose is to isolate different malloc types into hash classes, so that any buffer overruns or use-after-free will usually only affect memory from malloc types in that hash class. This is purely a debugging tool; by varying the hash function and tracking which hash class was corrupted, the intersection of the hash classes from each instance will point to a single malloc type that is being misused. At this point inspection or memguard(9) can be used to catch the offending code. Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files. The suggestion to have this on by default came from Kostik Belousov on -arch. This code is based on work by Ron Steinke at Isilon Systems. Reviewed by: -arch (mostly silence) Reviewed by: zml Approved by: zml (mentor)	2010-07-28 15:36:12 +00:00
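The debugging procedure above (hash malloc types into classes, vary the hash, and intersect the classes that were corrupted) can be sketched in userland C. Everything here is hypothetical and not the kern_malloc.c code: malloc types are reduced to indices 0..31, zone_of() is an arbitrary multiplicative hash whose seed plays the role of "varying the hash function", and NZONES = 8 mirrors the MALLOC_DEBUG_MAXZONES=8 setting.

```c
#include <stdint.h>

#define NTYPES	32	/* hypothetical number of malloc types */
#define NZONES	8	/* mirrors MALLOC_DEBUG_MAXZONES=8 */

/* Hypothetical hash of a malloc type index into one of NZONES classes. */
static unsigned
zone_of(unsigned type, uint32_t seed)
{
	uint32_t h = ((uint32_t)type + 1u) * 2654435761u ^ seed;

	h ^= h >> 16;
	return (h % NZONES);
}

/*
 * Keep only the suspect types (bits set in `suspects`) that hash to the
 * zone observed to be corrupted in a run that used `seed`.
 */
static uint32_t
narrow_suspects(uint32_t suspects, unsigned corrupted_zone, uint32_t seed)
{
	uint32_t out = 0;

	for (unsigned t = 0; t < NTYPES; t++)
		if ((suspects & (UINT32_C(1) << t)) != 0 &&
		    zone_of(t, seed) == corrupted_zone)
			out |= UINT32_C(1) << t;
	return (out);
}
```

Starting from all types as suspects and calling narrow_suspects() once per run intersects the hash classes; the misbehaving type always survives, and repeated runs with different seeds shrink the set toward it.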
Alan Cox	a14a949872	The interpreter name should no longer be treated as a buffer that can be overwritten. (This change should have been included in r210545.) Submitted by: kib	2010-07-28 04:47:40 +00:00
John Baldwin	a3870a1826	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfilling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
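The zero-terminated affinity table and the round-robin lookup order described above can be sketched in userland C. This is an assumption-labeled sketch, not the vm_phys code: struct mem_range_affinity, domain_of() and lookup_order() are hypothetical names, ranges are assumed to be half-open [start, end), and "round-robin" is read as "own domain first, then the others with wrap-around", which fits the first-touch policy but is an interpretation.

```c
#include <stdint.h>

typedef uint64_t paddr_t;	/* stand-in for vm_paddr_t */

/* One memory range and its domain; an all-zero entry ends the array. */
struct mem_range_affinity {
	paddr_t start;
	paddr_t end;		/* assumed exclusive */
	int domain;
};

static int
is_terminator(const struct mem_range_affinity *m)
{
	return (m->start == 0 && m->end == 0 && m->domain == 0);
}

/* Return the domain owning pa, or -1 if no entry covers it. */
static int
domain_of(const struct mem_range_affinity *tbl, paddr_t pa)
{
	for (; !is_terminator(tbl); tbl++)
		if (pa >= tbl->start && pa < tbl->end)
			return (tbl->domain);
	return (-1);
}

/* Round-robin lookup order for `domain`: itself first, then wrap around. */
static void
lookup_order(int domain, int ndomains, int *order)
{
	for (int i = 0; i < ndomains; i++)
		order[i] = (domain + i) % ndomains;
}
```

Note that an entry such as {0, 0x80000000, 0} is not mistaken for the terminator, because only an entry with every field zero ends the list.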
Jung-uk Kim	172754036a	Simplify fldcw() macro. There is no reason to use pointer here. No object file change after this commit (verified with md5).	2010-07-26 23:20:55 +00:00
Jung-uk Kim	8b019a8887	Remove an unused macro since r189418.	2010-07-26 22:55:14 +00:00
Jung-uk Kim	30402401a7	Reduce diff against fenv.h: Mark all inline asms as volatile for safety. No object file change after this commit (verified with md5).	2010-07-26 22:16:36 +00:00
Jung-uk Kim	2e50fa36a5	FNSTSW instruction can use AX register as an operand. Obtained from: fenv.h	2010-07-26 21:24:52 +00:00
Rui Paulo	daef39e7ae	Remove the acpi_aiboost driver. It has been replaced by aibs(4).	2010-07-25 17:55:57 +00:00
Rui Paulo	2b95672852	MFamd64: Add USD_GETBASE(), USD_SETBASE(), USD_GETLIMIT() and USD_SETLIMIT().	2010-07-21 18:47:52 +00:00
Tijl Coosemans	3245ecbe92	Store fsbase and gsbase in the right fields of the mcontext. They were switched. PR: i386/148344 Approved by: kib (mentor) MFC after: 1 week	2010-07-20 12:36:36 +00:00
Alexander Motin	060d7431b5	Add hints for the i8254 timer on i386 and amd64. Some people report systems whose PnP/ACPI data does not report the i8254 timer. In some cases this can be fatal, as the i8254 can be the only available time counter hardware. On the other hand, we now depend heavily on the i8254 timer, and until recently its init/usage was completely hardcoded. So this change just restores the previous behavior in a more regular fashion.	2010-07-16 23:21:46 +00:00
Alexander Motin	fcc06be1b2	Move functions declaration to MI code, following implementation.	2010-07-15 17:49:35 +00:00
Bernhard Schmidt	774f94f14c	- Update 6000 firmware to 9.221.4.1 - Add 6050 firmware MFC after: 2 weeks	2010-07-15 11:26:07 +00:00
Warner Losh	1003cfe94d	Remove obsolete undef of COPY_SIGCODE. It appears to have not been used in FreeBSD in quite some time (maybe since before 4.4-lite :) Submitted by: bde	2010-07-13 15:06:13 +00:00
Alan Cox	8155e5d561	Reduce the number of global TLB shootdowns generated by pmap_qenter(). Specifically, teach pmap_qenter() to recognize the case when it is being asked to replace a mapping with the very same mapping and not generate a shootdown. Unfortunately, the buffer cache commonly passes an entire buffer to pmap_qenter() when only a subset of the mappings are changing. For the extension of buffers in allocbuf() this was resulting in unnecessary shootdowns. The addition of new pages to the end of the buffer need not and did not trigger a shootdown, but overwriting the initial mappings with the very same mappings was seen as a change that necessitated a shootdown. With this change, that is no longer so. For a "buildworld" on amd64, this change eliminates 14-15% of the pmap_invalidate_range() shootdowns, and about 4% of the overall shootdowns. MFC after: 3 weeks	2010-07-10 18:22:44 +00:00
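The shootdown-avoidance rule above (skip slots that already hold the very same mapping, and shoot down only if a previously valid mapping was really overwritten) can be sketched in userland C. This is a simplified model, not the pmap code: the real pmap_qenter() works on hardware page-table entries and calls pmap_invalidate_range(), whereas here the PTEs are a flat array, PTE_V is a hypothetical valid bit, and qenter_needs_shootdown() simply reports whether an invalidation would be required.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_V	0x001u	/* hypothetical "valid" bit */

/*
 * Install mappings pas[0..n-1] into ptes[0..n-1] and report whether a
 * TLB shootdown is needed.  Identical mappings are left untouched; only
 * overwriting a previously valid entry requires a shootdown.
 */
static bool
qenter_needs_shootdown(pte_t *ptes, const uint64_t *pas, size_t n)
{
	pte_t oldbits = 0;

	for (size_t i = 0; i < n; i++) {
		pte_t newpte = pas[i] | PTE_V;

		if (ptes[i] != newpte) {
			oldbits |= ptes[i];	/* remember what was replaced */
			ptes[i] = newpte;
		}
	}
	return ((oldbits & PTE_V) != 0);
}
```

This models the allocbuf() case from the commit: re-entering a buffer's existing pages plus new pages at the end replaces only invalid slots, so no shootdown is reported, while changing any existing page still triggers one.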
Konstantin Belousov	b543e91ba5	Fix spacing. Noted by: pgollucci MFC after: 3 weeks	2010-07-09 21:27:42 +00:00
Konstantin Belousov	2680dac9e1	For both i386 and amd64 pmap, - change the type of pm_active to cpumask_t, which it is; - in pmap_remove_pages(), compare with PCPU(curpmap), instead of dereferencing the long chain of pointers [1]. For amd64 pmap, remove the unneeded checks for validity of curpmap in pmap_activate(), since curpmap should be always valid after r209789. Submitted by: alc [1] Reviewed by: alc MFC after: 3 weeks	2010-07-09 20:05:56 +00:00
Alexander Motin	af565edaaa	Revert r209638. After the commit, it turned out that more people liked the previous name of the stray interrupt counters than had responded on the list.	2010-07-02 17:22:15 +00:00
Alexander Motin	b7bc6aa726	Make stray irq counters have format alike to other counters. Unified format makes string processing (for example by `systat -vm`) easier.	2010-07-01 21:58:46 +00:00
John Baldwin	fc0de8f0b6	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.	2010-06-30 18:03:42 +00:00
Konstantin Belousov	13cedde2cb	Regenerate	2010-06-28 18:17:21 +00:00
Rui Paulo	bbe4a97d41	Import the acpi_aibs(4) driver written by Constantine A. Murenin. It has more features than acpi_aiboost(4) and it will eventually replace acpi_aiboost(4). Submitted by: Constantine A. Murenin <cnst at FreeBSD.org> Reviewed by: freebsd-acpi, imp MFC after: 1 month	2010-06-25 15:32:46 +00:00
Konstantin Belousov	595473a587	Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64 ABI specifies the DF should be zero, and newer compilers do not clear DF before using DF-sensitive instructions. The DF clearing for signal handlers was done some time ago. MFC after: 1 week	2010-06-23 20:44:07 +00:00
Konstantin Belousov	692add74d8	Fix bugs on pc98: use npxgetuserregs() instead of npxgetregs() for get_fpcontext(), and npxsetuserregs() for set_fpcontext(). Also, note that usercontext is not initialized anymore in fpstate_drop(). Systematically replace references to npxgetregs() and npxsetregs() by npxgetuserregs() and npxsetuserregs() in comments. Noted by: bde	2010-06-23 12:17:13 +00:00
Konstantin Belousov	1060a94fb5	Now that FPU use requires a working #MF, due to the removal of INT13 FPU exception handling, MFi386 r209198: Use critical sections instead of disabling local interrupts to ensure the consistency between PCPU fpcurthread and the state of FPU. Reviewed by: bde Tested by: pho	2010-06-23 11:21:19 +00:00
Konstantin Belousov	699d648aab	Remove the support for int13 FPU exception reporting on i386. It is believed that all 486-class CPUs FreeBSD is capable of running on either have no FPU and cannot use an external coprocessor, or have the FPU on the package and can use #MF. Reviewed by: bde Tested by: pho (previous version)	2010-06-23 11:12:58 +00:00
Konstantin Belousov	95882b9865	Remove unused i586 optimized bcopy/bzero/etc implementations that utilize FPU registers for copying. Remove the switch table and jumps from bcopy/bzero/... to the actual implementation. As a side-effect, i486-optimized bzero is removed. Reviewed by: bde Tested by: pho (previous version)	2010-06-23 10:40:28 +00:00
Alexander Motin	25eb1b8c15	Some style fixes for r209371. Submitted by: jhb@	2010-06-22 16:20:10 +00:00
Alexander Motin	875b8844be	Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing the best possible drivers by machine-independent code, and for operating them to supply the kernel with hardclock(), statclock() and profclock() events in a unified fashion on various hardware. The infrastructure supports both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. The MI management code currently uses only periodic mode; one-shot mode use is planned for later, as part of the tickless kernel project. For now the infrastructure is used on the i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support to the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may not be exactly precise. HPET - depending on hardware, can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because the same hardware is also used as a time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited to powers of 2. Depending on hardware capabilities, drivers are preferred in one of the following orders: either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. The user may explicitly specify the wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If a requested driver is unavailable or non-operational, the system will try to replace it. If no more timers are available or "NONE" is specified for the second, the system will operate using only one timer, multiplying its frequency by a few times and using respective dividers to honor the hz, stathz and profhz values set during initial setup.	2010-06-20 21:33:29 +00:00
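When only one timer is left, the commit above describes running it faster and dividing its interrupts down to hz, stathz and profhz. The sketch below illustrates that division with simple counters. The struct, the function names, and the requirement that the base rate be an exact multiple of every target rate are assumptions of this example, not the kernel's code.

```c
#include <assert.h>

/* Hypothetical state: one hardware timer at base_hz feeds three clocks. */
struct clock_divider {
	int base_hz;				/* rate of the one usable timer */
	int hard_div, stat_div, prof_div;	/* timer ticks per event */
	int hard_cnt, stat_cnt, prof_cnt;	/* ticks since the last event */
	long hardclocks, statclocks, profclocks; /* events delivered */
};

static void
divider_init(struct clock_divider *d, int base_hz, int hz, int stathz,
    int profhz)
{
	/* Sketch assumption: base_hz is an exact multiple of each rate. */
	assert(hz > 0 && stathz > 0 && profhz > 0);
	assert(base_hz % hz == 0 && base_hz % stathz == 0 &&
	    base_hz % profhz == 0);
	*d = (struct clock_divider){ .base_hz = base_hz,
	    .hard_div = base_hz / hz, .stat_div = base_hz / stathz,
	    .prof_div = base_hz / profhz };
}

/* Called once per hardware timer interrupt. */
static void
divider_tick(struct clock_divider *d)
{
	if (++d->hard_cnt == d->hard_div) {
		d->hard_cnt = 0;
		d->hardclocks++;	/* stands in for hardclock() */
	}
	if (++d->stat_cnt == d->stat_div) {
		d->stat_cnt = 0;
		d->statclocks++;	/* stands in for statclock() */
	}
	if (++d->prof_cnt == d->prof_div) {
		d->prof_cnt = 0;
		d->profclocks++;	/* stands in for profclock() */
	}
}
```

With a 2000 Hz timer, hz = 1000 and stathz = 125, one second of ticks yields 1000 hardclock-equivalents and 125 statclock-equivalents.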
Konstantin Belousov	9e3e64e797	Only enable kdtrace hook in the LINT on the architectures that implement it.	2010-06-18 18:51:09 +00:00
Alexander Motin	d364638110	Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64. This information can be very valuable for CPU sleep-time (and respectively idle power consumption) optimization. Add counters for timer-related IPIs. Reviewed by: jhb@ (previous version)	2010-06-17 11:54:49 +00:00
John Baldwin	61d3f0bab2	Restore the machine check register banks on resume. For banks being monitored via CMCI, reset the interrupt threshold to 1 on resume. Reviewed by: jkim MFC after: 2 weeks	2010-06-15 18:51:41 +00:00
Alexander Motin	2f9fc3899b	Fix bug introduced in SVN rev 194985. When calling pic_assign_cpu() for pre-bound IRQs during boot, pass it the LAPIC ID, as in other places, not the CPU ID.	2010-06-14 07:38:53 +00:00
Alexander Motin	1440de203f	Check general TSC presence before doing more specific checks and printfs.	2010-06-12 13:10:03 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
Alan Cox	9124d0d6a3	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)	2010-06-11 15:49:39 +00:00
Alexander Kabaev	60743cbd22	Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requests in Linux emulation layer. Linux seems to only require that pos is page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter checking is too strict to allow some Linux binaries to run. tsMuxeR is one example of such a binary. Discussed with: jhb MFC after: 1 week	2010-06-10 17:59:47 +00:00
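A sketch of the relaxed check. It assumes 4 KB pages and treats 0x20 as the Linux MAP_ANONYMOUS bit. The helper name is hypothetical, and this is not the mmap code in the Linux emulation layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch only: 4 KB pages, 0x20 = Linux MAP_ANONYMOUS. */
#define SKETCH_PAGE_SIZE		4096u
#define SKETCH_LINUX_MAP_ANONYMOUS	0x20

/*
 * Hypothetical helper: validate the offset of a Linux mmap request.
 * Returns 0 on success (possibly normalizing *pos), or -1 where a real
 * kernel would fail with EINVAL.  File-backed requests are left to the
 * normal checks.
 */
static int
linux_mmap_check_pos(int flags, uint64_t *pos)
{
	assert(pos != NULL);
	if ((flags & SKETCH_LINUX_MAP_ANONYMOUS) != 0) {
		/* Linux only requires page alignment here... */
		if ((*pos & (SKETCH_PAGE_SIZE - 1)) != 0)
			return (-1);
		/* ...and otherwise ignores the offset, so pass 0 down. */
		*pos = 0;
	}
	return (0);
}
```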
Alan Cox	ce18658792	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exists_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@	2010-06-10 16:56:35 +00:00
John Baldwin	b9cd2f771a	Move the MD support for PCI message signalled interrupts to the x86 tree as it is identical for i386 and amd64.	2010-06-08 18:36:03 +00:00
John Baldwin	2465e30f0c	Move the machine check support code to the x86 tree since it is identical on i386 and amd64. Requested by: alc	2010-06-08 18:04:07 +00:00
John Baldwin	53a908cb07	Move the I/O APIC code to the x86 tree since it is identical on i386 and amd64.	2010-06-08 17:51:21 +00:00
John Baldwin	bfc7a4fc48	- Use a bit more care when moving I/O APIC interrupts between CPUs. Mask the interrupt followed by a brief delay if it is not currently masked before moving the interrupt. - Move the icu_lock out of ioapic_program_intpin() and into callers. This closes a race where ioapic_program_intpin() could use a stale value of the masked state to compute the masked bit in the register. Reviewed by: mav MFC after: 2 weeks	2010-06-08 17:08:13 +00:00
Konstantin Belousov	6cf9a08d2c	Introduce the x86 kernel interfaces to allow kernel code to use FPU/SSE hardware. The caller should provide a save area that is chained into the stack of save areas; the pcb save_area for usermode FPU state is on top. The pcb now contains a pointer to the current FPU saved area, used during FPUDNA handling and context switches. There is also a facility to allow the kernel thread to use pcb save_area. Change the dreaded warnings "npxdna in kernel mode!" into panics when FPU usage is not registered. KPI discussed with: fabient Tested by: pho, fabient Hardware provided by: Sentex Communications MFC after: 1 month	2010-06-05 15:59:59 +00:00
Alan Cox	966898be68	In the unlikely event that pmap_ts_referenced() demoted five superpage mappings to the same underlying physical page, the calling thread would be left forever pinned to the same processor. MFC after: 3 days	2010-06-03 03:55:22 +00:00
John Baldwin	9c72429312	MFamd64: Add a new macro PCPU_XEN_FIELDS to hold XEN-specific per-CPU fields that is always included in PCPU_MD_FIELDS. The macro is empty for non-XEN kernels. This avoids duplicating non-XEN per-CPU fields in two places. While here, remove several unused fields from the XEN-specific structure. Reviewed by: kmacy, gibbs MFC after: 1 month	2010-06-02 15:09:36 +00:00
Alan Cox	b2830a9649	Eliminate a stale comment.	2010-05-31 06:06:10 +00:00
Alan Cox	72dc3eb65b	Simplify the inner loop of pmap_collect(): While iterating over the page's pv list, there is no point in checking whether or not the pv list is empty. Instead, wait until the loop completes.	2010-05-30 18:48:41 +00:00
Alan Cox	a1192299b3	Merge various changes from i386/i386/pmap.c: The remaining, unmerged portions of r175404 Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel option DIAGNOSTIC still disables inlining of certain pmap functions.) Eliminate dead code from pmap_enter(). This code implemented an assertion. On i386, an equivalent check is already implemented. However, on amd64, a small change is required to implement an equivalent check. Eliminate \n from a nearby panic string. Use KASSERT() to reimplement pmap_copy()'s two assertions. Merge portions of r177659 To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set. The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set. r208609 Defer freeing any page table pages in pmap_remove_all() until after the page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all(). r208645 When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call. When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called. Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator. There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this. Tidy up the variable declarations at the start of pmap_enter().	2010-05-30 04:44:32 +00:00
Alan Cox	8f0d5d3b9f	When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call. When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called. Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator. There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this. Tidy up the variable declarations at the start of pmap_enter().	2010-05-29 17:10:45 +00:00
John Baldwin	0c86af8162	Defer initializing machine checks for the boot CPU until the local APIC is fully configured. MFC after: 1 month	2010-05-28 17:50:24 +00:00
Alan Cox	52d8ba372e	Defer freeing any page table pages in pmap_remove_all() until after the page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all().	2010-05-28 06:49:57 +00:00
Konstantin Belousov	eee6151f46	Clarify a potential issue in get_fpcontext() use. MFC after: 1 week	2010-05-27 18:33:00 +00:00
Alan Cox	c46b90e90a	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib	2010-05-26 18:00:44 +00:00
John Baldwin	835f163a20	Only enable CMCI on i386 if 'device apic' is enabled in the kernel since it requires the local APIC to work.	2010-05-25 21:39:30 +00:00
John Baldwin	58ccad7ddc	Add support for corrected machine check interrupts. CMCI is a new local APIC interrupt that fires when a threshold of corrected machine check events is reached. CMCI also includes a count of events when reporting corrected errors in the bank's status register. Note that individual banks may or may not support CMCI. If they do, each bank includes its own threshold register that determines when the interrupt fires. Currently the code uses a very simple strategy where it doubles the threshold on each interrupt until it succeeds in throttling the interrupt to occur only once a minute (this interval can be tuned via sysctl). The threshold is also adjusted on each hourly poll which will lower the threshold once events stop occurring. Tested by: Sailaja Bangaru sbappana at yahoo com MFC after: 1 month	2010-05-24 15:45:05 +00:00
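A sketch of the throttling policy described above, assuming a per-bank threshold and a 60-second interval. The struct and function names are hypothetical, and halving the threshold on an idle hourly poll is an assumption of this example: the commit only says the poll lowers it.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-bank CMCI state. */
struct cmci_bank_sketch {
	int thresh;		/* corrected-error count that fires the interrupt */
	int max_thresh;		/* hardware limit of the threshold field */
	time_t last_intr;	/* time of the previous interrupt, 0 if none */
};

#define CMC_THROTTLE_SECS	60	/* tunable via sysctl in the commit */

/* Interrupt path: if interrupts arrive more often than once per
 * CMC_THROTTLE_SECS, double the threshold, up to the maximum. */
static void
cmci_on_interrupt(struct cmci_bank_sketch *b, time_t now)
{
	assert(b->thresh >= 1);
	if (b->last_intr != 0 && now - b->last_intr < CMC_THROTTLE_SECS) {
		b->thresh *= 2;
		if (b->thresh > b->max_thresh)
			b->thresh = b->max_thresh;
	}
	b->last_intr = now;
}

/* Hourly poll: once events stop, lower the threshold (never below 1). */
static void
cmci_on_poll(struct cmci_bank_sketch *b, time_t now)
{
	if (b->last_intr == 0 || now - b->last_intr >= 3600) {
		b->thresh /= 2;
		if (b->thresh < 1)
			b->thresh = 1;
	}
}
```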
Alan Cox	567e51e18c	Roughly half of a typical pmap_mincore() implementation is machine-independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
Alexander Motin	dbd55f3ff0	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. The same code, with minor variations, was duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to the new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.	2010-05-24 11:40:49 +00:00
Konstantin Belousov	afe1a68827	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls the syscall implementation from the ABI sysent table, and sets up the return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of the single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies the debugger that the process address space was changed by one of the exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
Alexander Motin	fa1ed4bd1a	Unify local_apic.c for x86 archs.	2010-05-23 17:45:01 +00:00
Poul-Henning Kamp	065b12a703	Rename an argument from "exp" to "expect" since the former makes FlexeLint uneasy, in case anybody thinks it might be exp(3) in libm. This also makes it consistent with other archs.	2010-05-20 06:18:03 +00:00
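A simplified sketch of the dispatch shape described above: MI code calls ABI-supplied hooks to fetch arguments and store the return value, with a syscall table in between. Every type and function here is a hypothetical stand-in; the real struct sysentvec, struct syscall_args and syscallenter() carry far more state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified trap frame and argument block. */
struct frame_sketch { long a0, a1, a2; long retval; int error; };
struct sargs_sketch { int code; long args[2]; };
typedef int (*sy_call_sketch_t)(const long *args, long *retval);

/* Per-ABI hooks, modeled loosely on sv_fetch_syscall_args and
 * sv_set_syscall_retval. */
struct sysvec_sketch {
	int (*sv_fetch_syscall_args)(const struct frame_sketch *,
	    struct sargs_sketch *);
	void (*sv_set_syscall_retval)(struct frame_sketch *, int, long);
	const sy_call_sketch_t *sv_table;
	int sv_size;
};

/* MI path: fetch arguments, dispatch, set up the return frame. */
static int
syscallenter_sketch(const struct sysvec_sketch *sv, struct frame_sketch *f)
{
	struct sargs_sketch sa;
	long retval = 0;
	int error;

	assert(sv != NULL && f != NULL);
	error = sv->sv_fetch_syscall_args(f, &sa);
	if (error == 0) {
		if (sa.code < 0 || sa.code >= sv->sv_size)
			error = ENOSYS;
		else
			error = sv->sv_table[sa.code](sa.args, &retval);
	}
	sv->sv_set_syscall_retval(f, error, retval);
	return (error);
}

/* Example ABI hooks: syscall number in a0, arguments in a1 and a2. */
static int
fetch_args_example(const struct frame_sketch *f, struct sargs_sketch *sa)
{
	sa->code = (int)f->a0;
	sa->args[0] = f->a1;
	sa->args[1] = f->a2;
	return (0);
}

static void
set_retval_example(struct frame_sketch *f, int error, long retval)
{
	f->error = error;
	f->retval = error != 0 ? error : retval;
}

/* Example syscall: adds its two arguments. */
static int
sys_add_example(const long *args, long *retval)
{
	*retval = args[0] + args[1];
	return (0);
}
```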
John Baldwin	3b642a049b	Add constants for the optional EOI suppression support in local APICs and EOI registers in I/O APICs.	2010-05-19 19:52:41 +00:00
Alan Cox	9ab6032f73	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.	2010-05-16 23:45:10 +00:00
Poul-Henning Kamp	965df046e5	Apply a patch that has been lingering in my inbox for far too long: On a Soekris Net5501, if you do a watchdog -t 16, followed by a watchdog -t 0 to disable the watchdog, and then after some time (16s) re-enable the watchdog, the box reboots immediately. This also prevents stopping and restarting watchdogd(8). This is because when you stop the watchdog, the timer is not stopped; only the hard reset is disabled. So when the timer has elapsed, the C2 event of the timer is set. But when the hard reset is re-enabled, the event is not cleared and the box reboots. The attached patch stops and resets the counter when the watchdog is disabled and does not disable the hard reset of the timer (if the timer has elapsed, it's too late). Submitted by: Patrick Lamaizière	2010-05-15 10:31:11 +00:00
Alan Cox	3c4a24406b	Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mappings(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
Alan Cox	7024db1d40	Push down the page queues lock inside of vm_page_free_toq() and pmap_page_is_mapped() in preparation for removing page queues locking around calls to vm_page_free(). Setting aside the assertion that calls pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page queues lock just long enough to actually add or remove the page from the paging queues. Update vm_page_unhold() to reflect the above change.	2010-05-06 16:39:43 +00:00
Konstantin Belousov	db8fd40e9f	Add definitions for Intel AESNI CPUID bits and print the capabilities on boot. Hardware provided by: Sentex Communications MFC after: 1 week	2010-05-05 21:07:47 +00:00
Joel Dahl	8e0ad55abb	Switch to our preferred 2-clause BSD license. Approved by: kmacy	2010-05-05 20:39:02 +00:00
Kip Macy	958d87cd86	merge 194209 in to the i386/xen pmap requested by: alc@	2010-04-30 03:26:12 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Attilio Rao	d8b878873e	- Extract the IODEV_PIO interface from ia64 and make it MI. In the end, it helps fix /dev/io usage from multithreaded processes. - On i386 and amd64 the old behaviour is kept, but multithreaded processes must use the new interface in order to work well. - Support for the other architectures is greatly improved since, where necessary, they now only need to define very small things. Manpage update will happen shortly. Sponsored by: Sandvine Incorporated PR: threads/116181 Reviewed by: emaste, marcel MFC after: 3 weeks	2010-04-28 15:38:01 +00:00
Konstantin Belousov	8bac98182a	Style: use #define<TAB> instead of #define<SPACE>. Noted by: bde, pluknet gmail com MFC after: 11 days	2010-04-27 09:48:43 +00:00
Alan Cox	14dd3a29ea	MFi386 r207205 Clearing a page table entry's accessed bit (PG_A) and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified, so don't do it.	2010-04-27 05:35:35 +00:00
Alan Cox	0d2e1c3e39	Clearing a page table entry's accessed bit (PG_A) and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified. In contrast to pmap_remove() or pmap_remove_all(), the mapping is not being destroyed, so the notion that the page was accessed is not lost. Moreover, clearing the page table entry's accessed bit and setting the page's PG_REFERENCED flag can throw off the page daemon's activity count calculation. Finally, in my tests, I found that 15% of the atomic memory operations being performed by pmap_protect() were only to clear PG_A, and not change protection. This could, by itself, be fixed, but I don't see the point given the above argument. Remove a comment from pmap_protect_pde() that is no longer meaningful after the above change.	2010-04-25 20:40:45 +00:00
Kip Macy	c5cc832f32	- fix style issues on i386 as well requested by: alc@	2010-04-24 21:36:52 +00:00
Alan Cox	7b85f59183	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!	2010-04-24 17:32:52 +00:00
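The difference between the two queries can be sketched over an array of x86 PTEs, using PG_A (0x020, the accessed bit). The function names are hypothetical, and the sketch omits pv lists, locking and superpage demotion. It only shows that the test-and-clear variant destroys the information that a later query, or the page daemon, would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PG_A	0x020UL		/* x86 PTE accessed bit */

/* Test-and-clear, in the spirit of pmap_ts_referenced(): count the
 * referenced mappings and clear their PG_A bits as a side effect. */
static int
ts_referenced_sketch(unsigned long *ptes, size_t n)
{
	int count = 0;

	assert(ptes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((ptes[i] & PG_A) != 0) {
			ptes[i] &= ~PG_A;
			count++;
		}
	}
	return (count);
}

/* Read-only query, in the spirit of pmap_is_referenced(): report whether
 * any mapping is referenced without disturbing the bits. */
static bool
is_referenced_sketch(const unsigned long *ptes, size_t n)
{
	assert(ptes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if ((ptes[i] & PG_A) != 0)
			return (true);
	return (false);
}
```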
Konstantin Belousov	ed7806879b	Move the constants specifying the size of struct kinfo_proc into machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. Submitted by: pluknet Reviewed by: imp, jhb, nwhitehorn MFC after: 2 weeks	2010-04-24 12:49:52 +00:00
Jung-uk Kim	b834123032	If a conditional jump instruction has the same jt and jf, do not perform the test and jump unconditionally.	2010-04-22 23:47:19 +00:00
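A sketch of this optimization, using a hypothetical emitter that records opcodes instead of writing machine code; it is not the BPF JIT code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes recorded by this sketch's emitter. */
enum op { OP_CMP, OP_JCC_TRUE, OP_JMP };

struct emit_buf {
	enum op ops[8];
	uint32_t targets[8];
	size_t n;
};

static void
emit(struct emit_buf *b, enum op op, uint32_t target)
{
	assert(b->n < sizeof(b->ops) / sizeof(b->ops[0]));
	b->ops[b->n] = op;
	b->targets[b->n] = target;
	b->n++;
}

/* Compile a BPF conditional jump with true/false offsets jt and jf. */
static void
jit_cond_jump(struct emit_buf *b, uint8_t jt, uint8_t jf)
{
	if (jt == jf) {
		/* Both outcomes go to the same place: skip the test. */
		emit(b, OP_JMP, jt);
		return;
	}
	emit(b, OP_CMP, 0);
	emit(b, OP_JCC_TRUE, jt);
	emit(b, OP_JMP, jf);
}
```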
Andrew Thompson	b850ecc180	Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had the illusion of a tunable setting but was always turned on regardless. MFC after: 1 week	2010-04-22 21:31:34 +00:00
Rui Paulo	ff569d8436	Rename the cyclic global variable lapic_cyclic_clock_func to just cyclic_clock_func. This will make more sense when we start developing non x86 cyclic version.	2010-04-20 17:03:30 +00:00
Pyun YongHyeon	d193ed0bed	Add driver for Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet. This driver was written by Alexander Pohoyda and greatly enhanced by Nikolay Denev. I don't have this hardware, but this driver was tested by Nikolay Denev and xclin. Because SiS didn't release a data sheet for this controller, programming information came from the Linux driver and OpenSolaris. Unlike other open source drivers for SiS190/191, sge(4) takes full advantage of TX/RX checksum offloading and does not require an additional copy operation in the RX handler. The controller seems to have advanced offloading features like VLAN hardware tag insertion/stripping, TCP segmentation offload (TSO) as well as jumbo frame support, but these features are not available yet. Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw> who sent a fix for receiving VLAN oversized frames.	2010-04-14 20:45:33 +00:00
Konstantin Belousov	5f82d16eb1	Change printf() calls to uprintf() for sigreturn() and trap() complaints about inaccessible or wrong mcontext, and for the dreaded "kernel trap with interrupts disabled" situation. The latter is changed when the trap is generated from user mode (shall never be ?). Normalize the messages to include both pid and thread name. MFC after: 1 week	2010-04-13 10:12:58 +00:00
Rui Paulo	05c100d21f	Add EFI boot info fields.	2010-04-07 18:52:51 +00:00
Joel Dahl	8c14c16020	Switch to our preferred 2-clause BSD license. Approved by: jfv	2010-04-07 18:26:13 +00:00
Fabien Thomas	1fa7f10bac	- Support for uncore counting events: one fixed PMC with the uncore domain clock, 8 programmable PMC. - Westmere based CPU (Xeon 5600, Corei7 980X) support. - New man pages with events list for core and uncore. - Updated Corei7 events with Intel 253669-033US December 2009 doc. Some events have been removed from the documentation; they have been kept in the code but are documented in the man page as obsolete. - Offcore response events can be set up with the rsp token. Sponsored by: NETASQ	2010-04-02 13:23:49 +00:00
John Baldwin	90dfe31955	Add a handler for the local APIC error interrupt. For now it just prints out the current value of the local APIC error register when the interrupt fires. MFC after: 1 week	2010-03-29 19:13:34 +00:00
Ed Schouten	510ea843ba	Rename st_timespec fields to st_tim for POSIX 2008 compliance. A nice thing about POSIX 2008 is that it finally standardizes a way to obtain file access/modification/change times in sub-second precision, namely using struct timespec, which we have already had for a very long time. Unfortunately POSIX uses different names. This commit adds compatibility macros, so existing code should still build properly. Also change all source code in the kernel to work without any of the compatibility macros. This makes it all a bit less ambiguous. I am also renaming st_birthtime to st_birthtim, even though it was a local extension anyway. It seems Cygwin also has a st_birthtim.	2010-03-28 13:13:22 +00:00
Alan Cox	3792de2e87	Correctly handle preemption of pmap_update_pde_invalidate(). X-MFC after: r205573	2010-03-27 23:53:47 +00:00
Alan Cox	a57d0d8e1d	Simplify pmap_growkernel(), making the i386 version more like the amd64 version. MFC after: 3 weeks	2010-03-27 18:24:27 +00:00
Alan Cox	09fcdf114e	A ptrace(2) by one process may trigger a promotion in the address space of another process. Modify pmap_promote_pde() to handle this. (This is not a problem on amd64 due to implementation differences.) Reported by: jh@ MFC after: 1 week	2010-03-25 17:24:03 +00:00
Nathan Whitehorn	a107d8aac9	Change the arguments of exec_setregs() so that it receives a pointer to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms. Reviewed by: jhb	2010-03-25 14:24:00 +00:00
Alan Cox	e1990590e3	Adapt r204907 and r205402, the amd64 implementation of the workaround for AMD Family 10h Erratum 383, to i386. Enable machine check exceptions by default, just like r204913 for amd64. Enable superpage promotion only if the processor actually supports large pages, i.e., PG_PS. MFC after: 2 weeks	2010-03-24 03:07:35 +00:00
John Baldwin	121b3af9f2	Remove unneeded type specifiers from 64-bit constants. The compiler infers their natural type from the constants' values. Submitted by: bde MFC after: 3 days	2010-03-22 15:08:26 +00:00
Ed Maste	d02e85a681	Merge r197455 from amd64: Add a backtrace to the "fpudna in kernel mode!" case, to help track down where this comes from. Reviewed by: bde	2010-03-22 11:52:53 +00:00
Xin LI	0d0284bc52	Back out revision 205307. For the record: CPU_ENABLE_SSE enables some code that dynamically enables SSE support but does not necessarily enforce execution of SSE instructions.	2010-03-19 16:09:57 +00:00
Andriy Gapon	9344361b66	pmap amd64/i386: fix a typo in a comment MFC after: 3 days	2010-03-19 14:48:32 +00:00
John Baldwin	42c93b8d31	Use the same policy for rejecting or not rejecting ACPI tables with incorrect checksums as the base acpi(4) driver. This fixes a problem where the MADT parser would reject the MADT table during early boot, causing the MP Table to be used, but then the acpi(4) driver would attach and use non-SMP interrupt routing. Tested by: Alastair Hogge agh of coolrhaug com MFC after: 1 week	2010-03-19 12:43:18 +00:00
Xin LI	01af4cc1cc	SSE was enabled by default about 5 years ago, so there is no point pretending that we support I486 and I586 CPUs in the GENERIC kernel; users who want this support would have to build a custom kernel to explicitly disable SSE anyway. MFC after: 1 month	2010-03-19 01:16:53 +00:00
John Baldwin	a311ca2f45	- Extend the machine check record structure to include several fields useful for parsing model-specific and other fields in machine check events including the global machine check capabilities and status registers, CPU identification, and the FreeBSD CPU ID. - Report these added fields in the console log of a machine check so that a record structure can be reconstituted from the console messages. - Parse new architectural errors including memory controller errors. MFC after: 1 week	2010-03-16 16:01:19 +00:00
John Baldwin	c998036d71	Use unsigned long long constants for fields in 64-bit machine check registers instead of unsigned long constants. MFC after: 3 days	2010-03-16 15:27:58 +00:00
Ed Schouten	338f1debcd	Remove COMPAT_43TTY from stock kernel configuration files. COMPAT_43TTY enables the sgtty interface. Even though its exposure has only been removed in FreeBSD 8.0, it wasn't used by anything in the base system in FreeBSD 5.x (possibly even 4.x?). On those releases, if your ports/packages are less than two years old, they will prefer termios over sgtty.	2010-03-13 09:21:00 +00:00
John Baldwin	55c4e01602	Fix the previous attempt to fix kernel builds of HEAD on 7.x. Use the __gnu_inline__ attribute for PMAP_INLINE when using the 7.x compiler to match what 7.x uses for PMAP_INLINE.	2010-03-12 03:08:47 +00:00
John Baldwin	343803ad83	Print out the family and model from the cpu_id. This is especially useful given the advent of the extended family and extended model fields. The values are printed in hex to match their common usage in documentation. Submitted by: Alexander Best MFC after: 1 week	2010-03-11 14:17:37 +00:00
John Baldwin	cf684ede27	Make NKPT a kernel option on i386 so that it can be set to a non-default value from kernel config files. Tested by: Charles Sprickman spork of bway net MFC after: 2 weeks	2010-03-10 19:50:52 +00:00

12393 Commits