freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	94818d19c3	Move xrstor/xsave/xsetbv into fpu.c and reorder them. Requested by: bde MFC after: 1 month	2012-01-30 07:53:33 +00:00
Konstantin Belousov	a045432a58	Synchronize the struct sigcontext definitions on x86 with mcontext_t. Pointed out by: bde MFC after: 1 month	2012-01-30 07:51:52 +00:00
Kip Macy	263811f724	exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64 excluding other allocations including UMA now entails the addition of a single flag to kmem_alloc or uma zone create Reviewed by: alc, avg MFC after: 2 weeks	2012-01-27 20:18:31 +00:00
Konstantin Belousov	5be9d54a2b	Order newly added functions alphabetically. Requested by: bde MFC after: 3 days	2012-01-25 12:43:27 +00:00
David Schultz	2ee7b1d4ae	Add C11 macros describing subnormal numbers to float.h. Reviewed by: bde	2012-01-23 06:36:41 +00:00
Konstantin Belousov	8c6f8f3d5b	Add support for the extended FPU states on amd64, both for native 64bit and 32bit ABIs. As a side-effect, it enables AVX on capable CPUs. In particular: - Query the CPU support for XSAVE, list of the supported extensions and the required size of FPU save area. The hw.use_xsave tunable is provided for disabling XSAVE, and hw.xsave_mask may be used to select the enabled extensions. - Remove the FPU save area from PCB and dynamically allocate the (run-time sized) user save area on the top of the kernel stack, right above the PCB. Reorganize the thread0 PCB initialization to postpone it after BSP is queried for save area size. - The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as well. FPU state is only useful for suspend, where it is saved in dynamically allocated suspfpusave area. - Use XSAVE and XRSTOR to save/restore FPU state, if supported and enabled. - Define new mcontext_t flag _MC_HASFPXSTATE, indicating that mcontext_t has a valid pointer to out-of-struct extended FPU state. Signal handlers are supplied with stack-allocated fpu state. The sigreturn(2) and setcontext(2) syscall honour the flag, allowing the signal handlers to inspect and manipulate extended state in the interrupted context. - The getcontext(2) never returns extended state, since there is no place in the fixed-sized mcontext_t to place variable-sized save area. And, since mcontext_t is embedded into ucontext_t, makes it impossible to fix in a reasonable way. Instead of extending getcontext(2) syscall, provide a sysarch(2) facility to query extended FPU state. - Add ptrace(2) support for getting and setting extended state; while there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries. - Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to consumers, making it opaque. Internally, struct fpu_kern_ctx now contains a space for the extended state. Convert in-kernel consumers of fpu_kern KPI both on i386 and amd64.
First version of the support for AVX was submitted by Tim Bird <tim.bird am sony com> on behalf of Sony. This version was written from scratch. Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org> MFC after: 1 month	2012-01-21 17:45:27 +00:00
Konstantin Belousov	6db9cf559f	Add definitions for the FPU extended state header, legacy extended state and AVX state. MFC after: 1 week	2012-01-17 17:07:13 +00:00
Konstantin Belousov	e568229f50	Modernize the fpusave structures definitions by using uint*_t types. MFC after: 1 week	2012-01-17 16:53:41 +00:00
Konstantin Belousov	dd4f5d2437	Implement xsetbv(), xsave() and xrstor() providing C access to the similarly named CPU instructions. Since our in-tree binutils gas is not aware of the instructions, and I have to use the byte-sequence to encode them, hardcode the r/m operand as (%rdi). This way, first argument of the pseudo-function is already placed into proper register. MFC after: 1 week	2012-01-17 07:30:36 +00:00
Konstantin Belousov	79937651ef	Add definitions related to XCR0. MFC after: 1 week	2012-01-17 07:23:43 +00:00
Konstantin Belousov	5ba2a4998c	Add macro IS_BSP() to check whether the current CPU is BSP. MFC after: 1 week	2012-01-17 07:21:23 +00:00
Ulrich Spörlein	9a14aa017b	Convert files to UTF-8	2012-01-15 13:23:18 +00:00
Kenneth D. Merry	130f4520cb	Add the CAM Target Layer (CTL). CTL is a disk and processor device emulation subsystem originally written for Copan Systems under Linux starting in 2003. It has been shipping in Copan (now SGI) products since 2005. It was ported to FreeBSD in 2008, and thanks to an agreement between SGI (who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is available under a BSD-style license. The intent behind the agreement was that Spectra would work to get CTL into the FreeBSD tree. Some CTL features: - Disk and processor device emulation. - Tagged queueing - SCSI task attribute support (ordered, head of queue, simple tags) - SCSI implicit command ordering support. (e.g. if a read follows a mode select, the read will be blocked until the mode select completes.) - Full task management support (abort, LUN reset, target reset, etc.) - Support for multiple ports - Support for multiple simultaneous initiators - Support for multiple simultaneous backing stores - Persistent reservation support - Mode sense/select support - Error injection support - High Availability support (1) - All I/O handled in-kernel, no userland context switch overhead. (1) HA Support is just an API stub, and needs much more to be fully functional. ctl.c: The core of CTL. Command handlers and processing, character driver, and HA support are here. ctl.h: Basic function declarations and data structures. ctl_backend.c, ctl_backend.h: The basic CTL backend API. ctl_backend_block.c, ctl_backend_block.h: The block and file backend. This allows for using a disk or a file as the backing store for a LUN. Multiple threads are started to do I/O to the backing device, primarily because the VFS API requires that to get any concurrency. ctl_backend_ramdisk.c: A "fake" ramdisk backend. It only allocates a small amount of memory to act as a source and sink for reads and writes from an initiator. Therefore it cannot be used for any real data, but it can be used to test for throughput. 
It can also be used to test initiators' support for extremely large LUNs. ctl_cmd_table.c: This is a table with all 256 possible SCSI opcodes, and command handler functions defined for supported opcodes. ctl_debug.h: Debugging support. ctl_error.c, ctl_error.h: CTL-specific wrappers around the CAM sense building functions. ctl_frontend.c, ctl_frontend.h: These files define the basic CTL frontend port API. ctl_frontend_cam_sim.c: This is a CTL frontend port that is also a CAM SIM. This frontend allows for using CTL without any target-capable hardware. So any LUNs you create in CTL are visible in CAM via this port. ctl_frontend_internal.c, ctl_frontend_internal.h: This is a frontend port written for Copan to do some system-specific tasks that required sending commands into CTL from inside the kernel. This isn't entirely relevant to FreeBSD in general, but can perhaps be repurposed. ctl_ha.h: This is a stubbed-out High Availability API. Much more is needed for full HA support. See the comments in the header and the description of what is needed in the README.ctl.txt file for more details. ctl_io.h: This defines most of the core CTL I/O structures. union ctl_io is conceptually very similar to CAM's union ccb. ctl_ioctl.h: This defines all ioctls available through the CTL character device, and the data structures needed for those ioctls. ctl_mem_pool.c, ctl_mem_pool.h: Generic memory pool implementation used by the internal frontend. ctl_private.h: Private data structures (e.g. CTL softc) and function prototypes. This also includes the SCSI vendor and product names used by CTL. ctl_scsi_all.c, ctl_scsi_all.h: CTL wrappers around CAM sense printing functions. ctl_ser_table.c: Command serialization table. This defines what happens when one type of command is followed by another type of command. ctl_util.c, ctl_util.h: CTL utility functions, primarily designed to be used from userland. See ctladm for the primary consumer of these functions.
These include CDB building functions. scsi_ctl.c: CAM target peripheral driver and CTL frontend port. This is the path into CTL for commands from target-capable hardware/SIMs. README.ctl.txt: CTL code features, roadmap, to-do list. usr.sbin/Makefile: Add ctladm. ctladm/Makefile, ctladm/ctladm.8, ctladm/ctladm.c, ctladm/ctladm.h, ctladm/util.c: ctladm(8) is the CTL management utility. It fills a role similar to camcontrol(8). It allows configuring LUNs, issuing commands, injecting errors and various other control functions. usr.bin/Makefile: Add ctlstat. ctlstat/Makefile ctlstat/ctlstat.8, ctlstat/ctlstat.c: ctlstat(8) fills a role similar to iostat(8). It reports I/O statistics for CTL. sys/conf/files: Add CTL files. sys/conf/NOTES: Add device ctl. sys/cam/scsi_all.h: To conform to more recent specs, the inquiry CDB length field is now 2 bytes long. Add several mode page definitions for CTL. sys/cam/scsi_all.c: Handle the new 2 byte inquiry length. sys/dev/ciss/ciss.c, sys/dev/ata/atapi-cam.c, sys/cam/scsi/scsi_targ_bh.c, scsi_target/scsi_cmds.c, mlxcontrol/interface.c: Update for 2 byte inquiry length field. scsi_da.h: Add versions of the format and rigid disk pages that are in a more reasonable format for CTL. amd64/conf/GENERIC, i386/conf/GENERIC, ia64/conf/GENERIC, sparc64/conf/GENERIC: Add device ctl. i386/conf/PAE: The CTL frontend SIM at least does not compile cleanly on PAE. Sponsored by: Copan Systems, SGI and Spectra Logic MFC after: 1 month	2012-01-12 00:34:33 +00:00
Sean Bruno	80dbff4e99	IFC to head to catch up the bhyve branch Approved by: grehan@	2012-01-04 02:01:27 +00:00
Gavin Atkinson	c1cbd9ab53	Default to not performing the early-boot memory tests when we detect we are booting inside a VM. There are three reasons to disable this: o It causes the VM host to believe that all the tested pages of RAM are in use. This in turn may force the host to page out pages of RAM belonging to other VMs, or otherwise cause problems with fair resource sharing on the VM cluster. o It adds significant time to the boot process (around 1 second/Gig in testing) o It is unnecessary - the host should have already verified that the memory is functional etc. Note that this simply changes the default when in a VM - it can still be overridden using the hw.memtest.tests tunable. MFC after: 4 weeks	2011-12-31 13:24:53 +00:00
Robert Watson	009d2032af	Add "options CAPABILITY_MODE" and "options CAPABILITIES" to GENERIC kernel configurations for various architectures in FreeBSD 10.x. This allows basic Capsicum functionality to be used in the default FreeBSD configuration on non-embedded architectures; process descriptors are not yet enabled by default. MFC after: 3 months Sponsored by: Google, Inc	2011-12-29 22:48:36 +00:00
John Baldwin	4eda7b08af	Regen.	2011-12-29 15:35:47 +00:00
John Baldwin	dd01579cde	Implement linux_fadvise64() and linux_fadvise64_64() using kern_posix_fadvise(). Reviewed by: silence on emulation@ MFC after: 2 weeks	2011-12-29 15:34:59 +00:00
Xin LI	81966bce06	Import the first release of HighPoint RocketRAID 27xx SAS 6Gb/s HBA card driver. This driver works for FreeBSD/i386 and FreeBSD/amd64 platforms. Many thanks to HighPoint for providing this driver. MFC after: 2 weeks	2011-12-28 23:26:58 +00:00
Alan Cox	fe8b9971a8	Fix a bug in the Xen pmap's implementation of pmap_extract_and_hold(): If the page lock acquisition is retried, then the underlying thread is not unpinned. Wrap nearby lines that exceed 80 columns.	2011-12-28 19:59:54 +00:00
Peter Grehan	608f97c359	Add support for running as a nested hypervisor under VMWare Fusion, on systems with VT-x/EPT (e.g. Sandybridge Macbooks). This will most likely work on VMWare Workstation8/Player4 as well. See the VMWare app note at: http://communities.vmware.com/docs/DOC-8970 Fusion doesn't propagate the PAT MSR auto save-restore entry/exit control bits. Deal with this by noting that fact and setting up the PAT MSR to essentially be a no-op - it is init'd to power-on default, and a software shadow copy maintained. Since it is treated as a no-op, o/s settings are essentially ignored. This may not give correct results, but since the hypervisor is running nested, a number of bets are already off. On a quad-core/HT-enabled 'MacBook8,2', nested VMs with 1/2/4 vCPUs were fired up. The more nested vCPUs the worse the performance, unless the VMs were started up in multiplexed mode where things worked perfectly up to the limit of 8 vCPUs. Reviewed by: neel	2011-12-24 19:39:02 +00:00
Xin LI	25841e912f	Add comments in NOTES to say what viawd is.	2011-12-20 00:16:52 +00:00
Ed Schouten	53627e400f	Replace __signed by signed. The signed keyword is an integral part of the C syntax. There's no need to use __signed.	2011-12-13 13:38:03 +00:00
Fabien Thomas	61af1d1393	Add watchdog support for VIA south bridge chipset. Tested on VT8251 and VX900, but CX700, VX800 and VX855 should work. MFC after: 1 month Sponsored by: NETASQ	2011-12-12 09:50:33 +00:00
Philip Paeps	7ac6374d04	Limit building sfxge(4) in-kernel to amd64 for the time being. We can put it back after I fix the breakages on some of our more exotic platforms. While here, add the driver to the amd64 NOTES, so it can be picked up in LINT builds.	2011-11-28 18:51:40 +00:00
Marius Strobl	4b7ec27007	- There's no need to overwrite the default device method with the default one. Interestingly, these are actually the default for quite some time (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9) since r52045) but even recently added device drivers do this unnecessarily. Discussed with: jhb, marcel - While at it, use DEVMETHOD_END. Discussed with: jhb - Also while at it, use __FBSDID.	2011-11-22 21:28:20 +00:00
Peter Grehan	3ee1a36e2e	IFC @ r227804 Pull in the virtio drivers from head.	2011-11-22 02:27:59 +00:00
Lawrence Stewart	cf13a58510	- Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate() system calls to provide feed-forward clock management capabilities to userspace processes. ffclock_getcounter() returns the current value of the kernel's feed-forward clock counter. ffclock_getestimate() returns the current feed-forward clock parameter estimates and ffclock_setestimate() updates the feed-forward clock parameter estimates. - Document the syscalls in the ffclock.2 man page. - Regenerate the script-derived syscall related files. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. For more information, see http://www.synclab.org/radclock/ Submitted by: Julien Ridoux (jridoux at unimelb edu au)	2011-11-21 01:26:10 +00:00
Attilio Rao	686710f7ba	Revert part of the r227758 which crept in. Pointy hat: attilio X-MFC: r227758	2011-11-20 16:36:02 +00:00
Attilio Rao	ccdf233323	Introduce macro stubs in the mutex implementation that will be always defined and will allow consumers, willing to provide options, file and line to locking requests, to not worry about options redefining the interfaces. This is typically useful when there is the need to build another locking interface on top of the mutex one. The introduced functions that consumers can use are: - mtx_lock_flags_ - mtx_unlock_flags_ - mtx_lock_spin_flags_ - mtx_unlock_spin_flags_ - mtx_assert_ - thread_lock_flags_ Spare notes: - Likely we can get rid of all the 'INVARIANTS' specification in the ppbus code by using the same macro as done in this patch (but this is left to the ppbus maintainer) - all the other locking interfaces may require a similar cleanup, where the most notable case is sx which will allow a further cleanup of vm_map locking facilities - The patch should be fully compatible with older branches, thus an MFC is planned (in fact it uses all the underlying mechanisms already present). Comments review by: eadler, Ben Kaduk Discussed with: kib, jhb MFC after: 1 month	2011-11-20 16:33:09 +00:00
Ed Schouten	3d402cb52e	Regenerate system call tables.	2011-11-19 07:20:20 +00:00
Ed Schouten	767a32641c	Make the Linux *at() calls a bit more complete. Properly support: - AT_EACCESS for faccessat(), - AT_SYMLINK_FOLLOW for linkat().	2011-11-19 07:19:37 +00:00
Ed Schouten	51cfb9474f	Regenerate system call tables.	2011-11-19 06:36:11 +00:00
Ed Schouten	d3a993d46b	Improve access() parameter name consistency. The current code mixes the use of `flags' and `mode'. This is a bit confusing, since the faccessat() function has a `flag' parameter to store the AT_ flag. Make this less confusing by using the same name as used in the POSIX specification -- `amode'.	2011-11-19 06:35:15 +00:00
David Chisnall	38d1ac34ff	Fix SIGATOMIC_M{IN,AX} on x86-64. These are meant to be the minimum values that are allowed in a sig_atomic_t, but it looks like they were just copied from the x86 versions, so these definitions violate the C and C++ specs. Mismatch was spotted by the libc++ test suite. Approved by: dim (mentor)	2011-11-12 20:16:06 +00:00
Konstantin Belousov	63b7742fbb	Weaken the part of assertions added in the r227394. Only check that the process state is stopped. MFC after: 1 week	2011-11-11 04:10:36 +00:00
Ryan Stone	493b584dbd	Correct the types of the arguments to return probes of the syscall provider. Previously we were erroneously supplying the argument types of the corresponding entry probe. Reviewed by: rpaulo MFC after: 1 week	2011-11-11 03:49:42 +00:00
Konstantin Belousov	e9862e9b9e	Attempt to improve formatting and content of several comments for amd64 and i386 MD code. Based on suggestions by: bde MFC after: 1 week	2011-11-09 18:25:50 +00:00
Konstantin Belousov	2bb663c043	Stopped process may legitimately have some threads sleeping and not suspended, if the sleep is uninterruptible. Reported and tested by: pho MFC after: 1 week	2011-11-09 17:25:43 +00:00
Attilio Rao	ed1f6dc235	Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on all the architectures. The option allows mounting non-MPSAFE filesystems. Without it, the kernel will refuse to mount a non-MPSAFE filesystem. This patch is part of the effort of killing non-MPSAFE filesystems from the tree. No MFC is expected for this patch. Tested by: gianni Reviewed by: kib	2011-11-08 10:18:07 +00:00
Kevin Lo	966d0ed18f	Enable PCI MMC/SD support by default on i386 and amd64	2011-11-08 08:29:05 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ryan Stone	166808c625	Fix the DTrace pid return trap interrupt vector. Previously we were using 31, but that vector is reserved. Without this fix, running dtrace -p <pid> would either cause the target process to crash or the kernel to page fault. Obtained from: rpaulo MFC after: 3 days	2011-11-07 01:53:25 +00:00
Marius Strobl	a9ab459b31	Add a PCI front-end to esp(4) allowing it to support AMD Am53C974 and replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel configuration files. Besides duplicating functionality, amd(4), which previously also supported the AMD Am53C974, unlike esp(4) is no longer maintained and has accumulated enough bit rot over time to always cause a panic during boot as long as at least one target is attached to it (see PR 124667). PR: 124667 Obtained from: NetBSD (based on) MFC after: 3 days	2011-11-01 21:26:57 +00:00
Marcel Moolenaar	b2f1a8f2b3	Revert rev. 226893: subr_syscall.c is being included from C files and on amd64 with FREEBSD32 enabled, this means that systrace_probe_func gets defined twice.	2011-10-30 02:19:39 +00:00
Marcel Moolenaar	056f0ec755	Define systrace_probe_func in subr_syscall.c where it's used, instead of defining it in MD code. This eliminates porting to other architectures.	2011-10-29 01:26:36 +00:00
Alan Cox	703dec68bf	Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to vm_page_alloc(). While I'm here, for the sake of consistency, always specify the allocation class, such as VM_ALLOC_NORMAL, as the first of the flags.	2011-10-27 16:39:17 +00:00
Ken Smith	6168545a11	Adjust the debugger options slightly. This should help me do the right thing when changing the debugging options as part of head becoming a new stable branch. It may also help people who for one reason or another want to run head but don't want it slowed down by the debugging support. Reviewed by: kib	2011-10-27 13:07:49 +00:00
Peter Grehan	70d8f36aa4	IFC @ r226824	2011-10-27 04:56:53 +00:00
David Schultz	a50079b7ff	People porting FreeBSD to new architectures ought not have to implement a deprecated FPU control interface in addition to the standard one. To make this clearer, further deprecate ieeefp.h by not declaring the function prototypes except on architectures that implement them already. Currently i386 and amd64 implement the ieeefp.h interface for compatibility, and for fp[gs]etprec(), which doesn't exist on most other hardware. Powerpc, sparc64, and ia64 partially implement it and probably shouldn't, and other architectures don't implement it at all.	2011-10-21 06:41:46 +00:00
Ken Smith	7042aba738	Add a warning about why sbp(4) is commented out so that curious folks are forewarned they might wind up with a hole in their foot if they decide to give it a try. Suggested by: dougb	2011-10-19 21:55:20 +00:00
Ken Smith	4c0ba9b742	Comment out the sbp(4) driver for architectures that support it. As part of the 8.0-RELEASE cycle this was done in stable/8 (r199112) but was left alone in head so people could work on fixing an issue that caused boot failure on some motherboards. Apparently nobody has worked on it and we are getting reports of boot failure with the 9.0 test builds. So this time I'll comment out the driver in head (still hoping someone will work on it) and MFC to stable/9. Submitted by: Alberto Villa <avilla at FreeBSD dot org>	2011-10-18 13:45:16 +00:00
Dag-Erling Smørgrav	a417d4a46b	Trace attempts to call restricted MD syscalls.	2011-10-18 07:39:27 +00:00
Konstantin Belousov	6bfe4c78c8	Remove unused define. MFC after: 1 month	2011-10-07 16:09:44 +00:00
Xin LI	db1fda10b4	Add the 9750 SATA+SAS 6Gb/s RAID controller card driver, tws(4). Many thanks for their continued support of FreeBSD. This is version 10.80.00.003 from codeset 10.2.1 [1] Obtained from: LSI http://kb.lsi.com/Download16574.aspx [1]	2011-10-04 21:40:25 +00:00
Konstantin Belousov	c06f5f6cea	Do not allow the kernel to access usermode pages without installed fault handler. Panic immediately in such situation, on i386 and amd64. Reviewed by: avg, jhb MFC after: 1 week	2011-10-03 17:01:31 +00:00
Attilio Rao	8d79dfca55	Add some improvements in the idle table callbacks: - Replace instances of manual assembly instruction "hlt" call with halt() function calling. - In cpu_idle_mwait() avoid races in check to sched_runnable() using the same pattern used in cpu_idle_hlt() with the 'hlt' instruction. - Add comments explaining the logic behind the pattern used in cpu_idle_hlt() and other idle callbacks. In collaboration with: jhb, mav Reviewed by: adri, kib MFC after: 3 weeks	2011-10-03 14:23:00 +00:00
Neel Natu	40f31c0d0a	Kernel configuration for a bhyve guest.	2011-09-26 07:05:40 +00:00
Kip Macy	9eca9361f9	Auto-generated code from sys_ prefixing makesyscalls.sh change Approved by: re(bz)	2011-09-16 14:04:14 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Peter Grehan	fab4c373af	IFC @ r225592 sys/dev/bvm/bvm_console.c - move up to the new alt-break order.	2011-09-15 22:14:35 +00:00
Konstantin Belousov	20aee906b4	Put amd64_syscall() prototype in md_var.h. Requested by: jhb Reviewed by: alc, jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-15 09:54:07 +00:00
Konstantin Belousov	7a1c55c380	Microoptimize the return path for the fast syscalls on amd64. Arrange the code to have the fall-through path to follow the likely target. Do not use intermediate register to reload user %rsp. Proposed by: alc Reviewed by: alc, jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-15 09:53:04 +00:00
Konstantin Belousov	e4505da615	The jump target shall be after the padding, not into it. Reported by: alc Approved by: re (bz) MFC after: 2 weeks	2011-09-11 18:00:46 +00:00
Christian Brueffer	b48f7c4c8d	Fix a zyd(4) comment typo that was copy+pasted into most kernel config files. PR: 160276 Submitted by: MATSUMIYA Ryo <matsumiya@mma.club.uec.ac.jp> Approved by: re (kib) MFC after: 1 week	2011-09-11 17:39:51 +00:00
Konstantin Belousov	8bd1142b52	Perform amd64-specific microoptimizations for native syscall entry sequence. The effect is ~1% on the microbenchmark. In particular, do not restore registers which are preserved by the C calling sequence. Align the jump target. Avoid unneeded memory accesses by calculating some data in syscall entry trampoline. Reviewed by: jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-11 16:08:10 +00:00
Konstantin Belousov	26ccf4f10f	Inline the syscallenter() and syscallret(). This reduces the time measured by the syscall entry speed microbenchmarks by ~10% on amd64. Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-11 16:05:09 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
John Baldwin	3a3ba1b069	Enable the puc(4) driver on amd64 and i386 in GENERIC. This allows devices supported by puc(4) to work "out of the box" since puc.ko does not work "out of the box". Reviewed by: marcel Approved by: re (kib) MFC after: 1 week	2011-08-26 21:22:34 +00:00
John Baldwin	cee0b197de	Make NKPT a kernel option on amd64 so that it can be set to a non-default value from kernel config files. Reviewed by: alc Approved by: re (kib) MFC after: 1 week	2011-08-26 17:08:22 +00:00
Bjoern A. Zeeb	61bc18a327	In HEAD, when doing no further checks, there is no reason to use the temporary variable and check with if, as TUNABLE_*_FETCH do not alter values unless the tunable is successfully found. Reported by: jhb, bde MFC after: 3 days X-MFC with: r224516 Approved by: re (kib)	2011-08-20 19:21:46 +00:00
Robert Watson	a9d2f8d84f	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
Konstantin Belousov	d98d0ce27a	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)	2011-08-09 21:01:36 +00:00
Rick Macklem	88c037e26a	Change all the sample kernel configurations to use NFSCL, NFSD instead of NFSCLIENT, NFSSERVER since NFSCL and NFSD are now the defaults. The client change is needed for diskless configurations, so that the root mount works for fstype nfs. Reported by seanbru at yahoo-inc.com for i386/XEN. Approved by: re (hrs)	2011-08-07 20:16:46 +00:00
Bjoern A. Zeeb	0a5f264d60	Introduce a tunable to disable the time consuming parts of bootup memtesting, which can easily save seconds to minutes of boot time. The tunable name is kept general to allow reusing the code in alternate frameworks. Requested by: many Discussed on: arch (a while ago) Obtained from: Sandvine Incorporated Reviewed by: sbruno Approved by: re (kib) MFC after: 2 weeks	2011-07-30 13:33:05 +00:00
Attilio Rao	786ef92b7b	Bump MAXCPU for amd64, ia64 and XLP mips appropriately. From now on, default values for FreeBSD will be 64 maximum supported CPUs on amd64 and ia64 and 128 for XLP. All the other architectures seem already capped appropriately (with the exception of sparc64 which needs further support on jalapeno flavour). Bump __FreeBSD_version in order to reflect KBI/KPI breakage introduced during the infrastructure cleanup for supporting MAXCPU > 32. This covers the cpumask_t retirement too. The switch is considered completed at the present time, so please report immediately any bug you experience that can be traced to that area. Requested by: marcel, jchandra Tested by: pluknet, sbruno Approved by: re (kib)	2011-07-19 13:00:30 +00:00
Attilio Rao	68b739cd6f	Add the possibility to specify the MAXCPU value from kernel configs. This patch is going to help in cases like mips flavours where you want more granular control over MAXCPU. No MFC is planned for this patch. Tested by: pluknet Approved by: re (kib)	2011-07-19 00:37:24 +00:00
Peter Grehan	bd2228ab3e	IFC @ r224187	2011-07-18 22:00:21 +00:00
Attilio Rao	521ea19d1c	- Remove the eintrcnt/eintrnames usage and introduce the concept of sintrcnt/sintrnames which are symbols containing the size of the 2 tables. - For amd64/i386 remove the storage of intr* stuff from assembly files. This area can be widely improved by applying the same to other architectures and likely finding a unified approach among them and moving the whole code to be MI. More work in this area is expected to happen fairly soon. No MFC is planned for this patch. Tested by: pluknet Reviewed by: jhb Approved by: re (kib)	2011-07-18 15:19:40 +00:00
Neel Natu	14ddf164ba	Get rid of redundant initialization of 'dmask'. It was being re-initialized shortly afterwards.	2011-07-06 21:40:48 +00:00
Jung-uk Kim	f0b28f005e	Correct cpu_monitor() and cpu_mwait() for amd64. These instructions take %rcx as "extensions" in long mode. If any unused bit is set in %rcx, these instructions cause general protection fault. Fix style nits and synchronize i386 with amd64.	2011-07-05 18:42:10 +00:00
Attilio Rao	470107b2f1	MFC	2011-07-04 11:13:00 +00:00
Alan Cox	80788b2a27	When iterating over a paging queue, explicitly check for PG_MARKER, instead of relying on zeroed memory being interpreted as an empty PV list. Reviewed by: kib	2011-07-02 23:42:04 +00:00
Peter Grehan	23300944db	IFC @ r223696 to pick up dfr's userboot	2011-06-30 17:37:42 +00:00
Jonathan Anderson	12bc222e57	Add some checks to ensure that Capsicum is behaving correctly, and add some more explicit comments about what's going on and what future maintainers need to do when e.g. adding a new operation to a sys_machdep.c. Approved by: mentor(rwatson), re(bz)	2011-06-30 10:56:02 +00:00
Attilio Rao	7b744f6b01	MFC	2011-06-30 10:19:43 +00:00
Alan Cox	6bbee8e28a	Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib	2011-06-29 16:40:41 +00:00
Jonathan Anderson	24c1c3bf71	We may split today's CAPABILITIES into CAPABILITY_MODE (which has to do with global namespaces) and CAPABILITIES (which has to do with constraining file descriptors). Just in case, and because it's a better name anyway, let's move CAPABILITIES out of the way. Also, change opt_capabilities.h to opt_capsicum.h; for now, this will only hold CAPABILITY_MODE, but it will probably also hold the new CAPABILITIES (implying constrained file descriptors) in the future. Approved by: rwatson Sponsored by: Google UK Ltd	2011-06-29 13:03:05 +00:00
Peter Grehan	a5615c9044	IFC @ r222830	2011-06-28 06:26:03 +00:00
Attilio Rao	6b6603b30e	Remove the pc_cpumask usage from amd64. Reviewed by: alc Tested by: pluknet	2011-06-26 21:36:53 +00:00
Attilio Rao	de138ec703	MFC	2011-06-24 16:35:40 +00:00
John Baldwin	1368987ae4	Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to the x86 tree. The $PIR code is still only enabled on i386 and not amd64. While here, make the qpi(4) driver on conditional on 'device pci'.	2011-06-22 21:04:13 +00:00
Attilio Rao	9b571ec6b3	MFC	2011-06-22 19:42:32 +00:00
John Baldwin	e8f40e32eb	Oops, missed these in 223424. Reported by: jkim	2011-06-22 18:48:07 +00:00
John Baldwin	3bf59bd14f	Use uintXX_t instead of u_intXX_t.	2011-06-22 17:55:16 +00:00
John Baldwin	38d7a61ba4	Add a helper routine to conditionally modify the start address of a resource allocation from an x86 Host-PCI bridge driver so that it can be reused by the ACPI Host-PCI bridge driver (and eventually the MPTable Host-PCI bridge driver) instead of duplicating the same logic. Note that this means that hw.acpi.host_mem_start is now replaced with the hw.pci.host_mem_start tunable that was already used in the non-ACPI case. This also removes hw.acpi.host_mem_start on ia64 where it was not applicable (the implementation was very x86-specific). While here, adjust the logic to apply the new start address on any "wildcard" allocation even if that allocation comes from a subset of the allowable address range. Reviewed by: imp (1)	2011-06-22 16:15:15 +00:00
Attilio Rao	2e4cefa632	Remove the usage of pc_other_cpus from amd64. Tested by: pluknet	2011-06-21 09:19:38 +00:00
Konstantin Belousov	1c23d0f727	Fix vfork. Add comments.	2011-06-18 12:13:28 +00:00
Hans Petter Selasky	144b716627	Enable USB 3.0 support by default in i386 and amd64 GENERIC kernels. Discussed with: joel @ and thompsa @ MFC after: 7 days	2011-06-14 20:30:49 +00:00
Joel Dahl	701b698b6f	Enable sound support by default on i386 and amd64. The generic sound driver has been added, along with enough device-specific drivers to support the most common audio chipsets. We've discussed enabling it from time to time over the years and we've received numerous requests from users, so we decided that shipping 9.0 with working audio by default would be the best thing to do. Bug reports should be sent to the multimedia@ mailing list, as usual. Approved by: mav No objection: re	2011-06-11 09:08:46 +00:00
John Baldwin	049dc0d1ff	Implement BUS_ADJUST_RESOURCE() for the x86 drivers that sit between the Host-PCI bridge drivers and nexus.	2011-06-10 12:30:16 +00:00
Andriy Gapon	234dab4a82	remove code for dynamic offlining/onlining of CPUs on x86 The code has definitely been broken for SCHED_ULE, which is a default scheduler. It may have been broken for SCHED_4BSD in more subtle ways, e.g. with manually configured CPU affinities and for interrupt delivery purposes. We still provide a way to disable individual CPUs or all hyperthreading "twin" CPUs before SMP startup. See the UPDATING entry for details. Interaction between building CPU topology and disabling CPUs still remains fuzzy: topology is first built using all available CPUs and then the disabled CPUs should be "subtracted" from it. That doesn't work well if the resulting topology becomes non-uniform. This work is done in cooperation with Attilio Rao who in addition to reviewing also provided parts of code. PR: kern/145385 Discussed with: gcooper, ambrisko, mdf, sbruno Reviewed by: attilio Tested by: pho, pluknet X-MFC after: never	2011-06-08 08:12:15 +00:00
Attilio Rao	74e4245e3f	Bring back the number of CPU to 32.	2011-06-07 08:05:23 +00:00
Attilio Rao	81c02539f1	MFC	2011-06-06 21:38:39 +00:00
Andriy Gapon	ecee337a8c	don't use cpuid level 4 in x86 cpu topology detection if it's not supported This regression was introduced in r213323. There are probably no Intel cpus that support amd64 mode, but do not support cpuid level 4, but it's better to keep i386 and amd64 versions of this code in sync. Discovered by: pho Tested by: pho MFC after: 2 weeks	2011-06-06 14:23:13 +00:00
John Baldwin	8b28761278	Some tweaks to the CPUID support: - Don't always pass the cpuid request to the current CPU as some nodes we will emulate purely in software. - Pass in the APIC ID of the virtual CPU so we can return the proper APIC ID. - Always report a completely flat topology with no SMT or multicore. - Report the CPUID2_HV feature and implement support for the 0x40000000 CPUID level. - Use existing constants from <machine/specialreg.h> when possible and use cpu_feature2 when checking for VMX support.	2011-06-02 14:04:07 +00:00
John Baldwin	b3996dd47c	Add a 'show vmcs' DDB command to dump state about the current CPU's current VMCS.	2011-06-02 13:49:19 +00:00
Kevin Lo	a92e80be3f	Bring back r222275. runfw(4) will statically link in rt2870.fw.uu to the kernel, though I have MODULES_OVERRIDE="" in GENERIC. Spotted by: thompsa	2011-05-25 10:04:13 +00:00
Kevin Lo	6d5ee6cd7f	run(4) needs firmware loaded to work	2011-05-25 04:46:48 +00:00
Peter Grehan	87c3644c64	IFC @ r222256	2011-05-24 15:39:34 +00:00
Attilio Rao	d955f0fccf	Revert a patch that involuntarily sneaked in while I was MFCing.	2011-05-23 23:51:01 +00:00
Attilio Rao	a9ff18a210	MFC	2011-05-23 01:17:30 +00:00
Neel Natu	ad54f37429	Fix a long standing bug in VMXCTX_GUEST_RESTORE(). There was an assumption by the "callers" of this macro that on "return" the %rsp will be pointing to the 'vmxctx'. The macro was not doing this and thus when trying to restore host state on an error from "vmlaunch" or "vmresume" we were treating the memory locations on the host stack as 'struct vmxctx'. This led to all sorts of weird bugs like double faults or invalid instruction faults. This bug is exposed by the -O2 option used to compile the kernel module. With the -O2 flag the compiler will optimize the following piece of code: int loopstart = 1; ... if (loopstart) { loopstart = 0; vmx_launch(); } else vmx_resume(); into this: vmx_launch(); Since vmx_launch() and vmx_resume() are declared to be __dead2 functions the compiler is free to do this. The compiler has no way to know that the functions return indirectly through vmx_setjmp(). This optimization in turn leads us to trigger the bug in VMXCTX_GUEST_RESTORE(). With this change we can boot a 8.1 guest on a 9.0 host. Reported by: jhb@	2011-05-20 03:23:09 +00:00
Neel Natu	3caf3beb5c	Avoid unnecessary sign extension when promoted to a 64-bit integer. This was benign because the interruption info field is a 32-bit quantity and the hardware guarantees that the upper 32-bits are all zeros. But it did make reading the objdump output very confusing.	2011-05-20 02:08:05 +00:00
Peter Grehan	1f3025e133	Changes to allow the GENERIC+bhyve kernel built from this branch to run as a 1/2 CPU guest on an 8.1 bhyve host. bhyve/inout.c inout.h fbsdrun.c - Rather than exiting on accesses to unhandled i/o ports, emulate hardware by returning -1 on reads and ignoring writes to unhandled ports. Support the previous mode by allowing a 'strict' parameter to be set from the command line. The 8.1 guest kernel was vastly cut down from GENERIC and had no ISA devices. Booting GENERIC exposes a massive amount of random touching of i/o ports (hello syscons/vga/atkbdc). bhyve/consport.c dev/bvm/bvm_console.c - implement a simplistic signature for the bvm console by returning 'bv' for an inw on the port. Also, set the priority of the console to CN_REMOTE if the signature was returned. This works better in an environment where multiple consoles are in the kernel (hello syscons) bhyve/rtc.c - return 0 for the access to RTC_EQUIPMENT (yes, you syscons) amd64/vmm/x86.c x86.h - hide a bunch more CPUID leaf 1 bits from the guest to prevent cpufreq drivers from probing. The next step will be to move CPUID handling completely into user-space. This will allow the full spectrum of changes from presenting a lowest-common-denominator CPU type/feature set, to exposing (almost) everything that the host can support. Reviewed by: neel Obtained from: NetApp	2011-05-19 21:53:25 +00:00
Attilio Rao	5f6b159db7	MFC	2011-05-18 16:01:29 +00:00
Jung-uk Kim	2b052e43be	Update CPUID bits to reflect AMD Bulldozer and Intel Sandy Bridge features. Note AMD dropped SSE5 extensions in order to avoid ISA overlap with Intel AVX instructions. The SSE5 bit was recycled as XOP extended instruction bit, CVT16 was deprecated in favor of F16C (half-precision float conversion instructions for AVX), and the remaining FMA4 (4-operand FMA instructions) gained a separate CPUID bit. Replace non-existent references with today's CPUID specifications.	2011-05-17 22:36:16 +00:00
John Baldwin	e22b232b0e	Enable handling of 1GB pages in the direct map since HEAD supports those. Submitted by: neel	2011-05-15 02:09:12 +00:00
John Baldwin	34a6b2d627	First cut at porting the kernel portions of 221828 and 221905 from the BHyVe reference branch to HEAD.	2011-05-14 20:35:01 +00:00
Peter Grehan	6c4c7d0f96	bhyve import part 2 of 2, guest kernel changes. This branch is now considered frozen: future bhyve development will take place in a branch off -CURRENT. sys/dev/bvm/bvm_console.c sys/dev/bvm/bvm_dbg.c - simple console driver/gdb debug port used for bringup. supported by user-space bhyve executable sys/conf/options.amd64 sys/amd64/amd64/minidump_machdep.c - allow NKPT to be set in the kernel config file sys/amd64/conf/GENERIC - mptable config options; bhyve user-space executable creates an mptable with number of CPUs, and optional vendor extension - add bvm console/debug - set NKPT to 512 to allow loading of large RAM disks from the loader - include kdb/gdb sys/amd64/amd64/local_apic.c sys/amd64/amd64/apic_vector.S sys/amd64/include/specialreg.h - if x2apic mode available, use MSRs to access the local APIC, otherwise fall back to 'classic' MMIO mode sys/amd64/amd64/mp_machdep.c - support AP spinup on CPU models that don't have real-mode support by overwriting the real-mode page with a message that supplies the bhyve user-space executable with enough information to start the AP directly in 64-bit mode. sys/amd64/amd64/vm_machdep.c - insert pause statements into cpu shutdown busy-wait loops sys/dev/blackhole/blackhole.c sys/modules/blackhole/Makefile - boot-time loadable module that claims all PCI bus/slot/funcs specified in an env var that are to be used for PCI passthrough sys/amd64/amd64/intr_machdep.c - allow round-robin assignment of device interrupts to CPUs to be disabled from the loader sys/amd64/include/bus.h - convert string ins/outs instructions to loops of individual in/out since bhyve doesn't support these yet sys/kern/subr_bus.c - if the device was not created with a fixed devclass, then remove its association with the devclass it was associated with during probe. Otherwise, new drivers do not get a chance to probe/attach since the device will stay married to the first driver that it probed successfully but failed to attach. 
Sponsored by: NetApp, Inc.	2011-05-14 18:37:24 +00:00
Attilio Rao	b2aa562e7b	MFC	2011-05-13 20:58:48 +00:00
Matthew D Fleming	cfb00e5aa7	Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk. Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware. Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment). Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit. Requested by: alc MFC after: 1 week MFC with: r221853	2011-05-13 19:35:01 +00:00
Peter Grehan	366f60834f	Import of bhyve hypervisor and utilities, part 1. vmm.ko - kernel module for VT-x, VT-d and hypervisor control bhyve - user-space sequencer and i/o emulation vmmctl - dump of hypervisor register state libvmm - front-end to vmm.ko chardev interface bhyve was designed and implemented by Neel Natu. Thanks to the following folk from NetApp who helped to make this available: Joe CaraDonna Peter Snyder Jeff Heller Sandeep Mann Steve Miller Brian Pawlowski	2011-05-13 04:54:01 +00:00
Attilio Rao	ef607a6aa3	MFC	2011-05-12 14:01:40 +00:00
Dmitry Chagin	98cde5eede	Remove wrong comment. MFC after: 1 week.	2011-05-11 17:57:15 +00:00
Jung-uk Kim	00c885e181	Add SC_PIXEL_MODE to GENERIC for amd64 and i386. Requested by: many	2011-05-10 16:44:16 +00:00
Attilio Rao	bd55ede060	MFC	2011-05-09 18:53:13 +00:00
Jung-uk Kim	65e7d70b09	Implement boot-time TSC synchronization test for SMP. This test is executed when the user has indicated that the system has synchronized TSCs or it has P-state invariant TSCs. For the former case, we may clear the tunable if it fails the test to prevent accidental foot-shooting. For the latter case, we may set it if it passes the test to notify the user that it may be usable.	2011-05-09 17:34:00 +00:00
Attilio Rao	aa8b9e0706	MFC	2011-05-06 22:45:33 +00:00
Andriy Gapon	fdf30d59a6	prepare code that does topology detection for amd cpus for bulldozer This also introduces a new detection path for family 10h and newer pre-bulldozer cpus, pre-10h hardware should not be affected. Tested by: Gary Jennejohn <gljennjohn@googlemail.com> (with pre-10h hardware) MFC after: 2 weeks	2011-05-06 13:51:54 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t, on the other hand, is implemented as an array of longs, and easily extensible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Attilio Rao	8c0ef2464e	Revert md_assert_preempt() introduction. Discussed with: jeff, jhb	2011-05-04 20:29:40 +00:00
Attilio Rao	94ebcddde3	MFC	2011-05-03 18:57:46 +00:00
John Baldwin	6162795be0	Enable the new PCI-PCI bridge driver on amd64 and i386 by default. It can be disabled via 'nooptions NEW_PCIB'.	2011-05-03 18:23:11 +00:00
John Baldwin	83c41143ca	Reimplement how PCI-PCI bridges manage their I/O windows. Previously the driver would verify that requests for child devices were confined to any existing I/O windows, but the driver relied on the firmware to initialize the windows and would never grow the windows for new requests. Now the driver actively manages the I/O windows. This is implemented by allocating a bus resource for each I/O window from the parent PCI bus and suballocating that resource to child devices. The suballocations are managed by creating an rman for each I/O window. The suballocated resources are mapped by passing the bus_activate_resource() call up to the parent PCI bus. Windows are grown when needed by using bus_adjust_resource() to adjust the resource allocated from the parent PCI bus. If the adjust request succeeds, the window is adjusted and the suballocation request for the child device is retried. When growing a window, the rman_first_free_region() and rman_last_free_region() routines are used to determine if the front or end of the existing I/O window is free. From using that, the smallest ranges that need to be added to either the front or back of the window are computed. The driver will first try to grow the window in whichever direction requires the smallest growth first followed by the other direction if that fails. Subtractive bridges will first attempt to satisfy requests for child resources from I/O windows (including attempts to grow the windows). If that fails, the request is passed up to the parent PCI bus directly however. The PCI-PCI bridge driver will try to use firmware-assigned ranges for child BARs first and only allocate a "fresh" range if that specific range cannot be accommodated in the I/O window. This allows systems where the firmware assigns resources during boot but later wipes the I/O windows (some ACPI BIOSen are known to do this) to "rediscover" the original I/O window ranges. 
The ACPI Host-PCI bridge driver has been adjusted to correctly honor hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge makes a wildcard request for an I/O window range. The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option is enabled. This is a transition aid to allow platforms that do not yet support bus_activate_resource() and bus_adjust_resource() in their Host-PCI bridge drivers (and possibly other drivers as needed) to use the old driver for now. Once all platforms support the new driver, the kernel option and old driver will be removed. PR: kern/143874 kern/149306 Tested by: mav	2011-05-03 17:37:24 +00:00
Attilio Rao	7be8a2de4f	MFC @ r221324	2011-05-02 14:23:36 +00:00
John Baldwin	d2c9344ff9	Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver, generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge drivers.	2011-05-02 14:13:12 +00:00
Bernhard Schmidt	d1f25d5dcb	Add the remaining wireless drivers. Discussed with: joel	2011-05-01 13:26:34 +00:00
Attilio Rao	f1edea81ac	Add the function md_assert_nopreempt(), which is a very consistent function on the possibility of a thread to not preempt. As this function is very tied to x86 (interrupts disabled checkings) it is not intended to be used in MI code.	2011-04-30 23:12:37 +00:00
Kevin Lo	5aaea65247	Add urtw(4)	2011-04-29 06:36:39 +00:00
Jung-uk Kim	c34e9dbee1	Define "Hypervisor Present" bit. This bit is used by several hypervisors to identify CPUs running under emulation. Currently QEMU-KVM, Xen-HVM, VMware, and MS Hyper-V are known to set this bit. MFC after: 3 days	2011-04-28 22:23:39 +00:00
Attilio Rao	2be767e069	Add the watchdogs patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let watchdog fire, if they take too long. I implemented the stubs this way because I really want wdog_kern_* KPI to not be dependent on SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also avoid to call them when not necessary (avoiding involuntary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks	2011-04-28 16:02:05 +00:00
Rick Macklem	4309e17add	This patch changes head so that the default NFS client is now the new NFS client (which I guess is no longer experimental). The fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs". Although mounts via fstype "nfs" will usually work without userland changes, an updated mount_nfs(8) binary is needed for kernels built with "options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and mount(8) binaries are needed to do mounts for fstype "oldnfs". The GENERIC kernel configs have been changed to use options NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER. For kernels being used on diskless NFS root systems, "options NFSCL" must be in the kernel config. Discussed on freebsd-fs@.	2011-04-27 17:51:51 +00:00
Alexander Motin	0d307e0905	- Add shim to simplify migration to the CAM-based ATA. For each new adaX device in /dev/ create symbolic link with adY name, trying to mimic old ATA numbering. Imitation is not complete, but should be enough in most cases to mount file systems without touching /etc/fstab. - To know what behavior to mimic, restore ATA_STATIC_ID option in cases where it was present before. - Add some more details to UPDATING.	2011-04-26 17:01:49 +00:00
Maxim Sobolev	f30bc1f3b5	With the typical memory size of the system in the tens of gigabytes, counting memory being dumped in 16MB increments is somewhat silly. Especially if the dump fails and everything you've got for debugging is a screen filled with numbers in 16 decrements... Replace that with percentage-based progress with max 10 updates all fitting into one line. Collapse other very "useful" piece of crash information (total ram) into the same line to save some more space. MFC after: 1 week	2011-04-26 16:14:55 +00:00
Rick Macklem	7c208ed659	Fix the experimental NFS client so that it does not bogusly set the f_flags field of "struct statfs". This had the interesting effect of making the NFSv4 mounts "disappear" after r221014, since NFSMNT_NFSV4 and MNT_IGNORE became the same bit. Move the files used for a diskless NFS root from sys/nfsclient to sys/nfs in preparation for them to be used by both NFS clients. Also, move the declaration of the three global data structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c so that they are defined when either client uses them. Reviewed by: jhb MFC after: 2 weeks	2011-04-25 22:22:51 +00:00
Alexander Motin	97b53e3634	Switch the GENERIC kernels for all architectures to the new CAM-based ATA stack. It means that all legacy ATA drivers are disabled and replaced by respective CAM drivers. If you are using ATA device names in /etc/fstab or other places, make sure to update them respectively (adX -> adaY, acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential numbers for each type in order of detection, unless configured otherwise with tunables, see cam(4)). ataraid(4) functionality is now supported by the RAID GEOM class. To use it you can load geom_raid kernel module and use graid(8) tool for management. Instead of /dev/arX device names, use /dev/raid/rX.	2011-04-24 08:58:58 +00:00
Konstantin Belousov	3136faa59d	Make pmap_invalidate_cache_range() available for consumption on amd64. Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU cache for the set of pages, which are not necessarily mapped. Since its supposed use is to prepare the move of the pages ownership to a device that does not snoop all CPU accesses to the main memory (read GPU in GMCH), do not rely on CPU self-snoop feature. amd64 implementation takes advantage of the direct map. On i386, extract the helper pmap_flush_page() from pmap_page_set_memattr(), and use it to make a temporary mapping of the flushed page. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2011-04-18 21:24:42 +00:00
Jung-uk Kim	0e72764232	Add a function rdtsc32() to read lower 32 bits from TSC and discard upper 32 bits. Sometimes the compiler inserts unnecessary instructions to preserve unused upper 32 bits even when it is cast to a 32-bit value. It reduces such compiler mistakes where every cycle counts.	2011-04-14 16:53:32 +00:00
Jung-uk Kim	4854ae249c	Consistently use __volatile as the rest of this file.	2011-04-14 16:19:41 +00:00
Jung-uk Kim	f5ac47f44c	Prefer C99 standard integers to reduce diff from i386 version.	2011-04-14 16:14:35 +00:00
Jung-uk Kim	a7817c7ae5	Reduce errors in effective frequency calculation.	2011-04-12 23:49:07 +00:00
Jung-uk Kim	b9e4376214	Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and MPERF MSRs are available. It was disabled in r216443. Remove the earlier hack to subtract 0.5% from the calibrated frequency as DELAY(9) is a little bit more reliable now.	2011-04-12 23:04:01 +00:00
Jung-uk Kim	dd3e254ebd	Add forgotten declarations for tsc_perf_stat from the previous commit.	2011-04-12 22:22:01 +00:00
Jung-uk Kim	155094d77a	Probe capability to find effective frequency. When the TSC is P-state invariant, APERF/MPERF ratio can be used to find effective frequency.	2011-04-12 22:15:46 +00:00
Jung-uk Kim	3731174954	Add definitions for CPUID instruction 6, ECX information.	2011-04-12 22:12:23 +00:00
Konstantin Belousov	2140d9c83b	Remove setting of PCB_FULL_IRET at the places where we are going to call update_gdt_{f,g}sbase. The functions set the flag when td == curthread, and sysarch is always called with curthread. Reviewed by: jhb, jkim MFC after: 1 week	2011-04-08 21:27:31 +00:00
Konstantin Belousov	5ab73cbcba	Disable local interrupts before testing the PCB_FULL_IRET flag. Thread might be preempted after testing, which causes the flag to be cleared. If ast was not delivered, we will do sysret with potentially wrong fs/gs bases. Reviewed by: jhb, jkim MFC after: 1 week (together with r220430, r220452)	2011-04-08 21:26:50 +00:00
Ryan Stone	7d6a0bf373	Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi and machdep.kdb_on_nmi. Approved by: emaste (mentor) MFC after: 1 week	2011-04-08 14:39:41 +00:00
John Baldwin	13fb631aff	Fix a bug in the previous change to restore the fast path for syscall return. The ast() function may cause a context switch in which case PCB_FULL_IRET would be set in the pcb. However, the code was not rechecking the flag after ast() returned and would not properly restore the FSBASE and GSBASE MSRs. To fix, recheck the PCB_FULL_IRET flag after ast() returns. While here, trim an instruction (and memory access) from the doreti path and fix a typo in a comment. MFC after: 1 week	2011-04-08 13:33:57 +00:00
John Baldwin	ff265077cf	Catch up to PCB_FULL_IRET becoming a pcb flag rather than a full field. MFC after: 3 days	2011-04-08 13:30:48 +00:00
Jung-uk Kim	3453537fa5	Use atomic load & store for TSC frequency. It may be overkill for amd64 but safer for i386 because it can be easily over 4 GHz now. Worse, it can be easily changed by user with 'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly). Note it is intentionally not used in performance critical paths to avoid performance regression (but we should, in theory). Alternatively, we may add "virtual TSC" with lower frequency if maximum frequency overflows 32 bits (and ignore possible incoherency as we do now).	2011-04-07 23:28:28 +00:00
John Baldwin	615d2dffa3	pcb_flags is an int, so use testl rather than testq. Pointy hat to: jhb Submitted by: jkim MFC after: 1 week	2011-04-07 23:13:22 +00:00
John Baldwin	1438b4ced1	If a system call does not request a full interrupt return, use a fast path via the sysretq instruction to return from the system call. This was removed in 190620 and not quite fully restored in 195486. This resolves most of the performance regression in system call microbenchmarks between 7 and 8 on amd64. Reviewed by: kib MFC after: 1 week	2011-04-07 21:32:25 +00:00
Jung-uk Kim	efd393d539	Remove stale checks for RDTSC support. amd64 must have TSC support anyway.	2011-04-07 21:29:34 +00:00
Konstantin Belousov	7332c129e0	Add support for executing the FreeBSD 1/i386 a.out binaries on amd64. In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64. The changes are hidden under COMPAT_43. MFC after: 1 month	2011-04-01 11:16:29 +00:00
Andriy Gapon	a930718af1	Revert r220032:linux compat: add SO_PASSCRED option with basic handling I have not properly thought through the commit. After r220031 (linux compat: improve and fix sendmsg/recvmsg compatibility) the basic handling for SO_PASSCRED is not sufficient as it breaks recvmsg functionality for SCM_CREDS messages because now we would need to handle sockcred data in addition to cmsgcred. And that is not implemented yet. Pointyhat to: avg	2011-03-31 08:14:51 +00:00
Adrian Chadd	dba9c85977	Break out the ath PCI logic into a separate device/module. Introduce the AHB glue for Atheros embedded systems. Right now it's hard-coded for the AR9130 chip whose support isn't yet in this HAL; it'll be added in a subsequent commit. Kernel configuration files now need both 'ath' and 'ath_pci' devices; both modules need to be loaded for the ath device to work.	2011-03-31 08:07:13 +00:00
Edward Tomasz Napierala	72bcfc9693	Revert part of r220137, committed by mistake - RACCT is _not_ supposed to be enabled in GENERIC.	2011-03-29 18:16:49 +00:00
Edward Tomasz Napierala	097055e26d	Add racct. It's an API to keep per-process, per-jail and per-loginclass resource accounting information, to be used by the new resource limits code. It's connected to the build, but the code that actually calls the new functions will come later. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-29 17:47:25 +00:00
Alan Cox	1c675a3bc3	The new binutils has correctly redefined MAXPAGESIZE on amd64 as 0x200000 instead of 0x100000. As a side effect, an amd64 kernel now loads at physical address 0x200000 instead of 0x100000. This is probably for the best because it avoids the use of a 2MB page mapping for the first 1MB of the kernel that also spans the fixed MTRRs. However, getmemsize() still thinks that the kernel loads at 0x100000, and so the physical memory between 0x100000 and 0x200000 is lost. Fix this problem by replacing the hard-wired constant in getmemsize() by a symbol "kernphys" that is defined by the linker script. In collaboration with: kib	2011-03-28 06:35:17 +00:00
Alan Cox	63740078e0	Amd64 doesn't have a lazypmap ipi.	2011-03-27 16:18:51 +00:00
Andriy Gapon	01a9e1a11b	linux compat: add SO_PASSCRED option with basic handling This seems to have been a part of a bigger patch by dchagin that either hasn't been committed or was committed only partially. Submitted by: dchagin, nox MFC after: 2 weeks	2011-03-26 11:25:36 +00:00
Andriy Gapon	931f0826ea	linux compat: add non-dummy capget and capset system calls, regenerate And drop dummy definitions for those system calls. This may transiently break the build. PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 10:59:24 +00:00
Andriy Gapon	1f4ec5a3ba	linux compat: add non-dummy capget and capset system calls PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks	2011-03-26 10:51:56 +00:00
Dmitry Chagin	acface683e	Export the correct AT_PLATFORM value. Since signal trampolines are copied to the shared page do not need to leave place on the stack for it. Forgotten in the previous commit. MFC after: 1 Week	2011-03-26 09:25:35 +00:00
Alan Cox	1587dfd730	Move an external declaration to the appropriate header file.	2011-03-26 06:21:05 +00:00
Jung-uk Kim	cd45fec044	Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta CPUs. These CPUs need explicit MSR configuration to expose certain CPU capabilities (e.g., CMPXCHG8B) to work around compatibility issues with ancient software. Unfortunately, Rise mP6 does not set the CX8 bit in CPUID and there is no MSR to expose the feature although all mP6 processors are capable of CMPXCHG8B according to datasheets I found from the Net. Clean up and simplify VIA PadLock detection while I am in the neighborhood.	2011-03-26 02:02:07 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb	d2b74735b8	For now remove options FLOWTABLE from the remaining GENERIC kernel configurations and make it opt-in for those who want it. LINT will still build it. While it may be a perfect win in some scenarios, it still troubles users (see PRs) in general cases. In addition we are still allocating resources even if disabled by sysctl and still leak arp/nd6 entries in case of interface destruction. Discussed with: qingli (2010-11-24, just never executed) Discussed with: juli (OCTEON1) PR: kern/148018, kern/155604, kern/144917, kern/146792 MFC after: 2 weeks	2011-03-19 15:50:34 +00:00
Jung-uk Kim	38b8542ca9	Deprecate tsc_present as the last of its real consumers finally disappeared.	2011-03-15 17:19:52 +00:00
David Christensen	dd46ab31de	- Initial release of bxe(4) to support Broadcom NetXtreme II 10GbE. (BCM57710, BCM57711, BCM57711E) MFC after: One month	2011-03-14 22:42:41 +00:00
Dmitry Chagin	8f1e49a638	Enable shared page use for amd64/linux32 and i386/linux binaries. Move signal trampoline code from the top of the stack to the shared page. MFC after: 2 Weeks	2011-03-13 14:58:02 +00:00
Andriy Gapon	d549ef5638	add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls Regenerate system call and systrace support files. PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (earlier version) MFC after: 3 weeks	2011-03-12 08:58:19 +00:00
Andriy Gapon	56ede1074e	add DTrace systrace support for linux32 and freebsd32 on amd64 syscalls This commit makes necessary changes in syscall/sysent generation infrastructure. PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (earlier version) MFC after: 3 weeks	2011-03-12 08:51:43 +00:00
Andriy Gapon	136882cf92	amd64/NOTES: use a greater number in KSTACK_PAGES example This is a minor cosmetic change - the users are more likely to want to increase (rather than decrease) default kernel stack size, which is already 4 pages on amd64. MFC after: 4 days	2011-03-11 19:21:42 +00:00
Matthew D Fleming	c77715ef6c	Mostly revert r219468, as I had misremembered the C standard regarding the size of an extern array. Keep one change from strncpy to strlcpy.	2011-03-11 18:56:55 +00:00
Jung-uk Kim	79422085d4	Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as a CPU ticker. Note tsc_present does not change by this tunable.	2011-03-11 00:44:32 +00:00
Matthew D Fleming	cd67ac41ae	Use MAXPATHLEN rather than the size of an extern array when copying the kernel name. Also consistently use strlcpy(). Suggested by: Warner Losh	2011-03-10 22:56:00 +00:00
Jung-uk Kim	bc34c87e81	Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because it is almost always used with tsc_freq any way.	2011-03-10 20:02:58 +00:00
Julian Elischer	a8066a9d3b	Add a small change to the comment in the GENERIC config files that include udbp Submitted by: Chris Forgron, cforgeron at acsi dot ca MFC after: 1 week	2011-03-09 17:15:11 +00:00
Dmitry Chagin	e5d81ef1b5	Extend struct sysvec with new method sv_schedtail, which is used for an explicit process at fork trampoline path instead of eventhandler(schedtail) invocation for each child process. Remove eventhandler(schedtail) code and change linux ABI to use newly added sysvec method. While here replace explicit comparing of module sysentvec structure with the newly created process sysentvec to detect the linux ABI. Discussed with: kib MFC after: 2 Week	2011-03-08 19:01:45 +00:00
Dmitry Chagin	372f5e052f	Remove dead code. MFC after: 1 Week	2011-03-07 08:12:07 +00:00
Alan Cox	cb25117d54	Make a change to the implementation of the direct map to improve performance on processors that support 1 GB pages. Specifically, if the end of physical memory is not aligned to a 1 GB page boundary, then map the residual physical memory with multiple 2 MB page mappings rather than a single 1 GB page mapping. When a 1 GB page mapping is used for this residual memory, access to the memory is slower than when multiple 2 MB page mappings are used. (I suspect that the reason for this slowdown is that the TLB is actually being loaded with 4 KB page mappings for the residual memory.) X-MFC after: r214425	2011-03-02 00:24:07 +00:00
Robert Watson	74b5505e5d	Continue to introduce Capsicum capability mode: White list sysarch calls allowed in capability mode; arguably, there should be some link between the capability mode model and the privilege model here. Sysarch is a morass similar to ioctl, in many senses. Submitted by: anderson Discussed with: benl, kris, pjd Sponsored by: Google, Inc. Obtained from: Capsicum Project MFC after: 3 months	2011-03-01 13:35:48 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Alan Cox	e6ffa21488	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
Dmitry Chagin	dc4f0a9e11	To avoid excessive code duplication create wrapper for fill regs from stack frame. Change the trap() code to use newly created function instead of explicit regs assignment.	2011-02-16 17:50:21 +00:00
Dmitry Chagin	09d6cb0a23	For realtime signals fill the sigval value.	2011-02-15 21:46:36 +00:00
Dmitry Chagin	fde6316272	Sort include files in the alphabetical order.	2011-02-13 19:07:48 +00:00
Dmitry Chagin	222198ab0b	Move linux_clone(), linux_fork(), linux_vfork() to a MI path.	2011-02-12 18:17:12 +00:00
Dmitry Chagin	c8d6845e9e	In preparation for moving linux_clone() to a MI path introduce linux_set_upcall_kse().	2011-02-12 16:33:00 +00:00
Dmitry Chagin	2c7660ba3e	In preparation for moving linux_clone () to a MI path move the TLS code in a separate function. Use function parameter instead of direct using register.	2011-02-12 15:50:21 +00:00
Dmitry Chagin	9bd9b52478	Regen for r218610.	2011-02-12 15:36:25 +00:00
Dmitry Chagin	f91ea2518b	The fourth argument of linux_clone is a pointer to the TLS. Change clone syscall definition to match actual linux one.	2011-02-12 15:33:25 +00:00
Konstantin Belousov	6f9ec5aab0	Clear the padding when returning context to the usermode, for MI ucontext_t and x86 MD parts. Kernel allocates the structures on the stack, and not clearing reserved fields and paddings causes leakage. Noted and discussed with: bde MFC after: 2 weeks	2011-02-05 15:10:27 +00:00
Matthew D Fleming	08b163fa51	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week	2011-02-02 16:35:10 +00:00
Dmitry Chagin	77192fddeb	Regen for r218101. MFC after: 1 Month.	2011-01-30 20:38:26 +00:00
Dmitry Chagin	8d73c2bfd1	Change linux futex syscall definition to match actual linux one. MFC after: 1 Month.	2011-01-30 20:31:43 +00:00
Dmitry Chagin	9adaae9403	The kern_wait() code already removes the SIGCHLD signal for the waited process. Removing other SIGCHLD signals is not needed and may cause problems. Pointed out by: jilles MFC after: 1 Month.	2011-01-30 18:17:38 +00:00
Dmitry Chagin	572fb2e33e	My style(9) bug. Pointed out by: kib MFC after: 1 Month.	2011-01-29 07:22:33 +00:00
Dmitry Chagin	adc7ece00a	Implement a variation of the linux_common_wait() which should be used by linuxolator itself. Move linux_wait4() to MD path as it requires native struct rusage translation to struct l_rusage on linux32/amd64. MFC after: 1 Month.	2011-01-28 18:47:07 +00:00
Dmitry Chagin	53c74fc607	To avoid excessive code duplication move struct rusage translation to a separate function. MFC after: 1 Month.	2011-01-28 18:28:06 +00:00
Konstantin Belousov	77185f473b	linux_sigreturn() loads the struct trapframe from l_sigcontext members, thus making a signed extension of 32 bit register context. If the register is not touched in usermode between return from signal and next syscall entry, the sign-extension part of 64bit register is not cleared, causing linux32_fetch_syscall_args() to read wrong values. Use unsigned type for the registers in the linux sigcontext. Reported by: Jacob Frelinger <jacob.frelinger duke edu>, arundel In collaboration with: dchagin MFC after: 1 week	2011-01-27 21:45:38 +00:00
Dmitry Chagin	a5c1afadeb	Add macro to test the sv_flags of any process. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures. MFC after: 1 month	2011-01-26 20:03:58 +00:00
Matthew D Fleming	f89f7ada8d	Set td_kstack_pages for thread0. This was already being done for most architectures, but i386 and amd64 were missing it. Submitted by: Mohd Fahadullah <mfahadullah AT isilon DOT com>	2011-01-26 17:06:13 +00:00
Sergey Kandaurov	4053b05b91	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe	2011-01-21 10:26:26 +00:00
Konstantin Belousov	de64ee1a30	Use CTLFLAG_RDTUN for read-only sysctl that exports tunable. Reminded by: pjd MFC after: 6 days	2011-01-19 21:35:48 +00:00
Konstantin Belousov	9e52b8b629	Make the length of the LDT a loader tunable, machdep.max_ldt_segment, and export it with read-only sysctl. Remove unused defines. Reviewed by: jhb (previous version) MFC after: 1 week	2011-01-18 23:00:22 +00:00
Konstantin Belousov	a05c98a099	Use malloc(9) instead of kmem_alloc(9) for temporary copy of the user-supplied descriptor array. Noted and reviewed by: jhb (previous version) MFC after: 1 week	2011-01-18 22:56:10 +00:00
John Baldwin	6bd823f334	- Remove some always-true checks (checking for unsigned < 0). - Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT when descriptors are provided. Specifically, allow the 'start == 0' and 'num == 0' special case used to free all LDT entries that previously failed with EINVAL. Submitted by: clang via rdivacky (some of 1) Reviewed by: kib	2011-01-18 16:43:01 +00:00
Jung-uk Kim	2fea643112	Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set(). Compile sys/dev/mem/memutil.c for all supported platforms and remove now unnecessary dev_mem_md_init(). Consistently define mem_range_softc from mem.c for all platforms. Add missing #include guards for machine/memdev.h and sys/memrange.h. Clean up some nearby style(9) nits. MFC after: 1 month	2011-01-17 22:58:28 +00:00
Jung-uk Kim	df74996c3d	Avoid preemption while manipulating CRs and MTRRs. Tested by: ariff	2011-01-17 17:30:35 +00:00
Jung-uk Kim	bdbf2db5b2	Remove redundant, bogus, and even harmful uses of setting TS bit in CR0. It is done from fpstate_drop() when it is really necessary. Reviewed by: kib MFC after: 1 week	2011-01-14 21:09:01 +00:00
Matthew D Fleming	240577c2a7	Fix up a few more sysctl(9) mis-typing found in various LINT builds.	2011-01-13 18:20:27 +00:00
John Baldwin	072e9838e2	If an interrupt on an I/O APIC is moved to a different CPU after it has started to execute, it seems that the corresponding ISR bit in the "old" local APIC can be cleared. This causes the local APIC interrupt routine to fail to find an interrupt to service. Rather than panic'ing in this case, simply return from the interrupt without sending an EOI to the local APIC. If there are any other pending interrupts in other ISR registers, the local APIC will assert a new interrupt. Tested by: steve	2011-01-13 17:00:22 +00:00
Matthew D Fleming	fbbb13f962	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the kernel changes.	2011-01-12 19:54:19 +00:00
Konstantin Belousov	50a57dfbec	Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out. Comments wording and reviewed by: alc	2011-01-09 12:50:44 +00:00
Tijl Coosemans	d22e78d6b9	Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98 headers with stubs. Approved by: kib (mentor)	2011-01-08 18:09:48 +00:00
Konstantin Belousov	6297a3d843	Create shared (readonly) page. Each ABI may specify the use of page by setting SV_SHP flag and providing pointer to the vm object and mapping address. Provide simple allocator to carve space in the page, tailored to put the code with alignment restrictions. Enable shared page use for amd64, both native and 32bit FreeBSD binaries. Page is private mapped at the top of the user address space, moving a start of the stack one page down. Move signal trampoline code from the top of the stack to the shared page. Reviewed by: alc	2011-01-08 16:13:44 +00:00
Tijl Coosemans	a56e818f29	On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and corresponding macros) are different from 32 bit. [1] Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX. Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition for (u)intmax_t. Do this on all architectures for consistency. Suggested by: bde [1] Approved by: kib (mentor)	2011-01-08 12:43:05 +00:00
Tijl Coosemans	9858863cd4	Fix types of some values in machine/_limits.h. On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int. However, lacking integer suffixes for types smaller than int, their type should correspond to that of an object of type unsigned char (or short) when used in an expression with objects of type int. In that case unsigned char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and USHRT_MAX should also be int. Where MIN/MAX constants implicitly have the correct type the suffix has been removed. While here, correct some comments. Reviewed by: bde Approved by: kib (mentor)	2011-01-08 11:13:34 +00:00
Konstantin Belousov	39198f15ee	Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the initial stack protection set by the kernel image activator.	2011-01-07 14:22:34 +00:00
Jung-uk Kim	50e3cec377	Increase size of pcb_flags to four bytes. Requested by: bde, jhb	2010-12-22 19:57:03 +00:00
Jung-uk Kim	e6c006d96a	Improve PCB flags handling and make it more robust. Add two new functions for manipulating pcb_flags. These inline functions are very similar to atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK prefix for SMP. Add comments about the rationale[1]. Use these functions wherever possible. Although there are some places where it is not strictly necessary (e.g., a PCB is copied to create a new PCB), it is done across the board for sake of consistency. Turn pcb_full_iret into a PCB flag as it is safe now. Move rarely used fields before pcb_flags and reduce size of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in the neighborhood. Reviewed by: kib Submitted by: kib[1] MFC after: 2 months	2010-12-22 00:18:42 +00:00
Tijl Coosemans	81bd5041a2	Merge amd64 and i386 bus.h and move the resulting header to x86. Replace the original amd64 and i386 headers with stubs. Rename (AMD64\|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere. Reviewed by: imp (previous version), jhb Approved by: kib (mentor)	2010-12-20 16:39:43 +00:00
Konstantin Belousov	7222d2fbee	Inform a compiler which asm statements in the x86 implementation of atomics change eflags. Reviewed by: jhb MFC after: 2 weeks	2010-12-18 16:41:11 +00:00
Jung-uk Kim	e1c9d39ebe	Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This function always returned the nominal frequency instead of current frequency because we use RDTSC instruction to calculate difference in CPU ticks, which is supposedly constant for the case. Now we support cpu_get_nominal_mhz() for the case, instead. Note it should be just enough for most usage cases because cpu_est_clockrate() is often times abused to find maximum frequency of the processor.	2010-12-14 20:07:51 +00:00
Robert Watson	9c9f06e60d	Add options NO_ADAPTIVE_SX to the XENHVM kernel configuration, matching its similar disabling of adaptive mutexes and rwlocks. The existing comment on why this is the case also applies to sx locks. MFC after: 3 days Discussed with: attilio	2010-12-13 12:15:46 +00:00
Konstantin Belousov	60c7c84e85	In fpudna()/npxdna(), mark FPU context initialized and optionally mark user FPU context initialized, if current context is user context. It was reversed in r215865, by inadequate change of this code fragment to a call to fpuuserinited()/npxuserinited(). The issue is only relevant for in-kernel users of FPU. Reported by: Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net> Tested by: Mike Tancsa MFC after: 3 days	2010-12-12 16:16:39 +00:00
Robert Watson	996177338f	Derive the XENHVM kernel from GENERIC, adding only the options required to support PV drivers (such as xenpci), and non-adaptive locking (along with a comment about why). This change eliminates the synchronisation problem between GENERIC and XENHVM, which had become severely rotted in HEAD, and in 8-STABLE included non-production kernel debugging features such as WITNESS. However, it comes at the cost of enabling devices and options that may not be present under Xen (such as random ethernet cards). For now, opt for a simpler kernel configuration file rather than using nooptions/ nodevice to enumerate and eliminate them. This leads to a somewhat larger XENHVM kernel. This is an MFC candidate for 8-STABLE before 8.2, in order to provide a production-worthy XENHVM kernel configuration for amd64. Discussed with: gibbs, cperciva Reported by: Piete Brooks <Piete.Brooks at cl.cam.ac.uk> Sponsored by: DARPA, AFRL MFC after: 3 days	2010-12-10 22:22:01 +00:00
Colin Percival	91ff9dc058	Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c (which are identical) with a single x86/x86/busdma_machdep.c.	2010-12-09 06:41:50 +00:00
Jung-uk Kim	71e0b05797	Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC. Remove a confusing comment about converting to MHz as we never did.	2010-12-08 23:40:41 +00:00
Colin Percival	af60888734	On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192, while on i386 we have MAX_BPAGES=512. Implement this difference via '#ifdef __i386__'. With this commit, the i386 and amd64 busdma_machdep.c files become identical; they will soon be replaced by a single file under sys/x86.	2010-12-08 20:20:10 +00:00
Colin Percival	ec195da48a	MFi386 r1.94: If XEN, make pmap_kextract = pmap_kextract_ma. This is a no-op currently, since FreeBSD/amd64 doesn't have (paravirtualized) Xen support, but if/when that support is ever added we'll want this, and until then it's harmless.	2010-12-08 19:52:04 +00:00
Colin Percival	81261a5a6d	MFi386 r1.81, r1.82, r1.84: Reorganize code to reduce cache pressure and branch mispredictions. No objections from: scottl	2010-12-08 19:42:21 +00:00
Jung-uk Kim	dd7d207dcb	Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86. Discussed with: avg	2010-12-08 00:09:24 +00:00
Jung-uk Kim	7214d5d75b	Remove stale comments about P-state invariant TSC and fix style(9) nits.	2010-12-07 22:43:25 +00:00
Jung-uk Kim	1bcc28295b	Do not register an event handler for CPU frequency changes when it is found P-state invariant. This is a continuation of r216274.	2010-12-07 22:34:51 +00:00
Jung-uk Kim	4a9c4056dc	Now that the P-state invariant TSC is probed early enough, do not register event handlers for CPU frequency changes when it is found P-state invariant. Adjust a comment about non-existent tsc_freq_max() while I am here.	2010-12-07 22:23:26 +00:00
Jung-uk Kim	78a661bbaa	Probe P-state invariant TSC from rightful place.	2010-12-07 22:12:02 +00:00
Konstantin Belousov	1b3c32568a	Update some comments related to use of amd64 full context switch. In exec_linux_setregs(), use locally cached pointer to pcb to set pcb_full_iret. In set_regs(), note that full return is needed when code that sets segment registers is enabled. MFC after: 1 week	2010-12-07 12:44:33 +00:00
Konstantin Belousov	0f0170e66a	Retire write-only PCB_FULLCTX pcb flag on amd64. Reminded by: Petr Salinger <Petr.Salinger seznam cz> Tested by: pho MFC after: 1 week	2010-12-07 12:17:43 +00:00
Konstantin Belousov	3e0ddb6781	Do not leak %rdx value in the previous image to the new image after execve(2). Note that ia32 binaries already handle this properly, since ia32_setregs() resets td_retval[1], but not exec_setregs(). We still do not conform to the amd64 ABI specification, since %rsp on the image startup is not aligned to 16 bytes. PR: amd64/124134 Discussed with: Petr Salinger <Petr.Salinger seznam cz> (who convinced me that there are indeed several bugs) MFC after: 1 week	2010-12-06 15:15:27 +00:00
Jung-uk Kim	2f7ab7e85d	Revert r216161. It is not necessary because we zero-fill BSS anyway. Requested by: jhb	2010-12-03 22:27:51 +00:00
Jung-uk Kim	b14fe63392	Explicitly initialize TSC frequency. To calibrate TSC frequency, we use DELAY(9) and it may use TSC in turn if TSC frequency is non-zero. MFC after: 3 days	2010-12-03 21:54:10 +00:00
Jung-uk Kim	e391a266ed	Do not change CPU ticker frequency if TSC is P-state invariant. Note this change was meant to be committed with r184102 (and its subsequent MFCs) but it fell off somehow. Pointyhat to: jkim MFC after: 3 days	2010-12-03 21:06:30 +00:00
Rebecca Cran	c90f7d9b44	Revert r216134. This checkin broke platforms where bus_space are macros: they need to be a single statement, and do { } while (0) doesn't work in this situation so revert until a solution can be devised.	2010-12-03 07:09:23 +00:00
Rebecca Cran	15b4888a24	Disallow passing in a count of zero bytes to the bus_space(9) functions. Passing a count of zero on i386 and amd64 for [I386\|AMD64]_BUS_SPACE_MEM causes a crash/hang since the 'loop' instruction decrements the counter before checking if it's zero. PR: kern/80980 Discussed with: jhb	2010-12-02 22:19:30 +00:00
Konstantin Belousov	c6fb218c3c	Calling fill_fpregs() for curthread is legitimate, and ELF coredump does this. Reported and tested by: pho MFC after: 5 days	2010-11-28 17:56:34 +00:00
Alan Cox	686b00d691	Make the size of the direct map easily configurable. Changing NDMPML4E now suffices. Increase the size of the direct map to 1TB. An earlier version of this patch was tested by sbruno@.	2010-11-26 19:36:26 +00:00
Konstantin Belousov	5c6eb03790	Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs() functions, they are unused. Remove 'user' from npxgetuserregs() etc. names. For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU context storage. This eliminates the need for ugly copying with overwrite of the newly added and reserved fields in ucontext on i386 to satisfy alignment requirements for fpusave() and fpurstor(). pc98 version was copied from i386. Suggested and reviewed by: bde Tested by: pho (i386 and amd64) MFC after: 1 week	2010-11-26 14:50:42 +00:00
Tijl Coosemans	ce4ec51dbe	Merge amd64/i386 _align.h by aligning on the size of register_t (copied from powerpc). Reviewed by: imp, jhb Approved by: kib (mentor)	2010-11-26 10:59:20 +00:00
Ulrich Spörlein	02604cd4f4	Remove kernel support for BB profiling, now that kernbb(8) is gone, too. PR: bin/83558 Reviewed by: jkim	2010-11-26 08:11:43 +00:00
Dimitry Andric	1496505287	Apply the same fix as in r215823 to sys/amd64/amd64/fpu.c: use unambiguous inline assembly to load a float variable.	2010-11-25 22:19:40 +00:00
Dimitry Andric	cfe92f33bc	Change ambiguous (or invalid, depending on how strict you want to be :) assembly instruction "movw %rcx,2(%rax)" to "movw %cx,2(%rax)", since the intent was to move 16 bits of data, in this case. Found by: clang Reviewed by: kib	2010-11-24 18:35:11 +00:00
Jung-uk Kim	d2d0fda841	Remove a stale tunable introduced in r215703.	2010-11-23 17:28:23 +00:00
Jung-uk Kim	42ca4a29de	Reinitialize PAT MSR via pmap_init_pat() while resuming. This function does better job since r215703 and it is safer now.	2010-11-23 16:12:35 +00:00
Andriy Gapon	9b984feb3d	specialreg.h: add definitions for some useful bits found in CPUID.6 EAX and ECX CPUID.6 is defined as Thermal and Power Management Leaf by both Intel and AMD. Reviewed by: jhb MFC after: 7 days	2010-11-23 13:55:30 +00:00
Jung-uk Kim	7dd052c1d9	- Disable caches and flush caches/TLBs when we update PAT as we do for MTRR. Flushing TLBs is required to ensure cache coherency according to the AMD64 architecture manual. Flushing caches is only required when changing from a cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or UC-). Since this function is only used once per processor during startup, there is no need to take any shortcuts. - Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC. Program 5 as WP (from default WT) and 6 as WC (from default UC-). Leave 4 and 7 at the default of WB and UC. This is to avoid transition from a cacheable memory type to an uncacheable type to minimize possible cache incoherency. Since we perform flushing caches and TLBs now, this change may not be necessary any more but we do not want to take any chances. - Remove Apple hardware specific quirks. With the above changes, it seems this hack is no longer needed. - Improve pmap_cache_bits() with an array to map PAT memory type to index. This array is initialized early from pmap_init_pat(), so that we do not need to handle special cases in the function any more. Now this function is identical on both amd64 and i386. Reviewed by: jhb Tested by: RM (reuf_m at hotmail dot com) Ryszard Czekaj (rychoo at freeshell dot net) army.of.root (army dot of dot root at googlemail dot com) MFC after: 3 days	2010-11-22 19:52:44 +00:00
Andriy Gapon	b43d292565	specialreg.h: add definitions for MPERF/APERF pair of MSRs These MSRs can be used to determine actual (average) performance as compared to a maximum defined performance. Availability of these MSRs is indicated by bit0 in CPUID.6.ECX on both Intel and AMD processors. MFC after: 5 days	2010-11-19 15:07:36 +00:00
Andriy Gapon	7af7c7624a	specialreg.h: add AMD-specific "Hardware Configuration Register" MSR It seems that this MSR has been available in a range of AMD processors families for quite a while now. Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in the i386 version. Note2: perhaps some additional name component is needed to distinguish AMD-specific MSRs. MFC after: 5 days	2010-11-19 15:00:20 +00:00
Andriy Gapon	8fd6d51347	specialreg.h: add definition for AMD Core Performance Boost bit This bit indicates availability of the feature. MFC after: 4 days	2010-11-19 14:46:17 +00:00
Jung-uk Kim	816b3bd1b0	Restore CR0 after MTRR initialization for correctness' sake. There will be no noticeable change because we enable caches before we enter here for both BSP and AP cases. Remove another pointless optimization for CR4.PGE bit while I am here.	2010-11-16 23:26:02 +00:00
Jung-uk Kim	50083a5624	Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this code but probably it only worked by chance because modifying CR4.PGE bit causes invalidation of entire TLBs. Since these are very rare events, this micro-optimization seems useless. Reviewed by: jhb	2010-11-16 22:44:58 +00:00
Konstantin Belousov	7022f954c3	Do not use __FreeBSD_version prefix for the special osrel version. The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix. Reported and tested by: swell.k gmail com MFC after: 1 week	2010-11-14 21:59:11 +00:00
Konstantin Belousov	94bce4535d	Use symbolic names instead of hardcoding values for magic p_osrel constants. MFC after: 1 week	2010-11-14 18:24:12 +00:00
Jung-uk Kim	19da400c64	Move identical copies of apm_bios.h to sys/x86/include, replace them with stubs, and adjust PC98 stub accordingly. Reviewed by: imp, nyan	2010-11-11 19:36:21 +00:00
Andriy Gapon	290e14f881	amd64: introduce minidump version 2 After KVA space was increased to 512GB on amd64 it became impractical to use PTEs as entries in the minidump map of dumped pages, because size of that map alone would already be 1GB. Instead, we now use PDEs as page map entries and employ two stage lookup in libkvm: virtual address -> PDE -> PTE -> physical address. PTEs are now dumped as regular pages. Fixed page map size now is 2MB. libkvm keeps support for accessing amd64 minidumps of version 1. Support for 1GB pages is added. Many thanks to Alan Cox for his guidance, numerous reviews, suggestions, enhancements and corrections. Reviewed by: alc [kernel part] MFC after: 15 days	2010-11-11 18:35:28 +00:00
Jung-uk Kim	93a8847473	Make APM emulation look closer to its origin. Use device_get_softc(9) instead of hardcoding acpi(4) unit number as we have device_t for it.	2010-11-10 18:50:12 +00:00
Jung-uk Kim	7c2bf852d7	Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new file acpi_apm.c, and place it on sys/x86/acpica.	2010-11-10 01:29:56 +00:00
John Baldwin	961135ead8	- Remove <machine/mutex.h>. Most of the headers were empty, and the contents of the ones that were not empty were stale and unused. - Now that <machine/mutex.h> no longer exists, there is no need to allow it to override various helper macros in <sys/mutex.h>. - Rename various helper macros for low-level operations on mutexes to live in the _mtx_* or __mtx_* namespaces. While here, change the names to more closely match the real API functions they are backing. - Drop support for including <sys/mutex.h> in assembly source files. Suggested by: bde (1, 2)	2010-11-09 20:46:41 +00:00
Attilio Rao	fcb250f392	Move the mptable.h under x86/include/. Sponsored by: Sandvine Incorporated MFC after: 14 days	2010-11-09 20:28:09 +00:00
Jung-uk Kim	cedd86cafa	Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home.	2010-11-09 00:27:18 +00:00
Jung-uk Kim	2473325fa8	Reduce diff between platforms and fix style(9) bugs.	2010-11-09 00:14:39 +00:00
John Baldwin	13e25cb7a5	Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is identical on both platforms.	2010-11-08 20:57:02 +00:00
John Baldwin	f67b4bd367	A few small style and whitespace fixes.	2010-11-08 20:05:22 +00:00
Alan Cox	d9a799683c	Don't call pmap_demote_DMAP() on MTRR entries from the BIOS that are marked as "bogus". Reported by: Jia-Shiun Li	2010-11-07 21:48:49 +00:00
John Baldwin	0108cce0a4	Adjust the order of operations in spinlock_enter() and spinlock_exit() to work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers. In cooperation with: bde MFC after: 1 month	2010-11-05 13:42:58 +00:00
Andriy Gapon	3b50d59fef	x86 topo_probe: do not probe smp topology if only one cpu is visible This could lead to a division by zero if hardware is multi-core and/or multi-threaded, but for some (quite unusual) reason FreeBSD sees only one logical processor. This could happen, for example, if neither MADT nor MP Table are presented by BIOS. Also: - assert in topo_probe_0x4 that BSP is accounted for - neither cpu_cores nor cpu_logical should be zero after successful probing, so either being zero is an indication of failed probing Reported by: vwe, Dan Allen <danallen46@airwired.net> Tested by: Dan Allen <danallen46@airwired.net> MFC after: 3 days	2010-11-04 08:51:45 +00:00
John Baldwin	32c3d3b6e6	Move <machine/apicreg.h> to <x86/apicreg.h>.	2010-11-01 18:18:46 +00:00
John Baldwin	5ecdb3c46b	Move the <machine/mca.h> header to <x86/mca.h>.	2010-11-01 17:40:35 +00:00
Alan Cox	2eeee67ce8	Add another safety belt to pmap_demote_DMAP().	2010-10-30 23:49:37 +00:00
Alan Cox	59fb2d9b04	Don't demote in pmap_demote_DMAP() if the specified length is zero.	2010-10-30 17:21:32 +00:00
Attilio Rao	ba2a27351b	Merge nexus.c from amd64 and i386 to x86 subtree. Sponsored by: Sandvine Incorporated Tested by: gianni	2010-10-28 16:31:39 +00:00
John Baldwin	89d84a4055	Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[] is only populated for multiple-CPU machines. This also matches what the code does when SMP is not enabled. PR: bin/151616 Tested by: "Damian S. Kolodziejczyk" damkol \| gmail Submitted by: avg MFC after: 1 week	2010-10-28 13:44:19 +00:00
Attilio Rao	a3da97926d	Merge the mptable support from MD bits to x86 subtree. Sponsored by: Sandvine Incorporated Discussed with: jhb	2010-10-28 07:58:06 +00:00
Alan Cox	92ababa777	[1] According to the x86 architectural specifications, no virtual-to- physical page mapping should span two or more MTRRs of different types. Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can ensure that the direct map region doesn't have such a mapping. [2] Fix a couple of nearby style errors in amd64_mrset(). [3] Re-enable the use of 1GB page mappings for implementing the direct map. (See also r197580 and r213897.) Tested by: kib@ on a Westmere-family processor [3] MFC after: 3 weeks	2010-10-27 16:46:37 +00:00
Attilio Rao	256439c972	Merge dump_machdep.c i386/amd64 under the x86 subtree. Sponsored by: Sandvine Incorporated Tested by: gianni	2010-10-26 12:46:26 +00:00
John Baldwin	0689bdcc19	Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned by intr_disable(). Requested by: bde	2010-10-25 15:31:13 +00:00
John Baldwin	c6390f7ac5	Use intr_disable() and intr_restore() instead of frobbing the flags register directly to disable interrupts. Reviewed by: bde (earlier version) MFC after: 2 weeks	2010-10-25 15:28:03 +00:00
Alan Cox	353b642ced	Update pmap_extract() to handle 1GB page mappings. Some device drivers use pmap_extract() rather than pmap_kextract() on direct map addresses. Thus, pmap_extract() needs to be able to deal with 1GB page mappings if we are to use 1GB page mappings for the direct map. (See r197580.)	2010-10-15 15:23:34 +00:00
Jung-uk Kim	56b11f84a7	Remove trailing ", " from `sysctl machdep.idle_available' output.	2010-10-12 20:53:12 +00:00
Konstantin Belousov	78ae4338a2	Add macro DECLARE_MODULE_TIED to denote a module as requiring the kernel of exactly the same __FreeBSD_version as the headers module was compiled against. Mark our in-tree ABI emulators with DECLARE_MODULE_TIED. The modules use kernel interfaces that the Release Engineering Team feel are not stable enough to guarantee they will not change during the life cycle of a STABLE branch. In particular, the layout of struct sysentvec is declared to be not part of the STABLE KBI. Discussed with: bz, rwatson Approved by: re (bz, kensmith) MFC after: 2 weeks	2010-10-12 09:18:17 +00:00
Konstantin Belousov	b3b4bec7e6	Regen.	2010-10-08 07:19:05 +00:00
Konstantin Belousov	5d2a6a61b4	Fix typo. Submitted by: arundel MFC after: 3 days	2010-10-08 07:18:44 +00:00
Konstantin Belousov	3f506a78ce	Display PCID capability of CPU and add CPUID define for it. MFC after: 1 week	2010-10-05 15:31:56 +00:00
Konstantin Belousov	2d5db3709b	The makectx() function, used by kdb_trap() to reconstruct pcb from trap frame when trap initiated kdb entry, incorrectly calculated the value of %rsp for trapped thread. According to Intel(R) 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1, rev. 035, 6.14.2 64-Bit Mode Stack Frame, "64-bit mode ... pushes SS:RSP unconditionally, rather than only on a CPL change." Even assuming the conditional push of the %ss:%rsp, the calculation was still wrong because sizeof(tf_ss) + sizeof(tf_rsp) == 16 on amd64. Always use the tf_rsp from trap frame. The change supposedly fixes stepping when using kgdb backend for kdb. Submitted by: Zhouyi Zhou <zhouzhouyi gmail com> PR: amd64/151167 Reviewed by: avg MFC after: 1 week	2010-10-03 13:52:17 +00:00
Andriy Gapon	d443a96ffb	i386 and amd64 mp_machdep: improve topology detection for Intel CPUs This patch is significantly based on previous work by jkim. List of changes: - added comments that describe topology uniformity assumption - added reference to Intel Processor Topology Enumeration article - documented a few global variables that describe topology - retired weirdly set and used logical_cpus variable - changed fallback code for mp_ncpus > 0 case, so that CPUs are treated as being different packages rather than cores in a single package - moved AMD-specific code to topo_probe_amd [jkim] - in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT and core masks and match APIC IDs against those masks [started by jkim] - in topo_probe_0x4() drop code for double-checking topology parameters by looking at L1 cache properties [jkim] - in topo_probe_0xb() add fallback path to topo_probe_0x4() as prescribed by Intel [jkim] Still to do: - prepare for upcoming AMD CPUs by using new mechanism of uniform topology description [pointed by jkim] - probe cache topology in addition to CPU topology and probably use that for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have the same CPU topology, but Athlon cores do not share L2 cache while Core2's do (no L3 cache in both cases) - think of supporting non-uniform topologies if they are ever implemented for platforms in question - think how to better describe old HTT vs new HTT distinction, HTT vs SMT can be confusing as SMT is a generic term - more robust code for marking CPUs as "logical" and/or "hyperthreaded", use HTT mask instead of modulo operation - correct support for halting logical and/or hyperthreaded CPUs, let scheduler know that it shouldn't schedule any threads on those CPUs PR: kern/145385 (related) In collaboration with: jkim Tested by: Sergey Kandaurov <pluknet@gmail.com>, Jeremy Chadwick <freebsd@jdc.parodius.com>, Chip Camden <sterling@camdensoftware.com>, Steve Wills <steve@mouf.net>, Olivier Smedts <olivier@gid0.org>, Florian Smeets <flo@smeets.im> MFC after: 1 month	2010-10-01 10:32:54 +00:00
Neel Natu	5c1a8dc028	Fix bogus error message from bus_dmamem_alloc() about incorrect alignment. The check for alignment should be made against the physical address and not the virtual address that maps it. Sponsored by: NetApp Submitted by: Will McGovern (will at netapp dot com) Reviewed by: mjacob, jhb	2010-09-29 21:53:11 +00:00
David Xu	295fbd498e	Userland POSIX semaphores are now based on umtx. The kernel module is only used for binary compatibility; if you want to run old binaries, you need to kldload the module.	2010-09-24 09:04:16 +00:00
Norikatsu Shigemura	cbf4dac64f	Add support 'device tpm' for amd64. Add tpm(4)'s default setting to /boot/defaults/loader.conf. Add 'device tpm' to NOTES for amd64 and i386. Discussed with: takawata Approved by: imp (mentor)	2010-09-19 14:40:37 +00:00
Andriy Gapon	0b750af1b1	amd64: reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory KVA space is abundant on amd64, so there is no reason to limit kernel map size to a fraction of available physical memory. In fact, it could be larger than physical memory. This should help with memory auto-tuning for ZFS and shouldn't affect other workloads. This should reduce number of circumstances for "kmem_map too small" panics, but probably won't eliminate them entirely due to potential kmem fragmentation. In fact, you might want/need to limit maximum ARC size after this commit if you need to reserve more memory for applications. This change was discussed on arch@ and nobody said "don't do it". MFC after: 6 weeks	2010-09-17 07:36:32 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generating at full rate of hz + stathz to fulfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this option is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generating. As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instructions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerpc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Kenneth D. Merry	d3c7b9a08a	MFp4 (//depot/projects/mps/...) Bring in a driver for the LSI Logic MPT2 6Gb SAS controllers. This driver supports basic I/O, and works with SAS and SATA drives and expanders. Basic error recovery works (i.e. timeouts and aborts) as well. Integrated RAID isn't supported yet, and there are some known bugs. So this isn't ready for production use, but is certainly ready for testing and additional development. For the moment, new commits to this driver should go into the FreeBSD Perforce repository first (//depot/projects/mps/...) and then get merged into -current once they've been vetted. This has only been added to the amd64 GENERIC, since that is the only architecture I have tested this driver with. Submitted by: scottl Discussed with: imp, gibbs, will Sponsored by: Yahoo, Spectra Logic Corporation	2010-09-10 15:03:56 +00:00
Andriy Gapon	3d844eddb7	bus_add_child: change type of order parameter to u_int This reflects actual type used to store and compare child device orders. Change is mostly done via a Coccinelle (soon to be devel/coccinelle) semantic patch. Verified by LINT+modules kernel builds. Followup to: r212213 MFC after: 10 days	2010-09-10 11:19:03 +00:00
Roman Divacky	27d4fea6c5	Change the parameter passed to the inline assembly to u_short as we are dealing with 16bit segment registers. Change mov to movw. Approved by: rpaulo (mentor) Reviewed by: kib, rink	2010-09-03 14:25:17 +00:00
Jung-uk Kim	305c5c0acb	Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use these values in the function.	2010-08-30 21:19:42 +00:00
Rui Paulo	cba3269417	Register an interrupt vector for DTrace return probes. There is some code missing in lapic to make sure that we don't overwrite this entry, but this will be done in a subsequent commit. Sponsored by: The FreeBSD Foundation	2010-08-28 08:03:29 +00:00
Rui Paulo	0bc1991a4a	Call the necessary DTrace function pointers when we have different kinds of traps. Sponsored by: The FreeBSD Foundation	2010-08-25 09:10:32 +00:00
Rui Paulo	8a8d8fa3d1	Add two DTrace trap type values. Used by fasttrap. Sponsored by: The FreeBSD Foundation	2010-08-24 13:13:24 +00:00
Attilio Rao	67a94de261	Revert part of the r211149 as I erroneously ported the logical_cpus from Yahoo! patchset as a mask (and according manipulating variables) while it is actually a CPU count. Submitted by: neel MFC after: 1 month X-MFC: 211149	2010-08-19 22:37:43 +00:00
John Baldwin	8c7a92bd4a	Remove unused KTRACE includes.	2010-08-19 16:41:27 +00:00
Pietro Cerutti	e0e08e6a60	- The iMac9,1 needs the PAT workaround as well Approved by: cognet	2010-08-17 12:17:24 +00:00
Konstantin Belousov	ee235befcb	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month	2010-08-17 08:55:45 +00:00
Jung-uk Kim	0405a5efe7	Reset switchtime to zero rather than the current CPU ticker (TSC) value. It is more appropriate in this context because TSC MSR is reset to zero when the CPU is restarted from S3 and above. Move acpi_resync_clock() back to where it was before r211202. It does not make a difference any more.	2010-08-13 22:08:42 +00:00
Attilio Rao	3742bd96fe	Revert r211176: As long as interrupts are disabled and there is not explicit call to sched_add() there can't be any preemption there, thus the calls may be consistent. Reported by: kib, jhb	2010-08-12 13:46:43 +00:00
Jung-uk Kim	a1004d0abf	Reset switchtime and switchticks after resynchronizing the system clock. This should fix weird runtime problem after resume on amd64. It also fixes "calcru: runtime went backwards" warnings with bootverbose.	2010-08-12 00:20:46 +00:00
John Baldwin	60c7b36b7a	Update various places that store or manipulate CPU masks to use cpumask_t instead of int or u_int. Since cpumask_t is currently u_int on all platforms this should just be a cosmetic change.	2010-08-11 23:22:53 +00:00
Attilio Rao	807ef45666	IPI handlers may run generally with interrupts disabled because they are served via an interrupt gate. However, that doesn't explicitly prevent preemption and thread migration thus scheduler pinning may be necessary in some handlers. Fix that. Tested by: gianni MFC after: 1 month	2010-08-11 10:51:27 +00:00
Attilio Rao	7cd8b4cd42	Fix a typo due to a stale version of the patch. Reported by: gianni, rdivacky MFC after: 1 month X-MFC: 211149	2010-08-10 18:29:39 +00:00
Attilio Rao	4c967b618d	Fix some places that may use cpumask_t while they still use 'int' types. While there, also fix some places assuming cpu type is 'int' while u_int is really meant. Note: this will also fix some possible races in per-cpu data accessings to be addressed in further commits. In collaboration with: Yahoo! Incorporated (via sbruno and peter) Tested by: gianni MFC after: 1 month	2010-08-10 16:14:10 +00:00
Attilio Rao	d35534bf42	Simplify the logic for handling ipi_selected() and ipi_cpu() in the amd64/i386 case. Reviewed by: jhb Tested by: gianni MFC after: 1 month X-MFC: 210939	2010-08-09 20:25:06 +00:00
David Malone	ee04083c8a	Don't pass sizeof(u_int) to an argument of SYSCTL_PROC that ends up not being used.	2010-08-08 20:34:53 +00:00
Konstantin Belousov	1757d9699d	Prefer struct sysentvec sv_psstrings to hardcoding FREEBSD32_PS_STRINGS in the compat32 code. Use sv_usrstack instead of FREEBSD32_USRSTACK as well. MFC after: 1 week	2010-08-07 11:57:13 +00:00
Bernhard Schmidt	5ec432ed82	Fix whitespace nits. PR: conf/148989 Submitted by: pluknet <pluknet at gmail.com> MFC after: 3 days	2010-08-06 18:46:27 +00:00
Jung-uk Kim	64299552b9	Remove unnecessary casting and simplify code. We are not there yet. ;-)	2010-08-06 17:21:32 +00:00
Jung-uk Kim	05db09e056	Correct argument order of acpi_restorecpu(), which was forgotten in r210804.	2010-08-06 15:59:00 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
John Baldwin	e2865ebbc2	Change the MPTable and $PIR PCI-PCI bridge drivers to inherit from the generic PCI-PCI bridge driver and only override specific methods. This should fix suspend/resume of PCI-PCI bridges using these drivers.	2010-08-05 17:48:37 +00:00
Jung-uk Kim	aa9928df7a	Remove an unnecessary register load.	2010-08-03 16:08:58 +00:00
Jung-uk Kim	3ab42a25a9	savectx() has not been used for fork(2) for about 15 years. [1] Do not clobber FPU thread's PCB as it is more harmful. When we resume CPU, unconditionally reload FPU state. Pointed out by: bde [1]	2010-08-03 15:32:08 +00:00
Jung-uk Kim	6305bb243c	Rearrange struct pcb. r177532 (CVS r1.64 of pcb.h) moved pcb_flags to make better use of cache lines by placing it before pcb_save (now pcb_user_save), which is moved to the end of pcb since r210777.	2010-08-02 18:12:30 +00:00
Jung-uk Kim	a2d2c83668	- Merge savectx2() with savectx() and struct xpcb with struct pcb. [1] savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs). Thus, saving additional information does not hurt and it may be even beneficial. Unfortunately, struct pcb has grown larger to accommodate more data. Move 512-byte long pcb_user_save to the end of struct pcb while I am here. - savectx() now saves FPU state unconditionally and copy it to the PCB of FPU thread if necessary. This gives panic dump and kdb a chance to take a look at the current FPU state even if the FPU is "supposedly" not used. - Resuming CPU now unconditionally reinitializes FPU. If the saved FPU state was irrelevant, it could be in an unknown state. Suggested by: bde [1]	2010-08-02 17:35:00 +00:00
John Baldwin	7134e39042	Tweak the logic to disable CLFLUSH in virtual environments to work around problems with flushing the local APIC register range so that it checks vm_guest directly. Reviewed by: kib, alc MFC after: 2 weeks	2010-08-02 17:01:23 +00:00
Xin LI	16430b12a3	In rdmsr_safe, use zero extend (by doing a 32-bit movl over eax to itself) instead of a sign extend. Discussed with: stas MFC after: 1 month	2010-07-30 21:39:28 +00:00
Xin LI	a3bc0a4e5c	Improve cputemp(4) driver wrt newer Intel processors, especially Xeon 5500/5600 series: - Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place of Tj(max) when a sane value is available, as documented in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane value we mean 70C - 100C for now); - Print the probe results when booting verbose; - Replace cpu_mask with cpu_stepping; - Use CPUID_* macros instead of rolling our own. Approved by: rpaulo MFC after: 1 month	2010-07-29 19:08:22 +00:00
John Baldwin	536af0d751	Mark the __curthread() functions as __pure2 and remove the volatile keyword from the inline assembly. This allows the compiler to cache invocations of curthread since its value does not change within a thread context. Submitted by: zec (i386) MFC after: 1 week	2010-07-29 18:44:10 +00:00
Jung-uk Kim	9727ca6a77	Fix another fallout from r208833. savectx() is used to save CPU context for crash dump (dumppcb) and kdb (stoppcbs). For both cases, there cannot be a valid pointer in pcb_save. This should restore the previous behaviour.	2010-07-29 16:49:20 +00:00
Jung-uk Kim	39381048f0	Rename PCB_USER_FPU to PCB_USERFPU not to clash with a macro from fpu.h.	2010-07-29 16:41:21 +00:00

6349 Commits