Commit Graph

12339 Commits

Author SHA1 Message Date
mdf
3d3b036f95 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
mav
1881f29e6e Refactor Xen PV code to use new event timers subsystem. That uses one-shot
Xen timer and time counter to provide one-shot and periodic time events.

On my tests this reduces idle interruts rate down to about 30Hz, and accor-
ding to Xen VM Manager reduces host CPU load by three times comparing to
the previous periodic 100Hz clock. Also now, when needed, it is possible to
increase HZ rate without useless CPU burning during idle periods.

Now only ia64 and some ARMs left not migrated to the new event timers.
2011-05-13 12:39:37 +00:00
jkim
a112b2dac2 Add SC_PIXEL_MODE to GENERIC for amd64 and i386.
Requested by:	many
2011-05-10 16:44:16 +00:00
jkim
6d3172737b Implement boot-time TSC synchronization test for SMP. This test is executed
when the user has indicated that the system has synchronized TSCs or it has
P-state invariant TSCs.  For the former case, we may clear the tunable if it
fails the test to prevent accidental foot-shooting.  For the latter case, we
may set it if it passes the test to notify the user that it may be usable.
2011-05-09 17:34:00 +00:00
mav
3881dcb2ab Don't use MWAIT for short sleeps under XEN, as it was before r212541.
This fixes panic during boot in PV mode on Xen 3.2.
2011-05-07 12:27:25 +00:00
avg
777b49b2a5 prepare code that does topology detection for amd cpus for bulldozer
This also introduces a new detection path for family 10h and newer
pre-bulldozer cpus, pre-10h hardware should not be affected.

Tested by:	Gary Jennejohn <gljennjohn@googlemail.com>
		(with pre-10h hardware)
MFC after:	2 weeks
2011-05-06 13:51:54 +00:00
jhb
f4c1badc8d Enable the new PCI-PCI bridge driver on amd64 and i386 by default. It can
be disabled via 'nooptions NEW_PCIB'.
2011-05-03 18:23:11 +00:00
jhb
51bd96b572 Reimplement how PCI-PCI bridges manage their I/O windows. Previously the
driver would verify that requests for child devices were confined to any
existing I/O windows, but the driver relied on the firmware to initialize
the windows and would never grow the windows for new requests.  Now the
driver actively manages the I/O windows.

This is implemented by allocating a bus resource for each I/O window from
the parent PCI bus and suballocating that resource to child devices.  The
suballocations are managed by creating an rman for each I/O window.  The
suballocated resources are mapped by passing the bus_activate_resource()
call up to the parent PCI bus.  Windows are grown when needed by using
bus_adjust_resource() to adjust the resource allocated from the parent PCI
bus.  If the adjust request succeeds, the window is adjusted and the
suballocation request for the child device is retried.

When growing a window, the rman_first_free_region() and
rman_last_free_region() routines are used to determine if the front or
end of the existing I/O window is free.  From using that, the smallest
ranges that need to be added to either the front or back of the window
are computed.  The driver will first try to grow the window in whichever
direction requires the smallest growth first followed by the other
direction if that fails.

Subtractive bridges will first attempt to satisfy requests for child
resources from I/O windows (including attempts to grow the windows).  If
that fails, the request is passed up to the parent PCI bus directly
however.

The PCI-PCI bridge driver will try to use firmware-assigned ranges for
child BARs first and only allocate a "fresh" range if that specific range
cannot be accommodated in the I/O window.  This allows systems where the
firmware assigns resources during boot but later wipes the I/O windows
(some ACPI BIOSen are known to do this) to "rediscover" the original I/O
window ranges.

The ACPI Host-PCI bridge driver has been adjusted to correctly honor
hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge
makes a wildcard request for an I/O window range.

The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option
is enabled.  This is a transition aide to allow platforms that do not
yet support bus_activate_resource() and bus_adjust_resource() in their
Host-PCI bridge drivers (and possibly other drivers as needed) to use the
old driver for now.  Once all platforms support the new driver, the
kernel option and old driver will be removed.

PR:		kern/143874 kern/149306
Tested by:	mav
2011-05-03 17:37:24 +00:00
bschmidt
07db5669ea All PCI based wireless drivers seem to be explicitly removed from the
PAE kernel config, do that also for those added to GENERIC lately.
2011-05-02 16:51:02 +00:00
jhb
3e97a80649 Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver,
generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge
drivers.
2011-05-02 14:13:12 +00:00
bschmidt
bee0509ed3 Add the remaining wireless drivers.
Discussed with:	joel
2011-05-01 13:26:34 +00:00
kevlo
de19e5a7fe Add urtw(4) 2011-04-29 06:36:39 +00:00
jkim
369bfa0af2 Define "Hypervisor Present" bit. This bit is used by several hypervisors to
identify CPUs running under emulation.  Currently QEMU-KVM, Xen-HVM, VMware,
and MS Hyper-V are known to set this bit.

MFC after:	3 days
2011-04-28 22:23:39 +00:00
attilio
d685681d59 Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by:	Sandvine Incorporated
Discussed with:	emaste, des
MFC after:	2 weeks
2011-04-28 16:02:05 +00:00
rmacklem
66b402e198 This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.
2011-04-27 17:51:51 +00:00
mav
519a30551e - Add shim to simplify migration to the CAM-based ATA. For each new adaX
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
 - To know what behavior to mimic, restore ATA_STATIC_ID option in cases
where it was present before.
 - Add some more details to UPDATING.
2011-04-26 17:01:49 +00:00
rmacklem
8d09f58549 Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.

Reviewed by:	jhb
MFC after:	2 weeks
2011-04-25 22:22:51 +00:00
mav
512a6cd715 Switch the GENERIC kernels for all architectures to the new CAM-based ATA
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them respectively (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).

ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load geom_raid kernel module and use graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.
2011-04-24 08:58:58 +00:00
jkim
cc6bebd7b6 Do not invoke resume event handlers if suspend was successful.
Pointy hat to:	jkim
2011-04-19 16:30:17 +00:00
jkim
8300589337 Add suspend/resume event handlers for apm(4) as well. 2011-04-19 16:20:55 +00:00
kib
01863c3790 Make pmap_invalidate_cache_range() available for consumption on amd64.
Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for the set of pages, which are not neccessary mapped. Since its
supposed use is to prepare the move of the pages ownership to a device
that does not snoop all CPU accesses to the main memory (read GPU in
GMCH), do not rely on CPU self-snoop feature.

amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2011-04-18 21:24:42 +00:00
jkim
7d94642dc2 Add a function rdtsc32() to read lower 32 bits from TSC and discard upper
32 bits.  Some times compiler inserts unnecessary instructions to preserve
unused upper 32 bits even when it is casted to a 32-bit value.  It reduces
such compiler mistakes where every cycle counts.
2011-04-14 16:53:32 +00:00
jkim
8ada5a0bae Consistently use __volatile as the rest of this file. 2011-04-14 16:19:41 +00:00
jkim
9b08b2f085 Consistently use C99 standard integers as the rest of this file. 2011-04-14 16:02:52 +00:00
jkim
11b920013e Reduce errors in effective frequency calculation. 2011-04-12 23:49:07 +00:00
jkim
218c7113ed Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and
MPERF MSRs are available.  It was disabled in r216443.  Remove the earlier
hack to subtract 0.5% from the calibrated frequency as DELAY(9) is little
bit more reliable now.
2011-04-12 23:04:01 +00:00
jkim
69382ad692 Add forgotten declarations for tsc_perf_stat from the previous commit. 2011-04-12 22:22:01 +00:00
jkim
8eb15cd79a Probe capability to find effective frequency. When the TSC is P-state
invariant, APERF/MPERF ratio can be used to find effective frequency.
2011-04-12 22:15:46 +00:00
jkim
df8e7b4e4c Add definitions for CPUID instruction 6, ECX information. 2011-04-12 22:12:23 +00:00
rstone
cd5ddda20d Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi
and machdep.kdb_on_nmi.

Approved by:	emaste (mentor)
MFC after:	1 week
2011-04-08 14:39:41 +00:00
jkim
95c723445e Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
jkim
0c7a0c810c Implement atomic_load_acq_64(9) and atomic_store_rel_64(9) for i386. These
functions are implemented with CMPXCHG8B instruction where it is available,
i. e., all Pentium-class and later processors.  Note this instruction is
also used for atomic_store_rel_64() because a simple XCHG-like instruction
for 64-bit memory access does not exist, unfortunately.  If the processor
lacks the instruction, i. e., 80486-class CPUs, two 32-bit load/store are
performed with interrupt temporarily disabled, assuming it does not support
SMP.  Although this assumption may be little naive, it is true in reality.
This implementation is inspired by Linux.
2011-04-06 23:59:59 +00:00
trasz
92bec9b84c Add accounting for most of the memory-related resources.
Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib (earlier version)
2011-04-05 20:23:59 +00:00
jkim
9ce8e5e965 Use cpu_ticks() for get_cyclecount(9) rather than checking existence of TSC
at run-time on i386.  cpu_ticks() is set to use RDTSC early enough on i386
where it is available.  Otherwise, cpu_ticks() is driven by the current
timecounter hardware as binuptime(9) does.  This also avoids unnecessary
namespace pollution from <machine/cputypes.h>.
2011-04-04 22:56:33 +00:00
avg
94ec7d2988 Revert r220032:linux compat: add SO_PASSCRED option with basic handling
I have not properly thought through the commit.  After r220031 (linux
compat: improve and fix sendmsg/recvmsg compatibility) the basic
handling for SO_PASSCRED is not sufficient as it breaks recvmsg
functionality for SCM_CREDS messages because now we would need to handle
sockcred data in addition to cmsgcred.  And that is not implemented yet.

Pointyhat to:	avg
2011-03-31 08:14:51 +00:00
adrian
6f4c1d61a6 Break out the ath PCI logic into a separate device/module.
Introduce the AHB glue for Atheros embedded systems. Right now it's
hard-coded for the AR9130 chip whose support isn't yet in this HAL;
it'll be added in a subsequent commit.

Kernel configuration files now need both 'ath' and 'ath_pci' devices; both
modules need to be loaded for the ath device to work.
2011-03-31 08:07:13 +00:00
avg
df7a39b1d0 linux compat: add SO_PASSCRED option with basic handling
This seems to have been a part of a bigger patch by dchagin that either
haven't been committed or committed partially.

Submitted by:	dchagin, nox
MFC after:	2 weeks
2011-03-26 11:25:36 +00:00
avg
ae4ae2c803 linux compat: add non-dummy capget and capset system calls, regenerate
And drop dummy definitions for those system calls.
This may transiently break the build.

PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 10:59:24 +00:00
avg
b49c51915d linux compat: add non-dummy capget and capset system calls
PR:		kern/149168
Submitted by:	John Wehle <john@feith.com>
Reviewed by:	netchild
MFC after:	2 weeks
2011-03-26 10:51:56 +00:00
dchagin
7a5ef72838 Export the correct AT_PLATFORM value.
Since signal trampolines are copied to the shared page do not need to
leave place on the stack for it. Forgotten in the previous commit.

MFC after:	1 Week
2011-03-26 09:25:35 +00:00
jkim
c5c94c9d77 Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta
CPUs.  These CPUs need explicit MSR configuration to expose ceratin CPU
capabilities (e.g., CMPXCHG8B) to work around compatibility issues with
ancient software.  Unfortunately, Rise mP6 does not set the CX8 bit in CPUID
and there is no MSR to expose the feature although all mP6 processors are
capable of CMPXCHG8B according to datasheets I found from the Net.  Clean up
and simplify VIA PadLock detection while I am in the neighborhood.
2011-03-26 02:02:07 +00:00
alc
c84b8f6e0c Modestly increase the maximum allowed size of the kmem map on i386.
Also, express this new maximum as a fraction of the kernel's address
space size rather than a constant so that increasing KVA_PAGES will
automatically increase this maximum.  As a side-effect of this change,
kern.maxvnodes will automatically increase by a proportional amount.

While I'm here ensure that this change doesn't result in an unintended
increase in maxpipekva on i386.  Calculate maxpipekva based upon the
size of the kernel address space and the amount of physical memory
instead of the size of the kmem map.  The memory backing pipes is not
allocated from the kmem map.  It is allocated from its own submap of
the kernel map.  In short, it has no real connection to the kmem map.
(In fact, the commit messages for the maxpipekva auto-sizing talk
about using the kernel map size, cf. r117325 and r117391, even though
the implementation actually used the kmem map size.)  Although the
calculation is now done differently, the resulting value for
maxpipekva should remain almost the same on i386.  However, on amd64,
the value will be reduced by 2/3.  This is intentional.  The recent
change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had
the unnecessary side-effect of increasing maxpipekva.  This change is
effectively restoring maxpipekva on amd64 to its prior value.

Eliminate init_param3() since it is no longer used.
2011-03-23 16:38:29 +00:00
jeff
2d7d8c05e7 - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
bz
c41eae2d13 For now remove options FLOWTABLE from the remaining GENERIC kernel
configurations and make it opt-in for those who want it.  LINT will
still build it.

While it may be a perfect win in some scenarios, it still troubles users
(see PRs) in general cases.  In addition we are still allocating resources
even if disabled by sysctl and still leak arp/nd6 entries in case of
interface destruction.

Discussed with:	qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR:		kern/148018, kern/155604, kern/144917, kern/146792
MFC after:	2 weeks
2011-03-19 15:50:34 +00:00
jkim
881681406b Rework r219679. Always check CPU class at run-time to make it predictable.
Unfortunately, it pulls in <machine/cputypes.h> but it is small enough and
namespace pollution is minimal, I hope.

Pointed out by:	bde
Pointy hat:	jkim
2011-03-16 16:09:08 +00:00
jkim
8060d27e7b Partially revert r219672. After r198295, kernel need to seed randomness as
soon as possible for stack protector.  However, dummy timecounter does not
have enough entropy and we don't need to sacrifice Pentium class and later.

Pointed out by:	Maxim Dounin (mdounin at mdounin dot ru)
2011-03-15 21:45:10 +00:00
jkim
2108e6a856 Remove tsc_present from this file, really. 2011-03-15 18:09:29 +00:00
jkim
ad8ef5e4c7 Deprecate tsc_present as the last of its real consumers finally disappeared. 2011-03-15 17:19:52 +00:00
jkim
d3440080b0 Unconditionally use binuptime(9) for get_cyclecount(9) on i386. Since this
function is almost exclusively used for random harvesting, there is no need
for micro-optimization.  Adjust the manual page accordingly.
2011-03-15 17:14:26 +00:00
jkim
193712e7cf Make get_cyclecount(9) little bit more useful where binuptime(9) is used. 2011-03-14 23:30:14 +00:00