Commit Graph

12761 Commits

Author SHA1 Message Date
jkim
de579b60c1 Simplify check for CMPXCHG8B instruction. Note CMPXCHG8B instruction is
always available for Rise mP6 processors although it is not set by CPUID.
2013-08-15 21:09:05 +00:00
jkim
20df47e877 Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these
two files were functionally identical.
2013-08-13 22:05:10 +00:00
jkim
c55a3ec3ad Tidy up global locks for ACPICA. There is no functional change. 2013-08-13 21:34:03 +00:00
kib
4675fcfce0 Different consumers of the struct vm_page abuse pageq member to keep
additional information, when the page is guaranteed to not belong to a
paging queue.  Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked.  See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members.  Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().

Requested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-10 17:36:42 +00:00
attilio
e9f37cac74 On all the architectures, avoid to preallocate the physical memory
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid to pre-allocate
the KVA for such nodes.

In order to do so make the operations derived from vm_radix_insert()
to fail and handle all the deriving failure of those.

vm_radix-wise introduce a new function called vm_radix_replace(),
which can replace a leaf node, already present, with a new one,
and take into account the possibility, during vm_radix_insert()
allocation, that the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed
vm_radix_insert() will start from scratch again.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc (older version)
Reviewed by:	jeff
Tested by:	pho, scottl
2013-08-09 11:28:55 +00:00
attilio
16c7563cf4 The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
  and vm_page_grab are being executed.  This will be very helpful
  once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff, kib
Tested by:	gavin, bapt (older version)
Tested by:	pho, scottl
2013-08-09 11:11:11 +00:00
avg
6f05d223ad follow up to r254051
- update powerpc/GENERIC64 as well, suggested by mdf
- update comments so that they make sense after the change, suggested by
  jhb

X-MFC after:	never (change specific to head)
2013-08-09 08:11:09 +00:00
avg
0255b98224 enable KDB_TRACE in GENERICs
KDB_TRACE is not an alternative to DDB/etc, they are complementary.
So I do not see any reason to not enable KDB_TRACE by default.

X-MFC after:	never (change specific to head)
2013-08-07 08:03:50 +00:00
jeff
de4ecca213 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
jeff
baea70e2d2 - Introduce a specific function, pmap_remove_kernel_pde, for removing
huge pages in the kernel's address space.  This works around several
   asserts from pmap_demote_pde_locked that did not apply and gave false
   warnings.

Discovered by:	pho
Reviewed by:	alc
Sponsored by:	EMC / Isilon Storage Division
2013-08-05 00:28:03 +00:00
obrien
7999076e3e Back out r253779 & r253786. 2013-07-31 17:21:18 +00:00
obrien
721ce839c7 Decouple yarrow from random(4) device.
* Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option.
  The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow.

* random(4) device doesn't really depend on rijndael-*.  Yarrow, however, does.

* Add random_adaptors.[ch] which is basically a store of random_adaptor's.
  random_adaptor is basically an adapter that plugs in to random(4).
  random_adaptor can only be plugged in to random(4) very early in bootup.
  Unplugging random_adaptor from random(4) is not supported, and is probably a
  bad idea anyway, due to potential loss of entropy pools.
  We currently have 3 random_adaptors:
  + yarrow
  + rdrand (ivy.c)
  + nehemeiah

* Remove platform dependent logic from probe.c, and move it into
  corresponding registration routines of each random_adaptor provider.
  probe.c doesn't do anything other than picking a specific random_adaptor
  from a list of registered ones.

* If the kernel doesn't have any random_adaptor adapters present then the
  creation of /dev/random is postponed until next random_adaptor is kldload'ed.

* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
  system wide one.

Submitted by: arthurmesh@gmail.com, obrien
Obtained from: Juniper Networks
Reviewed by: obrien
2013-07-29 20:26:27 +00:00
avg
4e6c4b2a36 Revert r253748,253749
This WIP should not have been committed yet.

Pointyhat to:	avg
2013-07-28 18:44:17 +00:00
avg
ea04b46439 put contents of cpu.h under _KERNEL
no userland-serviceable parts inside

MFC after:	20 days
2013-07-28 18:32:27 +00:00
avg
425573ccba x86: detect mwait capabilities and extensions, when present
Reviewed by:	kib (earlier amd64-only version)
MFC after:	2 weeks
2013-07-28 17:54:42 +00:00
jeff
c8d343ae8b - Use kmem_malloc rather than kmem_alloc() for GDT/LDT/tss allocations etc.
This eliminates some unusual uses of that API in favor of more typical
   uses of kmem_malloc().

Discussed with:	kib/alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-07-26 19:06:14 +00:00
ae
998db3fe4a Include sys/systm.h after sys/param.h.
Suggested by:	pluknet
2013-07-15 15:40:57 +00:00
glebius
8f9d71bb1b Nuke mbstat. It wasn't used for mbuf statistics since FreeBSD 5.
Now that r253351 moved sendfile() stats to a separate struct, the
last field used in mbstat is m_mcfail, which is updated, but never
read or obtained from userland.
2013-07-15 12:18:36 +00:00
ae
6f8e41d6cb Introduce new structure sfstat for collecting sendfile's statistics
and remove corresponding fields from struct mbstat. Use PCPU counters
and SFSTAT_INC() macro for update these statistics.

Discussed with:	glebius
2013-07-15 06:16:57 +00:00
kib
b39f834413 Create a proper stack frame for i386 version of bcopy(), despite the
function is leaf.  The frame allows ddb to not loose the direct caller
of bcopy() in backtrace.

Other functions from support.s would benefit from the same change as
well, but for now bcopy() is the most frequent offender.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-07-13 19:42:52 +00:00
kib
ba3ad070c0 Explicitely panic instead of possibly doing undefined things when
ptelist KVA is exhausted.  Currently this cannot happen, the added
panic serves as assert.

Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
2013-07-11 05:15:30 +00:00
kib
79797fb649 MFamd64 r253140:
Clear m->object for the page taken from the delayed free list in
pmap_pv_reclaim().

Noted by:	alc
2013-07-11 05:10:36 +00:00
delphij
15305ee17a Import HighPoint DC Series Data Center HBA (DC7280 and R750) driver.
This driver works for FreeBSD/i386 and FreeBSD/amd64 platforms.

Many thanks to HighPoint for providing this driver.

MFC after:	1 day
2013-07-06 07:49:41 +00:00
kib
8a0279994d Fix issues with zeroing and fetching the counters, on x86 and ppc64.
Issues were noted by Bruce Evans and are present on all architectures.

On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32.  If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.

On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot.  The effect would be a counter going off by
arbitrary value after zeroing.  Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive.  On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.

PowerPC64 is treated the same as amd64.

For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching.  It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm).  If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.

Noted by:  bde
Sponsored by:	The FreeBSD Foundation
2013-07-01 02:48:27 +00:00
jkim
70506a4dd6 Move definitions required by userland applications out of acpica_machdep.h. 2013-06-27 00:22:40 +00:00
kib
03abf664bd Some clarifications and updates for the comments, mostly retrieved
from Bruce Evans.  Trim the trailing spaces.

MFC after:	1 week
2013-06-19 05:05:16 +00:00
gibbs
16fc3a3c4b Adjust i386 Xen PV support for updated Xen interface files.
sys/i386/include/xen/xenvar.h:
sys/i386/xen/xen_machdep.c:
sys/xen/interface/foreign/structs.py:
sys/xen/evtchn/evtchn.c:
	MAX_VIRT_CPUS => XEN_LEGACY_MAX_VCPUS

Submitted by:	Roger Pau Monné
Reviewed by:	gibbs
2013-06-17 01:43:07 +00:00
gibbs
d4000dfe1d Upgrade Xen interface headers to Xen 4.2.1.
Move FreeBSD from interface version 0x00030204 to 0x00030208.
Updates are required to our grant table implementation before we
can bump this further.

sys/xen/hvm.h:
	Replace the implementation of hvm_get_parameter(), formerly located
	in sys/xen/interface/hvm/params.h.  Linux has a similar file which
	primarily stores this function.

sys/xen/xenstore/xenstore.c:
	Include new xen/hvm.h header file to get hvm_get_parameter().

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
	Correctly protect function definition and variables from being
	included into assembly files in xen-os.h

	Xen memory barriers are now prefixed with "xen_" to avoid conflicts
	with OS native primatives.  Define Xen memory barriers in terms of
	the native FreeBSD primatives.

Sponsored by:	Spectra Logic Corporation
Reviewed by:	Roger Pau Monné
Tested by:	Roger Pau Monné
Obtained from:	Roger Pau Monné (bug fixes)
2013-06-14 23:43:44 +00:00
jeff
7ee88fb112 - Add a BIT_FFS() macro and use it to replace cpusetffs_obj()
Discussed with:	attilio
Sponsored by:	EMC / Isilon Storage Division
2013-06-13 20:46:03 +00:00
kib
17884c1f07 Assert that interrupts are enabled in the trap handlers on x86 before
calling generic code to deliver signals.

Discussed with:	bde
Tested by:	pho
MFC after:	1 week
2013-06-03 17:40:05 +00:00
kib
ce0f25b29f MFamd64: when printing the trap information, show the %esp value.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-06-03 04:19:21 +00:00
kib
eaa828d3a5 Use slightly more idiomatic expression to get the address of array.
Tested by:	dim, pgj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:39:39 +00:00
kib
da32b7c314 When handling an exception from the attempt from loading the faulting
context on return from the trap handler, re-enable the interrupts on
i386 and amd64.  The trap return path have to disable interrupts since
the sequence of loading the machine state is not atomic.  The trap()
function which transfers the control to the special handler would
enable the interrupt, but an iret loads the previous eflags with PSL_I
clear.  Then, the special handler calls trap() on its own, which now
sees the original eflags with PSL_I set and does not enable
interrupts.

The end result is that signal delivery and process exiting code could
be executed with interrupts disabled, which is generally wrong and
triggers several assertions.

For amd64, the interrupts are enabled conditionally based on PSL_I in
the eflags of the outer frame, as it is already done for
doreti_iret_fault.  For i386, the interrupts are enabled
unconditionally, the ast loop could have opened a window with
interrupts enabled just before the iret anyway.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-27 18:26:08 +00:00
achim
10ab667100 Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver.
Approved by:	scottl (mentor)
2013-05-24 09:22:43 +00:00
attilio
fdf82ef9cf o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
  to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
  operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
  mostl of the times only with readlocks.

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc
2013-05-21 20:38:19 +00:00
marcel
65b2bbd1ff Add basic support for FDT to i386 & amd64. This change includes:
1.  Common headers for fdt.h and ofw_machdep.h under x86/include
    with indirections under i386/include and amd64/include.
2.  New modinfo for loader provided FDT blob.
3.  Common x86_init_fdt() called from hammer_time() on amd64 and
    init386() on i386.
4.  Split-off FDT specific low-level console functions from FDT
    bus methods for the uart(4) driver. The low-level console
    logic has been moved to uart_cpu_fdt.c and is used for arm,
    mips & powerpc only. The FDT bus methods are shared across
    all architectures.
5.  Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
    fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from:	Juniper Networks, Inc.
2013-05-21 03:05:49 +00:00
ed
f489ad2fec Improve readability of static assertions for OFFSET_* macros.
Instead of doing all sorts of weird casting of constants to
pointer-pointers, simply use the standard C offsetof() macro to obtain
the offset of the respective fields in the structures.
2013-05-13 21:47:17 +00:00
peter
cec511ab65 Tidy up some CVS workarounds. 2013-05-12 01:53:47 +00:00
attilio
b24a52ec9e Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
tijl
1e5834db92 Remove redundant definitions of _ALIGN and _ALIGNBYTES. 2013-04-21 11:12:44 +00:00
gabor
5635c6550b - Correct spelling in comments
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:56:11 +00:00
gabor
3d6a89082d - Correct mispellings of word miscellaneous
Submitted by:	Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
2013-04-17 11:43:46 +00:00
kib
7117978a81 Fix the name of the pcb member in the comments.
Submitted by:	Oliver Pinter <oliver.pntr@gmail.com>
MFC after:	3 days
2013-04-13 15:20:33 +00:00
trasz
80b8b2f779 Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE'
and kern.cam.ctl.disable tunable; those were introduced as a workaround
to make it possible to boot GENERIC on low memory machines.

With ctl(4) being built as a module and automatically loaded by ctladm(8),
this makes CTL work out of the box.

Reviewed by:	ken
Sponsored by:	FreeBSD Foundation
2013-04-12 16:25:03 +00:00
kib
4eacc6be7b Pass the segmented address of the counter, based on %fs, i.e. offset
from the pcpu[0] to the counter base, instead of the linear address.
2013-04-09 17:55:39 +00:00
glebius
9cf64d6c35 Merge from projects/counters: counter(9).
Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.

See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.

In collaboration with:	kib
Reviewed by:		luigi
Tested by:		ae, ray
Sponsored by:		Nginx, Inc.
2013-04-08 19:40:53 +00:00
glebius
8c6eba117e Merge from projects/counters:
Pad struct pcpu so that its size is denominator of PAGE_SIZE. This
is done to reduce memory waste in UMA_PCPU_ZONE zones.

Sponsored by:	Nginx, Inc.
2013-04-08 19:19:10 +00:00
mav
7c2b81b0e9 Remove all legacy ATA code parts, not used since options ATA_CAM enabled in
most kernels before FreeBSD 9.0.  Remove such modules and respective kernel
options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam.  Remove the
atacontrol utility and some man pages.  Remove useless now options ATA_CAM.

No objections:	current@, stable@
MFC after:	never
2013-04-04 07:12:24 +00:00
kib
2c9ac98d06 Fix the VM_BCACHE_SIZE_MAX definition on i386 to match the maximal
buffer map size, auto-tuned on the 4GB machine.  Having the maxbcache
bigger than the buffer map causes the transient bio map sizing logic
to assume that there is enough KVA to use approximately 90MB (buffer
map is sized to 110MB, and maxbcache is 200MB).  The increase in the
KVA usage caused other big KVA consumers, like nvidia.ko, to fail the
initialization.

Change the definition for both PAE and non-PAE cases, since PAE is
even more KVA-starved.

Reported and tested by:	David Wolfskill
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
2013-03-27 10:52:18 +00:00
kib
7c26a038f9 Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA.  The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer.  For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer.  The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings.  Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags.  Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by:	The FreeBSD Foundation
Discussed with:	jeff (previous version)
Tested by:	pho, scottl (previous version), jhb, bf
MFC after:	2 weeks
2013-03-19 14:13:12 +00:00