Commit Graph

12317 Commits

Author SHA1 Message Date
Christian Brueffer
87e5ba0b54 Fix XEN build, broken in r237924.
Reported by:	gcooper
Pointy hat:	brueffer
2012-07-02 14:03:19 +00:00
Christian Brueffer
593a9dc9b8 Replace an unreachable panic() in vm86_getptr (been there for 13 years) with
a KASSERT() behind the functions's only consumer.

Suggested by:	kib
Reviewed by:	kib
CID:		4494
Found with:	Coverity Prevent(tm)
MFC after:	2 weeks
2012-07-01 12:59:00 +00:00
Alan Cox
e30df26e7b Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.
2012-06-27 03:45:25 +00:00
Konstantin Belousov
a30facd9c7 Commit changes missed from r237435. Properly calculate the signal
trampoline addresses after the shared page is enabled.  Handle FreeBSD
ABIs without shared page support too.

Reported and tested by:	David Wolfskill <david catwhisker org>
	 (previous version)
Pointy hat to: kib
MFC after:   1 month
2012-06-22 16:05:56 +00:00
Konstantin Belousov
d69ae4126b Enable shared page on i386, now it has a use for vdso_timehands.
MFC after:	1 month
2012-06-22 07:16:29 +00:00
Konstantin Belousov
aea810386d Implement mechanism to export some kernel timekeeping data to
usermode, using shared page.  The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.

The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows to turn off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.

Minimal stubs neccessary for non-x86 architectures to still compile
are provided.

Discussed with:	bde
Reviewed by:	jhb
Tested by:	flo
MFC after:	1 month
2012-06-22 07:06:40 +00:00
Konstantin Belousov
232aa31fb9 Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to
timekeeping information.

MFC after:  1 week
2012-06-22 06:38:31 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
Alan Cox
6031c68de4 The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer.  This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by:	kib
MFC after:	6 weeks
2012-06-16 18:56:19 +00:00
Adrian Chadd
83567110bd Oops - use the actual 11n enable option. 2012-06-15 15:32:16 +00:00
Adrian Chadd
3342d83059 Ok, ok. 802.11n can be on by default in GENERIC in -HEAD.
God help me.
2012-06-15 02:16:29 +00:00
Jung-uk Kim
acd7df97cc - Fix resumectx() prototypes to reflect reality.
- For i386, simply jump to resumectx() with PCB in %ecx.
- Fix a style(9) nit while I am here.
2012-06-13 21:03:01 +00:00
Mitsuru IWASAKI
77c80e2e5b Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c
as ipi_startup().
2012-06-12 00:14:54 +00:00
Mitsuru IWASAKI
8a6c6fadc7 Some fixes for r236772.
- Remove cpuset stopped_cpus which is no longer used.
- Add a short comment for cpuset suspended_cpus clearing.
- Fix the un-ordered x86/acpica/acpi_wakeup.c in conf/files.amd64 and i386.

Pointed-out by:	attilio@
2012-06-10 02:38:51 +00:00
Mitsuru IWASAKI
fb864578af Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of
suspend/resume procedures are minimized among them.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
  among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
  amd64 code.

Reviewed by:	attilio@, acpi@
2012-06-09 00:37:26 +00:00
Alan Cox
23c0d041ba Various small changes to PV entry management:
Constify pc_freemask[].

pmap_pv_reclaim()
  Eliminate "freemask" because it was a pessimization.  Add a comment about
  the resident count adjustment.

free_pv_entry() [i386 only]
  Merge an optimization from amd64 (r233954).

get_pv_entry()
  Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
  (The right strategy needs more thought.  Moreover, there were unintended
  differences between the amd64 and i386 implementation.)

pmap_remove_pages()
  Eliminate unnecessary ()'s.
2012-06-04 03:51:08 +00:00
Andriy Gapon
7adc598a15 free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG
Those calls are useful with hardware watchdog drivers too.

MFC after:	3 weeks
2012-06-03 08:01:12 +00:00
Alan Cox
0d6f49d84a Isolate the global pv list lock from data and other locks to prevent false
sharing within the cache.
2012-06-02 22:14:10 +00:00
Konstantin Belousov
fa9f322df9 Use plain store for atomic_store_rel on x86, instead of implicitly
locked xchg instruction.  IA32 memory model guarantees that store has
release semantic, since stores cannot pass loads or stores.

Reviewed by:	  bde, jhb
Tested by:	  pho
MFC after:	  2 weeks
2012-06-02 18:10:16 +00:00
Jung-uk Kim
9ad569771a Consistently use ACPI_SUCCESS() and ACPI_FAILURE() macros wherever possible. 2012-06-01 21:33:33 +00:00
Jung-uk Kim
db08ae007d Tidy up code clutter in SMP case a bit. No functional change. 2012-06-01 19:19:04 +00:00
Jung-uk Kim
108705d043 Call AcpiSetFirmwareWakingVector() with interrupt disabled for consistency. 2012-06-01 18:18:48 +00:00
Jung-uk Kim
d3638dc4de Improve style(9) in the previous commit. 2012-06-01 17:07:52 +00:00
Mitsuru IWASAKI
f0a101b7e2 Call AcpiLeaveSleepStatePrep() in interrupt disabled context
(described in ACPICA source code).

- Move intr_disable() and intr_restore() from acpi_wakeup.c to acpi.c
  and call AcpiLeaveSleepStatePrep() in interrupt disabled context.
- Add acpi_wakeup_machdep() to execute wakeup MD procedures and call
  it twice in interrupt disabled/enabled context (ia64 version is
  just dummy).
- Rename wakeup_cpus variable in acpi_sleep_machdep() to suspcpus in
  order to be shared by acpi_sleep_machdep() and acpi_wakeup_machdep().
- Move identity mapping related code to acpi_install_wakeup_handler()
  (i386 version) for preparation of x86/acpica/acpi_wakeup.c
  (MFC candidate).

Reviewed by:	jkim@
MFC after:	2 days
2012-06-01 15:26:32 +00:00
Alan Cox
d85fbe8a91 Eliminate code duplication in free_pv_entry() and pmap_remove_pages() by
introducing free_pv_chunk().
2012-06-01 04:26:50 +00:00
Alan Cox
a2efa4249e Eliminate some purely stylistic differences among the amd64, i386 native,
and i386 xen PV entry allocators.
2012-05-30 04:16:54 +00:00
Alan Cox
0490d34982 MFi386 pmap r233433
Disable detailed PV entry accounting by default.  (A config option for
  enabling it was already introduced in r233433.)
2012-05-29 16:11:15 +00:00
Alan Cox
6516bffdef Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues.  Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero.  However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed.  The new
pmap_pv_reclaim() doesn't even try.

Tested by:	sbruno
MFC after:	5 weeks
2012-05-29 15:41:20 +00:00
Kevin Lo
544c5e5b53 Make sure that each va_start has one and only one matching va_end,
especially in error cases.
2012-05-29 01:48:06 +00:00
Alan Cox
4edfd622b5 Update a comment in get_pv_entry() to reflect the changes to the
synchronization of pv_vafree in r236158.
2012-05-28 17:35:23 +00:00
Alan Cox
8b0f4e0a0d Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c.  This new r/w lock is used primarily to synchronize access
to the PV lists.  However, it will be used in a somewhat unconventional
way.  As finer-grained PV list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

X-MFC after:	r236045
2012-05-27 16:24:00 +00:00
Alan Cox
33853281b4 Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues.  Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero.  However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed.  The new
pmap_pv_reclaim() doesn't even try.

MFC after:	5 weeks
2012-05-26 06:10:25 +00:00
Bjoern A. Zeeb
920b965865 MFp4 bz_ipv6_fast:
in_cksum.h required ip.h to be included for struct ip.  To be
  able to use some general checksum functions like in_addword()
  in a non-IPv4 context, limit the (also exported to user space)
  IPv4 specific functions to the times, when the ip.h header is
  present and IPVERSION is defined (to 4).

  We should consider more general checksum (updating) functions
  to also allow easier incremental checksum updates in the L3/4
  stack and firewalls, as well as ponder further requirements by
  certain NIC drivers needing slightly different pseudo values
  in offloading cases.  Thinking in terms of a better "library".

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-24 22:00:48 +00:00
Alan Cox
4e65634580 MF amd64 r233097, r233122
With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists or
PMAP1/PADDR1.

Style fix to pmap_protect().
2012-05-24 15:25:35 +00:00
Konstantin Belousov
ccc00630ae Enable drm2 modules build.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
2012-05-23 21:07:01 +00:00
Mitsuru IWASAKI
fe756f2a59 Remove cpususpend IDT vector for XEN.
This broke XEN kernel building.
2012-05-20 08:17:20 +00:00
Mitsuru IWASAKI
29d8e665ba Revert the previous commit on wakecode address verbose printing.
This broke PAE kernel building.
2012-05-19 02:31:38 +00:00
Mitsuru IWASAKI
e3fd0bc1b2 Add SMP/i386 suspend/resume support.
Most part is merged from amd64.

- i386/acpica/acpi_wakecode.S
Replaced with amd64 code (from realmode to paging enabling code).

- i386/acpica/acpi_wakeup.c
Replaced with amd64 code (except for wakeup_pagetables stuff).

- i386/include/pcb.h
- i386/i386/genassym.c
Added PCB new members (CR0, CR2, CR4, DS, ED, FS, SS, GDT, IDT, LDT
and TR) needed for suspend/resume, not for context switch.

- i386/i386/swtch.s
Added suspendctx() and resumectx().
Note that savectx() was not changed and used for suspending (while
amd64 code uses it).
BSP and AP execute the same sequence, suspendctx(), acpi_wakecode()
and resumectx() for suspend/resume (in case of UP system also).

- i386/i386/apic_vector.s
Added cpususpend().

- i386/i386/mp_machdep.c
- i386/include/smp.h
Added cpususpend_handler().

- i386/include/apicvar.h
- kern/subr_smp.c
- sys/smp.h
Added IPI_SUSPEND and suspend_cpus().

- i386/i386/initcpu.c
- i386/i386/machdep.c
- i386/include/md_var.h
- pc98/pc98/machdep.c
Moved initializecpu() declarations to md_var.h.

MFC after:	3 days
2012-05-18 18:55:58 +00:00
John Baldwin
424e69759c Centralize declaration of the debug.acpi sysctl node. 2012-05-17 17:58:53 +00:00
Andriy Gapon
99a312d048 i386 bootinfo: re-arrange EFI fields for natural alignment and packing
Suggested by:	bde
MFC after:	2 weeks
2012-05-13 09:25:39 +00:00
Alexander Motin
c078c18853 Add options GEOM_RAID into i386 and amd64 GENERIC kernels.
ataraid(4) previously was present there and having GEOM RAID is convinient.
Unlike other classes GEOM RAID can be set up from BIOS before install and
users are expecting it to be detected automatically.
2012-05-10 12:37:32 +00:00
Brooks Davis
b3a397a8de The DDB_CTF has little or nothing to do with the debugger so move it
next KDTRACE_HOOKS.
2012-05-09 01:37:48 +00:00
Alexander Leidinger
19e252baeb - >500 static DTrace probes for the linuxulator
- DTrace scripts to check for errors, performance, ...
  they serve mostly as examples of what you can do with the static probe;s
  with moderate load the scripts may be overwhelmed, excessive lock-tracing
  may influence program behavior (see the last design decission)

Design decissions:
 - use "linuxulator" as the provider for the native bitsize; add the
   bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
 - Add probes only for locks which are acquired in one function and released
   in another function. Locks which are aquired and released in the same
   function should be easy to pair in the code, inter-function
   locking is more easy to verify in DTrace.
 - Probes for locks should be fired after locking and before releasing to
   prevent races (to provide data/function stability in DTrace, see the
   man-page of "dtrace -v ..." and the corresponding DTrace docs).
2012-05-05 19:42:38 +00:00
Attilio Rao
b8be27bf29 Revert part of r234723 by re-enabling the SMP protection for
intr_bind() on x86.
This has been requested by jhb and I strongly disagree with this,
but as long as he is the x86 and interrupt subsystem maintainer I will
follow his directives.

The disagreement cames from what we should really consider as a
public KPI. IMHO, if we really need a selection between the kernel
functions, we may need an explicit protection like _KERNEL_KPI, which
defines which subset of the kernel function might really be considered
as part of the KPI (for thirdy part modules) and which not.
As long as we don't have this mechanism I just consider any possible
function as usable by thirdy part code, thus intr_bind() included.

MFC after:	1 week
2012-05-03 21:44:01 +00:00
Dimitry Andric
460378bf13 Add a convenience macro for the returns_twice attribute, and apply it to
the prototypes of the appropriate functions (getcontext, savectx,
setjmp, sigsetjmp and vfork).

MFC after:	2 weeks
2012-04-29 11:04:31 +00:00
Attilio Rao
70dbd1604c Clean up the intr* MD KPI from the SMP dependency, removing a cause of
discrepancy between modules and kernel, but deal with SMP differences
within the functions themselves.

As an added bonus this also helps in terms of code readability.

Requested by:	gibbs
Reviewed by:	jhb, marius
MFC after:	1 week
2012-04-26 20:24:25 +00:00
Brooks Davis
e9acaa9ae4 Enable DTrace hooks in GENERIC.
Reviewed by:	gnn
Approved by:	core (jhb, imp)
Requested by:	a cast of thousands
MFC after:	3 days
2012-04-20 21:37:42 +00:00
Jung-uk Kim
17b27db088 Regen for r234359. 2012-04-16 23:17:29 +00:00
Jung-uk Kim
f69f4d8630 Correct an argument type of iopl syscall for Linuxulator. This also fixes
a warning from Clang, i. e., "args->level < 0 is always false".
2012-04-16 23:16:18 +00:00
Jung-uk Kim
13fa650c75 Regen for r234357. 2012-04-16 22:59:51 +00:00