Commit Graph

11550 Commits

Author SHA1 Message Date
Robert Watson
bd875f5f13 Remove MAC kernel config files and add "options MAC" to GENERIC, with the
goal of shipping 8.0 with MAC support in the default kernel.  No policies
will be compiled in or enabled by default, but it will now be possible to
load them at boot or runtime without a kernel recompile.

While the framework is not believed to impose measurable overhead when no
policies are loaded (a result of optimization over the past few months in
HEAD), we'll continue to benchmark and optimize as the release approaches.
Please keep an eye out for performance or functionality regressions that
could be a result of this change.

Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2009-06-02 18:31:08 +00:00
Dmitry Chagin
f8cd0af232 Implement accept4 syscall.
Approved by:	kib (mentor)
MFC after:	1 month
2009-06-01 20:48:39 +00:00
Robert Watson
33dd50646e Regenerate generated syscall files following changes to struct sysent in
r193234.
2009-06-01 16:14:38 +00:00
Adrian Chadd
c22ca7f04f Fix the MP IPI code to differentiate between bitmapped IPIs and function IPIs.
This attempts to fix the IPI handling code to correctly differentiate
between bitmapped IPIs and function IPIs. The Xen IPIs were on low numbers
which clashed with the bitmapped IPIs.

This commit bumps those IPI numbers up to 240 and above (just like in the i386
code) and fiddles with the ipi_vectors[] logic to call the correct function.

This still isn't "right". Specifically, the IPI code may work fine for TLB
shootdown events but the rendezvous/lazypmap IPIs are thrown by calling ipi_*()
routines which don't set the call_func stuff (function id, addr1, addr2) that
the TLB shootdown events are. So the Xen SMP support is still broken.

PR:		135069
2009-05-31 08:11:39 +00:00
Adrian Chadd
23ca223bfa Remove some unused code in ipi_selected() .
The code path this was copied from (sys/i386/i386/mp_machdep.c:ipi_selected())
handles bitmap'ed IPIs and normal IPIs via separate notification paths. Xen
SMP handles them the same way.
2009-05-31 07:25:24 +00:00
Adrian Chadd
eee45687e9 Even though I'm not quite sure that the call_func stuff will work properly
in all the places/cases IPI messages will be generated, at least be consistent
with how the call_data pointer is assigned and cleared (ie, all done inside
the spinlock.

Ensure that its NULL before continuing, just to try and identify situations
where things are going horribly wrong.
2009-05-30 15:20:25 +00:00
Adrian Chadd
d68bbb81fb Don't schedule a CALL_FUNCTION_VECTOR software IPI if the IPI was signaled
via the bitmap (and thus sent via RESCHEDULE_VECTOR.)
2009-05-30 14:59:08 +00:00
Adrian Chadd
a235aceb1a Correctly report the IPI IRQs being created; make it clear what vectors they are for. 2009-05-30 06:37:03 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Adrian Chadd
f3ba9cc983 Revert to 2-clause. 2009-05-29 13:48:42 +00:00
Adrian Chadd
fd27593538 Fix the Xen TOD update when the hypervisor wall clock is nudged.
The "wall clock" in the current code is actually the hypervisor start time.
The time of day is the "start time" plus the hypervisor "uptime".

Large enough bumps in the dom0 clock lead to a hypervisor "bump" which is
implemented as a bump in the start time, not the uptime. The clock.c routines
were reading in the hypervisor start time and then using this as the TOD.
This meant that any hypervisor time bump would cause the FreeBSD DomU to
set its TOD to the hypervisor start time, rather than the actual TOD.

This fix is a bit hacky and some reshuffling should be done later on
to clarify what is going on. I've left the wall clock code alone.
(The code which updates shadow_tv and shadow_tv_version.)
A new routine adds the uptime to the shadow_tv, which is then used to
update the TOD.

I've included some debugging so it is obvious when the clock is nudged.

PR:	135008
2009-05-29 13:43:21 +00:00
Adrian Chadd
7d18ff9a2b Migrate the Xen hypervisor clock reading routines into something
sharable.
2009-05-29 13:36:06 +00:00
Adrian Chadd
58e3a119e5 Say hello to a very basic, read-only, Xen Hypervisor RTC.
The hypervisor doesn't provide a single "TOD" - it instead provides a
"start time" and a "running time". These are added together to form
the current TOD. The TOD is in UTC.

This RTC is only (initially) designed to be read at startup. There's
some further poking that needs to happen to pick up hypervisor time
changes (ie, by the Dom0 time being adjusted by something). This
time adjustment currently can cause "weird stuff" in the DomU clock;
I'll begin investigating and repairing that in subsequent commits.

PR:		135008
2009-05-28 04:17:05 +00:00
Warner Losh
1de9b53249 We don't need d_thread_t for cross-branch portability here anymore.
Move do struct thread * instead.
2009-05-20 16:47:40 +00:00
Warner Losh
09605c1806 Some minor style changes:
o Convert K&R function definitions to ANSI
o Eliminate spaces/tabs that should have been deleted as part of the de__P
  efforts
o Use struct thread * in preference to d_thread_t *.
2009-05-20 16:29:22 +00:00
John Baldwin
515c5b1ede Don't bother reading the initial value of the machine check banks during
startup on Pentium 4 CPUs.  This wasn't safe to do on APs during AP startup,
was of limited value, and won't be used for future processors.
2009-05-20 16:11:22 +00:00
John Baldwin
dfc77ef51f - Add a tunable 'hw.mca.enabled' that can be used to enable/disable the
machine check code.  Disable it by default for now.
- When computing the mask of bits that determines a non-restartable event
  during a machine check exception, or-in the overflow flag rather than
  replacing the other flags.

PR:		i386/134586 [2]
Submitted by:	Andi Kleen  andi-fbsd firstfloor.org
2009-05-18 21:50:06 +00:00
John Baldwin
d3da228f37 Add a read-only sysctl hw.pci.mcfg to mirror the tunable by the same name.
MFC after:	1 week
2009-05-18 21:47:32 +00:00
John Baldwin
8aba835b8e Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend
using 128 byte alignment for locks.  (See IA-32 SDM Vol 3A 7.11.6.7)
2009-05-18 19:33:59 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
Dmitry Chagin
3933bde22e Somewhere between 2.6.23 and 2.6.27, Linux added SOCK_CLOEXEC and
SOCK_NONBLOCK flags, that allow to save fcntl() calls.

Implement a variation of the socket() syscall which takes a flags
in addition to the type argument.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-16 18:48:41 +00:00
John Baldwin
76dae09449 Trim the default set of device hints on i386 and amd64:
- Remove vga0 and the disabled uart2/uart3 hints from both platforms.
- Remove hints for ISA adv0, bt0, aha0, aic0, ed0, cs0, sn0, ie0, fe0, and
  le0 from i386.  All these hints were marked 'disabled' and thus already
  did not work "out of the box".

Discussed with:	imp
2009-05-14 21:53:35 +00:00
Attilio Rao
120b18d86f FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now).  Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by:	jhb, kmacy
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2009-05-14 17:43:00 +00:00
John Baldwin
9dc0b3d54f Implement simple machine check support for amd64 and i386.
- For CPUs that only support MCE (the machine check exception) but not MCA
  (i.e. Pentium), all this does is print out the value of the machine check
  registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
  a bit more involved.
  - First, there is limited support for decoding the CPU-independent MCA
    error codes in the kernel, and the kernel uses this to output a short
    description of any machine check events that occur.
  - When a machine check exception occurs, all of the MCx banks on the
    current CPU are scanned and any events are reported to the console
    before panic'ing.
  - To catch events for correctable errors, a periodic timer kicks off a
    task which scans the MCx banks on all CPUs.  The frequency of these
    checks is controlled via the "hw.mca.interval" sysctl.
  - Userland can request an immediate scan of the MCx banks by writing
    a non-zero value to "hw.mca.force_scan".
  - If any correctable events are encountered, the appropriate details
    are stored in a 'struct mca_record' (defined in <machine/mca.h>).
    The "hw.mca.count" is a count of such records and each record may
    be queried via the "hw.mca.records" tree by specifying the record
    index (0 .. count - 1) as the next name in the MIB similar to using
    PIDs with the kern.proc.* sysctls.  The idea is to export machine
    check events to userland for more detailed processing.
  - The periodic timer and hw.mca sysctls are only present if the CPU
    supports MCA.

Discussed with:	emaste (briefly)
MFC after:	1 month
2009-05-13 17:53:04 +00:00
Alan Cox
07a7b85e94 Correct a rare use-after-free error in pmap_copy(). This error was
introduced in amd64 revision 1.540 and i386 revision 1.547.  However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)

The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page.  I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails.  This is appropriate because the creation of
mappings by pmap_copy() is optional.  They are a (possible) optimization,
and not a requirement.

Correct a nearby whitespace error in the i386 pmap_copy().

Crash reported by: jeff@
MFC after:	6 weeks
2009-05-13 07:42:53 +00:00
Christian Brueffer
d1e015bfee Remove unused variables.
Found with:	Coverity Prevent(tm)
CID:		4285, 4286
2009-05-12 22:11:02 +00:00
Dmitry Chagin
8d30f381ef Do not export AT_CLKTCK when emulating Linux kernel prior
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.

Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.

For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.

Approved by:	kib (mentor)
2009-05-10 18:43:43 +00:00
Dmitry Chagin
1ca16454b3 Rework r189362, r191883.
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.

Pointed out by:	bde

Approved by:	kib (mentor)
2009-05-10 18:16:07 +00:00
Jun Kuriyama
b3b17597ea - Use "device\t" and "options \t" for consistency. 2009-05-10 00:00:25 +00:00
Ed Schouten
e5a34e5582 Regenerate system call tables to use SVN ids. 2009-05-08 20:16:04 +00:00
Ed Schouten
a1c45f054a Regenerate ibcs2 system call table. 2009-05-08 20:08:43 +00:00
Ed Schouten
51e6ba0f00 Burn TTY ioctl bridges in compat layers.
I really don't want any pieces of code to include ioctl_compat.h, so let
the ibcs2 and svr4 compat leave sgtty alone. If they want to support
sgtty, they should emulate it on top of termios, not sgtty.

The code has been marked with BURN_BRIDGES for a long time. ibcs2 and
svr4 are not really popular pieces of code anyway.
2009-05-08 20:06:37 +00:00
Marko Zec
29b02909eb Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
Jamie Gritton
7ae27ff49f Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions.  This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by:	dchagin, kib
Approved by:	bz (mentor)
2009-05-07 18:36:47 +00:00
Dmitry Chagin
13f20d7e86 To avoid excessive code duplication move MI definitions to the MI
header file. As it is defined in Linux.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-07 09:39:20 +00:00
Alexander Motin
614dd4f83c Do not try to initialize LAPIC timer if we are not going to use it.
It solves assertion, when kernel built with INVARIANTS configured
to use i8254 timer.
2009-05-05 01:13:20 +00:00
Jung-uk Kim
4ef853cc7f Unlock the largest standard CPUID on Intel CPUs for both amd64 and i386 and
fix SMP topology detection.  On i386, we extend it to cover Core, Core 2,
and Core i7 processors, not just Pentium 4 family, and move it to better
place.  On amd64, all supported Intel CPUs should have this MSR.
2009-05-04 18:05:27 +00:00
Alexander Motin
64886da216 Oops, sorry. Fix for fix. 2009-05-04 08:41:54 +00:00
Alexander Motin
b5f9da0f8d There is no atrtc driver in pc98, so hide atrtcclock_disable variable usage
in APM driver for this platform. This should fix pc98 build.
2009-05-04 08:36:47 +00:00
Alexander Motin
1703f2b424 Rename statclock_disable variable to atrtcclock_disable that it actually is,
and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock
controlling it. Setting it to 0 disables using RTC clock as stat-/
profclock sources.

Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254
hardclock, when LAPIC and RTC clocks are disabled.

This allows to reduce global interrupt rate of idle system down to about
100 interrupts per core, permitting C3 and deeper C-states provide maximum
CPU power efficiency.
2009-05-03 17:47:21 +00:00
Kip Macy
51a70df2db fix XEN compilation 2009-05-02 22:22:00 +00:00
Alexander Motin
a40d9024df Add support for using i8254 and rtc timers as event sources for i386 SMP
system. Redistribute hard-/stat-/profclock events to other CPUs using IPI.
2009-05-02 12:59:47 +00:00
Dmitry Chagin
d789bfd562 Move extern variable definitions to the header file.
Approved by:	kib (mentor)
MFC after:	1 month
2009-05-02 10:06:49 +00:00
Alexander Motin
2f369c9496 Small addition to r191720.
Restore previous behaviour for the case of unknown interrupt. Invocation
of IRQ -1 crashes my system on resume. Returning 0, as it was, is not
perfect also, but at least not so dangerous.
2009-05-01 20:53:37 +00:00
Sam Leffler
dcad868984 o add uath
o sort usb wireless drivers
2009-05-01 17:20:16 +00:00
Alexander Motin
1ecff35a6b Use value -1 instead of 0 for marking unused APIC vectors. This fixes
IRQ0 routing on LAPIC-enabled systems.

Add hint.apic.0.clock tunable. Setting it 0 disables using LAPIC timers
as hard-/stat-/profclock sources falling back to using i8254 and rtc timers.

On modern CPUs LAPIC is a part of CPU core which is shutting down when CPU
enters C3 or deeper power state. It makes no problems for interrupt
processing, as chipset wakes up CPU on interrupt triggering. But entering
C3 state kills LAPIC timer and freezes system time, making C3 and deeper
states practically unusable. Using i8254 timer allows to avoid this
problem.

By using i8254 timer my T7700 C2D CPU with UP kernel successfully enters
C3 state, saving more then a Watt of total idle power (>10%) in addition to
all other power-saving techniques.

This technique is not working for SMP yet, as only one CPU receives
timer interrupts. But I think that problem could be fixed by forwarding
interrupts to other CPUs with IPI.
2009-05-01 17:05:49 +00:00
Dmitry Chagin
79262bf1f0 Reimplement futexes.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.

New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.

Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.

Approved by:	kib (mentor)
MFC after:	1 month
2009-05-01 15:36:02 +00:00
Jung-uk Kim
788399cbd9 - Fix divide-by-zero panic when SMP kernel is used on UP system[1].
- Avoid possible divide-by-zero panic on SMP system when the CPUID is
disabled, unsupported, or buggy.

Submitted by:	pluknet (pluknet at gmail dot com)[1]
2009-04-30 22:10:04 +00:00
Jeff Roberson
82fcb0f192 - Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
 - Remove the cpu_cores/cpu_logical detection from identcpu.
 - Describe the layout of the system in cpu_mp_announce().

Sponsored by:   Nokia
2009-04-29 06:54:40 +00:00
John Baldwin
10395e0714 Reduce the number of bounce zones (and thus the number of bounce pages
used in some cases):
- Ignore DMA tag boundaries when allocating bounce pages.  The boundaries
  don't determine whether or not parts of a DMA request bounce.  Instead,
  they are just used to carve up segments.
- Allow tags with sub-page alignment to share bounce pages since bounce
  pages are always page aligned.

Reviewed by:	scottl (amd64)
MFC after:	1 month
2009-04-23 20:24:19 +00:00