Commit Graph

4610 Commits

Author SHA1 Message Date
David Xu
b41f1452d9 Add scheduler CORE, the work I have done half a year ago, recent,
I picked it up again. The scheduler is forked from ULE, but the
algorithm to detect an interactive process is almost completely
different with ULE, it comes from Linux paper "Understanding the
Linux 2.6.8.1 CPU Scheduler", although I still use same word
"score" as a priority boost in ULE scheduler.

Briefly, the scheduler has following characteristic:
1. Timesharing process's nice value is seriously respected,
   timeslice and interaction detecting algorithm are based
   on nice value.
2. per-cpu scheduling queue and load balancing.
3. O(1) scheduling.
4. Some cpu affinity code in wakeup path.
5. Support POSIX SCHED_FIFO and SCHED_RR.
Unlike scheduler 4BSD and ULE which using fuzzy RQ_PPQ, the scheduler
uses 256 priority queues. Unlike ULE which using pull and push, the
scheduelr uses pull method, the main reason is to let relative idle
cpu do the work, but current the whole scheduler is protected by the
big sched_lock, so the benefit is not visible, it really can be worse
than nothing because all other cpu are locked out when we are doing
balancing work, which the 4BSD scheduelr does not have this problem.
The scheduler does not support hyperthreading very well, in fact,
the scheduler does not make the difference between physical CPU and
logical CPU, this should be improved in feature. The scheduler has
priority inversion problem on MP machine, it is not good for
realtime scheduling, it can cause realtime process starving.
As a result, it seems the MySQL super-smack runs better on my
Pentium-D machine when using libthr, despite on UP or SMP kernel.
2006-06-13 13:12:56 +00:00
John Baldwin
e3d7caf487 Enable a few more things in x86 NOTES to get broader LINT coverage:
- Turn on iwi(4), ipw(4), and ndis(4) on amd64 and i386.
- Turn on ral(4) and ural(4) on i386, pc98, and amd64.
2006-06-12 20:38:17 +00:00
Alan Cox
b74a62d602 Don't invalidate the TLB in pmap_qenter() unless the old mapping was valid.
Most often, it isn't.

Reviewed by: tegge@
2006-06-12 20:05:27 +00:00
Warner Losh
78878cef94 Add the ability to subset the devices that UART pulls in. This allows
the arm to compile without all the extras that don't appear, at least
not in the flavors of ARM I deal with.  This helps us save about 100k.

If I've botched the available devices on a platform, please let me
know and I'll correct ASAP.
2006-06-12 04:21:50 +00:00
Alan Cox
ce142d9ec0 Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object.  Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@
2006-06-05 20:35:27 +00:00
Mike Silbersack
f25d341cfb After much discussion with mjacob and scottl, change bus_dmamem_alloc so
that it just warns the user with a printf when it misaligns a piece
of memory that was requested through a busdma tag.

Some drivers (such as mpt, and probably others) were asking for alignments
that could not be satisfied, but as far as driver operation was concerned,
that did not matter.  In the theory that other drivers will fall into
this same category, we agreed that panicing or making the allocation
fail will cause more hardship than is necessary.  The printf should
be sufficient motivation to get the driver glitch fixed.
2006-06-01 04:49:29 +00:00
Matt Jacob
aa57a87a56 Turn the panic on not being able to meet alignment constraints
in bus_dmamem_alloc into the more reasonable EINVAL return.

Also, reclaim memory allocated but then not used if we had
an error return.
2006-05-31 00:37:56 +00:00
Mike Silbersack
3d31890277 MFi386 rev 1.78:
Add a quick hack to ensure that bus_dmamem_alloc properly aligns
small allocations with large alignment requirements.

Add a panic to detect cases where we've still failed to properly align.
2006-05-28 18:31:32 +00:00
Maxim Sobolev
aa1807d5d6 Move clock_lock prototype into <machine/clock.h>, where it is more
appropriate.

Discussed with:	jhb
2006-05-19 18:53:50 +00:00
Marius Strobl
8df071afd9 Add le(4). I could actually only test it on alpha, i386 and sparc64 but
given that this includes the more problematic platforms I see no reason
why it shouldn't also work on amd64 and ia64.
2006-05-17 20:45:45 +00:00
Poul-Henning Kamp
c40da00ca3 Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.
2006-05-16 14:37:58 +00:00
Ruslan Ermilov
155d9f6a98 Kill more references to lnc(4).
Submitted by:	grep(1)
2006-05-16 12:15:39 +00:00
Marius Strobl
055abe9af2 Remove some remnants of lnc(4). 2006-05-14 18:49:25 +00:00
Poul-Henning Kamp
5405ab4889 Clean out sysctl machdep.* related defines.
The cmos clock related stuff should really be in MI code.
2006-05-11 17:29:25 +00:00
Alexander Leidinger
ba5bd0001c regen (linux rt_sigpending) 2006-05-10 18:19:51 +00:00
Alexander Leidinger
17138b619c Implement rt_sigpending in the linuxolator.
PR:		92671
Submitted by:	Markus Niemist"o <markus.niemisto@gmx.net>
2006-05-10 18:17:29 +00:00
Doug Ambrisko
32397ce071 Add in linsysfs. A linux 2.6 like sys filesystem to pacify the Linux
LSI MegaRAID SAS utility.

Sponsored by:		IronPort Systems
Man page help from:	brueffer
2006-05-09 22:27:01 +00:00
Doug Ambrisko
387196bf56 Forgot the amd/linux32 part since sys/*/linux didn't match :-(
Pointed out by:	Alexander (thanks)
2006-05-06 17:26:45 +00:00
Sam Leffler
57d6ae0689 add ath and wlan crypto support
MFC after:	1 month
2006-05-03 18:15:36 +00:00
Scott Long
8d59dfff98 Allow bus_dmamap_load() to pass ENOMEM back to the caller. This puts it into
conformance with the mbuf and uio load routines.  ENOMEM can only happen
with BUS_DMA_NOWAIT is passed in, thus the deferals are disabled.  I don't
like doing this, but fixing this fixes assumptions in other important drivers,
which is a net benefit for now.
2006-05-03 04:14:17 +00:00
John Baldwin
2b8a339c7e Add various constants for the PAT MSR and the PAT PTE and PDE flags.
Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining
(WC) instead of Uncached (UC-).

MFC after:	1 month
2006-05-01 22:07:00 +00:00
John Baldwin
4ac60df584 Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction.  This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after:	1 month
2006-05-01 21:36:47 +00:00
Alan Cox
e9ba21a5bb Eliminate unnecessary, recursive acquisitions and releases of the page
queues lock by free_pv_entry() and pmap_remove_pages().

Reduce the scope of the page queues lock in pmap_remove_pages().
2006-04-29 00:59:15 +00:00
Marcel Moolenaar
64220a7e28 Rewrite of puc(4). Significant changes are:
o  Properly use rman(9) to manage resources. This eliminates the
   need to puc-specific hacks to rman. It also allows devinfo(8)
   to be used to find out the specific assignment of resources to
   serial/parallel ports.
o  Compress the PCI device "database" by optimizing for the common
   case and to use a procedural interface to handle the exceptions.
   The procedural interface also generalizes the need to setup the
   hardware (program chipsets, program clock frequencies).
o  Eliminate the need for PUC_FASTINTR. Serdev devices are fast by
   default and non-serdev devices are handled by the bus.
o  Use the serdev I/F to collect interrupt status and to handle
   interrupts across ports in priority order.
o  Sync the PCI device configuration to include devices found in
   NetBSD and not yet merged to FreeBSD.
o  Add support for Quatech 2, 4 and 8 port UARTs.
o  Add support for a couple dozen Timedia serial cards as found
   in Linux.
2006-04-28 21:21:53 +00:00
Scott Long
27aafcda76 Enable the rr232x driver for amd64. 2006-04-28 05:23:10 +00:00
Alan Cox
7dece6c7d9 In general, bits in the page directory entry (PDE) and the page table
entry (PTE) have the same meaning.  The exception to this rule is the
eighth bit (0x080).  It is the PS bit in a PDE and the PAT bit in a
PTE.  This change avoids the possibility that pmap_enter() confuses a
PAT bit with a PS bit, avoiding a panic().

Eliminate a diagnostic printf() from the i386 pmap_enter() that serves
no current purpose, i.e., I've seen no bug reports in the last two
years that are helped by this printf().

Reviewed by: jhb
2006-04-27 21:26:25 +00:00
Peter Wemm
0be8b8cee8 Move vm.pmap.pv_entry_count out from the PV_STATS ifdefs. It is always
available and is a real counter, not a statistic.
2006-04-26 21:34:07 +00:00
Jung-uk Kim
daea0aad84 Check if reported HTT cores are physical cores. This commit does not
affect AMD CPUs at all because HTT bit is disabled earlier.  Intel
multicore CPUs and ULE scheduler may be affected.
2006-04-25 00:06:37 +00:00
Jung-uk Kim
091c9b4961 Add another Intel CPU feature flag, xTPR (Send Task Priority Messages). 2006-04-24 22:56:57 +00:00
Jung-uk Kim
cf24d86bcc Check if deterministic cache parameters leaf is valid before use. 2006-04-24 22:23:52 +00:00
Colin Percival
8b4553119e Adjust dangerous-shared-cache-detection logic from "all shared data
caches are dangerous" to "a shared L1 data cache is dangerous".  This
is a compromise between paranoia and performance: Unlike the L1 cache,
nobody has publicly demonstrated a cryptographic side channel which
exploits the L2 cache -- this is harder due to the larger size, lower
bandwidth, and greater associativity -- and prohibiting shared L2
caches turns Intel Core Duo processors into Intel Core Solo processors.

As before, the 'machdep.hyperthreading_allowed' sysctl will allow even
the L1 data cache to be shared.

Discussed with:	jhb, scottl
Security:	See FreeBSD-SA-05:09.htt for background material.
2006-04-24 21:17:01 +00:00
Xin LI
3b28c0c6f9 Move AHC_REG_PRETTY_PRINT and AHD_REG_PRETTY_PRINT below
their corresponding devices.
2006-04-24 08:44:34 +00:00
Peter Wemm
9bbf94367c Oops. Minidumps were developed on 6.x, in without the small pv entry code.
Add some strategic dump_add_page()/dump_drop_page() lines to include pv
chunks in the minidumps - these operate in the direct map region like UMA.
2006-04-21 04:50:18 +00:00
Peter Wemm
c0345a84aa Introduce minidumps. Full physical memory crash dumps are still available
via the debug.minidump sysctl and tunable.

Traditional dumps store all physical memory.  This was once a good thing
when machines had a maximum of 64M of ram and 1GB of kvm.  These days,
machines often have many gigabytes of ram and a smaller amount of kvm.
libkvm+kgdb don't have a way to access physical ram that is not mapped
into kvm at the time of the crash dump, so the extra ram being dumped
is mostly wasted.

Minidumps invert the process.  Instead of dumping physical memory in
in order to guarantee that all of kvm's backing is dumped, minidumps
instead dump only memory that is actively mapped into kvm.

amd64 has a direct map region that things like UMA use.  Obviously we
cannot dump all of the direct map region because that is effectively
an old style all-physical-memory dump.  Instead, introduce a bitmap
and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that
allow certain critical direct map pages to be included in the dump.
uma_machdep.c's allocator is the intended consumer.

Dumps are a custom format.  At the very beginning of the file is a header,
then a copy of the message buffer, then the bitmap of pages present in
the dump, then the final level of the kvm page table trees (2MB mappings
are expanded into a 4K page mappings), then the sparse physical pages
according to the bitmap.  libkvm can now conveniently access the kvm
page table entries.

Booting my test 8GB machine, forcing it into ddb and forcing a dump
leads to a 48MB minidump.  While this is a best case, I expect minidumps
to be in the 100MB-500MB range.  Obviously, never larger than physical
memory of course.

minidumps are on by default.  It would want be necessary to turn them off
if it was necessary to debug corrupt kernel page table management as that
would mess up minidumps as well.

Both minidumps and regular dumps are supported on the same machine.
2006-04-21 04:24:50 +00:00
Warner Losh
59b8f529ca Set the rid for a resoruce allocated with rman_reserve_resource. 2006-04-20 04:16:34 +00:00
Colin Percival
2652af563e Correct a local information leakage bug affecting AMD FPUs.
Security:	FreeBSD-SA-06:14.fpu
2006-04-19 07:00:19 +00:00
Peter Wemm
714d4fe9b6 If we're doing a try-alloc of a pv entry and give up early, do not forget
to reduce the pv_entry_count counter.  This was found by Tor Egge.  In the
same email, Tor also pointed out the pv_stats problem in the previous
commit, but I'd forgotten about it until I went looking for this email
about this allocation problem.
2006-04-18 20:17:32 +00:00
Peter Wemm
bac58593f1 pv_entry_count is more than a statistic. It is used for resource limiting.
Do not compile out its counter updates if pv entry stats are turned off.
2006-04-18 20:11:00 +00:00
Alan Cox
ad740f9081 Include opt_pmap.h for PMAP_SHPGPERPROC.
PR: 94509
2006-04-13 03:31:48 +00:00
Alan Cox
826c207263 Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap.  To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.

Reviewed by: tegge
2006-04-12 04:22:52 +00:00
Paul Saab
d8636a9ab7 Hook bce up to the build 2006-04-10 20:04:22 +00:00
John Baldwin
907d4d7f45 Cache the value of the lower half of each I/O APIC redirection table entry
so that we only have to do an ioapic_write() instead of an ioapic_read()
followed by an ioapic_write() every time we mask and unmask level triggered
interrupts.  This cuts the execution time for these operations roughly in
half.

Profiled by:	Paolo Pisati <p.pisati@oltrelinux.com>
MFC after:	1 week
2006-04-05 20:43:19 +00:00
Peter Wemm
2e4548288a Convert pv_entry_frees and pv_entry_allocs stats counters from int to long,
they wrap way too quickly.
2006-04-04 20:17:35 +00:00
Marcel Moolenaar
b1fb1bb19a Sync with i386: Map exceptions to signals in gdb_cpu_signal() so
that kgdb(1) gets a SIGTRAP when it needs to.

Pointed out by: grehan@
2006-04-04 03:00:20 +00:00
Marcel Moolenaar
470d831703 The PC is register 16, not 18.
Pointed out by: grehan@
2006-04-04 02:44:51 +00:00
Marcel Moolenaar
bfcdefd8aa Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance to previous code.
2006-04-03 22:51:47 +00:00
Peter Wemm
68ac481184 Shrink the amd64 pv entry from 48 bytes to about 24 bytes. On a machine
with large mmap files mapped into many processes, this saves hundreds of
megabytes of ram.
pv entries were individually allocated and had two tailq entries and two
pointers (or addresses).  Each pv entry was linked to a vm_page_t and
a process's address space (pmap).  It had the virtual address and a
pointer to the pmap.
This change replaces the individual allocation with a per-process
allocation system.  A page ("pv chunk") is allocated and this provides
168 pv entries for that process.  We can now eliminate one of the 16 byte
tailq entries because we can simply iterate through the pv chunks to find
all the pv entries for a process.  We can eliminate one of the 8 byte
pointers because the location of the pv entry implies the containing
pv chunk, which has the pointer.  After overheads from the pv chunk
bitmap and tailq linkage, this works out that each pv entry has an
effective size of 24.38 bytes.

Future work still required, and other problems:
* when running low on pv entries or system ram, we may need to defrag
  the chunk pages and free any spares.  The stats (vm.pmap.*) show that
  this doesn't seem to be that much of a problem, but it can be done if
  needed.
* running low on pv entries is now a much bigger problem.  The old
  get_pv_entry() routine just needed to reclaim one other pv entry.
  Now, since they are per-process, we can only use pv entries that are
  assigned to our current process, or by stealing an entire page worth
  from another process.  Under normal circumstances, the pmap_collect()
  code should be able to dislodge some pv entries from the current
  process.  But if needed, it can still reclaim entire pv chunk pages
  from other processes.
* This should port to i386 really easily, except there it would reduce
  pv entries from 24 bytes to about 12 bytes.

(I have integrated Alan's recent changes.)
2006-04-03 21:36:01 +00:00
Peter Wemm
b9eee07e36 Remove the unused sva and eva arguments from pmap_remove_pages(). 2006-04-03 21:16:10 +00:00
Alan Cox
9c6a71e4ca Introduce pmap_try_insert_pv_entry(), a function that conditionally creates
a pv entry if the number of entries is below the high water mark for pv
entries.

Use pmap_try_insert_pv_entry() in pmap_copy() instead of
pmap_insert_entry().  This avoids possible recursion on a pmap lock in
get_pv_entry().

Eliminate the explicit low-memory checks in pmap_copy().  The check that
the number of pv entries was below the high water mark was largely
ineffective because it was located in the outer loop rather than the
inner loop where pv entries were allocated.  Instead of checking, we
attempt the allocation and handle the failure.

Reviewed by: tegge
Reported by: kris
MFC after: 5 days
2006-04-02 05:45:05 +00:00
Maksim Yevmenkin
b643101293 Add kbdmux(4) to GENERIC on amd64
Requested by:	scottl
Tested by:	scottl
2006-03-31 23:04:48 +00:00