Commit Graph

1874 Commits

Author SHA1 Message Date
Marcel Moolenaar
191eae8259 Oops. The sec field of struct bintime is *not* a 32-bit type.
It's time_t, which is 64 bits on ia64.
2011-06-25 17:58:35 +00:00
Marcel Moolenaar
3ebd36e3d0 Define the minimum fractional period in terms of hz. We know hz is
a magnitude smaller than itc_freq. A minimum period of 10*hz is
sufficient precision. As a side-effect, the number of clocks per
second, when the machine is idle, dropped by more than 50%.
Be anal and define the maximum period to be at least 4G seconds.
With a 64-bit counter and an ITC frequency that's expected to be
always less than 4Ghz, it takes longer than that to wrap around.
2011-06-25 16:35:43 +00:00
Marcel Moolenaar
dbd57cbf8f Replace the original copyright notice with my own. Everything in
this file is written by me and has no bearing on the initial or
original version.
2011-06-25 03:43:58 +00:00
Marcel Moolenaar
3f5deb80b7 Update copyright. 2011-06-25 03:37:40 +00:00
Marcel Moolenaar
e920e3978e Switch to the event timers infrastructure. This includes:
o   Setting td_intr_frame to the XIVs trap frame because it's referenced
    by the ET event handler.
o   Signal EOI to the CPU before calling the registered XIV handlers.
    This prevents lost ITC interrupts, which cause starvation in one-shot
    mode.
o   Adding support for IPI_HARDCLOCK with corresponding per-CPU counters.
o   Have the APs call cpu_initclocks() so as to limited the scattering of
    clock related initialization. cpu_initclocks() calls the <self>_bsp()
    or <self>_ap() version accordingly.
o   Uncomment the ET clock handling in cpu_idle().
o   Update the DDB 'show pcpu' output for the new MD fields.
o   Entirely rewritten ia64_ih_clock(). Note that we don't create as many
    clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale.
    We can only have 240 XIVs and we can have more CPUs than that. There's
    a single intrcnt index for the cumulative clock ticks and we keep per
    CPU counts in the PCPU stats structure.
o   Register the ITC by hooking SI_SUB_CONFIGURE (2nd order).

Open issues:
o   Clock interrupts can still be lost. Some tweaking is still necessary.

Thanks to: mav@ for his support, feedback and explanations.

ET stats while committing:
eris% sysctl machdep.cpu | grep nclks

machdep.cpu.0.nclks: 24007
machdep.cpu.1.nclks: 22895
machdep.cpu.2.nclks: 13523
machdep.cpu.3.nclks: 9342
machdep.cpu.4.nclks: 9103
machdep.cpu.5.nclks: 9298
machdep.cpu.6.nclks: 10039
machdep.cpu.7.nclks: 9479
eris% vmstat -i | grep clock
clock                      108599         50
2011-06-25 02:15:14 +00:00
Marcel Moolenaar
ded49c2eae Unblock the outgoing thread after we performed pmap_switch() to
switch the region registers. pmap_switch() returns the pmap for
which the region register are currently programmed, which needs
to be re-programmed on the CPU the ougoing thread gets switched
in.  This change does not noticibly change anything or fix known
bugs, but does give me a warm fuzzy feeling by being more
correct.
2011-06-23 16:21:43 +00:00
Alan Cox
9ed9322551 Use a non-standard page size that is supported. 2011-06-21 12:38:40 +00:00
Marcel Moolenaar
218b719b74 Improve on style(9) 2011-06-17 05:30:12 +00:00
Marcel Moolenaar
acd1d4d28e Properly serialize the global shootdown with the instruction
stream of the local processor. Also explicitly invalidate
the ALAT. This is done on the other CPUs in the coherence
domain by virtue of the ptc.ga instruction, but does not
apply to the local CPU.
2011-06-17 04:26:03 +00:00
Marcel Moolenaar
52737ec557 Add the model number for the Montvale processor (marketed as Itanium 2 9100).
At this time we're missing just one: Tukwila (Itanium 2 9300).
2011-06-11 02:22:11 +00:00
Attilio Rao
5e9857e76b MFC 2011-06-07 08:24:29 +00:00
Marcel Moolenaar
28fb80aa8c Call set_cputicker() to have the time counter use the ITC register.
Note that the ITC frequency is fixed.
2011-06-07 01:06:49 +00:00
Attilio Rao
81c02539f1 MFC 2011-06-06 21:38:39 +00:00
Marcel Moolenaar
e726a6b70c Improve cpu_idle():
o   cpu_idle_hook is expected to be called with interrupts
    disabled and re-enables interrupts on return.
o   sync with x86: don't idle when the CPU has runnable tasks
o   have callers of ia64_call_pal_static() disable interrupts
    and re-enable interrupts.
o   add, but compile-out, support for idle mode. This will be
    enabled at some later time, after proper testing.
2011-06-06 19:06:15 +00:00
Attilio Rao
61b926921f MFC 2011-05-31 21:22:44 +00:00
Nathan Whitehorn
d098f93019 On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by:	jhb
2011-05-31 15:11:43 +00:00
Attilio Rao
c02f1527a9 MFC 2011-05-14 19:20:13 +00:00
Marcel Moolenaar
767ca6ed1a Prefer switching the memory stack from user to kernel *before* switching
the register stack. While the ordering doesn't matter, it creates an
invariant not previously there: the memory stack pointer will always be
larger than the register stack pointer. With this invariant in place,
it's easier to add instrumentation code that detects a stack overflow
because in such a scenario the memory stack pointer and register stack
pointers have crossed each other.

Aside: basic kernel operation needs about half the stack size (~16K)
at most. We have plenty of head room on the kernel stack...
2011-05-14 14:55:15 +00:00
Marcel Moolenaar
65385d6d79 Sharpening the saw:
o   Clobber the register that holds the restart token immediately after
    crossing the restart point. This prevents false positives (i.e. a
    nested exception that we don't know can happen and that is being
    treated as one we know by virtue of a lingering restart token).
o   Now that the bootstrap kernel stack is free, switch onto it and call
    trap() for nested traps that we don't know about. In trap we panic()
    so that we can analyze the condition.
2011-05-14 14:47:19 +00:00
Marcel Moolenaar
7fb64531d3 Be pedantic: mark the pcpu pointer (= register r13) itself as volatile. 2011-05-14 14:40:24 +00:00
Marcel Moolenaar
dc03be9d67 Turn ia64_srlz() and ia64_srlz_i() into defines so that the code is
still correct when inlining is disabled.
2011-05-14 14:36:08 +00:00
Attilio Rao
b2aa562e7b MFC 2011-05-13 20:58:48 +00:00
Matthew D Fleming
cfb00e5aa7 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
Attilio Rao
bceb221f78 Fix remaining bits that actually weren't converted by mistake.
Reported by:	sbruno
2011-05-13 14:57:20 +00:00
Attilio Rao
b9f714be9f MFC 2011-05-07 23:34:14 +00:00
Marcel Moolenaar
a63f3da5b2 In pmap_kextract(), return the physical address for PBVM virtual
addresses as well (incl. the PBVM page table).
2011-05-07 17:23:13 +00:00
Attilio Rao
aa8b9e0706 MFC 2011-05-06 22:45:33 +00:00
John Baldwin
f9a9473702 Retire isa_setup_intr() and isa_teardown_intr() and use the generic bus
versions instead.  They were never needed as bus_generic_intr() and
bus_teardown_intr() had been changed to pass the original child device up
in 42734, but the ISA bus was not converted to new-bus until 45720.
2011-05-06 13:48:53 +00:00
Attilio Rao
71a19bdc64 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
Marcel Moolenaar
6dfe4f958f Don't use the whole region 5 for KVA, because the CPU may not implement all
of the 61 bits available within the region for virtual addressing.  Since
there's no good way for us to map out the gap in the virtual address space,
limit KVA to the architectural minimum implemented address bits. This still
gives us 1 petabyte of KVA, so no worries.
2011-05-02 17:49:05 +00:00
Marcel Moolenaar
7df304f3e0 Stop linking against a direct-mapped virtual address and instead
use the PBVM. This eliminates the implied hardcoding of the
physical address at which the kernel needs to be loaded. Using the
PBVM makes it possible to load the kernel irrespective of the
physical memory organization and allows us to replicate kernel text
on NUMA machines.

While here, reduce the direct-mapped page size to the kernel's
page size so that we can support memory attributes better.
2011-04-30 20:49:00 +00:00
John Baldwin
b67d11bbcc Change rman_manage_region() to actually honor the rm_start and rm_end
constraints on the rman and reject attempts to manage a region that is out
of range.
- Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of
  ~0ul).
- To preserve existing behavior, change rman_init() to set rm_start and
  rm_end to allow managing the full range (0 to ~0ul) if they are not set by
  the caller when rman_init() is called.
2011-04-29 18:41:21 +00:00
Attilio Rao
2be767e069 Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by:	Sandvine Incorporated
Discussed with:	emaste, des
MFC after:	2 weeks
2011-04-28 16:02:05 +00:00
Rick Macklem
4309e17add This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.
2011-04-27 17:51:51 +00:00
Marcel Moolenaar
682bf0a7e1 Remove prototypes of non-existent functions. 2011-04-25 22:38:09 +00:00
Alexander Motin
97b53e3634 Switch the GENERIC kernels for all architectures to the new CAM-based ATA
stack. It means that all legacy ATA drivers are disabled and replaced by
respective CAM drivers. If you are using ATA device names in /etc/fstab or
other places, make sure to update them respectively (adX -> adaY,
acdX -> cdY, afdX -> daY, astX -> saY, where 'Y's are the sequential
numbers for each type in order of detection, unless configured otherwise
with tunables, see cam(4)).

ataraid(4) functionality is now supported by the RAID GEOM class.
To use it you can load geom_raid kernel module and use graid(8) tool
for management. Instead of /dev/arX device names, use /dev/raid/rX.
2011-04-24 08:58:58 +00:00
Marcel Moolenaar
76ceb3c6ee Use the new arch_loadaddr I/F to align ELF objects to PBVM page
boundaries. For good measure, align all other objects to cache
lines boundaries.

Use the new arch_loadseg I/F to keep track of kernel text and
data so that we can wire as much of it as is possible. It is
the responsibility of the kernel to link critical (read IVT
related) code and data at the front of the respective segment
so that it's covered by TRs before the kernel has a chance to
add more translations.

Use a better way of determining whether we're loading a legacy
kernel or not. We can't check for the presence of the PBVM page
table, because we may have unloaded that kernel and loaded an
older (legacy) kernel after that. Simply use the latest load
address for it.
2011-04-03 23:49:20 +00:00
Konstantin Belousov
7332c129e0 Add support for executing the FreeBSD 1/i386 a.out binaries on amd64.
In particular:
- implement compat shims for old stat(2) variants and ogetdirentries(2);
- implement delivery of signals with ancient stack frame layout and
  corresponding sigreturn(2);
- implement old getpagesize(2);
- provide a user-mode trampoline and LDT call gate for lcall $7,$0;
- port a.out image activator and connect it to the build as a module
  on amd64.

The changes are hidden under COMPAT_43.

MFC after:   1 month
2011-04-01 11:16:29 +00:00
Alan Cox
5adad80656 Eliminate an unused definition.
Reviewed by:	marcel
2011-03-26 20:40:33 +00:00
Marcel Moolenaar
0355a8b24b Fix switching to physical mode as part of calling into EFI runtime
services or PAL procedures. The new implementation is based on
specific functions that are known to be called in certain scenarios
only. This in particular fixes the PAL call to obtain information
about translation registers. In general, the new implementation does
not bank on virtual addresses being direct-mapped and will work when
the kernel uses PBVM.

When new scenarios need to be supported, new functions are added if
the existing functions cannot be changed to handle the new scenario.
If a single generic implementation is possible, it will become clear
in due time.

While here, change bootinfo to a pointer type in anticipation of
future development.
2011-03-21 18:20:53 +00:00
Marcel Moolenaar
7c9eed5c4e Change region 4 to be part of the kernel. This serves 2 purposes:
1.  The PBVM is in region 4, so if we want to make use of it, we
    need region 4 freed up.
2.  Region 4 and above cannot be represented by an off_t by virtue
    of that type being signed. This is problematic for truss(1),
    ktrace(1) and other such programs.
2011-03-21 01:09:50 +00:00
Bjoern A. Zeeb
d2b74735b8 For now remove options FLOWTABLE from the remaining GENERIC kernel
configurations and make it opt-in for those who want it.  LINT will
still build it.

While it may be a perfect win in some scenarios, it still troubles users
(see PRs) in general cases.  In addition we are still allocating resources
even if disabled by sysctl and still leak arp/nd6 entries in case of
interface destruction.

Discussed with:	qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR:		kern/148018, kern/155604, kern/144917, kern/146792
MFC after:	2 weeks
2011-03-19 15:50:34 +00:00
Marcel Moolenaar
9b0a458bcc o Move the IVT and supporting functions to the front of the text
segment so that it's always mapped by the loader.
o   Change the alternate fault handlers to account for PBVM. Since
    currently the region is handled by the VHPT, no alternate faults
    will be generated for it.
2011-03-18 22:45:43 +00:00
Marcel Moolenaar
7f806fe14e Remove inclusion of unneeded bootinfo.h header. 2011-03-18 22:33:19 +00:00
Marcel Moolenaar
45c0ab27b1 Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about
the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though
it's not used anywhere in the source tree.
2011-03-18 15:36:28 +00:00
Marcel Moolenaar
18d9407a9f MFaltix:
Add support for Pre-Boot Virtual Memory (PBVM) to the loader.

PBVM allows us to link the kernel at a fixed virtual address without
having to make any assumptions about the physical memory layout. On
the SGI Altix 350 for example, there's no usuable physical memory
below 192GB. Also, the PBVM allows us to control better where we're
going to physically load the kernel and its modules so that we can
make sure we load the kernel in memory that's close to the BSP.

The PBVM is managed by a simple page table. The minimum size of the
page table is 4KB (EFI page size) and the maximum is currently set
to 1MB. A page in the PBVM is 64KB, as that's the maximum alignment
one can specify in a linker script. The bottom line is that PBVM is
between 64KB and 8GB in size.

The loader maps the PBVM page table at a fixed virtual address and
using a single translations. The PBVM itself is also mapped using a
single translation for a maximum of 32MB.

While here, increase the heap in the EFI loader from 512KB to 2MB
and set the stage for supporting relocatable modules.
2011-03-16 03:53:18 +00:00
Marcel Moolenaar
23ca55ccf8 Reserve 24KB for the static kernel stack. We use this stack to call into
the firmware for physical mode calls and the Altix doesn't work when it's
16KB.
2011-03-15 16:50:17 +00:00
Marcel Moolenaar
2133803023 Re-order the TRs so that there're less gaps. This could have a performance
impact.
2011-03-15 06:07:02 +00:00
Marcel Moolenaar
6f181a80f8 Don't define IA64_PBVM_PAGE_SIZE as 1U shifted to the left by some amount.
The compiler seems to assume it's a 32-bit integral and rounding to the
page size using the standard expression (((u_long)(x) + mask) & ~mask),
results in a 32-bit value. Dropping the 'U' suffix is enough to have the
compiler treat the expression as a 64-bit integral.
2011-03-14 23:49:41 +00:00
Marcel Moolenaar
c94018bcd8 o Deal with unmapped PBVM in the alternate instruction and data TLB fault
handlers.
o   Put the IVT in its own section and keep the supporting code close.
o   Make sure the VHPT is sized so that it can be mapped using a single
    translation.
o   Map the PAL code and VHPT with a translation that has the right size.
    Assume the platform has a PAL code size that can be mapped with a
    single translations.
o   Pass the pointer to the bootinfo structure as an argument to ia64_init().
o   Get rid of LOG2_ID_PAGE_SIZE and IA64_ID_PAGE_SIZE. It was used to map
    the regions 6 & 7 and was as large as possible. The problem is that we
    can't support memory attributes easily if the granuratity is not a page.
    We need to support memory attributes because the new USB stack violates
    the BUS_DMA(9) interface.
o   Update some comments...

NOTE:	this is broken for SMP kernels, because the AP startup code hasn't
	been updated yet.
2011-03-14 05:29:45 +00:00