Commit Graph

4886 Commits

Author SHA1 Message Date
Alexander Kabaev
d586dea015 Remove extern struct pcpu __pcpu[]; from the header file and
move it the the only file where it appears to be used.
2007-05-19 05:03:59 +00:00
Alexander Kabaev
fa298d5ea8 Include machine/pcb.hto turn extern struct pcb stoppcbs[]; construct
into the valid C.
2007-05-19 05:01:43 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
John Baldwin
19059a13ed Rework the support for ABIs to override resource limits (used by 32-bit
processes under 64-bit kernels).  Previously, each 32-bit process overwrote
its resource limits at exec() time.  The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process.  To fix this, don't
ovewrite the resource limits during exec().  Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit.  We then use this when querying and
setting resource limits.  Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children.  However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.

MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).

MFC after:	1 week
2007-05-14 22:40:04 +00:00
Alexander Kabaev
ec69a8a6d2 Do not dereference linux_to_bsd_signal[-1] if userland has
passed zero as exit signal.

GCC 4.2 changes the kernel data segment layout not to have 0
in that memory location. This code ran by luck before and now
the luck has run out.
2007-05-11 01:25:51 +00:00
Kevin Lo
00465d5ab3 Add wlan_amrr. ural(4) uses amrr as transmit rate control. 2007-05-10 01:39:50 +00:00
Scott Long
f73e86c383 It turns out that the hptiop driver isn't portable after all. Confine it to
amd64 and i386 for now.
2007-05-09 15:55:45 +00:00
Scott Long
4439f8b4b6 Introduce a driver for the Highpoint RocketRAID 3xxx series of controllers.
The driver relies on CAM.

Many thanks to Highpoint for providing this driver.
2007-05-09 07:07:26 +00:00
John Baldwin
2e025791ce Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
  systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
  indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
  starting any of the APs up rather than doing it while starting up the
  APs.  This step is now where we honor MAXCPU.

MFC after:	1 week
2007-05-08 22:01:04 +00:00
John Baldwin
fb610ca1f9 Minor fixes and tweaks to the x86 interrupt code:
- Split the intr_table_lock into an sx lock used for most things, and a
  spin lock to protect intrcnt_index.  Originally I had this as a spin lock
  so interrupt code could use it to lookup sources.  However, we don't
  actually do that because it would add a lot of overhead to interrupts,
  and if we ever do support removing interrupt sources, we can use other
  means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
  determine if a source is enabled or not.  This allows us to notice when
  a source is no longer in use.  When that happens, we now invoke a new
  PIC method (pic_disable_intr()) to inform the PIC driver that the
  source is no longer in use.  The I/O APIC driver frees the APIC IDT
  vector when this happens.  The MSI driver no longer needs to have a
  hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
  of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
  complement apic_enable_vector() and use it in the I/O APIC and MSI code
  when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
  IRQ to its irq_rman.  The MSI code uses this when it creates new
  interrupt sources to let the nexus know about newly valid IRQs.
  Previously the msi_alloc() and msix_alloc() passed some extra stuff
  back to the nexus methods which then added the IRQs.  This approach is
  a bit cleaner.
- Change the MSI sx lock to a mutex.  If we need to create new sources,
  drop the lock, create the required number of sources, then get the lock
  and try the allocation again.
2007-05-08 21:29:14 +00:00
Paolo Pisati
bafe5a3118 Bring in the reminaing bits to make interrupt filtering work:
o push much of the i386 and amd64 MD interrupt handling code
  (intr_machdep.c::intr_execute_handlers()) into MI code
  (kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
  (intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
  and make them part of 'struct intr_event', passing them as arguments to
  kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
  with filter and ithread functions.

Approved by: re (implicit?)
2007-05-06 17:02:50 +00:00
Alan Cox
04a18977c8 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
John Baldwin
e706f7f0c7 Revamp the MSI/MSI-X code a bit to achieve two main goals:
- Simplify the amount of work that has be done for each architecture by
  pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indicies to IRQs so that we can allow a driver to map
  multiple MSI-X messages into a single IRQ when handling a message
  shortage.

The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
  to calculate the address and data values for a given MSI/MSI-X IRQ.
  The x86 nexus drivers map this into a call to a new 'msi_map()' function
  in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
  parameter from PCIB_ALLOC_MSIX().  MD code no longer has any knowledge
  of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
  Specifically, it now stores an array of IRQs (called "message vectors" in
  the code) that have associated address and data values, and a small
  virtual version of the MSI-X table that specifies the message vector
  that a given MSI-X table entry uses.  Sparse mappings are permitted in
  the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
  registers directly via custom bus_setup_intr() and bus_teardown_intr()
  methods.  pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
  address and data values for a given message as needed.  The MD code
  no longer has to call back down into the PCI bus code to set these
  values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
  code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
  new values of the address and data fields for a given IRQ.  The x86
  MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
  a new value of the 'address' field.
- The x86 MSI psuedo-driver loses a lot of code, and in fact the separate
  MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
  since the only remaining diff between the two is a substring in a
  bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
  entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed.  Instead of accepting
  indices for the allocated vectors, it accepts a mini-virtual table
  (with a new length parameter).  This table is an array of u_ints, where
  each value specifies which allocated message vector to use for the
  corresponding MSI-X message.  A vector of 0 forces a message to not
  have an associated IRQ.  The device may choose to only use some of the
  IRQs assigned, in which case the unused IRQs must be at the "end" and
  will be released back to the system.  This allows a driver to use the
  same remap table for different shortage values.  For example, if a driver
  wants 4 messages, it can use the same remap table (which only uses the
  first two messages) for the cases when it only gets 2 or 3 messages and
  in the latter case the PCI bus will release the 3rd IRQ back to the
  system.

MFC after:	1 month
2007-05-02 17:50:36 +00:00
Ariff Abdullah
1d80d190af Disable C1 Enhanced mode on AMD K8 Family Revision F and above to keep
local APIC timer alive.

Reviewed by:	jhb
PR:		i386/104678
MFC after:	3 days
2007-04-25 19:58:42 +00:00
John Baldwin
a5b6b9a68e Fix the triple fault used as a last resort during a reboot to actually
fault.  The previous method zero'd out the page tables, invalidated the
TLB, and then entered a spin loop.  The idea was that the instruction after
the TLB invalidate would result in a page fault and the page fault and
subsequent double fault wouldn't be able to determine the physical page
for their fault handlers' first instruction.  This stopped working when
PGE (PG_G PTE/PDE bit) support was added as a TLB invalidate via %cr3
reload doesn't clear TLB entries with PG_G set.  Thus, the CPU was still
able to map the virtual address for the spin loop and happily performed
its infinite loop.

The triple fault now uses a much more deterministic sledge-hammer approach
to generate a triple fault.  First, the IDT descriptor is set to point to
an empty IDT, so any interrupts (including a double fault) will instantly
fault.  Second, we trigger a int 3 breakpoint to force an interrupt and
kick off a triple fault.

MFC after:	3 days
2007-04-24 21:17:45 +00:00
John Baldwin
4cc968cb95 MFi386: Attempt to reset the machine using the Reset Control register and
Fast A20 and Init register if the keyboard reset doesn't work before
resorting to a triple fault.
2007-04-24 20:06:36 +00:00
Stephan Uphoff
31b4f4a916 Modify TLB invalidation handling.
Reviewed by:	alc@, peter@
MFC after:	1 week
2007-04-21 14:17:30 +00:00
Stephane E. Potvin
0e5179e441 Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
Jung-uk Kim
f1753e0585 Fix style(9) and comments.
Submitted by:	Scot Hetzel (swhetzel at gmail dot com)
2007-04-18 20:12:05 +00:00
Jung-uk Kim
d477452eb3 style(9) says sizeof's are not be followed by a space. Fix them. 2007-04-18 18:11:32 +00:00
Jung-uk Kim
86a0e5dbb6 Implement settimeofday() for Linuxulator/amd64.
Submitted by:	Scot Hetzel (swhetzel at gmail dot com)
2007-04-18 18:08:12 +00:00
John Baldwin
88a5255bc4 Honor the BUS_DMA_NOCACHE flag to bus_dmamem_alloc() on amd64 and i386 by
mapping the pages as UC (uncacheable) using pmap_change_attr().

MFC after:	1 week
Requested by:	ariff
Reviewed by:	scottl
2007-04-17 21:05:34 +00:00
Alan Cox
0b76504872 Eliminate the misuse of PG_FRAME to truncate a virtual address to a virtual
page boundary.

Reviewed by: ru@
2007-04-13 16:07:29 +00:00
Pawel Jakub Dawidek
fef2a25971 Remove trailing '.' for consistency! 2007-04-10 21:40:13 +00:00
Pawel Jakub Dawidek
57bcf75fd2 Add UFS_GJOURNAL options to the GENERIC kernel.
Approved by:	re (kensmith)
2007-04-10 16:49:41 +00:00
Jung-uk Kim
357afa7113 MFP4: Turn emul_lock into a mutex.
Submitted by:	rdivacky
2007-04-02 18:38:13 +00:00
Jung-uk Kim
46bd727a1e Correct BB-profiling and adjust comments.
Pointed out by:	bde
Reviewed by:	bde
2007-03-31 01:47:37 +00:00
Jung-uk Kim
6a4abad780 Fix off-by-4 error in address validation for i386, reduce PCB reloading, and
fix more style(9) nits.

Pointed out by:	bde
Discussed with:	kib
Reviewd by:	bde
2007-03-30 23:19:08 +00:00
Jung-uk Kim
80f87d5e55 Fix more style(9) nits[1] and remove unnecessary use of '#if !defined(_KERNEL)'.
Pointed out by:	bde[1]
2007-03-30 19:33:53 +00:00
Jung-uk Kim
6403d3a160 Use the same wisdom of sys/i386/i386/support.s 1.97 to remove obfuscation.
Pointed out by:	bde
2007-03-30 18:27:57 +00:00
Jung-uk Kim
b5def2b6b5 MFP4: Fix style(9) nits and grammar in comments. 2007-03-30 17:27:13 +00:00
Jung-uk Kim
5e397f16cd MFP4: 114193, 114194
Dont "return" in linux_clone() after we forked the new process in a case
of problems.  Move the copyout of p2->p_pid outside the emul_lock coverage.

Submitted by:	Roman Divacky
2007-03-30 17:16:51 +00:00
Jung-uk Kim
a328699b34 MFP4: Linux futex support for amd64.
Initial patch was submitted by kib and additional work was done
by Divacky Roman.

Tested by:	emulation
2007-03-30 01:07:28 +00:00
Jung-uk Kim
3a33908404 Regen for set_thread_area. 2007-03-30 00:08:21 +00:00
Jung-uk Kim
9c5b213e51 MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.
Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by:	emulation
2007-03-30 00:06:21 +00:00
Julian Elischer
6734f35eac Implement the openat() linux syscall
Submitted by:	Roman Divacky (rdivacky@)
MFC after:	2 weeks
2007-03-29 02:11:46 +00:00
Kris Kennaway
67eae018cb Remove unnecessary giant acquisition around panic in #ifdef DIAGNOSTIC
code.

# There is some question about whether this code is even relevant any
# longer (it dates back to prehistoric times, i.e. present in r1.1),
# especially on amd64.

Reviewed by:	jhb
2007-03-26 21:45:44 +00:00
Nate Lawson
0d4ac62a35 Add an interface for drivers to be notified of changes to CPU frequency.
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change.  cpufreq_post_change provides the results of the
change (success or failure).  cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed.  Hook
in all the drivers I could find that needed it.

* TSC: update TSC frequency value.  When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq.  This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.

Reviewed by:	bde, phk
MFC after:	1 month
2007-03-26 18:03:29 +00:00
Jung-uk Kim
2be4e4713a Catch up with ACPI-CA 20070320 import. 2007-03-22 18:16:43 +00:00
John Baldwin
d66ff27773 Change the amd64, i386, and ia64 nexus drivers to setup bus space tags and
handles when activating a resource via bus_activate_resource() rather than
doing some of the work in bus_alloc_resource() and some of it in
bus_activate_resource().

One note is that when using isa_alloc_resourcev() on PC-98, drivers now
need to just use bus_release_resource() without explicitly calling
bus_deactivate_resource() first.  nyan@ has already fixed all of the PC-98
drivers.
2007-03-21 15:36:38 +00:00
John Baldwin
b8783b00f8 Add a new apic0 psuedo-device to claim memory resources for the memory
address ranges used by local and I/O APICs in the system.  Some systems
also reserve these ranges as system resources via either PnPBIOS or
ACPI, so this device currently attaches after acpi0 and legacy0 so that
the system resources are given precedence.
2007-03-20 21:53:31 +00:00
John Baldwin
95a07592ee Add a new ram0 pseudo-device that claims memory resouces for physical
addresses corresponding to system RAM.  On amd64 ram0 uses the SMAP
and claims all the type 1 SMAP regions.  On i386 ram0 uses the
dump_avail[] array.  Note that on i386 we have to ignore regions above
4G in PAE kernels since bus resources use longs.
2007-03-20 21:08:39 +00:00
Jung-uk Kim
2498f259d4 - Add macros for newly added CPUID bits in the corresponding header files.
- Use correct capticalization in xTPR as Intel uses in their documents.
- Use proper description instead of vendor code name in comment.
2007-03-20 20:22:45 +00:00
John Baldwin
ce533e82a2 Tweak the probe/attach order of devices on the x86 nexus devices.
Various BIOS-related psuedo-devices are added at an order of 5.  acpi0 is
added at an order of 10, and legacy0 is added at an order of 11.
2007-03-20 20:21:44 +00:00
John Baldwin
86f07bb052 MFi386 1.173: Display two new Intel feature bits. 2007-03-20 18:48:04 +00:00
Jung-uk Kim
ab5916a526 Add another CPUID for AMD CPUs and fix style(9) while I am here. 2007-03-12 20:27:21 +00:00
Alan Cox
c640357f04 Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file.  Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb
2007-03-11 05:54:29 +00:00
Alan Cox
a2a90145e8 Completely eliminate "avail_start". It serves no useful purpose. 2007-03-10 20:26:43 +00:00
John Baldwin
07bb813626 Defer calling lapic_init() until we've completed the 'MPTable: <...>'
printf.  Otherwise, printfs inside of lapic_init() (such as during a
verbose boot) can uglify the output.
2007-03-09 15:49:57 +00:00
Mohan Srinivasan
f9bb753844 Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
2007-03-09 04:02:38 +00:00