freebsd-dev

Author	SHA1	Message	Date
Attilio Rao	b8be27bf29	Revert part of r234723 by re-enabling the SMP protection for intr_bind() on x86. This has been requested by jhb and I strongly disagree with this, but as long as he is the x86 and interrupt subsystem maintainer I will follow his directives. The disagreement cames from what we should really consider as a public KPI. IMHO, if we really need a selection between the kernel functions, we may need an explicit protection like _KERNEL_KPI, which defines which subset of the kernel function might really be considered as part of the KPI (for thirdy part modules) and which not. As long as we don't have this mechanism I just consider any possible function as usable by thirdy part code, thus intr_bind() included. MFC after: 1 week	2012-05-03 21:44:01 +00:00
Attilio Rao	70dbd1604c	Clean up the intr* MD KPI from the SMP dependency, removing a cause of discrepancy between modules and kernel, but deal with SMP differences within the functions themselves. As an added bonus this also helps in terms of code readability. Requested by: gibbs Reviewed by: jhb, marius MFC after: 1 week	2012-04-26 20:24:25 +00:00
Andriy Gapon	f84633cdcc	bump INTRCNT_COUNT values to reflect actual numbers of IPI counters Maybe the numbers should be conditionalized on COUNT_IPIS Reviewed by: jhb MFC after: 1 week	2012-04-13 07:15:40 +00:00
John Baldwin	37b8ef16cd	Add a facility for associating optional descriptions with active interrupt handlers. This is primarily intended as a way to allow devices that use multiple interrupts (e.g. MSI) to meaningfully distinguish the various interrupt handlers. - Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate a description with an active interrupt handler setup by BUS_SETUP_INTR. It has a default method (bus_generic_describe_intr()) which simply passes the request up to the parent device. - Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports printf(9) style formatting using var args. - Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of an interrupt handler and copy the name passed to intr_event_add_handler() into that buffer instead of just saving the pointer to the name. - Add a new intr_event_describe_handler() which appends a description string to an interrupt handler's name. - Implement support for interrupt descriptions on amd64 and i386 by having the nexus(4) driver supply a custom bus_describe_intr method that invokes a new intr_describe() MD routine which in turn looks up the associated interrupt event and invokes intr_event_describe_handler(). Requested by: many Reviewed by: scottl MFC after: 2 weeks	2009-10-15 14:54:35 +00:00
John Baldwin	cebc7fb16c	Improve the handling of cpuset with interrupts. - For x86, change the interrupt source method to assign an interrupt source to a specific CPU to return an error value instead of void, thus allowing it to fail. - If moving an interrupt to a CPU fails due to a lack of IDT vectors in the destination CPU, fail the request with ENOSPC rather than panicing. - For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used on the first interrupt in a group. Moving the first interrupt in a group moves the entire group. - Use the icu_lock to protect intr_next_cpu() on x86 instead of the intr_table_lock to fix a LOR introduced in the last set of MSI changes. - Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with interrupts. Previously, binding an interrupt to a CPU only performed a privilege check if the interrupt had an interrupt thread. Interrupts without a thread could be bound by non-root users as a result. - If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. Approved by: re (kib)	2009-07-01 17:20:07 +00:00
John Baldwin	4e9dba6322	Fix kernels compiled without SMP support. Make intr_next_cpu() available for UP kernels but as a stub that always returns the single CPU's local APIC ID. Reported by: kib	2009-06-25 20:35:46 +00:00
John Baldwin	b4805f449c	- Restore the behavior of pre-allocating IDT vectors for MSI interrupts. This is mostly important for the multiple MSI message case where the IDT vectors for the entire group need to be allocated together. This also restores the assumptions made by the PCI bus code that it could invoke PCIB_MAP_MSI() once MSI vectors were allocated. - To avoid whiplash with CPU assignments, change the way that CPUs are assigned to interrupt sources on activation. Instead of assigning the CPU via pic_assign_cpu() before calling enable_intr(), allow the different interrupt source drivers to ask the MD interrupt code which CPU to use when they allocate an IDT vector. I/O APIC interrupt pins do this in their pic_enable_intr() routines giving the same behavior as before. MSI sources do it when the IDT vectors are allocated during msi_alloc() and msix_alloc(). - Change the intr_table_lock from an sx lock to a mutex. Tested by: rnoland	2009-06-25 18:13:46 +00:00
Joseph Koshy	bb471e3315	Improve robustness of NMI handling, for NMIs recognized in kernel mode. - Make the NMI handler run on its own stack (TSS_IST2). - Store the GSBASE value for each CPU just before the start of each NMI stack, permitting efficient retrieval using %rsp-relative addressing. - For NMIs taken from kernel mode, program MSR_GSBASE explicitly since one or both of MSR_GSBASE and MSR_KGSBASE can be potentially invalid. The current contents of MSR_GSBASE are saved and restored at exit. - For NMIs handled from user mode, continue to use 'swapgs' to load the per-CPU GSBASE. Reviewed by: jeff Debugging help: jeff Tested by: gnn, Artem Belevich <artemb at gmail dot com>	2009-02-03 09:01:45 +00:00
Jeff Roberson	9c8e8e3aa7	- Allocate apic vectors on a per-cpu basis. This allows us to allocate more irqs as we have more cpus. This is principally useful on systems with msi devices which may want many irqs per-cpu. Discussed with: jhb Sponsored by: Nokia	2009-01-29 09:22:56 +00:00
John Baldwin	eaf86d1678	Add preliminary support for binding interrupts to CPUs: - Add a new intr_event method ie_assign_cpu() that is invoked when the MI code wishes to bind an interrupt source to an individual CPU. The MD code may reject the binding with an error. If an assign_cpu function is not provided, then the kernel assumes the platform does not support binding interrupts to CPUs and fails all requests to do so. - Bind ithreads to CPUs on their next execution loop once an interrupt event is bound to a CPU. Only shared ithreads are bound. We currently leave private ithreads for drivers using filters + ithreads in the INTR_FILTER case unbound. - A new intr_event_bind() routine is used to bind an interrupt event to a CPU. - Implement binding on amd64 and i386 by way of the existing pic_assign_cpu PIC method. - For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up an interrupt source and binds its interrupt event to the specified CPU. MI code can currently (ab)use this by doing: intr_bind(rman_get_start(irq_res), cpu); however, I plan to add a truly MI interface (probably a bus_bind_intr(9)) where the implementation in the x86 nexus(4) driver would end up calling intr_bind() internally. Requested by: kmacy, gallatin, jeff Tested on: {amd64, i386} x {regular, INTR_FILTER}	2008-03-14 19:41:48 +00:00
John Baldwin	fb610ca1f9	Minor fixes and tweaks to the x86 interrupt code: - Split the intr_table_lock into an sx lock used for most things, and a spin lock to protect intrcnt_index. Originally I had this as a spin lock so interrupt code could use it to lookup sources. However, we don't actually do that because it would add a lot of overhead to interrupts, and if we ever do support removing interrupt sources, we can use other means to safely do so w/o locking in the interrupt handling code. - Replace is_enabled (boolean) with is_handlers (a count of handlers) to determine if a source is enabled or not. This allows us to notice when a source is no longer in use. When that happens, we now invoke a new PIC method (pic_disable_intr()) to inform the PIC driver that the source is no longer in use. The I/O APIC driver frees the APIC IDT vector when this happens. The MSI driver no longer needs to have a hack to clear is_enabled during msi_alloc() and msix_alloc() as a result of this change as well. - Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to complement apic_enable_vector() and use it in the I/O APIC and MSI code when freeing an IDT vector. - Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an IRQ to its irq_rman. The MSI code uses this when it creates new interrupt sources to let the nexus know about newly valid IRQs. Previously the msi_alloc() and msix_alloc() passed some extra stuff back to the nexus methods which then added the IRQs. This approach is a bit cleaner. - Change the MSI sx lock to a mutex. If we need to create new sources, drop the lock, create the required number of sources, then get the lock and try the allocation again.	2007-05-08 21:29:14 +00:00
John Baldwin	e706f7f0c7	Revamp the MSI/MSI-X code a bit to achieve two main goals: - Simplify the amount of work that has be done for each architecture by pushing more of the truly MI code down into the PCI bus driver. - Don't bind MSI-X indicies to IRQs so that we can allow a driver to map multiple MSI-X messages into a single IRQ when handling a message shortage. The changes include: - Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus to calculate the address and data values for a given MSI/MSI-X IRQ. The x86 nexus drivers map this into a call to a new 'msi_map()' function in msi.c that does the mapping. - Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index' parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge of the MSI-X index for a given MSI-X IRQ. - The PCI bus driver now stores more MSI-X state in a child's ivars. Specifically, it now stores an array of IRQs (called "message vectors" in the code) that have associated address and data values, and a small virtual version of the MSI-X table that specifies the message vector that a given MSI-X table entry uses. Sparse mappings are permitted in the virtual table. - The PCI bus driver now configures the MSI and MSI-X address/data registers directly via custom bus_setup_intr() and bus_teardown_intr() methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the address and data values for a given message as needed. The MD code no longer has to call back down into the PCI bus code to set these values from the nexus' bus_setup_intr() handler. - The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get new values of the address and data fields for a given IRQ. The x86 MSI code uses this when an MSI IRQ is moved to a different CPU, requiring a new value of the 'address' field. - The x86 MSI psuedo-driver loses a lot of code, and in fact the separate MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver since the only remaining diff between the two is a substring in a bootverbose printf. - The PCI bus driver will now restore MSI-X state (including programming entries in the MSI-X table) on device resume. - The interface for pci_remap_msix() has changed. Instead of accepting indices for the allocated vectors, it accepts a mini-virtual table (with a new length parameter). This table is an array of u_ints, where each value specifies which allocated message vector to use for the corresponding MSI-X message. A vector of 0 forces a message to not have an associated IRQ. The device may choose to only use some of the IRQs assigned, in which case the unused IRQs must be at the "end" and will be released back to the system. This allows a driver to use the same remap table for different shortage values. For example, if a driver wants 4 messages, it can use the same remap table (which only uses the first two messages) for the cases when it only gets 2 or 3 messages and in the latter case the PCI bus will release the 3rd IRQ back to the system. MFC after: 1 month	2007-05-02 17:50:36 +00:00
John Baldwin	4c5bec1161	Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid)) rather than local APIC IDs to keep track of CPUs which can handle interrupts.	2007-03-06 17:16:47 +00:00
Paolo Pisati	ef544f6312	o break newbus api: add a new argument of type driver_filter_t to bus_setup_intr() o add an int return code to all fast handlers o retire INTR_FAST/IH_FAST For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current Reviewed by: many Approved by: re@	2007-02-23 12:19:07 +00:00
John Baldwin	5fe82bca57	Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support. - First off, device drivers really do need to know if they are allocating MSI or MSI-X messages. MSI requires allocating powerof2() messages for example where MSI-X does not. To address this, split out the MSI-X support from pci_msi_count() and pci_alloc_msi() into new driver-visible functions pci_msix_count() and pci_alloc_msix(). As a result, pci_msi_count() now just returns a count of the max supported MSI messages for the device, and pci_alloc_msi() only tries to allocate MSI messages. To get a count of the max supported MSI-X messages, use pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix(). pci_release_msi() still handles both MSI and MSI-X messages, however. As a result of this change, drivers using the existing API will only use MSI messages and will no longer try to use MSI-X messages. - Because MSI-X allows for each message to have its own data and address values (and thus does not require all of the messages to have their MD vectors allocated as a group), some devices allow for "sparse" use of MSI-X message slots. For example, if a device supports 8 messages but the OS is only able to allocate 2 messages, the device may make the best use of 2 IRQs if it enables the messages at slots 1 and 4 rather than default of using the first N slots (or indicies) at 1 and 2. To support this, add a new pci_remap_msix() function that a driver may call after a successful pci_alloc_msix() (but before allocating any of the SYS_RES_IRQ resources) to allow the allocated IRQ resources to be assigned to different message indices. For example, from the earlier example, after pci_alloc_msix() returned a value of 2, the driver would call pci_remap_msix() passing in array of integers { 1, 4 } as the new message indices to use. The rid's for the SYS_RES_IRQ resources will always match the message indices. Thus, after the call to pci_remap_msix() the driver would be able to access the first message in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based rather than 0-based so that they will always correspond to the rid values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt). To support this API, a new PCIB_REMAP_MSIX() method was added to the pcib interface to change the message index for a single IRQ. Tested by: scottl	2007-01-22 21:48:44 +00:00
John Baldwin	fde45e231a	Sort function prototypes.	2006-12-12 19:24:45 +00:00
John Baldwin	4184900911	MD support for PCI Message Signalled Interrupts on amd64 and i386: - Add a new apic_alloc_vectors() method to the local APIC support code to allocate N contiguous IDT vectors (aligned on a M >= N boundary). This function is used to allocate IDT vectors for a group of MSI messages. - Add MSI and MSI-X PICs. The PIC code here provides methods to manage edge-triggered MSI messages as x86 interrupt sources. In addition to the PIC methods, msi.c also includes methods to allocate and release MSI and MSI-X messages. For x86, we allow for up to 128 different MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs, 16-254 for APIC PCI IRQs, and IRQ 255 is reserved). - Add pcib_(alloc\|release)_msi[x]() methods to the MD x86 PCI bridge drivers to bubble the request up to the nexus driver. - Add pcib_(alloc\|release)_msi[x]() methods to the x86 nexus drivers that ask the MSI PIC code to allocate resources and IDT vectors. MFC after: 2 months	2006-11-13 22:23:34 +00:00
John Baldwin	520ffff83e	Change the x86 interrupt code to suspend/resume interrupt controllers (PICs) rather than interrupt sources. This allows interrupt controllers with no interrupt pics (such as the 8259As when APIC is in use) to participate in suspend/resume. - Always register the 8259A PICs even if we don't use any of their pins. - Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't included. - Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC on resume. This gets suspend/resume working with APIC on UP systems. SMP still needs more work to bring the APs back to life. The MFC after is tentative. Tested by: anholt (i386) Submitted by: Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3) MFC after: 1 week	2006-10-10 23:23:12 +00:00
John Baldwin	6e20fe33ba	Oops, fix sign bug in #ifdef for value of INTRCNT_COUNT. PR: kern/99870 Submitted by: jkim MFC after: 3 days	2006-10-10 19:26:35 +00:00
John Baldwin	215e7c161a	Rework how we wire up interrupt sources to CPUs: - Throw out all of the logical APIC ID stuff. The Intel docs are somewhat ambiguous, but it seems that the "flat" cluster model we are currently using is only supported on Pentium and P6 family CPUs. The other "hierarchy" cluster model that is supported on all Intel CPUs with local APICs is severely underdocumented. For example, it's not clear if the OS needs to glean the topology of the APIC hierarchy from somewhere (neither ACPI nor MP Table include it) and setup the logical clusters based on the physical hierarchy or not. Not only that, but on certain Intel chipsets, even though there were 4 CPUs in a logical cluster, all the interrupts were only sent to one CPU anyway. - We now bind interrupts to individual CPUs using physical addressing via the local APIC IDs. This code has also moved out of the ioapic PIC driver and into the common interrupt source code so that it can be shared with MSI interrupt sources since MSI is addressed to APICs the same way that I/O APIC pins are. - Interrupt source classes grow a new method pic_assign_cpu() to bind an interrupt source to a specific local APIC ID. - The SMP code now tells the interrupt code which CPUs are avaiable to handle interrupts in a simpler and more intuitive manner. For one thing, it means we could now choose to not route interrupts to HT cores if we wanted to (this code is currently in place in fact, but under an #if 0 for now). - For now we simply do static round-robin of IRQs to CPUs when the first interrupt handler just as before, with the change that IRQs are now bound to individual CPUs rather than groups of up to 4 CPUs. - Because the IRQ to CPU mapping has now been moved up a layer, it would be easier to manage this mapping from higher levels. For example, we could allow drivers to specify a CPU affinity map for their interrupts, or we could allow a userland tool to bind IRQs to specific CPUs. The MFC is tentative, but I want to see if this fixes problems some folks had with UP APIC kernels on 6.0 on SMP machines (an SMP kernel would work fine, but a UP APIC kernel (such as GENERIC in RELENG_6) would lose interrupts). MFC after: 1 week	2006-02-28 22:24:55 +00:00
John Baldwin	333b8de537	MFi386: - Move PUSH_FRAME and POP_FRAME to asmacros.h and use PUSH_FRAME in atpic entry points. - Move PCPU_* asm macros out of the middle of the asm profiling macros. - Pass IRQ vector argument as an int rather than void * to reduce diffs with i386. - EOI the lapic in C for the lapic timer handler. - GC unused Xcpuast function. - Split IPI_STOP handling code of ipi_nmi_handler() out into a cpustop_handler() function and call it from Xcpustop rather than duplicating all the logic in assembly. - Fixup the list of symbols with interrupt frames in ddb traces. Xatpic_fastintr* have never existed on amd64, and the lapic timer handler and various IPI handlers were missing. - Use trapframe instead of intrframe for interrupt entry points (on amd64 the interrupt vector was already a separate argument, so the two frames were already identical) and GC intrframe. Submitted by: peter (3)	2005-12-08 18:33:30 +00:00
John Baldwin	c7362ff7fb	Change the x86 code to allocate IDT vectors on-demand when an interrupt source is first enabled similar to how intr_event's now allocate ithreads on-demand. Previously, we would map IDT vectors 1:1 to IRQs. Since we only have 191 available IDT vectors for I/O interrupts, this limited us to only supporting IRQs 0-190 corresponding to the first 190 I/O APIC intpins. On many machines, however, each PCI-X bus has its own APIC even though it only has 1 or 2 devices, thus, we were reserving between 24 and 32 IRQs just for 1 or 2 devices and thus 24 or 32 IDT vectors. With this change, a machine with 100 IRQs but only 5 in use will only use up 5 IDT vectors. Also, this change provides an API (apic_alloc_vector() and apic_free_vector()) that will allow a future MSI interrupt source driver to request IDT vectors for use by MSI interrupts on x86 machines. Tested on: amd64, i386	2005-11-02 20:11:47 +00:00
John Baldwin	e0f66ef861	Reorganize the interrupt handling code a bit to make a few things cleaner and increase flexibility to allow various different approaches to be tried in the future. - Split struct ithd up into two pieces. struct intr_event holds the list of interrupt handlers associated with interrupt sources. struct intr_thread contains the data relative to an interrupt thread. Currently we still provide a 1:1 relationship of events to threads with the exception that events only have an associated thread if there is at least one threaded interrupt handler attached to the event. This means that on x86 we no longer have 4 bazillion interrupt threads with no handlers. It also means that interrupt events with only INTR_FAST handlers no longer have an associated thread either. - Renamed struct intrhand to struct intr_handler to follow the struct intr_foo naming convention. This did require renaming the powerpc MD struct intr_handler to struct ppc_intr_handler. - INTR_FAST no longer implies INTR_EXCL on all architectures except for powerpc. This means that multiple INTR_FAST handlers can attach to the same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach to the same interrupt. Sharing INTR_FAST handlers may not always be desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun either. Drivers can always still use INTR_EXCL to ask for an interrupt exclusively. The way this sharing works is that when an interrupt comes in, all the INTR_FAST handlers are executed first, and if any threaded handlers exist, the interrupt thread is scheduled afterwards. This type of layout also makes it possible to investigate using interrupt filters ala OS X where the filter determines whether or not its companion threaded handler should run. - Aside from the INTR_FAST changes above, the impact on MD interrupt code is mostly just 's/ithread/intr_event/'. - A new MI ddb command 'show intrs' walks the list of interrupt events dumping their state. It also has a '/v' verbose switch which dumps info about all of the handlers attached to each event. - We currently don't destroy an interrupt thread when the last threaded handler is removed because it would suck for things like ppbus(8)'s braindead behavior. The code is present, though, it is just under #if 0 for now. - Move the code to actually execute the threaded handlers for an interrrupt event into a separate function so that ithread_loop() becomes more readable. Previously this code was all in the middle of ithread_loop() and indented halfway across the screen. - Made struct intr_thread private to kern_intr.c and replaced td_ithd with a thread private flag TDP_ITHREAD. - In statclock, check curthread against idlethread directly rather than curthread's proc against idlethread's proc. (Not really related to intr changes) Tested on: alpha, amd64, i386, sparc64 Tested on: arm, ia64 (older version of patch by cognet and marcel)	2005-10-25 19:48:48 +00:00
Peter Wemm	b6e89c6d47	JumboMFi386: use bitmapped IPI handler. Update elcr and default mptable config handler. Tidy up various local apic initialization.	2005-01-21 06:01:20 +00:00
Peter Wemm	1c0dea0f6e	Sync with i386 - Optimize intr_execute_handlers a bit etc.	2004-08-16 23:12:30 +00:00
Peter Wemm	463e5aa66e	MFi386: numerous interrupt and acpi updates	2004-05-16 20:30:47 +00:00
John Baldwin	3ab2ba59f4	Shuffle the APIC interrupt vectors around a bit: - Move the IPI and local APIC interrupt vectors up into the 0xf0 - 0xff range. The pmap lazyfix IPI was reordered down next to the TLB shootdowns to avoid conflicting with the spurious interrupt vector. - Move the base of APIC interrupts up 16 so that the first 16 APIC interrupts do not overlap the vectors used by the ATPIC. - Remove bogus interrupt vector reservations for LINT[01]. - Now that 0xc0 - 0xef are available, use them for device interrupts. This increases the number of APIC device interrupts to 191. - Increase the system-wide number of global interrupts to 191 to catch up to more APIC interrupts. Requested by: peter (2)	2003-11-14 19:10:13 +00:00
John Baldwin	ecee5704ed	New device interrupt code. This defines an interrupt source abstraction that provides methods via a PIC driver to do things like mask a source, unmask a source, enable it when the first interrupt handler is added, etc. The interrupt code provides a table of interrupt sources indexed by IRQ numbers, or vectors. These vectors are what new-bus uses for its IRQ resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt code then maps that vector a given interrupt source object. When an interrupt comes in, the low-level interrupt code looks up the interrupt source for the source that triggered the interrupt and hands it off to this code to execute the appropriate handlers. By having an interrupt source abstraction, this allows us to have different types of interrupt source providers within the shared IRQ address space. For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1 through 60 may map to pins on various I/O APICs, and IRQs 120 through 128 may map to MSI interrupts for various PCI devices.	2003-11-03 21:25:52 +00:00

28 Commits