freebsd-dev

Author	SHA1	Message	Date
Marius Strobl	fafda37b15	In total store order (TSO), which we use for running the kernel and all of the userland, atomic operations behave as if they were followed by a memory barrier, so there's no need to include one in the acquire variants of atomic(9). Removing these results in a small performance improvement; specifically, this is sufficient to compensate for the performance loss seen in the worldstone benchmark when using SCHED_ULE instead of SCHED_4BSD. This change is inspired by Linux even more radically doing the equivalent thing some time ago. Thanks go to Peter Jeremy for additional testing.	2011-10-01 00:11:03 +00:00
Marius Strobl	ade68e910d	Use the extended integer condition code when comparing 64-bit values. Given that ATOMIC_INC_LONG currently is unused this happened to not be fatal.	2011-09-30 20:13:51 +00:00
Marius Strobl	6fd7e2b7c6	- Right-justify backslashes as suggested by style(9). - Rename ATOMIC_INC_ULONG to ATOMIC_INC_LONG in order to be consistent with the names of the other macros in this file and adjust accordingly.	2011-09-30 20:06:23 +00:00
Marius Strobl	1b57ae60a7	Merge from r224217: Bump MAXCPU to 64. Approved by: re (kib)	2011-07-20 18:51:18 +00:00
Attilio Rao	68b739cd6f	Add the possibility to specify from kernel configs MAXCPU value. This patch is going to help in cases like mips flavours where you want a more granular support on MAXCPU. No MFC is previewed for this patch. Tested by: pluknet Approved by: re (kib)	2011-07-19 00:37:24 +00:00
Marius Strobl	0e5b645f76	- pmap_cache_remove() and pmap_protect_tte() are only used within pmap.c so static'ize them. - Correct a typo.	2011-07-05 18:50:40 +00:00
Marius Strobl	4a35efc720	- For Cheetah- and Zeus-class CPUs don't flush all unlocked entries from the TLBs in order to get rid of the user mappings but instead traverse them and flush only the latter like we also do for the Spitfire-class. Also flushing the unlocked kernel entries can cause instant faults which when called from within cpu_switch() are handled with the scheduler lock held which in turn can cause timeouts on the acquisition of the lock by other CPUs. This was easily seen with a 16-core V890 but occasionally also happened with 2-way machines. While at it, move the SPARC64-V support code entirely to zeus.c. This causes a little bit of duplication but is less confusing than partially using Cheetah-class bits for these. - For SPARC64-V ensure that 4-Mbyte page entries are stored in the 1024-entry, 2-way set associative TLB. - In {d,i}tlb_get_data_sun4u() turn off the interrupts in order to ensure that ASI_{D,I}TLB_DATA_ACCESS_REG actually are read twice back-to-back. Tested by: Peter Jeremy (16-core US-IV), Michael Moll (2-way SPARC64-V)	2011-07-02 11:14:54 +00:00
Marius Strobl	915d84ba38	Fix whitespace	2011-06-21 20:50:55 +00:00
Marius Strobl	0e3d1b3853	On machines where we don't need to lock the kernel TSB into the dTLB and thus may basically use the entire 64-bit kernel address space reduce VM_KMEM_SIZE_SCALE to 1 allowing kernel to use more memory.	2011-06-21 20:48:14 +00:00
Marius Strobl	82f131f39b	Don't include curcpu in the mask which is used as the IPI cookie as we have to ignore it when sending the IPI anyway. Actually I can't think of a good reason why this ever was done that way in the first place as it's not even useful for debugging. While at it replace the use of pc_other_cpus as it's slated for deorbit.	2011-06-15 22:41:55 +00:00
Marius Strobl	c40847145b	Adapt CATR() to r222813. This is somewhat tricky as we can't afford using more than three temporary registers in several places CATR() is used so this code trades instructions in for registers. Actually, this still isn't sufficient and CATR() has the side-effect of clobbering %y. Luckily, with the current uses of CATR() this either doesn't matter or we are able to (save and) restore it. Now that there's only one use of AND() and TEST() left, inline these.	2011-06-07 17:33:39 +00:00
Attilio Rao	e370959707	Fix KTR_CPUMASK in order to accept a string representing a cpuset_t. This introduces all the underlying support for making this possible (via the function cpusetobj_strscan()) and keeps ktr_cpumask exported. sparc64 implements its own assembly primitives for tracing events and needs to properly check it. Anyway the sparc64 logic is not implemented yet due to lack of knowledge (by me) and time (by marius), but it is just a matter of using ktr_cpumask when possible. Tested and fixed by: pluknet Reviewed by: marius	2011-05-31 20:48:58 +00:00
Attilio Rao	d0984adc98	Revert a change that crept in during MFC.	2011-05-31 20:23:33 +00:00
Attilio Rao	5b6ea0b538	MFC	2011-05-31 14:18:10 +00:00
Attilio Rao	217e1c0ebc	Revert a patch that involuntarily sneaked in while I was MFCing.	2011-05-23 23:50:21 +00:00
Attilio Rao	a9ff18a210	MFC	2011-05-23 01:17:30 +00:00
Attilio Rao	b2aa562e7b	MFC	2011-05-13 20:58:48 +00:00
Matthew D Fleming	cfb00e5aa7	Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk. Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware. Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment). Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit. Requested by: alc MFC after: 1 week MFC with: r221853	2011-05-13 19:35:01 +00:00
Attilio Rao	ef607a6aa3	MFC	2011-05-12 14:01:40 +00:00
Marius Strobl	0fd4b3388e	The ita_mask should include curcpu but the cpuset passed to cpu_ipi_selected() must not, otherwise we tell the CPU to IPI itself, which the sun4u CPUs don't support. For reasons unknown so far MD and MI IPI use actually still triggers that assertion though.	2011-05-11 21:15:12 +00:00
Marius Strobl	707b4f4479	Add an ATOMIC_CLEAR_LONG.	2011-05-10 21:18:45 +00:00
Attilio Rao	0d9fa7bd31	Add sparc64 support. Compiled (and helped) by: pluknet	2011-05-06 21:53:29 +00:00
Marius Strobl	39272630aa	Correct spelling in comments. Submitted by: brucec	2011-04-22 09:31:40 +00:00
Marius Strobl	3a8a826af3	Remove the advertising clause from the UCB license according to the July 22, 1999 addendum.	2011-03-13 13:42:43 +00:00
Marius Strobl	273fb3dc7c	Sync licenses and the corresponding RCS IDs with NetBSD, mainly switching the licenses of Matthew R. Green and the TNF to 2-clause. Obtained from: NetBSD	2011-03-12 14:33:32 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Alan Cox	e6ffa21488	Remove pmap fields that are either unused or not fully implemented. Discussed with: kib	2011-02-17 15:36:29 +00:00
Jung-uk Kim	2fea643112	Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set(). Compile sys/dev/mem/memutil.c for all supported platforms and remove now unnecessary dev_mem_md_init(). Consistently define mem_range_softc from mem.c for all platforms. Add missing #include guards for machine/memdev.h and sys/memrange.h. Clean up some nearby style(9) nits. MFC after: 1 month	2011-01-17 22:58:28 +00:00
Konstantin Belousov	50a57dfbec	Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h. Update the outdated comments describing MAXSLP and the process selection algorithm for swap out. Comments wording and reviewed by: alc	2011-01-09 12:50:44 +00:00
David Schultz	633bd99821	Fix the value for DECIMAL_DIG on UltraSparcs. The previous value of 35 wasn't quite big enough to ensure correct rounding for very-close-to-halfway cases.	2011-01-09 06:05:48 +00:00
Tijl Coosemans	a56e818f29	On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and corresponding macros) are different from 32 bit. [1] Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX. Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition for (u)intmax_t. Do this on all architectures for consistency. Suggested by: bde [1] Approved by: kib (mentor)	2011-01-08 12:43:05 +00:00
Tijl Coosemans	9858863cd4	Fix types of some values in machine/_limits.h. On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int. However, lacking integer suffixes for types smaller than int, their type should correspond to that of an object of type unsigned char (or short) when used in an expression with objects of type int. In that case unsigned char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and USHRT_MAX should also be int. Where MIN/MAX constants implicitly have the correct type the suffix has been removed. While here, correct some comments. Reviewed by: bde Approved by: kib (mentor)	2011-01-08 11:13:34 +00:00
Konstantin Belousov	39198f15ee	Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the initial stack protection set by the kernel image activator.	2011-01-07 14:22:34 +00:00
Marius Strobl	f4ff513c4b	Reserve INTR_MD[1-4] similarly to what BUS_DMA_BUS[1-4] are intended for and switch sparc64 to use the first one for bus error filter handlers of bridge drivers instead of (ab)using INTR_FAST for that so we eventually can get rid of the latter. Reviewed by: jhb MFC after: 1 month	2011-01-04 16:11:32 +00:00
Marius Strobl	4d05e7b184	On UltraSPARC-III+ and greater take advantage of ASI_ATOMIC_QUAD_LDD_PHYS, which takes a physical address instead of a virtual one, for loading TTEs of the kernel TSB so we no longer need to lock the kernel TSB into the dTLB, which only has a very limited number of lockable dTLB slots. The net result is that we now basically can handle a kernel TSB of any size and no longer need to limit the kernel address space based on the number of dTLB slots available for locked entries. Consequently, other parts of the trap handlers now also only access the kernel TSB via its physical address in order to avoid nested traps, as does the PMAP bootstrap code as we haven't taken over the trap table at that point, yet. Apart from that the kernel TSB now is accessed via a direct mapping when we are otherwise taking advantage of ASI_ATOMIC_QUAD_LDD_PHYS so no further code changes are needed. Most of this is implemented by extending the patching of the TSB addresses and mask as well as the ASIs used to load it into the trap table so the runtime overhead of this change is rather low. Currently the use of ASI_ATOMIC_QUAD_LDD_PHYS is not yet enabled on SPARC64 CPUs due to lack of testing and due to the fact it might require minor adjustments there. Theoretically it should be possible to use the same approach also for the user TSB, which already is not locked into the dTLB, avoiding nested traps. However, for reasons I don't understand yet OpenSolaris only does that with SPARC64 CPUs. On the other hand I think that also addressing the user TSB physically and thus avoiding nested traps would get us closer to sharing this code with sun4v, which only supports trap level 0 and 1, so eventually we could have a single kernel which runs on both sun4u and sun4v (as does Linux and OpenBSD). Developed at and committed from: 27C3	2010-12-29 16:59:33 +00:00
Marius Strobl	62cf53e2ea	- Move the macros for generating load and store instructions to asmacros.h so they can be shared by different source files and extend them by a variant for atomic compare and swap. - Consistently use EMPTY.	2010-12-29 14:14:50 +00:00
Marius Strobl	b5b0068b4b	Rename the "xor" parameter to "xorval" as the former is a reserved keyword in C++. Submitted by: gahr	2010-12-29 14:11:46 +00:00
Marius Strobl	05bcfef170	Extend the hack of r182730 to trick GAS/GCC into compiling access to STICK/STICK_COMPARE independently of the selected instruction set by TICK_COMPARE so tick.c as of r214358 once again can be compiled with gcc -mcpu=v9 for reference purposes.	2010-12-21 22:03:12 +00:00
Marius Strobl	3318c3ef45	Revert r216080 so kmem_map is capped at 3/5 of the currently rather modest kernel address space in order to leave space for the buffer cache, pipes, thread stacks, etc on machines with more physical memory until we take advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs providing it so we don't need to lock the kernel TSB pages into the dTLB, basically making the entire 64-bit kernel address space available on relevant machines. Submitted by: alc	2010-12-21 21:32:17 +00:00
Rebecca Cran	c90f7d9b44	Revert r216134. This checkin broke platforms where bus_space are macros: they need to be a single statement, and do { } while (0) doesn't work in this situation so revert until a solution can be devised.	2010-12-03 07:09:23 +00:00
Rebecca Cran	15b4888a24	Disallow passing in a count of zero bytes to the bus_space(9) functions. Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM causes a crash/hang since the 'loop' instruction decrements the counter before checking if it's zero. PR: kern/80980 Discussed with: jhb	2010-12-02 22:19:30 +00:00
Max Khon	5bdabbdbf8	Change VM_KMEM_SIZE_MAX to be just (VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) Suggested by: marius	2010-11-30 16:49:06 +00:00
Max Khon	311e93395e	Define VM_KMEM_SIZE_MAX on sparc64. Otherwise kernel built with DEBUG_MEMGUARD panics early in kmeminit() with the message "kmem_suballoc: bad status return of 1" because of zero "size" argument passed to kmem_suballoc() due to "vm_kmem_size_max" being zero. The problem also exists on ia64.	2010-11-28 19:26:20 +00:00
Alan Cox	2cf36c8f67	Enable reservation-based physical memory allocation. Even without the creation of large page mappings in the pmap, it can provide modest performance benefits. In particular, for a "buildworld" on a 2x 1GHz Ultrasparc IIIi it reduced the wall clock time by 2.2% and the system time by 12.6%. Tested by: marius@	2010-11-10 17:57:34 +00:00
John Baldwin	961135ead8	- Remove <machine/mutex.h>. Most of the headers were empty, and the contents of the ones that were not empty were stale and unused. - Now that <machine/mutex.h> no longer exists, there is no need to allow it to override various helper macros in <sys/mutex.h>. - Rename various helper macros for low-level operations on mutexes to live in the _mtx_* or __mtx_* namespaces. While here, change the names to more closely match the real API functions they are backing. - Drop support for including <sys/mutex.h> in assembly source files. Suggested by: bde (1, 2)	2010-11-09 20:46:41 +00:00
Marius Strobl	10c2bb0a10	- Wrap exchanging td_intr_frame and calling the event timer callback in a critical section as apparently required by both. I don't think either belongs in the event timer front-ends but the callback should handle this as necessary instead just like for example intr_event_handle() does but this is how the other architectures currently handle it, either explicitly or implicitly. - Further rename and reword references to hardclock as this front-end no longer has a notion of actually calling it.	2010-10-19 19:44:05 +00:00
Marius Strobl	1fe259cdd8	In the replacement text of the __bswapN_const() macros cast the argument to the expected type so they work like the corresponding __bswapN_var() functions and the compiler doesn't complain when arguments of different width are passed.	2010-10-08 14:59:45 +00:00
Marius Strobl	2c55431721	Add a VIS-based block copy function for SPARC64 V and later, which additionally takes advantage of the prefetch cache of these CPUs. Unlike the uncommitted US-III version, which provided no measurable speedup or even resulted in a slight slowdown on certain CPU models compared to using the US-I version with these, the SPARC64 version actually results in a slight improvement.	2010-09-15 21:44:31 +00:00
Marius Strobl	c1769fad32	Add macros for alternate entry points.	2010-09-15 21:11:29 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When the CPU is busy, interrupts are generated at the full rate of hz + stathz to fulfill scheduler and timekeeping requirements. But when the CPU is idle, only a minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts, is executed. This allows significantly increasing idle CPU sleep time, increasing the effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when the guest system is idle. There is a set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there are up to 4 different kinds of timers. Depending on whether the chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware is taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program the timer to generate an interrupt exactly at the specified time. Default value depends on chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how many times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU receive every timer interrupt independently of whether they are busy or not. By default this option is disabled.
If the chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generated. Since this patch modifies cpu_idle() on some platforms, I have also refactored the one on x86. Now it makes use of MONITOR/MWAIT instructions (if supported) under high sleep/wakeup rate, as a fast alternative to other methods. It allows the SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerpc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Konstantin Belousov	ee235befcb	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month	2010-08-17 08:55:45 +00:00
John Baldwin	60c7b36b7a	Update various places that store or manipulate CPU masks to use cpumask_t instead of int or u_int. Since cpumask_t is currently u_int on all platforms this should just be a cosmetic change.	2010-08-11 23:22:53 +00:00
Marius Strobl	553cf1a13c	- As it is not possible for sched_bind(9) to context switch with td_critnest > 1 when not already running on the desired CPU read the TICK counter of the BSP via a direct cross trap request in that case instead. - Treat the STICK based timecounter the same way as the TICK based one regarding its quality and obtaining the counter value from the BSP. Like the TICK timers the STICK ones also are only synchronized during their startup (which might not result in good synchronicity in the first place) but not afterwards and might drift over time, causing problems when the time is read from different CPUs (see r135972).	2010-08-08 14:00:21 +00:00
Marius Strobl	dfab2088e8	- Introduce a cpu_ipi_single() function pointer in order to send IPIs to single CPUs more efficiently with Cheetah(-class) and Jalapeno CPUs. Besides being used to implement the ipi_cpu() introduced in r210939, cpu_ipi_single() will also be used internally by the sparc64 MD code. - Factor out the Jalapeno support from the Cheetah IPI send functions in order to be able to more easily and efficiently implement support for more than 32 target CPUs as well as a workaround for Cheetah+ erratum 25 for the latter.	2010-08-08 00:09:22 +00:00
Marius Strobl	820a9ea5cb	For CPUs which ignore TD_CV and support hardware unaliasing don't bother doing page coloring. This results in a small but measurable performance improvement in buildworld times.	2010-08-08 00:01:08 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
Alexander Motin	6c8dd81fa9	Adapt sparc64 and sun4v timer code for the new event timers infrastructure. Reviewed by: marius@	2010-07-29 12:08:46 +00:00
John Baldwin	a3870a1826	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfilling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
Attilio Rao	651aa2d896	KTR_CTx are long time aliased by existing classes so they can't serve their purpose anymore. Axe them out. Sponsored by: Sandvine Incorporated Discussed with: jhb, emaste Possible MFC: TBD	2010-07-21 10:05:07 +00:00
Alexander Motin	a448e0d827	Allocate proper amount of memory for interrupt names on sparc64 and sun4v, same as done on other architectures. This removes garbage from `vmstat -ia` output. Reviewed by: marius@	2010-07-16 22:09:29 +00:00
Marius Strobl	9f7666cebe	- Pin the IPI cache and TLB demap functions in order to prevent migration between determining the other CPUs and calling cpu_ipi_selected(), which apart from generally doing the wrong thing can lead to a panic when a CPU is told to IPI itself (which sun4u doesn't support). Reported and tested by: Nathaniel W Filardo - Add __unused where appropriate. MFC after: 3 days	2010-07-04 12:43:12 +00:00
Konstantin Belousov	afe1a68827	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls the syscall implementation from the ABI sysent table, and sets up the return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
Marius Strobl	4461491b3e	Change ad_firmware_geom_adjust() to operate on a struct disk * only and hook it up to ada(4) also. While at it, rename ad_firmware_geom_adjust() to ata_disk_firmware_geom_adjust() etc now that these are no longer limited to ad(4). Reviewed by: mav MFC after: 3 days	2010-05-20 12:46:19 +00:00
Marius Strobl	5a8336816e	Add support for SPARC64 V (and where it already makes sense for other HAL/Fujitsu) CPUs. For the most part this consists of fleshing out the MMU and cache handling, it doesn't add pmap optimizations possible with these CPUs, yet, though. With these changes FreeBSD runs stable on Fujitsu Siemens PRIMEPOWER 250 and likely also other models based on SPARC64 V like 450, 650 and 850. Thanks go to Michael Moll for providing access to a PRIMEPOWER 250.	2010-05-02 19:38:17 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Konstantin Belousov	8bac98182a	Style: use #define<TAB> instead of #define<SPACE>. Noted by: bde, pluknet gmail com MFC after: 11 days	2010-04-27 09:48:43 +00:00
Marius Strobl	e2f198273c	Add OF_getscsinitid(), a helper similar to OF_getetheraddr() but for obtaining the initiator ID to be used for SPI controllers from the Open Firmware device tree.	2010-04-26 19:13:10 +00:00
Konstantin Belousov	ed7806879b	Move the constants specifying the size of struct kinfo_proc into machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. Submitted by: pluknet Reviewed by: imp, jhb, nwhitehorn MFC after: 2 weeks	2010-04-24 12:49:52 +00:00
Marius Strobl	3c7ae7bf67	Update for UltraSPARC-IV{,+} and SPARC64 V, VI, VII and VIIIfx CPUs.	2010-04-11 15:35:17 +00:00
Marius Strobl	5679850859	Correct the DCR_IPE macro to refer to the right bit. Also improve the associated comment as besides US-IV+ these bits are only available with US-III++, i.e. the 1.2GHz version of the US-III+.	2010-04-10 11:13:51 +00:00
Marius Strobl	07e6e81e2f	- The firmware of Sun Fire V1280 has a misfeature of setting %wstate to 7 which corresponds to WSTATE_KMIX in OpenSolaris whenever calling into it which totally screws us even when restoring %wstate afterwards as spill/fill traps can happen while in OFW. The rather hackish OpenBSD approach of just setting the equivalent of WSTATE_KERNEL to 7 also is no option as we treat %wstate as a bit field. So in order to deal with this problem actually implement spill/fill handlers for %wstate 7 which just act as the WSTATE_KERNEL ones except of theoretically also handling 32-bit, turn off interrupts completely so we don't even take IPIs while in OFW which should ensure we only take spill/fill traps at most and restore %wstate after calling into OFW once we have taken over the trap table. While at it, actually set WSTATE_{,PROM}_KMIX before calling into OFW just like OpenSolaris does, which should at least help testing this change on non-V1280. - Remove comments referring to the %wstate usage in BSD/OS. - Remove the no longer used RSF_ALIGN_RETRY macro. - Correct some trap table addresses in comments. - Ensure %wstate is set to WSTATE_KERNEL when taking over the trap table. - Ensure PSTATE_AM is off when entering or exiting to OFW as well as that interrupts are also completely off when exiting to OFW as the firmware trap table shouldn't be used to handle our interrupts.	2010-03-21 13:09:54 +00:00
Marius Strobl	ddcc3ff59e	o Add support for UltraSparc-IV+: - Swap the configuration of the first and second large dTLB as with US-IV+ these can only hold entries of certain page sizes each, which we happened to choose the non-working way around. - Additionally ensure that the large iTLB is set up to hold 8k pages (currently this happens to be a NOP though). - Add a workaround for US-IV+ erratum #2. - Turn off dTLB parity error reporting as otherwise we get seemingly false positives when copying in the user window by simulating a fill trap on return to usermode. Given that these parity errors can be avoided by disabling multi issue mode and the problem could be reproduced with a second machine this appears to be a silicon bug of some sort. - Add a membar #Sync also before the stores to ASI_DCACHE_TAG. While at it, turn off interrupts across the whole cheetah_cache_flush() for simplicity instead of around every flush. This should have next to no impact as for cheetah-class machines we typically only need to flush the caches a few times during boot when recovering from peeking/poking non-existent PCI devices, if at all. - Just use KERNBASE for FLUSH as we also do elsewhere as the US-IV+ documentation doesn't seem to mention that these CPUs also ignore the address like previous cheetah-class CPUs do. Again the code changing LSU_IC is executed seldom enough that the negligible optimization of using %g0 instead should have no real impact. With these changes FreeBSD runs stable on V890 equipped with US-IV+ and -j128 buildworlds in a loop for days are no problem. Unfortunately, the performance isn't where it should be as a buildworld on a 4x1.5GHz US-IV+ V890 takes nearly 3h while on a V440 with (theoretically) less powerful 4x1.5GHz US-IIIi it takes just over 1h. It's unclear whether this is related to the supposed silicon bug mentioned above or due to another issue.
The documentation (which contains a severe bug in the description of the bits added to the context registers though) at least doesn't mention any requirements for changes in the CPU handling besides those implemented and the cache as well as the TLB configurations and handling look fine. o Re-arrange cheetah_init() so it's easier to add support for SPARC64 V up to VIIIfx CPUs, which only require parts of this initialization.	2010-03-17 22:45:09 +00:00
Marius Strobl	bc11f2d90f	Add macros for the VER.impl of SPARC64 II to VIIIfx.	2010-03-17 21:00:39 +00:00
Marius Strobl	319efdb1cc	- Add TTE and context register bits for the additional page sizes supported by UltraSparc-IV and -IV+ as well as SPARC64 V, VI, VII and VIIIfx CPUs. - Replace TLB_PCXR_PGSZ_MASK and TLB_SCXR_PGSZ_MASK with TLB_CXR_PGSZ_MASK which just is the complement of TLB_CXR_CTX_MASK instead of trying to assemble it from the page size bits which vary across CPUs. - Add macros for the remainder of the SFSR bits, which are useful for at least debugging purposes.	2010-03-17 20:23:14 +00:00
Joel Dahl	1edcf74de7	The NetBSD Foundation has granted permission to remove clause 3 and 4 from the software. Obtained from: NetBSD	2010-03-03 17:55:51 +00:00
Marius Strobl	ba96c16ae8	Some machines can not only consist of CPUs running at different speeds but also of different types, f.e. Sun Fire V890 can be equipped with a mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization and different workarounds for model specific errata. Therefore move the CPU implementation number from a global variable to the per-CPU data. Functions which are called before the latter is available are passed the implementation number as a parameter now. This file was missed in r204152.	2010-02-21 09:25:53 +00:00
Marius Strobl	9b824f84d5	Some machines can not only consist of CPUs running at different speeds but also of different types, f.e. Sun Fire V890 can be equipped with a mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization and different workarounds for model specific errata. Therefore move the CPU implementation number from a global variable to the per-CPU data. Functions which are called before the latter is available are passed the implementation number as a parameter now.	2010-02-20 23:24:19 +00:00
Marius Strobl	a675da796f	Predict KASSERTs to be true.	2010-02-13 19:17:06 +00:00
Marius Strobl	527eebfeeb	- Add the 'cmp' and 'core' pseudo-busses which are used to group CPU cores to the exclusion lists as the CPU nodes aren't handled as regular devices either. Also add the pseudo-devices found in Sun Fire V1280. - Allow nexus_attach() and nexus_alloc_resource() to be used by drivers derived from nexus(4) for subordinate busses. - Don't add the zero-sized memory resources of glue devices to the resource lists.	2010-02-13 18:51:49 +00:00
Marius Strobl	ac144cadbc	Resurrect nexusvar.h from r167307.	2010-02-13 18:18:45 +00:00
Marius Strobl	c61b6da840	- Search the whole OFW device tree instead of only the children of the root nexus device for the CPUs as starting with UltraSPARC IV the 'cpu' nodes hang off of 'cmp' (chip multi-threading processor) or 'core' or combinations thereof. Also in large UltraSPARC III based machines the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which group snooping-coherency domains together instead of directly from the nexus. It would be great if we could use newbus to deal with the different ways the 'cpu' devices can hang off of pseudo ones but unfortunately both cpu_mp_setmaxid() and sparc64_init() have to work prior to regular device probing. - Add support for UltraSPARC IV and IV+ CPUs. Due to the fact that these are multi-core each CPU has two Fireplane config registers and thus the module/target ID has to be determined differently so the one specific to a certain core is used. Similarly, starting with UltraSPARC IV the individual cores use a different property in the OFW device tree to indicate the CPU/core ID as it no longer coincides with the shared slot/socket ID. This involves changing the MD KTR code to not directly read the UPA module ID either. We use the MID stored in the per-CPU data instead of calling cpu_get_mid() as a replacement in order to prevent clobbering any registers as a side-effect in the assembler version. This requires CATR() invocations from mp_startup() prior to mapping the per-CPU pages to be removed though. While at it additionally distinguish between CPUs with Fireplane and JBus interconnects as these also use slightly different sizes for the JBus/agent/module/target IDs. - Make sparc64_shutdown_final() static as it's not used outside of machdep.c.	2010-02-13 16:52:33 +00:00
Marius Strobl	4f607e8ef2	- Assert that HEAPSZ is a multiple of PAGE_SIZE as at least the firmware of Sun Fire V1280 doesn't round up the size itself but instead lets claiming of non page-sized amounts of memory fail. - Change parameters and variables related to the TLB slots to unsigned which is more appropriate. - Search the whole OFW device tree instead of only the children of the root nexus device for the BSP as starting with UltraSPARC IV the 'cpu' nodes hang off of 'cmp' (chip multi-threading processor) or 'core' or combinations thereof. Also in large UltraSPARC III based machines the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which group snooping-coherency domains together instead of directly from the nexus. - Add support for UltraSPARC IV and IV+ BSPs. Due to the fact that these are multi-core each CPU has two Fireplane config registers and thus the module/target ID has to be determined differently so the one specific to a certain core is used. Similarly, starting with UltraSPARC IV the individual cores use a different property in the OFW device tree to indicate the CPU/core ID as it no longer coincides with the shared slot/socket ID. While at it additionally distinguish between CPUs with Fireplane and JBus interconnects as these also use slightly different sizes for the JBus/agent/module/target IDs. - Check the return value of init_heap(). This requires moving it after cons_probe() so we can panic when appropriate. This should be fine as the PowerPC OFW loader has used that order for quite some time now.	2010-02-13 14:13:39 +00:00
Marius Strobl	3438bdc53e	Merge from amd64/i386: Implement support for interrupt descriptions.	2009-12-24 15:43:37 +00:00
Marius Strobl	6ed76228c1	- Add support for the IOMMUs of Fire JBus to PCIe and Oberon Uranus to PCIe bridges. - Add support for taking the PROM mappings over to the kernel IOTSB just like we do with the kernel TSB in order to allow OFW drivers to continue to work. - Change some members, parameters and variables to unsigned where more appropriate.	2009-12-23 22:02:34 +00:00
Marius Strobl	46c9b5d9bd	Fix whitespace according to style(9).	2009-12-23 21:51:41 +00:00
Marius Strobl	8bf72e61a7	- Add macros for the states of the interrupt clear registers. - Change INTMAP_VEC() to take an INO as its second argument rather than an INR. The former is what I actually intended with this macro and how it's currently used.	2009-12-22 21:48:18 +00:00
Marius Strobl	ea77e7bb3f	Make these constants unsigned which is more appropriate.	2009-12-22 21:42:54 +00:00
Konstantin Belousov	a7b890448c	Extract the code that records syscall results in the frame into MD function cpu_set_syscall_retval(). Suggested by: marcel Reviewed by: marcel, davidxu PowerPC, ARM, ia64 changes: marcel Sparc64 tested and reviewed by: marius, also sunv reviewed MIPS tested by: gonzo MFC after: 1 month	2009-11-10 11:43:07 +00:00
Marius Strobl	bed992dc26	Sync with the other archs and wrapper the prototype of in_cksum_skip(9) in #ifdef _KERNEL. Submitted by: Ulrich Spoerlein MFC after: 1 month	2009-10-26 22:00:26 +00:00
Marius Strobl	a430b967b5	Change the load base to below 2GB so PIE binaries work including when compiled to use the Medium/Low code model, which we currently default to for the userland. GNU/Linux has moved their default to Medium/Middle some time ago, which probably explains why the current GNU ld(1) uses a base in the range between 32 and 44 bits instead. Submitted by: kib	2009-10-18 13:08:15 +00:00
Konstantin Belousov	023063938a	Define architectural load bases for PIE binaries. Addresses were selected by looking at the bases used for non-relocatable executables by gnu ld(1), and adjusting it slightly. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time	2009-10-10 15:31:24 +00:00
Alan Cox	fe105d45a2	Add a new sysctl for reporting all of the supported page sizes. Reviewed by: jhb MFC after: 3 weeks	2009-09-18 17:04:57 +00:00
Poul-Henning Kamp	a254d1f16d	Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an architecture specific include file containing the _ALIGN* stuff which <sys/socket.h> needs.	2009-09-08 20:45:40 +00:00
Attilio Rao	dc6fbf6545	* Completely remove the option STOP_NMI from the kernel. This option has proven to have a good effect when entering KDB by using an NMI, but it completely violates all the good rules about interrupts disabled while holding a spinlock in other occasions. This can be the cause of deadlocks on events where a normal IPI_STOP is expected. * Add a new IPI called IPI_STOP_HARD on all the supported architectures. This IPI is responsible for sending a stop message among CPUs using a privileged channel when available. In other cases it just matches a normal IPI_STOP. Right now the IPI_STOP_HARD functionality uses an NMI on ia32 and amd64 architectures, while on the others it has a normal IPI_STOP effect. It is the responsibility of maintainers to eventually implement a hard stop when necessary and possible. * Use the new IPI facility in order to implement a new userend SMP kernel function called stop_cpus_hard(). It mirrors stop_cpus() but uses the privileged channel for the stopping facility. * Let KDB use the newly introduced function stop_cpus_hard() and leave stop_cpus() for all the other cases. * Disable interrupts on CPU0 when starting the process of APs suspension. * Style cleanup and added comments. This patch should fix the reboot/shutdown deadlocks many users are constantly reporting on mailing lists. Please don't forget to update your config file with the STOP_NMI option removal. Reviewed by: jhb Tested by: pho, bz, rink Approved by: re (kib)	2009-08-13 17:09:45 +00:00
Marius Strobl	fada2a867d	Add a MD __PCI_BAR_ZERO_VALID which denotes that BARs containing 0 actually specify valid bases that should be treated just as normal. The PCI specifications have no indication that 0 would be a magic value indicating a disabled BAR as commonly used on at least amd64 and i386 but not sparc64. It's unclear what to do in pci_delete_resource() instead of writing 0 to a BAR though as there's no (other) way to disable individual BARs so its decoding is left enabled in case of __PCI_BAR_ZERO_VALID for now. Approved by: re (kib), jhb MFC after: 1 week	2009-07-21 19:06:39 +00:00
Alan Cox	3153e878dd	Add support to the virtual memory system for configuring machine- dependent memory attributes: Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior. Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages. Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager: kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386. vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes. Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping. Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7. In collaboration with: jhb Approved by: re (kib)	2009-07-12 23:31:20 +00:00
Sam Leffler	8c393fd1f0	Cleanup ALIGNED_POINTER: o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v) o define as "1" on amd64 and i386 where there is no restriction o make the type returned consistent with ALIGN o remove _ALIGNED_POINTER o make associated comments consistent Reviewed by: bde, imp, marcel Approved by: re (kensmith)	2009-07-05 17:45:48 +00:00
Marius Strobl	49c8326a79	- Work around the broken loader behavior of not demapping no longer used kernel TLB slots when unloading the kernel or modules, which results in havoc when loading a kernel and modules which take up less TLB slots afterwards as the unused but locked ones aren't accounted for in virtual_avail. Eventually this should be fixed in the loader which isn't straight forward though and the kernel should be robust against this anyway. [1] - Ensure that the addresses allocated directly from phys_avail[] by pmap_bootstrap_alloc() are always colored properly. This implicit assumption was broken in r194784 as unlike the other consumers the DPCPU area allocated for the BSP isn't a multiple of PAGE_SIZE * DCACHE_COLORS. [2] - Remove the no longer used global msgbuf_phys. - Remove the redundant ekva parameter of pmap_bootstrap_alloc(). - Correct some outdated function names in ktr(9) invocations. Requested by: jhb [1] Reported by: gavin [2] Approved by: re (kib) MFC after: 2 weeks	2009-06-28 22:42:51 +00:00
Alan Cox	5797795f5a	Correct the #endif comment. Noticed by: jmallett Approved by: re (kib)	2009-06-26 16:22:24 +00:00
Alan Cox	e999111ae7	This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this change adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes. In collaboration with: jhb	2009-06-26 04:47:43 +00:00
Jeff Roberson	50c202c592	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas	2009-06-23 22:42:39 +00:00
Robert Watson	9725389e1e	Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing a fair number of static data structures, making this an unlikely option to try to change without also changing source code. [1] Change default cache line size on ia64, sparc64, and sun4v to 128 bytes, as this was what rtld-elf was already using on those platforms. [2] Suggested by: bde [1], jhb [2] MFC after: 2 weeks	2009-04-20 12:59:23 +00:00
Robert Watson	22037b2d2c	Add description and cautionary note regarding CACHE_LINE_SIZE. MFC after: 2 weeks Suggested by: alc	2009-04-19 21:26:36 +00:00
Robert Watson	a93fa8f2bb	For each architecture, define CACHE_LINE_SHIFT and a derived CACHE_LINE_SIZE constant. These constants are intended to over-estimate the cache line size, and be used at compile-time when a run-time tuning alternative isn't appropriate or available. Defaults for all architectures are 64 bytes, except powerpc where it is 128 bytes (used on G5 systems). MFC after: 2 weeks Discussed on: arch@	2009-04-19 20:19:13 +00:00
Marius Strobl	707085fef9	- There's no need to wrap kdb_active and kdb_trap() in #ifdef KDB as they're always available. - Remove unused variable. [1] - Add a missing const. - Sort includes. Submitted by: Christoph Mallon [1]	2009-03-19 20:46:51 +00:00
Konstantin Belousov	a4f2b2b0c6	Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer to the full path of the image that is being executed. Increase AT_COUNT. Remove no longer true comment about types used in Linux ELF binaries, listed types contain FreeBSD-specific entries. Reviewed by: kan	2009-03-17 12:50:16 +00:00
Marius Strobl	9223a606d0	Improve r185008 so the streaming cache is only flushed when a mapping actually met the threshold.	2009-02-10 21:51:33 +00:00
Marius Strobl	ceab1bee37	- Use the generally more appropriate PROM base rather than the kernel one as the non-faulting flush address in the loader so we can change KERNBASE and VM_MIN_KERNEL_ADDRESS if we ever want to without needing to worry about using a compatible loader. - Correctly check for LOADER_DEBUG. - Add a missing const for page_sizes[].	2009-02-10 21:48:42 +00:00
Marius Strobl	75193d5283	- Currently the PMAP code is laid out to let the kernel TSB cover the whole KVA space using one locked 4MB dTLB entry per GB of physical memory. On Cheetah-class machines only the dt16 can hold locked entries though, which would be completely consumed for the kernel TSB on machines with >= 16GB. Therefore limit the KVA space to use no more than half of the lockable dTLB slots, given that we need them also for other things. - Add sanity checks which ensure that we don't exhaust the (lockable) TLB slots.	2009-01-01 14:01:21 +00:00
Nathan Whitehorn	91416fb268	Modularize the Open Firmware client interface to allow run-time switching of OFW access semantics, in order to allow future support for real-mode OF access and flattened device trees. OF client interface modules are implemented using KOBJ, in a similar way to the PPC PMAP modules. Because we need Open Firmware to be available before mutexes can be used on sparc64, changes are also included to allow KOBJ to be used very early in the boot process by only using the mutex once we know it has been initialized. Reviewed by: marius, grehan	2008-12-20 00:33:10 +00:00
Warner Losh	db3cd725a5	AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them. Reviewed by: peter	2008-12-17 06:56:58 +00:00
Nathan Whitehorn	94b4a038a1	Adapt parts of the sparc64 Open Firmware bus enumeration code (in particular, the code for parsing interrupt maps) to PowerPC and reflect their new MI status by moving them to the shared dev/ofw directory. This commit also modifies the OFW PCI enumeration procedure on PowerPC to allow the bus to find non-firmware-enumerated devices that Apple likes to add, and adds some useful Open Firmware properties (compat and name) to the pnpinfo string of children on OFW SBus, EBus, PCI, and MacIO links. Because of the change to PCI enumeration on PowerPC, X has started working again on PPC machines with Grackle hostbridges. Reviewed by: marius Obtained from: sparc64	2008-12-15 15:31:10 +00:00
Kip Macy	db7f0b974f	- bump __FreeBSD version to reflect added buf_ring, memory barriers, and ifnet functions - add memory barriers to <machine/atomic.h> - update drivers to only conditionally define their own - add lockless producer / consumer ring buffer - remove ring buffer implementation from cxgb and update its callers - add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to allow drivers to efficiently manage multiple hardware queues (i.e. not serialize all packets through one ifq) - expose if_qflush to allow drivers to flush any driver managed queues This work was supported by Bitgravity Inc. and Chelsio Inc.	2008-11-22 05:55:56 +00:00
Marius Strobl	e363ea0fab	Use the interrupt level right below PIL_FAST for executing interrupt filters instead of PIL_FAST and allow special filters and handlers for interrupts which need to be able to interrupt even filters, f.e. bus error interrupts, to be registered with the revived INTR_FAST at PIL_FAST.	2008-11-19 22:12:32 +00:00
Marius Strobl	11202ac9db	- Allow the front-end to specify that iommu(4) should disable rerun of the streaming cache for silicon bug workarounds. - Announce the presence of a streaming cache on attach for informational purposes. - For performance reasons don't do unnecessary flushes of the streaming cache when coherent mappings are synced. - Fix some minor style issues.	2008-11-16 19:53:49 +00:00
Marius Strobl	33f12b1200	Use the STICK timers only when absolutely necessary, i.e. if a machine consists of CPUs running at different speeds, for driving hardclock as these timers in turn are driven at frequencies as low as 5MHz, resulting in bad granularity compared to the TICK timers. However, don't employ the workaround for the BlackBird erratum #1 when using the TICK timer on machines with cheetah-class CPUs for performance reasons. Reported by: Florian Smeets	2008-09-20 11:26:13 +00:00
Marius Strobl	b2a1ae8353	- Newer firmware versions no longer provide SUNW,stop-self so just disable interrupts and loop forever with these. - Hide all MP-related bits in <machine/smp.h> underneath #ifdef SMP. - Inline ipi_all_but_self(9) and ipi_selected(9). We don't expose any additional bits but save a few cycles by doing so. - Remove ipi_all(9), which actually only called panic(9). It can't be implemented natively anyway and having it removed at least causes MI users to already fail when linking.	2008-09-18 13:56:30 +00:00
Marius Strobl	a4eba4a555	For cheetah-class CPUs ensure that the dt512_0 is set to hold 8k pages for all three contexts and configure the dt512_1 to hold 4MB pages for them (e.g. for direct mappings). This might allow for additional optimization by using the faulting page sizes provided by AA_DMMU_TAG_ACCESS_EXT for bypassing the page size walker for the dt512 in the superpage support code. Submitted by: nwhitehorn (initial patch)	2008-09-08 21:24:25 +00:00
Marius Strobl	e5858aa9d5	Use the PROM provided SUNW,set-trap-table to take over the trap table. This is required in order to set obp-control-relinquished within the PROM, allowing to safely read the OFW translations node. Without this, f.e. a `ofwdump -ap` triggers a fatal reset error or worse things on machines based on USIII and beyond. In theory this should allow to remove touching %tba in cpu_setregs(), in practice we seem to currently face a chicken and egg problem when doing so however.	2008-09-04 20:52:54 +00:00
Marius Strobl	597b17a0e0	Flesh out MMU and cache handling of cheetah-class CPUs.	2008-09-04 19:58:52 +00:00
Marius Strobl	4f76d0a885	The physical address space of cheetah-class CPUs has been extended to 43 bits so update TD_PA_BITS accordingly. For the most part this increase is transparent to the existing code except for when reading the physical address from ASI_{D,I}TLB_DATA_ACCESS_REG, which we only do in the loader and which was already adjusted in r182478, or from the OFW translations node. While at it, ensure we are only taking valid OFW mapping entries into account.	2008-09-04 19:43:14 +00:00
Marius Strobl	09c7f9e338	- USIII-based machines can consist of CPUs running at different frequencies (and having different cache sizes) so use the STICK (System TICK) timer, which was introduced due to this and is driven by the same frequency across all CPUs, instead of the TICK timer, whose frequency varies with the CPU clock, to drive hardclock. We try to use the STICK counter with all CPUs that are USIII or beyond, even when not necessary due to identical CPUs, as we can also avoid the workaround for the BlackBird erratum #1 there. Unfortunately, using the STICK counter currently causes a hang with USIIIi MP machines for reasons unknown, so we still use the TICK timer there (which is okay as they can only consist of identical CPUs). - Given that we only (try to) synchronize the (S)TICK timers of APs with the BSP during startup, we could end up spinning forever in DELAY(9) if that function is migrated to another CPU while we're spinning due to clock drift afterwards, so pin to the CPU in order to avoid migration. Unfortunately, pinning doesn't work at the point DELAY(9) is required by the low-level console drivers, yet, so switch to a function pointer, which is updated accordingly, for implementing DELAY(9). For USIII and beyond, this would also allow to easily use the STICK counter instead of the TICK one here, there's no benefit in doing so however. While at it, use cpu_spinwait(9) for spinning in the delay- functions. This currently is a NOP though. - Don't set the TICK timer of the BSP to 0 at startup as there's no need to do so. - Implement cpu_est_clockrate(). - Unfortunately, USIIIi-based machines don't provide a timecounter device besides the STICK and TICK counters (well, in theory the Tomatillo bridges have a performance counter that can be (ab)used as timecounter by configuring it to count bus cycles, though unlike the performance counter of Schizo bridges, the Tomatillo one is broken and counts Sun knows what in this mode). This means that we have to use a (S)TICK counter for timecounting, which has the old problem of not being in sync across CPUs, so provide an additional timecounter function which binds itself to the BSP but has an adequate low priority.	2008-09-03 17:39:19 +00:00
Marius Strobl	ec0f669534	- USIII-based machines can consist of CPUs having different cache sizes (and running at different frequencies) so move the cacheinfo to the PCPU data. While at it, remove some redundant and/or unused members from struct cacheinfo. - In sparc64_init don't assume the first CPU node we find in the OFW device tree is the BSP.	2008-09-02 21:13:54 +00:00
Marius Strobl	6adb632eeb	Update the comment regarding the workaround for the BlackBird TICK_COMPARE bug and the instruction alignment used for it based on information found in the OpenSolaris source. MFC after: 3 days	2008-08-23 20:53:27 +00:00
John Baldwin	70d12a18f2	Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports already define _KERNEL to get to this and I'm about to add hooks to libkvm to access per-CPU data. MFC after: 1 week	2008-08-19 19:53:52 +00:00
Marius Strobl	6557990017	cosmetic changes and style fixes	2008-08-13 20:30:28 +00:00
Marius Strobl	db85033cd0	Assume OpenSolaris knows better and use their value for VM_MAX_PROM_ADDRESS.	2008-08-12 20:00:28 +00:00
Marius Strobl	0b1bfc4986	- Reimplement {d,i}tlb_enter() and {d,i}tlb_va_to_pa() in C. There's no particular reason for them to be implemented in assembler and having them in C allows easier extension as well as using more C macros and {d,i}tlb_slot_max rather than hard-coding magic (and actually spitfire-only) values. - Fix the compilation of pmap_print_tte(). - Change pmap_print_tlb() to use ldxa() rather than re-rolling it inline as well as TLB_DAR_SLOT and {d,i}tlb_slot_max rather than hardcoding magic (and actually spitfire-only) values. - While at it, suffix the above mentioned functions with "_sun4u" to underline they're architecture-specific. - Use __FBSDID and macros instead of magic values in locore.S. - Remove unused includes and smp_stack in locore.S.	2008-08-07 22:46:25 +00:00
Marius Strobl	6a92796332	Revert the addition of "__volatile" to "__asm" done in r180011; since the condition codes were added to the clobber lists in r180073, the former is unnecessary.	2008-07-05 15:28:30 +00:00
Marius Strobl	e344c57bcb	Improve r180011 by explicitly adding the condition codes to the clobber list. Suggested by: Christoph Mallon	2008-06-27 22:17:14 +00:00
Marius Strobl	0d9e99b6ca	Use "__asm __volatile" rather than "__asm" for instruction sequences that modify condition codes (the carry bit, in this case). Without "__volatile", the compiler might add the inline assembler instructions between unrelated code which also uses condition codes, modifying the latter. This prevents the TCP pseudo header checksum calculation done in tcp_output() from having effects on other conditions when compiled with GCC 4.2.1 at "-O2" and "options INET6" left out. [1] Reported & tested by: Boris Kochergin [1] MFC after: 3 days	2008-06-25 21:04:59 +00:00
Ed Schouten	721351876c	Remove the unused major/minor numbers from iodev and memdev. Now that st_rdev is being automatically generated by the kernel, there is no need to define static major/minor numbers for the iodev and memdev. We still need the minor numbers for the memdev, however, to distinguish between /dev/mem and /dev/kmem. Approved by: philip (mentor)	2008-06-25 07:45:31 +00:00
Marius Strobl	0352f67204	- Remove the BUS_HANDLE_MIN checking in the __BUS_DEBUG_ACCESS macro; for UPA it should have fulfilled its purpose by now and Fireplane- and JBus-based machines are way too messy in organization to implement something equivalent. - Fix a bunch of style(9) bugs.	2008-05-08 21:10:39 +00:00
Marius Strobl	083b2bd41a	- Use the name returned by device_get_nameunit(9) for the name of the counter-timer timecounter so the associated SYSCTL nodes don't clash on machines having multiple U2P and U2S bridges as well as establishing a clear mapping between these bridges and their timecounter device. - Don't bother setting up a "nice" name for the IOMMU, just use the name returned by device_get_nameunit(9), too. - Fix some minor style(9) bugs. - Use __FBSDID in counter.c MFC after: 1 week	2008-05-07 21:22:15 +00:00
Marius Strobl	c2dcc708df	- Include <machine/utrap.h> so this header doesn't have an MD dependency. - Make prototypes style(9) compliant. MFC after: 1 week	2008-04-23 20:38:37 +00:00
Marius Strobl	526bd70425	o Rename ic_eoi to ic_clear to emphasize that the functions it points to don't send an EOI which works like on amd64/i386 and blocks all interrupts on the relevant interrupt controller. o Replace the post_filter and post_inthread hooks registered when creating the interrupt events with just ic_clear as on sparc64 we don't need to do any disable->EOI->enable dance to unblock all but the relevant interrupt while running the filter or handler; just not clearing the interrupt already has the same effect. o Merge from amd64/i386: - Split the intr_table_lock into an sx lock used for most things, and a spin lock to protect intrcnt_index. - Add support for binding interrupts to CPUs, including for the bus_bind_intr(9) interface, an assign_cpu hook and initially shuffling interrupts around in a round-robin fashion. Reviewed by: jhb MFC after: 1 month	2008-04-23 20:04:38 +00:00
Marius Strobl	a6c165e468	- Add support for IPI_PREEMPT. [1] - Add my copyright to mp_machdep.c for having implemented support for USIII and up and some fixes. Obtained from: sun4v (modulo style(9) bugs) [1]	2008-04-09 21:14:01 +00:00
John Birrell	e483943791	When building a kernel module, define MAXCPU the same as SMP so that modules work with and without SMP.	2008-03-27 05:03:26 +00:00
Poul-Henning Kamp	e465985885	The "free-lance" timer in the i8254 is only used for the speaker these days, so de-generalize the acquire_timer/release_timer api to just deal with speakers. The new (optional) MD functions are: timer_spkr_acquire() timer_spkr_release() and timer_spkr_setfreq() the last of which configures the timer to generate a tone of a given frequency, in Hz instead of 1/1193182th of seconds. Drop entirely timer2 on pc98, it is not used anywhere at all. Move sysbeep() to kern/tty_cons.c and use the timer_spkr() if they exist, and do nothing otherwise. Remove prototypes and empty acquire-/release-timer() and sysbeep() functions from the non-beeping archs. This eliminates the need for the speaker driver to know about i8254frequency at all. In theory this makes the speaker driver MI, contingent on the timer_spkr_() functions existing but the driver does not know this yet and still attaches to the ISA bus. Syscons is more tricky, in one function, sc_tone(), it knows the hz and things are just fine. In the other function, sc_bell() it seems to get the period from the KDMKTONE ioctl in terms of 1/1193182th second, so we hardcode the 1193182 and leave it at that. It's probably not important. Change a few other sysbeep() uses which obviously knew that the argument was in terms of i8254 frequency, and leave alone those that look like people thought sysbeep() took frequency in hertz. This eliminates the knowledge of i8254_freq from all but the actual clock.c code and the prof_machdep.c on amd64 and i386, where I think it would be smart to ask for help from the timecounters anyway [TBD].	2008-03-26 20:09:21 +00:00
Marius Strobl	5259569262	- Const'ify the bus_stream_asi and bus_type_asi arrays. - Replace hard-coded functions names missed in bus_machdep.c rev. 1.44 with __func__. - Break some long lines. MFC after: 1 month	2008-03-24 17:57:01 +00:00
Pawel Jakub Dawidek	ab35440fa1	Oops. Use atomic_add_long() for atomic_fetchadd_long() (not atomic_add_int()) for sparc64 and sun4v. Noticed by: marius	2008-03-19 07:27:24 +00:00
Pawel Jakub Dawidek	6eb4157ffc	Implement atomic_fetchadd_long() for all architectures and document it. Reviewed by: attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)	2008-03-16 21:20:50 +00:00
Marius Strobl	d5295d0b09	- Do as the comment in pmap_bootstrap() suggests and flush all non-locked TLB entries possibly left over by the firmware and also do so while bootstrapping APs. - Use __FBSDID. MFC after: 1 month	2008-03-09 15:53:34 +00:00
Marius Strobl	559921043b	The Sun disk label only uses 16-bit fields for cylinders, heads and sectors so the geometry of large IDE disks has to be adjusted. This corresponds to what the OpenSolaris dad(7D) driver does except that the latter only tweaks sectors and effectively limits the mediasize to 128GB so the cylinders and heads fields won't ever overflow. Not limiting the mediasize is a compromise between allowing to use Sun disk label as far as possible and being able to use the entire disk with another disk label. This allows to use the full capacity of large IDE disks if they were not labeled under (Open)Solaris (in both ways of the meaning). MFC after: 2 weeks	2008-02-11 21:40:22 +00:00
Alan Cox	b8e7fc24fe	Add configuration knobs for the superpage reservation system. Initially, the reservation will only be enabled on amd64.	2007-12-27 16:45:39 +00:00
Joseph Koshy	0da7aa7a7d	Add stubs to unbreak LINT.	2007-12-07 13:45:47 +00:00
Robert Watson	3c90d1ea74	Break out stack(9) from ddb(4): - Introduce per-architecture stack_machdep.c to hold stack_save(9). - Introduce per-architecture machine/stack.h to capture any common definitions required between db_trace.c and stack_machdep.c. - Add new kernel option "options STACK"; we will build in stack(9) if it is defined, or also if "options DDB" is defined to provide compatibility with existing users of stack(9). Add new stack_save_td(9) function, which allows the capture of a stacktrace of another thread rather than the current thread, which the existing stack_save(9) was limited to. It requires that the thread be neither swapped out nor running, which is the responsibility of the consumer to enforce. Update stack(9) man page. Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)	2007-12-02 20:40:35 +00:00
Alan Cox	7bfda801a8	Change the management of cached pages (PQ_CACHE) in two fundamental ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. 
Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)	2007-09-25 06:25:06 +00:00
Marius Strobl	7439368f60	o Revamp the sparc64 interrupt code in order to be able to interface with the INTR_FILTER-enabled MI code. Basically this consists of registering an interrupt controller (of which there can be multiple and optionally different ones either per host-to-foo bridge or shared amongst host-to-foo bridges in any one machine) along with an interrupt vector as specific argument for all the interrupt vectors used by a given host-to-foo bridge (roughly similar to registering interrupt sources on amd64 and i386), providing functions to enable, clear and disable the interrupts of the children beneath the bridge. This also includes: - No longer entering a critical section in tl0_intr() and tl1_intr() for executing interrupt handlers but rather let the handlers enter it themselves so in the case of intr_event_handle() we don't enter a nested critical section. - Adding infrastructure for binding delivery of interrupt vectors to specific CPUs which later on can be interfaced with the code from amd64/i386 for binding interrupts to specific CPUs. - Getting rid of the wrapper hack introduced along the lines of the API changes for INTR_FILTER which as a side-effect caused interrupts associated with ithread handlers only to get the elevated priority of those associated with filters ("fast handlers") (this removes the hack also in the non-INTR_FILTER case). - Disabling (by not clearing) an interrupt in the interrupt controller until all associated handlers have been executed, which is crucial for the typical locking strategy of NIC drivers in order to work correctly in case of shared interrupts. This was a more or less theoretical problem on sparc64 though, as shared interrupts are rather uncommon there except for the on-board SCCs and UARTs. 
Note that due to the behavior of at least some of the interrupt controllers used on sparc64 an enable+EOI instead of a disable+EOI approach (as implied by the INTR_FILTER MI code and implemented on other architectures) is used as the latter can cause lost interrupts or in the worst case interrupt starvation. o Correct a typo in sbus_alloc_resource() which caused (pass-through) allocations to only work down to the grandchildren of the bus, which wasn't a real problem so far as we don't support any devices which are great-grandchildren or greater of a U2S bridge, yet. o In fhc(4) use bus_{read,write}_4() instead of bus_space_{read,write}_4() in order to get rid of sc_bh and sc_bt in the fhc_softc. Also get rid of some other unneeded members in fhc_softc. Reviewed by: marcel (earlier version) Approved by: re (kensmith)	2007-09-06 19:16:30 +00:00
Marius Strobl	5435966282	Style(9) fix - use #define<tab> consistently. Approved by: re (kensmith)	2007-09-06 14:56:09 +00:00

771 Commits