freebsd-skq

Author	SHA1	Message	Date
Konstantin Belousov	50ad4fc65c	Regenerate	2008-04-08 09:51:19 +00:00
Konstantin Belousov	48b05c3f82	Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho	2008-04-08 09:45:49 +00:00
Alan Cox	109d493230	Update pmap_page_wired_mappings() so that it counts 2/4MB page mappings.	2008-04-07 07:38:02 +00:00
John Baldwin	1ee1b68792	Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt code. - Rename the intr_event 'eoi', 'disable', and 'enable' hooks to 'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric. Also, add a comment describing what the MI code expects them to do. - On amd64, i386, and powerpc this is effectively a NOP. - On arm, don't bother masking the interrupt unless the ithread is scheduled in the non-INTR_FILTER case to match what INTR_FILTER did. Also, don't bother unmasking the interrupt in the post_filter case if we never masked it. The INTR_FILTER case had been doing this by having arm_unmask_irq for the post_filter (formerly 'eoi') hook. - On ia64, stray interrupts are now masked for the non-INTR_FILTER case. They were already masked in the INTR_FILTER case. - On sparc64, use a NULL pre_ithread hook and use intr_enable_eoi() for both the 'post_filter' and 'post_ithread' hooks to match what the non-INTR_FILTER code did. - On sun4v, retire the ithread wrapper hack by using an appropriate 'post_ithread' hook instead (it's what 'post_ithread'/'enable' was designed to do even in 5.x). Glanced at by: piso Reviewed by: marius Requested by: marius [1], [5] Tested on: amd64, i386, arm, sparc64	2008-04-05 19:58:30 +00:00
Alan Cox	2addc03d04	Eliminate an unnecessary test and its misleading comment from pmap_enter().	2008-04-04 18:00:22 +00:00
Alan Cox	bc8a0d87bd	Optimize pmap_pml4e() and pmap_pdpe() based upon two observations: The given pmap is never NULL, and therefore pmap_pml4e() can never return NULL. The pervasive use of these inline functions throughout the pmap makes these simple changes worthwhile.	2008-04-02 04:39:47 +00:00
Paul Saab	6e7534b8c8	Add support to mincore for detecting whether a page is part of a "super" page or not. Reviewed by: alc, ups	2008-03-28 04:29:27 +00:00
Doug Rabson	fa9d9930ca	Add kernel module support for nfslockd and krpc. Use the module system to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k' option to rpc.lockd and make kernel NLM the default. A user can still force the use of the old user NLM by building a kernel without NFSLOCKD and/or removing the nfslockd.ko module.	2008-03-27 11:54:20 +00:00
John Birrell	e483943791	When building a kernel module, define MAXCPU the same as SMP so that modules work with and without SMP.	2008-03-27 05:03:26 +00:00
Poul-Henning Kamp	dad3b6c6fd	Back in the good old days, PC's had random pieces of rock for frequency generation and what frequency they generated was anyone's guess. In general the 32.768kHz RTC clock x-tal was the best, because that was a regular wrist-watch Xtal, whereas the X-tal generating the ISA bus frequency was much lower quality, often costing as much as several cents a piece, so it made good sense to check the ISA bus frequency against the RTC clock. The other relevant property of those machines, is that they typically had no more than 16MB RAM. These days, CPU chips croak if their clocks are not tightly within specs and all necessary frequencies are derived from the master crystal by means of PLLs. Considering that it takes on average 1.5 seconds to calibrate the frequency of the i8254 counter, that more likely than not, we will not actually use the result of the calibration, and as the final clincher, we seldom use the i8254 for anything besides BEL in syscons anyway, it has become time to drop the calibration code. If you need to tell the system what frequency your i8254 runs, you can do so from the loader using hw.i8254.freq or using the sysctl kern.timecounter.tc.i8254.frequency.	2008-03-26 22:12:00 +00:00
Poul-Henning Kamp	3a995824f6	Eliminate unnecessary #includes	2008-03-26 20:26:12 +00:00
Poul-Henning Kamp	e465985885	The "free-lance" timer in the i8254 is only used for the speaker these days, so de-generalize the acquire_timer/release_timer api to just deal with speakers. The new (optional) MD functions are: timer_spkr_acquire() timer_spkr_release() and timer_spkr_setfreq() the last of which configures the timer to generate a tone of a given frequency, in Hz instead of 1/1193182th of seconds. Drop entirely timer2 on pc98, it is not used anywhere at all. Move sysbeep() to kern/tty_cons.c and use the timer_spkr() if they exist, and do nothing otherwise. Remove prototypes and empty acquire-/release-timer() and sysbeep() functions from the non-beeping archs. This eliminates the need for the speaker driver to know about i8254frequency at all. In theory this makes the speaker driver MI, contingent on the timer_spkr_() functions existing but the driver does not know this yet and still attaches to the ISA bus. Syscons is more tricky, in one function, sc_tone(), it knows the hz and things are just fine. In the other function, sc_bell() it seems to get the period from the KDMKTONE ioctl in terms of 1/1193182th second, so we hardcode the 1193182 and leave it at that. It's probably not important. Change a few other sysbeep() uses which obviously knew that the argument was in terms of i8254 frequency, and leave alone those that look like people thought sysbeep() took frequency in hertz. This eliminates the knowledge of i8254_freq from all but the actual clock.c code and the prof_machdep.c on amd64 and i386, where I think it would be smart to ask for help from the timecounters anyway [TBD].	2008-03-26 20:09:21 +00:00
Poul-Henning Kamp	ebfbcd612a	Rename timer0_max_count to i8254_max_count. Rename timer0_real_max_count to i8254_real_max_count and make it static. Rename timer_freq to i8254_freq and make it a loader tunable.	2008-03-26 15:03:24 +00:00
Poul-Henning Kamp	f168bfa529	The RTC related pscnt and psdiv variables have no business being public.	2008-03-26 13:25:27 +00:00
Jung-uk Kim	cb7d38abf2	Belatedly add BPF_JITTER in NOTES for supported architectures.	2008-03-24 22:23:22 +00:00
Peter Wemm	f001eabf3a	First pass at (possibly futile) microoptimizing of cpu_switch. Results are mixed. Some pure context switch microbenchmarks show up to 29% improvement. Pipe based context switch microbenchmarks show up to 7% improvement. Real world tests are far less impressive as they are dominated more by actual work than switch overheads, but depending on the machine in question, workload, kernel options, phase of moon, etc, a few percent gain might be seen. Summary of changes: - don't reload MSR_[FG]SBASE registers when context switching between non-threaded userland apps. These typically cost 120 clock cycles each on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no faster on this. - The above change only helps unthreaded userland apps that tend to use the same value for gsbase. Threaded apps will get no benefit from this. - reorder things like accessing the pcb to be in memory order, to give prefetching a better chance of working. Operations are now in increasing memory address order, rather than reverse or random. - Push some lesser used code out of the main code paths. Hopefully allowing better code density in cache lines. This is probably futile. - (part 2 of previous item) Reorder code so that branches have a more realistic static branch prediction hint. Both Intel and AMD cpus default to predicting branches to lower memory addresses as being taken, and to higher memory addresses as not being taken. This is overridden by the limited dynamic branch prediction subsystem. A trip through userland might overflow this. - Futile attempt at spreading the use of the results of previous operations in new operations. Hopefully this will allow the cpus to execute in parallel better. - stop wasting 16 bytes at the top of kernel stack, below the PCB. - Never load the userland fs/gsbase registers for kthreads, but preserve curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)
Microbenchmarking this code seems to be really sensitive to things like scheduling luck, timing, cache behavior, tlb behavior, kernel options, other random code changes, etc. While it doesn't help heavy userland workloads much, it does help high context switch loads a little, and should help those that involve switching via kthreads a bit more. A special thanks to Kris for the testing and reality checks, and Jeff for tormenting me into doing this. :) This is still work-in-progress.	2008-03-23 23:09:06 +00:00
Alan Cox	58680920e9	Correct an error in pmap_mincore() when applied to a 2MB page mapping: Use PG_PS_FRAME, not PG_FRAME, to obtain the physical address of the 2MB physical page from the PDE.	2008-03-23 23:04:09 +00:00
Peter Wemm	22c0c6e9d3	Export TDP_KTHREAD to asm files.	2008-03-23 22:46:37 +00:00
Peter Wemm	6c73bb3557	Move pcb_flags to make trivially better use of cache lines.	2008-03-23 22:45:51 +00:00
Peter Wemm	3d60169ef4	Protect the setting of the fsbase/gsbase MSR registers and the pcb_[fg]sbase values with a critical section, like the rest of the kernel.	2008-03-23 22:44:56 +00:00
Alan Cox	702006ff76	To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set. The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set. However, these changes enable me to remove a work-around from pmap_promote_pde(), the superpage promotion procedure. (Note: The AMD processors that we have tested, including the latest, the Phenom, still exhibit the historical behavior.) Acknowledgments: After I observed the problem, Stephan (ups) was instrumental in characterizing the exact behavior of Intel's recent TLBs. Tested by: Peter Holm	2008-03-23 20:38:01 +00:00
Konstantin Belousov	3f7905d29c	Prevent the overflow in the calculation of the next page directory. The overflow causes the wraparound with consequent corruption of the (almost) whole address space mapping. As Alan noted, pmap_copy() does not require the wrap-around checks because it cannot be applied to the kernel's pmap. The checks there are included for consistency. Reported and tested by: kris (i386/pmap.c:pmap_remove() part) Reviewed by: alc MFC after: 1 week	2008-03-23 07:07:27 +00:00
John Baldwin	eb2b0540e5	Explicitly use spinlock_enter/exit rather than locking the icu_lock spin lock in the 8259A drivers as these drivers are only used on UP systems. This slightly reduces the penalty of an SMP kernel (such as GENERIC) on a UP x86 machine.	2008-03-20 21:53:27 +00:00
John Baldwin	dcc8106854	Implement a BUS_BIND_INTR() method in the bus interface to bind an IRQ resource to a CPU. The default method is to pass the request up to the parent similar to BUS_CONFIG_INTR() so that all busses don't have to explicitly implement bus_bind_intr. A bus_bind_intr(9) wrapper routine similar to bus_setup/teardown_intr() is added for device drivers to use. Unbinding an interrupt is done by binding it to NOCPU. The IRQ resource must be allocated, but it can happen in any order with respect to bus_setup_intr(). Currently it is only supported on amd64 and i386 via nexus(4) methods that simply call the intr_bind() routine. Tested by: gallatin	2008-03-20 21:24:32 +00:00
John Baldwin	6d2d1c044f	Simplify the interrupt code a bit: - Always include the ie_disable and ie_eoi methods in 'struct intr_event' and collapse down to one intr_event_create() routine. The disable and eoi hooks simply aren't used currently in the !INTR_FILTER case. - Expand 'disab' to 'disable' in a few places. - Use function casts for arm and i386:intr_eoi_src() instead of wrapper routines to trim one extra indirection. Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER} Tested on: {amd64,i386} x {FILTER, !FILTER}	2008-03-17 22:42:01 +00:00
Pawel Jakub Dawidek	6eb4157ffc	Implement atomic_fetchadd_long() for all architectures and document it. Reviewed by: attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)	2008-03-16 21:20:50 +00:00
Roman Divacky	d8653dd986	Regen.	2008-03-16 16:29:37 +00:00
Roman Divacky	5dfb688191	Implement sched_setaffinity and sched_getaffinity using real cpu affinity setting primitives. Reviewed by: jeff Approved by: kib (mentor)	2008-03-16 16:27:44 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
John Baldwin	eaf86d1678	Add preliminary support for binding interrupts to CPUs: - Add a new intr_event method ie_assign_cpu() that is invoked when the MI code wishes to bind an interrupt source to an individual CPU. The MD code may reject the binding with an error. If an assign_cpu function is not provided, then the kernel assumes the platform does not support binding interrupts to CPUs and fails all requests to do so. - Bind ithreads to CPUs on their next execution loop once an interrupt event is bound to a CPU. Only shared ithreads are bound. We currently leave private ithreads for drivers using filters + ithreads in the INTR_FILTER case unbound. - A new intr_event_bind() routine is used to bind an interrupt event to a CPU. - Implement binding on amd64 and i386 by way of the existing pic_assign_cpu PIC method. - For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up an interrupt source and binds its interrupt event to the specified CPU. MI code can currently (ab)use this by doing: intr_bind(rman_get_start(irq_res), cpu); however, I plan to add a truly MI interface (probably a bus_bind_intr(9)) where the implementation in the x86 nexus(4) driver would end up calling intr_bind() internally. Requested by: kmacy, gallatin, jeff Tested on: {amd64, i386} x {regular, INTR_FILTER}	2008-03-14 19:41:48 +00:00
John Baldwin	c9107e85d9	Fix a silly bogon which prevented all the CPUs that are tagged as interrupt receivers from being given interrupts if any CPUs in the system were not tagged as interrupt receivers that I introduced when switching the x86 interrupt code to track CPUs via FreeBSD CPU IDs rather than local APIC IDs. In practice this only affects systems with Hyperthreading (though disabling HTT in the BIOS would work around the issue) as that is the only case currently where one can have CPUs that aren't tagged as interrupt receivers. On a Dell SC1425 test box with 2 x Xeon w/ HTT (so 4 logical CPUs of which 2 were interrupt receivers) the result was that all device interrupts were sent to CPU 0. MFC after: 1 week Pointy hat to: jhb	2008-03-14 03:44:42 +00:00
John Baldwin	5217af301c	Rework how the nexus(4) device works on x86 to better handle the idea of different "platforms" on x86 machines. The existing code already handles having two platforms: ACPI and legacy. However, the existing approach was rather hardcoded and difficult to extend. These changes take the approach that each x86 hardware platform should provide its own nexus(4) driver (it can inherit most of its behavior from the default legacy nexus(4) driver) which is responsible for probing for the platform and performing appropriate platform-specific setup during attach (such as adding a platform-specific bus device). This does mean changing the x86 platform busses to no longer use an identify routine for probing, but to move that logic into their matching nexus(4) driver instead. - Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the legacy platform. Its probe routine now returns BUS_PROBE_GENERIC so it can be overridden. - Expose a nexus_init_resources() routine which initializes the various resource managers so that subclassed nexus(4) drivers can invoke it from their attach routine. - The legacy nexus(4) driver explicitly adds a legacy0 device in its attach routine. - The ACPI driver no longer contains a new-bus identify method. Instead it exposes a public function (acpi_identify()) which is a probe routine that the MD nexus(4) drivers can use to probe for ACPI. All of the probe logic in acpi_probe() is now moved into acpi_identify() and acpi_probe() is just a stub. - On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via acpi_identify() and claims the nexus0 device if the probe succeeds. It then explicitly adds an acpi0 device in its attach routine. - The legacy(4) driver no longer knows anything about the acpi0 device. - On ia64 if acpi_identify() fails you basically end up with no devices.
This matches the previous behavior where the old acpi_identify() would fail to add an acpi0 device again leaving you with no devices. Discussed with: imp Silence on: arch@	2008-03-13 20:39:04 +00:00
Konstantin Belousov	22eca0bf45	Since version 4.3, gcc changed its behaviour concerning the i386/amd64 ABI and the direction flag, that is it now assumes that the direction flag is cleared at the entry of a function and it doesn't clear once more if needed. This new behaviour conforms to the i386/amd64 ABI. Modify the signal handler frame setup code to clear the DF {e,r}flags bit on the amd64/i386 for the signal handlers. jhb@ noted that it might break old apps if they assumed DF == 1 would be preserved in the signal handlers, but that such apps should be rare and that older versions of gcc would not generate such apps. Submitted by: Aurelien Jarno <aurelien aurel32 net> PR: 121422 Reviewed by: jhb MFC after: 2 weeks	2008-03-13 10:54:38 +00:00
John Baldwin	391664b110	The variable MTRR registers actually have variable-sized PhysBase and PhysMask fields based on the number of physical address bits supported by the current CPU. The old code assumed 36 bits on i386 and 40 bits on amd64. In truth, all Intel CPUs up until recently used 36 bits (a newer Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits. In at least one case (the new Intel CPU) having the size of the mask field wrong resulted in writing questionable values into the MTRR registers on the application processors (BSP as well if you modify the MTRRs via memcontrol or running X, etc.). The result of the questionable physmask was that all of memory was apparently treated as uncached rather than write-back resulting in a very significant performance hit. Fix this by constructing a run-time mask for the PhysBase and PhysMask fields based on the number of physical address bits supported by the CPU. All 64-bit capable CPUs provide a count of PA bits supported via the 0x80000008 extended CPUID feature, so use that if it is available. If that feature is not available, then assume 36 PA bits. While I'm here, expand the (now-unused) macros for the PhysBase and PhysMask fields to the current largest possible value (52 PA bits). MFC after: 1 week PR: i386/120516 Reported by: Nokia	2008-03-12 22:09:19 +00:00
John Baldwin	f15a9cd288	Minimize diffs with i686_mem.c: - A few whitespace changes I missed in the style(9) changes. - Move M_MEMDESC to mem.c.	2008-03-12 21:43:50 +00:00
Jeff Roberson	6617724c5f	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.	2008-03-12 10:12:01 +00:00
John Baldwin	1b085fde87	Style(9) these files. No changes in the compiled code. (Verified by diff'ing objdump -d output).	2008-03-11 21:41:36 +00:00
John Baldwin	336d8e5536	Add constants for the various fields in MTRR registers. MFC after: 1 week Verified by: md5(1)	2008-03-11 20:10:37 +00:00
John Baldwin	463e0f91cb	Probe CPUs after the PCI hierarchy on i386, amd64, and ia64. This allows the cpufreq drivers to reliably use properties of PCI devices for quirks, etc. - For the legacy drivers, add CPU devices via an identify routine in the CPU driver itself rather than in the legacy driver's attach routine. - Add CPU devices after Host-PCI bridges in the acpi bus driver. - Change the ichss(4) driver to use pci_find_bsf() to locate the ICH and check its device ID rather than having a bogus PCI attachment that only checked for the ID in probe and always failed. As a side effect, you can now kldload ichss after boot. - Fix the ichss(4) driver to use the correct device_t for the ICH (and not for ichss0) when doing PCI config space operations to enable SpeedStep. MFC after: 2 weeks Reviewed by: njl, Andriy Gapon avg of icyb.net.ua	2008-03-10 22:18:07 +00:00
Jeff Roberson	32c9d3a767	- Rather than repeating the same preemption code everywhere call the scheduler specific sched_preempt() routine.	2008-03-10 01:32:48 +00:00
Rink Springer	2e7328e7cc	Import uslcom(4) from OpenBSD - this is a driver for Silicon Laboratories CP2101/CP2102 based USB serial adapters. Reviewed by: imp, emaste Obtained from: OpenBSD MFC after: 2 weeks	2008-03-05 14:13:30 +00:00
Alan Cox	0116b8b321	Add support for automatic promotion of 4KB page mappings to 2MB page mappings. Automatic promotion can be enabled by setting the tunable "vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic promotion is disabled. (Expect this to change.) Reviewed by: ups Tested by: kris, Peter Holm	2008-03-04 18:50:15 +00:00
Jeff Roberson	81aa71755b	- Replace the old smp cpu topology specification with a new, more flexible tree structure that encodes the level of cache sharing and other properties. - Provide several convenience functions for creating one and two level cpu trees as well as a default flat topology. The system now always has some topology. - On i386 and amd64 create a separate level in the hierarchy for HTT and multi-core cpus. This will allow the scheduler to intelligently load balance non-uniform cores. Presently we don't detect what level of the cache hierarchy is shared at each level in the topology. - Add a mechanism for testing common topologies that have more information than the MD code is able to provide via the kern.smp.topology tunable. This should be considered a debugging tool only and not a stable api. Sponsored by: Nokia	2008-03-02 07:58:42 +00:00
Ruslan Ermilov	58eefce0e6	Eliminate whitespace diffs to the i386 version.	2008-02-19 06:30:49 +00:00
Scott Long	7bbd40c57e	Teach the dump and minidump code to respect the maxiosize attribute of the disk; the hard-coded assumption of 64K doesn't work in all cases.	2008-02-15 06:26:25 +00:00
Scott Long	54f8dbc48f	If busdma is being used to realign dynamic buffers and the alignment is set to PAGE_SIZE or less, the bounce page counting logic was flawed and wouldn't reserve any pages. Adjust to be correct. Review of other architectures is forthcoming. Submitted by: Joseph Golio	2008-02-12 16:24:30 +00:00
Jung-uk Kim	865df544c6	Fix Linux mmap with MAP_GROWSDOWN flag. Reported by: Andriy Gapon (avg at icyb dot net dot ua) Tested by: Andriy Gapon (avg at icyb dot net dot ua) Pointyhat: me MFC after: 3 days	2008-02-11 19:35:03 +00:00
Scott Long	593c873471	Remove the rr232x driver. It has been superseded by the hptrr driver.	2008-02-03 07:07:30 +00:00
David Schultz	2cb2359632	Add a few more CPUID feature bits while here. We don't support these features yet.	2008-02-02 23:17:27 +00:00
David Schultz	67f6aa5ccf	SSE4 CPUID bits	2008-02-02 22:40:17 +00:00
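
The run-time mask described in commit 391664b110 can be sketched as follows. This is a minimal illustration, not the committed code: the helper name mtrr_phys_mask() is hypothetical, and the caller supplies the number of physical address bits (per the commit message, taken from the 0x80000008 extended CPUID feature when available, otherwise assumed to be 36).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the FreeBSD tree): build the mask that
 * selects the address portion of a variable MTRR PhysBase/PhysMask
 * register for a CPU with 'pa_bits' physical address bits.
 */
static uint64_t
mtrr_phys_mask(unsigned int pa_bits)
{
	/* The commit message gives 52 PA bits as the largest possible value. */
	assert(pa_bits > 12 && pa_bits <= 52);

	/* The low 12 bits hold type/valid flags, so clear them from the mask. */
	return (((UINT64_C(1) << pa_bits) - 1) & ~UINT64_C(0xfff));
}
```

With 36 PA bits the address field covers bits 12 through 35, and with 40 PA bits it covers bits 12 through 39, which is why a mask hard-coded for one width writes wrong values on a CPU with another.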

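The rule stated in commit 702006ff76, that the PG_M bit should be ignored unless PG_RW is also set, can be sketched as a single predicate. This is an illustrative sketch under stated assumptions, not the pmap code itself: the function names are hypothetical, and the bit values are the standard x86 page-table bit positions (read/write at bit 1, dirty at bit 6).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PG_RW	0x002	/* x86 PTE read/write bit */
#define PG_M	0x040	/* x86 PTE modified (dirty) bit */

/*
 * Hypothetical predicate: treat a PTE as modified only when both PG_M
 * and PG_RW are set, since PG_M alone may be left over from a stale
 * TLB entry on the processors described in the commit message.
 */
static bool
pte_is_modified(uint64_t pte)
{
	return ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW));
}

/* Small usage example covering the three interesting cases. */
static void
pte_modified_examples(void)
{
	assert(pte_is_modified(PG_M | PG_RW));	/* writable and dirty */
	assert(!pte_is_modified(PG_M));		/* PG_M without PG_RW: ignored */
	assert(!pte_is_modified(PG_RW));	/* writable but clean */
}
```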

5053 Commits