freebsd-dev

Author	SHA1	Message	Date
Alexander Motin	45f6d66569	Remove all legacy ATA code parts, not used since options ATA_CAM enabled in most kernels before FreeBSD 9.0. Remove such modules and respective kernel options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam. Remove the atacontrol utility and some man pages. Remove useless now options ATA_CAM. No objections: current@, stable@ MFC after: never	2013-04-04 07:12:24 +00:00
Konstantin Belousov	ee75e7de7b	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks	2013-03-19 14:13:12 +00:00
Konstantin Belousov	e8a4a618cf	Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks	2013-03-14 20:18:12 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Alexander Motin	fdc5dd2d2f	MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.	2013-02-28 13:46:03 +00:00
Davide Italiano	acccf7d8b4	MFcalloutng: When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days. The commmit is not targeted for MFC.	2013-02-28 10:46:54 +00:00
Marcel Moolenaar	c78a079c40	kernacc() expects all KVAs to be covered in the kernel map. With the introduction of the PBVM, this stopped being the case. Redefine the VM parameters so that the PBVM is included in the kernel map. In particular this introduces VM_INIT_KERNEL_ADDRESS to point to the base of region 5 now that VM_MIN_KERNEL_ADDRESS points to the base of region 4 to include the PBVM. While here define KERNBASE to the actual link address of the kernel as is intended. PR: 169926	2013-02-25 02:41:38 +00:00
Marcel Moolenaar	1736d28c44	Enable PREEMPTION by default now that PR 147501 has been fixed.	2013-02-23 19:27:53 +00:00
Marcel Moolenaar	51a325a7f9	Close a race relating to setting the PCPU pointer (r13). Register r13 points to the TLS in user space and points to the PCPU structure in the kernel. The race is the result of having the exception handler on the one hand and the RPC system call entry on the other. The EPC syscall path is non-atomic in that interrupts are enabled while the two stacks are switched. The register stack is switched last as that is the stack used to determine whether we're going back to user space by the exception handler. If we go back to user space, we restore r13, otherwise we leave r13 alone. The EPC syscall path however set r13 to the PCPU structure before switching the register stack, which means that there was a window in which the exception handler would restore r13 when it was already pointing to the PCPU structure. This is fatal when the exception happened on CPU x, but left from the exception on anotehr CPU. In that case r13 would point to the PCPU of the CPU the thread was running on. This immediately results in getting the wrong value for curthread. The fix is to make sure we assign r13 after we set ar.bspstore to point to the kernel register stack for the thread.	2013-02-17 00:51:34 +00:00
Marcel Moolenaar	cb35030ab3	Return EFAULT when the address is not a kernel virtual address.	2013-02-16 21:46:27 +00:00
Marcel Moolenaar	f1acb9d36e	Eliminate the PC_CURTHREAD symbol and load the current thread's thread structure pointer atomically from r13 (the pcpu pointer) for the current CPU/core. Add a CTASSERT in machdep.c to make sure that pc_curthread is in fact the first field in struct pcpu. The only non-atomic operations left were those related to process- space operations, such as casuword, subyte, suword16, fubyte, fuword16, copyin, copyout and their variations. The casuword function has been re-structured more complete than the others. This way we have an example of a better bundling without introducing a lot of risk when we get it wrong. The other functions can be rebundled in separate commits and with the appropriate testing.	2013-02-12 17:38:35 +00:00
Marcel Moolenaar	22aa5c496b	Eliminate padding by moving 'narg' next to 'code'. Both are 32-bit entities in the syscall_args structure that otherwise has 64-bit only fields.	2013-02-12 17:24:41 +00:00
Konstantin Belousov	dd0b4fb6d5	Reform the busdma API so that new types may be added without modifying every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback. The cam changes unify the bus_dmamap_load* handling in cam drivers. The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map. Submitted by: jeff (sponsored by EMC/Isilon) Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes) Discussed with: ian (arm changes) Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)	2013-02-12 16:57:20 +00:00
Marcel Moolenaar	44c169253d	Now that we actually use more memory descriptors, make sure to dump them as well.	2013-02-12 16:51:43 +00:00
Hiroki Sato	729e57723d	Remove firewire devices missed in r244992.	2013-01-04 15:29:50 +00:00
Konstantin Belousov	0dcbedfa61	Enable the UFS quotas for big-iron GENERIC kernels. Discussed with: mckusick MFC after: 2 weeks	2013-01-03 19:03:41 +00:00
Dag-Erling Smørgrav	36fca20f10	As discussed on -current last October, remove the firewire drivers from GENERIC.	2013-01-03 14:30:24 +00:00
Konstantin Belousov	b32ecf44bc	Flip the semantic of M_NOWAIT to only require the allocation to not sleep, and perform the page allocations with VM_ALLOC_SYSTEM class. Previously, the allocation was also allowed to completely drain the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions. Allow the caller of malloc* to request the 'deep drain' semantic by providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM allocation class. Centralize the translation of the M_* malloc(9) flags in the single inline function malloc2vm_flags(). Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com> Reviewed by: alc, mdf (previous version) Tested by: pho (previous version) MFC after: 2 weeks	2012-11-14 20:01:40 +00:00
Attilio Rao	cfedf924d3	Rework the known rwlock to benefit about staying on their own cache line in order to avoid manual frobbing but using struct rwlock_padalign. Reviewed by: alc, jimharris	2012-11-03 23:03:14 +00:00
Konstantin Belousov	28854834f4	Fix compilation on ia64 when page size is configured for 16KB. Reviewed by: alc, marcel	2012-10-28 11:53:54 +00:00
Alan Cox	cdd7357cc5	Port the new PV entry allocator from amd64/i386. This allocator has two advantages. First, PV entries are roughly half the size. Second, this allocator doesn't access the paging queues, and thus it allows for the removal of the page queues lock from this pmap. Replace all uses of the page queues lock by a R/W lock that is private to this pmap. Tested by: marcel	2012-10-26 03:02:39 +00:00
Alan Cox	e4b8a2fc5a	Eliminate a stale comment. It describes another use case for the pmap in Mach that doesn't exist in FreeBSD.	2012-09-28 05:30:59 +00:00
Attilio Rao	324e57150d	userret() already checks for td_locks when INVARIANTS is enabled, so there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week	2012-09-08 18:27:11 +00:00
Marcel Moolenaar	5435dbc676	Use pmap_kextract(x) rather than pmap_extract(kernel_pmap, x). The former knows about all the special mappings, like PBVM. The kernel text and data are in the PBVM.	2012-08-18 23:28:34 +00:00
Marcel Moolenaar	85e6303e05	Remove support for SKI: HP's Itanium simulator. It's pretty much not used, serves very little value given that FreeBSD runs on real H/W for a long time. Note that SKI is open-source (see http://ski.sourceforge.net), so if there's interest and value again, then this code can be revived. Discussed with: jhb	2012-08-18 22:59:06 +00:00
John Baldwin	9ea9af5281	Add locking for sscdisk(4) and mark it MPSAFE. Since this driver just makes calls out to the emulator, the locking is fairly simple. A global mutex protects the list of ssc disks, and each ssc disk has a mutex to protect it's bioq. Approved by: marcel	2012-08-16 17:17:08 +00:00
Konstantin Belousov	1c771f9222	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks	2012-08-05 14:11:42 +00:00
Marcel Moolenaar	0aa31a3393	Move PCPU initialization to a new function called cpu_pcpu_setup(). This makes it easier to add additional CPU or platform information to the per-CPU structure without duplicated code.	2012-07-08 18:00:22 +00:00
Marcel Moolenaar	6d5cb58b5f	Unleash the APs at SI_SUB_KICK_SCHEDULER so that we have them all up and running to service interrupts. This is especially important when the firmware has bound interrupts to CPUs, like for the SGI Altix 350. We wake up APs at SI_SUB_CPU time and they sit and spin until we unleash them, so there's nothing fundamentally different from a MD perspective.	2012-07-08 17:43:25 +00:00
Marcel Moolenaar	373ca4ed22	Implement ia64_physmem_alloc() and use it consistently to get memory before VM has been initialized. This includes: 1. Replacing pmap_steal_memory(), 2. Replace the handcrafted logic to allocate a naturally aligned VHPT, 3. Properly allocate the DPCPU for the BSP. Ad 3: Appending the DPCPU to kernend worked as long as we wouldn't cross into the next PBVM page. If we were to cross into the next page, then there wouldn't be a PTE entry on the page table for it and we would end up with a MCA following a page fault. As such, this commit fixes MCAs occasionally seen.	2012-07-07 05:17:43 +00:00
Marcel Moolenaar	662d34aff2	Hide the creation of phys_avail behind an API to make it easier to do it correctly. We now iterate the EFI memory descriptors once and collect all the information in a single pass. This includes: 1. The I/O port base address, 2. The PAL memory region. Have the physmem API track this. 3. Memory descriptors of memory we can't use, like bad memory, runtime services code & data, etc. Have the physmem API track these. 4. memory descriptors of memory we can use or re-use, such as free memory, boot time services code & data, loader code & data, etc. These are added by the physmem API. Since the PBVM page table and pages are in memory described as loader data, inform the physmem API of chunks that need to be delated from the available physical memory. While here, remove Maxmem and replace it with the better named paddr_max. Maxmem was defined as physmem, which is generally wrong. Now, paddr_max is properly defined as the largesty physical address. The upshot of all this is that: 1. We properly determine realmem. 2. We maximize physmem by re-using memory where possible. 3. We remove complexity from ia64_init() in machdep.c. 4. Remove confusion about realmem, physmem & Maxmem. The new ia64_physmem_alloc() is to replace pmap_steal_memory() in pmap.c, as well as replace the handcrafted allocation of the VHPT for the BSP in pmap_bootstrap() in pmap.c. This is step 2 and addresses the manipulation of phys_avail after it is being created.	2012-07-07 00:25:17 +00:00
Andrew Turner	74dc547e24	Make the wchar_t type machine dependent. This is required for ARM EABI. Section 7.1.1 of the Procedure Call for the ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an unsigned short with the former preferred. Because of this requirement we need to move the definition of __wchar_t to a machine dependent header. It also cleans up the macros defining the limits of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine dependent header then using them to define WCHAR_MIN and WCHAR_MAX respectively. Discussed with: bde	2012-06-24 04:15:58 +00:00
Konstantin Belousov	aea810386d	Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs neccessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month	2012-06-22 07:06:40 +00:00
Konstantin Belousov	232aa31fb9	Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to timekeeping information. MFC after: 1 week	2012-06-22 06:38:31 +00:00
Alan Cox	6031c68de4	The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer. Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer. As an added bonus, tidy up some nearby comments concerning page flags. Reviewed by: kib MFC after: 6 weeks	2012-06-16 18:56:19 +00:00
Jung-uk Kim	d3638dc4de	Improve style(9) in the previous commit.	2012-06-01 17:07:52 +00:00
Mitsuru IWASAKI	f0a101b7e2	Call AcpiLeaveSleepStatePrep() in interrupt disabled context (described in ACPICA source code). - Move intr_disable() and intr_restore() from acpi_wakeup.c to acpi.c and call AcpiLeaveSleepStatePrep() in interrupt disabled context. - Add acpi_wakeup_machdep() to execute wakeup MD procedures and call it twice in interrupt disabled/enabled context (ia64 version is just dummy). - Rename wakeup_cpus variable in acpi_sleep_machdep() to suspcpus in order to be shared by acpi_sleep_machdep() and acpi_wakeup_machdep(). - Move identity mapping related code to acpi_install_wakeup_handler() (i386 version) for preparation of x86/acpica/acpi_wakeup.c (MFC candidate). Reviewed by: jkim@ MFC after: 2 days	2012-06-01 15:26:32 +00:00
Alan Cox	2865094134	pmap_alloc_vhpt() doesn't need the pages that it allocates to be mapped into the kernel map, so vm_page_alloc_contig() can be used in place of contigmalloc(). Reviewed by: marcel	2012-06-01 03:56:12 +00:00
Bjoern A. Zeeb	920b965865	MFp4 bz_ipv6_fast: in_cksum.h required ip.h to be included for struct ip. To be able to use some general checksum functions like in_addword() in a non-IPv4 context, limit the (also exported to user space) IPv4 specific functions to the times, when the ip.h header is present and IPVERSION is defined (to 4). We should consider more general checksum (updating) functions to also allow easier incremental checksum updates in the L3/4 stack and firewalls, as well as ponder further requirements by certain NIC drivers needing slightly different pseudo values in offloading cases. Thinking in terms of a better "library". Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-24 22:00:48 +00:00
Marcel Moolenaar	a48b6a8585	Don't assume we have legacy PICs (i.e. 8259A in cascade) at the legacy I/O port addresses. Even if we do, this is hardly the place to mask interrupts. It's not clear that this was at all needed. The code came with CVS revision 1.2 of nexus.c when interrupt support was first added. What is known is that ia64 has always been designed around the IOSAPIC, and that doing I/O like this prevents Altix from booting.	2012-05-04 23:16:29 +00:00
Dimitry Andric	460378bf13	Add a convenience macro for the returns_twice attribute, and apply it to the prototypes of the appropriate functions (getcontext, savectx, setjmp, sigsetjmp and vfork). MFC after: 2 weeks	2012-04-29 11:04:31 +00:00
Ed Schouten	92396a3174	Remove pty(4) from our kernel configurations. As of FreeBSD 8, this driver should not be used. Applications that use posix_openpt(2) and openpty(3) use the pts(4) that is built into the kernel unconditionally. If it turns out high profile depend on the pty(4) module anyway, I'd rather get those fixed. So please report any issues to me. The pty(4) module is still available as a kernel module of course, so a simple `kldload pty' can be used to run old-style pseudo-terminals.	2012-03-21 08:38:42 +00:00
Tijl Coosemans	2c7879ea84	Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replace amd64/i386/pc98 specialreg.h with stubs.	2012-03-19 21:34:11 +00:00
Tijl Coosemans	68156ad982	Copy i386 psl.h to x86 and replace amd64/i386/pc98 psl.h with stubs.	2012-03-19 21:29:57 +00:00
Tijl Coosemans	bcde3b9f67	Move userland bits (and some common kernel bits) from amd64 and i386 segments.h to a new x86 segments.h. Add __packed attribute to some structs (just to be sure). Also make it clear that i386 GDT and LDT entries are used in ia64 code.	2012-03-19 21:24:50 +00:00
Tijl Coosemans	6e310b206f	Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h. Reviewed by: kib	2012-03-18 19:12:11 +00:00
Attilio Rao	9c170fd168	Disable the option VFS_ALLOW_NONMPSAFE by default on all the supported platforms. This will make every attempt to mount a non-mpsafe filesystem to the kernel forbidden, unless it is expressely compiled with VFS_ALLOW_NONMPSAFE option. This patch is part of the effort of killing non-MPSAFE filesystems from the tree. No MFC is expected for this patch.	2012-03-06 20:01:25 +00:00
John Baldwin	831ce4cb3d	- Change contigmalloc() to use the vm_paddr_t type instead of an unsigned long for specifying a boundary constraint. - Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary constraints. These allow boundary constraints to be fully expressed for cases where sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a driver to properly specify a 4GB boundary in a PAE kernel. Note that this cannot be safely MFC'd without a lot of compat shims due to KBI changes, so I do not intend to merge it. Reviewed by: scottl	2012-03-01 19:58:34 +00:00
Gavin Atkinson	1748d1e513	Correct capitalization of "Hz" in user-visible text (manpages, printf(), etc). MFC after: 3 days	2012-02-28 13:19:34 +00:00
Marcel Moolenaar	2cb3783b57	Rev. 228360 moved the call to cpu_set_upcall() to happen before td_proc gets initialized in td (=newtd). Use td0 instead. MFC after: 3 days	2012-02-08 04:05:38 +00:00
David Schultz	2ee7b1d4ae	Add C11 macros describing subnormal numbers to float.h. Reviewed by: bde	2012-01-23 06:36:41 +00:00
David Schultz	9fa03ecd01	Add parentheses where required. Without them, `sizeof LDBL_MAX' is a syntax error and shouldn't be, while `1 FLT_ROUNDS' isn't a syntax error and should be. Thanks to bde for the examples.	2012-01-20 06:51:41 +00:00
Kenneth D. Merry	130f4520cb	Add the CAM Target Layer (CTL). CTL is a disk and processor device emulation subsystem originally written for Copan Systems under Linux starting in 2003. It has been shipping in Copan (now SGI) products since 2005. It was ported to FreeBSD in 2008, and thanks to an agreement between SGI (who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is available under a BSD-style license. The intent behind the agreement was that Spectra would work to get CTL into the FreeBSD tree. Some CTL features: - Disk and processor device emulation. - Tagged queueing - SCSI task attribute support (ordered, head of queue, simple tags) - SCSI implicit command ordering support. (e.g. if a read follows a mode select, the read will be blocked until the mode select completes.) - Full task management support (abort, LUN reset, target reset, etc.) - Support for multiple ports - Support for multiple simultaneous initiators - Support for multiple simultaneous backing stores - Persistent reservation support - Mode sense/select support - Error injection support - High Availability support (1) - All I/O handled in-kernel, no userland context switch overhead. (1) HA Support is just an API stub, and needs much more to be fully functional. ctl.c: The core of CTL. Command handlers and processing, character driver, and HA support are here. ctl.h: Basic function declarations and data structures. ctl_backend.c, ctl_backend.h: The basic CTL backend API. ctl_backend_block.c, ctl_backend_block.h: The block and file backend. This allows for using a disk or a file as the backing store for a LUN. Multiple threads are started to do I/O to the backing device, primarily because the VFS API requires that to get any concurrency. ctl_backend_ramdisk.c: A "fake" ramdisk backend. It only allocates a small amount of memory to act as a source and sink for reads and writes from an initiator. Therefore it cannot be used for any real data, but it can be used to test for throughput. It can also be used to test initiators' support for extremely large LUNs. ctl_cmd_table.c: This is a table with all 256 possible SCSI opcodes, and command handler functions defined for supported opcodes. ctl_debug.h: Debugging support. ctl_error.c, ctl_error.h: CTL-specific wrappers around the CAM sense building functions. ctl_frontend.c, ctl_frontend.h: These files define the basic CTL frontend port API. ctl_frontend_cam_sim.c: This is a CTL frontend port that is also a CAM SIM. This frontend allows for using CTL without any target-capable hardware. So any LUNs you create in CTL are visible in CAM via this port. ctl_frontend_internal.c, ctl_frontend_internal.h: This is a frontend port written for Copan to do some system-specific tasks that required sending commands into CTL from inside the kernel. This isn't entirely relevant to FreeBSD in general, but can perhaps be repurposed. ctl_ha.h: This is a stubbed-out High Availability API. Much more is needed for full HA support. See the comments in the header and the description of what is needed in the README.ctl.txt file for more details. ctl_io.h: This defines most of the core CTL I/O structures. union ctl_io is conceptually very similar to CAM's union ccb. ctl_ioctl.h: This defines all ioctls available through the CTL character device, and the data structures needed for those ioctls. ctl_mem_pool.c, ctl_mem_pool.h: Generic memory pool implementation used by the internal frontend. ctl_private.h: Private data structres (e.g. CTL softc) and function prototypes. This also includes the SCSI vendor and product names used by CTL. ctl_scsi_all.c, ctl_scsi_all.h: CTL wrappers around CAM sense printing functions. ctl_ser_table.c: Command serialization table. This defines what happens when one type of command is followed by another type of command. ctl_util.c, ctl_util.h: CTL utility functions, primarily designed to be used from userland. See ctladm for the primary consumer of these functions. These include CDB building functions. scsi_ctl.c: CAM target peripheral driver and CTL frontend port. This is the path into CTL for commands from target-capable hardware/SIMs. README.ctl.txt: CTL code features, roadmap, to-do list. usr.sbin/Makefile: Add ctladm. ctladm/Makefile, ctladm/ctladm.8, ctladm/ctladm.c, ctladm/ctladm.h, ctladm/util.c: ctladm(8) is the CTL management utility. It fills a role similar to camcontrol(8). It allow configuring LUNs, issuing commands, injecting errors and various other control functions. usr.bin/Makefile: Add ctlstat. ctlstat/Makefile ctlstat/ctlstat.8, ctlstat/ctlstat.c: ctlstat(8) fills a role similar to iostat(8). It reports I/O statistics for CTL. sys/conf/files: Add CTL files. sys/conf/NOTES: Add device ctl. sys/cam/scsi_all.h: To conform to more recent specs, the inquiry CDB length field is now 2 bytes long. Add several mode page definitions for CTL. sys/cam/scsi_all.c: Handle the new 2 byte inquiry length. sys/dev/ciss/ciss.c, sys/dev/ata/atapi-cam.c, sys/cam/scsi/scsi_targ_bh.c, scsi_target/scsi_cmds.c, mlxcontrol/interface.c: Update for 2 byte inquiry length field. scsi_da.h: Add versions of the format and rigid disk pages that are in a more reasonable format for CTL. amd64/conf/GENERIC, i386/conf/GENERIC, ia64/conf/GENERIC, sparc64/conf/GENERIC: Add device ctl. i386/conf/PAE: The CTL frontend SIM at least does not compile cleanly on PAE. Sponsored by: Copan Systems, SGI and Spectra Logic MFC after: 1 month	2012-01-12 00:34:33 +00:00
Adrian Chadd	3d090dd20d	Flip on IEEE80211_SUPPORT_MESH and AH_SUPPORT_AR5416, the wlan and ath modules respectively assume this is set. Pointy hat to: adrian	2012-01-05 17:28:05 +00:00
Robert Watson	009d2032af	Add "options CAPABILITY_MODE" and "options CAPABILITIES" to GENERIC kernel configurations for various architectures in FreeBSD 10.x. This allows basic Capsicum functionality to be used in the default FreeBSD configuration on non-embedded architectures; process descriptors are not yet enabled by default. MFC after: 3 months Sponsored by: Google, Inc	2011-12-29 22:48:36 +00:00
Andriy Gapon	9976156f12	kern cons: introduce infrastructure for console grabbing by kernel At the moment grab and ungrab methods of all console drivers are no-ops. Current intended meaning of the calls is that the kernel takes control of console input. In the future the semantics may be extended to mean that the calling thread takes full ownership of the console (e.g. console output from other threads could be suspended). Inspired by: bde MFC after: 2 months	2011-12-17 15:08:43 +00:00
Alan Cox	3b03ca3bbe	Eliminate vestiges of page coloring.	2011-12-15 05:07:16 +00:00
Ed Schouten	53627e400f	Replace __signed by signed. The signed keyword is an integral part of the C syntax. There's no need to use __signed.	2011-12-13 13:38:03 +00:00
Attilio Rao	ed1f6dc235	Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on all the architectures. The option allows to mount non-MPSAFE filesystem. Without it, the kernel will refuse to mount a non-MPSAFE filesytem. This patch is part of the effort of killing non-MPSAFE filesystems from the tree. No MFC is expected for this patch. Tested by: gianni Reviewed by: kib	2011-11-08 10:18:07 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Ken Smith	6168545a11	Adjust the debugger options slightly. This should help me do the right thing when changing the debugging options as part of head becoming a new stable branch. It may also help people who for one reason or another want to run head but don't want it slowed down by the debugging support. Reviewed by: kib	2011-10-27 13:07:49 +00:00
Ken Smith	36d7cf2f1c	Move the debugging support to its own section. This matches what is in the other architectures' GENERIC and makes removing it at the point we're creating a new stable branch a bit easier. Discussed with: marcel	2011-10-26 22:28:28 +00:00
David Schultz	a50079b7ff	People porting FreeBSD to new architectures ought not have to implement a deprecated FPU control interface in addition to the standard one. To make this clearer, further deprecate ieeefp.h by not declaring the function prototypes except on architectures that implement them already. Currently i386 and amd64 implement the ieeefp.h interface for compatibility, and for fp[gs]etprec(), which doesn't exist on most other hardware. Powerpc, sparc64, and ia64 partially implement it and probably shouldn't, and other architectures don't implement it at all.	2011-10-21 06:41:46 +00:00
Ken Smith	7042aba738	Add a warning about why sbp(4) is commented out so that curious folks are forewarned they might wind up with a hole in their foot if they decide to give it a try. Suggested by: dougb	2011-10-19 21:55:20 +00:00
Ken Smith	4c0ba9b742	Comment out the sbp(4) driver for architectures that support it. As part of the 8.0-RELEASE cycle this was done in stable/8 (r199112) but was left alone in head so people could work on fixing an issue that caused boot failure on some motherboards. Apparently nobody has worked on it and we are getting reports of boot failure with the 9.0 test builds. So this time I'll comment out the driver in head (still hoping someone will work on it) and MFC to stable/9. Submitted by: Alberto Villa <avilla at FreeBSD dot org>	2011-10-18 13:45:16 +00:00
Konstantin Belousov	6bfe4c78c8	Remove unused define. MFC after: 1 month	2011-10-07 16:09:44 +00:00
Konstantin Belousov	578113aaa3	Remove locking of the vm page queues from several pmaps, which only protected the dirty mask updates. The dirty mask updates are handled by atomics after the r225840. Submitted by: alc Tested by: flo (sparc64) MFC after: 2 weeks	2011-09-28 15:01:20 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Konstantin Belousov	26ccf4f10f	Inline the syscallenter() and syscallret(). This reduces the time measured by the syscall entry speed microbenchmarks by ~10% on amd64. Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks	2011-09-11 16:05:09 +00:00
Konstantin Belousov	3407fefef6	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)	2011-09-06 10:30:11 +00:00
Konstantin Belousov	d98d0ce27a	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)	2011-08-09 21:01:36 +00:00
Marcel Moolenaar	3b6ecbbdf1	Fix kernel core dumps now that the kernel is using PBVM. The basic problem to solve is that we don't have a fixed mapping from kernel text to physical address so that libkvm can bootstrap itself. We solve this by passing the physical address of the bootinfo structure to the consumer as the entry point of the core file. This way, libkvm can extract the PBVM page table information and locate the kernel in the core file. We also need to dump memory chunks of type loader data, because those hold the kernel and the PBVM page table (among other things). Approved by: re (blanket)	2011-08-06 03:40:33 +00:00
Marcel Moolenaar	1af1866850	Follow-up commit: refactor pmap_kextract() to make it easier to catch and debug issues like the one fixed in the previous commit: Replace all return statements with goto statements so that we end up at a single place with a value for the physical address. Print a message for all unknown KVA addresses. Approved by: re (blanket)	2011-08-05 23:10:47 +00:00
Marcel Moolenaar	444b037cd2	Remove stray semicolon in pmap_kextract() that turned the conditional "return (0)" into an unconditional one and as such broke PBVM address queries -- such as during kernel core dumps. Approved by: re (blanket)	2011-08-05 23:05:46 +00:00
Attilio Rao	786ef92b7b	Bump MAXCPU for amd64, ia64 and XLP mips appropriately. From now on, default values for FreeBSD will be 64 maxiumum supported CPUs on amd64 and ia64 and 128 for XLP. All the other architectures seem already capped appropriately (with the exception of sparc64 which needs further support on jalapeno flavour). Bump __FreeBSD_version in order to reflect KBI/KPI brekage introduced during the infrastructure cleanup for supporting MAXCPU > 32. This covers cpumask_t retiral too. The switch is considered completed at the present time, so for whatever bug you may experience that is reconducible to that area, please report immediately. Requested by: marcel, jchandra Tested by: pluknet, sbruno Approved by: re (kib)	2011-07-19 13:00:30 +00:00
Attilio Rao	732772c701	On 64 bit architectures size_t is 8 bytes, thus it should use an 8 bytes storage. Fix the sintrcnt/sintrnames specification. No MFC is previewed for this patch. Reported, reviewed and tested by: marcel Approved by: re (kib)	2011-07-19 12:41:57 +00:00
Attilio Rao	68b739cd6f	Add the possibility to specify from kernel configs MAXCPU value. This patch is going to help in cases like mips flavours where you want a more granular support on MAXCPU. No MFC is previewed for this patch. Tested by: pluknet Approved by: re (kib)	2011-07-19 00:37:24 +00:00
Attilio Rao	521ea19d1c	- Remove the eintrcnt/eintrnames usage and introduce the concept of sintrcnt/sintrnames which are symbols containing the size of the 2 tables. - For amd64/i386 remove the storage of intr* stuff from assembly files. This area can be widely improved by applying the same to other architectures and likely finding an unified approach among them and move the whole code to be MI. More work in this area is expected to happen fairly soon. No MFC is previewed for this patch. Tested by: pluknet Reviewed by: jhb Approved by: re (kib)	2011-07-18 15:19:40 +00:00
John Baldwin	9c931b6fdc	Enable NEW_PCIB by default on ia64. Approved by: re (kib), marcel	2011-07-18 14:05:14 +00:00
John Baldwin	fbd3fc8fc7	Implement bus_adjust_resource() for the ia64 nexus driver. Reviewed by: marcel Approved by: re (kib)	2011-07-18 14:04:37 +00:00
Marcel Moolenaar	08f72ab5dc	Don't assume pmap_mapdev() gets called only for memory mapped I/O addresses (i.e. uncacheable). ACPI in particular uses pmap_mapdev() rather excessively (number of calls) just to get a valid KVA. To that end, have pmap_mapdev(): 1. cache the last result so that we don't waste time for multiple consecutive invocations with the same PA/SZ. 2. find the memory descriptor that covers the PA and return NULL if none was found or when the PA is for a common DRAM address. 3. Use either a region 6 or region 7 KVA, in accordance with the memory attribute.	2011-07-16 20:34:02 +00:00
Marcel Moolenaar	5b927e4ea9	Don't send EOI to the CPU before we handled the interrupt. This could potentially trigger multiple pending interrupts for level-sensitive interrupts. However, the event timer interrupt does need EOI before being handled to avoid missing clock events. These conflicting requirements are handled by having the XIV handler inform the dispatch code whether or not it send EOI to the CPU. If not, the dispatch code will do it. This allows handlers to send EOI before doing potentially long-running activities, while still have a sensible default behaviour.	2011-07-16 20:16:49 +00:00
Marcel Moolenaar	303c22c53b	Add a few more helper functions for working with memory descriptors: o efi_md_find() - returns the md that covers the given address o efi_md_last() - returns the last md in the list o efi_md_prev() - returns the md that preceeds the given md.	2011-07-16 19:56:07 +00:00
Marcel Moolenaar	0eb8c1410a	Implement basic support for memory attributes. At this time we only distinguish between UC and WB memory so that we can map the page to either a region 6 address (for UC) or a region 7 address (for WB). This change is only now possible, because previously we would map regions 6 and 7 with 256MB translations and on top of that had the kernel mapped in region 7 using a wired translation. The introduction of the PBVM moved the kernel into its own region and freed up region 7 and allowed us to revert to standard page-sized translations. This commit inroduces pmap_page_to_va() that respects the attribute.	2011-07-08 16:30:54 +00:00
Marcel Moolenaar	6359509327	Disable PREEMPTION for now. See also PR ia64/147501.	2011-07-04 16:59:26 +00:00
Attilio Rao	470107b2f1	MFC	2011-07-04 11:13:00 +00:00
Alan Cox	80788b2a27	When iterating over a paging queue, explicitly check for PG_MARKER, instead of relying on zeroed memory being interpreted as an empty PV list. Reviewed by: kib	2011-07-02 23:42:04 +00:00
Marcel Moolenaar	c412551667	Change the management of nested faults by switching to physical addressing while reading or writing the trap frame. It's not possible to guarantee that the one translation cache entry that we depend on is not going to get purged by the CPU. We already know that global shootdowns (ptc.g and/or ptc.ga) can (and will) cause multiple TC entries to get purged and we initialize tried to handle that by serializing kernel entry with these operations. However, we need to serialize kernel exit as well. But even if we can serialize, it appears that CPU threads within a core can affect each other's TC entries beyond the global shootdown. This would mean serializing any and all translatation cache updates with the threads in a core with the kernel entry and exit of any thread in that core. This is just too painful and complicated. Since we already properly coded for the 2 nested faults that we can get, all we need to do is use those to obtain the physical address of the trap frame, switch to physical mode and in that way eliminate any further faults. The trap frame is already aligned to 1KB boundaries to make sure we don't cross the page boundary, this is safe to do. We still need to serialize ptc.g or ptc.ga across CPUs because the platform can only have 1 such operation outstanding at the same time. We can now use a regular (spin) lock for this. Also, it has been observed that we can get a nested TLB faults for region 7 virtual addresses. This was unexpected. For now, we enhance the nested TLB fault handler to deal with those as well, but it needs to be understood.	2011-06-30 20:34:55 +00:00
Attilio Rao	7b744f6b01	MFC	2011-06-30 10:19:43 +00:00
Alan Cox	6bbee8e28a	Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib	2011-06-29 16:40:41 +00:00
Attilio Rao	cfdfd32d34	MFC	2011-06-26 17:30:46 +00:00
Marcel Moolenaar	191eae8259	Oops. The sec field of struct bintime is not a 32-bit type. It's time_t, which is 64 bits on ia64.	2011-06-25 17:58:35 +00:00
Marcel Moolenaar	3ebd36e3d0	Define the minimum fractional period in terms of hz. We know hz is a magnitude smaller than itc_freq. A minimum period of 10*hz is sufficient precision. As a side-effect, the number of clocks per second, when the machine is idle, dropped by more than 50%. Be anal and define the maximum period to be at least 4G seconds. With a 64-bit counter and an ITC frequency that's expected to be always less than 4Ghz, it takes longer than that to wrap around.	2011-06-25 16:35:43 +00:00
Marcel Moolenaar	dbd57cbf8f	Replace the original copyright notice with my own. Everything in this file is written by me and has no bearing on the initial or original version.	2011-06-25 03:43:58 +00:00
Marcel Moolenaar	3f5deb80b7	Update copyright.	2011-06-25 03:37:40 +00:00
Marcel Moolenaar	e920e3978e	Switch to the event timers infrastructure. This includes: o Setting td_intr_frame to the XIVs trap frame because it's referenced by the ET event handler. o Signal EOI to the CPU before calling the registered XIV handlers. This prevents lost ITC interrupts, which cause starvation in one-shot mode. o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters. o Have the APs call cpu_initclocks() so as to limited the scattering of clock related initialization. cpu_initclocks() calls the <self>_bsp() or <self>_ap() version accordingly. o Uncomment the ET clock handling in cpu_idle(). o Update the DDB 'show pcpu' output for the new MD fields. o Entirely rewritten ia64_ih_clock(). Note that we don't create as many clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale. We can only have 240 XIVs and we can have more CPUs than that. There's a single intrcnt index for the cumulative clock ticks and we keep per CPU counts in the PCPU stats structure. o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order). Open issues: o Clock interrupts can still be lost. Some tweaking is still necessary. Thanks to: mav@ for his support, feedback and explanations. ET stats while committing: eris% sysctl machdep.cpu \| grep nclks machdep.cpu.0.nclks: 24007 machdep.cpu.1.nclks: 22895 machdep.cpu.2.nclks: 13523 machdep.cpu.3.nclks: 9342 machdep.cpu.4.nclks: 9103 machdep.cpu.5.nclks: 9298 machdep.cpu.6.nclks: 10039 machdep.cpu.7.nclks: 9479 eris% vmstat -i \| grep clock clock 108599 50	2011-06-25 02:15:14 +00:00
Attilio Rao	de138ec703	MFC	2011-06-24 16:35:40 +00:00
Marcel Moolenaar	ded49c2eae	Unblock the outgoing thread after we performed pmap_switch() to switch the region registers. pmap_switch() returns the pmap for which the region register are currently programmed, which needs to be re-programmed on the CPU the ougoing thread gets switched in. This change does not noticibly change anything or fix known bugs, but does give me a warm fuzzy feeling by being more correct.	2011-06-23 16:21:43 +00:00
Attilio Rao	9b571ec6b3	MFC	2011-06-22 19:42:32 +00:00

1 2 3 4 5 ...

2020 Commits