freebsd-skq

Author	SHA1	Message	Date
Ian Lepore	6fb87f7371	When running on armv6, set alignment checking to modulo-4 mode rather than modulo-8, because clang emits ldrd and strd instructions for addresses that are only 4-byte aligned	2013-03-31 22:43:16 +00:00
Ian Lepore	070cf887bb	When running on armv6, set alignment checking to modulo-4 mode rather than modulo-8, because clang emits ldrd and strd instructions for addresses that are only 4-byte aligned.	2013-03-31 22:42:25 +00:00
Michael Tuexen	ebae998767	Add a macro for checking for IPv4 link local addresses. MFC after: 1 week	2013-03-31 18:27:46 +00:00
Jilles Tjoelker	d289dc7b73	Rename do_pipe() to kern_pipe2() and declare it properly.	2013-03-31 17:42:54 +00:00
Ian Lepore	27aa887af3	Fix a typo in the CF device driver name that prevented instantiation.	2013-03-31 12:51:56 +00:00
Neel Natu	77d8fd9bb3	Add counter to keep track of the number of timer interrupts generated by the local apic for each virtual cpu.	2013-03-31 03:56:48 +00:00
Neel Natu	b5aaf7b22b	Add some more stats to keep track of all the reasons that a vcpu is exiting.	2013-03-30 17:46:03 +00:00
Tim Kientzle	2f37ec8b30	Initialize sym_count to 0. This fixes a compiler warning introduced in r248121.	2013-03-30 16:33:16 +00:00
Matthew D Fleming	926cd204c7	Use a shared lock for VOP_GETEXTATTR, as it is a read-like operation. MFC after: 1 week	2013-03-30 15:09:04 +00:00
Jilles Tjoelker	937c916587	Improve namespacing in <sys/socket.h>: * MSG_NOSIGNAL is in POSIX.1-2008. * MSG_NOTIFICATION (SCTP) is not in POSIX. * PRU_FLUSH_* (SCTP) are not in POSIX. * bindat()/connectat() are not in POSIX. Discussed with: rrs (PRU_FLUSH_*)	2013-03-30 13:30:27 +00:00
Adrian Chadd	a296efdeeb	AR933x CPU device improvements: * Add baud rate and divisor programming code. See below for more information. * Flesh out ar933x_init() to disable interrupts and program the initial console setup. * Remove #if 0'ed code from ar933x_term(). * Explain what these functions do. Now, the baud rate and divisor code comes from Linux, as a submission to the OpenWRT project and Linux kernel from Gabor Juhos <juhosg@openwrt.org>. The original ticket for this code is https://dev.openwrt.org/ticket/12031 . I've contacted Gabor and asked for his permission to also licence the patch in question (which covers this code) to BSD lience and he's agreed. Hence why I'm including it here in FreeBSD. Tested: * AP121 (AR9330)	2013-03-30 04:31:29 +00:00
Adrian Chadd	8eeea2945d	AR933x UART updates: * Default clock is 25MHz; * Remove the UART register macro here - it's not needed as we don't need to "adjust" the register offset / spacing at all; * Remove unused fields in the softc. Tested: * AP121	2013-03-30 04:13:47 +00:00
Navdeep Parhar	d14b0ac129	cxgbe(4): Add support for Chelsio's Terminator 5 (aka T5) ASIC. This includes support for the NIC and TOE features of the 40G, 10G, and 1G/100M cards based on the T5. The ASIC is mostly backward compatible with the Terminator 4 so cxgbe(4) has been updated instead of writing a brand new driver. T5 cards will show up as cxl (short for cxlgb) ports attached to the t5nex bus driver. Sponsored by: Chelsio	2013-03-30 02:26:20 +00:00
Steven Hartland	5f83aee5e5	Adds the ability to enable / disable sorting of BIO requests queued within CAM. This can significantly improve performance particularly for SSDs which don't suffer from seek latencies. The sysctl / tunable kern.cam.sort_io_queues provides the systems default setting where:- 0 = queued BIOs are NOT sorted 1 = queued BIOs are sorted (default) Each device gets its own sysctl kern.cam.<type>.<id>.sort_io_queue Valid values are:- -1 = use system default (default) 0 = queued BIOs are NOT sorted 1 = queued BIOs are sorted Note: Additional patch will look to add automatic use of none sorted queues for none rotating media e.g. SSD's Reviewed by: scottl Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-29 22:58:15 +00:00
Ed Maste	ce7ad6640c	Keep fwd_tag around for subsequent pcb lookups For TIMEWAIT handling tcp_input may have to jump back for an additional pass through pcblookup. Prior to this change the fwd_tag had been discarded after the first lookup, so a new connection attempt delivered locally via 'ipfw fwd' would fail to find a match. As of r248886 the tag will be detached and freed when passed to the socket buffer.	2013-03-29 20:51:44 +00:00
Jim Harris	1e526bc478	Add "type" to nvme_request, signifying if its payload is a VADDR, UIO, or NULL. This simplifies decisions around if/how requests are routed through busdma. It also paves the way for supporting unmapped bios. Sponsored by: Intel	2013-03-29 20:34:28 +00:00
Adrian Chadd	033891b29d	Disable this; it's a local option that I haven't yet committed to -HEAD.	2013-03-29 20:07:51 +00:00
Ian Lepore	63cdf42e8c	Add userland access to at91 gpio functionality via ioctl calls. Also, add the ability for userland to be notified of changes on gpio pins via a select(2)/read(2) interface. Change the interrupt handler from filtered to threaded. Because of the uiomove() calls in the new interface, change locking from standard mutex to sx. Add / restore the at91_gpio_high_z() function. Reviewed by: imp (long ago)	2013-03-29 19:52:57 +00:00
Ian Lepore	914421fa79	Change the API for at91_pio_gpio_get() to return the entire masked set of bits, not just a 0/1 indicating whether any of the masked bits are on. This is compatible with the single in-tree caller of this function right now (at91_vbus_poll() in dev/usb/controller/at91dci_atemelarm.c).	2013-03-29 19:04:18 +00:00
Ian Lepore	b39ec0de86	Call soc_info.soc_data->soc_clock_init() before at91_pmc_init_clock(), so that the latter correctly fills in the clock data structures based on proper hardware-specific shift and mask values from the soc_data structure.	2013-03-29 18:47:08 +00:00
Jack F Vogel	be2095895a	Change the define in the header to eliminate unnecessary data when using LEGACY TX.	2013-03-29 18:46:13 +00:00
Ian Lepore	fce4536cfd	Add a couple forward declarations, so that board support routines don't have to pre-include a bunch of header files they don't need just to use this one.	2013-03-29 18:43:10 +00:00
Jack F Vogel	c05891a6da	Change defines in the igb driver to allow an easier selection of the older if_start/non-multiqueue interface from the stack. This is not the default, but can be turned on in the Makefile now regardless of the OS level to allow either testing or use of ALTQ. MFC after: one week	2013-03-29 18:25:45 +00:00
Ian Lepore	5c4938ee48	Redo the workaround for at91rm9200 erratum #26 in a way that doesn't cause a lockup on some rm92 hardware.	2013-03-29 18:17:51 +00:00
Ian Lepore	c29eb73802	Fix a typo: the RXD0 pin is PA18, not PA19.	2013-03-29 18:06:54 +00:00
Jack F Vogel	7bdac10465	Two small fixes: Set promiscuous code was unconditionally turning off multicast when turning off promiscuous mode, this should only be done when there are less than MAX groups. Thanks to Mike Karels for this correction. Second, the overtmp interrupt setup/detection was wrong, correcting it. MFC after: one week	2013-03-29 18:03:00 +00:00
Ian Lepore	14146e9a04	Remove a really noisy printf left over from debugging hardware errata.	2013-03-29 17:57:24 +00:00
Jim Harris	10a93479b9	Add bus_dmamap_load_bio for non-CAM disk drivers that wish to enable unmapped I/O. Sponsored by: Intel Reviewed by: kib	2013-03-29 16:26:25 +00:00
Jim Harris	86675b5c0d	Add CTR5() to bus_dmamap_load_ccb, similar to other bus_dmamap_load_* functions. Sponsored by: Intel	2013-03-29 16:00:16 +00:00
Jim Harris	ab72998ef7	Do not add 1 to nsegs before passing to CTR5(), since nsegs has already been incremented before these calls. Sponsored by: Intel	2013-03-29 15:54:12 +00:00
Jim Harris	b327350604	Pass correct parameter to CTR5() in bus_dmamap_load_uio. Sponsored by: Intel	2013-03-29 15:51:45 +00:00
Gleb Smirnoff	21f398487c	Fix bug in m_split() in a case when split len matches len of the first mbuf, and the first mbuf is M_PKTHDR. PR: kern/176144 Submitted by: Jacques Fourie <jacques.fourie gmail.com>	2013-03-29 14:10:40 +00:00
Gleb Smirnoff	844cacd17c	Once ng_ksocket(4) is fixed, re-apply r194662. See this revision for longer description. Discussed with: andre, rwatson Sponsored by: Nginx, Inc.	2013-03-29 14:06:04 +00:00
Gleb Smirnoff	9a4d9e198a	Revamp mbuf handling in ng_ksocket_incoming2(): - Clear code that workarounded a bug in FreeBSD 3, and even predated import of netgraph(4). - Clear workaround for m_nextpkt pointing into next record in buffer (fixed in r248884). Assert that m_nextpkt is clear. - Do not rely on SOCK_STREAM sockets containing M_PKTHDR mbufs. Create a header ourselves and attach chain to it. This is correct fix for kern/154676. PR: kern/154676 Sponsored by: Nginx, Inc	2013-03-29 14:04:26 +00:00
Gleb Smirnoff	a307eb26ed	When soreceive_generic() hands off an mbuf from buffer, clear its pointer to next record, since next record belongs to the buffer, and shouldn't be leaked. The ng_ksocket(4) used to clear this pointer itself, but the correct place is here. Sponsored by: Nginx, Inc	2013-03-29 13:57:55 +00:00
Gleb Smirnoff	6b1781e3ea	Whitespace.	2013-03-29 13:53:14 +00:00
Gleb Smirnoff	d09c774bb5	Non-functional cleanup of ng_ksocket_incoming2().	2013-03-29 13:51:01 +00:00
Marius Strobl	03efffd10e	Unbreak compilation after r248868.	2013-03-29 11:53:20 +00:00
Alexander Motin	09cfadbe7f	Make pre-shutdown flush and spindown routines to not use xpt_polled_action(), but execute the commands in regular way. There is no any reason to cook CPU while the system is still fully operational. After this change polling in CAM is used only for kernel dumping.	2013-03-29 08:33:18 +00:00
Alexander Motin	f371c9e260	Implement CAM_PERIPH_FOREACH() macro, safely iterating over the list of driver's periphs, acquiring and releaseing periph references while doing it. Use it to iterate over the lists of ada and da periphs when flushing caches and putting devices to sleep on shutdown and suspend. Previous code could panic in theory if some device disappear in the middle of the process.	2013-03-29 07:50:47 +00:00
Adrian Chadd	10e00ec8cc	For the AR933x UART, the serial clock is not the AHB clock, it's the reference clock. So use that instead.	2013-03-29 06:32:39 +00:00
Adrian Chadd	19f293bd60	* Fix clock register definitions * Add maximum clock register values	2013-03-29 06:32:02 +00:00
Adrian Chadd	600f8cb57a	Print out the platform reference frequency. This is useful for AR933x platforms where that matters.	2013-03-29 06:31:31 +00:00
Neel Natu	66f71b7d24	Allow caller to skip 'guest linear address' validation when doing instruction decode. This is to accomodate hardware assist implementations that do not provide the 'guest linear address' as part of nested page fault collateral. Submitted by: Anish Gupta (akgupt3 at gmail dot com)	2013-03-28 21:26:19 +00:00
Adrian Chadd	43b36ea90a	Initial (unfinished!) AR933x support.	2013-03-28 20:48:58 +00:00
Mark Johnston	83a3ff21a8	Ignore interface renames instead of removing the interface from the bridge group. Reviewed by: rstone Approved by: rstone (co-mentor) Sponsored by: Sandvine Incorporated MFC after: 1 week	2013-03-28 20:37:07 +00:00
Adrian Chadd	7d52c7525f	Tie in the AR933x support into -HEAD.	2013-03-28 19:30:56 +00:00
Adrian Chadd	308a33172f	Bring over the initial, CPU-only UART support for the AR933x SoC. This implements the kernel glue needed (getc, putc, rxready). This isn't a 16550 UART, even if the datasheet overview claims so. The Linux ar933x support was used as a reference, however the uart code is a reimplementation. Attentive viewers will note that the uart code is based off of the ns8250 code and the UART bus code is a stubbed-out version of this. I'll be replacing it with non-stubbed versions soon, making this a fully featured driver. Tested: * AP121 reference board (AR933x), booting through the mountroot> prompt; then doing some basic interactive tests in ddb.	2013-03-28 19:27:06 +00:00
Sean Bruno	cc0c1555d3	Update hwpmc to support Haswell class processors. 0x3C: /* Per Intel document 325462-045US 01/2013. */ Add manpage to document all the goodness that is available in this processor model. Submitted by: hiren panchasara <hiren.panchasara@gmail.com> Reviewed by: jimharris, sbruno Obtained from: Yahoo! Inc. MFC after: 2 weeks	2013-03-28 19:15:54 +00:00
Jim Harris	64432b473b	Remove obsolete comment. This code has now been tested with the QEMU NVMe device emulator.	2013-03-28 16:57:48 +00:00
Jim Harris	bb852ae89b	Delete extra IO qpairs allocated based on number of MSI-X vectors, but later found to not be usable because the controller doesn't support the same number of queues. This is not the normal case, but does occur with the Chatham prototype board. Sponsored by: Intel	2013-03-28 16:54:19 +00:00
Scott Long	07dbf2c768	Several fixes and improvements to sendfile() 1. If we wanted to send exactly as many bytes as the socket buffer is sized for, the inner loop of kern_sendfile() would see that the socket is full before seeing that it had no more bytes left to send. This would cause it to return EAGAIN to the caller instead of success. Fix by changing the order that these conditions are tested. 2. Simplify the calculation for the bytes to send in each iteration of the inner loop of kern_sendfile() 3. Fix some calls with bogus arguments to sf_buf_ext(). These would only trigger on mbuf allocation failure, but would be hilariously bad if they did trigger. Submitted by: gibbs(3), andre(2) Reviewed by: emax, andre Obtained from: Netflix MFC after: 1 week	2013-03-28 14:14:28 +00:00
Sean Bruno	b27556e012	Restore DB_COMMAND capabilities of ciss(4) for debugging and diagnostics Obtained from: Yahoo! Inc. MFC after: 2 weeks	2013-03-28 12:44:43 +00:00
Alexander Motin	47bf7bcb97	Except one case mps(4) driver does not touch the data and works well with unmapped I/O. That one exception is access to INQUIRY VPD request result. Those requests are never unmapped now, but to be safe add respective check there and allow unmapped I/O for the SIM by setting PIM_UNMAPPED flag.	2013-03-28 11:24:30 +00:00
Sean Bruno	3ea59fd3a4	Fix compile of ciss(4) with CISS_DEBUG defined Obtained from: Yahoo! Inc. MFC after: 2 weeks	2013-03-28 11:00:41 +00:00
Konstantin Belousov	bafa6cfc93	Release the v_writecount reference on the vnode in case of error, before the vnode is vput() in vm_mmap_vnode(). Error return means that there is no use reference on the vnode from the vm object reference, and failing to restore v_writecount breaks the invariant that v_writecount is less or equal to the usecount. The situation observed when nfs client returns ESTALE for VOP_GETATTR() after the open. In collaboration with: pho MFC after: 1 week	2013-03-28 06:39:27 +00:00
Adrian Chadd	09ac4e68f3	Fix the AR933x platform device start/stop code. This was ported from the AR724x code and I think that also doesn't quite work. I'll investigate that soon. With this in place the system reset path works, so 'reset' from kdb actually resets the SoC. Tested: * AP121 test board	2013-03-28 05:43:03 +00:00
Jim Harris	47301c53ed	deferal -> deferral	2013-03-27 23:07:43 +00:00
Alexander Motin	cd05a36e54	On SIM destruction free associated CCBs, preallocated inside xpt_get_ccb(). Before this change they were just leaked. Fortunately USB sticks now use only one CCB, and so leak was only 2KB per detach, while other bigger SIMs with much more allocated CCBs are rarely detached. MFC after: 2 weeks	2013-03-27 18:55:01 +00:00
Jung-uk Kim	ba9855e380	Limit the amount of video memory we map for the driver to the maximum value. This basically restores the spirit of r203535, which was partially reverted in r205557, while we still map fixed amount to work around transient issues we experienced with r203535. Prodded by: avg Tested by: avg MFC after: 1 week	2013-03-27 18:06:28 +00:00
Konstantin Belousov	f3215a60fd	Fix a race with the vnode reclamation in the aio_qphysio(). Obtain the thread reference on the vp->v_rdev and use the returned struct cdev dev instead of using vp->v_rdev. Call dev_strategy_csw() instead of dev_strategy(), since we now own the reference. Since the csw was already calculated, test d_flags to avoid mapping the buffer if the driver supports unmapped requests []. Suggested by: kan [*] Reviewed by: kan (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-03-27 11:47:52 +00:00
Konstantin Belousov	d1e99f43ed	Add dev_strategy_csw() function, which is similar to dev_strategy() but assumes that a thread reference was already obtained on the passed device. Use the function from physio(), to avoid two extra dev_mtx lock and unlock. Note that physio() is always used as the cdevsw method, or is called from a cdevsw method, and the caller already owns the reference. dev_strategy() is left to keep KPI intact, but now it is implemented as a wrapper around dev_strategy_csw(). Do some style cleanup in physio(). Requested and reviewed by: kan (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-03-27 11:34:27 +00:00
Konstantin Belousov	88c8c0a70f	On i386, double the default size of the bio transient map. With the maxbcache size fixed, the auto-tuned transient map is too small for real-world load on i386. Tested by: David Wolfskill Sponsored by: The FreeBSD Foundation	2013-03-27 10:56:15 +00:00
Konstantin Belousov	d4e9009cc8	Fix the VM_BCACHE_SIZE_MAX definition on i386 to match the maximal buffer map size, auto-tuned on the 4GB machine. Having the maxbcache bigger than the buffer map causes the transient bio map sizing logic to assume that there is enough KVA to use approximately 90MB (buffer map is sized to 110MB, and maxbcache is 200MB). The increase in the KVA usage caused other big KVA consumers, like nvidia.ko, to fail the initialization. Change the definition for both PAE and non-PAE cases, since PAE is even more KVA-starved. Reported and tested by: David Wolfskill Discussed with: alc Sponsored by: The FreeBSD Foundation	2013-03-27 10:52:18 +00:00
Alexander Motin	7019329c1a	Add Subsystem ID field to the quirk table. Use it to identify Mac Pro 1,1, which requires OVREF to be set to get proper playback volume, but which has all zeroes in HDA controller subdevice IDs on PCI. MFC after: 1 month Sponsored by:	2013-03-27 07:30:08 +00:00
Adrian Chadd	601a83560e	Commit initial (unfinished!) support for the AR933x series of embedded CPUs. The AR933x is a mips24k based SoC with an AR9380 series SoC on board, two gigabit ethernet interfaces and an internal 10/100mbit ethernet switch. There's also the normal interfaces (USB, ethernet, uart, GPIO.) The downside? There's a non-ns8250 UART device. With a very basic UART driver (not in this commit) the SoC is initialised and boots up. I'll commit the UART code soon and then link it into the general setup path. This code is a re-implementation based from the Linux kernel / openwrt AR933x support. TODO: * UART (obviously) * All of the ethernet, USB and wifi SoC glue, including ethernet PLL programming.	2013-03-27 03:38:58 +00:00
Adrian Chadd	a4a1b49368	Add the reference clock for each supported chip. Obtained from: Linux (openwrt)	2013-03-27 03:33:19 +00:00
Jim Harris	bdd1fd402c	Fix printf format issue on i386. Reported by: bz	2013-03-27 00:37:00 +00:00
Adrian Chadd	b92b5f6e3a	* Stop processing after HAL_EIO; this is what the reference driver does. * If we hit an empty queue condition (which I haven't yet root caused, grr.) .. make sure we release the lock before continuing.	2013-03-27 00:35:45 +00:00
Jim Harris	944497f65f	Panic should the SCI framework ever request a pointer into the ccb's data buffer for a ccb that is unmapped. This case is currently not possible, since the SCI framework only requests these pointers for doing SCSI/ATA translation of non- READ/WRITE commands. The panic is more to protect against the unlikely future scenario where additional commands could be unmapped. Sponsored by: Intel	2013-03-27 00:15:22 +00:00
Jim Harris	1da66a2776	Report support for unmapped I/O by adding PIM_UNMAPPED flag. Submitted by: jhb, scottl	2013-03-26 23:04:06 +00:00
Jim Harris	547d523eb8	Clean up debug prints. 1) Consistently use device_printf. 2) Make dump_completion and dump_command into something more human-readable. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:17:10 +00:00
Jim Harris	dd433dd0fb	Move common code from the different nvme_allocate_request functions into a separate function. Sponsored by: Intel Suggested by: carl Reviewed by: carl	2013-03-26 22:13:07 +00:00
Jim Harris	237d2019e5	Change a number of malloc(9) calls to use M_WAITOK instead of M_NOWAIT. Sponsored by: Intel Suggested by: carl Reviewed by: carl	2013-03-26 22:11:34 +00:00
Jim Harris	955910a916	Replace usages of mtx_pool_find used for admin commands with a polling mechanism. Now that all requests are timed, we are guaranteed to get a completion notification, even if it is an abort status due to a timed out admin command. This has the effect of simplifying the controller and namespace setup code, so that it reads straight through rather than broken up into a bunch of different callback functions. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:09:51 +00:00
Jim Harris	43a3725688	Abort and do not retry any outstanding admin commands left over after a controller reset. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:06:05 +00:00
Jim Harris	232e2edb6c	Add the ability to internally mark a controller as failed, if it is unable to start or reset. Also add a notifier for NVMe consumers for controller fail conditions and plumb this notifier for nvd(4) to destroy the associated GEOM disks when a failure occurs. This requires a bit of work to cover the races when a consumer is sending I/O requests to a controller that is transitioning to the failed state. To help cover this condition, add a task to defer completion of I/Os submitted to a failed controller, so that the consumer will still always receive its completions in a different context than the submission. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:58:38 +00:00
Jim Harris	3d7eb41c1b	Just disable the controller instead of deleting IO queues during detach. This is just as effective, and removes the need for a bunch of admin commands to a controller that's going to be disabled shortly anyways. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:48:41 +00:00
Jim Harris	ec84ecbba0	Have nvd(4) register for controller notifications. Also have nvd maintain controller/namespace relationships internally. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:45:37 +00:00
Jim Harris	74019d4b67	Set Pre-boot Software Load Count to 0 at the end of the controller start process. The spec indicates the OS driver should use Set Features (Software Progress Marker) to set the pre-boot software load count to 0 after the OS driver has successfully been initialized. This allows pre-boot software to determine if there have been any issues with the OS loading. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:42:53 +00:00
Jim Harris	be34f21609	Remove the is_started flag from struct nvme_controller. This flag was originally added to communicate to the sysctl code which oids should be built, but there are easier ways to do this. This needs to be cleaned up prior to adding new controller states - for example, controller failure. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:19:26 +00:00
Jim Harris	02e3348484	Ensure the controller's MDTS is accounted for in max_xfer_size. The controller's IDENTIFY data contains MDTS (Max Data Transfer Size) to allow the controller to specify the maximum I/O data transfer size. nvme(4) already provides a default maximum, but make sure it does not exceed what MDTS reports. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:16:53 +00:00
Jim Harris	cb5b7c1304	Cap the number of retry attempts to a configurable number. This ensures that if a specific I/O repeatedly times out, we don't retry it indefinitely. The default number of retries will be 4, but is adjusted using hw.nvme.retry_count. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:14:51 +00:00
Jim Harris	0d7e13ecb2	Pass associated log page data to async event consumers, if requested. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:08:32 +00:00
Jim Harris	2868353a57	When an asynchronous event request is completed, automatically fetch the specified log page. This satisfies the spec condition that future async events of the same type will not be sent until the associated log page is fetched. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:05:15 +00:00
Jim Harris	0692579bf3	Add structure definitions and controller command function for firmware log pages. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:03:03 +00:00
Jim Harris	0892778256	Add structure definitions and a controller command function for error log pages. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:01:53 +00:00
Jim Harris	cf81529ce3	Create struct nvme_status. NVMe error log entries include status, so breaking this out into its own data structure allows it to be included in both the nvme_completion data structure as well as error log entry data structures. While here, expose nvme_completion_is_error(), and change all of the places that were explicitly looking at sc/sct bits to use this macro instead. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:00:18 +00:00
Jim Harris	f37c22a3bd	Make nvme_ctrlr_reset a nop if a reset is already in progress. This protects against cases where a controller crashes with multiple I/O outstanding, each timing out and requesting controller resets simultaneously. While here, remove a debugging printf from a previous commit, and add more logging around I/O that need to be resubmitted after a controller reset. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:56:58 +00:00
Jim Harris	48ce317898	By default, always escalate to controller reset when an I/O times out. While aborts are typically cleaner than a full controller reset, many times an I/O timeout indicates other controller-level issues where aborts may not work. NVMe drivers for other operating systems are also defaulting to controller reset rather than aborts for timed out I/O. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:32:57 +00:00
Pedro F. Giffuni	f5678b698a	Dtrace: dtrace.c erroneously checks for memory alignment on amd64. Merge change from illumos: 3511 dtrace.c erroneously checks for memory alignment on amd64 Illumos Revision: c93cc65 Reference: https://www.illumos.org/issues/3511 Obtained from: Illumos MFC after: 3 weeks	2013-03-26 20:17:08 +00:00
Adrian Chadd	92e84e43a6	Implement the replacement EDMA FIFO code. (Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was cleared!) This code implements a whole bunch of sorely needed EDMA TX improvements along with CABQ TX support. The specifics: * When filling/refilling the FIFO, use the new TXQ staging queue for FIFO frames * Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly. For now the non-CABQ transmit path pushes one frame into the TXQ staging queue without setting up the intermediary link pointers to chain them together, so draining frames from the txq staging queue to the FIFO queue occurs AMPDU / MPDU at a time. * In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO at once. * Now that frames are in a FIFO pending queue, we can top up the FIFO after completing a single frame. This means we can keep it filled rather than waiting for it drain and _then_ adding more frames. * The EDMA restart routine now walks the FIFO queue in the TXQ rather than the pending queue and re-initialises the FIFO with that. * When restarting EDMA, we may have partially completed sending a list. So stamp the first frame that we see in a list with ATH_BUF_FIFOPTR and push _that_ into the hardware. * When completing frames, only check those on the FIFO queue. We should never ever queue frames from the pending queue direct to the hardware, so there's no point in checking. * Until I figure out what's going on, make sure if the TXSTATUS for an empty queue pops up, complain loudly and continue. This will stop the panics that people are seeing. I'll add some code later which will assist in ensuring I'm populating each descriptor with the correct queue ID. * When considering whether to queue frames to the hardware queue directly or software queue frames, make sure the depth of the FIFO is taken into account now. * When completing frames, tag them with ATH_BUF_BUSY if they're not the final frame in a FIFO list. The same holding descriptor behaviour is required when handling descriptors linked together with a link pointer as the hardware will re-read the previous descriptor to refresh the link pointer before contiuning. * .. and if we complete the FIFO list (ie, the buffer has ATH_BUF_FIFOEND set), then we don't need the holding buffer any longer. Thus, free it. Tested: * AR9380/AR9580, STA and hostap * AR9280, STA/hostap TODO: * I don't yet trust that the EDMA restart routine is totally correct in all circumstances. I'll continue to thrash this out under heavy multiple-TXQ traffic load and fix whatever pops up.	2013-03-26 20:04:45 +00:00
Jim Harris	941433323c	Add a tunable for the I/O timeout interval. Default is still 30 seconds, but can be adjusted between a min/max of 5 and 120 seconds. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:02:35 +00:00
Jim Harris	12d191ec12	Add handling for controller fatal status (csts.cfs). On any I/O timeout, check for csts.cfs==1. If set, the controller is reporting fatal status and we reset the controller immediately, rather than trying to abort the timed out command. This changeset also includes deferring the controller start portion of the reset to a separate task. This ensures we are always performing a controller start operation from a consistent context. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:58:17 +00:00
Jim Harris	dbba74428b	Add API for nvme consumers to access controller and namespace identify data. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:52:57 +00:00
Jim Harris	b846efd7ec	Add controller reset capability to nvme(4) and ability to explicitly invoke it from nvmecontrol(8). Controller reset will be performed in cases where I/O are repeatedly timing out, the controller reports an unrecoverable condition, or when explicitly requested via IOCTL or an nvme consumer. Since the controller may be in such a state where it cannot even process queue deletion requests, we will perform a controller reset without trying to clean up anything on the controller first. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:50:46 +00:00
Adrian Chadd	3feffbd796	Add per-TXQ EDMA FIFO staging queue support. Each set of frames pushed into a FIFO is represented by a list of ath_bufs - the first ath_buf in the FIFO list is marked with ATH_BUF_FIFOPTR; the last ath_buf in the FIFO list is marked with ATH_BUF_FIFOEND. Multiple lists of frames are just glued together in the TAILQ as per normal - except that at the end of a FIFO list, the descriptor link pointer will be NULL and it'll be tagged with ATH_BUF_FIFOEND. For non-EDMA chipsets this is a no-op - the ath_txq frame list (axq_q) stays the same and is treated the same. For EDMA chipsets the frames are pushed into axq_q and then when the FIFO is to be (re) filled, frames will be moved onto the FIFO queue and then pushed into the FIFO. So: * Add a new queue in each hardware TXQ (ath_txq) for staging FIFO frame lists. It's a TAILQ (like the normal hardware frame queue) rather than the ath9k list-of-lists to represent FIFO entries. * Add new ath_buf flags - ATH_TX_FIFOPTR and ATH_TX_FIFOEND. * When allocating ath_buf entries, clear out the flag value before returning it or it'll end up having stale flags. * When cloning ath_buf entries, only clone ATH_BUF_MGMT. Don't clone the FIFO related flags. * Extend ath_tx_draintxq() to first drain the FIFO staging queue, _then_ drain the normal hardware queue. Tested: * AR9280, hostap * AR9280, STA * AR9380/AR9580 - hostap TODO: * Test on other chipsets, just to be thorough.	2013-03-26 19:46:51 +00:00
Jim Harris	65c2474e6d	Keep a doubly-linked list of outstanding trackers. This enables in-order re-submission of I/O after a controller reset. Sponsored by: Intel	2013-03-26 18:45:16 +00:00
Jim Harris	5f1e251de6	Create a generic nvme_ctrlr_cmd_get_log_page function, and change the health information log page function to use it. Sponsored by: Intel	2013-03-26 18:43:53 +00:00
Jim Harris	99d99f7408	Expose the get/set features API to nvme consumers. Sponsored by: Intel	2013-03-26 18:42:05 +00:00
Jim Harris	038a5ee403	Add an interface for nvme shim drivers (i.e. nvd) to register for notifications when new nvme controllers are added to the system. Sponsored by: Intel	2013-03-26 18:39:54 +00:00
Jim Harris	0a0b08cc30	Enable asynchronous event requests on non-Chatham devices. Also add logic to clean up all outstanding asynchronous event requests when resetting or shutting down the controller, since these requests will not be explicitly completed by the controller itself. Sponsored by: Intel	2013-03-26 18:37:36 +00:00
Jim Harris	990e741c18	Move controller destruction code from nvme_detach() to new nvme_ctrlr_destruct() function. Sponsored by: Intel	2013-03-26 18:34:19 +00:00
Jim Harris	274b3a88fa	Specify command timeout interval on a per-command type basis. This is primarily driven by the need to disable timeouts for asynchronous event requests, which by nature should not be timed out. Sponsored by: Intel	2013-03-26 18:31:46 +00:00
Jim Harris	879de69910	Explicitly abort a timed out command, if the ABORT command sent to the controller indicates the command was not found. Sponsored by: Intel	2013-03-26 18:29:04 +00:00
Jim Harris	6cb0607039	Break out the code for completing an nvme_tracker object into a separate function. This allows for completions outside the normal completion path, for example when an ABORT command fails due to the controller reporting the targeted command does not exist. This is mainly for protection against a faulty controller, but we need to clean up our internal request nonetheless. Sponsored by: Intel	2013-03-26 18:27:22 +00:00
Jim Harris	448195e764	Add support for ABORT commands, including issuing these commands when an I/O times out. Also ensure that we retry commands that are aborted due to a timeout. Sponsored by: Intel	2013-03-26 18:23:35 +00:00
Jim Harris	d6f54866ea	Add an internal _nvme_qpair_submit_request function, which performs the submit action assuming the qpair lock has already been acquired. Also change nvme_qpair_submit_request to just lock/unlock the mutex around a call to this new function. This fixes a recursive mutex acquisition in the retry path. Sponsored by: Intel	2013-03-26 18:20:11 +00:00
Jim Harris	aaf6b84a4e	Make the DSM range count 0-based. Previously we were deallocating one more LBA than we should have been. Sponsored by: Intel	2013-03-26 18:16:30 +00:00
Jim Harris	51a9feb9b0	Do not look at the namespace's thin provisioning field to determine if DSM command is supported. The two are not related. Sponsored by: Intel	2013-03-26 18:01:24 +00:00
Alan Cox	3fc10b7363	Introduce vm_radix_isleaf() and use it in a couple places. As compared to using vm_radix_node_page() == NULL, the compiler is able to generate one less conditional branch when vm_radix_isleaf() is used. More use cases involving the inner loops of vm_radix_insert(), vm_radix_lookup{,_ge,_le}(), and vm_radix_remove() will follow. Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division	2013-03-26 17:30:40 +00:00
Gleb Smirnoff	fa75f402ae	Return ENOMEM if malloc() fails.	2013-03-26 14:08:14 +00:00
Gleb Smirnoff	a23a2dd138	Cleanup: wrap long lines, cleanup comments, etc.	2013-03-26 14:05:37 +00:00
Alexander Motin	7868ec506b	geom_slice.c and its consumers like GEOM_LABEL are not touching the data unless hotspots are used. Pass G_PF_ACCEPT_UNMAPPED flag through except such rare cases (obsolete GEOM_SUNLABEL and GEOM_BSD).	2013-03-26 07:55:24 +00:00
Alexander Motin	6c6e13b6e1	GEOM NOP does not touch the data, so pass G_PF_ACCEPT_UNMAPPED flag through.	2013-03-26 05:58:49 +00:00
Alexander Motin	a93c0ed463	Remove extra bio_data and bio_length copying to child request after calling g_clone_bio(), that already copied them.	2013-03-26 05:42:12 +00:00
Adrian Chadd	35bec3655e	Remove the mcast path calls to ath_hal_gettxdesclinkptr() for axq_link - they're no longer needed for the legacy path and they're not wanted for the EDMA path. Tested: * AR9280, hostap + CABQ * AR9380/AR9580, hostap + CABQ	2013-03-26 04:56:54 +00:00
Adrian Chadd	b708ea2941	Remove this dead code - it's no longer relevant (as yes, we do actually support TX on EDMA chips.)	2013-03-26 04:53:40 +00:00
Adrian Chadd	b6ef0f8ac8	Convert the CABQ queue code over to use the HAL link pointer method instead of axq_link. This (among a bunch of uncommitted work) is required for EDMA chips to correctly transmit frames on the CABQ. Tested: * AR9280, hostap mode * AR9380/AR9580, hostap mode (staggered beacons) TODO: * This code only really gets called when burst beacons are used; it glues multiple CABQ queues together when sending to the hardware. * More thorough bursted beacon testing! (first requires some work with the beacon queue code for bursted beacons, as that currently uses the link pointer and will fail on EDMA chips.)	2013-03-26 04:52:16 +00:00
Adrian Chadd	9e7259a2a3	Convert the EDMA multicast queue code over to use the HAL method to set the descriptor link pointer, rather than directly. This is needed on AR9380 and later (ie, EDMA) NICs so the multicast queue has a chance in hell of being put together right. Tested: * AR9380, AR9580 in hostap mode, CABQ traffic (but with other patches..)	2013-03-26 04:48:58 +00:00
Adrian Chadd	0891354cd2	Migrate the multicast queue assembly code to not use the axq_link pointer and instead use the HAL method to set the link pointer. Tested: * AR9280, hostap mode, CABQ frames being queued and transmitted	2013-03-26 04:47:40 +00:00
Alexander Kabaev	31932fae1e	Do not pass unmapped buffers to drivers that cannot handle them In physio, check if device can handle unmapped IO and pass an appropriately mapped buffer to the driver strategy routine. The only driver in the tree that can handle unmapped buffers is one exposed by GEOM, so mark it as such with the new flag in the driver cdevsw structure. This fixes insta-panics on hosts, running dconschat, as /dev/fwmem is an example of the driver that makes use of physio routine, but bypasses the g_down thread, where the buffer gets mapped normally. Discussed with: kib (earlier version)	2013-03-26 01:17:06 +00:00
Pedro F. Giffuni	5472787377	Dtrace: Add SUN MDB-like type-aware print() action. Merge change from illumos: 1694 Add type-aware print() action This is a very nice feature implemented in upstream Dtrace. A complete description is available here: http://dtrace.org/blogs/eschrock/2011/10/26/your-mdb-fell-into-my-dtrace/ This change bumps the DT_VERS_* number to 1.9.0 in accordance to what is done in illumos. While here also include some minor cleanups to ease further merging and appease clang with a fix by Fabian Keil. Illumos Revisions: 13501:c3a7090dbc16 13483:f413e6c5d297 Reference: https://www.illumos.org/issues/1560 https://www.illumos.org/issues/1694 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-03-25 20:38:09 +00:00
Pedro F. Giffuni	730cecb05a	Dtrace: add toupper()/tolower() and enhancements to lltostr(). Merge changes from illumos: 1451 DTrace needs toupper()/tolower() subroutines 1457 lltostr() D subroutine should take an optional base This change bumps the DT_VERS_* number to 1.8.1 in accordance to what is done in illumos. The test suite we currently include is outdated and doesnt support some updates in tst.subr.d which had to be left out for now. Illumos Revisions: r13458 5e394d8db762 r13459 c3454574dd1a Reference: https://www.illumos.org/issues/1451 https://www.illumos.org/issues/1457 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-03-25 15:40:57 +00:00
Alexander V. Chernikov	58e8e6e6bb	Unlock IPMI sc while performing requests via KCS and SMIC interfaces. It is already done in SSIF interface code. This reduces contention/spinning reported by many users. PR: kern/172166 Submitted by: Eric van Gyzen <eric at vangyzen.net> MFC after: 2 weeks	2013-03-25 14:30:34 +00:00
Alexander Motin	6a740c4a4f	Read Asynchronous Notification statuses only if Port Multiplier or ATAPI device are connected. ATA disks are not using ANs, while the extra register read operation is quite expensive.	2013-03-25 13:58:17 +00:00
Davide Italiano	3f321a4eac	Cache the callout precision argument as part of the informations required for migrating callouts to new CPU. This value is passed to callout_cc_add() in order to update properly precision field in case of rescheduling/migration. Reviewed by: mav	2013-03-25 09:43:50 +00:00
Alexander Motin	3d44989055	Depending on combination of running commands (NCQ/non-NCQ) try to avoid extra read from PxCI/PxSACT registers. If only NCQ commands are running, we don't really need PxCI. If only non-NCQ commands are running we don't need PxSACT. Mixed set may happen only on controllers with FIS-based switching when port multiplier is attached, and then we have to read both registers. MFC after: 1 month	2013-03-25 08:50:51 +00:00
Andrey V. Elsukov	5b4661289d	When we are removing a specific set, call ipfw_expire_dyn_rules only once. Obtained from: Yandex LLC MFC after: 1 week	2013-03-25 07:43:46 +00:00
Alexander Motin	f4673017b3	Make GEOM MULTIPATH to report unmapped bio support if underling path report it. GEOM MULTIPATH itself never touches the data and so transparent.	2013-03-25 07:24:58 +00:00
Alexander Motin	6d14d0d010	Remove two bzero()s that are erasing only few more bytes then set later.	2013-03-25 06:31:17 +00:00
Alexander Motin	30ba747160	In GEOM DISK: - Replace single done mutex with per-disk ones. On system with several disks on several HBAs that removes small, but measurable lock congestion. - Modify disk destruction process to not destroy the mutex prematurely. - Remove some extra pointer derefences.	2013-03-25 05:45:24 +00:00
Pedro F. Giffuni	f2e66d30b8	Dtrace: add optional size argument to tracemem(). Merge change from illumos: 1455 DTrace tracemem() should take an optional size argument Our local enhancements to dt_print_bytes were equivalent to those in illumos but we made it match the illumos version to ease further code merges. For now leave out tst.smallsize.d and tst.smallsize.d.out since those don't seem to work cleanly on FreeBSD. This change bumps the DT_VERS_* number to 1.7.1 in accordance to what is done in illumos. Illumos Revision: 13457:571b0355c2e3 Reference: https://www.illumos.org/issues/1455 Tested by: Fabian Keil Obtained from: Illumos MFC after: 1 month	2013-03-24 19:12:08 +00:00
Ian Lepore	a350e54067	Set the backlink in mmc commands to the mmc request that contains them.	2013-03-24 17:23:10 +00:00
Alexander Motin	db12db318d	No need to erase all 64 bytes of CFIS area if we never use more then 16.	2013-03-24 16:51:21 +00:00
Alan Cox	652615dcb7	Micro-optimize the control flow in a few places. Eliminate a panic call that could never be reached in vm_radix_insert(). (If the pointer being checked by the panic call were ever NULL, the immmediately preceding loop would have already crashed on a NULL pointer dereference.) Reviewed by: attilio (an earlier version) Sponsored by: EMC / Isilon Storage Division	2013-03-24 16:43:07 +00:00
Alexander Motin	3c330aff3f	Fix long known deadlock between geom dev destruction and d_close() call. Use destroy_dev_sched_cb() to not wait for device destruction while holding GEOM topology lock (that actually caused deadlock). Use request counting protected by mutex to properly wait for outstanding requests completion in cases of device closing and geom destruction. Unlike r227009, this code does not block taskqueue thread for indefinite time, waiting for completion.	2013-03-24 10:14:25 +00:00
Adrian Chadd	1f6b3ed63c	Add new regulatory domain. Obtained from: Qualcomm Atheros	2013-03-24 04:42:56 +00:00
Adrian Chadd	56a859789f	Move the TXQ lock earlier in this routine - so to correctly protect the link pointer check.	2013-03-24 04:09:54 +00:00
Adrian Chadd	0acf45ed86	Fix the locking changes due to the TXQ change drive-by. Tested: * AR9580, STA mode	2013-03-24 04:09:29 +00:00
Alexander Motin	50199fa0d0	Make g_wither_washer() to not loop by itself, but only when there was some more topology change done that may require its attention. Add few missing g_do_wither() calls in respective places to signal it. This fixes potential infinite loop here when some provider is withered, but still opened or connected for some reason and so can not be destroyed. For example, see r227009 and r227510.	2013-03-24 03:15:20 +00:00
Adrian Chadd	b837332d0a	Overhaul the TXQ locking (again!) as part of some beacon/cabq timing related issues. Moving the TX locking under one lock made things easier to progress on but it had one important side-effect - it increased the latency when handling CABQ setup when sending beacons. This commit introduces a bunch of new changes and a few unrelated changs that are just easier to lump in here. The aim is to have the CABQ locking separate from other locking. The CABQ transmit path in the beacon process thus doesn't have to grab the general TX lock, reducing lock contention/latency and making it more likely that we'll make the beacon TX timing. The second half of this commit is the CABQ related setup changes needed for sane looking EDMA CABQ support. Right now the EDMA TX code naively assumes that only one frame (MPDU or A-MPDU) is being pushed into each FIFO slot. For the CABQ this isn't true - a whole list of frames is being pushed in - and thus CABQ handling breaks very quickly. The aim here is to setup the CABQ list and then push _that list_ to the hardware for transmission. I can then extend the EDMA TX code to stamp that list as being "one" FIFO entry (likely by tagging the last buffer in that list as "FIFO END") so the EDMA TX completion code correctly tracks things. Major: * Migrate the per-TXQ add/removal locking back to per-TXQ, rather than a single lock. * Leave the software queue side of things under the ATH_TX_LOCK lock, (continuing) to serialise things as they are. * Add a new function which is called whenever there's a beacon miss, to print out some debugging. This is primarily designed to help me figure out if the beacon miss events are due to a noisy environment, issues with the PHY/MAC, or other. * Move the CABQ setup/enable to occur _after_ all the VAPs have been looked at. This means that for multiple VAPS in bursted mode, the CABQ gets primed once all VAPs are checked, rather than being primed on the first VAP and then having frames appended after this. Minor: * Add a (disabled) twiddle to let me enable/disable cabq traffic. It's primarily there to let me easily debug what's going on with beacon and CABQ setup/traffic; there's some DMA engine hangs which I'm finally trying to trace down. * Clear bf_next when flushing frames; it should quieten some warnings that show up when a node goes away. Tested: * AR9280, STA/hostap, up to 4 vaps (staggered) * AR5416, STA/hostap, up to 4 vaps (staggered) TODO: * (Lots) more AR9380 and later testing, as I may have missed something here. * Leverage this to fix CABQ hanling for AR9380 and later chips. * Force bursted beaconing on the chips that default to staggered beacons and ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being correctly set when chaining descriptors.)	2013-03-24 00:03:12 +00:00
Adrian Chadd	49ddabc4bd	CABQ calculation changes to try and fix some weird corner cases leading to stuck beacons. * Set the cabq readytime (ie, how long to burst for) to 50% of the total beacon interval time * fix the cabq adjustment calculation based on how the beacon offset is calculated (the SWBA/DBA time offset.) This is all still a bit magic voodoo but it does seem to have further quietened issues with missed/stuck beacons under my local testing. In any case, it better matches what the reference HAL implements. Obtained from: Qualcomm Atheros	2013-03-23 23:51:11 +00:00
Konstantin Belousov	b11d58b63f	Do not call malloc(M_WAITOK) while bodev->fence_lock mutex is held. The ttm_buffer_object_transfer() does not need the mutex locked at all, except for the call to the driver sync_obj_ref() method. Reported and tested by: dumbbell MFC after: 2 weeks	2013-03-23 22:23:15 +00:00
Jean-Sébastien Pédron	accadf8de2	drm/ttm: Fix a typo: s/pTTM]/[TTM]/	2013-03-23 20:46:47 +00:00
Jean-Sébastien Pédron	76c40c6986	drm/ttm: Explain why we don't need to acquire a ref in ttm_bo_vm_ctor()	2013-03-23 20:43:26 +00:00
Martin Matuska	7608b757d7	Fix kernel build with options ZFS after r24571 (libzfs_core). Submitted by: Bjoern A. Zeeb <bz@FreeBSD.org>	2013-03-23 20:01:45 +00:00
Jean-Sébastien Pédron	a649986089	drm/ttm: Fix TTM buffer object refcount This fixes memory leaks in the radeonkms driver. Reviewed by: Konstantin Belousov (kib@) Tested by: J.R. Oldroyd <jr@opal.com>	2013-03-23 19:19:19 +00:00
Ian Lepore	49addc5755	Don't check and warn about pmap mismatch on every call to busdma sync. With some recent busdma refactoring, sometimes it happens that a sync op gets called when bus_dmamap_load() never got called, which results in a spurious warning about a map mismatch when no sync operations will actually happen anyway. Now the check is done only if a sync operation is actually performed, and the result of the check is a panic, not just a printf. Reviewed by: cognet (who prevented me from donning a point hat)	2013-03-23 17:17:06 +00:00
Will Andrews	ef04b888d2	Be more explicit about what each bio_cmd & bio_flags value means. Reviewed by: ken (mentor)	2013-03-23 16:55:07 +00:00
Will Andrews	58567a1b4e	ZFS: Fix a panic while unmounting a busy filesystem. This particular scenario was easily reproduced using a NFS export. When the first 'zfs unmount' occurred, it returned EBUSY via this path, while vflush() had flushed references on the filesystem's root vnode, which in turn caused its v_interlock to be destroyed. The next time 'zfs unmount' was called, vflush() tried to obtain this lock, which caused this panic. Since vflush() on FreeBSD is a definitive call, there is no need to check vfsp->vfs_count after it completes. Simply #ifdef sun this check. Submitted by: avg Reviewed by: avg Approved by: ken (mentor) MFC after: 1 month	2013-03-23 16:34:56 +00:00
Will Andrews	fdbc71742b	Extend taskqueue(9) to enable per-taskqueue callbacks. The scope of these callbacks is primarily to support actions that affect the taskqueue's thread environments. They are entirely optional, and consequently are introduced as a new API: taskqueue_set_callback(). This interface allows the caller to specify that a taskqueue requires a callback and optional context pointer for a given callback type. The callback types included in this commit can be used to register a constructor and destructor for thread-local storage using osd(9). This allows a particular taskqueue to define that its threads require a specific type of TLS, without the need for a specially-orchestrated task-based mechanism for startup and shutdown in order to accomplish it. Two callback types are supported at this point: - TASKQUEUE_CALLBACK_TYPE_INIT, called by every thread when it starts, prior to processing any tasks. - TASKQUEUE_CALLBACK_TYPE_SHUTDOWN, called by every thread when it exits, after it has processed its last task but before the taskqueue is reclaimed. While I'm here: - Add two new macros, TQ_ASSERT_LOCKED and TQ_ASSERT_UNLOCKED, and use them in appropriate locations. - Fix taskqueue.9 to mention taskqueue_start_threads(), which is a required interface for all consumers of taskqueue(9). Reviewed by: kib (all), eadler (taskqueue.9), brd (taskqueue.9) Approved by: ken (mentor) Sponsored by: Spectra Logic MFC after: 1 month	2013-03-23 15:11:53 +00:00
Andriy Gapon	ca84e042a3	post mountroot event after a real/final root is mounted not every time an intermediate root (including the first devfs) is mounted. This is also consistent with waking up via root_mount_complete. Reviewed by: jhb MFC after: 13 days	2013-03-23 08:59:34 +00:00
Andriy Gapon	aaf2546b67	fbt_getargdesc: correctly handle types for return probes MFC after: 6 days	2013-03-23 08:52:50 +00:00
Andriy Gapon	a47016e9a9	fbt_typoff_init: fix an off by one in determining required memory size This issue would be silent most of the time, but if the requested memory is a multiple of a page size, then accessing one element beyond the end would lead to a kernel page fault. Otherwise, the unlucky last type would just be inaccessible. Reported by: glebius Tested by: glebius MFC after: 6 days	2013-03-23 08:48:44 +00:00
Xin LI	843b298e62	Don't attempt to reference sc before testing whether it's NULL. Submitted by: Sascha Wildner Obtained from: DragonFly MFC after: 2 weeks	2013-03-22 22:46:19 +00:00
Kirk McKusick	baa12a84a7	The purpose of this change to the FFS layout policy is to reduce the running time for a full fsck. It also reduces the random access time for large files and speeds the traversal time for directory tree walks. The key idea is to reserve a small area in each cylinder group immediately following the inode blocks for the use of metadata, specifically indirect blocks and directory contents. The new policy is to preferentially place metadata in the metadata area and everything else in the blocks that follow the metadata area. The size of this area can be set when creating a filesystem using newfs(8) or changed in an existing filesystem using tunefs(8). Both utilities use the `-k held-for-metadata-blocks' option to specify the amount of space to be held for metadata blocks in each cylinder group. By default, newfs(8) sets this area to half of minfree (typically 4% of the data area). This work was inspired by a paper presented at Usenix's FAST '13: www.usenix.org/conference/fast13/ffsck-fast-file-system-checker Details of this implementation appears in the April 2013 of ;login: www.usenix.org/publications/login/april-2013-volume-38-number-2. A copy of the April 2013 ;login: paper can also be downloaded from: www.mckusick.com/publications/faster_fsck.pdf. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks	2013-03-22 21:45:28 +00:00
Gleb Smirnoff	209dddb90e	Remove __FreeBSD_version ifdefs.	2013-03-22 20:44:16 +00:00
Pawel Jakub Dawidek	051a23d4e8	- Constify local path variable for chflagsat(). - Use correct format characters (%lx) for u_long. This fixes the build broken in r248599.	2013-03-22 07:40:34 +00:00
Kevin Lo	b3dcd51dde	Clean up some unused leftover code. Pointed out by: ae	2013-03-22 01:45:54 +00:00
Kevin Lo	dda95c6e59	Remove unused global variables. Reviewed by: ae, glebius	2013-03-22 01:40:17 +00:00
Steven Hartland	def84b9736	Fix for building libzpool under i386. Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 23:06:11 +00:00
Pawel Jakub Dawidek	5d46382415	Regenerate after r248599. Sponsored by: The FreeBSD Foundation	2013-03-21 23:02:19 +00:00
Pawel Jakub Dawidek	e948704e4b	Implement chflagsat(2) system call, similar to fchmodat(2), but operates on file flags. Reviewed by: kib, jilles Sponsored by: The FreeBSD Foundation	2013-03-21 22:59:01 +00:00
Pawel Jakub Dawidek	14cd1ffdf8	Regenerate after r248597. Sponsored by: The FreeBSD Foundation	2013-03-21 22:47:03 +00:00
Pawel Jakub Dawidek	b4b2596b97	- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type u_long. Before this change it was of type int for syscalls, but prototypes in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not for lchflags(2)) stated that it was u_long. Now some related functions use u_long type for flags (strtofflags(3), fflagstostr(3)). - Make path argument of type 'const char *' for consistency. Discussed on: arch Sponsored by: The FreeBSD Foundation	2013-03-21 22:44:33 +00:00
Konstantin Belousov	e808788c05	Correct the page count when excess length is trimmed from the bio. Reported and tested by: Ivan Klymenko <fidaj@ukr.net	2013-03-21 22:36:43 +00:00
Jilles Tjoelker	46f10cc265	Allow O_CLOEXEC in posix_openpt() flags. PR: kern/162374 Reviewed by: ed	2013-03-21 21:39:15 +00:00
Attilio Rao	d52d7aa871	Fix a bug in UMTX_PROFILING: UMTX_PROFILING should really analyze the distribution of locks as they index entries in the umtxq_chains hash-table. However, the current implementation does add/dec the length counters for every thread insert/removal, measuring at all really userland contention and not the hash distribution. Fix this by correctly add/dec the length counters in the points where it is really needed. Please note that this bug brought us questioning in the past the quality of the umtx hash table distribution. To date with all the benchmarks I could try I was not able to reproduce any issue about the hash distribution on umtx. Sponsored by: EMC / Isilon storage division Reviewed by: jeff, davide MFC after: 2 weeks	2013-03-21 19:58:25 +00:00
Alexander Motin	359b47db97	Minimal timer period of 100us introduced in r244758 is overkill. While original 2us are indeed not enough, 3us are working quite well on my tests. To be more safe set minimal period to 5us and to be even more safe replicate here from HPET mechanism of rereading counter after programming comparator. This change allows to handle 30K of short nanosleep() calls per second on Raspberry Pi instead of just 8K before. Discussed with: gonzo	2013-03-21 15:42:41 +00:00
John Baldwin	d071a6fa33	Another NFS SIGSTOP related fix: Ignore thread suspend requests due to SIGSTOP if stop signals are currently deferred. This can occur if a process is stopped via SIGSTOP while a thread is running or runnable but before it has set TDF_SBDRY. Tested by: pho Reviewed by: kib MFC after: 1 week	2013-03-21 14:06:27 +00:00
Konstantin Belousov	c46262f810	Fix twa(4) after the r246713. The driver copies data around to satisfy some alignment restrictions. Do not set TW_OSLI_REQ_FLAGS_CCB flag for mapped data, pass the csio->data_ptr in the req->data. Do not put the ccb pointer into req->data ever, ccb is stored in req->orig_req already. Submitted by: Shuichi KITAGUCHI <ki@hh.iij4u.or.jp> PR: kern/177020	2013-03-21 13:06:28 +00:00
Konstantin Belousov	4d569af96c	Initialize the variable to avoid (false) compiler warning about use of an uninitialized local. Reported by: Ivan Klymenko <fidaj@ukr.net> MFC after: 2 weeks	2013-03-21 12:59:24 +00:00
Steven Hartland	2b114ad2a4	Add missing descriptions for ZFS sysctls Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 11:25:21 +00:00
Steven Hartland	adea827b21	Optimisation of TRIM processing. Previously TRIM processing was very bursty. This was made worse by the fact that TRIM requests on SSD's are typically much slower than reads or writes. This often resulted in stalls while large numbers of TRIM's where processed. In addition due to the way the TRIM thread was only woken by writes, deletes could stall in the queue for extensive periods of time. This patch adds a number of controls to how often the TRIM thread for each SPA processes its outstanding delete requests. vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32) vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing (seconds) Given the most common TRIM implementation is ATA TRIM the current defaults are targeted at that. Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 11:02:08 +00:00
Steven Hartland	6ad46cec23	Names the ZFS TRIM thread Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks	2013-03-21 10:41:30 +00:00
Steven Hartland	89e5b43079	TRIM cache devices based on time instead of TXGs. Currently, the trim module uses the same algorithm for data and cache devices when deciding to issue TRIM requests, based on how far in the past the TXG is. Unfortunately, this is not ideal for cache devices, because the L2ARC doesn't use the concept of TXGs at all. In fact, when using a pool for reading only, the L2ARC is written but the TXG counter doesn't increase, and so no new TRIM requests are issued to the cache device. This patch fixes the issue by using time instead of the TXG number as the criteria for trimming on cache devices. The basic delay principle stays the same, but parameters are expressed in seconds instead of TXGs. The new parameters are named trim_l2arc_limit and trim_l2arc_batch, and both default to 30 second. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `17122c31ac` MFC after: 2 weeks	2013-03-21 10:29:05 +00:00
Steven Hartland	78ad0c1c80	Improve TXG handling in the TRIM module. This patch adds some improvements to the way the trim module considers TXGs: - Free ZIOs are registered with the TXG from the ZIO itself, not the current SPA syncing TXG (which may be out of date); - L2ARC are registered with a zero TXG number, as L2ARC has no concept of TXGs; - The TXG limit for issuing TRIMs is now computed from the last synced TXG, not the currently syncing TXG. Indeed, under extremely unlikely race conditions, there is a risk we could trim blocks which have been freed in a TXG that has not finished syncing, resulting in potential data corruption in case of a crash. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `5b46ad40d9` MFC after: 2 weeks	2013-03-21 10:16:10 +00:00
Steven Hartland	e07e3a3792	Don't register repair writes in the trim map. The trim map inflight writes tree assumes non-conflicting writes, i.e. that there will never be two simultaneous write I/Os to the same range on the same vdev. This seemed like a sane assumption; however, in actual testing, it appears that repair I/Os can very well conflict with "normal" writes. I'm not quite sure if these conflicting writes are supposed to happen or not, but in the mean time, let's ignore repair writes for now. This should be safe considering that, by definition, we never repair blocks that are freed. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: Source: `6a3cebaf7c`	2013-03-21 10:02:32 +00:00
Steven Hartland	e05aad2d33	Add TRIM support for L2ARC. This adds TRIM support to cache vdevs. When ARC buffers are removed from the L2ARC in arc_hdr_destroy(), arc_release() or l2arc_evict(), the size previously occupied by the buffer gets scheduled for TRIMming. As always, actual TRIMs are only issued to the L2ARC after txg_trim_limit. Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: `31aae37399` MFC after: 2 weeks	2013-03-21 09:34:41 +00:00
Martin Matuska	05f49d92ef	Merge libzfs_core branch: includes MFV 238590, 238592, 247580 MFV 238590, 238592: In the first zfs ioctl restructuring phase, the libzfs_core library was introduced. It is a new thin library that wraps around kernel ioctl's. The idea is to provide a forward-compatible way of dealing with new features. Arguments are passed in nvlists and not random zfs_cmd fields, new-style ioctls are logged to pool history using a new method of history logging. http://blog.delphix.com/matt/2012/01/17/the-future-of-libzfs/ MFV 247580 [1]: To address issues of several deadlocks and race conditions the locking code around dsl_dataset was rewritten and the interface to synctasks was changed. User-Visible Changes: "zfs snapshot" can create more arbitrary snapshots at once (atomically) "zfs destroy" destroys multiple snapshots at once "zfs recv" has improved performance Backward Compatibility: I have extended the compatibility layer to support full backward compatibility by remapping or rewriting the responsible ioctl arguments. Old utilities are fully supported by the new kernel module. Forward Compatibility: New utilities work with old kernels with the following restrictions: - creating, destroying, holding and releasing of multiple snapshots at once is not supported, this includes recursive (-r) commands Illumos ZFS issues: 2882 implement libzfs_core 2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once 3464 zfs synctask code needs restructuring References: https://www.illumos.org/issues/2882 https://www.illumos.org/issues/2900 https://www.illumos.org/issues/3464 [1] MFC after: 1 month Sponsored by: Hybrid Logic Inc. [1]	2013-03-21 08:38:03 +00:00
Gleb Smirnoff	5aedfa32a4	Add NGM_NAT_LIBALIAS_INFO command, that reports internal stats of libalias instance. To be used in the mpd5 daemon. Submitted by: Dmitry Luhtionov <dmitryluhtionov gmail.com>	2013-03-21 08:36:15 +00:00
Konstantin Belousov	7db07e1c85	Only size and create the bio_transient_map when unmapped buffers are enabled. Now, disabling the unmapped buffers should result in the kernel memory map identical to pre-r248550. Sponsored by: The FreeBSD Foundation	2013-03-21 07:28:15 +00:00
Konstantin Belousov	6c83fce371	Assert that transient mapping of the bio is only done when unmapped buffers are allowed. Sponsored by: The FreeBSD Foundation	2013-03-21 07:26:33 +00:00
Konstantin Belousov	7157d8f7ab	Do not call vnode_pager_setsize() while a NFS node mutex is locked. vnode_pager_setsize() might sleep waiting for the page after EOF be unbusied. Call vnode_pager_setsize() both for the regular and directory vnodes. Reported by: mich Reviewed by: rmacklem Discussed with: avg, jhb MFC after: 2 weeks	2013-03-21 07:25:08 +00:00
Hans Petter Selasky	3232aae327	Add new USB ID. PR: usb/177173 MFC after: 1 week	2013-03-21 07:04:17 +00:00
Konstantin Belousov	e3269b5096	In bufwrite(), a dirty buffer is moved to the clean queue before the bufobj counter of the writes in progress is incremented. Other thread inspecting the bufobj would consider it clean. For the regular vnodes, the vnode lock is typically held both by the thread performing the bufwrite() and an other thread doing syncing, which prevents the situation. On the other hand, writes to the VCHR vnodes are done without holding vnode lock. Increment the write ref counter for the buffer object before calling bundirty(). Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks	2013-03-20 21:08:00 +00:00
Konstantin Belousov	8d6884ce9c	When the journaled FFS volume is suspended due to the journal space becoming too low, the softdep flush thread processes the workitems, which frees the space in journal, and then unsuspends the fs. The softdep_flush() and other workitem processing functions busy the filesystem before iterating over the worklist, to prevent the parallel unmount from freeing the mount data. The vfs_busy() is called with MBF_NOWAIT flag. Now, if the unmount is already started and the filesystem is suspended due to low journal space, the journal is never flushed and filesystem is never unsuspended, because vfs_busy(MBF_NOWAIT) call cannot succeed for the unmounting fs, and softdep_flush() does not process the workitems. Unmount needs to write metadata, where it hangs in the "suspfs" state. Move the vn_start_write() call in the dounmount() before setting the MNTK_UNMOUNT flag. This practically ensures that softdep_flush() processed the pending journal writes by making dounmount() wait for the lift of the suspension. Sponsored by: The FreeBSD Foundation Reported and tested by: pho MFC after: 2 weeks	2013-03-20 21:07:49 +00:00
Kirk McKusick	3289d5877a	When renaming a directory from one parent directory to another, we need to call ufs_checkpath() to walk from our new location to the root of the filesystem to ensure that we do not encounter ourselves along the way. Until now, we accomplished this by reading the ".." entries of each directory in our path until we reached the root (or encountered an error). This change tries to avoid the I/O of reading the ".." entries by first looking them up in the name cache and only doing the I/O when the name cache lookup fails. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks	2013-03-20 17:57:00 +00:00
Aleksandr Rybalko	a2c472e741	Integrate Efika MX project back to home. Sponsored by: The FreeBSD Foundation	2013-03-20 15:39:27 +00:00
Hans Petter Selasky	76be9c89ba	Fix spelling.	2013-03-20 11:51:26 +00:00
Alexander V. Chernikov	ae01d73c04	Add ipfw support for setting/matching DiffServ codepoints (DSCP). Setting DSCP support is done via O_SETDSCP which works for both IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4. Dscp can be specified by name (AFXY, CSX, BE, EF), by value (0..63) or via tablearg. Matching DSCP is done via another opcode (O_DSCP) which accepts several classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words). Many people made their variants of this patch, the ones I'm aware of are (in alphabetic order): Dmitrii Tejblum Marcelo Araujo Roman Bogorodskiy (novel) Sergey Matveichuk (sem) Sergey Ryabin PR: kern/102471, kern/121122 MFC after: 2 weeks	2013-03-20 10:35:33 +00:00
Martin Matuska	192d547574	Release hold on pool before calling zvol_create_minor()	2013-03-20 09:56:20 +00:00
Konstantin Belousov	6991ee13a6	Fix the logic inversion in the r248512. Noted by: mckay	2013-03-20 09:44:23 +00:00
Adrian Chadd	9cda8c8082	Fix the EDMA CABQ handling - for now, the CABQ takes a descriptor chain like the legacy chips expect.	2013-03-20 05:44:03 +00:00
Pyun YongHyeon	cf402cc979	For RTL8211B or later PHYs, enable crossover detection and auto-correction. This change makes re(4) establish a link with a system using non-crossover UTP cable. Tested by: Michael BlackHeart < amdmiek <> gmail dot com >	2013-03-20 05:31:34 +00:00
Adrian Chadd	bd8cbcc32c	Add VNET wrappers around the rest of the ieee80211 rtsock messages. I triggered the cac/radar messages when doing testing in DFS channels.	2013-03-20 02:42:52 +00:00
Martin Matuska	a0abc0d302	Run zvol_create_minors() only if in non-error case	2013-03-19 22:27:15 +00:00
Martin Matuska	e56718d734	Run zvol_create_minors() on snapshot creation	2013-03-19 22:14:50 +00:00
Jilles Tjoelker	c2e3c52e0d	Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC. This change allows creating file descriptors with close-on-exec set in some situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file descriptors (SCM_RIGHTS) atomically close-on-exec. The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD. MSG_CMSG_CLOEXEC is the first free bit for MSG_. The SOCK_ flags are not passed to MAC because this may cause incorrect failures and can be done later via fcntl() anyway. On the other hand, audit is expected to cope with the new flags. For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags argument. Reviewed by: kib	2013-03-19 20:58:17 +00:00
Adrian Chadd	f0db652cf6	Break out the RX completion path into "FIFO check / refill" and "complete RX frames." The 128 entry RX FIFO is really easy to fill up and miss refilling when it's done in the ath taskq - as that gets blocked up doing RX completion, TX completion and other random things. So the 128 entry RX FIFO now gets emptied and refilled in the ath_intr() task (and it grabs / releases locks, so now ath_intr() can't just be a FAST handler yet!) but the locks aren't held for very long. The completion part is done in the ath taskqueue context. Details: * Create a new completed frame list - sc->sc_rx_rxlist; * Split the EDMA RX process queue into two halves - one that processes the RX FIFO and refills it with new frames; another that completes the completed frame list; * When tearing down the driver, flush whatever is in the deferred queue as well as what's in the FIFO; * Create two new RX methods - one that processes all RX queues, one that processes the given RX queue. When MSI is implemented, we get told which RX queue the interrupt came in on so we can specifically schedule that. (And I can do that with the non-MSI path too; I'll figure that out later.) * Convert the legacy code over to use these new RX methods; * Replace all the instances of the RX taskqueue enqueue with a call to a relevant RX method to enqueue one or all RX queues. Tested: * AR9380, STA * AR9580, STA * AR5413, STA	2013-03-19 19:32:28 +00:00
Adrian Chadd	74ea88c379	Add more TODO items.	2013-03-19 17:55:36 +00:00
Adrian Chadd	378a752f59	Now that the tx map field is correctly populated for both edma and legacy chips, just use that.	2013-03-19 17:54:37 +00:00
Konstantin Belousov	129c6621f7	ahci(4) and siis(4) are ready to process the unmapped i/o requests Sponsored by: The FreeBSD Foundation Tested by: pho Submitted by: bf (siis patch)	2013-03-19 15:09:32 +00:00
Konstantin Belousov	59a01b70af	UFS support of the unmapped i/o for the user data buffers. Sponsored by: The FreeBSD Foundation Tested by: pho, scottl, jhb, bf	2013-03-19 15:08:15 +00:00
Konstantin Belousov	2649fcc1d8	Commit the removal of a whitespace to record the proper commit message for the r248519: For the cam-attached HBAs, allow the driver to specify that it accepts the unmapped bio by the PIM_UNMAPPED flag. The CAM passes the CAM_DATA_BIO data transfer type request for the unmapped bio, and the driver could use the bus_dmamap_load_ccb() as a helper to transparently handle the ccb. Sponsored by: The FreeBSD Foundation Reviewed by: scottl Tested by: pho, scottl	2013-03-19 15:05:21 +00:00
Konstantin Belousov	abc1e60e0e	Support unmapped i/o for the md(4). The vnode-backed md(4) has to map the unmapped bio because VOP_READ() and VOP_WRITE() interfaces do not allow to pass unmapped requests to the filesystem. Vnode-backed md(4) uses pbufs instead of relying on the bio_transient_map, to avoid usual md deadlock. Sponsored by: The FreeBSD Foundation Tested by: pho, scottl	2013-03-19 15:01:50 +00:00
Konstantin Belousov	59ec9023ca	Support unmapped i/o for the md(4). The vnode-backed md(4) has to map the unmapped bio because VOP_READ() and VOP_WRITE() interfaces do not allow to pass unmapped requests to the filesystem. Vnode-backed md(4) uses pbufs instead of relying on the bio_transient_map, to avoid usual md deadlock. Sponsored by: The FreeBSD Foundation Tested by: pho, scottl	2013-03-19 14:53:23 +00:00
Konstantin Belousov	db7bfaa8ce	The geom_part provider supports unmapped bio iff the underlying provider does so, since geom_part never inspects the bio_data. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:50:24 +00:00
Konstantin Belousov	f8c19ba466	A flag for the geom disk driver to indicate that it accepts the unmapped i/o requests. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:49:15 +00:00
Konstantin Belousov	e81ff91e62	Do not remap usermode pages into KVA for physio. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:43:57 +00:00
Konstantin Belousov	2cc718a11c	Do not map the swap i/o pbufs if the geom provider for the swap partition accepts unmapped requests. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:39:27 +00:00
Konstantin Belousov	6ce697dc73	Pass unmapped buffers for page in requests if the filesystem indicated support for the unmapped i/o. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:36:28 +00:00
Konstantin Belousov	f8c09530bd	A flag for the filesystem to indicate to the upper levels that it accepts unmapped buffers for the VOP_STRATEGY(). Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:33:01 +00:00
Konstantin Belousov	7d5365c70b	Add a helper function vfs_bio_bzero_buf() to zero the portion of the buffer, transparently handling mapped or unmapped buffers. Its intent is to replace the use of bzero(bp->b_data) in cases where the buffer might be unmapped, to avoid unneeded upgrades. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 14:27:14 +00:00
Aleksandr Rybalko	5ac9d9890f	Return "start" and "end" to u_long world. Because rman handle addresses as u_long too. Discussed with: ian@ Pointy hat to: ray@	2013-03-19 14:15:41 +00:00
Konstantin Belousov	ee75e7de7b	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks	2013-03-19 14:13:12 +00:00
Konstantin Belousov	36a6d2ebc4	Add a convenience macro bread_gb() to wrap a call to breadn_flags(). Comparing with bread(), it adds an argument to pass the flags to getblk(). Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks	2013-03-19 13:21:39 +00:00
Aleksandr Rybalko	ee52bd5713	Cast "start" to u_long. Temporary fix to unbreak tinderbox. We need here max possible storage or dynamic, depend on size of address cell.	2013-03-19 13:13:26 +00:00
Konstantin Belousov	b4862fafd5	Assert that a ccb passed to cam_periph_mapmem() for XPT_SCSI_IO and XPT_ATA_IO holds virtual buffer address. Sponsored by: The FreeBSD Foundation Tested by: pho	2013-03-19 13:10:14 +00:00
Ed Maste	96ecfd9813	Fix remainder calculation when biosize is not a power of 2 In common configurations biosize is a power of two, but is not required to be so. Thanks to markj@ for spotting an additional case beyond my original patch. Reviewed by: rmacklem@	2013-03-19 13:06:11 +00:00
Hans Petter Selasky	565d8205f3	Add new USB ID. PR: usb/177105 MFC after: 1 week	2013-03-19 12:52:13 +00:00
Martin Matuska	07091d8f14	MFV r247580: Merge synctask code restructuring from vendor. Modify forward and backward compatibility to support new change. Illumos ZFS issues: 3464 zfs synctask code needs restructuring Sponsored by: Hybrid Logic Ltd.	2013-03-19 12:51:18 +00:00
Martin Matuska	520268fb97	MFC @248493	2013-03-19 11:09:15 +00:00
Martin Matuska	87a5cb4650	Plug memory leak in dsl_check_snap_cb() This was unnoticed because the function is very rarely used. MFC after: 3 days	2013-03-19 07:47:51 +00:00
Andrey V. Elsukov	93bb4f9ed5	Separate the locking macros that are used in the packet flow path from others. This helps easy switch to use pfil(4) lock.	2013-03-19 06:04:17 +00:00
Andrey V. Elsukov	5474386bd3	Fix style and comments.	2013-03-19 05:51:47 +00:00
Justin Hibbits	ec86c487a6	Fix the powerpc64 build. MACHINE_CPUARCH is common for powerpc/powerpc64, not MACHINE_ARCH.	2013-03-19 00:39:02 +00:00
Aleksandr Rybalko	36581e4785	Don't hesitate to ask parent to setup IRQ finally. Sponsored by: The FreeBSD Foundation	2013-03-18 23:51:39 +00:00
Aleksandr Rybalko	bf9d6206b0	Allow simplebus to attach to another simplebus. Sponsored by: The FreeBSD Foundation	2013-03-18 23:41:19 +00:00
Aleksandr Rybalko	089dfb09f1	Hide "no default resources for" warning under bootverbose. It's ok to use optional resources. Sponsored by: The FreeBSD Foundation	2013-03-18 23:38:15 +00:00
Aleksandr Rybalko	2737a5a925	Allow simplebus to attach in less strict way, when "simple-bus" listed on not first position of compatible property, so simplebus driver can be generic driver for any bus listed as compatible with "simple-bus". Sponsored by: The FreeBSD Foundation	2013-03-18 23:35:01 +00:00
Jung-uk Kim	78260bb30a	List TrackPoint device before generic model.	2013-03-18 23:31:22 +00:00
Jung-uk Kim	569d8f7e27	Add preliminary support for IBM/Lenovo TrackPoint. PR: kern/147237 (based on the initial patch for 8.x) Tested by: glebius (device detection and suspend/resume) MFC after: 1 month	2013-03-18 23:22:47 +00:00
Martin Matuska	a602517b63	Add missing zvol_create_mirrors() on zfs_ioc_create()	2013-03-18 20:22:40 +00:00
Ryan Stone	3aff0961dd	Correct the definition for Exar XR17V258IV: we must use a config_function to specify the offset into the PCI memory spare at which each serial port will find its registers. This was already done for other Exar PCI serial devices; it was accidentally omitted for this specific device. Sponsored by: Sandvine Incorporated MFC after: 1 week	2013-03-18 19:22:51 +00:00
John Baldwin	1968f37bc9	Tweak some comments.	2013-03-18 18:04:09 +00:00
John Baldwin	3cf3b9f097	Partially revert r195702. Deferring stops is now implemented via a set of calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls. - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly. - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.	2013-03-18 17:23:58 +00:00
Aleksandr Rybalko	4117c1db9e	o Switch to use physical addresses in rman for FDT. o Remove vtophys used to translate virtual address to physical in case rman carry virtual. Sponsored by: The FreeBSD Foundation	2013-03-18 15:18:55 +00:00
Martin Matuska	876a84e867	MFC @248461	2013-03-18 09:39:51 +00:00
Martin Matuska	6f4accc2de	Move common zfs ioctl compatibility functions (userland) into libzfs_compat.c Introduce additional constants for zfs ioctl versions	2013-03-18 09:32:29 +00:00
Hans Petter Selasky	bd247e9ddd	Add new USB ID. PR: usb/177013 MFC after: 1 week	2013-03-18 07:02:58 +00:00
Justin Hibbits	80a5635c8b	Add FBT for PowerPC DTrace. Also, clean up the DTrace assembly code, much of which is not necessary for PowerPC. The FBT module can likely be factored into 3 separate files: common, intel, and powerpc, rather than duplicating most of the code between the x86 and PowerPC flavors. All DTrace modules for PowerPC will be MFC'd together once Fasttrap is completed.	2013-03-18 05:30:18 +00:00
Pyun YongHyeon	e8bedbd24a	r119712 introduced SIS_TYPE_83816 but it was not actually set in driver such that checking against the type was always false. To detect NS DP83816, driver should have checked silicon revision register for NS controllers. While here, remove SIS_TYPE_83816 to not make the similar mistake again. Reported by: Brad Smith ( brad@openbsd )	2013-03-18 04:46:17 +00:00
Adrian Chadd	1ab002f461	Print out the current fifo queue depth correctly - not just the max queue depth. Silly hat to me.	2013-03-18 02:29:57 +00:00
Adrian Chadd	eefc93a947	Dump out information about the RX descriptor free list and FIFO information.	2013-03-18 01:12:36 +00:00
Adrian Chadd	d50e882ab9	Log some more information when the RX buffer allocation failed.	2013-03-18 01:11:52 +00:00
Attilio Rao	774d251d99	Sync back vmcontention branch into HEAD: Replace the per-object resident and cached pages splay tree with a path-compressed multi-digit radix trie. Along with this, switch also the x86-specific handling of idle page tables to using the radix trie. This change is supposed to do the following: - Allowing the acquisition of read locking for lookup operations of the resident/cached pages collections as the per-vm_page_t splay iterators are now removed. - Increase the scalability of the operations on the page collections. The radix trie does rely on the consumers locking to ensure atomicity of its operations. In order to avoid deadlocks the bisection nodes are pre-allocated in the UMA zone. This can be done safely because the algorithm needs at maximum one new node per insert which means the maximum number of the desired nodes is the number of available physical frames themselves. However, not all the times a new bisection node is really needed. The radix trie implements path-compression because UFS indirect blocks can lead to several objects with a very sparse trie, increasing the number of levels to usually scan. It also helps in the nodes pre-fetching by introducing the single node per-insert property. This code is not generalized (yet) because of the possible loss of performance by having much of the sizes in play configurable. However, efforts to make this code more general and then reusable in further different consumers might be really done. The only KPI change is the removal of the function vm_page_splay() which is now reaped. The only KBI change, instead, is the removal of the left/right iterators from struct vm_page, which are now reaped. Further technical notes broken into mealpieces can be retrieved from the svn branch: http://svn.freebsd.org/base/user/attilio/vmcontention/ Sponsored by: EMC / Isilon storage division In collaboration with: alc, jeff Tested by: flo, pho, jhb, davide Tested by: ian (arm) Tested by: andreast (powerpc)	2013-03-18 00:25:02 +00:00
Martin Matuska	af2e40ccd1	Merge libzfs_core part of r239388 Illumos ZFS issues: 3085 zfs diff panics, then panics in a loop on booting References: https://www.illumos.org/issues/3085	2013-03-17 18:49:11 +00:00
Martin Matuska	70b0720877	Fix accidentially changed ioc variable for old v15 compatibility	2013-03-17 17:28:06 +00:00

... 3 4 5 6 7 ...

92838 Commits