freebsd-skq

Author	SHA1	Message	Date
markj	6eda1f0daf	Remove the kld lock macros and just use the sx(9) API. Add locking in linker_init_kernel_modules() and linker_preload() in order to remove most of the checks for !cold before asserting that the kld lock is held. These routines are invoked by SYSINIT(9), so there's no harm in them taking the kld lock.	2013-08-24 21:07:04 +00:00
markj	86598e9098	Remove some code that has been commented out since it was added in 2000.	2013-08-24 21:00:39 +00:00
andre	e3737c33e7	Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. 
The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation	2013-08-24 19:51:18 +00:00
ken	32eeb1af5c	Fix a printf format warning on 32-bit mips and powerpc. Reported by: bde, gjb Pointy hat to: ken	2013-08-24 19:02:36 +00:00
andre	b148bf45e0	Add an mbuf pointer parameter to (*ext_free) to give the external free function access to the mbuf the external memory was attached to. Mechanically adjust all users to include the mbuf parameter. This fixes a long standing annoyance for external free functions. Before one had to sacrifice one of the argument pointers for this. Sponsored by: The FreeBSD Foundation	2013-08-24 16:57:44 +00:00
mav	8d8ac3d3c9	MFprojects/camlock r254460: Remove locking from taskqueue_member(). The list of threads is static during the taskqueue life cycle, so there is no need to protect it, taking quite congested lock several more times for each ZFS I/O.	2013-08-24 14:41:49 +00:00
andre	817be10f1c	Add a 24 bits wide ext_flags field to m_ext by reducing ext_type to 8 bits. ext_type is an enumerator and the number of types we have is a mere dozen. A couple of ext_types are renumbered to fit within 8 bits. EXT_VENDOR[1-4] and EXT_EXP[1-4] types for vendor-internal and experimental local mapping. The ext_flags field is currently unused but has a couple of flags already defined for future use. Again vendor and experimental flags are provided for local mapping. EXT_FLAG_BITS is provided for the printf(9) %b identifier. Initialize and copy ext_flags in the relevant mbuf functions. Improve alignment and packing of struct m_ext on 32 and 64 archs by carefully sorting the fields.	2013-08-24 13:15:42 +00:00
andre	b9a332b6b2	Avoid code duplication for mbuf initialization and use m_init() instead in mb_ctor_mbuf() and mb_ctor_pack().	2013-08-24 12:24:58 +00:00
ken	281a193b53	Add support to physio(9) for devices that don't want I/O split and configure sa(4) to request no I/O splitting by default. For tape devices, the user needs to be able to clearly understand what blocksize is actually being used when writing to a tape device. The previous behavior of physio(9) was that it would split up any I/O that was too large for the device, or too large to fit into MAXPHYS. This means that if, for instance, the user wrote a 1MB block to a tape device, and MAXPHYS was 128KB, the 1MB write would be split into 8 128K chunks. This would be done without informing the user. This has suboptimal effects, especially when trying to communicate status to the user. In the event of an error writing to a tape (e.g. physical end of tape) in the middle of a 1MB block that has been split into 8 pieces, the user could have the first two 128K pieces written successfully, the third returned with an error, and the last 5 returned with 0 bytes written. If the user is using a standard write(2) system call, all he will see is the ENOSPC error. He won't have a clue how much actually got written. (With a writev(2) system call, he should be able to determine how much got written in addition to the error.) The solution is to prevent physio(9) from splitting the I/O. The new cdev flag, SI_NOSPLIT, tells physio that the driver does not want I/O to be split beforehand. Although the sa(4) driver now enables SI_NOSPLIT by default, that can be disabled by two loader tunables for now. It will not be configurable starting in FreeBSD 11.0. kern.cam.sa.allow_io_split allows the user to configure I/O splitting for all sa(4) driver instances. kern.cam.sa.%d.allow_io_split allows the user to configure I/O splitting for a specific sa(4) instance. There are also now three sa(4) driver sysctl variables that let the users see some sa(4) driver values. kern.cam.sa.%d.allow_io_split shows whether I/O splitting is turned on. 
kern.cam.sa.%d.maxio shows the maximum I/O size allowed by kernel configuration parameters (e.g. MAXPHYS, DFLTPHYS) and the capabilities of the controller. kern.cam.sa.%d.cpi_maxio shows the maximum I/O size supported by the controller. Note that a better long term solution would be to implement support for chaining buffers, so that MAXPHYS is no longer a limiting factor for I/O size to tape and disk devices. At that point, the controller and the tape drive would become the limiting factors. sys/conf.h: Add a new cdev flag, SI_NOSPLIT, that allows a driver to tell physio not to split up I/O. sys/param.h: Bump __FreeBSD_version to 1000049 for the addition of the SI_NOSPLIT cdev flag. kern_physio.c: If the SI_NOSPLIT flag is set on the cdev, return any I/O that is larger than si_iosize_max or MAXPHYS, has more than one segment, or would have to be split because of misalignment with EFBIG. (File too large). In the event of an error, print a console message to give the user a clue about what happened. scsi_sa.c: Set the SI_NOSPLIT cdev flag on the devices created for the sa(4) driver by default. Add tunables to control whether we allow I/O splitting in physio(9). Explain in the comments that allowing I/O splitting will be deprecated for the sa(4) driver in FreeBSD 11.0. Add sysctl variables to display the maximum I/O size we can do (which could be further limited by read block limits) and the maximum I/O size that the controller can do. Limit our maximum I/O size (recorded in the cdev's si_iosize_max) by MAXPHYS. This isn't strictly necessary, because physio(9) will limit it to MAXPHYS, but it will provide some clarity for the application. Record the controller's maximum I/O size reported in the Path Inquiry CCB. sa.4: Document the block size behavior, and explain that the option of allowing physio(9) to split the I/O will disappear in FreeBSD 11.0. Sponsored by: Spectra Logic	2013-08-24 04:52:22 +00:00
delphij	b93cf73204	Allow tmpfs to be mounted inside a jail.	2013-08-23 22:52:20 +00:00
jkim	e9e439afa6	Fix a whitespace.	2013-08-23 16:54:38 +00:00
kib	4731d4c0e7	Since the 253927, which removed the soft busy call for the sf page, it does not make sense to wait for the soft busy state of the page to drain. The vm object lock is dropped immediately after, so the result of the wait is invalidated. It might make sense to not wait for the hard busy state as well, esp. for the fully valid page, but this is postponed for now. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-23 14:50:03 +00:00
jhb	6d81dda61e	Use tvtohz() to convert a socket buffer timeout to a tick value rather than using a home-rolled version. The home-rolled version could result in shorter-than-requested sleeps. Reported by: Vitja Makarov <vitja.makarov@gmail.com> MFC after: 2 weeks	2013-08-23 13:47:41 +00:00
kib	e25f6a560e	Both cluster_rbuild() and cluster_wbuild() sometimes set the pages shared busy without first draining the hard busy state. Previously it went unnoticed since VPO_BUSY and m->busy fields were distinct, and vm_page_io_start() did not verify that the passed page has VPO_BUSY flag cleared, but such page state is wrong. New implementation is more strict and catches this case. Drain the busy state as needed, before calling vm_page_sbusy(). Tested by: pho, jkim Sponsored by: The FreeBSD Foundation	2013-08-22 18:26:45 +00:00
kib	ba12eedccd	Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-22 07:39:53 +00:00
andre	d8dc6ade5b	Revert r254520 and resurrect the M_NOFREE mbuf flag and functionality. Requested by: np, grehan	2013-08-21 18:12:04 +00:00
kib	d11c4f9c32	Implement read(2)/write(2) and necessary lseek(2) for posix shmfd. Add MAC framework entries for posix shm read and write. Do not allow implicit extension of the underlying memory segment past the limit set by ftruncate(2) by either of the syscalls. Read and write returns short i/o, lseek(2) fails with EINVAL when resulting offset does not fit into the limit. Discussed with: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-21 17:45:00 +00:00
kib	6a459eb27c	Make the seek a method of the struct fileops. Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-21 17:36:01 +00:00
kib	a3e8f2c6dc	Extract the general-purpose code from tmpfs to perform uiomove from the page queue of some vm object. Discussed with: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-21 17:23:24 +00:00
pho	967f62efad	Added sysctl to turn off calls to vmem_check(). Sponsored by: EMC / Isilon storage division Discussed with: jeff	2013-08-20 11:06:56 +00:00
jeff	ed90d4ba3f	- Use an arbitrary but reasonably large import size for kva on architectures that don't support superpages. This keeps the number of spans and internal fragmentation lower. - When the user asks for alignment from vmem_xalloc adjust the imported size by 2*align to be certain we can satisfy the allocation. This comes at the expense of potential failures when the backend can't supply enough memory but could supply the requested size and alignment. Sponsored by: EMC / Isilon Storage Division	2013-08-19 23:02:39 +00:00
andre	e1092223ba	Remove the unused M_NOFREE mbuf flag. It didn't have any in-tree users for a very long time, if ever. Should such a functionality ever be needed again the appropriate and much better way to do it is through a custom EXT_SOMETHING external mbuf type together with a dedicated *ext_free function. Discussed with: trociny, glebius	2013-08-19 11:16:53 +00:00
jilles	1d1d0428d2	Disallow opening a POSIX message queue for execute. O_EXEC was formerly ignored, so equivalent to O_RDONLY. Reject O_EXEC with [EINVAL] like the invalid mode 3.	2013-08-18 13:27:04 +00:00
pjd	3014e000ae	Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2) system calls as unsigned longs have different size on i386 and amd64. Reported by: jilles Sponsored by: The FreeBSD Foundation	2013-08-18 10:30:41 +00:00
bryanv	1b4affdd25	Do not use potentially stale thread in kthread_add(). When an existing process is provided, the thread selected to use to initialize the new thread could have exited and be reaped. Acquire the proc lock earlier to ensure the thread remains valid. Reviewed by: jhb, julian (previous version) MFC after: 3 days	2013-08-17 17:02:43 +00:00
pjd	635a029a89	In r114945 the line 'nmp = TAILQ_NEXT(mp, mnt_list);' was duplicated. Instead of just removing the duplicate, convert the loop to TAILQ_FOREACH().	2013-08-17 14:13:45 +00:00
delphij	a1c410b061	Fix build.	2013-08-17 00:25:11 +00:00
kib	408a640438	Restore the previous sendfile(2) behaviour on the block devices. Provide valid .fo_sendfile method for several missed struct fileops. Reviewed by: glebius Sponsored by: The FreeBSD Foundation	2013-08-16 14:22:20 +00:00
markj	e461168640	Use strdup(9) instead of reimplementing it.	2013-08-16 03:41:41 +00:00
ken	5591de079d	Change the way that unmapped I/O capability is advertised. The previous method was to set the D_UNMAPPED_IO flag in the cdevsw for the driver. The problem with this is that in many cases (e.g. sa(4)) there may be some instances of the driver that can handle unmapped I/O and some that can't. The isp(4) driver can handle unmapped I/O, but the esp(4) driver currently cannot. The cdevsw is shared among all driver instances. So instead of setting a flag on the cdevsw, set a flag on the cdev. This allows drivers to indicate support for unmapped I/O on a per-instance basis. sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it with an SI_UNMAPPED cdev flag. kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine whether or not a particular driver can handle unmapped I/O. geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs. Since GEOM will create a temporary mapping when needed, setting SI_UNMAPPED unconditionally will work. Remove the D_UNMAPPED_IO flag. nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here if NVME_UNMAPPED_BIO_SUPPORT is enabled. vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a cdev instead of the D_UNMAPPED_IO flag on the cdevsw. sys/param.h: Bump __FreeBSD_version to `1000045` for the switch from setting the D_UNMAPPED_IO flag in the cdevsw to setting SI_UNMAPPED in the cdev. Reviewed by: kib, jimharris MFC after: 1 week Sponsored by: Spectra Logic	2013-08-15 22:52:39 +00:00
cperciva	8544609e64	Change the queue of locks in kern_rangelock.c from holding lock requests in the order that they arrive, to holding (a) granted write lock requests, followed by (b) granted read lock requests, followed by (c) ungranted requests, in order of arrival. This changes the stopping condition for iterating through granted locks to see if a new request can be granted: When considering a read lock request, we can stop iterating as soon as we see a read lock request, since anything after that point is either a granted read lock request or a request which has not yet been granted. (For write lock requests, we must still compare against all granted lock requests.) For workloads with R parallel reads and W parallel writes, this improves the time spent from O((R+W)^2) to O(W*(R+W)); i.e., heavy parallel-read workloads become significantly more scalable. No statistically significant change in buildworld time has been measured, but synthetic tests of parallel 'dd > /dev/null' and 'openssl enc >/dev/null' with the input file cached yield dramatic (up to 10x) improvement with high (up to 128 processes) levels of parallelism. Reviewed by: kib	2013-08-15 20:19:17 +00:00
glebius	722a1a5e5d	Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix	2013-08-15 07:54:31 +00:00
markj	6c24c9fb32	Specify SDT probe argument types in the probe definition itself rather than using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks	2013-08-15 04:08:55 +00:00
markj	cee1e037da	Use kld_{load,unload} instead of mod_{load,unload} for the linker file load and unload event handlers added in r254266. Reported by: jhb X-MFC with: r254266	2013-08-14 00:42:21 +00:00
jeff	ca13e69c2f	- Disable quantum caches on the kmem_arena. This can make fragmentation worse on small KVA systems. I had intended to only enable it for debugging. Sponsored by: EMC / Isilon Storage Division	2013-08-13 22:41:24 +00:00
jeff	d330a11545	- Add a statically allocated memguard arena since it is needed very early on. - Pass the appropriate flags to vmem_xalloc() when allocating space for the arena from kmem_arena. Sponsored by: EMC / Isilon Storage Division	2013-08-13 22:40:43 +00:00
jhb	7120845712	Some small cleanups to the fixes in r180340: - Set NOTE_TRACKERR before running filt_proc(). If the knote did not have NOTE_FORK set in fflags when registered, then the TRACKERR event could miss being posted. - Don't pass the pid in to filt_proc() for NOTE_FORK events. The special handling for pids is done in knote_fork() directly and no longer in filt_proc(). MFC after: 2 weeks	2013-08-13 18:45:58 +00:00
glebius	33afc0388b	- Minor style(9) fix. - Bring a comment up to date.	2013-08-13 13:40:31 +00:00
markj	5a3f78714c	FreeBSD's DTrace implementation has a few problems with respect to handling probes declared in a kernel module when that module is unloaded. In particular, * Unloading a module with active SDT probes will cause a panic. [1] * A module's (FBT/SDT) probes aren't destroyed when the module is unloaded; trying to use them after the fact will generally cause a panic. This change fixes both problems by porting the DTrace module load/unload handlers from illumos and registering them with the corresponding EVENTHANDLER(9) handlers. This allows the DTrace framework to destroy all probes defined in a module when that module is unloaded, and to prevent a module unload from proceeding if some of its probes are active. The latter problem has already been fixed for FBT probes by checking lf->nenabled in kern_kldunload(), but moving the check into the DTrace framework generalizes it to all kernel providers and also fixes a race in the current implementation (since a probe may be activated between the check and the call to linker_file_unload()). Additionally, the SDT implementation has been reworked to define SDT providers/probes/argtypes in linker sets rather than using SYSINIT/SYSUNINIT to create and destroy SDT probes when a module is loaded or unloaded. This simplifies things quite a bit since it means that pretty much all of the SDT code can live in sdt.ko, and since it becomes easier to integrate SDT with the DTrace framework. Furthermore, this allows FreeBSD to be quite flexible in that SDT providers spanning multiple modules can be created on the fly when a module is loaded; at the moment it looks like illumos' SDT implementation requires all SDT probes to be statically defined in a single kernel table. PR: 166927, 166926, 166928 Reported by: davide [1] Reviewed by: avg, trociny (earlier version) MFC after: 1 month	2013-08-13 03:10:39 +00:00
markj	5423ffaa89	Remove some unused fields from struct linker_file. They were added in r172862 for use by the DTrace SDT framework but don't seem to have ever been used. MFC after: 2 weeks	2013-08-13 03:09:00 +00:00
markj	80dd3f5e73	Add event handlers for module load and unload events. The load handlers are called after the module has been loaded, and the unload handlers are called before the module is unloaded. Moreover, the module unload handlers may return an error to prevent the unload from proceeding. Reviewed by: avg MFC after: 2 weeks	2013-08-13 03:07:49 +00:00
kib	58a6f3bcbe	The r254167 moved initialization of the sleepqueues before the witness is operational. init_sleepqueues() initializes 256 mutexes, which, due to witness still being cold, started to overflow the pending_locks array. As stated in the reported panic message, increase WITNESS_PENDLIST from 768 to 1024, which provides space for additional 256 locks. Reported by: many Tested by: rakuco, bdrewery	2013-08-10 21:42:14 +00:00
cognet	333a884980	Don't call sleepinit() from proc0_init(), make it a SYSINIT instead. vmem needs the sleepq locks to be initialized when free'ing kva, so we want it called as early as possible.	2013-08-09 23:13:52 +00:00
cognet	51c3f72bfa	Instead of just trying to do it for arm, make sure vm_kmem_size is properly aligned in kmeminit(), where it'll work for any arch. Suggested by: alc	2013-08-09 22:30:54 +00:00
attilio	e9f37cac74	On all the architectures, avoid to preallocate the physical memory for nodes used in vm_radix. On architectures supporting direct mapping, also avoid to pre-allocate the KVA for such nodes. In order to do so make the operations derived from vm_radix_insert() to fail and handle all the deriving failure of those. vm_radix-wise introduce a new function called vm_radix_replace(), which can replace a leaf node, already present, with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if operations in vm_radix_insert() recursed vm_radix_insert() will start from scratch again. Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl	2013-08-09 11:28:55 +00:00
attilio	3f74b0e634	Give mutex(9) the ability to recurse on a per-instance basis. Now the MTX_RECURSE flag can be passed to the mtx_*_flag() calls. This helps in cases we want to narrow down to specific calls the possibility to recurse for some locks. Sponsored by: EMC / Isilon storage division Reviewed by: jeff, alc Tested by: pho	2013-08-09 11:24:29 +00:00
attilio	16c7563cf4	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concepts into a real, minimal, sxlock where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavily changed, which is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
trasz	fcb31f05a9	Don't dereference null pointer should acl_alloc() be passed M_NOWAIT and allocation failed. Nothing in the tree passed M_NOWAIT. Obtained from: mjg MFC after: 1 month	2013-08-09 08:40:31 +00:00
scottl	40b11a1746	Add a helpful message that can help point to why a sysctl tree removal failed. Obtained from: Netflix MFC after: 3 days	2013-08-09 01:04:44 +00:00
rstone	d9719f74bc	Allow drivers to return BUS_PROBE_NOWILDCARD from their attach routine to match devices where the driver class was fixed but the unit number was wildcarded. This better matches the documented behaviour in DEVICE_PROBE(9). Reviewed by: imp	2013-08-08 19:30:49 +00:00
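
Commit 281a193b53 in the log above describes how physio(9) used to split an oversized write silently, and how SI_NOSPLIT turns that into an EFBIG error. The following is a rough Python model of only the size check, not the kernel code: `plan_io` and its parameters are hypothetical names, the 128KB MAXPHYS value is taken from the example in the commit message, and the segment-count and misalignment checks the commit also mentions are left out.

```python
import errno

# Assumed value, taken from the 1MB / 128KB example in the commit message.
MAXPHYS = 128 * 1024


def plan_io(length, iosize_max=MAXPHYS, nosplit=False):
    """Return the chunk sizes a request of `length` bytes would be issued as.

    Hypothetical model: with nosplit=False an oversized request is cut into
    pieces; with nosplit=True it is rejected with EFBIG instead.
    """
    limit = min(iosize_max, MAXPHYS)
    if length <= limit:
        return [length]          # fits in one transfer either way
    if nosplit:
        # Models the SI_NOSPLIT behaviour: refuse rather than split.
        raise OSError(errno.EFBIG, "I/O exceeds limit and splitting is disabled")
    chunks = []
    remaining = length
    while remaining > 0:
        n = min(remaining, limit)
        chunks.append(n)
        remaining -= n
    return chunks
```

Under these assumptions, `plan_io(1024 * 1024)` yields eight 128KB chunks, matching the example in the commit message, while the same call with `nosplit=True` raises `OSError` with `errno.EFBIG`.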


13381 Commits
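
Commit 8544609e64 in the log above reorders the rangelock queue into granted writes, then granted reads, then ungranted requests, so that a read request can stop scanning at the first read entry. A small Python sketch of that stopping rule follows. It is an illustrative model only: `Req`, `overlaps`, and `can_grant` are hypothetical names and do not correspond to identifiers in kern_rangelock.c.

```python
from dataclasses import dataclass


@dataclass
class Req:
    start: int
    end: int              # exclusive upper bound of the byte range
    write: bool
    granted: bool = False


def overlaps(a, b):
    """True if the half-open ranges [start, end) intersect."""
    return a.start < b.end and b.start < a.end


def can_grant(queue, req):
    """Return (grantable, comparisons) for `req` against an ordered queue.

    The queue is assumed to be ordered as the commit describes:
    granted writes, then granted reads, then ungranted requests.
    """
    comparisons = 0
    for entry in queue:
        if not entry.granted:
            break                      # reached the ungranted tail
        if not req.write and not entry.write:
            break                      # a read never conflicts with granted reads
        comparisons += 1
        if overlaps(entry, req):
            return False, comparisons
    return True, comparisons
```

In this model a read request is compared only against the granted writes at the head of the queue, while a write request is still compared against every granted entry, which is the source of the O(W*(R+W)) bound quoted in the commit message.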