freebsd-skq

Author	SHA1	Message	Date
kmacy	3daa1603f7	Fix mb_ctor_clust and mb_dtor_clust to reference the appropriate zone, simplify setting refcnt Reviewed by: andre, rwatson, and glebius MFC after: 3 days	2007-04-04 21:27:01 +00:00
mohans	83064ec323	Fix for problems that occur when all mbuf clusters migrate to the mbuf packet zone. Cluster allocations fail when this happens. Also processes that may have blocked on cluster allocations will never be woken up. Thanks to rwatson for an overview of the issue and pointers to the mbuma paper and his tool to dump out UMA zones. Reviewed by: andre@	2007-01-25 01:05:23 +00:00
rwatson	7beaaf5cd2	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
andre	710a642f7a	Remove VLAN mtag UMA zones and initialize ether_vtag and tso_segsz packet header fields to zero on mbuf allocation. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-17 13:44:32 +00:00
rwatson	120490c1a5	Move some functions and definitions from uipc_socket2.c to uipc_socket.c: - Move sonewconn(), which creates new sockets for incoming connections on listen sockets, so that all socket allocate code is together in uipc_socket.c. - Move 'maxsockets' and associated sysctls to uipc_socket.c with the socket allocation code. - Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it to sysctl.h and remove lots of scattered implementations in various IPC modules. - Sort sodealloc() after soalloc() in uipc_socket.c for dependency order reasons. Statisticize soalloc() and sodealloc() as they are now required only in uipc_socket.c, and are internal to the socket implementation. After this change, socket allocation and deallocation is entirely centralized in one file, and uipc_socket2.c consists entirely of socket buffer manipulation and default protocol switch functions. MFC after: 1 month	2006-06-10 14:34:07 +00:00
ps	10b2fe8dea	Allow for nmbclusters and maxsockets to be increased via sysctl. An eventhandler is used to update all the various zones that depend on these values.	2006-04-21 09:25:40 +00:00
andre	8244d1b6f2	Properly handle the case when the packet secondary zone can't allocate further mbuf clusters to attach to mbufs. Reported by: kris Tested by: kris Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-03-08 14:05:38 +00:00
glebius	3d033e9247	One more grammar nit. Submitted by: ru	2006-02-27 07:22:32 +00:00
glebius	a9ab65374c	Fix several typos and trim spaces at eol. PR: kern/93759 Submitted by: Antoine Brodin <antoine.brodin laposte.net>	2006-02-26 11:44:28 +00:00
andre	a6010e7b75	Replace the 4k fixed sized jumbo mbuf clusters with PAGE_SIZE sized jumbo mbuf clusters. To make the variable size clear they are named MJUMPAGESIZE. Having jumbo clusters with the native PAGE_SIZE is more useful than a fixed 4k size according the device driver writers using this API. The 9k and 16k jumbo mbuf clusters remain unchanged. Requested by: glebius, gallatin Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-02-17 14:14:15 +00:00
glebius	19f8b36e66	Merge the //depot/user/yar/vlan branch into CVS. It contains some collective work by yar, thompsa and myself. The checksum offloading part also involves work done by Mihail Balikov. The most important changes: o Instead of global linked list of all vlan softc use a per-trunk hash. The size of hash is dynamically adjusted, depending on number of entries. This changes struct ifnet, replacing counter of vlans with a pointer to trunk structure. This change is an improvement for setups with big number of VLANs, several interfaces and several CPUs. It is a small regression for a setup with a single VLAN interface. An alternative to dynamic hash is a per-trunk static array with 4096 entries, which is a compile time option - VLAN_ARRAY. In my experiments the array is not an improvement, probably because such a big trunk structure doesn't fit into CPU cache. o Introduce an UMA zone for VLAN tags. Since drivers depend on it, the zone is declared in kern_mbuf.c, not in optional vlan(4) driver. This change is a big improvement for any setup utilizing vlan(4). o Use rwlock(9) instead of mutex(9) for locking. We are the first ones to do this! :) o Some drivers can do hardware VLAN tagging + hardware checksum offloading. Add an infrastructure for this. Whenever vlan(4) is attached to a parent or parent configuration is changed, the flags on vlan(4) interface are updated. In collaboration with: yar, thompsa In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>	2006-01-30 13:45:15 +00:00
andre	b7dae7ac6c	In mb_zinit_pack() explicitly ignore the return value of uma_zalloc_arg(). The success of the cluster allocation is checked through a field in the mbuf structure. This change is non-functional but properly silences code inspection tools. Found by: Coverity Prevent(tm) Coverity ID: CID807 Sponsored by: TCP/IP Optimization Fundraise 2005	2006-01-23 15:49:01 +00:00
andre	5751cf9192	Hide the 4k mbuf clusters if the normal clusters are defined to be 4k already. This unbreaks tinderbox. Submitted by: ru	2005-12-10 15:21:04 +00:00
andre	143b5d29e0	Add an API for jumbo mbuf cluster allocation and also provide 4k clusters in addition to 9k and 16k ones. struct mbuf m_getjcl(int how, short type, int flags, int size) void m_cljget(struct mbuf *m, int how, int size) m_getjcl() returns an mbuf with a cluster of the specified size attached like m_getcl() does for 2k clusters. m_cljget() is different from m_clget() as it can allocate clusters without attaching them to an mbuf. In that case the return value is the pointer to the cluster of the requested size. If an mbuf was specified, it gets the cluster attached to it and the return value can be safely ignored. For size both take MCLBYTES, MJUM4BYTES, MJUM9BYTES, MJUM16BYTES. Reviewed by: glebius Tested by: glebius Sponsored by: TCP/IP Optimization Fundraise 2005	2005-12-08 13:13:06 +00:00
glebius	33ac50a107	Fix panic string in last revision.	2005-11-06 16:47:59 +00:00
andre	07bbeaa756	Free only those mbuf+clusters back to the packet zone that were allocated from there. All others get broken up and free'd individually to the mbuf and cluster zones. The packet zone is a secondary zone to the mbuf zone. There is currently a limitation in UMA which prevents decreasing the packet zone stock when the mbuf and cluster zone are drained and all their members are part of packets. When this is fixed this change may be reverted.	2005-11-05 19:43:55 +00:00
andre	438cb7bde7	Fix a logic error introduced with mandatory mbuf cluster refcounting and freeing of mbufs+clusters back to the packet zone.	2005-11-04 17:20:53 +00:00
andre	b53d0a6c80	Mandatory mbuf cluster reference counting and groundwork for UMA based jumbo 9k and jumbo 16k cluster support. All mbuf's with external storage attached are mandatory reference counted. For clusters and jumbo clusters UMA provides the refcnt storage directly. It does not have to be separatly allocated. Any other type of external storage gets its own refcnt allocated from an UMA mbuf refcnt zone instead of normal kernel malloc. The refcount API MEXT_ADD_REF() and MEXT_REM_REF() is no longer publically accessible. The proper m_* functions have to be used. mb_ctor_clust() and mb_dtor_clust() both handle normal 2K as well as 9k and 16k clusters. Clusters and jumbo clusters may be obtained without attaching it immideatly to an mbuf. This is for high performance cluster allocation in network drivers where mbufs are attached after the cluster has been filled. Tested by: rwatson Sponsored by: TCP/IP Optimizations Fundraise 2005	2005-11-02 16:20:36 +00:00
rwatson	5651e89254	No longer maintain mbstat statistics for the mbuf allocator, UMA statistics and libmemstat(3) are now used to track mbuf statistics. MFC after: 1 month	2005-09-27 20:28:43 +00:00
rwatson	38dffb25ed	Define four constants, MBUF_{,MEM,CLUSTER,PACKET,TAG}_MEM_NAME, which are string names for their respective UMA zones and malloc types, and are passed into uma_zcreate() and MALLOC_DEFINE(). Export them outside of _KERNEL in mbuf.h so that netstat can reference them. Change the names to improve consistency, with each zone/type associated with the mbuf allocator being prefixed mbuf_. MFC after: 1 week	2005-07-17 14:04:03 +00:00
silby	0edd2a4f6f	Fix the false memory modified after free messages some users have been reporting - in my previous change, I missed the case where a mbuf from the packet zone was freed back to the mbuf/packet keg, where it was subsequently put into the mbuf zone and found not to contain the expected trash. This change adds the necessary trash_dtor call inside mb_fini_pack so that everything is correct. Thanks for Bosko for finding the bug and showing me how secondary zones work. Approved by: re (dwhite)	2005-06-29 08:18:26 +00:00
silby	cbb0f23931	Change the mbuf, mbuf cluster, and mbuf packet allocation routines so that the UMA "trash" allocator is used - this ensures that any writes to a freed mbuf should provoke a panic. Only enabled under INVARIANTS, of course. Approved by: re (scottl)	2005-06-23 04:33:39 +00:00
bmilekic	f9dded75d0	Well, it seems that I pre-maturely removed the "All rights reserved" statement from some files, so re-add it for the moment, until the related legalese is sorted out. This change affects: sys/kern/kern_mbuf.c sys/vm/memguard.c sys/vm/memguard.h sys/vm/uma.h sys/vm/uma_core.c sys/vm/uma_dbg.c sys/vm/uma_dbg.h sys/vm/uma_int.h	2005-02-16 21:45:59 +00:00
bmilekic	885ba93847	Optimize the way reference counting is performed with Mbufs. We do not need to perform an extra memory fetch in the Packet (Mbuf+Cluster) constructor to initialize the reference counter anymore. The reference counts are located in a separate memory region (in the slab header, because this zone is UMA_ZONE_REFCNT), so the memory fetch resulted very often in a cache miss. Additionally, and perhaps more significantly, optimize the free mbuf+cluster (packet) case, which is very common, to no longer require an atomic operation on free (to verify the reference counter) if the reference on the cluster has never been increased (also very common). Reduces an atomic on mbuf free on average. Original patch submitted by: Gerrit Nagelhout <gnagelhout@sandvine.com>	2005-02-10 22:23:02 +00:00
bmilekic	5cdce7d092	Update copyright, remove "all rights reserved" (since they are not all reserved, as the lisence makes clear), and strike the third clause (now this is a 2-clause liberal BSDL as are the rest of files I hold copyright over).	2005-02-01 03:17:52 +00:00
brian	255869f387	CTASSERT that MSZIE is a power of 2 (otherwise dtom() breaks) Ask uma_zcreate() to align mbufs to MSIZE bytes (otherwise dtom() breaks) As it happens, uma_zalloc_arg() always returned mbufs aligned to MSIZE anyway, but that was an implementation side-effect.... KASSERT -> CTASSERT suggested by: dd@ Approved by: silence on -net	2004-09-20 08:52:04 +00:00
green	9532ab7116	* Add a "how" argument to uma_zone constructors and initialization functions so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialation functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default. Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.	2004-08-02 00:18:36 +00:00
bmilekic	9e06a1e05a	Fix a couple of bugs in the mbuf and packet ctors. In the latter case, nextpkt within the m_hdr was not being initialized to NULL for !M_PKTHDR cases. Maybe this will fix weird socket buffer inconsistency panics, but we'll see.	2004-06-01 16:17:10 +00:00
bmilekic	f7574a2276	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)	2004-05-31 21:46:06 +00:00

29 Commits