choosethread() in MI C code instead of doing it in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.
Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64
itself; this causes undefined behaviour on UltraSPARCs. In particular,
the interrupt packet data words will not necessarily be delivered
correctly, which would result in a crash.
This bug also caused the cache-flushing work to be done twice on the
triggering CPU (when it did not cause crashes).
Reviewed by: jake
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns. For example, there is no sense in
globally purging the TLB when we are stealing a page from
the local unshared process on the local cpu. Use pm_active to track
this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
enough to cause vm86 bios calls to break. vm86 depended on our existing
bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
pain than they did.
Some more work is needed:
- pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
have a hook in cpu_switch.
- The IPI handlers need some cleanup. I have a bogus %ds load that can
be avoided.
- APTD handling is rather bogus and appears to be a large source of
global TLB IPI shootdowns for no really good reason.
I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.
I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative. I'll revisit these
again over the next few days as the dust settles.
New option: DISABLE_PG_G - In case I missed something.
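A rough illustration of the pm_active idea (not the committed code): assuming
pm_active is a bitmask with one bit per cpu that currently has the pmap
loaded, the cpus that need a shootdown IPI are that mask minus the current
cpu. The cpumask_t typedef and both function names are made up for this
sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask type: bit N set means cpu N has the pmap active. */
typedef uint32_t cpumask_t;

/* Remote cpus that must receive a shootdown IPI for this pmap. */
cpumask_t
shootdown_targets(cpumask_t pm_active, unsigned int self)
{
	assert(self < 32);
	return (pm_active & ~((cpumask_t)1 << self));
}

/* Nonzero if at least one remote cpu needs an IPI. */
int
shootdown_needs_ipi(cpumask_t pm_active, unsigned int self)
{
	return (shootdown_targets(pm_active, self) != 0);
}
```

When the pmap is only active on the local cpu the target mask is empty, so a
local invalidation is enough and no IPI is sent.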
keyword and in the description of rp's hints.
Didn't fix rp's hints being mostly in comments so that they are harder to
use (they don't get linted either way because makeLINT.sh strips them and
there is no compile-time syntax checking of hints anyway).
available from bsd.obj.mk.
The native version was identical (and pretty much unused except in
the -DMODULES_WITH_WORLD case, which it is not for "make release")
except that the "bin" -> "base" change of the default DISTRIBUTION
name did not propagate here.
NOTES. Add some comments about the potential problems associated with NIC
driver modules and changing these options.
Fix sorting problems in sys/conf/options with the MSIZE and MCLSHIFT
options.
Reviewed by: bde
This code does not imply that SBus cards work yet. They hang for me.
But I can't netboot the latest snapshot on my ultra1e, and things
hang at bus_setup_intr time.
Since I'm offline for a while, I thought I'd toss this in in case somebody
else who has a bit better luck wants to fart around with it. Please try
and wait until I get back to check things in.
warn*(), and setproctitle() functions) to make buildworld work again. This
can be cleaned up later if/when a new GCC supports the feature (but personally
I think it's a waste of time to keep mod'ing imported GCC sources for this
since only three procedures are involved).
Suggested by: peter
and kmem_free_wakeup(). Previously, kmem_free_wakeup() always
called wakeup(). In general, no one was sleeping.
o Export vm_map_unlock_and_wait() and vm_map_wakeup() from vm_map.c
for use in vm_kern.c.
the default) is now the only method for i386.
Remove the paraphernalia that supported critmode. Remove td_critnest, clean
up the assembly, and clean up (mostly remove) the old junk from
cpu_critical_enter() and cpu_critical_exit().
This allows accton(1) to be used with an append-only file.
PR: 7169
Reported by: Joao Carlos Mendes Luis <jonny@jonny.eng.br>
Reviewed by: bde
Approved by: sheldonh (mentor)
MFC after: 2 weeks
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on. Extensive testing appears to show no
increase in overhead.
Disadvantages
Dirties more cache lines during lookups.
Not as fast as a hash table lookup (but still amortized log N and optimal
when there is locality of reference).
Advantages
vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
syncer operate more efficiently.
I get to rip out all the old hacks (some of which were mine) that tried
to keep the v_dirtyblkhd tailq sorted.
The per-vnode splay tree should be easier to lock / SMPng pushdown on
vnodes will be easier.
This commit along with another that Alan is working on for the VM page
global hash table will allow me to implement ranged fsync(), optimize
server-side nfs commit rpcs, and implement partial syncs by the
filesystem syncer (aka filesystem syncer would detect that someone is
trying to get the vnode lock, remember its place, and skip to the
next vnode).
Note that the buffer cache splay is somewhat more complex than other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.
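For readers unfamiliar with splays, here is a minimal top-down splay keyed
on a logical block number. It is only a sketch of the general technique, not
the buffer cache code: struct bnode and the function names are hypothetical,
and none of the special cases mentioned above (duplicate lblkno, B_INVAL)
are handled.

```c
#include <stddef.h>

/* Hypothetical node; the real code hangs this off struct buf. */
struct bnode {
	long lblkno;
	struct bnode *left, *right;
};

/*
 * Top-down splay: moves the node with the given key to the root, or the
 * last node on the search path (a neighbour of the key) if it is absent.
 */
struct bnode *
bsplay(struct bnode *t, long key)
{
	struct bnode dummy, *l, *r, *y;

	if (t == NULL)
		return (NULL);
	dummy.left = dummy.right = NULL;
	l = r = &dummy;
	for (;;) {
		if (key < t->lblkno) {
			if (t->left == NULL)
				break;
			if (key < t->left->lblkno) {	/* rotate right */
				y = t->left;
				t->left = y->right;
				y->right = t;
				t = y;
				if (t->left == NULL)
					break;
			}
			r->left = t;			/* link right */
			r = t;
			t = t->left;
		} else if (key > t->lblkno) {
			if (t->right == NULL)
				break;
			if (key > t->right->lblkno) {	/* rotate left */
				y = t->right;
				t->right = y->left;
				y->left = t;
				t = y;
				if (t->right == NULL)
					break;
			}
			l->right = t;			/* link left */
			l = t;
			t = t->right;
		} else
			break;
	}
	l->right = t->left;				/* reassemble */
	r->left = t->right;
	t->left = dummy.right;
	t->right = dummy.left;
	return (t);
}

/* Insert a node; an existing node with the same key is kept instead. */
struct bnode *
bsplay_insert(struct bnode *t, struct bnode *n)
{
	n->left = n->right = NULL;
	if (t == NULL)
		return (n);
	t = bsplay(t, n->lblkno);
	if (n->lblkno < t->lblkno) {
		n->left = t->left;
		n->right = t;
		t->left = NULL;
	} else if (n->lblkno > t->lblkno) {
		n->right = t->right;
		n->left = t;
		t->right = NULL;
	} else
		return (t);
	return (n);
}
```

A lookup leaves the requested block at the root, which is where the
locality-of-reference benefit comes from; it is also why lookups dirty cache
lines, since every lookup rewrites pointers.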
Suggested by: alc
- Remove some obsolete code (NetBSD gem.c r1.12)
- Clean up how the local MAC address is programmed (NetBSD gem.c r1.13)
- Make the driver work on PowerMacs with gigabit interfaces
(NetBSD gem.c r1.14 and r1.15, gemreg.h r1.3 and r1.4, gemvar.h r1.6 and r1.7)
- Suppress RX_MAC interrupts regarding the FRAME_COUNT register.
(NetBSD gem.c r1.16 and r1.17)
- Fix receiver lockups. (NetBSD gem.c r1.18, gemvar.h r1.8)
- Distinguish between Apple and Sun variants (NetBSD if_gem_pci.c r1.9)
Reviewed by: tmm
Obtained from: NetBSD
like this can be emulated by VT_SETMODEing to VT_PROCESS and never
releasing the vty, but this has a number of problems, most notably
that a process must stay resident for the lock to be in effect.
Reviewed by: roam, sheldonh
ipl.s except doreti which really belongs in with the exceptions as it's
just the other side of the same coin. Will remove ipl.s in a separate commit.
Agreed by: several including bde@freebsd.org
- Add IGNORE_LOCK() that only ignores VCHR files for now since no one locks
their underlying device in the leaf filesystems. (devvp)
- Add prototypes for vop_lookup_{pre,post} that I forgot before.
I've tried to make this fairly platform-independent as some PowerPC platforms
may not have openpic-style interrupt controllers. This may not have the best
performance but it works for now.
that the attach succeeded. (Fixes a potential panic for devices
that fail to attach properly and are subsequently unplugged and then
plugged back in again.)
I do not know why this didn't panic my box, but I have most certainly
been using it:
peter@overcee[3:14pm]~src/sys/i386/i386-110> sysctl -a | grep zero
vm.stats.misc.zero_page_count: 2235
vm.stats.misc.cnt_prezero: 638951
vm.idlezero_enable: 1
vm.idlezero_maxrun: 16
Submitted by: Tor.Egge@cvsup.no.freebsd.org
Approved by: Tor's patches are never wrong. :-)
TLB problem when bouncing from one cpu to another (the original cpu will
not have purged its TLB if it simply went idle).
Pointed out by: Tor.Egge@cvsup.no.freebsd.org
Approved by: Tor is never wrong. :-)
no punch_fw was used.
Fix another couple of bugs which prevented rules from being
installed properly.
On passing, use IPFW2 instead of NEW_IPFW to compile the new code,
and slightly simplify the instruction generation code.
Following Darren's suggestion, make Dijkstra happy and rewrite the
ipfw_chk() main loop removing a lot of goto's and using instead a
variable to store match status.
Add a lot of comments to explain what instructions are supposed to
do and how -- this should ease auditing of the code and make people
more confident with it.
In terms of code size: the entire file takes about 12700 bytes of text,
about 3K of which are for the main function, ipfw_chk(), and 2K (ouch!)
for ipfw_log().
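To show the shape of a goto-free main loop with a match-status variable,
here is a toy rule checker. Everything in it is hypothetical (the opcodes,
structures, the name chk() and the default-deny fallthrough); it is not
ipfw_chk(), only the control-flow pattern.

```c
#include <stddef.h>

enum demo_op { OP_PROTO, OP_DPORT };		/* hypothetical opcodes */
enum demo_action { ACT_DENY, ACT_ALLOW };

struct demo_insn {
	enum demo_op op;
	int arg;
};

struct demo_rule {
	const struct demo_insn *insns;
	size_t ninsns;
	enum demo_action action;
};

struct demo_pkt {
	int proto;
	int dport;
};

/* Return the action of the first rule whose instructions all match. */
enum demo_action
chk(const struct demo_rule *rules, size_t nrules, const struct demo_pkt *p)
{
	size_t r, i;

	for (r = 0; r < nrules; r++) {
		int match = 1;	/* cleared by the first failing instruction */

		for (i = 0; match && i < rules[r].ninsns; i++) {
			const struct demo_insn *in = &rules[r].insns[i];

			switch (in->op) {
			case OP_PROTO:
				match = (p->proto == in->arg);
				break;
			case OP_DPORT:
				match = (p->dport == in->arg);
				break;
			default:
				match = 0;	/* unknown opcode never matches */
				break;
			}
		}
		if (match)
			return (rules[r].action);
	}
	return (ACT_DENY);	/* assumption of this sketch: default deny */
}
```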
of being correct. None of the root mountable filesystems
do something at VFS_START().
Shorten a comment to fix a style bug while I'm here.
PR: kern/18505
Oops; I forgot this in the previous delta... If we're an FC or ULTRA2 or better
card, we can have a 1024 element request queue instead of 256.
MFC after: 1 week
Remove sim queue freezes for resource shortages. I've had too many
strange race conditions where I freeze on a resource shortage but
never get unfrozen.
Consolidate the remaining sim queue freeze condition (for loopdown)
into an inline with debug messages that allows us to track problems
at ISP_LOGDEBUG0 level easier. Change a bunch of debug messages about
loop down/up conditions to ISP_LOGDEBUG0 level.
Remove dead isp_relsim code.
Change some internal flag stuff for efficiency.
Complain vociferously if we try and use our FC scratch area while it's
busy being used already (I mean, if we don't have solaris' ability
to sleep as an interrupt thread which would allow us to just use
a p/v semaphore, at least *say* when you've just borked yourself).
Add infrastructure to allow overrides of hard loopid && initiator
id from boot variables.
Fix the usual quota of silly bugs:
+ 'ktmature' needs to be per-instance. Argh.
+ When entering isp_watchdog, set intsok to zero, preserving
old value to restore later. It's not nice to try and sleep
from splsoftclock.
+ Fix tick overflow buglet in checking timeout value.
MFC after: 1 week
turns out that there's something of a hole in our new fabric name
server stuff. We ask the name server for entities that have
registered as a specific type. That type is FC-SCSI. If the entity
hasn't performed a REGISTER FC4 TYPES, the fabric nameserver won't
return it.
This brings this driver to a bit of a fork in the road as to what
the right thing to do is. For servicing the needs of accessing
FC-SCSI devices, this method is fine, and to be preferred. It is
extremely unlikely we're interested in fabric devices that *don't*
register correctly. If I ever get around to adding an FC-IP stack,
then asking for devices that have registered as FC-IP types is also
the right thing to do.
So- asking the fabric nameserver for a specific type is fine, *as
long as you are only interested in specific types*. If, on the other
hand, you want to create (as for management tool support) a picture
of everything on the fabric, this is *not* so fine. There is a
large class of FC-SCSI *initiators* who *don't* correctly register,
so we never will *see* them.
Is this a problem? Yes, but only a little one. If we want to do such
management tool support, we should probably run a *different* fabric
nameserver query algorithm. Better yet, we should talk to the management
nameserver in Brocade switches instead of the standard FC-GS-2 fabric
nameserver (which can be unwieldy).
Other changes: if we've overrides marked, don't set some default
values from reading NVRAM. This allows us to override things like
EXEC throttle without having to ignore NVRAM entirely.
MFC after: 1 week
hardly MD, since all our platforms share the same macro. It's not
really compiler dependent either, but this helps in reducing
<machine/ansi.h> to only type definitions.
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.
We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.
Use Luigi's hack for preserving the (lack of) priority.
Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.
- Initialize lock structure in vncache_alloc
- Return locked vnodes from vncache_alloc
- Setup vnode op vectors to use default lock, unlock, and islocked
- Implement simple locking scheme required for lookup
mappings from the page tables, which were mapped with PG_G! We could
reuse the page table entry for another mapping (pmap_mapdev) but it
would never have cleared any remaining PG_G TLB entries.
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.
While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.
the actual code. Both use a ";" (not a ",") to delimit entries.
PR: 39679
Submitted by: Cyrille Lefevre <cyrille.lefevre@laposte.net>
MFC after: 3 days
Tell vop_strategy_pre() to use this instead.
- Ignore B_CLUSTER bufs. Their components are locked but they don't really
exist so they don't have to be. This isn't ideal but it is safe.
vm_mmap() as well as the GETATTR etc.
- If the handle is a vnode in vm_mmap() assert that it is locked.
- Wiggle Giant around a little to account for the extra vnode operation.
- Cache a pointer to the vnode's object in the buf.
- Hold a reference to that object in addition to the vnode's reference just
to be consistent.
- Cleanup code that got the object indirectly through the vp and VOP calls.
This fixes at least one case where we were calling GETVOBJECT without a lock.
It also avoids an expensive layered call at the cost of another pointer in
struct buf.
- Grab the vnode object early in exec when we still have the vnode lock.
- Cache the object in the image_params.
- Make use of the cached object in imgact_*.c
- Switch to the new vop_strategy_pre for lock validation.
VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need
to be translated. There may be other reasons, but as long as the underlying
layer uses a VOP to perform the operations they will be caught later.
- Disable original vop_strategy lock specification.
- Switch to the new vop_strategy_pre for lock validation.
VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need
to be translated. There may be other reasons, but as long as the underlying
layer uses a VOP to perform the operations they will be caught later.
in the VOP inlines. This is intended to replace the simple locking
specifications for calls that have more complicated behavior such as rename and
lookup.
The syntax of the new entries is:
#! name pre/post function
If the function is marked 'pre' it is executed prior to calling the VOP and
takes a pointer to a struct vop_{name}_args as its only parameter.
If the function is marked 'post' it is executed after the VOP call and takes
a pointer to a struct vop_{name}_args as its first parameter and the integer
return value from the vop as the second parameter.
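As a sketch of what a generated inline might look like with both hooks
present, assuming the calling convention described above. The "demo" VOP,
its args structure, the counters and the stand-in implementation are all
invented for illustration; the real inlines are generated and differ in
detail.

```c
#include <errno.h>

struct vop_demo_args {		/* hypothetical args structure */
	int a_flags;
};

int demo_pre_calls, demo_post_calls, demo_last_rc;

/* 'pre' hook: sees only the args, runs before the VOP. */
void
vop_demo_pre(struct vop_demo_args *ap)
{
	(void)ap;
	demo_pre_calls++;
}

/* 'post' hook: sees the args and the VOP's return value. */
void
vop_demo_post(struct vop_demo_args *ap, int rc)
{
	(void)ap;
	demo_post_calls++;
	demo_last_rc = rc;
}

/* Stand-in for the filesystem's implementation of the VOP. */
int
demo_vop_impl(struct vop_demo_args *ap)
{
	return (ap->a_flags < 0 ? EINVAL : 0);
}

/* What the wrapper does: pre hook, the VOP itself, then post hook. */
int
VOP_DEMO(int flags)
{
	struct vop_demo_args a;
	int rc;

	a.a_flags = flags;
	vop_demo_pre(&a);
	rc = demo_vop_impl(&a);
	vop_demo_post(&a, rc);
	return (rc);
}
```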
now it should support all the instructions of the old ipfw.
Fix some bugs in the user interface, /sbin/ipfw.
Please check this code against your rulesets, so i can fix the
remaining bugs (if any, i think they will be mostly in /sbin/ipfw).
Once we have done a bit of testing, this code is ready to be MFC'ed,
together with a bunch of other changes (glue to ipfw, and also the
removal of some global variables) which have been in -current for
a couple of weeks now.
MFC after: 7 days
internal PHY on the 3COM 3C905B and 3C905C parts, however I've rigged it so
that xlphy (aka exphy) takes precedence for the time being.
If people try this with their xl cards and decide that it's a better choice,
we can switch this later.
This is the PHY used in various iMacs and possibly other GMAC-equipped
Macintoshes with 10/100 PHYs (the ones with 10/100/1000 appear to use brgphy).
Obtained from: NetBSD
- Tell IS_LOCKING_VFS to ignore block and character devices. specfs vnodes
aren't locked for io and they just generate lots of false positives.
- Add newlines to the badlock prints.
we just have to deal with the kstack when told to. We do not have a
UMA-managed cache for the proc struct and its associated upage yet. So,
go back to the old lazy mechanism. Note that if UMA destroys pages that
used to contain proc structures, we'll lose the corresponding upage
forever. (zones never did this - once a page was allocated, it stayed
attached to the proc zone forever)
driver. I tried a few obvious experiments, but was unable to make
the 3c996B-T generate correct UDP checksums for transmitted fragmented
packets. I'm not so sure the device is even capable of it.
This fixes NFS over UDP.
MFC after: 1 day
queue lock (revision 1.33 of vm/vm_page.c removed them).
o Make the free queue lock a spin lock because it's sometimes acquired
inside of a critical section.
These functions are always called on new memory so they can
not already be set up, so don't bother testing for that.
(This was left over from before we used UMA (which is cool))
of the KVA space's size in addition to the amount of physical memory
and reduce it by a factor of two.
Under the old formula, our reservation amounted to one kernel map entry
per virtual page in the KVA space on a 4GB i386.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount-related development easier and helps reduce
the size of vfs_syscalls.c, which is still an enormous file.
Reviewed by: rwatson
Repo-copy by: peter
direct calls for the two places where the kernel calls into soft
updates code. Set up the hooks in softdep_initialize() and NULL
them out in softdep_uninitialize(). This change allows soft updates
to function correctly when ufs is loaded as a module.
Reviewed by: mckusick
module. This adds an ffs_uninit() function that calls ufs_uninit()
and also calls a new softdep_uninitialize() function. Add a stub
for softdep_uninitialize() to cover the non-SOFTUPDATES case.
Reviewed by: mckusick
Add definition of COMPILER_DEPENDENT_INT64 and also
fix definition of COMPILER_DEPENDENT_UINT64.
Pointed-out by: Michael Nottebrock <michaelnottebrock@gmx.net>
are packets queued for transmission.
This driver is strange -- it never sets IFF_OACTIVE, so all
transmissions always cause a call to fxp_start. However, if the
link gets stuck, there was nothing to reset it, so there was still
a possibility of lockups.
MFC after: 3 days
still queued for transmission. This should solve the problem of
the device stalling on transmissions if some link event prevents
transmission.
There are other drivers which have the same problem and need to be
fixed in the same way.
MFC after: 3 days
so that, if we receive an ICMP "time to live exceeded in transit",
(type 11, code 0) for a TCP connection in SYN-SENT state, we close
the connection.
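The decision described here boils down to one predicate. A sketch, with
made-up names for the constants and the state enum (the numeric ICMP type
and code are the ones given above):

```c
enum {
	DEMO_ICMP_TIMXCEED = 11,		/* "time exceeded" */
	DEMO_ICMP_TIMXCEED_INTRANS = 0		/* "...in transit" */
};

enum demo_tcp_state { DEMO_SYN_SENT, DEMO_ESTABLISHED };

/* Nonzero if this ICMP error should close the connection. */
int
drop_on_icmp(int type, int code, enum demo_tcp_state state)
{
	return (type == DEMO_ICMP_TIMXCEED &&
	    code == DEMO_ICMP_TIMXCEED_INTRANS &&
	    state == DEMO_SYN_SENT);
}
```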
MFC after: 2 weeks
and function) with existing configuration choices. Arguably if
ALT_BREAK_TO_DEBUGGER was present, so should have been
BREAK_TO_DEBUGGER. Regardless, it broke the option sort order in
these kernel configuration files.
Requested by: bde
use it is not built by default, and there are currently bugs that
prevent UFS from being unloaded. Nevertheless it can be useful when
developing UFS code on network-booted machines.
turn it off!
I don't know if people think that these debugging macros are worth keeping
or not but I'll keep them for a short while, while the danger of
queue stuffups in the (rather complicated) run queue code exists.
close up the continued line after removing the cast made the line.
space before parentheses in indirect function call.
Add an additional error handler case for the results of callback.
Submitted by: bde
The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
- Statically size the bpvo entries to avoid conflicts between bpvo allocation
and the vm allocator.
- Shift pmap_init2 code into pmap_init.
- Add UMA_ZONE_VM flag to uma_zcreate.
Submitted by: Peter Grehan <peterg@ptree32.com.au>
The case in cpu_switch() where there isn't a higher priority thread
(choosethread() == curthread) uses r4 as the PCB context pointer. However, the
use of r4 after the label L2 is incorrect, since it was probably trashed by
the call to choosethread, and in any case was set up to curthread at the start
of the routine.
This condition will occur when an interrupt thread schedules a netisr, which
is a lower priority thread.
Another (probably unnecessary) difference is that I was paranoid about
register trashing, so I decided to save r2 and r13 as well.
Submitted by: Peter Grehan <peterg@ptree32.com.au>
- Tidy up clock code. Don't repeatedly call hardclock().
- Remove intrnames, decrnest and intrcnt from locore.s
- Coalesce all trap handling into a single stub that then calls a dispatch
function.
Submitted by: Peter Grehan <peterg@ptree32.com.au>
nfs_readlink() calls nfs_bioread() which passes in uio_td as the thread
argument to nfs_getcacheblk(). In nfs_getcacheblk() we dereference the
thread pointer to get a process pointer to pass to nfs_sigintr(). This
obviously results in a panic. :)
Rather than change nfs_getcacheblk() to check if the thread pointer is
NULL when calling nfs_sigintr() like other callers do, change
nfs_sigintr() to take a thread as the last argument instead of a
process so none of the callers have to care if the thread is NULL or not.
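The pattern, sketched with stand-in types (demo_thread, demo_proc and both
function names are hypothetical, and the signal check is reduced to a single
flag): the NULL test lives in one place, inside the callee, rather than in
every caller.

```c
#include <errno.h>
#include <stddef.h>

struct demo_proc {
	int p_sigpending;	/* stand-in for the real signal state */
};

struct demo_thread {
	struct demo_proc *td_proc;
};

/* Takes a thread, which may be NULL, instead of a process. */
int
demo_sigintr(struct demo_thread *td)
{
	if (td == NULL)
		return (0);	/* no thread context: nothing to interrupt */
	return (td->td_proc->p_sigpending ? EINTR : 0);
}

/* A caller can now pass its thread pointer straight through. */
int
demo_getcacheblk(struct demo_thread *td)
{
	return (demo_sigintr(td));
}
```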
- Add vfs_badlock_print to control whether or not we print lock violations
- Add vfs_badlock_panic to control whether we panic on lock violations
Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.
vnode in the case that the target exists and is the same vnode as
the parent (i.e. "mkdir ."). The namei() call does not leave the
vnode locked in this case even though you might expect it to.
This bug was mostly harmless in practice because unlocking an already
unlocked vnode currently does not trigger any panics or warnings.
Reviewed by: jeff
syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B)
Which winds up deleting a different entry from the syncache. Handle
this by not utilizing the next entry in the timer chain until after
syncache_respond() completes. The case of A == B should not be possible.
Problem found by: Don Bowman <don@sandvine.com>
This code makes use of variable-size kernel representation of rules
(exactly the same concept of BPF instructions, as used in the BSDI's
firewall), which makes firewall operation a lot faster, and the
code more readable and easier to extend and debug.
The interface with the rest of the system is unchanged, as witnessed
by this commit. The only extra kernel files that I am touching
are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In
userland I only had to touch those programs which manipulate the
internal representation of firewall rules.
The code is almost entirely new (and I believe I have written the
vast majority of those sections which were taken from the former
ip_fw.c), so rather than modifying the old ip_fw.c I decided to
create a new file, sys/netinet/ip_fw2.c . Same for the user
interface, which is in sbin/ipfw/ipfw2.c (it still compiles to
/sbin/ipfw). The old files are still there, and will be removed
in due time.
I have not renamed the header file because it would have required
making a one-line change to a number of kernel files.
In terms of user interface, the new "ipfw" is supposed to accept
the old syntax for ipfw rules (and produce the same output with
"ipfw show"). Only a couple of the old options (out of some 30 of
them) have not been implemented, but they will be soon.
On the other hand, the new code has some very powerful extensions.
First, you can put "or" connectives between match fields (and soon
also between options), and write things like
ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any
This should make rulesets slightly more compact (and lines longer!),
by condensing 2 or more of the old rules into single ones.
Also, as an example of how easy the rules can be extended, I have
implemented an 'address set' match pattern, where you can specify
an IP address in a format like this:
10.20.30.0/26{18,44,33,22,9}
which will match the set of hosts listed in braces belonging to the
subnet 10.20.30.0/26 . The match is done using a bitmap, so it is
essentially a constant time operation requiring a handful of CPU
instructions (and a very small amount of memory -- for a full /24
subnet, the instruction only consumes 40 bytes).
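A sketch of how such a bitmap match can work, using the example set above.
The structure layout, the function names and the DEMO_IP4 macro are
assumptions for illustration, not the actual instruction format; addresses
are kept in host byte order to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

struct addr_set {
	uint32_t base;		/* network address */
	uint32_t mask;		/* netmask */
	uint32_t bits[8];	/* one bit per host, enough for a /24 */
};

void
addr_set_init(struct addr_set *s, uint32_t net, unsigned int prefixlen)
{
	assert(prefixlen >= 24 && prefixlen <= 32);
	s->mask = (uint32_t)(~(uint32_t)0 << (32 - prefixlen));
	s->base = net & s->mask;
	memset(s->bits, 0, sizeof(s->bits));
}

/* Add a host by its offset within the subnet. */
void
addr_set_add(struct addr_set *s, unsigned int host)
{
	assert(host <= (unsigned int)~s->mask);
	s->bits[host / 32] |= (uint32_t)1 << (host % 32);
}

/* Constant time: one masked compare plus one bit test. */
int
addr_set_match(const struct addr_set *s, uint32_t addr)
{
	uint32_t host;

	if ((addr & s->mask) != s->base)
		return (0);
	host = addr & ~s->mask;
	return ((int)((s->bits[host / 32] >> (host % 32)) & 1u));
}
```

A /24 needs 256 bits, i.e. 32 bytes of bitmap. In this sketch the base and
mask add another 8 bytes, which happens to give the same 40-byte total; the
real instruction presumably spends those bytes on its header and base
address.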
Again, in this commit I have focused on functionality and tried
to minimize changes to the other parts of the system. Some performance
improvement can be achieved with minor changes to the interface of
ip_fw_chk_t. This will be done later when this code is settled.
The code is meant to compile unmodified on RELENG_4 (once the
PACKET_TAG_* changes have been merged), for this reason
you will see #ifdef __FreeBSD_version in a couple of places.
This should minimize errors when (hopefully soon) it will be time
to do the MFC.
calibrated. This fixes the problem where playback and recording do
not run at the correct speed. It probably also eliminates the
need for the hacks/workarounds/sysctl's that were previously
devised to deal with this, but I will leave that for a different
time.
Reviewed by: orion
bridges in modern hardware (that hardware w/ lots of RAM). Raise the
address from 0x44000000 to 0x88000000 to match what we do with
NEWCARD. However, this really should be done in the pci layer.
passed down the VFS stack. While I'm here, replace a '0' with a 'NULL'
to make the code more readable.
Sponsored by: DARPA, NAI Labs
Obtained from: TrustedBSD Project
(from: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832566+0+ \
current/freebsd-current)
"Too many pages were prefaulted in pmap_object_init_pt, thus
the wrong physical page was entered in the pmap for the virtual
address where the .dynamic section was supposed to be."
Submitted by: tegge
Approved by: tegge's patches never fail
types are not required, as the overhead is unnecessary:
o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.
imposed by the filesystem structure itself remains. With 16k blocks,
the maximum file size is now just over 128TB.
For now, the UFS1 file size limit is left unchanged so as to remain
consistent with RELENG_4, but it too could be removed in the future.
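The "just over 128TB" figure can be checked with a few lines of arithmetic.
This assumes the usual inode layout of 12 direct block pointers plus single,
double and triple indirect blocks, and 8-byte block pointers; those
parameters are assumptions of the sketch and are not stated above.

```c
#include <stdint.h>

/* Largest file, in bytes, addressable through the inode's block pointers. */
uint64_t
max_file_bytes(uint64_t bsize, uint64_t ptrsize)
{
	uint64_t nindir = bsize / ptrsize;	/* pointers per indirect block */
	uint64_t blocks;

	blocks = 12;				/* direct */
	blocks += nindir;			/* single indirect */
	blocks += nindir * nindir;		/* double indirect */
	blocks += nindir * nindir * nindir;	/* triple indirect */
	return (blocks * bsize);
}
```

With 16k blocks there are 2048 pointers per indirect block, so the triple
indirect level alone covers 2048^3 * 16k = 2^47 bytes = 128TB; the other
levels add roughly 64GB on top.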
Reviewed by: mckusick
there to protect fdrop() (which in turn can call vrele()), however,
fdrop_locked() grabs Giant for us, so we do not have to.
Reviewed by: jhb
Inspired by: alc
MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.
ti.4: Update the ti(4) man page to include information on the
TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
and also include information about the new character
device interface and the associated ioctls.
man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated
links.
jumbo.9: New man page describing the jumbo buffer allocator
interface and operation.
zero_copy.9: New man page describing the general characteristics of
the zero copy send and receive code, and what an
application author should do to take advantage of the
zero copy functionality.
NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
conf/files: Add uipc_jumbo.c and uipc_cow.c.
conf/options: Add the 5 options mentioned above.
kern_subr.c: Receive side zero copy implementation. This takes
"disposable" pages attached to an mbuf, gives them to
a user process, and then recycles the user's page.
This is only active when ZERO_COPY_SOCKETS is turned on
and the kern.ipc.zero_copy.receive sysctl variable is
set to 1.
uipc_cow.c: Send side zero copy functions. Takes a page written
by the user and maps it copy on write and assigns it
kernel virtual address space. Removes copy on write
mapping once the buffer has been freed by the network
stack.
uipc_jumbo.c: Jumbo disposable page allocator code. This allocates
(optionally) disposable pages for network drivers that
want to give the user the option of doing zero copy
receive.
uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are
enabled if ZERO_COPY_SOCKETS is turned on.
Add zero copy send support to sosend() -- pages get
mapped into the kernel instead of getting copied if
they meet size and alignment restrictions.
uipc_syscalls.c: Un-staticize some of the sf* functions so that they
can be used elsewhere. (uipc_cow.c)
if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
calling malloc() with M_WAITOK. Return an error if
the M_NOWAIT malloc fails.
The ti(4) driver and the wi(4) driver, at least, call
this with a mutex held. This causes witness warnings
for 'ifconfig -a' with a wi(4) or ti(4) board in the
system. (I've only verified for ti(4)).
ip_output.c: Fragment large datagrams so that each segment contains
a multiple of PAGE_SIZE amount of data plus headers.
This allows the receiver to potentially do page
flipping on receives.
if_ti.c: Add zero copy receive support to the ti(4) driver. If
TI_PRIVATE_JUMBOS is not defined, it now uses the
jumbo(9) buffer allocator for jumbo receive buffers.
Add a new character device interface for the ti(4)
driver for the new debugging interface. This allows
(a patched version of) gdb to talk to the Tigon board
and debug the firmware. There are also a few additional
debugging ioctls available through this interface.
Add header splitting support to the ti(4) driver.
Tweak some of the default interrupt coalescing
parameters to more useful defaults.
Add hooks for supporting transmit flow control, but
leave it turned off with a comment describing why it
is turned off.
if_tireg.h: Change the firmware rev to 12.4.11, since we're really
at 12.4.11 plus fixes from 12.4.13.
Add defines needed for debugging.
Remove the ti_stats structure, it is now defined in
sys/tiio.h.
ti_fw.h: 12.4.11 firmware.
ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13,
and my header splitting patches. Revision 12.4.13
doesn't handle 10/100 negotiation properly. (This
firmware is the same as what was in the tree previously,
with the addition of header splitting support.)
sys/jumbo.h: Jumbo buffer allocator interface.
sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to
indicate that the payload buffer can be thrown away /
flipped to a userland process.
socketvar.h: Add prototype for socow_setup.
tiio.h: ioctl interface to the character portion of the ti(4)
driver, plus associated structure/type definitions.
uio.h: Change prototype for uiomoveco() so that we'll know
whether the source page is disposable.
ufs_readwrite.c: Update for new prototype of uiomoveco().
vm_fault.c: In vm_fault(), check to see whether we need to do a page
based copy on write fault.
vm_object.c: Add a new function, vm_object_allocate_wait(). This
does the same thing that vm_object_allocate() does, except
that it gives the caller the opportunity to specify whether
it should wait on the uma_zalloc() of the object structure.
This allows vm objects to be allocated while holding a
mutex. (Without generating WITNESS warnings.)
vm_object_allocate() is implemented as a call to
vm_object_allocate_wait() with the malloc flag set to
M_WAITOK.
vm_object.h: Add prototype for vm_object_allocate_wait().
vm_page.c: Add page-based copy on write setup, clear and fault
routines.
vm_page.h: Add page based COW function prototypes and variable in
the vm_page structure.
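The vm_object.c entry above describes a common wrapper pattern. A userland
sketch of it, with hypothetical demo_* names and flag values and malloc(3)
standing in for the zone allocator (unlike a kernel M_WAITOK allocation,
malloc(3) can still fail, so the sketch checks for NULL in both cases):

```c
#include <stdlib.h>

#define DEMO_NOWAIT	0x1	/* hypothetical flag values */
#define DEMO_WAITOK	0x2

struct demo_object {
	unsigned long size;
};

int demo_fail_nowait;	/* knob to simulate an allocation that would block */

/* The new entry point: the caller says whether sleeping is acceptable. */
struct demo_object *
demo_object_allocate_wait(unsigned long size, int flags)
{
	struct demo_object *obj;

	if ((flags & DEMO_NOWAIT) != 0 && demo_fail_nowait)
		return (NULL);	/* would have had to sleep */
	obj = malloc(sizeof(*obj));
	if (obj == NULL)
		return (NULL);
	obj->size = size;
	return (obj);
}

/* The old entry point keeps its behaviour by always allowing the wait. */
struct demo_object *
demo_object_allocate(unsigned long size)
{
	return (demo_object_allocate_wait(size, DEMO_WAITOK));
}
```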
Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
a new resource limit that covers a process's entire VM space, including
mmap()'d space.
(Part II will be additional code to check RLIMIT_VMEM during exec() but it
needs more fleshing out).
PR: kern/18209
Submitted by: Andrey Alekseyev <uitm@zenon.net>, Dmitry Kim <jason@nichego.net>
MFC after: 7 days
64-bit file sizes. This step simply addresses the remaining overflows,
and does not attempt to optimise performance. The details are:
o Use a 64-bit type for the vm_object `size' and the size argument
to vm_object_allocate().
o Use the correct type for index variables in dev_pager_getpages(),
vm_object_page_clean() and vm_object_page_remove().
o Avoid an overflow in the i386 pmap_object_init_pt().
the indirection operator ('*') and address examination ('x/a') on
big-endian platforms for which the above is not true, as well as on
little-endian platforms if the cut-off bits are not 0.
installed with pmap_kenter_flags, since the physical addresses may not
have an associated vm_page. Add a function to do this.
Tested by: Tomi Vainio <Tomi.Vainio@Sun.COM>
up when operating in PCI-X mode. For some received packets there is
data corruption in the first few bytes in that case. Aligning the
packet buffer eliminates the corruption. With this fix, the code
that offsets the packet buffer up by 2 bytes to align the payload is
disabled for BCM5701s operating in PCI-X mode. On the i386, which
permits unaligned accesses, the payload is left unaligned. On other
platforms, the packet is copied after reception to force alignment
of the payload. Obviously, this work-around reduces performance in
those cases (BCM5701 plus PCI-X) where it is in effect.
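The alignment trade-off is simple arithmetic. Assuming a 14-byte Ethernet
header and a 4-byte alignment requirement for the payload (constant and
function names below are made up for the sketch):

```c
#include <stdint.h>

#define DEMO_ETHER_HDR_LEN	14	/* dst + src + ethertype */
#define DEMO_ETHER_ALIGN	2	/* the usual 2-byte buffer offset */

/* Bytes by which the payload misses 4-byte alignment (0 = aligned). */
unsigned int
payload_misalignment(uintptr_t buf_addr, unsigned int offset)
{
	return ((unsigned int)((buf_addr + offset + DEMO_ETHER_HDR_LEN) % 4));
}
```

With the 2-byte offset the payload starts at buffer + 16 and is aligned;
with the offset disabled, as for the BCM5701 in PCI-X mode, the buffer is
aligned but the payload starts at buffer + 14 and is off by 2.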
MFC after: 3 days
sysctl (machdep.cpu_idle_hlt) to off in the SMP case. This allows you to
turn it on if you wish and do not particularly care about the small window
where a cpu will remain halted even when a job is placed on the run queue
(until the next clock tick).
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.
otherwise we might get interrupts and are unable to
handle them properly, which results in a page fault.
PR: kern/39549
Submitted by: Gil Kloepfer <gil@arlut.utexas.edu>
Add a comment so that people don't forget to keep the
version in src/lib/libmd/md5c.c in sync with this one.
This fixes a warning on sparc64.
Reviewed by: phk
request. We need to eat the MAC address of the packet before we go
looking at the SSID and such. Doing so is sufficient to make Cisco
cards associate with prism II cards.
The submitter says that Linux does the same thing.
Submitted by: jhay