- rev.1.42 of ffs_readwrite.c added a special case in ffs_read() for reads
that are initially at EOF, and rev.1.62 of ufs_readwrite.c fixed
timestamp bugs in it. Removal of most of vfs_ioopt made it just and
optimization, and removal of the vm object reference calls made it less
than an optimization. It was cloned in rev.1.94 of ufs_readwrite.c as
part of cloning ffs_extwrite() although it was always less than an
optimization in ffs_extwrite().
- some comments, compound statements and vertical whitespace were vestiges
of dead code.
- culled long-dead #define's
- segment register defs moved to sr.h
- NPMAPS moved to pmap.h
- KERNBASE moved to vmparam.h
- removed include of <machine/cpu.h> and fixed src files that
relied on this.
Modifying segment register code no longer causes gcc rebuilds :-)
This is the first of two commits; bringing in the kernel support first.
This can be enabled by compiling a kernel with options TCP_SIGNATURE
and FAST_IPSEC.
For the uninitiated, this is a TCP option which provides for a means of
authenticating TCP sessions which came into being before IPSEC. It is
still relevant today, however, as it is used by many commercial router
vendors, particularly with BGP, and as such has become a requirement for
interconnect at many major Internet points of presence.
Several parts of the TCP and IP headers, including the segment payload,
are digested with MD5, including a shared secret. The PF_KEY interface
is used to manage the secrets using security associations in the SADB.
There is a limitation here in that as there is no way to map a TCP flow
per-port back to an SPI without polluting tcpcb or using the SPD; the
code to do the latter is unstable at this time. Therefore this code only
supports per-host keying granularity.
Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6),
TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective
users of this feature, this will not pose any problem.
This implementation is output-only; that is, the option is honoured when
responding to a host initiating a TCP session, but no effort is made
[yet] to authenticate inbound traffic. This is, however, sufficient to
interwork with Cisco equipment.
Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with
local patches. Patches for tcpdump to validate TCP-MD5 sessions are also
available from me upon request.
Sponsored by: sentex.net
systems define power/sleep buttons in both places but only deliver
notifies to the ones defined in the AML.
Also, reduce length of various function handler names.
PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
pci-hi/med/lo + node 'interrupts' property. This worked by
accident until recent notebooks required correct operation.
Tested by: Suleiman Souhlal <refugee@segfaulted.com>
routines to do anything except return error if the miniport adapter context
is not set (meaning we either having init'ed the driver yet, or the
initialization failed).
Also, be sure to NULL out the adapter context along with the
miniport characteristics pointers if calling the MiniportInitialize()
method fails.
the added comment for low-level details.) The effect of this race
condition is a panic "vm_page_cache: caching a dirty page, ..."
Reviewed by: tegge
MFC after: 7 days
created with the same name, and vice versa:
- Immediately recycle vnodes of files & directories that have been deleted
or renamed.
- When looking an entry in the VFS name cache or smbfs's private
cache, make sure the vnode type is consistent with the type of file
the server thinks it is, and re-create the vnode if it isn't.
The alternative to this is to recycle vnodes unconditionally when their
use count drops to 0, but this would make all the caching we do
mostly useless.
PR: 62342
MFC after: 2 weeks
- Factor out common settings and put them in an upper level Makefile.inc.
- Properly use PROG for real programs, not their products.
- Further reduce diffs to i386 versions.
Tested on: sparc64 (panther)
- Now that bsd.prog.mk deals with programs linked with -nostdlib
better, and has a notion of an "internal" program, use PROG
where possible. This has a good impact on the contents of
.depend files and causes programs to be linked with cc(1).
XXX: boot2 couldn't be converted as it's actually two programs.
Tested on: i386, amd64
the "disappearing subdisks" problem when new subdisks can't be created
due to some errors.
This is in fact an ugly hack, but a more elegant solution would probably
require a redesign of vinum in several places.
Approved by: joerg (mentor)
mindful of blocking on disk I/O and instead return EBUSY when such
blocking would occur.
Results from the DeBox project indicate that blocking on disk I/O
can slow the performance of a kqueue/poll based webserver. Using
a flag such as SF_NODISKIO and throwing connections that would block
to helper processes/threads helped increase performance.
Currently, only the Flash webserver uses this flag, although it could
probably be applied to thttpd with relative ease.
Idea by: Yaoping Ruan & Vivek Pai
nodes, or if they did, they're now locked away on the Kurt Gdel
memorial home for the numerically confused:
Don't cast a kernel pointer (from makedev(9)) to an integer (maj+minor combo).
on the card, unmap it first. This allows it to be picked up properly when
the queue gets kicked again. This was the root problem for the lost command
(i.e. stuck in getblk/vinvalb) problem. While here, panic if commands don't
map correctly instead of just silently ignoring the problem and dropping
command. Also slow down the dynamic allocation of new commands.
It should be safe to go back into the aac waters. Thanks to everyone who
suffered through this and provided good feedback.
802.11b chipset work. This chip is present on the SMC2602W version 3
NIC, which is what was used for testing. This driver creates kernel
threads (12 of them!) for various purposes, and required the following
routines:
PsCreateSystemThread()
PsTerminateSystemThread()
KeInitializeEvent()
KeSetEvent()
KeResetEvent()
KeInitializeMutex()
KeReleaseMutex()
KeWaitForSingleObject()
KeWaitForMultipleObjects()
IoGetDeviceProperty()
and several more. Also, this driver abuses the fact that NDIS events
and timers are actually Windows events and timers, and uses NDIS events
with KeWaitForSingleObject(). The NDIS event routines have been rewritten
to interface with the ntoskrnl module. Many routines with incorrect
prototypes have been cleaned up.
Also, this driver puts jobs on the NDIS taskqueue (via NdisScheduleWorkItem())
which block on events, and this interferes with the operation of
NdisMAllocateSharedMemoryAsync(), which was also being put on the
NDIS taskqueue. To avoid the deadlock, NdisMAllocateSharedMemoryAsync()
is now performed in the NDIS SWI thread instead.
There's still room for some cleanups here, and I really should implement
KeInitializeTimer() and friends.
seems to work well in RELENG_4. However, 5.X locking foo means that I'll
have to do some quick redesign.
Add ioctl handlers for ISP_GETROLE and ISP_SETROLE ioctls.
handling of resources shortages. The driver is now so fast that it can
completely fill all 512 slots on the card, but for some reason only 511
slots are being allocated. Anything that tries to go into the 512th
slot gets silently lost. Both bugs need to be fixed at a later date,
but this should fix the reports of hangs in getblk and vinvalb.
edge cases in the loop.
- Try to grab a command before dequeueing the bio from the bioq. The old
behaviour of requeuing deferred bios to the end of the bioq is arguably
wrong. This should be fixed in the future to check the bioq head without
automatically dequeueing the bio.
- do not use PROG for what's not a real C program,
- use sys.mk transformation rules where possible,
- only create the "machine" symlink on AMD64,
- removed MAINTAINER lines in individual makefiles,
- added the LIBSTAND defitinion to <bsd.libnames.mk>,
- somewhat better contents in .depend files.
Tested on: i386, amd64
Prodded by: bde
to catch are already nicely caught by trapping the null pointer derefs.
Remove no-longer-used noswitch/nothrow strings. They were referenced
by the stub cpu_switch() etc functions before they were implemented.
Try something a little different for the lock prefixes.
Prompted by: bde (the first two items anyway)
RLIM_INFINITY case for ogetrlimit().
- Use %jd and intmax_t to output negative time in usec in calcru().
- Rework getrusage() to make a copy of the rusage struct into a local
variable while holding Giant and then do the copyout from the local
variable to avoid having to have the original process rusage struct
locked while doing the copyout (which would not be safe). This also
includes a few style fixes from Bruce to getrusage().
Submitted by: bde (1, parts of 3)
Suggested by: bde (2)
delete it each time its run and have it regenerated each time by make.
I used a quick hackish script rather than putting it in the files file
and used the before-depend rule to avoid the depend/no-depend hacks.
failed, the reference count for the virtual memory object referenced
by the specified shared memory segment would have been erroneously
incremented.
Reported by: Joost Pol <joost@pine.nl>
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64
TIOCMSET before FreeBSD existed and have never been implemented by any
FreeBSD serial driver (not even as aliases).
Moved the TIOCM bit definitions to be with TIOCMGET.
Added more comments gaps between ioctl numbers. There are now a large
number of conflicts with ppp, slip, tap and tun ioctls (especially ppp
ones) from closing gaps that weren't there. This mainly breaks decoding
of ioctl numbers in kdump, but there are some serious conflicts where
the interpretation of a tty ioctl depends on the line discipline.
aic79xx.seq:
Convert the COMPLETE_DMA_SCB list to an "stailq". This allows us to
safely keep the SCB that is currently being DMA'ed back the host on
the head of the list while processing completions off of the bus. The
newly completed SCBs are appended to the tail of the queue. In the
past, we just dequeued the SCB that was in flight from the list, but
this could result in a lost completion should the host perform certain
types of error recovery that must cancel all in-flight SCB DMA operations.
Switch from using a 16bit completion entry, holding just the tag and the
completion valid bit, to a 64bit completion entry that also contains a
"status packet valid" indicator. This solves two problems:
o The SCB DMA engine on at least Rev B. silicon does not properly deal
with a PCI disconnect that occurs at a non-64bit aligned offset in the
chips "source buffer". When the transfer is resumed, the DMA engine
continues at the correct offset, but may wrap to the head of the buffer
causing duplicate completions to be reported to the host. By using a
completion buffer in host memory that is 64bit aligned and using 64bit
completion entries, such disconnects should only occur at aligned addresses.
This assumes that the host bridge will only disconnect on cache-line
boundaries and that cache-lines are multpiles of 64bits.
o By embedding the status information in the completion entry we can avoid
an extra memory reference to the HSCB for commands that complete without
error.
Use the comparison of a "host freeze count" and a "sequencer freeze count"
to allow the host to process most SCBs that complete with non-zero status
without having to clear critical sections. Instead the host can just pause the
sequencer, performs any necessary cleanup in the waiting for selection list,
increments its freeze count on the controller, and unpauses. This is only
possible because the sequencer defers completions of SCBs with bad status
until after all pending selections have completed. The sequencer then avoids
referencing any data structures the host may touch during completion of the
SCB until the freeze counts match.
aic79xx.c:
Change the strategy for allocating our sentinal HSCB for the QINFIFO. In
the past, this allocation was tacked onto the QOUTFIFO allocation. Now that
the qoutfifo has grown to accomodate larger completion entries, the old
approach will result in a 64byte allocation that costs an extra page of
coherent memory. We now do this extra allocation via ahd_alloc_scbs()
where the "unused space" can be used to allocate "normal" HSCBs.
In our packetized busfree handler, use the ENSELO bit to differentiate
between packetized and non-packetized unexpected busfree events that
occur just after selection, but before the sequencer has had the oportunity
to service the selection.
When cleaning out the waiting for selection list, use the SCSI mode
instead of the command channel mode. The SCB pointer in the command
channel mode may be referenced by the SCB dma engine even while the
sequencer is paused, whereas the SCSI mode SCB pointer is only accessed
by the sequencer.
Print the "complete on qfreeze" sequencer SCB completion list in
ahd_dump_card_state(). This list holds all SCB completions that are deferred
until a pending select-out qfreeze event has taken effect.
aic79xx.h:
Add definitions and structures to handle the new SCB completion scheme.
Add a controller flag that indicates if the controller is in HostRAID
mode.
aic79xx.reg:
Remove macros used for toggling from one data fifo mode to the other.
They have not been in use for some time.
Add scratch ram fields for our new qfreeze count scheme, converting
the complete dma list into an "stailq", and providing for the "complete
on qfreeze" SCB completion list. Some other fields were moved to retain
proper field alignment (alignment >= field size in bytes).
aic79xx.seq:
Add code to our idle loop to:
o Process deferred completions once a qfreeze event has taken full
effect.
o Thaw the queue once the sequencer and host qfreeze counts match.
Generate 64bit completion entries passing the SCB_SGPTR field as the
"good status" indicator. The first bit in this field is only set if
we have a valid status packet to send to the host.
Convert the COMPLETE_DMA_SCB list to an "stailq".
When using "setjmp" to register an idle loop handler, do not combine
the "ret" with the block move to pop the stack address in the same
instruction. At least on the A, this results in a return to the setjmp
caller, not to the new address at the top of the stack. Since we want
the latter (we want the newly registered handler to only be invoked from
the idle loop), we must use a separate ret instruction.
Add a few missing critical sections.
Close a race condition that can occur on Rev A. silicon. If both FIFOs
happen to be allocated before the sequencer has a chance to service the
FIFO that was allocated first, we must take special care to service the
FIFO that is not active on the SCSI bus first. This guarantees that a
FIFO will be freed to handle any snapshot requests for the FIFO that is
still on the bus. Chosing the incorrect FIFO will result in deadlock.
Update comments.
aic79xx_inline.h
Correct the offset calculation for the syncing of our qoutfifo.
Update ahd_check_cmdcmpltqueues() for the larger completion entries.
aic79xx_pci.c:
Attach to HostRAID controllers by default. In the future I may add a
sysctl to modify the behavior, but since FreeBSD does not have any
HostRAID drivers, failing to attach just results in more email and
bug reports for the author.
MFC After: 1week
namespace pollution in other headers. <sys/types.h> is now the only
prerequisite for <sys/sx.h>.
Fixed some style bugs:
- removed bogus LOCORE ifdef. Including this C header in assembler
sources is just nonsense.
- removed unused include of <sys/_mutex.h>. It finished rotting when
the mutex in struct sx became indirect in rev.1.15.
- removed most comments on #else and #endif's and cleaned up the others.
All were misindented...
- add an option for the output device in the hope that this can
be made non-blocking at some stage.
- define an alias for the disk device, required by dev/ofw/ofw_disk.c
- shift iobus to 0x9000000 so as not to clash with the OpenFirmware
entry point of 0x8000400 when address decoding.
- down-tone comments about the disk dev config :-)
using the direct-mapping of physmem to force PTE data structures
to be physically addressable so the interrupt-time real-mode
DSI trap handler could perform PTE spills. However, the memory
may have been > 256Mb, which would have caused a BAT spill and
double-interrupt.
The new trap code no longer handles PTE spills, so the requirement
that these pages be direct-mapped no longer applies. The irony is
UMA_MD_SMALL_ALLOC will return direct mappings for these structs :-)
- remove unused 601 and tlb exception code
- remove interrupt-time PTE spill code. The pmap code
will now take care of pinning kernel PTEs, and there
are no longer issues about physical mapping of PTE
data structures
- All segment registers are switched on kernel entry/exit,
allowing the kernel to have more virtual space and for
user virtual space to extend to 4G.
- The temporary register save area has been shifted from
unused exception vector space to the per-cpu data area.
This allows interrupts to be delivered to multiple CPUs
- ISI traps no longer spill to BAT tables. It is assumed
that all of kernel instruction memory is pinned.
- shift from 'ldmw/stmw' instructions to individual register
loads/stores when saving context. All PPC manuals indicate
this should be much faster.
- use '%r' for register names throughout.
TODO: need to test if DSI traps were the result of kernel stack
guard-page hits.
Reworked from: NetBSD
- Rename temporary variable names ("tmp", "tmp2") to more informative
names ("load", "pctcpu", "rss", ...)
- Unclutter indentation and return paths: rather than lots of nested
ifs, simply return earlier if it's not going to work out. Simplify
general structure and avoid "deep" code.
- Comment on the thread/process selection and locking.
- Correct handling of "running"/"runnable" states, avoid "unknown"
that people were seeing for running processes. This was due to
a misunderstanding of the more complex state machine / inhibitors
behavior of KSE.
- Do perform ttyinfo() printing on KSE (P_SA) processes, it seems
generally to work.
While I initially attempted to formulate this as two commits (one
layout, the other content), I concluded that the layout changes were
really structural changes.
Many elements submitted by: bde
Since we have a worker thread now, we can actually do the allocation
asynchronously in that thread's context. Also, we need to return a
status value: if we're unable to queue up the async allocation, we
return NDIS_STATUS_FAILURE, otherwise we return NDIS_STATUS_PENDING
to indicate the allocation has been queued and will occur later.
This replaces the kludge where we just invoked the callback routine
right away in the current context.
The basic process is to send a routing socket announcement that the
interface has departed, change if_xname, update the sockaddr_dl
associated with the interface, and announce the arrival of the interface
on the routing socket.
As part of this change, ifunit() is greatly simplified by testing
if_xname directly. if_clone_destroy() now uses if_dname to look up the
cloner for the interface and if_dunit to identify the unit number.
Reviewed by: ru, sam (concept)
Vincent Jardin <vjardin AT free.fr>
Max Laier <max AT love2party.net>
that Asus provides on its CDs has both a MiniportSend() routine
and a MiniportSendPackets() function. The Microsoft NDIS docs say
that if a driver has both, only the MiniportSendPackets() routine
will be used. Although I think I implemented the support correctly,
calling the MiniportSend() routine seems to result in no packets going
out on the air, even though no error status is returned. The
MiniportSendPackets() function does work though, so at least in
this case it doesn't matter.
In if_ndis.c:ndis_getstate_80211(), if ndis_get_assoc() returns
an error, don't bother trying to obtain any other state since the
calls may fail, or worse cause the underlying driver to crash.
(The above two changes make the Asus-supplied Centrino work.)
Also, when calling the OID_802_11_CONFIGURATION OID, remember
to initialize the structure lengths correctly.
In subr_ndis.c:ndis_open_file(), set the current working directory
to rootvnode if we're in a thread that doesn't have a current
working directory set.
machdep.c fixed the missing early initialization of curpcb, so curpcb
is now always set together with curthread and it cannot be NULL except
before the IDT has been set up (so trap() is unreachable) or after a
memory error. In any case, it was often used without checking.
curcpb shouldn't exist anyway. It doesn't exist for most non-i386 arches.
It just caches curthread->td_pcb in a global. This was a better idea
before it was per-cpu. trap() and some other places can get at it more
efficiently using td->td_pcb instead of PCPU_GET(curpcb). The main
exception is support.s which mostly wants only curpcb->pcb_onfault.
instead, just dec/inc in the ctor/dtor. For now, increment/decrement
in two's, since we're now performing the operation once per pair,
not once per pipe. Not really any measurable performance change
in my micro-benchmarks, but doing less work is good, especially when
it comes to atomic operations.
Suggested by: alc
it is still above the critical temperature on the next poll cycle. This
is a 10 second advance notice by default. Document the private
(non-standard) notify we will be using with devd(8).
changes to jointly allocated pipe pairs. Replace these checks
with pipe_present checks. This avoids a NULL pointer dereference
when a pipe is half-closed.
Submitted by: Peter Edwards <peter.edwards@openet-telecom.com>
used for the ICMP reply source in reponse to packets which are not
directly addressed to us. By default continue with with normal
source selection.
Reviewed by: bms
1. Root from inside a jail was able to unmount any file system
(except /).
2. Unprivileged root was able to unmount file systems mounted by
privileged root (execpt /).
3. User from inside a jail was able to mount file system when
sysctl vfs.usermount was set to 1.
4. User was able to mount file system when vfs.usermount was set to 1
(that's ok) and unmount it even if vfs.usermount was equal to 0
(that's not correct).
Possibility from point 1 was reported by: Dariusz Kowalski <darek@76.pl>
Only a part of this fix will be MFC'ed (if approved).
PR: kern/60149
Reviewed by: rwatson
Approved by: scottl (mentor)
MFC after: 3 days
the system. Also, decrease the poll interval to 10 seconds from 30
seconds. This is needed because some systems will report an invalid high
temperature for one poll cycle. It is suspected this is due to the
embedded controller timing out. A typical value is 138C for one cycle on a
system that is otherwise 65C. This prevents the system from prematurely
shutting down after one invalid reading. It will still shut down after 30
seconds of high temperature, which is the same as previous default
behavior.
Tested by: Scott Lambert <lambert AT lambertfam.org>
is for an 802.11 device or not. At least one driver I have does not
support the OID_802_11_NETWORK_TYPES_SUPPORTED OID.
Also, for now, don't do anything special in the ndis_suspend() method.
I originally wanted to shut down the NIC but leave the IFF_UP flag alone
since technically the interface is meant to remain up, but an interrupt
may be delivered to the ISR on suspend, and if this happens while the
NIC is halted, we will crash, since none of the miniport driver methods
will function.
This needs to be dealt with properly later, but for now this prevents
a panic, and the resume method properly re-inits the NIC.
the thread that calls pmap_pte_quick() and by virtue of the page queues
lock being held, we can manage PADDR1/PMAP1 as a CPU private mapping.
The most common effect of this change is to reduce the overhead of the page
daemon on multiprocessors.
In collaboration with: tegge
packet along with data, instead of in their own packet. When serving files
of size (packetsize - headersize) or smaller, this will result in one less
packet crossing the network. Quick testing with thttpd and http_load has
shown a noticeable performance improvement in this case (350 vs 330 fetches
per second.)
Included in this commit are two support routines, iov_to_uio, and m_uiotombuf;
these routines are used by sendfile to construct the header mbuf chain that
will be linked to the rest of the data in the socket buffer.
sense with sched_4bsd as it does with sched_ule.
- Use P_NOLOAD instead of the absence of td->td_ithd to determine whether or
not a thread should be accounted for in sched_tdcnt.
when uma_reclaim() was called. This was introduced when the zone
working-set algorithm was removed in favor of using the per cpu caches
as the working set.
would allocate two 'struct pipe's from the pipe zone, and malloc a
mutex.
- Create a new "struct pipepair" object holding the two 'struct
pipe' instances, struct mutex, and struct label reference. Pipe
structures now have a back-pointer to the pipe pair, and a
'pipe_present' flag to indicate whether the half has been
closed.
- Perform mutex init/destroy in zone init/destroy, avoiding
reallocating the mutex for each pipe. Perform most pipe structure
setup in zone constructor.
- VM memory mappings for pageable buffers are still done outside of
the UMA zone.
- Change MAC API to speak 'struct pipepair' instead of 'struct pipe',
update many policies. MAC labels are also handled outside of the
UMA zone for now. Label-only policy modules don't have to be
recompiled, but if a module is recompiled, its pipe entry points
will need to be updated. If a module actually reached into the
pipe structures (unlikely), that would also need to be modified.
These changes substantially simplify failure handling in the pipe
code as there are many fewer possible failure modes.
On half-close, pipes no longer free the 'struct pipe' for the closed
half until a full-close takes place. However, VM mapped buffers
are still released on half-close.
Some code refactoring is now possible to clean up some of the back
references, etc; this patch attempts not to change the structure
of most of the pipe implementation, only allocation/free code
paths, so as to avoid introducing bugs (hopefully).
This cuts about 8%-9% off the cost of sequential pipe allocation
and free in system call tests on UP and SMP in my micro-benchmarks.
May or may not make a difference in macro-benchmarks, but doing
less work is good.
Reviewed by: juli, tjr
Testing help: dwhite, fenestro, scottl, et al