58533 Commits

Author SHA1 Message Date
glebius
cea41af9a0 Add a tunable, net.inet.tcp.maxtcptw, that allows setting a limit
on the tcptw zone independently of the limit on the socket zone.
2006-04-04 14:31:37 +00:00
rwatson
2e3d21db7b Before dereferencing intotw() when INP_TIMEWAIT, check for inp_ppcb being
NULL.  We currently do allow this to happen, but may want to remove that
possibility in the future.  This case can occur when a socket is left
open after TCP wraps up, and the timewait state is recycled.  This will
be cleaned up in the future.

Found by:	Kazuaki Oda <kaakun at highway dot ne dot jp>
MFC after:	3 months
2006-04-04 12:26:07 +00:00
dd
264a13426d Remove unused variables s and error in key_detach. The previous
revision removed their usage but did not remove the declaration. This
caused a warning in my build, which was fatal with -Werror.
2006-04-04 10:11:15 +00:00
jeff
275c043cbe - VFS_LOCK_GIANT when recycling a vnode via getnewvnode. We may be
   recycling for an unrelated filesystem.  I really don't like potentially
   acquiring giant in the context of a giantless filesystem but there
   are reasonable objections to removing the recycling from this path.

Sponsored by:	Isilon Systems, Inc.
2006-04-04 06:46:10 +00:00
jeff
6862688995 - Properly check against B_DELWRI and B_NEEDSGIANT. This check was
   incorrectly written and caused some !NEEDSGIANT buffers to be put in
   the NEEDSGIANT queue.

Sponsored by:	Isilon Systems, Inc.
2006-04-04 06:44:21 +00:00
gnn
60609380bc Remove unintended DEBUG flag setting.
2006-04-04 03:12:21 +00:00
marcel
78f0584b0b Sync with i386: Map exceptions to signals in gdb_cpu_signal() so
that kgdb(1) gets a SIGTRAP when it needs to.

Pointed out by: grehan@
2006-04-04 03:00:20 +00:00
davidxu
2cacffb02b WARNS level 4 cleanup; more work remains.
2006-04-04 02:57:09 +00:00
marcel
dc8b7dcaa1 The PC is register 16, not 18.
Pointed out by: grehan@
2006-04-04 02:44:51 +00:00
ps
5d986f2c0f Add support for Intel CPU models 5 & 6.
Approved by:	jkoshy
2006-04-04 02:36:04 +00:00
jkoshy
33b6f0c5ea Freshen a comment.
Reviewed by:	jhb
2006-04-04 02:26:45 +00:00
njl
558b7687d6 Fix an off-by-one error in the port range detection. Clean up some old
whitespace.
2006-04-04 02:22:38 +00:00
marcel
b36d718171 In z8530_divisor() return 0 if the calculated divisor is less than 0.
This happens when the baudrate is too high for the given RCLK.
2006-04-04 01:16:16 +00:00
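The clamping described in b36d718171 can be sketched as follows. This is only an illustration: the function name sketch_divisor() is hypothetical, and the formula (rclk / (2 * baudrate) - 2) is an assumed time-constant form, not the driver's actual calculation. The point is that a baud rate too high for the given RCLK yields a negative intermediate value, which is reported as 0.

```c
#include <assert.h>

/*
 * Hypothetical divisor calculation.  The formula below is an assumed
 * time-constant form and is not copied from the z8530 driver.
 */
static int
sketch_divisor(long rclk, long baudrate)
{
	long divisor;

	if (baudrate <= 0)
		return (0);	/* assumption: treat as "no valid divisor" */
	divisor = rclk / (2 * baudrate) - 2;
	if (divisor < 0)
		return (0);	/* baud rate too high for this RCLK */
	return ((int)divisor);
}
```

With an RCLK of 307200, a baud rate of 9600 gives 14 under this assumed formula, while 460800 would go negative and is returned as 0.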
marcel
dde58fbc0d Increment kdb_active after we stopped the other CPUs and decrement
kdb_active before we restart them. This avoids false positives on
restarted CPUs when they test for kdb_active while kdb_trap() is
still finishing up.
2006-04-04 00:40:20 +00:00
marcel
d28296b199 Improve handling of IPI_STOP:
o  use atomic operations to fiddle with stopped_cpus and started_cpus.
o  disable interrupts while we're waiting to be started.
o  remove logic relating to cpustop_restartfunc as it's not used.
2006-04-03 23:56:40 +00:00
marcel
8278e2d5fb Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance with the previous code.
2006-04-03 22:51:47 +00:00
tegge
8582a7eef5 Eliminate softdep_flush() livelock by accounting for number of worklist items
marked as being in progress.
2006-04-03 22:23:23 +00:00
peter
3a90816456 Shrink the amd64 pv entry from 48 bytes to about 24 bytes. On a machine
with large mmap files mapped into many processes, this saves hundreds of
megabytes of ram.
pv entries were individually allocated and had two tailq entries and two
pointers (or addresses).  Each pv entry was linked to a vm_page_t and
a process's address space (pmap).  It had the virtual address and a
pointer to the pmap.
This change replaces the individual allocation with a per-process
allocation system.  A page ("pv chunk") is allocated and this provides
168 pv entries for that process.  We can now eliminate one of the 16 byte
tailq entries because we can simply iterate through the pv chunks to find
all the pv entries for a process.  We can eliminate one of the 8 byte
pointers because the location of the pv entry implies the containing
pv chunk, which has the pointer.  After overheads from the pv chunk
bitmap and tailq linkage, each pv entry works out to an
effective size of 24.38 bytes.

Future work still required, and other problems:
* when running low on pv entries or system ram, we may need to defrag
  the chunk pages and free any spares.  The stats (vm.pmap.*) show that
  this doesn't seem to be that much of a problem, but it can be done if
  needed.
* running low on pv entries is now a much bigger problem.  The old
  get_pv_entry() routine just needed to reclaim one other pv entry.
  Now, since they are per-process, we can only use pv entries that are
  assigned to our current process, or steal an entire page's worth
  from another process.  Under normal circumstances, the pmap_collect()
  code should be able to dislodge some pv entries from the current
  process.  But if needed, it can still reclaim entire pv chunk pages
  from other processes.
* This should port to i386 really easily, except there it would reduce
  pv entries from 24 bytes to about 12 bytes.

(I have integrated Alan's recent changes.)
2006-04-03 21:36:01 +00:00
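The effective size quoted in 3a90816456 can be checked with a little arithmetic. The sketch below assumes a 4096-byte page (the message only says "a page") and a raw entry size of 24 bytes (48 minus the 16-byte tailq entry and the 8-byte pointer the message says were eliminated). The names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Assumption: an amd64 page is 4096 bytes; the commit only says "a page". */
#define SKETCH_PAGE_SIZE		4096u
/* From the commit message: one pv chunk provides 168 pv entries. */
#define SKETCH_ENTRIES_PER_CHUNK	168u
/* Derived from the message: 48 - 16 (tailq entry) - 8 (pointer). */
#define SKETCH_RAW_ENTRY_SIZE		24u

/* Effective bytes per pv entry once chunk overhead is spread over entries. */
static double
sketch_pv_effective_size(void)
{
	return ((double)SKETCH_PAGE_SIZE / SKETCH_ENTRIES_PER_CHUNK);
}

/* Bytes per chunk left over for the bitmap and tailq linkage. */
static unsigned
sketch_pv_chunk_overhead(void)
{
	return (SKETCH_PAGE_SIZE -
	    SKETCH_ENTRIES_PER_CHUNK * SKETCH_RAW_ENTRY_SIZE);
}
```

Under these assumptions 4096 / 168 is roughly 24.38, matching the figure in the message, and 64 bytes per chunk remain for overhead.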
marius
6a5bc8bdd6 - s,tramoline,trampoline, in a comment.
- Use FBSDID in trap.c
- Make the global trap_sig[] static as it's not used outside of trap.c.
- In sendsig() remove an unused variable.
- In trap() sync with the other archs; for fast data access MMU miss and
  data access protection traps set ksi_addr to the SFAR reg which contains
  the faulting address and otherwise to the TPC reg. Generally the TPC reg
  contains the address of the instruction that caused the exception; for
  fast instruction access traps (and some others; more refinement may
  be needed here) that address is also the faulting address.
  Previously sendsig() always set si_addr to the SFAR reg which is wrong
  for most traps.
- In sendsig() add support for FreeBSD old-style signals.

These changes are inspired by kmacy's sun4v changes and allow libsigsegv
to build on FreeBSD/sparc64, though it does not yet pass all the checks
and tests it should.

MFC after:	5 days
2006-04-03 21:27:01 +00:00
peter
0f363b7d24 Remove the unused sva and eva arguments from pmap_remove_pages().
2006-04-03 21:16:10 +00:00
marcel
a1c5f48a6d In kdb_trap(), change the type of the local variable 'intr' from int
to register_t, as intr_disable() returns the latter and register_t
may be wider than int.

Pointed out by: marius@
2006-04-03 20:55:52 +00:00
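Why the type change in a1c5f48a6d matters can be shown with a small model. The sketch assumes a platform where register_t is 64 bits wide and models int as a 32-bit value; sketch_register_t and both functions are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for register_t on a 64-bit platform. */
typedef uint64_t sketch_register_t;

/*
 * Hypothetical stand-in for intr_disable(): returns a saved state word
 * that happens to have a bit set above bit 31.
 */
static sketch_register_t
sketch_intr_disable(void)
{
	return (UINT64_C(0x0000000100000200));
}

/*
 * Return 1 if the saved state survives a round trip through a 32-bit
 * variable (modelling "int intr"), 0 if bits are lost.
 */
static int
sketch_survives_32bit(sketch_register_t saved)
{
	uint32_t narrow = (uint32_t)saved;	/* well-defined truncation */

	return ((sketch_register_t)narrow == saved);
}
```

A value that fits in 32 bits survives the round trip; the one returned by sketch_intr_disable() does not, so restoring from the narrow copy would restore the wrong state.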
sam
96f86dcf07 o add opt_ath.h to enable tweaking various config parameters for the driver
without modifying the source code
o default debug msgs and diag support to off

MFC after:	3 days
2006-04-03 18:14:02 +00:00
marcel
64ac08d05f Replace critical_enter() and critical_exit() in kdb_trap() with
intr_disable() and intr_restore() resp. Previously, critical
regions would have interrupts disabled, but that was changed.
Consequently, the debugger could run with interrupts enabled.
This could cause problems for the low-level console code where
received characters would trigger an interrupt that causes
the interrupt handler to read the character instead of the
cngetc() function.
2006-04-03 17:48:09 +00:00
ariff
5980319eac Add device ID for nForce 410 MCP audio controller.
PR:		kern/95257
Submitted by:	cenix <cenixxx at gmail dot com>
MFC after:	3 days
2006-04-03 17:37:27 +00:00
rwatson
56cba4038a In TCP notify routines, check inpcb for INP_TIMEWAIT and INP_DROPPED.
The INP_DROPPED check replaces the current NULL checks; the INP_TIMEWAIT
checks appear to have always been required, but not been there, which
is/was a bug.  This avoids unconditionally casting inp_ppcb to a tcpcb,
when it may be a twtcb, which may have resulted in obscure ICMP-related
panics in earlier releases.

MFC after:	3 months
2006-04-03 14:07:50 +00:00
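The check described in 56cba4038a amounts to testing the flags before interpreting the protocol control block pointer. A minimal sketch follows, with hypothetical struct layouts and flag values (the real definitions live in the kernel headers and differ from these):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define SK_INP_TIMEWAIT	0x01
#define SK_INP_DROPPED	0x02

struct sk_tcpcb { int t_state; };
struct sk_tcptw { int tw_time; };

struct sk_inpcb {
	int	 inp_vflag;
	void	*inp_ppcb;	/* a tcpcb or a tcptw, depending on flags */
};

/*
 * Return the tcpcb only when inp_ppcb really is one; return NULL when
 * the inpcb is in timewait (inp_ppcb is a tcptw) or has been dropped.
 */
static struct sk_tcpcb *
sk_notify_tcpcb(struct sk_inpcb *inp)
{
	if (inp->inp_vflag & (SK_INP_TIMEWAIT | SK_INP_DROPPED))
		return (NULL);
	return ((struct sk_tcpcb *)inp->inp_ppcb);
}
```

A notify routine written this way never treats a timewait structure as a full tcpcb, which is the failure mode the message describes.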
rwatson
d67aff8ec4 Change inp_ppcb from caddr_t to void *, fix/remove associated
casts.

Consistently use intotw() to cast inp_ppcb pointers to struct tcptw *
pointers.

Consistently use intotcpcb() to cast inp_ppcb pointers to struct tcpcb *
pointers.

Don't assign tp to the result of intotcpcb() during variable declaration
at the top of functions, as that is before the asserts relating to
locking have been performed.  Do this later in the function after
appropriate assertions have run to allow that operation to be considered
safe.

MFC after:	3 months
2006-04-03 13:33:55 +00:00
rwatson
4586157b3a Style tweaks: convert to ANSI from K&R function prototypes.
MFC after:	3 months
2006-04-03 12:59:27 +00:00
rwatson
cf774d5382 Update comment on tcp_close() for new world order.
MFC after:	3 months
2006-04-03 12:52:13 +00:00
rwatson
206bd5674e Clarify comment on handling of non-timewait TCP states in
tcp_usr_detach().

MFC after:	3 months
2006-04-03 12:43:56 +00:00
rwatson
34473d63e2 Fix up locking surrounding tcp_drop sysctl: in the new world order, we
don't free inpcbs until after the socket is closed, so we always need
to unlock an inpcb after calling tcp_drop() on it.

MFC after:	3 months
2006-04-03 11:57:12 +00:00
rwatson
2ff901e7be After checking for SO_ISDISCONNECTED in tcp_usr_accept(), return
immediately rather than jumping to the normal output handling, which
assumes we've pulled out the inpcb, which hasn't happened at this
point (and isn't necessary).

Return ECONNABORTED instead of EINVAL when the inpcb has entered
INP_TIMEWAIT or INP_DROPPED, as this is the documented error value.

This may correct the panic seen by Ganbold.

MFC after:	1 month
Reported by:	Ganbold <ganbold at micom dot mng dot net>
2006-04-03 09:52:55 +00:00
rwatson
c8b4c281fa Correct incorrect assertion in div_bind(): inp must not be NULL here.
Reported by:	tegge
MFC after:	3 months
2006-04-03 09:01:17 +00:00
marcel
c8a811b93f Remove unused variable 'error'. Forgotten in previous commit.
2006-04-02 21:58:09 +00:00
marcel
a135d9eb7b Don't claim a SAB82532. We have scc(4) for that.
2006-04-02 21:50:45 +00:00
marcel
aedb89e6c0 Eliminate the sc_hasfifo flag from the softc. It was only used by
the NS8250 class driver. The UART has FIFOs if sc_rxfifosz>1, so
test for that instead.
While here properly initialize sc_rxfifosz and sc_txfifosz in the
case the UART doesn't have FIFOs.
2006-04-02 21:45:54 +00:00
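The idea in aedb89e6c0 of deriving "has FIFOs" from the FIFO size rather than storing a separate flag can be sketched like this. The struct is a hypothetical cut-down softc, and initializing both sizes to 1 for a FIFO-less UART is an assumption chosen to be consistent with the sc_rxfifosz>1 test; the message does not state the values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down softc holding only the fields discussed. */
struct sk_uart_softc {
	int	sc_rxfifosz;
	int	sc_txfifosz;
};

/* The UART has FIFOs if the receive FIFO holds more than one byte. */
static bool
sk_uart_has_fifo(const struct sk_uart_softc *sc)
{
	return (sc->sc_rxfifosz > 1);
}

/* Assumed initialization for a UART without FIFOs: one byte each way. */
static void
sk_uart_init_nofifo(struct sk_uart_softc *sc)
{
	sc->sc_rxfifosz = 1;
	sc->sc_txfifosz = 1;
}
```

Deriving the answer from one field removes the risk of the flag and the size disagreeing.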
rwatson
cce79b77fe During reformulation of tcp_usr_detach(), the call to initiate TCP
disconnect for fully connected sockets was dropped, meaning that if
the socket was closed while the connection was alive, it would be
leaked.  Structure tcp_usr_detach() so that there are two clear
parts: initiating disconnect, and reclaiming state, and reintroduce
the tcp_disconnect() call in the first part.

MFC after:	3 months
2006-04-02 16:42:51 +00:00
alc
af01e3f809 Introduce pmap_try_insert_pv_entry(), a function that conditionally creates
a pv entry if the number of entries is below the high water mark for pv
entries.

Use pmap_try_insert_pv_entry() in pmap_copy() instead of
pmap_insert_entry().  This avoids possible recursion on a pmap lock in
get_pv_entry().

Eliminate the explicit low-memory checks in pmap_copy().  The check that
the number of pv entries was below the high water mark was largely
ineffective because it was located in the outer loop rather than the
inner loop where pv entries were allocated.  Instead of checking, we
attempt the allocation and handle the failure.

Reviewed by: tegge
Reported by: kris
MFC after: 5 days
2006-04-02 05:45:05 +00:00
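The "attempt the allocation and handle the failure" approach in af01e3f809 can be sketched as below. The pool structure and function names are hypothetical; only the rule that an entry is created when the count is below the high water mark comes from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the pmap's pv entry accounting. */
struct sk_pv_pool {
	int	count;		/* entries currently allocated */
	int	high_water;	/* no new entries at or above this mark */
};

/* Conditionally create one entry; never blocks and never reclaims. */
static bool
sk_try_insert_pv_entry(struct sk_pv_pool *pool)
{
	if (pool->count >= pool->high_water)
		return (false);
	pool->count++;
	return (true);
}

/*
 * Copy up to n mappings.  The limit is enforced at each allocation in
 * the inner loop, so the copy stops as soon as the mark is reached.
 * Returns the number of mappings actually copied.
 */
static int
sk_copy_mappings(struct sk_pv_pool *pool, int n)
{
	int copied;

	for (copied = 0; copied < n; copied++) {
		if (!sk_try_insert_pv_entry(pool))
			break;
	}
	return (copied);
}
```

Because the check happens per allocation, a pool that starts just below the mark cannot overshoot it, which a single check in an outer loop would allow.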
cel
08249d49bf rick says:
The following bug was just identified in OpenBSD and it looks like the same
bug exists in the other BSDen NFS servers.

A Linux client (don't know which version, but you can look at
	http://bugzilla.kernel.org/show_bug.cgi?id=6256)
does a Setattr of mtime to the server's time, where the file is mode 0664 and
the client user has group access (i.e. the caller is not the file owner).

The BSD servers fail the Setattr with EPERM, since the VA_UTIMES_NULL flag
isn't set before doing the VOP_SETATTR.

It seems to me that this should be allowed, since it is allowed for a local
utimes(2). If so, the fix is to set VA_UTIMES_NULL for the
"set-time-to-server-time" cases of setting atime and/or mtime.

Submitted by:	rick@snowhite.cis.uoguelph.ca
Reviewed by:	cel
Approved by:	silby
MFC after:	1 week
2006-04-02 04:24:57 +00:00
rwatson
ace109901c Properly handle an edge case previously not handled correctly: a
socket can have a tcp connection that has entered time wait
attached to it, in the event that shutdown() is called on the
socket and the FINs properly exchange before close().  In this
case we don't detach or free the inpcb, just leave the tcptw
detached and freed, but we must release the inpcb lock (which we
didn't previously).

MFC after:	3 months
2006-04-01 23:53:25 +00:00
jmg
45648c7949 Mask out any action when copying the flags from the event to the knote.
Pointed out by:	Václav Haisman
Submitted by:	Dan Nelson (slightly modified patch)
MFC after:	3 days
2006-04-01 20:15:39 +00:00
mjacob
418e5ad9cc Fix fat-fingered version define.
2006-04-01 19:49:55 +00:00
marcel
01ed5990ae Don't hold the hardware mutex across getc(). It can wait indefinitely
for a character to be received. Instead let getc() do any necessary
locking.
2006-04-01 19:04:54 +00:00
rwatson
9fa0587a55 White space consistency with kasserts. Minor style tweaks.
MFC after:	3 months
2006-04-01 16:54:37 +00:00
rwatson
5078a28ae8 Update TCP for infrastructural changes to the socket/pcb refcount model,
pru_abort(), pru_detach(), and in_pcbdetach():

- Universally support and enforce the invariant that so_pcb is
  never NULL, converting dozens of unnecessary NULL checks into
  assertions, and eliminating dozens of unnecessary error handling
  cases in protocol code.

- In some cases, eliminate unnecessary pcbinfo locking, as it is no
  longer required to ensure so_pcb != NULL.  For example, the receive
  code no longer requires the pcbinfo lock, and the send code only
  requires it if building a new connection on an otherwise unconnected
  socket triggered via sendto() with an address.  This should
  significantly reduce tcbinfo lock contention in the receive and send
  cases.

- In order to support the invariant that so_pcb != NULL, it is now
  necessary for the TCP code to not discard the tcpcb any time a
  connection is dropped, but instead leave the tcpcb until the socket
  is shutdown.  This case is handled by setting INP_DROPPED, to
  substitute for using a NULL so_pcb to indicate that the connection
  has been dropped.  This requires the inpcb lock, but not the pcbinfo
  lock.

- Unlike all other protocols in the tree, TCP may need to retain access
  to the socket after the file descriptor has been closed.  Set
  SS_PROTOREF in tcp_detach() in order to prevent the socket from being
  freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether
  or not it needs to free the socket when the connection finally does
  close.  The typical case where this occurs is if close() is called on
  a TCP socket before all sent data in the send socket buffer has been
  transmitted or acknowledged.  If INP_SOCKREF is found when the
  connection is dropped, we release the inpcb, tcpcb, and socket instead
  of flagging INP_DROPPED.

- Abort and detach protocol switch methods no longer return failures,
  nor attempt to free sockets, as the socket layer does this.

- Annotate the existence of a long-standing race in the TCP timer code,
  in which timers are stopped but not drained when the socket is freed,
  as waiting for drain may lead to deadlocks, or have to occur in a
  context where waiting is not permitted.  This race has been handled
  by testing to see if the tcpcb pointer in the inpcb is NULL (and vice
  versa), which is not normally permitted, but may be true if an inpcb
  and tcpcb have been freed.  Add a counter to test how often this race
  has actually occurred, and a large comment for each instance where
  we compare potentially freed memory with NULL.  This will have to be
  fixed in the near future, but requires us to further address how to
  handle the timer shutdown issue.

- Several TCP calls no longer potentially free the passed inpcb/tcpcb,
  so no longer need to return a pointer to indicate whether the argument
  passed in is still valid.

- Un-macroize debugging and locking setup for various protocol switch
  methods for TCP, as it led to more obscurity, and as locking becomes
  more customized to the methods, offers less benefit.

- Assert copyright on tcp_usrreq.c due to significant modifications that
  have been made as part of this work.

These changes significantly modify the memory management and connection
logic of our TCP implementation, and are (as such) High Risk Changes,
and likely to contain serious bugs.  Please report problems to the
current@ mailing list ASAP, ideally with simple test cases, and
optionally, packet traces.

MFC after:	3 months
2006-04-01 16:36:36 +00:00
rwatson
a7c2bca553 Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():

- Universally support and enforce the invariant that so_pcb is
  never NULL, converting dozens of unnecessary NULL checks into
  assertions, and eliminating dozens of unnecessary error handling
  cases in protocol code.

- In some cases, eliminate unnecessary pcbinfo locking, as it is no
  longer required to ensure so_pcb != NULL.  For example, in protocol
  shutdown methods, and in raw IP send.

- Abort and detach protocol switch methods no longer return failures,
  nor attempt to free sockets, as the socket layer does this.

- Invoke in_pcbfree() after in_pcbdetach() in order to free the
  detached in_pcb structure for a socket.

MFC after:	3 months
2006-04-01 16:20:54 +00:00
rwatson
71cc03392b Break out in_pcbdetach() into two functions:
- in_pcbdetach(), which removes the link between an inpcb and its
  socket.

- in_pcbfree(), which frees a detached pcb.

Unlike the previous in_pcbdetach(), neither of these functions will
attempt to conditionally free the socket, as they are responsible only
for managing in_pcb memory.  Mirror these changes into in6_pcbdetach()
by breaking it into in6_pcbdetach() and in6_pcbfree().

While here, eliminate undesired checks for NULL inpcb pointers in
sockets, as we will now have as an invariant that sockets will always
have valid so_pcb pointers.

MFC after:	3 months
2006-04-01 16:04:42 +00:00
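The split described in 71cc03392b separates unlinking from freeing. A minimal user-space sketch, with hypothetical structs and function names that only mirror the roles of the kernel ones:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sk_socket;

struct sk_inpcb {
	struct sk_socket *inp_socket;
};

struct sk_socket {
	void *so_pcb;
};

/* Step 1: remove the link between the pcb and its socket; free nothing. */
static void
sk_pcbdetach(struct sk_inpcb *inp)
{
	inp->inp_socket->so_pcb = NULL;
	inp->inp_socket = NULL;
}

/* Step 2: free a pcb that has already been detached. */
static void
sk_pcbfree(struct sk_inpcb *inp)
{
	assert(inp->inp_socket == NULL);
	free(inp);
}
```

Neither function touches the socket's own lifetime, which matches the message's point that they manage pcb memory only.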
rwatson
173781a39a In raw and raw-derived socket types, maintain and enforce invariant that
the so_pcb pointer on the socket is always non-NULL.  This eliminates
countless unnecessary error checks, replacing them with assertions.

MFC after:	3 months
2006-04-01 15:55:44 +00:00
rwatson
5479e5d692 Change protocol switch method pru_detach() so that it returns void
rather than an error.  Detaches do not "fail"; they either occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF.  so_pcb is now entirely owned and
managed by the protocol code.  Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit.  In their current state they may leak
memory or panic.

MFC after:	3 months
2006-04-01 15:42:02 +00:00
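The ownership rule in 5479e5d692 can be sketched as follows: the detach method returns nothing, and the socket layer frees the socket unless the protocol has flagged SS_PROTOREF. All names and the flag value are hypothetical, and the "freed" field merely records what a real implementation would do with memory.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_SS_PROTOREF	0x01	/* hypothetical value */

struct sk_socket {
	int	so_state;
	bool	freed;	/* records whether the sketch "freed" the socket */
};

typedef void (*sk_detach_fn)(struct sk_socket *);

/* A protocol that lets go of the socket immediately. */
static void
sk_detach_simple(struct sk_socket *so)
{
	(void)so;
}

/* A protocol that still needs the socket and takes ownership of it. */
static void
sk_detach_keep(struct sk_socket *so)
{
	so->so_state |= SK_SS_PROTOREF;
}

/* Socket-layer close: detach, then free unless the protocol owns it. */
static void
sk_soclose(struct sk_socket *so, sk_detach_fn detach)
{
	detach(so);
	if ((so->so_state & SK_SS_PROTOREF) == 0)
		so->freed = true;
}
```

The socket layer never inspects so_pcb here; the flag alone decides who frees the socket.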
rwatson
68ff3be0b3 Annotate uses of fgetsock() with indications that they should rely
on their existing file descriptor references to sockets, rather than
use fgetsock() to retrieve a direct socket reference.

MFC after:	3 months
2006-04-01 15:25:01 +00:00
rwatson
8622e776f9 Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful.  Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they are not fully
updated by this commit.  This will be corrected shortly in followup
commits to these components.

MFC after:      3 months
2006-04-01 15:15:05 +00:00