47044 Commits

Author SHA1 Message Date
ume
afde7385d2 do not check super user privilege in ip6_savecontrol. It is
meaningless and can even be harmful.

Obtained from:	KAME
MFC after:	3 days
2004-06-02 15:41:18 +00:00
tjr
9bd12a2fd9 Remove a redundant "td = curthread" statement from profclock(). 2004-06-02 12:05:06 +00:00
phk
85f8b889c5 Some embedded platforms have no keyboard controller. Give up waiting
for it to react after a timeout.
2004-06-02 09:38:32 +00:00
tjr
80d36400ed Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by:	davidxu
2004-06-02 07:52:36 +00:00
jeff
33a226cf5e - Run sched_balance() and sched_balance_groups() from hardclock via
sched_clock() rather than using callouts.  This means we no longer have to
   take the load of the callout thread into consideration while balancing and
   should make the balancing decisions simpler and more accurate.

Tested on:	x86/UP, amd64/SMP
2004-06-02 05:46:48 +00:00
rwatson
576b26bafd Integrate accept locking from rwatson_netperf, introducing a new
global mutex, accept_mtx, which serializes access to the following
fields across all sockets:

          so_qlen          so_incqlen         so_qstate
          so_comp          so_incomp          so_list
          so_head

While providing only coarse granularity, this approach avoids lock
order issues between sockets by avoiding ownership of the fields
by a specific socket and its per-socket mutexes.

While here, rewrite soclose(), sofree(), soaccept(), and
sonewconn() to add assertions, close additional races and  address
lock order concerns.  In particular:

- Reorganize the optimistic concurrency behavior in accept1() to
  always allocate a file descriptor with falloc() so that if we do
  find a socket, we don't have to encounter the "Oh, there wasn't
  a socket" race that can occur if falloc() sleeps in the current
  code, which broke inbound accept() ordering, not to mention
  requiring backing out socket state changes in a way that raced
  with the protocol level.  We may want to add a lockless read of
  the queue state if polling of empty queues proves to be important
  to optimize.

- In accept1(), soref() the socket while holding the accept lock
  so that the socket cannot be free'd in a race with the protocol
  layer.  Likewise in netgraph equivilents of the accept1() code.

- In sonewconn(), loop waiting for the queue to be small enough to
  insert our new socket once we've committed to inserting it, or
  races can occur that cause the incomplete socket queue to
  overfill.  In the previously implementation, it was sufficient
  to simply tested once since calling soabort() didn't release
  synchronization permitting another thread to insert a socket as
  we discard a previous one.

- In soclose()/sofree()/et al, it is the responsibility of the
  caller to remove a socket from the incomplete connection queue
  before calling soabort(), which prevents soabort() from having
  to walk into the accept socket to release the socket from its
  queue, and avoids races when releasing the accept mutex to enter
  soabort(), permitting soabort() to avoid lock ordering issues
  with the caller.

- Generally cluster accept queue related operations together
  throughout these functions in order to facilitate locking.

Annotate new locking in socketvar.h.
2004-06-02 04:15:39 +00:00
rwatson
41a003003f Rather than assert f_type==DTYPE_VNODE, conditionally perform the
file lock release based on f_type==DTYPE_VNODE.  vn_closefile() is
used by non-vnode types as well (fifo).
2004-06-01 23:36:47 +00:00
wpaul
da796b8f8e Explicitly #include <sys/module.h> in these files too (they use
MODULE_DEPEND()).
2004-06-01 23:27:36 +00:00
wpaul
a4fd26fba2 Explicitly #include <sys/module.h> instead of depending on <sys/kernel.h>
to do it for us.
2004-06-01 23:24:17 +00:00
phk
1a00e2ae6b A major overhaul of the nmdm(4) driver:
It was based on the pty(4) driver which as a tty side an a non-tty side.

Nmdm(4) seems to have inherited two symmetric sides from pty but
unfortunately they are not quite ttys.  Running a getty one one
side and tip on the other failed to produce NL->CRNL mapping for
instance.

Rip out the basically bogus cdevsw->{read,write} functions and rely
on ttyread() and ttywrite() which does the same thing.

Use taskqueue_swi_giant to run a task for either side to do what
needs to be done.  (Direct calling is not an option as it leads to
recursion.)  Trigger the task from the t_oproc and t_stop methods.

Default the ports to not ECHO.  Since we neither rate limiting nor
emulation, two ports echoing each other is a really bad idea, which
can only be properly mitigated by rate limiting, rate emulation or
intelligent detection.  Rate emulation would be a neat feature.

Ditch the modem-line emulation, if needed for some app, it needs
to be thought much more about how it interacts with the open/close
logic.
2004-06-01 22:53:00 +00:00
jhb
5b4075da93 - Add a function ioapic_program_intpin() that completely programs an I/O
APIC interrupt pin based on the settings in the corresponding interrupt
  source structure.
- Use ioapic_program_intpin() in place of manual frobbing of the intpin
  configuration in ioapic_program_destination() and ioapic_register().
- Use ioapic_program_intpin() to implement suspend/resume support for I/O
  APICs.
2004-06-01 20:28:42 +00:00
joerg
b2404db737 Add SVR4-compatible VTOC-style elements to the Sun label. The
FreeBSD kernel doesn't use them but sunlabel(8) shortly will,
and both these files are used by sunlabel(8).
2004-06-01 20:18:25 +00:00
jhb
33b8939d73 Allow the pir0 device add to fail since pir0 may already exist. This should
fix the panics in device_set_ivars() that people were seeing on boxes with
multiple Host-PCI bridges but not using ACPI.
2004-06-01 19:51:29 +00:00
jhb
150c17be5c Fix legacy_add_child() to properly handle the case where
device_add_child_ordered() fails (due to a duplicate device add for
example) and properly cleanup and return NULL.
2004-06-01 19:50:42 +00:00
jhb
ff59f0bd8b Use the local APIC ID rather than the ACPI Processor ID to index the array
of CPUs since local APIC IDs are bounded but ACPI IDs are not bounded.
2004-06-01 19:49:38 +00:00
rwatson
87f869d1e9 Replace current locking comments for struct socket/struct sockbuf
with new ones.  Annotate constant-after-creation fields as such.  The
comments describe a number of locks that are not yet merged.
2004-06-01 19:33:06 +00:00
phk
30540e618c Remove unused variable. 2004-06-01 19:02:51 +00:00
rwatson
5adf35c004 Add GIANT_REQUIRED to kqueue_close(), since kqueue currently requires
Giant.
2004-06-01 18:05:41 +00:00
rwatson
1e76056c09 Push the VOP_ADVLOCK() call to release advisory locks on vnode file
descriptors out of fdrop_locked() and into vn_closefile().  This
removes all knowledge of vnodes from fdrop_locked(), since the lock
behavior was specific to vnodes.  This also removes the specific
requirement for Giant in fdrop_locked(), it's now only required by
code that it calls into.

Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.
2004-06-01 18:03:20 +00:00
bmilekic
9e06a1e05a Fix a couple of bugs in the mbuf and packet ctors. In the latter case,
nextpkt within the m_hdr was not being initialized to NULL for
!M_PKTHDR cases.  *Maybe* this will fix weird socket buffer
inconsistency panics, but we'll see.
2004-06-01 16:17:10 +00:00
scottl
0fae2f30f5 Commit the correct version of the patch from last night. This fixes an
immediate panic when doing any i/o, and it closes a completion race.
2004-06-01 15:50:11 +00:00
phk
83ae77becd Gainfully employ the new ttyioctl in the trivial cases. 2004-06-01 13:49:28 +00:00
phk
3521579704 Introduce a ttyioctl() cdevsw default function. 2004-06-01 13:39:02 +00:00
ru
44888b6a3f Removed a leftover from the previous change.
Submitted by:	Gleb Smirnoff
2004-06-01 13:15:32 +00:00
sos
1a40a5a65c When waiting for drive to become ready, reinit the request params as they
might get trashed by autosensing.
2004-06-01 12:28:45 +00:00
sos
a6ddd03408 Use the right cmd+errorcode if we are in autosense/not. 2004-06-01 12:26:08 +00:00
phk
e0c89dae13 There is no need to explicitly call the stop function. In all likelyhood
->l_close() did it and ttyclose certainly will.
2004-06-01 11:57:15 +00:00
phk
19bbdd84d6 shift the four cdevsw functions for ttys to sys/conf.h and prototype
them with the correct typedef.
2004-06-01 11:56:04 +00:00
phk
b59eec9a5a There is no need to explicitly call ttwakeup() and ttwwakeup() after
ttyclose() has been called.  It's already been done once by ttyclose,
and probably once by the line-discipline too.
2004-06-01 11:38:06 +00:00
sos
70c10dad98 Only set and report error if not set already. 2004-06-01 11:37:24 +00:00
sos
2fbc4845f9 Dont retry on devices that left the system.
Ignore "fake" devices that has 0x7f status.
2004-06-01 11:34:46 +00:00
phk
d4a4e27cd4 ttyclose() increments t_gen. Remove redundant increments in drivers. 2004-06-01 10:15:56 +00:00
truckman
967b923f98 Whitespace correction - #define should be followed by a tab. 2004-06-01 08:59:03 +00:00
tanimura
3d7b42f638 Axe the old midi drivers and framework. matk has developed a new
module-friendly midi subsystem to be merged soon.
2004-06-01 06:22:59 +00:00
scottl
bee5c9d805 Collapse aac_map_command() into aac_startio(). Check the AAC_QUEUE_FRZN in
every iteration of aac_startio().  This ensures that a command that is
deferred for lack of resources doesn't immediately get retried in the
aac_startio() loop.  This avoids an almost certain livelock.
2004-06-01 05:32:26 +00:00
rwatson
5a32935851 Add a global mutex, accept_filter_mtx, to protect the global list of
accept filters and prevent read-modify-write races.
2004-06-01 04:08:48 +00:00
rwatson
bddadcf71a The SS_COMP and SS_INCOMP flags in the so_state field indicate whether
the socket is on an accept queue of a listen socket.  This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different for the locking on the remainder of the
flags in so_state.
2004-06-01 02:42:56 +00:00
bmilekic
2ad1fea4f3 Fix a comment above uma_zsecond_create(), describing its arguments.
It doesn't take 'align' and 'flags' but 'master' instead, which is
a reference to the Master Zone, containing the backing Keg.

Pointed out by: Tim Robbins (tjr)
2004-06-01 01:36:26 +00:00
truckman
d503c79cad Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call.  The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation.  The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
2004-06-01 01:18:51 +00:00
njl
85ad08a47d Remove debugging printf that never triggered because acpi is the first
user of nexus::bus_get_resource.
2004-06-01 01:04:25 +00:00
mlaier
03517ac71a "Get rid of the nested include of <sys/module.h> from <sys/kernel.h>" or
better do no longer depend on it.

Requested-by:	phk
Approved-by:	bms(mentor)
2004-05-31 22:48:19 +00:00
bmilekic
f7574a2276 Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
    care of and it becomes clearer to how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)
2004-05-31 21:46:06 +00:00
rwatson
13656d723e Assert Giant in vn_start_write() and vn_finished_write(). 2004-05-31 20:56:10 +00:00
bmilekic
b7d590579b Giant wasn't dropped here if we have to return EBUSY. This is bad. 2004-05-31 20:21:06 +00:00
rwatson
2d231cab68 Release NFS subsystem lock and acquire Giant when calling into
vn_start_write().
2004-05-31 19:08:22 +00:00
rwatson
afc098b3e1 Assert Giant in vrele(). 2004-05-31 19:06:01 +00:00
krion
2ca388c921 - Fix typo
Approved by:	tobez
2004-05-31 16:55:12 +00:00
rwatson
a1a21e421c Add an assertion that nfssvc() isn't called with Giant.
Add two additional pairs of assertions, one at the end of the NFS
server event loop, and one one exit from the NFS daemon, that
assert that if debug.mpsafenet is enabled, Giant is not held, and
that if it is not enabled, Giant will be held.  This is intended
to support debugging scenarios where Giant is "leaked" during NFS
processing.
2004-05-31 16:32:49 +00:00
nsouch
a1d2a459f7 Necessary modifications do get pcf working again for ISA. Tested with
my Elektor card. Note that the hints are necessary to specify the
IO base of the pcf chip. This enables to check the IO base when the
probe routine is called during ISA enumeration.

The interrupt driven code is mixed with polled mode, which is wrong
and produces supposed spurious interrupts at each access. I still have
to work on it.
2004-05-31 14:24:21 +00:00
takawata
d40a25cb50 Devclass have to be shared with same 'pcm' devclass, or
unit management will corrupt.
2004-05-31 11:38:46 +00:00