Commit Graph

63632 Commits

Author SHA1 Message Date
Poul-Henning Kamp
27de12a9c2 Don't rename fields with #define.
Collapse two semantically identical structs.
Add missing vr_ prefix.
2007-04-22 14:57:05 +00:00
Robert Watson
269ad13024 Further MAC test policy cleanup and enhancement:
- Redistribute counter declarations to where they are used, rather than at
  the file header, so it's more clear where we do (and don't) have
  counters.

- Add many more counters, one per policy entry point, so that many
  individual access controls and object life cycle events are tracked.

- Perform counter increments for label destruction explicitly in entry
  point functions rather than in LABEL_DESTROY().

- Use LABEL_INIT() instead of SLOT_SET() directly in label init functions
  to be symmetric with destruction.

- Align counter names more carefully with entry point names.

- More constant and variable name normalization.

Obtained from:	TrustedBSD Project
2007-04-22 13:29:37 +00:00
Poul-Henning Kamp
c8ea76936e Run if_vr(4) through FlexeLint and clean some of the cobwebs found. 2007-04-22 12:55:36 +00:00
Randall Stewart
58967d8d46 Moves the PCB features and flags from sctp_pcb.h to
sctp.h so that netstat can access and display these
values.
2007-04-22 12:12:38 +00:00
Robert Watson
6827d0294e Perform overdue cleanup of the mac_test policy:
- Add a more detailed comment describing the mac_test policy.

- Add COUNTER_DECL() and COUNTER_INC() macros to declare and manage
  various test counters, reducing the verbosity of the test policy
  quite a bit.

- Add LABEL_CHECK() macro to abbreviate normal validation of labels.
  Unlike the previous check macros, this checks for a NULL label and
  doesn't test NULL labels.  This means that optionally passed labels
  will now be handled automatically, although in the case of optional
  credentials, NULL-checks are still required.

- Add LABEL_DESTROY() macro to abbreviate the handling of label
  validation and tear-down.

- Add LABEL_NOTFREE() macro to abbreviate check for non-free labels.

- Normalize the names of counters, magic values.

- Remove unused policy "enabled" flag.

Obtained from:	TrustedBSD Project
2007-04-22 11:35:15 +00:00
Randall Stewart
9a6142d8cd - Somehow the disable fragment option got lost. We could
set/clear it but would not do it. Now we will.
-  Moved to latest socket api for extended sndrcv info struct.
-  Moved to support all new levels of fragment interleave.
2007-04-22 11:06:27 +00:00
Dag-Erling Smørgrav
7621783a55 Now that we're MPSAFE, tell namei() to acquire Giant if necessary. 2007-04-22 08:41:52 +00:00
Robert Watson
18717f69b1 Allow MAC policy modules to control access to audit configuration system
calls.  Add MAC Framework entry points and MAC policy entry points for
audit(), auditctl(), auditon(), setaudit(), and setauid().

MAC Framework entry points are only added for audit system calls where
additional argument context may be useful for policy decision-making; other
audit system calls without arguments may be controlled via the priv(9)
entry points.

Update various policy modules to implement audit-related checks, and in
some cases, other missing system-related checks.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA, Inc.
2007-04-21 22:08:48 +00:00
Robert Watson
fea9ea0005 Teach netinet6 to use PRIV_NETINET_REUSEPORT. 2007-04-21 18:14:04 +00:00
Robert Watson
dc4725135d Attempt to rationalize NFS privileges:
- Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD.

- Use PRIV_NFS_DAEMON in the NFS server.

- In the NFS client, move the privilege check from nfslockdans(), which
  occurs every time a write is performed on /dev/nfslock, and instead do it
  in nfslock_open() just once.  This allows us to avoid checking the saved
  uid for root, and just use the effective on open.  Use PRIV_NFS_LOCKD.
2007-04-21 18:11:19 +00:00
Stephan Uphoff
31b4f4a916 Modify TLB invalidation handling.
Reviewed by:	alc@, peter@
MFC after:	1 week
2007-04-21 14:17:30 +00:00
Pawel Jakub Dawidek
9de81c7273 MFp4:
@118370	Correct typo.

@118371	Integrate changes from vendor.

@118491	Show backtrace on unexpected code paths.

@118494	Integrate changes from vendor.

@118504	Fix sendfile(2). I had two ways of fixing it:
	1. Fixing sendfile(2) itself to use VOP_GETPAGES() instead of
	   hacking around with vn_rdwr(UIO_NOCOPY), which was suggested
	   by ups.
	2. Modify ZFS behaviour to handle this special case.

	Although 1 is more correct, I've chosen 2, because the hack from 1
	has a side-effect of being faster - it reads ahead MAXBSIZE
	bytes instead of reading page by page. This is not easy to implement
	with VOP_GETPAGES(), at least not for me at this very moment.

	Reported by:	Andrey V. Elsukov <bu7cher@yandex.ru>

@118525	Reorganize the code to reduce diff.

@118526	This code path is expected. It is simply taken when a file is opened
	with the O_FSYNC flag.

	Reported by:	kris
	Reported by:	Michal Suszko <dry@dry.pl>
2007-04-21 12:02:57 +00:00
Stephane E. Potvin
0e5179e441 Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem_size will
be at least 256MB (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
Pawel Jakub Dawidek
eed20b37f5 Don't reinvent vm_page_grab().
Reviewed by:	ups
2007-04-20 19:49:20 +00:00
Andre Oppermann
df47e4377b o Remove unnecessary TOF_SIGLEN flag from struct tcpopt
o Correctly set to->to_signature in tcp_dooptions()
o Update comments
2007-04-20 15:28:01 +00:00
Andre Oppermann
7824d002c0 Add more KASSERT's. 2007-04-20 15:21:29 +00:00
Andre Oppermann
0d957bba48 o Remove unused and redundant TCP option definitions
o Replace usage of MAX_TCPOPTLEN with the correctly constructed and
  derived MAX_TCPOPTLEN
2007-04-20 15:08:09 +00:00
Andre Oppermann
4d6e713043 Remove bogus check for accept queue length and associated failure handling
from the incoming SYN handling section of tcp_input().

Enforcement of the accept queue limits is done by sonewconn() after the
3WHS is completed.  It is not necessary to have an earlier check before a
connection request enters the SYN cache awaiting the full handshake.  It
rather limits the effectiveness of the syncache by preventing legit and
illegit connections from entering it and having them shaken out before we
hit the real limit which may have vanished by then.

Change return value of syncache_add() to void.  No status communication
is required.
2007-04-20 14:34:54 +00:00
Andre Oppermann
e207f80039 Simplify syncache_expand() and clarify its semantics. Zero is returned
when the ACK is invalid and doesn't belong to any registered connection,
either in syncache or through SYN cookies.  True but a NULL struct socket
is returned when the 3WHS completed but the socket could not be created
due to insufficient resources or limits reached.

For both cases an RST is sent back in tcp_input().

A logic error leading to a panic is fixed where syncache_expand() would
free the mbuf on socket allocation failure but tcp_input() later supplies
it to tcp_dropwithreset() to issue a RST to the peer.

Reported by:	kris (the panic)
2007-04-20 13:51:34 +00:00
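Note: the return convention described in e207f80039 can be modelled in a few lines of C. This is only a sketch under stated assumptions, not the kernel code: `expand_result`, `fake_expand()` and `must_send_rst()` are hypothetical names, and the real syncache_expand() takes different arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct socket { int unused; };   /* stand-in for the kernel's struct socket */

/* Hypothetical model of the two outputs: the return value and the socket. */
struct expand_result {
    int            rv;   /* 0: the ACK matches no pending connection */
    struct socket *so;   /* NULL: handshake done, but no socket was created */
};

/* Hypothetical stand-in for syncache_expand(). */
static struct expand_result
fake_expand(bool ack_valid, bool resources_ok, struct socket *so)
{
    struct expand_result r = { 0, NULL };

    assert(so != NULL);              /* caller supplies the would-be socket */
    if (!ack_valid)
        return r;                    /* invalid ACK: zero, no socket */
    r.rv = 1;
    r.so = resources_ok ? so : NULL; /* 3WHS done; socket may be missing */
    return r;
}

/* Per the commit message, the caller answers both failure cases with an RST. */
static bool
must_send_rst(struct expand_result r)
{
    return r.rv == 0 || r.so == NULL;
}
```

Keeping "no such connection" and "connection complete, no socket" distinguishable lets the caller keep ownership of the mbuf in both cases, which is the point of the panic fix described above.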
Andre Oppermann
0a5df51410 Only update TCP timestamp on SYN duplication if it is present on
current SYN in syncache_add().  Otherwise disable timestamps.
2007-04-20 13:36:48 +00:00
Andre Oppermann
c73f70b728 o Plug memory leak in syncache_add() on MAC label allocation failure.
o Simplify code flow with 'done' goto label.
o Remove mbuf argument from syncache_respond().  It doesn't make use
  of it.
2007-04-20 13:30:08 +00:00
Alexander Motin
e07c5170e1 Added m_tag_copy_chain() call to copy original outgoing packet tags to all of
its fragments.

Reviewed by:	archie
Approved by:	glebius (mentor)
2007-04-20 08:44:40 +00:00
Alexander Motin
ccffcb5147 Optimized packet distribution plan for the equal links case. Do not
split packets into fragments smaller than MP_MIN_FRAG_LEN to reduce total
overhead.

Reviewed by:	archie
Approved by:	glebius (mentor)
2007-04-20 08:42:08 +00:00
Alexander Motin
8e8f114e62 - Changed sequence number processing to avoid incorrect timeout waiting
when one of the links is inactive and has a stale sequence number. To avoid
this, the sequence numbers of all links are updated on every
successful packet reassembly.
- ng_ppp_bump_mseq function created to simplify code.
- ng_ppp_frag_drop function separated from ng_ppp_frag_process to
simplify code.

Reviewed by:	archie
Approved by:	glebius (mentor)
2007-04-20 08:38:18 +00:00
Alexander Motin
fd58342c26 - Fixed mistakes in latency and xmitBytes calculation math
which led to ineffective multilink packet distribution plans.
- Changed bytesInQueue calculation math to have more precise information
about link utilization.
- Took rough account of the link overhead. A better way to do it could be to
get the exact overhead from user-level, but I have not done it to keep
binary compatibility.

Reviewed by:	archie
Approved by:	glebius (mentor)
2007-04-20 08:22:57 +00:00
Kip Macy
fb1e3ccd7e Schedule the ithread on the same cpu as the interrupt
Tested by: kmacy
Submitted by: jeffr
2007-04-20 05:45:46 +00:00
Kip Macy
5f1e4ae331 Free cluster if we fail to create the dmamap.
Fixes CID 1829
Found by: Coverity
2007-04-20 05:16:42 +00:00
Kip Macy
527888d7c0 Eliminate CID 1842 by comparing against (type != EXT_MBUF) => refcnt != NULL 2007-04-20 05:12:54 +00:00
Kip Macy
f297a9d336 Fix memory leak in m_collapse (CID 1843)
Found by: Coverity
Submitted by: jhb
2007-04-20 05:06:02 +00:00
Peter Grehan
90bf3dc7cb Add ofw bus methods to the ppc nexus driver. This will be used in future
EFIKA platform support.

PR:	111522
Submitted by:	Andrew Turner, andrew at fubar geek nz
2007-04-20 03:24:59 +00:00
Tom Rhodes
164554dec4 In some cases, like whenever devfs file times are zero, the fix(aa) will not
be applied to dev entries.  This leaves us with file times like "Jan 1 1970."
Work around this problem by replacing the tv_sec == 0 check with a
<= 3600 check.  It's doubtful anyone will be booting within an hour of the
Epoch, let alone care about a few seconds worth of nonzero timestamps.  It's
a hackish work around, but it does work and I have not experienced any
negatives in my testing.

Discussed with:	bde
"Ok with me":	phk
2007-04-20 01:47:05 +00:00
Ariff Abdullah
9d2d90cb79 Unbreak module / driver attach breakage. Both snd_envy24 and snd_envy24ht
mistakenly rely on wrong snd_spicds version.
2007-04-20 01:28:51 +00:00
Scott Long
77dc25cc98 Retire the spl() markers. Add in some minor missed locking as a result. 2007-04-19 23:34:51 +00:00
Scott Long
11e4face2d Inline cam_periph_lock|unlock to make debugging easier. Use
CAM_SIM_LOCK() more uniformly.
2007-04-19 22:46:26 +00:00
Scott Long
919c80dfc7 Fix a leaked lock in dashutdown. 2007-04-19 22:18:15 +00:00
Scott Long
7628acd8ee Up until now, the free SCB pool received only a small initial allocation,
and new SCBs were allocated on demand later if needed.  This has two
problems.  First, allocating SCBs involves allocating contiguous memory,
and if memory is exhausted then the VM will try to page out to satisfy
the request, leading to recursion and deadlock.  The second problem is
that it can cause lock order reversals due to parts of the VM still being
under Giant.

Fix the problem by allocating the full pool at driver attach, when it is
safe to do so.
2007-04-19 18:53:52 +00:00
Scott Long
58b0b144e8 Avoid problems with make_dev. 2007-04-19 18:14:33 +00:00
John Baldwin
0d4e0cc591 Oops, fix intsmb(4) attach. Don't overwrite the 'value' holding the
interrupt mode with the SMB revision before checking 'value' for a valid
interrupt mode.

Reported by:	Ulrich Spoerlein <uspoerlein of gmail fame>
2007-04-19 17:14:06 +00:00
Mike Makonnen
18a6073100 Make inet6_rth_* family of functions more compliant with RFC3542:
1. CMSG_NXTHDR(mhdr, cmsg) is supposed to dereference cmsg and return
   the next header in the chain. If cmsg is NULL it should return
   the first header, behaving essentially like CMSG_FIRSTHDR().
2. inet6_rth_(space|init|add) should do basic checking on their input
   to verify that the number of headers (segments) is
   between 0 and 127 inclusive.

MFC-After: 1 month
2007-04-19 15:48:16 +00:00
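Note: the input check from item 2 of 18a6073100 can be sketched as follows. This is an illustration, not the libc source: `rth_segments_ok()`, `rth0_space()` and `rth_demo()` are hypothetical names, and the size formula assumes a type 0 routing header (an 8-byte fixed part plus one 16-byte IPv6 address per segment). Returning 0 for invalid input follows the RFC 3542 description of inet6_rth_space().

```c
#include <assert.h>
#include <stddef.h>

#define RTH0_HDR_LEN   8    /* fixed part of a type 0 routing header */
#define RTH0_ADDR_LEN  16   /* one IPv6 address per segment */
#define RTH_MAX_SEGS   127  /* upper bound quoted in the commit message */

/* Hypothetical helper: is the segment count in the range 0..127? */
static int
rth_segments_ok(int segments)
{
    return segments >= 0 && segments <= RTH_MAX_SEGS;
}

/* Hypothetical analogue of inet6_rth_space(): 0 signals invalid input. */
static size_t
rth0_space(int segments)
{
    if (!rth_segments_ok(segments))
        return 0;
    return RTH0_HDR_LEN + (size_t)segments * RTH0_ADDR_LEN;
}

/* Boundary checks for the sketch above. */
static void
rth_demo(void)
{
    assert(rth0_space(0) == 8);
    assert(rth0_space(127) == 2040);
    assert(rth0_space(128) == 0);
    assert(rth0_space(-1) == 0);
}
```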
Scott Long
2a30c7ddf7 Zero the CCBs when mallocing them. 2007-04-19 14:45:37 +00:00
Scott Long
9758cc8399 Split the camisr into per-SIM done queues. This optimizes the locking a
little bit and allows for direct dispatch of the doneq from certain
contexts that would otherwise face recursive locking problems.
2007-04-19 14:28:43 +00:00
Ariff Abdullah
fd7390d640 - AC97 quirk / patch cleanups. Most quirks don't work in a general sense
  and should only be applied to certain specific cards / vendors, hence the
  addition of ac97_getsubvendor().
- Fix low volume issue on several MSI laptops through ALC655 quirk.

Reported/Tested by:	Christian Mueller
                   	<raptor-freebsd-multimedia@xpls.de>
MFC after:		1 week
2007-04-19 13:54:22 +00:00
Sepherosa Ziehau
b03cfe2396 - Fix mbuf/node leakage in drivers' raw_xmit().
- For ural(4):
  o  Fix node leakage in ural_start(), if ural_tx_mgt() fails.
  o  Fix mbuf leakage in ural_tx_{mgt,data}(), if usbd_transfer() fails.
  o  In ural_tx_{mgt,data}(), set ural_tx_data.{m,ni} to NULL, if
     usbd_transfer() fails, so they will not be freed again in ural_stop().

Approved by:	sam (mentor)
2007-04-19 13:09:57 +00:00
Randall Stewart
f1f73e5718 - More work on making send lock contention.
- Removed free-oqueue cache.
- Fix counter for sq entries
- Increased the amount of information retained
  on ASOC_TSN logging on the association.
- Made it so with the ASOC_TSN logging on
  sending or receiving an abort we dump the log.
- Went through and added invariants around some
  panics that needed them.
- decrements went to atomic_subtract_int instead of add -1
- Removed residual count increment that threw off a
  strm oq count.
- Track and complain if we don't have a LAST fragment and
  clean up the sp structure.
- Track a new stat that counts number of abandoned msgs that
  happen if you close without reading.
- Fix lookup of frag point to be aware of a 0 assoc-id.
Reviewed by:	gnn
2007-04-19 11:28:43 +00:00
Poul-Henning Kamp
3f17cc74af style nit 2007-04-19 09:18:51 +00:00
Joseph Koshy
382d30cdd8 Fix witness(4) warnings about mutex use.
Group mutexes used in hwpmc(4) into 3 "types" in the sense of
witness(4):

 - leaf spin mutexes---only one of these should be held at a time,
   so these mutexes are specified as belonging to a single witness
   type "pmc-leaf".

 - `struct pmc_owner' descriptors are protected by a spin mutex of
   witness type "pmc-owner-proc".  Since we call wakeup_one() while
   holding these mutexes, the witness type of these mutexes needs
   to dominate that of "sleepq chain" mutexes.

 - logger threads use a sleep mutex, of type "pmc-sleep".

Submitted by:	wkoszek (earlier patch)
2007-04-19 08:02:51 +00:00
Pawel Jakub Dawidek
fb1daf8164 Fix a bug in sendfile(2) when files are larger than the page size and nbytes=0.
When nbytes=0, sendfile(2) should use file size. Because of the bug, it
was sending half of a file. The bug is that 'off' variable can't be used
for size calculation, because it changes inside the loop, so we should
use uap->offset instead.
2007-04-19 05:54:45 +00:00
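Note: the failure mode in fb1daf8164 can be reproduced with a toy loop. This is a simplified model, not the kernel's sendfile(2) code: `toy_sendfile()` and its parameters are hypothetical, and the exact size expression in the kernel may differ. The sketch only shows why a total derived from an offset that advances inside the loop shrinks as data is sent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the nbytes == 0 case: send from `start` to end of file in
 * chunks.  With use_moving_off set, the total is recomputed from the
 * advancing offset, which is the mistake the commit describes.
 */
static size_t
toy_sendfile(size_t file_size, size_t start, size_t chunk, int use_moving_off)
{
    size_t off = start;   /* advances as data is sent */
    size_t sent = 0;

    assert(start <= file_size && chunk > 0);
    for (;;) {
        size_t base = use_moving_off ? off : start;
        size_t total = file_size - base;   /* bytes we believe we owe */
        size_t n;

        if (sent >= total)
            break;
        n = total - sent;
        if (n > chunk)
            n = chunk;
        off += n;
        sent += n;
    }
    return sent;
}
```

For a 16384-byte file sent in 4096-byte chunks from offset 0, the moving-offset variant stops after 8192 bytes, while the fixed-offset variant sends all 16384.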
Alan Cox
f40fd96d5b Correct contigmalloc2()'s implementation of M_ZERO. Specifically,
contigmalloc2() was always testing the first physical page for PG_ZERO,
not the current page of interest.

Submitted by: Michael Plass
PR: 81301
MFC after: 1 week
2007-04-19 05:39:54 +00:00
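Note: the class of bug fixed in f40fd96d5b is easy to show in isolation. The sketch below is not contigmalloc2(): `struct toy_page`, `TOY_PG_ZERO`, `toy_zero_pages()` and `toy_zero_demo()` are hypothetical stand-ins, with the flag loosely modelled on PG_ZERO.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 16      /* tiny pages keep the example readable */
#define TOY_PG_ZERO   0x1     /* "already zero-filled" flag (assumed model) */

struct toy_page {
    int           flags;
    unsigned char data[TOY_PAGE_SIZE];
};

/* Zero every page that is not already marked as zero-filled. */
static void
toy_zero_pages(struct toy_page *pages, int npages)
{
    for (int i = 0; i < npages; i++) {
        /*
         * The bug class described above would test pages[0].flags here
         * instead of pages[i].flags.
         */
        if ((pages[i].flags & TOY_PG_ZERO) == 0)
            memset(pages[i].data, 0, TOY_PAGE_SIZE);
    }
}

/* First page is pre-zeroed and flagged; second page is dirty and unflagged. */
static void
toy_zero_demo(void)
{
    struct toy_page p[2];

    memset(p, 0, sizeof(p));
    p[0].flags = TOY_PG_ZERO;
    memset(p[1].data, 0xAA, TOY_PAGE_SIZE);
    toy_zero_pages(p, 2);
    for (int i = 0; i < TOY_PAGE_SIZE; i++)
        assert(p[1].data[i] == 0);
}
```

Had the loop consulted only the first page's flag, the dirty second page in the demo would have been skipped and handed back with stale contents.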
Alan Cox
a96d395ba1 Correct two comments.
Submitted by: Michael Plass
2007-04-19 04:52:47 +00:00
Nate Lawson
0ae62c18a0 Bump the interrupt storm detection counter to 1000. My slow fileserver
gets a bogus irq storm detected when periodic daily kicks off at 3 am
and disconnects the disk.  Change the print logic to print once per second
when the storm is occurring instead of only once.  Otherwise, it appeared
that something else was causing the errors each night at 3 am since the
print only occurred the first time.

Reviewed by:	jhb
MFC after:	1 week
2007-04-19 01:24:32 +00:00
Jung-uk Kim
f1753e0585 Fix style(9) and comments.
Submitted by:	Scot Hetzel (swhetzel at gmail dot com)
2007-04-18 20:12:05 +00:00
Ariff Abdullah
2e334adf6a sndbuf_alloc() now accepts a dmaflags argument which will be forwarded to
internal bus_dmamem_alloc() for greater flexibility on setting up DMA /
page attributes.
2007-04-18 18:26:41 +00:00
Ariff Abdullah
e492b75981 Break ABI / module compatibility for the upcoming sndbuf_alloc() changes. 2007-04-18 18:20:48 +00:00
Andre Oppermann
bbf4e1cb47 Make tcp_twrespond() use tcp_addoptions() instead of a home grown version. 2007-04-18 18:14:39 +00:00
Jung-uk Kim
d477452eb3 style(9) says sizeof's are not to be followed by a space. Fix them. 2007-04-18 18:11:32 +00:00
Jung-uk Kim
86a0e5dbb6 Implement settimeofday() for Linuxulator/amd64.
Submitted by:	Scot Hetzel (swhetzel at gmail dot com)
2007-04-18 18:08:12 +00:00
Pawel Jakub Dawidek
32371d2025 MFp4: Fix automatic snapshot mount when unprivileged user does lookup
on a snapshot directory:
- Remove PRIV_VFS_MOUNT check - regular users can mount snapshots
  via lookups on snapshot directory.
- Reset mount credential to kcred, so user won't be able to unmount
  the snapshot.
- Reset owner uid.
- Unlock vnode in case of a failure.

Reported by:	simokawa
2007-04-18 15:24:48 +00:00
Pawel Jakub Dawidek
f2c9a576db MFp4: We check for PRIV_VFS_MOUNT already in mount(2) syscall and we don't
want to do the check when snapshot is automatically mounted by an
unprivileged user doing lookup on a snapshot directory.
2007-04-18 15:22:07 +00:00
Poul-Henning Kamp
cc76e59ded On AMD's Geode LX: Force the TSC to run through core-suspension so we can
use it as a timecounter.

Sponsored by: Soekris Engineering
2007-04-18 10:08:24 +00:00
Scott Long
545f17a3c8 Missed locking the dump and shutdown entry points in the scsi_da driver. 2007-04-18 05:14:16 +00:00
Scott Long
8008a935a7 Revert a driver API change to xpt_alloc_ccb that isn't necessary. Fix a
couple of associated error checks.
2007-04-18 04:58:53 +00:00
Pyun YongHyeon
eed497bbe5 Don't reinitialize the hardware if only PROMISC flag was changed.
Previously, whenever PROMISC mode was turned on/off, link renegotiation
occurred, which could result in network unavailability for several
seconds. (Depending on switch STP settings it could last several tens
of seconds.)

Reported by:	Prokofiev S.P.  < proks AT logos DOT uptel DOT net >
Tested by:	Prokofiev S.P.  < proks AT logos DOT uptel DOT net >
2007-04-18 00:40:43 +00:00
Poul-Henning Kamp
4898b3a557 Add support for hw-assisted checksums on 6105M.
Sponsored by: Soekris Engineering
2007-04-17 22:59:54 +00:00
Pawel Jakub Dawidek
35e8a7fad7 Simplify. 2007-04-17 21:58:34 +00:00
Pawel Jakub Dawidek
a1bcf4dc7b - Fix a leftover - vfs_mount_alloc() is now exported properly.
This fixes strange panics when listing .zfs/snapshot/ directory for me.
  Reported by:	simokawa
  Reported by:	Johan Hendriks <Johan@double-l.nl>
- Hide cache_purge() under FREEBSD_NAMECACHE like in other files.
- Protect mnt_flag with mount interlock.
2007-04-17 21:16:34 +00:00
Pawel Jakub Dawidek
7760d8409f Export vfs_mount_alloc() as it is used in ZFS. 2007-04-17 21:14:06 +00:00
John Baldwin
88a5255bc4 Honor the BUS_DMA_NOCACHE flag to bus_dmamem_alloc() on amd64 and i386 by
mapping the pages as UC (uncacheable) using pmap_change_attr().

MFC after:	1 week
Requested by:	ariff
Reviewed by:	scottl
2007-04-17 21:05:34 +00:00
Pawel Jakub Dawidek
39db4c6e0f Ignore hostid check for root-on-ZFS configurations. Making hostid available
before the root is mounted is tricky and having it in /boot/ is not really
desirable.

Reported by:	Zephiris <zephiris@gmail.com>
2007-04-17 17:57:34 +00:00
Poul-Henning Kamp
c859cda5eb No need to throw tag+handle around on the stack. 2007-04-17 17:32:39 +00:00
Andre Oppermann
9eab54debf When we run into the syncache entry limits syncache_add() tries
to free the oldest entry in the current bucket row.  The global
entry limit may be smaller than the bucket rows and their limit
combined however.  Thus only try to free a syncache entry if we
found one in this bucket row.

Reported by:	kris
2007-04-17 15:25:14 +00:00
John Baldwin
90dea4f9a7 When trying to allocate a PnP BIOS memory resource, the code loops trying
to move up the start address until the allocation succeeds.  If the
alignment of the resource was 0, then the code would keep trying the same
request in an infinite loop and hang.  Force the request to always move
start up by at least 1 byte each time through the loop.
2007-04-17 15:14:23 +00:00
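Note: the hang described in 90dea4f9a7 comes from a retry loop whose step is the alignment. A minimal sketch of the fixed loop follows; it is not the PnP BIOS code, and `toy_try_alloc()`, `toy_find_start()` and the `first_free` parameter are hypothetical.

```c
#include <assert.h>

/* Hypothetical allocator stub: succeeds only at or above `first_free`. */
static int
toy_try_alloc(unsigned long start, unsigned long first_free)
{
    return start >= first_free;
}

/*
 * Probe upward from `start` until an allocation succeeds or `end` is passed.
 * Returns the chosen start, or `end + 1` if nothing fits.
 */
static unsigned long
toy_find_start(unsigned long start, unsigned long end, unsigned long align,
    unsigned long first_free)
{
    /* An alignment of 0 would never advance; step by at least 1 byte. */
    unsigned long step = align > 0 ? align : 1;

    assert(start <= end);
    while (start <= end) {
        if (toy_try_alloc(start, first_free))
            return start;
        start += step;
    }
    return end + 1;
}
```

With `align == 0` and no minimum step, `start += align` would retry the same address forever, which matches the infinite loop the commit message reports.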
Robert Watson
b63c567b6f Change $P4$ ID strings to P4 ID strings so that they are not auto-expanded
when integrated back into Perforce.  This avoids unnecessary conflicts
during the loopback of files maintained in Perforce.
2007-04-17 12:27:08 +00:00
Robert Watson
8b65d3135a Remove $P4$ that snuck into CVS from Perforce. 2007-04-17 12:24:18 +00:00
Poul-Henning Kamp
1c04bd82a5 Improve the if_vr driver ever so slightly.
The 6105M and 6102 do not have the DWORD alignment problem, so
don't m_defrag() every packet in the transmit path for those.

More stringent usage of tx-descriptor ring and its flags.

Tested on 6102 and 6105M, other chips may also be able to run
without the m_defrag() but I have neither hardware nor docs to
find out.

Sponsored by:	Soekris Engineering
2007-04-17 12:23:57 +00:00
Robert Watson
c9791cfb3e Shorten text string for ip_fw2 dynamic rules zone by removing the word
"zone", which is generally not present in zone names.  This reduces the
incidence of line-wrapping in "vmstat -z " using 80-column displays.

MFC after:	3 days
2007-04-17 09:28:36 +00:00
Scott Long
032b0a17dc Basic MPSAFE locking for the AHC and AHD drivers. 2007-04-17 06:26:25 +00:00
Warner Losh
1a13e01f7f Don't use spinlocks here. The iicbus transactions can take a long
time, and this prevents interrupts (say for Hz/hardclock) from
happening.  Time stands still during the transfers...
2007-04-17 05:48:35 +00:00
Scott Long
b653ca76bc Don't delete the devalias, as per the man page.
Submitted by: jmg
2007-04-17 01:12:35 +00:00
Andrew Thompson
18242d3b09 Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.
The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on:	current@
2007-04-17 00:35:11 +00:00
John Baldwin
2248f68064 - Add a 'show rman <rm>' DDB command to dump the resources in a resource
manager similar to 'devinfo -u'.
- Add a 'show allrman' DDB command that effectively does 'show rman' on all
  resource managers in the system.
2007-04-16 21:09:03 +00:00
Scott Long
84f824818c For the XPT_SASYNC_CB operation, only decouple the broadcast to the bus
and device lists instead of decoupling the whole operation.  This avoids
problems with SIMs going away.
2007-04-16 19:55:36 +00:00
Scott Long
f35487464c Drop the topology lock before calling the periph oninvalidate and dtor
vectors.
2007-04-16 19:42:23 +00:00
Scott Long
cd5c9285cd Drop the periph/sim lock when calling disk_destroy(). 2007-04-16 19:41:14 +00:00
Scott Long
d292906a7c Destroy the devalias before destroying the dev. 2007-04-16 19:40:13 +00:00
Robert Watson
0e92f0d7dd Merge OpenBSM 1.0 alpha 14 changes to src/sys/security/audit:
- au_to_attr64(), au_to_process64(), au_to_subject64(),
  au_to_subject64_ex(), au_to_zonename(), au_to_header64_tm().
- Extended address token fixes.

Obtained from:	TrustedBSD Project
2007-04-16 16:20:45 +00:00
Robert Watson
bfbc9a096b Update src/sys/bsm for OpenBSM 1.0 alpha 14 import.
Add new audit event types.
2007-04-16 16:13:10 +00:00
Pawel Jakub Dawidek
6b3d6017e8 s/destory/destroy/ (except for the code in contrib/). 2007-04-16 12:31:35 +00:00
Pawel Jakub Dawidek
8cb195f758 Uncomment forgotten check. Without this check in-place, ZFS will panic on
unload instead of returning EBUSY. This check tells if there are mounted
ZFS file systems or not. We can't unload if there are mounted file systems.

Reported by:	Andrey V. Elsukov <bu7cher@yandex.ru>
2007-04-16 10:23:24 +00:00
Kip Macy
d302816a12 PHYS_TO_VM_PAGE requires explicit vm_page.h include on sparc64 2007-04-15 22:17:10 +00:00
Robert Watson
215c8d75b8 Remove unused variable tcbinfo_mtx. 2007-04-15 21:03:23 +00:00
Dag-Erling Smørgrav
8edf8ae133 Avoid "unused variable" warning when building without PSEUDOFS_TRACE. 2007-04-15 20:35:18 +00:00
Matt Jacob
07589439e5 Use %j and args cast to uintmax_t to print bus_addr_t && length args. 2007-04-15 19:03:45 +00:00
Christian S.J. Peron
db8086c4fa Add an entry for AUT_ZONENAME and the prototype for the au_to_zonename()
function that will be implemented shortly. This is being done for the
openbsm import.
2007-04-15 17:24:41 +00:00
Dag-Erling Smørgrav
388596dffc Make pseudofs (and consequently procfs, linprocfs and linsysfs) MPSAFE. 2007-04-15 17:10:01 +00:00
Dag-Erling Smørgrav
b1f9e8cec9 Instead of stating GIANT_REQUIRED, just acquire and release Giant where
needed.  This does not make a difference now, but will when procfs is
marked MPSAFE.
2007-04-15 17:06:09 +00:00
Dag-Erling Smørgrav
78c3440e7d Whitespace cleanup. 2007-04-15 17:02:03 +00:00
Robert Watson
a0bda9d077 In nfsrv_rcv(), don't reacquire the nfs server lock until after
nfs_realign() has been called, as it may sleep waiting on memory
allocation.

Reported by:	simon
2007-04-15 15:50:50 +00:00
Kip Macy
2b6dbb2afa Add pmap includes needed by i386 2007-04-15 15:30:45 +00:00
Dag-Erling Smørgrav
302762c344 Fix the same bug as in procfs_doproc{,db}regs(): check that uio_offset is
0 upon entry, and don't reset it before returning.

MFC after:	3 weeks
2007-04-15 13:29:36 +00:00
Dag-Erling Smørgrav
66cd74a611 Don't reset uio_offset to 0 before returning. Instead, refuse to service
requests where uio_offset is not 0 to begin with.  This fixes a long-
standing bug where e.g. 'cat /proc/$$/regs' would loop forever.

MFC after:	3 weeks
2007-04-15 13:24:03 +00:00
Randall Stewart
f1d6e6dc71 Fix stupid syntax error - Pointy hat to me :-( 2007-04-15 13:03:14 +00:00
Dag-Erling Smørgrav
ab26caf6af Add macros to assert that the process is / isn't held in memory.
MFC after:	3 weeks
2007-04-15 12:59:49 +00:00
Randall Stewart
478d3f0901 - Add more comments to sctps_stats structure in sctp_uio.h
- Fix bug that prevented EEOR mode from working
  and simplified the can_we_split code in the process.
- Reduce lock contention for the tcb_send_lock. I did
  this especially for EEOR mode, still need to look at
  why I need a lock when removing from the tailq and the
  ->next is NOT null. A lock fixes it but it implies a
  bug yet exists.
- Activated Andre's proposed changes to better use the mbuf
  infrastructure.
- Fixed places that were not using the aloc macros to take
  advantage of the per assoc cache.
- Adds ifdef fix so any logging will enable stat_logging to
  get the right data structures in place (suggested by Max Laier).
2007-04-15 11:58:26 +00:00
Pawel Jakub Dawidek
7ae6548e62 MFp4: Start DNLC after desiredvnodes variable is initialized.
Before this change if zfs.ko was loaded by the loader, DNLC was
automatically disabled.

Reported by:	Zephiris <zephiris@gmail.com>
2007-04-15 09:10:17 +00:00
Scott Long
2b83592fdc Remove Giant from CAM. Drivers (SIMs) now register a mutex that CAM will
use to synchronize and protect all data objects that are used for that
SIM.  Drivers that are not yet MPSAFE register Giant and operate as
usual.  Right now, no drivers are MPSAFE, though a few will be changed
in the coming week as this work settles down.

The driver API has changed, so all CAM drivers will need to be recompiled.
The userland API has not changed, so tools like camcontrol do not need to
be recompiled.
2007-04-15 08:49:19 +00:00
Kip Macy
4f450d951a back out option to disable packet zone
Requested by: sam
2007-04-15 06:30:28 +00:00
Kip Macy
ba68b814cc suck in more of busdma to enable more efficient mappings
kill redundant INVARIANTS check
2007-04-15 05:46:34 +00:00
Kip Macy
d43f50b93a Add sysctl for disabling/enabling mbuf chain collapsing
remove map creation before calling bus_dmamap_load_mvec_sg
2007-04-15 05:45:10 +00:00
Kip Macy
52c81add3c Implement ZERO_COPY_SOCKETS check in a way that doesn't make LINT unhappy 2007-04-15 04:55:39 +00:00
Pawel Jakub Dawidek
87e89536f1 Fix RAID-Z resilvering.
Obtained from:	OpenSolaris
2007-04-14 20:50:14 +00:00
Kip Macy
51580731ae Add support for mbuf iovec in the TX path 2007-04-14 20:40:22 +00:00
Kip Macy
642046797b add reference count pointer to mbuf iovec
implement robust version of m_collapse
add support for sf_buf
add fix for m_iovappend
add calls to m_sanity under INVARIANTS
fix m_freem_vec to correctly traverse the mbuf iovec chain
2007-04-14 20:38:38 +00:00
Kip Macy
21c5f3f383 hide static declaration
remove extra white space
2007-04-14 20:31:05 +00:00
Kip Macy
21ee3e7aff remove now invalid check from m_sanity
panic on m_sanity check failure with INVARIANTS
2007-04-14 20:19:16 +00:00
Kip Macy
f8bbd17f06 Add option for disabling allocation from the packet zone 2007-04-14 20:16:03 +00:00
Kip Macy
38073c4181 pad out m_hdr to make pkthdr word-aligned
shuffle pkthdr.len so that pkthdr.header is aligned without compiler added padding

Reviewed by: rwatson, andre, sam
2007-04-14 19:42:20 +00:00
Max Laier
d0cf96b407 Fix a typeo - unbreak the build. 2007-04-14 18:27:34 +00:00
Maxim Konovalov
461f64fe8c o Add bsm and security to a list of cscope dirs. 2007-04-14 16:29:15 +00:00
Pawel Jakub Dawidek
d48078479c MFp4: Hmm, it seems to work now. 2007-04-14 15:01:50 +00:00
Dag-Erling Smørgrav
f61bc4ea5e Further pseudofs improvements:
The pfs_info mutex is only needed to lock pi_unrhdr.  Everything else
in struct pfs_info is modified only while Giant is held (during
vfs_init() / vfs_uninit()); add assertions to that effect.

Simplify pfs_destroy somewhat.

Remove superfluous arguments from pfs_fileno_{alloc,free}(), and the
assertions which were added in the previous commit to ensure they were
consistent.

Assert that Giant is held while the vnode cache is initialized and
destroyed.  Also assert that the cache is empty when it is destroyed.

Rename the vnode cache mutex for consistency.

Fix a long-standing bug in pfs_getattr(): it would uncritically return
the node's pn_fileno as st_ino.  This would result in st_ino being 0
if the node had not previously been visited by readdir(), and also in
an incorrect st_ino for process directories and any files contained
therein.  Correct this by abstracting the fileno manipulations
previously done in pfs_readdir() into a new function, pfs_fileno(),
which is used by both pfs_getattr() and pfs_readdir().
2007-04-14 14:08:30 +00:00
Pawel Jakub Dawidek
8aff52ca4e MFp4: Use max_ncpus, which is used in other places in the code. 2007-04-14 12:33:47 +00:00
Pawel Jakub Dawidek
8870baf005 MFp4: Add more debug, so we can see if zpool.cache was loaded or why it
wasn't loaded.
2007-04-14 12:23:03 +00:00
Pawel Jakub Dawidek
dbd490e0e2 MFp4: Allow to tune vfs.zfs.debug from loader.conf. 2007-04-14 12:21:06 +00:00
Pawel Jakub Dawidek
c98fbf0418 MFp4:
- Allow tuning the number of spa_zio_* threads.
- Reduce the default number of spa_zio_* threads to N*spa_zio_issue
  plus N*spa_zio_intr threads per ZIO type, where N is the number
  of CPUs.
- Put the ZIO type number in the thread's name.
2007-04-14 12:20:06 +00:00
Robert Watson
d72a615878 Some Linux applications (ping) pass a non-NULL msg_control argument to
sendmsg() while using a 0-length msg_controllen.  This isn't allowed in
the FreeBSD system call ABI, so detect this case and set msg_control to
NULL.  This allows Linux ping to work.

Submitted by:	rdivacky
2007-04-14 10:35:09 +00:00
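The normalization described in d72a615878 can be sketched in userland C as follows. This is a minimal illustration under stated assumptions, not the compatibility-layer code itself: `struct msg_view` and `normalize_control()` are hypothetical names standing in for the two msghdr fields the commit mentions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two msghdr fields named in the commit. */
struct msg_view {
	void	*control;	/* plays the role of msg_control */
	size_t	 controllen;	/* plays the role of msg_controllen */
};

/*
 * A non-NULL control pointer paired with a zero length carries no
 * ancillary data, so treat it as "no control message at all".
 */
static void
normalize_control(struct msg_view *m)
{
	if (m->control != NULL && m->controllen == 0)
		m->control = NULL;
}
```

A caller would run this once on the copied-in header before handing it to the native send path; headers that carry real control data pass through unchanged.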
Randall Stewart
c105859eee - fix source address selection when picking an acceptable address
- name change of prefered -> preferred
- CMT fast recover code added.
- Comment fixes in CMT.
- We were not giving a reason of cant_start_asoc per socket api
  if we failed to get init/or/cookie to bring up an assoc. Change
  so we don't just give a generic "comm lost" but look at actual
  states of dying assoc.
- change "crc32" arguments to "crc32c" to silence strict/noisy
  compiler warnings when crc32() is also declared
- A few minor tweaks to get the portable stuff truly portable
  for sctp6_usrreq.c :-D
- one-2-one style vrf match problem.
- window recovery would leave chks marked for retran
  during window probes on the sent queue. This would then
  cause an out-of-order problem and assure that the flight
  size "problem" would occur.
- Solves a flight size logging issue that caused rwnd
  overruns, flight size off as well as false retransmissions.
- Macroize the up and down of flight size.
- Fix a ECNE bug in its counting.
- The strict_sacks options was causing aborts when window probing
  was active, fix to make strict sacks a bit smarter about what
  the next unsent TSN is.
- Fixes a one-2-one wakeup bug found by Martin Kulas.
- If-defed out form, Andre's copy routines pending his
  commit of at least m_last().. need to adjust for 6.2 as
  well.. since m_last won't exist.
Reviewed by:	gnn
2007-04-14 09:44:09 +00:00
Bruce M Simpson
05d91e4363 In member interface detach event handler, do not attempt to free state
which has already been freed by in_ifdetach(). With this cumulative change,
the removal of a member interface will not cause a panic in pfsync(4).

Requested by:	yar
PR:		86848
2007-04-14 01:01:46 +00:00
Pawel Jakub Dawidek
24b0502ee0 Fix jails and jail-friendly file systems handling:
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
  for jailed root from different jails, etc.

OK'ed by:	rwatson
2007-04-13 23:54:22 +00:00
Pawel Jakub Dawidek
bd59d85850 Fix overflow, which was causing endless loops when 32bit machine had more
than 2GB of RAM. This was because our physmem is long and 'physmem*PAGESIZE'
can be negative for more than 2GB of memory.

Reported by:	Andrey V. Elsukov <bu7cher@yandex.ru>

It is not yet tested by Andrey, so there can be other problems, but this
was definitely a bug, so I'm committing a fix now.
2007-04-13 18:50:03 +00:00
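The overflow in bd59d85850 can be reproduced with a small sketch. Everything here is an assumption made for illustration: a 4096-byte page, the names `bytes_as_32bit_long()` and `bytes_widened()`, and the use of unsigned arithmetic to emulate a 32-bit `long` (signed overflow itself is undefined in C, so the wrap is modelled explicitly).

```c
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u	/* assumption: 4 KB pages */

/*
 * Emulate "pages * PAGE_SIZE" computed in a 32-bit long: the product
 * wraps modulo 2^32 and is then reinterpreted as a signed value
 * (implementation-defined, two's complement on common compilers).
 */
static int32_t
bytes_as_32bit_long(uint32_t pages)
{
	uint32_t wrapped = pages * ASSUMED_PAGE_SIZE;

	return ((int32_t)wrapped);
}

/* Widen before multiplying so the product cannot wrap. */
static uint64_t
bytes_widened(uint32_t pages)
{
	return ((uint64_t)pages * ASSUMED_PAGE_SIZE);
}
```

With 786432 pages (3 GB), the 32-bit variant yields a negative byte count, which is the kind of value that can keep a size-based loop from terminating; the widened variant yields 3221225472.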
Maxim Konovalov
b274ce9ef2 o Extend the list of supported CDMA-2000 terminals.
Submitted by:	R.Mahmatkhanov
MFC after:	10 days
2007-04-13 18:15:07 +00:00
Alan Cox
0b76504872 Eliminate the misuse of PG_FRAME to truncate a virtual address to a virtual
page boundary.

Reviewed by: ru@
2007-04-13 16:07:29 +00:00
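As a hedged illustration of the point made in 0b76504872: a virtual address is truncated to a page boundary by clearing its page-offset bits, which is a different job from masking the frame bits of a page table entry. The names below (`SK_PAGE_SIZE`, `SK_PAGE_MASK`, `sk_trunc_page()`) are hypothetical and assume 4 KB pages; they are not the kernel's macros.

```c
#include <stdint.h>

#define SK_PAGE_SIZE	4096u			/* assumption: 4 KB pages */
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)	/* page-offset bits */

/* Clear the offset bits; the mask is as wide as the address itself. */
static uintptr_t
sk_trunc_page(uintptr_t va)
{
	return (va & ~(uintptr_t)SK_PAGE_MASK);
}
```

Deriving the mask from the page size keeps it independent of the page table entry layout, which may be wider than a virtual address on some configurations.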
Christian S.J. Peron
f0cbfcc468 Fix the handling of IPv6 addresses for subject and process BSM audit
tokens. Currently, we do not support the set{get}audit_addr(2) system
calls which allows processes like sshd to set extended or ip6
information for subject tokens.

The approach that was taken was to change the process audit state
slightly to use an extended terminal ID in the kernel. This allows
us to store both IPv4 and IPv6 addresses. In the case that an IPv4 address
is in use, we convert the terminal ID from a struct auditinfo_addr to
a struct auditinfo.

If getaudit(2) is called when the subject is bound to an ip6 address,
we return E2BIG.

- Change the internal audit record to store an extended terminal ID
- Introduce ARG_TERMID_ADDR
- Change the kaudit <-> BSM conversion process so that we are using
  the appropriate subject token. If the address associated with the
  subject is IPv4, we use the standard subject32 token. If the subject
  has an IPv6 address associated with them, we use an extended subject32
  token.
- Fix a couple of endian issues where we do a couple of byte swaps when
  we shouldn't be. IP addresses are already in the correct byte order,
  so reading the ip6 address 4 bytes at a time and swapping them results
  in incorrect address data. It should be noted that the same issue was
  found in the openbsm library and it has been changed there too on the
  vendor branch.
- Change A_GETPINFO to use the appropriate structures
- Implement A_GETPINFO_ADDR which basically does what A_GETPINFO does,
  but can also handle ip6 addresses
- Adjust get{set}audit(2) syscalls to convert the data
  auditinfo <-> auditinfo_addr
- Fully implement set{get}audit_addr(2)

NOTE: This adds the ability for processes to correctly set extended subject
information. The appropriate userspace utilities still need to be updated.

MFC after:	1 month
Reviewed by:	rwatson
Obtained from:	TrustedBSD
2007-04-13 14:55:19 +00:00
Pawel Jakub Dawidek
f0bc5ac3e1 Fix vnodes starvation caused by DNLC (ZFS name cache):
- Tune number of namecache entries better (based on desiredvnodes).
- Handle vfs_lowvnodes event by releasing requested number of name cache
  entries, but no less than 5%.

Reported by:	simokawa
2007-04-13 08:42:01 +00:00
Pawel Jakub Dawidek
6bc3ab2574 When we are running low on vnodes, there is currently no way to ask other
subsystems to release some vnodes. Implement backpressure based on
vfs_lowvnodes event (similar to vm_lowmem for memory).
2007-04-13 08:38:48 +00:00
Pawel Jakub Dawidek
6704017a15 MFp4: Synchronize with vendor (mostly 'zfs rename -r'). 2007-04-12 23:16:02 +00:00
Pawel Jakub Dawidek
1da61b3665 MFp4: Bring back comments.
Requested by:	jhb
2007-04-12 23:14:25 +00:00
Lukas Ertl
a2237c41fc -) Correct sdcount for a plex when removing or adding subdisks.
-) Set correct sizes for plexes and volumes after a subdisk has been removed.

Submitted by:   Ulf Lilleengen <lulf_AT_freebsd.org>
2007-04-12 17:54:35 +00:00
Lukas Ertl
9e357b05da Avoid infinite loop if the device string given for a drive
only consists of "/".

Submitted by:  Ulf Lilleengen <lulf_AT_freebsd.org>
2007-04-12 17:40:44 +00:00
Alan Cox
1434c1f34b MFamd64
Define PGEX_RSV.
2007-04-12 17:00:56 +00:00
Ruslan Ermilov
f76f6cfd25 Fix PAE on SMP by enabling EFER_NXE on all APs.
Reported by:	kris
Diagnosed by:	alc
2007-04-12 11:05:24 +00:00
Kip Macy
aa84193acf restore sense to get_imm_packet
MFC after: 3 days
2007-04-12 04:48:54 +00:00
Kip Macy
98d6fba71d switch over to per-txq dma tag to facilitate parallelism on TX
MFC after: 3 days
2007-04-12 04:31:44 +00:00
Kip Macy
dd782506d8 explicitly check TSO flag
don't clear and then set M_PKTHDR, m_gethdr sets it correctly
improve error handling on m_gethdr failure

MFC after: 3 days
2007-04-12 03:33:30 +00:00
Kip Macy
23ed7b513f Add ETHER_HDR_LEN to hardware accepted mtu
MFC after: 3 days
2007-04-12 03:07:24 +00:00
Andrew Thompson
575156b607 Fix a case where the multicast addresses were not removed from some ports. The
first port to be removed from the trunk would free the multicast list so
subsequent removed ports didn't have their multicast addresses removed.
2007-04-12 01:58:57 +00:00
Andre Oppermann
dfd389bf64 Add m_last() inline function. 2007-04-11 23:13:12 +00:00
Dag-Erling Smørgrav
15bad11fdb Add a flag to struct pfs_vdata to mark the vnode as dead (e.g. process-
specific nodes when the process exits)

Move the vnode-cache-walking loop which was duplicated in pfs_exit() and
pfs_disable() into its own function, pfs_purge(), which looks for vnodes
marked as dead and / or belonging to the specified pfs_node and reclaims
them.  Note that this loop is still extremely inefficient.

Add a comment in pfs_vncache_alloc() explaining why we have to purge the
vnode from the vnode cache before returning, in case anyone should be
tempted to remove the call to cache_purge().

Move the special handling for pfstype_root nodes into pfs_fileno_alloc()
and pfs_fileno_free() (the root node's fileno must always be 2).  This
also fixes a bug where pfs_fileno_free() would reclaim the root node's
fileno, triggering a panic in the unr code, as that fileno was never
allocated from unr to begin with.

When destroying a pfs_node, release its fileno and purge it from the
vnode cache.  I wish we could put off the call to pfs_purge() until
after the entire tree had been destroyed, but then we'd have vnodes
referencing freed pfs nodes.  This probably doesn't matter while we're
still under Giant, but might become an issue later.

When destroying a pseudofs instance, destroy the tree before tearing
down the fileno allocator.

In pfs_mount(), acquire the mountpoint interlock when required.

MFC after:	3 weeks
2007-04-11 22:40:57 +00:00
Robert Watson
949da0d8f8 Remove obsolete comment about privileges: SUSER_ALLOWJAIL is no longer set
in this code.
2007-04-11 16:31:02 +00:00
Robert Watson
94b94b2b49 Remove now-obsolete comment regarding mqueue privileges in jail. 2007-04-11 16:22:59 +00:00
Ruslan Ermilov
7480de4305 Make "struct tcp_timer" visible only to the kernel, and unbreak world. 2007-04-11 14:08:42 +00:00
John Baldwin
e403490aa6 Fix m_freem_vec() to actually traverse the mbuf chain. This avoids
double free's and an infinite loop.

CID:		1834
Found by:	Coverity Prevent (tm)
2007-04-11 13:47:24 +00:00
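The class of bug fixed in e403490aa6 (a free loop that does not advance along the chain) can be sketched on a generic singly linked list. This is not the mbuf code: `struct node`, `make_chain()` and `free_chain()` are hypothetical names used only to show the traversal pattern.

```c
#include <stdlib.h>

/* Hypothetical chain element; a real mbuf carries far more state. */
struct node {
	struct node *next;
};

/* Build a chain of up to len nodes by prepending. */
static struct node *
make_chain(size_t len)
{
	struct node *head = NULL;

	for (size_t i = 0; i < len; i++) {
		struct node *n = malloc(sizeof(*n));

		if (n == NULL)
			break;
		n->next = head;
		head = n;
	}
	return (head);
}

/*
 * Free every node exactly once.  The successor is read before the
 * current node is released, and the cursor always advances, so the
 * loop neither frees a node twice nor spins forever.
 */
static size_t
free_chain(struct node *n)
{
	size_t freed = 0;

	while (n != NULL) {
		struct node *next = n->next;

		free(n);
		n = next;
		freed++;
	}
	return (freed);
}
```

The returned count is only there to make the traversal observable; a kernel routine would normally return nothing.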
John Baldwin
4796ce4956 Group the loop to acquire/release Giant with the WITNESS_SAVE/RESTORE under
a single conditional.  The two operations are linked, but since the link
is not very direct, Coverity can't see it.  Humans might also miss the
link as well.  So, this isn't fixing any actual bugs, just improving
readability.

CID:		1787 (likely others as well)
Found by:	Coverity Prevent (tm)
2007-04-11 13:44:55 +00:00
Ruslan Ermilov
9fd6e3d4a4 This commit was generated by cvs2svn to compensate for changes in r168616,
which included commits to RCS files with non-trunk default branches.
2007-04-11 11:09:18 +00:00
Ruslan Ermilov
1859f337c4 Unbreak world build. 2007-04-11 11:09:18 +00:00
Andre Oppermann
b8152ba793 Change the TCP timer system from using the callout system five times
directly to a merged model where only one callout, the next to fire,
is registered.

Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.

The single new callout is a mutex callout on inpcb simplifying the
locking a bit.

tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.

Reviewed by:	rwatson (earlier version)
2007-04-11 09:45:16 +00:00
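The merged timer model from b8152ba793 boils down to arming a single callout for whichever timer is due first. A minimal sketch of that selection step, assuming five timers per connection and a deadline of 0 meaning "inactive" (both the encoding and the names `struct timer_set` and `next_to_fire()` are hypothetical, not taken from the TCP code):

```c
#include <stdint.h>

#define SK_NTIMERS 5	/* assumption: five timers, as the commit implies */

/* Hypothetical per-connection timer state; 0 means "not armed". */
struct timer_set {
	uint64_t deadline[SK_NTIMERS];
};

/*
 * Return the index of the armed timer with the earliest deadline,
 * or -1 if none is armed.  Only this one would get a callout.
 */
static int
next_to_fire(const struct timer_set *t)
{
	int best = -1;

	for (int i = 0; i < SK_NTIMERS; i++) {
		if (t->deadline[i] == 0)
			continue;
		if (best < 0 || t->deadline[i] < t->deadline[best])
			best = i;
	}
	return (best);
}
```

Whenever a timer is activated or stopped, the selection is recomputed and the single callout is rescheduled for the new earliest deadline.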
Nate Lawson
1b96d500fb Put some overly verbose prints under bootverbose. This is on the vendor
branch but we need to work out a different interface with the vendor.
2007-04-11 02:03:36 +00:00
Nate Lawson
c1149e97bb This commit was generated by cvs2svn to compensate for changes in r168609,
which included commits to RCS files with non-trunk default branches.
2007-04-11 02:03:36 +00:00
Pyun YongHyeon
b5898b804f Add work around for hardware Tx checksum offload bug in Yukon II.
Yukon II generated corrupted TCP checksum for short TCP packets
that's less than 60 bytes in size(e.g. window probe packet, pure ACK
packet etc). Padding the frame with zeros to make the frame minimum
ethernet frame size didn't work at all. Instead of dropping Tx
checksum offload support we calculate TCP checksum with S/W method
when we encounter short TCP frames.
Fortunately it seems that short UDP datagrams appear to be handled
correctly by Yukon II.

While I'm here simplify ethernet/VLAN header size calculation logic.

PR:	111384
2007-04-11 00:47:29 +00:00
Pawel Jakub Dawidek
7f64b05f79 Move rpc/types.h under sys/, as this is used by ZFS kernel module.
Repo-copied by:	simon
2007-04-10 22:10:16 +00:00
Wojciech A. Koszek
f7caeade24 strchr() and strrchr() are already present in the kernel, but with less
popular names. Hence:

- comment current index() and rindex() functions, as these serve the same
  functionality as, respectively, strchr() and strrchr() from userland;
- add inlined version of strchr() and strrchr(), as we tend to use them more
  often;
- remove str[r]chr() definitions from ZFS code;

Reviewed by:	pjd
Approved by:	cognet (mentor)
2007-04-10 21:42:12 +00:00
Pawel Jakub Dawidek
fef2a25971 Remove trailing '.' for consistency! 2007-04-10 21:40:13 +00:00
Scott Long
6eef46be3b Whitespace fixes 2007-04-10 21:37:37 +00:00
Marius Strobl
5abeece6ab Let brgphy(4) attach for the Broadcom BCM5755 ASIC based chipsets
as well.

Obtained from:	OpenBSD
MFC after:	1 week
2007-04-10 20:43:23 +00:00
Marius Strobl
a404cff674 On i386 compile the back-end with EISA support as well as the EISA
front-end if the dpt(4) module is built along with a kernel that
includes eisa(4) or when compiling it stand-alone (logic based on
the corresponding ISA logic in sys/modules/sound/sound/Makefile).
As a side-effect this fixes the stand-alone build of the dpt(4)
module after dpt.h 1.17, dpt_eisa.c 1.22 and dpt_scsi.c 1.55.

Breakage reported by:	n_hibma
2007-04-10 20:33:31 +00:00
Scott Long
715ab2120d A fix for the SG_GET_TIMEOUT function slipped into a previous commit by
accident.  Remove the text describing the problem as it is no longer
relevant.  Also give real implementations for the GET and SET ioctls.
2007-04-10 20:03:42 +00:00
Pawel Jakub Dawidek
57bcf75fd2 Add UFS_GJOURNAL options to the GENERIC kernel.
Approved by:	re (kensmith)
2007-04-10 16:49:41 +00:00
Robert Watson
e8c5c7a635 Update comment regarding how we check privilege on FreeBSD: we now use
priv_check().
2007-04-10 16:09:00 +00:00
Robert Watson
4b08405682 Allow PRIV_NETINET_REUSEPORT in jail. 2007-04-10 15:59:49 +00:00
Robert Watson
6493245ded Add a new privilege, PRIV_NETINET_REUSEPORT, which will replace superuser
checks to see whether bind() can reuse a port/address combination while
it's already in use (for some definition of use).
2007-04-10 15:58:38 +00:00
Robert Watson
cc807dbd0a Remove unnecessary suser() check in the sysctl to set up ath_hal
logging: the sysctl framework will already have checked for privilege
if a sysctl value is being set.

Discussed a long time ago with:	sam
2007-04-10 15:48:45 +00:00
Robert Watson
9956b3f5e4 Do allow POSIX mqueue unlink privilege inside a jail, as we allow all
other POSIX mqueue privileges inside a jail.
2007-04-10 15:40:27 +00:00
Pawel Jakub Dawidek
08be819487 Minor style cleanups (mostly removal of trailing whitespaces). 2007-04-10 15:29:37 +00:00
Pawel Jakub Dawidek
21ff8c6715 Correct typos. 2007-04-10 15:22:40 +00:00
Pawel Jakub Dawidek
1b6e2c02fe MFp4: Allow to set zfs_recover via vfs.zfs.recover from /boot/loader.conf. 2007-04-10 12:54:19 +00:00
Pawel Jakub Dawidek
5b9528e2d4 MFp4: Hide under '#ifdef _KERNEL' only what's really needed. 2007-04-10 12:52:14 +00:00
Giorgos Keramidas
a52da38f26 Minor typo fix, noticed while I was going through *_pager.c files. 2007-04-10 12:34:51 +00:00
Konstantin Belousov
5b959aa44f Fix the NAMEI zone leak when snapshot was successfully created.
Reported and tested by:	Peter Holm
MFC after:		2 weeks
2007-04-10 09:31:42 +00:00
Konstantin Belousov
9724167c2a Recalculate the NEWBLOCK flag for pagedep structure after the softdep
lock is dropped, since pagedep may be already processed and deallocated.

Found and tested by:	kris
MFC after:		2 weeks
2007-04-10 09:30:41 +00:00
Konstantin Belousov
23743f6a11 When LK_NOWAIT is passed as argument to process_worklist_item(), this
does not prevent handle_workitem_remove() from recursing into a blocking
version. Add the dirrem to worklist instead of processing it now if this
is the case.

Reported and tested by:	kris
Submitted by:		tegge
MFC after:		2 weeks
2007-04-10 09:28:17 +00:00
Andrew Thompson
49fd43bdbc Fix an uninitialized variable warning. 2007-04-10 08:02:33 +00:00
Andrew Thompson
40c97c2118 Fix build, trunk is a device not an option. 2007-04-10 03:09:38 +00:00
Pawel Jakub Dawidek
2d03e33170 Try to stabilize ZFS with regard to memory consumption:
- Allow to shrink ARC down to 16MB (instead of 64MB).
- Set arc_max to 1/2 of kmem_map by default.
- Start freeing things earlier when low memory situation is detected.
- Serialize execution of arc_lowmem().

I decided to setup minimum ZFS memory requirements to 512MB of RAM and 256MB of
kmem_map size. If there is less RAM or kmem_map, a warning will be printed.
World is cruel, be no better. In other words: modern file system requires
modern hardware:)

From ZFS administration guide:

"Currently the minimum amount of memory recommended to install a Solaris
 system is 512 Mbytes. However, for good ZFS performance, at least one
 Gbyte or more of memory is recommended."
2007-04-10 02:35:57 +00:00
Pawel Jakub Dawidek
52124c7f1c Reduce diff against vendor - we have now stronger check for "mutex already
initialized", so we can go back to kmem_alloc().
2007-04-10 02:19:12 +00:00
Andrew Thompson
75efd6fd67 Add trunk(4) module. 2007-04-10 00:41:31 +00:00
Andrew Thompson
7b62d98bf8 Hook trunk(4) up to the build. 2007-04-10 00:35:31 +00:00
Andrew Thompson
b47888ceba Add the trunk(4) driver for providing link aggregation, failover and fault
tolerance.  This driver allows aggregation of multiple network interfaces as
one virtual interface using a number of different protocols/algorithms.

failover    - Sends traffic through the secondary port if the master becomes
              inactive.
fec         - Supports Cisco Fast EtherChannel.
lacp        - Supports the IEEE 802.3ad Link Aggregation Control Protocol
              (LACP) and the Marker Protocol.
loadbalance - Static loadbalancing using an outgoing hash.
roundrobin  - Distributes outgoing traffic using a round-robin scheduler
              through all active ports.

This code was obtained from OpenBSD and this also includes 802.3ad LACP support
from agr(4) in NetBSD.
2007-04-10 00:27:25 +00:00
Pawel Jakub Dawidek
0404b7791b Remove unused #define. 2007-04-09 23:30:28 +00:00
Andrew Thompson
6429a5cb9b Fix a compiler warning so hash.h can be included in the kernel. This changes
the args for hash32_stre and hash32_strne but there are no consumers in the
base system and openbgpd does not use it which the initial import was for.

Silence on:	hackers
2007-04-09 22:55:14 +00:00
Pawel Jakub Dawidek
6db107202a Fix build breakage. 2007-04-09 22:29:13 +00:00
Pawel Jakub Dawidek
151db24af1 Add zfs_load here.
Reminded by:	bmah
2007-04-09 22:09:09 +00:00
Nate Lawson
a363f67a81 Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec
if a race was lost.  We're still single-threaded at this point, but just
be safe for the future.
2007-04-09 21:10:04 +00:00
Nate Lawson
6b1e469ea5 Clean up the root mount and mount wait code. No mutexes are needed here
since a spurious wakeup() is the only possible outcome and this is fine in
the BSD programming model.
2007-04-09 19:23:52 +00:00
Pawel Jakub Dawidek
82068fe7a9 Add kern.hostuuid sysctl, which will be used to keep host's UUID.
Reviewed by:	mlaier, rink, brooks, rwatson
2007-04-09 19:18:09 +00:00
Paolo Pisati
d640d2e29d The old PacketAlias* API is not exported when
libalias run in kernel land.
2007-04-09 17:08:27 +00:00
Kip Macy
a53b1c1753 throw sun4v into the check while we're at it 2007-04-09 17:05:54 +00:00
Kip Macy
3a0a4ac13d busdma tags are opaque on all architectures except sparc64
for now simply don't compile/use on sparc64
2007-04-09 17:01:23 +00:00
Alexander Kabaev
74c7f74304 LINT on ia64 requires memset symbol too. Make sure it is present by adding
it to libkern on this architecture.
2007-04-09 14:02:18 +00:00
Andre Oppermann
cc9164e2e6 Sort sctp_*.c files. 2007-04-09 12:51:29 +00:00
Scott Long
4400b36d94 Make use of M_ZERO in various malloc calls. 2007-04-09 05:47:32 +00:00
Scott Long
472cdbef04 Fix a logic bug that slipped in at the last minute and apparently escaped
testing.
2007-04-09 05:43:02 +00:00
Pawel Jakub Dawidek
24bda1641f Instead of detecting if lock is already initialized based on standard 1 bit
check, use more accurate 13 bits check. We had too many false-positives with
the standard check.

Reported by:	mlaier
2007-04-09 01:05:31 +00:00
Pawel Jakub Dawidek
1868634782 Always try to load zpool.cache instead of trying to find good place to
document it. When there is no such file, it's invisible for the user.
2007-04-09 00:04:54 +00:00
Pawel Jakub Dawidek
33fc425c85 We don't have to wait for the root file system to be mounted anymore, now that
kobj KPI supports operating on files loaded by the loader.
2007-04-09 00:03:45 +00:00
Pawel Jakub Dawidek
5fc5d6ed61 Drop the Giant lock before calling zfs_domount(), which is held when
mounting root file system.
2007-04-09 00:02:11 +00:00
Pawel Jakub Dawidek
f92cb15e7b Move zpool.cache from /etc/zfs/ to /boot/zfs/, so we can keep it on
dedicated /boot/ file system and use ZFS for the root file system.
2007-04-08 23:59:39 +00:00
Pawel Jakub Dawidek
bdebccf9b9 Extend kobj compatibility KPI to support operating on files before and
after the root file system is mounted.
This is one of the changes that will allow to put root file system on ZFS.
2007-04-08 23:57:08 +00:00
Pawel Jakub Dawidek
df3aed4f96 Use root_mounted(). 2007-04-08 23:54:23 +00:00
Pawel Jakub Dawidek
2eb68d493f Add root_mounted() function that returns true if the root file system is
already mounted.
2007-04-08 23:54:01 +00:00
Kip Macy
dc5a36e241 Add missing paren 2007-04-08 22:56:18 +00:00
Xin LI
9e3edba677 Bump __FreeBSD_version for CAM sg addition.
Requested by:	bsam
2007-04-08 22:45:20 +00:00
Søren Schmidt
ae4ce3ceef OK, this is not my day, fix the former fix :/ 2007-04-08 21:53:52 +00:00
Søren Schmidt
f27a14650f Hopefully unbreak the 64bit DMA support this time. 2007-04-08 19:18:51 +00:00
Kip Macy
cae1990513 remove stale variable reference 2007-04-08 18:02:37 +00:00
Pawel Jakub Dawidek
ffe54ff0ec MFp4: Synchronize with recent OpenSolaris changes. 2007-04-08 16:29:25 +00:00
Kip Macy
db2faf119f add busdma function for mapping mbuf iovecs
change m_collapse to return an error code
2007-04-08 15:59:07 +00:00
Pawel Jakub Dawidek
425d75486e - Use 'name=value' so it can be properly recognized by devd(8).
- Use only subclass as devd's type.
2007-04-08 15:55:48 +00:00
Søren Schmidt
cd945eed47 Don't zero out 64BIT flag on DMA ops. 2007-04-08 15:31:39 +00:00
Kip Macy
27f0ce0f2b hook uipc_mvec.c into build 2007-04-08 15:18:03 +00:00
Kip Macy
c0a24dd4aa Convert driver RX path over to using mbuf iovec 2007-04-08 15:04:19 +00:00
Kip Macy
a8d9a363f5 Add driver private mbuf iovec support routines 2007-04-08 14:56:16 +00:00
Pawel Jakub Dawidek
c2cda60911 prison_free() can be called with a mutex held. This wasn't a problem until
I converted allprison_mtx mutex to allprison_lock sx lock. To fix this LOR,
move prison removal to prison_complete() entirely. To ensure that no one
will reference this prison before it's being removed from the list, skip
prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref has to be
greater than 0 in prison_hold().

Reported by:	kris
OK'ed by:	rwatson
2007-04-08 10:46:23 +00:00
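The lookup rule described in c2cda60911 (skip entries whose reference count has already dropped to zero, and assert a positive count when taking a new reference) can be sketched as below. This is a simplified model: `struct obj`, `obj_find()` and `obj_hold()` are hypothetical names, and the locking that the real code relies on is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element standing in for a jail structure. */
struct obj {
	int		 id;
	int		 ref;	/* 0 means "being torn down" */
	struct obj	*next;
};

/* Find a live entry by id; entries with ref == 0 are invisible. */
static struct obj *
obj_find(struct obj *list, int id)
{
	for (struct obj *o = list; o != NULL; o = o->next) {
		if (o->ref == 0)
			continue;	/* dying, must not be handed out */
		if (o->id == id)
			return (o);
	}
	return (NULL);
}

/* Take another reference; only legal on an entry that is still alive. */
static void
obj_hold(struct obj *o)
{
	assert(o->ref > 0);
	o->ref++;
}
```

Because a dying entry can still sit on the list until its removal completes, hiding it in the lookup is what keeps new references from appearing after the count reached zero.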
Pawel Jakub Dawidek
61cfeccd58 Take vnode pointer and hold it under znode lock, so we won't race with
zfs_reclaim(). This may or may not fix problem reported by kris, but it's
definitely better that way.
2007-04-08 10:29:14 +00:00
Pawel Jakub Dawidek
b63b0c6529 Only use prison mutex to protect the fields that need to be protected by it. 2007-04-08 10:21:38 +00:00
Ariff Abdullah
319276aac0 Disable cmi_midiattach(). The implementation is incomplete, and causing
various interesting memory leak issues.
2007-04-08 07:52:27 +00:00
Pawel Jakub Dawidek
264de85e73 pr_list is protected by the allprison_lock. 2007-04-08 02:13:32 +00:00
Pawel Jakub Dawidek
3dc4488c91 Move atomic.S files to directories that better fit OpenSolaris directory
layout.
2007-04-07 23:54:54 +00:00
Pawel Jakub Dawidek
e321494eca Fix libzpool compilation.
Reported by:	des
2007-04-07 23:47:14 +00:00
Pawel Jakub Dawidek
9a691cb33a Limit the number of system taskq threads to the number of CPUs.
They are only used when there is a need for reducing namecache.

Observed by:	kris, csjp
2007-04-07 21:41:11 +00:00
Scott Long
1eba4c7948 Add the CAM 'SG' peripheral device. This device implements a subset of the
Linux SCSI SG passthrough device API.  The intention is to allow for both
running of Linux apps that want to talk to /dev/sg* nodes, and to facilitate
porting of apps from Linux to FreeBSD.  As such, both native and linuxolator
entry points and definitions are provided.

Caveats:
 - This does not support the procfs and sysfs nodes that the Linux SG
   driver provides.  Some Linux apps may rely on these for operation,
   others may only use them for informational purposes.
 - More ioctls need to be implemented.
 - Linux uses a naming scheme of "sg[a-z]" for devices, while FreeBSD uses a
   scheme of "sg[0-9]".  Devfs aliases (symlinks) are automatically created
   to link the two together.  However, tools like camcontrol only see the
   native names.
 - Some operations were originally designed to return byte counts or other
   data directly as the syscall return value.  The linuxolator doesn't appear
   to support this well, so this driver just punts for these cases.

Now that the driver is in place, others are welcome to add missing
functionality.  Thanks to Roman Divacky for pushing this work along.
2007-04-07 19:40:58 +00:00
Dag-Erling Smørgrav
48be553b82 Build ZFS on amd64 and pc98.
Approved by:	pjd@
2007-04-07 19:12:10 +00:00
Dag-Erling Smørgrav
29665eac3f Fix some type mismatches.
Reviewed by:	pjd@
2007-04-07 19:11:41 +00:00
Pawel Jakub Dawidek
639fdcd852 Allow to tune maximum and minimum memory used by ARC. 2007-04-07 19:10:50 +00:00
Pawel Jakub Dawidek
2b6271b7f2 Hide SEEK_DATA and SEEK_HOLE under __BSD_VISIBLE.
Suggested by:	ache
2007-04-07 18:31:40 +00:00
Matt Jacob
7b88fb86e3 Hide bus reset announcements within bootverbose.
MFC after:	3 days
2007-04-07 18:15:52 +00:00
Pawel Jakub Dawidek
f3fdfb670c - Remove SEEK_DATA and SEEK_HOLE from stdio.h, they don't belong here.
- Only define SEEK_DATA and SEEK_HOLE in sys/unistd.h when neither
  _POSIX_SOURCE nor _XOPEN_SOURCE is defined.

Pointed out by:	bde, ache
2007-04-07 16:02:30 +00:00
Yoshihiro Takahashi
55ccb0b485 Fix build. 2007-04-07 13:37:45 +00:00
Pawel Jakub Dawidek
a583dae953 Add missing mutex_init() which was causing assertion panic on clone
destruction.

Reported by:	kris
2007-04-07 11:04:37 +00:00
Paolo Pisati
c326cd0e62 Prevent the usage of an uninitialized variable: do not accept
StartMediaTx message before an OpnRcvChnAck message was received.

Reviewed by:	glebius
Approved by:	glebius (mentor)
MFC after:      3 days
Found with:	Coverity Prevent(tm)
CID:		498
2007-04-07 09:52:36 +00:00
Paolo Pisati
f4296f2246 Silence Coverity about an unused variable.
Reviewed by: 	glebius
Approved by: 	glebius (mentor)
MFC after: 	3 days
CID: 		538
2007-04-07 09:47:39 +00:00
KATO Takenori
f2a081cfe4 Added the IPLware 3.33 support.
- Added magic numbers to pretend the NEC original program version
  2.70.
- Added string display routine with Shift-JIS code support.
- Added three nop instructions at start1 in start.s since the
  installer of the IPLware put 'call $0x09ab' instruction.
- Put the near return instruction at 0x9ab in selector.s.

Since the Shift-JIS display routine must be located at 0x1243, the
linker script file (ldscript) is applied.
2007-04-07 08:37:04 +00:00
Kip Macy
d330ae533a back out last change
Requested by: ru
2007-04-07 05:09:40 +00:00
Hidetoshi Shimokawa
54911451d5 Fix a bug for over 4GB media.
MFC after: 3 days
2007-04-07 02:52:13 +00:00
Robert Watson
7b20aa9ca6 Remove XXX comment that changes to file fields should be protected with
the file lock rather than the filedesc lock: I fixed this in the last
revision.

Spotted by:	kris
2007-04-06 23:31:30 +00:00
Alexander Kabaev
7d80a3b493 pc98 boot2 is compiled with _KERNEL defined, and that makes non-static
bootinfo variable declaration visible. It conflicts with static
declaration in this file. Declare variable as globally visible in
order to resolve the conflict.
2007-04-06 20:50:24 +00:00
Jung-uk Kim
6e612eca81 Fix kernel module dependency. linprocfs depends on sysvmsg and sysvsem.
Submitted by:	nork
2007-04-06 18:15:56 +00:00
Ruslan Ermilov
2e137367b4 Add the PG_NX support for i386/PAE.
Reviewed by:	alc
2007-04-06 18:15:03 +00:00
Søren Schmidt
fe2fb53542 Add 64bit addressing support to SiI 3132/3124 2007-04-06 17:36:35 +00:00
Søren Schmidt
2cfcfef1fc Remove debug gunk. 2007-04-06 16:21:34 +00:00
Søren Schmidt
16194fc40b Add support for 64bit addressing to AHCI and Marvell controllers.
Munged into ATA shape and Marvell specifics by yours truly.

Submitted by: jhb
2007-04-06 16:18:59 +00:00
Pawel Jakub Dawidek
68474f1930 Sysctl description is not a format string, so one % is enough. 2007-04-06 12:53:54 +00:00
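The distinction behind 68474f1930 is that `%%` is an escape only inside a printf-style format string; a plain string is used verbatim and needs a single `%`. A small userland sketch, where `format_percent()` and the `description` text are made-up examples rather than anything from the sysctl in question:

```c
#include <stddef.h>
#include <stdio.h>

/* Format string: "%%" is an escape that emits one '%' character. */
static int
format_percent(char *buf, size_t len, int value)
{
	return (snprintf(buf, len, "%d%%", value));
}

/* Plain text, never passed through printf: one '%' is what is shown. */
static const char description[] = "Threshold in % of total";
```

Writing `%%` in the plain description would display both characters, which is the cosmetic problem the commit removes.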
Yoshihiro Takahashi
bc30e6ae00 MFi386: add libkern/memset.c 2007-04-06 11:30:31 +00:00
Yoshihiro Takahashi
9f94082ed0 sort. 2007-04-06 11:29:52 +00:00
Pawel Jakub Dawidek
93caf77f95 Use strcasecmp() from libkern. 2007-04-06 11:21:01 +00:00
Pawel Jakub Dawidek
4d00f78b40 We have strcasecmp() in libkern now. 2007-04-06 11:18:57 +00:00
Kip Macy
735d79b8df make modules compile without updating etc 2007-04-06 06:05:45 +00:00
Alexander Kabaev
89c40e5fec Be more conservative and compile libkern/memset.c only on architectures
that need it. These are i386, amd64 and powerpc so far.
2007-04-06 04:51:50 +00:00
Pawel Jakub Dawidek
ba7c08b71b Bump __FreeBSD_version on ZFS import.
Requested by:	nork
2007-04-06 02:33:43 +00:00
Pawel Jakub Dawidek
ceef0c312c Connect ZFS to the build. 2007-04-06 02:13:30 +00:00
Pyun YongHyeon
ad6d01d151 If we've encountered unrecognized chipset don't access hardware
anymore. Previously it tried to access interrupt register to disable
interrupts which could result in hang if the hardware was not
properly initialized by system BIOS/ACPI.

Tested by:	Benjamin Hansmann (benjamin.hansmann AT rub dot de)
MFC after:	3 days
2007-04-06 02:02:07 +00:00
Pawel Jakub Dawidek
2109a92fd1 Add Makefile for zfs.ko kernel module. 2007-04-06 01:35:16 +00:00
Pawel Jakub Dawidek
e726fc7c37 Add ZFS-specific privileges. 2007-04-06 01:11:39 +00:00
Pawel Jakub Dawidek
f0a75d274a Please welcome ZFS - The last word in file systems.
ZFS file system was ported from OpenSolaris operating system. The code is under
CDDL license.

I'd like to thank all SUN developers that created this great piece of software.

Supported by:	Wheel LTD (http://www.wheel.pl/)
Supported by:	The FreeBSD Foundation (http://www.freebsdfoundation.org/)
Supported by:	Sentex (http://www.sentex.net/)
2007-04-06 01:09:06 +00:00
Alexander Kabaev
c8c0ba192e Add local prototype for memset function. 2007-04-06 00:06:26 +00:00
Pawel Jakub Dawidek
028e84c68b allprison mutex was converted to sx(9) lock. 2007-04-05 23:32:32 +00:00
Pawel Jakub Dawidek
dc68a63332 Implement functionality I called 'jail services'.
It may be used for external modules to attach some data to jail's in-kernel
structure.

- Change allprison_mtx mutex to allprison_sx sx(9) lock.
  We will need to call external functions while holding this lock, which may
  want to allocate memory.
  Make use of the fact that this is shared-exclusive lock and use shared
  version when possible.
- Implement the following functions:
  prison_service_register() - registers a service that wants to be noticed
	when a jail is created and destroyed
  prison_service_deregister() - deregisters service
  prison_service_data_add() - adds service-specific data to the jail structure
  prison_service_data_get() - takes service-specific data from the jail
	structure
  prison_service_data_del() - removes service-specific data from the jail
	structure

Reviewed by:	rwatson
2007-04-05 23:19:13 +00:00
Alexander Kabaev
616db5f04c Add trivial MI memset function implementation. GCC mandates the
existence of this function as a linkable symbol in standalone
configurations and existing inline memcpy from libkern.h fails
this requirement.
2007-04-05 22:02:39 +00:00
Pawel Jakub Dawidek
54b369c1ae Make prison_find() globally accessible. 2007-04-05 21:34:54 +00:00
Pawel Jakub Dawidek
f6521d1c31 Implement SEEK_DATA and SEEK_HOLE extensions to lseek(2) as found in
OpenSolaris. For more information please refer to:

	http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data
2007-04-05 21:10:53 +00:00
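A minimal userland sketch of how the two whence values are typically combined: walk a file by alternating SEEK_DATA and SEEK_HOLE and add up the data regions. The helper name count_data_bytes() is hypothetical and not part of this commit; the sketch assumes a platform whose <unistd.h> exposes both constants and relies on lseek() failing with ENXIO once no data remains past the offset.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc only exposes SEEK_DATA/SEEK_HOLE with this set */
#endif
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: sum the sizes of all data regions of fd.
 * Returns the number of data bytes, or -1 on an unexpected error.
 */
static off_t
count_data_bytes(int fd)
{
	off_t data, hole, pos = 0, total = 0;

	assert(fd >= 0);
	for (;;) {
		/* Find the start of the next data region at or after pos. */
		data = lseek(fd, pos, SEEK_DATA);
		if (data == -1) {
			/* ENXIO means there is no data at or after pos. */
			return (errno == ENXIO ? total : -1);
		}
		/* Find where that region ends (EOF counts as a hole). */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole == -1 || hole <= data)
			return (-1);
		total += hole - data;
		pos = hole;
	}
}
```

On a file system without hole support the whole file is usually reported as a single data region, so the result then equals the file size.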
Pawel Jakub Dawidek
f3a8d2f93c Add security.jail.mount_allowed sysctl, which allows mounting and
unmounting jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.

A jail-friendly file system is a file system whose driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.

There are currently no jail-friendly file systems; ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.

Reviewed by:	rwatson
2007-04-05 21:03:05 +00:00
Pawel Jakub Dawidek
0f2c2ce0a3 When KVA is exhausted, try the vm_lowmem event for the last time before
panicking. This helps ZFS stability a lot.
2007-04-05 20:52:51 +00:00
Pawel Jakub Dawidek
fcdd9721e4 Fix a problem for file systems that don't implement VOP_BMAP() operation.
The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(),
which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (e.g.
EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing the 'before'
and 'after' arguments, so we have some accidental values there. This basically
caused this condition to be met:

	if ((rahead + rbehind) >
	    ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) {
		pagedaemon_wakeup();
		[...]
	}

(we have some random values in rahead and rbehind variables)

I'm not entirely sure this is the right fix, maybe we should just return FALSE
in vnode_pager_haspage() when VOP_BMAP() fails?

alc@ knows about this problem, maybe he will be able to come up with a better
fix if this is not the right one.
2007-04-05 20:49:46 +00:00
Pawel Jakub Dawidek
24c3c19e73 Hide lbolt under _SOLARIS_C_SOURCE in preparation for ZFS import.
I really couldn't avoid this with preprocessor magic.
2007-04-05 20:40:47 +00:00
Marcel Moolenaar
9760f68ca0 Add PCI IDs for the HP RMP3 serial port. This is often used as
the serial console.

MFC after: 1 week
2007-04-05 19:15:46 +00:00
Alexander Kabaev
b27c252dcf Remove extern struct pcb stoppcbs[] declaration from this file.
It breaks GCC 4.1 compiles and does not appear to be required.
2007-04-05 18:34:11 +00:00
Dag-Erling Smørgrav
56c62ab69c Whitespace nits. 2007-04-05 13:43:00 +00:00
Kip Macy
0f4d9d04ea Fix mb_ctor_clust and mb_dtor_clust to reference the appropriate zone,
simplify setting refcnt

Reviewed by: andre, rwatson, and glebius
MFC after: 3 days
2007-04-04 21:27:01 +00:00
Andre Oppermann
995a77176f Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some
further INP_INFO_WLOCK_ASSERT() while there.
2007-04-04 18:30:16 +00:00
Andre Oppermann
0c38fd0a7a Move last tcpcb initialization for the inbound connection case from
tcp_input() to syncache_socket() where it belongs and the majority
of it already happens.

The "tp->snd_up = tp->snd_una" is removed as it is done with the
tcp_sendseqinit() macro a few lines earlier.
2007-04-04 16:13:45 +00:00
Andre Oppermann
beaa515e95 Some local and style(9) cleanups. 2007-04-04 15:30:31 +00:00
Andre Oppermann
5dd9dfefd6 Retire unused TCP_SACK_DEBUG. 2007-04-04 14:44:15 +00:00
Andre Oppermann
b728e90260 In tcp_dooptions() skip over SACK options if it is a SYN segment. 2007-04-04 14:39:49 +00:00
Robert Watson
5e3f7694b1 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are important for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
Xin LI
04533fc68e Use *_EMPTY macros when appropriate. 2007-04-04 07:29:53 +00:00
Kip Macy
fa0521c0e9 Make DMA tags per-queue to facilitate parallel mappings
Defer mbuf allocation and initialization until after data has already been
received in a cluster

This reduces cpu utilization somewhat, but it only improves the rx path.
Recent changes to TCP appear to make us rate limited by the TX path.

This is the first step in reducing mbuf management overhead for manipulating
clusters.

MFC after: 3 days
2007-04-04 05:29:18 +00:00
Kip Macy
e0bfe940a4 m_extadd does not appear to do the right thing for the case of clusters
allocated from UMA - add m_cljset to correspond to m_cljget

MFC after: 3 days
2007-04-04 04:08:57 +00:00
Alexander Kabaev
edb2e5dca3 Include string.h for non-kernel builds to get proper memcpy prototype. 2007-04-04 03:16:59 +00:00
Alexander Kabaev
d8164209b3 Include string.h for non-kernel builds to get proper strcpy, strlen
prototypes.
2007-04-04 03:14:15 +00:00
Alexander Kabaev
9160afee7c Do not assign result of (char *) cast to u_char * variable. 2007-04-04 03:10:42 +00:00
Kip Macy
ab43ffd2f6 add helper functions for mapping size to zones and types
eliminate duplicated zone lookup switch statements
2007-04-04 00:31:49 +00:00
Kip Macy
59a31e6acf fix typo 2007-04-04 00:11:22 +00:00
Kip Macy
e2bc106690 style fixes and make sure that the lock is treated as released in the sharers == 0 case
note that this is somewhat racy because a new sharer can come in while we're updating stats
2007-04-04 00:01:05 +00:00
Kip Macy
afc0bfbd90 Fixes to sx for newsx - fix recursed case and move out of inline
Submitted by: Attilio Rao <attilio@freebsd.org>
2007-04-03 22:58:21 +00:00
Kip Macy
70fe8436c8 move lock_profile calls out of the macros and into kern_mutex.c
add check for mtx_recurse == 0 when releasing sleep lock
2007-04-03 22:52:31 +00:00
Julian Elischer
1bd69ee131 Since we switched to using monotonically increasing timestamps,
they have been reported back to the userland as being in 1970.
Add boot time to the timestamp to give the time in the scale of the 'current'
real timescale.  Not perfect if you change the time a lot but good enough
to keep all the rules correct relative to each other in terms
of time relative to "now".
2007-04-03 22:45:50 +00:00
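A small sketch of the conversion described above, assuming timestamps are kept as seconds since boot. The helper names are hypothetical; the kernel code in this commit is not reproduced here. It shows why a raw uptime-based stamp reads as 1970 when interpreted as epoch time, and how adding the boot time brings it back onto the current timescale.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: map a seconds-since-boot stamp to wall-clock time. */
static time_t
uptime_to_walltime(time_t boottime, time_t uptime_stamp)
{
	return (boottime + uptime_stamp);
}

/* Hypothetical helper: UTC calendar year of t, or -1 if it cannot be split. */
static int
utc_year(time_t t)
{
	struct tm *tm = gmtime(&t);

	return (tm != NULL ? tm->tm_year + 1900 : -1);
}
```

As the message notes, the result shifts if the clock is stepped after boot, but stamps stay ordered relative to each other.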
Kip Macy
8289600ce7 skip call to _lock_profile_obtain_lock_success entirely if acquisition time is non-zero
(i.e. recursing or adding sharers)
2007-04-03 18:36:27 +00:00
Alexander Kabaev
585b090609 Add dl_iterate_phdr function prototype and corresponding dl_phdr_info
structure definition.
2007-04-03 18:33:41 +00:00
Kip Macy
802d9610eb Remove unnecessary LO_CONTESTED flag 2007-04-03 17:57:50 +00:00
Robert Watson
6246c6e2a7 Fix use after free bug: use temporary variable to hold next entry in linked
list while freeing current entry, rather than using the free'd entry's next
pointer.

Found with:	Coverity Prevent(tm)
CID:		1333
2007-04-03 12:45:10 +00:00
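The pattern behind this kind of fix can be sketched in plain C as follows. The struct and function names are hypothetical and only illustrate the technique; they are not the code touched by the commit. The next pointer is copied into a temporary before the current entry is freed, so freed memory is never read.

```c
#include <stdlib.h>

struct entry {
	int		 value;
	struct entry	*next;
};

/*
 * Push a new entry on the front of the list.  Returns the new head,
 * or NULL if allocation fails (the old list is left untouched).
 */
static struct entry *
entry_push(struct entry *head, int value)
{
	struct entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return (NULL);
	e->value = value;
	e->next = head;
	return (e);
}

/* Free every entry in the list; returns the number of entries freed. */
static int
entry_free_all(struct entry *head)
{
	struct entry *cur, *next;
	int freed = 0;

	for (cur = head; cur != NULL; cur = next) {
		next = cur->next;	/* save before free() invalidates cur */
		free(cur);
		freed++;
	}
	return (freed);
}
```

The buggy form would be `for (cur = head; cur != NULL; cur = cur->next) free(cur);`, which dereferences cur after it has been released.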
Pawel Jakub Dawidek
afd894bb12 Add root_mount_wait() function which can be used to wait until the root
file system is mounted. This is useful for kernel modules loaded from
/boot/loader.conf, that have to access file system.
2007-04-03 11:45:28 +00:00
Randall Stewart
bff64a4db3 - fixed several places where we did not release INP locks.
- fixed a refcount bug in the new ifa structures.
- use vrf's from default stcb or inp whenever possible.
- Address limits raised to account for a full IP fragmented
  packet (1000 addresses).
- flight size correcting updated to include one message only
  and to handle case where the peer does not cumack the
  next segment, aka lists 1/1 in sack blocks.
- Various bad init/init-ack handling could cause a panic
  since we tried to unlock the destroyed mutex. Fixes
  so we properly exit when we need to destroy an assoc.
  (Found by Cisco DevTest team :D)
- name rename in src-addr-selection from pass to sifa.
- route structure typedef'd to allow different platforms
  and updated into sctp_os_bsd file.
- Max retransmissions a chunk can be made added.
Reviewed by:	gnn
2007-04-03 11:15:32 +00:00
Andrew Gallatin
e39a0a37cf - Fix a bug in the TSO transmit routine where frames which had
been defragged and had their headers in the same cluster as their
payload would be fed to the NIC in header-sized chunks, and would
likely exceed the number of available transmit descriptors.

- If a TSO frame exceeds the number of available transmit descriptors,
don't leak busdma resources when freeing it.

Sponsored by: Myricom Inc.
2007-04-03 10:41:33 +00:00
Kevin Lo
6d361569d5 Since the driver uses mutexes, remove splusb() and splx(). 2007-04-03 05:59:17 +00:00
Alexander Kabaev
02b71ede34 Correct PT_GNU_EH_FRAME definition. 2007-04-03 01:47:07 +00:00
Marcel Moolenaar
35777a2a79 Don't use a time-limiting loop that's defined in terms of the baudrate
in the putc() method.  Likewise, in the getc() method, don't check for
received characters with an interval defined in terms of the baudrate.
In both cases it works equally well to implement a fixed delay.  More
importantly, it avoids calculating a delay that's roughly 1/10th the
time it takes to send/receive a character. The calculation is costly
and happens for every character sent or received, affecting low-level
console or debug port performance significantly. Secondly, when the
RCLK is not available or unreliable, the delays could disrupt normal
operation.

The fixed delay is 1/10th the time it takes to send a character at
230400 bps.
2007-04-03 01:21:10 +00:00
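The arithmetic behind the fixed delay can be checked with a short sketch. It assumes a 10-bit character frame (one start bit, eight data bits, one stop bit), which the message does not state; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: one tenth of the time needed to transfer one
 * character, in nanoseconds, using integer arithmetic.
 * bits_per_char is an assumption (10 for 8N1 framing).
 */
static uint64_t
tenth_char_time_ns(uint64_t baudrate, uint64_t bits_per_char)
{
	uint64_t char_time_ns;

	if (baudrate == 0)
		return (0);
	/* Time for one character: bits / (bits per second). */
	char_time_ns = bits_per_char * UINT64_C(1000000000) / baudrate;
	return (char_time_ns / 10);
}
```

Under that assumption a character at 230400 bps takes about 43.4 microseconds, so the fixed delay works out to roughly 4.3 microseconds, a constant that needs no per-character computation.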
Marcel Moolenaar
f8100ce2a7 Don't expose the uart_ops structure directly, but instead have
it obtained through the uart_class structure. This allows us
to declare the uart_class structure as weak and as such allows
us to reference it even when it's not compiled-in.
It also allows us to get the uart_ops structure by name, which
makes it possible to implement the dt tag handling in uart_getenv().
The side-effect of all this is that we're using the uart_class
structure more consistently which means that we now also have
access to the size of the bus space block needed by the hardware
when we map the bus space, eliminating any hardcoding.
2007-04-02 22:00:22 +00:00
Warner Losh
cf5bdd4446 Loop on sdcard init. This helps if one hasn't plugged in the card
fast enough, or there's other issues that cause the first try to fail.
2007-04-02 20:26:04 +00:00
John Baldwin
1ce2bc9187 Fix a fd leak in socketpair():
- Close the new file objects created during socketpair() if the copyout of
  the new file descriptors fails.
- Add a test to the socketpair regression test for this edge case.
2007-04-02 19:15:47 +00:00
Jung-uk Kim
0a55a034ba Enable MSI support on RELENG_6.
MFC after:	3 days
2007-04-02 19:09:06 +00:00
Jung-uk Kim
357afa7113 MFP4: Turn emul_lock into a mutex.
Submitted by:	rdivacky
2007-04-02 18:38:13 +00:00
John Baldwin
ddda35b8f6 - Split out the part of SYSCALL_MODULE_HELPER() that builds a 'struct
sysent' for a new system call into a new MAKE_SYSENT() macro.
- Use MAKE_SYSENT() to build a full sysent for the nfssvc system call in
  the NFS server and use syscall_register() and syscall_deregister() to
  manage the nfssvc system call entry instead of manually frobbing the
  sysent[] array.
2007-04-02 13:53:26 +00:00
John Baldwin
ebb3c22c16 Don't go to a whole lot of extra work to handle the race where the new
file descriptor is closed out from under us in kern_open().  This race
is already handled and the file will be closed when kern_open() does an
fdrop just before returning.
2007-04-02 13:40:38 +00:00
Ariff Abdullah
f505e02090 Revert busy refcount back to int. As a side note, multiple open
is still (and always) possible and does not change previous behaviour.

Requested by:	netchild
2007-04-02 10:24:15 +00:00
Ariff Abdullah
ff7499570c Disable seq_modevent(). The implementation is incomplete and causes a
memory leak during unload.
2007-04-02 06:03:47 +00:00
Pyun YongHyeon
75a1d5a086 Use our own timer for watchdog instead of if_watchdog/if_timer
interface.
2007-04-02 04:43:41 +00:00
Ariff Abdullah
3627e77dfa No need to track every closing instance, and put busy counter to rest
in its single bit coffin.
2007-04-02 03:46:25 +00:00
Scott Long
15735bec61 Freeze the simq, not the devq, if we run out of command slots. This fixes
the last round of reported instability in the rev 13/14 driver.

Approved by: Erich Chen
2007-04-02 03:31:37 +00:00
Ariff Abdullah
a9be51acfe Provide hint / tunable for possible asynchronous USB execution. Async
execution should help us avoid potential deadlocks and illegal locking
while sleeping in various mixer -> usb calls. To enable it, use
hint.uaudio.%d.async="1" or sysctl dev.uaudio.%d.async=1. The default is
disabled, to remain compatible with old behaviour (with slight risk of
potential deadlock).
2007-04-02 03:25:39 +00:00
Ariff Abdullah
72e9d07fbf - Don't wakeup() unnecessarily, so the behavior of dead interrupt or
stalled DMA engine can be observed and predicted.
- Minor sysctl/tunable cleanup.
2007-04-02 03:03:06 +00:00
Matt Jacob
9a1b0d43c2 Temporarily desupport simultaneous target and initiator mode.
When the linux port changes were imported which split the
target command list to be separate from the initiator command
list and the handle format changed to encode a type in the handle
the implications to the function isp_handle_index (which only
the NetBSD/OpenBSD/FreeBSD ports use) were overlooked.

The fault is twofold: first, the index into the DMA maps
in  isp_pci is wrong because a target command handle with
the type bit left in place caused a bad index (and panic)
into dma map. Secondly, the assumption of the array
of DMA maps in either PCI or SBUS attachment structures is
that there is a linear mapping between handle index and
DMA map index. This can no longer be true if there are
overlapping index spaces for initiator mode and target
mode commands.

These changes bandaid around the problem by forcing us
to not have simultaneous dual roles and doing the appropriate
masking to make sure things are indexed correctly. A longer
term fix is being developed.
2007-04-02 01:04:20 +00:00
Alexander Leidinger
02da6fa190 Handle errors from bus_setup_intr().
Found by:	Coverity Prevent (tm)
CID:		1066
2007-04-01 16:55:31 +00:00
Alexander Leidinger
68af68014e Tell the user when the setup of the interrupt handler failed and return
an error.

Found by:	Coverity Prevent (tm)
CID:		71-78
2007-04-01 16:52:54 +00:00
Wojciech A. Koszek
4850546f51 ng_node and ng_worklist locks both migrated from being spinning locks to
adaptive mutexes. Let witness(4) calm down and bring proper types of those
locks to the lock order database.

Glanced at by:	rwatson
2007-04-01 15:48:10 +00:00
Pawel Jakub Dawidek
4874b3fb12 More style nits. 2007-04-01 15:40:56 +00:00
Alexander Leidinger
c9be0e5d4d Tell a static checker that not checking the return value of the probing
of the mii phy is intended for this chip.

Found by:	Coverity Prevent (tm)
CID:		43
2007-04-01 14:15:26 +00:00
Alexander Leidinger
2acfcc2d4c Make it obvious that we don't care about the return value of
usbd_endpoint_count(); the failure case is handled implicitly in the
following code.

Found by:	Coverity Prevent (tm)
CID:		56
2007-04-01 13:46:39 +00:00
Pawel Jakub Dawidek
daa88cdf0a Style nit. 2007-04-01 13:41:10 +00:00
Pawel Jakub Dawidek
5c1c2e82e2 I think the code I'm removing here is completely bogus.
vfs_flags field is used for VFCF_* flags which are given at file system
driver creation time (via VFS_SET(9)) macro.

What this code did was basically this:

If file system registers itself with VFCF_UNICODE flag (stores file names
as Unicode), it will gain MNT_SOFTDEP flag (UFS soft-updates).

If file system registers itself with VFCF_LOOPBACK flag (aliases some other
mounted FS), it will gain MNT_SUIDDIR flag (special handling of SUID on
dirs).

The latter will be quite dangerous, but those flags are reset later in
vfs_domount().

MFC after:	1 month
2007-04-01 13:08:05 +00:00
Craig Rodrigues
3b1b4d767f Change #include <machine/pcpu.h> to #include <sys/pcpu.h>
to get definition of curthread, required by <sys/sx.h>.
2007-04-01 12:48:10 +00:00
Robert Watson
af940ed8c0 If nooption SMP on powerpc, also nooption ADAPTIVE_SX, which depends on
SMP and is now in the global NOTES.
2007-04-01 11:10:16 +00:00
Pawel Jakub Dawidek
def72fbba1 Now that the vdropl() function is public, assert that the vnode interlock
is held.
2007-04-01 10:45:32 +00:00
Marcel Moolenaar
d71cc3c89d Add bge(4).
Fix a white-space nit while I'm here.
2007-04-01 06:24:19 +00:00
Marcel Moolenaar
37402373e9 When writing to PCI configuration registers, don't immediately
read the same register back. It can cause hangs or machine
checks in certain cases. One particular case is with bge(4)
when a reset is initiated for the controller.

MFC after: 1 month
2007-04-01 06:15:53 +00:00
Marcel Moolenaar
447e3a84cc Remove unused file. 2007-04-01 00:41:01 +00:00
Dag-Erling Smørgrav
e6534b36d8 Make vdropl() public; zfs needs it. There is also plenty of existing
file system code (mostly *_reclaim()) which looks like this:

    VOP_LOCK(vp);
    /* examine vp */
    VOP_UNLOCK(vp);
    vdrop(vp);

This can now be rewritten to:

    VOP_LOCK(vp);
    /* examine vp */
    vdropl(vp); /* will unlock vp */

MFC after:	1 week
2007-03-31 23:57:17 +00:00
John Baldwin
4e7f640dfb Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks.  The algorithms for
manipulating the lock cookie are very similar to those for rwlocks.  This patch
also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given lock's behavior.  The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option.  Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels.  As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after:	1 month
Submitted by:	attilio
Tested by:	kris, pjd
2007-03-31 23:23:42 +00:00
Sam Leffler
511cecafd6 oops, another missed file from crypto api change 2007-03-31 23:15:11 +00:00
Pawel Jakub Dawidek
695919ad9a Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them. 2007-03-31 22:44:45 +00:00
John Baldwin
4dc5078f81 Add constants for the fields in a BAR. Also, add two new macros
PCI_BAR_(IO|MEM)() that return true if the passed in value from a BAR
is for an IO or memory BAR, respectively.

Reviewed by:	imp
2007-03-31 21:39:02 +00:00
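A sketch of what such BAR predicates look like, based on the PCI convention that bit 0 of a base address register selects I/O space (1) or memory space (0). The EX_ names below are hypothetical stand-ins, not the constants or macros added by this commit, and 64-bit memory BARs are not handled.

```c
#include <stdint.h>

/* Hypothetical constants mirroring the standard 32-bit BAR layout. */
#define	EX_BAR_SPACE_MASK	0x00000001u	/* bit 0: address space type */
#define	EX_BAR_SPACE_MEM	0x00000000u
#define	EX_BAR_SPACE_IO		0x00000001u
#define	EX_BAR_IO_BASE_MASK	0xfffffffcu	/* I/O BAR: bits 31..2 */
#define	EX_BAR_MEM_BASE_MASK	0xfffffff0u	/* memory BAR: bits 31..4 */

/* True if the value read from a BAR describes an I/O or memory BAR. */
#define	EX_BAR_IO(x)	(((x) & EX_BAR_SPACE_MASK) == EX_BAR_SPACE_IO)
#define	EX_BAR_MEM(x)	(((x) & EX_BAR_SPACE_MASK) == EX_BAR_SPACE_MEM)

/* Strip the flag bits and return the base address encoded in the BAR. */
static uint32_t
ex_bar_base(uint32_t bar)
{
	if (EX_BAR_IO(bar))
		return (bar & EX_BAR_IO_BASE_MASK);
	return (bar & EX_BAR_MEM_BASE_MASK);
}
```

Keeping the predicates as macros over named constants avoids scattering magic masks such as `& 1` through driver code.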
Matt Jacob
9f9e9ae3a7 Fix compilation problem (add a const) for pre-7.0 compiles. 2007-03-31 21:01:35 +00:00
John Baldwin
657d9f9f55 - Add missing constants for subclasses.
- Add a few progif constants as well.
2007-03-31 20:41:00 +00:00
Robert Watson
e92d773fbc Rather than ignoring any error return from getnewvnode() in nameiinit(),
explicitly test and panic.  This should not ever happen, but if it does,
this is a preferred failure mode to a NULL pointer dereference in kernel.

Coverity CID:	1716
Found with:	Coverity Prevent(tm)
2007-03-31 16:08:50 +00:00
Wojciech A. Koszek
4abab3d593 We don't need spinning locks here. Change them to the adaptive mutexes. This
change should bring no performance decrease, as it did not in my tests.

Reviewed by:	julian, glebius
Approved by:	cognet (mentor)
2007-03-31 15:43:06 +00:00
Alexander Leidinger
c2bb6a54ef Tell interested readers of the source that the return value is
intentionally not checked.

Found by:	Coverity Prevent (tm)
CID:		55
Reviewed by:	ariff
2007-03-31 13:38:12 +00:00
Randall Stewart
5e54f665f0 - Found bug in min split point bundling which caused
incorrect, non-bundlable fragmentation.
- Added min residual to better control split points for
  both how big a msg must be as well as how much needs
  to be left over.
- With our new algo in place, we need to implicitly
  set "end of msg" on the sp-> structure otherwise we
  end up with "hung" associations.
- Room reserved up front in IP header by pushing IP
  header to back of mbuf.
- Fix so FR's peg count of retransmissions needed.
- Fix so an unlucky chunk that never gets across
  will kill the assoc via the kill timer and send an
  abort too.
- Fix bug in sctp_input which can result in a crash.
- Do not strip off IP options anymore.
- Clean up sctp_calculate_rto().
- Get rid of unused sysctl.
- Fixed so we discard all M-Cast
- Fixed so port check done AFTER checksum
- Fixed bug in fragmentation code that prevented
  us from fragmenting a small complete message when
  we needed to.
- Window probes were not marked back to unsent and
  flight adjusted when a sack came in with no
  window change or accepting of the probe data.
  We now fix this with having a mark on the net and
  the chunk so we can clear it out when the sack arrives
  forcing it to retran just like it was "new" this
  improves the handling of window probes, which were
  dropped by the receiver.
- Tighten AUTH protocol error checks during INIT/INIT-ACK exchange
2007-03-31 11:47:30 +00:00
Jung-uk Kim
46bd727a1e Correct BB-profiling and adjust comments.
Pointed out by:	bde
Reviewed by:	bde
2007-03-31 01:47:37 +00:00
Jung-uk Kim
6a4abad780 Fix off-by-4 error in address validation for i386, reduce PCB reloading, and
fix more style(9) nits.

Pointed out by:	bde
Discussed with:	kib
Reviewed by:	bde
2007-03-30 23:19:08 +00:00
Hidetoshi Shimokawa
437a3435c5 Teardown interrupt only when sc->ih is not NULL.
MFC after: 3 days
2007-03-30 22:25:26 +00:00
Jung-uk Kim
80f87d5e55 Fix more style(9) nits[1] and remove unnecessary use of '#if !defined(_KERNEL)'.
Pointed out by:	bde[1]
2007-03-30 19:33:53 +00:00
Jung-uk Kim
6403d3a160 Use the same wisdom of sys/i386/i386/support.s 1.97 to remove obfuscation.
Pointed out by:	bde
2007-03-30 18:27:57 +00:00
John Baldwin
028923e54d - Use PARTIAL_PICKUP_GIANT() to implement PICKUP_GIANT().
- Move UGAR() macro up to the comment that describes it.
- Fix a couple of typos.
2007-03-30 18:10:08 +00:00