Commit Graph

66775 Commits

Author SHA1 Message Date
John Birrell
ff13848395 Remove the last 3 files I missed. These have been repo copied to the new
location under a cddl part of the tree following the core@ license review.
2008-03-28 00:28:45 +00:00
Attilio Rao
6aa2100c49 Instruments buffer lock objects in order to track correctly consumers
consumers in locking operations.
While here, operates some style(9) cleanups.
2008-03-28 00:14:33 +00:00
John Birrell
8f0cc58815 Remove files that have been repo copied to their new location
in cddl-specific parts of the source tree.
2008-03-28 00:08:47 +00:00
John Birrell
e327f52446 The sources covered by Sun's CDDL have been repo copied below the
src/cddl and src/sys/cddl directories per the core@ decision following
the license review.

This change modifies the affected Makefiles to reference the sources
in their new location.
2008-03-27 23:21:25 +00:00
Alexander Motin
244586d6f1 Remove ng_setisr() call from ng_dequeue(). It is useless as we any way
will never exit ngintr(), while there is some ready requests on the queue.
It was made years ago with hope of parallel queue processing by several
net threads. But even if we have several threads sometimes, we have no
rights to process queue in parallel as it will break original requests
serialization that is critically important for some setups.
2008-03-27 23:02:30 +00:00
Antoine Brodin
10f0bcab61 Remove option headers that do not exist and are not used
from the Makefiles in sys/modules.
(opt_devfs.h, opt_bdg.h, opt_emu10kx.h and opt_uslcom.h)

Approved by:	rwatson (mentor)
2008-03-27 20:38:03 +00:00
Alexander Motin
c86d865ec8 Switch from timeval to bintime, to use 1/(2^20) of seconds instead of
microseconds. It allows to use bit shifts instead of some heavy 64bit
mul/div math operations.
2008-03-27 20:04:20 +00:00
Ian Dowse
f5f1525321 Add IFF_NEEDSGIANT to IFF_CANTCHANGE, to prevent user-level code
from clearing the IFF_NEEDSGIANT flag on Giant-locked interfaces.
In particular, wpa_supplicant was doing this on USB interfaces,
causing panics when Giant-locked code was then called without Giant.

Submitted by:	Alexey Popov
Reviewed by:	rwatson
MFC after:	3 days
2008-03-27 18:02:30 +00:00
Doug Rabson
6b0d16d374 Add nfslockd and krpc modules. 2008-03-27 11:55:03 +00:00
Doug Rabson
fa9d9930ca Add kernel module support for nfslockd and krpc. Use the module system
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
2008-03-27 11:54:20 +00:00
John Birrell
e483943791 When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.
2008-03-27 05:03:26 +00:00
Alan Cox
97dbe5e48e MFamd64 with few changes:
1. Add support for automatic promotion of 4KB page mappings to 2MB page
   mappings.  Automatic promotion can be enabled by setting the tunable
   "vm.pmap.pg_ps_enabled" to a non-zero value.  By default, automatic
   promotion is disabled.  Tested by: kris

2. To date, we have assumed that the TLB will only set the PG_M bit in a
   PTE if that PTE has the PG_RW bit set.  However, this assumption does
   not hold on recent processors from Intel.  For example, consider a PTE
   that has the PG_RW bit set but the PG_M bit clear.  Suppose this PTE
   is cached in the TLB and later the PG_RW bit is cleared in the PTE,
   but the corresponding TLB entry is not (yet) invalidated.
   Historically, upon a write access using this (stale) TLB entry, the
   TLB would observe that the PG_RW bit had been cleared and initiate a
   page fault, aborting the setting of the PG_M bit in the PTE.  Now,
   however, P4- and Core2-family processors will set the PG_M bit before
   observing that the PG_RW bit is clear and initiating a page fault.  In
   other words, the write does not occur but the PG_M bit is still set.

   The real impact of this difference is not that great.  Specifically,
   we should no longer assert that any PTE with the PG_M bit set must
   also have the PG_RW bit set, and we should ignore the state of the
   PG_M bit unless the PG_RW bit is set.
2008-03-27 04:34:17 +00:00
John Birrell
fc70c0bdd8 Regen after makesyscalls.sh change. 2008-03-27 01:55:06 +00:00
John Birrell
e994ea8e55 Generate another function for the DTrace syscall provider to specify
the syscall argument types.

This code is only compiled into the systrace kernel modul and has no
effect otherwise.
2008-03-27 01:53:44 +00:00
Attilio Rao
e15e150d76 Really, smb_iod_main() is not totally MPSAFE, so just acquire and drop
Giant around it in order to assume MPSAFETY.

Reported by:	jhb, rwatson
Pointy hat to:	attilio
2008-03-27 01:23:59 +00:00
Poul-Henning Kamp
dad3b6c6fd Back in the good old days, PC's had random pieces of rock for
frequency generation and what frequency the generated was anyones
guess.

In general the 32.768kHz RTC clock x-tal was the best, because that
was a regular wrist-watch Xtal, whereas the X-tal generating the
ISA bus frequency was much lower quality, often costing as much as
several cents a piece, so it made good sense to check the ISA bus
frequency against the RTC clock.

The other relevant property of those machines, is that they
typically had no more than 16MB RAM.

These days, CPU chips croak if their clocks are not tightly within
specs and all necessary frequencies are derived from the master
crystal by means if PLL's.

Considering that it takes on average 1.5 second to calibrate the
frequency of the i8254 counter, that more likely than not, we will
not actually use the result of the calibration, and as the final
clincher, we seldom use the i8254 for anything besides BEL in
syscons anyway, it has become time to drop the calibration code.

If you need to tell the system what frequency your i8254 runs,
you can do so from the loader using hw.i8254.freq or using the
sysctl kern.timecounter.tc.i8254.frequency.
2008-03-26 22:12:00 +00:00
Poul-Henning Kamp
1d73a9dc74 Further cleanup of sound generation in syscons:
The timer_spkr_*() functions take care of the enabling/disabling
of the speaker.

Test on the existence of timer_spkr_*() functions, rather than
architectures.
2008-03-26 22:02:51 +00:00
Poul-Henning Kamp
93f5134aaf Make speaker a pseudo device driver instead of attaching to a PnP id.
If somebody cleaned this code up to proper style(9), it could become
a great educational starting point for aspiring kernel hackers.
2008-03-26 21:33:41 +00:00
Robert Watson
61e175d59d Add a comment explaining that we initialize the 'a' buffer for
zero-copy to the store buffer position on the BPF descriptor,
and the 'b' buffer as the free buffer in order to fill them in
the order documented in bpf(4).

MFC after:	4 months
Suggested by:	csjp
2008-03-26 21:29:13 +00:00
Alexander Motin
714f558be1 Some minor code and math optimizations. 2008-03-26 21:19:03 +00:00
John Baldwin
d952ba1bd5 Fix a nit with the 'nofoo' options where 'foo' is mapped to 'nonofoo'
(such as 'atime' vs 'noatime').  The filesystems will always see either
'nofoo' or 'nonofoo', never plain 'foo'.  As such, their list of valid
mount options should include 'nofoo' instead of 'foo'.  With this fix,
you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as
noatime without getting an error.  You can also update a noatime FFS
filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime'
using nmount(2) (e.g. 7.x /sbin/mount binary).

MFC after:	1 week
Reviewed by:	crodig
2008-03-26 20:48:07 +00:00
Poul-Henning Kamp
8c0b6469bf Remove two variables which are handled MI now. 2008-03-26 20:28:52 +00:00
Poul-Henning Kamp
3a995824f6 Eliminate unnecessary #includes 2008-03-26 20:26:12 +00:00
Poul-Henning Kamp
e465985885 The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
	timer_spkr_acquire()
	timer_spkr_release()
and
	timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all.  In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that.  It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
2008-03-26 20:09:21 +00:00
Doug Rabson
159f35a54a Bump __FreeBSD_version for the addition of 'l_sysid' to the flock structure. 2008-03-26 15:41:00 +00:00
Ed Maste
a620bad028 Add \n to the end of a printf string and remove it from panic strings. 2008-03-26 15:28:56 +00:00
Doug Rabson
a7ac0db6cb Regen. 2008-03-26 15:24:02 +00:00
Doug Rabson
dfdcada31e Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
  client handle safely with replies being de-multiplexed at the socket
  upcall (typically driven directly by the NIC interrupt) and handed
  off to whichever thread matches the reply. For UDP sockets, many RPC
  clients can share the same socket. This allows the use of a single
  privileged UDP port number to talk to an arbitrary number of remote
  hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
  server would be relatively straightforward and would follow
  approximately the Solaris KPI. A single thread should be sufficient
  for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
  callbacks. I've tested the NLM server reasonably extensively - it
  passes both my own tests and the NFS Connectathon locking tests
  running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
  support for the local NFS client's locking needs, it does have to
  field async replies and granted callbacks from remote NLMs that the
  local client has contacted. We relay these replies to the userland
  rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
  it will detect deadlocks caused by a lock request that covers more
  than one blocking request. As required by the NLM protocol, all
  deadlock detection happens synchronously - a user is guaranteed that
  if a lock request isn't rejected immediately, the lock will
  eventually be granted. The old system allowed for a 'deferred
  deadlock' condition where a blocked lock request could wake up and
  find that some other deadlock-causing lock owner had beaten them to
  the lock.

* Since both local and remote locks are managed by the same kernel
  locking code, local and remote processes can safely use file locks
  for mutual exclusion. Local processes have no fairness advantage
  compared to remote processes when contending to lock a region that
  has just been unlocked - the local lock manager enforces a strict
  first-come first-served model for both local and remote lockers.

Sponsored by:	Isilon Systems
PR:		95247 107555 115524 116679
MFC after:	2 weeks
2008-03-26 15:23:12 +00:00
Poul-Henning Kamp
ebfbcd612a Rename timer0_max_count to i8254_max_count.
Rename timer0_real_max_count to i8254_real_max_count and make it static.
Rename timer_freq to i8254_freq and make it a loader tunable.
2008-03-26 15:03:24 +00:00
Poul-Henning Kamp
f168bfa529 The RTC related pscnt and psdiv variables have no business being public. 2008-03-26 13:25:27 +00:00
Poul-Henning Kamp
b416a29043 Remove old sysctl stuff which is long gone in other arch's. 2008-03-26 13:03:51 +00:00
Christian Brueffer
662cac9f23 Fix some "in in" typos in comments.
PR:		121490
Submitted by:	Anatoly Borodin <anatoly.borodin@gmail.com>
Approved by:	rwatson (mentor), jkoshy
MFC after:	3 days
2008-03-26 07:32:08 +00:00
Alan Cox
fdcd29b52b Enable the automatic creation of superpage reservations. 2008-03-26 03:12:00 +00:00
Sam Leffler
658d4b51ac split out tty create part of ucom_attach into ucom_attach_tty so
derived drivers can use it

Submitted by:	Jared Go
MFC after:	3 weeks
2008-03-25 23:46:24 +00:00
Sam Leffler
162382facd add some CDMA modems
Submitted by:	Jared Go
MFC after:	1 week
2008-03-25 23:35:32 +00:00
Scott Long
478cfc7300 Implement taskqueue_block() and taskqueue_unblock(). These functions allow
the owner of a queue to block and unblock execution of the tasks in the
queue while allowing tasks to continue to be added queue.  Combining this
with taskqueue_drain() allows a queue to be safely disabled.  The unblock
function may run (or schedule to run) the queue when it is called, just as
calling taskqueue_enqueue() would.

Reviewed by: jhb, sam
2008-03-25 22:38:45 +00:00
Ed Maste
523da39bcc Add 64-bit array support for RAIDs > 2TB. This corresponds to ~ Adaptec
driver build 15317.

Tested on:
Adaptec 2230S, Firmware 4.2-0 (8205)
ICP ICP5085BL, Firmware 5.2-0 (12814)

Submitted by:	Adaptec
2008-03-25 21:39:06 +00:00
Sam Leffler
85a8a1ddff add __noinline
Submitted by:	imp
Reviewed by:	kan (long ago)
MFC after:	3 weeks
2008-03-25 21:30:01 +00:00
Sam Leffler
fb27dd1db3 expose if_purgemaddrs, it will be used by the vap code unless someone
redesigns the mcast support code in the next few weeks

MFC after:	3 weeks
2008-03-25 21:23:32 +00:00
Sam Leffler
acaf1de6db IFM_IEEE80211_IBSSMASTER hasn't been used in many years; replace it
with IFM_IEEE80211_WDS which will be used by the forthcoming vap code

MFC after:	3 weeks
2008-03-25 21:22:43 +00:00
Sam Leffler
9e340a6190 enable dynamic addition of "show all" commands
MFC after:	3 weeks
2008-03-25 20:36:32 +00:00
John Baldwin
5c63b21a1a Regen. 2008-03-25 19:35:34 +00:00
John Baldwin
30c6422a8a Add entries for the cpuset-related system calls. The existing system calls
can be used on little endian systems.

Pointy hat to:	jeff
2008-03-25 19:34:47 +00:00
Ed Maste
54e2ebdfc2 Correct data direction flags in aac_bio_command() in the
!AAC_FLAGS_RAW_IO && AAC_FLAGS_SG_64BIT case.

Submitted by:   Adaptec
2008-03-25 18:34:04 +00:00
Ruslan Ermilov
d7a38db650 Fix build.
Reported by:	ache, tinderbox
2008-03-25 13:20:52 +00:00
Ruslan Ermilov
ea26d58729 Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by:	arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
2008-03-25 09:39:02 +00:00
Ruslan Ermilov
b2798e2573 Regen after changing prototypes of cpuset_{get,set}affinity(). 2008-03-25 09:14:17 +00:00
Ruslan Ermilov
7f64829a5e Fixed type of the fourth argument of cpuset_{get,set}affinity(2) to be size_t.
Prodded by:	davidxu
2008-03-25 09:11:53 +00:00
Robert Watson
fa0c2b3474 Check for a NULL free buffer pointer in BPF before invoking
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.

MFC after:	4 months
2008-03-25 07:41:33 +00:00
Weongyo Jeong
3c7e78d32d Add support for Marvell Libertas 88W8335 based PCI network adapters.
Reviewed by:	sam, many wireless people
Approved by:	thompsa (mentor)
2008-03-25 06:32:33 +00:00
Alexander Motin
489290e9e9 Rewrite node to support multiple hooks, alike to ng_l2tp, to use one pair
of pptpgre and ksocket nodes for all calls between two peers. This patch
modifies node's API by adding new "session_%04x" hook names support, while
keeping backward compatibility.

Together with appropriate user-level support (by latest mpd5) it gives
huge performance benefits for case of multiple active calls between
two peers because of avoiding data duplication and extra socket processing.
On my benchmarks I have got more then 10 times speedup for the 200
simultaneous PPTP calls between two peers.
In conclusion, it allows now to build effective "clients <=> PAC <=> PNS"
setups.
2008-03-24 22:55:22 +00:00
Jung-uk Kim
cb7d38abf2 Belatedly add BPF_JITTER in NOTES for supported architectures. 2008-03-24 22:23:22 +00:00
Jung-uk Kim
b83a219e9b Fix build with option BPF_JITTER. 2008-03-24 22:21:32 +00:00
Jung-uk Kim
892547230b Remove redundant inclusions of net/bpfdesc.h. 2008-03-24 22:16:46 +00:00
Kip Macy
e79dd20dd5 change inp_wlock_assert to inp_lock_assert 2008-03-24 20:24:04 +00:00
Ed Maste
31a0399e57 Diff reduction to Adaptec's driver (around build 15317): catch up with a
change in debugging routines.

The fwprintf macro in the AAC_DEBUG case (mapping to printf) isn't from the
Adaptec driver.
2008-03-24 19:23:33 +00:00
Sam Leffler
3be798ba3a o add M_PROTO[678]; they'll be needed by net80211 vap code
o sort mbuf flags together and extend values to 32 bits
o write M_COPYFLAGS in terms of M_PROTOFLAGS
o move M_COPYFLAGS and M_PROTOFLAGS up to be together with flag defs

Reviewed by:	rwatson
MFC after:	3 weeks
2008-03-24 19:01:29 +00:00
Marius Strobl
5259569262 - Const'ify the bus_stream_asi and bus_type_asi arrays.
- Replace hard-coded functions names missed in bus_machdep.c rev. 1.44
  with __func__.
- Break some long lines.

MFC after:	1 month
2008-03-24 17:57:01 +00:00
Marius Strobl
23a6342bb7 - Take advantage of bus_dmamap_load_mbuf_sg(9).
- Take advantage of m_collapse(9).
- Sync with other NIC drivers and prepend a TX mbuf if the first attempt
  to load it fails with an error other than EFBIG and stop trying instead
  of freeing it and keeping on trying to enqueue more mbufs. Also ensure
  the driver queue isn't empty before trying to enqueue mbufs in order to
  reduce locking operations.
- In xl_ifmedia_upd() add a missing XL_UNLOCK(). [1]
- Const'ify the xl_devs array.
- Remove an outdated comment.

PR:		113406 [1]
MFC after:	1 month
2008-03-24 17:49:06 +00:00
Marius Strobl
ebc284cc83 - Const'ify the dc_devs array.
- Correct the maxsize parameter when creating the mbufs busdma tag to
  reflect the actual requirement of dc(4).
- Move the KASSERT in dc_newbuf() to the right spot.
- Also convert the TX side to take advantage of bus_dmamap_load_mbuf_sg(9).
- Move the comment regarding dc_start_locked() to the right spot.

MFC after:	2 weeks
2008-03-24 17:38:24 +00:00
Marius Strobl
bd3d9826d7 Split the registers into two halves in preparation for SBus support.
Obtained from:	NetBSD (loosely)
MFC after:	2 weeks
2008-03-24 17:23:53 +00:00
Ed Maste
04f4d586b7 Diff reduction to Adaptec driver build 15317 (refactoring and code shuffling):
- Resource allocation in aac_alloc (moved from from aac_init)
- Interrupt setup in aac_setup_intr (from aac_attach)
- Container probing in aac_get_container_info (from aac_startup and
  aac_handle_aif)
- Firmware status check moved to aac_check_firmware from aac_init
2008-03-24 16:38:47 +00:00
Bjoern A. Zeeb
44c92dbb34 Fix a bug that when getting/dumping the soft lifetime we reported
the hard lifetime instead.

MFC after:	3 days
2008-03-24 15:01:20 +00:00
Bjoern A. Zeeb
fdcc0789fb Import change from KAME, rev. 1.362 kame/kame/sys/netkey/key.c
In case of "new SA", we must check the hard lifetime of the old SA
to find out if it is not permanent and we can delete it.

Submitted by:	sakane via gnn
MFC after:	3 days
2008-03-24 14:55:09 +00:00
Christian S.J. Peron
bde4024026 Bump the FreeBSD version for zerocopy bpf buffers and changes to the
bpf(4) monitoring ABI/structures.
2008-03-24 14:30:01 +00:00
Christian S.J. Peron
4d621040ff Introduce support for zero-copy BPF buffering, which reduces the
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.

The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.

The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patchs to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.

These changes modify the base BPF implementation to (roughly) abstrac
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring hanges that break
the netstat monitoring ABI for BPF, will be MFC'd.

Zerocopy bpf buffers are still considered experimental are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.

Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.

Sponsored by:		Seccuris Inc.
In collaboration with:	rwatson
Tested by:		pwood, gallatin
MFC after:		4 months [1]

[1] Certain portions will probably not be MFCed, specifically things
    that can break the monitoring ABI.
2008-03-24 13:49:17 +00:00
Kip Macy
cf7a8ff3b7 remove unneccessary tcbinfo lock acquisitions - set tp to null affter calling enter_timewait as we no longer own the inpcb 2008-03-24 05:21:10 +00:00
Jeff Roberson
0ee6cecc9d - Greatly simplify vget() by removing the guarantee that any new
references to a vnode with VI_OWEINACT set will force the vinactive()
   call.  The kernel makes no guarantees about which reference was the
   last to close a file or when the actual inactive processing will
   happen.  The previous code was designed to preserve existing semantics
   in the face of shared locks, however, this was unnecessary.

Discussed with:	mckusick
2008-03-24 04:22:58 +00:00
Jeff Roberson
804e60d4cf - Don't acquire the vnode interlock in _vn_lock() unless no lock type
is requested.  Handle this case specially before the while loop.
 - Use the held vnode lock to check for VI_DOOMED.  The vnode lock and
   interlock must both be held to set VI_DOOMED so either one held, even
   shared, is sufficient to check it.

No objection by:	kib
2008-03-24 04:17:35 +00:00
Jeff Roberson
97735db712 - Remove an old comment; vnodes have been working without Giant for
years now.
 - Clarify the locking required for VI_DOOMED in preparation for
   simplifications to vget() and vn_lock().
2008-03-24 04:11:40 +00:00
Kip Macy
8815ab518a Label inp as unused in the non-INVARIANTS case 2008-03-24 00:29:01 +00:00
Peter Wemm
f001eabf3a First pass at (possibly futile) microoptimizing of cpu_switch. Results
are mixed.  Some pure context switch microbenchmarks show up to 29%
improvement.  Pipe based context switch microbenchmarks show up to 7%
improvement.  Real world tests are far less impressive as they are
dominated more by actual work than switch overheads, but depending on
the machine in question, workload, kernel options, phase of moon, etc, a
few percent gain might be seen.

Summary of changes:
- don't reload MSR_[FG]SBASE registers when context switching between
  non-threaded userland apps.  These typically cost 120 clock cycles each
  on an AMD cpu (less on Barcelona/Phenom).  Intel cores are probably no
  faster on this.
- The above change only helps unthreaded userland apps that tend to use
  the same value for gsbase.  Threaded apps will get no benefit from this.
- reorder things like accessing the pcb to be in memory order, to give
  prefetching a better chance of working.  Operations are now in increasing
  memory address order, rather than reverse or random.
- Push some lesser used code out of the main code paths.  Hopefully
  allowing better code density in cache lines.  This is probably futile.
- (part 2 of previous item) Reorder code so that branches have a more
  realistic static branch prediction hint.  Both Intel and AMD cpus
  default to predicting branches to lower memory addresses as being
  taken, and to higher memory addresses as not being taken.  This is
  overridden by the limited dynamic branch prediction subsystem.  A trip
  through userland might overflow this.
- Futule attempt at spreading the use of the results of previous operations
  in new operations.  Hopefully this will allow the cpus to execute in
  parallel better.
- stop wasting 16 bytes at the top of kernel stack, below the PCB.
- Never load the userland fs/gsbase registers for kthreads, but preserve
  curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)

Microbenchmarking this code seems to be really sensitive to things like
scheduling luck, timing, cache behavior, tlb behavior, kernel options,
other random code changes, etc.

While it doesn't help heavy userland workloads much, it does help high
context switch loads a little, and should help those that involve
switching via kthreads a bit more.

A special thanks to Kris for the testing and reality checks, and Jeff for
tormenting me into doing this. :)

This is still work-in-progress.
2008-03-23 23:09:06 +00:00
Alan Cox
58680920e9 Correct an error in pmap_mincore() when applied to a 2MB page mapping:
Use PG_PS_FRAME, not PG_FRAME, to obtain the physical address of the
2MB physical page from the PDE.
2008-03-23 23:04:09 +00:00
Peter Wemm
22c0c6e9d3 Export TDP_KTHREAD to asm files. 2008-03-23 22:46:37 +00:00
Peter Wemm
6c73bb3557 Move pcb_flags to make trivially better use of cache lines. 2008-03-23 22:45:51 +00:00
Peter Wemm
3d60169ef4 Protect the setting of the fsbase/gsbase MSR registers and the
pcb_[fg]sbase values with a critical section, like the rest of the kernel.
2008-03-23 22:44:56 +00:00
Kip Macy
3d5853271e Insulate inpcb consumers outside the stack from the lock type and offset within the pcb by adding accessor functions.
Reviewed by: rwatson
MFC after: 3 weeks
2008-03-23 22:34:16 +00:00
Alan Cox
702006ff76 To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set.  However, this assumption does
not hold on recent processors from Intel.  For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear.  Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE.  Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault.  In
other words, the write does not occur but the PG_M bit is still set.

The real impact of this difference is not that great.  Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.  However, these changes enable
me to remove a work-around from pmap_promote_pde(), the superpage
promotion procedure.

(Note: The AMD processors that we have tested, including the latest,
the Phenom, still exhibit the historical behavior.)

Acknowledgments: After I observed the problem, Stephan (ups) was
instrumental in characterizing the exact behavior of Intel's recent
TLBs.

Tested by: Peter Holm
2008-03-23 20:38:01 +00:00
Konstantin Belousov
1be222e9df Yield the cpu in the kernel while iterating the list of the
vnodes belonging to the mountpoint. Also, yield when in the
softdep_process_worklist() even when we are not going to sleep due to
buffer drain.

It is believed that the ULE fixed the problem [1], but the yielding
seems to be needed at least for the 4BSD case.

Discussed:	on stable@, with bde
Reviewed by:	tegge, jeff [1]
MFC after:	2 weeks
2008-03-23 13:45:24 +00:00
Konstantin Belousov
3f7905d29c Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.

As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.

Reported and tested by:	kris (i386/pmap.c:pmap_remove() part)
Reviewed by:	alc
MFC after:	1 week
2008-03-23 07:07:27 +00:00
Pyun YongHyeon
2000cf6c0b MSI handling on some RealTek chips are broken so disable it by
default.

Reported by:	Giulio Ferro ( auryn AT zirakzigil DOT org )
Tested by:	Giulio Ferro ( auryn AT zirakzigil DOT org )
2008-03-23 05:35:18 +00:00
Pyun YongHyeon
03ca7ae8a9 For MSI capable hardwares, enable MSI enable bit in RL_CFG2
register.  If MSI was disabled by hw.re.msi_disable tunable
expliclty clear the MSI enable bit.
2008-03-23 05:31:35 +00:00
Pyun YongHyeon
ce6283934e Some RealTek chips are known to be buggy on DAC handling, so
disable DAC by default.
2008-03-23 05:13:45 +00:00
Pyun YongHyeon
ccf34c81f8 VLAN hardware tag information should be set for all desciptors of a
multi-descriptor transmission attempt. Datasheet said nothing about
this requirements. This should fix a long-standing VLAN hardware
tagging issues with re(4).

Reported by:	Giulio Ferro ( auryn AT zirakzigil DOT org )
Tested by:	Giulio Ferro ( auryn AT zirakzigil DOT org )
2008-03-23 05:06:16 +00:00
Pyun YongHyeon
70acaecfd0 Always honor configured VLAN/checksum offload capabilities.
Previously re(4) used to blindly enable VLAN hardware tag stripping
and Rx checksum offload regardless of enabled optional features of
interface.
2008-03-23 04:59:13 +00:00
David Xu
34d05d83f6 Remove commented out code, thread suspension is done in thread library. 2008-03-23 02:03:06 +00:00
Jeff Roberson
e6b2545b3b - Only return 1 from sync_vnode() in cases where the vnode is still
at the head of the sync list.  This prevents sched_sync() from
   re-queueing a vnode which may have been freed already.

Discussed with:	kib
2008-03-23 01:44:28 +00:00
Marcel Moolenaar
807e684076 Instead of making a single geom_part.ko module, make a module
for each partitioning scheme. The gpart code is currently non-
optional.
2008-03-23 01:42:47 +00:00
Jeff Roberson
f6a8cecfc6 - Pass BO_MTX(bo) to lockmgr in vtruncbuf, we don't own the vnode
interlock here anymore.

Reported by:	kris
2008-03-23 01:42:19 +00:00
Marcel Moolenaar
4ffca444a5 Redefine G_PART_SCHEME_DECLARE() from populating a private linker set
to declaring a proper module. The module event handler is part of the
gpart core and will add the scheme to an internal list on module load
and will remove the scheme from the internal list on module unload.
This makes it possible to dynamically load and unload partitioning
schemes.
2008-03-23 01:31:59 +00:00
Marcel Moolenaar
8a8fcb0089 Add g_retaste(), which given a class will present all non-open providers
to it for tasting. This is useful when the class, through means outside
the scope of GEOM, can claim providers previously unclaimed.

The g_retaste() function posts an event which is handled by the
g_retaste_event().

Event suggested by: phk
2008-03-23 01:23:35 +00:00
Olivier Houchard
2c361379e4 We need to prototype _start() as well, as we use it to test if we're running
from flash or from RAM.

Reported by:	imp
MFC After:	3 days
2008-03-22 20:34:07 +00:00
Qing Li
c7a0fc800c Reuse the mbuf that was just retrieved from the receive ring if mbuf
exhaustion is encountered. There was a fix made previously for this
problem but the solution (breaking out of the receive loop) does not
seem to work. mbuf reuse strategy is already adopted by other drivers
such as if_bge.  The problem was recreated and the patch is also
verified in the same test environment.
2008-03-22 18:13:39 +00:00
Sam Leffler
dd5ac081b8 add hints to specify how NPE ports are mapped to MAC+PHY; these
could be commented out as they just duplicate the defaults that
are built into the code

Reviewed by:	imp
MFC after:	1 week
2008-03-22 16:55:51 +00:00
Sam Leffler
c7ad0d8736 Improve mac+phy configuration so that hints can be used to describe
layouts different than the defaults:
o hint.npe.0.mac="A", "B", etc. specifies the window for MAC register accesses
o hint.npe.0.mii="A", "B", etc. specifies PHY registers
o hint.npe.1.phy=%d specifies the PHY to map to a port

This allows devices like NSLU to be setup w/o code changes and will
also be used for forthcoming support for more Avila boards.

Reviewed by:	imp
MFC after	1 week
2008-03-22 16:53:28 +00:00
Poul-Henning Kamp
4218a7310b In abort2(2): Accept a NULL arg pointer if nargs == 0 2008-03-22 16:32:52 +00:00
Sam Leffler
c28953b424 (finally) add the hal status to the diagnostic generated after
a failed ath_hal_reset call

MFC after:	3 days
2008-03-22 16:27:47 +00:00
Jeff Roberson
698b1a6643 - Complete part of the unfinished bufobj work by consistently using
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
 - Create a new lock in the bufobj to lock bufobj fields independently.
   This leaves the vnode interlock as an 'identity' lock while the bufobj
   is an io lock.  The bufobj lock is ordered before the vnode interlock
   and also before the mnt ilock.
 - Exploit this new lock order to simplify softdep_check_suspend().
 - A few sync related functions are marked with a new XXX to note that
   we may not properly interlock against a non-zero bv_cnt when
   attempting to sync all vnodes on a mountlist.  I do not believe this
   race is important.  If I'm wrong this will make these locations easier
   to find.

Reviewed by:	kib (earlier diff)
Tested by:	kris, pho (earlier diff)
2008-03-22 09:15:16 +00:00
Alfred Perlstein
435cdf88ea Fix a race where timeout/untimeout could cause crashes for Giant locked
code.

The bug:

There exists a race condition for timeout/untimeout(9) due to the
way that the softclock thread dequeues timeouts.

The softclock thread sets the c_func and c_arg of the callout to
NULL while holding the callout lock but not Giant.  It then drops
the callout lock and acquires Giant.

It is at this point where untimeout(9) on another cpu/thread could
be called.

Since c_arg and c_func are cleared, untimeout(9) does not touch the
callout and returns as if the callout is canceled.

The softclock then tries to acquire Giant and likely blocks due to
the other cpu/thread holding it.

The other cpu/thread then likely deallocates the backing store that
c_arg points to and finishes working and hence drops Giant.

Softclock resumes and acquires giant and calls the function with
the now free'd c_arg and we have corruption/crash.

The fix:

We need to track curr_callout even for timeout(9) (LOCAL_ALLOC)
callouts.  We need to free the callout after the softclock processes
it to deal with the race here.

Obtained from: Juniper Networks, iedowse
Reviewed by: jhb, iedowse
MFC After: 2 weeks.
2008-03-22 07:29:45 +00:00
Doug Ambrisko
a355f43ed2 Add in a compat. mode so you can either open the card's device
node or directly open mfi0 and specify the card you want to talk to
in the ioctl.
2008-03-22 02:57:49 +00:00