Commit Graph

33138 Commits

Author SHA1 Message Date
phk
6a65638bb3 Add two new submodes to the AES encryption method.
This method is now suitable for encrypting swap spaces.

Sponsored by:	DARPA & NAI Labs.
2002-06-28 21:25:15 +00:00
jeff
1ed9e0f375 Improve the VOP locking asserts
- Add vfs_badlock_print to control whether or not we print lock violations
 - Add vfs_badlock_panic to control whether we panic on lock violations

Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.
2002-06-28 20:58:14 +00:00
iedowse
792737cf4d In vn_mkdir(), use vrele() instead of vput() on the parent directory
vnode in the case that the target exists and is the same vnode as
the parent (i.e. "mkdir ."). The namei() call does not leave the
vnode locked in this case even though you might expect it to.

This bug was mostly harmless in practice because unlocking an already
unlocked vnode currently does not trigger any panics or warnings.

Reviewed by:	jeff
2002-06-28 20:06:47 +00:00
jlemon
3d67b56283 One possible code path for syncache_respond() is:
syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B)

Which winds up deleting a different entry from the syncache.  Handle
this by not utilizing the next entry in the timer chain until after
syncache_respond() completes.  The case of A == B should not be possible.

Problem found by: Don Bowman <don@sandvine.com>
2002-06-28 19:12:38 +00:00
jeff
9fbb8d216f Clean up vn_rdwr locking.
- Do shared locks on read.
 - Only do vn_{start,finished}_write when writing.
2002-06-28 17:51:11 +00:00
green
62d02a6b93 Fix a case where a vnode got explicitly unlocked after the pointer to it
got set to NULL.

Revision 1.355: in the box
2002-06-28 16:17:47 +00:00
dfr
e59e678096 Fix warning.
Reviewed by: luigi
2002-06-28 08:36:26 +00:00
julian
ee54ee6584 bring Makefile up to date with new ipfw
Submitted by:	luigi
2002-06-28 08:10:07 +00:00
scottl
475c5eb90b Fix a botched flag clear operation. Rumor has it that this also fixes
the funky-volume-settings-on-startup problem.

Reviewed by:	the channel that shall not be named
MFC after:	7 days
2002-06-28 06:11:26 +00:00
luigi
f5aea44c67 Remove a printf and add a comment on an assumption that could be
occasionally violated by device drivers.
2002-06-27 23:23:04 +00:00
luigi
a9ab854862 The new ipfw code.
This code makes use of variable-size kernel representation of rules
(exactly the same concept of BPF instructions, as used in the BSDI's
firewall), which makes firewall operation a lot faster, and the
code more readable and easier to extend and debug.

The interface with the rest of the system is unchanged, as witnessed
by this commit. The only extra kernel files that I am touching
are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In
userland I only had to touch those programs which manipulate the
internal representation of firewall rules).

The code is almost entirely new (and I believe I have written the
vast majority of those sections which were taken from the former
ip_fw.c), so rather than modifying the old ip_fw.c I decided to
create a new file, sys/netinet/ip_fw2.c .  Same for the user
interface, which is in sbin/ipfw/ipfw2.c (it still compiles to
/sbin/ipfw).  The old files are still there, and will be removed
in due time.

I have not renamed the header file because it would have required
touching a one-line change to a number of kernel files.

In terms of user interface, the new "ipfw" is supposed to accepts
the old syntax for ipfw rules (and produce the same output with
"ipfw show". Only a couple of the old options (out of some 30 of
them) has not been implemented, but they will be soon.

On the other hand, the new code has some very powerful extensions.
First, you can put "or" connectives between match fields (and soon
also between options), and write things like

ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any

This should make rulesets slightly more compact (and lines longer!),
by condensing 2 or more of the old rules into single ones.

Also, as an example of how easy the rules can be extended, I have
implemented an 'address set' match pattern, where you can specify
an IP address in a format like this:

        10.20.30.0/26{18,44,33,22,9}

which will match the set of hosts listed in braces belonging to the
subnet 10.20.30.0/26 . The match is done using a bitmap, so it is
essentially a constant time operation requiring a handful of CPU
instructions (and a very small amount of memmory -- for a full /24
subnet, the instruction only consumes 40 bytes).

Again, in this commit I have focused on functionality and tried
to minimize changes to the other parts of the system. Some performance
improvement can be achieved with minor changes to the interface of
ip_fw_chk_t. This will be done later when this code is settled.

The code is meant to compile unmodified on RELENG_4 (once the
PACKET_TAG_* changes have been merged), for this reason
you will see #ifdef __FreeBSD_version in a couple of places.
This should minimize errors when (hopefully soon) it will be time
to do the MFC.
2002-06-27 23:02:18 +00:00
scottl
434f33d824 Delay the AC97 calibration until after the system clock has been
calibrated.  This fixes the problem where playback and recording do
not run at the correct speed.  It probably also eliminates the
need for the hacks/workarounds/sysctl's that were previously
devised to deal with this, but I will leave that for a different
time.

Reviewed by:	orion
2002-06-27 22:36:01 +00:00
imp
0825fa8a7f Lots of people have had to hack around the fixed address for cardbus
bridges in modern hardware (that hardware w/ lots of RAM).  Raise the
address from 0x44000000 to 0x88000000 to match what we do with
NEWCARD.  However, this really should be done in the pci layer.
2002-06-27 19:56:22 +00:00
rwatson
c282ad9b24 Fix a bug that prevented the deletion of non-default ACLs from being
passed down the VFS stack.  While I'm here, replace a '0' with a 'NULL'
to make the code more readable.

Sponsored by:	DARPA, NAI Labs
Obtained from:	TrustedBSD Project
2002-06-27 19:31:15 +00:00
rwatson
99f90043d5 A bit of whitespace magic. 2002-06-27 19:30:11 +00:00
imp
59390e1d94 Leave it to a non-native speaker of English to catch another typo: "do do" ->
"to do"

submitted by: marius@alchemy.franken.de
2002-06-27 18:16:16 +00:00
imp
309f5820ce Spell less like a 'merkin and more like a speaker of English 2002-06-27 17:59:24 +00:00
mux
c13cdc98b2 GENERIC now builds with -Werror, so remove NO_WERROR.
Approved by:	jake
2002-06-27 14:43:27 +00:00
mux
5347bbeaf2 Warning fixes for 64 bits platforms. With this last fix,
I can build a GENERIC sparc64 kernel with -Werror.

Reviewed by:	luigi
2002-06-27 11:02:06 +00:00
arr
479ff2a2a0 Fix for the problem stated below by Tor Egge:
(from: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832566+0+ \
       current/freebsd-current)

  "Too many pages were prefaulted in pmap_object_init_pt, thus
   the wrong physical page was entered in the pmap for the virtual
   address where the .dynamic section was supposed to be."

Submitted by:	tegge
Approved by:	tegge's patches never fail
2002-06-27 06:34:03 +00:00
jeff
9d6ea37a8c Set the UMA_ZONE_VM flag on the pvzone to avoid kmem_map recursion. 2002-06-27 04:08:45 +00:00
luigi
c576432afc Just a comment on some additional consistency checks that could
be added here.
2002-06-26 21:00:53 +00:00
iedowse
e62ef89fc1 Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

 o In the i386 pmap_protect(), `sindex' and `eindex' represent page
   indices within the 32-bit virtual address space.
 o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
   variable to store the low few bits of a vm_pindex_t that gets used
   as an array index.
 o vm_uiomove() uses `osize' and `idx' for page offsets within a
   map entry.
 o In vm_object_split(), `idx' is a page offset within a map entry.
2002-06-26 20:32:51 +00:00
iedowse
c3868afc98 Use an explicit cast to avoid relying on sign extension to do the
right thing in code such as `vm_pindex_t x = ~SWAP_META_MASK'.

Reviewed by:	dillon
2002-06-26 19:18:14 +00:00
iedowse
c0323a07fb Remove the kernel file-size limit for UFS2, so that only the limit
imposed by the filesystem structure itself remains. With 16k blocks,
the maximum file size is now just over 128TB.

For now, the UFS1 file size limit is left unchanged so as to remain
consistent with RELENG_4, but it too could be removed in the future.

Reviewed by:	mckusick
2002-06-26 18:34:51 +00:00
arr
614440e53d - Remove the Giant acquisition from linux_socket_ioctl() as it was really
there to protect fdrop() (which in turn can call vrele()), however,
  fdrop_locked() grabs Giant for us, so we do not have to.

Reviewed by:	jhb
Inspired by:	alc
2002-06-26 15:53:11 +00:00
ken
0d3a835f3f At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
dillon
371cb8c6de Enforce RLIMIT_VMEM on growable mappings (aka the primary stack or any
MAP_STACK mapping).

Suggested by:	alc
2002-06-26 03:13:46 +00:00
arr
5290d081cc - Remove Giant acquisition from modevent(), modfnext(), modstat() and
modfind().  Giant is no longer needed by these functions for safe
  execution.

Reviewed by:	jhb
2002-06-26 00:31:44 +00:00
dillon
ff4bf46648 Part I of RLIMIT_VMEM implementation. Implement core functionality for
a new resource limit that covers a process's entire VM space, including
mmap()'d space.

(Part II will be additional code to check RLIMIT_VMEM during exec() but it
needs more fleshing out).

PR:		kern/18209
Submitted by:	Andrey Alekseyev <uitm@zenon.net>, Dmitry Kim <jason@nichego.net>
MFC after:	7 days
2002-06-26 00:29:28 +00:00
arr
b6fded0faf - Alleviate jail() from having the burden of acquiring Giant by simply
removing.  We can do this since we no longer need Giant to safely
  execute jail().

Reviewed by:	rwatson, jhb
2002-06-26 00:29:01 +00:00
iedowse
0f26e30dae Complete the initial set of VM changes required to support full
64-bit file sizes. This step simply addresses the remaining overflows,
and does attempt to optimise performance. The details are:

 o Use a 64-bit type for the vm_object `size' and the size argument
   to vm_object_allocate().
 o Use the correct type for index variables in dev_pager_getpages(),
   vm_object_page_clean() and vm_object_page_remove().
 o Avoid an overflow in the i386 pmap_object_init_pt().
2002-06-25 22:14:06 +00:00
jeff
7083248242 Turn VM_ALLOC_ZERO into a flag.
Submitted by:	tegge
Reviewed by:	dillon
2002-06-25 22:01:12 +00:00
jeff
e9c6c8e0fd Reduce the amount of code that runs with the zone lock held in slab_zalloc().
This allows us to run the zone initialization functions without any locks held.
2002-06-25 21:04:50 +00:00
alc
698aabc226 o Eliminate vmspace::vm_minsaddr. It's initialized but never used.
o Replace stale comments in vmspace by "const until freed" annotations
   on some fields.
2002-06-25 18:14:38 +00:00
tmm
b6591e77a0 Don't assume that pointers are 4 bytes or sizeof(int) in size. This fixes
the indirection operator ('*') and address examination ('x/a') on
big-endian platoforms for which the above is not true, as well as on
little-endian platforms if the cut-off bits are not 0.
2002-06-25 15:59:24 +00:00
jake
e17570ea7d pmap_kremove can no longer be used to remove the magic device mappings
installed with pmap_kenter_flags, since the physical addresses may not
have an associated vm_page.  Add a function to do this.

Tested by:	Tomi Vainio <Tomi.Vainio@Sun.COM>
2002-06-25 15:13:09 +00:00
kato
d3f2ba39a2 MFi386: sys/i386/i386/machdep.c rev. 1.520. 2002-06-25 09:10:38 +00:00
mckusick
ea2f279985 Force the quota update to be done when an inode is released in
ufs_inactive. This avoid a panic when checking a NULL credential
in suser_cred().
2002-06-25 01:02:28 +00:00
arr
fc63b01c87 - Remove UM_* user land memory macros since they are no longer used. 2002-06-24 22:31:17 +00:00
mp
3662a19b34 Add missing splx().
MFC after:	3 days
2002-06-24 22:28:42 +00:00
hsu
deb4c976c7 Avoid unlocking the inp twice if badport_bandlim() returns -1.
Reported by:	jlemon
2002-06-24 22:25:00 +00:00
jdp
b1ccf3cc04 Work around what appears to be a chip bug in the BCM5701 that shows
up when operating in PCI-X mode.  For some received packets there is
data corruption in the first few bytes in that case.  Aligning the
packet buffer eliminates the corruption.  With this fix, the code
that offsets the packet buffer up by 2 bytes to align the payload is
disabled for BCM5701s operating in PCI-X mode.  On the i386, which
permits unaligned accesses, the payload is left unaligned.  On other
platforms, the packet is copied after reception to force alignment
of the payload.  Obviously, this work-around reduces performance in
those cases (BCM5701 plus PCI-X) where it is in effect.

MFC after:	3 days
2002-06-24 22:04:15 +00:00
peter
ca21d675fc Compile in the cpu halt code even on SMP, instead just default the
sysctl (machdep.cpu_idle_hlt) to off in the SMP case.  This allows you to
turn it on if you wish and do not particularly care about the small window
where a cpu will remain halted even when a job is placed on the run queue
(until the next clock tick).
2002-06-24 21:31:57 +00:00
dfr
39fa02684f Add UMA_ZONE_VM flag to the zones which are used for pmap_enter(). 2002-06-24 18:31:49 +00:00
jlemon
fe48a4f5d5 Prototype fixes (long newinum --> ino_t newinum). 2002-06-24 17:20:19 +00:00
hsu
f13c72f301 Style bug: fix 4 space indentations that should have been tabs.
Submitted by:	jlemon
2002-06-24 16:47:02 +00:00
markm
50d1c42f1e Fix a GCCism.
int foo[0];  // dodgy
int foo[];   // means the same, works the same and survives lint.

Tested by:	3 months of use on my laptop
2002-06-24 16:44:38 +00:00
jake
e102a9b6dd Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended.  This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly.  This seems to fix some spontaneous signal 11 problems with smp
on sparc64.
2002-06-24 15:48:02 +00:00
robert
fcf509a309 Enable mixer interrupts after the mixer is initialized,
otherwise we might get interrupts and are unable to
handle them properly, which results in a page fault.

PR:		kern/39549
Submitted by:	Gil Kloepfer <gil@arlut.utexas.edu>
2002-06-24 15:28:47 +00:00