Commit Graph

86126 Commits

Author SHA1 Message Date
Kirk McKusick
dca5e0ec50 This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point to replace
MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync
routines.

The vfs_msync routine is run every 30 seconds for every writably
mounted filesystem. It ensures that any files mmap'ed from the
filesystem with modified pages have those pages queued to be
written back to the file from which they are mapped.

The ffs_lazy_sync and qsync routines are run every 30 seconds for
every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine
ensures that any files that have been accessed in the previous
30 seconds have had their access times queued for updating in the
filesystem. The qsync routine ensures that any files with modified
quotas have those quotas queued to be written back to their
associated quota file.

In a system configured with 250,000 vnodes, less than 1000 are
typically active at any point in time. Prior to this change all
250,000 vnodes would be locked and inspected twice every minute
by the syncer. For UFS/FFS filesystems they would be locked and
inspected six times every minute (twice by each of these three
routines since each of these routines does its own pass over the
vnodes associated with a mount point). With this change the syncer
now locks and inspects only the tiny set of vnodes that are active.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-20 07:00:28 +00:00
Kirk McKusick
f257ebbb2e This change creates a new list of active vnodes associated with
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.

To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).

This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-20 06:50:44 +00:00
Xin LI
f1c05d0b46 Fix build. 2012-04-20 04:40:39 +00:00
Michael Tuexen
90eba9b693 Whitespace changes.
MFC after: 3 days
2012-04-19 15:30:15 +00:00
Michael Tuexen
74b2fab47a Use the same pattern for mbuf logging everywhere.
MFC after: 3 days
2012-04-19 13:11:17 +00:00
Michael Tuexen
953b6058cc Fix reported errno.
MFC after: 3 days
2012-04-19 12:47:18 +00:00
Michael Tuexen
921569e288 Fix a bug where we copy out more data from a mbuf chain that are
actually in it. This happens when SCTP receives an unknown chunk, which
requires the sending of an ERROR chunk, and there is no final padding but
the chunk is not 4-byte aligned.
Reported by yueting via rwatson@

MFC after: 3 days
2012-04-19 12:43:19 +00:00
Alexander Motin
fc1de96060 Add to GEOM RAID class module for reading non-degraded RAID5 volumes and
some environment to differentiate 4 possible RAID5 on-disk layouts.

Tested with Intel and AMD RAID BIOSes.

MFC after:	2 weeks
2012-04-19 12:30:12 +00:00
Adrian Chadd
a47f39da1f Stop using the hardware register value byte order swapping for now,
at least until I can root cause what's going on.

The only platform I've seen this on is the AR9220 when attached to
the AR71xx CPUs.  I get immediate PCIe bus errors and all subsequent
accesses cause further MIPS bus exceptions.  I don't have any other
big-endian platforms to test this on.

If I get a chance (or two), I'll try to whack this on a bus analyser
and see exactly what happens.

I'd rather leave this on, especially for slower, embedded platforms.
But the #ifdef hell is something I'm trying to avoid.
2012-04-19 03:26:21 +00:00
Kirk McKusick
16165feec4 Delete a no longer useful VNASSERT missed during changes in 234400.
Suggested by: kib
2012-04-18 19:34:20 +00:00
Kirk McKusick
60005d66ab Fix a memory leak of M_VNODE_MARKER introduced in 234386.
Found by:  Peter Holm
2012-04-18 19:30:22 +00:00
Marcel Moolenaar
1eaf8e042c Compensate for the replacement of uart_cpu_{amd64|i386}.c with
uart_cpu_x86.c

Pointy hat: marcel
2012-04-18 17:44:05 +00:00
Josh Paetzel
6578a63cc7 Unbreak tinderbox.
Fix FreeBSD paradigms in the upstream code.

PR:	bin/166933
Submitted by:	Garrett Cooper <yanegomi@gmail.com>
2012-04-18 16:47:57 +00:00
Jaakko Heinonen
fd1062ce4c Return EOPNOTSUPP rather than EPERM for the SF_SNAPSHOT flag because
tmpfs doesn't support snapshots.

Suggested by:	bde
2012-04-18 15:22:08 +00:00
Jaakko Heinonen
808dd116cb The part about exec atime no longer applies in the comment.
Pointed out by:	bde
2012-04-18 15:19:00 +00:00
Thomas Quinot
4815449e08 Fix typo in comment 2012-04-18 12:50:13 +00:00
Dmitry Morozovsky
b20e4de387 VMware environments are not unusual now. Add VMware partitions recognition
(both MBR for ESXi <= 4.1 and GPT for ESXi 5) to g_part.

Reviewed by:	ae
Approved by:	ae
MFC after:	2 weeks
2012-04-18 11:59:03 +00:00
Alexander Motin
63297dfd4a Some improvements to GEOM MULTIPATH:
- Implement "configure" command to allow switching operation mode of
running device on-fly without destroying and recreation.
 - Implement Active/Read mode as hybrid of Active/Active and Active/Passive.
In this mode all paths not marked FAIL may handle reads same time,
but unlike Active/Active only one path handles write requests at any
point in time. It allows to closer follow original write request order
if above layers need it for data consistency (not waiting for requisite
write completion before sending dependent write).
 - Hide duplicate messages about device status change.
 - Remove periodic thread wake up with 10Hz rate.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2012-04-18 09:42:14 +00:00
Alexander Motin
6f48792444 Alike to SCSI make adaclose() to not return error if device gone.
This fixes KASSERT panic inside GEOM if kernel built with INVARIANTS.

MFC after:	1 week
2012-04-18 08:55:26 +00:00
Andrew Thompson
ddf3201009 Remove KASSERTS, they do not add any value here since the pointer is about to
be derefernced anyway.
2012-04-18 01:39:14 +00:00
Kirk McKusick
73305eb826 Drop export of vdestroy() function from kern/vfs_subr.c as it is
used only as a helper function in that file. Replace sole call to
vbusy() with inline code in vholdl(). Replace sole calls to vfree()
and vdestroy() with inline code in vdropl().

The Clang compiler already inlines these functions, so they do not
show up in a kernel backtrace which is confusing. Also you cannot
set their frame in kgdb which means that it is impossible to view
their local variables. So, while the produced code is unchanged,
the debugging should be easier.

Discussed with: kib
MFC after:      2 weeks
2012-04-17 21:46:59 +00:00
Kirk McKusick
71469bb38f Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-17 16:28:22 +00:00
Edward Tomasz Napierala
9e21ef395a Fix bug where NFSv4 ACL enforcement code wouldn't unconditionally
allow the owner to read and write ACL and file attributes when there
was no entry with subject matching the owner.  In other words,
'getfacl meh' shouldn't fail for the owner if the ACL looks like this:

# file: meh
# owner: trasz
# group: wheel
         user:root:------a-------:------:allow

Reported by:	kientzle
2012-04-17 14:54:00 +00:00
Edward Tomasz Napierala
0b18eb6d74 Stop treating system processes as special. This fixes panics
like the one triggered by this:

# kldload geom_vinum
# pwait `pgrep -S gv_worker` &
# kldunload geom_vinum

or this:

GEOM_JOURNAL: Shutting down geom gjournal 3464572051.
panic: destroying non-empty racct: 1 allocated for resource 6

which were tracked by jh@ to be caused by checking p->p_flag,
while it wasn't initialised yet.  Basically, during fork, the code
checked p_flag, concluded the process isn't marked as P_SYSTEM,
incremented the counter, and later on, when exiting, checked that
the process was marked as P_SYSTEM, and thus didn't decrement it.

Also, I believe there wasn't any good reason for checking P_SYSTEM
in the first place.

Tested by:	jh
2012-04-17 14:31:02 +00:00
Edward Tomasz Napierala
47f6635cc1 Fix panic, triggered like this: "int main() { thr_exit(); }"
Submitted by:	Mateusz Guzik
2012-04-17 13:44:40 +00:00
Edward Tomasz Napierala
786813aa1f Enforce upper bound on the input buffer length.
Reported by:	Mateusz Guzik
2012-04-17 13:28:14 +00:00
Edward Tomasz Napierala
48ef856766 Fix panic at boot with SD/MMC readers with no media present, introduced
at r234177.  Note that this is a temporary fix, until I come up with something
prettier.
2012-04-17 10:44:28 +00:00
Adrian Chadd
f846cf42ab Run the fatal proc as a proc, rather than where it currently is.
Otherwise the reset path will sleep, which it can't do in this context.
2012-04-17 06:02:41 +00:00
Adrian Chadd
baf94755c0 Fix the RX free list locking creation and destruction to be consistent
even in the face of errors.

If the RX descriptor list fails, the RX lock won't be initialised, but
then the DMA free path wil try freeing it.

This commit is brought to you by a working mwl(4).
2012-04-17 04:52:57 +00:00
Adrian Chadd
865a6f735d Add missing #include 2012-04-17 04:31:50 +00:00
Adrian Chadd
93f5997b8c Style(9) and white space fixes. 2012-04-17 01:34:49 +00:00
Adrian Chadd
3f08db2e79 Protect the PCI space registers behind a mutex.
Obtained from:	Linux/OpenWRT, Atheros
2012-04-17 01:22:59 +00:00
Peter Grehan
26b1d645e0 Add x2apic MSR definitions
Reviewed by:	jhb
Obtained from:	bhyve via Neel via NetApp
2012-04-17 00:54:38 +00:00
Jung-uk Kim
5cf2cbca3f Fix a Clang warning.
Submitted by:	arundel
2012-04-16 23:29:12 +00:00
Jung-uk Kim
17b27db088 Regen for r234359. 2012-04-16 23:17:29 +00:00
Jung-uk Kim
f69f4d8630 Correct an argument type of iopl syscall for Linuxulator. This also fixes
a warning from Clang, i. e., "args->level < 0 is always false".
2012-04-16 23:16:18 +00:00
Jung-uk Kim
13fa650c75 Regen for r234357. 2012-04-16 22:59:51 +00:00
Jung-uk Kim
db8eb180d9 Correct arguments of stat64, fstat64 and lstat64 syscalls for Linuxulator. 2012-04-16 22:58:28 +00:00
Dimitry Andric
6c63ec073e Bump __FreeBSD_version due to the import of a new clang 3.1 prerelease
snapshot.
2012-04-16 21:28:04 +00:00
Jung-uk Kim
28cc85fd09 Regen for r234352. 2012-04-16 21:24:23 +00:00
Jung-uk Kim
d69a426fce - Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27
but GNU libc used it without checking its kernel version, e. g., Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c.  There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *.  Probably
this was the source of MI/MD confusion.

Reviewed by:	emulation
2012-04-16 21:22:02 +00:00
Jung-uk Kim
67490d785a - When interrupt is not requested for VM86 call, make a fake exit point and
push the address onto stack as we do for INTn emulation.  This avoids stack
underflow when we encounter RETF instruction in VM86 mode.  Lack of this
exit point actually caused page fault in VM86 mode with VESA module when we
resume from suspend state[1].
- Remove unnecessary CLI and STI instructions from BIOS interrupt emulation.
INTn and IRET must be able to emulate the flag correctly.

Reported by:	gavin [1]
Tested by:	gavin (early revision)
MFC after:	3 days
2012-04-16 19:31:44 +00:00
Peter Grehan
57a7aaa71f Sync with Bryan Venteicher's virtio git repo:
d04e609bdd1973cc7d2e8b38b7dcfae057b0962d
	virtio_blk: Use correct temporary variable in vtblk_poll_request

Obtained from:	Bryan Venteicher  bryanv at daemoninthecloset dot org
2012-04-16 18:29:12 +00:00
Marius Strobl
05374275e5 Turn on PREEMPTION by default. After fixing several bugs over time, the
last show-stopper keeping PREEMPTION from being usable on sparc64 should
have been dealt with in r230662.
At least on 2-way systems, PREEMPTION causes a little bit of a degradation
in worldstone performance. However, FreeBSD seems to have started building
up regressions in !PREEMPTION cases so sparc64 better should not be an
oddball in this regard.

MFC after:	1 week
2012-04-16 18:29:07 +00:00
Jaakko Heinonen
587fdb536f Sync tmpfs_chflags() with the recent changes to UFS:
- Add a check for unsupported file flags.
- Return EPERM when an user without PRIV_VFS_SYSFLAGS privilege attempts
  to toggle SF_SETTABLE flags.
2012-04-16 18:10:34 +00:00
Jaakko Heinonen
c5ab5ce345 tmpfs: Allow update mounts only for certain options.
Since r230208 update mounts were allowed if the list of mount options
contained the "export" option. This is not correct as tmpfs doesn't
really support updating all options.

Reviewed by:	kevlo, trociny
2012-04-16 18:07:42 +00:00
Gleb Smirnoff
ef341ee1e3 When we receive an ICMP unreach need fragmentation datagram, we take
proposed MTU value from it and update the TCP host cache. Then
tcp_mss_update() is called on the corresponding tcpcb. It finds the
just allocated entry in the TCP host cache and updates MSS on the
tcpcb. And then we do a fast retransmit of what we have in the tcp
send buffer.

This sequence gets broken if the TCP host cache is exausted. In this
case allocation fails, and later called tcp_mss_update() finds nothing
in cache. The fast retransmit is done with not reduced MSS and is
immidiately replied by remote host with new ICMP datagrams and the
cycle repeats. This ping-pong can go up to wirespeed.

To fix this:
- tcp_mss_update() gets new parameter - mtuoffer, that is like
  offer, but needs to have min_protoh subtracted.
- tcp_mtudisc() as notification method renamed to tcp_mtudisc_notify().
- tcp_mtudisc() now accepts not a useless error argument, but proposed
  MTU value, that is passed to tcp_mss_update() as mtuoffer.

Reported by:	az
Reported by:	Andrey Zonov <andrey zonov.org>
Reviewed by:	andre (previous version of patch)
2012-04-16 13:49:03 +00:00
Marko Zec
5bc2249ff8 #include <net/vnet.h> is no longer needed here.
Spotted by:	Ed Maste
MFC after:	3 days.
2012-04-16 13:41:46 +00:00
Andriy Gapon
85d5a233bd zfsboot: honor -q if it's present in boot.config
Before r228267 the option was honored but the original content of
boot.config was not preserved.  I tried to fix that but missed the idea.
Now the proper way of doing things is taken from i386/boo2.
Also, a comment is added to explain this a little bit unobvious
behavior.

Inspired by:	jhb
MFC after:	5 days
2012-04-16 10:43:06 +00:00
Andriy Gapon
8b452ff57a intpm: add ATI IXP400 pci id
PR:		kern/136762
Submitted by:	Aurelien Mere <freebsd@amc-os.com>
Tested by:	Jens Link <jens.link@gmx.de>
MFC after:	5 days
2012-04-16 10:33:46 +00:00