Commit Graph

169493 Commits

Author SHA1 Message Date
kib
64276929de Handle missing jremrefs when a directory is renamed overtop of
another, deleting it.  If the directory is removed, UFS always need to
remove the .. ref, even if the ultimate ref on the parent would not
change. The new directory must have a new journal entry for that ref.
Otherwise journal processing would not properly account for the
parent's reference since it will belong to a removed directory entry.

Change ufs_rename()'s dotdot rename section to always
setup_dotdot_link(). In the tip != NULL case SUJ needs the newref dependency
allocated via setup_dotdot_link().

Stop setting isrmdir to 2 for newdirrem() in softdep_setup_remove().
Remove the isdirrem > 1 checks from newdirrem().

Reported by:	many
Submitted by:	jeff
Tested by:	pho
2010-12-30 10:52:07 +00:00
kib
44467ec2f8 In indir_trunc(), when processing jnewblk entries that are not written
to the disk, recurse to handle indirect blocks of next level that are
hidden by the corresponding entry.

In collaboration with:	pho
Reviewed by:	jeff, mckusick
Tested by:	mckusick, pho
2010-12-30 10:41:17 +00:00
scf
3e36dcd710 Fix the LINUX_SOUND_MIXER_INFO ioctl to return success after the
information is set to FreeBSD.  It had been falling through to the end
of linux_ioctl_sound() and returning ENOIOCTL.  Noticed when running the
Linux ALSA amixer tool.

Add a LINUX_SOUND_MIXER_READ_CAPS ioctl which is used by the Skype
v2.1.0.81 binary.

Reviewed by:	gavin
MFC after:	2 weeks
2010-12-30 02:18:04 +00:00
cperciva
d34b4913e6 Add xenpic_dynirq_disable_intr and set it as the .pic_disable_intr method
for xenpic_dynirq_template.  This fixes a panic when a virtual disk is
removed, since that results in an interrupt channel being disabled and
NULL isn't very good function for disabling interrupts.

We should probably have a xenpic_pirq_disable_intr as well; I'm not adding
that here because (a) I'm not sure what uses pirqs so I don't have a test
case, and (b) the xenpic_pirq_enable_intr code is significantly more
complex than the xenpic_dynirq_enable_intr code, so I'm not sure what
should go into a xenpic_pirq_disable_intr routine.

PR:		kern/153511
MFC after:	3 days
2010-12-30 01:28:56 +00:00
cperciva
28ac9ef742 Remove INDEX-6 from the default portsnap configuration file; the 6.x index
bits haven't been built since December 1st, although the mirrors are still
distributing the bits as they were at the EoL.

Reminded by:	Alex Kozlov
2010-12-30 01:13:42 +00:00
kib
f112887cab Remove OBJ_CLEANING flag. The vfs_setdirty_locked_object() is the only
consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY
was cleared early in vm_object_page_clean, before the cleaning pass
was done. This is no longer true after r216799.

 Moreover, since OBJ_CLEANING is a flag, and not the counter, it could
be reset too prematurely when parallel vm_object_page_clean() are
performed.

Reviewed by:	alc (as a part of the bigger patch)
MFC after:	1 month (after r216799 is merged)
2010-12-29 22:26:49 +00:00
jilles
9af03ea4fc printf: Do not use sh memory functions in sh builtin.
These functions throw exceptions if they fail, possibly causing memory
leaks. The normal out-of-memory handling suffices. The INTOFF around almost
all of printf prevents memory leaks due to SIGINT.
2010-12-29 21:38:00 +00:00
alc
cc9b2308bc There is no point in vm_contig_launder{,_page}() flushing held pages,
instead skip over them.  As long as a page is held, it can't be reclaimed by
contigmalloc(M_WAITOK).  Moreover, a held page may be undergoing
modification, e.g., vmapbuf(), so even if the hold were released before the
completion of contigmalloc(), the page might have to be flushed again.

MFC after:	3 weeks
2010-12-29 20:35:36 +00:00
jilles
584bccb74d sh: Properly restore exception handler in fc.
If SIGINT arrived at exactly the right moment (unlikely), an exception
handler in a no longer active stack frame would be called.

Because the old handler was not used in the normal path, clang thought it
was a dead value and if an exception happened it would longjmp() to garbage.
This caused builtins/fc1.0 to fail if histedit.c was compiled with clang.

MFC after:	1 week
2010-12-29 19:39:51 +00:00
attilio
ff557d3a61 Fix several callout migration races:
- Problem1:
   Hypothesis: thread1 is doing a callout_reset_on(), within his
   callout handler, willing to implicitly or explicitly migrate the
   callout.  thread2 is draining the callout.

   Thesys:
   * thread1 calls callout_lock() and locks the old callout cpu
   * thread1 performs the checks in the first path of the
     callout_reset_on()
   * thread1 hits this codepiece:
       /*
        * If the lock must migrate we have to check the state again as
        * we can't hold both the new and old locks simultaneously.
        */
       if (c->c_cpu != cpu) {
               c->c_cpu = cpu;
               CC_UNLOCK(cc);
               goto retry;
       }

     which means it will drop the lock and 'retry'
   * thread2 will callout_lock() and locks the new callout cpu.
     thread1 spins on the new lock and will not keep going for the
     moment.
   * thread2 checks that the callout is not pending (as callout is
     currently running) and that it is not on cc->cc_curr (because cc
     now refers to the new callout and the callout is running on the
     old callout cpu) thus it thinks it is done and returns.
   * thread1  will now acquire the lock and then adds the callout
     to the new callout cpu queue

   That seems an obvious race as callout_stop() falsely reports
   the callout stopped or worse, callout_drain() falsely returns
   while the callout is still in use.
 - Solution1:
   Fixing this problem would require, in general, to lock both
   callout cpus at once while switching the c_cpu field and avoid
   cyclic deadlocks between callout cpus locks.
   The concept of CPUBLOCK is then introduced (working more or less
   like the blocked_lock for thread_lock() function) meaning:
   "in callout_lock(), spin until the c->c_cpu is not different from
   CPUBLOCK". That way the "original" callout cpu, referred to the
   above mentioned code snippet, will remain blocked until the lock
   handover is over critical path will remain covered.

 - Problem2:
   Having the callout currently executed on a specific callout cpu
   and contemporary pending on another callout cpu (as it can happen
   with current code) breaks, at least, the assumption callout_drain()
   returns just once the callout cannot be referenced anymore.
 - Solution2:
   Callout migration is deferred if the current callout is already
   under execution.
   The best place to do that is in softclock() and new members are
   added to the callout cpu structure in order to specify a pending
   migration is requested. That is necessary because the callout
   cannot be trusted (not freed) the 100% of times after the execution
   of the callout handler.
   CPUBLOCK will prevent, in the "deferred migration" case, that the
   callout gets freed in this case, stopping any callout_stop() and
   callout_drain() possible activity until the migration is
   actually performed.

 - Problem3:
   There is a further race in callout_drain().
   In order to avoid a race between sleepqueue lock and callout cpu
   spinlock, in _callout_stop_safe(), the callout cpu lock is dropped,
   the sleepqueue lock is acquired and a new callout cpu lookup is
   performed.  Note that the channel used for locking the sleepqueue is
   obtained from the "current" callout cpu (&cc->cc_waiting).
   If the callout migrated in the meanwhile, callout_drain() will end up
   using the wrong wchan for the sleepqueue (the locked one will be the
   older, while the new one will not really be locked) leading to a
   lock leak and a race access to sleepqueue.
 - Solution3:
   It is enough to check if a migration happened between the operation
   of acquiring the sleepqueue lock and the new callout cpu lock and
   eventually unwind all those and try again.

This problems can lead to deathly races on moderate (4-ways) SMP
environment, leading to easy panic or deadlocks.
The 24-ways of the reporter, could easilly panic, with completely
normal workload, almost daily.
gianni@ kindly wrote the following prof-of-concept which can
panic a FreeBSD machine in less than one hour, in smaller SMP:
http://www.freebsd.org/~attilio/callout/test.c

Reported by:	Nicholas Esborn <nick at desert dot net>, DesertNet
In collabouration with:	gianni, pho, Nicholas Esborn
Reviewed by:	jhb
MFC after:	1 week (*)

* Usually, I would aim for a larger MFC timeout, but I really want this
  in before 8.2-RELEASE, thus re@ accepted a shorter timeout as a special
  case for this patch
2010-12-29 18:17:36 +00:00
kan
1e202d12a5 Switch mips architectures back to libgcc.
MIPS64 n64 binaries are broken with libcompiler_rt at this time.
Switch mips back to libgcc until the cause of breakage is analyzed
and fixed.
2010-12-29 17:12:05 +00:00
marius
10c0dabcb4 On UltraSPARC-III+ and greater take advantage of ASI_ATOMIC_QUAD_LDD_PHYS,
which takes an physical address instead of an virtual one, for loading TTEs
of the kernel TSB so we no longer need to lock the kernel TSB into the dTLB,
which only has a very limited number of lockable dTLB slots. The net result
is that we now basically can handle a kernel TSB of any size and no longer
need to limit the kernel address space based on the number of dTLB slots
available for locked entries. Consequently, other parts of the trap handlers
now also only access the the kernel TSB via its physical address in order
to avoid nested traps, as does the PMAP bootstrap code as we haven't taken
over the trap table at that point, yet. Apart from that the kernel TSB now
is accessed via a direct mapping when we are otherwise taking advantage of
ASI_ATOMIC_QUAD_LDD_PHYS so no further code changes are needed. Most of this
is implemented by extending the patching of the TSB addresses and mask as
well as the ASIs used to load it into the trap table so the runtime overhead
of this change is rather low. Currently the use of ASI_ATOMIC_QUAD_LDD_PHYS
is not yet enabled on SPARC64 CPUs due to lack of testing and due to the
fact it might require minor adjustments there.
Theoretically it should be possible to use the same approach also for the
user TSB, which already is not locked into the dTLB, avoiding nested traps.
However, for reasons I don't understand yet OpenSolaris only does that with
SPARC64 CPUs. On the other hand I think that also addressing the user TSB
physically and thus avoiding nested traps would get us closer to sharing
this code with sun4v, which only supports trap level 0 and 1, so eventually
we could have a single kernel which runs on both sun4u and sun4v (as does
Linux and OpenBSD).

Developed at and committed from:	27C3
2010-12-29 16:59:33 +00:00
marius
074b42904f - Move the macros for generating load and store instructions to asmacros.h
so they can be shared by different source files and extend them by a
  variant for atomic compare and swap.
- Consistently use EMPTY.
2010-12-29 14:14:50 +00:00
marius
b07041fd2b Rename the "xor" parameter to "xorval" as the former is a reserved keyword
in C++.

Submitted by:	gahr
2010-12-29 14:11:46 +00:00
kib
1a1716f8a9 Move the increment of vm object generation count into
vm_object_set_writeable_dirty().

Fix an issue where restart of the scan in vm_object_page_clean() did
not removed write permissions for newly added pages or, if the mapping
for some already scanned page changed to writeable due to fault.
Merge the two loops in vm_object_page_clean(), doing the remove of
write permission and cleaning in the same loop. The restart of the
loop then correctly downgrade writeable mappings.

Fix an issue where a second caller to msync() might actually return
before the first caller had actually completed flushing the
pages. Clear the OBJ_MIGHTBEDIRTY flag after the cleaning loop, not
before.

Calls to pmap_is_modified() are not needed after pmap_remove_write()
there.

Proposed, reviewed and tested by:	alc
MFC after:	1 week
2010-12-29 12:53:53 +00:00
kib
d69a17ac0a Add support for FS_TRIM to user-mode UFS utilities.
Reviewed by:	mckusick, pjd, pho
Tested by:	pho
MFC after:	1 month
2010-12-29 12:31:18 +00:00
kib
17dccd1898 Add kernel side support for BIO_DELETE/TRIM on UFS.
The FS_TRIM fs flag indicates that administrator requested issuing of
TRIM commands for the volume. UFS will only send the command to disk
if the disk reports GEOM::candelete attribute.

Since disk queue is reordered, data block is marked as free in the bitmap
only after TRIM command completed. Due to need to sleep waiting for
i/o to finish, TRIM bio_done routine schedules taskqueue to set the
bitmap bit.

Based on the patch by:	mckusick
Reviewed by:	mckusick, pjd
Tested by:	pho
MFC after:	1 month
2010-12-29 12:25:28 +00:00
kib
e8862739b9 Move the definition of mkdirlisthd from header to C file.
Reviewed by:	mckusick
Tested by:	pho
2010-12-29 12:16:06 +00:00
kib
cbd7f9d931 Add reporting of GEOM::candelete BIO_GETATTR for md(4) and geom_disk(4).
Non-zero value of attribute means that device supports BIO_DELETE.

Suggested and reviewed by:	pjd
Tested by:	pho
MFC after:	1 week
2010-12-29 12:11:07 +00:00
kib
41c444747f Add sysctl vm.md_malloc_wait, non-zero value of which switches malloc-backed
md(4) to using M_WAITOK malloc calls.

M_NOWAITOK allocations may fail when enough memory could be freed, but not
immediately. E.g. SU UFS becomes quite unhappy when metadata write return
error, that would happen for failed malloc() call.

Reported and tested by:	pho
MFC after:	1 week
2010-12-29 11:39:15 +00:00
kib
305388e45b Use a proper type for the variable holding the summary size of the inode
data. Otherwise, on 32bit systems, unlinked inode which size is the
multiple of 4GB was not truncated, causing corruption.

Reported by:	brucec
Reviewed by:	mckusick
Tested by:	pho
2010-12-29 11:19:39 +00:00
davidxu
3daac37e3c - Follow r216313, the sched_unlend_user_prio is no longer needed, always
use sched_lend_user_prio to set lent priority.
- Improve pthread priority-inherit mutex, when a contender's priority is
  lowered, repropagete priorities, this may cause mutex owner's priority
  to be lowerd, in old code, mutex owner's priority is rise-only.
2010-12-29 09:26:46 +00:00
cperciva
83176a3464 A lack of console input is not the same thing as a byte of \0 input.
Correctly return -1 from cngetc when no input is available to be read.

This fixes the '(CTRL-C to abort)' spam while dumping.

MFC after:	3 days
2010-12-29 05:13:21 +00:00
rmacklem
be4347d563 Delete the nfsvno_localconflict() function in the experimental
NFS server since it is no longer used and is broken.

MFC after:	2 weeks
2010-12-28 23:50:13 +00:00
imp
7530d2b976 MIPS has lots of flavors as well 2010-12-28 22:49:28 +00:00
imp
fdd4426c6a Revert r216777, per jhb@ 2010-12-28 22:45:29 +00:00
imp
7e6057c691 Revert r216775, per jhb@ 2010-12-28 22:44:32 +00:00
nwhitehorn
974e62ca27 Fix an error in the ABI in rtld_bind_start(). When passing arguments to a
C function, the caller's stack frame must have room to store all of the
arguments to that function. While here, fix stack frame alignment issues.

Without this change, the compiler will save r3 and r4 into the caller's
stack frame before calling setjmp() in _rtld_bind(). These would then
overwrite arguments to the newly-bound function, causing eventual failures.
2010-12-28 22:31:59 +00:00
jilles
74d9b02bb0 sh: Don't do optimized command substitution if expansions have side effects.
Before considering to execute a command substitution in the same process,
check if any of the expansions may have a side effect; if so, execute it in
a new process just like happens if it is not a single simple command.

Although the check happens at run time, it is a static check that does not
depend on current state. It is triggered by:
- expanding $! (which may cause the job to be remembered)
- ${var=value} default value assignment
- assignment operators in arithmetic
- parameter substitutions in arithmetic except ${#param}, $$, $# and $?
- command substitutions in arithmetic

This means that $((v+1)) does not prevent optimized command substitution,
whereas $(($v+1)) does, because $v might expand to something containing
assignment operators.

Scripts should not depend on these exact details for correctness. It is also
imaginable to have the shell fork if and when a side effect is encountered
or to create a new temporary namespace for variables.

Due to the $! change, the construct $(jobs $!) no longer works. The value of
$! should be stored in a variable outside command substitution first.
2010-12-28 21:27:08 +00:00
imp
d20b96d0f2 Comment out npx and isa from NOTES file. We don't need them here
since DEFAULTS already pulls them in.
2010-12-28 21:22:08 +00:00
imp
9fe1a3e4f4 Remove mem, io, isa and npx since they are duplicative of the entries
in DEFAULTS.  Saves 8 lines of warnings when we build XBOX.
2010-12-28 21:20:58 +00:00
imp
781476f8b5 Due to the automatic inclusion of DEFAULTS everywhere, and since it
has device mem in it almost everywhere, we get warnings about
duplicated device almost everywhere.  Comment it out, with a note
about why, so that we don't get those warnings.
2010-12-28 21:18:58 +00:00
pjd
76df586660 ZFS might not return monotonically increasing directory offset cookies,
so turn off UFS-specific hack that assumes so in ZFS case.
Before the change we can miss returning some directory entries to a
NFS client.

I believe that the hack should be moved to ufs_readdir(), but until we find
somebody who will do it, turn it off for ZFS in NFS server code.

Submitted by:	rmacklem
Discussed with:	rmacklem, mckusick
MFC after:	3 days
2010-12-28 21:12:15 +00:00
jmallett
d9307a3688 When allocating memory from bootmem for the kernel to use, try to leave about
2MB of memory in the bootmem allocator for the SDK to use internally at a later
point.  It'd be nice if there were some functions we could call before
allocating memory to let various facilities reserve some memory, but for now
this seems sufficient.  Previously some unfortunate systems could give up all
(or at least most) of their memory to the kernel from bootmem, and then
allocating command queues for packet output and the like would fail later in
the boot process (which in turn would lead to crashes even later.)

Reported by:	kan
2010-12-28 20:11:54 +00:00
alc
99000f1878 Correct a typo in vm_fault_quick_hold_pages().
Reported by:	Bartosz Stec
2010-12-28 20:02:30 +00:00
jhb
5840cb14a8 Start sentences on a new line to ease life for translators. Tweak the
wording in a few places.

MFC after:	1 week
2010-12-28 18:58:15 +00:00
yongari
bacecb9418 Add device id for RDC M3010 which is found on Vortex86 SoC.
Reviewed by:	mav
2010-12-28 17:45:43 +00:00
nwhitehorn
df478abcd2 Only keep track of PTE validity statistics for pages not locked in the
table. The 'locked' attribute is used to circumvent the regular page table
locking for some special pages, with the result that including locked pages
here causes races when updating the stats.
2010-12-28 17:02:15 +00:00
jhb
936f38f35c Use bus_alloc_resource_any().
MFC after:	2 weeks
2010-12-28 16:57:29 +00:00
jilles
076a6add84 sh: Add test for optimized command substitution.
This test verifies that certain expansions without side effects do not
cause the command substitution to be executed in a child process.

This is not a correctness requirement, but it involves a nontrivial amount
of code and it would be unfortunate if it stopped working.
2010-12-28 14:58:08 +00:00
cperciva
35c87db32c Remove a "not strictly correct" (and panic-inducing) workaround for a bug
which doesn't seem to exist.

PR:		kern/141328
MFC after:	3 days
2010-12-28 14:36:32 +00:00
jilles
713ef02a1f sh: Make expansion errors in optimized command substitution non-fatal.
Command substitutions consisting of a single simple command are executed in
the main shell process but this should be invisible apart from performance
and very few exceptions such as $(trap).
2010-12-28 13:28:24 +00:00
lstewart
5fb7e0486c Add a comment for the ccv member of struct tcpcb.
Sponsored by:	FreeBSD Foundation
MFC after:	5 weeks
X-MFC with:	r215166
2010-12-28 12:37:57 +00:00
lstewart
446c1bbb10 - Add some helper hook points to the TCP stack. The hooks allow Khelp modules to
access inbound/outbound events and associated data for established TCP
  connections. The hooks only run if at least one hook function is registered
  for the hook point, ensuring the impact on the stack is effectively nil when
  no TCP Khelp modules are loaded. struct tcp_hhook_data is passed as contextual
  data to any registered Khelp module hook functions.

- Add an OSD (Object Specific Data) pointer to struct tcpcb to allow Khelp
  modules to associate per-connection data with the TCP control block.

- Bump __FreeBSD_version and add a note to UPDATING regarding to ABI changes
  introduced by this commit and r216753.

In collaboration with:	David Hayes <dahayes at swin edu au> and
				Grenville Armitage <garmitage at swin edu au>
Sponsored by:	FreeBSD Foundation
Reviewed by:	bz, others along the way
MFC after:	3 months
2010-12-28 12:13:30 +00:00
uqs
1f1cd9cdf4 Revert most of r210764, now that mdocml does the right
thing with empty quotation macros.

Requested by:	Alex Kozlov
2010-12-28 10:08:50 +00:00
ae
8db1129593 Allow destroying EBR in COMPAT (default) mode.
MFC after:	2 week
2010-12-28 08:42:12 +00:00
ae
43a50d46ef Make EBR probe method less strictly to be able detect EBRs with
small non fatal inconsistency. EBR may contain boot loader and sometimes
it just has some garbage data. Now this does not prevent FreeBSD to use
extended partitions. But since we do not support bootcode for EBR we mark
tables which have non empty boot area as corrupt. This does make them
readonly and we can not damage this data.

PR:		kern/141235
MFC after:	1 month
2010-12-28 08:36:44 +00:00
lstewart
04373a011b Add a new sack hint to track the most recent and highest sacked sequence number.
This will be used by the incoming Enhanced RTT Khelp module.

Sponsored by:	FreeBSD Foundation
Submitted by:	David Hayes <dahayes at swin edu au>
Reviewed by:	bz and others (as part of a larger patch)
MFC after:	3 months
2010-12-28 03:27:20 +00:00
lstewart
425155a6c9 Fix a whitespace nit introduced in r215166.
Sponsored by:	FreeBSD Foundation
Spotted by:	bz
MFC after:	5 weeks
X-MFC with:	r215166
2010-12-28 01:38:52 +00:00
cperciva
871e71c2db Build the modules which can be built. The excluded modules fall into two
categories: Those which can't build with PAE because they attempt to cast
a pointer to a bus_addr_t (mostly scsi drivers); and those which can't be
built with XEN because they conflict with something in xen-os.h (e.g., in
cxgb there is a conflicting definition of test_and_clear_bit).

MFC after:	1 week
2010-12-27 23:59:27 +00:00