any per-instance global data that is not already protected by a
buf or vnode lock. Presently, only fields in ffs's struct fs utilize
this lock.
- Sort some ufsmount members so that fields used for quotas are grouped
together. This is in anticipation of quota locking.
Sponsored By: Isilon Systems, Inc.
caller may not be holding Giant, and namei() should acquire it as
necessary. HASGIANT is used to indicate when namei() is returning
with a reference to a vnode that requires Giant, and Giant is locked.
- Add the macro NDHASGIANT() which can be used in conjunction with
VFS_UNLOCK_GIANT() in callers who have marked the nd with MPSAFE.
Sponsored By: Isilon Systems, Inc.
must be held when any vnode owned by the filesystem is manipulated.
- Add VFS_LOCK_GIANT and VFS_UNLOCK_GIANT macros which are used to
conditionally lock and unlock Giant based on a particular mountpoint.
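A minimal userland sketch of the conditional locking idea, assuming a
per-mount "needs Giant" flag. The names mount_sketch, needs_giant,
vfs_lock_giant_sketch() and vfs_unlock_giant_sketch() are hypothetical
stand-ins, and Giant is modeled as a simple hold counter rather than a
real mutex:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-in for the kernel's Giant mutex: just a hold counter. */
static int giant_depth;

static void
giant_lock(void)
{
	giant_depth++;
}

static void
giant_unlock(void)
{
	assert(giant_depth > 0);	/* unlocking an unheld lock is a bug */
	giant_depth--;
}

/* Hypothetical, simplified mount point: only the field this sketch needs. */
struct mount_sketch {
	bool needs_giant;	/* assumed flag: filesystem is not MP-safe */
};

/* Take Giant only if this mount point requires it; report whether we did. */
bool
vfs_lock_giant_sketch(const struct mount_sketch *mp)
{
	if (mp != NULL && mp->needs_giant) {
		giant_lock();
		return (true);
	}
	return (false);
}

/* Drop Giant only if the matching lock call actually took it. */
void
vfs_unlock_giant_sketch(bool locked)
{
	if (locked)
		giant_unlock();
}
```

The caller keeps the value returned by the lock call and hands it back to
the unlock call, so MP-safe filesystems never touch Giant at all.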
NetBSD went this route a while ago. FreeBSD originally tried this to
cope with multifunction cards. However, it turns out that we're
better off not worrying about the function number, and instead worry
about the function type for the function. This has worked well in
NetBSD, and all FreeBSD's relevant drivers have been converted.
# I'll rework the macros that specify them shortly, as soon as I can
# come up with a good, compatible way to deal...
Use the correct number of handles for multihandle returns.
Very, very, rarely on some SMP systems we've seen an 'unstable' type
in the response queue. I dunno whether or not it's a bug in our
handling, or whether there's a cache incoherency issue, but
try to guard against it.
MFC after: 2 weeks
have seen in the isa pnp case where a resource butts up against
0xffffffff. This would only impact when the board was booted without
ACPI.
Submitted by: Ed Maste (freebsd-stable <20050103145720.GA90754@sandvine.com>)
MFC After: 5 days
its ability to automatically scan and attach luns for modern storage
which has luns in the 0..1000 range, not 0..7.
The correct thing would be to do REPORT LUNS for devices whose LUN0
version shows a version >= SCSI3, but lacking that we should be able
to search higher than LUN 7 if we're >= SCSI3 with no ill effects.
This change keeps all of the QUIRK_HILUNS quirks, obeys the QUIRK_NOLUNS,
and introduces a QUIRK_NOHILUNS which will keep searches above LUN 7
from happening for devices that report >= SCSI3 compliance. I doubt the latter
will be needed, but you never know.
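The decision described above can be sketched roughly as follows. This is
not the CAM code: the quirk bit values, SCSI_REV_3, luns_to_scan() and
the bus_max_luns parameter are hypothetical, and the precedence between
the quirks when several are set is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical quirk bits, named after the quirks in the text above. */
#define QUIRK_NOLUNS	0x01u	/* never probe beyond LUN 0 */
#define QUIRK_HILUNS	0x02u	/* always probe beyond LUN 7 */
#define QUIRK_NOHILUNS	0x04u	/* never probe beyond LUN 7 */

#define SCSI_REV_3	3u	/* assumed encoding of "SCSI3" */
#define LOW_LUN_LIMIT	8u	/* LUNs 0..7 */

/*
 * Return how many LUNs to probe on one target (LUN 0 .. result - 1).
 * bus_max_luns is whatever upper bound the controller reports.
 */
unsigned int
luns_to_scan(unsigned int scsi_rev, uint32_t quirks, unsigned int bus_max_luns)
{
	unsigned int low;

	assert(bus_max_luns >= 1);
	low = bus_max_luns < LOW_LUN_LIMIT ? bus_max_luns : LOW_LUN_LIMIT;

	if (quirks & QUIRK_NOLUNS)
		return (1);		/* LUN 0 only */
	if (quirks & QUIRK_HILUNS)
		return (bus_max_luns);	/* explicitly quirked high */
	if (scsi_rev >= SCSI_REV_3 && !(quirks & QUIRK_NOHILUNS))
		return (bus_max_luns);	/* new behaviour for >= SCSI3 */
	return (low);			/* traditional 0..7 scan */
}
```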
This allowed me to randomly scan and attach > 500 disks at a time in
a situation where quirking for QUIRK_HILUNS wasn't practical (the
vendor id and product id of the virtualization change
constantly).
Reviewed by: ken@freebsd.org, scottl@freebsd.org, gibbs@freebsd.org
MFC after: 2 weeks
witness_proc_has_locks(), as they are unused, which results in a compiler
error. This problem was introduced with the implementation of "show
alllocks".
Spotted by: Artem Kuchin <matrix at itlegion dot ru>
frame includes FCS (requires applications to be updated, but since
we weren't doing the out-of-line FCS stuff anyway app changes
were needed already)
o add a flag to indicate padding exists between the 802.11 header and
the payload (e.g. for Atheros cards)
o diff reduction against NetBSD
MFC after: 1 week
in mddestroy() to properly free already allocated memory.
This fixes a panic when we try to create a memory backed device that is
too big, with preallocated memory (-o reserve).
- Remove redundant { }.
MFC after: 1 week
address, nor do we need the alignment requirements, so eliminate them.
This likely means that we can now collapse some of the entries as we
have no need of them anymore (they match other entries and were there
only to get the right attr memory offset of the enet addr).
designed to help detect tamper-after-free scenarios, a problem more
and more common and likely with multithreaded kernels where race
conditions are more prevalent.
Currently MemGuard can only take over malloc()/realloc()/free() for
particular (a) malloc type(s) and the code brought in with this
change manually instruments it to take over M_SUBPROC allocations
as an example. If you are planning to use it, for now you must:
1) Put "options DEBUG_MEMGUARD" in your kernel config.
2) Edit src/sys/kern/kern_malloc.c manually, look for
"XXX CHANGEME" and replace the M_SUBPROC comparison with
the appropriate malloc type (this might require additional
but small/simple code modification if, say, the malloc type
is declared out of scope).
3) Build and install your kernel. Tune vm.memguard_divisor
boot-time tunable which is used to scale how much of kmem_map
you want to allot for MemGuard's use. The default is 10,
so kmem_size/10.
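As a worked example of the divisor arithmetic in step 3, here is a small
sketch. The function name memguard_share() is hypothetical, and how the
kernel treats a divisor of zero is not stated in this message, so the
sketch simply rejects it:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of kmem handed to MemGuard for a given divisor (sketch only). */
uint64_t
memguard_share(uint64_t kmem_size, unsigned int divisor)
{
	assert(divisor != 0);	/* assumption: zero is not a valid setting */
	return (kmem_size / divisor);
}
```

With an illustrative kmem_size of 200MB and the default divisor of 10,
MemGuard would get 20MB; a larger divisor gives MemGuard a smaller share.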
ToDo:
1) Bring in a memguard(9) man page.
2) Better instrumentation (e.g., boot-time) of MemGuard taking
over malloc types.
3) Teach UMA about MemGuard to allow MemGuard to override zone
allocations too.
4) Improve MemGuard if necessary.
This work is partly based on some old patches from Ian Dowse.
cards work. These changes depend on the expanded funce parsing that
was just committed to pccard_cis.c. In NetBSD the ethernet address
was read out of attr memory directly. We rely on the kernel pccard
parser to pull this information out of what appears to be an obsolete
funce with the information in it.
# I'm still getting the no rx interrupt sometimes with some hub/switches
# for reasons unknown... But usually only one and only when dhclient
# runs.
as type 0, rather than the usual type 4. Assume that this format is
from an old standard and go with it. The Fujitsu FMV-186A and Silicom
Ethernet cards I have both have tuples with this format, and they are
both pretty old cards.
# if somebody knows for sure, please let me know.
in BSD class, ie. if provider below us uses the same metadata, don't
create slices based on the metadata.
This allows slices to be created on geoms with rank != 1 without hacks.
Discussed with: phk
Approved by: phk
MFC after: 2 weeks
aware of any fe based cards that do anything except network (well,
maybe the fujitsu scsi/lan card, but I've only seen two of those on
ebay in the last 3 years).
replacement address for an rdr rule. Some rdr rules have no address family
(when the replacement is a table and no other criterion implies one AF).
In this case, pf would fail to select a replacement address and drop the
packet due to translation failure.
Found by: Gustavo A. Baratto
virtual COM port. This makes the use of the Dell OpenManage tools on FreeBSD
considerably easier, and is based on Chuck Cranor's original patch for 4.6.
Reviewed by: imp
Tested by: dpk at dpk dot net
MFC after: 1 week
bridge in the device tree which lacks the mandatory (also by the OFW PCI
bus binding spec) "reg" property. Change the code to just ignore nodes
missing the "reg" property instead of panicking when encountering such a
node. Also ignore nodes without a "name" property (guaranteed by the OFW
PCI bus binding spec). This brings the behaviour of the MD OFW PCI code
regarding such incomplete nodes in line with the EBus and the SBus code.
Tested by: Cyril Tikhomiroff <tikho@anor.net>
MFC after: 1 month
to elide. This is a somewhat more convenient way of specifying in
e.g. make.conf a list of modules you know you will never need.
PR: kern/76225
Submitted by: David Yeske <dyeske@yahoo.com>
MFC after: 2 weeks
now use a pool mutex to manage the reference counts. This fixes races
resulting in use-after-free.
Tested by: kris, David Cornejo dave at dogwood dot com
Reported by: bmilekic's MemGuard
MFC after: 1 week
o rework pll setup code to follow h/w specification
o add hint.hifn.X.pllconfig to specify reference clock setup
requirements; default is pci66 which means the clock is
derived from the PCI bus clock and the card resides in a
66MHz slot
Tested on 7955 and 7956 cards; support for 7954 cards not enabled
since we have no cards to test against.
In collaboration with Poul-Henning Kamp.
Reviewed by: phk
MFC after: 1 week
Rather than have a twisty maze of special case allocations, move
instead to a data driven allocation. This should be the most robust
way to cope with the resource problems caused by the multiplicity of ways
of encoding 5 registers that have the misfortune of being neither a power
of 2 nor contiguous.
Also, make it less impossible that pccard will work. I've not been able
to get my libretto floppy working, but it now fails later than before.
phk and I had similar ideas on this during the 5.3 release cycle, but
it wasn't until recently that I could test more than one allocation
scenario.
MFC After: 1 month (5.4 if possible, 5.5 if not)
and tweaks. The code was actually quite broken because it discarded the
upper bits of the 64 bit division. We only had a 50% chance of scaling up
the blocksize for large NFS client mounts when it was needed. For 5.x and
beyond, this was harmless because we could represent the result in either
case. For 4.x this was a big problem though. (4.x also has a df(1) bug to
compound the problem)
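To illustrate the class of bug described above, here is a sketch under
stated assumptions; it is not the NFS client code. It assumes the block
count must fit a signed 32-bit field, and the names MAX_BLOCKS,
scale_blocksize() and needs_scaling_truncated() are hypothetical. The
second function shows one plausible form of the broken pattern:
truncating the 64-bit quotient before testing it, so the decision
depends on a single surviving bit.

```c
#include <assert.h>
#include <stdint.h>

/* Largest block count the (assumed) signed 32-bit field can hold. */
#define MAX_BLOCKS	((uint64_t)INT32_MAX)

/*
 * Double bsize until total_bytes / bsize fits in MAX_BLOCKS.
 * Returns the scaled block size and stores the block count in *blocks.
 */
uint64_t
scale_blocksize(uint64_t total_bytes, uint64_t bsize, uint32_t *blocks)
{
	uint64_t n;

	assert(bsize != 0);
	n = total_bytes / bsize;	/* keep the full 64-bit quotient */
	while (n > MAX_BLOCKS) {
		bsize <<= 1;
		n = total_bytes / bsize;
	}
	*blocks = (uint32_t)n;
	return (bsize);
}

/* Broken pattern: the upper 32 bits are gone before the test is made. */
int
needs_scaling_truncated(uint64_t total_bytes, uint64_t bsize)
{
	uint32_t n = (uint32_t)(total_bytes / bsize);

	return ((n & 0x80000000u) != 0);
}
```

For a 16TB export with 512-byte blocks the full quotient is 2^35, so
scaling is clearly needed, yet the truncated test sees zero and reports
that it is not.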
happen on the first management frame received from a neighbor; we assume
any merge candidate will send more frames and those should be processed
with a suitable table entry.
Stepped on by: Tai-hwa Liang
interrupts that have a trigger mode of conforming. This fixes problems on
some older machines that still route PCI devices via ISA interrupts when
using an I/O APIC.
Tested by: Peter Trifonov pvtrifonov at mail dot ru
MFC after: 1 month
instead of burying that in the atpic(4) code as atpic(4) is not the only
user of elcr(4). Change the elcr(4) code to export a global elcr_found
variable that other code can check to see if a valid ELCR was found.
MFC after: 1 month
producers rather than consumers as new-bus resources only handle consumed
resources. We already do this for the other ACPI resource types that
support the producer/consumer attribute.
Hold a lock on the table instead of futzing with reference counts which
was potentially dangerous except drivers were quiescent while we did this
so the table contents never changed. Disable the hack logic for removing
scan candidates with multiple association failures; it's never done the
right thing and will be fixed correctly when background scanning goes in.
object (/) rather than the pci bus object when walking the _PRT to force
attach devices. We already look up relative to the root object when doing
interrupt routing.
Suggested by: njl
For such devices, we require _PRS to exist and we warn if any of the
resources in _PRS are not IRQ resources (since we'll have no way of knowing
which of those resources to use without a working _CRS). When it does
come time to set resources, we build up a resource buffer from scratch
as we do for devices with _CRS that only have IRQ resources.
- Fix a bug with setting extended IRQ resources where we set the IRQ value
in the wrong resource structure meaning that whichever IRQ was listed in
_PRS was used instead. This might fix some weird issues on certain boxes
where IRQs > 16 don't seem to work when using ACPI.
- Fix a bug with how we walked the resource buffer after _SRS to call
config_intr() in that the 'end' variable was not properly updated, so we
could either terminate the loop early or loop after the end of the
buffer.
Tested by: pjd
not we're going to process the frame; this makes the counters reflect frames
actually processed instead of received (discarded frames were already counted)
o increase the max per-frame tx descriptor count and the number of tx
buffers for forthcoming fast frame support
o correct the max scatter/gather count; it cannot be larger than the
max(tx,rx,beacon) descriptor counts
(fix imported from madwifi by Takanori Watanabe)
o eliminate save/restore of pci registers handled by the system
o eliminate duplicate zero of the softc (noted by njl)
o consolidate common code
MFC after: 1 week
and always has been, but the system call itself returns
errno in a register so the problem is really a function of
libc, not the system call.
Discussed with: Matthew Dillon <dillon@apollo.backplane.com>
(calling a __dead2 function such as panic() at the end of a function), the
saved %eip on the stack will actually not be part of the function that
executed a call instruction but instead will be the first instruction of
the next function in the text. This happens with dblfault_handler() and
syscall() for example. Work around this in the one place it matters by
looking at the saved %eip - 1 to determine the calling function when we
check for "magic" frames.
MFC after: 2 weeks
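The "%eip - 1" workaround can be illustrated with a toy symbol table.
This is only a sketch: the addresses, the sym_sketch layout and the
function names lookup_symbol() and caller_of() are hypothetical, and
"next_function" stands for whatever happens to follow in the text
segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym_sketch {
	const char *name;
	uintptr_t start;	/* first byte of the function */
	uintptr_t end;		/* one past the last byte */
};

/*
 * Hypothetical layout: dblfault_handler() ends with a call to a
 * function that never returns, so the return address pushed by that
 * call (0x1040) is the first byte of the next function.
 */
static const struct sym_sketch symtab[] = {
	{ "dblfault_handler", 0x1000, 0x1040 },
	{ "next_function", 0x1040, 0x1100 },
};

/* Map a program counter to the name of the function containing it. */
const char *
lookup_symbol(uintptr_t pc)
{
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++)
		if (pc >= symtab[i].start && pc < symtab[i].end)
			return (symtab[i].name);
	return (NULL);
}

/* Resolve the caller from a saved return address by looking at pc - 1. */
const char *
caller_of(uintptr_t saved_pc)
{
	assert(saved_pc != 0);
	return (lookup_symbol(saved_pc - 1));
}
```

Looking up the saved address itself names the wrong function here, while
the address minus one still lies inside the call instruction and so
resolves to the real caller.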
provides truer debugger stack traces. For those that want to stick with
-O2 kernel builds, one should probably add -fno-optimize-sibling-calls
so that each stack frame has a frame pointer.
It is semi-promised by the Release Engineers that when RELENG_6 is
created we go back to -O2.
Desired by: scottl, jhb
they both happen before pipe backing allocation occurs. Previously,
a pipe memory shortage would cause a panic due to a KNOTE call
on an uninitialized si_note.
Reported by: Peter Holm
MFC after: 1 week
for the vast majority of our cards. However, they are critically
needed to distinguish different fe based PC Cards (the FMV-182 from
the 182A) which need to be treated differently (the ethernet address
is loaded not from the standard CIS-based ethernet tuples, but from
differing locations in attribute space based on the version string in
CIS3). This should have no impact for other users of this function.
unhappiness lately.
As far as I can tell, no files that have made it safely to disk
have been endangered, but stuff in transit has been in peril.
Pointy hat: phk
- Introduce the amr_io_lock to control access to command queues, bio queues,
and the hardware.
- Eliminate the taskqueue and do all completion processing in the ithread.
- Assign a static slot number to each command instead of doing a linear
search for free slots each time a command is needed.
- Modify the interrupt handler to more closely match what Linux does, for
safety.
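A sketch of the static slot idea from the list above: each command gets
its slot number once, at setup time, and free commands sit on a list, so
allocation never searches for a free slot. The names cmd_sketch,
cmd_pool, pool_init(), cmd_alloc() and cmd_free() and the pool size are
hypothetical, and the locking that amr_io_lock would provide is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NCMD	4	/* illustrative pool size */

struct cmd_sketch {
	int slot;			/* assigned once, never changes */
	struct cmd_sketch *next_free;	/* link while on the free list */
};

struct cmd_pool {
	struct cmd_sketch cmds[NCMD];
	struct cmd_sketch *free_head;
};

/* Assign every command its permanent slot and put it on the free list. */
void
pool_init(struct cmd_pool *p)
{
	int i;

	p->free_head = NULL;
	for (i = NCMD - 1; i >= 0; i--) {
		p->cmds[i].slot = i;
		p->cmds[i].next_free = p->free_head;
		p->free_head = &p->cmds[i];
	}
}

/* Constant-time allocation; returns NULL when the pool is exhausted. */
struct cmd_sketch *
cmd_alloc(struct cmd_pool *p)
{
	struct cmd_sketch *c = p->free_head;

	if (c != NULL)
		p->free_head = c->next_free;
	return (c);
}

/* Return a command to the pool; its slot number stays attached to it. */
void
cmd_free(struct cmd_pool *p, struct cmd_sketch *c)
{
	assert(c >= &p->cmds[0] && c < &p->cmds[NCMD]);
	c->next_free = p->free_head;
	p->free_head = c;
}
```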
and BBO is BO's backing object. Now, suppose that O and BO are being
collapsed. Furthermore, suppose that BO has been marked dead
(OBJ_DEAD) by vm_object_backing_scan() and that either
vm_object_backing_scan() has been forced to sleep due to encountering
a busy page or vm_object_collapse() has been forced to sleep due to
memory allocation in the swap pager. If vm_object_deallocate() is
then called on BBO and BO is BBO's only shadow object,
vm_object_deallocate() will collapse BO and BBO. In doing so, it adds
a necessary temporary reference to BO. If this collapse also sleeps
and the prior collapse resumes first, the temporary reference will
cause vm_object_collapse to panic with the message "backing_object %p
was somehow re-referenced during collapse!"
Resolve this race by changing vm_object_deallocate() such that it
doesn't collapse BO and BBO if BO is marked dead. Once O and BO are
collapsed, vm_object_collapse() will attempt to collapse O and BBO.
So, vm_object_deallocate() on BBO need do nothing.
Reported by: Peter Holm on 20050107
URL: http://www.holm.cc/stress/log/cons102.html
In collaboration with: tegge@
Candidate for RELENG_4 and RELENG_5
MFC after: 2 weeks
Without this fix, when ACLs are set via tunefs(8) on the root file system,
they are removed on boot when 'mount -a' is called, because mount(8)
called for the root file system always adds the MNT_UPDATE flag, and the
MNT_UPDATE flag isn't perfect.
Now, one cannot remove ACLs stored in the superblock (configured with tunefs(8))
via 'mount -a' nor 'mount -u -o noacls <file system>', but it is still
possible to mount a file system which doesn't have ACLs in the superblock via
'mount -o acls <file system>' or /etc/fstab's 'acls' option.
Reported by: Lech Lorens/pl.comp.os.bsd
Discussed with: phk, rwatson
Reviewed by: rwatson
MFC after: 2 weeks
calls MiniportQueryInformation(), it will return NDIS_STATUS_PENDING.
When this happens, ndis_get_info() will sleep waiting for a completion
event. If two threads call ndis_get_info() and both end up having to
sleep, they will both end up waiting on the same wait channel, which
can cause a panic in sleepq_add() if INVARIANTS are turned on.
Fix this by having ndis_get_info() use a common mutex rather than
using the process mutex with PROC_LOCK(). Also do the same for
ndis_set_info(). Note that Pierre's original patch also made ndis_thsuspend()
use the new mutex, but ndis_thsuspend() shouldn't need this since
it will make each thread that calls it sleep on a unique wait channel.
Also, it occurred to me that we probably don't want to enter
MiniportQueryInformation() or MiniportSetInformation() from more
than one thread at any given time, so now we acquire a Windows
spinlock before calling either of them. The Microsoft documentation
says that MiniportQueryInformation() and MiniportSetInformation()
are called at DISPATCH_LEVEL, and previously we would call
KeRaiseIrql() to set the IRQL to DISPATCH_LEVEL before entering
either routine, but this only guarantees mutual exclusion on
uniprocessor machines. To make it SMP safe, we need to use a real
spinlock. For now, I'm abusing the spinlock embedded in the
NDIS_MINIPORT_BLOCK structure for this purpose. (This may need to be
applied to some of the other routines in kern_ndis.c at a later date.)
Export ntoskrnl_init_lock() (KeInitializeSpinlock()) from subr_ntoskrnl.c
since we need to use it in kern_ndis.c, and since it's technically part
of the Windows kernel DDK API along with the other spinlock routines. Use
it in subr_ndis.c too rather than frobbing the spinlock directly.
the last action of kern_exit(). Instead, it is a MD callout to cleanup
per-process state during exit.
- Add notes of concern to Alpha and ia64 about the possible need to drop
fp state in cpu_thread_exit() rather than in cpu_exit() since it is
per-thread state rather than per-process.
- ip_fw_chk() returns the action as its function return value. The retval
field is removed from the args structure. The action is no longer a flag;
it is one of a set of integer constants.
- Any action-specific cookies are returned either in new "cookie" field
in args structure (dummynet, future netgraph glue), or in mbuf tag
attached to packet (divert, tee, some future action).
o Convert parsing of return value from ip_fw_chk() in ipfw_check_{in,out}()
to a switch structure, so that the functions are more readable, and
future actions can be added with fewer modifications.
Approved by: andre
MFC after: 2 months
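The shape of the new return-value handling might look roughly like the
sketch below. The enum values, struct fw_args_sketch, enum pkt_fate and
dispatch_action() are hypothetical names for illustration only, and what
each action does to the packet is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action codes: plain integer constants, not flag bits. */
enum fw_action { FW_PASS, FW_DENY, FW_DUMMYNET, FW_DIVERT, FW_TEE };

/* Hypothetical args structure carrying an action-specific cookie. */
struct fw_args_sketch {
	unsigned int cookie;	/* e.g. which dummynet pipe to use */
};

/* What the caller does with the packet afterwards (sketch only). */
enum pkt_fate { PKT_CONTINUE, PKT_DROPPED, PKT_CONSUMED };

enum pkt_fate
dispatch_action(enum fw_action action, const struct fw_args_sketch *args,
    unsigned int *pipe_nr)
{
	assert(args != NULL && pipe_nr != NULL);

	switch (action) {
	case FW_PASS:
		return (PKT_CONTINUE);
	case FW_DENY:
		return (PKT_DROPPED);
	case FW_DUMMYNET:
		/* The cookie comes back in the args structure. */
		*pipe_nr = args->cookie;
		return (PKT_CONSUMED);
	case FW_DIVERT:
		/* Cookie would ride in an mbuf tag; not modeled here. */
		return (PKT_CONSUMED);
	case FW_TEE:
		/* A copy is diverted; the original continues. */
		return (PKT_CONTINUE);
	}
	return (PKT_DROPPED);	/* unknown action: fail closed (assumption) */
}
```

Adding a new action then means adding one enum constant and one case,
rather than another bit test threaded through the function.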
and KASSERT coverage.
After this check there is only one "nasty" cast in this code but there
is a KASSERT to protect against the wrong argument structure behind
that cast.
Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical
kernel with no change in performance.
We also now run the checking and tracing on VOP's which have been layered
by nullfs, umapfs, deadfs or unionfs.
Add new (non-inline) VOP_FOO_AP() functions which take a "struct
foo_args" argument and do everything the VOP_FOO() macros
used to do with checks and debugging code.
Add a KASSERT to VOP_FOO_AP() to check that the argument type is
correct.
Slim down VOP_FOO() inline functions to just stuff arguments
into the struct foo_args and call VOP_FOO_AP().
Put function pointer to VOP_FOO_AP() into vop_foo_desc structure
and make VCALL() use it instead of the current offsetof() hack.
Retire vcall() which implemented the offsetof() hack.
Make deadfs and unionfs use VOP_FOO_AP() calls instead of
VCALL(), we know which specific call we want already.
Remove unneeded arguments to VCALL() in nullfs and umapfs bypass
functions.
Remove unused vdesc_offset and VOFFSET().
Generally improve style/readability of the generated code.
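The split between a thin inline VOP_FOO() and an out-of-line
VOP_FOO_AP() can be sketched in userland as follows. This is not the
generated code: vnode_sketch, vop_desc_sketch, the a_value argument and
samplefs_foo() are hypothetical, assert() stands in for KASSERT, and a
call counter stands in for the tracing code.

```c
#include <assert.h>
#include <stddef.h>

struct vop_foo_args;

/* Hypothetical operation descriptor; one static instance per VOP. */
struct vop_desc_sketch {
	const char *name;
};
static const struct vop_desc_sketch vop_foo_desc = { "vop_foo" };

/* Hypothetical vnode: one method pointer plus a trace counter. */
struct vnode_sketch {
	int (*v_foo)(struct vop_foo_args *);
	int calls;
};

struct vop_foo_args {
	const struct vop_desc_sketch *a_desc;
	struct vnode_sketch *a_vp;
	int a_value;
};

/* Out-of-line half: checks and tracing live here exactly once. */
int
VOP_FOO_AP(struct vop_foo_args *ap)
{
	/* Stand-in for the KASSERT on the argument structure type. */
	assert(ap->a_desc == &vop_foo_desc);
	assert(ap->a_vp != NULL && ap->a_vp->v_foo != NULL);
	ap->a_vp->calls++;	/* stand-in for debugging/tracing */
	return (ap->a_vp->v_foo(ap));
}

/* Inline half: only packs the arguments and calls VOP_FOO_AP(). */
static inline int
VOP_FOO(struct vnode_sketch *vp, int value)
{
	struct vop_foo_args a;

	a.a_desc = &vop_foo_desc;
	a.a_vp = vp;
	a.a_value = value;
	return (VOP_FOO_AP(&a));
}

/* A filesystem's implementation of the operation (sketch). */
int
samplefs_foo(struct vop_foo_args *ap)
{
	return (ap->a_value * 2);
}
```

A layered filesystem that already holds a struct vop_foo_args can call
VOP_FOO_AP() directly and still pass through the same checks.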
so we need to acquire Giant in netgraph methods, so that we don't
race with line discipline methods. Remove NET_NEEDS_GIANT.
- Packets coming into node from netgraph are queued in ifqueue
attached to node private data.
- Mutex in struct ifqueue is used to lock not only the queue, but
the whole private data, and tp->t_lsc field.
- tp->t_lsc pointer is used to indicate whether line discipline is
attached to netgraph or not.
- Use FLG_DIE flag to indicate that node may be destroyed.
(This protection doesn't work, and it didn't before. Must be redesigned.)
- Increment ngt_unit atomically, removing mutex.
- Acquire Giant, when executing ngt_start() from netgraph context.
- Acquire Giant, when {,de}registering line discipline.
- Uncomment forcing queue mode on peers hook, since this is reasonable.
- Force queue mode on our hook, to avoid acquiring Giant when coming from
network stack. We may already hold some mutexes at this point.
Cleanups:
- Use callout_pending() instead of our own flag.
- Remove spl(9) calls. Now we can use return() instead of ERROUT().
style(9):
- Sort includes.
- Sparse initializer for struct linesw.
- Remove some empty lines, sort declarations.
Reviewed by: julian, phk
MFC after: 1 month
of len in tcp_output(), in the case where the FIN has already been
transmitted. The mis-computation of len is because of a gcc
optimization issue, which this change works around.
Submitted by: Mohan Srinivasan
interrupt is wired up to all the I/O APICs in the system. If the system
has only one I/O APIC, then just act as if the entry specified that APIC.
We still don't try to handle global entries in a system with multiple I/O
APICs.
Tested by: Peter Trifonov pvtrifonov at mail dot ru
MFC after: 1 week
up its pending error state, which may be set in some rare conditions resulting
in the connect() syscall returning that bogus error and making the application
believe that the attempt to change association has failed, when in fact it has not.
There is a sockets/reconnect regression test which exercises this bug.
MFC after: 2 weeks
errno can potentially be tampered with by a nested signal handler.
Now all error codes are returned as negative values; positive values are
reserved for future expansion.
external source (i.e., _STA). The previous case only handled calls
occurring within AML. This should fix Toshibas, among others. Thanks
to Robert Moore of Intel for the fix.
MFC after: 2 days
the given providers. Without even one of the configured components there
should be no way to get the secret.
Supported by: WHEEL Sp. z o.o.
http://www.wheel.pl
- Use callout_pending() instead of our own flags.
- Remove home-grown protection of node, which has a scheduled
callout().
- Remove spl(9) calls.
Tested by: bz
TAILQ_FOREACH_SAFE().
Lose the error pointer argument and return any errors the normal way.
Return EAGAIN for the case where more work needs to be done.
e.g. at the loader:
set hint.pcib.1.skipslot=26
This allows undocumented and problematic hardware on some systems
to be ignored, for instance, the USB keyboard/mouse that shows up
on a 12" albook that doesn't exist nor do anything other than eat up
the syscons keyboard. Another one is the unused USB cell in the old
366MHz iBook that locks up the machine when probed.
In a way this is temporary, since there are better fixes for the
above problems, but will be useful in the meantime by allowing
a keyboard to be used to help debug said fixes :)
- while here remove some trailing white space
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:
The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.
Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.
If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to be the
cached mount credential, or a credential cached along with
any delayed write data.
Discussed with: rwatson