- Fix binat for incoming connections when a netblock (not just a single
address) is used for source in the binat rule. closes PR 3535, reported by
Karl O.Pinc. ok henning@, cedric@
- Fix a problem related to empty anchor rulesets, which could cause a kernel
panic.
Approved by: bms(mentor)
are supposed to continue firing as long as there is work to do, not
stop after the first invocation.
This is damage control after a patch that has been committed prematurely.
Tested by: kris
instead of ephemeral mappings using pmap_qenter() by the writer. The
writer is still, however, responsible for wiring the pages, just not
mapping them. Consequently, the allocation of KVA for the direct case is
unnecessary. Remove it and the sysctls limiting it, i.e.,
kern.ipc.maxpipekvawired and kern.ipc.amountpipekvawired. The number
of temporarily wired pages is still, however, limited by
kern.ipc.maxpipekva.
Note: On platforms lacking a direct virtual-to-physical mapping,
uiomove_fromphys() uses sf_bufs to cache ephemeral mappings. Thus,
the number of available sf_bufs can influence the performance of pipes
on platforms such as i386. Surprisingly, I saw the greatest gain from this
change on such a machine: lmbench's pipe bandwidth result increased from
~1050MB/s to ~1850MB/s on my 2.4GHz, 400MHz FSB P4 Xeon.
long as there are still explicit uses of int, whether in types or
in function names (such as atomic_set_int() in sched_ule.c), we can
not change cpumask_t to be anything other than u_int. See also the
commit log for sys/sys/types.h, revision 1.84.
* all
- s/__FUNCTION__/__func__/.
Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>
- Compatibility for RELENG_4 and DragonFly.
* firewire
- Timestamp just before queuing.
- Retry bus probe if it fails.
- Use device_printf() for debug message.
- Invalidate CROM while updating.
- Don't process minimum/invalid CROM.
* sbp
- Add ORB_SHORTAGE flag.
- Add sbp.tags tunable.
- Revive doorbell support. It's not enabled by default.
objects rather than synchronization objects. When a sync object is
signaled, only the first thread waiting on it is woken up, and then
it's automatically reset to the not-signaled state. When a
notification object is signaled, all threads waiting on it will
be woken up, and it remains in the signaled state until someone
resets it manually. We want the latter behavior for NDIS events.
- In kern_ndis.c:ndis_convert_res(), we have to create a temporary
copy of the list returned by BUS_GET_RESOURCE_LIST(). When the PCI
bus code probes resources for a given device, it enters them into
a singly linked list, head first. The result is that traversing
this list gives you the resources in reverse order. This means when
we create the Windows resource list, it will be in reverse order too.
Unfortunately, this can hose drivers for devices with multiple I/O
ranges of the same type, like, say, two memory mapped I/O regions (one
for registers, one to map the NVRAM/bootrom/whatever). Some drivers
test the range size to figure out which region is which, but others
just assume that the resources will be listed in ascending order from
lowest numbered BAR to highest. Reversing the order means such drivers
will choose the wrong resource as their I/O register range.
Since we can't traverse the resource SLIST backwards, we have to
make a temporary copy of the list in the right order and then build
the Windows resource list from that. I suppose we could just fix
the PCI bus code to use a TAILQ instead, but then I'd have to track
down all the consumers of the BUS_GET_RESOURCE_LIST() and fix them
too.
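A minimal sketch of the ordering problem and the copy-based fix described
above, using a hand-rolled singly linked list rather than the real
BUS_GET_RESOURCE_LIST() structures. The names res_entry, push_head() and
copy_reversed() are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of a bus resource list. */
struct res_entry {
	int bar;			/* BAR number, illustrative only */
	struct res_entry *next;
};

/* Head insertion: the last entry added ends up first in the list. */
void
push_head(struct res_entry **head, struct res_entry *e)
{
	e->next = *head;
	*head = e;
}

/*
 * Walk src front to back and head-insert a copy of each entry into a
 * new list built from the caller-supplied array. Because head insertion
 * reverses order, the copy comes out in the opposite order of src.
 */
struct res_entry *
copy_reversed(const struct res_entry *src, struct res_entry *copies,
    size_t ncopies)
{
	struct res_entry *head = NULL;
	size_t i;

	for (i = 0; src != NULL && i < ncopies; src = src->next, i++) {
		copies[i].bar = src->bar;
		push_head(&head, &copies[i]);
	}
	return (head);
}
```

Probing BARs 0, 1, 2 with push_head() yields a list that reads 2, 1, 0;
running that through copy_reversed() gives back 0, 1, 2 without touching
the original list or its other consumers.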
Compute the payload checksum for a locally originated IP multicast where
God intended, in ip_mloopback(), rather than doing it in ip_output() and
only when multicast router is active. This is more correct as we do not
fool ip_input() that the packet has the correct payload checksum when in
fact it does not (when multicast router is inactive). This is also more
efficient if we don't join the multicast group we send to, thus allowing
the hardware to checksum the payload.
which pulls a job off a thread work queue (assuming it hasn't run yet).
This is needed for KeRemoveQueueDpc().
- In subr_ntoskrnl.c, implement KeInsertQueueDpc() and KeRemoveQueueDpc(),
to go with KeInitializeDpc() to round out the API. Also change the
KeTimer implementation to use this API instead of the private
timer callout scheduler. Functionality of the timer API remains
unchanged, but we get a couple new Windows kernel API routines and
more closely imitate the way things work in Windows. (As of yet
I haven't encountered any drivers that use KeInsertQueueDpc() or
KeRemoveQueueDpc(), but it doesn't hurt to have them.)
Report the %ecx bits in cpuid function 1. This is a hack.
When reporting AMD Features, only mask off the common bits. Otherwise
the SEP bit masks off SYSCALL etc in the report.
variable length, so we should not be trying to copy it into a fixed
length buffer, especially one on the stack. malloc() a buffer of the
right size and return a pointer to that instead.
Fixes a crash I discovered when testing with a Cisco AP in infrastructure
mode, which returns several information elements that make the
ndis_wlan_bssid_ex structure larger than expected.
Because xfer->send.payload is a pointer to the buffer, '&' shouldn't be there.
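A simplified sketch of why the extra '&' is wrong when the field is
already a pointer. The sketch_xfer structure and fill_payload() are
hypothetical; the real xfer layout and the call that was fixed are not
shown in this log.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: payload points at a separately allocated buffer. */
struct sketch_xfer {
	unsigned char *payload;
};

void
fill_payload(struct sketch_xfer *x, const void *src, size_t len)
{
	/* Correct: the pointer value is the destination buffer. */
	memcpy(x->payload, src, len);

	/*
	 * Passing &x->payload instead would hand memcpy() the address of
	 * the pointer field itself, overwriting the pointer (and whatever
	 * follows it) rather than filling the buffer.
	 */
}
```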
Submitted by: John Weisgerber <weisgerberj@gsilumonics.com>
PR: misc/64623
(1 << 24) - 2 instead of 1 << 24, which it was obviously intended to
be). This fixes SBus isp(4)s on sparc64 machines.
Report and testing: Marius Strobl <marius@alchemy.franken.de>
iommu_dvma_vallocseg(), which I botched in r1.32. This bug could
cause an endless loop when a map was loaded and DVMA was scarce,
or that map had a stringent alignment or boundary.
Report and additional testing: Marius Strobl <marius@alchemy.franken.de>
in cpu_fork(). This prevents the stack tracer from running past the
end of the stack (only the pc is checked in that case), which became
fatal when db_print_backtrace() was introduced and called outside
of ddb.
Additional testing: kris
caused hangs on SMP systems under load. My theory was that an interrupted
thread was migrating and returning to PAL on a different CPU and that that
caused the hangs. To prevent this, I used the recently added sched_pin()
API to pin the interrupted thread to the CPU that received the interrupt
across ithread_schedule() to prevent migration. This seems to have fixed
the hangs based on tests by several folks on the alpha@ list.
Tested by: wilko, tisco, several others on alpha@
(NIC would claim to establish a link with an ad-hoc net but it couldn't
send/receive packets). It turns out that every time the checkforhang
handler was called by ndis_ticktask(), the driver would generate a new
media connect event. The NDIS spec says the checkforhang handler is
called "approximately every 2 seconds" but using exactly 2 seconds seems
too fast. Using 3 seconds makes it happy again, so we'll go with that
for now.
extra entry for if_ndis_pci.c that depends on cardbus, just to cover
all the bases. (I don't think you can have cardbus without PCI, but
just in case...)
- Add gre_mtx to protect global softc list.
- Hold gre_mtx over various list operations (insert, delete).
- Centralize if_gre interface teardown in gre_destroy(), and call this
from modevent unload and gre_clone_destroy().
- Export gre_mtx to ip_gre.c, which walks the gre list to look up gre
interfaces during encapsulation. Add a wonking comment on how we need
some sort of drain/reference count mechanism to keep gre references
alive while in use and simultaneous destroy.
This commit does not lock down softc data, which follows in a future
commit.
- Add gif_mtx, which protects globals.
- Hold gif_mtx around manipulation of gif_softc_list.
- Abstract gif destruction code into gif_destroy(), which tears down
a softc after it's been removed from the global list by either module
unload or clone destroy.
- Lock gif_called, even though we know gif_called is broken with reentrant
network processing.
- Document an event ordering problem in gif_set_tunnel() that will need
to be fixed.
gif_softc fields not locked down in this commit.
processing with gif interfaces, to a global variable named "gif_called".
Add an annotation that this approach will not work with a reentrant
network stack, and that we should instead use packet tags to detect
excessive recursive processing.
implementation could be characterized as a hybrid of the amd64 and i386
implementations. Specifically, the direct virtual-to-physical mapping is
used if possible and sf_buf_alloc() is used if the direct map cannot be.
to other files in netatalk:
Log:
Since I have my hands all over netatalk adding locking and restructuring
it, cinch the file's style closer to style(9) with regard to parentheses:
s/( /(/g
s/ )/)/g
s/return(/return (/g
s/return 0/return (0)/
s/return 1/return (1)/
when it associates with a net. Because FreeBSD's kstack size is only
2 pages by default, this blows the stack and causes a double fault.
To deal with this, we now create all our kthreads with 8 stack pages.
Also, we now run all timer callouts in the ndis swi thread (since
they would otherwise run in the clock ithread, whose stack is too
small). It happens that the alloca() in this case was occurring within
the interrupt handler, which was already running in the ndis swi
thread, but I want to deal with the callouts too just to be extra
safe.
NOTE: this will only work if you update vm_machdep.c with the change
I just committed. If you don't include this fix, setting the number
of stack pages with kthread_create() has essentially no effect.
with more than the normal amount of stack pages, however the stack
pointer always wound up being initialized using KSTACK_PAGES. It
should be using td->td_kstack_pages instead. This means that although
the vm subsystem would give you all the stack pages you asked for,
%esp would always be initialized as if you had just 2 pages, and
the rest would go to waste.
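The effect can be sketched with plain address arithmetic. This is a
simplified model, not the vm_machdep.c code: it ignores the pcb and any
other space reserved at the top of the stack, and SK_PAGE_SIZE,
SK_KSTACK_PAGES and both function names are assumptions made for the
example.

```c
#include <stdint.h>

#define SK_PAGE_SIZE	4096u	/* assumed page size */
#define SK_KSTACK_PAGES	2u	/* default stack size in pages, per the text */

/* Buggy form: ignores how many pages the thread was actually given. */
uintptr_t
stack_top_buggy(uintptr_t base, unsigned pages)
{
	(void)pages;
	return (base + SK_KSTACK_PAGES * SK_PAGE_SIZE);
}

/* Fixed form: derives the top of the stack from the per-thread count. */
uintptr_t
stack_top_fixed(uintptr_t base, unsigned pages)
{
	return (base + (uintptr_t)pages * SK_PAGE_SIZE);
}
```

For a thread created with 8 pages, the two results differ by 6 pages,
which is exactly the memory that was allocated but never reachable from
the initial stack pointer.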
I wanted to use the 'give me more stack pages' feature of kthread_create()
because the Intel 2200BG NDIS driver does an alloca() of about 5000 bytes,
which wrecks the stack with the default 2 page size, and I was baffled
that no matter how much code I shoved into thread contexts with
allegedly larger stacks, the thing would still crash unless I changed
KSTACK_PAGES.
Note: this bug is present in _ALL_ arches at this point. Peter has
promised to merge this fix into all of them.
instead of bus_alloc_resource_any() to restore source compatibility
with 5.2-REL and 5.2.1-REL systems. bus_alloc_resource_any() doesn't
really do anything besides hide some of bus_alloc_resource()'s arguments
from us, and in my opinion this isn't worth breaking backwards
compatibility for people who want to use the NDISulator code on 5.2.x.
activation (i.e., applications are using libpthread). This is because
SCHED_ULE sometimes puts P_SA processes into ksq_next unnecessarily,
which doesn't give a fair amount of CPU time to processes which are
using scheduler-activation-based threads when other (semi-)CPU-intensive,
non-P_SA processes are running.
Further work will no doubt be done by jeffr at a later date.
Submitted by: Taku YAMAMOTO <taku@cent.saitama-u.ac.jp>
Reviewed by: rwatson, freebsd-current@
distinguish between debugger inserted breakpoints and fixed
breakpoints. While here, make sure the break instruction never
ends up in the last slot of a bundle by forcing it to be an
M-unit instruction. This makes it easier for us to skip over
it.
are actually layered on top of the KeTimer API in subr_ntoskrnl.c, just
as it is in Windows. This reduces code duplication and more closely
imitates the way things are done in Windows.
- Modify ndis_encode_parm() to deal with the case where we have
a registry key expressed as a hex value ("0x1") which is being
read via NdisReadConfiguration() as an int. Previously, we tried
to decode things like "0x1" with strtol() using a base of 10, which
would always yield 0. This is what was causing problems with the
Intel 2200BG Centrino 802.11g driver: the .inf file that comes
with it has a key called RadioEnable with a value of 0x1. We
incorrectly decoded this value to '0' when it was queried, hence
the driver thought we wanted the radio turned off.
- In if_ndis.c, most drivers don't accept NDIS_80211_AUTHMODE_AUTO,
but NDIS_80211_AUTHMODE_SHARED may not be right in some cases,
so for now always use NDIS_80211_AUTHMODE_OPEN.
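The RadioEnable decoding problem above is easy to reproduce in userland.
The sketch below only shows standard strtol() behavior; the helper names
are hypothetical, and using base 0 is one possible way to accept both
forms, not necessarily what ndis_encode_parm() does.

```c
#include <stdlib.h>

/* Base 10: parsing "0x1" stops at the 'x', so the result is 0. */
long
decode_base10(const char *s)
{
	return (strtol(s, NULL, 10));
}

/* Base 0: strtol() picks the base from the prefix, so "0x1" is hex. */
long
decode_auto(const char *s)
{
	return (strtol(s, NULL, 0));
}
```

One caveat with base 0 is that a leading "0" without "x" selects octal,
so a value such as "010" decodes to 8 rather than 10.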
NOTE: There is still one problem with the Intel 2200BG driver: it
happens that the kernel stack in Windows is larger than the kernel
stack in FreeBSD. The 2200BG driver sometimes eats up more than 2
pages of stack space, which can lead to a double fault panic.
For the moment, I got things to work by adding the following to
my kernel config file:
options KSTACK_PAGES=8
I'm pretty sure 8 is too big; I just picked this value out of a hat
as a test, and it happened to work, so I left it. 4 pages might be
enough. Unfortunately, I don't think you can dynamically give a
thread a larger stack, so I'm not sure how to handle this short of
putting a note in the man page about it and dealing with the flood
of mail from people who never read man pages.
only done minimal testing on one of these cards and the firmware folks
have been extremely uncooperative in answering my questions about them, so
hopefully they will work ok for everyone.
This completes the effort to handle dependent functions, which are used
in some machines for irq link resources. Also, clean up some nearby
comments while I'm at it.
level of abstraction for any and all CPU mask and CPU bitmap variables
so that platforms have the ability to break free from the hard limit
of 32 CPUs, simply because we don't have more bits in an u_int. Note
that the type is not supposed to solve massive parallelism, where
the number of CPUs can be larger than the width of the widest integral
type. As such, cpumask_t is not supposed to be a compound type. If
such would be necessary in the future, we can deal with the issues
then and there. For now, it can be assumed that the type is integral
and unsigned.
With this commit, all MD definitions start off as u_int. This allows
us to phase-in cpumask_t at our leisure without breaking anything.
Once cpumask_t is used consistently, platforms can switch to wider
(or smaller) types if such would be beneficial (or not; whatever :-)
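As a rough illustration of why an integral, unsigned mask type caps the
CPU count at its bit width, here is a userland sketch. The typedef
mirrors the u_int starting point described above; the helper functions
and the CPUMASK_NBITS macro are hypothetical and not part of the commit.

```c
#include <limits.h>

typedef unsigned int cpumask_t;	/* same width as the u_int starting point */

#define CPUMASK_NBITS	(sizeof(cpumask_t) * CHAR_BIT)

/* Return m with the bit for cpu set; out-of-range CPUs are ignored. */
cpumask_t
cpumask_set(cpumask_t m, unsigned cpu)
{
	if (cpu >= CPUMASK_NBITS)
		return (m);
	return (m | ((cpumask_t)1 << cpu));
}

/* Return 1 if the bit for cpu is set in m, 0 otherwise. */
int
cpumask_isset(cpumask_t m, unsigned cpu)
{
	if (cpu >= CPUMASK_NBITS)
		return (0);
	return ((int)((m >> cpu) & 1));
}
```

Code written against the typedef and the width macro keeps working if a
platform later switches cpumask_t to a wider unsigned type; code that
hard-codes int or 32 does not.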
Compile-tested on: i386
This change has not been tested.
This change was triggered by a gcc(1) warning on ia64 at -O2. The
variable v was not used after being computed, which resulted in enough
dead code elimination (DCE) to confuse the compiler and emit a bogus
warning about the use of the variable i without prior definition. The
variable i is the loop variable.
Submitted by: des
Responsibility: marcel
for uart(4) to figure out which device to use as console. Use this file
to define hw.uart.console instead so that we don't have to put it in
the default loader.conf, which makes it hard to override.
to select a serial console and debug port (resp). On ia64 these replace
the use of hints completely and take precedence over hints on alpha,
amd64 and i386. On sparc64 these variables are not yet recognised.
The reasons for introducing these variables are:
1. Hints have side-effects. They reserve the unit number for use by
isa or acpi devices and therefore cannot be used to select a pci
device. Also, the use of a unit number to select a device prior
to bus enumeration is nonsense. The new variables have no side-
effects and are not based on unit numbers.
2. Hints don't have the expressive power to allow the sysadmin to
select UARTs that are not legacy PC devices and need the support
of compile-time constants to give the sysadmin some level of
flexibility.
The hw.uart.console and hw.uart.dbgport variables specify a list of
attributes. An attribute is a tag-value pair, separated by a colon.
Attributes are separated by a comma. Where possible, tags are the
same as those in /etc/remote (only br and pa in practice). Details
can be found in the manpage (not part of this commit).
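To make the attribute syntax concrete, here is a small userland sketch
that looks up one tag in such a list. It is not the uart(4) parser; the
function name attr_lookup() and the sample string "br:115200,pa:none"
are assumptions made for illustration, using only the br and pa tags
mentioned above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Find "tag:value" in a comma-separated attribute list and copy the
 * value into val. Returns 1 on success, 0 if the tag is absent or the
 * value does not fit in vallen bytes (including the terminator).
 */
int
attr_lookup(const char *spec, const char *tag, char *val, size_t vallen)
{
	size_t taglen = strlen(tag);

	while (*spec != '\0') {
		const char *end = strchr(spec, ',');
		size_t len = (end != NULL) ? (size_t)(end - spec) : strlen(spec);

		/* An attribute matches if it starts with "tag:". */
		if (len > taglen && strncmp(spec, tag, taglen) == 0 &&
		    spec[taglen] == ':') {
			size_t n = len - taglen - 1;

			if (n >= vallen)
				return (0);
			memcpy(val, spec + taglen + 1, n);
			val[n] = '\0';
			return (1);
		}
		if (end == NULL)
			break;
		spec = end + 1;
	}
	return (0);
}
```

With the sample string, looking up "br" yields "115200" and looking up
"pa" yields "none"; a tag that is not present returns 0.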
Not tested on: amd64, pc98
from ddp_usrreq.c. Functions moved are:
at_pcballoc()
at_pcbconnect()
at_pcbdetach()
at_pcbdisconnect()
at_pcbsetaddr()
at_sockaddr()
Also moved are ddp_ports and ddpcb, global variables associated with DDP
pcbs. This makes PCB implementation more parallel to inet, inet6, and
ipx.
device, the device is probed multiple times (so each device is
detected N times after unloading/loading the module N-1 times).
The real fix is (quote Doug and Warner):
> : In an ideal world, there should be some kind of BUS_UNIDENTIFY method
> : which a driver could use to delete the devices it created in
> : BUS_IDENTIFY.
>
> Or the bus would have a driver deleted routine that got called and it
> would remove all instances of the devclass attached to it.
Reviewed by: Doug Rabson & Warner Losh
to mmap it PROT_EXEC. This also depends on the architecture, as some
architectures (e.g. i386) do not distinguish between read and exec pages.
Inspired by: http://linux.bkbits.net:8080/linux-2.4/cset@1.1267.1.85
Reviewed by: alc
mappings required by mdstart_swap(). On i386, if the ephemeral mapping
is already in the sf_buf mapping cache, a swap-backed md performs
similarly to a malloc-backed md. Even if the ephemeral mapping is not
cached, this implementation is still faster. On 64-bit platforms, this
change has the effect of using the direct virtual-to-physical mapping,
avoiding ephemeral mapping overheads, such as TLB shootdowns on SMPs.
On a 2.4GHz, 400MHz FSB P4 Xeon configured with 64K sf_bufs and
"mdmfs -S -o async -s 128m md /mnt"
before:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.430923 secs (311465697 bytes/sec)
after with cold sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.367948 secs (364773576 bytes/sec)
after with warm sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.252826 secs (530870010 bytes/sec)
malloc-backed md:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.253126 secs (530240978 bytes/sec)
I've added -fno-strict-aliasing for now so we can ease into this.
I wanted to shoot for -O3, but the inlining caused problems due to GCC's
size heuristics; so also add -frename-registers, which is one of the things
-O3 would have given us.
entry size and the ELF version. Also, avoid a potential integer
overflow when determining whether the ELF header fits entirely
within the first page.
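The kind of overflow-safe bounds check meant here can be sketched as
follows. This is not the imgact_elf.c code; the function name, the
parameter set and SKETCH_PAGE_SIZE are assumptions for the example.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumed page size */

/*
 * Return 1 if a table of phnum entries of phentsize bytes, starting at
 * byte offset phoff, lies entirely within the first page.
 */
int
phdrs_fit_in_page(uint64_t phoff, uint16_t phnum, uint16_t phentsize)
{
	/* Two 16-bit factors cannot overflow a 64-bit product. */
	uint64_t tablesize = (uint64_t)phnum * phentsize;

	/* Check the offset first so the subtraction below cannot wrap. */
	if (phoff > SKETCH_PAGE_SIZE)
		return (0);
	return (tablesize <= SKETCH_PAGE_SIZE - phoff);
}
```

The naive form, phoff + tablesize <= page size, can wrap around for a
huge phoff and wrongly pass; comparing against the remaining space
avoids adding untrusted values together.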
Reviewed by: jdp
A panic when attempting to execute an ELF binary with a bogus program
header table entry size was
Reported by: Christer Öberg <christer.oberg@texonet.com>
the tap driver, even with Giant over the cdev operation vector, due to
a non-atomic test-and-set of the si_drv1 field in the dev_t. This bug
exists with Giant under high memory pressure, as malloc() may sleep
in tapcreate(), but is less likely to occur. The resolution will
probably be to cover si_drv1 using the global tapmtx since no softc is
available, but I need to think about this problem more generally
across a range of drivers using si_drv1 in combination with SI_CHEAPCLONE
to defer expensive allocation to open().
Correct what appears to be a bug in the original if_tap implementation,
in which tapopen() will panic if a tap device instance is opened more
than once due to an incorrect assertion -- only triggered if INVARIANTS
is compiled in (i.e., when built into a kernel). Return EBUSY instead.
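For illustration only, the "second open returns EBUSY" behavior can be
modeled in userland with a C11 atomic flag, which also shows what an
atomic test-and-set looks like. The kernel driver uses mutexes rather
than C11 atomics, and sketch_dev, sketch_open() and sketch_close() are
hypothetical names.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical per-device state: one flag recording "already open". */
struct sketch_dev {
	atomic_flag open;
};

/* Succeeds for the first opener; later openers get EBUSY, not a panic. */
int
sketch_open(struct sketch_dev *d)
{
	/* Test and set in one atomic step, so two openers cannot both win. */
	if (atomic_flag_test_and_set(&d->open))
		return (EBUSY);
	return (0);
}

/* Clear the flag so the device can be opened again. */
void
sketch_close(struct sketch_dev *d)
{
	atomic_flag_clear(&d->open);
}
```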
Expand mtx_lock() coverage using tp->tap_mtx to include tp->ether_addr.
use sf_buf_free() instead of sf_buf_mext() to consolidate all actions
that require the page queues lock in one critical section. While I'm
here remove unnecessary splvm() and splx() calls.
clip/destroy the dB value contained in the wi(4)'s receive frames,
it doesn't match with the flag set in the radiotap header
(unperturbed dB versus dBm).
Also set HOOK_HACK to true (remove the related #ifdef's) as we have the
hooks in the kernel; this was missed during the merge from the port.
Noticed by: Amir S. (for the HOOK_HACK part)
Approved by: bms(mentor)
options, status pointer and rusage pointer as arguments. It is up to
the caller to copyout the status and rusage to userland if needed. This
lets us axe the 'compat' argument and hide all that functionality in
owait(), by the way. This also cleans up some locking in kern_wait()
since it no longer has to drop locks around copyout() since all the
copyout()'s are deferred.
- Convert owait(), wait4(), and the various ABI compat wait() syscalls to
use kern_wait() rather than wait1() or wait4(). This removes a bit
more stackgap usage.
Tested on: i386
Compiled on: i386, alpha, amd64
Without this fix it is possible to cheat policies like:
- sysctl security.bsd.see_other_[gu]ids=0,
- mac_seeotheruids(4),
- jail(2)
and get the full process list with their arguments.
This problem has existed since revision 1.62 of kern_proc.c, where it was
introduced.
Reviewed by: nectar, rwatson.
as the process that opens tun_softc can exit before the file
descriptor is closed.
Taiwan experience provided by: keichii
Crashing breakers provided by: Chia-liang Kao <clkao@clkao.org>
(tap_pid, tap_flags). if_tap should now be entirely MPSAFE.
Committed from: Bamboo house by ocean in Taiwan
Tropical paradise provided by: Chia-liang Kao <clkao@clkao.org>
group block locked. If filesystem has any active snapshots, bawrite
can come back trying to allocate new snapshot data block from the same
cylinder group and cause panic due to recursive lock attempt.
PR: 64206
Reviewed by: mckusick
Tested by: pjd
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose ephemeral map cache.
might be enqueued on a sleep queue but not be asleep when the timeout fires
if it is blocked on a lock trying to check for pending signals before going
to sleep. In the case of fixing up the TDF_TIMEOUT race, however, the
thread must be marked asleep.
Reported by: kan (the bogus one)
mini-layer. I don't have time to bring it forward into the GEOM world, and
no one else has stepped forward to claim it. It'll be in the Attic for safe
keeping for now.
Use kern_open() to implement creat() rather than taking the long route
through open(). Mark creat as MPSAFE.
While I'm at it, mark nosys() (syscall 0) as MPSAFE, for all the
difference it will make.
set it to avoid the need for a bunch of code that tests whether or
not the lock member is set to REQ_WIRED in order to determine which
length member should be used.
Fix another bug in the oldlen return value code.
Fix a potential wired memory leak if a sysctl handler uses
sysctl_wire_old_buffer() and returns an EAGAIN error to trigger
a retry.
If vslock() returns ENOMEM, sysctl_wire_old_buffer() should set
wiredlen to zero and return zero (success) so that the handler will
operate according to sysctl(3):
The size of the buffer is given by the location specified by
oldlenp before the call, and that location gives the amount
of data copied after a successful call and after a call that
returns with the error code ENOMEM.
The handler will return an ENOMEM error because the zero length
buffer will overflow.
ptrace_set_pc(), and cpu_ptrace() so that those functions are free to
acquire Giant, sleep, etc. We already do a PHOLD/PRELE around them so
that it is safe to sleep inside of these routines if necessary. This
allows ptrace() to be marked MP safe again as it no longer triggers lock
order reversals on Alpha.
Tested by: wilko
snprintf() and vsnprintf() in FreeBSD kernel land).
This is needed by the Intel Centrino 2200BG driver. Unfortunately, this
driver still doesn't work right with Project Evil even with this tweak,
but I'm unable to diagnose the problem since I don't have access to a
sample card.