- no longer serialize on Giant for thread_single*() and family in fork,
exit and exec
- thread_wait() is mpsafe, assert no Giant
- reduce scope of Giant in exit to not cover thread_wait and just do
vm_waitproc().
- assert that thread_single() family are not called with Giant
- remove the DROP/PICKUP_GIANT macros from thread_single() family
- assert that thread_suspend_check() s not called with Giant
- remove manual drop_giant hack in thread_suspend_check since we know it
isn't held.
- remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family
- mark kse_create() mpsafe
all the ancient Intel/VIA/SIS/etc chipsets on amd64 systems. Even the
newer intel stuff won't need this since we use acpi by default and we
don't have all their magic programming information. Just use a generic
"Host to PCI bridge" name if we ever hit this code.
attach/detach time.
Assigning the default behaviour to this particular device is
incorrect, corrupting the video BIOS aperture, and breaking
VESA support in the kernel and XFree86.
Reviewed By: dfr
MFC after: 1 week
PR: kern/62906
we always grab Giant, even if we're actually only polling objects that
don't require giant. Once socket locking is merged, there will be
strong motivation to fix this.
While there, remove (caddr_t) casting of ethernet addresses, which
among other things discards the qualifier. This makes it clear that
atmulticastaddr does not require synchronization.
the last chunk are misaligned relative to a MAXBSIZE byte boundary.
vn_rdwr_inchunks() is used mainly for elf core dumps, and elf sections
are usually perfectly misaligned relative to MAXBSIZE, and chunking
prevents the file system from doing much realigning.
This gives a surprisingly large speedup for core dumps -- from 50 to
13 seconds for a 512MB core dump here. The pessimization was mostly
from an interaction of the misalignment with IO_DIRECT. It increased
the number of i/o's for each chunk by a factor of 5 (3 writes and 2
read-before-writes instead of 1 write).
to build the kernel. It doesn't affect the operation if gcc.
Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.
Additional changes:
- in_cksum.[ch]:
* use a generic C version instead of the assembly version in the !gcc
case (ASM code breaks with the optimizations icc does)
-> no bad checksums with an icc compiled kernel
Help from: andre, grehan, das
Stolen from: alpha version via ppc version
The entire checksum code should IMHO be replaced with the DragonFly
version (because it isn't guaranteed future revisions of gcc will
include similar optimizations) as in:
---snip---
Revision Changes Path
1.12 +1 -0 src/sys/conf/files.i386
1.4 +142 -558 src/sys/i386/i386/in_cksum.c
1.5 +33 -69 src/sys/i386/include/in_cksum.h
1.5 +2 -0 src/sys/netinet/igmp.c
1.6 +0 -1 src/sys/netinet/in.h
1.6 +2 -0 src/sys/netinet/ip_icmp.c
1.4 +3 -4 src/contrib/ipfilter/ip_compat.h
1.3 +1 -2 src/sbin/natd/icmp.c
1.4 +0 -1 src/sbin/natd/natd.c
1.48 +1 -0 src/sys/conf/files
1.2 +0 -1 src/sys/conf/files.amd64
1.13 +0 -1 src/sys/conf/files.i386
1.5 +0 -1 src/sys/conf/files.pc98
1.7 +1 -1 src/sys/contrib/ipfilter/netinet/fil.c
1.10 +2 -3 src/sys/contrib/ipfilter/netinet/ip_compat.h
1.10 +1 -1 src/sys/contrib/ipfilter/netinet/ip_fil.c
1.7 +1 -1 src/sys/dev/netif/txp/if_txp.c
1.7 +1 -1 src/sys/net/ip_mroute/ip_mroute.c
1.7 +1 -2 src/sys/net/ipfw/ip_fw2.c
1.6 +1 -2 src/sys/netinet/igmp.c
1.4 +158 -116 src/sys/netinet/in_cksum.c
1.6 +1 -1 src/sys/netinet/ip_gre.c
1.7 +1 -2 src/sys/netinet/ip_icmp.c
1.10 +1 -1 src/sys/netinet/ip_input.c
1.10 +1 -2 src/sys/netinet/ip_output.c
1.13 +1 -2 src/sys/netinet/tcp_input.c
1.9 +1 -2 src/sys/netinet/tcp_output.c
1.10 +1 -1 src/sys/netinet/tcp_subr.c
1.10 +1 -1 src/sys/netinet/tcp_syncache.c
1.9 +1 -2 src/sys/netinet/udp_usrreq.c
1.5 +1 -2 src/sys/netinet6/ipsec.c
1.5 +1 -2 src/sys/netproto/ipsec/ipsec.c
1.5 +1 -1 src/sys/netproto/ipsec/ipsec_input.c
1.4 +1 -2 src/sys/netproto/ipsec/ipsec_output.c
and finally remove
sys/i386/i386 in_cksum.c
sys/i386/include in_cksum.h
---snip---
- endian.h:
* DTRT in C++ mode
- quad.h:
* we don't use gcc v1 anymore, remove support for it
Suggested by: bde (long ago)
- assym.h:
* avoid zero-length arrays (remove dependency on a gcc specific
feature)
This change changes the contents of the object file, but as it's
only used to generate some values for a header, and the generator
knows how to handle this, there's no impact in the gcc case.
Explained by: bde
Submitted by: Marius Strobl <marius@alchemy.franken.de>
- aicasm.c:
* minor change to teach it about the way icc spells "-nostdinc"
Not approved by: gibbs (no reply to my mail)
- bump __FreeBSD_version (lang/icc needs to know about the changes)
Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.
Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.
Reviewed by: -arch
Submitted by: netchild
Intel C/C++ compiler (lang/icc) to build the kernel.
The icc CPUTYPE CFLAGS use icc v7 syntax, icc v8 moans about them, but
doesn't abort. They also produce CPU specific code (new instructions
of the CPU, not only CPU specific scheduling), so if you get coredumps
with signal 4 (SIGILL, illegal instruction) you've used the wrong
CPUTYPE.
Incarnations of this patch survive gcc compiles and my make universe.
I use it on my desktop.
To use it update share/mk, add
/usr/local/intel/compiler70/ia32/bin (icc v7, works)
or
/usr/local/intel_cc_80/bin (icc v8, doesn't work)
to your PATH, make sure you have a new kernel compile directory
(e.g. MYKERNEL_icc) and run
CFLAGS="-O2 -ip" CC=icc make depend
CFLAGS="-O2 -ip" CC=icc make
in it.
Don't compile with -ipo, the build infrastructure uses ld directly to
link the kernel and the modules, but -ipo needs the link step to be
performed with Intel's linker.
Problems with icc v8:
- panic: npx0 cannot be emulated on an SMP system
- UP: first start of /bin/sh results in a FP exception
Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.
Reviewed by: silence on -arch
Submitted by: netchild
Conforming POSIX application should do by disallowing the argv
argument to be NULL.
PR: kern/33738
Submitted by: Marc Olzheim, Serge van den Boom
OK'ed by: nectar
path to an absolute path without a host name. Previously, there was a
nasty POLA violation where a system would PXE boot until you added the
BOOTP option and then it would panic instead.
Reviewed by: tegge, Dirk-Willem van Gulik <dirkx at webweaving.org>
(a previous version)
Submitted by: tegge (getip function)
silences an annoying warning in getblk() when VMIO'ing on a directory
vnode, which can happen when vfs.vmiodirenable is 1.
Bring the warning message in line with reality at the same time.
Submitted by: hmp
were a rather overwhelming task. I soon learned that if you don't know
where you're going to store something, at least try to pile it next to
something slightly related in the hope that a pattern emerges.
Apply the same principle to the ffs/snapshot/softupdates code which have
leaked into specfs: Add yet a buf-quasi-method and call it from the
only two places I can see it can make a difference and implement the
magic in ffs_softdep.c where it belongs.
It's not pretty, but at least it's one less layer violated.
in the non-_KERNEL case. This "fixes" applications that include
this "kernel-only" header and also include <strings.h> (or get
<strings.h> via the default _BSD_VISIBLE pollution in <string.h>.
In C++ there was a fatal error: the declaration specifies C linkage
but the implementation gives C++ linkage. In C there was only a
static/extern mismatch if the headers were included in a certain order
order, and a partially redundant declaration for all include orders;
gcc emits incomplete or wrong diagnostics for these, but only for
compiling with -Wsystem-headers and certain other warning options, so
the problem was usually not seen for C.
Ports breakage reported by: kris
for Windows are deserialized miniports. Such drivers maintain their own
queues and do their own locking. This particular driver is not deserialized
though, and we need special support to handle it correctly.
Typically, in the ndis_rxeof() handler, we pass all incoming packets
directly to (*ifp->if_input)(). This in turn may cause another thread
to run and preempt us, and the packet may actually be processed and
then released before we even exit the ndis_rxeof() routine. The
problem with this is that releasing a packet calls the ndis_return_packet()
function, which hands the packet and its buffers back to the driver.
Calling ndis_return_packet() before ndis_rxeof() returns will screw
up the driver's internal queues since, not being deserialized,
it does no locking.
To avoid this problem, if we detect a serialized driver (by checking
the attribute flags passed to NdisSetAttributesEx(), we use an alternate
ndis_rxeof() handler, ndis_rxeof_serial(), which puts the call to
(*ifp->if_input)() on the NDIS SWI work queue. This guarantees the
packet won't be processed until after ndis_rxeof_serial() returns.
Note that another approach is to always copy the packet data into
another mbuf and just let the driver retain ownership of the ndis_packet
structure (ndis_return_packet() never needs to be called in this
case). I'm not sure which method is faster.
based on the Madison core and targeting the low end of the spectrum.
Its clock frequency is 1Ghz, whereas Madison starts at 1.3Ghz. Since
the CPUID information is the same for Madison and Deerfield, we use
the clock frequency to identify the processor.
Supposedly the Deerfield only uses 62W, which seems to be less than
modern Xeon processors (about 70W) and about half what a Madison would
need.
On vnode backed md(4) devices over a certain, currently undetermined
size relative to the buffer cache our "lemming-syncer" can provoke
a buffer starvation which puts the md thread to sleep on wdrain.
This generally tends to grind the entire system to a stop because the
event that is supposed to wake up the thread will not happen until a fair
bit of the piled up I/O requests in the system finish, and since a lot
of those are on a md(4) vnode backed device which is currently waiting
on wdrain until a fair amount of the piled up ... you get the picture.
The cure is to issue all VOP_WRITES on the vnode backing the device
with IO_SYNC.
In addition to more closely emulating a real disk device with a
non-lying write-cache, this makes the writes exempt from rate-limited
(there to avoid starving the buffer cache) and consequently prevents
the deadlock.
Unfortunately performance takes a hit.
Add "async" option to give people who know what they are doing the
old behaviour.
only. This is a MAJOR incompatible change for the sparc64 platform,
but will not effect FreeBSD on other architectures.
Reviewed by: imp for UPDATING, freebsd-sparc for the change itself.
This may not be a generally valid configuration, but neither is relying
on the PCI clock to be stable.
The only currently known and supported hardware is the VPN14x1 from
Soekris, and since it has external clock, we fail safe(r) by using
it.
Unfortunately there is no way to probe this reliably.
Retire g_sanity() and corresponding debugflag (0x8)
Retire g_{stall,release}_events().
Under #ifdef DIAGNOSTIC:
Make g_valid_obj() an official function and have it return an an
non-zero integer which indicates the kind of object when found.
Implement G_VALID_{CLASS,GEOM,CONSUMER,PROVIDER}() macros based
on g_valid_obj().
Sprinkle calls to these macros liberally over the infrastructure.
Always check that we do not free a live object.
that I added recently:
- When a periodic timer fires, it's automatically re-armed. We must
make sure to re-arm the timer _before_ invoking any caller-supplied
defered procedure call: the DPC may choose to call KeCancelTimer(),
and re-arming the timer after the DPC un-does the effect of the
cancel.
- Fix similar issue with periodic timers in subr_ndis.c.
- When calling KeSetTimer() or KeSetTimerEx(), if the timer is
already pending, untimeout() it first before timeout()ing
it again.
- The old Atheros driver for the 5211 seems to use KeSetTimerEx()
incorrectly, or at the very least in a very strange way that
doesn't quite follow the Microsoft documentation. In one case,
it calls KeSetTimerEx() with a duetime of 0 and a period of 5000.
The Microsoft documentation says that negative duetime values
are relative to the current time and positive values are absolute.
But it doesn't say what's supposed to happen with positive values
that less than the current time, i.e. absolute values that are
in the past.
Lacking any further information, I have decided that timers with
positive duetimes that are in the past should fire right away (or
in our case, after only 1 tick). This also takes care of the other
strange usage in the Atheros driver, where the duetime is
specified as 500000 and the period is 50. I think someone may
have meant to use -500000 and misinterpreted the documentation.
- Also modified KeWaitForSingleObject() and KeWaitForMultipleObjects()
to make the same duetime adjustment, since they have the same rules
regarding timeout values.
- Cosmetic: change name of 'timeout' variable in KeWaitForSingleObject()
and KeWaitForMultipleObjects() to 'duetime' to avoid senseless
(though harmless) overlap with timeout() function name.
With these fixes, I can get the 5211 card to associate properly with
my adhoc net using driver AR5211.SYS version 2.4.1.6.
a static const global variable in ah_core.c. This makes it more clear
that this array does not require synchronization, as well as
synchronizing the layout to the ESP algorithm list. This is the
version of my patch that Itojun committed to the KAME tree.
Obtained from: me, via KAME
pmap. For the kernel pmap, Giant is not required. In general, for
other pmaps, Giant is required by i386's pmap_pte() implementation.
Specifically, the use of PMAP2/PADDR2 is synchronized by Giant.
Note: In principle, updates to the kernel pmap's wired count could be
lost without Giant. However, in practice, we never use the kernel
pmap's wired count. This will be resolved when pmap locking appears.
- With the above change, cpu_thread_clean() and uma_large_free() need
not acquire Giant. (The first case is simply the revival of
i386/i386/vm_machdep.c's revision 1.226 by peter.)
- Add encapmtx to protect ip_encap.c global variables (encapsulation
list).
- Unifdef #ifdef 0 pieces of encap_init() which was (and now really
is) basically a no-op.
- Lock encapmtx when walking encaptab, modifying it, comparing
entries, etc.
- Remove spl's.
Note that currently there's no facilite to make sure outstanding
use of encapsulation methods on a table entry have drained bfore
we allow a table entry to be removed. As such, it's currently the
caller's responsibility to make sure that draining takes place.
Reviewed by: mlaier
stf_destroy() to handle the common softc destruction path for the
two destruction sources: interface cloning destroy, and module
unload.
NOTE: sc_ro, the cached route for stf conversion, is not synchronized
against concurrent access in this change, that will follow in a future
change.
Reviewed by: pjd
Push if_faith softc destruction logic into faith_destroy() so that
it can be called after softc list removal in both the clone destroy
and module unload paths.
ndis_probe_pci() doesn't contain an entry for an IRQ resource, try to
force one to be routed to us anyway by adding an extra call to
bus_alloc_resource(). If this fails, then we have to abort the attach.
Patch provided by jhb, tweaked by me.
really sure why we have a softc list for if_loop, given that it
can't be unloaded, but that's an issue to revisit in the future as
corrupting the softc list would still cause panics.
Reviewed by: benno
Since there are two destroy paths for if_disc interfaces --
module unload and cloan interface destroy, create a new utility
function disc_destroy(), which is callded on a softc after it
has been removed from the global softc list; the cloaner and
module unload entry paths will both remove it before calling
disc_destroy().
Reviewed by: pjd
prevented newfs to work on volumes that are larger than 1TB.
PR: 63577
Submitted by: Masaki Takakashi <mtakahashi@se.gtd.cosmo.co.jp>
Approved by: grog (mentor), bde
least common multiple of all disks sector sizes.
This will allow to safely concatenate disks with different sector sizes.
- Mark unused function arguments.
- Other minor cleanups.
in the Memory Mapped Configuration Region (MMCR) to reset the CPU.
If CPU_ELAN is set, try this first to reset the CPU before the
traditional way.
Without this change, my Compulab board powers down on 'reset' instead
of rebooting.
This adds the former ports registered groups: proxy and authpf as well as
the proxy user. Make sure to run mergemaster -p in oder to complete make
installworld without errors.
This also provides the passive OS fingerprints from OpenBSD (pf.os) and an
example pf.conf.
For those who want to go without pf; it provides a NO_PF knob to make.conf.
__FreeBSD_version will be bumped soon to reflect this and to be able to
change ports accordingly.
Approved by: bms(mentor)
- security.bsd.hardlink_check_uid, when set, means, that unprivileged
users are not permitted to create hard links to files not
owned by them,
- security.bsd.hardlink_check_gid, when set, means, that unprivileged
users are not permitted to create hard links to files owned
by group they don't belong to.
OK'ed by: rwatson
ethernet (tested) and FDDI (not tested). The main use for this is on ADSL (or
other ATM) connections where bridged ethernet is used, PPPoE being a prime
example.
There is no manual page as yet, I will write one shortly.
Reviewed by: harti
that we (p1) are currently running, we hold a reference on p_textvp which
means the vnode cannot go away. p2 cannot run yet (and hence cannot exit)
so this should be safe to do at this point. As a bonus, it removes a
block of under-Giant code that was there to support the vref.
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_init2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.
if_ndis.c has been split into if_ndis_pci.c and if_ndis_pccard.c.
The ndiscvt(8) utility should be able to parse device info for PCMCIA
devices now. The ndis_alloc_amem() has moved from kern_ndis.c to
if_ndis_pccard.c so that kern_ndis.c no longer depends on pccard.
NOTE: this stuff is not guaranteed to work 100% correctly yet. So
far I have been able to load/init my PCMCIA Cisco Aironet 340 card,
but it crashes in the interrupt handler. The existing support for
PCI/cardbus devices should still work as before.
- Push Giant down a bit in coredump() and call coredump() with the proc
lock already held rather than unlocking it only to turn around and
relock it.
Requested by: peter
process group and session dereferences. Also, check that p_pgrp and
p_sesssion are NULL before dereferencing them.
- Push down Giant in fork1().
Requested by: peter
introduction of kern_mlock() and kern_munlock() in
src/sys/kern/kern_sysctl.c 1.150
src/sys/vm/vm_extern.h 1.69
src/sys/vm/vm_glue.c 1.190
src/sys/vm/vm_mmap.c 1.179
because different resource limits are appropriate for transient and
"permanent" page wiring requests.
Retain the kern_mlock() and kern_munlock() API in the revived
vslock() and vsunlock() functions.
Combine the best parts of each of the original sets of implementations
with further code cleanup. Make the mclock() and vslock()
implementations as similar as possible.
Retain the RLIMIT_MEMLOCK check in mlock(). Move the most strigent
test, which can return EAGAIN, last so that requests that have no
hope of ever being satisfied will not be retried unnecessarily.
Disable the test that can return EAGAIN in the vslock() implementation
because it will cause the sysctl code to wedge.
Tested by: Cy Schubert <Cy.Schubert AT komquats.com>
The previous logic meant that if a user sets it to a minimal cooling value
acpi_thermal will not use higher cooling levels. Reverse the logic so that
the user requesting a level (say, 2) also gets 0 - 1 also.
PR: kern/61592
Submitted by: Andrew Thompson <andy@fud.org.nz>
DIAGNOSTIC instead of INVARIANTS. INVARIANTS is intended for tests
that don't substantially change code flow or behavior (passive), but
this test required locking both the proc lock and scheduler lock
in order to execute. It also appears to be a very advisory diagnostic
as opposed to an invariant violation.
Following discussion with: bde
- completely unused things
- all of rev.1.102 (C++ support). <sys/cdefs.h> is included by the
prerequisite <sys/types.h>. __BEGIN_DECLS/__END_DECLS has no effect
(except possibly if undefined behaviour is invoked using a hack like
defining away __inline) since this header doesn't really support any
extern functions.
that this provokes. "Wherever possible" means "In the kernel OR NOT
C++" (implying C).
There are places where (void *) pointers are not valid, such as for
function pointers, but in the special case of (void *)0, agreement
settles on it being OK.
Most of the fixes were NULL where an integer zero was needed; many
of the fixes were NULL where ascii <nul> ('\0') was needed, and a
few were just "other".
Tested on: i386 sparc64