valid mask. Consequently, there is no need to perform a bit-wise and of
the page's dirty and valid masks in order to determine which parts of a
page are dirty and valid.
Eliminate an unnecessary #include.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.
Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
to save the selected source address rather than returning an
unreferenced copy to a pointer that might long be gone by the
time we use the pointer for anything meaningful.
Asked for by: rwatson
Reviewed by: rwatson
While general idea of patch was good, it was not working properly due the way
it was implemented. When we are using same timer interrupt for several of
hard/prof/stat purposes we should not send several IPIs same time to other
CPUs. Sending several IPIs same time leads to terrible accounting/profiling
results due to strong synchronization effect, when the second interrupt
handler accounts processing of the first one.
Interlink timer events in a such way, that no more then one IPI is sent for
any original timer interrupt.
The advantage of using a separate condvar is that we can just use
cv_signal(9) instead of cv_broadcast(9). It makes no sense to wake up
multiple threads. It also makes the TTY code easier to understand.
t_dcdwait sounds totally unrelated.
I suspect the usage of bgwait causes a lot of spurious wakeups when
threads are blocked in the background, because they will be woken up
each time a write() call is performed.
Also wakeup dcdwait when the TTY is abandoned.
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.
The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.
The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.
The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).
Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.
In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)
the PHYs as some PHY drivers use it (but probably shouldn't). How
gem(4) has worked with brgphy(4) on powerpc without this so far is
unclear to me.
- Introduce a dying flag which is set during detach and checked in
gem_ioctl() in order to prevent active BPF listeners to clear
promiscuous mode which may lead to the tick callout being restarted
which will trigger a panic once it's actually gone.
- In gem_stop() reset rather than just disable the transmitter and
receiver in order to ensure we're not unloading DMA maps still in
use by the hardware. [1]
- The blanking time is specified in PCI clocks so we should use twice
the value when operating at 66MHz.
- Spell some 2 as ETHER_ALIGN and a 19 as GEM_STATUS_TX_COMPLETION_SHFT
to make the actual intentions clear.
- As we don't unload the peak attempts counter ignore its overflow
interrupts.
- Remove a stale setting of a variable to GEM_TD_INTERRUPT_ME which
isn't used afterwards.
- For optimum performance increment the TX kick register in multiples
of 4 if possible as suggested by the documentation.
- Partially revert r164931; drivers should only clear the watchdog
timer if all outstanding TX descriptors are done.
- Fix some debugging strings.
- Add a missing BUS_DMASYNC_POSTWRITE in gem_rint().
- As the error paths in the interrupt handler are generally unlikely
predict them as false.
- Add support for the SBus version of the GEM controller. [2]
- Add some lock assertions.
- Improve some comments.
- Fix some more or less cosmetic issues in the code of the PCI front-end.
- Change some softc members to be unsigned where more appropriate and
remove unused ones.
Approved by: re (kib)
Obtained from: NetBSD (partially) [2], OpenBSD [1]
MFC after: 2 weeks
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:
ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr
Remove unused macro which didn't have required referencing:
IFP_TO_IA6
This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.
Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.
Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)
driver's i/o ops must be locked to avoid chaos. Extend the cambria
bus tag to support ata and add a spin lock. The ata driver is
hacked to use that instead of it's builtin hack for ixp425. Once
the ata driver is fixed to not be confused about byte order we can
generalize the cambria bus tag code and make it generally useful.
While here take advantage of our being ixp435-specific to remove
delays when switching between byte+word accesses and to eliminate
the 2us delay for the uarts (the spin lock overhead looks to do
this for us).
- Support for 10G-PCIE*-8B*-C (dual-port CX4) NICs
- For dual-port NICs, f/w failover is now a few microsecs
instead of a few millisecs.
- On failover, f/w sends RARP broadcast to make the change
immediately known to the network
- Fixed a bug observed on IBM X3 architecture where
some spurious ecrc errors would be reported when OS enabled
ecrc support.
Sponsored by: Myricom Inc.
* Driver for ACPI HP extra functionations, which required
ACPI WMI driver.
Submitted by: Michael <freebsdusb at bindone.de>
Approved by: re
MFC after: 2 weeks
stream (TCP) sockets.
It is functionally identical to generic soreceive() but has a
number stream specific optimizations:
o does only one sockbuf unlock/lock per receive independent of
the length of data to be moved into the uio compared to
soreceive() which unlocks/locks per *mbuf*.
o uses m_mbuftouio() instead of its own copy(out) variant.
o much more compact code flow as a large number of special
cases is removed.
o much improved reability.
It offers significantly reduced CPU usage and lock contention
when receiving fast TCP streams. Additional gains are obtained
when the receiving application is using SO_RCVLOWAT to batch up
some data before a read (and wakeup) is done.
This function was written by "reverse engineering" and is not
just a stripped down variant of soreceive().
It is not yet enabled by default on TCP sockets. Instead it is
commented out in the protocol initialization in tcp_usrreq.c
until more widespread testing has been done.
Testers, especially with 10GigE gear, are welcome.
MFP4: r164817 //depot/user/andre/soreceive_stream/
used for the optional GPS+RS485 uarts on the Gateworks Cambria boards
which otherwise are unreliable
o setup the hack bus space tag for the GPS+RS485 uarts
o program the gpio interrupts for the uarts to be edge-rising
o force timing on the expansion bus for the uarts to be "slow"
Thanks to Chris Lang of Gateworks for these tips.
long mbuf chain into an arbitrary large uio in a single step.
It is a functional mirror image of m_uiotombuf().
This function is supposed to be used instead of hand rolled code
with the same purpose and to concentrate it into one place for
potential further optimization or hardware assistance.
chains) to pure data mbufs using m_demote(). This removes the
packet header and all m_tag information as they are not meaningful
anymore on a stream socket where mbufs are linked through m->m_next.
Strictly speaking a packet header can be only ever valid on the first
mbuf in an m_next chain.
sbcompress() was doing this already when the mbuf chain layout lent
itself to it (e.g. header splitting or merge-append), just not
consistently.
This frees resources at socket buffer append time instead of at
sbdrop_internal() time after data has been read from the socket.
For MAC the per packet information has done its duty and during
socket buffer appending the policy of the socket itself takes over.
With the append the packet boundaries disappear naturally and with
it any context that was based on it. None of the residual information
from mbuf headers in the socket buffer on stream sockets was looked at.
This change should make options VIMAGE kernel builds usable again,
to some extent at least.
Note that the size of struct vnet_inet has changed, though in
accordance with one-bump-per-day policy we didn't update the
__FreeBSD_version number, given that it has already been touched
by r194640 a few hours ago.
Reviewed by: bz
Approved by: julian (mentor)
- remove HT_HEADER test (MT_HEADER == MT_DATA for some time now)
- be more pedantic about m_nextpkt in other than first mbuf
- update m_flags to be retained
- shrink size guards for vnet_net.
vnet_rtable does not need size guards as it is self-contained.
- remove a bunch of defines from vnet.h no longer valid.
Vimage module, which had been there already but now is stateful.
All variables are now file local; so this further limits the global
spreading of routing related things throughout the kernel.
Add a missing function local variable in case of MPATHing.
Reviewed by: zec
possible to do tolower/toupper independently without code conversion.
Submitted by: imura (but bugs are mine)
Obtained from: http://people.freebsd.org/~imura/kiconv/
(1_kiconv_wctype_kern.diff, 1_kiconv_wctype_user.diff)
No longer export rt_tables as all lookups go through
rt_tables_get_rnh().
We cannot make rt_tables (and rtstat, rttrash[1]) static as
netstat -r (-rs[1]) would stop working on a stripped
VIMAGE_GLOBALS kernel.
Reviewed by: zec
Presumably broken by: phk 13.5y ago in r12820 [1]
int. All of its callers pass in cmd as a u_long, so this has
always been a dangerous type demotion. It was spooted by clang/llvm
trying to do a type promotion and sign extension within
cam_periph_ioctl.
Submitted by: rdivacky
a pointer to an ifaddr matching the passed socket address, returns a
boolean indicating whether one was present. In the (near) future,
ifa_ifwithaddr() will return a referenced ifaddr rather than a raw
ifaddr pointer, and the new wrapper will allow callers that care only
about the boolean condition to avoid having to free that reference.
MFC after: 3 weeks
address lists, at_ifaddr_list. Acquire the lock, and use ifaddr
refcounts where necessary, to close most known address-related
races in netatalk.
Annotate one potential race in at_control() where we acquire an
ifaddr reference, drop the global lock, and scrub the address from
the ifnet before re-acquiring the global lock, which could allow
for a writer-writer race.
MFC after: 3 weeks
remaining potential races in ifconfig's management of IPX addresses.
This is largely accomplished by dropping a global write lock for the
IPX address list over the body of in_control(), although there are
some places we bump the refcount on an ifaddr of interest while
calling out to the routing code or link layer code, which might
require revisiting.
Annotate one as a potential race if two simultaneous delete ioctls
are issued for the same IPX addresses at once.
MFC after: 3 weeks
a new rwlock, ipx_ifaddr_rw, wrapped with macros. This locking is
necessary but not sufficient, in isolation, to satisfy the stability
requirements of a fully parallel IPX input path during interface
reconfiguration.
MFC after: 3 weeks
calls to vdrop() until after the free page queues lock is released. This
eliminates repeatedly releasing and reacquiring the free page queues lock
each time the last cached page is reclaimed from a vnode-backed object.
- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
reference count management.
The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.
MFC after: 3 weeks
vn_open_cred in default implementation. Valid struct ucred is needed for
audit and MAC, and curthread credentials may be wrong.
This further requires modifying the interface of vn_fullpath(9), but it
is out of scope of this change.
Reviewed by: rwatson