Do not try to free the rt_lle entry of the cached route in
ip_output() if the cached route was not initialized from the
flow-table. The rt_lle entry is invalid unless it has been
initialized through the flow-table.
Reviewed by: kmacy, rwatson
Approved by: re
When multiple interfaces exist in the system, with each interface having
an IPv6 address assigned to it, and if an incoming packet received on
one interface has a packet destination address that belongs to another
interface, the routing table is consulted to determine how to reach this
packet destination. Since the packet destination is an interface address,
the route table will return a host route with the loopback interface as
rt_ifp. The input code must recognize this fact, instead of using the
loopback interface, the input code performs a search to find the right
interface that owns the given IPv6 address.
Reviewed by: bz, gnn, kmacy
Approved by: re
It is possible for all the kthreads to exit (hci modules unloaded) which in
turn ends our usb process. This means the proc pointer becomes invalid and will
panic if a new kthread is added. Count the number of threads and clear the proc
pointer on the last one.
Approved by: re (kib)
Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations. Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.
Add ifindex_free() to centralize the implementation of releasing an
ifindex value. Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().
Reviewed by: bz
Approved by: re (kib)
Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc(). Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not
close all known races in this code.
Reviewed by: bz
Approved by: re (kib)
Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.
Reviewed by: bz, kmacy, qingli
Approved by: re (kib)
Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.
Approved by: re (kib)
Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.
Submitted by: pjd
Reviewed by: {krw,sthen,markus}@openbsd.org
Approved by: re (kib)
Rather than using IFNET_RLOCK() when iterating over (and modifying) the
ifnet list during if_ef load, directly acquire the ifnet_sxlock
exclusively. That way when if_alloc() recurses the lock, it's a write
recursion rather than a read->write recursion.
This code structure is arguably a bug, so add a comment indicating that
this is the case. Post-8.0, we should fix this, but this commit
resolves panic-on-load for if_ef.
Discussed with: bz, julian
Reported by: phk
Approved by: re (kib)
Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:
Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.
Reviewed by: bz, julian
Approved by: re (kib)
When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.
This change affects only options VIMAGE builds.
Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
When "jail -c vnet" request fails, the current code actually creates and
leaves behind an orphaned vnet. This change ensures that such vnets get
released.
This change affects only options VIMAGE builds.
Submitted by: jamie
Discussed with: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Introduce a div_destroy() function which takes over per-vnet cleanup tasks
from the existing modevent / MOD_UNLOAD handler, and register div_destroy()
in protosw as per-vnet .pr_destroy() handler for options VIMAGE builds. In
nooptions VIMAGE builds, div_destroy() will be invoked from the modevent
handler, resulting in effectively identical operation as it was prior this
change. div_destroy() also tears down hashtables used by ipdivert, which
were previously left behind on ipdivert kldunloads.
For options VIMAGE builds only, temporarily disable kldunloading of ipdivert,
because without introducing additional locking logic it is impossible to
atomically check whether all ipdivert instances in all vnets are idle, and
proceed with cleanup without opening a race window for a vnet to open an
ipdivert socket while ipdivert tear-down is in progress.
While here, staticize div_init(), because it is not used outside of
ip_divert.c.
In cooperation with: julian
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
When registering a protocol to an existing protocol domain via
pf_proto_register(), iterate over all existing vnets to call protosw_init()
and thus the appropriate .pr_init() handler in the context of each vnet.
NB in the future we probably want to separate pr_init() handlers into
two, i.e. per-vnet and global, functions.
This change has no impact on nooptions VIMAGE builds.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Don't try to power down PHY when alc(4) failed to map the device.
This fixes system crash when mapping alc(4) device failed in device
attach.
Reported by: Jim < stapleton.41 <> gmail DOT com >
Approved by: re (kib)
Add RTL8168DP/RTL8111DP device id. While I'm here append "8111D" to
the description of RTL8168D as RL_HWREV_8168D can be either
RTL8168D or RTL8111D.
PR: kern/137672
Approved by: re (kib)
Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.
Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.
Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].
In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].
These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).
PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
Approved by: re (kensmith)
Tweak the way that the ACPI and ISA bus drivers match hint devices to
BIOS-enumerated devices:
- Assume a device is a match if the memory and I/O ports match even if the
IRQ or DRQ is wrong or missing. Some BIOSes don't include an IRQ for
the atrtc device for example.
- Add a hack to better match floppy controller devices. Many BIOSes do not
include the starting port of the floppy controller listed in the hints
(0x3f0) in the resources for the device. So far, however, all the BIOS
variations encountered do include the 'port + 2' resource (0x3f2), so
adjust the matching for "fdc" devices to look for 'port + 2'.
Approved by: re (kib)
The svnversion string is only relevant when newvers.sh is called
during the kernel build process, the other places that call the
script do not make use of that information. So restrict execution
of the svnversion-related code to the kernel build context.
Approved by: re (kib)
Fix ipfw's initialization functions to get the correct order of evaluation
to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers
instead of the modevent handler. Correct some spelling errors in comments
in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
ipfw is unloaded.
This patch is a minimal patch for 8.0
I have a much larger patch that actually fixes the underlying problems
that will be applied after 8.0
Reviewed by: zec@, rwatson@, bz@(earlier version)
Approved by: re (rwatson)
Don't allow access to the internals until it has all been set up.
Specifically, not until the per-vnet parts have been set up.
Submitted by: kmacy@
Reviewed by: julian@, zec@
Approved by: re(rwatson)
This patch fixes two bugs in sglist(9) and improves robustness of the API via
better semantics if a request to append an address range to an existing list
fails.
- When cloning an sglist, properly set the length in the new sglist instead of
leaving the new list empty.
- Properly compute the amount of data added to an sglist via
_sglist_append_buf(). This allows sglist_consume_uio() to properly update
uio_resid.
- When a request to append an address range to a scatter/gather list fails,
restore the sglist to the state it had at the start of the function call
instead of resetting it to an empty list.
Approved by: re (kib)
Fix a boot hang for hptrr(4) caused by changes introduced in r195534.
It is necessary to make sure cpi->transport is set for xpt_scan_bus() to
work properly.
Submitted by: Bernhard Schmidt (scb+freebsd-current <at> techwires
<dot> net)
Reviewed by: scottl
Approved by: re (kib)
Check whether the SMBIOS reports reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.
Tested by: bz
Approved by: re (kib)
Rather than fix questionable ifnet list locking in the implementation of
the kern.polling.enable sysctl, remove the sysctl. It has been deprecated
since FreeBSD 6 in favour of per-ifnet polling flags.
Reviewed by: luigi
Approved by: re (kib)
Change the 'resid' parameter to sglist_consume_uio() from an int to a
size_t to match the recent type change of the uio_resid member of struct
uio.
Approved by: re (kib)
Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs when
CARP tries to free them using M_IFADDR after the last address for a virtual
host is removed and when detaching from the parent interface.
Approved by: re (kib), ken (mentor)
Our libc doesn't implement control method for XDR (only kernel does) and it
will always return failure. Fix this by bringing userland implementation of
xdrmem_control() back. This allow 'zpool import' to work again.
Reported by: Thomas Backman <serenity@exscape.org>
Reviewed by: kmacy
Approved by: re (kib)
moving a frequently executed flowtable syslog statement from being
conditional on bootverbose to conditional on a per-vnet flowtable
sysctl.
Approved by: re@
Temporarily enhance em(4) and igb(4) hack to take account for IFF_NOARP.
Without this changeset there will be no way to prevent these NICs from
sending ARP, which is harmful in server farms that is configured as
"Direct Server Return" behind a load balancer.
A better fix would remove the whole hack completely but it would be
later than 8.0-RELEASE.
Reviewed by: jfv, yongari
Approved by: re (kib)
Fix USB cache sync operations for platforms with non-coherent DMA.
- usb_pc_cpu_invalidate() is called between [consecutive] reads from a device,
so a sequence of BUS_DMASYNC_POSTREAD and _PREREAD should be used. Note we
cannot use or'ed shorthand ( _POSTREAD | _PREREAD) for BUS_DMASYNC flags, as
the low level bus dma sync operation is implementation dependent and we
cannot assume the required order of operations to be guaranteed.
- usb_pc_cpu_flush() is called before writing to a device, so
BUS_DMASYNC_PREWRITE should be used.
Submitted by: Grzegorz Bernacki
Reviewed by: HPS, arm@, usb@ ML
Tested by: HPS, Mike Tancsa
Approved by: re (kib)
Obtained from: Semihalf
Small changes to the warning message generated by pty(4):
- Only print the warning once, instead of filling up the screen.
- Use the word "legacy" for the pty_warningcnt description, to prevent
confusion.
- Use log() instead of printf().
Discussed with: rwatson, jhb
Approved by: re (kib)
If we cannot immediately get the pf_consistency_lock in the purge thread,
restart the scan after acquiring the lock the hard way. Otherwise we
might end up with a dead reference.
Approved by: re (kib)
Do not try to reevaluate current RX production index on each
loop iteration as it can be updated by the card while we
process the RX ring forcing us to process RX descriptors
for which DMA synchronisation operation has not been
performed. This fixes the bug when bge(4) drops packets
under high load.
Discussed with: yongari, marius
Approved by: re (kib)
- change the interface to flowtable_lookup so that we don't rely on
the mbuf for obtaining the fib index
- check that a cached flow corresponds to the same fib index as the
packet for which we are doing the lookup
- at interface detach time flush any flows referencing stale rtentrys
associated with the interface that is going away (fixes reported
panics)
- reduce the time between cleans in case the cleaner is running at
the time the eventhandler is called and the wakeup is missed less
time will elapse before the eventhandler returns
- separate per-vnet initialization from global initialization
(pointed out by jeli@)
Reviewed by: sam@
Approved by: re@
Backout r193289. r193289 restored page select bits to previous
value instead of blindly resetting it to 0. However, it seems page
select bits of some 88E1116 PHY is initialized to invalid one such
that restoring page select bits after programming broke MII
register access. The correct solution would be reset page select
bits to 0 in PHY attach stage but it would require more testing.
Since we're in BETA stage such a change would be dangerous so just
back it out.
This change should fix nfe(4) breakage on NVIDIA MCP55.
Reported by: Ryan Rogers < webmaster <> doghouserepair dot com >
Sam Fourman Jr. < sfourman <> gmail dot com >
Tested by: Ryan Rogers < webmaster <> doghouserepair dot com >
Sam Fourman Jr. < sfourman <> gmail dot com >
Approved by: re (kib)
for ATA_SETFEATURES/ATA_SF_SETXFER command which by definition transfers no
data. Most of controllers are irrelevant to this bug, but some nVidia's
doesn't.
Tested on: current@
Approved by: re (kib)
Apply the same patch as r196205 for nfs_upgrade_lock() and
nfs_downgrade_lock() to the experimental nfs client.
Approved by: re (kensmith), kib (mentor)
* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to
a pointer-fetching specific operation check. Consequently, rename the
operation ASSERT_ATOMIC_LOAD_PTR().
* Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking
directly alignment on the word boundry, for all the given specific
architectures. That's a bit too strict for some common case, but it
assures safety.
* Add a comment explaining the scope of the macro
* Add a new stub in the lockmgr specific implementation
Tested by: marcel (initial version), marius
Reviewed by: rwatson, jhb (comment specific review)
Approved by: re (kib)
The start of the EFI GPT partition in the PMBR can always be represented
by CHS addressing. Don't define these fields as 0xff, but rather define
them correctly. This prevents boot problems on PCs where GPT is being
used.
PR: 115406
Submitted by: Kent Hauser <kent@khauser.net>
Approved by: re (kib)
Fix parse() so that the partition to boot (load /boot/loader) from can
be set. The syntax as printed in main() is used: 0:ad(0p3)/boot/loader
Reviewed by: jhb
Approved by: re (kib)
getcwd() (when __getcwd() fails) works by stating current directory, going up
(..), calling readdir and looking for previous directory inode. In case of
.zfs/ directory this doesn't work, because .zfs/ is hidden by default, so it
won't be visible in readdir output.
Fix this by implementing VPTOCNP for snapshot directories, so __getcwd()
doesn't fail and getcwd() doesn't have to use readdir method.
This fixes /bin/pwd from within .zfs/snapshot/<name>/.
Suggested by: kib
Approved by: re (rwatson)
Fix panic in zfs recv code. The last vnode (mountpoint's vnode) can have
0 usecount.
Reported by: Thomas Backman <serenity@exscape.org>
Approved by: re (kib)
Remove OpenSolaris taskq port (it performs very poorly in our kernel) and
replace it with wrappers around our taskqueue(9).
To make it possible implement taskqueue_member() function which returns 1
if the given thread was created by the given taskqueue.
Approved by: re (kib)
Because taskqueue_run() can drop tq_mutex, we need to check if the
TQ_FLAGS_ACTIVE flag wasn't removed in the meantime, which means we missed a
wakeup.
Approved by: re (kib)
- Fix a race where /dev/zfs control device is created before ZFS is fully
initialized. Also destroy /dev/zfs before doing other deinitializations.
- Initialization through taskq is no longer needed and there is a race
where one of the zpool/zfs command loads zfs.ko and tries to do some work
immediately, but /dev/zfs is not there yet.
Reported by: pav
Approved by: re (kib)
Change the usb workers from kernel processes to threads, this is mostly a
cosmetic change to reduce cruft in the proc table.
Also change the idle wait message to `-` like how taskqueues are.
Reviewed by: julian
Approved by: re (kib)
* Fix a bug where PR-SCTP settings are ignore when using implicit
association setup.
* Fix a bug where message with illegal stream ids are not deleted.
* Fix a crash when reporting back unsent messages from the send_queue.
* Fix a bug related to INIT retransmission when the socket is already
closed.
* Fix a bug where associations were stalled when partial delivery API
was enabled.
* Fix a bug where the receive buffer size was smaller than the
partial_delivery_point.
Approved by: re, rrs (mentor)
Proprely intialize UART parameters at probe stage, so uart(4)
will initialize the FIFO memory correctly on attach. Before
that this values was intialized in only in at91_usart_bus_attach
which is called after the uart(4) memory allocation happens.
Approved by: re (kib)
MFC after: 1 week
In function ip_output(), the cached route is flushed when there is a
mismatch between the cached entry and the intended destination. The
cached rtentry{} is flushed but the associated llentry{} is not. This
causes the wrong destination MAC address being used in the output
packets. The fix is to flush the llentry{} when rtentry{} is cleared.
Reviewed by: kmacy, rwatson
Approved by: re
Appease VNET_DEBUG - in if_vmove we temporarily switch i.e.
recurse from one vnet to another which is OK, so no need
to flood the console with warnings here.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
SCTP is not yet compatible with options VIMAGE kernels although it compiles
with VIMAGE defined, so explicitly disallow building such kernels.
Reviewed by: rrs
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Make VNET_DEBUG a standalone compile-time option, i.e. decouple it from
INVARIANTS.
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Add a new macro to test that a variable could be loaded atomically.
Check that the given variable is at most uintptr_t in size and that
it is aligned.
Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate
alignment -- however, the function of ALIGN() is to guarantee
alignment, and therefore may lead to stronger alignment
enforcement than necessary for types that are smaller than
sizeof(uintptr_t).
Add checks to mtx, rw and sx locks init functions to detect possible
breakage. This was used during debugging of the problem fixed with
r196118 where a pointer was on an un-aligned address in the dpcpu area.
In collaboration with: rwatson
Reviewed by: rwatson
Approved by: re (kib)
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
routines in the local APIC code that the hwpmc(4) driver can use to
manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly
enables the interrupt when it is succesfully initialized and disables
the interrupt when it is unloaded. This avoids enabling the interrupt
on unsupported CPUs which may result in spurious NMIs.
Reported by: rnoland
Reviewed by: jkoshy
Approved by: re (kib)
MFC after: 2 weeks
Take the number of allocated freeblks into consideration for
softdep_slowdown(), to prevent kernel memory exhaustioni on
mass-truncation.
Approved by: re (rwatson)
In nfs_upgrade_vnlock(), assert that the vnode is locked.
When downgrading, pass LK_RETRY to the vn_lock(), since otherwise
vn_lock() unlocks the doomed vnode, causing extra unlock.
Approved by: re (rwatson)
URL: http://svn.freebsd.org/changeset/base/196201
Fix ipfw crash on uid or gid check.
Receiving any ip packet for which there is no existing socket will
crash if ipfw has a uid or gid test rule, as the uid/gid
of the non existent owner of said non existent socket is tested.
Brooks introduced this error as part of his >16 gids patch.
It appears to be a cut-n-paste error from similar code a few lines
before. The old code used the 'pcb' variable here, but in the
new code that switched the 'inp' variable, which is often NULL
and what is tested in the code further up. The rest of the multi-gid
patch for ipfw seems solid (and cleaner than previous code).
p.s. What's up with all the properties changing? It is a fresh checkout.
Reviewed by: brooks
Approved by: re (rwatson)
* Completely remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Add an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding
This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.
Please don't forget to update your config file with the STOP_NMI
option removal
Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)
Reverse misordered unlock and lock in at_control for netatalk phase I
addresses.
Submitted by: Russell Cattelan <cattelan at thebarn.com>
Approved by: re (kib)
Make it possible to change the vnet sysctl variables on jails
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.
Reviewed by: rwatson, zec
Approved by: re (kib)
Put multiple instructions into a block when iterating; unbreaks
NET_RT_DUMP, which otherwise only returned information of AF_MAX.
This was broken in r193232 (save your time - my bug, my fix).
Reported by: Larry Baird (lab gta.com)
Tested by: Larry Baird (lab gta.com)
Reviewed by: zec, lstewart, qing
PR: kern/137700
Approved by: re (kib)
Drain link state event changes posted during vap destroy. This is a
band-aid for the general problem that if_link_state_change can be
called between if_detach and if_free leaving a task queued that has
been free'd.
Reviewed by: rwatson
Approved by: re (rwatson)
A piece of code was added to install a host route when an IPv6 interface
address is configured with a /128 prefix. This is no longer necessary due
to r192011. In fact that code conflicts with r192011. This patch removes
the host route installation when detecting the /128 prefix, and instead
let the code added by r192011 to install the loopback route for that IPv6
interface address.
Approved by: re
Add a check for a NULL mbuf ptr at the beginning of xdrmbuf_inline()
so that it returns failure instead of crashing when "m->m_len" is
executed and m == NULL. The mbuf ptr can be NULL when a call to
xdrmbuf_getbytes() gets the bytes it needs, but they are at the end
of a short RPC reply. When this happens, xdrmbuf_getbytes() returns
success, but advances the mbuf ptr (xdrs->x_private) to m_next, which
is NULL. If this is followed by a call to xdrmbuf_getlong(), it calls
xdrmbuf_inline(), which would cause a crash by accessing "m->m_len".
Approved by: re (rwatson), kib (mentor)
Always embed pointer to BPF JIT function in BPF descriptor
to avoid inconsistency when opt_bpf.h is not included.
Reviewed by: rwatson
Approved by: re (rwatson)
Add ddb show dpcpu_off command to ease dpcpu memory debugging.
While show pcpu prints pc_dynamic this also prints the original
memory address as well as the maths.
Once dpcpu goes NUMA this is considered to help debugging as well.
Reviewed by: rwatson
Approved by: re
Put minimum alignment on the dpcpu and vnet section so that ld
when adding the __start_ symbol knows the expected section alignment
and can place the __start_ symbol correctly.
These sections will not support symbols with super-cache line alignment
requirements.
For full details, see posting to freebsd-current, 2009-08-10,
Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>.
Debugging and testing patches by:
Kamigishi Rei (spambox haruhiism.net),
np, lstewart, jhb, kib, rwatson
Tested by: Kamigishi Rei, lstewart
Reviewed by: kib
Approved by: re
all pertinent statatistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.
Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.
The following modules are affected by this change:
if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf
In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too agressive for
this point in the release cycle.
Reviewed by: bz
Approved by: re (kib)
The newbus lock is responsible for protecting newbus internIal structures,
device states and devclass flags. It is necessary to hold it when all
such datas are accessed. For the other operations, softc locking should
ensure enough protection to avoid races.
Newbus lock is automatically held when virtual operations on the device
and bus are invoked when loading the driver or when the suspend/resume
take place. For other 'spourious' operations trying to access/modify
the newbus topology, newbus lock needs to be automatically acquired and
dropped.
For the moment Giant is also acquired in some key point (modules subsystem)
in order to avoid problems before the 8.0 release as module handlers could
make assumptions about it. This Giant locking should go just after
the release happens.
Please keep in mind that the public interface can be expanded in order
to provide more support, if there are really necessities at some point
and also some bugs could arise as long as the patch needs a bit of
further testing.
Bump __FreeBSD_version in order to reflect the newbus lock introduction.
Reviewed by: ed, hps, jhb, imp, mav, scottl
No answer by: ariff, thompsa, yongari
Tested by: pho,
G. Trematerra <giovanni dot trematerra at gmail dot com>,
Brandon Gooch <jamesbrandongooch at gmail dot com>
Sponsored by: Yahoo! Incorporated
Approved by: re (ksmith)
- fix write() on pseudo-terminal masters to return the amount of bytes
passed to the TTY, not the amount of bytes read from user.
- fix ttydisc_rint_bypass() to set the high watermark when it cannot
write all input, just like ttydisc_rint() itself.
Approved by: re (kib)
vnet.c to iterate virtual network stacks without being aware of
the implementation details previously hidden in kern_vimage.c.
Now they are in the same file, so remove this added complexity.
Reviewed by: bz
Approved by: re (vimage blanket)
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.
Reviewed by: bz
Approved by: re (vimage blanket)
firmware loading bugs.
Target mode support has received some serious attention to make it
more usable and stable.
Some backward compatible additions to CAM have been made that make
target mode async events easier to deal with have also been put
into place.
Further refinement and better support for NP-IV (N-port Virtualization)
is now in place.
Code for release prior to RELENG_7 has been stripped away for code clarity.
Sponsored by: Copan Systems
Reviewed by: scottl, ken, jung-uk kim
Approved by: re
instead of whatever the parent/system has (which is generally 0). This
mirrors the old-style default used for jail(2) in conjunction with the
security.jail.enforce_statfs sysctl.
Approved by: re (kib), bz (mentor)
- Don't grab the filedesc lock just to read fd_cmask.
- Drop vnode locks earlier when mounting the root filesystem and before
sanitizing stdin/out/err file descriptors during execve().
Submitted by: kib
Approved by: re (rwatson)
MFC after: 1 week
in prison_equal_ip4/6 while an inp mutex was held. Locking allprison_lock
can be avoided by making a restriction on the IP addresses associated with
jails:
Don't allow the "ip4" and "ip6" parameters to be changed after a jail is
created. Setting the "ip4.addr" and "ip6.addr" parameters is allowed,
but only if the jail was already created with either ip4/6=new or
ip4/6=disable. With this restriction, the prison flags in question
(PR_IP4_USER and PR_IP6_USER) become read-only and can be checked
without locking.
This also allows the simplification of a messy code path that was needed
to handle an existing prison gaining an IP address list.
PR: kern/136899
Reported by: Dirk Meyer
Approved by: re (kib), bz (mentor)
USB core:
- add support for defragging of written device data.
- improve handling of alternate settings in device side mode.
- correct return value from usbd_get_no_alts() function.
- reported by: HPS
- P4 ID: 166156, 166168
- report USB device release information to devd and pnpinfo.
- reported by: MIHIRA Sanpei Yoshiro
- P4 ID: 166221
Submitted by: hps
Approved by: re
See PR description for more info. Patch is
implemented differently than suggested, but
having the same result.
PR: usb/137188
Submitted by: hps
Approved by: re
- add support for defragging of written device data.
- improve handling of alternate settings in device side mode.
- correct return value from usbd_get_no_alts() function.
- reported by: HPS
- P4 ID: 166156, 166168
- report USB device release information to devd and pnpinfo.
- reported by: MIHIRA Sanpei Yoshiro
- P4 ID: 166221
Submitted by: hps
Approved by: re
- Add minimum polling support to drive UMASS
and UKBD in case of panic.
- Add extra check to ukbd probe to fix problem about
mouse devices attaching like keyboards.
- P4 ID: 166148
Submitted by: hps
Approved by: re
- add support for setting the UMS polling rate through -F option
passed to moused.
- requested by Alexander Best
- P4 ID: 166075
PR: usb/125264
Submitted by: hps
Approved by: re
jails have their own IP stack and don't have access to the parent IP
addresses anyway. Note that a virtual network stack forms a break
between prisons with regard to the list of allowed IP addresses.
Approved by: re (kib), bz (mentor)
"disable", which only allows access to the parent/physical system's
IP addresses when specifically directed. Change the default value of
"host" to "new", and don't copy the parent host values, to insulate
jails from the parent hostname et al.
Approved by: re (kib), bz (mentor)
for NFSv2 and not NFSv4 when nfscl_mustflush() returns 0. Since
nfscl_mustflush() only returns 0 when there is a valid write delegation
issued to the client, it only affects the case of an NFSv4 mount with
callbacks/delegations enabled.
Approved by: re (kensmith), kib (mentor)
when memory page caching attributes changed, and CPU does not support
self-snoop, but implemented clflush, for i386.
Take care of possible mappings of the page by sf buffer by utilizing
the mapping for clflush, otherwise map the page transiently. Amd64
used direct map.
Proposed and reviewed by: alc
Approved by: re (kensmith)
provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2()
to capture path information for audit records. This allows us to
move the definitions of ARG_* out of the public audit header file,
as they are an implementation detail of our current kernel-internal
audit record, which may change.
Approved by: re (kensmith)
Obtained from: TrustedBSD Project
MFC after: 1 month
to avoid exposing ARG_ macros/flag values outside of the audit code in
order to name which one of two possible vnodes will be audited for a
system call.
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 month
instead of the root/current working directory as the starting point for
lookups. Up to two such descriptors can be audited. Add audit record
BSM encoding for fooat(2).
Note: due to an error in the OpenBSM 1.1p1 configuration file, a
further change is required to that file in order to fix openat(2)
auditing.
Approved by: re (kib)
Reviewed by: rdivacky (fooat(2) portions)
Obtained from: TrustedBSD Project
MFC after: 1 month
L2 information. For an indirect route the cached L2 entry contains the
MAC address of the gateway. Typically the default route is used to
transmit multicast packets when explicit multicast routes are not
available. The ether_output() function bypasses L2 resolution function
if it verifies the L2 cache is valid, because the cached L2 address
(a unicast MAC address) is copied into the packets as the destination
MAC address. This validation, however, does not apply to broadcast and
multicast packets because the destination MAC address is mapped
according to a standard method instead.
Submitted by: Xin Li
Reviewed by: bz
Approved by: re
processing code holds the read lock (when processing a
FWD-TSN for pr-sctp). If it finds stranded data that
can be given to the application, it calls sctp_add_to_readq().
The readq function also grabs this lock. So if INVAR is on
we get a double recurse on a non-recursive lock and panic.
This fix will change it so that readq() function gets a
flag to tell if the lock is held, if so then it does not
get the lock.
Approved by: re@freebsd.org (Kostik Belousov)
MFC after: 1 week
- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface addreess when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.
Reviewed by: bz
Approved by: re
old ABI versions of the relevant control system call (e.g.
freebsd7_freebsd32_msgctl() instead of freebsd32_msgctl() for msgsys()).
Approved by: re (kib)
panic when in zfs_fuid_create_cred() when userid is negative. It is
converted to unsigned value which makes IS_EPHEMERAL() macro to
incorrectly report that this is ephemeral ID. The most reasonable
solution for now is to always report that the given ID is not ephemeral.
PR: kern/132337
Submitted by: Matthew West <freebsd@r.zeeb.org>
Tested by: Thomas Backman <serenity@exscape.org>, Michael Reifenberger <mike@reifenberger.com>
Approved by: re (kib)
MFC after: 2 weeks
* don't clobber proxy entries
* HWMP seq number processing, including discard of old frames
* flush routing table entries based on nexthop
* print route flags in ifconfig
* more debugging messages and comments
Proxy changes submitted by sam.
Approved by: re (kib)
requesting IDENTIFY from slave device first. This order is important
for proper cable type detection by master device.
PR: kern/136438
Approved by: re (kib)
things a bit:
- use dpcpu data to track the ifps with packets queued up,
- per-cpu locking and driver flags
- along with .nh_drainedcpu and NETISR_POLICY_CPU.
- Put the mbufs in flight reference count, preventing interfaces
from going away, under INVARIANTS as this is a general problem
of the stack and should be solved in if.c/netisr but still good
to verify the internal queuing logic.
- Permit changing the MTU to virtually everythinkg like we do for loopback.
Hook epair(4) up to the build.
Approved by: re (kib)
(ifconfig ifN (-)vnet <jname|jid>) work correctly.
Move vi_if_move to if.c and split it up into two functions(*),
one for each ioctl.
In the reclaim case, correctly set the vnet before calling if_vmove.
Instead of silently allowing a move of an interface from the current
vnet to the current vnet, return an error. (*)
There is some duplicate interface name checking before actually moving
the interface between network stacks without locking and thus race
prone. Ideally if_vmove will correctly and automagically handle these
in the future.
Suggested by: rwatson (*)
Approved by: re (kib)
restrictions) were found to be inadequately described by a boolean.
Define a new parameter type with three values (disable, new, inherit)
to handle these and future cases.
Approved by: re (kib), bz (mentor)
Discussed with: rwatson
ability to retrieve the group list of each process.
Modify procstat's -s option to query this mib when the kinfo_proc
reports that the field has been truncated. If the mib does not exist,
fall back to the truncated list.
Reviewed by: rwatson
Approved by: re (kib)
MFC after: 2 weeks
- When a vlan event occurs a check was not made that
the event was actually for the interface, thus resulting
in a panic. All three drivers have this vulnerability. Add
a check for this condition.
- Secondly, there was a duplicate buf_ring free in the em
driver resulting in a panic on unload. Remove.
Approved by: re
part that is made up of 8K banks rather than 4K, if these
systems are using bank 1 then the last change in this code
breaks the bank read, resulting in an invalid checksum of
the eeprom during driver load. This change fixes this.
Approved by: re
frequency w/o regulatory issues, do this by hooking if_transmit and
if_output with routines that discard all transmits
Reviewed by: thompsa, cbzimmer (intent)
Approved by: re (kensmith)
in up to 16 (KI_NGROUPS) values and steal a bit from ki_cr_flags
(all bits currently unused) to indicate overflow with the new flag
KI_CRF_GRP_OVERFLOW.
This fixes procstat -s.
Approved by: re (kib)
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.
Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks
network stacks, VNET_SYSINIT:
- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
occur each time a network stack is instantiated and destroyed. In the
!VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
For the VIMAGE case, we instead use SYSINIT's to track their order and
properties on registration, using them for each vnet when created/
destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
previously, as well as its dependency scheme: we now just use the
SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
they want init functions to be called for each virtual network stack
rather than just once at boot, compiling down to DOMAIN_SET() in the
non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
of modinfo or DOMAIN_SET() for init/uninit events. In some cases,
convert modular components from using modevent to using sysinit (where
appropriate). In some cases, do minor rejuggling of SYSINIT ordering
to make room for or better manage events.
Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
Discussed with: jhb, bz, julian, zec
Reviewed by: bz
Approved by: re (VIMAGE blanket)
if input-device is unavailable. The Xserve G5 defaults to using
screen/keyboard for output-device/input-device even if these are not
installed, and then falls back to serial ports at boot time.
Reviewed by: marcel
Hardware from: grehan
Approved by: re (kib)
experimental NFSv4 client might try and use it as an IPv6 address,
breaking callbacks. The fix simply initializes the isinet6 variable
for this case.
Approved by: re (kensmith), kib (mentor)
doesn't exist and user doesn't have write access to the file.
Without this fix, it returns bogus value instead of 0. For some
reason this didn't manifest on my kernel compiled with -O0.
PR: kern/136601
Submitted by: Jaakko Heinonen <jh at saunalahti dot fi>
Approved by: re (kib)
msleep(9) when a vnode lock or similar may be held. The changes are
just a clone of the changes applied to the regular nfs client by
r195703.
Approved by: re (kensmith), kib (mentor)
established, OS shall flush the caches on all processors that may have
used the mapping previously. This operation is not needed if processors
support self-snooping. If not, but clflush instruction is implemented
on the CPU, series of the clflush can be used on the mapping region.
Otherwise, we have to flush the whole cache. The later operation is very
expensive, and AMD-made CPUs do not have self-snooping.
Implement cache flush for remapped region by using clflush for amd64,
when supported by CPU.
Proposed and reviewed by: alc
Approved by: re (kensmith)
being issued from the server, there was a case where an Open issued locally
based on the delegation would be released before the associated vnode
became inactive. If the delegation was recalled after the open was released,
an Open against the server would not have been acquired and subsequent I/O
operations would need to use the special stateid of all zeros. This patch
fixes that case.
Approved by: re (kensmith), kib (mentor)
loader, because it uses a reserved suffix (_type). Fix
this by removing the "_" and renaming the tunable to
hw.mxge.rss_hashtype. The old (rss_hash_type) tunable is
still fetched, in case people load the driver via scripts.
When both are present in the kernel environment,
the new value (hw.mxge.rss_hashtype) overrides the old
value.
Approved by: re (kib)
non-vrtiualized sysctls so we cannot used one common function.
Add a macro to convert the arg1 in the virtualized case to
vnet.h to not expose the maths to all over the code.
Add a wrapper for the single virtualized call, properly handling
arg1 and call the default implementation from there.
Convert the two over places to use the new macro.
Reviewed by: rwatson
Approved by: re (kib)
this was broken in r183248 when the check of wk_keyix was replaced by
a check of IEEE80211_KEY_DEVKEY (because the flag was clobbered). Define
IEEE80211_KEY_DEVICE to specify flags that are owned by net80211/driver
and use this to preserve IEEE80211_KEY_DEVKEY so we don't ask the driver
for another key index when we already have one.
Testing by: Daniel Thiele, Wes Morgan
Reviewed by: rpaulo
Approved by: re (kib)
actually specify valid bases that should be treated just as normal.
The PCI specifications have no indication that 0 would be a magic value
indicating a disabled BAR as commonly used on at least amd64 and i386
but not sparc64. It's unclear what to do in pci_delete_resource()
instead of writing 0 to a BAR though as there's no (other) way do
disable individual BARs so its decoding is left enabled in case of
__PCI_BAR_ZERO_VALID for now.
Approved by: re (kib), jhb
MFC after: 1 week
Driver supports Serial ATA and ATAPI devices, Port Multipliers
(including FIS-based switching), hardware command queues (31 command
per port) and Native Command Queuing. This is probably the second on
popularity, after AHCI, type of SATA2 controllers, that benefits from
using CAM, because of hardware command queuing support.
Approved by: re (kib)
It turns LBC control registers were not programmed correctly on MPC85XX. We
were accessing bogus addresses as the base offset (OCP85XX_LBC_OFF) was
erroneously added during offset calculations. Effectively the state of LBC
control registers was not altered by the kernel initialization code, but
everything worked as long as we coincided to use the same settings (LBC decode
windows) as firmware has initialized.
Submitted by: Lukasz Wojcik
Reviewed by: marcel
Approved by: re (kensmith)
Obtained from: Semihalf
On some ARM variations CPU func dispatcher has the D-cache invalidate method
point to write-back invalidate, which is wrong, and can lead to a crash/panic
on affected platforms.
Spotted by: HPS
Reviewed by: cognet
Approved by: re (kib)
nagged again via PR. Thank Paul for his persistence and contributions.
PR: 136895
Submitted by: Paul B. Mahol <onemda@gmail.com>
Reviewed by: sam (timeout, 10 days), weongyo (timeout, 10 days), me
Approved by: re (Kostik Belousov <kostikbel@gmail.com>)
back to the bottom of ip_init() as found in 7.x. I missed the fact that
the bottom half of the init routine only runs in the !VNET case.
Submitted by: zec
Approved by: re (vimage blanket)
this change, ZFS uses SunOS Alternate Data Streams semantics - each
EA has its own permissions, which are set at EA creation time
and - unlike SunOS - invisible to the user and impossible to change.
From the user point of view, it's just broken: sometimes access
is granted when it shouldn't be, sometimes it's denied when
it shouldn't be.
This patch makes it behave just like UFS, i.e. depend on current
file permissions. Also, it fixes returned error codes (ENOATTR
instead of ENOENT) and makes listextattr(2) return 0 instead
of EPERM where there is no EA directory (i.e. the file never had
any EA).
Reviewed by: pjd (idea, not actual code)
Approved by: re (kib)
* bridge support (sam)
* handling of errors (sam)
* deletion of inactive routing entries
* more debug msgs (sam)
* fixed some inconsistencies with the spec.
* decap is now specific to mesh (sam)
* print mesh seq. no. on ifconfig list mesh
* small perf. improvements
Reviewed by: sam
Approved by: re (kib)
nor destructors, as there's no actual work to do.
In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.
Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.
Reviewed by: bz
Approved by: re (vimage blanket)
a) nocache-remap problem
When a page is remapped into a non-cacheable virtual memory region there
was no associated write-back invalidate operation performed. We remove
writeback of the original buffer size from bus_dmamem_alloc() and add
appropriate L1/L2 flush operation.
b) missing write-back invalidate operation
In pmap_kremove a page is removed so we must do a write-back
invalidate operation aligned to the page virtual address.
Submitted by: Michal Hajduk
Reviewed by: Mark Tinguely, rpaulo, stas
Approved by: re (kib)
Obtained from: Semihalf
amd64 and i386. Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages. Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache. Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address. For example, these actions are already performed by
pmap_mapdev().
The device pager needn't restore the memory attributes on a fictitious
page before releasing it. It's now pointless.
Add pmap_page_set_memattr() to the Xen pmap.
Approved by: re (kib)
portion of the page that was written. Among other problems, this
page might be picked up by pagedaemon, with failed assertion in
vm_pageout_flush() about validity of the page.
Reported and tested by: pho
Approved by: re (kensmith)
MFC after: 3 weeks
preparation for 8.0-RELEASE. Add the previous version of those
libraries to ObsoleteFiles.inc and bump __FreeBSD_Version.
Reviewed by: kib
Approved by: re (rwatson)
- When the root vnode was acquired during mounting, mnt_stat.f_iosize was
still set to 0, so getnewvnode() would set bo_bsize == 0. This would
confuse getblk(), so that it always returned the first block causing
the problem when the root directory of the mount point was greater
than one block in size. It was fixed by setting mnt_stat.f_iosize to
NFS_DIRBLKSIZ before calling ncl_nget() to acquire the root vnode.
- NFSMNT_INT was being set temporarily while the initial connect to a
server was being done. This erroneously configured the krpc for
interruptible RPCs, which caused problems because signals weren't
being masked off as they would have been for interruptible mounts.
This code was deleted to fix the problem. Since mount_nfs does an
NFS null RPC before the mount system call, connections to the server
should work ok.
Tested by: swell dot k at gmail dot com
Approved by: re (kensmith), kib (mentor)
if _WANT_VNET is defined. This is required so that libkvm can locate
virtual network stack instances in order to reach their global variables
for monitoring and crashdump analysis.
Reviewed by: bz
Approved by: re (kib)
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use). Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.
Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions. Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.
Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.
Update various consumers of these KPIs based on whether they may sleep
or not.
Reviewed by: bz
Approved by: re (kib)
a valid zone ID or interface identifier in a v6 multicast leave, would
trigger a fairly paranoid KASSERT().
Observed with Boost++ regression tests on ref8.freebsd.org.
Approved by: re (kib)
If the access counts were not increased and decreased in equal numbers by
gvinum consumers, the read access count would be inconsistent with the write
access count. Instead, modify the read access count with the write access
count directly to prevent any inconsistencies.
Approved by: re (kib)
configuring machine-dependent memory attributes...":
Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc(). It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.
Eliminate pointless code from pmap_cache_bits() on amd64.
Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.
Approved by: re (kib)
r195704 for the experimental client. The patch avoids calling vn_lock()
for the case where nfs_nget() has acquired the same vnode as dvp,
since nfs_nget() has already locked the vnode.
Reviewed by: kib, jhb
Approved by: re (kensmith), kib (mentor)
contrib/openbsm and a subset also imported into sys/security/audit.
This patch release addresses several minor issues:
- Fixes to AUT_SOCKUNIX token parsing.
- IPv6 support for au_to_me(3).
- Improved robustness in the parsing of audit_control, especially long
flags/naflags strings and whitespace in all fields.
- Add missing conversion of a number of FreeBSD/Mac OS X errnos to/from BSM
error number space.
MFC after: 3 weeks
Obtained from: TrustedBSD Project
Sponsored by: Apple, Inc.
Approved by: re (kib)
since the last imported OpenBSM release:
OpenBSM 1.1p1
- Fixes to AUT_SOCKUNIX token parsing.
- IPv6 support for au_to_me(3).
- Improved robustness in the parsing of audit_control, especially long
flags/naflags strings and whitespace in all fields.
- Add missing conversion of a number of FreeBSD/Mac OS X errnos to/from BSM
error number space.
Obtained from: TrustedBSD Project
Sponsored by: Apple, Inc.
Currently dtrace_gethrtime uses formula similar to the following for
converting TSC ticks to nanoseconds:
rdtsc() * 10^9 / tsc_freq
The dividend overflows 64-bit type and wraps-around every 2^64/10^9 =
18446744073 ticks which is just a few seconds on modern machines.
Now we instead use precalculated scaling factor of
10^9*2^N/tsc_freq < 2^32 and perform TSC value multiplication separately
for each 32-bit half. This allows to avoid overflow of the dividend
described above.
The idea is taken from OpenSolaris.
This has an added feature of always scaling TSC with invariant value
regardless of TSC frequency changes. Thus the timestamps will not be
accurate if TSC actually changes, but they are always proportional to
TSC ticks and thus monotonic. This should be much better than current
formula which produces wildly different non-monotonic results on when
tsc_freq changes.
Also drop write-only 'cp' variable from amd64 dtrace_gethrtime_init()
to make it identical to the i386 twin.
PR: kern/127441
Tested by: Thomas Backman <serenity@exscape.org>
Reviewed by: jhb
Discussed with: current@, bde, gnn
Silence from: jb
Approved by: re (gnn)
MFC after: 1 week
modules was present, which turns out to be false in some situations.
Back out the assertion.
Reported by: Luiz Otavio O Souza <lists.br at gmail.com>,
Florian Smeets <flo at kasimir.com>
Approved by: re (kensmith) (implicit)
"share->excl" panic when doing a lookup of dotdot at the root
of a server's file system. The patch avoids calling vn_lock()
for that case, since nfscl_nget() has already acquired a lock
for the vnode.
Approved by: re (kensmith), kib (mentor)
kernel resources that block other threads, like vnode locks. The SIGSTOP
sent to such thread (process, rather) shall not stop it until thread
releases the resources.
Tested by: pho
Reviewed by: jhb
Approved by: re (kensmith)
PCATCH, to indicate that thread shall not be stopped upon receipt of
SIGSTOP until it reaches the kernel->usermode boundary.
Also change thread_single(SINGLE_NO_EXIT) to only stop threads at
the user boundary unconditionally.
Tested by: pho
Reviewed by: jhb
Approved by: re (kensmith)
process that still need to be suspended or exited from thread_single
into the new function calc_remaining().
Tested by: pho
Reviewed by: jhb
Approved by: re (kensmith)
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
behavior is mandated by POSIX.
- Do not fail requests that pass a length greater than SSIZE_MAX
(such as > 2GB on 32-bit platforms). The 'len' parameter is actually
an unsigned 'size_t' so negative values don't really make sense.
Submitted by: Alexander Best alexbestms at math.uni-muenster.de
Reviewed by: alc
Approved by: re (kib)
MFC after: 1 week
This dramatically pushing 99.9% interpolations and quantizations
error _below_ -180dB on 32bit dynamic range, resulting extremely
high quality conversion.
- Use BSPLINE interpolator for filter oversampling factor greater or
equal than 64 (log2 6).
Approved by: re (kib)
page_list using the matching malloc type for the allocation.
Approved by: re
Reviewed by: scottl [1]
MFC after: 1 week
[1] Original patch was against xpt_cam.c, prior to the cam refactoring.
check missed this because cxgb's TOM is currently commented out of the build
system.
Submitted by: Navdeep Parhar <np at FreeBSD dot org>
Approved by: re (kensmith), kensmith (mentor temporarily unavailable)