objects.
Assert that there is no new waiters for the already terminated objects.
Old waiters should have been notified by the termination calling
vnode_pager_dealloc() (old/new are with regard of the lock acquisition
interval).
Only clear the vp->v_object for the case of already terminated object,
since other branches call vnode_pager_dealloc(), which should clear
the pointer. Assert this.
Tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Remove the use of fdt_data_to_res(), and instead construct the resources
manually. Additionally, avoid the 32-bit size limitation of fdt_data_get(), by
building physical addresses manually from the lbc ranges property.
Approved by: re@(gjb)
related to "shared" CPLs.
a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function. Allow callers to direct the response to any iq. Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.
b) Remove all CPL handler tables from struct adapter. This reduces its
size by around 2KB. All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation. The registration
functions do not need an adapter parameter any more.
c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode. There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL. The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply. Update t4_write_l2e and callers to to support any
wrq/iq combination.
Approved by: re@ (kib@)
Sponsored by: Chelsio Communications
callout thread that can cause a kernel panic. Always do the final cleanup
in the callout thread by passing a separate callout function for that task
to callout_reset_sbt().
Protect the ref_count decrement in the callout with DN_BH_WLOCK(). All
other ref_count manipulation is protected with this lock.
There is still a tiny window between ref_count reaching zero and the end
of the callout function where it is unsafe to unload the module. Fixing
this would require the use of callout_drain(), but this can't be done
because dummynet holds a mutex and callout_drain() might sleep.
Remove the callout_pending(), callout_active(), and callout_deactivate()
calls from calculate_drop_prob(). They are not needed because this callout
uses callout_init_mtx().
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
Approved by: re (gjb)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D6928
calculate appropriate return value for stops. Simplify the code by
using them.
Fix typo in sig_suspend_threads(). The thread which sleep must be
aborted is td2. (*)
In issignal(), when handling stopping signal for thread in
TD_SBDRY_INTR state, do not stop, this is wrong and fires assert.
This is yet another place where execution should be forced out of
SBDRY-protected region. For such case, return -1 from issignal() and
translate it to corresponding error code in sleepq_catch_signals().
Assert that other consumers of cursig() are not affected by the new
return value. (*)
Micro-optimize, mostly VFS and VOP methods, by avoiding calling the
functions when SIGDEFERSTOP_NOP non-change is requested. (**)
Reported and tested by: pho (*)
Requested by: bde (**)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
does it under the vnode interlock, but the interlock is not owned by the
asserting thread. As result, we might read increased use counter but also
still see VI_OWEINACT.
In collaboration with: nwhitehorn
Hardware donated by: IBM LTC
Sponsored by: The FreeBSD Foundation (kib)
Approved by: re (gjb)
the knote is activated immediately. If the exit1() later activates
knotes, such knote is attempted to be activated second time. Detect
the condition by zeroed kn_ptr.p_proc pointer, and avoid excessive
activation.
Before r302235, such knotes were removed from the knlist immediately
upon activation.
Reported by: truckman
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
is still operational before doing any work; otherwise we might
run into, e.g., destroyed locks.
PR: 210724
Reported by: olevole olevole.ru
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Obtained from: projects/vnet
Approved by: re (gjb)
Split initializzation an teardown into module (global state) and VNET
(per virtual network stack) parts. Virtualise global state, which is
not "const".
Cleanup eventhandlers, so that we can make use of the passed in argument
to get the vnet state from the ifp; disable the "cloner" event as it is
too early, has no state, and can fire before initialisation (see comment
in the source).
Handle the dynamic sysctls specially. The problem is that "ipmain"
is the virtualized struct, but the fields used for the sysctls are
hanging off memory allocated and attached to the virtualized "ipmain"
thus standard VNET macros and sysctl handling do not work.
We still say it is VNET sysctls to get the proper protection checks
in the VIMAGE case; to solve the problem of accessing the right bit
of memory hanging of each per-VNET ipmain, we use a dedicated handler
function wrapping around sysctl_ipf_int() undoing the base calculation
from kern_sysctl.c and then adding the passed-in offset into the right
struct depending on handler. A bit of a mess exposing VNET-internals
this way but the only way to keep the code without having to massively
restructure ipf internals.
Approved by: re (hrs)
Sponsored by: The FreeBSD Foundation
Obtained from: projects/vnet
MFC after: 2 weeks
Reviewed by: cy
Differential Revision: https://reviews.freebsd.org/D7000
Those changes were found confusing FreeBSD libc ACL code, that doesn't
differentiate ACL for directories and files, and report ACLs for all
directories created after those patches as non-trivial. On the other
side these changes were considered wrong from POSIX and NFSv4 points of
view. Until further investigation done upstream, revert those changes
locally in preparation for FreeBSD 11.0 release.
Approved by: re (hrs)
for SCTP DATA and I-DATA chunks.
* For fragmented user messages, set the I-Bit only on the last
fragment.
* When using explicit EOR mode, set the I-Bit on the last
fragment, whenever SCTP_SACK_IMMEDIATELY was set in snd_flags
for any of the send() calls.
Approved by: re (hrs)
MFC after: 1 week
This patch offers a workaround to buf_ring reordering
visible on armv7 and armv8. This is supposed to be
removed once new buf_ring implementation is integrated
into the tree.
Obtained from: Semihalf
Reviewed by: alc,emaste
Differential Revision: https://reviews.freebsd.org/D6986
Approved by: re (gjb)
per-VNET initialisation and virtualise the interface cloning to
allow a dedicated ipfw log interface per VNET.
Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
allocations from ipfilter in preparation for VNET support.
Suggested by: cy (see D7000)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Recource management functions in GT PCI controller driver
treated memory/IO resources as KSEG1 addresses, later during
activation these values would be increased by KSEG1 base again
rendering the address invalid and causing "bus error" trap.
Actual logic was converted to use real physical addresses,
so mapping takes place only during activation.
Submitted by: Aleksandr Rybalko <ray@FreeBSD.org>
Approved by: re (gjb)
necessary because CLOOP format lacks explicit EOF or length, so that
in the presence of padding or when the CLOOP is put onto a larger
partition upper level provider size may be larger. Bound amount
of extra data that we might touch to the max length of the compressed
block and detect zero-padding in the last cluster, which when
sector is all-zero might cause us to emit bogus I/O error after
decompression of that fails. To not make code any more complicated
that it needs to be deal with it in lazy-manner, i.e. when we
first access that specific cluster.
This change also fixes stupid mistake in the LZMA code, inherited
from geom_lzma, which does not share length of the output buffer
buffer with the decompression routine, so that in the presence
of corrupted or purposedly tailored data may easily cause heap
overflow and kernel memory corruption.
Beef up validation of the CLOOP TOC by checking that lengths of
all but the last compressed clusters match upper limit set by
the decompressor and improve some error diagnostic output while
I am here.
2.Add kern.geom.uzip.attach_to tunable to artifically limit
attaching uzip to certain devices in the dev tree only.
For example the following only makes us attaching to the
GPT labels:
kern.geom.uzip.attach_to="gpt/*"
3.Add kern.geom.uzip.noattach_to, which does opposite to the (2)
above, i.e. prevents geom_uzip from tasting / attaching to
providers matching some pattern. By default we don't attach
to our own kind, i.e. kern.geom.uzip.noattach_to="*.uzip".
It saves us quite some CPU cycles, esp on low-end embedded
systems.
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7013
Add new lock for stageq (part of ieee80211_superg structure) and
ni_tx_superg (part of ieee80211_node structure);
drop com_lock protection where it is used to protect them.
While here, drop duplicate OPACKETS counter incrementation.
ni_tx_ampdu is not protected with it (however, it is also used without
locking in other places; probably, it requires some other solution
to be thread-safe).
Tested with RTL8188CUS (AP) and RTL8188EU (STA).
NOTE: Since this change breaks KBI, all wireless drivers need to be
recompiled.
Reviewed by: adrian
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D6958
late boot: enable it explicitly after installing the page tables. If booting
from an FDT, also make sure to escape the firmware's MMU context early
before overwriting firmware page tables.
Approved by: re (gjb)
Prior to this change ZFS ARC min / max could only be changed using
boot time tunables, this allows the values to be tuned at runtime
using the sysctls:
* vfs.zfs.arc_max
* vfs.zfs.arc_min
When adjusting ZFS ARC minimum the memory used will only reduce
to the new minimum given memory pressure.
Reviewed by: allanjude
Approved by: re (gjb)
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D5907
The interface's queues are functional after VI_INIT_DONE (which is short
of interface-up) and that's all that's needed for t4_tom to communicate
with the chip.
Approved by: re@ (gjb@)
Sponsored by: Chelsio Communications
will cal if_free() in case of conflict, error, ..
if_free() however sets the VNET instance from the ifp->if_vnet which
was not yet initialized but would only in if_attach(). Fix this by
setting the curvnet from where we allocate the interface in if_alloc().
if_attach() will later overwrite this as needed. We do not set the home_vnet
early on as we only want to prevent the if_free() panic but not change any
of the other housekeeping, e.g., triggered through ifioctl()s.
Reviewed by: brooks
Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7010
particular, the Giant is supposed to protect against parallel
ntp_adjtime(2) invocations. But, for instance, sys_ntp_adjtime() does
copyout(9) under Giant and then examines time_status to return syscall
result. Since copyout(9) could sleep, the syscall result might be
inconsistent.
Another and more important issue is that if PPS is configured,
hardpps(9) is executed without any protection against the parallel
top-level code invocation. Potentially, this may result in the
inconsistent state of the ntptime state variables, but I cannot say
how serious such distortion is. The non-functional splclock() call in
sys_ntp_adjtime() protected against clock interrupts calling hardpps()
in the pre-SMP era.
Modernize the locking. A mutex protects ntptime data. Due to the
hardpps() KPI legitimately serving from the interrupt filters (and
e.g. uart(4) does call it from filter), the lock cannot be sleepable
mutex if PPS_SYNC is defined. Otherwise, use normal sleepable mutex
to reduce interrupt latency.
Reviewed by: imp, jhb
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6825
private mtx in resettodr(), no implementation of CLOCK_SETTIME() is
allowed to sleep.
Reviewed by: imp, jhb
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
X-Differential revision: https://reviews.freebsd.org/D6825
TDF_SEINTR flags values, unlike TDF_SBDRY, must be treated almost as
if TDF_SBDRY is not set for STOP signal delivery. The only difference
is that sig_suspend_threads() should abort the sleep instead of doing
immediate suspension.
Reported by: ngie
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Approved by: re (gjb)
structure, change it to int.
The real fix is to sanitize user-visible definitions in sys/event.h,
e.g. the affected struct knlist is of no use for userspace programs.
Reported and tested by: jkim
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
sleepable scan, iteration over the shadow chain looking for a page
could find an OBJ_DEAD object. Such state of the mapping is only
transient, the dead object will be terminated and removed from the
chain shortly. We must not return KERN_PROTECTION_FAILURE unless the
object type is changed to OBJT_DEAD in the chain, indicating that
paging on this address is really impossible. Returning
KERN_PROTECTION_FAILURE prematurely causes spurious SIGSEGV delivered
to processes, or kernel accesses to UVA spuriously failing with
EFAULT.
If the object with OBJ_DEAD flag is found, only return
KERN_PROTECTION_FAILURE when object type is already OBJT_DEAD.
Otherwise, sleep a tick and retry the fault handling.
Ideally, we would wait until the OBJ_DEAD flag is resolved, e.g. by
waiting until the paging on this object is finished. But to do so, we
need to reference the dead object, while vm_object_collapse() insists
on owning the final reference on the collapsed object. This could be
fixed by e.g. changing the assert to shared reference release between
vm_fault() and vm_object_collapse(), but it seems to be too much
complications for rare boundary condition.
PR: 204426
Tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
X-Differential revision: https://reviews.freebsd.org/D6085
MFC after: 2 weeks
Approved by: re (gjb)
exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:
- knote kn_knlist pointer is reset
- INFLUX knote is removed from the process knlist.
And, there are two consequences:
- KN_LIST_UNLOCK() on such knote is nop
- there is nothing which would block exit1() from processing past the
knlist_destroy() (and knlist_destroy() resets knlist lock pointers).
Both consequences result either in leaked process lock, or
dereferencing NULL function pointers for locking.
Handle this by stopping embedding the process knlist into struct proc.
Instead, the knlist is allocated together with struct proc, but marked
as autodestroy on the zombie reap, by knlist_detach() function. The
knlist is freed when last kevent is removed from the list, in
particular, at the zombie reap time if the list is empty. As result,
the knlist_remove_inevent() is no longer needed and removed.
Other changes:
In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events
from kn_sfflags for knote registered by kernel to only get NOTE_CHILD
notifications. The flags leak resulted in excessive
NOTE_EXEC/NOTE_FORK reports.
Fix immediate note activation in filt_procattach(). Condition should
be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT
report for the exiting process.
In knote_fork(), do not perform racy check for KN_INFLUX before kq
lock is taken. Besides being racy, it did not accounted for notes
just added by scan (KN_SCAN).
Some minor and incomplete style fixes.
Analyzed and tested by: Eric Badger <eric@badgerio.us>
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6859
interrupt sleeps with the ERESTART on the suspension attempts.
Otherwise, single-threading requests are deferred until the locks are
granted for NFS files, which causes hangs.
When retrying local registration of the remotely-granted adv lock,
allow full suspension and check for suspension, for usual reasons.
Reported by: markj, pho
Reviewed by: jilles
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
framework allowing to set the suspension policy for the dynamic block.
Extend the currently possible policies of stopping on interruptible
sleeps and ignoring such sleeps by two more: do not suspend at
interruptible sleeps, but interrupt them with either EINTR or ERESTART.
Reviewed by: jilles
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Most of the effect of setting MSR[SF] is that the CPU will stop ignoring
the high 32 bits of registers containing addresses in load/store
instructions. As such, the kernel was setting it only when it began to
need access to high memory. MSR[SF] also affects the operation of some
conditional instructions, however, and so setting it at late times could
subtly break code at very early times. This fixes use of the FDT mode in
loader, and FDT boot more generally, on 64-bit PowerPC systems.
Hardware provided by: IBM LTC
Approved by: re (kib)
[1] Remove unneeded sockaddr conversion before kern_recvit() call as the from
argument is used to record result (the source address of the received message) only.
[2] In Linux the type of msg_namelen member of struct msghdr is signed but native
msg_namelen has a unsigned type (socklen_t). So use the proper storage to fetch fromlen
from userspace and than check the user supplied value and return EINVAL if it is less
than 0 as a Linux do.
Reported by: Thomas Mueller <tmueller at sysgo dot com> [1]
Reviewed by: kib@
Approved by: re (gjb, kib)
MFC after: 3 days
for messages which have been put on the send queue:
* Do not report any DATA or I-DATA chunk padding.
* Correctly deal with the I-DATA chunk header instead of the DATA
chunk header when the I-DATA extension is used.
Approved by: re (kib)
MFC after: 1 week