r256543:
Add fasttrap for PowerPC. This is the last piece of the DTrace/ppc puzzle.
It's incomplete, it doesn't contain full instruction emulation, but it should be
sufficient for most cases.
r259245,r259421: (FBT)
FBT now does work fully on PowerPC.
Save r3 before using it for the trap check, else we end up saving the new r3,
containing the trap instruction encoding (0x7c810808), and restoring it back
with the frame on return. This caused it to panic on my ppc32 machine.
r259668,r259674:
Fix a typo in the FBT code.
r259394:
Rebase the PMC indices at 1, since PMC_SOFT is at 0.
r259395,r259699:
Add userland PMC backtracing, and use the PMC trapframe macros for kernel
backtraces.
ext2fs: fix inode flag conversion.
After r252890 we are naively attempting to pass through the
inode flags. This is technically incorrect as the ext2
inode flags don't match the UFS/system values used in
FreeBSD and a clean conversion is needed.
Some filtering was left in place so the change didn't cause
significant changes in FreeBSD but some of the garbage passed
is likely to be the cause for warning messages in linux.
Fix the issue by resetting the flags before conversion as was
done previously. This also means we will not pass the EXT4_*
inode flags into FreeBSD's inode.
PR: kern/185448
Take additional reference on SCSI probe periph to cover its freeze count.
Otherwise periph may be invalidated and freed before single-stepping freeze
is dropped, causing use after free panic.
MFV r258373:
4168 ztest assertion failure in dbuf_undirty
4169 verbatim import causes zdb to segfa
4170 zhack leaves pool in ACTIVE state
illumos/illumos-gate@7fdd916c47
Fix a braino with r259730: we cannot currently use CFLAGS.gcc or
CFLAGS.clang in sys/conf/Makefile.arm, since the main kernel build does
not use <bsd.sys.mk>. So revert that particular change for now.
Pointy hat to: me
Noticed by: zbb
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
When a da or ada device dissappears, outstanding IOs fail with
ENXIO, not EIO. The check for EIO was probably copied from Illumos,
where that is indeed the correct errno.
Without this change, pulling a busy drive from a zpool would usually
turn it into UNAVAIL, even though pulling an idle drive would turn
it into REMOVED. With this change, it is REMOVED every time.
Also, vdev_geom_io_intr shouldn't do zfs_post_remove, because that
results in devd getting two resource.fs.zfs.removed events. The
comment said that the event had to be sent directly instead of
through the async removal thread because "the DE engine is using
this information to discard prevoius I/O errors". However, the fact
that vdev_geom_io_intr was never actually sending the events until
now, and that vdev_geom_orphan never sent them at all, and that
vdev_geom_orphan usually gets called about 2 seconds after the
actual removal, means that FreeBSD's userland can cope with a late
event just fine.
Use an RLOCK here instead of an RWLOCK - matching all the other calls
to lla_lookup().
This drastically reduces the very high lock contention when doing parallel
TCP throughput tests (> 1024 sockets) with IPv6.
MFC r260187:
lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.
MFC r260217:
Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.
Prevent users from deactivating the last component of a mirror.
MFC r259929:
Add an ability to stop gmirror and clear its metadata in one command.
This fixes the problem, when gmirror starts again just after stop.
The problem occurs when gmirror's component has geom label with equal size.
E.g. gpt and gptid have the same size as partition, diskid has the same
size as entire disk. When gmirror's geom has been destroyed, glabel
creates its providers and this initiate retaste.
Now "gmirror destroy" command is available. It destroys geom and also
erases gmirror's metadata.
PR: 184985
Add "resize" verb to gmirror(8) and such functionality to geom_mirror(4).
Now it is easy to expand the size of the mirror when all its components
are replaced. Also add g_resize method to geom_mirror class. It will write
updated metadata to new last sector, when parent provider is resized.
Split the last gcc-specific flags off into CFLAGS.gcc. This also
removes the need to use -Qunused-arguments for clang throughout the
tree.
MFC r260369:
Apply band-aid for 32-bit compat libs failures after r260334: put back
-Qunused-arguments for clang for now, until I can figure out a way to
make it unneeded in all scenarios. Sorry about the breakage.
Similar to r260020, only use -fms-extensions with gcc, for all other
modules which require this flag to compile. Use a GCC_MS_EXTENSIONS
variable, defined in kern.pre.mk, which can be used to easily supply the
flag (or not), depending on the compiler type.
MFC r260322:
In addition to r260102, also define GCC_MS_EXTENSIONS in bsd.sys.mk,
since kernel module builds do not use kern.pre.mk.
Add an OFW SPI compatible bus. Fix the spibus probe to return
BUS_PROBE_GENERIC and not BUS_PROBE_SPECIFIC (0) so the OFW SPI bus can
attach when enabled. Export the spibus devclass_t and driver_t
declarations.
Submitted by: ray
Approved by: adrian (mentor)
Implement automatic live resize support for GEOM MULTIPATH class.
In "manual" mode just automatically resize provider in any direction.
In "automatic" mode allow growth (with new metadata write); in case of
shrinking check if there is already valid metadata found at the new
location. This should allow easy transparent recovery if first resize
was done by mistake.
While there, unify metadata write code and fix minor memory leak.
Do not DELAY() for P-state transition unless we want to see the result.
Intel manual says: "If a transition is already in progress, transition to
a new value will subsequently take effect. Reads of IA32_PERF_CTL determine
the last targeted operating point." So seems it should be fine to just
trigger wanted transition and go. Linux does the same.
Update the description for pmap_remove_pages() to match the modern
times. Assert that the pmap passed to pmap_remove_pages() is only
active on current CPU.
Add a new sysctl / loader tunable kern.panic_reboot_wait_time which
defaults to PANIC_REBOOT_WAIT_TIME (a long-existing kernel config
setting). Use this now-variable value in place of the defined constant
to control how long the system waits after a panic before rebooting.
Bring back the old size of the kinfo_file structure to preserve ABI.
Keep only one uint64_t spare for further cap_rights_t expension.
Add a comment clarifying that if the size of this structure changes,
a new sysctl MIB has to be allocate for it and the old structure has
to be returned by the old sysctl MIB.
Requested by: re
Don't check for fd limits in fdgrowtable_exp.
Callers do that already and additional check races with process
decreasing limits and can result in not growing the table at all, which
is currently not handled.
locking support for CAM
r256826:
Fix several target mode SIMs to not blindly clear ccb_h.flags field of
ATIO CCBs. Not all CCB flags there belong to them.
r256836:
Remove hard limit on number of BIOs handled with one ATA TRIM request.
r256843:
Merge CAM locking changes from the projects/camlock branch to radically
reduce lock congestion and improve SMP scalability of the SCSI/ATA stack,
preparing the ground for the coming next GEOM direct dispatch support.
r256888:
Unconditionally acquire periph reference on CCB allocation failure.
r256895:
Fix memory and references leak due to unfreed path.
r256960:
Move CAM_UNQUEUED_INDEX setting to the last moment and under the periph lock.
This fixes race condition with cam_periph_ccbwait(), causing use-after-free.
r256975:
Minor (mostly cosmetical) addition to r256960.
r257054:
Some microoptimizations for da and ada drivers:
- Replace ordered_tag_count counter with single flag;
- From da remove outstanding_cmds counter, duplicating pending_ccbs list;
- From da_softc remove unused links field.
r257482:
Fix lock recursion, triggered by `smartctl -a /dev/adaX`.
r257501:
Make getenv_*() functions and respectively TUNABLE_*_FETCH() macros not
allocate memory and so not require sleepable environment. getenv() has
already used on-stack temporary storage, so just use it more rationally.
getenv_string() receives buffer as argument, so don't need another one.
r257914:
Some CAM locks polishing:
- Fix LOR and possible lock recursion when handling high-power commands.
Introduce new lock to protect left power quota and list of frozen devices.
- Correct locking around xpt periph creation.
- Remove seems never used XPT_FLAG_OPEN xpt periph flag.
Again, Netflix assisted with testing the merge, but all of the credit goes
to Alexander and iX Systems.
Submitted by: mav
Sponsored by: iX Systems
r256603:
Introduce new function devstat_end_transaction_bio_bt(), adding new argument
to specify present time. Use this function to move binuptime() out of lock,
substantially reducing lock congestion when slow timecounter is used.
r256606:
Move g_io_deliver() out of the lock, as required for direct dispatch.
Move g_destroy_bio() out too to reduce lock scope even more.
r256607:
Fix passing uninitialized bio_resid argument to g_trace().
r256610:
Add unmapped I/O support to GEOM RAID.
r256830:
Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodonne() when unmapping
temporary mapped buffer. That fixes double unmap if biodone() called twice
for the same BIO (but with different done methods).
r256880:
Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.
r259247:
Fix bug introduced at r256607. We have to recalculate bp_resid here since
sizes of original and completed requests may differ due to end of media.
Testing of the stable/10 merge was done by Netflix, but all of the credit
goes to Alexander and iX Systems.
Submitted by: mav
Sponsored by: iX Systems
- Take BIO lock in biodone() only when there is no completion callback set
and so we should wake up thread waiting in biowait().
- Remove msleep() timeout from biowait(). It was added 11 years ago, when
there was no locks used, and it should not be needed any more.
Handle case when ACPI reports HPET device, but does not provide memory
resource for it. In such case take the address range from the HPET table.
This fixes hpet(4) driver attach on Asrock C2750D4I board.
Use relaxed (write-only) memory barriers when writing some of queue index
registers (for now on ISP2400+). We never read those registers back and
AFAIK their semantics does not require any immediate reaction on write.
Some more registers access optimizations:
- Process ATIO queue only if interrupt status tells so;
- Do not update queue out pointers after each processed command, do it
only once at the end of the loop.
Save one more register read per command by not reading rqstoutrp register
every time. The purpose of that register is unlikely output queue overflow
detection, so read it only when its last known (and probably stale now)
value signals overflow.
Optimize isp(4) to reduce CPU usage, especially in target mode:
- Remove two excessive and slow register reads from isp_intr(). Instead
of rereading value every time, assume that registers contain what we have
written there.
- Avoid sequential search through 4096 array elements when looking for
command tag. Use hash of lists to store active tags separately from free
ones and so greatly speedup the searches.
Don't even try to read vdev labels from devices smaller then SPA_MINDEVSIZE
(64MB). Even if we would find one somehow, ZFS kernel code rejects such
devices. It is funny to look on attempts to read 4 256K vdev labels from
1.44MB floppy, though it is not very practical and quite slow.
Reenable vfs.zfs.zio.use_uma for amd64, disabled at r209261.
On machines with seveal CPUs and enough RAM this can easily twice improve
ZFS performance or twice reduce CPU usage. It was disabled three years
ago due to memory and KVA exhaustion reports, but our VM subsystem got
improved a lot since that time, hopefully enough to make another try.
Introduce allocation cache to store LZ4 compression contexts without kicking
VM subsystem twice for every written record.
Tests on 24-core system show double reduction of CPU time spent on copying
single large well-compressed file.
This patch is not really needed on illumos (while not harm either) since
their memory allocator by default uses caching for all requests up to 128K.
Make UMA to not blindly force offpage slab header allocation for large
(> PAGE_SIZE) zones. If zone is not multiple to PAGE_SIZE, there may
be enough space for the header at the last page, so we may avoid extra
header memory allocation and hash table update/lookup.
ZFS creates bunch of odd-sized UMA zones (5120, 6144, 7168, 10240, 14336).
This change gives good use to at least some of otherwise lost memory there.
Don't count bucket allocation failures for UMA zones as their own failures.
There are good reasons for this to happen, such as recursion prevention, etc.
and they are not fatal since buckets are just an optimization mechanism.
Real bucket allocation failures are any way counted by the bucket zones
themselves, and we don't need double accounting there.
Implement mechanism to safely but slowly purge UMA per-CPU caches.
This is a last resort for very low memory condition in case other measures
to free memory were ineffective. Sequentially cycle through all CPUs and
extract per-CPU cache buckets into zone cache from where they can be freed.
Grow UMA zone bucket size also on lock congestion during item free.
Lock congestion is the same, whether it happens on alloc or free, so
handle it equally. Now that we have back pressure, there is no problem
to grow buckets a bit faster. Any way growth is much slower then in 9.x.
Add two new UMA bucket zones to store 3 and 9 items per bucket.
These new buckets make bucket size self-tuning more soft and precise.
Without them there are buckets for 1, 5, 13, 29, ... items. While at
bigger sizes difference about 2x is fine, at smallest ones it is 5x and
2.6x respectively. New buckets make that line look like 1, 3, 5, 9, 13,
29, reducing jumps between steps, making algorithm work softer, allocating
and freeing memory in better fitting chunks. Otherwise there is quite a
big gap between allocating 128K and 5x128K of RAM at once.
Implement soft pressure on UMA cache bucket sizes.
Every time system detects low memory condition decrease bucket sizes for
each zone by one item. As result, higher memory pressure will push to
smaller bucket sizes and so smaller per-CPU caches and so more efficient
memory use.
Before this change there was no force to oppose buckets growth as result
of practically inevitable zone lock conflicts, and after some run time
per-CPU caches could consume enough RAM to kill the system.
Create own free list for each of the first 32 possible allocation sizes.
In case of 4K allocation quantum that means for allocations up to 128K.
With growth of memory fragmentation these lists may grow to quite a large
sizes (tenths and hundreds of thousands items). Having in one list items
of different sizes in worst case may require full linear list traversal,
that may be very expensive. Having lists for items of single size means
that unless user specify some alignment or border requirements (that are
very rare cases) first item found on the list should satisfy the request.
While running SPEC NFS benchmark on top of ZFS on 24-core machine with
84GB RAM this change reduces CPU time spent in vmem_xalloc() from 8%
and lock congestion spinning around it from 20% to invisible levels.
And that all is by the cost of just 26 more pointers per vmem instance.
If at some point our kernel will start to actively use KVA allocations
with odd sizes above 128K, something may need to be done to bigger lists
also.
For sys/boot/i386 and sys/boot/pc98, separate flags to be passed
directly to the linker (LD_FLAGS) from flags passed indirectly, via the
compiler driver (LDFLAGS).
This is because several Makefiles under sys/boot/i386 and sys/boot/pc98
use ${LD} directly to link, and the normal LDFLAGS value should not be
used in these cases.
In sys/dev/scc, remove unused static function scc_setmreg(). While
here, invoke scc_getmreg() in two more places where it can be used.
Reviewed by: marcel
Fix bug introduced at r252226, when udata argument passed to bucket_alloc()
was used without making sure first that it was really passed for us.
On some of my systems this bug made user argument passed by ZFS code to
uma_zalloc_arg() unexpectedly block UMA per-CPU caches for those zones.
In sys/dev/mcd/mcd.c, mark the static const COPYRIGHT string as __used,
so it ends up in the object file, and no warnings are emitted about it
being actually unused.
For sys/dev/drm2/radeon, only use -fms-extensions with gcc. This flag
is only to stop gcc complaining about anonymous unions, which clang does
not do. For clang 3.4 however, -fms-extensions enables the Microsoft
__wchar_t type, which clashes with our own types.h.
MFC r260102:
Similar to r260020, only use -fms-extensions with gcc, for all other
modules which require this flag to compile. Use a GCC_MS_EXTENSIONS
variable, defined in kern.pre.mk, which can be used to easily supply the
flag (or not), depending on the compiler type.
Changes:
- Reinit uio_resid and flags before every call to soreceive().
- Set maximum acceptable size of packet to IP_MAXPACKET. As for now the
module doesn't support INET6.
- Properly handle MSG_TRUNC return from soreceive().
PR: 184601
Multi-queue NIC drivers and multi-port lagg tend to use the same lower
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases. Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggrigation in lagg and lacp. By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.
Obtained from: Netflix
Per POSIX, si_status should contain the value passed to exit() for
si_code==CLD_EXITED and the signal number for other si_code. This was
incorrect for CLD_EXITED and CLD_DUMPED.
This is still not fully POSIX-compliant (Austin group issue #594 says that
the full value passed to exit() shall be returned via si_status, not just
the low 8 bits) but is sufficient for a si_status-related test in libnih
(upstart, Debian/kFreeBSD).
PR: kern/184002
The NFSv4 server would call VOP_SETATTR() with a shared locked vnode
when a Getattr for a file is done by a client other than the one that
holds the file's delegation. This would only happen when delegations
are enabled and the problem is fixed by this patch.
An intermittent problem with NFSv4 exporting of ZFS snapshots was
reported to the freebsd-fs mailing list. I believe the problem was
caused by the Readdir operation using VFS_VGET() for a snapshot file entry
instead of VOP_LOOKUP(). This would not occur for NFSv3, since it
will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning
of the directory, whereas NFSv4 does not check "." or "..". This
patch adds a call to VFS_VGET() for the directory being read to check
for ENOTSUPP.
I also observed that the mount_on_fileid and fsid attributes were
not correct at the snapshot's auto mountpoints when looking at packet
traces for the Readdir. This patch fixes the attributes by doing a check
for different v_mount structure, even if the vnode v_mountedhere is not
set.
The NFSv4 client was passing both the p and cred arguments to
nfsv4_fillattr() as NULLs for the Getattr callback. This caused
nfsv4_fillattr() to not fill in the Change attribute for the reply.
I believe this was a violation of the RFC, but had little effect on
server behaviour. This patch passes a non-NULL p argument to fix this.
The NFSv4.1 client didn't return NFSv4.1 specific error codes
for the Getattr and Recall callbacks. This patch fixes it.
Since the NFSv4.1 specific error codes would only happen for
abnormal circumstances, this patch has little effect, in practice.
For software builds, the NFS client does many small
synchronous (with FILE_SYNC) writes because non-contiguous
byte ranges in the same buffer cache block are being
written. This patch adds a new mount option "noncontigwr"
which allows the non-contiguous byte ranges to be combined,
with the dirty byte range becoming the superset of the bytes
that are dirty, if the file has not been file locked.
This reduces the number of writes significantly for software
builds. The only case where this change might break existing
applications is where an application is writing
non-overlapping byte ranges within the same buffer cache block
of a file from multiple clients concurrently.
Since such an application would normally do file locking on
the file, avoiding the byte range merge for files that have
been file locked should be sufficient for most (maybe all?) cases.
Fix this build for clang.
MFC r259730:
To avoid having to explicitly test COMPILER_TYPE for setting
clang-specific or gcc-specific flags, introduce the following new
variables for use in Makefiles:
CFLAGS.clang
CFLAGS.gcc
CXXFLAGS.clang
CXXFLAGS.gcc
In bsd.sys.mk, these get appended to the regular CFLAGS or CXXFLAGS for
the right compiler.
MFC r259913:
For libstand and sys/boot, split off gcc-only flags into CFLAGS.gcc.
MFC r259927:
Fix pc98 build, by also forcing COMPILER_TYPE in sys/boot/pc98/boot2's
Makefile.
Pointy hat to: dim
This set of changes puts in place the infrastructure to allow soft
updates to be multi-threaded. It introduces no functional changes
from its current operation.
MFC of 256860:
Allow kernels without options SOFTUPDATES to build. This should fix the
embedded tinderboxes.
Reviewed by: emaste
MFC of 256845:
Fix build problem on ARM (which defaults to building without soft updates).
Reported by: Tinderbox
Sponsored by: Netflix
MFC of 256817:
Restructuring of the soft updates code to set it up so that the
single kernel-wide soft update lock can be replaced with a
per-filesystem soft-updates lock. This per-filesystem lock will
allow each filesystem to have its own soft-updates flushing thread
rather than being limited to a single soft-updates flushing thread
for the entire kernel.
Move soft update variables out of the ufsmount structure and into
their own mount_softdeps structure referenced by ufsmount field
um_softdep. Eventually the per-filesystem lock will be in this
structure. For now there is simply a pointer to the kernel-wide
soft updates lock.
Change all instances of ACQUIRE_LOCK and FREE_LOCK to pass the lock
pointer in the mount_softdeps structure instead of a pointer to the
kernel-wide soft-updates lock.
Replace the five hash tables used by soft updates with per-filesystem
copies of these tables allocated in the mount_softdeps structure.
Several functions that flush dependencies when too many are allocated
in the kernel used to operate across all filesystems. They are now
parameterized to flush dependencies from a specified filesystem.
For now, we stick with the round-robin flushing strategy when the
kernel as a whole has too many dependencies allocated.
While there are many lines of changes, there should be no functional
change in the operation of soft updates.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256812:
Fourth of several cleanups to soft dependency implementation.
Add KASSERTS that soft dependency functions only get called
for filesystems running with soft dependencies. Calling these
functions when soft updates are not compiled into the system
become panic's.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256808:
Third of several cleanups to soft dependency implementation.
Ensure that softdep_unmount() and softdep_setup_sbupdate()
only get called for filesystems running with soft dependencies.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256803:
Second of several cleanups to soft dependency implementation.
Delete two unused functions in ffs_sofdep.c.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256801:
First of several cleanups to soft dependency implementation.
Convert three functions exported from ffs_softdep.c to static
functions as they are not used outside of ffs_softdep.c.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
In the flowtable scanner, restart the scan at the last found position,
not at position 0. Changes the scanner from O(N^2) to O(N).
Reviewed by: emax
Obtained from: Netflix
We needlessly panic when trying to flush MKDIR_PARENT dependencies.
We had previously tried to flush all MKDIR_PARENT dependencies (and
all the NEWBLOCK pagedeps) by calling ffs_update(). However this will
only resolve these dependencies in direct blocks. So very large
directories with MKDIR_PARENT dependencies in indirect blocks had
not yet gotten flushed. As the directory is in the midst of doing a
complete sync, we simply defer the checking of the MKDIR_PARENT
dependencies until the indirect blocks have been sync'ed.
Reported by: Shawn Wallbridge of imaginaryforces.com
Tested by: John-Mark Gurney <jmg@funkthat.com>
PR: 183424
In sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c, remove static functions
mk_cpl_barrier_ulp(), mk_get_tcb_ulp() and mk_set_tcb_field_ulp(), which
are all unused since r237263.
In sys/vm/vm_pageout.c, since vm_pageout_worker() takes a void * as
argument, cast the incoming 0 argument to void *, to silence a warning
from clang 3.4 ("expression which evaluates to zero treated as a null
pointer constant of type 'void *' [-Wnon-literal-null-conversion]").
Remove some unused static const strings under sys/rpc, which have never
been used since the initial commit (r177633).
MFC r259843:
Move a static const variable to the #if 0 part where it is only used.
(Note the #if 0 part has been inactive since the initial commit,
r177633, so maybe it should be removed altogether).
In sys/netinet6/in6_mcast.c, in6m_is_ifp_detached() is only used
whenever KTR is defined, so put it between #ifdef KTR guards. This
avoids a warning about a unused function if KTR is not enabled.
In sys/netinet/in_mcast.c, inm_is_ifp_detached() is only used whenever
KTR is defined, so put it between #ifdef KTR guards. This avoids a
warning about a unused function if KTR is not enabled.
Add an FDT DTS and MDROOT kernel configuration for BERI on NetFPGA.
At this point we only support one CPU, the PIC, and a UART console.
Sponsored by: DARPA, AFRL
Fix the processor table entry structure to use a fixed-width type for
32-bit fields so it is the correct size on amd64. Remove a workaround
for the broken structure from bhyve(8).
Fix an off-by-one error in r228960. The maximum priority delta provided
by SCHED_PRI_TICKS should be SCHED_PRI_RANGE - 1 so that the resulting
priority value (before nice adjustment) is between SCHED_PRI_MIN and
SCHED_PRI_MAX, inclusive.
If vn_open_vnode() succeeded in opening the vnode, but subsequent
advisory lock cannot be obtained, prevent double-close of the vnode in
vn_close() called from the fdrop(), by resetting file' f_ops methods.
Do not create a hardware IPv6 server if the listen address is not
in6addr_any and is not in the CLIP table either. This fixes a reported
TOE+IPv6 NULL-dereference panic in do_pass_open_rpl().
While here, stop creating hardware servers for any loopback address.
It's just a waste of server tids.
Plumb the cn_grab and cn_ungrab routines down into the uart
clients. Mask RX interrupts while grabbed on the atmel serial
driver. This UART interrupts every character. When interrupts are
enabled at the mountroot> prompt, this means the ISR eats the
characters. Rather than try to create a cooperative buffering system
for the low level kernel console, instead just mask out the ISR. For
NS8250 and decsendents this isn't needed, since interrupts only happen
after 14 or more characters (depending on the fifo settings). Plumb
such that these are optional so there's no change in behavior for all
the other UART clients. ddb worked on this platform because all
interrupts were disabled while it was running, so this problem wasn't
noticed. The mountroot> issue has been around for a very very long
time.
Approved by: re@ (gjb@)
drm: Lower priority of "EDID checksum is invalid" message
The priority goes from "error" to "debug".
Connectors are polled every 10 seconds. Reading EDID is part of this
polling. However, when an invalid EDID is returned, this error message
is logged. When using Newcons for instance, having a kernel message
every 10 seconds is getting annoying.
Now that it's a debug message, it'll be logged only if hw.dri.debug is
enabled. This fix console spamming for some users.
Tested by: Larry Rosenman <ler@lerctr.org>
drm/ttm, drm/radeon: Replace EINTR/ERESTART by ERESTARTSYS...
... for msleep/cv_*wait() return values, where wait_event*() is used
on Linux. ERESTARTSYS is the return code expected by callers when the
operation was interrupted.
For instance, this is the case of radeon_cs_ioctl() (radeon_cs.c): if
an error occurs, and the code isn't ERESTARTSYS (eg. EINTR), it logs an
error.
Note that ERESTARTSYS is defined as ERESTART, but this keeps callers'
code close to Linux.
Submitted by: avg@ (previous version)
vga_pci: Improve boot display detection
The previous code was checking the "VGA Enable" bit on the video card's
parent PCI-to-PCI bridge only. This didn't work for the case where the
video card is attached to the root PCI bus (ie. the card has no parent
PCI-to-PCI bridge).
Now, the new code:
1. checks the "VGA Enable" bit on the parent bridge only if it's a
PCI-to-PCI bridge;
2. always checks the "I/O" and "Memory address space decoding" bits
on the video card itself.
However, vendor-specific bits are not used.
This fixes the use of many integrated Radeon cards: without this patch,
we fail to detect them as the boot display and, when radeonkms looks for
the Video BIOS, it skips the shadow copy made by the System BIOS. It
then fails to fully initialize the card, because the shadow copy is the
only way to read the Video BIOS in these situations. A workaround was to
force the boot display selection using the "hw.pci.default_vgapci_unit"
tunable.
A previous version of this patch added a new function doing the checks.
Now, the vga_pci_is_boot_display() function is used to perform the
checks (only until the boot display is found) and return if the given
device is the boot display or not.
Furthermore, vga_pci_attach() logs "Boot video device" if the card being
attached it the Chosen One:
vgapci0: <VGA-compatible display> [...]
vgapci0: Boot video device
Reviewed by: kib@, jhb@ (both a previous version)
Tested by: lunatic_ (#freebsd-xorg, integrated Radeon card,
xmj (#freebsd-xorg, i915+NVIDIA cards)
When comparing device IDs, make sure that they have the same type
(like NAA assigned) and identify the same entity (like device or port).
Otherwise there can be false positives since at least some models of
Seagate disks use same IDs for the whole device and one of its ports.
MFC r257251:
Import the driver for VT-d DMAR hardware. Implement the busdma(9) using DMARs.
MFC r257512:
Add support for queued invalidation.
MFC miscellaneous follow-ups to r257251.
MFC r257266:
Remove redundand assignment to error variable and check for its value.
MFC r257308:
Remove redundand declaration.
MFC r257511:
Return BUS_PROBE_NOWILDCARD from the DMAR probe method.
MFC r257860,r257896,r257900,r257902,r257903 (by dim):
Fixes for gcc compilation.
Add check for buflen overflow by comparing the buflen with both offset
and resid.
MFC r258397:
Redo r258088 to avoid relying on signed arithmetic overflow.
Avoid overflow for the page counts.
MFC r258365:
Revert back to use int for the page counts.
Rearrange the checks to correctly handle overflowing address arithmetic.
opensolaris/uts/common/dtrace/fasttrap.c
Fix several problems that can cause panics on kldload and kldunload.
* kproc_create(fasttrap_pid_cleanup_cb, ...) gets called before
fasttrap_provs.fth_table gets allocated. This can lead to a panic
on module load, because fasttrap_pid_cleanup_cb references
fasttrap_provs.fth_table. Move kproc_create down after the point
that fasttrap_provs.fth_table gets allocated, and modify the error
handling accordingly.
* dtrace_fasttrap_{fork,exec,exit} weren't getting NULLed until
after fasttrap_provs.fth_table got freed. That caused panics on
module unload because fasttrap_exec_exit calls
fasttrap_provider_retire, which references
fasttrap_provs.fth_table. NULL those function pointers earlier.
* There wasn't any code to destroy the
fasttrap_{tpoints,provs,procs}.fth_table mutexes on module unload,
leading to a resource leak when WITNESS is enabled. Destroy those
mutexes during fasttrap_unload().
Sponsored by: Spectra Logic Corporation
Add new sysctl, kern.supported_archs, containing the list of FreeBSD
MACHINE_ARCH values whose binaries this kernel can run. This patch provides
a feature requested for implementing pkgng ABI identifiers in a robust
way.
The list is designed to indicate whether, say, an i386 package can be run on
the current system. If kern.supported_abis contains "i386", then the answer
is yes. Otherwise, the answer is no.
At the moment, this only supports MACHINE_ARCH and MACHINE_ARCH32. As we
gain support for more interesting combinations, this needs to become more
flexible, possibily through the sysent framework, along with the
hw.machine_arch emulation immediately preceding this code in kern_mib.c.
Reviewed by: imp
- Fix RF registers for RT3070.
- Initialize BBP68 to improve RX sensitivity.
- Add RT2860_BCN_OFFSET1 and RT2860_MAX_LEN_CFG register initialization to
match with the vendor driver. While here, remove unused RT2860_DEF_MAC
definition.
Apply patch from upstream Heimdal for encoding fix
RFC 4402 specifies the implementation of the gss_pseudo_random()
function for the krb5 mechanism (and the C bindings therein).
The implementation uses a PRF+ function that concatenates the output
of individual krb5 pseudo-random operations produced with a counter
and seed. The original implementation of this function in Heimdal
incorrectly encoded the counter as a little-endian integer, but the
RFC specifies the counter encoding as big-endian. The implementation
initializes the counter to zero, so the first block of output (16 octets,
for the modern AES enctypes 17 and 18) is unchanged. (RFC 4402 specifies
that the counter should begin at 1, but both existing implementations
begin with zero and it looks like the standard will be re-issued, with
test vectors, to begin at zero.)
This is upstream's commit f85652af868e64811f2b32b815d4198e7f9017f6,
from 13 October, 2013:
% Fix krb5's gss_pseudo_random() (n is big-endian)
%
% The first enctype RFC3961 prf output length's bytes are correct because
% the little- and big-endian representations of unsigned zero are the
% same. The second block of output was wrong because the counter was not
% being encoded as big-endian.
%
% This change could break applications. But those applications would not
% have been interoperating with other implementations anyways (in
% particular: MIT's).
Bump __FreeBSD_version accordingly and add a note in UPDATING.
Approved by: hrs (mentor, src committer)
Fix one race and one fence post error. When the TX buffer was
completely full, we'd not complete any of the mbufs due to the fence
post error (this creates a large leak). When this is fixed, we still
leak, but at a much smaller rate due to a race between ateintr and
atestart_locked as well as an asymmetry where atestart_locked is
called from elsewhere. Ensure that we free in-flight packets that
have completed there as well. Also remove needless check for NULL on
mb, checked earlier in the loop and simplify a redundant if.
Bump the maximum VM space from 3 * memory size to a fixed
256MB. That's all we have room for since we map the hardware registers
starting at 0xd0000000. This allows my 64MB AT91SAM9G20 to boot again
after the unmmaped I/O changes were MFC'd at r251897. Other
subplatforms may need similar treatment.
Although not strictly required to boot a 64MB board, bump
vm_max_virtual_address to be KERNVIRTADDR + 256MB. This allows some
future shock protection since the KVA requirements have gone up since
the unmapped changes have gone in, as well as preventing us from
overlapping with the hardware devices, which we map at 0xd0000000,
which we'd hit with anything more than 85MB...
Call cpu_setup() immediately after the page tables are installed. This
enables data cache and other chip-specific features. It was previously
done via an early SYSINIT, but it was being done after pmap and vm setup,
and those setups need to use mutexes. On some modern ARM platforms,
the ldrex/strex instructions that implement mutexes require the data cache
to be enabled.
Call cpu_setup() from the initarm() routine on platforms that don't use
the common FDT-aware initarm() in arm/machdep.c.
Bugfixes... the host capabilties from FDT data are stored in host.caps, not
host.host_ocr, examine the correct field when setting up the hardware. Also,
the offset for the capabilties register should be 0x140, not 0x240.
Look up a nand chip by id in the static table before trying to obtain
ONFI parameters. This allows a static table entry to provide valid data
for chips known to provide invalid ONFI data.
Add ONFI signature check.
Add Micron chip found in Freescale Vybrid Family Phytec COSMIC board.
The vendor specified field is 88 bytes, not 8 bytes.
Update the onfi_params struct to ONFI revision 3.2 (06 12 2013).
Search for and validate the ONFI params as specified in the standard.
ONFI parameters are little-endian, hence we must take care to convert them
to native endianness. We must also pay attention to unaligned accesses.
Rework the routine that returns a pointer to the table of software ECC
byte positions within the OOB area to support chips with unusual OOB
sizes such as 218 or 224 bytes.
Call initarm_lastaddr() later in the init sequence, after establishing
static device mappings, rather than as the first of the initializations
that a platform can hook into. This allows a platform to allocate KVA
from the top of the address space downwards for things like static device
mapping, and return the final "last usable address" result after that and
other early init work is done.
Because some platforms were doing work in initarm_lastaddr() that needs to
be done early, add a new initarm_early_init() routine and move the early
init code to that routine on those platforms.
Make PTE_DEVICE a synonym for PTE_NOCACHE on armv4, to make it easier to
share the same code on both architectures.
Add new helper routines for arm static device mapping. The new code
allocates kva space from the top down for the device mappings and builds
entries in an internal table which is automatically used later by
arm_devmap_bootstrap(). The platform code just calls the new
arm_devmap_add_entry() function as many times as it needs to (up to 32
entries allowed; most platforms use 2 or 3 at most).
Remove imx local devmap code and use the essentially identical common
code that got moved from imx_machdep.c to arm/devmap.c.
Begin reducing code duplication in arm pmap.c and pmap-v6.c by factoring
out common code related to mapping device memory into a new devmap.c file.
Remove the growing duplication of code that used pmap_devmap_find_pa() and
then did some math with the returned results to generate a virtual address,
and likewise in reverse to get a physical address. Now there are a pair
of functions, arm_devmap_vtop() and arm_devmap_ptov(), to do that. The
bus_space_map() implementations are rewritten in terms of these.
Move remaining code and data related to static device mapping into the
new devmap.[ch] files. Emphasize the MD nature of these things by using
the prefix arm_devmap_ on the function and type names (already a few of
these things found their way into MI code, hopefully it will be harder to
do by accident in the future).
Remove the duplicated implementations of some bus_space functions and use
the essentially identical generic implementations instead. The generic
implementations differ only in the spelling of a couple variable names
and some formatting differences.
Rename WANDBOARD-COMMON to WANDBOARD.common and adjust the configs that
include it accordingly. The build machinery for universe and tinderbox
tries to build every kernel config whose name begins and ends with [A-Z0-9]
and the common include file that has most of the options isn't buildable
by itself, so the new lowercase .common will avoid building it.
TI sdhci driver improvements, mostly related to fdt data...
Use the published compatible strings (our own invention, "ti,mmchs" is
still accepted as well, for now).
Don't blindly turn on 8-bit bus mode, because even though the controller
supports it, the board has to be wired appropriately as well. Use the
published property (bus-width=<n>) and honor all the valid values (1,4,8).
The eMMC device on a Beaglebone Black is wired for 8-bit, update the dts.
The mmchs controller can inherently do both 1.8v and 3.0v on the first
device and 1.8v only on other devices, unless an external transceiver is
used. Set the voltage automatically for the first device and honor
the published fdt property (ti,dualvolt) for other devices.
Revamp the SoC identity numbering scheme to be more in line with the way
Freescale numbers the chips in the ID registers.
Add definitions for the register and data that describes the SoC type.
Reset the timer interrupt status register at the top rather than bottom of
the interrupt handler. If the event callback starts a new short timeout,
the timer can fire before returning from the event callback, and clearing
the interrupt status after that loses the interrupt and hangs until the
counter wraps. Fixing all of this removes the need for the do-nothing
loop at the top of the handler which really just waited for the counter to
roll over and reach the one-shot count again.
Also add a missing return(0) in the periodic timer start case.
Expand the list of compatible devices this driver works with. Increase
the target frequency from 1 to 10 MHz because these SoCs are plenty fast
enough to benefit from the extra event timer resolution.