information.
The existing algorithm selects a preferred leaf vdev based on offset of the zio
request modulo the number of members in the mirror. It assumes the devices are
of equal performance and that spreading the requests randomly over both drives
will be sufficient to saturate them. In practice this results in the leaf vdevs
being under utilized.
The new algorithm takes into the following additional factors:
* Load of the vdevs (number outstanding I/O requests)
* The locality of last queued I/O vs the new I/O request.
Within the locality calculation additional knowledge about the underlying vdev
is considered such as; is the device backing the vdev a rotating media device.
This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.
The following are results from a setup with 3 Way Mirror with 2 x HD's and
1 x SSD from a basic test running multiple parrallel dd's.
With pre-fetch disabled (vfs.zfs.prefetch_disable=1):
== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s
With pre-fetch enabled (vfs.zfs.prefetch_disable=0):
== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s
In addition to the performance changes the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdevs loads are only calculated from the set of valid candidates.
The following additional sysctls where added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc
These changes where based on work started by the zfsonlinux developers:
https://github.com/zfsonlinux/zfs/pull/1487
Reviewed by: gibbs, mav, will
MFC after: 2 weeks
Sponsored by: Multiplay
Change 221669 by bz@bz_zenith on 2013/02/01 12:26:04
Run the initialization for polling earlier along with INTRs
so that we can put network interface into polling mode by default
if DEVICE_POLLING is compiled in and no interrupts are available.
MFC after: 3 days
Sponsored by: DARPA/AFRL
- The new allocator won't return coherent memory for any size > PAGE_SIZE,
so don't assume we have coherent memory, and explicitely use
bus_dmamap_sync().
Change 221767 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/02/05 14:18:53
When printing out information on a TLB MOD exception for a user
process (e.g., an attempt to write to a read-only page), report
it as a "write" in the console message, rather than "unknown".
Change 221768 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/02/05 14:28:00
Fix post-compile but pre-commit typo in last changeset.
MFC after: 3 days
Sponsored by: DARPA/AFRL
and OF_searchencprop(). I thought about using the element size parameter
to OF_getprop_alloc() to do endian-switching automatically, but it breaks
use with structs and a *lot* of FDT code (which can hopefully be moved to
these new APIs).
MFC after: 2 weeks
Change 231031 by brooks@brooks_zenith on 2013/07/11 16:22:08
Turn the unused and uncompilable MIPS_DISABLE_L1_CACHE define in
cache.c into an option and when set force I- and D-cache line
sizes to 0 (the latter part might be better as a tunable).
Fix some casts in an #if 0'd bit of code which attempts to
disable L1 cache ops when the cache is coherent.
Sponsored by: DARPA/AFRL
Change 228019 by bz@bz_zenith on 2013/04/23 13:55:30
Add kernel side support for large TLB on BERI/CHERI.
Modelled similar to NLM
MFC after: 3 days
Sponsored by: DAPRA/AFRL
Change 221534 by rwatson@rwatson_zenith_cl_cam_ac_uk on 2013/01/27 16:05:30
FreeBSD/mips stores page-table entries in a near-identical format
to MIPS TLB entries -- only it overrides certain "reserved" bits
in the MIPS-defined EntryLo register to hold software-defined bits
(swbits) to avoid significantly increasing the page table memory
footprint. On n32 and n64, these bits were (a) colliding with
MIPS64r2 physical memory extensions and (b) being improperly
cleared.
Attempt to fix both of these problems by pushing swbits further
along 64-bit EntryLo registers into the reserved space, and
improving consistency between C-based and assembly-based clearing
of swbits -- in particular, to use the same definition. This
should stop swbits from leaking into TLB entries -- while ignored
by most current MIPS hardware, this would cause a problem with
(much) larger physical memory sizes, and also leads to confusing
hardware-level tracing as physical addresses contain unexpected
(and inconsistent) higher bits.
Discussed with: imp, jmallett
MFC after: 3 days
Sponsored by: DARPA/AFRL
by encode-int. Specifically, it takes a set of 32-bit cell values and
changes them to host byte order. Most non-string instances of OF_getprop()
should be using this function, which is a no-op on big-endian platforms.
segments thinking it received only one segment. This causes it to enable
the delay the ACK for 100ms to wait for another segment which may never
come because all the data was received already.
Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes
us further away from acking every other packet; b) it introduces additional
delay in responding to the sender. The latter is especially bad because it
is in the nature of LRO to aggregated all segments of a burst with no more
coming until an ACK is sent back.
Change the delayed ACK logic to detect LRO segments by being larger than
the MSS for this connection and issuing an immediate ACK for them to keep
the ACK clock ticking without interruption.
Reported by: julian, cperciva
Tested by: cperciva
Reviewed by: lstewart
MFC after: 3 days
Switch the majority of device configuration to FDT from hints.
Add BERI_*_BASE configs to reduce duplication in the MDROOT and SDROOT
kernels.
Add NFS and GSSAPI support by default.
MFC after: 3 days
Sponsored by: DARPA/AFRL
230523, 1123614
Implement a driver for Robert Norton's PIC as an FDT interrupt
controller. Devices whose interrupt-parent property points to a beripic
device will have their interrupt allocation, activation , and setup
operations routed through the IC rather than down the traditional bus
hierarchy.
This driver largely abstracts the underlying CPU away allowing the
PIC to be implemented on CPU's other than BERI. Due to insufficient
abstractions a small amount of MIPS specific code is currently required
in fdt_mips.c and to implement counters.
MFC after: 3 days
Sponsored by: DARPA/AFRL
- Use bus reference phandles in place of FDT offsets as IRQ domain keys
- Unify the identical macio/fdt/mambo OpenPIC drivers into one
- Be more forgiving (following ePAPR) about what we need from the device
tree to identify an OpenPIC
- Correctly map all IRQs into an interrupt domain
- Set IRQ_*_CONFORM for interrupts on an unknown PIC type instead of
failing attachment for that device.
zio_compress_data(..) when compressing l2arc buffers.
This eliminates l2arc I/O errors, which resulted in very poor performance on
vdev's configured with block size greater than 512b due to compression
assuming a smaller min block size than the vdev supports.
MFC after: 2 days
cam_periph_acquire() can return error if periph already invalidated, but
that may be unacceptable and cause deadlock if the invalidated periph can't
be destroyed without "executing" the scheduled request.
Coverity CID: 1109822
MFC after: 2 months
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.
The defined now safety requirements are:
- caller should not hold any locks and should be reenterable;
- callee should not depend on GEOM dual-threaded concurency semantics;
- on the way down, if request is unmapped while callee doesn't support it,
the context should be sleepable;
- kernel thread stack usage should be below 50%.
To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
- G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
- G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
- G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
- G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe. If any of requirements are not met, request is queued to
g_up or g_down thread same as before.
Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).
To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.
This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).
Sponsored by: iXsystems, Inc.
MFC after: 2 months
This shuld have been a problem since r230587. Not exactly sure why it
was not detected the last weeks with the tinderbox. I would assume
r255744 is what started to cause it.
MFC after: 1 week
the cfi(4) driver. It remained in the tree longer than would be ideal
due to the time required to bring cfi(4) to feature parity.
Sponsored by: DARPA/AFRL
MFC after: 3 days
Implement support for interrupt-parent nodes in simplebus. The current
implementation requires that device declarations have an interrupt-parent
node and that it point to a device that has registered itself as a
interrupt controller in fdt_ic_list_head and implements the fdt_ic
interface.
Sponsored by: DARPA/AFRL
user. Kqueue now saves the ucred of the allocating thread, to
correctly decrement the counter on close.
Under some specific and not real-world use scenario for kqueue, it is
possible for the kqueues to consume memory proportional to the square
of the number of the filedescriptors available to the process. Limit
allows administrator to prevent the abuse.
This is kernel-mode side of the change, with the user-mode enabling
commit following.
Reported and tested by: pho
Discussed with: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
already. Also, according to the specs, GDRST register is not in the
power well, so the forcewake for reset status read is excessive for
this reason.
Use plain register read for waiting of the reset completion
notification, to avoid gt_lock recursion. Linux upstream did the
similar change, but their code was already restructured.
Reported by: ray
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
reduce lock congestion and improve SMP scalability of the SCSI/ATA stack,
preparing the ground for the coming next GEOM direct dispatch support.
Replace big per-SIM locks with bunch of smaller ones:
- per-LUN locks to protect device and peripheral drivers state;
- per-target locks to protect list of LUNs on target;
- per-bus locks to protect reference counting;
- per-send queue locks to protect queue of CCBs to be sent;
- per-done queue locks to protect queue of completed CCBs;
- remaining per-SIM locks now protect only HBA driver internals.
While holding LUN lock it is allowed (while not recommended for performance
reasons) to take SIM lock. The opposite acquisition order is forbidden.
All the other locks are leaf locks, that can be taken anywhere, but should
not be cascaded. Many functions, such as: xpt_action(), xpt_done(),
xpt_async(), xpt_create_path(), etc. are no longer require (but allow) SIM
lock to be held.
To keep compatibility and solve cases where SIM lock can't be dropped, all
xpt_async() calls in addition to xpt_done() calls are queued to completion
threads for async processing in clean environment without SIM lock held.
Instead of single CAM SWI thread, used for commands completion processing
before, use multiple (depending on number of CPUs) threads. Load balanced
between them using "hash" of the device B:T:L address.
HBA drivers that can drop SIM lock during completion processing and have
sufficient number of completion threads to efficiently scale to multiple
CPUs can use new function xpt_done_direct() to avoid extra context switch.
Make ahci(4) driver to use this mechanism depending on hardware setup.
Sponsored by: iXsystems, Inc.
MFC after: 2 months
Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodonne() when unmapping
temporary mapped buffer. That fixes double unmap if biodone() called twice
for the same BIO (but with different done methods).
Move mapping removal before calling bio_done() method. I believe that it is
very wrong to do anything to BIO after reporting completion. kib@ thinks
it was done for some forgotten now case when bio_done() method needed mapped
buffer. But 1) if BIO was sent as unmapped, then IMO done() should be called
in the same way; 2) IMO there is no guatantee that buffer will be mapped at
this point at all, for example, if all underlying stack supports unmapped
I/O, so bio_done() handler can not expect that.
the register based on the argument index rather than relying on the fields
in struct reg to be in the right order. This assumption is incorrect on
FreeBSD and generally led to bogus argument values for the sixth argument
of PID and USDT probes; the first five are passed directly to dtrace_probe()
via the fasttrap trap handler and so were correctly handled.
MFC after: 2 weeks
single kernel-wide soft update lock can be replaced with a
per-filesystem soft-updates lock. This per-filesystem lock will
allow each filesystem to have its own soft-updates flushing thread
rather than being limited to a single soft-updates flushing thread
for the entire kernel.
Move soft update variables out of the ufsmount structure and into
their own mount_softdeps structure referenced by ufsmount field
um_softdep. Eventually the per-filesystem lock will be in this
structure. For now there is simply a pointer to the kernel-wide
soft updates lock.
Change all instances of ACQUIRE_LOCK and FREE_LOCK to pass the lock
pointer in the mount_softdeps structure instead of a pointer to the
kernel-wide soft-updates lock.
Replace the five hash tables used by soft updates with per-filesystem
copies of these tables allocated in the mount_softdeps structure.
Several functions that flush dependencies when too many are allocated
in the kernel used to operate across all filesystems. They are now
parameterized to flush dependencies from a specified filesystem.
For now, we stick with the round-robin flushing strategy when the
kernel as a whole has too many dependencies allocated.
While there are many lines of changes, there should be no functional
change in the operation of soft updates.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
from the kernel build until it is ready.
sys/conf/files:
Remove the entry for xen/evtchn/evtchn_dev.c so it is not included
in any kernel builds.
Noticed by: smh
Add KASSERTS that soft dependency functions only get called
for filesystems running with soft dependencies. Calling these
functions when soft updates are not compiled into the system
become panic's.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
Ensure that softdep_unmount() and softdep_setup_sbupdate()
only get called for filesystems running with soft dependencies.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
Freescale SoCs including the i.MX series. This also works for the newer
SoCs with the ENET gigabit controller, but doesn't use any of the new
hardware features other than enabling gigabit speed.
Convert three functions exported from ffs_softdep.c to static
functions as they are not used outside of ffs_softdep.c.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
BUS_PROBE_DEFAULT, it would attach to them all, producing both many
fdtbus instances and preventing other devices from attaching. Instead
return BUS_PROBE_NOWILDCARD, which exists for exactly this purpose.
differed only with respect to the AIM version not following style(9) and
some additional features for 64-bit systems and machines with direct maps
in the AIM implementation that are no-ops on Book-E (at least for now).
the right register for power managment. It
was incorrectly using the clock register
which also caused the status to be the
opposite of what it is supposed to be.
1 - its disabled
0 - its enabled
Per kirkwood spec FSS_88F6180_9x_6281_OpenSource.pdf
Add an option ATSE_CFI_HACK to allow memory mapped CFI devices to have
their address range allocated sharable so that atse(4) can find it's
Ethernet address in the expected location.
We intend to remove this hack once the BERI platform has a loader.
221804, 221805, 222004, 222006, 222055, 222820, 1135077, 1135118, 1136259
Add atse(4), a driver for the Altera Triple Speed Ethernet MegaCore.
The current driver support gigabit Ethernet speeds only and works with
the MegaCore only in the internal FIFO configuration in the soon to be
open sourced BERI CPU configuration.
Submitted by: bz
MFC after: 3 days
Sponsored by: DARPA/AFRL
Change 227630 by bz@bz_zenith on 2013/04/12 08:50:27
Implement soft reset setting sr in sr and just in case loop
endlessly afterwards.
MFC after: 3 days
Sponsored by: DARPA/AFRL
Change 231100 by brooks@brooks_zenith on 2013/07/12 21:01:31
Add a new option ALTERA_SDCARD_FAST_SIM which checks immediatly
for success of I/O operations rather than queuing a task.
MFC after: 3 days
Sponsored by: DARPA/AFRL
Change 227594 by brooks@brooks_zenith on 2013/04/11 17:10:14
When we fail, print the error that occured if we are giving
up or if bootverbose is set.
MFC after: 3 days
Sponsored by: DARPA/AFRL
would resize a partition, but label providers - e.g. /dev/gptid/XXX - would
stay the same size.
Reviewed by: mav
MFC after: 1 month
Sponsored by: FreeBSD Foundation
- Remove two excessive and slow register reads from isp_intr(). Instead
of rereading value every time, assume that registers contain what we have
written there.
- Avoid sequential search through 4096 array elements when looking for
command tag. Use hash of lists to store active tags separately from free
ones and so greatly speedup the searches.
Reviewed by: mjacob
When parent provider has been resized, the scheme specific G_PART_RESIZE
method does an update of scheme's metadata. But all changes are not saved
to disk, until `gpart commit` will be called.
Discussed with: trasz
MFC after: 1 month
The order of releasing resources in mlxen was wrong, which caused
panic on reload of the module.
conf_ctx list should be released before stat_ctx list, otherwise
the leafs in conf_ctx list won't be released because of the dependancy.
The fix is to change the order of the releases.
Submitted by: Shahar Klein (shahark at mellanox.com)
harvester, which now always calls hwrngs with the buffer length
multiple of the word size. This allows to remove the excessive memory
accesses to temporary buffer when saving the entropy word.
Streamline the assembly and unify it between i386 and amd64.
Reviewed by: markm, des
Approved by: so (des)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The MDIO bus frequency is configured as a divisor off of the MDIO bus
reference clock. For the AR9344 and later, the MDIO bus frequency can
be faster than normal (ie, up to 100MHz) and thus a static divisor may
not be very applicable.
So, for those boards that may require an actual frequency to be selected
regardless of what crazy stuff the vendor throws in uboot, one can now
set the MDIO bus frequency. It uses the MDIO frequency and the target
frequency to choose a divisor that doesn't exceed the target frequency.
By default it will choose:
* DIV_28 on everything; except
* DIV_58 on the AR9344 to be conservative.
Whilst I'm here, add some comments about the defaults being not quite
right. For the other internal switch devices (like the AR933x, AR724x)
the divisor can be higher - it's internal and the reference MDIO clock
is much lower than 100MHz.
The divisor tables and loop code is inspired from Linux/OpenWRT. It's very
simple; I didn't feel that reimplementing it would yield a substantially
different solution.
Tested:
* AR9331 (mips24k)
* AR9344 (mips74k)
Obtained from: Linux/OpenWRT
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.
Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.
Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.
Reviewed by: grehan
MFC after: 3 days
correctly by doing nothing, then add a panic for the default case, because
that implies that some driver asked for a sync (probably incorrectly) and
nothing was done.
Usual symptoms are messages like
rn_delete: inconsistent annotation
rn_addmask: mask impossibly already in tree
or inability to flush/delete particular prefix in ipfw table.
Changes:
* Assume 32 bytes as maximum radix key length
* Remove rn_init()
* Statically allocate rn_ones/rn_zeroes
* Make separate mask tree for each "normal" tree instead of system global one
* Remove "optimization" on masks reusage and key zeroying
* Change rn_addmask() arguments to accept tree pointer (no users in base)
PR: kern/182851, kern/169206, kern/135476, kern/134531
Found by: Slawa Olhovchenkov <slw@zxy.spb.ru>
MFC after: 2 weeks
Reviewed by: glebius
Sponsored by: Yandex LLC
- Take BIO lock in biodone() only when there is no completion callback set
and so we should wake up thread waiting in biowait().
- Remove msleep() timeout from biowait(). It was added 11 years ago, when
there was no locks used, and it should not be needed any more.
Move tq_enqueue() call out of the queue lock for known handlers (actually
I have found no others in the base system). This reduces queue lock hold
time and congestion spinning under active multithreaded enqueuing.
Introduce new function devstat_end_transaction_bio_bt(), adding new argument
to specify present time. Use this function to move binuptime() out of lock,
substantially reducing lock congestion when slow timecounter is used.
we can now add all the hardware bits for the DB120.
* arge0/argemdio0 is hooked up to an AR8327 switch - which there's currently
no support for. However, the bootloader on this board does set it up as
a basic switch so we can at least _use_ it ourselves.
So we should at least configure the arge0 side of things, including the GMAC
register.
* .. the GMAC config peels off arge0 from the internal switch and exposes it
as an RGMII to said AR8327.
* arge1/argemdio1 are hooked up to an internal 10/100 switch. So, that also
needs configuring.
* Add support for the NOR flash layout.
* Add support for the wifi (which works, with bugs, but it works.)
What's missing!
* No GPIO stuff yet!
* No sound (I2S) and no NAND flash support yet, sorry!
* The normal DB120 has an external AR95xx wifi chip on PCIe but with the
actual calibration data in the NOR flash. My DB120 has been modified
to let me use the PCIe slot as a normal PCIe slot. I'll add the "default"
settings later when I have access to a non-modified one.
* Other stuff, like why the wifi unit gets upset and spits out stuck beacons
and interrupt storms everywhere. Sigh.
Tested:
* DB120 board - AR9344 (mips74k SoC) booting off of SPI flash into multi-user
mode.
* Do the hardware setup in the right order!
* Modify/improve the chip probe check so it can actually
probe the 7240/9340 directly (although it's not yet used..)
* Initialise and fetch the is_mii option
* Fix some debugging whilst I'm here.
This is enough to get things off the ground.
Tested:
* AR9344 SoC
* Add an AR9340 switch version entry;
* Support the switch being connected via MII;
* Add a flag to note that a switch is actually an internal
switch rather than an external switch.
Now:
* The ar9340 switch can interconnect via MII;
* Since some slightly different phy/switch register access methods
and quirks appear for the internal versus external switch,
we will need a flag to mark it as an "internal" switch.
Tested:
* AR9344 (internal switch)
* AR9331 (internal switch)
TODO:
* Test the AR8316 switch!
This is just the chip initialisation code (for now.)
It's not linked into the main build as it requires a bunch of other code
to be tidied up and committed. But it indeed does function as advertised.
Tested:
* AR9344 SoC
up and running.
* The MAC FIFO configurations needed updating;
* Reset the MDIO block at the same time the MAC block is reset;
* The default divisor needs changing as the DB120 runs at a higher
base MDIO bus clock compared to other chips.
The long-term fix is to allow the system to have a target MDIO bus
clock rate and then calculate the most suitable divider to meet
that. This will likely need implementing before stable external
PHY or switch support can be committed.
Tested:
* AR9344 (mips74k)
* AR9331 (mips24k)
Without correct barriers, this code just plain doesn't work on the
mips74k cores (specifically the AR9344.)
In particular, the MDIO register accesses need this barriering or MII bus
access results in out-of-order garbage.
Tested:
* AR9344 (mips74k)
* AR9331 (mips24k)
This is required for correct, stable operation on the MIPS74k SoCs
that are dual-issue, superscalar pipelines.
Tested:
* AR9344 SoC (MIPS74k)
* AR9331 SoC (MIPS24k)
null-separated strings to a single string. This can be used to print the
full arguments of a process using execsnoop (from the DTrace toolkit) or
with the following one-liner:
dtrace -n 'syscall::execve:return {trace(curpsinfo->pr_psargs);}'
Note that this relies on the process arguments being cached via the struct
proc, which means that it will not work for argvs longer than
kern.ps_arg_cache_limit. However, the following rather non-portable
script can be used to extract any argv at exec time:
fbt::kern_execve:entry
{
printf("%s", memstr(args[1]->begin_argv, ' ',
args[1]->begin_envv - args[1]->begin_argv));
}
The debug.dtrace.memstr_max sysctl limits the maximum argument size to
memstr(). Thanks to Brendan Gregg for helpful comments on freebsd-dtrace.
Tested by: Fabian Keil (earlier version)
MFC after: 2 weeks
the 'vmmdev_mtx' in vmmdev_rw().
Rely on the 'si_threadcount' accounting to ensure that we never destroy the
VM device node while it has operations in progress (e.g. ioctl, mmap etc).
Reported by: grehan
Reviewed by: grehan
section. This prevents a boot crash on nearly all iMacs and PowerMacs/Books.
The allocation in the probe section was working before because ata_probe was
returning 0 which did not invoke a second DEVICE_PROBE. Now it returns
a BUS_PROBE_DEFAULT which can invoke a second DEVICE_PROBE which results in
a "failed to reserve resource" exit.
PR: powerpc/182978
Discussed with: grehan@
MFC after: 1 Week
and not 8-bit. Fix support for isochronous transfers in USB host mode.
Fix a whitespace while at it.
MFC after: 1 week
Reported by: SAITOU Toshihide <toshi@ruby.ocn.ne.jp>
PR: usb/181987
four counters to struct ifaddr. This kills '+=' on a variables shared
between processors for every packet.
- Nuke struct if_data from struct ifaddr.
- In ip_input() do not put a reference on ifaddr, instead update statistics
right now in place and do IN_IFADDR_RUNLOCK(). These removes atomic(9)
for every packet. [1]
- To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in
rtsock.c fill if_data fields using counter_u64_fetch().
- Accidentially fix bug in COMPAT_32 version of NET_RT_IFLISTL, which
took if_data not from the ifaddr, but from ifaddr's ifnet. [2]
Submitted by: melifaro [1], pluknet[2]
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
tools would need to know about the counter_u64_t type. Allow to include
sys/counter.h from userspace.
- Utilize now defined type in kvm_counter_u64_fetch().
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
images compiled on the world with higher major version number than the
high version number of the booted kernel. Default to disable.
Sponsored by: The FreeBSD Foundation
Discussed with: bapt
MFC after: 1 week
devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized
i/o requests on the devfs files.
Sponsored by: The FreeBSD Foundation
Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com>
MFC after: 1 week
* Add the MDIO clock probe during clock initialisation;
* Update the ethernet PLL configuration function to use the correct
values;
* Add a GMAC block configuration to pull the configuration out of hints;
* Add an ethernet switch reconfiguration method.
Tested:
* AR9344 SoC (DB120)
.. however, this has been tested with extra patches in my tree (to fix
the ethernet/MDIO support, SPI support, ethernet switch support)
and thus it isn't enough to bring the full board support up.
* Initialise the MDIO clock to default to the reference clock;
* Add some code to allow the hints mechanism to allow setup of the GMAC
config block.
* Document how the switch is wired up internally.
Tested:
* AR9331 SoC (Carambola 2)
* Print out the platform frequency the same as the other frequencies.
* Print out the MDIO frequency.
* Optionally do GMAC and ethernet switch setup if required.
Tested:
* AR9344
switch reset/initialise functions.
The AR934x and QC955x SoCs both have a configurable MDIO base clock.
The others have the MDIO clock use the same clock as the system
reference clock, whatever that may be.
Tested:
* AR9344 SoC
TODO:
* mips24k - AR933x would be fine for now, just to ensure that things
are sane.
control block.
The GMAC configuration block allows for some configuration of how
the GMAC0 (ie, arge0) port is connected to the on-board switch
(if indeed there is one.) It both can be pushed into the on-board
switch; it could also be torn out and exposed via an external
MII (and that operational mode is also controllable.)
Obtained from: Linux/OpenWRT
Most of the code between UFS1 and UFS2 is shared so this change
is pretty safe. Not only this makes UFS1 and 2 consistent but it
also matches what NetBSD and MacOS X have for some years now.
Reviewed by: mckusick
MFC after: 1 month
Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs, grehan
Approved by: re (gjb)
sys/sys/systm.h:
* Add a new VM_GUEST type, VM_GUEST_HV (HyperV guest).
sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c:
sys/dev/hyperv/vmbus/hv_hv.c:
sys/dev/hyperv/stordisengage/hv_ata_pci_disengage.c:
* Set vm_guest to VM_GUEST_HV and use that on other HyperV related
devices instead of cloning the cpuid hypervisor check.
* Cleanup the vmbus_identify function.
prior releases.
Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (gjb)
sys/dev/xen/blkfront/blkfront.c:
On XenServer versions up to an including 6.2, paravirtualized
CDROM support is broken. When running in an HVM domain,
ignore paravirtualized instances of CDROM media, and instead
rely on native drivers attaching to emulated hardware. This
functions correctly on all currently known Xen based
platforms.
to dummy,yarrow and break the usability of /dev/random.
Fix the name of the tunable to something logical that 'sysctl kern.random'
emits.
Submitted by: des@ (the idea, code by me)
Refactor of /dev/random device. Main points include:
* Userland seeding is no longer used. This auto-seeds at boot time
on PC/Desktop setups; this may need some tweeking and intelligence
from those folks setting up embedded boxes, but the work is believed
to be minimal.
* An entropy cache is written to /entropy (even during installation)
and the kernel uses this at next boot.
* An entropy file written to /boot/entropy can be loaded by loader(8)
* Hardware sources such as rdrand are fed into Yarrow, and are no
longer available raw.
------------------------------------------------------------------------
r256240 | des | 2013-10-09 21:14:16 +0100 (Wed, 09 Oct 2013) | 4 lines
Add a RANDOM_RWFILE option and hide the entropy cache code behind it.
Rename YARROW_RNG and FORTUNA_RNG to RANDOM_YARROW and RANDOM_FORTUNA.
Add the RANDOM_* options to LINT.
------------------------------------------------------------------------
r256239 | des | 2013-10-09 21:12:59 +0100 (Wed, 09 Oct 2013) | 2 lines
Define RANDOM_PURE_RNDTEST for rndtest(4).
------------------------------------------------------------------------
r256204 | des | 2013-10-09 18:51:38 +0100 (Wed, 09 Oct 2013) | 2 lines
staticize struct random_hardware_source
------------------------------------------------------------------------
r256203 | markm | 2013-10-09 18:50:36 +0100 (Wed, 09 Oct 2013) | 2 lines
Wrap some policy-rich code in 'if NOTYET' until we can thresh out
what it really needs to do.
------------------------------------------------------------------------
r256184 | des | 2013-10-09 10:13:12 +0100 (Wed, 09 Oct 2013) | 2 lines
Re-add /dev/urandom for compatibility purposes.
------------------------------------------------------------------------
r256182 | des | 2013-10-09 10:11:14 +0100 (Wed, 09 Oct 2013) | 3 lines
Add missing include guards and move the existing ones out of the
implementation namespace.
------------------------------------------------------------------------
r256168 | markm | 2013-10-08 23:14:07 +0100 (Tue, 08 Oct 2013) | 10 lines
Fix some just-noticed problems:
o Allow this to work with "nodevice random" by fixing where the
MALLOC pool is defined.
o Fix the explicit reseed code. This was correct as submitted, but
in the project branch doesn't need to set the "seeded" bit as this
is done correctly in the "unblock" function.
o Remove some debug ifdeffing.
o Adjust comments.
------------------------------------------------------------------------
r256159 | markm | 2013-10-08 19:48:11 +0100 (Tue, 08 Oct 2013) | 6 lines
Time to eat crow for me.
I replaced the sx_* locks that Arthur used with regular mutexes;
this turned out the be the wrong thing to do as the locks need to
be sleepable. Revert this folly.
# Submitted by: Arthur Mesh <arthurmesh@gmail.com> (In original diff)
------------------------------------------------------------------------
r256138 | des | 2013-10-08 12:05:26 +0100 (Tue, 08 Oct 2013) | 10 lines
Add YARROW_RNG and FORTUNA_RNG to sys/conf/options.
Add a SYSINIT that forces a reseed during proc0 setup, which happens
fairly late in the boot process.
Add a RANDOM_DEBUG option which enables some debugging printf()s.
Add a new RANDOM_ATTACH entropy source which harvests entropy from the
get_cyclecount() delta across each call to a device attach method.
------------------------------------------------------------------------
r256135 | markm | 2013-10-08 07:54:52 +0100 (Tue, 08 Oct 2013) | 8 lines
Debugging. My attempt at EVENTHANDLER(multiuser) was a failure; use
EVENTHANDLER(mountroot) instead.
This means we can't count on /var being present, so something will
need to be done about harvesting /var/db/entropy/... .
Some policy now needs to be sorted out, and a pre-sync cache needs
to be written, but apart from that we are now ready to go.
Over to review.
------------------------------------------------------------------------
r256094 | markm | 2013-10-06 23:45:02 +0100 (Sun, 06 Oct 2013) | 8 lines
Snapshot.
Looking pretty good; this mostly works now. New code includes:
* Read cached entropy at startup, both from files and from loader(8)
preloaded entropy. Failures are soft, but announced. Untested.
* Use EVENTHANDLER to do above just before we go multiuser. Untested.
------------------------------------------------------------------------
r256088 | markm | 2013-10-06 14:01:42 +0100 (Sun, 06 Oct 2013) | 2 lines
Fix up the man page for random(4). This mainly removes no-longer-relevant
details about HW RNGs, reseeding explicitly and user-supplied
entropy.
------------------------------------------------------------------------
r256087 | markm | 2013-10-06 13:43:42 +0100 (Sun, 06 Oct 2013) | 6 lines
As userland writing to /dev/random is no more, remove the "better
than nothing" bootstrap mode.
Add SWI harvesting to the mix.
My box seeds Yarrow by itself in a few seconds! YMMV; more to follow.
------------------------------------------------------------------------
r256086 | markm | 2013-10-06 13:40:32 +0100 (Sun, 06 Oct 2013) | 11 lines
Debug run. This now works, except that the "live" sources haven't
been tested. With all sources turned on, this unlocks itself in
a couple of seconds! That is no my box, and there is no guarantee
that this will be the case everywhere.
* Cut debug prints.
* Use the same locks/mutexes all the way through.
* Be a tad more conservative about entropy estimates.
------------------------------------------------------------------------
r256084 | markm | 2013-10-06 13:35:29 +0100 (Sun, 06 Oct 2013) | 5 lines
Don't use the "real" assembler mnemonics; older compilers may not
understand them (like when building CURRENT on 9.x).
# Submitted by: Konstantin Belousov <kostikbel@gmail.com>
------------------------------------------------------------------------
r256081 | markm | 2013-10-06 10:55:28 +0100 (Sun, 06 Oct 2013) | 12 lines
SNAPSHOT.
Simplify the malloc pools; We only need one for this device.
Simplify the harvest queue.
Marginally improve the entropy pool hashing, making it a bit faster
in the process.
Connect up the hardware "live" source harvesting. This is simplistic
for now, and will need to be made rate-adaptive.
All of the above passes a compile test but needs to be debugged.
------------------------------------------------------------------------
r256042 | markm | 2013-10-04 07:55:06 +0100 (Fri, 04 Oct 2013) | 25 lines
Snapshot. This passes the build test, but has not yet been finished or debugged.
Contains:
* Refactor the hardware RNG CPU instruction sources to feed into
the software mixer. This is unfinished. The actual harvesting needs
to be sorted out. Modified by me (see below).
* Remove 'frac' parameter from random_harvest(). This was never
used and adds extra code for no good reason.
* Remove device write entropy harvesting. This provided a weak
attack vector, was not very good at bootstrapping the device. To
follow will be a replacement explicit reseed knob.
* Separate out all the RANDOM_PURE sources into separate harvest
entities. This adds some secuity in the case where more than one
is present.
* Review all the code and fix anything obviously messy or inconsistent.
Address som review concerns while I'm here, like rename the pseudo-rng
to 'dummy'.
# Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)
------------------------------------------------------------------------
r255319 | markm | 2013-09-06 18:51:52 +0100 (Fri, 06 Sep 2013) | 4 lines
Yarrow wants entropy estimations to be conservative; the usual idea
is that if you are certain you have N bits of entropy, you declare
N/2.
------------------------------------------------------------------------
r255075 | markm | 2013-08-30 18:47:53 +0100 (Fri, 30 Aug 2013) | 4 lines
Remove short-lived idea; thread to harvest (eg) RDRAND enropy into the
usual harvest queues. It was a nifty idea, but too heavyweight.
# Submitted by: Arthur Mesh <arthurmesh@gmail.com>
------------------------------------------------------------------------
r255071 | markm | 2013-08-30 12:42:57 +0100 (Fri, 30 Aug 2013) | 4 lines
Separate out the Software RNG entropy harvesting queue and thread
into its own files.
# Submitted by: Arthur Mesh <arthurmesh@gmail.com>
------------------------------------------------------------------------
r254934 | markm | 2013-08-26 20:07:03 +0100 (Mon, 26 Aug 2013) | 2 lines
Remove the short-lived namei experiment.
------------------------------------------------------------------------
r254928 | markm | 2013-08-26 19:35:21 +0100 (Mon, 26 Aug 2013) | 2 lines
Snapshot; Do some running repairs on entropy harvesting. More needs
to follow.
------------------------------------------------------------------------
r254927 | markm | 2013-08-26 19:29:51 +0100 (Mon, 26 Aug 2013) | 15 lines
Snapshot of current work;
1) Clean up namespace; only use "Yarrow" where it is Yarrow-specific
or close enough to the Yarrow algorithm. For the rest use a neutral
name.
2) Tidy up headers; put private stuff in private places. More could
be done here.
3) Streamline the hashing/encryption; no need for a 256-bit counter;
128 bits will last for long enough.
There are bits of debug code lying around; these will be removed
at a later stage.
------------------------------------------------------------------------
r254784 | markm | 2013-08-24 14:54:56 +0100 (Sat, 24 Aug 2013) | 39 lines
1) example (partially humorous random_adaptor, that I call "EXAMPLE")
* It's not meant to be used in a real system, it's there to show how
the basics of how to create interfaces for random_adaptors. Perhaps
it should belong in a manual page
2) Move probe.c's functionality in to random_adaptors.c
* rename random_ident_hardware() to random_adaptor_choose()
3) Introduce a new way to choose (or select) random_adaptors via tunable
"rngs_want" It's a list of comma separated names of adaptors, ordered
by preferences. I.e.:
rngs_want="yarrow,rdrand"
Such setting would cause yarrow to be preferred to rdrand. If neither of
them are available (or registered), then system will default to
something reasonable (currently yarrow). If yarrow is not present, then
we fall back to the adaptor that's first on the list of registered
adaptors.
4) Introduce a way where RNGs can play a role of entropy source. This is
mostly useful for HW rngs.
The way I envision this is that every HW RNG will use this
functionality by default. Functionality to disable this is also present.
I have an example of how to use this in random_adaptor_example.c (see
modload event, and init function)
5) fix kern.random.adaptors from
kern.random.adaptors: yarrowpanicblock
to
kern.random.adaptors: yarrow,panic,block
6) add kern.random.active_adaptor to indicate currently selected
adaptor:
root@freebsd04:~ # sysctl kern.random.active_adaptor
kern.random.active_adaptor: yarrow
# Submitted by: Arthur Mesh <arthurmesh@gmail.com>
Submitted by: Dag-Erling Smørgrav <des@FreeBSD.org>, Arthur Mesh <arthurmesh@gmail.com>
Reviewed by: des@FreeBSD.org
Approved by: re (delphij)
Approved by: secteam (des,delphij)
and holding a reference prior to calling further into the hyperv
stack.
Added missing FreeBSD idents.
Submitted by: Microsoft hyperv dev team
Approved by: re@ (gjb)
Fixed a panic that occurs when bringing up an interface on 57710/57711
running very old bootcode versions.
Fixed how bool is defined for those who have been using this code on older
versions of FreeBSD.
Approved by: re@ (gjb)
Approved by: davidch (mentor)
I changed it to use if_transmit a while ago but apparently with monitor
mode the if_transmit method is overridden.
This is (mostly) a workaround until a more permanent solution can be
found.
Submitted by: Patrick Kelsey <kelsey@ieee.org>
Approved by: re@ (gjb)
protected mode and may leave protected-mode-specific flags like PSL_NT set
when they return to real mode. This can cause a fault when BTX re-enters
protected mode after the BIOS mode returns.
PR: amd64/182740
Reported by: Julian Pidancet <julian.pidancet@gmail.com>
Approved by: re (gjb)
MFC after: 1 week
- Update FreeBSD version in:
- UPDATING
- sys/conf/newvers.sh
- Add 11.0 FreeBSD version for manual pages
- Bump __FreeBSD_version to 1100000
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
union members in strict C99, by giving them names. While here, add some
FreeBSD keywords where they were missing.
Approved by: re (gjb)
Reviewed by: grehan
The device name was incorrect due to a specific function we ported
from the Linux driver that is not FBSD compatible. This resulted
with a false sysctl registration and some more problematic issues.
The patch basically revokes it all together.
Submitted by: Meny Yossefi (menyy mellanox.com)
Approved by: re
illumos change 14172:be36a38bac3d:
illumos ZFS issues:
4082 zfs receive gets EFBIG from dmu_tx_hold_free()
Please note that this change is slightly different from r255257, because
it is merged out of order with other (larger) upstream changes.
PR: kern/182570
Reported by: Keith White <kwhite@site.uottawa.ca>
Tested by: Keith White <kwhite@site.uottawa.ca>
Approved by: re (glebius)
MFC after: 1 week
X-MFC after: r254753
and there are ifnets, that do that via counter(9). Provide a flag that
would skip cache line trashing '+=' operation in ether_input().
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Reviewed by: melifaro, adrian
Approved by: re (marius)
called. This probably should be fixed eventually, but for now it is
not needed to try to flush such vnodes from the buffer allocation
context.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)
really belong to it. Such vnodes, with the pointers to other vnodes
v_objects, are typically instantiated by the bypass filesystems.
Invalidating mappings of other vnode pages and the pages is wrong,
since reclamation of the upper vnode does not imply that lower vnode
is reclaimed too.
One of the consequences of the improper reclamation was destruction of
the wired mappings of the lower vnode pages, triggering miscellaneous
assertions in the VM system.
Reported by: John Marshall <john.marshall@riverwillow.com.au>
Tested by: John Marshall <john.marshall@riverwillow.com.au>, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)
allocated, but the old table is kept around to handle the case of
threads still performing unlocked accesses to it.
Grow the table exponentially instead of increasing its size by
sizeof(long) * 8 chunks when overflowing. This mode significantly
reduces the total memory use for the processes consuming large numbers
of the file descriptors which open them one by one.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (marius)
- This version has support for the new Intel Avoton systems,
including 2.5Gb support, further it now has IPv6/TSO6 support as
well. Shared code has been updated where necessary as well. Thanks
to my new assistant Eric Joyner for doing the transmit path changes
to bring in the IPv6/TSO6 support. Thanks to Gleb for catching the
one bug and change needed in NETMAP.
Approved by: re