The NVMe specification does not define a maximum or optimal delete
size, so technically max delete size is min(full size of namespace,
2^32 - 1 LBAs). A single delete operation for a multi-TB NVMe
namespace though may take much longer to complete than the nvme(4)
I/O timeout period. So choose a sensible default here that is still
suitably large to minimize the number of overall delete operations.
This also fixes possible uint32_t overflow on initial TRIM operation
for zpool create operations for NVMe namespaces with >4G LBAs.
MFC after: 3 days
Sponsored by: Intel
NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent
pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT.
Do not let the two events be combined, since one would overwrite
the other's data.
PR: 180385
Submitted by: David A. Bright <david_a_bright@dell.com>
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D4900
ofw_bus_get_node() must be tested against negative values since
missing parent bus method will result in calling the default method
which simply returns (-1): sys/dev/ofw/ofw_bus_if.m
This was lost in the review process.
Obtained from: Semihalf
Sponsored by: Cavium
Some firmware revisions provide different DTB tree that include
odd MDIO placement in the tree.
This commit adds support for 2 new buses:
- MRML bridge (PCIB subordinate)
- MDIO nexus (MRML subordinate)
This allows for the correct MDIO attachment with both - new and old
firmware.
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5070
New ThunderX firmware incorporates modified DTB that presents
different device hierarchy. In the new device tree, MDIO
devices are below two additional buses that oddly hang on
PCI bridge.
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5069
- Avoid using BUS_ macros as bus_generic_ functions should be used instead.
- Fix mistaken device_t pointers in thunder_pcie_alloc_resource.
Should use dev->parent method and allocate resource for child device
Reviewed by: wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5068
- Separate FDT and general PCIe driver parts
- Drop some irrelevant printfs that cannot be displayed in
FDT attach
- Move ranges parsing to FDT portion of PCIe code
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5067
Search for BGX node in DTS in two ways:
1. Try to find it uder root node first
2. If not found under root, find the top level PCI bridge node
and search all nodes below it until appropriate BGX node is found.
Move search code to another function to make the code more clear.
Remove unused variable by the way.
Reviewed by: wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5066
Use driver settable callbacks for handling of:
- core post reset
- reading actual port speed
Typically, OTG enabled EHCI cores wants setting of USBMODE register,
but this register is not defined in EHCI specification and different
cores can have it on different offset.
Also, for cores with TT extension, actual port speed must be determinable.
But again, EHCI specification not covers this so this patch provides
function for two most common variant of speed bits layout.
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D5088
Use per-CPU structure to store HW watchpoints registers state
for each CPU present in the system. Those registers will be restored
upon wake up from the STOP state if requested by the debug_monitor
code. The method is similar to the one introduced to AMD64.
We store all possible 16 registers for HW watchpoints
(maximum allowed by the architecture).
HW breakpoints are not maintained since they are used for single
stepping only.
Pointed out by: kib
Reviewed by: wma
No strong objections from: kib
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Juniper Networks Inc.
Differential Revision: https://reviews.freebsd.org/D4338
The code should be comparing pointers, not any data
gathered from a blocked_lock.
Spotted by: cognet
Approved by: zbb, cognet (mentor)
Differential revision: https://reviews.freebsd.org/D5100
This fixes some cases where a process could exit without being untracked
by filemon.
Reported by: mjg
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
crash a server that has exported UFS2 by presenting a filehandle
with an inode number that references an uninitialized inode in a
cylinder group. The problem is that UFS2 only initializes blocks
of inodes as they are first allocated and ffs_fhtovp() does not
validate that the inode is in a range of inodes that have been
initialized. Attempting to read an uninitialized inode gets random
data from the disk. When the kernel tries to interpret it as an
inode, panics often arise.
Reported by: Christos Zoulas (from NetBSD)
Reviewed by: kib
a buffer pointer in the event of an error (for some errors it would
return a buffer pointer and for other errors it would not return a
buffer pointer). The cluster_read() function was similarly inconsistent.
Clients of these functions were inconsistent in handling errors.
Some would assume that no buffer was returned after an error and
would thus lose buffers under certain error conditions. Others would
assume that brelse() should always be called after an error and
would thus panic the system under certain error conditions.
To correct both of these problems with minimal code churn, bread()
and cluster_write() now always free the buffer when returning an
error thus ensuring that buffers will never be lost. The brelse()
routine checks for being passed a NULL buffer pointer and silently
returns to avoid panics. Thus both approaches to handling error
returns from bread() and cluster_read() will work correctly.
Future code should be written assuming that bread() and cluster_read()
will never return a buffer with an error, so should not attempt to
brelse() the buffer when an error is returned.
Reviewed by: kib
It only prints the header from filemon_ioctl. Keep the name though to stay
closer to other implementations.
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
POSIX requires these members to be of type void * rather than the
char * inherited from 4BSD. NetBSD and OpenBSD both changed their
fields to void * back in 1998. No new build failures were reported
via an exp-run.
PR: 206503 (exp-run)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5092
to be 64-bit on 32-bit architectures. It is not uncommon for device trees
to use the upper 32-bits to store what effectively is an index into the
parent ranges property. In this case, when running with a 32-bit bus_addr_t
and bus_size_t, we would previously truncate the address, this may then
incorrectly match the wrong range, and return the wrong address.
Tested by: bz (earlier version)
make it possible for efi_console to recognize and translate box characters
on i386 build (unsigned versus signed char passed as int issue).
Submitted by: Toomas Soome <tsoome at me.com>
Reviewed by: emaste, smh, dteske
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D4993
hate adding yet another define, but it is the lessor of the evil
choices available. Kill another evil by removing PATH_BOOT3 and
replacing it with PATH_LOADER or PATH_LOADER_ZFS as appropriate.
PR: 206659
o Return back the buf[TCP_CA_NAME_MAX] for TCP_CONGESTION,
for TCP_CCALGOOPT use dynamically allocated *pbuf.
o For SOPT_SET TCP_CONGESTION do NULL terminating of string
taking from userland.
o For SOPT_SET TCP_CONGESTION do the search for the algorithm
keeping the inpcb lock.
o For SOPT_GET TCP_CONGESTION first strlcpy() the name
holding the inpcb lock into temporary buffer, then copyout.
Together with: lstewart
AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a
boolean flag indicating whether secure mode should be enabled. 1 means
that the program has changes its credentials during the execution.
Being exported AT_SECURE used by glibc issetugid() call.
Submitted by: imp, dchagin
Security: FreeBSD-SA-16:10.linux
Security: CVE-2016-1883
- Use taskqueue instead of swi for event handling.
- Scan the interrupt flags in filter
- Disable ringbuffer interrupt mask in filter to ensure no unnecessary
interrupts.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: adrian, sephe, Dexuan <decui microsoft com>
Approved by: adrian (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4920
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources. For
now, this is still compatible with u_long.
This is step one in migrating rman to use uintmax_t for resources instead of
u_long.
Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.
This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.
Reviewed By: jhb
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
is followed by another structure (rr_schk) whose size must be set
in the schk_datalen field of the descriptor.
Not allocating the memory may cause other memory to be overwritten
(though dn_schk is 192 bytes and rr_schk only 12 so we may be lucky
and end up in the padding after the dn_schk).
This is a merge candidate for stable and 10.3
MFC after: 3 days
Similar to dialog(3) keep_tite option used to prevent visually disturbing
initialization or exit that could occur when run from a script using
dpv(3) by way of dpv(1) in sequence with other dialog(1) invocations.
- Start vap(s) (via ieee80211_start_all()) only when initialization
succeeds; stop the first vap otherwise (via ieee80211_stop());
- Do not try to stop a device multiple times
(move (sc->sc_flags & RTWN_RUNNING) check to urtwn_stop_locked()).
Tested by: kevlo
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D5058
60 seconds, respectively. Turn them into sysctls that can be tuned live. The
default values of 5 seconds and 60 seconds have been retained.
Submitted by: Jason Wolfe (j at nitrology dot com)
Reviewed by: gnn, rrs, hiren, bz
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D5024
Files required for the NIC driver
Import from vendor-sys/alpine-hal/2.7
SVN rev.: 294828
HAL version: 2.7
Obtained from: Semihalf
Sponsored by: Annapurna Labs
- Add some new hlist macros.
- Update existing hlist macros removing the need for a temporary
iteration variable.
- Properly define the RCU hlist macros to be SMP safe with regard
to RCU.
- Safe list macro arguments by adding a pair of parentheses.
- Prefix the _list_add() and _list_splice() functions with "linux"
to reflect they are LinuxKPI internal functions.
Obtained from: Linux
MFC after: 1 week
Sponsored by: Mellanox Technologies
close or assert the bug that it is clear when leaving.
Remove an unrelated rotted comment that was attached to the buggy
clearing.
Since draining is not done in more cases, flushing is needed in more
cases, so start fixing flushing:
- do a full flush in ttydisc_close(). State what POSIX requires more
clearly. This was missing ttydevsw_pktnotify() calls to tell the
devsw layer to flush. Hardware tty drivers don't actually flush
since they don't understand this API.
- fix 2 missing wakeups in tty_flush(). Most of the wakeups here are
unnecessary for last close. But ttydisc_close() did one of the
missing ones.
This flow control bug ameliorated the design bug of requiring
potentially unbounded waits in draining. Software flow control is the
easiest way to get an unbounded wait, and a long wait is sometimes
actually useful. Users can type the xoff character on the receiver
and (if ixon is set on the sender) expect the output to be held until
the user is ready for more.
Hardware flow control can also give the unbounded wait, and this bug
didn't affect hardware flow control. Unbounded waits from hardware
flow control take a more unusual configuration. E.g., a terminal
program that controls the modem status lines, or unplugging the cable
in a configuration where this doesn't break the connection.
The design bug is still ameliorated by a newer bug in draining for
last close -- the 1 second timeout. E.g., if the user types the
xoff character and the sender reaches last close, then output is
not resumed and the wait times out after just 1 second. This is
broken, but preferable to an unbounded wait. Before this change,
the output was resumed immediately and usually completed.
Submitted by: bde
MFC after: 2 weeks
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Albert Lee <trisk@omniti.com>
Author: Steven Hartland <steven.hartland@multiplay.co.uk>
illumos/illumos-gate@2bad22584d
exceeds refquota
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Author: Dan McDonald <danmcd@omniti.com>
illumos/illumos-gate@5878fad70d
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Paul Dagnelie <pcd@delphix.com>
illumos/illumos-gate@68ecb2ec93
This allows to do a full (non-incremental send) and receive it as a clone
of an existing dataset. It can leverage nopwrite to share blocks with the
origin. This can be used to change the relationship of datasets on the
target. For example, maybe on the source you have:
A ---- B ---- C
And you have sent to the target a full of B, and the incremental B->C:
B ---- C
You later realize that you want to have A on the target. You will have to
do a full send of A, but nopwrite can save you space on the target if you
receive it as a clone of B, assuming that A and B have some blocks inxi
common:
B ---- C
\
A
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Andriy Gapon <avg@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: James Pan <jiaming.pan@yahoo.com>
illumos/illumos-gate@3502ed6e7c
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Will Andrews <will@firepipe.net>
illumos/illumos-gate@eb5bb58421
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Richard Yao <ryao@gentoo.org>
illumos/illumos-gate@c71c00bbe8
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Richard Yao <ryao@gentoo.org>
illumos/illumos-gate@5bdd995ddb
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Simon Klinkert <simon.klinkert@gmail.com>
illumos/illumos-gate@6575bca013
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Andriy Gapon <avg@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Richard Yao <ryao@gentoo.org>
illumos/illumos-gate@eaef6a96de
6292 exporting a pool while an async destroy is running can leave entries
in the deferred tree
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Fabian Keil <fk@fabiankeil.de>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
illumos/illumos-gate@a443cc80c7
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
illumos/illumos-gate@b39b744be7
This is revert of 5693.
Added hw.ix.flow_control which enables the default flow_control of all ix
interfaces to be set in loader.conf.
Added hw.ix.advertise_speed which enables the default advertised_speed of
all ix interfaces to be set in loader.conf.
Made enable_aim device independent based on hw.ix.enable_aim default.
Reviewed by: erj
MFC after: 1 week
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D5060
in elf_cpu_load_file(). The only time when the sync is needed is after
kernel module is loaded and the relocation info is processed. And it's
done in elf_cpu_load_file().
- Avoid main lock contention by trylock for if_start, if that fails,
schedule TX taskqueue for if_start
- Don't do direct sending if the packet to be sent is large, e.g.
TSO packet.
This change gives me stable 9.1Gbps TCP sending performance w/ TSO
over a 10Gbe directly connected network (the performance fluctuated
between 4Gbps and 9Gbps before this commit). It also improves non-
TSO TCP sending performance a lot.
Reviewed by: adrian, royger
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5074
intended behaviour in its man page. Simplify tty_drain() to match.
Don't call ttydevsw methods in tty_flush() if the device is gone
since we now sometimes call it then.
The flushing was supposed to be implemented by passing the FNONBLOCK
flag to VOP_CLOSE() for revoke(). The tty driver is one of the few
that can block in close and was one of the fewer that knew about this.
This almost worked in FreeBSD-1 and similarly in Net/2. These
versions only almost worked because there was and is considerable
confusion between IO_NDELAY and FNONBLOCK (aka O_NONBLOCK). IO_NDELAY
is only valid for VOP_READ() and VOP_WRITE(). For other VOPs it has
the same value as O_SHLOCK. But since vfs_subr.c and tty.c
consistently used the wrong flag and the O_SHLOCK flag is rarely set,
this mostly worked. It also gave the feature than applications could
get the non-blocking close by abusing O_SHLOCK.
This was first broken then fixed in 1995. I changed only the tty
driver to use FNONBLOCK, as a hack to get non-blocking via the normal
flag FNONBLOCK for last closes. I didn't know about revoke()'s use
of IO_NDELAY or change it to be consistent, so revoke() was broken.
Then I changed revoke() to match.
This was next broken in 1997 then fixed in 1998. Importing Lite2 made
the flags inconsistent again by undoing the fix only in vfs_subr.c.
This was next broken in 2008 by replacing everything in tty.c and not
checking any flags in last close. Other bugs in draining limited the
resulting unbounded waits to drain in some cases.
It is now possible to fix this better using the new FREVOKE flag.
Just restore flushing for revoke() for now. Don't restore or undo any
hacks for ordinary last closes yet. But remove dead code in the
1-second relative timeout (r272789). This did extra work to extend
the buggy draining for revoke() for as long as possible. The 1-second
timeout made this not very long by usually flushing after 1 second.
Submitted by: bde
MFC after: 2 weeks
boot1 to pass in files with newlines in them. Now that the EFI loader
groks foo=bar on the command line, this can allow a more general setup
than traditional boot loader args will allow.
Differential Revision: https://reviews.freebsd.org/D5038
in computing a shift index. The error was due to the use of mixed
fls() / __fls() functions in another implementation of qfq.
To avoid that the problem occurs again, properly document which
incarnation of the function we need.
Note that the bug only affects QFQ in FreeBSD head from last july, as
the patch was not merged to other versions.
The only difference between dcbzl and dcbz is dcbzl operates on native cache
line lengths regardless of L1CSR0[DCBZ32]. Since we don't change the cache line
size, the cacheline_size variable will reflect the used cache line length, and
dcbz will work as expected.
The fail point handler may sleep, but this is not permitted while holding a
rm read lock.
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
This allows, for example, UEFI pass a memory map with some ram in this
region, but for us to ignore it. This is the case when running under the
qemu virt machine type.
Sponsored by: ABT Systems Ltd
Allows for using hardware watchpoints for 1, 2, 4, 8 byte long addresses.
The default configuration of watchpoint is RW but code allows to select
RO or WO and X.
Since debugging registers are per-CPU (CP14) the watchpoint is set on
the CPU that was lucky (or not) to enter DDB.
HW breakpoints are used to perform single step in KDB.
When HW breakpoint is enabled all watchpoints are temporary disabled
to avoid recursive abort on both watchpoint and breakpoint.
In case of branch, the breakpoint is set to both - next instruction
and possible branch address. This requires at least 2 breakpoints
supported in the CPU however this is a must for ARMv6/v7 CPUs.
Reviewed by: imp
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Juniper Networks Inc.
Differential Revision: https://reviews.freebsd.org/D4037
open (in disguise as the console device). The only allowed combination
was supposed to be the callin device with the console.
Fix the assertion in ttydev_close() that was meant to detect this (it
only detected all 3 devices being open). Assert this in ttydev_open()
too.
Submitted by: bde
MFC after: 2 weeks
(TF_OPENED_CONS) were broken in r188147 by adding TF_OPENED_CONS
without updating the string. It was especially confusing to display
OPENED_CONS as GONE and BYPASS as ZOMBIE. 2 flags at the end were
not updated in r188487.
Don't print an extra 0x prefix for %p in a ddb command. In the rest
of the kernel there are more than 6000 lines with %p and only about
40 with this bug.
Print a non-extra 0x prefix for %b in a ddb command. In the rest
of the kernel, there are approx. 180 lines with %b and 2/3 of them
have this bug.
Submitted by: bde
MFC after: 2 weeks
Rename gic_v3_ instances to simply use 'gic' and 'its'.
The information about the controller's revision is printed
in the device announcement during boot anyway.
The intention behind this change is to avoid somewhat misleading
GIC instances naming such as:
gic_v30
gic_v31
...
etc.
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5016
Avoid probing GICv2m to any parent bus/driver. Instead, match
GICv2m driver with FDT complatible strings as not every GIC
has a MSI controller in the form of GICv2m extension.
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5015
Currently when the OF_getprop() function returns with error,
the caller (OF_getencprop()) still changes the buffer endiannes.
This may destroy the default value passed in the input buffer if
used on a Little Endian platform.
Reviewed by: mmel
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Cavium
Even if data cache maintenance was done by IO code, the relocation
fixup process creates dirty cache entries that we must write back
before doing icache sync.
Reported by: Thiagarajan Venkatasubramanian <tvenkata at juniper.net>
Reviewed by: ian
pmap implementations on ARM. This way minidump code can be used without
any platform specific modification.
Also, this is the last piece missing for ARM_NEW_PMAP.
Differential Revision: https://reviews.freebsd.org/D5023
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
with different requirements. In fact, first 3 don't have _any_ requirements
and first 2 does not use radix locking. On the other hand, routing
structure do have these requirements (rnh_gen, multipath, custom
to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.
So, radix code now uses tiny 'struct radix_head' structure along with
internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
Existing consumers still uses the same 'struct radix_node_head' with
slight modifications: they need to pass pointer to (embedded)
'struct radix_head' to all radix callbacks.
Routing code now uses new 'struct rib_head' with different locking macro:
RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
information base).
New net/route_var.h header was added to hold routing subsystem internal
data. 'struct rib_head' was placed there. 'struct rtentry' will also
be moved there soon.
The page information array could contain up to 32 elements (i.e. 512B).
And on network side w/ TSO, 11+ (176B+) elements, i.e. ~44K TSO packet,
in the page information array is quite common.
This saves us some cpu cycles.
Reviewed by: adrian, delphij
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4992
According to all available information, VMSWITCH always does the
TCP segment checksum verification before sending the segment to
guest.
Reviewed by: adrian, delphij, Hongjiang Zhang <honzhan microsoft com>
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4991
All used fields are setup one by one, so there is no need to zero
out this large struct.
While I'm here, move the stack variable near its usage.
Reviewed by: adrian, delphij, Jun Su <junsu microsoft com>
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4978
While I'm here, move stack variables near their usage.
Reviewed by: adrian, delphij, Jun Su <junsu microsoft com>
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4977
- Avoid unnecessary malloc/free on transmission path.
- busdma(9)-fy transmission path.
- Properly handle IFF_DRV_OACTIVE. This should fix the network
stalls reported by many.
- Properly setup TSO parameters.
- Properly handle bpf(4) tapping. This 5 times the performance
during TCP sending test, when there is one bpf(4) attached.
- Allow size of chimney sending be tuned on a running system.
Default value still needs more test to determine.
Reviewed by: adrian, delphij
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4972
* Use the ARM PLATFORM framework
* Use ARM_INTRNG on teh A20 as it has a GICv2
* Add a method to find which Allwinner SoC we are running on
Differential Revision: https://reviews.freebsd.org/D5059
Use malloc(9) for
- struct ieee80211req_wpaie2 (518 bytes, used in
ieee80211_ioctl_getwpaie())
- struct ieee80211_scan_req (128 bytes, used in setmlme_assoc_adhoc()
and ieee80211_ioctl_scanreq())
Also, drop __noinline workarounds; stack overflow is not reproducible
with recent compilers.
Tested with Clang 3.7.1, GCC 4.2.1 (from 9.3-RELEASE) and 4.9.4
(with -fstack-usage flag)
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D5041
Do not duplicate code between IEEE80211_IOC_WPAIE and IEEE80211_IOC_WPAIE2
switch cases.
Approved by: adrian (mentor)
Differential Revision: D5041 (part)
In general we don't trust any of the extended timestamps unless the
EXT2F_ROCOMPAT_EXTRA_ISIZE feature is set. However, in the case where
we freshly allocated a new inode the information is valid and it is
better to pass it along instead of leaving the value undefined.
This should have no practical effect but should reduce the amount of
garbage if EXT2F_ROCOMPAT_EXTRA_ISIZE is set, like in cases where the
filesystem is converted from ext3 to ext4.
MFC after: 4 days
support frameworks (i.e. regulators/phy/tsensors/fuses...).
It provides simple unified consumers interface for manipulations with
on-chip resets.
Reviewed by: ian, imp (paritaly)
support frameworks(i.e. reset/regulators/phy/tsensors/fuses...).
The clock framework significantly simplifies handling of complex clock
structures found in modern SoCs. It provides the unified consumers
interface, holds and manages actual clock topology, frequency and gating.
It's tested on three different ARM boards (Nvidia Tegra TK1, Inforce 6410 and
Odroid XU2) and on one MIPS board (Creator Ci20) by kan@.
The framework is still far from perfect and probably doesn't have stable
interface yet, but we want to start testing it on more real boards and
different architectures.
Reviewed by: ian, kan (earlier version)
Directory index was introduced in ext3. We don't always use the
prefix to denote the ext2 variant they belong to but when we
do we should try to be accurate.
We use i_flag to carry some flags like IN_E4INDEX which newer
ext2fs variants uses internally.
fsck.ext3 rightfully complains after our implementation tags
non-directory inodes with INDEX_FL.
Initializing i_flag during allocation removes the noise factor
and quiets down fsck.
Patch from: Damjan Jovanovic
PR: 206530
called from zfs_write() - one of them, through dmu_write(), was handled
correctly; the other wasn't.
Reviewed by: avg@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D4923
sent using roundrobin protocol and set a better granularity and distribution
among the interfaces. Tuning the number of packages sent by interface can
increase throughput and reduce unordered packets as well as reduce SACK.
Example of usage:
# ifconfig bge0 up
# ifconfig bge1 up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \
192.168.1.1 netmask 255.255.255.0
# ifconfig lagg0 rr_limit 500
Reviewed by: thompsa, glebius, adrian (old patch)
Approved by: bapt (mentor)
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D540
The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to
represent an iWARP endpoint when the connection is over TCP. For
servers the current approach is to invoke create_listen callback for
each iWARP RNIC registered with the CM. This doesn't work too well for
INADDR_ANY because a listen on any TCP socket already notifies all
hardware TOEs/RNICs of the new listener. This patch fixes the server
side of things for FreeBSD. We've tried to keep all these modifications
in the iWARP/TCP specific parts of the OFED infrastructure as much as
possible.
Submitted by: Krishnamraju Eraparaju @ Chelsio (with design inputs from Steve Wise)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D4801
data. If vnode bypass for devfs file failed, vn_read/vn_write are
called and might try to dereference f_advice. Limit the accesses to
f_advice to VREG vnodes only, which is the type ensured by
posix_fadvise().
The f_advice for regular files is protected by mtxpool lock. Recheck
that f_advice is not NULL after lock is taken.
Reported and tested by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
When ifconfig sets media then the values displayed by the advertise_speed
value are invalidated.
Fix this by setting the bits correctly including setting advertise to 0 for
media = auto.
Reviewed by: sbruno
MFC after: 1 week
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D5034
Windows 10 and Window 2016 will return all zero inquiry data for
non-existing slots. If we dispatched them, then a lot of useless
(0 sized) disks would be created. So we verify the returned inquiry
data (valid type, non-empty vendor/product/revision etc.), before
further dispatching.
Minor white space cleanup and wording fix.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: adrian, sephe, Jun Su <junsu microsoft com>
Approved by: adrian (mentor)
Modified by: sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4928
Vmbus event handler will need to find the channel by its relative
id, when software interrupt for event happens. The original lookup
searches the channel list, which is not very efficient. We now
create a table indexed by the channel relative id to speed up
the channel lookup.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: delphij, adrain, sephe, Dexuan Cui <decui microsoft com>
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4802
Simple bus space stubs require the VA-PA mapping
to be identical.
Approved by: hselasky, cognet (mentor)
Differential revision: https://reviews.freebsd.org/D4314
Remove old header from the include list and declare extern symbol
for delay() function.
Approved by: hselasky, cognet (mentor)
Differential revision: https://reviews.freebsd.org/D5012
This teaches the mx25l driver (sys/dev/flash/mx25l.c) to interact with
sys/dev/fdt/fdt_slicer.c and sys/geom/geom_flashmap.c.
This allows systems with SPI flash to benefit from the possibility to define
flash 'slices' via FDT, just the same way that it's currently possible for
CFI and NAND flashes.
Tested:
* Carambola 2, AR9331 + SPI NOR flash
PR: kern/206227
Submitted by: Stanislav Galabov <sgalabov@gmail.com>
separate file. Claim my copyright.
- Provide more comments, better function and structure names.
- Sort out unneeded includes from resulting two files.
No functional changes.
control algorithm options. The argument is variable length and is opaque
to TCP, forwarded directly to the algorithm's ctl_output method.
Provide new includes directory netinet/cc, where algorithm specific
headers can be installed.
The new API doesn't yet have any in tree consumers.
The original code written by lstewart.
Reviewed by: rrs, emax
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D711
Red Hat created STB_GNU_UNIQUE to handle certain special cases relating
to dynamically loading C++ DSOs[1].
We don't (currently) have support for STB_GNU_UNIQUE, but ought to
reserve the value in ELFNN_ST_BIND. This will also be used by an
upcoming ELF Tool Chain import.
[1] https://www.redhat.com/archives/posix-c++-wg/2009-August/msg00002.html
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
if more than 64 distinct values had been used.
Table value code uses internal objhash API which requires unique key
for each object. For value code, pointer to the actual value data
is used. The actual problem arises from the fact that 'actual' e.g.
runtime data is stored in array and that array is auto-growing. There is
special hook (update_tvalue() function) which is used to update the pointers
after the change. For some reason, object 'key' was not updated.
Fix this by adding update code to the update_tvalue().
Sponsored by: Yandex LLC
- Fix implementation of atomic_add_unless(). The atomic_cmpset_int()
function returns a boolean and not the previous value of the atomic
variable.
- The atomic counters should be signed according to Linux.
- Some minor cosmetics and styling while at it.
Reviewed by: alfred @
MFC after: 1 week
Sponsored by: Mellanox Technologies
Recover the vertical space.
Sponsored by: The FreeBSD Foundation
MFC After: 3 days
Obtained from: p4 CH=180830
Reviewed by: gnn, hiren
Differential Revision: https://reviews.freebsd.org/D4898
use of fdt_fixup_table on PowerPC and ARM. As such we can remove it from
other architectures as it's unneeded.
Reviewed by: nwhitehorn
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D5013
When processing loader.conf if console contained an entry for an unsupported
console then cons_set would return an error refusing to set any console.
This has two side effects:
1. Forth would throw a syntax error and stop processing loader.conf at that
point.
2. The value of console is ignored.
#1 Means other important loader.conf entries may not be processed, which is
clearly undesirable.
#2 Means the users preference for console aren't applied even if they did
contain valid options. Now we have support for multi boot paths from a
single image e.g. bios and efi mode the console preference needs to deal
with the need to set preference for more than one source.
Fix this by:
* Returning CMD_OK where possible from cons_set.
* Allowing set with at least one valid console to proceed.
Reviewed by: allanjude
MFC after: 1 week
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D5018
idr_alloc_cyclic() in the LinuxKPI. Bump the FreeBSD version to
force recompilation of all KLDs due to IDR structure size change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
The htree dir_index is perhaps one of the most characteristic
features of the linux ext3 implementation. It was removed
in r281670, due to repeated bug reports.
Damjan Jovanic detected and fixed three bugs and did some
stress testing by building Apache OpenOffice on top of it
so it is now in good shape to bring back.
Differential Revision: https://reviews.freebsd.org/D5007
Submitted by: Damjan Jovanovic
Reviewed by: pfg
Tested by: pho
Relnotes: Yes
MFC after: 2 months (only 10.x)
EFI was mixing caching in two separate places causing issues when multiple
partitions where tested.
Eliminate this by removing fsstat and re-factoring fsread into fsread_size,
adding basic parameter validation.
Also:
* Enhance some error print outs.
* Fix compilation under UFS1_ONLY and UFS2_ONLY
* Use sizeof on vars instead of structs.
* Add basic parameter validation to fsread_size.
MFC after: 1 week
X-MFC-With: r293268
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4989
user VM spaces while servicing jobs. Update various comments and data
structures that refer to AIO daemons as threads to refer to processes
instead.
Reviewed by: kib
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D4999
In r55943, a per-process queue of pending socket AIO requests (requests
waiting for the socket to become ready) was added so that they could be
cancelled during process rundown. In r154765, the rundown code was
changed to handle jobs in this state (JOBST_JOBQSOCK) directly removing
the need for the extra queue. However, the per-process queue head and
global lock were never removed.
Reviewed by: kib
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D4997
1. vhold and zap immediately instead of postponing few lines later
2. increment numneg after new entry is added
No functional changes.
No objections: kib
Previously the code would just increment statistics while only holding a
shared lock, in effect losing updates.
Separate tracking for nchstats is removed as values can be obtained from
existing counters. Note that some fields are updated by external
consumers and are left unfixed. This should not be a serious issue as
this structure looks quite obsolete.
No strong objections: kib
a) Look for the CPL in the payload buffer instead of the descriptor.
b) Retrieve the socket associated with the tid with the inpcb lock held.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Redo LC calibration if temperature changed significantly since last
calibration.
Tested with RTL8188EU/RTL8188CUS in STA mode.
Reviewed by: kevlo
Approved by: adrian (mentor)
Obtained from: NetBSD (mostly)
Differential Revision: https://reviews.freebsd.org/D4966
- Use bitmap for debug output selection.
- Add few new messages (one for URTWN_DEBUG_BEACON
and another one for URTWN_DEBUG_INTR).
- Replace an undocumented URTWN_DEBUG definition with USB_DEBUG.
Tested with RTL8188EU / RTL8188CUS in IBSS / HOSTAP modes.
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4959
(by default it was set to 9us).
Tested with RTL8188EU / RTL8188CUS in STA mode.
Reviewed by: kevlo
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4535