optimization.
This fixes building with gcc-4.2.1 (it doesn't support SSE4).
gas-2.17.50 [FreeBSD] supports SSE4 instructions, so this doesn't
need using .byte directives.
This fixes depending on host user headers in the kernel.
Fix user includes (don't depend on namespace pollution in <nmmintrin.h>
that is not included now).
The instrinsics had no advantages except to sometimes avoid compiler
pessimixations. clang understands them a bit better than inline asm,
and generates better looking code which also runs better for cem, but
for me it just at the same speed or slower by doing excessive
unrollowing in all the wrong places. gcc-4.2.1 also doesn't understand
what it is doing with unrolling, but with -O3 somehow it does more
unrolling that helps.
Reduce 1 of the the compiler pessimizations (copying a variable which
already satisfies an "rm" constraint in a good way by being in memory
and not used again, to different memory and accessing it there. Force
copying it to a register instead).
Try to optimize the inner loops significantly, so as to run at full
speed on smaller inputs. The algorithm is already very MD, and was
tuned for the throughput of 3 crc32 instructions per cycle found on
at least Sandybridge through Haswell. Now it is even more tuned for
this, so depends more on the compiler not rearranging or unrolling
things too much. The main inner loop for should have no difficulty
runing at full speed on these CPUs unless the compiler unrolls it too
much. However, the main inner loop wasn't even used for buffers smaller
than 24K. Now it is used for buffers larger than 384 bytes. Now it
is not so long, and the main outer loop is used more. The new
optimization is to try to arrange that the outer loop runs in parallel
with the next inner loop except for the final iteration; then reduce
the loop sizes significantly to take advantage of this.
Approved by: cem
Not tested in production by: bde
Some code was additionally moved for (future) lock splitting.
Tested with Intel 6205, STA mode.
Differential Revision: https://reviews.freebsd.org/D10106
We don't have enouch space to store full VFP context within mcontext
stucture. Due to this:
- follow i386/amd64 way and store VFP state outside of the mcontext_t
but point to it. Use the size of VFP state structure as an 'magic'
indicator of the saved VFP state presence.
- teach set_mcontext() about this external storage.
- for signal delivery, store VFP state to expanded 'struct sigframe'.
Submited by: Andrew Gierth (initial version)
PR: 217611
MFC after: 2 weeks
Kernel environment variable hw.busdma.default can take values 'bounce'
and 'dmar' and selects corresponding busdma backend as default.
Per-device environment variable hw.busdma.pci<domain>.<bus>.<slot>.<func>
takes the same values and overrides hw.busdma.default for the given device.
Note that even with hw.busdma.default=bounce, DMA translation engines
are still started if DMARs are enabled, to disable them use
hw.dmar.dma tunable, as before.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
FreeBSD uses upstream DTB for RPi3 build and compatibility string for
i2c device is different there. Add this new string to compatibility data.
Reported by: Karl Denninger
MFC after: 3 days
Do not try to use errno(2) codes here; instead, just return unique
value (1) when radio is disabled via hardware switch and another
one (-1) for any other error in initialization path.
Tested with Intel 6205, STA mode.
The change is more intrusive than I would like because the feature
requires that a vector number is written to a special register.
Thus, now the vector number has to be provided to lapic_eoi().
It was readily available in the IO-APIC and MSI cases, but the IPI
handlers required more work.
Also, we now store the VMM IPI number in a global variable, so that it
is available to the justreturn handler for the same reason.
Reviewed by: kib
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D9880
ip_forward, TCP/IPv6, and probably SCTP leaked references to L2 cache
entry because they used their own routes on the stack, not in_pcb routes.
The original model for route caching was callers that provided a route
structure to ip{,6}input() would keep the route, and this model was used
for L2 caching as well. Instead, change L2 caching to be done by default
only when using a route structure in the in_pcb; the pcb deallocation
code frees L2 as well as L3 cacches. A separate change will add route
caching to TCP/IPv6.
Another suggestion was to have the transport protocols indicate willingness
to use L2 caching, but this approach keeps the changes in the network
level
Reviewed by: ae gnn
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10059
As noted in the comment, nothing special needs to be done to destroy
the unneeded context after the allocation race, but the context memory
itself still should to be freed.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
It does not make sense since identity mapping already provides the
required mapping for RMRR ranges. More, since identity page tables do
not reflect content of map entries for id domains, creating RMRR
entries makes domain data inconsistent.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This can significantly reduce scan duration thus saving time and power.
EBS failure reported by FW disables EBS for current connection. It is
re-enabled upon new connection attempt on any WLAN interface.
Obtained from: dragonflybsd.git 89f579e9823a5c446ca172cf82bbc210d6a054a4
If this is the last running vap wait until device will be powered off
(fixes panic when 'ifconfig wlan0 destroy' is executed for running iwn(4)
interface).
Tested with:
- Intel 6205, STA mode.
- RTL8188EU, STA / IBSS modes.
- RTL8821AU, STA / HOSTAP modes.
Long ago, perhaps only on i386, kernel text was mapped read-only and
it was necessary to change the mapping to read-write to set breakpoints
in kernel text. Other writes by ddb to kernel text were also allowed.
This write protection is harder to implement with 4MB pages, and was
lost even for 4K pages when 4MB pages were implemented. So changing
the mapping became useless. It was actually worse than useless since
it followed followed various null and otherwise garbage pointers to
not change random memory instead of the mapping. (On i386s, the
pointers became good in pmap_bootstrap(), and on amd64 the pointers
became bad in pmap_bootstrap() if not before.)
Another bug broke detection of following of null pointers on i386,
except early in boot where not detecting this was a feature. When
I fixed the bug, I accidentally broke the feature and soon got traps
in db_write_bytes(). Setting breakpoints early in ddb was broken.
kib pointed out that a clean way to do the adjustment would be to use
a special [sub]map giving a small window on the bytes to be written.
The trap handler didn't know how to fix up errors for pagefaults
accessing the map itself. Such errors rarely need fixups, since most
traps for the map are for the first access which is a read.
Reviewed by: kib
crash when the file shrinks. This also fixes sendfile(2) not sending more
data in a case when the file grows, and the request is open-ended or
specifies a size that is greater than old file size.
PR: 217789
Reviewed by: gallatin
MFC after: 10 days
- in mcontext_t, rename newer used 'union __vfp' to equaly sized 'mc_spare'.
Space allocated by 'union __vfp' is too small and cannot hold full
VFP context.
- move structures defined in fp.h to more appropriate headers.
- remove all unused VFP structures.
MFC after: 2 weeks
illumos/illumos-gate@8363e80ae7https://github.com/illumos/illumos-gate/commit/8363e80ae72609660f6090766ca8c2c18https://www.illumos.org/issues/7303
This change introduces a new weighting algorithm to improve metaslab selection.
The new weighting algorithm relies on the SPACEMAP_HISTOGRAM feature. As a result,
the metaslab weight now encodes the type of weighting algorithm used
(size-based vs segment-based).
This also introduce a new allocation tracing facility and two new dcmds to help
debug allocation problems. Each zio now contains a zio_alloc_list_t structure
that is populated as the zio goes through the allocations stage. Here's an
example of how to use the tracing facility:
> c5ec000::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
MSID DVA ASIZE WEIGHT RESULT VDEV
- 0 400 0 NOT_ALLOCATABLE ztest.0a
- 0 400 0 NOT_ALLOCATABLE ztest.0a
- 0 400 0 ENOSPC ztest.0a
- 0 200 0 NOT_ALLOCATABLE ztest.0a
- 0 200 0 NOT_ALLOCATABLE ztest.0a
- 0 200 0 ENOSPC ztest.0a
1 0 400 1 x 8M 17b1a00 ztest.0a
> 1ff2400::print zio_t io_alloc_list | ::walk list | ::metaslab_trace
MSID DVA ASIZE WEIGHT RESULT VDEV
- 0 200 0 NOT_ALLOCATABLE mirror-2
- 0 200 0 NOT_ALLOCATABLE mirror-0
1 0 200 1 x 4M 112ae00 mirror-1
- 1 200 0 NOT_ALLOCATABLE mirror-2
- 1 200 0 NOT_ALLOCATABLE mirror-0
1 1 200 1 x 4M 112b000 mirror-1
- 2 200 0 NOT_ALLOCATABLE mirror-2
If the metaslab is using segment-based weighting then the WEIGHT column will
display the number of segments available in the bucket where the allocation
attempt was made.
Author: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
I got a report of this source file not building on Raspberry Pi. It's
interesting that this only fails for that target and not for others.
Again, that's no reason not to include the right headers.
PR: 217969
Reported by: Johannes Jost Meixner
MFC after: 1 week
Add support for early boot access to NVRAM variables, using a new
bhnd_nvram_data_getvar_direct() API to support zero-allocation direct
reading of NVRAM variables from a bhnd_nvram_io instance backed by the
CFE NVRAM device.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D9913
The existing ELF image activator requires the brandinfo to provide such
a string unconditionally, even if the executable format in question
doesn't use this type of branding. Skip matching when it's a null
pointer.
Reviewed by: kib
MFC after: 2 weeks
r315083 essentially reverted r263954 which was made for a good reason,
but didn't take into account AACRAID_DEBUG.
Now both types of build should be clean.
MFC after: 5 days
No MFC to: stable/10
I think this message is not very useful for end user. Also its formatting
does not match other messages printed at that time. Those who really need
this information can always find it in `camcontrol negotiate daX -v`.
MFC after: 2 weeks
This way we can avoid blocking the whole queue in the low memory
situations. It's better to sacrifice some I/O performance by not doing
the aggregation than to add an indefinite wait for more memory.
Reviewed by: smh
MFC after: 2 weeks
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D9999
This is done so that the thread state changes during the switch
are not confused with the thread state changes reported when the thread
spins on a lock.
Here is an example, three consecutive entries for the same thread (from top to
bottom):
KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"sleep", attributes: prio:84, wmesg:"-", lockname:"(null)"
KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"spinning", attributes: lockname:"sched lock 1"
KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"running", attributes: none
The above trace could leave an impression that the final state of
the thread was "running".
After this change the sleep state will be reported after the "spinning"
and "running" states reported for the sched lock.
Reviewed by: jhb, markj
MFC after: 1 week
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D9961
Newbus handles multiple equally named device classes without problems,
so there is no reason to use slightly cryptic "<foo>_shdci" for them.
In contrast, the driver module name must be unique, so "<foo>_shdci"
is the right name for it.
* All the supported firmwares have these flags set.
* This removes the following flags:
IWM_UCODE_TLV_FLAGS_PM_CMD_SUPPORT,
IWM_UCODE_TLV_FLAGS_NEWBT_COEX,
IWM_UCODE_TLV_FLAGS_BF_UPDATED,
IWM_UCODE_TLV_FLAGS_D3_CONTINUITY_API,
IWM_UCODE_TLV_FLAGS_STA_KEY_CMD,
IWM_UCODE_TLV_FLAGS_DEVICE_PS_CMD,
IWM_UCODE_TLV_FLAGS_SCHED_SCAN,
IWM_UCODE_TLV_FLAGS_RX_ENERGY_API,
IWM_UCODE_TLV_FLAGS_TIME_EVENT_API_V2
* Also remove definitions and code for dealing with the v1 time-event api.
* Remove unneeded calc_rssi() function.
Obtained from: dragonflybsd.git d078c812418d0e2c3392e99fa25fc776d07bdfad
matches static binaries.
Interpretation of the 'static' there is that the binary must not
specify an interpreter. In particular, shared objects are matched by
the brand if BI_CAN_EXEC_DYN is also set.
This improves precision of the brand matching, which should eliminate
surprises due to brand ordering.
Revert r315701.
Discussed with and tested by: ed (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Prevent possible races in the pf_unload() / pf_purge_thread() shutdown
code. Lock the pf_purge_thread() with the new pf_end_lock to prevent
these races.
Use a shared/exclusive lock, as we need to also acquire another sx lock
(VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload()
to sleep,
Pointed out by: eri, glebius, jhb
Differential Revision: https://reviews.freebsd.org/D10026
Similar to the change for sendmsg(), create a pointer size independent
implementation of recvmsg() and let cloudabi32 and cloudabi64 call into
it. In case userspace requests one or more file descriptors, call
kern_recvit() in such a way that we get the control message headers in
an mbuf. Iterate over all of the headers and copy the file descriptors
to userspace.
CloudABI executables are statically linked and don't have an
interpreter. Setting the interpreter path to NULL used to work
previously, but r314851 introduced code that checks the string
unconditionally. Running CloudABI executables now causes a null pointer
dereference.
Looking at the rest of imgact_elf.c, it seems various other codepaths
already leaned on the fact that the interpreter path is set. Let's just
go ahead and pick an obviously incorrect interpreter path to appease
imgact_elf.c.
MFC after: 1 week
Reduce the potential amount of code duplication between cloudabi32 and
cloudabi64 by creating a cloudabi_sock_recv() utility function. The
cloudabi32 and cloudabi64 modules will then only contain code to convert
the iovecs to the native pointer size.
In cloudabi_sock_recv(), we can now construct an SCM_RIGHTS cmsghdr in
an mbuf and pass that on to kern_sendit().
This will provide a slightly better smoking gun than just stating
"can't remove non-dynamic nodes!" when calling sysctl_ctx_free(9)
and sysctl_remove_{name,oid}(9) with a non-dynamic (likely
static) sysctl.
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Because of integer types, the timeout calculation result was limited to
INT_MAX / (1000 * hz) seconds. For systems with hz=10000, this is only 215
seconds. Perform the calculation with 64-bit math to allow sleeping for the
full INT_MAX / hz interval (215000 seconds on such hz=10000 systems).
Submitted by: Scott Ferris <sferris at isilon.com>
Sponsored by: Dell EMC Isilon
We must ensure there's space for the terminating null in the temporary
buffer in imgact_binmisc_populate_interp().
Note that there's no buffer overflow here because xbe->xbe_interpreter's
length and null termination is checked in imgact_binmisc_add_entry()
before imgact_binmisc_populate_interp() is called. However, the latter
should correctly enforce its own bounds.
Reviewed by: sbruno
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10042
Let firmware do its best first, and if it can't, try software recovery.
I would remove software timeout handler completely, but found bunch of
complains on command timeout on sparc64 mailing list few years ago, so
better be safe in case of interrupt loss.
MFC after: 2 weeks
For three years now CAM does not use SIM lock, but still enforces SIM to
use it. Remove this requirement, allowing SIMs to have any locking they
prefer, if they pass no mutex to cam_sim_alloc().
MFC after: 2 weeks
This is a painful change, but it is needed. On the one hand, we avoid
modifying them, and this slows down some ideas, on the other hand we still
eventually modify them and tools like netstat(1) never work on next version of
FreeBSD. We maintain a ton of spares in them, and we already got some ifdef
hell at the end of tcpcb.
Details:
- Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
- Make struct xinpcb, struct xtcpcb pure API structures, not including
kernel structures inpcb and tcpcb inside. Export into these structures
the fields from inpcb and tcpcb that are known to be used, and put there
a ton of spare space.
- Make kernel and userland utilities compilable after these changes.
- Bump __FreeBSD_version.
Reviewed by: rrs, gnn
Differential Revision: D10018
Since the uset can set dhcp.interface-mtu, we need to try to validate the
value. So we verify if the conversion to int is successful and we will not
allow to set value greater than max IPv4 packet size.
Also use snprintf for safety.
Reviewed by: allanjude, bapt
Approved by: allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D8492
In r315408, disk_cleanup was removed, which is called at
sys/boot/userboot/userboot/userboot_disk.c:113.
This causes bhyveload to fail.
PR: 217935
Reported by: Fabian Freyer
Reviewed by: allanjude
Approved by: allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D10060
vm_pindex_t's into a vm_ooffset_t.
The length given to shm_dotruncate() must never be negative. Assert this.
Tidy up a comment.
Reviewed by: kib
MFC after: 1 week
mmc(4). For the most part, this consists of support for:
- Switching the signal voltage (VCCQ) to 1.8 V or (if supported
by the host controller) to 1.2 V,
- setting the UHS mode as appropriate in the SDHCI_HOST_CONTROL2
register,
- setting the power class in the eMMC device according to the
core supply voltage (VCC),
- using different bits for enabling a bus width of 4 and 8 bits
in the the eMMC device at DDR or higher timings respectively,
- arbitrating timings faster than high speed if there actually
are additional devices on the same MMC bus.
Given that support for DDR52 is not denoted by SDHCI capability
registers, availability of that timing is indicated by a new
quirk SDHCI_QUIRK_MMC_DDR52 and only enabled for Intel SDHCI
controllers so far. Generally, what it takes for a sdhci(4)
front-end to enable support for DDR52 is to hook up the bridge
method mmcbr_switch_vccq (which especially for 1.2 V signaling
support is chip/board specific) and the sdhci_set_uhs_timing
sdhci(4) method.
As a side-effect, this change also fixes communication with
some eMMC devices at SDR high speed mode with 52 MHz due to
the signaling voltage and UHS bits in the SDHCI controller no
longer being left in an inappropriate state.
Compared to 52 MHz at SDR high speed which typically yields
~45 MB/s with the eMMC chips tested, throughput goes up to
~80 MB/s at DDR52.
Additionally, this change already adds infrastructure and quite
some code for modes up to HS400ES and SDR104 respectively (I did
not want to add to much stuff at a time, though). Essentially,
what is still missing in order to be able to activate support
for these latter is is support for and handling of (re-)tuning.
o In sdhci(4), add two tunables hw.sdhci.quirk_clear as well as
hw.sdhci.quirk_set, which (when hooked up in the front-end)
allow to set/clear sdhci(4) quirks for debugging and testing
purposes. However, especially for SDHCI controllers on the
PCI bus which have no specific support code so far and, thus,
are picked up as generic SDHCI controllers, hw.sdhci.quirk_set
allows for setting the necessary quirks (if required).
o In mmc(4), check and handle the return values of some more
function calls instead of assuming that everything went right.
In case failures actually are not problematic, indicate that
by casting the return value to void.
Reviewed by: jmcneill
This should reduce overhead for aggregates (since every second frame
clears the queue and reschedules the task there is no need to cancel
the callout here; let it just run once at the end - even if queue is
empty).
Reported by: adrian
calculated at runtime based on how long it takes to set up an event in
hardware. This fixes the intermittant 1-minute hang at boot on imx5
systems, and also the occasional oversleeping while running. It doesn't
affect imx6 systems, which use different hardware for eventtimers.
It turns out that it usually takes about 30 timer ticks to set up the timer
compare register, and the old hard-coded minimum period was 10 ticks. On
the rare occasions when a timeout event that short was set up, we'd miss
the event and have to wait about 64 seconds for counter rollover before
the compare interrupt would fire.
Instead of just hardcoding a new bigger value, the code now measures the
time it takes to do the register read/write sequence to set up the compare
register, scales it up by 1.5x to be safe, and calculates the minimum event
period from the result. In the real world, the minimum period works out to
about 750 nanoseconds on imx5 hardware.
It turns out to be surprisingly expensive to access the gpt hardware (on the
order of 150ns per read/write). To cut down on the overhead of setting up
each eventtimer event, eliminate read-modify-write sequences to manage the
compare interrupt enable, by keeping a shadow copy of the hardware register
and only writing to the hardware when the enable bits really change.
This should allow to drop 'ieee80211_ff_[age/flush]' calls from drivers
(an additional call can be made from ieee80211_tx_complete()
for non-default ieee80211_ffagemax values to prevent stalls -
but it will require an additional counter for transmitted frames).
Tested with RTL8821AU, STA mode (A-MSDU part only).
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D9984
Simplify the logic for clipping the range returned by the pager to fit
within the map entry.
Use atop() rather than OFF_TO_IDX() on addresses.
Reviewed by: kib
MFC after: 1 week
For 24xx and above use 2 vectors (default and response queue).
For 26xx and above use 3 vectors (default, response and ATIO queues).
Due to global lock interrupt hardlers never run simultaneously now, but
at least this allows to save one regitster read per interrupt.
MFC after: 2 weeks
When re-calculating the last inclusive page index after the pager
call, -1 was erronously ommitted. If the pager extended the run
(unlikely), the result would be insertion of the valid page mapping
outside the current map entry range.
Found by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Since we support RQSTYPE_RPT_ID_ACQ, that functionality is only useful
in loop mode, which probably doesn't worth having this hack in 2017.
MFC after: 2 weeks
When I initially did this 11n TX work in days of yonder, my 802.11 standards
clue was ... not as finely tuned. One of the things in 802.11-2012 (which
I guess technically was after I did this work, but I'm sure it was like this
in the previous rev?) is that among other traffic classes, three things are
important:
* group addressed frames should be default non-QoS, even if they're QoS frames, and
* group addressed frames should have a seqno out of a different space than the
per-TID QoS one; and because of this
* group addressed frames, being non-QoS, should never be in the Block-ACK window
for TX.
Now, net80211 and now this code cheats by using the non-QOS TID, but ideally
we'd introduce a separate seqno space just for multicast/group traffic for
TX and RX comparison.
Later extensions (eg reliable multicast / multimedia) express what one should do
when doing multicast traffic in a TID. Now, technically we /could/ do group traffic
as QoS traffic and throw it into a per-TID seqno space, but this definitely
introduces ordering issues when you take into account things like CABQ behaviour.
(Ie, if some traffic in the TID goes into the CABQ and some doesn't, because
it's doing a split of multicast and non-multicast traffic, then you have
seqno ordering issues.)
So, until someone implements 802.11vv reliable multicast / multimedia extensions,
group traffic is non-QoS.
Next, software/hardware queue TID mapping. In the past I believed the WME tagging
of frames because well, net80211 had a habit of tagging things like management
traffic with it. But, then we also map QoS traffic categories to TIDs as well.
So, we should obey the TID! But! then it put some management traffic into higher
WME categories too, as those frames don't have QoS TIDs. But! It'd do things like
put things like QoS action frames into higher WME categories, when they should
be kept in-order with the rest of the traffic for that TID. So! Given all of this,
the ath(4) driver does overrides to not trust the WME category.
I .. am undoing some of this. Now, the TID has a 1:1 mapping to the hardware
queue. The TID is the primary source of truth now for all QoS traffic.
The WME is only used for non-QoS traffic. This now means that any TID traffic
queued should be consistently queued regardless of WME, so things like the
"TX finished, do more TX" that is occuring right now for transmit handling
should be "better".
The consistent {TID, WME} -> hardware queue mapping is important for
transmit completion. It's used to schedule more traffic for that
particular TID, because that {many TID}:{1 TXQ} mapping in ath_tx_tid_sched()
is used for driving completion. Ie, when the hardware queue completes,
it'll walk that list of scheduled TIDs attached to that TXQ.
The eventual aim is to get ready for some other features around putting
some data into other hardware queues (eg for better PS-POLL support,
uAPSD, support, correct-er TDMA support, etc) which requires that
I tidy all of this up in preparation for then introducing further
TID scheduling that isn't linked to a hardware TXQ (likely a per-WME, per-TID
driver queue, and a per-node driver queue) to enable that.
Tested:
* AR9380, STA mode
* AR9380, AR9580, AP mode
cleanups enabled by that:
- The only thing left in imx_gptvar.h was the softc, which IMO never
should have been in there at all. Move it into the driver, and
delete the header file.
- Remove several unneeded #includes from the driver.
- Change imx_gpt_softc from global to static (it's used by DELAY()), and
don't redundantly static-initialize it to NULL.
In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out
of a different interface. pf_test6() needs to know that the packet was
forwarded (in case it needs to refragment so it knows whether to call
ip6_output() or ip6_forward()).
This lead pf_test6() to try to evaluate rules against the PF_FWD
direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT.
Once fwdir is set correctly the correct output/forward function will be
called.
PR: 217883
Submitted by: Kajetan Staszkiewicz
MFC after: 1 week
Sponsored by: InnoGames GmbH
Add a clock_nanosleep() syscall, as specified by POSIX.
Make nanosleep() a wrapper around it.
Attach the clock_nanosleep test from NetBSD. Adjust it for the
FreeBSD behavior of updating rmtp only when interrupted by a signal.
I believe this to be POSIX-compliant, since POSIX mentions the rmtp
parameter only in the paragraph about EINTR. This is also what
Linux does. (NetBSD updates rmtp unconditionally.)
Copy the whole nanosleep.2 man page from NetBSD because it is complete
and closely resembles the POSIX description. Edit, polish, and reword it
a bit, being sure to keep any relevant text from the FreeBSD page.
Reviewed by: kib, ngie, jilles
MFC after: 3 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10020
We should never end up executing the inter-function padding, so we
are better off faulting than silently carrying on to whatever function
happens to be next.
Note that LLD will soon do this by default (although it currently pads
with zeros).
Reviewed by: dim, kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10047
Typically, when elf_load_section() unconditionally passed VM_PROT_ALL to
elf_map_insert(), it was needlessly enabling execute access on the
mapping, and it would later have to call vm_map_protect() to correct the
mapping's access rights. Now, instead, elf_load_section() always passes
its parameter "prot" to elf_map_insert(). So, elf_load_section() must
only call vm_map_protect() if it needs to remove the write access that
was temporarily granted to perform a copyout().
Reviewed by: kib
MFC after: 1 week
- PIE_MAX_PROB is compared to variable of int64_t and the type promotion
rules can cause the value of that variable to be treated as unsigned.
If the value is actually negative, then the result of the comparsion
is incorrect, causing the algorithm to perform poorly in some
situations. Changing the constant to be signed cause the comparision
to work correctly.
- PIE_SCALE is also compared to signed values. Fortunately they are
also compared to zero and negative values are discarded so this is
more of a cosmetic fix.
- PIE_DQ_THRESHOLD is only compared to unsigned values, but it is small
enough that the automatic promotion to unsigned is harmless.
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 1 week
nanosleep() updates rmtp on EINVAL. In that case, kern_nanosleep()
has not updated rmt, so sys_nanosleep() updates the user-space rmtp
by copying garbage from its stack frame. This is not only a kernel
memory disclosure, it's also not POSIX-compliant. Fix it to update
rmtp only on EINTR.
Reviewed by: jilles (via D10020), dchagin
MFC after: 3 days
Security: possibly
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10044
There were two copies of the code: one in generic code was half-broken, and
another in platform code was never called. Leave only one in generic code
and working.
MFC after: 2 weeks
supply the addresses for the DPLL register blocks) by hard-coding the
addresses in the driver source code. Yes, this is just as bad an idea as
it sounds, but we have no choice.
In the early days of using fdt data, when we were making up our own data
for each board, we defined 4 sets of memory mapped registers in the data.
The vendor-supplied data only provides the address of the CCM register
block, but not the 3 DPLL blocks. The linux driver has the DPLL physical
addresses (which differ by SOC type) hard-coded in the driver, and we
have no choice but to do the same thing if we want to run with the vendor-
supplied fdt data.
So now we use bus_space_map() to make the DPLL blocks accessible, choosing
the set of fixed addresses to map based on the soc id.
for vt. Restore syscons' rendering of background (bg) brightness as
foreground (fg) blinking and vice versa, and add rendering of blinking
as background brightness to vt.
Bright/saturated is conflated with light/white in the implementation
and in this description.
Bright colors were broken in all cases, but appeared to work in the
only case shown by "vidcontrol show". A boldness hack was applied
only in 1 layering-violation place (for some syscons sequences) where
it made some cases seem to work but was undone by clearing bold using
ANSI sequences, and more seriously was not undone when setting
ANSI/xterm dark colors so left them bright. Move this hack to drivers.
The boldness hack is only for fg brightness. Restore/add a similar hack
for bg brightness rendered as fg blinking and vice versa. This works
even better for vt, since vt changes the default text mode to give the
more useful bg brightness instead of fg blinking.
The brightness bit in colors was unnecessarily removed by the boldness
hack. In other cases, it was lost later by teken_256to8(). Use
teken_256to16() to not lose it. teken_256to8() was intended to be
used for bg colors to allow finer or bg-specific control for the more
difficult reduction to 8; however, since 16 bg colors actually work
on VGA except in syscons text mode and the conversion isn't subtle
enough to significantly in that mode, teken_256to8() is not used now.
There are still bugs, especially in vidcontrol, if bright/blinking
background colors are set.
Restore XOR logic for bold/bright fg in syscons (don't change OR
logic for vt). Remove broken ifdef on FG_UNDERLINE and its wrong
or missing bit and restore the correct hard-coded bit. FG_UNDERLINE
is only for mono mode which is not really supported.
Restore XOR logic for blinking/bright bg in syscons (in vt, add
OR logic and render as bright bg). Remove related broken ifdef
on BG_BLINKING and its missing bit and restore the correct
hard-coded bit. The same bit means blinking or bright bg depending
on the mode, and we want to ignore the difference everywhere.
Simplify conversions of attributes in syscons. Don't pretend to
support bold fonts. Don't support unusual encodings of brightness.
It is as good as possible to map 16 VGA colors to 16 xterm-16
colors. E.g., VGA brown -> xterm-16 Olive will be converted back
to VGA brown, so we don't need to convert to xterm-256 Brown. Teken
cons25 compatibility code already does the same, and duplicates some
small tables. This is mostly for the sc -> te direction. The other
direction uses teken_256to16() which is too generic.
in practice).
db_expr_t is a signed type, but right shifts are fudged to evaluate
them in an unsigned type, and the unsigned type was broken by hard-
coding it as 'unsigned', so casting to it lost the top bits on arches
with db_expr_t larger than u_int.
The unsigned type with the same size as db_expr_t is not declared;
assume that db_addr_t gives it. Fixing this properly is less important
than using the correct type for db_expr_t (originally always long for
C90, but always intmax_t since C99).
Rules are unlinked in shutdown_pf(), so we must call
pf_unload_vnet_purge(), which frees unlinked rules, after that, not
before.
Reviewed by: eri, bz
Differential Revision: https://reviews.freebsd.org/D10040
This adds support for matching against a core lookup table when performing
early boot core lookup, and includes the BCM4706/Northstar-specific
ChipCommon core ID in the set of supported ChipCommon cores.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D10033
Extend the Book-E pmap to support 64-bit operation. Much of this was taken from
Juniper's Junos FreeBSD port. It uses a 3-level page table (page directory
list -- PP2D, page directory, page table), but has gaps in the page directory
list where regions will repeat, due to the design of the PP2D hash (a 20-bit gap
between the two parts of the index). In practice this may not be a problem
given the expanded address space. However, an alternative to this would be to
use a 4-level page table, like Linux, and possibly reduce the available address
space; Linux appears to use a 46-bit address space. Alternatively, a cache of
page directory pointers could be used to keep the overall design as-is, but
remove the gaps in the address space.
This includes a new kernel config for 64-bit QorIQ SoCs, based on MPC85XX, with
the following notes:
* The DPAA driver has not yet been ported to 64-bit so is not included in the
kernel config.
* This has been tested on the AmigaOne X5000, using a MD_ROOT compiled in
(total size kernel+mdroot must be under 64MB).
* This can run both 32-bit and 64-bit processes, and has even been tested to run
a 32-bit init with 64-bit children.
Many thanks to stevek and marcel for getting Juniper's FreeBSD patches open
sourced to be used here, and to stevek for reviewing, and providing some
historical contexts on quirks of the code.
Reviewed by: stevek
Obtained from: Juniper (in part)
MFC after: 2 months
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D9433
For instance, in the dtrace/dtrace module, building dtrace_asm.o wants
to build genassym.o first, but it doesn't build the missing ilinks
and if_*.h headers which are part of the OBJS_DEPEND_GUESS list
of dependencies to build if a .depend file is missing.
MFC after: 1 week
Sponsored by: Dell EMC Isilon
The objects that may be in the dependency graph may not match
${OBJS}. Ensure the ilink link is added as a dependency for
all of them when a .depend file is missing for that objfile.
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
sys/netinet6/icmp6.c
Use the interface's FIB for source address selection in ICMPv6 error
responses.
sys/netinet6/in6.c
In in6_newaddrmsg, announce arrival of local addresses on the
interface's FIB only. In in6_lltable_rtcheck, use a per-fib ND6
cache instead of a single cache.
sys/netinet6/in6_src.c
In in6_selectsrc, use the caller's fib instead of the default fib.
In in6_selectsrc_socket, remove a superfluous check.
sys/netinet6/nd6.c
In nd6_lle_event, use the interface's fib for routing socket
messages. In nd6_is_new_addr_neighbor, check all FIBs when trying
to determine whether an address is a neighbor. Also, simplify the
code for point to point interfaces.
sys/netinet6/nd6.h
sys/netinet6/nd6.c
sys/netinet6/nd6_rtr.c
Make defrouter_select fib-aware, and make all of its callers pass in
the interface fib.
sys/netinet6/nd6_nbr.c
When inputting a Neighbor Solicitation packet, consider the
interface fib instead of the default fib for DAD. Output NS and
Neighbor Advertisement packets on the correct fib.
sys/netinet6/nd6_rtr.c
Allow installing the same host route on different interfaces in
different FIBs. If rt_add_addr_allfibs=0, only install or delete
the prefix route on the interface fib.
tests/sys/netinet/fibs_test.sh
Clear some expected failures, but add a skip for the newly revealed
BUG217871.
PR: 196361
Submitted by: Erick Turnquist <jhujhiti@adjectivism.org>
Reported by: Jason Healy <jhealy@logn.net>
Reviewed by: asomers
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9451
functions in the LinuxKPI. Add a usage atomic to the task_struct
structure to facilitate refcounting the task structure when returned
from get_pid_task(). The get_task_struct() and put_task_struct()
function is used to manage atomic refcounting. After this change the
task_struct should only be freed through put_task_struct().
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
On the original i386, %dr[4-5] were unimplemented but not very clearly
reserved, so debuggers read them to print them. i386 was still doing
this.
On the original athlon64, %dr[4-5] are documented as reserved but are
aliased to %dr[6-7] unless CR4_DE is set, when accessing them traps.
On 2 of my systems, accessing %dr[4-5] trapped sometimes. On my Haswell
system, the apparent randomness was because the boot CPU starts with
CR4_DE set while all other CPUs start with CR4_DE clear. FreeBSD
doesn't support the data breakpoints enabled by CR4_DE and it never
changes this flag, so the flag remains different across CPUs and
the behaviour seemed inconsistent except while booting when the CPU
doesn't change.
The invalid accesses broke:
- read access for printing the registers in ddb "show watches" on CPUs
with CR4_DE set
- read accesses in fill_dbregs() on CPUs with CR4_DE set. This didn't
implement panic(3) since the user case always skipped %dr[4-5].
- write accesses in set_dbregs(). This also didn't affect userland.
When it didn't trap, the aliasing made it fragile.
Don't print the dummy (zero) values of %dr[4-5] in "show watches" for
i386 or amd64. Fix style bugs near this printing.
amd64 also has space in the dbregs struct for the reserved %dr[8-15]
and already didn't print the dummy values for these, and never accessed
any of the 10 reserved debug registers.
Remove cpufuncs for making the invalid accesses. Even amd64 had these.
It seems to be old code from the armv6 project branch that never had a
kernel config.
Reviewed by: mmel
Sponsored by: ABT Systems Lrd
Differential Revision: https://reviews.freebsd.org/D7166
nodes from the DTB by default. This will allow us to enumerate the CPUs
without hard coding the CPU count into code.
Reviewed by: br
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D9827
As ZFS can request up to SPA_MAXBLOCKSIZE memory block e.g. during zfs recv,
update the threshold at which we start agressive reclamation to use
SPA_MAXBLOCKSIZE (16M) instead of the lower zfs_max_recordsize which
defaults to 1M.
PR: 194513
Reviewed by: avg, mav
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D10012
be migrated to this and will allow the removal of this option.
Reviewed by: ian
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D9907
some associated helper functions in the LinuxKPI. Let the existing
linux_alloc_current() function allocate and initialize the new
structure and let linux_free_current() drop the refcount on the memory
mapping structure. When the mm_struct's refcount reaches zero, the
structure is freed.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
registers.
- Add slot type capability bits. These bits should allow recognizing
removable card slots, embedded cards and shared buses (shared bus
supposedly is always comprised of non-removable cards).
- Dump CAPABILITIES2, ADMA_ERR, HOST_CONTROL2 and ADMA_ADDRESS_LO
registers in sdhci_dumpregs().
- The drive type support flags in the CAPABILITIES2 register are for
drive types A,C,D, drive type B is the default setting (value 0) of
the drive strength field in the SDHCI_HOST_CONTROL2 register.
Obtained from: DragonFlyBSD (9e3c8f63, 455bd1b1)
the default partition, eMMC v4.41 and later devices can additionally
provide up to:
1 enhanced user data area partition
2 boot partitions
1 RPMB (Replay Protected Memory Block) partition
4 general purpose partitions (optionally with a enhanced or extended
attribute)
Of these "partitions", only the enhanced user data area one actually
slices the user data area partition and, thus, gets handled with the
help of geom_flashmap(4). The other types of partitions have address
space independent from the default partition and need to be switched
to via CMD6 (SWITCH), i. e. constitute a set of additional "disks".
The second kind of these "partitions" doesn't fit that well into the
design of mmc(4) and mmcsd(4). I've decided to let mmcsd(4) hook all
of these "partitions" up as disk(9)'s (except for the RPMB partition
as it didn't seem to make much sense to be able to put a file-system
there and may require authentication; therefore, RPMB partitions are
solely accessible via the newly added IOCTL interface currently; see
also below). This approach for one resulted in cleaner code. Second,
it retains the notion of mmcsd(4) children corresponding to a single
physical device each. With the addition of some layering violations,
it also would have been possible for mmc(4) to add separate mmcsd(4)
instances with one disk each for all of these "partitions", however.
Still, both mmc(4) and mmcsd(4) share some common code now e. g. for
issuing CMD6, which has been factored out into mmc_subr.c.
Besides simply subdividing eMMC devices, some Intel NUCs having UEFI
code in the boot partitions etc., another use case for the partition
support is the activation of pseudo-SLC mode, which manufacturers of
eMMC chips typically associate with the enhanced user data area and/
or the enhanced attribute of general purpose partitions.
CAVEAT EMPTOR: Partitioning eMMC devices is a one-time operation.
- Now that properly issuing CMD6 is crucial (so data isn't written to
the wrong partition for example), make a step into the direction of
correctly handling the timeout for these commands in the MMC layer.
Also, do a SEND_STATUS when CMD6 is invoked with an R1B response as
recommended by relevant specifications. However, quite some work is
left to be done in this regard; all other R1B-type commands done by
the MMC layer also should be followed by a SEND_STATUS (CMD13), the
erase timeout calculations/handling as documented in specifications
are entirely ignored so far, the MMC layer doesn't provide timeouts
applicable up to the bridge drivers and at least sdhci(4) currently
is hardcoding 1 s as timeout for all command types unconditionally.
Let alone already available return codes often not being checked in
the MMC layer ...
- Add an IOCTL interface to mmcsd(4); this is sufficiently compatible
with Linux so that the GNU mmc-utils can be ported to and used with
FreeBSD (note that due to the remaining deficiencies outlined above
SANITIZE operations issued by/with `mmc` currently most likely will
fail). These latter will be added to ports as sysutils/mmc-utils in
a bit. Among others, the `mmc` tool of the GNU mmc-utils allows for
partitioning eMMC devices (tested working).
- For devices following the eMMC specification v4.41 or later, year 0
is 2013 rather than 1997; so correct this for assembling the device
ID string properly.
- Let mmcsd.ko depend on mmc.ko. Additionally, bump MMC_VERSION as at
least for some of the above a matching pair is required.
- In the ACPI front-end of sdhci(4) describe the Intel eMMC and SDXC
controllers as such in order to match the PCI one.
Additionally, in the entry for the 80860F14 SDXC controller remove
the eMMC-only SDHCI_QUIRK_INTEL_POWER_UP_RESET.
OKed by: imp
Submitted by: ian (mmc_switch_status() implementation)
We should be more verbose about read errors from biosdisk, except filter
out the floppy controller errors, which apparently are resulting from
read attempt from device without the media present.
Reviewed by: allanjude
Approved by: allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D10032
pairwise to support the FreeBSD way of pushing and popping the page
fault flags. Ensure this by requiring every occurrence of pagefault
disable function call to have a corresponding pagefault enable call.
Obtained from: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
codes. This will be used to fix bright colors.
Improve teken_256to8(). Use a lookup table instead of calculations. The
calculations were inaccurate since they used indexes into the xterm-256
6x6x6 color map instead of actual xterm colors. Also, change the threshold
for converting to a primary color: require the primary's component to be
2 or more higher instead of just higher. This affects about 1/5 of the
table entries and gives uniformly distributed colors in the 6x6x6 submap
except for greys (35 entries each for red, green, blue, cyan, brown and
magenta, instead of approx. only 15 each for the mixed colors). Even
more mixed colors would be better for matching colors, but uniform
distribution is best for preserving contrast.
For teken_256to16(), bright colors are just the ones with luminosity >=
60%. These are actually light colors (more white instead of more
saturation), while xterm bright colors except for white itself are
actually saturated with no white, so have luminosity only 50%.
These functions are layering violations. teken cannot do correct
conversions since it shouldn't know the color maps of anything except
xterm. Translating through xterm-16 colors loses information. This
gives bugs like xterm-256 near-brown -> xterm-16 red -> VGA red.
The ptrace() user has the option of discarding the signal. In such a
case, p_ptevents should not be modified. If the ptrace() user decides to
send a SIGKILL, ptevents will be cleared in ptracestop(). procfs events
do not have the capability to discard the signal, so continue to clear
the mask in that case.
Reviewed by: jhb (initial revision)
MFC after: 1 week
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9939
As we provide the disk size verification and correction via disk_ioctl
and disk state provided by disk_open(), we can not share the partition
state in disk_devdesc structure. Also the sharing does make a lot of sense
with ufs, as only one partition is open at any given time, but zfs pools
do keep the disk devices open.
To make sure we do get the correct information about the open device,
just remove the cache.
Reviewed by: allanjude, smh
Approved by: allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D9757
Support is implemented by mapping Linux's "struct net" into FreeBSD's
"struct vnet". Currently only vnet0 is supported by ibcore.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Ignore them like it's done in the MADT parser. This allows booting on a box
with SRAT and APIC IDs > 255.
Reported by: Wei Liu <wei.liu2@citrix.com>
Tested by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Citrix Systems R&D
Fix this by using more dynamic initialization with simpler ifdefs for
the machine dependencies. Find a frame buffer address in a more
portable way that at least compiles on sparc64.
Otherwise, recent Linux guests will use these instructions, resulting
in #UD exceptions since bhyve doesn't implement MONITOR/MWAIT exits.
This fixes boot-time hangs in recent Linux guests on Ryzen CPUs
(and probably Bulldozer aka AMD FX as well).
Reviewed by: kib
MFC after: 1 week
While there, parse u-boot provided command line arguments
for supported switches and update boothowto appropriately.
Also support setting kenv variables from the kernel comman
line.
PR: 216831 (modified)
The boot1.efi immediate issue from PR216964 is that we are reading into
too small buffer, from UEFI spec 2.6:
The size of the Buffer in bytes. This must be a multiple of the intrinsic block size of the device.
The secondary issue is that LBA calculation does not check reminder from
division.
This fix does check the provided buffer size and if we read less than
media sector size or the read offset is not aligned to sector boundary,
we allocate bounce buffer and perform the read by single sector.
PR: 216964
Reported by: Sergey Kozlov
Reviewed by: allanjude, Sergey Kozlov
Approved by: allanjude (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D9870
inpcb. t4_tom detaches the inpcb from the toepcb as soon as the
hardware is done with the connection (in final_cpl_received) but the
socket is around as long as the cm_id and the rest of iWARP state is.
This fixes an intermittent NULL dereference during abort.
Submitted by: KrishnamRaju ErapaRaju @ Chelsio
MFC after: 3 days
Sponsored by: Chelsio Communications
This patch fixes typo which results in extra bits of PMU control register.
PR: 217782
Submitted by: Svyatoslav <razmyslov at viva64.com>
Found by: PVS-Studio
uma_zcreate()'s alignment argument is supposed to be sizeof(foo) - 1,
and uma.h provides a set of helper macros for common types. Passing
sizeof(void *) results in all of the members being misaligned triggering
unaligned access faults on certain architectures (notably MIPS).
Reported by: brooks
Obtained from: CheriBSD
MFC after: 3 days
Sponsored by: DARPA / AFRL