Passing "-x lazyload" to dtrace -G during compilation causes dtrace(1) to
not link drti.o into the output object file, so the USDT probes are not created
during process startup. Instead, dtrace(1) will automatically discover and
create probes on the process' behalf when attaching.
Differential Revision: https://reviews.freebsd.org/D2203
Reviewed by: rpaulo
MFC after: 1 month
dmar_map_entry. Non-zero offset both increases the required mapping
size, which is handled in dmar_bus_dmamap_load_something1(), and makes
it possible that allocated range crosses boundary, which needs a check
in dmar_gas_match_one().
Reported and tested by: jimharris
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This significantly improves performance on multi-core servers where there
is any kind of IPv4 reassembly going on.
glebius@ would like to see the locking moved to be attached to the reassembly
bucket, which would make it per-bucket + per-VNET, instead of being global.
I decided to keep it global for now as it's the minimal useful change;
if people agree / wish to migrate it to be per-bucket / per-VNET then please
do feel free to do so. I won't complain.
Thanks to Norse Corp for giving me access to much larger servers
to test this at across the 4 core boxes I have at home.
Differential Revision: https://reviews.freebsd.org/D2095
Reviewed by: glebius (initial comments incorporated into this patch)
MFC after: 2 weeks
Sponsored by: Norse Corp, Inc (hardware)
loader.efi still needs work, but boot1.efi now builds.
Differential Revision: https://reviews.freebsd.org/D2244
Reviewed by: rpaulo
Sponsored by: The FreeBSD Foundation
Much of this file is common to the architectures we support, so share
an implementation by adding a little #ifdef-ery.
Differential Revision: https://reviews.freebsd.org/D2241
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
- Extend the number of available subtypes for Ethernet media by using some
of the ifmedia word's option bits to help denote subtypes. As a result, the
number of possible Ethernet subtype values increases from 31 to 511.
- Use some of those new values to define new media types.
- lacp_compose_key() recgonizes the new Ethernet media types added.
(Change made as required by a comment in if_media.h)
- New ioctl, SIOGIFXMEDIA, to handle getting the new extended media types.
SIOCGIFMEDIA is retained for backwards compatibility.
- Changes to ifconfig to allow it to handle the new extended media types.
Submitted by: mike@karels.net (original), hselasky
Reviewed by: jfvogel, gnn, hselasky
Approved by: jfvogel (mentor), gnn (mentor)
Differential Revision: http://reviews.freebsd.org/D1965
Defer the packet size check until after the firewall has had a look at it. This
means that the firewall now has the opportunity to (re-)fragment an oversized
packet.
Differential Revision: https://reviews.freebsd.org/D1815
Reviewed by: ae
Approved by: gnn (mentor)
Start using 'alloc_size' attribute in the allocator functions.
This is useful as it helps the compiler generate warnings on suspicious
code and can also enable some small optimizations.
This is based on r281130, which brought similar enhnacements
to the standard libc headers.
Note that sockaddr_l2cap structure is changed , check socket address
to initialize new structure member and define L2CAP_SOCKET_CHECKED
before including ng_btsocket.h
Differential Revision: https://reviews.freebsd.org/D2021
Reviewed by:emax
ipfilter code as userland application. To reduce kernel structure knowledge
include if_var.h only if a file is compiled with _KERNEL defined.
In !_KERNEL case, provide our own definition of struct ifnet, that will
satisfy ipftest(1). This was already done earlier to struct ifaddr in
r279029. Protect the definition with _NET_IF_VAR_H_, since kernel part
of ipfilter may include if_var.h and ip_compat.h.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
function that does the locking and validation associated with cleaning
a page. This moves 150 lines of code into its own function.
- Rename vm_pageout_clean() to vm_pageout_cluster() to define what it
really does; clustering nearby pages for pageout optimization.
Reviewd by: alc, kib, kmacy
Tested by: pho (earlier version)
Sponsored by: EMC / Isilon
contain kernel pointers, and instead has interface index.
Bump __FreeBSD_version for that change.
o Now, netstat/mroute6.c no longer needs to kvm_read(3) struct ifnet, and
no longer needs to include if_var.h
Note that this change is far from being a complete move of IPv6 multicast
routing to a proper API. Other structures are still dumped into their
sysctls as is, requiring userland application to #define _KERNEL when
including ip6_mroute.h and then call kvm_read(3) to gather all bits and
pieces. But fixing this is out of scope of the opaque ifnet project.
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
We'll just fall into the same local delivery block under the
'if (m->m_flags & M_FASTFWD_OURS)'.
Suggested by: ae
Differential Revision: https://reviews.freebsd.org/D2225
Approved by: gnn (mentor)
In cases where we scrub (fragment reassemble) on both input and output
we risk ending up in infinite loops when forwarding packets.
Fragmented packets come in and get collected until we can defragment. At
that point the defragmented packet is handed back to the ip stack (at
the pfil point in ip6_input(). Normal processing continues.
Eventually we figure out that the packet has to be forwarded and we end
up at the pfil hook in ip6_forward(). After doing the inspection on the
defragmented packet we see that the packet has been defragmented and
because we're forwarding we have to refragment it.
In pf_refragment6() we split the packet up again and then ip6_forward()
the individual fragments. Those fragments hit the pfil hook on the way
out, so they're collected until we can reconstruct the full packet, at
which point we're right back where we left off and things continue until
we run out of stack.
Break that loop by marking the fragments generated by pf_refragment6()
as M_SKIP_FIREWALL. There's no point in processing those packets in the
firewall anyway. We've already filtered on the full packet.
Differential Revision: https://reviews.freebsd.org/D2197
Reviewed by: glebius, gnn
Approved by: gnn (mentor)
under bootverbose. Every example I've seen to date has been due to
an ACPI system resource device reserving a range that overlaps with
system memory (which ram0 attempts to reserve) or a local or I/O APIC
(which apic0 attempts to reserve). These are always harmless but look
scary to users.
MFC after: 1 week
no need for them to be this strong, we only need to provide one or the
other.
While here replace atomic_load_acq_* and atomic_store_rel_* with a single
instruction version, and fix the definition of atomic_clear_* to point to
the correct functions.
Sponsored by: The FreeBSD Foundation
I .. stupidly added code to return HAL_ANI_STATS to HAL_DIAG_ANI_STATS.
I discovered this in a noisy environment when the returned values were
enough to .. well, make everything terrible.
So - restore functionality.
Tested:
* AR5416 (uses the AR5212 HAL), in a /very/ noisy 2GHz environment.
Enough to trigger ANI to get upset and generate useful data.
The MAC addresses were totally wrong. They're like the DIR-625C1 - at
0x1ffe0004 and 0x1ffe0018. They're however stored as text strings.
The ath0 MAC address is also not set, even though the calibration
partition is valid.
So, pick the board address / first MAC as the ath0 MAC, and derive
arge0/arge1 from that. That way they're hopefully unique enough
for people with multiple devices.
Tested:
* DIR-655A1
TODO:
* Do the same for the DIR-625A1 and DIR-625C1.
unroll the loop in ENTRY(pagezero)
acc' to the submitter this results in a reproducible 1% perf
improvement under buildworld like workload
I validated correctness and run-testing, but not performance impact
Submitted by: lidl@pix.net
Reviewed by: adrian
PR: 199151
MFC After: 1 month
These are similar to the mips24k performance counters - some are
available on perfcnt0/3, some are available on perfcnt1/4.
However, the events aren't all the same.
* Add the events, named the same as from Linux oprofile.
* Verify they're the same as "MIPS32(R) 74KTM Processor Core Family
Software User's Manual"; Document Number: MD00519; Revision 01.05.
* Rename INSTRUCTIONS to something else, so it doesn't clash with
the alias INSTRUCTIONS. I'll try to tidy this up later; there
are a few other aliases to add and shuffle around.
Tested:
* QCA9558 SoC (AP135 board) - MIPS74Kc core (no FPU.)
* make universe; where it didn't fail for other reasons.
TODO:
* It'd be nice to support the four performance counters
in at least this hardware, rather than just two.
Reviewed by: bsdimp ("looks good; don't break world".)
Summary:
Book-E and AIM trap.c are almost identical, except for a few bits. This is step
1 in unifying them.
This also renumbers EXC_DEBUG, to not conflict with AIM vector numbers. Since
this is the only one thus far that is used in the switch statement in trap(),
it's the only one renumbered. If others get added to the switch, which conflict
with AIM numbers, they should also be renumbered.
Reviewers: #powerpc, marcel, nwhitehorn
Reviewed By: marcel
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D2215
that performs the equivalent of an automatic madvise(..., MADV_DONTNEED).
The current heuristic, even with the improvements that I made a few years
ago, is a good example of making the wrong trade-off, or optimizing for
the infrequent case. The infrequent case being reading a single file that
is much larger than memory using mmap(2). And, in this case, the page
daemon isn't the bottleneck; it's the I/O.
In all other cases, the current heuristic has too many false positives,
i.e., it caches too many pages that are later reused. To give one
example, thousands of pages are cached by the current heuristic during a
buildworld and all of them are reactivated before the buildworld
completes. In particular, clang reads source files using mmap(2) and
there are some relatively large source files in our source tree, e.g.,
sqlite, that are read multiple times. With the new heuristic, I see fewer
false positives and they have a much lower cost.
I actually tried something like this more than two years ago and it
didn't perform as well as the cache behind heuristic. However, that was
before the changes to the page daemon in late summer of 2013 and the
existence of pmap_advise(). In particular, with the page daemon doing
its work more frequently and in smaller batches, it now completes its
work while the application accessing the file is blocked on I/O.
Whereas previously, the page daemon appeared to hog the CPU for so long
that it caused "hiccups" in the application's execution.
Finally, I'll add that the elimination of cache pages is a prerequisite
for NUMA support.
Reviewed by: jeff, kib
Sponsored by: EMC / Isilon Storage Division
- Add macros to handle the differences in accessing these registers on arm
and arm64.
- Use the fdt data to detect if we are on an ARMv7 or ARMv8.
- Use the virtual timer by default on arm64, we may not have access to
the physical timer.
Differential Revision: https://reviews.freebsd.org/D2208
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Previously, the driver was trying to blink the LED in the newstate
function, but that only gets called once (unlike OpenBSD's net80211
stack). Move the LED blinking to set_channel().
While there, don't try to set the channel when we switch to the SCAN
state. This is already accomplished by the set_channel() function.
MFC after: 1 week
It's necessary to reset the screen to make sure any vendor pixels are
gone when we start boot1. In the Lenovo X1 (3rd gen), this is the
only way to clear the screen. Previously, the Lenovo logo would only
disappear after the kernel started scrolling the display.
After resetting the screen, EFI could put us in the worst LCD mode
(oversized characters), so we now find the largest mode we can use and
hope it's the most appropriate one (it's not trivial to tell what's
the correct LCD resolution at this point). It's worth noting that the
final stage loader has a 'mode' command that can be used to switch
text modes.
While there, enable the software cursor, just like in the legacy boot
mode.
MFC after: 1 week
Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches certain threshold (3/4 on i386 or 16/17 otherwise). FreeBSD has
even less KVA, but had no such limit on archs with direct map as amd64.
As result, on machines with a lot of RAM, during load with very small user-
space memory pressure, such as `zfs send`, it was possible to reach state,
when there is enough both physical RAM and KVA (I've seen up to 25-30%),
but no continuous KVA range to allocate even single 128KB I/O request.
Address this situation from two sides:
- restore KVA usage limitations in a way the most close to Illumos;
- introduce new requirement for KVA fragmentation, specifying that we
should have at least one sequential KVA range of zfs_max_recordsize bytes.
Experiments show that first limitation done alone is not sufficient. On
machine with 64GB of RAM it is sometimes needed to drop up to half of ARC
size to get at leats one 1MB KVA chunk. Statically limiting ARC to half
of KVA/RAM is too strict, so second limitation makes it to work in cycles:
accumulate trash up to certain critical mass, do massive spring-cleaning,
and then start littering again. :)
MFC after: 1 month
work stack and is reused again on the receive ring. Remaining received packets in the ring are not processed in that
invocation of bxe_rxeof() and defered to the task thread.
MFC after: 5 days
Amd64 uses relocatable object files as the modules format. It is good
WRT not having unneeded overhead for PIC code, in particular, due to
absence of useless GOT and PLT. But the cost is that the module
linking process cannot use hash to speed up the symbol lookup, and
that each reference to the symbol requiring a relocation, instead of
single-place relocation in GOT.
Cache the successfull symbol lookup results in the module symbol
table, using the newly allocated SHN_FBSD_CACHED value from
SHN_LOOS-HIOS range as an indicator. The SHN_FBSD_CACHED together
with the non-existent definition of the found symbol are reverted
after successfull relocations, which is done under kld_sx lock, so it
should not be visible to other consumers of the symbol table.
Submitted by: Conrad Meyer
Differential Revision: https://reviews.freebsd.org/D1718
MFC after: 3 weeks
This was not (and still is not) connected to the build, but the EFI
loader is in the process of being built for other than amd64 so these
files ought to live in their eventual MD location.
header and not only partial flags and fields. Firewalls can attach
classification tags to the outgoing mbufs which should be copied to
all the new fragments. Else only the first fragment will be let
through by the firewall. This can easily be tested by sending a large
ping packet through a firewall. It was also discovered that VLAN
related flags and fields should be copied for packets traversing
through VLANs. This is all handled by "m_dup_pkthdr()".
Regarding the MAC policy check in ip_fragment(), the tag provided by
the originating mbuf is copied instead of using the default one
provided by m_gethdr().
Tested by: Karim Fodil-Lemelin <fodillemlinkarim at gmail.com>
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
PR: 7802
stage, just like for the regular world stage.
Reviewed by: rodrigc, imp, bapt, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D2187
not have interupt property in pl310 node. Interrupt is used only to
detect cache activity when L2 cache is disabled, it's not vital for
normal operations.
- Fix intrhook allocation/initialization
There are cases when gpioled nodes in DTS come from different sources
(e.g. standard Beaglebone Black LEDs in main DTS + shield LEDs in
overlay DTS) so instead of handling only first compatible node go
through all child nodes
where we want to create a new IP datagram.
o Add support for RFC6864, which allows to set IP ID for atomic IP
datagrams to any value, to improve performance. The behaviour is
controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by
default.
o In case if we generate IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.
Differential Revision: https://reviews.freebsd.org/D2177
Reviewed by: adrian, cy, rpaulo
Tested by: Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Relnotes: yes
- bus_dmamap_create() does not take the BUS_DMA_NOWAIT flag
- properly unload maps
- do not assign NULL to dma map pointers
Submitted by: Konstantin Belousov <kostikbel@gmail.com>
Approved by: jfv (mentor)
MFC after: 1 week
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int. This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc. zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.
Note to self: When this is MFCed, sparc64 needs the same fix.
Differential revision: https://reviews.freebsd.org/D2106
Reviewed by: kib
Reported by: Michael Fuckner <michael@fuckner.net>
Tested by: Michael Fuckner <michael@fuckner.net>
MFC after: 2 weeks
On Ethernet packets have a minimal length, so very short packets get padding
appended to them. This padding is not stripped off in ip6_input() (due to
support for IPv6 Jumbograms, RFC2675).
That means PF needs to be careful when reassembling fragmented packets to not
include the padding in the reassembled packet.
While here also remove the 'Magic from ip_input.' bits. Splitting up and
re-joining an mbuf chain here doesn't make any sense.
Differential Revision: https://reviews.freebsd.org/D2189
Approved by: gnn (mentor)
When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.
We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.
Differential Revision: https://reviews.freebsd.org/D2188
Approved by: gnn (mentor)
Return EINVAL instead of EFTYPE if we have a multiboot kernel loaded but
failed to load the modules. This makes it clear that the kernel/module
should be handled by the multiboot handler but something went wrong.
Sponsored by: Citrix Systems R&D
Zero the list of modules array before using it, or else we might pass
uninitialized data in unused fields of the struct that will make Xen choke.
Also add a check to make sure malloc succeeds.
Sponsored by: Citrix Systems R&D
support for booting arm and arm64 from UEFI.
Differential Revision: https://reviews.freebsd.org/D2164
Reviewed by: emaste, imp (previous version)
Sponsored by: The FreeBSD Foundation
on reads or writes, the time marks are used to display idle time by
w(1) [1]. Instead, use vfs.devfs.dotimes as the selector of default
precision vs. using time_second. The later gives seconds precision,
which is good enough for the purpose.
Note that timestamp updates are unlocked and the updates itself, as
well as the check in devfs_timestamp, are non-atomic.
Noted by: truckman [1]
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
the top-level HAL.
The athstats program is blindly using a copy of the ar5212 ANI stats structure
to pull out ANI statistics/state and this is problematic for the AR9300
HAL.
So:
* Define HAL_ANI_STATS and HAL_ANI_STATE
* Use HAL_ANI_STATS inside the AR5212 HAL
This commit doesn't (yet) convert the ar5212AniState -> HAL_ANI_STATE when
exporting it to userland; that'll come in the next commit.
Summary:
Add "GELI Passphrase:" prompt to boot loader.
A new loader.conf(5) option of geom_eli_passphrase_prompt="YES" will now
allow you to enter your geli(8) root-mount credentials prior to invoking
the kernel.
See check-password.4th(8) for details.
Differential Revision: https://reviews.freebsd.org/D2105
Reviewed by: (your name[s] here)
MFC after: 3 days
X-MFC-to: stable/10
Relnotes: yes
Test Plan:
Drop a head copy of check-password.4th into /boot and then apply the patch
(only the patch to /boot/check-password.4th is required; no other changes are
required but you do have to have a HEAD copy of check-password.4th to
apply the patch).
NB: The rest of your /boot files can be up to 2 years old but no older.
NB: The test won't work unless your kernel has the following change
https://svnweb.freebsd.org/base?view=revision&revision=273489
Now, put into /boot/loader.conf:
geom_eli_passphrase_prompt="YES"
and reboot.
You should be prompted for a GELI passphrase before the menu (if enabled),
just after loading loader.conf(5).
NB: It doesn't matter if you're using GELI or not. However if you are using
GELI and a sufficiently new enough release (has SVN r273489) and you entered
the proper passphrase to mount your GELI encrypted root device(s), you should
notice that the boot process did not stop (you went from loader all the way to login).
Reviewers: cperciva, allanjude, scottl, kmoore
Subscribers: jkh, imp
Differential Revision: https://reviews.freebsd.org/D2105
vocabularies delay-processing, password-processing, version-processing,
frame-drawing, menu-infrastructure, menu-namespace, menu-command-helpers,
and menusets-infrastructure. The net effect is to remove almost 200
definitions from the main forth vocabulary reducing the dictionary size
by over 50%. The chances of hitting "dictionary full" should be greatly
reduced by this patch.
MFC after: 3 days
X-MFC-to: stable/10
implementation.
The kernel RPC code, which is responsible for the low-level scheduling
of incoming NFS requests, contains a throttling mechanism that
prevents too much kernel memory from being tied up by NFS requests
that are being serviced. When the throttle is engaged, the RPC layer
stops servicing incoming NFS sockets, resulting ultimately in
backpressure on the clients (if they're using TCP). However, this is
a very heavy-handed mechanism as it prevents all clients from making
any requests, regardless of how heavy or light they are. (Thus, when
engaged, the throttle often prevents clients from even mounting the
filesystem.) The throttle mechanism applies specifically to requests
that have been received by the RPC layer (from a TCP or UDP socket)
and are queued waiting to be serviced by one of the nfsd threads; it
does not limit the amount of backlog in the socket buffers.
The original implementation limited the total bytes of queued requests
to the minimum of a quarter of (nmbclusters * MCLBYTES) and 45 MiB.
The former limit seems reasonable, since requests queued in the socket
buffers and replies being constructed to the requests in progress will
all require some amount of network memory, but the 45 MiB limit is
plainly ridiculous for modern memory sizes: when running 256 service
threads on a busy server, 45 MiB would result in just a single
maximum-sized NFS3PROC_WRITE queued per thread before throttling.
Removing this limit exposed integer-overflow bugs in the original
computation, and related bugs in the routines that actually account
for the amount of traffic enqueued for service threads. The old
implementation also attempted to reduce accounting overhead by
batching updates until each queue is fully drained, but this is prone
to livelock, resulting in repeated accumulate-throttle-drain cycles on
a busy server. Various data types are changed to long or unsigned
long; explicit 64-bit types are not used due to the unavailability of
64-bit atomics on many 32-bit platforms, but those platforms also
cannot support nmbclusters large enough to cause overflow.
This code (in a 10.1 kernel) is presently running on production NFS
servers at CSAIL.
Summary of this revision:
* Removes 45 MiB limit on requests queued for nfsd service threads
* Fixes integer-overflow and signedness bugs
* Avoids unnecessary throttling by not deferring accounting for
completed requests
Differential Revision: https://reviews.freebsd.org/D2165
Reviewed by: rmacklem, mav
MFC after: 30 days
Relnotes: yes
Sponsored by: MIT Computer Science & Artificial Intelligence Laboratory
%rdi, %rsi, etc are inadvertently bypassed along with the check to
see if the instruction needs to be repeated per the 'rep' prefix.
Add "MOVS" instruction support for the 'MMIO to MMIO' case.
Reviewed by: neel