for UMA startup.
o Introduce another stage of UMA startup, which is entered after
vm_page_startup() finishes. After this stage we don't yet enable buckets,
but we can ask VM for pages. Rename stages to meaningful names while here.
New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS,
BOOT_RUNNING.
Enabling page alloc earlier allows us to dramatically reduce number of
boot pages required. What is more important number of zones becomes
consistent across different machines, as no MD allocations are done before
the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use
startup_alloc(), however that may change, so vm_page_startup() provides
its need for early zones as argument.
o Introduce uma_startup_count() function, to avoid code duplication. The
functions calculates sizes of zones zone and kegs zone, and calculates how
many pages UMA will need to bootstrap.
It counts not only of zone structures, but also of kegs, slabs and hashes.
o Hide uma_startup_foo() declarations from public file.
o Provide several DIAGNOSTIC printfs on boot_pages usage.
o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of
mp_ncpus. Use resulting number not only in the size argument to zone_ctor()
but also as args.size.
Reviewed by: imp, gallatin (earlier version)
Differential Revision: https://reviews.freebsd.org/D14054
systems running with a heavy filesystem load. Tracking down this
bug was elusive because there were actually two problems. Sometimes
the in-memory check hash was wrong and sometimes the check hash
computed when doing the read was wrong. The occurrence of either
error caused a check-hash mismatch to be reported.
The first error was that the check hash in the in-memory cylinder
group was incorrect. This error was caused by the following
sequence of events:
- We read a cylinder-group buffer and the check hash is valid.
- We update its cg_time and cg_old_time which makes the in-memory
check-hash value invalid but we do not mark the cylinder group dirty.
- We do not make any other changes to the cylinder group, so we
never mark it dirty, thus do not write it out, and hence never
update the incorrect check hash for the in-memory buffer.
- Later, the buffer gets freed, but the page with the old incorrect
check hash is still in the VM cache.
- Later, we read the cylinder group again, and the first page with
the old check hash is still in the VM cache, but some other pages
are not, so we have to do a read.
- The read does not actually get the first page from disk, but rather
from the VM cache, resulting in the old check hash in the buffer.
- The value computed after doing the read does not match causing the
error to be printed.
The fix for this problem is to only set cg_time and cg_old_time as
the cylinder group is being written to disk. This keeps the in-memory
check-hash valid unless the cylinder group has had other modifications
which will require it to be written with a new check hash calculated.
It also requires that the check hash be recalculated in the in-memory
cylinder group when it is marked clean after doing a background write.
The second problem was that the check hash computed at the end of the
read was incorrect because the calculation of the check hash on
completion of the read was being done too soon.
- When a read completes we had the following sequence:
- bufdone()
-- b_ckhashcalc (calculates check hash)
-- bufdone_finish()
--- vfs_vmio_iodone() (replaces bogus pages with the cached ones)
- When we are reading a buffer where one or more pages are already
in memory (but not all pages, or we wouldn't be doing the read),
the I/O is done with bogus_page mapped in for the pages that exist
in the VM cache. This mapping is done to avoid corrupting the
cached pages if there is any I/O overrun. The vfs_vmio_iodone()
function is responsible for replacing the bogus_page(s) with the
cached ones. But we were calculating the check hash before the
bogus_page(s) were replaced. Hence, when we were calculating the
check hash, we were partly reading from bogus_page, which means
we calculated a bad check hash (e.g., because multiple pages have
been mapped to bogus_page, so its contents are indeterminate).
The second fix is to move the check-hash calculation from bufdone()
to bufdone_finish() after the call to vfs_vmio_iodone() so that it
computes the check hash over the correct set of pages.
With these two changes, the occasional cylinder-group check-hash
errors are gone.
Submitted by: David Pfitzner <dpfitzner@netflix.com>
Reviewed by: kib
Tested by: David Pfitzner
- Remove the shim interface that allowed bwn(4) to use either siba_bwn or
bhnd(4), replacing all siba_bwn calls with their bhnd(4) bus equivalents.
- Drop the legay, now-unused siba_bwn bus driver.
- Clean up bhnd(4) board flag defines referenced by bwn(4).
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D13518
As a followup to r328101, ignore relocation tables for ELF object
sections that are not memory resident. For modules loaded by the
loader, ignore relocation tables whose associated section was not
loaded by the loader (sh_addr is zero). For modules loaded at runtime
via kldload(2), ignore relocation tables whose associated section is
not marked with SHF_ALLOC.
Reported by: Mori Hiroki <yamori813@yahoo.co.jp>, adrian
Tested on: mips, mips64
MFC after: 1 month
Sponsored by: DARPA / AFRL
If a brand provides a header_supported hook, check it when trying to
find a brand based on a matching interpreter as well as in the final
loop for the fallback brand. Previously a brand might reject a binary
via a header_supported hook in one of the earlier loops, but still be
chosen by one of these later loops.
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA / AFRL
Differential Revision: https://reviews.freebsd.org/D13945
I'll have to go double check to see if it does indeed pass ARP frames between
switch ports with this disabled, but it seems required for the CPU port to see
ARP traffic.
I'll dig into this some more.
A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator. Clean these up so that new
architectures do not inherit whitespace issues.
Clean up shared Linuxolator files while here.
Sponsored by: Turing Robotic Industries Inc.
This was a hack to be able to mount ext4 filesystems read-only while not
supporting all the features. We now support all those features so it
doesn't make sense to keep the undocumented hack.
Discussed with: fsu
Delay the initialization of variables until the are needed.
In the case of ext4_ext_rm_leaf(), make sure 'error' value is not
undefined.
Reported by: Clang's static analyzer
Differential Revision: https://reviews.freebsd.org/D14193
It is possible, for complex fork()/collapse situations, to have
sibling address spaces to partially share shadow chains. If one
sibling performs wiring, it can happen that a transient page, invalid
and busy, is installed into a shadow object which is visible to other
sibling for the duration of vm_fault_hold(). When the backing object
contains the valid page, and the wiring is performed on read-only
entry, the transient page is eventually removed.
But the sibling which observed the transient page might perform the
unwire, executing vm_object_unwire(). There, the first page found in
the shadow chain is considered as the page that was wired for the
mapping. It is really the page below it which is wired. So we unwire
the wrong page, either triggering the asserts of breaking the page'
wire counter.
As the fix, wait for the busy state to finish if we find such page
during unwire, and restart the shadow chain walk after the sleep.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14184
Instead of returning pointer to the previous header, return its offset.
In frag6_input() use m_copyback() and determined offset to store next
header instead of accessing to it by pointer and assuming that the memory
is contiguous.
In rip6_input() use offset returned by ip6_get_prevhdr() instead of
calculating it from pointers arithmetic, because IP header can belong
to another mbuf in the chain.
Reported by: Maxime Villard <max at m00nbsd dot net>
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14158
This indeed uses the same registers as the AR8216 and later chips.
There seems to be an issue with ARP requests being sent out from the CPU
through this switch here, so figuring that out is next. Learning works fine on
the AR8327 ethernet switch on the /other/ gigabit ethernet port, so I don't
think it's the network stack or ethernet driver.
Tested:
* DB120 - AR9340 SOC + ethernet switch (and other bits.)
synaptics or elantech sanity checker.
After packet has been rejected contents of packet buffer is not cleared
with setting of inputbytes counter to 0. So when this packet buffer is
filled again being an element of circular queue, new data appends to old
data rather than overwrites it. This leads to packet buffer overflow
after 10 rounds.
Fix it with setting of packet's inputbytes counter to 0 after rejection.
While here add extra logging of rejected packets.
PR: 222667 (for reference)
Reported by: Neel Chauhan <neel@neelc.org>
Tested by: Neel Chauhan <neel@neelc.org>
MFC after: 1 week
The L3 cache controller (Corenet Platform Cache) is listed with one of its
compatible strings as "cache", which this driver can't attach to. Restrict
to a known list of primary cache controller strings, as found in the l2cache
devicetree binding.
Summary:
Some architectures use large (36-bit) physical addresses, with smaller
virtual addresses. Casting between vm_paddr_t (or bus_addr_t) and void * is
considered illegal, so cast through uintptr_t. No functional change on existing
platforms.
Reviewed By: scottl
Differential Revision: https://reviews.freebsd.org/D14042
Most consumers of g_metadata_store were passing in partially unallocated
memory, resulting in stack garbage being written to disk labels. Fix them by
zeroing the memory first.
gvirstor repeated the same mistake, but in the kernel.
Also, glabel's label contained a fixed-size string that wasn't
initialized to zero.
PR: 222077
Reported by: Maxim Khitrov <max@mxcrypt.com>
Reviewed by: cem
MFC after: 3 weeks
X-MFC-With: 323314
X-MFC-With: 323338
Differential Revision: https://reviews.freebsd.org/D14164
superblock, and the kernel will fail to link when UFS is not built
in. This commit makes it depend on a small portion of FFS bits and
thereby fixes build for this situation.
This is intended as an interim bandaid, and the actual superblock
reading code should probably be made independent of UFS, so we do
not need to depend on it (see kib@'s comment in the review for
details), and we will revisit this once the superblock check hashes
are all in place.
Differential Revision: https://reviews.freebsd.org/D14092
* Add the bulk of the ATU table read function
* Correct how the ATU function and WAIT bits work
TODO:
* more testing, figure out how the multi-vlan table stuff works and push that
up to userspace
* Refactor the initial learning configuration (port learning, address expiry,
handling address moving between ports, etc, etc) into a separate HAL routine
* and ensure that it's consistent between switch chips - the AR8216,8316,724x,9331
SoCs all share the same switch code.
* .. the AR8327 needs doing - the defaults seem OK for now
* .. the AR9340 is different but it's also programmed now.
* Add support for flushing a single port worth of ATU entries
* Add support for fetching the ATU table from AR8216 and derived chips
Tested:
* AR9344, Carambola 2
TODO:
* Further testing on other chips
* Add AR9340 support
* Add AR8327 support
Stop leaking kernel pointers though theses sysctls and make sure that the
padding in the structures is zeroed on allocation to avoid other leaks.
Reviewed by: gordon, kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D13459
clear dirty bits for completely invalid blocks.
Otherwise we might not write out the last chunk that is shorter than
512 bytes, if the file end is not aligned on disk block boundary.
This become important after the r324794.
PR: 225586
Reported by: tris_vern@hotmail.com
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
Reported by: Maxime Villard <max at m00nbsd dot net>
MFC after: 3 days
This stuff may be a bit fluid during this -HEAD cycle as various other
switch features are added, but the current stuff is enough to drive
initial development and features on the atheros range of integrated
and external switches.
* add a method to flush the whole address table;
* add a method to flush all addresses on a given port;
* add a method to download the address table;
* .. and then a method to fetch entries from the address table.
The table fetch/read methods pass through to the drivers for now since
the drivers may implement different ways of fetching/caching the address
table data. The atheros devices for example fetch the table by
iterating over the table through a set of registers and so you need
to keep that locked whilst you iterate otherwise you may have the table
flushed half way by a port status change.
This is a no-op until the userland and arswitch code shows up.
The old way required the data to be present really early and copied it from
memory mapped NOR flash; this only worked during kernel boot but not for
ath/ath_hal modules.
Tested:
* AR9331, Carambola2, ath/hal modules.