To save SMART data and for a drive to understand that it's been nicely
shutdown, we need to send a STANDBY IMMEDIATE. Modify adaspindown to
use a local CCB on the stack. When we're panicing, used
xpt_polled_action rather than cam_periph_runccb so that we can SEND
IMMEDIATE after we've shutdown the scheduler.
Sponsored by: Netflix
Reviewed by: scottl@, gallatin@
Differential Revision: https://reviews.freebsd.org/D12799
hw.ipmi.cycle_time is the time to wait for the power down phase of the
ipmi power cycle before falling back to either reboot or halt.
Sponsored by: Netflix
o Make hw.ipmi.on a tuneable
o Changes to keep shutdown from hanging indefinitately after the wd
would normally have been disabled.
o Add support for setting pretimeout (which fires an interrupt
some time before the actual watchdog expires)
o Allow refinement of the actions to take when the watchdog expires
o Allow special startup timeout to keep us from hanging in boot
before watchdogd is started, but after we've loaded the kernel.
Obtained From: Netflix OCA Firmware
An off-by-one error has been present since the system call was first present
in 185878. It additionally became a memory corruption bug after change
324941. The failure is actually revealed by our existing AIO tests.
However, apparently nobody's been running those in 32-bit emulation mode.
Reported by: Coverity, cem
CID: 1382114
MFC after: 18 days
X-MFC-With: 324941
Sponsored by: Spectra Logic Corp
In rS323851, some casts were adjusted in calls to nvlist_next() and
nvlist_get_pararr() in order to make scan-build happy. I think these changes
just confused scan-build into not reporting the strict-aliasing violation.
For example, nvlist_xdescriptors() is causing nvlist_next() to write to its
local variable nvp of type nvpair_t * using the lvalue *cookiep of type
void *, which is not allowed. Given the APIs of nvlist_next(),
nvlist_get_parent() and nvlist_get_pararr(), one possible fix is to create a
local void *cookie in nvlist_xdescriptors() and other places, and to convert
the value to nvpair_t * when necessary. This patch implements that fix.
Reviewed by: oshogbo
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12760
fq_pie schedulers packet classification functions in layer2 (bridge mode).
Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie
schedulers packet classification functions (fq_codel_classify_flow()
and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP)
packet. However, this assumption is incorrect if ipfw/dummynet is
used to manage layer2 traffic (bridge mode) since mbuf will point
at L2 frame. This patch solves this problem by identifying the
source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN
offset when converting an mbuf pointer to ip pointer if the traffic
is from layer2. More specifically, in dummynet packet tagging
function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the
traffic is from layer2 and set to zero otherwise. Whenever an access
to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is
used instead of mtod(m, struct ip *) to correctly convert mbuf
pointer to ip pointer in both L2 and L3 traffic.
Submitted by: lstewart
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D12506
Defer the packet size check until after the firewall has had a look at it. This
means that the firewall now has the opportunity to (re-)fragment an oversized
packet.
This mirrors what the slow path does.
Reviewed by: ae
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12779
Move framebuffer.{c,h} to sys/boot/efi/loader and add the efifb
related metadata and pass it to the kernel
Reviewed by: imp, andrew
Differential Revision: https://reviews.freebsd.org/D12757
Some BMCs support power cycling the chassis via the chassis control
command 2 subcommand 2 (ipmitool called it 'chassis power cycle'). If
the BMC supports the chassis device, register a shutdown_final handler
that sends the power cycle command if request and waits up to 10s for
it to take effect. To minimize stack strain, we preallocate a ipmi
request in the softc. At the moment, we're verbose about what we're
doing.
Sponsored by: Netflix
RB_POWERCYCLE instructs the platform to power off and then power back
on a short time later, if that's possible. Otherwise, degrade to the
RB_POWEROFF behavior.
Sponsored by: Netflix
Remove at91 bootloader. It only worked on AT91RM9200, and only
specific boards that were all EOLd 10 years ago.
Remove ixp425. It doesn't build anymore and is for boards that were
EOLd 8 years ago.
Sponsored by: Netflix
When using a kernel built with the GZIO config option, dumpon -z can be
used to configure gzip compression using the in-kernel copy of zlib.
This is useful on systems with large amounts of RAM, which require a
correspondingly large dump device. Recovery of compressed dumps is also
faster since fewer bytes need to be copied from the dump device.
Because we have no way of knowing the final size of a compressed dump
until it is written, the kernel will always attempt to dump when
compression is configured, regardless of the dump device size. If the
dump is aborted because we run out of space, an error is reported on
the console.
savecore(8) is modified to handle compressed dumps and save them to
vmcore.<index>.gz, as it does when given the -z option.
A new rc.conf variable, dumpon_flags, is added. Its value is added to
the boot-time dumpon(8) invocation that occurs when a dump device is
configured in rc.conf.
Reviewed by: cem (earlier version)
Discussed with: def, rgrimes
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11723
A fictitious page is always wired, so there is no point in trying to
remove one from the page queues.
Completely remove one inaccurate comment from vm_page_free_prep() and
correct another.
Reviewed by: kib, markj
MFC after: 1 week
different from hardware defaults. The congestion channel map, which is
still fixed, needs to be tracked separately now. Change the congestion
setting for TOE rx queues to match the drivers on other OSes while here.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
When limiting I/O, a value of 0 makes no sense as a limit. No progress
can be made. Trade the possibility that someone might be doing
something clever to achieve ultra-low I/O limits vs the damage of not
ever making progress on an I/O in favor of making progress. Now the
machine won't be useless if this accidentally gets requested.
Sponsored by: Netflix
In r322258 I made p1003_1b.aio_listio_max a tunable. However, further
investigation shows that there was never any good reason for that limit to
exist in the first place. It's used in two completely different ways:
* To size a UMA zone, which globally limits the number of concurrent
aio_suspend calls.
* To artifically limit the number of operations in a single lio_listio call.
There doesn't seem to be any memory allocation associated with this limit.
This change does two things:
* Properly names aio_suspend's UMA zone, and sizes it based on a new constant.
* Eliminates the artifical restriction on lio_listio. Instead, lio_listio
calls will now be limited by the more generous max_aio_queue_per_proc. The
old p1003_1b.aio_listio_max is now an alias for
vfs.aio.max_aio_queue_per_proc, so sysconf(3) will still work with
_SC_AIO_LISTIO_MAX.
Reported by: bde
Reviewed by: jhb
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D12120
Don't check for SPA_MINDEVSIZE in vdev_geom_attach when opening by path.
It's redundant with the check in vdev_open, and failing to attach here
results in the wrong error message being printed. However, still check for
it in some other situations:
* When opening by guids, so we don't get bogged down reading from slow
devices like floppy drives.
* In vdev_geom_read_pool_label for the same reason, because we iterate over
all providers.
* If the caller requests that we verify the guid, because then we'll have to
read from the device before vdev_open verifies the size.
PR: 222227
Reported by: Marie Helene Kvello-Aune <marieheleneka@gmail.com>
Reviewed by: avg, mav
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D12531
1. prefetch 128 bytes of mbufs.
2. Re-order filling the pkt_info so cache stalls happen at the end
3. Define empty prefetch2cachelines() macro when the function isn't present.
Provides small performance improvments on some hardware
Reviewed by: sbruno
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12447
Clearing the unr in tmpfs_unmount is not correct. In the case of
multiple references to the tmpfs mount (e.g. when there are lookup
threads using it) it will not be the one to finish tmpfs_free_tmp. In
those cases tmpfs_free_node_locked will be the final one to execute
tmpfs_free_tmp, and until then the unr must be valid.
Reported by: pho
Approved/reviewed by: rstone (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12749
BOOTDIR->BOOTSRC
FICLDIR->FICLSRC
LDR_MI->LDRSRC
This matches the patterns used in the rest of the system a bit vetter.
Suggested by: rgrimes@
Sponsored by: Netflix
While here cache-align chains.
This shortens longest found chain during poudriere -j 80 from 32 to 16.
Pushing this higher up will probably require allocation on boot.