kern.features.linux: 1 meaning linux 32 bits binaries are supported
kern.features.linux64: 1 meaning linux 64 bits binaries are supported
The goal here is to help 3rd party applications (including ports) to determine
if the host do support linux emulation
Reviewed by: dchagin
MFC after: 1 week
Relnotes: yes
Differential Revision: D5830
The urtwn hardware transmits FF/A-MSDU just fine - it takes an 802.11
frame and will dutifully send the thing.
So:
* bump RX queue up from 1. Why's it 1? That's really silly.
* Add the "software A-MSDU" encap capability bit.
* bump the TX buffer size up so we can at least send A-MSDU frames.
* track active frames submitted to the NIC - we can't make assumptions
about how many are in flight in the NIC though. For 88E parts we
could use per-packet TX indication, but for R92 parts we can't.
So, just fake it somewhat.
* Kick the transmit queue when we finish reception; try to avoid stalls.
* Kick the FF queue a little more regularly.
A-MSDU TX won't happen until the net80211 side is done, but atheros
fast-frames support should now work.
Tested:
* urtwn0: MAC/BB RTL8188EU, RF 6052 1T1R ; A-MSDU transmit.
* begin moving the 11n macros out of ieee80211_phy.c and
into a header so they can be used elsewhere.
* rename some of them into the IEEE80211_* namespace.
* convert HT_RC_2_MCS() to work with three-stream rates.
do software A-MSDU encapsulation.
Right now there's AMSDU TX/RX capability bits and they're mostly
unused, however I'd like to maintain those as the general configuration,
not also "please software encap AMSDU." For platforms that can do
A-MSDU in firmware (iwn, iwm, etc) then their init paths can read
this flag to configure A-MSDU.
The PLL_X, base CPU frequency source, doesn't have a bypass switch and thus
we must use another frequency source for CPU while changing its frequency.
PLL_P is ideal for this, it runs at 480MHz and CPU can be clocked at this
frequency at any CPU voltage.
This is compatible with the ds1307, but comparing the mcp7941x datasheet vs the
ds1307 code, appears there is one bit placement difference, so that is now
accounted for.
Relnotes: yes
OFW i2c probing requires a new method ofw_bus_get_node(), and the bus device is
assumed iichb. With these changes, i2c devices attached in fdt are probed and
attached automagically.
The fdc worker thread was using a one second timeout while waiting for
a new bio to arrive or for the device to detach. However, the driver
already does a wakeup when queueing a new bio or asking the thread to
detach, so the timeout only served to waste CPU time waking up the
thread once a second just so it could go right back to sleep. Use an
infinite timeout instead.
Discussed with: phk
Sponsored by: Netflix
many SoCs these two are the same, however there is no requirement for this
to be the case, e.g. on the ARM Juno we boot on what the GIC thinks of as
CPU 2, but FreeBSD numbers it CPU 0.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Previously, the code determined a topology of processing units
(hardware threads, cores, packages) and then deduced a cache topology
using certain assumptions. The new code builds a topology that
includes both processing units and caches using the information
provided by the hardware.
At the moment, the discovered full topology is used only to creeate
a scheduling topology for SCHED_ULE.
There is no KPI for other kernel uses.
Summary:
- based on APIC ID derivation rules for Intel and AMD CPUs
- can handle non-uniform topologies
- requires homogeneous APIC ID assignment (same bit widths for ID
components)
- topology for dual-node AMD CPUs may not be optimal
- topology for latest AMD CPU models may not be optimal as the code is
several years old
- supports only thread/package/core/cache nodes
Todo:
- AMD dual-node processors
- latest AMD processors
- NUMA nodes
- checking for homogeneity of the APIC ID assignment across packages
- more flexible cache placement within topology
- expose topology to userland, e.g., via sysctl nodes
Long term todo:
- KPI for CPU sharing and affinity with respect to various resources
(e.g., two logical processors may share the same FPU, etc)
Reviewed by: mav
Tested by: mav
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D2728
a child of it. This is done in conformity with Linux dts files and
as preparation for rework of BCM2836 interrupt controller for INTRNG.
Reviewed by: gonzo
Differential Revision: https://reviews.freebsd.org/D5807
universal.
(1) New struct intr_map_data is defined as a container for arbitrary
description of an interrupt used by a device. Typically, an interrupt
number and configuration relevant to an interrupt controller is encoded
in such description. However, any additional information may be encoded
too like a set of cpus on which an interrupt should be enabled or vendor
specific data needed for setup of an interrupt in controller. The struct
intr_map_data itself is meant to be opaque for INTRNG.
(2) An intr_map_irq() function is created which takes an interrupt
controller identification and struct intr_map_data as arguments and
returns global interrupt number which identifies an interrupt.
(3) A set of functions to be used by bus drivers is created as well as
a corresponding set of methods for interrupt controller drivers. These
sets take both struct resource and struct intr_map_data as one of the
arguments. There is a goal to keep struct intr_map_data in struct
resource, however, this way a final solution is not limited to that.
(4) Other small changes are done to reflect new situation.
This is only first step aiming to create stable interface for interrupt
controller drivers. Thus, some temporary solution is taken. Interrupt
descriptions for devices are stored in INTRNG and two specific mapping
function are created to be temporary used by bus drivers. That's why
the struct intr_map_data is not opaque for INTRNG now. This temporary
solution will be replaced by final one in next step.
Differential Revision: https://reviews.freebsd.org/D5730
This optimization attempts to utylize as wide as possible register store instructions to zero large buffers.
The implementation, if possible, will use 'dc zva' to zero buffer by cache lines.
Speedup: 60x faster memory zeroing
Submitted by: Dominik Ermel <der@semihalf.com>
Obtained from: Semihalf
Sponsored by: Cavium
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5726
Introduce 2 new flags:
- FL_ENABLE_4B_ADDR (forces the use of 4-byte addresses)
- FL_DISABLE_4B_ADDR (forces the use of 3-byte addresses)
If an SPI flash chip is defined with FL_ENABLE_4B_ADDR in its flags,
then an 'Enter 4-byte mode' command is sent to the chip at attach time
and, later, all commands that require addressing are issued with 4-byte
addresses.
If an SPI flash chip is defined with FL_DISABLE_4B_ADDR in its flags,
then an 'Exit 4-byte mode' command is sent to the chip at attach time
and, later, all commands that require addressing are issued with 3-byte
addresses.
For chips that do not have any of these flags defined the behaviour is
unchanged.
This change also adds support for the MX25L25735F and MX25L25635E chips
(vendor id 0xc2, device id 0x2019), which support 4-byte mode and enables
4-byte mode for them. These are 256Mbit devices (32MiB) and, as such, can
only be fully addressed by using 4-byte addresses.
Approved by: adrian (mentor)
Sponsored by: Smartcom - Bulgaria AD
Differential Revision: https://reviews.freebsd.org/D5808
the interface.
I know this may be unpopular, but iwn is not yet completely ready for
a transparent firmware restart. I have this thing panic my laptop
reliably because 11n state isn't kept in sync and the TX completion
path ends up trying to free a null node reference.
If there is an error different from ERESTART, there is some
chance that we may end up accessing an uninitialized value. This
doesn't seem likely/possible but initialize announce_buf[0],
just in case.
CID: 1006486
For the !gsp case there some chance of returning an uninitialized
return value. Prevent that from happening by initializing the
error value.
CID: 1006421
On FreeBSD VFS_HOLD/VN_RELE were mapped to MNT_REF/MNT_REL that
manipulate mnt_ref. But the job of properly maintaining the reference
count is already automatically performed by insmntque(9) and
delmntque(9). So, in effect all ZFS vnodes referenced the corresponding
mountpoint twice.
That was completely harmless, but we want to be very explicit about what
FreeBSD VFS APIs are used, because illumos VFS_HOLD and FreeBSD MNT_REF
provide quite different guarantees with respect to the held vfs_t /
mountpoint. On illumos VFS_HOLD is sufficient to guarantee that
vfs_t.vfs_data stays valid. On the other hand, on FreeBSD MNT_REF does
*not* provide the same guarantee about mnt_data. We have to use
vfs_busy() to get that guarantee.
Thus, the calls to VFS_HOLD/VFS_RELE on vnode init and fini are removed.
VFS_HOLD calls are replaced with vfs_busy in the ioctl handlers.
And because vfs_busy has a richer interface that can not be dumbed down
in all cases it's better to explicitly use it rather than trying to mask
it behind VFS_HOLD.
This change fixes a panic that could result from a race between
zfs_umount() and zfs_ioc_rollback(). We observed a case where
zfsvfs_free() tried to destroy data that zfsvfs_teardown() was still
using. That happened because there was nothing to prevent unmounting of
a ZFS filesystem that was in between zfs_suspend_fs() and
zfs_resume_fs().
Reviewed by: kib, smh
MFC after: 3 weeks
Sponsored by: ClusterHQ
Differential Revision: https://reviews.freebsd.org/D2794
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Eli Rosenthal <eli.rosenthal@delphix.com>
illumos/illumos-gate@c20404ff77
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alex Wilson <alex.wilson@joyent.com>
illumos/illumos-gate@d09e4475f6
separate driver. Add support for activating clock and hwreset resources
for these devices when the EXT_RESOURCES option is present.
Reviewed by: andrew, mmel, Emmanuel Vadot <manu@bidouilliste.com>
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D5749
Previously, freebsd32 binaries could submit read/write requests with lengths
greater than INT_MAX that a native kernel would have rejected.
Reviewed by: kib
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5788
not getting a proper bounds check.
Thanks to CTurt for pointing at this with a big red blinking neon sign.
PR: 206761
Submitted by: sson
Reviewed by: cturt@hardenedbsd.org
MFC after: 3 days
PowerPC has real Open Firmware and does not necessarily need FDT.
Make ofwpci.c only PCI dependent.
Pointed out by: emaste
Reviewed by: nwhitehorn
Obtained from: Semihalf
This is kinda critical to the performance when the CPU is slow and
network bandwidth is high, e.g. in the hypervisor.
Reviewed by: rrs, gallatin, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5765
And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c
Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5725
The i8254 simulation in Hyper-V is kinda broken and is not available
in Generation 2 Hyper-V VMs, so Hyper-V timer must be registered early
enough so that it can be used to do the TSC freq calibration.
This fixes the notorious warning like this:
calcru: runtime went backwards from 50 usec to 25 usec for pid 0 (kernel)
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: kib, sephe
Tested by: kib, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5778
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
knows at which address it got loaded and can apply relocations.
Some time ago I made a change to merge together the memory scope
definitions used by mmap (MAP_{PRIVATE,SHARED}) and lock objects
(PTHREAD_PROCESS_{PRIVATE,SHARED}). Though that sounded pretty smart
back then, it's backfiring. In the case of mmap it's used with other
flags in a bitmask, but for locking it's an enumeration. As our plan is
to automatically generate bindings for other languages, that looks a bit
sloppy.
Change all of the locking functions to use separate flags instead.
Obtained from: https://github.com/NuxiNL/cloudabi
This provides a constant ABI and layout for these structures (especially
struct adapter) avoiding some foot shooting.
Discussed with: np
Sponsored by: Chelsio Communications
Previously, calls to *sleep() and cv_*wait*() immediately returned during
early boot. Instead, permit threads that request a sleep without a
timeout to sleep as wakeup() works during early boot. Sleeps with
timeouts are harder to emulate without working timers, so just punt and
panic explicitly if any thread tries to use those before timers are
working. Any threads that depend on timeouts should either wait until
SI_SUB_KICK_SCHEDULER to start or they should use DELAY() until timers
are available.
Until APs are started earlier this should be a no-op as other kthreads
shouldn't get a chance to start running until after timers are working
regardless of when they were created.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D5724
- Move some blocks around to reduce the number of 'if (unmap)' checks.
- Use 'pbuf == NULL' instead of 'unmap'.
- Use nitems.
- Pull an assignment out of an if expression.
Reviewed by: kib
Sponsored by: Chelsio Communications
nic->num_vf_en is set based on the number of the enabled LMACs.
This number should not be overwritten later by any routine.
Instead it should fail PCI_IOV_ADD_VF() so that available VFs
with the corresponding LMACs will attach whereas other, disabled
VFs will fail with the proper error code.
Error signaling (due to improper number of VFs requested) is also moved
from PCI_IOV_INIT() to PCI_IOV_ADD_VF().
This will be reworked when multiple queue sets are enabled but for
now this is the correct behavior of the driver.
Obtained from: Semihalf
Sponsored by: Cavium
If the driver is not active or link is down the packet could remain
non-writeable. This commit makes all mbufs enqueued to the driver's
ring buffer to have correct attributes.
Pointed out by: wma
Reviewed by: wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5800
The FDT description is as follows:
- phy-handle, reg, qlm-mode, mac-address are under nodes in bgx0/1 node
- phy nodes (pointed by phy-handle) are under MDIO even though they may
not be connected through to MDIO. In those nodes they do not contain
MAC address or etc.
This commit changes parsing of the FDT nodes for BGX so that it can
obtain correct MAC address for a given PHY.
Reviewed by: wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5781
- Avoid memory leak when nicvf_tx_mbuf_locked() fails
- Introduce nicvf_xmit_locked() routine that uses drbr_peek(),
drbr_advance() or drbr_putback() for a specific ifnet.
This gives more clear and efficient design as well as
prevents from dropping mbufs that where not sent due to temporary
lack of descriptors.
- Add missing ETHER_BPF_MTAP() hook
Pointed out by: yongari
Reviewed by: wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5534
Changes introduced to AHCI code adding support for MSI-x
caused interrupt storm on Alpine boards.
This is unintended behaviour so added quirk to omit this functionality.
Reviewed by: mav
Submitted by: Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Annapurna Labs
Differential Revision: https://reviews.freebsd.org/D4301
increased to 256TiB. The kernel address space can also be increased to be
the same size, but this will be performed in a later change.
To help work with an extra level of page tables two new functions have
been added, one to file the lowest level table entry, and one to find the
block/page level. Both of these find the entry for a given pmap and virtual
address.
This has been tested with a combination of buildworld, stress2 tests, and
by using sort to consume a large amount of memory by sorting /dev/zero. No
new issues are known to be present from this change.
Reviewed by: kib
Obtained from: ABT Systems Ltd
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5720
Make it compile only for i386/amd64 for now as it's been tested there.
It's quite possible it'll show up elsewhere and we can enable it
for other architectures later.
Tested:
* PC Engines APU1C4
Submitted by: Daniel Wyatt <daniel@dewyatt.com>
Reviewed by: adrian, loos
Differential Revision: https://reviews.freebsd.org/D5389
connection manager. Examining so_comp without synchronization with
iw_so_event_handler is a harmless race.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Reviewed by: Steve Wise @ Open Grid Computing
Sponsored by: Chelsio Communications
It turns out that these will clash very annoyingly with the linux
macros in the linuxkpi layer, so let the wookie^Wlinux win.
The only user that I can find is ath(4), so fix it there too.
It turns out that madwifi actually has the basics for uAPSD implemented
but it was never ported to FreeBSD. I may eventually port most of the
pieces; I'll see how it goes!
Obtained from: Madwifi
the functions free the buffer and set the pointer to NULL. Also
remove useless call to brelse(9) on the error path.
PR: 208275
Submitted by: Fabian Keil <fk@fabiankeil.de>
MFC after: 2 weeks
data (headers). Historically the size of the headers was not checked
against the socket buffer space. Application could easily overcommit the
socket buffer space.
With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space. In case when size of headers is bigger that socket space,
KASSERT fires. Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.
o With this change, the headers copyin is moved down into the cycle, after
the sbspace() check. The uio size is trimmed by socket space there,
which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
up the stack to syscall wrappers. This required a copy and paste of the
code, but in turn this allowed to remove extra stack carried parameter
from fo_sendfile_t, and embrace entire compat code into #ifdef. If in
future we got more fo_sendfile_t function, the copy and paste level would
even reduce.
Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by: Vitalij Satanivskij <satan ukr.net>
Sponsored by: Netflix
Simplify and unify placeholder type definitions.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5771
- fix UP build [1]
- do not obliterate initial reading of rdtsc by the loop counter [2]
- restore the meaning of the argument -1 to native_lapic_ipi_wait()
as wait until LAPIC acknowledge without timeout
- correct formula for calculating loop iteration count for 1us, it was
inverted, and ensure that even on unlikely slow CPUs at least one
check for ack is performed.
Reported by: Michael Butler <imb@protected-networks.net> [1], rpokala[2],
jhb[3]
Tested by: Michael Butler
Pointy hat to: kib
Sponsored by: The FreeBSD Foundation
When expiring a neighbour cache entry we may need to look up the associated
default router, which requires the nd6 read lock. To avoid an LOR, the nd6
lock should be acquired first.
X-MFC-With: r296063
Tested by: Larry Rosenman <ler@lerctr.org> (previous revision)
Unlike Illumos FreeBSD has concept of logical ashift, that specifies
really minimal vdev block size that can be accessed. This knowledge
allows properly pad physical I/O and correctly assert its alignment.
This change fixes L2ARC write errors when device has logical sector
size above 512 bytes.
MFC after: 1 month
This driver works in PIO mode for now, interrupts are available only when
FIFO is enabled. The FIFO cannot be used with arbitrary sizes which defeat
its general use.
At some point we can add DMA transfers where the FIFO can be more useful.
Tested on uBMC (microBMC) and BBB.
Sponsored by: Rubicon Communications (Netgate)
Import portions of the PowerPC OF PCI implementation into new file
"ofwpci.c", common for other platforms. The files ofw_pci.c and ofw_pci.h
from sys/powerpc/ofw no longer exist. All required declarations are moved
to sys/dev/ofw/ofwpci.h. This creates a new ofw_pci_write_ivar() function
and modifies some others methods. Most functions contain existing ppc
implementations in the majority unchanged. Now there is no need to have
multiple identical copies of methods for various architectures.
Requested by: jhibbits
Reviewed by: jhibbits, marius
Submitted by: Marcin Mazurek <mma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Annapurna Labs
Differential Revision: https://reviews.freebsd.org/D4879
different ID space than the kernel. Because of this we need to read the
ID from the hardware. The hardware will provide this value to the CPU by
reading any of the first 8 Interrupt Processor Targets Registers.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5706
- Do not mark CSUM_IP_CHECKED and CSUM_IP_VALID on IPv6 packets.
IPv6 does not have checksums by definition.
- Set SCTP packets csum_flags CSUM_SCTP_VALID instead of
CSUM_DATA_VALID and skip csum_data
- Set csum_data simply as 0xffff without byteswap
Pointed out by: yongari
Reviewed by: yongari, wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5537
It is not necessary as entries are being manipulated under lock.
Reviewed by: wma
Obtained from: Semihalf
Sponsored by: Cavium
Differential Revision: https://reviews.freebsd.org/D5536
and avoid a delay while waiting for IPI delivery acknowledgement in
xAPIC mode. This makes the loop exit immediately after the delivery
bit in APIC_ICR register is set, instead of waiting for some
microseconds.
We only need to ensure that some amount of time is allowed for the
LAPIC to react to the command, and we need that the wait time is
finite and reasonable. For that reasons, it is irrelevant if the CPU
frequency or throttling decrease the speed and make the loop,
calibrated for full CPU speed at boot time, execute somewhat slower.
Discussed with: bde, jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
This adds Samsung PM851 to the list. It can be found in Lenovo Thinkpad
T440 for instance.
Reviewed by: Kevin Bowling <kevin.bowling@kev009.com>,
Jason Wolfe <j@nitrology.com>
Approved by: Kevin Bowling <kevin.bowling@kev009.com>,
Jason Wolfe <j@nitrology.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5753
requests if cb->state is not IDLE.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Reviewed by: Steve Wise @ Open Grid Computing
Sponsored by: Chelsio Communications
This is a no-op currently, but in kernels with earlier AP startup, the
random kthread was trying to use timeouts with sleeps before timers are
working. Wait until SI_SUB_KICK_SCHEDULER to start the random kproc.
Reviewed by: delphij, imp, markm
Approved by: so
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D5712
In r296926 the -P <path> option was added to kbdcontrol, which enables
this change for a simplified compile-time default keymap build process.
PR: 193865
Reviewed by: Oliver Pinter
Tested by: Oliver Pinter
MFC After: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5708
deadline mode the divide configuration is not used and
lapic_timer_divisor is not set.
Reported by: dhw, mav
Tested by: mav
Sponsored by: The FreeBSD Foundation
LAPIC timer iinterrupt when TSC reaches the value written to the
IA32_TSC_DEADLINE MSR. To arm or reset the timer in deadline mode, a
single non-serializing MSR write is enough. This is an advance from
the one-shot mode of LAPIC, where timer operated with the FSB
frequency and required two (serialized in case of xAPIC) writes to the
APIC registers.
The LVT_TIMER register value is cached to avoid unneeded writes in the
deadline mode. Unused arguments to specify period (which is passed in
struct lapic as la_timer_period) and interrupt enable (which is always
enabled) are removed from lapic_timer_{oneshot,periodic,deadline}
functions. Instead, special lapic_timer_oneshot_nointr() function for
interrupt-less one-shot calibration is added.
Reviewed by: mav (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5738
The graphic mode is noticeably slow on hypervisors, especially
on Hyper-V (1 second to each line).
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: kib, sephe, royger (early loader version)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5739
only the first one will actually work and all the others just result in
errors (which would get printed but otherwise ignored).
Instead, wait until we make a choice of which interface will be used to
mount the rootfs, and install the default route associated with it (if any).
After doing the md_mount() call to obtain the needed info, remove the
default route again, and transcribe the route info into the nfs_diskless
structure. If the system eventually chooses to mount the nfs rootfs, the
default route will be installed again when the nfs_diskless code
re-initializes the interface.
The theory here is that since we can only have one default route, the one
most likely to be correct for mounting the rootfs is the one that was
delivered along with the rootpath option.
bootp/dhcp server doesn't provide a router option. Doing so prevents
setting defaultrouter=<ip> in rc.conf (it fails because there's already
a bogus default route installed by bootpc_init).
When an admin wants to use this style of proxy arp on an interface, the
proper mechanism is to set the "use-lease-addr-for-default-route" flag
in the dhcp server config. That causes the lease address to be delivered
in the routers option, and the normal handling of the routers option will
then install the self-ip as the default route.
PR: 187094
doesn't check for errors, and all the errors that can happen result in it
calling panic anyway, except for one that's really more of a warning (and
is going to disappear on an upcoming commit anyway).
server (and not when it came from a fallback method such as the ROOTDEVNAME
option). This makes the code in bootpc_init() choose the first interface
that provided a rootpath name. Previously it was choosing the first
interface that got an IP address, which could be on a different and
potentially unreachable subnet than the server providing the rootfs.
If the rootpath name actually does come from a fallback source, then the
code continues to use the first interface in the list that got configured.
Note that this wasn't directly reported in the PR cited below, but was
discovered while working on that PR.
PR: 187094
into per-mount taskqueue with the private taskqueue processing thread.
This allows to drain the taskqueue on unmount, to ensure that all
TRIMs are finished before mount structures are freed.
But just draining the taskqueue where TRIM biodone geom-up completions
are processed is not enough, since ffs_blkfree(), called by the task,
might result in more writes. Count inflight delayed blkfree's and
pause() unmount until the counter drains as well.
Reported by: Nick Evans <nevans@talkpoint.com>
Tested by: Nick Evans <nevans@talkpoint.com>, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
m_unshare passes on the source mbuf's flags as-is to m_getcl and this
results in a leak if the flags include M_NOFREE. The fix is to clear
the bits not listed in M_COPYALL before calling m_getcl. M_RDONLY
should probably be filtered out too but that's outside the scope of this
fix.
Add assertions in the zone_mbuf and zone_pack ctors to catch similar
bugs.
Update netmap_get_mbuf to not pass M_NOFREE to m_getcl. It's not clear
what the original code was trying to do but it's likely incorrect.
Updated code is no different functionally but it avoids the newly added
assertions.
Reviewed by: gnn@
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5698
- in atags
- in DT blob (by using 'fdt chosen' U-Boot command)
The command line must start with guard's string 'FreeBSD:' and can contain
list of comma separated kenv strings. Also, boot modifier strings from
boot.h are recognised and parsed into boothowto.
The command line must be passed from U-Boot by setting of bootargs variable:
'setenv bootargs FreeBSD:boot_single=1,vfs.root.mountfrom=ufs:/dev/ada0s1a'
followed by 'fdt chosen' (only for DT based boot)
- Don't convert atags address passed from U-Boot. It's real physical
address (and we have 1:1 mapping).
- Size of tags is encoded in words, not in bytes
This allow us to boot FreeBSD kernel (using uImage encapsulation) directly
from U-boot using 'bootm' command or by Android fastboot loader.
For now, kernel uImage must be marked as Linux, but we can add support for
FreeBSD into U-Boot later.
So that callers could react accordingly.
Reviewed by: gallatin (no objection)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5695
The type definitions and constants that were used by COMPAT_CLOUDABI64
are a literal copy of some headers stored inside of CloudABI's C
library, cloudlibc. What is annoying is that we can't make use of
cloudlibc's system call list, as the format is completely different and
doesn't provide enough information. It had to be synced in manually.
We recently decided to solve this (and some other problems) by moving
the ABI definitions into a separate file:
https://github.com/NuxiNL/cloudabi/blob/master/cloudabi.txt
This file is processed by a pile of Python scripts to generate the
header files like before, documentation (markdown), but in our case more
importantly: a FreeBSD system call table.
This change discards the old files in sys/contrib/cloudabi and replaces
them by the latest copies, which requires some minor changes here and
there. Because cloudabi.txt also enforces consistent names of the system
call arguments, we have to patch up a small number of system call
implementations to use the new argument names.
The new header files can also be included directly in FreeBSD kernel
space without needing any includes/defines, so we can now remove
cloudabi_syscalldefs.h and cloudabi64_syscalldefs.h. Patch up the
sources to include the definitions directly from sys/contrib/cloudabi
instead.
Big buffer size could cause integer overflow and as a result
attempt to copy beyond VM_USERMAX_ADDRESS.
Fixing copyinstr boundary checking where compared value has been
overwritten by accident when setting fault handler.
Submitted by: Dominik Ermel <der@semihalf.com>
Obtained from: Semihalf
Sponsored by: Cavium
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5719
- properly V_irtualise variable access unbreaking VIMAGE kernels.
- remove the volatile from the function return type to make architecture
using gcc happy [-Wreturn-type]
"type qualifiers ignored on function return type"
I am not entirely happy with this solution putting the u_int there
but it will do for now.
This fixes creation of zvol devices for snapshots during zfs receive,
that previously failed with "ZFS WARNING: Unable to create ZVOL" message.
This solution is not perfect, but IMHO better then it was before.
MFC after: 2 weeks
pagers fault routines always return with a result page, be it the
proper and valid result page, or initially passed freshly allocated
placeholder. Do not free the passed in page until we are able to
provide the replacement, and do not assign NULL to *mres.
Reported and tested by: dumbbell
Reviewed by: royger (who also verified that Xen code is safe)
Sponsored by: The FreeBSD Foundation
controller IPI provider.
New struct intr_ipi is defined which keeps all info about an IPI:
its name, counter, send and dispatch methods. Generic intr_ipi_setup(),
intr_ipi_send() and intr_ipi_dispatch() functions are implemented.
An IPI provider must implement two functions:
(1) an intr_ipi_send_t function which is able to send an IPI,
(2) a setup function which initializes itself for an IPI and
calls intr_ipi_setup() with appropriate arguments.
Differential Revision: https://reviews.freebsd.org/D5700
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.
Submitted by: Mike Karels
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D4306
No functional change.
struct radix_node_head's first element is rh so this was already
referring to the same address. It was likely an unintended
s/rnh/&rnh->rh/ change from r294706 as all other rnh_walktree() callers
pass the expected struct radix_node_head * rather than obscurely passing
the address of their first element.
Sponsored by: EMC / Isilon Storage Division
Replace free(M_RTABLE) with rn_detachhead() to match rn_inithead().
This would trigger when reloading NFS exports and was similar to
problems with pf reload [1].
PR: 194078 [1]
Sponsored by: EMC / Isilon Storage Division
Using one taskqueue does not work, since the EOM MSR must be written
on the msg's owner CPU.
Noticed by: Jun Su <junsu microsoft com>
Discussed with: Jun Su <junsu microsoft com>, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Most often, hv_vmbus_post_message() doesn't fail. However, it fails
intermittently when GPADLs of large shared memory is to be established
with the host, e.g. on the hn(4) attach path: a GPADL of 15MB sendbuf
is created, for which lots of messages will be flooded to the host.
The host side tries to throttle the message rate by returning
HV_STATUS_INSUFFICIENT_BUFFERS.
Before this commit, we do several retries for failed messages, but the
delay between each retry is pretty/too low, which will cause sporadic
message posting failure. We now use large delay (>=1ms) between each
retry to fix the message posting failure.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5715
This moves the enabling of interrupts slightly earlier (the old location
was still before devices were enumerated and probed) and does it in the
interrupt code (rather than in the device configuration code). This
also avoids tripping over an assertion on the first TLB shootdown with
earlier AP startup.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D5710
- Use a different device description for fixed and fixed factor clocks.
- Fix a bug where the "clock-div" property was stored in the "mult" field
of the clock definition.
- Get the fixed factor parent clock by index instead of by name, as a
clock-names property is not required to be present here.
Reviewed by: mmel, adrian (mentor)
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D5703
clock-indices property is present, so change the "uint32_t *indices" parameter
to "uint32_t **indices" to allow this.
Reviewed by: mmel, adrian (mentor)
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D5702
platform specific drivers a chance to override the generic driver.
Reviewed by: mmel, adrian (mentor)
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D5701
The first of set of patches.
Use wider load/stores when aligned buffer is being copied.
In a simple test:
dd if=/dev/zero of=/dev/null bs=1M count=1024
the performance jumped from 410MB/s up to 3.6GB/s.
TODO:
- better handling of unaligned buffers (WiP)
- implement similar mechanism to bzero
Submitted by: Dominik Ermel <der@semihalf.com>
Obtained from: Semihalf
Sponsored by: Cavium
Reviewed by: kib, andrew, emaste
Differential Revision: https://reviews.freebsd.org/D5664
expects that the loop is always exited with the SU lock owned, even on
error.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Skip the log entry as there is nothing good to write out. Don't fail
the syscall though since it already succeeded. There's no reason
filemon's tracing failure should fail the already-succeeded syscall.
Record the error for later to return from close(2) on the filemon devfs
file descriptor.
Discussed with: markj, sjg, kib (briefly with kib)
Reported by: mjg
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
If the tracer has decided to the close the log then it should be fully
written, not getting more entries, when close(2) returns. This was
a regression in r297156 in that it allowed a traced process to continue
a traced syscall and add more entries to the log while the tracer had
already closed its fd or exited. This was only really part of the
daemonized process case which is abnormal.
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
Furthermore, there is no reason this needs to be a 64-bit integer
for the forseeable future.
Also, there is an inconsistency between to_flags and the mask in
tcp_addoptions(). Before r195654, to_flags was a u_long and the mask in
tcp_addoptions() was a u_int. r195654 changed to_flags to be a u_int64_t
but left the mask in tcp_addoptions() as a u_int, meaning that these
variables will only be the same width on platforms with 64-bit integers.
Convert both to_flags and the mask in tcp_addoptions() to be explicitly
32-bit variables. This may save a few cycles on 32-bit platforms, and
avoids unnecessarily mixing types.
Differential Revision: https://reviews.freebsd.org/D5584
Reviewed by: hiren
MFC after: 2 weeks
Sponsored by: Juniper Networks
Factor out nd6 and in6_attach initialization to their own files.
Also move destruction into those files though still called from
the central initialization.
Sponsored by: CK Software GmbH
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D5033
This restores the pre-r290196 behaviour, eliminating the need to manually
press '.' a couple of times to get USB to finish probing.
Note that there's still something wrong with the console (character
echoing doesn't quite work), and there's also a reported problem with
BHyVe, but those two don't seem related to the problem above.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
The mbuf provider is made up of a set of Statically Defined Tracepoints
which help us look into mbufs as they are allocated and freed. This can be
used to inspect the buffers or for a simplified mbuf leak detector.
New tracepoints are:
mbuf:::m-init
mbuf:::m-gethdr
mbuf:::m-get
mbuf:::m-getcl
mbuf:::m-clget
mbuf:::m-cljget
mbuf:::m-cljset
mbuf:::m-free
mbuf:::m-freem
There is also a translator for mbufs which gives some visibility into the structure,
see mbuf.d for more details.
Reviewed by: bz, markj
MFC after: 2 weeks
Sponsored by: Rubicon Communications (Netgate)
Differential Revision: https://reviews.freebsd.org/D5682
Make it a device option to be included in the kernel configs that request this file.
Reported by: mmel
Suggested by: mmel
Reviewed by: mmel
Differential Revision: https://reviews.freebsd.org/D5699
from userpsace. Previously we could have triggered a panic by trying to
jump to a kernel address from userland as the trap handling code thought we
received an ast in kernel mode.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
This mainly used to improve ACK timeliness when multiple RX rings
are enabled.
This value gives the best performance in both Azure and Hyper-V
environment, w/ both 10Ge and 40Ge using non-{INVARIANTS,WITNESS}
kernel.
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5691
Set watchdog timer parameters only when they really need to be changed.
In other cases just restart the timer with single Reset command instead
of two (Set and Reset).
From one side this visually reduces amount of CPU time burned in tight
loop waiting while some slow BMC configures its watchdog hardware, that
seems to be much more complicated task then just resetting the timer.
From another side on some BMCs those slow Set commands sometimes tend to
timeout, that leads to noisy log messages and even more CPU time burned,
so avoiding them can provide even bigger bonuses.
MFC after: 2 weeks
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5215
This gets rid of the per-cpu SWIs.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5215
Using the same message slot as the other types of the messages has
the side effect that the event timer message could be deferred to
the swi threads to run (lacking of trapframe and the original code
didn't even handle that, so the event timer was actually broken).
As of this commit we use an independent message slot for event timer,
so that we could handle all of event timer messages in the interrupt
handler directly. Note, the message slot for event timer is still
bind to the same interrupt vector as the other types of messages.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe
Discussed with: Jun Su <junsu microsoft com>, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5696
This is a pretty good reference for teaching an almost-11n-capable
driver about 11n.
It enables HT20 operation, A-MPDU/A-MSDU RX, but no aggregate support
for transmit. That'll come later. This means that receive throughput
should be higher, but transmit throughput won't have changed much.
* Disable bgscan - for now, bgscan will interfere with AMPDU TX/RX,
so until we correctly handle it in software driven scans, disable.
* Add null 11n methods for channel width / ampdu_enable.
the firmware can apparently handle ampdu tx (and hopefully block-ack
handling and retransmission) so I'll go review the linux code and
figure it out.
* Set the number of tx/rx streams. I /hope/ that nchains == nstreams
here.
* Add 11n channels in the call to ieee80211_init_channels().
* Don't enable HT40 for now - I'll have to verify the channel set command
and tidy it up a bit first.
* Teach the RX path about M_AMPDU for 11n nodes. Kinda wonder why
we aren't just doing this in net80211 already, this is the fourth
driver I've had to do this to.
* Teach rate2ridx() about MCS rates and what hardware rates to use.
* Teach the urtwn_tx_data() routine about MCS/11ng transmission.
It doesn't know about short-gi and 40MHz modes yet; that'll come
later.
* For 8192CU firmware, teach the rate table code about MCS rates.
* Ensure that the fixed rate transmit sets the right transmit flag
so the firmware obeys the driver transmit path.
* Set the default transmit rate to MCS4 if no rate control is available.
* Add HT protection (RTS-CTS exchange) support.
* Add appropriate XXX TODO entries.
TODO:
* 40MHz, short-gi, etc - channel tuning, TX, RX;
* teach urtwn_tx_raw() about (more) 11n stuff;
* A-MPDU TX would be nice!
Thanks to Andriy (avos@) for reviewing the code and testing it on IRC.
Tested:
* RTL8188EU - STA (me)
* RTL8192CU - STA (me)
* RTL8188EU - hostap (avos)
* RTL8192CU - STA (avos)
Reviewed by: avos
- Replace sc_reinittask() by ieee80211_restart_all() (mostly the same).
- Revert r282377 (seems to be unneeded now).
Tested with Intel 3945BG, STA mode.
Differential Revision: https://reviews.freebsd.org/D5056
First, update the return types of aio_return() and aio_waitcomplete() to
ssize_t.
POSIX requires aio_return() to return a ssize_t so that it can represent
all return values from read() and write(). aio_waitcomplete() should use
ssize_t for the same reason.
aio_return() has used ssize_t in <aio.h> since r31620 but the manpage and
system call entry were not updated. aio_waitcomplete() has always
returned int.
Note that this does not require new system call stubs as this is
effectively only an API change in how the compiler interprets the return
value.
Second, allow aio_nbytes values up to IOSIZE_MAX instead of just INT_MAX.
aio_read/write should now honor the same length limits as normal read/write.
Third, use longs instead of ints in the aio_return() and aio_waitcomplete()
system call functions so that the 64-bit size_t in the in-kernel aiocb
isn't truncated to 32-bits before being copied out to userland or
being returned.
Finally, a simple test has been added to verify the bounds checking on the
maximum read size from a file.
None of lstat(2), fstat(2), fstatat(2) were tracked either.
The other filemon implementations also do not track stat(2), nor
does bmake utilize it. The act of opening a file for read should
be enough to decide that a file is a dependency. There could be
rare cases where just having a file would cause a dependency but it
is unlikely.
MFC after: 2 weeks
Also noted by: sjg
Sponsored by: EMC / Isilon Storage Division
- proc.p_filemon is added which is protected by PROC_LOCK. This improves
performance and avoids double-fork issues, taking allproc_lock
while in syscalls, and walking the process tree in syscalls. A
particular proc.p_filemon can only be changed to NULL or another
filemon, or the filemon inherited, while the filemon->lock is held.
- Filemon are reference counted. On the last reference the log will be closed.
- When closing the devfs file handle, the filemon will be detached from all
processes and inheritance prevented.
- Disallow attaching to a process already being traced since filemon is
typically intended to be used on children only. This is allowed for
curproc as bmake relies on this behavior for rare cases when combining
.MAKE with .META.
- Detach any previously tracked process on ioctl(FILEMON_SET_PID).
- Handle error from devfs_set_cdevpriv() in filemon_open().
- The global filemon lock and lists are removed.
- A free list is no longer kept. Previously this list was
forever-expanding and never garbage cleaned.
- No longer loses track of double-forks. If the process holding the filemon
handle closes it will close the log rather than wait on a daemonized process,
but it will log all activity until it closes its handle. The filemon
will be removed from the process and not inherited.
- A separate process count is kept only as an optimization for
forced detachment to avoid taking allproc_lock and walking the entire
process tree.
- struct filemon access is protected by sx(9) filemon->lock as it was before.
- Add more comments and KASSERTS.
MFC after: 2 weeks
Reviewed by: kib, mjg, markj (all on previous versions)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5520
to the boot.netif.mtu env var, which will be picked up by pre-existing code
in nfs_mountroot() and used to configure the interface accordingly.
This should bring the same functionality when the bootp/dhcp work is done
by loader(8) as r297150 does for the in-kernel BOOTP case.
set that mtu on the interface.
These changes are based on the patch submitted by Robert Blayzor in the
PR, but I changed things around a bit, so the blame for any mistakes
belongs to me.
PR: 187094
Submitted by: Ju Sun <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5651
This write came from Linux commit b4ae3f22d238 which has been implicated
in Sandy Bridge power consumption issues (albeit under different
conditions on Linux). Disabling it restores normal power consumption on
my Sandy Bridge laptop (Thinkpad X220).
PR: 207889
Reviewed by: cem, dumbbell
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5665
a DRIVER_MODULE() referencing mmc_driver has a MODULE_DEPEND() on mmc. This
is because the kernel linker only searches for symbols in dependent modules,
so loading sdhci_pci (and other bus-flavors of sdhci) would fail when mmc
was not compiled into the kernel (even if you hand-loaded mmc first).
(Thanks to jilles@ for providing the vital clue about the kernel linker.)