Replace a single soft updates thread with a thread per FFS-filesystem
mount point. The threads are associated with the bufdaemon process.
Reviewed by: kib
Tested by: Peter Holm and Scott Long
MFC after: 2 weeks
Sponsored by: Netflix
In linux EXT4_LINK_MAX is now 64000. We can't really do that
since i_nlink and va_nlink are signed so setting higher values
is likely to cause trouble.
This is a system limitation so set the EXT_LINK_MAX to
what the system can handle.
MFC after: 3 days
Also disable a couple of ACPI devices that are not usable under Dom0.
To this end a couple of booleans are added that allow disabling ACPI
specific devices.
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
x86/xen/xen_nexus.c:
- Return BUS_PROBE_SPECIFIC in the Xen Nexus attachement routine to
force the usage of the Xen Nexus.
- Attach the ACPI bus when running as Dom0.
dev/acpica/acpi_cpu.c:
dev/acpica/acpi_hpet.c:
dev/acpica/acpi_timer.c
- Add a variable that gates the addition of the devices.
x86/include/init.h:
- Declare variables that control the attachment of ACPI cpu, hpet and
timer devices.
Minor fixes to make the Xen Dom0 console work. This includes always
returning there's pending input in xencons_has_input, because on Dom0
there's no shared ring and we cannot test the indexes. The second
fix is to use the CONSOLEIO_read hypercall in order to read input
data from the Xen console.
Sponsored by: Citrix Systems R&D
dev/xen/console/xencons_ring.c:
- Always return true in xencons_has_input for Dom0.
- Implement Dom0 console support for xencons_handle_input.
Allow a privileged Xen guest (Dom0) to parse the MADT ACPI interrupt
overrides and register them with the interrupt subsystem.
Also add a Xen specific implementation for bus_config_intr that
registers interrupts on demand for all the vectors less than
FIRST_MSI_INT.
Sponsored by: Citrix Systems R&D
x86/xen/pvcpu_enum.c:
- Use helper functions from x86/acpica/madt.c in order to parse
interrupt overrides from the MADT.
- Walk the MADT and register any interrupt override with the
interrupt subsystem.
x86/xen/xen_nexus.c:
- Add a custom bus_config_intr method for Xen that intercepts calls
to configure unset interrupts and registers them on the fly (if the
vector is < FIRST_MSI_INT).
Split a portion of the code in madt_parse_interrupt_override to a
separate function, that is public and can be used from other code.
This will be needed by the Xen port, since FreeBSD needs to parse the
interrupt overrides and notify Xen about them.
This commit should not introduce any functional change.
Sponsored by: Citrix Systems R&D
Reviewed by: jhb, gibbs
x86/acpica/madt.c:
- Introduce madt_parse_interrupt_values() that parses the intr
information from ACPI and returns the triggering and the polarity.
This is a subset of the functionality that used to be part of
madt_parse_interrupt_override().
- Make madt_found_sci_override a global variable that can be used
from other files.
x86/include/acpica_machdep.h:
- Prototype of madt_parse_interrupt_values.
- Extern declaration of madt_found_sci_override.
Lower the quality of the MADT ACPI enumerator, so on Xen Dom0 we can
force the usage of the Xen mptable enumerator even when ACPI is
detected.
This is needed because Xen might restrict the number of vCPUs
available to Dom0, but the MADT ACPI table parsed in FreeBSD is the
native one (which enumerates all the CPUs available in the system).
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
x86/acpica/madt.c:
- Lower MADT enumerator quality to -50.
x86/xen/pvcpu_enum.c:
- Rise Xen PV enumerator to 0.
This change inserts the Xen interrupt subsystem (event channels)
initialization between the system interrupt initialization and the IO
APIC source registration.
This is needed when running on Dom0, that routes physical interrupts
on top of event channels, so that the interrupt sources found during
IO APIC initialization can be registered using the Xen interrupt
subsystem.
The resulting order in the SI_SUB_INTR stage is the following:
- System intr initialization
- Xen intr initalization
- IO APIC source registration
Sponsored by: Citrix Systems R&D
x86/x86/local_apic.c:
- Change order of apic_setup_io to be called after xen interrupt
subsystem is setup.
x86/xen/xen_intr.c:
- Init Xen event channels before apic_setup_io.
Add a new DDB command to dump all registered event channels.
Sponsored by: Citrix Systems R&D
x86/xen/xen_intr.c:
- Add a new xen_evtchn command to DDB in order to dump all
information related to event channels.
Mask all event channels during initialization. This is done so that we
don't receive spurious interrupts while dynamically registering new
event channels. There's a small window during registration where an
event channel can fire before we have attached a handler to it.
Sponsored by: Citrix Systems R&D
x86/xen/xen_intr.c:
- Mask all event channels on init.
This allows Dom0 to manage physical hardware, redirecting the
physical interrupts to event channels.
Sponsored by: Citrix Systems R&D
x86/xen/xen_intr.c:
- Expand struct xenisrc to hold the level and triggering of PIRQ
event channels.
- Implement missing methods in xen_intr_pirq_pic.
- Allow xen_intr_alloc_isrc to take a vector parameter that globally
identifies the interrupt. This is only used for PIRQs that are
bound to a specific hardware IRQ.
- Introduce xen_register_pirq used to register IO APIC legacy PIRQ
interrupts.
- Add support for the dynamic PIRQ EOI map, this shared memory is
modified by Xen (if it suppoorts that feature), and notifies the
guest if an EOI is needed or not. If it's not available fall back
to the old implementation using PHYSDEVOP_irq_status_query.
- Rename xen_intr_isrc_count to xen_intr_auto_vector_count and
replace it's usages.
- Align static variables by name.
xen/xen_intr.h:
- Add prototype for xen_register_pirq.
This allows to avoid extra network traffic when copying files on NTFS iSCSI
disks within one storage host by drag'n'dropping them in Windows Explorer
of Windows 8/2012. It should also accelerate Hyper-V VM operations, etc.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
r262867 was described as fixing socket buffer checks for SOCK_SEQPACKET,
but also changed one of the SOCK_DGRAM code paths to use the new
sbappendaddr_nospacecheck_locked() function. This lead to SOCK_DGRAM
bypassing socket buffer limits.
bridges in strange ways, either rendering them unable to detect
insertion and removal events, or possibly unable to read from the
device behind the bridge.
This fixes at least one laptop, a Toshiba Tecra M5 with a Texas
Instruments PCxx12 (d=0x8039 v=0c104c) bridge. The very similar
Tecra M9 has the same bridge, but worked fine without this change.
The bridge chip has no I/O port BAR, and there is nothing in the spec
to suggest I/O decoding should be enabled; however enabling it fixes
the issue. Add an XXX comment to this effect.
Discussed with: jhb, imp
MFC after: 2 weeks
We continue to use pmap_enter() for that. For unwiring virtual pages, we
now use pmap_unwire(), which unwires a range of virtual addresses instead
of a single virtual page.
Sponsored by: EMC / Isilon Storage Division
NRSACK extension. The default will still be off, since it
it not an RFC (yet).
Changing the sysctl name will be in a separate commit.
MFC after: 1 week
Encode CPU's number by XOR'ing the CPU ID against the 64-bit cpu_ticks().
Reviewed by: mav, gibbs
Differential Revision: https://phabric.freebsd.org/D521
MFC after: 2 weeks
It could be claimed that two things were reasonable protected by
Giant. One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx. Another is vfsconf.vfc_refcount, which is
now updated with atomics.
Note that vfc_refcount still has the same races now as it has under
the Giant, the unload of filesystem modules can happen while the
module is still in use.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
option for controlling ECN on future associations and get the
status on current associations.
A simialar pattern will be used for controlling SCTP extensions in
upcoming commits.
pmap_unwire() and vm_object_unwire().
Retire vm_fault_{un,}wire(), since they are no longer used.
(See r268327 and r269134 for the motivation behind this change.)
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
In dnode_children_t, use C99's "[]" idiom for declaring the variable
sized array dnc_children at the end of the structure.
This prevents the compiler from mistakenly optimizing away accesses
beyond the array's defined size.
Illumos issue:
5038 Remove "old-style" flexible array usage in ZFS.
Author: Justin T. Gibbs <justing@spectralogic.com>
MFC after: 2 weeks
writes to files for read-only file systems. Since there are already
checks in nandfs_setattr that return an error, this moves detection of
the error earlier.
provides support for a variety of low-end graphics hardware (SBus adapters,
Mach64, QEMU's framebuffer, XVR-100). A driver for at least the Creator3D
cards will have to be present before this can become the default console
driver.
To test vt(4) on sparc64, set kern.vty=vt at the loader prompt.
Reorganize struct sge_iq. Make the iq entry size a compile time
constant. While here, eliminate RX_FL_ESIZE and use EQ_ESIZE directly.
MFC after: 2 weeks
This prevents recursion of vdev_queue_io_done as per r265321 but
using a different method as recommended on the openzfs list.
We now use zio_interrupt(zio) and return ZIO_PIPELINE_STOP instead
of returning ZIO_PIPELINE_CONTINUE from vdev_*_io_start methods.
zio_vdev_io_start now ASSERTS the that vdev_op_io_start returns
ZIO_PIPELINE_STOP to ensure future changes don't reintroduce
ZIO_PIPELINE_CONTINUE returns.
Cleanup flow in vdev_geom_io_start while I'm here.
Also fix some cases not using SET_ERROR(..)
MFC after: 2 weeks
X-MFC-With: r265321
don't need any #ifdef stuff to use atomic_load/store_64() elsewhere in
the kernel. For armv4 the atomics are trivial to implement for kernel
code (just disable interrupts), less so for user mode, so this only has
the kernel mode implementations for now.
nanouptime() instead of getnanouptime(). nanouptime(9) provides more
precise result at expense of being slower.
In r269223, gethrtime() is used as creation time of dbuf, which in turn
acts as portion of lookup key to maintain AVL invariant where there can
not be duplicate items. Before this change, gethrtime() have preferred
better execution time by sacrificing precision, which may lead to panic
on busy systems with:
panic: avl_find() succeeded inside avl_add()
Reported by: allanjude, mav
PR: kern/192284
MFC after: 11 days
X-MFC-with: r269223
value shared across multiple cores is with atomic_load_64() and
atomic_store_64(), because the normal 64-bit load/store instructions
are not atomic on 32-bit arm. Luckily the ldrexd/strexd instructions
that are atomic are fairly cheap on armv6. Because it's fairly simple
to do, this implements all the ops for 64-bit, not just load/store.
Reviewed by: andrew, cognet
The rss_key[] array in netinet/in_rss.c has the bytes in incorrect
order. This results in the RSS test vectors in the Microsft RSS spec
and Intel NIC specs giving incorrect results, and making it difficult
to verify correct hash operation when RSS functionality is added to
new NICs.
CR: https://phabric.freebsd.org/D516
Reviewed by: adrian
We have functions nested within functions, and places where we start a
function then never end it, we just jump to the middle of something else.
We tried to express this with nested ENTRY()/END() macros (which result
in .fnstart and .fnend directives), but it turns out there's no way to
express that nesting in ARM EHABI unwind info, and newer tools treat
multiple .fnstart directives without an intervening .fnend as an error.
These changes introduce two new macros, EENTRY() and EEND(). EENTRY()
creates a global label you can call/jump to just like ENTRY(), but it
doesn't emit a .fnstart. EEND() is a no-op that just documents the
conceptual endpoint that matches up with the same-named EENTRY().
This is based on patches submitted by Stepan Dyatkovskiy, but I made some
changes and added the EEND() stuff, so blame any problems on me.
Submitted by: Stepan Dyatkovskiy <stpworld@narod.ru>
(4 in operation), 4GB ram (3.5 usable) ARM machine.
Support covers device drivers for:
- Serial Peripheral Interface (SPI)
- Chrome Embedded Controller (EC) - SPI-based version
- XHCI and USB 3.0 dual-role device PHY
Also:
- Add support for Exynos5420 in Pad module
- Move power-related functions to separate driver --
Power Management Unit (PMU)
- Enable XHCI for Chromebook1
Special thanks to grehan@ for hardware, and to
hselasky@ for r269139.
by the combination of r268591 and r269134: When we attempt to add the
wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing.
(They only set the wired attribute on newly created mappings.)
Tested by: andreast
compressed tarball, aka package. The file system assumes that the
files are layed-out in the same order as needed to allow for the
package to be streamed. As such, it does not read an entire package
into memory first.
Some properties of the file system:
o Files that start with '+' are silently skipped. These are found
in FreeBSD package files.
o Files smaller than or equal to 4KB will be cached in memory and
as such allow for some flexibility in accessing files out of
order.
o Files with the .tgz suffix are assumed to be (sub-)packages and
signal the end for a directory scan.
Obtained from: Juniper Networks, Inc.
left some of the decisions based on its counterpart, SA_CCB_BUFFER_IO
being random. As a result, propagation of the residual information
for the SPACE command was broken, so the number of filemarks
encountered during a SPACE operation was miscalculated. Consequently,
systems relying on properly tracked filemark counters (like Bacula)
fell apart.
The change also removes a switch/case in sadone() which r256843
degraded to a single remaining case label.
PR: 192285
Approved by: ken
MFC after: 2 weeks
It overflows witness.
Shorten the names of some nfs mutexes.
Reported and tested by: pho
No objections from: rmacklem, mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
In the mmcsd layer use this value to populate disk->d_ident. Also set
disk->d_descr to the full set of card identification info (includes vendor,
model, manufacturing date, etc).
Previously there was a race condition between the address addition
and associating it with the CARP which resulted in the interface
MAC, instead of the CARP MAC, being used for a brief amount of time.
This caused "is using my IP address" warnings as well as data being
sent to the wrong machine due to incorrect ARP entries being recorded
by other devices on the network.
pmap_unwire(), the call to MOEA64_PVO_TO_PTE() must be performed before
any changes are made to the PVO. Otherwise, MOEA64_PVO_TO_PTE() will
panic.
Reported by: andreast
* Implements Start Stop Unit for SATA direct-attach devices in IR mode to avoid
data corruption.
* Use CAM_DEV_NOT_THERE instead of CAM_SEL_TIMEOUT and CAM_TID_INVALID
Obtained from: LSI
MFC after: 2 weeks
features. If bootverbose is enabled, a detailed list is provided;
otherwise, a single-line summary is displayed.
- Add read-only sysctls for optional VT-x capabilities used by bhyve
under a new hw.vmm.vmx.cap node. Move a few exiting sysctls that
indicate the presence of optional capabilities under this node.
CR: https://phabric.freebsd.org/D498
Reviewed by: grehan, neel
MFC after: 1 week
framebuffer drivers. This lets ofwfb work with xf86-video-scfb and makes
the driver much more generic and less PCI-centric. This changes some
user-visible behavior and will require updates to the xorg-server port
on PowerPC when using ATI graphics cards.
some parts of the checks are in fact redundand in the surrounding
code, and it is more clear what the conditions are by direct testing
of the flags. Two of the three macros were only used in assertions.
In vnlru_free(), all relevant parts of vholdl() were already inlined,
except the increment of v_holdcnt itself. Do not call vholdl() to do
the increment as well, this allows to make assertions in
vholdl()/vhold() more strict.
In v_incr_usecount(), call vholdl() before incrementing other ref
counters. The change is no-op, but it makes less surprising to see
the vnode state in debugger if interrupted inside v_incr_usecount().
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Increase default ARC buf_hash_table size. When typical block size is small,
the hash table could be too small, which would lead to long hash chains and
limit performance for cached reads.
A new loader tunable, vfs.zfs.arc_average_blocksize, have been added which
allows users to override the default assumption of average (typical) block
size. Old default was 65536 (64 KiB) and new default is 8192 (8 KiB).
Illumos issue:
5034 ARC's buf_hash_table is too small
MFC after: 2 weeks
Change dn->dn_dbufs from linked list to AVL tree.
Illumos issues:
4873 zvol unmap calls can take a very long time for larger datasets
MFC after: 2 weeks
code behave more like it is on Solaris.
Reported by: avg
Reviewed by: avg, mav (but bugs are mine)
Differential Revision: https://phabric.freebsd.org/D457
moved from the stack into the tag structure. In retrospect that was a bad
idea, because nothing protects that array from concurrent access by
multiple threads.
This change moves the array to the map structure (actually it's allocated
following the structure, but all in a single malloc() call).
This also establishes a "sane" limit of 4096 segments per map. This is
mostly to prevent trying to allocate all of memory if someone accidentally
uses a tag with nsegments set to BUS_SPACE_UNRESTRICTED. If there's ever
a genuine need for more than 4096, don't hesitate to increase this (or
maybe make it tunable).
Reviewed by: cognet
triggers a need to bounce due to cacheline alignment. These buffers
are always aligned to cacheline boundaries, and even when the DMA operation
starts at an offset within the buffer or doesn't extend to the end of the
buffer, it's safe to flush the complete cachelines that were only partially
involved in the DMA. This is because there's a very strict rule on these
types of buffers that there will not be concurrent access by the CPU and
one or more DMA transfers within the buffer.
Reviewed by: cognet
functions, it has evolved to make a variety of decisions about whether
the DMA needs to bounce, so rename it to must_bounce(). Rewrite it to
perform checks outside of the ancestor loop if they're based on information
that's wholly contained within the original tag. Now the loop only checks
exclusion zones in ancestor tags.
Also, add a new function, might_bounce() which does a fast inline check
of flags within the tag and map to quickly eliminate the need to call
the more expensive must_bounce() for each page in the DMA operation.
Within the mapping loops, use map->pagesneeded != 0 as a proxy for all
the various checks on whether bouncing might be required. If no pages
were reserved for bouncing during the checks before the mapping loop,
then there's no need to re-check any of the conditions that can lead
to bouncing -- all those checks already decided there would be no bouncing.
Reviewed by: cognet
exclusion zones and phsyical memory. The phys_avail[i] entries are the
address of the first byte of ram in the region, and phys_avail[i+1]
entries are the address of the first byte of ram in the next region
(i.e., they're not included in the region that starts at phys_avail[i]).
Reviewed by: cognet
unchanging values in the phys_avail array, so do the comparisons just once
at tag creation time and set a flag to remember the result.
Reviewed by: cognet
DMA on arm can bounce for several reasons, and _bus_dma_can_bounce() only
checks for the lowaddr/highaddr exclusion ranges in the dma tag, so now
it's named exclusion_bounce(). The other reasons for bouncing are checked
by the new functions alignment_bounce() and cacheline_bounce().
Reviewed by: cognet
postponing it to zfs_vget(). zfs_root() returned vnode with the
default value of v_hash, which caused inconsistent v_hash value when
root vnode was obtained from zfs_vget().
Nullfs allocated two upper vnodes for the root zfs vnode due to
different hashes, causing consistency problems.
Reported and tested by: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
place for the NFS-based PXE loader. Information like rootpath
or rootip aren't that useful for TFTP and the gateway IP is
typically already printed by the firmware.
2. Only set boot.nfsroot.* environment variables for NFS. This
makes it possible for the OS to work either way by checking
for the presence or absence of environment variables.
3. Set boot.netif.server when using TFTP so that the OS can fetch
files as well. A typical use case for this is network-based
installations with the installation process implemented on
top of FreeBSD.
4. The pxelinux loader has a set of alternative names it tries
for configuration files. Make it easier to do something
similar in Forth by providing the IP address as a 32-bit hex
number in the pxeboot.ip variable and the MAC address with
dashes in the pxeboot.hwaddr environment variable.
Obtained from: Juniper Networks, Inc.
particular, allow loaders to define the name of the RC script the
interpreter needs to use. Use this new-found control to have the
PXE loader (when compiled with TFTP support and not NFS support)
read from ${bootfile}.4th, where ${bootfile} is the name of the
file fetched by the PXE firmware.
The normal startup process involves reading the following files:
1. /boot/boot.4th
2. /boot/loader.rc or alternatively /boot/boot.conf
When these come from a FreeBSD-defined file system, this is all
good. But when we boot over the network, subdirectories and fixed
file names are often painful to administrators and there's really
no way for them to change the behaviour of the loader.
Obtained from: Juniper Networks, Inc.
be able to claim I know how the UART code works."
* Just return 115200 as the current baud rate. I should cache it in the
device struct and return that but I'm lazy right now.
* don't error out on other ioctl settings for now, just silently ignore them.
* remove some code that was copied from the 8250 driver that isn't needed
any longer.
Tested:
* AR9331, Carambola-2 board.
so it really should not be under "optional inet". The fact that uipc_accf.c
lives under kern/ lends some weight to making it a "standard" file.
Moving kern/uipc_accf.c from "optional inet" to "standard" eliminates the
need for #ifdef INET in kern/uipc_socket.c.
Also, this meant the net.inet.accf.unloadable sysctl needed to move, as
net.inet does not exist without networking compiled in (as it lives in
netinet/in_proto.c.) The new sysctl has been named net.accf.unloadable.
In order to support existing accept filter sysctls, the net.inet.accf node
has been added netinet/in_proto.c.
Submitted by: Steve Kiernan <stevek@juniper.net>
Obtained from: Juniper Networks, Inc.
underlying physical pages are mapped by the pmap. If, for example, the
application has performed an mprotect(..., PROT_NONE) on any part of the
wired region, then those pages will no longer be mapped by the pmap.
So, using the pmap to lookup the wired pages in order to unwire them
doesn't always work, and when it doesn't work wired pages are leaked.
To avoid the leak, introduce and use a new function vm_object_unwire()
that locates the wired pages by traversing the object and its backing
objects.
At the same time, switch from using pmap_change_wiring() to the recently
introduced function pmap_unwire() for unwiring the region's mappings.
pmap_unwire() is faster, because it operates a range of virtual addresses
rather than a single virtual page at a time. Moreover, by operating on
a range, it is superpage friendly. It doesn't waste time performing
unnecessary demotions.
Reported by: markj
Reviewed by: kib
Tested by: pho, jmg (arm)
Sponsored by: EMC / Isilon Storage Division