Standard VOP_FSYNC() implementation just syncs data buffers, and due
to this, is the correct and efficient implementation for msdosfs or
any other filesystem which uses bufer cache trivially. Provide
globally visible wrapper vop_stdfdatasync_buf() for future consumption
by other filesystems.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7471
The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing
code with fsync(2). For all filesystems, this commit provides the
implementation which delegates the work of VOP_FDATASYNC() to
VOP_FSYNC(). This is functionally correct but not efficient.
This is not yet POSIX-compliant implementation, because it does not
ensure that queued AIO requests are completed before returning.
Reviewed by: mckusick
Discussed with: avg (ZFS), jhb (AIO part)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7471
Simply change the mode to K_XLATE using a local variable and use the
grab level as a flag to tell screen switches not to change it again,
so that we don't need to switch scp->kbd_mode. We did the latter,
but didn't have the complications to update the keyboard mode switch
for every screen switch. sc->kbd_mode remains at its user setting
for all scp's and ungrabbing restores to it.
- Add handling of VF register sets to t4_get_regs_len() and t4_get_regs().
- While here, use t4_get_regs_len() in the ioctl handler for regdump
instead of inlining it.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7484
- Use alternate register locations for the data and control registers for
VFs.
- Do a dummy read to force the writes to the mailbox data registers to
post before the write to the control register on VFs.
- Do not check the PCI-e firmware register for errors on VFs.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7483
Add fields to hold the SGE control register and free list buffer sizes to
the sge_params structure. Populate these new fields in
t4_init_sge_params() for PF devices and change t4_read_chip_settings() to
pull these values out of the params structure instead of reading
registers directly. This will permit t4_read_chip_settings() to be reused
for VF devices which cannot read SGE registers directly.
While here, move the call to t4_init_sge_params() to
get_params__post_init(). The VF driver will populate the SGE parameters
structure via a different method before calling t4_read_chip_settings().
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7476
Like scr_lock, the grab count needs to be per-physical-device to work.
This bug corrupted the grab count on both vtys if the ungrabbed vty is
different from the console, and failed to restore the keyboard state
on the ungrabbed vty, but not restoring it usually left the keyboard
mode part of the keyboard state uncorrupted at 1 (K_XLATE), while
after this fix the keyboard mode part is usually corrupted to 0 (K_RAW).
While here, rename the grab count from grabbed to grab_level.
This bug corrupted the grab count on both vtys if the ungrabbed vty is
different from the console, and failed to restore the keyboard state
on the ungrabbed vty, but not restoring the latter usually left the
keyboard mode part of it uncorrupted at 1 (K_XLATE), while after this
fix the keyboard mode part is usually corrupted to 0 (K_RAW).
While here, rename the grab count from 'grabbed' to grab_level.
- never call up to the tty layer to restart output for keyboard input in
console mode. This was already disallowed in kdb mode. Other cases
are rarely reached.
- disable the reboot, halt and powerdown keys in console mode. The suspend,
standby and panic keys are still allowed, and aren't even conditonal
on excessive configuration options. Some of these actions are still
available in ddb mode as ddb commands which are equally unsafe. Some
are useful at input prompts and should be restored when the locking is
fixed.
- disallow bells in kdb mode (should be in console mode, but the flag for
that is not available). Visual bell gives very alarming behaviour by
trying to use callouts which don't work in kdb mode. Audio bell uses
timeouts and hardware resources with mutexes that can deadlock in
reasonable use of ddb.
Screen switches in kdb mode are not very safe, but they are important
functionality and there is a lot of code to make them sort of work.
restores avoidance of doing dangerous things like calling wakeup() and
callouts while in ddb.
Initialization of 'debugger' was broken by removing the cndbctl() console
method that was used mainly in this driver to initialize 'debugger' and
switch to the console screen on entry to ddb. The screen switch was
restored using the cngrab() method, but cngrab() is more general so it
should not initialize 'debugger' and never did. 'debugger' was just
an over-engineered alias for kdb_active anyway. It existed because
kdb_active (when it was named ddb_active) was considered as a private
kdb variable, and there are ordering problems initializing the variables
atomically with the state that they represent, but an extra variable and
method to set it increased these problems.
The bug caused LORs, but WITNESS is normally misconfigured with
WITNESS_SKIPSIN so it doesn't check the spinlocks used by wakeup() and
callouts.
virtual-device, but needs to be per-physical-device so that it protects
shared data. Usually, scp->sc->write_in_progress got corrupted first
and further corruption was limited when this variable was left at nonzero
with no write in progress.
Attempt to fix missing lock destruction in r162285. Put it with the
lock destruction for r172250 after moving the latter. Both might be
unreachable.
To demonstrate the bug, find a buggy syscall or sysctl that calls
printf(9) and run this often. Run hd /dev/zero >/dev/ttyvN for any
N != 0. The console spam goes to ttyv0 and the non-console spam goes
to ttyvN, so the lock provided no protection (but it helped for
N == 0).
Without this, rules using address ranges (e.g. "10.1.1.1 - 10.1.1.5") did not
match addresses correctly on little-endian systems.
PR: 211796
Obtained from: OpenBSD (sthen)
MFC after: 3 days
Fix PC_REGS() so that printing of instructions works in some useful
cases. ddb only understands a single flat address space, but this
macro allows mapping $cs:$eip into vm86's flat address space well
enough for the MI parts of ddb. This doesn't work for the MD parts
that do stack traces, and there are no similar macros for data addresses.
PC_REGS() has to use the trapframe pointer instead of the pcb for this.
For other CPUs, the trapframe pointer is not available except by tracing
back to it. But tracing back through vm86 trapframes is broken even
starting with one.
It was remarkably hard to trace all current threads. "show pcpu" only
showed the pid, and there was nothing (?) better than searching ps output
to find the tids on CPUs. This change simplifies the search, but you
still have to trace the tid for each CPU manually.
the fixed-width numeric field 'wchan', as in ps(1). They were sort
of centered, although the template shows 'state' as right-justified.
The `wmesg' field very rarely has a prefix of '*' (for lock names)
that is still to the left of the header, and the width of this field
is reduced from 8 to 7 (more than 6 is an error).
The 'wmesg' and 'wchan' fields are still misnamed and poorly handled.
They are named sort of backwards relative to ps(1):
- wmesg in ddb = mwchan in ps
- wmesg in ddb = wchan in ps (if it is a wait channel name, not a lock name)
- wchan in ddb = nwchan in ps
ddb ps wastes lots of space for the unimportant 'wchan' field (20
columns altogether on 64-bit arches). ps(1) documents using a
compressed format, but the compression only omits leading nybbles of
0 so it has neveqr worked on arches that put the kernel in the top half
of the address space. It just avoids wasting space for an 0x prefix.
single stepping of multiple instructions (e.g., s/p,<count> and n/p).
db_print_loc_and_inst() already prints a newline on all arches although
it probably shouldn't.
Especially on SMP systems, single stepping tends to deadlock or panic
too quickly to be useful for anything except finding bugs in itself,
but with printing "itself" includes console drivers so it is useful
for generating stress tests for console drivers.
mpc85xx_map_dcsr() returns a vm_offset_t, not an error code.
mpc85xx_fix_errata() will gracefully exit if mpc85xx_map_dcsr() returns 0, as
that indicates an error (NULL pointer).
__syncicache() only syncs the icache on the current CPU, it doesn't touch the
cache on any other core. Replace the call with cpu_flush_dcache() instead.
Since bp_kernload is not touched again by the boot CPU in this code path, dcbf
is no less efficient than the dcbst from __syncicache() by invalidating the
cache line.
Summary:
There is often a need at the debugger to print arbitrary special
purpose registers (SPRs) on PowerPC. Using a rewritable asm stub, print any SPR
provided on the command line.
Note, as there is no checking in this, attempting to print a nonexistent SPR
may cause a Program exception (illegal instruction, or boundedly undefined).
Note also that this relies on the kernel text pages being writable. If in the
future this is made not the case, this will need to be reworked.
Test Plan:
Printing the Processor Version Register (PVR, SPR 287):
db> show spr 11f
SPR 287(11f): 80240012
Differential Revision: https://reviews.freebsd.org/D7403
incorrect from the error cases in exec_map_first_page(). They are
unnecessary because we automatically unbusy the page in vm_page_free()
when we remove it from the object. The calls are incorrect because they
happen after the page is freed, so we might actually unbusy the page
after it has been reallocated to a different object. (This error was
introduced in r292373.)
Reviewed by: kib
MFC after: 1 week
Summary:
u-boot, following the ePAPR specification, puts secondary cores into a
spinloop at boot, rather than leaving them shut off. It then relies on the host
OS to write the correct values to a special spin table, located in coherent
memory (on newer implementations), or noncoherent memory (older
implementations).
This supports both implementations of ePAPR, as well as continuing to support
non-ePAPR booting, by first attempting to use the spintable, and falling back to
expecting non-started CPUs.
Test Plan:
Booted on a P5020 board. Tested before and after the changes.
Before the changes, prints the error "SMP: CPU 1 already out of hold-off state!"
and panics shortly thereafter. After the changes, same boot method lets it
complete boot.
Reviewed by: nwhitehorn
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D7494
The module works together with ipfw(4) and implemented as its external
action module.
Stateless NAT64 registers external action with name nat64stl. This
keyword should be used to create NAT64 instance and to address this
instance in rules. Stateless NAT64 uses two lookup tables with mapped
IPv4->IPv6 and IPv6->IPv4 addresses to perform translation.
A configuration of instance should looks like this:
1. Create lookup tables:
# ipfw table T46 create type addr valtype ipv6
# ipfw table T64 create type addr valtype ipv4
2. Fill T46 and T64 tables.
3. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
4. Create NAT64 instance:
# ipfw nat64stl NAT create table4 T46 table6 T64
5. Add rules that matches the traffic:
# ipfw add nat64stl NAT ip from any to table(T46)
# ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96
6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.
Stateful NAT64 registers external action with name nat64lsn. The only
one option required to create nat64lsn instance - prefix4. It defines
the pool of IPv4 addresses used for translation.
A configuration of instance should looks like this:
1. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
2. Create NAT64 instance:
# ipfw nat64lsn NAT create prefix4 A.B.C.D/28
3. Add rules that matches the traffic:
# ipfw add nat64lsn NAT ip from any to A.B.C.D/28
# ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96
4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6434
ipfw_objhash_lookup_table_kidx does lookup kernel index of table;
ipfw_ref_table/ipfw_unref_table takes and releases reference to table.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
* make interface cloner VNET-aware;
* simplify cloner code and use if_clone_simple();
* migrate LOGIF_LOCK() to rmlock;
* add ipfw_bpf_mtap2() function to pass mbuf to BPF;
* introduce new additional ipfwlog0 pseudo interface. It differs from
ipfw0 by DLT type used in bpfattach. This interface is intended to
used by ipfw modules to dump packets with additional info attached.
Currently pflog format is used. ipfw_bpf_mtap2() function uses second
argument to determine which interface use for dumping. If dlen is equal
to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to
PFLOG_HDRLEN - ipfwlog0 will be used.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.
Subsequent commits will update nfsstat(8) to use the new fields.
Submitted by: will (earlier version)
Reviewed by: ken
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D1626
- Move group task queue into kern/subr_gtaskqueue.c
- Change intr_enable to return an int so it can be detected if it's not
implemented
- Allow different TX/RX queues per set to be different sizes
- Don't split up TX mbufs before transmit
- Allow a completion queue for TX as well as RX
- Pass the RX budget to isc_rxd_available() to allow an earlier return
and avoid multiple calls
Submitted by: shurd
Reviewed by: gallatin
Approved by: scottl
Differential Revision: https://reviews.freebsd.org/D7393
Some defines needed for exporting serial numbers from the SMBIOS were
missed during integration of SMBIOS support in the EFI boot loader (r281138).
This is needed for getting the hostid set from the system hardware UUID.
PR: 206031
Submitted by: Thomas Eberhardt <sneakywumpus@gmail.com>
MFC after: 1 week
These may have ccache in them or -target/--sysroot from external
compiler or SYSTEM_COMPILER support. Many ports do not support
a CC with spaces in it, such as emulators/virtualbox-ose-kmod.
Passing --sysroot to ports makes no sense as ports doesn't support
--sysroot currently.
If these variables need to be overridden for ports then they can
be set in make.conf or passed as make arguments.
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
promote memory as I am not sure all the demotion cases are handled, however
it is useful to implement pmap_page_set_memattr. This is used, for example,
when mapping uncached memory for bus_dma(9).
pmap_page_set_memattr needs to demote the DMAP region as on ARM we need to
ensure all mappings to the same physical address have the same attributes.
Reviewed by: kib
Obtained from: ABT Systems Ltd
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6987
The current in_pcb.h includes route.h, which includes sockaddr structures.
Including <sys/socketvar.h> should require <sys/socket.h>; add it in
the appropriate place.
PR: 211385
Submitted by: Sergey Kandaurov and iron at mail.ua
Reviewed by: gnn
Approved by: gnn (mentor)
MFC after: 1 day
The only difference between 3 and 3B is the size of the RJ45 port.
And now we have a uboot port that expect pcduino3.dts to be present.
Reported by: imp
Correctly limit npairs passed to vtnet_ctrl_mq_cmd. This ensures that
VQ_ALLOC_INFO_INIT is called with the correct value, preventing the system
from hanging when max_virtqueue_pairs > VTNET_MAX_QUEUE_PAIRS.
Add new sysctl requested_vq_pairs which allow the user to configure
the requested number of virtqueue pairs. The actual value will still take
into account the system limits.
Also missing sysctls for the current tunables so their values can be seen.
PR: 207446
Reported by: Andy Carrel
MFC after: 3 days
Relnotes: Yes
Sponsored by: Multiplay
Now zero value of arg1 used to specify "tablearg", use the old "tablearg"
value for "nat global". Introduce new macro IP_FW_NAT44_GLOBAL to replace
hardcoded magic number to specify "nat global". Also replace 65535 magic
number with corresponding macro. Fix typo in comments.
PR: 211256
Tested by: Victor Chernov
MFC after: 3 days
It was added in r153192 for XFS and doesn't appear to have been used for
anything else. XFS was disconnected in r241607 and removed entirely in
r247631.
Reported by: mlaier
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D7468
CloudABI executables already provide support for passing in vDSOs. This
functionality is used by the emulator for OS X to inject system call
handlers. On FreeBSD, we could use it to optimize calls to
gettimeofday(), etc.
Though I don't have any plans to optimize any system calls right now,
let's go ahead and already pass in a vDSO. This will allow us to
simplify the executables, as the traditional "syscall" shims can be
removed entirely. It also means that we gain more flexibility with
regards to adding and removing system calls.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D7438
SMBIOS Type 1 fields:
smbios.system.sku - SKU Number (SMBIOS 2.4 and above)
smbios.system.family - Family (SMBIOS 2.4 and above)
Add kernel environment variables under smbios.planar for the following
SMBIOS Type 2 fields:
smbios.planar.tag - Asset Tag
smbios.planar.location - Location in Chassis
Reviewed by: jhb, grembo
Approved by: sjg (mentor)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D7453
unordered user messages using DATA chunks as invalid ones.
While there, ensure that error causes are provided when
sending ABORT chunks in case of reassembly problems detected.
Thanks to Taylor Brandstetter for making me aware of this problem.
MFC after: 3 days
On all the other architectures, this function can also be called on the
currently running thread. In this case, we shouldn't fix up the address
in the PCB, but also patch up the register itself. Otherwise it will not
become active and will simply become overwritten by the next switch.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D7437
The offset of the directory file, passed to getdirentries(2) syscall,
is user-controllable. The value of the offset must not be asserted,
instead the invalid value should be checked and rejected if invalid.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
processes which combine kernel and non-kernel threads, e.g. nfsd. For
such processes, termination of a kthread must recheck signal delivery
among other threads according to masks.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
exception is caught in kernel mode. There are third-party modules
which trigger the issue, and since the problem causes usermode state
corruption at least, panic in production kernels as well.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Machine privilege level was specially designed to use in vendor's
firmware or bootloader. We have implemented operation in machine
mode in FreeBSD as part of understanding RISC-V ISA, but it is time
to remove it.
We now use BBL (Berkeley Boot Loader) -- standard RISC-V firmware,
which provides operation in machine mode for us.
We now use standard SBI calls to machine mode, instead of handmade
'syscalls'.
o Remove HTIF bus.
HTIF bus is now legacy and no longer exists in RISC-V specification.
HTIF code still exists in Spike simulator, but BBL do not provide
raw interface to it.
Memory disk is only choice for now to have multiuser booted in Spike,
until Spike has implemented more devices (e.g. Virtio, etc).
Sponsored by: DARPA, AFRL
Sponsored by: HEIF5
`device_t` is not defined outside the kernel but this header is used by
eg. libkvm or vmstat(8). Thus, r303890 broke the build.
So let's restore `struct device` here until a longer term solution is
found.
Reported by: Michael Butler <imb@protected-networks.net>, Jenkins
MFC after: 3 days
MFC with: r303890
Uses of commas instead of a semicolons can easily go undetected. The comma
can serve as a statement separator but this shouldn't be abused when
statements are meant to be standalone.
Detected with devel/coccinelle following a hint from DragonFlyBSD.
MFC after: 1 month
Uses of commas instead of a semicolons can easily go undetected. The comma
can serve as a statement separator but this shouldn't be abused when
statements are meant to be standalone.
Detected with devel/coccinelle following a hint from DragonFlyBSD.
MFC after: 1 month
Several files use the internal name of `struct device` instead of
`device_t` which is part of the public API. This patch changes all
`struct device *` to `device_t`.
The remaining occurrences of `struct device` are those referring to the
Linux or OpenBSD version of the structure, or the code is not built on
FreeBSD and it's unclear what to do.
Submitted by: Matthew Macy <mmacy@nextbsd.org> (previous version)
Approved by: emaste, jhibbits, sbruno
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7447
- Add constants for the fields in the root-entry table address register,
namely the root type type (RTT) and root table address (RTA) mask.
- Add macros for the bitmask of the domain ID field in the second word
of context table entries as well as a helper macro (DMAR_CTX2_GET_DID)
to extract the domain ID from a context table entry.
Reviewed by: kib
MFC after: 1 month
Sponsored by: Chelsio Communications
Previously the loop in PCIIOCGETCONF would terminate as soon as it
found enough matches. Now it will continue iterating through the
PCI device list and only terminate if it finds another matching device
for which it has no room to store a conf structure. This means that
PCI_GETCONF_LAST_DEVICE is reliably returned when the number of
matching devices is equal to the number of slots in the matches
buffer. For example, if a program requests the conf structure for a
single PCI function with a specified domain/bus/slot/function it will
now get PCI_GETCONF_LAST_DEVICE instead of PCI_GETCONF_MORE_DEVS.
While here, simplify the loop conditional a bit more by explicitly
breaking out of the loop if copyout() fails and removing a redundant
i < pci_numdevs check.
Reviewed by: vangyzen, imp
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7445
Use this to map an absolute queue ID to a logical queue ID in interrupt
handlers. For the regular cxgbe/cxl drivers this should be a no-op as
the base absolute ID should be zero. VF devices have a non-zero base
absolute ID and require this change. While here, export the absolute ID
of egress queues via a sysctl.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7446
Clear the device description to avoid use after free because the
bsddev is not destroyed when the mlx5en module is unloaded. Only when
the parent mlx5 module is unloaded the bsddev is destroyed. This fixes
a panic on listing sysctls which refer strings in the bsddev after the
mlx5en module has been unloaded.
Sponsored by: Mellanox Technologies
MFC after: 1 week
The problem was that 'zfsvfs' variable was not initialized if the error
was detected, but in the exit path the variable was dereferenced before
the error code was checked.
Reported by: np
MFC after: 3 days
X-MFC with: r303763
_prison_check_ip4 renamed to prison_check_ip4_locked
Move IPv6-specific jail functions to new file netinet6/in6_jail.c
_prison_check_ip6 renamed to prison_check_ip6_locked
Add appropriate prototypes to sys/sys/jail.h
Adjust kern_jail.c to call prison_check_ip4_locked and
prison_check_ip6_locked accordingly.
Add netinet/in_jail.c and netinet6/in6_jail.c to the list of files that
need to be built when INET and INET6, respectively, are configured in the
kernel configuration file.
Reviewed by: jtl
Approved by: sjg (mentor)
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D6799
If the listening socket is closed while sonewconn() is executing, the
nascent child socket is aborted, which results in recursion on the
unp_link lock when the child's pru_detach method is invoked. Fix this
by using a flag to mark such sockets, and skip a part of the socket's
teardown during detach.
Reported by: Raviprakash Darbha <rdarbha@juniper.net>
Tested by: pho
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D7398
in GENERIC.
Fixup #ifdef RSS code blocks so that they build and add/delete variables
that were missesd during the creation of this code.
This code is untested and should have a big red warning on it.
Reported by: npn@
MFC after: 2 days
driver. This change significantly increases the overall RX aggregation
ratio for heavily loaded networks handling 10-80 thousand simultaneous
connections.
Remove the turbo LRO code and all references to it which has now been
superceeded by the tcp_lro_queue_mbuf() function.
Tested by: Netflix
Sponsored by: Mellanox Technologies
MFC after: 1 week
In one corner case in the bxe TX path, a NULL mbuf could be enqueued onto
a drbr queue. This could case a KASSERT to fire with INVARIANTS enabled,
or the processing of packets from the queue to be prematurely ended later
on.
Submitted by: Matt Joras (matt.joras AT isilon.com)
Reviewed by: davidcs
MFC after: 3 days
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7041
last SID/SSN pair wasn't filled in.
Thanks to Julian Cordes for providing a packetdrill script
triggering the issue and making me aware of the bug.
MFC after: 3 days
CloudABI executables that are emulated on Mac OS X do not invoke system
calls through "syscall". Instead, they make use of a vDSO that is
provided by the emulator that provides symbols for all of the system
call routines. The emulator can implement these any way it likes.
At some point in time we want to do this for native execution as well,
so that CloudABI executables are entirely oblivious of how system calls
need to be performed. They will simply call into functions and let that
deal with all of the details.
These source files can be used to generate a simple vDSO that does
nothing more than invoke "syscall". All we need to do now is map it into
the processes.
Obtained from: https://github.com/NuxiNL/cloudabi
The saved channel callback in util softc is actually never used.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7424
Without enabling this bit, tlbre and tlbsx don't update the MAS7 register,
resulting in garbage in the register after a read (rather, the previous setting
of it for a tlbwe). This can result in mmu_booke_mapdev_attr() thinking
mappings that should match actually don't, because tlb1_read_entry() can't
determine the correct address of a given entry.
MFC after: 11-RELEASE
bouncing of unmapped buffers. Also treat userspace buffers as unmapped, to
avoid borrowing the UVA for copies. This allows sync'ing userspace buffers
outside the context of the owning process, and sync'ing bounced maps in
non-sleepable contexts.
This change is equivalent to r286787 for x86.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D3989
turn them into a shared definition.
Set M_MCAST/M_BCAST appropriately upon packet reception in net80211, just
before they are delivered up to the ethernet stack.
Submitted by: rstone
* remove the DEBUG ifdef; defining it is too far reaching throughout
the whole system;
* add a bitmask in the softc for controlling debugging;
* .. enable said debugging as a sysctl;
* add bitmaps for register access, reset and vlans.
TODO:
* Now that the debug statements are configurable, we definitely could
do with more debugging
* Move the debugging into the top-level etherswitch driver and have
sub-drivers obey.
This work, originally from Stacey Son, uses the MIPS UserReg for
reading the TLS data, and will fall back to the normal syscall path
when it isn't supported.
This code dynamically patches cpu_switch() to bypass the UserReg
instruction so to avoid generating a machine exception.
Thanks to sson for the original work, and to Dan Nelson for
bringing it to date and testing it on MIPS32 with me.
Tested:
* mips64 (sson)
* mips74k (dnelson_1901@yahoo.com) - AR9344 SoC, UserReg support
* mips24k (adrian) - AR9331 SoC, no UserReg support
Obtained from: sson, dnelson_1901@yahoo.com
Make FDT blob available via opaque hw.fdt.dtb sysctl, if a DTB has been
installed by the time sysctls are registered.
Reviewed by: andrew
Approved by: sjg (mentor)
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D7411
These are currently unused in our implementation and some even appear to
have not been implemented yet on linux but it is good to keep them for
reference.
Obtained from: NetBSD (CVS Rev. 1.41)
MFC after: 1 month
* Use the right incantation to get the next stack pointer. Since powerpc uses
special frames for traps, dereferencing the stack pointer straight up won't
get us the next stack pointer in every case.
* Clear EE using the correct instruction sequence. The PowerISA states that
'andi.' ANDs the register with 0||<imm>, instead of sign extending or filling
out the unavailable bits with 1. Even if it did sign extend, PSL_EE is
0x8000, so ~PSL_EE is 0x7fff, and the upper bits would be cleared. Use rlwinm
in the 32-bit case, and a two-rotate sequence in the 64-bit case, the latter
chosen to follow the output generated by gcc.
MFC after: 1 week
The problem is that the special .zfs nodes are not represented by
znodes but by special gfs-based nodes.
r303763 changed interface of zfs_dirlook such that started operating on
znodes rather than on vnodes and, thus, the function became unsuitable
for handling .zfs entities.
The solution is to move the handling of the special cases to zfs_lookup,
the only consumer of zfs_dirlook.
I already had this solution applied in D7421, but for different reasons.
Reported by: asomers
MFC after: 3 days
X-MFC with: r303763
The interpretation of the Electromechanical Interlock Status was
inverted, so we disengaged the EI if a card was inserted.
Fix it to engage the EI if a card is inserted.
When displaying the slot capabilites/status with pciconf:
- We inverted the sense of the Power Controller Control bit,
saying the power was off when it was really on (according to
this bit). Fix that.
- Display the status of the Electromechanical Interlock:
EI(engaged)
EI(disengaged)
Reviewed by: jhb
MFC after: 3 days
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D7426
* add an ANY debug level which will always echo the message if debugging
is compiled in;
* log MDIO transaction timeouts if debugging is compiled in;
* the argemdio device is different to arge, so turning on MDIO debugging
flags in arge->sc_debug doesn't help. Add a debug sysctl to argemdio
as well so that MDIO transactions can be debugged.
Tested:
* AR9331
Simplify the logic involved in changing the nic features on the fly, and
only reset the frontend when really needed (when changing RX features). Also
don't return from the ioctl until the interface has been properly
reconfigured.
While there, make sure XN_CSUM_FEATURES is used consistently.
Reported by: julian
MFC after: 5 days
X-MFC-with: r303488
Sponsored by: Citrix Systems R&D
This keeps the segments/ACK/FIN delivery order.
Before this patch, it was observed: if A sent FIN immediately after
an ACK, B would deliver FIN first to the TCP stack, then the ACK.
This out-of-order delivery causes one unnecessary ACK sent from B.
Reviewed by: gallatin, hps
Obtained from: rrs, gallatin
Sponsored by: Netflix (rrs, gallatin), Microsoft (sephe)
Differential Revision: https://reviews.freebsd.org/D7415
ZFS POSIX Layer is originally written for Solaris VFS which is very
different from FreeBSD VFS. Most importantly many things that FreeBSD VFS
manages on behalf of all filesystems are implemented in ZPL in a different
way.
Thus, ZPL contains code that is redundant on FreeBSD or duplicates VFS
functionality or, in the worst cases, badly interacts / interferes
with VFS.
The most prominent problem is a deadlock caused by the lock order reversal
of vnode locks that may happen with concurrent zfs_rename() and lookup().
The deadlock is a result of zfs_rename() not observing the vnode locking
contract expected by VFS.
This commit removes all ZPL internal locking that protects parent-child
relationships of filesystem nodes. These relationships are protected
by vnode locks and the code is changed to take advantage of that fact
and to properly interact with VFS.
Removal of the internal locking allowed all ZPL dmu_tx_assign calls to
use TXG_WAIT mode.
Another victim, disputable perhaps, is ZFS support for filesystems with
mixed case sensitivity. That support is not provided by the OS anyway,
so in ZFS it was a buch of dead code.
To do:
- replace ZFS_ENTER mechanism with VFS managed / visible mechanism
- replace zfs_zget with zfs_vget[f] as much as possible
- get rid of not really useful now zfs_freebsd_* adapters
- more cleanups of unneeded / unused code
- fix / replace .zfs support
PR: 209158
Reported by: many
Tested by: many (thank you all!)
MFC after: 5 days
Sponsored by: HybridCluster / ClusterHQ
Differential Revision: https://reviews.freebsd.org/D6533
r296773 was done to only remove libc symbols for <7. We want to provide
the syscall symbols going forward for 7+.
Discussed with: jhb
MFC after: 3 days
Sponsored by: EMC / Isilon Storage Division
In particular, fix factual, grammatical, and spelling errors in various
comments, and remove comments that are out of place in this function.
Reviewed by: kib, markj
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7410
as invalidation will have completed before the pmap_invalidate_* functions
have complete.
Discussed with: alc, kib
Obtained from: ABT Systems Ltd
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
The Hyper-V on pre-win10 systems will only report SPC-2 conformance,
but it actually conforms to SPC-3. The INQUIRY response is adjusted
to propagate the SPC-3 version information to CAM.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7405
On Zynq 256K-512K memory region is not accessible by all bus masters.
EHCI driver fails when trying to use it for DMA transfers. Patching
memory node does not help because ubldr overrides values there with
the ones obtained from u-boot. So as a workaround we just mark first
512K as reserved.
PR: 211484
Submitted by: Thomas Skibo <thoma555-bsd@yahoo.com>
MFC after: 3 days
Chelsio T4/T5 adapters are multifunction cards. The main driver uses
physical function 4 (PF4). However, VF devices for SR-IOV are only
supported on physical functions 0 through 3, where PF0 creates VFs tied
to port 0, etc. The t4iov/t5iov driver was previously added to
create VF devices for ports that are present on each adapter. This
change uses the recently added pci_iov_attach_name() function to
name the character device in /dev/iov after the associated port on
the card (e.g. /dev/iov/cxl0 is used to create VFs that share the
cxl0 port). With this in place, mark the t4iov/t5iov devices quiet
to prevent them from cluttering dmesg.
Reviewed by: rstone
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7402
The PCI_IOV option creates character devices in /dev/iov for each PF
device driver that registers support for creating VFs. By default the
character device is named after the PF device (e.g. /dev/iov/foo0).
This change adds a variant of pci_iov_attach() called pci_iov_attach_name()
that allows the name of the /dev/iov entry to be specified by the
driver.
Reviewed by: rstone
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7400
difference between files.
For pc98, put x86/mp_x86.c into the same place as used by i386 file list.
Fix typo in comment.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Owning Giant in the init/uninit is accidental due to the moment where
VFS modules initialization is performed, and is not enforced by the
VFS interface. The Giant lock does not prevent a parallel execution
of the code, it is VFS which implements the proper protocol.
Approved by: des (pseudofs maintainer)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Our mprotect() function seems to take a "const void *" address to the
pages whose permissions need to be adjusted. POSIX uses "void *". Simply
stick to the POSIX one to prevent us from writing unportable code.
PR: 211423 (exp-run)
Tested by: antoine@ (Thanks!)
CPU number for MPC85XX is a duplicate for E500. Since we don't support MPC85XX
"uncore" registers right now, rather than renumbering it simply remove it, and
it will properly use the E500 CPU ID.
Summary:
MPC85XX and QorIQ are very similar. When the DPAA dTSEC driver was
added, QORIQ_DPAA was brought in as a config option to support the differences
in hardware register settings between QorIQ (e500mc-, e5500- based) SoCs and
QUICC (e500v1/e500v2-based) SoCs, particularly in the Local Access Window (LAW)
target settings.
Unify these settings using macros to hide details and ease porting, and use a
new function (mpc85xx_is_qoriq()) to distinguish between QorIQ and QUICC SoCs at
runtime.
An alternative to using the function could be to use a variable initialized at
platform attach time, which may incur less overhead at runtime. Since it's not
in the critical path once booted, this optimization doesn't seem necessary at
first pass.
Reviewed by: nwhitehorn
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D7294
The "rtentry" zone does not use UMA_ZONE_ZINIT, so it is invalid to assume the
mutex's memory will be zero. Without MTX_NEW, garbage backing memory may
trigger the "re-initializing a mutex" assertion.
PR: 200991
Submitted by: Chang-Hsien Tsai <luke.tw AT gmail.com>
This error looks like it was a simple copy-paste typo in the original commit
for this code (r275732).
PR: 204009
Reported by: Chang-Hsien Tsai <luke.tw AT gmail.com>
Sponsored by: EMC / Isilon Storage Division
VF devices use a different register layout than PF devices. Storing
the offset in a value in the softc allows code to be shared between the
PF and VF drivers.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7389
In pathological situations where the master subnet manager becomes
unresponsive for an extended period, we may otherwise end up queuing all
of the system's mbufs while waiting for a response to a path record lookup.
This addresses the same issue as commit 1e85b806f9 in Linux.
Reviewed by: cem, ngie
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
After further review of the spec, I do not think the current HotPlug
code handles slots with power controllers correctly. In particular,
the power state of the slot is to be inferred from other events, not
from examining the state of the power control bit in SLOT_CTL. For now,
disable PCI hotplug support on such slots.
PR: 211081
Tested by: Jeffrey E Pieper <jeffrey.e.pieper@intel.com>
MFC after: 3 days
All current spinning loops retry an atomic op the first chance they get,
which leads to performance degradation under load.
One classic solution to the problem consists of delaying the test to an
extent. This implementation has a trivial linear increment and a random
factor for each attempt.
For simplicity, this first thouch implementation only modifies spinning
loops where the lock owner is running. spin mutexes and thread lock were
not modified.
Current parameters are autotuned on boot based on mp_cpus.
Autotune factors are very conservative and are subject to change later.
Reviewed by: kib, jhb
Tested by: pho
MFC after: 1 week
but only in the NETMAP code. This lead to the NETMAP code paths
passing nothing up to userland.
Submitted by: Ad Schellevis <ad@opnsense.org>
Reported by: Franco Fichtner <franco@opnsense.org>
MFC after: 1 day
report the size only after first opening. And due to the events are
asynchronous, some consumers can receive this event too late and
this confuses them. This partially restores previous behaviour, and
at the same time this should fix the problem, when already opened
provider loses resize event.
PR: 211028
MFC after: 3 weeks
This addresses a regression from an earlier upstream change which caused
cma_acquire_dev() to bypass the port GID cache and instead query the HCA
for each entry in its GID table. These queries can become extremely slow on
multiport devices, which has a negative impact on connection setup times.
Discussed with: hselasky
Obtained from: Linux
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
OpenZFS uses feature flags instead of a zpool version number to track
features since the split from Oracle. In addition to avoiding confusion
on ZFS vs OpenZFS version numbers, this also allows features to be added
to different operating systems that use OpenZFS in different order.
The previous zfs boot code (gptzfsboot) and loader (zfsloader) blindly
tries to read the pool, and if failed provided only a vague error message.
With this change, both the boot code and loader check the MOS features
list in the ZFS label and compare it against the list of features that
the loader supports. If any unsupported feature is active, the pool is
not considered as a candidate for booting, and a helpful diagnostic
message is printed to the screen. Features that are merely enabled via
zpool upgrade, but not in use, do not block booting from the pool.
Submitted by: Toomas Soome <tsoome@me.com>
Reviewed by: delphij, mav
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D6857
to r254304, we had separate functions for reclamation and laundering
(vm_pageout_scan) versus updating usage information, i.e., "reference
bits", on active pages (vm_pageout_page_stats), and we only performed
vm_req_vmdaemon(VM_SWAP_IDLE) if vm_pages_needed was true. However, since
r254303, if vm_swap_idle_enabled was "1", we have performed
vm_req_vmdaemon(VM_SWAP_IDLE) regardless of whether we are short of free
pages. This was unintended and too aggressive, so I suspect no one uses
this feature. With this change, we restore the historical behavior and
only perform vm_req_vmdaemon(VM_SWAP_IDLE) when we are short of free
pages.
Reviewed by: kib, markj
- Re-write tcp_ctlinput6() to closely mimic the IPv4 tcp_ctlinput()
- Now that tcp_ctlinput6() updates t_maxseg, we can allow ip6_output()
to send TCP packets without looking at the tcp host cache for every
single transmit.
- Make the icmp6 code mimic the IPv4 code & avoid returning
PRC_HOSTDEAD because it is so expensive.
Without these changes in place, every TCP6 pmtu discovery or host
unreachable ICMP resulted in a call to in6_pcbnotify() which walks the
tcbinfo table with the write lock held. Because the tcbinfo table is
shared between IPv4 and IPv6, this causes huge scalabilty issues on
servers with lots of (~100K) TCP connections, to the point where even
a small percent of IPv6 traffic had a disproportionate impact on
overall throughput.
Reviewed by: bz, rrs, ae (all earlier versions), lstewart (in Netflix's tree)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D7272
Relying on the boot loader console configuration allows us to use a
common set of device hints for all SENTRY5 devices.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7376
allow us to add an ACPI attachment for arm64.
Obtained from: ABT Systems Ltd
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7307
between ACPI and FDT. This will be needed on machines with both, e.g. the
SoftIron Overdrive 3000. The kernel will accept one or more comma separated
values of either 'acpi' or 'fdt'. Any other values are skipped.
To set it the user can either set it on the loader command line, or
in loader.conf e.g. in loader.conf:
kern.cfg.order=acpi,fdt
This will try using ACPI then FDT. If none of the selected options work the
kernel tries to use one to get the serial console, then panics.
Reviewed by: emaste (earlier version)
Obtained from: ABT Systems Ltd
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7274
lot of this module and I want to get the style and whitespace changes in
a separate commit (or maybe more).
PR: 206185
Submitted by: Dmitry Vagin
MFC after: 1 month
Just make sure that the total channel packet size does not exceed 1/2
data size of the TX bufring.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7359
inner-shareable memory accesses. There is no need for full system barriers.
Obtained from: ABT Systems Ltd
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Both variables are uint64_t, but they only count spins or sleeps.
All reasonable values which we can get here comfortably hit in 32-bit range.
Suggested by: kib
MFC after: 1 week
The current implementation uses non-temporal writes. This turns out to
be detrimental to performance if the page is used shortly after, which
is the typical case with page faults.
Switch to rep stos.
Reviewed by: kib
MFC after: 1 week
For reasons I won't comment on, the AR934x and QCA953x GPIO_OE register
value is inverted - bit set == input, bit clear == output.
So, fix this in the output setting, in reading the initial state from
the boot loader, and also setting any gpiofunc pins that are necessary.
No, this isn't a star trek science joke - sometimes LEDs are wired
up to be active low, so this is needed.
Submitted by: Dan Nelson <dnelson_1901@yahoo.com>
* iwm_poll_bit() returns 1 on success and 0 on failure, whereas
iwl_poll_bit() in Linux's iwlwifi returns >= 0 on success and < 0 on
failure.
* Because of the wrong iwm_poll_bit return code check, no warning was
printed if tx DMA stopping failed.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7371
parse() is the boot loader's interp_parse.c is too naive about quotes
both single and double quotes were allowed to be mixed, and single
quotes did not follow the usual semantics (re variable expansion).
The old code did not check for terminating quotes
This update implements:
* distinguishing single and double quote
* variable expansion will not be done inside single quote protected area
* will preserve inner quote for values like "value 'some list'"
* ending quote check.
this diff does not implement ending quote order check, it shouldn't
be too hard, needs some improvements on parser state machine.
PR: 204602
Submitted by: Toomas Soome <tsoome@me.com>
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D6000
dosfs (fat file systems) can perform reads of partial sectors
bcache should support such reads.
Submitted by: Toomas Soome <tsoome@me.com>
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D6475
- Remove null open/close methods.
- Don't set d_flags to 0 explicitly.
- Remove t5_cdevsw as the .d_name member isn't really used and doesn't
warrant a separate cdevsw just for the name.
- Use ENOTTY as the error value for an unknown ioctl request.
- Use make_dev_s() to close race with setting si_drv1.
Sponsored by: Chelsio Communications
I believe it never worked correctly for more the one queue even in Linux.
This fixes case when one of consumer drivers is not loaded on one side,
but its queues still announced as ready if something else brought link up.
While there, remove some pointless NULL checks.