Essentially, this is a literal copy of the code in sys/compat/cloudabi64,
except that it now makes use of 32-bits datatypes and limits. In
sys/conf/files, we now need to take care to build the code in
sys/compat/cloudabi if either COMPAT_CLOUDABI32 or COMPAT_CLOUDABI64 is
turned on.
This change does not yet include any of the CPU dependent bits. Right
now I have implementations for running i386 binaries both on i386 and
x86-64, which I will send out for review separately.
Right now we're casting uint64_t's to native pointers. This isn't
causing any problems right now, but if we want to provide a 32-bit
compatibility layer that works on 64-bit systems as well, this will
cause problems. Casting a uint32_t to a 64-bit pointer throws a compiler
error.
Introduce a TO_PTR() macro that casts the value to uintptr_t before
casting it to a pointer.
Though uio_resid is of type ssize_t, we need to take into account that
this source file contains an implementation specific to a certain
userspace pointer size. If this file provided 32-bit implementations,
this should have used INT32_MAX, even when running a 64-bit kernel.
This change has no effect, but is simply in preparation for adding
support for running 32-bit CloudABI executables.
On 32-bit platforms, our 64-bit timestamps need to be split up across
two registers. A simple assignment to td_retval[0] will cause the top 32
bits to get lost. By using memcpy(), we will automatically either use 1
or 2 registers depending on the size of register_t.
The reason why the old vDSOs were written in C using inline assembly was
purely because they were embedded in the C library directly as static
inline functions. This was practical during development, because it
meant you could invoke system calls without any library dependencies.
The vDSO was simply a copy of these functions.
Now that we require the use of the vDSO, there is no longer any need for
embedding them in C code directly. Rewriting them in assembly has the
advantage that they are closer to ideal (less useless branching, less
assumptions about registers remaining unclobbered by the kernel, etc).
They are also easier to build, as they no longer depend on the C type
information for CloudABI.
Obtained from: https://github.com/NuxiNL/cloudabi
This was .. an interesting headache.
There are two halves:
* The earlier IRIX stuff (yes, early) occasionally would do dead
code removal and generate multiple consecutive LO16 entries.
If this is done for REL entries then it's fine - there's no
state kept between them. But gcc 5.x seems to do this for
RELA entries.
eg:
HI1 LO1 HI2 LO2 LO3 HI4 LO4
.. in this instance, LO2 should affect HI2, but LO3 doesn't at all
affect anything. The matching HI3 was in code that was deleted
as "dead code".
Then, the next one:
* A "GCC extension" allows for multiple HI entries before a LO entry;
and all of those HI entries use the first LO entry as their basis
for RELA offset calculations.
It does this so GCC can also do dead code deletion without necessarily
having to geneate fake relocation entries for balanced HI/LO RELA
entries.
eg:
HI1 LO1 HI2 HI3 HI4 LO4 LO5 HI6 LO6 LO7
in this instance, HI{2,3,4} are the same relocation as LO4 (eg .bss)
and need to be buffered until LO4 - then the RELA offset is applied
from LO4 to HI{2,3,4} calculations.
/And/, the AHL from HI4 is used during the LO4 relocation calculation,
just like in the normal (ie, before this commit) implementation.
Then, LO5 doesn't trigger anything - the HI "buffer" is empty,
so there are no HI relocations to flush out.
HI6/LO6 are normal, and LO7 doesn't trigger any HI updates.
Tested:
* AR9344 SoC, kernel modules, using gcc-5.3 (mips-gcc-5.3.0 package)
Notes:
* Yes, I do feel dirty having written this code.
Reviewed by: imp (after a handful of "this should be on fire" moments wrt gcc and this code)
The default value of the tunable introduced in r304436 couldn't be
effectively overrided on VIMAGE kernels, because instead of being
accessed via the appropriate VNET() accessor macro, it was accessed
via the VNET_NAME() macro, which resolves to the (should-be) read-only
master template of initial values of per-VNET data. Hence, while the
value of udp_require_l2_bcast could be altered on per-VNET basis, the
code in udp_input() would ignore it as it would always read the default
value (one) from the VNET master template.
Silence from: rstone
The ip6_output routine is missing L2 cache invalication as done
in ip_output. Even with that code, some problems with UDP over
IPv6 have been reported. Diabling L2 cache for that problem works
around the problem for now.
PR: 211872 211926
Reviewed by: gnn
Approved by: gnn (mentor)
MFC after: immediate
(NB: This was likely a mismerge from XNU in audit support, where the
text argument to setlogin(2) is captured -- but as a text token,
whereas this change uses the dedicated login-name field in struct
audit_record.)
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
ftruncate(2) system call. This was not required by the Common
Criteria, which needed only open-time audit.
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
strings provided by user/config files. This update is replacing sprintf with
snprintf for cases the command_errbuf is built from dynamic content.
PR: 211958
Reported by: ecturt@gmail.com
Reviewed by: imp, allanjude
Approved by: imp (mentor), allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D7563
This driver only supports 10Mb Ethernet using PIO (the hardware supports
DMA, but the driver only does PIO). There are not any PCCard adapters
supported by this driver, only ISA cards. In addition, it does not use
bus_space but instead uses bcopy with volatile pointers triggering a
host of warnings. (if_ie.c is one of 3 files always built with
-Wno-error)
Relnotes: yes
to generate one. This is was U-Boot does to generate a random MAC so we end
up with the same MAC address as if U-Boot did generate it.
MFC after: 1 week
This hardware is not present on any modern systems. The driver is quite
hackish (raw inb/outb instead of bus_space, and raw inb/outb to random
I/O ports to enable ACPI since it predated proper ACPI support).
Relnotes: yes
The wl(4) driver supports pre-802.11 PCCard wireless adapters that
are slower than 802.11b. They do not work with any of the 802.11
framework and the driver hasn't been reported to actually work in a
long time.
Relnotes: yes
The si(4) driver supported multiport serial adapters for ISA, EISA, and
PCI buses. This driver does not use bus_space, instead it depends on
direct use of the pointer returned by rman_get_virtual(). It is also
still locked by Giant and calls for patch testing to convert it to use
bus_space were unanswered.
Relnotes: yes
Currently boot parameters (r0 - r3) are forgotten in ARM trampoline code.
This patch save them at startup and restore them before jumping into kernel
_start() routine.
This is usefull when booting with Linux ABI and/or custom bootloader.
Submitted by: Grégory Soutadé <soutade@gmail.com>
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D7395
Such processes are stopped synchronously by a direct call to
ptracestop(SIGTRAP) upon exec. P2_PTRACE_FSTP causes the exec()ing thread
to suspend itself while waiting for a SIGSTOP that never arrives.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7576
This permits a single early return for VF devices in the routines that
add sysctl nodes.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7512
Specifically, the FW_PORT_CMD may or may not work for a VF (the PF
driver can choose whether or not to permit access to this command),
so don't attempt to fetch port information on a VF if permission is
denied by the PF.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7511
While here, mark which parameters are PF-specific and which are
VF-specific.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7508
Now that we've switched over to using the vDSO on CloudABI, it becomes a
lot easier for us to phase out old features. System call numbering is no
longer something that's part of the ABI. It's fully based on names. As
long as the numbering used by the kernel and the vDSO is consistent
(which it always is), it's all right.
Let's put this to the test by removing a system call (thread_tcb_set())
that's already unused for quite some time now, but was only left intact
to serve as a placeholder. Sync in the new system call table that uses
alphabetic sorting of system calls.
Obtained from: https://github.com/NuxiNL/cloudabi
- Read interrupt properties at bus enumeration time and store
it into global mapping table.
- At bus_activate_resource() time, given mapping entry is resolved and
connected to real interrupt source. A copy of mapping entry is attached
to given resource.
- At bus_setup_intr() time, mapping entry stored in resource is used
for delivery of requested interrupt configuration.
- For MSI/MSIX interrupts, mapping entry is created within
pci_alloc_msi()/pci_alloc_msix() call.
- For legacy PCI interrupts, mapping entry must be created within
pcib_route_interrupt() by pcib driver itself.
Reviewed by: nwhitehorn, andrew
Differential Revision: https://reviews.freebsd.org/D7493
And don't recreate RXBUF for each primary channel open, it is now
created in device_attach DEVMETHOD and destroyed in device_detach
DEVMETHOD.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7556
[set] notation. This fixes pattern matching for recently added drives
that would set the NCQ Trim being broken incorrectly.
PR: 210686
Tested-by: Tomoaki AOKI
MFC After: 3 days
in_broadcast() was iterating over the ifnet address list without
first taking an IF_ADDR_RLOCK. This could cause a panic if a
concurrent operation modified the list.
Reviewed by: bz
MFC after: 2 months
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7227
in_broadcast() can be quite expensive, so skip calling it if the
incoming mbuf wasn't sent to a broadcast L2 address in the first
place.
Reviewed by: gnn
MFC after: 2 months
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7309
For almost every packet that is transmitted through ip_output(),
a call to in_broadcast() was made to decide if the destination
IP was a broadcast address. in_broadcast() iterates over the
ifnet's address to find a source IP matching the subnet of the
destination IP, and then checks if the IP is a broadcast in that
subnet.
This is completely redundant as we have already performed the
route lookup, so the source IP is already known. Just use that
address to directly check whether the destination IP is a
broadcast address or not.
MFC after: 2 months
Sponsored By: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7266
On the first switch we previously released the newly allocated keyboard
instead of the old one. Keyboard state was very confused afterwards for
further keyboard switches.
Submitted by: bde
nat64_getlasthdr() returns an int, which can be -1 in case of error,
storing the result in an uint8_t and then comparing to < 0 is not
helpful. Do what is done in the rest of the code and make proto an
int here as well.
by returning success on FIONBIO and FIOASYNC IOCTLs. The actual flags
handling is done by the kern_ioctl() function.
Reported by: Alex Bowden <alex.bowden@outlook.com>
Sponsored by: Mellanox Technologies
MFC after: 1 week
axge_setmulti()/axge_setpromisc() with axge_rxfilter().
Multicast filter programming and promiscuous mode requires
access to a common RX configuration register so there is no need to
use separate functions with added complexity. axge_rxfilter() does
not read back AXGE_RCR register since accessing a register in USB
is too slow and we already have all knowledge of required
configuration. Rebuilding RX filter configuration is simpler and
faster than manipulating every bits after reading back the
register.
Note, axge_rxfilter() does not set RCR_IPE(IP header alignment on
32bit boundary) to disable extra padding bytes insertion. The
extra padding wastes ethernet to USB host bandwidth as well as
complicating RX handling logic. Current USB framework requires
copying RX frames to mbufs so there is no need to worry about
alignment. Previously axge_rx_frame() performed wrong bound check
due to the extra padding and it was broken when RX checksum
offloading is disabled. See added comment in axge_rx_frame () for
actual RX packet layout.
In axge_init(), disable WOL. It's meaningless to enable WOL in
normal operation.
In axge_rxeof(), use properly sized mbuf rather than blindly
allocating a mbuf cluster.
Use RX H/W checksum offloading only when administrator requested RX
checksum offloading. Previously it always used RX H/W checksum
offloading result regardless of RX checksum offloading state.
Separate L4 checksum offloading validation from L3 one and properly
set required offloading bits for each layer. This is to fix setting
L4 checksum offloading bits for L3 packets.
There are still lots of RX errors(probably RX FIFO overflows) under
moderate load. Users are strongly recommended to enable ethernet
flow control.
Reviewed by: kevlo (initial version), hselasky
This paves to nuke netvsc_packet, which does not serves much
purpose now.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7541
structures. This simplifies mbuf copy operation to USB buffers as
well as improving readability. The controller supports Microsoft
LSOv1(aka TSO) but this change set does not include the support due
to copying overhead to USB buffers and large amount of memory waste.
Remove useless ZLP padding which seems to come from Linux. Required
bits the code tried to set was not copied into USB buffer so it had
no effect. Unlike Linux, FreeBSD USB stack automatically generates
ZLP so no explicit padding is required in driver.[1]
Micro-optimize updating IFCOUNTER_OPACKETS counter by moving it out
of TX loop since updating counter is not cheap operation as it did
long time ago and we already know how many number of packets were
queued after exiting the loop.
While here, fix a checksum offloading bug which will happen when
upper stack computes checksum while H/W checksum offloading is
active. The controller should be notified to not recompute the
checksum in this case.
Reviewed by: kevlo (initial version), hselasky
Pointed out by: hselasky [1]
Updated sha512 from illumos.
Using skein from freebsd crypto tree.
Since loader itself is using 64MB memory for heap, updated zfsboot to
use same, and this also allows to support zfs large blocks.
Note, adding additional features does increate zfsboot code, therefore
this update does increase zfsboot code to 128k, also I have ported gptldr.S
update to zfsldr.S to support 64k+ code.
With this update, boot1.efi has almost reached the current limit of the size
set for it, so one of the future patches for boot1.efi will need to
increase the limit.
Currently known missing zfs features in boot loader are edonr and gzip support.
Reviewed by: delphij, imp
Approved by: imp (mentor)
Obtained from: sha256.c update and skein_zfs.c stub from illumos.
Differential Revision: https://reviews.freebsd.org/D7418
Allwinner Uniprocessor SoC.
As of now it works with A10 and A13 (and possibly R8 as it is the same as the A13).
Move files.a10 into a1o subdirectory as it should be.
Rename std.a10 into std.allwinner_up
This adds support for EARLY_PRINTF via the CFE console; the aim is to
provide a fix for the otherwise cyclic dependency between PMU discovery
and console printf/DELAY:
- We need to parse the bhnd(4) core table to determine the address (and
type) of the PMU/PLL registers and calculate the CPU clock frequency.
- The core table parsing code will emit a printf() if a parse error is
hit.
- Safely calling printf() without EARLY_PRINTF requires a working
DELAY+cninit, which means we need the PMU.
Errors in core table parsing shouldn't happen, but lack of EARLY_PRINTF
makes debugging more difficult.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7498
Use netisr_get_cpuid() in netisr_select_cpuid() to limit cpuid value
returned by protocol to be sure that it is not greather than nws_count.
PR: 211836
Reviewed by: adrian
MFC after: 3 days
This removes the need to set the MMC pins with pullups in our DTS.
Thanks to jmcneill@ for spotting this.
Tested on Orange Pi One (Allwinner H3).
MFC after: 1 week
GPIO_INPUT/GPIO_OUTPUT.
a10_gpio_get_pud now returns the whole pud not only PULLDOWN/PULLUP.
Add a10_gpio_get_drv to get the current drive strenght.
During fdt pin configure, avoid setting function/drive/pud if it's already in
the correct value.
Tested on Allwinner H3 and A20
MFC after: 1 week
aio_aqueue() calls aio_init_aioinfo() as the first action. There is no
need to duplicate the code in kern_aio_fsync().
Also fix indent for aio_aqueue() definition.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D7523
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.
Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473
time, by, by default disallow writes to the mmaped HPET pages.
Intent is to allow userspace to use HPET as fast (i.e. no-syscall)
timecounter for gettimeofday(2). Unfortunately, the permission model
does not make it possible to safely unhide /dev/hpet in the jails even
if default mode is set to 0444, because untrusted jailed root may
change device permissions to writeable.
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
SRB status is set to 0x20 by the hypervisor, if the specified LUN is
unaccessible, and even worse the INQUIRY response will not be set by
the hypervisor at all under this situation. Additionally, SRB status
is 0x20 too, for TUR on an unaccessible LUN.
Deliver CAM_SEL_TIMEOUT to CAM upon SRB status errors as suggested by
Scott Long, other values seems improper.
This commit fixes the Hyper-V disk hotplug support.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7521
Some devices report that they have an MRL when they actually
do not. Since they always report that the MRL is open, child
devices would be ignored. Try to detect these devices and
ignore their claim of HotPlug support. Specifically,
if there is an open MRL but the Data Link Layer is active,
the MRL is not real.
Revert r303645 to re-enable HotPlug support for slots with
power controllers, since it works correctly in my testing.
Start the DLL state-change timer if Presence /or/ MRL state changes,
along with other conditions. Previously, we started the timer iff
Presence changed. If there is an MRL, it must be closed for power
to be turned on, so Presence is unlikely to change on an MRL-close event.
Add a printf() of interesting registers on HotPlug interrupts and
commands (one from erj@). These were very useful for debugging.
Guard them with bootverbose, since they're spam in normal operation.
In collaboration with: jhb
Reviewed by: jhb
MFC after: 1 day
Relnotes: yes (re-enable HotPlug support for slots with power controllers)
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D7509
it (either async or sync drain).
At this moment the only user of drain is TCP, but TCP wouldn't reschedule a
callout after it has drained it, since it drains only when a tcpcb is closed.
This for now the problem isn't observed.
Submitted by: rrs
- Added a generic bhnd_nvram_parser API, with support for the TLV format
used on WGT634U devices, the standard BCM NVRAM format used on most
modern devices, and the "board text file" format used on some hardware
to supply external NVRAM data at runtime (e.g. via an EFI variable).
- Extended the bhnd_bus_if and bhnd_nvram_if interfaces to support both
string-based and primitive data type variable access, required for
common behavior across both SPROM and NVRAM data sources.
- Extended the existing SPROM implementation to support the new
string-based NVRAM APIs.
- Added an abstract bhnd_nvram driver, implementing the bhnd_nvram_if
atop the bhnd_nvram_parser API.
- Added a CFE-based bhnd_nvram driver to provide read-only access to
NVRAM data on MIPS SoCs, pending implementation of a flash-aware
bhnd_nvram driver.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7489
This replaces the bitfield representation of the bhndb register window
freelist with the bitstring API, eliminating a dependency on
(MIPS-unsupported) __builtin_ctz().
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7495
with softupdates panics the kernel. The problem that has been pointed
out is that when there is a transient write error on certain metadata
blocks, specifically directory blocks (PAGEDEP), inode blocks
(INODEDEP), indirect pointer blocks (INDIRDEPS), and cylinder group
(BMSAFEMAP, but only when journaling is enabled), we get a panic
in one of the routines called by softdep_disk_io_initiation that
the I/O is "already started" when we retry the write.
These dependency types potentially need to do roll-backs when called
by softdep_disk_io_initiation before doing a write and then a
roll-forward when called by softdep_disk_write_complete after the
I/O completes. The panic happens when there is a transient error.
At the top of softdep_disk_write_complete we check to see if the
write had an error and if an error occurred we just return. This
return is correct most of the time because the main role of the routines
called by softdep_disk_write_complete is to process the now-completed
dependencies so that the next I/O steps can happen.
But for the four types listed above, they do not get to do their
rollback operations. This causes the panic when softdep_disk_io_initiation
gets called on the second attempt to do the write and the roll-back
routines find that the roll-backs have already been done. As an
aside I note that there is also the problem that the buffer will
have been unlocked and thus made visible to the filesystem and to
user applications with the roll-backs in place.
The way to resolve the problem is to add a flag to the routines called
by softdep_disk_write_complete for the four dependency types noted
that indicates whether the write was successful (WRITESUCCEEDED).
If the write does not succeed, they do just the roll-backs and then
return. If the write was successful they also do their usual
processing of the now-completed dependencies.
The fix was tested by selectively injecting write errors for buffers
holding dependencies of each of the four types noted above and then
verifying that the kernel no longer paniced and that following the
successful retry of the write that the filesystem could be unmounted
and successfully checked cleanly.
PR: 211013
Reviewed by: kib
allocation unwinding.
Dandling buffers are released on UFS_BALLOC() failure to ensure that
later attempt to allocate blocks in close range do not find the blocks
with invalid content, since possible partial block allocations are
unwound. As such, it is not enough to just release the buffers, the
pages must also invalidated and removed from the vnode vm_object
queue. Otherwise the pages might be found later and used to
reconstruct indirect buffers when doing allocations at offset close to
the failure point, and their stale content compromise the filesystem
integrity.
Note that just marking the buffer as B_INVAL is not enough, B_NOCACHE
is required. To be sure, clear the B_CACHE flag as well. This
complements the r174973, which started releasing buffers.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
that recorded allocated blocks numbers match the physical block
numbers of dandling buffers which are released.
When finally freeing the blocks during unwind, assert that dandling
buffers where not re-allocated. They shouldn't, because the vnode lock
is owned exclusive.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
queue.h header file and in the queue.3 manual page that they are O(n)
so should be used only in low-usage paths with short lists (otherwise
an STAILQ or TAILQ should be used).
Reviewed by: kib
have SU enabled, there is no point in calling softdep_request_cleanup().
The call cannot produce free blocks, but we unecessarily lock ufsmount
and do inode block write. Usual point of not doing optimizations for
the corner case of the full volume is not applicable there, the work
is easily avoidable, and the addition CPU and disk io load do not lead
to succeeding retry.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
overflow local arrays. This is not immediately obvious from the
static code inspection, due to retry logic.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
alternate TCP stack in other then the closed state (pre-listen/connect).
The idea is that *if* that is supported by the alternate stack, it
is asked if its ok to switch. If it approves the "handoff" then we
allow the switch to happen. Also the fini() function now gets a flag
to tell if you are switching away *or* the tcb is destroyed. The
init() call into the alternate stack is moved to the end so the
tcb is more fully formed before the init transpires.
Sponsored by: Netflix Inc.
Differential Revision: D6790
Avoid unnecessary message type setting and centralize the send context
to transaction id cast.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7500
7033 ustack helper should fault on bad return values
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alex Wilson <alex.wilson@joyent.com>
illumos/illumos-gate@a2f72b65eb
MFC after: 2 weeks
7034 negative record sizes should be rejected
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alex Wilson <alex.wilson@joyent.com>
illumos/illumos-gate@0b8049bfb0
MFC after: 2 weeks
This is a driver for a pre-ATAPI ISA CD-ROM adapter. As noted in
the manpage, this driver is only useful as a backend to cdcontrol to
play audio CDs since it doesn't use DMA, so its data performance is
"abysmal" (and that was true in the mid 90's).
Make the kern_fsync() function public, so that it can be used by other
parts of the kernel. Fix up existing consumers to make use of it.
Requested by: kib
If the caller of sem_post() wakes up a thread sleeping via sem_wait()
before it clears the has_waiters flag, the caller of sem_wait() has no way of
knowing when it is safe to destroy the semaphore and reuse the memory. This is
because the caller of sem_post() may be interrupted between the wake step and
the clearing of has_waiters. It will then write into the has_waiters flag in
userspace after being preempted for some unknown amount of time.
Reviewed by: jhb, kib, vangyzen
Approved by: kib (mentor), vangyzen (mentor)
MFC after: 2 weeks
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D7505
close functions. Scattered calls to sc_cnputc() and sc_cngetc() were
broken by turning the semi-reentrant inline context-switching code in
these functions into the grabbing functions. cncheckc() calls for
panic dumps are the main broken case. The grabbing functions have
special behaviour (mainly screen switching in sc_cngrab()) which makes
them unsuitable as replacements for the inline code.
Standard VOP_FSYNC() implementation just syncs data buffers, and due
to this, is the correct and efficient implementation for msdosfs or
any other filesystem which uses bufer cache trivially. Provide
globally visible wrapper vop_stdfdatasync_buf() for future consumption
by other filesystems.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7471
The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing
code with fsync(2). For all filesystems, this commit provides the
implementation which delegates the work of VOP_FDATASYNC() to
VOP_FSYNC(). This is functionally correct but not efficient.
This is not yet POSIX-compliant implementation, because it does not
ensure that queued AIO requests are completed before returning.
Reviewed by: mckusick
Discussed with: avg (ZFS), jhb (AIO part)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7471
Simply change the mode to K_XLATE using a local variable and use the
grab level as a flag to tell screen switches not to change it again,
so that we don't need to switch scp->kbd_mode. We did the latter,
but didn't have the complications to update the keyboard mode switch
for every screen switch. sc->kbd_mode remains at its user setting
for all scp's and ungrabbing restores to it.
- Add handling of VF register sets to t4_get_regs_len() and t4_get_regs().
- While here, use t4_get_regs_len() in the ioctl handler for regdump
instead of inlining it.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7484
- Use alternate register locations for the data and control registers for
VFs.
- Do a dummy read to force the writes to the mailbox data registers to
post before the write to the control register on VFs.
- Do not check the PCI-e firmware register for errors on VFs.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7483
Add fields to hold the SGE control register and free list buffer sizes to
the sge_params structure. Populate these new fields in
t4_init_sge_params() for PF devices and change t4_read_chip_settings() to
pull these values out of the params structure instead of reading
registers directly. This will permit t4_read_chip_settings() to be reused
for VF devices which cannot read SGE registers directly.
While here, move the call to t4_init_sge_params() to
get_params__post_init(). The VF driver will populate the SGE parameters
structure via a different method before calling t4_read_chip_settings().
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7476
Like scr_lock, the grab count needs to be per-physical-device to work.
This bug corrupted the grab count on both vtys if the ungrabbed vty is
different from the console, and failed to restore the keyboard state
on the ungrabbed vty, but not restoring it usually left the keyboard
mode part of the keyboard state uncorrupted at 1 (K_XLATE), while
after this fix the keyboard mode part is usually corrupted to 0 (K_RAW).
While here, rename the grab count from grabbed to grab_level.
This bug corrupted the grab count on both vtys if the ungrabbed vty is
different from the console, and failed to restore the keyboard state
on the ungrabbed vty, but not restoring the latter usually left the
keyboard mode part of it uncorrupted at 1 (K_XLATE), while after this
fix the keyboard mode part is usually corrupted to 0 (K_RAW).
While here, rename the grab count from 'grabbed' to grab_level.
- never call up to the tty layer to restart output for keyboard input in
console mode. This was already disallowed in kdb mode. Other cases
are rarely reached.
- disable the reboot, halt and powerdown keys in console mode. The suspend,
standby and panic keys are still allowed, and aren't even conditonal
on excessive configuration options. Some of these actions are still
available in ddb mode as ddb commands which are equally unsafe. Some
are useful at input prompts and should be restored when the locking is
fixed.
- disallow bells in kdb mode (should be in console mode, but the flag for
that is not available). Visual bell gives very alarming behaviour by
trying to use callouts which don't work in kdb mode. Audio bell uses
timeouts and hardware resources with mutexes that can deadlock in
reasonable use of ddb.
Screen switches in kdb mode are not very safe, but they are important
functionality and there is a lot of code to make them sort of work.
restores avoidance of doing dangerous things like calling wakeup() and
callouts while in ddb.
Initialization of 'debugger' was broken by removing the cndbctl() console
method that was used mainly in this driver to initialize 'debugger' and
switch to the console screen on entry to ddb. The screen switch was
restored using the cngrab() method, but cngrab() is more general so it
should not initialize 'debugger' and never did. 'debugger' was just
an over-engineered alias for kdb_active anyway. It existed because
kdb_active (when it was named ddb_active) was considered as a private
kdb variable, and there are ordering problems initializing the variables
atomically with the state that they represent, but an extra variable and
method to set it increased these problems.
The bug caused LORs, but WITNESS is normally misconfigured with
WITNESS_SKIPSIN so it doesn't check the spinlocks used by wakeup() and
callouts.
virtual-device, but needs to be per-physical-device so that it protects
shared data. Usually, scp->sc->write_in_progress got corrupted first
and further corruption was limited when this variable was left at nonzero
with no write in progress.
Attempt to fix missing lock destruction in r162285. Put it with the
lock destruction for r172250 after moving the latter. Both might be
unreachable.
To demonstrate the bug, find a buggy syscall or sysctl that calls
printf(9) and run this often. Run hd /dev/zero >/dev/ttyvN for any
N != 0. The console spam goes to ttyv0 and the non-console spam goes
to ttyvN, so the lock provided no protection (but it helped for
N == 0).
Without this, rules using address ranges (e.g. "10.1.1.1 - 10.1.1.5") did not
match addresses correctly on little-endian systems.
PR: 211796
Obtained from: OpenBSD (sthen)
MFC after: 3 days
Fix PC_REGS() so that printing of instructions works in some useful
cases. ddb only understands a single flat address space, but this
macro allows mapping $cs:$eip into vm86's flat address space well
enough for the MI parts of ddb. This doesn't work for the MD parts
that do stack traces, and there are no similar macros for data addresses.
PC_REGS() has to use the trapframe pointer instead of the pcb for this.
For other CPUs, the trapframe pointer is not available except by tracing
back to it. But tracing back through vm86 trapframes is broken even
starting with one.
It was remarkably hard to trace all current threads. "show pcpu" only
showed the pid, and there was nothing (?) better than searching ps output
to find the tids on CPUs. This change simplifies the search, but you
still have to trace the tid for each CPU manually.
the fixed-width numeric field 'wchan', as in ps(1). They were sort
of centered, although the template shows 'state' as right-justified.
The `wmesg' field very rarely has a prefix of '*' (for lock names)
that is still to the left of the header, and the width of this field
is reduced from 8 to 7 (more than 6 is an error).
The 'wmesg' and 'wchan' fields are still misnamed and poorly handled.
They are named sort of backwards relative to ps(1):
- wmesg in ddb = mwchan in ps
- wmesg in ddb = wchan in ps (if it is a wait channel name, not a lock name)
- wchan in ddb = nwchan in ps
ddb ps wastes lots of space for the unimportant 'wchan' field (20
columns altogether on 64-bit arches). ps(1) documents using a
compressed format, but the compression only omits leading nybbles of
0 so it has neveqr worked on arches that put the kernel in the top half
of the address space. It just avoids wasting space for an 0x prefix.
single stepping of multiple instructions (e.g., s/p,<count> and n/p).
db_print_loc_and_inst() already prints a newline on all arches although
it probably shouldn't.
Especially on SMP systems, single stepping tends to deadlock or panic
too quickly to be useful for anything except finding bugs in itself,
but with printing "itself" includes console drivers so it is useful
for generating stress tests for console drivers.
mpc85xx_map_dcsr() returns a vm_offset_t, not an error code.
mpc85xx_fix_errata() will gracefully exit if mpc85xx_map_dcsr() returns 0, as
that indicates an error (NULL pointer).
__syncicache() only syncs the icache on the current CPU, it doesn't touch the
cache on any other core. Replace the call with cpu_flush_dcache() instead.
Since bp_kernload is not touched again by the boot CPU in this code path, dcbf
is no less efficient than the dcbst from __syncicache() by invalidating the
cache line.
Summary:
There is often a need at the debugger to print arbitrary special
purpose registers (SPRs) on PowerPC. Using a rewritable asm stub, print any SPR
provided on the command line.
Note, as there is no checking in this, attempting to print a nonexistent SPR
may cause a Program exception (illegal instruction, or boundedly undefined).
Note also that this relies on the kernel text pages being writable. If in the
future this is made not the case, this will need to be reworked.
Test Plan:
Printing the Processor Version Register (PVR, SPR 287):
db> show spr 11f
SPR 287(11f): 80240012
Differential Revision: https://reviews.freebsd.org/D7403
incorrect from the error cases in exec_map_first_page(). They are
unnecessary because we automatically unbusy the page in vm_page_free()
when we remove it from the object. The calls are incorrect because they
happen after the page is freed, so we might actually unbusy the page
after it has been reallocated to a different object. (This error was
introduced in r292373.)
Reviewed by: kib
MFC after: 1 week
Summary:
u-boot, following the ePAPR specification, puts secondary cores into a
spinloop at boot, rather than leaving them shut off. It then relies on the host
OS to write the correct values to a special spin table, located in coherent
memory (on newer implementations), or noncoherent memory (older
implementations).
This supports both implementations of ePAPR, as well as continuing to support
non-ePAPR booting, by first attempting to use the spintable, and falling back to
expecting non-started CPUs.
Test Plan:
Booted on a P5020 board. Tested before and after the changes.
Before the changes, prints the error "SMP: CPU 1 already out of hold-off state!"
and panics shortly thereafter. After the changes, same boot method lets it
complete boot.
Reviewed by: nwhitehorn
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D7494
The module works together with ipfw(4) and implemented as its external
action module.
Stateless NAT64 registers external action with name nat64stl. This
keyword should be used to create NAT64 instance and to address this
instance in rules. Stateless NAT64 uses two lookup tables with mapped
IPv4->IPv6 and IPv6->IPv4 addresses to perform translation.
A configuration of instance should looks like this:
1. Create lookup tables:
# ipfw table T46 create type addr valtype ipv6
# ipfw table T64 create type addr valtype ipv4
2. Fill T46 and T64 tables.
3. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
4. Create NAT64 instance:
# ipfw nat64stl NAT create table4 T46 table6 T64
5. Add rules that matches the traffic:
# ipfw add nat64stl NAT ip from any to table(T46)
# ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96
6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.
Stateful NAT64 registers external action with name nat64lsn. The only
one option required to create nat64lsn instance - prefix4. It defines
the pool of IPv4 addresses used for translation.
A configuration of instance should looks like this:
1. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
2. Create NAT64 instance:
# ipfw nat64lsn NAT create prefix4 A.B.C.D/28
3. Add rules that matches the traffic:
# ipfw add nat64lsn NAT ip from any to A.B.C.D/28
# ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96
4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6434
ipfw_objhash_lookup_table_kidx does lookup kernel index of table;
ipfw_ref_table/ipfw_unref_table takes and releases reference to table.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
* make interface cloner VNET-aware;
* simplify cloner code and use if_clone_simple();
* migrate LOGIF_LOCK() to rmlock;
* add ipfw_bpf_mtap2() function to pass mbuf to BPF;
* introduce new additional ipfwlog0 pseudo interface. It differs from
ipfw0 by DLT type used in bpfattach. This interface is intended to
used by ipfw modules to dump packets with additional info attached.
Currently pflog format is used. ipfw_bpf_mtap2() function uses second
argument to determine which interface use for dumping. If dlen is equal
to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to
PFLOG_HDRLEN - ipfwlog0 will be used.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.
Subsequent commits will update nfsstat(8) to use the new fields.
Submitted by: will (earlier version)
Reviewed by: ken
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D1626
- Move group task queue into kern/subr_gtaskqueue.c
- Change intr_enable to return an int so it can be detected if it's not
implemented
- Allow different TX/RX queues per set to be different sizes
- Don't split up TX mbufs before transmit
- Allow a completion queue for TX as well as RX
- Pass the RX budget to isc_rxd_available() to allow an earlier return
and avoid multiple calls
Submitted by: shurd
Reviewed by: gallatin
Approved by: scottl
Differential Revision: https://reviews.freebsd.org/D7393
Some defines needed for exporting serial numbers from the SMBIOS were
missed during integration of SMBIOS support in the EFI boot loader (r281138).
This is needed for getting the hostid set from the system hardware UUID.
PR: 206031
Submitted by: Thomas Eberhardt <sneakywumpus@gmail.com>
MFC after: 1 week
These may have ccache in them or -target/--sysroot from external
compiler or SYSTEM_COMPILER support. Many ports do not support
a CC with spaces in it, such as emulators/virtualbox-ose-kmod.
Passing --sysroot to ports makes no sense as ports doesn't support
--sysroot currently.
If these variables need to be overridden for ports then they can
be set in make.conf or passed as make arguments.
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
promote memory as I am not sure all the demotion cases are handled, however
it is useful to implement pmap_page_set_memattr. This is used, for example,
when mapping uncached memory for bus_dma(9).
pmap_page_set_memattr needs to demote the DMAP region as on ARM we need to
ensure all mappings to the same physical address have the same attributes.
Reviewed by: kib
Obtained from: ABT Systems Ltd
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6987
The current in_pcb.h includes route.h, which includes sockaddr structures.
Including <sys/socketvar.h> should require <sys/socket.h>; add it in
the appropriate place.
PR: 211385
Submitted by: Sergey Kandaurov and iron at mail.ua
Reviewed by: gnn
Approved by: gnn (mentor)
MFC after: 1 day
The only difference between 3 and 3B is the size of the RJ45 port.
And now we have a uboot port that expect pcduino3.dts to be present.
Reported by: imp
Correctly limit npairs passed to vtnet_ctrl_mq_cmd. This ensures that
VQ_ALLOC_INFO_INIT is called with the correct value, preventing the system
from hanging when max_virtqueue_pairs > VTNET_MAX_QUEUE_PAIRS.
Add new sysctl requested_vq_pairs which allow the user to configure
the requested number of virtqueue pairs. The actual value will still take
into account the system limits.
Also missing sysctls for the current tunables so their values can be seen.
PR: 207446
Reported by: Andy Carrel
MFC after: 3 days
Relnotes: Yes
Sponsored by: Multiplay
Now zero value of arg1 used to specify "tablearg", use the old "tablearg"
value for "nat global". Introduce new macro IP_FW_NAT44_GLOBAL to replace
hardcoded magic number to specify "nat global". Also replace 65535 magic
number with corresponding macro. Fix typo in comments.
PR: 211256
Tested by: Victor Chernov
MFC after: 3 days
It was added in r153192 for XFS and doesn't appear to have been used for
anything else. XFS was disconnected in r241607 and removed entirely in
r247631.
Reported by: mlaier
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D7468